Optical Computing for AI Inference: 2026 Energy Benchmarks

Optical computing, which performs the core mathematical operations of neural networks with photons instead of electrons, has emerged as the most credible candidate for cutting the energy cost of AI inference. From 2024 through early 2026, a wave of peer-reviewed results finally turned decades of laboratory promise into measurable, reproducible benchmarks. This article examines those benchmarks, explains the underlying technology, maps the competitive landscape, and gives hardware buyers an honest assessment of what is real today and what remains aspirational.

Why optical computing for AI inference matters right now

Three converging forces make 2026 the inflection year:

  1. Power-constrained data centers. Hyperscalers cannot simply add another megawatt to an existing facility. Every additional inference operation must come from better hardware efficiency, not more grid power.
  2. Trillion-parameter models in production. Inference cost per query — not training cost — now dominates total cost of ownership for companies deploying frontier AI.
  3. Peer-reviewed photonic benchmarks. Two landmark publications — Lightmatter’s April 2025 paper in Nature and Tsinghua University’s Taichi chiplet in Science — have moved optical computing from academic curiosity to documented, reproducible performance.

Market analysts at Future Markets Inc. described photonic AI processing in April 2026 as the “second wave” of the AI hardware buildout, following photonic interconnects — and projected the global optical computing market to grow exponentially through 2036.

Computing — at its essence, the mapping of inputs to outputs — can be compiled onto new architectures of linear and nonlinear physics that enable a fundamentally different scaling law of computation versus effort needed. — Prof. Dirk Englund, MIT Research Laboratory of Electronics

What is optical computing? A first-principles explanation

In a conventional electronic AI accelerator such as a GPU or TPU, every multiply-accumulate (MAC) operation requires charging and discharging transistor gates, then shuttling the result through resistive metal interconnects. Each step dissipates energy — typically in the low picojoule range per MAC.

In an optical computing system, the same linear-algebra operation is performed physically by light. Input values are encoded onto the amplitude or phase of coherent laser beams. Those beams then pass through a programmable arrangement of optical components — beam-splitters, phase shifters, Mach-Zehnder interferometers (MZIs), micro-ring resonators, or diffractive elements — that collectively implement the matrix transformation W·x. Photodetectors at the output capture the result.

Because the computation happens during propagation, the latency is determined by the physical length of the light path — on the order of picoseconds across an integrated photonic chip. And because photons do not generate resistive heat the way electrons do, the energy per MAC in the optical domain trends toward the sub-femtojoule regime in published architectures — orders of magnitude below silicon.
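
To put the picojoule-versus-femtojoule gap in perspective, the short Python sketch below converts per-MAC energy into per-token energy for the linear layers of a large model. The 1 pJ/MAC, 1 fJ/MAC, and 70-billion-parameter figures are illustrative assumptions chosen only to match the orders of magnitude discussed above; they are not measurements from any cited paper.

```python
# Back-of-envelope comparison of linear-stage energy per token, using
# illustrative round numbers: ~1 pJ/MAC for an electronic accelerator and
# ~1 fJ/MAC for a photonic linear stage (assumptions, not vendor specs).

PJ = 1e-12   # joules
FJ = 1e-15   # joules

# A hypothetical 70-billion-parameter dense model needs roughly one MAC per
# parameter per generated token in its linear layers.
macs_per_token = 70e9

electronic_j = macs_per_token * 1.0 * PJ   # ~0.07 J per token
photonic_j   = macs_per_token * 1.0 * FJ   # ~0.00007 J per token

print(f"electronic linear stage: {electronic_j*1e3:.1f} mJ/token")
print(f"photonic linear stage:   {photonic_j*1e3:.3f} mJ/token")
print(f"ratio: {electronic_j / photonic_j:,.0f}x")
```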

How do linear optical circuits perform matrix multiplication?

The heart of every optical AI accelerator is a linear optical circuit — a programmable photonic structure that maps an input vector of light amplitudes to an output vector via matrix multiplication. This is the same operation (matrix-vector product, or “matmul”) that consumes 70–90 % of inference FLOPs in transformer and convolutional neural networks.
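
A crude per-layer FLOP count makes the dominance of matmuls concrete. The sketch below uses hypothetical transformer dimensions and a very rough estimate of the nonlinear work; per-layer counts like this come out even higher than the 70–90 % whole-model share quoted above because they ignore embeddings, sampling, and other overheads.

```python
# Rough FLOP accounting for one decoder layer, to illustrate why matrix
# multiplications dominate inference cost. The dimensions below are
# illustrative, not taken from any cited paper.

d_model, d_ff, seq_len = 1280, 5120, 1024

# Linear (matmul) work per token, counting multiply+add as 2 FLOPs:
qkv_proj    = 2 * d_model * 3 * d_model        # Q, K, V projections
attn_out    = 2 * d_model * d_model            # attention output projection
attn_scores = 2 * 2 * d_model * seq_len        # QK^T and scores.V (per token)
mlp         = 2 * 2 * d_model * d_ff           # two feed-forward matmuls
linear_flops = qkv_proj + attn_out + attn_scores + mlp

# Nonlinear / elementwise work per token (softmax, GELU, layer norms),
# very roughly a handful of FLOPs per activation element:
nonlinear_flops = 10 * seq_len + 8 * d_ff + 10 * d_model

share = linear_flops / (linear_flops + nonlinear_flops)
print(f"linear share of layer FLOPs: {share:.1%}")   # ~99.9% for these dimensions
```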

Three major architectural families have demonstrated working AI inference in 2024–2026:

1. Coherent Mach-Zehnder interferometer (MZI) meshes

This approach, pioneered at MIT and commercialized by Lightmatter (Boston, USA), programs a triangular or rectangular mesh of MZIs so that the interference pattern of light beams traversing the mesh equals the desired matrix W. Each MZI acts as a tunable 2×2 unitary gate; cascading many of them implements an arbitrary unitary — or, with additional amplitude control, an arbitrary real-valued matrix.
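
The following numpy sketch illustrates the building block: one MZI as a tunable 2×2 unitary, and a small cascade of MZIs forming a programmable mesh. It uses the standard textbook parameterization rather than any vendor's calibration model, and it covers only the unitary part; implementing an arbitrary real-valued matrix additionally requires amplitude control, as noted above.

```python
import numpy as np

def mzi(theta: float, phi: float) -> np.ndarray:
    """2x2 transfer matrix of one MZI (two 50:50 couplers around a phase shifter)."""
    bs = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)   # 50:50 beam splitter
    inner = np.diag([np.exp(1j * theta), 1.0])       # internal phase shift
    outer = np.diag([np.exp(1j * phi), 1.0])         # external phase shift
    return outer @ bs @ inner @ bs

def embed(u2: np.ndarray, n: int, i: int) -> np.ndarray:
    """Embed a 2x2 unitary acting on modes (i, i+1) of an n-mode circuit."""
    u = np.eye(n, dtype=complex)
    u[i:i+2, i:i+2] = u2
    return u

# Cascade a few MZIs on 4 optical modes; the product is the matrix the mesh
# applies to the input light amplitudes (a unitary in this lossless model).
n = 4
rng = np.random.default_rng(0)
mesh = np.eye(n, dtype=complex)
for i in (0, 2, 1, 0, 2, 1):                       # a small Clements-style ordering
    mesh = embed(mzi(*rng.uniform(0, 2 * np.pi, 2)), n, i) @ mesh

x = rng.normal(size=n) + 1j * rng.normal(size=n)   # input field amplitudes
y = mesh @ x                                       # "computed" during propagation
print(np.allclose(mesh.conj().T @ mesh, np.eye(n)))  # True: the mesh is unitary
```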

Lightmatter’s Envise processor, detailed in the April 2025 Nature paper titled “Universal photonic artificial intelligence acceleration,” is built around four photonic chips manipulating 512 light beams through more than 200,000 optical components, with 50 billion transistors on the electronic side for control, memory, and nonlinear operations. Key published figures:

  • 262 TOPS/W (tera-operations per second per watt) in the photonic core
  • ~200 ps per matrix-vector product
  • Near-electronic precision on ResNet, BERT, and the Atari deep-RL benchmark — the first photonic chip to demonstrate this breadth of real-world AI workloads in a peer-reviewed venue

2. Wavelength- and time-multiplexed integrated processors

Tsinghua University’s Taichi photonic chiplet, published in Science in 2024, takes a different approach: it uses wavelength-division multiplexing (WDM) and a chiplet-based architecture to scale up the effective matrix size. Taichi reported 160 TOPS/W for general-intelligence workloads and demonstrated tasks ranging from image classification to content generation.

Separately, a silicon photonic reservoir-computing engine published in Nature Communications (2024) achieved processing speeds above 60 GHz and over 200 TOPS throughput on prediction, emulation, and classification benchmarks — with a compact footprint and high tolerance to fabrication errors.

More recently, a time- and wavelength-multiplexed photonic matrix-matrix multiplication processor with on-chip wavelength (de)multiplexers was published in Optics Express (2026), demonstrating how time-division-multiplexed (TDM) architectures can scale to large matrices using a limited number of on-chip optical components.
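
The WDM/TDM idea can be summarized in a few lines of numpy: a fixed photonic core applies the same weight matrix to several input columns in parallel (one per wavelength) and to further columns in successive time slots. This is a conceptual sketch of the multiplexing scheme in general, not of the specific Optics Express device.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(8, 8))        # weight matrix programmed into the photonic core
X = rng.normal(size=(8, 12))       # 12 input vectors to be multiplied by W
n_wavelengths = 4                  # columns processed in parallel per time slot

result = np.zeros((8, 12))
for t0 in range(0, X.shape[1], n_wavelengths):     # one iteration per time slot
    block = X[:, t0:t0 + n_wavelengths]            # columns riding on separate wavelengths
    result[:, t0:t0 + block.shape[1]] = W @ block  # all channels traverse the same mesh

print(np.allclose(result, W @ X))   # True: WDM+TDM reproduces the full matmul
```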

3. Free-space diffractive and tensor-optical designs

Researchers at Aalto University (Finland) demonstrated in 2025 a system that encodes an entire convolutional neural-network layer onto a single light pulse passing through free-space optics. The computation completes in picoseconds with no electronic switching or analog-to-digital converters in the optical path. The researchers estimate an optimized on-chip version could reduce energy per inference by a factor of 100 to 1,000 compared with today's GPUs; this is a target, not yet a shipping-product number.
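
The physical principle that makes free-space optical convolution attractive is the convolution theorem: a lens Fourier-transforms the optical field, so a convolution reduces to a pointwise multiplication in the Fourier plane (the classic 4f arrangement). The numpy sketch below only verifies that mathematical equivalence; it is not a model of the Aalto single-pulse system, whose implementation details differ.

```python
import numpy as np

rng = np.random.default_rng(2)
image  = rng.normal(size=(32, 32))
kernel = rng.normal(size=(32, 32))

# "Optical" route: multiply in the Fourier plane, then transform back.
optical = np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(kernel)))

# Electronic reference: direct circular convolution.
direct = np.zeros_like(image)
for dy in range(32):
    for dx in range(32):
        direct += kernel[dy, dx] * np.roll(np.roll(image, dy, axis=0), dx, axis=1)

print(np.allclose(optical, direct))   # True: same linear operation
```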

At MIT, a fully integrated photonic processor demonstrated in December 2024 performed all key deep-neural-network computations optically on-chip, achieving:

  • >96 % accuracy during training tests
  • >92 % accuracy during inference
  • Each forward pass completed in under half a nanosecond

These results confirm that optical computing is no longer limited to simplified toy tasks — photonic hardware can now handle production-class neural-network inference.

What does the energy-efficiency comparison actually look like?

| Metric | Electronic accelerator (GPU/TPU, 2026) | Photonic prototype (2025–2026) |
|---|---|---|
| Energy per MAC (linear stage) | Low picojoule range | Sub-femtojoule targets (published architectures) |
| Latency per matrix-vector product | Tens of nanoseconds | ~200 ps (Lightmatter Envise, Nature 2025) |
| Reported TOPS/W (compute core) | Tens of TOPS/W | 160 TOPS/W (Taichi, Science); 262 TOPS/W (Envise, Nature) |
| Workload maturity | Full training + inference across all model families | Inference on fixed or slowly updated weight matrices |
| Nonlinear operations | Native (transistor logic) | Hybrid; electronics handle activations, softmax, KV-cache |
| Precision | FP8 / INT8 / FP16 standard | Typically 6–8 bits; Envise demonstrated near-electronic precision |

The key takeaway: optical computing delivers a decisive energy-efficiency advantage on the linear (matmul) stage — which is where the vast majority of AI inference FLOPs reside. It does not yet replace digital silicon for nonlinearities, memory management, or training.

Who is building and shipping optical computing hardware in 2026?

The competitive landscape has matured significantly in the past 12 months. Here are the key players with publicly documented progress:

Lightmatter — Envise processor & Passage interconnect

Based in Boston, Lightmatter is arguably the furthest along in commercializing photonic AI compute. Beyond Envise, the company’s Passage 3D co-packaged optics (CPO) platform targets chip-to-chip interconnect; an October 2025 preprint (arXiv:2510.15893) demonstrates a 2.7× reduction in time-to-train for trillion-parameter Mixture-of-Experts models when GPUs communicate through a Passage 3D-CPO fabric. The dual strategy of photonic compute (Envise) plus photonic interconnect (Passage) positions Lightmatter to address both the arithmetic and the data-movement bottlenecks in AI infrastructure.

Quantum Computing Inc. — NeuraWave

On April 23, 2026, QCi (NASDAQ: QUBT) announced that its NeuraWave photonic reservoir-computing platform is now deployment-ready. Designed for edge AI inference in telecom, autonomous vehicles, robotics, and healthcare, NeuraWave uses a hybrid photonic-digital architecture to deliver low-latency, energy-efficient processing for tasks like time-series prediction and anomaly detection. It is positioned as a scalable alternative to GPU-based systems at the network edge.

Q/C Technologies — Optical Processing Unit (OPU)

On March 18, 2026, Q/C Technologies (NASDAQ: QCLS) launched an initiative to design and prototype a proprietary silicon-photonic OPU aimed squarely at overcoming the performance and energy constraints in AI inference infrastructure. The company is targeting the same 5–50 W edge-inference envelope as NeuraWave.

Optalysys, Luminous Computing, PsiQuantum, Xanadu

Future Markets Inc.’s April 2026 Global Optical Computing Market Report identifies these companies as transitioning from research demonstrations to commercial deployments in the 2027–2031 timeframe, targeting the AI inference compute market with architectures “structurally positioned to win on energy efficiency grounds as model sizes grow to trillions of parameters.”

Academic milestones shaping the roadmap

  • Cornell University (November 2025): Published optics-specific pruning methods grounded in wave physics that dramatically reduce the size of optical neural networks with minimal accuracy loss — a key step toward affordable, manufacturable photonic chips.
  • Optics Express (2026): Time- and wavelength-multiplexed photonic matrix-matrix multiplication with on-chip WDM, showing how TDM can solve the scalability problem for larger matrices.
  • Fully parallel optical matrix-matrix multiplication (arXiv, 2023): A theoretical framework that could replace the decades-old vector-matrix paradigm, greatly improving throughput.

Where are the market gaps and opportunities?

Based on the published evidence, three underserved segments stand out:

  1. Edge inference at 5–50 W. GPUs are inefficient at this power envelope, but a small photonic die plus low-power driver electronics could fit. NeuraWave and Q/C Technologies’ OPU are the first 2026-era entrants targeting this space.
  2. Always-on linear preprocessing in lidar, radar, optical-network monitoring, and high-frequency-trading systems — applications where the signal is already optical, and a photonic accelerator avoids the costly optical-to-electronic-to-optical conversion.
  3. Per-rack inference appliances for retrieval-augmented generation (RAG), where the workload is dominated by large-batch matrix projections (embedding lookups, key/value projections) that map naturally onto a programmable MZI mesh.

What are the honest limitations of optical computing for AI inference?

No technology review is complete without a frank assessment of what does not work yet:

  • Precision constraints. Most photonic processors operate at 6–8 effective bits due to analog noise, thermal drift, and fabrication variation. The Lightmatter Nature paper is notable precisely because it achieved near-electronic precision, but this required 50 billion transistors of electronic support circuitry. A quick numerical illustration of how analog noise limits effective bit width follows this list.
  • Hybrid architectures are mandatory. Every commercially relevant photonic AI system in 2026 is a hybrid: the linear stage runs in optics, but activations (ReLU, GELU), softmax, layer normalization, KV-cache management, and weight loading all happen in conventional electronics.
  • Reconfiguration is slow. Thermo-optic phase shifters take microseconds to milliseconds to reprogram, which makes optical computing strongly favor inference with fixed weights over training, where weights change every iteration.
  • Optical loss and thermal drift require closed-loop calibration. This is now standard in 2026-class prototypes, but it adds a small static power overhead and engineering complexity.
  • Manufacturing maturity. Silicon photonics foundries exist (e.g., GlobalFoundries, TSMC), but photonic process design kits (PDKs) are less mature than electronic ones, and yield on large photonic meshes remains a challenge.
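
As referenced in the precision bullet above, a simplified noise model shows why analog photonic matmuls tend to land in the 6–8 effective-bit range. The sketch below adds Gaussian noise to an ideal matrix-vector product and converts the resulting signal-to-noise ratio into an effective number of bits; the noise levels are assumed values for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)
W = rng.normal(size=(256, 256)) / np.sqrt(256)
x = rng.normal(size=256)
y_ideal = W @ x

for rel_noise in (1e-2, 3e-3, 1e-3):             # assumed relative analog noise levels
    y_noisy = y_ideal + rel_noise * np.std(y_ideal) * rng.normal(size=256)
    snr = np.var(y_ideal) / np.var(y_noisy - y_ideal)
    enob = (10 * np.log10(snr) - 1.76) / 6.02    # standard effective-number-of-bits formula
    print(f"relative noise {rel_noise:.0e} -> ~{enob:.1f} effective bits")
```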

Frequently asked questions

Is optical computing faster than a GPU for AI inference?

For the linear (matrix-multiplication) stage, yes — published latency is in the 200-picosecond range per matrix-vector product on Lightmatter’s Envise, compared with tens of nanoseconds for an electronic tensor core. However, end-to-end inference latency depends on the hybrid digital path for nonlinear operations, memory access, and data conversion.

How much energy can optical computing save compared to GPUs?

Peer-reviewed devices report 160–262 TOPS/W in the photonic core (Taichi in Science; Envise in Nature). Aalto University’s 2025 estimate projects 100×–1,000× lower energy per inference than today’s GPUs once a fully integrated photonic chip is built at scale. The upper bound should be treated as a research target rather than a shipping product specification.

What exactly do linear optical circuits compute?

They perform the matrix multiplications that sit at the core of every neural-network layer — convolutions, fully connected projections, and the key/query/value projections in attention mechanisms. These linear operations account for the majority of compute in modern AI models.

When will photonic chips replace GPUs?

They will not “replace” GPUs in the near term. Instead, they will offload the linear compute stage in hybrid systems, while GPUs or custom ASICs continue to handle nonlinearities, memory, and orchestration. Lightmatter’s own April 2025 blog post acknowledges that photonic interconnects are the near-term commercial focus, with photonic compute as the longer-horizon opportunity.

Is optical computing the same as co-packaged optics (CPO)?

No. Co-packaged optics move data between chips using light — reducing interconnect power and latency — but the computation itself still happens in electronic silicon. Optical computing performs the math in light. Some companies, like Lightmatter, are building both.

What should hardware buyers track in 2026?

If your roadmap involves serving frontier AI models under a constrained power budget, demand three concrete metrics from any photonic vendor:

  1. Measured TOPS/W under a real AI model — not a synthetic linear-only benchmark.
  2. End-to-end joules per inference, including all electronic overhead (ADCs, DACs, memory, control logic); a minimal calculation sketch follows this list.
  3. Precision retention compared to an FP8 or INT8 GPU baseline on the same model.
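
As a minimal sketch of the second metric, the function below folds measured subsystem powers into a single joules-per-inference figure. Every number in the example call is a placeholder to be replaced with vendor-measured values; none comes from a published specification.

```python
def joules_per_inference(photonic_core_w: float,
                         driver_dac_adc_w: float,
                         memory_w: float,
                         control_w: float,
                         inferences_per_second: float) -> float:
    """Total wall-plug energy per inference, including electronic overhead."""
    total_watts = photonic_core_w + driver_dac_adc_w + memory_w + control_w
    return total_watts / inferences_per_second

# Example with made-up numbers for a hypothetical edge appliance:
print(joules_per_inference(photonic_core_w=3.0,
                           driver_dac_adc_w=12.0,
                           memory_w=8.0,
                           control_w=5.0,
                           inferences_per_second=2000.0))   # -> 0.014 J per inference
```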

The 2024–2026 publications reviewed in this article show these numbers are finally being reported in peer-reviewed venues. That is why optical computing for AI inference has shifted from speculative to investable in a remarkably short window.
