Optical compute promises game-changing AI performance

Lightmatter, the MIT spinout developing optical compute processors for AI acceleration, presented a test chip at Hot Chips 32 this week. Using techniques from silicon photonics and MEMS, the processor performs matrix-vector multiplication at the speed of light (in silicon), powered by a milliwatt laser light source. According to the company, computation is orders of magnitude faster than on transistor-based chips, including the latest GPUs, and uses very little power.

Lightmatter’s intention is to prove that its approach to processor design is solid by showing off this test chip. The company is one of the first to present a working optical compute (silicon photonics) chip tailored for AI inference workloads.

Lightmatter will have its first commercial product available in autumn 2021: a PCIe card with an optical compute chip based on a successor to this demonstrator. It is designed for data center AI inference workloads.

Advances in silicon photonics technology, which propagates light through a silicon chip, are enabling complex on-chip structures that can be manipulated to perform multiply-accumulate (MAC) operations in a completely different way from traditional electronics based on transistors (see our primer: How Does Optical Computing Work?). Since transistor-based chips reached the limits of Dennard scaling, power dissipation per unit area has risen, and the practical limits of cooling technologies can’t keep up with larger chips. There is therefore room for a different technology with energy efficiency advantages.

“We’ve skirted the whole energy scaling problem by going to a completely different type of physics: we’re using light,” said Lightmatter CEO Nick Harris, in a pre-Hot Chips interview with EE Times. “That means that we can scale using a different set of rules, so [optical compute] is faster and lower energy.”

Exactly how fast, and how low energy?

“We could take existing AI data centers and reduce the energy consumption by a factor of 20 and shrink the physical footprint by a factor of five,” Harris said. “And that’s just with the first generation of what we’re building. There’s a long road map ahead.”

While Harris emphasized that this test chip has been built as a demonstrator for the technology, and not to do well on benchmarks, he was adamant that in a practical application Lightmatter’s demonstrator would still beat the market leader for AI acceleration, Nvidia’s Ampere A100. Harris said that compared to the A100, Lightmatter’s chip offers 20 times the energy efficiency and at least five times the throughput on workloads like BERT and ResNet-50 inference.

Chip design
Lightmatter’s chip is actually two die stacked vertically. On top is a 12nm ASIC that houses memory and orchestrates control of the 90nm optical compute die, which sits below. Both die are fabricated at GlobalFoundries on standard CMOS processes.

The photonic processor has a 64 x 64 photonic matrix-vector product calculator; data propagates across the chip in less than 200 picoseconds, orders of magnitude faster than transistor-based calculations, which would take multiple clock cycles. The compute engine is driven by a 50-mW laser.

According to Harris, one of the benefits of such a low-power optical compute chip is that it can be 3D-stacked with the control/memory ASIC; a transistor-based compute chip would dissipate too much heat. Harris points out that the stacked die shorten the trace lines between the operand store on the ASIC and the compute element on the photonic die – from the data converter to the optical compute engine is less than a millimeter of total routing. This in turn improves latency and power.

“There’s a nice positive feedback loop here,” said Harris. “Saving power lets us stack, and stacking saves more power.”

Lightmatter's optical compute test chip, block diagram

A DAC takes digital input signals, converts them to an analog voltage and uses that to drive the laser (this technology is well established in fiber optic transmitters). Light from that laser enters the compute array. The computational element is the Mach-Zehnder interferometer (MZI). Coherent light entering the MZI is split in two, with each half’s phase adjusted by a different amount. Recombining the two halves produces constructive or destructive interference, depending on their phase difference, which effectively modulates the brightness of the light passing through the MZI (the modulation may be thought of as a multiplication operation). Where waveguides (the “wires” that carry the light) meet, the signals are effectively added together. This is the basis of the optical MAC. Light output from the compute array reaches a photodiode, whose signal is fed through an ADC in order to interface with the rest of the digital circuitry.
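The multiply-and-add behavior described above can be illustrated with a toy numerical model. This is only a sketch under idealized assumptions (a lossless MZI with perfect 50/50 splitting, weights restricted to the range 0 to 1, and output brightnesses that simply add); the function names are hypothetical and nothing here represents Lightmatter's actual design.

```python
import cmath
import math


def mzi_transmission(phase_diff: float) -> float:
    """Fraction of input power leaving an ideal, lossless MZI.

    The light is split equally into two arms, one arm is delayed by
    phase_diff radians, and the arms are recombined so they interfere.
    """
    arm_a = 0.5                               # reference arm contribution
    arm_b = 0.5 * cmath.exp(1j * phase_diff)  # phase-shifted arm
    return abs(arm_a + arm_b) ** 2            # equals cos^2(phase_diff / 2)


def phase_for_weight(weight: float) -> float:
    """Phase difference that makes the MZI transmit `weight` (0..1)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("this toy model only supports weights in [0, 1]")
    return 2.0 * math.acos(math.sqrt(weight))


def optical_mac(inputs, weights) -> float:
    """Multiply each input brightness by its MZI transmission, then sum.

    Assumption: the merged waveguide output is modeled as a plain sum
    of the modulated brightnesses.
    """
    return sum(
        x * mzi_transmission(phase_for_weight(w))
        for x, w in zip(inputs, weights)
    )


if __name__ == "__main__":
    print(mzi_transmission(0.0))        # fully constructive interference
    print(mzi_transmission(math.pi))    # fully destructive interference
    print(optical_mac([1.0, 2.0], [0.5, 0.25]))
```

In this model, setting the phase is the "multiply" and merging the outputs is the "accumulate"; a real device would also have to handle loss, signed values and converter precision.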

Lightmatter optical compute array

The key operation in the MZI, shifting the phase of the light, is achieved by mechanical means. In his Hot Chips presentation, Lightmatter VP engineering Carl Ramey explained that the photonics chip uses a nano-optical electromechanical system (NOEMS). Similar to a MEMS device, the waveguide structure is suspended by etching underneath and then deflected by adding charge to capacitor plates above and below it. This successfully changes the phase of the light by the required amount.

“The NOEMS devices have some really amazing properties,” Ramey said. “They’re extremely low loss and the static power dissipation is nearly zero. We simply dump some electrons onto the small capacitors and there’s almost no leakage – the capacitance is small enough that the dynamic power used for actuation is also really tiny…. [the structures] can also be actuated at relatively high speed, up to hundreds of megahertz.”

Energy saving
Lightmatter’s demonstrator has 64 x 64 compute elements, but this could easily be scaled up, Ramey said.

“Similar to transistor-based systolic arrays, the amount of compute scales linearly with the area,” he said. “The latency is also scaling with the dimension of the array. So in a typical pipeline transistor design, you take 64 clock cycles to perform the operations here, going left to right. Our latency also scales with the array dimensions, but we’re three orders of magnitude faster. So even a thousand by a thousand array would have a latency well under a nanosecond.”

Interestingly, power consumed by the optical compute array scales with the square root of the area. This is because power consumption is largely attributed to the data conversion.

“As we add each new element to the array, we’re getting that much more performance, but we’re only paying the square root of that, in terms of power,” Ramey said. “So our chips are actually getting more efficient, the bigger we build them. This is very different to an electronic system, which would just scale linearly: more performance, more power.”
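The square-root relationship can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that an N x N array needs one DAC per input and one ADC per output (2N converters in total) and that converters dominate power, as the article states; the exact converter count and the function name are assumptions, not figures from Lightmatter.

```python
def array_scaling(n: int) -> dict:
    """Compare compute elements with data converters for an n x n array."""
    macs = n * n          # compute elements grow with array area
    converters = 2 * n    # assumed: n DACs on the inputs, n ADCs on the outputs
    return {
        "n": n,
        "macs": macs,
        "converters": converters,
        "macs_per_converter": macs / converters,
    }


if __name__ == "__main__":
    for size in (64, 256, 1024):
        row = array_scaling(size)
        print(
            f"{row['n']:>5} x {row['n']:<5} "
            f"MACs={row['macs']:>8} "
            f"converters={row['converters']:>5} "
            f"MACs per converter={row['macs_per_converter']:.0f}"
        )
```

Under these assumptions, each converter serves more multiply-accumulate operations as the array grows, which is the sense in which larger arrays become more efficient.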

In addition to the energy spent on computation, there is the energy spent moving data around the chip (large transistor-based AI chips might burn 50-100W moving data across the silicon). With optical compute, data moves as light, so no power is required to move it, a huge saving.

The result is a device that operates on less than 3W and uses a fraction of the energy per inference operation of other compute methods.

Another interesting feature of optical compute is the ability to multiplex. Multiple independent data streams can be encoded onto different wavelengths of light, similar to techniques used in optical communication, and fed into the compute engine simultaneously. This means an optical compute chip could perform multiple AI inferences simultaneously.

“This is a pretty unique property to optical compute,” said Lightmatter CEO Nick Harris. “What it means is that you have one physical resource, one processor, but it’s acting like an array of processors.”

While the designated spectrum (1310 to 1600nm) can theoretically fit at least a thousand channels, Harris said that laser technology, which is relatively immature, limits this to 8 channels.
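A rough sketch shows what those channel counts imply and how multiplexing resembles batching. It assumes channels are spread evenly in wavelength across the 1310 to 1600nm band (real channel grids are usually specified in frequency) and models each wavelength as an independent input vector passing through the same weight matrix; both function names are hypothetical.

```python
def channel_spacing_nm(band_start_nm: float, band_end_nm: float, channels: int) -> float:
    """Average wavelength spacing if channels are spread evenly over the band."""
    return (band_end_nm - band_start_nm) / channels


def multiplexed_matvec(weights, streams):
    """Apply one weight matrix to several independent input vectors.

    Each entry of `streams` stands in for the data carried on one
    wavelength; all of them see the same weights in the same pass.
    """
    return [
        [sum(w * x for w, x in zip(row, vector)) for row in weights]
        for vector in streams
    ]


if __name__ == "__main__":
    print(channel_spacing_nm(1310, 1600, 1000))  # spacing for a thousand channels
    print(channel_spacing_nm(1310, 1600, 8))     # spacing for 8 channels
    weights = [[1, 2], [3, 4]]
    streams = [[1, 0], [0, 1]]                   # two "wavelengths"
    print(multiplexed_matvec(weights, streams))
```

The second function is the software analogue of Harris's point: one physical set of weights serves several inputs at once.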

Showing it working
Lightmatter’s target customer is the data center, including scale-out systems such as high-performance computing, though that might expand in future; autonomous driving is on the far-future roadmap, but Harris concedes that the reliability engineering needed to enter this sector would be “a massive undertaking.”

Lightmatter has a complete software stack that can integrate with TensorFlow or PyTorch; Harris said the company aims to be plug-and-play with both machine learning frameworks.

The company currently has 46 employees and is based in Boston, Massachusetts. Founded in 2017, Lightmatter has raised $33 million in funding from investors including Google Ventures, and holds 30 patents.

One of the first challenges for the startup may be selling the entire concept of optical compute to skeptical customers. How does Harris plan to do this?

“It is a tall challenge,” he said. “In the history of computing since the 1960s, there has never been a technology that has replaced electronic transistors for compute. It’s never happened. People have tried and it didn’t work out. I think that this is the first time that you’re going to see it happen, and the way that we’re selling it is by showing it working.”