Photonic-electronic NIC serves live machine learning inference requests

Machine learning applications are growing rapidly at a time when gains in computing performance are slowing. As Moore’s law nears its end and computers approach the limits of their processing power, the need to design platforms that support the computational demands of machine learning applications intensifies.

Lightning, a hybrid computing platform developed by an MIT research team, is the first photonic computing system to serve machine learning inference requests in real time. Lightning is a reconfigurable, photonic-electronic, smart network interface card (smartNIC) that is both fast and energy-efficient.

Unlike their electronic counterparts, photonic computing devices lack the memory or instructions to control dataflows. Lightning removes this obstacle and ensures that data moves smoothly between a computer’s electronic and photonic components. It uses a fast data path to feed traffic from the NIC into the photonic domain without creating bottlenecks.

MIT researchers introduced Lightning, a reconfigurable photonic-electronic smartNIC that serves real-time deep neural network inference requests at 100 Gbit/s. Courtesy of Alex Shipps/MIT CSAIL via Midjourney.

To achieve a fast, smooth data path, the system leverages a reconfigurable count-action abstraction that keeps track of the required computation operations of each inference packet. The count-action abstraction controls access to the data moving through the system. It counts the number of operations in each task and triggers the execution of the next task without interrupting the dataflow.
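The counting-and-triggering behavior described above can be illustrated with a short software sketch. This is only an illustration of the idea, not Lightning's hardware design: the class name CountAction, its fields, and the run_pipeline helper are hypothetical names introduced here, and each task is assumed to need at least one operation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CountAction:
    """Hypothetical rule: once `target` operations are counted, fire `action`."""
    target: int
    action: Callable[[], None]
    count: int = 0

    def tick(self) -> bool:
        """Record one completed operation; fire the action when the target is hit."""
        self.count += 1
        if self.count == self.target:
            self.action()
            return True
        return False


def run_pipeline(ops_per_task: List[int]) -> List[str]:
    """Stream operations task by task; only counter hits trigger a task change."""
    log: List[str] = []
    for task_id, n_ops in enumerate(ops_per_task):
        rule = CountAction(
            target=n_ops,
            # Bind task_id now so each rule reports its own task.
            action=lambda t=task_id: log.append(f"task {t} done"),
        )
        for _ in range(n_ops):
            # No per-operation decision by control software: just count.
            rule.tick()
    return log


if __name__ == "__main__":
    print(run_pipeline([4, 2, 3]))
```

In this sketch, the only control decision is the comparison of a counter against a preset target, which stands in for the way the abstraction is said to start the next task without pausing the dataflow.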

The count-action abstraction connects electronics and photonics. The information carried by electrons is translated into photons that work at light speed to assist in the completion of an inference task. Then, the photons are converted back to electrons to relay the information to the computer.

According to researcher Zhizhen Zhong, photonic computing provides significant advantages when it comes to bulky linear computing tasks such as matrix multiplication, but for other tasks, it needs an assist from electronics. This, he said, creates a significant amount of data to be exchanged between the photonic components and the electronic components in order to complete tasks such as machine learning inference requests.
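The division of labor Zhong describes can be sketched in a few lines of Python. This is a software stand-in under stated assumptions, not the researchers' implementation: the function names are hypothetical, the matrix-vector product merely represents the linear work suited to photonics, and ReLU is assumed here as an example of a nonlinear step handled by electronics.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def photonic_matvec(weights: Matrix, x: Vector) -> Vector:
    """Stand-in for the photonic domain: a linear matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def electronic_relu(v: Vector) -> Vector:
    """Stand-in for the electronic domain: a nonlinear activation (assumed ReLU)."""
    return [max(0.0, a) for a in v]


def hybrid_inference(layers: List[Matrix], x: Vector) -> Tuple[Vector, int]:
    """Run each layer as linear step then nonlinear step, counting domain crossings."""
    crossings = 0
    for weights in layers:
        crossings += 1  # electronic -> photonic: inputs handed to the linear step
        x = photonic_matvec(weights, x)
        crossings += 1  # photonic -> electronic: results handed back
        x = electronic_relu(x)
    return x, crossings


if __name__ == "__main__":
    layers = [[[1.0, -1.0], [0.5, 0.5]], [[1.0, 2.0]]]
    print(hybrid_inference(layers, [1.0, 3.0]))
```

Even in this toy version, every layer requires data to cross between the two domains twice, which hints at why the volume of exchanged data grows with the depth of the model.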

By combining the speed of photonics with the dataflow control capabilities of electronics, Lightning can serve real-time deep neural network inference requests at 100 Gbit/s.

“Controlling this dataflow between photonics and electronics was the Achilles’ heel of past state-of-the-art photonic computing works,” Zhong said. “Even if you have a superfast photonic computer, you need enough data to power it without stalls. Otherwise, you’ve got a supercomputer just running idle without making any reasonable computation.”

The researchers said that previous attempts to develop a photonic-electronic computing platform used a “stop-and-go” approach. In that approach, control software makes all the decisions about the movement of data, slowing the dataflow.

“Building a photonic computing system without a count-action programming abstraction is like trying to steer a Lamborghini without knowing how to drive,” said professor Manya Ghobadi. “You probably have a driving manual in one hand, then press the clutch, then check the manual, then let go of the brake, then check the manual, and so on. This is a stop-and-go operation because, for every decision, you have to consult some higher-level entity to tell you what to do.

“But that’s not how we drive,” Ghobadi continued. “We learn how to drive and then use muscle memory without checking the manual or driving rules behind the wheel. Our count-action programming abstraction acts as the muscle memory in Lightning. It seamlessly drives the electrons and photons in the system at runtime.”

The photons used by Lightning move faster and generate less heat than electrons, allowing Lightning to operate at a higher frequency and with greater energy efficiency than nonhybrid computers.

To measure the system’s energy efficiency, the researchers synthesized a Lightning chip and compared it to standard graphics processing units, data processing units, smartNICs, and other accelerators. They found that Lightning was more energy-efficient than the other accelerators when completing inference requests.

“Our synthesis and simulation studies show that Lightning reduces machine learning inference power consumption by orders of magnitude, compared to state-of-the-art accelerators,” researcher Mingran Yang said.

In all, the researchers evaluated the system’s performance using four platforms: a hybrid photonic-electronic prototype, an emulation environment, chip synthesis, and large-scale simulation. The Lightning prototype demonstrated the feasibility of performing 8-bit photonic multiply-accumulate operations with 99.25% accuracy.

Machine learning services and models such as ChatGPT and BERT require heavy computing resources. As a fast, cost-effective option for serving real-time deep neural network inference requests, Lightning offers a potential upgrade for data centers that wish to reduce the carbon footprint of their machine learning models while accelerating inference response times for users.