, taking advantage of high bandwidth, high parallelism, and low energy consumption. Some of the most advanced designs are based on integrated photonics, typically implementing generic matrix-vector multiplications at GHz rates. These approaches are well suited to applications such as convolutional neural networks for edge computing, but are intrinsically limited to low-dimensional signals.
Here, we take a different approach, and target heavy data-center computations involving extremely high-dimensional signals, up to dimension 1 million. Such data appear in many modern Machine Learning applications, such as Graph Neural Networks, Natural Language Processing based on "transformers" such as GPT-3, or neural view synthesis. At these sizes, the "von Neumann bottleneck" becomes more acute, as matrices may exceed RAM limits, especially on GPUs. We introduce here the LightOn Appliance, released March 7th, 2021, based on the Optical Processing Unit (OPU) technology.
II LightOn’s Optical Processing Unit
The OPU multiplies an input vector $x$ by a fixed random matrix $H$, whose entries follow an independent and identically distributed complex Gaussian distribution. The output is $y = |Hx|^2$, with element-wise non-linearity $|\cdot|^2$. The built-in non-linearity can also be suppressed by interferometric measurements, leading to $y = Hx$. The benefits of the OPU come from the dimensionality of the data, the speed at which these computations are made, and the low power consumption. In the LightOn Appliance OPU, the input $x$ (binary) and output $y$ (8-bit) scale up to dimension 1 million and 2 million, respectively, and independent computations can be made at 1.9 kHz, for a power consumption of 30 W. It thus reaches 1500 TeraOPS, or 50 TeraOPS/W.
The OPU operates in a "Non von Neumann" (NvN) regime: although the weights of the matrix $H$ are fixed by design, they are accessed instantly, at no energy cost: $H$ plays the role of a large read-only memory (terabytes equivalent) that can be used in matrix multiplications, literally at the speed of light and in a passive way. Speed limitations and power consumption arise from communication and formatting, D/A and A/D conversion, and laser power. In contrast to von Neumann architectures, where computing time and memory requirements scale with the size of the data, i.e. as $O(n^2)$ for an $n$-dimensional matrix-vector multiplication, the computation time is here independent of the data size. At large dimensions, this NvN operation gets faster than its electronic counterpart, but more importantly allows direct single-chip implementation on larger signals without reaching RAM limits.
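The OPU's transform $y = |Hx|^2$ can be simulated numerically; a minimal NumPy sketch at toy dimensions (here $H$ is an explicit simulated matrix, not the physical scattering medium, and sizes are far below the real $10^6$ scale):

```python
import numpy as np

rng = np.random.default_rng(0)

def opu_transform(x, H):
    """Simulate the OPU's random projection y = |Hx|^2,
    where H has i.i.d. complex Gaussian entries."""
    return np.abs(H @ x) ** 2

n, m = 1000, 2000  # input / output dimensions (toy scale)
# i.i.d. complex Gaussian random matrix, unit variance per entry
H = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)

x = rng.integers(0, 2, size=n).astype(float)  # binary input, as on the Appliance
y = opu_transform(x, H)  # non-negative m-dimensional output (8-bit on hardware)
```

On the hardware, $H$ is never stored or read out digitally; the simulation above only reproduces the statistics of the output.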
The LightOn Appliance OPU is packaged as a 2U rackable device, linked to its host server through a Gen2 x4 external PCIe link, as shown in Fig. 1. It contains a single compact photonic core, custom FPGA boards for data I/O, a laser, and a power supply. All components, including light modulators and detectors, are mass-produced for consumer markets.
The software layer has been designed to offer a smooth experience to Machine Learning experts, without requiring any knowledge of photonics. The custom API library LightOnML, integrated in Python, provides pre-processing functions for different types of input data. This API is compatible with PyTorch and scikit-learn.
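As an illustration of the scikit-learn integration pattern, the sketch below wraps a *simulated* OPU random projection as a standard transformer and drops it into a pipeline. The class name `SimulatedOPU` and all its parameters are hypothetical and do not reproduce the actual LightOnML API; they only show how an OPU-style random feature map composes with downstream estimators:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

class SimulatedOPU(BaseEstimator, TransformerMixin):
    """Hypothetical scikit-learn transformer emulating an OPU projection y = |Hx|^2."""

    def __init__(self, n_components=1000, seed=0):
        self.n_components = n_components
        self.seed = seed

    def fit(self, X, y=None):
        # Draw the fixed i.i.d. complex Gaussian matrix (hardware-fixed on a real OPU)
        rng = np.random.default_rng(self.seed)
        n_features = X.shape[1]
        self.H_ = (rng.standard_normal((self.n_components, n_features))
                   + 1j * rng.standard_normal((self.n_components, n_features)))
        return self

    def transform(self, X):
        # Element-wise squared modulus of the complex projection
        return np.abs(X @ self.H_.T) ** 2

# Random features followed by a linear classifier, a typical hybrid pipeline
pipe = make_pipeline(SimulatedOPU(n_components=200),
                     StandardScaler(),
                     LogisticRegression(max_iter=1000))
```

On a real Appliance, only the projection step would run on the OPU; everything before and after stays on the CPU/GPU.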
III Hybrid Computing Architectures
Fig. 2 displays some neural network architectures that use the OPU in hybrid computing pipelines, such as for Natural Language Processing, change-point detection in multi-dimensional time series, molecular dynamics, event classification in particle physics, and graph neural networks, as well as more fundamental studies: supervised random projections or kernel computations. Interestingly, some properties are due to the analog nature of the OPU, such as increased robustness against adversarial attacks. More details can be found on LightOn's blog and public GitHub source code repository. A transfer learning example demonstrates speedups and energy savings compared to the same code on CPU/GPU only, with the same final accuracy; this example can be run directly on the LightOn Cloud. Finally, let us emphasize the particular case of Direct Feedback Alignment (DFA), where the OPU random projections are used in the feedback loop, as an alternative to back-propagation training. This represents, to our knowledge, the only optical training applied to large-scale (over 1 million parameters) modern Neural Network architectures, including Graph Neural Networks or transformers.
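The role of the random projection in Direct Feedback Alignment can be sketched in a few lines: the output error is sent back through a fixed random matrix $B$ (the operation an OPU would perform in hardware) instead of the transposed forward weights. A minimal NumPy toy example, not the paper's actual training setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression target y = M x, learned by a two-layer net trained with DFA
n_in, n_hid, n_out = 20, 64, 5
M = rng.standard_normal((n_out, n_in))          # ground-truth linear map
W1 = rng.standard_normal((n_hid, n_in)) * 0.1   # trainable forward weights
W2 = rng.standard_normal((n_out, n_hid)) * 0.1
B = rng.standard_normal((n_hid, n_out)) * 0.1   # fixed random feedback (OPU's role)

lr = 0.01
losses = []
for step in range(1000):
    x = rng.standard_normal(n_in)
    t = M @ x
    h = np.tanh(W1 @ x)          # forward pass
    y = W2 @ h
    e = y - t                    # output error
    # DFA: project the error through fixed random B instead of W2.T
    dh = (B @ e) * (1 - h**2)
    W2 -= lr * np.outer(e, h)
    W1 -= lr * np.outer(dh, x)
    losses.append(float(e @ e))
```

The feedback matrix $B$ never changes during training, which is precisely what makes a fixed optical random projection a natural fit for this step.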
HPC applications: Accelerated Linear Algebra
Randomized Numerical Linear Algebra is a widely studied technique to speed up large computations in various HPC applications, such as inverse problems or finance. Here, we only discuss how the OPU technology offers an alternative view, and refer to the companion study for details. At the simplest level, for a large random matrix $R \in \mathbb{R}^{k \times n}$, one has $R^T R \approx I$ (up to normalization). A matrix-vector product $Ax$ can then be approximated in the compressed domain: $Ax \approx (A R^T)(R x)$, assuming that $A \in \mathbb{R}^{m \times n}$ is fat ($m \ll n$), with $k \ll n$. With the OPU, the products $A R^T$ (pre-computed once, assuming $A$ is fixed) and $R x$ can be performed efficiently. Finally, one is left with computing the product $(A R^T)(R x)$ in the compressed domain. At sizes where the OPU random projection takes negligible time, approximate matrix-vector multiplication is performed with a speedup of order $n/k$. Fig. 3 shows that optimized OPU pipelines provide approximate results close to full-precision randomization. The same principle has been applied to Randomized SVD, which can serve as a basis for recommender systems. For large dense matrices, such methods may represent the only practical alternative.
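The compressed-domain scheme above can be checked numerically; a NumPy sketch with a simulated (real Gaussian) sketching matrix at toy sizes:

```python
import numpy as np

rng = np.random.default_rng(0)

m, n, k = 50, 10000, 1000   # A is "fat" (m << n); compressed dimension k << n
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)

# Random sketching matrix, scaled so that R.T @ R ~ I in expectation
R = rng.standard_normal((k, n)) / np.sqrt(k)

AR = A @ R.T                 # pre-computed once for a fixed A  (m x k)

# Per query: on an OPU the projection R @ x is essentially free,
# so the remaining digital cost is O(mk) instead of O(mn).
y_approx = AR @ (R @ x)
y_exact = A @ x

# Approximation error scales as ||A||_F ||x|| / sqrt(k)
err = np.linalg.norm(y_approx - y_exact)
scale = np.linalg.norm(A) * np.linalg.norm(x) / np.sqrt(k)
```

Note that in this all-digital simulation computing $Rx$ itself costs $O(nk)$; the $n/k$ speedup only materializes when the projection is offloaded to the OPU.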
IV Conclusion: towards “optical advantage”
In many ML/HPC computing tasks, not all coefficients need to be updated. Free-space photonics is currently the most promising way to leverage the Non von Neumann principle at scale, with instantaneous and energy-passive access to trillion-size coefficient arrays. With LightOn's OPU, this technology is now mature and seamlessly integrated in standard computing pipelines, as a complement to standard CPU/GPU programmable chips. Here, we have demonstrated a few examples of hybrid computing. As data and models become larger and larger, the benefit of such technologies becomes clearer: we believe that, in order to scale up already massive language models such as GPT-3, it offers a unique pathway to "optical advantage", i.e. the use of a "beyond pure silicon" technology in business-relevant computations that would otherwise require dedicated supercomputers.
- Adversarial robustness by design through analog computing and synthetic gradients. arXiv:2101.02115.
- Online change point detection in molecular dynamics with optical random features. arXiv:2006.08697, 2020.
- Fast graph kernel with optical random features. In IEEE ICASSP, 2021.
- End-to-end optical backpropagation for training neural networks. arXiv:1912.12256.
- Photonic co-processors in HPC: using LightOn OPUs for randomized numerical linear algebra. In IEEE Hot Chips 33, 2021.
- Training of photonic neural networks through in situ backpropagation and gradient measurement. Optica 5(7), 2018.
- NEWMA: a new method for scalable model-free online change-point detection. IEEE Trans. Signal Processing, vol. 68, 2020.
- Hardware beyond backpropagation: a photonic co-processor for direct feedback alignment. In NeurIPS workshops, 2020.
- Light-in-the-loop: using a photonics co-processor for scalable training of neural networks. In IEEE Hot Chips 32, 2020.
- LightOn blog. https://www.lighton.ai/blog/
- LightOn documentation: recommender system using randomized SVD. https://docs.lighton.ai/examples/randomized_svd.html
- LightOn documentation: transfer learning. https://docs.lighton.ai/examples/transfer_learning.html
- LightOn public GitHub repository. https://github.com/lightonai/
- Kernel computations from large-scale random features obtained by optical processing units. In IEEE ICASSP, 2020.
- Silicon photonics for artificial intelligence acceleration. In IEEE Hot Chips 32, 2020.
- Random projections through multiple optical scattering: approximating kernels at the speed of light. In IEEE ICASSP, 2016.