LightOn Optical Processing Unit: Scaling-up AI and HPC with a Non von Neumann co-processor

07/25/2021 ∙ by Charles Brossollet, et al. ∙ LightOn

We introduce LightOn's Optical Processing Unit (OPU), the first photonic AI accelerator chip available on the market for at-scale Non von Neumann computations, reaching 1500 TeraOPS. It relies on a combination of free-space optics with off-the-shelf components, together with a software API allowing seamless integration within Python-based processing pipelines. We discuss a variety of use cases and hybrid network architectures, with the OPU used in combination with CPUs/GPUs, and draw a pathway towards "optical advantage".

I Introduction

In recent years, a number of photonic chips for AI computations have emerged [6, 4, 15], taking advantage of high bandwidth, high parallelism and low energy consumption. Some of the most advanced designs are based on integrated photonics, typically implementing generic matrix-vector multiplications at GHz rates. These approaches are well suited to applications such as convolutional neural networks for edge computing, but are intrinsically limited to low-dimensional signals.

Here, we take a different approach and target heavy data-center computations involving extremely high-dimensional signals, with up to 1 million dimensions. Such data appear in many modern machine learning applications, such as graph neural networks, natural language processing (based on transformers such as GPT-3), or neural view synthesis. At these sizes, the "von Neumann bottleneck" becomes more acute, as matrices may exceed the available memory, especially on GPUs. Here we introduce the LightOn Appliance, released on March 7th, 2021, based on the Optical Processing Unit (OPU) technology.

II LightOn's Optical Processing Unit

The OPU leverages light scattering [16] to perform, in the analog domain, Random Projections, i.e. the multiplication of input vectors $x$ by a fixed random matrix $R$, whose entries follow an independent and identically distributed (i.i.d.) complex Gaussian distribution. The output is

$$y = |Rx|^2,$$

with the element-wise non-linearity $|\cdot|^2$. The built-in non-linearity can also be suppressed by interferometric measurements, leading to $y = Rx$. The benefits of the OPU come from the dimensionality of the data, the speed at which these computations are made, and the low power consumption. In the LightOn Appliance OPU, $x$ (binary) and $y$ (8-bit) scale up to dimension 1 million and 2 million, respectively, and independent computations can be made at 1.9 kHz, for a power consumption of 30 W. It thus reaches 1500 TeraOPS, or 50 TeraOPS/W.
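The transfer function above can be mimicked in a few lines of NumPy for intuition. This is a minimal sketch under stated assumptions: `SimulatedOPU` is a hypothetical class (not the LightOnML API), it draws and stores $R$ explicitly (precisely what the optical hardware avoids), it uses toy sizes, and the interferometric mode is idealized as an exact `R @ x`.

```python
import numpy as np

class SimulatedOPU:
    """Hypothetical NumPy stand-in for the OPU transfer function (not the LightOnML API).

    The fixed random matrix R is drawn once and reused, playing the role of the
    scattering medium. Unlike the hardware, it is stored explicitly in memory.
    """

    def __init__(self, n_in, n_out, seed=0):
        rng = np.random.default_rng(seed)
        # i.i.d. complex Gaussian entries with E|R_ij|^2 = 1
        self.R = (rng.standard_normal((n_out, n_in))
                  + 1j * rng.standard_normal((n_out, n_in))) / np.sqrt(2)

    def transform(self, x):
        """Intensity measurement with the built-in non-linearity: y = |R x|^2."""
        x = np.asarray(x)
        if not np.isin(x, (0, 1)).all():
            raise ValueError("OPU inputs must be binary (0/1)")
        return np.abs(self.R @ x) ** 2

    def linear_transform(self, x):
        """Idealized interferometric mode, non-linearity suppressed: y = R x."""
        return self.R @ np.asarray(x)


opu = SimulatedOPU(n_in=1000, n_out=2000)  # toy sizes; the Appliance goes up to 1M -> 2M
x = np.random.default_rng(1).integers(0, 2, size=1000)  # binary input vector
y = opu.transform(x)  # non-negative real vector of length 2000
```

The intensity output is simply the squared modulus of the linear output, which is why suppressing the non-linearity interferometrically gives direct access to $Rx$.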

Fig. 1: The hybrid data processing architecture, featuring LightOn’s OPU as an external co-processor.

The OPU operates in a "Non von Neumann" (NvN) regime: although the weights of the matrix $R$ are fixed by design, they are accessed instantly and at no energy cost. $R$ plays the role of a large read-only memory (terabytes equivalent) that can be used in matrix multiplications, literally at the speed of light and in a passive way. Speed limitations and power consumption arise from communication and formatting, D/A and A/D conversion, and laser power. In contrast to von Neumann architectures, where computing time and memory requirements scale with the data size $N$, i.e. $O(N^2)$ for an $N \times N$ matrix-vector multiplication, the computation time here is independent of the data size. At sufficiently large $N$, this NvN operation becomes faster, but more importantly it allows a direct single-chip implementation on larger signals without reaching RAM limits.
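To make the "terabytes equivalent" figure concrete, here is back-of-the-envelope arithmetic for storing $R$ digitally at the full Appliance size (1 million inputs, 2 million outputs). The assumption, labeled in the code as well, is that each complex entry would be stored as `complex64` (8 bytes); `dense_matrix_bytes` is a hypothetical helper.

```python
def dense_matrix_bytes(n_out, n_in, bytes_per_entry=8):
    """Bytes needed to store an n_out x n_in matrix explicitly.

    bytes_per_entry=8 is an assumption: complex64 storage (two float32 values per entry).
    """
    return n_out * n_in * bytes_per_entry


# Full-size LightOn Appliance projection: 1 million inputs -> 2 million outputs.
n_in, n_out = 1_000_000, 2_000_000
size_tb = dense_matrix_bytes(n_out, n_in) / 1e12
print(f"{size_tb:.1f} TB to store R explicitly")
```

Under this assumption the matrix would occupy about 16 TB, far beyond the memory of a single GPU, whereas the OPU never stores it at all.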

Hardware

The LightOn Appliance OPU is packaged as a 2U rack-mountable device, linked to its host server through an external PCIe Gen2 x4 link, as shown in Fig. 1. It contains a single compact photonic core, custom FPGA boards for data I/O, a laser, and a power supply. All components, including light modulators and detectors, are mass-produced for consumer markets.

Software

The software layer has been designed to offer a smooth experience to machine learning experts, without requiring any knowledge of photonics. The custom API library LightOnML, integrated in Python, provides pre-processing functions for different types of input data. This API is compatible with PyTorch and scikit-learn.
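Because the OPU input $x$ is binary, non-binary data has to be encoded before projection. One common scheme, shown here as an assumption rather than as what LightOnML actually does, is bit-plane encoding: an 8-bit vector is split into 8 binary vectors $b_k$ with $x = \sum_k 2^k b_k$, each plane is projected separately, and, in the linear (interferometric) mode only, the results are recombined as $Rx = \sum_k 2^k R b_k$. Both helpers below are hypothetical.

```python
import numpy as np

def to_bit_planes(x, n_bits=8):
    """Split non-negative integers into binary planes (hypothetical helper).

    Returns an array of shape (n_bits, len(x)) where row k holds bit k of each entry.
    """
    x = np.asarray(x, dtype=np.int64)
    if (x < 0).any() or (x >= 2 ** n_bits).any():
        raise ValueError(f"values must lie in [0, 2**{n_bits})")
    shifts = np.arange(n_bits)[:, None]
    return (x[None, :] >> shifts) & 1


def project_bit_planes(R, x, n_bits=8):
    """Recover R @ x from projections of binary planes, using linearity.

    Valid only for the linear mode y = R b, not for the intensity mode |R b|^2.
    """
    planes = to_bit_planes(x, n_bits)      # (n_bits, n_in), binary
    projections = planes @ R.T             # row k is R @ b_k, shape (n_bits, n_out)
    weights = 2.0 ** np.arange(n_bits)     # place value of each bit plane
    return weights @ projections           # sum_k 2^k R b_k


rng = np.random.default_rng(0)
R = (rng.standard_normal((64, 32)) + 1j * rng.standard_normal((64, 32))) / np.sqrt(2)
x = rng.integers(0, 256, size=32)          # e.g. 8-bit pixel values
y = project_bit_planes(R, x)               # equals R @ x up to floating-point error
```

The cost of this encoding is one projection per bit plane; with the intensity non-linearity the planes can no longer be recombined exactly, so in that mode each plane is typically treated as a separate feature.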

III Hybrid Computing Architectures

ML applications

Fig. 2 displays some neural network architectures that use the OPU in hybrid computing pipelines: natural language processing, change-point detection in multi-dimensional time series [7], molecular dynamics [2], event classification in particle physics, and graph neural networks [3], as well as more fundamental studies such as supervised random projections or kernel computations [14]. Interestingly, some properties stem from the analog nature of the OPU, such as increased robustness to adversarial attacks [1]. More details can be found on LightOn's blog [10] and in its public GitHub source code repository [13].

As an example of typical speedup, in a transfer learning experiment, using the OPU for a dense layer between convolutional features and ridge regression leads to speedups and energy savings compared to the same code on CPU/GPU only, with the same final accuracy. This example [12] can be run directly on the LightOn Cloud. Finally, let us emphasize the particular case of Direct Feedback Alignment [9], where the OPU random projections are used in the feedback loop as an alternative to back-propagation training. To our knowledge, this is the only optical training applied to large-scale (of the order of 1 million parameters) modern neural network architectures, including graph neural networks [8] and transformers.
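To illustrate the principle of Direct Feedback Alignment, here is a minimal NumPy sketch on a toy regression task. Assumptions: a single hidden tanh layer, no biases, a mean-squared-error loss, and a fixed digital Gaussian matrix `B` standing in for the optical random projection of the error; `train_dfa` is a hypothetical helper, not code from [8] or [9].

```python
import numpy as np

def train_dfa(X, Y, n_hidden=64, lr=0.05, epochs=200, seed=0):
    """Train a one-hidden-layer tanh network with Direct Feedback Alignment.

    The output error reaches the hidden layer through a FIXED random matrix B
    (the projection an OPU would compute) instead of through W2.T as in backprop.
    """
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], Y.shape[1]
    W1 = rng.standard_normal((n_in, n_hidden)) / np.sqrt(n_in)
    W2 = rng.standard_normal((n_hidden, n_out)) / np.sqrt(n_hidden)
    B = rng.standard_normal((n_out, n_hidden)) / np.sqrt(n_out)  # fixed, never trained
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1)          # hidden activations, (batch, n_hidden)
        out = H @ W2                 # linear output, (batch, n_out)
        E = out - Y                  # gradient of 0.5 * squared error w.r.t. out
        losses.append(0.5 * np.mean(np.sum(E ** 2, axis=1)))
        # DFA step: random projection of the error, times the tanh derivative
        dH = (E @ B) * (1.0 - H ** 2)
        W2 -= lr * H.T @ E / len(X)  # output layer uses its true local gradient
        W1 -= lr * X.T @ dH / len(X)
    return W1, W2, losses


rng = np.random.default_rng(42)
X = rng.standard_normal((256, 10))
Y = np.tanh(X @ rng.standard_normal((10, 3)))  # hypothetical non-linear target
W1, W2, losses = train_dfa(X, Y)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The only difference from back-propagation is the line computing `dH`: since `B` is fixed and does not depend on the forward weights, the error projection can be delegated to a passive random projection, with all hidden layers receiving their feedback in parallel.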

Fig. 2: Different neural network architectures taking advantage of LightOn's OPU, whose position in the hybrid processing pipeline is indicated by the "flare" logo. Other arrows indicate computations performed by CPU or GPU.

HPC applications: Accelerated Linear Algebra

Randomized Numerical Linear Algebra is a widely studied set of techniques for speeding up large computations in various HPC applications, such as inverse problems or finance. Here, we only discuss how the OPU technology offers an alternative view, and refer to the companion study [5] for details. At the simplest level, for a large random matrix $R$, one has $R^\dagger R \approx I$ (up to normalization). A matrix-vector product can then be approximated in the compressed domain:

$$Ax \approx (AR^\dagger)(Rx),$$

assuming that $R \in \mathbb{C}^{m \times n}$ is fat (wide), with $m \ll n$. With the OPU, the products $AR^\dagger$ (pre-computed once, assuming $A$ is fixed) and $Rx$ can be performed efficiently. Finally, one is left with computing $(AR^\dagger)(Rx)$ in the compressed domain. At sizes where the OPU random projection takes negligible time, approximate matrix-vector multiplication is performed with a speedup of $n/m$. Fig. 3 shows that optimized OPU pipelines provide approximate results close to full-precision randomization. The same principle has been applied to randomized SVD [11], which can serve as a basis for recommender systems. For large dense matrices, such methods may represent the only practical alternative.
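A minimal digital sketch of both ideas is given below, assuming a real Gaussian sketch with i.i.d. $\mathcal{N}(0, 1/m)$ entries (so that $\mathbb{E}[R^\top R] = I$) as a stand-in for the complex optical matrix, and assuming the linear mode $y = Rx$. `approx_matvec` is a hypothetical helper; `randomized_svd` is the generic range-finder scheme of Halko, Martinsson and Tropp, not necessarily the pipeline used in [11]. The usage deliberately picks rows of $A$ correlated with $x$: for unstructured data, the per-entry error of this estimator is of order $\|a_i\|\,\|x\|/\sqrt{m}$, which can dominate a small $|\langle a_i, x\rangle|$.

```python
import numpy as np

def approx_matvec(A, x, m, seed=0):
    """Approximate A @ x as (A R^T)(R x) with a wide m x n Gaussian sketch R, m << n."""
    n = A.shape[1]
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((m, n)) / np.sqrt(m)  # E[R^T R] = I_n
    AR = A @ R.T   # pre-computed once when A is fixed
    Rx = R @ x     # the random projection an OPU would perform optically
    return AR @ Rx # p*m multiply-adds instead of p*n


def randomized_svd(A, k, oversample=10, seed=0):
    """Rank-k randomized SVD: random range finder, then a small exact SVD."""
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((A.shape[1], k + oversample))
    Q, _ = np.linalg.qr(A @ Omega)  # orthonormal basis for (approximately) the range of A
    U_small, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return (Q @ U_small)[:, :k], s[:k], Vt[:k]


rng = np.random.default_rng(1)
n, m = 5000, 500                             # compression ratio n/m = 10
x = rng.standard_normal(n)
A = x + 0.1 * rng.standard_normal((50, n))   # 50 rows, each a noisy copy of x
exact = A @ x
approx = approx_matvec(A, x, m)
rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)

L = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 120))  # exactly rank 5
U, s, Vt = randomized_svd(L, k=5)
```

For an exactly low-rank matrix, the randomized SVD recovers it to floating-point accuracy, while the sketched matrix-vector product trades a controlled approximation error for the $n/m$ reduction in the final product.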

Fig. 3: Approximate matrix-vector multiplications (from [5]). Left: experimental verification of $R^\dagger R \approx I$. Right: approximation quality vs. compression ratio, comparing the baseline numerical approximation with different OPU schemes.

IV Conclusion: towards “optical advantage”

In many ML/HPC computing tasks, not all coefficients need to be updated. Free-space photonics is currently the most promising way to leverage the Non von Neumann principle at scale, with instantaneous and energy-passive access to trillion-size coefficient arrays. With LightOn's OPU, this technology is now mature and seamlessly integrated in standard computing pipelines, as a complement to standard programmable CPU/GPU chips. Here, we have demonstrated a few examples of hybrid computing. As data and models become larger and larger, the benefit of such technologies becomes clearer: we believe that, to scale up already massive language models such as GPT-3, this technology offers a unique pathway to “optical advantage”, i.e. the use of a “beyond pure silicon” technology in business-relevant computations that would otherwise require dedicated supercomputers.

References

  • [1] A. Cappelli et al. Adversarial robustness by design through analog computing and synthetic gradients. arXiv preprint 2101.02115.
  • [2] A. Chatelain et al. (2020). Online change point detection in molecular dynamics with optical random features. arXiv:2006.08697.
  • [3] H. Ghanem et al. (2021). Fast graph kernel with optical random features. In IEEE ICASSP.
  • [4] X. Guo et al. End-to-end optical backpropagation for training neural networks. arXiv:1912.12256.
  • [5] D. Hesslow et al. Photonic co-processors in HPC: using LightOn OPUs for randomized numerical linear algebra. In Hot Chips 33, 2021.
  • [6] T. W. Hughes et al. (2018). Training of photonic neural networks through in situ backpropagation and gradient measurement. Optica 5(7).
  • [7] N. Keriven et al. (2020). NEWMA: a new method for scalable model-free online change-point detection. IEEE Trans. Sig. Proc., vol. 68.
  • [8] J. Launay et al. Hardware beyond backpropagation: a photonic co-processor for direct feedback alignment. In NeurIPS workshops, 2020.
  • [9] J. Launay et al. Light-in-the-loop: using a photonics co-processor for scalable training of neural networks. In IEEE Hot Chips 32, 2020.
  • [10] LightOn blog website. https://www.lighton.ai/blog/
  • [11] LightOn documentation: recommender system using randomized SVD. https://docs.lighton.ai/examples/randomized_svd.html
  • [12] LightOn documentation: transfer learning. https://docs.lighton.ai/examples/transfer_learning.html
  • [13] LightOn public GitHub repository. https://github.com/lightonai/
  • [14] R. Ohana et al. Kernel computations from large-scale random features obtained by optical processing units. In ICASSP 2020.
  • [15] C. Ramey (2020). Silicon photonics for artificial intelligence acceleration. In IEEE Hot Chips 32.
  • [16] A. Saade et al. Random projections through multiple optical scattering: approximating kernels at the speed of light. In ICASSP 2016.