Optical Transformers

02/20/2023
by Maxwell G. Anderson, et al.

The rapidly increasing size of deep-learning models has caused renewed and growing interest in alternatives to digital computers to dramatically reduce the energy cost of running state-of-the-art neural networks. Optical matrix-vector multipliers are best suited to performing computations with very large operands, which suggests that large Transformer models could be a good target for optical computing. To test this idea, we performed small-scale optical experiments with a prototype accelerator to demonstrate that Transformer operations can run on optical hardware despite noise and errors. Using simulations, validated by our experiments, we then explored the energy efficiency of optical implementations of Transformers and identified scaling laws for model performance with respect to optical energy usage. We found that the optical energy per multiply-accumulate (MAC) scales as 1/d, where d is the Transformer width, an asymptotic advantage over digital systems. We conclude that with well-engineered, large-scale optical hardware, it may be possible to achieve a 100× energy-efficiency advantage for running some of the largest current Transformer models, and that if both the models and the optical hardware are scaled to the quadrillion-parameter regime, optical computers could have a >8,000× energy-efficiency advantage over state-of-the-art digital-electronic processors that achieve 300 fJ/MAC. We analyzed how these results motivate and inform the construction of future optical accelerators along with optics-amenable deep-learning approaches. With assumptions about future improvements to electronics and Transformer quantization techniques (5× cheaper memory access, double the digital–analog conversion efficiency, and 4-bit precision), we estimated that optical computers' advantage against current 300-fJ/MAC digital processors could grow to >100,000×.
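The 1/d scaling claimed above follows from a simple amortization argument: an optical matrix-vector multiplier needs a roughly fixed number of photons per output to reach a target detection SNR, and that photon budget is shared across the d multiply-accumulates that produce each output. The Python sketch below works through that arithmetic; the 1.55 µm wavelength and 1,000-photon budget are illustrative assumptions rather than the paper's values, and the toy model deliberately omits the DAC/ADC and memory-access overheads that the paper's full analysis accounts for.

```python
# Back-of-envelope sketch of the 1/d optical energy scaling.
# The concrete numbers (wavelength, photons per output) are assumptions
# chosen for illustration, not values taken from the paper.

PLANCK = 6.626e-34      # Planck constant, J*s
C = 3.0e8               # speed of light, m/s
WAVELENGTH = 1.55e-6    # telecom-band wavelength, m (assumption)
E_PHOTON = PLANCK * C / WAVELENGTH  # ~1.3e-19 J per photon

DIGITAL_J_PER_MAC = 300e-15  # 300 fJ/MAC digital baseline from the abstract


def optical_energy_per_mac(d: int, photons_per_output: float = 1_000) -> float:
    """Optical energy per MAC for one length-d dot product.

    The detector for each output needs a roughly fixed photon budget to hit
    a target SNR; that budget is amortized over the d multiply-accumulates
    feeding the output, which is where the 1/d scaling comes from.
    """
    return photons_per_output * E_PHOTON / d


for d in (1_000, 10_000, 100_000):
    e_opt = optical_energy_per_mac(d)
    print(f"d = {d:>7,}: ~{e_opt * 1e18:7.2f} aJ/MAC optical, "
          f"~{DIGITAL_J_PER_MAC / e_opt:10,.0f}x vs 300 fJ/MAC digital")
```

In this toy model the advantage grows without bound as d increases; in practice, fixed per-MAC electronic costs (data conversion, memory access) put a floor on total energy, which is why the abstract's projected gains range from roughly 100× to >100,000× depending on model width and hardware assumptions.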


