Deep physical neural networks enabled by a backpropagation algorithm for arbitrary physical systems

04/27/2021 ∙ by Logan G. Wright, et al. ∙ Cornell University

Deep neural networks have become a pervasive tool in science and engineering. However, modern deep neural networks' growing energy requirements now increasingly limit their scaling and broader use. We propose a radical alternative for implementing deep neural network models: Physical Neural Networks. We introduce a hybrid physical-digital algorithm called Physics-Aware Training to efficiently train sequences of controllable physical systems to act as deep neural networks. This method automatically trains the functionality of any sequence of real physical systems, directly, using backpropagation, the same technique used for modern deep neural networks. To illustrate their generality, we demonstrate physical neural networks with three diverse physical systems-optical, mechanical, and electrical. Physical neural networks may facilitate unconventional machine learning hardware that is orders of magnitude faster and more energy efficient than conventional electronic processors.





I Introduction

Like many historical developments in artificial intelligence brooks1991intelligence; hooker2020hardware, the widespread adoption of deep neural networks (DNNs) was enabled in part by synergistic hardware. In 2012, building on numerous earlier works, Krizhevsky et al. showed that the backpropagation algorithm for stochastic gradient descent (SGD) could be efficiently executed with graphics-processing units to train large convolutional DNNs krizhevsky2012imagenet to perform accurate image classification. Since 2012, the breadth of applications of DNNs has expanded, but so too has their typical size. As a result, the computational requirements of DNN models have grown rapidly, outpacing Moore's Law schwartz2019green; patterson2021carbon. Now, DNNs are increasingly limited by hardware energy efficiency.

The emerging DNN energy problem has inspired special-purpose hardware: DNN ‘accelerators’ reuther2020survey; markovic2020physics. Several proposals push beyond conventional electronics with alternative physical platforms markovic2020physics, such as optics wetzstein2020inference; shen2017deep; hamerly2019large; shastri2021photonics or memristor crossbar arrays prezioso2015training; hu2014memristor. These devices rely on approximate analogies between the hardware physics and the mathematical operations in DNNs (Fig. 1A). Consequently, their success will depend on intensive engineering to push device performance toward the limits of the hardware physics, while carefully suppressing parts of the physics that violate the analogy, such as unintended nonlinearities, noise processes, and device variations.

More generally, however, the controlled evolutions of physical systems are well suited to realizing deep learning models. DNNs and physical processes share numerous structural similarities, such as hierarchy, approximate symmetries, redundancy, and nonlinearity. These structural commonalities explain much of DNNs' success in operating robustly on data from the natural, physical world. As physical systems evolve, they perform, in effect, the mathematical operations within DNNs: controlled convolutions, nonlinearities, matrix-vector operations, and so on. We can harness these physical computations by encoding input data into the initial conditions of the physical system, then reading out the results by performing measurements after the system evolves (Fig. 1C). Physical computations can be controlled by adjusting physical parameters. By cascading such controlled physical input-output transformations, we can realize trainable, hierarchical physical computations: deep physical neural networks (PNNs, Fig. 1D). As anyone who has simulated the evolution of complex physical systems appreciates, physical transformations are typically faster and consume less energy than their digital emulations: processes that take nanoseconds and nanojoules frequently require seconds and joules to simulate digitally. PNNs are therefore a route to scalable, energy-efficient, and high-speed machine learning.


Figure 1: Introduction to physical neural networks. A. Conventional artificial neural networks (ANNs) are built from an operational unit (the layer) that involves a trainable matrix-vector multiplication followed by an element-wise nonlinear activation, such as the rectifier (ReLU). The weights of the matrix-vector multiplication are adjusted during training so that the ANN implements a desired mathematical operation. B. By cascading these operational units together, one creates a deep neural network (DNN), which can learn a multi-step (hierarchical) computation. C. When physical systems evolve, they perform computations. We can partition the controllable properties of the system into input data and control parameters. By changing control parameters, we can alter the physical transformation performed on the input data. In this paper, we consider three examples. In a mechanical system, input data and parameters are encoded into the time-dependent force applied to a suspended metal plate. The physical computation results in an input- and parameter-dependent multimode oscillation pattern, which is read out by a microphone. In a nonlinear optical system, a pulse passes through a nonlinear medium, producing a frequency-doubled output. Input data and parameters are encoded in the amplitudes of frequency components of the pulse, and outputs are obtained from the frequency-doubled pulse's spectrum. In an electronic system, analog signals encode input data and parameters, which interact in nonlinear electronics to produce an output signal. D. Just as hierarchical information processing is realized in DNNs by a sequence of trainable nonlinear functions, we can construct deep physical neural networks (PNNs) by cascading layers of trainable physical transformations. In these deep PNNs, each physical layer implements a controllable function, which can be of a more general form than that of a conventional DNN layer.

Theoretical proposals for physical learning hardware have recently emerged in various fields, such as optics hughes2019wave; Wu2020neuromorphicmetasurface; furuhata2021physical; nakajima2020neural, spintronic nano-oscillators Romera2018spintronicPNNvowel, nanoelectronic devices Chen2020geneticalgorithmSi; euler2020deep, and small-scale quantum computers Mitarai2018quantumcircuitlearning; Benedetti2019parameterizedQCL; havlivcek2019supervised; fosel2021quantum. A related trend is physical reservoir computing Tanaka2019reservoircomputingreview; Nakajima2020reservoircomputingreview, in which the information transformations of a physical system ‘reservoir’ are not trained but are instead linearly combined by a trainable output layer. Reservoir computing harnesses generic physical processes for computation, but its training is inherently shallow: it does not allow the hierarchical process learning that characterizes modern deep neural networks. In contrast, the newer proposals for physical learning hardware hughes2019wave; Wu2020neuromorphicmetasurface; Mitarai2018quantumcircuitlearning; Benedetti2019parameterizedQCL; Chen2020geneticalgorithmSi; euler2020deep; Romera2018spintronicPNNvowel; stern2020supervised; furuhata2021physical; nakajima2020neural overcome this by training the physical transformations themselves.

There have been few experimental studies on physical learning hardware, however, and those that exist have relied on gradient-free learning algorithms Chen2020geneticalgorithmSi; Romera2018spintronicPNNvowel; bueno2018reinforcement. While these works have made critical steps, it is now appreciated that gradient-based learning algorithms, such as the backpropagation algorithm, are essential for the efficient training and good generalization of large-scale DNNs poggio2020theoretical. To solve this problem, proposals to realize backpropagation on physical hardware have appeared hughes2019wave; lin2018all; Wu2020neuromorphicmetasurface; euler2020deep; furuhata2021physical; hermans2015trainable; hughes2018training; lopez2021self. While inspirational, these proposals nonetheless often rely on restrictive assumptions, such as linearity or dissipation-free evolution. The most general proposals hughes2019wave; Wu2020neuromorphicmetasurface; furuhata2021physical; euler2020deep may overcome such constraints, but still rely on performing training in silico, i.e., wholly within numerical simulations. Thus, to be realized experimentally, and in scalable hardware, they will face the same challenges as hardware based on mathematical analogies: intense engineering efforts to force hardware to precisely match idealized simulations.

Here we demonstrate a universal framework to directly train arbitrary, real physical systems to execute deep neural networks, using backpropagation. We call these trained hierarchical physical computations physical neural networks (PNNs). Our approach is enabled by a hybrid physical-digital algorithm, physics-aware training (PAT). PAT allows us to efficiently and accurately execute the backpropagation algorithm on any sequence of physical input-output transformations, directly in situ. We demonstrate the universality of this approach by experimentally performing image classification using three distinct systems: the multimode mechanical oscillations of a driven metal plate, the analog dynamics of a nonlinear transistor oscillator, and ultrafast optical second-harmonic generation. We obtain accurate hierarchical classifiers, which utilize each system's unique physical transformations and inherently mitigate each system's unique noise processes and imperfections. While PNNs are a radical departure from traditional hardware, they are easily integrated into modern machine learning. We show that PNNs can be seamlessly combined with conventional hardware and neural-network methods via physical-digital hybrid architectures, in which conventional hardware learns to opportunistically cooperate with unconventional physical resources using PAT. Ultimately, PNNs provide a basis for hardware-physics-software codesign in artificial intelligence markovic2020physics, routes to improving the energy efficiency and speed of machine learning by many orders of magnitude, and pathways to automatically designing complex functional devices, such as functional nanoparticles peurifoy2018nanophotonic, robots veit2016residual, and smart sensors Zhou2020smartsensor; Martel2020smartsensor; Mennel2020smartsensor; Duarte2008singlepixelimaging.

II Results

Figure 2 shows an example physical neural network based on the propagation of broadband optical pulses in quadratic nonlinear optical media, i.e., ultrafast second harmonic generation (SHG). (Technically speaking, propagation of intense broadband pulses in quadratic nonlinear optical media gives rise not only to second-harmonic generation, but also to sum-frequency and difference-frequency processes among the multitude of optical frequencies; for simplicity, we refer to these complex nonlinear optical dynamics as SHG.) The ultrafast SHG process realizes a physical computation: a nonlinear transformation of the input near-infrared pulse's spectrum (~800 nm center wavelength) into a new optical spectrum in the ultraviolet (~400 nm center wavelength). To use and control this physical computation, input data and parameters are encoded into sections of the near-infrared pulse's spectrum by modulating its frequency components with a pulse shaper (Fig. 2A). The modulated pulse then propagates through a nonlinear optical crystal, producing a broadband ultraviolet pulse, whose spectrum is measured to read out the computation result.

A task routinely used to demonstrate novel machine learning hardware is the classification of spoken vowels according to their formant frequencies shen2017deep; Romera2018spintronicPNNvowel; voweldata. In this task, the machine learning device must learn to predict spoken vowels from 12-dimensional input data vectors of formant frequencies extracted from audio recordings.

To realize vowel classification with SHG, we construct a multi-layer SHG-PNN (Fig. 2B) in which the input data for the first physical layer (i.e., the first physical transformation) is the vowel formant-frequency vector. At the final layer, the SHG-PNN's predicted vowel is read out by taking the maximum value of 7 spectral bins in the measured ultraviolet spectrum (Fig. 2B). In each layer, the output spectrum is digitally renormalized, along with a trainable digital rescaling, before being passed to the next layer. Thus, the classification is achieved almost entirely by the trained nonlinear optical transformations. The results of Fig. 2 show that the SHG-PNN adapts the physical transformation of ultrafast SHG into a hierarchical physical computation, which consists of five sequential, trained nonlinear transformations that cooperate to produce a final optical spectrum that accurately predicts vowels (Fig. 2B,D).
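The layer chaining described above can be sketched in a few lines. Everything physical here is a hypothetical stand-in: `shg` is a toy stub in place of the measured frequency-doubled spectrum, while the renormalization, trainable rescaling, and 7-bin argmax readout follow the description in the text.

```python
def shg(x, theta):
    # Hypothetical stand-in for the physical SHG layer: in the experiment,
    # x and theta are encoded in the pulse spectrum and the output is the
    # measured frequency-doubled spectrum.
    return [abs(xi * ti) ** 0.5 for xi, ti in zip(x, theta)]

def shg_pnn_predict(x, thetas, scales, n_classes=7):
    """Cascade of trained physical layers, with digital renormalization and
    a trainable rescaling between layers, then a 7-bin argmax readout."""
    for theta, a in zip(thetas, scales):
        y = shg(x, theta)                     # physical layer
        peak = max(y) or 1.0                  # guard against an all-zero spectrum
        x = [a * yi / peak for yi in y]       # renormalize + trainable rescale
    k = len(x) // n_classes                   # sum spectrum into 7 contiguous bins
    bins = [sum(x[i * k:(i + 1) * k]) for i in range(n_classes)]
    return bins.index(max(bins))              # largest-sum bin = predicted vowel
```

In the experiment, only the per-layer rescaling factors are digital; the nonlinear transformation itself is performed optically.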


Figure 2: An example physical neural network, implemented experimentally using broadband optical second-harmonic generation (SHG) with shaped femtosecond pulses. A. Inputs to the system are encoded into the spectrum of a laser pulse using a pulse shaper, based on a diffraction grating and a digital micromirror device (see Supplementary Section 2). To control the physical transformations implemented by the broadband nonlinear interactions, we assign a portion of the pulse's spectrum for trainable parameters (yellow). The result of the physical computation is obtained from the spectrum of the frequency-doubled (blue, ~390 nm) output of the SHG process. B. To construct a deep PNN with femtosecond SHG, we take the output of one SHG process and use it as the input to a second SHG process, which has its own independent trainable parameters. This cascading is repeated three more times, resulting in a multilayer PNN with five trainable physical layers (~250 physical parameters and 10 digital parameters). For the final layer, the spectrum is summed into 7 spectral bins, and the largest-sum bin gives the predicted vowel class. C-D. When this SHG-PNN is trained using PAT (see main text, Fig. 3), it classifies vowels with 93% accuracy. C. The confusion matrix for the PNN on a test data set, showing the labels predicted by the SHG-PNN versus the true (correct) labels. D. Representative examples of final-layer output for input vowels correctly classified as "ah" and "iy".

To train PNNs' parameters, we use physics-aware training (PAT, Fig. 3), an algorithm that allows us to effectively perform the backpropagation algorithm for stochastic gradient descent (SGD) directly on any sequence of physical input-output transformations, such as the sequence of nonlinear optical transformations in Fig. 2. In the backpropagation algorithm, automatic differentiation efficiently determines the gradient of a loss function with respect to trainable parameters, making the algorithm N-times more efficient than finite-difference methods for gradient estimation (where N is the number of parameters). PAT has some similarities to quantization-aware training algorithms used to train neural networks for low-precision hardware hubara2017quantized, and to feedback alignment lillicrap2016random. PAT can be seen as solving a problem analogous to the 'simulation-reality gap' in robotics jakobi1995noise, which is increasingly addressed by hybrid physical-digital techniques howison2021reality.


Figure 3: Introduction to Physics-Aware Training. A. Physics-aware training (PAT) is a hybrid physical-digital algorithm that implements backpropagation to train the controllable parameters of a sequence of dynamical evolutions of physical systems, in situ. PAT’s goal is to train the physical systems to perform machine-learning tasks accurately, even though digital models describing physics are always imperfect, and the systems may have noise and other imperfections. This motivates the hybrid nature of the algorithm, where rather than performing the training solely with the digital model (i.e., in silico), the physical systems are directly used to compute forward passes, which keeps the training on track. For simplicity only one physical layer is depicted in A and we assume a mean-squared error loss function. As with conventional backpropagation, the algorithm generalizes straightforwardly to multiple layers and different loss functions, see Supplementary Section 1. First (1), training input data (e.g., an image) is input to the physical system, along with parameters. Second (2), in the forward pass, the physical system applies its transformation to produce an output. Third (3), the physical output is compared to the intended output (e.g., for an image of an ‘8’, a predicted label of 8) to compute the error. Fourth (4), using a differentiable digital model to estimate the gradients of the physical system(s), the gradient of the loss is computed with respect to the controllable parameters. Finally, (5) the parameters are updated according to the inferred gradient. This process is repeated, iterating over training examples, to reduce the error. B.

Comparison of the validation accuracy versus training epoch for PAT and in silico training (i.e., training in which the digital model is used for both forward propagation and gradient estimation), for the experimental SHG-PNN depicted in Fig. 2B (5 physical layers). C. Final experimental test accuracy for PAT and in silico training for SHG-PNNs with increasing numbers of physical layers. Due to the compounding of simulation-reality mismatch error through training, in silico training results in accuracy barely distinguishable from random guessing, while PAT permits training the physical device to 93% accuracy. As physical layers are added, in silico training fails, but PAT continues to produce accurate physical neural networks. Due to the relative simplicity of the task, and the design of the SHG-PNN (which was chosen to be intuitive and to involve unconventional physics, at the expense of information-processing capacity), the PNN's performance improvement saturates for depths greater than 5 layers.

PAT's essential advantage stems from the forward pass being executed by the actual physical hardware, rather than by a simulation. Given an accurate model, one could attempt to train by autodifferentiating simulations, then transferring the final parameters to the physical hardware (termed in silico training in Fig. 3). Our data-driven digital model for SHG is remarkably accurate (Supplementary Figure 20) and even includes an accurate noise model (Supplementary Figures 18-19). However, as evidenced by Fig. 3B, in silico training still fails, reaching a maximum vowel-classification accuracy of 40%. In contrast, PAT succeeds, accurately training the SHG-PNN, even when additional layers are added (Fig. 3B-C).

An intuitive motivation for why PAT works is that the training’s optimization of parameters is always grounded in the true optimization landscape by the physical forward pass. With PAT, even if gradients are estimated only approximately, the true loss function is always precisely known. Moreover, the true output from each intermediate layer is also known, so gradients of intermediate physical layers are always computed with respect to correct inputs. In contrast, in any form of in silico training, compounding errors build up through the simulation of each physical layer, leading to a rapidly diverging simulation-reality gap as training proceeds (see Supplementary Section 1 for details). As a secondary benefit, PAT ensures learned models are inherently resilient to noise and other imperfections beyond a digital model, since the change of loss along noisy directions in parameter space will tend to average to zero. This facilitates the learning of noise-resilient (more speculatively, noise-enhanced) models that automatically adapt to device-device variations and brain-like stochastic dynamics markovic2020physics.
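The per-example loop sketched in Fig. 3A can be illustrated with a deliberately simple toy. Everything below is a hypothetical stand-in, not the paper's apparatus: a one-parameter "physical system" that both differs from its digital model and is noisy; the forward pass and the error use the "physical" output, while the gradient is estimated through the model.

```python
import math
import random

def f_phys(theta, x):
    # Toy "physical system": differs from the digital model (extra quadratic
    # term) and is noisy, mimicking a simulation-reality gap.
    return math.tanh(theta * x) + 0.05 * theta ** 2 + random.gauss(0.0, 0.01)

def f_model_grad(theta, x):
    # Derivative of the (imperfect) digital model y_model = tanh(theta * x).
    return x * (1.0 - math.tanh(theta * x) ** 2)

random.seed(0)
theta, lr, x, target = 0.1, 0.5, 1.0, 0.6
for step in range(300):
    y = f_phys(theta, x)                 # (1-2) forward pass on the "physics"
    err = y - target                     # (3) error from the measured output
    grad = err * f_model_grad(theta, x)  # (4) gradient estimated via the model
    theta -= lr * grad                   # (5) parameter update
```

Despite the model mismatch and noise, the physical forward pass keeps the loss grounded in reality, so theta settles where the physical (not the modeled) output matches the target.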

PNNs can learn to accurately perform more complex tasks, and can be realized with virtually any physical system. In Figure 4, we present three PNN classifiers for the MNIST handwritten-digit classification task, based on three distinct physical systems. For each physical system, we also demonstrate a different PNN architecture, illustrating the wide variety of possible PNN models. In all cases, models were constructed and trained in PyTorch paszke2017automatic, where each trainable physical transformation called an automated experimental apparatus with a vector of input data, x, and parameters, theta (y = run_exp(x, theta)), which is made differentiable by using the digital model for backward operations (see Supplementary Section 1).
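This split — physical forward pass, model-based backward pass — is naturally expressed as a custom autograd function. The pure-Python sketch below mirrors the structure of a PyTorch torch.autograd.Function under toy assumptions: run_exp is a hypothetical stub standing in for the automated apparatus named in the text, and model is a deliberately imperfect digital twin.

```python
import math

def run_exp(x, theta):
    # Hypothetical stub for the automated experimental apparatus; a real run
    # would drive hardware and return a measurement.
    return [math.tanh(xi + ti) + 0.03 * ti * ti for xi, ti in zip(x, theta)]

def model(x, theta):
    # Imperfect but differentiable digital model of the same apparatus.
    return [math.tanh(xi + ti) for xi, ti in zip(x, theta)]

class PhysicalLayer:
    """Mirrors a torch.autograd.Function: forward runs the experiment,
    backward takes vector-Jacobian products through the digital model."""

    def forward(self, x, theta):
        self.x, self.theta = x, theta     # save inputs for the backward pass
        return run_exp(x, theta)          # real physics does the forward pass

    def backward(self, grad_out):
        # For this elementwise model, d(model_i)/dx_i = d(model_i)/dtheta_i
        # = 1 - tanh(x_i + theta_i)^2, evaluated at the inputs the physical
        # system actually saw.
        d = [1.0 - math.tanh(xi + ti) ** 2
             for xi, ti in zip(self.x, self.theta)]
        grad_x = [g * di for g, di in zip(grad_out, d)]
        grad_theta = [g * di for g, di in zip(grad_out, d)]
        return grad_x, grad_theta
```

In actual PyTorch code, forward would call the experiment inside a torch.autograd.Function, and backward would replay the saved inputs through the differentiable digital model.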

In the mechanical PNN (Fig. 4A-D), a metal plate is driven by a time-varying force, which encodes both input data and trainable parameters. We find that the plate's multimode oscillation enacts a controllable convolution on input data (Supplementary Figures 16-17). Using the oscillating plate's trainable transformation three times in sequence, we classify 28 by 28 (784-pixel) images, input as unrolled, 784-dimensional time series. To control the physical transformations, we digitally train an element-wise rescaling of the time series of forces sent to the plate (i.e., x → a ⊙ x + b, where a and b are trainable and x is the input image or an output from a previous layer). The PNN achieves 87% test accuracy, close to the optimal performance of a linear classifier lecun1998gradient. To ensure that the digital rescaling operations are not responsible for a significant part of the physical computation, we repeat the experiment with the physical transformation replaced by an identity operation. When this model is trained, its optimal performance is comparable to random guessing (10%). This shows that most of the PNN's functionality comes from the controlled physical transformations.
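The three-layer mechanical pipeline can be sketched as follows, with hypothetical stand-ins throughout: plate mimics the plate's convolution-like response with a toy periodic smoothing kernel, and the trainable element-wise scale-and-offset encodes each layer's input into the drive signal.

```python
def plate(drive):
    # Toy stand-in for the plate's multimode response, which the paper finds
    # acts like a controllable convolution; here, a periodic 3-tap smoothing.
    k = [0.25, 0.5, 0.25]
    n = len(drive)
    return [sum(k[j] * drive[(i + j - 1) % n] for j in range(3))
            for i in range(n)]

def mechanical_pnn(image, params):
    """image: 28x28 pixels unrolled into one long time series.
    params: per-layer trainable element-wise scales a and offsets b."""
    x = image
    for a, b in params:                                   # three physical layers
        drive = [ai * xi + bi for ai, xi, bi in zip(a, x, b)]  # encode into force
        x = plate(drive)                                  # physical transformation
    return x
```

A readout (e.g., binning or a trained classifier head) would then map the final time series to a digit label.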

An analog-electronic PNN is implemented with a circuit featuring a transistor (Fig. 4E-H), which results in a noisy, highly-nonlinear transient response (Supplementary Figures 12-13) that is more challenging to accurately model than the oscillating plate’s response (Supplementary Figure 23). Inputs and parameters are realized through the time-varying voltage applied across the circuit and the output is taken as the voltage time-series measured across the inductor and resistor (Fig. 4F). The electronic PNN architecture is similar to the mechanical PNN’s, except that the final prediction is obtained by averaging the predictions of 7 independent, 3-layer PNNs. Since it is nonlinear, the electronic PNN outperforms the mechanical PNN, achieving 93% test accuracy.

Finally, using broadband SHG, we demonstrate a physical-digital hybrid PNN (Fig. 4I-L). The hybrid architecture performs an inference as follows: first, the MNIST image is acted on by a trainable linear input layer, which produces the input to the SHG transformation. The full PNN architecture involves 4 separate channels, each with 2 physical layers, whose outputs are concatenated to produce the PNN's prediction. The inclusion of the trainable SHG transformations boosts the performance of the digital operations from roughly 90% accuracy to 97%. Because the task's difficulty scales nonlinearly with accuracy, such an improvement typically requires increasing the digital operation count by at least an order of magnitude lecun1998gradient. This illustrates how a hybrid physical-digital PNN can automatically learn to offload portions of a computation from an expensive digital processor to a fast, energy-efficient physical co-processor.
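The hybrid inference path can be sketched as below; all functions are hypothetical stand-ins (in the experiment, the shg calls are physical SHG layers and the linear layer is trained digitally).

```python
def matvec(W, x):
    # Trainable digital linear input layer.
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

def shg(x, theta):
    # Hypothetical stand-in for the physical SHG transformation.
    return [abs(xi * ti) ** 0.5 for xi, ti in zip(x, theta)]

def hybrid_pnn(image, channels):
    """channels: per-channel (W, theta1, theta2). Each channel is a digital
    linear layer followed by 2 physical SHG layers; the channel outputs are
    concatenated into the prediction vector."""
    feats = []
    for W, theta1, theta2 in channels:
        z = matvec(W, image)   # trainable digital part
        z = shg(z, theta1)     # physical layer 1
        z = shg(z, theta2)     # physical layer 2
        feats.extend(z)
    return feats
```

During PAT, gradients flow through the digital model of shg back into the linear layer's weights, so the digital and physical parts are trained jointly.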


Figure 4: Image classification with diverse physical systems. We trained PNNs based on three distinct physical systems (mechanical, electronic, and optical) to classify images of handwritten digits. A. A photo and sketch of the mechanical PNN, wherein the multimode oscillations of a metal plate are driven by time-dependent forces that encode the input image data and parameters. B. A depiction of the mechanical PNN multi-layer architecture. To efficiently encode parameters and input data, we make use of digital element-wise rescaling of each input by trainable scaling factors and offsets. C. The validation classification accuracy versus training epoch for the mechanical PNN trained using PAT. Because some digital operations are used in the PNN, we also plot the same curve for a reference model where the physical transformations implemented by the speaker are replaced by identity operations. The PNN with physical transformations reaches nearly 90% accuracy, whereas the digital-only baseline experiment does not exceed random guessing (10%). D. Confusion matrix showing the classified digit label predicted by the mechanical PNN versus the correct result. E-H. The same as A-D, but for a nonlinear electrical PNN based on a transistor embedded in an RLC oscillator. I-L. The same as A-D, for a hybrid physical-digital PNN combining digital linear input layers (trainable matrix-vector multiplications) followed by trainable physical transformations using broadband optical second-harmonic generation. Final test accuracy is 87%, 93%, and 97% for the mechanical, electrical, and optics-based PNNs, respectively.

III Discussion and Conclusion

While the above results demonstrate that a variety of physical systems can be trained to perform machine learning, these proof-of-concept experiments leave open practical questions, such as: What physical systems are good candidates for physical neural networks (PNNs), and how much can they speed up or reduce the energy consumption of machine learning? One naive approach is to analyze the costs of digitally simulating programmable physical systems relative to evolving them physically, as is done in recent work demonstrating quantum computational advantage arute2019quantum; zhong2020quantum. Of course, such self-simulation advantages will overestimate PNN advantages for practical tasks, but simulation analysis can still be insightful: it allows us to upper-bound and evaluate how potential advantages scale with physical parameters, to appreciate the mathematical operations that each physical system effectively approximates, and therefore to identify the practical tasks each system may be best suited for. We find that typical self-simulation advantages grow as the connectivity of the PNN's physical degrees of freedom increases, as the dimensionality of the physics increases, and as the nonlinearity of interactions increases (Supplementary Section 3). Realistic device implementations routinely achieve large self-simulation advantages in speed and energy, with much larger values possible by size-scaling. Simulation analysis reveals that candidate PNN systems approximate standard machine-learning operations like controllable matrix-vector multiplications, as in multimode wave propagation saade2016random, coupled spintronic oscillators Romera2018spintronicPNNvowel, or electrical networks, as well as less-common operations, like the nonlinear convolutions in ultrafast optical SHG and the higher-order tensor contractions implemented by (multimode) nonlinear wave propagation in Kerr media.


Reaching the performance ceiling implied by self-simulation for tasks of practical interest should be possible, but will require co-design of physics and algorithms markovic2020physics. On one hand, many candidate PNN systems, such as multimode optical waves and networks of coupled transistors, lasers, or nano-oscillators, exhibit dynamical evolutions that closely resemble standard neural networks and neural ordinary differential equations chen2018neural (see Supplementary Section 3). For these systems, training PNNs with physics-aware training (PAT) will facilitate the learning of hierarchical physical computations of similar form to those in conventional deep neural networks, while overcoming the imperfect simulations, device variations, and higher-order physical processes that would pose an impenetrable challenge for approaches based on in silico training or approximate mathematical analogies. Since their controlled dynamics are well known to resemble conventional neural networks, PNNs based on these systems will likely be able to translate their self-simulation advantages into comparable speed-ups and energy-efficiency gains on problems of practical interest. As our SHG-based machine learning shows, however, PNNs need not be limited to physical operations that resemble today's popular machine-learning algorithms. Since they have not yet been widely studied in machine-learning models, it is difficult to estimate how more exotic physical operations, like nonlinear-optics-based high-order tensor contractions, will translate to practical computations. Nonetheless, if we can learn to harness them productively, PNNs incorporating these novel operations may realize even more significant computational advantages. In addition, there are clear technical strategies to design PNNs to maximize their practical benefits. Even though the inference stage may account for as much as 90% of the energy consumption of DNNs in commercial deployments jassy2019aws; patterson2021carbon, it will also be valuable to improve PAT to facilitate physics-enhanced or physics-based learning hughes2018training; lopez2021self. Meanwhile, general design techniques like multidimensional layouts, in-place stored parameters, and physical data feedforward will permit scaling PNN complexity with minimal added energy cost (for a full exposition, see Supplementary Section 4).

This work has focused so far on PNNs specifically as accelerators for machine learning, but PNNs are promising for other applications as well. PNNs can perform computations on data within the data's own physical domain, allowing for smart sensors Zhou2020smartsensor; Martel2020smartsensor; Mennel2020smartsensor; Duarte2008singlepixelimaging that pre-process information prior to conversion to the electronic domain (e.g., a low-power, microphone-coupled circuit tuned to recognize specific hotwords). Since the achievable sensitivity and resolution of many sensors are limited by digital electronics, PNN sensors should have advantages over conventional approaches. More broadly, with PAT, one is simply training the complex functionality of physical systems. While machine learning and sensing are important functionalities, they are but two of many peurifoy2018nanophotonic; howison2021reality to which PAT, and the concept of PNNs, could be applied.


L.G.W., T.O. and P.L.M. conceived the project and methods. T.O. and L.G.W. performed the SHG-PNN experiments. L.G.W. performed the electronic-PNN experiments. M.M.S. performed the oscillating-plate-PNN experiments. T.W., D.T.S., and Z.H. contributed to initial parts of the work. L.G.W., T.O., M.M.S. and P.L.M. wrote the manuscript. P.L.M. supervised the project. L.G.W. and T.O. contributed equally to the work overall. The authors wish to thank NTT Research for their financial and technical support. Portions of this work were supported by the National Science Foundation (award CCF-1918549). L.G.W. and T.W. acknowledge support from Mong Fellowships from Cornell Neurotech during early parts of this work. P.L.M. acknowledges membership of the CIFAR Quantum Information Science Program as an Azrieli Global Scholar. We acknowledge helpful discussions with Danyal Ahsanullah, Vladimir Kremenetski, Edwin Ng, Sebastien Popoff, Sridhar Prabhu, Hidenori Tanaka and Ryotatsu Yanagimoto, and members of the NTT PHI Lab / NSF Expeditions research collaboration, and thank Philipp Jordan for discussions and illustrations.

Competing interests: L.G.W., T.O., M.M.S. and P.L.M. are listed as inventors on a U.S. provisional patent application (No. 63/178,318) on physical neural networks and physics-aware training.

Data and code availability

An expandable demonstration code for applying physics-aware training to train physical neural networks is available at: All data generated and code used for this work is available at