## Introduction

Over the last decade, artificial intelligence and artificial neural networks have become extremely valuable tools in industry, research, and everyday life. Deep neural networks excel in many tasks, including image and speech recognition, language processing, and autonomous driving [LeCun_DeepLearning]. A substantial part of this success can be attributed to the development of effective teaching algorithms, in particular the widely applied backpropagation method [Rumelhart_Backpropagation]. While the effectiveness of existing solutions is unquestionable, it is commonly believed that further progress, in particular in edge computing applications, can only be sustained if software simulations of neural networks are replaced by neuromorphic systems, where the neural structure of the network is implemented in hardware [Huang_ReviewNeuromorphic, Misra_ANNSurvey, Grollier_review]. This is dictated by the need for systems with high speed and high energy efficiency, which is difficult to achieve in the von Neumann computer architecture, where huge amounts of data are transmitted back and forth between memory and computing units.

In recent years, neuromorphic computing has been realized in many systems, including CMOS electronics, memristors, and photonic systems [Merolla, Loihi, furber2014spinnaker, benjamin2014neurogrid, prucnal2017neuromorphic, Wetzstein_review, Grollier_review, Shastri_review, Feldmann_AllOpticalSpikingNetwork, tait2017neuromorphic, Vandoorne, Huang_ReviewNeuromorphic, Lin, antonik2019large, Soljacic_DeepLearning]. In particular, recent realizations using exciton-polaritons in optical microcavities achieved state-of-the-art accuracy in the MNIST handwritten digit recognition benchmark [Opala_NeuromorphicComputing, Ballarini_Neuromorphic, Mirek_Neuromorphic]. Exciton-polaritons are composite quasiparticles that result from strong quantum coupling of semiconductor excitons and cavity photons [Kavokin_Book, Carusotto_QuantumFluids]. They are characterized by efficient transport via the photonic component and strong interparticle interactions due to the matter component. These properties make them promising candidates for future applications in efficient information processing [Opala_NeuromorphicComputing, Baumberg_SubfemtojouleSwitches, Bramati_SpinSwitches, Sanvitto_TwoFluid, Sanvitto_Transistor, Savvidis_TransistorSwitch, Lagoudakis_RTOrganicTransistor, Baranikov_AllOptical, Liew_Neurons, Espinosa_perceptrons].

In most hardware neural network implementations, the tunability of individual neurons is limited, which makes the implementation of efficient teaching algorithms such as backpropagation difficult. One often designs the system according to the reservoir computing paradigm [Jaeger_HarnessingESN, Maass_RealTimeComputing, Lukosevicius_RCapproachestoRNN], in which the majority of synaptic weights are static and unchanged during the teaching phase, while only the synaptic connections in the last layer, implemented in software, are adjusted. Despite its simplicity and limitations, this approach has been successful in various machine learning tasks, including speech recognition, time series prediction, and image recognition [Vandoorne, Brunner_ParallelPhotonicIPGigabyte, Torrejon, Du_MemristorRC, Tanaka]. Notably, reservoir networks achieved very high efficiency in photonic systems in terms of the speed of data processing [Larger_HiSpeedReservoirComputing]. On the other hand, due to the limited synaptic tunability, a large number of reservoir nodes is usually required to achieve high levels of accuracy, and even large networks cannot match the accuracy of software simulations in many machine learning tasks.

Here, we take a different approach that allows the backpropagation algorithm to be used to teach neural networks in which the nonlinear hardware nodes are non-tunable. We demonstrate a system that includes exciton-polariton nodes that exhibit a strong nonlinear input-output dependence that can be measured precisely. We show that such a precise characterisation of the nodes can be used to improve the teaching. The idea is to physically separate the tunable linear weights from the non-tunable nonlinear nodes, whose only task is to apply a nonlinear activation function. We propose a new method of network training, where the uncontrollable and static activation function of each node is determined experimentally before the teaching phase, which allows the backpropagation algorithm to be applied offline.

In our proof-of-principle demonstration, we realize a single-hidden-layer feedforward neural network using optically excited exciton-polariton nodes, where both input and output weights are applied electronically. Despite the experimental imperfections, we achieve an MNIST inference accuracy of 96%, close to that of a software simulation carried out in the TensorFlow package [TensorFlow]. We emphasize that the linear weighting could in principle be realized all-optically, as has been demonstrated previously [Brunner_ReinforcementLearning, Zuo_AllOpticalNN, chang2018hybrid, farhat1985optical, goodman1978fully, gruber2000planar, lu1989two, Spall_OVMM, Zhou_LargeScale]. Our work opens the way to more complex realizations, including deep and recurrent neuromorphic networks in systems with limited hardware tunability [Huang_ReviewNeuromorphic].

## Results

Implementation of machine learning typically consists of two separate stages. In the teaching stage, the system is taught to classify or predict using data from the teaching dataset. In this stage the synaptic connection weights are tuned, with the aim to increase the accuracy of predictions. The second stage, called the inference (or testing) stage, begins after all teaching samples have been processed. In this stage, synaptic weights are no longer tuned, and the system is not learning any more. The system is processing a testing dataset that consists of samples that it has not seen before. The accuracy of predictions in the inference stage is the most important benchmark of the network.

While teaching is usually a time-consuming and demanding process, once taught, the system is able to make predictions for an arbitrary number of samples in the inference stage. In many practical applications, it is the inference stage that requires the larger amount of resources, since it can continue indefinitely. For example, language processing models can be used to process an arbitrarily large number of sentences without any need for readjustment after the initial teaching. Therefore, from the practical point of view, it is valid to seek methods that improve the efficiency of inference, even if they do not increase the efficiency of teaching at the same time [SurveyAccelerators].

In our approach, we focus on the efficient hardware implementation of inference. To this end, we add an initial stage before teaching, the measurement of the response of each physical node, which allows us to construct an accurate software model of each hardware neuron. Teaching can then be performed in software, while inference is realized in hardware. The entire process is schematically depicted in Fig. 1. A standard artificial neuron realizes two functions: (i) tunable synaptic weighting of inputs and (ii) a non-tunable activation function. We physically separate function (i) from function (ii), the latter being realized by a set of non-tunable nonlinear nodes. In our case, these nodes are optically excited exciton-polariton modes of an optical microcavity. The knowledge of the node response, measured in the initial stage as depicted in Fig. 1(a), allows the teaching stage to be performed entirely in software, using the backpropagation method as shown in Fig. 1(b). The resulting synaptic weights are implemented physically only in the inference stage, as in Fig. 1(c). We choose a simple single-hidden-layer feedforward neural network model to facilitate the backpropagation teaching procedure. Importantly, in contrast to the reservoir computing method, we adjust all synaptic weights of the network, including those in the input layer. We will demonstrate that this significantly improves the accuracy of predictions.
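The initial measurement stage can be illustrated with a short sketch. It assumes, purely for illustration, that a node response is described by a three-parameter sigmoid (amplitude `a`, steepness `b`, threshold `c`) and fits that form to synthetic data standing in for a measured input-output curve. The parameter values, the noise level, and the function names are hypothetical and are not taken from the experiment.

```python
import numpy as np
from scipy.optimize import curve_fit


def sigmoid(x, a, b, c):
    """Three-parameter sigmoid: amplitude a, steepness b, threshold c (assumed form)."""
    return a / (1.0 + np.exp(-b * (x - c)))


def fit_node_response(x_in, y_out):
    """Least-squares fit of the sigmoid to one node's input-output curve."""
    # Rough initial guess: amplitude from the largest output, threshold at mid-range.
    p0 = [y_out.max(), 4.0 / (x_in.max() - x_in.min()), x_in.mean()]
    params, _ = curve_fit(sigmoid, x_in, y_out, p0=p0)
    return params


# Synthetic stand-in for a measured response (hypothetical values, arbitrary units).
rng = np.random.default_rng(0)
x_meas = np.linspace(0.0, 1.0, 50)
true_params = (2.0, 12.0, 0.5)
y_meas = sigmoid(x_meas, *true_params) + rng.normal(0.0, 0.02, x_meas.size)

a_fit, b_fit, c_fit = fit_node_response(x_meas, y_meas)
print(f"a={a_fit:.2f}, b={b_fit:.2f}, c={c_fit:.2f}")
```

Repeating such a fit for every physical node would give one differentiable software model per hardware neuron, which is what the software teaching stage needs.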

Figure 1(a) presents the scheme of the experimental setup. A phase-only spatial light modulator (SLM) is used to modulate the intensity profile of a laser beam into an array of bright spots with individually tunable intensity. These spots, distributed in a $3\times 3$ square lattice, are imaged on the surface of a semiconductor microcavity at resonance with the exciton-polariton energy. Due to the significant exciton-polariton nonlinearities, the light intensity transmitted through the microcavity follows a sigmoid-like behavior as a function of the input intensity [Ballarini_Neuromorphic], as shown in the inset of Fig. 1(a). The light intensity of each polariton node, labeled (A, B, C, ..., I) in the figures, is measured by a CCD camera and collected on a computer. Each node is spatially separated from the others to ensure that the output intensity of a polariton node depends only on its own input intensity and not on the input intensity of the other polariton nodes. This configuration is chosen for simplicity, as it facilitates the teaching procedure. However, our method is not limited to the case of isolated nodes. With interconnected nodes, the system could be more accurate, although teaching would require longer times and more complex methods.

The teaching stage, depicted in Fig. 1(b), is realized entirely in software. We use the TensorFlow package to simulate the single-hidden-layer feedforward neural network shown schematically in the figure. We consider the MNIST handwritten digit dataset, which contains 60 000 samples in the teaching set and 10 000 samples in the testing set, each sample being a $28\times 28$ grayscale image of a digit between 0 and 9 and the corresponding label [LeCun_MNIST]. The number of nodes in the hidden layer of the network is equal to $9N$, where $N$ is the number of experimental shots that will be used to process a single digit in the inference stage. The activation function of each hidden node is given by the node response measured in the initial stage. In the output layer, the softmax function is used to choose the class corresponding to one of the ten digits. We use the backpropagation algorithm to teach the network, which provides optimal values of the synaptic weights in both the hidden-layer and the output-layer connections.
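A minimal NumPy sketch of such a software model is given below. It is not the TensorFlow code used in this work. It assumes three-parameter sigmoid node responses with made-up parameters, ten shots per digit, and random untrained weights, and it only shows how a hidden layer built from nine physical nodes reused over several shots can be evaluated.

```python
import numpy as np

N_NODES = 9                    # physical polariton nodes (3x3 lattice)
N_SHOTS = 10                   # shots per digit (illustrative choice)
N_HIDDEN = N_NODES * N_SHOTS   # hidden neurons in the software model

rng = np.random.default_rng(1)
# Hypothetical fitted sigmoid parameters, one triple per physical node.
a = rng.uniform(0.8, 1.2, N_NODES)
b = rng.uniform(8.0, 12.0, N_NODES)
c = rng.uniform(0.4, 0.6, N_NODES)
# Hidden neuron k is realized by physical node k % N_NODES.
a_h, b_h, c_h = (np.tile(p, N_SHOTS) for p in (a, b, c))


def hidden_activation(h):
    """Node-specific sigmoid applied to each hidden neuron."""
    return a_h / (1.0 + np.exp(-b_h * (h - c_h)))


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract row maximum for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


# Random, untrained weights; 784 inputs for a flattened 28x28 image.
W_in = rng.normal(0.0, 0.05, (N_HIDDEN, 784))
b_in = np.zeros(N_HIDDEN)
W_out = rng.normal(0.0, 0.05, (10, N_HIDDEN))
b_out = np.zeros(10)


def forward(x_batch):
    """Class probabilities for a batch of flattened images (one image per row)."""
    hidden = hidden_activation(x_batch @ W_in.T + b_in)
    return softmax(hidden @ W_out.T + b_out)


x_batch = rng.uniform(0.0, 1.0, (4, 784))   # stand-in for four images
probs = forward(x_batch)
print(probs.shape)
```

In an actual teaching run the weights and biases would be optimized by backpropagation, while the sigmoid parameters stay fixed at their fitted values.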

In the inference stage, shown in Fig. 1(c), we use the spatial light modulator to encode input data from the testing set. The data are first multiplied by the input synaptic weights, which in our case is done in software. The light intensity incident on each microcavity node is therefore a sum of inputs multiplied by the corresponding input synaptic weights. The role of the microcavity is to apply the nonlinear activation function via exciton-polariton interactions, so that the transmitted light corresponds to the neuron outputs. Finally, the intensity measured on a CCD camera is collected and multiplied by the output synaptic weights in software, and a softmax function is used to determine the predicted class.
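The data flow of the inference stage can be sketched as follows. The function `polariton_shot` is a hypothetical software stand-in for one experimental shot on the nine hardware nodes, and the weights, sigmoid parameters, and input are random placeholders. The sketch ignores physical constraints such as the non-negativity of light intensity.

```python
import numpy as np

N_NODES, N_SHOTS = 9, 10
N_HIDDEN = N_NODES * N_SHOTS


def polariton_shot(intensities, a, b, c):
    """Software stand-in for one experimental shot on the 9 hardware nodes."""
    return a / (1.0 + np.exp(-b * (intensities - c)))


def infer(x, W_in, b_in, W_out, b_out, a, b, c):
    # Software: weight the input, giving one weighted sum per hidden neuron.
    pre = W_in @ x + b_in
    # Split the hidden neurons into shots; row s holds the 9 node inputs of shot s.
    shots = pre.reshape(N_SHOTS, N_NODES)
    # "Hardware": every shot passes through the same 9 nonlinear nodes.
    out = np.stack([polariton_shot(s, a, b, c) for s in shots])
    # Software: output weights. Softmax is monotonic, so the argmax of the
    # logits already gives the predicted class.
    logits = W_out @ out.reshape(N_HIDDEN) + b_out
    return int(np.argmax(logits))


rng = np.random.default_rng(2)
a = rng.uniform(0.8, 1.2, N_NODES)
b = rng.uniform(8.0, 12.0, N_NODES)
c = rng.uniform(0.4, 0.6, N_NODES)
W_in = rng.normal(0.0, 0.05, (N_HIDDEN, 784))
b_in = np.zeros(N_HIDDEN)
W_out = rng.normal(0.0, 0.5, (10, N_HIDDEN))
b_out = np.zeros(10)

x = rng.uniform(0.0, 1.0, 784)   # stand-in for one flattened 28x28 image
predicted = infer(x, W_in, b_in, W_out, b_out, a, b, c)
print(predicted)
```

Processing the hidden layer shot by shot is equivalent to evaluating all hidden neurons at once with the node parameters repeated for every shot, which is why the software model used for teaching can describe the hardware inference.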

In Figure 2(a), we show the measured response of each polariton node, together with the corresponding analytical fits. The analytical functions are necessary to apply the backpropagation teaching method, which requires the knowledge of the derivative of an activation function in order to calculate the update of input synaptic weights according to the formula

$$\Delta w_{jk} = \eta\,(t_j - y_j)\,f'(h_j)\,x_k, \tag{1}$$

where $\eta$ is the learning rate, $\mathbf{x}$ and $\mathbf{y}$ are the input and output vectors, respectively, and $\mathbf{t}$ is the target output vector. Activations of neurons in the hidden layer are given by the equation $y_j = f(h_j)$ with $h_j = \sum_{k=1}^{n} w_{jk} x_k$, where $f$ is an activation function and $n$ is the number of inputs. We approximate the polariton response by sigmoid analytical activation functions

$$f_i(x) = \frac{a_i}{1 + e^{-b_i (x - c_i)}}, \tag{2}$$

where $a_i$, $b_i$, and $c_i$ are parameters obtained by fitting to the experimental data, where $i = 1, \dots, 9$. Overall, we fit 9 sigmoid functions to the response of 9 polariton nodes in the network.

Figure 2(b) shows the experimentally obtained accuracy at the inference stage, i.e. predictions for 500 digits from the testing set. For comparison, in Figure 2(c) we show the accuracy obtained in a TensorFlow software simulation of the same network with the same dataset. The accuracy of 96.2% achieved in the experiment is comparable to the result obtained with a binarized network in [Mirek_Neuromorphic] and higher than that obtained in a reservoir computing approach with a similar number of nodes [Ballarini_Neuromorphic]. In comparison to the binarized network [Mirek_Neuromorphic], the setup presented here is much simpler, the number of nodes is much lower (tens instead of tens of thousands), and there is no need to create optical binary gates first. Consequently, it has the potential to achieve a higher speed of data processing and is more scalable.

We emphasize that this excellent result has been obtained with a relatively small network, including only 90 nodes in the hidden layer ($N = 10$ experimental shots on each of the 9 physical nodes). This results from the use of backpropagation, which adjusts synaptic weights in all the network layers. Achieving a similar accuracy with a reservoir computing network requires a much larger number of nodes in the hidden layer [Opala_NeuromorphicComputing]. To investigate the advantage of our approach in more detail, in Fig. 3(a) we compare the accuracy of our network, simulated in software, to the accuracy of an "extreme learning machine" (ELM) network [Huang_ExtremeLearningMachines, Conti_ELM]. The latter network has an architecture identical to ours; however, its input synaptic weights are random and are not adjusted in the teaching phase. Since only the synaptic weights in the output layer are adjusted, it can be considered a simplified feedforward analog of a reservoir computing network. Additionally, we compare these results with the accuracy obtained using a linear classification method (taught using logistic regression), where the nonlinear transformation of data is absent. It is clear that the use of backpropagation to adjust the weights results in a significant improvement of accuracy as compared to the ELM network, and, in contrast to reservoir computing, it surpasses the linear classification method even for a network with a very small number of nodes.
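The ELM baseline discussed above can be illustrated on a toy problem. The sketch below is not the MNIST comparison of Fig. 3(a). It uses a synthetic two-class XOR-like dataset, and for brevity both the ELM readout and the linear classifier are fitted by ridge least squares rather than by logistic regression. All sizes and scales are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_xor(n):
    """Toy two-class data: the label is 1 when x1 and x2 have the same sign."""
    x = rng.uniform(-1.0, 1.0, (n, 2))
    y = (x[:, 0] * x[:, 1] > 0).astype(int)
    return x, y


def with_bias(features):
    return np.hstack([features, np.ones((features.shape[0], 1))])


def ridge_readout(features, labels, lam=1e-3):
    """Fit linear output weights to one-hot targets by ridge least squares."""
    f = with_bias(features)
    targets = np.eye(2)[labels]
    return np.linalg.solve(f.T @ f + lam * np.eye(f.shape[1]), f.T @ targets)


def accuracy(features, labels, w):
    return float(np.mean(np.argmax(with_bias(features) @ w, axis=1) == labels))


x_tr, y_tr = make_xor(2000)
x_te, y_te = make_xor(500)

# ELM: the input weights are random and never adjusted.
n_hidden = 200
W_in = rng.normal(0.0, 4.0, (2, n_hidden))
b_in = rng.uniform(-4.0, 4.0, n_hidden)


def hidden(x):
    return 1.0 / (1.0 + np.exp(-(x @ W_in + b_in)))


w_elm = ridge_readout(hidden(x_tr), y_tr)   # only the readout is taught
w_lin = ridge_readout(x_tr, y_tr)           # linear classifier on raw inputs

elm_acc = accuracy(hidden(x_te), y_te, w_elm)
lin_acc = accuracy(x_te, y_te, w_lin)
print(f"ELM: {elm_acc:.3f}, linear: {lin_acc:.3f}")
```

On this toy task the random hidden layer is what separates the ELM from the linear classifier. A backpropagation-taught network would additionally adjust `W_in` and `b_in`, which is the difference examined in Fig. 3(a).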

We also consider the effect of noise on the performance of the network. In Figure 3(b) we show the accuracy as a function of the noise amplitude. We add to the activation functions random variables $\xi_i$, one for each polariton neuron $i$, that have Gaussian distributions with zero mean and standard deviations equal to

$$\sigma_i = \epsilon\, a_i, \tag{3}$$

where $a_i$ is the amplitude of the $i$-th polariton neuron response and $\epsilon$ is the relative amplitude of noise. The accuracy typically remains high if the noise is weak, even in the case of small neural networks. Only large-amplitude noise significantly degrades the network performance, and larger networks appear to be more resistant to strong noise.
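The noise model can be sketched in a few lines. The sketch assumes sigmoid node responses with hypothetical parameters and takes the noise standard deviation of each node to be the product of the relative noise amplitude `eps` and the node amplitude `a`.

```python
import numpy as np


def noisy_node_response(h, a, b, c, eps, rng):
    """Sigmoid node response plus Gaussian noise with std eps * a per node."""
    clean = a / (1.0 + np.exp(-b * (h - c)))
    # Zero-mean noise; the scale array broadcasts over the batch dimension.
    return clean + rng.normal(0.0, eps * a, size=clean.shape)


rng = np.random.default_rng(3)
a = np.linspace(0.5, 1.5, 9)   # hypothetical node amplitudes
b = np.full(9, 10.0)           # hypothetical steepness
c = np.full(9, 0.5)            # hypothetical threshold
h = rng.uniform(0.0, 1.0, (20000, 9))

clean = noisy_node_response(h, a, b, c, 0.0, rng)   # eps = 0: no noise
noisy = noisy_node_response(h, a, b, c, 0.1, rng)   # 10% relative noise
measured_std = (noisy - clean).std(axis=0)
print(np.round(measured_std / a, 3))
```

Feeding such noisy activations into an already taught network and recording the accuracy for increasing `eps` would give a robustness curve of the kind shown in Fig. 3(b).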

We emphasize that while in our experimental implementation the tunable weights are implemented in software, there are known all-optical methods for vector-matrix multiplication. These methods have been realized in many experiments, with both coherent and incoherent light [Brunner_ReinforcementLearning, Zuo_AllOpticalNN, chang2018hybrid, farhat1985optical, goodman1978fully, gruber2000planar, lu1989two]. In particular, multiplication of a vector by a matrix containing 3 000 elements was implemented recently [Spall_OVMM]. Thus, the inference stage could be realized in an all-optical system, without any electronic elements. Such an optical network would be completely passive, as it would not require any external power supply except for the laser source. Moreover, by replacing the spatial light modulator with ultrafast modulators working at GHz data rates, one could take advantage of the very short, picosecond timescales of the optical system. Another possibility is to use an on-chip integrated version of the system, where transmission is tuned by optoelectronic modulators using the Stark effect acting on the exciton component [Sanvitto_Stark].

## Model

Our implementation is a dense feedforward neural network that contains a single hidden layer between the input and output layers. The hidden layer contains $9N$ neurons (9 polariton nodes, each used in $N$ experimental shots), whose activations were obtained by fitting the functions of Eq. (2) to the measured polariton node responses. The particular activation function depends on the physical node that it corresponds to. The neural network transforms the input vector $\mathbf{x}$ into the output $\mathbf{y}$ according to the following equation

$$\mathbf{y} = S\!\left(W^{\mathrm{out}}\, F\!\left(W^{\mathrm{in}}\,\mathbf{x} + \mathbf{b}^{\mathrm{in}}\right) + \mathbf{b}^{\mathrm{out}}\right), \tag{4}$$

where $S$ is the softmax function and $W^{\mathrm{out}}$ is the weight matrix of connections between the hidden and the output layer. The function $F$ applies the neuron activation functions $f_i$ elementwise. The matrix $W^{\mathrm{in}}$ contains the weights between the input and the hidden layer, while $\mathbf{b}^{\mathrm{in}}$ and $\mathbf{b}^{\mathrm{out}}$ are the input and output bias vectors, respectively. We optimise the weight matrices and bias vectors using the Adam optimiser. We use the supervised learning method and the backpropagation algorithm. The application of the backpropagation method is possible because the neuron activations are modeled by sigmoid functions, which are continuous and differentiable.
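The role of differentiability can be made concrete with a small gradient check. The sketch below assumes a three-parameter sigmoid activation with hypothetical parameters and a cross-entropy loss, which is an assumption of this example rather than a detail stated above. It compares the backpropagated gradient with respect to the input weights against finite differences on a deliberately tiny network.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 5, 9, 3

# Hypothetical sigmoid parameters, one triple per hidden neuron.
a = rng.uniform(0.8, 1.2, n_hid)
b = rng.uniform(2.0, 4.0, n_hid)
c = rng.uniform(-0.2, 0.2, n_hid)


def f(h):
    return a / (1.0 + np.exp(-b * (h - c)))


def f_prime(h):
    s = f(h) / a                     # normalized sigmoid, between 0 and 1
    return a * b * s * (1.0 - s)     # analytic derivative of f


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def loss(W_in, b_in, W_out, b_out, x, t):
    y = softmax(W_out @ f(W_in @ x + b_in) + b_out)
    return -np.sum(t * np.log(y))    # cross-entropy (assumed loss)


def grad_W_in(W_in, b_in, W_out, b_out, x, t):
    """Backpropagated gradient of the loss with respect to the input weights."""
    h = W_in @ x + b_in
    y = softmax(W_out @ f(h) + b_out)
    delta_hidden = (W_out.T @ (y - t)) * f_prime(h)
    return np.outer(delta_hidden, x)


def numerical_grad_W_in(W_in, b_in, W_out, b_out, x, t, step=1e-6):
    """Central finite differences, one weight at a time."""
    g = np.zeros_like(W_in)
    for j in range(W_in.shape[0]):
        for k in range(W_in.shape[1]):
            W_plus, W_minus = W_in.copy(), W_in.copy()
            W_plus[j, k] += step
            W_minus[j, k] -= step
            g[j, k] = (loss(W_plus, b_in, W_out, b_out, x, t)
                       - loss(W_minus, b_in, W_out, b_out, x, t)) / (2 * step)
    return g


W_in = rng.normal(0.0, 0.5, (n_hid, n_in))
b_in = rng.normal(0.0, 0.1, n_hid)
W_out = rng.normal(0.0, 0.5, (n_out, n_hid))
b_out = rng.normal(0.0, 0.1, n_out)
x = rng.uniform(0.0, 1.0, n_in)
t = np.eye(n_out)[1]                 # one-hot target

analytic = grad_W_in(W_in, b_in, W_out, b_out, x, t)
numeric = numerical_grad_W_in(W_in, b_in, W_out, b_out, x, t)
max_err = float(np.max(np.abs(analytic - numeric)))
print(max_err)
```

The factor `f_prime(h)` is the place where the fitted analytical form enters: without a differentiable model of each node, the error could not be propagated back to the input weights.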

## Methods

The semiconductor microcavity used in these experiments is a planar cavity with three 8 nm InGaAs quantum wells embedded between two AlAs/GaAs distributed Bragg reflectors, and kept at a temperature of K. The high quality-factor () of this sample results in a polariton lifetime of 10 ps. In particular the region of the sample we used exhibits an exciton-cavity detuning of meV and polaritons are pumped by a continuous wave laser tuned at nm.

To shape the profile of the laser beam we used a spatial light modulator (SLM), a liquid crystal display with a surface area of approximately 2 cm$^2$ and a $1920\times 1080$ resolution. Applying a voltage to the cells changes the orientation of the liquid crystals and in turn the effective refractive index seen by the incident light. The control of the birefringence of each pixel allows us to spatially design the amplitude and phase of the reflected wave.

To create the node lattice on the sample we reconstructed the real space image of the SLM, reduced in size by a factor of 50. On top of the displayed lattice pattern we used a blazed grating pattern that serves two purposes: on the one hand it allows us to block the zeroth-order reflection from the SLM that brings the non-modulated part of the laser, on the other hand by changing the diffraction efficiency of the grating we are able to tune the intensity of the individual nodes. Furthermore a phase difference of is applied between each node, for a better separation.

The pattern in momentum space is focused onto the microcavity sample by an objective lens with a focal length of cm. When the pump power is increased, i.e. when the grating efficiency is increased, the interactions of polaritons bring the dispersion into resonance with the laser frequency and momentum, resulting in a sigmoidal response function. Finally, the emission is collected with cm aspheric lens and recorded on a charge-coupled device (CCD).

Data availability The data that support the findings of this study are available from the corresponding authors upon reasonable request.

Code availability The codes are available from the corresponding authors upon reasonable request.

Acknowledgements MM acknowledges support from National Science Center, Poland grant 2017/25/Z/ST3/03032 under the QuantERA program. AO acknowledges support from National Science Center, Poland grant 2019/35/N/ST3/01379. BP acknowledges support from National Science Center, Poland grant 2020/37/B/ST3/01657.