Supervised Learning in Spiking Neural Networks with Phase-Change Memory Synapses

05/28/2019 · S. R. Nandakumar et al. · New Jersey Institute of Technology and IBM

Spiking neural networks (SNN) are artificial computational models that have been inspired by the brain's ability to naturally encode and process information in the time domain. The added temporal dimension is believed to render them more computationally efficient than conventional artificial neural networks, though their full computational capabilities are yet to be explored. Recently, computational memory architectures based on non-volatile memory crossbar arrays have shown great promise for implementing parallel computations in artificial and spiking neural networks. In this work, we experimentally demonstrate, for the first time, the feasibility of realizing high-performance event-driven in-situ supervised learning systems using nanoscale and stochastic phase-change synapses. Our SNN is trained to recognize audio signals of alphabets, encoded using spikes in the time domain, and to generate spike trains at precise time instances representing the pixel intensities of the corresponding images. Moreover, with a statistical model capturing the experimental behavior of the devices, we investigate architectural and systems-level solutions for improving the training and inference performance of our computational memory-based system. Combining the computational potential of supervised SNNs with the parallel compute power of computational memory, this work paves the way for a next generation of efficient brain-inspired systems.


Results

1. SNN learning experiment

Figure 2: SNN training problem. The audio signal is passed through a silicon cochlea chip to generate spike streams. These spike streams are sub-sampled and applied as input to train the single-layer SNN. The desired spike response of the network, representing the images (14 × 12 pixels) corresponding to the characters in the audio, is also shown.

The training problem and the network we used for the experiment are illustrated in Fig. 2. The learning task of the network is to recognize and translate audio signals corresponding to spoken alphabets into corresponding images, with all information encoded in the spike domain, as described below. An audio signal captured when a human speaker utters the characters ‘IBM’ (Eye..Bee..Em) is converted to a set of spike streams using a silicon cochlea chip Liu et al. (2014), and the resulting 132 spike streams (representing the signal components in 64 frequency bands) are sub-sampled to an average spike rate of 10 Hz to generate the binary spike inputs to the network (see Methods for more details). A raster plot of the generated spikes is shown in Fig. 2. At the output of the network, there are 168 spiking neurons, with the spikes of each neuron representing the instantaneous intensity of one pixel of the image corresponding to the input audio signal. The desired spike stream from each output neuron is obtained from a Poisson random process whose arrival rate is chosen to be proportional to the corresponding pixel intensity in the images (14 × 12 pixels showing the characters ‘I’, ‘B’, and ‘M’), inspired by similar statistical distributions observed in the animal retina Uzzell and Chichilnisky (2004). Each image has an average duration of 230 ms and is mapped to the corresponding time window in the audio signal. The network hence receives 132 spike streams corresponding to the audio signals and is connected to 168 spiking neurons at the output, corresponding to the pixels of the image. In the experiment, the synaptic strength between the input streams and the output neurons is represented using the conductance of the PCM devices.
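
To make the target encoding concrete, the following sketch (Python/NumPy; the 20 Hz peak rate is an illustrative choice, consistent only with the sub-20 Hz average output rates reported later) draws a Poisson spike train whose rate is proportional to a pixel intensity within one character's 230 ms window.

import numpy as np

def poisson_target_spikes(pixel_intensity, duration=0.23, peak_rate=20.0, dt=1e-4, rng=None):
    """Draw a binary target spike train for one output neuron over one character window.

    pixel_intensity: normalized intensity in [0, 1]; the Poisson rate is proportional to it.
    duration:        length of the character window in seconds (230 ms in the text).
    peak_rate:       assumed rate in Hz for a fully bright pixel (illustrative value).
    dt:              simulation time step in seconds.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_steps = int(round(duration / dt))
    rate = peak_rate * pixel_intensity                 # arrival rate in Hz
    # Bernoulli approximation of a Poisson process on a fine time grid
    return rng.random(n_steps) < rate * dt

# Example: one bright pixel of the 'I' image, active for 230 ms
target = poisson_target_spikes(1.0)
print(target.sum(), "target spikes in the window")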

An input spike arriving at time $t_i$ on an input synapse triggers a current flow into the output neuron. The synaptic current in response to each spike is modeled as the kernel $\kappa(t-t_i) = \left(e^{-(t-t_i)/\tau_1} - e^{-(t-t_i)/\tau_2}\right) H(t-t_i)$ multiplied by the synaptic weight $w$, where $H$ is the Heaviside step function and $\tau_1$, $\tau_2$ are millisecond-scale time constants. The sum of all the weighted currents is integrated by a leaky-integrate-and-fire (LIF) neuron to determine a voltage analogous to the membrane potential of biological neurons. When this voltage exceeds a threshold, it is reset to a resting potential and a spike is assumed to be generated. During the course of training, PCM conductance values read from the hardware are used to calculate the synaptic currents, while the neuronal dynamics are implemented in software. A supervised training algorithm determines the weight updates needed for the observed spikes from the SNN to occur at the desired time instances. The weight updates are implemented by modulating the corresponding PCM conductance values with a sequence of programming pulses. We do not verify whether the observed conductance change matches the desired update. This blind programming scheme (without expensive read-verify) is expected to be the norm for computational-memory-based learning systems in the future, and in this study we experimentally evaluate the potential of analog PCM conductance to precisely encode spike-time information in SNNs.
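
A minimal discrete-time sketch of this forward pass is given below (Python/NumPy; all kernel and neuron constants are illustrative placeholders, not the values used in the experiment).

import numpy as np

def lif_forward(spikes_in, weights, dt=1e-4, tau1=5e-3, tau2=1.25e-3,
                C=300e-12, gL=30e-9, v_th=20e-3, v_rest=0.0, t_ref=2e-3):
    """Forward pass of one LIF output neuron driven by weighted synaptic currents.

    spikes_in: (n_inputs, n_steps) binary array of input spikes.
    weights:   (n_inputs,) synaptic weights.
    All time constants and neuron parameters are illustrative placeholders.
    Returns a binary output spike train of length n_steps.
    """
    n_in, n_steps = spikes_in.shape
    t = np.arange(n_steps) * dt
    kernel = np.exp(-t / tau1) - np.exp(-t / tau2)               # double-exponential current kernel
    currents = np.array([np.convolve(s.astype(float), kernel)[:n_steps] for s in spikes_in])
    i_syn = weights @ currents                                   # weighted sum of synaptic currents

    v = v_rest
    refractory_until = -1.0
    out = np.zeros(n_steps, dtype=bool)
    for k in range(n_steps):
        if t[k] < refractory_until:
            continue
        # leaky integration: C dV/dt = -gL * (V - v_rest) + I_syn
        v += dt / C * (-gL * (v - v_rest) + i_syn[k])
        if v > v_th:                                             # threshold crossing -> emit spike
            out[k] = True
            v = v_rest                                           # reset to the resting potential
            refractory_until = t[k] + t_ref
    return out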

2. Phase-change memory synapse

For our on-chip training experiment, we used a prototype chip containing more than one million doped-GeSbTe (GST) based PCM devices fabricated in the 90 nm CMOS technology node Close et al. (2010). The GST phase-change material has a low resistivity in its poly-crystalline state and a high resistivity in its amorphous phase. An amorphous region is created around the narrow bottom electrode via a melt-quench process. The device conductance can be gradually increased by applying a sequence of partial-SET pulses. A threshold-switching phenomenon allows a large current to flow through the amorphous volume, raising its temperature and initiating crystal growth. We have characterized the crystal-growth-driven conductance evolution in the PCM array and created statistically accurate models Nandakumar et al. (2018). These PCM models are used to pre-validate the experiment and to evaluate methods for improving training performance.

While the conductance increment (SET) operation in PCM can be gradual and accumulative, the melt-quench driven conductance decrement (RESET) process is non-accumulative. This leads to an asymmetric update behavior between conductance increase and decrease, necessitating the use of the standard differential configuration for weight updates Suri et al. (2011). In this scheme, each network weight $W$ is realized as the difference of two PCM conductances $G_p$ and $G_n$ ($W = \beta (G_p - G_n)$, where $\beta$ is a scaling factor implemented in the peripheral circuitry of the computational memory array). This allows both increment and decrement of $W$ to be implemented as partial-SET operations on $G_p$ and $G_n$, respectively. The differential configuration improves the symmetry of weight updates and partially compensates the conductance drift Suri et al. (2013). Further improvement in conductance change granularity, stochasticity, and drift behavior can be achieved via a multi-PCM configuration Boybat et al. (2018a, b). In our training experiment, both $G_p$ and $G_n$ are realized as the sum of four PCM devices. For each synaptic update requested by the training algorithm, only one of the four devices is programmed, chosen cyclically so that on average all devices receive an approximately equal number of update pulses Boybat et al. (2018a). The energy overhead from the multiple devices per synapse is not expected to be significant, since PCM devices can be read with low energy (1 – 100 fJ per device) Le Gallo et al. (2017) and only one device is programmed per update, as in a conventional synapse. Although the area per synapse increases, it is worth noting that the area of typical computational-memory-based neural network designs is dominated by the peripheral neuron circuits rather than the synapses. Moreover, PCM devices have been shown to scale to nanoscale dimensions Xiong et al. (2011), and through technology scaling the synaptic area could shrink significantly Choi et al. (2012). Thus, in our implementation, each synapse is realized using 8 PCM devices, for a total of 177,408 devices representing the weights of the 22,176 synapses in the network.
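
The sketch below (Python; hypothetical class and method names, and a toy stochastic programming model in place of the measured device behavior) illustrates the differential multi-PCM arrangement: the weight is read as $\beta(\sum G_p - \sum G_n)$ over four devices per branch, and each update blindly programs a single, cyclically selected device with a partial-SET.

import numpy as np

class MultiPCMSynapse:
    """Differential multi-PCM synapse: W = beta * (sum(Gp) - sum(Gn))."""

    def __init__(self, beta=1.0, n_per_branch=4, g_init=1e-6, rng=None):
        self.beta = beta
        self.gp = np.full(n_per_branch, g_init)    # potentiating branch conductances (S)
        self.gn = np.full(n_per_branch, g_init)    # depressing branch conductances (S)
        self.ptr_p = 0                             # cyclic device-selection pointers
        self.ptr_n = 0
        self.rng = np.random.default_rng() if rng is None else rng

    def weight(self):
        return self.beta * (self.gp.sum() - self.gn.sum())

    def _partial_set(self, g, dg):
        # toy stochastic, bounded partial-SET; the real devices follow the measured PCM model
        noise = 1.0 + 0.3 * self.rng.standard_normal()
        return float(np.clip(g + dg * noise, 0.1e-6, 8e-6))

    def update(self, dw):
        """Apply a desired weight change dw by blindly programming ONE device (no read-verify)."""
        dg = abs(dw) / self.beta                   # desired conductance change on a single device
        if dw >= 0:                                # weight increase -> partial-SET on the Gp branch
            self.gp[self.ptr_p] = self._partial_set(self.gp[self.ptr_p], dg)
            self.ptr_p = (self.ptr_p + 1) % len(self.gp)
        else:                                      # weight decrease -> partial-SET on the Gn branch
            self.gn[self.ptr_n] = self._partial_set(self.gn[self.ptr_n], dg)
            self.ptr_n = (self.ptr_n + 1) % len(self.gn)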

3. Training algorithm

The supervised training of SNNs is a challenging task, as gradient-descent-based backpropagation algorithms do not apply directly owing to the non-differentiable dynamical behavior of spiking neurons (the membrane potential encounters a discontinuity at the point of a spike). One approach to circumvent this limitation is to train a continuous-valued ANN using the standard backpropagation algorithm and then convert it into an SNN Cao, Chen, and Khosla (2015); Diehl et al. (2016); Rueckauer et al. (2017). However, in this method the input data and neuron activations of the ANN are translated to spike rates in the SNN, losing the advantage of precise time-based signal encoding and necessitating longer processing times, which leads to sub-par performance and energy efficiency Pfeiffer and Pfeil (2018). Also, unconstrained training of floating-point synapses without taking into account the non-idealities of analog memory devices leads to a further loss of accuracy when the trained weights are transferred to nanoscale synapses in hardware. Moreover, training approaches that implement backpropagation in SNNs using approximate derivatives of the membrane potential around the time of spikes are also aimed at minimizing cost functions described in terms of the output spike rate rather than precise spike times Lee, Delbruck, and Pfeiffer (2016); Woźniak, Pantazi, and Eleftheriou (2018). Encoding events using precise spike times could be more efficient, as it leads to sparse computations and low latencies for decision making Bohte, La Poutré, and Kok (2002); Crotty and Levy (2005); Gütig and Sompolinsky (2006); Wang et al. (2016); Merolla et al. (2014).

Recently, several approximate spike-time-based supervised training algorithms of varying computational complexity have been proposed, demonstrating various degrees of success on benchmark problems in machine learning. Among these, SpikeProp Bohte, La Poutré, and Kok (2002) is designed to generate single spikes, Tempotron Gütig and Sompolinsky (2006) uses a non-event-driven error computation, and ReSuMe Ponulak and Kasiński (2010) and NormAD Anwani and Rajendran (2015) (with a relatively higher convergence rate) are designed to generate spikes at precise time instances via spike-driven weight updates. In our experiment, we use the normalized approximate descent (NormAD) algorithm, which has been successful in achieving high classification accuracy on the MNIST hand-written digit recognition problem Kulkarni and Rajendran (2018). According to this algorithm, the weight updates are computed in an event-driven manner, using the relation

$\Delta \mathbf{w} = r \int_0^T e(t)\, \frac{\hat{\mathbf{d}}(t)}{\lVert \hat{\mathbf{d}}(t) \rVert}\, dt$   (1)

where $r$ is the learning rate, $T$ is the pattern duration, and $e(t)$ is the difference between the desired and observed binary spike trains. $\hat{\mathbf{d}}(t)$ is obtained by convolving the input spike streams with the synaptic kernel and an approximate impulse response of the LIF neuron (see Methods). The weight updates are computed only at the time instants at which the learning network generated a spike or at which a spike was desired (i.e., when $e(t) \neq 0$). These updates are accumulated over the training pattern duration (one epoch) and used to modulate the network weights. The accumulated $\Delta w$ values were converted to desired conductance changes using the scaling factor $\beta$. Desired conductance changes lying in the interval [0.1, 1.5] µS were mapped to amplitudes of 50 ns programming current pulses ranging from 40 µA to 130 µA; smaller conductance changes were neglected. The conductance updates during training were performed by blindly applying the programming pulses, without verifying whether the observed conductance change matched the desired update.
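
To make the event-driven update concrete, a minimal sketch follows (Python/NumPy; d_hat stands for the convolved quantity defined above, and the linear mapping from conductance change to pulse amplitude is an illustrative assumption rather than the calibration used on the chip).

import numpy as np

def normad_delta_w(d_hat, desired_spikes, observed_spikes, lr=1e-3, eps=1e-12):
    """Accumulate NormAD weight updates over one pattern presentation (one epoch).

    d_hat:           (n_inputs, n_steps) input streams convolved with the synaptic kernel
                     and the approximate LIF impulse response.
    desired_spikes,
    observed_spikes: (n_steps,) binary spike trains.
    """
    err = desired_spikes.astype(float) - observed_spikes.astype(float)   # e(t)
    event_steps = np.nonzero(err)[0]        # update only where a spike occurred or was desired
    dw = np.zeros(d_hat.shape[0])
    for k in event_steps:
        col = d_hat[:, k]
        dw += lr * err[k] * col / (np.linalg.norm(col) + eps)
    return dw

def dg_to_pulse_amplitude(dg, g_min=0.1e-6, g_max=1.5e-6, i_min=40e-6, i_max=130e-6):
    """Map a desired conductance change (S) to a 50 ns partial-SET current amplitude (A).

    Changes below g_min are neglected; the linear interpolation is an illustrative choice.
    """
    dg = abs(dg)
    if dg < g_min:
        return None
    dg = min(dg, g_max)
    return i_min + (dg - g_min) / (g_max - g_min) * (i_max - i_min)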

4. Training performance

Figure 3: Training experiment using PCM devices. a Simulated training accuracy as a function of the number of devices in a multi-PCM synapse (92.5% maximum accuracy). Accuracy is defined as the fraction of spike events in the desired pattern for which a spike was generated by the respective output neuron within a certain time interval. The lower bound of the shaded band corresponds to a 5 ms interval, the middle line to 10 ms, and the upper bound to 25 ms. b Accuracy as a function of training epochs from the experiment using on-chip PCM devices. Each synapse was realized using 8 PCM devices in differential configuration. The corresponding training simulation using the PCM model shows excellent agreement with the experimental result. The experiment, the PCM model, and the reference floating-point (FP64) training achieve maximum accuracies of 85.7%, 87%, and 98.9%, respectively, for the 25 ms error tolerance. c Raster plot of the desired and observed spike trains from the trained network. A visualization of the character images, whose pixel intensities are generated from the observed spike rates, is shown above the raster plot.

First, we used the PCM model to pre-validate and optimize the training scheme. Fig. 3a shows the improvement in network training accuracy as the number of PCM devices used per synapse increases (in differential configuration). The performance of the network is evaluated using an accuracy metric defined as the percentage of the 987 spikes in the desired pattern for which the SNN produced an observed spike within a certain time interval. In the line plot of accuracy with shaded bounds, the lower bound, middle line, and upper bound correspond to spike-time tolerance intervals of 5 ms, 10 ms, and 25 ms, respectively. Note that the average output spike rate during each character's duration was less than 20 Hz, corresponding to an inter-arrival time of at least 50 ms, and the task of the network is to create spikes each of which can be unambiguously associated with one of the target spikes. A fixed weight range obtained from the reference high-precision training was mapped to the summed conductance of 1 to 16 differential pairs, and the networks were trained for 100 epochs. Using more devices in parallel, with only one programmed at each weight update, permitted smaller weight updates to be programmed more reliably. Although the accuracy was found to improve with more PCM devices, increasing the total number of devices beyond 16 in this problem did not lead to corresponding improvements in accuracy. One possible explanation is that, with more devices, the observed conductance change (which has a limited dynamic range for a chosen partial-SET programming scheme) captures smaller desired weight changes but under-represents the larger ones, leading to slower convergence. The maximum accuracy observed from the simulation was 92.5% at 25 ms timing error for 16 devices per synapse.
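
For clarity, the spike-time accuracy metric can be stated as a short function (a minimal sketch; matching each desired spike to the nearest observed spike is our assumption about the matching policy).

import numpy as np

def spike_time_accuracy(desired_times, observed_times, tol=0.025):
    """Fraction of desired spikes with an observed spike within +/- tol seconds."""
    observed_times = np.sort(np.asarray(observed_times, dtype=float))
    desired_times = np.asarray(desired_times, dtype=float)
    if observed_times.size == 0:
        return 0.0
    hits = 0
    for t in desired_times:
        nearest = observed_times[np.argmin(np.abs(observed_times - t))]
        if abs(nearest - t) <= tol:
            hits += 1
    return hits / max(desired_times.size, 1)

# Example: 25 ms tolerance, as used for the upper bound in Fig. 3a
print(spike_time_accuracy([0.10, 0.35, 0.60], [0.11, 0.40, 0.58]))   # -> 2/3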

We performed the training experiment with synapses realized using eight on-chip PCM devices in differential configuration, and the SNN generated more than 85% of the spikes within the 25 ms error tolerance (Fig. 3b). The experimental training results agree well with the observations from the PCM-model-based simulation. The training accuracy obtained from the corresponding 64-bit floating-point (FP64) training simulation is also shown for reference. A raster plot of the spikes observed from the SNN trained in the experiment is shown in Fig. 3c as a function of time, along with the desired spikes. The character images shown on top are created using the average spike rate over the duration of each character and indicate that the network was successfully trained to generate the spikes needed to create the images.

Figure 4: Role of input correlations in network performance. a Input spike streams with spike times jittered by random amounts uniformly distributed in [-25, 25] ms. b The cross-correlation between the jittered spike streams is shifted towards zero compared to that of the experimental input. c The simulated training accuracy improves when training with input spike streams of reduced correlation.

While the maximum accuracy obtained by training the PCM devices is limited by the non-linearity, stochasticity, and granularity of their conductance change, we observed that the accuracy of the SNN could be further enhanced by modifying the input encoding scheme. The ability of a neural network to classify its inputs depends on the correlation between the inputs. In Fig. 4 we show, using the PCM model simulation, that the accuracy gap between the experiment and the floating-point training simulation can be reduced by decreasing the correlation between the input spike streams. We added a random temporal jitter, uniformly distributed in the interval [-25, 25] ms, to each input spike, which causes the cross-correlation between the input spike streams to decrease. The correlation coefficients between the binary spike streams were determined after smoothing them using a Gaussian kernel with a millisecond-scale standard deviation. Even though the added jitter reduces the correlation by only a small amount (Fig. 4b), the training performance improves substantially, suggesting that encoding schemes or network structures that inherently separate input features will improve training performance using low-precision devices such as PCM.
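
A minimal sketch of the jitter and the smoothed-correlation measurement follows (Python/NumPy; the Gaussian smoothing width sigma is a placeholder, since the exact value used is not restated here).

import numpy as np

def jitter_spike_times(spike_times, max_jitter=0.025, rng=None):
    """Add uniform jitter in [-max_jitter, +max_jitter] seconds to each spike time."""
    rng = np.random.default_rng() if rng is None else rng
    spike_times = np.asarray(spike_times, dtype=float)
    return np.sort(spike_times + rng.uniform(-max_jitter, max_jitter, size=spike_times.size))

def smoothed_correlation(spikes_a, spikes_b, dt=1e-4, sigma=5e-3):
    """Pearson correlation of two binary spike trains after Gaussian smoothing.

    sigma is a placeholder smoothing width in seconds.
    """
    t = np.arange(-5 * sigma, 5 * sigma, dt)
    kernel = np.exp(-t ** 2 / (2 * sigma ** 2))
    a = np.convolve(spikes_a.astype(float), kernel, mode="same")
    b = np.convolve(spikes_b.astype(float), kernel, mode="same")
    return np.corrcoef(a, b)[0, 1]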

5. On-chip inference

Figure 5: On-chip inference and drift compensation. a Inference using the trained PCM array. Due to conductance drift, the accuracy drops over time (black line). The effect of drift can be compensated by a time-aware scaling method (red line); the percentage accuracy drop over 10^4 s was reduced from 70% to 13.6% at 25 ms error tolerance. b The drifted conductance distribution at the end of 10^4 s is compared with the trained conductance distribution. The effect of scaling on the drifted conductances is also shown. c The images generated by the SNN at the end of training for the audio input (top), the images generated after 10^4 s (middle), and the images generated with drift compensation (bottom). The brightness of each pixel represents the spike rate for the duration of each character.

The ability of a PCM-based SNN to retain the trained state is evaluated by reading the conductances at logarithmically spaced time intervals and using them to calculate the network response. Both the spike-time accuracy (Fig. 5a) and the average spike rate (depicted as pixel intensities in Fig. 5c) drop due to conductance drift over time (Fig. 5b). The decreasing conductance reduces the net current flowing into the neurons, which results in errors in spike times and a drop in the neuron spike rate. However, we show that this can be compensated via an array-level scaling method described below.

The conductance drift in PCM is modeled using the empirical relation Karpov et al. (2007); Le Gallo et al. (2018):

$G(t) = G(T_R) \left( \dfrac{t - T_P}{T_R - T_P} \right)^{-\nu}$   (2)

where $G(t)$ is the conductance of the device at time $t$, $T_P$ denotes the time at which it received a programming pulse, $T_R$ represents the time instant at which its conductance was last read after programming, and $\nu$ is the drift coefficient. Thus, each programming pulse effectively re-initializes the conductance drift Boybat et al. (2018b). As a result, the devices in the array drift by different amounts during training, depending on the instant at which they received their last weight update. However, once sufficient time has elapsed after training (i.e., when $t$ becomes much larger than all the $T_P$ values of the devices in the array), the conductance drift can be compensated by an array-level scaling. In our study, all measured conductances were scaled in proportion to $t^{\bar{\nu}}$, where $t$ is the time elapsed since training and $\bar{\nu}$ is the effective drift coefficient for the conductance range of the devices in the array. Fig. 5 shows the improvement in spike-time accuracy and spike rate obtained using this scaling method. The residual drop in accuracy after compensation can be attributed to the conductance-state dependency and variability of the drift coefficient. The inference performance of SNNs using PCM synapses could be further improved by reducing the inherent conductance drift of the devices. The recently demonstrated projected-PCM cell architecture, with an order of magnitude lower drift coefficient, is a promising step in this direction Koelmans et al. (2015); Giannopoulos et al.
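
The drift model of Eq. (2) and the global compensation can be sketched as follows (Python; the drift coefficient value is illustrative only).

import numpy as np

def drifted_conductance(g_ref, t, t_prog, t_read, nu=0.05):
    """Empirical drift model of Eq. (2): G(t) = G(T_R) * ((t - T_P) / (T_R - T_P))**(-nu).

    g_ref:  conductance read at time t_read (after the last programming pulse at t_prog).
    nu:     drift coefficient; 0.05 is only an illustrative value.
    """
    return g_ref * ((t - t_prog) / (t_read - t_prog)) ** (-nu)

def compensate_array(g_measured, t_elapsed, t_ref, nu_eff=0.05):
    """Array-level compensation: scale every read conductance by one global factor."""
    return np.asarray(g_measured) * (t_elapsed / t_ref) ** nu_eff

# Example: a device programmed at t = 0 s and read at 1 s, evaluated again much later
g0 = 5e-6
g_late = drifted_conductance(g0, t=1e4, t_prog=0.0, t_read=1.0)
g_comp = compensate_array(g_late, t_elapsed=1e4, t_ref=1.0)   # recovers ~g0 if nu is uniform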

Discussion

One of the key questions that we have evaluated in this work is the ability of stochastic analog memory devices to represent the synaptic strength in SNNs that are trained to create spikes at precise time instances. As opposed to supervised learning in second-generation ANNs, whose network output is typically determined by normalization functions such as softmax, learning to generate multiple spikes at precise time instants is a harder problem. Compared to classification problems, whose accuracy depends only on the relative magnitude of the response of one out of several output neurons, the task here is to generate close to 1000 spikes at the desired time instances over a period of 1250 ms from 168 spiking neurons, which are excited by only 132 spike streams. Furthermore, the high correlation observed among several input spike streams (due to the inherent correlations present in the frequency components of the input audio signal) also makes the learning problem challenging for networks with low-precision weights. While the spike-rate-based pixel intensity plots clearly represent the desired images, we chose to evaluate our training performance using an accuracy metric defined in terms of spike-time tolerance, since SNNs designed to process precise spike times rather than spike rates can be expected to have higher energy efficiency and smaller response times.

At the same time, the observed conductance characteristics of biological synapses are not all too different from those exhibited by our nanoscale phase-change memory devices. The PCM device conductance changes in a stochastic manner when programmed using partial-SET pulses, and the conductance saturates after a limited number of pulses, corresponding to a bit precision of only a few bits. Synaptic transmission in biology is also observed to be stochastic and quantized, and previous studies have estimated that biological synapses have a precision of about 4.7 bits Bartol et al. (2015).

However, a major difference between our experiments and biology lies in the dynamics of the spiking neurons and the learning algorithms used for weight updates. We have implemented the highly simplified leaky-integrate-and-fire model with an artificial refractory period to model the neuronal dynamics. Numerous studies have pointed out that neuronal integration and spiking in biology is a highly non-linear and error-tolerant process, with the most striking behavior revealed by the experiments of Mainen and Sejnowski, which showed extremely reliable spiking of neocortical neurons excited by noisy input currents Mainen and Sejnowski (1995b). Such non-linear behaviors may also play a key role in allowing biological networks to create spikes with greater reliability and precision.

While several algorithms have been developed from mathematical formulations of cost functions involving spike rates and spike times, the mechanisms employed by nature to achieve the same task are still not well understood. Most of the neuroscience literature focuses on local learning rules such as Hebbian plasticity, STDP, and triplet-STDP. It is not clear how these different local unsupervised learning rules come together to enable biological networks to encode and process information using precise spike times. Nevertheless, the artificial algorithms being developed are achieving increasing success in showing software-equivalent performance on several common benchmark tasks in machine learning.

In summary, we analyzed the potential of PCM devices to realize synapses in SNNs that learn to generate spikes at precise time instances, via large-scale (approximately 180,000 PCM devices) supervised training and inference experiments and simulations. We proposed several strategies to improve the performance of these PCM-based learning networks and to compensate for device-level non-idealities. For example, improving the synaptic update granularity via multi-PCM configurations increases the training accuracy. Also, the performance drop during inference due to conductance drift can be compensated via an array-level scaling based on a global factor that depends only on the time elapsed since training. We successfully demonstrate that, in spite of the state-dependent conductance update and drift behavior, networks with PCM synapses can be trained to generate spikes with a few milliseconds of precision. In conclusion, PCM-based computational memory is a promising candidate for realizing energy-efficient, bio-mimetic parallel architectures that process time-encoded SNNs in real time.

Methods

Audio to spike conversion

The silicon cochlea chip has 64 band-pass filters with frequency bands logarithmically distributed from 8 Hz to 20 kHz and generates spikes representing left and right channels. Further, due to the synchronous delta modulation scheme used to create the spikes, there were on-spikes and off-spikes. The silicon cochlea generated spikes with a time resolution of 1 µs. The spikes were further sub-sampled to a time resolution of 0.1 ms. The final input spike streams used for the training experiments have an average spike rate of 10 Hz. Combining all the filter responses with non-zero spikes for the left and right channels and the on- and off-spikes, there are 132 input spike streams.
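
A minimal sketch of this sub-sampling step is shown below (Python/NumPy; the random thinning policy and the binning are our assumptions of one simple way to reach the target rate, not necessarily the exact procedure used).

import numpy as np

def subsample_spike_stream(spike_times_us, duration, target_rate=10.0, dt=1e-4, rng=None):
    """Thin a dense cochlea spike stream to roughly target_rate Hz and bin it to a dt grid.

    spike_times_us: spike times from the cochlea in microseconds.
    duration:       stream duration in seconds.
    Returns a binary array of length duration / dt.
    """
    rng = np.random.default_rng() if rng is None else rng
    times = np.asarray(spike_times_us, dtype=float) * 1e-6       # microseconds -> seconds
    n_keep = int(round(target_rate * duration))
    if times.size > n_keep:
        times = rng.choice(times, size=n_keep, replace=False)    # random thinning to target count
    binary = np.zeros(int(round(duration / dt)), dtype=bool)
    idx = np.minimum((times / dt).astype(int), binary.size - 1)
    binary[idx] = True
    return binary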

Neuron model

The SNN output neurons were modeled using the leaky-integrate-and-fire (LIF) model. The membrane potential $V(t)$ is given by the differential equation

$C \dfrac{dV}{dt} = -g_L \left( V - E_L \right) + I_{syn}(t)$

where $C$ is the membrane capacitance, $g_L$ is the leak conductance, $E_L$ is the leak reversal potential, and $I_{syn}(t)$ is the net synaptic current flowing into the neuron. When $V$ exceeds a threshold voltage $V_T$, it is reset to its resting value and a spike is assumed to be generated. Once a spike is generated, the neuron is prevented from creating another spike within a short time period called the refractory period $t_{ref}$. For the training experiment, fixed values of $C$ (in pF), $g_L$ (in nS), $E_L$ and $V_T$ (in mV), and $t_{ref}$ (in ms) were used. For the NormAD training algorithm, the approximate impulse response of the LIF neuron is given as $\hat{h}(t) = \frac{1}{C} e^{-t/\tau_L} H(t)$, where $\tau_L = C/g_L$ and $H$ is the Heaviside step function. During training, the neuron responses were simulated with a sub-millisecond time resolution.
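
Assuming the symbols defined above, the quantity $\hat{d}(t)$ used by NormAD can be sketched as follows (Python/NumPy; all constants are illustrative placeholders, not the experiment's exact values).

import numpy as np

def d_hat(spikes_in, dt=1e-4, tau1=5e-3, tau2=1.25e-3, C=300e-12, gL=30e-9):
    """Convolve input spike streams with the synaptic kernel and the approximate
    LIF impulse response h_hat(t) = (1/C) * exp(-t / tau_L), with tau_L = C / gL.
    """
    n_in, n_steps = spikes_in.shape
    t = np.arange(n_steps) * dt
    kappa = np.exp(-t / tau1) - np.exp(-t / tau2)          # synaptic current kernel
    tau_L = C / gL
    h_hat = (1.0 / C) * np.exp(-t / tau_L)                 # approximate LIF impulse response
    combined = np.convolve(kappa, h_hat)[:n_steps] * dt    # discrete approximation of kappa * h_hat
    return np.array([np.convolve(s.astype(float), combined)[:n_steps] for s in spikes_in])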

PCM platform

The experimental platform is built around a prototype chip of 3 million PCM cells. The PCM devices are based on doped-GeSbTe integrated in 90 nm CMOS technology Breitwisch et al. (2007). The fabricated PCM cell area is 50 F² (F is the feature size of the 90 nm technology node), and each memory device is connected to two parallel 240 nm wide n-type FETs. The chip has circuitry for cell addressing, an ADC for readout, and circuits for voltage- or current-mode programming.

The PCM chip is interfaced with a Matlab workstation via FPGA boards and a high-performance analog front-end (AFE) board. The AFE board implements digital-to-analog converters, electronics for power supplies, and voltage and current references. One FPGA board implements the digital logic for interfacing the PCM chip with the AFE board and performs data acquisition. A second FPGA board has an embedded processor and an Ethernet unit for overall system control and data management.

Experiment

The SNN training problem was initially simulated using double-precision (FP64) synapses in the Matlab simulation environment. The trained weights of the SNN were approximately in the range [-6000, 6000]. To map the weights to PCM conductance values in a multi-PCM configuration, the conductance contribution from each device is assumed to lie in the range [0.1 µS, 8 µS]. The conductance values are read from the hardware using a constant read voltage of 0.3 V, scaled to the network weights, and used for the matrix-vector multiplications in the software simulator. When a different number of PCM devices is used per synapse, a scaling factor is determined such that the total conductance maps to the same weight range. The weight updates determined by the training algorithm at the end of an epoch are programmed into the PCM devices using partial-SET pulses of 50 ns duration and amplitudes in the range [40 µA, 130 µA]. The device conductance values are read after each epoch and used to update the SNN synapse values. Since the conductance values are read and programmed serially, each training epoch was emulated in an average of 6.3 s.
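
The choice of scaling factor can be sketched as follows (Python; a hypothetical helper that simply matches the maximum representable differential conductance to the maximum weight magnitude, under the ranges stated above).

def weight_scaling_factor(w_max=6000.0, g_max=8e-6, g_min=0.1e-6, n_per_branch=4):
    """Scaling factor beta such that n_per_branch fully SET devices represent w_max.

    With W = beta * (sum(Gp) - sum(Gn)), one branch at g_max and the other at g_min gives
    the largest representable |W| = beta * n_per_branch * (g_max - g_min).
    """
    return w_max / (n_per_branch * (g_max - g_min))

beta_4 = weight_scaling_factor()                   # four devices per branch, as in the experiment
beta_8 = weight_scaling_factor(n_per_branch=8)     # doubling the devices halves the required beta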

For inference, the PCM conductance values were read at logarithmic time intervals after the 100 epochs of training, and the effect of the compensation scheme was evaluated in the software simulator.

References

  • Lecun, Bengio, and Hinton (2015) Y. Lecun, Y. Bengio,  and G. Hinton, “Deep learning,” Nature 521, 436–444 (2015).
  • Merolla et al. (2014) P. A. Merolla, J. V. Arthur, R. Alvarez-Icaza, A. S. Cassidy, J. Sawada, F. Akopyan, B. L. Jackson, N. Imam, C. Guo, Y. Nakamura, et al., “A million spiking-neuron integrated circuit with a scalable communication network and interface,” Science 345, 668–673 (2014).
  • Davies et al. (2018) M. Davies, N. Srinivasa, T.-H. Lin, G. Chinya, Y. Cao, S. H. Choday, G. Dimou, P. Joshi, N. Imam, S. Jain, Y. Liao, C.-K. Lin, A. Lines, R. Liu, D. Mathaikutty, S. McCoy, A. Paul, J. Tse, G. Venkataramanan, Y.-H. Weng, A. Wild, Y. Yang,  and H. Wang, “Loihi: A Neuromorphic Manycore Processor with On-Chip Learning,” IEEE Micro 38, 82–99 (2018).
  • Burr et al. (2017) G. W. Burr et al., “Neuromorphic computing using non-volatile memory,” Advances in Physics: X 2, 89–124 (2017).
  • Le Gallo et al. (2018) M. Le Gallo, A. Sebastian, R. Mathis, M. Manica, H. Giefers, T. Tuma, C. Bekas, A. Curioni,  and E. Eleftheriou, “Mixed-precision in-memory computing,” Nature Electronics 1, 246–253 (2018).
  • Sebastian et al. (2018) A. Sebastian, M. Le Gallo, G. W. Burr, S. Kim, M. BrightSky,  and E. Eleftheriou, “Tutorial: Brain-inspired computing using phase-change memory devices,” Journal of Applied Physics 124, 111101 (2018).
  • Xia and Yang (2019) Q. Xia and J. J. Yang, “Memristive crossbar arrays for brain-inspired computing,” Nature materials 18, 309 (2019).
  • Lichtsteiner, Posch, and Delbruck (2008) P. Lichtsteiner, C. Posch,  and T. Delbruck, “A 128×128 120 dB 15 µs Latency Asynchronous Temporal Contrast Vision Sensor,” IEEE Journal of Solid-State Circuits 43, 566–576 (2008).
  • Liu et al. (2014) S. C. Liu, A. Van Schaik, B. A. Minch,  and T. Delbruck, “Asynchronous binaural spatial audition sensor with 2×64×4 channel output,” IEEE Transactions on Biomedical Circuits and Systems 8, 453–464 (2014).
  • Bi and Poo (1998) G.-q. Bi and M.-m. Poo, “Synaptic modifications in cultured hippocampal neurons: dependence on spike timing, synaptic strength, and postsynaptic cell type,” The Journal of Neuroscience 18, 10464–10472 (1998).
  • Burr et al. (2015) G. W. Burr et al., “Experimental demonstration and tolerancing of a large-scale neural network (165 000 synapses) using phase-change memory as the synaptic weight element,” IEEE Transactions on Electron Devices 62, 3498–3507 (2015).
  • Boybat et al. (2018a) I. Boybat, M. Le Gallo, S. R. Nandakumar, T. Moraitis, T. Parnell, T. Tuma, B. Rajendran, Y. Leblebici, A. Sebastian,  and E. Eleftheriou, “Neuromorphic computing with multi-memristive synapses,” Nature Communications 9, 2514 (2018a), arXiv:1711.06507.
  • Ambrogio et al. (2018) S. Ambrogio, P. Narayanan, H. Tsai, R. M. Shelby, I. Boybat, C. di Nolfo, S. Sidler, M. Giordano, M. Bodini, N. C. P. Farinha, B. Killeen, C. Cheng, Y. Jaoudi,  and G. W. Burr, “Equivalent-accuracy accelerated neural-network training using analogue memory,” Nature 558, 60–67 (2018).
  • Kuzum et al. (2011) D. Kuzum, R. G. Jeyasingh, B. Lee,  and H.-S. P. Wong, “Nanoelectronic programmable synapses based on phase change materials for brain-inspired computing,” Nano letters 12, 2179–2186 (2011).
  • Jackson et al. (2013) B. L. Jackson, B. Rajendran, G. S. Corrado, M. Breitwisch, G. W. Burr, R. Cheek, K. Gopalakrishnan, S. Raoux, C. T. Rettner, A. Padilla, et al., “Nanoscale electronic synapses using phase change devices,” ACM Journal on Emerging Technologies in Computing Systems (JETC) 9, 12 (2013).
  • Tuma et al. (2016) T. Tuma, M. Le Gallo, A. Sebastian,  and E. Eleftheriou, “Detecting correlations using phase-change neurons and synapses,” IEEE Electron Device Letters 37, 1238–1241 (2016).
  • Sidler et al. (2017) S. Sidler, A. Pantazi, S. Woźniak, Y. Leblebici,  and E. Eleftheriou, “Unsupervised learning using phase-change synapses and complementary patterns,” in International Conference on Artificial Neural Networks (Springer, 2017) pp. 281–288.
  • Diehl and Cook (2015) P. Diehl and M. Cook, “Unsupervised learning of digit recognition using spike-timing-dependent plasticity,” Frontiers in Computational Neuroscience 9, 99 (2015).
  • Mainen and Sejnowski (1995a) Z. F. Mainen and T. J. Sejnowski, “Reliability of spike timing in neocortical neurons.” Science (New York, N.Y.) 268, 1503–6 (1995a).
  • Reich et al. (1997) D. S. Reich, J. D. Victor, B. W. Knight, T. Ozaki,  and E. Kaplan, “Response variability and timing precision of neuronal spike trains in vivo,” Journal of Neurophysiology 77, 2836–2841 (1997), pMID: 9163398, https://doi.org/10.1152/jn.1997.77.5.2836 .
  • Uzzell and Chichilnisky (2004) V. J. Uzzell and E. J. Chichilnisky, “Precision of spike trains in primate retinal ganglion cells,” Journal of Neurophysiology 92, 780–789 (2004), pMID: 15277596, https://doi.org/10.1152/jn.01171.2003 .
  • Maass (1997) W. Maass, “Noisy Spiking Neurons with Temporal Coding have more Computational Power than Sigmoidal Neurons,” Advances in Neural Information Processing Systems 9, 211–217 (1997).
  • Close et al. (2010) G. F. Close et al., “Device, circuit and system-level analysis of noise in multi-bit phase-change memory,” in IEEE International Electron Devices Meeting (IEDM) (IEEE, 2010) pp. 29.5.1–29.5.4.
  • Nandakumar et al. (2018) S. Nandakumar, M. Le Gallo, I. Boybat, B. Rajendran, A. Sebastian,  and E. Eleftheriou, “A phase-change memory model for neuromorphic computing,” Journal of Applied Physics 124, 152135 (2018).
  • Suri et al. (2011) M. Suri, O. Bichler, D. Querlioz, O. Cueto, L. Perniola, V. Sousa, D. Vuillaume, C. Gamrat,  and B. DeSalvo, “Phase change memory as synapse for ultra-dense neuromorphic systems: Application to complex visual pattern extraction,” in Electron Devices Meeting (IEDM), 2011 IEEE International (2011) pp. 4.4.1–4.4.4.
  • Suri et al. (2013) M. Suri, D. Garbin, O. Bichler, D. Querlioz, D. Vuillaume, C. Gamrat,  and B. Desalvo, “Impact of PCM resistance-drift in neuromorphic systems and drift-mitigation strategy,” Proceedings of the 2013 IEEE/ACM International Symposium on Nanoscale Architectures, NANOARCH 2013 , 140–145 (2013).
  • Boybat et al. (2018b) I. Boybat, S. R. Nandakumar, M. L. Gallo, B. Rajendran, Y. Leblebici, A. Sebastian,  and E. Eleftheriou, “Impact of conductance drift on multi-pcm synaptic architectures,” in 2018 Non-Volatile Memory Technology Symposium (NVMTS) (2018) pp. 1–4.
  • Le Gallo et al. (2017) M. Le Gallo, A. Sebastian, G. Cherubini, H. Giefers,  and E. Eleftheriou, “Compressed sensing recovery using computational memory,” in IEEE International Electron Devices Meeting (IEDM) (IEEE, 2017) pp. 28–3.
  • Xiong et al. (2011) F. Xiong, A. D. Liao, D. Estrada,  and E. Pop, “Low-power switching of phase-change materials with carbon nanotube electrodes,” Science 332, 568–570 (2011).
  • Choi et al. (2012) Y. Choi, I. Song, M. Park, H. Chung, S. Chang, B. Cho, J. Kim, Y. Oh, D. Kwon, J. Sunwoo, J. Shin, Y. Rho, C. Lee, M. G. Kang, J. Lee, Y. Kwon, S. Kim, J. Kim, Y. Lee, Q. Wang, S. Cha, S. Ahn, H. Horii, J. Lee, K. Kim, H. Joo, K. Lee, Y. Lee, J. Yoo,  and G. Jeong, “A 20nm 1.8v 8gb PRAM with 40MB/s program bandwidth,” in Proc. IEEE International Solid-State Circuits Conference (ISSCC) (2012) pp. 46–48.
  • Cao, Chen, and Khosla (2015) Y. Cao, Y. Chen,  and D. Khosla, “Spiking Deep Convolutional Neural Networks for Energy-Efficient Object Recognition,” International Journal of Computer Vision 113, 54–66 (2015), arXiv:1502.05777.
  • Diehl et al. (2016) P. U. Diehl, G. Zarrella, A. Cassidy, B. U. Pedroni,  and E. Neftci, “Conversion of Artificial Recurrent Neural Networks to Spiking Neural Networks for Low-power Neuromorphic Hardware,” (2016), arXiv:1601.04187.
  • Rueckauer et al. (2017) B. Rueckauer, I.-A. Lungu, Y. Hu, M. Pfeiffer,  and S.-C. Liu, “Conversion of Continuous-Valued Deep Networks to Efficient Event-Driven Networks for Image Classification,” Frontiers in Neuroscience 11, 1–12 (2017).
  • Pfeiffer and Pfeil (2018) M. Pfeiffer and T. Pfeil, “Deep learning with spiking neurons: Opportunities and challenges,” Frontiers in Neuroscience 12, 774 (2018).
  • Lee, Delbruck, and Pfeiffer (2016) J. H. Lee, T. Delbruck,  and M. Pfeiffer, “Training deep spiking neural networks using backpropagation,” Frontiers in Neuroscience 10, 508 (2016).
  • Woźniak, Pantazi, and Eleftheriou (2018) S. Woźniak, A. Pantazi,  and E. Eleftheriou, “Deep Networks Incorporating Spiking Neural Dynamics,” 1–9 (2018), arXiv:1812.07040.
  • Bohte, La Poutré, and Kok (2002) S. M. Bohte, H. La Poutré,  and J. N. Kok, “Error-Backpropagation in Temporally Encoded Networks of Spiking Neurons,” Neurocomputing 48, 17–37 (2002).
  • Crotty and Levy (2005) P. Crotty and W. B. Levy, “Energy-efficient interspike interval codes,” Neurocomputing 65-66, 371–378 (2005).
  • Gütig and Sompolinsky (2006) R. Gütig and H. Sompolinsky, “The tempotron: a neuron that learns spike timing-based decisions.” Nature neuroscience 9, 420–8 (2006).
  • Wang et al. (2016) B. Wang, W. Ke, J. Guang, G. Chen, L. Yin, S. Deng, Q. He, Y. Liu, T. He, R. Zheng, Y. Jiang, X. Zhang, T. Li, G. Luan, H. D. Lu, M. Zhang, X. Zhang,  and Y. Shu, “Firing Frequency Maxima of Fast-Spiking Neurons in Human, Monkey, and Mouse Neocortex,” Frontiers in Cellular Neuroscience 10, 1–13 (2016).
  • Ponulak and Kasiński (2010) F. Ponulak and A. Kasiński, “Supervised learning in spiking neural networks with resume: Sequence learning, classification, and spike shifting,” Neural Computation 22, 467–510 (2010), pMID: 19842989, https://doi.org/10.1162/neco.2009.11-08-901 .
  • Anwani and Rajendran (2015) N. Anwani and B. Rajendran, “Normad-normalized approximate descent based supervised learning rule for spiking neurons,” in International Joint Conference on Neural Networks (IJCNN) (IEEE, 2015) pp. 1–8.
  • Kulkarni and Rajendran (2018) S. R. Kulkarni and B. Rajendran, “Spiking neural networks for handwritten digit recognition—Supervised learning and network optimization,” Neural Networks 103, 118–127 (2018).
  • Karpov et al. (2007) I. V. Karpov, M. Mitra, D. Kau, G. Spadini, Y. A. Kryukov,  and V. G. Karpov, “Fundamental drift of parameters in chalcogenide phase change memory,” Journal of Applied Physics 102, 124503 (2007).
  • Le Gallo et al. (2018) M. Le Gallo, D. Krebs, F. Zipoli, M. Salinga,  and A. Sebastian, “Collective structural relaxation in phase-change memory devices,” Advanced Electronic Materials 4, 1700627 (2018).
  • Koelmans et al. (2015) W. W. Koelmans, A. Sebastian, V. P. Jonnalagadda, D. Krebs, L. Dellmann,  and E. Eleftheriou, “Projected phase-change memory devices,” Nature communications 6, 8181 (2015).
  • Giannopoulos et al. I. Giannopoulos, A. Sebastian, M. Le Gallo, V. P. Jonnalagadda, M. Sousa, M. N. Boon,  and E. Eleftheriou, “8-bit Precision In-Memory Multiplication with Projected Phase-Change Memory.”
  • Bartol et al. (2015) T. M. Bartol, C. Bromer, J. P. Kinney, M. A. Chirillo, J. N. Bourne, K. M. Harris,  and T. J. Sejnowski, “Hippocampal Spine Head Sizes are Highly Precise,” bioRxiv , 016329 (2015).
  • Mainen and Sejnowski (1995b) Z. Mainen and T. Sejnowski, “Reliability of spike timing in neocortical neurons,” Science 268, 1503–1506 (1995b).
  • Breitwisch et al. (2007) M. Breitwisch, T. Nirschl, C. Chen, Y. Zhu, M. Lee, M. Lamorey, G. Burr, E. Joseph, A. Schrott, J. Philipp, et al., “Novel lithography-independent pore phase change memory,” in IEEE Symposium on VLSI Technology (IEEE, 2007) pp. 100–101.

Acknowledgments

We would like to thank Dr. Shih-Chii Liu from the Institute of Neuroinformatics, University of Zurich, for technical assistance with converting the audio input to spike streams using a silicon cochlea chip. A.S. acknowledges support from the European Research Council through the European Unions Horizon 2020 Research and Innovation Program under grant number 682675. B.R. was supported partially by the National Science Foundation through the grant 1710009 and Semiconductor Research Foundation through the grant 2717.001.

Author Contributions

B.R. and A.S. conceived the main ideas in the project. S.R.N and A.S. designed the experiment and S.R.N performed the simulations. I.B. and S.R.N performed the experiment. M.L.G and I.B. provided critical insights. S.R.N and B.R. co-wrote the manuscript with inputs from other authors. B.R., A.S., and E.E. directed the work.