1 Introduction
Since the early 2010s, computer vision has been dominated by convolutional neural networks (CNNs), which have yielded unprecedented success in previously challenging tasks such as image recognition, image segmentation and object detection. Since the theory of neural networks was mostly developed decades earlier, one of the main driving factors behind this evolution was the widespread availability of high-performance computing devices and general-purpose Graphics Processing Units (GPUs). In parallel with the increase in computational requirements (Strubell et al., 2019), recent decades have seen a considerable development of portable, miniaturized, battery-powered devices, which impose constraints on the maximum power consumption.
Attempts at reducing the power consumption of traditional deep learning models have been made. Typically, these involve optimizing the network architecture to find more compact networks (with fewer layers, or fewer neurons per layer) that perform as well as larger networks. One approach is energy-aware pruning, where connections are removed according to a criterion based on energy consumption, and accuracy is restored by fine-tuning the remaining weights (Yang et al., 2017; Molchanov et al., 2016). Other work looks for more efficient network structures through a full-fledged architecture search (Cai et al., 2018). The latter work was one of the winners of the Google “Visual Wake Words Challenge” at CVPR 2019, which sought models with memory usage under 250 kB, model size under 250 kB and a per-inference multiply-add count (MAC) under 60 million.
Using spiking neural networks (SNNs) on neuromorphic hardware is an entirely different, and much more radical, approach to the energy consumption problem. In SNNs, as in biological neural networks, neurons communicate with each other through isolated, discrete electrical signals (spikes), as opposed to continuous signals, and operate in continuous rather than discrete time. Neuromorphic hardware (Indiveri et al., 2011; Furber, 2016; Thakur et al., 2018; Esser et al., 2016) is specifically designed to run such networks with very low power overhead, using electronic circuits that faithfully reproduce the dynamics of the model in real time, rather than simulating it on traditional von Neumann computers.
The challenge of using SNNs for machine learning tasks, however, lies in their training. Mimicking the learning process of the brain's spiking networks is not yet feasible, because neither the learning rules nor the precise fitness functions being optimized are sufficiently well understood, although this is currently a very active area of research (Marblestone et al., 2016; Richards et al., 2019). Supervised learning routines for spiking networks have been developed (Bohte et al., 2002; Mostafa, 2017; Nicola and Clopath, 2017; Neftci et al., 2019; Shrestha and Orchard, 2018), but they are slow and challenging to use. For applications that have little or no dependence on temporal aspects, it is more efficient to train an analog network (i.e., a traditional, non-spiking one) with the same structure, and transfer the learned parameters onto the SNN, which can then operate through rate coding. In particular, the conversion of pretrained CNNs to SNNs has been shown to be a scalable and reliable process, without much loss in performance (Diehl et al., 2015; Rueckauer et al., 2017; Sengupta et al., 2019). But this approach is still challenging, because the naive use of analog CNN weights does not take into account the specific characteristics and requirements of SNNs. In particular, SNNs are more sensitive than analog networks to the magnitude of the input. Naive weight transfer can, therefore, lead to a silent SNN or, conversely, to one with unnecessarily high firing rates, which have a high energy cost.
Here, we propose a hybrid training strategy that maintains the efficiency of training analog CNNs, while accounting for the fact that the network is being trained for eventual use in SNNs. Furthermore, we include the energy cost of the network's computations directly in the loss function during training, in order to minimize it automatically and dynamically. We demonstrate that networks trained with this strategy perform better per joule of energy used. While we demonstrate the benefit of optimizing based on energy consumption, we believe this strategy can be extended to any approach that uses backpropagation to train the network, be it a spiking or a non-spiking network.
In the following sections, we detail the training techniques we devised and applied for these purposes. We test our networks on two standard problems. The first is the MNIST-DVS dataset of Dynamic Vision Sensor (DVS) recordings. DVSs are event-based sensors and, as such, the analysis of their recordings is an ideal application of spike-based neural networks. The second is the standard CIFAR-10 object recognition benchmark, which allows a reasonable comparison of computational cost with non-spiking networks. For each of these tasks, we demonstrate the energy-accuracy trade-off of the networks trained with our methods. We show that significant amounts of energy can be saved with a small loss in performance, and conclude that ours is a viable strategy for training neuromorphic systems with a limited power budget.
2 Materials and Methods
In most state-of-the-art neuromorphic architectures with time-multiplexed units, such as those of Merolla et al. (2014), Davies et al. (2018) and Furber et al. (2014), the various states need to be fetched from memory and rewritten. Such operations happen every time a neuron receives a synaptic event, and each time one of them is performed, the neuromorphic hardware consumes a certain amount of energy. For instance, Indiveri and Sandamirskaya (2019) report the typical order of magnitude of this energy consumption. While several other processes consume power on a neuromorphic device, the bulk of the active power on these devices is used by the synaptic operations. Reducing their number is therefore the most natural way to keep energy usage low.
In this paper, we explore strategies to reduce the number of synaptic operations and evaluate their effect on the network's computational performance.
2.1 Training strategies
2.1.1 Parameter scaling
By scaling the weights, biases and/or thresholds of neurons in different layers, we can influence the number of spikes generated in each layer, thereby allowing us to tune the synaptic activity of the model. This is easy to do, even with pretrained weights. For a scale-invariant network, such as a network without bias terms whose only nonlinearities are ReLUs, this method attains perfect results: a positive linear rescaling of the weights causes a linear rescaling of the output, which gives identical results in classification tasks, where we select the class that receives the highest activation.
We use this method as a baseline comparison for our results. We chose to rescale the weights of the first convolutional layer of our network by a variable factor $k$:
$$ W^{(1)} \rightarrow k \, W^{(1)} $$
which is equivalent to a rescaling of the input signal by the same factor. Note that an increase or decrease in the first layer's output firing rate causes a corresponding increase or decrease in the activity of all subsequent layers, and thereby in the global energy consumption of the whole network.
However, the aforementioned scale-invariance property does not hold for the corresponding spiking network, and small activation values could cause discretization errors, or even yield a completely silent spiking network from a perfectly functional analog network.
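The scale-invariance argument, and the way it breaks once activations are discretized, can be checked on a toy example. The following is a minimal plain-Python illustration, not the code used in this work. It uses a tiny two-layer network without biases, evaluated once with ReLUs and once with floor-quantized ReLUs that stand in for spike counts. The weights, input and factor `k` are arbitrary hypothetical values.

```python
import math


def relu(x):
    return max(0.0, x)


def qrelu(x):
    # Quantized ReLU: integer activations, a crude stand-in for spike counts.
    return max(0, math.floor(x))


def forward(x, w1, w2, act):
    """Two bias-free fully connected layers, each followed by `act`."""
    hidden = [act(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    return [act(sum(w * h for w, h in zip(row, hidden))) for row in w2]


def scale_first_layer(w1, k):
    """Rescale the first layer's weights by k > 0 (same as rescaling the input)."""
    return [[k * w for w in row] for row in w1]


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical toy weights and input.
x = [1.0, 2.0]
w1 = [[1.5, 0.7], [0.3, 1.1]]
w2 = [[1.0, 0.2], [0.1, 1.3]]
k = 0.1

out = forward(x, w1, w2, relu)
out_scaled = forward(x, scale_first_layer(w1, k), w2, relu)
# With ReLUs the output is simply multiplied by k, so the predicted class is unchanged.
print(out, out_scaled, argmax(out) == argmax(out_scaled))

q_out = forward(x, w1, w2, qrelu)
q_out_scaled = forward(x, scale_first_layer(w1, k), w2, qrelu)
# With quantized activations, small values are floored to zero: the scaled network is silent.
print(q_out, q_out_scaled)
```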
2.1.2 Synaptic Operation optimization
We measure the activity of the network, for each layer group, at the outputs of the ReLU operations, which effectively correspond to the spikes of an equivalent SNN. We denote the activity of neuron $i$ in layer $l$ as $a_i^l$. We define the fan-out of each group of layers, $f_l$, as the number of units of layer $l+1$ that receive the signal emitted by a single neuron in layer $l$. This measure is essential in estimating the number of synaptic operations (SynOps) elicited by each layer, which we sum over all layers:
$$ S = \sum_l f_l \sum_i a_i^l \tag{1} $$
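We directly add this number to the loss we want to minimize, optionally specifying a target value $S_0$ for the desired number of SynOps:
$$ \mathcal{L} = \mathrm{CE}(\mathbf{y}, \mathbf{t}) + \left( \frac{S - S_0}{\alpha} \right)^2 \tag{2} $$
where CE is the cross-entropy loss, $\mathbf{y}$ is the network output, $\mathbf{t}$ is the target label, and $\alpha$ is a constant. We will refer to this additional term as the SynOp loss. In this work, we always choose $\alpha = S_0$, in order to keep the SynOp loss term normalized independently of $S_0$.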
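Eq. (1) and the SynOp-augmented loss amount to simple bookkeeping, sketched below in plain Python as an illustration rather than the released implementation. Several details are assumptions of this sketch. Activities are given per layer group as flat lists (e.g. QReLU outputs). The fan-out of a stride-1 convolution is approximated as out_channels × kernel_size², ignoring border effects. The SynOp term takes the squared, normalized form ((S − S0)/α)² with α = S0 by default. In an actual training loop the same arithmetic would be done on PyTorch tensors, so that autograd can propagate the gradient of the SynOp term back to the weights.

```python
def conv_fanout(out_channels, kernel_size):
    """Approximate fan-out of one input neuron of a stride-1 2D convolution.

    Interior neurons are seen by kernel_size x kernel_size positions in each
    output channel; neurons near the border reach fewer units (ignored here).
    """
    return out_channels * kernel_size * kernel_size


def synops(activities, fanouts):
    """Eq. (1): S = sum over layers l of f_l times the total activity of layer l."""
    return sum(f * sum(a) for a, f in zip(activities, fanouts))


def synop_loss(ce_loss, s, s0, alpha=None):
    """Cross-entropy plus a squared penalty pulling S towards the target S0.

    alpha defaults to S0, which keeps the penalty normalized independently of S0.
    """
    alpha = s0 if alpha is None else alpha
    return ce_loss + ((s - s0) / alpha) ** 2


# Hypothetical example: three layer groups, the first two feeding 3x3 convolutions.
fanouts = [conv_fanout(8, 3), conv_fanout(4, 3), 10]   # [72, 36, 10]
activities = [[1, 0, 2], [3, 1], [0, 2]]               # integer (QReLU) activations
s = synops(activities, fanouts)                        # 72*3 + 36*4 + 10*2
print(s, synop_loss(ce_loss=0.5, s=s, s0=200))
```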
2.1.3 Quantization-aware training and surrogate gradient
Optimizing for energy consumption with the SynOp loss mentioned above has unintended consequences. During training, the optimizer tries to achieve smaller activations, but cannot account for the fact that, when the activations are too small, discretization errors become more prominent. To solve this issue, we introduce a form of quantization during the training itself, in order to mimic the behaviour of a spiking network in the context of an analog one. To this end, we turn all ReLU activation functions into “quantized” (i.e. stepwise) ReLUs, which additionally truncate the inputs to integers, as follows:
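$$ \mathrm{QReLU}(x) = \max\left(0, \lfloor x \rfloor\right) \tag{3} $$
where $\lfloor \cdot \rfloor$ indicates the floor operation. This choice introduces a further problem: this function is discontinuous, and its derivative is uniformly zero wherever it is defined. To avoid the zeroing of gradients during the backward pass, we use a surrogate gradient method (Neftci et al., 2019), whereby the gradient of QReLU is approximated with the gradient of a normal ReLU during the backward pass:
$$ \frac{\partial \, \mathrm{QReLU}(x)}{\partial x} \approx \frac{\partial \, \mathrm{ReLU}(x)}{\partial x} = \begin{cases} 1 & x > 0 \\ 0 & x \le 0 \end{cases} \tag{4} $$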
This is not the only way to approximate the gradient of a stepwise function in a meaningful way, and closer approximations are certainly possible; however, we found that this linear approximation works sufficiently well for our purposes.
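A minimal plain-Python sketch of the QReLU and its ReLU surrogate gradient follows. It is an illustration, not the authors' code. In PyTorch this would typically be written as a `torch.autograd.Function` subclass whose `backward` multiplies the incoming gradient by `(input > 0)`. The toy training loop uses a single hypothetical weight, pushed so that the quantized output reaches 3. It shows why the surrogate is needed: with the true derivative, which is zero wherever it is defined, the weight never moves.

```python
import math


def qrelu(x):
    """QReLU(x) = max(0, floor(x)): integer-valued, step-wise activation."""
    return max(0, math.floor(x))


def true_grad(x):
    """Exact derivative of the step function: zero wherever it is defined."""
    return 0.0


def surrogate_grad(x):
    """Surrogate: the derivative of a plain ReLU (1 for x > 0, else 0)."""
    return 1.0 if x > 0 else 0.0


def train_scalar(w, x, target, lr, steps, grad_fn):
    """Gradient descent on (QReLU(w * x) - target)^2 for a single weight w."""
    for _ in range(steps):
        pre = w * x
        dloss_dout = 2.0 * (qrelu(pre) - target)
        # Chain rule, with grad_fn standing in for dQReLU/dpre.
        w -= lr * dloss_dout * grad_fn(pre) * x
    return w


w_true = train_scalar(0.5, 1.0, target=3, lr=0.1, steps=20, grad_fn=true_grad)
w_surr = train_scalar(0.5, 1.0, target=3, lr=0.1, steps=20, grad_fn=surrogate_grad)
print(w_true, qrelu(w_true))   # the weight is stuck at its initial value
print(w_surr, qrelu(w_surr))   # the quantized output reaches the target
```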
In this work, we apply QReLUs in combination with the SynOp loss term described in the previous section, but quantization of activations could also be used independently for more accurate training of spiking networks. We note that quantization-aware training in different forms has been used before (Guo, 2018; Hubara et al., 2017), but its typical purpose is to sharply decrease the memory consumption of artificial neural networks (ANNs) by storing both activations and weights as lower-precision numbers (e.g. as int8 instead of the typical float32). PyTorch recently started providing support utilities for this purpose (https://pytorch.org/docs/stable/quantization.html).
2.2 Spiking network simulations with Sinabs
After training, we tested our trained weights in spiking network simulations. Unlike tests done on analog networks, these are time-dependent simulations, which fully account for the time dynamics of the input spike trains and closely mimic the behaviour of a neuromorphic hardware implementation such as DynapCNN (Liu et al., 2019). Our simulations are written using the Sinabs Python library (https://aictx.gitlab.io/sinabs/), which uses non-leaky integrate-and-fire neurons with a linear response function. The subthreshold dynamics of these neurons are described as follows:
$$ C \frac{dv}{dt} = I_{syn} + I_{bias} \tag{5} $$
$$ I_{syn} = W \cdot S_{in}(t) \tag{6} $$
where $v$ is the membrane potential of the neuron, $C$ is a constant, $I_{syn}$ is the synaptic input current, $I_{bias}$ is a constant input current term, $W$ is the synaptic weight matrix and $S_{in}(t)$ is a vector of input spike trains. For the results presented in this paper, we assume $C = 1$ without any loss of generality. Upon reaching a spiking threshold $\theta$, the neuron's membrane potential is reduced by $\theta$ (not reset to zero). As a result, between times $t$ and $t + \Delta t$, for a total, slowly varying input current $I$, a neuron generates a number of spikes given by:
$$ N \approx \left\lfloor \frac{I \, \Delta t}{\theta} \right\rfloor \tag{7} $$
To simulate the equivalent SNN model in Sinabs, the ANN's pretrained weights are transferred directly to it.
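To make the link between Eqs. (5)-(7) and quantized activations concrete, the following plain-Python sketch simulates one layer of non-leaky integrate-and-fire neurons in discrete time (C = 1, one step per unit time) with subtractive reset. It is not the Sinabs API. In particular, emitting several spikes in one step when the membrane potential exceeds several thresholds is an assumption of this sketch. The first example shows the spike count matching the floor of the accumulated input. The second shows that with mixed-sign inputs the order of input spikes changes the outcome, one reason why analog and spiking activity can differ.

```python
import math


def simulate_iaf(input_spikes, weights, threshold=1.0, bias=0.0):
    """Discrete-time non-leaky IAF layer with subtractive reset.

    input_spikes: one list per time step, holding the spikes of each input.
    weights: weights[j][i] connects input i to output neuron j.
    Returns the number of output spikes emitted by each neuron.
    """
    n_out = len(weights)
    v = [0.0] * n_out            # membrane potentials (state reset before each sample)
    counts = [0] * n_out
    for s_in in input_spikes:
        for j in range(n_out):
            i_syn = sum(w * s for w, s in zip(weights[j], s_in))
            v[j] += i_syn + bias             # Euler step with C = 1, dt = 1
            if v[j] >= threshold:
                n = math.floor(v[j] / threshold)
                counts[j] += n
                v[j] -= n * threshold        # subtract the threshold, do not reset to zero
    return counts


# Input 0 spikes at every step, input 1 at every other step, for 10 steps.
spikes = [[1, 1 if t % 2 == 0 else 0] for t in range(10)]
w = [[0.3, 0.45]]
total_input = 0.3 * 10 + 0.45 * 5                        # 5.25
print(simulate_iaf(spikes, w), math.floor(total_input))  # both give 5 spikes

# With mixed-sign weights, spike order matters.
w_mixed = [[1.5, -1.0]]
print(simulate_iaf([[1, 0], [0, 1]], w_mixed))   # excitatory first: one spike
print(simulate_iaf([[0, 1], [1, 0]], w_mixed))   # inhibitory first: no spike
```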
2.3 Digit recognition on DVS recordings
2.3.1 Task and Dataset
As a benchmark to assess the performance of the above training methods, we used an image recognition task on real data recorded by a Dynamic Vision Sensor (DVS). Given a spike train generated by the DVS, our spiking networks identify the class to which the object belongs, corresponding to the fastest-firing neuron in the output layer. For this task, we used the MNIST-DVS dataset at scale 16 (Serrano-Gotarredona and Linares-Barranco, 2015; Liu et al., 2016), a collection of DVS recordings in which digits from the classic MNIST dataset (LeCun et al., 1998) are shown to the DVS camera as they move on a screen. We divided all recordings into chunks, each containing 3000 spikes. During training and testing on the analog convolutional network, each chunk was shown to the network as a single accumulated frame. During testing on the spiking network simulation, the corresponding spike trains were presented to the network with 1 ms time resolution (Figure 1A-C), to simulate the real-time event transmission between the DVS and a neuromorphic chip. This value was chosen to enable reasonable simulation times, but could be lowered if needed. The network state was reset between the presentation of one data chunk and the next. The polarity of events was ignored. Of the original 10000 recordings (1000 per digit from zero to nine), we set aside a fraction as the test set.
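The data preparation described above can be sketched as follows. This is a hypothetical plain-Python version, not the released code. It assumes each DVS event is a `(timestamp_us, x, y, polarity)` tuple with integer microsecond timestamps sorted in time. It also drops an incomplete final chunk, which is an assumption, since the text does not say how leftover events are handled. Each 3000-event chunk yields one accumulated frame for the analog CNN and a list of 1 ms bins for the spiking simulation; polarity is ignored in both.

```python
def chunk_events(events, chunk_size=3000):
    """Split a time-sorted event list into consecutive chunks of chunk_size events.

    The incomplete final chunk, if any, is dropped (an assumption of this sketch).
    """
    n_full = len(events) // chunk_size
    return [events[k * chunk_size:(k + 1) * chunk_size] for k in range(n_full)]


def accumulate_frame(chunk, height, width):
    """Count events per pixel, ignoring polarity: one input frame for the analog CNN."""
    frame = [[0] * width for _ in range(height)]
    for _, x, y, _ in chunk:
        frame[y][x] += 1
    return frame


def bin_events(chunk, dt_us=1000):
    """Group a chunk's events into bins of dt_us microseconds (1 ms by default),
    measured from the chunk's first event: the time-resolved input to the SNN."""
    t0 = chunk[0][0]
    n_bins = (chunk[-1][0] - t0) // dt_us + 1
    bins = [[] for _ in range(n_bins)]
    for t, x, y, _ in chunk:
        bins[(t - t0) // dt_us].append((x, y))
    return bins


# Synthetic stream: 7000 events, one every 10 us, on a 4x4 pixel array.
events = [(10 * k, k % 4, (k // 4) % 4, k % 2) for k in range(7000)]
chunks = chunk_events(events)
frame = accumulate_frame(chunks[0], height=4, width=4)
bins = bin_events(chunks[0])
print(len(chunks), sum(map(sum, frame)), len(bins))
```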
2.3.2 Network architecture
To solve the task described above, we used a simple convolutional neural network with three 2D convolutional layers (3x3 filters), each followed by an average pooling layer (2x2 filters) and a rectified linear unit. We chose average pooling because max pooling is difficult to implement in spiking networks (Rueckauer et al., 2017). The last layer is a linear (fully connected) layer, which outputs the class predictions (Figure 1D). We used a cross-entropy loss function to evaluate the model predictions and optimized the network weights using the Adam optimizer (Kingma and Ba, 2014). Bias parameters were deactivated everywhere in the network. A 50% dropout was applied just before the fully connected layer at training time. The network was implemented using PyTorch (Paszke et al., 2017).
2.4 Object recognition on CIFAR-10
2.4.1 Task and Dataset
To validate the approach on a dataset of higher complexity than MNIST, we also benchmarked our work on CIFAR-10 (Krizhevsky et al., 2009), a visual object classification task. The input images were augmented with random crops and horizontal flips, and then normalized. A 20% dropout rate was applied to the input layer to further augment the input data.
For the experimental results on this dataset, we directly injected the analog pixel values into the first convolutional layer as an input current at each simulation time step, for $T$ time steps. The magnitude of the current was scaled down by the same factor $T$, so that the current accumulated over the whole simulation equals the analog input value. The SynOp and accuracy values were obtained from Sinabs simulations of this length. The network state was reset between the presentation of one image and the next.
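The direct current injection can be illustrated with a single non-leaky integrate-and-fire neuron, again in plain Python under simplifying assumptions: threshold 1, subtractive reset, and several spikes allowed per step. This is not the Sinabs API, and the input values stand for hypothetical, already weighted currents reaching one first-layer neuron. Splitting an input into T equal per-step portions keeps the accumulated current, and hence the final spike count, independent of T. An input that never reaches the threshold produces no spikes at all, which is the kind of discretization effect quantization-aware training is meant to anticipate.

```python
import math


def inject_constant_current(x, n_steps):
    """Split an analog input x into n_steps equal per-step currents (x / n_steps each)."""
    return [x / n_steps] * n_steps


def iaf_spike_count(currents, threshold=1.0):
    """Non-leaky IAF neuron with subtractive reset, starting from v = 0."""
    v, count = 0.0, 0
    for i in currents:
        v += i
        if v >= threshold:
            n = math.floor(v / threshold)
            count += n
            v -= n * threshold
    return count


for n_steps in (10, 100):
    print(n_steps,
          iaf_spike_count(inject_constant_current(2.5, n_steps)),   # floor(2.5) spikes
          iaf_spike_count(inject_constant_current(0.8, n_steps)))   # sub-threshold: silent
```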
2.4.2 Network architecture
To solve the task described above, we used an AllConvNet (Springenberg et al., 2014), a 9-layer convolutional network without bias terms, with 1.9M parameters in total. The ReLU layers in the model, including the last output layer, were replaced with QReLUs. All the convolutional layers in this network are followed by a dropout layer with a rate of 10%, which not only prevents overfitting, but also compensates for the SNN's discrete representation of analog values. Following Springenberg et al. (2014), training lasts 350 epochs, and the learning rate is scaled down by a factor of 10 at epochs 200, 250 and 300. We use the Adam optimizer with weight decay. Note that the model was trained without a ReLU on the last output layer, since the classification layer is harder to train when its outputs can only be positive, while the classification accuracy was tested with a ReLU on the output layer, in order to obtain a network equivalent to the spiking model.
2.4.3 SynOps Optimization
Before training the network with QReLU activations, the network was first trained with ReLUs to obtain an initial set of parameters. The network with QReLUs was then initialized with these parameters, after scaling up the weights of the first layer by a constant factor. The scaling factor was chosen to initialize the network in a state where enough information propagates through the layers for the network to perform reasonably well. Correspondingly, the weights of the last weighted layer were scaled down by the same factor, in order to bring the classification loss back to its original range.
During testing, we measured the ANN and SNN performance in terms of accuracy and SynOps, and found a mismatch between the SynOps measured during training and during testing. There are two main reasons: 1) the output of a dropout layer with dropout rate $p$ is rescaled during training to compensate for the dropped-out activations, but this compensation only holds on average, and the mismatch can become large after a sequence of dropout layers; 2) the network operates on discrete spike events, whose order (not only their count) matters, so the spike-count-based analog activations do not exactly match the actual spiking activity.
To compensate for this mismatch, we tested the performance of all trained models with two different scaling factors for the first-layer weights. Lastly, we optimized the QReLU-based model with the objective of minimizing the classification error given a target number of SynOps. We trained 30 models with progressively lower target SynOps, each initialized with the trained weights of the previous one.
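The initialization and the warm-started sequence of models described in this subsection can be sketched as follows. Everything here is hypothetical scaffolding rather than the released code. Weights are flat lists per layer, and the scaling factor `k` is arbitrary. The SynOp targets are spaced geometrically, which is an assumption, since the text only says they are progressively lower. `train_fn` stands for one full training run with the SynOp loss at a given target. In a bias-free ReLU network, scaling the first layer up by k and the last weighted layer down by k leaves the outputs unchanged while multiplying every hidden activation by k, which helps keep QReLUs from starting out silent.

```python
def rescale_for_qrelu(layer_weights, k):
    """Scale the first layer's weights up by k and the last layer's down by k.

    layer_weights: list of per-layer weight lists (flattened, hypothetical format).
    """
    scaled = [list(w) for w in layer_weights]
    scaled[0] = [k * w for w in scaled[0]]
    scaled[-1] = [w / k for w in scaled[-1]]
    return scaled


def target_schedule(s_start, s_end, n_models=30):
    """Geometrically decreasing SynOp targets, one per model in the sequence."""
    ratio = (s_end / s_start) ** (1.0 / (n_models - 1))
    return [s_start * ratio ** m for m in range(n_models)]


def anneal(params, targets, train_fn):
    """Train one model per target, each warm-started from the previous model."""
    history = []
    for s0 in targets:
        params = train_fn(params, s0)
        history.append((s0, params))
    return history


weights = rescale_for_qrelu([[1.0, 2.0], [3.0], [4.0, 8.0]], k=4.0)
targets = target_schedule(1000.0, 100.0)
# Stand-in "training": just counts how many runs the parameters have been through.
history = anneal(0, targets, lambda p, s0: p + 1)
print(weights, round(targets[0]), round(targets[-1]), history[-1][1])
```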
3 Results
3.1 The SynOp loss term leads to a reduction in network activity on DVS data
We used three methods to reduce the activity of the network in a way that yields energy savings. First, as a baseline, we trained a traditional CNN using a cross-entropy loss function, and scaled down the weights of its first layer. This is equivalent to rescaling the input values, and has the effect of proportionally reducing the activity in all subsequent layers of the network. Second, we introduced an additional term in the loss function, the SynOp loss, which directly pushes the estimated number of SynOps towards a given value. We trained several CNN models, each with a different target number of synaptic operations, independently of each other. However, excessive reduction of the SynOps leads to the silencing of certain neurons and to other discretization errors, causing an immediate drop in accuracy. To account for this, as a third method, we jointly used the SynOp loss term and quantization-aware training.
We tested our training methods on a real-world use case of spiking neural networks. Dynamic Vision Sensors (DVS) are used in neuromorphic engineering as very-low-power sources of visual information, and are a natural data source for SNNs simulated on neuromorphic hardware. We transferred the weights learned with the methods described above onto a spiking network simulation, and used it to identify the digits presented to the DVS in the MNIST-DVS dataset.
Our results show that adding a requirement on the number of synaptic operations to the loss yields better results in terms of accuracy compared to rescaling weights (Figure 2, orange). Using the SynOp loss together with quantization during training outperforms the simpler methods, allowing for further reduction of the SynOps value with smaller losses in accuracy (Figure 2, blue).
Among the models trained in this way, we selected one with a good balance between energy consumption and accuracy, and used it for a direct comparison with the baseline (that is, weights from an ANN trained without quantization and without additional loss terms). The second and third panels of Figure 2 show the large decrease in the number of synaptic operations required by each layer of our model, and the very small reduction in performance. This particular model brings accuracy down from 96.3% to 95.0%, while reducing the number of synaptic operations from 3.86M to 0.63M, an 84% reduction of the SynOp-related energy consumption.
3.2 The SynOp loss leads to a lower operations count compared to ANNs on CIFAR-10
SNNs are a natural way of processing DVS events, having advantages over ANNs in event-driven processing. However, it is also interesting to highlight the benefits of using SNNs over ANNs in conventional, non-spiking computer vision tasks such as CIFAR-10, where SNNs can still offer advantages in power consumption. As described in Section 2.4, we trained the network with two approaches: 1) conventional ANN training plus weight scaling, as the baseline; 2) further training with QReLUs and the SynOp loss for performance optimization.
3.2.1 Weight Scaling
We first trained the analog AllConvNet on CIFAR-10, attaining an accuracy of 91.37% with 306M MACs (red star in Figure 3). Then, we transferred the trained weights directly onto the equivalent SNN and scaled the weights of its first layer to manipulate the overall activity level. This is shown by the light blue crosses in Figure 3: as the SynOp count grows, so does the accuracy. However, when the accuracy reaches an acceptable 90.7%, the SynOps are around 10 times the MAC count of the ANN. To improve on this result, we fine-tuned this training by adding quantization and the SynOp loss.
A faster way to measure the same quantities is to test the analog model with all ReLU layers replaced by QReLUs, and to count the activation levels instead of the Sinabs spike counts. Estimates based on quantized CNNs are shown as red crosses in Figure 3. The accuracy and SynOps of the analog network and of its spiking equivalent are well aligned, showing that quantized activations are a good proxy for the firing rates of the simulated SNN, at least in this regime.
3.2.2 SynOp loss optimization
We further fine-tuned one of the weight-scaled models obtained above, with the addition of quantization-aware training and the SynOp loss. Figure 3 also shows the classification accuracy and SynOps for both the quantized-analog and spiking models (blue and green dots, respectively) trained with this method.
Multiple SNN test trials achieve better accuracy than the original ANN model (red star, 91.37%), thanks to the further training with QReLUs. As the SynOps go down, the accuracy stays above that of the original ANN model down to 277M SynOps, where it is 91.43% (see one of the green ‘+’ in Figure 3). Note that at this point the SNN outperforms the ANN both in accuracy and in operation count, since the original ANN requires 306M MACs. As another good example of the accuracy-SynOp trade-off (90.37% at 127M SynOps), our model still performs reasonably well, above 90% accuracy, while requiring 58% fewer operations than the original ANN (a Syn/MAC ratio of 0.42). Therefore, running the SNN model on neuromorphic hardware should benefit in energy efficiency not only from the lower computational cost of each SynOp, but also from the significant reduction in operation counts. Additionally, the plot shows that this method outperforms weight scaling in terms of operation counts by roughly a factor of 10 for all accuracy values.
To the best of our knowledge, our SNN converted from the AllConvNet reaches state-of-the-art accuracy among SNN models, at 91.75% (see the detailed comparison in Table 1 and Figure 3). In addition, our model is the smallest, at 1.9M parameters, while the BinaryConnect model (Rueckauer et al., 2017) is 7 times larger and WeightNorm, based on a VGG16 (Sengupta et al., 2019), is 8 times larger. Although achieving the best accuracy requires 2,179M SynOps, this can easily be reduced by 27% by giving up 0.02% in accuracy (see the two green ‘+’ at the top right of Figure 3). Compared to the result of Sengupta et al. (2019) (purple triangle on the right of Figure 3), our model achieves 91.47% accuracy at 368M SynOps, thus losing only 0.08% in accuracy while saving 41% of the SynOps and energy. Thanks to the optimization of the SynOp loss, the number of SynOps can be pushed down continuously while keeping an appropriate accuracy, e.g. 85.71% at 64M SynOps. This result not only outperforms most of the early attempts at SNN models for the CIFAR-10 task (Panda and Roy, 2016; Cao et al., 2015; Hunsberger and Eliasmith, 2015), but also brings the SynOps down to only 1/5 of the MAC count and saves 86% of the energy compared to Rueckauer et al. (2017).
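The relative savings quoted above follow directly from the reported counts: our model's SynOps compared with the 618M SynOps of Sengupta et al. (2019), the AllConvNet's 306M MACs, and the 460M SynOps of Rueckauer et al. (2017). As a quick check:

$$
\begin{aligned}
\frac{368\,\mathrm{M}}{618\,\mathrm{M}} &\approx 0.595 &&\Rightarrow \approx 41\%\ \text{fewer SynOps than Sengupta et al. (2019)} \\
\frac{127\,\mathrm{M}}{306\,\mathrm{M}} &\approx 0.415 &&\Rightarrow \approx 58\%\ \text{fewer operations than the ANN's MACs} \\
\frac{64\,\mathrm{M}}{306\,\mathrm{M}} &\approx 0.209 &&\Rightarrow \approx 1/5\ \text{of the ANN's MACs} \\
\frac{64\,\mathrm{M}}{460\,\mathrm{M}} &\approx 0.139 &&\Rightarrow \approx 86\%\ \text{fewer SynOps than Rueckauer et al. (2017)}
\end{aligned}
$$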
In summary: 1) the energy-aware training strategy reduces the SynOps by a factor of 10 compared to its weight-scaling baseline; 2) the QReLU-trained SNN achieves state-of-the-art accuracy on the CIFAR-10 task; and 3) the accuracy-energy trade-off shows significant savings in computational cost and energy compared to existing SNN models and to the equivalent non-spiking CNN.
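Table 1: Comparison of SNN models on CIFAR-10 (N. par.: number of parameters; MAC: multiply-add count; Syn/MAC: ratio of SynOps to MACs).

| SNN Models | N. par. | MAC | Best Accuracy: Acc. (%) | Best Accuracy: SynOps | Best Accuracy: Syn/MAC | Accuracy-SynOps trade-off: Acc. (%) | Accuracy-SynOps trade-off: SynOps | Accuracy-SynOps trade-off: Syn/MAC |
|---|---|---|---|---|---|---|---|---|
| BinaryConnect | 14M | 616M | 90.85 | N/A | N/A | 84.87 | 460M | 0.75 |
| WeightNorm | 15M | 313M | 91.55 | 618M | 1.98 | 91.55 | 618M | 1.98 |
| Ours | 1.9M | 306M | 91.75 | 2,179M | | | | |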
4 Discussion and Conclusion
We presented two techniques that significantly reduce the energy requirements of machine learning models running on neuromorphic hardware, while maintaining similar performance.
The first improvement consisted in optimizing the energy expenditure by adding it directly to the loss function during training. This method encourages smaller activations in all neurons, which is not in itself an issue in analog models, but can lead to discretization errors, due to the lower firing rates, once the weights are transferred to a spiking network. To solve this problem, we introduced the second improvement: quantization-aware training, whereby the network activity is quantized at each layer, i.e. only integer activations are allowed. Discretizing the network's activity would normally reduce all gradients to zero: we showed that this can be solved by substituting the true gradient with a surrogate.
Applying these two methods together, we achieved an up to tenfold drop in the number of synaptic operations, and in the consequent energy consumption, on the MNIST-DVS task, with only a minor (1-2%) loss in performance. To demonstrate the scalability of this approach, we also showed that, as the network grows bigger to solve the much more complex task of CIFAR-10 image classification, the SynOps are reduced to 42% of the MACs while losing 1% of accuracy (90.37% at 127M). The accuracy-energy trade-off can be flexibly tuned at training time.
Our work emphasizes the fact that each layer's activity contributes differently to the total synaptic activity, depending on the layer's fan-out. Consequently, the learning algorithm could potentially converge to a different set of weights than if one were to simply perform an L1 or L2 cost optimization (Neil et al., 2016) on the total activity of the network.
While training based on static frames is not the optimal approach to leverage all the benefits of spike-based computation, it enables fast training with state-of-the-art deep learning tools. In addition, the hybrid strategy to train SNNs based on a target power metric is unique to SNNs. By contrast, optimizing the energy requirements of an ANN/CNN requires modifying the network architecture itself, which can require large amounts of computational resources (Cai et al., 2018). In this work we demonstrated that we can train an SNN to a target energy level without the need to alter the network hyperparameters. The quantization and SynOp-based optimization used in this paper can potentially be applied, beyond the method illustrated here, in more general contexts, such as algorithms based on backpropagation through time, to reduce power usage.
Such a reduction in power usage can make a large difference when the model is run on a mobile, battery-powered neuromorphic device, with potential for a significant impact in industrial applications.
Author Contributions
SS designed research; QL and SS contributed to the methods; MS, QL, and MB contributed code and performed experiments; all authors wrote the paper.
Funding
This work is supported in part by H2020 ECSEL grant TEMPO (826655) and by aiCTX AG. The authors performed this work as part of their duties at aiCTX AG.
Acknowledgments
The authors would like to thank Mr. Felix Bauer, Mr. Ole Richter, Dr. Dylan Muir and Dr. Ning Qiao for their support and feedback on this work.
Data and Code Availability
The third-party datasets used in this study are available from their respective authors, cited in the main text. The Python/PyTorch code used for training and analysis is publicly available at gitlab.com/aiCTX/synoploss. Reuse and feedback are encouraged, within the terms of the license provided.
References

Bohte, S. M., Kok, J. N., and La Poutre, H. (2002). Error-backpropagation in temporally encoded networks of spiking neurons. Neurocomputing 48, 17–37
Cai, H., Zhu, L., and Han, S. (2018). ProxylessNAS: Direct neural architecture search on target task and hardware. arXiv preprint arXiv:1812.00332
Cao, Y., Chen, Y., and Khosla, D. (2015). Spiking deep convolutional neural networks for energy-efficient object recognition. International Journal of Computer Vision 113, 54–66
Davies, M., Srinivasa, N., Lin, T.-H., Chinya, G., Cao, Y., Choday, S. H., et al. (2018). Loihi: A neuromorphic manycore processor with on-chip learning. IEEE Micro 38, 82–99
Diehl, P. U., Neil, D., Binas, J., Cook, M., Liu, S.-C., and Pfeiffer, M. (2015). Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing. In 2015 International Joint Conference on Neural Networks (IJCNN) (IEEE), 1–8
Esser, S. K., Merolla, P. A., Arthur, J. V., Cassidy, A. S., Appuswamy, R., Andreopoulos, A., et al. (2016). Convolutional networks for fast, energy-efficient neuromorphic computing. Proc. Nat. Acad. Sci. USA 113, 11441–11446
Furber, S. (2016). Large-scale neuromorphic computing systems. Journal of Neural Engineering 13, 051001
Furber, S. B., Galluppi, F., Temple, S., and Plana, L. A. (2014). The SpiNNaker project. Proceedings of the IEEE 102, 652–665
Guo, Y. (2018). A survey on methods and theories of quantized neural networks. arXiv preprint arXiv:1808.04752
Hubara, I., Courbariaux, M., Soudry, D., El-Yaniv, R., and Bengio, Y. (2017). Quantized neural networks: Training neural networks with low precision weights and activations. The Journal of Machine Learning Research 18, 6869–6898
Hunsberger, E. and Eliasmith, C. (2015). Spiking deep networks with LIF neurons. arXiv preprint arXiv:1510.08829
Indiveri, G., Linares-Barranco, B., Hamilton, T. J., Van Schaik, A., Etienne-Cummings, R., Delbruck, T., et al. (2011). Neuromorphic silicon neuron circuits. Frontiers in Neuroscience 5, 73
Indiveri, G. and Sandamirskaya, Y. (2019). The importance of space and time for signal processing in neuromorphic agents: The challenge of developing low-power, autonomous agents that interact with the environment. IEEE Signal Processing Magazine 36, 16–28
Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980
Krizhevsky, A., Hinton, G., et al. (2009). Learning multiple layers of features from tiny images. Tech. rep., Citeseer
LeCun, Y., Cortes, C., and Burges, C. J. (1998). The MNIST database of handwritten digits. URL http://yann.lecun.com/exdb/mnist
Liu, Q., Pineda-García, G., Stromatias, E., Serrano-Gotarredona, T., and Furber, S. B. (2016). Benchmarking spike-based visual recognition: a dataset and evaluation. Frontiers in Neuroscience 10, 496
Liu, Q., Richter, O., Nielsen, C., Sheik, S., Indiveri, G., and Qiao, N. (2019). Live demonstration: Face recognition on an ultra-low power event-driven convolutional neural network ASIC. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops
Marblestone, A. H., Wayne, G., and Kording, K. P. (2016). Toward an integration of deep learning and neuroscience. Frontiers in Computational Neuroscience 10, 94
Merolla, P. A., Arthur, J. V., Alvarez-Icaza, R., Cassidy, A. S., Sawada, J., Akopyan, F., et al. (2014). A million spiking-neuron integrated circuit with a scalable communication network and interface. Science 345, 668–673. doi:10.1126/science.1254642
Molchanov, P., Tyree, S., Karras, T., Aila, T., and Kautz, J. (2016). Pruning convolutional neural networks for resource efficient inference. arXiv preprint arXiv:1611.06440
Mostafa, H. (2017). Supervised learning based on temporal coding in spiking neural networks. IEEE Transactions on Neural Networks and Learning Systems 29, 3227–3235
Neftci, E. O., Mostafa, H., and Zenke, F. (2019). Surrogate gradient learning in spiking neural networks. arXiv preprint arXiv:1901.09948
Neil, D., Pfeiffer, M., and Liu, S.-C. (2016). Learning to be efficient: Algorithms for training low-latency, low-compute deep spiking neural networks. In ACM Symposium on Applied Computing (Proceedings of the 31st Annual ACM Symposium on Applied Computing)
Nicola, W. and Clopath, C. (2017). Supervised learning in spiking neural networks with FORCE training. Nature Communications 8, 2208
Panda, P. and Roy, K. (2016). Unsupervised regenerative learning of hierarchical features in spiking deep networks for object recognition. In 2016 International Joint Conference on Neural Networks (IJCNN) (IEEE), 299–306
Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., et al. (2017). Automatic differentiation in PyTorch. In NIPS Autodiff Workshop
Richards, B. A., Lillicrap, T. P., Beaudoin, P., Bengio, Y., Bogacz, R., Christensen, A., et al. (2019). A deep learning framework for neuroscience. Nature Neuroscience 22, 1761–1770
Rueckauer, B., Lungu, I.-A., Hu, Y., Pfeiffer, M., and Liu, S.-C. (2017). Conversion of continuous-valued deep networks to efficient event-driven networks for image classification. Frontiers in Neuroscience 11, 682
Sengupta, A., Ye, Y., Wang, R., Liu, C., and Roy, K. (2019). Going deeper in spiking neural networks: VGG and residual architectures. Frontiers in Neuroscience 13
Serrano-Gotarredona, T. and Linares-Barranco, B. (2015). Poker-DVS and MNIST-DVS. Their history, how they were made, and other details. Frontiers in Neuroscience 9, 481
Shrestha, S. B. and Orchard, G. (2018). SLAYER: Spike layer error reassignment in time. In Advances in Neural Information Processing Systems. 1412–1421
Springenberg, J. T., Dosovitskiy, A., Brox, T., and Riedmiller, M. (2014). Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806
Strubell, E., Ganesh, A., and McCallum, A. (2019). Energy and policy considerations for deep learning in NLP. arXiv preprint arXiv:1906.02243
Thakur, C. S. T., Molin, J., Cauwenberghs, G., Indiveri, G., Kumar, K., Qiao, N., et al. (2018). Large-scale neuromorphic spiking array processors: A quest to mimic the brain. Frontiers in Neuroscience 12, 891
Yang, T.-J., Chen, Y.-H., and Sze, V. (2017). Designing energy-efficient convolutional neural networks using energy-aware pruning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5687–5695