Introduction
Spiking neural networks, a subcategory of braininspired computing models, use spatiotemporal dynamics to mimic neural behaviors and binary spike signals to communicate between units. Benefit from the eventdriven processing paradigm (computation occurs only when the unit receives spike event), SNNs can be efficiently implemented on specialized neuromorphic hardware for powerefficient processing, such as SpiNNaker [Khan et al.2008], TrueNorth [Merolla et al.2014], and Loihi [Davies et al.2018].
As well known, the powerful error backpropagation (BP) algorithm and larger and larger model size enabled by ANNoriented programming frameworks (e.g. Tensorflow, Pytorch) has boosted the wide applications of ANNs in recent years. However, the rich spatiotemporal dynamics and eventdriven paradigm make SNNs much different from conventional ANNs. Till now SNNs have not yet demonstrated comparable performance to ANNs due to the lack of effective learning algorithms and efficient programming frameworks, which greatly limit the network scale and application spectrum
[Tavanaei et al.2018].In terms of learning algorithms, three challenges exist for training largescale SNNs. First, the complex neural dynamics in both spatial and temporal domains make BP obscure. Specifically, the neural activities not only propagate layer by layer in spatial domain, but also affect the states along the temporal direction, which is more complicated than typical ANNs. Second, the eventdriven spiking activity is discrete and nondifferentiable, which impedes the BP implementation based on gradient descent. Third, SNNs are more sensitive to parameter configuration because of binary spike representation. Especially in the training phase, we should simultaneously ensure timely response for presynaptic stimulus and avoid too many spikes that will probably degrade the neuronal selectivity. Although previous work has proposed many techniques to adjust firing rate, such as modelbased normalization
[Diehl et al.2015] and spikebased normalization [Sengupta et al.2018], all of them are specialized for ANNstoSNNs conversion learning, not for the direct training of SNNs as considered in this paper.In terms of programming frameworks, we lack suitable platforms to support the training of deep SNNs. Although there exist several platforms serving for simulating biological features of SNNs with varied abstraction levels [Brette et al.2007, Carnevale and Hines2006, Hazan et al.2018], little work is designed for training deep SNNs. Researchers have to build applicationoriented models from scratch and the training speed is usually slow. On the other hand, emerging ANNoriented frameworks can provide much better efficiency, especially for large models, hence a natural idea is to map the SNN training onto these frameworks. However, these frameworks are designed for aforementioned ANN algorithms so that they cannot be directly applied to SNNs because of the neural dynamic of spiking neurons.
In this paper, we propose a fullstack solution towards faster, larger, and better SNNs from both algorithm and programming aspects. We draw inspirations from the recent work [Wu et al.2018]
, which proposes the spatiotemporal back propagation (STBP) method for the direct training of SNNs, and significantly extend it to much deeper structure, larger dataset, and better performance. We first propose the NeuNorm method to balance neural selectivity and increase the performance. Then we improve the rate coding to fasten the convergence and convert the LIF model into an explicitly iterative version to make it compatible with a machine learning framework (Pytorch). As a results, compared to the running time on Matlab, we achieve tens of times speedup that enables the direct training of deep SNNs. The best accuracy on neuromorphic datasets (NMNIST and DVSCIFAR10) and comparable accuracy as existing ANNs and pretrained SNNs on nonspiking datasets (CIFAR10) are demonstrated. This work enables the exploration of direct training of highperformance SNNs and facilitates the SNN applications via compatible programming within the widely used machine learning (ML) framework.
Related work
We aims to direct training of deep SNNs. To this end, we mainly make improvements on two aspects: (1) learning algorithm design; (2) training acceleration optimization. We chiefly overview the related works in recent years.
Learning algorithm for deep SNNs
. There exist three ways for SNN learning: i) unsupervised learning such as spike timing dependent plasticity (STDP); ii) indirect supervised learning such as ANNstoSNNs conversion; iii) direct supervised learning such as gradient descentbased back propagation. However, by far most of them limited to very shallow structure (network layer less than 4) or toy small dataset (e.g. MNIST, Iris), and little work points to direct training deep SNNs due to their own challenges.
STDP is biologically plausible but the lack of global information hinders the convergence of large models, especially on complex datasets [Timothée and Thorpe2007, Diehl and Matthew2015, Tavanaei and Maida2017].
The ANNstoSNNs conversion [Diehl et al.2015, Cao, Chen, and Khosla2015, Sengupta et al.2018, Hu et al.2018]
is currently the most successful method to model largescale SNNs. They first train nonspiking ANNs and convert it into a spiking version. However, this indirect training help little on revealing how SNNs learn since it only implements the inference phase in SNN format (the training is in ANN format). Besides, it adds many constraints onto the pretrained ANN models, such as no bias term, only average pooling, only ReLU activation function, etc. With the network deepens, such SNNs have to run unacceptable simulation time (1001000 time steps) to obtain good performance.
The direct supervised learning method trains SNNs without conversion. They are mainly based on the conventional gradient descent. Different from the previous spatial backpropagation [Lee, Delbruck, and Pfeiffer2016, Jin, Li, and Zhang2018], [Wu et al.2018] proposed the first backpropagation in both spatial and temporal domains to direct train SNNs, which achieved stateoftheart accuracy on MNIST and NMNIST datasets. However, the learning algorithm is not optimized well and the slow simulation hinders the exploration onto deeper structures.
Normalization. Many normalization techniques have been proposed to improve the convergence [Ioffe and Szegedy2015, Wu and He2018]. Although these methods have achieved great success in ANNs, they are not suitable for SNNs due to the complex neural dynamics and binary spiking representation of SNNs. Furthermore, in terms of hardware implementation, batchbased normalization techniques essentially incur lateral operations across different stimuluses which are not compatible with existing neuromorphic platforms [Merolla et al.2014, Davies et al.2018]. Recently, several normalization methods (e.g. modelbased normalization [Diehl et al.2015], databased normalization [Diehl et al.2015], spikebased normalization [Sengupta et al.2018]) have been proposed to improve SNN performance. However, such methods are specifically designed for the indirect training with ANNstoSNNs conversion, which didn’t show convincing effectiveness for direct training of SNNs targeted by this work.
SNN programming frameworks. There exist several programming frameworks for SNN modeling, but they have different aims. NEURON [Carnevale and Hines2006] and Genesis [Bower and Beeman1998]
mainly focus on the biological realistic simulation from neuron functionality to synapse reaction, which are more beneficial for the neuroscience community. BRIAN2
[Goodman and Brette2009] and NEST [Gewaltig and Diesmann2007] target the simulation of larger scale SNNs with many biological features, but not designed for the direct supervised learning for high performance discussed in this work. BindsNET [Hazan et al.2018] is the first reported framework towards combining SNNs and practical applications. However, to our knowledge, the support for direct training of deep SNNs is still under development. Furthermore, according to the statistics from ModelDB ^{1}^{1}1ModelDB is an open website for storing and sharing computational neuroscience models., in most cases researchers even simply program in generalpurpose language, such as C/C++ and Matlab, for better flexibility. Besides the different aims, programming from scratch on these frameworks is user unfriendly and the tremendous execution time impedes the development of largescale SNNs. Instead of developing a new framework, we provide a new solution to establish SNNs by virtue of mature ML frameworks which have demonstrated easytouse interface and fast running speed.Approach
In this section, we first convert the LIF neuron model into an easytoprogram version with the format of explicit iteration. Then we propose the NeuNorm method to adjust the neuronal selectivity for improving model performance. Furthermore, we optimize the rate coding scheme from encoding and decoding aspects for faster response. Finally, we describe the whole training and provide the pseudo codes for Pytorch.
Explicitly iterative LIF model
LIF model is commonly used to describe the behavior of neuronal activities, including the update of membrane potential and spike firing. Specifically, it is governed by
(1)  
(2) 
where is the membrane potential, is a time constant, is presynaptic inputs, and is a given fire threshold. Eq. (1)(2) describe the behaviors of spiking neuron in a way of updatingfiringresetting mechanism (see Fig. 1ab). When the membrane potential reaches a given threshold, the neuron will fire a spike and is reset to ; otherwise, the neuron receives presynapse stimulus and updates its membrane potential according to Eq. (1).
The above differential expressions described in continuous domain are widely used for biological simulations, but it is too implicit for the implementations on mainstream ML frameworks (e.g. Tensorflow, Pytorch) since the embedded automatic differentiation mechanism executes codes in a discrete and sequential way [Paszke et al.2017]. Therefore, it motivates us to convert Eq. (1)(2) into an explicitly iterative version for ensuring computational tractability. To this end, we first use Euler method to solve the firstorder differential equation of Eq. (1), and obtain an iterative expression
(3) 
We denote the factor as a decay factor and expand the presynaptic input to a linearing summation . The subscript indicates the index of presynapse and denotes the corresponding presynaptic spike which is binary (0 or 1). By incorporating the scaling effect of into synapse weights , we have
(4) 
Next we add the firingandresetting mechanism to Eq. (4). By assuming as usual, we get the final update equations as below
(5)  
(6) 
where and denote the th layer and its neuron number, respectively, is the synaptic weight from the th neuron in prelayer () to the th neuron in the postlayer (). is the step function, which satisfies when , otherwise . Eq (5)(6) reveal that firing activities of will affect the next state via the updatingfiringresetting mechanism (see Fig. 1c). If the neuron emits a spike at time step , the membrane potential at step will clear its decay component via the term , and vice versa.
Neuron normalization (NeuNorm)
Considering the signal communication between two convolutional layers, the neuron at the location of the th feature map (FM) in the th layer receives convolutional inputs , and updates its membrane potential by
(7)  
(8) 
where denotes the weight kernel between the th FM in layer and the th FM in layer , denotes the convolution operation, and denotes the local receptive filed of location .
An inevitable problem for training SNNs is to balance the whole firing rate because of the binary spike communication [Diehl et al.2015, Wu et al.2018]. That is to say, we need to ensure timely and selective response to presynaptic stimulus but avoid too many spikes that probably harm the effective information representation. In this sense, it requires that the strength of stimulus maintains in a relatively stable range to avoid activity vanishing or explosion as network deepens.
Observing that neurons respond to stimulus from different FMs, it motivates us to propose an auxiliary neuron method for normalizing the input strength at the same spatial locations of different FMs (see Fig. 2). The update of auxiliary neuron status is described by
(9) 
where denotes the decay factor, denotes the constant scaling factor, and denotes the number of FMs in the th layer. In this way, Eq.(9) calculates the average response to the input firing rate with momentum term . For simplicity we set .
Next we suppose that the auxiliary neuron receives the lateral inputs from the same layer, and transmits signals to control the strength of stimulus emitted to the next layer through trainable weights , which has the same size as the FM. Hence, the inputs of neurons in the next layer can be modified by
(10) 
Inferring from Eq. (Neuron normalization (NeuNorm)
), NeuNorm method essentially normalize the neuronal activity by using the input statistics (moving average firing rate). The basic operation is similar with the zeromean operation in batch normalization. But NeuNorm has different purposes and different data processing ways (normalizing data along the channel dimension rather than the batch dimension). This difference brings several benefits which may be more suitable for SNNs. Firstly, NeuNorm is compatible with neuron model (LIF) without additional operations and friendly for neuromorphic implementation. Besides that, NeuNorm is more bioplausible. Indeed, the biological visual pathways are not independent. The response of retina cells for a particular location in an image is normalized across adjacent cell responses in the same layer
[Carandini and Heeger2012, Mante, Bonin, and Carandini2008]. And inspired by these founding, many similar techniques for exploiting channel features have been successfully applied to visual recognition, such as group normalization (GN) [Wu and He2018], DOG [Daral2005] and HOG [Daral2005]. For example, GN divides FMs of the same layer into several groups and normalizes the features within each group also along channel dimension.Encoding and decoding schemes
To handle various stimulus patterns, SNNs often use abundant coding methods to process the input stimulus. For visual recognition tasks, a popular coding scheme is rate coding. On the input side, the realvalued images are converted into a spike train whose fire rate is proportional to the pixel intensity. The spike sampling is probabilistic, such as following a Bernoulli distribution or a Poisson distribution. On the output side, it counts the firing rate of each neuron in the last layer over given time windows to determine network output. However, the conventional rate coding suffers from long simulation time to reach good performance. To solve this problem, we take a simpler coding scheme to accelerate the simulation without compromising the performance.
Encoding. One requirement for long simulation time is to reduce the sampling error when converting realvalue inputs to spike signals. Specifically, given the time window , a neuron can represent information by levels of firing rate, i.e. (normalized). Obviously, the rate coding requires sufficient long window to achieve satisfactory precision. To address this issue, we assign the first layer as an encoding layer and allow it to receive both spike and nonspike signals (compatible with various datasets). In other words, neurons in the encoding layer can process the evendriven spike train from neuromorphic dataset naturally, and can also convert realvalued signals from nonspiking dataset into spike train with enough precision. In this way, the precision is remained to a great extent without depending much on the simulation time.
Decoding. Another requirement for long time window is the representation precision of network output. To alleviate it, we adopt a voting strategy [Diehl and Matthew2015] to decode the network output. We configure the last layer as a voting layer consisting of several neuron populations and each output class is represented by one population. In this way, the burden of representation precision of each neuron in the temporal domain (firing rate in a given time windows) is transferred much to the spatial domain (neuron group coding). Thus, the requirement for long simulation time is significantly reduced. For initialization, we randomly assign a label to each neuron; while during training, the classification result is determined by counting the voting response of all the populations.
In a nutshell, we reduce the demand for longterm window from above two aspects, which in some sense extends the representation capability of SNNs in both input and output sides. We found similar coding scheme is also leveraged by previous work on ANN compression [Tang, Hua, and Wang2017, Hubara, Soudry, and Ran2016]. It implies that, regardless of the internal lower precision, maintaining the first and last layers in higher precision are important for the convergence and performance.
Overall training implementation
We define a loss function
measuring the mean square error between the averaged voting results and label vector within a given time window(11) 
where denotes the voting vector of the last layer at time step . denotes the constant voting matrix connecting each voting neuron to a specific category.
From the explicitly iterative LIF model, we can see that the spike signals not only propagate through the layerbylayer spatial domain, but also affect the neuronal states through the temporal domain. Thus, gradientbased training should consider both the derivatives in these two domains. Based on this analysis, we integrate our modified LIF model, coding scheme, and proposed NeuNorm into the STBP method [Wu et al.2018] to train our network. When calculating the derivative of loss function with respect to and in the th layer at time step , the STBP propagates the gradients from the th layer and from time step as follows
(12)  
(13) 
A critical problem in training SNNs is the nondifferentiable property of the binary spike activities. To make its gradient available, we take the rectangular function to approximate the derivative of spike activity [Wu et al.2018]. It yields
(14) 
where the width parameter determines the shape of . Theoretically, it can be easily proven that Eq. (14) satisfies
(15) 
Based on above methods, we also give a pseudo code for implementing the overall training of proposed SNNs in Pytorch, as shown in Algorithm 2.
Experiment
We test the proposed model and learning algorithm on both neuromorphic datasets (NMNIST and DVSCIFAR10) and nonspiking datasets (CIFAR10) from two aspects: (1) training acceleration; (2) application accuracy. The dataset introduction, preprocessing, training detail, and parameter configuration are summarized in Appendix.
Network structure
Tab. 1 and Tab. 2 provide the network structures for acceleration analysis and accuracy evaluation, respectively. The structure illustrations of what we call AlexNet and CIFARNet are also shown in Appendix, which are for fair comparison to pretrained SNN works with similar structure and to demonstrate comparable performance with ANNs, respectively. It is worth emphasizing that previous work on direct training of SNNs demonstrated only shallow structures (usually 24 layers), while our work for the first time can implement a direct and effective learning for largerscale SNNs (e.g. 8 layers).
Neuromorphic Dataset  

Small  128C3(Encoding)AP2128C3AP2512FCVoting 
Middle  128C3(Encoding)128C3AP2256AP21024FCVoting 
Large  128C3(Encoding)128C3AP2384C3384C3AP2 
1024FC512FCVoting  
Nonspiking Dataset  
Small  128C3(Encoding)AP2256C3AP2256FCVoting 
Middle  128C3(Encoding)AP2256C3512C3AP2512FCVoting 
Large  128C3(Encoding)256C3AP2512C3AP21024C3512C3 
1024FC512FCVoting 
Neuromorphic Dataset  

Our model  128C3(Encoding)128C3AP2128C3256C3AP2 
1024FCVoting  
Nonspiking Dataset  
AlexNet  96C3(Encoding)256C3AP2384C3AP2384C3 
256C31024FC1024FCVoting  
CIFARNet  128C3(Encoding)256C3AP2512C3AP21024C3 
512C31024FC512FCVoting 
Training acceleration
Runtime. Since Matlab is a highlevel language widely used in SNN community, we adopt it for the comparison with our Pytorch implementation. For fairness, we made several configuration restrictions, such as software version, parameter setting, etc. More details can be found in Appendix.
Fig. 3
shows the comparisons about average runtime per epoch, where batch size of 20 is used for simulation. Pytorch is able to provide tens of times acceleration on all three datasets. This improvement may be attributed to the specialized optimizations for the convolution operation in Pytorch. In contrast, currently these optimizations have not been well supported in most of existing SNN platforms. Hence building spiking model may benefit a lot from DL platform.
Network scale. If without acceleration, it is difficult to extend most existing works to large scale. Fortunately, the fast implementation in Pytorch facilitates the construction of deep SNNs. This makes it possible to investigate the influence of network size on model performance. Although it has been widely studied in ANN field, it has yet to be demonstrated on SNNs. To this end, we compare the accuracy under different network sizes, shown in Fig. 4. With the size increases, SNNs show an apparent tendency of accuracy improvement, which is consistent with ANNs.
Simulation length. SNNs need enough simulation steps to mimic neural dynamics and encode information. Given simulation length , it means that we need to repeat the inference process times to count the firing rate. So the network computation cost can be denoted as . For deep SNNs, previous work usually requires 100 even 1000 steps to reach good performance [Sengupta et al.2018], which brings huge computational cost. Fortunately, by using the proposed coding scheme in this work, the simulation length can be significantly reduced without much accuracy degradation. As shown in Fig. 5, although the longer the simulation windows leads to better results, our method just requires a small length () to achieve satisfactory results. Notably, even if extremely taking onestep simulation, it can also achieve not bad performance with significantly faster response and lower energy, which promises the application scenarios with extreme restrictions on response time and energy consumption.
Application accuracy
Neuromorphic datasets. Tab. 3 records the current stateoftheart results on NMNIST datasets. [Li2018] proposes a technique to restore the NMNIST dataset back to the static MNIST, and achieves 99.23% accuracy using nonspiking CNNs. Our model can naturally handle raw spike streams using direct training and achieves significantly better accuracy. Furthermore, by adding NeuNorm technique, the accuracy can be improved up to 99.53%.
Model  Method  Accuracy  

[Neil, Pfeiffer, and Liu2016]  LSTM  97.38%  
[Lee, Delbruck, and Pfeiffer2016]  Spiking NN  98.74%  
[Wu et al.2018]  Spiking NN  98.78%  
[Jin, Li, and Zhang2018]  Spiking NN  98.93%  
[Neil and Liu2016]  Nonspiking CNN  98.30%  
[Li2018]  Nonspiking CNN  99.23%  

without NeuNorm  99.44%  

with NeuNorm  99.53% 
Model  Method  Accuracy 

[Orchard et al.2015a]  Random Forest  31.0% 
[Lagorce et al.2017]  HOTS  27.1% 
[Sironi et al.2018]  HAT  52.4% 
[Sironi et al.2018]  GaborSNN  24.5% 
Our model  without NeuNorm  58.1% 
Our model  with NeuNorm  60.5% 
Tab. 4 further compares the accuracy on DVSCIFAR10 dataset. DVSCIFAR10 is more challenging than NMNIST due to the larger scale, and it is also more challenging than nonspiking CIFAR10 due to less samples and noisy environment (Dataset introduction in Appendix). Our model achieves the best performance with 60.5%. Moreover, experimental results indicate that NeuNorm can speed up the convergence.Without NeuNorm the required training epochs to get the best accuracy 58.1% is 157, while it’s reduced to 103 with NeuNorm.
Nonspiking dataset. Tab. 5 summarizes the results of existing stateoftheart results and our CIFARNet on CIFAR10 dataset. Prior to our work, the best direct training method only achieves 75.42% accuracy. Our model is significantly better than this result, with 15% improvement (reaching 90.53%). We also make a comparison with nonspiking ANN model and other pretrained works (ANNstoSNNs conversion) on the similar structure of AlexNet, wherein our direct training of SNNs with NeuNorm achieves slightly better accuracy.
Model  Method  Accuracy 

[Panda and Roy2016]  Spiking NN  75.42% 
[Cao, Chen, and Khosla2015]  Pretrained SNN  77.43% 
[Rueckauer et al.2017]  Pretrained SNN  90.8% 
[Sengupta et al.2018]  Pretrained SNN  87.46% 
Baseline  Nonspiking NN  90.49% 
Our model  without NeuNorm  89.83% 
Our model  with NeuNorm  90.53% 
Model  Method  Accuracy 

[Hunsberger and Eliasmith2015]  Nonspiking NN  83.72% 
[Hunsberger and Eliasmith2015]  Pretrained SNN  83.52% 
[Sengupta et al.2018]  Pretrained SNN  83.54% 
Our model  —  85.24% 
Furthermore, to visualize the differences of learning with or without NeuNorm, Fig. 6 shows the average confusion matrix of network voting results on CIFAR10 over 500 randomly selected images. Each location in the 1010 tiles is determined by the voting output and the actual labels. High values along the diagonal indicate correct recognition whereas high values anywhere else indicate confusion between two categories. It shows that using NeuNorm brings a clearer contrast (i.e. higher values along the diagonal), which implies that NeuNorm enhances the differentiation degree among the 10 classes. Combining the results from Tab. 35, it also confirms the effectiveness of proposed NeuNorm for performance improvement.
Conclusion
In this paper, we present a direct training algorithm for deeper and larger SNNs with high performance. We propose the NeuNorm to effectively normalize the neuronal activities and improve the performance. Besides that, we optimize the rate coding from encoding aspect and decoding aspect, and convert the original continuous LIF model to an explicitly iterative version for friendly Pytorch implementation. Finally, through tens of times training accelerations and larger network scale, we achieve the best accuracy on neuromorphic datasets and comparable accuracy with ANNs on nonspiking datasets. To our best knowledge, this is the first time report such high performance with direct training on SNNs. The implementation on mainstream ML framework could facilitate the SNN development.
Acknowledgments
The work was supported by the Project of NSFC (No. 61836004, 61620106010, 61621136008, 61332007, 61327902, 61876215), the National Key Research and Development Program of China (No. 2017YFA0700900), the BrainScience Special Program of Beijing (Grant Z181100001518006), the SuzhouTsinghua innovation leading program 2016SZ0102, and the Project of NSF (No. 1725447, 1730309).
References
 [Bower and Beeman1998] Bower, J. M., and Beeman, D. 1998. The Book of GENESIS. Springer New York.
 [Brette et al.2007] Brette, R.; Rudolph, M.; Carnevale, T.; Hines, M.; Beeman, D.; Bower, J. M.; Diesmann, M.; Morrison, A.; Goodman, P. H.; and Harris, F. C. 2007. Simulation of networks of spiking neurons: A review of tools and strategies. Journal of Computational Neuroscience 23(3):349–398.

[Cao, Chen, and Khosla2015]
Cao, Y.; Chen, Y.; and Khosla, D.
2015.
Spiking Deep Convolutional Neural Networks for EnergyEfficient Object Recognition
. Kluwer Academic Publishers.  [Carandini and Heeger2012] Carandini, M., and Heeger, D. J. 2012. Normalization as a canonical neural computation. Nature Reviews Neuroscience 13(1):51–62.
 [Carnevale and Hines2006] Carnevale, N. T., and Hines, M. L. 2006. The neuron book. Cambridge Univ Pr.
 [Daral2005] Daral, N. 2005. Histograms of oriented gradients for human detection. Proc Cvpr.
 [Davies et al.2018] Davies, M.; Srinivasa, N.; Lin, T. H.; Chinya, G.; Cao, Y.; Choday, S. H.; Dimou, G.; Joshi, P.; Imam, N.; and Jain, S. 2018. Loihi: A neuromorphic manycore processor with onchip learning. IEEE Micro 38(1):82–99.
 [Diehl and Matthew2015] Diehl, P. U., and Matthew, C. 2015. Unsupervised learning of digit recognition using spiketimingdependent plasticity:. Frontiers in Computational Neuroscience 9:99.

[Diehl et al.2015]
Diehl, P. U.; Neil, D.; Binas, J.; Cook, M.; Liu, S. C.; and Pfeiffer, M.
2015.
Fastclassifying, highaccuracy spiking deep networks through weight and threshold balancing.
In International Joint Conference on Neural Networks, 1–8.  [Gewaltig and Diesmann2007] Gewaltig, M. O., and Diesmann, M. 2007. Nest (neural simulation tool). Scholarpedia 2(4):1430.
 [Goodman and Brette2009] Goodman, D. F., and Brette, R. 2009. The brian simulator. Frontiers in Neuroscience 3(2):192.
 [Hazan et al.2018] Hazan, H.; Saunders, D. J.; Khan, H.; Sanghavi, D. T.; Siegelmann, H. T.; and Kozma, R. 2018. Bindsnet: A machine learningoriented spiking neural networks library in python. arXiv preprint arXiv:1806.01423.
 [Hu et al.2018] Hu, Y.; Tang, H.; Wang, Y.; and Pan, G. 2018. Spiking deep residual network. arXiv preprint arXiv:1805.01352.
 [Hubara, Soudry, and Ran2016] Hubara, I.; Soudry, D.; and Ran, E. Y. 2016. Binarized neural networks. arXiv preprint arXiv:1602.02505.
 [Hunsberger and Eliasmith2015] Hunsberger, E., and Eliasmith, C. 2015. Spiking deep networks with lif neurons. Computer Science.
 [Ioffe and Szegedy2015] Ioffe, S., and Szegedy, C. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. JMLR 448–456.
 [Jin, Li, and Zhang2018] Jin, Y.; Li, P.; and Zhang, W. 2018. Hybrid macro/micro level backpropagation for training deep spiking neural networks. arXiv preprint arXiv:1805.07866.
 [Khan et al.2008] Khan, M. M.; Lester, D. R.; Plana, L. A.; and Rast, A. 2008. Spinnaker: Mapping neural networks onto a massivelyparallel chip multiprocessor. In IEEE International Joint Conference on Neural Networks, 2849–2856.

[Lagorce et al.2017]
Lagorce, X.; Orchard, G.; Gallupi, F.; Shi, B. E.; and Benosman, R.
2017.
Hots: A hierarchy of eventbased timesurfaces for pattern recognition.
IEEE Transactions on Pattern Analysis and Machine Intelligence PP(99):1–1.  [Lee, Delbruck, and Pfeiffer2016] Lee, J. H.; Delbruck, T.; and Pfeiffer, M. 2016. Training deep spiking neural networks using backpropagation. Frontiers in Neuroscience 10.
 [Li et al.2017] Li, H.; Liu, H.; Ji, X.; Li, G.; and Shi, L. 2017. Cifar10dvs: An eventstream dataset for object classification. Frontiers in Neuroscience 11.
 [Li2018] Li, L. I. . Y. C. . H. 2018. Is neuromorphic mnist neuromorphic? analyzing the discriminative power of neuromorphic datasets in the time domain. arXiv preprint arXiv:1807.01013 1–23.
 [Mante, Bonin, and Carandini2008] Mante, V.; Bonin, V.; and Carandini, M. 2008. Functional mechanisms shaping lateral geniculate responses to artificial and natural stimuli. Neuron 58(4):625–638.
 [Merolla et al.2014] Merolla, P. A.; Arthur, J. V.; Alvarezicaza, R.; Cassidy, A. S.; Sawada, J.; Akopyan, F.; Jackson, B. L.; Imam, N.; Guo, C.; and Nakamura, Y. 2014. Artificial brains. a million spikingneuron integrated circuit with a scalable communication network and interface. Science 345(6197):668–673.
 [Neil and Liu2016] Neil, D., and Liu, S. C. 2016. Effective sensor fusion with eventbased sensors and deep network architectures. In IEEE International Symposium on Circuits and Systems.
 [Neil, Pfeiffer, and Liu2016] Neil, D.; Pfeiffer, M.; and Liu, S. C. 2016. Phased lstm: Accelerating recurrent network training for long or eventbased sequences. arXiv preprint arXiv:1610.09513.
 [Orchard et al.2015a] Orchard, G.; Meyer, C.; EtienneCummings, R.; and Posch, C. 2015a. Hfirst: A temporal approach to object recognition. IEEE Trans Pattern Anal Mach Intell 37(10):2028–40.
 [Orchard et al.2015b] Orchard, G.; Jayawant, A.; Cohen, G. K.; and Thakor, N. 2015b. Converting static image datasets to spiking neuromorphic datasets using saccades. Frontiers in Neuroscience 9(178).
 [Panda and Roy2016] Panda, P., and Roy, K. 2016. Unsupervised regenerative learning of hierarchical features in spiking deep networks for object recognition. In International Joint Conference on Neural Networks, 299–306.
 [Paszke et al.2017] Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; and Lerer, A. 2017. Automatic differentiation in pytorch. NIPS.
 [Rueckauer et al.2017] Rueckauer, B.; Lungu, I. A.; Hu, Y.; Pfeiffer, M.; and Liu, S. C. 2017. Conversion of continuousvalued deep networks to efficient eventdriven networks for image classification:. Frontiers in Neuroscience 11:682.
 [Sengupta et al.2018] Sengupta, A.; Ye, Y.; Wang, R.; Liu, C.; and Roy, K. 2018. Going deeper in spiking neural networks: Vgg and residual architectures. arXiv preprint arXiv:1802.02627.
 [Sironi et al.2018] Sironi, A.; Brambilla, M.; Bourdis, N.; Lagorce, X.; and Benosman, R. 2018. Hats: Histograms of averaged time surfaces for robust eventbased object classification. arXiv preprint arXiv:1803.07913.
 [Tang, Hua, and Wang2017] Tang, W.; Hua, G.; and Wang, L. 2017. How to train a compact binary neural network with high accuracy? In AAAI, 2625–2631.
 [Tavanaei and Maida2017] Tavanaei, A., and Maida, A. S. 2017. Bioinspired spiking convolutional neural network using layerwise sparse coding and stdp learning. arXiv preprint arXiv:1611.03000.
 [Tavanaei et al.2018] Tavanaei, A.; Ghodrati, M.; Kheradpisheh, S. R.; Masquelier, T.; and Maida, A. S. 2018. Deep learning in spiking neural networks. arXiv preprint arXiv:1804.08150.
 [Timothée and Thorpe2007] Timothée, M., and Thorpe, S. J. 2007. Unsupervised learning of visual features through spike timing dependent plasticity. Plos Computational Biology 3(2):e31.
 [Wu and He2018] Wu, Y., and He, K. 2018. Group normalization. arXiv preprint arXiv:1803.08494.
 [Wu et al.2018] Wu, Y.; Deng, L.; Li, G.; Zhu, J.; and Shi, L. 2018. Spatiotemporal backpropagation for training highperformance spiking neural networks. Frontiers in Neuroscience 12.
Appendix A Supplementary material
In this supplementary material, we provide the details of our experiment, including the dataset introduction, training setting, and programing platform comparison.
Appendix B A Dataset Introduction
Neuromorphic dataset
NMNIST converts the framebased MNIST handwritten digit dataset into its DVS (dynamic vision sensor) version [Orchard et al.2015b] with event streams (in Fig.8). For each sample, DVS scans the static MNIST image along given directions and collects the generated spike train which is triggered by detecting the intensity change of pixels. Since the intensity change has two directions (increase or decrease), DVS can capture two kinds of spike events, denoted as Onevent and Offevent. Because of the relative shift of images during moving process, the pixel dimension is expanded to 3434. Overall, each sample in NMNIST is a spatiotemporal spike pattern with size of , where is the length of temporal window.
DVSCIFAR10 converts 10000 static CIFAR10 images into the format of spike trains (in Fig.8), consisting of 1000 images per class with size of for each image . Since different DVS types and different movement paths are used [Li et al.2017], the generated spike train contains imbalanced spike events and larger image resolution. We adopt different parameter configurations, shown in Tab. 7. DVSCIFAR10 has 6 times less samples than the original CIFAR10 dataset, and we randomly choose 9000 images for training and 1000 for testing.
Nonspiking dataset
CIFAR10 is widely used in ANN domain, which contains 60000 color images belonging to 10 classes with size of for each image. We divide it into 50000 training images and 10000 test images as usual.
Appendix C B Training setting
Data preprocessing
On NMNIST, we reduce the time resolution by accumulating the spike train within every 5 ms. On DVSCIFAR10, we reduce the spatial resolution by downsampling the original 128 128 size to 42
42 size (stride = 3, padding = 0), and reduce the temporal resolution by similarly accumulating the spike train within every 5 ms. On CIFAR10, as usual, we first crop and flip the original images along each RGB channel, and then rescale each image by subtracting the global mean value of pixel intensity and dividing by the resulting standard variance along each RGB channel.
Optimizer
On neuromorphic datasets, we use Adam (adaptive moment estimation) optimizer. On the nonspiking dataset, we use the stochastic gradient descent (SGD) optimizer with initial learning rate
and momentum 0.9, and we let decay to over each 40 epochs.Parameters  CIFAR10  DVSCIFAR10  NMNIST 

0.75  0.05  0.25  
1.0  0.1  0.25  
0.25  0.35  0.3  
0.9  0.9  0.9  
Dropout rate  0.5  0  0 
Max epoch  150  200  200 
Adam  = 0.9,0.999, 1 
Parameter configuration
Appendix D C Details for Acceleration experiment
A total of 10 experiments are counted for average. Considering the slow running time on the Matlab, 10 simulation steps and batch size of 20 per epoch are used. All codes run on server with i76700K CPU and GTX1060 GPU. The Pytorch version is 3.5 and the Matlab version is 2018b. Both of them enable GPU execution, but without parallelization.
It is worth noting that recently there emerge more supports on Matlab for deep learning, either official
^{2}^{2}2Since 2017 version, Matlab provides deep learning libraries. or unofficial (e.g. MatConvNet toolbox^{3}^{3}3A MATLAB toolbox designed for implementing CNNs. ). However, their highly encapsulated function modules are unfriendly for users to customize deep SNNs. Therefore, in terms of flexibility, all convolution operations we adopted are based on the officially provided operations (i.e. function).