Direct Training for Spiking Neural Networks: Faster, Larger, Better

09/16/2018 ∙ by Yujie Wu, et al. ∙ Tsinghua University

Spiking neural networks (SNNs) are gaining attention as a promising route to energy-efficient computing on emerging neuromorphic hardware. So far, however, SNNs have not shown performance competitive with artificial neural networks (ANNs), owing to the lack of effective learning algorithms and efficient programming frameworks. We address this issue from two aspects: (1) We propose a neuron normalization technique to adjust the neural selectivity and develop a direct learning algorithm for large-scale SNNs. (2) We present a Pytorch-based implementation method for training deep SNNs by narrowing the rate coding window and converting the leaky integrate-and-fire (LIF) model into an explicitly iterative version. With this method, we are able to train large-scale SNNs with a speedup of tens of times. As a result, we achieve significantly better accuracy than previously reported works on neuromorphic datasets (N-MNIST and DVS-CIFAR10), and accuracy comparable to existing ANNs and pre-trained SNNs on non-spiking datasets (CIFAR10). To the best of our knowledge, this is the first work that demonstrates direct training of large-scale SNNs with high performance, and the efficient implementation is a key step toward exploring the potential of SNNs.







Spiking neural networks, a sub-category of brain-inspired computing models, use spatio-temporal dynamics to mimic neural behaviors and binary spike signals to communicate between units. Benefiting from the event-driven processing paradigm (computation occurs only when a unit receives a spike event), SNNs can be efficiently implemented on specialized neuromorphic hardware for power-efficient processing, such as SpiNNaker [Khan et al.2008], TrueNorth [Merolla et al.2014], and Loihi [Davies et al.2018].

As is well known, the powerful error backpropagation (BP) algorithm and the ever larger model sizes enabled by ANN-oriented programming frameworks (e.g. Tensorflow, Pytorch) have driven the wide application of ANNs in recent years. However, the rich spatio-temporal dynamics and event-driven paradigm make SNNs very different from conventional ANNs. Until now, SNNs have not demonstrated performance comparable to ANNs because of the lack of effective learning algorithms and efficient programming frameworks, which greatly limits the network scale and application spectrum [Tavanaei et al.2018].

In terms of learning algorithms, three challenges exist for training large-scale SNNs. First, the complex neural dynamics in both the spatial and temporal domains make BP obscure. Specifically, neural activities not only propagate layer by layer in the spatial domain, but also affect the states along the temporal direction, which is more complicated than in typical ANNs. Second, the event-driven spiking activity is discrete and non-differentiable, which impedes a BP implementation based on gradient descent. Third, SNNs are more sensitive to parameter configuration because of the binary spike representation. Especially in the training phase, we must simultaneously ensure a timely response to presynaptic stimulus and avoid too many spikes, which would likely degrade the neuronal selectivity. Although previous work has proposed many techniques to adjust the firing rate, such as model-based normalization [Diehl et al.2015] and spike-based normalization [Sengupta et al.2018], all of them are specialized for ANNs-to-SNNs conversion learning, not for the direct training of SNNs considered in this paper.

In terms of programming frameworks, we lack suitable platforms to support the training of deep SNNs. Although several platforms exist for simulating the biological features of SNNs at various abstraction levels [Brette et al.2007, Carnevale and Hines2006, Hazan et al.2018], few are designed for training deep SNNs. Researchers have to build application-oriented models from scratch, and training is usually slow. On the other hand, emerging ANN-oriented frameworks provide much better efficiency, especially for large models, so a natural idea is to map SNN training onto these frameworks. However, these frameworks are designed for the aforementioned ANN algorithms and cannot be directly applied to SNNs because of the neural dynamics of spiking neurons.

In this paper, we propose a full-stack solution towards faster, larger, and better SNNs from both the algorithm and programming aspects. We draw inspiration from the recent work [Wu et al.2018], which proposes the spatio-temporal back propagation (STBP) method for the direct training of SNNs, and significantly extend it to much deeper structures, larger datasets, and better performance. We first propose the NeuNorm method to balance neural selectivity and increase performance. Then we improve the rate coding to speed up convergence and convert the LIF model into an explicitly iterative version to make it compatible with a machine learning framework (Pytorch). As a result, compared to the running time on Matlab, we achieve a speedup of tens of times that enables the direct training of deep SNNs. We demonstrate the best accuracy on neuromorphic datasets (N-MNIST and DVS-CIFAR10) and accuracy comparable to existing ANNs and pre-trained SNNs on non-spiking datasets (CIFAR10). This work enables the exploration of direct training of high-performance SNNs and facilitates SNN applications via compatible programming within a widely used machine learning (ML) framework.

Related work

We aim at the direct training of deep SNNs. To this end, we mainly make improvements in two aspects: (1) learning algorithm design; (2) training acceleration. We briefly review the related work of recent years.

Learning algorithm for deep SNNs. There exist three ways for SNN learning: i) unsupervised learning such as spike timing dependent plasticity (STDP); ii) indirect supervised learning such as ANNs-to-SNNs conversion; iii) direct supervised learning such as gradient descent-based back propagation. However, so far most of them are limited to very shallow structures (fewer than 4 layers) or small toy datasets (e.g. MNIST, Iris), and little work addresses the direct training of deep SNNs because of the challenges it poses.

STDP is biologically plausible but the lack of global information hinders the convergence of large models, especially on complex datasets [Timothée and Thorpe2007, Diehl and Matthew2015, Tavanaei and Maida2017].

The ANNs-to-SNNs conversion [Diehl et al.2015, Cao, Chen, and Khosla2015, Sengupta et al.2018, Hu et al.2018] is currently the most successful method to model large-scale SNNs. These methods first train a non-spiking ANN and then convert it into a spiking version. However, this indirect training helps little in revealing how SNNs learn, since it only implements the inference phase in SNN format (the training is in ANN format). Besides, it adds many constraints to the pre-trained ANN models, such as no bias term, only average pooling, only the ReLU activation function, etc. As the network deepens, such SNNs need an unacceptably long simulation time (100-1000 time steps) to obtain good performance.

The direct supervised learning method trains SNNs without conversion, mainly using conventional gradient descent. Unlike the previous spatial backpropagation [Lee, Delbruck, and Pfeiffer2016, Jin, Li, and Zhang2018], [Wu et al.2018] proposed the first backpropagation in both the spatial and temporal domains to directly train SNNs, which achieved state-of-the-art accuracy on the MNIST and N-MNIST datasets. However, the learning algorithm is not well optimized and the slow simulation hinders the exploration of deeper structures.

Normalization. Many normalization techniques have been proposed to improve convergence [Ioffe and Szegedy2015, Wu and He2018]. Although these methods have achieved great success in ANNs, they are not suitable for SNNs because of the complex neural dynamics and binary spiking representation of SNNs. Furthermore, in terms of hardware implementation, batch-based normalization techniques essentially incur lateral operations across different stimuli, which are not compatible with existing neuromorphic platforms [Merolla et al.2014, Davies et al.2018]. Recently, several normalization methods (e.g. model-based normalization [Diehl et al.2015], data-based normalization [Diehl et al.2015], spike-based normalization [Sengupta et al.2018]) have been proposed to improve SNN performance. However, such methods are specifically designed for indirect training with ANNs-to-SNNs conversion, and have not shown convincing effectiveness for the direct training of SNNs targeted by this work.

SNN programming frameworks. There exist several programming frameworks for SNN modeling, but they have different aims. NEURON [Carnevale and Hines2006] and Genesis [Bower and Beeman1998] mainly focus on biologically realistic simulation, from neuron functionality to synapse reaction, and are more beneficial for the neuroscience community. BRIAN2 [Goodman and Brette2009] and NEST [Gewaltig and Diesmann2007] target the simulation of larger-scale SNNs with many biological features, but are not designed for the high-performance direct supervised learning discussed in this work. BindsNET [Hazan et al.2018] is the first reported framework aimed at combining SNNs with practical applications. However, to our knowledge, its support for direct training of deep SNNs is still under development. Furthermore, according to the statistics from ModelDB (an open website for storing and sharing computational neuroscience models), in most cases researchers simply program in a general-purpose language, such as C/C++ or Matlab, for better flexibility. Besides the different aims, programming from scratch on these frameworks is not user-friendly, and the long execution time impedes the development of large-scale SNNs. Instead of developing a new framework, we provide a new solution that builds SNNs on mature ML frameworks, which have demonstrated easy-to-use interfaces and fast running speed.


In this section, we first convert the LIF neuron model into an easy-to-program version in the form of an explicit iteration. Then we propose the NeuNorm method to adjust the neuronal selectivity and improve model performance. Furthermore, we optimize the rate coding scheme on both the encoding and decoding sides for faster response. Finally, we describe the whole training procedure and provide pseudo code for Pytorch.

Explicitly iterative LIF model

The LIF model is commonly used to describe the behavior of neuronal activities, including the update of membrane potential and spike firing. Specifically, it is governed by

$$\tau \frac{du}{dt} = -u + I, \quad u < V_{th} \tag{1}$$

$$\text{fire a spike and } u = u_{reset}, \quad u \geq V_{th} \tag{2}$$

where $u$ is the membrane potential, $\tau$ is a time constant, $I$ is the pre-synaptic input, and $V_{th}$ is a given firing threshold. Eq. (1)-(2) describe the behavior of a spiking neuron through an updating-firing-resetting mechanism (see Fig. 1a-b). When the membrane potential reaches the given threshold, the neuron fires a spike and $u$ is reset to $u_{reset}$; otherwise, the neuron receives pre-synaptic stimulus and updates its membrane potential according to Eq. (1).

Figure 1: Illustration of iterative LIF. (a) Spike communication between neurons; (b) The update of membrane potential according to Eq. (1)-Eq. (2); (c) Iterative LIF described by Eq. (5)-Eq. (6).

The above differential expressions in the continuous domain are widely used for biological simulations, but they are too implicit for implementation on mainstream ML frameworks (e.g. Tensorflow, Pytorch), since the embedded automatic differentiation mechanism executes code in a discrete and sequential way [Paszke et al.2017]. This motivates us to convert Eq. (1)-(2) into an explicitly iterative version to ensure computational tractability. To this end, we first use the Euler method to solve the first-order differential equation of Eq. (1), and obtain the iterative expression

$$u^{t+1} = \left(1 - \frac{dt}{\tau}\right) u^{t} + \frac{dt}{\tau} I \tag{3}$$

We denote the factor $\left(1 - \frac{dt}{\tau}\right)$ as a decay factor $k_{\tau_1}$ and expand the pre-synaptic input $I$ into a linear summation $\sum_j W_j o(j)$. The subscript $j$ indicates the index of the pre-synapse and $o(j)$ denotes the corresponding pre-synaptic spike, which is binary (0 or 1). By incorporating the scaling effect of $\frac{dt}{\tau}$ into the synaptic weights $W$, we have

$$u^{t+1} = k_{\tau_1} u^{t} + \sum_j W_j o(j) \tag{4}$$

Next we add the firing-and-resetting mechanism to Eq. (4). Assuming $u_{reset} = 0$ as usual, we get the final update equations

$$u^{t+1,n+1}(i) = k_{\tau_1} u^{t,n+1}(i) \left(1 - o^{t,n+1}(i)\right) + \sum_{j=1}^{l(n)} w^{n}_{ij}\, o^{t+1,n}(j) \tag{5}$$

$$o^{t+1,n+1}(i) = f\left(u^{t+1,n+1}(i) - V_{th}\right) \tag{6}$$

where $n$ and $l(n)$ denote the $n$-th layer and its neuron number, respectively, and $w^{n}_{ij}$ is the synaptic weight from the $j$-th neuron in the pre-layer ($n$) to the $i$-th neuron in the post-layer ($n+1$). $f(\cdot)$ is the step function, which satisfies $f(x) = 0$ when $x < 0$, otherwise $f(x) = 1$. Eq. (5)-(6) reveal that the firing activity $o^{t,n+1}$ affects the next state $o^{t+1,n+1}$ via the updating-firing-resetting mechanism (see Fig. 1c). If the neuron emits a spike at time step $t$, the membrane potential at step $t+1$ clears its decay component $k_{\tau_1} u^{t}$ via the term $\left(1 - o^{t,n+1}(i)\right)$, and vice versa.

Through Eq. (5)-(6), we convert the implicit Eq. (1)-(2) into an explicit version, which is easier to implement on an ML framework. We give concise Pytorch-based pseudo code for an explicitly iterative LIF model in Algorithm 1.

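Require: Previous potential $u^{t,n+1}$ and spike output $o^{t,n+1}$, current spike input $o^{t+1,n}$, and weight vector $W$
Ensure: Next potential $u^{t+1,n+1}$ and spike output $o^{t+1,n+1}$
Function StateUpdate($W$, $u^{t,n+1}$, $o^{t,n+1}$, $o^{t+1,n}$)
    $u^{t+1,n+1} = k_{\tau_1} u^{t,n+1} (1 - o^{t,n+1}) + W o^{t+1,n}$;
    $o^{t+1,n+1} = f(u^{t+1,n+1} - V_{th})$
    return $u^{t+1,n+1}$, $o^{t+1,n+1}$
Algorithm 1 State update for an explicitly iterative LIF neuron at time step $t+1$ in the $(n+1)$-th layer.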
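As a rough illustration of the state update in Algorithm 1, the following plain-Python sketch applies the decay, reset, and threshold steps to a small fully-connected layer. It is not the authors' Pytorch code: the function name `state_update`, the list-based layout, and the values of the decay factor (0.25), threshold (0.75), and weight (0.625) are assumptions chosen only to make the arithmetic easy to follow.

```python
def state_update(w, u_prev, o_prev, o_in, k_tau1=0.25, v_th=0.75):
    """One iterative LIF step for a fully-connected layer (illustrative sketch).

    w      : weight matrix, w[i][j] connects input neuron j to output neuron i
    u_prev : membrane potentials of this layer at the previous time step
    o_prev : spikes (0 or 1) of this layer at the previous time step
    o_in   : spikes (0 or 1) arriving from the previous layer at the current step
    """
    u_next, o_next = [], []
    for i, row in enumerate(w):
        drive = sum(w_ij * s for w_ij, s in zip(row, o_in))
        # A neuron that fired at the previous step loses its decayed potential (reset to 0).
        u = k_tau1 * u_prev[i] * (1 - o_prev[i]) + drive
        u_next.append(u)
        # Step function: fire when the potential reaches the threshold.
        o_next.append(1 if u >= v_th else 0)
    return u_next, o_next


# Drive a single neuron with one input spike per step through an assumed weight of 0.625.
w = [[0.625]]
u, o = [0.0], [0]
trace = []
for _ in range(4):
    u, o = state_update(w, u, o, [1])
    trace.append((u[0], o[0]))
print(trace)
```

With these assumed values the neuron integrates for one step, fires on the next, and then restarts from the input drive alone, which is the updating-firing-resetting cycle described above.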

Neuron normalization (NeuNorm)

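Considering the signal communication between two convolutional layers, the neuron at location $(i, j)$ of the $f$-th feature map (FM) in the $(n+1)$-th layer receives convolutional inputs $I$, and updates its membrane potential by

$$u^{t+1,n+1}_{f}(i,j) = k_{\tau_1} u^{t,n+1}_{f}(i,j) \left(1 - o^{t,n+1}_{f}(i,j)\right) + I^{t+1,n+1}_{f}(i,j) \tag{7}$$

$$I^{t+1,n+1}_{f}(i,j) = \sum_{c} W^{n}_{c,f} \circledast o^{t+1,n}_{c}\left(R(i,j)\right) \tag{8}$$

where $W^{n}_{c,f}$ denotes the weight kernel between the $c$-th FM in layer $n$ and the $f$-th FM in layer $n+1$, $\circledast$ denotes the convolution operation, and $R(i,j)$ denotes the local receptive field of location $(i,j)$.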

Figure 2: Illustration of NeuNorm. Blue squares represent the normal neuronal FMs, and orange squares represent the auxiliary neuronal FMs. The auxiliary neurons receive lateral inputs (solid lines), and fire signals (dotted lines) to control the strength of the stimulus emitted to the next layer.

An inevitable problem in training SNNs is balancing the overall firing rate, because of the binary spike communication [Diehl et al.2015, Wu et al.2018]. That is, we need to ensure a timely and selective response to pre-synaptic stimulus while avoiding too many spikes, which would likely harm the effective representation of information. In this sense, the strength of the stimulus should stay within a relatively stable range to avoid activity vanishing or explosion as the network deepens.

Observing that neurons respond to stimulus from different FMs, we propose an auxiliary neuron method for normalizing the input strength at the same spatial locations of different FMs (see Fig. 2). The update of the auxiliary neuron status $x$ is described by

$$x^{t+1,n}(i,j) = k_{\tau_2} x^{t,n}(i,j) + \frac{v}{F} \sum_{c} o^{t+1,n}_{c}(i,j) \tag{9}$$

where $k_{\tau_2}$ denotes the decay factor, $v$ denotes the constant scaling factor, and $F$ denotes the number of FMs in the $n$-th layer. In this way, Eq. (9) calculates the average response to the input firing rate with momentum term $k_{\tau_2}$. For simplicity we set $k_{\tau_2} + vF = 1$.

Next we suppose that the auxiliary neuron receives the lateral inputs from the same layer, and transmits signals to control the strength of the stimulus emitted to the next layer through trainable weights $U^{n}_{c}$, which have the same size as the FM. Hence, the inputs $I$ of neurons in the next layer can be modified by

$$I^{t+1,n+1}_{f}(i,j) = \sum_{c} W^{n}_{c,f} \circledast \left(o^{t+1,n}_{c}\left(R(i,j)\right) - U^{n}_{c}\left(R(i,j)\right) \cdot x^{t+1,n}\left(R(i,j)\right)\right) \tag{10}$$

As Eq. (10) shows, the NeuNorm method essentially normalizes the neuronal activity by using the input statistics (moving average firing rate). The basic operation is similar to the zero-mean operation in batch normalization, but NeuNorm has a different purpose and processes data differently (normalizing along the channel dimension rather than the batch dimension). This difference brings several benefits which may make it more suitable for SNNs. Firstly, NeuNorm is compatible with the neuron model (LIF) without additional operations and is friendly to neuromorphic implementation. Besides that, NeuNorm is more bio-plausible. Indeed, the biological visual pathways are not independent. The response of retina cells for a particular location in an image is normalized across adjacent cell responses in the same layer [Carandini and Heeger2012, Mante, Bonin, and Carandini2008]. Inspired by these findings, many similar techniques for exploiting channel features have been successfully applied to visual recognition, such as group normalization (GN) [Wu and He2018], DOG [Daral2005] and HOG [Daral2005]. For example, GN divides the FMs of the same layer into several groups and normalizes the features within each group, also along the channel dimension.
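To make the two NeuNorm steps concrete, here is a minimal plain-Python sketch: an auxiliary state that tracks a decayed average of spikes across feature maps at each location, and the subtraction of that state, scaled by trainable weights, from every feature map before the convolution. The function names, the nested-list layout `[feature_map][row][col]`, and the numeric values are assumptions for illustration; the convolution with the layer weights is omitted.

```python
def auxiliary_update(x_prev, spikes, k_tau2, v):
    """Decayed auxiliary state plus the scaled mean spike count across feature maps.

    x_prev : 2-D list with the auxiliary state at each spatial location
    spikes : 3-D list indexed [feature_map][row][col] with 0/1 entries
    """
    num_fm = len(spikes)
    rows, cols = len(x_prev), len(x_prev[0])
    return [
        [
            k_tau2 * x_prev[i][j]
            + (v / num_fm) * sum(spikes[c][i][j] for c in range(num_fm))
            for j in range(cols)
        ]
        for i in range(rows)
    ]


def normalized_spikes(spikes, x, u_weights):
    """Subtract the weighted auxiliary state from each feature map.

    u_weights has the same shape as spikes; in training these would be learned.
    The result is what a following convolution would receive.
    """
    return [
        [
            [spikes[c][i][j] - u_weights[c][i][j] * x[i][j] for j in range(len(x[0]))]
            for i in range(len(x))
        ]
        for c in range(len(spikes))
    ]


# Two feature maps on a 1x2 grid; k_tau2 = 0.5 and v = 0.25 are assumed values.
spikes = [[[1, 0]], [[1, 1]]]
x = auxiliary_update([[0.0, 0.0]], spikes, k_tau2=0.5, v=0.25)
u_weights = [[[0.5, 0.5]], [[0.5, 0.5]]]
print(x)
print(normalized_spikes(spikes, x, u_weights))
```

Because the auxiliary state is shared by all feature maps at a location, the normalization acts along the channel dimension and needs no statistics across the batch.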

Encoding and decoding schemes

To handle various stimulus patterns, SNNs use a variety of coding methods to process the input stimulus. For visual recognition tasks, a popular coding scheme is rate coding. On the input side, the real-valued images are converted into a spike train whose firing rate is proportional to the pixel intensity. The spike sampling is probabilistic, for example following a Bernoulli or a Poisson distribution. On the output side, the network counts the firing rate of each neuron in the last layer over a given time window to determine the network output. However, conventional rate coding needs a long simulation time to reach good performance. To solve this problem, we adopt a simpler coding scheme that accelerates the simulation without compromising performance.

Encoding. One reason for the long simulation time is the need to reduce the sampling error when converting real-valued inputs into spike signals. Specifically, given the time window $T$, a neuron can represent information by $T+1$ levels of firing rate, i.e. $\{0, \frac{1}{T}, \ldots, 1\}$ (normalized). Obviously, rate coding requires a sufficiently long window to achieve satisfactory precision. To address this issue, we assign the first layer as an encoding layer and allow it to receive both spike and non-spike signals (compatible with various datasets). In other words, neurons in the encoding layer can naturally process the event-driven spike train from a neuromorphic dataset, and can also convert real-valued signals from a non-spiking dataset into a spike train with enough precision. In this way, the precision is largely preserved without depending much on the simulation time.
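The precision limit of conventional rate coding can be seen in a small sketch. The code below samples a Bernoulli spike train for one pixel and decodes its firing rate; the function names, the pixel value 0.3, the seed, and the window lengths are assumptions made for illustration. It shows the conventional scheme that the encoding layer is meant to avoid, not the encoding layer itself.

```python
import random


def bernoulli_encode(intensity, num_steps, rng):
    """Sample a 0/1 spike train whose per-step firing probability is the intensity in [0, 1]."""
    return [1 if rng.random() < intensity else 0 for _ in range(num_steps)]


def decoded_rate(spike_train):
    """Firing rate recovered by counting spikes over the window."""
    return sum(spike_train) / len(spike_train)


rng = random.Random(0)  # fixed seed so the run is repeatable
pixel = 0.3
for num_steps in (4, 16, 256):
    train = bernoulli_encode(pixel, num_steps, rng)
    # With a window of T steps the decoded rate can only take the values 0, 1/T, ..., 1.
    print(num_steps, decoded_rate(train))
```

With a window of 4 steps the closest representable rate to 0.3 is 0.25, so even a perfect sample carries a quantization error; a longer window shrinks that error at the cost of more simulation steps.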

Decoding. Another reason for a long time window is the representation precision of the network output. To alleviate it, we adopt a voting strategy [Diehl and Matthew2015] to decode the network output. We configure the last layer as a voting layer consisting of several neuron populations, where each output class is represented by one population. In this way, much of the burden of representation precision carried by each neuron in the temporal domain (firing rate within a given time window) is transferred to the spatial domain (neuron group coding). Thus, the requirement for a long simulation time is significantly reduced. For initialization, we randomly assign a label to each neuron; during training, the classification result is determined by counting the voting response of all the populations.
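A minimal sketch of population voting follows, assuming each class owns an equal-sized, contiguous group of voting neurons and that a population's vote is the mean firing rate of its neurons. The text assigns labels to neurons randomly, so the contiguous layout, the function names, and the sizes here are simplifying assumptions.

```python
def build_voting_matrix(num_classes, neurons_per_class):
    """Constant matrix whose row c averages the neurons assigned to class c."""
    total = num_classes * neurons_per_class
    return [
        [1.0 / neurons_per_class if k // neurons_per_class == c else 0.0 for k in range(total)]
        for c in range(num_classes)
    ]


def vote(spike_trains, voting_matrix):
    """Class scores from voting-layer spikes; spike_trains[t][k] is neuron k at step t."""
    num_steps = len(spike_trains)
    num_neurons = len(spike_trains[0])
    mean_rate = [sum(step[k] for step in spike_trains) / num_steps for k in range(num_neurons)]
    return [sum(m * r for m, r in zip(row, mean_rate)) for row in voting_matrix]


# Two classes, two voting neurons each, two simulation steps.
voting_matrix = build_voting_matrix(2, 2)
spike_trains = [[1, 1, 0, 0], [1, 0, 0, 1]]
scores = vote(spike_trains, voting_matrix)
predicted = scores.index(max(scores))
print(scores, predicted)
```

Even with only two time steps, the population average yields intermediate scores that a single neuron's firing rate could not express, which is the shift from temporal to spatial precision described above.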

In a nutshell, we reduce the demand for a long time window from the above two aspects, which in some sense extends the representation capability of SNNs on both the input and output sides. We found that a similar coding scheme is also used by previous work on ANN compression [Tang, Hua, and Wang2017, Hubara, Soudry, and Ran2016]. It implies that, regardless of the lower internal precision, keeping the first and last layers at higher precision is important for convergence and performance.

Overall training implementation

We define a loss function $L$ measuring the mean square error between the averaged voting results and the label vector $Y$ within a given time window $T$

$$L = \left\| Y - \frac{1}{T} \sum_{t=1}^{T} M o^{t,N} \right\|_2^2 \tag{11}$$

where $o^{t,N}$ denotes the voting vector of the last layer $N$ at time step $t$, and $M$ denotes the constant voting matrix connecting each voting neuron to a specific category.
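The loss can be sketched in a few lines of plain Python: average the voting response over the time window, then take the squared distance to the label vector. The function name `voting_loss`, the one-hot label, and the example values are assumptions for illustration.

```python
def voting_loss(spike_trains, voting_matrix, label):
    """Squared error between the time-averaged votes and the label vector.

    spike_trains[t][k] : 0/1 output of voting neuron k at time step t
    voting_matrix[c]   : constant weights mapping voting neurons to class c
    label[c]           : target value for class c (one-hot here)
    """
    num_steps = len(spike_trains)
    loss = 0.0
    for c, row in enumerate(voting_matrix):
        avg_vote = sum(
            sum(m * s for m, s in zip(row, step)) for step in spike_trains
        ) / num_steps
        loss += (label[c] - avg_vote) ** 2
    return loss


voting_matrix = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
spike_trains = [[1, 1, 0, 0], [1, 0, 0, 1]]
print(voting_loss(spike_trains, voting_matrix, [1.0, 0.0]))
```

In this example the averaged votes are 0.75 and 0.25, so each class contributes an error of 0.25 squared.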

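From the explicitly iterative LIF model, we can see that the spike signals not only propagate through the layer-by-layer spatial domain, but also affect the neuronal states through the temporal domain. Thus, gradient-based training should consider the derivatives in both domains. Based on this analysis, we integrate our modified LIF model, coding scheme, and proposed NeuNorm into the STBP method [Wu et al.2018] to train our network. When calculating the derivative of the loss function $L$ with respect to $u$ and $o$ in the $n$-th layer at time step $t$, STBP propagates the gradients $\frac{\partial L}{\partial o^{t,n+1}}$ from the $(n+1)$-th layer and $\frac{\partial L}{\partial o^{t+1,n}}$ from time step $t+1$ as follows

$$\frac{\partial L}{\partial o^{t,n}_{i}} = \sum_{j=1}^{l(n+1)} \frac{\partial L}{\partial o^{t,n+1}_{j}} \frac{\partial o^{t,n+1}_{j}}{\partial o^{t,n}_{i}} + \frac{\partial L}{\partial o^{t+1,n}_{i}} \frac{\partial o^{t+1,n}_{i}}{\partial o^{t,n}_{i}} \tag{12}$$

$$\frac{\partial L}{\partial u^{t,n}_{i}} = \frac{\partial L}{\partial o^{t,n}_{i}} \frac{\partial o^{t,n}_{i}}{\partial u^{t,n}_{i}} + \frac{\partial L}{\partial o^{t+1,n}_{i}} \frac{\partial o^{t+1,n}_{i}}{\partial u^{t,n}_{i}} \tag{13}$$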
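Require: Network inputs for each time step, class label $Y$, parameters and states of the convolutional layers (weights $W$, NeuNorm weights $U$, initial potentials $u$, initial spikes $o$) and of the fully-connected layers (weights $W$, initial potentials $u$, initial spikes $o$), simulation window $T$, voting matrix $M$ // Map each voting neuron to label.
Ensure: Update network parameters.
Forward (inference):
for $t = 1$ to $T$ do
    encode the input at step $t$ with EncodingLayer() to obtain the first-layer spikes
    for each convolutional layer $l$ do
        update the auxiliary state $x$ with AuxiliaryUpdate() // Eq. (9).
        update ($u$, $o$) of layer $l$ with StateUpdate() // Eq. (6)-(7), and (9)-(10)
    end for
    for each fully-connected layer $l$ do
        update ($u$, $o$) of layer $l$ with StateUpdate() // Eq. (5)-(6)
    end for
end for
Loss:
    $L \leftarrow$ ComputeLoss($Y$, $M$, voting-layer spikes) // Eq. (11).
Backward:
Gradient initialization: set the gradients propagated from time step $T+1$ to zero.
for $t = T$ to 1 do
    compute the gradient of $L$ at the voting layer with LossGradient() // Eq. (5)-(6), and (11)-(13).
    for each fully-connected layer $l$, from the last down to 1, do
        compute the gradients of the states and weights with BackwardGradient() // Eq. (5)-(6), and (12)-(13).
    end for
    for each convolutional layer $l$, from the last down to 2, do
        compute the gradients of the states, weights, and NeuNorm weights with BackwardGradient() // Eq. (6)-(7), (9)-(10), and (12)-(13).
    end for
end for
Update parameters based on gradients.
Note: All the parameters and states with layer index $l$ in the convolutional or fully-connected loop belong to the convolutional or fully-connected layers, respectively. For clarity, we just use the symbol $l$.
Algorithm 2 Training codes for one iteration.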

A critical problem in training SNNs is the non-differentiable property of the binary spike activities. To make its gradient available, we use a rectangular function to approximate the derivative of the spike activity [Wu et al.2018]. It yields

$$h(u) = \frac{1}{a}\, \mathrm{sign}\left(|u - V_{th}| < \frac{a}{2}\right) \tag{14}$$

where the width parameter $a$ determines the shape of $h(u)$. Theoretically, it can be easily proven that Eq. (14) satisfies

$$\lim_{a \to 0^{+}} h(u) = \frac{df}{du} \tag{15}$$

Based on the above methods, we also give pseudo code for implementing the overall training of the proposed SNNs in Pytorch, as shown in Algorithm 2.
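The rectangular approximation of the spike derivative mentioned above can be sketched as follows. The forward pass keeps the hard threshold, while the backward pass would use a box of a given width and unit area centered on the threshold. The function names and the values of the threshold (0.75) and width (0.5) are assumptions for illustration, and the sketch does not hook into any automatic differentiation library.

```python
def spike(u, v_th):
    """Forward pass: hard threshold, 1 when the potential reaches v_th."""
    return 1.0 if u >= v_th else 0.0


def rect_surrogate(u, v_th, a):
    """Backward pass stand-in: a box of width a and height 1/a centered on v_th."""
    return 1.0 / a if abs(u - v_th) < a / 2 else 0.0


v_th, a = 0.75, 0.5
print(rect_surrogate(0.75, v_th, a))  # inside the box
print(rect_surrogate(1.10, v_th, a))  # outside the box

# Midpoint Riemann sum over [0, 2): the area under the box stays close to 1 for any width.
step = 1.0 / 1024
area = sum(rect_surrogate((k + 0.5) * step, v_th, a) * step for k in range(2048))
print(area)
```

Keeping the area at 1 while shrinking the width is what lets the box approach the impulse-like derivative of the step function.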


We test the proposed model and learning algorithm on both neuromorphic datasets (N-MNIST and DVS-CIFAR10) and non-spiking datasets (CIFAR10) from two aspects: (1) training acceleration; (2) application accuracy. The dataset introduction, pre-processing, training detail, and parameter configuration are summarized in Appendix.

Network structure

Tab. 1 and Tab. 2 provide the network structures for acceleration analysis and accuracy evaluation, respectively. The structure illustrations of what we call AlexNet and CIFARNet are also shown in Appendix, which are for fair comparison to pre-trained SNN works with similar structure and to demonstrate comparable performance with ANNs, respectively. It is worth emphasizing that previous work on direct training of SNNs demonstrated only shallow structures (usually 2-4 layers), while our work for the first time can implement a direct and effective learning for larger-scale SNNs (e.g. 8 layers).

Neuromorphic Dataset
Small 128C3(Encoding)-AP2-128C3-AP2-512FC-Voting
Middle 128C3(Encoding)-128C3-AP2-256-AP2-1024FC-Voting
Large 128C3(Encoding)-128C3-AP2-384C3-384C3-AP2-
Non-spiking Dataset
Small 128C3(Encoding)-AP2-256C3-AP2-256FC-Voting
Middle 128C3(Encoding)-AP2-256C3-512C3-AP2-512FC-Voting
Large 128C3(Encoding)-256C3-AP2-512C3-AP2-1024C3-512C3-
Table 1: Network structures used for training acceleration.
Neuromorphic Dataset
Our model 128C3(Encoding)-128C3-AP2-128C3-256C3-AP2-
Non-spiking Dataset
AlexNet 96C3(Encoding)-256C3-AP2-384C3-AP2-384C3-
CIFARNet 128C3(Encoding)-256C3-AP2-512C3-AP2-1024C3-
Table 2: Network structures used for accuracy evaluation.

Training acceleration

Runtime. Since Matlab is a high-level language widely used in SNN community, we adopt it for the comparison with our Pytorch implementation. For fairness, we made several configuration restrictions, such as software version, parameter setting, etc. More details can be found in Appendix.

Fig. 3 shows the comparison of average runtime per epoch, where a batch size of 20 is used for simulation. Pytorch provides tens of times acceleration on all three datasets. This improvement may be attributed to the specialized optimizations for the convolution operation in Pytorch. In contrast, these optimizations are currently not well supported in most existing SNN platforms. Hence, building spiking models may benefit a lot from DL platforms.

Figure 3: Average runtime per epoch.

Network scale. Without acceleration, it is difficult to extend most existing works to a large scale. Fortunately, the fast implementation in Pytorch facilitates the construction of deep SNNs. This makes it possible to investigate the influence of network size on model performance. Although this has been widely studied in the ANN field, it has yet to be demonstrated on SNNs. To this end, we compare the accuracy under different network sizes, shown in Fig. 4. As the size increases, SNNs show a clear tendency toward higher accuracy, which is consistent with ANNs.

Figure 4: Influence of network scale. Three different network scales are defined in Tab.1.

Simulation length. SNNs need enough simulation steps to mimic neural dynamics and encode information. Given a simulation length $T$, we need to repeat the inference process $T$ times to count the firing rate, so the network computation cost can be denoted as $O(T)$. For deep SNNs, previous work usually requires 100 or even 1000 steps to reach good performance [Sengupta et al.2018], which brings a huge computational cost. Fortunately, with the coding scheme proposed in this work, the simulation length can be significantly reduced without much accuracy degradation. As shown in Fig. 5, although a longer simulation window leads to better results, our method requires only a short simulation length to achieve satisfactory results. Notably, even in the extreme case of a one-step simulation, it still achieves reasonable performance with significantly faster response and lower energy, which is promising for application scenarios with extreme restrictions on response time and energy consumption.

Figure 5: Influence of simulation length on CIFAR10.

Application accuracy

Neuromorphic datasets. Tab. 3 records the current state-of-the-art results on N-MNIST datasets. [Li2018] proposes a technique to restore the N-MNIST dataset back to the static MNIST, and achieves 99.23% accuracy using non-spiking CNNs. Our model can naturally handle raw spike streams using direct training and achieves significantly better accuracy. Furthermore, by adding NeuNorm technique, the accuracy can be improved up to 99.53%.

Model Method Accuracy
[Neil, Pfeiffer, and Liu2016] LSTM 97.38%
[Lee, Delbruck, and Pfeiffer2016] Spiking NN 98.74%
[Wu et al.2018] Spiking NN 98.78%
[Jin, Li, and Zhang2018] Spiking NN 98.93%
[Neil and Liu2016] Non-spiking CNN 98.30%
[Li2018] Non-spiking CNN 99.23%
Our model without NeuNorm 99.44%
Our model with NeuNorm 99.53%
Table 3: Comparison with existing results on N-MNIST.
Model Method Accuracy
[Orchard et al.2015a] Random Forest 31.0%
[Lagorce et al.2017] HOTS 27.1%
[Sironi et al.2018] HAT 52.4%
[Sironi et al.2018] Gabor-SNN 24.5%
Our model without NeuNorm 58.1%
Our model with NeuNorm 60.5%
Table 4: Comparison with existing results on DVS-CIFAR10.

Tab. 4 further compares the accuracy on the DVS-CIFAR10 dataset. DVS-CIFAR10 is more challenging than N-MNIST because of its larger scale, and it is also more challenging than non-spiking CIFAR10 because it has fewer samples and a noisy environment (see the dataset introduction in the Appendix). Our model achieves the best performance with 60.5%. Moreover, experimental results indicate that NeuNorm can speed up convergence. Without NeuNorm, 157 training epochs are required to reach the best accuracy of 58.1%, while with NeuNorm this is reduced to 103.

Non-spiking dataset. Tab. 5 summarizes the results of existing state-of-the-art results and our CIFARNet on CIFAR10 dataset. Prior to our work, the best direct training method only achieves 75.42% accuracy. Our model is significantly better than this result, with 15% improvement (reaching 90.53%). We also make a comparison with non-spiking ANN model and other pre-trained works (ANNs-to-SNNs conversion) on the similar structure of AlexNet, wherein our direct training of SNNs with NeuNorm achieves slightly better accuracy.

Model Method Accuracy
[Panda and Roy2016] Spiking NN 75.42%
[Cao, Chen, and Khosla2015] Pre-trained SNN 77.43%
[Rueckauer et al.2017] Pre-trained SNN 90.8%
[Sengupta et al.2018] Pre-trained SNN 87.46%
Baseline Non-spiking NN 90.49%
Our model without NeuNorm 89.83%
Our model with NeuNorm 90.53%
Table 5: Comparison with existing state-of-the-art results on non-spiking CIFAR10.
Model Method Accuracy
[Hunsberger and Eliasmith2015] Non-spiking NN 83.72%
[Hunsberger and Eliasmith2015] Pre-trained SNN 83.52%
[Sengupta et al.2018] Pre-trained SNN 83.54%
Our model 85.24%
Table 6: Comparison with previous results using similar AlexNet structure on non-spiking CIFAR10.
Figure 6: Confusion matrix of voting output with or without NeuNorm. High values along the diagonal indicate correct recognition whereas high values anywhere else indicate confusion between two categories.

Furthermore, to visualize the differences of learning with or without NeuNorm, Fig. 6 shows the average confusion matrix of network voting results on CIFAR10 over 500 randomly selected images. Each location in the 10×10 tiles is determined by the voting output and the actual labels. High values along the diagonal indicate correct recognition whereas high values anywhere else indicate confusion between two categories. Using NeuNorm brings a clearer contrast (i.e. higher values along the diagonal), which implies that NeuNorm enhances the separation among the 10 classes. Together with the results in Tab. 3-5, this confirms the effectiveness of the proposed NeuNorm for performance improvement.
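One possible reading of the averaged confusion matrix is sketched below: for each actual label, average the voting output vectors of all images with that label, so that row r shows how strongly images of class r vote for every class. The function name and the toy votes are assumptions; the source does not spell out the exact computation.

```python
def average_voting_matrix(votes, labels, num_classes):
    """Row r is the mean voting vector over all samples whose actual label is r."""
    sums = [[0.0] * num_classes for _ in range(num_classes)]
    counts = [0] * num_classes
    for vote, label in zip(votes, labels):
        counts[label] += 1
        for c in range(num_classes):
            sums[label][c] += vote[c]
    # Classes with no samples are left as rows of zeros.
    return [
        [s / counts[r] if counts[r] else 0.0 for s in sums[r]]
        for r in range(num_classes)
    ]


# Three toy samples with two classes; each vote is a vector of class scores.
votes = [[0.75, 0.25], [0.25, 0.75], [0.5, 0.5]]
labels = [0, 1, 1]
matrix = average_voting_matrix(votes, labels, 2)
print(matrix)
```

Under this reading, a sharper diagonal means the populations for the correct class respond more strongly than the others.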


In this paper, we present a direct training algorithm for deeper and larger SNNs with high performance. We propose NeuNorm to effectively normalize the neuronal activities and improve performance. Besides that, we optimize the rate coding on both the encoding and decoding sides, and convert the original continuous LIF model into an explicitly iterative version for a friendly Pytorch implementation. Finally, through tens of times training acceleration and a larger network scale, we achieve the best accuracy on neuromorphic datasets and accuracy comparable with ANNs on non-spiking datasets. To the best of our knowledge, this is the first report of such high performance with direct training of SNNs. The implementation on a mainstream ML framework could facilitate SNN development.


The work was supported by the Project of NSFC (No. 61836004, 61620106010, 61621136008, 61332007, 61327902, 61876215), the National Key Research and Development Program of China (No. 2017YFA0700900), the Brain-Science Special Program of Beijing (Grant Z181100001518006), the Suzhou-Tsinghua innovation leading program 2016SZ0102, and the Project of NSF (No. 1725447, 1730309).


Supplementary material

In this supplementary material, we provide the details of our experiment, including the dataset introduction, training setting, and programming platform comparison.

Appendix A Dataset Introduction

Neuromorphic dataset

N-MNIST converts the frame-based MNIST handwritten digit dataset into its DVS (dynamic vision sensor) version [Orchard et al.2015b] with event streams (in Fig. 7). For each sample, the DVS scans the static MNIST image along given directions and collects the generated spike train, which is triggered by detecting the intensity change of pixels. Since the intensity change has two directions (increase or decrease), the DVS can capture two kinds of spike events, denoted as On-event and Off-event. Because of the relative shift of images during the moving process, the pixel dimension is expanded to 34×34. Overall, each sample in N-MNIST is a spatio-temporal spike pattern with size of $34 \times 34 \times 2 \times T$, where $T$ is the length of the temporal window.

DVS-CIFAR10 converts 10000 static CIFAR10 images into the format of spike trains (in Fig. 7), consisting of 1000 images per class with size of $128 \times 128$ for each image. Since different DVS types and different movement paths are used [Li et al.2017], the generated spike train contains imbalanced spike events and a larger image resolution. We adopt different parameter configurations, shown in Tab. 7. DVS-CIFAR10 has 6 times fewer samples than the original CIFAR10 dataset, and we randomly choose 9000 images for training and 1000 for testing.

Figure 7: Illustration of neuromorphic datasets. The upper image and the lower image are sampled from N-MNIST and DVS-CIFAR10, respectively. Each sub-picture shows a 5 ms-width spike train.

Non-spiking dataset

CIFAR10 is widely used in the ANN domain and contains 60000 color images belonging to 10 classes, with size of $32 \times 32 \times 3$ for each image. We divide it into 50000 training images and 10000 test images as usual.

Appendix B Training setting

Data pre-processing

On N-MNIST, we reduce the time resolution by accumulating the spike train within every 5 ms. On DVS-CIFAR10, we reduce the spatial resolution by down-sampling the original 128×128 size to 42×42 size (stride = 3, padding = 0), and reduce the temporal resolution by similarly accumulating the spike train within every 5 ms. On CIFAR10, as usual, we first crop and flip the original images along each RGB channel, and then rescale each image by subtracting the global mean value of pixel intensity and dividing by the resulting standard deviation along each RGB channel.
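The two reductions described for the neuromorphic data can be sketched as follows. The event format `(t_ms, x, y, polarity)`, the choice to mark a pixel as 1 when any event falls in a 5 ms window, and the kernel size of 3 in the size check are all assumptions; the text gives only the stride, the padding, and the bin width.

```python
def output_size(size, kernel, stride, padding=0):
    """Spatial size after a strided window operation (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def accumulate_events(events, bin_ms, num_bins, height, width):
    """Collapse an event stream into binary frames indexed [bin][polarity][y][x]."""
    frames = [
        [[[0] * width for _ in range(height)] for _ in range(2)]
        for _ in range(num_bins)
    ]
    for t_ms, x, y, polarity in events:
        b = int(t_ms // bin_ms)
        if 0 <= b < num_bins:  # events beyond the window are dropped
            frames[b][polarity][y][x] = 1
    return frames


# 128 -> 42 with stride 3 and no padding, assuming a window (kernel) of size 3.
print(output_size(128, kernel=3, stride=3))

# Three toy events on a 2x2 sensor, binned into two 5 ms frames.
events = [(0.5, 1, 0, 1), (7.2, 0, 1, 0), (12.0, 0, 0, 1)]
frames = accumulate_events(events, bin_ms=5, num_bins=2, height=2, width=2)
print(frames)
```

Binning trades temporal resolution for fewer simulation steps, which matters because the computation cost grows with the number of steps.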


On neuromorphic datasets, we use the Adam (adaptive moment estimation) optimizer. On the non-spiking dataset, we use the stochastic gradient descent (SGD) optimizer with momentum 0.9 and a learning rate that is decayed every 40 epochs.

Figure 8: Illustration of network structures. For simplicity, we denote the left as AlexNet and the right as CIFARNet.
0.75 0.05 0.25
1.0 0.1 0.25
0.25 0.35 0.3
0.9 0.9 0.9
Dropout rate 0.5 0 0
Max epoch 150 200 200
Adam = 0.9,0.999, 1-
Table 7: Parameter configuration used for model evaluation.

Parameter configuration

The configuration of simulation parameters for each dataset is shown in Tab. 7. The structures of what we call AlexNet and CIFARNet are illustrated in Fig. 8.

Appendix C Details for Acceleration experiment

Runtimes are averaged over a total of 10 experiments. Considering the slow running time on Matlab, 10 simulation steps and a batch size of 20 per epoch are used. All code runs on a server with an i7-6700K CPU and a GTX1060 GPU. The Pytorch version is 3.5 and the Matlab version is 2018b. Both enable GPU execution, but without parallelization.

It is worth noting that more support for deep learning has recently emerged on Matlab, either official (since the 2017 version, Matlab provides deep learning libraries) or unofficial (e.g. the MatConvNet toolbox, a MATLAB toolbox designed for implementing CNNs). However, their highly encapsulated function modules make it hard for users to customize deep SNNs. Therefore, for flexibility, all convolution operations we adopted are based on the officially provided operations (i.e. a built-in function).