Spatio-Temporal Backpropagation for Training High-performance Spiking Neural Networks

06/08/2017 ∙ by Yujie Wu, et al. ∙ Tsinghua University

Compared with artificial neural networks (ANNs), spiking neural networks (SNNs) are promising for exploring brain-like behaviors, since spikes can encode more spatio-temporal information. Although pre-training from an ANN or direct training based on backpropagation (BP) makes the supervised training of SNNs possible, these methods only exploit the networks' spatial domain information, which leads to a performance bottleneck and requires many complicated training skills. Another fundamental issue is that the spike activity is naturally non-differentiable, which causes great difficulties in training SNNs. To this end, we build an iterative LIF model that is more friendly to gradient descent training. By simultaneously considering the layer-by-layer spatial domain (SD) and the timing-dependent temporal domain (TD) in the training phase, as well as an approximated derivative for the spike activity, we propose a spatio-temporal backpropagation (STBP) training framework without using any complicated technology. We achieve the best multi-layered perceptron (MLP) performance compared with existing state-of-the-art algorithms on the static MNIST and the dynamic N-MNIST datasets, as well as on a custom object detection dataset. This work provides a new perspective for exploring high-performance SNNs for future brain-like computing paradigms with rich spatio-temporal dynamics.




I Introduction

Deep neural networks (DNNs) have achieved outstanding performance in diverse areas [1, 2, 3, 4, 5], whereas the brain appears to use a different network architecture, spiking neural networks, to realize various complicated cognitive functions [6, 7, 8]. Compared with existing DNNs, SNNs have two main advantages: 1) the spike patterns flowing through SNNs fundamentally code more spatio-temporal information, while most DNNs, especially the widely used feedforward DNNs, lack timing dynamics; and 2) the event-driven paradigm of SNNs makes them more hardware friendly, and it has been adopted by many neuromorphic platforms [9, 10, 11, 12, 13, 14].

However, training SNNs remains challenging because of their complicated dynamics and the non-differentiable nature of the spike activity. In summary, there are three kinds of training methods for SNNs: 1) unsupervised learning; 2) indirect supervised learning; and 3) direct supervised learning. The first originates from biological synaptic plasticity for weight modification, such as spike timing dependent plasticity (STDP) [15, 16, 17]. Because it only considers local neuronal activities, it has difficulty achieving high performance. The second first trains an ANN and then transforms it into an SNN with the same network structure, where the spiking rate of the SNN neurons acts as the analog activity of the ANN neurons [18, 19, 20, 21]. This is not a bio-plausible way to explore the learning nature of SNNs. The most promising approach to high-performance training is the recent direct supervised learning based on gradient descent with error backpropagation. However, such methods only consider the layer-by-layer spatial domain and ignore the dynamics in the temporal domain [22, 23]. Therefore, many complicated training skills, such as fixed-amount-proportional reset, lateral inhibition, error normalization and weight/threshold regularization, are required to improve performance [19, 24, 23]. Thus, a more general dynamic model and learning framework for SNNs are highly required.

In this paper, we propose a direct supervised learning framework for SNNs that combines both the spatial domain (SD) and the temporal domain (TD) in the training phase. First, we build an iterative LIF model that retains SNN dynamics but is friendly to gradient descent training. Then we consider both the spatial and temporal directions during error backpropagation, i.e., spatio-temporal backpropagation (STBP), which significantly improves network accuracy. Furthermore, we introduce an approximated derivative to address the non-differentiability of the spike activity. We test our SNN framework using fully connected and convolutional architectures on the static MNIST and a custom object detection dataset, as well as the dynamic N-MNIST dataset. Many complicated training skills generally required by existing schemes can be avoided, because our method makes full use of the spatio-temporal domain (STD) information that captures the nature of SNNs. Experimental results show that our method achieves the best accuracy on both static and dynamic datasets compared with existing state-of-the-art algorithms. The influence of TD dynamics and of different derivative approximation methods is systematically analyzed. This work opens a way to explore high-performance SNNs for future brain-like computing paradigms with rich STD dynamics.

II Method and Material

Fig. 1: Illustration of the spatio-temporal characteristics of SNNs. Besides the layer-by-layer spatial dataflow as in ANNs, SNNs are known for their rich temporal dynamics and non-volatile potential integration. However, existing training algorithms consider only either the spatial domain, such as the supervised ones via backpropagation, or the temporal domain, such as the unsupervised ones via timing-based plasticity, which causes a performance bottleneck. Therefore, a learning framework that makes full use of the spatio-temporal domain (STD) is fundamentally required for high-performance SNNs, and this forms the main motivation of this work.

II-A Iterative Leaky Integrate-and-Fire Model in Spiking Neural Networks

Compared with existing deep neural networks, spiking neural networks fundamentally code more spatio-temporal information for two reasons: i) SNNs can have deep architectures like DNNs, and ii) each neuron has its own neuronal dynamics. The former grants SNNs rich spatial domain information, while the latter gives SNNs the power to encode temporal domain information. However, there is currently no unified framework that trains SNNs as effectively as backpropagation (BP) trains DNNs while accounting for the spatio-temporal dynamics. This has hindered the extensive use of SNNs in various applications. In this work, we present a framework based on an iterative leaky integrate-and-fire (LIF) model that enables us to apply spatio-temporal backpropagation to train spiking neural networks.

LIF is the most widely applied model for describing the neuronal dynamics in SNNs, and it can be simply governed by

$$\tau \frac{du(t)}{dt} = -u(t) + I(t) \tag{1}$$

where $u(t)$ is the neuronal membrane potential at time $t$, $\tau$ is a time constant and $I(t)$ denotes the pre-synaptic input, which is determined by the pre-synaptic neuronal activities or external injections and the synaptic weights. When the membrane potential exceeds a given threshold $V_{th}$, the neuron fires a spike and resets its potential to $u_{reset}$. As shown in Figure 1, the forward dataflow of the SNN propagates layer by layer in the SD as in DNNs, and the self-feedback injection at each neuron node generates non-volatile integration in the TD. In this way, the whole SNN runs with complex STD dynamics and codes spatio-temporal information into the spike pattern. Existing training algorithms consider only either the SD, such as the supervised ones via backpropagation, or the TD, such as the unsupervised ones via timing-based plasticity, which causes a performance bottleneck. Therefore, a learning framework that makes full use of the STD is fundamentally required for high-performance SNNs, and this forms the main motivation of this work.

However, directly using the analytic solution of the LIF model in (1) makes it inconvenient to train SNNs with backpropagation, because the whole network presents complex dynamics in both the SD and TD. To address this issue, the following event-driven iterative updating rule

$$u(t) = u(t_{i-1})\, e^{\frac{t_{i-1} - t}{\tau}} + I(t) \tag{2}$$

can be used to approximate the neuronal potential $u(t)$ in (1) based on the last spiking moment $t_{i-1}$ and the pre-synaptic input $I(t)$. The membrane potential decays exponentially until the neuron receives pre-synaptic inputs, and a new update round starts once the neuron fires a spike. That is, the neuronal states are co-determined by the spatial accumulation of $I(t)$ and the leaky temporal memory of $u(t_{i-1})$.
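To make the event-driven iterative updating rule above concrete, the following Python sketch steps a single LIF neuron on a discrete time grid. It is a minimal illustration under stated assumptions, not the paper's simulator: it assumes a fixed step `dt`, one input value per step, and a reset to `u_reset` after each spike; the function name `lif_event_driven` and all parameter values are hypothetical. Between spikes, repeatedly applying the per-step decay `exp(-dt / tau)` compounds into the exponential factor applied to the potential carried over since the last update.

```python
import math

def lif_event_driven(inputs, tau=5.0, v_th=1.0, u_reset=0.0, dt=1.0):
    """Simulate one LIF neuron with a discrete-time reading of the
    event-driven rule (hypothetical helper, illustrative parameters).

    inputs: one pre-synaptic input value I(t) per time step.
    Returns the membrane potential at each step (before any reset)
    and the binary spike train.
    """
    decay = math.exp(-dt / tau)   # leak applied over one step of length dt
    u = u_reset
    potentials, spikes = [], []
    for i_t in inputs:
        u = u * decay + i_t       # leaky temporal memory plus new input
        potentials.append(u)
        fired = u >= v_th
        spikes.append(1 if fired else 0)
        if fired:
            u = u_reset           # a new update round starts after a spike
    return potentials, spikes

_, train = lif_event_driven([0.3] * 10)
print(train)  # prints [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
```

With a constant input of 0.3, the potential needs five steps to climb past the threshold of 1.0, so the neuron fires periodically; a stronger input or a larger `tau` (slower leak) shortens that period.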

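The efficiency of error backpropagation for training DNNs benefits greatly from the iterative representation of gradient descent, which yields the chain rule for layer-by-layer error propagation in the SD backward pass. This motivates us to propose an iterative LIF-based SNN in which the iterations occur in both the SD and TD as follows:

$$x_i^{t+1,n} = \sum_{j=1}^{l(n-1)} w_{ij}^{n}\, o_j^{t+1,n-1} \tag{3}$$

$$u_i^{t+1,n} = u_i^{t,n}\, f(o_i^{t,n}) + x_i^{t+1,n} + b_i^{n} \tag{4}$$

$$o_i^{t+1,n} = g(u_i^{t+1,n}) \tag{5}$$

where

$$f(x) = \tau e^{-\frac{x}{\tau}} \tag{6}$$

$$g(x) = \begin{cases} 1, & x \ge V_{th} \\ 0, & x < V_{th} \end{cases} \tag{7}$$

In the above formulas, the superscript $t$ denotes the time step $t$, and $n$ and $l(n)$ denote the $n$-th layer and the number of neurons in the $n$-th layer, respectively. $w_{ij}$ is the synaptic weight from the $j$-th neuron in the pre-synaptic layer to the $i$-th neuron in the post-synaptic layer, and $o_j \in \{0, 1\}$ is the neuronal output of the $j$-th neuron, where $o_j = 1$ denotes a spike and $o_j = 0$ denotes that nothing occurs. $x_i$ is a simplified representation of the pre-synaptic inputs of the $i$-th neuron, similar to $I$ in the original LIF model. $u_i$ is the neuronal membrane potential of the $i$-th neuron and $b_i$ is a bias parameter related to the threshold $V_{th}$.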

Formulas (4)-(5) are also inspired by the LSTM model [25, 26, 27], using a forget gate to control the TD memory and an output gate to fire a spike. The forget gate controls how much of the potential memory leaks away in the TD, and the output gate generates a spike when it is activated. Specifically, for a small positive time constant $\tau$, $f(o_i^{t,n})$ can be approximated as

$$f(o_i^{t,n}) \approx \begin{cases} \tau, & o_i^{t,n} = 0 \\ 0, & o_i^{t,n} = 1 \end{cases} = \tau\,(1 - o_i^{t,n}) \tag{8}$$

since $\tau e^{-\frac{1}{\tau}} \approx 0$. In this way, the original LIF model is transformed into an iterative version in which the recursive relationships in both the SD and TD are clearly described, which is friendly to the subsequent gradient descent training in the STD.
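A minimal Python sketch of one fully connected layer under Eqs. (3)-(5) is shown below. It assumes the small-$\tau$ forget gate $f(o) \approx \tau(1 - o)$ (equal to $\tau$ without a spike and 0 after one), a zero initial state, and plain lists so that no external libraries are needed; the function names and example values are hypothetical.

```python
def g(u, v_th):
    """Output gate: fire when the potential reaches V_th (step function)."""
    return 1 if u >= v_th else 0

def f_approx(o, tau):
    """Forget gate under the small-tau approximation: tau if no spike, 0 after a spike."""
    return tau * (1 - o)

def iterative_lif_layer(spikes_in, weights, bias, tau=0.2, v_th=1.0):
    """Run one fully connected layer of the iterative LIF model.

    spikes_in: list over time steps of input spike vectors o^{t,n-1}
    weights:   weights[i][j] = w_ij^n, from input neuron j to neuron i
    bias:      bias[i] = b_i^n
    Returns the potentials u^{t,n} and spikes o^{t,n} for every step.
    """
    n_out = len(weights)
    u = [0.0] * n_out   # assumed resting state before the first step
    o = [0] * n_out     # no spike before the first step
    potentials, outputs = [], []
    for o_prev_layer in spikes_in:
        new_u, new_o = [], []
        for i in range(n_out):
            x = sum(w * s for w, s in zip(weights[i], o_prev_layer))  # Eq. (3)
            u_i = u[i] * f_approx(o[i], tau) + x + bias[i]            # Eq. (4)
            new_u.append(u_i)
            new_o.append(g(u_i, v_th))                                # Eq. (5)
        u, o = new_u, new_o
        potentials.append(u)
        outputs.append(o)
    return potentials, outputs

_, out = iterative_lif_layer([[1]] * 4, weights=[[0.6]], bias=[0.0], tau=0.5)
print(out)  # prints [[0], [0], [1], [0]]
```

Because the approximate forget gate is 0 right after a spike, the potential is reset automatically in Eq. (4), so the sketch needs no separate reset branch.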

II-B Spatio-Temporal Backpropagation Training

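To present the STBP training methodology, we define the following loss function $L$, in which the mean square error over all samples within a given time window $T$ is to be minimized:

$$L = \frac{1}{2S}\sum_{s=1}^{S}\left\| \mathbf{y}_s - \frac{1}{T}\sum_{t=1}^{T}\mathbf{o}_s^{t,N} \right\|_2^2 \tag{9}$$

where $\mathbf{y}_s$ and $\mathbf{o}_s$ denote the label vector of the $s$-th training sample and the neuronal output vector of the last layer $N$, respectively.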
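As a sketch, loss (9) can be computed from recorded output spike trains as below: the firing rate of each output neuron over the time window is compared with the label vector, and the squared errors are averaged over samples with the factor $1/(2S)$. The function name `stbp_loss` and the nested-list data layout are assumptions for illustration.

```python
def stbp_loss(labels, output_spikes):
    """Mean square error of Eq. (9) between labels and output firing rates.

    labels:        S label vectors y_s
    output_spikes: S spike trains; output_spikes[s][t] is the vector o_s^{t,N}
    """
    S = len(labels)
    total = 0.0
    for y, train in zip(labels, output_spikes):
        T = len(train)
        # (1/T) * sum_t o_s^{t,N}: firing rate of each output neuron
        rate = [sum(step[i] for step in train) / T for i in range(len(y))]
        total += sum((y_i - r_i) ** 2 for y_i, r_i in zip(y, rate))
    return total / (2 * S)

print(stbp_loss([[1, 0]], [[[1, 0], [0, 0]]]))  # prints 0.125
```

Since the loss only sees firing rates, every time step of the window contributes equally to the output error, which is why the output-layer derivative has the same direct term at every $t$.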

Fig. 2: Error propagation in the STD. (a) At the single-neuron level, the vertical path and horizontal path represent the error propagation in the SD and TD, respectively. (b) Similar propagation occurs at the network level, where the error in the SD requires the multiply-accumulate operation like the feedforward computation.

By combining equations (3)-(9), it can be seen that $L$ is a function of $W$ and $b$. Thus, the derivatives of $L$ with respect to $W$ and $b$ are required for the STBP algorithm based on gradient descent. Assume that we have obtained the derivatives $\partial L / \partial o_i$ and $\partial L / \partial u_i$ at each layer $n$ at time $t$, which is an essential step toward the final $\partial L / \partial W$ and $\partial L / \partial b$. Figure 2 describes the error propagation (which depends on these derivatives) in both the SD and TD at the single-neuron level (Figure 2.a) and the network level (Figure 2.b). At the single-neuron level, the propagation is decomposed into a vertical path in the SD and a horizontal path in the TD. The dataflow of error propagation in the SD is similar to typical BP for DNNs, i.e. each neuron accumulates the weighted error signals from the upper layer and iteratively updates the parameters in different layers; the dataflow in the TD, however, shares the same neuronal states, which makes it quite complicated to obtain the analytical solution directly. To solve this problem, we use the proposed iterative LIF model to unfold the state space in both the SD and TD directions, so that the states at different time steps in the TD can be distinguished, which enables the chain rule for iterative propagation. A similar idea can be found in the BPTT algorithm for training RNNs [28].

Now we discuss how to obtain the complete gradient based on the following four cases. First, we denote

$$\delta_i^{t,n} = \frac{\partial L}{\partial o_i^{t,n}} \tag{10}$$

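Case 1: $t = T$ at the output layer $n = N$.
In this case, the derivative $\partial L / \partial o_i^{T,N}$ can be obtained directly, since it depends only on the loss function in Eq.(9) of the output layer. We have

$$\frac{\partial L}{\partial o_i^{T,N}} = -\frac{1}{TS}\left(y_i - \frac{1}{T}\sum_{k=1}^{T} o_i^{k,N}\right) \tag{11}$$

The derivative with respect to $u_i^{T,N}$ is obtained through $o_i^{T,N}$:

$$\frac{\partial L}{\partial u_i^{T,N}} = \frac{\partial L}{\partial o_i^{T,N}}\frac{\partial o_i^{T,N}}{\partial u_i^{T,N}} = \delta_i^{T,N}\frac{\partial g}{\partial u_i^{T,N}} \tag{12}$$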

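Case 2: $t = T$ at the layers $n < N$.
In this case, the derivative $\partial L / \partial o_i^{T,n}$ depends iteratively on the error propagation in the SD at time $T$, as in the typical BP algorithm. We have

$$\frac{\partial L}{\partial o_i^{T,n}} = \sum_{j=1}^{l(n+1)} \delta_j^{T,n+1}\frac{\partial o_j^{T,n+1}}{\partial o_i^{T,n}} = \sum_{j=1}^{l(n+1)} \delta_j^{T,n+1}\frac{\partial g}{\partial u_j^{T,n+1}}\, w_{ji}^{n+1} \tag{13}$$

Similarly, the derivative $\partial L / \partial u_i^{T,n}$ yields

$$\frac{\partial L}{\partial u_i^{T,n}} = \frac{\partial L}{\partial o_i^{T,n}}\frac{\partial o_i^{T,n}}{\partial u_i^{T,n}} = \delta_i^{T,n}\frac{\partial g}{\partial u_i^{T,n}} \tag{14}$$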

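Case 3: $t < T$ at the output layer $n = N$.
In this case, the derivative $\partial L / \partial o_i^{t,N}$ depends on the error propagation in the TD direction. With the help of the proposed iterative LIF model in Eq.(3)-(5), by unfolding the state space in the TD, we acquire the required derivative based on the chain rule in the TD as follows:

$$\frac{\partial L}{\partial o_i^{t,N}} = \delta_i^{t+1,N}\frac{\partial o_i^{t+1,N}}{\partial o_i^{t,N}} + \frac{\partial L}{\partial o_i^{T,N}} \tag{15}$$

$$\frac{\partial L}{\partial o_i^{t,N}} = \delta_i^{t+1,N}\frac{\partial g}{\partial u_i^{t+1,N}}\, u_i^{t,N}\frac{\partial f}{\partial o_i^{t,N}} + \frac{\partial L}{\partial o_i^{T,N}} \tag{16}$$

$$\frac{\partial L}{\partial u_i^{t,N}} = \frac{\partial L}{\partial o_i^{t,N}}\frac{\partial o_i^{t,N}}{\partial u_i^{t,N}} + \frac{\partial L}{\partial u_i^{t+1,N}}\frac{\partial u_i^{t+1,N}}{\partial u_i^{t,N}} \tag{17}$$

$$\frac{\partial L}{\partial u_i^{t,N}} = \delta_i^{t,N}\frac{\partial g}{\partial u_i^{t,N}} + \frac{\partial L}{\partial u_i^{t+1,N}}\, f(o_i^{t,N}) \tag{18}$$

where $\frac{\partial L}{\partial o_i^{T,N}} = -\frac{1}{TS}\left(y_i - \frac{1}{T}\sum_{k=1}^{T} o_i^{k,N}\right)$ as in Eq.(11).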

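Case 4: $t < T$ at the layers $n < N$.
In this case, the derivative $\partial L / \partial o_i^{t,n}$ depends on the error propagation in both the SD and TD. On one hand, each neuron accumulates the weighted error signals from the upper layer in the SD, as in Case 2; on the other hand, each neuron also receives the error propagated from its self-feedback dynamics in the TD by iteratively unfolding the state space based on the chain rule, as in Case 3. So we have

$$\frac{\partial L}{\partial o_i^{t,n}} = \sum_{j=1}^{l(n+1)} \delta_j^{t,n+1}\frac{\partial o_j^{t,n+1}}{\partial o_i^{t,n}} + \frac{\partial L}{\partial o_i^{t+1,n}}\frac{\partial o_i^{t+1,n}}{\partial o_i^{t,n}} \tag{19}$$

$$\frac{\partial L}{\partial o_i^{t,n}} = \sum_{j=1}^{l(n+1)} \delta_j^{t,n+1}\frac{\partial g}{\partial u_j^{t,n+1}}\, w_{ji}^{n+1} + \delta_i^{t+1,n}\frac{\partial g}{\partial u_i^{t+1,n}}\, u_i^{t,n}\frac{\partial f}{\partial o_i^{t,n}} \tag{20}$$

$$\frac{\partial L}{\partial u_i^{t,n}} = \frac{\partial L}{\partial o_i^{t,n}}\frac{\partial o_i^{t,n}}{\partial u_i^{t,n}} + \frac{\partial L}{\partial u_i^{t+1,n}}\frac{\partial u_i^{t+1,n}}{\partial u_i^{t,n}} \tag{21}$$

$$\frac{\partial L}{\partial u_i^{t,n}} = \delta_i^{t,n}\frac{\partial g}{\partial u_i^{t,n}} + \frac{\partial L}{\partial u_i^{t+1,n}}\, f(o_i^{t,n}) \tag{22}$$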

Based on these four cases, the error propagation procedure follows Figure 2: in the SD, each neuron accumulates the weighted error signals from the upper layer, as in typical BP for DNNs, while in the TD the neuronal states are unfolded iteratively along the time direction, which enables chain-rule propagation. Finally, we obtain the derivatives with respect to $W^n$ and $\mathbf{b}^n$ as follows:

$$\frac{\partial L}{\partial W^{n}} = \sum_{t=1}^{T}\frac{\partial L}{\partial \mathbf{u}^{t,n}}\frac{\partial \mathbf{u}^{t,n}}{\partial W^{n}} = \sum_{t=1}^{T}\frac{\partial L}{\partial \mathbf{u}^{t,n}}\left(\mathbf{o}^{t,n-1}\right)^{\top}$$

$$\frac{\partial L}{\partial \mathbf{b}^{n}} = \sum_{t=1}^{T}\frac{\partial L}{\partial \mathbf{u}^{t,n}}\frac{\partial \mathbf{u}^{t,n}}{\partial \mathbf{b}^{n}} = \sum_{t=1}^{T}\frac{\partial L}{\partial \mathbf{u}^{t,n}}$$

where $\partial L / \partial \mathbf{u}^{t,n}$ can be obtained from Eq.(11)-(22). Given $\partial L / \partial W^n$ and $\partial L / \partial \mathbf{b}^n$ from STBP, we can use gradient descent optimization algorithms to train SNNs effectively and achieve high performance.
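The recursion can be illustrated for the simplest network: a single spiking layer that is also the output layer, so only Case 1 and Case 3 apply. The Python sketch below runs the forward pass of Eqs. (3)-(5) with the small-$\tau$ forget gate $f(o) \approx \tau(1 - o)$, so that $\partial f / \partial o = -\tau$; substitutes a rectangular window of width `a` and height `1/a` around the threshold for $\partial g / \partial u$ (one kind of approximation curve discussed in Section II-C); and accumulates the weight and bias gradients over time. All names and example values are hypothetical, it handles one training sample, and a multi-layer version would add the SD terms of Cases 2 and 4.

```python
def h_rect(u, v_th, a):
    """Rectangular surrogate for dg/du: height 1/a on a window of width a."""
    return 1.0 / a if abs(u - v_th) < a / 2 else 0.0

def stbp_output_layer_grads(x_spikes, weights, bias, y, tau=0.2, v_th=1.0, a=1.0, S=1):
    """STBP gradients for one output layer and one training sample.

    x_spikes: input spike vectors o^{t,N-1} for t = 1..T
    Returns (dL/dW, dL/db) accumulated over the time window.
    """
    T, n_out = len(x_spikes), len(weights)
    # Forward pass with f(o) ~= tau * (1 - o); keep states for the backward pass
    us, os_ = [], []
    u_prev, o_prev = [0.0] * n_out, [0] * n_out
    for x_t in x_spikes:
        u_t = [u_prev[i] * tau * (1 - o_prev[i])
               + sum(w * s for w, s in zip(weights[i], x_t)) + bias[i]
               for i in range(n_out)]
        o_t = [1 if v >= v_th else 0 for v in u_t]
        us.append(u_t)
        os_.append(o_t)
        u_prev, o_prev = u_t, o_t

    # Direct loss term -(y_i - rate_i) / (T S); identical for every time step
    rate = [sum(o_t[i] for o_t in os_) / T for i in range(n_out)]
    direct = [-(y[i] - rate[i]) / (T * S) for i in range(n_out)]

    delta = [[0.0] * n_out for _ in range(T)]   # dL/do^{t,N}
    du = [[0.0] * n_out for _ in range(T)]      # dL/du^{t,N}
    for t in reversed(range(T)):
        for i in range(n_out):
            if t == T - 1:
                # Case 1: only the loss term, then through the surrogate gate
                delta[t][i] = direct[i]
                du[t][i] = delta[t][i] * h_rect(us[t][i], v_th, a)
            else:
                # Case 3: error from step t+1 through o^{t+1} and u^{t+1}
                df_do = -tau
                delta[t][i] = (delta[t + 1][i] * h_rect(us[t + 1][i], v_th, a)
                               * us[t][i] * df_do + direct[i])
                du[t][i] = (delta[t][i] * h_rect(us[t][i], v_th, a)
                            + du[t + 1][i] * tau * (1 - os_[t][i]))

    # dL/db = sum_t dL/du^t and dL/dW = sum_t dL/du^t (o^{t,N-1})^T
    grad_b = [sum(du[t][i] for t in range(T)) for i in range(n_out)]
    grad_w = [[sum(du[t][i] * x_spikes[t][j] for t in range(T))
               for j in range(len(x_spikes[0]))] for i in range(n_out)]
    return grad_w, grad_b

gw, gb = stbp_output_layer_grads([[1]] * 4, [[0.6]], [0.0], y=[1.0], tau=0.5)
```

In this one-neuron example the neuron fires once in four steps (rate 0.25); a target rate of 1.0 produces a negative weight gradient, so a gradient descent step would increase the weight, while a target equal to the observed rate makes every gradient vanish.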

II-C Derivative Approximation of the Non-differentiable Spike Activity

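In the previous sections, we have presented how to obtain the gradient information based on STBP, but the issue of non-differentiable points at each spiking time is yet to be addressed. The derivative of the output gate, $\partial g / \partial u$, is required for STBP training in Eq.(11)-(22). Theoretically, $g(u)$ is a step function, so $\partial g / \partial u$ is a Dirac delta function, which greatly challenges the effective learning of SNNs [23]. This function is zero everywhere except at the threshold, where it is infinite; this causes the gradient vanishing or exploding issue and disables error propagation. One existing method treats the discontinuities of the potential at spiking times as noise and claims that this is beneficial for model robustness [29, 23], but it does not directly address the non-differentiability of the spike activity. To this end, we introduce four curves, denoted by $h_1$, $h_2$, $h_3$ and $h_4$ in Figure 3.b, to approximate the derivative of the spike activity:

$$h_1(u) = \frac{1}{a_1}\,\mathbb{1}\!\left(|u - V_{th}| < \frac{a_1}{2}\right)$$

$$h_2(u) = \left(\frac{\sqrt{a_2}}{2} - \frac{a_2}{4}\,|u - V_{th}|\right)\mathbb{1}\!\left(|u - V_{th}| < \frac{2}{\sqrt{a_2}}\right)$$

$$h_3(u) = \frac{1}{a_3}\,\frac{e^{\frac{V_{th} - u}{a_3}}}{\left(1 + e^{\frac{V_{th} - u}{a_3}}\right)^2}$$

$$h_4(u) = \frac{1}{\sqrt{2\pi a_4}}\, e^{-\frac{(u - V_{th})^2}{2a_4}}$$

where $a_i$ determines the shape and steepness of the curve and $\mathbb{1}(\cdot)$ is the indicator function. In fact, $h_1$ is a rectangular window, and $h_2$, $h_3$ and $h_4$ are the derivatives of a polynomial function, the sigmoid function and the Gaussian cumulative distribution function, respectively. To be consistent with the Dirac delta function, whose integral is 1, we introduce the coefficient $a_i$ to ensure that the integral of each function is also 1. It can be shown that all the above candidates satisfy

$$\lim_{a_i \to 0^{+}} h_i(u) = \frac{dg}{du}, \quad i = 1, 2, 3, 4$$

Thus, $\partial g / \partial u$ in Eq.(11)-(22) for STBP can be approximated by

$$\frac{\partial g}{\partial u} \approx h_i(u)$$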

In Section III-C, we will analyze how the choice of curve and the value of its parameter $a_i$ influence SNN performance.
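The four approximation curves can be written directly in Python, and a simple numerical check confirms that each integrates to about 1, matching the unit area of the Dirac function they replace. The sketch assumes this parameterization: a rectangle of width $a_1$ and height $1/a_1$; a triangle with peak $\sqrt{a_2}/2$ and half-width $2/\sqrt{a_2}$; a sigmoid derivative with temperature $a_3$; and a Gaussian density with variance $a_4$, all centred at the threshold. The midpoint-rule helper `integrate` and the default threshold of 0 are illustrative assumptions.

```python
import math

def h1(u, v_th=0.0, a=1.0):
    """Rectangular window of width a and height 1/a."""
    return 1.0 / a if abs(u - v_th) < a / 2 else 0.0

def h2(u, v_th=0.0, a=1.0):
    """Triangular (polynomial) curve: peak sqrt(a)/2, half-width 2/sqrt(a)."""
    return max(0.0, math.sqrt(a) / 2 - a / 4 * abs(u - v_th))

def h3(u, v_th=0.0, a=1.0):
    """Derivative of a sigmoid with temperature a."""
    s = 1.0 / (1.0 + math.exp(-(u - v_th) / a))
    return s * (1.0 - s) / a

def h4(u, v_th=0.0, a=1.0):
    """Gaussian density with variance a (derivative of the Gaussian CDF)."""
    return math.exp(-(u - v_th) ** 2 / (2 * a)) / math.sqrt(2 * math.pi * a)

def integrate(h, a, lo=-20.0, hi=20.0, n=40000):
    """Midpoint-rule integral of h(., a=a) over [lo, hi]."""
    step = (hi - lo) / n
    return sum(h(lo + (k + 0.5) * step, a=a) for k in range(n)) * step

for h in (h1, h2, h3, h4):
    print(h.__name__, round(integrate(h, a=1.0), 3))
```

Shrinking `a` narrows and sharpens every curve while keeping its area at 1, which is the limit in which the curves approach the ideal derivative of the step function.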

Fig. 3: Derivative approximation of the non-differentiable spike activity. (a) Step activation function of the spike activity and its original derivative, a typical Dirac delta function that is infinite at the threshold and zero at all other points. This non-differentiable property disables error propagation. (b) Several typical curves to approximate the derivative of the spike activity.

III Results

III-A Parameter Initialization

The initialization of parameters, such as the weights, thresholds and other parameters, is crucial for stabilizing the firing activities of the whole network. We should ensure a timely response to pre-synaptic stimuli while avoiding too many spikes, which would reduce neuronal selectivity. The multiply-accumulate operation of the pre-synaptic spikes and weights and the threshold comparison are the two key computation steps in the forward pass. This indicates that the relative magnitude of the weights and thresholds determines the effectiveness of parameter initialization. In this paper, we fix the threshold to a constant in each neuron for simplicity, and only adjust the weights to control the activity balance. First, we initialize all the weight parameters by sampling from a uniform distribution


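$$W \sim U[-1, 1]$$

Then, we normalize these parameters by

$$w_{ij}^{n} \leftarrow \frac{w_{ij}^{n}}{\sqrt{\sum_{j=1}^{l(n-1)} \left(w_{ij}^{n}\right)^2}}, \quad i = 1, \dots, l(n)$$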
The other parameters are listed in Table I. Furthermore, none of the complex training skills used in [19, 23], such as fixed-amount-proportional reset, error normalization and weight/threshold regularization, is required in any of our simulations.
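The two initialization steps can be sketched in Python as follows. This assumes the weights are drawn from U[-1, 1] and that the normalization rescales each neuron's incoming weight vector (one row of the weight matrix) to unit L2 norm; the function name `init_layer_weights` and the layer sizes in the example are hypothetical.

```python
import math
import random

def init_layer_weights(n_out, n_in, seed=None):
    """Sample each weight from U[-1, 1], then rescale every row
    (the incoming weights of one neuron) to unit L2 norm."""
    rng = random.Random(seed)
    weights = []
    for _ in range(n_out):
        row = [rng.uniform(-1.0, 1.0) for _ in range(n_in)]
        norm = math.sqrt(sum(w * w for w in row)) or 1.0  # guard against an all-zero row
        weights.append([w / norm for w in row])
    return weights

W = init_layer_weights(400, 784, seed=0)   # e.g. the first layer of a 784-400-10 MLP
```

Because the threshold is held fixed, this per-neuron rescaling is the only knob in the sketch that controls how easily neurons fire at the start of training.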

Parameter | Value
--- | ---
Time window $T$ | 30 ms
Threshold $V_{th}$ (MNIST / object detection dataset / N-MNIST) | 1.5, 2.0, 0.2
Decay factor $\tau$ (MNIST / object detection dataset / N-MNIST) | 0.1 ms, 0.15 ms, 0.2 ms
Derivative approximation parameters $a_i$ (Figure 3) | 1.0
Simulation time step | 1 ms
Learning rate (SGD) | 0.5
Adam parameters | 0.9, 0.999, 1-

TABLE I: Parameter settings in our experiments

III-B Dataset Experiments

We test our SNN model and the STBP training method on various datasets, including the static MNIST and a custom object detection dataset, as well as the dynamic N-MNIST dataset. The input of the first layer must be a spike train, so the samples from the static datasets have to be converted into spike events. To this end, we use Bernoulli sampling to map each original pixel intensity to a spike rate.
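A minimal sketch of this Bernoulli rate coding in Python is given below. It assumes pixel intensities have already been scaled to [0, 1] (for example, 8-bit values divided by 255) and that one independent Bernoulli draw is made per pixel per simulation step; the function name `bernoulli_encode` is hypothetical.

```python
import random

def bernoulli_encode(pixels, T, seed=None):
    """Convert intensities in [0, 1] into T binary spike vectors.

    At every time step, pixel k emits a spike with probability pixels[k],
    so its expected firing rate equals its intensity.
    """
    rng = random.Random(seed)
    return [[1 if rng.random() < p else 0 for p in pixels] for _ in range(T)]

train = bernoulli_encode([0.0, 0.5, 1.0], T=30, seed=42)
```

Here `T=30` corresponds to the 30 ms time window with a 1 ms simulation step in Table I; black pixels never fire and saturated pixels fire at every step.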

III-B1 Spatio-temporal fully connected neural network

Static Dataset. The MNIST dataset of handwritten digits [30] (Figure 4.b) and a custom dataset for object detection [14] (Figure 4.a) are chosen to test our method. MNIST comprises a training set of 60,000 labelled handwritten digits and a testing set of another 10,000 labelled digits, which are generated from the postal codes of 0-9. Each digit sample is a 28×28 grayscale image. The object detection dataset is a two-category image dataset created by our lab for pedestrian detection. It includes 1509 training samples and 631 testing samples of 28×28 grayscale images. Each image sample is labelled 0 or 1 according to whether a pedestrian is present, as illustrated in Figure 4.a. The upper and lower sub-figures in Figure 4.c show the spike patterns of 25 input neurons converted from the central 5×5-pixel patch of a sample from the object detection dataset and MNIST, respectively. Figure 4.d shows an example of the output-layer spike pattern within 15 ms before and after STBP training, for a stimulus of digit 9. At the beginning, neurons in the output layer fire randomly, while after training the 10th neuron, which codes digit 9, fires most intensively, indicating that correct inference is achieved.

Fig. 4: Static dataset experiments. (a) A custom dataset for object detection. This dataset is a two-category image set built by our lab for pedestrian detection. Each image sample is labelled 0 or 1 according to whether a pedestrian is present. The images in the yellow boxes are labelled 1, and the rest are labelled 0. (b) MNIST dataset. (c) Raster plot of the spike pattern of 25 input neurons converted from the central 5×5-pixel patch of a sample from the object detection dataset (top) and MNIST (bottom). (d) Raster plot comparing the output spike pattern before and after STBP training for a digit 9 from the MNIST dataset.
Model | Network structure | Training skills | Accuracy
--- | --- | --- | ---
Spiking RBM (STDP) [31] | 784-500-40 | None | 93.16%
Spiking RBM (pre-training*) [20] | 784-500-500-10 | None | 97.48%
Spiking MLP (pre-training*) [19] | 784-1200-1200-10 | Weight normalization | 98.64%
Spiking MLP (BP) [22] | 784-200-200-10 | None | 97.66%
Spiking MLP (STDP) [15] | 784-6400 | None | 95.00%
Spiking MLP (BP) [23] | 784-800-10 | Error normalization / parameter regularization |
Spiking MLP (STBP) | 784-800-10 | None | 98.89%
  • We mainly compare with methods that have a similar network architecture; * means that the model is based on a pre-trained ANN.

TABLE II: Comparison with the state-of-the-art spiking networks with similar architecture on MNIST.

Table II compares our method with several other advanced results that use a similar MLP architecture on MNIST. Although we do not use any complex training skill, the proposed STBP training method outperforms all the reported results, achieving the best testing accuracy of 98.89%. Table III compares our model with a typical MLP on the object detection dataset. The contrast model is a typical artificial neural network (ANN), i.e. not an SNN, and in the following we use 'non-spiking network' to distinguish it. Our model achieves slightly better performance than the non-spiking MLP (98.34% vs. 98.31% mean accuracy). Note that the overall firing rate of the input spike train from the object detection dataset is higher than that from the MNIST dataset, so we increase its threshold to 2.0 in the simulation experiments.

Model | Network structure | Accuracy (mean)* | Accuracy (interval)*
--- | --- | --- | ---
Non-spiking MLP (BP) | 784-400-10 | 98.31% | [97.62%, 98.57%]
Spiking MLP (STBP) | 784-400-10 | 98.34% | [97.94%, 98.57%]
  • * Results over epochs [201, 210].

TABLE III: Comparison with the typical MLP over object detection dataset.

Dynamic Dataset. Compared with static datasets, dynamic datasets such as N-MNIST [32] contain richer temporal features and are therefore more suitable for exploiting the potential of SNNs. We use the N-MNIST database as an example to evaluate the capability of our STBP method on dynamic data. N-MNIST converts the static MNIST dataset into a dynamic spike-train version by using a dynamic vision sensor (DVS) [33]. For each original MNIST sample, the work [32] moves the DVS along the three sides of an isosceles triangle in turn (Figure 5.b) and collects the generated spike train, which is triggered by the intensity change at each pixel. Figure 5.a records the saccade results on digit 0. Each sub-graph records the spike train within 10 ms, and each 100 ms represents one saccade period. Because each pixel intensity can change in two directions (brighter or darker), the DVS captures two corresponding kinds of spike events, denoted on-events and off-events, respectively (Figure 5.c). Since N-MNIST allows relative shifts of the images during the saccade process, it produces a 34×34 pixel range. The spatio-temporal representation in Figure 5.c shows that the on-events and off-events are very different, so we use two channels to distinguish them. Therefore, the network structure is 34×34×2-400-400-10.
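One way to feed such event streams to the 34×34×2 input layer is to bin the events into binary frames, one per simulation step. The Python sketch below assumes each event has already been decoded into a `(timestamp_us, x, y, polarity)` tuple with polarity 1 for on-events and 0 for off-events; it does not parse the N-MNIST file format, and the channel order, bin width and function names are assumptions.

```python
def events_to_frames(events, T, dt_us, height=34, width=34):
    """Bin (timestamp_us, x, y, polarity) events into T binary frames.

    frames[t][c][y][x] is 1 if at least one event of channel c
    (0 = off-event, 1 = on-event) hit pixel (x, y) during step t.
    """
    frames = [[[[0] * width for _ in range(height)] for _ in range(2)]
              for _ in range(T)]
    for t_us, x, y, pol in events:
        step = int(t_us // dt_us)
        if 0 <= step < T and 0 <= x < width and 0 <= y < height:
            frames[step][1 if pol else 0][y][x] = 1
    return frames

def flatten(frame):
    """Flatten one [2][34][34] frame into a 2312-element input spike vector."""
    return [v for channel in frame for row in channel for v in row]

frames = events_to_frames([(0, 3, 4, 1), (1500, 5, 6, 0)], T=30, dt_us=1000)
```

With a 1 ms bin (`dt_us=1000`), each flattened frame supplies one time step of input spikes; events that fall beyond the time window are simply dropped.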

Fig. 5: Dynamic dataset of N-MNIST. (a) Each sub-picture shows a 10 ms-wide spike train during the saccades. (b) The spike train is generated by moving the dynamic vision sensor (DVS) in turn in directions 1, 2 and 3. (c) Spatio-temporal representation of the spike train from digit 0 [32], where the upper and lower plots denote the on-events and off-events, respectively.
Model | Network structure (input layer omitted) | Training skills | Accuracy
--- | --- | --- | ---
Non-spiking CNN (BP) [24] | - | None | 95.30%
Non-spiking CNN (BP) [34] | - | None | 98.30%
Non-spiking MLP (BP) [23] | 800-10 | None | 97.80%
LSTM (BPTT) [24] | - | Batch normalization | 97.05%
Phased-LSTM (BPTT) [24] | - | None | 97.38%
Spiking CNN (pre-training*) [34] | - | None | 95.72%
Spiking MLP (BP) [23] | 800-10 | Error normalization / parameter regularization |
Spiking MLP (BP) [35] | 10000-10 | None | 92.87%
Spiking MLP (STBP) | 800-10 | None | 98.78%
  • Only MLP structures are shown; for the other structures, see the corresponding references. * means that the model is based on a pre-trained ANN.

TABLE IV: Comparison with state-of-the-art networks over N-MNIST.

Table IV compares our STBP method with some state-of-the-art results on the N-MNIST dataset. The upper 5 results are based on ANNs, and the lower 4 results, including our method, use SNNs. The ANN methods usually adopt a frame-based approach, which collects the spike events within a time interval to form an image frame and then trains the networks with conventional image classification algorithms. Since the resulting images are often blurred, this frame-based preprocessing harms model performance and abandons the hardware-friendly event-driven paradigm. As can be seen from Table IV, the ANN models generally perform worse than the SNN models. In contrast to ANNs, SNNs naturally handle event-stream patterns, and by making better use of the spatio-temporal features of event streams, our proposed STBP method achieves the best accuracy of 98.78% among all the reported ANN and SNN methods. The greatest advantage of our method is that it does not use any complex training skills, which is beneficial for future hardware implementation.

III-B2 Spatio-temporal convolutional neural network

Extending our framework to convolutional neural network structures allows the network to go deeper and gives it more powerful SD information. Here we use our framework to build a spatio-temporal convolutional neural network. Compared with our spatio-temporal fully connected network, the main difference lies in the processing of the input image, where convolution replaces the weighted summation. Specifically, in a convolution layer, each convolution neuron receives the convolved input and updates its state according to the LIF model. In a pooling layer, because the binary coding of SNNs is inappropriate for standard max pooling, we use average pooling instead.
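The convolutional variant can be sketched for a single input and output channel as below: the weighted sum of Eq. (3) is replaced by a "valid" 2-D convolution (implemented, as in most deep learning libraries, as cross-correlation), the state update follows Eqs. (4)-(5) with the small-$\tau$ forget gate $f(o) \approx \tau(1 - o)$, and 2×2 average pooling is applied to the resulting spike map. Single-channel maps, a scalar bias, stride-2 pooling and all function names are simplifying assumptions, not the paper's implementation.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation of one single-channel map."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(kernel[a][b] * image[r + a][c + b]
                 for a in range(kh) for b in range(kw))
             for c in range(ow)]
            for r in range(oh)]

def avg_pool2(fmap):
    """2x2 average pooling with stride 2; outputs lie in [0, 1] for spike maps."""
    return [[(fmap[r][c] + fmap[r][c + 1] + fmap[r + 1][c] + fmap[r + 1][c + 1]) / 4.0
             for c in range(0, len(fmap[0]) - 1, 2)]
            for r in range(0, len(fmap) - 1, 2)]

def spiking_conv_step(spike_map, kernel, bias, u, o, tau=0.2, v_th=1.0):
    """One time step of a convolutional LIF layer.

    Convolution replaces the weighted sum of Eq. (3); the state update
    follows Eqs. (4)-(5) with f(o) ~= tau * (1 - o).
    """
    x = conv2d_valid(spike_map, kernel)
    new_u = [[u[r][c] * tau * (1 - o[r][c]) + x[r][c] + bias
              for c in range(len(x[0]))]
             for r in range(len(x))]
    new_o = [[1 if v >= v_th else 0 for v in row] for row in new_u]
    return new_u, new_o

spikes = [[1] * 4 for _ in range(4)]
kernel = [[0.2] * 3 for _ in range(3)]
u0, o0 = [[0.0] * 2 for _ in range(2)], [[0] * 2 for _ in range(2)]
u1, o1 = spiking_conv_step(spikes, kernel, 0.0, u0, o0)
print(o1, avg_pool2(o1))  # prints [[1, 1], [1, 1]] [[1.0]]
```

Average pooling keeps fractional values (for example 0.25 when one of four neurons fires), whereas max pooling on binary spike maps would output 1 whenever any neuron in the window fires, discarding how many did.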

Model | Network structure | Accuracy
--- | --- | ---
Spiking CNN (pre-training*) [13] | 28×28×1-12C5-P2-64C5-P2-10 | 99.12%
Spiking CNN (BP) [23] | 28×28×1-20C5-P2-50C5-P2-200-10 | 99.31%
Spiking CNN (STBP) | 28×28×1-15C5-P2-40C5-P2-300-10 | 99.42%
  • We mainly compare with methods that have a similar network architecture; * means that the model is based on a pre-trained ANN.

TABLE V: Comparison with other spiking CNN over MNIST.
Model | Network structure (input layer omitted) | Accuracy (mean)* | Accuracy (interval)*
--- | --- | --- | ---
Non-spiking CNN (BP) | 6C3-300-10 | 98.57% | [98.57%, 98.57%]
Spiking CNN (STBP) | 6C3-300-10 | 98.59% | [98.26%, 98.89%]
  • * Results over epochs [201, 210].

TABLE VI: Comparison with the typical CNN over object detection dataset.

Our spiking CNN model is also tested on the MNIST dataset as well as the object detection dataset. For MNIST, our network contains two convolution layers with 5×5 kernels alternating with two average pooling layers, followed by one hidden layer. As with a traditional CNN, we use elastic distortion [36] to preprocess the dataset. Table V records the state-of-the-art performance of spiking convolutional neural networks on the MNIST dataset. Our proposed spiking CNN model obtains 99.42% accuracy, outperforming the other reported spiking networks with a slightly lighter structure. Furthermore, we evaluate the model on the custom object detection dataset, using the same network structure for the spiking CNN and the non-spiking CNN. The testing accuracy is reported after training for 200 epochs. Table VI indicates that our spiking CNN model achieves performance competitive with the non-spiking CNN.

III-C Performance Analysis

III-C1 The Impact of Derivative Approximation Curves

In Section II-C, we introduced different curves to approximate the ideal derivative of the spike activity. Here we analyze the influence of the different approximation curves on the testing accuracy. The experiments are conducted on the MNIST dataset, and the network structure is 784-400-10. The testing accuracy is reported after training for 200 epochs. First, we compare the impact of different curve shapes on model performance. In our simulation we use $h_1$, $h_2$, $h_3$ and $h_4$ shown in Figure 3.b. Figure 6.a illustrates the results for the different approximation shapes. We observe that the different nonlinear curves $h_1$, $h_2$, $h_3$ and $h_4$ show only small variations in performance.

Fig. 6: Comparison of different derivative approximation curves. (a) The impact of different approximation shapes. (b) The impact of different widths of the rectangular approximation.

Furthermore, we use the rectangular approximation $h_1$ as an example to explore the impact of its width on the results. We vary $a_1$ and plot the corresponding results in Figure 6.b, where different colors denote different values of $a_1$. Both too large and too small values of $a_1$ degrade performance, and in our simulations an intermediate value achieves the highest testing accuracy, which implies that the width and steepness of the rectangle influence model performance. Together, Figure 6.a and Figure 6.b indicate that the key point in approximating the derivative of the spike activity is to capture its nonlinear nature, while the specific shape is not critical.

III-C2 The Impact of Temporal Domain

A major contribution of this work is introducing the temporal domain into the existing spatial-domain-based BP training method, which makes full use of the spatio-temporal dynamics of SNNs and enables high-performance training. We now quantitatively analyze the impact of the TD term. The experimental configuration is the same as in the previous section (784-400-10), and we again report the testing results after training for 200 epochs. Here the existing BP in the SD is termed SDBP.

Model | Dataset | Network structure | Training skills | Accuracy (mean)* | Accuracy (interval)*
--- | --- | --- | --- | --- | ---
Spiking MLP (SDBP) | Object detection | 784-400-10 | None | 97.11% | [96.04%, 97.78%]
Spiking MLP (SDBP) | MNIST | 784-400-10 | None | 98.29% | [98.23%, 98.39%]
Spiking MLP (STBP) | Object detection | 784-400-10 | None | 98.32% | [97.94%, 98.57%]
Spiking MLP (STBP) | MNIST | 784-400-10 | None | 98.48% | [98.42%, 98.51%]
  • * Results over epochs [201, 210].

TABLE VII: Comparison for the SDBP model and the STBP model on different datasets.

Table VII records the simulation results. The testing accuracy of SDBP is lower than that of STBP on both datasets, which shows that temporal information is beneficial for model performance. Specifically, compared with STBP, SDBP loses 1.21% accuracy on the object detection dataset, about six times its loss on MNIST (0.19%). The results also imply that the performance of SDBP is not stable enough. Besides interference from the dataset itself, the reason for this variation may be the instability of SNN training. Indeed, SNN training relies heavily on parameter initialization, which is also a great challenge for SNN applications. In many reported works, researchers leverage special skills or mechanisms, such as lateral inhibition, regularization and normalization, to improve training performance. In contrast, our STBP training method achieves higher and more stable performance (narrower accuracy intervals) on the same network. Specifically, the testing accuracy of STBP reaches 98.48% on MNIST and 98.32% on the object detection dataset. Note that STBP achieves this accuracy without any complex training skills. This stability and robustness indicate that the dynamics in the TD hold great potential for SNN computing, and this work indeed provides a new idea.

IV Conclusion

In this work, we have built a unified framework that allows supervised training of spiking neural networks, just like implementing backpropagation in deep neural networks (DNNs), by exploiting the spatio-temporal information in the networks. Our major contributions are summarized as follows:

  1. We have presented a framework based on an iterative leaky integrate-and-fire (LIF) model, which enables us to implement spatio-temporal backpropagation on SNNs. Unlike previous methods, which focus primarily on the spatial domain features of SNNs, our framework combines and exploits their features in both the spatial domain and the temporal domain;

  2. We have designed the STBP training algorithm, implemented it on both MLP and CNN architectures, and verified it on both static and dynamic datasets. Results show that our model outperforms state-of-the-art SNNs on relatively small-scale spiking MLPs and CNNs, and outperforms DNNs of the same network size on the dynamic N-MNIST dataset. An attractive advantage of our algorithm is that it does not need the extra training techniques generally required by existing schemes, which makes it easier to implement in large-scale networks. Results also reveal that exploiting spatio-temporal complexity to solve problems helps SNNs better fulfill their potential;

  3. We have introduced an approximated derivative to address the non-differentiability of the spike activity. Controlled experiments indicate that the steepness and width of the approximation curve affect the model’s performance, and that the key point for an approximation is to capture the nonlinear nature of the spike activity, while its specific shape is less critical.
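The three contributions can be illustrated together in one small example. The NumPy sketch below unrolls a single layer of iterative LIF neurons over T time steps and hand-writes a backward pass that follows both the spatial path (through the weights to the inputs) and the temporal path (from the membrane potential at step t+1 back to step t), with a rectangular approximated derivative standing in for the non-differentiable spike. It is a simplified stand-in, not the paper's exact equations: the membrane update `u[t] = DECAY * u[t-1] * (1 - o[t-1]) + x[t] @ w` (which resets the potential after a spike), the values of `V_TH` and `DECAY`, the rate-coded squared-error loss, the unit rectangle width, and all function names are assumptions chosen for illustration.

```python
import numpy as np

V_TH = 0.5   # firing threshold (assumed value)
DECAY = 0.8  # membrane leak factor per time step (assumed value)


def heaviside(v):
    """Spike function: 1 where the membrane potential reaches threshold."""
    return (v >= 0.0).astype(float)


def rect_grad(v, width=1.0):
    """Rectangular approximated derivative of the spike function (unit area)."""
    return (np.abs(v) < width / 2).astype(float) / width


def forward(x, w, spike=heaviside):
    """Iterative LIF layer; x has shape (T, n_in), w has shape (n_in, n_out)."""
    T, n_out = x.shape[0], w.shape[1]
    u = np.zeros((T, n_out))
    o = np.zeros((T, n_out))
    u_prev, o_prev = np.zeros(n_out), np.zeros(n_out)
    for t in range(T):
        # Leak the previous potential, zero it if the neuron fired, add input.
        u[t] = DECAY * u_prev * (1.0 - o_prev) + x[t] @ w
        o[t] = spike(u[t] - V_TH)
        u_prev, o_prev = u[t], o[t]
    return u, o


def loss_and_grads(x, w, y, spike=heaviside, spike_grad=rect_grad):
    """Rate-coded squared error and its STBP-style gradients w.r.t. w and x."""
    u, o = forward(x, w, spike)
    T = x.shape[0]
    rate = o.mean(axis=0)
    loss = 0.5 * np.sum((rate - y) ** 2)

    dw = np.zeros_like(w)
    dx = np.zeros_like(x)
    du_next = np.zeros(w.shape[1])  # dL/du[t+1]; zero after the last step
    for t in reversed(range(T)):
        # o[t] enters the loss directly and gates u[t+1] through the reset term.
        do = (rate - y) / T - du_next * DECAY * u[t]
        # Temporal path: u[t] also leaks into u[t+1] when the neuron is silent.
        du = do * spike_grad(u[t] - V_TH) + du_next * DECAY * (1.0 - o[t])
        dw += np.outer(x[t], du)  # spatial path to the weights
        dx[t] = w @ du            # spatial path to the previous layer
        du_next = du
    return loss, dw, dx


# Usage: a few gradient steps toward hypothetical target firing rates.
rng = np.random.default_rng(0)
x = (rng.random((10, 4)) < 0.3).astype(float)  # random input spike trains
w = rng.normal(0.0, 0.5, size=(4, 3))
y = np.array([0.8, 0.1, 0.1])
for step in range(50):
    loss, dw, _ = loss_and_grads(x, w, y)
    w -= 0.5 * dw
```

If a smooth sigmoid and its exact derivative are passed in place of `heaviside` and `rect_grad`, the forward model becomes differentiable and the hand-written backward pass can be checked against finite differences; with the step function the gradient is an approximation by design. Swapping `rect_grad` for another unit-area curve, or changing its `width`, is a one-line change, which is one way to probe how the steepness and width of the approximation affect training.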

Because the brain combines complexity in the temporal and spatial domains to handle input information, we also argue that implementing STBP on SNNs is more bio-plausible than applying BP on DNNs. Since STBP does not rely on many training techniques, it is more hardware-friendly and better suited to the design of neuromorphic chips with online learning ability. For future research, we believe two issues are particularly important. One is to apply our framework to more problems with timing characteristics, such as dynamic data processing, video stream identification, and speech recognition. The other is to accelerate the supervised training of large-scale SNNs on GPUs/CPUs or neuromorphic chips. The former aims to further exploit the rich spatio-temporal features of SNNs for dynamic problems, and the latter may greatly promote the application of large-scale SNNs in real-life scenarios.


  • [1] P. Chaudhari and H. Agarwal, Progressive Review Towards Deep Learning Techniques. Springer Singapore, 2017.
  • [2] L. Deng and D. Yu, “Deep learning: Methods and applications,” Foundations and Trends in Signal Processing, vol. 7, no. 3, pp. 197–387, 2014.
  • [3] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, and J. Long, “Caffe: Convolutional architecture for fast feature embedding,” arXiv e-print, pp. 675–678, 2014.
  • [4] G. Hinton, L. Deng, D. Yu, and G. E. Dahl, “Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82–97, 2012.
  • [5] K. He, X. Zhang, S. Ren, and J. Sun, Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition.   Springer International Publishing, 2014.
  • [6] X. Zhang, Z. Xu, C. Henriquez, and S. Ferrari, “Spike-based indirect training of a spiking neural network-controlled virtual insect,” in Decision and Control (CDC), 2013 IEEE 52nd Annual Conference on.   IEEE, 2013, pp. 6798–6805.
  • [7] J. N. Allen, H. S. Abdel-Aty-Zohdy, and R. L. Ewing, “Cognitive processing using spiking neural networks,” in IEEE 2009 National Aerospace and Electronics Conference, 2009, pp. 56–64.
  • [8] N. Kasabov and E. Capecci, “Spiking neural network methodology for modelling, classification and understanding of EEG spatio-temporal data measuring cognitive processes,” Information Sciences, vol. 294, no. C, pp. 565–575, 2015.
  • [9] B. V. Benjamin, P. Gao, E. McQuinn, S. Choudhary, A. R. Chandrasekaran, J. M. Bussat, R. Alvarez-Icaza, J. V. Arthur, P. A. Merolla, and K. Boahen, “Neurogrid: A mixed-analog-digital multichip system for large-scale neural simulations,” Proceedings of the IEEE, vol. 102, no. 5, pp. 699–716, 2014.
  • [10] P. A. Merolla, J. V. Arthur, R. Alvarez-Icaza, A. S. Cassidy, J. Sawada, F. Akopyan, B. L. Jackson, N. Imam, C. Guo, and Y. Nakamura, “Artificial brains. A million spiking-neuron integrated circuit with a scalable communication network and interface,” Science, vol. 345, no. 6197, pp. 668–673, 2014.
  • [11] S. B. Furber, F. Galluppi, S. Temple, and L. A. Plana, “The SpiNNaker project,” Proceedings of the IEEE, vol. 102, no. 5, pp. 652–665, 2014.
  • [12] T. Hwu, J. Isbell, N. Oros, and J. Krichmar, “A self-driving robot using deep convolutional neural networks on neuromorphic hardware,” 2016.
  • [13] S. K. Esser, P. A. Merolla, J. V. Arthur, A. S. Cassidy, R. Appuswamy, A. Andreopoulos, D. J. Berg, J. L. McKinstry, T. Melano, and D. R. Barch, “Convolutional networks for fast, energy-efficient neuromorphic computing,” Proceedings of the National Academy of Sciences of the United States of America, vol. 113, no. 41, p. 11441, 2016.
  • [14] S. S. Zhang and L. P. Shi, “Creating more intelligent robots through brain-inspired computing,” Science (suppl.), vol. 354, 2016.
  • [15] P. U. Diehl and M. Cook, “Unsupervised learning of digit recognition using spike-timing-dependent plasticity,” Frontiers in Computational Neuroscience, vol. 9, p. 99, 2015.
  • [16] D. Querlioz, O. Bichler, P. Dollfus, and C. Gamrat, “Immunity to device variations in a spiking neural network with memristive nanodevices,” IEEE Transactions on Nanotechnology, vol. 12, no. 3, pp. 288–295, 2013.
  • [17] S. R. Kheradpisheh, M. Ganjtabesh, and T. Masquelier, “Bio-inspired unsupervised learning of visual features leads to robust invariant object recognition,” Neurocomputing, vol. 205, no. C, pp. 382–392, 2016.
  • [18] J. A. Pérez-Carrasco, B. Zhao, C. Serrano, B. Acha, T. Serrano-Gotarredona, S. Chen, and B. Linares-Barranco, “Mapping from frame-driven to frame-free event-driven vision systems by low-rate rate-coding and coincidence processing. Application to feed forward ConvNets,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 11, pp. 2706–2719, 2013.
  • [19] P. U. Diehl, D. Neil, J. Binas, and M. Cook, “Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing,” in International Joint Conference on Neural Networks, 2015, pp. 1–8.
  • [20] P. O’Connor, D. Neil, S. C. Liu, T. Delbruck, and M. Pfeiffer, “Real-time classification and sensor fusion with a spiking deep belief network,” Frontiers in Neuroscience, vol. 7, p. 178, 2013.
  • [21] E. Hunsberger and C. Eliasmith, “Spiking deep networks with LIF neurons,” Computer Science, 2015.
  • [22] P. O’Connor and M. Welling, “Deep spiking networks,” 2016.
  • [23] J. H. Lee, T. Delbruck, and M. Pfeiffer, “Training deep spiking neural networks using backpropagation,” Frontiers in Neuroscience, vol. 10, 2016.
  • [24] D. Neil, M. Pfeiffer, and S. C. Liu, “Phased LSTM: Accelerating recurrent network training for long or event-based sequences,” 2016.
  • [25] F. A. Gers, J. Schmidhuber, and F. Cummins, “Learning to forget: continual prediction with LSTM,” Neural Computation, vol. 12, no. 10, p. 2451, 1999.
  • [26] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
  • [27] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Gated feedback recurrent neural networks,” Computer Science, pp. 2067–2075, 2015.
  • [28] P. J. Werbos, “Backpropagation through time: what it does and how to do it,” Proceedings of the IEEE, vol. 78, no. 10, pp. 1550–1560, 1990.
  • [29] Y. Bengio, T. Mesnard, A. Fischer, S. Zhang, and Y. Wu, “An objective function for STDP,” Computer Science, 2015.
  • [30] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
  • [31] E. Neftci, S. Das, B. Pedroni, K. Kreutz-Delgado, and G. Cauwenberghs, “Event-driven contrastive divergence for spiking neuromorphic systems,” Frontiers in Neuroscience, vol. 7, p. 272, 2013.
  • [32] G. Orchard, A. Jayawant, G. K. Cohen, and N. Thakor, “Converting static image datasets to spiking neuromorphic datasets using saccades,” Frontiers in Neuroscience, vol. 9, 2015.
  • [33] P. Lichtsteiner, C. Posch, and T. Delbruck, “A 128×128 120 dB 15 μs latency asynchronous temporal contrast vision sensor,” IEEE Journal of Solid-State Circuits, vol. 43, no. 2, pp. 566–576, 2007.
  • [34] D. Neil and S. C. Liu, “Effective sensor fusion with event-based sensors and deep network architectures,” in IEEE Int. Symposium on Circuits and Systems, 2016.
  • [35] G. K. Cohen, G. Orchard, S. H. Leng, J. Tapson, R. B. Benosman, and A. van Schaik, “Skimming digits: Neuromorphic classification of spike-encoded images,” Frontiers in Neuroscience, vol. 10, no. 184, 2016.
  • [36] P. Y. Simard, D. Steinkraus, and J. C. Platt, “Best practices for convolutional neural networks applied to visual document analysis,” in International Conference on Document Analysis and Recognition, 2003, p. 958.