The success of Deep Neural Networks (DNNs) is attributed to their deep hierarchical structure, which learns representations of big data with multiple levels of abstraction. Research in DNNs has advanced the state of the art in many machine learning tasks, such as image recognition, speech recognition [3, 4], natural language processing [5, 6], and medical diagnosis. However, training DNNs generally requires substantial computing resources (e.g., GPUs and computing clusters). Therefore, on power-critical computing platforms, such as edge computing devices, the implementation of DNNs is greatly limited [8, 9]. Spiking Neural Networks (SNNs) provide a low-power alternative for neural network implementation. They are designed to emulate brain computing and have the potential to provide computing capabilities equivalent to those of DNNs on ultra-low-power, spike-driven neuromorphic hardware [10, 11, 12]. However, due to their relatively shallow network structures, SNNs have yet to match the performance of DNNs in pattern classification tasks on standard benchmarks.
This research is supported by Programmatic Grant No. A1687b0033 from the Singapore Government’s Research, Innovation and Enterprise 2020 plan (Advanced Manufacturing and Engineering domain), the National Natural Science Foundation of China (Grant No. 61976043 and 61573081), the National Key Research and Development Program of China (2018AAA0100202), and the Zhejiang Lab (Grant No. 2019KC0AB02).
Unfortunately, training deep SNNs (DSNNs) is not straightforward, as the well-studied error back-propagation (BP) learning algorithm is not directly applicable due to the complex temporal dynamics and the non-differentiable spike function. To address the issue of DSNN training, there have been many successful implementations, which can be classified into three categories.
The first category is ANN-to-SNN conversion methods. They train an equivalent ANN and then approximately convert the pre-trained ANN into an SNN [21, 22, 23, 24, 25, 26, 27, 28, 29, 30]. The goal of ANN-to-SNN methods is to leverage state-of-the-art ANN training algorithms, so that the converted SNN can reach classification performance competitive with that of its offline-trained ANN counterpart. Because the conversion is approximate, it incurs some loss of accuracy. Despite many proposed remedies, such as weight/activation normalization [26, 27, 28] and adding noise to the model [24, 25], the solutions are far from perfect. Furthermore, converted SNNs use the spike rate of a spiking neuron to encode the analog activation of an ANN neuron. Thus, a high spike rate is required to represent a stronger analog activation, which masks the discrete nature of spike activity and is energy-intensive. The long inference time required by rate-based encoding is a further drawback.
The second category is membrane-potential-driven learning algorithms, which treat the neuron’s membrane potential as a differentiable signal and use surrogate derivatives to work around the non-differentiability of spikes. For example, [33, 34, 35] back-propagate errors based on the membrane potential at a single time step; they use only the signal values at the current time instant and thus ignore temporal dependencies. To resolve this problem, SLAYER and STBP [16, 17] train deep SNNs with surrogate derivatives based on the idea of the Back-propagation Through Time (BPTT) algorithm. While competitive accuracies are reported on the MNIST and CIFAR10 datasets, the computational and memory demands of these algorithms are high, because BPTT must store the entire sequence to compute the gradients exactly.
The third category is spike-driven learning algorithms, which use the timing of spikes as the relevant signal for controlling synaptic changes. Typical examples include SpikeProp and its derivatives [37, 38, 39, 40]. These methods assume that the neuron’s membrane potential increases linearly in an infinitesimal time interval around the spike time, so that the derivative of the spike function can be calculated and back-propagation can be implemented in multi-layer SNNs. Recently, Mostafa applied non-leaky integrate-and-fire neurons to avoid the problem of the non-differentiable spike function, and that work showed competitive performance on the MNIST dataset. The performance of SNNs trained with spike-driven learning algorithms has since been improved further. However, existing spike-driven learning algorithms suffer from certain limitations, such as dead neurons and exploding gradients, which require several elaborate techniques to mitigate. For example, constraints have been imposed on synaptic weights to overcome the dead neuron problem, and gradient normalization has been used to counter exploding gradients. These complex training strategies limit the scalability of the learning algorithms.
Among the existing learning algorithms, spike-driven learning algorithms train SNNs in a strictly event-driven manner and are compatible with temporal coding, in which information is carried by the timing of individual spikes in a very sparse manner. Hence, spike-driven learning algorithms hold the potential of enabling ultra-low-power event-driven neuromorphic hardware. With these considerations, we focus on developing more effective spike-driven learning algorithms for deep SNNs, and make the following contributions in this paper:
1) We thoroughly analyze the issues that make the well-studied BP algorithm incompatible with training SNNs, including the non-differentiable spike generation function, exploding gradients and dead neurons. Building on this understanding, we put forward a Rectified Linear Postsynaptic Potential function (ReL-PSP) for spiking neurons to resolve these problems.
2) Based on the proposed ReL-PSP, we derive a new spike-timing-dependent BP algorithm (STDBP) for DSNNs. In this algorithm, the timing of spikes is used as the information-carrying quantity, and learning happens only at the spike times in a fully event-driven manner.
3) Owing to the good scalability of the proposed algorithm, we extend it to a convolutional spiking neural network (CSNN) and achieve an accuracy of 99.2% on the MNIST dataset. To the best of our knowledge, this is the first implementation of a CSNN structure based on a spike-timing-based supervised learning algorithm.
Experimental results demonstrate that the proposed learning algorithm achieves state-of-the-art performance among spike-time-based learning algorithms for SNNs. This work provides a new perspective for investigating the significance of spike timing dynamics in information coding, synaptic plasticity, and decision making in the SNN-based computing paradigm.
2 Problem Description
Error back-propagation (specifically, stochastic gradient descent) is the workhorse behind the remarkable success of DNNs. However, as shown in Fig. 1, the dynamics of a typical artificial neuron in DNNs and those of a spiking neuron in SNNs are rather different, and the well-studied BP algorithm cannot be directly applied to deep SNNs due to the non-differentiable spike function, exploding gradients and dead neurons. In the following, we discuss these issues in depth.
Consider a fully connected DSNN. For simplicity, each neuron is assumed to emit at most one spike. In general, the membrane potential $V_j^l(t)$ of neuron $j$ in layer $l$ can be expressed as

$$V_j^l(t) = \sum_{i} \omega_{ij}^l \, \varepsilon(t - t_i^{l-1}) - \eta(t - t_j^l), \tag{1}$$

where $t_i^{l-1}$ is the spike time of the $i$th neuron in layer $l-1$, and $\omega_{ij}^l$ is the synaptic weight of the connection from neuron $i$ (in layer $l-1$) to neuron $j$ (in layer $l$). Each incoming spike from neuron $i$ induces a postsynaptic potential (PSP) at neuron $j$, and the kernel $\varepsilon(t - t_i^{l-1})$ describes the PSP generated by the spike at $t_i^{l-1}$. Hence each input spike contributes to the membrane potential of the neuron as described by $\omega_{ij}^l \, \varepsilon(t - t_i^{l-1})$ in Eq. 1. There are several PSP functions; a commonly used one is the alpha function, defined as

$$\varepsilon(t - t_i^{l-1}) = \frac{t - t_i^{l-1}}{\tau} \exp\left(1 - \frac{t - t_i^{l-1}}{\tau}\right), \quad t > t_i^{l-1}. \tag{2}$$

Fig. 2(a) shows the waveform of the alpha PSP function. As shown in Fig. 2(b), integrating the weighted PSPs gives the dynamics of the membrane potential $V_j^l(t)$. The neuron emits a spike when its membrane potential reaches the firing threshold $\vartheta$, as defined by the spike generation function $\mathcal{F}$:

$$t_j^l = \mathcal{F}\{t \mid V_j^l(t) = \vartheta,\ t \ge 0\}. \tag{3}$$

Once a spike is emitted, the refractory kernel $\eta(t - t_j^l)$ is used to reset the membrane potential to its resting value.
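The membrane dynamics and threshold rule described above can be illustrated with a small numerical sketch. It assumes an alpha kernel normalised to peak at 1 when the elapsed time equals $\tau$, omits the refractory term (each neuron spikes at most once), and locates the first threshold crossing by brute-force time stepping. The function names, time step and parameter values are illustrative choices, not taken from the paper.

```python
import math


def alpha_psp(s, tau=1.0):
    """Alpha-shaped PSP, s = time elapsed since the presynaptic spike.

    Assumed normalisation: the kernel peaks at 1 when s == tau.
    """
    if s <= 0.0:
        return 0.0
    return (s / tau) * math.exp(1.0 - s / tau)


def membrane_potential(t, in_times, weights, tau=1.0):
    """Weighted sum of PSPs at time t (refractory term omitted)."""
    return sum(w * alpha_psp(t - ti, tau) for w, ti in zip(weights, in_times))


def first_spike_time(in_times, weights, threshold=1.0, tau=1.0,
                     t_max=10.0, dt=1e-3):
    """Return the first grid time at which V(t) >= threshold, else None."""
    for k in range(int(round(t_max / dt)) + 1):
        t = k * dt
        if membrane_potential(t, in_times, weights, tau) >= threshold:
            return t
    return None


# One input spike at t = 0: a strong synapse fires, a weaker one never does.
t_strong = first_spike_time([0.0], [2.0])
t_weak = first_spike_time([0.0], [0.9])
print(t_strong, t_weak)
```

With the weaker synapse the leaky kernel decays before the threshold is reached, so the function returns `None`; this is the situation sketched in Fig. 2(c).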
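To train SNNs using BP, we need to compute the derivatives of the postsynaptic spike time $t_j^l$ with respect to a presynaptic spike time $t_i^{l-1}$ and the synaptic weight $\omega_{ij}^l$ of the corresponding connection:

$$\frac{\partial t_j^l}{\partial t_i^{l-1}} = \frac{\partial t_j^l}{\partial V_j^l(t_j^l)} \, \frac{\partial V_j^l(t_j^l)}{\partial t_i^{l-1}}, \tag{4}$$

$$\frac{\partial t_j^l}{\partial \omega_{ij}^l} = \frac{\partial t_j^l}{\partial V_j^l(t_j^l)} \, \frac{\partial V_j^l(t_j^l)}{\partial \omega_{ij}^l}. \tag{5}$$

Due to the discrete nature of the spike generation function (Eq. 3), the difficulty of Eq. 4 lies in solving the partial derivative $\partial t_j^l / \partial V_j^l(t_j^l)$, which we refer to as the problem of the non-differentiable spike function. Existing spike-driven learning algorithms [36, 43] assume that the membrane potential $V_j^l(t)$ increases linearly in the infinitesimal time interval before the spike time $t_j^l$. Then, $\partial t_j^l / \partial V_j^l(t_j^l)$ can be expressed as

$$\frac{\partial t_j^l}{\partial V_j^l(t_j^l)} = \frac{-1}{\partial V_j^l(t_j^l) / \partial t_j^l} = \frac{-1}{\sum_{i} \omega_{ij}^l \, \frac{\partial \varepsilon(t_j^l - t_i^{l-1})}{\partial t_j^l}}. \tag{6}$$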
The exploding gradient problem occurs when

$$\frac{\partial V_j^l(t_j^l)}{\partial t_j^l} \approx 0, \tag{7}$$

i.e., the membrane potential only just reaches the firing threshold when the spike is emitted (Fig. 2(b)). Since $\partial V_j^l(t_j^l) / \partial t_j^l$ is the denominator in Eq. 6, the derivative in Eq. 6 explodes and causes large weight updates. Although various strategies have been proposed to alleviate this problem, such as adaptive learning rates and dynamic firing thresholds, it has not been fully resolved.
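To make the explosion concrete, the following sketch evaluates the derivative of the spike time with respect to the membrane potential, $-1/(\partial V/\partial t)$, for a single input whose weight is lowered towards the smallest value that still triggers a spike. It assumes a unit-peak alpha kernel with $\tau = 1$ and a threshold of 1, and finds the spike time by bisection on the rising phase of the PSP. All names and values are hypothetical.

```python
import math

TAU = 1.0
THRESHOLD = 1.0


def alpha_psp(s):
    """Unit-peak alpha kernel (assumed form), s = time since the input spike."""
    return (s / TAU) * math.exp(1.0 - s / TAU) if s > 0.0 else 0.0


def alpha_psp_slope(s):
    """Time derivative of the alpha kernel."""
    return (1.0 - s / TAU) / TAU * math.exp(1.0 - s / TAU) if s > 0.0 else 0.0


def spike_time_single_input(w):
    """First crossing of w * alpha_psp(t) = THRESHOLD, found by bisection.

    The crossing lies on the rising phase (0, TAU); requires w > THRESHOLD.
    """
    lo, hi = 0.0, TAU
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if w * alpha_psp(mid) < THRESHOLD:
            lo = mid
        else:
            hi = mid
    return hi


def dt_dv(w):
    """-1 / (w * d(alpha_psp)/dt) evaluated at the spike time."""
    t_spike = spike_time_single_input(w)
    return -1.0 / (w * alpha_psp_slope(t_spike))


for w in (2.0, 1.1, 1.001):
    print(f"w = {w:<6} dt/dV = {dt_dv(w):.2f}")
```

As the weight approaches the marginal value, the spike moves towards the peak of the PSP where the slope vanishes, and the magnitude of the derivative grows without bound.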
From Eq. 5, when the presynaptic neuron does not emit a spike, the error cannot be back-propagated through that neuron. This is the dead neuron problem. This problem is also common in DNNs with the ReLU activation function. However, due to the leaky nature of the PSP kernel and the spike generation mechanism, the dead neuron problem is more severe in SNNs. As shown in Fig. 2(c), there are three input spikes, and the neuron emits a spike with large synaptic weights (blue). With slightly reduced synaptic weights, the membrane potential stays sub-threshold and the neuron becomes a dead neuron (green). When the neuron does not spike, no errors can back-propagate through it. The dead neuron problem is fatal in spike-driven learning algorithms.
In this section, we describe how the above challenges may be overcome so that a DSNN can still be trained using BP. To this end, we introduce the Rectified Linear Postsynaptic Potential function (ReL-PSP) for the spiking neuron model. In Section 3.2, the proposed spike-timing-dependent back-propagation (STDBP) learning algorithm (based on ReL-PSP) is presented.
3.1 ReL-PSP Based Spiking Neuron Model
As presented in Section 2, BP cannot be directly applied in DSNNs due to the problems of the non-differentiable spike function, exploding gradients and dead neurons. To overcome these problems, we propose a simple yet efficient Rectified Linear Postsynaptic Potential (ReL-PSP) based spiking neuron model, whose dynamics are defined as

$$V_j^l(t) = \sum_{i} \omega_{ij}^l \, K(t - t_i^{l-1}), \tag{8}$$

where $K(t - t_i^{l-1})$ is the kernel of the PSP function, defined as

$$K(t - t_i^{l-1}) = \begin{cases} t - t_i^{l-1}, & t > t_i^{l-1}, \\ 0, & \text{otherwise}. \end{cases} \tag{9}$$

As shown in Fig. 3(a), given an input spike at $t_i^{l-1}$, the membrane potential after $t_i^{l-1}$ is a linear function of time $t$. Since the shape of the proposed PSP function resembles that of a rectified linear function, we name it the ReL-PSP function. In the following, we analyze how the proposed neuron model solves the above-mentioned problems.
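As a minimal sketch of the kernel just described, the snippet below implements a rectified linear PSP with unit slope and evaluates the resulting membrane potential on a coarse time grid. The input spike times and weights are made-up values chosen only to show that, with positive weights, the potential never decays.

```python
def rel_psp(s):
    """ReL-PSP kernel: grows linearly after the input spike, zero before it."""
    return s if s > 0.0 else 0.0


def membrane_potential(t, in_times, weights):
    """Membrane potential under ReL-PSP: weighted sum of linear ramps."""
    return sum(w * rel_psp(t - ti) for w, ti in zip(weights, in_times))


# Three hypothetical input spikes with positive weights.
in_times = [0.0, 0.5, 1.0]
weights = [0.4, 0.3, 0.2]
trace = [membrane_potential(0.25 * k, in_times, weights) for k in range(9)]
print([round(v, 3) for v in trace])
```

Each input spike simply increases the slope of the potential by its weight, so between consecutive input spikes the potential is a straight line.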
3.1.1 Non-differentiable spike function
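As shown in Fig. 3(b), due to the linearity of the ReL-PSP, the membrane potential $V_j^l(t)$ increases linearly prior to the spike time $t_j^l$. In this case, there is no need to assume linearity, and we can directly use Eq. 10 to compute $\partial t_j^l / \partial V_j^l(t_j^l)$:

$$\frac{\partial t_j^l}{\partial V_j^l(t_j^l)} = \frac{-1}{\partial V_j^l(t_j^l) / \partial t_j^l} = \frac{-1}{\sum_{i} \omega_{ij}^l \, \frac{\partial K(t_j^l - t_i^{l-1})}{\partial t_j^l}}. \tag{10}$$

This resolves the problem of non-differentiable spike generation.
Precise gradients in BP provide the necessary information for optimization and are key to the high accuracy of DNNs. Without having to assume linearity, we use the precise value of $\partial t_j^l / \partial V_j^l(t_j^l)$ instead of approximating it, and thus avoid accumulating errors across multiple layers.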
3.1.2 Gradient explosion
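An exploding gradient occurs when the denominator in Eq. 6 approaches 0. In this case, the membrane potential only just reaches the firing threshold at the spike time, which is caused by the combined effect of the weights $\omega_{ij}^l$ and the partial derivative of the PSP function. Compared with the alpha PSP function, which has zero gradient at its peak, there is less chance for such a scenario to occur with the ReL-PSP function. As the partial derivative of the ReL-PSP, $\partial K(t_j^l - t_i^{l-1}) / \partial t_j^l$, is always equal to 1 for $t_j^l > t_i^{l-1}$, Eq. 10 can be expressed as $\partial t_j^l / \partial V_j^l(t_j^l) = -1 / \sum_i \omega_{ij}^l$, where the sum runs over the presynaptic neurons that spike before $t_j^l$.
As $\sum_i \omega_{ij}^l$ may still be close to 0, the exploding gradient problem may not be completely solved. However, from Eqs. 3 and 8, we obtain the spike time $t_j^l$ as a function of the input spikes $t_i^{l-1}$ and synaptic weights $\omega_{ij}^l$:

$$\vartheta = \sum_{i} \omega_{ij}^l \, (t_j^l - t_i^{l-1}), \quad t_j^l > t_i^{l-1}.$$

Re-arranging, the spike time can be calculated as

$$t_j^l = \frac{\vartheta + \sum_{i} \omega_{ij}^l \, t_i^{l-1}}{\sum_{i} \omega_{ij}^l}, \quad t_j^l > t_i^{l-1}.$$

Should $\sum_i \omega_{ij}^l$ be close to 0, the spike will be emitted late and may not contribute to any spike in the next layer. In that case, neuron $j$ in layer $l$ does not participate in error BP and therefore does not produce an exploding gradient.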
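The closed-form spike time only counts inputs that arrive before the output spike, so a practical computation has to determine that causal set. The sketch below does this by scanning the inputs in temporal order and accepting the first candidate spike time that falls between the latest included input and the next one. This search procedure, the function name and the example values are assumptions made for illustration; the text does not specify how the causal set is found.

```python
import math


def rel_psp_spike_time(in_times, weights, threshold=1.0):
    """First spike time of a ReL-PSP neuron and the indices of its causal inputs.

    Returns (None, []) if the threshold is never reached.
    """
    order = sorted(range(len(in_times)), key=lambda i: in_times[i])
    sum_w = 0.0   # running sum of weights of inputs seen so far
    sum_wt = 0.0  # running sum of weight * input spike time
    for pos, i in enumerate(order):
        sum_w += weights[i]
        sum_wt += weights[i] * in_times[i]
        if sum_w <= 0.0:
            continue  # potential is not rising in this interval
        t = (threshold + sum_wt) / sum_w
        t_next = in_times[order[pos + 1]] if pos + 1 < len(order) else math.inf
        # The candidate is valid only inside the current linear segment.
        if in_times[i] <= t <= t_next:
            return t, order[:pos + 1]
    return None, []


print(rel_psp_spike_time([0.0, 1.0], [0.5, 1.0]))   # both inputs are causal
print(rel_psp_spike_time([0.0, 1.0], [2.0, 1.0]))   # fires before the 2nd input
print(rel_psp_spike_time([0.0], [0.01]))            # tiny weight sum: late spike
```

The last call illustrates the argument in the text: a weight sum close to 0 does not blow up the forward pass, it simply pushes the spike far into the future.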
3.1.3 Dead neuron
In neural networks, sparse representation (few activated neurons) has many advantages, such as information disentangling, efficient variable-size representation and linear separability. However, sparsity may also hurt predictive performance, because for the same number of neurons it reduces the effective capacity of the model. Unfortunately, as shown in Fig. 2(c), due to the leaky nature of the alpha-shaped PSP and the spike generation mechanism, such a spiking neuron is more likely to suffer from the dead neuron problem. In contrast, as shown in Fig. 3(c), with the ReL-PSP kernel the PSP increases over time. Hence a neuron with a more positive sum of weights fires earlier than one with a less positive sum, and has a lower probability of becoming a dead neuron. Overall, the proposed ReL-PSP greatly alleviates the dead neuron problem because the PSP does not decay over time, while maintaining a sparse representation to the same extent as the ReLU activation function.
3.2 Error Backpropagation
Given a classification task with $N$ categories, each neuron in the output layer is assigned to a category. When a training sample is presented to the neural network, the corresponding output neuron should fire the earliest. There are several loss functions that can be constructed to achieve this goal [31, 41, 42]. In this work, the cross-entropy loss function is used. To minimise the spike time of the target neuron and maximise the spike times of the non-target neurons, we apply the softmax function to the negative values of the spike times in the output layer: $p_j = \exp(-t_j) / \sum_i \exp(-t_i)$. Then, the loss function is given by

$$L(g, t^o) = -\ln \frac{\exp(-t^o[g])}{\sum_i \exp(-t^o[i])},$$

where $t^o$ is the vector of the spike times in the output layer and $g$ is the target class index.
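A short sketch of this loss, written as the cross-entropy of a softmax over negated output spike times, is given below together with its gradient with respect to the spike times. The max-subtraction for numerical stability and the example spike times are implementation choices made here, not details from the paper.

```python
import math


def spike_time_loss(out_times, target):
    """Cross-entropy of softmax(-out_times) against the target index."""
    logits = [-t for t in out_times]
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target]


def spike_time_loss_grad(out_times, target):
    """dL/dt_j = 1[j == target] - p_j, with p = softmax(-out_times)."""
    logits = [-t for t in out_times]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [(1.0 if j == target else 0.0) - e / total
            for j, e in enumerate(exps)]


times = [1.0, 2.0, 3.0]            # hypothetical output spike times
print(spike_time_loss(times, 0))   # target neuron fires first
print(spike_time_loss(times, 2))   # target neuron fires last
print(spike_time_loss_grad(times, 0))
```

The gradient is positive for the target neuron and negative for all others, so a gradient descent step advances the target spike and delays the competing spikes.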
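The loss function is minimised by updating the synaptic weights across the network. This has the effect of delaying or advancing spike times across the network. The derivatives of the first spike time $t_j^l$ with respect to the synaptic weights $\omega_{ij}^l$ and the input spike times $t_i^{l-1}$ are given by

$$\frac{\partial t_j^l}{\partial \omega_{ij}^l} = \frac{t_i^{l-1} - t_j^l}{\sum_{k} \omega_{kj}^l}, \quad t_j^l > t_i^{l-1},$$

$$\frac{\partial t_j^l}{\partial t_i^{l-1}} = \frac{\omega_{ij}^l}{\sum_{k} \omega_{kj}^l}, \quad t_j^l > t_i^{l-1},$$

where the sums run over the presynaptic neurons $k$ that spike before $t_j^l$.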
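These derivatives can be checked numerically against the closed-form spike time of a ReL-PSP neuron. The sketch below assumes that every listed input is causal (it spikes before the output neuron), uses a threshold of 1, and compares one analytic derivative with a finite-difference estimate. The function names and the example values are hypothetical.

```python
def spike_time(in_times, weights, threshold=1.0):
    """Closed-form ReL-PSP spike time; every listed input is assumed causal."""
    weighted = sum(w * t for w, t in zip(weights, in_times))
    return (threshold + weighted) / sum(weights)


def spike_time_grads(in_times, weights, threshold=1.0):
    """Return (t_out, dt_out/dw, dt_out/dt_in) for the causal inputs."""
    t_out = spike_time(in_times, weights, threshold)
    sum_w = sum(weights)
    d_w = [(t_in - t_out) / sum_w for t_in in in_times]
    d_t = [w / sum_w for w in weights]
    return t_out, d_w, d_t


in_times = [0.0, 1.0]
weights = [0.5, 1.0]
t_out, d_w, d_t = spike_time_grads(in_times, weights)

# Finite-difference check of the derivative w.r.t. the first weight.
eps = 1e-6
bumped = spike_time(in_times, [weights[0] + eps, weights[1]])
print(d_w[0], (bumped - t_out) / eps)
```

Note that the derivatives with respect to the input times sum to 1: delaying all causal inputs by the same amount delays the output spike by exactly that amount.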
In this section, we investigate two SNNs, a fully connected SNN and a convolutional SNN, on an image classification task based on the MNIST dataset, so as to benchmark their learning capabilities against existing spike-driven learning algorithms.
4.1 Temporal Coding
The MNIST dataset comprises 60,000 grayscale images for training and 10,000 grayscale images for testing. We first convert the images into spike trains. There are many encoding strategies. Rate coding assumes that a higher sensory variable corresponds to a higher firing rate and requires a large number of encoding spikes to be transmitted. This is highly inefficient, as each spike carries minimal information. In this work, a more efficient temporal coding scheme is used, which encodes information in the timing of individual spikes, under the assumption that strongly activated neurons tend to fire earlier.
As shown in Fig. 4, the input information is encoded in spike timing of neurons, with each neuron firing only once. More salient information is encoded as an earlier spike in the corresponding input neuron. The encoding spikes then propagate to the subsequent layers in a temporal fashion. Each neuron in the hidden and output layer receives the spikes from its presynaptic neurons, and it emits a spike when the membrane potential reaches the threshold. Similar to the input layer, the neurons in the hidden and output layer that are strongly activated will fire first. Therefore, temporal coding is maintained throughout the DSNN, and the output neuron that fires earliest categorizes the input stimulus.
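A minimal sketch of such a time-to-first-spike scheme is shown below. The exact intensity-to-time mapping is not given in the text, so the linear mapping, the encoding window `t_max` and the helper names are assumptions made purely for illustration; the decoding rule (earliest output spike wins) follows the description above.

```python
def encode_time_to_first_spike(pixels, t_max=1.0, max_intensity=255):
    """Map each pixel to one spike time: brighter pixels fire earlier.

    Assumed linear mapping: intensity max_intensity -> t = 0,
    intensity 0 -> t = t_max.
    """
    return [t_max * (1.0 - p / max_intensity) for p in pixels]


def decode_class(output_spike_times):
    """The predicted class is the output neuron that fires first."""
    return min(range(len(output_spike_times)),
               key=output_spike_times.__getitem__)


encoded = encode_time_to_first_spike([255, 128, 0])
predicted = decode_class([2.4, 0.9, 1.7])
print(encoded, predicted)
```

Each input neuron emits exactly one spike under this scheme, which is what keeps the representation sparse compared with rate coding.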
4.2 Experimental Results
Table 1 shows the classification accuracies of the two SNNs and of other spike-driven learning algorithms on the MNIST dataset. The proposed STDBP learning algorithm reaches accuracies of 98.1% and 98.4% with network structures of 784-400-10 and 784-800-10, respectively, outperforming previously reported results obtained with comparable network structures. For example, the 784-400-10 network trained with our method reaches 98.1%, whereas Mostafa reports 97.5% with a larger 784-800-10 network. Another advantage of our algorithm is that it does not need the additional training strategies that are widely used in previous works to improve performance. This facilitates large-scale implementation of STDBP. Moreover, to the best of our knowledge, this is the first implementation of a CSNN based on spike-driven learning algorithms. The model achieves an accuracy of 99.2%, higher than that of the fully connected SNN.
| Model | Coding | Network architecture | Additional strategy | Acc. (%) |
|---|---|---|---|---|
| Mostafa | Temporal | 784-800-10 | Weight and gradient constraints | 97.5 |
| Tavanaei et al. | Rate | 784-1000-10 | None | 96.6 |
| Comsa et al. | Temporal | 784-340-10 | Weight and gradient constraints | 97.4 |
| Kheradpisheh et al. | Temporal | 784-400-10 | Weight constraints | 97.4 |
| STDBP (this work) | Temporal | 784-400-10 | None | 98.1 |
| STDBP (this work) | Temporal | 784-800-10 | None | 98.4 |
| STDBP (this work) | Temporal | | | |
Fig. 5 shows the distribution of spike timing in the hidden layers and of the earliest spike time in the output layer across the 10,000 test images for two SNNs, namely 784-400-10 and 784-800-10. For both architectures, the SNN makes a decision after only a fraction of the hidden-layer neurons have spiked. For the 784-400-10 topology, an output neuron spikes (i.e., a class is selected) well before all hidden neurons have fired. The network is thus able to make very rapid decisions about the input class. In addition, over the whole simulation time, only 66.3% of the hidden neurons spike. The experimental results therefore demonstrate that the proposed learning algorithm works in an accurate, fast and sparse manner.
5 Discussion and Conclusion
In this work, we analysed the problems that BP faces in a DSNN, namely the non-differentiable spike function, exploding gradients and dead neurons. To address these problems, we propose the Rectified Linear Postsynaptic Potential function (ReL-PSP) for spiking neurons and the STDBP learning algorithm for DSNNs. We evaluate the proposed method on both a multi-layer fully connected SNN and a CSNN. Our experiments on MNIST reach accuracies of up to 98.4% with the fully connected SNN and 99.2% with the CSNN, which is the state of the art among spike-driven learning algorithms for SNNs.
Many methods have been proposed to train DSNNs, such as conversion methods [21, 22, 23, 24, 25, 26, 27, 28, 29, 30] and surrogate gradient methods [16, 17, 32, 33, 34, 35]. These methods are not compatible with temporal coding and spike-based learning mechanisms, so the advantages of SNNs, especially spike timing, have not been fully exploited. Due to the non-differentiability of the spike function, BP in SNNs using spike timing remains an open question. To perform BP using spike timing, many methods have been proposed [36, 37, 38, 39, 40, 31, 41, 42]. Two common drawbacks of these methods are exploding gradients and dead neurons, which have been partially addressed using techniques such as constraints on weights and gradient normalization. These techniques affect learning efficiency and limit the application of these learning algorithms in large-scale networks. The proposed STDBP learning algorithm with the ReL-PSP spiking neuron model can train DSNNs directly without any additional technique, hence allowing the DSNN to scale, as shown by the high accuracy of the CSNN.
In addition to being fast, sparse and more accurate, the proposed ReL-PSP neuron model and STDBP have other features that may make them more energy-efficient and friendlier to neuromorphic hardware. Firstly, compared to the alpha-shaped PSP function, the linear ReL-PSP function is simpler to implement in hardware. Secondly, unlike rate-based encoding methods, which require more time to generate enough output spikes for classification, our method takes advantage of temporal coding and uses a single spike per neuron, which is sparser and more energy-efficient, given that energy is mainly consumed during spike generation and transmission. Thirdly, without additional training techniques, on-chip training in neuromorphic chips would be much easier to realize.
-  Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
-  Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
-  Ossama Abdel-Hamid, Abdel-rahman Mohamed, Hui Jiang, Li Deng, Gerald Penn, and Dong Yu. Convolutional neural networks for speech recognition. IEEE/ACM Transactions on audio, speech, and language processing, 22(10):1533–1545, 2014.
-  Max WY Lam, Xie Chen, Shoukang Hu, Jianwei Yu, Xunying Liu, and Helen Meng. Gaussian process lstm recurrent neural network language models for speech recognition. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 7235–7239. IEEE, 2019.
-  Tom Young, Devamanyu Hazarika, Soujanya Poria, and Erik Cambria. Recent trends in deep learning based natural language processing. IEEE Computational Intelligence Magazine, 13(3):55–75, 2018.
-  Reza Ghaeini, Sadid A Hasan, Vivek Datla, Joey Liu, Kathy Lee, Ashequl Qadir, Yuan Ling, Aaditya Prakash, Xiaoli Z Fern, and Oladimeji Farri. Dr-bilstm: Dependent reading bidirectional lstm for natural language inference. arXiv preprint arXiv:1802.05577, 2018.
-  Andre Esteva, Brett Kuprel, Roberto A Novoa, Justin Ko, Susan M Swetter, Helen M Blau, and Sebastian Thrun. Dermatologist-level classification of skin cancer with deep neural networks. Nature, 542(7639):115, 2017.
-  J Feldmann, N Youngblood, CD Wright, H Bhaskaran, and WHP Pernice. All-optical spiking neurosynaptic networks with self-learning capabilities. Nature, 569(7755):208, 2019.
-  Sumit Bam Shrestha and Garrick Orchard. Slayer: Spike layer error reassignment in time. In Advances in Neural Information Processing Systems, pages 1412–1421, 2018.
-  Eugene M Izhikevich. Simple model of spiking neurons. IEEE Transactions on neural networks, 14(6):1569–1572, 2003.
-  Wulfram Gerstner and Werner M Kistler. Spiking neuron models: Single neurons, populations, plasticity. Cambridge university press, 2002.
-  Michael Pfeiffer and Thomas Pfeil. Deep learning with spiking neurons: opportunities and challenges. Frontiers in neuroscience, 12, 2018.
-  Lei Deng, Yujie Wu, Xing Hu, Ling Liang, Yufei Ding, Guoqi Li, Guangshe Zhao, Peng Li, and Yuan Xie. Rethinking the performance comparison between snns and anns. Neural Networks, 121:294–307, 2020.
-  Jing Pei, Lei Deng, Sen Song, Mingguo Zhao, Youhui Zhang, Shuang Wu, Guanrui Wang, Zhe Zou, Zhenzhi Wu, Wei He, et al. Towards artificial general intelligence with hybrid tianjic chip architecture. Nature, 572(7767):106, 2019.
-  Amirhossein Tavanaei, Masoud Ghodrati, Saeed Reza Kheradpisheh, Timothee Masquelier, and Anthony Maida. Deep learning in spiking neural networks. Neural Networks, 2018.
-  Yujie Wu, Lei Deng, Guoqi Li, Jun Zhu, and Luping Shi. Spatio-temporal backpropagation for training high-performance spiking neural networks. Frontiers in neuroscience, 12, 2018.
-  Yujie Wu, Lei Deng, Guoqi Li, Jun Zhu, Yuan Xie, and Luping Shi. Direct training for spiking neural networks: Faster, larger, better. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 1311–1318, 2019.
-  Yingyezhe Jin, Wenrui Zhang, and Peng Li. Hybrid macro/micro level backpropagation for training deep spiking neural networks. In Advances in Neural Information Processing Systems, pages 7005–7015, 2018.
-  Jibin Wu, Yansong Chua, Malu Zhang, Guoqi Li, Haizhou Li, and Kay Chen Tan. A hybrid learning rule for efficient and rapid inference with spiking neural networks. arXiv preprint arXiv:1907.01167, 2019.
-  Jibin Wu, Yansong Chua, Malu Zhang, Qu Yang, Guoqi Li, and Haizhou Li. Deep spiking neural network with spike count based learning rule. arXiv preprint arXiv:1902.05705, 2019.
-  Steve K Esser, Rathinakumar Appuswamy, Paul Merolla, John V Arthur, and Dharmendra S Modha. Backpropagation for energy-efficient neuromorphic computing. In Advances in Neural Information Processing Systems, pages 1117–1125, 2015.
-  Eric Hunsberger and Chris Eliasmith. Spiking deep networks with lif neurons. arXiv preprint arXiv:1510.08829, 2015.
-  Steven K Esser, Paul A Merolla, John V Arthur, Andrew S Cassidy, Rathinakumar Appuswamy, Alexander Andreopoulos, David J Berg, Jeffrey L McKinstry, Timothy Melano, Davis R Barch, et al. Convolutional networks for fast energy-efficient neuromorphic computing. Proc. Nat. Acad. Sci. USA, 113(41):11441–11446, 2016.
-  Peter O’Connor, Daniel Neil, Shih-Chii Liu, Tobi Delbruck, and Michael Pfeiffer. Real-time classification and sensor fusion with a spiking deep belief network. Frontiers in neuroscience, 7:178, 2013.
-  Qian Liu, Yunhua Chen, and Steve Furber. Noisy softplus: an activation function that enables snns to be trained as anns. arXiv preprint arXiv:1706.03609, 2017.
-  Peter U Diehl, Daniel Neil, Jonathan Binas, Matthew Cook, Shih-Chii Liu, and Michael Pfeiffer. Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing. In 2015 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2015.
-  Peter U Diehl, Bruno U Pedroni, Andrew Cassidy, Paul Merolla, Emre Neftci, and Guido Zarrella. Truehappiness: Neuromorphic emotion recognition on truenorth. In 2016 International Joint Conference on Neural Networks (IJCNN), pages 4278–4285. IEEE, 2016.
-  Bodo Rueckauer, Iulia-Alexandra Lungu, Yuhuang Hu, Michael Pfeiffer, and Shih-Chii Liu. Conversion of continuous-valued deep networks to efficient event-driven networks for image classification. Frontiers in neuroscience, 11:682, 2017.
-  Bodo Rueckauer and Shih-Chii Liu. Conversion of analog to spiking neural networks using sparse temporal coding. In 2018 IEEE International Symposium on Circuits and Systems (ISCAS), pages 1–5. IEEE, 2018.
-  Bing Han, Gopalakrishnan Srinivasan, and Kaushik Roy. Rmp-snns: Residual membrane potential neuron for enabling deeper high-accuracy and low-latency spiking neural networks. arXiv preprint arXiv:2003.01811, 2020.
-  Hesham Mostafa. Supervised learning based on temporal coding in spiking neural networks. IEEE transactions on neural networks and learning systems, 29(7):3227–3235, 2017.
-  Emre O Neftci, Hesham Mostafa, and Friedemann Zenke. Surrogate gradient learning in spiking neural networks. arXiv preprint arXiv:1901.09948, 2019.
-  Priyadarshini Panda and Kaushik Roy. Unsupervised regenerative learning of hierarchical features in spiking deep networks for object recognition. In 2016 International Joint Conference on Neural Networks (IJCNN), pages 299–306. IEEE, 2016.
-  Jun Haeng Lee, Tobi Delbruck, and Michael Pfeiffer. Training deep spiking neural networks using backpropagation. Frontiers in neuroscience, 10:508, 2016.
-  Friedemann Zenke and Surya Ganguli. Superspike: Supervised learning in multilayer spiking neural networks. Neural computation, 30(6):1514–1541, 2018.
-  Sander M Bohte, Joost N Kok, and Han La Poutre. Error-backpropagation in temporally encoded networks of spiking neurons. Neurocomputing, 48(1-4):17–37, 2002.
-  Sumit Bam Shrestha and Qing Song. Robust spike-train learning in spike-event based weight update. Neural Networks, 96:33–46, 2017.
-  Sumit Bam Shrestha and Qing Song. Robustness to training disturbances in spikeprop learning. IEEE transactions on neural networks and learning systems, 29(7):3126–3139, 2017.
-  Yan Xu, Xiaoqin Zeng, Lixin Han, and Jing Yang. A supervised multi-spike learning algorithm based on gradient descent for spiking neural networks. Neural Networks, 43:99–113, 2013.
-  Chaofei Hong, Xile Wei, Jiang Wang, Bin Deng, Haitao Yu, and Yanqiu Che. Training spiking neural networks for cognitive tasks: A versatile framework compatible with various temporal codes. IEEE transactions on neural networks and learning systems, 2019.
-  Saeed Reza Kheradpisheh and Timothée Masquelier. S4nn: temporal backpropagation for spiking neural networks with one spike per neuron. arXiv preprint arXiv:1910.09495, 2019.
-  Iulia M Comsa, Krzysztof Potempa, Luca Versari, Thomas Fischbacher, Andrea Gesmundo, and Jyrki Alakuijala. Temporal coding in spiking neural networks with alpha synaptic function. arXiv preprint arXiv:1907.13223, 2019.
-  Qiang Yu, Haizhou Li, and Kay Chen Tan. Spike timing or rate? neurons learn to make decisions for both through threshold-driven plasticity. IEEE transactions on cybernetics, 49(6):2178–2189, 2018.
-  Sumit Bam Shrestha and Qing Song. Adaptive learning rate of spikeprop based on weight convergence analysis. Neural Networks, 63:185–198, 2015.
-  Xavier Glorot, Antoine Bordes, and Yoshua Bengio. Deep sparse rectifier neural networks. In Proceedings of the fourteenth international conference on artificial intelligence and statistics, pages 315–323, 2011.
-  Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
-  Joseph M Brader, Walter Senn, and Stefano Fusi. Learning real-world stimuli in a neural network with spike-driven synaptic dynamics. Neural computation, 19(11):2881–2912, 2007.
-  Simon Thorpe, Arnaud Delorme, and Rufin Van Rullen. Spike-based strategies for rapid processing. Neural networks, 14(6-7):715–725, 2001.
-  Amirhossein Tavanaei and Anthony Maida. Bp-stdp: Approximating backpropagation using spike timing dependent plasticity. Neurocomputing, 330:39–47, 2019.