I Introduction
Deep learning has made remarkable progress in recent years, with a significant impact on many aspects of our daily lives [1]. Although brain-inspired, deep learning models differ significantly from the biological brain in many ways. In the human brain, information is represented and communicated through asynchronous action potentials, or spikes. To faithfully describe the dynamics of biological neural networks, several spiking neuron models have been proposed with different degrees of biological realism [2]. Although how information is encoded and exchanged within networks of spiking neurons remains largely unknown, the inherent properties of spiking neural networks (SNNs), e.g., low-power event-driven computation and massive parallelism, have motivated a growing body of research on energy-efficient neuromorphic hardware as well as compatible spike-based learning rules [3, 4].
Early studies of SNNs focused mostly on a single layer of neurons and established a strong theoretical foundation in neural coding and synaptic plasticity [5, 2, 6]. Motivated by the recent success of deep learning, research attention in SNNs has shifted towards networks with multiple hidden layers [7, 3]. However, training deep SNNs remains challenging due to the non-differentiability of spike generation. To overcome this, differentiable proxies have been employed to enable the powerful gradient backpropagation algorithm; examples include the membrane potential [8, 9, 10], the timing of the first spike [11] and spike statistics [12, 13].
While much progress has been made on spike-based learning rules in recent years, we observe that comparatively little attention has been paid to how information is represented in the network (i.e., neural encoding) when developing these learning rules. Specifically, we argue that temporal credit assignment is unnecessary when sensory inputs are rate-encoded [14], whereby spike timing carries no additional information. The problem is amplified when traditional computer vision datasets or their neuromorphic versions are used to benchmark novel spike-based learning rules, since they contain negligible temporal information [15].
Another line of research in deep SNNs involves the conversion of pre-trained ANNs to SNNs of the same network architecture [16, 17, 18, 19, 20]. This indirect training approach assumes that the graded activation of analog neurons is equivalent to the average firing rate of spiking neurons, and simply requires parsing and normalizing the weights after training the ANNs. Notably, Rueckauer et al. provide a theoretical analysis of the performance deviation of this approach as well as a systematic study of frequently used layers in CNNs [18, 19, 20]. This conversion approach achieves the best-reported results for SNNs on many benchmark datasets, including the challenging ImageNet dataset [21]. Nevertheless, the trade-off between latency and accuracy has been identified as the main shortcoming of this approach [17], and additional techniques are required to improve latency and power efficiency [22].
In this paper, to effectively process rate-coded sensory inputs and feature vectors with a deep SNN, we propose a novel spike-based learning rule based on the non-leaky integrate-and-fire (IF) neuron. The temporal information associated with spikes is ignored in this neuron model. Moreover, the non-differentiability of spike generation is circumvented by using the spike count as a surrogate for gradient backpropagation. In contrast to the indirect conversion approach, the proposed rule uses spike count information that can be obtained directly from spiking neurons. In addition, latency, spike rate and other hardware constraints can be incorporated during the training phase, allowing direct deployment and efficient inference on neuromorphic hardware.
The rest of this paper is organized as follows. In Section II, we present the proposed spike-based learning rule. In Section III, we evaluate the proposed learning rule on the UCI machine learning and MNIST benchmark datasets. Finally, we conclude with a further discussion in Section IV.
II Methods
II-A Neuron Model
In this work, we use the integrate-and-fire (IF) neuron model. This model faithfully retains the number of input spikes it receives (until reset), and its output spike count is independent of the spike timing of its inputs. While the IF neuron does not emulate the rich temporal dynamics of biological neurons, it is well suited to rate-coded sensory input, where spike timing plays no role.
At each time step, the input spikes to a neuron in a given layer are integrated as follows
(1) 
Here, the terms of Eq. 1 are the neuron firing threshold, an indicator of the occurrence of an input spike from each afferent neuron at the given time step, and the synaptic weight that connects each afferent neuron in the preceding layer.
The neuron then integrates the input current into its membrane potential as per Eq. 2. The membrane potential is initialized with a learnable parameter (Eq. 3), and an output spike is generated whenever it crosses the firing threshold (Eq. 4).
(2) 
(3) 
(4) 
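The neuron dynamics described above can be illustrated with a minimal sketch. The function name `simulate_if_neuron`, the reset-by-subtraction rule, the limit of one output spike per time step, and the example weights are assumptions made for illustration; they are not taken from the equations of this paper.

```python
def simulate_if_neuron(input_spikes, weights, threshold=1.0, v_init=0.0):
    """Simulate one non-leaky integrate-and-fire neuron (illustrative sketch).

    input_spikes: list over time steps; each entry is a list of 0/1 flags,
                  one per afferent neuron.
    Returns the output spike train as a list of 0/1 flags.
    """
    v = v_init  # membrane potential, starting from its initial value
    output = []
    for spikes_t in input_spikes:
        # Integrate the weighted input spikes of this time step (no leak).
        v += sum(w * s for w, s in zip(weights, spikes_t))
        if v >= threshold:
            output.append(1)
            v -= threshold  # assumed: reset by subtracting the threshold
        else:
            output.append(0)
    return output


# The same input spikes delivered early or late in the simulation window.
early = [[1, 1], [1, 1], [1, 1], [0, 0], [0, 0], [0, 0]]
late = [[0, 0], [0, 0], [0, 0], [1, 1], [1, 1], [1, 1]]
weights = [0.5, 0.25]
count_early = sum(simulate_if_neuron(early, weights))
count_late = sum(simulate_if_neuron(late, weights))
```

In this example both `count_early` and `count_late` equal 2: the total weighted input is 2.25, so two threshold crossings occur regardless of when the input spikes arrive, and the remaining 0.25 stays below threshold.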
The total number of spikes (i.e., the spike count) generated by a neuron at the input layer can be determined by summing all incoming spikes over the simulation period as per Eq. 5. For static image inputs, either raw intensity values or aggregate spike counts from a Poisson generator can be used as the input.
(5) 
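As a rough illustration of rate-coded input, the sketch below generates a spike train for a normalized feature value and sums it into a spike count. The per-step Bernoulli draw is a common approximation of a Poisson generator and is an assumption here, as are the helper names `poisson_spike_train` and `input_spike_count` and the example feature values.

```python
import random


def poisson_spike_train(rate, num_steps, rng):
    """Approximate a Poisson spike train with one Bernoulli draw per time step.

    rate: normalized firing rate in [0, 1], used as the per-step spike probability.
    """
    return [1 if rng.random() < rate else 0 for _ in range(num_steps)]


def input_spike_count(spike_train):
    """Spike count: the sum of all spikes over the simulation period."""
    return sum(spike_train)


rng = random.Random(0)          # fixed seed so the sketch is reproducible
features = [0.0, 0.3, 1.0]      # hypothetical normalized feature values
num_steps = 20                  # e.g. a 20 ms window with 1 ms time steps
trains = [poisson_spike_train(x, num_steps, rng) for x in features]
counts = [input_spike_count(t) for t in trains]
```

A feature value of 0 never spikes and a value of 1 spikes at every step, so the counts are bounded by the number of time steps; intermediate values yield a count that fluctuates around `rate * num_steps`.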
According to Eq. 1, the aggregated input current of a neuron in a given layer can be expressed as
(6) 
Here, Eq. 6 involves the input spike count from each presynaptic neuron in the preceding layer and the initial membrane potential of the postsynaptic neuron.
Unlike the continuous activation functions used in traditional ANNs, output spike counts can only be non-negative integers, as enforced in Eq. 7. The surplus membrane potential that is insufficient to induce an additional spike is not carried over to the next sample, as shown in Fig. 1 and Eq. 7. This rounding leads to a quantization error, which can be compensated for by normalizing the synaptic weights to zero mean in the subsequent layer. Moreover, the output spike counts are upper-bounded by the maximum number of time steps; this constraint can be alleviated by using a higher time resolution. In practice, we have not observed any performance drop due to this constraint in our experiments on the UCI and MNIST datasets.
(7) 
where the output spike count is clipped at zero for negative aggregated input current.
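The mapping from aggregated input current to output spike count described above can be sketched as follows. The exact form of Eqs. 6 and 7 is not reproduced here; the floor-divide-by-threshold form, the optional upper bound `max_steps`, and the function names are assumptions chosen to match the verbal description (non-negative integers, discarded surplus potential, clipping at zero).

```python
import math


def aggregate_current(weights, input_counts, v_init=0.0):
    """Weighted sum of presynaptic spike counts plus the initial membrane potential."""
    return sum(w * c for w, c in zip(weights, input_counts)) + v_init


def output_spike_count(z, threshold=1.0, max_steps=None):
    """Assumed activation: floor(z / threshold), clipped to [0, max_steps]."""
    count = max(0, math.floor(z / threshold))  # negative current yields no spikes
    if max_steps is not None:
        count = min(count, max_steps)          # at most one spike per time step
    return count


z = aggregate_current([0.5, 0.25], [3, 3])     # 2.25
c = output_spike_count(z)                      # surplus of 0.25 is discarded
```

Here `c` is 2, and the discarded 0.25 is the quantization error mentioned in the text. A higher time resolution raises `max_steps` and therefore relaxes the upper bound.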
II-B Backpropagation in Rate-coded Deep SNNs
Here, we derive the backpropagation algorithm using the spike count as a surrogate for gradient propagation.
II-B1 Loss Function
In this work, the cross-entropy loss function commonly used for classification tasks is employed as per Eq. 8, which transforms the real-valued outputs into a normalized probability distribution. Other loss functions used in ANNs may also be applied.
(8)  
where the index refers to neurons at the output layer.
The partial derivative of the cross-entropy loss with respect to the output spike count can be determined as
(9) 
where the two terms are the output spike count and the desired one-hot label of each neuron at the output layer.
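A small numerical sketch of this loss is given below, assuming that the normalization is a softmax taken directly over the output spike counts; under that assumption the gradient with respect to each count is the softmax probability minus the one-hot label. The function names and the example counts are illustrative only.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of real values."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(counts, one_hot):
    """Cross-entropy between softmax(counts) and a one-hot label."""
    probs = softmax(counts)
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs))


def cross_entropy_grad(counts, one_hot):
    """Gradient of the loss w.r.t. each output count: softmax(counts) - label."""
    probs = softmax(counts)
    return [p - y for p, y in zip(probs, one_hot)]


counts = [3.0, 1.0, 0.0]   # hypothetical output spike counts
label = [1, 0, 0]          # one-hot label for class 0
loss = cross_entropy(counts, label)
grad = cross_entropy_grad(counts, label)
```

The gradient is negative for the target neuron (its count should increase) and positive for the others, and its components sum to zero.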
II-B2 Output Layer
Following Eqs. 6, 7 and 9, the partial derivatives of the loss function with respect to the synaptic weight and bias term can be expressed in Eqs. 10 and 11, respectively. As per common practice, we denote the term .
(10)  
(11) 
II-B3 Hidden Layers
Similar to Eqs. 10 and 11, the partial derivatives of the loss function with respect to the synaptic weight and bias term for hidden layer can be expressed in Eqs. 12 and 13 below.
(12)  
(13) 
where
(14)  
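One plausible way to realize such a backward pass is a straight-through style estimator, in which the rounding step is treated as the identity so that the derivative of the spike count with respect to the aggregated current is 1/threshold for active neurons and zero for neurons whose count is clipped to zero. This is an assumption made for illustration and not necessarily the exact form of Eqs. 10 to 14; the function name `layer_backward` and all example values are hypothetical.

```python
def layer_backward(delta_out, weights, input_counts, z, threshold=1.0):
    """Backward pass through one spike-count layer (illustrative sketch).

    delta_out[j]: gradient of the loss w.r.t. the output count of neuron j
    weights[j][i]: weight from presynaptic neuron i to postsynaptic neuron j
    input_counts[i]: spike count of presynaptic neuron i
    z[j]: aggregated input current of neuron j
    Returns (grad_w, grad_b, delta_in).
    """
    # Assumed surrogate derivative: 1/threshold if the neuron is active, else 0.
    dz = [d * (1.0 / threshold if zj > 0 else 0.0)
          for d, zj in zip(delta_out, z)]
    # Gradient w.r.t. each weight is the upstream term times the input count.
    grad_w = [[dzj * ci for ci in input_counts] for dzj in dz]
    # Gradient w.r.t. the bias-like initial membrane potential.
    grad_b = list(dz)
    # Gradient passed on to the spike counts of the preceding layer.
    delta_in = [sum(dz[j] * weights[j][i] for j in range(len(dz)))
                for i in range(len(input_counts))]
    return grad_w, grad_b, delta_in


grad_w, grad_b, delta_in = layer_backward(
    delta_out=[0.5, -1.0],
    weights=[[1.0, 2.0], [3.0, 4.0]],
    input_counts=[2, 1],
    z=[4.0, -1.0],
    threshold=2.0,
)
```

In this example the second neuron receives negative current, so it contributes no gradient; only the first neuron updates its weights and propagates error to the preceding layer.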
Such a direct training approach allows hardware constraints, including spike rate, inference latency and limited synaptic weight precision, to be easily integrated into the loss function and optimized jointly during training. This facilitates more convenient deployment and better inference performance on real neuromorphic hardware.
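Table I: Experimental setups and classification accuracies (train/test, %) on the UCI datasets.
Dataset  Tr  Ts  Features  Classes  Network Structure  Proposed (Train/Test)  Rank-order [23] (Train/Test)
Iris  90  60  4  3  4-20-3  100/100  100/96.7
WBC  455  228  9  2  9-20-2  100/100  99.1/98.3
Abalone  2000  2177  7  3  8-50-2  100/100  45.7/47.8
Yeast  990  494  8  10  8-50-10  100/100  56.7/31.6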
III Experimental Results
In this section, we evaluate the proposed spike-count-based learning rule on traditional machine learning and image classification tasks.
III-A UCI Classification Tasks
To evaluate rate-coded SNN models, we use datasets from the UCI machine learning repository, which have been widely used for benchmarking machine learning and neural network models [24]. The following four datasets are used: 1) Iris; 2) Wisconsin Breast Cancer (WBC); 3) Abalone; 4) Yeast. For a fair comparison, the experimental setups follow those of recent work on rank-order learning for SNNs [23]. Table I summarizes the experimental setup and classification results for each dataset: 1) the split into training (Tr) and testing (Ts) samples; 2) the number of features; 3) the number of output classes; 4) the network structure used; and 5) the classification accuracies on the training and test sets.
The input feature vectors are normalized to [0, 1]; Poisson spike trains are then generated for each feature dimension, with firing rates proportional to the normalized feature value. A simulation period of 20 ms with a simulation time step of 1 ms (i.e., 1000 Hz) is used. We initialize the SNN by setting the firing threshold and learning rate to 1.0 and , respectively. The weights of the SNN classifier are drawn randomly from a Gaussian distribution with a mean of 0 and a standard deviation of 0.05. The Adam optimizer [25] is used for parameter updates. For each network structure, 5 SNNs with random weight initialization are trained and the average classification results are reported.
As shown in Table I, the deep SNN trained with the proposed learning rule consistently achieves 100% accuracy on all four benchmark datasets. In contrast, the SNN trained with rank-order learning [23] achieves competitive results only on the easier Iris and WBC datasets, while its test accuracies degrade significantly to less than 50% on the more challenging Abalone and Yeast datasets. Although rank-order learning generally implies low latency and low spike rates, it is worth noting that it applies only to single-layer networks, in which the input encoding layer is directly connected to the output layer. The representational power of these SNN models is therefore greatly limited. In contrast, the proposed learning rule overcomes this limitation and scales well to multiple hidden layers.
III-B MNIST Classification Task
We further evaluate the proposed learning rule on the standard MNIST dataset of handwritten digits, which is widely used for benchmarking multi-layer SNN learning rules [7]. The training and testing sets consist of 60,000 and 10,000 grayscale images of 28 × 28 pixels, respectively. As in the experimental setup used for the UCI datasets, the input spike trains are generated by a Poisson generator, with firing rates proportional to the normalized pixel intensity. A simulation period of 50 ms with a simulation time step of 1 ms is used. We initialize the SNN by setting the firing threshold and learning rate to 1.0 and , respectively. We perform all experiments using the PyTorch library, whereby the dynamics of the IF neuron described in Section II are explicitly modeled during training and testing. The weights are initialized with the PyTorch default values, and we use the Adam optimizer for parameter updates. For each network structure, 5 SNNs with random weight initialization are trained and the average classification results are reported.
We perform experiments using two common feedforward neural network architectures: the multi-layer perceptron (MLP) and the convolutional neural network (CNN). For the MLP, we explore two network structures (described in terms of the number of neurons in each layer): 784-800-10 and 784-800-800-10. As shown in Table II, the SNN models trained with the proposed learning rule achieve classification accuracies of 98.64% and 98.66% with one and two hidden layers, respectively. These accuracies are competitive with both spike-based learning rules [12, 8, 26, 11, 27] and ANN conversion approaches [17, 22], as summarized in Table II.
CNNs are currently the default choice for many computer vision tasks, including image classification [28], detection [29] and segmentation [30]. For SNNs, the best reported result on the MNIST dataset also employs a CNN architecture [18]. Here, we apply the proposed learning rule to train a spiking CNN with the architecture 28×28-12c5-2a-64c5-2a-10. The notation ‘12c5’ denotes 12 convolution kernels of size 5 × 5 and ‘2a’ denotes average pooling of size 2 × 2. The outputs of the final average pooling layer are vectorized and fully connected to the output layer. As shown in Table II, the spiking CNN model trained with the proposed rule achieves a promising classification accuracy of 99.26%. It is also worth mentioning that neither data augmentation nor advanced techniques such as batch normalization or dropout are applied in this work; we expect the accuracies to improve further when these techniques are applied.
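To make the architecture notation concrete, the following sketch traces the feature-map sizes through the 28×28-12c5-2a-64c5-2a-10 network. It assumes stride-1 convolutions without padding and non-overlapping pooling, which the text does not state explicitly; the helper name `feature_map_sizes` and the tuple encoding of the layers are illustrative.

```python
def feature_map_sizes(input_size, layers):
    """Trace (channels, height, width) after each layer of a square-input CNN.

    layers: list of ("conv", channels, kernel) or ("pool", size) tuples.
    """
    size, channels = input_size, 1
    trace = []
    for layer in layers:
        if layer[0] == "conv":
            _, channels, kernel = layer
            size = size - kernel + 1   # assumed: stride 1, no padding
        else:
            size = size // layer[1]    # assumed: non-overlapping average pooling
        trace.append((channels, size, size))
    return trace


# 12c5 - 2a - 64c5 - 2a, applied to a 28 x 28 grayscale input
layers = [("conv", 12, 5), ("pool", 2), ("conv", 64, 5), ("pool", 2)]
trace = feature_map_sizes(28, layers)
flattened = trace[-1][0] * trace[-1][1] * trace[-1][2]
```

Under these assumptions the feature maps shrink from 24 × 24 to 12 × 12, then to 8 × 8 and 4 × 4, so the vectorized output of the final pooling layer has 64 × 4 × 4 = 1024 elements feeding the 10 output neurons.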
We note that many existing spike-based learning rules for deep SNNs treat spike timing as useful information. Despite the promising results achieved with these rules on standard benchmark datasets such as MNIST and CIFAR-10, we expect them to require longer training time and more memory than the proposed learning rule, since the dynamics of each neuron must be computed and stored.
Model  Network Architecture  Method  Test Accuracy (%)
O’Connor (2016) [12]  MLP  Fractional stochastic gradient descent  97.93
Lee (2017) [8]  MLP  Backpropagation  98.88
Neftci (2017) [26]  MLP  Event-driven random backpropagation  97.98
Mostafa (2017) [11]  MLP  Backpropagation with temporal coding  98.00
Wu (2018) [27]  MLP  Spatio-temporal backpropagation  98.48
Diehl (2015) [17]  MLP  Conversion of ANNs  98.60
Neil (2016) [22]  MLP  Conversion of ANNs  98.00
This work  MLP (784-800-10)  Backpropagation with rate-coded SNN  98.64
This work  MLP (784-800-800-10)  Backpropagation with rate-coded SNN  98.66
Lee (2017) [8]  CNN  Backpropagation  99.31
Shrestha (2018) [9]  CNN  Backpropagation  99.36
Diehl (2015) [17]  CNN  Conversion of ANNs  99.10
Rueckauer (2017) [18]  CNN  Conversion of ANNs  99.44
Kheradpisheh (2018) [31]  CNN  Layer-wise STDP + SVM  98.40
This work  CNN  Backpropagation with rate-coded SNN  99.26
Table II: Comparison of classification accuracies of deep SNNs trained with the proposed and other supervised learning rules on the MNIST dataset (for more details, refer to the review paper [7]).
The latency-accuracy trade-off has been identified for the indirect ANN conversion approach, whereby classification accuracy improves over time as more evidence is accumulated [17]. Although techniques [22] have been proposed to effectively improve latency and power efficiency, they generally require more training time and hyperparameter tuning. In our approach, however, latency and other hardware constraints are integrated during the training phase, allowing direct deployment to neuromorphic hardware for efficient inference without the additional work required by the indirect conversion approach [22]. For instance, to reduce the inference time, we can explicitly constrain the simulation period to 10 ms for both training and testing. Notably, the MLP model (784-800-10) is still able to achieve a classification accuracy of 98.40%, which is quite close to the accuracy obtained with a simulation period of 50 ms on the MNIST dataset.
IV Discussion and Conclusion
Motivated by the fact that no useful temporal information is encoded in spike timing for rate-coded spiking inputs, we introduce a novel spike-based learning rule for training deep SNNs, whereby the spike count of each neuron is used as the surrogate for gradient backpropagation. Unlike other spike-based learning rules, which consider spike timing during error backpropagation [8, 9], the proposed learning rule requires much less computation and memory. Moreover, the proposed learning rule achieves competitive classification accuracies on both the UCI machine learning and MNIST datasets.
In contrast to the indirect ANN-to-SNN conversion approach, the proposed learning rule can integrate inference latency, spike rate and hardware constraints more effectively during training. Hence, it allows direct deployment to neuromorphic hardware for efficient inference. Although promising results are achieved on the MNIST dataset, the quantization error arising from the surplus membrane potential of spiking neurons may become severe when it accumulates over many layers. In future work, we will investigate how to scale up the learning rule to deeper neural network architectures, such as VGGNet and ResNet, so as to solve more challenging tasks.
Acknowledgments
This research is supported by Programmatic grant no. A1687b0033 from the Singapore Government’s Research, Innovation and Enterprise 2020 plan (Advanced Manufacturing and Engineering domain).
References
 [1] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436, 2015.
 [2] W. Gerstner and W. M. Kistler, Spiking neuron models: Single neurons, populations, plasticity, Cambridge University Press, 2002.
 [3] M. Pfeiffer and T. Pfeil, “Deep learning with spiking neurons: Opportunities & challenges,” Frontiers in Neuroscience, vol. 12, pp. 774, 2018.
 [4] J. Wu, Y. Chua, M. Zhang, H. Li, and K. C. Tan, “A spiking neural network framework for robust sound classification,” Frontiers in Neuroscience, vol. 12, pp. 836, 2018.
 [5] W. Maass, “Networks of spiking neurons: the third generation of neural network models,” Neural networks, vol. 10, no. 9, pp. 1659–1671, 1997.
 [6] P. Dayan and L. F. Abbott, Theoretical neuroscience, vol. 806, Cambridge, MA: MIT Press, 2001.
 [7] A. Tavanaei, M. Ghodrati, S. R. Kheradpisheh, T. Masquelier, and A. S. Maida, “Deep learning in spiking neural networks,” arXiv preprint arXiv:1804.08150, 2018.
 [8] J. H. Lee, T. Delbruck, and M. Pfeiffer, “Training deep spiking neural networks using backpropagation,” Frontiers in Neuroscience, vol. 10, pp. 508, 2016.
 [9] S. B. Shrestha and G. Orchard, “Slayer: Spike layer error reassignment in time,” arXiv preprint arXiv:1810.08646, 2018.
 [10] Y. Wu, L. Deng, G. Li, J. Zhu, and L. Shi, “Direct training for spiking neural networks: Faster, larger, better,” arXiv preprint arXiv:1809.05793, 2018.
 [11] H. Mostafa, “Supervised learning based on temporal coding in spiking neural networks,” IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 7, pp. 3227–3235, 2018.
 [12] P. O’Connor and M. Welling, “Deep spiking networks,” arXiv preprint arXiv:1602.08323, 2016.
 [13] E. Stromatias, M. Soto, T. Serrano-Gotarredona, and B. Linares-Barranco, “An event-driven classifier for spiking neural networks fed with synthetic or dynamic vision sensor data,” Frontiers in Neuroscience, vol. 11, pp. 350, 2017.
 [14] E. N. Brown, R. E. Kass, and P. P. Mitra, “Multiple neural spike train data analysis: state-of-the-art and future challenges,” Nature Neuroscience, vol. 7, no. 5, pp. 456, 2004.
 [15] L. R. Iyer, Y. Chua, and H. Li, “Is neuromorphic MNIST neuromorphic? Analyzing the discriminative power of neuromorphic datasets in the time domain,” arXiv preprint arXiv:1807.01013, 2018.
 [16] Y. Cao, Y. Chen, and D. Khosla, “Spiking deep convolutional neural networks for energy-efficient object recognition,” International Journal of Computer Vision, vol. 113, no. 1, pp. 54–66, 2015.
 [17] P. U. Diehl, D. Neil, J. Binas, M. Cook, S. C. Liu, and M. Pfeiffer, “Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing,” in 2015 International Joint Conference on Neural Networks (IJCNN), July 2015, pp. 1–8.
 [18] B. Rueckauer, I. A. Lungu, Y. Hu, M. Pfeiffer, and S. C. Liu, “Conversion of continuous-valued deep networks to efficient event-driven networks for image classification,” Frontiers in Neuroscience, vol. 11, pp. 682, 2017.
 [19] A. Sengupta, Y. Ye, R. Wang, C. Liu, and K. Roy, “Going deeper in spiking neural networks: VGG and residual architectures,” arXiv preprint arXiv:1802.02627, 2018.
 [20] Y. Hu, H. Tang, Y. Wang, and G. Pan, “Spiking deep residual network,” arXiv preprint arXiv:1805.01352, 2018.
 [21] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, June 2009, pp. 248–255.
 [22] D. Neil, M. Pfeiffer, and S. C. Liu, “Learning to be efficient: algorithms for training low-latency, low-compute deep spiking neural networks,” in Proceedings of the 31st Annual ACM Symposium on Applied Computing. ACM, 2016, pp. 293–298.
 [23] J. Wang, A. Belatreche, L. P. Maguire, and T. M. McGinnity, “SpikeTemp: An enhanced rank-order-based learning approach for spiking neural networks with adaptive structure,” IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 1, pp. 30–43, 2017.
 [24] A. Asuncion and D. Newman, “UCI machine learning repository,” 2007.
 [25] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
 [26] E. O. Neftci, C. Augustine, S. Paul, and G. Detorakis, “Event-driven random backpropagation: Enabling neuromorphic deep learning machines,” Frontiers in Neuroscience, vol. 11, pp. 324, 2017.
 [27] Y. Wu, L. Deng, G. Li, J. Zhu, and L. Shi, “Spatio-temporal backpropagation for training high-performance spiking neural networks,” Frontiers in Neuroscience, vol. 12, pp. 331, 2018.
 [28] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
 [29] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
 [30] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779–788.
 [31] S. R. Kheradpisheh, M. Ganjtabesh, S. J. Thorpe, and T. Masquelier, “STDP-based spiking deep convolutional neural networks for object recognition,” Neural Networks, vol. 99, pp. 56–67, 2018.