I Introduction
Driven by the availability of large-scale labeled training data, high-performance computing resources, and effective deep neural network architectures, deep learning has made spectacular achievements in computer vision [1, 2], speech processing [3, 4], language understanding [5] and robotics [6]. Notwithstanding their remarkable capabilities, these deep neural network models are computationally intensive and memory inefficient, making it challenging to deploy them onto pervasive mobile and Internet-of-Things (IoT) devices with limited computational budgets. Moreover, the ever-growing model complexities, computational demands and concerns about information security motivate novel energy-efficient solutions.

Human brains, shaped by millions of years of evolution, are incredibly efficient at performing complex perceptual and cognitive tasks [7]. Although hierarchically organized deep neural network models are brain-inspired, they differ significantly from biological brains in many ways. Fundamentally, information in the brain is represented and communicated through asynchronous action potentials, or spikes. To process the information carried by these spike trains efficiently and rapidly, biological neural systems adopt an event-driven computation strategy, whereby energy is mostly consumed during spike generation and communication. Neuromorphic computing (NC), an emerging non-von Neumann computing paradigm, aims to mimic such asynchronous event-driven information processing with spiking neural networks (SNNs) in silicon [8]. Novel neuromorphic computing architectures, for instance TrueNorth [9] and Loihi [10], leverage low-power, densely-connected parallel computing units to support spike-based computation. Furthermore, their co-located memory and computation can effectively mitigate the low bandwidth between the CPU and memory (i.e., the von Neumann bottleneck) [11]. When implemented on these neuromorphic architectures, competitive classification accuracies can be achieved with high throughput and compelling energy efficiency [12].
Therefore, integrating the algorithmic power of deep learning with the unprecedented efficiency of neuromorphic computing architectures offers an intriguing solution for intelligent embedded devices, and represents an important milestone towards future brain-inspired computing machines.
While neuromorphic computing architectures offer attractive energy savings, how to train large-scale deep SNNs remains a challenging research problem. Due to the asynchronous and discontinuous nature of synaptic operations within the SNN, the error backpropagation algorithm that is widely used for ANN training is not directly applicable to SNNs.
To overcome this, differentiable proxies have been employed to enable the powerful error backpropagation algorithm with discrete spikes; examples include the membrane potential [13, 14, 15, 16], the timing of the first spike [17] and spike statistics [18]. Additional research efforts have been devoted to training constrained ANNs that approximate the properties of spiking neurons and then mapping the trained weights to the SNN [19, 12, 20, 21, 22]. Although competitive accuracies were demonstrated with both approaches on the MNIST and CIFAR-10 [23] datasets and their neuromorphic versions [24, 25], how to scale these learning rules up to the size of state-of-the-art deep ANNs remains elusive. In addition, the temporal credit assignment performed with these spike-based learning rules is memory and computationally inefficient when sensory inputs are rate encoded, wherein spike timing carries negligible additional information [26].

Another vein of research in deep SNN learning rules involves the conversion of pre-trained ANNs to SNNs with the same network architecture [27, 28, 29, 30, 31]. This indirect training approach assumes the graded activation of analog neurons is equivalent to the average firing rate of spiking neurons, and simply requires parsing and normalizing the weights after training the ANNs. Notably, Rueckauer et al. provide a theoretical analysis of the performance deviation of this approach as well as a systematic study of frequently used CNN layers [29, 30, 31]. The conversion approach achieves the best-reported results for SNNs on many benchmark datasets, including the challenging ImageNet dataset [32]. Nevertheless, the latency-accuracy trade-off has been identified as its main shortcoming [28], requiring additional techniques to improve the latency and power efficiency [33].
The biologically plausible Hebbian learning rules [34] and spike-timing-dependent plasticity (STDP) [35, 36] represent another class of local learning rules that are particularly interesting for computational neuroscience studies and for hardware implementations with emerging non-volatile memory devices [37]. It remains challenging, however, to apply them to large-scale machine learning tasks due to ineffective task-specific credit assignment. Interested readers may refer to the review articles [38, 39] for a systematic review of recent progress in deep SNN learning rules and applications.
In this paper, to effectively process rate-coded sensory inputs, we propose a novel learning rule based on a hybrid neural network with shared weights, wherein a rate-based SNN is used in the forward propagation to determine precise spike counts and spike trains, and an equivalent ANN is used in the error backpropagation to approximate the gradients at each coupled SNN layer. Deep SNNs trained with the proposed learning rule achieve classification accuracies competitive with the baseline ANNs and other SNN implementations for image classification on CIFAR-10 and ImageNet-2012 [32]. Furthermore, compared to other available SNN learning rules, the proposed learning rule supports rapid inference with orders-of-magnitude time savings and significantly reduced synaptic operations on machine learning tasks.
The rest of this paper is organized as follows: in Section II, we present the proposed learning rule within the hybrid network. In Section III, we investigate why the proposed learning rule can learn effectively by comparing the high-dimensional geometry of activation values and weight-activation dot products between the coupled ANN and SNN network layers. Furthermore, we evaluate the proposed learning rule on the CIFAR-10 and ImageNet-2012 datasets by comparing classification accuracies, inference speed and energy efficiency with other SNN implementations. Finally, we conclude the paper in Section IV.
II Learning within the Hybrid Network
In this section, we first introduce the coding schemes and neuron models employed in this work. We then review and explain the spike count mismatch problem that arises when mapping constrained ANN weights to a rate-based SNN. Finally, we present a hybrid learning rule to circumvent this spike count mismatch problem.
II-A Encoding and Decoding Schemes
The SNN deals with spiking events; therefore, additional effort is needed to transform conventional frame-based images or feature vectors into spike trains, as well as to decode output spike trains into the associated output classes. Two coding schemes are commonly used: rate code and temporal code. Rate code [28, 29] converts real-valued inputs into spike trains at each sampling time step following a Poisson or Bernoulli distribution. However, it suffers from sampling errors and therefore requires a long encoding time window to compensate for them. Despite superior coding efficiency and computational advantages over rate code, temporal code is complex to decode and sensitive to noise [40].

In this work, as shown in Fig. 1, we feed the real-valued input directly into the ANN layer, while zero-padding the input along the temporal dimension (to match a specific encoding time window) before passing it into the SNN layer. Hence, precise input information is preserved and the first SNN layer can perform the encoding in a learnable fashion. In addition, such an encoding scheme is also beneficial for rapid inference, since the information is typically encoded at early time steps. For decoding, it is feasible to decode from the SNN output layer using either the discrete spike count or the continuous aggregate membrane potential. In our experiments, however, we notice that decoding using the aggregate membrane potential provides a much smoother learning curve, owing to the high-precision error gradients derived at the output layer.
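The encoding and decoding steps above can be sketched in a few lines (a minimal NumPy sketch; the function names and tensor shapes are our own illustration, not taken from the released implementation):

```python
import numpy as np

def encode_input(x, T):
    """Feed the real-valued input at the first time step and zero-pad the
    remaining T-1 steps, so the first SNN layer learns the encoding.
    x: (batch, features) -> (T, batch, features)."""
    frames = np.zeros((T,) + x.shape, dtype=x.dtype)
    frames[0] = x                      # precise input information preserved at t = 0
    return frames

def decode_output(v_agg):
    """Decode the predicted class from the continuous aggregate membrane
    potential of the output layer (smoother gradients than spike counts)."""
    return np.argmax(v_agg, axis=-1)

x = np.random.randn(4, 10).astype(np.float32)       # a toy mini-batch
frames = encode_input(x, T=8)                       # shape (8, 4, 10)
pred = decode_output(np.array([[0.1, 2.0, -0.5]]))  # -> class 1
```

Since only the first temporal slice is non-zero, most of the encoding window carries no input events, which is what allows information to be concentrated in the early time steps.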
II-B Neuron Model
In this work, we use the integrate-and-fire (IF) neuron model with a reset-by-subtraction scheme [29] for the SNN layers. This simplified spiking neuron model drops the leaky term and the refractory period; as a result, it can faithfully retain the number of input spikes it receives (until reset). While the IF neuron does not emulate the rich temporal dynamics of biological neurons, it is ideal for working with rate-coded sensory inputs, where spike timings do not play a significant role.
At each time step $t$, the input spikes to neuron $j$ at layer $l$ are integrated into the input current $z_j^l[t]$ as follows:

$$z_j^l[t] = \vartheta \sum_i w_{ji}^{l-1} s_i^{l-1}[t] \qquad (1)$$

where $\vartheta$ is the neuron firing threshold and $s_i^{l-1}[t] \in \{0, 1\}$ indicates the occurrence of an input spike from afferent neuron $i$ at time step $t$. The $w_{ji}^{l-1}$ denotes the synaptic weight that connects afferent neuron $i$ from layer $l-1$. Neuron $j$ then integrates the input current $z_j^l[t]$ into its membrane potential $V_j^l[t]$ as per Eq. 2. $V_j^l[0]$ is initialized with a learnable parameter $b_j^l$ (Eq. 3) that is equivalent to the bias term of the coupled ANN, and an output spike is generated whenever $V_j^l[t]$ crosses the firing threshold $\vartheta$ (Eq. 4):

$$V_j^l[t] = V_j^l[t-1] + z_j^l[t] - \vartheta\, s_j^l[t] \qquad (2)$$

$$V_j^l[0] = b_j^l \qquad (3)$$

$$s_j^l[t] = \Theta\!\left(V_j^l[t-1] + z_j^l[t] - \vartheta\right), \qquad \Theta(x) = \begin{cases} 1, & x \geq 0 \\ 0, & \text{otherwise} \end{cases} \qquad (4)$$
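The dynamics of Eqs. 1-4 can be simulated directly. The sketch below (our own notation, one layer, threshold $\vartheta = 1$) makes the reset-by-subtraction explicit, so the surplus membrane potential is retained rather than discarded:

```python
import numpy as np

def if_forward(spikes_in, w, b, theta=1.0):
    """Simulate one layer of IF neurons with reset by subtraction (Eqs. 1-4).
    spikes_in: (T, n_in) binary spike trains; w: (n_in, n_out); b: (n_out,) bias.
    Returns the binary output spike trains of shape (T, n_out)."""
    T = spikes_in.shape[0]
    v = b.astype(float).copy()            # V[0] = b, the learnable bias (Eq. 3)
    spikes_out = np.zeros((T, w.shape[1]))
    for t in range(T):
        z = theta * (spikes_in[t] @ w)    # input current (Eq. 1)
        v = v + z                         # membrane integration (Eq. 2)
        fired = (v >= theta)              # threshold crossing (Eq. 4)
        spikes_out[t] = fired
        v = v - theta * fired             # reset by subtraction: surplus retained
    return spikes_out

# one neuron, weight 0.6, an input spike at every step: the membrane reaches
# 1.2 at the second step, emits one spike and keeps the 0.2 surplus
out = if_forward(np.ones((3, 1)), np.array([[0.6]]), np.zeros(1))
```

Because the leak and refractory period are dropped, the total input charge is conserved across resets, which is the property the rate-based analysis in the next subsection relies on.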
II-C Spike Count Mismatch Problem
In our earlier work [41], we neglected the temporal dynamics of IF neurons, treated them as simple non-leaky integrators, and established the following correspondence between the aggregated membrane potential $V_j^l$ (accumulated over the whole time window, without reset) and the output spike count $a_j^l$:

$$a_j^l = \min\!\left(T, \left\lfloor \frac{\max\!\left(0, V_j^l\right)}{\vartheta} \right\rfloor\right) \qquad (7)$$

where the output 'spike count' $a_j^l$ is clipped at zero for a negative aggregated membrane potential $V_j^l$. Different from the continuous neuron activation functions used in ANNs, $a_j^l$ takes only non-negative integer values (enforced by the floor term in Eq. 7). The surplus membrane potential that is insufficient to induce an additional spike is discarded, resulting in the quantization error shown in Fig. 2 and Eq. 7. Moreover, $a_j^l$ is upper bounded by the maximum number of time steps $T$; this constraint can be alleviated using a higher time resolution. In the backward pass, the discontinuity of the activation function is addressed with the straight-through estimator [42].

Taking Eq. 7 as the neuron activation function for the ANN and mapping the trained ANN weights to the SNN, we were able to achieve competitive classification accuracy on the MNIST dataset. However, when applying this spike-count-based learning rule to the more complex CIFAR-10 dataset, we noticed a large accuracy drop (up to 10%) when mapping ANN weights to the SNN. After carefully comparing the ANN output 'spike counts' with the actual SNN spike counts, we observed a growing spike count mismatch between ANN and SNN layers, as shown in Fig. 4. This mismatch arises because the temporal dynamics of IF neurons are ignored in the constrained ANN. To allow a better understanding of this problem, we have prepared a hand-crafted example, shown in Fig. 3, wherein a post-synaptic IF neuron is connected to three pre-synaptic neurons with one spike each. Although the aggregate membrane potential of the post-synaptic neuron ends up below the firing threshold, one spike is nevertheless generated by this neuron due to its inherent temporal dynamics. While this spike count mismatch problem may be negligible for shallow networks of the size used for the MNIST dataset [41] or with very high input spike rates, it has a huge impact on deep SNNs with sparse input spike trains and short encoding time windows, as demonstrated on the CIFAR-10 dataset in Fig. 4.
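The effect can be reproduced numerically. In the sketch below (our own illustrative weights and spike times, not necessarily the exact values of Fig. 3), the ANN 'spike count' of Eq. 7 predicts zero output spikes, yet the step-by-step IF dynamics emit one:

```python
import numpy as np

theta = 1.0
# three pre-synaptic neurons, one spike each, arriving at t = 0, 1, 2
weights = np.array([0.6, 0.6, -0.5])     # illustrative values

# ANN view (Eq. 7): aggregate membrane potential -> clipped/floored 'spike count'
v_agg = weights.sum()                                # 0.7, below the threshold
ann_count = int(np.floor(max(0.0, v_agg) / theta))   # predicts 0 spikes

# SNN view: step-by-step integration with reset by subtraction
v, snn_count = 0.0, 0
for t in range(3):
    v += weights[t]                      # the spike arriving at step t
    if v >= theta:                       # 0.6 -> 1.2: fires before the
        snn_count += 1                   # inhibitory input at t = 2 arrives
        v -= theta
print(ann_count, snn_count)              # prints: 0 1
```

The mismatch appears because the excitatory inputs arrive before the inhibitory one: the rate-based view only sees the final aggregate (0.7), while the IF neuron crosses the threshold transiently at $t = 1$.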
II-D Credit Assignment in the Hybrid Network
To overcome the spike count mismatch problem originating from the inherent dynamics of IF neurons, we propose a hybrid learning rule. As shown in Fig. 6, an ANN with the activation function defined in Eq. 7 is employed to enable error backpropagation in a rate-based network, while an SNN, sharing weights with the coupled ANN, is employed to determine the exact output spike counts and spike trains. The spike count and spike train derived from each SNN layer are transmitted to the subsequent ANN and SNN layers, respectively. Although a coupled ANN is harnessed in the training phase, inference is executed entirely on the SNN. The idea of decoupled network layers has also been exploited in binary neural networks [43], in which full-precision activation values are calculated at each layer, whereas binarized activation values are forward-propagated to the subsequent layer.

By injecting the dynamics of IF neurons into the training phase, this learning rule effectively prevents the spike count mismatch from accumulating across layers. Although a mismatch still exists between the outputs of the ANN layer and the coupled SNN layer (spike counts), our experimental results suggest that the angle between these two outputs is exceedingly small in a high-dimensional space, and that this relationship is maintained throughout learning. In addition, the weight-activation dot products, a critical intermediate quantity, are approximately preserved despite the mismatch error. Therefore, the modified learning dynamics in such a decoupled network can approximate the learning dynamics of an intact ANN. The pseudocode of the proposed learning rule is provided in Algorithm 1.
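A minimal sketch of one coupled ANN/SNN layer pair may clarify the data flow (the naming and shapes are our own; the actual implementation lives inside customized Tensorpack layers): the SNN path produces the exact spike train and spike count that are forwarded to the next pair, while the ANN path evaluates Eq. 7 on the incoming spike count so that gradients can be taken through it, e.g. with a straight-through estimator:

```python
import numpy as np

def hybrid_layer_forward(spike_train_in, count_in, w, b, theta=1.0):
    """One coupled layer of the hybrid network (forward pass only).
    spike_train_in: (T, n_in) spikes forwarded along the SNN path;
    count_in: (n_in,) spike counts forwarded along the ANN path."""
    T = spike_train_in.shape[0]

    # SNN path: exact IF dynamics (Eqs. 1-4), used at inference time
    v = b.astype(float).copy()
    spike_train_out = np.zeros((T, w.shape[1]))
    for t in range(T):
        v = v + theta * (spike_train_in[t] @ w)
        fired = (v >= theta)
        spike_train_out[t] = fired
        v = v - theta * fired
    count_out = spike_train_out.sum(axis=0)   # transmitted to the next layer pair

    # ANN path: rate-based surrogate activation (Eq. 7), used only for gradients
    v_agg = theta * (count_in @ w) + b
    ann_out = np.clip(np.floor(v_agg / theta), 0, T)
    return spike_train_out, count_out, ann_out

T, n_in, n_out = 8, 5, 3
spikes = np.zeros((T, n_in)); spikes[0] = 1.0   # all input spikes at t = 0
st, c, a = hybrid_layer_forward(spikes, spikes.sum(0),
                                np.full((n_in, n_out), 0.3), np.zeros(n_out))
```

Note that the quantity passed forward is the SNN's `count_out`, not `ann_out`; the ANN output only defines the differentiable surrogate through which errors flow backward.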
III Experimental Evaluation and Discussion
In this section, we first evaluate the learning capability of the proposed learning rule on two standard image classification benchmarks. We further discuss why effective learning can be performed within a decoupled network configuration. Finally, we present and discuss the attractive properties of rapid inference and reduced total synaptic operations that are achieved with the proposed learning rule.
III-A Datasets, Networks and Implementation
To evaluate the learning capability, convergence properties and energy efficiency of the proposed learning rule, we use two image classification benchmark datasets: CIFAR-10 [23] and ImageNet-2012 [32]. CIFAR-10 consists of 60,000 color images of size 32×32 from 10 classes, with a standard split of 50,000 and 10,000 for training and testing, respectively. The large-scale ImageNet-2012 dataset consists of over 1.2 million images from 1,000 object categories. Notably, the success of AlexNet [1] on this dataset represents a key milestone of deep learning research.
As shown in Fig. 5, we use a customized convolutional neural network (CNN), CIFARNet, with 6 learnable layers for CIFAR-10, and AlexNet for ImageNet-2012. To reduce the dependency on weight initialization and to accelerate the training process, we add a batch normalization [44] layer after each convolutional and fully-connected layer. Given that a batch normalization layer only performs an affine transformation, we integrate its parameters into the preceding layer's weights before copying them into the coupled SNN layer. We replace the average pooling operations commonly used in the ANN-to-SNN conversion approach with stride-2 convolution operations, which perform dimensionality reduction in a learnable fashion [45]. This design choice eliminates the quantization errors that would occur for IF neurons if average pooling layers were used.

We perform all experiments with the Tensorpack toolbox [46], a high-level neural network training interface based on TensorFlow. Tensorpack optimizes the whole training pipeline, providing accelerated and memory-efficient training on multi-GPU machines. We follow the same data pre-processing procedures (crop, flip, mean normalization, etc.), optimizer, and learning rate decay schedule adopted in the Tensorpack CIFAR-10 and ImageNet-2012 examples, and use those configurations consistently for all experiments. As shown in Fig. 6, we implement customized convolutional and fully-connected layers in Tensorpack and integrate the operations of the ANN layer and the coupled SNN layer under a unified interface.

III-B Counting Synaptic Operations
The computational cost of neuromorphic architectures is typically benchmarked using the total synaptic operations [9, 29, 30, 16]. For an SNN, as defined below, the total synaptic operations (SynOps) are correlated with the neurons' firing rates, fan-outs (number of outgoing connections to the subsequent layer) and the simulation time window $T$:

$$\mathrm{SynOps}_{SNN} = \sum_{t=1}^{T} \sum_{l=1}^{L} \sum_{j=1}^{N^l} f_{out}^{l,j}\, s_j^l[t] \qquad (8)$$

where $L$ is the total number of layers, $N^l$ denotes the total number of neurons in layer $l$, $f_{out}^{l,j}$ is the fan-out of neuron $j$ in layer $l$, and $s_j^l[t] \in \{0, 1\}$ indicates whether a spike is generated by neuron $j$ of layer $l$ at time step $t$.

In contrast, the total synaptic operations required to classify one image in the ANN is given as follows:

$$\mathrm{SynOps}_{ANN} = \sum_{l=1}^{L} f_{in}^{l}\, N^l \qquad (9)$$

with $f_{in}^{l}$ denoting the number of incoming connections to each neuron in layer $l$. In our experiments, we calculate the average synaptic operations on a randomly chosen mini-batch (256 images) from the test set.
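Under the simplifying assumption of a uniform fan-out per layer, Eqs. 8 and 9 reduce to a few lines (a sketch with hypothetical layer sizes, not the networks of this work):

```python
import numpy as np

def snn_synops(spike_trains, fanouts):
    """Eq. 8: each spike triggers one AC operation per outgoing connection.
    spike_trains: per-layer (T, N_l) binary arrays; fanouts: per-layer fan-out
    (assumed uniform within a layer for this sketch)."""
    return sum(int(s.sum()) * f for s, f in zip(spike_trains, fanouts))

def ann_synops(fanins, layer_sizes):
    """Eq. 9: one MAC per incoming connection of every neuron, input-independent."""
    return sum(f * n for f, n in zip(fanins, layer_sizes))

# toy two-layer network, T = 10, in the worst case where every neuron fires
T = 10
trains = [np.ones((T, 100)), np.ones((T, 50))]
print(snn_synops(trains, fanouts=[50, 10]))               # (10*100)*50 + (10*50)*10 = 55000
print(ann_synops(fanins=[100, 50], layer_sizes=[50, 10])) # 100*50 + 50*10 = 5500
```

The SNN count scales with actual spike activity and $T$, while the ANN count is fixed by the architecture; sparse firing and a short time window are therefore what drive the SynOps ratio below 1.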
Model | Network Architecture | Method | Test Accuracy (%) | Inference Time (time steps)

CIFAR-10:
Panda and Roy (2016) [47] | Spiking Convolutional Autoencoder | Layer-wise Spike-based Learning | 75.42 | –
Esser et al. (2016) [12] | Spiking CNN (15 layers) | Binary Neural Network | 89.32 | 16
Rueckauer et al. (2017) [29] | Spiking CNN (8 layers) | Conversion of ANN | 90.85 | –
Wu et al. (2018) [15] | Spiking CNN (CIFARNet) | Error Backpropagation Through Time | 90.53 | –
Wu et al. (2018) [15] | Spiking CNN (AlexNet) | Error Backpropagation Through Time | 85.24 | –
Sengupta et al. (2019) [30] | Spiking CNN (VGG-16) | Conversion of ANN | 91.46 | 2,500
Lee et al. (2019) [16] | Spiking CNN (ResNet-11) | Conversion of ANN | 90.15 | 3,000
Lee et al. (2019) [16] | Spiking CNN (ResNet-11) | Spike-based Learning | 90.95 | 100
This work (Spike Count) | Spiking CNN (6 layers) | Error Backpropagation within Hybrid Network | 91.54 | 16
This work (Agg. Mem. Potential) | Spiking CNN (6 layers) | Error Backpropagation within Hybrid Network | 90.07 | 16

ImageNet-2012, top-1 (top-5) accuracy:
Hunsberger and Eliasmith (2016) [48] | Spiking CNN (AlexNet) | Conversion of Constrained ANN | 51.80 (76.20) | 200
Rueckauer et al. (2017) [29] | Spiking CNN (VGG-16) | Conversion of ANN | 49.61 (81.63) | 400
Sengupta et al. (2019) [30] | Spiking CNN (VGG-16) | Conversion of ANN | 69.96 (89.01) | 2,500
This work | CNN (AlexNet) | Error Backpropagation | 57.55 (80.44) | –
This work | Spiking CNN (AlexNet) | Error Backpropagation within Hybrid Network | 46.63 (70.80) | 13
This work | Spiking CNN (AlexNet) | Error Backpropagation within Hybrid Network | 50.22 (73.60) | 18
III-C Image Classification Results
For CIFAR-10, as shown in Table I, the spiking CIFARNet trained with the proposed learning rule achieves competitive test accuracies of 91.54% (spike count decoding) and 90.07% (aggregate membrane potential decoding), respectively. With spike count decoding, the spiking CIFARNet achieves by far the best-reported SNN result on CIFAR-10. As shown in Fig. 7, we however note that its learning dynamics are unstable, which may be attributed to the noisy error gradients derived at the output layer. Therefore, we use aggregate membrane potential decoding for the rest of the experiments on ImageNet-2012 as well as for the study of the encoding time window on CIFAR-10. Although the learning converges more slowly than for the plain CNN and the bounded CNN (proposed in our earlier work [41], using the constrained activation function of Eq. 7), the error rate of the SNN eventually matches that of the bounded CNN. This suggests that by adding the dynamics of IF neurons into the training phase, the spike count mismatch problem described in Sec. II-C can be effectively alleviated.
Training a model on ImageNet-2012 with a spike-based learning rule requires large amounts of memory to store the intermediate states of spiking neurons, as well as a huge computational cost. Hence, only a few SNN implementations, which do not take the dynamics of spiking neurons into consideration during training, have made successful attempts on this challenging task, including the ANN-to-SNN conversion [29, 30] and constrained-ANN conversion [48] approaches. Our approach combines the advantages of both: the dynamics of IF neurons are considered during the forward propagation, while only the rate-based ANN is used for error backpropagation. As a result, the proposed approach improves both the memory requirement and the computational cost compared to spike-based learning rules.
As shown in Table I, with an inference time of 18 time steps (the input image is encoded within a time window of 10 time steps), the spiking AlexNet trained with the proposed learning rule achieves top-1 and top-5 accuracies of 50.22% and 73.60%, respectively. This result is comparable to that of the constrained-ANN conversion approach with the same AlexNet architecture. Notably, the proposed learning rule requires only 18 inference time steps, which is at least an order of magnitude faster than the other reported approaches. While the ANN-to-SNN conversion approaches achieve better classification accuracies on ImageNet-2012, their success is largely credited to the more advanced models used. Furthermore, we note an accuracy drop of around 7% from the baseline AlexNet implementation (revised from the original AlexNet model [1]), which may be attributed to the mismatch between ANN layer activation values and SNN layer spike counts. As future work, we would like to explore strategies to minimize such mismatch errors and also to evaluate more advanced network architectures.
III-D Activation Direction Preservation and Weight-Activation Dot Product Proportionality within the Decoupled Network
Although the learning capability of the proposed learning rule has been demonstrated on CIFAR-10 and ImageNet-2012, it remains puzzling why learning can be performed effectively across decoupled network layers. To address this question, we borrow ideas from recent theoretical work on binary neural networks [49], wherein learning is also performed across decoupled network layers (binarized activations are forward-propagated to subsequent layers). In the proposed hybrid network, as shown in Fig. 8, the ANN activation values $a^l$ at layer $l$ are replaced with the spike counts $c^l$ of the coupled SNN layer. Due to the dynamic nature of spike generation, there is no explicit transformation function between $a^l$ and $c^l$. To circumvent this problem, we analyze the degree of mismatch between these two quantities and its effect on the activation forward propagation and the error backpropagation.
In our numerical experiments on CIFAR-10 with a randomly drawn mini-batch of 256 test samples, we calculate the cosine angle between the vectorized $a^l$ and $c^l$ for all convolution layers. As shown in Fig. 9, their cosine angles are below 30 degrees on average, and this relationship is maintained consistently throughout learning. While such angles may seem large in low dimensions, they are exceedingly small in a high-dimensional space: according to hyperdimensional computing theory [50] and the study of binary neural networks [49], any two high-dimensional random vectors are approximately orthogonal. It is also worth noting that the distortion from replacing $a^l$ with $c^l$ is less severe than that from binarizing a high-dimensional random vector, which changes the cosine angle by 37 degrees in theory. Given that the activation function and the error gradients backpropagated from the subsequent ANN layer remain the same, the distortions to the error backpropagation are bounded locally by the mismatch error.
Furthermore, we calculate the Pearson correlation coefficient between the weight-activation dot products $w \cdot a^l$ and $w \cdot c^l$, an important intermediate quantity (the input to the batch normalization layer) in our network configurations. We note that the Pearson correlation coefficients remain consistently above 0.9 throughout learning for most of the samples, suggesting that the linear relationship of the weight-activation dot products is approximately preserved.
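Both statistics are easy to reproduce on synthetic data. The sketch below uses random stand-ins for the ANN activations and the coupled SNN spike counts (not the paper's measured values) to illustrate that a per-element mismatch leaves the direction and the dot products largely intact in high dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n = 256, 4096                      # dimensionality comparable to a conv layer
A = rng.uniform(0, 8, size=(batch, n))                # stand-in ANN activations
C = np.floor(A) + (rng.random((batch, n)) < 0.1)      # stand-in spike counts + mismatch

# cosine angle between the two vectors for each sample: small in high dimensions
cos = (A * C).sum(1) / (np.linalg.norm(A, axis=1) * np.linalg.norm(C, axis=1))
angles = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
print(angles.mean())                      # only a few degrees for this mismatch level

# Pearson correlation between weight-activation dot products across the batch
w = rng.standard_normal(n)
r = np.corrcoef(A @ w, C @ w)[0, 1]
print(r)                                  # close to 1: linearity approximately preserved
```

This mirrors the qualitative claim only: the magnitude of the real mismatch depends on the trained network and is what Fig. 9 measures directly.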
III-E Rapid Inference with Reduced Synaptic Operations
As shown in Fig. 7, the proposed learning rule is able to deal with different encoding window sizes on CIFAR-10. In the most challenging case of the shortest encoding time window, we are able to achieve a satisfying error rate below 12%. This may be credited to the encoding strategy we employ, whereby the input information is encoded within the first time step before being passed into the SNN layer. In addition, the batch normalization layers added after each convolutional and fully-connected layer ensure that information is transmitted effectively to the top layers. The error rate reduces further as the time window expands, while the improvement vanishes beyond a certain window size. Hence, SNNs trained with the proposed learning rule can perform inference rapidly, with at least an order of magnitude time savings compared with other learning rules, as shown in Table I. While binary neural networks also support rapid inference, they propagate information in a synchronized fashion and thus differ fundamentally from the asynchronous information processing studied here.
To study the energy efficiency of the proposed learning rule, we calculate the ratio of SNN AC operations to ANN MAC operations on CIFAR-10 and ImageNet-2012 and compare it with other state-of-the-art learning rules. Thanks to the short inference time and sparse synaptic activity, the spiking AlexNet trained with the proposed learning rule achieves, as shown in Table II, a ratio of only 0.40 and 0.68 on the CIFAR-10 and ImageNet-2012 datasets, respectively. A ratio below 1 indicates that the spiking AlexNet requires fewer synaptic operations than its ANN counterpart. The savings are in fact larger than the ratio suggests: for SNNs, only an accumulate (AC) operation is performed per synaptic operation, whereas ANNs perform a more costly multiply-and-accumulate (MAC) operation, which costs an order of magnitude more chip area and energy per synaptic operation [29, 30]. Furthermore, the proposed learning rule achieves at least 9 and 3 times synaptic operation savings compared to other learning rules [16, 30] on the CIFAR-10 and ImageNet-2012 datasets, respectively.
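To translate the operation ratio into an energy estimate, one can weight AC and MAC operations by per-operation energy figures. The sketch below assumes the commonly cited 45 nm CMOS estimates of roughly 0.9 pJ per 32-bit floating-point AC and 4.6 pJ per MAC; these numbers are illustrative assumptions taken from the conversion literature, not measurements from this work:

```python
E_AC, E_MAC = 0.9, 4.6            # pJ per operation; assumed 45 nm CMOS estimates

def energy_ratio(synops_ratio, e_ac=E_AC, e_mac=E_MAC):
    """Convert an SNN-AC / ANN-MAC operation count ratio into an energy ratio."""
    return synops_ratio * e_ac / e_mac

for ratio in (0.40, 0.68):        # CIFAR-10 and ImageNet-2012 ratios from Table II
    print(f"ops ratio {ratio:.2f} -> energy ratio {energy_ratio(ratio):.3f}")
```

Under these assumptions, an operation ratio of 0.40 corresponds to well under a tenth of the ANN's synaptic-operation energy, which is the compounding effect described above.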
Model | T | CIFAR-10 | ImageNet-2012

VGGNet-9 [16] | 100 | 3.61 | –
ResNet-11 [16] | 100 | 5.06 | –
VGGNet-16 [30] | 500 | – | 1.975
ResNet-34 [30] | 2,000 | – | 2.40
AlexNet (this work) | 13 | 0.27 | 0.50
AlexNet (this work) | 18 | 0.40 | 0.68
IV Conclusion
In this work, we introduce a novel learning rule based on a hybrid neural network to effectively train rate-based SNNs for efficient and rapid inference on machine learning tasks. Within the hybrid neural network, a rate-based SNN using IF neurons is employed to determine the precise spike counts and spike trains for the activation forward propagation, while an ANN, sharing weights with the coupled SNN, is used to approximate the gradients of the coupled SNN. Given that error backpropagation is performed on the rate-based ANN, the proposed learning rule is memory and computationally more efficient than the error backpropagation through time algorithm used in many spike-based learning rules [13, 14, 15].
To understand why learning can be performed effectively with decoupled network layers, we study the learning dynamics of the decoupled network and compare them to those of an intact ANN. The empirical study on CIFAR-10 reveals that the cosine distances between the vectorized ANN outputs $a^l$ and the coupled SNN output spike counts $c^l$ are exceedingly small in a high-dimensional space, and that this relationship is maintained throughout training. Furthermore, strong positive Pearson correlation coefficients are exhibited between the weight-activation dot products $w \cdot a^l$ and $w \cdot c^l$, an important intermediate quantity in the activation forward propagation, suggesting that the linear relationship of the weight-activation dot products is approximately preserved.
The SNNs trained with the proposed learning rule demonstrate competitive classification accuracies on the CIFAR-10 and ImageNet-2012 datasets. By encoding sensory stimuli into early time steps in a learnable fashion and adding batch normalization layers to ensure an effective information flow, rapid inference, with at least an order of magnitude time savings compared to the state-of-the-art ANN-to-SNN conversion approach [30], is demonstrated on the large-scale ImageNet-2012 image classification task. Furthermore, the total synaptic operations are also significantly reduced compared to the baseline ANNs and other SNN implementations. By integrating the algorithmic power of the proposed learning rule with the unprecedented energy efficiency of emerging neuromorphic computing architectures, we expect to enable low-power on-chip computing on pervasive mobile and embedded devices. As future work, we will explore strategies to close the accuracy gap between the baseline ANN and SNN implementations, as well as evaluate more advanced network architectures.
References
 [1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, 2012, pp. 1097–1105.

[2] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[3] W. Xiong, J. Droppo, X. Huang, F. Seide, M. L. Seltzer, A. Stolcke, D. Yu, and G. Zweig, “Toward human parity in conversational speech recognition,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 12, pp. 2410–2423, Dec 2017.
[4] A. Van Den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. W. Senior, and K. Kavukcuoglu, “WaveNet: A generative model for raw audio,” SSW, vol. 125, 2016.

[5] J. Hirschberg and C. D. Manning, “Advances in natural language processing,” Science, vol. 349, no. 6245, pp. 261–266, 2015.
[6] D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, et al., “Mastering the game of go without human knowledge,” Nature, vol. 550, no. 7676, pp. 354–359, 2017.
 [7] S. B. Laughlin and T. J. Sejnowski, “Communication in neuronal networks,” Science, vol. 301, no. 5641, pp. 1870–1874, 2003.
 [8] C. D. Schuman, T. E. Potok, R. M. Patton, J. D. Birdwell, M. E. Dean, G. S. Rose, and J. S. Plank, “A survey of neuromorphic computing and neural networks in hardware,” arXiv preprint arXiv:1705.06963, 2017.
[9] P. A. Merolla, J. V. Arthur, R. Alvarez-Icaza, A. S. Cassidy, J. Sawada, F. Akopyan, B. L. Jackson, N. Imam, C. Guo, Y. Nakamura, et al., “A million spiking-neuron integrated circuit with a scalable communication network and interface,” Science, vol. 345, no. 6197, pp. 668–673, 2014.
[10] M. Davies, N. Srinivasa, T.-H. Lin, G. Chinya, Y. Cao, S. H. Choday, G. Dimou, P. Joshi, N. Imam, S. Jain, et al., “Loihi: A neuromorphic manycore processor with on-chip learning,” IEEE Micro, vol. 38, no. 1, pp. 82–99, 2018.
 [11] D. Monroe, “Neuromorphic computing gets ready for the (really) big time,” Communications of the ACM, vol. 57, no. 6, pp. 13–15, 2014.
[12] S. K. Esser, P. A. Merolla, J. V. Arthur, A. S. Cassidy, R. Appuswamy, A. Andreopoulos, D. J. Berg, J. L. McKinstry, T. Melano, D. R. Barch, C. di Nolfo, P. Datta, A. Amir, B. Taba, M. D. Flickner, and D. S. Modha, “Convolutional networks for fast, energy-efficient neuromorphic computing,” Proceedings of the National Academy of Sciences, vol. 113, no. 41, pp. 11441–11446, 2016.
 [13] J. H. Lee, T. Delbruck, and M. Pfeiffer, “Training deep spiking neural networks using backpropagation,” Frontiers in Neuroscience, vol. 10, pp. 508, 2016.
 [14] S. B. Shrestha and G. Orchard, “Slayer: Spike layer error reassignment in time,” in Advances in Neural Information Processing Systems 31, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. CesaBianchi, and R. Garnett, Eds., pp. 1412–1421. Curran Associates, Inc., 2018.
 [15] Y. Wu, L. Deng, G. Li, J. Zhu, and L. Shi, “Direct training for spiking neural networks: Faster, larger, better,” arXiv preprint arXiv:1809.05793, 2018.
 [16] C. Lee, S. S. Sarwar, and K. Roy, “Enabling spike-based backpropagation in state-of-the-art deep neural network architectures,” arXiv preprint arXiv:1903.06379, 2019.
 [17] H. Mostafa, “Supervised learning based on temporal coding in spiking neural networks,” IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 7, pp. 3227–3235, 2018.
 [18] E. Stromatias, M. Soto, T. Serrano-Gotarredona, and B. Linares-Barranco, “An event-driven classifier for spiking neural networks fed with synthetic or dynamic vision sensor data,” Frontiers in Neuroscience, vol. 11, pp. 350, 2017.
 [19] S. K. Esser, R. Appuswamy, P. Merolla, J. V. Arthur, and D. S. Modha, “Backpropagation for energy-efficient neuromorphic computing,” in Advances in Neural Information Processing Systems, 2015, pp. 1117–1125.
 [20] P. O’Connor and M. Welling, “Deep spiking networks,” arXiv preprint arXiv:1602.08323, 2016.
 [21] E. Hunsberger and C. Eliasmith, “Spiking deep networks with LIF neurons,” arXiv preprint arXiv:1510.08829, 2015.
 [22] D. Zambrano, R. Nusselder, H. S. Scholte, and S. Bohte, “Efficient computation in adaptive artificial spiking neural networks,” arXiv preprint arXiv:1710.04838, 2017.
 [23] A. Krizhevsky and G. E. Hinton, “Learning multiple layers of features from tiny images,” Tech. Rep., Citeseer, 2009.
 [24] G. Orchard, A. Jayawant, G. K. Cohen, and N. Thakor, “Converting static image datasets to spiking neuromorphic datasets using saccades,” Frontiers in Neuroscience, vol. 9, pp. 437, 2015.
 [25] H. Li, H. Liu, X. Ji, G. Li, and L. Shi, “CIFAR10-DVS: an event-stream dataset for object classification,” Frontiers in Neuroscience, vol. 11, pp. 309, 2017.
 [26] L. R. Iyer, Y. Chua, and H. Li, “Is neuromorphic MNIST neuromorphic? Analyzing the discriminative power of neuromorphic datasets in the time domain,” arXiv preprint arXiv:1807.01013, 2018.
 [27] Y. Cao, Y. Chen, and D. Khosla, “Spiking deep convolutional neural networks for energy-efficient object recognition,” International Journal of Computer Vision, vol. 113, no. 1, pp. 54–66, 2015.
 [28] P. U. Diehl, D. Neil, J. Binas, M. Cook, S. C. Liu, and M. Pfeiffer, “Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing,” in 2015 International Joint Conference on Neural Networks (IJCNN), July 2015, pp. 1–8.
 [29] B. Rueckauer, I. A. Lungu, Y. Hu, M. Pfeiffer, and S. C. Liu, “Conversion of continuous-valued deep networks to efficient event-driven networks for image classification,” Frontiers in Neuroscience, vol. 11, pp. 682, 2017.
 [30] A. Sengupta, Y. Ye, R. Wang, C. Liu, and K. Roy, “Going deeper in spiking neural networks: VGG and residual architectures,” Frontiers in Neuroscience, vol. 13, 2019.
 [31] Y. Hu, H. Tang, Y. Wang, and G. Pan, “Spiking deep residual network,” arXiv preprint arXiv:1805.01352, 2018.
 [32] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, June 2009, pp. 248–255.
 [33] D. Neil, M. Pfeiffer, and S. C. Liu, “Learning to be efficient: algorithms for training low-latency, low-compute deep spiking neural networks,” in Proceedings of the 31st Annual ACM Symposium on Applied Computing. ACM, 2016, pp. 293–298.
 [34] D. O. Hebb, The organization of behavior: A neuropsychological theory, Psychology Press, 2005.
 [35] H. Markram, J. Lübke, M. Frotscher, and B. Sakmann, “Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs,” Science, vol. 275, no. 5297, pp. 213–215, 1997.
 [36] G. Q. Bi and M. M. Poo, “Synaptic modifications in cultured hippocampal neurons: dependence on spike timing, synaptic strength, and postsynaptic cell type,” Journal of neuroscience, vol. 18, no. 24, pp. 10464–10472, 1998.
 [37] G. W. Burr, R. M. Shelby, A. Sebastian, S. Kim, S. Kim, S. Sidler, K. Virwani, M. Ishii, P. Narayanan, A. Fumarola, et al., “Neuromorphic computing using non-volatile memory,” Advances in Physics: X, vol. 2, no. 1, pp. 89–124, 2017.
 [38] M. Pfeiffer and T. Pfeil, “Deep learning with spiking neurons: Opportunities & challenges,” Frontiers in Neuroscience, vol. 12, pp. 774, 2018.
 [39] A. Tavanaei, M. Ghodrati, S. R. Kheradpisheh, T. Masquelier, and A. Maida, “Deep learning in spiking neural networks,” Neural Networks, 2018.
 [40] J. Wu, Y. Chua, M. Zhang, H. Li, and K. C. Tan, “A spiking neural network framework for robust sound classification,” Frontiers in Neuroscience, vol. 12, 2018.
 [41] J. Wu, Y. Chua, M. Zhang, Q. Yang, G. Li, and H. Li, “Deep spiking neural network with spike count based learning rule,” arXiv preprint arXiv:1902.05705, 2019.
 [42] Y. Bengio, N. Léonard, and A. Courville, “Estimating or propagating gradients through stochastic neurons for conditional computation,” arXiv preprint arXiv:1308.3432, 2013.
 [43] M. Courbariaux, I. Hubara, D. Soudry, R. El-Yaniv, and Y. Bengio, “Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1,” arXiv preprint arXiv:1602.02830, 2016.
 [44] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” arXiv preprint arXiv:1502.03167, 2015.
 [45] X. Zhang, X. Zhou, M. Lin, and J. Sun, “ShuffleNet: An extremely efficient convolutional neural network for mobile devices,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 6848–6856.
 [46] Y. Wu et al., “Tensorpack,” https://github.com/tensorpack/, 2016.
 [47] P. Panda and K. Roy, “Unsupervised regenerative learning of hierarchical features in spiking deep networks for object recognition,” in 2016 International Joint Conference on Neural Networks (IJCNN). IEEE, 2016, pp. 299–306.
 [48] E. Hunsberger and C. Eliasmith, “Training spiking deep networks for neuromorphic hardware,” arXiv preprint arXiv:1611.05141, 2016.
 [49] A. G. Anderson and C. P. Berg, “The high-dimensional geometry of binary neural networks,” arXiv preprint arXiv:1705.07199, 2017.
 [50] P. Kanerva, “Hyperdimensional computing: An introduction to computing in distributed representation with high-dimensional random vectors,” Cognitive Computation, vol. 1, no. 2, pp. 139–159, 2009.