Spiking neural networks (SNNs), as the third generation of neural networks, are getting more and more attention due to their higher biological plausibility, hardware friendliness, lower energy demand, and temporal nature [1, 2, 3, 4]. Although SNNs have not yet reached the performance of state-of-the-art artificial neural networks (ANNs) with deep architectures, recent efforts on adapting gradient descent and backpropagation algorithms to SNNs have led to great achievements.
Contrary to artificial neurons with floating-point outputs, spiking neurons communicate via sparse and asynchronous stereotyped spikes, which makes them suitable for event-based computation [1, 2]. That is why neuromorphic implementations of SNNs can be far less energy-hungry than ANN implementations, which makes them appealing for real-time embedded AI systems and edge computing solutions. However, as SNNs become larger they require more storage and computational power. Binarizing the synaptic weights, as in binarized artificial neural networks (BANNs), could be a good solution to reduce the memory and computational requirements of SNNs.
Although the use of binary (+1 and -1) weights in ANNs is not a very recent idea [8, 9, 10], the early studies could not adapt backpropagation to BANNs. Since binary weights cannot be updated in small amounts, the backpropagation and stochastic gradient descent algorithms cannot be directly applied to BANNs. By proposing BinaryConnect [11, 12], Courbariaux et al. were the first to successfully train deep BANNs using the backpropagation algorithm. They used real-valued weights that are binarized before being used in the forward pass. During backpropagation, using the straight-through estimator (STE), the gradients of the binary weights are simply passed through and applied to the real-valued weights. Soon after, Rastegari et al. proposed XNOR-Net, which is very similar to BinaryConnect but multiplies the binary weights by a per-layer scaling factor (the L1-norm of the real-valued weights) to better approximate the real-valued weights. In order to speed up the learning phase of BANNs, Tang et al. controlled the rate of oscillation of the binary weights between -1 and 1 by optimizing the learning rates. They also proposed to use learned scaling factors instead of the L1-norm of the real-valued weights in XNOR-Net. In DoReFa-Net, Zhou et al. proposed a model with variable bit-width (down to binary) weights, activations, and even gradients during backpropagation. A more detailed survey on BANNs is provided in [7].
A few recent studies have tried to convert supervised BANNs into equivalent binary SNNs (BSNNs); however, to the best of our knowledge, no other study has aimed at directly training multi-layer supervised SNNs with binary weights. Esser et al. trained ANNs with constrained weights and activations and deployed them into SNNs with binary weights on TrueNorth. Later, they mapped convolutional ANNs with ternary weights and binary activations to SNNs on TrueNorth. Rueckauer et al. converted BinaryConnect with binary and full-precision activations into equivalent rate-coded BSNNs. Although their converted BSNN had binary weights, they did not binarize the full-precision parameters of the batch-normalization layers. Wang et al. converted BinaryConnect networks to rate-coded BSNNs using a weights-thresholds balance conversion method which scales the high-precision batch-normalization parameters of BinaryConnect to -1 or 1. In another study, Lu et al. converted a modified version of XNOR-Net without batch normalization and bias inputs into equivalent rate-coded BSNNs.
In this work, we propose a direct supervised learning algorithm to train multi-layer SNNs with binary synaptic weights. The input layer uses a temporal time-to-first-spike coding [21, 22, 23] to convert the input image into a spike train with one spike per neuron. The non-leaky integrate-and-fire (IF) neurons in the subsequent hidden and output layers integrate incoming spikes through binary (+1 or -1) synapses and emit only one spike, right after the first crossing of their threshold. Inspired by BANNs, we also use a set of real-valued proxy weights such that the binary weights are the sign of the real-valued weights. Hence, in the backward pass, we update the real-valued weights based on the errors made by the binary weights. In other words, after completing the forward pass with binary weights, the output layer computes the errors by comparing its actual and target firing times, and then the real-valued synaptic weights are updated using temporal error backpropagation. We evaluated the proposed network on the MNIST and Fashion-MNIST datasets, reaching 97.0% and 87.3% categorization accuracies, respectively.
SNNs can vary in terms of neuronal model, neural connectivity, information coding, and learning strategy, which deeply affect their accuracy, memory, and energy efficiency. The advantages of the proposed BSNN are 1) the use of non-leaky IF neurons with very simple neuronal dynamics, 2) binarized connectivity with low memory and computational cost, 3) the use of a sparse temporal coding with at most one spike per neuron, and 4) learning by a direct supervised temporal learning rule which forces the network to make decisions as accurately and early as possible.
The input layer of the proposed binarized single-spike supervised spiking neural network (BS4NN) converts the input image into a spike train based on a time-to-first-spike coding. These spikes are then propagated through the network, where the IF neurons in the hidden and output layers, connected through binary weights, are not allowed to fire more than once per image. Each output neuron is dedicated to a different category, and the first output neuron to fire determines the decision of the network.
The error of each output neuron is computed by comparing its actual firing time with a target firing time. Then, a modified version of the temporal backpropagation algorithm in S4NN is used to update the synaptic weights. During the learning phase, we have two sets of weights: the real-valued weights, $W^r$, and the corresponding binary weights, $W^b$, where $W^b = \mathrm{sign}(W^r)$. The forward propagation is done with the binary weights, while the error backpropagation and weight updates are done with the real-valued weights. Finally, we put the real-valued weights aside and use the binary weights for inference on the test images. Note that some of the following equations are adapted from S4NN and are reproduced here for the reader's convenience.
2.1 Forward pass
The input layer converts the input image into a volley of spikes using a single-spike temporal coding scheme known as intensity-to-latency conversion. For images with pixel intensities in the range $[0, I_{max}]$, the firing time of the $i$th input neuron, $t_i$, corresponding to the $i$th pixel intensity, $I_i$, is computed as
$$t_i = \left( \frac{I_{max} - I_i}{I_{max}} \right) t_{max},$$
where $t_{max}$ is the maximum firing time. In this way, input neurons with higher pixel intensities have shorter spike latencies. Here, we used discrete time; therefore, the spike train of the $i$th input neuron is defined as
$$S_i(t) = \begin{cases} 1 & \text{if } t = t_i, \\ 0 & \text{otherwise.} \end{cases}$$
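As a concrete illustration, the intensity-to-latency conversion above can be sketched in a few lines of NumPy; the value `t_max=256` below is an illustrative placeholder, not the paper's setting:

```python
import numpy as np

def encode_ttfs(image, t_max=256):
    """Intensity-to-latency conversion: brighter pixels spike earlier.

    `t_max` is an assumed value for the maximum firing time; the paper's
    actual setting is given in its parameter table.
    """
    image = np.asarray(image, dtype=float)
    i_max = image.max() if image.max() > 0 else 1.0
    # Firing time grows linearly as intensity decreases.
    times = np.round((i_max - image) / i_max * t_max).astype(int)
    return times.ravel()

# A 2x2 toy "image": the brightest pixel fires at t=0, the darkest at t=t_max.
times = encode_ttfs([[255, 0], [128, 64]], t_max=256)
```

Each input neuron thus emits exactly one spike, whose latency encodes the pixel intensity.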
Subsequent hidden and output layers are comprised of non-leaky IF neurons. The $j$th IF neuron of the $l$th layer receives incoming spikes through binary synaptic weights of -1 or +1 and updates its membrane potential, $V_j^l(t)$, as
$$V_j^l(t) = V_j^l(t-1) + \alpha_l \sum_i w_{ji}^{b,l} S_i^{l-1}(t),$$
where $S_i^{l-1}(t)$ and $w_{ji}^{b,l} \in \{-1, +1\}$ are, respectively, the input spike train and the binary synaptic weight connecting the $i$th presynaptic neuron to neuron $j$. Note that $\alpha_l$ is a scaling factor shared between all the neurons of the $l$th layer. The IF neuron fires only once, the first time its membrane potential crosses the threshold $\theta$:
$$t_j^l = \min\{ t \mid V_j^l(t) \geq \theta \}.$$
For each input image, we first reset all the membrane potentials to zero and then run the simulation for at most $t_{max}$ time steps. Each output neuron is assigned to a different category, and the output neuron that fires earlier than the others determines the category of the input image. Hence, in the test phase, we do not need to continue the simulation after the first spike in the output layer. If none of the output neurons fires before $t_{max}$, the output neuron with the maximum membrane potential at $t_{max}$ makes the decision. However, during the learning phase, to compute the temporal error and gradients, we need all the neurons in the network to fire at some point; hence, we continue the simulation until $t_{max}$, and if a neuron never fires, we force it to emit a fake spike at time $t_{max}$.
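The forward pass described above (single-spike IF neurons, binary weights, fake spikes at the deadline) can be sketched as follows; the function and variable names are ours, and any threshold or scale values used with it are illustrative:

```python
import numpy as np

def if_forward(spike_times, w_bin, alpha, threshold, t_max):
    """Simulate one layer of non-leaky IF neurons with binary weights.

    spike_times: firing time of each input neuron (one spike each).
    w_bin: matrix of +1/-1 weights, shape (n_out, n_in).
    Returns each neuron's (single) firing time; silent neurons are forced
    to a fake spike at t_max, as in the learning phase described above.
    """
    n_out = w_bin.shape[0]
    v = np.zeros(n_out)                    # membrane potentials, reset to 0
    out_times = np.full(n_out, t_max)      # default: fake spike at t_max
    fired = np.zeros(n_out, dtype=bool)
    for t in range(t_max + 1):
        inputs = (spike_times == t).astype(float)  # spikes arriving at t
        v += alpha * (w_bin @ inputs)              # scaled binary integration
        crossing = (~fired) & (v >= threshold)
        out_times[crossing] = t            # fire once, at the first crossing
        fired |= crossing
    return out_times
```

Stacking such layers, with the input layer's latencies as `spike_times`, reproduces the network's event-driven forward pass in discrete time.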
2.2 Backward pass
For a categorization task with $C$ categories, we define the temporal error as a function of the actual and target firing times,
$$e_j = (T_j - t_j)/t_{max},$$
where $t_j$ and $T_j$ are the actual and the target firing times of the $j$th output neuron, respectively. Let's define $\tau$ as the minimum firing time in the output layer (i.e., $\tau = \min_j \{t_j\}$). For an input image belonging to the $i$th category, we have
$$T_j = \begin{cases} \tau & \text{if } j = i, \\ \tau + \gamma & \text{if } j \neq i \text{ and } t_j < \tau + \gamma, \\ t_j & \text{otherwise,} \end{cases}$$
where $\gamma$ is a positive constant. This way, the correct neuron is encouraged to fire first and the others are penalized if they fire earlier than $\tau + \gamma$. In the special case that all the output neurons remain silent during the forward pass (i.e., they emit fake spikes at $t_{max}$), we set $\tau = t_{max} - \gamma$ and $T_i = \tau$ to force the correct neuron to fire.
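A minimal sketch of this relative target-time computation, under our own symbol names (`tau`, `gamma`) and the reconstruction above:

```python
import numpy as np

def target_times(t_actual, correct, gamma, t_max):
    """Relative target firing times (a hedged reconstruction; the symbol
    names are ours, not necessarily the paper's).

    The correct neuron is pushed toward the layer's minimum firing time
    tau; wrong neurons are only penalized if they fire within gamma of tau.
    """
    t_actual = np.asarray(t_actual, dtype=float)
    tau = t_actual.min()
    if tau == t_max:               # all neurons silent: force the correct one
        tau = t_max - gamma
    targets = t_actual.copy()      # by default, no pressure on a neuron
    targets[t_actual < tau + gamma] = tau + gamma  # too-early wrong neurons
    targets[correct] = tau         # correct neuron should fire first
    return targets
```

For example, with actual firing times (10, 12, 40), correct class 0, and gamma = 5, the targets become (10, 15, 40): neuron 1 fired too close to the winner and is pushed later, while neuron 2 is left alone.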
Let's define the "squared error" loss function as
$$L = \frac{1}{2} \sum_j e_j^2.$$
To apply the gradient descent algorithm, we should compute $\partial L / \partial W^b$, the gradient of the loss function with respect to the binary weights. However, gradient descent makes small changes to the weights, which cannot be done with binary values. To solve this problem, during the learning phase, we use a set of real-valued weights, $W^r$, as a proxy, such that
$$W^b = \mathrm{sign}(W^r).$$
As the gradient of the $\mathrm{sign}$ function is 0 or undefined everywhere, using the straight-through estimator (STE) we approximate $\partial W^b / \partial W^r \approx 1$; therefore, we have
$$\frac{\partial L}{\partial W^r} \approx \frac{\partial L}{\partial W^b}.$$
Now, we can update the real-valued weights as
$$W^r \leftarrow W^r - \eta \frac{\partial L}{\partial W^b},$$
where $\eta$ is the learning rate parameter.
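The proxy-weight update with the straight-through estimator can be sketched as follows (the function name and the convention of mapping a zero weight to +1 are our assumptions):

```python
import numpy as np

def ste_update(w_real, grad_wrt_binary, lr):
    """One gradient step on the real-valued proxy weights.

    With the straight-through estimator, d(sign)/dw is taken as 1, so the
    gradient computed for the binary weights is applied directly to the
    real-valued ones. A fresh binarization is returned for the next
    forward pass.
    """
    w_real = w_real - lr * grad_wrt_binary  # update the full-precision proxy
    w_bin = np.sign(w_real)                 # re-binarize for the forward pass
    w_bin[w_bin == 0] = 1                   # break ties (an assumed convention)
    return w_real, w_bin
```

Note that a small gradient step can flip a binary weight only when the proxy weight crosses zero, which is what makes the scheme stable.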
Table 1: Parameter settings per layer (layer size, initial real-valued weights, and initial parameter values).
| Model | Coding | Neuron / Synapse / PSP | Learning | Hidden (#) | Acc. (%) |
|---|---|---|---|---|---|
| Mostafa (2017) | Temporal | IF / Real-value / Exponential | Temporal backpropagation | 800 | 97.2 |
| Tavanaei et al. (2019) | Rate | IF / Real-value / Instantaneous | STDP-based backpropagation | 1000 | 96.6 |
| Comsa et al. (2019) | Temporal | SRM / Real-value / Exponential | Temporal backpropagation | 340 | 97.9 |
| Zhang et al. (2020) | Temporal | IF / Real-value / Linear | Temporal backpropagation | 400 | 98.1 |
| Zhang et al. (2020) | Temporal | IF / Real-value / Linear | Temporal backpropagation | 800 | 98.4 |
| Sakemi et al. (2020) | Temporal | IF / Real-value / Linear | Temporal backpropagation | 500 | 97.8 |
| Sakemi et al. (2020) | Temporal | IF / Real-value / Linear | Temporal backpropagation | 800 | 98.0 |
| S4NN | Temporal | IF / Real-value / Instantaneous | Temporal backpropagation | 400 | 97.4 |
| BNN | Binary (0 & 1) | Binary sigmoid / Binary / - | Backpropagation with ADAM | 600 | 96.8 |
| BS4NN (this paper) | Temporal | IF / Binary / Instantaneous | Temporal backpropagation | 600 | 97.0 |
where $t_j^l$ is the firing time of the $j$th neuron of the $l$th layer. Also, following S4NN, we approximate $\partial t_j^l / \partial V_j^l(t)$ to be $-1$ if $t = t_j^l$ and 0 otherwise. Therefore, we have
$$\frac{\partial L}{\partial w_{ji}^{b,l}} = \delta_j^l \sum_{t \leq t_j^l} S_i^{l-1}(t),$$
where for the output layer (i.e., $l = o$) we have
$$\delta_j^o = e_j,$$
and for the hidden layers (i.e., $l \neq o$), according to the backpropagation algorithm, we compute the weighted sum of the delta values of the neurons in the following layer,
$$\delta_j^l = \sum_k \delta_k^{l+1} w_{kj}^{r,l+1} \, \mathbb{1}[t_j^l \leq t_k^{l+1}],$$
where $k$ iterates over the neurons in layer $l+1$. Similar to S4NN, we approximate $\partial t_k^{l+1} / \partial t_j^l$ by the corresponding weight if and only if $t_j^l \leq t_k^{l+1}$ (and by 0 otherwise). To have smooth gradients, we use the real-valued weights, $w_{kj}^{r,l+1}$, instead of the scaled binary weights, $\alpha_{l+1} w_{kj}^{b,l+1}$.
We also update the scaling factor as
$$\alpha_l \leftarrow \alpha_l - \mu \frac{\partial L}{\partial \alpha_l},$$
where $\mu$ is its learning rate parameter. Therefore, we compute
$$\frac{\partial L}{\partial \alpha_l} = \sum_j \delta_j^l \sum_i w_{ji}^{b,l} \sum_{t \leq t_j^l} S_i^{l-1}(t).$$
Note that before updating the weights we normalize the gradients, dividing $\partial L / \partial W$ by its norm, to avoid exploding and vanishing gradients. Also, we added a weight-norm regularization term to the loss function to avoid overfitting, with a regularization parameter accounting for the degree of weight penalization.
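Putting the reconstructed output-layer gradients and the gradient normalization together, a hedged sketch in our own notation (single-spike inputs, instantaneous PSPs):

```python
import numpy as np

def output_layer_grads(e, spike_times_in, t_out, w_bin):
    """Hedged sketch of the output-layer gradients, in our notation.

    e: temporal errors of the output neurons.
    spike_times_in: firing times of the presynaptic (hidden) neurons.
    t_out: firing times of the output neurons.
    Each weight's gradient only counts input spikes that arrived no later
    than the postsynaptic spike; gradients are then normalized, as in the
    text above.
    """
    n_out, n_in = w_bin.shape
    grad_w = np.zeros((n_out, n_in))
    for j in range(n_out):
        causal = (spike_times_in <= t_out[j]).astype(float)  # spikes before t_j
        grad_w[j] = e[j] * causal
    norm = np.linalg.norm(grad_w)
    if norm > 0:
        grad_w = grad_w / norm      # normalize to avoid exploding gradients
    # Scaling-factor gradient: error times the binary-weighted causal input.
    grad_alpha = sum(e[j] * (w_bin[j] * (spike_times_in <= t_out[j])).sum()
                     for j in range(n_out))
    return grad_w, grad_alpha
```

The hidden-layer deltas would be accumulated analogously, using the real-valued weights of the following layer as described above.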
3.1 MNIST dataset
In this section, we evaluate BS4NN on the MNIST dataset, which is the most popular benchmark for spiking neural networks. The MNIST train set contains 60,000 handwritten digits (0 to 9) in images of size 28x28 pixels. The test set contains 10,000 digit images, with about 1,000 images per digit. Here, we train a fully connected BS4NN with one hidden layer containing 600 IF neurons. The parameter settings are provided in Table 1. The initial synaptic weights, including the input-hidden and hidden-output weights, are drawn from uniform distributions. Trainable parameters, including the synaptic weights and the scaling factors of the hidden and output layers, are tuned through the learning phase. The adaptive parameters, including the learning rates, are discounted by 30% every ten epochs. Other parameters remain intact in both the learning and testing phases.
Table 2 presents the categorization accuracy of the proposed BS4NN along with some other SNNs that use spike-time-based direct supervised learning algorithms and fully-connected architectures. BS4NN is the only network in this table that uses binary weights, and it reaches 97.0% accuracy on MNIST. As mentioned in the Methods section, BS4NN uses a modified version of the temporal backpropagation algorithm of S4NN (Kheradpisheh et al. (2020)) to accommodate binary weights. Compared to S4NN, the categorization accuracy of BS4NN drops by only 0.4%. Although BS4NN is outperformed by the other SNNs by at most 1.4%, its advantages are the use of binary weights instead of full-precision real-valued weights and of an instantaneous post-synaptic potential (PSP) function. As seen, BS4NN outperforms Tavanaei et al. (2019), which uses real-valued weights and instantaneous PSPs. The other SNNs use exponential and linear PSP functions, which complicate the neural processing and the learning procedure of the network and, consequently, increase their computational and energy costs.
We also compared BS4NN to a BNN with a similar architecture. To do a fair comparison, inspired by previous work, we implemented a BNN with binary weights (-1 and +1) and binary sigmoid activations (0 and 1). The network has a single hidden layer of size 600 and is trained using the ADAM optimizer and the squared hinge loss function for 500 epochs. The learning rate starts from an initial value and exponentially decays, through the learning epochs, down to a final value. The initial real-valued weights of each layer are randomly drawn from a uniform distribution whose range is determined by the number of synaptic weights of that layer. As provided in Table 2, the BNN reaches a best accuracy of 96.8% on MNIST, a 0.2% drop with respect to BS4NN (we will comment on these results in the Discussion).
The firing times of the ten output neurons over all test images are shown in Figure 1(a). Images are ordered by digit category from '0' to '9'. For each test image, the firing time of each neuron is shown by a color-coded dot. As seen, for each category, the corresponding output neuron tends to fire earlier than the others. This is more evident in Figure 1(b), which shows the mean firing time of each output neuron for each digit category. Each output neuron has, by a clear margin, the shortest mean firing time for images of its corresponding digit. Interestingly, BS4NN needs a much longer time to detect digit '1' (188 time steps), which could be due to the use of binary weights. Other digits cover more pixels of the image and therefore produce more early spikes than digit '1'. Since the weights are binary, the few early spikes of digit '1' cannot activate the hidden IF neurons, and hence BS4NN needs to wait for later surrounding spikes to distinguish digit '1' from the other digits.
We further counted the mean number of spikes BS4NN requires to categorize images of each digit category. To this end, we counted the number of spikes in all the layers until the emission of the first spike in the output layer (when the network makes its decision). The mean required spikes of the input and hidden layers are depicted in Figure 2. All digit categories but '1' require, on average, about 100 spikes in the input layer and 200 spikes in the hidden layer. Digit '1' requires about 300 input spikes, while, similar to the other digits, its hidden layer needs about 100 spikes. As explained above, digit '1' covers fewer pixels than the other digits and its shape overlaps with the constituent parts of some of them; hence, due to the use of binary weights, the network has to wait for later input spikes to distinguish digit '1' from the other digits.
Figure 3 shows the time course of the membrane potentials of the output neurons for a sample '9' test image. The membrane potential of the 9th output neuron overtakes the others at the 15th time step and quickly increases until it crosses the threshold at the 58th time step. The accumulated input spikes up to time steps 15, 58, 100, 190, and 250 are depicted in the figure. As seen, up to the 15th time step only a few input spikes have been propagated, and at the 58th time step, with the propagation of a few more input spikes, the 9th output neuron reaches its threshold and determines the category of the input image. Later input, hidden, and output spikes are not required by the network.
To evaluate the robustness of the trained BS4NN to input noise, during the test phase we added random jitter noise, drawn from a uniform distribution, to the pixels of the input images. The noise level varies from 5% to 100% of the maximum pixel intensity. Figure 4(a) shows a sample image contaminated with different levels of jitter noise. The recognition accuracy of the trained model over noisy test images under different noise levels is plotted in Figure 4(b). As shown, the recognition accuracy remains above 95% for moderate noise and drops to 79% at the 100% noise level. At higher noise levels, the order of input spikes can change dramatically, and because BS4NN assigns only +1 and -1 synaptic weights, even to the insignificant parts of the input images, this affects the behavior of the IF neurons and consequently increases the categorization error rate.
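The jitter-noise contamination used in this robustness test can be reproduced roughly as follows (the symmetric noise range and the clipping to valid intensities are our assumptions):

```python
import numpy as np

def add_jitter(image, noise_level, i_max=255.0, seed=0):
    """Add uniform jitter noise to pixel intensities.

    `noise_level` is a fraction of the maximum intensity (e.g. 0.05 for
    5%). We assume a symmetric noise range and clip the result back into
    the valid intensity range; the paper may differ in these details.
    """
    rng = np.random.default_rng(seed)
    image = np.asarray(image, dtype=float)
    amp = noise_level * i_max
    noisy = image + rng.uniform(-amp, amp, size=image.shape)
    return np.clip(noisy, 0.0, i_max)  # keep intensities in valid range
```

Since the input coding is intensity-to-latency, perturbing intensities directly perturbs (and can reorder) the input spike times.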
| Model | Architecture | Neuron | Coding | Weights | Learning | Acc. (%) |
|---|---|---|---|---|---|---|
| Zhang et al. (2019) | Recurrent SNN | Leaky IF | Rate | Real-value | Spike-train backpropagation | 90.1 |
| Ranjan et al. (2019) | Convolutional SNN | Leaky IF | Rate | Real-value | Spike-rate backpropagation | 89.0 |
| Wu et al. (2020) | Convolutional SNN | Leaky IF | Rate | Real-value | Global-local hybrid learning rule | 93.3 |
| Zhang et al. (2020) | Fully-connected SNN | Leaky IF | Rate | Real-value | Spike-sequence backpropagation | 89.5 |
| Zhang et al. (2020) | Fully-connected SNN | IOW Leaky IF | Rate | Real-value | Spike-sequence backpropagation | 90.2 |
| Hao et al. (2020) | Fully-connected SNN | Leaky IF | Rate | Real-value | Dopamine-modulated STDP | 85.3 |
| S4NN | Fully-connected SNN | IF | Temporal | Real-value | Temporal backpropagation | 88.0 |
| BS4NN (this paper) | Fully-connected SNN | IF | Temporal | Binary | Temporal backpropagation | 87.3 |

(IOW: Input-Output-Weighted Leaky IF.)
In a further experiment, we replaced the binary weights of the trained BS4NN with their corresponding real-valued weights and applied the network to the test images. In other words, we replaced the scaled binary weights in Eq. 3 with the real-valued weights. The network reached 89.1% accuracy on the test images, far less than the 97.0% accuracy obtained with the binary weights. This shows that although we update the real-valued proxy weights during the learning phase, we are actually tuning the binary weights, because the loss and the gradients are computed based on the binary weights. Figure 5 shows the pairs of real- and binary-valued weights for 16 randomly selected hidden neurons. Dark pixels correspond to negative weights and bright pixels to positive weights. It seems that hidden neurons tend to detect different variants of the digits and their constituent parts.
To assess the speed-accuracy trade-off in BS4NN, we first trained the network with a threshold of 100 for all the neurons; then we varied the threshold from 0 to 200 for all the neurons and evaluated the network on the test images. As shown in Figure 6, the accuracy peaks around the threshold of 100 and drops as we move to higher or lower threshold values, while the response time (time to the first spike in the output layer) increases with the threshold. Given this trade-off, reducing the threshold of the pre-trained BS4NN yields faster responses at the cost of lower accuracy. For instance, by setting the threshold to 80, the mean response time shortens from 112.9 to 44.9 time steps (about 2.5 times faster), while the accuracy drops from 97.0% to 91.0%.
The scaling factors are full-precision floating-point parameters used in our neuronal layers to let the binary weights better approximate the real-valued weights. We could round the factors of the pre-trained network down to two decimal places without any change in the categorization accuracy.
3.2 Fashion-MNIST dataset
Fashion-MNIST is a fashion product image dataset with 10 classes (see Figure 7). Images were gathered from the thumbnails of clothing products on an online shopping website. Fashion-MNIST has the same image size and training/testing splits as MNIST but is a more challenging classification task. Here, we use a BS4NN with a single hidden layer of 1000 IF neurons. Details of the parameter values are presented in Table 1. The initial weights of all layers are randomly drawn from a uniform distribution in the range [0, 1]. The learning rate parameters are discounted by 30% every ten epochs, and the scaling factors of the hidden and output layers are trained during the learning phase.
(a) The mean firing times of the output neurons over the Fashion-MNIST categories. (b) The confusion matrix of BS4NN on Fashion-MNIST. (c) The mean required number of spikes per category and layer.
Table 3 summarizes the characteristics and recognition accuracies of recent SNNs on the Fashion-MNIST dataset. BS4NN reaches 87.3% accuracy (a 0.7% drop with respect to S4NN). Apart from BS4NN and S4NN, all the models use real-valued synaptic weights, spike-rate-based neural coding, and leaky neurons with exponential decay. The mean firing times of the output neurons of BS4NN for each of the ten categories of Fashion-MNIST are illustrated in Figure 8(a). As seen, the correct output neuron has the shortest mean firing time for its corresponding category. However, compared to MNIST, the difference between the mean firing times of the correct neuron and some of the others is small. This could be due to similarities between instances of different categories. For instance, as shown in Figure 8(b), BS4NN confuses ankle boots, sandals, and sneakers. A similar situation holds for shirts and t-shirts, and also for pullovers and coats: their firing times are close together, and consequently BS4NN sometimes confuses them with each other. The total required number of spikes in each layer and in the whole network is provided in Figure 8(c). The classes that are mostly confused with each other (i.e., shirts, t-shirts, coats, and pullovers) require more spikes in both the input and hidden layers. One reason could be the larger size of these objects in the input image, leading to more early input spikes. Another reason, especially for the hidden layer, could be the need for more discriminative features between these confusable categories.
We also compared BS4NN to a BNN with binary weights (-1 and 1), binary activations (0 and 1), and the same architecture as BS4NN on Fashion-MNIST. The BNN is trained using the ADAM optimizer and the squared hinge loss function. The learning rate starts from an initial value and exponentially decays to a final value. The initial real-valued weights of each layer are randomly drawn from a uniform distribution whose range is determined by the number of synaptic weights of that layer. Interestingly, BS4NN outperforms this BNN by 0.9% accuracy.
In this paper, we proposed a binarized spiking neural network (called BS4NN) with a direct supervised temporal learning algorithm. To this end, we used a very common approach in the area of BANNs. During the learning phase, we keep two sets of weights, real-valued and binary, such that the binary weights are the sign of the real-valued weights. The binary weights are used in the forward pass and for inference, while in the backward pass the weight updates are applied to the real-valued weights. The proposed BS4NN uses time-to-first-spike coding [38, 39, 40, 41, 42] to convert image pixels into spike trains, in which input neurons with higher pixel intensities emit spikes with shorter latencies. The subsequent hidden and output layers are comprised of non-leaky IF neurons with binary (+1 or -1) weights that fire once, when they reach their threshold for the first time. The decision is simply made by the first spike in the output layer. The temporal error is then computed by comparing the actual and target firing times. Gradients backpropagate through the network and are applied to the real-valued weights. Target firing times are computed relative to the actual firing times of the output neurons, to push the correct output neuron to fire earlier than the others. This forces BS4NN to make quick and accurate decisions with as few spikes as possible (high sparsity).
In our experiments, BS4NN reached 97.0% and 87.3% accuracy on the MNIST and Fashion-MNIST datasets, respectively. Although, in terms of accuracy, BS4NN could not beat the real-valued SNNs, it has several computational, memory, and energy advantages which make it suitable for hardware and neuromorphic implementations. Interestingly, BS4NN also outperformed BNNs with the same architecture on MNIST and Fashion-MNIST, by 0.2% and 0.9% accuracy, respectively. This improvement with respect to the BNNs could be due to the time-to-first-spike coding and temporal backpropagation in BS4NN: both networks have binary activations and binary weights, but BS4NN additionally exploits the temporal information encoded in spike times.
Instead of real-valued weights, BS4NN uses binary synapses with only one full-precision scaling factor per layer. This can be very important for memory optimization in hardware implementations, where every synaptic weight requires a separate memory location. If one implements the binary synapses with a single bit of memory each, the network size shrinks by a factor of 32 compared to a network with 32-bit floating-point weights [13, 43]. Binarization also eases the implementation of multiplicative synapses, replacing multiplications with one-unit increment and decrement operations. Hence, it can be important for reducing computational and energy-consumption costs [13, 43].
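The 32x figure can be checked directly by bit-packing a sign-weight matrix (the shapes below are illustrative; NumPy's `packbits` stores eight binary weights per byte):

```python
import numpy as np

# Storing +1/-1 weights as single bits: a 32x reduction versus float32.
w_real = np.random.default_rng(0).standard_normal((600, 784)).astype(np.float32)
w_bin = (w_real >= 0)                 # True -> +1, False -> -1
packed = np.packbits(w_bin, axis=1)   # 8 weights per byte

dense_bytes = w_real.nbytes           # 600 * 784 * 4 bytes
packed_bytes = packed.nbytes          # 600 * ceil(784 / 8) bytes
ratio = dense_bytes / packed_bytes    # exactly 32 when rows divide evenly by 8

# Unpacking recovers the exact binary weights.
restored = np.unpackbits(packed, axis=1)[:, :784].astype(bool)
```

Since 784 is a multiple of 8, the packed matrix here is exactly 32 times smaller than its float32 counterpart, and the round trip is lossless.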
The use of non-leaky IF neurons instead of more complicated neuron models such as SRM and LIF [36, 44] makes BS4NN more computationally efficient and hardware friendly. It might be possible to implement leakage efficiently in analog hardware, exploiting the physical properties of transistors and capacitors, but it is always costly to implement in digital hardware. To do so, one must either periodically (e.g., every millisecond) decrease the membrane potential of all neurons (clock-driven), or do so whenever an input spike is received by a neuron (event-driven) [46, 47]. The former costs energy, and the latter needs more memory to store the last firing times.
The implementation of the instantaneous synapses used in BS4NN is far simpler than that of exponential, alpha, and linear [30, 31, 48] synaptic currents, and it costs much less energy and computation. With instantaneous synapses, each input spike causes a sudden potential increment or decrement, whereas with current-based synapses each input spike causes the potential to be updated over several consecutive time steps (which requires an extra state variable).
As mentioned above, BS4NN uses single-spike neural coding throughout the network. The input layer employs a time-to-first-spike coding by which input neurons fire only once (with shorter latencies for stronger inputs). Neurons in the subsequent layers are also allowed to fire at most once, and only when they reach their threshold for the first time. In addition, the proposed temporal learning algorithm forces BS4NN to rely on earlier spikes and respond as quickly as possible. This combination has been shown to take much less energy and time on neuromorphic devices than rate-coded SNNs [49, 50], with up to 15 times lower energy consumption and 5 times faster decisions.
Recently, efforts have been made to convert pre-trained BANNs into equivalent BSNNs with spike-rate-based neural coding [18, 19, 20]. However, these networks do not exploit the temporal advantages of SNNs that can be obtained through a direct learning algorithm. Due to the non-differentiability of the thresholding activation function of spiking neurons, it is not straightforward to apply backpropagation and gradient descent to SNNs. Various solutions have been proposed to tackle this problem, including computing gradients with respect to spike rates instead of single spikes [52, 53, 54, 55], using differentiable smoothed spike functions, using surrogate gradients for the threshold function in the backward pass [57, 58, 59, 60, 61, 62], and transfer learning by sharing weights between the SNN and an ANN [49, 63]. In another approach, known as latency learning, the neuron's activity is defined based on the firing time of its first spike; therefore, there is no need to compute the gradient of the thresholding function. In return, the firing time must be defined as a function of the membrane potential [26, 29, 30, 31, 36, 64, 65], or directly as a function of the firing times of the presynaptic neurons. In any case, all the aforementioned learning strategies work with full-precision real-valued weights, and future studies could assess their suitability for BSNNs.
-  A. Tavanaei, M. Ghodrati, S. R. Kheradpisheh, T. Masquelier and A. Maida, Deep learning in spiking neural networks, Neural Networks 111 (2019) 47–63.
-  M. Pfeiffer and T. Pfeil, Deep learning with spiking neurons: opportunities and challenges, Frontiers in Neuroscience 12 (2018) p. 774.
-  A. Taherkhani, A. Belatreche, Y. Li, G. Cosma, L. P. Maguire and T. M. McGinnity, A review of learning in biologically plausible spiking neural networks, Neural Networks 122 (2020) 253–272.
-  B. Illing, W. Gerstner and J. Brea, Biologically plausible deep learning–but how far can we go with shallow networks?, Neural Networks (2019).
-  X. Wang, X. Lin and X. Dang, Supervised learning in spiking neural networks: A review of algorithms and evaluations, Neural Networks (2020).
-  K. Roy, A. Jaiswal and P. Panda, Towards spike-based machine intelligence with neuromorphic computing, Nature 575 (nov 2019) 607–617.
-  T. Simons and D.-J. Lee, A review of binarized neural networks, Electronics 8(6) (2019) p. 661.
-  D. Saad and E. Marom, Training feed forward nets with binary weights via a modified chir algorithm, Complex Systems 4(5) (1990).
-  S. S. Venkatesh, Directed drift: A new linear threshold algorithm for learning binary weights on-line, Journal of Computer and System Sciences 46(2) (1993) 198–217.
-  C. Baldassi, A. Braunstein, N. Brunel and R. Zecchina, Efficient supervised learning in networks with binary synapses, Proceedings of the National Academy of Sciences 104(26) (2007) 11079–11084.
-  M. Courbariaux, Y. Bengio and J.-P. David, Binaryconnect: Training deep neural networks with binary weights during propagations, Advances in neural information processing systems, 2015, pp. 3123–3131.
-  M. Courbariaux, I. Hubara, D. Soudry, R. El-Yaniv and Y. Bengio, Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1, arXiv preprint arXiv:1602.02830 (2016).
-  M. Rastegari, V. Ordonez, J. Redmon and A. Farhadi, Xnor-net: Imagenet classification using binary convolutional neural networks, European conference on computer vision, Springer, 2016, pp. 525–542.
-  W. Tang, G. Hua and L. Wang, How to train a compact binary neural network with high accuracy?, Thirty-First AAAI conference on artificial intelligence, 2017.
-  S. Zhou, Y. Wu, Z. Ni, X. Zhou, H. Wen and Y. Zou, Dorefa-net: Training low bitwidth convolutional neural networks with low bitwidth gradients, arXiv preprint arXiv:1606.06160 (2016).
-  S. K. Esser, R. Appuswamy, P. Merolla, J. V. Arthur and D. S. Modha, Backpropagation for energy-efficient neuromorphic computing, Advances in Neural Information Processing Systems 28, eds. C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama and R. Garnett (Curran Associates, Inc., 2015), pp. 1117–1125.
-  S. K. Esser, P. A. Merolla, J. V. Arthur, A. S. Cassidy, R. Appuswamy, A. Andreopoulos, D. J. Berg, J. L. McKinstry, T. Melano, D. R. Barch, C. di Nolfo, P. Datta, A. Amir, B. Taba, M. D. Flickner and D. S. Modha, Convolutional networks for fast, energy-efficient neuromorphic computing, Proceedings of the National Academy of Sciences 113(41) (2016) 11441–11446.
-  B. Rueckauer, I.-A. Lungu, Y. Hu, M. Pfeiffer and S.-C. Liu, Conversion of continuous-valued deep networks to efficient event-driven networks for image classification, Frontiers in Neuroscience 11 (2017) p. 682.
-  Y. Wang, Y. Xu, R. Yan and H. Tang, Deep spiking neural networks with binary weights for object recognition, IEEE Transactions on Cognitive and Developmental Systems (2020).
-  S. Lu and A. Sengupta, Exploring the connection between binary and spiking neural networks, arXiv preprint arXiv:2002.10064 (2020).
-  S. R. Kheradpisheh, M. Ganjtabesh, S. J. Thorpe and T. Masquelier, Stdp-based spiking deep convolutional neural networks for object recognition, Neural Networks 99 (2018) 56–67.
-  M. Mozafari, S. R. Kheradpisheh, T. Masquelier, A. Nowzari-Dalini and M. Ganjtabesh, First-spike-based visual categorization using reward-modulated stdp, IEEE Transactions on Neural Networks and Learning Systems 29(12) (2018) 6178–6190.
-  S. R. Kheradpisheh, M. Ganjtabesh and T. Masquelier, Bio-inspired unsupervised learning of visual features leads to robust invariant object recognition, Neurocomputing 205 (sep 2016) 382–392.
-  Y. LeCun, L. Bottou, Y. Bengio, P. Haffner et al., Gradient-based learning applied to document recognition, Proceedings of the IEEE 86(11) (1998) 2278–2324.
-  H. Xiao, K. Rasul and R. Vollgraf, Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms, arXiv preprint arXiv:1708.07747 (2017).
-  S. R. Kheradpisheh and T. Masquelier, Temporal backpropagation for spiking neural networks with one spike per neuron, International Journal of Neural Systems 30(06) (2020) p. 2050027, PMID: 32466691.
-  A. Tavanaei and A. Maida, Bp-stdp: Approximating backpropagation using spike timing dependent plasticity, Neurocomputing 330 (2019) 39–47.
-  H. Mostafa, Supervised learning based on temporal coding in spiking neural networks, IEEE Transactions on Neural Networks and Learning Systems 29(7) (2017) 3227–3235.
-  I. M. Comsa, K. Potempa, L. Versari, T. Fischbacher, A. Gesmundo and J. Alakuijala, Temporal coding in spiking neural networks with alpha synaptic function, arXiv (2019) p. 1907.13223.
-  M. Zhang, J. Wang, Z. Zhang, A. Belatreche, J. Wu, Y. Chua, H. Qu and H. Li, Spike-timing-dependent back propagation in deep spiking neural networks, arXiv preprint arXiv:2003.11837 (2020).
-  Y. Sakemi, K. Morino, T. Morie and K. Aihara, A supervised learning algorithm for multilayer spiking neural networks based on temporal coding toward energy-efficient vlsi processor design, arXiv preprint arXiv:2001.05348 (2020).
-  X. Glorot and Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, Proceedings of the thirteenth international conference on artificial intelligence and statistics, 2010, pp. 249–256.
-  W. Zhang and P. Li, Spike-train level backpropagation for training deep recurrent spiking neural networks, Advances in Neural Information Processing Systems, 2019, pp. 7802–7813.
-  J. A. K. Ranjan, T. Sigamani and J. Barnabas, A novel and efficient classifier using spiking neural network, The Journal of Supercomputing (2019) 1–16.
-  Y. Wu, R. Zhao, J. Zhu, F. Chen, M. Xu, G. Li, S. Song, L. Deng, G. Wang, H. Zheng et al., Brain-inspired global-local hybrid learning towards human-like intelligence, arXiv preprint arXiv:2006.03226 (2020).
-  W. Zhang and P. Li, Temporal spike sequence learning via backpropagation for deep spiking neural networks, arXiv preprint arXiv:2002.10085 (2020).
-  Y. Hao, X. Huang, M. Dong and B. Xu, A biologically plausible supervised learning method for spiking neural networks using the symmetric stdp rule, Neural Networks 121 (2020) 387–395.
-  M. Mozafari, M. Ganjtabesh, A. Nowzari-Dalini, S. J. Thorpe and T. Masquelier, Bio-inspired digit recognition using reward-modulated spike-timing-dependent plasticity in deep convolutional networks, Pattern Recognition 94 (2019) 87–95.
-  M. Mozafari, M. Ganjtabesh, A. Nowzari-Dalini and T. Masquelier, SpykeTorch: Efficient Simulation of Convolutional Spiking Neural Networks With at Most One Spike per Neuron, Frontiers in Neuroscience 13 (jul 2019) 1–12.
-  R. Vaila, J. Chiasson and V. Saxena, Feature Extraction using Spiking Convolutional Neural Networks, Proceedings of the International Conference on Neuromorphic Systems - ICONS ’19, (ACM Press, New York, New York, USA, 2019), pp. 1–8.
-  R. Vaila, J. Chiasson and V. Saxena, Deep convolutional spiking neural networks for image classification, arXiv preprint arXiv:1903.12272 (2019).
-  P. Kirkland, G. Di Caterina, J. Soraghan and G. Matich, Spikeseg: Spiking segmentation via stdp saliency mapping, International Joint Conference on Nerual Networks, 2020.
-  B. McDanel, S. Teerapittayanon and H. Kung, Embedded binarized neural networks, arXiv preprint arXiv:1709.02260 (2017).
-  T. Masquelier and S. R. Kheradpisheh, Optimal localist and distributed coding of spatiotemporal spike patterns through stdp and coincidence detection, Frontiers in computational neuroscience 12 (2018) p. 74.
-  A. Yousefzadeh, T. Masquelier, T. Serrano-Gotarredona and B. Linares-Barranco, Hardware implementation of convolutional STDP for on-line visual feature learning, 2017 IEEE International Symposium on Circuits and Systems (ISCAS) (may 2017) 1–4.
-  G. Orchard, C. Meyer, R. Etienne-Cummings, C. Posch, N. Thakor and R. Benosman, HFirst: A Temporal Approach to Object Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence (2015).
-  A. Yousefzadeh, T. Serrano-Gotarredona and B. Linares-Barranco, Fast Pipeline 128x128 pixel spiking convolution core for event-driven vision processing in FPGAs, 2015 International Conference on Event-based Control, Communication, and Signal Processing (EBCCSP), (IEEE, jun 2015), pp. 1–8.
-  B. Rueckauer and S.-C. Liu, Conversion of analog to spiking neural networks using sparse temporal coding, 2018 IEEE International Symposium on Circuits and Systems (ISCAS), IEEE, 2018, pp. 1–5.
-  S. P, K. T. N. Chu, Y. Tavva, J. Wu, M. Zhang, H. Li, T. E. Carlson et al., You only spike once: Improving energy-efficient neuromorphic inference to ann-level accuracy, arXiv preprint arXiv:2006.09982 (2020).
-  J. Göltz, A. Baumbach, S. Billaudelle, A. Kungl, O. Breitwieser, K. Meier, J. Schemmel, L. Kriener and M. Petrovici, Fast and deep neuromorphic learning with first-spike coding, Proceedings of the Neuro-inspired Computational Elements Workshop, 2020, pp. 1–3.
-  S. Oh, D. Kwon, G. Yeom, W.-M. Kang, S. Lee, S. Y. Woo, J. S. Kim, M. K. Park and J.-H. Lee, Hardware implementation of spiking neural networks using time-to-first-spike encoding, arXiv preprint arXiv:2006.05033 (2020).
-  E. Hunsberger and C. Eliasmith, Spiking deep networks with lif neurons, arXiv (2015) p. 1510.08829.
-  J. H. Lee, T. Delbruck and M. Pfeiffer, Training deep spiking neural networks using backpropagation, Frontiers in Neuroscience 10 (2016) p. 508.
-  E. O. Neftci, C. Augustine, S. Paul and G. Detorakis, Event-driven random back-propagation: Enabling neuromorphic deep learning machines, Frontiers in Neuroscience 11 (2017) p. 324.
-  F. Zenke and S. Ganguli, Superspike: Supervised learning in multilayer spiking neural networks, Neural Computation 30(6) (2018) 1514–1541.
-  D. Huh and T. J. Sejnowski, Gradient descent for spiking neural networks, Advances in Neural Information Processing Systems, 2018, pp. 1433–1443.
-  E. O. Neftci, H. Mostafa and F. Zenke, Surrogate gradient learning in spiking neural networks, arXiv (2019) p. 1901.09948.
-  S. M. Bohte, Error-backpropagation in networks of fractionally predictive spiking neurons, International Conference on Artificial Neural Networks, Springer, 2011, pp. 60–68.
-  S. K. Esser, P. A. Merolla, J. V. Arthur, A. S. Cassidy, R. Appuswama, A. Andreopoulos, D. J. Berg, J. L. McKinstry, T. Melano, D. R. Barch, C. d. Nolfo, P. Datta, A. Amir, B. Taba, M. D. Flickner and D. S. Modha, Convolutional networks for fast energy-efficient neuromorphic computing, Proceedings of the National Academy of Sciences of USA 113(41) (2016) 11441–11446.
-  S. B. Shrestha and G. Orchard, Slayer: Spike layer error reassignment in time, Advances in Neural Information Processing Systems, 2018, pp. 1412–1421.
-  G. Bellec, D. Salaj, A. Subramoney, R. Legenstein and W. Maass, Long short-term memory and learning-to-learn in networks of spiking neurons, Advances in Neural Information Processing Systems, 2018, pp. 787–797.
-  R. Zimmer, T. Pellegrini, S. F. Singh and T. Masquelier, Technical report: supervised training of convolutional spiking neural networks with pytorch, arXiv preprint arXiv:1911.10124 (2019).
-  J. Wu, Y. Chua, M. Zhang, G. Li, H. Li and K. C. Tan, A tandem learning rule for efficient and rapid inference on deep spiking neural networks, arXiv (2019) arXiv–1907.
-  S. M. Bohte, H. La Poutré and J. N. Kok, Error-Backpropagation in Temporally Encoded Networks of Spiking Neurons, Neurocomputing 48 (2000) 17–37.
-  S. Zhou, Y. Chen, Q. Ye and J. Li, Direct training based spiking convolutional neural networks for object recognition, arXiv preprint arXiv:1909.10837 (2019).