BS4NN: Binarized Spiking Neural Networks with Temporal Coding and Learning

We recently proposed the S4NN algorithm, essentially an adaptation of backpropagation to multilayer spiking neural networks that use simple non-leaky integrate-and-fire neurons and a form of temporal coding known as time-to-first-spike coding. With this coding scheme, neurons fire at most once per stimulus, but the firing order carries information. Here, we introduce BS4NN, a modification of S4NN in which the synaptic weights are constrained to be binary (+1 or -1), in order to decrease memory and computation footprints. This was done using two sets of weights: firstly, real-valued weights, updated by gradient descent and used in the backward pass of backpropagation, and secondly, their signs, used in the forward pass. Similar strategies have been used to train (non-spiking) binarized neural networks. The main difference is that BS4NN operates in the time domain: spikes are propagated sequentially, and different neurons may reach their threshold at different times, which increases computational power. We validated BS4NN on two popular benchmarks, MNIST and Fashion-MNIST, and obtained state-of-the-art accuracies for this sort of network (97.0% and 87.3%, respectively), with a small accuracy drop with respect to real-valued weights (0.4% and 0.7%, respectively). We also demonstrated that BS4NN outperforms a simple BNN with the same architecture on those two datasets (by 0.2% and 0.9%, respectively), presumably because it leverages the temporal dimension.


1 Introduction

Spiking neural networks (SNNs), as the third generation of neural networks, are attracting increasing attention due to their higher biological plausibility, hardware friendliness, lower energy demand, and temporal nature [1, 2, 3, 4]. Although SNNs have not yet reached the performance of state-of-the-art artificial neural networks (ANNs) with deep architectures, recent efforts to adapt the gradient descent and backpropagation algorithms to SNNs have led to great achievements [5].

Contrary to artificial neurons with floating-point outputs, spiking neurons communicate via sparse and asynchronous stereotyped spikes, which makes them suitable for event-based computations [1, 2]. That is why neuromorphic implementations of SNNs can be far less energy-hungry than ANN implementations [6], which makes them appealing for real-time embedded AI systems and edge computing solutions. However, as SNNs become larger, they require more storage and computational power. Binarizing the synaptic weights, similar to binarized artificial neural networks (BANNs) [7], could be a good solution to reduce the memory and computational requirements of SNNs.

Although the use of binary (+1 and -1) weights in ANNs is not a very recent idea [8, 9, 10], the early studies could not adapt backpropagation to BANNs. Since binary weights cannot be updated in small amounts, the backpropagation and stochastic gradient descent algorithms cannot be directly applied to BANNs. With BinaryConnect [11, 12], Courbariaux et al. were the first to successfully train deep BANNs using the backpropagation algorithm. They used real-valued weights that are binarized before being used in the forward pass. During backpropagation, using the Straight-Through Estimator (STE), the gradients with respect to the binary weights are simply passed through and applied to the real-valued weights. Soon after, Rastegari et al. [13] proposed XNOR-Net, which is very similar to BinaryConnect but multiplies the binary weights by a scaling factor (the mean absolute value of the real-valued weights, i.e., their L1-norm divided by their number) to better approximate the real-valued weights. In order to speed up the learning phase of BANNs, Tang et al. [14] controlled the rate of oscillations of binary weights between -1 and 1 by optimizing the learning rates. They also proposed to use learned scaling factors instead of the L1-norm-based factors of XNOR-Net. In DoReFa-Net [15], Zhou et al. proposed a model with variable bit-width (down to binary) weights, activations, and even gradients during backpropagation. A more detailed survey on BANNs is provided in [7].
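The two binarization schemes described above can be illustrated with a minimal Python sketch. The function names are hypothetical; the sketch assumes the deterministic BinaryConnect rule (+1 for w >= 0, -1 otherwise) and computes the XNOR-Net-style factor as the mean absolute value of the weights it scales.

```python
def binarize(weights):
    """BinaryConnect-style deterministic binarization: +1 if w >= 0, else -1."""
    return [1 if w >= 0 else -1 for w in weights]


def xnor_scale(weights):
    """XNOR-Net-style scaling factor: mean absolute value of the real weights."""
    return sum(abs(w) for w in weights) / len(weights)


real_w = [0.8, -0.3, 0.1, -0.6]
bin_w = binarize(real_w)            # binary weights used in the forward pass
alpha = xnor_scale(real_w)          # scaling factor (about 0.45 here)
approx_w = [alpha * b for b in bin_w]  # scaled binary approximation of real_w
```

Multiplying by `alpha` keeps the binary weights on the same scale as the real-valued ones, which is the motivation given for XNOR-Net above.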

A few recent studies have tried to convert supervised BANNs into equivalent binary SNNs (BSNNs); however, to the best of our knowledge, no other study has aimed at directly training multi-layer supervised SNNs with binary weights. Esser et al. [16] trained ANNs with constrained weights and activations and deployed them as SNNs with binary weights on TrueNorth. Later, in [17], they mapped convolutional ANNs with trinary weights and binary activations to SNNs on TrueNorth. Rueckauer et al. [18] converted BinaryConnect [11] networks with binary and full-precision activations into equivalent rate-coded BSNNs. Although their converted BSNN had binary weights, they did not binarize the full-precision parameters of the batch-normalization layers. In [19], Wang et al. converted BinaryConnect networks to rate-coded BSNNs using a weights-thresholds balance conversion method, which scales the high-precision batch-normalization parameters of BinaryConnect into -1 or 1. In another study, Lu et al. [20] converted a modified version of XNOR-Net without batch normalization and bias inputs into equivalent rate-coded BSNNs.

In this work, we propose a direct supervised learning algorithm to train multi-layer SNNs with binary synaptic weights. The input layer uses a temporal time-to-first-spike coding [21, 22, 23] to convert the input image into a spike train with one spike per neuron. The non-leaky integrate-and-fire (IF) neurons in the subsequent hidden and output layers integrate incoming spikes through binary (+1 or -1) synapses and emit only one spike, right after their first threshold crossing. Inspired by BANNs, we also use a set of real-valued proxy weights such that the binary weights are the signs of the real-valued weights. Hence, in the backward pass, we update the real-valued weights based on the errors made by the binary weights. Concretely, after completing the forward pass with binary weights, the output layer computes the errors by comparing its actual and target firing times, and then the real-valued synaptic weights are updated using temporal error backpropagation. We evaluated the proposed network on the MNIST [24] and Fashion-MNIST [25] datasets, reaching 97.0% and 87.3% categorization accuracies, respectively.

SNNs can vary in terms of neuronal model, neural connectivity, information coding, and learning strategy, which deeply affect their accuracy, memory, and energy efficiency. The advantages of the proposed BSNN are 1) the use of non-leaky IF neurons with very simple neuronal dynamics, 2) binarized connectivity with low memory and computational cost, 3) the use of a sparse temporal coding with at most one spike per neuron, and 4) learning by a direct supervised temporal learning rule that forces the network to make decisions as accurately and as early as possible.

2 Methods

The input layer of the proposed binarized single-spike supervised spiking neural network (BS4NN) converts the input image into a spike train based on a time-to-first-spike coding. These spikes are then propagated through the network, where the binary IF neurons in the hidden and output layers are not allowed to fire more than once per image. Each output neuron is dedicated to a different category, and the first output neuron to fire determines the decision of the network.

The error of each output neuron is computed by comparing its actual firing time with a target firing time. Then, a modified version of the temporal backpropagation algorithm in S4NN [26] is used to update the synaptic weights. During the learning phase, we have two sets of weights: the real-valued weights, $w^l_{ji}$, and the corresponding binary weights, $b^l_{ji}$, where $b^l_{ji} = \mathrm{sign}(w^l_{ji})$. The forward propagation is done with the binary weights, while the error backpropagation and weight updates are done with the real-valued weights. Finally, we put the real-valued weights aside and use the binary weights for inference on the test images. Note that some of the following equations are adapted from S4NN [26]; they are reproduced here for the reader's convenience.

2.1 Forward pass

The input layer converts the input image into a volley of spikes using a single-spike temporal coding scheme known as intensity-to-latency conversion. For images with pixel intensities in the range $[0, I_{max}]$, the firing time of the $i$th input neuron, $t_i$, corresponding to the $i$th pixel intensity, $I_i$, is computed as

$$ t_i = \left\lfloor \frac{I_{max} - I_i}{I_{max}}\, t_{max} \right\rfloor $$ (1)

where $t_{max}$ is the maximum firing time. In this way, input neurons with higher pixel intensities have shorter spike latencies. Here, we use discrete time. Therefore, the spike train of the $i$th input neuron is defined as

$$ S_i(t) = \begin{cases} 1 & \text{if } t = t_i \\ 0 & \text{otherwise} \end{cases} $$ (2)
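A minimal sketch of the intensity-to-latency conversion of Eqs. 1 and 2 in plain Python. The function names and the default values `i_max=255` and `t_max=256` are illustrative assumptions, not settings confirmed by the text.

```python
import math


def intensity_to_latency(pixels, i_max=255, t_max=256):
    """Eq. 1: one spike time per pixel; brighter pixels fire earlier."""
    return [math.floor((i_max - p) / i_max * t_max) for p in pixels]


def spike_train(t_i, t_max=256):
    """Eq. 2: discrete-time spike train of one input neuron (1 at t_i, else 0)."""
    return [1 if t == t_i else 0 for t in range(t_max + 1)]


latencies = intensity_to_latency([255, 0, 128])  # brightest pixel spikes at t = 0
```

With these defaults, a black pixel (intensity 0) is mapped to a spike at `t_max`, and the spike order follows the intensity order.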

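Subsequent hidden and output layers consist of non-leaky IF neurons. The $j$th IF neuron of the $l$th layer receives incoming spikes through binary synaptic weights of -1 or +1 and updates its membrane potential, $V^l_j$, as

$$ V^l_j(t) = V^l_j(t-1) + \gamma^l \sum_i b^l_{ji}\, S^{l-1}_i(t) $$ (3)

where $S^{l-1}_i$ and $b^l_{ji}$ are, respectively, the input spike train and the binary synaptic weight connecting the $i$th presynaptic neuron to neuron $j$. Note that $\gamma^l$ is a scaling factor shared between all the neurons of the $l$th layer. The IF neuron fires only once, the first time its membrane potential crosses the threshold $\theta^l$,

$$ S^l_j(t) = \begin{cases} 1 & \text{if } V^l_j(t) \ge \theta^l \text{ and } S^l_j(t') = 0 \text{ for all } t' < t \\ 0 & \text{otherwise} \end{cases} $$ (4)

where the second condition checks that the neuron has not fired at any previous time step. Equivalently, one can move the scaling factor from Eq. 3 to Eq. 4 by replacing $\theta^l$ with $\theta^l / \gamma^l$.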
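The layer dynamics of Eqs. 3 and 4 can be sketched as follows. This is a hypothetical helper, not the authors' code. It assumes presynaptic activity is given as one firing time per neuron (`None` if silent), and, since a neuron fires at most once, it stops integrating a neuron's potential after its spike.

```python
def if_layer(in_times, b, gamma, theta, t_max):
    """Simulate one layer of non-leaky IF neurons with binary weights.

    in_times: firing time of each presynaptic neuron (None if it never fired)
    b:        b[j][i] in {-1, +1}, weight from presynaptic neuron i to neuron j
    Returns (firing times, potentials); a neuron that never reaches theta
    gets firing time None.
    """
    n_out = len(b)
    v = [0.0] * n_out              # membrane potentials are reset to zero
    out_times = [None] * n_out
    for t in range(t_max + 1):
        # presynaptic neurons spiking exactly at time t
        active = [i for i, ti in enumerate(in_times) if ti == t]
        for j in range(n_out):
            if out_times[j] is not None:
                continue           # each neuron fires at most once (Eq. 4)
            v[j] += gamma * sum(b[j][i] for i in active)   # Eq. 3
            if v[j] >= theta:
                out_times[j] = t
    return out_times, v
```

Stacking layers amounts to feeding the returned firing times of one layer as `in_times` of the next.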

For each input image, we first reset all the membrane voltages to zero and then run the simulation for at most $t_{max}$ time steps. Each output neuron is assigned to a different category, and the output neuron that fires earlier than the others determines the category of the input image. Hence, in the test phase, we do not need to continue the simulation after the first spike in the output layer. If none of the output neurons fires before $t_{max}$, the output neuron with the maximum membrane potential at $t_{max}$ makes the decision. However, during the learning phase, to compute the temporal error and gradients, we need all the neurons in the network to fire at some point; hence, we continue the simulation until $t_{max}$, and if a neuron never fires, we force it to emit a fake spike at time $t_{max}$.
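A sketch of the decision rule and the fake-spike convention described above. The helper names are hypothetical, and breaking ties between simultaneous output spikes in favor of the lowest neuron index is an assumption (the text does not specify a tie-break).

```python
def decide(out_times, out_potentials):
    """Test-time readout: the earliest output spike wins; if no output neuron
    fired, the neuron with the maximum potential at t_max wins."""
    fired = [(t, j) for j, t in enumerate(out_times) if t is not None]
    if fired:
        return min(fired)[1]       # ties resolved by lowest index (assumption)
    return max(range(len(out_potentials)), key=lambda j: out_potentials[j])


def with_fake_spikes(times, t_max):
    """Learning-phase convention: silent neurons emit a fake spike at t_max."""
    return [t_max if t is None else t for t in times]
```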

2.2 Backward pass

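For a categorization task with $M$ categories, we define the temporal error as a function of the actual and target firing times,

$$ e_j = \frac{T_j - t^o_j}{t_{max}}, \qquad 1 \le j \le M $$ (5)

where $t^o_j$ and $T_j$ are the actual and the target firing times of the $j$th output neuron, respectively. Let’s define $\tau$ as the minimum firing time in the output layer (i.e., $\tau = \min\{t^o_j \mid 1 \le j \le M\}$). For an input image belonging to the $i$th category, we have

$$ T_j = \begin{cases} \tau & \text{if } j = i \\ \max\{\tau + \beta,\ t^o_j\} & \text{if } j \ne i \end{cases} $$ (6)

where $\beta$ is a positive constant. This way, the correct neuron is encouraged to fire first, and the others are penalized if they fire earlier than $\tau + \beta$. In the special case where all the output neurons remain silent during the forward pass (and emit fake spikes at $t_{max}$), we set $T_i = t_{max} - \beta$ for the correct neuron and $T_j = t_{max}$ for the others, to force the correct neuron to fire.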

Let’s define the “squared error” loss function as

$$ \mathcal{L} = \frac{1}{2} \sum_{j=1}^{M} e_j^2 $$ (7)
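Eqs. 5 to 7 can be computed as in the following sketch. The helpers are hypothetical, `beta` stands for the positive constant of Eq. 6, and output firing times are assumed to already include the fake spikes at `t_max`.

```python
def target_times(out_times, label, beta, t_max):
    """Relative target firing times of Eq. 6 for an image of class `label`."""
    tau = min(out_times)
    if all(t == t_max for t in out_times):
        # All outputs silent: the correct neuron gets a target before t_max.
        tau = t_max - beta
    return [tau if j == label else max(tau + beta, t)
            for j, t in enumerate(out_times)]


def temporal_errors_and_loss(out_times, targets, t_max):
    """Temporal errors (Eq. 5) and squared-error loss (Eq. 7)."""
    errors = [(T - t) / t_max for T, t in zip(targets, out_times)]
    loss = 0.5 * sum(e * e for e in errors)
    return errors, loss
```

For example, with output times `[10, 4, 6]`, label 0, `beta=3`, and `t_max=20`, the targets are `[4, 7, 7]`: the correct neuron is pulled forward to the earliest output time, and the two others are pushed back to at least `tau + beta`.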

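To apply the gradient descent algorithm, we should compute $\partial \mathcal{L} / \partial b^l_{ji}$, the gradient of the loss function with respect to the binary weights. However, gradient descent makes small changes to the weights, which cannot be done with binary values. To solve this problem, during the learning phase, we use a set of real-valued weights, $w^l_{ji}$, as a proxy, such that

$$ b^l_{ji} = \mathrm{sign}(w^l_{ji}) $$ (8)

and, as the gradient of the sign function is 0 or undefined, we use the straight-through estimator (STE) and approximate $\partial b^l_{ji} / \partial w^l_{ji} \approx 1$; therefore, we have

$$ \frac{\partial \mathcal{L}}{\partial w^l_{ji}} = \frac{\partial \mathcal{L}}{\partial b^l_{ji}} \frac{\partial b^l_{ji}}{\partial w^l_{ji}} \approx \frac{\partial \mathcal{L}}{\partial b^l_{ji}} $$ (9)

Now, we can update the real-valued weights as

$$ w^l_{ji} \leftarrow w^l_{ji} - \eta \frac{\partial \mathcal{L}}{\partial w^l_{ji}} $$ (10)

where $\eta$ is the learning rate parameter.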
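A minimal sketch of the proxy-weight update of Eqs. 8 to 10. The helper names are hypothetical, and mapping sign(0) to +1 is an assumed convention.

```python
def sign(w):
    """Binarization of Eq. 8; sign(0) -> +1 is an assumed convention."""
    return 1 if w >= 0 else -1


def ste_step(w_real, grad_wrt_binary, lr):
    """One update of the real-valued proxies (Eqs. 9-10).

    The STE treats d sign(w)/dw as 1, so the gradient computed for the
    binary weights is applied unchanged to the real-valued weights.
    """
    w_new = [w - lr * g for w, g in zip(w_real, grad_wrt_binary)]
    b_new = [sign(w) for w in w_new]   # binary weights for the next forward pass
    return w_new, b_new


w, b = ste_step([0.05, -0.2], [1.0, -1.0], lr=0.1)
```

Here the first real weight crosses zero, so its binary weight flips from +1 to -1, while the second moves toward zero without changing sign: small real-valued updates accumulate until they flip a binary weight.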

Layer size Initial real-value weights Initial parameters
Dataset Hidden Output
MNIST 600 10 256 100 5 5 0.1 0.01 1
Fashion-MNIST 1000 10 256 700 5 10 0.1 0.01 1
Table 1: The structural, initialization, and model parameters used for MNIST and Fashion-MNIST datasets.
Model Coding Neuron / Synapse / PSP Learning Hidden(#) Acc. (%)
Mostafa (2017) [28] Temporal IF / Real-value / Exponential Temporal Backpropagation 800 97.2
Tavanaei et al. (2019) [27] Rate IF / Real-value / Instantaneous STDP-based Backpropagation 1000 96.6
Comsa et al. (2019) [29] Temporal SRM / Real-value / Exponential Temporal Backpropagation 340 97.9
Zhang et al. (2020) [30] Temporal IF / Real-value / Linear Temporal Backpropagation 400 98.1
Zhang et al. (2020) [30] Temporal IF / Real-value / Linear Temporal Backpropagation 800 98.4
Sakemi et al. (2020) [31] Temporal IF / Real-value / Linear Temporal Backpropagation 500 97.8
Sakemi et al. (2020) [31] Temporal IF / Real-value / Linear Temporal Backpropagation 800 98.0
S4NN [26] Temporal IF / Real-value / Instantaneous Temporal Backpropagation 400 97.4
BNN Binary (0 & 1) Binary Sigmoid / Binary / - Backpropagation with ADAM 600 96.8
BS4NN (this paper) Temporal IF / Binary / Instantaneous Temporal Backpropagation 600 97.0

Table 2: The recognition accuracies of recent supervised fully connected SNNs with spike-time-based backpropagation on the MNIST dataset. The details of each model including its input coding scheme, neuron model, synapses, post-synaptic potential (PSP), learning method, and the number of hidden neurons are provided.

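Let’s define

$$ \delta^l_j = \frac{\partial \mathcal{L}}{\partial t^l_j} \frac{\partial t^l_j}{\partial V^l_j} $$ (11)

where $t^l_j$ is the firing time of the $j$th neuron of the $l$th layer. Also, following [26] and Eq. 3, we approximate $\partial V^l_j / \partial b^l_{ji}$ to be $\gamma^l$ if $t^{l-1}_i \le t^l_j$ and 0 otherwise. Therefore, we have

$$ \frac{\partial \mathcal{L}}{\partial b^l_{ji}} = \delta^l_j \frac{\partial V^l_j}{\partial b^l_{ji}} = \begin{cases} \gamma^l \delta^l_j & \text{if } t^{l-1}_i \le t^l_j \\ 0 & \text{otherwise} \end{cases} $$ (12)

where for the output layer (i.e., $l = o$), treating the target times as constants and approximating $\partial t^o_j / \partial V^o_j = -1$ as in [26], we have

$$ \delta^o_j = \frac{\partial \mathcal{L}}{\partial t^o_j} \frac{\partial t^o_j}{\partial V^o_j} = \left( -\frac{e_j}{t_{max}} \right)(-1) = \frac{e_j}{t_{max}} $$ (13)

and for the hidden layers (i.e., $l \ne o$), according to the backpropagation algorithm, we compute the weighted sum of the delta values of neurons in the following layer,

$$ \delta^l_j = \left( \sum_k \delta^{l+1}_k \frac{\partial V^{l+1}_k}{\partial t^l_j} \right) \frac{\partial t^l_j}{\partial V^l_j} = \sum_{k:\, t^l_j \le t^{l+1}_k} \delta^{l+1}_k\, w^{l+1}_{kj} $$ (14)

where $k$ iterates over neurons in layer $l+1$. Similar to [26], we approximate $\partial t^l_j / \partial V^l_j = -1$ and $\partial V^{l+1}_k / \partial t^l_j = -w^{l+1}_{kj}$ if and only if $t^l_j \le t^{l+1}_k$ (and 0 otherwise). To have smooth gradients, we use the real-valued weights, $w^{l+1}_{kj}$, instead of the scaled binary weights, $\gamma^{l+1} b^{l+1}_{kj}$.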
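The backward pass of Eqs. 12 to 14 for one hidden layer can be sketched as below. These are hypothetical helpers operating on per-neuron firing times (with fake spikes at `t_max`); they follow the approximations stated above, including the use of real-valued weights for the hidden-layer deltas.

```python
def output_deltas(errors, t_max):
    """Eq. 13 with dt/dV ~ -1: delta_j = (-e_j / t_max) * (-1) = e_j / t_max."""
    return [e / t_max for e in errors]


def hidden_deltas(delta_next, w_next, t_hidden, t_next):
    """Eq. 14: delta_j = sum of delta_k * w_kj over next-layer neurons k
    with t_j <= t_k. w_next[k][j] is the real-valued weight from hidden
    neuron j to neuron k (used instead of the scaled binary weight)."""
    return [sum(delta_next[k] * w_next[k][j]
                for k in range(len(delta_next)) if t_hidden[j] <= t_next[k])
            for j in range(len(t_hidden))]


def weight_grads(delta, t_pre, t_post, gamma):
    """Eq. 12: dL/db_ji = gamma * delta_j if presynaptic neuron i fired no
    later than neuron j, else 0. By the STE (Eq. 9), these values are also
    used as the gradients of the real-valued weights."""
    return [[gamma * delta[j] if t_pre[i] <= t_post[j] else 0.0
             for i in range(len(t_pre))]
            for j in range(len(t_post))]
```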

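We also update the scaling factor as

$$ \gamma^l \leftarrow \gamma^l - \eta_\gamma \frac{\partial \mathcal{L}}{\partial \gamma^l} $$ (15)

where $\eta_\gamma$ is the learning rate parameter. Therefore, we compute

$$ \frac{\partial \mathcal{L}}{\partial \gamma^l} = \sum_j \frac{\partial \mathcal{L}}{\partial t^l_j} \frac{\partial t^l_j}{\partial V^l_j} \frac{\partial V^l_j}{\partial \gamma^l} = \sum_j \delta^l_j \sum_{i:\, t^{l-1}_i \le t^l_j} b^l_{ji} $$ (16)

where $j$ and $i$ iterate over neurons in layers $l$ and $l-1$, respectively. Here again, similar to [26], we approximate $\partial t^l_j / \partial V^l_j = -1$, and, according to Eq. 3, we compute $\partial V^l_j / \partial \gamma^l = \sum_{i:\, t^{l-1}_i \le t^l_j} b^l_{ji}$.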
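The scaling-factor update of Eqs. 15 and 16 can be sketched as follows; the helper names are hypothetical, and the inputs follow the same conventions as the weight-gradient sketch (binary weights `b[j][i]`, firing times of layers l-1 and l).

```python
def gamma_grad(delta, b, t_pre, t_post):
    """Eq. 16: sum over neurons j of delta_j times the sum of the binary
    weights b_ji whose presynaptic neuron i fired no later than j."""
    return sum(delta[j] * sum(b[j][i] for i in range(len(t_pre))
                              if t_pre[i] <= t_post[j])
               for j in range(len(t_post)))


def update_gamma(gamma, grad, lr_gamma):
    """Eq. 15: plain gradient step on the layer's scaling factor."""
    return gamma - lr_gamma * grad
```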

Note that before updating the weights, we normalize the gradients of each layer by their norm to avoid exploding and vanishing gradients. Also, we added a norm-based weight regularization term to the loss function to avoid overfitting, where the parameter $\lambda$ is the regularization parameter controlling the degree of weight penalization.
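A sketch of per-layer gradient normalization. The use of the L2 norm is an assumption of this illustration, as is the small constant `eps` that guards against division by zero.

```python
import math


def normalize(grad, eps=1e-12):
    """Divide a layer's gradient matrix by its L2 norm (norm choice assumed)."""
    norm = math.sqrt(sum(g * g for row in grad for g in row))
    return [[g / (norm + eps) for g in row] for row in grad]
```

After normalization, every layer's update has roughly unit norm, so the step size is set by the learning rate rather than by the raw gradient magnitude.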

3 Results

3.1 MNIST dataset

In this section, we evaluate BS4NN on the MNIST dataset, which is the most popular benchmark for spiking neural networks [1]. The MNIST dataset contains 60,000 handwritten digits (0 to 9) in images of size 28 × 28 pixels as the training set. The test set contains 10,000 digit images, roughly 1,000 images per digit. Here, we train a fully connected BS4NN with one hidden layer containing 600 IF neurons. The parameter settings are provided in Table 1. The initial input-hidden and hidden-output real-valued synaptic weights are drawn from uniform distributions. Trainable parameters, including the synaptic weights and the scaling factors of the hidden ($\gamma^h$) and output ($\gamma^o$) layers, are tuned during the learning phase. Adaptive parameters, including the learning rates $\eta$ and $\eta_\gamma$, are discounted by 30% every ten epochs. Other parameters remain intact in both the learning and testing phases.
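The step schedule for the learning rates (a 30% discount every ten epochs) can be written in one line. Counting epochs from 0 and the example starting value of 0.1 are assumptions of this sketch.

```python
def discounted_lr(lr0, epoch, drop=0.3, every=10):
    """Learning rate after `epoch` epochs, reduced by 30% every ten epochs."""
    return lr0 * (1 - drop) ** (epoch // every)


schedule = [discounted_lr(0.1, e) for e in (0, 9, 10, 25)]
```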

Table 2 presents the categorization accuracy of the proposed BS4NN along with some other SNNs with spike-time-based direct supervised learning algorithms and fully connected architectures. BS4NN is the only network in this table that uses binary weights, and it reaches 97.0% accuracy on MNIST. As mentioned in the Methods section, BS4NN uses a modified version of the temporal backpropagation algorithm of S4NN (Kheradpisheh et al. (2020) [26]) to handle binary weights. Compared to S4NN, the categorization accuracy of BS4NN drops by only 0.4%. Although BS4NN is outperformed by the other SNNs by at most 1.4%, its advantages are the use of binary weights instead of real-valued full-precision weights and of an instantaneous post-synaptic potential (PSP) function. BS4NN also outperforms Tavanaei et al. (2019) [27], which uses real-valued weights and instantaneous PSPs. The other SNNs use exponential and linear PSP functions, which complicate the neural processing and the learning procedure of the network and consequently increase their computational and energy cost.

Figure 1: (a) The firing times of the ten output neurons over the test images ordered by category. (b) The mean firing time of each output neuron (rows) over the images of different digits (columns).

We also compared BS4NN to a BNN with a similar architecture. To make the comparison fair, and inspired by [12], we implemented a BNN with binary weights (-1 and +1) and binary sigmoid activations (0 and 1). The network has a single hidden layer of size 600 and is trained using the ADAM optimizer and the squared hinge loss function for 500 epochs. The learning rate decays exponentially through the training epochs. Following [32], the initial real-valued weights of each layer are randomly drawn from a uniform distribution whose range depends on the number of synaptic weights of that layer. As shown in Table 2, the BNN reaches a best accuracy of 96.8% on MNIST, which is 0.2% lower than that of BS4NN (we comment on these results in the Discussion).

Figure 2: The mean required number of spikes in the input and hidden layers, and in the whole network.

The firing times of the ten output neurons over all test images are shown in Figure 1(a). Images are ordered by digit category from ’0’ to ’9’. For each test image, the firing time of each neuron is shown by a color-coded dot. As seen, for each category, the corresponding output neuron tends to fire earlier than the others. This is more evident in Figure 1(b), which shows the mean firing time of each output neuron for each digit category. Each output neuron has, by a clear margin, the shortest mean firing time for images of its corresponding digit. Interestingly, BS4NN needs a much longer time to detect digit ’1’ (188 time steps), which could be due to the use of binary weights. Other digits cover more pixels of the image and therefore produce more early spikes than digit ’1’. Since the weights are binary, the few early spikes of digit ’1’ cannot activate the hidden IF neurons, and hence BS4NN needs to wait for later surrounding spikes to distinguish digit ’1’ from other digits.

We further counted the mean number of spikes BS4NN requires to categorize images of each digit category. To this end, we counted the number of spikes in all the layers until the emission of the first spike in the output layer (when the network makes its decision). The mean required spikes of the input and hidden layers are depicted in Figure 2. All digit categories but ’1’, on average, require about 100 spikes in the input layer and 200 spikes in the hidden layer. Digit ’1’ requires about 300 input spikes, while, similar to other digits, its hidden layer needs about 100 spikes. As explained above, digit ’1’ covers fewer pixels than other digits, and its shape also overlaps with the constituent parts of some other digits; hence, due to the use of binary weights, the network has to wait for later input spikes to distinguish digit ’1’ from other digits.

Figure 3 shows the time course of the membrane potentials of the output neurons for a sample ’9’ test image. The membrane potential of the 9th output neuron overtakes the others at the 15th time step and quickly increases until it crosses the threshold at the 58th time step. The accumulated input spikes up to time steps 15, 58, 100, 190, and 250 are depicted in this figure. As seen, up to the 15th time step, only a few input spikes have been propagated, and at the 58th time step, with the propagation of a few more input spikes, the 9th output neuron reaches its threshold and determines the category of the input image. Later input, hidden, and output spikes are no longer required by the network.

Figure 3: The trajectory of the membrane potential of all ten output neurons for a sample ’9’ test image, along with the accumulated input spikes up to time steps 15, 58, 100, 190, and 250.

To evaluate the robustness of the trained BS4NN to input noise, during the test phase, we added random jitter noise drawn from a uniform distribution to the pixels of the input images. The noise level, $\epsilon$, which sets the range of this distribution, varies from 5% to 100% of the maximum pixel intensity, $I_{max}$. Figure 4(a) shows a sample image contaminated with different levels of jitter noise. The recognition accuracy of the trained model over noisy test images under different levels of noise is plotted in Figure 4(b). As shown, the recognition accuracy remains above 95% at lower noise levels and drops to 79% at the 100% noise level. At higher noise levels, the order of input spikes can change dramatically, and because BS4NN has only +1 and -1 synaptic weights, even to insignificant parts of the input images, this affects the behavior of the IF neurons and consequently increases the categorization error rate.
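A sketch of the jitter-noise test. The symmetric range [-eps, eps] with eps = level × I_max, the clipping to [0, I_max], and the helper name `add_jitter` are assumptions of this illustration, not details confirmed by the text.

```python
import random


def add_jitter(pixels, level, i_max=255, seed=0):
    """Add uniform noise in [-eps, eps] with eps = level * i_max,
    then clip each pixel to [0, i_max] (range and clipping assumed)."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    eps = level * i_max
    return [min(i_max, max(0.0, p + rng.uniform(-eps, eps))) for p in pixels]


noisy = add_jitter([0, 128, 255], level=0.5)
```

Feeding the noisy pixels through the intensity-to-latency conversion of Eq. 1 reorders the input spikes, which is the effect discussed above.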

Figure 4: (a) A sample image contaminated with different amounts of jitter noise. (b) The recognition accuracy of the trained BS4NN on test images under different levels of noise.
Figure 5: Reconstruction of the real-valued weights and their corresponding binary weights for sixteen randomly selected hidden neurons.
Figure 6: The speed-accuracy trade-off in the pre-trained BS4NN when the threshold varies from 0 to 200.
Model Architecture Neuron Coding Synapses Learning Acc. (%)
Zhang et al. (2019) [33] Recurrent SNN Leaky IF Rate Real-value Spike-train backpropagation 90.1
Ranjan et al. (2019) [34] Convolutional SNN Leaky IF Rate Real-value Spike-rate backpropagation 89.0
Wu et al. (2020) [35] Convolutional SNN Leaky IF Rate Real-value Global-local hybrid learning rule 93.3
Zhang et al. (2020) [36] Fully-connected SNN Leaky IF Rate Real-value Spike-sequence backpropagation 89.5
Zhang et al. (2020) [36] Fully-connected SNN IOW (Input-Output Weighted Leaky IF [36]) Rate Real-value Spike-sequence backpropagation 90.2
Hao et al. (2020) [37] Fully-connected SNN Leaky IF Rate Real-value Dopamine-modulated STDP 85.3
S4NN [26] Fully-connected SNN IF Temporal Real-value Temporal backpropagation 88.0
BNN Fully-connected Binary Sigmoid Binary - ADAM 86.4
BS4NN (this paper) Fully-connected SNN IF Temporal Binary Temporal backpropagation 87.3
Table 3: The recognition accuracies of recent supervised SNNs on the Fashion-MNIST dataset. The details of each model including its architecture, input coding scheme, neuron model, and learning method are provided.

In a further experiment, we replaced the binary weights of the trained BS4NN with their corresponding real-valued weights and applied the network to the test images. In other words, we replaced the binary weights $b^l_{ji}$ in Eq. 3 with the real-valued weights $w^l_{ji}$. The network reached 89.1% accuracy on the test images, far below the 97.0% accuracy obtained with the binary weights. This shows that, although we update the real-valued proxy weights during the learning phase, we are actually tuning the binary weights, because the loss and gradients are computed based on the binary weights. Figure 5 shows the pairs of real and binary-valued weights for 16 randomly selected hidden neurons. Dark and bright pixels correspond to negative and positive weights, respectively. Hidden neurons seem to detect different variants of digits and their constituent parts.

To assess the speed-accuracy trade-off in BS4NN, we first trained the network with a threshold of 100 for all the neurons, then varied the thresholds from 0 to 200 for all the neurons and evaluated the network on the test images. As shown in Figure 6, the accuracy peaks around the threshold of 100 and drops as we move to higher or lower threshold values, while the response time (the time to the first spike in the output layer) increases with the threshold. Given this trade-off, by reducing the threshold of the pre-trained BS4NN, one can get faster responses at the cost of lower accuracy. For instance, by setting the threshold to 80, the response time shortens from 112.9 to 44.9 time steps (about 2.5 times faster), while the accuracy drops from 97.0% to 91.0%.

The scaling factors are the full-precision floating-point parameters we used in our neuronal layers to better approximate the real-valued weights with the binary weights. We were able to round these factors in the pre-trained network to two decimal places without any change in the categorization accuracy.

Figure 7: Sample images from Fashion-MNIST dataset.

3.2 Fashion-MNIST dataset

Fashion-MNIST [25] is a fashion product image dataset with 10 classes (see Figure 7). Images are gathered from the thumbnails of clothing products on an online shopping website. Fashion-MNIST has the same image size and training/testing splits as MNIST, but it is a more challenging classification task. Here, we used a BS4NN with a single hidden layer of 1000 IF neurons. Details of the parameter values are presented in Table 1. The initial weights of all layers are randomly drawn from a uniform distribution in the range [0,1]. The learning rate parameters $\eta$ and $\eta_\gamma$ are discounted by 30% every 10 epochs, and the scaling factors $\gamma^h$ and $\gamma^o$ are trained during the learning phase.

Figure 8: (a) The mean firing times of the output neurons over the Fashion-MNIST categories. (b) The confusion matrix of BS4NN on Fashion-MNIST. (c) The mean required number of spikes per category and layer.

Table 3 summarizes the characteristics and recognition accuracies of recent SNNs on the Fashion-MNIST dataset. BS4NN reaches 87.3% accuracy (a 0.7% drop with respect to S4NN). Apart from S4NN, the BNN, and BS4NN, all the models use real-valued synaptic weights, spike-rate-based neural coding, and leaky neurons with exponential decay. The mean firing times of the output neurons of BS4NN for each of the ten categories of Fashion-MNIST are illustrated in Figure 8(a). As seen, for each category, the corresponding output neuron has a shorter mean firing time than the others. However, compared to MNIST, the difference between the mean firing times of the correct neuron and some other neurons is small. This could be due to the similarities between instances of different categories. For instance, as shown in Figure 8(b), BS4NN confuses ankle boots, sandals, and sneakers. The situation is similar for shirts and t-shirts, and also for pullovers and coats, whose firing times are close together, so that BS4NN sometimes confuses them with each other. The required number of spikes in each layer and in the whole network is provided in Figure 8(c). The classes that are most often confused with each other (i.e., shirts, t-shirts, coats, and pullovers) require more spikes in both the input and hidden layers. One reason could be the larger size of these objects in the input image, leading to more early input spikes. Another reason, especially for the hidden layer, could be the need for more discriminative features between these confusable categories.

We also compared BS4NN with a BNN that has binary weights (-1 and 1), binary activations (0 and 1), and the same architecture as BS4NN on Fashion-MNIST. The BNN is trained using the ADAM optimizer and the squared hinge loss function. The learning rate decays exponentially during training. The initial real-valued weights of each layer are randomly drawn from a uniform distribution whose range depends on the number of synaptic weights of that layer. Interestingly, BS4NN outperforms the BNN by 0.9% accuracy.

4 Discussions

In this paper, we propose a binarized spiking neural network (called BS4NN) with a direct supervised temporal learning algorithm. To this end, we used a very common approach in the area of BANNs [7]. During the learning phase, we have two sets of real and binary-valued weights, such that the binary weights are the signs of the real-valued weights. The binary weights are used in the forward pass, on which the errors and gradients are based, while in the backward pass the weight updates are applied to the real-valued weights. The proposed BS4NN uses time-to-first-spike coding [38, 39, 40, 41, 42] to convert image pixels into spike trains, in which input neurons with higher pixel intensities emit spikes with shorter latencies. The subsequent hidden and output layers consist of non-leaky IF neurons with binary (+1 or -1) weights that fire once, when they reach their threshold for the first time. The decision is simply made by the first spike in the output layer. The temporal error is then computed by comparing the actual and target firing times. Gradients backpropagate through the network and are applied to the real-valued weights. Target firing times are computed relative to the actual firing times of the output neurons to push the correct output neuron to fire earlier than the others. This forces BS4NN to make quick and accurate decisions with as few spikes as possible (high sparsity).

In our experiments, BS4NN reached 97.0% and 87.3% accuracy on the MNIST and Fashion-MNIST datasets, respectively. Although in terms of accuracy BS4NN does not beat most real-valued SNNs, it has several computational, memory, and energy advantages that make it suitable for hardware and neuromorphic implementations. Interestingly, BS4NN also outperformed BNNs with the same architectures on MNIST and Fashion-MNIST by 0.2% and 0.9% accuracy, respectively. This improvement with respect to the BNN could be due to the use of time in our time-to-first-spike coding and temporal backpropagation in BS4NN. Both networks have binary activations and binary weights, but BS4NN has the advantage of using the temporal information encoded in spike times.

Instead of real-valued weights, BS4NN uses binary synapses with only one full-precision scaling factor per layer. This can be very important for memory optimization in hardware implementations, where every synaptic weight requires a separate memory space. If one implements the binary synapses with a single bit of memory, the network size can be reduced by 32x compared to a network with 32-bit floating-point weights [13, 43]. Binary weights can also simplify the implementation of synapses by replacing multiplications with unit increment and decrement operations, which can be important for reducing computational and energy-consumption costs [13, 43].
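The 32x figure can be checked with a short sketch that packs binary weights into bits and counts the weight storage of the MNIST network of Table 1 (784 inputs, 600 hidden and 10 output neurons). The helper names are hypothetical, and the per-layer scaling factors are ignored.

```python
def pack_signs(weights):
    """Pack a list of +1/-1 weights into one integer, one bit per synapse
    (bit = 1 encodes +1, bit = 0 encodes -1)."""
    bits = 0
    for k, w in enumerate(weights):
        if w == 1:
            bits |= 1 << k
    return bits


def unpack_signs(bits, n):
    """Recover n weights in {-1, +1} from the packed integer."""
    return [1 if (bits >> k) & 1 else -1 for k in range(n)]


# Weight storage of a 784-600-10 fully connected network
n_synapses = 784 * 600 + 600 * 10
float32_bytes = n_synapses * 4        # 32 bits per real-valued weight
packed_bytes = -(-n_synapses // 8)    # 1 bit per binary weight, rounded up to bytes
```

Since each binary weight occupies one bit instead of 32, `float32_bytes` is exactly 32 times `packed_bytes` for this network.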

The use of non-leaky IF neurons instead of more complex neuron models such as SRM [29] and LIF [36, 44] makes BS4NN more computationally efficient and hardware friendly. Leakage might be implemented efficiently in analog hardware by exploiting the physical properties of transistors and capacitors [6], but it is always costly to implement in digital hardware. To do so, one might decrease the membrane potential of all neurons periodically (e.g., every millisecond; clock-driven) [45], or whenever a neuron receives an input spike (event-based) [46, 47]. The former costs energy, while the latter needs more memory to store the time of each neuron's last update.

The implementation of the instantaneous synapses used in BS4NN is much simpler than that of exponential [28], alpha [29], and linear [30, 31, 48] synaptic currents and costs much less energy and computation. With instantaneous synapses, each input spike causes a sudden potential increment or decrement, whereas with current-based synapses, each input spike updates the potential over several consecutive time steps (which requires an extra state variable).
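To illustrate the extra state mentioned above, the following sketch contrasts an instantaneous synapse with a generic exponentially decaying synaptic current. The decay factor of 0.5 per step is an arbitrary assumption, and the current-based model is a simplified stand-in, not the specific models of [28, 29, 30, 31, 48].

```python
def instantaneous(spikes, w):
    """Instantaneous synapse: each input spike changes V once by w.
    Only one state variable (V) per neuron."""
    v, trace = 0.0, []
    for s in spikes:
        v += w * s
        trace.append(v)
    return trace


def current_based(spikes, w, decay=0.5):
    """Generic current-based synapse: a spike feeds a decaying current that
    keeps changing V on later steps. Two state variables (I and V)."""
    i_syn, v, trace = 0.0, 0.0, []
    for s in spikes:
        i_syn = decay * i_syn + w * s
        v += i_syn
        trace.append(v)
    return trace
```

For a single input spike, the instantaneous synapse produces one jump in V, whereas the current-based synapse keeps updating V on every subsequent step, which is the extra per-step work and memory referred to above.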

As mentioned above, BS4NN uses single-spike neural coding throughout the network. The input layer employs time-to-first-spike coding, by which each input neuron fires only once (shorter latencies for stronger inputs). Also, neurons in the subsequent layers are allowed to fire at most once, and only when they reach their threshold for the first time. In addition, the proposed temporal learning algorithm used to train BS4NN forces it to rely on earlier spikes and respond as quickly as possible. This combination has been shown to require much less energy and time on neuromorphic devices than rate-coded SNNs [49, 50], with up to 15 times lower energy consumption and 5 times faster decisions [51].

Recently, efforts have been made to convert pre-trained BANNs into equivalent BSNNs with spike-rate-based neural coding [18, 19, 20]. However, these networks do not exploit the temporal advantages of SNNs that can be obtained through a direct learning algorithm. Due to the non-differentiability of the thresholding activation function of spiking neurons, it is not straightforward to apply backpropagation and gradient descent to SNNs. Various solutions have been proposed to tackle this problem, including computing gradients with respect to spike rates instead of single spikes [52, 53, 54, 55], using differentiable smoothed spike functions [56], surrogate gradients for the threshold function in the backward pass [57, 58, 59, 60, 61, 62], and transfer learning by sharing weights between the SNN and an ANN [49, 63]. In another approach, known as latency learning, a neuron's activity is defined by the firing time of its first spike; therefore, these methods do not need to compute the gradient of the thresholding function. In return, they need to define the firing time as a function of the membrane potential [26, 29, 30, 31, 36, 64, 65], or directly as a function of the firing times of presynaptic neurons [28]. However, all the aforementioned learning strategies work with full-precision real-valued weights, and future studies could assess their suitability for BSNNs.

References