Spike time displacement based error backpropagation in convolutional spiking neural networks

08/31/2021 ∙ by Maryam Mirsadeghi, et al. ∙ CNRS AUT Shahid Beheshti University

We recently proposed the STiDi-BP algorithm, which avoids backward recursive gradient computation, for training multi-layer spiking neural networks (SNNs) with single-spike-based temporal coding. The algorithm employs a linear approximation to compute the derivative of the spike latency with respect to the membrane potential, and it uses spiking neurons with a piecewise linear postsynaptic potential to reduce the computational cost and the complexity of neural processing. In this paper, we extend the STiDi-BP algorithm to deeper and convolutional architectures. The evaluation results on the image classification task with two popular benchmarks, the MNIST and Fashion-MNIST datasets, with accuracies of 99.2% and 92.8%, respectively, confirm that this algorithm is applicable in deep SNNs. Another issue we consider is the reduction of memory storage and computational cost. To do so, we consider a convolutional SNN (CSNN) with two sets of weights: real-valued weights that are updated in the backward pass, and their signs, the binary weights, that are employed in the feedforward process. We evaluate the binary CSNN on the MNIST and Fashion-MNIST datasets and obtain acceptable performance with a negligible accuracy drop with respect to real-valued weights (about 0.6% and 0.8% drops, respectively).


1 Introduction

Spiking neural networks (SNNs) have recently attracted growing attention due to their temporal nature and their event-driven processing paradigm, which make them suitable for energy-efficient neuromorphic implementation. However, due to the non-differentiable activation function and the temporal dynamics of SNNs, training them directly is still a major challenge. Therefore, they have not yet reached the state-of-the-art accuracy of artificial neural networks (ANNs), especially in deep architectures with single-spike-based temporal coding.

In the temporal coding scheme, information is carried by the timing or the order of individual spikes [1, 2, 3]. In the extreme case of single-spike-based temporal coding, neurons are allowed to fire at most once, which can radically reduce the computational and energy demand of SNNs. So far, different solutions have been proposed to adapt the backpropagation (BP) algorithm to directly train SNNs with single-spike-based temporal coding.

Since the neuronal activity in single-spike coding is defined by the neurons’ firing times, two approaches are used to adapt BP to single-spike-based SNNs. The first approach is to compute or approximate the derivative of the firing time of each neuron with respect to its membrane potential [4, 11, 12]. The second approach is to directly compute the firing time of each postsynaptic neuron based on the firing times of its presynaptic neurons [5, 6, 7].

Bohte et al. [4] introduced a temporal version of BP called SpikeProp, which minimizes the temporal error of the network to train single-spike multilayer SNNs. They used exponential SRM neuron models and employed a piecewise linear approximation to compute the derivative of the thresholding activation function at the firing time. Kheradpisheh et al. [11] proposed a temporal version of BP for a multi-layer SNN with IF neurons and instantaneous synapses. To do so, they approximated the derivative of each neuron’s firing latency with respect to its membrane potential. By using the Rectified Linear Postsynaptic Potential (ReL-PSP) spiking neuron model, Zhang et al. [12] could avoid such an approximation and compute this derivative precisely. Mostafa [5] defined the firing time of each neuron directly based on its presynaptic spike times. He used IF neurons with exponentially decaying synaptic current. Comsa et al. [6] employed a similar approach for SRM neuron models with an alpha synaptic function. Zhou et al. [7] extended [5] to implement deep convolutional spiking neural networks (CSNNs) and achieved state-of-the-art performance.

All of the aforementioned models except Zhang et al. [12] and Zhou et al. [7] used fully connected networks with one or two hidden layers and have never been applied to deeper structures. Zhang et al. [12] developed a deep convolutional spiking neural network consisting of two convolutional and two hidden layers. They used a ReL-PSP based spiking neuron model and trained the network with temporal BP and recursive backward gradient computation. Zhou et al. [7] extended [5] to implement deep SNN architectures based on the VGG16 model [8, 9] for CIFAR10 and the GoogleNet model [10] for ImageNet. To the best of our knowledge, these are the only implementations of a CSNN with single-spike-based temporal coding. Other CSNNs are either converted versions of traditional CNNs [13, 14, 15, 16] or use rate coding or multi-spike-per-neuron schemes to directly apply BP to the network [17, 18, 19].

Recently, we proposed an error backpropagation algorithm based on the spike time displacements, called STiDi-BP, for multi-layer fully-connected SNNs with single-spike-based temporal coding [20]. Similar to [12, 4, 11], we used a linear approximation to compute the derivative of neurons’ firing time with respect to their membrane potential. However, in STiDi-BP, instead of recursively backpropagating the errors, we computed the desired firing time of neurons in each layer, and hence, we could locally compute the error just by comparing the actual and desired firing times.

In this paper, we extend the STiDi-BP learning approach to deeper and convolutional architectures. The evaluation results on two image classification tasks, the MNIST and Fashion-MNIST datasets, with 99.2% and 92.8% recognition accuracy, respectively, confirm the capabilities of the proposed algorithm in deep CSNNs.

Implementing SNNs with real-valued weights on neuromorphic devices requires a large amount of memory and imposes a high load of floating-point computation. Binarizing the synaptic weights can help to reduce the memory footprint and the computational cost. A few recent studies have tried to convert supervised binary artificial neural networks (BANNs) into equivalent binary SNNs (BSNNs) [21, 22, 23, 24]. Kheradpisheh et al. were the first to introduce a direct supervised learning algorithm to train a two-layer fully connected SNN with binary synaptic weights [25], but they did not apply it to deeper SNNs or CSNNs. To the best of our knowledge, no other study has aimed at directly training deep supervised SNNs and CSNNs with binary weights. Here, we employ STiDi-BP to directly train a deep CSNN with binary synaptic weights, which are the signs of the real-valued weights. In the backward pass, we update the real-valued weights, and the feedforward processing is performed with the binary weights. We evaluated the proposed network on the MNIST and Fashion-MNIST datasets and obtained categorization accuracies of 98.6% and 92.0%, respectively, a negligible drop compared to the real-valued CSNN.

2 Forward pass

Here, the proposed convolutional spiking neural network comprises a temporal coding input layer, a stack of interleaved convolutional and pooling layers for feature extraction, and a cascade of fully connected layers for the final classification. Temporal coding is used to convert the input image into a sparse spike train (i.e., one spike per pixel). After feeding the coded input image to the network, the convolutional operations are applied. In a convolutional layer, several filters are used to extract visual features from the previous layer, which are represented in different feature maps. After each convolutional layer, a pooling layer is used to remove redundancy and reduce the size of the feature maps. A pooling layer performs a nonlinear max pooling operation over a set of neighboring neurons to select the neuron with the highest activity (i.e., the earliest spike). After the last pooling layer, the fully connected layers process the extracted features and perform the final classification.

2.1 Neuron Model

We use a simple piecewise linear postsynaptic potential (PL-PSP) based spiking neuron model, which has a very low computational cost compared to exponential PSP models [20]. The membrane potential $V_j^l(t)$ of neuron $j$ in layer $l$ at time $t$ is the weighted summation of the PL-PSPs of its afferent neurons:

$$V_j^l(t) = \sum_i w_{ji}^l \, \epsilon(t - t_i^{l-1}) \qquad (1)$$

where $w_{ji}^l$ is the synaptic weight connecting presynaptic neuron $i$ to neuron $j$ and $t_i^{l-1}$ is the spike time of neuron $i$. $\epsilon(\cdot)$ is the kernel of the PL-PSP function that is illustrated in Figure 1 and is described by the following equation:

(2)

here, and are time constants of the PL-PSP function and .

Figure 1: The piecewise linear postsynaptic potential caused by the presynaptic spike time .
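As a rough illustration of the neuron model in this section, the following Python sketch integrates weighted PL-PSPs and reports the first threshold crossing. The exact kernel of Eq. 2 is not reproduced here, so the triangular shape in `pl_psp` (linear rise over `tau1`, linear decay over `tau2`), the parameter values, the integer time grid and all function names are assumptions made for illustration only.

```python
def pl_psp(s, tau1=1.0, tau2=5.0):
    """Assumed piecewise linear kernel; s is the time elapsed since the presynaptic spike."""
    if s <= 0:
        return 0.0                       # no effect before (or at) the presynaptic spike
    if s <= tau1:
        return s / tau1                  # linear rise up to 1
    if s < tau1 + tau2:
        return 1.0 - (s - tau1) / tau2   # linear decay back to 0
    return 0.0


def membrane_potential(t, weights, spike_times):
    """Weighted sum of the PL-PSPs of the afferent neurons at time t."""
    return sum(w * pl_psp(t - ti) for w, ti in zip(weights, spike_times))


def first_spike_time(weights, spike_times, threshold, t_max=100):
    """Return the first integer time step at which the threshold is reached, or None."""
    for t in range(t_max + 1):
        if membrane_potential(t, weights, spike_times) >= threshold:
            return t
    return None                          # the neuron stays silent


# Two afferents firing at t=0 and t=2; the neuron fires once both PSPs overlap.
print(first_spike_time([0.6, 0.6], [0, 2], threshold=0.9))  # -> 3
```

Returning on the first crossing reflects the single-spike constraint: once a neuron has fired, later time steps are never examined.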

2.2 Temporal coding

Contrary to the costly rate-coding scheme, in which the input image is encoded in the spike rates of the input neurons (i.e., the higher the pixel value, the higher the firing rate), we use the more efficient temporal coding scheme [20], where the information is carried by the timing of individual spikes in a sparse manner. Each input neuron fires at most once, such that neurons corresponding to pixels with higher intensities emit earlier spikes. After feeding input spikes to the network, each neuron in the subsequent layer updates its membrane potential by integrating the voltage sum of all the presynaptic spikes and fires a spike right after crossing the threshold. As in the input layer, each neuron fires only once and its firing time determines the saliency of the extracted feature; hence, the whole network obeys the sparse temporal coding.
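The intensity-to-latency idea can be sketched in a few lines of Python. The text only states that brighter pixels fire earlier; the linear mapping, the value `t_max=100`, the 8-bit intensity range and the function name below are assumptions, not details taken from the paper.

```python
def intensity_to_latency(pixels, t_max=100, max_intensity=255):
    """Map each pixel intensity to a single spike time: brighter pixels fire earlier."""
    times = []
    for p in pixels:
        # Assumed linear mapping: full intensity fires at t=0, zero intensity at t_max.
        t = int(round(t_max * (max_intensity - p) / max_intensity))
        times.append(t)
    return times


print(intensity_to_latency([255, 128, 0]))  # -> [0, 50, 100]
```

Each pixel yields exactly one spike time, so the encoded input has one spike per pixel at most, in line with the sparse coding described above.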

2.3 Convolutional layers

There are several feature maps in a convolutional layer, each of which corresponds to a convolutional filter. Neurons in a specific map share the same set of synaptic weights and therefore detect the same feature at different locations. Within a map, each convolutional neuron at a given location receives spikes from neurons inside a certain window within all the feature maps of the previous layer. Therefore, each visual feature in a convolutional layer is obtained by combining several simpler features extracted in the previous layer. The neuron computes its membrane potential by applying the corresponding filter to the received spike times according to the extended Eq. 1 and Eq. 2. Whenever the membrane potential crosses the threshold, the convolutional neuron emits a spike and, owing to the single-spike-based coding, remains silent until the end of the simulation.

2.4 Pooling layers

In the rate-coding scheme, it is not possible to detect the neuron with the maximum firing rate until the last simulation time point. In the proposed model, by contrast, the use of temporal coding lets us perform the max pooling operation simply by propagating the first spike appearing in the input window of each pooling neuron. To do so, in pooling layers, we use IF neurons with the threshold and the input synaptic weights of one. Each pooling neuron performs the maximum operation over a window in the corresponding feature map of the previous layer. The first input spike from the neighboring afferent neurons activates the pooling neuron and makes it fire a spike immediately. Each pooling neuron is permitted to fire at most once during the simulation time. Note that there is no learning for the pooling neurons.
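Because the most active neuron is the one that fires first, max pooling over a window reduces to taking the minimum spike time in that window. The sketch below shows this on a single feature map of spike times; representing silent neurons by `math.inf`, the 2×2 window with stride 2 and the function name are assumptions made for the example.

```python
import math


def first_spike_pool(spike_map, window=2, stride=2):
    """Pool a 2-D map of spike times by propagating the earliest spike in each window."""
    rows, cols = len(spike_map), len(spike_map[0])
    pooled = []
    for r in range(0, rows - window + 1, stride):
        pooled_row = []
        for c in range(0, cols - window + 1, stride):
            # The earliest spike in the window plays the role of the maximum activity.
            earliest = min(spike_map[i][j]
                           for i in range(r, r + window)
                           for j in range(c, c + window))
            pooled_row.append(earliest)
        pooled.append(pooled_row)
    return pooled


inf = math.inf  # a neuron that never fires
feature_map = [[5, 3, inf, inf],
               [7, 9, inf, inf],
               [2, 8, 6, 4],
               [1, inf, 10, 12]]
print(first_spike_pool(feature_map))  # -> [[3, inf], [1, 4]]
```

A window whose afferents all stay silent yields `inf`, so the pooling neuron stays silent as well.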

2.5 Fully connected layers

The last stage of the proposed model, which performs the final classification based on the visual features extracted by the convolutional part, is composed of cascaded fully connected layers with PL-PSP based neurons.

Each neuron in a fully connected layer integrates the weighted spikes according to Eq. 1 and Eq. 2 and emits a spike when its postsynaptic voltage crosses the threshold. The neuron is allowed to fire at most once, and the spike latency carries the information. The same process is repeated in the following hidden and output layers.

In the output layer, the number of neurons is equal to the number of classes and each neuron is assigned to a different category. The output neuron that fires earlier than others is called the winner and it determines the class of the input image.
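The decision rule amounts to an argmin over the output firing times. A minimal sketch follows; encoding silent neurons as `math.inf`, breaking ties in favour of the lowest index, and the function name are assumptions.

```python
import math


def classify(output_spike_times):
    """Return the index of the earliest-firing output neuron, or None if none fired."""
    earliest = min(output_spike_times)
    if math.isinf(earliest):
        return None                      # no output neuron reached its threshold
    # list.index returns the first match, so ties go to the lowest class index.
    return output_spike_times.index(earliest)


print(classify([40, 12, math.inf, 55]))  # -> 1
```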

3 Backward pass

Here we apply the proposed learning algorithm to a deep convolutional spiking neural network to show that it is suitable and practical for deep SNN structures.

The loss function of each layer is defined independently by computing the time differences between the actual and the desired firing times. The desired firing times in the middle layers are calculated by displacing the presynaptic spike times such that the error is minimized. Then, gradient descent (GD) is performed locally to update the synaptic weights without any backward recursive gradient computation. To overcome the non-differentiability of SNNs, the linear approximation method described in [20] is employed.

3.1 Fully connected layers

The learning rule in the fully connected layers is the same as in [20]. Here, we briefly introduce the proposed learning algorithm for the reader’s convenience. A more complete description of STiDi-BP is given in [20].

The loss function of each layer is calculated independently by the following equation:

(3)

where $e_j^l$ is the temporal error function for postsynaptic neuron $j$, obtained by subtracting the desired and the actual firing times ($T_j^l$ and $t_j^l$, respectively) of neuron $j$ in layer $l$:

(4)

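In order to minimize the squared error loss function $L^l$, the synaptic weights of layer $l$ should be modified by using the GD algorithm. To update each synaptic weight $w_{ji}^l$, we compute the gradient of the loss function with respect to $w_{ji}^l$. Hence

$$w_{ji}^l = w_{ji}^l - \eta \frac{\partial L^l}{\partial w_{ji}^l} \qquad (5)$$

where $\eta$ is the learning rate parameter and $w_{ji}^l$ is the weight of the connection from neuron $i$ in layer $l-1$ to neuron $j$. $\partial L^l / \partial w_{ji}^l$ in Eq. 5 can be expanded to

$$\frac{\partial L^l}{\partial w_{ji}^l} = \frac{\partial L^l}{\partial t_j^l}\,\frac{\partial t_j^l}{\partial V_j^l}\,\frac{\partial V_j^l}{\partial w_{ji}^l} \qquad (6)$$

where $V_j^l$ is the membrane potential of neuron $j$ and $t_j^l$ is the spike time of neuron $j$. Neuron $i$ contributes to the membrane potential computation only if it fires before $t_j^l$. By using Eq. 3 and Eq. 4 we can express the first term as:

(7)

The third term is computed by considering Eq. 1 and Eq. 2, with $\epsilon$ the PL-PSP kernel and $t_i^{l-1}$ the spike time of presynaptic neuron $i$:

$$\frac{\partial V_j^l}{\partial w_{ji}^l} = \epsilon(t_j^l - t_i^{l-1}) \qquad (8)$$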

The second term, the derivative of the postsynaptic spike time with respect to its membrane potential, is calculated according to the following equation. The details of computation are given in [20].

(9)

By substituting Eq. 7, Eq. 8 and Eq. 9 in Eq. 6, the final equation for modifying the synaptic weights and minimizing the squared error loss function is described by:

(10)

After computing the error loss function for each layer, we use Eq. 5 to update the synaptic weights of that layer and minimize the local error. Hence, we don’t have backward recursive gradient computation. How to calculate target firing times is an important issue that will be discussed in the next section.
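The local update can be sketched as a three-factor chain rule applied to one postsynaptic neuron. The sketch below is not the paper's exact rule: it assumes the error is the desired minus the actual firing time, a loss of half the squared error, and it takes the derivative of the firing time with respect to the membrane potential as a caller-supplied number (`dt_dv`), because the linear approximation of Eq. 9 is not reproduced here. The kernel, the learning rate and all names are hypothetical.

```python
def local_weight_update(weights, pre_spike_times, t_post, t_target,
                        kernel, dt_dv, lr=0.1):
    """One local gradient descent step for the afferent weights of a single neuron."""
    error = t_target - t_post            # assumed sign convention: desired minus actual
    dL_dt = -error                       # derivative of 0.5 * error**2 w.r.t. t_post
    new_weights = []
    for w, t_pre in zip(weights, pre_spike_times):
        if t_pre < t_post:               # only earlier presynaptic spikes contribute
            dV_dw = kernel(t_post - t_pre)
            w = w - lr * dL_dt * dt_dv * dV_dw
        new_weights.append(w)
    return new_weights


# Hypothetical saturating kernel and a constant negative dt/dV
# (a higher potential is assumed to make the neuron fire earlier).
kernel = lambda s: min(s, 1.0)
updated = local_weight_update([0.5, 0.5], [2, 12], t_post=10, t_target=8,
                              kernel=kernel, dt_dv=-1.0)
print(updated)
```

In this example the neuron fired later than desired, so the weight of the afferent that spiked before it grows, while the afferent that spiked afterwards is left unchanged. No quantity from any other layer is needed, which is the point of the local formulation.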

3.1.1 Calculation of target firing times

The target firing time is computed using different formulas for the neurons of the middle layers and the output layer [20].

In the output layer, we use a relative encoding method in which the correct output neuron is encouraged to fire earlier than the others. To do that, the target firing time takes the category label of the input image into account: the output neuron of the correct class is assigned an early target firing time, and the other output neurons are set to fire at a later time. These target times are defined from the maximum and the minimum output firing times and a constant parameter that provides a resolution distance for the winner neuron.

There is a different situation for the middle layers. To compute the desired firing time of each neuron in the middle layer, we define the time displacement amount of the neuron spike time to reduce the postsynaptic error . To do that, we compute the derivative of the postsynaptic error with respect to :

(11)

Here, is the learning rate and iterates over neurons in layer . By expanding the Eq. 11 we have

(12)

The first and the second terms of Eq. 12 are calculated the same as Eq. 7 and Eq. 8, respectively and, the third term is expressed by considering Eq. 1 and Eq. 2

(13)

Finally, the time displacement amount of presynaptic neuron is described by substituting Eq. 7, Eq. 8 and Eq. 13 in the RHS of Eq. 12

(14)

The postsynaptic error is reduced if the presynaptic neuron fires a spike at time instead of time . Hence, should be considered as the target firing time of the neuron .

3.2 Convolutional layers

For each specific map of a convolutional layer , the error loss function is calculated independently, by integrating the mean squares of the difference between the actual and the desired firing times:

(15)

where, d iterates over all the neurons of map . Then, the GD algorithm is locally employed to modify the synaptic weights of the corresponding filter as follows:

(16)

Each synaptic weight of filter () corresponds to the presynaptic neuron located at () with the spike firing latency of . Therefore, is updated as

(17)

By expanding Eq. 17, we have

(18)

The first, second and third terms of Eq. 18 are calculated by extending Eq. 7, Eq. 8 and Eq. 9, respectively:

(19)
(20)
(21)

where, is the spike firing latency of neuron in map of convolutional layer .

4 Binarization

Here we apply some modifications to the STiDi-BP learning rule to directly train the CSNN with binary synaptic weights.

The only change in the forward path is the use of binary weights $B$ instead of real-valued weights $W$, where $B = \mathrm{sign}(W)$. Hence, the membrane potential of postsynaptic neuron $j$ of layer $l$ in Eq. 1 is rewritten as

$$V_j^l(t) = \alpha^l \sum_i B_{ji}^l \, \epsilon(t - t_i^{l-1}) \qquad (22)$$

where $i$ iterates over all presynaptic neurons, $\epsilon$ is the PL-PSP kernel, $t_i^{l-1}$ is the spike time of presynaptic neuron $i$, and $\alpha^l$ is a scaling factor shared among all neurons of layer $l$. Each layer has its own scaling factor, which should be updated in addition to the synaptic weights in the learning phase. The scaling factor is used to make sure that neurons cross the threshold.
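The forward computation with binary weights can be sketched as follows: the real-valued weights are kept for learning, and only their signs, multiplied by the layer's scaling factor, enter the membrane potential. Mapping a weight of exactly zero to +1, the step-like kernel used in the example and the function names are assumptions.

```python
def binarize(real_weights):
    """Binary weights are the signs of the real-valued weights (zero mapped to +1 here)."""
    return [1.0 if w >= 0 else -1.0 for w in real_weights]


def binary_membrane_potential(t, real_weights, spike_times, alpha, kernel):
    """Scaled sum of sign-weighted PSPs; the real-valued weights are not used directly."""
    binary = binarize(real_weights)
    return alpha * sum(b * kernel(t - ti) for b, ti in zip(binary, spike_times))


# Hypothetical step-like kernel, used only to keep the arithmetic easy to follow.
step = lambda s: 1.0 if s > 0 else 0.0
print(binarize([0.3, -0.7, 0.0]))                                        # -> [1.0, -1.0, 1.0]
print(binary_membrane_potential(5, [0.3, -0.7, 0.0], [1, 2, 3], 0.5, step))  # -> 0.5
```

Since every binary weight has magnitude one, the scaling factor is the only quantity that sets how many presynaptic spikes are needed to reach the threshold.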

In the backward pass, we have two sets of weights, real-valued weights and binary weights. For each layer, we update the real-valued weights by using Eq. 5 and Eq. 10 explained in Section 3.1 and update the scaling factor as

(23)

here is the learning rate parameter and is calculated by using the following equation:

(24)

The first and second terms of Eq. 24 are calculated according to Eq. 7 and Eq. 8. For computing the third term we use Eq. 22:

(25)

5 Experiments and results

In this section we evaluate the proposed STiDi-BP training algorithm on deep spiking neural networks for two image classification tasks: the MNIST dataset and the Fashion-MNIST dataset. We develop a single-spike-based temporal convolutional SNN with piecewise linear SRM neurons and consider two different modes: real-valued weights and binary weights. In the following, each network (CSNN and binary CSNN) is examined separately.

5.1 Real-valued weights

5.1.1 MNIST dataset

The MNIST dataset [26] is the most popular benchmark for spiking neural networks. It comprises 60,000 grayscale training images and 10,000 grayscale testing images of size 28×28. To evaluate the proposed learning algorithm on the MNIST dataset, we develop a real-valued-weight CSNN (R-CSNN) with the structure 40C5-P2-1000-10, which consists of one convolutional layer, one pooling layer and one hidden layer followed by an output layer. The convolutional layer comprises 40 neural maps with a 5×5 convolution window. The pooling window of the pooling layer is of size 2×2 with the stride of . The hidden and the output layers consist of 1000 and 10 neurons, respectively. Here the maximum simulation time is and other parameters of each layer are listed separately in Table 1

layer initial weights
Convolutional 80 0.001 1 5 [0, 2]
Hidden 0.01 1 50 [0, 0.25]
Output 0.001 1 10 [0, 0.5]
Table 1: Model parameters for MNIST dataset in R-CSNN.
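To make the structure notation concrete, the following sketch counts the neurons in each layer of a string such as 40C5-P2-1000-10 for a 28×28 single-channel input. It assumes unpadded convolutions with stride 1 and non-overlapping pooling, neither of which is stated explicitly in the text; the function name is hypothetical.

```python
def layer_sizes(structure, input_hw=28, input_channels=1):
    """Return (layer token, number of neurons) pairs for a structure string."""
    h, ch = input_hw, input_channels
    sizes = []
    for token in structure.split("-"):
        if "C" in token:                     # e.g. "40C5": 40 maps, 5x5 window
            maps, k = token.split("C")
            ch, h = int(maps), h - int(k) + 1    # assumed: no padding, stride 1
            sizes.append((token, ch * h * h))
        elif token.startswith("P"):          # e.g. "P2": 2x2 pooling
            h = h // int(token[1:])              # assumed: stride equals window size
            sizes.append((token, ch * h * h))
        else:                                # fully connected layer
            sizes.append((token, int(token)))
    return sizes


print(layer_sizes("40C5-P2-1000-10"))
# -> [('40C5', 23040), ('P2', 5760), ('1000', 1000), ('10', 10)]
```

Under these assumptions the hidden layer of the MNIST network receives 5760 pooled inputs (40 maps of 12×12 neurons).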

In Table 2, we compare STiDi-BP with some recently reported results that used supervised learning algorithms. As shown, [17, 19] used a CSNN structure and achieved the highest performance, but they are based on rate coding, which requires a great deal of computation. In the area of single-spike-timing-based supervised learning algorithms, [12, 7] and this work are the only implementations of a CSNN, and the other implementations [5, 6, 11, 20] are fully connected networks. In [20], we introduced the STiDi-BP algorithm and achieved an accuracy of 97.4% with the network structure 784-350-10, while the other fully connected SNNs [5, 6, 11, 12] employed the traditional temporal BP, which requires backward recursive gradient computation. Zhang et al. [12] employed rectified linear PSP based spiking neuron models and developed two SNNs. They reached an accuracy of 98.5% for a fully connected network and 99.4% for a CSNN with the structure 16C5-P2-32C5-P2-800-128-10. Zhou et al. [7] used IF neuron models with an exponentially decaying function and defined a direct relation between a neuron’s pre- and postsynaptic firing times, as in [5]. They achieved 99.3% accuracy for a CSNN with the structure 32C5-P2-16C5-P2-10. Here we apply STiDi-BP to an R-CSNN architecture and achieve near state-of-the-art accuracy (99.2%) with a lower number of convolutional and hidden layers.

Model                              Structure                        Coding     Accuracy (%)
Mostafa (2017) [5]                 784-800-10                       Temporal   97.2
Comsa et al. (2019) [6]            784-340-10                       Temporal   97.9
Kheradpisheh et al. (2020) [11]    784-400-10                       Temporal   97.4
Mirsadeghi et al. (2020) [20]      784-350-10                       Temporal   97.4
Zhang et al. (2020) [12]           784-800-10                       Temporal   98.5
W. Zhang et al. (2020) [17]        15C5-P2-40C5-P2-300-10           Rate       99.5
Fang et al. (2020) [19]            128C3-P2-128C3-P2-2048-100-10    Rate       99.6
Zhang et al. (2020) [12]           16C5-P2-32C5-P2-800-128-10       Temporal   99.4
Zhou et al. (2020) [7]             32C5-P2-16C5-P2-10               Temporal   99.3
STiDi-BP in R-CSNN (This paper)    40C5-P2-1000-10                  Temporal   99.2
Table 2: Classification accuracies of recent directly trained supervised SNNs on the MNIST dataset, together with the input coding scheme and the network structure. Convolutional and pooling layers are denoted by C and P, respectively, and layers are separated by -.

The mean firing time of each output neuron over the images of different categories and the mean required spikes of all layers are depicted in Figure 2 and Figure 3, respectively.

Figure 2: The mean firing time of each output neuron (rows) over the images of different digit categories (columns) in R-CSNN.
Figure 3: The mean required number of spikes in the input, convolutional, hidden, and total layers in R-CSNN.

According to Figure 2, Each output neuron tends to fire earlier for images of its corresponding category which confirms that to recognize each input digit, it is not necessary to give all its input spikes to the network. Here digit has the maximum mean firing time because it covers the pixels that are common among most other digits. Therefore, the network needs much longer time to detect it. On the other hand, Figure 3 shows that the network is able to detect the class corresponding to the input image by firing a limited number of neurons in each layer which helps to make very rapid decisions about the input categories. For example, the network needs only spikes in total to correctly recognize digit , which has the maximum mean firing time of . And, the maximum number of mean required spikes is related to the digit , which is only spikes.

These two properties (that are illustrated in Figure 2 and Figure 3), are the most important reasons for low cost and high computational speed in single-spike-based temporal SNNs.

This is shown more clearly in Figure 4, where the membrane potentials of the output neurons for a sample test image and the accumulated input spikes at several successive time steps are depicted. As soon as the membrane potential of an output neuron reaches the threshold, the network assigns the corresponding class to the input image and can stop the computations. Here, the membrane potential of the correct output neuron overtakes the others and reaches the threshold first. As seen, there is no need to propagate all input spikes to determine the category of the input image. By propagating a limited number of input spikes up to that time step, the membrane potential of the correct output neuron crosses the threshold and the network can classify the input image.

Figure 4: The trajectory of the membrane potential of all output neurons for sample test image. The incoming input spikes up to the time step contribute to the digit classification and the remaining spikes are ignored.

5.1.2 Fashion-MNIST dataset

Fashion-MNIST is a dataset of Zalando’s article images [27] which has the same image size and the same structure of training and testing splits as MNIST, but it is a more challenging classification problem.

Here we develop an R-CSNN with the structure 20C5-P2-40C5-P2-1000-10. The maximum simulation time is and other parameters of each layer are listed separately in Table 3.

layer initial weights
convolutional 80 0.0001 1 5 [0, 2]
convolutional 80 0.001 1 10 [0, 1]
0.1 1 100 [0, 1]
0.01 1 50 [0, 1]
Table 3: Model parameters for Fashion-MNIST dataset in R-CSNN.

The classification accuracies and characteristics of different approaches on Fashion-MNIST dataset are shown in Table 4.

Model                              Structure                        Coding     Accuracy (%)
Kheradpisheh et al. (2020) [11]    784-1000-10                      Temporal   88.0
Zhang et al. (2020) [12]           784-1000-10                      Temporal   88.1
W. Zhang et al. (2020) [17]        32C5-P2-64C5-P2-1024-10          Rate       92.8
Fang et al. (2020) [19]            128C3-P2-128C3-P2-2048-100-10    Rate       93.8
Zhang et al. (2020) [12]           16C5-P2-32C5-P2-800-128-10       Temporal   90.1
STiDi-BP in R-CSNN (This paper)    20C5-P2-40C5-P2-1000-10          Temporal   92.8
Table 4: Classification accuracies of recent directly trained supervised SNNs on the Fashion-MNIST dataset, together with the input coding scheme and the network structure. Convolutional and pooling layers are denoted by C and P, respectively, and layers are separated by -.

[17, 19], which achieve the highest performance with a convolutional structure, are based on the rate coding scheme, which requires a great deal of computation. Among the temporal coding-based SNN approaches, [12] and this work are the only implementations of a CSNN; of these, the proposed learning algorithm performs better and reaches an accuracy of 92.8%, which is not much different from the rate coding schemes.

The mean firing times of the output neurons for each category of Fashion-MNIST are illustrated in Figure 5.

Figure 5: The mean firing times of the output neurons over the Fashion-MNIST categories in R-CSNN.

The correct output neuron tends to fire earlier than the others for its corresponding category, yet the difference between the mean firing times of the correct and some other output neurons is small. This is due to the similarities between images of different categories in Fashion-MNIST compared to MNIST.

These similarities are shown more clearly in Figure 6, where the confusion matrix of the proposed learning algorithm on Fashion-MNIST is depicted. According to Figure 6, the network confuses T_shirt, shirt, dress, coat and pullover due to their similar images. The same goes for ankle boots, sandals, and sneakers, whose mean firing times are close to one another. For example, the mean firing times of the output neuron corresponding to ‘Ankle boot’ on ‘sneaker’ and ‘sandal’ samples are very close to its mean firing time on ‘Ankle boot’ samples.

Figure 6: The confusion matrix of R-CSNN on Fashion-MNIST.

In Table 5, we show the mean firing time of correct output neurons and the mean required number of spikes in all layers. Again, it is not necessary to give all the input spikes of an image to the network and by firing a limited number of neurons in each layer, the network can recognize the image category.

Category T_shirt Trouser Pullover Dress Coat Sandal Shirt Sneaker Bag Ankle boot
MFT 73 68 75 72 74 70 77 64 72 66
MRN 2836 2096 3227 2386 3169 1886 3020 1771 3015 2592
Table 5: The mean firing time (MFT) of the correct output neuron and the mean required number (MRN) of spikes in all the layers for each category of Fashion-MNIST in R-CSNN.

5.2 Binary weights

5.2.1 MNIST dataset

Here we apply STiDi-BP to directly train a CSNN with binary synaptic weights (B-CSNN) and evaluate it on the MNIST dataset. The network has the same structure and parameters as the R-CSNN in Section 5.1.1, except that the scaling factor (SCF) of each layer has been added as another trainable parameter. The parameter settings are provided in Table 6. In the convolutional layer, the value of the parameter is shared between all the weights.

layer initial SCF
convolutional 0.0001 [0, 2]
Hidden 0.001 [0, 3]
Output 0.0001 [0, 2]
Table 6: Model parameters for MNIST dataset in B-CSNN.

The mean required number of spikes in each layer is depicted in Figure 7.

Figure 7: The mean required number of spikes in the input, convolutional, hidden, and total layers in B-CSNN.

By comparing Figure 7 with Figure 3, we see that the mean required number of spikes in the input and convolutional layers are almost equal to the R-CSNN and the difference is in the hidden layer, where, B-CSNN needs more number of spikes than R-CSNN. In fact, due to the use of binary weights, the B-CSNN should wait for more time steps in the hidden layer to detect the corresponding category of an input image. Therefore more spikes are generated in the hidden layer. For example, the network needs about spikes in total layers to correctly recognize digit , while, there was spikes in R-CSNN.

In Table 7, we show the classification accuracy of STiDi-BP on the two networks, R-CSNN and B-CSNN, and compare it with BS4NN [25]. As mentioned before, BS4NN is the only other network aimed at directly training multi-layer temporal SNNs with binary weights. Because it employs a convolutional structure, B-CSNN outperforms BS4NN. Compared to R-CSNN, the performance of B-CSNN drops by only 0.6%, which is due to the use of binary weights.

Model         Structure          Accuracy (%)
BS4NN [25]    784-600-10         97.0
R-CSNN        40C5-P2-1000-10    99.2
B-CSNN        40C5-P2-1000-10    98.6
Table 7: The classification accuracies of recent binary SNNs with direct training on the MNIST dataset.

5.2.2 Fashion-MNIST dataset

Here we evaluate B-CSNN on the Fashion-MNIST dataset. The network has the structure of with the initial scaling factors in range . The value of is for the convolutional layers and for the hidden and the output layers. In the convolutional layers, the value of is different for each convolutional filter and is trained independently of the others. Other parameters are the same as Table 3.

We illustrate the classification accuracy of the proposed learning algorithm on B-CSNN and R-CSNN in Table 8 and compare them with the BS4NN.

Model         Structure                  Accuracy (%)
BS4NN [25]    784-1000-10                87.3
R-CSNN        20C5-P2-40C5-P2-1000-10    92.8
B-CSNN        20C5-P2-40C5-P2-1000-10    92.0
Table 8: The classification accuracies of recent binary SNNs with direct training on the Fashion-MNIST dataset.

As seen, B-CSNN outperforms BS4NN due to the use of a convolutional structure, and there is only a 0.8% drop compared to R-CSNN. Therefore, the proposed algorithm is able to directly train an SNN with binary weights without any significant drop in the performance of the network.

The mean firing time of the correct output neurons along with the mean required number of spikes in all layers are given in Table 9. As seen, the mean required number (MRN) of spikes in all layers is higher in B-CSNN than in R-CSNN for every category. This can be due to the use of binary weights, which makes B-CSNN wait for more time steps to detect the category of an input image.

Category T_shirt Trouser Pullover Dress Coat Sandal Shirt Sneaker Bag Ankle boot
MFT 74 70 74 73 72 71 75 68 71 67
MRN 3013 2255 3418 2541 3351 2023 3204 1824 3201 2742
Table 9: The mean firing time (MFT) of the correct output neuron and the mean required number (MRN) of spikes in all the layers for each category of Fashion-MNIST in B-CSNN.

6 Discussion

In this paper, we used a convolutional SNN as a deep SNN structure, in two modes: with real-valued weights and with binary weights. We then employed the proposed supervised learning algorithm, STiDi-BP, to directly train both networks. In the learning phase, we applied gradient descent (GD) to each layer independently to avoid backward recursive gradient computation. Consequently, the desired firing times at the middle layers are defined using presynaptic spike time displacement, while the desired output spike times are defined using the relative timing of the output neurons.

The most important advantage of our proposed approach is the use of temporal single-spike coding. In such methods, calculations in the backward direction are performed only at the actual firing times, so the error need not be backpropagated through all the time steps, which decreases the computational cost and the required storage space. The space complexity in each layer during the backward pass is . In rate coding schemes, by contrast, the space complexity in each layer in the backward pass is , where is the number of time steps, because the error is backpropagated through all the time steps. Also, contrary to rate-based CSNNs, the max-pooling operation can be done simply by propagating the first spike emerging inside the receptive window of each pooling neuron.
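
The first-spike pooling described above amounts to taking the minimum firing time inside each pooling window. A minimal Python sketch follows, assuming non-overlapping windows, a 2×2 window size (on the assumption that this is what P2 denotes), and `math.inf` as a marker for neurons that never fire. The function name and the sample firing times are hypothetical.

```python
import math

def first_spike_pool(spike_times, window=2):
    """Pool a 2-D map of first-spike times with non-overlapping windows.

    spike_times[r][c] is the firing time of the neuron at (r, c);
    math.inf marks a neuron that never fired (an assumed convention).
    """
    rows, cols = len(spike_times), len(spike_times[0])
    pooled = []
    for r in range(0, rows - window + 1, window):
        pooled_row = []
        for c in range(0, cols - window + 1, window):
            # The pooling neuron fires as soon as the earliest
            # presynaptic neuron in its window fires.
            earliest = min(
                spike_times[r + dr][c + dc]
                for dr in range(window)
                for dc in range(window)
            )
            pooled_row.append(earliest)
        pooled.append(pooled_row)
    return pooled

# Made-up firing times for a 4x4 feature map.
times = [
    [5, 9, math.inf, 7],
    [3, 8, 6, math.inf],
    [math.inf, math.inf, 2, 4],
    [math.inf, math.inf, 1, 9],
]
pooled = first_spike_pool(times)  # [[3, 6], [inf, 1]]
```

A window in which no neuron fires stays silent, which the sketch represents by leaving its pooled time at `math.inf`.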

Many of the existing supervised learning algorithms are based on rate or multi-spike coding [28, 29, 30, 31, 32, 33, 34], which require expensive computation. Few works focus on single-spike-based temporal coding [20, 4, 5, 6, 11, 25, 12, 7]. Among the existing single-spike-based temporal approaches, Zhang et al. [12] and Zhou et al. [7] are the only implementations of a convolutional SNN architecture with single-spike-timing-based supervised learning algorithms; the others are based on fully connected networks.

With the exception of [12, 7], the other CSNNs have been presented in two forms: (1) converted versions of traditional CNNs [13, 14, 15, 16], and (2) CSNNs that use rate coding or multi-spike-based coding schemes to be directly trained by BP [17, 18, 19]. These approaches are computationally expensive due to the use of rate coding and are based on backward recursive gradient computation.

Experimental results confirmed that the proposed approach can be applied to a CSNN and that it achieves acceptable results compared to [12]. The CSNN trained by this algorithm reaches 99.2% accuracy on the MNIST dataset and 92.8% on the more challenging Fashion-MNIST dataset.

Adapting the proposed learning rule to CSNNs removes backward recursive gradient computation and reduces the complexity of neural processing and the computational cost.

Binarizing the synaptic weights is another important improvement, which helps optimize hardware implementations of deep SNNs [35, 36]. Current binary SNNs are converted versions of pre-trained BANNs [21, 22, 23, 24]: a BANN is trained using traditional BP and then converted into the equivalent BSNN with rate-based neural coding. Here, we developed a CSNN with binary weights, which are the signs of the real-valued weights, and employed the proposed learning rule to train it directly. The forward pass is done with the binary weights and, in the backward pass, we update the real-valued weights. The proposed BCSNN uses a single bit of memory for each binary synapse and employs only one full-precision scaling factor in each layer or each convolutional filter. Therefore, the network size can be reduced by compared to a network with -bit floating-point synaptic weights [12]. Also, due to the use of single-bit synapses, the multiplier blocks that impose a high load of floating-point computation on the network can be replaced by unit increment and decrement blocks [12]. The evaluation results show that the BCSNN has a negligible performance drop compared to the CSNN, 0.6% and 0.8% on the MNIST and Fashion-MNIST datasets respectively, while it has more advantages than the CSNN in terms of hardware implementation. To the best of our knowledge, this is the first implementation that aims to directly train a deep single-spike-based temporal SNN with binary synaptic weights. However, one of the most important challenges we face is making the network deeper to solve more complex problems such as CIFAR10 or ImageNet classification, which can be the topic of our future work.
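
The two-weight-set scheme discussed above (binary signs in the forward pass, real-valued weights updated in the backward pass) can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the function names, the mapping of a zero weight to +1, the plain weighted sum used as the forward computation, and the gradient values, which are placeholders and not the STiDi-BP update rule.

```python
def binarize(real_weights):
    """Return the sign of each real-valued weight (+1 or -1).

    Mapping zero to +1 is an assumption made here for definiteness.
    """
    return [1 if w >= 0 else -1 for w in real_weights]

def forward_drive(inputs, real_weights, alpha):
    """Weighted input computed with binary weights and one scaling factor."""
    binary = binarize(real_weights)
    return alpha * sum(b * x for b, x in zip(binary, inputs))

def update_real_weights(real_weights, grads, lr):
    """Apply a gradient step to the real-valued weights only."""
    return [w - lr * g for w, g in zip(real_weights, grads)]

real_w = [0.30, -0.05, 0.10]  # real-valued weights kept for learning
inputs = [1.0, 1.0, 0.0]      # hypothetical presynaptic contributions
alpha = 0.5                   # hypothetical full-precision scaling factor

before = forward_drive(inputs, real_w, alpha)  # signs are (+1, -1, +1)
# Placeholder gradient: pushes the second real-valued weight above zero.
real_w = update_real_weights(real_w, grads=[0.0, -1.0, 0.0], lr=0.1)
after = forward_drive(inputs, real_w, alpha)   # second sign is now +1
```

The point of the sketch is that small updates accumulate in the real-valued weights and change the forward computation only when a weight crosses zero and its sign flips.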

References