Efficient Spiking Neural Networks with Logarithmic Temporal Coding

11/10/2018 ∙ by Ming Zhang, et al. ∙ Zhejiang University

A Spiking Neural Network (SNN) can be trained indirectly by first training an Artificial Neural Network (ANN) with the conventional backpropagation algorithm, then converting it into an SNN. The conventional rate-coding method for SNNs uses the number of spikes to encode the magnitude of an activation value, and may be computationally inefficient due to the large number of spikes. Temporal coding is typically more efficient, as it leverages the timing of spikes to encode information. In this paper, we present Logarithmic Temporal Coding (LTC), where the number of spikes used to encode an activation value grows logarithmically with the activation value, and the accompanying Exponentiate-and-Fire (EF) spiking neuron model, which only involves efficient bit-shift and addition operations. Moreover, we improve the training process of the ANN to compensate for approximation errors due to LTC. Experimental results indicate that the resulting SNN achieves competitive performance at significantly lower computational cost than related work.


I Introduction

Deep Learning based on Artificial Neural Networks (ANNs) has achieved tremendous success in many application domains in recent years [1, 2, 3]. Spiking Neural Networks (SNNs) use neuron action potentials, or spikes, for event-driven computation and communication. If the number of spikes is low, then most neurons and synapses in an SNN may be idle most of the time, hence the hardware implementation of SNNs can be much more efficient than that of conventional ANNs used in Deep Learning for inference tasks. Training, or learning, algorithms for SNNs [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14] are an active area of research, and are not as mature as conventional Deep Learning. Several recent SNN learning algorithms based on spiking variants of backpropagation [15, 16] achieved good performance, but their neuron models incur high computational cost. One alternative is to use ANN-to-SNN conversion techniques [17, 18, 19, 20, 21, 22, 23], which work by first training an ANN with the conventional backpropagation algorithm, then converting it into an SNN. Most existing ANN-to-SNN conversion methods are based on rate coding, where activations in the ANN are approximated by firing rates of the corresponding spike trains in the SNN, and the number of spikes for encoding a real-valued activation grows linearly with the activation value. For current methods [18, 19] to achieve performance comparable to the ANN, the neurons in the SNN have to fire a large number of spikes, which leads to high computational cost. Although several recent methods [22, 23] reduced the number of spikes by employing more efficient neural coding, these methods relied on complex neuron models that continually perform expensive operations.

In this paper, we propose an ANN-to-SNN conversion method based on novel Logarithmic Temporal Coding (LTC), where the number of spikes for encoding an activation grows logarithmically with the activation value in the worst case. LTC is integrated with the Exponentiate-and-Fire (EF) spiking neuron model. Note that the EF neuron model is not biologically realistic. It is an artificial model that we designed to use in conjunction with LTC for efficient computation in an SNN. If implemented with fixed-point arithmetic, an EF neuron performs a bit shift every time step and an addition for every incoming spike. Furthermore, we introduce the approximation errors of LTC into the ANN, and leverage the training process of the ANN to compensate for the approximation errors, eliminating most of the performance drop due to ANN-to-SNN conversion. Compared with rate-coding methods, our temporal-coding method achieves similar performance at significantly lower computational cost. Experimental results show that, for a CNN architecture with sufficient model capacity, the proposed method outperforms rate coding, achieving a test accuracy of 99.41% on the MNIST dataset with a computational cost reduction of 93.61%.

II Related work

Learning for single-layer SNNs is a well-studied topic. Supervised learning algorithms aimed to train an SNN to classify input spatiotemporal patterns [4] or to generate control signals with precise spike times in response to input spatiotemporal patterns [5, 6, 7]. The Tempotron rule [4] trained a spiking neuron to perform binary classification by firing one or more spikes in response to its associated class. ReSuMe [5] trained spiking neurons to generate target spike trains in response to given spatiotemporal patterns. Supervised learning was achieved by combining learning windows of Hebbian rules with a concept of remote supervision. The E-learning rule of Chronotron [6] improved memory capacity by minimizing a modified version of the Victor and Purpura (VP) distance between the output spike train and the target spike train with gradient descent. SPAN [7] also achieved improved memory capacity over ReSuMe, but with a simpler learning rule than Chronotron. The learning rule was a spiking variant of the Delta rule, where the input, output, and target values are replaced with convolved spike trains. These algorithms depend on predefined target spike trains, which are not available for neurons in the hidden layers of a multi-layer SNN. Unsupervised learning rules aimed to train an SNN to detect spatiotemporal patterns or extract features from input stimuli. In [9], a population of spiking neurons connected with lateral inhibitory synapses was trained using Spike Time-Dependent Plasticity (STDP) to recognize different spatiotemporal patterns. In [8], an event-driven variant of contrastive divergence was proposed to train a restricted Boltzmann machine constructed with integrate-and-fire neurons. These algorithms rely on specific network topologies with a single layer of spiking neurons. All of these learning algorithms are limited to SNNs with a single layer of neurons, and there is a large performance gap between the resulting SNNs and traditional ANNs.

Multi-layer SNNs are more difficult to train than single-layer SNNs. Backpropagation [24] cannot be directly applied to multi-layer SNNs due to the discontinuity associated with spiking activities. SpikeProp [10] adapted backpropagation for SNNs, circumventing the discontinuity by assuming the membrane potential to be a linear function of time in a small region around spike times. SpikeProp was extended by later works to use Resilient Propagation and QuickProp [11], and to train neurons in hidden layers and the output layer to fire multiple spikes [12, 13, 14]. However, there is still a large performance gap between these SNNs and traditional ANNs. Recent works avoided making assumptions about the discontinuity. In [15], a custom spiking neuron model incorporated a spike generation algorithm to approximate intermediate values of both the forward pass and the backward pass with spike trains. The spike generation algorithm had to add the encoded value to its internal state for every neuron at every time step. In [16], the membrane potential of a neuron was assumed to be a differentiable function of postsynaptic potentials and the afterpotential, and the backward pass propagated errors through the postsynaptic potentials and the afterpotential instead of input and output spike times. The exponential decay of postsynaptic potentials and afterpotentials requires two multiplications to be performed for every neuron at every time step. These learning algorithms trained small SNNs with several layers to achieve performance comparable to that of traditional ANNs. However, they rely on complex neuron models that perform expensive arithmetic operations every time step. Furthermore, how these algorithms scale to deeper SNNs remains unclear.

Another line of work trained SNNs indirectly by converting a trained ANN into its equivalent SNN. In [17], an ANN with Rectified Linear Unit (ReLU) nonlinearity was trained using backpropagation, and the weights were then directly mapped to an SNN of Integrate-and-Fire (IF) neurons with the same topology. In a similar way, an ANN with Softplus [20] or Noisy Softplus [21] nonlinearity could be converted to an SNN of more biologically plausible Leaky Integrate-and-Fire (LIF) neurons. There was a significant performance gap between the resulting SNN and the original ANN. The performance gap was narrowed by weight normalization [18] and by resetting the membrane potential by subtracting the firing threshold [19]. With these improvements, the resulting SNNs achieved performance comparable to the corresponding ANNs. All of these ANN-to-SNN conversion methods were based on rate coding, where the number of spikes it takes to encode an activation grows linearly with the activation. Empirically, the neurons in the SNN have to maintain high firing rates to achieve performance comparable to the original ANN. Since the computational cost a spiking neuron incurs is proportional to the number of incoming spikes, spike trains generated according to rate coding impose a high computational cost on downstream neurons.

Recent ANN-to-SNN conversion methods reduced the number of spikes used to encode activations by employing more efficient neural coding. In [22], an ANN was converted to an Adapting SNN (AdSNN) based on synchronous Pulsed Sigma-Delta coding. When driven by a strong stimulus, an Adaptive Spiking Neuron (ASN) adaptively raises its dynamic firing threshold every time it fires a spike, reducing its firing rate. However, an ASN has to perform four multiplications every time step to update its postsynaptic current, firing threshold, and refractory response. In [23], an ANN was converted to an SNN based on temporal coding, where an activation in the ANN was approximated by the latency to the first spike of the corresponding spike train in the SNN. Thus, at most one spike needs to be fired for each activation. However, each Time-To-First-Spike (TTFS) neuron keeps track of synapses which have ever received an input spike, and has to add the sum of the synaptic weights to its membrane potential every time step. Although these methods reduce the number of spikes, their complex neuron models still incur high computational cost.

ANN-to-SNN conversion approximates real-valued activations with spike trains. The approximation errors contribute to the performance gap between the SNN and the ANN. Fortunately, a deep ANN can be trained to tolerate the approximation errors, if the approximation errors are introduced during the training phase. In [25], each activation of an ANN was approximated with a power of two, where the exponents of the powers were constrained within a set of several consecutive integers. The error tolerance of an ANN allows it to compensate for approximation errors in the corresponding SNN during the training phase, which in turn helps close the performance gap between the SNN and the ANN.

Different from existing ANN-to-SNN conversion methods, we reduce both the number of spikes and the complexity of the neuron model. We propose encoding activations with Logarithmic Temporal Coding (LTC), where the number of spikes grows logarithmically with the encoded activation in the worst case. If implemented with fixed-point arithmetic, our Exponentiate-and-Fire (EF) neuron model involves only bit shifts and additions. A neuron performs a bit shift every time step and an addition for every incoming spike.

III Method

Every time a spiking neuron receives an input spike, the membrane potential of the neuron is increased by the postsynaptic potential (PSP). Evaluation of PSPs contributes most of the computational cost of an SNN. To reduce the number of spikes used to encode every activation throughout the ANN, we propose Logarithmic Temporal Coding (LTC). A real-valued activation is first approximated by retaining a predefined subset of bits in its binary representation. Then, a spike is generated for each of the remaining 1 bits and no spike is generated for the 0 bits. The number of spikes for encoding an activation grows logarithmically, rather than linearly, with the activation in the worst case.

We propose the Exponentiate-and-Fire (EF) neuron model, used in conjunction with LTC, which performs computation equivalent to that of an analog neuron with Rectified Linear Unit (ReLU) nonlinearity. Furthermore, we propose Error-tolerant ANN training, which leverages the ANN training process to compensate for approximation errors introduced by LTC and reduces the chance for EF neurons to fire undesired spikes.

We use the term “activation” to refer to output values of all analog neurons in an ANN, including neurons in the input, hidden and output layers.

III-A Logarithmic temporal coding

To encode a real-valued activation into a spike train, the activation is first represented as a binary number. Then, the activation is approximated by retaining only a subset of the bits of the binary number at a predefined set of consecutive positions; the other bits of the binary number are set to zero. Finally, for each remaining 1 bit of the binary number, a spike is generated with spike timing determined by the position of the bit in the binary number, while no spike is generated for the 0 bits.
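For illustration, the following Python sketch shows one way this encoding could be implemented. It is our own sketch, not the authors' code: the function name is hypothetical, the exponent range and the two LTC variants are formalized below, the mapping of the bit at exponent e to time step e_max − e is an assumption, and values above the largest representable value are assumed to saturate.

    def ltc_encode(x, e_min, e_max, single_spike=False):
        """Hypothetical sketch of LTC encoding (not the authors' exact code).

        Keeps only the binary digits of x at exponents e_min..e_max, emitting
        one spike per remaining 1 bit.  We assume the bit at exponent e maps to
        time step e_max - e (larger powers spike earlier).
        """
        spike_times = []
        remaining = max(x, 0.0)
        for e in range(e_max, e_min - 1, -1):   # scan kept bit positions, high to low
            if remaining >= 2.0 ** e:           # the bit at exponent e is 1
                spike_times.append(e_max - e)
                remaining -= 2.0 ** e
                if single_spike:                # single-spike LTC keeps only the
                    break                       # most significant power of two
        return spike_times

    # Example: ltc_encode(5.5, -2, 2) -> [0, 2, 3], encoding 4 + 1 + 0.5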

A real-valued activation can be represented as the sum of a possibly infinite series of powers of two with different integer exponents. We approximate the real-valued activation by constraining the exponents within a predefined exponent range, i.e., a finite set of consecutive integers from a minimum exponent to a maximum exponent. This approximation can be formulated as a closed-form equation:

(1)

Since the approximation defined by Eqn. 1 may involve multiple powers of two, we refer to this approximation as Multi-Power Logarithmic Approximation (Multi-Power LA).

As a special case, if we further require the approximation to involve at most one single power of two, the approximation reduces to Single-Power Logarithmic Approximation (Single-Power LA):

(2)

We refer to multi-power LA and single-power LA collectively as Logarithmic Approximation (LA).

In order to generate an LTC spike train from a logarithmic approximation, we define a time window consisting of discrete time steps. If a power of two contributes to the logarithmic approximation, i.e., it is present in the series of powers of two of the approximation, then a spike is present in the LTC spike train, with its spike time determined by the exponent of that power. There are two variants of LTC: Multi-spike LTC corresponds to multi-power LA, while Single-spike LTC corresponds to single-power LA.

Obviously, single-spike LTC encodes a real-valued activation into a spike train with at most one single spike. For multi-spike LTC, we derive an upper bound of the number of spikes used to encode a real-valued activation, as Proposition 1 states.

Proposition 1.

Suppose multi-spike LTC encodes a real value into a spike train with spikes. If , then ; if , then ; if , then .

Proof.

Let be the multi-power LA of . Any power with an integer exponent cannot contribute to , because . For a power with an integer exponent to contribute to , .

If , . Hence, no power of two contributes to the multi-power LA of . According to LTC, the spike train contains no spike, hence .

If , . In the worst-case, every with an integer exponent in the set contributes to . Thus, .

If , . Every with an integer exponent contributes to , hence . ∎

The logarithmic increase in the number of spikes for LTC is much slower than the linear increase for rate coding. The slow increase comes at the cost of significant approximation error. Since both LA and LTC are deterministic, the approximation error can be easily introduced into activations of an ANN during the training phase. We leverage the training process of an ANN to compensate for the approximation errors, as detailed in Section III-C.
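As a concrete illustration (with a hypothetical exponent range of 0 to 3): the activation 13 = 2^3 + 2^2 + 2^0 is encoded with three spikes under multi-spike LTC, and with a single spike encoding 2^3 = 8 under single-spike LTC, whereas a rate code that fires one spike per unit of activation would need 13 spikes.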

III-B Exponentiate-and-Fire (EF) neuron model

Figure 1 illustrates the Exponentiate-and-Fire (EF) neuron model. An EF neuron integrates input spikes using an exponentially growing PSP kernel, and generates output spikes using an exponentially growing afterhyperpolarizing potential (AHP) kernel. With the exponentially growing kernel, an EF neuron is able to perform computation that is equivalent to the computation of an analog neuron with ReLU nonlinearity.

Fig. 1: Computation graph of an Exponentiate-and-Fire (EF) neuron.

The EF neuron model is based on the Spike Response Model (SRM) [26] with discrete time. The membrane potential at a given time step is given by:

(3)

where the first sum runs over the set of synapses, each synapse contributing its weight multiplied by the total PSP elicited by the input spike train at that synapse; the second sum runs over the set of output spike times, each contributing the AHP elicited by the corresponding output spike; and the bracketed indicator evaluates to 1 if and only if the enclosed condition is true. The pre-reset membrane potential, defined next, is the membrane potential immediately before the reset:

(4)

III-B1 Input integration

Input spike trains of a neuron are generated using the input exponent range of the neuron, and are presented to the neuron during its input time window. An exponentially growing PSP kernel is used to integrate input spikes:

(5)

where the kernel depends on the current time and the time of the input spike. With this PSP kernel, the PSP elicited by an input spike takes its initial value at the spike time, and doubles every time step thereafter.

The total PSP elicited by an input spike train at the synapse is the superposition of PSPs elicited by all spikes in the spike train:

(6)

where the sum runs over the set of spike times of the input spike train.

If the EF neuron does not fire any output spike before the end of its input time window, then no output spike interferes with input integration, and the EF neuron computes a weighted sum of the LAs of its input spike trains, as Lemma 1 states.

Lemma 1.

At the end of its input time window, the pre-reset membrane potential of an EF neuron equals the sum of the LAs of its input LTC spike trains weighted by the corresponding synaptic weights, provided that the EF neuron does not fire any output spike during the input time window.

Proof.

According to LA, the approximation is a sum of powers of two with exponents in the exponent range. According to LTC, each such power corresponds to a spike at a specific time step within the input time window. The total PSP elicited by an input LTC spike train at the end of the input time window is therefore given by

(7)

Since the EF neuron does not fire any output spike before the end of its input time window, the pre-reset membrane potential reduces to a weighted sum of the total PSPs elicited by the input spike trains:

(8)

completing the proof. ∎

III-B2 Output spike generation

The goal of an EF neuron is to generate an output LTC spike train that encodes its result using its output exponent range, and to present the spike train within its output time window. The output time window starts at the last time step of the input time window and lasts for a fixed number of time steps.

An EF neuron generates an output spike train by thresholding its exponentially growing membrane potential. Specifically, the EF neuron doubles its membrane potential every time step after the start of its output time window, as dictated by the exponentially growing PSP kernel and AHP kernel (detailed below), until its pre-reset membrane potential reaches its firing threshold from below, at which point it fires an output spike and its membrane potential is reset.

A Multi-Spike EF neuron resets its membrane potential by subtracting the firing threshold from it:

(9)

A Single-Spike EF neuron resets its membrane potential to 0:

(10)

After resetting its membrane potential, a multi-spike EF neuron doubles its membrane potential every time step thereafter, and may fire subsequent output spikes. In contrast, a single-spike EF neuron does not fire any subsequent spike, since its membrane potential remains zero after the reset.

If the EF neuron receives all input spikes within its input time window, then no input spike would interfere with output spike generation during its output time window, and the EF neuron generates the desired output LTC spike train within its output time window, as Lemma 2 states.

Lemma 2.

An EF neuron generates an output LTC spike train that encodes its result using its output exponent range, and presents the spike train within its output time window, if the EF neuron does not receive any input spike after the end of its input time window.

We prove Lemma 2 in Appendix A.

Theorem 1.

An EF neuron performs equivalent computation to the computation of an analog neuron with ReLU nonlinearity, and encodes the result into its output LTC spike train, if the following conditions hold:

  1. All input spikes are received within its input time window, and

  2. No output spikes are fired before the beginning of its output time window.

Proof.

Theorem 1 follows from Lemmas 1 and 2. ∎

However, with the spike generation mechanism alone, an EF neuron may fire undesired output spikes outside its output time window. An undesired early output spike before the output time window interferes with input integration of the neuron. In addition, the output time window of a layer of EF neurons is the input time window of the next layer . An undesired late output spike after the output time window interferes with output spike generation of the downstream neurons. Undesired output spikes break the equivalence between EF neurons and analog ReLU neurons, which in turn degrades the performance of the SNN.

In order to prevent undesired output spikes of an EF neuron from affecting the downstream neurons, we allow output spikes within the output time window to travel to the downstream neurons, and discard undesired output spikes outside the output time window. Furthermore, we reduce the chance for an EF neuron to fire an undesired early output spike by suppressing excessively large activations of the corresponding analog ReLU neuron, as detailed in Section III-C.

Algorithm 1 shows the operations an EF neuron performs at every time step. First, the membrane potential is doubled (Eqn. 5, 9 and 10). Then, the input current is calculated by summing up the weights of the synapses that receive an input spike at the current time step (Eqn. 3). The input current is scaled by the resistance (Eqn. 5) and the result is added to the membrane potential (Eqn. 3). If the membrane potential is greater than or equal to the firing threshold, an output spike is fired, and the membrane potential is reset accordingly (Eqn. 9 and 10).

From Algorithm 1, it can be seen that the EF neuron model can be efficiently implemented in hardware with fixed-point arithmetic. If the membrane potential is implemented as a fixed-point number, it can be doubled by a bit shift; if it is implemented as a floating-point number, it can be doubled by adding one to its exponent. The multiplication by the resistance can be avoided by pre-computing the product of the resistance and every synaptic weight and using the scaled synaptic weights at run-time. The other arithmetic operations are additions and subtractions.
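As an illustration of this fixed-point implementation, the following Python sketch performs one time step of an EF neuron. The function and variable names are ours, and the resistance is assumed to have been folded into pre-scaled integer weights, as suggested above.

    def ef_neuron_step(v, active_weights, threshold, multi_spike=True):
        """One time step of a fixed-point EF neuron (illustrative sketch).

        v, active_weights, and threshold are integers holding fixed-point values,
        with the resistance already folded into the pre-scaled synaptic weights.
        """
        v <<= 1                          # exponential growth: double the membrane potential
        for w in active_weights:         # one addition per synapse receiving a spike
            v += w
        fired = v >= threshold
        if fired:
            v = v - threshold if multi_spike else 0   # reset by subtraction or to zero
        return v, fired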

III-C Error-tolerant ANN training

Both LTC approximation errors and undesired early output spikes contribute to the performance gap between an ANN and the corresponding SNN. We introduce the approximation errors into the activations of the ANN by applying logarithmic approximation to every non-negative activation, and rely on the training process to compensate for the approximation errors. Furthermore, we regularize the loss function with the Excess Loss to suppress excessively large activations, which in turn reduces the chance for an EF neuron to fire an undesired early output spike.

For every analog neuron of the ANN, we apply LA to its non-negative activations, so that the downstream neurons receive the approximate activations instead of the original activations. The variant of LA corresponds to the variant of LTC used to generate the corresponding spike train in the SNN. Negative pre-activations of the output layer are not approximated using LA and remain unchanged. For each layer, the minimum exponent and the maximum exponent of the output exponent range are tuned as hyperparameters, similar to [25]. To reduce the number of hyperparameters, we use the same output exponent range for all hidden layers.

As can be seen in Eqn. 1 and 2, the derivative of the LA w.r.t. the real-valued activation is zero almost everywhere, which prevents backpropagation from updating the parameters of the lower layers of the ANN. To allow gradients to pass through LA, for both variants of LA, we define the derivative of the approximation w.r.t. the real-valued activation as

(11)
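A rough TensorFlow sketch of this idea follows (our own code, not the authors'). It assumes that multi-power LA amounts to clipping the activation to the representable range and zeroing the bits below 2^e_min, and that Eqn. 11 lets LA act as the identity in the backward pass, i.e., a straight-through estimator; the function names are ours.

    import tensorflow as tf

    def multi_power_la(x, e_min, e_max):
        # Assumed form of multi-power LA: clip to the representable range,
        # then zero out the bits below 2**e_min.
        lsb = 2.0 ** e_min                      # value of the least significant kept bit
        max_val = 2.0 ** (e_max + 1) - lsb      # all kept bits equal to 1
        x = tf.clip_by_value(x, 0.0, max_val)
        return tf.floor(x / lsb) * lsb

    def la_with_straight_through(x, e_min, e_max):
        # Forward pass applies the approximation; the backward pass sees the
        # identity, so gradients still reach the lower layers (cf. Eqn. 11).
        y = multi_power_la(x, e_min, e_max)
        return x + tf.stop_gradient(y - x)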

In order to suppress excessively large activations, we define the Excess Loss as

(12)

where the outer sum runs across training examples, the middle sum runs across all layers of the ANN, and the inner sum runs across all neurons of the layer; the summand depends on the activation of each neuron of the layer for each training example. The excess loss penalizes positive activations of every layer that exceed a threshold.

Input integration:
     Double the membrane potential (Eqn. 5, 9 and 10)
     Compute the input current as the sum of the weights of the synapses that receive an input spike at this time step (Eqn. 3)
     Add the input current, scaled by the resistance, to the membrane potential (Eqn. 3 and 5)
Output spike generation:
     if the membrane potential is greater than or equal to the firing threshold then
         Fire an output spike
         if the neuron is a multi-spike EF neuron then
              Subtract the firing threshold from the membrane potential (Eqn. 9)
         end if
         if the neuron is a single-spike EF neuron then
              Reset the membrane potential to 0 (Eqn. 10)
         end if
     end if
Algorithm 1 Operations performed by an EF neuron at every time step.

The excess loss is added to the loss function of the ANN, which is to be minimized by the training process:

(13)

where the first term is the loss of the ANN on the training data given its parameters, and the excess loss is weighted by a hyperparameter that controls its strength.
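A hedged TensorFlow sketch of how the excess loss and the total objective might be written follows. The exact penalty in Eqn. 12 and its threshold are not recoverable here, so we assume a squared hinge on activations above the largest LTC-representable value; the variable names are ours.

    def excess_loss(activations_per_layer, e_max):
        # Assumed form of Eqn. 12: penalize the part of each activation that
        # exceeds the largest value representable with exponents up to e_max.
        ceiling = 2.0 ** (e_max + 1)
        per_layer = [tf.reduce_sum(tf.square(tf.nn.relu(a - ceiling)))
                     for a in activations_per_layer]
        return tf.add_n(per_layer)

    # Eqn. 13: total objective = task loss + weighted excess loss, e.g.
    # total_loss = task_loss + excess_strength * excess_loss(layer_activations, e_max)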

Although the excess loss does not completely prevent EF neurons from firing undesired early output spikes, it makes undesired early output spikes unlikely. Our experiments show that performance of an SNN with LTC is very close to the performance of the corresponding ANN with LA; the negative impact of undesired early output spikes seems to be negligible.

IV Experimental results

IV-A Experimental setup

We conduct our experiments on a PC with an nVidia GeForce GTX 1060 GPU with a 6 GB frame buffer, a quad-core Intel Core i5-7300HQ CPU, and 8 GB main memory. We use TensorFlow [27] not only for training and testing ANNs, but also for simulating SNNs. For each SNN, we build a computation graph with operations performed by the SNN at every time step, where every spiking neuron outputs either 1 or 0 to indicate whether it fires an output spike or not. The computation graph is run once for every time step with appropriate input values.

We use the MNIST dataset of handwritten digits [24], which consists of 70000 28x28-pixel greyscale images of handwritten digits, divided into a training set of 60000 images and a test set of 10000 images. For hyperparameter tuning, we further divide the original training set into a training set of 55000 images and a validation set of 5000 images. The test set is only used to test ANNs and SNNs after all hyperparameters are fixed.

We use two CNN architectures in our experiments. One is the CNN-small architecture (12C5@28x28-P2-64C5@12x12-P2-F10) with limited model capacity. This architecture was also used in previous work [18, 21]. The other is the CNN-large architecture (32C5@28x28-P2-64C5@14x14-P2-F1024-F10) [28].

IV-B Configuration of training and testing

We consider 5 types of CNNs, each for both CNN-small and CNN-large:

  1. CNN-original: Original CNNs with zero biases, ReLU nonlinearity, and average pooling. CNNs of this type are converted to two types of SNNs. The SNN-rate-IF-rst-zero type uses the reset-to-zero mechanism [18], while the SNN-rate-IF-rst-subtract type uses the reset-by-subtraction mechanism [19]. We refer to SNN-rate-IF-rst-zero and SNN-rate-IF-rst-subtract collectively as SNN-rate-IF. Since data-based normalization was shown to outperform model-based normalization, the weights of the CNNs are normalized with data-based normalization.

  2. CNN-TF: Same as CNN-original, except that the transfer function proposed in [22] is used as the nonlinearity. The corresponding SNN type is SNN-ASN, where SNNs consist of Adaptive Spiking Neurons (ASNs) [22]. We do not implement the arousal mechanism.

  3. CNN-CR: Same as CNN-original, except that clamped ReLU [23] is used as the nonlinearity, and that max-pooling is used instead of average-pooling. The corresponding SNN type is SNN-TTFS, where SNNs consist of Time-To-First-Spike (TTFS) neurons [23].

  4. CNN-multi-power-LA: Same as CNN-original, except that all activations throughout the CNN are approximated with multi-power LA. The corresponding SNN type is SNN-multi-spike-LTC, where EF neurons in the hidden and output layers generate multi-spike LTC spike trains.

  5. CNN-single-power-LA: Same as CNN-multi-power-LA, except that activations of hidden neurons are approximated with single-power LA. The corresponding SNN type is SNN-single-spike-LTC, which is the same as SNN-multi-spike-LTC, except that the EF neurons in the hidden layers generate single-spike LTC spike trains.

We refer to CNN-multi-power-LA and CNN-single-power-LA collectively as CNN-LA, and SNN-multi-spike-LTC and SNN-single-spike-LTC collectively as SNN-LTC. For each CNN type, we train five CNNs separately with the same hyperparameters and convert them to SNNs.

For SNN-rate-IF, the maximum input rate for generating an input spike train is 1 spike per time step, since this maximum input rate was shown to achieve the best performance [18]. For CNN-TF and SNN-ASN, we adopt the hyperparameters for the transfer function and ASNs in [22]. The resting threshold and the multiplicative parameter are set to a large value of 0.1 to decrease the firing rates of ASNs. For both CNN-LA types, Table I shows the exponent ranges for different layers and the strength of the excess loss.

TABLE I: Exponent ranges (input, hidden, and output layers) and strength of excess loss for CNNs of the CNN-LA types (CNN-small and CNN-large).

For SNN-rate-IF and SNN-ASN types, each of the SNNs is simulated for 500 time steps. For SNN-TTFS, simulation for an input image is stopped after the output layer fires the first output spike [23].

IV-C Performance evaluation

Table II compares final average test accuracies of our ANN-to-SNN conversion methods with those of previous ANN-to-SNN conversion methods. The “Method” column shows SNN types, where the “SNN-” prefix is omitted. “small” and “large” in round brackets denote the CNN-small and CNN-large architectures, respectively. The “Dev.” column shows the maximum difference between the test accuracy of an SNN and the test accuracy of the corresponding CNN. For the SNN-rate-IF types, since input spike trains are generated stochastically, we test each of these SNNs five times. For each combination of CNN architecture and CNN/SNN type, the final average test accuracy in the table is obtained by averaging the final test accuracies of all test runs of the neural networks.

Method                                  Test accuracy (%)    Dev. (%)   # neurons
                                        CNN       SNN
Rate-IF-rst-zero (small) [18]           99.25     99.20      0.16
Rate-IF-rst-subtract (small) [19]       99.25     99.25      0.06
ASN (small) [22]                        99.43     99.43      0.04
TTFS (small) [23]                       99.22     98.53      0.83
Multi-spike-LTC (small) [this work]     99.23     99.23      0.00
Single-spike-LTC (small) [this work]    99.03     99.03      0.00
Rate-IF-rst-zero (large) [18]           99.27     99.24      0.09
Rate-IF-rst-subtract (large) [19]       99.27     99.27      0.12
ASN (large) [22]                        99.45     99.44      0.04
TTFS (large) [23]                       99.47     99.20      0.44
Multi-spike-LTC (large) [this work]     99.38     99.38      0.00
Single-spike-LTC (large) [this work]    99.41     99.41      0.02
Rate-LIF-Softplus [20]                  N/A       98.36      N/A        710
Rate-LIF-Noisy-Softplus [21]            99.05     98.85      0.20
TABLE II: Comparison of final average test accuracies of ANN-to-SNN methods.

For the CNN-small architecture, SNN-multi-spike-LTC achieves an average test accuracy that is lower than that of SNN-ASN and similar to those of the SNN-rate-IF types. SNN-single-spike-LTC achieves a lower average test accuracy than those of SNN-multi-spike-LTC and the SNN-rate-IF types. Both SNN-LTC types achieve a significantly higher average test accuracy than SNN-TTFS.

The difference in average test accuracy between SNN-rate-IF, SNN-ASN, and SNN-LTC is closely related to the model capacities of the corresponding CNN types. With a small exponent range size (4 for hidden layers), multi-power LA significantly decreases the precision of activations by mapping them to a few discrete values. The decrease in precision leads to a decrease in the model capacity of CNN-multi-power-LA. Hence multi-power LA can be seen as a regularizer. Single-power LA is a stronger regularizer than multi-power LA, since it further decreases the precision for activations. By contrast, the transfer function of CNN-TF maps real-valued activations to an interval of real numbers, which allows for much higher precision than the logarithmic approximations. Hence, the transfer function is a weaker regularizer than the logarithmic approximations.

For a small CNN architecture like CNN-small, which has limited model capacity even if all activations are real values, the strong regularization of the logarithmic approximations has a negative effect on the CNN-LA types' ability to model the training data. By contrast, the weak regularization of the transfer function has a negligible effect on CNN-TF's ability to model the training data, but helps it achieve a higher average test accuracy than CNN-original by mitigating overfitting.

For the CNN-large architecture, which has sufficient model capacity, both the logarithmic approximations and the transfer function have a negligible effect on the CNN types' ability to model the training data; they mitigate overfitting and help CNN-TF and the CNN-LA types achieve a higher average test accuracy than CNN-original. Therefore, the SNN-LTC types outperform the SNN-rate-IF types and achieve average test accuracies similar to that of the SNN-ASN type.

As shown in the “Dev.” column of Table II, for the SNN-LTC types, the test accuracy of every SNN is very close to the test accuracy of the corresponding CNN. The difference in test accuracy is slightly larger for CNN-TF and SNN-ASN, and much larger for the other CNN and SNN types, especially for CNN-CR and SNN-TTFS. For CNN-large, the performance gap between SNN-TTFS and CNN-CR prevents SNN-TTFS from achieving a higher average test accuracy than the SNN-LTC types, although CNN-CR achieves a higher average test accuracy than the CNN-LA types. There seems to be a closer similarity in behavior between SNN-LTC and CNN-LA than between other SNN types and their corresponding CNN types. The close similarity between SNN-LTC and CNN-LA in turn suggests that the excess loss is very effective in preventing EF neurons from firing undesired early spikes; the impact of the few undesired early spikes is negligible.

For both CNN-large and CNN-small, the SNN-LTC types outperform SNN types based on LIF neurons.

IV-D Computational cost evaluation

In this section, we compare the computational cost of our ANN-to-SNN conversion method with related work [18, 19, 22, 23].

In an SNN, every time a spike arrives at a synapse, which is referred to as a synaptic event, a postsynaptic potential is added to the membrane potential of the postsynaptic neuron. These operations contribute to most of the computational cost of an SNN. We use the average number of synaptic events that an SNN processes for every input image as a metric for the computational cost of the SNN. In addition, we also count the average number of spikes fired by all neurons of an SNN for every input image.
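As a small illustration (our own helper, not from the paper), the synaptic-event count for one image can be obtained by weighting each fired (or input) spike by the fan-out of the neuron that produced it:

    def count_synaptic_events(spike_counts, fan_outs):
        """Illustrative helper: total synaptic events processed for one image.

        spike_counts[i] is the number of spikes neuron i fired (or received as
        network input) for the image, and fan_outs[i] is the number of outgoing
        synapses of neuron i; every spike is delivered to each of those
        synapses, producing one synaptic event per synapse.
        """
        return sum(n * f for n, f in zip(spike_counts, fan_outs))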

Figure 2 shows the experimental results for CNN-small. For each of SNN-rate-reset-zero, SNN-rate-reset-subtract, SNN-ASN, and SNN-TTFS, the computational cost and test accuracy at every time step during a test run of an SNN are plotted as a point. For every time step, these computational costs and test accuracies are averaged over all test runs of the SNN type. The resulting average computational costs and average test accuracies are plotted as a line. For SNN-LTC types, only the final computational cost and the final test accuracy are shown for every SNN.

Fig. 2: Computational costs and accuracies of SNNs with the CNN-small architecture. The “SNN-” prefix is omitted. The figure has four panels: complete and close-up views of computational cost in terms of synaptic events, and complete and close-up views in terms of spikes; each close-up panel shows the region in the green box of the corresponding complete panel.

As shown in Figure 2, SNN-LTC types achieve high test accuracies at low computational costs. At the same average computational costs, the SNN-rate-IF types and the SNN-ASN type achieve significantly lower average test accuracies ranging from 9.8% to 98%. The average test accuracies of SNN-rate-IF and SNN-ASN increase quickly with increasing computational costs at an early stage of the test runs, and then fluctuate near their maximum values for a long time until the end of simulation. The average test accuracy of SNN-TTFS increases rapidly with increasing computational costs at the end of simulation, when the output layers of the SNNs fire their first output spikes.

In order to compare the ever-changing average computational costs of previous ANN-to-SNN conversion methods with the final average computational costs of the SNN-LTC types, we find two kinds of reference computational costs for each of SNN-rate-IF-rst-zero, SNN-rate-IF-rst-subtract, SNN-ASN, and SNN-TTFS. One is the stable computational cost, at which the average test accuracy converges to the final average test accuracy. Specifically, we consider the average test accuracy to have converged if it remains within a small range around the final average test accuracy until the end of the simulation time. The other kind of reference computational cost is the matching computational cost w.r.t. each of the SNN-LTC types, at which the average test accuracy of the SNN-rate-IF, SNN-ASN, or SNN-TTFS type starts to surpass the average test accuracy of the SNN-LTC type. The reference computational costs are marked with vertical lines in Figure 2.

Table III compares computational costs of our ANN-to-SNN conversion methods with those of previous ANN-to-SNN conversion methods, for the CNN-small architecture. For every SNN-rate-IF type and the SNN-ASN type, both the stable computational costs and the matching computational costs are shown, along with the ratios (in percentage) of the SNN-LTC types’ computational costs to the reference computational costs. The matching computational cost of SNN-rate-rst-zero w.r.t. SNN-multi-spike-LTC is not shown, because the average test accuracy of SNN-multi-spike-LTC is higher than the highest average test accuracy of SNN-rate-rst-zero. The matching computational costs of SNN-TTFS are not shown for the same reason.

As shown in Table III, the average computational costs of the SNN-LTC types are much lower than the reference computational costs of the SNN-rate-IF types and the SNN-ASN type. Compared with the SNN-rate-IF types, SNN-multi-spike-LTC achieves a similar average test accuracy while reducing the computational cost by more than 80% in terms of synaptic events and more than 65% in terms of spikes; SNN-single-spike-LTC reduces the computational cost by more than 76% in terms of synaptic events and more than 63% in terms of spikes, at the cost of a decrease of 0.22% in final average test accuracy. Compared with the SNN-ASN type, SNN-multi-spike-LTC reduces the computational cost by more than 69% in terms of synaptic events and more than 64% in terms of spikes, at the cost of a decrease of 0.2% in final average test accuracy; SNN-single-spike-LTC reduces the computational cost by more than 73% in terms of synaptic events and more than 72% in terms of spikes, at the cost of a decrease of 0.4% in final average test accuracy. Compared with SNN-single-spike-LTC, SNN-multi-spike-LTC achieves a higher average test accuracy at a higher average computational cost.

TABLE III: Comparison of computational costs of SNN types with the CNN-small architecture (numbers of synaptic events and spikes of SNN-multi-spike-LTC and SNN-single-spike-LTC relative to the stable and matching reference costs of the SNN-rate-rst and SNN-ASN types, and the stable cost of SNN-TTFS).

Compared with SNN-TTFS, both SNN-LTC types achieve significantly higher average test accuracies, but at much higher average computational costs in terms of synaptic events. However, for SNN-TTFS, the number of synaptic events is an underestimate of the true computational cost. According to the membrane potential update rule (Equation (4) in [23]), a TTFS neuron keeps track of the synapses which have ever received an input spike, and adds the sum of the synaptic weights to its membrane potential every time step. The number of synaptic events accounts for the updates of the sum of synaptic weights, not the updates of the membrane potential. As shown in Table IV, the number of membrane potential updates (other ADDs) dominates the true computational cost of SNN-TTFS. The computational costs of the SNN-LTC types are similar to the true computational cost of SNN-TTFS. The average computational cost of SNN-multi-spike-LTC is 5.20% higher, and the average computational cost of SNN-single-spike-LTC is 11.43% lower.

TABLE IV: Comparison of computational costs of SNN-LTC types and the SNN-TTFS type with the CNN-small architecture (numbers of ADDs for synaptic events, other ADDs, and total computational cost).

Figure 3 shows computational costs and test accuracies of SNNs with the CNN-large architecture. The SNN-LTC types achieve high test accuracies at low computational costs. At the average computational costs of the SNN-LTC types, the SNN-rate-IF types and the SNN-ASN type achieve very poor average test accuracies around 9.8%.

Fig. 3: Computational costs and accuracies of SNNs with the CNN-large architecture. The “SNN-” prefix is omitted. The figure has four panels: complete and close-up views of computational cost in terms of synaptic events, and complete and close-up views in terms of spikes; each close-up panel shows the region in the green box of the corresponding complete panel.

Similar to Table III, Table V compares computational costs of our ANN-to-SNN conversion methods with those of previous ANN-to-SNN conversion methods, for the CNN-large architecture. For the SNN-rate-IF types and the SNN-TTFS type, only the stable computational costs are shown, since the average test accuracies of the SNN-LTC types are higher than the highest average test accuracies of these types. Compared with the SNN-rate-IF types, the SNN-LTC types achieve higher average test accuracies while reducing the computational cost by more than 92% in terms of synaptic events and more than 91% in terms of spikes. Compared with the SNN-ASN type, the SNN-LTC types reduce the computational cost by more than 76% in terms of synaptic events and more than 75% in terms of spikes, at the cost of a slight decrease of less than 0.1% in final average test accuracy. SNN-single-spike-LTC slightly outperforms SNN-multi-spike-LTC at a lower computational cost.

TABLE V: Comparison of computational costs of SNN types with the CNN-large architecture (numbers of synaptic events and spikes of the SNN-LTC types relative to the stable reference costs of the SNN-rate-rst, SNN-ASN, and SNN-TTFS types, and the matching cost of SNN-ASN).

Both SNN-LTC types achieve higher average test accuracies than the SNN-TTFS type. As shown in Table VI, SNN-multi-spike-LTC and SNN-single-spike-LTC reduce the average computational cost by 41.22% and 43.22%, respectively.

TABLE VI: Comparison of computational costs of SNN-LTC types and the SNN-TTFS type with the CNN-large architecture (numbers of ADDs for synaptic events, other ADDs, and total computational cost).

V Conclusions

In this work, we propose an ANN-to-SNN conversion method based on the novel Logarithmic Temporal Coding (LTC) and the accompanying Exponentiate-and-Fire (EF) neuron model. Moreover, we introduce the approximation errors of LTC into the ANN, and train the ANN to compensate for the approximation errors, eliminating most of the performance drop due to ANN-to-SNN conversion. The experimental results show that the proposed method achieves competitive performance at a significantly lower computational cost.

In future work, we are going to explore the combination of our logarithmic temporal coding, which sparsifies spike trains in time, and regularization techniques that sparsify spike trains across spiking neurons. Sparsifying spike trains across both space and time may help achieve further computational efficiency.

Appendix A An exponentiate-and-fire neuron generates a logarithmic temporal coding spike train

In this section, we prove Lemma 2 in Section III-B2.

We observe that an EF neuron doubles its membrane potential every time step if the neuron does not receive or fire any spike, as Lemma 3 states.

Lemma 3.

Let two time steps be given. The pre-reset membrane potential of an EF neuron at the later time step equals its pre-reset membrane potential at the earlier time step, doubled once for every intervening time step, if the following conditions hold:

  1. the EF neuron does not receive any input spike during the time interval between the two time steps, and

  2. the EF neuron does not fire any output spike during this time interval.

Proof.

Lemma 3 follows from the definition of the membrane potential (Eqn. 3), the definition of the pre-reset membrane potential (Eqn. 4), and the exponentially growing postsynaptic potential and afterhyperpolarizing potential kernels (Eqn. 5, 9 and 10). ∎

With Lemma 3, we prove Lemma 2 below.

Proof.

Depending on the value to be encoded, there are four cases, which we consider in turn.

If the value is zero or negative, its logarithmic approximation is 0, and the desired LTC spike train contains no spikes. For the EF neuron, by Lemma 3, the pre-reset membrane potential remains zero or negative during the output time window, and the neuron does not fire any output spike during this time interval. Hence, the neuron generates the desired LTC spike train within its output time window.

If the value is positive but smaller than the smallest power of two in the exponent range, its logarithmic approximation is also 0, and the desired LTC spike train contains no spikes.

For the EF neuron, by Lemma 3, the first output spike time satisfies the following condition:

(14)

Solving the equation, we have

(15)

since , . In other words, the neuron fires the first output spike after the end of its output time window. Hence, the neuron generates the desired LTC spike train within its output time window.

For the remaining cases, we derive the spike times of the desired LTC spike train and the output spike times of the EF neuron, and show that the output spike train of the EF neuron is consistent with the desired LTC spike train at the end of this proof.

The case where corresponds to the case of Eqn. 1 where . In this case, Eqn. 1 can be formulated as

(16)
(17)
(18)

where the sum in Eqn. 16 runs across exponents from to the smallest . Note that gives the single-power LA of . By substituting and into Eqn. 17, we derive the spike times of the desired LTC spike train:

(19)

where is the -th output spike time. For both multi-spike LTC and single-spike LTC, Eqn. 19 gives the first spike time . For multi-spike LTC, Eqn. 19 also gives subsequent spike times. By further substituting Eqn. 17 and into Inequality 18, we derive constraints on the spike times:

(20)
(21)

For the EF neuron, every output spike time within the output time window satisfies the following conditions:

(22)
(23)

The first output spike may be fired either at time , if ; or at time (Lemma 3), if . In both cases, the first output spike time satisfies Eqns. 14 and 15.

In the case of single-spike LTC, the EF neuron fires a single output spike at . In the case of multi-spike LTC, the EF neuron may fire subsequent output spikes. Consider every two consecutive output spike times and , where . By Lemma 3, the pre-reset membrane potentials can be formulated as

(24)
(25)

Solving the recurrence relation above, we have

(26)

By substituting Eqn. 26 into Inequality 22 and considering Inequality 23, we have

(27)

By substituting Eqn. 26 into Inequality 22 and solving the resulting inequality for the minimum integer value for , we have

(28)

The case where corresponds to the case of Eqn. 1 where . In this case, Eqn. 1 can be formulated as

(29)
(30)

Note that gives the single-power LA of . By substituting into Eqn. 30, we derive the spike times of the desired LTC spike train:

(31)

For both multi-spike LTC and single-spike LTC, Eqn. 31 gives the first spike time . For multi-spike LTC, Eqn. 31 also gives subsequent spike times.

For the EF neuron, since , the first output spike time is

(32)

In the case of single-spike LTC, the EF neuron fires only a single output spike. In the case of multi-spike LTC, suppose the EF neuron fires an output spike at the time step . By Lemma 3,

(33)

It is easy to see that, if , then , and will be the next output spike time. Since , the EF neuron fires an output spike at every time step within its output time window. Hence,

(34)

By comparing Eqn. 15 and 28 with Eqn. 19, Inequality 27 with Inequality 20, Inequality 23 with Inequality 21, and Eqn. 34 with Eqn. 31, it can be seen that the output spike train of the EF neuron within its output time window is consistent with the desired LTC spike train, except that every output spike time of the EF neuron is larger than the corresponding spike time of the desired LTC spike train. The difference is due to the fact that the output time window of the EF neuron starts at the last time step of the input time window.

Therefore, in all cases, the EF neuron generates the desired LTC spike train within its output time window, completing the proof. ∎

References