Spiking Neural Networks (SNNs) use neuron action potentials, or spikes, for event-driven computation and communication. If the number of spikes is low, then most neurons and synapses in an SNN may be idle most of the time, so a hardware implementation of an SNN can be much more efficient than the conventional ANNs used in Deep Learning for inference tasks. Training, or learning, algorithms for SNNs [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14] are an active area of research, and are not as mature as those of conventional Deep Learning. Several recent SNN learning algorithms based on spiking variants of backpropagation [15, 16] achieved good performance, but their neuron models incur high computational cost. One alternative is to use ANN-to-SNN conversion techniques [17, 18, 19, 20, 21, 22, 23], which work by first training an ANN with the conventional backpropagation algorithm and then converting it into an SNN. Most existing ANN-to-SNN conversion methods are based on rate coding, where activations in the ANN are approximated by firing rates of the corresponding spike trains in the SNN, and the number of spikes for encoding a real-valued activation grows linearly with the activation value. For current methods [18, 19] to achieve performance comparable to the ANN, the neurons in the SNN have to fire a large number of spikes, which leads to high computational cost. Although several recent methods [22, 23] reduced the number of spikes by employing more efficient neural coding, these methods relied on complex neuron models that continually perform expensive operations.
In this paper, we propose an ANN-to-SNN conversion method based on a novel Logarithmic Temporal Coding (LTC), where the number of spikes for encoding an activation grows logarithmically with the activation value in the worst case. LTC is integrated with the Exponentiate-and-Fire (EF) spiking neuron model. Note that the EF neuron model is not biologically realistic. It is an artificial model that we designed to use in conjunction with LTC for efficient computation in an SNN. If implemented with fixed-point arithmetic, an EF neuron performs a bit shift every time step and an addition for every incoming spike. Furthermore, we introduce the approximation errors of LTC into the ANN, and leverage the training process of the ANN to compensate for these errors, eliminating most of the performance drop due to ANN-to-SNN conversion. Compared with rate-coding methods, our temporal-coding method achieves similar performance at significantly lower computational cost. Experimental results show that, for a CNN architecture with sufficient model capacity, the proposed method outperforms rate-based coding, achieving a test accuracy of 99.41% on the MNIST dataset and a computational cost reduction of 93.61%.
II Related work
also achieved improved memory capacity over ReSuMe, but with a simpler learning rule than Chronotron. The learning rule was a spiking variant of the Delta rule, where the input, output, and target values are replaced with convolved spike trains. These algorithms depend on predefined target spike trains, which are not available for neurons in the hidden layers of a multi-layer SNN. Unsupervised learning rules aimed to train an SNN to detect spatiotemporal patterns or extract features from input stimuli. In, a population of spiking neurons connected with lateral inhibitory synapses was trained using Spike Time-Dependent Plasticity (STDP) to recognize different spatiotemporal patterns. In , an event-driven variation of contrastive divergence was proposed to train a restricted Boltzmann machine constructed with integrate-and-fire neurons. These algorithms rely on specific network topologies with a single layer of spiking neurons. All of the learning algorithms above are limited to SNNs with a single layer of neurons, and there is a large performance gap between the resulting SNNs and traditional ANNs.
Multi-layer SNNs are more difficult to train than single-layer SNNs. Backpropagation cannot be directly applied to multi-layer SNNs due to the discontinuity associated with spiking activities. SpikeProp adapted backpropagation for SNNs, circumventing the discontinuity by assuming the membrane potential to be a linear function of time in a small region around spike times. SpikeProp was extended by later works to use Resilient Propagation and QuickProp, and to train neurons in hidden layers and the output layer to fire multiple spikes [12, 13, 14]. However, there is still a large performance gap between these SNNs and traditional ANNs. Recent works avoided making assumptions about the discontinuity. In , a custom spiking neuron model incorporated a spike generation algorithm to approximate intermediate values of both the forward pass and the backward pass with spike trains. The spike generation algorithm had to add the encoded value to its internal state for every neuron at every time step. In , the membrane potential of a neuron was assumed to be a differentiable function of postsynaptic potentials and the afterpotential, and the backward pass propagated errors through the postsynaptic potentials and the afterpotential instead of input and output spike times. The exponential decay of postsynaptic potentials and afterpotentials requires two multiplications to be performed for every neuron at every time step. These learning algorithms trained small SNNs with several layers to achieve performance comparable to that of traditional ANNs. However, they rely on complex neuron models that perform expensive arithmetic operations every time step. Furthermore, how these algorithms scale to deeper SNNs remains unclear.
Another line of work trained SNNs indirectly by converting a trained ANN into its equivalent SNN. In , an ANN with Rectified Linear Unit (ReLU) nonlinearity was trained using backpropagation and the weights were then directly mapped to an SNN of Integrate-and-Fire (IF) neurons with the same topology. In a similar way, an ANN with Softplus or Noisy Softplus nonlinearity could be converted to an SNN of more biologically plausible Leaky Integrate-and-Fire (LIF) neurons. There was a significant performance gap between the resulting SNN and the original ANN. The performance gap was narrowed by weight normalization and by resetting the membrane potential by subtracting the firing threshold. With these improvements, the resulting SNNs achieved performance comparable to the corresponding ANNs. All of these ANN-to-SNN conversion methods were based on rate coding, where the number of spikes it takes to encode an activation grows linearly with the activation. Empirically, the neurons in the SNN have to maintain high firing rates to achieve performance comparable to the original ANN. Since the computational cost a spiking neuron incurs is proportional to the number of incoming spikes, spike trains generated according to rate coding impose high computational cost on downstream neurons.
Recent ANN-to-SNN conversion methods reduced the number of spikes used to encode activations by employing more efficient neural coding. In , an ANN was converted to an Adapting SNN (AdSNN) based on synchronous Pulsed Sigma-Delta coding. When driven by a strong stimulus, an Adaptive Spiking Neuron (ASN) adaptively raises its dynamic firing threshold every time it fires a spike, reducing its firing rate. However, an ASN has to perform four multiplications every time step to update its postsynaptic current, firing threshold, and refractory response. In , an ANN was converted to an SNN based on temporal coding, where an activation in the ANN was approximated by the latency to the first spike of the corresponding spike train in the SNN. Thus, at most one spike needs to be fired for each activation. However, each Time-To-First-Spike (TTFS) neuron keeps track of synapses which have ever received an input spike, and has to add the sum of the synaptic weights to its membrane potential every time step. Although these methods reduce the number of spikes, their complex neuron models still incur high computational cost.
ANN-to-SNN conversion approximates real-valued activations with spike trains. The approximation errors contribute to the performance gap between the SNN and the ANN. Fortunately, a deep ANN can be trained to tolerate the approximation errors, if the approximation errors are introduced during the training phase. In , each activation of an ANN was approximated with a power of two, where the exponents of the powers were constrained within a set of several consecutive integers. The error tolerance of an ANN allows it to compensate for approximation errors in the corresponding SNN during the training phase, which in turn helps close the performance gap between the SNN and the ANN.
Different from existing ANN-to-SNN conversion methods, we reduce both the number of spikes and the complexity of the neuron model. We propose encoding activations with Logarithmic Temporal Coding (LTC), where the number of spikes grows logarithmically with the encoded activation in the worst case. If implemented with fixed-point arithmetic, our Exponentiate-and-Fire (EF) neuron model involves only bit shifts and additions. A neuron performs a bit shift every time step and an addition for every incoming spike.
Every time a spiking neuron receives an input spike, the membrane potential of the neuron is increased by the postsynaptic potential (PSP). Evaluation of PSPs accounts for most of the computational cost of an SNN. To reduce the number of spikes used to encode every activation throughout the ANN, we propose Logarithmic Temporal Coding (LTC). A real-valued activation is first approximated by retaining a predefined subset of bits in its binary representation. Then, a spike is generated for each of the remaining 1 bits and no spike is generated for the 0 bits. The number of spikes for encoding an activation grows logarithmically, rather than linearly, with the activation in the worst case.
We propose the Exponentiate-and-Fire (EF) neuron model, used in conjunction with LTC, which performs computation equivalent to that of an analog neuron with Rectified Linear Unit (ReLU) nonlinearity. Furthermore, we propose Error-tolerant ANN training, which leverages the ANN training process to compensate for approximation errors introduced by LTC and reduces the chance for EF neurons to fire undesired spikes.
We use the term “activation” to refer to output values of all analog neurons in an ANN, including neurons in the input, hidden and output layers.
III-A Logarithmic temporal coding
To encode a real-valued activation into a spike train, the activation is first represented as a binary number. Then, the activation is approximated by retaining only a subset of the bits of the binary number at a predefined set of consecutive positions; the other bits of the binary number are set to zero. Finally, for each remaining 1 bit of the binary number, a spike is generated with spike timing determined by the position of the bit in the binary number, while no spike is generated for the 0 bits.
A real-valued activation can be represented as the sum of a possibly infinite series of powers of two with different integer exponents . We approximate the real-valued activation by constraining the exponents within a predefined exponent range , i.e., a finite set of consecutive integers from to . This approximation can be formulated as a closed-form equation:
Since the approximation defined by Eqn. 1 may involve multiple powers of two, we refer to this approximation as Multi-Power Logarithmic Approximation (Multi-Power LA).
As a special case, if we further require the approximation to involve at most one single power of two, the approximation reduces to Single-Power Logarithmic Approximation (Single-Power LA):
We refer to multi-power LA and single-power LA collectively as Logarithmic Approximation (LA).
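The two approximations can be illustrated with a minimal Python sketch. The names `multi_power_la`, `single_power_la`, `e_min`, and `e_max` are hypothetical, chosen here for illustration. The saturation behaviour for activations at or above `2**(e_max + 1)` is an assumption inferred from the three cases of Proposition 1, not a transcription of Eqn. 1 and 2.

```python
import math

def multi_power_la(a: float, e_min: int, e_max: int) -> float:
    """Keep only the bits of `a` at exponents e_min..e_max (inclusive)."""
    if a < 2.0 ** e_min:
        return 0.0                            # no retained bit is set
    if a >= 2.0 ** (e_max + 1):
        # Assumed saturation: every retained bit is set.
        return 2.0 ** (e_max + 1) - 2.0 ** e_min
    step = 2.0 ** e_min
    return math.floor(a / step) * step        # clear the bits below e_min

def single_power_la(a: float, e_min: int, e_max: int) -> float:
    """Keep only the most significant retained bit of `a`."""
    if a < 2.0 ** e_min:
        return 0.0
    # frexp gives a = m * 2**x with 0.5 <= m < 1, so floor(log2(a)) == x - 1.
    _, x = math.frexp(a)
    return 2.0 ** min(x - 1, e_max)

# 5.75 is 101.11 in binary; with exponents -1..2 the retained bits are 101.1.
print(multi_power_la(5.75, -1, 2))   # 5.5
print(single_power_la(5.75, -1, 2))  # 4.0
```

With an exponent range of four consecutive integers, every non-negative activation maps to one of 16 values under the multi-power variant and one of 5 values under the single-power variant.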
In order to generate an LTC spike train from a logarithmic approximation , we define a time window with discrete time steps . If a power of two contributes to the logarithmic approximation , i.e., is present in the series of powers of two of , then a spike is present in the LTC spike train with a spike time . There are two variants of LTC: Multi-spike LTC corresponds to multi-power LA, while Single-spike LTC corresponds to single-power LA.
By construction, single-spike LTC encodes a real-valued activation into a spike train with at most one spike. For multi-spike LTC, we derive an upper bound on the number of spikes used to encode a real-valued activation, as Proposition 1 states.
Proposition 1. Suppose multi-spike LTC encodes a real value into a spike train with spikes. If , then ; if , then ; if , then .
Proof. Let be the multi-power LA of . Any power with an integer exponent cannot contribute to , because . For a power with an integer exponent to contribute to , .
If , . Hence, no power of two contributes to the multi-power LA of . According to LTC, the spike train contains no spike, hence .
If , . In the worst-case, every with an integer exponent in the set contributes to . Thus, .
If , . Every with an integer exponent contributes to , hence . ∎
The logarithmic increase in the number of spikes for LTC is much slower than the linear increase for rate coding. The slow increase comes at the cost of significant approximation error. Since both LA and LTC are deterministic, the approximation error can be easily introduced into activations of an ANN during the training phase. We leverage the training process of an ANN to compensate for the approximation errors, as detailed in Section III-C.
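The encoding can be sketched as follows. The function names and the parameters `e_min` and `e_max` are hypothetical. The spike-time convention (the bit with exponent `e` fires at time step `e_max - e`, so the most significant bit fires first) is an assumption inferred from the doubling PSP kernel of Section III-B, and saturation of large activations is assumed as in Proposition 1.

```python
from typing import List

def ltc_spike_times(a: float, e_min: int, e_max: int,
                    single_spike: bool = False) -> List[int]:
    """Return the sorted spike times (0 .. e_max - e_min) that encode `a`."""
    if a < 2.0 ** e_min:
        return []
    n_bits = e_max - e_min + 1
    # Express `a` in units of 2**e_min and saturate to n_bits bits.
    q = min(int(a / 2.0 ** e_min), (1 << n_bits) - 1)
    # Bit k of q has exponent e_min + k and fires at time (n_bits - 1) - k.
    times = [n_bits - 1 - k for k in range(n_bits - 1, -1, -1) if (q >> k) & 1]
    return times[:1] if single_spike else times

def ltc_decode(times: List[int], e_max: int) -> float:
    """Recover the logarithmic approximation from the spike times."""
    return sum(2.0 ** (e_max - t) for t in times)

times = ltc_spike_times(5.75, -1, 2)
print(times)                  # [0, 2, 3]
print(ltc_decode(times, 2))   # 5.5
```

In this sketch the spike count never exceeds `e_max - e_min + 1`, the size of the exponent range, regardless of how large the activation is.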
III-B Exponentiate-and-Fire (EF) neuron model
Figure 1 illustrates the Exponentiate-and-Fire (EF) neuron model. An EF neuron integrates input spikes using an exponentially growing PSP kernel, and generates output spikes using an exponentially growing afterhyperpolarizing potential (AHP) kernel. With these exponentially growing kernels, an EF neuron is able to perform computation that is equivalent to the computation of an analog neuron with ReLU nonlinearity.
The EF neuron model is based on the Spike Response Model (SRM)  with discrete time. The membrane potential at time is given by:
where is the set of synapses; is the weight of synapse ; is the total PSP elicited by the input spike train at synapse ; is the set of output spike times; is the AHP elicited by the output spike at time ; evaluates to 1 if and only if the condition enclosed within the brackets is true. is the pre-reset membrane potential immediately before the reset:
III-B1 Input integration
Input spike trains of a neuron are generated using the input exponent range of the neuron, and presented to the neuron during its input time window , where . The exponentially growing PSP kernel is used to integrate input spikes:
where is the current time; is the time of the input spike. With this PSP kernel, the PSP elicited by an input spike is equal to at the spike time , and doubles every time step thereafter.
The total PSP elicited by an input spike train at the synapse is the superposition of PSPs elicited by all spikes in the spike train:
where is the set of spike times of the input spike train.
If the EF neuron does not fire any output spike before , then no output spike would interfere with input integration, and the EF neuron computes a weighted sum of the LAs of its input spike trains, as Lemma 1 states.
Lemma 1. The pre-reset membrane potential of an EF neuron , if the EF neuron does not fire any output spike during the time interval , where is the LA of the -th input LTC spike train.
Proof. According to LA, . According to LTC, the spike time corresponding to the power is . The total PSP elicited by the -th input LTC spike train at is given by
Since the EF neuron does not fire any output spike before , reduces to a weighted sum of PSPs elicited by the input spike trains:
completing the proof. ∎
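Lemma 1 can be checked numerically with a short sketch. It assumes, as hypothetical conventions not recoverable from the text above, that a PSP starts at `2**e_min` times the synaptic weight at the spike time, and that the bit with exponent `e` fires at time step `e_max - e`. The function name `integrate` is also hypothetical.

```python
from typing import List, Sequence

def integrate(weights: Sequence[float], spike_trains: Sequence[List[int]],
              e_min: int, e_max: int) -> float:
    """Membrane potential at the last step of the input time window,
    assuming the neuron fires no output spike during the window."""
    v = 0.0
    for t in range(e_max - e_min + 1):
        v *= 2.0                        # every PSP doubles each time step
        for w, train in zip(weights, spike_trains):
            if t in train:
                v += w * 2.0 ** e_min   # a new PSP starts at 2**e_min
    return v

# Two inputs with exponent range -1..2:
# spike times [0, 2, 3] encode 4 + 1 + 0.5 = 5.5; spike time [1] encodes 2.
v = integrate([0.5, -1.0], [[0, 2, 3], [1]], e_min=-1, e_max=2)
print(v)  # 0.75, which equals 0.5 * 5.5 + (-1.0) * 2.0
```

A spike at time `t` is doubled `e_max - e_min - t` times by the end of the window, which is what turns the fixed initial PSP into the power of two that the spike encodes.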
III-B2 Output spike generation
The goal of an EF neuron is to generate an output LTC spike train that encodes using its output exponent range and present the spike train within its output time window. The output time window starts at the last time step of the input time window, and lasts for time steps.
An EF neuron generates an output spike train by thresholding its exponentially growing membrane potential. Specifically, the EF neuron doubles its membrane potential every time step after the time step , as dictated by the exponentially growing PSP kernel and AHP kernel (detailed below), until its pre-reset membrane potential reaches its firing threshold from below, when it fires an output spike at time , and its membrane potential is reset.
A Multi-Spike EF neuron resets its membrane potential by subtracting the firing threshold from it:
A Single-Spike EF neuron resets its membrane potential to 0:
After resetting its membrane potential, a multi-spike EF neuron doubles its membrane potential every time step thereafter, and may fire subsequent output spikes. In contrast, a single-spike EF neuron does not fire any subsequent spike, since its membrane potential remains zero after the reset.
If the EF neuron receives all input spikes within its input time window, then no input spike would interfere with output spike generation during its output time window, and the EF neuron generates the desired output LTC spike train within its output time window, as Lemma 2 states.
Lemma 2. An EF neuron generates an output LTC spike train that encodes using its output exponent range and presents the spike train within its output time window, if the EF neuron does not receive any input spike after the end of its input time window.
An EF neuron performs equivalent computation to the computation of an analog neuron with ReLU nonlinearity, and encodes the result into its output LTC spike train, if the following conditions hold:
All input spikes are received within its input time window, and
No output spikes are fired before the beginning of its output time window.
However, with the spike generation mechanism alone, an EF neuron may fire undesired output spikes outside its output time window. An undesired early output spike before the output time window interferes with input integration of the neuron. In addition, the output time window of a layer of EF neurons is the input time window of the next layer . An undesired late output spike after the output time window interferes with output spike generation of the downstream neurons. Undesired output spikes break the equivalence between EF neurons and analog ReLU neurons, which in turn degrades the performance of the SNN.
In order to prevent undesired output spikes of an EF neuron from affecting the downstream neurons, we allow output spikes within the output time window to travel to the downstream neurons, and discard undesired output spikes outside the output time window. Furthermore, we reduce the chance for an EF neuron to fire an undesired early output spike by suppressing excessively large activations of the corresponding analog ReLU neuron, as detailed in Section III-C.
Algorithm 1 shows the operations an EF neuron performs at every time step. First, the membrane potential is doubled (Eqn. 5, 9 and 10). Then, the input current is calculated by summing the weights of the synapses that receive an input spike at the current time step (Eqn. 3). The input current is scaled by the resistance (Eqn. 5) and the result is added to the membrane potential (Eqn. 3). If the membrane potential is greater than or equal to the firing threshold , an output spike is fired, and the membrane potential is reset accordingly (Eqn. 9 and 10).
From Algorithm 1, it can be seen that the EF neuron model can be efficiently implemented in hardware with fixed-point arithmetic. If is implemented as a fixed-point number, it can be doubled by a bit shift; if is implemented as a floating-point number, it can be doubled by an addition to its exponent. The multiplication by can be avoided by pre-computing for every synaptic weight and using the scaled synaptic weights at run-time. The other arithmetic operations are additions and subtractions.
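The per-time-step update described above can be sketched in fixed-point arithmetic as follows. This is a minimal illustration under stated assumptions, not a transcription of Algorithm 1: the class `EFNeuron`, the constant `FRAC_BITS`, the exponent ranges, and the choice of threshold `2**E_MAX_OUT` are all hypothetical, and the discarding of spikes outside the output time window is left out.

```python
from dataclasses import dataclass
from typing import List, Sequence

@dataclass
class EFNeuron:
    scaled_weights: List[int]  # fixed-point weights, pre-scaled offline
    threshold: int             # fixed-point firing threshold
    multi_spike: bool = True   # True: reset by subtraction; False: reset to 0
    v: int = 0                 # fixed-point membrane potential

    def step(self, input_spikes: Sequence[int]) -> int:
        self.v <<= 1                        # doubling is a single bit shift
        for w, s in zip(self.scaled_weights, input_spikes):
            if s:
                self.v += w                 # one addition per incoming spike
        if self.v >= self.threshold:
            self.v = self.v - self.threshold if self.multi_spike else 0
            return 1
        return 0

FRAC_BITS = 8                               # assumed fixed-point precision
ONE = 1 << FRAC_BITS
E_MIN_IN, E_MAX_IN = -1, 2                  # assumed input exponent range
E_MAX_OUT = 1                               # assumed largest output exponent

weights = [0.5, -1.0]
scaled = [int(w * 2.0 ** E_MIN_IN * ONE) for w in weights]   # [64, -128]
neuron = EFNeuron(scaled, threshold=int(2.0 ** E_MAX_OUT * ONE))

# Input spike times per synapse; they encode 5.5 and 2.0 respectively.
input_trains = [[0, 2, 3], [1]]
outputs = [neuron.step([int(t in train) for train in input_trains])
           for t in range(7)]
print(outputs)  # [0, 0, 0, 0, 0, 1, 1]

# The output window starts at t = 3; a spike at t encodes 2**(E_MAX_OUT - (t - 3)).
decoded = sum(2.0 ** (E_MAX_OUT - (t - 3)) for t, s in enumerate(outputs) if s)
print(decoded)  # 0.75 = ReLU(0.5 * 5.5 - 1.0 * 2.0)
```

Pre-scaling the weights offline is what removes the multiplication from the inner loop, so each step uses only a shift, additions, a comparison, and at most one subtraction.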
III-C Error-tolerant ANN training
Both LTC approximation errors and undesired early output spikes contribute to the performance gap between an ANN and the corresponding SNN. We introduce the approximation errors into the activations of the ANN by applying logarithmic approximation to every non-negative activation, and rely on the training process to compensate for the approximation errors. Furthermore, we regularize the loss function with the Excess Loss to suppress excessively large activations, which in turn reduces the chance for an EF neuron to fire an undesired early output spike.
For every analog neuron of the ANN, we apply LA to its non-negative activations, so that the downstream neurons receive the approximate activations instead of the original activations. The variant of LA corresponds to the variant of LTC used to generate the corresponding spike train in the SNN. Negative pre-activations of the output layer are not approximated using LA and remain unchanged. For each layer , the minimum exponent and the maximum exponent within the output exponent range are tuned as hyperparameters, similar to. To reduce the number of hyperparameters, we use the same output exponent range for all hidden layers.
As can be seen in Eqn. 1 and 2, the derivative of the LA w.r.t. the real-valued activation is zero almost everywhere, which prevents backpropagation from updating parameters of the bottom layers of the ANN. To allow gradients to pass through LA, for both variants of LA, we define the derivative of w.r.t. as
In order to suppress excessively large activations, we define the Excess Loss as
where the outer sum runs across training examples , the middle sum runs across all layers of the ANN, the inner sum runs across all neurons of the layer , and is the activation of the -th neuron of the layer for the -th training example. The excess loss punishes large positive activations of every layer that are greater than .
The excess loss is added to the loss function of the ANN, which is to be minimized by the training process:
where is the loss of the ANN on training data given parameters , and is a hyperparameter that controls the strength of the excess loss.
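The regularized objective can be sketched as follows. The exact form of the penalty is not recoverable from the text above, so this sketch assumes a squared hinge on the amount by which an activation exceeds `2**(e_max + 1)`; both that limit and the names `excess_loss`, `total_loss`, and `strength` are hypothetical.

```python
from typing import Sequence

def excess_loss(activations_per_layer: Sequence[Sequence[float]],
                e_max_per_layer: Sequence[int]) -> float:
    """Sum an assumed squared-hinge penalty over layers and activations.

    activations_per_layer[l] holds the activations of layer l, flattened
    across training examples and neurons.
    """
    total = 0.0
    for acts, e_max in zip(activations_per_layer, e_max_per_layer):
        limit = 2.0 ** (e_max + 1)          # assumed saturation point
        total += sum(0.5 * max(a - limit, 0.0) ** 2 for a in acts)
    return total

def total_loss(task_loss: float,
               activations_per_layer: Sequence[Sequence[float]],
               e_max_per_layer: Sequence[int],
               strength: float) -> float:
    """Task loss plus the excess loss weighted by a hyperparameter."""
    return task_loss + strength * excess_loss(activations_per_layer,
                                              e_max_per_layer)

# One layer with e_max = 2: only the activation above 8 is penalized.
print(excess_loss([[1.0, 9.0]], [2]))  # 0.5
```

Activations at or below the limit contribute nothing, so the regularizer leaves the network unconstrained inside the representable range.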
Although the excess loss does not completely prevent EF neurons from firing undesired early output spikes, it makes undesired early output spikes unlikely. Our experiments show that performance of an SNN with LTC is very close to the performance of the corresponding ANN with LA; the negative impact of undesired early output spikes seems to be negligible.
IV Experimental results
IV-A Experimental setup
We conduct our experiments on a PC with an NVIDIA GeForce GTX 1060 GPU with a 6 GB frame buffer, a quad-core Intel Core i5-7300HQ CPU, and 8 GB of main memory. We use TensorFlow not only for training and testing ANNs, but also for simulating SNNs. For each SNN, we build a computation graph with the operations performed by the SNN at every time step, where every spiking neuron outputs either 1 or 0 to indicate whether it fires an output spike or not. The computation graph is run once for every time step with appropriate input values.
We use the MNIST dataset of handwritten digits , which consists of 70000 28x28-pixel greyscale images of handwritten digits, divided into a training set of 60000 images and a test set of 10000 images. For hyperparameter tuning, we further divide the original training set into a training set of 55000 images and a validation set of 5000 images. The test set is only used to test ANNs and SNNs after all hyperparameters are fixed.
IV-B Configuration of training and testing
We consider 5 types of CNNs, each for both CNN-small and CNN-large:
CNN-original: Original CNNs with zero biases, ReLU nonlinearity, and average pooling. CNNs of this type are converted to two types of SNNs. The SNN-rate-IF-rst-zero type uses the reset-to-zero mechanism , while the SNN-rate-IF-rst-subtract type uses the reset-by-subtraction mechanism . We refer to SNN-rate-IF-rst-zero and SNN-rate-IF-rst-subtract collectively as SNN-rate-IF. Since data-based normalization was shown to outperform model-based normalization, the weights of the CNNs are normalized with data-based normalization.
CNN-multi-power-LA: Same as CNN-original, except that all activations throughout the CNN are approximated with multi-power LA. The corresponding SNN type is SNN-multi-spike-LTC, where EF neurons in the hidden and output layers generate multi-spike LTC spike trains.
CNN-single-power-LA: Same as CNN-multi-power-LA, except that activations of hidden neurons are approximated with single-power LA. The corresponding SNN type is SNN-single-spike-LTC, which is the same as SNN-multi-spike-LTC, except that the EF neurons in the hidden layers generate single-spike LTC spike trains.
We refer to CNN-multi-power-LA and CNN-single-power-LA collectively as CNN-LA, and SNN-multi-spike-LTC and SNN-single-spike-LTC collectively as SNN-LTC. For each CNN type, we train five CNNs separately with the same hyperparameters and convert them to SNNs.
For SNN-rate-IF, the maximum input rate for generating an input spike train is 1 spike per time step, since this maximum input rate was shown to achieve the best performance . For CNN-TF and SNN-ASN, we adopt the hyperparameters for the transfer function and ASNs in . The resting threshold and the multiplicative parameter are set to a large value of 0.1 to decrease the firing rates of ASNs. For both CNN-LA types, Table I shows the exponent ranges for different layers and the strength of the excess loss.
For SNN-rate-IF and SNN-ASN types, each of the SNNs is simulated for 500 time steps. For SNN-TTFS, simulation for an input image is stopped after the output layer fires the first output spike .
IV-C Performance evaluation
Table II compares final average test accuracies of our ANN-to-SNN conversion methods with those of previous ANN-to-SNN conversion methods. The “Method” column shows SNN types, where the “SNN-” prefix is omitted. “small” and “large” in round brackets denote the CNN-small and CNN-large architectures, respectively. The “Dev.” column shows the maximum difference between the test accuracy of an SNN and the test accuracy of the corresponding CNN. For the SNN-rate-IF types, since input spike trains are generated stochastically, we test each of these SNNs five times. For each combination of CNN architecture and CNN/SNN type, the final average test accuracy in the table is obtained by averaging the final test accuracies of all test runs of the neural networks.
|Method||CNN test accuracy (%)||SNN test accuracy (%)||Dev.|
|Rate-IF-rst-zero (small) ||99.25||99.20||0.16|
|ASN (small) ||99.43||99.43||0.04|
|TTFS (small) ||99.22||98.53||0.83|
|Rate-IF-rst-zero (large) ||99.27||99.24||0.09|
|ASN (large) ||99.45||99.44||0.04|
|TTFS (large) ||99.47||99.20||0.44|
For the CNN-small architecture, SNN-multi-spike-LTC achieves an average test accuracy that is lower than that of SNN-ASN and similar to those of the SNN-rate-IF types. SNN-single-spike-LTC achieves a lower average test accuracy than those of SNN-multi-spike-LTC and the SNN-rate-IF types. Both SNN-LTC types achieve a significantly higher average test accuracy than SNN-TTFS.
The difference in average test accuracy between SNN-rate-IF, SNN-ASN, and SNN-LTC is closely related to the model capacities of the corresponding CNN types. With a small exponent range size (4 for hidden layers), multi-power LA significantly decreases the precision of activations by mapping them to a few discrete values. The decrease in precision leads to a decrease in the model capacity of CNN-multi-power-LA. Hence multi-power LA can be seen as a regularizer. Single-power LA is a stronger regularizer than multi-power LA, since it further decreases the precision for activations. By contrast, the transfer function of CNN-TF maps real-valued activations to an interval of real numbers, which allows for much higher precision than the logarithmic approximations. Hence, the transfer function is a weaker regularizer than the logarithmic approximations.
For a small CNN architecture like CNN-small, which has limited model capacity even if all activations are real values, the strong regularization of the logarithmic approximations has a negative effect on the ability of the CNN-LA types to model the training data. By contrast, the weak regularization of the transfer function has a negligible effect on the ability of CNN-TF to model the training data, but helps it achieve a higher average test accuracy than CNN-original by mitigating overfitting.
For the CNN-large architecture, which has sufficient model capacity, both the logarithmic approximations and the transfer function have a negligible effect on the ability of the CNN types to model the training data; they mitigate overfitting and help CNN-TF and the CNN-LA types achieve a higher average test accuracy than CNN-original. Therefore, the SNN-LTC types outperform the SNN-rate-IF types and achieve average test accuracies similar to that of the SNN-ASN type.
As shown in the “Dev.” column of Table II, for the SNN-LTC types, the test accuracy of every SNN is very close to the test accuracy of the corresponding CNN. The difference in test accuracy is slightly larger for CNN-TF and SNN-ASN, and much larger for other CNN and SNN types, especially for CNN-CR and SNN-TTFS. For CNN-large, the performance gap between SNN-TTFS and CNN-CR prevents SNN-TTFS from achieving a higher average test accuracy than the SNN-LTC types, although CNN-CR achieves a higher average test accuracy than the CNN-LA types. There seems to be a closer similarity in behavior between SNN-LTC and CNN-LA than between other SNN types and their corresponding CNN types. The close similarity between SNN-LTC and CNN-LA in turn suggests that the excess loss is very effective in preventing EF neurons from firing undesired early spikes; the impact of the few remaining undesired early spikes is negligible.
For both CNN-large and CNN-small, the SNN-LTC types outperform SNN types based on LIF neurons.
IV-D Computational cost evaluation
In an SNN, every time a spike arrives at a synapse, which is referred to as a synaptic event, a postsynaptic potential is added to the membrane potential of the postsynaptic neuron. These operations contribute to most of the computational cost of an SNN. We use the average number of synaptic events that an SNN processes for every input image as a metric for the computational cost of the SNN. In addition, we also count the average number of spikes fired by all neurons of an SNN for every input image.
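The two metrics can be illustrated with a small counting sketch. It assumes that every spike fired by a neuron produces one synaptic event at each of that neuron's outgoing synapses; the function name and the per-neuron lists `spike_counts` and `fan_out` are hypothetical.

```python
from typing import Sequence, Tuple

def count_cost(spike_counts: Sequence[int],
               fan_out: Sequence[int]) -> Tuple[int, int]:
    """Return (number of spikes, number of synaptic events) for one image.

    spike_counts[i]: spikes fired by neuron i while processing the image.
    fan_out[i]: number of outgoing synapses of neuron i.
    """
    n_spikes = sum(spike_counts)
    n_events = sum(s * f for s, f in zip(spike_counts, fan_out))
    return n_spikes, n_events

# Three neurons: the second stays silent, so it contributes no events.
print(count_cost([2, 0, 1], [10, 10, 4]))  # (3, 24)
```

Averaging both numbers over all input images gives the per-image metrics used in this section.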
Figure 2 shows the experimental results for CNN-small. For each of SNN-rate-IF-rst-zero, SNN-rate-IF-rst-subtract, SNN-ASN, and SNN-TTFS, the computational cost and test accuracy at every time step during a test run of an SNN are plotted as a point. For every time step, these computational costs and test accuracies are averaged over all test runs of the SNN type. The resulting average computational costs and average test accuracies are plotted as a line. For SNN-LTC types, only the final computational cost and the final test accuracy are shown for every SNN.
As shown in Figure 2, SNN-LTC types achieve high test accuracies at low computational costs. At the same average computational costs, the SNN-rate-IF types and the SNN-ASN type achieve significantly lower average test accuracies ranging from 9.8% to 98%. The average test accuracies of SNN-rate-IF and SNN-ASN increase quickly with increasing computational costs at an early stage of the test runs, and then fluctuate near their maximum values for a long time until the end of simulation. The average test accuracy of SNN-TTFS increases rapidly with increasing computational costs at the end of simulation, when the output layers of the SNNs fire their first output spikes.
In order to compare the ever-changing average computational costs of previous ANN-to-SNN conversion methods with the final average computational costs of the SNN-LTC types, we find two kinds of reference computational costs for each of SNN-rate-IF-rst-zero, SNN-rate-IF-rst-subtract, SNN-ASN, and SNN-TTFS. One is the stable computational cost, at which the average test accuracy converges to the final average test accuracy. Specifically, we consider the average test accuracy to have converged if it remains within the range around the final average test accuracy until the end of the simulation time. The other kind is the matching computational cost w.r.t. each of the SNN-LTC types, at which the average test accuracy of the SNN-rate-IF, SNN-ASN, or SNN-TTFS type starts to surpass the average test accuracy of the SNN-LTC type. The reference computational costs are marked with vertical lines in Figure 2.
Table III compares computational costs of our ANN-to-SNN conversion methods with those of previous ANN-to-SNN conversion methods, for the CNN-small architecture. For every SNN-rate-IF type and the SNN-ASN type, both the stable computational costs and the matching computational costs are shown, along with the ratios (in percentage) of the SNN-LTC types’ computational costs to the reference computational costs. The matching computational cost of SNN-rate-IF-rst-zero w.r.t. SNN-multi-spike-LTC is not shown, because the average test accuracy of SNN-multi-spike-LTC is higher than the highest average test accuracy of SNN-rate-IF-rst-zero. The matching computational costs of SNN-TTFS are not shown for the same reason.
As shown in Table III, the average computational costs of the SNN-LTC types are much lower than the reference computational costs of the SNN-rate-IF types and the SNN-ASN type. Compared with the SNN-rate-IF types, SNN-multi-spike-LTC achieves a similar average test accuracy while reducing the computational cost by more than 80% in terms of synaptic events and more than 65% in terms of spikes; SNN-single-spike-LTC reduces the computational cost by more than 76% in terms of synaptic events and more than 63% in terms of spikes, at the cost of a decrease of 0.22% in final average test accuracy. Compared with the SNN-ASN type, SNN-multi-spike-LTC reduces the computational cost by more than 69% in terms of synaptic events and more than 64% in terms of spikes, at the cost of a decrease of 0.2% in final average test accuracy; SNN-single-spike-LTC reduces the computational cost by more than 73% in terms of synaptic events and more than 72% in terms of spikes, at the cost of a decrease of 0.4% in final average test accuracy. Compared with SNN-single-spike-LTC, SNN-multi-spike-LTC achieves a higher average test accuracy at a higher average computational cost.
Table III (CNN-small). Column groups: # Synaptic events, # Spikes.
Compared with SNN-TTFS, both SNN-LTC types achieve significantly higher average test accuracies, but at much higher average computational costs in terms of synaptic events. However, for SNN-TTFS, the number of synaptic events is an underestimate of the true computational cost. According to the membrane potential update rule (Equation (4) in ), a TTFS neuron keeps track of the synapses which have ever received an input spike, and adds the sum of the synaptic weights to its membrane potential every time step. The number of synaptic events accounts for the updates of the sum of synaptic weights, not the updates of the membrane potential. As shown in Table IV, the number of membrane potential updates (other ADDs) dominates the true computational cost of SNN-TTFS. The computational costs of the SNN-LTC types are similar to the true computational cost of SNN-TTFS. The average computational cost of SNN-multi-spike-LTC is 5.20% higher, and the average computational cost of SNN-single-spike-LTC is 11.43% lower.
Table IV (CNN-small). Columns: # ADDs for synaptic events, # Other ADDs, Comput. cost.
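The difference between synaptic events and the true cost of a TTFS neuron can be made concrete with a small counting sketch. It follows the update rule as described above (a running sum of the weights of synapses that have received a spike, added to the membrane potential every time step). Firing and reset are ignored, one ADD is charged per membrane potential update at every time step, and the function name and inputs are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def ttfs_add_counts(weights: Sequence[float],
                    spike_times: Sequence[Optional[int]],
                    num_steps: int) -> Tuple[int, int, float]:
    """Count ADDs for one TTFS-style neuron over num_steps time steps.

    spike_times[i] is the time step at which synapse i receives its single
    input spike, or None if it never does. Returns (ADDs for synaptic
    events, other ADDs, final membrane potential).
    """
    weight_sum = 0.0      # sum of weights of synapses that have spiked so far
    potential = 0.0
    synaptic_adds = 0     # updates of weight_sum (one per synaptic event)
    other_adds = 0        # updates of the membrane potential
    for t in range(num_steps):
        for w, t_spike in zip(weights, spike_times):
            if t_spike == t:
                weight_sum += w
                synaptic_adds += 1
        # Assumption: the potential is updated (one ADD) at every time step.
        potential += weight_sum
        other_adds += 1
    return synaptic_adds, other_adds, potential


syn_adds, other_adds, v = ttfs_add_counts(
    weights=[0.5, 0.25, 1.0], spike_times=[1, None, 3], num_steps=6)
```

In this toy run, two synaptic events occur but the membrane potential is updated six times, which mirrors why counting synaptic events alone underestimates the cost.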
Figure 3 shows computational costs and test accuracies of SNNs with the CNN-large architecture. The SNN-LTC types achieve high test accuracies at low computational costs. At the average computational costs of the SNN-LTC types, the SNN-rate-IF types and the SNN-ASN type achieve very poor average test accuracies around 9.8%.
Similar to Table III, Table V compares computational costs of our ANN-to-SNN conversion methods with those of previous ANN-to-SNN conversion methods, for the CNN-large architecture. For the SNN-rate-IF types and the SNN-TTFS type, only the stable computational costs are shown, since the average test accuracies of the SNN-LTC types are higher than the highest average test accuracies of these types. Compared with the SNN-rate-IF types, the SNN-LTC types achieve higher average test accuracies while reducing the computational cost by more than 92% in terms of synaptic events and more than 91% in terms of spikes. Compared with the SNN-ASN type, the SNN-LTC types reduce the computational cost by more than 76% in terms of synaptic events and more than 75% in terms of spikes, at the cost of a slight decrease of less than 0.1% in final average test accuracy. SNN-single-spike-LTC slightly outperforms SNN-multi-spike-LTC at a lower computational cost.
Table V (CNN-large). Column groups: # Synaptic events, # Spikes.
Both SNN-LTC types achieve higher average test accuracies than the SNN-TTFS type. As shown in Table VI, SNN-multi-spike-LTC and SNN-single-spike-LTC reduce the average computational cost by 41.22% and 43.22%, respectively.
Table VI (CNN-large). Columns: # ADDs for synaptic events, # Other ADDs, Comput. cost.
In this work, we propose an ANN-to-SNN conversion method based on novel Logarithmic Temporal Coding (LTC) and the Exponentiate-and-Fire (EF) neuron model. Moreover, we introduce the approximation errors of LTC into the ANN, and train the ANN to compensate for the approximation errors, eliminating most of the performance drop due to ANN-to-SNN conversion. The experimental results show that the proposed method achieves competitive performance at a significantly lower computational cost.
In future work, we are going to explore the combination of our logarithmic temporal coding, which sparsifies spike trains in time, and regularization techniques that sparsify spike trains across spiking neurons. Sparsifying spike trains across both space and time may help achieve further computational efficiency.
Appendix A An exponentiate-and-fire neuron generates a logarithmic temporal coding spike train
We observe that an EF neuron doubles its membrane potential every time step if the neuron does not receive or fire any spike, as Lemma 3 states.
Let be two time steps, where . The pre-reset membrane potential of an EF neuron if the following conditions hold:
the EF neuron does not receive any input spike during the time interval , and
the EF neuron does not fire any output spike during the time interval .
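The doubling behaviour stated by Lemma 3 can be sketched with integer (fixed-point) arithmetic, where each time step is a single left shift. The function name and the integer representation of the membrane potential are assumptions made for illustration only.

```python
def ef_potential_after(v0: int, steps: int) -> int:
    """Membrane potential of an idle EF neuron after `steps` time steps.

    Assumes a fixed-point (integer) potential, no incoming spikes, and no
    output spikes during the interval, as required by the two conditions.
    """
    v = v0
    for _ in range(steps):
        v <<= 1  # one bit shift per time step doubles the potential
    return v


# Starting from 3, four idle time steps give 3 * 2**4.
v_end = ef_potential_after(3, 4)
```

The same loop also shows why a zero or negative potential can never cross a positive threshold while the neuron is idle: shifting preserves the sign and keeps zero at zero.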
Depending on the value of , there are four cases: , , , and .
If , the logarithmic approximation of is 0, and the desired LTC spike train contains no spikes. For the EF neuron, by Lemma 3, remains zero or negative during the output time window, and the neuron does not fire any output spike during this time interval. Hence, the neuron generates the desired LTC spike train within its output time window.
If , with exponent range , the logarithmic approximation of is 0, and the desired LTC spike train contains no spikes.
For the EF neuron, by Lemma 3, the first output spike time satisfies the following condition:
Solving the equation, we have
since , . In other words, the neuron fires the first output spike after the end of its output time window. Hence, the neuron generates the desired LTC spike train within its output time window.
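The two cases above can be checked numerically with a sketch of a logarithmic approximation. The definition used here is an assumption for illustration: values below the smallest representable power of two map to 0, values inside the range are truncated to a sum of powers of two (multi-power) or to the largest power of two (single-power), and values above the range saturate. The names `log_approx`, `e_min`, and `e_max` are hypothetical.

```python
import math


def log_approx(a: float, e_min: int, e_max: int, single: bool = False) -> float:
    """Assumed logarithmic approximation of `a` with exponents e_min..e_max."""
    if a < 2.0 ** e_min:
        # Covers both a <= 0 and 0 < a < 2**e_min: no spikes are needed.
        return 0.0
    if a >= 2.0 ** (e_max + 1):
        # Assumed saturation: all powers (multi) or the largest one (single).
        return 2.0 ** e_max if single else 2.0 ** (e_max + 1) - 2.0 ** e_min
    if single:
        # frexp gives a = m * 2**e with 0.5 <= m < 1, so floor(log2(a)) = e - 1.
        _, e = math.frexp(a)
        return 2.0 ** (e - 1)
    # Multi-power: drop everything below the resolution 2**e_min.
    step = 2.0 ** e_min
    return math.floor(a / step) * step


zero_case = log_approx(-1.0, e_min=-2, e_max=2)    # non-positive input
small_case = log_approx(0.1, e_min=-2, e_max=2)    # below 2**e_min
```

Both calls return 0, which corresponds to an LTC spike train with no spikes.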
For the remaining cases, we derive the spike times of the desired LTC spike train and the output spike times of the EF neuron, and show that the output spike train of the EF neuron is consistent with the desired LTC spike train at the end of this proof.
where the sum in Eqn. 16 runs across exponents from to the smallest . Note that gives the single-power LA of . By substituting and into Eqn. 17, we derive the spike times of the desired LTC spike train:
where is the -th output spike time. For both multi-spike LTC and single-spike LTC, Eqn. 19 gives the first spike time . For multi-spike LTC, Eqn. 19 also gives subsequent spike times. By further substituting Eqn. 17 and into Inequality 18, we derive constraints on the spike times:
For the EF neuron, every output spike time within the output time window satisfies the following conditions:
In the case of single-spike LTC, the EF neuron fires a single output spike at . In the case of multi-spike LTC, the EF neuron may fire subsequent output spikes. Consider every two consecutive output spike times and , where . By Lemma 3, the pre-reset membrane potentials can be formulated as
Solving the recurrence relation above, we have
Note that gives the single-power LA of . By substituting into Eqn. 30, we derive the spike times of the desired LTC spike train:
For the EF neuron, since , the first output spike time is
In the case of single-spike LTC, the EF neuron fires only a single output spike. In the case of multi-spike LTC, suppose the EF neuron fires an output spike at the time step . By Lemma 3,
It is easy to see that, if , then , and will be the next output spike time. Since , the EF neuron fires an output spike at every time step within its output time window. Hence,
By comparing Eqn. 15 and 28 with Eqn. 19, Inequality 27 with Inequality 20, Inequality 23 with Inequality 21, and Eqn. 34 with Eqn. 31, it can be seen that the output spike train of the EF neuron within its output time window is consistent with the desired LTC spike train, except that every output spike time of the EF neuron is larger than the corresponding spike time of the desired LTC spike train. The difference is due to the fact that the output time window of the EF neuron starts at the time step .
Therefore, in all cases, the EF neuron generates an LTC spike train that encodes within its output time window, completing the proof. ∎
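The statement proved above can be illustrated with a toy simulation of an EF neuron during its output time window. This is a sketch under stated assumptions rather than the model definition itself: the firing threshold is taken to be `2**e_max`, the reset is by subtraction, the window starts at time step 0 and spans `e_max - e_min + 1` steps, a spike at step `t` is decoded as the power `2**(e_max - t)`, and no input spikes arrive during the window. All names are hypothetical.

```python
from typing import List


def ef_spike_times(a: float, e_min: int, e_max: int,
                   single: bool = False) -> List[int]:
    """Output spike times of a toy EF neuron whose potential starts at `a`."""
    theta = 2.0 ** e_max            # assumed firing threshold
    window = e_max - e_min + 1      # assumed length of the output window
    v = a
    times = []
    for t in range(window):
        if v >= theta:
            times.append(t)
            if single:
                break               # single-spike LTC: only the first spike
            v -= theta              # assumed reset by subtraction
        v *= 2.0                    # doubling per time step (Lemma 3)
    return times


def decode(times: List[int], e_max: int) -> float:
    """Value encoded by a spike train under the assumed time-to-exponent map."""
    return sum(2.0 ** (e_max - t) for t in times)


multi = ef_spike_times(5.5, e_min=-2, e_max=2)                  # 5.5 = 4 + 1 + 0.5
first_only = ef_spike_times(5.5, e_min=-2, e_max=2, single=True)
saturated = ef_spike_times(100.0, e_min=-2, e_max=2)
silent = ef_spike_times(0.1, e_min=-2, e_max=2)
```

In this sketch an in-range value produces one spike per nonzero binary digit, a value above the range produces a spike at every time step of the window, and a value below the range produces no spike inside the window, in line with the four cases of the proof.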
-  J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” CoRR, vol. abs/1709.01507, 2017. [Online]. Available: http://arxiv.org/abs/1709.01507
-  K. Kowsari, D. E. Brown, M. Heidarysafa, K. J. Meimandi, M. S. Gerber, and L. E. Barnes, “Hdltex: Hierarchical deep learning for text classification,” CoRR, vol. abs/1709.08267, 2017. [Online]. Available: http://arxiv.org/abs/1709.08267
-  T. Cazenave, “Residual networks for computer go,” IEEE Transactions on Games, vol. 10, no. 1, pp. 107–110, March 2018.
-  R. Gütig and H. Sompolinsky, “The tempotron: a neuron that learns spike timing-based decisions,” Nature neuroscience, vol. 9, no. 3, pp. 420–428, 2006.
-  F. Ponulak and A. Kasiński, “Supervised learning in spiking neural networks with resume: Sequence learning, classification, and spike shifting,” Neural Computation, vol. 22, no. 2, pp. 467–510, Feb 2010.
-  R. V. Florian, “The chronotron: A neuron that learns to fire temporally precise spike patterns,” PLOS ONE, vol. 7, no. 8, pp. 1–27, 08 2012. [Online]. Available: https://doi.org/10.1371/journal.pone.0040233
-  A. Mohemmed, S. Schliebs, S. Matsuda, and N. Kasabov, “Span: Spike pattern association neuron for learning spatio-temporal spike patterns,” International Journal of Neural Systems, vol. 22, no. 04, p. 1250012, 2012.
-  E. Neftci, S. Das, B. Pedroni, K. Kreutz-Delgado, and G. Cauwenberghs, “Event-driven contrastive divergence for spiking neuromorphic systems,” Frontiers in Neuroscience, vol. 7, p. 272, 2014. [Online]. Available: https://www.frontiersin.org/article/10.3389/fnins.2013.00272
-  P. Diehl and M. Cook, “Unsupervised learning of digit recognition using spike-timing-dependent plasticity,” Frontiers in Computational Neuroscience, vol. 9, p. 99, 2015. [Online]. Available: https://www.frontiersin.org/article/10.3389/fncom.2015.00099
-  S. M. Bohte, J. N. Kok, and H. L. Poutré, “Error-backpropagation in temporally encoded networks of spiking neurons,” Neurocomputing, vol. 48, no. 1–4, pp. 17 – 37, 2002. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0925231201006580
-  S. McKennoch, D. Liu, and L. G. Bushnell, “Fast modifications of the spikeprop algorithm,” in The 2006 IEEE International Joint Conference on Neural Network Proceedings, July 2006, pp. 3970–3977.
-  O. Booij and H. tat Nguyen, “A gradient descent rule for spiking neurons emitting multiple spikes,” Information Processing Letters, vol. 95, no. 6, pp. 552 – 558, 2005, applications of Spiking Neural Networks. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0020019005001560
-  S. Ghosh-Dastidar and H. Adeli, “A new supervised learning algorithm for multiple spiking neural networks with application in epilepsy and seizure detection,” Neural Networks, vol. 22, no. 10, pp. 1419 – 1431, 2009. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0893608009000653
-  Y. Xu, X. Zeng, L. Han, and J. Yang, “A supervised multi-spike learning algorithm based on gradient descent for spiking neural networks,” Neural Networks, vol. 43, pp. 99 – 113, 2013. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0893608013000440
-  P. O’Connor and M. Welling, “Deep spiking networks,” CoRR, vol. abs/1602.08323, 2016. [Online]. Available: http://arxiv.org/abs/1602.08323
-  J. H. Lee, T. Delbruck, and M. Pfeiffer, “Training deep spiking neural networks using backpropagation,” Frontiers in Neuroscience, vol. 10, p. 508, 2016. [Online]. Available: https://www.frontiersin.org/article/10.3389/fnins.2016.00508
-  Y. Cao, Y. Chen, and D. Khosla, “Spiking deep convolutional neural networks for energy-efficient object recognition,” International Journal of Computer Vision, vol. 113, no. 1, pp. 54–66, May 2015. [Online]. Available: https://doi.org/10.1007/s11263-014-0788-3
-  P. U. Diehl, D. Neil, J. Binas, M. Cook, S. C. Liu, and M. Pfeiffer, “Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing,” in 2015 International Joint Conference on Neural Networks (IJCNN), July 2015, pp. 1–8.
-  B. Rueckauer, I.-A. Lungu, Y. Hu, M. Pfeiffer, and S.-C. Liu, “Conversion of continuous-valued deep networks to efficient event-driven networks for image classification,” Frontiers in Neuroscience, vol. 11, p. 682, 2017. [Online]. Available: https://www.frontiersin.org/article/10.3389/fnins.2017.00682
-  E. Hunsberger and C. Eliasmith, “Spiking deep networks with LIF neurons,” CoRR, vol. abs/1510.08829, 2015. [Online]. Available: http://arxiv.org/abs/1510.08829
-  Q. Liu, Y. Chen, and S. B. Furber, “Noisy softplus: an activation function that enables snns to be trained as anns,” CoRR, vol. abs/1706.03609, 2017. [Online]. Available: http://arxiv.org/abs/1706.03609
-  D. Zambrano, R. Nusselder, H. S. Scholte, and S. M. Bohte, “Efficient computation in adaptive artificial spiking neural networks,” CoRR, vol. abs/1710.04838, 2017. [Online]. Available: http://arxiv.org/abs/1710.04838
-  B. Rueckauer and S. Liu, “Conversion of analog to spiking neural networks using sparse temporal coding,” in 2018 IEEE International Symposium on Circuits and Systems (ISCAS), May 2018, pp. 1–5.
-  Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
-  D. Miyashita, E. H. Lee, and B. Murmann, “Convolutional neural networks using logarithmic data representation,” CoRR, vol. abs/1603.01025, 2016. [Online]. Available: http://arxiv.org/abs/1603.01025
-  W. Gerstner and W. M. Kistler, Spiking Neuron Models: Single Neurons, Populations, Plasticity. Cambridge University Press, 2002.
-  M. Abadi, A. Agarwal, P. Barham et al., “TensorFlow: Large-scale machine learning on heterogeneous systems,” 2015, software available from tensorflow.org. [Online]. Available: https://www.tensorflow.org/
-  TensorFlow, “A guide to tf layers: Building a convolutional neural network,” https://tensorflow.google.cn/tutorials/layers, 2018, accessed: 2018-04-10.