I. Introduction
Deep Learning based on Artificial Neural Networks (ANNs) has achieved tremendous success in many application domains in recent years [1, 2, 3]. Spiking Neural Networks (SNNs) use neuron action potentials, or spikes, for event-driven computation and communication. If the number of spikes is low, most neurons and synapses in an SNN are idle most of the time, so a hardware implementation of an SNN can be much more efficient than a conventional ANN used in Deep Learning for inference tasks. Training, or learning, algorithms for SNNs
[4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14] are an active area of research, and are not as mature as conventional Deep Learning. Several recent SNN learning algorithms based on spiking variants of backpropagation [15, 16] achieved good performance, but their neuron models incur high computational cost. One alternative is to use ANN-to-SNN conversion techniques [17, 18, 19, 20, 21, 22, 23], which work by first training an ANN with the conventional backpropagation algorithm, then converting it into an SNN. Most existing ANN-to-SNN conversion methods are based on rate coding, where activations in the ANN are approximated by firing rates of the corresponding spike trains in the SNN, and the number of spikes for encoding a real-valued activation grows linearly with the activation value. For current methods [18, 19] to achieve performance comparable to the ANN, the neurons in the SNN have to fire a large number of spikes, which leads to high computational cost. Although several recent methods [22, 23] reduced the number of spikes by employing more efficient neural coding, these methods relied on complex neuron models that continually perform expensive operations.

In this paper, we propose an ANN-to-SNN conversion method based on novel Logarithmic Temporal Coding (LTC), where the number of spikes for encoding an activation grows logarithmically with the activation value in the worst case. LTC is integrated with the Exponentiate-and-Fire (EF) spiking neuron model. Note that the EF neuron model is not biologically realistic. It is an artificial model that we designed to use in conjunction with LTC for efficient computation in an SNN. If implemented with fixed-point arithmetic, an EF neuron performs a bit shift every time step and an addition for every incoming spike. Furthermore, we introduce the approximation errors of LTC into the ANN, and leverage the training process of the ANN to compensate for the approximation errors, eliminating most of the performance drop due to ANN-to-SNN conversion.
Compared with rate-coding methods, our temporal-coding method achieves similar performance at significantly lower computational cost. Experimental results show that, for a CNN architecture with sufficient model capacity, the proposed method outperforms rate-based coding, achieving a test accuracy of 99.41% on the MNIST dataset, and a computational cost reduction of 93.61%.
II. Related work
Learning for single-layer SNNs is a well-studied topic. Supervised learning algorithms aimed to train an SNN to classify input spatiotemporal patterns [4] or to generate control signals with precise spike times in response to input spatiotemporal patterns [5, 6, 7]. The Tempotron rule [4] trained a spiking neuron to perform binary classification by firing one or more spikes in response to its associated class. ReSuMe [5] trained spiking neurons to generate target spike trains in response to given spatiotemporal patterns. Supervised learning was achieved by combining the learning windows of Hebbian rules with a concept of remote supervision. The E-learning rule of Chronotron [6] improved memory capacity by minimizing a modified version of the Victor and Purpura (VP) distance between the output spike train and the target spike train with gradient descent. SPAN [7] also achieved improved memory capacity over ReSuMe, but with a simpler learning rule than Chronotron. The learning rule was a spiking variant of the Delta rule, where the input, output, and target values are replaced with convolved spike trains. These algorithms depend on predefined target spike trains, which are not available for neurons in the hidden layers of a multi-layer SNN. Unsupervised learning rules aimed to train an SNN to detect spatiotemporal patterns or extract features from input stimuli. In
[9], a population of spiking neurons connected with lateral inhibitory synapses was trained using Spike-Timing-Dependent Plasticity (STDP) to recognize different spatiotemporal patterns. In [8], an event-driven variation of contrastive divergence was proposed to train a restricted Boltzmann machine constructed with integrate-and-fire neurons. These algorithms rely on specific network topologies with a single layer of spiking neurons. All of these learning algorithms are limited to SNNs with a single layer of neurons, and there is a large performance gap between the resulting SNNs and traditional ANNs.
Multi-layer SNNs are more difficult to train than single-layer SNNs. Backpropagation [24] cannot be directly applied to multi-layer SNNs due to the discontinuity associated with spiking activities. SpikeProp [10] adapted backpropagation for SNNs, circumventing the discontinuity by assuming the membrane potential to be a linear function of time in a small region around spike times. SpikeProp was extended by later works to use Resilient Propagation and QuickProp [11], and to train neurons in hidden layers and the output layer to fire multiple spikes [12, 13, 14]. However, there is still a large performance gap between these SNNs and traditional ANNs. Recent works avoided making assumptions about the discontinuity. In [15], a custom spiking neuron model incorporated a spike generation algorithm to approximate intermediate values of both the forward pass and the backward pass with spike trains. The spike generation algorithm had to add the encoded value to its internal state for every neuron at every time step. In [16], the membrane potential of a neuron was assumed to be a differentiable function of postsynaptic potentials and the afterpotential, and the backward pass propagated errors through the postsynaptic potentials and the afterpotential instead of input and output spike times. The exponential decay of postsynaptic potentials and afterpotentials requires two multiplications to be performed for every neuron at every time step. These learning algorithms trained small SNNs with several layers to achieve performance comparable to that of traditional ANNs. However, they rely on complex neuron models that perform expensive arithmetic operations every time step. Furthermore, how these algorithms scale to deeper SNNs remains unclear.
Another line of work trained SNNs indirectly by converting a trained ANN into its equivalent SNN. In [17], an ANN with Rectified Linear Unit (ReLU) nonlinearity was trained using backpropagation and the weights were then directly mapped to an SNN of Integrate-and-Fire (IF) neurons with the same topology. In a similar way, an ANN with Softplus [20] or Noisy Softplus [21] nonlinearity could be converted to an SNN of more biologically plausible Leaky Integrate-and-Fire (LIF) neurons. There was a significant performance gap between the resulting SNN and the original ANN. The performance gap was narrowed by weight normalization [18] and by resetting the membrane potential by subtracting the firing threshold [19]. With these improvements, the resulting SNNs achieved performance comparable to the corresponding ANNs. All of these ANN-to-SNN conversion methods were based on rate coding, where the number of spikes it takes to encode an activation grows linearly with the activation. Empirically, the neurons in the SNN have to maintain high firing rates to achieve performance comparable to the original ANN. Since the computational cost a spiking neuron incurs is proportional to the number of incoming spikes, spike trains generated according to rate coding impose high computational cost on downstream neurons.

Recent ANN-to-SNN conversion methods reduced the number of spikes used to encode activations by employing more efficient neural coding. In [22], an ANN was converted to an Adapting SNN (AdSNN) based on synchronous Pulsed Sigma-Delta coding. When driven by a strong stimulus, an Adaptive Spiking Neuron (ASN) adaptively raises its dynamic firing threshold every time it fires a spike, reducing its firing rate. However, an ASN has to perform four multiplications every time step to update its postsynaptic current, firing threshold, and refractory response. In [23], an ANN was converted to an SNN based on temporal coding, where an activation in the ANN was approximated by the latency to the first spike of the corresponding spike train in the SNN. Thus, at most one spike needs to be fired for each activation. However, each Time-To-First-Spike (TTFS) neuron keeps track of the synapses that have ever received an input spike, and has to add the sum of their synaptic weights to its membrane potential every time step.
Although these methods reduce the number of spikes, their complex neuron models still incur high computational cost.
ANNtoSNN conversion approximates realvalued activations with spike trains. The approximation errors contribute to the performance gap between the SNN and the ANN. Fortunately, a deep ANN can be trained to tolerate the approximation errors, if the approximation errors are introduced during the training phase. In [25], each activation of an ANN was approximated with a power of two, where the exponents of the powers were constrained within a set of several consecutive integers. The error tolerance of an ANN allows it to compensate for approximation errors in the corresponding SNN during the training phase, which in turn helps close the performance gap between the SNN and the ANN.
Different from existing ANN-to-SNN conversion methods, we reduce both the number of spikes and the complexity of the neuron model. We propose encoding activations with Logarithmic Temporal Coding (LTC), where the number of spikes grows logarithmically with the encoded activation in the worst case. If implemented with fixed-point arithmetic, our Exponentiate-and-Fire (EF) neuron model involves only bit shifts and additions. A neuron performs a bit shift every time step and an addition for every incoming spike.
III. Method
Every time a spiking neuron receives an input spike, the membrane potential of the neuron is increased by the postsynaptic potential (PSP). Evaluation of PSPs contributes most of the computational cost of an SNN. To reduce the number of spikes used to encode every activation throughout the ANN, we propose Logarithmic Temporal Coding (LTC). A real-valued activation is first approximated by retaining a predefined subset of bits in its binary representation. Then, a spike is generated for each of the remaining 1 bits, and no spike is generated for the 0 bits. The number of spikes for encoding an activation grows logarithmically, rather than linearly, with the activation in the worst case.
We propose the Exponentiate-and-Fire (EF) neuron, used in conjunction with LTC, which performs computation equivalent to that of an analog neuron with Rectified Linear Unit (ReLU) nonlinearity. Furthermore, we propose error-tolerant ANN training, which leverages the ANN training process to compensate for the approximation errors introduced by LTC and reduces the chance for EF neurons to fire undesired spikes.
We use the term “activation” to refer to output values of all analog neurons in an ANN, including neurons in the input, hidden and output layers.
III-A. Logarithmic temporal coding
To encode a realvalued activation into a spike train, the activation is first represented as a binary number. Then, the activation is approximated by retaining only a subset of the bits of the binary number at a predefined set of consecutive positions; the other bits of the binary number are set to zero. Finally, for each remaining 1 bit of the binary number, a spike is generated with spike timing determined by the position of the bit in the binary number, while no spike is generated for the 0 bits.
A real-valued activation $x \ge 0$ can be represented as the sum of a possibly infinite series of powers of two $2^e$ with different integer exponents $e$. We approximate the real-valued activation by constraining the exponents within a predefined exponent range, i.e., a finite set of consecutive integers from $e_{\min}$ to $e_{\max}$, clipping values beyond the largest representable value. This approximation can be formulated as a closed-form equation:
$\hat{x} = \begin{cases} 0, & x < 2^{e_{\min}} \\ 2^{e_{\min}} \left\lfloor x / 2^{e_{\min}} \right\rfloor, & 2^{e_{\min}} \le x < 2^{e_{\max}+1} \\ 2^{e_{\max}+1} - 2^{e_{\min}}, & x \ge 2^{e_{\max}+1} \end{cases}$  (1)
Since the approximation defined by Eqn. 1 may involve multiple powers of two, we refer to this approximation as Multi-Power Logarithmic Approximation (Multi-Power LA).
As a special case, if we further require the approximation to involve at most one single power of two, the approximation reduces to Single-Power Logarithmic Approximation (Single-Power LA):
$\hat{x} = \begin{cases} 0, & x < 2^{e_{\min}} \\ 2^{\lfloor \log_2 x \rfloor}, & 2^{e_{\min}} \le x < 2^{e_{\max}+1} \\ 2^{e_{\max}}, & x \ge 2^{e_{\max}+1} \end{cases}$  (2)
We refer to multi-power LA and single-power LA collectively as Logarithmic Approximation (LA).
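The two approximations are easy to express in code. Below is a minimal Python sketch of both variants; the function names and the clipping of out-of-range values to the largest representable value are our assumptions, chosen to be consistent with Proposition 1 below.

```python
import math

def multi_power_la(x, e_min, e_max):
    """Multi-power logarithmic approximation: keep the binary digits of x
    at bit positions e_min..e_max; values at or above 2**(e_max+1) are
    clipped to the largest representable value (our assumption)."""
    if x < 2.0 ** e_min:
        return 0.0
    if x >= 2.0 ** (e_max + 1):
        return 2.0 ** (e_max + 1) - 2.0 ** e_min
    return 2.0 ** e_min * math.floor(x / 2.0 ** e_min)

def single_power_la(x, e_min, e_max):
    """Single-power LA: keep only the most significant power of two."""
    if x < 2.0 ** e_min:
        return 0.0
    if x >= 2.0 ** (e_max + 1):
        return 2.0 ** e_max
    return 2.0 ** math.floor(math.log2(x))
```

For example, with the exponent range {0, 1, 2}, multi_power_la(5.3, 0, 2) keeps the bits 4 and 1 and returns 5.0, while single_power_la(5.3, 0, 2) keeps only the leading bit and returns 4.0.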
In order to generate an LTC spike train from a logarithmic approximation $\hat{x}$, we define a time window with $T = e_{\max} - e_{\min} + 1$ discrete time steps $t \in \{1, \dots, T\}$. If a power of two $2^e$ contributes to the logarithmic approximation $\hat{x}$, i.e., $2^e$ is present in the series of powers of two of $\hat{x}$, then a spike is present in the LTC spike train with a spike time $t = e_{\max} - e + 1$, so that larger powers correspond to earlier spikes. There are two variants of LTC: multi-spike LTC corresponds to multi-power LA, while single-spike LTC corresponds to single-power LA.
Obviously, single-spike LTC encodes a real-valued activation into a spike train with at most one spike. For multi-spike LTC, we derive an upper bound on the number of spikes used to encode a real-valued activation, as Proposition 1 states.
Proposition 1.
Suppose multi-spike LTC encodes a real value $x \ge 0$ into a spike train with $n$ spikes. If $x < 2^{e_{\min}}$, then $n = 0$; if $2^{e_{\min}} \le x < 2^{e_{\max}+1}$, then $n \le \lfloor \log_2 x \rfloor - e_{\min} + 1$; if $x \ge 2^{e_{\max}+1}$, then $n = e_{\max} - e_{\min} + 1$.
Proof.
Let $\hat{x}$ be the multi-power LA of $x$. Any power $2^e$ with an integer exponent $e \notin [e_{\min}, e_{\max}]$ cannot contribute to $\hat{x}$, because the exponents of $\hat{x}$ are constrained within $[e_{\min}, e_{\max}]$. For a power $2^e$ with an integer exponent $e \in [e_{\min}, e_{\max}]$ to contribute to $\hat{x}$, it is necessary that $x \ge 2^e$.
If $x < 2^{e_{\min}}$, then $x < 2^e$ for every $e \in [e_{\min}, e_{\max}]$. Hence, no power of two contributes to the multi-power LA of $x$. According to LTC, the spike train contains no spike, hence $n = 0$.
If $2^{e_{\min}} \le x < 2^{e_{\max}+1}$, then no power $2^e$ with $e > \lfloor \log_2 x \rfloor$ can contribute to $\hat{x}$. In the worst case, every $2^e$ with an integer exponent $e$ in the set $\{e_{\min}, \dots, \lfloor \log_2 x \rfloor\}$ contributes to $\hat{x}$. Thus, $n \le \lfloor \log_2 x \rfloor - e_{\min} + 1$.
If $x \ge 2^{e_{\max}+1}$, then $\hat{x} = 2^{e_{\max}+1} - 2^{e_{\min}}$. Every $2^e$ with an integer exponent $e \in [e_{\min}, e_{\max}]$ contributes to $\hat{x}$, hence $n = e_{\max} - e_{\min} + 1$. ∎
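The bound can be checked numerically with a small encoder sketch. The spike-time convention t = e_max − e + 1 (larger powers firing earlier) and the clipping at the range maximum are our assumptions:

```python
import math

def ltc_spike_times(x, e_min, e_max):
    """Spike times of the multi-spike LTC train encoding x: one spike per
    1-bit of the multi-power LA, with larger powers firing earlier
    (t = e_max - e + 1 is our assumed timing convention)."""
    x = min(x, 2.0 ** (e_max + 1) - 2.0 ** e_min)  # clip to range maximum
    times = []
    for e in range(e_max, e_min - 1, -1):  # extract bits, most significant first
        if x >= 2.0 ** e:
            x -= 2.0 ** e
            times.append(e_max - e + 1)
    return times

# check the three cases of Proposition 1 for a few values
e_min, e_max = 0, 7
for x in [0.3, 1.0, 6.5, 100.0, 1000.0]:
    n = len(ltc_spike_times(x, e_min, e_max))
    if x < 2 ** e_min:
        assert n == 0
    elif x < 2 ** (e_max + 1):
        assert n <= math.floor(math.log2(x)) - e_min + 1
    else:
        assert n == e_max - e_min + 1
```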
The logarithmic increase in the number of spikes for LTC is much slower than the linear increase for rate coding. The slow increase comes at the cost of significant approximation error. Since both LA and LTC are deterministic, the approximation error can be easily introduced into the activations of an ANN during the training phase. We leverage the training process of an ANN to compensate for the approximation errors, as detailed in Section III-C.
III-B. Exponentiate-and-Fire (EF) neuron model
Figure 1 illustrates the Exponentiate-and-Fire (EF) neuron model. An EF neuron integrates input spikes using an exponentially growing PSP kernel, and generates output spikes using an exponentially growing afterhyperpolarizing potential (AHP) kernel. With the exponentially growing kernels, an EF neuron is able to perform computation that is equivalent to the computation of an analog neuron with ReLU nonlinearity.
The EF neuron model is based on the Spike Response Model (SRM) [26] with discrete time. The membrane potential at time $t$ is given by:

$u(t) = \sum_{i \in \Gamma} w_i \, \mathrm{PSP}_i(t) + \sum_{t^{(f)} \in \mathcal{F}} \eta(t, t^{(f)}) \, [t \ge t^{(f)}]$  (3)

where $\Gamma$ is the set of synapses; $w_i$ is the weight of synapse $i$; $\mathrm{PSP}_i(t)$ is the total PSP elicited by the input spike train at synapse $i$; $\mathcal{F}$ is the set of output spike times; $\eta(t, t^{(f)})$ is the AHP elicited by the output spike at time $t^{(f)}$; $[\cdot]$ evaluates to 1 if and only if the condition enclosed within the brackets is true. $u_+(t)$ is the pre-reset membrane potential immediately before the reset:

$u_+(t) = \sum_{i \in \Gamma} w_i \, \mathrm{PSP}_i(t) + \sum_{t^{(f)} \in \mathcal{F}} \eta(t, t^{(f)}) \, [t > t^{(f)}]$  (4)
III-B1. Input integration
Input spike trains of a neuron are generated using the input exponent range $[e_{\min}, e_{\max}]$ of the neuron, and presented to the neuron during its input time window of $T_{\mathrm{in}}$ time steps, where $T_{\mathrm{in}} = e_{\max} - e_{\min} + 1$. The exponentially growing PSP kernel used to integrate input spikes is:

$\varepsilon(t, t_i) = R \, 2^{\,t - t_i} \, [t \ge t_i]$  (5)

where $t$ is the current time and $t_i$ is the time of the input spike. With this PSP kernel, the PSP elicited by an input spike is equal to $R$ at the spike time $t_i$, and doubles every time step thereafter.
The total PSP elicited by an input spike train at synapse $i$ is the superposition of the PSPs elicited by all spikes in the spike train:

$\mathrm{PSP}_i(t) = \sum_{t_i \in \mathcal{F}_i} \varepsilon(t, t_i)$  (6)

where $\mathcal{F}_i$ is the set of spike times of the input spike train at synapse $i$.
If the EF neuron does not fire any output spike before the end of its input time window $T_{\mathrm{in}}$, then no output spike interferes with input integration, and the EF neuron computes a weighted sum of the LAs of its input spike trains, as Lemma 1 states.
Lemma 1.
The pre-reset membrane potential of an EF neuron satisfies $u_+(T_{\mathrm{in}}) = R \, 2^{-e_{\min}} \sum_i w_i \hat{x}_i$, if the EF neuron does not fire any output spike during the time interval $[1, T_{\mathrm{in}}]$, where $\hat{x}_i$ is the LA of the $i$-th input LTC spike train.
Proof.
According to LA, $\hat{x}_i = \sum_e b_{i,e} \, 2^e$, where $b_{i,e} \in \{0, 1\}$ indicates whether the power $2^e$ contributes to $\hat{x}_i$. According to LTC, the spike time corresponding to the power $2^e$ is $t_e = e_{\max} - e + 1$. The total PSP elicited by the $i$-th input LTC spike train at $t = T_{\mathrm{in}}$ is given by

$\mathrm{PSP}_i(T_{\mathrm{in}}) = \sum_{e} b_{i,e} \, R \, 2^{\,T_{\mathrm{in}} - t_e} = R \, 2^{-e_{\min}} \sum_{e} b_{i,e} \, 2^{e} = R \, 2^{-e_{\min}} \, \hat{x}_i$  (7)

Since the EF neuron does not fire any output spike before $T_{\mathrm{in}}$, $u_+(T_{\mathrm{in}})$ reduces to a weighted sum of the PSPs elicited by the input spike trains:

$u_+(T_{\mathrm{in}}) = \sum_{i} w_i \, \mathrm{PSP}_i(T_{\mathrm{in}}) = R \, 2^{-e_{\min}} \sum_i w_i \, \hat{x}_i$  (8)
completing the proof. ∎
III-B2. Output spike generation
The goal of an EF neuron is to generate an output LTC spike train that encodes $\max(0, \sum_i w_i \hat{x}_i)$ using its output exponent range, and to present the spike train within its output time window. The output time window starts at the last time step of the input time window, and lasts for $T_{\mathrm{out}}$ time steps, where $T_{\mathrm{out}}$ is the size of the output exponent range.
An EF neuron generates an output spike train by thresholding its exponentially growing membrane potential. Specifically, the EF neuron doubles its membrane potential every time step after the time step $T_{\mathrm{in}}$, as dictated by the exponentially growing PSP kernel and AHP kernel (detailed below), until its pre-reset membrane potential $u_+(t)$ reaches its firing threshold $\vartheta$ from below, when it fires an output spike at time $t^{(f)}$ and its membrane potential is reset.
A multi-spike EF neuron resets its membrane potential by subtracting the firing threshold from it:

$u(t^{(f)}) = u_+(t^{(f)}) - \vartheta$  (9)

A single-spike EF neuron resets its membrane potential to 0:

$u(t^{(f)}) = 0$  (10)
After resetting its membrane potential, a multi-spike EF neuron continues to double its membrane potential every time step, and may fire subsequent output spikes. In contrast, a single-spike EF neuron does not fire any subsequent spike, since its membrane potential remains zero after the reset.
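To make the dynamics concrete, here is a minimal simulation sketch of a multi-spike EF neuron driven by LTC inputs. The scale choices (R = 1, both exponent ranges starting at e_min = 0, a firing threshold of 2^(e_max_out + 1)) and the spike-time convention t = e_max − e + 1 are our assumptions; with these choices, the decoded output equals the multi-power LA of the ReLU of the weighted input sum.

```python
def encode_ltc(x, e_min, e_max):
    """Multi-spike LTC: one spike at t = e_max - e + 1 per 1-bit 2**e of x
    (timing convention and clipping are our assumptions)."""
    x = min(x, 2.0 ** (e_max + 1) - 2.0 ** e_min)
    spikes = set()
    for e in range(e_max, e_min - 1, -1):
        if x >= 2.0 ** e:
            x -= 2.0 ** e
            spikes.add(e_max - e + 1)
    return spikes

def ef_neuron(weights, spike_trains, e_max_in, e_max_out):
    """Sketch of a multi-spike EF neuron (R = 1, e_min = 0 assumed):
    double the membrane potential each step, add weighted input spikes,
    and during the output window fire and reset by subtraction."""
    t_in = e_max_in + 1                  # input window length
    t_out = e_max_out + 1                # output window length
    theta = 2.0 ** (e_max_out + 1)       # firing threshold (assumption)
    u = 0.0
    for t in range(1, t_in + 1):         # input integration
        u *= 2.0
        u += sum(w for w, s in zip(weights, spike_trains) if t in s)
    out = 0.0                            # decode output spikes on the fly
    for k in range(1, t_out + 1):        # output spike generation
        u *= 2.0
        if u >= theta:
            u -= theta                   # reset by subtraction
            out += 2.0 ** (e_max_out + 1 - k)
    return out
```

For instance, with weights (1.0, −0.5) and LTC-encoded inputs 3 and 2, the weighted sum is 2 and the output decodes to 2.0; with a negative weighted sum, no spike is fired and the output decodes to 0, mirroring the ReLU.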
If the EF neuron receives all input spikes within its input time window, then no input spike would interfere with output spike generation during its output time window, and the EF neuron generates the desired output LTC spike train within its output time window, as Lemma 2 states.
Lemma 2.
An EF neuron generates an output LTC spike train that encodes $\max(0, \sum_i w_i \hat{x}_i)$ using its output exponent range and presents the spike train within its output time window, if the EF neuron does not receive any input spike after the end of its input time window.
Theorem 1.
An EF neuron performs computation equivalent to the computation of an analog neuron with ReLU nonlinearity, and encodes the result into its output LTC spike train, if the following conditions hold:

- All input spikes are received within its input time window, and

- no output spikes are fired before the beginning of its output time window.
However, with the spike generation mechanism alone, an EF neuron may fire undesired output spikes outside its output time window. An undesired early output spike before the output time window interferes with the input integration of the neuron. In addition, the output time window of a layer of EF neurons is the input time window of the next layer. An undesired late output spike after the output time window interferes with the output spike generation of the downstream neurons. Undesired output spikes break the equivalence between EF neurons and analog ReLU neurons, which in turn degrades the performance of the SNN.
In order to prevent undesired output spikes of an EF neuron from affecting the downstream neurons, we allow output spikes within the output time window to travel to the downstream neurons, and discard undesired output spikes outside the output time window. Furthermore, we reduce the chance for an EF neuron to fire an undesired early output spike by suppressing excessively large activations of the corresponding analog ReLU neuron, as detailed in Section III-C.
Algorithm 1 shows the operations an EF neuron performs at every time step. First, the membrane potential $u$ is doubled (Eqns. 5, 9 and 10). Then, the input current is calculated by summing the weights of the synapses that receive an input spike at the current time step (Eqn. 3). The input current is scaled by the resistance $R$ (Eqn. 5) and the result is added to the membrane potential (Eqn. 3). If the membrane potential is greater than or equal to the firing threshold $\vartheta$, an output spike is fired, and the membrane potential is reset accordingly (Eqns. 9 and 10).
From Algorithm 1, it can be seen that the EF neuron model can be efficiently implemented in hardware with fixed-point arithmetic. If $u$ is implemented as a fixed-point number, it can be doubled by a bit shift; if $u$ is implemented as a floating-point number, it can be doubled by an addition to its exponent. The multiplication by $R$ can be avoided by precomputing $R \, w_i$ for every synaptic weight and using the scaled synaptic weights at runtime. The other arithmetic operations are additions and subtractions.
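A single time step of such a fixed-point implementation can be sketched as follows; the function boundaries and the in_output_window flag (used to discard spikes outside the output window, as described above) are our framing rather than the paper's exact Algorithm 1.

```python
def ef_step_fixed_point(u, active_weights, theta, in_output_window):
    """One EF time step in integer (fixed-point) arithmetic, following the
    structure of Algorithm 1: a left shift, one addition per input spike,
    a comparison, and a conditional subtraction. Weights are assumed to be
    pre-scaled by the resistance and quantized to integers."""
    u <<= 1                       # double the membrane potential: bit shift
    for w in active_weights:      # one addition per incoming spike
        u += w
    fired = in_output_window and u >= theta
    if fired:
        u -= theta                # reset by subtraction (multi-spike EF)
    return u, fired
```

Per step, the only per-neuron work is the shift; the additions scale with the number of incoming spikes, which is what LTC keeps small.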
III-C. Error-tolerant ANN training
Both LTC approximation errors and undesired early output spikes contribute to the performance gap between an ANN and the corresponding SNN. We introduce the approximation errors into the activations of the ANN by applying logarithmic approximation to every non-negative activation, and rely on the training process to compensate for the approximation errors. Furthermore, we regularize the loss function with the Excess Loss to suppress excessively large activations, which in turn reduces the chance for an EF neuron to fire an undesired early output spike.

For every analog neuron of the ANN, we apply LA to its non-negative activations, so that the downstream neurons receive the approximate activations instead of the original activations. The variant of LA corresponds to the variant of LTC used to generate the corresponding spike train in the SNN. Negative pre-activations of the output layer are not approximated using LA and remain unchanged. For each layer $l$, the minimum exponent $e_{l,\min}$ and the maximum exponent $e_{l,\max}$ within the output exponent range are tuned as hyperparameters, similar to [25]. To reduce the number of hyperparameters, we use the same output exponent range for all hidden layers.

As can be seen in Eqns. 1 and 2, the derivative of the LA $\hat{x}$ w.r.t. the real-valued activation $x$ is zero almost everywhere, which prevents backpropagation from updating the parameters of the bottom layers of the ANN. To allow gradients to pass through LA, for both variants of LA, we define the derivative of $\hat{x}$ w.r.t. $x$ as
$\dfrac{\partial \hat{x}}{\partial x} := 1$  (11)
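In training code, such a surrogate can be implemented straight-through style: quantize in the forward pass and pass the incoming gradient through unchanged in the backward pass. The sketch below is a hand-rolled illustration (the identity backward pass reflects our reading of the surrogate; in practice one would register it as a custom gradient in the ANN framework):

```python
import math

def la_forward(x, e_min, e_max):
    """Multi-power LA forward pass (clipping behavior is our assumption)."""
    if x < 2.0 ** e_min:
        return 0.0
    if x >= 2.0 ** (e_max + 1):
        return 2.0 ** (e_max + 1) - 2.0 ** e_min
    return 2.0 ** e_min * math.floor(x / 2.0 ** e_min)

def la_backward(grad_out):
    """Straight-through surrogate: pass the gradient through unchanged,
    treating d la(x)/dx as 1 (our reading of the surrogate derivative)."""
    return grad_out

# toy chain rule: y = la(w * a); dL/dw uses the surrogate d la/d(w*a) = 1
w, a = 0.8, 4.0
y = la_forward(w * a, 0, 3)        # la(3.2) = 3.0
dL_dy = 1.0
dL_dw = la_backward(dL_dy) * a     # surrogate gradient flows to the weight
```

Without the surrogate, dL_dw would be zero almost everywhere and the lower layers would never be updated.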
In order to suppress excessively large activations, we define the Excess Loss as
$E = \sum_{m} \sum_{l} \sum_{j} \max\left(0, \; a^{(m)}_{l,j} - x^{\max}_{l}\right)$  (12)
where the outer sum runs across the training examples $m$, the middle sum runs across all layers $l$ of the ANN, the inner sum runs across all neurons of the layer $l$, and $a^{(m)}_{l,j}$ is the activation of the $j$-th neuron of the layer $l$ for the $m$-th training example. The excess loss punishes large positive activations of every layer $l$ that are greater than $x^{\max}_{l}$, the largest value representable with the layer's exponent range.
The excess loss is added to the loss function of the ANN, which is to be minimized by the training process:
$L(\theta) = L_0(\theta; \mathcal{D}) + \lambda E$  (13)
where $L_0(\theta; \mathcal{D})$ is the loss of the ANN on the training data $\mathcal{D}$ given the parameters $\theta$, and $\lambda$ is a hyperparameter that controls the strength of the excess loss.
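A sketch of the regularized objective follows; the hinge-shaped penalty and the per-layer thresholds max_vals (the largest value representable by each layer's exponent range) are our assumptions about the form of the excess loss.

```python
def excess_loss(activations, max_vals):
    """Sketch of the excess loss: penalize activations that exceed the
    largest value representable by each layer's exponent range. The hinge
    (linear) penalty and per-layer thresholds are our assumptions."""
    loss = 0.0
    for layer_acts, a_max in zip(activations, max_vals):
        for a in layer_acts:
            loss += max(0.0, a - a_max)
    return loss

# two layers, per-layer representable maximum 7 = 2**3 - 2**0 (e in [0, 2])
acts = [[1.0, 9.5], [6.0, 8.0]]
penalty = excess_loss(acts, [7.0, 7.0])   # (9.5 - 7) + (8 - 7) = 3.5
total = 0.42 + 0.01 * penalty             # task loss + lambda * excess loss
```

During training, gradient descent on the total loss pushes the offending activations back below the representable maximum, making undesired early spikes unlikely.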
Although the excess loss does not completely prevent EF neurons from firing undesired early output spikes, it makes undesired early output spikes unlikely. Our experiments show that performance of an SNN with LTC is very close to the performance of the corresponding ANN with LA; the negative impact of undesired early output spikes seems to be negligible.
IV. Experimental results
IV-A. Experimental setup
We conduct our experiments on a PC with an NVIDIA GeForce GTX 1060 GPU with a 6 GB frame buffer, a quad-core Intel Core i5-7300HQ CPU, and 8 GB of main memory. We use TensorFlow [27] not only for training and testing ANNs, but also for simulating SNNs. For each SNN, we build a computation graph with the operations performed by the SNN at every time step, where every spiking neuron outputs either 1 or 0 to indicate whether or not it fires an output spike. The computation graph is run once for every time step with appropriate input values.

We use the MNIST dataset of handwritten digits [24], which consists of 70,000 28×28-pixel greyscale images of handwritten digits, divided into a training set of 60,000 images and a test set of 10,000 images. For hyperparameter tuning, we further divide the original training set into a training set of 55,000 images and a validation set of 5,000 images. The test set is only used to test ANNs and SNNs after all hyperparameters are fixed.
IV-B. Configuration of training and testing
We consider five types of CNNs, each for both the CNN-small and CNN-large architectures:

- CNN-original: Original CNNs with zero biases, ReLU nonlinearity, and average pooling. CNNs of this type are converted to two types of SNNs. The SNN-rate-IF-rst-zero type uses the reset-to-zero mechanism [18], while the SNN-rate-IF-rst-subtract type uses the reset-by-subtraction mechanism [19]. We refer to SNN-rate-IF-rst-zero and SNN-rate-IF-rst-subtract collectively as SNN-rate-IF. Since data-based normalization was shown to outperform model-based normalization, the weights of the CNNs are normalized with data-based normalization.

- CNN-CR: Same as CNN-original, except that clamped ReLU [23] is used as the nonlinearity, and that max-pooling is used instead of average-pooling. The corresponding SNN type is SNN-TTFS, where SNNs consist of Time-To-First-Spike (TTFS) neurons [23].

- CNN-TF: Same as CNN-original, except that the transfer function of [22] is used as the nonlinearity. The corresponding SNN type is SNN-ASN, where SNNs consist of Adaptive Spiking Neurons (ASNs) [22].

- CNN-multi-power-LA: Same as CNN-original, except that all activations throughout the CNN are approximated with multi-power LA. The corresponding SNN type is SNN-multi-spike-LTC, where EF neurons in the hidden and output layers generate multi-spike LTC spike trains.

- CNN-single-power-LA: Same as CNN-multi-power-LA, except that activations of hidden neurons are approximated with single-power LA. The corresponding SNN type is SNN-single-spike-LTC, which is the same as SNN-multi-spike-LTC, except that the EF neurons in the hidden layers generate single-spike LTC spike trains.

We refer to CNN-multi-power-LA and CNN-single-power-LA collectively as CNN-LA, and to SNN-multi-spike-LTC and SNN-single-spike-LTC collectively as SNN-LTC. For each CNN type, we train five CNNs separately with the same hyperparameters and convert them to SNNs.
For SNN-rate-IF, the maximum input rate for generating an input spike train is 1 spike per time step, since this maximum input rate was shown to achieve the best performance [18]. For CNN-TF and SNN-ASN, we adopt the hyperparameters for the transfer function and ASNs in [22]. The resting threshold and the multiplicative parameter are set to a large value of 0.1 to decrease the firing rates of ASNs. For both CNN-LA types, Table I shows the exponent ranges for the different layers and the strength of the excess loss.

TABLE I: Exponent ranges and the strength of the excess loss.

            Input   Hidden   Output
CNN-small
CNN-large
For the SNN-rate-IF and SNN-ASN types, each of the SNNs is simulated for 500 time steps. For SNN-TTFS, the simulation for an input image is stopped after the output layer fires its first output spike [23].
IV-C. Performance evaluation
Table II compares the final average test accuracies of our ANN-to-SNN conversion methods with those of previous ANN-to-SNN conversion methods. The “Method” column shows SNN types, with the “SNN” prefix omitted. “small” and “large” in round brackets denote the CNN-small and CNN-large architectures, respectively. The “Dev.” column shows the maximum difference between the test accuracy of an SNN and the test accuracy of the corresponding CNN. For the SNN-rate-IF types, since input spike trains are generated stochastically, we test each of these SNNs five times. For each combination of CNN architecture and CNN/SNN type, the final average test accuracy in the table is obtained by averaging the final test accuracies of all test runs of the neural networks.
TABLE II: Final average test accuracies.

Method                               CNN (%)   SNN (%)   Dev.   # neurons
Rate-IF-rst-zero (small) [18]          99.25     99.20   0.16
Rate-IF-rst-subtract (small) [19]      99.25     99.25   0.06
ASN (small) [22]                       99.43     99.43   0.04
TTFS (small) [23]                      99.22     98.53   0.83
Multi-spike-LTC (small)                99.23     99.23   0.00
Single-spike-LTC (small)               99.03     99.03   0.00
Rate-IF-rst-zero (large) [18]          99.27     99.24   0.09
Rate-IF-rst-subtract (large) [19]      99.27     99.27   0.12
ASN (large) [22]                       99.45     99.44   0.04
TTFS (large) [23]                      99.47     99.20   0.44
Multi-spike-LTC (large)                99.38     99.38   0.00
Single-spike-LTC (large)               99.41     99.41   0.02
Rate-LIF-Softplus [20]                   N/A     98.36    N/A         710
Rate-LIF-NoisySoftplus [21]            99.05     98.85   0.20
For the CNN-small architecture, SNN-multi-spike-LTC achieves an average test accuracy that is lower than that of SNN-ASN and similar to those of the SNN-rate-IF types. SNN-single-spike-LTC achieves a lower average test accuracy than those of SNN-multi-spike-LTC and the SNN-rate-IF types. Both SNN-LTC types achieve a significantly higher average test accuracy than SNN-TTFS.
The differences in average test accuracy between the SNN-rate-IF, SNN-ASN, and SNN-LTC types are closely related to the model capacities of the corresponding CNN types. With a small exponent range size (4 for hidden layers), multi-power LA significantly decreases the precision of activations by mapping them to a few discrete values. The decrease in precision leads to a decrease in the model capacity of CNN-multi-power-LA. Hence, multi-power LA can be seen as a regularizer. Single-power LA is a stronger regularizer than multi-power LA, since it further decreases the precision of activations. By contrast, the transfer function of CNN-TF maps real-valued activations to an interval of real numbers, which allows for much higher precision than the logarithmic approximations. Hence, the transfer function is a weaker regularizer than the logarithmic approximations.
For a small CNN architecture like CNN-small, which has limited model capacity even if all activations are real values, the strong regularization of the logarithmic approximations has a negative effect on the CNN-LA types' ability to model the training data. By contrast, the weak regularization of the transfer function has a negligible effect on CNN-TF's ability to model the training data, but helps it achieve a higher average test accuracy than CNN-original by mitigating overfitting.
For the CNN-large architecture, which has sufficient model capacity, both the logarithmic approximations and the transfer function have a negligible effect on the CNN types' ability to model the training data; they mitigate overfitting and help CNN-TF and the CNN-LA types achieve a higher average test accuracy than CNN-original. Therefore, the SNN-LTC types outperform the SNN-rate-IF types and achieve average test accuracies similar to that of the SNN-ASN type.
As shown in the “Dev.” column of Table II, for the SNN-LTC types, the test accuracy of every SNN is very close to the test accuracy of the corresponding CNN. The difference in test accuracy is slightly larger for CNN-TF and SNN-ASN, and much larger for the other CNN and SNN types, especially for CNN-CR and SNN-TTFS. For CNN-large, the performance gap between SNN-TTFS and CNN-CR prevents SNN-TTFS from achieving a higher average test accuracy than the SNN-LTC types, although CNN-CR achieves a higher average test accuracy than the CNN-LA types. There seems to be a closer similarity in behavior between SNN-LTC and CNN-LA than between other SNN types and their corresponding CNN types. This close similarity in turn suggests that the excess loss is very effective in preventing EF neurons from firing undesired early spikes; the impact of the few undesired early spikes is negligible.
For both CNN-large and CNN-small, the SNN-LTC types outperform the SNN types based on LIF neurons.
IV-D. Computational cost evaluation
In this section, we compare the computational cost of our ANN-to-SNN conversion method with that of related work [18, 19, 22, 23].
In an SNN, every time a spike arrives at a synapse, which is referred to as a synaptic event, a postsynaptic potential is added to the membrane potential of the postsynaptic neuron. These operations contribute most of the computational cost of an SNN. We use the average number of synaptic events that an SNN processes for every input image as a metric for the computational cost of the SNN. In addition, we also count the average number of spikes fired by all neurons of an SNN for every input image.
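Counting synaptic events is straightforward given per-neuron spike counts and fan-outs; the sketch below illustrates the metric (the variable names and the toy network shape are ours):

```python
def synaptic_events(spike_counts, fan_out):
    """Estimate the synaptic-event count used as the cost metric: every
    spike fired by a neuron triggers one event per outgoing synapse.
    spike_counts[l][j]: spikes fired by neuron j of layer l for one image;
    fan_out[l]: outgoing synapses per neuron of layer l (illustrative)."""
    return sum(n * fan_out[l]
               for l, layer in enumerate(spike_counts)
               for n in layer)

# e.g. 3 input neurons firing [2, 0, 1] spikes into 4 hidden neurons,
# which fire [1, 1, 0, 2] spikes into 2 output neurons
events = synaptic_events([[2, 0, 1], [1, 1, 0, 2]], [4, 2])  # 12 + 8 = 20
```

Because LTC bounds the number of spikes per activation logarithmically, it directly bounds this event count, which is why the metric favors our coding over rate coding.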
Figure 2 shows the experimental results for CNN-small. For each of SNN-rate-rst-zero, SNN-rate-rst-subtract, SNN-ASN, and SNN-TTFS, the computational cost and test accuracy at every time step during a test run of an SNN are plotted as a point. For every time step, these computational costs and test accuracies are averaged over all test runs of the SNN type, and the resulting averages are plotted as a line. For the SNN-LTC types, only the final computational cost and the final test accuracy are shown for every SNN.
As shown in Figure 2, the SNN-LTC types achieve high test accuracies at low computational costs. At the same average computational costs, the SNN-rate-IF types and the SNN-ASN type achieve significantly lower average test accuracies, ranging from 9.8% to 98%. The average test accuracies of SNN-rate-IF and SNN-ASN increase quickly with increasing computational cost at an early stage of the test runs, and then fluctuate near their maximum values until the end of the simulation. The average test accuracy of SNN-TTFS increases rapidly with increasing computational cost at the end of the simulation, when the output layers of the SNNs fire their first output spikes.
In order to compare the ever-changing average computational costs of previous ANN-to-SNN conversion methods with the final average computational costs of the SNN-LTC types, we identify two kinds of reference computational costs for each of SNN-rate-rst-zero, SNN-rate-rst-subtract, SNN-ASN, and SNN-TTFS. The first is the stable computational cost, at which the average test accuracy converges to the final average test accuracy; specifically, we consider the average test accuracy to have converged once it remains within a small range around the final average test accuracy until the end of the simulation. The second is the matching computational cost w.r.t. each of the SNN-LTC types, at which the average test accuracy of the SNN-rate-IF, SNN-ASN, or SNN-TTFS type starts to surpass that of the SNN-LTC type. The reference computational costs are marked with vertical lines in Figure 2.
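The two reference costs can be read off the averaged accuracy-versus-cost curves mechanically. The sketch below is a hypothetical reconstruction of that procedure; the convergence tolerance `tol` is an assumed parameter, not a value from the paper.

```python
def stable_cost(costs, accs, tol=0.005):
    """Cost at the first step from which accuracy stays within +/- tol
    of the final average accuracy until the end of the run."""
    final = accs[-1]
    for i in range(len(accs)):
        if all(abs(a - final) <= tol for a in accs[i:]):
            return costs[i]
    return costs[-1]

def matching_cost(costs, accs, target):
    """Cost at the first step where accuracy reaches the target
    (the final accuracy of an SNN-LTC type); None if never reached."""
    for c, a in zip(costs, accs):
        if a >= target:
            return c
    return None

# Toy accuracy-vs-cost curve for one SNN type, averaged over test runs.
costs = [10, 20, 30, 40, 50]
accs  = [0.10, 0.80, 0.97, 0.975, 0.974]
print(stable_cost(costs, accs))            # -> 30 (accuracy has converged)
print(matching_cost(costs, accs, 0.96))    # -> 30 (first step above the target)
```

When the curve never reaches the target accuracy, `matching_cost` returns `None`, mirroring the "N/A" entries in the comparison tables.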
Table III compares the computational costs of our ANN-to-SNN conversion methods with those of previous ANN-to-SNN conversion methods for the CNN-small architecture. For every SNN-rate-IF type and the SNN-ASN type, both the stable and the matching computational costs are shown, along with the ratios (in percentage) of the SNN-LTC types' computational costs to the reference computational costs. The matching computational cost of SNN-rate-rst-zero w.r.t. SNN-multi-spike-LTC is not shown, because the average test accuracy of SNN-multi-spike-LTC is higher than the highest average test accuracy of SNN-rate-rst-zero. The matching computational costs of SNN-TTFS are not shown for the same reason.
As shown in Table III, the average computational costs of the SNN-LTC types are much lower than the reference computational costs of the SNN-rate-IF types and the SNN-ASN type. Compared with the SNN-rate-IF types, SNN-multi-spike-LTC achieves a similar average test accuracy while reducing the computational cost by more than 80% in terms of synaptic events and more than 65% in terms of spikes; SNN-single-spike-LTC reduces the computational cost by more than 76% in terms of synaptic events and more than 63% in terms of spikes, at the cost of a 0.22% decrease in final average test accuracy. Compared with the SNN-ASN type, SNN-multi-spike-LTC reduces the computational cost by more than 69% in terms of synaptic events and more than 64% in terms of spikes, at the cost of a 0.2% decrease in final average test accuracy; SNN-single-spike-LTC reduces the computational cost by more than 73% in terms of synaptic events and more than 72% in terms of spikes, at the cost of a 0.4% decrease in final average test accuracy. Compared with SNN-single-spike-LTC, SNN-multi-spike-LTC achieves a higher average test accuracy at a higher average computational cost.
TABLE III: Average computational costs (# synaptic events and # spikes) of SNN-multi-spike-LTC and SNN-single-spike-LTC relative to the stable and matching reference costs of SNN-rate-rst-zero, SNN-rate-rst-subtract, and SNN-ASN, and the stable reference cost of SNN-TTFS, for the CNN-small architecture (numeric entries not recoverable).
Compared with SNN-TTFS, both SNN-LTC types achieve significantly higher average test accuracies, but at much higher average computational costs in terms of synaptic events. However, for SNN-TTFS, the number of synaptic events underestimates the true computational cost. According to the membrane potential update rule (Equation (4) in [23]), a TTFS neuron keeps track of the synapses that have ever received an input spike, and adds the sum of their synaptic weights to its membrane potential every time step. The number of synaptic events accounts for the updates of the sum of synaptic weights, but not for the per-time-step updates of the membrane potential. As shown in Table IV, the number of membrane potential updates (other ADDs) dominates the true computational cost of SNN-TTFS. The computational costs of the SNN-LTC types are similar to the true computational cost of SNN-TTFS: the average computational cost of SNN-multi-spike-LTC is 5.20% higher, and that of SNN-single-spike-LTC is 11.43% lower.
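The accounting difference can be made concrete with a toy tally. This sketch assumes a simplified TTFS-style neuron following the update rule described above (one ADD per synaptic event to maintain the weight sum, plus one ADD per time step once any synapse is active); it is an illustration, not the implementation from [23].

```python
class TTFSCost:
    """Toy TTFS-style neuron that separately counts the two kinds of ADDs."""
    def __init__(self):
        self.weight_sum = 0.0
        self.v = 0.0
        self.adds_synaptic = 0   # updates of the running weight sum
        self.adds_other = 0      # per-time-step membrane potential updates

    def step(self, spike_weights):
        for w in spike_weights:          # one ADD per synaptic event
            self.weight_sum += w
            self.adds_synaptic += 1
        if self.weight_sum != 0.0:       # an active neuron updates every step
            self.v += self.weight_sum
            self.adds_other += 1

n = TTFSCost()
events = [[0.3], [], [], [0.2], [], []]  # input spikes arrive at steps 0 and 3
for ws in events:
    n.step(ws)
print(n.adds_synaptic, n.adds_other)     # -> 2 6
```

Only two synaptic events occur, yet the membrane potential is updated at every subsequent step, so the "other ADDs" quickly dominate as the simulation runs longer.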
TABLE IV: Numbers of ADDs for synaptic events, other ADDs (per-time-step membrane potential updates), and total computational cost for SNN-TTFS (stable), SNN-multi-spike-LTC, and SNN-single-spike-LTC on the CNN-small architecture (numeric entries not recoverable).
Figure 3 shows the computational costs and test accuracies of SNNs with the CNN-large architecture. The SNN-LTC types achieve high test accuracies at low computational costs. At the average computational costs of the SNN-LTC types, the SNN-rate-IF types and the SNN-ASN type achieve very poor average test accuracies of around 9.8%.
Similar to Table III, Table V compares the computational costs of our ANN-to-SNN conversion methods with those of previous ANN-to-SNN conversion methods for the CNN-large architecture. For the SNN-rate-IF types and the SNN-TTFS type, only the stable computational costs are shown, since the average test accuracies of the SNN-LTC types are higher than the highest average test accuracies of these types. Compared with the SNN-rate-IF types, the SNN-LTC types achieve higher average test accuracies while reducing the computational cost by more than 92% in terms of synaptic events and more than 91% in terms of spikes. Compared with the SNN-ASN type, the SNN-LTC types reduce the computational cost by more than 76% in terms of synaptic events and more than 75% in terms of spikes, at the cost of a slight decrease of less than 0.1% in final average test accuracy. SNN-single-spike-LTC slightly outperforms SNN-multi-spike-LTC at a lower computational cost.
TABLE V: Average computational costs (# synaptic events and # spikes) of SNN-multi-spike-LTC and SNN-single-spike-LTC relative to the stable reference costs of SNN-rate-rst-zero, SNN-rate-rst-subtract, SNN-ASN, and SNN-TTFS, and the matching reference cost of SNN-ASN, for the CNN-large architecture (numeric entries not recoverable).
Both SNN-LTC types achieve higher average test accuracies than the SNN-TTFS type. As shown in Table VI, SNN-multi-spike-LTC and SNN-single-spike-LTC reduce the average computational cost by 41.22% and 43.22%, respectively.
TABLE VI: Numbers of ADDs for synaptic events, other ADDs (per-time-step membrane potential updates), and total computational cost for SNN-TTFS (stable), SNN-multi-spike-LTC, and SNN-single-spike-LTC on the CNN-large architecture (numeric entries not recoverable).
V Conclusions
In this work, we propose an ANN-to-SNN conversion method based on novel Logarithmic Temporal Coding (LTC) and the Exponentiate-and-Fire (EF) neuron model. Moreover, we introduce the approximation errors of LTC into the ANN and train the ANN to compensate for them, eliminating most of the performance drop due to ANN-to-SNN conversion. The experimental results show that the proposed method achieves competitive performance at a significantly lower computational cost.
In future work, we plan to explore the combination of our logarithmic temporal coding, which sparsifies spike trains in time, with regularization techniques that sparsify spike trains across spiking neurons. Sparsifying spike trains across both space and time may yield further gains in computational efficiency.
Appendix A An exponentiate-and-fire neuron generates a logarithmic temporal coding spike train
We observe that an EF neuron doubles its membrane potential every time step if the neuron does not receive or fire any spike, as Lemma 3 states.
Lemma 3.
Let $t_1$ and $t_2$ be two time steps, where $t_1 < t_2$. The pre-reset membrane potential of an EF neuron satisfies $V(t_2) = 2^{t_2 - t_1} V(t_1)$ if the following conditions hold:

the EF neuron does not receive any input spike during the time interval $(t_1, t_2]$, and

the EF neuron does not fire any output spike during the time interval $[t_1, t_2)$.
Proof.
Depending on the value of , there are four cases: , , , and .
If , the logarithmic approximation of is 0, and the desired LTC spike train contains no spikes. For the EF neuron, by Lemma 3, remains zero or negative during the output time window, and the neuron does not fire any output spike during this time interval. Hence, the neuron generates the desired LTC spike train within its output time window.
If , with exponent range , the logarithmic approximation of is 0, and the desired LTC spike train contains no spikes.
For the EF neuron, by Lemma 3, the first output spike time satisfies the following condition:
(14) 
Solving the equation, we have
(15) 
since , . In other words, the neuron fires the first output spike after the end of its output time window. Hence, the neuron generates the desired LTC spike train within its output time window.
For the remaining cases, we derive the spike times of the desired LTC spike train and the output spike times of the EF neuron, and show at the end of this proof that the output spike train of the EF neuron is consistent with the desired LTC spike train.
The case where corresponds to the case of Eqn. 1 where . In this case, Eqn. 1 can be formulated as
(16) 
(17) 
(18) 
where the sum in Eqn. 16 runs across exponents from to the smallest . Note that gives the singlepower LA of . By substituting and into Eqn. 17, we derive the spike times of the desired LTC spike train:
(19) 
where is the th output spike time. For both multispike LTC and singlespike LTC, Eqn. 19 gives the first spike time . For multispike LTC, Eqn. 19 also gives subsequent spike times. By further substituting Eqn. 17 and into Inequality 18, we derive constraints on the spike times:
(20) 
(21) 
For the EF neuron, every output spike time within the output time window satisfies the following conditions:
(22) 
(23) 
The first output spike may be fired either at time , if ; or at time (Lemma 3), if . In both cases, the first output spike time satisfies Eqns. 14 and 15.
In the case of singlespike LTC, the EF neuron fires a single output spike at . In the case of multispike LTC, the EF neuron may fire subsequent output spikes. Consider every two consecutive output spike times and , where . By Lemma 3, the prereset membrane potentials can be formulated as
(24)  
(25) 
Solving the recurrence relation above, we have
(26) 
By substituting Eqn. 26 into Inequality 22 and considering Inequality 23, we have
(27) 
By substituting Eqn. 26 into Inequality 22 and solving the resulting inequality for the minimum integer value for , we have
(28) 
The case where corresponds to the case of Eqn. 1 where . In this case, Eqn. 1 can be formulated as
(29) 
(30) 
Note that gives the singlepower LA of . By substituting into Eqn. 30, we derive the spike times of the desired LTC spike train:
(31) 
For both multispike LTC and singlespike LTC, Eqn. 31 gives the first spike time . For multispike LTC, Eqn. 31 also gives subsequent spike times.
For the EF neuron, since , the first output spike time is
(32) 
In the case of singlespike LTC, the EF neuron fires only a single output spike. In the case of multispike LTC, suppose the EF neuron fires an output spike at the time step . By Lemma 3,
(33) 
It is easy to see that, if , then , and will be the next output spike time. Since , the EF neuron fires an output spike at every time step within its output time window. Hence,
(34) 
By comparing Eqn. 15 and 28 with Eqn. 19, Inequality 27 with Inequality 20, Inequality 23 with Inequality 21, and Eqn. 34 with Eqn. 31, it can be seen that the output spike train of the EF neuron within its output time window is consistent with the desired LTC spike train, except that every output spike time of the EF neuron is larger than the corresponding spike time of the desired LTC spike train. The difference is due to the fact that the output time window of the EF neuron starts at the time step .
Therefore, in all cases, the EF neuron generates an LTC spike train that encodes within its output time window, completing the proof. ∎
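To make the construction concrete, here is a hedged numerical sketch rather than the paper's exact formulation, assuming a threshold of 1.0, reset by subtraction (the multi-spike variant), and floating point in place of fixed-point arithmetic: an EF neuron that doubles its membrane potential every time step emits spikes at the positions of the 1-bits in the binary expansion of its initial charge, so the spike count grows at most logarithmically with the precision of the encoded value.

```python
def ef_neuron_spikes(x, window=8, threshold=1.0):
    """Output spike times for an initial charge 0 <= x < 1 over `window` steps."""
    v = x
    spikes = []
    for t in range(1, window + 1):
        v *= 2.0                  # exponentiate: double the potential each step
        if v >= threshold:
            spikes.append(t)      # fire ...
            v -= threshold        # ... and reset by subtraction (multi-spike)
    return spikes

# 0.625 = 2^-1 + 2^-3, i.e. 0.101 in binary: spikes land at steps 1 and 3.
print(ef_neuron_spikes(0.625))    # -> [1, 3]
```

Decoding by summing $2^{-t}$ over the spike times recovers 0.625 exactly, illustrating how the output spike train is an LTC encoding of the initial charge.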
References
 [1] J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” CoRR, vol. abs/1709.01507, 2017. [Online]. Available: http://arxiv.org/abs/1709.01507
 [2] K. Kowsari, D. E. Brown, M. Heidarysafa, K. J. Meimandi, M. S. Gerber, and L. E. Barnes, “Hdltex: Hierarchical deep learning for text classification,” CoRR, vol. abs/1709.08267, 2017. [Online]. Available: http://arxiv.org/abs/1709.08267
 [3] T. Cazenave, “Residual networks for computer go,” IEEE Transactions on Games, vol. 10, no. 1, pp. 107–110, March 2018.
 [4] R. Gütig and H. Sompolinsky, “The tempotron: a neuron that learns spike timing-based decisions,” Nature Neuroscience, vol. 9, no. 3, pp. 420–428, 2006.
 [5] F. Ponulak and A. Kasiński, “Supervised learning in spiking neural networks with resume: Sequence learning, classification, and spike shifting,” Neural Computation, vol. 22, no. 2, pp. 467–510, Feb 2010.
 [6] R. V. Florian, “The chronotron: A neuron that learns to fire temporally precise spike patterns,” PLOS ONE, vol. 7, no. 8, pp. 1–27, 08 2012. [Online]. Available: https://doi.org/10.1371/journal.pone.0040233
 [7] A. Mohemmed, S. Schliebs, S. Matsuda, and N. Kasabov, “Span: Spike pattern association neuron for learning spatiotemporal spike patterns,” International Journal of Neural Systems, vol. 22, no. 04, p. 1250012, 2012.
 [8] E. Neftci, S. Das, B. Pedroni, K. Kreutz-Delgado, and G. Cauwenberghs, “Event-driven contrastive divergence for spiking neuromorphic systems,” Frontiers in Neuroscience, vol. 7, p. 272, 2014. [Online]. Available: https://www.frontiersin.org/article/10.3389/fnins.2013.00272
 [9] P. Diehl and M. Cook, “Unsupervised learning of digit recognition using spike-timing-dependent plasticity,” Frontiers in Computational Neuroscience, vol. 9, p. 99, 2015. [Online]. Available: https://www.frontiersin.org/article/10.3389/fncom.2015.00099
 [10] S. M. Bohte, J. N. Kok, and H. L. Poutré, “Error-backpropagation in temporally encoded networks of spiking neurons,” Neurocomputing, vol. 48, no. 1–4, pp. 17–37, 2002. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0925231201006580
 [11] S. McKennoch, D. Liu, and L. G. Bushnell, “Fast modifications of the SpikeProp algorithm,” in The 2006 IEEE International Joint Conference on Neural Network Proceedings, July 2006, pp. 3970–3977.
 [12] O. Booij and H. tat Nguyen, “A gradient descent rule for spiking neurons emitting multiple spikes,” Information Processing Letters, vol. 95, no. 6, pp. 552 – 558, 2005, applications of Spiking Neural Networks. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0020019005001560
 [13] S. Ghosh-Dastidar and H. Adeli, “A new supervised learning algorithm for multiple spiking neural networks with application in epilepsy and seizure detection,” Neural Networks, vol. 22, no. 10, pp. 1419–1431, 2009. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0893608009000653
 [14] Y. Xu, X. Zeng, L. Han, and J. Yang, “A supervised multispike learning algorithm based on gradient descent for spiking neural networks,” Neural Networks, vol. 43, pp. 99 – 113, 2013. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0893608013000440
 [15] P. O’Connor and M. Welling, “Deep spiking networks,” CoRR, vol. abs/1602.08323, 2016. [Online]. Available: http://arxiv.org/abs/1602.08323
 [16] J. H. Lee, T. Delbruck, and M. Pfeiffer, “Training deep spiking neural networks using backpropagation,” Frontiers in Neuroscience, vol. 10, p. 508, 2016. [Online]. Available: https://www.frontiersin.org/article/10.3389/fnins.2016.00508

 [17] Y. Cao, Y. Chen, and D. Khosla, “Spiking deep convolutional neural networks for energy-efficient object recognition,” International Journal of Computer Vision, vol. 113, no. 1, pp. 54–66, May 2015. [Online]. Available: https://doi.org/10.1007/s11263-014-0788-3
 [18] P. U. Diehl, D. Neil, J. Binas, M. Cook, S. C. Liu, and M. Pfeiffer, “Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing,” in 2015 International Joint Conference on Neural Networks (IJCNN), July 2015, pp. 1–8.
 [19] B. Rueckauer, I.-A. Lungu, Y. Hu, M. Pfeiffer, and S.-C. Liu, “Conversion of continuous-valued deep networks to efficient event-driven networks for image classification,” Frontiers in Neuroscience, vol. 11, p. 682, 2017. [Online]. Available: https://www.frontiersin.org/article/10.3389/fnins.2017.00682
 [20] E. Hunsberger and C. Eliasmith, “Spiking deep networks with LIF neurons,” CoRR, vol. abs/1510.08829, 2015. [Online]. Available: http://arxiv.org/abs/1510.08829
 [21] Q. Liu, Y. Chen, and S. B. Furber, “Noisy softplus: an activation function that enables snns to be trained as anns,” CoRR, vol. abs/1706.03609, 2017. [Online]. Available: http://arxiv.org/abs/1706.03609
 [22] D. Zambrano, R. Nusselder, H. S. Scholte, and S. M. Bohte, “Efficient computation in adaptive artificial spiking neural networks,” CoRR, vol. abs/1710.04838, 2017. [Online]. Available: http://arxiv.org/abs/1710.04838
 [23] B. Rueckauer and S. Liu, “Conversion of analog to spiking neural networks using sparse temporal coding,” in 2018 IEEE International Symposium on Circuits and Systems (ISCAS), May 2018, pp. 1–5.
 [24] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
 [25] D. Miyashita, E. H. Lee, and B. Murmann, “Convolutional neural networks using logarithmic data representation,” CoRR, vol. abs/1603.01025, 2016. [Online]. Available: http://arxiv.org/abs/1603.01025
 [26] W. Gerstner and W. M. Kistler, Spiking Neuron Models: Single Neurons, Populations, Plasticity. Cambridge University Press, 2002.

 [27] M. Abadi, A. Agarwal, P. Barham et al., “TensorFlow: Large-scale machine learning on heterogeneous systems,” 2015, software available from tensorflow.org. [Online]. Available: https://www.tensorflow.org/
 [28] TensorFlow, “A guide to tf.layers: Building a convolutional neural network,” https://tensorflow.google.cn/tutorials/layers, 2018, accessed: 2018-04-10.