Introduction
Spiking Neural Networks (SNNs), regarded as the third generation of Artificial Neural Networks (ANNs) Maass (1997), have attracted great attention in recent years. Unlike traditional ANNs, which transmit information at every propagation cycle, SNNs deliver information through spikes only when the membrane potential reaches the firing threshold Gerstner and Kistler (2002). Due to their event-driven computation, sparse activation, and multiplication-free characteristics Roy et al. (2019), SNNs are more energy-efficient than ANNs on neuromorphic chips Schemmel et al. (2010); Furber et al. (2012); Merolla et al. (2014); Davies et al. (2018); Pei et al. (2019). In addition, SNNs have inherent adversarial robustness: under gradient-based attacks, the adversarial accuracy of SNNs is higher than that of ANNs with the same structure Sharmin et al. (2020). Nevertheless, the use of SNNs is still limited, as it remains challenging to train high-performance SNNs.
Generally, there are two main approaches to training a multi-layer SNN: (1) gradient-based optimization and (2) ANN-to-SNN conversion. Gradient-based optimization borrows the idea from ANNs and computes the gradient through backpropagation Lee et al. (2016, 2020). Although surrogate gradient methods have been proposed to mitigate the non-differentiability of the threshold-triggered firing of SNNs Shrestha and Orchard (2018); Wu et al. (2018); Neftci et al. (2019), this approach is still limited to shallow SNNs, as the gradient becomes highly unstable when the network goes deeper Zheng et al. (2021). Besides, gradient-based optimization requires more GPU computation than ANN training.

Unlike gradient-based optimization, ANN-to-SNN conversion builds a relationship between the activations of analog neurons and the dynamics of spiking neurons, and then maps the parameters of a well-trained ANN to an SNN with low accuracy loss Cao et al. (2015); Diehl et al. (2015); Rueckauer et al. (2017); Han et al. (2020). Thus high-performance SNNs can be obtained without additional training. ANN-to-SNN conversion requires nearly the same GPU computation and time as ANN training, and has yielded the best performance on deep network structures and large-scale datasets Deng and Gu (2021). Despite these advantages, there is a trade-off between accuracy and latency: to achieve the same precision as the original ANN, a long simulation time is needed to match the firing rate of a spiking neuron to the activation value of an analog neuron, which impedes the practical application of SNNs.
In this paper, we take a step towards high-performance converted SNNs with extremely low latency (fewer than 32 timesteps). Instead of introducing constraints that facilitate ANN-to-SNN conversion at the cost of model capacity, we show that the initialization of the membrane potentials, which is typically chosen to be zero for all neurons, can be optimized to alleviate the trade-off between accuracy and latency. Although zero initialization of the membrane potentials makes it easier to relate the activations of analog neurons to the dynamics of spiking neurons, it also comes with an inevitable long-latency problem. As illustrated in Fig. 1, we find that without proper initialization, the neurons in the converted SNN take a long time to fire their first spike, and thus the network is "inactive" in the first few timesteps. Based on this observation, we analyze ANN-to-SNN conversion theoretically and prove that the expectation of the squared conversion error reaches its minimum value when the initial membrane potential is half the firing threshold; meanwhile, the expectation of the conversion error reaches zero. By setting this optimal initial value in the converted SNN, we find a considerable decrease in inference time and a remarkable increase in accuracy at low inference time.
The main contributions of this paper can be summarized as follows:

We theoretically analyze ANN-to-SNN conversion and show that scaling the thresholds plays a similar role to weight normalization, which helps explain why threshold balancing can reduce the conversion loss and improve the inference latency.

We prove that the initialization of the membrane potentials, which is typically chosen to be zero for all neurons, can be optimized to implement expected error-free ANN-to-SNN conversion.

We demonstrate the effectiveness of the proposed method with deep network architectures on the CIFAR-10, CIFAR-100 and ImageNet datasets. The proposed method achieves state-of-the-art accuracy on nearly all tested datasets and network structures, using fewer timesteps.

We show that our method can be applied to other ANN-SNN conversion methods and remarkably promotes performance when the number of timesteps is small.
Related Work
Gradient-based optimization
The gradient-based optimization methods directly compute the gradient through backpropagation and can be divided into two categories Kim et al. (2020a): (1) activation-based methods and (2) timing-based methods. The activation-based methods unfold the SNN into discrete timesteps and compute the gradient with backpropagation through time (BPTT), borrowing the idea from training recurrent neural networks in ANNs Lee et al. (2016, 2020). As the gradient of the activation with respect to the membrane potential is non-differentiable, a surrogate gradient is often used Shrestha and Orchard (2018); Wu et al. (2018); Neftci et al. (2019); Chen et al. (2021); Fang et al. (2021a, b). However, there is a lack of rigorous theoretical analysis of the surrogate gradient Zenke and Vogels (2021); Zenke et al. (2021). When SNNs become deeper (e.g., 50 layers), the gradient becomes highly unstable and the networks suffer from the degradation problem Zheng et al. (2021). The timing-based methods use approximations to estimate the gradient of the firing times with respect to the membrane potential at the spike timing, which can significantly improve the runtime efficiency of BP training. However, they are usually limited to shallow networks (around 10 layers) Mostafa (2017); Kheradpisheh and Masquelier (2020); Zhang and Li (2020); Zhou et al. (2021); Wu et al. (2021).

ANN-to-SNN conversion

ANN-to-SNN conversion was first proposed by Cao et al. Cao et al. (2015), who train an ANN with ReLU activations and then convert it to an SNN by replacing the activations with spiking neurons. By properly mapping the parameters of the ANN to the SNN, deep SNNs can achieve performance comparable to deep ANNs. Further methods have been proposed to analyze the conversion loss and improve the overall performance of converted SNNs, such as weight normalization and threshold balancing Diehl et al. (2015); Rueckauer et al. (2016); Sengupta et al. (2019). A soft-reset mechanism is applied to IF neurons in previous work Rueckauer et al. (2016); Han et al. (2020) to avoid information loss when neurons are reset. These works can achieve lossless conversion with long inference time Kim et al. (2020b), but still suffer from severe accuracy loss with relatively few timesteps. Most recent studies focus on accelerating inference with converted SNNs. Stöckl and Maass Stöckl and Maass (2021) propose new spiking neurons to better relate ANNs to SNNs. Han and Roy Han and Roy (2020) use a time-based encoding scheme to speed up inference. RMP Han et al. (2020), RNL Ding et al. (2021) and TCL Ho and Chang (2020) try to alleviate the trade-off between accuracy and latency by adjusting the threshold dynamically. Ding et al. Ding et al. (2021) propose an optimal fit curve to quantify the fit between ANN activations and SNN firing rates, and demonstrate that the inference time can be reduced by optimizing the upper bound of the fit curve. Hwang et al. Hwang et al. (2021) propose a layer-wise search algorithm and perform extensive experiments to explore the best initial value of the membrane potential. Deng et al. Deng et al. (2020) and Li et al. Li et al. (2021) propose to shift the weights, biases and membrane potentials in each layer, achieving relatively low latency in converted SNNs. Different from the above methods, we directly optimize the initial membrane potential to increase performance at low inference time.

Methods
In this section, we first introduce the neuron models for ANNs and SNNs; then we derive the mathematical framework for ANN-to-SNN conversion. Based on this, we show that the initial membrane potential is essential to ANN-to-SNN conversion, and derive the optimal initialization to achieve expected error-free conversion.
ANNs and SNNs
The fundamental idea behind ANN-to-SNN conversion is to build a relationship between the activation value of an analog neuron and the firing rate of a spiking neuron. Based on this relationship, we can map the weights of a trained ANN to an SNN, so that high-performance SNNs can be obtained without additional training Cao et al. (2015). To be specific, for an ANN, the ReLU activation of the analog neurons in layer $l$ ($l = 1, 2, \dots, L$) can be described as:

$$\mathbf{a}^l = \max\left(\mathbf{W}^l \mathbf{a}^{l-1} + \mathbf{b}^l,\ 0\right), \tag{1}$$

where the vector $\mathbf{a}^l$ denotes the output activation values of all neurons in layer $l$, $\mathbf{W}^l$ is the weight matrix between the neurons in layer $l-1$ and the neurons in layer $l$, and $\mathbf{b}^l$ refers to the biases of the neurons in layer $l$.

For SNNs, we consider the Integrate-and-Fire (IF) model, which is commonly used in previous works Cao et al. (2015); Diehl et al. (2015); Han et al. (2020). In the IF model, if the spiking neurons in layer $l$ receive the input $\mathbf{x}^{l-1}(t)$ at time $t$, the temporal membrane potential $\mathbf{m}^l(t)$ can be formulated as the sum of the membrane potential $\mathbf{v}^l(t-1)$ at time $t-1$ and the weighted input:

$$\mathbf{m}^l(t) = \mathbf{v}^l(t-1) + \mathbf{W}^l \mathbf{x}^{l-1}(t) + \mathbf{b}^l, \tag{2}$$

where $\mathbf{x}^{l-1}(t)$ denotes the unweighted postsynaptic potentials from the presynaptic neurons in layer $l-1$ at time $t$, $\mathbf{W}^l$ is the synaptic weight matrix, and $\mathbf{b}^l$ is the bias potential of the spiking neurons in layer $l$. When any element $m_i^l(t)$ of $\mathbf{m}^l(t)$ exceeds the firing threshold $\theta^l$ of layer $l$, the neuron elicits a spike with unweighted postsynaptic potential $x_i^l(t)$:

$$s_i^l(t) = H\!\left(m_i^l(t) - \theta^l\right), \tag{3}$$

$$x_i^l(t) = \theta^l\, s_i^l(t). \tag{4}$$

Here $s_i^l(t)$ is the $i$-th element of $\mathbf{s}^l(t)$, which denotes the output spike at time $t$ and equals 1 if there is a spike and 0 otherwise, and $H(\cdot)$ is the Heaviside step function. After firing a spike, the membrane potential at the next timestep goes back to a reset value. Two approaches are commonly used to reset the potential: "reset-to-zero" and "reset-by-subtraction". As there is obvious information loss in "reset-to-zero" Rueckauer et al. (2017); Han and Roy (2020), we adopt the "reset-by-subtraction" mechanism in this paper: after firing, the membrane potential is reduced by an amount equal to the firing threshold $\theta^l$. Thus the membrane potential updates according to:

$$\mathbf{v}^l(t) = \mathbf{m}^l(t) - \theta^l\, \mathbf{s}^l(t). \tag{5}$$
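As a concrete illustration of Eqs. (2)-(5), the following minimal sketch simulates a single IF neuron with reset-by-subtraction. The function name, toy input, and threshold are our own illustrative choices, not from the paper:

```python
def simulate_if_neuron(inputs, theta, v0=0.0):
    """Single IF neuron with reset-by-subtraction (Eqs. (2)-(5)).

    inputs: weighted input current (W x(t) + b) at each timestep
    theta:  firing threshold
    v0:     initial membrane potential
    Returns the list of output spikes (1 = spike, 0 = no spike).
    """
    v = v0
    spikes = []
    for current in inputs:
        m = v + current             # temporal membrane potential, Eq. (2)
        s = 1 if m >= theta else 0  # threshold-triggered spike, Eqs. (3)-(4)
        v = m - s * theta           # reset by subtraction, Eq. (5)
        spikes.append(s)
    return spikes

# With a constant input of 0.55 and threshold 1.0 over 8 timesteps, the
# neuron fires 4 spikes, so theta * rate = 0.5 approximates the analog
# activation 0.55; the gap shrinks as the simulation grows longer.
spikes = simulate_if_neuron([0.55] * 8, theta=1.0)
rate = sum(spikes) / len(spikes)
```

This rate-coding correspondence between the spike count and the analog activation is exactly what the conversion framework below formalizes.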
Theory for ANN-SNN conversion
In order to relate the firing rates of an SNN to the activation values of an ANN, we substitute Eq. (2) into Eq. (5), accumulate over time from $1$ to $T$, divide by $T$, and get:

$$\frac{\mathbf{v}^l(T) - \mathbf{v}^l(0)}{T} = \mathbf{W}^l\, \frac{\sum_{t=1}^{T}\mathbf{x}^{l-1}(t)}{T} + \mathbf{b}^l - \theta^l\, \frac{\sum_{t=1}^{T}\mathbf{s}^l(t)}{T}. \tag{6}$$

We use $\mathbf{r}^l(T) = \frac{1}{T}\sum_{t=1}^{T}\mathbf{s}^l(t)$ to denote the firing rates of the spiking neurons in layer $l$ during the period from time $1$ to $T$, and substitute Eq. (4) into Eq. (6) to eliminate $\mathbf{x}^{l-1}(t)$; we have:

$$\mathbf{r}^l(T) = \frac{\mathbf{W}^l\, \theta^{l-1}\mathbf{r}^{l-1}(T) + \mathbf{b}^l}{\theta^l} - \frac{\mathbf{v}^l(T) - \mathbf{v}^l(0)}{\theta^l\, T}. \tag{7}$$

Note that Eq. (7) is the core equation of ANN-SNN conversion. It describes the relationship between the firing rates of neurons in adjacent layers of an SNN, and can be related to the forwarding process of an ANN (Eq. (1)). To see this, we make the following assumption: the inference time (latency) $T$ is large enough so that $T \to \infty$ and $\frac{\mathbf{v}^l(T) - \mathbf{v}^l(0)}{\theta^l T} \to 0$. Hence Eq. (7) can be simplified as:

$$\mathbf{r}^l = \frac{\mathbf{W}^l\, \theta^{l-1}\mathbf{r}^{l-1} + \mathbf{b}^l}{\theta^l} = \mathrm{clip}\!\left(\frac{\mathbf{W}^l\, \theta^{l-1}\mathbf{r}^{l-1} + \mathbf{b}^l}{\theta^l},\ 0,\ 1\right), \tag{8}$$

where $\mathrm{clip}(x, 0, 1) = \min(\max(x, 0), 1)$ is applied elementwise. The last equality holds as the firing rate is strictly restricted to $[0, 1]$. By contrast, the ReLU activation values of the ANN in Eq. (1) only need to satisfy $\mathbf{a}^l \geqslant 0$. In fact, the activation values of an ANN have an upper bound for a countable, limited dataset. Thus we can normalize all activation values in Eq. (1). Specifically, assuming that $\hat{\mathbf{a}}^l = \mathbf{a}^l / \lambda^l$, where $\lambda^l$ denotes the maximum value of $\mathbf{a}^l$, we can rewrite Eq. (1) as:

$$\hat{\mathbf{a}}^l = \mathrm{clip}\!\left(\frac{\mathbf{W}^l\, \lambda^{l-1}\hat{\mathbf{a}}^{l-1} + \mathbf{b}^l}{\lambda^l},\ 0,\ 1\right). \tag{9}$$

By comparing Eq. (8) and Eq. (9), we find that an ANN can be converted to an SNN by copying both the weights and the biases, and setting the firing threshold $\theta^l$ equal to the upper bound $\lambda^l$ of the ReLU activations of the analog neurons. This result helps explain the previous finding that scaling the firing thresholds can reduce the conversion loss and improve the inference latency Han and Roy (2020): scaling the thresholds plays a similar role to the weight normalization technique Diehl et al. (2015); Rueckauer et al. (2017) used in ANN-to-SNN conversion.
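This conversion rule (copy the weight and bias unchanged, set the threshold to the maximum ReLU activation observed on calibration data) can be sketched for a single neuron as follows. The weight, bias, and toy calibration inputs are illustrative assumptions of ours, not values from the paper:

```python
def relu(x):
    return max(x, 0.0)

def convert_and_run(w, b, calib_inputs, x, T):
    """Convert one ReLU unit to an IF neuron and run it for T timesteps."""
    # Threshold balancing: theta equals the upper bound of the ANN activation.
    theta = max(relu(w * a + b) for a in calib_inputs)
    v, n_spikes = 0.0, 0
    for _ in range(T):           # constant input current at every timestep
        v += w * x + b           # integrate, Eq. (2)
        if v >= theta:           # fire and reset by subtraction, Eqs. (3)-(5)
            v -= theta
            n_spikes += 1
    return theta * n_spikes / T  # average postsynaptic potential

w, b = 0.8, 0.1
calib = [0.0, 0.5, 1.0]          # calibration inputs; max activation is 0.9
ann_out = relu(w * 0.5 + b)      # analog activation for input 0.5
snn_out = convert_and_run(w, b, calib, x=0.5, T=1000)
assert abs(ann_out - snn_out) < 1e-2
```

With more timesteps the residual term $(\mathbf{v}^l(T) - \mathbf{v}^l(0))/T$ shrinks and the gap closes further, mirroring the accuracy-latency trade-off discussed in the introduction.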
From another perspective, we can directly relate the postsynaptic potentials of spiking neurons in adjacent layers to the forwarding process of an ANN (Eq. (1)). If we use $\boldsymbol{\phi}^{l-1}(T) = \frac{1}{T}\sum_{t=1}^{T}\mathbf{x}^{l-1}(t)$ to denote the average postsynaptic potential of the presynaptic neurons in layer $l-1$ during the period from time $1$ to $T$, and note that $\boldsymbol{\phi}^{l}(T) = \theta^{l}\mathbf{r}^{l}(T)$ by Eq. (4), substituting into Eq. (7) yields:

$$\boldsymbol{\phi}^l(T) = \mathbf{W}^l\, \boldsymbol{\phi}^{l-1}(T) + \mathbf{b}^l - \frac{\mathbf{v}^l(T) - \mathbf{v}^l(0)}{T}. \tag{10}$$

By comparing Eq. (1) and Eq. (10), we reach the same conclusion: if the inference time (latency) is large enough, an ANN can be converted to an SNN by copying both the weights and the biases. Note that although $\theta^l$ does not appear in Eq. (10), it should still equal the maximum value of $\mathbf{a}^l$, due to $\boldsymbol{\phi}^l(T) \in [0, \theta^l]$.
Optimal initialization of membrane potentials
The exact equivalence between the forwarding process of an ANN and the firing rates (or postsynaptic potentials) of adjacent layers of an SNN discussed above depends on the assumption that the inference time $T$ is large enough, so that $T \to \infty$ and $\frac{\mathbf{v}^l(T) - \mathbf{v}^l(0)}{\theta^l T} \to 0$. This incurs a long simulation time for SNNs when applied to complicated datasets. Moreover, under low-latency constraints, there is an intrinsic difference between $\mathbf{a}^l$ and $\boldsymbol{\phi}^l(T)$, which propagates layer by layer and results in considerable accuracy degradation for converted SNNs. In this subsection, we analyze the impact of membrane potential initialization and show that optimal initialization can implement expected error-free ANN-to-SNN conversion.
According to Eq. (10), we can rewrite the relationship between the postsynaptic potentials (or firing rates) of spiking neurons in adjacent layers in a new way:

$$\boldsymbol{\phi}^l(T) = \mathbf{z}^l + \frac{\mathbf{v}^l(0) - \mathbf{v}^l(T)}{T}, \qquad \mathbf{z}^l = \mathbf{W}^l\, \boldsymbol{\phi}^{l-1}(T) + \mathbf{b}^l, \tag{11}$$

$$\boldsymbol{\phi}^l(T) = \mathrm{clip}\!\left(\frac{\theta^l}{T}\left\lfloor \frac{\mathbf{z}^l\, T + \mathbf{v}^l(0)}{\theta^l} \right\rfloor,\ 0,\ \theta^l\right). \tag{12}$$

Here $\mathbf{z}^l\, T + \mathbf{v}^l(0)$ represents the accumulated potential from time $0$ to $T$, and $\lfloor \cdot \rfloor$ denotes the floor function, which returns the maximum integer smaller than or equal to its argument. Eq. (12) holds as $\mathbf{v}^l(T) \in [0, \theta^l)$. By comparing Eq. (11) with Eq. (1), and Eq. (12) with Eq. (9), we find that there exist inherent quantization errors between ANNs and SNNs due to the discrete nature of the firing rate. Here we propose a simple and direct way to reduce this error and improve the conversion: optimizing the initialization of the membrane potentials.
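The floor-quantized closed form of Eq. (12) can be checked numerically for a single neuron with constant per-step input (our own sketch; function names and all constants are illustrative):

```python
import math

def psp_simulated(z, theta, v0, T):
    """Average postsynaptic potential from simulating the IF dynamics."""
    v, n_spikes = v0, 0
    for _ in range(T):
        v += z                     # constant per-step input z
        if v >= theta:             # fire and reset by subtraction
            v -= theta
            n_spikes += 1
    return theta * n_spikes / T

def psp_closed_form(z, theta, v0, T):
    """Eq. (12): clip((theta/T) * floor((z*T + v0)/theta), 0, theta)."""
    raw = theta / T * math.floor((z * T + v0) / theta)
    return min(max(raw, 0.0), theta)

# The simulation and the closed form agree for various inputs and initial
# potentials (z < theta guarantees at most one spike per timestep).
for z in (0.13, 0.4, 0.77):
    for v0 in (0.0, 0.5):
        assert psp_simulated(z, 1.0, v0, 32) == psp_closed_form(z, 1.0, v0, 32)
```

The staircase produced by the floor is exactly the quantization error discussed in the text: the simulated output can only take values on the grid $\{0, \theta^l/T, 2\theta^l/T, \dots\}$.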
To be specific, we suppose that the ReLU activation in layer $l-1$ of the ANN equals the postsynaptic potential in layer $l-1$ of the SNN, that is, $\mathbf{a}^{l-1} = \boldsymbol{\phi}^{l-1}(T)$, and then compare the outputs of the ANN and the SNN in layer $l$. For convenience of representation, we use $f_{\mathrm{ANN}}$ to denote the activation function of the ANN, that is, $\mathbf{a}^l = f_{\mathrm{ANN}}(\mathbf{z}^l)$ with $f_{\mathrm{ANN}}(\mathbf{z}^l) = \max(\mathbf{z}^l, 0)$. Besides, we use $f_{\mathrm{SNN}}$ to denote the activation function of the SNN, that is, $\boldsymbol{\phi}^l(T) = f_{\mathrm{SNN}}(\mathbf{z}^l)$ with $f_{\mathrm{SNN}}(\mathbf{z}^l) = \mathrm{clip}\!\left(\frac{\theta^l}{T}\left\lfloor \frac{\mathbf{z}^l T + \mathbf{v}^l(0)}{\theta^l} \right\rfloor,\ 0,\ \theta^l\right)$.

The expected squared difference between $\boldsymbol{\phi}^l(T)$ and $\mathbf{a}^l$ can be defined as:

$$\mathbb{E}\left[\left(\boldsymbol{\phi}^l(T) - \mathbf{a}^l\right)^2\right] = \mathbb{E}\left[\left(f_{\mathrm{SNN}}(\mathbf{z}^l) - f_{\mathrm{ANN}}(\mathbf{z}^l)\right)^2\right]. \tag{13}$$

Note that as the threshold $\theta^l$ is set to the maximum activation value of the ANN, each element $z_i^l$ of $\mathbf{z}^l$ always falls into the interval $[0, \theta^l]$. If we assume that $z_i^l$ is uniformly distributed within every small interval $\left[\frac{k\theta^l}{T}, \frac{(k+1)\theta^l}{T}\right)$ ($k = 0, 1, \dots, T-1$), then we can obtain the optimal initialization of the membrane potentials. We have the following theorem.

Theorem 1. The expectation of the squared conversion error (Eq. (13)) reaches its minimum value when the initial membrane potential is $\mathbf{v}^l(0) = \frac{\boldsymbol{\theta}^l}{2}$; meanwhile, the expectation of the conversion error reaches $\mathbf{0}$, that is:

$$\operatorname*{arg\,min}_{\mathbf{v}^l(0)}\ \mathbb{E}\left[\left(\boldsymbol{\phi}^l(T) - \mathbf{a}^l\right)^2\right] = \frac{\boldsymbol{\theta}^l}{2}, \tag{14}$$

$$\mathbb{E}\left[\boldsymbol{\phi}^l(T) - \mathbf{a}^l \,\middle|\, \mathbf{v}^l(0) = \frac{\boldsymbol{\theta}^l}{2}\right] = \mathbf{0}. \tag{15}$$

Here $\frac{\boldsymbol{\theta}^l}{2}$ is the vector whose elements all equal $\frac{\theta^l}{2}$. The detailed proof is given in the Appendix. Theorem 1 implies that when $\mathbf{v}^l(0) = \frac{\boldsymbol{\theta}^l}{2}$, not only is the expectation of the squared conversion error minimized, but the expectation of the conversion error is zero as well. Thus optimal initialization of the membrane potentials can implement expected error-free ANN-to-SNN conversion.
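Theorem 1 can also be checked with a quick Monte-Carlo experiment (our own illustration under the uniform-input assumption; all constants are arbitrary): drawing the pre-activation uniformly from $[0, \theta^l]$, the mean conversion error vanishes at $v^l(0) = \theta^l/2$, while zero initialization gives a systematic negative bias and a larger squared error.

```python
import math
import random

def conversion_errors(v0, theta=1.0, T=8, n=200_000, seed=0):
    """Mean and mean-squared conversion error of Eq. (12) vs. the ANN output."""
    rng = random.Random(seed)
    err_sum = sq_sum = 0.0
    for _ in range(n):
        z = rng.uniform(0.0, theta)        # ANN activation, uniform in [0, theta]
        snn = theta / T * math.floor((z * T + v0) / theta)
        snn = min(max(snn, 0.0), theta)    # SNN output, Eq. (12)
        err = snn - z
        err_sum += err
        sq_sum += err * err
    return err_sum / n, sq_sum / n

mean_half, sq_half = conversion_errors(v0=0.5)  # optimal init: theta / 2
mean_zero, sq_zero = conversion_errors(v0=0.0)  # common zero init
# mean_half is ~0, while mean_zero is ~ -theta/(2T) and sq_zero > sq_half.
```

Intuitively, with zero initialization the floor always rounds down, so every neuron under-reports its activation by half a quantization step on average; starting at half the threshold centers the rounding and cancels the bias.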
Experiments
Implementation details
We evaluate the performance of our method on classification tasks on the CIFAR-10, CIFAR-100 and ImageNet datasets. For comparison, we use the VGG16, ResNet20 and ResNet18 network structures, as in previous work. The proposed ANN-to-SNN conversion algorithm is given in Algorithm 1. For the source ANN, we replace all max-pooling layers with average-pooling layers. In addition, similar to Ho and Chang (2020), we add trainable clipping layers to the source ANNs, enabling a better setting of the firing thresholds of the converted SNNs. For the SNN, we copy both weights and biases from the source ANN, and set the firing threshold $\theta^l$ equal to the upper bound of the activations of the analog neurons. Besides, the initial membrane potential of all spiking neurons in layer $l$ is set to the same optimal value $\theta^l/2$. The details of the preprocessing, parameter configuration, and training are as follows.

Table 1: Accuracy comparison with state-of-the-art ANN-to-SNN conversion methods on CIFAR-10.

Method  ANN Acc.  T=8  T=16  T=32  T=64  T=128  T=256  T=512
VGG16 Simonyan and Zisserman (2014) on CIFAR-10  
Robust Norm Rueckauer et al. (2017) ^{1}  92.82    10.11  43.03  81.52  90.80  92.75  92.75 
Spike Norm Sengupta et al. (2019)  91.70              91.55 
Hybrid Train Rathi et al. (2020)  92.81          91.13    92.48 
RMP Han et al. (2020)  93.63      60.30  90.35  92.41  93.04  93.63 
TSC Han and Roy (2020)  93.63        92.79  93.27  93.45  93.63 
Opt. Deng and Gu (2021)  95.72      76.24  90.64  94.11  95.33  95.73 
RNL Ding et al. (2021)  92.82    57.90  85.40  91.15  92.51  92.95  92.95 
Calibration Li et al. (2021)  95.72      93.71  95.14  95.65  95.79  95.79 
Ours  94.57  90.96  93.38  94.20  94.45  94.50  94.49  94.55 
ResNet20 He et al. (2016) on CIFAR-10  
SpikeNorm Sengupta et al. (2019)  89.10              87.46 
Hybrid Train Rathi et al. (2020)  93.15            92.22  92.94 
RMP Han et al. (2020)  91.47          87.60  89.37  91.36 
TSC Han and Roy (2020)  91.47        69.38  88.57  90.10  91.42 
Ours  92.74  66.24  87.22  91.88  92.57  92.73  92.76  92.75 
ResNet18 He et al. (2016) on CIFAR-10  
Opt. Deng and Gu (2021) ^{2}  95.46      84.06  92.48  94.68  95.30  94.42 
Calibration Li et al. (2021)^{2}  95.46      94.78  95.30  95.42  95.41  95.45 
Ours  96.04  75.44  90.43  94.82  95.92  96.08  96.06  96.06 

^{1} Our implementation of Robust Norm.
^{2} Instead of using the standard ResNet18 or ResNet20, they add two more layers to the standard ResNet18.
Table 2: Accuracy comparison with state-of-the-art ANN-to-SNN conversion methods on CIFAR-100.

Method  ANN Acc.  T=8  T=16  T=32  T=64  T=128  T=256  T=512
VGG16 Simonyan and Zisserman (2014) on CIFAR-100  
SpikeNorm Sengupta et al. (2019)  71.22              70.77 
RMP Han et al. (2020)  71.22          63.76  68.34  70.93 
TSC Han and Roy (2020)  71.22          69.86  70.65  70.97 
Opt. Deng and Gu (2021)  77.89      7.64  21.84  55.04  73.54  77.71 
Calibration Li et al. (2021)  77.89      73.55  76.64  77.40  77.68  77.87 
Ours  76.31  60.49  70.72  74.82  75.97  76.25  76.29  76.31 
ResNet20 He et al. (2016) on CIFAR-100  
SpikeNorm Sengupta et al. (2019)  69.72              64.09 
RMP Han et al. (2020)  68.72      27.64  46.91  57.69  64.06  67.82 
TSC Han and Roy (2020)  68.72          58.42  65.27  68.18 
Ours  70.43  23.09  52.34  67.18  69.96  70.51  70.59  70.53 
ResNet18 He et al. (2016) on CIFAR-100  
Opt. Deng and Gu (2021) ^{1}  77.16      51.27  70.12  75.81  77.22  77.19 
Calibration Li et al. (2021) ^{1}  77.16      76.32  77.29  77.73  77.63  77.25 
Ours  79.36  57.70  72.85  77.86  78.98  79.20  79.26  79.28 

^{1} Instead of using the standard ResNet18 or ResNet20, they add two more layers to the standard ResNet18.
Table 3: Accuracy comparison with state-of-the-art ANN-to-SNN conversion methods on ImageNet.

Method  ANN Acc.  T=8  T=16  T=32  T=64  T=128  T=256  T=512
VGG16 on ImageNet  
RMP Han et al. (2020)  73.49            48.32  73.09 
TSC Han and Roy (2020)  73.49            69.71  73.46 
Opt. Deng and Gu (2021)  75.36      0.114  0.118  0.122  1.81  73.88 
Calibration (advanced) Li et al. (2021)  75.36      63.64  70.69  73.32  74.23  75.32 
Ours  74.85  6.25  36.02  64.70  72.47  74.24  74.62  74.69 
Preprocessing. We randomly crop the images of the CIFAR-10 and CIFAR-100 datasets to 32×32 after padding 4 pixels, and then apply random horizontal flips to avoid overfitting. Besides, we use Cutout DeVries and Taylor (2017) with the recommended parameters; specifically, the number of holes and the length are 1 and 16 for CIFAR-10, and 1 and 8 for CIFAR-100. The AutoAugment Cubuk et al. (2019) policy is also applied to both datasets. Finally, we normalize the data of all datasets so that the input values have zero mean and unit standard deviation. For the ImageNet dataset, we randomly crop and resize the images to 224×224. We also apply ColorJitter and label smoothing Szegedy et al. (2016) during training. As with the CIFAR datasets, we normalize all input data to zero mean and unit standard deviation.

Hyper-parameters. When training the ANNs, we use the Stochastic Gradient Descent optimizer Bottou (2012) with a momentum of 0.9 and a cosine decay scheduler Loshchilov and Hutter (2017) to adjust the learning rate. The initial learning rates for CIFAR-10 and CIFAR-100 are 0.1 and 0.02, respectively, and each model is trained for 300 epochs. For the ImageNet dataset, the initial learning rate is set to 0.1 and the total number of epochs to 120. The L2-regularization coefficient of the weights and biases and the weight decay of the upper bound parameter are set separately per dataset and architecture (VGG16 on CIFAR-10; ResNet18/20 on CIFAR-10; VGG16 and ResNet18/20 on CIFAR-100; VGG16 on ImageNet).

Training details. When evaluating our converted SNNs, we use constant input encoding of the test images. In Fig. 4, we train ResNet20 networks on the CIFAR-10 dataset for RMP and RNL, respectively. The RMP model is reproduced according to the paper Han et al. (2020), and its performance is slightly higher than the authors' report. The RNL model Ding et al. (2021) is tested with the code on GitHub provided by the authors. All experiments are implemented in PyTorch on an NVIDIA Tesla V100 GPU.
The effect of membrane potential initialization
We first evaluate the effectiveness of the proposed membrane potential initialization. We train an ANN and convert it to four SNNs with different initial membrane potentials. Fig. 2 illustrates how the accuracy of the converted SNNs changes with respect to latency. The blue curve denotes zero initialization, namely no initialization. The orange, green, and red curves denote optimal initialization, random initialization from a uniform distribution, and random initialization from a Gaussian distribution, respectively. One can find that the performance of the converted SNNs with non-zero initialization (orange, green and red curves) is much better than that of the converted SNN with zero initialization (blue curve), and the converted SNN with optimal initialization achieves the best performance. Moreover, the SNN with zero initialization cannot work if the latency is fewer than 10 timesteps. This phenomenon can be explained as follows: as illustrated in Fig. 1, without initialization, the neurons in the converted SNN take a long time to fire their first spikes, and thus the network is "inactive" in the first few timesteps. When the latency is large enough (256 timesteps), all these methods reach the same accuracy as the source ANN (dotted line).

Then we compare different constant initial membrane potentials, ranging from 0 to 1. The results are shown in Fig. 3. Overall, more timesteps bring a more apparent performance improvement, and the performance of all converted SNNs approaches that of the source ANN given enough timesteps. Furthermore, the converted SNNs with non-zero initialization are much better than the converted SNN with zero initialization. Note that in this experiment, all thresholds of the spiking neurons are rescaled to 1 by scaling the weights and biases, so the theoretically optimal initial membrane potential is 0.5. The SNNs converted from VGG16/ResNet20 with an initial membrane potential of 0.5 achieve optimal or near-optimal performance on the CIFAR-10 and CIFAR-100 datasets. In fact, the derivation of the optimal initial membrane potential is based on the assumption of a piecewise uniform distribution, so a small deviation may be expected when the assumption is not strictly satisfied. To verify this further, we analyze the performance of converted SNNs with different initial values. As shown in Figure 5 in the Appendix, the optimal initial value always lies between 0.4 and 0.6, and the converted SNN with an initial membrane potential of 0.5 achieves optimal or near-optimal performance.
Comparison with the State of the Art
We compare our method to other state-of-the-art ANN-to-SNN conversion methods on the CIFAR-10 dataset; the results are listed in Table 1. Our model achieves nearly lossless conversion with a small inference time. For VGG16, the proposed method reaches an accuracy of 94.20% using only 32 timesteps, whereas Robust Norm, RMP, Opt., RNL and Calibration reach 43.03%, 60.30%, 76.24%, 85.40% and 93.71%, respectively, at the end of 32 timesteps. Moreover, the proposed method achieves an accuracy of 90.96% using only 8 timesteps, which is 8 times faster than RMP and Opt. at 64 timesteps. For ResNet20, it reaches 91.88% top-1 accuracy with 32 timesteps. Note that the works of Opt. and Calibration add two layers to the standard ResNet18 rather than using the standard ResNet18 or ResNet20 structure. For a fair comparison, we add experiments converting an SNN from ResNet18. For the same timesteps, the performance of our method is much better than Opt., and nearly the same as Calibration, which uses an advanced pipeline to calibrate the error and adds two more layers. Moreover, we achieve an accuracy of 75.44% even when the number of timesteps is only 8. These results show that the proposed method outperforms previous work and enables fast inference.
Next, we test the performance of our method on the CIFAR-100 dataset. Table 2 compares our method with other state-of-the-art methods. For VGG16, the proposed method reaches an accuracy of 74.82% using only 32 timesteps, whereas Opt. and Calibration reach 7.64% and 73.55% with the same number of timesteps. Moreover, for ResNet20 and ResNet18, our method reaches 52.34% and 72.85% top-1 accuracy, respectively, with 16 timesteps. These results demonstrate that our method achieves state-of-the-art accuracy using fewer timesteps.
Finally, we test our method on the ImageNet dataset with the VGG16 architecture. Table 3 compares the results with other state-of-the-art methods. The proposed method achieves 64.70% top-1 accuracy using only 32 timesteps and 72.47% top-1 accuracy with only 64 timesteps. These results demonstrate that our method remains effective on very large datasets and reaches state-of-the-art accuracy on all tested datasets.
Applying optimal initial potentials to other models
Here we test whether the proposed algorithm can be applied to other ANN-to-SNN conversion models. We consider the RMP Han et al. (2020) and RNL Ding et al. (2021) models. For each model, we train a ResNet20 network on the CIFAR-10 dataset and then convert it to two SNNs, with and without the optimal initial membrane potential. As illustrated in Fig. 4, the performance of the converted SNN with the optimal initial potential (orange curve) is much better than that of the SNN without initialization (blue curve). For the RMP model, the converted SNN with optimal initialization outperforms the original SNN by 20% in accuracy (87% vs 67%) using 64 timesteps. For the RNL model, the converted SNN with optimal initialization outperforms the original SNN by 37% in accuracy (82% vs 45%) using 64 timesteps, and by 45% in accuracy (70% vs 25%) using 32 timesteps. These results imply that our method is compatible with many ANN-to-SNN conversion methods and can remarkably improve performance when the number of timesteps is small.
Conclusion
In this paper, we theoretically derive the relationship between the forwarding process of an ANN and the dynamics of an SNN. We demonstrate that optimal initialization of the membrane potentials not only implements expected error-free ANN-to-SNN conversion, but also reduces the time to the first spike of neurons and thus shortens the inference time. Besides, we show that the converted SNN with the optimal initial potential outperforms state-of-the-art methods on the CIFAR-10, CIFAR-100 and ImageNet datasets. Moreover, our algorithm is compatible with many ANN-to-SNN conversion methods and can remarkably promote performance at low inference time.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (62176003, 62088102, 61961130392).
References
Bottou (2012). Stochastic gradient descent tricks. In Neural Networks: Tricks of the Trade, pp. 421–436.
Cao et al. (2015). Spiking deep convolutional neural networks for energy-efficient object recognition. International Journal of Computer Vision 113(1), pp. 54–66.
Chen et al. (2021). Pruning of deep spiking neural networks through gradient rewiring. In International Joint Conference on Artificial Intelligence, pp. 1713–1721.
Cubuk et al. (2019). AutoAugment: learning augmentation strategies from data. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 113–123.
Davies et al. (2018). Loihi: a neuromorphic manycore processor with on-chip learning. IEEE Micro 38(1), pp. 82–99.
Deng et al. (2020). Rethinking the performance comparison between SNNs and ANNs. Neural Networks 121, pp. 294–307.
Deng and Gu (2021). Optimal conversion of conventional artificial neural networks to spiking neural networks. In International Conference on Learning Representations.
DeVries and Taylor (2017). Improved regularization of convolutional neural networks with Cutout. arXiv preprint arXiv:1708.04552.
Diehl et al. (2015). Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing. In International Joint Conference on Neural Networks, pp. 1–8.
Ding et al. (2021). Optimal ANN-SNN conversion for fast and accurate inference in deep spiking neural networks. In International Joint Conference on Artificial Intelligence, pp. 2328–2336.
Fang et al. (2021a). Deep residual learning in spiking neural networks. In Thirty-Fifth Conference on Neural Information Processing Systems.
Fang et al. (2021b). Incorporating learnable membrane time constant to enhance learning of spiking neural networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2661–2671.
Furber et al. (2012). Overview of the SpiNNaker system architecture. IEEE Transactions on Computers 62(12), pp. 2454–2467.
Gerstner and Kistler (2002). Spiking Neuron Models: Single Neurons, Populations, Plasticity. Cambridge University Press.
Han and Roy (2020). Deep spiking neural network: energy efficiency through time based coding. In European Conference on Computer Vision, pp. 388–404.
Han et al. (2020). RMP-SNN: residual membrane potential neuron for enabling deeper high-accuracy and low-latency spiking neural network. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 13558–13567.
He et al. (2016). Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
Ho and Chang (2020). TCL: an ANN-to-SNN conversion with trainable clipping layers. arXiv preprint arXiv:2008.04509.
Hu et al. (2018). Spiking deep residual network. arXiv preprint arXiv:1805.01352.
Hwang et al. (2021). Low-latency spiking neural networks using pre-charged membrane potential and delayed evaluation. Frontiers in Neuroscience 15, pp. 135.
Kheradpisheh and Masquelier (2020). Temporal backpropagation for spiking neural networks with one spike per neuron. International Journal of Neural Systems 30(6), pp. 2050027.
Kim et al. (2020a). Unifying activation- and timing-based learning rules for spiking neural networks. In Advances in Neural Information Processing Systems, pp. 19534–19544.
Kim et al. (2020b). Spiking-YOLO: spiking neural network for energy-efficient object detection. In AAAI Conference on Artificial Intelligence, pp. 11270–11277.
Lee et al. (2020). Enabling spike-based backpropagation for training deep neural network architectures. Frontiers in Neuroscience 14.
Lee et al. (2016). Training deep spiking neural networks using backpropagation. Frontiers in Neuroscience 10, pp. 508.
Li et al. (2021). A free lunch from ANN: towards efficient, accurate spiking neural networks calibration. In International Conference on Machine Learning, pp. 6316–6325.
Loshchilov and Hutter (2017). SGDR: stochastic gradient descent with warm restarts. In International Conference on Learning Representations.
Maass (1997). Networks of spiking neurons: the third generation of neural network models. Neural Networks 10(9), pp. 1659–1671.
Merolla et al. (2014). A million spiking-neuron integrated circuit with a scalable communication network and interface. Science 345(6197), pp. 668–673.
Mostafa (2017). Supervised learning based on temporal coding in spiking neural networks. IEEE Transactions on Neural Networks and Learning Systems 29(7), pp. 3227–3235.
Neftci et al. (2019). Surrogate gradient learning in spiking neural networks: bringing the power of gradient-based optimization to spiking neural networks. IEEE Signal Processing Magazine 36(6), pp. 51–63.
Pei et al. (2019). Towards artificial general intelligence with hybrid Tianjic chip architecture. Nature 572(7767), pp. 106–111.

A reconfigurable online learning spiking neuromorphic processor comprising 256 neurons and 128K synapses
. Frontiers in Neuroscience 9, pp. 141. Cited by: Energy Estimation on Neuromorphic Hardware.  Enabling deep spiking neural networks with hybrid conversion and spike timing dependent backpropagation. In International Conference on Learning Representations, Cited by: Table 1.
 Towards spikebased machine intelligence with neuromorphic computing. Nature 575 (7784), pp. 607–617. Cited by: Introduction.
 Conversion of continuousvalued deep networks to efficient eventdriven networks for image classification. Frontiers in Neuroscience 11, pp. 682. Cited by: Introduction, ANNs and SNNs, Theory for ANNSNN conversion, Table 1.
 Theory and tools for the conversion of analog to spiking convolutional neural networks. arXiv preprint arXiv:1612.04052. Cited by: Gradientbased optimization.
 A waferscale neuromorphic hardware system for largescale neural modeling. In IEEE International Symposium on Circuits and Systems, pp. 1947–1950. Cited by: Introduction.
 Going deeper in spiking neural networks: VGG and residual architectures. Frontiers in Neuroscience 13, pp. 95. Cited by: Gradientbased optimization, Table 1, Table 2.
 Inherent adversarial robustness of deep spiking neural networks: effects of discrete input encoding and nonlinear activations. In European Conference on Computer Vision, pp. 399–414. Cited by: Introduction.
 SLAYER: spike layer error reassignment in time. In Advances in Neural Information Processing Systems, pp. 1419–1428. Cited by: Introduction, Gradientbased optimization.
 Very deep convolutional networks for largescale image recognition. arXiv preprint arXiv:1409.1556. Cited by: Table 1, Table 2.
 Optimized spiking neurons can classify images with high accuracy through temporal coding with two spikes. Nature Machine Intelligence 3 (3), pp. 230–238. Cited by: Gradientbased optimization.

Inceptionv4, inceptionresnet and the impact of residual connections on learning
. arXiv preprint arXiv:1602.07261. Cited by: Implementation details.  A tandem learning rule for effective training and rapid inference of deep spiking neural networks. IEEE Transactions on Neural Networks and Learning Systems, pp. 1–15. Cited by: Gradientbased optimization.
 Spatiotemporal backpropagation for training highperformance spiking neural networks. Frontiers in Neuroscience 12, pp. 331. Cited by: Introduction, Gradientbased optimization.
 Visualizing a joint future of neuroscience and neuromorphic engineering. Neuron 109 (4), pp. 571–575. Cited by: Gradientbased optimization.
 The remarkable robustness of surrogate gradient learning for instilling complex function in spiking neural networks. Neural Computation 33 (4), pp. 899–925. Cited by: Gradientbased optimization.
 Temporal spike sequence learning via backpropagation for deep spiking neural networks. In Advances in Neural Information Processing Systems, pp. 12022–12033. Cited by: Gradientbased optimization.
 Going deeper with directlytrained larger spiking neural networks. In AAAI Conference on Artificial Intelligence, pp. 11062–11070. Cited by: Introduction, Gradientbased optimization.
 Temporalcoded deep spiking neural network with easy training and robust performance. In AAAI Conference on Artificial Intelligence, pp. 11143–11151. Cited by: Gradientbased optimization.
Appendix
Proof of Theorem 1
Theorem 1. The expectation of the square conversion error (Eq. 13) reaches its minimum value when the initial value is $\boldsymbol{v}^l(0)=\frac{\theta^l}{2}\boldsymbol{1}$; meanwhile, the expectation of the conversion error reaches $\boldsymbol{0}$, that is:
(16) $\mathop{\arg\min}_{\boldsymbol{v}^l(0)} \mathbb{E}\left[\left\|\boldsymbol{e}^l\right\|_2^2\right] = \frac{\theta^l}{2}\boldsymbol{1}$
(17) $\mathbb{E}\left[\boldsymbol{e}^l\right]\Big|_{\boldsymbol{v}^l(0)=\frac{\theta^l}{2}\boldsymbol{1}} = \boldsymbol{0}$
Proof.
The expectation of the square conversion error (Eq. 13 in the main text) can be rewritten as:
(18) $\mathbb{E}\left[\left\|\boldsymbol{e}^l\right\|_2^2\right] = \sum_{i=1}^{M^l} \mathbb{E}\left[\left(e_i^l\right)^2\right]$
where $e_i^l$ and $z_i^l$ denote the $i$-th element in $\boldsymbol{e}^l$ and $\boldsymbol{z}^l$, respectively. $M^l$ is the number of elements in $\boldsymbol{e}^l$, namely, the number of neurons in layer $l$. In order to minimize $\mathbb{E}\left[\left\|\boldsymbol{e}^l\right\|_2^2\right]$, we just need to minimize each $\mathbb{E}\left[\left(e_i^l\right)^2\right]$ ($i=1,2,\dots,M^l$). As $z_i^l$ is uniformly distributed in every small interval
$\left[\frac{t\theta^l - v_i^l(0)}{T}, \frac{(t+1)\theta^l - v_i^l(0)}{T}\right)$ with the probability density function
$p_t$ ($t=0,1,\dots,T-1$), where $e_i^l = \frac{t\theta^l}{T} - z_i^l$ for $z_i^l$ in the $t$-th interval, we have:
(19) $\mathbb{E}\left[\left(e_i^l\right)^2\right] = \sum_{t=0}^{T-1} p_t \int_{\frac{t\theta^l - v_i^l(0)}{T}}^{\frac{(t+1)\theta^l - v_i^l(0)}{T}} \left(\frac{t\theta^l}{T} - z\right)^2 \mathrm{d}z = \sum_{t=0}^{T-1} p_t \, \frac{\left(\theta^l - v_i^l(0)\right)^3 + \left(v_i^l(0)\right)^3}{3T^3} = \frac{\left(\theta^l - v_i^l(0)\right)^3 + \left(v_i^l(0)\right)^3}{3T^2\theta^l}$
The last equality holds as $\sum_{t=0}^{T-1} p_t \frac{\theta^l}{T} = 1$. One can find that $\mathbb{E}\left[\left(e_i^l\right)^2\right]$ ($i=1,2,\dots,M^l$) reaches the minimum when $v_i^l(0) = \frac{\theta^l}{2}$. Thus we can conclude that:
(20) $\mathop{\arg\min}_{\boldsymbol{v}^l(0)} \mathbb{E}\left[\left\|\boldsymbol{e}^l\right\|_2^2\right] = \frac{\theta^l}{2}\boldsymbol{1}$
Now we compute the expectation of the conversion error,
(21) $\mathbb{E}\left[e_i^l\right] = \sum_{t=0}^{T-1} p_t \int_{\frac{t\theta^l - v_i^l(0)}{T}}^{\frac{(t+1)\theta^l - v_i^l(0)}{T}} \left(\frac{t\theta^l}{T} - z\right) \mathrm{d}z = \sum_{t=0}^{T-1} p_t \, \frac{\left(v_i^l(0)\right)^2 - \left(\theta^l - v_i^l(0)\right)^2}{2T^2}$
If $v_i^l(0) = \frac{\theta^l}{2}$, we have $\theta^l - v_i^l(0) = v_i^l(0)$, thus we can get that $\left(v_i^l(0)\right)^2 - \left(\theta^l - v_i^l(0)\right)^2 = 0$ and $\mathbb{E}\left[e_i^l\right] = 0$ ($i=1,2,\dots,M^l$). We can conclude that:
(22) $\mathbb{E}\left[\boldsymbol{e}^l\right]\Big|_{\boldsymbol{v}^l(0)=\frac{\theta^l}{2}\boldsymbol{1}} = \boldsymbol{0}$
∎
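As a quick numerical sanity check of Theorem 1 (ours, not part of the paper), the sketch below assumes pre-activations $z$ uniformly distributed on $[0,\theta)$ and the floor-based quantization $z' = \frac{\theta}{T}\,\mathrm{floor}\!\left(\frac{zT + v(0)}{\theta}\right)$ implied by the interval structure in the proof; all variable names are our own. A grid search over the initial potential should recover a minimum near $v(0)=\theta/2$ with value close to $\theta^2/(12T^2)$.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, T = 1.0, 8
# Pre-activations assumed uniform on [0, theta), matching the piecewise-uniform assumption.
z = rng.uniform(0.0, theta, 200_000)

def conversion_error(v0):
    # Quantized SNN output: t * theta / T with t = floor((z*T + v0) / theta), clipped to [0, T].
    t = np.clip(np.floor((z * T + v0) / theta), 0, T)
    return t * theta / T - z

# Sweep the initial membrane potential and locate the minimum of the mean square error.
v0_grid = np.linspace(0.0, theta, 101)
mse = [np.mean(conversion_error(v0) ** 2) for v0 in v0_grid]
best_v0 = v0_grid[int(np.argmin(mse))]

# Theory predicts a minimum at v0 = theta/2 with value theta^2 / (12 T^2).
print(best_v0, min(mse), theta**2 / (12 * T**2))
```

The Monte Carlo minimum lands at $v(0)\approx 0.5$ for $\theta=1$, in line with Eq. 20 and with Fig. 5, where the bright band sits between 0.4 and 0.6.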
The effect of membrane potential initialization
We make a concrete analysis of the performance of the converted SNN with different initial membrane potentials and illustrate the results in Fig. 5. Here the number of time-steps varies from 1 to 75, and the initial potential varies from 0.1 to 0.9. Brighter areas indicate better performance. One can find that the optimal initial potential always lies between 0.4 and 0.6, and that converted SNNs with an initial membrane potential of 0.5 achieve optimal or near-optimal performance.
Energy Estimation on Neuromorphic Hardware
We analyze the energy consumption of our method. Following the analysis method in Hu et al. (2018), we count floating-point operations (FLOPs) for the ANN and synaptic operations (SOPs) for the SNN to represent the total number of operations needed to classify one image. We then multiply the operation counts by the per-operation energy cost of an FPGA and of neuromorphic hardware, respectively. For the ANN, an Intel Stratix 10 TX operates at a cost of 12.5 pJ per FLOP, while for the SNN, the neuromorphic chip ROLLS consumes 77 fJ per SOP Qiao et al. (2015). Table 4 compares the energy consumption of the original ANNs (VGG16 and ResNet20) and the converted SNNs, where the inference time of the SNN is set to 32 time-steps. The proposed method achieves 62 times the energy efficiency of the ANN with the VGG16 structure and 37 times with the ResNet20 structure.
  VGG16  ResNet20
ANN OPs (MFLOPs)  332.973  41.219
SNN OPs (MSOPs)  869.412  179.060
ANN Energy (mJ)  4.162  0.515
SNN Energy (mJ)  0.067  0.0138
ANN/SNN Energy Ratio  62  37
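The figures in Table 4 follow directly from multiplying the operation counts by the per-operation energies quoted above; a minimal sketch of that arithmetic (variable names are ours):

```python
# Energy estimate: operation count x energy per operation, using the figures quoted above.
PJ, FJ = 1e-12, 1e-15
ANN_ENERGY_PER_OP = 12.5 * PJ  # Intel Stratix 10 TX, per FLOP
SNN_ENERGY_PER_OP = 77 * FJ    # ROLLS neuromorphic chip, per SOP

def energy_mj(ops_millions, energy_per_op):
    # Convert millions of operations to an energy estimate in millijoules.
    return ops_millions * 1e6 * energy_per_op * 1e3

vgg_ann = energy_mj(332.973, ANN_ENERGY_PER_OP)  # ANN, VGG16
vgg_snn = energy_mj(869.412, SNN_ENERGY_PER_OP)  # converted SNN, VGG16, T = 32
print(round(vgg_ann, 3), round(vgg_snn, 3), round(vgg_ann / vgg_snn))
```

This reproduces the VGG16 column of Table 4: about 4.162 mJ for the ANN, 0.067 mJ for the SNN, and a ratio of roughly 62.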