Can Deep Neural Networks be Converted to Ultra Low-Latency Spiking Neural Networks?

12/22/2021
by Gourav Datta, et al.

Spiking neural networks (SNNs), which operate via binary spikes distributed over time, have emerged as a promising energy-efficient ML paradigm for resource-constrained devices. However, the current state-of-the-art (SOTA) SNNs require multiple time steps for acceptable inference accuracy, increasing spiking activity and, consequently, energy consumption. SOTA training strategies for SNNs involve conversion from a non-spiking deep neural network (DNN). In this paper, we determine that SOTA conversion strategies cannot yield ultra low latency because they incorrectly assume that the DNN and SNN pre-activation values are uniformly distributed. We propose a new training algorithm that accurately captures these distributions, minimizing the error between the DNN and converted SNN. The resulting SNNs have ultra low latency and high activation sparsity, yielding significant improvements in compute efficiency. In particular, we evaluate our framework on image recognition tasks from the CIFAR-10 and CIFAR-100 datasets on several VGG and ResNet architectures. We obtain a top-1 accuracy of 64.19% with only 2 time steps on the CIFAR-100 dataset, with 159.2x lower compute energy compared to an iso-architecture standard DNN. Compared to other SOTA SNN models, our models perform inference 2.5-8x faster (i.e., with fewer time steps).

I Introduction

Spiking Neural Networks (SNNs) attempt to emulate the remarkable energy efficiency of the brain in vision, perception, and cognition-related tasks using event-driven neuromorphic hardware [13]. Neurons in an SNN exchange information via discrete binary spikes, representing a significant paradigm shift from high-precision, continuous-valued deep neural networks (DNNs) [26, 1]. Due to their high activation sparsity and use of accumulates (AC) instead of expensive multiply-and-accumulates (MAC), SNNs have emerged as a promising low-power alternative to DNNs, whose hardware implementations are typically associated with high compute and memory costs.

Because SNNs receive and transmit information via spikes, analog inputs have to be encoded with a sequence of spikes. There have been multiple encoding methods proposed, such as rate coding [6], temporal coding [2], rank-order coding [14], and others. However, recent works [27, 32, 18] showed that, instead of converting the image pixel values into spike trains, directly feeding the analog pixel values in the first convolutional layer, and thereby, emitting spikes only in the subsequent layers, can reduce the number of time steps needed to achieve SOTA accuracy by an order of magnitude. Although the first layer now requires MACs, as opposed to the cheaper ACs in the remaining layers, the overhead is negligible for deep convolutional architectures. Hence, we adopt this technique, termed direct encoding, in this work.

In addition to accommodating various forms of encoding inputs, supervised learning algorithms for SNNs have overcome various roadblocks associated with the discontinuous derivative of the spike activation function [21, 15]. Moreover, SNNs can be converted from DNNs with low error by approximating the activation value of ReLU neurons with the firing rate of spiking neurons [30]. SNNs trained using DNN-to-SNN conversion, coupled with supervised training, have been able to perform similarly to SOTA DNNs in terms of test accuracy in traditional image recognition tasks [27, 28]. However, the training effort still remains high, because SNNs need multiple time steps (at least with direct encoding [27]) to process an input, and hence, the backpropagation step requires the gradients of the unrolled SNN to be integrated over all these time steps, which significantly increases the memory cost [24]. Moreover, the multiple forward passes result in an increased number of spikes, which degrades the SNN’s energy efficiency, both during training and inference, and possibly offsets the compute advantage of the ACs. This motivates our exploration of novel training algorithms to reduce both the test error of a DNN and the conversion error to an SNN, while keeping the number of time steps extremely small during both training and inference.

In summary, the current challenges in SNNs are multiple time steps, large spiking activity, and high training effort, both in terms of compute and memory. To address these challenges, this paper makes the following contributions.

  • We analytically and empirically show that the primary source of error in current DNN-to-SNN conversion strategies [4, 22] is the incorrect and simplistic model of the distributions of DNN and SNN activations.

  • We propose a novel DNN-to-SNN conversion and fine-tuning algorithm that reduces the conversion error for ultra low latencies by accurately capturing these distributions and thus minimizing the difference between SNN and DNN activation functions.

  • We demonstrate the latency-accuracy trade-off benefits of our proposed framework through extensive experiments with both VGG [31] and ResNet [10] variants of deep SNN models on CIFAR-10 and CIFAR-100 [16]. We benchmark and compare the models’ training time, memory requirements, and inference energy efficiency in both GPU and neuromorphic hardware with two SOTA low-latency SNNs. (Footnote 1: We use VGG-16 on CIFAR-10 and CIFAR-100 to show compute efficiency.)

The remainder of this paper is organized as follows. Section II-A provides background on DNNs and SNNs and the SOTA DNN-to-SNN conversion techniques. Section III explains why these techniques fail for ultra-low SNN latencies and discusses our proposed methodology. Our accuracy and latency results are presented in Section IV and our analysis of training resources and inference energy efficiency is presented in Sections V and VI, respectively. The paper concludes in Section VII.

II Background

II-A Difference between DNNs and SNNs

Neurons in a non-spiking DNN integrate weight-modulated analog inputs and apply a non-linear activation function. Although ReLU is widely used as the activation function, previous work [11] has proposed a trainable threshold term, $\mu$, for similarity with SNNs. In particular, the neuron outputs with threshold ReLU can be expressed as

$$Y_i = \mathrm{clip}\Big(\sum_j W_{ij} X_j,\ 0,\ \mu\Big) \tag{1}$$

where $\mathrm{clip}(x, 0, \mu)$ equals $0$ if $x < 0$, $x$ if $0 \le x \le \mu$, and $\mu$ if $x > \mu$, and $X_j$ and $W_{ij}$ denote the outputs of the neurons in the preceding layer and the weights connecting the two layers. The gradients of $\mu$ are estimated using gradient descent during the backward computations of the DNN.
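As a minimal sketch of the threshold ReLU described above, the following pure-Python function clips a neuron's weighted input sum to the range between zero and the threshold. The function name `threshold_relu` and the example weights and inputs are hypothetical and chosen only for illustration; the threshold is held fixed here, whereas in the paper it is trained.

```python
def threshold_relu(weights, inputs, mu):
    """Clip the weighted sum of the inputs to the range [0, mu]."""
    pre_activation = sum(w * x for w, x in zip(weights, inputs))
    return min(max(pre_activation, 0.0), mu)

# Three regimes: negative, linear, and saturated pre-activations.
outputs = [
    threshold_relu([1.0, -2.0], [0.5, 1.0], mu=1.0),  # pre-activation -1.5
    threshold_relu([1.0, 0.5], [0.25, 0.5], mu=1.0),  # pre-activation 0.5
    threshold_relu([2.0, 1.0], [1.0, 1.0], mu=1.0),   # pre-activation 3.0
]
print(outputs)
```

The three calls cover the three branches of the clipping function: the output is zero below zero, follows the input in between, and saturates at the threshold above it.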

On the other hand, the computation dynamics of an SNN are typically represented by the popular Leaky-Integrate-and-Fire (LIF) model [20], where a neuron transmits binary spike trains (except the input layer for direct encoding) over multiple time steps (a non-zero output $S_i(t)$ denotes the presence of a spike). To account for the temporal dimension of the inputs, each neuron has an internal state called a membrane potential $U_i(t)$, which captures the integration of the incoming (pre-neuron) spikes (denoted as $S_j(t)$) modulated by weights $W_{ij}$ and leaks with a fixed time constant. Each neuron emits an output spike whenever $U_i$ crosses a spiking threshold $V^{th}$, after which $U_i$ is reduced by $V^{th}$. This behavior of the membrane potential and output can be expressed as

$$U_i^{temp}(t) = \lambda U_i(t-1) + \sum_j W_{ij} S_j(t) \tag{2}$$

$$S_i(t) = \begin{cases} V^{th}, & \text{if } U_i^{temp}(t) > V^{th} \\ 0, & \text{otherwise} \end{cases} \tag{3}$$

$$U_i(t) = U_i^{temp}(t) - S_i(t) \tag{4}$$

where $\lambda$ denotes the leak term. When $\lambda = 1$, the SNN model is termed Integrate-and-Fire (IF).
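The membrane dynamics described above can be sketched for a single neuron as follows. This is an illustrative simulation under stated assumptions, not the authors' implementation: the function name `simulate_lif` is hypothetical, the weighted input per time step is passed in directly, and the neuron is assumed to output the threshold value when it spikes, so that the soft reset subtracts exactly the emitted output.

```python
def simulate_lif(input_currents, v_th=1.0, leak=1.0):
    """Simulate one neuron; input_currents[t] is the weighted input at step t.

    Assumed convention: the output is v_th on a spike and 0 otherwise, and
    the membrane potential is reduced by v_th after a spike (soft reset).
    """
    membrane = 0.0
    outputs = []
    for current in input_currents:
        membrane = leak * membrane + current      # leaky integration
        spike = v_th if membrane > v_th else 0.0  # threshold comparison
        membrane -= spike                         # soft reset
        outputs.append(spike)
    return outputs

# leak = 1.0 gives the Integrate-and-Fire (IF) special case.
if_outputs = simulate_lif([0.6, 0.6, 0.6, 0.6], v_th=1.0, leak=1.0)
lif_outputs = simulate_lif([0.6, 0.6, 0.6, 0.6], v_th=1.0, leak=0.5)
print(if_outputs, lif_outputs)
```

With the same input, the leaky neuron fires less often than the IF neuron because part of the accumulated potential decays at every step.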

II-B DNN-to-SNN Conversion

Previous research has demonstrated that SNNs can be converted from DNNs with negligible accuracy drop by approximating the activation value of ReLU neurons with the firing rate of IF neurons using a threshold balancing technique that copies the weights from the source DNN to the target SNN [1, 29, 5, 30]. Since this technique uses the standard backpropagation algorithm for DNN training, and thus involves only a single forward pass to process a single input, the training procedure is simpler than the approximate gradient techniques used to train SNNs from scratch. However, the key disadvantage of DNN-to-SNN conversion is that it yields SNNs with much higher latency compared to other techniques. Some previous research [9, 22] proposed to down-scale the threshold term to train low-latency SNNs, but the scaling factor was either a hyperparameter or obtained via linear grid-search, and the latency needed for convergence still remained large.

To further reduce the conversion error, [4] minimized the difference between the DNN and SNN post-activation values for each layer. To do this, the activation function of the IF SNN must first be derived [4, 22]. We assume that the initial membrane potential of a layer $l$ ($U^l(0)$) is $0$. Moreover, we let $\bar{S}^l$ be the average SNN output of layer $l$. Then, where $S^l(i)$ is the discrete output at the $i^{th}$ time step, and $T$ is the total number of time steps,

$$\bar{S}^l = \frac{1}{T}\sum_{i=1}^{T} S^l(i) = \frac{V^{th}_l}{T}\,\mathrm{clip}\left(\left\lfloor \frac{T\, W^l \bar{S}^{l-1}}{V^{th}_l} \right\rfloor,\ 0,\ T\right) \tag{5}$$

where $V^{th}_l$ and $W^l$ denote the layer threshold and weight matrix respectively. Eq. 5 is illustrated in Fig. 1(a) by the piecewise staircase function of the SNN activation.
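The staircase shape of the IF activation can be checked numerically. The sketch below is an illustration under simplifying assumptions: a single neuron receives the same pre-activation at every time step, starts from zero membrane potential, and outputs the threshold value on each spike. The helper names `snn_staircase` and `simulate_if_average` are hypothetical.

```python
import math

def snn_staircase(z, v_th, num_steps):
    """Closed-form average IF output for a constant pre-activation z."""
    levels = math.floor(num_steps * z / v_th)
    levels = min(max(levels, 0), num_steps)  # clip the spike count to [0, T]
    return v_th * levels / num_steps

def simulate_if_average(z, v_th, num_steps):
    """Average output of an IF neuron driven by z at every time step."""
    membrane, total = 0.0, 0.0
    for _ in range(num_steps):
        membrane += z
        if membrane >= v_th:
            membrane -= v_th  # soft reset
            total += v_th     # threshold-valued output
    return total / num_steps

samples = [-0.4, 0.3, 0.6, 1.7]
staircase = [snn_staircase(z, 1.0, 2) for z in samples]
simulated = [simulate_if_average(z, 1.0, 2) for z in samples]
print(staircase, simulated)
```

With only two time steps the activation can take just three values (zero, half the threshold, and the threshold), which is why small pre-activations are all mapped to zero.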

Reference [4] also proved that the average difference in the post-activation values can be reduced by adding a bias term to shift the SNN activation curve to the left by $V^{th}_l/2T$, as shown in Fig. 1(a), assuming both the DNN ($d$) and SNN ($s$) pre-activation values are uniformly and identically distributed. To further reduce the difference, [4] added a non-trainable threshold equal to the maximum DNN pre-activation value ($d_{max}$) to the ReLU activation function in each layer and equated it with the SNN spiking threshold, which ensures zero difference between the DNN and SNN post-activation values when the DNN pre-activation values exceed $d_{max}$. However, $d_{max}$ is an outlier, and the bulk of the pre-activation values lie far below it. Hence, we propose to use the ReLU activation with a trainable threshold for each layer (denoted as $\mu$, where $\mu < d_{max}$ for all layers) as discussed in Section II-A and shown in Fig. 1(a). This trainable threshold, as will be described below, also helps reduce the average difference for non-uniform DNN pre-activation distributions.
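The effect of the bias shift under the uniform assumption can be verified with a short numerical check. The sketch below is illustrative only: it evaluates the staircase activation on evenly spaced pre-activations between zero and the threshold, with and without a left shift of half a step width. The names `staircase`, `plain_error`, and `shifted_error` are hypothetical.

```python
import math

def staircase(z, mu, num_steps, shift=0.0):
    """IF staircase activation, optionally shifted left by `shift`."""
    levels = math.floor(num_steps * (z + shift) / mu)
    levels = min(max(levels, 0), num_steps)
    return mu * levels / num_steps

mu, num_steps, n = 1.0, 2, 1000
# Evenly spaced samples stand in for a uniform distribution on [0, mu];
# on this range the threshold ReLU output equals the input z itself.
grid = [(k + 0.5) * mu / n for k in range(n)]

plain_error = sum(z - staircase(z, mu, num_steps) for z in grid) / n
shifted_error = sum(
    z - staircase(z, mu, num_steps, shift=mu / (2 * num_steps)) for z in grid
) / n
print(plain_error, shifted_error)
```

Without the shift, the mean difference equals half a step width; with the shift, the positive and negative deviations cancel, but only because the samples are uniformly spread.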

III Proposed Training Framework

In this section, we analytically and empirically show that the SOTA conversion strategies, along with our proposed modification described above, fail to obtain the SOTA SNN test accuracy for a small number of time steps. We then propose a novel conversion algorithm that scales the SNN threshold and post-activation values to reduce the conversion error for small $T$, where $T$ is the total number of time steps.

Fig. 1: (a) Comparison between DNN (threshold ReLU) and SNN (both original and bias-added) activation functions, the distribution of DNN and SNN () pre-activation values and variation of (see Eq. 7) with for the layer of VGG- architecture on CIFAR-, and (b) Proposed scaling of the threshold and output of the SNN post-activation values.

III-A Why Does Conversion Fail for Ultra Low Latencies?

Even though we can minimize the difference between the DNN and SNN post-activation values with bias addition and thresholding, in practice, the SNNs obtained are still not as accurate as their iso-architecture DNN counterparts when the number of time steps $T$ decreases substantially. We empirically show this trend for VGG and ResNet architectures on CIFAR in Fig. 2. This is due to the flawed baseline assumption that the DNN and SNN pre-activation values are uniformly distributed. Both distributions are rather skewed (i.e., most of the values are close to $0$), as illustrated in Fig. 1(a).

To analytically see this, let us assume the DNN and SNN pre-activation probability density functions are

and and post-activation values are denoted as and , respectively. Assuming , derived from DNN training, the expected difference in the post-activation values for a particular layer and can be written as

(6)

where the first approximation is due to the fact that greater than of both and are less than . The subsequent equality is because when . The last equality is based on the introduction of which captures the bias shift of , and and the observation that the term lies between its upper and lower integral limits, and thus can be re-written as , where lies in the range . The exact value of depends on the distribution .

Assuming , Eq. 6 can then be written as

(7)

When and are uniformly distributed in the range , they must equal . This implies that and, consequently, . Moreover, , and hence the first term of , , whereas the second term, , equals . Hence, similar to , . Thus, Eq. 7 evaluates to which implies the error can be completely eliminated, as also concluded in [4].

However, when the distributions are skewed, we observe that while is independent of , decreases significantly as we reduce below around 5, as shown in the insert in Fig. 1(a). Intuitively, for small

, most of the probability density of

lies to the left of the first staircase starting at , due to its sharply decreasing nature. Consequently, the remaining area under the curve captured in becomes negligible, reducing the number of output spikes significantly.
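The intuition above can be illustrated with a small numerical experiment. The sketch below is not the paper's analysis: it assumes, purely for illustration, exponentially distributed pre-activations with a scale of one tenth of the threshold, and compares the mean output of the bias-shifted staircase with the mean output of the threshold ReLU for several numbers of time steps. All names and constants are hypothetical.

```python
import math

def dnn_output(z, mu):
    """Threshold ReLU: clip the pre-activation to [0, mu]."""
    return min(max(z, 0.0), mu)

def snn_output(z, mu, num_steps):
    """Bias-shifted staircase: the floor becomes round-to-nearest."""
    levels = math.floor(num_steps * z / mu + 0.5)
    levels = min(max(levels, 0), num_steps)
    return mu * levels / num_steps

mu, scale, n = 1.0, 0.1, 10000
# Deterministic quantiles of an exponential distribution (assumed shape).
samples = [-scale * math.log(1.0 - (k + 0.5) / n) for k in range(n)]
dnn_mean = sum(dnn_output(z, mu) for z in samples) / n

ratios = {}
for num_steps in (16, 8, 4, 2, 1):
    snn_mean = sum(snn_output(z, mu, num_steps) for z in samples) / n
    ratios[num_steps] = snn_mean / dnn_mean  # 1.0 would mean no average error
print(ratios)
```

Under this assumed distribution the ratio stays close to one for many time steps and falls sharply for very few, because most samples sit below the first stair step and therefore produce no output spikes.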

Fig. 2: Effect of the number of SNN time steps on the test accuracy of VGG and ResNet architectures on CIFAR- with DNN-to-SNN conversion based on both threshold ReLU and the maximum pre-activation value used in [4].

Hence, for ultra-low SNN latencies, the error per layer remains significant and accumulates over the network.

This analysis explains the accuracy gap that is observed between original DNNs and their converted SOTA SNNs for small $T$, as exemplified in Fig. 2. Moreover, training with a non-trainable threshold [4] can be modeled by replacing the trainable threshold with the maximum pre-activation value in Eq. 7. This further increases the error, as observed from the increased accuracy degradation shown in Fig. 2.

III-B Conversion & Fine-tuning for Ultra Low-Latency SNNs

While Eq. 7 suggests that we can tune the threshold $\mu$ to compensate for low $T$, this introduces other errors. In particular, if we replace $\mu$ with a down-scaled version $\alpha\mu$, with $\alpha \in (0, 1)$ (up-scaling further reduces the output spike count and increases the error), the SNN activation curve will shift left, as shown in Fig. 1(b), and there will be an additional difference between the DNN and SNN post-activation values that stems from the pre-activation values in the range $[\alpha\mu, \mu]$.

To mitigate this additional error term, we propose to also optimize the step size of the SNN activation function in the y-direction by modifying the IF model from Eq. 3,

(8)

which introduces another scaling factor illustrated in Fig. 1(b). Moreover, we remove the bias term since it complicates the parameter space exploration and poses difficulty in training the SNNs after conversion, changing to . This results in a new difference function

Thus, our task reduces to finding the and that minimises for a given low .

Since it is difficult to analytically compute to guide SNN conversion, we empirically estimate it by discretizing into percentiles , where is the largest integer satisfying , using the activations of a particular layer of the trained DNN. In particular, for each , we vary between and with a step size of , as shown in Algorithm 1. This percentile-based approach for is better than a linear search because it enables a finer-grained analysis in the range of with higher likelihood. We find the () pair that yields the lowest for each DNN layer.

For DNN-to-SNN conversion, we copy the SNN weights from a pretrained DNN with trainable threshold , set each layer threshold as , and produce an output whenever the membrane potential crosses the threshold. Although we incur an overhead of two additional parameters per SNN layer, the parameter increase is negligible compared to the total number of weights. Moreover, as the outputs for each time step are either or , we can absorb the scaling factor into the weight values, avoiding the need for explicit multiplication. After conversion, we apply SGL in the SNN domain where we jointly fine-tune the threshold, leak, and weights [27]. To approximate the gradient of ReLU, we compute the surrogate gradient as , and otherwise, which is used to estimate the gradients of the trainable parameters [27].

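Algorithm 1: Detailed algorithm for finding layer-wise scaling factors for SNN threshold & post-activations

Input: Activations, total time steps T, ReLU threshold
Data: percentiles of the activations, initial scaling factors
Output: Final scaling factors

Function ComputeLoss(activations, ReLU threshold, threshold scaling factor, output scaling factor, T):
    initialize the loss
    foreach activation do
        for each of the T steps of the SNN staircase do
            if the activation falls on this step then
                add the DNN-SNN output difference to the loss    #Seg-I in Fig. 1(b)
            end if
        end for
        if the activation lies between the scaled and the original threshold then
            add the DNN-SNN output difference to the loss    #Seg-II in Fig. 1(b)
        else if the activation exceeds the original threshold then
            add the DNN-SNN output difference to the loss    #Seg-III in Fig. 1(b)
        end if
    end foreach
    return the loss
End Function

Function FindScalingFactors(activations, ReLU threshold, T):
    initialize the minimum loss with ComputeLoss at the initial scaling factors
    foreach percentile do
        for each candidate output scaling factor (fixed step size) do
            evaluate ComputeLoss for this pair of scaling factors
            if the loss is lower than the minimum loss then
                update the minimum loss and record the pair as the final scaling factors
            end if
        end for
    end foreach
    return the final scaling factors
End Function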
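The search in Algorithm 1 can be sketched in Python as follows. This is a hedged illustration rather than the authors' code: it assumes the scaled SNN activation is a staircase with threshold `alpha * mu` and per-step height `beta * mu / T`, that the loss is the summed DNN-minus-SNN output difference, and that the best pair minimizes the absolute loss. The percentile grid, the search range for `beta`, the exponential test activations, and all function names are hypothetical.

```python
import math

def dnn_output(d, mu):
    """Threshold ReLU output of the DNN."""
    return min(max(d, 0.0), mu)

def snn_output(d, mu, alpha, beta, num_steps):
    """Assumed scaled staircase: threshold alpha*mu, step height beta*mu/T."""
    levels = math.floor(num_steps * d / (alpha * mu))
    levels = min(max(levels, 0), num_steps)
    return beta * mu * levels / num_steps

def compute_loss(activations, mu, alpha, beta, num_steps):
    """Summed DNN-minus-SNN post-activation difference over all samples."""
    return sum(dnn_output(d, mu) - snn_output(d, mu, alpha, beta, num_steps)
               for d in activations)

def find_scaling_factors(activations, mu, num_steps,
                         percentiles=range(5, 100, 5),
                         beta_max=2.0, beta_step=0.05):
    """Grid search over percentile-based alpha and linearly spaced beta."""
    ordered = sorted(d for d in activations if d > 0.0)
    best_alpha, best_beta = 1.0, 1.0  # assumed initial scaling factors
    best_loss = abs(compute_loss(activations, mu, best_alpha, best_beta, num_steps))
    for pct in percentiles:
        alpha = min(ordered[pct * len(ordered) // 100] / mu, 1.0)
        for k in range(1, round(beta_max / beta_step) + 1):
            beta = k * beta_step
            loss = abs(compute_loss(activations, mu, alpha, beta, num_steps))
            if loss < best_loss:
                best_alpha, best_beta, best_loss = alpha, beta, loss
    return best_alpha, best_beta, best_loss

mu, num_steps, n = 1.0, 2, 200
# Skewed stand-in activations (assumed exponential shape, scale 0.1 * mu).
activations = [-0.1 * math.log(1.0 - (k + 0.5) / n) for k in range(n)]
baseline_loss = abs(compute_loss(activations, mu, 1.0, 1.0, num_steps))
alpha, beta, loss = find_scaling_factors(activations, mu, num_steps)
print(alpha, beta, loss, baseline_loss)
```

Choosing the threshold candidates from activation percentiles places more candidates where the activations are dense, which is the motivation the text gives for preferring percentiles over a linear search.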

IV Experimental Results

IV-A Experimental Setup

Since we omit the bias term during the DNN-to-SNN conversion described in Section III-B, we avoid Batch Normalization and instead use Dropout as the regularizer for both ANN and SNN training. Although prior works [27, 30, 28] claim that max pooling incurs information loss for binary-spike-based activation layers, we use max pooling because it improves the accuracy of both the baseline DNN and the converted SNN. Moreover, max pooling layers produce binary spikes at the output and ensure that the SNN only requires AC operations for all the hidden layers [7], thereby improving energy efficiency.

We performed the baseline DNN training for epochs with an initial learning rate (LR) of that decays by a factor of at every , , and of the total number of epochs. Initialized with the layer thresholds and post-activation values, we performed the SNN training with direct input encoding for epochs for CIFAR- and epochs for CIFAR-. We used a starting LR of which decays similar to that in DNN training. All experiments are performed on a Nvidia Ti GPU with GB memory.

Architecture   Number of     a. DNN         b. Accuracy (%) with     c. Accuracy (%)
               time steps    accuracy (%)   DNN-to-SNN conversion    after SNN training
Dataset: CIFAR-10
VGG-11         2             90.76          65.82                    89.39
VGG-11         3             91.10          78.76                    89.79
VGG-16         2             93.26          69.58                    91.79
VGG-16         3             93.26          85.06                    91.93
ResNet-20      2             93.07          61.96                    90.00
ResNet-20      3             93.07          73.57                    90.06
Dataset: CIFAR-100
VGG-16         2             68.45          19.57                    64.19
VGG-16         3             68.45          36.84                    63.92
ResNet-20      2             63.88          19.85                    57.81
ResNet-20      3             63.88          31.43                    59.29

TABLE I: Model performances with proposed training framework after a) DNN training, b) DNN-to-SNN conversion & c) SNN training.

IV-B Classification Accuracy & Latency

We evaluated our framework on multiple VGG and ResNet architectures, namely VGG-11, VGG-16, and ResNet-20 for CIFAR-10, and VGG-16 and ResNet-20 for CIFAR-100. We report the (a) baseline DNN accuracy, (b) SNN accuracy with our proposed DNN-to-SNN conversion, and (c) SNN accuracy with conversion followed by SGL, for 2 and 3 time steps. Note that the models reported in (b) are far from SOTA, but act as a good initialization for SGL.

Table II provides a comparison of the performances of models generated through our training framework with SOTA deep SNNs. On CIFAR-10, our approach outperforms the SOTA VGG-based SNN [27] in latency, with 2.5× fewer time steps and a negligible drop in test accuracy. To the best of our knowledge, our results represent the first successful training and inference of CIFAR-100 on an SNN with only 2 time steps, yielding a 5-8× reduction in latency compared to others.

Ablation Study:

The threshold scaling heuristics proposed in

[22, 9], coupled with SGL, lead to a statistical test accuracy of and on CIFAR- and CIFAR- respectively, with both and time steps. Also, our scaling technique alone (without SGL) requires steps, while the SOTA conversion approach [4] needs steps to obtain similar test accuracy.

V Simulation Time & Memory Requirements

Because SNNs require iteration over multiple time steps and storage of the membrane potentials for each neuron, their simulation time and memory requirements can be substantially higher than their DNN counterparts. However, reducing their latency can bridge this gap significantly, as shown in Figure 3. On average, our low-latency, 2-time-step SNNs represent a and reduction in training and inference time per epoch respectively, compared to the hybrid training approach [27] which represents the SOTA in latency, with iso-batch conditions. Also, our proposal uses lower GPU memory compared to [27] during training, while the inference memory usage remains almost identical.

Authors                     Training type            Architecture        Accuracy (%)   Time steps
Dataset: CIFAR-10
Wu et al. (2019) [32]       Surrogate gradient       5 CONV, 2 linear    90.53          12
Rathi et al. (2020) [27]    Hybrid training          VGG-                92.70          5
Kundu et al. (2021) [19]    Hybrid training          VGG-                92.74          10
Deng et al. (2021) [4]      DNN-to-SNN conversion    VGG-16              92.29          16
This work                   Hybrid training          VGG-16              91.79          2
Dataset: CIFAR-100
Kundu et al. (2021) [19]    Hybrid training          VGG- CNN            65.34          10
Deng et al. (2021) [4]      DNN-to-SNN conversion    VGG-                65.94          16
This work                   Hybrid training          VGG-16              64.19          2

TABLE II: Performance comparison of the proposed training framework with state-of-the-art deep SNNs on CIFAR-10 and CIFAR-100.

Fig. 3: Comparison between our proposed hybrid training technique for 2 and 3 time steps and baseline direct encoded training for 5 time steps [27], based on (a) simulation time per epoch, and (b) memory consumption, for the VGG-16 architecture over the CIFAR-10 and CIFAR-100 datasets.

Fig. 4: Comparison between our proposed hybrid training technique for 2 and 3 time steps, baseline direct encoded training for 5 time steps [27], and the optimal DNN-to-SNN conversion technique [4] for 16 time steps, based on (a) average spike count, (b) total number of FLOPs, and (c) compute energy, for the VGG-16 architecture over the CIFAR-10 and CIFAR-100 datasets. An iso-architecture DNN is also included for comparison of FLOP count and compute energy.

VI Energy Consumption During Inference

VI-A Spiking Activity

As suggested in [17, 3], the average spiking activity of an SNN layer can be used as a measure of the compute energy of the model during inference. This is computed as the ratio of the total number of spikes in $T$ steps over all the neurons of the layer to the total number of neurons in that layer. Fig. 4(a) shows the per-image average number of spikes for each layer with our proposed algorithm (using both 2 and 3 time steps), the hybrid training algorithm of [27] (with 5 steps), and the SOTA conversion algorithm [4], which requires 16 time steps, while classifying CIFAR-10 and CIFAR-100 using VGG-16. On average, our approach yields a lower spike count than both [27] and [4].
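The spiking-activity metric defined above reduces to a simple ratio. The following sketch computes it for one layer; the function name `average_spiking_activity` and the toy spike trains are hypothetical.

```python
def average_spiking_activity(spike_trains):
    """spike_trains[n][t] is 1 if neuron n spiked at time step t, else 0."""
    total_spikes = sum(sum(train) for train in spike_trains)
    return total_spikes / len(spike_trains)

# Toy layer with 4 neurons observed over T = 2 time steps.
layer_spikes = [[1, 0], [0, 0], [1, 1], [0, 0]]
activity = average_spiking_activity(layer_spikes)
print(activity)
```

Because the numerator sums over all time steps, the metric can exceed one when neurons fire more than once per input, which is why fewer time steps tend to lower it.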

VI-B Floating Point Operations (FLOPs) & Compute Energy

We use FLOP count to capture the energy efficiency of our SNNs, since each emitted spike indicates which weights need to be accumulated at the post-synaptic neurons and results in a fixed number of AC operations. This, coupled with the MAC operations required for direct encoding in the first layer (also used in [27, 4]), dominates the total number of FLOPs. For DNNs, FLOPs are dominated by the MAC operations in all the convolutional and linear layers. Assuming $E_{MAC}$ and $E_{AC}$ denote the MAC and AC energy respectively, the inference compute energy of the baseline DNN model can be computed as $\sum_{l=1}^{L} DNN^{l}_{flops} \cdot E_{MAC}$, whereas that of the SNN model as $SNN^{1}_{flops} \cdot E_{MAC} + \sum_{l=2}^{L} SNN^{l}_{flops} \cdot E_{AC}$, where $DNN^{l}_{flops}$ and $SNN^{l}_{flops}$ are the FLOPs count in the $l^{th}$ layer of DNN and SNN respectively.
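The energy model described above can be sketched as follows. The per-operation energies are assumptions taken from commonly quoted 45 nm, 32-bit integer figures attributed to [12] (3.2 pJ per MAC, 0.1 pJ per AC), and the per-layer FLOP counts are hypothetical numbers chosen only to make the arithmetic easy to follow.

```python
E_MAC_PJ = 3.2  # assumed energy per multiply-and-accumulate, in pJ
E_AC_PJ = 0.1   # assumed energy per accumulate, in pJ

def dnn_compute_energy(layer_flops):
    """All DNN layers use MAC operations."""
    return sum(layer_flops) * E_MAC_PJ

def snn_compute_energy(layer_flops):
    """First SNN layer uses MACs (direct encoding); later layers use ACs."""
    first, rest = layer_flops[0], layer_flops[1:]
    return first * E_MAC_PJ + sum(rest) * E_AC_PJ

dnn_flops = [1_000_000, 4_000_000, 4_000_000]  # hypothetical per-layer counts
snn_flops = [1_000_000, 500_000, 300_000]      # hypothetical per-layer counts

dnn_energy = dnn_compute_energy(dnn_flops)
snn_energy = snn_compute_energy(snn_flops)
improvement = dnn_energy / snn_energy
print(dnn_energy, snn_energy, improvement)
```

In this toy setting the first layer dominates the SNN energy, which shows why the savings depend on both the sparsity of later layers and the relative size of the first layer.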

Fig. 4(b) and (c) illustrate the FLOP counts and compute energy consumption for our baseline DNN and SNN models of VGG-16 while classifying the CIFAR datasets, along with the SOTA comparisons [27, 4]. As we can see, the number of FLOPs for our low-latency SNN is smaller than that for an iso-architecture DNN and the SNNs obtained from the prior works. Moreover, ACs consume significantly less energy than MACs both on GPU as well as neuromorphic hardware. To estimate the compute energy, we assume a 45 nm CMOS process at 0.9 V, where $E_{AC} = 0.1$ pJ, while $E_{MAC} = 3.2$ pJ (3.1 pJ for multiplication and 0.1 pJ for addition) [12] for 32-bit integer representation. Then, for both CIFAR-10 and CIFAR-100, our proposed SNN consumes less compute energy than its DNN counterpart, the 5-step hybrid SNN [27], and the 16-step optimally converted SNN [4]. For CIFAR-100, the improvement over the baseline DNN is 159.2×.

On custom neuromorphic architectures, such as TrueNorth [23], and SpiNNaker [8], the total energy is estimated as [25], where the parameters can be normalized to and for TrueNorth and SpiNNaker, respectively [25]. Since the total FLOPs for VGG- is several orders of magnitude higher than the SOTA , the total energy of a deep SNN on neuromorphic hardware is compute bound and thus we would see similar energy improvements on them.

VII Conclusions

This paper shows that current DNN-to-SNN algorithms cannot achieve ultra low latencies because they rely on simplistic assumptions of the DNN and SNN pre-activation distributions. The paper then proposes a novel training algorithm, inspired by empirically observed distributions, that can more effectively optimize the SNN thresholds and post-activation values. This approach enables training of SNNs with as few as 2 time steps and without any significant degradation in accuracy for complex image recognition tasks. The resulting SNNs are estimated to consume 159.2× lower energy than iso-architecture DNNs.

References

  • [1] Y. Cao et al. (2015) Spiking deep convolutional neural networks for energy-efficient object recognition. International Journal of Computer Vision 113, pp. 54–66.
  • [2] I. M. Comsa et al. (2020) Temporal coding in spiking neural networks with alpha synaptic function. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vol. 1, pp. 8529–8533.
  • [3] G. Datta et al. (2021) Training energy-efficient deep spiking neural networks with single-spike hybrid input encoding. In 2021 International Joint Conference on Neural Networks (IJCNN), Vol. 1, pp. 1–8.
  • [4] S. Deng et al. (2021) Optimal conversion of conventional artificial neural networks to spiking neural networks. In International Conference on Learning Representations.
  • [5] P. U. Diehl et al. (2015) Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing. In 2015 International Joint Conference on Neural Networks (IJCNN), Vol. 1, pp. 1–8.
  • [6] P. U. Diehl et al. (2016) Conversion of artificial recurrent neural networks to spiking neural networks for low-power neuromorphic hardware. In 2016 IEEE International Conference on Rebooting Computing (ICRC), pp. 1–8.
  • [7] W. Fang, Z. Yu, Y. Chen, T. Masquelier, T. Huang, and Y. Tian (2020) Incorporating learnable membrane time constant to enhance learning of spiking neural networks. arXiv preprint arXiv:2007.05785.
  • [8] S. B. Furber et al. (2014) The SpiNNaker project. Proceedings of the IEEE 102 (5), pp. 652–665.
  • [9] B. Han et al. (2020) Deep spiking neural network: energy efficiency through time based coding. In European Conference on Computer Vision (ECCV), pp. 388–404.
  • [10] K. He et al. (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778.
  • [11] N. Ho et al. (2021) TCL: an ANN-to-SNN conversion with trainable clipping layers. arXiv preprint arXiv:2008.04509.
  • [12] M. Horowitz (2014) Computing’s energy problem (and what we can do about it). In 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), pp. 10–14.
  • [13] G. Indiveri et al. (2011) Frontiers in neuromorphic engineering. Frontiers in Neuroscience 5.
  • [14] S. R. Kheradpisheh et al. (2020) Temporal backpropagation for spiking neural networks with one spike per neuron. International Journal of Neural Systems 30 (06).
  • [15] Y. Kim et al. (2020) Revisiting batch normalization for training low-latency deep spiking neural networks from scratch. arXiv preprint arXiv:2010.01729.
  • [16] A. Krizhevsky et al. (2009) Learning multiple layers of features from tiny images. Technical report, University of Toronto, Toronto, Ontario.
  • [17] S. Kundu et al. (2021) Spike-thrift: towards energy-efficient deep spiking neural networks by limiting spiking activity via attention-guided compression. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 3953–3962.
  • [18] S. Kundu et al. (2021) HIRE-SNN: harnessing the inherent robustness of energy-efficient deep spiking neural networks by training with crafted input noise. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 5209–5218.
  • [19] S. Kundu et al. (2021) Towards low-latency energy-efficient deep SNNs via attention-guided compression. arXiv preprint arXiv:2107.12445.
  • [20] C. Lee et al. (2020) Enabling spike-based backpropagation for training deep neural network architectures. Frontiers in Neuroscience 14.
  • [21] J. H. Lee et al. (2016) Training deep spiking neural networks using backpropagation. Frontiers in Neuroscience 10.
  • [22] Y. Li et al. (2021) A free lunch from ANN: towards efficient, accurate spiking neural networks calibration. arXiv preprint arXiv:2106.06984.
  • [23] P. Merolla et al. (2014) A million spiking-neuron integrated circuit with a scalable communication network and interface. Science 345, pp. 668–673.
  • [24] P. Panda et al. (2020) Toward scalable, efficient, and accurate deep spiking neural networks with backward residual connections, stochastic softmax, and hybridization. Frontiers in Neuroscience 14.
  • [25] S. Park et al. (2020) T2FSNN: deep spiking neural networks with time-to-first-spike coding. arXiv preprint arXiv:2003.11741.
  • [26] M. Pfeiffer et al. (2018) Deep learning with spiking neurons: opportunities and challenges. Frontiers in Neuroscience 12, pp. 774.
  • [27] N. Rathi et al. (2020) DIET-SNN: direct input encoding with leakage and threshold optimization in deep spiking neural networks. arXiv preprint arXiv:2008.03658.
  • [28] N. Rathi et al. (2020) Enabling deep spiking neural networks with hybrid conversion and spike timing dependent backpropagation. arXiv preprint arXiv:2005.01807.
  • [29] B. Rueckauer et al. (2017) Conversion of continuous-valued deep networks to efficient event-driven networks for image classification. Frontiers in Neuroscience 11, pp. 682.
  • [30] A. Sengupta et al. (2019) Going deeper in spiking neural networks: VGG and residual architectures. Frontiers in Neuroscience 13, pp. 95.
  • [31] K. Simonyan et al. (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
  • [32] Y. Wu et al. (2019) Direct training for spiking neural networks: faster, larger, better. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 1311–1318.