Optimizing the energy consumption of spiking neural networks for neuromorphic applications

12/03/2019 ∙ by Martino Sorbaro, et al.

In the last few years, spiking neural networks (SNNs) have been shown to perform on par with regular convolutional neural networks (CNNs), and several works have proposed methods to convert a pre-trained CNN into a spiking CNN without a significant sacrifice of performance. We first demonstrate that quantization-aware training of CNNs leads to better accuracy in the resulting SNNs. One of the benefits of converting CNNs to spiking CNNs is the ability to leverage their sparse computation and consequently perform the equivalent computation at lower energy cost. Here we propose an efficient optimization strategy to train spiking networks at lower energy consumption, while maintaining similar accuracy levels. We demonstrate results on the MNIST-DVS and CIFAR-10 datasets.


1 Introduction

Since the early 2010s, computer vision has been dominated by convolutional neural networks (CNNs), which have yielded unprecedented success in previously challenging tasks such as image recognition, image segmentation and object detection, among others. Considering that the theory of neural networks was mostly developed decades earlier, one of the main driving factors behind this evolution was the widespread availability of high-performance computing devices and general-purpose Graphics Processing Units (GPUs). In parallel with this increase in computational requirements (Strubell et al., 2019), the last decades have seen a considerable development of portable, miniaturized, battery-powered devices, which impose constraints on the maximum power consumption.

Attempts at reducing the power consumption of traditional deep learning models have been made. Typically, these involve optimizing the network architecture, in order to find more compact networks (with fewer layers, or fewer neurons per layer) that perform as well as larger networks. One approach is energy-aware pruning, where connections are removed according to a criterion based on energy consumption, and accuracy is restored by fine-tuning the remaining weights (Yang et al., 2017; Molchanov et al., 2016). Other work looks for more efficient network structures through a full-fledged architecture search (Cai et al., 2018). The latter work was one of the winners of the Google “Visual Wake Words Challenge” at CVPR 2019, which sought models with memory usage under 250 kB, model size under 250 kB, and a per-inference multiply-add count (MAC) under 60 million.

Using spiking neural networks (SNNs) on neuromorphic hardware is an entirely different, and much more radical, approach to the energy consumption problem. In SNNs, like in biological neural networks, neurons communicate with each other through isolated, discrete electrical signals (spikes), as opposed to continuous signals, and work in continuous instead of discrete time. Neuromorphic hardware (Indiveri et al., 2011; Furber, 2016; Thakur et al., 2018; Esser et al., 2016) is specifically designed to run such networks with very low power overhead, with electronic circuits that faithfully reproduce the dynamics of the model in real time, rather than simulating it on traditional von Neumann computers.

The challenge of using SNNs for machine learning tasks, however, is in their training. Mimicking the learning process used in the brain’s spiking networks is not yet feasible, because neither the learning rules nor the precise fitness functions being optimized are sufficiently well understood, although this is currently a very active area of research (Marblestone et al., 2016; Richards et al., 2019). Supervised learning routines for spiking networks have been developed (Bohte et al., 2002; Mostafa, 2017; Nicola and Clopath, 2017; Neftci et al., 2019; Shrestha and Orchard, 2018), but are slow and challenging to use. For applications which have little or no dependence on temporal aspects, it is more efficient to train an analog network (i.e., a traditional, non-spiking one) with the same structure, and transfer the learned parameters onto the SNN, which can then operate through rate coding. In particular, the conversion of pre-trained CNNs to SNNs has been shown to be a scalable and reliable process, without much loss in performance (Diehl et al., 2015; Rueckauer et al., 2017; Sengupta et al., 2019). But this approach is still challenging, because the naive use of analog CNN weights does not take into account the specific characteristics and requirements of SNNs. In particular, SNNs are more sensitive than analog networks to the magnitude of the input. Naive weight transfer can, therefore, lead to a silent SNN, or, conversely, to one with unnecessarily high firing rates, which have a high energy cost.

Here, we propose a hybrid training strategy which maintains the efficiency of training analog CNNs, while accounting for the fact that the network is being trained for eventual use in SNNs. Furthermore, we include the energy cost of the network’s computations directly in the loss function during training, in order to minimize it automatically and dynamically. We demonstrate that networks trained with this strategy perform better per Joule of energy utilized. While we demonstrate the benefit of optimizing based on energy consumption, we believe this strategy is extendable to any approach that uses back-propagation to train the network, be it through a spiking network or a non-spiking network.

In the following sections, we detail the training techniques we devised and applied for these purposes. We test our networks on two standard problems. The first is the MNIST-DVS dataset of Dynamic Vision Sensor recordings: DVSs are event-based sensors, and, as such, the analysis of their recordings is an ideal application for spike-based neural networks. The second is the standard CIFAR-10 object recognition benchmark, which allows a reasonable comparison of computational cost with non-spiking networks. For each of these tasks, we demonstrate the energy-accuracy trade-off of networks trained with our methods. We show that significant amounts of energy can be saved with a small loss in performance, and conclude that ours is a viable strategy for training neuromorphic systems with a limited power budget.

2 Materials and Methods

In most state-of-the-art neuromorphic architectures with time-multiplexed units, such as those of Merolla et al. (2014), Davies et al. (2018) and Furber et al. (2014), the neurons’ state variables need to be fetched from memory and rewritten every time a neuron receives a synaptic event. Whenever one of these operations is performed, the neuromorphic hardware consumes a certain amount of energy; Indiveri and Sandamirskaya (2019), for instance, report typical per-operation energy figures for such devices. While there are several other processes that consume power on a neuromorphic device, the bulk of the active power is used by synaptic operations. Reducing their number is therefore the most natural way to keep energy usage low.

In this paper we explore strategies to lower the number of synaptic operations and evaluate their effect on the network’s computational performance.

2.1 Training strategies

2.1.1 Parameter scaling

By scaling the weights, biases and/or thresholds of neurons in different layers, we can influence the number of spikes generated in each layer, thereby allowing us to tune the synaptic activity of the model. This is easy to do, even with pre-trained weights. For a scale-invariant network, such as any network whose only nonlinearities are ReLUs, this method attains perfect results, because a linear rescaling of the weights causes a linear rescaling of the output, which gives identical results for classification tasks where we select the class that receives the highest activation.

We use this method as a baseline comparison for our results. We chose to rescale the weights of the first convolutional layer of our network by a variable factor $\lambda$,

$W_1 \rightarrow \lambda W_1$,

which is equivalent to a rescaling of the input signal by the same factor. Note that an increase or decrease in the first layer’s output firing rate causes a corresponding increase or decrease in the activation of all subsequent layers, and thereby scales the global energy consumption of the whole network.
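As a concrete sketch (our own helper, not the paper’s released code), this rescaling can be applied to a pre-trained PyTorch model as follows; since the networks used in this work have no bias terms, scaling only the weights is equivalent to scaling the input.

```python
import torch
import torch.nn as nn

def rescale_first_layer(model: nn.Module, factor: float) -> nn.Module:
    """Multiply the weights of the first convolutional layer by `factor`,
    which (for bias-free networks) is equivalent to rescaling the input
    signal by the same factor."""
    with torch.no_grad():
        for module in model.modules():
            if isinstance(module, nn.Conv2d):
                module.weight.mul_(factor)
                break  # only the first convolutional layer is rescaled
    return model
```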

However, the aforementioned scale-invariance property does not hold for the corresponding spiking network, and small activation values could cause discretization errors, or even yield a completely silent spiking network from a perfectly functional analog network.

2.1.2 Synaptic Operation optimization

We measure the activity of the network, for each layer group, at the ReLU operations, which effectively correspond to the spikes of an equivalent SNN. We denote the activity of neuron $i$ in layer $l$ as $a_i^{(l)}$. We define the fan-out $f_l$ of each group of layers as the number of units of layer $l+1$ that receive the signal emitted by a single neuron in layer $l$. This measure is essential in estimating the number of synaptic operations (SynOps) elicited by each layer, and their total over the network:

$\hat{S} = \sum_l f_l \sum_i a_i^{(l)}$    (1)

We directly add this number to the loss we want to minimize, optionally specifying a target value $\hat{S}_0$ for the desired number of SynOps:

$L = L_{CE}(o, t) + \alpha \left(\hat{S} - \hat{S}_0\right)^2$    (2)

where $L_{CE}$ is the cross-entropy loss, $o$ is the network output, $t$ is the target label, and $\alpha$ is a constant. We will refer to this additional term as the SynOp loss. In this work, we always choose $\alpha = 1/\hat{S}_0^2$, in order to keep the SynOp loss term normalized independently of $\hat{S}_0$.
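A minimal PyTorch sketch of this loss, assuming the per-layer ReLU activations have been recorded (for example with forward hooks) and the fan-outs precomputed; the helper names are illustrative, not the paper’s released code.

```python
import torch.nn.functional as F

def estimate_synops(relu_activations, fan_outs):
    """Per-sample SynOps estimate of Eq. (1): for every layer group,
    fan-out times the summed ReLU activations, averaged over the batch."""
    batch_size = relu_activations[0].shape[0]
    return sum(f * a.sum() for f, a in zip(fan_outs, relu_activations)) / batch_size

def synop_cross_entropy_loss(output, target, relu_activations, fan_outs, synops_target):
    """Cross-entropy loss plus the SynOp loss of Eq. (2), with the
    constant chosen as 1 / synops_target**2 so the penalty is
    normalized independently of the target value."""
    synops = estimate_synops(relu_activations, fan_outs)
    synop_loss = (synops - synops_target) ** 2 / synops_target ** 2
    return F.cross_entropy(output, target) + synop_loss
```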

2.1.3 Quantization-aware training and surrogate gradient

Optimizing for energy consumption with the SynOp loss mentioned above has unintended consequences. During training, the optimizer tries to achieve smaller activations, but cannot account for the fact that, when the activations are too small, discretization errors become more prominent. To solve this issue, we introduce a form of quantization during the training itself, in order to mimic the behaviour of a spiking network within an analog one. To this end, we turn all ReLU activation functions into “quantized” (i.e. step-wise) ReLUs, which additionally truncate their input to an integer:

$\mathrm{QReLU}(x) = \lfloor \mathrm{ReLU}(x) \rfloor$    (3)

where $\lfloor\cdot\rfloor$ indicates the floor operation. This choice introduces a further problem: this function is discontinuous, and its derivative is uniformly zero wherever it is defined. To avoid the zeroing of gradients during the backward pass, we use a surrogate gradient method (Neftci et al., 2019), whereby the gradient of the QReLU is approximated by the gradient of a normal ReLU during the backward pass:

$\dfrac{\partial\, \mathrm{QReLU}(x)}{\partial x} \approx \dfrac{\partial\, \mathrm{ReLU}(x)}{\partial x}$    (4)

This is not the only way to approximate the gradient of a step-wise function in a meaningful way, and closer approximations are certainly possible; however, we found that this linear approximation works sufficiently well for our purposes.
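A straightforward way to implement this in PyTorch is a custom autograd function that floors the ReLU output in the forward pass and falls back to the ReLU derivative in the backward pass; the sketch below is illustrative, not the paper’s released code.

```python
import torch

class QReLU(torch.autograd.Function):
    """Quantized ReLU of Eq. (3): floor(ReLU(x)) in the forward pass,
    with the ordinary ReLU gradient used as a surrogate in the
    backward pass, as in Eq. (4)."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.floor(torch.relu(x))

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Surrogate: d ReLU(x)/dx = 1 for x > 0, 0 otherwise.
        return grad_output * (x > 0).to(grad_output.dtype)

def qrelu(x):
    return QReLU.apply(x)
```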

In this work, we apply QReLUs in combination with the SynOp loss term illustrated in the previous section, but quantization of the activations could also be used on its own for a more accurate training of spiking networks. We note that quantization-aware training in different forms has been used before (Guo, 2018; Hubara et al., 2017), but its typical purpose is to sharply decrease the memory consumption of ANNs, by storing both activations and weights as lower-precision numbers (e.g. as int8 instead of the typical float32). PyTorch recently started providing support utilities for this purpose (https://pytorch.org/docs/stable/quantization.html).

2.2 Spiking network simulations with Sinabs

After training, we tested the trained weights in spiking network simulations. Unlike tests done on analog networks, these are time-dependent simulations, which fully account for the time dynamics of the input spike trains and closely mimic the behaviour of a neuromorphic hardware implementation such as DynapCNN (Liu et al., 2019). Our simulations are written using the Sinabs Python library (https://aictx.gitlab.io/sinabs/), which uses non-leaky integrate-and-fire neurons with a linear response function. The sub-threshold dynamics of the non-leaky integrate-and-fire neurons are described as follows:

$\dfrac{dv(t)}{dt} = \gamma\, i_{\mathrm{syn}}(t) + i_0$    (5)
$i_{\mathrm{syn}}(t) = W\, s(t)$    (6)

where $v$ is the membrane potential of the neuron, $\gamma$ is a constant, $i_{\mathrm{syn}}$ is the synaptic input current, $i_0$ is a constant input current term, $W$ is the synaptic weight matrix, and $s(t)$ is a vector of input spike trains. For the results presented in this paper, we assume $\gamma = 1$ without any loss of generality. Upon reaching the spiking threshold $v_{th}$, the neuron’s membrane potential is reduced by $v_{th}$ (not reset to zero).

As a result of the above, between times $t_0$ and $t$, for a slowly varying total input current $i_{\mathrm{syn}}$, the neurons generate a number of spikes given by the following equation:

$n_{\mathrm{spikes}} = \left\lfloor \dfrac{1}{v_{th}} \int_{t_0}^{t} i_{\mathrm{syn}}(t')\, dt' \right\rfloor$    (7)

In order to simulate the equivalent SNN model on Sinabs, the ANN’s pre-trained weights are directly transferred to the equivalent SNN.
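To make these dynamics concrete, here is a minimal, self-contained re-implementation of the non-leaky integrate-and-fire update of Eqs. (5)-(7) in PyTorch; it is a sketch for illustration, not the Sinabs API.

```python
import torch

def run_iaf_layer(weighted_input, v_th=1.0):
    """Minimal non-leaky integrate-and-fire dynamics: the membrane
    potential integrates the weighted input at every time step; when it
    crosses the threshold, spikes are emitted and the potential is
    reduced by v_th per spike (not reset to zero).

    weighted_input: tensor of shape (T, n_neurons), the synaptic
        current W s(t) already summed per time step (Eq. 6).
    Returns a (T, n_neurons) tensor of output spike counts.
    """
    v = torch.zeros(weighted_input.shape[1])
    spikes = torch.zeros_like(weighted_input)
    for t, i_syn in enumerate(weighted_input):
        v = v + i_syn                                      # integrate (Eq. 5)
        n = torch.clamp(torch.floor(v / v_th), min=0.0)    # spikes this step (Eq. 7)
        spikes[t] = n
        v = v - n * v_th                                   # subtract, do not reset to zero
    return spikes
```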

2.3 Digit recognition on DVS recordings

Figure 1: Illustration of the MNIST-DVS dataset, as used in this work, and of the network model we used for the task. A: a single accumulated frame of 3000 spikes, as used for training. B: the corresponding 3000 spikes, in location and time. C: example single-millisecond frames, as sequentially shown to the Sinabs spiking network during the tests presented in figure 2. D: The convolutional network model we used for this task. All convolutional layers and the linear layer are followed by ReLUs. Dropout is used before the linear layer at training time.

2.3.1 Task and Dataset

As a benchmark to assess the performance of the above training methods, we used an image recognition task on real data recorded by a Dynamic Vision Sensor (DVS). Given a spike train generated by the DVS, our spiking networks identify the class to which the object belongs, corresponding to the fastest-firing neuron in the output layer. For this task, we used the MNIST-DVS dataset at scale 16 (Serrano-Gotarredona and Linares-Barranco, 2015; Liu et al., 2016), a collection of DVS recordings in which digits from the classic MNIST dataset (LeCun et al., 1998) are shown to the DVS camera while moving on a screen. We divided all recordings into chunks, each containing 3000 spikes. During training and testing of the analog convolutional network, each chunk was shown to the network as a single accumulated frame. During testing of the spiking network simulation, the corresponding spike trains were presented to the network with 1 ms time resolution (Figure 1A-C), simulating the real-time event transmission between the DVS and a neuromorphic chip; this resolution was chosen to keep simulation times reasonable, but could be lowered if needed. The network state was reset between the presentation of one data chunk and the next. The polarity of events was ignored. Of the original 10000 recordings (1000 per digit from zero to nine), a fraction was set aside as the test set.
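As an illustration of this preprocessing, the following sketch (hypothetical helpers; the event-field layout and the 128x128 DVS sensor resolution are assumptions, not taken from the text) accumulates a 3000-event chunk into a single training frame and bins the same chunk into 1 ms frames for the spiking simulation.

```python
import numpy as np

def events_to_frame(x, y, n_events=3000, sensor=(128, 128)):
    """Accumulate the first `n_events` DVS events (polarity ignored)
    into a single frame, as used for training the analog CNN."""
    frame = np.zeros(sensor, dtype=np.float32)
    np.add.at(frame, (y[:n_events], x[:n_events]), 1.0)
    return frame

def events_to_time_bins(x, y, t_us, bin_ms=1.0, sensor=(128, 128)):
    """Bin one 3000-event chunk into 1 ms frames for presentation to
    the spiking network simulation (timestamps t_us in microseconds)."""
    t_rel = t_us - t_us[0]
    bin_us = int(bin_ms * 1000)
    n_bins = int(np.ceil((t_rel[-1] + 1) / bin_us))
    frames = np.zeros((n_bins, *sensor), dtype=np.float32)
    bins = (t_rel // bin_us).astype(int)
    np.add.at(frames, (bins, y, x), 1.0)
    return frames
```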

2.3.2 Network architecture

In order to solve the task mentioned above, we used a simple convolutional neural network with three 2D convolutional layers (3x3 filters), each followed by an average pooling layer (2x2 filters) and a rectified linear unit. The choice of average pooling is due to the difficulty of implementing max pooling in spiking networks (Rueckauer et al., 2017). The last layer is a linear (fully connected) layer, which outputs the class predictions (Figure 1D). We used a cross-entropy loss function to evaluate the model predictions and optimized the network weights using the Adam optimizer (Kingma and Ba, 2014) with a fixed learning rate. Bias parameters were deactivated everywhere in the network. A 50% dropout was applied just before the fully connected layer at training time. The network was implemented using PyTorch (Paszke et al., 2017).
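A PyTorch sketch of this architecture is given below; the channel counts are placeholders, since they are not specified in the text above.

```python
import torch.nn as nn

def make_dvs_net(n_classes=10):
    """Three 3x3 convolutions, each followed by 2x2 average pooling and
    a ReLU, then 50% dropout and a bias-free linear classifier
    (Figure 1D). Channel counts here are illustrative placeholders."""
    return nn.Sequential(
        nn.Conv2d(1, 16, 3, bias=False), nn.AvgPool2d(2), nn.ReLU(),
        nn.Conv2d(16, 32, 3, bias=False), nn.AvgPool2d(2), nn.ReLU(),
        nn.Conv2d(32, 64, 3, bias=False), nn.AvgPool2d(2), nn.ReLU(),
        nn.Flatten(),
        nn.Dropout(0.5),
        nn.LazyLinear(n_classes, bias=False),  # infers the flattened size
    )
```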

2.4 Object recognition on CIFAR-10

2.4.1 Task and Dataset

In order to validate the approach on a dataset of higher complexity than MNIST, we also benchmarked our work on CIFAR-10 (Krizhevsky et al., 2009), a visual object classification task. The input images were augmented with random crops and horizontal flips, and then normalized. A 20% dropout rate was applied to the input layer to further augment the input data.

For the experimental results on this dataset, we directly injected the analog pixel values into the first convolutional layer as a constant input current at every simulation time step. The magnitude of the current was scaled down by the number of time steps, so that the current accumulated over the whole simulation equals the analog input value. The Sinabs simulations were run for this fixed number of time steps, yielding SynOps and accuracy values. The network state was reset between the presentation of one image and the next.
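A minimal sketch of this input scheme, assuming the number of time steps is fixed in advance (the function name is ours):

```python
import torch

def constant_current_input(image, n_steps):
    """Present an analog image to the SNN as a constant input current:
    pixel values are divided by the number of simulation time steps so
    that the current accumulated over the whole simulation equals the
    original analog input.
    image: tensor of shape (C, H, W); returns (n_steps, C, H, W)."""
    return (image / n_steps).unsqueeze(0).repeat(n_steps, 1, 1, 1)
```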

2.4.2 Network architecture

In order to solve the task mentioned above, we used an All-ConvNet (Springenberg et al., 2014), a 9-layer convolutional network without bias terms, which has 1.9M parameters in total. The ReLU layers in the model, including the one on the last output layer, were replaced with QReLUs. All convolutional layers in this network are followed by a dropout layer with a rate of 10%, which not only prevents over-fitting, but also compensates for the SNN’s discrete representation of analog values. Following Springenberg et al. (2014), training lasts 350 epochs, and the learning rate is scaled down by a factor of 10 at epochs 200, 250 and 300. We use the Adam optimizer with weight decay. Note that the model was trained without a ReLU on the last output layer, since the classification layer is harder to train when its outputs are only positive, while the classification accuracy was tested with a ReLU on the output layer, in order to have a network equivalent to the spiking model.
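A sketch of this schedule using standard PyTorch utilities; the initial learning rate and weight-decay value below are placeholders, as they are not given in the text above.

```python
from torch.optim import Adam
from torch.optim.lr_scheduler import MultiStepLR

def make_optimizer(model, lr=1e-3, weight_decay=1e-4):
    """Adam with weight decay, learning rate divided by 10 at epochs
    200, 250 and 300 over a 350-epoch training run. The lr and
    weight_decay defaults are illustrative placeholders."""
    optimizer = Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
    scheduler = MultiStepLR(optimizer, milestones=[200, 250, 300], gamma=0.1)
    return optimizer, scheduler
```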

2.4.3 SynOps Optimization

Before training the network with QReLU activations, the network was first trained with ReLUs to obtain an initial set of parameters. The QReLU network was then initialized with scaled versions of these parameters, scaling up the weights of the first layer by a constant factor. The scaling factor was chosen to initialize the network in a state where enough information propagates through the layers for the network to perform reasonably well. Correspondingly, the weights of the last weighted layer were scaled by the inverse factor, in order to bring the classification outputs back to their original range.
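A sketch of this initialization step; the scaling factor itself is left as a parameter, since its value is not given in the text above.

```python
import torch
import torch.nn as nn

def rescale_for_quantization(model: nn.Module, factor: float) -> nn.Module:
    """Initialize the QReLU network from pre-trained ReLU weights:
    scale the first weighted layer up by `factor` so that enough
    integer-valued activity propagates through the quantized layers,
    and scale the last weighted layer by 1/factor so that the
    classification outputs keep their original range."""
    weighted = [m for m in model.modules()
                if isinstance(m, (nn.Conv2d, nn.Linear))]
    with torch.no_grad():
        weighted[0].weight.mul_(factor)
        weighted[-1].weight.mul_(1.0 / factor)
    return model
```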

During testing, we measured the ANN and SNN performance in terms of accuracy and SynOps, and found a mismatch in SynOps between training and testing. There are two main reasons for this. 1) During training, the output of a dropout layer with dropout rate $p$ is rescaled to compensate for the dropped-out activations, and this scaling is absent at test time; after a sequence of dropout layers, the mismatch can become large. 2) Because the network operates on discrete spike events, where the order (not only the count) of the spikes matters, a mismatch arises between the spike-count-based analog activations and the actual spiking activity.

To compensate for this mismatch, for all trained models we tested the performance both with and without scaled-up first-layer weights. Lastly, we optimized the QReLU-based model with the objective of minimizing the classification error given a target SynOps. We trained 30 models with progressively lower SynOps targets, each initialized with the trained weights of the previous one.
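A sketch of this progressive fine-tuning loop; the per-step shrink factor and the training routine are placeholders, not values taken from the text.

```python
import copy

def synops_sweep(model, train_one_model, initial_target, n_models=30, shrink=0.85):
    """Train a sequence of models with decreasing SynOps targets,
    warm-starting each from the previous model's weights.
    `train_one_model(model, synops_target)` is a placeholder for a
    standard training loop using the loss of Eq. (2); `shrink` is an
    illustrative per-step reduction of the target."""
    results, target = [], initial_target
    for _ in range(n_models):
        model = copy.deepcopy(model)   # warm start from the previous weights
        train_one_model(model, synops_target=target)
        results.append((target, model))
        target *= shrink               # lower and lower SynOps target
    return results
```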

3 Results

3.1 The SynOp loss term leads to a reduction in network activity on DVS data

Figure 2: Left: SynOps-accuracy curves computed on a spiking network simulation with Sinabs. Each point represents a different model, trained with a different SynOps target or rescaled by a different scaling factor. The red star represents the original model. The solid lines are smoothed versions of the curves described by the data points, provided as a guide to the eye. Center: SynOps per layer, compared between the baseline model and a selected model trained with the SynOp loss and quantization (the model indicated by a black circle in the first panel). Note that the input SynOps depend only on the input data and cannot be changed by training. Right: comparison of accuracies between the same two models.

We used three methods to reduce the activity of the network in a way that yields energy savings. First, as a baseline, we trained a traditional CNN using a cross-entropy loss function and scaled down the weights of its first layer. This is equivalent to rescaling the input values, and has the effect of proportionally reducing the activity in all subsequent layers of the network. Second, we introduced an additional term in the loss function, the SynOp loss, which directly pushes the estimated number of SynOps towards a given value. We trained several CNN models, each with a different target number of synaptic operations, independently of each other. However, excessive reduction of the SynOps leads to the silencing of certain neurons and to other discretization errors, causing an immediate drop in accuracy. To account for this, as a third method, we jointly used the SynOp loss term and quantization-aware training.

We tested our training methods on a real-world use case of spiking neural networks. Dynamic Vision Sensors (DVS) are used in neuromorphic engineering as very-low-power sources of visual information, and are a natural data source for SNNs simulated on neuromorphic hardware. We transferred the weights learned with the methods described above onto a spiking network simulation, and used it to identify the digits presented to the DVS in the MNIST-DVS dataset.

Our results show that adding a requirement on the number of synaptic operations to the loss yields better results in terms of accuracy compared to rescaling weights (Figure 2, orange). Using the SynOp loss together with quantization during training outperforms the simpler methods, allowing for further reduction of the SynOps value with smaller losses in accuracy (Figure 2, blue).

Among the models trained in this way, we selected one with a good balance between energy consumption and accuracy, and used it for a direct comparison with the baseline (that is, weights from an ANN without quantization and no additional loss terms). The second and third panels of figure 2 graphically show the large decrease in the number of synaptic operations required by each layer of our model, and the very small reduction in performance. This particular model brings accuracy down from 96.3% to 95.0%, while reducing the number of synaptic operations from 3.86M to 0.63M, an 84% reduction of the SynOp-related energy consumption.

3.2 The SynOp loss leads to a lower operations count compared to ANNs on CIFAR10

Figure 3: Accuracy vs. SynOps curves on the non-spiking CIFAR-10 task. ‘ANN’ results (red crosses and blue dots) are SynOp count estimations based on the quantized activations of an analog network. ‘SNN’ results (light-blue crosses and green dots) are SynOp values obtained by Sinabs simulation. The performance of models fine-tuned with the SynOp loss and quantization (dots) shows a clear advantage over weight rescaling (crosses). Right panel: a zoomed-in plot of the dashed region. The good trade-off points listed in Table 1 are marked with green ‘+’. Results from other works are marked with purple triangles. The original ANN model, for which the MAC count is plotted instead of the SynOps, is marked with a red star.

SNNs are a natural way of working with DVS events, having advantages over ANNs in event-driven processing. However, it is also interesting to highlight the benefits of SNNs over ANNs in conventional non-spiking computer vision tasks, e.g. CIFAR-10, where SNNs can still offer advantages in power consumption. As stated in Section 2.4, we trained the network with two approaches: 1) conventional ANN training plus weight scaling as the baseline; 2) further training with QReLU and the SynOp loss for performance optimization.

3.2.1 Weight Scaling

We first trained the analog All-ConvNet on CIFAR-10, attaining an accuracy of 91.37% with a MAC count of 306M (red star in Figure 3). We then transferred the trained weights directly onto the equivalent SNN and scaled the weights of its first layer to manipulate the overall activity level. This is shown by the light-blue crosses in Figure 3: as the SynOp count grows, so does the accuracy. However, the SynOp count is around 10 times the MAC count of the ANN by the time the accuracy reaches an acceptable 90.7%. To improve on this result, we fine-tuned this training by adding quantization and the SynOp loss.

A faster way to measure the same quantities is to test the analog model, with all ReLU layers replaced by QReLUs, and count the activation levels instead of the Sinabs spike counts. Estimations based on quantized CNNs are shown as red crosses in Figure 3. The accuracy and SynOps of the analog network and its spiking equivalent are well aligned, showing that quantized activations are a good proxy for the firing rates of the simulated SNN, at least in this regime.

3.2.2 SynOp loss optimization

We further fine-tuned one of the weight-scaled models obtained above, with the addition of quantization-aware training and the SynOp loss. Figure 3 also shows the classification accuracy and SynOps for both quantized-analog and spiking models (blue and green dots respectively) trained with this method.

Multiple SNN test trials achieve better accuracy than the original ANN model (red star, 91.37%), thanks to the further training with QReLUs. As the SynOps decrease, the accuracy stays above that of the original ANN model down to 91.43% at 277M SynOps (one of the green ‘+’ markers in Figure 3). Note that at this point the SNN outperforms the ANN both on accuracy and on operation count, since the original ANN requires 306M MACs. At another good accuracy-SynOps trade-off point (90.37% at 127M), our model still performs above 90% while cutting the number of computing operations by 58% relative to the original ANN (a Syn-MAC ratio of 0.42). Running the SNN model on neuromorphic hardware will therefore benefit in energy efficiency not only from the lower cost of a synaptic operation compared to a MAC, but also from the significant reduction in operation count. Additionally, the plot shows that this method outperforms weight scaling in terms of operation count by roughly a factor of 10 at all accuracy levels.

As far as we know, our SNN model converted from the All-ConvNet reaches state-of-the-art accuracy among SNN models at 91.75% (see the detailed comparison in Table 1 and Figure 3). In addition, our model is the smallest, at 1.9M parameters: the BinaryConnect model (Rueckauer et al., 2017) is 7 times larger, and WeightNorm, a VGG-16 (Sengupta et al., 2019), is 8 times larger. Although achieving the best accuracy requires 2,179M SynOps, this can be reduced by 27% by giving up 0.02% in accuracy (see the two green ‘+’ markers at the top right of Figure 3). Compared to the result of Sengupta et al. (2019) (purple triangle on the right of Figure 3), our model achieves 91.47% accuracy at 368M SynOps, thus losing only 0.08% in accuracy while saving 41% of the SynOps and energy. Thanks to the optimization of the SynOp loss, the number of SynOps can be pushed down further while keeping a reasonable accuracy, e.g. 85.71% at 64M SynOps. This result not only outperforms most of the early SNN models for the CIFAR-10 task (Panda and Roy, 2016; Cao et al., 2015; Hunsberger and Eliasmith, 2015), but also brings the SynOps down to only 1/5 of the MAC count and saves 86% of the energy compared to Rueckauer et al. (2017).

In brief summary: 1) the energy-aware training strategy pushes the SynOps down by a factor of 10 compared to the weight-scaling baseline; 2) the QReLU-trained SNN achieves state-of-the-art accuracy among SNNs on the CIFAR-10 task; and 3) the accuracy-energy trade-off shows significant savings in computational cost and energy compared to existing SNN models and to the equivalent non-spiking CNN.

SNN Models      Net Architecture     Best Accuracy                      Accuracy-SynOps trade-off
                N. par.   MAC        Acc.    SynOps   Syn-MAC ratio     Acc.    SynOps   Syn-MAC ratio
BinaryConnect   14M       616M       90.85   N/A      N/A               84.87   460M     0.75
WeightNorm      15M       313M       91.55   618M     1.98              91.55   618M     1.98
Ours            1.9M      306M       91.75   2179M    7.12              91.47   368M     1.20
                                     91.73   1593M    5.21              91.43   277M     0.91
                                                                        90.37   127M     0.42
                                                                        85.71   64M      0.21
Table 1: Comparison with the best SNN models on CIFAR-10. BinaryConnect: 8-layer ConvNet from Rueckauer et al. (2017); WeightNorm: VGG-16 model from Sengupta et al. (2019). In the accuracy-SynOps trade-off comparison, the 85.71% / 64M entry of our model is the one comparable to Rueckauer et al. (2017), while the 91.47% / 368M entry is the one comparable to Sengupta et al. (2019).

4 Discussion and Conclusion

We presented two techniques that significantly reduce the energy requirements of machine learning models running on neuromorphic hardware, while maintaining similar performance.

The first improvement consisted in optimizing the energy expenditure by directly adding it to the loss function during training. This method encourages smaller activations in all neurons, which is not in itself an issue in analog models, but can lead to discretization errors, due to the lower firing rates, once the weights are transferred to a spiking network. To solve this problem, we introduced the second improvement: quantization-aware training, whereby the network activity is quantized at each layer, i.e. only integer activations are allowed. Discretizing the network’s activity would normally reduce all gradients to zero; we showed that this can be solved by substituting the true gradient with a surrogate.

Applying these two methods together, we achieved up to a ten-fold reduction in the number of synaptic operations, and in the consequent energy consumption, on the MNIST-DVS task, with only a minor (1-2%) loss in performance. To demonstrate the scalability of this approach, we also showed that, as the network grows to solve the much more complex task of CIFAR-10 image classification, the SynOps can be reduced to 42% of the MAC count at the cost of about 1% of accuracy (90.37% at 127M). The accuracy-energy trade-off can be flexibly tuned at training time.

Our work emphasizes the fact that each layer’s activity is weighted differently in its contribution to synaptic activity, depending on the layer’s fan-out. Consequently, the learning algorithm could potentially converge to a different set of weights than if one were simply to perform an L1 or L2 cost optimization (Neil et al., 2016) on the total activity of the network.

While training based on static frames is not the optimal approach to leverage all the benefits of spike-based computation, it enables fast training with state-of-the-art deep learning tools. In addition, this hybrid strategy of training SNNs towards a target power metric is unique to SNNs. Conversely, optimizing the energy requirements of an ANN/CNN requires modifying the network architecture itself, which can demand large amounts of computational resources (Cai et al., 2018). In this work we demonstrated that we can train an SNN to a target energy level without the need to alter the network hyperparameters. The quantization and SynOp-based optimization used in this paper can potentially be applied, beyond the method illustrated here, in more general contexts, such as algorithms based on back-propagation through time, to reduce power usage.

Such a reduction in power usage can make a large difference when the model is run on a mobile, battery-powered neuromorphic device, with the potential for a significant impact on industrial applications.

Author Contributions

SS designed research; QL and SS contributed to the methods; MS, QL, and MB contributed code and performed experiments; all authors wrote the paper.

Funding

This work is supported in part by H2020 ECSEL grant TEMPO (826655) and by aiCTX AG. The authors performed this work as part of their duties at aiCTX AG.

Acknowledgments

The authors would like to thank Mr. Felix Bauer, Mr. Ole Richter, Dr. Dylan Muir and Dr. Ning Qiao for their support and feedback on this work.

Data and Code Availability

The third-party datasets used in this study are available from their respective authors, cited in the main text. The Python/PyTorch code used for training and analysis is publicly available at gitlab.com/aiCTX/synoploss. Reuse and feedback are encouraged, within the terms of the license provided.

References

  • Bohte et al. (2002) Bohte, S. M., Kok, J. N., and La Poutre, H. (2002). Error-backpropagation in temporally encoded networks of spiking neurons. Neurocomputing 48, 17–37
  • Cai et al. (2018) Cai, H., Zhu, L., and Han, S. (2018). Proxylessnas: Direct neural architecture search on target task and hardware. arXiv preprint arXiv:1812.00332
  • Cao et al. (2015) Cao, Y., Chen, Y., and Khosla, D. (2015). Spiking deep convolutional neural networks for energy-efficient object recognition. International Journal of Computer Vision 113, 54–66
  • Davies et al. (2018) Davies, M., Srinivasa, N., Lin, T.-H., Chinya, G., Cao, Y., Choday, S. H., et al. (2018). Loihi: A neuromorphic manycore processor with on-chip learning. IEEE Micro 38, 82–99
  • Diehl et al. (2015) Diehl, P. U., Neil, D., Binas, J., Cook, M., Liu, S.-C., and Pfeiffer, M. (2015). Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing. In 2015 International Joint Conference on Neural Networks (IJCNN) (IEEE), 1–8
  • Esser et al. (2016) Esser, S. K., Merolla, P. A., Arthur, J. V., Cassidy, A. S., Appuswamy, R., Andreopoulos, A., et al. (2016). Convolutional networks for fast energy-efficient neuromorphic computing. Proc. Nat. Acad. Sci. USA 113, 11441–11446
  • Furber (2016) Furber, S. (2016). Large-scale neuromorphic computing systems. Journal of neural engineering 13, 051001
  • Furber et al. (2014) Furber, S. B., Galluppi, F., Temple, S., and Plana, L. A. (2014). The spinnaker project. Proceedings of the IEEE 102, 652–665
  • Guo (2018) Guo, Y. (2018). A survey on methods and theories of quantized neural networks. arXiv preprint arXiv:1808.04752
  • Hubara et al. (2017) Hubara, I., Courbariaux, M., Soudry, D., El-Yaniv, R., and Bengio, Y. (2017). Quantized neural networks: Training neural networks with low precision weights and activations. The Journal of Machine Learning Research 18, 6869–6898
  • Hunsberger and Eliasmith (2015) Hunsberger, E. and Eliasmith, C. (2015). Spiking deep networks with LIF neurons. arXiv preprint arXiv:1510.08829
  • Indiveri et al. (2011) Indiveri, G., Linares-Barranco, B., Hamilton, T. J., Van Schaik, A., Etienne-Cummings, R., Delbruck, T., et al. (2011). Neuromorphic silicon neuron circuits. Frontiers in neuroscience 5, 73
  • Indiveri and Sandamirskaya (2019) Indiveri, G. and Sandamirskaya, Y. (2019). The importance of space and time for signal processing in neuromorphic agents: The challenge of developing low-power, autonomous agents that interact with the environment. IEEE Signal Processing Magazine 36, 16–28
  • Kingma and Ba (2014) Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980
  • Krizhevsky et al. (2009) Krizhevsky, A., Hinton, G., et al. (2009). Learning multiple layers of features from tiny images. Tech. rep., Citeseer
  • LeCun et al. (1998) LeCun, Y., Cortes, C., and Burges, C. J. (1998). The MNIST database of handwritten digits. URL http://yann.lecun.com/exdb/mnist
  • Liu et al. (2016) Liu, Q., Pineda-García, G., Stromatias, E., Serrano-Gotarredona, T., and Furber, S. B. (2016). Benchmarking spike-based visual recognition: a dataset and evaluation. Frontiers in neuroscience 10, 496
  • Liu et al. (2019) Liu, Q., Richter, O., Nielsen, C., Sheik, S., Indiveri, G., and Qiao, N. (2019). Live demonstration: Face recognition on an ultra-low power event-driven convolutional neural network ASIC. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops
  • Marblestone et al. (2016) Marblestone, A. H., Wayne, G., and Kording, K. P. (2016). Toward an integration of deep learning and neuroscience. Frontiers in computational neuroscience 10, 94
  • Merolla et al. (2014) Merolla, P. A., Arthur, J. V., Alvarez-Icaza, R., Cassidy, A. S., Sawada, J., Akopyan, F., et al. (2014). A million spiking-neuron integrated circuit with a scalable communication network and interface. Science 345, 668–673. doi:10.1126/science.1254642
  • Molchanov et al. (2016) Molchanov, P., Tyree, S., Karras, T., Aila, T., and Kautz, J. (2016). Pruning convolutional neural networks for resource efficient inference. arXiv preprint arXiv:1611.06440
  • Mostafa (2017) Mostafa, H. (2017). Supervised learning based on temporal coding in spiking neural networks. IEEE transactions on neural networks and learning systems 29, 3227–3235
  • Neftci et al. (2019) Neftci, E. O., Mostafa, H., and Zenke, F. (2019). Surrogate gradient learning in spiking neural networks. arXiv preprint arXiv:1901.09948
  • Neil et al. (2016) Neil, D., Pfeiffer, M., and Liu, S.-C. (2016). Learning to be efficient: Algorithms for training low-latency, low-compute deep spiking neural networks. In ACM Symposium on Applied Computing (Proceedings of the 31st Annual ACM Symposium on Applied Computing)
  • Nicola and Clopath (2017) Nicola, W. and Clopath, C. (2017). Supervised learning in spiking neural networks with force training. Nature communications 8, 2208
  • Panda and Roy (2016) Panda, P. and Roy, K. (2016). Unsupervised regenerative learning of hierarchical features in spiking deep networks for object recognition. In 2016 International Joint Conference on Neural Networks (IJCNN) (IEEE), 299–306
  • Paszke et al. (2017) Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., et al. (2017). Automatic differentiation in PyTorch. In NIPS Autodiff Workshop
  • Richards et al. (2019) Richards, B. A., Lillicrap, T. P., Beaudoin, P., Bengio, Y., Bogacz, R., Christensen, A., et al. (2019). A deep learning framework for neuroscience. Nature neuroscience 22, 1761–1770
  • Rueckauer et al. (2017) Rueckauer, B., Lungu, I.-A., Hu, Y., Pfeiffer, M., and Liu, S.-C. (2017). Conversion of continuous-valued deep networks to efficient event-driven networks for image classification. Frontiers in neuroscience 11, 682
  • Sengupta et al. (2019) Sengupta, A., Ye, Y., Wang, R., Liu, C., and Roy, K. (2019). Going deeper in spiking neural networks: VGG and residual architectures. Frontiers in neuroscience 13
  • Serrano-Gotarredona and Linares-Barranco (2015) Serrano-Gotarredona, T. and Linares-Barranco, B. (2015). Poker-dvs and mnist-dvs. their history, how they were made, and other details. Frontiers in neuroscience 9, 481
  • Shrestha and Orchard (2018) Shrestha, S. B. and Orchard, G. (2018). Slayer: Spike layer error reassignment in time. In Advances in Neural Information Processing Systems. 1412–1421
  • Springenberg et al. (2014) Springenberg, J. T., Dosovitskiy, A., Brox, T., and Riedmiller, M. (2014). Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806
  • Strubell et al. (2019) Strubell, E., Ganesh, A., and McCallum, A. (2019). Energy and policy considerations for deep learning in nlp. arXiv preprint arXiv:1906.02243
  • Thakur et al. (2018) Thakur, C. S. T., Molin, J., Cauwenberghs, G., Indiveri, G., Kumar, K., Qiao, N., et al. (2018). Large-scale neuromorphic spiking array processors: A quest to mimic the brain. Frontiers in neuroscience 12, 891
  • Yang et al. (2017) Yang, T.-J., Chen, Y.-H., and Sze, V. (2017). Designing energy-efficient convolutional neural networks using energy-aware pruning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5687–5695