Efficient Computation in Adaptive Artificial Spiking Neural Networks

10/13/2017 · by Davide Zambrano et al.

Artificial Neural Networks (ANNs) are bio-inspired models of neural computation that have proven highly effective. Still, ANNs lack a natural notion of time, and neural units in ANNs exchange analog values in a frame-based manner, a computationally and energetically inefficient form of communication. This contrasts sharply with biological neurons that communicate sparingly and efficiently using binary spikes. While artificial Spiking Neural Networks (SNNs) can be constructed by replacing the units of an ANN with spiking neurons, their current performance is far from that of deep ANNs on hard benchmarks, and these SNNs use much higher firing rates than their biological counterparts, limiting their efficiency. Here we show how spiking neurons that employ an efficient form of neural coding can be used to construct SNNs that match high-performance ANNs and exceed the state-of-the-art in SNNs on important benchmarks, while requiring much lower average firing rates. For this, we use spike-time coding based on the firing-rate-limiting adaptation phenomenon observed in biological spiking neurons. This phenomenon can be captured in adapting spiking neuron models, for which we derive the effective transfer function. Neural units in ANNs trained with this transfer function can be substituted directly with adaptive spiking neurons, and the resulting Adaptive SNNs (AdSNNs) can carry out inference in deep neural networks using up to an order of magnitude fewer spikes compared to previous SNNs. Adaptive spike-time coding additionally allows for the dynamic control of neural coding precision: we show how a simple model of arousal in AdSNNs further halves the average required firing rate, and this notion naturally extends to other forms of attention. AdSNNs thus hold promise as a novel and efficient model for neural computation that naturally fits temporally continuous and asynchronous applications.


Methods

Adaptive Spiking Neurons.

In the ASN, the kernel is computed as the convolution of a spike-triggered post-synaptic current (PSC) with an exponentially decaying membrane filter, each with its own time constant; the input signal is similarly computed from a current injection. The adaptation kernel likewise decays exponentially with its own time constant.
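To make the kernel structure concrete, the sketch below builds exponentially decaying PSC, membrane-filter and adaptation kernels and filters a binary spike train with them; the time constants, the 1 ms resolution and the variable names are illustrative placeholders, not the parameters used in the paper.

```python
import numpy as np

def exp_kernel(tau, dt=1.0, length=200.0):
    """Normalised exponentially decaying kernel with time constant tau (ms)."""
    t = np.arange(0.0, length, dt)
    k = np.exp(-t / tau)
    return k / k.sum()

# Placeholder time constants (ms); not the values used in the paper.
psc_kernel      = exp_kernel(tau=15.0)   # spike-triggered post-synaptic current
membrane_filter = exp_kernel(tau=2.5)    # smooths the high-frequency components
adapt_kernel    = exp_kernel(tau=50.0)   # adaptation kernel added after each spike

# Binary spike train at 1 ms resolution: the PSC is the spike train convolved with
# the PSC kernel, and the activation is the PSC convolved with the membrane filter.
spikes = np.zeros(500)
spikes[[50, 120, 130, 300]] = 1.0
psc        = np.convolve(spikes, psc_kernel)[: len(spikes)]
activation = np.convolve(psc, membrane_filter)[: len(spikes)]
```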

The AdSNNs are created by converting standard Deep Neural Networks [2] trained with a mathematically derived transfer function of the ASN (full derivation in the SI), defined as the function that maps the activation to the average post-synaptic contribution. The constants in this transfer function are computed from the neuron parameter settings, and the spike size enters as a multiplicative factor: by normalising the transfer function, the spike size becomes a scaling factor for the network's trained weights, allowing communication with binary spikes.

Adaptive Spiking Neural Networks (AdSNNs).

Analog units that use the ASN transfer function, AANs, in trained ANNs can be replaced directly and without modification by ASNs. In the presented results, the adaptation kernel, the membrane filter, the refractory response and the PSP all decay exponentially with time constants roughly matching those of biological neurons. Batch Normalization (BN) [19] is used to avoid the vanishing gradient problem [20] for saturating transfer functions like half-sigmoids and to improve network training and regularisation. After training, the BN layers are removed and integrated into the weights' computation [10]. A BN-AAN layer is also used as the first layer in all networks to convert the inputs into spikes. When converting, biases are added to the post-synaptic activation. Max and Average Pooling layers are converted by merging them into the next ASN layer: the layer activation is computed from the incoming spikes, the pooling operator is applied, and the ASN layer then computes spikes as output. The last ASN layer acts as a smoothed read-out layer where spikes are converted into analog values for classification. Classification is performed as in the ANN, usually using SoftMax: at every time step the output with the highest value is taken as the result of the classification.
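As a sketch of the BN-folding step described above (the function and variable names are illustrative, not the authors' code), the batch-normalisation scale and shift can be absorbed into the preceding layer's weights and biases as follows:

```python
import numpy as np

def fold_batchnorm(W, b, gamma, beta, mean, var, eps=1e-3):
    """Fold a BatchNorm layer into the preceding layer's weights and biases.
    W: (n_out, n_in) weights, b: (n_out,) biases; gamma, beta, mean, var are the
    per-output-unit BN parameters and running statistics."""
    scale = gamma / np.sqrt(var + eps)      # per-output-unit rescaling
    W_folded = W * scale[:, None]           # rescale each output row of W
    b_folded = (b - mean) * scale + beta    # absorb the shift into the bias
    return W_folded, b_folded
```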

ANN training.

We trained ANNs with AANs on widely used datasets: for feedforward ANNs, IRIS and SONAR; and for deep convolutional ANNs, MNIST, CIFAR-10, CIFAR-100 and ILSVRC-2012. All ANNs are trained using Keras (https://keras.io/) with TensorFlow (https://www.tensorflow.org/) as its backend. We used categorical cross-entropy as the loss function with Adam [21] as the optimiser, except for ILSVRC-2012, where we used Stochastic Gradient Descent with Nesterov momentum. Consistent with the aim of converting high-performance ANNs into AdSNNs, for each dataset we selected the model at the training epoch where it performed best on the test set.
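A minimal Keras compilation step consistent with this setup is sketched below; the ILSVRC learning-rate and momentum values are placeholders, as the exact hyperparameters are not reproduced here.

```python
import tensorflow as tf

def compile_for_dataset(model, dataset):
    """Categorical cross-entropy everywhere; Adam as the optimiser, except for
    ILSVRC-2012, where SGD with Nesterov momentum is used (placeholder values)."""
    if dataset == "ILSVRC-2012":
        opt = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9, nesterov=True)
    else:
        opt = tf.keras.optimizers.Adam()
    model.compile(optimizer=opt, loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```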

We trained a feedforward ANN on the IRIS dataset: IRIS is a classical non-linearly separable toy dataset containing three classes – types of iris plants – with 50 instances each, to be classified from four input attributes. Similarly, for the SONAR dataset [22] we used a feedforward ANN to classify 208 entries of sonar signals, each described by 60 energy measurements in particular frequency bands, into two classes: metal cylinder or simple rocks. We trained both ANNs and obtained competitive performance.

The deep convolutional ANNs are trained on standard image classification problems of increasing difficulty. The simplest is the MNIST dataset [23], where 28×28 grey-scale images of handwritten digits have to be classified into ten classes. We used a convolutional ANN composed of stacked convolutional layers (each defined by its number of feature maps and kernel size), max-pooling layers and dense layers. Images were pre-normalised to a fixed range, and the convolutional ANN was trained for a fixed number of epochs.
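For illustration only, a small Keras network of the conv / max-pool / dense kind described above could be defined as below; the layer sizes are hypothetical and a standard ReLU stands in for the AAN transfer function, so this is not the architecture used in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical conv / max-pool / dense stack for 28x28 grey-scale MNIST images.
model = tf.keras.Sequential([
    layers.Conv2D(32, kernel_size=3, activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D(pool_size=2),
    layers.Conv2D(64, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
```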

The CIFAR-10 and CIFAR-100 datasets [24] are harder benchmarks, where 32×32 colour images have to be classified into 10 or 100 categories respectively. We used a VGG-like architecture [25], with a deeper variant for CIFAR-100 than for CIFAR-10. Dropout [26] was used in the non-pooling layers, with a fixed rate in the top fully-connected layers and a rate schedule over the training epochs in the other layers. Images were converted from RGB to YCbCr and then normalised.

The ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) [27] is a large-scale image classification task built on a dataset of over 15 million labelled high-resolution images belonging to roughly 22,000 categories. The 2012 task-1 challenge was used, a subset of ImageNet with about 1,000 images in each of 1,000 categories. We trained a ResNet-18 architecture in the identity-mapping variant [28] for 100 epochs and report the top-1 error rate. As in [25], we rescaled the images to a fixed resolution and then performed random cropping during training and centre cropping for testing.
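The cropping scheme (random crops during training, centre crops for testing) can be sketched as below; the crop size is an illustrative choice, since the rescaling resolution is not reproduced here.

```python
import numpy as np

def random_crop(img, size=224):
    """Random spatial crop used during training (crop size is illustrative)."""
    h, w = img.shape[:2]
    top  = np.random.randint(0, h - size + 1)
    left = np.random.randint(0, w - size + 1)
    return img[top:top + size, left:left + size]

def centre_crop(img, size=224):
    """Deterministic centre crop used at test time."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]
```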

AdSNN evaluation.

The AdSNNs are evaluated in simulations with 1 ms time steps, where inputs are persistently presented for the full simulation window (identical to the method used in [2]). The Firing Rate (FR) in Table 1 is computed as the average number of spikes emitted by a neuron, for each image, in this time window. The time window is chosen such that all output neurons reach a stable value; we define the Matching Time (MT) as the time at which a fixed fraction of the maximum classification accuracy is first reached in each simulation. From the MT to the end of the time interval, the standard deviation of the accuracy is computed to evaluate the stability of the network's response. Each dataset was evaluated for a range of precision settings, and the minimum firing rate needed to match the ANN performance is reported. All AdSNN simulations are run in MATLAB in a modified version of the MatConvNet framework (http://www.vlfeat.org/matconvnet/).
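The firing-rate, matching-time and stability statistics can be computed from the simulation traces as in the sketch below; the array layouts, the 1 ms step and the accuracy fraction used for the MT are illustrative assumptions.

```python
import numpy as np

def firing_rate(spike_counts):
    """Average number of spikes emitted per neuron, per image, over the window.
    spike_counts: (n_images, n_neurons) spike counts for the simulation window."""
    return spike_counts.mean()

def matching_time(accuracy_over_time, fraction=0.99, dt_ms=1.0):
    """First time (ms) at which a given fraction of the maximum accuracy is reached."""
    target = fraction * accuracy_over_time.max()
    return np.argmax(accuracy_over_time >= target) * dt_ms

def response_stability(accuracy_over_time, mt_ms, dt_ms=1.0):
    """Standard deviation of the accuracy from the matching time to the end."""
    return accuracy_over_time[int(mt_ms / dt_ms):].std()
```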

Arousal.

For Arousal, we highlight uncertain inputs by increasing the firing rate and the corresponding coding precision. The network is simulated with the standard low-precision parameter setting; if an input is selected by the arousal mechanism, this parameter is switched to a high-precision value (and the related neuron parameter is changed identically). Selection is determined by accumulating the winning and the second-highest outputs over a short interval, starting from a pre-defined time specific to each dataset. If the difference between these two outputs exceeds a threshold, the input is not highlighted; the threshold is estimated by observing, on the training set, the images that are no longer correctly classified when the precision is decreased. The Arousal method selects more images than needed: we define Selectivity as the proportion of highlighted images (Table SI1). In addition, the threshold increases linearly with the accumulation time interval, while Selectivity decreases exponentially. We report results for the minimum firing rate recorded for each dataset (Fig. 3c), which is obtained at a specific starting precision: starting from a very low precision leads to higher Selectivity, which in turn results in a higher average firing rate. The low-precision parameter is chosen as the lowest precision needed to match the ANN performance. Table SI1 reports the values of Selectivity for each dataset. Note that, since deeper networks need more time to settle to the high precision level, we extended the simulation time for these networks (see Table 1).
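The arousal selection rule can be sketched as follows; the accumulation window, the threshold value and the variable names are illustrative assumptions rather than the exact settings used per dataset.

```python
import numpy as np

def needs_arousal(outputs_over_time, t_start, t_accum, threshold):
    """Decide whether an input should be switched to high precision.
    outputs_over_time: (n_timesteps, n_classes) read-out values, one row per time step.
    The winning and second-highest outputs are accumulated from t_start for t_accum
    steps; if their accumulated difference does not exceed the threshold, the input
    is considered uncertain and is highlighted (aroused)."""
    window = np.sort(outputs_over_time[t_start:t_start + t_accum], axis=1)
    winner    = window[:, -1].sum()   # accumulated winning output
    runner_up = window[:, -2].sum()   # accumulated second-highest output
    return (winner - runner_up) <= threshold
```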

Supplementary Information

To convert a trained Artificial Neural Network (ANN) into an Adaptive Spiking Neural Network (AdSNN), the transfer function of the ANN units needs to match the behaviour of the Adaptive Spiking Neuron (ASN). The ASN transfer function is derived for the general case, using an approximation of the ASN's behaviour.

Derivation of the ASN activation function

We consider a spiking neuron with an activation that is constant over time, and whose refractory response approximates this activation using a variable threshold. Whenever the activation sufficiently exceeds the refractory response, the neuron emits a spike of fixed height to the synapses connecting to the target neurons, a threshold-sized, exponentially decaying contribution is added to the refractory response at the time of the spike, and at the same time the threshold itself is increased. The post-synaptic current (PSC) in the target neuron is then given by the emitted spike train convolved with the PSC kernel, which is in turn convolved with the membrane filter to obtain the contribution to the post-synaptic potential; a normalised exponential filter with a short time constant merely smooths the high-frequency components of the PSC. We derive the transfer function that maps the activation to the PSC of the target neuron. We recall the ASN model here, elaborating the SRM to include the current-to-potential filtering:

The model comprises the PSC (Equation 4), the activation (Equation 5), the adaptive threshold (Equation 6) and the refractory response (Equation 7), where the PSC is driven by the timing of the incoming spikes that the neuron receives and the threshold and refractory response by the timing of its outgoing spikes.
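For orientation, one generic SRM-style way to write these four quantities, consistent with the verbal description above, is sketched below; the symbol names (w_j, h, κ, φ, ϑ₀, m_f, τ_θ, τ_η) and the exact spike-triggering rule are illustrative assumptions rather than the paper's exact formulation.

```latex
% Generic SRM-style sketch of the four quantities in Eqs. (4)-(7);
% symbols and the spike rule are illustrative assumptions.
\begin{align*}
\text{PSC:}                 \quad & I(t) = \sum_j w_j \sum_{t_j^s} h\,\kappa(t - t_j^s) \tag{4}\\
\text{activation:}          \quad & S(t) = (\phi * I)(t) \tag{5}\\
\text{threshold:}           \quad & \vartheta(t) = \vartheta_0 + \sum_{t^s} m_f\,\vartheta(t^s)\,e^{-(t - t^s)/\tau_\theta} \tag{6}\\
\text{refractory response:} \quad & \hat{S}(t) = \sum_{t^s} \vartheta(t^s)\,e^{-(t - t^s)/\tau_\eta} \tag{7}
\end{align*}
```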

Since the variables of the ASN decay exponentially, they converge asymptotically. For a given fixed-size current injection, we consider a neuron that has stabilised around an equilibrium, that is, the threshold and the refractory response always reach the same values at the time of a spike; the PSC likewise always declines to the same value before a new spike arrives. Setting time zero to the last time there was a spike, we can rewrite the ASN equations, Equations (4), (5), (6) and (7), for the interval between two successive spikes.

The transfer function of the ASN is a function of the equilibrium PSC. The value of interest should be slightly larger than the PSC's minimum, since that is its lowest value between spikes; what we want is the average value of the PSC between two spikes.

Since we are in a stable situation, the time between spikes is fixed; we call this time the inter-spike interval. Thus, if the last spike occurred at time zero, the next spike happens one inter-spike interval later, at which point the threshold and the refractory response have reached their minimal values.
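To make the equilibrium argument concrete, consider a single exponentially decaying variable x(t) with time constant τ that receives a fixed increment Δ at every spike (illustrative notation, not the paper's symbols). At a stable inter-spike interval t_isi, the values just after and just before a spike satisfy:

```latex
% Steady state of an exponentially decaying variable with per-spike increment Delta.
x_{\min} = x_{\max}\,e^{-t_{\mathrm{isi}}/\tau}, \qquad
x_{\max} = x_{\min} + \Delta
\;\;\Longrightarrow\;\;
x_{\max} = \frac{\Delta}{1 - e^{-t_{\mathrm{isi}}/\tau}}, \qquad
x_{\min} = \frac{\Delta\,e^{-t_{\mathrm{isi}}/\tau}}{1 - e^{-t_{\mathrm{isi}}/\tau}} .
```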

To obtain the activation function, we solve the equilibrium equations above, noting also that the neuron only emits a spike once its spiking condition is reached.

We first obtain a relation between the equilibrium quantities, Equation (8). We then derive an expression for the inter-spike interval, which can be rewritten as Equation (9). Using the equilibrium equations and inserting Equation (9), the result can be rewritten as Equation (10).

Approximation of the AAN activation function

In the general case, a second-order Taylor series expansion can be used to approximate the exponential function for arguments close to zero. We can use this approximation in our previous equation.
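For reference, with x denoting the small argument (an illustrative name, e.g. the ratio of the inter-spike interval to the relevant time constant), the expansion is:

```latex
% Second-order Taylor expansion of the exponential around zero.
e^{-x} \approx 1 - x + \tfrac{x^{2}}{2}, \qquad |x| \ll 1 .
```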

A few algebraic steps then isolate the inter-spike interval term, leading to an explicit expression for it. We insert this expression into Equation 8 and obtain the activation function.

To make sure that the activation function is zero at and below the spiking onset, we define the activation function piecewise (Equation 11): it takes the derived form above the onset activation and is zero otherwise.

Parameters used in the Arousal attention method

Dataset        Selectivity (%)    Accumulation start (ms)
IRIS
SONAR
MNIST
CIFAR-10
CIFAR-100
ILSVRC-2012
Table SI4: Parameters used for the Arousal attention method.

Supplementary results

Figure SI4: Effects of the kernel time constant on MNIST. a. The classification error over time is shown for increasing values of the time constant (in ms). Note that changing this time constant changes the shape of the transfer function, and thus different networks were trained; the plotted results are obtained with a fixed setting of the remaining parameters. MT visibly increases for longer time constants. b. Networks' firing rates: longer time constants require fewer spikes to approximate a signal.
Figure SI5: Classification error over time. The effect of the Arousal method on the classification error is reported for MNIST, CIFAR-10, CIFAR-100 and the ImageNet ILSVRC-2012. The vertical line denotes the moment in time at which the outputs start being accumulated; selection for Arousal is then determined 50 ms later. The increase of the firing rate on selected images causes a brief loss of accuracy, after which a lower classification error is reached.

References

  • [1] Cao, Y., Chen, Y. & Khosla, D. Spiking deep convolutional neural networks for energy-efficient object recognition. International Journal of Computer Vision 113, 54–66 (2015).
  • [2] Diehl, P. et al. Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing. In IEEE International Joint Conference on Neural Networks (IJCNN), 1–8 (2015).
  • [3] Attwell, D. & Laughlin, S. An energy budget for signaling in the grey matter of the brain. J. Cerebral Blood Flow & Metabolism 21, 1133–1145 (2001).
  • [4] Fairhall, A., Lewen, G., Bialek, W. & de Ruyter van Steveninck, R. Efficiency and ambiguity in an adaptive neural code. Nature 412, 787–792 (2001).
  • [5] Gerstner, W. & Kistler, W. Spiking Neuron Models: Single Neurons, Populations, Plasticity (Cambridge University Press, 2002).
  • [6] Bohte, S. Efficient Spike-Coding with Multiplicative Adaptation in a Spike Response Model. In Advances in Neural Information Processing (NIPS), vol. 25, 1844–1852 (2012).
  • [7] Pozzorini, C., Naud, R., Mensi, S. & Gerstner, W. Temporal whitening by power-law adaptation in neocortical neurons. Nature Neuroscience 16, 942–948 (2013).
  • [8] Yoon, Y. LIF and Simplified SRM Neurons Encode Signals Into Spikes via a Form of Asynchronous Pulse Sigma-Delta Modulation. IEEE Transactions on Neural Networks and Learning Systems (TNNLS) 1–14 (2016).
  • [9] Hunsberger, E. & Eliasmith, C. Training spiking deep networks for neuromorphic hardware. preprint arXiv:1611.05141 (2016).
  • [10] Rueckauer, B., Lungu, I.-A., Hu, Y. & Pfeiffer, M. Theory and tools for the conversion of analog to spiking convolutional neural networks. preprint arXiv:1612.04052 (2016).
  • [11] Esser, S. K. et al. Convolutional networks for fast, energy-efficient neuromorphic computing. Proceedings of the National Academy of Sciences 201604850 (2016).
  • [12] Roelfsema, P., Lamme, V. & Spekreijse, H. Object-based attention in the primary visual cortex of the macaque monkey. Nature (1998).
  • [13] Saproo, S. & Serences, J. T. Spatial attention improves the quality of population codes in human visual cortex. Journal of neurophysiology 104, 885–895 (2010).
  • [14] Friston, K. The free-energy principle: a unified brain theory? Nature Reviews Neuroscience 11, 127–138 (2010).
  • [15] Boerlin, M. & Denève, S. Spike-based population coding and working memory. PLoS computational biology 7, e1001080 (2011).
  • [16] Denève, S. & Machens, C. K. Efficient codes and balanced networks. Nature Neuroscience 19, 375–382 (2016).
  • [17] Abbott, L. & Regehr, W. G. Synaptic computation. Nature 431, 796 (2004).
  • [18] Furber, S. B. et al. Overview of the SpiNNaker system architecture. IEEE Transactions on Computers 62, 2454–2467 (2013).
  • [19] Ioffe, S. & Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning (ICML), 448–456 (2015).
  • [20] Hochreiter, S. The vanishing gradient problem during learning recurrent neural nets and problem solutions. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 6, 107–116 (1998).
  • [21] Kingma, D. & Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
  • [22] Gorman, R. & Sejnowski, T. Analysis of hidden units in a layered network trained to classify sonar targets. Neural Networks 1, 75–89 (1988).
  • [23] Lecun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86, 2278–2324 (1998).
  • [24] Krizhevsky, A. Learning multiple layers of features from tiny images. Master’s thesis, University of Toronto, Canada (2009).
  • [25] Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
  • [26] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research 15, 1929–1958 (2014).
  • [27] Russakovsky, O. et al. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV) 115, 211–252 (2015).
  • [28] He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778 (2016).