Methods
Adaptive Spiking Neurons.
In the ASN, the kernel is computed as the convolution of a spike-triggered postsynaptic current (PSC) with a filter , decaying exponentially with time constants and respectively; the input signal is similarly computed from a current injection . The adaptation kernel decays with time constant .
The AdSNNs are created by converting standard Deep Neural Networks [2] trained with a mathematically derived transfer function of the ASN (full derivation in SI), defined as the function that maps the activation to the average postsynaptic contribution. This has the form:
where,
are constants computed from the neuron parameter settings, and defines the spike size. Here, by normalising to when , becomes a scaling factor for the network’s trained weights, allowing communication with binary spikes.
Adaptive Spiking Neural Networks (AdSNNs).
Analog units using as their transfer function, AANs, in trained ANNs can be replaced directly and without modification with ASNs. In the presented results, the adaptation kernel decays with , the membrane filter with , the refractory response with and the PSP with , all chosen to roughly match biological neurons, and . Batch Normalization (BN) [19] is used to avoid the vanishing gradient problem [20] for saturating transfer functions like half-sigmoids and to improve network training and regularisation. After training, the BN layers are removed and integrated into the weights’ computation [10]. A BN-AAN layer is also used as the first layer in all the networks to convert the inputs into spikes. When converting, biases are added to the postsynaptic activation. Max and Average Pooling layers are converted by merging them into the next ASN layer: the layer activation is computed from incoming spikes, then the pooling operator is applied and the ASN layer computes spikes as output. The last ASN layer acts as a smoothed read-out layer with , where spikes are converted into analog values for classification. The classification is performed as in the ANN, usually using SoftMax: at every timestep the output with the highest value is considered the result of the classification.
ANN training.
We trained ANNs with AANs on widely used datasets: for feedforward ANNs, IRIS and SONAR; and for deep convolutional ANNs, MNIST, CIFAR10, CIFAR100 and ILSVRC2012. All the ANNs are trained using Keras (https://keras.io/) with TensorFlow (https://www.tensorflow.org/) as its backend. We used categorical cross-entropy as the loss function with Adam [21] as the optimiser, except for ILSVRC2012, where we used Stochastic Gradient Descent with Nesterov momentum (learning rate , decay and momentum ). Consistent with the aim of converting high-performance ANNs into AdSNNs, for each dataset we selected the model at the training epoch where it performed best on the test set.
We trained a feedforward ANN on the IRIS dataset: IRIS is a classical non-linearly separable toy dataset containing classes (types of plants) with instances each, to be classified from input attributes. Similarly, for the SONAR dataset [22] we used a ANN to classify entries of sonar signals, each divided into energy measurements in a particular frequency band, into two classes: metal cylinder or simple rocks. We trained both ANNs for epochs and obtained competitive performance.
The deep convolutional ANNs are trained on standard image classification problems of increasing difficulty. The simplest is the MNIST dataset [23], where images of handwritten digits have to be classified. We used a convolutional ANN composed of , where is a convolutional layer with feature maps and a kernel size of , is a max pooling layer with kernel size , and is a dense layer with neurons. Images are pre-normalised between and , and the convolutional ANN was trained for epochs.
The CIFAR10 and CIFAR100 datasets [24] are harder benchmarks, where colour images have to be classified into 10 or 100 categories respectively. We use a VGG-like architecture [25] with layers: for CIFAR10 and for CIFAR100. Dropout [26] was used in the non-pooling layers ( in the top fully-connected layers, and for the first epochs and for the last in the others). Images were converted from RGB to YCbCr and then normalised between and .
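The colour-space conversion mentioned for the CIFAR images can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: it assumes the full-range ITU-R BT.601 (JPEG/JFIF) coefficients, since the text does not say which YCbCr variant was used, and the target range `lo`, `hi` is left as a hypothetical parameter because the normalisation bounds are not given here.

```python
import numpy as np

# Assumption: full-range BT.601 (JPEG/JFIF) RGB -> YCbCr coefficients.
_RGB_TO_YCBCR = np.array([
    [0.299, 0.587, 0.114],
    [-0.168736, -0.331264, 0.5],
    [0.5, -0.418688, -0.081312],
])
_OFFSET = np.array([0.0, 128.0, 128.0])


def rgb_to_ycbcr(img):
    """Convert an (..., 3) RGB array with values in [0, 255] to YCbCr in [0, 255]."""
    img = np.asarray(img, dtype=np.float64)
    return img @ _RGB_TO_YCBCR.T + _OFFSET


def rescale(x, lo, hi):
    """Linearly map values from [0, 255] to [lo, hi] (hypothetical bounds)."""
    return lo + (hi - lo) * np.asarray(x, dtype=np.float64) / 255.0
```

A typical call would chain the two steps, for example `rescale(rgb_to_ycbcr(image), lo, hi)` with whatever bounds the trained network expects.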
The ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) [27] is a large-scale image classification task with over 15 million labeled high-resolution images belonging to roughly categories. The 2012 task-1 challenge was used, a subset of ImageNet with about images in each of categories. We trained a ResNet-18 architecture in the identity-mapping variant [28] for 100 epochs and the top-1 error rate is reported. As in [25], we rescaled the images to a resolution of pixels and then performed random cropping during training and centre cropping for testing.
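The difference between the two cropping modes can be illustrated with a short NumPy sketch. The function names and the `size` argument are hypothetical; the actual crop and rescale resolutions are not recoverable from the text above, so they are left as parameters.

```python
import numpy as np


def random_crop(img, size, rng):
    """Return a size x size window at a uniformly random position (training)."""
    h, w = img.shape[:2]
    top = int(rng.integers(0, h - size + 1))   # upper bound is exclusive
    left = int(rng.integers(0, w - size + 1))
    return img[top:top + size, left:left + size]


def centre_crop(img, size):
    """Return the size x size window centred in the image (testing)."""
    h, w = img.shape[:2]
    top = (h - size) // 2
    left = (w - size) // 2
    return img[top:top + size, left:left + size]
```

Random cropping acts as data augmentation during training, while the deterministic centre crop keeps test-time evaluation repeatable.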
AdSNN evaluation.
The AdSNNs are evaluated in simulations with 1 timesteps, where inputs are persistently presented for (identical to the method used in [2]). The Firing Rate (FR) in Table 1 is computed as the average number of spikes emitted by a neuron, for each image, in this time window. The time window is chosen such that all output neurons reach a stable value; we defined the Matching Time (MT) as the time at which of the maximum classification accuracy is reached for each simulation. From the MT to the end of the time interval, the standard deviation of the accuracy is computed to evaluate the stability of the network’s response. Each dataset was evaluated for a range of values of and the minimum firing rate needed to match the ANN performance is reported. All the AdSNN simulations are run in MATLAB in a modified version of the MatConvNet framework (http://www.vlfeat.org/matconvnet/).
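The evaluation quantities described above can be sketched in a few lines of NumPy. This is an illustrative reading of the text, not the MATLAB code used for the reported results: the `fraction` of the maximum accuracy that defines the Matching Time is not recoverable here and is therefore a hypothetical parameter, and the array layouts are assumptions.

```python
import numpy as np


def matching_time(accuracy, fraction):
    """Index of the first timestep where accuracy >= fraction * max(accuracy)."""
    accuracy = np.asarray(accuracy, dtype=float)
    target = fraction * accuracy.max()
    return int(np.argmax(accuracy >= target))  # argmax returns the first True


def stability(accuracy, mt):
    """Standard deviation of the accuracy from the Matching Time to the end."""
    return float(np.std(np.asarray(accuracy, dtype=float)[mt:]))


def mean_spike_count(spikes):
    """Average spikes per neuron per image over the simulated window.

    Assumed layout: spikes[image, neuron, timestep], with 1 marking a spike.
    """
    spikes = np.asarray(spikes)
    return float(spikes.sum(axis=-1).mean())
```

Here `accuracy` is assumed to be the classification accuracy over the evaluation set, recorded at every simulation timestep.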
Arousal.
For Arousal, we highlight uncertain inputs by increasing the firing rate and the corresponding precision. The network is simulated with set to , the standard low-precision parameter; if the input is selected by the arousal mechanism, this parameter is set to the high-precision value: (and is changed identically). Selection is determined by accumulating the winning and the 2nd-highest outputs for starting from a predefined specific for each dataset. If the difference between these two outputs exceeds a threshold , the input is not highlighted; is estimated by observing those images that are not correctly classified when the precision is decreased on the training set. The Arousal method selects more images than needed: we defined Selectivity as the proportion of highlighted images (Table SI1). In addition, increases linearly with the accumulation time interval as , while Selectivity decreases exponentially. We report results for the minimum firing rate recorded for each dataset (Fig. 3c), which is obtained at a specific : in fact, starting from very low precision leads to higher Selectivity, which in turn results in a higher average firing rate. The parameter is chosen as the lowest precision needed to match the ANN performance. Table SI1 reports the values of Selectivity, for each dataset. Note that, since deeper networks need more time to settle to the high-precision level, we extended the simulation time for these networks (see Table 1).
Supplementary Information
To convert a trained Artificial Neural Network (ANN) into an Adaptive Spiking Neural Network (AdSNN), the transfer function of the ANN units needs to match the behaviour of the Adaptive Spiking Neuron (ASN). The ASN transfer function is derived for the general case of using an approximation of the ASN behaviour.
Derivation of the ASN activation function
We consider a spiking neuron with activation that is constant over time, and the refractory response approximates using a variable threshold . Whenever , the neuron emits a spike of fixed height to the synapses connecting to the target neurons, and a value of is added to , with the time of the spike. At the same time, the threshold is increased by . The postsynaptic current (PSC) in the target neuron is then given by , which is convolved with the membrane filter to obtain the contribution to the postsynaptic potential; a normalized exponential filter with short time constants will just smooth the high-frequency components of . We derive the transfer function that maps the activation to the PSC of the target neuron. We recall the ASN model here, elaborating the SRM to include the current-to-potential filtering:
PSC: (4)
activation: (5)
threshold: (6)
refractory response: (7)
where denotes the timing of incoming spikes that the neuron receives and the timing of outgoing spikes.
Since the variables of the ASN decay exponentially, they converge asymptotically. For a given fixed size current injection, we consider a neuron that has stabilised around an equilibrium, that is and at the time of a spike always reach the same values. Let these values be denoted as and respectively. Then, and for all . The PSC also always declines to the same value, , before it receives a new spike. Setting for the last time that there was a spike, we can rewrite our ASN equations, Equations (4), (5), (6) and (7), for and to:
The transfer function of the ASN is a function of the value of ; should be a bit larger than since that is the lowest value of , and we are interested in the average value of between two spikes: .
Since we are in a stable situation, the time between each spike is fixed; we define this time as . Thus, if the last spike occurred at , the next spike should happen at . This implies that and at must have reached their minimal values and respectively.
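The steady-state argument can be checked numerically for a single exponentially decaying variable that receives a fixed increment at every spike. The sketch below is a generic illustration under stated assumptions, not the ASN equations themselves: `increment`, `period` and `tau` are hypothetical names for the per-spike jump, the fixed inter-spike interval and the decay time constant. With regular spiking, the value just after a spike obeys x ← x·exp(−period/tau) + increment, whose fixed point is increment / (1 − exp(−period/tau)).

```python
import math


def post_spike_fixed_point(increment, period, tau):
    """Closed-form steady-state value just after a spike."""
    return increment / (1.0 - math.exp(-period / tau))


def iterate_post_spike(increment, period, tau, n_spikes):
    """Apply decay-then-jump n_spikes times, starting from zero."""
    decay = math.exp(-period / tau)
    x = 0.0
    for _ in range(n_spikes):
        x = x * decay + increment
    return x
```

At the fixed point, the value just before the next spike equals the post-spike value minus the increment, which is the sense in which the variables "always reach the same values" at spike times.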
To obtain the activation function , we solve the following set of equations:
and by noting that the neuron only emits a spike when , we also have:
We first notice:
(8) 
We now want an expression for :
We can rewrite this to:
(9) 
Using equations and , we get:
Inserting Equation 9 gives:
This can be rewritten to:
(10) 
Approximation of the AAN activation function
In the general case of , a (second order) Taylor series expansion can be used to approximate the exponential function:
for close to . We can use this in our previous equation:
We need a few steps to isolate :
This leads to our expression for
We now insert this expression in Equation 8 and get:
To make sure that our activation function is at we choose our activation function to be:
(11) 
for and for with .
Parameters used in the Arousal attention method
DataSet  Selectivity(%)  (ms)  

IRIS  
SONAR  
MNIST  
CIFAR10  
CIFAR100  
ILSVRC2012
Supplementary results
References

[1] Cao, Y., Chen, Y. & Khosla, D. Spiking deep convolutional neural networks for energy-efficient object recognition. International Journal of Computer Vision 113, 54–66 (2015).
[2] Diehl, P. et al. Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing. In IEEE International Joint Conference on Neural Networks (IJCNN), 1–8 (2015).
[3] Attwell, D. & Laughlin, S. An energy budget for signaling in the grey matter of the brain. J. Cerebral Blood Flow & Metabolism 21, 1133–1145 (2001).
[4] Fairhall, A., Lewen, G., Bialek, W. & de Ruyter van Steveninck, R. Efficiency and ambiguity in an adaptive neural code. Nature 412, 787–792 (2001).
[5] Gerstner, W. & Kistler, W. Spiking Neuron Models: Single Neurons, Populations, Plasticity (Cambridge University Press, 2002).
[6] Bohte, S. Efficient Spike-Coding with Multiplicative Adaptation in a Spike Response Model. In Advances in Neural Information Processing (NIPS), vol. 25, 1844–1852 (2012).
[7] Pozzorini, C., Naud, R., Mensi, S. & Gerstner, W. Temporal whitening by power-law adaptation in neocortical neurons. Nature Neuroscience 16, 942–948 (2013).
[8] Yoon, Y. LIF and Simplified SRM Neurons Encode Signals Into Spikes via a Form of Asynchronous Pulse Sigma-Delta Modulation. IEEE Transactions on Neural Networks and Learning Systems (TNNLS) 1–14 (2016).
[9] Hunsberger, E. & Eliasmith, C. Training spiking deep networks for neuromorphic hardware. preprint arXiv:1611.05141 (2016).
[10] Rueckauer, B., Lungu, I.-A., Hu, Y. & Pfeiffer, M. Theory and tools for the conversion of analog to spiking convolutional neural networks. preprint arXiv:1612.04052 (2016).
[11] Esser, S. K. et al. Convolutional networks for fast, energy-efficient neuromorphic computing. Proceedings of the National Academy of Sciences 201604850 (2016).
[12] Roelfsema, P., Lamme, V. & Spekreijse, H. Object-based attention in the primary visual cortex of the macaque monkey. Nature (1998).
[13] Saproo, S. & Serences, J. T. Spatial attention improves the quality of population codes in human visual cortex. Journal of Neurophysiology 104, 885–895 (2010).
[14] Friston, K. The free-energy principle: a unified brain theory? Nature Reviews Neuroscience 11, 127–138 (2010).
[15] Boerlin, M. & Denève, S. Spike-based population coding and working memory. PLoS Computational Biology 7, e1001080 (2011).
[16] Denève, S. & Machens, C. K. Efficient codes and balanced networks. Nature Neuroscience 19, 375–382 (2016).
[17] Abbott, L. & Regehr, W. G. Synaptic computation. Nature 431, 796 (2004).
[18] Furber, S. B. et al. Overview of the SpiNNaker system architecture. IEEE Transactions on Computers 62, 2454–2467 (2013).
[19] Ioffe, S. & Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning (ICML), 448–456 (2015).
[20] Hochreiter, S. The vanishing gradient problem during learning recurrent neural nets and problem solutions. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 6, 107–116 (1998).
[21] Kingma, D. & Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
[22] Gorman, R. & Sejnowski, T. Analysis of hidden units in a layered network trained to classify sonar targets. Neural Networks 1, 75–89 (1988).
[23] LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86, 2278–2324 (1998).
[24] Krizhevsky, A. Learning multiple layers of features from tiny images. Master’s thesis, University of Toronto, Canada (2009).
[25] Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
[26] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research 15, 1929–1958 (2014).
[27] Russakovsky, O. et al. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV) 115, 211–252 (2015).
[28] He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778 (2016).