1 Introduction
Despite the remarkable success of deep neural networks [14] in areas such as computer vision [13, 7] or speech recognition [8], biological neural systems clearly outshine their artificial counterparts in terms of compactness, speed, and energy consumption. One putative reason for this efficiency may lie in the way signals are represented and transmitted in animal brains: data is transmitted sparsely and asynchronously, in small packets, by means of spikes. This is in stark contrast with the frame-based approach of classical neural networks, which always compute the complete output of one layer synchronously before passing it on to the next layer. Indeed, spike-based processing allows for more efficient utilization of communication channels and computing resources, and can lead to speedups in processing [17]. These advantages have sparked interest in dedicated spiking neural network electronic devices based on such event-driven processing schemes [16, 11]. Achieving the same accuracy as state-of-the-art deep learning models with event-based updates has remained challenging, but recently a number of methods have been proposed which convert a previously trained conventional analog neural network (ANN) into a spiking one [1, 5, 9, 6]. The principle of such conversion techniques is to approximate the continuous-valued activations of ANN units by the spike rates of event-based neurons. Although successful on several classical benchmark tasks, all these methods suffer from approximation errors, and typically require a multitude of spikes to represent a single continuous value, thereby losing some of the advantages of event-based computation.

In this work, we propose a set of minimalistic event-based asynchronous neural network models, which process input data streams continuously as they arrive, and which are formally equivalent to conventional frame-based models. This class of models can be used to build highly efficient event-based processing systems, potentially in conjunction with event-based sensors [4, 18]. The resulting systems process the stream of incoming data in real time and typically yield first predictions of the output after only a few data packets have been received, well before the full input pattern has been presented to the network.
Here we demonstrate how the computation carried out by counter networks exactly reproduces the computation done in the conventional frame-based network. This allows maintaining the high accuracy of deep neural networks, but does so with a power- and resource-efficient representation and communication scheme. Specifically, we show how, as a consequence of the event-driven style of processing, the resulting networks do not require computationally expensive multiplication operations. We initially demonstrate the principle for networks with binary activations, and then extend the model to non-binary activations. The performance of our novel models is evaluated on the MNIST dataset. The numerical results indicate that counter networks require fewer operations than previous approaches to process a given input, while enabling state-of-the-art classification accuracy.
2 Counter neural networks
Multiplications are the most expensive operations when using conventional neural networks for inference on digital hardware. It is therefore desirable to reduce the number of required multiplications to a minimum. In this section, we introduce an event-based neuron model which only makes use of addition operations, counter variables, and simple comparisons, all of which can be implemented very efficiently in simple digital electronic circuits.
2.1 Multiplication-free networks
Previous work has shown how frame-based neural networks can be implemented using additions only, by either restricting all weights to binary (or ternary) values [10, 15, 3], or by using binary activations [2]. The binary variable (weight or activation) then represents an indicator function, and all neural activations can be computed by simply adding up the selected weights or activations. To introduce our event-based model in its most basic form, we first investigate the case where neurons have binary activations, such that the output of all neurons $i$ in layer $n$ is given by

$$y_i^{(n)} = f_i^{(n)}\!\left(\sum_j w_{ij}^{(n)}\, y_j^{(n-1)}\right), \qquad (1)$$

where $w_{ij}^{(n)}$ are the entries of the weight matrix $W^{(n)}$, $\theta_i^{(n)}$ are threshold values corresponding to bias terms, and

$$f_i^{(n)}(x) = \begin{cases} 1 & \text{if } x > \theta_i^{(n)}, \\ 0 & \text{otherwise}. \end{cases} \qquad (2)$$

Thus, the output of a neuron is 1 if its net input is greater than its threshold value $\theta_i^{(n)}$, and 0 otherwise. While this model does not pose any constraints on the weights and thresholds, we use low-precision integer values in all experiments to keep the computational cost low and allow for highly efficient digital implementations. We consider here multi-layer networks trained through stochastic gradient descent using the backpropagation algorithm
[19]. Since the error gradient is zero almost everywhere in the discrete network given by eqs. 1 and 2, we replace the binarization function $f_i^{(n)}$ by a logistic sigmoid function

$$\tilde f_i^{(n)}(x) = \frac{1}{1 + \exp\!\left(-\lambda\,\bigl(x - \theta_i^{(n)}\bigr)\right)} \qquad (3)$$

in the backward pass, where $\lambda$ is a scaling factor. Furthermore, during training, we keep copies of the high-resolution weights and activations, and use them to compute the gradient in the backward pass, as proposed by [3, 20]. In addition to the activations and network parameters, the inputs to the network are binarized by scaling them to lie in the range $[0, 1]$ and rounding them to the nearest integer.
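As an illustration of the forward and backward pass just described, the following minimal NumPy sketch computes one layer according to eqs. 1 and 2 and the gradient of the surrogate sigmoid of eq. 3. This is our own rendering, not the authors' implementation; the helper names and the value of the scaling factor used here are assumptions.

```python
import numpy as np

def binary_forward(y_prev, W, theta):
    """Frame-based forward pass of one layer (eqs. 1 and 2): the output is 1
    where the net input exceeds the threshold, and 0 otherwise."""
    net = W @ y_prev                       # additions only, since y_prev is binary
    return (net > theta).astype(np.int8)

def surrogate_sigmoid_grad(net, theta, lam=0.5):
    """Derivative of the logistic surrogate of eq. 3, used in the backward
    pass in place of the zero-almost-everywhere gradient of the step."""
    s = 1.0 / (1.0 + np.exp(-lam * (net - theta)))
    return lam * s * (1.0 - s)

# toy usage: 4 binary inputs, 3 neurons, low-precision integer parameters
rng = np.random.default_rng(0)
W = rng.integers(-4, 5, size=(3, 4))
theta = np.array([1, 0, 2])
y_prev = np.array([1, 0, 1, 1], dtype=np.int8)
print(binary_forward(y_prev, W, theta))
print(surrogate_sigmoid_grad(W @ y_prev, theta))
```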
2.2 Lossless event-based implementation through counter neurons
The multiplication-free network proposed above can directly be turned into an asynchronous event-based neural network by turning every unit of the ANN into a counter neuron, which we describe below. The weights and biases obtained through the training procedure above can be used in the event-based network without further conversion.
Each counter neuron is defined by an internal counter variable $c_i^{(n)}$, which is updated whenever the neuron receives positive or negative inputs in the form of binary events (or spikes). The neuron operation is illustrated in fig. 1 and described by algorithm 1. A counter neuron essentially counts, or adds up, all the inputs it receives. Whenever this sum crosses the threshold $\theta_i^{(n)}$, a binary event is emitted. The value of the event depends on the direction in which the threshold was crossed, i.e. a positive event is emitted when the threshold is crossed from below, and a negative event when $c_i^{(n)}$ falls below the threshold. Whenever neuron $i$ of layer $n$ emits an event, the quantity $w_{li}^{(n+1)}$ is provided as input to neuron $l$ of the next layer, with the sign determined by the output event of neuron $i$. Thus, the neurons themselves do not continuously provide output signals, which makes information transmission and computation in the network very sparse. The input to the network is also provided in event-based form as a stream of binary events (or spikes), i.e. at discrete points in time the network receives a list of indices of one or more pixels, indicating that these pixels are active. Specifically, a binary input image (or other data) is streamed to the network pixel by pixel in an asynchronous manner, whereby the order and exact timing of the pixels does not matter. In the following we will show analytically that an event-based network based on counter neurons produces exactly the same output as its frame-based counterpart.
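Algorithm 1 itself is not reproduced here, but the behaviour it describes can be sketched in a few lines of Python. This is our own illustrative rendering (class and method names are assumptions; the comparison convention follows eq. 2): the counter accumulates signed weight contributions, and a +1 or −1 event is emitted whenever the threshold is crossed.

```python
class CounterNeuron:
    """Event-based counter neuron: accumulates signed weight contributions
    and emits a +1 / -1 event on each upward / downward threshold crossing."""

    def __init__(self, theta):
        self.theta = theta   # threshold corresponding to the bias term
        self.c = 0           # internal counter variable
        self.out = 0         # current binary output state (0 or 1)

    def receive(self, delta):
        """Process one input event carrying the signed weight `delta`.
        Returns +1, -1, or None if no output event is triggered."""
        self.c += delta
        new_out = 1 if self.c > self.theta else 0
        if new_out != self.out:          # threshold crossed in either direction
            self.out = new_out
            return +1 if new_out else -1
        return None

# toy usage: three input events drive the counter across the threshold twice
n = CounterNeuron(theta=2)
print([n.receive(d) for d in (3, -2, 4)])   # [+1, -1, +1]
```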
Proof of the equivalence.
To prove the equivalence of the frame-based and the event-based model we have to show that the outputs of individual neurons are the same in both settings. In the following, we assume $\theta_i^{(n)} \geq 0$ without loss of generality. Let an event-based network be initialized at time $t = 0$, such that all $c_i^{(n)}(0) = 0$. Within the time interval $[0, T]$, neuron $i$ of layer $n$ in the event-based network receives a sequence of inputs $s_k\, w_{i j_k}^{(n)}$ from a set of source neurons $j_k$ at times $t_k$, where $s_k \in \{-1, +1\}$ is the sign of the $k$th event, and $0 \le t_k \le T$. It follows from algorithm 1 that the value of the counter variable at time $T$ is

$$c_i^{(n)}(T) = \sum_k s_k\, w_{i j_k}^{(n)}, \qquad (4)$$

as it simply sums up the inputs. The sign of $c_i^{(n)}(t) - \theta_i^{(n)}$ might change several times during the presentation of the input, and trigger the emission of a positive or negative event at every such crossing. Since $c_i^{(n)}$ is initialized at 0, there are $2 m_i + f_i^{(n)}\!\bigl(c_i^{(n)}(T)\bigr)$ sign changes in total, where $m_i$ is a non-negative integer, and thus the total input communicated to neuron $l$ of the next layer is

$$w_{l i}^{(n+1)} \sum_k s_k' \;=\; w_{l i}^{(n+1)}\, f_i^{(n)}\!\bigl(c_i^{(n)}(T)\bigr), \qquad (5)$$

as the sign changes cancel out, where $s_k'$ denotes the sign of the $k$th event emitted by neuron $i$. On the other hand, recursively applying eq. 5 leads to

$$\hat y_i^{(n)}(T) = f_i^{(n)}\!\left(\sum_j w_{ij}^{(n)}\, \hat y_j^{(n-1)}(T)\right), \qquad (6)$$

where $\hat y_i^{(n)}(T)$ denotes the accumulated output (the sum of emitted events) of neuron $i$ of layer $n$ in the event-based network. Since $\hat y_j^{(0)}(T) = y_j^{(0)}$ for the input layer, the equivalence must also hold for all higher layers, according to eq. 6. ∎
Given this equivalence, it is clear that the event-based network, if configured with the parameters obtained for the frame-based network, is able to exactly reproduce the output of the latter. Unlike in previous work [1, 5], the resulting event-based network is guaranteed to provide the same result as the 'ideal' frame-based implementation. The respective output value can be obtained simply by adding up the events emitted by the output layer. Technically, the equivalence holds only once the full stream of input events has been presented to the network and propagated through all layers. In practice, however, a few input events are often enough to activate the right neurons and produce the correct output long before the full set of input events has been presented. As a consequence, on average far fewer operations than in the frame-based model are required to compute the correct output (see fig. 2 for an example).
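Continuing the sketch above (and reusing the hypothetical CounterNeuron class), the following toy example streams the active pixels of a binary input in random order through a two-layer event-based network and checks the accumulated output-layer events against the frame-based result of eqs. 1 and 2. The network sizes and integer weights are arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
W1, th1 = rng.integers(-3, 4, size=(5, 8)), np.array([1, 0, 2, 1, 0])
W2, th2 = rng.integers(-3, 4, size=(3, 5)), np.array([0, 1, 1])

layer1 = [CounterNeuron(t) for t in th1]
layer2 = [CounterNeuron(t) for t in th2]
out_counts = np.zeros(3, dtype=int)              # accumulated output-layer events

x = rng.integers(0, 2, size=8)                   # binary input "image"
for j in rng.permutation(np.flatnonzero(x)):     # stream active pixels in random order
    for i, n1 in enumerate(layer1):
        s = n1.receive(int(W1[i, j]))            # pixel j delivers weight w_ij
        if s is None:
            continue
        for k, n2 in enumerate(layer2):          # forward the emitted event
            s2 = n2.receive(s * int(W2[k, i]))
            if s2 is not None:
                out_counts[k] += s2

# frame-based reference: same parameters, eqs. 1 and 2
y1 = (W1 @ x > th1).astype(int)
assert np.array_equal(out_counts, ((W2 @ y1) > th2).astype(int))
```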
2.3 Extension to non-binary inputs
The constraint that the input patterns are binary, and each input unit either produces an event or not, can be safely relaxed to integer-valued inputs without further modifications of the model. The framework thus supports finer-grained input scales, which is important e.g. for input images using multiple gray levels or RGB values to encode different colors. The simple modification is that each individual input unit produces, over time, a number of events that corresponds to the encoded integer value. While such integer-valued inputs would require multiplication in the frame-based network with non-binary (or non-ternary) weights, the event-based network remains free of multiplication operations, as the instantaneous output of any input unit is still binary.
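For instance, a pixel with gray level 3 simply contributes three unit events to the input stream. A small helper along these lines (our own, hypothetical; the paper does not prescribe a particular encoding routine) could look as follows.

```python
import numpy as np

def integer_input_to_events(image, rng=None):
    """Turn an integer-valued input into a stream of unit events:
    a pixel with value v contributes v events carrying its index.
    The final network output does not depend on the event order."""
    rng = rng or np.random.default_rng()
    values = np.asarray(image).ravel()
    events = np.repeat(np.arange(values.size), values)   # index repeated v times
    return rng.permutation(events)

# a pixel with gray level 3 appears three times in the stream
print(integer_input_to_events(np.array([[0, 3], [1, 2]])))
```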
2.4 Extended counter neuron network with non-binary activations
Using binary outputs for neurons can be a disadvantage: due to the limited output bandwidth of individual neurons, more neurons may be needed in the hidden layers to obtain the same accuracy as a non-binary network. The counter network model can easily be extended to non-binary activations, without requiring multiplication in the event-based implementation. In order to train a network based on neurons featuring multiple, discrete levels of activation, we use a discretized version of the ReLU activation function during training:
$$g(x) = \epsilon \left\lfloor \frac{\mathrm{relu}(\lambda x)}{\epsilon} \right\rfloor, \qquad (7)$$

where $\epsilon$ is a small, positive constant, $\lambda$ is a scaling factor, and $\mathrm{relu}$ is the typical ReLU half-wave rectification,

$$\mathrm{relu}(x) = \max(0, x). \qquad (8)$$

As in the binary case, the discrete activation $g$ is used during training in the forward pass, and a continuous approximation in the backward pass. Specifically, $g$ can be approximated by a shifted and scaled ReLU,

$$\tilde g(x) = \mathrm{relu}\!\left(\lambda x - \tfrac{\epsilon}{2}\right), \qquad (9)$$

in the backward pass. The learned parameters can then again be directly transferred to configure a network of event-based neurons without further conversion of weights or biases. The dynamics of this network are illustrated in fig. 3 and described in algorithm 2. The equivalence of the frame-based and the event-based implementation can be proven similarly to the binary case. A sketch of the proof is outlined below:
Proof of the equivalence.
From algorithm 2 it can be seen that, after a neuron has processed a number of input events, its internal variable has the value $c_i^{(n)} = x_i - \epsilon\, k_i$, where $x_i$ is the accumulated input provided over time. On the other hand, the value of $k_i$ changes only when an event is emitted, and its value corresponds to the number of positive events emitted, minus the number of negative events emitted. Thus, the accumulated output communicated by the neuron corresponds precisely to $\epsilon\, k_i$, and thereby to the output of the frame-based model given by eq. 7, since $x_i$ corresponds to the total input provided by neurons from the previous layer, $\sum_j w_{ij}^{(n)}\, y_j^{(n-1)}$. ∎
The discretized ReLU offers a wider range of values than the binary activation, and therefore allows for a more fine-grained response, thereby facilitating training. On the other hand, using non-binary outputs might lead to larger output delays compared to the binary case, as individual neurons might now have to emit a multitude of events before the network arrives at the correct output.
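Algorithm 2 is likewise not reproduced here, but dynamics consistent with the proof sketch above (and with our reconstruction of eq. 7, with the scaling $\lambda$ folded into the input for simplicity) can be rendered as follows. The class name, the emission loop, and the omission of the bias term are our own assumptions.

```python
class ExtendedCounterNeuron:
    """Event-based neuron with a discretized-ReLU-like output: the net
    number of events emitted tracks floor(relu(x) / eps), where x is the
    total input accumulated so far and eps is the quantization step."""

    def __init__(self, eps=1):
        self.eps = eps   # quantization step of the discretized ReLU
        self.c = 0       # residual: accumulated input minus eps * emitted
        self.k = 0       # net number of events emitted so far

    def receive(self, delta):
        """Process one input event of value `delta`; return the list of
        +1 / -1 output events it triggers (possibly several, possibly none)."""
        self.c += delta
        out = []
        while self.c >= self.eps:            # input grew: emit positive events
            self.c -= self.eps
            self.k += 1
            out.append(+1)
        while self.c < 0 and self.k > 0:     # input shrank: retract events
            self.c += self.eps
            self.k -= 1
            out.append(-1)
        return out

# total input 3 - 4 + 6 = 5 with eps = 2 yields floor(5 / 2) = 2 net events,
# independently of the order in which the inputs arrive
n = ExtendedCounterNeuron(eps=2)
print([e for d in (3, -4, 6) for e in n.receive(d)], n.k)   # [1, -1, 1, 1] 2
```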
3 Results
Various networks were trained on the MNIST dataset of handwritten digits to demonstrate competitive classification accuracy. In particular, we evaluated fully connected networks (FCNs) of three hidden layers (784-1000-1000-1000-10) and five hidden layers (784-750-750-750-750-750-10) to investigate how the depth of the network affects the processing speed and efficiency. In addition, we trained convolutional networks (CNNs) with two (784-12c5-12c7-10) and four (784-12c3-12c5-12c7-12c9-10) hidden layers. The network dimensions were chosen such that the number of parameters remains roughly the same in the shallower and deeper networks (2.8 million parameters for the FCNs, and 50k for the CNNs).
3.1 Training details
The networks were trained through stochastic gradient descent using the Adam method [12]. The gradients for the backward pass were computed using the continuous approximations described by eqs. 3 and 9. All parameters were restricted to 8-bit integer precision in the forward pass, and floating-point representations were used only in the backward pass, as suggested by [3, 20]. The biases were restricted to non-negative values through an additional constraint in the objective function; otherwise, categorical cross-entropy was used as the loss function. The learning rate was set to 0.01 for the CNNs and to 0.005 for the FCNs. The MNIST dataset was split into a training set of 50000 samples, a validation set of 10000 samples, and a test set of 10000 samples, and a batch size of 200 samples was used during training. The networks were trained until the target classification error on the validation set was obtained. The low-precision parameters were then directly used in an event-based network of equivalent architecture.

3.2 Fast and efficient classification
The main advantage of event-based deep networks is that outputs can be obtained fast and efficiently. We quantify this in fig. 4, where the processing speed is measured as the time taken to produce the same output as the frame-based model, and efficiency is quantified as the number of addition operations required to compute the output. For this analysis, individual pixels of the input image are provided to the network one by one in random order. In the systems based on the basic counter network model with binary neurons, the majority of events is emitted at the beginning of the input data presentation, with activity gradually declining in the course of the presentation. This reflects the fact that counter neurons cannot emit two events of the same sign in a row, leading to quick saturation of the output. The opposite is the case for the extended counter neuron based on the discretized ReLU, where activity gradually ramps up. Overall, this allows the extended model to produce the desired output with fewer operations than the basic model, as can be seen in fig. 5. The achieved efficiency is beyond what had been possible with previous conversion-based methods: our method achieves classification of MNIST at 500k events (CNN based on the extended neuron model), while the best reported result of a conversion-based network, to our knowledge, is 1 million events [17]. Despite the different network architectures, the behavior of FCNs and CNNs is qualitatively similar, with the main differences being due to the neuron model. In general, deeper networks seem to require a greater number of operations than shallower networks to achieve equivalent results.
4 Discussion
The two presented counter neuron models allow efficient event-based implementations of deep neural network architectures. While previous methods constructed deep spiking networks by converting parameters and approximating activations with firing rates, the output of our model is provably equivalent to its frame-based counterpart. Training is done in the frame-based domain, where state-of-the-art neural network optimization methods can be exploited. The discrete nature of counter networks allows hardware-friendly digital implementations, and makes them very suitable for processing data from event-based sensors [18]. The resulting systems differ from traditional neural networks in the sense that units are updated 'depth first' rather than 'breadth first', meaning that any neuron can fire as soon as its threshold is crossed, instead of waiting for all neurons in the previous layers to be updated, as in conventional neural networks. This allows processing of input data as they arrive, rather than waiting for a whole frame to be transferred to the input layer, which can significantly speed up processing in digital applications. Compared to other deep spiking neural networks based on parameter conversion [17], counter networks require fewer operations to process input images, even in our non-optimized setting. We expect that adding further constraints to enforce sparsity or reduce neuron activations can make counter networks even more efficient. Further research is required to investigate the applicability of the counter neuron model in recurrent networks. Finally, event-based systems are appealing because they allow for purely local, event-based weight update rules, such as spike-timing-dependent plasticity (STDP). Preliminary results indicate that STDP-based training of counter networks is possible, which would allow not only efficient inference but also training of deep event-based networks.
Acknowledgements
The research was supported by the Swiss National Science Foundation Grant 200021146608 and the European Union ERC Grant 257219.
References

[1] Yongqiang Cao, Yang Chen, and Deepak Khosla. Spiking deep convolutional neural networks for energy-efficient object recognition. International Journal of Computer Vision, 113(1):54–66, 2015.
[2] Matthieu Courbariaux and Yoshua Bengio. BinaryNet: Training deep neural networks with weights and activations constrained to +1 or −1. arXiv preprint arXiv:1602.02830, 2016.
[3] Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. BinaryConnect: Training deep neural networks with binary weights during propagations. In Advances in Neural Information Processing Systems, pages 3123–3131, 2015.
[4] Tobi Delbrück, Bernabe Linares-Barranco, Eugenio Culurciello, and Christoph Posch. Activity-driven, event-based vision sensors. In Proceedings of 2010 IEEE International Symposium on Circuits and Systems, pages 2426–2429. IEEE, 2010.
[5] Peter U Diehl, Daniel Neil, Jonathan Binas, Matthew Cook, Shih-Chii Liu, and Michael Pfeiffer. Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing. In 2015 International Joint Conference on Neural Networks (IJCNN). IEEE, 2015.
[6] Steven K. Esser, Paul A. Merolla, John V. Arthur, Andrew S. Cassidy, Rathinakumar Appuswamy, Alexander Andreopoulos, David J. Berg, Jeffrey L. McKinstry, Timothy Melano, Davis R. Barch, Carmelo di Nolfo, Pallab Datta, Arnon Amir, Brian Taba, Myron D. Flickner, and Dharmendra S. Modha. Convolutional networks for fast, energy-efficient neuromorphic computing. Proceedings of the National Academy of Sciences, 2016.
[7] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE International Conference on Computer Vision, pages 1026–1034, 2015.
[8] Geoffrey E Hinton, Li Deng, Dong Yu, George E Dahl, Abdel-Rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N Sainath, et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6):82–97, 2012.
[9] Eric Hunsberger and Chris Eliasmith. Spiking deep networks with LIF neurons. arXiv preprint arXiv:1510.08829, 2015.
[10] Kyuyeon Hwang and Wonyong Sung. Fixed-point feedforward deep neural network design using weights +1, 0, and −1. In 2014 IEEE Workshop on Signal Processing Systems (SiPS), pages 1–6. IEEE, 2014.
[11] Giacomo Indiveri, Federico Corradi, and Ning Qiao. Neuromorphic architectures for spiking deep neural networks. In 2015 IEEE International Electron Devices Meeting (IEDM). IEEE, 2015.
[12] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[13] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.
[14] Yann LeCun, Yoshua Bengio, and Geoffrey E Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
[15] Zhouhan Lin, Matthieu Courbariaux, Roland Memisevic, and Yoshua Bengio. Neural networks with few multiplications. arXiv preprint arXiv:1510.03009, 2015.
[16] Paul A Merolla, John V Arthur, Rodrigo Alvarez-Icaza, Andrew S Cassidy, Jun Sawada, Filipp Akopyan, Bryan L Jackson, Nabil Imam, Chen Guo, Yutaka Nakamura, et al. A million spiking-neuron integrated circuit with a scalable communication network and interface. Science, 345(6197):668–673, 2014.
[17] Daniel Neil, Michael Pfeiffer, and Shih-Chii Liu. Learning to be efficient: Algorithms for training low-latency, low-compute deep spiking neural networks. In Proceedings of the 31st Annual ACM Symposium on Applied Computing, pages 293–298. ACM, 2016.
[18] Christoph Posch, Teresa Serrano-Gotarredona, Bernabe Linares-Barranco, and Tobi Delbruck. Retinomorphic event-based vision sensors: Bioinspired cameras with spiking output. Proceedings of the IEEE, 102(10):1470–1484, 2014.
[19] D E Rumelhart, G E Hinton, and R J Williams. Learning representations by back-propagating errors. Nature, 323:533–536, 1986.
[20] Evangelos Stromatias, Daniel Neil, Michael Pfeiffer, Francesco Galluppi, Steve B Furber, and Shih-Chii Liu. Robustness of spiking deep belief networks to noise and reduced bit precision of neuro-inspired hardware platforms. Frontiers in Neuroscience, 9, 2015.