Minibatch Processing in Spiking Neural Networks

Spiking neural networks (SNNs) are a promising candidate for biologically inspired and energy-efficient computation. However, their simulation is notoriously time-consuming and may be seen as a bottleneck in developing competitive training methods with potential deployment on neuromorphic hardware platforms. To address this issue, we provide an implementation of mini-batch processing applied to clock-based SNN simulation, leading to drastically increased data throughput. To our knowledge, this is the first general-purpose implementation of mini-batch processing in a spiking neural network simulator, and it works with arbitrary neuron and synapse models. We demonstrate nearly constant-time scaling with batch size on a simulation setup (up to GPU memory limits), and showcase the effectiveness of large batch sizes in two SNN application domains, resulting in ≈880X and ≈24X reductions in wall-clock time, respectively. Different parameter reduction techniques are shown to produce different learning outcomes in a simulation of networks trained with spike-timing-dependent plasticity. Machine learning practitioners and biological modelers alike may benefit from the drastically reduced simulation time and increased iteration speed this method enables. Code to reproduce the benchmarks and experimental findings in this paper can be found at https://github.com/djsaunde/snn-minibatch.


1 Introduction

Research into training SNNs for machine learning (ML) tasks has rapidly accelerated in recent years [30, 39]. This is due in part to their impressive computational power [21], their natural applicability to computation over spatio-temporal signals [40], their biological plausibility – and, therefore, possibilities for synergy with neuroscience [23] – along with the promise of low energy consumption and rapid processing time once implemented in neuromorphic hardware [36]. Software for the efficient training of these networks is, however, largely undeveloped relative to libraries for the training and deployment of artificial neural networks (ANNs). In particular, existing solutions do not support the independent, parallel processing of data through a single network structure.

Due in part to a general lack of mature software infrastructure, researchers have been hesitant to adopt SNNs for cutting-edge ML experimentation. As a result, the development of training algorithms for SNNs has been slow relative to the proliferation of research on ANNs. With large datasets and the complex neural network models needed to process them, advances in software and hardware technology for ANNs have been critical to enabling their practical training and application. In order to bring SNNs to the technological forefront, similar advances are needed. In this paper, we take another step towards the practical use of SNNs with a general-purpose implementation of GPU-enabled minibatch processing. We argue that SNNs may enjoy similarly widespread applicability once they are made simpler to build, simulate, and train [41].

Neurons in spiking neural networks [22] are set apart from those of ANNs in part by their maintenance of simulation state variables over time, e.g., voltages or refractoriness. That is, neurons in SNNs are stateful, whereas those typically used in ANNs are stateless, with the notable exception of recurrent neural networks (RNNs), which maintain hidden state over time. Indeed, SNNs can be seen as a special case of RNNs, wherein recurrent processing is carried out by dynamic state variables rather than explicit recurrent connections (although such recurrent connections may also be used in SNNs) [26]. Statelessness implies that multiple inputs (i.e., a minibatch [12]) can be input in parallel to an ANN and processed independently without any additional memory overhead. However, in SNNs and RNNs, where inputs are processed for a length of time and neurons' state variables often depend on their values in the previous time step, there is no choice but to maintain these variables in memory.
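To make this statefulness concrete, the following minimal PyTorch sketch carries a batched membrane-voltage tensor across time steps for a leaky integrate-and-fire layer. It is not BindsNET code; the decay factor, threshold, input rates, and tensor names are illustrative placeholders.

```python
import torch

def lif_step(v, spikes_in, w, decay=0.95, v_thresh=1.0, v_reset=0.0):
    """One Euler step of a batched leaky integrate-and-fire layer.

    v:         [batch, n_post] membrane voltages carried across time steps
    spikes_in: [batch, n_pre]  binary pre-synaptic spikes at this step
    w:         [n_pre, n_post] shared synapse weights (one copy per network)
    """
    v = decay * v + spikes_in.float() @ w          # leak, then integrate input current
    spikes_out = v >= v_thresh                     # all-or-nothing threshold crossing
    v = torch.where(spikes_out, torch.full_like(v, v_reset), v)  # reset spiked neurons
    return v, spikes_out

batch, n_pre, n_post, T = 32, 100, 10, 250
w = 0.1 * torch.rand(n_pre, n_post)
v = torch.zeros(batch, n_post)                     # state duplicated across the batch dimension
for t in range(T):
    x = torch.rand(batch, n_pre) < 0.05            # stand-in for encoded input spikes
    v, s = lif_step(v, x, w)
```

The key point is that `v` must persist, with its leading batch dimension, from one time step to the next; a stateless ANN layer has no such tensor to carry.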

Minibatch processing, both at training and inference time, has a number of useful properties:

  • Reduced simulation wall-clock time: Running multiple simulations in parallel enables processing more data per unit wall-clock time than running one at a time. Using a GPU with enough memory, the amount of data processed per unit wall-clock time can be expected to increase approximately linearly with minibatch size.

  • Reduced variance in parameter updates: Computing parameter updates over minibatches results in less noisy updates than those computed from single examples. This reduces the effect that outliers have on parameter updates. On the other hand, for optimization purposes, the noise resulting from small minibatch sizes may help move a model out of local minima [3].

  • Improved generalization: There is good reason to believe that, in the small-minibatch regime, stochastic gradient descent (SGD) improves the generalization performance of ANNs [31]. It may be described intuitively as a "bagging" procedure [4], where, by computing parameter updates based on a minibatch of examples, we enforce changes that generalize across minibatches.

While mainly useful for machine learning experimentation, researchers working in biological modeling may also benefit from batched simulation. There is no clear analogue of minibatch processing in neural circuits: although the brain is a highly parallel computing device, neural circuits must process their inputs one at a time. However, the technique is not meant to mimic biological phenomena, but rather to increase computational efficiency. To that end, experimenters can use minibatch processing to simulate multiple, independent trials in order to take trial averages, gather data, or calibrate parameter settings more quickly.

In this paper, we describe a general-purpose implementation of minibatch processing in SNNs. To our knowledge, it is the first of its kind, although minibatching in restricted situations has been discussed in prior work. We supplement this description with a concrete implementation in BindsNET, an open source SNN simulation library [13]. Written in the Python programming language on top of the PyTorch deep learning framework [28], BindsNET was built with ease of prototyping and machine learning applications in mind. Both CPU and GPU processing are supported; on the latter, users may see significant speed improvements due to the use of minibatch processing. We also provide experiments that showcase the multifaceted benefits of using minibatch processing in SNNs, and discuss the use of different batch-wise parameter reduction techniques in the online learning setting.

GPUs are well-suited to parallelizing many of the mathematical operations needed to simulate spiking networks in BindsNET, e.g., the matrix multiplication used to compute the current incident to post-synaptic neurons based on synapse weights and pre-synaptic spiking activity. In general, operations where a single instruction can be applied to many data can easily be mapped to GPUs and parallelized to a large degree.
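The following sketch illustrates this point: the incident currents for an entire minibatch can be computed with a single matrix-matrix product rather than one matrix-vector product per example. This is a plain PyTorch illustration under assumed sizes, not a BindsNET internal; pass device="cpu" if no GPU is available.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
batch, n_pre, n_post = 256, 1000, 100
s = (torch.rand(batch, n_pre, device=device) < 0.05).float()   # pre-synaptic spikes
w = torch.rand(n_pre, n_post, device=device)                   # synapse weights

# Serial: one matrix-vector product per example in the minibatch.
serial = torch.stack([s[i] @ w for i in range(batch)])

# Batched: a single matrix-matrix product computes the incident currents for the
# whole minibatch at once; on a GPU this maps onto many cores in parallel.
batched = s @ w

assert torch.allclose(serial, batched, atol=1e-5)
```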

2 Related Work

2.1 Minibatch processing

To our knowledge, ours is the first general-purpose implementation of minibatch processing in SNN simulation. Moreover, it is the first to be implemented in an SNN simulation library, and, importantly, works with all available neuron and synapse types and training methodologies. Perhaps the closest to our work is the implementation in NengoDL [32], which does not support minibatch processing with online learning rules, i.e., those which compute updates to parameters concurrently with data processing, a key feature of spiking neural networks.

The idea of processing with batches of data for the purpose of training statistical or machine learning models is not a new one [1]. Indeed, in the original formulation of gradient descent, updates to fitted parameters are computed over the entire training dataset. With increasingly large datasets and limitations on memory, this approach is not always feasible, and so computing stochastic updates over randomly sampled batches of data has become standard practice. In fact, it has even been argued that small batch sizes are desirable in some cases, as they can improve the stability of training and decrease generalization error on the test data [25].

Several prior works have incorporated bespoke implementations of minibatch processing for restricted types of SNNs. [9] describe a binary STDP rule that allows for processing minibatches of data, although it only considers the precise timing of neurons' first spikes, and it involves approximating spiking neurons as rectified linear units (ReLUs). [27] describe an unusual spiking neural network model that allows both positively and negatively signed "spikes" and derive approximations to the back-propagation algorithm, claiming "…in principle it is possible to do minibatch training", although their experiments involve one-by-one processing of data points. [19] pre-train a convolutional SNN layer-wise with STDP, and fine-tune the network's weights for a downstream classification task with back-propagation on low-pass filtered spike trains. The networks are trained with minibatch updates, but it is unclear whether they are computed in parallel, or are instead computed serially and later averaged to produce a minibatch update. [43] implement a three-factor learning rule for learning precise spatiotemporal spike patterns which is computed over minibatches of data. [5] implement minibatched exact back-propagation for training spike times and synapse weights in a network of spiking neurons that emit single spikes.

Other authors have approximated spiking neurons by smoothing their activation function, so as to incorporate them into ANNs to be trained with the back-propagation algorithm [16, 15]. Here, minibatch processing is obtained for free as a result of the smooth approximations used. However, it is difficult to describe the neurons in these networks as "spiking", in the sense that they do not fire all-or-nothing pulses in the event of a voltage threshold crossing.

2.2 SNN training methodologies

Owing to their power efficiency and event-based operation, SNNs have attracted accelerating research into methods for training them on machine learning tasks. Their non-differentiability, due to the all-or-nothing, discontinuous nature of spiking neurons, has made it impossible to train them directly with the popular back-propagation algorithm. To deal with this, several general training approaches have been developed for SNNs, to all of which minibatch processing is applicable. We review a number of the most well-known approaches:

  • Local learning rules: Local learning rules [20, 42], such as Hebbian learning [14] and spike-timing-dependent plasticity (STDP) [24, 2], operate by updating synaptic strengths as a function of pre- and post-synaptic neural activity and possibly a third, global factor such as dopamine or other neuromodulators [10]. In the context of minibatch processing, updates to synapses can be reduced across the minibatch dimension, effectively increasing the speed of learning and possibly decreasing the step-by-step variability of weight changes.

  • Rate-based gradient methods: In this setting, the temporal aspects of spikes are ignored, and firing rates are considered in lieu of precise spike timing or ordering. Firing rates are often continuous with respect to neuronal inputs, and can therefore be used in back-propagation calculations [16, 27, 38].

  • Surrogate gradient methods: These methods provide an approach for overcoming the difficulties associated with the spiking discontinuity by providing an approximating surrogate gradient for the neuron’s spiking nonlinearity [43, 40, 37, 26]. Networks are then trained with gradient descent. One such work argues that their derived rule could be used in minibatch updates [43].

  • Differentiable approximations: Several prior works [16, 15] have devised differentiable approximations to spiking neuron models and incorporated them into artificial neural networks. These networks may be trained with minibatch updates, as they ignore the temporal dynamics of spiking neurons [16], or incorporate them into recurrent ANNs [15].

  • ANN to SNN conversion: A recent thread of research into deploying spiking neural networks on neuromorphic hardware involves the conversion of trained ANNs to SNNs with little or no loss in performance on classification [7, 34, 33, 35, 44] and reinforcement learning [29] tasks. ANNs are trained with a variant of minibatch gradient descent, but, once converted to SNNs, these works do not apply minibatch processing. See Table 1 for a comparison of error rates between ANNs and their converted SNN counterparts, demonstrating that, in principle, SNNs may perform just as well on complex classification tasks as ANNs can.

Dataset ANN error SNN error
MNIST [18] 0.56% 0.56%
CIFAR-10 [17] 8.09% 9.15%
ImageNet [6] 23.88% 25.40%
Table 1: Comparison of ANN and converted SNN error rates on popular computer vision benchmarks. All results are taken from [33], which reports the lowest conversion error rates across all datasets to date.

3 Implementation

Since certain neuron and synapse models in SNNs maintain various stateful quantities during simulation, for a minibatch size B, our implementation duplicates these variables B times at the start of a simulation. During simulation, these time-sensitive variables evolve independently across the batch dimension. Quantities that are not stateful are not duplicated, such as rest and reset voltages, fixed thresholds, voltage decay rates, etc. Adaptive parameters such as connection weights (synaptic strengths) and adaptive voltage thresholds are updated during simulation, but there is only one copy of each of these parameters; updates to them are aggregated across the batch dimension via averaging, summation, or possibly many other reductions, which we will later discuss.
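The sketch below illustrates the split between duplicated state and shared parameters. It is a simplified illustration, not BindsNET's actual classes; the class name, state variables, and sizes are placeholders.

```python
import torch

class BatchedLIFState:
    """Sketch: stateful quantities get a leading batch dimension;
    fixed constants and learned parameters do not."""

    def __init__(self, batch_size, n_neurons, v_rest=-65.0):
        self.v_rest = v_rest                                   # scalar constant: not duplicated
        self.v = torch.full((batch_size, n_neurons), v_rest)   # [B, N] duplicated state
        self.refrac = torch.zeros(batch_size, n_neurons)       # [B, N] duplicated state

# Adaptive parameters keep a single copy; per-example updates are reduced over the batch.
w = torch.rand(100, 10)                      # [n_pre, n_post] shared synapse weights
dw_per_example = torch.randn(32, 100, 10)    # [B, n_pre, n_post] stand-in for STDP updates
w += dw_per_example.mean(dim=0)              # default reduction: averaging over the batch
```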

3.1 Dynamic minibatch size

Adaptive minibatch sizes are supported. Changes in minibatch size may occur when moving from training to inference; e.g., large amounts of training data may be bundled into minibatches to expedite training, whereas at inference time, queries to the trained SNN may occur one at a time as needed. It may also change when the size of a dataset is not evenly divisible by the minibatch size, and so the last batch of examples will be smaller than the rest. Adaptive minibatch size is implemented by checking the batch size of an input against the expected batch size; if it is different, state variables are re-initialized to match it, and simulation proceeds as normal.
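A minimal sketch of this check, under assumed names (the helper and the rest voltage are illustrative, not BindsNET's API):

```python
import torch

def match_batch_size(v, x, n_neurons, v_rest=-65.0):
    """Re-initialize per-neuron state when the incoming batch size differs
    from the current one; otherwise leave it untouched."""
    if v is None or v.shape[0] != x.shape[0]:
        v = torch.full((x.shape[0], n_neurons), v_rest)
    return v

v = None
for x in [torch.rand(128, 784), torch.rand(128, 784), torch.rand(32, 784)]:  # last batch is smaller
    v = match_batch_size(v, x, n_neurons=100)
```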

3.2 Episodic vs. continuing simulation

Implicit in our discussion thus far is a reliance on episodic, trial-based experimentation. Between trials (processing a minibatch of size B for time T), time-sensitive neuronal state variables must be reset to common values; otherwise, we are not performing independent simulations with the same initial conditions. This setup is well suited to many machine learning tasks: unsupervised, supervised, and episodic reinforcement learning proceed on an example-by-example or episode-by-episode basis.

However, if the user is comfortable with relaxing the assumption of identical initial conditions, continuing simulations may be used, where input data may change over time without requiring the re-initialization of state variables. This is well-suited for cases where said state variables are relatively transitory, and when their initial conditions don’t have a strong effect on the measured simulation outcomes. For example, after a short simulation time, neuron voltages may change quite rapidly, and it is difficult to guess at their initial values. Continuing simulation may be used for batched continuing reinforcement learning, or for SNN simulations which have no natural notion of “resetting”.

3.3 Reduction methods

It is common practice to average updates to an ANN's parameters over the batch dimension. Every neuron in an ANN participates in the network's forward pass, and averaging the weight updates over the batch dimension yields an unbiased estimate of the gradient of the loss function over the minibatch. The neurons of spiking neural networks, on the other hand, output non-zero values (spikes) relatively sparsely in time, and these often trigger parameter updates that have no bearing on a global loss function. Therefore, averaging parameter updates over the batch dimension may result in overly conservative updates and slow learning due to the presence of many zero values in the average, an effect that can be avoided when not training with gradient descent.

For this reason, our implementation supports arbitrary reduction methods; namely, any function that operates on PyTorch tensors and reduces over the minibatch dimension may be used. Custom reduction methods may be written by users as long as they support this simple API. By default, parameter updates are averaged over the batch dimension. As we will discuss, different applications may benefit from different reduction methods.
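The following sketch shows the kind of callable this API expects: anything that collapses the leading batch dimension of a per-example update tensor. The function names are illustrative rather than BindsNET's own.

```python
import torch

def mean_reduction(dw: torch.Tensor) -> torch.Tensor:
    return dw.mean(dim=0)          # default: average over the minibatch

def sum_reduction(dw: torch.Tensor) -> torch.Tensor:
    return dw.sum(dim=0)           # accumulate all per-example updates

def max_reduction(dw: torch.Tensor) -> torch.Tensor:
    # Keep the largest per-synapse update in the minibatch; useful when most
    # per-example updates are zero due to sparse spiking (see Section 4.3).
    return dw.max(dim=0).values

dw = torch.rand(64, 784, 100)      # [batch, n_pre, n_post] per-example updates
w = torch.rand(784, 100)
w += max_reduction(dw)
```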

3.4 Complexity

Duplicating stateful variables across the batch dimension may quickly consume memory. For per-neuron variables (e.g., membrane voltage), assuming a minibatch size of B and a neuron population of size N, O(B · N) memory is required. For per-synapse variables (e.g., synapse conductances), assuming pre- and post-synaptic neuron populations of size N_pre and N_post, respectively, O(B · N_pre · N_post) memory is needed. Multiple stateful variables per network component may need to be extended across the batch dimension, the number of which generally increases with the complexity of the neuron or synapse model. Users must be wary of setting batch sizes such that the total memory usage is greater than what is available, so as to prevent frequent swapping of tensors in and out of memory or triggering out-of-memory errors. As a result, in comparison with ANNs, minibatch processing in SNNs is fundamentally more memory-intensive due to the use of stateful, time-dependent variables.
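A back-of-the-envelope estimate can help pick a batch size before hitting GPU memory limits. The sketch below assumes single-precision storage and illustrative variable counts; the actual counts depend on the neuron and synapse models used.

```python
def state_bytes(batch_size, n_pre, n_post, per_neuron_vars=2, per_synapse_vars=1):
    """Rough single-precision (4-byte) memory estimate for duplicated state."""
    neuron = per_neuron_vars * batch_size * (n_pre + n_post) * 4     # O(B * N)
    synapse = per_synapse_vars * batch_size * n_pre * n_post * 4     # O(B * N_pre * N_post)
    return neuron + synapse

# e.g., batch size 1024, 784 inputs, 100 outputs, one per-synapse trace:
print(state_bytes(1024, 784, 100) / 2**20, "MiB")   # roughly 313 MiB, dominated by per-synapse state
```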

It is well-established that GPUs are suited to highly parallel processing due to their large number of cores, which execute the same instructions simultaneously. For this reason, we expect that the wall-clock time for a given simulation with an SNN of fixed size will remain roughly constant with increasing batch size, up until the point where network variables no longer fit into GPU memory, at which point simulation time will increase as tensors need to be swapped in and out of GPU memory. This is shown empirically in Section 4.

4 Experiments

In the following, we describe a few simple experiments aimed at communicating the usefulness of the minibatch processing approach to SNN simulation. We investigate how a simple two-layer network scales with increasing output layer and minibatch sizes. We then show how a simple multi-layer perceptron converted to a near-equivalent SNN maintains accuracy and classifies test data increasingly rapidly with increasing batch size. Finally, SNNs of fixed size are trained in a semi-supervised fashion to classify the MNIST dataset, an approach that remains effective for a wide range of minibatch sizes. Unless otherwise stated, a 1ms simulation time resolution is used.

4.1 Scaling a Two-Layer Network

We construct a simple two-layer network consisting of 100 input neurons emitting Poisson spike trains, with rates randomly sampled in [0Hz, 120Hz], connected to a variable-sized layer of leaky integrate-and-fire (LIF) neurons [11] with randomly sampled synapse weights. Varying the minibatch size, we run the network for 1 second of simulated time in 10 independent trials and report the statistics of the required wall-clock time.
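The shape of this benchmark can be reproduced with the following plain PyTorch sketch. It is not the BindsNET benchmark script: the neuron constants, random weight range, and Bernoulli approximation to Poisson encoding are simplifications, and a CUDA GPU is assumed (pass device="cpu" otherwise).

```python
import time
import torch

def poisson_spikes(rates_hz, batch, time_steps, dt=1e-3):
    # Bernoulli approximation to Poisson spike trains at the given rates.
    p = (rates_hz * dt).clamp(max=1.0)                     # per-step spike probability
    return (torch.rand(time_steps, batch, rates_hz.numel()) < p).float()

def run_trial(batch, n_out, time_steps=1000, device="cuda"):
    rates = 120.0 * torch.rand(100)                        # input rates in [0 Hz, 120 Hz]
    x = poisson_spikes(rates, batch, time_steps).to(device)
    w = torch.rand(100, n_out, device=device)              # random synapse weights
    v = torch.zeros(batch, n_out, device=device)           # batched membrane voltages
    start = time.time()
    for t in range(time_steps):                            # 1 s of simulated time at 1 ms
        v = 0.95 * v + x[t] @ w                            # leak and integrate
        spiked = v >= 1.0
        v = torch.where(spiked, torch.zeros_like(v), v)    # reset to baseline
    if device == "cuda":
        torch.cuda.synchronize()
    return time.time() - start

for batch in (1, 32, 256, 1024):
    print(batch, run_trial(batch, n_out=1000))
```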

Figure 1 depicts the results for networks with a variable number of output neurons, with or without training the synapse weights with a simple online STDP rule. In all cases, simulation wall-clock time remains roughly constant for small- and medium-sized batch sizes, but begins to grow quickly as the batch size grows large. This is due to running out of GPU memory (12GB) with larger network and minibatch sizes and using STDP. Learning with STDP incurs a higher memory and computational cost, from recording the “spike traces” in the pre- and post-synaptic populations required for online STDP, and from computing weight updates and reducing them across the batch dimension.

Figure 1: Wall-clock time of a 1s simulation vs. batch size with varying numbers of output neurons, with and without STDP. 10 independent trials are run on a GeForce GTX TITAN X, and their average time ± one standard deviation is reported. The increase in simulation time for large networks and batch sizes is due to running out of GPU memory (12GB).

4.2 ANN to SNN conversion

Following the methodology of [33], we first train a 3-layer multi-layer perceptron to classify the MNIST data and convert it to an SNN with little loss in performance. The network has hidden layers with sizes of 256 and 128 and ReLU activations. It is converted into a spiking neural network with identical architecture, except that the ReLU non-linearities are approximated by the firing rates of (non-leaky) integrate-and-fire (IF) neurons with reset by subtraction. That is, instead of resetting neuron voltages back to a baseline value after a spike (typically zero), the difference between the firing threshold and baseline value is subtracted from the neuron's voltage. This ensures that, if a neuron exceeds its threshold by some amount, that amount is not lost by the resetting mechanism. To derive a classification decision from the network, we sum the inputs to the final layer (with size equal to the number of classes) over the simulation run and take the label corresponding to the maximizing argument.
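A minimal sketch of these two mechanics (the threshold value and function names are placeholders, and this is not the conversion code used in the experiments):

```python
import torch

def if_subtract_step(v, x, w, v_thresh=1.0):
    """Integrate-and-fire with reset by subtraction."""
    v = v + x @ w                         # no leak: plain integration of input current
    spikes = (v >= v_thresh).float()
    v = v - spikes * v_thresh             # subtract the threshold rather than resetting to 0,
    return v, spikes                      # so charge above threshold is not discarded

def classify(input_currents):
    """Sum the inputs to the final, class-sized layer over the run and take the
    maximizing argument as the label.  input_currents: [T, batch, n_classes]."""
    return input_currents.sum(dim=0).argmax(dim=1)
```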

Accuracy of the converted SNN compared to the original ANN is given in Table 2. The ANN achieves 98.13% test accuracy, while the SNN with 10ms inference time achieves 97.86%, a 0.27% reduction. With 3ms of simulation time, the converted SNN already achieves 97.30% accuracy. Setting the simulation time higher than 10ms does not result in better performance (data not shown). Figure 2(a) plots the wall-clock time required to run inference with the converted SNN on the entire MNIST test dataset (10K images). With batch size 1 (serial processing) and 10ms of simulation time, inference takes over 11 minutes. On the other hand, with batch size 1024, this same procedure takes 0.75 seconds, an ≈880X reduction in wall-clock time. Finally, inference time per minibatch for various settings of batch size and simulation time is plotted in Figure 2(b). For small batch sizes, each simulation time step takes 0.01s, while for larger batch sizes, each step requires between 0.01 and 0.1 seconds. A single simulation step with batch size 1 requires just over 1ms of wall-clock time, running nearly in "real time".

Time 1ms 2ms 3ms 4ms 5ms 10ms
Accuracy 29.37% 94.03% 97.30% 97.62% 97.73% 97.86%
Table 2: Simulation time per example vs. overall accuracy on the MNIST test dataset. The original MLP achieves 98.13% accuracy.
(a) Test set inference time
(b) Batch inference time
Figure 2: (a) Wall-clock time required to classify the test dataset with the converted SNN with various settings of batch size and simulation time. (b) Inference time for a single batch of data for various settings of batch size and simulation time.

4.3 Unsupervised Learning of MNIST digits

We implemented a slightly modified, minibatched version of the experimental setup from [8]. The considered SNN consists of an input layer with dimensionality equal to that of the input data, in this case the MNIST digits with shape 28×28. The input data is encoded into Poisson spike trains with firing rates in [0Hz, 128Hz], obtained by dividing the pixel-wise input data by 2. This layer connects all-to-all with STDP-modifiable synapses to a population of LIF neurons with adaptive thresholds, which increase by 0.05mV each time a spike is emitted and otherwise decay back to their default value with a time constant of 1000s. This layer is recurrently connected with large, fixed inhibitory synapses, which are used to implement a soft winner-take-all (WTA) circuit: when a neuron in this layer spikes, all other neurons in the layer are inhibited, allowing it to continue spiking unchallenged. Accordingly, we use an online version of STDP (i.e., weight updates are made during simulation) which utilizes only positive weight updates triggered by the firing of the post-synaptic neuron, along with a weight normalization technique such that the sum of weights incident to a post-synaptic neuron remains constant. We implement a simple classification scheme on the output of the network; namely, individual neurons are assigned labels according to the class of data for which they fire most during training. At test time, spikes are counted per neuron and aggregated into class-wise bins. The bin with the largest count determines the label of the input data.
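The sketch below conveys the flavor of this setup in plain PyTorch: Poisson encoding of pixel intensities, a post-synaptically triggered positive STDP update reduced with the batch-wise maximum, and column-wise weight normalization. The learning rate, trace dynamics, and normalization constant are placeholders, and this is not the exact rule from [8] or the BindsNET implementation.

```python
import torch

def poisson_encode(images, time_steps, dt=1e-3):
    # Pixel intensities in [0, 255] are halved to give firing rates in [0 Hz, ~128 Hz].
    rates = images.view(images.size(0), -1) / 2.0                    # [batch, 784]
    return (torch.rand(time_steps, *rates.shape) < rates * dt).float()

def post_triggered_stdp(w, pre_trace, post_spikes, lr=1e-3):
    # Positive updates only, gated by post-synaptic firing and weighted by the
    # pre-synaptic trace; per-example updates reduced with the batch-wise maximum.
    dw = lr * pre_trace.unsqueeze(2) * post_spikes.float().unsqueeze(1)  # [B, n_pre, n_post]
    return w + dw.max(dim=0).values

def normalize(w, target_sum=78.4):
    # Keep the total weight incident to each post-synaptic neuron constant.
    return w * (target_sum / w.sum(dim=0, keepdim=True))
```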

(a) Parameter reduction: maximum
(b) Wall-clock time comparison
Figure 3: (a) Accuracy curves over the course of training for various settings of batch size, with parameter updates computed by taking the maximum of individual contributions over the minibatch dimension. Accuracy curves are smoothed with a Hann filter of length 10. Wall-clock time needed to reach 80% test set accuracy is plotted in (b).

Fixing the network size and varying the batch size, we investigate how the originally serial method performs in the minibatched setting. Output neurons are re-labeled and accuracy on the test dataset is assessed after every 250 training examples. With larger batch sizes, the network fails to learn to classify the data with the default parameter reduction, i.e., averaging parameter updates across the minibatch dimension (data not shown). On the other hand, Figure 3(a) shows that this issue can be partially mitigated by utilizing a different parameter reduction method: taking the per-parameter (synapse) maximum on each time step (the "maximum" method). With this reduction, the networks achieve a comparable maximum test accuracy regardless of batch size.

We conjecture this mismatch in accuracy is due to there being more examples per minibatch than there are output neurons; as described above, one neuron typically "wins" per example in the soft WTA. Therefore, with more inputs than neurons, there must exist at least one neuron which fires for two or more different examples in the minibatch, leading to conflicting weight updates that may cancel each other out. Using the per-parameter maximum partially solves this problem by discarding smaller weight updates, allowing the larger updates to coalesce and enabling learning of coherent synapse weights. Still, there is a non-negligible loss in accuracy with moderately large batch sizes, and this problem is exacerbated with increasingly larger batch sizes (data not shown). An interesting direction of future work is to investigate training methods that are more robust to the choice of batch size.

Figure 3(b) compares the wall-clock time required to reach 80% accuracy with the same network trained with various batch sizes. In particular, with batch size 1, nearly 12 minutes are required to reach this accuracy level. With batch size 64, less than 30 seconds are needed, a 24X speedup, with only a small loss in maximum performance. Figure 4 compares the weights learned with different settings of the batch size. Importantly, visual inspection reveals very few qualitative differences between the learned filters. This suggests that, with proper tuning of the hyper-parameters of the classification part of the method, networks trained with larger minibatch sizes may attain classification performance equal to that of the serial method.

(a) Batch size = 1
(b) Batch size = 256
Figure 4: Filter weights learned by networks trained with the “maximum” reduction method and (a) serial updates (batch size 1) and (b) a large degree of data parallelism (batch size 256).

5 Discussion

Our implementation can be extended to arbitrarily complex neuron and synapse models. The user may subclass BindsNET's Nodes or Connection objects, and then specify which neuronal variables need to be duplicated across the batch dimension. As discussed above, per-synapse variables typically require more memory than per-neuron variables, and each batched variable requires its own memory resources. For this reason, networks with simpler neuron and synapse types can often be enlarged and parallelized to a greater degree than networks with more complex components.

Although our focus in the exposition and experiments of this paper has been on GPU-based simulation, minibatch processing can also be used with CPUs. However, the reductions in wall-clock time from using this approach are much less drastic than simulating with GPUs.

For any given task, careful selection of the reduction applied to parameter updates across the minibatch dimension may be needed to achieve the desired learning outcome. With online learning rules, since synapse weight updates are triggered in an event-based fashion, they may be sparse in time, and taking the average update among many zeros may result in slow learning. For that reason, users are free to select or implement reduction methods that suit their particular learning problem.

6 Conclusion

Spiking neural networks are rapidly becoming viable tools for investigations into powerful, biologically plausible forms of machine learning [30]. While processing batches of data in parallel may not be plausible in real neural circuits, in simulation it serves as a useful optimization for the sake of computational efficiency. To date, the bulk of SNN simulation has been implemented as a serial process, which often does not scale to large datasets: the speed of research iteration is extremely low due to the high cost of running even a single pass through the data. Thus, we have introduced and demonstrated the utility of a general-purpose implementation of minibatch processing in SNNs that can be leveraged to reduce simulation run-times and increase the speed of iteration on research ideas. With enough GPU memory and the proper choice of minibatch size, the wall-clock time of any simulation can be significantly reduced while preserving learning capabilities; we believe this is an important technological milestone in the effort to leverage spiking neural networks in modeling studies and machine learning experiments alike.

7 Acknowledgements

We would like to thank Sam Wenke, Jim Fleming, and Mike Qiu for their careful review of the manuscript.

References

  • [1] D. P. Bertsekas (1996-03) Incremental least squares methods and the extended Kalman filter. SIAM Journal on Optimization 6 (3), pp. 807–822. External Links: ISSN 1052-6234, Link, Document Cited by: §2.1.
  • [2] G. Bi and M. Poo (1998) Synaptic modifications in cultured hippocampal neurons: dependence on spike timing, synaptic strength, and postsynaptic cell type. 18 (24), pp. 10464–10472. External Links: Document, ISSN 0270-6474, Link, https://www.jneurosci.org/content/18/24/10464.full.pdf Cited by: 1st item.
  • [3] L. Bottou (2010) Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT’2010, Y. Lechevallier and G. Saporta (Eds.), Heidelberg, pp. 177–186. External Links: ISBN 978-3-7908-2604-3 Cited by: 2nd item.
  • [4] L. Breiman (1996-08-01) Bagging predictors. 24 (2), pp. 123–140. External Links: ISSN 1573-0565, Document, Link Cited by: 3rd item.
  • [5] I. M. Comsa, K. Potempa, L. Versari, T. Fischbacher, A. Gesmundo, and J. Alakuijala (2019) Temporal coding in spiking neural networks with alpha synaptic function. ArXiv abs/1907.13223. Cited by: §2.1.
  • [6] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei (2009) ImageNet: A Large-Scale Hierarchical Image Database. In CVPR09, Cited by: Table 1.
  • [7] P. U. Diehl, D. Neil, J. Binas, M. Cook, S. Liu, and M. Pfeiffer (2015-07) Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing. In 2015 International Joint Conference on Neural Networks (IJCNN), Vol. , pp. 1–8. External Links: Document, ISSN 2161-4407 Cited by: 5th item.
  • [8] P. Diehl and M. Cook (2015) Unsupervised learning of digit recognition using spike-timing-dependent plasticity. Frontiers in Computational Neuroscience 9, pp. 99. External Links: Link, Document, ISSN 1662-5188 Cited by: §4.3.
  • [9] P. Ferré, F. Mamalet, and S. J. Thorpe (2018) Unsupervised feature learning with winner-takes-all based stdp. 12, pp. 24. External Links: Link, Document, ISSN 1662-5188 Cited by: §2.1.
  • [10] N. Frémaux and W. Gerstner (2016) Neuromodulated spike-timing-dependent plasticity, and theory of three-factor learning rules. 9, pp. 85. External Links: Link, Document, ISSN 1662-5110 Cited by: 1st item.
  • [11] W. Gerstner and W. Kistler (2002) Spiking neuron models: an introduction. Cambridge University Press, New York, NY, USA. External Links: ISBN 0521890799 Cited by: §4.1.
  • [12] P. Goyal, P. Dollár, R. B. Girshick, P. Noordhuis, L. Wesolowski, A. Kyrola, A. Tulloch, Y. Jia, and K. He (2017) Accurate, large minibatch SGD: training imagenet in 1 hour. abs/1706.02677. External Links: Link, 1706.02677 Cited by: §1.
  • [13] H. Hazan, D. J. Saunders, H. Khan, D. Patel, D. T. Sanghavi, H. T. Siegelmann, and R. Kozma (2018) BindsNET: a machine learning-oriented spiking neural networks library in python. 12, pp. 89. External Links: Link, Document, ISSN 1662-5196 Cited by: §1.
  • [14] D. O. Hebb (1949-06-15) The organization of behavior: A neuropsychological theory. Wiley, New York. Note: Hardcover External Links: ISBN 0-8058-4300-0 Cited by: 1st item.
  • [15] D. Huh and T. J. Sejnowski (2018) Gradient descent for spiking neural networks. In Advances in Neural Information Processing Systems 31, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.), pp. 1433–1443. External Links: Link Cited by: 4th item, §2.1.
  • [16] E. Hunsberger and C. Eliasmith (2016) Training spiking deep networks for neuromorphic hardware. abs/1611.05141. External Links: Link, 1611.05141 Cited by: 2nd item, 4th item, §2.1.
  • [17] A. Krizhevsky (2009) Learning multiple layers of features from tiny images. Technical report . Cited by: Table 1.
  • [18] Y. LeCun and C. Cortes (2010) MNIST handwritten digit database. Note: http://yann.lecun.com/exdb/mnist/ External Links: Link Cited by: Table 1.
  • [19] C. Lee, P. Panda, G. Srinivasan, and K. Roy (2018) Training deep spiking convolutional neural networks with STDP-based unsupervised pre-training followed by supervised fine-tuning. Frontiers in Neuroscience 12, pp. 435. External Links: Link, Document, ISSN 1662-453X Cited by: §2.1.
  • [20] R. Linsker (1992) Local synaptic learning rules suffice to maximize mutual information in a linear network. 4 (5), pp. 691–702. External Links: Document, Link, https://doi.org/10.1162/neco.1992.4.5.691 Cited by: 1st item.
  • [21] W. Maass (1996) Lower bounds for the computational power of networks of spiking neurons. 8 (1), pp. 1–40. External Links: Document, Link, https://doi.org/10.1162/neco.1996.8.1.1 Cited by: §1.
  • [22] W. Maass (1997) Networks of spiking neurons: the third generation of neural network models. 10 (9), pp. 1659 – 1671. External Links: ISSN 0893-6080, Document, Link Cited by: §1.
  • [23] A. H. Marblestone, G. Wayne, and K. P. Kording (2016) Toward an integration of deep learning and neuroscience. 10, pp. 94. External Links: Link, Document, ISSN 1662-5188 Cited by: §1.
  • [24] H. Markram, J. Lübke, M. Frotscher, and B. J. Sakmann (1997) Regulation of synaptic efficacy by coincidence of postsynaptic aps and epsps.. 275 5297, pp. 213–5. Cited by: 1st item.
  • [25] D. Masters and C. Luschi (2018) Revisiting small batch training for deep neural networks. abs/1804.07612. External Links: Link, 1804.07612 Cited by: §2.1.
  • [26] E. O. Neftci, H. Mostafa, and F. Zenke (2019) Surrogate gradient learning in spiking neural networks. abs/1901.09948. External Links: Link, 1901.09948 Cited by: §1, 3rd item.
  • [27] P. O’Connor and M. Welling (2016) Deep spiking networks. abs/1602.08323. External Links: Link, 1602.08323 Cited by: 2nd item, §2.1.
  • [28] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer (2017) Automatic differentiation in PyTorch. In NIPS Autodiff Workshop, Cited by: §1.
  • [29] D. Patel, H. Hazan, D. J. Saunders, H. T. Siegelmann, and R. Kozma (2019) Improved robustness of reinforcement learning policies upon conversion to spiking neuronal network platforms applied to ATARI games. abs/1903.11012. External Links: Link, 1903.11012 Cited by: 5th item.
  • [30] M. Pfeiffer and T. Pfeil (2018) Deep learning with spiking neurons: opportunities and challenges. 12, pp. 774. External Links: Link, Document, ISSN 1662-453X Cited by: §1, §6.
  • [31] T. A. Poggio, K. Kawaguchi, Q. Liao, B. Miranda, L. Rosasco, X. Boix, J. Hidary, and H. Mhaskar (2018) Theory of deep learning III: explaining the non-overfitting puzzle. abs/1801.00173. External Links: Link, 1801.00173 Cited by: 3rd item.
  • [32] D. Rasmussen (2018) NengoDL: combining deep learning and neuromorphic modelling methods. 1805.11144, pp. 1–22. External Links: Link Cited by: §2.1.
  • [33] B. Rueckauer and S. Liu (2018-05) Conversion of analog to spiking neural networks using sparse temporal coding. In 2018 IEEE International Symposium on Circuits and Systems (ISCAS), Vol. , pp. 1–5. External Links: Document, ISSN 2379-447X Cited by: 5th item, Table 1, §4.2.
  • [34] B. Rueckauer, I. Lungu, Y. Hu, M. Pfeiffer, and S. Liu (2017) Conversion of continuous-valued deep networks to efficient event-driven networks for image classification. 11, pp. 682. External Links: Link, Document, ISSN 1662-453X Cited by: 5th item.
  • [35] A. Sengupta, Y. Ye, R. Wang, C. Liu, and K. Roy (2019) Going deeper in spiking neural networks: vgg and residual architectures. 13, pp. 95. External Links: Link, Document, ISSN 1662-453X Cited by: 5th item.
  • [36] J. Seok (2018) Tutorial: neuromorphic spiking neural networks for temporal learning. 124 (15), pp. 152002. External Links: Document, Link, https://doi.org/10.1063/1.5042243 Cited by: §1.
  • [37] S. B. Shrestha and G. Orchard (2018) SLAYER: spike layer error reassignment in time. In Advances in Neural Information Processing Systems 31, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.), pp. 1412–1421. External Links: Link Cited by: 3rd item.
  • [38] E. Stromatias, M. Soto, T. Serrano-Gotarredona, and B. Linares-Barranco (2017) An event-driven classifier for spiking neural networks fed with synthetic or dynamic vision sensor data. 11, pp. 350. External Links: Link, Document, ISSN 1662-453X Cited by: 2nd item.
  • [39] A. Tavanaei, M. Ghodrati, S. R. Kheradpisheh, T. Masquelier, and A. Maida (2019) Deep learning in spiking neural networks. 111, pp. 47 – 63. External Links: ISSN 0893-6080, Document, Link Cited by: §1.
  • [40] Y. Wu, L. Deng, G. Li, J. Zhu, and L. Shi (2018) Spatio-temporal backpropagation for training high-performance spiking neural networks. Frontiers in Neuroscience 12, pp. 331. External Links: Link, Document, ISSN 1662-453X Cited by: §1, 3rd item.
  • [41] Y. Wu, L. Deng, G. Li, J. Zhu, Y. Xie, and L. Shi (2019) Direct training for spiking neural networks: faster, larger, better. Proceedings of the AAAI Conference on Artificial Intelligence 33 (01), pp. 1311–1318. External Links: Link, Document Cited by: §1.
  • [42] S. Zappacosta, F. Mannella, M. Mirolli, and G. Baldassarre (2018-08) General differential hebbian learning: capturing temporal relations between events in neural networks and the brain. 14 (8), pp. 1–30. External Links: Link, Document Cited by: 1st item.
  • [43] F. Zenke and S. Ganguli (2018) SuperSpike: supervised learning in multilayer spiking neural networks. Neural Computation 30 (6), pp. 1514–1541. Note: PMID: 29652587 External Links: Document, Link Cited by: 3rd item, §2.1.
  • [44] L. Zhang, S. Zhou, T. Zhi, Z. Du, and Y. Chen (2019-Jul.) TDSNN: from deep neural networks to deep spike neural networks with temporal-coding. Proceedings of the AAAI Conference on Artificial Intelligence 33 (01), pp. 1319–1326. External Links: Link, Document Cited by: 5th item.