The function and behavior of Spiking Neural Networks (SNNs) are derived from their inspiration, the brain, which is capable of performing numerous cognitive tasks with minimal energy requirements. The potential of SNNs has drawn research interest in areas including emerging devices, algorithms, and applications [30, 23, 17, 29]. In general, the brain's capability comes from the complex dynamics of the networks of spiking neurons and the plastic synapses connecting them. These dynamics can capture complex spatial temporal features of input encoded as sparse temporal spiking activity. Despite the biological inspiration, the majority of existing SNN models are unable to replicate such dynamics to encode, learn, and decode temporal information.
The limitations of existing SNNs are multifold. First, most SNN models and training algorithms consider only the statistics of spike activities: a numerical value is represented by the spike count in a time window. Although this type of SNN has demonstrated state-of-the-art performance in various tasks, it suffers from high spike activity and thus cannot fully benefit from event-driven computation. Second, directly adapting backpropagation is not feasible because a spiking neuron's output is a Dirac delta function and hence non-differentiable. One approach to address this problem is to train an artificial neural network (ANN), such as a multi-layer perceptron (MLP), and map its weights to an SNN. However, this suffers from accuracy degradation, and additional fine-tuning of weights and thresholds is required to minimize the performance penalty. Recently, gradient surrogates have been proposed to approximate the gradient of the spiking function, enabling backpropagation [15, 6, 24, 26, 25, 8]: one approach derives a cumulative error function as the gradient surrogate; another derives a simplified model from the Leaky Integrate-and-Fire (LIF) neuron and proposes four functions as gradient surrogates. Other approaches replace the hard threshold function with a differentiable soft spike; however, this compromises the SNN's most distinct feature, the binary spike.
While most SNN models and training algorithms use spike counts to represent numerical values, it has been observed that in biological neural networks, the temporal structure of a spike train also conveys information. Two spike trains with the same spike rate can have distinct temporal patterns and hence represent different information. Such temporal encoding can efficiently encode information using extremely sparse spike events. Some existing works train neurons to detect temporal spike patterns. Tempotron trains an individual neuron to perform binary classification of spatial temporal input spike patterns: the neuron generates at least one spike for a positive pattern and remains inactive for other patterns. Building on Tempotron, a follow-up work proposed an algorithm to adjust synaptic weights such that a neuron generates a desired number of spikes given a specific input pattern. SPAN trains an individual neuron to associate a spatial temporal input pattern with a specific output spike pattern. However, these works train individual neurons and cannot be extended to multiple layers, so their performance is limited. There are also recent works that utilize backpropagation through time (BPTT) to address temporal dependency: one proposes a training rule that reassigns errors in time; another proposes a novel loss function and derives an iterative model from Tempotron, so that the network can be unrolled and BPTT becomes possible; a third captures the temporal dependency of membrane potentials and uses the membrane potential as the objective function to learn temporal patterns.
Existing works have achieved performance comparable to DNNs in vision tasks such as static image or event-based data classification; however, few SNN models address time series classification. The first challenge is the limitation of rate coding: since it treats spikes statistically, rate coding cannot represent temporal information. Although it is possible to flatten a time series into a 1-D array and then represent it by rate coding, this increases the input size. In addition, for real-time applications, flattening the input requires buffering, which increases computation latency. Second, unlike images, such as MNIST images, where the range, precision, and scale of all values are identical, multivariate time series may be collected from different sensors, so their precision and range may vary. Rate coding has to guarantee the precision of the highest-resolution input. Such a "design for the worst case" coding scheme lacks flexibility and hence is not efficient. New coding methods are required to represent time series and exploit the potential of SNNs.
In this work, our contributions are summarized as follows:
We present a coding method that can convert time series into sparse spatial temporal spike patterns.
We derive an iterative SNN model from the Spike Response Model, such that Backpropagation Through Time (BPTT) is possible. An event-based updating algorithm is also proposed to reduce the computation overhead of inference.
We formulate a backpropagation rule for the iterative SNN model and propose a training algorithm to train the model on spatial temporal patterns.
We evaluate the proposed method on multiple datasets and achieve accuracy comparable to DNNs. To the best of our knowledge, this is the first work applying SNNs to multivariate time series classification.
II SNN Model
Without loss of generality, we adopt a widely used Leaky Integrate-and-Fire (LIF) neuron defined by the Spike Response Model (SRM) [10, 12]. Each input spike induces a change in the neuron's membrane potential, which is called a postsynaptic potential (PSP):
where the PSP is determined by the arrival time of the input spike and the synapse kernel. The neuron accumulates all input PSPs, such that the membrane potential is defined as:
where the sum runs over all input synapses, each input spike weighted by the weight associated with its synapse, and a time constant of the neuron governs the decay. The rightmost term can be interpreted as a negative voltage applied to the neuron itself at each output spike time, such that the membrane potential is decreased by a factor of the threshold voltage; this serves as the reset mechanism at the time of a spike. Thus, the neuron's potential is the summation of all weighted input PSPs plus the negative voltage given by the rightmost term. The neuron model with its inputs is illustrated in figure 1. The PSP kernel is defined as:
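The description above matches the standard SRM formulation. The following is a hedged reconstruction, not the paper's original equations; the symbol names (membrane potential $u$, weights $w_i$, input spike times $t_i^f$, output spike times $t^s$, time constants $\tau_m$ and $\tau_s$, threshold $\vartheta$) are our assumptions:

```latex
% Hedged reconstruction of the SRM membrane potential and PSP kernel;
% symbol names are assumptions.
u(t) = \sum_{i=1}^{N} w_i \sum_{t_i^f < t} \varepsilon\bigl(t - t_i^f\bigr)
       \;-\; \vartheta \sum_{t^s < t} \exp\Bigl(-\frac{t - t^s}{\tau_m}\Bigr),
\qquad
\varepsilon(s) = \Bigl(e^{-s/\tau_m} - e^{-s/\tau_s}\Bigr) H(s)
```

The double-exponential kernel is what makes the incremental reformulation below possible, since each exponential can be decayed step by step.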
At any given time, the PSP and membrane potential are determined by all previous inputs. However, it is not feasible to directly implement the SNN model defined by Equation 2: at each time step, the membrane potential would have to be computed by recursively convolving the input spike trains with the kernel, incurring significant computation overhead. To address this issue, an incremental way to update the PSP can be derived from the SRM in the discrete time domain.
More formally, an input spike train can be defined as a sequence of time-shifted Dirac delta functions:
where a value of 1 denotes a spike received at the corresponding time step and 0 otherwise. Similarly, the output spike train can be defined as:
where each output spike time satisfies the threshold condition. To derive the incremental model, we split the kernel into two exponential components, such that the PSP can be expressed as their combination:
Both components can be computed incrementally:
Similarly, we can compute the reset voltage incrementally:
The SNN defined in Equation 2 can then be equivalently expressed as:
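The incremental update can be sketched in plain Python. This is a minimal sketch under our own assumptions (state names `a`, `b`, `r` for the two PSP components and the reset trace, and illustrative time constants), not the paper's exact implementation:

```python
import math

def simulate(spikes, weights, tau_m=10.0, tau_s=2.5, theta=1.0, dt=1.0):
    """Step-wise simulation of a single iterative SRM neuron (a sketch).

    spikes:  list of per-step binary input vectors, one entry per synapse.
    weights: one weight per input synapse.
    Returns the binary output spike train.
    """
    alpha = math.exp(-dt / tau_m)   # decay of the slow PSP component
    beta = math.exp(-dt / tau_s)    # decay of the fast PSP component
    gamma = math.exp(-dt / tau_m)   # decay of the reset trace
    n = len(weights)
    a = [0.0] * n                   # slow PSP components
    b = [0.0] * n                   # fast PSP components
    r = 0.0                         # reset trace
    out = []
    for t in range(len(spikes)):
        for i in range(n):
            a[i] = alpha * a[i] + spikes[t][i]
            b[i] = beta * b[i] + spikes[t][i]
        r = gamma * r               # reset trace decays every step
        u = sum(w * (ai - bi) for w, ai, bi in zip(weights, a, b)) - r
        s = 1 if u >= theta else 0
        if s:
            r += theta              # reset lowers the potential by theta
        out.append(s)
    return out
```

Because each state variable is multiplied by a constant decay factor and incremented on spikes, no convolution over past inputs is needed.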
where the three indices denote the layer, the neuron, and the input, respectively, and each layer contains a given number of neurons. The model tracks the input current, the reset voltage, and the neuron output, with three decay factors governing their dynamics. More specifically, layer 0 denotes the encoding layer, which will be discussed in Section III, and the final layer denotes the output layer. The spike generation function is a Heaviside step function:
In the above model, the temporal dependency can be clearly seen in Equations 9a-9f. At each time step, the PSP, membrane potential, and output can be computed from the previous time step; hence, by unfolding the network, Backpropagation Through Time (BPTT) can be used to train it. Note that the gradient of the Heaviside step function is a Dirac delta function, so backpropagation cannot be directly applied; its approximation will be discussed in Section IV.
II-A Event-Driven Inference
The above model provides an explicit approach to update the SNN's state with step-wise computation and is suitable for training. For inference, the model can be simulated in an event-driven manner, i.e., computation is only necessary when there is a spike event, significantly reducing the computation overhead.
Suppose the two PSP components are known at a given time. Without input spikes, their values after an elapsed interval can be computed by applying the corresponding decay factors:
When an input spike arrives, there is an instantaneous unit charge on both components:
A similar update rule applies to the reset voltage:
The event-driven computation algorithm is shown in Algorithm 1. By tracking the elapsed time, computation is only necessary when there is an input or output spike. In addition, the kernel decays over time and becomes effectively 0 after a period. Therefore, the decay factors for different elapsed times can be pre-computed and stored in a look-up table, so that the expensive exponential function is avoided.
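The look-up-table idea can be sketched as follows; the class layout and parameter values are our assumptions, not Algorithm 1 verbatim:

```python
import math

# Decay factors for every possible elapsed time are precomputed, so
# exp() is never called during inference (values here are illustrative).
TAU_M, TAU_S, DT, T_MAX = 10.0, 2.5, 1.0, 64
LUT_M = [math.exp(-k * DT / TAU_M) for k in range(T_MAX)]
LUT_S = [math.exp(-k * DT / TAU_S) for k in range(T_MAX)]

class EventNeuron:
    """Event-driven SRM neuron: state is only updated at spike events."""

    def __init__(self, weights, theta=1.0):
        self.w, self.theta = weights, theta
        self.a = [0.0] * len(weights)  # slow PSP components
        self.b = [0.0] * len(weights)  # fast PSP components
        self.r = 0.0                   # reset trace
        self.last_t = 0                # time of the last processed event

    def on_spikes(self, t, active):
        """Process spikes on synapses `active` at time t; return True on fire."""
        k = min(t - self.last_t, T_MAX - 1)  # elapsed steps since last event
        dm, ds = LUT_M[k], LUT_S[k]          # table lookup replaces exp()
        self.a = [x * dm for x in self.a]
        self.b = [x * ds for x in self.b]
        self.r *= dm
        for i in active:                     # instantaneous unit charge
            self.a[i] += 1.0
            self.b[i] += 1.0
        self.last_t = t
        u = sum(w * (x - y) for w, x, y in zip(self.w, self.a, self.b)) - self.r
        if u >= self.theta:
            self.r += self.theta             # reset at the output spike
            return True
        return False
```

Between events the state is not touched at all; an entire quiet interval costs one table lookup per state variable.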
III Spatial Temporal Population Encoding
Rate coding represents a numerical value by the activity of an individual neuron that fires at a particular rate. For example, in vision tasks, to encode one pixel, a spike train's spike count in a given time window is made proportional to the pixel value. Rate coding has several drawbacks: 1) its precision is limited because the represented value is quantized by the bin size; although higher precision can be obtained with a longer time window, the computational latency increases as well; 2) it cannot represent temporal information because it treats spike activity statistically; time series have to be flattened and then converted to spike trains, and in real-time scenarios the data stream has to be buffered, causing additional latency; 3) an individual neuron is noisy due to its stochastic nature, which introduces additional noise; 4) it causes high spiking activity, as larger values have to be represented by more spikes, which deprives the SNN of its energy efficiency; 5) it is incapable of representing negative values, which are common in sensor inputs.
To address the above issues, we employ a coding method suitable for encoding time series that combines population coding and temporal coding. In population coding, information is represented by the activity of a group of neurons: inside a population, each neuron has its favorable input, i.e., each neuron responds to a particular input and remains relatively inactive for other inputs. In temporal coding, the spike train patterns also convey information.
We utilize a population of Current-Based Integrate-and-Fire (CUBA) neurons as the encoder. A CUBA neuron is defined as a hybrid system:
where the external time-varying input current drives the neuron, the membrane time constant determines the decay speed of the membrane potential, and the gain scales the input. The neuron accumulates the input current and updates its membrane potential continuously. When the membrane potential exceeds the threshold, a reset is triggered and the membrane potential is forced to 0.
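A common form of such a CUBA hybrid system is the following hedged sketch; the symbol names ($v$ membrane potential, $I(t)$ input current, $g$ gain, $\tau_m$ membrane time constant, $v_{th}$ threshold) are our assumptions:

```latex
% Hedged sketch of the CUBA hybrid system; symbol names are assumptions.
\tau_m \frac{dv}{dt} = -v(t) + g\, I(t),
\qquad
v(t) \leftarrow 0 \quad \text{when } v(t) \ge v_{th}
```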
In practice, the CUBA neuron model is simulated in discrete time: the membrane potential is evaluated on a time grid with a fixed interval, and the input current is sampled at each grid point, such that the discrete version of the model in Equation 14 is:
With this coding scheme, a univariate time series, for example sensor data, is treated as an input current and connected to a population of encoder neurons. Each neuron may have a different time constant and gain, so each encoder responds to the input differently. In addition, by setting the gain to a negative value, a neuron can also respond to negative inputs, which overcomes a drawback of rate coding. We utilize the Neural Engineering Framework (NEF) to pre-train the encoder. Using this approach, a time-varying signal is converted into time-varying spike patterns. For a multivariate time series, each channel can be encoded using the above approach. Unlike vision tasks, in which all input dimensions have identical resolution and range, multivariate time series may be collected from different sensors, so the resolution, precision, and range may vary. With this coding method, the population size and tuning curves can be adjusted to provide just enough precision for each channel [27, 7].
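A minimal sketch of the population encoder, assuming a simple exact-integration discretization of the CUBA update with per-neuron gains (parameter values are illustrative, and the NEF pre-training step is omitted):

```python
import math

def encode(series, gains, tau_m=20.0, v_th=1.0, dt=1.0):
    """Encode a univariate time series with a population of CUBA neurons.

    Each neuron has its own gain, so each responds to the input
    differently; a negative gain makes a neuron respond to negative
    input. Returns one binary spike train per neuron (a sketch; the
    parameter values are our assumptions, not the paper's).
    """
    decay = math.exp(-dt / tau_m)
    v = [0.0] * len(gains)                    # membrane potentials
    trains = [[] for _ in gains]
    for x in series:
        for i, g in enumerate(gains):
            # leaky integration of the gained input current
            v[i] = decay * v[i] + (1.0 - decay) * g * x
            if v[i] >= v_th:
                trains[i].append(1)
                v[i] = 0.0                    # hard reset on spike
            else:
                trains[i].append(0)
    return trains
```

A population of, say, 5 such neurons per channel with varied gains and time constants converts the signal on the fly, with no flattening or buffering.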
IV Training Algorithm
For the classification task, the neuron in the output layer that fires most frequently represents the result. We use the cross-entropy loss, defined as:
where the probability of each class is calculated by softmax, given by:
where the loss is determined by the label, the outputs of the last layer, and the number of neurons in the last layer.
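Under the spike-count readout described above, a standard form of this loss and softmax is the following hedged reconstruction; the symbols ($y$ the label, $o_j^L[t]$ the output of neuron $j$ of the last layer $L$ at step $t$, $N$ the output size, $T$ the number of time steps) are our assumptions:

```latex
% Hedged reconstruction of the cross-entropy loss over spike counts.
\mathcal{L} = -\log p_y,
\qquad
p_j = \frac{\exp\bigl(\sum_{t=1}^{T} o_j^L[t]\bigr)}
           {\sum_{k=1}^{N} \exp\bigl(\sum_{t=1}^{T} o_k^L[t]\bigr)}
```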
The derivative of the loss with respect to the network output is given by:
At the last layer, the error term can be computed as:
The remaining partial derivative is computed as:
For the hidden layers, the error can be computed recursively, propagating from the output layer at the final time step back to the input layer at time 0:
The Heaviside step function is non-differentiable. We employ a gradient surrogate to address this issue: in the forward pass, the spike generation mechanism remains unchanged, while in the backward pass, the derivative of the step function is replaced by the derivative of a smooth function. We use a sigmoid function as the gradient surrogate in the backward pass, such that the gradient of the step function is approximated as:
Based on the above equations, the gradient of each weight can be computed as:
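The forward/backward split of the surrogate can be sketched in plain Python; the steepness parameter `k` is our assumption, and in the PyTorch implementation this pair would live in a custom autograd function:

```python
import math

def heaviside(u, theta=1.0):
    """Forward pass: the hard spike generation mechanism is unchanged."""
    return 1.0 if u >= theta else 0.0

def surrogate_grad(u, theta=1.0, k=5.0):
    """Backward pass: d(spike)/du approximated by the derivative of a
    sigmoid centred at the threshold, instead of the true Dirac delta.
    k (our assumption) controls the steepness of the approximation."""
    s = 1.0 / (1.0 + math.exp(-k * (u - theta)))
    return k * s * (1.0 - s)
```

The surrogate is largest at the threshold and decays smoothly on both sides, so weights of neurons whose potential is near the threshold receive the bulk of the gradient.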
V Experiments
The proposed network model and algorithm are implemented in PyTorch. We demonstrate the effectiveness of our work in two experiments. In the first, we compare the coding efficiency of rate coding and temporal population coding in terms of spike rate and input size. In the second, the proposed network and algorithm are evaluated on various multivariate time series classification tasks.
V-A Coding Efficiency
First, we study the efficiency of the proposed coding method using the Articulary Word Recognition dataset from the UEA & UCR Time Series repository as an example. This dataset consists of multivariate sensor data recorded by an Electromagnetic Articulograph (EMA), a device that tracks the motion of a speaker's tongue and lips. Each sample contains 9 variates of length 144. An example from this dataset is shown in figure 4, where each line represents a time-varying input. We use both rate coding and temporal population coding to convert the time series to spikes; the resulting spike patterns are shown in figure 5. Each dot in figure 5(a) represents a spike, and in figure 5(b) a spike is represented by a vertical line. We use an input window of 300. Because rate coding cannot represent temporal information, the time series have to be flattened, resulting in 1296 spike trains; for clarity, only the first 100 spike trains are shown. In temporal coding, we use 5 neurons to encode each variate. The temporal population coding is clearly sparser. In addition, it encodes the input with only 45 spike trains, significantly fewer than rate coding, which helps reduce the SNN model size.
We compared our coding method with rate coding on four multivariate datasets; details, including average spike count, spike rate, and input size, are shown in table I, where Temporal refers to temporal population coding. The spike rates are significantly lower than those of rate-based coding. As seen in the last column, the input size of our coding method is also significantly smaller, particularly for long time series such as the Atrial Fibrillation dataset, which consists of two variates of length 640; flattening therefore results in a large number of inputs, and buffering such long time series also causes significant latency in real-time applications. In contrast, temporal population coding converts the input on the fly: not only is the input size reduced, but buffering is no longer required.
V-B Computation Overhead
To evaluate the computation overhead of the proposed coding method and event-driven inference algorithm, we build an SNN to classify the Articulary Word Recognition dataset. A vanilla 2-layer stacked LSTM and an RNN with 300 units are also built as references. The network structure, number of parameters, and accuracy are shown in table II. Our network achieves comparable accuracy with only 11% of the LSTM's parameters. In addition, since the length of this time series is 144, the LSTM and RNN have to perform computation step by step, which introduces a significant number of operations. Our model benefits from its event-driven nature: computation is only necessary when there are spike events. Given that the average number of input spikes is 518.7, the computation overhead is minimal compared with the LSTM/RNN.
TABLE II (columns): Model, Network structure, Parameter number, Accuracy.
V-C Time Series Classification
Our algorithm is evaluated on 7 multivariate time series datasets. We build a network of size 500-500-500-X, where X denotes the size of the last layer, which varies with the number of classes in each dataset. The Adam optimizer is used with a learning rate of 0.0001. The results are shown in table III. ED and DTW refer to 1-Nearest Neighbor with Euclidean Distance and with Dynamic Time Warping, respectively. TapNet is a DNN-based approach for time series classification. No SNN accuracy is listed because, to the best of our knowledge, there is no previous work comprehensively focusing on time series classification with SNNs.
On all 7 datasets, our method outperforms the 1-Nearest Neighbor classifier, a standard classifier for time series classification. On the Spoken Arabic Digits and Racket Sports datasets, our method achieves higher accuracy than the DNN-based approach. On the Articulary Word Recognition and Heart Beat datasets, TapNet achieves better accuracy, but its advantage is marginal: 0.987 vs. 0.98 and 0.751 vs. 0.72, respectively.
VI Conclusion
In this work, we presented an iterative SNN model and training algorithm for spatial temporal spike pattern classification. A coding method to convert continuous time series to discrete spikes is also proposed. Our coding method represents information with sparse spike patterns, so the computation overhead can be significantly reduced. We evaluated our algorithm and coding method on various multivariate time series datasets, outperforming the standard 1-Nearest Neighbor classifiers and showing competitive performance with DNN-based approaches.
-  (2018) The UEA multivariate time series classification archive, 2018. arXiv preprint arXiv:1811.00075. Cited by: §V-A, §V-C, TABLE III.
-  (2007) Simulation of networks of spiking neurons: a review of tools and strategies. Journal of computational neuroscience 23 (3), pp. 349–398. Cited by: §III.
-  (2007) Temporal precision in the neural code and the timescales of natural vision. Nature 449 (7158), pp. 92. Cited by: §I.
-  (2015) Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing. In 2015 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. Cited by: §I.
-  (2004) Neural engineering: computation, representation, and dynamics in neurobiological systems. MIT press. Cited by: §III.
-  (2015) Backpropagation for energy-efficient neuromorphic computing. In Advances in Neural Information Processing Systems, pp. 1117–1125. Cited by: §I.
-  (2019) An event-driven neuromorphic system with biologically plausible temporal dynamics. In 38th IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2019, pp. 8942083. Cited by: §III, §III.
-  (2020) Exploiting neuron and synapse filter dynamics in spatial temporal learning of deep spiking neural network. arXiv preprint arXiv:2003.02944. Cited by: §I.
-  (2019) A general framework to map neural networks onto neuromorphic processor. In 20th International Symposium on Quality Electronic Design (ISQED), pp. 20–25. Cited by: §I.
-  (2002) Spiking neuron models: single neurons, populations, plasticity. Cambridge university press. Cited by: §II, §II.
-  (2019) STCA: spatio-temporal credit assignment with delayed feedback in deep spiking neural networks. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, pp. 1366–1372. Cited by: §I.
-  (2006) The tempotron: a neuron that learns spike timing–based decisions. Nature neuroscience 9 (3), pp. 420. Cited by: §I, §II.
-  (2016) Spiking neurons can discover predictive features by aggregate-label learning. Science 351 (6277), pp. aab4113. Cited by: §I, §II.
-  (2018) Gradient descent for spiking neural networks. In Advances in Neural Information Processing Systems, pp. 1433–1443. Cited by: §I.
-  (2016) Training deep spiking neural networks using backpropagation. Frontiers in neuroscience 10, pp. 508. Cited by: §I.
-  (2017) MT-Spike: a multilayer time-based spiking neuromorphic architecture with temporal error backpropagation. In Proceedings of the 36th International Conference on Computer-Aided Design, pp. 450–457. Cited by: §I, §I.
-  (2020) Tiny but accurate: a pruned, quantized and optimized memristor crossbar framework for ultra efficient dnn implementation. In 2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC), Vol. , pp. 301–306. Cited by: §I.
-  (2014) A million spiking-neuron integrated circuit with a scalable communication network and interface. Science 345 (6197), pp. 668–673. Cited by: §I.
-  (2012) SPAN: spike pattern association neuron for learning spatio-temporal spike patterns. International journal of neural systems 22 (04), pp. 1250012. Cited by: §I.
-  (2019) Surrogate gradient learning in spiking neural networks. arXiv preprint arXiv:1901.09948. Cited by: §I, §IV.
-  (1999) Exact digital simulation of time-invariant linear systems with applications to neuronal modeling. Biological cybernetics 81 (5-6), pp. 381–402. Cited by: §III.
-  (2017) Multivariate time series classification with WEASEL+MUSE. arXiv preprint arXiv:1711.11343. Cited by: TABLE III.
-  (2017) A spike-based long short-term memory on a neurosynaptic processor. In 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 631–637. Cited by: §I.
-  (2019) Approximating back-propagation for a biologically plausible local learning rule in spiking neural networks. In Proceedings of the International Conference on Neuromorphic Systems, pp. 10. Cited by: §I.
-  (2018) SLAYER: spike layer error reassignment in time. In Advances in Neural Information Processing Systems, pp. 1412–1421. Cited by: §I, §I.
-  (2018) Spatio-temporal backpropagation for training high-performance spiking neural networks. Frontiers in neuroscience 12. Cited by: §I, §IV.
-  (2015) The influence of population size, noise strength and behavioral task on best-encoded stimulus for neurons with unimodal or monotonic tuning curves. Frontiers in computational neuroscience 9, pp. 18. Cited by: §III.
-  (2018) Spike timing or rate? neurons learn to make decisions for both through threshold-driven plasticity. IEEE transactions on cybernetics 49 (6), pp. 2178–2189. Cited by: §II.
-  (2019) An ultra-efficient memristor-based dnn framework with structured weight pruning and quantization using admm. In 2019 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED), Vol. , pp. 1–6. Cited by: §I.
-  (2017) Memristor crossbar-based ultra-efficient next-generation baseband processors. In 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), pp. 1121–1124. Cited by: §I.
-  (2018) SuperSpike: supervised learning in multilayer spiking neural networks. Neural computation 30 (6), pp. 1514–1541. Cited by: §I.
-  (2020) TapNet: multivariate time series classification with attentional prototypical network. Cited by: §V-C, TABLE III.