ihmehimmeli
None
view repo
The timing of individual neuronal spikes is essential for biological brains to make fast responses to sensory stimuli. However, conventional artificial neural networks lack the intrinsic temporal coding ability present in biological networks. We propose a spiking neural network model that encodes information in the relative timing of individual neuron spikes. In classification tasks, the output of the network is indicated by the first neuron to spike in the output layer. This temporal coding scheme allows the supervised training of the network with backpropagation, using locally exact derivatives of the postsynaptic spike times with respect to presynaptic spike times. The network operates using a biologically-plausible alpha synaptic transfer function. Additionally, we use trainable synchronisation pulses that provide bias, add flexibility during training and exploit the decay part of the alpha function. We show that such networks can be trained successfully on noisy Boolean logic tasks and on the MNIST dataset encoded in time. The results show that the spiking neural network outperforms comparable spiking models on MNIST and achieves similar quality to fully connected conventional networks with the same architecture. We also find that the spiking network spontaneously discovers two operating regimes, mirroring the accuracy-speed trade-off observed in human decision-making: a slow regime, where a decision is taken after all hidden neurons have spiked and the accuracy is very high, and a fast regime, where a decision is taken very fast but the accuracy is lower. These results demonstrate the computational power of spiking networks with biological characteristics that encode information in the timing of individual neurons. By studying temporal coding in spiking networks, we aim to create building blocks towards energy-efficient and more complex biologically-inspired neural architectures.
READ FULL TEXT VIEW PDFNone
More biologically plausible computation
Inspired by the biology of the nervous system, artificial neural networks have recently been used to achieve resounding success in solving many real-world problems, sometimes surpassing the performance of humans [1, 2, 3]. However, conventional artificial networks lack the intrinsic ability to encode information using temporal coding schemes in the same way as biological brains do. In the nervous system, the timing of individual neuronal spikes is fundamental during behaviours requiring rapid encoding and processing of perceptual stimuli. The relative firing time of neurons has been shown to encode stimulus information in the visual [4], auditory [5] and tactile [6] cortices, but also in higher-level neural structures like the thalamus [7] and hippocampus [8]. Moreover, by observing in-vivo the low latency of responses to visual stimuli in the temporal cortex, it can be concluded that the response is produced by individual spikes occurring at every synaptic stage across the visual areas of the brain [9].
Architectures for atemporal networks that emulate the processing of information in a temporal fashion using memory mechanisms have been proposed [10]. However, these do not have the advantages conferred by encoding information in temporal domain. One disadvantage is that every neuron needs to wait for the activation of all the neurons in the previous layer in order to produce an answer. Compared to biological brains, this is inefficient in terms of energy consumption. Moreover, information in the real world almost always has a temporal dimension. Therefore, additional processing is needed for encoding it in an atemporal network, potentially losing information in the process.
In contrast with the field of conventional artificial neural networks, the field of artificial spiking networks that encode information in time has not been thoroughly explored. So far, spiking networks have only achieved modest results. The main difficulty in advancing the field of spiking networks has been their training process, as the techniques usually used for supervised learning in atemporal networks cannot be directly applied when information is encoded in a temporal sequence of asynchronous events. Atemporal networks owe much of their success to the development of the backpropagation algorithm
[11, 12]. Backpropagation exploits the existence of an end-to-end differentiable relationship between a loss function and the network inputs and outputs, which can be expressed in terms of local derivatives at all hidden network parameters. One can thus find local updates that minimise the loss function and apply them as incremental updates to train the network. On the other hand, in spiking networks, which encode information in sequences of binary spike events, differentiable relationships do not naturally exist.
The problem of training in spiking networks has been addressed in several ways. Many spiking models have adopted a rate coding scheme. In contrast with temporal coding of information, which is based on individual spike timing, rate coding averages over multiple spikes. Approximate gradients have been proposed in order to allow gradient descent optimization in such networks [13, 14, 15]. Various other learning rules have been proposed with the objective of producing custom spiking patterns or spike distributions [16, 17, 18, 19, 20, 21, 22, 23, 24, 25]. Alternatively, atemporal deep neural networks can be trained and then converted to spiking networks using rate-coding schemes [26, 27, 28, 29]
. Spiking networks can also be trained using methods such as evolutionary algorithms
[30][31]. However, such rate-coding schemes may still be redundant considering the evidence that biological systems can react to stimuli on the basis of single spikes [9, 4, 6].In contrast with rate-coding models, here we are interested in the temporal encoding of information into single spikes. Crucially, this change in coding scheme allows us to shift the differentiable relationships into temporal domain. To find a backpropagation scheme for temporal coding, we need a differentiable relationship of the time of a postsynaptic spike with respect to the weights and times of the presynaptic spikes. Encoding information in temporal domain makes this possible.
A similar idea has already been proposed by the SpikeProp model [32]. In this model, a spiking network was able to solve a supervised XOR problem encoded in temporal domain using neurons that generate single spikes. The model was able to implement backpropagation by approximating the differentiable relationship for small learning rates. It was also necessary to explicitly encode inhibitory and excitatory neurons in order for training to converge. Various extensions to SpikeProp have been proposed [33, 34, 35, 36]. Recently, Mostafa [37] trained a spiking network to solve temporally encoded XOR and MNIST problems by deriving locally exact gradients with non-leaky spiking neurons.
One important choice when modelling a spiking neural network is that of the synaptic transfer function, which defines the dynamics of the neuron membrane potential in response to a presynaptic spike. From a machine learning perspective, this is equivalent to the activation function in conventional networks, but, importantly, it operates in time. Historically, the Hodgkin-Huxley model
[38] was the first to offer a detailed description of the process of neuronal spiking using differential equations describing the dynamics of sodium and potassium ionic channels. In practice, however, this model is often needlessly complex. Instead, many applications use the reduced Spike Response Model (SRM) [39]. In this model, the membrane potential is described by the integration of kernels reflecting the incoming currents at the synapse. The neuron spikes when its membrane potential reaches a threshold, then enters a refractory period. Provided that an appropriate kernel function is used
[40], the SRM can approximate the dynamics of the Hodgkin-Huxley model to a satisfactory extent, while demanding less computational power and being easier to analyse. Integrate-and-fire neurons and their leaky counterparts are examples of SRM. A commonly used SRM impulse response function is exponential decay, .A more biologically-realistic but less explored alternative is the alpha function, of the form , produced by the integration of exponentially decaying kernels. In contrast with the single exponential function, the alpha function gradually rises before slowly decaying (Figure 0(a)), which allows more intricate interactions between presynaptic inputs. The alpha function has been proposed [41] as a close match to the shape of postsynaptic potentials measured in vitro [42, 43]. It hence provides a biologically-plausible model for exploring the problem-solving abilities of spiking networks with temporal coding schemes. As we show below, it is possible to derive exact gradients with respect to spike times using this model.
With these considerations in mind, here we propose a spiking network model that uses the alpha function for synaptic transfer and encodes information in relative spike times. The network is fully trained in temporal domain using exact gradients over domains where relative spiking order is preserved. To our knowledge, this model has not been previously used for supervised learning with backpropagation. We explore the capacity of this model to learn to solve standard benchmark problems, such as Boolean logic gates and MNIST, encoded in individual spike times. To facilitate transformations of the class boundaries, we use synchronisation pulses, which are neurons that send spikes at input-independent, learned times. The model is easily able to solve temporally-encoded Boolean logic and other benchmark problems. We perform a search for the best set of hyperparameters for this model using evolutionary-neural hybrid agents
[44] in order to solve temporally-encoded MNIST. The result improves the state-of-the-art accuracy on non-convolutional spiking networks and is comparable to the performance of atemporal non-convolutional networks. Furthermore, we analyse the behaviour of the spiking network during training and show that it spontaneously displays two operational regimes that reflect a trade-off between speed and accuracy: a slow regime that is slow but very accurate, and a fast regime that is slightly less accurate but makes decisions much faster.With this work, we also aim to increase the familiarity of the research community with the concept of temporal coding in spiking neural networks. Given that biological brains have evolved for millions of years to use temporal coding mechanisms in order to process information efficiently, we expect that an equivalent development will also be a key step in advancing artificial intelligence in the future. The current study is an early building block in this direction that invites further exploration of more complex recurrent architectures, spike-based state machines and interfacing between artificial and biological spiking neural networks.
In this model, information is encoded in the relative timing of individual spikes. The input features are encoded in temporal domain as the spike times of individual input neurons, with each neuron corresponding to a distinct feature. More salient information about a feature is encoded as an earlier spike in the corresponding neuron. Information propagates through the network in a temporal fashion. Each hidden and output neuron spikes when its membrane potential rises above a fixed threshold. Similarly to the input layer, the output layer of the network encodes a result in the relative timing of output spikes. In other words, the computational process consists of producing a temporal sequence of spikes across the network in a particular order, with the result encoded in the ordering of spikes in the output layer.
We use this model to solve standard classification problems. Given a classification problem with inputs and possible classes, the inputs are encoded as the spike times of individual neurons in the input layer and the result is encoded as the index of the neuron that spikes first among the neurons in the output layer. An example drawn from class
is classified correctly if and only if the
output neuron is the first to spike. An earlier output spike can reflect more confidence of the network in classifying a particular example, as it implies more synaptic efficiency or a smaller number of presynaptic spikes. In a biological setting, the winning neuron could suppress the activity of neighbouring neurons through lateral inhibition [45], while in a machine learning setting the spike times of the non-winning neurons can be useful in indicating alternatives predictions of the network. The learning process aims to change the synaptic weights and thus the spike timings in such a way that the target order of spikes is produced.The neuronal membrane dynamics are governed by a SRM model with alpha function of synaptic transfer [39, 40, 41]. This is obtained by integrating over time the incoming exponential synaptic current kernels of the form , where is the decay constant. The potential of the neuronal membrane in response to a single incoming spike is then of the form . This function has a gradual rise and a slow decay, peaking at . Every synaptic connection has an efficiency, or a weight. The decay rate has the effect of scaling the induced potential in amplitude and time, while the weight of the synapse has the effect of scaling the amplitude only (Figure 0(a)).
Given a set of presynaptic inputs arriving at times with weights and assuming the postsynaptic neuron has not yet spiked, its membrane potential at time is given by:
(1) |
The neuron spikes when the membrane potential crosses the firing threshold (Figure 0(b)). To compute the spike time of a neuron, we determine the minimal subset of all presynaptic inputs with which cause the membrane potential to reach the threshold while rising:
(2) |
Note that the set may not simply be computed as the earliest subset of presynaptic inputs that cause the membrane voltage to cross . If a subset of inputs causes the membrane to cross at time , any additional inputs that occur between the maximum and must be considered, and must be recomputed.
Eq. 2 has two potential solutions — one on the rising part of the function and one on the decaying part. If a solution exists (in other words, if the neuron spikes), then its spike time is the earlier of the two solutions.
For a set of inputs that cause the neuron to spike, we denote and and compute the spike time by solving Eq. 2 using the Lambert function [46, 47]:
(3) |
As we are interested in the earlier solution of this equation, we employ the main branch of the Lambert function. The Lambert function is only real-valued when its argument is larger than or equal to . It can be proven that this is always the case when Eq. 2 has a solution, by expanding , where is the peak of the membrane potential function corresponding to the presynaptic set of inputs .
In Appendix A, we show that the proposed model is powerful enough to represent any sufficiently well-behaved function.
The spiking network learns to solve problems whose inputs and solution are encoded in the times of individual input and output spikes. Therefore, the goal is to adjust the output spike times so that their relative order is correct. Given a classification problem with
classes, the neuron corresponding to the correct label should be the earliest to spike. We therefore choose a loss function that minimises the spike time of the target neuron and maximises the spike time of the non-target neurons. Note that this is the opposite of the usual classification setting involving probabilities, where the value corresponding to the correct class is maximised and those corresponding to incorrect classes are minimised. To achieve this, we use the softmax function on the
negative values of the spike times (which are always positive) in the output layer:(4) |
We then employ the cross-entropy loss in the usual form: , where
is an element of the one-hot encoded target vector of output spike times. Taking the negative values of the spike times ensures that minimising the cross-entropy loss minimises the spike time of the correct label and maximises the rest.
We minimise the cross-entropy loss by changing the value of the weights across the network. This has the effect of delaying or advancing spike times across the network. For any presynaptic spike arriving at time with weight , we denote and compute the exact derivative of the postsynaptic spike time with respect to any presynaptic spike time and its weight as:
(5) |
(6) |
Note that as the postsynaptic spike time moves earlier or later in time, if a presynaptic spike no longer matters (because it arrives too late) or a new presynaptic spike occurs early enough to matter, the loss function landscape also changes. Therefore, the differential relationships above only hold in local neighbourhoods where does not change, i.e. for small learning steps. Making the loss function exact would require considering all possible changes to , which is computationally infeasible. Nevertheless, we find that in practice it is possible to optimize the loss function using the above gradients only, with the network being able to recover from the reordering of spikes.
To be able to adjust the class boundaries in temporal domain, a temporal form of bias is also needed to adjust spike times, i.e. to delay or advance them in time. In this model, we introduce synchronisation pulses acting like additional inputs across every layer or the network, in order to provide temporal bias across the network. These can be thought of as similar to internally-generated rhythmic activity in biological networks, such as alpha waves in the visual cortex [48] or theta and gamma waves in the hippocampus [49].
A set of pulses can be connected to all neurons in the network, to neurons within individual layers, or to individual neurons. A per-neuron bias is biologically implausible and more computationally demanding, hence in this model we use either a single set of pulses per network, to solve easier problems, or a set of pulses per layer, to solve more difficult problems. All pulses are fully connected to either all non-input neurons in the network or to all neurons of the non-input layer they are assigned to.
Each pulse spikes at a predefined and trainable time, providing a reference spike delay. Each set of pulses is initialised to spike at times evenly distributed in the interval . Subsequently, the spike time of each pulse is learned using Eq. 5, while the weights between pulses and neurons are trained using Eq. 6, in the same way as all other weights in the network.
We trained fully connected feedforward networks with topology n_hidden (a vector of hidden layer sizes). We used Adam optimization [50] with mini-batches of size batch_size
to minimise the cross-entropy loss. The Adam optimizer performed better than stochastic gradient descent. We used different learning rates for the pulse spike time (
learning_rate_pulses) and the weights of both pulse and non-pulse neurons (learning_rate). We used a fixed firing threshold (fire_threshold) and decay constant (decay_constant).Network weight initialisation is crucial for the subsequent training of the network. In a spiking network, it is important that the initial weights are large enough to cause at least some of the neurons to spike; in absence of spike events, there will be no gradient to use for learning. We therefore used a modified form of Glorot initialization [51]
where the weights are drawn from a normal distribution with standard deviation
(as in the original scheme) and custom mean . If the multiplication factor of the mean is , this is the same is the original Glorot initialization scheme. We set different multiplication factors for pulse weights (pulse_initialisation_multiplier), and non-pulse weights (nonpulse_initialisation_multiplier). This allows the two types of neurons to pre-specialise into inhibitory and excitatory roles. In biological brains, internal oscillations are thought to be generated through inhibitory activities that regulate the excitatory effects of incoming stimuli [52, 53].Despite careful initialisation, the network might still become quiescent during training. We prevent this problem by adding a fixed small penalty (penalty_no_spike) to the derivative of all presynaptic weights of a neuron that has not fired. In practice, after the training phase, some of the neurons will spike too late to matter in the classification and thus they do need to spike at all.
Another problem is that the gradients become very large as a spike becomes closer to, but not sufficient for the postsynaptic neuron to reach the firing threshold. In this case, in Eq. 4 and 5, the value of the Lambert function will approach its minimum () as its argument approaches , the denominator of the derivatives will approach zero and the derivatives will approach infinity. To counter this, we clip the derivatives to a fixed value clip_derivative. Note that this behaviour will occur in any activation function that has a maximum (hence, a biologically-plausible shape), is differentiable, and has a continuous derivative.
In addition to these hyperparameters, we explored several other heuristics for the spiking net. These included weight decay, adding random noise during training to the spike times of either the inputs or all non-output neurons in the network, averaging over brightness values in a convolutional-like manner and adding additional input neurons responding to the inverted version of the image, akin to the on/off bipolar cells in the retina. Additionally, to improve the computation time of the network, we tried removing presynaptic neurons from the presynaptic set once their individual contribution to the potential decayed below a decay threshold. This can be achieved by solving an equation similar to Eq.
2 for reaching a decay threshold on the decaying part of the function, using the branch of the Lambert function. None of these techniques improved our results, so we did not include them in the Results.
Parameter
Default value
(Boolean tasks)
Search range
Chosen value
(MNIST)
batch_size
^{*}
clip_derivative
batch_constant ()
fire_threshold ()
learning_rate
^{*}
learning_rate_pulses
^{*}
n_hidden
^{*}
n_pulses
nonpulse_weight_mean_multiplier
penalty_no_spike
pulse_weight_mean_multiplier
* - logarithmic search space
We first tested the problem on noisy Boolean logic problems: AND, OR, XOR. For each example, we encoded the two inputs as the individual spike times of the two input neurons. All spikes occurred at times between and
. We drew True and False values from uniform distributions between
and , respectively. The result is encoded as the first neuron to spike in the output layer. We used a single synchronisation pulse connected to all non-input neurons of the network.We also solved a concentric circles problem, where the value represents a 2D coordinate uniformly drawn from either an inner circle with a radius of or an outer circle with an inner radius of and an outer radius of .
We solved the classic MNIST benchmark by encoding the raw pixel brightness values in the temporal delay of the neurons of the input layer (Figure 2). All values were encoded as spikes occurring in the interval . Darker pixels were encoded as earlier spikes compared to brighter pixels, as they represent more salient information. The scale of the brightness-to-temporal encoding was linear. Input neurons corresponding to white pixels did not spike.
The output of the network was encoded as the first neuron to spike among the neurons in the output layer. We used individual sets of pulses with fixed size connected to each individual layer, in order to allow for deeper architectures where the spike times of different layers might become considerably different.
To find the optimal parameters for solving MNIST, we performed a hyperparameter search using evo-lutionary-neural hybrid agents [44]. This technique combines deep reinforcement learning with the method of evolutionary agents and has been shown to outperform both approaches in architecture search for a variety of classification tasks. It consists of an evolutionary agent encoding an architecture whose mutations are guided by a neural network trained with reinforcement learning. We used Google Vizier as infrastructure for the hyperparameter search [54].
As detailed in Table 2.5, the hyperparameter search includes the decay constant of the neurons, whereas the spike times of the inputs are fixed. In this work, we make no claims about the biological meaning of the temporal scales themselves, in particular the spike times of the encoded problem relatively to the decay constant of the model. Rather, we encode well-known problems in time in order to show the general capability of spiking models with temporal coding to solve non-linear, complex problems.
We ran the hyperparameter search with parameter ranges detailed in Table 2.5. Each trial trained a network on a set of parameters suggested by the hyperparameter search agent within the ranges specified in Table 2.5. The MNIST training examples were randomly split into a training set () and a validation set (
). The final reported objective to be maximised by the search agent was the accuracy on the validation set after 100 epochs. The search was run for
trials. We then chose the hyperparameters of the model as the set that produced the best objective value during the search. These values are given in Table 2.5.Finally, we used this set of hyperparameters to train networks for epochs. This time we trained on the whole MNIST training data with no validation set and tested on the MNIST test data.
We explored the representations learned by the spiking network using the family of “deep dream” methods as commonly applied in atemporal neural networks
[55, 56, 57]. Given a network with fixed weights and pulses, a blank input image can be gradually adjusted in order to minimise the spike time of a particular neuron. Optionally, at the same time, the spike times of the other neurons in the same layer can be maximised. To do this, a one-hot encoded vector is set as the target for a non-input layer of the network and the derivative with respect to the spike time (Eq. 5) is used to backpropagate errors to the input layer. Then, the same procedure is repeated with the adjusted image, until the output is close enough to the target.We performed feature visualisation on the network that performed best. We started from a blank image with each pixel initialised to spike at , though we confirmed that starting from a random image is also feasible for this procedure. We gradually adjusted the input image once per epoch with a learning rate of , until the classification produced the correct target class for consecutive epochs. As an additional constraint, we only allowed non-negative spike times.
We first trained small networks containing hidden neurons and synchronisation pulse to solve 2-di-mensional Boolean logic problems encoded in spike times. The networks were trained using the default parameters from Table 2.5 for a maximum of epochs on training examples. They were tested on randomly generated test examples from the same distribution. We found that we achieved faster training for these problems when Adam updates were computed and applied only on examples that were classified wrong. In practice, this method has the advantage of discouraging overfitting and reducing the number of operations required for training.
We were able to train such small spiking networks with accuracy on all four problems. Figure 3 exemplifies class boundaries for such networks. All trained networks in these examples had only positive weights.
Figure 4 shows the dynamics of the membrane potential of every individual neuron during the classification of a typical XOR example. In this particular case, all neurons spike. With larger spacing between inputs, one of the hidden neurons no longer needs to spike. Given the limited range of inputs with respect to the decay constant, the classification occurs before any significant decay has occurred. However, the network can also be successfully trained with scaled inputs or decay constant so that the time to reach the alpha function peak is exceeded during the classification, in which case the information will also propagate during the decay phase (see Figure 0(a)).
The purpose of solving a set of simple problems was to demonstrate that spiking networks of a small size can be trained without hyperparameter tuning, and hence we did not focus on performance metrics for these problems.
We found that the best out of the three networks obtained as described in the Methods section reached a maximum accuracy of and on the MNIST train and test sets, respectively (Table 3.2). The hyperparameters used to achieve this result are given in Table 2.5. The hyperparameter values are given in the table up to -decimal precision, however in practice an amount of imprecision can be tolerated. In general, the best candidate sets of hyperparameters had a low batch size (up to ), a small decay rate (between and ), and a non-pulse learning rate usually under , whereas the other hyperparameters had comparatively more varied values.
Slow regime Fast regime Training accuracy (%) Training error (mean) Test accuracy (%) Test error (mean)
This result improves on recent spiking models published recently (but not trained with conversion from atemporal networks), as detailed in the Discussion. For comparison, we also trained a fully-connected conventional multilayer perceptron with ReLU neurons with the same architecture as the spiking network (
–– neurons) on MNIST. We used the same training setting as for the spiking network: Glorot initialisation, Adam optimizer (learning rate ), the same train/test split, the same number of epochs. We tested batch sizes , and . For each batch size, we ran the training three times. During the nine runs, we obtained a maximal accuracy of on the test set. Although it can be claimed that various techniques can improve the performance of conventional neural networks (as it also may well be the case with the spiking model presented here), we aim to show that, generally-speaking, the performances of the two types of networks are comparable.During the training process, we observed the same spiking network spontaneously learning two different operating modes for the classification of inputs: a slow regime and a fast regime (Figure 5). In the slow regime, the network spikes in a typical feedforward manner, with the neurons in the output layer usually waiting for all hidden neurons and pulses to spike before spiking themselves. The best accuracy is reported during the slow regime. On the other hand, in the fast regime, the network makes very fast decisions, with the first spike in the output layer occurring before the mean spike in the hidden layer. The transitions between the two regimes is sudden.
Investigating this transition, we found that the input layer pulses had an oscillatory behaviour during training (likely due to a high learning rate pre-set using hyperparameter search) and the transition occurred when these pulses synchronised and simultaneously reached a minimal spike time (0.0). Although they then recovered to larger non-synchronous spike times, this drove the hidden layer pulses to spike considerably earlier (Figure 6). We observed the same transition from a slow to a fast regime occurring in one of the other two trained networks, whereas the third network went directly into a fast regime. With this set of hyperparameters, in particular the relatively high learning rate for pulses leading to oscillatory changes across training epochs, it is likely that the network has a preference for settling into the fast regime.
In the best-performing slow and fast networks, we investigated the individual membrane potentials of the output neurons during the classification of individual examples (Figure 7). We found that all output neurons were first inhibited before producing a spike, even in the case of the fast network. Moreover, in both networks, the winning neuron spiked before most or all of the hidden layer pulses spiked. This suggests that one role of the pulses was to produce late spikes where neurons would not have normally spiked, and thus to allow the flow of gradients from non-target neurons during training.
The optimal value for the decay constant () chosen by the hyperparameter search means that the alpha function has its maximum at . The input spikes were prefixed to occur between and . On the other hand, the pulses learned to produce spikes at much later times (Figure 6). This means that the decaying part of the function was exploited indeed, as the output neurons generally spiked only after a subset of the pulses.
Finally, we investigated whether it was possible to reconstruct target digits using the “deep dream” method. It was indeed possible to produce recognizable digits in this manner (Figure 8).
In this paper, we proposed a spiking neural network with biologically-plausible alpha synaptic function that encodes information in the relative timing of individual spikes. The activation function is differentiable with respect to time when the presynaptic set is fixed, thus making the network trainable using gradient descent. A set of synchronisation pulses is used to provide bias and extra degrees of freedom during training. We showed that the network is able to successfully solve benchmark problems encoded in time, such as noisy Boolean logic problems and MNIST. Our open-source code and network is openly available at
[58].Our model improves upon existing spiking network models, in particular those from the SpikeProp family, in multiple ways. First, we use a biologically-plausible alpha synaptic function. The alpha function includes a decay component that allows earlier inputs to be forgotten when a neuron has not spiked, which can help in correcting potentially false alarms. We also use a continuous (as opposed to a discretised) scheme where the spike time of an input neuron is proportional to the value of a feature, such as the brightness of a pixel. Finally, we use trainable synchronisation pulses connected to individual layers of the network, which provide temporal bias and allow extra flexibility during training,
Our results on the MNIST benchmark compare favourably with other recent models from literature, including similar models with larger or deeper architectures. Recent works on non-convolutional spiking network whose benchmark results we improve include [37, 59, 28, 60, 61, 62]. Recently, performant spiking networks have been obtained by conversion from atemporal neural networks [63], however we are not aware of such networks performing computations by encoding information in the timing of individual spikes.
The main challenge posed by this type of model is the discontinuous nature of the loss function with respect to the parameters of the network at the points where the order of pre- and postsynaptic spikes changes. In general, discontinuities also appear in other deep learning models, such as the derivative of the ReLU activation function in atemporal networks. We found that the training process is able to overcome these discontinuities.
Another practical challenge is the computational complexity of the activation function and the training process. In conventional atemporal networks, the feedforward pass can be performed as a sequence of matrix multiplications, which can be efficiently parallelised on the GPU or other specialised hardware. In spiking networks, the feedforward pass cannot be parallelised in the same efficient fashion, leading to slower training times. In addition, solving the threshold crossing equation in the spiking network requires the computation of exponentials and solving the Lambert function, which is relatively expensive. The networks presented in this work are trainable on the order of a few seconds to a few hours.
Despite challenges, there are two prominent goals motivating progress in the field of spiking neural networks. On the one hand, spiking neural networks are interesting as computational models of biological processes. They can contribute to understanding low-level information processing occurring in the brain and help us in the search for neural correlates of cognition. On the other hand, they can be deployed in neuromorphic hardware [64, 65, 22] to implement rapid and energy-efficient computations.
From a computational biology perspective, the main contribution of this work is showing that it is possible to perform complex, nonlinear computations in networks with biologically-plausible activation functions that encode information in the timing of single spikes. The timing of neuronal spikes, as opposed to spike rates, must play an important role in biological brains during the fast processing of perceptual stimuli. Neurons involved in the processing of sensory stimuli, such as ganglion cells in the retina, fire with very high precision and convey more information through their timing then through their spike count [66]. The relative difference between the first spikes of different ganglion cells can encode the spatial structure of an image [4]. In the primary sensory neurons in human fingertips, the relative timing of the first spikes encodes information about tactile pressure and texture [6]. Further constraints can be inferred from the response times of cortical neurons. In macaque brains, neurons in the temporal area can respond to a visual stimulus within 100ms; given that the anatomy of the visual cortex indicates that at least 10 synaptic stages must be passed to reach the temporal cortex, and given the conduction speed of intracortical fibers and the distance between the brain areas involved, it must be that the response is generated by single, or at most double, spikes in individual neurons [9]. It must therefore be that the temporal encoding of stimuli into single spikes fired by individual neurons are essential in rapid stimulus processing.
More generally, relative spike times are reflected in phase synchronisation between neuronal populations, which can be observed both in individual-neuron studies [5] and in macroscale recording of brain activity [67]. Such synchronisation has been shown to be reliable across different neural stages [5, 68, 7, 8], particularly in dynamic contexts [69]. The relative encoding of information in the precise timing of spikes produced by single neurons is thought to represent a solution to the binding problem — the ability of the brain to coordinate over features highly distributed neural representations, such as associating a colour and a shape with a particular object in the environment [70, 71].
We do not claim that temporal coding is solely responsible for all of the complex cognitive processes performed by biological brains, but rather aim to draw attention that this type of encoding can be an efficient and computationally powerful alternative to rate coding. A reasonable hypothesis is that the two encoding methods are complementary in the brain. A temporal code can be transformed into a rate codes [72] and vice versa [73].
Another point that deserves discussion is backpropagation in biological networks. Our investigation was concerned with the representational capacity, as opposed to the learning mechanism of the network. We therefore used backpropagation to teach the spiking network. The idea of backpropagation in neural networks has for a long time been considered implausible [74], due to requirements such as the existence of symmetrical connections between neuronal layers. However, recent works have begun to challenge this viewpoint and have made promising attempts to understand how backpropagation-like mechanisms might work in a biological network [75, 76, 77, 78, 79]. Notably, it has been shown that random feedback connections are able to support learning with backpropagation [78]. The main learning mechanism observed in biological neurons is spike-time dependent plasticity [80], which can theoretically implement weight updates as required by backpropagation [81, 79]. This topic is nevertheless still an open question. Neuroscience is still uncovering the role of neural elements previously thought to have passive roles in the nervous system, which in fact appear to actively participate in synaptic regulation; they might also have significant roles for learning in the brain [82, 83].
To conclude, here we showed that a spiking neural network model with biologically-plausible characteristics is able to solve standard machine learning benchmark problems using temporal coding. These results invite further exploration into the possibility of using such networks to generate behaviours resembling those of biological organisms, as well as investigations into the computational capabilities of spiking neural networks with more complex and recurrent architectures.
On a more speculative note, we expect that further advances in spiking networks with temporal coding will open exciting new possibilities for artificial intelligence, in the same way that it has for biological brains. We envision that neural spike-based state machines will offer a natural solution for the efficient modelling and processing of real-world analogue signals. It will be possible to integrate artificial spiking networks with biological neural networks and create interfaces between the two. These advances will come with significant computing energy savings. By observing and testing models such as the one presented in this study, we aim to increase the familiarity of the research community with the temporal coding paradigm and to create building blocks towards recurrent and state-based architectures for neural computing.
A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,”
Communications of the ACM, vol. 60, pp. 84–90, may 2017.2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
, pp. 1–9, IEEE, jun 2015.S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,”
Neural Computation, vol. 9, pp. 1735–1780, nov 1997.P. U. Diehl, G. Zarrella, A. Cassidy, B. U. Pedroni, and E. Neftci, “Conversion of artificial recurrent neural networks to spiking neural networks for low-power neuromorphic hardware,” in
IEEE International Conference on Rebooting Computing (ICRC), pp. 1–8, IEEE, oct 2016.Here we prove that the proposed spiking network model is an universal approximator for sufficiently well-behaved functions.
Recall that the maximum voltage of a spike of positive weight is obtained units of time after the spike itself, reaching a value of .
For every , here is a spiking network with inputs, each of them constrained to the range, and a single output that produces a spike in the interval , with if and only if each of the inputs belongs to a specified interval. Otherwise, no spike is produced. Moreover, it is possible to build one such network by using neurons.
We will first prove:
We can use one neuron and three auxiliary pulse inputs to conditionally produce a spike in the time interval if and only if the input spike happens at a time that is smaller than (or bigger than) a given constant , provided that .
Let us consider the function:
This describes the potential of a neuron that receives one spike at time with positive weight , and one at time with negative weight ; note that this formula is valid for . The sign of this function is determined fully by the term . Note that .
If , then , thus . As for , it follows that if then .
If , we have . As , it follows that if then .
Thus, we can consider this setup:
an input spike at time with positive weight ,
a fixed spike at time with negative weight ,
a fixed spike at time with weight ,
a fixed spike at the time when the spike starting at would reach the firing threshold if no other spikes were present, with a big enough negative weight to ensure that that spike reaches but does not cross the threshold by itself.
If , then at time the potential of the neuron is positive (as proven above). Thus, the fixed spike at will cross the firing threshold and produce an output spike after a small delay (which can be made smaller than by suitably increasing ). On the other hand, if , the potential will be negative, and no output spike is produced.
In a similar way, we can have a configuration of neurons and weights that will produce a spike in the interval if and only if . ∎
By connecting of the configurations shown in Lemma 2 to a single neuron, having connections with weight for a fixed that depends on the desired , we obtain an output spike in the interval ) if and only if the input belongs to a specified product of intervals. This requires the output spikes of Lemma 2 to belong to an interval that is small enough to ensure that any configuration of input spikes will still reach a potential of . This is always possible if the chosen in Lemma 2 is small enough.
Note that in the configuration from Lemma 2 all the fixed input spikes can be shared, except for the one defining the threshold. Thus, this setup requires at most neurons. ∎
Let be a continuous function from to . Then , there is a spiking network with hidden layers and decay constant that computes a function satisfying
If is also Lipschitz with constant , then it is possible to realize such a network using at most neurons.
Since is continuous on a compact set, it is uniformly continuous. Thus, for any , there is a such that, in each subdivision of the domain in -dimensional boxes with extension along each coordinate axis being no more than , the difference between the maximum and the minimum of is at most . Moreover, if the function is Lipschitz with constant , we can choose .
By Lemma 1, for each such box there is a network that produces a spike in the time interval if and only if the input belongs to the box. Connecting the outputs of those networks to a single neuron with sufficiently high weights, we can ensure that it will produce a spike with a delay of at most after receiving one as an input.
Thus, as a whole the network will produce a spike for input with a time that is at most away from the value , proving the thesis.
∎
Note that the requirement for the function domain to be greater than some prefixed values is necessary: any network that uses temporal coding, takes two inputs , and produces as an output would violate causality.