SNU_Chainer
Spiking Neural Unit (SNU) implemented with Chainer
Neural networks have become the key technology of Artificial Intelligence (AI) that contributed to breakthroughs in several machine learning tasks, primarily owing to advances in Artificial Neural Networks (ANNs). Neural networks incorporating spiking neurons have held great promise because of their brain-inspired temporal dynamics and high power efficiency; however, scaling out and training deep Spiking Neural Networks (SNNs) have remained significant challenges, and SNNs still fall behind ANNs in terms of accuracy on many traditional learning tasks. In this paper, we propose an alternative perspective on the spiking neuron as a particular ANN construct called Spiking Neural Unit (SNU). Specifically, the SNU casts the stateful temporal spiking neuron, of a LIF type, as a recurrent ANN unit, analogous to LSTMs and GRUs. Moreover, by introducing the concept of proportional reset, we generalize the LIF dynamics to a variant called soft SNU (sSNU). The proposed family of units has a series of benefits. Firstly, the ANN training methods, such as backpropagation through time, naturally apply to them and enable a simple approach for successful training of deep spiking networks. For example, a 4- or 7-layer SNN trained on a temporal version of the MNIST dataset provides higher accuracy than RNNs, LSTM- and GRU-based networks of similar architecture. Secondly, for the task of polyphonic music prediction on the JSB dataset, an sSNU-based network surpasses the state-of-the-art performance of RNNs, LSTM- and GRU-based networks. The novel family of units introduced in this paper bridges the biologically-inspired SNNs with ANNs. It provides a systematic methodology for implementing and training deep networks incorporating spiking dynamics that achieve accuracies as high as, or higher than, those of state-of-the-art ANNs. Thus, it opens a new avenue for the widespread adoption of SNNs in practical applications.
The research interest in neural networks has considerably grown over the recent years owing to their remarkable success in many applications. Record accuracy was obtained in deep networks applied for image classification krizhevsky_imagenet_2012 ; szegedy_going_2015 ; he_deep_2016 , and new architectures enabled solving cognitively-challenging tasks, such as multiple object detection trained end-to-end redmon_you_2016 , pixel-level segmentation of images he_mask_2017-1 , or even the playing of computer games based on raw screen pixels mnih_human-level_2015 . Moreover, sequence-to-sequence models enabled language translation sutskever_sequence_2014 and the combination of convolutional layers with recurrent layers has led to language-independent end-to-end speech recognition amodei_deep_2016 . These architectures surpassed the performance of traditional domain-specific models, and established neural networks as the standard approach in the industry.
Although the term neural networks elicits associations with the sophisticated functioning of the brain, the advances in the field were obtained by extending the original simple ANN paradigm of the 1950s to complex deep neural networks trained with backpropagation. ANNs take only high-level inspiration from the structure of the brain, comprising neurons interconnected with synapses, which results in human-like performance, albeit at a much higher power budget than the 20 W required by the human brain. At the same time, the neuroscientific community, whose focus is understanding the brain dynamics, has been exploring architectures with more biologically-realistic dynamics, such as SNNs that in their simplest form consist of Leaky Integrate-and-Fire (LIF) neurons dayan_theoretical_2005 ; eliasmith_how_2013 ; gerstner_neuronal_2014 . The SNN paradigm encompasses rich temporal dynamics, due to the brain-inspired LIF neurons, and promises low power consumption, due to the use of sparse asynchronous voltage pulses, called spikes, to compute and propagate information. Thus, SNNs are considered to be the next generation of neural networks beyond ANNs maass_networks_1997 , with advantages stemming from their efficient implementation, rich dynamics and novel learning capabilities.

From an implementation perspective, the inherent characteristics of SNNs have led to highly-efficient computing architectures with collocation of memory and processing units. For example, incorporation of the design principles of the SNN paradigm in the field of neuromorphic computing mead_neuromorphic_1990 has led to the development of non-von Neumann systems with significantly increased parallelism and reduced energy consumption, demonstrated in chips such as FACETS/BrainScales meier_mixed-signal_2015 , Neurogrid benjamin_neurogrid:_2014 , IBM’s TrueNorth cassidy_real-time_2014 and Intel’s Loihi davies_loihi:_2018 . Moreover, recent breakthroughs in the area of memristive nanoscale devices have enabled further improvements in area and energy efficiency of mixed digital-analog implementations of synapses and spiking neurons kuzum_synaptic_2013 ; tuma_stochastic_2016 ; wozniak_learning_2016 ; pantazi_all-memristive_2016 ; tuma_detecting_2016 .
From a neural and synaptic dynamics perspective, large-scale simulations following the neuroscientific insights were performed to explore the activity patterns of SNNs markram_blue_2006 ; izhikevich_large-scale_2008 ; ananthanarayanan_cat_2009 ; markram_human_2012 , or to address concrete cognitive problems eliasmith_large-scale_2012 ; rasmussen_spiking_2014 . However, the most appealing models eliasmith_large-scale_2012 involved complex task-specific architectures, and the execution of their dynamics could take up to 2.5 hours and 24 GB of RAM to calculate one second of response. Owing to the lack of a holistic understanding of large-scale activity patterns and the complex design of cognitive simulations, further bottom-up research on simpler architectures is needed. Specifically, approaches such as that in which the competitive dynamics of inhibitory circuits were abstracted to Winner-Take-All (WTA) architectures maass_computational_2000 , or that in which the rich recurrent dynamics were exploited in Liquid State Machines maass_real-time_2002 , provide direct examples of cases where SNN dynamics can enhance the computational capabilities.
Finally, from a learning perspective, biologically-inspired unsupervised Hebbian learning rules, such as Spike-Timing-Dependent Plasticity (STDP) markram_regulation_1997 ; song_competitive_2000 and its extension to Fatiguing STDP (FSTDP) moraitis_fatiguing_2017 , were applied for correlation detection song_competitive_2000 ; gutig_learning_2003 ; tuma_stochastic_2016 ; wozniak_learning_2016 ; pantazi_all-memristive_2016 ; tuma_detecting_2016 ; moraitis_fatiguing_2017 or for sampling of high-frequency signals tuma_stochastic_2016 . STDP in WTA networks was applied for handwritten digit recognition querlioz_simulation_2011 ; diehl_unsupervised_2015 ; sidler_unsupervised_2017 , but yielded limited accuracy. The reason is an insufficient generalization of the internal representation of knowledge in WTA architectures, which effectively implement a $k$-NN algorithm wozniak_THESIS_2017 . Further improvements in the internal representation were obtained through feature learning bichler_extraction_2012 ; burbank_mirrored_2015 ; wozniak_IJCNN_2017 .
However, despite the great promise of SNNs in terms of efficient implementation, rich dynamics and learning capabilities, it has been unclear how to effectively train large generic networks of spiking neurons to reach the accuracy of ANNs for common machine learning tasks. The main reason was attributed to the lack of a scalable supervised SNN learning algorithm, such as backpropagation (BP) in ANNs. It was shown that porting the weights from ANNs trained with BP to SNNs enhances their performance oconnor_real-time_2013 ; diehl_conversion_2016 ; rueckauer_conversion_2018 . Moreover, there have been multiple attempts to develop supervised learning approaches inspired by the idea of BP bohte_error-backpropagation_2002 ; anwani_normad_2015 , and also to explore how STDP could perform BP bengio_stdp-compatible_2017 ; tavanaei_bp-stdp:_2017 , or how to directly implement BP. Examples include: using a differentiable approximation of the LIF response function hunsberger_spiking_2015 , applying BP instead of STDP when output spikes are emitted esser_convolutional_2016 , deriving BP formulas on lowpass-filtered neuronal activities lee_training_2016 , or using the concept of backpropagation through time with differentiable approximations of temporal network dynamics huh_gradient_2017 ; bellec_long_2018 . Despite many performance improvements, these state-of-the-art architectures usually do not surpass the accuracy of ANNs on common datasets and involve sophisticated implementation.

In this paper, we take a different perspective on the relationship between ANNs and SNNs. We reflect on the nature of the spiking neural dynamics and propose to unify the SNN model with its ANN counterpart. In the first part of the paper, we focus on the LIF dynamics and provide a constructive proof that a spiking neuron can be transformed into a simple novel recurrent ANN unit called Spiking Neural Unit (SNU) that shares many similarities with the Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU). Moreover, we generalize the LIF dynamics to a non-spiking case by introducing a so-called soft variant of SNU (sSNU). In the second part, we address the learning challenges. We show that for the proposed units the existing ANN frameworks can be naturally reused for training with backpropagation through time, enabling a very easy implementation and successful learning of any SNN architecture. Finally, we demonstrate the efficacy of our approach by training deep SNNs with up to seven layers and analyzing the performance on two tasks: handwritten digit recognition and polyphonic music prediction. The results obtained from the SNU family-based networks show competitive performance compared to RNNs, LSTM- and GRU-based networks, the state-of-the-art models commonly used in tasks involving temporal data.
Conventional ANNs are feed-forward architectures implementing layers of neurons described by the equation

$$\mathbf{y} = f(\mathbf{W}\mathbf{x} + \mathbf{b}), \tag{1}$$

where $\mathbf{x}$ and $\mathbf{y}$ are the input and output vectors, $\mathbf{W}$ is the weights matrix, $\mathbf{b}$ is a vector of biases and $f$ is the activation function, such as a sigmoid $\sigma(a) = 1/(1 + e^{-a})$ or the rectified linear function $\max(0, a)$, used in Rectified Linear Units (ReLUs) nair_rectified_2010 .

Such ANNs are stateless and have no inherent notion of time, yet through an appropriate transformation of the last $n$ temporal inputs into a large spatial input vector waibel_modular_1989 , it is possible to utilize these neuronal models with temporal data. However, this solution is computationally inefficient, because a larger model with $n$-times more inputs needs to be evaluated during each time step, and the system needs to implement shift buffering or delay lines to provide the required past inputs.
A Recurrent Neural Network (RNN) is an extension of an ANN that is capable of operating directly on temporal data. This is achieved through the introduction of recurrent connections between the neurons within each layer

$$\mathbf{y}_t = f(\mathbf{W}\mathbf{x}_t + \mathbf{H}\mathbf{y}_{t-1} + \mathbf{b}), \tag{2}$$

where $t$ indicates the discrete time and $\mathbf{H}$ denotes the matrix of recurrent weights. As a result, streams of temporal data may be directly fed into such networks, which maintain the temporal context in the transient activation values circulating through the recurrent connections.
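The difference between the stateless layer of Eq. 1 and the recurrent layer of Eq. 2 can be sketched in a few lines of plain Python. This is only an illustrative sketch: the helper names (`matvec`, `dense_step`, `rnn_step`), the sigmoid activation and the toy weights are assumptions made for the example and do not come from the paper.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def dense_step(W, b, x, f=sigmoid):
    """Stateless layer: y = f(W x + b)."""
    return [f(a + b_i) for a, b_i in zip(matvec(W, x), b)]

def rnn_step(W, H, b, x, y_prev, f=sigmoid):
    """Recurrent layer: y_t = f(W x_t + H y_{t-1} + b)."""
    wx = matvec(W, x)
    hy = matvec(H, y_prev)
    return [f(a + r + b_i) for a, r, b_i in zip(wx, hy, b)]

# Hypothetical toy layer: 2 inputs, 2 neurons.
W = [[0.5, -0.2], [0.1, 0.3]]
H = [[0.0, 0.4], [-0.4, 0.0]]
b = [0.0, 0.0]

# Feed the same input three times; only the recurrent layer changes its output.
y = [0.0, 0.0]
outputs = []
for x_t in ([1.0, 0.0], [1.0, 0.0], [1.0, 0.0]):
    y = rnn_step(W, H, b, x_t, y)
    outputs.append(y)
```

With a zero previous output the recurrent step reduces to the stateless one, while repeated presentation of the same input yields different outputs, which is the temporal context carried by the recurrent connections.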
However, the rapid transient dynamics of RNNs posed significant challenges for training these models on long sequences. For each time step, the temporal context of the entire RNN layer is combined with the new inputs and transformed by the non-linear activation function. This leads to a rapid saturation of the neuronal activations that negatively impacts the learning, known as the vanishing gradients problem. The solution was to provide a long-term temporal context that was unaffected by the non-linear interactions outside the neuronal cells. This was achieved by introducing into the neurons an internal state variable called carry $\mathbf{c}_t$. Its dynamics is controlled by surrounding trainable gates that together form a stateful Long Short-Term Memory (LSTM) unit gers_learning_1999

$$\begin{aligned}
\mathbf{c}_t &= \sigma(\mathbf{W}_i\mathbf{x}_t + \mathbf{H}_i\mathbf{y}_{t-1} + \mathbf{b}_i) \odot g(\mathbf{W}\mathbf{x}_t + \mathbf{H}\mathbf{y}_{t-1} + \mathbf{b}) + \sigma(\mathbf{W}_f\mathbf{x}_t + \mathbf{H}_f\mathbf{y}_{t-1} + \mathbf{b}_f) \odot \mathbf{c}_{t-1}, \\
\mathbf{y}_t &= \sigma(\mathbf{W}_o\mathbf{x}_t + \mathbf{H}_o\mathbf{y}_{t-1} + \mathbf{b}_o) \odot h(\mathbf{c}_t),
\end{aligned} \tag{3}$$

where multiple RNN units are combined and indexed with $i$, $o$, $f$ to denote the input, the output, and the forget gate, respectively. Moreover, $g$ and $h$ are input and output activation functions, and $\odot$ denotes the element-wise product. This approach became the state-of-the-art in recurrent networks.
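Recently, Gated Recurrent Units (GRUs) cho_properties_2014 became a popular alternative to LSTM units, achieving similar performance with fewer gates chung_empirical_2014 . They are formulated as

$$\begin{aligned}
\mathbf{z}_t &= \sigma(\mathbf{W}_z\mathbf{x}_t + \mathbf{H}_z\mathbf{y}_{t-1} + \mathbf{b}_z), \\
\mathbf{r}_t &= \sigma(\mathbf{W}_r\mathbf{x}_t + \mathbf{H}_r\mathbf{y}_{t-1} + \mathbf{b}_r), \\
\mathbf{y}_t &= \mathbf{z}_t \odot \mathbf{y}_{t-1} + (1 - \mathbf{z}_t) \odot g(\mathbf{W}\mathbf{x}_t + \mathbf{H}(\mathbf{r}_t \odot \mathbf{y}_{t-1}) + \mathbf{b}),
\end{aligned} \tag{4}$$

where $\mathbf{z}_t$ is called an update gate, $\mathbf{r}_t$ is called a reset gate and $g$ is the activation function.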
Meanwhile, SNNs have been developing almost independently from ANNs. The common basic Leaky Integrate-and-Fire (LIF) spiking neuron model is inherently temporal, comprising a state variable $V_m$, called the membrane potential, with the dynamics described by the differential equation gerstner_neuronal_2014

$$\tau \frac{dV_m(t)}{dt} = -V_m(t) + R\,I(t), \tag{5}$$

where $\tau = RC$ is the time constant of the neuron, $R$ and $C$ represent the resistance and the capacitance of the neuronal cell soma, and $I(t)$ is the incoming current from the synapses. The synapses of a neuron receive spikes and modulate them by the synaptic weights to provide the input current to the neuronal cell soma. The input current is integrated into the membrane potential $V_m$. When $V_m$ crosses a firing threshold $V_{th}$ at time $t$, an output spike is emitted: $y_t = 1$, and the membrane potential is reset to the resting state $V_{rest}$, often defined to be equal to $0$.
It is common to describe the membrane potential dynamics using a discrete-time approximation that is obtained from Eq. 5 assuming a discretization step $\Delta T$

$$V_{m,t} = \frac{\Delta T}{C} I_t + \left(1 - \frac{\Delta T}{\tau}\right) V_{m,t-1}. \tag{6}$$

Assuming that we do not consider the temporal dynamics of biologically-realistic models of synapses and dendrites gerstner_neuronal_2014 , the input current may be defined as $I_t = \mathbf{W}\mathbf{x}_t$. This formulation provides a simple framework for the analysis of the LIF dynamics, commonly explored in SNN research. However, understanding how to take advantage of these temporal dynamics to build large-scale generic deep spiking networks that would achieve high accuracy on common machine learning tasks has remained an open question.
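The discrete-time LIF dynamics described above can be simulated directly. The sketch below assumes a forward-Euler update of the form V_t = (dt/C) I_t + (1 - dt/tau) V_{t-1}, a strict threshold comparison, an immediate reset to the resting potential, and dimensionless toy values; the function name `simulate_lif` and all parameter values are hypothetical choices for illustration.

```python
def simulate_lif(input_current, dt, tau, C, v_th, v_rest=0.0):
    """Discrete-time LIF neuron (assumed forward-Euler form of the LIF equation).

    Returns the membrane potential after each step (post-reset) and the spikes.
    """
    v = v_rest
    potentials, spikes = [], []
    for i_t in input_current:
        # Leaky integration of the input current.
        v = (dt / C) * i_t + (1.0 - dt / tau) * v
        if v > v_th:
            spikes.append(1)   # threshold crossed: emit a spike ...
            v = v_rest         # ... and reset the membrane potential
        else:
            spikes.append(0)
        potentials.append(v)
    return potentials, spikes

# Toy input: dt/tau = 0.2, so 80% of the potential is retained per step.
potentials, spikes = simulate_lif(
    [0.6, 0.6, 0.0, 0.6, 0.6], dt=1.0, tau=5.0, C=1.0, v_th=1.0
)
```

In this toy run two consecutive inputs are needed to cross the threshold, because a single input of 0.6 stays below it and the leak removes part of the accumulated potential at every step.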
Here, we introduce a novel way of looking at SNNs that makes their temporal dynamics easier to understand and enables them to be incorporated in deep learning architectures. In particular, we propose a succinct high-level model of a spiking LIF neuron, which we call a Spiking Neural Unit (SNU). The SNU comprises two ANN neurons as subunits: $N_1$, which models the membrane potential accumulation dynamics, and $N_2$, which implements the spike emission, as illustrated in Fig. 1. The integration dynamics of the membrane-potential state variable is realized through a single self-looping connection to $N_1$ in the accumulation stage. The spike emission is realized through a neuron $N_2$ with step activation function. Simultaneously, an activation of $N_2$ controls the resetting of the state variable by gating the self-looping connection at $N_1$. Thus, the SNU, a discrete-time abstraction of a LIF neuron, represents a construct that is directly implementable as a neural unit in ANN frameworks.

Following the standard ANN convention, the formulas that govern the computation occurring in a layer of SNUs are as follows

$$\begin{aligned}
\mathbf{s}_t &= g(\mathbf{W}\mathbf{x}_t + l(\tau) \odot \mathbf{s}_{t-1} \odot (1 - \mathbf{y}_{t-1})), \\
\mathbf{y}_t &= h(\mathbf{s}_t + \mathbf{b}),
\end{aligned} \tag{7}$$

where $\mathbf{s}_t$ is the vector of internal state variables calculated by the $N_1$ subunits, $\mathbf{y}_t$ is the output vector calculated by the $N_2$ subunits, $g$ is the accumulation stage activation function, and $h$ is the output activation function.

$N_1$ is a ReLU, i.e., $g$ is the rectified linear activation function, based on the assumption that the membrane potential value is bounded by the resting state $V_{rest} = 0$. The inputs $\mathbf{x}_t$ are weighted by the synaptic weights in matrix $\mathbf{W}$ and there is no bias term. The self-looping weight $l(\tau)$ applied to the previous state value $\mathbf{s}_{t-1}$ performs a discrete time approximation of the membrane potential decay that occurred in the time period $\Delta T$. The last term $(1 - \mathbf{y}_{t-1})$ relies on the binary output values of the spiking output to either retain the state, or reset it after spike emission. $N_2$ is a thresholding neuron, i.e., it has a step activation function $h(a)$, which returns $1$, corresponding to an output spike, if $a > 0$, or $0$ otherwise. There is no weight on the connection from $N_1$, but it is biased with $\mathbf{b}$ to implement the spiking threshold.

The parameters of the SNUs are $\mathbf{W}$, $l(\tau)$ and $\mathbf{b}$. If $l(\tau)$ is fixed to 1, the state does not decay and the SNU corresponds to the Integrate-and-Fire (IF) neuron without the leak term. Otherwise, these parameters correspond to the parameters of the LIF neuron introduced in Eq. 6, i.e.,

$$\mathbf{W} = \frac{\Delta T}{C}\,\mathbf{W}_{\mathrm{LIF}}, \qquad l(\tau) = 1 - \frac{\Delta T}{\tau}, \qquad \mathbf{b} = -V_{th}, \tag{8}$$

where $\mathbf{W}_{\mathrm{LIF}}$ denotes the synaptic weights that define the input current in Eq. 6.
Thus, the same set of parameter values can be used in an SNU-based network, implemented by utilizing standard ANN frameworks, as well as in a native LIF-based implementation, utilizing standard SNN frameworks, or even in neuromorphic hardware. To demonstrate this, we have used TensorFlow (http://www.tensorflow.org) to produce sample plots of the spiking dynamics for a single SNU in Fig. 2. As can be seen, the state variable of the SNU increases each time an input spike arrives at the neuron, and decreases following the exponential decay dynamics. When the spiking threshold is reached, an output spike is emitted and the membrane potential is reset. These dynamics are aligned with the reference LIF dynamics, which we obtained for the corresponding parameters by running a simulation in the well-known Brian2 SNN framework (http://brian2.readthedocs.io).

In SNNs, information is transmitted throughout the network with all-or-none spikes, typically modeled as binary values. As a result, the input data is binarized, and the step function is used to determine the binary neuronal outputs. However, the proposed SNU implementation, within the ANN framework, allows the all-or-none constraint to be relaxed, thus allowing the benefits of a variant of the SNU, called soft SNU (sSNU), to be explored. The sSNU is a member of the family of SNUs characterized by the dynamics in Eq. 7. It generalizes these dynamics to non-spiking ANNs, in which the input data does not have to be binarized and the output activation function $h$ is set to a sigmoid function. This formulation has the additional interesting property of an analog proportional reset, i.e., the magnitude of the output determines what fraction of the membrane-potential state variable is retained. Exploiting the intermediate values at all stages of processing, viz., input, reset and output, facilitates on-par performance comparison of LIF-like dynamics with other ANN models, eliminating any potential performance loss stemming from the limited value resolution of the standard SNU.
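The SNU layer dynamics described above can be sketched without any ANN framework. The following plain-Python sketch assumes the update s_t = ReLU(W x_t + l * s_{t-1} * (1 - y_{t-1})) and y_t = h(s_t + b), with a single scalar decay weight `l` shared by all units; the function names and the toy parameter values are hypothetical. Passing a step function for `h` gives the spiking SNU, and passing a sigmoid gives the sSNU.

```python
import math

def relu(a):
    return max(0.0, a)

def step(a):
    return 1.0 if a > 0.0 else 0.0

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def snu_step(W, l, b, x, s_prev, y_prev, h=step):
    """One time step of a feed-forward SNU layer (assumed form of Eq. 7)."""
    s, y = [], []
    for row, b_i, s_i, y_i in zip(W, b, s_prev, y_prev):
        drive = sum(w * x_j for w, x_j in zip(row, x))
        # Decayed previous state, gated by the previous output (reset).
        s_new = relu(drive + l * s_i * (1.0 - y_i))
        s.append(s_new)
        y.append(h(s_new + b_i))
    return s, y

def run_snu(W, l, b, inputs, h=step):
    """Run the layer over a sequence, starting from a zero state."""
    s = [0.0] * len(W)
    y = [0.0] * len(W)
    outputs = []
    for x in inputs:
        s, y = snu_step(W, l, b, x, s, y, h)
        outputs.append(y)
    return outputs

# Single unit with hypothetical parameters: weight 0.6, decay 0.8, threshold 1.0.
inputs = [[1.0], [1.0], [0.0], [1.0], [1.0]]
spiking = run_snu([[0.6]], 0.8, [-1.0], inputs, h=step)
soft = run_snu([[0.6]], 0.8, [-1.0], inputs, h=sigmoid)
```

In the spiking run the unit fires on the second and fifth steps, and the state is cleared on the step after each spike; in the soft run every output is an intermediate value between 0 and 1.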
The sSNU concept has another interesting intuitive interpretation. In a sense, the 0 or 1 binarized output of a neuron represents its confidence in a certain hypothesis concerning the information presented at its inputs. In handling static data, all relevant information is presented simultaneously at the inputs and, as a consequence, an artificial neuron would directly output its confidence regarding this input information. However, in cases with temporal data, the information is spread over time and the LIF neurons collect it in the membrane potential wozniak_THESIS_2017 . When enough information aligned with the hypothesis has been collected, an output spike transmits this fact to the downstream neurons and restarts the process through the state reset. However, with sSNU, a floating point output is always transmitted. To avoid repeated retransmission of the same information to the downstream neurons, the value of the membrane potential has to be reduced. On the other hand, in order to retain certain memory, the neuron should not be fully reset at each time step. Thus, the solution provided by sSNU is to attenuate the state variable proportionally to the output value transmitted to the downstream neurons.
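A small worked example may make the proportional reset concrete. It assumes that the state retained from the previous step is $l(\tau)\, s_{t-1}\,(1 - y_{t-1})$, with $l(\tau)$ denoting the self-looping decay weight, and uses the hypothetical values $l(\tau) = 0.8$ and $s_{t-1} = 2.0$:

```latex
\begin{aligned}
y_{t-1} = 0    &: \quad l(\tau)\, s_{t-1}\, (1 - y_{t-1}) = 0.8 \cdot 2.0 \cdot 1.00 = 1.6 \\
y_{t-1} = 0.25 &: \quad l(\tau)\, s_{t-1}\, (1 - y_{t-1}) = 0.8 \cdot 2.0 \cdot 0.75 = 1.2 \\
y_{t-1} = 1    &: \quad l(\tau)\, s_{t-1}\, (1 - y_{t-1}) = 0.8 \cdot 2.0 \cdot 0.00 = 0
\end{aligned}
```

The first and last rows are the two cases available to a spiking SNU (no spike and full reset), whereas the middle row is only reachable with the soft variant, where a weak output removes a correspondingly small fraction of the state.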
The temporal context of the units from the SNU family is captured through the internal state corresponding to the membrane potential of the LIF neuron. In this sense, the structure of the SNUs is similar to that of LSTM or GRU units in that they also rely on an internal state as a means of storing temporal context. This structural similarity is visible in Figs. 3a-c, in which all the aforementioned units have an internal state that is maintained through a recurrent loop within the units’ boundaries, drawn in gray. Besides similarities in the structure, the SNUs possess unique features not present in the other models, viz., a non-linear transformation $g$ within the internal state loop, a parametrized state loop connection, a bias of the state output connection to the output activation function $h$, indicated by bold arrows in Fig. 3c, and a direct reset gate controlled by the output $\mathbf{y}_t$.

Unit type | Feed-forward network | Recurrent network |
---|---|---|
Stateless units | ANN | RNN |
Stateful units | SNU | LSTM/GRU |
The SNUs can be optionally interconnected through recurrent connections, as is indicated in Fig. 3c. Thus, similar to the Liquid State Machine SNN model maass_real-time_2002 and the learning-to-learn architecture bellec_long_2018 , it might be beneficial for certain tasks to extend the SNU-based networks to include the recurrent connections matrix $\mathbf{H}$. However, the most typical architecture with SNUs is feed-forward, which creates a novel category of ANN architectures for temporal processing, as summarized in Tab. 1. Note that processing temporal data without the use of a recurrent neural network structure but rather using only the internal state has long been the standard approach in the SNN community wozniak_learning_2016 ; pantazi_all-memristive_2016 ; maass_computational_2000 ; song_competitive_2000 ; moraitis_fatiguing_2017 ; gutig_learning_2003 ; querlioz_simulation_2011 ; diehl_unsupervised_2015 ; sidler_unsupervised_2017 ; bichler_extraction_2012 ; wozniak_IJCNN_2017 . Thus, in the rest of the paper we will focus on the classic feed-forward SNN architectures.
The use of feed-forward stateful architectures for temporal problems has a series of profound advantages. From an implementation perspective, all-to-all connectivity between the neuronal outputs and the neuronal inputs within the same layer is not required. This may lead to highly-parallel software implementations or neuromorphic hardware designs. From a theoretical standpoint, owing to inherent temporal neural dynamics, a feed-forward network of SNUs is the simplest temporal neural network architecture with a lower number of parameters than that in RNNs, LSTM- or GRU-based networks, which may result in faster training and reduced overfitting.
In addition to the synaptic weights, the trainable parameters in SNUs may include the membrane time constant and the neuronal threshold. For a layer of $n$ neurons with $m$ inputs, the number of parameters for an SNU with trainable synaptic weights and firing thresholds, but constant $\tau$, is $nm + n$, which is equivalent to the number of parameters in the simplest feed-forward ANN. Moreover, as summarized in Tab. 2, even if a trainable time constant is considered, the number of parameters in an SNU architecture is smaller than that of an RNN, an LSTM with four fully-parametrized gates or a GRU with three fully-parametrized gates.
Network model | # of parameters |
---|---|
ANN | $nm + n$ |
RNN | $nm + n^2 + n$ |
LSTM | $4(nm + n^2 + n)$ |
GRU | $3(nm + n^2 + n)$ |
SNU | $nm + n$ |
SNU with trainable $\tau$ | $nm + 2n$ |
recurrent SNU | |
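The parameter comparison can be checked numerically with a short helper. The counts below are a sketch derived from the description in the text (weights plus biases for an ANN, an additional recurrent matrix for an RNN, four and three fully-parametrized gates for LSTM and GRU), and they assume one trainable time constant per neuron for the SNU variant with trainable time constant; the function name and the layer size are illustrative only.

```python
def parameter_counts(n, m):
    """Trainable parameters of one layer with n units and m inputs."""
    ann = n * m + n              # weights + biases
    rnn = n * m + n * n + n      # ... + recurrent weight matrix
    return {
        "ANN": ann,
        "RNN": rnn,
        "LSTM": 4 * rnn,         # four fully-parametrized gates
        "GRU": 3 * rnn,          # three fully-parametrized gates
        "SNU": n * m + n,        # weights + thresholds, constant tau
        "SNU with trainable tau": n * m + 2 * n,  # assumed: one tau per neuron
    }

# Example layer size: 784 inputs feeding 256 units.
counts = parameter_counts(n=256, m=784)
for name, value in counts.items():
    print(f"{name}: {value}")
```

For this layer size the SNU matches the feed-forward ANN, and even with a trainable time constant it stays well below the RNN, because it adds only a term linear in the number of neurons instead of a quadratic recurrent matrix.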
SNUs provide a mapping of the spiking neural dynamics into the ANN frameworks, which naturally enables the reuse of the existing backpropagation training procedures. However, backpropagation requires that all parts of the network are differentiable, which is the case for the sSNU variant, but not for the step function in the standard SNU. Nevertheless, in particular cases it is possible to train non-differentiable neural networks by providing a pseudo-derivative for the non-differentiable functions bengio_estimating_2013 . In the remaining part of the paper, we follow this approach and use the derivative of $\tanh$ as the pseudo-derivative of the step function.
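The pseudo-derivative idea can be illustrated as a pair of functions: the forward pass uses the exact step function, and the backward pass substitutes a smooth surrogate. The sketch below assumes the derivative of tanh as the surrogate; the function names are hypothetical, and an ANN framework would register the second function as a custom gradient of the first.

```python
import math

def step(a):
    """Forward pass: exact, non-differentiable spike function."""
    return 1.0 if a > 0.0 else 0.0

def step_pseudo_derivative(a):
    """Backward pass: derivative of tanh (assumed surrogate), 1 - tanh(a)^2.

    It replaces the true derivative of the step function, which is zero
    everywhere except at the threshold, where it is undefined.
    """
    return 1.0 - math.tanh(a) ** 2

# The surrogate gradient is largest at the threshold and fades away from it.
gradients = [step_pseudo_derivative(a) for a in (-2.0, -0.5, 0.0, 0.5, 2.0)]
```

The surrogate is largest for states close to the firing threshold, so the weight updates are concentrated on neurons whose decision to spike could be changed by a small perturbation.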
Even though an SNU-based network is a feed-forward architecture, the state within the units is implemented using self-looping recurrent connections. Therefore, to train such deep networks, we follow the idea of using backpropagation through time (BPTT) werbos_generalization_1988 in SNNs huh_gradient_2017 ; bellec_long_2018 . This implies that the SNU structure is unfolded over time, i.e., the computational graph and its parameters are replicated for each time step, as illustrated in Fig. 4, and then the standard backpropagation algorithm is applied. The unfolding involves only the local state of the neuron, which is different from the common RNN architectures that require unfolding of the activations of all units in a layer through recurrent connection matrices $\mathbf{H}$. In practice, these details do not matter for the ANN frameworks that generate a computational graph and use automatic differentiation for the training, so that the entire training code is created dynamically.
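To show what the unfolding over time involves, the sketch below applies BPTT by hand to a single scalar soft SNU and checks the result against numerical differentiation. It assumes the state update s_t = ReLU(w x_t + l s_{t-1} (1 - y_{t-1})), the output y_t = sigmoid(s_t + b), and a squared error on the time-averaged output; all names and values are hypothetical, and in practice an ANN framework derives these gradients automatically.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(w, l, b, xs):
    """Unfold one scalar sSNU over the input sequence and record each step."""
    s, y = 0.0, 0.0
    trace = []
    for x in xs:
        a = w * x + l * s * (1.0 - y)   # pre-activation of the state
        s_new = max(0.0, a)             # ReLU accumulation stage
        y_new = sigmoid(s_new + b)      # soft output stage
        trace.append((x, s, y, a, s_new, y_new))
        s, y = s_new, y_new
    return trace

def loss(trace, target):
    mean_y = sum(rec[5] for rec in trace) / len(trace)
    return (mean_y - target) ** 2

def bptt(w, l, b, xs, target):
    """Gradients of the loss w.r.t. (w, l, b), walking the trace backwards."""
    trace = forward(w, l, b, xs)
    T = len(trace)
    mean_y = sum(rec[5] for rec in trace) / T
    dloss_dy = 2.0 * (mean_y - target) / T   # direct effect of each output
    grad_w = grad_l = grad_b = 0.0
    grad_a_next = 0.0                        # dL/da of the following step
    for x, s_prev, y_prev, a, s, y in reversed(trace):
        # y_t also gates the next state: d a_{t+1} / d y_t = -l * s_t
        grad_y = dloss_dy - grad_a_next * l * s
        dy_ds = y * (1.0 - y)
        # s_t feeds the output and the next state: d a_{t+1} / d s_t = l * (1 - y_t)
        grad_s = grad_y * dy_ds + grad_a_next * l * (1.0 - y)
        grad_a = grad_s if a > 0.0 else 0.0  # ReLU derivative
        grad_w += grad_a * x
        grad_l += grad_a * s_prev * (1.0 - y_prev)
        grad_b += grad_y * dy_ds
        grad_a_next = grad_a
    return grad_w, grad_l, grad_b

def numerical_gradient(w, l, b, xs, target, eps=1e-5):
    """Central finite differences, used only to validate the BPTT result."""
    def f(w_, l_, b_):
        return loss(forward(w_, l_, b_, xs), target)
    return (
        (f(w + eps, l, b) - f(w - eps, l, b)) / (2 * eps),
        (f(w, l + eps, b) - f(w, l - eps, b)) / (2 * eps),
        (f(w, l, b + eps) - f(w, l, b - eps)) / (2 * eps),
    )

xs = [1.0, 0.0, 1.0, 1.0]
analytic = bptt(0.5, 0.8, -1.0, xs, target=1.0)
numeric = numerical_gradient(0.5, 0.8, -1.0, xs, target=1.0)
```

Note that the backward pass only propagates through the unit's own state and output, which reflects the point made above that the unfolding involves the local state of the neuron rather than the activations of a whole layer.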
In the case where standard SNUs are used for the output layer of a network, we propose to adapt the learning loss to reflect the differences in how SNNs and ANNs are assessed. For RNNs, LSTMs and GRUs, it is quite common that only the last output after presentation of the entire sequence, for instance the final output in Fig. 4, is considered. In the case of SNNs, it is common to assess the output spiking rate of the neurons over a range of time, such as the entire output sequence in Fig. 4. To reflect this, we define the SNU spiking rate loss as the mean squared error (MSE) between the rate of the mean spiking output $\bar{\mathbf{y}}$, calculated over an assessment period $T$, and the target firing rate $\hat{\mathbf{y}}$

$$L = \mathrm{MSE}\!\left(\frac{\bar{\mathbf{y}}}{\Delta T},\ \hat{\mathbf{y}}\right), \qquad \bar{\mathbf{y}} = \frac{1}{T}\sum_{t=1}^{T}\mathbf{y}_t. \tag{9}$$

In deep learning, normalized target values in the range between 0 and 1 are used by convention. If we normalize the mean spiking rate by the maximum spiking rate of $1/\Delta T$, we obtain the normalized spiking rate loss, or simply the mean output loss for normalized targets

$$L = \mathrm{MSE}\left(\bar{\mathbf{y}},\ \hat{\mathbf{y}}\right), \tag{10}$$

that does not depend on the sampling time $\Delta T$ from the SNN discretization, and is also suitable for use with any ANN model.
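The two losses described above can be sketched as follows. The sketch assumes that the assessment period is given as a list of per-step output vectors, that the spiking rate is the time-averaged output divided by the sampling time `dt`, and that the MSE is averaged over the output neurons; the function names and the toy data are hypothetical.

```python
def mean_outputs(outputs):
    """Time-average of each output neuron over the assessment period."""
    T = len(outputs)
    n = len(outputs[0])
    return [sum(y_t[i] for y_t in outputs) / T for i in range(n)]

def mean_output_loss(outputs, targets):
    """MSE between time-averaged outputs and normalized targets in [0, 1]."""
    means = mean_outputs(outputs)
    return sum((m - t) ** 2 for m, t in zip(means, targets)) / len(targets)

def spiking_rate_loss(outputs, target_rates, dt):
    """MSE between output spiking rates (spikes per unit time) and target rates."""
    rates = [m / dt for m in mean_outputs(outputs)]
    return sum((r - t) ** 2 for r, t in zip(rates, target_rates)) / len(target_rates)

# Two output neurons observed for four time steps.
outputs = [[1, 0], [1, 0], [0, 0], [1, 1]]
normalized = mean_output_loss(outputs, targets=[1.0, 0.0])
rate_based = spiking_rate_loss(outputs, target_rates=[2.0, 0.0], dt=0.5)
```

Here the first neuron spikes in three of four steps and the second in one of four, so the normalized loss is 0.0625 regardless of `dt`, whereas the rate-based loss scales with the inverse square of the sampling time.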
We evaluated the performance of deep SNU-based networks in comparison to other popular temporal ANN models. Here, we proposed a temporal variation of the MNIST handwritten digit classification task, in which we assumed that the inputs are spikes from an asynchronous camera. We assumed that for each input image pixel belonging to the digit, defined as a pixel having a positive intensity value, the camera sends a spike at a random time instance. Thus, as illustrated in Fig. 5, the generated spikes convey jittered information about the digits. For repeatability of the results and to limit the regularization effects, the transformation from standard MNIST to the jittered MNIST was performed upfront for the entire dataset using five time steps per digit and a random seed equal to 0, so that the same inputs are presented at each epoch to all models.
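The conversion of a static image into jittered spikes can be sketched as below. The sketch assumes a flattened image given as a list of intensities and uses Python's `random.Random` seeded with 0; the function name is hypothetical, and because the random number generator used by the authors is not specified here, this code illustrates the procedure without reproducing their exact dataset.

```python
import random

def jitter_image(pixels, num_steps=5, rng=None):
    """Convert pixel intensities into num_steps binary spike frames.

    Every pixel with positive intensity emits exactly one spike at a
    uniformly random time step; all other pixels stay silent.
    """
    if rng is None:
        rng = random.Random(0)
    frames = [[0] * len(pixels) for _ in range(num_steps)]
    for idx, intensity in enumerate(pixels):
        if intensity > 0:
            frames[rng.randrange(num_steps)][idx] = 1
    return frames

# Toy "image" with five pixels, three of which belong to the digit.
pixels = [0.0, 0.5, 1.0, 0.0, 0.2]
frames = jitter_image(pixels, num_steps=5, rng=random.Random(0))
spikes_per_pixel = [sum(frame[i] for frame in frames) for i in range(len(pixels))]
```

Summing the frames over time recovers the binarized image, which is why the network has to integrate information across time steps to see the complete digit.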
We aimed to develop a training setting similar to the SNN convention querlioz_simulation_2011 ; sidler_unsupervised_2017 , in which the patterns from the dataset form a continuous stream, i.e., the spikes representing the current training digit come directly after the spikes of the preceding digit. Thus, the network operates with a non-zero initial state and has to identify consecutive digits without receiving explicit information about when the digit at the input has changed. However, a direct implementation of this approach would require applying BPTT to a continuous stream of 60000 training digits forming a single training input stream for the entire dataset. Instead, to apply BPTT to the individual training digits and benefit from parallel training with batching, we trained the networks on sequences formed by first feeding a dummy random digit to initialize the network state and then presenting the training digit. The neuronal outputs of all the models were then evaluated during the training digit presentation period using the mean output loss defined in Eq. 10.
We trained 3-, 4- and 7-layer network architectures with 784-250-10, 784-256-256-10 and 784-256-256-256-256-256-10 neurons, respectively. The networks were homogeneous, i.e., all the neurons were of the same kind: SNUs, or sSNUs, with trainable , and fixed ; or RNNs with sigmoidal neurons; or LSTMs with sigmoidal activation functions; or GRUs with sigmoidal activation functions. For each epoch, the training set of 60000 MNIST images was presented with a batch size of 15. The learning was performed using Stochastic Gradient Descent (SGD) with learning rate . To assess the consistency of the results, the networks were executed for 10 different random weight initializations and the mean test accuracy was calculated.

Firstly, we analyzed the impact of the network depth on the performance of an SNN implemented with SNUs. The evolution of the test accuracy is plotted in Fig. 6. A 3-layer SNU-based network achieved mean test accuracy of 96.8%, whereas with a 4-layer architecture the accuracy increased to 97.41%. A further slight improvement was achieved with a 7-layer architecture that increased the test accuracy to 97.46%. These results indicate that deep networks incorporating spiking neural dynamics have high potential in addressing efficiently, and with high accuracy, machine learning tasks that involve temporal data.
Secondly, we compared the performance of the units from the SNU family with the state-of-the-art temporal ANNs. The learning curves of the best performing RNNs, GRU- and LSTM-based networks are depicted in Fig. 7. The best test accuracy was obtained with 3- or 4-layer networks and was similar for all of them. The exact accuracy values and numbers of parameters of the networks are reported in Tab. 3. As can be seen, the 4- and 7-layer SNU-based network architectures surpassed these state-of-the-art temporal ANN models in test accuracy. Moreover, the soft variant of the SNU in a 4-layer configuration achieved the highest accuracy.
Network | Total # of parameters | Mean accuracy | Maximum accuracy |
---|---|---|---|
GRU 3-layer | 0.9694 | 0.9708 | |
LSTM 4-layer | 0.9699 | 0.9719 | |
RNN 4-layer | 0.9708 | 0.9718 | |
SNU 4-layer | 0.9741 | 0.9754 | |
sSNU 4-layer | 0.9796 | 0.9802 |
To further validate the performance of SNUs, we considered the task of polyphonic music prediction. We used the dataset of Johann Sebastian Bach’s (JSB) chorales that comprises over seven hours of music in 382 pieces, in the form provided by Boulanger-Lewandowski et al. boulanger-lewandowski_modeling_2012 . The piano notes ranging from A0 to C8 were coded as 88-dimensional binary vectors, in which ones correspond to a note being played. They were sequentially fed into a network, which had to predict at each time step the set of notes to be played in the following time step. The goal was to minimize the negative log-probability of the note predictions.
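The evaluation metric can be sketched as follows. This sketch assumes that each of the 88 notes is predicted as an independent Bernoulli probability and that the per-frame negative log-probabilities are averaged over the time steps of a piece, which is a common convention for this benchmark; the exact formulation used in the paper may differ, and the function names are hypothetical.

```python
import math

def frame_nll(probs, targets, eps=1e-12):
    """Negative log-probability of one binary note vector (assumed Bernoulli per note)."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)   # guard against log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total

def sequence_nll(prob_frames, target_frames):
    """Average per-frame negative log-probability over a music piece."""
    losses = [frame_nll(p, t) for p, t in zip(prob_frames, target_frames)]
    return sum(losses) / len(losses)

# One frame with a three-note chord among 88 possible notes.
target = [0] * 88
for note in (39, 43, 46):
    target[note] = 1

uninformed = [0.5] * 88                                   # knows nothing
informed = [0.9 if t == 1 else 0.05 for t in target]      # mostly correct

baseline = sequence_nll([uninformed], [target])
better = sequence_nll([informed], [target])
```

Under this assumption, a predictor that assigns probability 0.5 to every note scores 88 ln 2, roughly 61, per frame, which gives a sense of scale for the single-digit losses reported for this task.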
We compared SNU-based networks against the state-of-the-art ANN results chung_empirical_2014 ; greff_lstm:_2017 obtained with RNNs, GRUs and LSTMs, including a No Input Activation Function (NIAF) variant of an LSTM, in which the input activation function is the identity. These models nevertheless do not surpass the performance of the best task-specific model boulanger-lewandowski_modeling_2012 , which obtained a loss of 5.56. The standard ANN assessment architecture comprises: an input layer receiving a set of input notes; followed by a hidden layer of analyzed units; followed by a softmax output layer predicting the next set of notes.
Network | # of hidden units | # of hidden layer param. | Total # of parameters |
---|---|---|---|
RNN tanh chung_empirical_2014 | | | |
GRU chung_empirical_2014 | | | |
LSTM chung_empirical_2014 | | | |
LSTM greff_lstm:_2017 | | | |
NIAF-LSTM greff_lstm:_2017 | | | |
SNU | | | |
sSNU | | | |
We trained a feed-forward network with 150 SNUs, or sSNUs, with trainable , and . The various network architectures and details are summarized in Tab. 4. Learning was performed using SGD for 2000, or 500 epochs, with the learning rate , or , respectively. The parameter adjustments were applied after presentation of each music piece, following the state-of-the-art convention. We executed the learning for 10 different random initializations and reported the mean of the lowest test negative log-probabilities.
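The reporting convention in the last sentence, the mean of the lowest test negative log-probability across random initializations, can be sketched as below. The function name and the loss values are hypothetical placeholders, not results from the paper.

```python
import numpy as np


def mean_of_lowest_losses(test_loss_curves):
    """Take the lowest test loss of each run, then average over the runs.

    test_loss_curves: one sequence of per-epoch test losses per initialization.
    Returns the mean of the per-run minima and the list of those minima.
    """
    lowest = [float(np.min(curve)) for curve in test_loss_curves]
    return float(np.mean(lowest)), lowest


# Made-up loss curves for two initializations, for illustration only.
curves = [
    [3.0, 2.0, 2.5],
    [4.0, 1.0, 1.5],
]
mean_lowest, per_run_lowest = mean_of_lowest_losses(curves)
```

Because the minimum is taken per run before averaging, runs may have different lengths and may reach their best epoch at different times.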
An architecture with a single feed-forward layer of spiking SNUs performed better than a standard recurrent layer of units, as illustrated in Fig. 8. However, it performed worse than more sophisticated neural units for temporal data processing. The results improved when we used soft SNUs, which can transmit intermediate values and make full use of the output softmax layer. The average negative log-probability of the sSNU was lower than that of GRUs [chung_empirical_2014] with a similar number of parameters, or the average for the best 10% of 200 trials executed for LSTMs and NIAF versions of LSTMs [greff_lstm:_2017], which require significantly more parameters. However, the best single NIAF-LSTM trial was able to achieve 8.38 [greff_lstm:_2017]. In our case, the sSNUs performed consistently, with a minimum of 8.47 close to the mean of 8.49.
For a long time, SNN and ANN research and applications have been developing separately. There has been significant effort to understand the SNN dynamics and to take advantage of their unique capabilities, albeit with limited success compared to the spectacular progress witnessed with ANNs. In this paper, we have tried to unify these neural network architectures by proposing the SNU, which incorporates the spiking neural dynamics in a common ANN framework.
The transformation of the spiking model to an SNU allows SNNs to benefit from the advances in the ANN frameworks and also enables direct comparison of the dynamics of the spiking neural units with the state-of-the-art recurrent units. Moreover, with the sSNU variant, we have generalized the neural dynamics to the non-spiking case. Using this methodology, deep networks consisting of SNUs or sSNUs can be efficiently trained with BPTT. The benchmark results demonstrated that a feed-forward sSNU-based network outperforms conventional RNN-, LSTM- or GRU-based networks in temporal tasks. Therefore, the sSNU offers an alternative stateful model with the lowest number of parameters among the existing models for temporal data processing. Even though the SNU-based network demonstrated slightly inferior performance compared to the sSNU-based networks, the binary input-output characteristics of the spiking communication provide an interesting implementation alternative for low-power AI applications.
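To make the recurrent-unit view concrete, the forward pass of a single layer can be sketched as below. This is an illustrative sketch only. It assumes the unit is formulated as s_t = ReLU(W x_t + l · s_{t-1} · (1 − y_{t-1})) and y_t = h(s_t + b), with h a step function for the SNU and a sigmoid for the sSNU. The names `snu_step`, `decay` and `bias` are hypothetical, and the sketch covers inference only (no gradient for the step function).

```python
import numpy as np


def snu_step(x, s_prev, y_prev, W, decay, bias, soft=False):
    """One time step of an SNU (soft=False) or sSNU (soft=True) layer.

    Assumed formulation (see the lead-in): the state decays by `decay` and is
    reset in proportion to the previous output before new input is added.
    """
    # State update: ReLU keeps the membrane-like state non-negative.
    s = np.maximum(0.0, x @ W + decay * s_prev * (1.0 - y_prev))
    if soft:
        # sSNU: sigmoid output gives graded values and a proportional reset.
        y = 1.0 / (1.0 + np.exp(-(s + bias)))
    else:
        # SNU: binary spike when the state exceeds the threshold -bias.
        y = (s + bias > 0.0).astype(s.dtype)
    return s, y


# Two units with identity input weights (hypothetical values).
W = np.eye(2)
decay, bias = 0.8, -1.0
zeros = np.zeros(2)

s1, y1 = snu_step(np.array([1.5, 0.2]), zeros, zeros, W, decay, bias)
s2, y2 = snu_step(zeros, s1, y1, W, decay, bias)  # no input: reset or decay

soft_s1, soft_y1 = snu_step(np.array([1.5, 0.2]), zeros, zeros, W, decay, bias, soft=True)
soft_s2, soft_y2 = snu_step(zeros, soft_s1, soft_y1, W, decay, bias, soft=True)
```

In the spiking case the first unit fires and its state is fully reset on the next step, while the second unit stays silent and its state only decays. In the soft case the reset is partial, so some state is carried over.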
The proposed SNU family opens many new avenues for future work. It enables us to explore the capabilities of biologically-inspired neural models and benefit from their low computational power as well as their simplicity. It also provides an easy approach to training spiking networks that could increase their adoption for practical applications and would enable power-efficient neuromorphic hardware implementations. Finally, the compatibility of the SNU family with the ANN frameworks and models enables the use of existing or forthcoming ANN accelerators for SNN implementation and deployment.
2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1–9 (2015).
Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
Mirrored STDP implements autoencoder learning in a network of spiking neurons. PLOS Computational Biology 11, 1–25 (2015).
Neuromorphic system with phase-change synapses for pattern learning and feature extraction. In 2017 International Joint Conference on Neural Networks (IJCNN) (IEEE, 2017).
Real-time classification and sensor fusion with a spiking deep belief network. Frontiers in Neuroscience 7 (2013).
STDP-compatible approximation of backpropagation in an energy-based model. Neural Computation 29, 555–577 (2017).
Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on International Conference on Machine Learning, ICML’10, 807–814 (Omnipress, USA, 2010).
On the properties of neural machine translation: Encoder-decoder approaches. In Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-8), 2014 (2014).