1 Introduction
Biological experiments (Markram and Sakmann, 1995; Gerstner et al., 1996) suggest that synaptic change in several types of neurons and stimulation contexts depends on the relative timing of presynaptic and postsynaptic spikes. Learning rules that mimic this observation are called Spike-Timing Dependent Plasticity (STDP) rules. In this paper, we investigate a learning rule that depends only on firing rates and their temporal derivative while being consistent, according to a series of simulations, with the STDP observations.
This learning rule is motivated by the desire to make it easier to study machine learning interpretations for STDP, since artificial neural networks are generally cast in terms of firing rates. Deep neural networks (Goodfellow et al., 2015) have been extremely successful in several major artificial intelligence applications in areas such as computer vision, speech recognition, natural language processing, game playing and robotics. All of these breakthroughs have been achieved thanks to at least two ingredients: (a) a deep enough neural network (with enough layers) and (b) a form of stochastic gradient descent on an objective function of interest, where the gradient is obtained by backpropagation (Rumelhart et al., 1986). To illustrate the potential use of the proposed STDP rule in bridging the gap between backpropagation as used in machine learning and biology, we show that this STDP rule would correspond to stochastic gradient descent on an objective function if the neural activity moved towards reducing that objective function.
As usual, we assume the existence of a monotonically increasing nonlinear transformation $\rho$ from the integrated activity (the expected membrane potential, averaged over the random effects of both pre- and postsynaptic spikes) to the actual firing rate. In deriving a link between STDP and their rate-based learning rule, Xie and Seung (2000) started by assuming a particular pattern relating spike timing and weight change, and then showed that the resulting weight change could be approximated by the product of presynaptic firing rate and temporal rate of change of postsynaptic firing rate. Instead, we go in the other direction, showing that if the weight change is proportional to the product of presynaptic firing rate (or simply the presence of a spike) and the temporal rate of change of the postsynaptic activity, then we recover a relationship between spike timing and weight change that has the characteristics of the one observed experimentally by biologists. We present both an easy-to-understand theoretical justification for this result and simulations that confirm it.
Figure 1. Middle and right: Spike-based simulation shows that when weight updates follow SGD on the proposed predictive objective function, we recover the biologically observed relationship between spike timing difference (horizontal axis, postsynaptic spike time minus presynaptic spike time) and the weight update (vertical axis). Middle: the weight updates are obtained with the proposed update rule (Eq. 1). Right: the weight updates are obtained using the nearest neighbor STDP rule. Compare with the biological finding (left).
2 Spike-timing dependent plasticity
Spike-timing dependent plasticity (STDP) is a central subject of research in synaptic plasticity, but much more research is needed to solidify the links between STDP and a machine learning interpretation of it at the scale of a whole network, i.e., with “hidden layers” which need to receive a useful training signal. See Markram et al. (2012) for a recent review of the neuroscience research on STDP.
We present here a simple view of STDP that has been observed at least in some common neuron types and preparations. There is a weight change if there is a presynaptic spike in the temporal vicinity of a postsynaptic spike: that change is positive if the postsynaptic spike happens just after the presynaptic spike (and larger if the timing difference is small), negative if it happens just before (and again, larger if the timing difference is small, on the order of a few milliseconds), as illustrated with the biological data shown in Figure 1 (left) from Bi and Poo (2001). The amount of change decays to zero as the temporal difference between the two spikes increases beyond a few tens of milliseconds. We are thus interested in this temporal window around a presynaptic spike during which a postsynaptic neuron spikes, before or after the presynaptic spike, and this induces a change of the weight.
Keep in mind that the above pattern of spike-timing dependence is only one aspect of synaptic plasticity. See Feldman (2012) for an overview of aspects of synaptic plasticity that go beyond spike timing, including firing rate (as one would expect from this work) but also synaptic cooperativity (nearby synapses on the same dendritic subtree) and depolarization (due to multiple consecutive pairings or spatial integration across nearby locations on the dendrite, as well as the effect of the synapse’s distance to the soma).
2.1 Rate-based aspect of STDP
Let $s_i$ represent an abstract variable characterizing the integrated activity of neuron $i$, with $\dot{s}_i$ being its temporal rate of change. To give a precise meaning to $s_i$, we define $\rho(s_i)$ as being the firing rate of neuron $i$, where $\rho$ is a bounded nonlinear activation function that converts the integrated voltage potential into a probability of firing.
A hypothesis inspired by Xie and Seung (2000) and Hinton (2007) is that the STDP weight change can be associated with the temporal rate of change of postsynaptic activity, as a proxy for its association with post minus presynaptic spike times. Both of the above contributions focus on the rate aspect of neural activity, and this is also the case of this paper.
Using the notation introduced above, the proposed equation for the average weight change for the synapse associated with presynaptic neuron $j$ and postsynaptic neuron $i$ is the following:
$$\Delta W_{ij} \propto \dot{s}_i \, \rho(s_j) \qquad (1)$$
Since stochastic gradient descent only cares about the average change, we could as well have written
$$\Delta W_{ij} \propto \dot{s}_i \, \xi_j \qquad (2)$$
where $\xi_j$ is the binary indicator of a spike from neuron $j$. This works if we assume that the input spikes are randomly drawn from a Poisson process with a rate proportional to $\rho(s_j)$, i.e., we ignore spike synchrony effects (which we will do in the rest of this paper, and leave for future work to investigate). Note that in our simulations, we approximated the Poisson process by performing a discrete-time simulation with a binomial draw of the binary decision (spike versus no spike) within the time interval of interest.
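The equivalence in expectation between the rate-based form (Eq. 1) and the spike-based form (Eq. 2) can be checked with a minimal sketch. It assumes a logistic sigmoid for the activation function, a proportionality constant of 1, and one Bernoulli draw per time step; the function names are hypothetical and are not taken from the authors' scripts.

```python
import math
import random


def rho(s):
    """Bounded, monotonically increasing activation (logistic sigmoid, an assumption)."""
    return 1.0 / (1.0 + math.exp(-s))


def rate_update(s_dot_post, s_pre, lr=1.0):
    """Eq. 1: postsynaptic slope times presynaptic firing rate."""
    return lr * s_dot_post * rho(s_pre)


def spike_update(s_dot_post, pre_spike, lr=1.0):
    """Eq. 2: postsynaptic slope times a binary presynaptic spike indicator."""
    return lr * s_dot_post * pre_spike


def mean_spike_update(s_dot_post, s_pre, n_samples=100_000, seed=0):
    """Average Eq. 2 over spikes drawn with per-step probability rho(s_pre)."""
    rng = random.Random(seed)
    p = rho(s_pre)
    total = 0.0
    for _ in range(n_samples):
        spike = 1 if rng.random() < p else 0
        total += spike_update(s_dot_post, spike)
    return total / n_samples


if __name__ == "__main__":
    print(rate_update(0.5, 0.2), mean_spike_update(0.5, 0.2))
```

With enough samples the two printed values should agree closely, which is all that a stochastic gradient method requires of the update.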
Figure 2. Left: When the postsynaptic rate (y-axis) is rising in time (x-axis), consider a presynaptic spike (middle dotted vertical line) and a window of sensitivity before and after (bold red window). Because the firing probability is greater on the right sub-window, one is more likely to find a spike there than in the left sub-window, and it is more likely to be close to the presynaptic spike if that slope is higher, given that when spikes occur on both sides, no weight update occurs. This induces the appropriate correlation between spike timing and the temporal slope of the postsynaptic activity level, which is confirmed by the simulation results of Section 3 and the results on the right. Right: The average STDP update according to the nearest neighbor STDP rule (vertical axis) versus the weight update according to the proposed rule (Eq. 1) (horizontal axis). We see that on average over samples (obtained by binning the x-axis values), the two rules agree closely, in agreement with the visual inspection of Fig. 1.
2.2 Why it works
Consider a rising postsynaptic activity level, i.e., $\dot{s}_i > 0$, as in Figure 2 (left) and a presynaptic spike occurring somewhere during that rise. We assume a Poisson process for the postsynaptic spike as well, with a rate proportional to $\rho(s_i)$. According to this assumption, postsynaptic spikes are more likely in a fixed time window following the presynaptic spike than in a window of the same length preceding it. Therefore, if only one spike happens over the window, it is more likely to be after the presynaptic spike, yielding a positive spike timing difference, at the same time as a positive weight change. The situation would be symmetrically reversed if the activity level was decreasing, i.e., $\dot{s}_i < 0$, in which case negative spike timing differences are more likely.
Furthermore, the steeper the slope $\dot{s}_i$, the greater this effect will be, also reducing the relative spike timing between the presynaptic spike and stochastically occurring postsynaptic spikes, possibly just after or just before the presynaptic spike. This assumes that when spikes occur on both sides and with about the same time difference, the effects on $W_{ij}$ cancel each other. Of course, the above reasoning runs directly in reverse when the slope of $s_i$ is negative, and one gets negative updates, on average. To validate these hypotheses, we ran the simulations presented in Section 3. They confirm the hypotheses and show that Eq. 1 yields a relationship between spike timing difference and weight change that is consistent with biological observations (Fig. 1).
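The window argument above can be illustrated with a small Monte Carlo sketch. It assumes a postsynaptic spike probability that varies linearly across the window (a stand-in for a rising or falling rate), a presynaptic spike at time 0, and a window of 20 steps on each side; these choices and the function names are illustrative assumptions rather than the authors' simulation.

```python
import random


def one_sided_timing(p0, slope, half_window=20, rng=random):
    """Return the nearest post-minus-pre timing, or None if no update applies.

    The presynaptic spike sits at t = 0. At each step t of the window the
    postsynaptic neuron spikes with probability p0 + slope * t, which must
    stay inside [0, 1] for the chosen parameters.
    """
    before = [t for t in range(-half_window, 0) if rng.random() < p0 + slope * t]
    after = [t for t in range(1, half_window + 1) if rng.random() < p0 + slope * t]
    if bool(before) == bool(after):  # spikes on both sides, or none at all
        return None
    return min(after) if after else max(before)


def fraction_positive(p0, slope, n_trials=20_000, seed=0):
    """Fraction of qualifying trials whose timing difference is positive."""
    rng = random.Random(seed)
    timings = [one_sided_timing(p0, slope, rng=rng) for _ in range(n_trials)]
    timings = [t for t in timings if t is not None]
    return sum(t > 0 for t in timings) / len(timings)


if __name__ == "__main__":
    for slope in (0.002, 0.0, -0.002):
        print(slope, fraction_positive(0.05, slope))
```

Under these assumptions a positive slope should give mostly positive timing differences, a negative slope mostly negative ones, and a flat rate a roughly even split.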
3 Simulation results
3.1 Method
We simulate random variations in a presynaptic neural firing rate $\rho(s_j)$ as well as random variations in the postsynaptic firing rate $\rho(s_i)$ induced by an externally driven voltage. By exploring many configurations of variations and levels at the pre- and postsynaptic sides, we hope to cover the possible natural variations. We generate and record pre- and postsynaptic spikes sampled according to a binomial at each discrete time step $t$ with probability proportional to $\rho(s_j)$ and $\rho(s_i)$ respectively, and record $\dot{s}_i$ as well, in order to implement either a classical nearest neighbor STDP update rule or Eq. 1. (Python scripts for those simulations are available at http://www.iro.umontreal.ca/bengioy/src/STDPsimulations.) The nearest neighbor STDP rule is as follows. For every presynaptic spike, we consider a window of 20 time steps before and after. If there are postsynaptic spikes in both the left and right windows, or no postsynaptic spike at all, the weight is not updated. Otherwise, we measure the time difference between the closest postsynaptic spike (nearest neighbor) and the presynaptic spike and compute the weight change using the current variable values. If both spikes coincide, we make no weight change. To compute the appropriate averages, 500 random sequences of rates are generated, each of length 160 time steps, and 1000 randomly sampled spike trains are generated according to these rates.
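The selection step of the nearest neighbor rule described above can be sketched as follows. This is one reading of the description, not the authors' script: spike trains are assumed to be equal-length lists of 0/1 values, windows are truncated at the ends of the sequence, and a postsynaptic spike in the same time step as the presynaptic spike is treated as a coincidence. The function name is hypothetical, and the size of the weight change is left to the caller.

```python
def nearest_neighbor_pairs(pre_spikes, post_spikes, window=20):
    """Return (t_pre, dt) for each presynaptic spike that qualifies for an update.

    dt is the postsynaptic spike time minus the presynaptic spike time for
    the nearest postsynaptic spike inside the window.
    """
    n = len(pre_spikes)
    pairs = []
    for t_pre in range(n):
        if not pre_spikes[t_pre]:
            continue
        if post_spikes[t_pre]:  # coincident spikes: no weight change
            continue
        lo = max(0, t_pre - window)
        hi = min(n - 1, t_pre + window)
        before = [t for t in range(lo, t_pre) if post_spikes[t]]
        after = [t for t in range(t_pre + 1, hi + 1) if post_spikes[t]]
        if bool(before) == bool(after):  # spikes on both sides, or none
            continue
        nearest = after[0] if after else before[-1]
        pairs.append((t_pre, nearest - t_pre))
    return pairs


if __name__ == "__main__":
    pre = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
    post = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
    print(nearest_neighbor_pairs(pre, post))
```

Averaging the resulting updates within bins of `dt` gives a curve of the kind compared against the biological data in Fig. 1.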
For measuring the effect of weight changes, we measure the average squared rate of change in two conditions: with weight changes (according to Eq. 1), and without.
3.2 Results
Examples of the spike sequences and underlying pre- and postsynaptic states $s_j$ and $s_i$ are illustrated in Fig. 3.
Fig. 1 (middle and right) shows the results of these simulations, comparing the weight change obtained at various spike timing differences for Eq. 1 and for the nearest neighbor STDP rule, both matching well the biological data (Fig. 1, left). Fig. 2 shows that both update rules are strongly correlated, in the sense that for a given amount of weight change induced by one, we observe on average a linearly proportional weight change by the other.
3.3 Link to Stochastic Gradient Descent and Back-Propagation
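Let us consider the common simplification in which postsynaptic neural activity $s_i$ is obtained as a sum with the usual terms proportional to the product of synaptic weight ($W_{ij}$) and presynaptic firing rate ($\rho(s_j)$), i.e., in discrete time,
$$s_{t+1,i} = \alpha_{t,i} + \beta_{t,i} \sum_j W_{ij} \, \rho(s_{t,j})$$
from some quantities $\alpha_{t,i}$ and $\beta_{t,i}$. In that case,
$$\frac{\partial s_{t+1,i}}{\partial W_{ij}} = \beta_{t,i} \, \rho(s_{t,j}).$$
Stochastic gradient descent on $W$ with respect to an objective function $J$ would follow
$$\Delta W_{ij} \propto -\frac{\partial J}{\partial s_i} \frac{\partial s_i}{\partial W_{ij}} \propto -\frac{\partial J}{\partial s_i} \, \rho(s_j) \qquad (3)$$
Hence, if
$$\dot{s}_i \propto -\frac{\partial J}{\partial s_i} \qquad (4)$$
our STDP rule (Eq. 1) would produce stochastic gradient descent on $J$.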
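The argument can be checked numerically on a toy case. The sketch below assumes a single postsynaptic unit, a logistic sigmoid for the activation function, a quadratic objective on the postsynaptic activity, and a postsynaptic slope set equal to the negative gradient of that objective (the assumption of Eq. 4, with the sign chosen for descent). The offset and gain `alpha` and `beta`, the `target` value, and all function names are hypothetical.

```python
import math


def rho(s):
    """Logistic sigmoid used as the firing-rate nonlinearity (an assumption)."""
    return 1.0 / (1.0 + math.exp(-s))


def post_activity(weights, pre_states, alpha=0.0, beta=1.0):
    """Postsynaptic activity: offset plus gain times the weighted presynaptic rates."""
    return alpha + beta * sum(w * rho(s) for w, s in zip(weights, pre_states))


def objective(s_post, target):
    """Toy objective: squared distance of the activity from a target value."""
    return 0.5 * (s_post - target) ** 2


def stdp_step(weights, pre_states, target, lr=0.1):
    """Apply Eq. 1 with the slope given by Eq. 4, i.e. s_dot = -dJ/ds."""
    s_post = post_activity(weights, pre_states)
    s_dot = -(s_post - target)
    return [w + lr * s_dot * rho(s) for w, s in zip(weights, pre_states)]


if __name__ == "__main__":
    w = [0.2, -0.1, 0.4]
    pre = [0.5, -1.0, 2.0]
    before = objective(post_activity(w, pre), 1.0)
    after = objective(post_activity(stdp_step(w, pre, 1.0), pre), 1.0)
    print(before, after)
```

For a small enough learning rate and a positive gain, the objective after the update should be lower than before, which is what gradient descent on the weights would give.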
The assumption in Eq. 4 and the above consequence were first suggested by Hinton (2007): neurons would change their average firing rate so as to make the network as a whole produce configurations corresponding to better values of our objective function. This would make STDP do gradient descent on the prediction errors made by the network.
But how could neural dynamics have that property? That question is not completely answered, but as shown in Bengio and Fischer (2015), a network with symmetric feedback weights would have the property that a small perturbation of “output units” towards better predictions would propagate to internal layers such that hidden units would move to approximately follow the gradient of the prediction error with respect to $s$, making our STDP rule correspond approximately to gradient descent on the prediction error. The experiments reported by Scellier and Bengio (2016) have actually shown that these approximations work and enable a supervised multi-layer neural network to be trained.
4 Related work
The closest work to this paper is probably that of Xie and Seung (2000), which we have already discussed. Notable additions to this work include demonstrating that the spike-timing to weight change relationship is a consequence of Eq. 1, rather than the other way around (and a small difference in the use of $\dot{s}_i$ versus $\frac{d}{dt}\rho(s_i)$).
There are of course many other papers on theoretical interpretations of STDP, and the reader can find many references in Markram et al. (2012), but more work is needed to explore the connection of STDP to machine learning. Many approaches (Fiete and Seung, 2006; Rezende and Gerstner, 2014) that propose how hidden layers of biological neurons could get credit assignment rely on variants of the REINFORCE algorithm (Williams, 1992) to estimate the gradient of a global objective function (basically by correlating stochastic variations at each neuron with the changes in the global objective). Although this principle is simple, it is not clear that it will scale to very large networks due to the linear growth of the variance of the estimator with the number of neurons. It is therefore tempting to explore other avenues.
The proposed rule can be contrasted with theoretical synaptic learning rules which tend to be based on the Hebbian product of pre and postsynaptic activity, such as the BCM rule (Bienenstock et al., 1982; Intrator and Cooper, 1992). The main difference is that the rule studied here involves the temporal derivative of the postsynaptic activity, rather than the actual level of postsynaptic activity.
5 Conclusion and Open Questions
We have shown through simulations and a qualitative argument that updating weights in proportion to the rate of change of postsynaptic activity times the presynaptic activity yields behavior similar to the STDP observations, in terms of spike timing differences.
This is consistent with a view introduced by Hinton (2007) that neurons move towards minimizing some prediction error because it would make this STDP rule perform stochastic gradient descent on that prediction error. In parallel to this work, Bengio and Fischer (2015) have shown how early propagation of perturbations in a recurrent network behaves like gradient propagation in a layered network, and could compute the gradients necessary for learning using our proposed STDP update. Scellier and Bengio (2016) have actually shown through experiments using this approach that it is possible to train deep supervised neural networks.
Many open questions also remain on the side of biological plausibility. STDP only describes one view of synaptic plasticity, and it may be different in different neurons and preparations. A particularly interesting type of result to consider in the future regards the relationship between more than two spikes induced statistically by the proposed STDP rule. It would be interesting to compare the statistics of triplets or quadruplets of spike timings and weight change under the proposed rule against the corresponding biological observations (Froemke and Dan, 2002).
Future work should also attempt to reconcile the theory with neuroscience experiments showing more complex relationships between spikes and weight change, involving factors other than timing, such as firing rate, of course, but also synaptic cooperativity and depolarization (Feldman, 2012). In addition to these factors, some synaptic changes do not conform to the STDP pattern, either in sign (anti-Hebbian STDP) or in nature (for example incorporating a general Hebbian element as in the BCM rule). In this regard, although this work suggests that some spike timing behavior may be explained purely in terms of the trajectory of the rates, it remains an open question if (and where) precise timing of spikes remains essential to explain network-level learning behavior from a machine learning perspective. It is interesting to note in this regard how the BCM rule was adapted to account for the relative timing of spike triplets (Gjorgjieva et al., 2011).
Acknowledgments
The authors would like to thank Benjamin Scellier, DongHyun Lee, Jyri Kivinen, Jorg Bornschein, Roland Memisevic and Tim Lillicrap for feedback and discussions, as well as NSERC, CIFAR, Samsung and Canada Research Chairs for funding, as well as Compute Canada for computing resources.
References
 Bengio and Fischer (2015) Bengio, Y. and Fischer, A. (2015). Early inference in energy-based models approximates back-propagation. Technical Report arXiv:1510.02777, Universite de Montreal.
 Bi and Poo (2001) Bi, G. and Poo, M. (2001). Synaptic modification by correlated activity: Hebb’s postulate revisited. Annu. Rev. Neurosci., 24, 139–166.
 Bienenstock et al. (1982) Bienenstock, E. L., Cooper, L. N., and Munro, P. W. (1982). Theory for the development of neuron selectivity: Orientation specificity and binocular interaction in visual cortex. Journal of Neuroscience, 2.
 Feldman (2012) Feldman, D. E. (2012). The spike timing dependence of plasticity. Neuron, 75(4), 556–571.
 Fiete and Seung (2006) Fiete, I. R. and Seung, H. S. (2006). Gradient learning in spiking neural networks by dynamic perturbations of conductances. Physical Review Letters, 97(4).
 Froemke and Dan (2002) Froemke, R. C. and Dan, Y. (2002). Spike-timing-dependent synaptic modification induced by natural spike trains. Nature, 416(6879), 433–438.
 Gerstner et al. (1996) Gerstner, W., Kempter, R., van Hemmen, J., and Wagner, H. (1996). A neuronal learning rule for sub-millisecond temporal coding. Nature, 386, 76–78.
 Gjorgjieva et al. (2011) Gjorgjieva, J., Clopath, C., Audet, J., and Pfister, J.-P. (2011). A triplet spike-timing-dependent plasticity model generalizes the Bienenstock–Cooper–Munro rule to higher-order spatiotemporal correlations. PNAS, 108(48).
 Goodfellow et al. (2015) Goodfellow, I. J., Courville, A., and Bengio, Y. (2015). Deep learning. Book in preparation for MIT Press.

 Hinton (2007) Hinton, G. E. (2007). How to do backpropagation in a brain. Invited talk at the NIPS’2007 Deep Learning Workshop.
 Intrator and Cooper (1992) Intrator, N. and Cooper, L. N. (1992). Objective function formulation of the BCM theory of visual cortical plasticity: statistical connections, stability conditions. Neural Networks, 5, 3–17.
 Markram and Sakmann (1995) Markram, H. and Sakmann, B. (1995). Action potentials propagating back into dendrites triggers changes in efficacy. Soc. Neurosci. Abs, 21.
 Markram et al. (2012) Markram, H., Gerstner, W., and Sjöström, P. (2012). Spike-timing-dependent plasticity: A comprehensive overview. Frontiers in synaptic plasticity, 4(2).
 Rezende and Gerstner (2014) Rezende, D. J. and Gerstner, W. (2014). Stochastic variational learning in recurrent spiking networks. Frontiers in Computational Neuroscience, 8(38).
 Rumelhart et al. (1986) Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323, 533–536.
 Scellier and Bengio (2016) Scellier, B. and Bengio, Y. (2016). Towards a biologically plausible backprop. arXiv:1602.05179.
 Shepherd (2003) Shepherd, G. M. (2003). The synaptic organization of the brain. Oxford University Press.

 Williams (1992) Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8, 229–256.
 Xie and Seung (2000) Xie, X. and Seung, H. S. (2000). Spike-based learning rules and stabilization of persistent neural activity. In S. Solla, T. Leen, and K. Müller, editors, Advances in Neural Information Processing Systems 12, pages 199–208. MIT Press.