Notes on Generalized Linear Models of Neurons

by Jonathon Shlens, et al.

Experimental neuroscience increasingly requires tractable models for analyzing and predicting the behavior of neurons and networks. The generalized linear model (GLM) is an increasingly popular statistical framework for analyzing neural data that is flexible, exhibits rich dynamic behavior and is computationally tractable (Paninski, 2004; Pillow et al., 2008; Truccolo et al., 2005). What follows is a brief summary of the primary equations governing the application of GLM's to spike trains, with a few sentences linking this work to the larger statistical literature. Later sections include extensions of a basic GLM to model spatio-temporal receptive fields as well as network activity in an arbitrary number of neurons.




I Poisson likelihood

A common starting point for modeling a neuron is the idealization of a Poisson process Dayan and Abbott (2001). If $n$ is the observed number of spikes, the probability of such an event is given by a Poisson distribution

$$P(n) = \frac{\mu^n}{n!}\, e^{-\mu} \qquad (1)$$

where $\mu = \lambda \Delta$ is the expected number of spikes in a small unit of time $\Delta$ and $\lambda$ is the intensity of the Poisson process.
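A minimal numerical check of the Poisson probability above (the function and variable names are ours, not from the text):

```python
import math

def poisson_pmf(n, mu):
    """Probability of observing n spikes when the expected count in the
    window is mu = lambda * dt: P(n) = mu^n e^(-mu) / n!."""
    return mu ** n * math.exp(-mu) / math.factorial(n)

# Probabilities over all counts sum to one, e.g. for mu = 2 expected spikes:
total = sum(poisson_pmf(n, 2.0) for n in range(50))
```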

A spike train is defined as a vector of spike counts $\mathbf{y} = \{y_t\}$ binned at a brief time resolution $\Delta$ and indexed by time $t$. The likelihood that the spike train arises from a time-varying (inhomogeneous) Poisson process with a conditional (time-varying) intensity $\lambda_t$ is the product of independent observations,

$$P(\mathbf{y} \mid \boldsymbol{\lambda}) = \prod_t \frac{(\lambda_t \Delta)^{y_t}}{y_t!}\, e^{-\lambda_t \Delta}$$

where $\boldsymbol{\lambda} = \{\lambda_t\}$ refers to the collection of intensities over the entire spike train. The log-likelihood of observing the entire spike train is

$$\log P(\mathbf{y} \mid \boldsymbol{\lambda}) = \sum_t \big[\, y_t \log(\lambda_t \Delta) - \lambda_t \Delta - \log(y_t!) \,\big].$$

In a statistical setting our goal is to infer the hidden variables $\boldsymbol{\lambda}$ from the observed spike train $\mathbf{y}$. Many criteria exist for judging the quality of values for $\boldsymbol{\lambda}$ Kay (1993); Moon and Stirling (1999). Selecting the particular set of values of $\boldsymbol{\lambda}$ that maximizes the likelihood (or log-likelihood) is one such criterion that is intuitive and relatively easy to calculate. The goal of this manuscript is to calculate the maximum likelihood estimate of the hidden variables by optimizing (maximizing) the log-likelihood.

In our setting, the spike train $\mathbf{y}$ has been observed and is therefore fixed. The log-likelihood is solely a function of the (unknown) conditional intensities $\boldsymbol{\lambda}$. To emphasize our application of likelihood, we relabel the log-likelihood $L(\boldsymbol{\lambda})$ and group the terms independent of $\boldsymbol{\lambda}$ into an arbitrary normalization constant $c$,

$$L(\boldsymbol{\lambda}) = \sum_t y_t \log \lambda_t - \Delta \sum_t \lambda_t + c.$$

In practice, models of neurons are specified at fine temporal resolutions, thus $\mathbf{y}$ is effectively binary, if not largely sparse. The binary representation of $\mathbf{y}$ simplifies the log-likelihood further,

$$L(\boldsymbol{\lambda}) = \sum_{t \,:\, y_t = 1} \log \lambda_t - \Delta \sum_t \lambda_t + c. \qquad (2)$$

In the final form of the log-likelihood, we ignore the normalization constant $c$ because the goal of the optimization is to determine the hidden variables $\boldsymbol{\lambda}$ (not the absolute value of the likelihood).

II Generalized Linear Model

Figure 1: Diagram of a generalized linear model of a single neuron (a) and a network of neurons (b).

A GLM can attribute the variability of a spike train to a rich number of factors including the stimulus, spike history and network dynamics. The GLM extends the maximum likelihood procedure to more interesting variables than $\boldsymbol{\lambda}$ by positing that the conditional intensity is directly related to biophysically interesting parameters. For example, pretend there is some parameter $\mathbf{k}$, a neuron’s receptive field. If we posit that the receptive field is linearly related to some known (and invertible) function of $\lambda_t$, then in principle it is just as easy to infer $\mathbf{k}$ as it is to infer $\boldsymbol{\lambda}$. By positing a linear relationship, the stimulus filter $\mathbf{k}$ is quite simple to estimate, but surprisingly provides a rich structure for modeling the neural response.

More precisely, the goal of a single neuron GLM is to predict the current number of spikes $y_t$ using the recent spiking history and the preceding stimulus. Let $\mathbf{s}_t$ represent the vector of preceding stimuli up to but not including time $t$. Let $\mathbf{y}_t$ be a vector of preceding spike counts up to but not including time $t$.

Importantly, note that $\mathbf{s}_t$ and $\mathbf{y}_t$ are fixed, known variables but $y_t$ is an unknown random variable. We posit that $y_t$ is distributed according to a Poisson distribution whose conditional intensity $\lambda_t$ is related to the stimulus and previous spiking history by

$$g(\lambda_t) = \mathbf{k} \cdot \mathbf{s}_t + \mathbf{h} \cdot \mathbf{y}_t + \mu \qquad (3)$$

where $\mathbf{k}$ is a stimulus filter of the neuron (i.e. receptive field), $\mathbf{h}$ is a post-spike filter to account for spike history dynamics (e.g. refractoriness, bursting, etc.) and $\mu$ is a constant bias to match the neuron’s firing rate. Figure 1a provides a diagram of Equation 3. Each term of the conditional intensity increases or decreases the average firing rate depending on the preceding stimulus (term 1) and the spiking history of the neuron (term 2). $g$ is an arbitrary invertible function termed a link function. One possibility is to select $g(\cdot) = \log(\cdot)$, termed the natural link function, because it conveniently simplifies the likelihood calculation and the interpretation of individual terms McCullagh and Nelder (1989). In this case $\lambda_t = \exp(\mathbf{k} \cdot \mathbf{s}_t)\exp(\mathbf{h} \cdot \mathbf{y}_t)\,e^{\mu}$, so a stimulus preceding time $t$ that closely matches the filter $\mathbf{k}$ increases the average spike rate by a multiplicative gain $e^{\mathbf{k} \cdot \mathbf{s}_t}$. Likewise, if a spike occurs just prior to time $t$, the convolution of the spike occurrence (a delta function) with the post-spike filter $\mathbf{h}$ diagrammed in Figure 1a decreases the probability of a spike by a factor $e^{\mathbf{h} \cdot \mathbf{y}_t}$ to mimic a refractory period.
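The generative direction of Equation 3 can be sketched in a short simulation with the natural (log) link. All filter values and names below are toy choices of ours, not taken from the text; the filters are stored oldest-to-newest to align with the history windows:

```python
import numpy as np

# Simulate spikes from a single-neuron GLM:
# lambda_t = exp(k . s_t + h . y_t + mu)
rng = np.random.default_rng(0)
dt = 0.001                         # 1 ms bins
T = 2000                           # 2 seconds of simulation
d = 3                              # filter length in bins

k = np.array([0.5, 1.0, 0.5])      # stimulus filter (oldest -> newest bin)
h = np.array([-0.5, -2.0, -5.0])   # post-spike filter: strong suppression
                                   # right after a spike (refractoriness)
mu = np.log(20.0)                  # bias giving a ~20 Hz baseline rate

stim = rng.standard_normal(T)
y = np.zeros(T, dtype=int)
lam = np.zeros(T)

for t in range(d, T):
    s_t = stim[t - d:t]            # preceding stimuli, not including t
    y_t = y[t - d:t]               # preceding spike counts, not including t
    lam[t] = np.exp(k @ s_t + h @ y_t + mu)
    y[t] = rng.poisson(lam[t] * dt)
```

Because $\mathbf{h}$ is negative here, a spike transiently multiplies the intensity by $e^{\mathbf{h} \cdot \mathbf{y}_t} < 1$, mimicking a refractory period.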

It is worth noting at this point that the name generalized linear model refers to the requirement that $g(\lambda_t)$ is linearly related to the parameters of interest, $\theta = \{\mathbf{k}, \mathbf{h}, \mu\}$. Combining equations 2 and 3, the log-probability of observing the complete spike train is

$$L(\theta) = \sum_{t \,:\, y_t = 1} \big( \mathbf{k} \cdot \mathbf{s}_t + \mathbf{h} \cdot \mathbf{y}_t + \mu \big) - \Delta \sum_t e^{\mathbf{k} \cdot \mathbf{s}_t + \mathbf{h} \cdot \mathbf{y}_t + \mu} + c. \qquad (4)$$


Importantly, the likelihood is concave everywhere in the parameter space $\theta$, thus no local maxima exist and ascending the gradient leads to a single, unique global maximum. To calculate the maximum likelihood estimate of $\theta$ one must calculate the gradient and Hessian of the likelihood with respect to each variable,

Gradient:

$$\frac{\partial L}{\partial \mu} = N - \Delta \sum_t \lambda_t, \qquad \frac{\partial L}{\partial \mathbf{k}} = \sum_{t \,:\, y_t = 1} \mathbf{s}_t - \Delta \sum_t \lambda_t \mathbf{s}_t, \qquad \frac{\partial L}{\partial \mathbf{h}} = \sum_{t \,:\, y_t = 1} \mathbf{y}_t - \Delta \sum_t \lambda_t \mathbf{y}_t$$

Hessian:

$$\frac{\partial^2 L}{\partial \mu^2} = -\Delta \sum_t \lambda_t, \qquad \frac{\partial^2 L}{\partial \mathbf{k}\, \partial \mathbf{k}^T} = -\Delta \sum_t \lambda_t \mathbf{s}_t \mathbf{s}_t^T, \qquad \frac{\partial^2 L}{\partial \mathbf{h}\, \partial \mathbf{h}^T} = -\Delta \sum_t \lambda_t \mathbf{y}_t \mathbf{y}_t^T$$

(with analogous cross terms, e.g. $\partial^2 L / \partial \mathbf{k}\, \partial \mu = -\Delta \sum_t \lambda_t \mathbf{s}_t$), where $N$ denotes the total number of spikes in the spike train. This system of equations is solved by standard gradient-ascent algorithms (e.g. Newton-Raphson), permitting an estimate of the model parameters for a given spike train and stimulus movie. In practice, a model with 30 parameters can be fit with roughly 200 spikes.
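The gradient and Hessian above can be checked numerically by fitting simulated data with Newton-Raphson, as the text suggests. A sketch under toy parameters of our choosing (all names ours); for simplicity only the stimulus filter and bias are fit, so the history term is omitted:

```python
import numpy as np

rng = np.random.default_rng(1)
dt, T, d = 0.01, 5000, 5

k_true = 0.5 * rng.standard_normal(d)        # hypothetical "true" filter
mu_true = np.log(10.0)                       # ~10 Hz baseline rate

# Design matrix: row t holds the d stimulus samples preceding bin t.
stim = rng.standard_normal(T + d)
S = np.stack([stim[t:t + d] for t in range(T)])
lam_true = np.exp(S @ k_true + mu_true)
y = rng.poisson(lam_true * dt)               # simulated spike counts

# Newton-Raphson ascent with the log link: for theta = [k, mu],
# grad = X^T (y - dt * lam) and Hessian = -dt * X^T diag(lam) X.
X = np.hstack([S, np.ones((T, 1))])
theta = np.zeros(d + 1)
theta[-1] = np.log(y.sum() / (T * dt))       # start mu at the mean rate
for _ in range(15):
    lam = np.exp(X @ theta)
    grad = X.T @ (y - dt * lam)
    H = (X.T * (dt * lam)) @ X               # negative of the Hessian
    theta += np.linalg.solve(H, grad)

k_hat, mu_hat = theta[:d], theta[-1]
```

Because the log-likelihood is concave in $\theta$, this ascent has a single global maximum and the estimate lands near the generating parameters.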

III Receptive fields in space and time

Generalizing the GLM to multiple spatial dimensions, the conditional intensity retains the same form (for the remainder of this manuscript we drop the subscript $t$ on $\lambda$, $\mathbf{s}$ and $\mathbf{y}$ for notational simplicity),

$$\lambda = \exp\Big( \sum_i \mathbf{k}_i \cdot \mathbf{s}_i + \mathbf{h} \cdot \mathbf{y} + \mu \Big)$$

with the inner product of the stimulus filter and the stimulus summed over all spatial locations indexed by $i$. In practice, though, the number of parameters in the stimulus filter grows as the product of the number of spatial locations and the number of samples in the receptive field’s integration period, making any estimate computationally slow and potentially intractable. To exploit matrix algebra in further equations, define the matrices $K = [\mathbf{k}_1, \mathbf{k}_2, \ldots]$ and $S = [\mathbf{s}_1, \mathbf{s}_2, \ldots]$. Rather than letting the number of parameters grow multiplicatively, one simplification is to posit that $K$ is space-time separable, meaning that $K$ can be written as an outer product,

$$K = \mathbf{k}_s \mathbf{k}_t^T$$

where the vectors $\mathbf{k}_s$ and $\mathbf{k}_t$ independently capture the spatial and temporal components of the receptive field, respectively. Rewriting the conditional intensity with matrix algebra,

$$\lambda = \exp\big( \mathbf{k}_s^T S\, \mathbf{k}_t + \mathbf{h} \cdot \mathbf{y} + \mu \big),$$

the model is not linear, but quadratic, in parameter space and the likelihood surface is not concave everywhere (i.e. local maxima may exist). In particular, the lack of concavity arises from the new Hessian term measuring the dependency between the space and time filters,

$$\frac{\partial^2 L}{\partial \mathbf{k}_s\, \partial \mathbf{k}_t^T} = \sum_{t \,:\, y_t = 1} S - \Delta \sum_t \lambda \big[ S + (S \mathbf{k}_t)(S^T \mathbf{k}_s)^T \big].$$

Ascending the gradient, however, produces a good approximation of the standard model but with a notable reduction in the number of parameters. The gradient formulae are all the same as Section II if one replaces $\mathbf{s}$ with $S \mathbf{k}_t$ for the derivatives associated with $\mathbf{k}_s$, and with $S^T \mathbf{k}_s$ for the derivatives associated with $\mathbf{k}_t$.
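A small numerical sketch (dimensions and names ours) of the bilinear identity behind the separable model: the drive $\mathbf{k}_s^T S \mathbf{k}_t$ equals the full space-time inner product, and fixing one filter turns the stimulus drive back into the linear form of Section II:

```python
import numpy as np

rng = np.random.default_rng(2)
n_space, n_time = 8, 6

# A full filter needs n_space * n_time parameters; a separable one needs
# only n_space + n_time.
k_s = rng.standard_normal(n_space)           # spatial component
k_t = rng.standard_normal(n_time)            # temporal component
K = np.outer(k_s, k_t)                       # separable filter K = k_s k_t^T

S = rng.standard_normal((n_space, n_time))   # one stimulus window

# The stimulus drive k_s^T S k_t equals the full space-time inner product.
drive_sep = k_s @ S @ k_t
drive_full = np.sum(K * S)

# Holding k_t fixed, S @ k_t plays the role of the stimulus vector in the
# Section II gradients for k_s (and S.T @ k_s for k_t), so each coordinate
# update is a concave problem even though the joint problem is not.
```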

IV Networks of neurons

The most significant feature of networks of neurons is that the activity of one neuron can strongly influence the activity of surrounding neurons. One common example of correlated activity is synchronized firing, when two or more neurons fire nearly simultaneously more often than expected by chance. Importantly, correlated activity can be independent of shared stimulus drive and instead reflect an underlying biophysical mechanism (e.g. common input, electrical or synaptic coupling, etc.).

A naive implementation of the GLM (Equation 3) for two neurons with conditional intensities $\lambda_1$ and $\lambda_2$ would fail to capture synchrony or any correlated activity because $\lambda_1$ and $\lambda_2$ are independent (more precisely, the neurons modeled by $\lambda_1$ and $\lambda_2$ would be not independent but conditionally independent Schneidman et al. (2003): both neurons would be independent given that the stimulus is held fixed. This distinction is important because one could select a stimulus which drives both neurons and mistakenly attribute the apparent correlated activity to an underlying mechanism as opposed to the choice of stimulus). A simple extension of the conditional intensity can mimic correlated activity such as synchrony Pillow et al. (2008). In particular, one can add post-spike filters that permit spikes from one neuron to influence the firing rate of another neuron (see Figure 1b). The conditional intensity of neuron $i$ is then


where we have added the subscript throughout to label neuron . The term sums over post-spike activity received by neuron , whether internal dynamics () or activity from other neurons ().
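The coupled conditional intensity can be sketched as follows (toy values and names ours). Since the coupling filters here are chosen non-positive, each intensity is bounded by its uncoupled value:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 3, 4                                  # neurons, history length in bins

mu = np.log(np.array([20.0, 15.0, 25.0]))    # per-neuron bias (baseline rates)
k = 0.3 * rng.standard_normal((n, d))        # stimulus filters k_i
h = -0.5 * rng.random((n, n, d))             # coupling filters h_ij (suppressive)

s = rng.standard_normal(d)                   # preceding stimulus window
y_prev = rng.poisson(0.1, size=(n, d))       # preceding spike counts, all neurons

# lam_i = exp(k_i . s + sum_j h_ij . y_j + mu_i): the j == i term is the
# neuron's own spike-history dynamics, j != i the network coupling.
lam = np.exp(k @ s + np.einsum('ijd,jd->i', h, y_prev) + mu)
```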

The complete likelihood of a population of neurons is the product of each neuron's likelihood; in log form,

$$L(\theta) = \sum_i \Big[ \sum_{t \,:\, y_i = 1} \log \lambda_i - \Delta \sum_t \lambda_i \Big] + c$$

where $\theta = \{\mathbf{k}_i, \mathbf{h}_{ij}, \mu_i\}$ is the parameter set. Note again that the subscript for time is dropped from the stimulus and spike trains for clarity, although the likelihood sums over this implicit subscript. The gradient of the likelihood of each post-spike coupling term must be generalized,

$$\frac{\partial L}{\partial \mathbf{h}_{ij}} = \sum_{t \,:\, y_i = 1} \mathbf{y}_j - \Delta \sum_t \lambda_i \mathbf{y}_j$$

where care must be taken to pair the appropriate spike train $\mathbf{y}_j$ with the conditional intensity $\lambda_i$. Most of the terms of the Hessian are preserved from the single neuron case (Section II) except for the terms associated with $\mathbf{h}_{ij}$:

$$\frac{\partial^2 L}{\partial \mathbf{h}_{ij}\, \partial \mathbf{h}_{ij'}^T} = -\Delta \sum_t \lambda_i\, \mathbf{y}_j \mathbf{y}_{j'}^T.$$

Again, care must be taken to pair the appropriate spike train with the appropriate conditional intensity $\lambda_i$. Note that Hessian terms between different neurons (e.g. $\partial^2 L / \partial \mathbf{k}_i\, \partial \mathbf{k}_{i'}$ or $\partial^2 L / \partial \mathbf{h}_{ij}\, \partial \mathbf{h}_{i'j'}$ for $i \neq i'$) are zero because each conditional intensity $\lambda_i$ depends only on the parameters of neuron $i$, so the log-likelihood decomposes into a sum over neurons. Because the cross terms of the Hessian between neurons are zero, one can fit a parameter set for each neuron independently. In practice, this independence vastly simplifies the fitting procedure because each neuron can be fit in parallel.

V Conclusions

This manuscript provided a very brief summary of analyzing neural data with generalized linear models (GLM’s). The GLM is a flexible framework that allows one to attribute spiking activity to arbitrary phenomena – whether network activity, sensory stimuli or other extrinsic factors – and then estimate parameters efficiently using standard maximum likelihood techniques. Thus far, GLM’s (or related ideas) have been used to model network activity in the retina Pillow et al. (2008), motor cortex Truccolo et al. (2005), visual cortex Koepsell et al. (2008), hippocampus Brown et al. (2001) as well as devise new strategies for the design of experiments Paninski et al. (2007), quantify stimulus information in spike trains Pillow and Paninski (2007) and help a lowly graduate student complete a Ph.D. thesis Shlens (2007).


  • Brown et al. (2001) Brown, E., D. Nguyen, L. Frank, M. Wilson, and V. Solo, 2001, Proceedings of the National Academy of Sciences 98(21), 12261.
  • Dayan and Abbott (2001) Dayan, P., and L. Abbott, 2001, Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems (The MIT Press).
  • Kay (1993) Kay, S. M., 1993, Fundamentals of Statistical Signal Processing, Volume I: Estimation Theory (Prentice Hall PTR).
  • Koepsell et al. (2008) Koepsell, K., T. Blanche, N. Swindale, and B. Olshausen, 2008, in Computational and Systems Neuroscience (Cosyne) 2008 (Salt Lake City, Utah).
  • McCullagh and Nelder (1989) McCullagh, P., and J. A. Nelder, 1989, Generalized Linear Models, Second Edition (Monographs on Statistics and Applied Probability) (Chapman & Hall/CRC).
  • Moon and Stirling (1999) Moon, T. K., and W. C. Stirling, 1999, Mathematical Methods and Algorithms for Signal Processing (Prentice Hall).
  • Paninski (2004) Paninski, L., 2004, Network 15(4), 243.
  • Paninski et al. (2007) Paninski, L., J. Pillow, and J. Lewi, 2007, in Computational Neuroscience: Theoretical Insights into Brain Function, edited by P. Cisek, T. Drew, and J. Kalaska (Elsevier, Amsterdam).
  • Pillow and Paninski (2007) Pillow, J., and L. Paninski, 2007, Neural Computation, submitted.
  • Pillow et al. (2008) Pillow, J., J. Shlens, L. Paninski, A. Sher, A. Litke, E. Chichilnisky, and E. Simoncelli, 2008, Nature, accepted.
  • Schneidman et al. (2003) Schneidman, E., W. Bialek, and M. Berry, 2003, J Neurosci 23(37), 11539.
  • Shlens (2007) Shlens, J., 2007, Synchrony and concerted activity in the neural code of the retina, Ph.D. thesis, University of California, San Diego.
  • Truccolo et al. (2005) Truccolo, W., U. Eden, M. Fellows, J. Donoghue, and E. Brown, 2005, J Neurophysiol 93(2), 1074.