Log In Sign Up

Estimating smooth and sparse neural receptive fields with a flexible spline basis

Spatio-temporal receptive field (STRF) models are frequently used to approximate the computation implemented by a sensory neuron. Typically, such STRFs are assumed to be smooth and sparse. Current state-of-the-art approaches for estimating STRFs based on empirical Bayes are often not computationally efficient in high-dimensional settings, as encountered in sensory neuroscience. Here we pursued an alternative approach and encode prior knowledge for estimation of STRFs by choosing a set of basis functions with the desired properties: natural cubic splines. Our method is computationally efficient and can be easily applied to a wide range of existing models. We compared the performance of spline-based methods to non-spline ones on simulated and experimental data, showing that spline-based methods consistently outperform the non-spline versions.


page 1

page 4

page 12

page 27

page 30


Dynamic texture recognition using time-causal and time-recursive spatio-temporal receptive fields

This work presents a first evaluation of using spatio-temporal receptive...

Polyhedral Spline Finite Elements

Spline functions have long been used in numerically solving differential...

A Tchebycheffian extension of multi-degree B-splines: Algorithmic computation and properties

In this paper we present an efficient and robust approach to compute a n...

P-spline smoothing for spatial data collected worldwide

Spatial data collected worldwide at a huge number of locations are frequ...

Adaptive spline fitting with particle swarm optimization

In fitting data with a spline, finding the optimal placement of knots ca...

Multidimensional Adaptive Penalised Splines with Application to Neurons' Activity Studies

P-spline models have achieved great popularity both in statistical and i...

1 Introduction

Spatio-temporal receptive fields (STRFs) are frequently used in neuroscience to approximate the computation implemented by a sensory neuron. Such models consist typically of one or more linear filters summing up sensory inputs across time and space, followed by a static nonlinearity and a probabilistic output process [aljadeff_analysis_2016]. For this, different distributions can be used depending on the type of recorded neuronal responses.

The simplest way to estimate a STRF for a given data set is arguably to compute the spike-triggered average (STA), the average over all stimuli preceding a spike [schwartz_spike-triggered_2006]. However, the results of the spike-triggered analysis are often noisy due to limited data and prone to be distorted when the neuron is stimulated with correlated noise or natural stimuli [paninski_convergence_2003]. The maximum likelihood estimate (MLE), also known as whitened-STA (wSTA) [theunissen_estimating_2001], does not suffer from such shortcomings, but it requires even more data to converge as the whitening procedure tends to amplify noise at frequencies with low power, making it a less preferred option in many experimental settings. A modified wSTA that removes low power frequencies can result in smoother STRF, while the threshold for such a frequency cutoff is normally chosen by cross-validation [theunissen_estimating_2001].

It has been suggested that these shortcomings can be overcome by computing the maximum a posteriori (MAP) estimates within the framework of Bayesian inference

[wu_complete_2006]. Under this framework, prior knowledge about STRFs such as sparsity, smoothness [sahani_evidence_2002], and locality[park_receptive_2011]

can be encoded into a prior covariance matrix, whose hyperparameters are learnt automatically from the data via evidence optimization (also known as empirical Bayes, or Type II maximum likelihood in the literature

[casella_introduction_1985]). These algorithms thus provide STRF estimates with the desired properties even with little or noisy data. Although elegant in theory, the evidence optimization step of this framework is computationally costly: the cubic complexity in the dimensionality of the parameter space renders it difficult to scale to high-dimensional settings as often encountered in sensory neuroscience [wu_complete_2006]. Also, current popular implementations of MAP estimates work mostly under the Linear-Gaussian encoding model, which assumes additive Gaussian noise to the neural response, and are restricted to simple one filter models.

An alternative approach to encoding prior knowledge for estimating STRFs is to parameterize the STRF with a set of basis functions with the desired properties. In this way, the model fitting becomes much more efficient as the number of parameters to be optimized is significantly reduced, and some degree of smoothness is automatically enforced. Previous studies used a raised-cosine basis (Figure 1A) for 1D retinal ganglion cell STRFs in macaque monkey [pillow_spatio-temporal_2008] and Gaussian spectral or Morlet wavelet basis for 2D auditory STRFs in ferrets [thorson_essential_2015]. Those basis functions are chosen to capture certain STRF properties in their corresponding studies, thus they are relatively inflexible to generalize to other settings. Besides, they are typically controlled by multiple hyperparameters, which creates a burdensome and time-consuming model selection process.

Figure 1: Illustration of the raised cosine and natural cubic spline basis. A: A typical raised-cosine bases as in [pillow_spatio-temporal_2008] controlled by several parameters, such as number of basis functions, the degree of nonlinearity and the placement of endpoints. The choice of these parameters determines the final goodness of fit. B

: An equally spaced 1D natural cubic spline, controlled by only one parameter: the number of basis functions (aka the degree of freedom).

C: Fitted 1D filters using different bases and different sets of parameters. D: An example of 2D natural cubic splines. Left: Two selected bases of 2D natural cubic splines. Middle: Weighted bases after fitted to a simulated 2D STRF. Right: An example of a fitted 2D STRF overlaid on the ground truth.

Here, we propose to use an alternative basis for receptive field estimation: natural cubic splines (Figure 1

B). These are known as the smoothest possible interpolant and can be easily extended to high-dimensional settings with minimal assumptions and few hyperparameters

[wood_generalized_2017]. This spline basis can be incorporated into a wide range of existing receptive field models (Figure 2), such as the Linear-Gaussian (LG) and the Linear-Nonlinear-Poisson (LNP) model[pillow_prediction_2005, pillow_spatio-temporal_2008] for single STRF estimation as well as Linear-Nonlinear-Linear-Nonlinear (LNLN) cascade models[mcfarland_inferring_2013, liu_inference_2017] and spike-triggered clustering methods [liu_inference_2017, shah_inference_2020] for subunit estimation. When combined with proper regularization, smooth and sparse STRFs can be retrieved with very little data and in a computationally efficient manner, while their predictive performance reaches state-of-the-art level. Furthermore, we introduce diagnostic tools for assessing the quality of an estimated STRF: (1) a significance test based on the analytical solution of the STRF covariance matrix and (2) a permutation test based on model prediction. We provide all developed methods as part of the RFEst Python toolbox.

2 Results

We developed the RFEst Python toolbox for spline-based STRF estimation in various models, including the single-filter LG and LNP models as well as the multi-filter LNLN cascade model (Figure 2). We measured the performance of the spline-based methods and compared it with their non-spline versions or other previous methods, using simulated data as well as neural recordings from the retina and the visual cortex. To quantify the goodness of fits for the simulated benchmark data, we measured the mean squared error (MSE) between the ground truth STRF and the estimated STRFs for various methods and studied their dependence on the amount of available data. For the experimental data, we computed the correlation between the predicted and measured responses on a held-out test set as a performance measure.

Figure 2: Overview of the different spatio-temporal receptive field (STRF) models used in this study. A: Linear-Gaussian (LG) encoding model. , where is the stimulus design matrix, the response, the STRF. B: Linear-Nonlinear-Poisson (LNP) model. , where is the filter nonlinearity. C: Linear-Nonlinear-Linear-Nonlinear-Poisson (LNLN) model. , where is the output nonlinearity. Optionally, an autoregressive term convolving previous response with a response-history filter can be added before the nonlinearity to improve the model predictive performance.

2.1 Estimating receptive fields from simulated LG, LNP and LNLN models

First, we simulated the responses of a sensory neuron with additive Gaussian noise using a STRF with 30 time frames and 40 pixels in space in response to white and pink noise stimuli for various stimulus lengths (see Methods 4.1 for details). First, we fit LG models (Figure 2A) to these responses using various STRF estimation techniques.

We measured the MSEs between the ground truth STRF and the different STRF estimates with respect to the ratio of the number of samples to the number of parameters , averaged over 10 different random seeds (Figure 3A). As expected, the STA and wSTA approached the ground truth only in high data-to-parameters regimes. For limited data (e.g. n/d=4), the estimated STRF showed substantial background noise (Figure 3B). In the case of pink noise, the STA was in fact never able to recover the STRF due to the high correlation between pixels in the stimulus [paninski_convergence_2003] (Figure 3C-D).

We also applied more advanced techniques to estimate the STRF to the simulated data, such as automatic smoothness determination (ASD, aka MAP with smoothness prior [sahani_evidence_2002]) and automatic locality determination (ALD, aka MAP with locality prior [park_receptive_2011]). These estimates were much closer to the ground truth already with a relatively small amount of data, with ALD outperforming ASD for both white and pink noise (Figure 3). However, their computation time for estimating the optimized prior was long due to the high computational complexity.

We then applied a spline-based version of the wSTA (SPL wSTA) with and without L1 regularization on the basis function coefficients (see Methods). This simple change of basis functions led to a STRF estimator which performed very well for all n-d-ratios tested, even better than ASD and ALD, especially when using L1 regularization (Figure 3). At the same time, this estimator was also computationally very efficient, as the number of coefficients to be estimated was reduced to the number of basis functions (for this example, the STRF had parameters, whereas the number of basis functions was only )

Figure 3: Linear-Gaussian (LG) Model. 2D simulated RF stimulated by white and pink noise in various lengths. A and C: Similarity between ground truth and estimated RFs measured as mean squared error (MSE) as a function of the amount of simulated data. Error bars are computed by averaging over 10 trials with different random seeds. B and D: Ground truth and examples of estimated RFs by different methods at n/d=4, indicated by vertical grey dashed line in A and C.

We next simulated responses from an LNP model (Figure 2B) with Poisson noise similar to before using white and pink noise for stimulation. For simplicity, we used an exponential nonlinearity and treated it as known during the estimation procedure. We estimated the linear filter from different amounts of data using the STA, wSTA, ASD and ALD, as well as spline and non-spline LNP fitted via gradient descent with L1 regularization (referred to as LNP+SPL and LNP, respectively). Note that the original ASD and ALD were not designed for the LNP but the LG model (though some progress has been made for the former [park_bayesian_2013]). Despite this model mismatch, they still recovered good estimates of the STRF (as has been demonstrated in the original papers [sahani_evidence_2002, park_receptive_2011]).

Again, we found that different methods resulted in estimates of different quality depending on the available amount of data (Figure 4). Here, in contrast to the LG model, we changed the x-axis of Figure 4A and C to time in minutes for better presentation, as the model performance was dependent on the number of effective samples (frames that contain at least one spike) instead of the total number of samples, and the stochastic nature of the Poisson spike generator yielded different number of spikes for each trial. For both stimulus conditions, we found that the spline-based LNP yielded extremely data-efficient estimates of the linear filter compared to all other methods. In contrast, gradient-based optimization of an LNP model without spline basis, even with L1 regularization, performed only slightly better than the STA and wSTA computed under the mismatched LG model assumptions. The gap in performance of the LNP+SPL and MAP estimators of ASD and ALD were larger here compared to the LG case. This was partially due to the fact that the number of effective samples was smaller in Poisson spiking data, but the mentioned model mismatch between the cost function of the LG model optimized by the ASD and ALD methods and the data likely also contributed.

Figure 4: Linear-Nonlinear-Poisson (LNP) Model. A simulated 2D RF stimulated by white and pink noise for various stimulus lengths. A and C: Details are the same as in Figure 3. B and D: Example estimated RFs by different methods at data=4 minutes, indicated by vertical grey dash line in A and C.

We finally investigated how recently proposed subunit models (Figure 2C), in particular an LNLN cascade model [mcfarland_inferring_2013, maheswaranathan_inferring_2018] and spike-triggered clustering or non-negative matrix factorization methods[shah_inference_2020, liu_inference_2017], can be made more data-efficient using a spline basis. We compared their performance with and without spline augmentation (see Methods 4.6

). The assumption behind the spike-triggered clustering methods is that each spike can be attributed to a specific subunit of a STRF, so that each subunit can be viewed as a STA of a cluster of spikes. If this assumption holds, then a spline-approximated STA should be able to serve as a better centroid for clustering, as the smoothed STA is closer to the ground truth subunit STRF. We used k-means clustering as an in-place of the previous published soft-clustering method

[shah_inference_2020] and used a multiplicative update rule[ding_convex_2010] for semi-nonnegative matrix factorization (semi-NMF) [liu_inference_2017]. Moreover, we modified the previous published semi-NMF [liu_inference_2017] and imposed the nonnegative constraint to the weight matrix instead of the subunit matrix, which allowed us to retrieve antagonistic STRFs.

Figure 5:

Subunit models. Two antagonistic difference-of-Gaussian 2D RFs stimulated by white noise. The subunit STRFs were estimated by k-means clustering, semi-NMF and LNLN, without and with splines. Examples shown are estimated subunits by different methods at data = 4 and data = 32 minutes, indicated by vertical light and dark grey dash line in the bottom panel, respectively. Similarity between ground truth and estimated RFs measured as mean squared error (MSE) as a function of the amount of simulated data.

We simulated responses with Poisson noise from the model by stimulating two 20x20 pixels antagonistic center-surround subunits with different lengths of white noise stimuli. Both filter outputs first went through an exponential nonlinearity, were then summed up, and finally went through a softplus output nonlinearity.

We estimated subunit STRFs using k-means clustering or semi-NMF of the spike-triggered ensemble, and maximum-likelihood estimation of the LNLN model directly, with or without a spline basis (Figure 5). Similar to the results from previous simulations, retrieved subunit STRFs from spline-based models were generally much more similar to the ground truth than models without spline, and required a smaller amount of data to achieve a similar level of performance. For example, when using a spline basis, we were able to retrieve the two subunit filters at high quality using only 4 minutes of data. With this amount of data, all other methods still showed essentially only noise with a hint of signal in the estimate of the linear filter.

It is worth noting that the computational efficiency of LNLN estimation was improved a lot by using a spline basis, as the number of coefficients to be estimated was significantly reduced. In contrast, the computation time for both spike-triggered clustering methods was actually increased by incorporating splines, because more computational steps were added to the original algorithms in those modified versions (see Methods 4.6).

2.2 Comparing computation time between different methods and model selection

It was not straightforward to fairly compare the computation time between MAP-derived and spline-based GLMs, as the most time-consuming part for MAP was to estimate the prior hyperparameters via evidence optimization, while spline-based GLMs need to resort to cross-validation for selecting the number of basis functions (, also known as the degrees of freedom) for each dimension. Yet, we can still get an impression of the algorithmic efficiency of each method by profiling their most time-consuming step.

Therefore, we compared the time taken by the computation of the STA (see equation 1 in Methods), wSTA (equation 2), MAP (equation 3) and SPL wSTA (equation 12) for different amounts of data by separately profiling the single most time consuming computation (Figure 6). Not surprisingly, STA was the fastest, as it had a linear time complexity () and its computation time depended mainly on the amount of data (), while wSTA and MAP were the slowest, as they were both limited by the calculation of the inverse matrix, and their computation time grew in a cubic manner with respect to the number of coefficients () in the STRF due to the cubic time complexity () of the matrix inversion. SPL on the other hand, was less time-consuming than wSTA and MAP, even though the time complexity was still cubic. However, the improvement was due to the reduced number of model parameters by parameterizing the STRF coefficients with the spline basis ( coefficients), which resulted in a much smaller covariance matrix to invert (), thereby speeding up the most time-consuming step of the algorithm substantially. Thus, spline-based STRF estimators offer the best available compromise between STRF estimation quality and computation time.

Figure 6: Comparison of computation time between different methods with different number of STRF coefficients, amount of data and number of spline basis. MAP refers to both ASD and ALD methods, as the size of the covariance matrix was the same for both techniques. Measurements of computation time were averaged over 10 repetitions and performed on a 2019 16-inch MacBook Pro with a 2.3 GHz 8-core Intel Core i9 and 16 GB of RAM. See Method 4.10 for more information.

The total amount of time spent on finding an optimal STRF depends on many factors, such as how the model is initialized, the choice of learning rate and stopping criteria for gradient descent optimization. Importantly, the way hyperparameters are selected also contributes significantly: most commonly, the model prediction performance is monitored for on a held-out data set over a discrete grid of hyperparameters. Even for evidence optimization methods, in which the main selling point is to learn the optimal hyperparameters automatically, cross-validation can still be used to assess model goodness of fit and generalization performance [sahani_evidence_2002, park_bayesian_2011]. For spline-based GLM, this requires refitting the same model as often as the number of hyperparameter sets. Given that splines are equally spaced, a good strategy to accelerate the process would be to measure the performance of SPL wSTA over a grid of possible candidates from very few to many, choose the number when the performance is starting to plateau, then fit SPL+L1 with gradient descent to retrieve a sparse STRF.

2.3 Diagnostic tools

Figure 7:

Examples of diagnosing STRF quality with confidence interval, p-value, prediction performance.

A: Ground truth STRF with the dimensions 25x25x25. The 3D STRF is the Kronecker product of the respective temporal and spatial RF. B: A well fitted STRF with the sample size equals to . (i) the estimated spatial RF at the minimum of temporal RF, indicated by the gray vertical dotted line in (ii); (ii) the estimated temporal RF; (iii) the change of correlation between the data and predicted response over all gradient descent iteration. (iv) the predicted performance on the permuted validation set. C: Same as B but with half amount of data. D: Same as B but with stronger additive Gaussian noise. E: Same as B but the simulated response were permuted.

In addition, even if a well-regularized model is chosen according to predictive performance, additional diagnostic tools can be useful to decide whether a certain STRF is reasonable and shows a clear structure (Figure 7

). We devised three diagnostic tools that can be employed to this end: (1) the confidence interval (CI) of the fitted STRF and its corresponding p-values based on Wald statistics (with the null hypothesis that all STRF coefficients are zero, see also Method

4.7); (2) the difference between the metrics of the training and the validation set over all iterations of the optimization; and (3) a permutation test on model prediction for validation and/or test data set (Method 4.8).

To demonstrate their usefulness, we investigated a ground truth 3D STRF under a Linear-Gaussian model. We showcased four different scenarios (Figure 7): First, we looked at a well-fitted STRF (Figure 7

A). Here, the CI was narrow and significantly different from an all-zero STRF (p<0.001), the gap between the training and validation prediction metrics was small and the prediction on the validation set was better than chance (p<0.001). The confidence interval and its corresponding p-value were determined by the training data. Next, we looked at scenarios where the amount of data was small or the signal-to-noise ratio (SNR) was low (Figure

7 C and D). Here, the STRF did not pass the significance test due to a lack of power, but it is still possible that the model predictions on the validation set were better than chance. Additionally, if the gap between the performance metric on the training and validation sets gets too large, it would be a good idea to refit the model with a stronger regularization. Finally, in the case of the data with extremely low SNR (Figure 7E), which we generated by randomly permuting the simulated response, the prediction performance of the fitted STRF was just at the chance level regarding all indicators.

2.4 Application to experimental data

We next documented the complete process of using gradient-based estimation for the parameters of spline-based LNP and LNLN models in a previously published experimental data set: extracellularly recorded spikes from tiger salamander retinal ganglion cells (RGC), stimulated by 3D non-repeat checkerboard white noise [liu_inference_2017] (Figure 9A). In addition, we applied the same techniques to 2-photon calcium imaging data recorded from a mouse retinal ganglion cell soma stimulated by checkerboard white noise (Figure 9B) (using the positive gradient of the calcium trace as the response). We compared the results of spline-based models with non-spline versions.

For both data sets, only the first 10 minutes of the whole recording were used for fitting the models, another 2 minutes were used for model validation and another 4 sets of 2 minutes for testing. The predictive performance was measured by the average of the four correlation coefficients of the predicted and measured responses. We set the number of subunits throughout. The dimensionality of the 3D STRFs were [20, 25, 25] and [25, 15, 15], respectively. We added diagnostic information as in Figure 7. Moreover, a contour was drawn on the spatial filter to illustrate the spatial extent of the STRF.

We fitted the non-spline LNP and LNLN with different L1 regularization weights from 0 and 10, with steps of 1.

For spline LNP and LNLN, the number of basis functions (aka degree of freedom, df) for temporal and spatial dimensions (assuming x and y dimensions here used the same df) were first selected based on the predicted performance of the maximum likelihood estimates on the validation set (from 1/3 to 2/3 of the pixels in each dimension, with steps of 1). Finally, for data set (1), the grid search took 11.2 minutes on the same hardware setup used for Figure 6 and yielded the optimal df=[9, 9, 9] ; for data set (2), it took 2.4 minutes and yielded the optimal df=[12, 9, 9] (Figure 8). See also how different amounts of training data affects the optimal df in Supplementary Section 1.

Figure 8: Choosing the number of basis functions with cross-validation. The predictive performance of the SPL wSTA on the validation set for a grid of different degrees of freedom (df). The optimal df are indicated by red crosses. A: Spike recordings from a salamander RGC. B: 2-photon calcium imaging recordings from a mouse RGC (see Methods).

Then, we fitted spline LNP and LNLN with L1 regularization weights ranged from 0.5 and 1.5, with steps of 0.1 (as we observed, the spline model required a smaller value for regularization due to the reduction of model parameters). The ordered grid search for the L1 weight, starting with the smallest value, was interrupted if the current predictive performance was lower than the previous one. All models were initialized with maximum likelihood estimates (plus small additive Gaussian noise) and fitted by gradient descent with 1500 iterations, but an early stop was also deployed with two triggers: either the training cost changed less than 1e-5 for the last 10 iterations, or the validation cost monotonically increased for the last 10 iterations. Sometimes the optimization didn’t stop after reaching the lowest validation cost due to the adaptive learning rate resulting in the cost changing in a zigzag manner. But regardless of the stopping trigger, the model at the lowest validation cost was returned. The computation time of each fit for all four models was summarized by box plots in Figure 9A and B.

Fitting a 3D STRF with gradient descent directly was considered infeasible unless certain regularization was imposed [maheswaranathan_inferring_2018], and the situation would become even worse for subunit models, as the number of model parameters increases linearly with the number of subunits. Previously methods [liu_inference_2017, shah_inference_2020] (as shown in Figure 5) relied on dimensionality reduction techniques to retrieve subunit STRFs but those methods could not be applied to 2-photon calcium imaging data directly without an extra step of spike inference. Spline GLMs not only overcome both of these problems but also allow us to assess the quality of the estimated STRF with CI and their corresponding p-values (In principle, the same statistics can also be applied to non-spline GLMs, yet a larger number of model coefficients would result in high computational cost for computing the STRF posterior covariance).

We found that for both example neurons from the two data sets, STRFs estimated by non-spline models were noisy and prediction performance was only moderate (Figure 9A, B, A, B). This was especially prominent for LNLN estimates, as here the number of parameters to estimate exceeded the amount of available data to fit the models (Figure 9A and B). In contrast, the spline-based LNP and LNLN estimated in general were smooth and sparse even with a severely limited amount of data, and they consistently outperformed their non-spline versions (Figure 9A and B).

The number of subunits that could be retrieved by the model was limited by the amount of available data (total length of the recording, and for spike data, the total number of spikes) as well as the signal-to-noise ratio of the data. With the limited data in our examples, we found three and two non-overlapping subunits for the two neurons, respectively (Figure 9A and B). Additional subunits available to the model were pushed to near zero by the L1 regularization and the significance test failed to reject the null hypothesis that all coefficients in those subunit STRFs were close to zero. But as noted in the previous section, the p-value should not be the only judge for the quality of the estimated STRF. As shown in 9B, the STRF from the spline LNP model was also non-significant but looked reasonable from the snapshot visualization. In cases like this, other tools such as the permutation test shown in the previous section would be required to determine whether the STRF in question was good or not.

We provided two more examples of visual neurons (RGC and V1) and one example of non-visual neuron (entorhinal grid cell) in Supplementary Section 2.

Figure 9: Spline-based LNP and LNLN consistently outperformed their non-spline versions on both example neurons from two data sets: A: Spike recordings from a salamander RGC, B: 2-photon calcium imaging data from a mouse RGC. (see Methods 4.1 Experimental data for details). i and ii: STRFs estimated from LNP and LNLN with or without spline, where number of subunits for all LNLN models; Shading for temporal filters show 95% confidence interval (see Methods, 4.7), which is too small in A to be visible. iii: predictive performance comparison of LNP and LNLN with and without spline, measured by the averaged correlation coefficients of the test set response and the predictive ones. Error bars were computed from 4 sets of test data. iv: Computation time of each fit for the same models were summarized in the box plots.

3 Discussion

Here, we showed that natural cubic splines form a good basis for STRF estimation by tackling one of the main challenges in sensory neuroscience: to efficiently estimate high-dimensional STRFs with limited data. Using this basis function set, we showed that inference for single filter LN models with different noise models as well as LNLN cascade models can be efficiently performed from both spiking as well as two-photon imaging data, even if the signal-to-noise ratio is low (and see also how spline-based methods compare to previous methods in Table 1). We provide an implementation of the proposed methods in the RFEst-package for Python.

Methods Stimuli Type Noise Distribution Data needed Complexity Multiple Filters
STA White Noise only Gaussian a lot No
wSTA Any Gaussian a lot No
MAP Any Gaussian little No
LNLN Any Poisson a lot Yes
SPL LG Any Gaussian little No
SPL LNP Any Poisson little No
SPL LNLN Any Poisson little Yes
Table 1: Model comparison. Here, is the number of samples, the number of STRF coefficients, the number of spline basis coefficients and the number of subunits. In most cases, .

3.1 Choice of basis functions

We chose the natural cubic spline basis to represent the linear filters as default in our toolbox, because among many other spline bases such as cyclic cubic splines, B-splines, or thin-plate splines, natural cubic splines have been shown to be the smoothest possible interpolant through any set of data and theoretically near-optimal to approximate any true underlying functions closely [wood_generalized_2017]. Conveniently, assuming all knots are spaced equally in each dimension, natural cubic splines only require setting one hyperparameter: the degree of freedom, or the number of basis functions. This is unlike the raised-cosine basis used previously[pillow_spatio-temporal_2008]

, which is governed by multiple hyperparameters. Also, natural cubic splines can be easily extended to high-dimensional settings, also known as tensor product bases, by simply taking the row-wise Kronecker product of the basis matrix along each dimension (see Method

4.4), making them uniquely suitable for estimating STRFs. There is also an interesting connection between smoothing splines and Gaussian processes. As shown by [kimeldorf_correspondence_1970, bay_generalization_2016, bay_generalization_2016], spline regression converges to the MAP solution of a Gaussian process regression under certain constraints.

Through extensive validation with simulated and real experimental data sets, we showed that STRFs estimated with this basis can not only be more accurate than previous state-of-the-art methods, but the estimation procedure was also computationally much more efficient. The estimated STRFs were smoothed automatically, as they were the linear combination of a set of smooth basis functions. Further, sparsity and localization could often be achieved by adding an L1 regularization term to the loss function, which pushed the coefficient for the less relevant basis functions to zeros. Moreover, by parameterizing STRFs with basis functions, the number of coefficients could be significantly reduced, hence model fitting became much more computationally efficient, and the amount of data needed for the optimization to converge was also significantly less.

3.2 Potential pitfalls of using splines in non-ideal situations

One caveat of using splines to fit STRFs is that some STRFs might not need smoothing, thus using splines would lead to suboptimal results. For example, if a STRF occupies only one pixel in space or the amplitude changes sharply in the temporal profile, spline-based model would likely yield suboptimal results by yielding overly smoothed filters. However, one can identify these cases by monitoring the cross-validation performance on held-out data while increasing the number of basis functions in each dimension. If the performance keeps increasing as the number of basis functions reaches the number of pixels in each dimension, a spline-free model could be preferable.

Another concern of using splines is that it would be difficult to tell a poor fit from a good one in some cases, as a poorly-fitted spline-based STRF would also look like a smooth STRF, unlike pixel-based methods that yield a noisy STRF when the fit is poor. In these cases, the diagnostic tools we introduced in Section 2.3 and Figure 7 would be helpful to assess the quality of the STRFs.

3.3 Other approaches to efficient estimation of receptive fields

Similar to our approach, fast automatic smoothness determination (fASD) [aoi_scalable_2017]

has been suggested to speed up the process of evidence optimization in ASD. Because the smoothness prior covariance can be represented in frequency coordinates and many high-frequency coefficients of the Fourier-transformed filter tend to be small, the size of the prior covariance matrix can be reduced by cutting off those small value frequencies (an idea similar to the modified wSTA

[theunissen_estimating_2001]), leading to a speed-up in the optimization process. However, as the efficiency of fASD is controlled completely by the initial frequency truncation, and the level of frequency to be truncated has to be fixed before the evidence optimization starts, one needs to resort back to grid search and cross-validation for choosing a good starting point, making the automatic hyperparameters selection via evidence optimization less automatic and comparable to the need to find the optimal degrees of freedom for the spline basis. Also, as fASD (and original ASD) rely on only a few hyperparameters, it might not even require to do evidence optimization at all but just to find the optimal hyperparameters through cross-validation completely. Importantly, fASD is so far only applicable within the framework of linear models with Gaussian additive noise, while our approach is ready to use in nonlinear models with non-Gaussian noise, as shown for LNP and LNLN models.

Regarding the inference for multiple STRFs in LNLN-cascade models, previous work has attempted to provide efficient algorithms by first retrieving subunit STRFs through spike-triggered clustering methods such as soft-clustering [shah_inference_2020] or semi-NMF [liu_inference_2017]. These were subsequently used to estimate other model parameters via gradient descent while holding the subunit STRFs fixed. Utilizing only the collection of stimuli that elicit a spike, these approaches have advantages in terms of the computational cost in time and memory compared to the direct fitting of the complete model using gradient descent [mcfarland_inferring_2013, maheswaranathan_inferring_2018].

The ease of using these algorithms relying on the spike-triggered ensemble under white noise stimulation is also the disadvantage of these approaches – they can only be applied to spike data under white noise stimulation. In many cases, experiments today record more diverse measurements from neurons, such as two-photon calcium imaging [baden_functional_2016, ran_type-specific_2020] or synaptic current recording [trong_origin_2008] under more diverse types of stimulation, such as correlated noise and natural stimuli [qiu_mouse_2020]. In those cases, spike-triggered analysis can not be used directly without modification, as the spike-triggered average is no longer the maximum likelihood estimate [paninski_convergence_2003]. In contrast, our method does not rely on Gaussianity of the stimulus or the discreteness of the response, and in principle can be generalized to other stimuli distribution and other response types.

In addition, current spike-triggered clustering methods rely on temporally collapsing the 3D stimuli to further reduce the model complexity, assuming all subunits share the same temporal response kernel. This limits the usability of these methods to some very specific cell types and will not account for the spatio-temporal inseparable computation implemented by neurons receiving inputs from multiple sources, such as bi-stratified On-Off direction-selective cell in the retina [taylor_diverse_2002]. In this case, a more general approach such as our method should be preferred.

3.4 Further extensions

Here, we only showcased how to estimate STRFs using a natural cubic spline in a set of popular models with standard gradient descent with L1 regularization. This simple spline add-on can be directly incorporated into other previous models [mcfarland_inferring_2013, maheswaranathan_inferring_2018] with different constraints, regularization and optimization techniques, as it only changed the linear step of the models without modifying other parts.

Furthermore, to further improve model prediction performance, a response-history filter (as shown in Figure 2) and flexible nonlinearity can be added to the model [williamson_equivalence_2015, maheswaranathan_inferring_2018], as both of which can also be parameterized by a 1D spline basis. With such extra flexibility, a model can better adapt to the cell-specific firing threshold and adaptation to the stimulus ensemble. Still, even with the response-history filter, a STRF model may still not be able to account for the total variability of the neuronal responses, if the response is influenced by other sources. Future extension of the current GLM for STRFs can consider incorporating more diverse sources of behavioral data [balzani_efficient_2020] of the experimental animal under experimental or naturalistic settings as 1D time-dependent filters.

4 Material and Methods

4.1 Data sets

Simulated data

For the simulated data shown in Figure 3 and 4, we used a 2D rank-2 STRF with 30 time frames and 40 pixels in space () as shown as "ground truth" in both Figures and generated responses with two types of flicker bar stimuli: Gaussian white noise and pink noise. For LNP models, we used a fixed exponential nonlinearity.

For Figure 5, we used two 20x20 pixels antagonistic center-surround STRF and with Gaussian white noise as stimulus. We used a fixed exponential function for the filter nonlinearity and a softplus function for the output nonlinearity. We controlled the mean firing rate of all simulated spike trains to about 21 Hz by adjusting the intercept of the corresponding models (for Figure 4A and C, the intercepts were 3.5 and 2.5, respectively. For Figure 5, the intercept was 0) and also multiplying a constant to .

For all simulations, we measured the similarity of the ground truth and the model estimators by computing the MSE between the respective normalized STRFs. The STRFs were normalized by element-wise division by the matrix-norm.

For Figure 3, the similarity was measured for various stimulus length (), which were scaled by different factors () with respect to the number of coefficients of the STRF. For Figure 4 and 5, the similarity was measured at various lengths of time ( minutes), which is equal to setting , where is the time bin size and took the value of 0.033 s.

Error bars in Figure 3-5 were obtained by averaging over 10 trials of simulations with different random seeds for the stimulus.

For the simulated data shown in Figure 7, we used a 3D STRF with 25 time frames and 25x25 pixels in space (), where the spatial profile was a 2D Gaussian field and the temporal profile had a sharp dip close to zero time lag, and generated responses with Gaussian white noise (sample size unless noted otherwise). Additive Gaussian noise was added to the responses ( except (c), for which ).

Experimental data

We used a previously published data set for Figure 9A: extracellularly recorded spikes from tiger salamander retinal ganglion cells [liu_inference_2017], available to download at

Another two data sets of visual neurons and one data set of a non-visual neuron were used for the supplementary Figure 2 and 3:

  1. Extracellularly recorded spikes from a tiger salamander retinal ganglion cell [maheswaranathan_inferring_2018], available to download at;

  2. Extracellular recorded spikes from a V1 complex cell of macaque monkey [rust_spatiotemporal_2005], available on request from the authors;

  3. Extracellular recorded spikes from a grid cell in rat entorhinal cortex [hafting_microstructure_2005], available to download at;

The details of the second data set for Figure 9B are described below.

4.2 Experimental procedures for 2-photon calcium imaging

Animals and tissue preparation

One wild-type mouse (C57Bl/6J, JAX 000664) housed under a standard 12 hour day/night rhythm was used in this study. Before experiments, the mouse was dark adapted for at least 2 hours. Then, the animal was anaesthetized with isoflurane (Baxter) and killed with cervical dislocation. Eyes were quickly enucleated in carboxygenated (95% O, 5% CO) artificial cerebral spinal fluid (ACSF) solution containing (in mM): 125 NaCl, 2.5 KCl, 2 CaCl, 1 MgCl, 1.25 NaHPO, 26 NaHCO, 20 glucose, and 0.5 L-glutamine (pH 7.4). After carefully removed the cornea, sclera and vitreous body, the retina was flattened on an Anodisc (0.2 M pore size, GE Healthcare) with the RGC side facing up and electroporated (see below). Then, the retina was transferred to the recording chamber of the microscope. To visualize the blood vessels and the damaged cells, we added 5 M sulforhodamine-101 (SR101, Invitrogen) into 2l of ACSF solution. All experimental procedures were carried out under very dim red light. All animal procedures were approved by the governmental review board (Regierungspräsidium Tübingen, Baden-Württemberg, Konrad-Adenauer-Str. 20, 72072 Tübingen, Germany) and performed according to the laws governing animal experimentation issued by the German Government.


For two-photon imaging, Oregon-Green BAPTA-1 (OGB-1, hexapotassium salt; Life Technologies) was bulk electroporated [briggman_bulk_2011]. After the retina was flattened on an Anodisc and placed between two 4-mm horizontal plate electrodes (CUY700P4E/L, Nepagene/Xceltis), 10 l OGB-1 (5 mM) diluted in ACSF solution was suspended on the upper electrode. Then the upper electrode was lowered onto the retina and applied 10 pulses ( 9.2 V, 100 ms pulse width, at 1 Hz) from a pulse generator/wide-band amplifier combination (TGP110 and WA301, Thurlby handar/Farnell). For complete recovery of the electroporated retina, we started recordings 30 min after electroporation.

Two photon imaging and light stimulation

A MOM-type two-photon microscope (designed by W. Denk, MPI, Martinsried; purchased from Sutter Instruments/Science Products) as described previously [euler_eyecup_2009, euler_studying_2019] was used for this study. Briefly, the system was equipped with a mode-locked Ti:Sapphire laser (MaiTai-HP DeepSee, Newport Spectra-Physics, Darmstadt, Germany), green and red fluorescence detection channels for OGB-1 (HQ 510/84, AHF, Tübingen, Germany) and SR101 (HQ 630/60, AHF), and a water immersion objective (16x/0.80W, DIC N2, /0 WD 3.0, Nikon). For imaging, we tuned the laser to 927 nm, and used a custom-made software (ScanM, by M. Müller, MPI, Martinsried, and T.E.) running under IGOR Pro 6.3 for Windows (Wavemetrics). Time-elapsed signals were recorded with 6416 pixel image sequences (31.25 Hz).

Light stimuli were projected through the objective lens [euler_studying_2019]. A digital light processing (DLP) LightCrafter 4500 (Texas Instruments, Dallas, TX) coupled with external unit with light-emitting diodes (LEDs) – "green" (576 nm) and UV (387 nm) that match the spectral sensitivities of mouse M- and S-opsins (for details, see [baden_functional_2016, franke_arbitrary-spectrum_2019]) was used in this study. Both LEDs were intensity-calibrated to range from ("black" background) to ("white" full field) photoisomerizations per cone. The light stimulus was centered before every experiment, ensuring that its center corresponded to the center of the microscope’s scan field. For all experiments, the tissue was kept at a constant mean stimulator intensity level for 15 s after the laser scanning started and before light stimuli were presented.

Binary dense noise (2015 matrix of 3030 M pixels; each pixel displayed an independent, balanced random sequence at 5 Hz for 20 minutes) were generated and presented using the Python-based software package QDSpy (

Data pre-processing

imaging data were pre-processed using custom-written scripts in IGOR Pro. Regions-of-interest (ROIs) were manually defined depending on the outline of the soma. For each ROI, traces were extracted using the image analysis toolbox SARFIA for IGOR Pro [dorostkar_computational_2010]. A stimulus time marker embedded in the recorded data served to align the traces relative to the light stimulus at a temporal resolution of 2 ms. All stimulus-aligned traces together with the relative ROI position were exported for further analysis.

4.3 Previous methods

Spike-triggered Average (STA)

The STA was computed as the average of all stimuli that proceeded a spike [aljadeff_analysis_2016]:


where is the stimulus design matrix, the response, can be the number of spikes or the number of samples (rows) in and for non-spike data.

Whitened spike-triggered average (wSTA)

The maximum-likelihood estimator under a Gaussian noise model, or whitened-STA (wSTA), was computed by multiplying the STA with the inverse of the stimulus auto-correlation matrix:


In our code base, we did not compute the matrix inverse explicitly but computed the least squared solution instead, which is common practice.

Maximum-a-posteriori estimator (MAP)

Similar, the MAP under a Gaussian noise model was computed by adding the inverse of the prior covariance to the stimulus auto-correlation matrix:


where is a prior covariance matrix of choice, e.g. corresponding to a smoothness or sparseness assumption [sahani_evidence_2002, park_receptive_2011]. Its hyperparameters can be optimized by minimizing the negative log-evidence[sahani_evidence_2002, park_receptive_2011]:



is variance of the additive Gaussian noise,

and are posterior mean and covariance, respectively:




Smoothness prior covariance matrix for ASD

The 1D ASD prior covariance matrix is formulated as following [sahani_evidence_2002]:


where is the squared distance between STRF coefficients and , controls the scale and controls the smoothness.

Locality prior covariance matrix for ALD

The 1D ALD prior covariance matrix is formulated as following [park_receptive_2011]:


where is an orthogonal basis matrix for the 1D discrete Fourier transform, and are the diagonal ALD prior covariance in space and frequency, respectively, which can be constructed as:


where is scalar values control the shape and extent of the local region, is the STRF coefficient coordinates in space-time and is center coordinate of the STRF. And


where is scalar values control the scale of smoothness in frequency, is of STRF coefficient coordinates in frequency and is center coordinate of the STRF in Fourier domain.

High-dimensional prior covariance matrix can be approximated by taking the Kronnecker product () of the covariance matrix in each dimension.

4.4 Estimating STRFs with Natural Cubic Regression Splines

To generate a natural cubic spline matrix (), we used the implementation from Patsy (0.5.1), a Python equivalent of the original implementation in R package mcgv [wood_generalized_2017]. The only hyperparameter needed to generate a 1D spline basis matrix was the number of basis functions, or the degrees of freedom (df), as we set all knots to be equally spaced.


where is the spline basis matrix, is knot locations, , and , , , , , are defined in Table 2.

Table 2: Basis functions for natural cubic spline.

An illustration of 1D and 2D natural cubic splines can be seen in Figure 1B. A spline-based STRF under a Gaussian noise model can be fitted in the similar manner as the wSTA:


where is the spline coefficients, and the corresponding STRF is:


The spline matrix can be extended to high dimensions, which is also called tensor product smooth [wood_generalized_2017], by simply taking the Kronnecker product () of the basis functions in each dimension:


4.5 Spline-based GLMs with gradient descent

The spline coefficients can also be fitted by gradient descent with respect to the cost function defined in each model with L1 regularization.

For the LG model, the cost function is the sum square error:


where is the weight of L1 regularization.

For LNP and LNLN model, the cost function is the negative log-likelihood:


where is the conditional intensity or the instantaneous firing rate, is the size of time bin, and is a nonlinearity applied to the filter output, which can be fixed as a softplus or exponential nonlinearity.

We used JAX [bradbury_jax_2018] for automatic differentiation, and Adam optimizer [kingma_adam_2017]for parameters optimization.

4.6 Spline-based spike-triggered clustering methods

Both k-means clustering and semi-NMF can be formulated as a matrix factorization problem:


where is the spike-triggered ensemble, is clustering centroids matrix (or subunit matrix), and is the clustering label (or subunit weight matrix). A spline basis can be easily incorporated into k-Means clustering by assuming that each centroid can be approximated by the same spline basis matrix: , where is the spline coefficient matrix for each subunit, and can be computed as the least-squares solution to the linear matrix equation , where is pseudo-inverse. For semi-NMF, splines can also be combined with the update rules [ding_convex_2010]:


where is point-wise multiplication, and .

4.7 Confidence interval and statistical significance of the estimated STRFs

The asymptotic posterior probability of the estimated STRF coefficients can be computed analytically

[wood_generalized_2017]: , where is the fitted STRF coefficients, is the posterior covariance. For LG model:


where is the residual sum of squares of the data and predicted response from the training set. And for LNP model:


where is a diagonal matrix with entries , and

is the nonlinearity of the model. Thus, the standard error of

is the square root of the diagonal of , and the corresponding confidence interval (CI) of the estimated STRF is as following:


Given the computed confidence interval, Wald Chi-Squared Test can be used to determine whether a STRF is statistically significant (against the null hypothesis that all STRF coefficients are zero), with the Wald statistic .

4.8 Permutation Test

To evaluate the predictive performance of the estimated STRF, we permuted the validation test stimulus in time and made prediction over 100 repetitions. A one-tailed one-sample Student’s T-test was used to evaluate if the model prediction was better than chance level.

4.9 Visualization of 3D STRF

For visualization of 3D STRF in Figure 7 and 9

, we first performed singular value decomposition (SVD) to separate the spatial and temporal components of the estimated 3D STRF. The coordinates of the extremum pixel in the spatial component were selected, and the temporal filter of the original STRF is shown for those coordinates. The confidence interval of the same extremum point was plotted as a gray shaded region on the temporal filter. For the spatial filter, we show the frame of the original STRF at the extremum point of the temporal filter (indicated by a vertical dotted line). The maximum value of the color map was set to the maximum absolute value of the STRF.

4.10 Comparison of Computation Time

For comparison of computation time in Figure 6, we used simulated 2D RFs of different sizes (15x15, 20x20 and 30x30 pixels, respectively) and white noise stimuli. Simulations were carried out at different lengths of time in the same manner as Figure 4 and 5. The number of basis functions for SPL wSTA were fixed as 5 and 10 along each dimension in all three cases. Measurements of computation time were averaged over 10 repetitions and performed on a 2019 16-inch MacBook Pro with a 2.3 GHz 8-core Intel Core i9 and 16 GB of RAM.

4.11 Data availability

All data used are available in GitHub (

4.12 Code availability

All methods (spline-based GLMs, evidence optimization and spike-clustering methods) were implemented in Python3 and available in GitHub ( And Jupyter notebooks for all figures are available also in GitHub (


We thank the authors of [maheswaranathan_inferring_2018, liu_inference_2017, rust_spatiotemporal_2005] and [hafting_microstructure_2005]

for making their data available. Research was funded by the Deutsche Forschungsgemeinschaft through a Heisenberg Professorship (BE5601/4-1 and BE5601/8-1, PB), the Collaborative Research Center 1233 “Robust Vision” (reference number 276693517) as well as individual research grants (EU 42/10-1, BE5601/6-1), the German Ministry of Education and Research through the Bernstein Award (01GQ1601, PB). PB is a member of the Excellence Cluster 2064 “Machine Learning — New Perspectives for Science” (reference number 390727645) and the Tübingen AI Center (FKZ: 01IS18039A).