Fast amortized inference of neural activity from calcium imaging data with variational autoencoders

11/06/2017 ∙ by Artur Speiser, et al. ∙ Howard Hughes Medical Institute Landhaus Caesar

Calcium imaging permits optical measurement of neural activity. Since intracellular calcium concentration is an indirect measurement of neural activity, computational tools are necessary to infer the true underlying spiking activity from fluorescence measurements. Bayesian model inversion can be used to solve this problem, but typically requires either computationally expensive MCMC sampling, or faster but approximate maximum-a-posteriori optimization. Here, we introduce a flexible algorithmic framework for fast, efficient and accurate extraction of neural spikes from imaging data. Using the framework of variational autoencoders, we propose to amortize inference by training a deep neural network to perform model inversion efficiently. The recognition network is trained to produce samples from the posterior distribution over spike trains. Once trained, performing inference amounts to a fast single forward pass through the network, without the need for iterative optimization or sampling. We show that amortization can be applied flexibly to a wide range of nonlinear generative models and significantly improves upon the state of the art in computation time, while achieving competitive accuracy. Our framework is also able to represent posterior distributions over spike-trains. We demonstrate the generality of our method by proposing the first probabilistic approach for separating backpropagating action potentials from putative synaptic inputs in calcium imaging of dendritic spines.




1 Introduction

Spiking activity in neurons leads to changes in intracellular calcium concentration, which can be measured by fluorescence microscopy of synthetic calcium indicators such as Oregon Green BAPTA-1 tsien1980new or genetically encoded calcium indicators such as GCaMP6 Chen_Wardill_13 . Such calcium imaging has become important since it enables the parallel measurement of large neural populations in a spatially resolved and minimally invasive manner Kerr_Denk_08 ; Grienberger_Konnerth_12 . Calcium imaging can also be used to study neural activity at subcellular resolution, e.g. for measuring the tuning of dendritic spines smith2013dendritic ; chen2013ultrasensitive . However, due to the indirect nature of calcium imaging, spike inference algorithms must be used to infer the underlying neural spiking activity that gives rise to the measured fluorescence dynamics.

Most commonly-used approaches to spike inference Vogelstein_Watson_09 ; Vogelstein_Packer_10 ; Pnevmatikakis_Merel_13 ; Pnevmatikakis_Soudry_16 ; Ganmor_Krumin_16 ; deneux2016accurate ; pachitariu2016suite2p ; Greenberg_2015 are based on carefully designed generative models that describe the process by which spiking activity leads to fluorescence measurements. Spikes are treated as latent variables, and spike prediction is performed by inferring both the parameters of the model and the spike latent variables from fluorescence time series, or “traces” Vogelstein_Watson_09 ; Vogelstein_Packer_10 ; Pnevmatikakis_Merel_13 ; Pnevmatikakis_Soudry_16 . The advantages of this approach are that it does not require extensive ground truth data for training (simultaneous electrophysiological and fluorescence recordings of neural activity are difficult to acquire), and that prior knowledge can be incorporated in the specification of the generative model. The accuracy of the predictions depends on the faithfulness of the generative model of the transformation of spike trains into fluorescence measurements Greenberg_2015 ; deneux2016accurate . The disadvantage of this approach is that spike inference requires either Markov chain Monte Carlo (MCMC) or sequential Monte Carlo techniques to sample from the posterior distribution over spike trains or, alternatively, iterative optimization to obtain an approximate maximum a-posteriori (MAP) prediction. Currently used approaches rely on bespoke, model-specific inference algorithms, which can limit the flexibility in designing suitable generative models. Most commonly used methods are based on simple phenomenological (and often linear) models Vogelstein_Watson_09 ; Vogelstein_Packer_10 ; Pnevmatikakis_Merel_13 ; Pnevmatikakis_Soudry_16 ; pachitariu2016suite2p .

Recently, a small number of cell-attached electrophysiological recordings of neural activity have become available, with simultaneous fluorescence calcium measurements in the same neurons. This has made it possible to train powerful and fast classifiers to perform spike inference in a discriminative manner, obviating the need for accurate generative models of calcium dynamics. The disadvantage of this approach is that it can require large labeled datasets for every new combination of calcium indicator, cell type and microscopy method, which can be expensive or impossible to acquire. Further, these discriminative methods do not easily allow the incorporation of prior knowledge about the generative process. Finally, current classification approaches yield only pointwise predictions of spike probability (i.e. firing rates), independent across time, and ignore temporal correlations in the posterior distribution of spikes.

Figure 1: Amortized inference for predicting spikes from imaging data. A) Our goal is to infer a spike train $s$ from an observed time series of fluorescence measurements $f$. We assume that we have a generative model of fluorescence given spikes, $p_\theta(f | s)$, with (unknown) parameters $\theta$, and we simultaneously learn $\theta$ as well as a ‘recognition model’ $q_\phi(s | f)$ which approximates the posterior over spikes $s$ given $f$ and which can be used for decoding a spike train from imaging data. B) We parameterize the recognition model $q_\phi$ by a multi-layer network architecture: fluorescence data is first filtered by a deep 1D convolutional network (CNN), providing input to a stochastic forward-running recurrent neural network (RNN) which predicts spike probabilities and takes previously sampled spikes as additional input. An additional deterministic RNN runs backward in time and provides further context.

Here, we develop a new spike inference framework called DeepSpike (DS), based on the variational autoencoder technique, which uses stochastic variational inference (SVI) to teach a classifier to predict spikes in an unsupervised manner using a generative model. This new strategy allows us to combine the advantages of generative Vogelstein_Watson_09 and discriminative approaches Theis_Berens_16 into a single fast classifier-based method for spike inference. In the variational autoencoder framework, the classifier is called a recognition model and represents an approximate posterior distribution over spike trains from which samples can be drawn in an efficient manner. Once trained to perform spike inference on one dataset, the recognition model can be applied to perform inference on statistically similar datasets without any retraining: the computational cost of variational spike inference is amortized, dramatically speeding up inference at test time by exploiting fast, classifier-based recognition models.

We introduce two recognition models: The first is a temporal convolutional network which produces a posterior distribution which is factorized in time, similar to standard classifier-based methods Theis_Berens_16 . The second is a recurrent neural network-based recognition model, similar to oord2016pixel ; larochelle2011neural which can represent any correlated posterior distribution in the non-parametric limit. Once trained, both models perform spike inference with state-of-the-art accuracy, and enable simultaneous spike inference for populations as large as in real time on a single GPU.

We show the generality of this black-box amortized inference method by demonstrating its accuracy for inference with a classic linear generative model Vogelstein_Watson_09 ; Vogelstein_Packer_10 , as well as two nonlinear generative models deneux2016accurate . Finally, we show an extension of the spike inference method to simultaneous inference and demixing of synaptic inputs from backpropagating somatic action potentials from simultaneous somatic and dendritic calcium imaging.

2 Amortized inference using variational autoencoders

2.1 Approach and training procedure

We observe fluorescence traces $f^i_t$, $t = 1, \dots, T^i$, representing noisy measurements of the dynamics of somatic calcium concentration in neurons $i = 1, \dots, N$. We assume a parametrised, probabilistic, differentiable generative model $p_{\theta^i}(f^i | s^i)$ with (unknown) parameters $\theta^i$. The generative model predicts a fluorescence trace given an underlying binary spike train $s^i$, where $s^i_t = 1$ indicates that neuron $i$ produced an action potential in the interval indexed by $t$. Our goal is to infer a latent spike train $s$ given only fluorescence observations $f$. We will solve this problem by training a deep neural network as a “recognition model” Rezende_Mohamed_14 ; Kingma_Welling_13 ; Titsias_Lazaro_14 parametrized by weights $\phi$. Use of a recognition model enables fast computation of an approximate posterior distribution $q_\phi(s | f)$ over spike trains from a fluorescence trace $f$. We will share one recognition model across multiple cells, i.e. $q_\phi(s^i | f^i) \approx p_{\theta^i}(s^i | f^i)$ for each $i$. We describe an unsupervised training procedure which jointly optimizes the parameters $\theta$ of the generative model and the parameters $\phi$ of the recognition network in order to maximize a lower bound on the log likelihood of the observed data, $\log p_\theta(f)$ Kingma_Welling_13 ; Rezende_Mohamed_14 ; Titsias_Lazaro_14 .

We learn the generative parameters $\theta$ and the recognition weights $\phi$ simultaneously by jointly maximizing $\mathcal{L}^K(\theta, \phi)$, a multi-sample importance-weighting lower bound on the log likelihood given by (Burda_Grosse_15, )

$$\mathcal{L}^K(\theta, \phi) = \sum_i \mathbb{E}_{s^1, \dots, s^K \sim q_\phi(s | f^i)} \left[ \log \frac{1}{K} \sum_{k=1}^{K} \frac{p_{\theta^i}(s^k, f^i)}{q_\phi(s^k | f^i)} \right], \qquad (1)$$

where $s^1, \dots, s^K$ are spike trains sampled from the recognition model $q_\phi(s | f^i)$. This stochastic objective involves drawing $K$ samples from the recognition model, and evaluating their likelihood by passing them through the generative model. When $K = 1$, the bound reduces to the evidence lower bound (ELBO). Increasing $K$ yields a tighter lower bound (than the ELBO) on the marginal log likelihood, at the cost of additional training time. We found that increasing the number of samples leads to better fits of the generative model; in our experiments, we used .
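As a minimal sketch of how such a multi-sample bound can be evaluated for one trace, the snippet below assumes that the log joint probabilities and the log recognition probabilities of the sampled spike trains have already been computed. The function name and argument names are illustrative and not taken from the original implementation; the max-subtraction is a standard numerical-stability device.

```python
import numpy as np

def multi_sample_bound(log_joint, log_q):
    """Single-trace Monte Carlo estimate of a K-sample importance-weighted bound.

    log_joint[k] : log p(s^k, f) under the generative model (assumed given)
    log_q[k]     : log q(s^k | f) under the recognition model (assumed given)
    """
    log_w = np.asarray(log_joint, dtype=float) - np.asarray(log_q, dtype=float)
    m = np.max(log_w)  # subtract the maximum before exponentiating
    return m + np.log(np.mean(np.exp(log_w - m)))

# With a single sample the estimate is the usual ELBO integrand.
print(multi_sample_bound([-3.0], [-1.0]))
```

By Jensen's inequality the value returned for several samples is never smaller than the average of the individual log weights, which is why using more samples tightens the bound.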

To train the generative and recognition models by stochastic gradient ascent, we must estimate the gradient of the bound with respect to their parameters. As our recognition model produces an approximate posterior over binary spike trains, the gradients have to be estimated based on samples. Obtaining functional estimates of the gradients with respect to the parameters of the recognition model is challenging and relies on constructing effective control variates to reduce variance Mnih_Gregor_14 . We use the variational inference for Monte Carlo objectives (VIMCO) approach of Mnih_Rezende_2016 to produce low-variance unbiased estimates of these gradients. The generative training procedure could be augmented with a supervised cost term Kingma_Shakir_14 ; Maaloe_Sonderby_15 , resulting in a semi-supervised training method.
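The control-variate idea behind VIMCO can be sketched as follows. For each sample, the learning signal is the multi-sample bound minus a baseline in which that sample's importance weight is replaced by the geometric mean of the other weights. This is a hedged NumPy illustration of the per-sample signals only, following the general recipe of Mnih_Rezende_2016; the function names are hypothetical, and a real implementation would multiply these signals with gradients of the log recognition probabilities inside an automatic differentiation framework.

```python
import numpy as np

def logmeanexp(x, axis=-1):
    """Numerically stable log(mean(exp(x))) along one axis."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.mean(np.exp(x - m), axis=axis))

def vimco_signals(log_w):
    """Per-sample learning signals for K >= 2 log importance weights.

    Returns (bound, signals, normalized_weights):
      bound      : the K-sample bound estimate
      signals[k] : bound minus the leave-one-out baseline for sample k
      normalized_weights : self-normalized weights used for the
                           reparameterization-free model-gradient term
    """
    log_w = np.asarray(log_w, dtype=float)
    K = log_w.shape[0]
    bound = logmeanexp(log_w)
    # Geometric mean of the other weights = arithmetic mean of their logs.
    loo_mean = (np.sum(log_w) - log_w) / (K - 1)
    # Row k holds log_w with entry k replaced by its leave-one-out stand-in.
    replaced = np.tile(log_w, (K, 1))
    replaced[np.arange(K), np.arange(K)] = loo_mean
    baselines = logmeanexp(replaced, axis=1)
    signals = bound - baselines
    w = np.exp(log_w - np.max(log_w))
    return bound, signals, w / np.sum(w)

bound, signals, weights = vimco_signals([0.0, -5.0, -5.0])
print(bound, signals, weights)
```

A sample that explains the data much better than its peers receives a positive signal, while the others receive negative signals, so no separately learned baseline is needed.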

Gradient optimization:

We use ADAM Kingma_Ba_14 , an adaptive gradient update scheme, to perform online stochastic gradient ascent. The training data is cut into short chunks of several hundred time-steps and arranged in batches containing samples from a single cell. As we train only one recognition model but multiple generative models in parallel, we load the respective generative model and ADAM parameters at each iteration. Finally, we use norm-clipping to scale the gradients acting on the recognition model: the norm of all gradients is calculated, and if it exceeds a fixed threshold the gradients are rescaled. While norm-clipping was introduced to prevent exploding gradients in RNNs pascanu2013difficulty , we found it to be critical to achieve high performance both for RNN and CNN architectures in our learning problem. Very small threshold values (0.02) empirically yielded best results.
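The norm-clipping step described above can be illustrated with a few lines of NumPy. This is a sketch under the assumption that the gradients are available as a list of arrays; the function name is illustrative, and the default threshold of 0.02 is the value quoted in the text.

```python
import numpy as np

def clip_by_global_norm(grads, threshold=0.02):
    """Rescale a list of gradient arrays if their joint L2 norm exceeds threshold.

    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    total = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    if total <= threshold:
        return [g.copy() for g in grads], total
    scale = threshold / total
    return [g * scale for g in grads], total

grads = [np.array([3.0, 0.0]), np.array([[4.0]])]
clipped, norm_before = clip_by_global_norm(grads)
print(norm_before, clipped)
```

Because all gradients are scaled by the same factor, the direction of the update is preserved and only its length is limited.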

2.2 Generative models

To demonstrate that our computational strategy can be applied to a wide range of differentiable models in a black-box manner, we consider four generative models: a simple, but commonly used linear model of calcium dynamics Vogelstein_Watson_09 ; Vogelstein_Packer_10 ; Pnevmatikakis_Merel_13 ; Pnevmatikakis_Soudry_16 , two more sophisticated nonlinear models which additionally incorporate saturation and facilitation resulting from the dynamics of calcium binding to the calcium sensor, and finally a multi-dimensional model for dendritic imaging data.

Linear auto-regressive generative model (SCF):

We use the name SCF for the classic linear convolutional generative model used in Vogelstein_Watson_09 ; Vogelstein_Packer_10 ; Pnevmatikakis_Merel_13 ; Pnevmatikakis_Soudry_16 , since this generative process is described by the Spikes $s$, which linearly impact the Calcium concentration $c$, which in turn determines the observed Fluorescence intensity $f$. The calcium concentration follows linear auto-regressive dynamics of a given order and is driven by the spikes through a spike amplitude; the fluorescence is obtained from the calcium concentration through a gain and a constant fluorescence baseline, with additive measurement noise.
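To make the structure of this linear model concrete, the following sketch simulates spikes, calcium and fluorescence in sequence. All parameter names (`ar_coeffs`, `spike_amp`, `gain`, `baseline`, `noise_std`, `rate`) are hypothetical labels chosen for this illustration, and drawing spikes independently per bin with a fixed probability is an assumption of the sketch rather than a statement about the original code.

```python
import numpy as np

def simulate_scf(T, ar_coeffs, spike_amp, gain, baseline, noise_std, rate, rng):
    """Simulate a spikes -> calcium -> fluorescence chain.

    ar_coeffs : auto-regressive coefficients for calcium (order = len(ar_coeffs))
    rate      : per-bin spike probability (i.i.d. Bernoulli spikes, assumed)
    """
    p = len(ar_coeffs)
    s = (rng.random(T) < rate).astype(float)
    c = np.zeros(T)
    for t in range(T):
        # Weighted sum of the previous p calcium values (zero before t = 0).
        past = sum(ar_coeffs[j] * c[t - 1 - j] for j in range(p) if t - 1 - j >= 0)
        c[t] = past + spike_amp * s[t]
    f = gain * c + baseline + noise_std * rng.standard_normal(T)
    return s, c, f

rng = np.random.default_rng(0)
s, c, f = simulate_scf(T=200, ar_coeffs=[0.95], spike_amp=1.0, gain=1.0,
                       baseline=0.2, noise_std=0.1, rate=0.02, rng=rng)
print(int(s.sum()), f[:5])
```

Every step is differentiable with respect to the continuous parameters, which is the only property the training procedure requires of a generative model.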

Nonlinear auto-regressive and sensor dynamics generative models (SCDF & MLphys):

As examples of nonlinear generative models Rahmati_Kirmse_16 , we consider two simple models of the discrete-time dynamics of the calcium sensor or dye. In the first (SCDF), the concentration of fluorescent dye molecules is a function of the somatic Calcium concentration and has its own Dynamics, governed by the rates at which the calcium sensor binds and unbinds calcium ions and by a Hill coefficient. We constrained these parameters to be non-negative. The total concentration of the dye molecule in the soma sets the maximum possible value of the concentration of fluorescent dye. The richer dynamics of the SCDF model allow for facilitation of fluorescence at low firing rates, and saturation at high rates. The parameters of the SCDF model are .
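The qualitative behaviour described here (supralinear growth at low calcium, saturation at high calcium) can be illustrated with a generic binding-kinetics update. The functional form below is an assumption made for this sketch and is not claimed to be the equation used by the authors; `k_on`, `k_off`, `hill` and `d_total` are hypothetical names for the binding rate, unbinding rate, Hill coefficient and total dye concentration.

```python
import numpy as np

def simulate_dye(c, k_on, k_off, hill, d_total, d0=0.0):
    """Discrete-time bound-dye concentration driven by a calcium trace c.

    Assumed update: free dye (d_total - d) binds at rate k_on * c**hill,
    bound dye unbinds at rate k_off. Stable when k_on * c**hill + k_off < 1.
    """
    c = np.asarray(c, dtype=float)
    d = np.empty(len(c))
    prev = d0
    for t in range(len(c)):
        bind = k_on * c[t] ** hill * (d_total - prev)
        unbind = k_off * prev
        prev = prev + bind - unbind
        d[t] = prev
    return d

# Steady-state bound dye for increasing constant calcium levels.
for level in (0.1, 0.2, 1.0, 5.0):
    print(level, simulate_dye(np.full(5000, level), 0.1, 0.1, 2.0, 1.0)[-1])
```

With a Hill coefficient above one, doubling a small calcium level more than doubles the steady-state signal, while large calcium levels drive the bound dye towards, but never above, the total dye concentration.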

The second nonlinear model (MLphys) is a discrete-time version of the MLspike generative model deneux2016accurate , simplified by not including a model of the time-varying baseline. The dynamics for and are as above, with . We replace the dynamics for by


Multi-dimensional soma + dendrite generative model (DS-F-DEN):

The dendritic generative model is a multi-dimensional SCDF model that incorporates back-propagating action potentials (bAPs). The calcium concentration at the cell body (superscript c) is generated as for SCDF, whereas for the spine (superscript s), there are two components: synaptic inputs and bAPs from the soma,


where are the amplitude coefficients of bAPs for different spine locations, and , . The spines and soma share the same dye dynamics as in (3). The parameters of the dendritic integration model are . We note that this simple generative model does not attempt to capture the full complexity of nonlinear processing in dendrites (e.g. it does not incorporate nonlinear phenomena such as dendritic plateau potentials). Its goal is to separate local influences (synaptic inputs) from global events (bAPs, or potentially regenerative dendritic events).

2.3 Recognition models: parametrization of the approximate posterior

The goal of the recognition model is to provide a fast and efficient approximation to the true posterior over discrete latent spike trains. We will use both a factorized, localized approximation (parameterized as a convolutional neural network), and a more flexible, non-factorized and non-localized approximation (parameterized using additional recurrent neural networks).

Convolutional neural network: Factorized posterior approximation (DS-F)

In Theis_Berens_16 , it was reported that good spike-prediction performance can be achieved by making the spike probability at each time bin depend on a fixed-length local window of the fluorescence trace centered at that bin, when training such a model fully supervised. We implement a scaled-up version of this idea, using a deep neural network which is convolutional in time as the recognition model. We use architectures with up to five hidden layers and 20 filters per layer with leaky ReLU units maas2013rectifier . The output layer uses a sigmoid nonlinearity to compute the Bernoulli spike probabilities.
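A factorized recognition model of this kind can be sketched in plain NumPy as a stack of temporal convolutions followed by a sigmoid. The layer sizes, filter length, and random weights below are placeholders chosen for illustration, and the helper names are hypothetical; the original model was implemented in Theano and trained, not randomly initialized.

```python
import numpy as np

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)

def conv1d_same(x, w, b):
    """Temporal cross-correlation with zero padding.

    x: (C_in, T), w: (C_out, C_in, L) with odd L, b: (C_out,) -> (C_out, T)
    """
    c_out, _, L = w.shape
    pad = L // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    T = x.shape[1]
    out = np.empty((c_out, T))
    for t in range(T):
        out[:, t] = np.tensordot(w, xp[:, t:t + L], axes=([1, 2], [0, 1])) + b
    return out

def factorized_posterior(f, layers):
    """Map a fluorescence trace (T,) to per-bin Bernoulli spike probabilities."""
    h = f[None, :]
    for w, b in layers[:-1]:
        h = leaky_relu(conv1d_same(h, w, b))
    w, b = layers[-1]
    logits = conv1d_same(h, w, b)[0]
    return 1.0 / (1.0 + np.exp(-logits))

def sample_factorized(p, rng):
    """Draw independent spikes per bin and return their log probability."""
    s = (rng.random(p.shape) < p).astype(float)
    log_q = np.sum(s * np.log(p) + (1 - s) * np.log1p(-p))
    return s, float(log_q)

rng = np.random.default_rng(0)
layers = [(0.1 * rng.standard_normal((20, 1, 11)), np.zeros(20)),
          (0.1 * rng.standard_normal((1, 20, 11)), np.zeros(1))]
probs = factorized_posterior(rng.standard_normal(100), layers)
spikes, log_q = sample_factorized(probs, rng)
print(probs.shape, log_q)
```

Each output probability depends only on a local window of the input, and the log probability of a sampled spike train is a plain sum over time bins, which is what makes this posterior approximation factorized.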


Recurrent neural network: Capturing temporal correlations in the posterior (DS-NF)

The fully-factorized posterior approximation (DS-F) above ignores temporal correlations in the posterior over spike trains. Such correlations can be useful in modeling uncertainty in the precise timing of a spike, which induces negative correlations between nearby time bins. To model temporal correlations, we developed an RNN-based non-factorizing distribution which can approach the true posterior in the non-parametric limit (see figure 1B). Similar to oord2016pixel , we use the temporal ordering over spikes and factorize the joint distribution over spikes as $q_\phi(s | f) = \prod_t q_\phi(s_t | f, s_1, \dots, s_{t-1})$, by conditioning spikes at time $t$ on all previously sampled spikes. Our RNN uses a CNN as described above to extract features from the input trace. Additional input is provided by a backwards RNN which also receives input from the CNN features. The outputs of the forward RNN and CNN are transformed into Bernoulli spike probabilities through a dense sigmoid layer. This probability and the sample drawn from it are relayed to the forward RNN in the next time step. Forward and backward RNN have a single layer with 64 gated recurrent units each cho2014properties .
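The sequential sampling scheme can be sketched as follows. For brevity this illustration uses a plain tanh recurrent cell instead of gated recurrent units, treats the per-time-step input features (which would come from the CNN and the backward RNN) as given, and uses small random weights; all names are hypothetical.

```python
import numpy as np

def init_params(n_feat, n_hid, rng, scale=0.1):
    """Random placeholder weights for a small tanh recurrent cell."""
    return {
        "W_x": scale * rng.standard_normal((n_hid, n_feat)),
        "W_h": scale * rng.standard_normal((n_hid, n_hid)),
        "w_s": scale * rng.standard_normal(n_hid),   # weight on previous sample
        "w_p": scale * rng.standard_normal(n_hid),   # weight on previous probability
        "b_h": np.zeros(n_hid),
        "w_out": scale * rng.standard_normal(n_hid),
        "b_out": -2.0,
    }

def sample_autoregressive(features, params, rng):
    """Sample a spike train bin by bin and accumulate its log probability.

    features: (T, n_feat) array of per-bin inputs (assumed precomputed).
    """
    T = features.shape[0]
    h = np.zeros(params["W_h"].shape[0])
    s_prev, p_prev = 0.0, 0.0
    spikes = np.zeros(T)
    log_q = 0.0
    for t in range(T):
        h = np.tanh(params["W_x"] @ features[t] + params["W_h"] @ h
                    + params["w_s"] * s_prev + params["w_p"] * p_prev
                    + params["b_h"])
        p = 1.0 / (1.0 + np.exp(-(params["w_out"] @ h + params["b_out"])))
        s = float(rng.random() < p)
        log_q += s * np.log(p) + (1.0 - s) * np.log1p(-p)
        spikes[t] = s
        s_prev, p_prev = s, p
    return spikes, float(log_q)

rng = np.random.default_rng(1)
params = init_params(n_feat=4, n_hid=8, rng=rng)
spikes, log_q = sample_autoregressive(rng.standard_normal((30, 4)), params, rng)
print(spikes.sum(), log_q)
```

Because each probability is computed after the previous spike has been sampled, the log probability of the whole train is a sum of conditional terms, and a spike that has just been emitted can lower the probability of emitting another one in the next bin.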

2.4 Details of synthetic and real data and evaluation methodology

We evaluated our method on simulated and experimental data. From our SCF and SCDF generative models for spike inference, we simulated traces of length assuming a recording frequency of . Initial parameters were obtained by fitting the models to real data (see below), and heterogeneity across neurons was achieved by randomly perturbing parameters. We used neurons each for training and validation and neurons in the test set. For each cell, we generated three traces with firing rates of 0.6, 0.9 and , assuming i.i.d. spikes.

Finally, we compared methods on two-photon imaging data from cells from Chen_Wardill_13 , which is available at . Layer 2/3 pyramidal neurons in mouse visual cortex were imaged at using the genetically encoded calcium indicators GCaMP6s and GCaMP6f, while action potentials were measured electrophysiologically using cell-attached recordings. Data were pre-processed by removing a slow-moving baseline using the 5th percentile in a window of 6000 time steps. Furthermore, we used this baseline estimate to calculate $\Delta F/F$. Cross-validated results were obtained using 4 folds, where we trained and validated on 3/4 of the cells in each dataset and tested on the remaining cells to highlight the potential for amortized inference. Early stopping was performed based on the correlation achieved on the train/validation set, which was evaluated every 100 update steps.
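The baseline-removal step can be sketched as a running percentile filter. The snippet assumes a centered window that is truncated at the edges of the trace and a conventional relative-fluorescence normalization (trace minus baseline, divided by baseline); both are assumptions of this illustration, and the simple loop is written for clarity rather than speed.

```python
import numpy as np

def running_percentile_baseline(f, window=6000, q=5):
    """Slow baseline: q-th percentile in a centered window around each bin."""
    f = np.asarray(f, dtype=float)
    T = len(f)
    half = window // 2
    base = np.empty(T)
    for t in range(T):
        lo, hi = max(0, t - half), min(T, t + half + 1)
        base[t] = np.percentile(f[lo:hi], q)
    return base

def relative_fluorescence(f, window=6000, q=5):
    """Subtract the running baseline and normalize by it."""
    f = np.asarray(f, dtype=float)
    base = running_percentile_baseline(f, window, q)
    return (f - base) / base

trace = np.full(100, 2.0)
trace[50] = 4.0          # one transient on a flat baseline
print(relative_fluorescence(trace, window=20)[48:53])
```

A low percentile tracks the fluorescence floor rather than its mean, so sparse calcium transients have little influence on the baseline estimate.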

We report results using the cross-correlation between true and predicted spike rates, at the sampling discretization of for simulated data and for real data. As the predictions of our DS-NF model are not deterministic, we sample times from the model and average over the resulting probability distributions to obtain an estimate of the marginal probability before we calculate cross-correlations.
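The evaluation described above can be sketched as: average several sampled outputs into a marginal estimate, optionally bin both signals to a coarser time resolution, and compute the Pearson correlation. The function names and the choice of summing within bins are assumptions made for this illustration.

```python
import numpy as np

def bin_sum(x, factor):
    """Sum consecutive groups of `factor` bins (drops an incomplete last group)."""
    x = np.asarray(x, dtype=float)
    T = (len(x) // factor) * factor
    return x[:T].reshape(-1, factor).sum(axis=1)

def marginal_correlation(samples, true_spikes, factor=1):
    """Correlation between the sample-averaged prediction and the true spikes.

    samples: (n_samples, T) array of sampled spike trains or probabilities.
    """
    marginal = np.mean(np.asarray(samples, dtype=float), axis=0)
    a = bin_sum(marginal, factor)
    b = bin_sum(true_spikes, factor)
    return float(np.corrcoef(a, b)[0, 1])

truth = np.array([0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0], dtype=float)
shifted = np.roll(truth, 1)[None, :]   # every spike predicted one bin late
print(marginal_correlation(shifted, truth, factor=1),
      marginal_correlation(shifted, truth, factor=4))
```

The example shows why the temporal discretization matters: a prediction that is off by one bin scores poorly at full resolution but perfectly once both signals are binned more coarsely.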

We used multiple generative models to show that our inference algorithm is not tied to a particular model: SCDF for the experiments depicted in Fig. 2, SCF for a comparison with established methods based on this linear model (Table 1, column 1), and MLphys on real data as it is used by the current state-of-the-art inference algorithm (Table 1, columns 2 & 3, Fig. 3).

Figure 2: Model-inversion with variational autoencoders, simulated data A) Illustration of factorized (CNN, DS-F) and non-factorized posterior approximation (RNN, DS-NF) on simulated data (SCDF generative model). DS-NF yields more accurate reconstructions, but both methods lead to similar marginal predictions (i.e. predicted firing rates, bottom). B) Number of spikes sampled for every true spike for the factorized (red) and non-factorized (red) posterior. The correlated posterior consistently samples the correct number of spikes while still accounting for the uncertainty in the spike timing. C) Performance of amortized vs non-amortized inference on simulated data. D) Scatter plots of achieved log-likelihood of the true spike train under the posterior model (top) and achieved correlation coefficients between the marginalized spiking probabilities and true spike trains (bottom).

3 Results

3.1 Stochastic variational spike inference of factorized and correlated posteriors

We first illustrate our approach on synthetic data, and compare our two different architectures for recognition models. We simulated data from the SCDF nonlinear generative model and trained DeepSpike unsupervised using the same SCDF model. While only the more expressive recognition model (DS-NF) is able to achieve close-to-perfect reconstructions of the fluorescence traces (Fig. 2 A, top row), both approaches yield similar marginal firing rate predictions (second row). However, as the factorized model does not model correlations in the posterior, it yields higher variance in the number of spikes reconstructed for each true spike (Fig. 2 B). This is because the factorized model cannot capture that a fluorescence increase might be ‘explained away’ by a spike that has just been sampled, i.e. it cannot capture the difference between uncertainty in spike timing and uncertainty in (local) spike counts. Therefore, while both approaches predict firing rates similarly well on simulated data (as quantified using correlation, Fig. 2 D), the DS-NF model assigns higher posterior probability to the true spike trains.

3.2 Amortizing inference leads to fast and accurate test-time inference

In principle, our unsupervised learning procedure could be re-run on every dataset of interest. However, it also allows for amortizing inference by sharing one recognition model across multiple cells, and applying the recognition model directly to new data without additional training for fast test-time performance. Amortized inference allows the recognition model to be used for inference in the same way as a network that was trained fully supervised. Since there is no variational optimization at test time, inference with this network is just as fast as inference with a supervised network. As with supervised learning, there will be limitations on the ability of this network to generalize to different imaging conditions or indicators that were not included in the training set.

To test if our recognition model generalizes well enough for amortized inference to work across multiple cells, as well as on cells it did not see during training, we trained one DS-NF model on 50 cells (simulated data, SCDF) and evaluated its performance on a non-overlapping set of 30 cells. For comparison, we also trained 30 DS-NF models separately, one on each of those cells; this amounts to standard variational inference using a neural network to parametrize the posterior approximation, but without amortizing inference. We found that amortizing inference only causes a small drop in performance (Fig. 2 C). However, this drop in performance is offset by the large gain in computational efficiency, as training a neural network takes several orders of magnitude more time than applying it at test time.

Inference using the DS-F model only requires a single forward pass through a convolutional network to predict firing rates, and DS-NF requires running a stochastic RNN for each sampled spike train. While the exact running time of each of these applications will depend on both implementation and hardware, we give rough indications of computational speed, estimated on an Intel(R) Xeon(R) CPU E5-2697 v3. On the CPU, our DS-F approach takes to process a single trace of 10K time steps, when using a network appropriate for data. This is on the same order as the (Intel Core i5 CPU) reported by Friedrich:2016ui for their OASIS algorithm, which is currently the fastest available implementation for constrained deconvolution (CDEC) of SCF, but restricted to this linear generative model. The DS-NF algorithm requires , which still compares favourably to MLspike, which takes (evaluated on the same CPU). As our algorithm is implemented in Theano bergstra2010theano , it can be easily accelerated and allows for massive parallelization on a single GPU. On a GTX Titan X, DS-F and DS-NF take and , respectively. When processing 500 traces in parallel, DS-NF becomes only 2.5 times slower. Extrapolating from these results, this implies that even when using the DS-NF algorithm, we would be able to perform spike inference on 1 hour of recordings at for 500 cells in less than .

Algorithm                          SCF-Sim.      GCaMP6s         GCaMP6f         Soma (dendritic)   Spine (dendritic)
DS-F                               0.88 ± 0.01   0.74 ± 0.02     0.74 ± 0.02     -                  -
DS-NF                              0.89 ± 0.01   0.72 ± 0.02     0.73 ± 0.02     -                  -
CDEC Pnevmatikakis_Soudry_16       0.86 ± 0.01   0.39 ± 0.03 *   0.58 ± 0.02 *   -                  -
MCMC Pnevmatikakis_Merel_13        0.87 ± 0.01   0.47 ± 0.03 *   0.53 ± 0.03 *   -                  -
MLSpike deneux2016accurate         -             0.60 ± 0.02 *   0.67 ± 0.01 *   -                  -
DS-F-DEN                           -             -               -               0.84 ± 0.01        0.78 ± 0.01
Foopsi-RR Chen_Wardill_13          -             -               -               0.66 ± 0.02        0.60 ± 0.01
Table 1: Performance comparison. Values are correlations between predicted marginal probabilities and ground truth spikes.

3.3 DS achieves competitive results on simulated and publicly available imaging data

The advantages of our framework (black-box inference for different generative models, fast test-time performance through amortization, correlated posteriors through RNNs) are only useful if the approach can also achieve competitive performance. To demonstrate that this is the case, we compare our approach to alternative generative-model based spike prediction methods on data sampled from the SCF model. As this is the generative model underlying commonly used methods Pnevmatikakis_Soudry_16 ; Pnevmatikakis_Merel_13 , it is difficult to beat their performance on this data. We find that both DS-F and DS-NF achieve competitive performance, as measured by correlation between predicted firing rates and true (simulated) spike trains (Table 1, left column. Values are means and standard error of the mean calculated over cells).

Figure 3: Inference and reconstruction using the DS-NF algorithm on GECI data. The reconstruction based on the inferred spike trains (blue) shows that the algorithm converges to a good joint model while the reconstruction based on the true spikes (purple) shows a mismatch of the generative model for high activity which results in an overestimate of the overall firing rate.

To evaluate our performance on real data we compare to the current state-of-the-art method for spike inference based on generative models deneux2016accurate . For these experiments we trained separate models on each of the GCaMP variants using the MLspike generative model. We achieve accuracy competitive with the results in deneux2016accurate (see Table 1, values marked with an asterisk are taken from deneux2016accurate , Fig. 6d) and clearly outperform methods that are based on the linear SCF model. We note that, while our method performs inference in an unsupervised fashion and is trained using an unsupervised objective, we initialized our generative model with the mean values given in deneux2016accurate (Fig. S6a), which were obtained using ground truth data. An example of inference and reconstruction using the DS-NF model is shown in Fig. 3. The reconstruction based on the true spikes (purple line) was obtained using the generative model parameters which had been acquired from unsupervised learning. This explains why the reconstruction using the inferred spikes is more accurate and suggests that there is a mismatch between the MLphys model and the true data-generating process. Developing more accurate generative models would therefore likely further increase the performance of the algorithm.

Figure 4: Inference of somatic spikes and synaptic input spikes from simulated dendritic imaging data. We simulated imaging data from our generative model, and compared our approach (DS-F-DEN) to an analysis inspired by Chen_Wardill_13 (Foopsi-RR), and found that our method can extract synaptic inputs more accurately. Traces at the soma and spines are used to infer somatic spikes and synaptic inputs at spines. Top: somatic trace and predictions. DS-F-DEN produces better predictions at the soma since it uses all traces to infer global events. Bottom: spine trace and predictions. DS-F-DEN performs better in terms of extracting synaptic inputs.

3.4 Extracting putative synaptic inputs from calcium imaging in dendritic spines

We generalized the DeepSpike variational-inference approach to perform simultaneous inference of backpropagating APs and synaptic inputs, imaged jointly across the entire neuronal dendritic arbor. We illustrate this idea on synthetic data based on the DS-F-DEN generative model (5). We simulated 15 cells each with 10 dendritic spines with a range of firing rates and noise levels. We then used a multi-input multi-output convolutional neural network (CNN, DS-F) in the non-amortized setting to infer a fully-factorized Bernoulli posterior distribution over global action potentials and local synaptic events.

We compared our results to an analysis technique inspired by Chen_Wardill_13 which we call Foopsi-RR. We first apply constrained deconvolution pnevmatikakis2014structured to somatic and dendritic calcium traces, and then use robust linear regression to identify and subtract deconvolved components of the spine signal that correlate with global back-propagating action potentials. Compared to the method suggested by Chen_Wardill_13 , our model is significantly more accurate. The average correlation of our model is 0.84 for soma and 0.78 for spines, whereas for Foopsi-RR the average correlation is 0.66 for soma and 0.60 for spines (Table 1).

4 Discussion

Spike inference is an important step in the analysis of fluorescence imaging. We here propose a strategy based on variational autoencoders that combines the advantages of generative Vogelstein_Watson_09 and discriminative approaches Theis_Berens_16 . The generative model makes it possible to incorporate knowledge about underlying mechanisms and thus learn from unlabeled data. A simultaneously-learned recognition network allows fast test-time performance, without the need for expensive optimization or MCMC sampling. This opens up the possibility of scaling up spike inference to very large neural populations Ahrens_Li_12 , and to real-time and closed-loop applications. Furthermore, our approach is able to estimate full posteriors rather than just marginal firing rates.

It is likely that improvements in performance and interpretability will result from the design of better, biophysically accurate and possibly dye-, cell-type- and modality-specific models of the fluorescence measurement process, the dynamics of neurons Rahmati_Kirmse_16 and indicators, as well as from taking spatial information into account. Our goal here is not to design such models or to improve accuracy per se, but rather to develop an inference strategy which can be applied to a large class of such potential generative models without model-specific modifications: A trained recognition model that can invert, and provide fast test-time performance, for any such model while preserving performance in spike-detection.

Our recognition model is designed to serve as the common approximate posterior for multiple, possibly heterogeneous populations of cells, requiring an expressive model. These assumptions are supported by prior work Theis_Berens_16 and our results on simulated and publicly available data, but might be suboptimal or not appropriate in other contexts, or for other performance measures. In particular, we emphasize that our comparisons are based on a specific data-set and performance measure which is commonly used for comparing spike-inference algorithms, but which can in itself not provide conclusive evidence for performance in other settings and measures. Our approach includes rich posterior approximations Sonderby_Kaae_16 based on RNNs to make predictions using longer context-windows and modelling posterior correlations. Possible extensions include causal recurrent recognition models for real-time spike inference, which would require combining them with fast algorithms for detecting regions of interest from imaging-movies Pnevmatikakis_Soudry_16 ; apthorpe2016automatic . Another promising avenue is extending our variational inference approach so it can also learn from available labeled data to obtain a semi-supervised algorithm maaloe2015improving .

As a statistical problem, spike inference has many similarities with other analysis problems in biological imaging– an underlying, sparse signal needs to be reconstructed from spatio-temporal imaging observations, and one has substantial prior knowledge about the image-formation process which can be encapsulated in generative models. As a concrete example of generalization, we proposed an extension to multi-dimensional inference of inputs from dendritic imaging data, and illustrated it on simulated data. We expect the approach pursued here to also be applicable in other inference tasks, such as the localization of particles from fluorescence microscopy Betzig_Patterson_06 .

5 Acknowledgements

We thank T. W. Chen, K. Svoboda and the GENIE project at Janelia Research Campus for sharing their published GCaMP6 data, available at . We also thank T. Deneux for sharing his results for comparison and comments on the manuscript and D. Greenberg, L. Paninski and A. Mnih for discussions. This work was supported by SFB 1089 of the German Research Foundation (DFG) to J. H. Macke. A. Speiser was funded by an IMPRS for Brain & Behavior scholarship by the Max Planck Society.