1 Introduction
Deciphering interpretable latent regularity or structure from high-dimensional time series data is a challenging problem in neural data analysis. Many studies and theories in neuroscience posit that high-dimensional neural recordings are noisy observations of some underlying, low-dimensional, time-varying signal of interest. Thus, robust and powerful statistical methods are needed to identify such latent dynamics, so as to provide insight into the latent patterns that govern neural activity both spatially and temporally. A large body of methods has been proposed to learn concise, structured, and insightful dynamical portraits from noisy high-dimensional neural recordings [1, 2, 3, 4, 5, 6, 7]. These methods can be categorized on the basis of four modeling strategies (“⋆” indicates our contributions in these components):
Dynamical model (⋆)
Dynamical models describe the evolution of the latent process: how future states depend on present and past states. One popular approach assumes that the latent variables are governed by a linear dynamical system [8, 9], while a second choice models the evolution of latent states with a Gaussian process, relaxing linearity and imposing smoothness over latent states [10, 1]. However, linear dynamics cannot capture the nonlinearities and non-Markov dynamical properties of complex systems, and a Gaussian process considers only the pairwise correlation of time points rather than explicit temporal dynamics. We argue that the dynamical model proposed in this work can both capture complex state transition structures and model long short-term temporal dynamics efficiently and flexibly.
Mapping function (⋆)
Mapping functions reveal how latent states generate noise-free observations. A nonlinear transformation is often ignored in pursuit of efficient and tractable algorithms: most previous methods have assumed a fixed linear or log-linear relationship between latent variables and mean response levels [2, 3]. In many neuroscience problems, however, the relationship between the noise-free observation space and the quantity it encodes can be highly nonlinear. Gao et al. [4] have explored a nonlinear embedding function using deep neural networks (DNNs), which requires a large amount of data to train a large set of model parameters and cannot propagate uncertainty from the latent space to the observation space. In this paper, we employ a nonparametric Bayesian approach, the Gaussian process (GP), to model the nonlinear mapping function from latent space to observation space, which requires much less training data and propagates uncertainty with probabilistic distributions.
Model          Dynamics  Mapping function  Link function  Observation       Inference
PLDS [2]       LDS       Linear            exp            Poisson           LP
PfLDS [4]      LDS       NN                exp            Poisson           VI + inference network
GCLDS [3]      LDS       Linear            exp            Count             VI
LFADS [6]      RNN       Linear            exp            Poisson           VI + inference network
P-GPFA [11]    GP        Linear            Identity       Poisson           LP or VI
P-GPLVM [5]    GP        GP                exp            Poisson           LP
GP-RNN (ours)  RNN       GP                exp            Poisson/Gaussian  VI + inference network

Table 1: Latent state space models for neural spike trains. “PLDS”: Poisson linear dynamical systems [2]; “PfLDS”: Poisson feed-forward neural network linear dynamical systems [4]; “GCLDS”: generalized count linear dynamical systems [3]; “LFADS”: latent factor analysis via dynamical systems [6]; “P-GPFA”: Poisson Gaussian process factor analysis [11]; “P-GPLVM”: Poisson Gaussian process latent variable model [5]; and our method GP-RNN: Gaussian process recurrent neural networks. “LDS” denotes linear dynamical systems. “LP” and “VI” indicate Laplace approximation and variational inference, respectively.

Observation model
Neural responses can mostly be categorized into two types of signals: continuous voltage data and discrete spikes. For continuous neural responses, a Gaussian distribution is typically used as the generating distribution. For neural spike trains, a Poisson observation model is commonly considered to characterize stochastic, noisy neural spikes. In this work, we propose models and inference methods for both Gaussian and Poisson responses, with a focus on the Poisson observation model. Directly modeling Poisson responses with a non-conjugate prior leads to an intractable posterior, especially for complex generative models. In some previous methods, researchers have used a Gaussian approximation for Poisson spike counts through a variance stabilization transformation [12]. In our framework, we instead apply an effective optimization procedure for the Poisson model.

Inference method (⋆)
In our setting, due to the increased complexity of both the dynamical model and the mapping function, we need a more powerful inference method for recognizing latent states. Recent work has focused on utilizing variational inference for scalable computation, taking advantage of both stochastic and distributed optimization [13]. Additionally, inference networks improve computational efficiency while still keeping rich approximate posterior distributions. One choice of inference network for sequential data is the multi-layer perceptron (MLP) [14]. However, an MLP is insufficient to capture the increasing temporal complexity as the dynamics evolve. Recurrent neural networks (RNNs), e.g., long short-term memory (LSTM) and gated recurrent unit (GRU) structures, are well known to capture dynamical structure in sequential data. We utilize RNNs as inference networks for encoding both past and future time information into the posterior distribution of latent states. Specifically, we use two LSTMs to map past and future time points jointly into the mean and diagonal covariance functions of the approximating Gaussian distribution. We show empirically that, instead of considering only past time information as in other recent works [15, 16], using both past and future time information retrieves intrinsic latent structures more accurately.

Given the current limitations in the dynamical model, mapping function, and inference method, we propose a novel method using recurrent neural networks (RNNs) as the dynamical model, a Gaussian process (GP) for the nonlinear mapping function, and a bidirectional LSTM structure as the inference network. This combination provides a richly distributed internal state representation and flexible nonlinear transition functions due to the representational power of RNNs (e.g., LSTM or GRU structures). Moreover, it shows expressive power for discovering structured latent space via nonlinear embeddings with a Gaussian process, thanks to the GP's advantage in capturing uncertainty in a nonparametric Bayesian way. In addition, the bidirectional LSTM can further enhance inference capability because it summarizes the past, the future, or both at every time step, forming an effective approximation to the variational posterior of the latent dynamics. Our framework is evaluated on both simulated and real-world neural data with detailed ablation analysis. The promising performance demonstrates that our method is able to: (1) capture better and more insightful nonlinear, non-periodic dynamics from high-dimensional time series; (2) significantly improve prediction performance over baseline methods for noisy neuronal spiking activities; and (3) robustly and efficiently learn the tuning curves of underlying complex neural systems from neuronal recording datasets.
Table 1 summarizes the state-of-the-art methods for extracting latent state space from high-dimensional spike trains by varying the different model components discussed above. (We focus on exploring intrinsic latent structures from spike trains, and the related works mentioned here are, to our knowledge, the most relevant to this research line. Although some excellent works take advantage of both RNN structures and Gaussian processes for either modeling or inference [17, 18, 19, 20, 21], they are out of the scope of this work.) In a nutshell, our contributions are threefold compared to the listed methods:

We propose to capture nonlinear, non-Markovian, long short-term time-dependent dynamics by incorporating recurrent neural networks into the latent variable model. Different from the vanilla RNN, we achieve a stochastic RNN structure by introducing latent variables;

We incorporate a Gaussian process for learning nonlinear embedding functions, which achieves better reconstruction performance in low-sample scenarios and provides a posterior distribution with uncertainty instead of the point estimates of neural networks. Together with the RNN, this yields the GP-RNN model (Gaussian process recurrent neural network), which is capable of capturing better latent dynamics from complex high-dimensional neural population recordings;

We evaluate the efficacy of different inference networks based on LSTM structures for inference and learning, and demonstrate that utilizing the bidirectional LSTM as the inference network can significantly improve model learning.
2 Gaussian Process Recurrent Neural Network (GP-RNN)
Suppose we have simultaneously recorded spike count data from $N$ neurons. Let $x_{i,t}$ denote the spike count of neuron $i$ at time $t$, for $i = 1, \dots, N$ and $t = 1, \dots, T$. We aim to discover a low-dimensional, time-evolving latent trajectory $z_t \in \mathbb{R}^{L}$ ($L \ll N$, where $L$ is the latent dimensionality), which governs the evolution of the high-dimensional neural population at time $t$.
Recurrent structure latent dynamics: Let $z_{1:T} = (z_1, \dots, z_T)$ denote a (vector-valued) latent process, which evolves based on a recurrent structure (RNN) to capture the sequential dependence. At each time step $t$, the RNN reads the latent process at the previous time step, $z_{t-1}$, and updates its hidden state $h_t$ by

$h_t = f_{\theta}(z_{t-1}, h_{t-1}),$   (1)

where $f_{\theta}$ is a deterministic nonlinear transition function with parameter $\theta$; $f_{\theta}$ can be implemented via long short-term memory (LSTM) or gated recurrent unit (GRU) cells. Note that the latent process $z_{1:T}$ is modeled as random variables, while $h_t$ represents the hidden states of the RNN. We model the latent process by parameterizing a factorization of the joint sequence probability distribution as a product of conditional probabilities such that

$p(z_1, \dots, z_T) = \prod_{t=1}^{T} p(z_t \mid z_{<t}), \qquad p(z_t \mid z_{<t}) = p(z_t \mid g_{\phi}(h_t)),$   (2)

where $g_{\phi}$ is an arbitrary differentiable function parametrized by $\phi$. The function $g_{\phi}$ maps the RNN state $h_t$ to the parameters of the distribution of $z_t$, which is modeled using a feed-forward neural network with hidden layers as

$z_t \mid z_{<t} \sim \mathcal{N}\big(\mu_t, \operatorname{diag}(\sigma_t^2)\big),$   (3)
$[\mu_t, \log \sigma_t^2] = g_{\phi}(h_t).$   (4)
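As a concrete illustration of eqs. (1)-(4), the following NumPy sketch rolls the stochastic RNN prior forward with a GRU-style transition $f_{\theta}$ and a one-hidden-layer read-out $g_{\phi}$. The dimensions, random weight scales, and softplus variance link are illustrative assumptions, not the paper's trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: latent dimension L, RNN hidden dimension H.
L, H = 3, 16

# GRU-style parameters for f_theta (eq. 1) and a linear map g_phi to the
# Gaussian parameters of z_t (eqs. 3-4); all weights are random placeholders.
Wz, Uz = rng.normal(size=(H, L)), rng.normal(size=(H, H))
Wr, Ur = rng.normal(size=(H, L)), rng.normal(size=(H, H))
Wh, Uh = rng.normal(size=(H, L)), rng.normal(size=(H, H))
W_mu, W_sig = 0.1 * rng.normal(size=(L, H)), 0.1 * rng.normal(size=(L, H))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(z_prev, h_prev):
    """h_t = f_theta(z_{t-1}, h_{t-1}) with GRU gating."""
    u = sigmoid(Wz @ z_prev + Uz @ h_prev)            # update gate
    r = sigmoid(Wr @ z_prev + Ur @ h_prev)            # reset gate
    h_tilde = np.tanh(Wh @ z_prev + Uh @ (r * h_prev))
    return (1 - u) * h_prev + u * h_tilde

def latent_prior_step(h_t):
    """g_phi maps h_t to the mean and diagonal std of p(z_t | z_{<t})."""
    mu = W_mu @ h_t
    sigma = np.log1p(np.exp(W_sig @ h_t))             # softplus keeps sigma > 0
    return mu, sigma

# Roll the stochastic RNN prior forward for T steps.
T = 50
z, h = np.zeros(L), np.zeros(H)
trajectory = []
for _ in range(T):
    h = gru_step(z, h)
    mu, sigma = latent_prior_step(h)
    z = mu + sigma * rng.normal(size=L)               # sample z_t given z_{<t}
    trajectory.append(z)
trajectory = np.stack(trajectory)                     # shape (T, L)
```

The key point is that the RNN is deterministic while each $z_t$ is sampled, so stochasticity enters the dynamics at every step through the prior rather than through the hidden state itself.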
Nonlinear mapping function: Let $f_i : \mathbb{R}^{L} \to \mathbb{R}$ denote a nonlinear function mapping from the latent variable $z_t$ to the $i$-th element of the observation vector. $f_i$ is usually referred to as the neuronal tuning curve, characterizing the firing rate of the neuron as a function of the relevant stimulus in neural analysis. We provide a nonparametric Bayesian approach using a Gaussian process (GP) as the prior for the mapping function $f_i$. Noticing that $f_i$ is a time-invariant function, we can omit the notation for the time step and describe the GP prior as

$f_i(z) \sim \mathcal{GP}\big(0, k(z, z')\big),$   (5)
$k(z, z') = \rho \exp\left(-\frac{\|z - z'\|^2}{2\ell^2}\right),$   (6)

where $k(\cdot, \cdot)$ is a spatial covariance function over its $L$-dimensional input latent space. Note that the input $z$ is a random variable with uncertainty (eq. (2)). Given that the neuronal tuning curve is usually assumed to be smooth, we use the common radial basis function (RBF) or smooth covariance function as in eq. (6), where $z$ and $z'$ are arbitrary points in latent space, $\rho$ is the marginal variance, and $\ell$ is the length scale. We stack $f_i(z_t)$ across $T$ time steps to obtain $\mathbf{f}_i = \big(f_i(z_1), \dots, f_i(z_T)\big)$. According to the definition of a Gaussian process, $\mathbf{f}_i$ forms a multivariate normal distribution given the latent vectors at all time steps, as

$\mathbf{f}_i \mid z_{1:T} \sim \mathcal{N}(0, K_z),$   (7)

with covariance matrix $K_z$ generated by evaluating the covariance function at all pairs of latent vectors in $z_{1:T}$. Finally, by stacking $\mathbf{f}_i$ for $N$ neurons, we form a matrix $F$ with $\mathbf{f}_i$ on the $i$-th row.
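A minimal sketch of the RBF covariance in eq. (6) and a draw from the GP prior of eq. (7); the random-walk stand-in for the latent path and the small jitter term added for numerical stability are assumptions of this sketch, not parts of the model.

```python
import numpy as np

def rbf_kernel(Z1, Z2, rho=1.0, ell=1.0):
    """k(z, z') = rho * exp(-||z - z'||^2 / (2 ell^2))  (eq. 6)."""
    d2 = ((Z1[:, None, :] - Z2[None, :, :]) ** 2).sum(-1)
    return rho * np.exp(-0.5 * d2 / ell**2)

rng = np.random.default_rng(1)
T, L = 100, 3
# Stand-in latent trajectory (a smooth random walk) in place of the RNN prior.
Z = np.cumsum(rng.normal(scale=0.1, size=(T, L)), axis=0)

K = rbf_kernel(Z, Z) + 1e-6 * np.eye(T)   # jitter keeps the Cholesky stable
# One draw of f_i = (f_i(z_1), ..., f_i(z_T)) ~ N(0, K)  (eq. 7)
f = np.linalg.cholesky(K) @ rng.normal(size=T)
```

Nearby latent states get strongly correlated function values, which is exactly the smoothness assumption placed on the tuning curve.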
Observation model: Real-world time series data is often categorized into real-valued data and count-valued data. For real-valued data, the observation model is usually a Gaussian distribution given the firing rate $f_i(z_t)$ and some additive noise $\epsilon \sim \mathcal{N}(0, \sigma_{\epsilon}^2)$. Marginalizing out $\epsilon$, we obtain the observation model as

$x_{i,t} \mid z_t \sim \mathcal{N}\big(f_i(z_t), \sigma_{\epsilon}^2\big).$   (8)

However, a Gaussian observation model is infeasible in the count-valued setting. Considering neural spike trains, we assume that the spike rate is $\exp(f_i(z_t))$ (a non-negative value), and the spike count of neuron $i$ at time $t$ is generated as

$x_{i,t} \mid z_t \sim \operatorname{Poisson}\big(\exp(f_i(z_t))\big).$   (9)
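Both observation models in eqs. (8)-(9) reduce to a one-line sampling step once the tuning-curve values are in hand; a sketch with arbitrary stand-in values for $f_i(z_t)$ and an assumed noise level:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 100
f = rng.normal(size=T)          # stand-in tuning-curve values f_i(z_t)

# Gaussian response: additive observation noise (eq. 8), noise std assumed 0.1.
x_gauss = f + 0.1 * rng.normal(size=T)

# Poisson response: the exp link keeps the rate non-negative (eq. 9).
rate = np.exp(f)
x_poisson = rng.poisson(rate)
```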
In summary, our model uses an RNN structure to capture the nonlinearity and long short-term temporal dependence of the latent dynamics, while keeping the flexibility of a nonparametric Bayesian method (GP) for learning nonlinear mapping functions. Finally, we generate Gaussian observations with additive Gaussian noise given the firing rates, or propagate the firing rates through an exponential link function to generate Poisson observations. The graphical model is shown in Fig. 1. Note that the RNN structure is not applied directly to the latent process $z_{1:T}$; it acts on $z$'s prior via a neural network mapping (shown in eqs. (1) and (2)), which is completely different from a simple RNN over latent states as in existing works, e.g., LFADS. This modeling strategy, similar to [15], establishes stochastic RNN dynamics, which gives a strong and flexible prior over the latent process. $z_t$ is propagated with well-calibrated uncertainty via the Gaussian process to the firing-rate function $f$. The observation $x$ is generated from $f$ with Gaussian or Poisson noise depending on the application.
3 Inference for GP-RNN
Gaussian response: When the observation is Gaussian, the tuning curve $f$ in eq. (8) can be marginalized out due to conjugacy. A variational Bayes expectation-maximization (VBEM) algorithm is adopted for estimating the latent states (E-step) and parameters (M-step). In the E-step, we need to characterize the full posterior distribution over the latents, which is intractable. We employ a Gaussian distribution as the variational approximate distribution. Denoting $z = z_{1:T}$ and $x = x_{1:T}$, we approximate $p(z \mid x)$ with $q_{\psi}(z \mid x) = \mathcal{N}\big(\mu_{\psi}(x), \Sigma_{\psi}(x)\big)$, whose mean and variance are the outputs of a highly nonlinear function of the observation $x$, where $\psi$ encodes the function parameters. We identify the optimal $q_{\psi}$ by maximizing a variational Bayesian lower bound (also called the ELBO) as

$\mathcal{L}(\psi) = \mathbb{E}_{q_{\psi}(z \mid x)}[\log p(x, z)] - \mathbb{E}_{q_{\psi}(z \mid x)}[\log q_{\psi}(z \mid x)].$   (10)
The first term in eq. (10) represents an energy, encouraging $q_{\psi}$ to place its probability mass where the joint probability $p(x, z)$ is high. The second term (including the minus sign) represents the entropy of $q_{\psi}$, encouraging it to spread its probability mass and thus avoid concentrating on one point estimate. The entropy term in eq. (10) has a closed-form expression for a Gaussian:

$\mathbb{E}_{q_{\psi}}[-\log q_{\psi}(z \mid x)] = \tfrac{1}{2} \log \det(2\pi e \, \Sigma_{\psi}).$   (11)
The gradients of eq. (10) with respect to $\psi$ can be evaluated by sampling directly from $q_{\psi}$, for example, using Monte Carlo integration to obtain noisy estimates of both the ELBO and its gradient [22, 23]. The score function estimator achieves this by leveraging a property of logarithms to write the gradient as

$\nabla_{\psi} \mathcal{L} = \mathbb{E}_{q_{\psi}(z \mid x)}\big[\nabla_{\psi} \log q_{\psi}(z \mid x)\,\big(\log p(x, z) - \log q_{\psi}(z \mid x)\big)\big],$   (12)
Inference network           Vanilla MF   VAE                 r-LSTM                  l-LSTM                  bi-LSTM
Variational approximation   $q(z_t)$     $q(z_t \mid x_t)$   $q(z_t \mid x_{t:T})$   $q(z_t \mid x_{1:t})$   $q(z_t \mid x_{1:T})$

Table 2: Structured variational approximations corresponding to different inference networks.
which first draws samples from $q_{\psi}$ and then evaluates the empirical expectation over those samples. In general, the approximate gradient from the score function estimator exhibits high variance [22], so in practice we compute the integral with the “reparameterization trick” proposed by [24]. We can parameterize the multivariate normal $z \sim \mathcal{N}(\mu_{\psi}, \Sigma_{\psi})$ as

$z = \mu_{\psi} + R_{\psi} \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I), \qquad \Sigma_{\psi} = R_{\psi} R_{\psi}^{\top};$   (13)

therefore $z$ is distributed as a multivariate normal with mean $\mu_{\psi}$ and covariance $\Sigma_{\psi}$. We finally separate the gradient estimation as

$\nabla_{\psi} \mathcal{L} = \mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)}\big[\nabla_{\psi} \log p(x, \mu_{\psi} + R_{\psi}\epsilon)\big] + \nabla_{\psi} H[q_{\psi}],$   (14)

where $H[q_{\psi}]$ is the entropy of the variational distribution. Both gradients can now be approximated with Monte Carlo estimates.
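To see the reparameterized gradient of eqs. (13)-(14) in action, the sketch below fits a Gaussian $q$ to a deliberately tiny conjugate toy model (assumed purely for illustration; it is not the GP-RNN model), using pathwise Monte Carlo gradients for the energy term and the analytic entropy gradient:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy model: p(z) = N(0, 1), p(x | z) = N(z, 1); variational q(z) = N(mu, sig^2).
# The exact posterior is N(x/2, 1/2), so we can check convergence.
x = 1.4
mu, log_sig = 0.0, 0.0

for _ in range(2000):
    sig = np.exp(log_sig)
    eps = rng.normal(size=64)
    z = mu + sig * eps                          # reparameterization (eq. 13)
    dlogp_dz = -z + (x - z)                     # d/dz log p(x, z)
    dmu = np.mean(dlogp_dz)                     # pathwise gradient wrt mu
    dls = np.mean(dlogp_dz * sig * eps) + 1.0   # + analytic entropy gradient
    mu += 1e-2 * dmu
    log_sig += 1e-2 * dls
```

After the loop, `mu` approaches 0.7 and the variance `exp(2 * log_sig)` approaches 0.5, matching the exact posterior; the same pattern, with the log-joint of the full model, underlies eq. (14).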
On the choice of the optimal variational distribution: In eq. (10), we consider the approximate posterior to be a Gaussian whose mean and variance are the outputs of a highly nonlinear function of the observation $x$. Here, we consider five structured distributions that encode $x$ in different sequential patterns, shown in Table 2: (1) vanilla mean field (MF); (2) variational autoencoder (VAE); (3) LSTM conditioned on past observations (l-LSTM); (4) LSTM conditioned on future observations (r-LSTM); and (5) bidirectional LSTM (bi-LSTM) conditioned on both past and future observations. For the l-LSTM and r-LSTM, “l” and “r” abbreviate “left” and “right”, i.e., considering past or future information, respectively. We parametrize the mean and variance of the variational approximate posterior at time step $t$ as a function of the corresponding LSTM hidden state $h_t$. We illustrate the l-/r-/bi-LSTM structures of the inference networks in Fig. 2: the inference network maps the observation $x$ to the variational parameters of the approximate posterior via LSTM-based structures.
The inference network maps observations to the mean and covariance functions of the approximated Gaussian latent states. The parameterization for the r-LSTM can be written as

$[\mu_t^{r}, \log \sigma_t^{2,r}] = W^{r} \overleftarrow{h}_t + b^{r},$   (15)

where $\overleftarrow{h}_t$ is the hidden state of an LSTM run backward over the future observations $x_{t:T}$; the l-LSTM is parameterized similarly from a forward LSTM over $x_{1:t}$. Here $W^{r}$ and $b^{r}$ are the weights and bias mapping to the variational parameters. In the bi-LSTM, we use a weighted mean and variance to parameterize the variational posterior as

$\sigma_t^{2} = \big(1/\sigma_t^{2,l} + 1/\sigma_t^{2,r}\big)^{-1}, \qquad \mu_t = \sigma_t^{2}\big(\mu_t^{l}/\sigma_t^{2,l} + \mu_t^{r}/\sigma_t^{2,r}\big).$   (16)

All operations are performed element-wise on the corresponding vectors. The Gaussian approximate posterior thus summarizes both past and future information from the observations.
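The weighted combination in eq. (16) can be read as a product of two Gaussian experts; below is a sketch of one element-wise, precision-weighted implementation. The exact weighting scheme is a modeling choice, and the per-time-step LSTM outputs here are hypothetical values, not outputs of a trained network.

```python
import numpy as np

def combine_gaussians(mu_l, var_l, mu_r, var_r):
    """Precision-weighted combination of the l-LSTM and r-LSTM Gaussians.

    All operations are element-wise over the latent dimensions: the side
    with the smaller variance (higher precision) dominates the mean.
    """
    var = 1.0 / (1.0 / var_l + 1.0 / var_r)
    mu = var * (mu_l / var_l + mu_r / var_r)
    return mu, var

# Hypothetical per-time-step outputs of the two inference LSTMs:
mu_l, var_l = np.array([0.0, 1.0]), np.array([1.0, 4.0])
mu_r, var_r = np.array([2.0, 1.0]), np.array([1.0, 1.0])
mu, var = combine_gaussians(mu_l, var_l, mu_r, var_r)
# Equal variances average the means; unequal variances trust the sharper side.
```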
Algorithm 1 summarizes the inference method for the Gaussian observation model based on variational inference.
Poisson response: When the observation model is Poisson, the integration over the mapping function $f$ in eq. (9) is intractable due to the non-conjugacy between its GP prior and the Poisson data distribution. The interplay between $z$ and $f$ involves a highly nonlinear transformation, which makes inference difficult. Inducing points [25] and the decoupled Laplace approximation [5] have recently been introduced to relax this dependence and make inference tractable. In this paper, we adopt a straightforward maximum a posteriori (MAP) estimation for training both $z$ and $F$, as

$z^{\mathrm{MAP}}, F^{\mathrm{MAP}} = \arg\max_{z, F}\; p(z, F, x),$   (17)

where the joint distribution of latent variables and observations on the RHS of eq. (17) is

$p(z, F, x) = p(x \mid F)\, p(F \mid z)\, p(z).$   (18)

Eq. (18) is a joint probability with three main components: (1) Poisson spiking ($p(x \mid F)$, observation model); (2) Gaussian process ($p(F \mid z)$, nonlinear embedding); and (3) recurrent neural network ($p(z)$, dynamical model). During the training procedure, we adopt compositional inference [26], fixing $z$ or $F$ while optimizing the other in a coordinate-ascent manner. More details and the pseudo-algorithm for the inference of GP-RNN-Poisson can be found in the supplementary.
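A minimal sketch of one coordinate-ascent half-step, the MAP update of the tuning curve with the latents held fixed, maximizing the Poisson log-likelihood plus the GP prior. The whitened parameterization f = K a (which avoids forming the inverse of K) and the step size are implementation conveniences assumed here, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 60
x = rng.poisson(1.0, size=T)                 # spike counts for one neuron

# Latents held fixed for this illustration; GP-RNN alternates updates of z and F.
Z = np.linspace(-2.0, 2.0, T)[:, None]
K = np.exp(-0.5 * (Z - Z.T) ** 2) + 1e-6 * np.eye(T)   # RBF Gram matrix

def objective(a):
    """log p(x | f) + log p(f | z) up to constants, with f = K a."""
    f = K @ a
    return x @ f - np.exp(f).sum() - 0.5 * a @ K @ a

# Gradient ascent on a; the GP prior term never needs K^{-1} in this form.
a = np.zeros(T)
for _ in range(3000):
    f = K @ a
    a += 1e-3 * (K @ (x - np.exp(f)) - K @ a)   # chain rule through f = K a
f_map = K @ a
```

The gradient combines the Poisson score (x minus the rate) with the GP prior pull toward smooth functions; in the full method this alternates with an update of the latents.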
4 Experiments
To demonstrate the superiority of GP-RNN in latent dynamics recovery, we compare it against other state-of-the-art methods on both extensive simulated data and a real visual cortex neural dataset.
4.1 Recovery of Lorenz Dynamics
First we recover the well-known Lorenz dynamics in a nonlinear system. The Lorenz system describes a two-dimensional flow of fluids with $(x, y, z)$ as latent states:

$\frac{dx}{dt} = \sigma(y - x), \qquad \frac{dy}{dt} = x(\rho - z) - y, \qquad \frac{dz}{dt} = xy - \beta z.$   (19)

This system has chaotic solutions (for certain parameter values) that revolve around the so-called Lorenz attractor. We use the classic values $\sigma = 10$, $\rho = 28$, and $\beta = 8/3$, exhibiting chaotic behavior, which generates a nonlinear, non-periodic, three-dimensional complex system. It has been utilized for testing latent structure discovery in recent works [27, 28, 6].
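The Lorenz latents used in this section can be simulated with a simple forward-Euler integrator; the step size, horizon, and initial state below are illustrative choices.

```python
import numpy as np

def simulate_lorenz(T, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0,
                    z0=(1.0, 1.0, 1.0)):
    """Forward-Euler integration of the Lorenz system in eq. (19)."""
    Z = np.empty((T, 3))
    Z[0] = z0
    for t in range(1, T):
        x, y, z = Z[t - 1]
        Z[t] = Z[t - 1] + dt * np.array([
            sigma * (y - x),        # dx/dt
            x * (rho - z) - y,      # dy/dt
            x * y - beta * z,       # dz/dt
        ])
    return Z

latents = simulate_lorenz(500)      # (T, 3) ground-truth latent trajectory
```

These three coordinates play the role of the ground-truth latent trajectory that each model must recover from the high-dimensional observations.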
Gaussian   AR1-GPLVM                                  GP-RNN
           MF     VAE    r-LSTM  l-LSTM  bi-LSTM     MF     VAE    r-LSTM  l-LSTM  bi-LSTM
linear     4.12   4.10   4.01    3.27    1.64        2.17   2.17   1.98    1.54    0.96
tanh       3.20   3.22   3.01    2.46    1.17        2.01   2.01   1.83    1.41    0.78
sine       3.12   3.12   2.74    2.33    1.02        1.81   1.78   1.34    1.12    0.56

Table 3: Latent-trajectory prediction performance (lower is better) on Gaussian response data, evaluated on held-out time points. Underlined and bold fonts indicate best performance. Results with standard errors (ste) can be found in the supplementary.
Poisson    AR1-GPLVM                                  GP-RNN
           MF     VAE    r-LSTM  l-LSTM  bi-LSTM     MF     VAE    r-LSTM  l-LSTM  bi-LSTM
linear     6.34   6.34   6.02    5.71    3.67        6.01   6.01   5.94    5.71    3.10
tanh       3.22   3.21   3.01    2.84    1.57        3.09   3.11   2.98    2.54    1.21
sine       2.80   2.79   2.77    2.51    1.49        2.67   2.67   2.43    2.33    1.14

Table 4: Latent-trajectory prediction performance (lower is better) on Poisson response data; layout as in Table 3.
We simulate a three-dimensional latent dynamic from the Lorenz system as in eq. (19), and then apply three different mapping functions: linear, $f(z) = Wz + b$; tanh, $f(z) = \tanh(Wz + b)$; and sine, $f(z) = \sin(Wz + b)$. (The oscillatory response of the sine wave is well known as a property of grid cells [4].) Thus, we generate simulated data with nonlinear dynamics and linear/nonlinear mapping functions. The Gaussian response is a Gaussian-noise-corrupted version of $f(z)$; Poisson spike trains are generated from a Poisson distribution with $\exp(f(z))$ as the spike rate. In our simulation, the latent dimension is three (the Lorenz states). We randomly generate the weights $W$ and biases $b$ uniformly from [0, 1.0], and the additive noise is drawn from a zero-mean Gaussian. We test the ability of each method to infer the latent dynamics of the Lorenz system (i.e., the values of the three dynamic variables) from Gaussian and Poisson responses, respectively. Models are compared in three aspects: inference network, dynamical model, and mapping function.
Analysis of inference network and dynamical model: Tables 3 and 4 show the performance of the variational approximation techniques applied to both P-GPLVM with an AR1 kernel (AR1-GPLVM) and GP-RNN, on Gaussian and Poisson response data respectively. P-GPLVM with an AR1 kernel is mathematically equivalent to a GPLVM with LDS dynamics when the linear mapping matrix of the LDS is full-rank; we are therefore essentially comparing LDS against RNN dynamics. In general, GP-RNN outperforms AR1-GPLVM by capturing the complex dynamics of nonlinear systems with powerful RNN representations.

bi-LSTM inference networks yield the best results because they consider both past and future information. Meanwhile, the l-LSTM demonstrates the importance of past dependence, giving better results than the r-LSTM. Overall, LSTM-style inference networks produce more promising results than models considering only current observations (e.g., MF and VAE).

Moreover, the VAE inference network is not a much more expressive variational model for approximating the posterior than vanilla mean field. With only current time points, both have similar inference power (see the “MF” and “VAE” columns of Tables 3 and 4). A VAE has only global neural network parameters mapping data points to posteriors, while vanilla MF has local parameters for each data point. The VAE can be scaled to large-scale datasets, but its performance is upper-bounded by vanilla MF [26].
Analysis of mapping function:

# Data   linear        tanh          sine
         GP     NN     GP     NN     GP     NN
N = 50   2.51   3.88   1.45   2.75   1.97   3.43
N = 100  1.27   1.65   1.15   1.45   1.03   1.31
N = 200  0.96   1.29   0.78   1.22   0.56   0.70
N = 500  0.34   0.35   0.26   0.26   0.12   0.12

Table 5: Prediction performance of GP versus NN mapping functions (RNN dynamics) for varying numbers of training time points.
Dimension  PLDS   GCLDS  PfLDS  P-GPFA  P-GPLVM  GP-RNN
1          0.641  0.435  0.698  0.733   0.784    0.869
2          0.547  0.364  0.659  0.720   0.785    0.873
3          0.903  0.755  0.797  0.960   0.966    0.971

Table 6: R² between the linearly transformed estimation of each model and the true Lorenz dynamics, per latent dimension. Results with standard errors (ste) can be found in the supplementary.
Table 5 shows the comparison between a neural network and a Gaussian process as the nonlinear mapping function. The dynamical model is an RNN, and the true mapping functions include linear, tanh, and sine functions. The numbers of time points for training are N = 50, 100, 200, and 500; the time points following the training segment are used for testing the accuracy of the reconstructed latent trajectories. From Table 5 we can see that a Gaussian process provides a superior mapping function when the training dataset is small (columns “GP” versus “NN”). With more time points, the prediction performance of a neural network mapping becomes comparable with a Gaussian process (rows N = 200 and N = 500). Bigger datasets help to learn the complex Lorenz dynamics and, at the same time, prevent overfitting in neural network models. Smaller datasets may hurt latent dynamics recovery, but a Gaussian process mapping enhances nonlinear embedding recovery by keeping local constraints.
Comparison with state-of-the-art methods:
Consistent with the results reported for state-of-the-art methods, we compare the R² of latent trajectory reconstruction of our GP-RNN against others, as shown in Table 6. The inference network of our model is the bi-LSTM, since the simulated results above demonstrate its stronger model-fitting power. Note that we use the Poisson model and compare it with recently developed models for analyzing spike trains. For each dimension of the Lorenz dynamics, GP-RNN significantly outperforms the baseline methods, e.g., increments of 0.085, 0.088, and 0.005 in R² over the second-best model P-GPLVM. We have also found several excellent works combining RNN structures with Gaussian processes for either modeling or inference [17, 19, 20, 21], but note that they do not follow the research line of exploring latent intrinsic structures of high-dimensional real- or count-valued data as stated in our work. The methods compared in our paper (e.g., PLDS [2], GCLDS [3], PfLDS [4], P-GPFA [11], and P-GPLVM [5]) are, to our knowledge, the recently proposed methods that address the same problem and are the most worthwhile comparisons.
4.2 Application to Macaque V1 Neural Data
We apply GP-RNN to neurons recorded from the primary visual cortex of a macaque [29]. The data were obtained while the monkey watched drifting sinusoidal gratings at five orientations, with repeated trials for each orientation. Following [4], we consider well-behaved neurons selected by their tuning curves, and bin the spiking activity into spike counts with a fixed window size, resulting in the same number of time points for each trial.
We take one orientation as an example for visualizing the two-dimensional (2D) latent trajectory estimation; the other orientations exhibit similar patterns. The true firing rates of two example neurons are presented in Fig. 3(A), which exhibit clear periodic patterns locked to the phase of the sinusoidal grating stimulus. To estimate the latent dynamics, we fit our model with randomly selected repeated trials, which are used to learn the RNN dynamics parameters and GP hyperparameters shared across all trials, as well as the trial-dependent latent dynamics. We also apply PfLDS and P-GPLVM to the same data. For better visualization, Fig. 3(B) shows the best trials, selected as the 10 trials with the smallest variances from the mean trajectory within each model. PfLDS performs worse than the other two methods; different from the result shown in [4], we report results with fewer training trials. Benefiting from nonparametric Bayes (the Gaussian process), in such a small-data scenario GP-RNN extracts much clearer, more compact, and more structured latent trajectories, which well capture the oscillatory patterns in neural responses to the grating stimulus. Meanwhile, the proposed model conveys the interpretable sinusoidal stimulus patterns as 2D rings without including the external stimulus as a model variable. Therefore, GP-RNN with nonlinear dynamics and a nonlinear embedding function helps extract latent patterns more efficiently. Although P-GPLVM also achieves promising results compared with PfLDS (still worse than our GP-RNN), P-GPLVM needs much more effort than GP-RNN to fine-tune the optimization hyperparameters.

We next show the quantitative prediction performance of multiple methods. The evaluation procedure is known as “co-smoothing” [27, 5], a standard leave-one-neuron-out test. We select all the trials from four orientations of the drifting sinusoidal gratings and split them into training sets (40 trials) and test sets (10 trials). The model-specific parameters, e.g., the RNN dynamics and GP mapping function for GP-RNN, are estimated using the training sets (all neurons). We then fix the estimated model parameters, leave one neuron of the test trials out, and infer the latent trajectories based on the remaining neurons. The left-out neuron's spiking activity is then predicted given the inferred latents of the test trials and the parameters estimated from the training trials. Consistent with the results reported in the previous literature, the prediction is quantified by R², measuring the predicted firing rates against the empirical firing rates of the left-out neurons. We iterate over all neurons as the left-out one and average the prediction R² values for each model, shown in Table 7. On this neural dataset, each recently proposed method can only increase the R² value by a small amount, which is still nontrivial to achieve; GP-RNN's improvement over P-GPLVM is roughly double the increment from PfLDS to P-GPLVM.
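The co-smoothing metric reduces to a leave-one-neuron-out loop around a standard R² score; in the sketch below, `infer_latents` and `predict_rate` are hypothetical stand-ins for the model-specific inference and prediction steps, not functions from any library.

```python
import numpy as np

def r2_score(rate_true, rate_pred):
    """Coefficient of determination between empirical and predicted rates."""
    ss_res = np.sum((rate_true - rate_pred) ** 2)
    ss_tot = np.sum((rate_true - np.mean(rate_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def co_smoothing(counts, infer_latents, predict_rate):
    """counts: (N, T) test-trial activity; returns mean leave-one-out R^2."""
    N = counts.shape[0]
    scores = []
    for i in range(N):
        rest = np.delete(counts, i, axis=0)   # hold neuron i out
        z = infer_latents(rest)               # latents from remaining neurons
        scores.append(r2_score(counts[i], predict_rate(i, z)))
    return float(np.mean(scores))
```

Model parameters are fixed throughout the loop; only the latents are re-inferred for each held-out neuron, so the score reflects how well the shared latent structure explains unseen neurons.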
Dim   PLDS   P-GPFA  LFADS  PfLDS  P-GPLVM  GP-RNN
2     0.68   0.69    0.73   0.73   0.74     0.77
4     0.69   0.72    0.74   0.73   0.75     0.78
6     0.72   0.73    0.74   0.74   0.77     0.80
8     0.74   0.74    0.75   0.75   0.77     0.80
10    0.75   0.74    0.77   0.76   0.77     0.81

Table 7: Leave-one-neuron-out prediction R² on the macaque V1 dataset for varying latent dimensionality.
P-GPLVM and PfLDS have comparable results, and we think they benefit from their nonlinear mapping functions, i.e., a Gaussian process and a feed-forward neural network, respectively. PLDS and P-GPFA use linear mappings that cannot capture nonlinear embeddings, and require higher latent dimensionality to achieve results similar to P-GPLVM and PfLDS. Our GP-RNN with RNN dynamics and GP mapping provides the most competitive prediction accuracy, owing to its nonlinear dynamical model encoding time dependence and its complex nonlinear embedding function with uncertainty propagation.
4.3 Implementation Notes
We encountered the risk of overparameterization in our experiments: once the algorithm breaks down, further increasing the number of hidden nodes of the RNN does not improve the results much. We successfully avoid this by (1) using cross-validation to choose the number of hidden states, since too many hidden states lead to failure in learning both the hidden states and cell states, while too few hidden states give much lower prediction performance; (2) adopting Dropout(0.3)/L2 regularization for the RNN gates; (3) applying orthogonal initialization for the RNN gates and gradient clipping during training; and (4) instead of marginalizing out the latent function in the Poisson model, adopting the compositional inference strategy and using GPFA to initialize the latents. The experiments benefit from the probabilistic modeling library “Edward” [26].
The learning process is robust when applying orthogonal initialization for the RNN gates, Xavier initialization for the parameters of the fully connected layers (mapping hidden states $h_t$ to latent states $z_t$), and gradient clipping during training. This combination is a relatively effective way of eliminating exploding and vanishing gradients, and provides a stable learning process.
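The two tricks above can be sketched in a few lines; the weight shape and clipping threshold are illustrative choices, not the values used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(5)

def orthogonal_init(shape):
    """Orthogonal initialization for a recurrent weight matrix."""
    a = rng.normal(size=shape)
    q, r = np.linalg.qr(a)
    return q * np.sign(np.diag(r))     # fix the sign ambiguity of QR

def clip_by_global_norm(grads, max_norm=5.0):
    """Rescale all gradients jointly if their global L2 norm exceeds max_norm."""
    total = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (total + 1e-12))
    return [g * scale for g in grads]

U = orthogonal_init((64, 64))
# U.T @ U = I, so repeated multiplication by U neither explodes nor vanishes,
# which is why orthogonal recurrent weights stabilize early training.
```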
Concerning sample perturbations: in the simulations, we randomly regenerated the observations (with both Poisson and Gaussian noise) and the parameters of the mapping functions (Gaussian noise) multiple times; with the real neural data, we shuffled the training/test splits multiple times. Learning was carried out under these sample perturbations (trial variants) and the initialization strategies mentioned above. The analysis of the sample perturbations is listed in the supplementary materials with standard errors.
5 Conclusion
To discover insightful latent structure from neural data, we propose the unsupervised Gaussian process recurrent neural network (GP-RNN), utilizing the representational power of recurrent neural networks and the flexible nonlinear mapping function of the Gaussian process. We show that GP-RNN is superior at recovering more structured latent trajectories and has better quantitative performance than other state-of-the-art methods. Besides the visual cortex dataset tested in this paper, the proposed model can potentially be applied to analyzing the neural dynamics of the primary motor cortex, prefrontal cortex (PFC), or posterior parietal cortex (PPC), which play significant roles in cognition (evidence integration, short-term memory, spatial reasoning, etc.). The model can also be applied to other domains, e.g., finance or healthcare, for extracting low-dimensional underlying latent states from complicated time series. Our code and additional materials are available at https://github.com/sheqi/GPRNN_UAI2019.
References
 [1] M Yu Byron, John P Cunningham, Gopal Santhanam, Stephen I Ryu, Krishna V Shenoy, and Maneesh Sahani. Gaussianprocess factor analysis for lowdimensional singletrial analysis of neural population activity. In Advances in Neural Information Processing Systems (NeurIPS), pages 1881–1888, 2009.
 [2] Jakob H Macke, Lars Buesing, John P Cunningham, M Yu Byron, Krishna V Shenoy, and Maneesh Sahani. Empirical models of spiking in neural populations. In Advances in Neural Information Processing Systems (NeurIPS), pages 1350–1358, 2011.
 [3] Yuanjun Gao, Lars Buesing, Krishna V Shenoy, and John P Cunningham. High-dimensional neural spike train analysis with generalized count linear dynamical systems. In Advances in Neural Information Processing Systems (NeurIPS), pages 2044–2052, 2015.
 [4] Yuanjun Gao, Evan W Archer, Liam Paninski, and John P Cunningham. Linear dynamical neural population models through nonlinear embeddings. In Advances in Neural Information Processing Systems (NeurIPS), pages 163–171, 2016.
 [5] Anqi Wu, Nicholas G Roy, Stephen Keeley, and Jonathan W Pillow. Gaussian process based nonlinear latent structure discovery in multivariate spike train data. In Advances in Neural Information Processing Systems (NeurIPS), pages 3499–3508, 2017.
 [6] Chethan Pandarinath, Daniel J O'Shea, Jasmine Collins, Rafal Jozefowicz, Sergey D Stavisky, Jonathan C Kao, Eric M Trautmann, Matthew T Kaufman, Stephen I Ryu, Leigh R Hochberg, et al. Inferring single-trial neural population dynamics using sequential auto-encoders. Nature Methods, 2018.
 [7] Qi She, Yuan Gao, Kai Xu, and Rosa HM Chan. Reduced-rank linear dynamical systems. In Thirty-Second AAAI Conference on Artificial Intelligence (AAAI), 2018.
 [8] Rahul G Krishnan, Uri Shalit, and David Sontag. Structured inference networks for nonlinear state space models. In Thirty-First AAAI Conference on Artificial Intelligence (AAAI), pages 2101–2109, 2017.
 [9] Qi She and Rosa HM Chan. Stochastic dynamical systems based latent structure discovery in highdimensional time series. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 886–890. IEEE, 2018.
 [10] Neil D Lawrence. Gaussian process latent variable models for visualisation of high dimensional data. In Advances in Neural Information Processing Systems (NeurIPS), pages 329–336, 2004.
 [11] Hooram Nam. Poisson extension of gaussian process factor analysis for modeling spiking neural populations. Master's thesis, Department of Neural Computation and Behaviour, Max Planck Institute for Biological Cybernetics, Tübingen, 2015.
 [12] Guan Yu. Variance stabilizing transformations of Poisson, binomial and negative binomial distributions. Statistics & Probability Letters, 79(14):1621–1629, 2009.
 [13] Matthew D Hoffman, David M Blei, Chong Wang, and John Paisley. Stochastic variational inference. The Journal of Machine Learning Research (JMLR), 14(1):1303–1347, 2013.
 [14] Christopher M Bishop. Pattern recognition and machine learning. Springer-Verlag New York, 2016.
 [15] Junyoung Chung, Kyle Kastner, Laurent Dinh, Kratarth Goel, Aaron C Courville, and Yoshua Bengio. A recurrent latent variable model for sequential data. In Advances in Neural Information Processing Systems (NeurIPS), pages 2980–2988, 2015.
 [16] Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, and Daan Wierstra. Draw: A recurrent neural network for image generation. arXiv preprint arXiv:1502.04623, 2015.
 [17] Roger Frigola, Yutian Chen, and Carl Edward Rasmussen. Variational gaussian process state-space models. In Advances in Neural Information Processing Systems (NeurIPS), pages 3680–3688, 2014.
 [18] Trung V Nguyen, Edwin V Bonilla, et al. Collaborative multi-output gaussian processes. In Annual Conference on Uncertainty in Artificial Intelligence (UAI), pages 643–652, 2014.
 [19] César Lincoln C Mattos, Zhenwen Dai, Andreas Damianou, Jeremy Forth, Guilherme A Barreto, and Neil D Lawrence. Recurrent gaussian processes. arXiv preprint arXiv:1511.06644, 2015.
 [20] Andreas Svensson, Arno Solin, Simo Särkkä, and Thomas Schön. Computationally efficient bayesian learning of gaussian process state space models. In Artificial Intelligence and Statistics (AISTATS), 2016 International Conference on, pages 213–221, 2016.
 [21] Stefanos Eleftheriadis, Tom Nicholson, Marc Deisenroth, and James Hensman. Identification of gaussian process state space models. In Advances in Neural Information Processing Systems (NeurIPS), pages 5309–5319, 2017.
 [22] Rajesh Ranganath, Sean Gerrish, and David Blei. Black box variational inference. In Artificial Intelligence and Statistics (AISTATS), 2014 International Conference on, pages 814–822, 2014.
 [23] Evan Archer, Il Memming Park, Lars Buesing, John Cunningham, and Liam Paninski. Black box variational inference for state space models. arXiv preprint arXiv:1511.07367, 2015.
 [24] Diederik P Kingma and Max Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.
 [25] Andreas C Damianou, Michalis K Titsias, and Neil D Lawrence. Variational inference for latent variables and uncertain inputs in gaussian processes. The Journal of Machine Learning Research (JMLR), 17(1):1425–1486, 2016.
 [26] Dustin Tran, Matthew D. Hoffman, Rif A. Saurous, Eugene Brevdo, Kevin Murphy, and David M. Blei. Deep probabilistic programming. In International Conference on Learning Representations (ICLR), 2017.
 [27] Yuan Zhao and Il Memming Park. Variational latent gaussian process for recovering single-trial dynamics from population spike trains. Neural Computation, 29(5):1293–1316, 2017.
 [28] Scott Linderman, Matthew Johnson, Andrew Miller, Ryan Adams, David Blei, and Liam Paninski. Bayesian learning and inference in recurrent switching linear dynamical systems. In Artificial Intelligence and Statistics (AISTATS), 2017 International Conference on, pages 914–922, 2017.
 [29] Arnulf BA Graf, Adam Kohn, Mehrdad Jazayeri, and J Anthony Movshon. Decoding the activity of neuronal populations in macaque primary visual cortex. Nature neuroscience, 14(2):239, 2011.