The celebrated discoveries of place cells, grid cells, and similar structures in the hippocampus have produced a detailed, experimentally validated theory of the formation and processing of spatial memories. However, the specific characteristics of non-spatial memories, e.g. memories of odors and sounds, are still poorly understood. Recent results from human fMRI and EEG experiments suggest that dynamic functional connectivity (DFC) is important for the encoding and retrieval of memories demertzi2019human khosla2018machine nielsen2016nonparametric
, yet DFC in local field potentials (LFP) in animal models has received relatively little attention. We here propose a novel latent factor Gaussian process (LFGP) model for DFC estimation and apply it to a data set of rat hippocampus LFP during a non-spatial memory taskallen2016nonspatial . The model produces strong statistical evidence for DFC and finds distinctive patterns of DFC associated with different experimental stimuli.
Due to the high-dimensionality of time-varying covariance and the complex nature of cognitive processes, effective analysis of DFC requires balancing model parsimony, flexibility, and robustness to noise. DFC models fall into a common framework with three key elements: dimensionality reduction, covariance estimation from time series, and identification of connectivity patterns preti2017dynamic
. Many neuroimaging studies use a combination of various methods, such as sliding window (SW) estimation, principal component analysis (PCA), and the hidden Markov model (HMM). In general, these methods are not fully probabilistic, which can make uncertainty quantification and inference difficult in practice.
Bayesian latent factor models provide a probabilistic approach to modeling dynamic covariance that allows for simultaneous dimensionality reduction and covariance process estimation. Examples include the latent factor stochastic volatility (LFSV) model kastner2017efficient and the nonparametric covariance model fox2015bayesian . In the LFSV model, an autoregressive process is imposed on the latent factors and can be overly restrictive. While the nonparametric model is considerably more flexible, the matrix process for time-varying loadings adds substantial complexity.
Aiming to bridge the gap between these factor models, we propose the latent factor Gaussian process (LFGP) model. In this approach, a latent factor structure is placed on the covariance process of a non-stationary multivariate time series, rather than on the observed time series itself as in other factor models. Since covariance matrices lie on the manifold of symmetric positive-definite (SPD) matrices, we utilize the Log-Euclidean metric to allow unconstrained modeling of the vectorized upper triangle of the covariance process. Dimension reduction and model parsimony is achieved by representing each covariance element as a linear combination of Gaussian process latent factorslawrence2004gaussian .
In this work, we highlight three major advantages of the LFGP model for practical DFC analysis. First, through the prior on the Gaussian process length scale, we are able to incorporate scientific knowledge to target specific frequency ranges that are of scientific interest. Second, the model posterior allows us to perform Bayesian inference for scientific hypotheses, for instance, whether the LFP time series is non-stationary, and if characteristics of DFC differ across experimental conditions. Third, the latent factors serve as a low-dimensional representation of the covariance process, which facilitates visualization of complex phenomena of scientific interest, such as the role of DFC in stimuli discrimination in the context of a non-spatial memory experiment.
2.1 Sliding Window Covariance Estimation
Sliding window methods have been extensively researched for the estimation and analysis of DFC, particularly in human fMRI studies; applications of these methods have identified significant associations of DFC with disease status, behavioral outcomes, and cognitive differences in humans. See preti2017dynamic for a recent detailed review of existing literature. For a -variate time series of length with covariance process , the sliding window covariance estimate with window length can be written as the convolution for the rectangular kernel , where is the indicator function. Studies of the performance of sliding window estimates recommend the use of a tapered kernel to decrease the impact of outlying measurements and to improve the spectral properties of the estimate handwerker2012periodic ; allen2014tracking ; leonardi2015spurious . In the present work we employ a Gaussian taper with scale defined as where is a normalizing constant. The corresponding tapered SW estimate is .
2.2 Log-Euclidean Metric
Direct modeling of the covariance process from the SW estimates is complicated by the positive definite constraint of the covariance matrices. To ensure the model estimates are positive definite, it is necessary to employ post-hoc adjustments, or to build the constraints into the model, typically by utilizing the Cholesky or spectral decompositions. The LFGP model instead uses the Log-Euclidean framework of symmetric positive definite (SPD) matrices to naturally ensure positive-definiteness of the estimated covariance process while also simplifying the model formulation and implementation.
Denote the space of SPD matrices as . For , the Log-Euclidean distance is defined by where Log is the matrix logarithm, and is the Frobenius norm. The metric space is a Riemannian manifold that is isomorphic to with the usual Euclidean norm, for .
Methods for modeling covariances in regression contexts via the matrix logarithm were first introduced in chiu1996matrix . The Log-Euclidean framework for analysis of SPD matrices in neuroimaging contexts was first proposed in arsigny2005fast , with further applications in neuroimaging having been developed in recent years zhu2009intrinsic . The present work is a novel application of the Log-Euclidean framework for DFC analysis.
2.3 Bayesian Latent Factor Models
For , the simple Bayesian latent factor model is with , and a matrix of factor loadings aguilar1998bayesian . is commonly assumed to be a diagonal matrix, implying the latent factors capture all the correlation structure of the features of . The latent factor model shares some similarities with principal component analysis, but includes a stochastic error term, which leads to a different interpretation of the resulting factors.
Variants of the linear factor model have been developed for modeling non-stationary multivariate time series prado2010time ; motta2012evolutionary . In general, these models represent the -variate observed time series as a linear combination of latent factors , with loading matrix and errors : . From this general modeling framework, numerous methods for capturing the non-stationary dynamics in the underlying time series have been developed, such as latent factor stochastic volatility (LFSV) kastner2017efficient , dynamic conditional correlation lindquist2014evaluating , and the nonparametric covariance model fox2015bayesian .
2.4 Gaussian Processes
A Gaussian process () is a continuous stochastic process for which any finite collection of points are jointly Gaussian with some specified mean and covariance. A can be understood as a distribution on functions belonging to a particular reproducing kernel Hilbert space (RKHS) determined by the covariance operator of the process van2008reproducing . Typically, a zero mean is assumed (i.e. the functional data has been centered by subtracting a consistent estimator of the mean), so that the is parameterized entirely by the kernel function that defines the pairwise covariance. Let . Then for any and we have
Further details are given in rasmussen2004gaussian .
3 Latent Factor Gaussian Process Model
We consider estimation of dynamic covariance from a sample of independent time series with variables and time points. Denote the th observed -variate time series by , . We assume that each follows an independent distribution with zero mean and stochastic covariance process . To model the covariance process, we first compute the Gaussian tapered sliding window covariance estimates for each , with fixed window size and taper to obtain . We then apply the matrix logarithm to obtain the length vector specified by , where maps a matrix to its vectorized upper triangle. We refer to as the “log-covariance" at time .
The resulting can be modeled as an unconstrained -variate time series. The LFGP model represents as a linear combination of latent factors through an loading matrix and independent Gaussian errors . The loading matrix is held constant across observations and time. Here is modeled as a product of independent Gaussian processes. Placing priors on the loading matrix
, Gaussian noise variance, and Gaussian process hyper-parameter , gives a fully probabilistic latent factor model on the covariance process:
The LFGP model employs a latent distribution of curves to capture temporal dependence between observations, thus inducing a Gaussian process on the log-covariance . This conveniently allows multiple observations to be modeled as different realizations of the same induced as done in lan2017flexible . The model posteriors are conditioned on different observations despite sharing the same kernel. For better identifiability, the variance scale is fixed so that the loading matrix can be unconstrained.
The log-covariance process induced by the LFGP model is weakly stationary when the GP kernel is weakly stationary.
The covariance of the log-covariance process depends only on the static loading matrix and the factor covariance kernels. Explicitly, for factor kernels , and assuming , with constant across observations and time, the covariance of elements of is
which is weakly stationary when is weakly stationary. ∎
Posterior contraction. To consider posterior contraction of the LFGP model, we make the following assumptions. The true log-covariance process is in the support of the product , for and defined above, with known number of latent factors . The kernel is -Hölder continuous with . is a smooth function in with respect to the Euclidean norm, and that the prior for has support on a given interval . Under the above assumptions, bounds on the posterior contraction rates then follow from previous results on posterior contraction of Gaussian process regression for -smooth functions given in ghosal2007convergence van2008rates . Specifically,
for sufficiently large and with posterior contraction rate for some , where is the expectation of the posterior under the model priors.
To illustrate posterior contraction in the LFGP model, we simulate data for five signals with various sample sizes () and numbers of observation time points (), with a covariance process generated by two latent factors. To measure model bias, we consider the mean squared error of posterior median of the reconstructed log-covariance series. To measure posterior uncertainty, the posterior sample variance is used. As shown in Table 1, both sample size and number of observation time points contribute to posterior contraction.
|12.212 (20.225)||7.845 (8.743)||7.089 (7.714)||5.869 (7.358)|
|6.911 (7.588)||4.123 (5.836)||3.273 (3.989)||3.237 (3.709)|
|3.728 (5.218)||1.682 (2.582)||1.672 (2.659)||1.672 (1.907)|
Large prior support. The prior distribution of the log-covariance process is a linear combination of independent s each with mean 0 and kernel . That is, each log-covariance element will have prior . Considering fixed, the resulting prior for has support equal to the closure of the reproducing kernel Hilbert space (RKHS) with kernel , where
is the covariance tensor formed by stackingvan2008reproducing . Accounting for the prior of , a function
will have nonzero prior probabilityif is in the closure of the RKHS with kernel for some in the support of .
3.3 Factor Selection via the Horseshoe Prior
Similar to other factor models, the number of latent factors in the LFGP model has a crucial effect on model performance, and must be selected somehow. For Bayesian factor analysis, there is extensive literature on factor selection methods, such as Bayes factors, reversible jump samplinglopes2004bayesian , and shrinkage priors bhattacharya2011sparse . While we can compare different models in terms of goodness-of-fit, we cannot compare their latent factors in a meaningful way due to identifiability issues. Therefore, we instead iteratively increase the number of factors and fit the new factors on the residuals resulting from the previous fit. In order to avoid overfitting with too many factors, we place a horseshoe prior on the loadings of the new factors, so that the loadings shrink to zero if the new factor is unnecessary.
Introduced by carvalho2009handling , the horseshoe prior in the regression setting is given by
and can be considered as a scale-mixture of Gaussian distributions. A small global scale
encourages shrinkage, while the heavy tailed Cauchy distribution allows the loadings to escape from zero. The example shown in Figure1 illustrates the shrinkage effect of the horseshoe prior when iteratively fitting an LFGP model with four factors to simulated data generated from three latent factors. For sampling from the loading posterior distribution, we use the No-U-Turn Sampler hoffman2014no as implemented in PyStan carpenter2017stan .
3.4 Scalable Computation
The LFGP model can be fit via Gibbs sampling, as commonly done for Bayesian latent variable models. In every iteration, we first sample from the conditional as are jointly multivariate Gaussian where the covariance can be written in terms of . However, it is worth noting that this multivariate Gaussian has a large covariance matrix, which could be computationally expensive to invert. Given , the parameters andis directly available. For the parameter posterior , either Metropolis random walk or slice sampling neal2003slice can be used within each Gibbs step because the parameter space is low dimensional.
For efficient posterior sampling, it is essential to exploit the structure of the covariance matrix. For each independent latent factor , there are independent sets of observations at time points. Therefore, the covariance matrix has dimensions . To reduce the computational burden, we notice that the covariance can be decomposed using a Kronecker product , where is the temporal covariance. The cost to invert using this decomposition is , which is a substantial reduction compared to the original cost . For many choices of kernel, such as the squared-exponential or Matérn kernel,
has a Toeplitz structure and can be approximated through interpolationwilson2015kernel , further reducing the computational cost.
Combining the latent factors (dimensions ) and loading matrix (dimensions ) induces a on . The dimensionality of is so the full covariance matrix is prohibitive to invert. As every column of Y is a weighted sum of the factors, the covariance matrix can be written as a sum of Kronecker products , where is the covariance matrix of the th latent factor and is a matrix based on the factor loadings. We can regress residuals of on each column of iteratively to sample from the conditional distribution so that the residual covariance is only . The inversion can be done in a computationally efficient way with the following matrix identity
where and are the spectral decompositions. In the identity, obtaining costs and , which is a substantial reduction from the cost of direct inversion, ; calculating is straightforward since and are diagonal.
4.1 Model Comparisons on Simulated Data
We here consider three benchmark models: sliding window with principal component analysis (SW-PCA), hidden Markov model, and LFSV model. SW-PCA and HMM are commonly used in DFC studies but have severe limitations. The sliding window covariance estimates are consistent but noisy, and PCA does not take the estimation error into account. HMM is a probabilistic model and can be used in conjunction with a time series model, but it is not well-suited to capturing smoothly varying dynamics in brain connectivity.
To compare the performance of different models, we simulate time series data with time-varying covariance . The covariance follows deterministic dynamics that are given by . We consider three different scenarios of dynamics
: square waves, piece-wise linear functions, and cubic splines. Note that both square waves and piece-wise linear functions give rise to dynamics that are not well-represented by the LFGP model when the squared-exponential kernel is used. For each scenario, we randomly generate 100 time series data sets and fit all the models. The evaluation metric is reconstruction loss of the covariance as measured by the Log-Euclidean distance. The simulation results in Table2 show that the proposed LFGP model has the lowest reconstruction loss among the methods considered.
|Square save||0.693 (0.499)||1.003 (1.299)||4.458 (2.416)||0.380 (0.420)|
|Piece-wise||0.034 (0.093)||0.130 (0.124)||0.660 (0.890)||0.027 (0.088)|
|Smooth spline||0.037 (0.016)||0.137 (0.113)||0.532 (0.400)||0.028 (0.123)|
Median reconstruction loss (standard deviation) across 100 data sets
Each time series has 10 variables with 1000 observations and the latent dynamics are 4-dimensional as illustrated in Figure 3. For the SW-PCA model, the sliding window size is 50 and the number of principal components is 4. For the HMM, the number of hidden states is increased gradually until the model does not converge, following the implementation outlined in bilmes1998gentle . For the LFSV model, the R package factorstochvol is used with default settings. All simulations are run on a 2.7 GHz Intel Core i5 Macbook Pro laptop with 8GB memory. The model source code will be made publicly available after the review process.
4.2 Application to Rat Hippocampus Local Field Potentials
To investigate the neural mechanisms underlying the temporal organization of memories, allen2016nonspatial recorded neural activity in the CA1 region of the hippocampus as rats performed a sequence memory task. The task involves the presentation of repeated sequences of 5 stimuli (odors A, B, C, D, and E) at a single port and requires animals to correctly identify each stimulus as being presented either “in sequence” (e.g., ABC…) or “out of sequence” (e.g., ABD…) to receive a reward. Here the model is applied to local field potential (LFP) activity recorded from the rat hippocampus, but the key reason for choosing this data set is that it provides a rare opportunity to subsequently apply the model to other forms of neural activity data collected using the same task (including spiking activity from different regions in rats holbrook2017bayesian and whole-brain fMRI in humans).
LFP signals were recorded in the hippocampi of five rats performing the task with high accuracy. The LFP data analyzed here come from one particular rat and include 200 trials. During each trial, the LFP signals are sampled at 1000Hz for one second after odor release. We focus on 41 trials of odor B and 37 trials of odor C. There are 21 LFP channels in total but we focus the analysis on the six channels that recorded most of the neurons and were confirmed to be positioned in the same cell layer. Figure4 shows the time series of these six LFP channels for a single trial.
We treat all 78 trials as different realizations of the same stochastic process without distinguishing the stimuli explicitly in the model. In order to facilitate interpretation of the latent space representation, we fit two latent factors which explain about 40% of the variance in the data. The prior for length scale is a Gamma distribution concentrated around 100ms on the time scale to encourage learning frequency dynamics close to the theta range (4-12 Hz). Notably, oscillations in this frequency range have been associated with memory function but have not previously been shown to differentiate among the type of stimuli used here, thus providing an opportunity to test the sensitivity of the model. 5000 MCMC draws are taken with the first 1000 draws discarded as burn-in.
In dynamic connectivity studies, it is often of interest to test for the presence of connectivity dynamics, and for group differences in DFC. As shown in Figure 5
, there is high posterior probability that the correlation between two specific LFP channels is time varying and different between odors. For each odor, we can calculate the posterior median latent factors across trials and visualize them as a trajectory in the latent space. This visualization shows that the two trajectories start in an almost overlapping area, with separation occurring around 250ms. This is corroborated by the experimental data indicating that animals begin to identify the odor 200-250ms after onset. We also observe that the two trajectories converge toward the end of the odor presentation. This is also consistent with the experimental data showing that, by then, animals have correctly identified the odors and are simply waiting to perform the response (thereby resulting in similar neural states). Therefore, while this pattern alone does not provide significant statistical evidence, it is well aligned with the scientific interpretation of the experimental data.
Collectively, these results provide compelling evidence that this model can use LFP activity to differentiate the representation of different stimuli, as well as capture their expected dynamics within trials. This level of differentiation has frequently been accomplished by analyzing spiking activity, but not LFP activity alone. This approach, which may be applicable to other types of neural data including spiking activity and fMRI activity, may significantly advance our ability to understand how information is represented among brain regions.
The proposed LFGP model is a novel application of latent factor models for directly modeling the dynamic covariance in multivariate non-stationary time series. As a fully probabilistic approach, the model naturally allows for inference regarding the presence of DFC, and for detecting differences in connectivity across experimental conditions. Moreover, the latent factor structure enables visualization and scientific interpretation of connectivity patterns. Currently, the main limitation of the model is scalability with respect to the number of observed signals. Thus, in practical applications it may be necessary to select a relevant subset of the observed signals, or apply some form of clustering of similar signals. Future work will consider simultaneously reducing the dimension of the signals and modeling the covariance process to improve the scalability and performance of the LFGP model.
The Gaussian process regression framework is a new avenue for analysis of DFC in many neuroimaging modalities. Within this framework, it is possible to incorporate other covariates in the kernel function to naturally account for between-subject variability. In our setting, multiple trials are treated as independent observations or repeated measurements from the same rat, while in human neuroimaging studies, there are often single observations from many subjects. Pooling information across subjects in this setting could yield more efficient inference and lead to more generalizable results.
This work was supported by NIH award R01-MH115697 (B.S., H.O., N.J.F), NSF award DMS-1622490 (B.S.), Whitehall Foundation Award 2010-05-84 (N.J.F.), NSF CAREER award IOS-1150292 (N.J.F.), and NSF award BSC-1439267 (N.J.F.).
-  Athina Demertzi, Enzo Tagliazucchi, S Dehaene, G Deco, P Barttfeld, Federico Raimondo, Charlotte Martial, D Fernández-Espejo, B Rohaut, HU Voss, et al. Human consciousness is supported by dynamic complex patterns of brain signal coordination. Science Advances, 5(2):eaat7603, 2019.
-  Meenakshi Khosla, Keith Jamison, Gia H Ngo, Amy Kuceyeski, and Mert R Sabuncu. Machine learning in resting-state fmri analysis. arXiv preprint arXiv:1812.11477, 2018.
-  Søren FV Nielsen, Kristoffer H Madsen, Rasmus Røge, Mikkel N Schmidt, and Morten Mørup. Nonparametric modeling of dynamic functional connectivity in fmri data. arXiv preprint arXiv:1601.00496, 2016.
-  Timothy A Allen, Daniel M Salz, Sam McKenzie, and Norbert J Fortin. Nonspatial sequence coding in ca1 neurons. Journal of Neuroscience, 36(5):1547–1563, 2016.
-  Maria Giulia Preti, Thomas AW Bolton, and Dimitri Van De Ville. The dynamic functional connectome: State-of-the-art and perspectives. Neuroimage, 160:41–54, 2017.
-  Gregor Kastner, Sylvia Frühwirth-Schnatter, and Hedibert Freitas Lopes. Efficient bayesian inference for multivariate factor stochastic volatility models. Journal of Computational and Graphical Statistics, 26(4):905–917, 2017.
Emily B Fox and David B Dunson.
Bayesian nonparametric covariance regression.
The Journal of Machine Learning Research, 16(1):2501–2542, 2015.
Neil D Lawrence.
Gaussian process latent variable models for visualisation of high dimensional data.In Advances in neural information processing systems, pages 329–336, 2004.
-  Daniel A Handwerker, Vinai Roopchansingh, Javier Gonzalez-Castillo, and Peter A Bandettini. Periodic changes in fmri connectivity. Neuroimage, 63(3):1712–1719, 2012.
-  Elena A Allen, Eswar Damaraju, Sergey M Plis, Erik B Erhardt, Tom Eichele, and Vince D Calhoun. Tracking whole-brain connectivity dynamics in the resting state. Cerebral cortex, 24(3):663–676, 2014.
-  Nora Leonardi and Dimitri Van De Ville. On spurious and real fluctuations of dynamic functional connectivity during rest. Neuroimage, 104:430–436, 2015.
-  Tom YM Chiu, Tom Leonard, and Kam-Wah Tsui. The matrix-logarithmic covariance model. Journal of the American Statistical Association, 91(433):198–210, 1996.
-  Vincent Arsigny, Pierre Fillard, Xavier Pennec, and Nicholas Ayache. Fast and simple calculus on tensors in the log-euclidean framework. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 115–122. Springer, 2005.
-  Hongtu Zhu, Yasheng Chen, Joseph G Ibrahim, Yimei Li, Colin Hall, and Weili Lin. Intrinsic regression models for positive-definite matrices with applications to diffusion tensor imaging. Journal of the American Statistical Association, 104(487):1203–1212, 2009.
-  Omar Aguilar, Gabriel Huerta, Raquel Prado, and Mike West. Bayesian inference on latent structure in time series. Bayesian Statistics, 6(1):1–16, 1998.
-  Raquel Prado and Mike West. Time series: modeling, computation, and inference. Chapman and Hall/CRC, 2010.
-  Giovanni Motta and Hernando Ombao. Evolutionary factor analysis of replicated time series. Biometrics, 68(3):825–836, 2012.
-  Martin A Lindquist, Yuting Xu, Mary Beth Nebel, and Brain S Caffo. Evaluating dynamic bivariate correlations in resting-state fmri: a comparison study and a new approach. NeuroImage, 101:531–546, 2014.
-  Aad W van der Vaart, J Harry van Zanten, et al. Reproducing kernel hilbert spaces of gaussian priors. In Pushing the limits of contemporary statistics: contributions in honor of Jayanta K. Ghosh, pages 200–222. Institute of Mathematical Statistics, 2008.
-  C. E. Rasmussen. Gaussian processes in machine learning. In Advanced lectures on machine learning, pages 63–71. Springer, 2004.
-  Shiwei Lan, Andrew Holbrook, Norbert J Fortin, Hernando Ombao, and Babak Shahbaba. Flexible bayesian dynamic modeling of covariance and correlation matrices. 2017.
-  Subhashis Ghosal, Aad Van Der Vaart, et al. Convergence rates of posterior distributions for noniid observations. The Annals of Statistics, 35(1):192–223, 2007.
-  Aad W van der Vaart, J Harry van Zanten, et al. Rates of contraction of posterior distributions based on gaussian process priors. The Annals of Statistics, 36(3):1435–1463, 2008.
-  Hedibert Freitas Lopes and Mike West. Bayesian model assessment in factor analysis. Statistica Sinica, 14(1):41–68, 2004.
-  Anirban Bhattacharya and David B Dunson. Sparse bayesian infinite factor models. Biometrika, pages 291–306, 2011.
-  Carlos M Carvalho, Nicholas G Polson, and James G Scott. Handling sparsity via the horseshoe. In Artificial Intelligence and Statistics, pages 73–80, 2009.
-  Matthew D Hoffman and Andrew Gelman. The no-u-turn sampler: adaptively setting path lengths in hamiltonian monte carlo. Journal of Machine Learning Research, 15(1):1593–1623, 2014.
-  Bob Carpenter, Andrew Gelman, Matthew D Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. Stan: A probabilistic programming language. Journal of statistical software, 76(1), 2017.
-  Radford M Neal et al. Slice sampling. The annals of statistics, 31(3):705–767, 2003.
-  Andrew Wilson and Hannes Nickisch. Kernel interpolation for scalable structured gaussian processes (kiss-gp). In International Conference on Machine Learning, pages 1775–1784, 2015.
-  Jeff A Bilmes et al. A gentle tutorial of the em algorithm and its application to parameter estimation for gaussian mixture and hidden markov models. 1998.
-  Andrew Holbrook, Alexander Vandenberg-Rodes, Norbert Fortin, and Babak Shahbaba. A bayesian supervised dual-dimensionality reduction model for simultaneous decoding of lfp and spike train signals. Stat, 6(1):53–67, 2017.