Gaussian graphical models (GGM) have been extensively used to represent sparsity structures emerging from conditional independence relations among a collection of Gaussian random variables. The graph determines the zeroes of the precision (inverse-covariance) matrix of multivariate observations. Of key interest is conducting multivariate analysis when the variables conform to a given graph. Covariance selection(dempster1972covariance)
offers a seminal result to answer this specific question. If the graph is unknown, then classical approaches, such as the graphical Lasso, estimate a sparse precision matrix through L-1 regularization(friedman2008sparse; meinshausen2006high)
. Bayesian alternatives employ Markov chain Monte Carlo (MCMC) algorithms with a conjugate hyper-inverse Wishart prior for estimating both the graph and the precision matrix(dobra2011bayesian; roverato2002hyper; carvalho2007simulation).
Graphical Gaussian models are customarily applied to multivariate data sets, where the data units, i.e., the nodes of the graph, are scalar- or vector-valued. Data sets often comprise multiple functions or processes that conceptually exist in continuum over a domain (e.g., over space or time). Examples of such functional data include physical activity or heart rate continuously measured from actigraph monitors (wearables), EEG/fMRI signals from different brain regions (neuroimaging), multiple-gene expressions from cells distributed over a tissue (spatial transcriptomics), population counts of multiple biological species over a region (ecology) and multiple pollutants measured from environmental monitoring stations (environmental sciences).
Multivariate functional data is often modelled using a latent multivariate stochastic process, such as a multivariate Gaussian process (GP) (wackernagel2013multivariate; banerjee2014hierarchical; creswikle11; qiao2019functional). Hence, it is natural to seek extensions of graphical models for multivariate functional data, where each node represents an entire function or stochastic process. Two recent approaches have extended the GGM to represent conditional dependencies among multivariate stochastic processes on continuous domains: (i) functional GGM (FGGM); and (ii) graphical Gaussian Process (GGP). The former relies on a multivariate Karhounen-Loève basis expansion and imposes sparsity on the precision matrices of the Gaussian coefficient vectors via penalised estimation (zapata2019partial; zhu2016bayesian). Functional GGMs have been primarily used for functional data analysis, which customarily uses basis function representations but has also been applied to spatial data (krock2021modeling). The method requires replicated data as it estimates the unknown graph.
Graphical GP (dey2021ggp) focuses on multivariate spatial analysis and proposes a multivariate GP covariance function that exactly conforms to a prespecified graph. This new class of multivariate GPs is essentially an extension of Dempster’s covariance selection to infinite-dimensional settings. Given a non-graphical multivariate covariance function, it obtains the optimal graphical covariance function with process-level conditional independence as specified by an undirected graph. A graphical GP is constructed by an algorithm called stitching, which ensures scalable likelihood-based analysis of highly multivariate spatial data with a large number of outcome variables by leveraging the sparsity of the graph. Graphical GP retains marginal and edge-specific covariances from a parametric multivariate covariance function, thereby facilitating interpretation of the spatial attributes of each process via inference on the parameters, which is often the primary inferential objective in spatial analysis. Unlike the functional GGM, graphical GP does not need replicate data which is unavailable in many spatial applications.
Our current contribution can be summarised as follows. Functional GGM and graphical GP are apparently complementary methods for introducing graphical models in multivariate GP. The former relies on basis function expansions and is typically used for functional data. Graphical GP focuses on spatial data and directly uses covariance functions. We bridge these two seemingly disconnected paradigms. We formally establish that the class of partially separable functional GGM (zapata2019partial) with a specific model for the covariance matrices of the coefficients is equivalent to a graphical GP. Furthermore, using Nyström’s method, we relate the stitching algorithm for construction of graphical GP to a finite-term truncation of basis functions used in functional GGM. The theoretical connections made here have important implications for multivariate functional analysis. When the graph is known (e.g. existing brain-region networks, phyologenetic trees etc.), current estimation strategies for functional GGM cannot use or respect knowledge of the graph. We leverage the equivalence between functional GGM and graphical GP to design two algorithms for analysis of multivariate functional data that exactly preserve the given functional conditional independence relationships.
2 The two paradigms
2.1 Functional Gaussian graphical models
Let be a GP over a continuous domain with node representing the univariate process for . For a set of variables , two univariate GPs and are conditionally independent, given the remaining processes if for all and , where (dey2021ggp). Here is the -algebra generated by its argument.
Functional GGM introduces this process-level conditional independence through the precision matrices of coefficient vectors in its basis expansion. Letting be the set of orthonormal basis functions in the domain , the process is represented as . zhu2016bayesian shows that modelling the coefficients to be Gaussian and assuming a sparsity structure between the discrete multivariate GP imposes the corresponding process-level conditional independence among the component processes of . In practice, they truncate the expansion of to terms and fit a GGM on the dimensional vector of coefficients. The approach involves inversion of an matrix, thereby requiring floating point operations (FLOPs) or FLOPs when for all . The computational expense being cubic in the number of basis functions , prohibits large values which, in turn, can restrict the richness of the function class.
The functional GGM in zapata2019partial and krock2021modeling introduce two further assumptions: (a) a common orthonormal basis function, i.e, for all ; and (b) the coefficients corresponding to different basis functions are independent, i.e, and are independent, where . Due to the separability in the joint covariance structure of basis coefficients, they are formally termed as partial separable processes (zapata2019partial). Their practical implementation truncates the basis function expansion to terms and uses graphical Lasso on the estimated covariance of the coefficient vectors to induce sparsity. This partial separable structure only needs inversions of matrices and considerably reduces computational complexity to . All functional GGM (i) need replicate observations to estimate basis functions and sample covariances, and cannot be applied to spatial data without replicates; (ii) do not retain the marginal processes as we need to truncate the basis functions.
2.2 Graphical Gaussian Processes
Spatial data often lack replicates and is, therefore, modelled using GPs with parametric covariance functions. This contrasts the functional data setup with replicates, which enables functional GGM to estimate a non-parameteric covariance function. The Matérn family of covariances (stein1999interpolation)
are popular in spatial analysis because their parameters can be interpreted as the variance, spatial smoothness and range of the process.dey2021ggp introduced graphical Gaussian process (GGP) models attuned to analyse highly multivariate spatial data. These are specified through multivariate covariance functions that exactly encode an inter-variable graphical model. Theorem 1 in dey2021ggp
asserts that given a non-graphical stationary multivariate GP and an inter-variable graph, the unique and optimal graphical GP approximating the original GP (in terms of integrated spectral Kullback-Leibler divergence) will retain the univariate marginal distributions and edge-specific cross-covariances from the original GP. This result generalises covariance selection(dempster1972covariance) from finite-dimensional covariance matrices to infinite-dimensional covariance operators. We denote this optimal graphical GP derived from the original covariance function and a graph as .
is a graphical GP on given a valid cross-covariance function on and a graph , such that for all , for all and , and for all .
If is a multivariate Matérn covariance funtion (gneiting2010matern; apanasovich2012valid), then each component of is a Matérn GP. Hence, the spatial properties of each variable can be studied from the estimates of the respective Matérn covariance parameters. Construction of the graphical GP uses an algorithm called “stitching” which performs covariance selection using on restricted to a finite set of locations , and then extends it to the infinite domain. We first define covariance selection for positive definite (p.d.) matrices.
Given a graph and a matrix with row and column blocks indexed by , is the unique p.d. matrix from Dempster’s covariance selection on using , i.e., for or , and for .
Given , and , stitching first defines the graphical GP on and then extends it to .
is the low rank predictive process and is the residual GP with Decomposition of a univariate GP into a predictive process and an orthogonal residual process is well known (banerjee2008gaussian). Stitching introduces two innovations to this decomposition. First, the process on is endowed with the covariance instead of . This ensures that the graph is encoded while preserving marginals on . Orthogonality implies that the marginals are preserved over when the residual processes, ’s, are stitched to the low-rank Gaussian process . Second, the ’s are chosen to be component-wise independent ensuring that the graph encoded over is inherited to process-level conditional independence relations over .
For a decomposable sparse graph, stitched graphical GP likelihood involves considerably less parameters and FLOPs than that of a full GP with covariance . Unlike functional GGM, stitched graphical GP does not require replicated data for parameter estimation or predictions.
3 Theoretical connections between functional GGM and graphical GP
3.1 Covariance selection on partially separable Gaussian processes
Consider a partially-separable multivariate GP on a continuous domain ,
Theorem 3.1 formally proves that the graphical GP derived from is a partially separable functional GGM that exactly preserves a given graphical model.
Denote , , and . The cross-covariance function of is denoted by and the conditional cross-covariance function is . By Theorem 3 of zapata2019partial,
(If part:) Suppose for all . To prove we first show it is a graphical GP. For any , by the property of covariance selection on each , we have for all . Hence, from (4) for all , and is a graphical GP with respect to the graph . Next we show retains the marginals and edge-specific cross-covariances from . This is immediate as once again for all or , we have by the property of covariance selection
and consequently .
(Only if part:) Let . Since it is a graphical GP, then for all and . By orthogonality of ’s,
This holds for all implying for all . By definition (2.1), retains the marginal and edge-specific cross-covariances. Hence, for or , implying
This holds for all implying for all , or . Hence, satisfies all the conditions of covariance selection (Definition 2.2) and is the unique matrix . ∎
The result shows that starting with a partially separable GP, the graphical GP can be obtained by applying covariance selection on each covariance matrix of the basis coefficients, and this graphical GP is a partially separable functional GGM. We subsequently discuss in Section 4 how this facilitates an estimation strategy for functional GGM that exactly conforms to a given graph.
3.2 Stitching and truncation
In practice, we cannot work with infinite basis expansions and use a finite-term truncation. The following result on truncated functional GGM is an immediate consequence of Theorem 3.1.
Consider the truncated process with covariance function . Then is given by where for .
We now relate this truncated functional GGM to a low-rank stitched graphical GP (recall Section 2.2) through an approximate result. Let be the partially separable GP in (3) with cross-covariance function , be the partially separable derived from using Theorem 3.1 and be the finite-term truncation of in Corollary 3.1.1. Then , where is the stitched predictive process defined in (2). To derive this result, we observe that where and is a random variable whose distribution is specified in (1). Next, as , the component processes retain the marginals. When the reference set is chosen to have locations, the truncated process will satisfy (see, e.g., p. 389-390 in banerjee2014hierarchical)
This approximate equivalence result is related the Nÿstrom approximation for kernel matrices (drineas2005nystrom). Rewriting (5) as and since , we only need and to approximately have the same distribution to show . As , for or , we have by definition of covariance selection. Finally, we show that for , as follows:
The middle equality follows from the -algebras and being the same since the latter is generated by the former and contains the former. Corollary 3.1.1 ensures the last equality since is a graphical GP conforming to .
This establishes the equivalence of -truncated partially separable GP with a stitched low-rank predictive process on reference locations. Analogous to Theorem 3.1, proving the equivalence of the theoretical constructions of functional GGM and graphical GP, this result establishes an analogy between the practical procedures of truncation and stitching for the two approaches. Section 3.2 also highlights an important distinction between truncation and stitching. Stitched graphical GPs, as specified in (2), is the sum of a stitched low-rank predictive process and an independent residual processes, , where . The truncated functional GGM is only equivalent to the first component and ignores the residual component; i.e., . This is a potential drawback of the truncated functional GGM as it does not respect the marginals and edge-specific cross-covariances of the original process. To see this, we state with the following.
Let denote the finite-rank predictive process approximation of on (banerjee2008gaussian), i.e., . Then the stitched predictive process of (2) is .
As and agree on the marginals and edge-specific cross-covariances, so do and . Also, as is generated by and contains , the sigma algebras and are the same. Since is a GGM on , is a graphical GP on with respect to proving . ∎
The multivariate predictive process has been shown, theoretically and empirically, to underestimate the marginal variances of the original process , oversmoothing and inferior performance (stein2014limitations; nngp). From Theorem 3.2, we know that the stitched predictive process is the graphical GP that preserves the marginals from the multivariate predictive process . Hence, inherits these issues of not retaining the marginals of the original process and so will the truncated functional GGM (using the approximate equivalence between the two shown in this Section). For a univariate predictive process this deficiency is fixed by adding the residual process (finley2009improving). Stitching proceeds similarly in a multivariate graphical model by introducing in (2) to exactly preserve the marginals. Next, we leverage this connection between truncation and stitching to present an estimation strategy for functional GGM that allows conducting multivariate functional analysis while preserving both the graph and the marginals.
In scientific applications, we either know the graph apriori or have to estimate it. Functional GGM is widely deployed in the former setting (zapata2019partial; krock2021modeling). However, if the graph is known (e.g., a known phylogenetic tree as a graph to model multiple species distributions over a region or a known auto-regressive time-dependence structure to model spatio-temporal pollutant data), current functional GGM approaches cannot explicitly encode this information and preserve the graphical model in the analysis.
Theorem 3.1 enables an extension of partially separable functional GGM that conforms to a given graph. Under the constraint of a given graphical model , we want to find the maximum likelihood estimate (MLE) of the covariance operator of random multivariate functional observations generated from a mean-zero partial separable GP as in (3). Let where denotes the correlation matrix for . Let denote the MLE of without any graphical constraints and , then following dempster1972covariance the MLE of that respects is given by
Here, following zapata2019partial, the basis functions estimates
are from a functional principal components analysis. We formally lay out the above steps in Algorithm1 in the Supplement and term our approach as FGGM-CovSel.
The truncation is usually chosen so that the first eigen-functions ’s explain a given percentage of the total variance. A larger improves the retention of marginal covariance functions but requires larger . Covariance selection on the ’s is performed using the iterative proportional scaling (IPS) algorithm (speed1986gaussian) which has a total complexity of FLOPs for the terms. For large , this mandates choosing a smaller to reduce computations. This tradeoff leads to a worse approximation of the marginal covariances as finite-term truncation is equivalent to low-rank stitching at locations (see the discussion in Section 3.2).
To remedy this, we modify Algorithm 1 to emulate the full rank stitching procedure of dey2021ggp by adding component specific residual processes. We first run FGGM-CovSel for a small or moderate as afforded by the computing resources. Then, we calculate the residuals and perform functional principal component analysis of the component-specific residuals and add the diagonal matrix of the estimated residual variances to our covariance estimate from Algorithm 1. Adding residual terms to the truncated functional GGM independently for each component process is equivalent to constructing the full-rank stitched graphical GP in (2) by adding the independent ’s to the stitched predictive process . This yields significant improvements in retaining the univariate distributions. The Algorithm is referred to as FGGM-Stitch and is formally laid out in Algorithm 2 of the Supplement.
5 Simulation studies
We perform a series of simulation experiments to gauge the performance accuracy of FGGM, FGGM-CovSel and FGGM-Stitch algorithms. The details of the data-generation mechanisms are provided in Section S2 of the Supplement. In Figure 1 we compare the true and estimated marginal covariance functions for a correctly specified setting for the FGGM. We see that FGGM-Stitch has a significantly improved estimate of the marginal covariance than both FGGM and FGGM-CovSel. This aligns with our theoretical insights from Section 3.2. For the cross-covariances, FGGM-Stitch and FGGM-CovSel estimates are the same. From Figure S2 we see that they offer improvement over the FGGM estimate on account of exactly preserving the graph. This improvement is consistent across all the edge-specific cross-covariances and also in misspecified settings (Table S1). We refer to Section S2 for more details on the results.
Supplementary Materials for “On the Relationship between Graphical Gaussian Processes and Functional Gaussian Graphical Models”
Debangan Dey, Abhirup Datta, Sudipto Banerjee
S1 Algorithms for Section 5
S2 Detailed Simulation Results
For the data-generation in the simulation studies, we consider variable GPs obeying the conditional independence structure from the graph in Figure S1. We consider two data-generating mechanisms – (a) PS: Partial-separable GP of (zapata2019partial) – We fix total number of basis functions at for data generation. We take them to be the Fourier basis functions and evaluate them at equally spaced points between . Now to generate the basis-specific coefficients, we generate the covariance matrices as for , where, the precision matrices are generated for the graph (Figure S1) based on the algorithm of peng2009partial. The decaying constants ensure that is monotonically decreasing in . (b) GM: Graphical Mateŕn of (dey2021ggp) – The cross-covariance functions were obtained by stitching (dey2021ggp) using multivariate Matérn model (apanasovich2012valid) and the graph (Figure S1). For every pair of variables, the isotropic multivariate Matérn cross-covariance function on a -dimensional domain is defined as , where , being the Matérn correlation function (apanasovich2012valid). are interpreted as the cross-covariance, cross-scale and cross-smoothness parameters. To ensure positive-definiteness of cross-covariance matrix, we parametrize as , where , and , is correlation matrix. For this simulation, we take , and and . The marginal range parameters and variance parameters were permutations of equispaced numbers in , while the ’s where chosen as a random correlation matrix. For data generation locations, we take points on the real line equispaced between .
We compare the estimated marginal and edge-specific cross-covariance surfaces qualitatively using some heat-maps and quantitatively using the distance between true and estimated covariance sub-matrices in terms of Kullback-Leibler (KL) divergence. The KL divergence between two matrices and is defined as- . In case (a), which is the truly specified case, we present the results comparing FGGM-CovSel and the FGGM algorithm in zapata2019partial (implemented using the R package FGM (fgm)). The estimates of the cross-covariance surfaces for the proposed algorithms FGGM-CovSel and FGGM-Stitch are the same. This estimate performs better than the FGGM estimate (Figure (a)a). This behavior is expected since FGGM assumes the graph structure as unknown. Case (b) presents a mis-specified scenario where the data is generated from a non-partial separable multivariate GP such as Graphical Mateŕn. Figure (b)b shows that FGGM-CovSel performs poorer than the Set (a) but captures the banded patterns of Mateŕn to some extent. This difficulty is expected since we’re trying to fit a misspecified partial separable cross-covariance to a non-partially separable GP. Overall, across all edges (Table S1) and in both simulation scenarios, FGGM-CovSel reports lower KL divergence between true and estimated matrices than FGGM. FGGM-CovSel performs according to our expectations in both of the examples thus validating the theoretical basis of our approach. This is in addition to the benefit demonstrated in Figure 1 of using FGGM-Stitch to better capture the marginal variances.
|Set (a)||Set (b)|