
# Uniform Inference in High-Dimensional Gaussian Graphical Models

Graphical models have become a very popular tool for representing dependencies within a large set of variables and are key for representing causal structures. We provide results for uniform inference on high-dimensional graphical models with the number of target parameters being possibly much larger than the sample size. This is particularly important when certain features or structures of a causal model are to be recovered. Our results highlight how, in high-dimensional settings, graphical models can be estimated and recovered with modern machine learning methods in complex data sets. We also demonstrate in a simulation study that our procedure has good small-sample properties.


## 1 Introduction

We provide methodology and theory for uniform inference on high-dimensional graphical models with the number of target parameters being possibly much larger than the sample size. We demonstrate uniform asymptotic normality of the proposed estimator over d-dimensional rectangles and construct simultaneous confidence bands for all of the target parameters. The proposed method can be applied to simultaneously test the presence of a large set of edges in the graphical model

 X = (X_1, …, X_p)^T ∼ N(μ_X, Σ_X).

Assuming that the covariance matrix Σ_X is nonsingular, the conditional independence structure of the distribution can be conveniently represented by a graph G = (V, E), where V = {1, …, p} is the set of nodes and E the set of edges in V × V. Every pair of variables not contained in the edge set is conditionally independent given all remaining variables. If the vector X is normally distributed, every edge corresponds to a non-zero entry in the inverse covariance matrix (Lauritzen (1996) [8]).

In the last decade, significant progress has been made on the estimation of large precision matrices in order to analyze the dependence structure of a high-dimensional normally distributed random variable. There are two main approaches to estimating the entries of a precision matrix. The first approach is penalized likelihood estimation with a lasso-type penalty on the entries of the precision matrix, typically referred to as the graphical lasso. This approach has been studied in several papers, see e.g. Lam and Fan (2009) [7], Rothman et al. (2008) [12], Ravikumar et al. (2011) [10] and Yuan and Lin (2007) [16]. The second approach, first introduced by Meinshausen and Bühlmann (2006) [9], is neighborhood based. It estimates the conditional independence restrictions separately for each node in the graph and is hence equivalent to variable selection for Gaussian linear models. The idea of estimating the precision matrix column by column by running a regression of each variable against the rest was further studied in Yuan (2010) [15], Cai, Liu and Zhou (2011) [4] and Sun and Zhang (2013) [13].

In this paper, we do not aim to estimate the whole precision matrix; instead, we focus on quantifying the uncertainty of recovering its support by providing a significance test for a set of potential edges. In recent years, statistical inference for the precision matrix in high-dimensional settings has been studied, e.g. in Janková and van de Geer (2016) [6] and Ren et al. (2015) [11]. Both approaches lead to an estimate that is elementwise asymptotically normal and enables testing for low-dimensional parameters of the precision matrix using standard procedures such as the Bonferroni-Holm correction.
In contrast to these existing results, our method explicitly allows for testing a joint hypothesis without correction for multiple testing and for conducting inference on a growing number of parameters using high-dimensional central limit results. In order to provide theoretical results, fitting the problem of support discovery in Gaussian graphical models into a general Z-estimation setting with a high-dimensional nuisance function is key. Inference on a (multivariate) target parameter in general Z-estimation problems in high dimensions is covered in Belloni et al. (2014) [2], Belloni et al. (2015) [1] and Chernozhukov et al. (2017) [5].
Additionally, our results rely on approximate sparsity instead of row sparsity, which restricts the number of non-zero entries of each row of the precision matrix to be at most s and is in many applications a questionable assumption. In this context, we establish a theorem on uniform estimation rates of the lasso/post-lasso estimator in high-dimensional regression models under approximate sparsity conditions.

## 2 Setting

Let

 X = (X_1, …, X_p)^T ∼ N(μ_X, Σ_X)

be a p-dimensional random variable. For all j, k ∈ {1, …, p} with j ≠ k, assume that

 X_j = ∑_{l=1, l≠j}^{p} β^{(j)}_l X_l + ε^{(j)} = β^{(j)} X_{−j} + ε^{(j)}

and

 X_k = γ^{(j,k)} X_{−{j,k}} + ν^{(j,k)},

where ε^{(j)} is independent of X_{−j} and ν^{(j,k)} is independent of X_{−{j,k}}. Define the column vector

 Γ^{(j)} = (−β^{(j)}_1, …, −β^{(j)}_{j−1}, 1, −β^{(j)}_{j+1}, …, −β^{(j)}_p)^T.

One may show

 Φ_0 = (Φ^1_0, …, Φ^p_0) = (Γ^{(1)}/Var(ε^{(1)}), …, Γ^{(p)}/Var(ε^{(p)})),

where Φ^j_0 is the j-th column of the precision matrix [6]. Hence

 Φ_{0,jk} = 0 ⇔ β^{(j)}_k = 0 ⇔ X_j ⊥ X_k | X_{−{j,k}} (2.1)

for all j ≠ k. Assume that we are interested in the following set of potential edges

 M := {m_1, …, m_{d_n}}

where the number of edges d_n may increase with the sample size n. In the following, the dependence on n is omitted to simplify notation. In order to test whether the variables X_j and X_k are conditionally independent with m_r = (j, k) for an r ∈ {1, …, d}, we have to estimate our target parameter

 θ_0 = (θ_{m_1}, …, θ_{m_d})^T := (β^{(j_1)}_{k_1}, …, β^{(j_d)}_{k_d})^T.

The setting above fits in the general Z-estimation problem of the form

 E[ψ_{m_r}(X, θ_{m_r}, η_{m_r})] = 0

with nuisance parameters

 η_{m_r} = (β^{(j)}_{−k}, γ^{(j,k)})

where η^{(1)}_{m_r} = β^{(j)}_{−k} and η^{(2)}_{m_r} = γ^{(j,k)}. The score functions are defined by

 ψ_{m_r}(X, θ, η) := (X_j − θX_k − η^{(1)}X_{−m_r})(X_k − η^{(2)}X_{−m_r})

for η = (η^{(1)}, η^{(2)}) and r = 1, …, d. Without loss of generality we assume j < k for all tuples m_r = (j, k).
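The identification strategy above — the node-wise regression coefficients β^{(j)} encode the off-diagonal pattern of the precision matrix — can be checked numerically. A minimal sketch in Python; the 4×4 precision matrix is an illustrative assumption, not taken from the paper:

```python
import numpy as np

# Hypothetical precision matrix: Phi[0, 2] = 0, so the first and third
# variables are conditionally independent given the rest.
Phi = np.array([[2.0, 0.5, 0.0, 0.3],
                [0.5, 2.0, 0.4, 0.0],
                [0.0, 0.4, 2.0, 0.0],
                [0.3, 0.0, 0.0, 2.0]])
Sigma = np.linalg.inv(Phi)

j, rest = 0, [1, 2, 3]
# Population coefficients of the regression of X_j on X_{-j}.
beta = np.linalg.solve(Sigma[np.ix_(rest, rest)], Sigma[rest, j])
# Relation from the text: beta^{(j)}_k = -Phi_{jk} / Phi_{jj}.
beta_from_phi = -Phi[j, rest] / Phi[j, j]

print(np.allclose(beta, beta_from_phi))  # the two representations agree
print(beta[1])                           # zero coefficient <=> missing edge
```

A zero regression coefficient thus corresponds exactly to a zero entry of the precision matrix, which is what the testing problem below exploits.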

###### Comment 2.1.

The score function is linear, meaning

 ψ_{m_r}(X, θ, η) = ψ^a_{m_r}(X, η^{(2)})θ + ψ^b_{m_r}(X, η)

with

 ψ^a_{m_r}(X, η^{(2)}) = −X_k(X_k − η^{(2)}X_{−m_r})

and

 ψ^b_{m_r}(X, η) = (X_j − η^{(1)}X_{−m_r})(X_k − η^{(2)}X_{−m_r})

for m_r = (j, k) and η = (η^{(1)}, η^{(2)}).

It is well known that in partially linear regression models the score function ψ_{m_r} satisfies the moment condition

 E[ψ_{m_r}(X, θ_{m_r}, η_{m_r})] = 0 (2.2)

and also the Neyman orthogonality condition

 ∂_t{E[ψ_{m_r}(X, θ_{m_r}, η_{m_r} + t~η)]}∣_{t=0} = 0

for all ~η in an appropriate set, where ∂_t denotes the derivative with respect to t. These properties are crucial for valid inference in high-dimensional settings. We will show these properties explicitly in the proof of Theorem 1.

## 3 Estimation

Let X^{(1)}, …, X^{(n)} be i.i.d. copies of X.
At first we estimate the nuisance parameters η_{m_r} by running a lasso/post-lasso regression of X_j on X_{−j} to compute η̂^{(1)}_{m_r} and a lasso/post-lasso regression of X_k on X_{−{j,k}} to compute η̂^{(2)}_{m_r} for each r = 1, …, d. The estimator of the target parameter

 θ_0 = (θ_{m_1}, …, θ_{m_d})^T

is defined as the solution of

 sup_{r=1,…,d} { |E_n[ψ_{m_r}(X, θ̂_{m_r}, η̂_{m_r})]| − inf_{θ∈Θ_{m_r}} |E_n[ψ_{m_r}(X, θ, η̂_{m_r})]| } ≤ ε_n, (3.1)

where ε_n is a numerical tolerance given by a sequence of positive constants converging to zero.
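Because the score is linear in θ, solving the empirical moment condition reduces to a closed-form ratio once the two nuisance regressions are fitted. The following sketch illustrates this for a single edge, using scikit-learn's Lasso as a stand-in for the paper's lasso/post-lasso step; the penalty level, sample size and toy precision matrix are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
p, n = 10, 4000
Phi = np.eye(p)
Phi[0, 1] = Phi[1, 0] = 0.4             # single edge between variables 0 and 1
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Phi), size=n)

j, k = 0, 1
rest = np.delete(np.arange(p), [j, k])  # indices belonging to X_{-m_r}
# Nuisance fits: residuals after partialling out X_{-m_r} by lasso.
u = X[:, j] - Lasso(alpha=0.01).fit(X[:, rest], X[:, j]).predict(X[:, rest])
v = X[:, k] - Lasso(alpha=0.01).fit(X[:, rest], X[:, k]).predict(X[:, rest])
# Solving E_n[psi] = 0 for theta gives theta_hat = E_n[u v] / E_n[X_k v].
theta_hat = np.mean(u * v) / np.mean(X[:, k] * v)
theta_true = -Phi[j, k] / Phi[j, j]     # = beta^{(j)}_k
print(theta_hat, theta_true)
```

With this design the estimate is close to the population value −0.4; the paper's procedure additionally handles many edges simultaneously and chooses the penalty in a data-driven way.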

Assumptions A1-A4.
The following assumptions hold uniformly in n:

1. For all m_r = (j, k), r = 1, …, d, we have the following approximately sparse representations:

• It holds

 X_j = β^{(j)}X_{−j} + ε^{(j)} = θ_{m_r}X_k + β^{(m_r)}X_{−m_r} + ε^{(m_r)} = θ_{m_r}X_k + (β^{(1,m_r)} + β^{(2,m_r)})X_{−m_r} + ε^{(m_r)}

where β^{(1,m_r)} is a sparse component and β^{(2,m_r)} a remainder that is small in norm.

• It holds

 X_k = γ^{(j,k)}X_{−{j,k}} + ν^{(j,k)} = γ^{(m_r)}X_{−m_r} + ν^{(m_r)} = (γ^{(1,m_r)} + γ^{(2,m_r)})X_{−m_r} + ν^{(m_r)}

where γ^{(1,m_r)} is a sparse component and γ^{(2,m_r)} a remainder that is small in norm.

2. There exist positive constants such that the following growth conditions are fulfilled:

 n^{1/q̃} s log^4(a_n)/n · (log^2(log(a_n)) ∨ s) = o(1),  log(d) = o(n^{1/9} ∧ n^{2c_q/((1+c_q)q̃)}).

3. For all r = 1, …, d it holds

 ∥β^{(m_r)}∥_2 + ∥γ^{(m_r)}∥_2 ≤ C

and

 sup_{r=1,…,d} sup_{θ∈Θ_{m_r}} |θ| ≤ C.

4. It holds

 inf_{∥ξ∥_2=1} E[(ξ^T X)^2] ≥ c and sup_{∥ξ∥_2=1} E[(ξ^T X)^2] ≤ C.

Condition 1 is a standard approximate sparsity condition. Condition 3 restricts the parameter spaces and ensures that the coefficients are well behaved. The last condition 4 restricts the correlation between the components of X and bounds the variance of each component from below and above. Assumptions 1-4 combined with the normal distribution of X imply the conditions 1-3 of Theorem 2, which enables us to estimate the nuisance parameters sufficiently fast.

###### Comment 3.1.

If we have exact sparsity for each β^{(j)}, the sparsity of γ^{(j,k)} follows directly.
Observe that for l ∉ {j, k} we have

 β^{(k)}_l = 0 ⇔ X_k ⊥ X_l | X_{−{k,l}} ⇔ E[X_kX_l|X_{−{k,l}}] = 0

which implies

 E[X_kX_l|X_{−{j,k,l}}] = E[E[X_kX_l|X_{−{k,l}}]|X_{−{j,k,l}}] = 0

and thereby

 γ^{(j,k)}_l = 0 ⇔ X_k ⊥ X_l | X_{−{j,k,l}} ⇔ E[X_kX_l|X_{−{j,k,l}}] = 0.

## 4 Main results

We will prove that the assumptions of the corollary from Belloni et al. (2015) [1] hold and hence we are able to use their results to construct confidence intervals even for a growing number of hypotheses d. Define

 J_{m_r} := ∂_θE[ψ_{m_r}(X, θ, η_{m_r})]∣_{θ=θ_{m_r}} = −E[X_k(X_k − η^{(2)}_{m_r}X_{−m_r})], σ²_{m_r} := E[J^{−2}_{m_r}ψ²_{m_r}(X, θ_{m_r}, η_{m_r})]

and the corresponding estimators

 Ĵ_{m_r} = −E_n[X_k(X_k − η̂^{(2)}_{m_r}X_{−m_r})], σ̂²_{m_r} = E_n[Ĵ^{−2}_{m_r}ψ²_{m_r}(X, θ̂_{m_r}, η̂_{m_r})]

for r = 1, …, d. To construct confidence intervals we will employ the Gaussian multiplier bootstrap. Define

 ψ̂_{m_r}(X) := −σ̂^{−1}_{m_r}Ĵ^{−1}_{m_r}ψ_{m_r}(X, θ̂_{m_r}, η̂_{m_r})

and the process

 N_{m_r} := (1/√n) ∑_{i=1}^n ξ_i ψ̂_{m_r}(X^{(i)}),

where ξ_1, …, ξ_n are independent standard normal random variables which are independent from the observations. We define c_α as the (1 − α)-conditional quantile of sup_{r=1,…,d} |N_{m_r}| given the observations. The following theorem is the main result of our paper and establishes simultaneous confidence bands for the target parameter θ_0.
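The quantile c_α can be computed by Monte Carlo over the Gaussian multipliers. A minimal sketch, assuming the n×d matrix of standardized scores ψ̂ has already been computed (here it is simulated for a sanity check):

```python
import numpy as np

def multiplier_bootstrap_quantile(psi_hat, alpha=0.05, B=2000, seed=1):
    """(1 - alpha)-quantile of sup_r |n^{-1/2} sum_i xi_i psi_hat[i, r]|
    over B draws of i.i.d. standard normal multipliers xi."""
    n, d = psi_hat.shape
    xi = np.random.default_rng(seed).standard_normal((B, n))
    sups = np.abs(xi @ psi_hat / np.sqrt(n)).max(axis=1)
    return np.quantile(sups, 1 - alpha)

# Sanity check with d = 1 and scores standardized to unit variance: the
# quantile should be close to the two-sided N(0,1) point, about 1.96.
rng = np.random.default_rng(2)
psi = rng.standard_normal((500, 1))
psi = (psi - psi.mean()) / psi.std()
c_alpha = multiplier_bootstrap_quantile(psi)
print(c_alpha)
```

Conditioning on the data, the bootstrapped statistic mimics the Gaussian approximation of the maximum, which is what drives the uniform validity below.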

###### Theorem 1.

Under the assumptions 1-4, uniformly over the data-generating processes satisfying them, the estimator in (3.1) obeys

 P(θ̂_{m_r} − c_α σ̂_{m_r}/√n ≤ θ_{m_r} ≤ θ̂_{m_r} + c_α σ̂_{m_r}/√n, r = 1, …, d) → 1 − α. (4.1)

Using Theorem 1 we are able to construct standard confidence regions which are uniformly valid over a large set of variables, and we can test null hypotheses of the form:

 H_0: M ∩ E = ∅.
###### Comment 4.1.

Theorem 1 is basically an application of the Gaussian approximation and multiplier bootstrap for maxima of sums of high-dimensional random vectors (Chernozhukov, Chetverikov and Kato (2013)). The central limit theorem and bootstrap in high dimensions introduced by Chernozhukov, Chetverikov and Kato (2017) extend these results to more general sets, more precisely sparsely convex sets. Hence our main theorem can easily be generalized to various confidence regions that contain the true target parameter with probability 1 − α. Theorem 1 provides critical regions of the form

 sup_{r=1,…,d} |√n θ̂_{m_r}/σ̂_{m_r}| > c_{1−α}. (4.2)

Alternatively, we can reject the null hypothesis if

 sup_{r=1,…,d} |√n θ̂_{m_r}/σ̂_{m_r}| > c_{1−α,2}. (4.3)

Both of these regions are based on the central limit theorem for hyperrectangles in high dimensions. The confidence region (4.3) is motivated by the fact that the standard normal distribution in high dimensions is concentrated in a thin spherical shell around the sphere of radius √d, as described by Vershynin (2017), and therefore might have smaller volume. More generally, define

 θ̂*_{m_r}(S, exp) = ∑_{s=1}^S (√n θ̂_{m_{r−s}}/σ̂_{m_{r−s}})^{exp}

for fixed S and exp, and

 r−s := { r−s, if r−s > 0; d+(r−s), otherwise.

A test that rejects the null hypothesis if

 sup_{r=1,…,d} |θ̂*_{m_r}(S, exp)| > c*_{1−α} (4.4)

has level α by Chernozhukov, Chetverikov and Kato (2017), since the constructed confidence regions correspond to S-sparsely convex sets. Here, c*_{1−α} is the (1 − α)-conditional quantile of sup_{r=1,…,d} |N̂*_{m_r}| given the observations with

 N̂*_{m_r} = ∑_{s=1}^S (N̂_{m_{r−s}})^{exp}

where r−s is defined as above.

## 5 The function GGMtest

The function GGMtest, which will be added to the R package, estimates the target coefficients

 (θ_{m_1}, …, θ_{m_d})^T = (β^{(j_1)}_{k_1}, …, β^{(j_d)}_{k_d})^T

corresponding to the considered set of potential edges

 M := {m_1, …, m_{d_n}}

by the proposed method described in section 3. It can be used to perform hypothesis tests with asymptotic level α based on the different confidence regions described in comment 4.1. The nuisance functions can be estimated by lasso, post-lasso or square-root lasso. The verification of uniform convergence rates of the square-root lasso estimator for functional parameters in high-dimensional settings is an interesting problem that we plan to address in future work.

### 5.1 Cross-fitting

In general Z-estimation problems where a so-called debiased or double machine learning (DML) method is used to construct confidence intervals, it is common to use cross-fitting in order to improve small-sample properties. A detailed discussion of cross-fitted DML can be found in Chernozhukov et al. (2017) [5]. The following algorithm generalizes our proposed method to a K-fold cross-fitted version. We assume that n is divisible by K in order to simplify notation.

###### Algorithm 1.

1) Take a K-fold random partition (I_k)_{k=1}^K of the observation indices {1, …, n} such that the size of each fold I_k is n/K. Also, for each k, define I^c_k := {1, …, n} ∖ I_k. 2) For each k and r, construct an estimator

 η̂_{k,m_r} = η̂_{m_r}((X^{(i)})_{i∈I^c_k})

by lasso/post-lasso or square-root lasso. 3) For each k, construct an estimator θ̂_k as in (3.1):

 sup_{r=1,…,d} { |E_{n,k}[ψ_{m_r}(X, θ̂_{k,m_r}, η̂_{k,m_r})]| − inf_{θ∈Θ_{m_r}} |E_{n,k}[ψ_{m_r}(X, θ, η̂_{k,m_r})]| } ≤ ε_n

with E_{n,k} denoting the empirical expectation over the observations in I_k. 4) Aggregate the estimators:

 θ̂^K = (1/K) ∑_{k=1}^K θ̂_k.

5) For each r construct the uniformly valid confidence interval

 [θ̂^K_{m_r} − c_α σ̂^K_{m_r}/√n, θ̂^K_{m_r} + c_α σ̂^K_{m_r}/√n]

with

 Ĵ^K_{m_r} = −(1/K) ∑_{k=1}^K E_{n,k}[X_k(X_k − η̂^{(2)}_{k,m_r}X_{−m_r})], σ̂^K_{m_r} = √{ (Ĵ^K_{m_r})^{−2} (1/K) ∑_{k=1}^K E_{n,k}[ψ²_{m_r}(X, θ̂^K_{m_r}, η̂_{k,m_r})] }.

Here, c_α is the bootstrap quantile of sup_{r=1,…,d} |N̂_{m_r}| with

 N̂_{m_r} = (1/√n) ∑_{i=1}^n ξ_i ψ̂^K_{m_r}(X^{(i)})

where ξ_1, …, ξ_n are independent standard normal random variables which are independent from the observations, and

 ψ̂^K_{m_r}(X) := −(σ̂^K_{m_r}Ĵ^K_{m_r})^{−1}ψ_{m_r}(X, θ̂^K_{m_r}, η̂^K_{m_r}).

The confidence region above corresponds to (4.2). Confidence regions corresponding to (4.3) or (4.4) can be constructed in an analogous way.
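Steps 1)-4) can be sketched for a single edge by reusing the closed-form solution of the linear score on each held-out fold. The fold count, penalty level and toy design below are illustrative assumptions, not the paper's choices:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold

def crossfit_theta(X, j, k, K=5, alpha=0.01):
    """K-fold cross-fitted estimate of theta_{m_r} for the edge m_r = (j, k):
    nuisances are fit on the complement I_k^c, the score is solved on I_k,
    and the K per-fold estimates are averaged (step 4)."""
    rest = np.delete(np.arange(X.shape[1]), [j, k])
    thetas = []
    for comp, fold in KFold(K, shuffle=True, random_state=0).split(X):
        e1 = Lasso(alpha=alpha).fit(X[comp][:, rest], X[comp, j])
        e2 = Lasso(alpha=alpha).fit(X[comp][:, rest], X[comp, k])
        u = X[fold, j] - e1.predict(X[fold][:, rest])
        v = X[fold, k] - e2.predict(X[fold][:, rest])
        thetas.append(np.mean(u * v) / np.mean(X[fold, k] * v))
    return np.mean(thetas)

rng = np.random.default_rng(3)
p = 8
Phi = np.eye(p)
Phi[0, 1] = Phi[1, 0] = 0.3
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Phi), size=4000)
print(crossfit_theta(X, 0, 1))   # close to -Phi[0, 1] / Phi[0, 0] = -0.3
```

Splitting the nuisance estimation and the score evaluation across folds removes the overfitting bias that a single-sample fit would introduce.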

## 6 Simulation Study

This section provides a simulation study of the proposed method. In each example the precision matrix of the Gaussian graphical model is generated as in the R package of [17]. Hence, the corresponding adjacency matrix A is generated by setting the non-zero off-diagonal elements to one and every other element to zero. To obtain a positive definite pre-version of the precision matrix we set

 Φ_pre := v·A + (|λ_min(v·A)| + 0.1 + u)·I_{p×p}.

Here v and u are chosen to control the magnitude of the partial correlations. The covariance matrix Σ is generated by inverting Φ_pre and scaling the variances to one. The corresponding precision matrix is given by Φ = Σ^{−1}. For a given sample size n we generate independent samples of

 X = (X_1, …, X_p) ∼ N(0, Σ)

and evaluate whether our test statistic rejects the null hypothesis for a specific set of edges M

which satisfies the null hypothesis. Finally, the acceptance rate is calculated over independent simulation runs for a given confidence level 1 − α.
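The generation of Φ and Σ described above can be sketched as follows; p, the edge probability and the magnitudes v, u are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(4)
p, prob, v, u = 20, 0.1, 0.3, 0.1

# Random adjacency matrix A: off-diagonal entries set to one w.p. prob.
A = np.triu((rng.random((p, p)) < prob).astype(float), k=1)
A = A + A.T

# Positive definite pre-version of the precision matrix.
Phi_pre = v * A + (abs(np.linalg.eigvalsh(v * A).min()) + 0.1 + u) * np.eye(p)

# Invert and rescale so that all variances equal one; then Phi = Sigma^{-1}.
Sigma = np.linalg.inv(Phi_pre)
d = np.sqrt(np.diag(Sigma))
Sigma = Sigma / np.outer(d, d)
Phi = np.linalg.inv(Sigma)

X = rng.multivariate_normal(np.zeros(p), Sigma, size=500)  # simulated sample
```

The shift by |λ_min(v·A)| + 0.1 + u guarantees positive definiteness regardless of the drawn adjacency structure.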

### 6.1 Simulation settings

In our simulation study we estimate the correlation structure of four different designs that are described in the following.

#### 6.1.1 Example 1: Random Graph

Each pair of off-diagonal elements of the covariance matrix of the first p − 1 regressors is randomly set to be non-zero with a fixed probability. The last regressor is added as an independent random variable. The corresponding precision matrix is of the form

 Φ := ( B   0 )
      ( 0ᵀ  1 )

where B is a sparse (p−1)×(p−1) matrix. We test the hypothesis whether the last regressor is independent from all other regressors, corresponding to

 M = {(p,1), …, (p,p−1)}.

#### 6.1.2 Example 2: Cluster Graph

The regressors are evenly partitioned into four disjoint groups. Each pair of off-diagonal elements is set to be non-zero with a fixed probability if both regressors belong to the same group. The precision matrix is of the form

 Φ := diag(B_1, B_2, B_3, B_4),

where each block B_i is a sparse matrix. We test the hypothesis that the first two groups are conditionally independent. This corresponds to testing the tuples

 M = {(1, p/4+1), …, (1, p/2), (2, p/4+1), …, (p/4, p/2)}.

#### 6.1.3 Example 3: Approximately Sparse Random Graph

In this example we generate a random graph structure as in Example 1, but instead of setting the remaining elements of the adjacency matrix A to zero we generate independent random entries from a uniform distribution on a small interval. This results in a precision matrix of the form

 Φ := ( B   0 )
      ( 0ᵀ  1 )

where B is not a sparse matrix anymore. We then again test the hypothesis whether the last regressor is independent from all other regressors, corresponding to

 M = {(p,1), …, (p,p−1)}.

#### 6.1.4 Example 4: Independent Graph

By setting

 Φ := I_{p×p}

we generate samples of independent normally distributed random variables. We can test the hypothesis that all regressors are mutually independent by choosing

 M = {(1,2), …, (1,p), (2,3), …, (p−1,p)}.
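Enumerating this full edge set is a one-liner; with p variables there are p(p − 1)/2 tuples:

```python
from itertools import combinations

p = 5
M = list(combinations(range(1, p + 1), 2))  # all edges (j, k) with j < k
print(len(M))  # p * (p - 1) / 2 = 10
```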

### 6.2 Simulation results

We provide simulated acceptance rates for all of the examples above, calculated by the function GGMtest with Gaussian multiplier bootstrap samples. Confidence Interval I corresponds to the standard case in (4.2), whereas Confidence Interval II is based on the approximation of the sphere in (4.3). In summary, the results reveal that the empirical acceptance rate is, on average, close to the nominal level of 1 − α over all simulations. Confidence Interval II has a smaller mean absolute deviation and performs noticeably better than Confidence Interval I. More complex S-sparsely convex sets seem to result in better acceptance rates, whereas higher exponents do not improve the rates. The lowest mean absolute deviation is achieved in table 2 without cross-fitting.

## Appendix A Proof of Theorem 1

###### Proof.

We want to use the corollary from Belloni et al. (2015) [1]. Consequently, we will show that their assumptions 2.1-2.4 and the growth conditions of the corollary hold by modifying the proof of the corresponding corollary in [1]. The required bounds hold due to the assumptions 3 and 4. Define the convex set

 T_{m_r} = {η = (η^{(1)}, η^{(2)}): η^{(1)} ∈ ℝ^{p−2}, η^{(2)} ∈ ℝ^{p−2}}

and endow it with the norm

 ∥η∥_e = ∥η^{(1)}∥_2 ∨ ∥η^{(2)}∥_2.

Further let and define the nuisance realization set

 Tmr= {(β(mr),γ(mr))}∪{η∈Tr:||η(1)||0∨||η(2)||0≤Cs ,||η(1)−β(mr)||2∨||η(2)−γ(mr)||2≤Cτn,||η(2)−γ(mr)||1≤C√sτn}.

for a sufficiently large constant C. First we verify Assumption 2.1 (i). The moment condition holds since

 E[ψ_{m_r}(X, θ_{m_r}, η_{m_r})] = E[ε^{(m_r)}ν^{(m_r)}] = E[E[ε^{(m_r)}ν^{(m_r)}|X_{−j}]] = E[ν^{(m_r)}E[ε^{(m_r)}|X_{−j}]] = 0.

 S_n := E[max_r |√n E_n[ψ_{m_r}(X, θ_{m_r}, η_{m_r})]|] = E[sup_{f∈F} |G_n(f)|]

with F := {ψ_{m_r}(·, θ_{m_r}, η_{m_r}): r = 1, …, d}. By the same arguments as in the beginning of the proof of Theorem 2 we conclude that the envelope of F fulfills

 ∥max_r |ε^{(m_r)}ν^{(m_r)}|∥_{P,q} = E[max_r(|ε^{(m_r)}ν^{(m_r)}|)^q]^{1/q} ≤ E[max_r(|ε^{(m_r)}|)^{2q}]^{1/2q} E[max_r(|ν^{(m_r)}|)^{2q}]^{1/2q} ≤ C(q)log(d).

Using lemma O.2 (Maximal Inequality I) in [1], we have

 S_n ≤ C log^{1/2}(d) + C(q) log^{1/2}(d) (n^{2/q} log^3(d)/n)^{1/2} ≲ log^{1/2}(d)

by the assumption 2. Hence, assumption 3 implies that for all r, Θ_{m_r} contains an interval of positive radius centered at θ_{m_r} for all sufficiently large n. Assumption 2.1 (i) follows.

For all r, the map (θ, η) ↦ E[ψ_{m_r}(X, θ, η)] is twice continuously Gateaux-differentiable on Θ_{m_r} × T_{m_r}. Further we have

 D_{m_r,0}[η, η_{m_r}] := ∂_t E[ψ_{m_r}(X, θ_{m_r}, η_{m_r} + t(η − η_{m_r}))]∣_{t=0} = E[∂_t{(X_j − θ_{m_r}X_k − (η^{(1)}_{m_r} + t(η^{(1)} − η^{(1)}_{m_r}))X_{−m_r})(X_k − (η^{(2)}_{m_r} + t(η^{(2)} − η^{(2)}_{m_r}))X_{−m_r})}]∣_{t=0} = E[ε^{(m_r)}(η^{(2)}_{m_r} − η^{(2)})X_{−m_r}] + E[(η^{(1)}_{m_r} − η^{(1)})X_{−m_r}ν^{(m_r)}] = 0.

Therefore, Assumptions 2.1 (ii) and 2.1 (iii) hold. Remark that

 |Jmr| =|∂θE[ψmr(X,θ,ηmr)]|θ=θmr| =|E[−X