
Uniform Inference in High-Dimensional Gaussian Graphical Models

Graphical models have become a very popular tool for representing dependencies within a large set of variables and are key for representing causal structures. We provide results for uniform inference on high-dimensional graphical models, with the number of target parameters being possibly much larger than the sample size. This is particularly important when certain features or structures of a causal model are to be recovered. Our results highlight how, in high-dimensional settings, graphical models can be estimated and recovered with modern machine learning methods on complex data sets. We also demonstrate in a simulation study that our procedure has good small sample properties.


1 Introduction

We provide methodology and theory for uniform inference on high-dimensional graphical models, with the number of target parameters being possibly much larger than the sample size. We demonstrate uniform asymptotic normality of the proposed estimator over $d$-dimensional rectangles and construct simultaneous confidence bands for all of the target parameters. The proposed method can be applied to simultaneously test the presence of a large set of edges in the graphical model.

Assuming that the covariance matrix is nonsingular, the conditional independence structure of the distribution can be conveniently represented by a graph $G = (V, E)$, where $V = \{1, \dots, p\}$ is the set of nodes and $E$ the set of edges in $V \times V$. Every pair of variables not contained in the edge set is conditionally independent given all remaining variables. If the vector $X = (X_1, \dots, X_p)$ is normally distributed, every edge corresponds to a non-zero entry in the inverse covariance matrix (Lauritzen (1996) [8]).
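In formulas, writing $\Theta = \Sigma^{-1}$ for the precision matrix, this standard equivalence (see Lauritzen (1996) [8]) reads

    X_k \perp\!\!\!\perp X_l \mid X_{-\{k,l\}} \quad\Longleftrightarrow\quad \Theta_{kl} = 0 \qquad \text{for all } k \neq l;

it is restated more formally in (2.1) below.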

In the last decade, significant progress has been made on the estimation of large precision matrices in order to analyze the dependence structure of a high-dimensional normally distributed random variable. There are mainly two common approaches to estimating the entries of a precision matrix. The first is a penalized likelihood approach with a lasso-type penalty on the entries of the precision matrix, typically referred to as the graphical lasso. This approach has been studied in several papers, see e.g. Lam and Fan (2009) [7], Rothman et al. (2008) [12], Ravikumar et al. (2011) [10] and Yuan and Lin (2007) [16]. The second approach, first introduced by Meinshausen and Bühlmann (2006) [9], is neighborhood based. It estimates the conditional independence restrictions separately for each node in the graph and is hence equivalent to variable selection for Gaussian linear models. The idea of estimating the precision matrix column by column by running a regression of each variable against the remaining variables was further studied in Yuan (2010) [15], Cai, Liu and Zhou (2011) [4] and Sun and Zhang (2013) [13].

In this paper, we do not aim to estimate the whole precision matrix; instead, we focus on quantifying the uncertainty of recovering its support by providing a significance test for a set of potential edges. In recent years, statistical inference for the precision matrix in high-dimensional settings has been studied, e.g. in Janková and van de Geer (2016) [6] and Ren et al. (2015) [11]. Both approaches lead to an estimate that is elementwise asymptotically normal and enables testing for low-dimensional parameters of the precision matrix using standard procedures such as the Bonferroni-Holm correction.
In contrast to these existing results, our method explicitly allows testing a joint hypothesis without correction for multiple testing and conducting inference on a growing number of parameters using high-dimensional central limit results. In order to provide theoretical results, it is key to fit the problem of support discovery in Gaussian graphical models into a general Z-estimation setting with a high-dimensional nuisance function. Inference on a (multivariate) target parameter in general Z-estimation problems in high dimensions is covered in Belloni et al. (2014) [2], Belloni et al. (2015) [1] and Chernozhukov et al. (2017) [5].
Additionally, our results rely on approximate sparsity instead of row sparsity, which restricts the number of non-zero entries in each row of the precision matrix to be at most $s$ and is a questionable assumption in many applications. In this context, we establish a theorem on uniform estimation rates of the lasso/post-lasso estimator in high-dimensional regression models under approximate sparsity conditions.

2 Setting

Let $X = (X_1, \dots, X_p)^T$ be a $p$-dimensional normally distributed random variable with mean zero and nonsingular covariance matrix $\Sigma$. For all $k$ with $1 \le k \le p$, assume the regression representation

    X_k = X_{-k}^T \gamma^{(k)} + \varepsilon_k,

where $\varepsilon_k$ is independent of $X_{-k} = (X_j)_{j \neq k}$ and normally distributed with mean zero. Define the column vector $\gamma^{(k)} = (\gamma^{(k)}_j)_{j \neq k}$ of regression coefficients. One may show

    \gamma^{(k)}_j = -\Theta_{jk} / \Theta_{kk} \quad \text{for } j \neq k,

where $\Theta_k$ is the $k$-th column of the precision matrix $\Theta = \Sigma^{-1}$ [6]. Hence

    X_k \perp\!\!\!\perp X_l \mid X_{-\{k,l\}} \quad\Longleftrightarrow\quad \gamma^{(k)}_l = 0 \quad\Longleftrightarrow\quad \Theta_{kl} = 0    (2.1)

for all $k \neq l$. Assume that we are interested in the following set of potential edges

    E = \{(k_1, l_1), \dots, (k_d, l_d)\},

where the number of edges $d$ may increase with the sample size $n$. In the following the dependence on $n$ is omitted to simplify the notation. In order to test whether the variables $X_k$ and $X_l$ are conditionally independent simultaneously for all $(k,l) \in E$, we have to estimate our target parameter $\theta_0 = \big(\theta_0^{(k,l)}\big)_{(k,l) \in E}$.

The setting above fits into a general Z-estimation problem of the form

    E\big[\psi^{(k,l)}\big(X, \theta_0^{(k,l)}, \eta^{(k,l)}\big)\big] = 0, \quad (k,l) \in E,

with nuisance parameters $\eta^{(k,l)} = \big(\nu^{(k,l)}, \nu^{(l,k)}\big)$, where $\nu^{(k,l)}$ and $\nu^{(l,k)}$ denote the coefficient vectors of the regressions of $X_k$ and $X_l$ on $X_{-\{k,l\}}$. The score functions $\psi^{(k,l)}$ are defined for each $(k,l) \in E$, as in the sketch below. Without loss of generality we impose a common normalization for all tuples $(k,l) \in E$.
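For concreteness, the following display gives the score in the standard partialling-out form used in double machine learning for partially linear models (cf. [5]); this reconstruction is a sketch, and the exact notation of the original score may differ:

    % Orthogonal score for the edge (k,l): product of the two residuals
    % obtained after partialling out the remaining variables X_{-{k,l}}.
    \psi^{(k,l)}(X, \theta, \nu)
      = \big(X_k - \theta X_l - X_{-\{k,l\}}^T \nu^{(k,l)}\big)
        \big(X_l - X_{-\{k,l\}}^T \nu^{(l,k)}\big).

All code sketches below use this form of the score.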

Comment 2.1.

The score function is linear in $\theta$, meaning

    \psi^{(k,l)}(X, \theta, \nu) = \psi_a^{(k,l)}(X, \nu)\,\theta + \psi_b^{(k,l)}(X, \nu)

with

    \psi_a^{(k,l)}(X, \nu) = -X_l \big(X_l - X_{-\{k,l\}}^T \nu^{(l,k)}\big)

and

    \psi_b^{(k,l)}(X, \nu) = \big(X_k - X_{-\{k,l\}}^T \nu^{(k,l)}\big)\big(X_l - X_{-\{k,l\}}^T \nu^{(l,k)}\big)

for $(k,l) \in E$.

It is well known that in partially linear regression models the score satisfies the moment condition

    E\big[\psi^{(k,l)}\big(X, \theta_0^{(k,l)}, \eta_0^{(k,l)}\big)\big] = 0    (2.2)

and also the Neyman orthogonality condition

    \partial_\eta E\big[\psi^{(k,l)}\big(X, \theta_0^{(k,l)}, \eta\big)\big]\Big|_{\eta = \eta_0^{(k,l)}} = 0

for all $\eta$ in an appropriate set, where $\partial_\eta$ denotes the derivative with respect to the nuisance parameter. These properties are crucial for valid inference in high-dimensional settings. We will show these properties explicitly in the proof of Theorem 1.

3 Estimation

Let $X_1, \dots, X_n$ be i.i.d. random vectors distributed as $X$.
At first we estimate the nuisance parameters by running a lasso/post-lasso regression of $X_k$ on $X_{-\{k,l\}}$ to compute $\hat\nu^{(k,l)}$ and a lasso/post-lasso regression of $X_l$ on $X_{-\{k,l\}}$ to compute $\hat\nu^{(l,k)}$ for each $(k,l) \in E$. The estimator $\hat\theta^{(k,l)}$ of the target parameter $\theta_0^{(k,l)}$ is defined as the solution of

    \big|\mathbb{E}_n\big[\psi^{(k,l)}\big(X, \hat\theta^{(k,l)}, \hat\nu\big)\big]\big| \le \inf_{\theta \in \Theta^{(k,l)}} \big|\mathbb{E}_n\big[\psi^{(k,l)}\big(X, \theta, \hat\nu\big)\big]\big| + \epsilon_n,    (3.1)

where the numerical tolerance $\epsilon_n$ is a sequence of positive constants converging to zero.
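As an illustration of this estimation step, the following minimal sketch implements the per-edge partialling-out estimator in Python with scikit-learn. It is a sketch, not the authors' implementation: the function name, the LassoCV tuning and the closed-form solution of the empirical moment equation are our assumptions.

    import numpy as np
    from sklearn.linear_model import LassoCV

    def edge_estimate(X, k, l):
        """Partialling-out estimate of theta^(k,l) with lasso nuisance fits."""
        n, p = X.shape
        rest = [j for j in range(p) if j not in (k, l)]
        Z = X[:, rest]
        # Nuisance regressions of X_k and X_l on X_{-(k,l)}.
        res_k = X[:, k] - LassoCV(cv=5).fit(Z, X[:, k]).predict(Z)
        res_l = X[:, l] - LassoCV(cv=5).fit(Z, X[:, l]).predict(Z)
        # The empirical score E_n[(res_k - theta * res_l) * res_l] = 0
        # has the closed-form root below, so no numerical tolerance is needed.
        theta = res_k @ res_l / (res_l @ res_l)
        # Plug-in variance of the score, used later for studentization.
        psi = (res_k - theta * res_l) * res_l
        var = np.mean(psi**2) / np.mean(res_l**2) ** 2
        return theta, var

The pair (theta, var) feeds into the simultaneous bands of Section 4 via the studentized statistic sqrt(n) * |theta| / sqrt(var).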

Assumptions A1-A4.
Let $(k,l) \in E$. The following assumptions hold uniformly in $n$:

  A1. For all $k, l$ with $k \neq l$ we have the following approximate sparse representations:

    • The nuisance vector $\nu^{(k,l)}$ decomposes into a sparse component with at most $s$ non-zero entries and a remainder that is sufficiently small.

    • The same holds for the nuisance vector $\nu^{(l,k)}$.

  A2. There exist positive numbers such that suitable growth conditions relating $s$, $p$ and $n$ are fulfilled.

  A3. For all $(k,l) \in E$ the parameter spaces and coefficients are bounded, and the parameter space $\Theta^{(k,l)}$ contains a ball of fixed radius centered at $\theta_0^{(k,l)}$.

  A4. The correlation between the components of $X$ is bounded away from one, and the variance of each component is bounded from below and above.

Condition A1 is a standard approximate sparsity condition. Condition A3 restricts the parameter spaces and ensures that the coefficients are well behaved. The last condition, A4, restricts the correlation between the components of $X$ and bounds the variance of each component from below and above. Assumptions A1-A4 combined with the normal distribution of $X$ imply conditions 1-3 of Theorem 2, which enables us to estimate the nuisance parameters sufficiently fast.

Comment 3.1.

If we have exact sparsity for each neighborhood regression vector $\gamma^{(k)}$ with at most $s$ non-zero entries, the sparsity of the nuisance parameters follows directly, since the coefficients of the reduced regression of $X_k$ on $X_{-\{k,l\}}$ can be expressed in terms of the entries of the precision matrix.
4 Main results

We will prove that the assumptions of the corollary in Belloni et al. (2015) [1] hold, and hence we are able to use their results to construct confidence intervals even for a growing number of hypotheses $d$. Define the score variances $\big(\sigma^{(k,l)}\big)^2$ and the corresponding estimators $\big(\hat\sigma^{(k,l)}\big)^2$ for $(k,l) \in E$. To construct confidence intervals we will employ the Gaussian multiplier bootstrap. Define the multiplier bootstrap process

    \mathcal{G}^* = \max_{(k,l) \in E} \Big| \frac{1}{\sqrt{n}} \sum_{i=1}^n \xi_i\, \frac{\hat\psi^{(k,l)}(X_i)}{\hat\sigma^{(k,l)}} \Big|,

where $\xi_1, \dots, \xi_n$ are independent standard normal random variables which are independent from $X_1, \dots, X_n$. We define $c_{1-\alpha}$ as the $(1-\alpha)$-conditional quantile of $\mathcal{G}^*$ given the observations $X_1, \dots, X_n$. The following theorem is the main result of our paper and establishes simultaneous confidence bands for the target parameter $\theta_0$.

Theorem 1.

Under Assumptions A1-A4, uniformly in the underlying data-generating process, the estimator in (3.1) obeys

    P\Big( \theta_0^{(k,l)} \in \big[ \hat\theta^{(k,l)} \pm c_{1-\alpha}\,\hat\sigma^{(k,l)}/\sqrt{n} \big] \ \text{for all } (k,l) \in E \Big) = 1 - \alpha + o(1).    (4.1)

Using Theorem 1, we are able to construct standard confidence regions which are uniformly valid over a large set of variables, and we can test null hypotheses of the form

    H_0: \theta_0^{(k,l)} = 0 \quad \text{for all } (k,l) \in E.

Comment 4.1.

Theorem 1 is basically an application of the Gaussian approximation and the multiplier bootstrap for maxima of sums of high-dimensional random vectors of Chernozhukov, Chetverikov and Kato (2013). The central limit theorem and bootstrap in high dimensions introduced by Chernozhukov, Chetverikov, Kato et al. (2017) extend these results to more general sets, more precisely sparsely convex sets. Hence our main theorem can easily be generalized to various confidence regions that contain the true target parameter with probability $1 - \alpha$. Theorem 1 provides critical regions of the form

    \max_{(k,l) \in E} \sqrt{n}\, \big| \hat\theta^{(k,l)} \big| / \hat\sigma^{(k,l)} > c_{1-\alpha}.    (4.2)

Alternatively, we can reject the null hypothesis if

(4.3)

Both of these regions are based on the central limit theorem for hyperrectangles in high dimensions. The confidence region (4.3) is motivated by the fact that the standard normal distribution in high dimensions concentrates in a thin spherical shell of radius $\sqrt{d}$, as described by Vershynin (2017), and the resulting region therefore might have smaller volume. More generally, define

for a fixed $S$, and

A test that rejects the null hypothesis if

(4.4)

has asymptotic level $\alpha$ by Chernozhukov, Chetverikov, Kato et al. (2017), since the constructed confidence regions correspond to $S$-sparsely convex sets. Here, the critical value is the $(1-\alpha)$-conditional quantile of the corresponding bootstrap statistic given the observations $X_1, \dots, X_n$.
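To make the multiplier bootstrap concrete, here is a minimal sketch of the quantile computation for the max-statistic in (4.2). The matrix psi of estimated scores is assumed to come from the estimation step, and the studentization by the empirical standard deviation is our convention.

    import numpy as np

    def multiplier_bootstrap_quantile(psi, alpha=0.05, B=500, seed=None):
        """(1-alpha)-quantile of max_j |n^{-1/2} sum_i xi_i psi_ij| / sd_j.

        psi : (n, d) array of estimated scores, one column per edge in E.
        """
        rng = np.random.default_rng(seed)
        n, d = psi.shape
        sd = psi.std(axis=0)
        stats = np.empty(B)
        for b in range(B):
            xi = rng.standard_normal(n)  # Gaussian multipliers, independent of the data
            stats[b] = np.max(np.abs(xi @ psi) / (np.sqrt(n) * sd))
        return float(np.quantile(stats, 1 - alpha))

The null hypothesis is then rejected when the studentized max-statistic exceeds this quantile, which corresponds to the critical region (4.2).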

5 The function GGMtest

The function GGMtest, which will be added to the R-package, estimates the target coefficients corresponding to the considered set of potential edges $E$ by the proposed method described in Section 3. It can be used to perform hypothesis tests with asymptotic level $\alpha$ based on the different confidence regions described in Comment 4.1. The nuisance functions can be estimated by lasso, post-lasso or square-root lasso. The verification of uniform convergence rates of the square-root lasso estimator for functional parameters in high-dimensional settings is an interesting problem that we plan to address in future work.

5.1 Cross-fitting

In general Z-estimation problems where a so-called debiased or double machine learning (DML) method is used to construct confidence intervals, it is common to use cross-fitting in order to improve small sample properties. A detailed discussion of cross-fitted DML can be found in Chernozhukov et al. (2017) [5]. The following algorithm generalizes our proposed method to a $K$-fold cross-fitted version. We assume that $n$ is divisible by $K$ in order to simplify notation.

Algorithm 1.

1) Take a $K$-fold random partition $(I_m)_{m=1}^K$ of the observation indices $\{1, \dots, n\}$ such that the size of each fold $I_m$ is $n/K$. Also, for each $m$, define $I_m^c = \{1, \dots, n\} \setminus I_m$. 2) For each $m \in \{1, \dots, K\}$ and each $(k,l) \in E$, construct an estimator of the nuisance parameter

    \hat\eta_m^{(k,l)} = \hat\eta^{(k,l)}\big((X_i)_{i \in I_m^c}\big)

by lasso, post-lasso or square-root lasso. 3) For each $m$, construct an estimator $\hat\theta_m^{(k,l)}$ as in (3.1), using the observations in $I_m$ together with $\hat\eta_m^{(k,l)}$. 4) Aggregate the estimators:

    \hat\theta^{(k,l)} = \frac{1}{K} \sum_{m=1}^K \hat\theta_m^{(k,l)}.

5) For $\alpha \in (0,1)$, construct the uniformly valid confidence interval

    \Big[\hat\theta^{(k,l)} - c_{1-\alpha}\, \frac{\hat\sigma^{(k,l)}}{\sqrt{n}},\ \hat\theta^{(k,l)} + c_{1-\alpha}\, \frac{\hat\sigma^{(k,l)}}{\sqrt{n}}\Big],

where $c_{1-\alpha}$ is the bootstrap quantile of the cross-fitted multiplier bootstrap process

    \mathcal{G}^*_{\mathrm{cf}} = \max_{(k,l) \in E} \Big| \frac{1}{\sqrt{n}} \sum_{m=1}^K \sum_{i \in I_m} \xi_i\, \frac{\hat\psi_m^{(k,l)}(X_i)}{\hat\sigma^{(k,l)}} \Big|,

where $\xi_1, \dots, \xi_n$ are independent standard normal random variables which are independent from $X_1, \dots, X_n$.

The confidence region above corresponds to (4.2). Confidence regions corresponding to (4.3) or (4.4) can be constructed in an analogous way.
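A minimal sketch of Algorithm 1 for a single edge follows, with the same hedges as before: the fold handling and the LassoCV tuning are our choices, and the per-fold aggregation mirrors steps 2)-4).

    import numpy as np
    from sklearn.linear_model import LassoCV

    def cross_fitted_edge_estimate(X, k, l, K=3, seed=None):
        """K-fold cross-fitted partialling-out estimate of theta^(k,l)."""
        rng = np.random.default_rng(seed)
        n, p = X.shape
        rest = [j for j in range(p) if j not in (k, l)]
        folds = np.array_split(rng.permutation(n), K)   # step 1)
        thetas = []
        for fold in folds:
            comp = np.setdiff1d(np.arange(n), fold)     # I_m^c
            # Step 2): nuisance fits on the complement of the fold ...
            fit_k = LassoCV(cv=5).fit(X[np.ix_(comp, rest)], X[comp, k])
            fit_l = LassoCV(cv=5).fit(X[np.ix_(comp, rest)], X[comp, l])
            # ... with residuals evaluated on the held-out fold I_m.
            rk = X[fold, k] - fit_k.predict(X[np.ix_(fold, rest)])
            rl = X[fold, l] - fit_l.predict(X[np.ix_(fold, rest)])
            # Step 3): per-fold root of the empirical moment equation.
            thetas.append(rk @ rl / (rl @ rl))
        # Step 4): aggregate the fold-wise estimators.
        return float(np.mean(thetas))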

6 Simulation Study

This section provides a simulation study of the proposed method. In each example the precision matrix of the Gaussian graphical model is generated as in the R package huge [17]. Hence, the corresponding adjacency matrix $A$ is generated by setting the non-zero off-diagonal elements to one and all other elements to zero. To obtain a positive definite pre-version of the precision matrix we set

    \tilde\Omega = v \cdot A + \big( |\lambda_{\min}(v \cdot A)| + 0.1 + u \big) I_p.

Here $v$ and $u$ are chosen to control the magnitude of the partial correlations. The covariance matrix $\Sigma$ is generated by inverting $\tilde\Omega$ and scaling the variances to one. The corresponding precision matrix is given by $\Omega = \Sigma^{-1}$. For some given $n$ we generate $n$ independent samples of $X \sim N(0, \Sigma)$ and evaluate whether our test statistic would reject the null hypothesis for a specific set of edges $E$ which satisfies the null hypothesis. Finally, the acceptance rate is calculated over independent simulation runs for a given confidence level $1 - \alpha$.
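The data-generating process for the random-graph design can be sketched as follows; it mirrors the generator of the huge R package as described above, with the diagonal shift and the default constants v and u being our reading of that description.

    import numpy as np

    def random_graph_model(p, prob=0.05, v=0.3, u=0.1, seed=None):
        """Adjacency, covariance and precision matrix of a random graph."""
        rng = np.random.default_rng(seed)
        A = np.triu(rng.random((p, p)) < prob, 1).astype(float)
        A = A + A.T                       # symmetric adjacency, zero diagonal
        omega = v * A
        # Diagonal shift makes the pre-version positive definite.
        omega += (np.abs(np.linalg.eigvalsh(omega).min()) + 0.1 + u) * np.eye(p)
        sigma = np.linalg.inv(omega)
        d = np.sqrt(np.diag(sigma))
        sigma = sigma / np.outer(d, d)    # scale the variances to one
        return A, sigma, np.linalg.inv(sigma)

    # n i.i.d. samples from N(0, Sigma):
    # A, sigma, omega = random_graph_model(p=20, seed=0)
    # X = np.random.default_rng(0).multivariate_normal(np.zeros(20), sigma, size=100)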

6.1 Simulation settings

In our simulation study we estimate the correlation structure of four different designs that are described in the following.

6.1.1 Example 1: Random Graph

Each pair of off-diagonal elements of the covariance matrix of the first $p-1$ regressors is randomly set to be non-zero with a fixed probability, which results in a sparse random graph. The last regressor is added as an independent random variable. The corresponding precision matrix is of the form

    \Omega = \begin{pmatrix} \Omega_1 & 0 \\ 0 & 1 \end{pmatrix},

where $\Omega_1$ is a sparse matrix. We test the hypothesis that the last regressor is independent of all other regressors, corresponding to

    E = \{(p, l) : l = 1, \dots, p-1\}.

6.1.2 Example 2: Cluster Graph

The regressors are evenly partitioned into disjoint groups. Each pair of off-diagonal elements is set to be non-zero with a fixed probability if both variables belong to the same group, which results in a sparse cluster graph. The precision matrix is of the block-diagonal form

    \Omega = \mathrm{diag}(\Omega_1, \dots, \Omega_G),

where each block $\Omega_g$ is a sparse matrix. We test the hypothesis that the first two clusters are conditionally independent. This corresponds to testing all tuples $(k,l)$ with $X_k$ belonging to the first and $X_l$ to the second cluster.

Figure 3: Examples. (a) Random Graph; (b) Cluster Graph.

6.1.3 Example 3: Approximately Sparse Random Graph

In this example we generate a random graph structure as in Example 1, but instead of setting the remaining elements of the adjacency matrix to zero we fill them with independent random entries drawn from a uniform distribution on a small interval, so that they are small in absolute value but non-zero. This results in a precision matrix of the same form as in Example 1 in which the block $\Omega_1$ is not a sparse matrix anymore, but only approximately sparse. We then again test the hypothesis that the last regressor is independent of all other regressors, corresponding to

    E = \{(p, l) : l = 1, \dots, p-1\}.

6.1.4 Example 4: Independent Graph

By setting $\Omega = I_p$ we generate samples of independent normally distributed random variables. We can test the hypothesis that the regressors are mutually independent by choosing the set of all pairs,

    E = \{(k, l) : 1 \le k < l \le p\}.

6.2 Simulation results

We provide simulated acceptance rates for all of the examples above, calculated by the function GGMtest using Gaussian multiplier bootstrap samples. Confidence Interval I corresponds to the standard case in (4.2), whereas Confidence Interval II is based on the approximation of the sphere in (4.3). In summary, the results reveal that the empirical acceptance rate is, on average, close to the nominal level of 0.95 across all simulations. Confidence Interval II has a smaller mean absolute deviation from the nominal level and performs noticeably better than Confidence Interval I. More complex S-sparsely convex sets seem to result in better acceptance rates, whereas higher exponents do not improve the rates. The lowest mean absolute deviation is achieved in Table 2 for S=5, exp=1 and without cross-fitting.

Confidence Interval I Confidence Interval II
Model p d lasso post-lasso sqrt-lasso lasso post-lasso sqrt-lasso
random 20 19 0.929 0.936 0.931 0.923 0.931 0.930
50 49 0.909 0.914 0.909 0.920 0.918 0.928
100 99 0.915 0.918 0.915 0.924 0.926 0.926
cluster 20 25 0.909 0.940 0.913 0.911 0.932 0.915
40 100 0.914 0.918 0.918 0.930 0.938 0.940
60 225 0.895 0.893 0.899 0.917 0.922 0.925
approx 20 19 0.929 0.929 0.929 0.942 0.942 0.942
50 49 0.906 0.906 0.906 0.919 0.919 0.919
100 99 0.897 0.897 0.897 0.933 0.933 0.933
independent 5 10 0.929 0.929 0.929 0.929 0.929 0.929
10 45 0.926 0.926 0.926 0.931 0.931 0.931
20 190 0.899 0.899 0.899 0.919 0.919 0.919
Table 1: Simulation results for S=1,exp=1 and 1-fold
Confidence Interval I Confidence Interval II
Model p d lasso post-lasso sqrt-lasso lasso post-lasso sqrt-lasso
random 20 19 0.956 0.935 0.951 0.956 0.935 0.950
50 49 0.911 0.927 0.922 0.904 0.939 0.925
100 99 0.924 0.912 0.921 0.922 0.920 0.922
cluster 20 25 0.965 0.954 0.966 0.938 0.942 0.941
40 100 0.935 0.941 0.947 0.932 0.939 0.944
60 225 0.929 0.930 0.935 0.937 0.941 0.946
approx 20 19 0.964 0.964 0.964 0.954 0.954 0.954
50 49 0.946 0.945 0.946 0.945 0.945 0.945
100 99 0.909 0.909 0.909 0.937 0.937 0.937
independent 5 10 0.947 0.947 0.947 0.957 0.957 0.957
10 45 0.932 0.932 0.932 0.934 0.934 0.934
20 190 0.934 0.934 0.934 0.941 0.941 0.941
Table 2: Simulation results for S=5,exp=1 and 1-fold
Confidence Interval I Confidence Interval II
Model p d lasso post-lasso sqrt-lasso lasso post-lasso sqrt-lasso
random 20 19 0.919 0.916 0.918 0.914 0.919 0.926
50 49 0.901 0.916 0.907 0.908 0.923 0.928
100 99 0.920 0.912 0.916 0.920 0.920 0.926
cluster 20 25 0.918 0.936 0.922 0.906 0.919 0.907
40 100 0.908 0.932 0.923 0.916 0.933 0.936
60 225 0.895 0.896 0.897 0.909 0.928 0.925
approx 20 19 0.925 0.925 0.925 0.939 0.936 0.939
50 49 0.903 0.904 0.906 0.926 0.919 0.922
100 99 0.907 0.901 0.909 0.925 0.924 0.923
independent 5 10 0.930 0.930 0.930 0.931 0.931 0.931
10 45 0.923 0.920 0.922 0.946 0.945 0.946
20 190 0.896 0.893 0.894 0.922 0.920 0.919
Table 3: Simulation results for S=5,exp=2 and 1-fold
Confidence Interval I Confidence Interval II
Model p d lasso post-lasso sqrt-lasso lasso post-lasso sqrt-lasso
random 20 19 0.927 0.917 0.923 0.945 0.939 0.937
50 49 0.912 0.905 0.918 0.935 0.928 0.930
100 99 0.902 0.901 0.903 0.918 0.930 0.925
cluster 20 25 0.921 0.921 0.926 0.927 0.932 0.927
40 100 0.904 0.899 0.907 0.926 0.923 0.918
60 225 0.888 0.884 0.885 0.932 0.922 0.928
approx 20 19 0.915 0.918 0.917 0.937 0.930 0.932
50 49 0.891 0.898 0.897 0.931 0.931 0.932
100 99 0.894 0.894 0.892 0.931 0.933 0.932
independent 5 10 0.920 0.919 0.916 0.931 0.929 0.929
10 45 0.913 0.911 0.910 0.929 0.930 0.931
20 190 0.891 0.896 0.896 0.922 0.918 0.917
Table 4: Simulation results for S=1,exp=1 and 3-fold
Confidence Interval I Confidence Interval II
Model p d lasso post-lasso sqrt-lasso lasso post-lasso sqrt-lasso
random 20 19 0.946 0.919 0.940 0.945 0.931 0.943
50 49 0.922 0.931 0.938 0.917 0.938 0.935
100 99 0.929 0.918 0.920 0.934 0.943 0.934
cluster 20 25 0.951 0.950 0.962 0.927 0.936 0.939
40 100 0.924 0.920 0.936 0.925 0.929 0.943
60 225 0.919 0.917 0.927 0.939 0.937 0.940
approx 20 19 0.955 0.955 0.951 0.960 0.960 0.958
50 49 0.917 0.917 0.923 0.932 0.932 0.930
100 99 0.932 0.932 0.934 0.945 0.945 0.942
independent 5 10 0.935 0.935 0.936 0.950 0.950 0.951
10 45 0.926 0.926 0.926 0.929 0.929 0.928
20 190 0.925 0.925 0.926 0.939 0.939 0.940
Table 5: Simulation results for S=5,exp=1 and 3-fold
Confidence Interval I Confidence Interval II
Model p d lasso post-lasso sqrt-lasso lasso post-lasso sqrt-lasso
random 20 19 0.909 0.906 0.911 0.937 0.923 0.937
50 49 0.894 0.897 0.901 0.898 0.927 0.932
100 99 0.895 0.884 0.895 0.928 0.924 0.930
cluster 20 25 0.905 0.892 0.902 0.920 0.899 0.898
40 100 0.904 0.898 0.924 0.909 0.927 0.935
60 225 0.887 0.858 0.875 0.920 0.900 0.916
approx 20 19 0.919 0.915 0.918 0.932 0.928 0.933
50 49 0.907 0.907 0.908 0.930 0.932 0.930
100 99 0.890 0.891 0.892 0.921 0.928 0.921
independent 5 10 0.925 0.927 0.926 0.939 0.939 0.938
10 45 0.910 0.910 0.908 0.924 0.924 0.925
20 190 0.880 0.880 0.876 0.915 0.913 0.914
Table 6: Simulation results for S=5,exp=2 and 3-fold

Appendix A Proof of Theorem 1

Proof.

We want to use the corollary in Belloni et al. (2015) [1]. Consequently, we will show that their Assumptions 2.1-2.4 and the growth conditions of the corollary hold by modifying the corresponding proof in [1]. The required moment bounds hold due to Assumptions A3 and A4. Define the convex set of admissible nuisance parameters, endow it with a suitable norm, and define the nuisance realization set for a sufficiently large constant. First we verify Assumption 2.1 (i). The moment condition holds since the score has mean zero at the true parameters, see (2.2).

In addition, the score class admits a measurable envelope. By the same arguments as at the beginning of the proof of Theorem 2, we conclude that the envelope fulfills the required moment bounds.

Using Lemma O.2 (Maximal Inequality I) in [1], we have

a bound controlled by Assumption A2. Hence, Assumption A3 implies that for all $(k,l) \in E$ the parameter space contains an interval of fixed radius centered at $\theta_0^{(k,l)}$ for all sufficiently large $n$. Assumption 2.1 (i) follows.

For all $(k,l) \in E$, the map $\theta \mapsto E[\psi^{(k,l)}(X, \theta, \eta)]$ is twice continuously Gateaux-differentiable on the parameter space, and so is the map $\eta \mapsto E[\psi^{(k,l)}(X, \theta, \eta)]$ on the nuisance realization set. Computing these derivatives shows that Assumptions 2.1 (ii) and 2.1 (iii) hold. Remark that