1 Introduction
We provide methodology and theory for uniform inference on high-dimensional graphical models in which the number of target parameters may be much larger than the sample size. We demonstrate uniform asymptotic normality of the proposed estimator over $d$-dimensional rectangles, where $d$ denotes the number of target parameters, and construct simultaneous confidence bands for all target parameters. The proposed method can be applied to test simultaneously for the presence of a large set of edges in the graphical model.
Assuming that the covariance matrix $\Sigma$ is nonsingular, the conditional independence structure of the distribution can be conveniently represented by a graph $G = (V, E)$, where $V = \{1, \dots, p\}$ is the set of nodes and $E$ the set of edges in $V \times V$. Every pair of variables not contained in the edge set is conditionally independent given all remaining variables. If the vector $X = (X_1, \dots, X_p)^T$ is normally distributed, every edge corresponds to a non-zero entry in the inverse covariance matrix (Lauritzen (1996) [8]).
In the last decade, significant progress has been made on the estimation of large precision matrices in order to analyze the dependence structure of high-dimensional normally distributed random variables. There are two main approaches to estimating the entries of a precision matrix. The first is penalized likelihood estimation with a lasso-type penalty on the entries of the precision matrix, typically referred to as the graphical lasso. This approach has been studied in several papers, see e.g. Lam and Fan (2009) [7], Rothman et al. (2008) [12], Ravikumar et al. (2011) [10] and Yuan and Lin (2007) [16]. The second approach, first introduced by Meinshausen and Bühlmann (2006) [9], is neighborhood based. It estimates the conditional independence restrictions separately for each node in the graph and is hence equivalent to variable selection for Gaussian linear models. The idea of estimating the precision matrix column by column by regressing each variable on all remaining variables was further studied in Yuan (2010) [15], Cai, Liu and Zhou (2011) [4] and Sun and Zhang (2013) [13].

In this paper, we do not aim to estimate the whole precision matrix; instead, we focus on quantifying the uncertainty of recovering its support by providing a significance test for a set of potential edges. In recent years, statistical inference for the precision matrix in high-dimensional settings has been studied, e.g. in Janková and van de Geer (2016) [6] and Ren et al. (2015) [11]. Both approaches lead to an estimate that is elementwise asymptotically normal and enables testing for low-dimensional parameters of the precision matrix using standard procedures such as the Bonferroni-Holm correction.
In contrast to these existing results, our method explicitly allows testing a joint hypothesis without a correction for multiple testing and conducting inference on a growing number of parameters using high-dimensional central limit theory. The key step in establishing the theoretical results is to cast support recovery in Gaussian graphical models as a general Z-estimation problem with a high-dimensional nuisance function. Inference on a (multivariate) target parameter in general Z-estimation problems in high dimensions is covered in Belloni et al. (2014) [2], Belloni et al. (2015) [1] and Chernozhukov et al. (2017) [5].
Additionally, our results rely on approximate sparsity instead of exact row sparsity, which restricts the number of non-zero entries in each row of the precision matrix to be at most $s$ and is a questionable assumption in many applications. In this context, we establish a theorem on uniform estimation rates of the lasso/post-lasso estimator in high-dimensional regression models under approximate sparsity conditions.
2 Setting
Let $X = (X_1, \dots, X_p)^T$ be a $p$-dimensional normally distributed random vector with mean zero and nonsingular covariance matrix $\Sigma$. For all $a, b \in \{1, \dots, p\}$ with $a \neq b$, assume that
$$X_a = X_b \beta_{a,b} + X_{-(a,b)}^T \gamma_{a,b} + \varepsilon_{a,b}$$
and
$$E[\varepsilon_{a,b} \mid X_b, X_{-(a,b)}] = 0,$$
where $X_{-(a,b)} := (X_k)_{k \notin \{a,b\}}$ and $\gamma_{a,b} \in \mathbb{R}^{p-2}$. Define the column vector $\Theta_a := \Theta e_a$ with $\Theta := \Sigma^{-1}$. One may show
$$\beta_{a,b} = -\frac{\Theta_{a,b}}{\Theta_{a,a}},$$
where $\Theta_a$ is the $a$-th column of the precision matrix [6]. Hence
$$\Theta_{a,b} = 0 \iff \beta_{a,b} = 0 \tag{2.1}$$
for all $a \neq b$. Assume that we are interested in the following set of potential edges
$$E_0 \subseteq \{(a, b) : a, b \in \{1, \dots, p\},\ a \neq b\},$$
where the number of edges $d := |E_0|$ may increase with the sample size $n$. In the following, the dependence on $n$ is omitted to simplify the notation. In order to test whether the variables $X_a$ and $X_b$ are conditionally independent, with $\Theta_{a,b} = 0$, for all $(a, b) \in E_0$, we have to estimate our target parameter
$$\beta_0 := (\beta_{a,b})_{(a,b) \in E_0}.$$
The setting above fits in the general Z-estimation problem of the form
$$E[\psi_{a,b}(W, \beta_{a,b}, \eta_{a,b})] = 0, \quad (a, b) \in E_0,$$
with nuisance parameters
$$\eta_{a,b} := (\gamma_{a,b}^T, \delta_{a,b}^T)^T,$$
where $W := X$ and $\delta_{a,b}$ denotes the coefficient vector of the linear projection of $X_b$ on $X_{-(a,b)}$. The score functions are defined by
$$\psi_{a,b}(W, \beta, \eta) := \big(X_a - X_b \beta - X_{-(a,b)}^T \gamma\big)\big(X_b - X_{-(a,b)}^T \delta\big)$$
for $(a, b) \in E_0$, $\beta \in \mathbb{R}$ and $\eta = (\gamma^T, \delta^T)^T$. Without loss of generality, we assume $a < b$ for all tuples $(a, b) \in E_0$.
Comment 2.1.
The score function is linear in $\beta$, meaning
$$\psi_{a,b}(W, \beta, \eta) = \psi^a_{a,b}(W, \eta)\,\beta + \psi^b_{a,b}(W, \eta)$$
with
$$\psi^a_{a,b}(W, \eta) = -X_b\big(X_b - X_{-(a,b)}^T \delta\big)$$
and
$$\psi^b_{a,b}(W, \eta) = \big(X_a - X_{-(a,b)}^T \gamma\big)\big(X_b - X_{-(a,b)}^T \delta\big)$$
for $(a, b) \in E_0$ and $\eta = (\gamma^T, \delta^T)^T$.
It is well known that in partially linear regression models the true parameter $(\beta_{a,b}, \eta_{a,b})$ satisfies the moment condition
$$E[\psi_{a,b}(W, \beta_{a,b}, \eta_{a,b})] = 0 \tag{2.2}$$
and also the Neyman orthogonality condition
$$\partial_\eta E[\psi_{a,b}(W, \beta_{a,b}, \eta)]\big|_{\eta = \eta_{a,b}} = 0$$
for all $\eta$ in an appropriate set, where $\partial_\eta$ denotes the derivative with respect to $\eta$. These properties are crucial for valid inference in high-dimensional settings. We will show these properties explicitly in the proof of Theorem 1.
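To illustrate the orthogonality condition, the following short calculation is a sketch under the parametrization as reconstructed above, writing $Z := X_{-(a,b)}$; the full argument appears in the proof of Theorem 1:
$$\partial_t E\big[\psi_{a,b}\big(W, \beta_{a,b}, \eta_{a,b} + t(\Delta_\gamma^T, \Delta_\delta^T)^T\big)\big]\Big|_{t=0} = -\underbrace{E\big[Z\,(X_b - Z^T \delta_{a,b})\big]^T}_{=\,0}\Delta_\gamma \;-\; \underbrace{E\big[\varepsilon_{a,b}\, Z\big]^T}_{=\,0}\Delta_\delta = 0.$$
The first expectation vanishes because $\delta_{a,b}$ is the linear projection coefficient of $X_b$ on $Z$, and the second because $\varepsilon_{a,b}$ has conditional mean zero given $(X_b, Z)$.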
3 Estimation
Let $W_i := X_i = (X_{i,1}, \dots, X_{i,p})^T$, $i = 1, \dots, n$, be i.i.d. random vectors.
At first we estimate the nuisance parameter
$$\eta_{a,b} = (\gamma_{a,b}^T, \delta_{a,b}^T)^T$$
by running a lasso/post-lasso regression of $X_a$ on the remaining variables to compute $\hat\gamma_{a,b}$ and a lasso/post-lasso regression of $X_b$ on $X_{-(a,b)}$ to compute $\hat\delta_{a,b}$ for each $(a, b) \in E_0$. The estimator $\hat\beta_{a,b}$ of the target parameter is defined as the solution of
$$\Big|\mathbb{E}_n\big[\psi_{a,b}(W, \hat\beta_{a,b}, \hat\eta_{a,b})\big]\Big| \leq \inf_{\beta \in \mathcal{B}} \Big|\mathbb{E}_n\big[\psi_{a,b}(W, \beta, \hat\eta_{a,b})\big]\Big| + \varepsilon_n, \tag{3.1}$$
where the numerical tolerance $\varepsilon_n$ is a sequence of positive constants converging to zero.
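Because the score is linear in $\beta$ (Comment 2.1), the empirical moment equation has a closed-form root once the nuisance residuals are computed. The following R sketch illustrates this for a single edge; it uses the partialling-out form of the residuals and cv.glmnet as a stand-in for the feasible lasso tuning, so it is an illustration of the procedure, not the implementation in our package.

```r
# Illustrative sketch: estimate beta_{a,b} for a single edge (a, b).
# X is an n x p data matrix; cv.glmnet stands in for the feasible lasso tuning.
library(glmnet)

estimate_edge <- function(X, a, b) {
  Z <- X[, -c(a, b), drop = FALSE]              # controls X_{-(a,b)}
  fit_a <- cv.glmnet(Z, X[, a])                 # lasso of X_a on X_{-(a,b)}
  fit_b <- cv.glmnet(Z, X[, b])                 # lasso of X_b on X_{-(a,b)}
  rho_a <- X[, a] - as.numeric(predict(fit_a, newx = Z, s = "lambda.min"))
  rho_b <- X[, b] - as.numeric(predict(fit_b, newx = Z, s = "lambda.min"))
  beta_hat <- sum(rho_a * rho_b) / sum(rho_b^2) # closed-form root of E_n[psi] = 0
  psi_hat  <- (rho_a - rho_b * beta_hat) * rho_b
  sigma2   <- mean(psi_hat^2) / mean(rho_b^2)^2 # plug-in variance estimate
  list(beta = beta_hat, sigma2 = sigma2, psi = psi_hat)
}
```

The returned score vector psi is the ingredient needed for the multiplier bootstrap in Section 4.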
Assumptions A1-A4.
Let $\delta_n \searrow 0$ be a fixed sequence of positive constants. The following assumptions hold uniformly in $n$:

A1. For all $a, b \in \{1, \dots, p\}$ with $a \neq b$ we have the following approximate sparse representations:
(i) $\gamma_{a,b} = \gamma_{a,b}^{(m)} + \gamma_{a,b}^{(r)}$ with a sparse main part, $\|\gamma_{a,b}^{(m)}\|_0 \leq s$, and a sufficiently small remainder part $\gamma_{a,b}^{(r)}$;
(ii) $\delta_{a,b} = \delta_{a,b}^{(m)} + \delta_{a,b}^{(r)}$ with $\|\delta_{a,b}^{(m)}\|_0 \leq s$ and a sufficiently small remainder part $\delta_{a,b}^{(r)}$.

A2. There exist positive numbers $c$ and $C$ such that growth conditions relating $s$, $p$ and $n$ are fulfilled.

A3. For all $(a, b) \in E_0$ the target parameter and suitable moments of the score components are bounded. Additionally, the parameter space $\mathcal{B}$ contains a ball of fixed radius centered at $\beta_{a,b}$.

A4. The correlation between the components of $X$ is restricted, and the variance of each $X_a$ is bounded from below and above.
Condition A1 is a standard approximate sparsity condition. Condition A3 restricts the parameter space and ensures that the coefficients are well behaved. The last condition A4 restricts the correlation between the components of $X$ and bounds the variance of each $X_a$ from below and above. Assumptions A1-A4 combined with the normal distribution of $X$ imply conditions 1-3 from Theorem 2, which enables us to estimate the nuisance parameters sufficiently fast.

Comment 3.1.
If we have exact sparsity $\|\Theta_a\|_0 \leq s$ for each $a \in \{1, \dots, p\}$, the sparsity of the nuisance parameters follows directly. Observe that for $a \neq b$ and $k \notin \{a, b\}$ the regression coefficients are, up to scaling, entries of the precision matrix,
$$\gamma_{a,b,k} = -\frac{\Theta_{k,a}}{\Theta_{a,a}},$$
which implies
$$\|\gamma_{a,b}\|_0 \leq \|\Theta_a\|_0 \leq s$$
and thereby the exact sparsity of the nuisance parameters.
4 Main results
We will prove that the assumptions of the corollary from Belloni et al. (2015) [1] hold, and hence we are able to use their results to construct confidence intervals even for a growing number of hypotheses $d$. Define
$$\sigma_{a,b}^2 := \big(E[\psi^a_{a,b}(W, \eta_{a,b})]\big)^{-2} E\big[\psi_{a,b}(W, \beta_{a,b}, \eta_{a,b})^2\big]$$
and the corresponding estimators $\hat\sigma_{a,b}^2$
for $(a, b) \in E_0$. To construct confidence intervals we will employ the Gaussian multiplier bootstrap. Define the estimated scores $\hat\psi_{a,b}(W_i) := \psi_{a,b}(W_i, \hat\beta_{a,b}, \hat\eta_{a,b})$ and the process
$$\hat{\mathcal{G}} := \bigg(\frac{1}{\sqrt{n}} \sum_{i=1}^n \xi_i\, \frac{\hat\psi_{a,b}(W_i)}{\hat\sigma_{a,b}\, \big|\mathbb{E}_n[\psi^a_{a,b}(W, \hat\eta_{a,b})]\big|}\bigg)_{(a,b) \in E_0},$$
where $(\xi_i)_{i=1}^n$ are independent standard normal random variables which are independent from the observations $(W_i)_{i=1}^n$. We define $c(1-\alpha)$ as the $(1-\alpha)$-conditional quantile of $\|\hat{\mathcal{G}}\|_\infty$ given the observations $(W_i)_{i=1}^n$. The following theorem is the main result of our paper and establishes simultaneous confidence bands for the target parameter $\beta_0$.

Theorem 1. Under Assumptions A1-A4,
$$P\bigg(\beta_{a,b} \in \Big[\hat\beta_{a,b} \pm c(1-\alpha)\, \frac{\hat\sigma_{a,b}}{\sqrt{n}}\Big] \text{ for all } (a, b) \in E_0\bigg) \to 1 - \alpha.$$
Using Theorem 1, we are able to construct standard confidence regions which are uniformly valid over a large set of edges, and we can test null hypotheses of the form
$$H_0 : \beta_{a,b} = 0 \quad \text{for all } (a, b) \in E_0.$$
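A minimal R sketch of the resulting simultaneous test, assuming the estimated normalized influence values for all edges in $E_0$ are collected column-wise in an $n \times d$ matrix Psi (for instance, scaled versions of the psi vectors from the sketch in Section 3):

```r
# Sketch: Gaussian multiplier bootstrap and the simultaneous test of H_0.
# Psi: n x d matrix of estimated normalized influence values (one column per
# edge); beta_hat, sigma_hat: vectors of estimates over E_0.
simultaneous_test <- function(Psi, beta_hat, sigma_hat, alpha = 0.05, B = 500) {
  n <- nrow(Psi)
  sup_stats <- replicate(B, {
    xi <- rnorm(n)                          # multipliers, independent of the data
    max(abs(colSums(xi * Psi)) / sqrt(n))   # sup-norm of the multiplier process
  })
  c_alpha <- unname(quantile(sup_stats, 1 - alpha))
  t_stat  <- max(abs(sqrt(n) * beta_hat / sigma_hat))
  list(quantile = c_alpha,
       reject   = t_stat > c_alpha,         # critical region (4.2)
       bands    = cbind(lower = beta_hat - c_alpha * sigma_hat / sqrt(n),
                        upper = beta_hat + c_alpha * sigma_hat / sqrt(n)))
}
```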
Comment 4.1.
Theorem 1 is basically an application of the Gaussian approximation and the multiplier bootstrap for maxima of sums of high-dimensional random vectors of Chernozhukov, Chetverikov and Kato (2013). The central limit theorem and bootstrap in high dimensions introduced by Chernozhukov, Chetverikov and Kato (2017) extend these results to more general sets, more precisely sparsely convex sets. Hence our main theorem can easily be generalized to various confidence regions that contain the true target parameter with probability $1 - \alpha$. Theorem 1 provides critical regions of the form
$$\max_{(a,b) \in E_0} \bigg|\frac{\sqrt{n}\, \hat\beta_{a,b}}{\hat\sigma_{a,b}}\bigg| > c(1-\alpha). \tag{4.2}$$
Alternatively, we can reject the null hypothesis if
$$\bigg(\sum_{(a,b) \in E_0}\Big(\frac{\sqrt{n}\,\hat\beta_{a,b}}{\hat\sigma_{a,b}}\Big)^2\bigg)^{1/2} > c_2(1-\alpha), \tag{4.3}$$
where $c_2(1-\alpha)$ is the corresponding bootstrap quantile. Both of these regions are based on the central limit theorem for hyperrectangles in high dimensions. The confidence region (4.3) is motivated by the fact that the standard normal distribution in high dimensions is concentrated in a thin spherical shell around the sphere of radius $\sqrt{d}$, as described by Vershynin (2017), and therefore might have smaller volume. More generally, define
for a fixed $S$ a family of statistics that partition the $d$ components into blocks of size at most $S$. A test that rejects the null hypothesis if
$$\max_{B}\bigg(\sum_{(a,b) \in B}\Big(\frac{\sqrt{n}\,\hat\beta_{a,b}}{\hat\sigma_{a,b}}\Big)^2\bigg)^{1/2} > c_S(1-\alpha), \tag{4.4}$$
where the maximum is taken over the blocks $B$, has level $\alpha$ by Chernozhukov, Chetverikov and Kato (2017), since the constructed confidence regions correspond to S-sparsely convex sets. Here, $c_S(1-\alpha)$ is the $(1-\alpha)$-conditional quantile of the corresponding multiplier bootstrap statistic given the observations $(W_i)_{i=1}^n$.
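Under the Euclidean-norm reading of (4.3) given above (our reconstruction of the display), the corresponding bootstrap quantile can be simulated analogously; the blockwise statistic of (4.4) replaces the single sum by a maximum over blocks of columns:

```r
# Sketch: bootstrap quantile for the Euclidean-norm statistic of (4.3).
bootstrap_quantile_l2 <- function(Psi, alpha = 0.05, B = 500) {
  n <- nrow(Psi)
  stats <- replicate(B, {
    xi <- rnorm(n)
    sqrt(sum((colSums(xi * Psi) / sqrt(n))^2))  # l2-norm of the multiplier process
  })
  unname(quantile(stats, 1 - alpha))
}
```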
5 The function GGMtest

The function GGMtest, which will be added to the R package, estimates the target coefficients $\beta_{a,b}$ corresponding to the considered set of potential edges $E_0$ by the proposed method described in Section 3. It can be used to perform hypothesis tests with asymptotic level $\alpha$ based on the different confidence regions described in Comment 4.1. The nuisance functions can be estimated by lasso, post-lasso or square-root lasso. The verification of uniform convergence rates of the square-root lasso estimator for functional parameters in high-dimensional settings is an interesting problem that we plan to address in future work.
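For orientation, a hypothetical call could look as follows; the argument names (data, edges, alpha, nuisance) and the returned fields are illustrative assumptions, not the documented interface of the function.

```r
# Hypothetical usage sketch -- argument and field names are assumptions,
# not the documented API of GGMtest.
X  <- matrix(rnorm(100 * 20), nrow = 100)   # n = 100 observations, p = 20
E0 <- cbind(1:19, 20)                       # edges (a, p) as in Example 1 below
res <- GGMtest(data = X, edges = E0, alpha = 0.05, nuisance = "lasso")
res$beta     # estimated target coefficients beta_{a,b}
res$reject   # decision of the simultaneous test
```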
5.1 Cross-fitting
In general Z-estimation problems where a so-called debiased or double machine learning (DML) method is used to construct confidence intervals, it is common to use cross-fitting in order to improve small-sample properties. A detailed discussion of cross-fitted DML can be found in Chernozhukov et al. (2017) [5]. The following algorithm generalizes our proposed method to a $K$-fold cross-fitted version. We assume that $n$ is divisible by $K$ in order to simplify notation.
Algorithm 1.
1) Take a $K$-fold random partition $(I_k)_{k=1}^K$ of the observation indices $\{1, \dots, n\}$ such that the size of each fold $I_k$ is $n/K$. Also, for each $k \in \{1, \dots, K\}$, define $I_k^c := \{1, \dots, n\} \setminus I_k$.
2) For each $k$ and each $(a, b) \in E_0$, construct an estimator
$$\hat\eta_{a,b,k} := \hat\eta_{a,b}\big((W_i)_{i \in I_k^c}\big)$$
by lasso, post-lasso or square-root lasso.
3) For each $k$, construct an estimator $\check\beta_{a,b,k}$ as in (3.1), with the empirical expectation taken over the fold $I_k$ and the nuisance estimator $\hat\eta_{a,b,k}$.
4) Aggregate the estimators:
$$\tilde\beta_{a,b} := \frac{1}{K} \sum_{k=1}^K \check\beta_{a,b,k}.$$
5) For $\alpha \in (0, 1)$, construct the uniformly valid confidence interval
$$\Big[\tilde\beta_{a,b} \pm \tilde c(1-\alpha)\, \frac{\hat\sigma_{a,b}}{\sqrt{n}}\Big],$$
where $\tilde c(1-\alpha)$ is the bootstrap quantile of the cross-fitted analogue of the multiplier process $\hat{\mathcal{G}}$, with the scores evaluated at the fold-wise estimators and $(\xi_i)_{i=1}^n$ independent standard normal random variables which are independent from the observations $(W_i)_{i=1}^n$.
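A compact R sketch of Algorithm 1 for a single edge, under the same illustrative choices as in the sketch of Section 3 (cv.glmnet for the nuisance regressions; the bootstrap step is omitted):

```r
# Sketch of Algorithm 1: K-fold cross-fitting for a single edge (a, b).
library(glmnet)

cross_fit_edge <- function(X, a, b, K = 2) {
  n <- nrow(X)
  Z <- X[, -c(a, b), drop = FALSE]
  folds <- split(sample(seq_len(n)), rep(seq_len(K), length.out = n))  # step 1
  betas <- sapply(folds, function(I) {
    Ic <- setdiff(seq_len(n), I)             # step 2: fit nuisances on I_k^c
    fit_a <- cv.glmnet(Z[Ic, ], X[Ic, a])
    fit_b <- cv.glmnet(Z[Ic, ], X[Ic, b])
    rho_a <- X[I, a] - as.numeric(predict(fit_a, newx = Z[I, ], s = "lambda.min"))
    rho_b <- X[I, b] - as.numeric(predict(fit_b, newx = Z[I, ], s = "lambda.min"))
    sum(rho_a * rho_b) / sum(rho_b^2)        # step 3: fold-wise root of (3.1)
  })
  mean(betas)                                # step 4: aggregate over the folds
}
```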
6 Simulation Study
This section provides a simulation study of the proposed method. In each example the precision matrix of the Gaussian graphical model is generated as in the R package [17]. Hence, the corresponding adjacency matrix $A$ is generated by setting the non-zero off-diagonal elements to one and each other element to zero. To obtain a positive definite pre-version of the precision matrix we set
$$\tilde\Omega := v \cdot A + \big(|\lambda_{\min}(v \cdot A)| + 0.1 + u\big)\, I_p.$$
Here $v$ and $u$ are chosen to control the magnitude of the partial correlations. The covariance matrix $\Sigma$ is generated by inverting $\tilde\Omega$ and scaling the variances to one. The corresponding precision matrix is given by $\Theta = \Sigma^{-1}$. For some given $n$ we generate $n$ independent samples of $X \sim N_p(0, \Sigma)$.
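The construction can be sketched as follows; the magnitudes v = 0.3 and u = 0.1 are illustrative default choices, not the values used in our study:

```r
# Sketch of the precision-matrix construction from the adjacency matrix A;
# v and u control the magnitude of the partial correlations (values here
# are illustrative).
make_model <- function(A, v = 0.3, u = 0.1) {
  Omega_pre <- v * A
  diag(Omega_pre) <- abs(min(eigen(Omega_pre)$values)) + 0.1 + u  # positive definite
  Sigma <- cov2cor(solve(Omega_pre))   # invert, then scale the variances to one
  list(Sigma = Sigma, Theta = solve(Sigma))
}
```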
We then evaluate whether our test statistic rejects the null hypothesis for a specific set of edges $E_0$ which satisfies the null hypothesis. Finally, the acceptance rate is calculated over $N$ independent simulations for a given confidence level $1 - \alpha$.

6.1 Simulation settings
In our simulation study we estimate the correlation structure in four different designs, which are described in the following.
6.1.1 Example 1: Random Graph
Each pair of off-diagonal elements of the covariance matrix of the first $p - 1$ regressors is randomly set to be non-zero with a fixed probability $\pi$. The last regressor is added as an independent random variable. This results in about $\binom{p-1}{2}\pi$ edges in the graph. The corresponding precision matrix is of the form
$$\Theta = \begin{pmatrix} \tilde\Theta & 0 \\ 0 & \theta_{p,p} \end{pmatrix},$$
where $\tilde\Theta$ is a sparse $(p-1) \times (p-1)$ matrix. We test the hypothesis whether the last regressor is independent from all other regressors, corresponding to
$$E_0 = \{(a, p) : a = 1, \dots, p-1\}.$$
6.1.2 Example 2: Cluster Graph
The regressors are evenly partitioned into disjoint groups. Each pair of off-diagonal elements $(a, b)$ is set to be non-zero with a fixed probability if both $X_a$ and $X_b$ belong to the same group, so that edges arise only within groups. The precision matrix is of the form
$$\Theta = \begin{pmatrix} \Theta_1 & & \\ & \ddots & \\ & & \Theta_G \end{pmatrix},$$
where each block $\Theta_g$ is a sparse matrix. We test the hypothesis that the first two hubs are conditionally independent. This corresponds to testing the tuples
$$E_0 = \{(a, b) : X_a \text{ belongs to the first group and } X_b \text{ to the second group}\};$$
note that the values of $d$ in the tables below equal the product of the sizes of the first two groups.
6.1.3 Example 3: Approximately Sparse Random Graph
In this example we generate a random graph structure as in Example 1, but instead of setting the other elements of the adjacency matrix to zero we generate independent random entries from a uniform distribution on a small interval. This results in a precision matrix of the form
$$\Theta = \begin{pmatrix} \tilde\Theta & 0 \\ 0 & \theta_{p,p} \end{pmatrix},$$
where $\tilde\Theta$ is not a sparse matrix anymore. We then again test the hypothesis whether the last regressor is independent from all other regressors, corresponding to
$$E_0 = \{(a, p) : a = 1, \dots, p-1\}.$$
6.1.4 Example 4: Independent Graph
By setting $\Theta = I_p$ we generate samples of $p$ independent standard normally distributed random variables. We can test the hypothesis whether the regressors are mutually independent by choosing
$$E_0 = \{(a, b) : 1 \leq a < b \leq p\}.$$
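The corresponding edge set consists of all unordered pairs; its size $\binom{p}{2}$ matches the column $d$ in the tables below (e.g. $p = 10$ gives $d = 45$):

```r
# Edge set of Example 4: all unordered pairs (a, b) with a < b.
p  <- 10
E0 <- t(combn(p, 2))   # d x 2 matrix of tuples; here d = choose(10, 2) = 45
```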
6.2 Simulation results
We provide simulated acceptance rates for all of the examples above, calculated by the function GGMtest. Confidence Interval I corresponds to the standard case in (4.2), whereas Confidence Interval II is based on the approximation of the sphere in (4.3). In summary, the results reveal that the empirical acceptance rate is, on average, close to the nominal level over all simulations, and Confidence Interval II performs significantly better than Confidence Interval I in terms of the mean absolute deviation from the nominal level. More complex S-sparsely convex sets seem to result in better acceptance rates, whereas higher exponents do not improve the rates. The lowest mean absolute deviation is achieved in Table 2 without cross-fitting.
Confidence Interval I | Confidence Interval II | |||||||
---|---|---|---|---|---|---|---|---|
Model | p | d | lasso | post-lasso | sqrt-lasso | lasso | post-lasso | sqrt-lasso |
random | 20 | 19 | 0.929 | 0.936 | 0.931 | 0.923 | 0.931 | 0.930 |
50 | 49 | 0.909 | 0.914 | 0.909 | 0.920 | 0.918 | 0.928 | |
100 | 99 | 0.915 | 0.918 | 0.915 | 0.924 | 0.926 | 0.926 | |
cluster | 20 | 25 | 0.909 | 0.940 | 0.913 | 0.911 | 0.932 | 0.915 |
40 | 100 | 0.914 | 0.918 | 0.918 | 0.930 | 0.938 | 0.940 | |
60 | 225 | 0.895 | 0.893 | 0.899 | 0.917 | 0.922 | 0.925 | |
approx | 20 | 19 | 0.929 | 0.929 | 0.929 | 0.942 | 0.942 | 0.942 |
50 | 49 | 0.906 | 0.906 | 0.906 | 0.919 | 0.919 | 0.919 | |
100 | 99 | 0.897 | 0.897 | 0.897 | 0.933 | 0.933 | 0.933 | |
independent | 5 | 10 | 0.929 | 0.929 | 0.929 | 0.929 | 0.929 | 0.929 |
10 | 45 | 0.926 | 0.926 | 0.926 | 0.931 | 0.931 | 0.931 | |
20 | 190 | 0.899 | 0.899 | 0.899 | 0.919 | 0.919 | 0.919 |
Confidence Interval I | Confidence Interval II | |||||||
---|---|---|---|---|---|---|---|---|
Model | p | d | lasso | post-lasso | sqrt-lasso | lasso | post-lasso | sqrt-lasso |
random | 20 | 19 | 0.956 | 0.935 | 0.951 | 0.956 | 0.935 | 0.950 |
50 | 49 | 0.911 | 0.927 | 0.922 | 0.904 | 0.939 | 0.925 | |
100 | 99 | 0.924 | 0.912 | 0.921 | 0.922 | 0.920 | 0.922 | |
cluster | 20 | 25 | 0.965 | 0.954 | 0.966 | 0.938 | 0.942 | 0.941 |
40 | 100 | 0.935 | 0.941 | 0.947 | 0.932 | 0.939 | 0.944 | |
60 | 225 | 0.929 | 0.930 | 0.935 | 0.937 | 0.941 | 0.946 | |
approx | 20 | 19 | 0.964 | 0.964 | 0.964 | 0.954 | 0.954 | 0.954 |
50 | 49 | 0.946 | 0.945 | 0.946 | 0.945 | 0.945 | 0.945 | |
100 | 99 | 0.909 | 0.909 | 0.909 | 0.937 | 0.937 | 0.937 | |
independent | 5 | 10 | 0.947 | 0.947 | 0.947 | 0.957 | 0.957 | 0.957 |
10 | 45 | 0.932 | 0.932 | 0.932 | 0.934 | 0.934 | 0.934 | |
20 | 190 | 0.934 | 0.934 | 0.934 | 0.941 | 0.941 | 0.941 |
Confidence Interval I | Confidence Interval II | |||||||
---|---|---|---|---|---|---|---|---|
Model | p | d | lasso | post-lasso | sqrt-lasso | lasso | post-lasso | sqrt-lasso |
random | 20 | 19 | 0.919 | 0.916 | 0.918 | 0.914 | 0.919 | 0.926 |
50 | 49 | 0.901 | 0.916 | 0.907 | 0.908 | 0.923 | 0.928 | |
100 | 99 | 0.920 | 0.912 | 0.916 | 0.920 | 0.920 | 0.926 | |
cluster | 20 | 25 | 0.918 | 0.936 | 0.922 | 0.906 | 0.919 | 0.907 |
40 | 100 | 0.908 | 0.932 | 0.923 | 0.916 | 0.933 | 0.936 | |
60 | 225 | 0.895 | 0.896 | 0.897 | 0.909 | 0.928 | 0.925 | |
approx | 20 | 19 | 0.925 | 0.925 | 0.925 | 0.939 | 0.936 | 0.939 |
50 | 49 | 0.903 | 0.904 | 0.906 | 0.926 | 0.919 | 0.922 | |
100 | 99 | 0.907 | 0.901 | 0.909 | 0.925 | 0.924 | 0.923 | |
independent | 5 | 10 | 0.930 | 0.930 | 0.930 | 0.931 | 0.931 | 0.931 |
10 | 45 | 0.923 | 0.920 | 0.922 | 0.946 | 0.945 | 0.946 | |
20 | 190 | 0.896 | 0.893 | 0.894 | 0.922 | 0.920 | 0.919 |
Confidence Interval I | Confidence Interval II | |||||||
---|---|---|---|---|---|---|---|---|
Model | p | d | lasso | post-lasso | sqrt-lasso | lasso | post-lasso | sqrt-lasso |
random | 20 | 19 | 0.927 | 0.917 | 0.923 | 0.945 | 0.939 | 0.937 |
50 | 49 | 0.912 | 0.905 | 0.918 | 0.935 | 0.928 | 0.930 | |
100 | 99 | 0.902 | 0.901 | 0.903 | 0.918 | 0.930 | 0.925 | |
cluster | 20 | 25 | 0.921 | 0.921 | 0.926 | 0.927 | 0.932 | 0.927 |
40 | 100 | 0.904 | 0.899 | 0.907 | 0.926 | 0.923 | 0.918 | |
60 | 225 | 0.888 | 0.884 | 0.885 | 0.932 | 0.922 | 0.928 | |
approx | 20 | 19 | 0.915 | 0.918 | 0.917 | 0.937 | 0.930 | 0.932 |
50 | 49 | 0.891 | 0.898 | 0.897 | 0.931 | 0.931 | 0.932 | |
100 | 99 | 0.894 | 0.894 | 0.892 | 0.931 | 0.933 | 0.932 | |
independent | 5 | 10 | 0.920 | 0.919 | 0.916 | 0.931 | 0.929 | 0.929 |
10 | 45 | 0.913 | 0.911 | 0.910 | 0.929 | 0.930 | 0.931 | |
20 | 190 | 0.891 | 0.896 | 0.896 | 0.922 | 0.918 | 0.917 |
Confidence Interval I | Confidence Interval II | |||||||
---|---|---|---|---|---|---|---|---|
Model | p | d | lasso | post-lasso | sqrt-lasso | lasso | post-lasso | sqrt-lasso |
random | 20 | 19 | 0.946 | 0.919 | 0.940 | 0.945 | 0.931 | 0.943 |
50 | 49 | 0.922 | 0.931 | 0.938 | 0.917 | 0.938 | 0.935 | |
100 | 99 | 0.929 | 0.918 | 0.920 | 0.934 | 0.943 | 0.934 | |
cluster | 20 | 25 | 0.951 | 0.950 | 0.962 | 0.927 | 0.936 | 0.939 |
40 | 100 | 0.924 | 0.920 | 0.936 | 0.925 | 0.929 | 0.943 | |
60 | 225 | 0.919 | 0.917 | 0.927 | 0.939 | 0.937 | 0.940 | |
approx | 20 | 19 | 0.955 | 0.955 | 0.951 | 0.960 | 0.960 | 0.958 |
50 | 49 | 0.917 | 0.917 | 0.923 | 0.932 | 0.932 | 0.930 | |
100 | 99 | 0.932 | 0.932 | 0.934 | 0.945 | 0.945 | 0.942 | |
independent | 5 | 10 | 0.935 | 0.935 | 0.936 | 0.950 | 0.950 | 0.951 |
10 | 45 | 0.926 | 0.926 | 0.926 | 0.929 | 0.929 | 0.928 | |
20 | 190 | 0.925 | 0.925 | 0.926 | 0.939 | 0.939 | 0.940 |
Confidence Interval I | Confidence Interval II | |||||||
---|---|---|---|---|---|---|---|---|
Model | p | d | lasso | post-lasso | sqrt-lasso | lasso | post-lasso | sqrt-lasso |
random | 20 | 19 | 0.909 | 0.906 | 0.911 | 0.937 | 0.923 | 0.937 |
50 | 49 | 0.894 | 0.897 | 0.901 | 0.898 | 0.927 | 0.932 | |
100 | 99 | 0.895 | 0.884 | 0.895 | 0.928 | 0.924 | 0.930 | |
cluster | 20 | 25 | 0.905 | 0.892 | 0.902 | 0.920 | 0.899 | 0.898 |
40 | 100 | 0.904 | 0.898 | 0.924 | 0.909 | 0.927 | 0.935 | |
60 | 225 | 0.887 | 0.858 | 0.875 | 0.920 | 0.900 | 0.916 | |
approx | 20 | 19 | 0.919 | 0.915 | 0.918 | 0.932 | 0.928 | 0.933 |
50 | 49 | 0.907 | 0.907 | 0.908 | 0.930 | 0.932 | 0.930 | |
100 | 99 | 0.890 | 0.891 | 0.892 | 0.921 | 0.928 | 0.921 | |
independent | 5 | 10 | 0.925 | 0.927 | 0.926 | 0.939 | 0.939 | 0.938 |
10 | 45 | 0.910 | 0.910 | 0.908 | 0.924 | 0.924 | 0.925 | |
20 | 190 | 0.880 | 0.880 | 0.876 | 0.915 | 0.913 | 0.914 |
Appendix A Proof of Theorem 1
Proof.
We want to use the corollary from Belloni et al. (2015) [1]. Consequently, we will show that their Assumptions 2.1-2.4 and the growth conditions of the corollary hold by modifying the proof of the corresponding corollary in [1]. Let $T$ be an arbitrary set in the nuisance parameter space. We have the required moment bounds due to Assumptions A3 and A4. Define the convex set
and endow it with the norm
Further define the nuisance realization set
for a sufficiently large constant $C$. First we verify Assumption 2.1 (i). The moment condition holds since
In addition, we obtain the corresponding bound. By the same arguments as at the beginning of the proof of Theorem 2, we conclude that the envelope of the score class fulfills
Using Lemma O.2 (Maximal Inequality I) in [1], we have
by Assumption A2. Hence, Assumption A3 implies that the parameter space contains an interval of fixed radius centered at $\beta_{a,b}$ for all sufficiently large $n$. Assumption 2.1 (i) follows.
For all $(a, b) \in E_0$, the map $(\beta, \eta) \mapsto \psi_{a,b}(W, \beta, \eta)$ is twice continuously Gateaux-differentiable, and so is the map $(\beta, \eta) \mapsto E[\psi_{a,b}(W, \beta, \eta)]$. Further, the corresponding derivative bounds hold. Therefore, Assumptions 2.1 (ii) and 2.1 (iii) hold. Remark that