Multi-task Learning for Gaussian Graphical Regressions with High Dimensional Covariates

by   Jingfei Zhang, et al.

Gaussian graphical regression is a powerful means that regresses the precision matrix of a Gaussian graphical model on covariates, permitting the numbers of the response variables and covariates to far exceed the sample size. Model fitting is typically carried out via separate node-wise lasso regressions, ignoring the network-induced structure among these regressions. Consequently, the error rate is high, especially when the number of nodes is large. We propose a multi-task learning estimator for fitting Gaussian graphical regression models; we design a cross-task group sparsity penalty and a within task element-wise sparsity penalty, which govern the sparsity of active covariates and their effects on the graph, respectively. For computation, we consider an efficient augmented Lagrangian algorithm, which solves subproblems with a semi-smooth Newton method. For theory, we show that the error rate of the multi-task learning based estimates has much improvement over that of the separate node-wise lasso estimates, because the cross-task penalty borrows information across tasks. To address the main challenge that the tasks are entangled in a complicated correlation structure, we establish a new tail probability bound for correlated heavy-tailed (sub-exponential) variables with an arbitrary correlation structure, a useful theoretical result in its own right. Finally, the utility of our method is demonstrated through simulations as well as an application to a gene co-expression network study with brain cancer patients.


page 1

page 2

page 3

page 4


Gaussian Graphical Regression Models with High Dimensional Responses and Covariates

Though Gaussian graphical models have been widely used in many scientifi...

Multivariate Gaussian Network Structure Learning

We consider a graphical model where a multivariate normal vector is asso...

Multi-task Learning for Compositional Data via Sparse Network Lasso

A network lasso enables us to construct a model for each sample, which i...

Noise Covariance Estimation in Multi-Task High-dimensional Linear Models

This paper studies the multi-task high-dimensional linear regression mod...

ADMM Algorithm for Graphical Lasso with an ℓ_∞ Element-wise Norm Constraint

We consider the problem of Graphical lasso with an additional ℓ_∞ elemen...

Distributed Multitask Learning

We consider the problem of distributed multi-task learning, where each m...

Structured Input-Output Lasso, with Application to eQTL Mapping, and a Thresholding Algorithm for Fast Estimation

We consider the problem of learning a high-dimensional multi-task regres...

1 Introduction

Gaussian graphical models are an effective tool for inferring the dependence among variables of interest, such as the co-expression patterns among genes (e.g., Cai et al., 2012; Chen et al., 2016) and functional connectivity between brain regions (e.g., Zhang et al., 2019), because precision matrices for multivariate Gaussian variables can be interpreted as partial correlations (Peng et al., 2009). What has been overlooked is that graphical structures may depend on external covariates, which can be high dimensional. For example, genetic variants, clinical and environmental factors, may affect both the expression levels of individual genes and the co-expression relationships among genes (Wang et al., 2012, 2013; Brynedal et al., 2017). Genetic variants that alter co-expression relationships are referred to as co-expression quantitative trait loci (QTLs), and identifying them is of keen scientific interest (Wang et al., 2012, 2013; van der Wijst et al., 2018). Identifying co-expression QTLs among many SNPs can be formulated as a problem that relates graphical models to high dimensional external covariates, and a fundamental interest is to ascertain how the covariates modulate the individual-level graphical structures, and recover both the population- and subject-level graphs.

Though much progress has been made for developing graphical models, less has been done for covariate-dependent graphical models. Several works (e.g., Yin and Li, 2011; Li et al., 2012; Cai et al., 2012; Chen et al., 2016) considered covariate-dependent Gaussian graphical models, wherein the mean of the nodes depends on covariates, while the network structure is the same across all of the subjects. Guo et al. (2011) and Danaher et al. (2014) estimated stratified graphical models by preserving the common structure among them. Liu et al. (2010) partitioned the covariate space into several subspaces and fitted separate Gaussian graphical models for each subspace. Kolar et al. (2010) nonparametrically estimated the dependence of a covariance matrix on one continuous covariate. Cheng et al. (2014) fitted a conditional Ising model for binary data. Ni et al. (2019) proposed a DAG model that allows the graph structure to vary with a small number of covariates, and assumed a hierarchical ordering of the nodes. None of the cited works, however, can handle high dimensional covariates. Notably, Zhang and Li (2022) proposed a Gaussian graphical regression framework that regresses the precision matrix of a graphical model on high dimensional covariates and estimates the parameters using penalized node-wise regressions. However, Zhang and Li (2022) utilized separate node-wise regressions, ignoring the network-induced common structure among these regressions. Consequently, the obtained error rate is high, especially with a large number of nodes.

This work addresses these issues by making several methodological and theoretical contributions.

  • We propose to adapt multi-task learning to fit Gaussian graphical regression models. Specifically, we design a cross-task group sparsity penalty which enables us to borrow information across different tasks. As the number of covariates, , can be much larger than , it is reasonable to assume the active covariates (i.e., those with nonzero effects on the graph) are sparse, which can also be accommodated by the specified cross-task group sparsity penalty; see Figure 1. We combine it with an element-wise sparsity to address the possible sparse effects of active covariates and further reduce the model complexity.

  • To optimize the cross-task objective function with the combined sparsity penalty, we adapt an efficient augmented Lagrangian algorithm, which solves subproblems with a semi-smooth Newton method.

  • To address the main challenge that the regression tasks are entangled in a general correlation structure and that the combined sparsity penalty is not decomposable, we establish a new and sharp tail probability bound for correlated heavy-tailed (sub-exponential) variables with an arbitrary correlation structure, which is meritorious even on its own.

  • Finally, we prove that, compared to Zhang and Li (2022), the error rate of the simultaneously estimated precision parameters improves by a factor of , the number of response variables. The improvement is remarkable for a large , as further corroborated in simulation studies. Code is available on GitHub, and detailed proofs are given in the Supplementary Material.

2 Multi-task Learning for Gaussian Graphical Regressions

Denote by

the vector of response variables (e.g., expression levels of

genes) and the vector of covariates (e.g., age, sex and genetic variants), and assume that

where is the conditional covariance, and is the conditional precision matrix linked to via


where , , . We assume to be free of , and this is discussed after (2). Writing , some straightforward algebra shows that (1) relates to the following regression, termed Gaussian graphical regression (Zhang and Li, 2022):


where , is independent of and , for all and . It is easy to see that, when for all , (2) reduces to the usual Gaussian graphical model with .

Figure 1: An illustration of multi-task learning in Gaussian graphical regression.

Model (2) provides a regression framework for estimating the precision parameters in (1), by including the interactions between and . Correspondingly, the partial correlation between and , conditional on all other variables, is modeled as a function of ; see an illustration in Figure 1. The diagonal elements of (i.e.,

’s) are connected to the residual variances in (

2), that is, . From this perspective, assuming to be free of may be viewed as assuming the residual variance of , after removing the effects of , and their interactions, not to depend on , which is plausible in the context of regression.

Let be the vector of coefficients in (2) and denote . To expose the key ideas, we assume a known in the ensuing development, and focus on the estimation of ; extensions with unknown are possible, but with more involved notation. With independent data , let and , where denotes the Kronecker product. To estimate , we consider


where with and are tuning parameters. The convex regularizing terms, and , encourage group- and element-wise sparsity, respectively, though the group sparse penalty is not applied to (i.e., intercept terms). The combined sparsity penalty in (3) is termed the sparse group lasso penalty (Simon et al., 2013). Distinguishing from the node-wise estimation (Zhang and Li, 2022), (3) considers graphical regressions simultaneously, with the group lasso penalty regulating (the effects from the th covariate) cross tasks; see Figure 1. Thus, (3) is termed multi-task learning for a Gaussian graphical regression model.

3 An Augmented Lagrangian Algorithm with a Semi-smooth Newton Method

By specifying a convex objective function, (3) facilitates use of existing algorithms for solving sparse group lasso problems, including proximal gradient descent (Chen et al., 2012), block coordinate descent (Simon et al., 2013; Wang et al., 2019) and alternating direction method of multipliers (ADMM) (Boyd et al., 2011). We propose to use an efficient augmented Lagrangian algorithm which uses a semi-smooth Newton method to solve subproblems. We opt for the semi-smooth Newton method because it exploits the second order sparsity structure in our setting (Zhang et al., 2020) and enjoys a linear rate of convergence.

More specifically, write , where , and . Correspondingly, (3) can be written as

and its dual problem can be written as:

where is the conjugate function of , i.e., . Define the proximal mapping of to be

Similar to ADMM, the augmented Lagrangian of the dual problem is given by

where , and the subproblem relates to iteratively minimizing

where is the updated estimate and is given at the th iteration; see the detailed steps below.

Let be given, and choose . Iterate the following steps for until convergence.
Step 1: compute and ;
Step 2: compute ;
Step 3: update so that .
Algorithm 1 An augmented Lagrangian method for solving (3)

Step 1 solves the subproblem with a semi-smooth Newton method, utilizing a generalized Jacobian of given in Zhang et al. (2020). In general, the algorithm outperforms the first-ordered ADMM algorithm in computational efficiency and estimation accuracy, because it accounts for the second order sparsity structure when solving (Li et al., 2017; Zhang et al., 2020).

Positive-definiteness. A natural sufficient condition to ensure positive definiteness of is diagonal dominance, leading to . With Assumption 1 stipulating , positive definiteness holds when , . Assuming (if not, rescale first), this sufficient condition can be simplified to for all (note that is sparse), suggesting that, to satisfy diagonal dominance, the “effect sizes" of (i.e., ) on partial correlations cannot be too large. If the true covariance/precision matrix is positive definite, Theorem 2 implies that the estimated precision matrix is asymptotically positive definite. For finite sample cases, a post-hoc rescaling procedure can be utilized for securing positive definiteness.

Tuning. The and in (3) can be jointly selected via -fold (e.g., ) cross validation. To facilitate parallelization of the algorithm, we consider a hierarchical selection procedure. Rewrite and , where reflects the weight of the lasso penalty relative to the group lasso penalty (e.g., and correspond to group lasso and lasso, respectively) and reflects the total amount of regularization. We assess a set of values for ; for each , a grid of values are considered in cross validation. The search can be parallelized for different ’s.

4 Theory: Concentration Inequality and Error Rate for Multi-task Learning

We derive the non-asymptotic error rate of the sparse group lasso estimator from (3). The main challenge is that the tasks in (3) are correlated, and this differs from the usual multi-task learning with group sparsity (Lounici et al., 2011). To tackle this challenge, we have made a key advance in Theorem 1 that gives a new tail bound for the sum of correlated heavy-tailed (sub-exponential) variables with an arbitrary correlation structure.

Our theoretical investigation faces other challenges. First, because the design matrix in (3) includes high-dimensional interactions between and , and the variance of is a function of

, characterizing the joint distribution of each row in

is difficult and requires a delicate treatment. Second, as the combined penalty term is not decomposable, the classic techniques for decomposable regularizers and null space properties (Negahban et al., 2012) are not applicable. By utilizing a novel concentration inequality for the sum of correlated variables in (3), we derive two interrelated bounds for the stochastic term, whose combination yields a sharp upper bound of the stochastic term. We, therefore, show that our proposed estimator possesses an improved error bounds compared to the lasso and the group lasso when the true coefficients are simultaneously sparse, and, more importantly, the error rate of the multi-task estimates improves by a factor of , compared to the separate node-wise estimates obtained by Zhang and Li (2022).

4.1 A new concentration inequality for the sum of correlated variables

The concentration results for correlated random variables are often derived under specific correlation structures, such as weak dependence

(Merlevède et al., 2011) and asymptotic independence (Ko and Tang, 2008). These structures are unlikely to be applicable to our setting because the error terms in (3), (from the tasks), depend on, for example, the expressions of genes, and are correlated via a complicated co-expression network. To bound , we employ a novel idea that partitions the index set of response variables into mutually exclusive subsets such that the variables within each subset are independent. This can be done by exploring the topology of a graph and solving a vertex coloring problem (Lewis, 2015); see the proof of Theorem 1 in the Supplementary Material. With that, we present a concentration inequality result under a general correlation structure.

Theorem 1.

Consider correlated mean zero sub-exponential random variables , and an induced network with a node set and an edge set . Denote the maximum node degree of by and let . For any and a constant , it holds that

where is the sub-exponential norm defined in the Supplementary Material. The results are inclusive. For example, when are mutually independent, we have , and the inequality reduces to the usual form for independent sub-exponential random variables; if the variables are correlated with , we obtain a tail probability in the same order as when the variables are independent. The theorem leads to a corollary that gives a sharp bound on the sum of correlated chi-squared random variables, which is critical for our proof.

Corollary 1.

Consider correlated variables , and an induced network defined as in Theorem 1. Denote the maximum node degree of by and let . For any , it holds that

4.2 Error rate analysis

Let be the element-wise support set of and be the group-wise support set of , and denote by and , i.e., and are the numbers of nonzero entries and nonzero groups, respectively. Without loss of generality, we assume . We state the needed regularity conditions.

Assumption 1.

Suppose are i.i.d. mean zero random vectors with a covariance matrix satisfying for some constant . Moreover, there exists a constant such that for all and .

Assumption 2.

Suppose for some constants .

Assumption 3.

The dimensions and sparsity satisfy and for . The maximum column norm of is bounded above by a positive constant .

Assumption 1 stipulates that the covariates are element-wise bounded, which is needed in characterizing the joint distribution of each row in . This condition is not restrictive as genetic variants are often coded to be or (Chen et al., 2016). Assumptions 1 and 2

impose bounded eigenvalues on

and as commonly done in the high-dimensional regression literature (Chen et al., 2016; Hao et al., 2018; Cai et al., 2019). Assumption 3 is a sparsity condition, allowing both and to grow at a polynomial order of . Moreover, the number of nonzero entries can grow with . This condition and are useful when establishing a restricted eigenvalue condition (Bickel et al., 2009) for and when bounding the stochastic term . The condition on the column norm of implies that, for example, each gene is only connected to a finite number of genes.

Theorem 2.

Let denote the number of nonzero entries in a candidate model such that . Suppose Assumptions 1-3 hold, and for some constant . Then the sparse group lasso estimator in (3) with


satisfies, with probability at least ,


where , , and are positive constants.

The separate node-wise regressions considered in Zhang and Li (2022) yield an error rate of , which is slower than that in (5) by a factor if ; as often far exceeds , the improvement with multi-task learning is considerable.

Theorem 2 also shows that our proposed estimator enjoys an improved error bound over estimators with only a lasso or a group lasso penalty on . Specifically, given that the dimension of is and , applying the regular lasso regularizer alone would yield an error bound of (Negahban et al., 2012), which is slower than that in (5) when and , corresponding to group sparsity. Moreover, when , estimating with the group lasso regularizer alone, which excludes , is not feasible, because the dimension of the latter (i.e., ) exceeds . If we utilize a group lasso regularizer that includes , the estimator would have an error bound of (Lounici et al., 2011), which is slower than that in (5) when and , corresponding to within-group sparsity. In Theorem 2, the condition upper bounds the size of candidate models, which helps to bound the stochastic errors from tasks.

5 Numerical Experiments

We compare the finite sample performance of our proposed method, defined in (3) (referred to as “MtRegGMM"), with those of three competing solutions, namely, a benchmarking i.i.d Gaussian graphical model estimated by the neighborhood selection method (Meinshausen and Bühlmann, 2006) (“IID"), a lasso estimator (Joint), and the separate regressions considered in Zhang and Li (2022) (“RegGMM").

We simulate samples from (1), with (e.g., genes) and external covariate (e.g., SNPs). The ’s are generated i.i.d. from Bernoulli. We randomly select 3 covariates to have nonzero effects, and in the graphs for these covariates and the population-level graph, five edges are randomly selected to be nonzero. We set for . The initial nonzero coefficients are set to 0.3. For each , we rescale by dividing each entry by . After rescaling, for each and , we use the average of and to fill the entries at and . This process results in symmetry with diagonal dominance and, thus, ensures positive definiteness of the precision matrices. For each simulation configuration, we generate 50 independent data sets; given , we determine and , and generate the th sample from , . For a fair comparison, tuning parameters in all of the methods are selected via 5-fold cross validation.

, Method TPR FPR Error of Error of
MtRegGMM 0.985 (0.013) 0.0001 (0.0000) 0.534 (0.032) 0.405 (0.046)
Joint 0.942 (0.015) 0.0003 (0.0000) 0.762 (0.033) 0.712 (0.067)
RegGMM 0.922 (0.020) 0.0013 (0.0001) 1.146 (0.045) 1.984 (0.153)
IID - - - 1.764 (0.034)
MtRegGMM 0.983 (0.021) 0.0001 (0.0000) 0.549 (0.037) 0.436 (0.060)
Joint 0.918 (0.030) 0.0002 (0.0000) 0.848 (0.040) 0.912 (0.088)
RegGMM 0.923 (0.016) 0.0009 (0.0001) 1.207 (0.036) 2.223 (0.199)
IID - - - 1.777 (0.026)
MtRegGMM 0.964 (0.013) 0.0000 (0.0000) 0.612 (0.034) 0.489 (0.055)
Joint 0.896 (0.020) 0.0000 (0.0000) 0.846 (0.038) 0.786 (0.069)
RegGMM 0.918 (0.020) 0.0003 (0.0000) 2.620 (0.060) 5.845 (0.329)
IID - - - 5.119 (0.890)
Table 1: Estimation accuracy of and in simulations with varying sample size network size and covariate dimension .
Figure 2: Estimation errors and computing time.

To evaluate the estimation accuracy, we report the estimation errors , where ’s, with a slight overuse of notation, denote the estimates of ’s obtained by various methods. For the selection accuracy, we report the true positive rate (TPR) and false positive rate (FPR). Also reported is the average estimation error of the precision matrix defined to be , where is the true subject-specific precision matrix and is estimated by a given method.

The proposed MtRegGMM outperforms the competing methods in estimation and selection accuracy for various and . The estimation errors of MtRegGMM increases with and , confirming the results of Theorem 2. In the left plot of Figure 2, we compare the estimation errors from RegGMM and MtRegGMM as increases. The error from RegGMM increases roughly linearly with while that from MtRegGMM remains relatively stable as increases, again confirming the results of Theorem 2.

Finally, we assess the computation cost including tuning. The middle and right panels of Figure 2 show the computation time of MtRegGMM for a given . The simulations were run on an iMac with a 3.6 GHz Intel Core i9 processor. Because the number of parameters is , the total computing cost is expected to be in the order of , as seen in Figure 2.

6 Co-expression QTL Analysis

We analyze the REMBRANDT trial (GSE108476) with a subcohort of glioblastoma multiforme (GBM) patients, who had undergone microarray and single nucleotide polymorphism (SNP) chip profiling, with both gene expression and SNP data available for analysis. The raw data were pre-processed and normalized using standard pipelines (Gusev et al., 2018).

The response variables are the expression levels of genes belonging to the human glioma pathway in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (Kanehisa and Goto, 2000). The covariates include local SNPs (i.e., SNPs that fall within 2kb upstream and 0.5kb downstream of the gene) residing near these 73 genes, resulting in a total of 118 SNPs. SNPs are coded with “0" indicating homozygous in the major allele and “1" otherwise. Age and gender are also included in analysis. Consequently, there are covariates, bringing a total of parameters (including intercepts). We construct the population-level gene co-expression network, and examine if and how age, gender and SNPs modulate the network.

Figure 3: The population-level gene co-expression network, where the size of each node is proportional to the mean expression level, and edges with positive (negative) partial correlations are shown in red dashed (black solid) lines.
Figure 4: Effects of each covariate (e.g., SNPs) on edges or partial correlations, with positive (negative) effects shown in red dashed (black solid) lines.

We have used the proposed method to construct the population level network as shown in Figure 3. Most of the connected genes in Figure 3 are known to be oncogenes. For example, PIK3CA is a protein coding gene and is one of the most highly mutated oncogenes identified in human cancers (Samuels and Velculescu, 2004). The PIK3CA gene is a part of the PI3K/AKT/MTOR signaling pathway, which is one of the core pathways in human GBM and other types of cancer (Network et al., 2008). In Figure 3, we can identify several core pathways in human GBM including the PI3K/ AKT/MTOR, Ras-Raf-MEK-ERK and calcium signaling pathways.

We next examine the covariate effects on the network. Identified by our method are nine co-expression QTLs, namely, rs6701524, rs10519201, rs1347069, rs9303511, rs503314, rs7286558, rs759950, rs306098, rs25919. The network effects of rs6701524 are shown in Figure 4 (the left panel). This SNP, residing in MTOR, is found to affect co-expressions of genes in the PI3K/ AKT/MTOR pathway. This is an interesting finding as PI3K/MTOR is a key pathway in GBM development and progression, and inhibition of PI3K/MTOR signaling was found effective in increasing survival with GBM tumor (Batsios et al., 2019). This co-expression QTL can potentially play an important role in activating the PI3K/MTOR pathway. Shown in Figure 4 (the right panel) are the network effects of rs1347069, a variant of MAP2K1. The figure indicates that this SNP mostly affects the co-expressions of genes in the Ras-Raf-MEK-ERK pathway.

Moreover, we have found some novel co-expression QTLs. For example, rs10519201 regulates the co-expressions of PDGFB and GADD45A; rs9303511 is associated with the co-expressions of GADD45A and CALML4; rs503314 influences the co-expressions of CCND1 and PLCG2, CALML3; rs7286558 may modify the co-expressions of EGFR and PRKCA; rs759950 modulates the co-expressions of MAP2K1 and HRAS; rs306098 influences the co-expressions of PIK3CD and CAMK2D; rs25919 may alter the co-expressions of CAMK2A and GADD45G. Co-expression QTL identification has recently sparked much interest, and these findings warrant in-depth investigations.

7 Conclusion

We have proposed a computationally efficient multi-task learning estimator for Gaussian graphical regression models, and proved that the error rate of our estimator improves upon that of the separate node-wise lasso estimator by a factor of (i.e., the dimension of a graph). The remarkable improvement is possible because multi-task learning borrows information across tasks. Moreover, unlike the usual multi-task problems, our tasks are entangled in a complicated correlation structure, defying existing theoretical results; we address it by establishing a new tail probability bound for correlated heavy-tailed variables with an arbitrary correlation structure. Simulations and a gene network application have suggested feasibility and utility of the proposal in high dimensional settings, where both the response variables and covariates exceed the sample size. Future work includes investigations of the optimal error rates and the conditions needed to ensure variable selection consistency as well as extensions to accommodate more general types of responses, such as binary data.


  • Batsios et al. (2019) Batsios, G., Viswanath, P., Subramani, E., Najac, C., Gillespie, A. M., Santos, R. D., Molloy, A. R., Pieper, R. O., and Ronen, S. M. (2019), “PI3K/mTOR inhibition of IDH1 mutant glioma leads to reduced 2HG production that is associated with increased survival,” Scientific Reports, 9, 1–15.
  • Bickel et al. (2009) Bickel, P. J., Ritov, Y., and Tsybakov, A. B. (2009), “Simultaneous analysis of Lasso and Dantzig selector,” The Annals of Statistics, 37, 1705–1732.
  • Boyd et al. (2011) Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J., et al. (2011), “Distributed optimization and statistical learning via the alternating direction method of multipliers,”

    Foundations and Trends® in Machine learning

    , 3, 1–122.
  • Brynedal et al. (2017) Brynedal, B., Choi, J., Raj, T., Bjornson, R., Stranger, B. E., Neale, B. M., Voight, B. F., and Cotsapas, C. (2017), “Large-scale trans-eQTLs affect hundreds of transcripts and mediate patterns of transcriptional co-regulation,” The American Journal of Human Genetics, 100, 581–591.
  • Cai et al. (2012) Cai, T. T., Li, H., Liu, W., and Xie, J. (2012), “Covariate-adjusted precision matrix estimation with an application in genetical genomics,” Biometrika, 100, 139–156.
  • Cai et al. (2019) Cai, T. T., Zhang, A., and Zhou, Y. (2019), “Sparse Group Lasso: Optimal Sample Complexity, Convergence Rate, and Statistical Inference,” in arXiv preprint arXiv:1909.09851.
  • Chen et al. (2016) Chen, M., Ren, Z., Zhao, H., and Zhou, H. (2016), “Asymptotically normal and efficient estimation of covariate-adjusted Gaussian graphical model,” Journal of the American Statistical Association, 111, 394–406.
  • Chen et al. (2012) Chen, X., Lin, Q., Kim, S., Carbonell, J. G., and Xing, E. P. (2012), “Smoothing proximal gradient method for general structured sparse regression,” The Annals of Applied Statistics, 6, 719–752.
  • Cheng et al. (2014) Cheng, J., Levina, E., Wang, P., and Zhu, J. (2014), “A sparse Ising model with covariates,” Biometrics, 70, 943–953.
  • Danaher et al. (2014) Danaher, P., Wang, P., and Witten, D. M. (2014), “The joint graphical lasso for inverse covariance estimation across multiple classes,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76, 373–397.
  • Guo et al. (2011) Guo, J., Levina, E., Michailidis, G., and Zhu, J. (2011), “Joint estimation of multiple graphical models,” Biometrika, 98, 1–15.
  • Gusev et al. (2018) Gusev, Y., Bhuvaneshwar, K., Song, L., Zenklusen, J.-C., Fine, H., and Madhavan, S. (2018), “The REMBRANDT study, a large collection of genomic data from brain cancer patients,” Scientific Data, 5, 180158.
  • Hao et al. (2018) Hao, N., Feng, Y., and Zhang, H. H. (2018), “Model selection for high-dimensional quadratic regression via regularization,” Journal of the American Statistical Association, 113, 615–625.
  • Kanehisa and Goto (2000) Kanehisa, M. and Goto, S. (2000), “KEGG: kyoto encyclopedia of genes and genomes,” Nucleic Acids Research, 28, 27–30.
  • Ko and Tang (2008) Ko, B. and Tang, Q. (2008), “Sums of dependent nonnegative random variables with subexponential tails,” Journal of Applied Probability, 45, 85–94.
  • Kolar et al. (2010) Kolar, M., Parikh, A. P., and Xing, E. P. (2010), “On sparse nonparametric conditional covariance selection,” in Proceedings of the 27th International Conference on International Conference on Machine Learning, pp. 559–566.
  • Lewis (2015) Lewis, R. (2015), A Guide to Graph Colouring, vol. 7, Springer.
  • Li et al. (2012) Li, B., Chun, H., and Zhao, H. (2012), “Sparse estimation of conditional graphical models with application to gene networks,” Journal of the American Statistical Association, 107, 152–167.
  • Li et al. (2017) Li, Y., Wen, Z., Yang, C., and Yuan, Y. (2017), “A semi-smooth Newton method for solving semidefinite programs in electronic structure calculations,” arXiv preprint arXiv:1708.08048.
  • Liu et al. (2010) Liu, H., Chen, X., Wasserman, L., and Lafferty, J. D. (2010), “Graph-valued regression,” in Advances in Neural Information Processing Systems, pp. 1423–1431.
  • Lounici et al. (2011) Lounici, K., Pontil, M., Van De Geer, S., and Tsybakov, A. B. (2011), “Oracle inequalities and optimal inference under group sparsity,” The Annals of Statistics, 39, 2164–2204.
  • Meinshausen and Bühlmann (2006) Meinshausen, N. and Bühlmann, P. (2006), “High-dimensional graphs and variable selection with the lasso,” The Annals of Statistics, 34, 1436–1462.
  • Merlevède et al. (2011) Merlevède, F., Peligrad, M., and Rio, E. (2011), “A Bernstein type inequality and moderate deviations for weakly dependent sequences,” Probability Theory and Related Fields, 151, 435–474.
  • Negahban et al. (2012) Negahban, S. N., Ravikumar, P., Wainwright, M. J., and Yu, B. (2012), “A unified framework for high-dimensional analysis of -estimators with decomposable regularizers,” Statistical Science, 27, 538–557.
  • Network et al. (2008) Network, C. G. A. R. et al. (2008), “Comprehensive genomic characterization defines human glioblastoma genes and core pathways,” Nature, 455, 1061.
  • Ni et al. (2019) Ni, Y., Stingo, F. C., and Baladandayuthapani, V. (2019), “Bayesian graphical regression,” Journal of the American Statistical Association, 114, 184–197.
  • Peng et al. (2009) Peng, J., Wang, P., Zhou, N., and Zhu, J. (2009), “Partial correlation estimation by joint sparse regression models,” Journal of the American Statistical Association, 104, 735–746.
  • Samuels and Velculescu (2004) Samuels, Y. and Velculescu, V. E. (2004), “Oncogenic mutations of PIK3CA in human cancers,” Cell Cycle, 3, 1221–1224.
  • Simon et al. (2013) Simon, N., Friedman, J., Hastie, T., and Tibshirani, R. (2013), “A sparse-group lasso,” Journal of Computational and Graphical Statistics, 22, 231–245.
  • van der Wijst et al. (2018) van der Wijst, M. G., Brugge, H., de Vries, D. H., Deelen, P., Swertz, M. A., and Franke, L. (2018), “Single-cell RNA sequencing identifies celltype-specific cis-eQTLs and co-expression QTLs,” Nature Genetics, 50, 493–497.
  • Wang et al. (2019)

    Wang, J., Zhang, Z., and Ye, J. (2019), “Two-Layer Feature Reduction for Sparse-Group Lasso via Decomposition of Convex Sets.”

    J. Mach. Learn. Res., 20, 163–1.
  • Wang et al. (2013) Wang, L., Zheng, W., Zhao, H., and Deng, M. (2013), “Statistical analysis reveals co-expression patterns of many pairs of genes in yeast are jointly regulated by interacting loci,” PLoS Genet, 9, e1003414.
  • Wang et al. (2012) Wang, Y., Joseph, S. J., Liu, X., Kelley, M., and Rekaya, R. (2012), “SNPxGE2: a database for human SNP–coexpression associations,” Bioinformatics, 28, 403–410.
  • Yin and Li (2011) Yin, J. and Li, H. (2011), “A sparse conditional gaussian graphical model for analysis of genetical genomics data,” The Annals of Applied Statistics, 5, 2630.
  • Zhang and Li (2022) Zhang, J. and Li, Y. (2022), “High-Dimensional Gaussian Graphical Regression Models with Covariates,” Journal of the American Statistical Association, published online March 14, 2022.
  • Zhang et al. (2019) Zhang, J., Sun, W. W., and Li, L. (2019), “Mixed-Effect Time-Varying Network Model and Application in Brain Connectivity Analysis,” Journal of the American Statistical Association, 1–15.
  • Zhang et al. (2020) Zhang, Y., Zhang, N., Sun, D., and Toh, K.-C. (2020), “An efficient Hessian based algorithm for solving large-scale sparse group Lasso problems,” Mathematical Programming, 179, 223–263.