1 Introduction
Gaussian graphical models are an effective tool for inferring the dependence among variables of interest, such as the coexpression patterns among genes (e.g., Cai et al., 2012; Chen et al., 2016) and functional connectivity between brain regions (e.g., Zhang et al., 2019), because precision matrices for multivariate Gaussian variables can be interpreted as partial correlations (Peng et al., 2009). What has been overlooked is that graphical structures may depend on external covariates, which can be high dimensional. For example, genetic variants and clinical and environmental factors may affect both the expression levels of individual genes and the coexpression relationships among genes (Wang et al., 2012, 2013; Brynedal et al., 2017). Genetic variants that alter coexpression relationships are referred to as coexpression quantitative trait loci (QTLs), and identifying them is of keen scientific interest (Wang et al., 2012, 2013; van der Wijst et al., 2018). Identifying coexpression QTLs among many SNPs can be formulated as a problem that relates graphical models to high dimensional external covariates; a fundamental goal is to ascertain how the covariates modulate the individual-level graphical structures, and to recover both the population- and subject-level graphs.
Though much progress has been made on developing graphical models, less has been done on covariate-dependent graphical models. Several works (e.g., Yin and Li, 2011; Li et al., 2012; Cai et al., 2012; Chen et al., 2016) considered covariate-dependent Gaussian graphical models, wherein the mean of the nodes depends on covariates while the network structure is the same across all of the subjects. Guo et al. (2011) and Danaher et al. (2014) estimated stratified graphical models by preserving the common structure among them. Liu et al. (2010) partitioned the covariate space into several subspaces and fitted a separate Gaussian graphical model for each subspace. Kolar et al. (2010) nonparametrically estimated the dependence of a covariance matrix on one continuous covariate. Cheng et al. (2014) fitted a conditional Ising model for binary data. Ni et al. (2019) proposed a DAG model that allows the graph structure to vary with a small number of covariates, and assumed a hierarchical ordering of the nodes. None of the cited works, however, can handle high dimensional covariates. Notably, Zhang and Li (2022) proposed a Gaussian graphical regression framework that regresses the precision matrix of a graphical model on high dimensional covariates and estimates the parameters using penalized nodewise regressions. However, Zhang and Li (2022) utilized separate nodewise regressions, ignoring the network-induced common structure among these regressions. Consequently, the obtained error rate is high, especially when the number of nodes is large.
This work addresses these issues by making several methodological and theoretical contributions.

We propose to adapt multi-task learning to fit Gaussian graphical regression models. Specifically, we design a cross-task group sparsity penalty which enables us to borrow information across different tasks. As the number of covariates, q, can be much larger than the sample size n, it is reasonable to assume the active covariates (i.e., those with nonzero effects on the graph) are sparse, which can also be accommodated by the specified cross-task group sparsity penalty; see Figure 1. We combine it with an element-wise sparsity penalty to address the possibly sparse effects of active covariates and further reduce the model complexity.

To optimize the cross-task objective function with the combined sparsity penalty, we adapt an efficient augmented Lagrangian algorithm, which solves its subproblems with a semismooth Newton method.

To address the main challenge that the regression tasks are entangled in a general correlation structure and that the combined sparsity penalty is not decomposable, we establish a new and sharp tail probability bound for correlated heavy-tailed (subexponential) variables with an arbitrary correlation structure, which is of independent interest.

Finally, we prove that, compared to Zhang and Li (2022), the error rate of the simultaneously estimated precision parameters improves by a factor of p, the number of response variables. The improvement is remarkable for a large p, as further corroborated in simulation studies. Code is available on GitHub, and detailed proofs are given in the Supplementary Material.
2 Multi-task Learning for Gaussian Graphical Regressions
Denote by y = (y_1, …, y_p)^T the vector of p response variables (e.g., expression levels of p genes) and by x = (x_1, …, x_q)^T the vector of q covariates (e.g., age, sex and genetic variants), and assume that y | x ~ N(0, Σ(x)), where Σ(x) is the conditional covariance, and Ω(x) = Σ(x)^{-1} is the conditional precision matrix linked to x via
ω_jk(x) = ω_jk^(0) + Σ_{h=1}^q ω_jk^(h) x_h,  j ≠ k,  (1)
where ω_jk^(h) = ω_kj^(h), j ≠ k, h = 0, 1, …, q. We assume the diagonal entries ω_jj to be free of x, and this is discussed after (2). Writing β_jk^(h) = −ω_jk^(h)/ω_jj, some straightforward algebra shows that (1) relates to the following regression, termed Gaussian graphical regression (Zhang and Li, 2022):
y_j = Σ_{k≠j} (β_jk^(0) + Σ_{h=1}^q β_jk^(h) x_h) y_k + ε_j,  (2)
where ε_j ~ N(0, σ_j^2), and ε_j is independent of x and y_{−j} = (y_k, k ≠ j), for all j = 1, …, p. It is easy to see that, when β_jk^(h) = 0 for all h ≥ 1, (2) reduces to the usual Gaussian graphical model with a precision matrix that is free of x.
Model (2) provides a regression framework for estimating the precision parameters in (1), by including the interactions between x and y. Correspondingly, the partial correlation between y_j and y_k, conditional on all other variables, is modeled as a function of x; see an illustration in Figure 1. The diagonal elements of Ω(x) (i.e., the ω_jj's) are connected to the residual variances in (2), that is, σ_j^2 = 1/ω_jj. From this perspective, assuming ω_jj to be free of x may be viewed as assuming the residual variance of y_j, after removing the effects of y_{−j} and its interactions with x, not to depend on x, which is plausible in the context of regression.

Let β_j be the vector of coefficients in the jth regression in (2) and denote β = (β_1^T, …, β_p^T)^T. To expose the key ideas, we assume known σ_j's in the ensuing development, and focus on the estimation of β; extensions with unknown σ_j's are possible, but with more involved notation. With independent data {(y_i, x_i), i = 1, …, n}, let Y ∈ R^{np} stack the n observed response vectors and let W be the corresponding design matrix collecting the terms y_k and x_h y_k in (2), which can be written compactly using the Kronecker product ⊗. To estimate β, we consider
min_β { (1/(2n)) ‖Y − Wβ‖_2^2 + λ_g Σ_{h=1}^q ‖β^(h)‖_2 + λ_e ‖β‖_1 },  (3)
where β^(h) = (β_jk^(h), j ≠ k) collects the effects of the hth covariate across all p tasks, and λ_g, λ_e ≥ 0 are tuning parameters. The convex regularizing terms, Σ_{h=1}^q ‖β^(h)‖_2 and ‖β‖_1, encourage group and element-wise sparsity, respectively, though the group sparsity penalty is not applied to β^(0) (i.e., the intercept terms). The combined sparsity penalty in (3) is termed the sparse group lasso penalty (Simon et al., 2013). Distinguishing it from the nodewise estimation of Zhang and Li (2022), (3) considers the p graphical regressions simultaneously, with the group lasso penalty regulating β^(h) (the effects of the hth covariate) across tasks; see Figure 1. Thus, (3) is termed multi-task learning for a Gaussian graphical regression model.
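To make the criterion concrete, a minimal sketch of evaluating the objective in (3) is given below; the function name and the flat group indexing are our own illustrative choices, not the paper's implementation.

```python
import numpy as np

def sgl_objective(Y, W, beta, groups, lam_g, lam_e):
    """Evaluate the sparse group lasso objective in (3):
    (1/(2n)) * ||Y - W @ beta||_2^2 + lam_g * sum_h ||beta[groups[h]]||_2
    + lam_e * ||beta||_1.
    `groups` is a list of index arrays, one per penalized covariate group."""
    n = Y.shape[0]
    resid = Y - W @ beta
    loss = resid @ resid / (2.0 * n)
    group_pen = sum(np.linalg.norm(beta[idx]) for idx in groups)
    return loss + lam_g * group_pen + lam_e * np.abs(beta).sum()
```

With the intercept group β^(0) excluded from `groups`, only the covariate groups incur the ℓ2 penalty, mirroring (3).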
3 An Augmented Lagrangian Algorithm with a Semismooth Newton Method
By specifying a convex objective function, (3) facilitates the use of existing algorithms for solving sparse group lasso problems, including proximal gradient descent (Chen et al., 2012), block coordinate descent (Simon et al., 2013; Wang et al., 2019) and the alternating direction method of multipliers (ADMM) (Boyd et al., 2011). We propose to use an efficient augmented Lagrangian algorithm which uses a semismooth Newton method to solve its subproblems. We opt for the semismooth Newton method because it exploits the second-order sparsity structure in our setting (Zhang et al., 2020) and enjoys a linear rate of convergence.
More specifically, write (3) as min_β f(Wβ) + P(β), where f(u) = (1/(2n)) ‖Y − u‖_2^2 and P(β) = λ_g Σ_{h=1}^q ‖β^(h)‖_2 + λ_e ‖β‖_1. Correspondingly, (3) can be written as

min_{β, u} f(u) + P(β)  subject to  Wβ − u = 0,

and its dual problem can be written as:

max_{ξ, v} −f*(ξ) − P*(v)  subject to  W^T ξ + v = 0,

where P* is the conjugate function of P, i.e., P*(v) = sup_β {⟨β, v⟩ − P(β)}. Define the proximal mapping of P to be

prox_P(v) = argmin_β { P(β) + (1/2) ‖β − v‖_2^2 }.

Similar to ADMM, the augmented Lagrangian of the dual problem is given by

L_σ(ξ, v; β) = f*(ξ) + P*(v) − ⟨β, W^T ξ + v⟩ + (σ/2) ‖W^T ξ + v‖_2^2,

where σ > 0, and the ξ-subproblem relates to iteratively minimizing

Ψ_t(ξ) = min_v L_{σ_t}(ξ, v; β^t),

where β^{t+1} = β^t − σ_t (W^T ξ^{t+1} + v^{t+1}) is the updated estimate and σ_t is given at the t-th iteration; see the detailed steps below.
Step 1 solves the subproblem with a semismooth Newton method, utilizing a generalized Jacobian of the proximal mapping given in Zhang et al. (2020). In general, the algorithm outperforms the first-order ADMM algorithm in computational efficiency and estimation accuracy, because it accounts for the second-order sparsity structure when solving the subproblems (Li et al., 2017; Zhang et al., 2020).
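For the sparse group lasso penalty, the proximal mapping used in these updates has a closed form (Simon et al., 2013): soft-threshold element-wise, then shrink each group's norm. A sketch for a single group, with unit step size and illustrative function names:

```python
import numpy as np

def soft_threshold(v, t):
    # Element-wise soft-thresholding: prox of t * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_sparse_group_lasso(v, lam_g, lam_e):
    """Proximal mapping (unit step) of lam_g * ||.||_2 + lam_e * ||.||_1
    for one group: soft-threshold element-wise, then shrink the group norm."""
    u = soft_threshold(v, lam_e)
    norm = np.linalg.norm(u)
    if norm <= lam_g:
        return np.zeros_like(u)   # whole group is zeroed out
    return (1.0 - lam_g / norm) * u
```

The full proximal mapping applies this group by group, which is what makes the generalized Jacobian highly structured and sparse.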
Positive-definiteness. A natural sufficient condition to ensure positive definiteness of Ω(x) is diagonal dominance, leading to Σ_{k≠j} |ω_jk(x)| < ω_jj for all j. With Assumption 1 stipulating |x_h| ≤ M for all h, positive definiteness holds when Σ_{k≠j} (|ω_jk^(0)| + M Σ_{h=1}^q |ω_jk^(h)|) < ω_jj, j = 1, …, p. Assuming M = 1 (if not, rescale x first), this sufficient condition can be simplified to Σ_{k≠j} Σ_{h=0}^q |ω_jk^(h)| < ω_jj for all j (note that the ω_jk^(h)'s are sparse), suggesting that, to satisfy diagonal dominance, the “effect sizes” of x (i.e., the ω_jk^(h)'s) on partial correlations cannot be too large. If the true covariance/precision matrix is positive definite, Theorem 2 implies that the estimated precision matrix is asymptotically positive definite. In finite samples, a post-hoc rescaling procedure can be utilized to secure positive definiteness.
Tuning. The λ_g and λ_e in (3) can be jointly selected via K-fold (e.g., K = 5) cross validation. To facilitate parallelization of the algorithm, we consider a hierarchical selection procedure. Rewrite λ_g = (1 − α)λ and λ_e = αλ, where α ∈ [0, 1] reflects the weight of the lasso penalty relative to the group lasso penalty (e.g., α = 0 and α = 1 correspond to the group lasso and the lasso, respectively) and λ ≥ 0 reflects the total amount of regularization. We assess a set of values for α; for each α, a grid of λ values is considered in cross validation. The search can be parallelized over the different α's.
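The hierarchical search can be sketched as a small grid loop; `fit` and `score` stand in for model fitting and validation error, and the outer loop over α is the part that parallelizes (all names here are illustrative):

```python
import numpy as np

def hierarchical_cv(fit, score, alphas, lambdas, folds):
    """Select (alpha, lambda) by cross validation, with lam_g = (1 - alpha) * lam
    and lam_e = alpha * lam. Each alpha's inner lambda grid is independent,
    so different alphas can be dispatched to different workers."""
    best = None
    for alpha in alphas:                      # parallelizable across alpha
        for lam in lambdas:
            errs = [score(fit((1 - alpha) * lam, alpha * lam, train), valid)
                    for train, valid in folds]
            cand = (float(np.mean(errs)), alpha, lam)
            if best is None or cand[0] < best[0]:
                best = cand
    return best   # (cv error, alpha, lambda)
```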
4 Theory: Concentration Inequality and Error Rate for Multi-task Learning
We derive the non-asymptotic error rate of the sparse group lasso estimator from (3). The main challenge is that the tasks in (3) are correlated, and this differs from the usual multi-task learning with group sparsity (Lounici et al., 2011). To tackle this challenge, we have made a key advance in Theorem 1, which gives a new tail bound for the sum of correlated heavy-tailed (subexponential) variables with an arbitrary correlation structure.
Our theoretical investigation faces other challenges. First, because the design matrix W in (3) includes high-dimensional interactions between x and y, and the variance of y is a function of x, characterizing the joint distribution of each row in W is difficult and requires a delicate treatment. Second, as the combined penalty term is not decomposable, the classic techniques for decomposable regularizers and null space properties (Negahban et al., 2012) are not applicable. By utilizing a novel concentration inequality for the sum of correlated variables in (3), we derive two interrelated bounds for the stochastic term, whose combination yields a sharp upper bound of the stochastic term. We, therefore, show that our proposed estimator possesses an improved error bound compared to the lasso and the group lasso when the true coefficients are simultaneously sparse, and, more importantly, that the error rate of the multi-task estimates improves by a factor of p, compared to the separate nodewise estimates obtained by Zhang and Li (2022).

4.1 A new concentration inequality for the sum of correlated variables
The concentration results for correlated random variables are often derived under specific correlation structures, such as weak dependence (Merlevède et al., 2011) and asymptotic independence (Ko and Tang, 2008). These structures are unlikely to be applicable to our setting because the error terms ε_1, …, ε_p in (3) (from the p tasks) depend on, for example, the expressions of genes, and are correlated via a complicated coexpression network. To bound their sum, we employ a novel idea that partitions the index set of response variables into mutually exclusive subsets such that the variables within each subset are independent. This can be done by exploring the topology of a graph and solving a vertex coloring problem (Lewis, 2015); see the proof of Theorem 1 in the Supplementary Material. With that, we present a concentration inequality result under a general correlation structure.

Theorem 1.
Consider correlated mean zero subexponential random variables z_1, …, z_p, and an induced network G = (V, E) with a node set V = {1, …, p} and an edge set E, where an edge connects each pair of dependent variables. Denote the maximum node degree of G by d and let K = max_{1≤j≤p} ‖z_j‖_{ψ1}. For any t > 0 and a constant c > 0, it holds that

P( |Σ_{j=1}^p z_j| ≥ t ) ≤ 2(d + 1) exp[ −c min{ t^2 / ((d + 1)^2 Σ_{j=1}^p ‖z_j‖_{ψ1}^2), t / ((d + 1) K) } ],
where ‖·‖_{ψ1} is the subexponential norm defined in the Supplementary Material. The results are inclusive. For example, when z_1, …, z_p are mutually independent, we have d = 0, and the inequality reduces to the usual Bernstein-type bound for independent subexponential random variables; if the variables are correlated with d = O(1), we obtain a tail probability of the same order as when the variables are independent. The theorem leads to a corollary that gives a sharp bound on the sum of correlated chi-squared random variables, which is critical for our proof.
Corollary 1.
Consider correlated chi-squared random variables z_1, …, z_p, each with one degree of freedom, and an induced network G defined as in Theorem 1. Denote the maximum node degree of G by d. For any t > 0 and a constant c > 0, it holds that

P( |Σ_{j=1}^p (z_j − 1)| ≥ t ) ≤ 2(d + 1) exp[ −c min{ t^2 / ((d + 1)^2 p), t / (d + 1) } ].
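The partition behind Theorem 1 and Corollary 1 is a vertex coloring of the dependence graph; a greedy coloring over adjacency lists, sketched below as a toy illustration (not the proof's construction), always uses at most d + 1 colors, so the variables split into at most d + 1 mutually independent subsets.

```python
def greedy_coloring(adj):
    """Color vertices 0..n-1 greedily; adj[v] lists the neighbors of v.
    Each color class is an independent set, and at most
    (max degree + 1) colors are ever needed."""
    color = [-1] * len(adj)
    for v in range(len(adj)):
        taken = {color[u] for u in adj[v] if color[u] >= 0}
        c = 0
        while c in taken:          # smallest color unused by neighbors
            c += 1
        color[v] = c
    return color
```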
4.2 Error rate analysis
Let S_e be the element-wise support set of β and S_g be the group-wise support set of β, and denote s = |S_e| and g = |S_g|, i.e., s and g are the numbers of nonzero entries and nonzero groups in β, respectively. Without loss of generality, we assume σ_j = 1 for all j. We state the needed regularity conditions.
Assumption 1.
Suppose x_1, …, x_n are i.i.d. mean zero random vectors with a covariance matrix Σ_x satisfying 1/C ≤ λ_min(Σ_x) ≤ λ_max(Σ_x) ≤ C for some constant C ≥ 1. Moreover, there exists a constant M such that |x_{ih}| ≤ M for all i and h.
Assumption 2.
Suppose c_1 ≤ λ_min(Σ(x)) ≤ λ_max(Σ(x)) ≤ c_2 for some constants c_2 ≥ c_1 > 0.
Assumption 3.
The dimensions p, q and the sparsity parameters satisfy log(pq) = o(n) and s log(pq) = o(n). The maximum column-wise ℓ_0 norm of the population-level coefficient matrix is bounded above by a positive constant.
Assumption 1 stipulates that the covariates are element-wise bounded, which is needed in characterizing the joint distribution of each row in W. This condition is not restrictive, as genetic variants are often coded to be 0 or 1 (Chen et al., 2016). Assumptions 1 and 2 impose bounded eigenvalues on Σ_x and Σ(x), as commonly done in the high-dimensional regression literature (Chen et al., 2016; Hao et al., 2018; Cai et al., 2019). Assumption 3 is a sparsity condition, allowing both p and q to grow at a polynomial order of n. Moreover, the number of nonzero entries s can also grow with n. This condition is useful when establishing a restricted eigenvalue condition (Bickel et al., 2009) for W and when bounding the stochastic term. The condition on the column-wise ℓ_0 norm implies that, for example, each gene is only connected to a finite number of genes.

Theorem 2.
The separate nodewise regressions considered in Zhang and Li (2022) yield an error rate that is slower than that in (5) by a factor of p; as p often far exceeds the remaining terms in the rate, the improvement with multi-task learning is considerable.
Theorem 2 also shows that our proposed estimator enjoys an improved error bound over estimators with only a lasso or only a group lasso penalty on β. Specifically, applying the regular lasso regularizer alone would yield an error bound on the order of s log(pq)/n (Negahban et al., 2012), which is slower than that in (5) when the s nonzero entries concentrate in a small number of groups, corresponding to group sparsity. Moreover, applying the group lasso regularizer alone, which excludes the element-wise penalty, would yield an error bound that scales with the total size of the nonzero groups (Lounici et al., 2011), which is slower than that in (5) when the nonzero groups are only partially filled, corresponding to within-group sparsity. In Theorem 2, the condition on the sparsity parameters upper bounds the size of candidate models, which helps to bound the stochastic errors from the p tasks.
5 Numerical Experiments
We compare the finite sample performance of our proposed method, defined in (3) (referred to as “MtRegGMM”), with those of three competing solutions, namely, a benchmark i.i.d. Gaussian graphical model estimated by the neighborhood selection method (Meinshausen and Bühlmann, 2006) (“IID”), a lasso estimator applied jointly to all tasks (“Joint”), and the separate regressions considered in Zhang and Li (2022) (“RegGMM”).
We simulate n samples from (1), with p response variables (e.g., genes) and q external covariates (e.g., SNPs). The x_i's are generated i.i.d. from a Bernoulli distribution. We randomly select 3 covariates to have nonzero effects, and, in the graphs for these covariates and in the population-level graph, five edges are randomly selected to be nonzero. We set ω_jj = 1 for all j. The initial nonzero coefficients are set to 0.3. For each j, we rescale the jth row of each coefficient matrix by dividing each entry by the corresponding row-wise sum of absolute values. After rescaling, for each j and k, we use the average of the (j, k) and (k, j) entries to fill the entries at (j, k) and (k, j). This process results in symmetry with diagonal dominance and, thus, ensures positive definiteness of the precision matrices. For each simulation configuration, we generate 50 independent data sets; given x_i, we determine Ω(x_i) and Σ(x_i), and generate the ith sample from N(0, Σ(x_i)), i = 1, …, n. For a fair comparison, tuning parameters in all of the methods are selected via 5-fold cross validation.
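The generating mechanism above can be mimicked in a few lines; the sizes, sparsity levels and the rescaling constant below are toy choices for illustration, not the exact settings of the study.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, n_active, n_edges = 10, 20, 3, 5     # toy sizes

def sparse_graph(p, n_edges, value=0.3):
    """Symmetric p x p matrix with n_edges random nonzero off-diagonal pairs."""
    B = np.zeros((p, p))
    rows, cols = np.triu_indices(p, k=1)
    idx = rng.choice(rows.size, size=n_edges, replace=False)
    B[rows[idx], cols[idx]] = value
    return B + B.T

B0 = sparse_graph(p, n_edges)                       # population-level graph
active = rng.choice(q, size=n_active, replace=False)
B = {h: sparse_graph(p, n_edges) for h in active}   # covariate effects

def precision(x):
    """Omega(x): rescale rows for diagonal dominance, symmetrize, unit diagonal."""
    Om = B0 + sum(x[h] * B[h] for h in active)
    scale = 1.5 * np.maximum(np.abs(Om).sum(axis=1), 1.0)
    Om = Om / scale[:, None]
    Om = (Om + Om.T) / 2.0                  # average (j,k) and (k,j) entries
    np.fill_diagonal(Om, 1.0)
    return Om

x = rng.integers(0, 2, size=q)              # Bernoulli covariates
Om = precision(x)
y = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Om))
```

Diagonal dominance holds by construction here, so `Om` is guaranteed positive definite.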
(p, q)   Method     TPR            FPR              Error of β     Error of Ω(x)

100      MtRegGMM   0.985 (0.013)  0.0001 (0.0000)  0.534 (0.032)  0.405 (0.046)
         Joint      0.942 (0.015)  0.0003 (0.0000)  0.762 (0.033)  0.712 (0.067)
         RegGMM     0.922 (0.020)  0.0013 (0.0001)  1.146 (0.045)  1.984 (0.153)
         IID        -              -                -              1.764 (0.034)

         MtRegGMM   0.983 (0.021)  0.0001 (0.0000)  0.549 (0.037)  0.436 (0.060)
         Joint      0.918 (0.030)  0.0002 (0.0000)  0.848 (0.040)  0.912 (0.088)
         RegGMM     0.923 (0.016)  0.0009 (0.0001)  1.207 (0.036)  2.223 (0.199)
         IID        -              -                -              1.777 (0.026)

         MtRegGMM   0.964 (0.013)  0.0000 (0.0000)  0.612 (0.034)  0.489 (0.055)
         Joint      0.896 (0.020)  0.0000 (0.0000)  0.846 (0.038)  0.786 (0.069)
         RegGMM     0.918 (0.020)  0.0003 (0.0000)  2.620 (0.060)  5.845 (0.329)
         IID        -              -                -              5.119 (0.890)
To evaluate the estimation accuracy, we report the estimation errors ‖β̂ − β‖_2, where the β̂'s, with a slight overuse of notation, denote the estimates of β obtained by the various methods. For the selection accuracy, we report the true positive rate (TPR) and false positive rate (FPR). Also reported is the average estimation error of the precision matrix, defined to be (1/n) Σ_{i=1}^n ‖Ω̂(x_i) − Ω(x_i)‖_F, where Ω(x_i) is the true subject-specific precision matrix and Ω̂(x_i) is its estimate by a given method.
The proposed MtRegGMM outperforms the competing methods in estimation and selection accuracy for the various p and q. The estimation errors of MtRegGMM increase with p and q, confirming the results of Theorem 2. In the left plot of Figure 2, we compare the estimation errors from RegGMM and MtRegGMM as p increases. The error from RegGMM increases roughly linearly with p, while that from MtRegGMM remains relatively stable as p increases, again confirming the results of Theorem 2.
Finally, we assess the computation cost, including tuning. The middle and right panels of Figure 2 show the computation time of MtRegGMM for a given pair of tuning parameters. The simulations were run on an iMac with a 3.6 GHz Intel Core i9 processor. Because the number of parameters grows in the order of p^2 q, the total computing cost is expected to grow at that order as well, as seen in Figure 2.
6 Coexpression QTL Analysis
We analyze the REMBRANDT trial (GSE108476) with a subcohort of glioblastoma multiforme (GBM) patients who had undergone both microarray and single nucleotide polymorphism (SNP) chip profiling, so that gene expression and SNP data are available for analysis. The raw data were preprocessed and normalized using standard pipelines (Gusev et al., 2018).
The response variables are the expression levels of 73 genes belonging to the human glioma pathway in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (Kanehisa and Goto, 2000). The covariates include local SNPs (i.e., SNPs that fall within 2kb upstream and 0.5kb downstream of a gene) residing near these 73 genes, resulting in a total of 118 SNPs. SNPs are coded with “0” indicating homozygous in the major allele and “1” otherwise. Age and gender are also included in the analysis. Consequently, there are 120 covariates, bringing a large total number of parameters (including intercepts). We construct the population-level gene coexpression network, and examine if and how age, gender and the SNPs modulate the network.
We have used the proposed method to construct the population-level network, as shown in Figure 3. Most of the connected genes in Figure 3 are known oncogenes. For example, PIK3CA is a protein coding gene and is one of the most highly mutated oncogenes identified in human cancers (Samuels and Velculescu, 2004). The PIK3CA gene is part of the PI3K/AKT/MTOR signaling pathway, which is one of the core pathways in human GBM and other types of cancer (Network et al., 2008). In Figure 3, we can identify several core pathways in human GBM, including the PI3K/AKT/MTOR, Ras-Raf-MEK-ERK and calcium signaling pathways.
We next examine the covariate effects on the network. Identified by our method are nine coexpression QTLs, namely, rs6701524, rs10519201, rs1347069, rs9303511, rs503314, rs7286558, rs759950, rs306098 and rs25919. The network effects of rs6701524 are shown in Figure 4 (left panel). This SNP, residing in MTOR, is found to affect coexpressions of genes in the PI3K/AKT/MTOR pathway. This is an interesting finding, as PI3K/MTOR is a key pathway in GBM development and progression, and inhibition of PI3K/MTOR signaling was found effective in increasing survival with GBM tumors (Batsios et al., 2019). This coexpression QTL can potentially play an important role in activating the PI3K/MTOR pathway. Shown in Figure 4 (right panel) are the network effects of rs1347069, a variant of MAP2K1. The figure indicates that this SNP mostly affects the coexpressions of genes in the Ras-Raf-MEK-ERK pathway.
Moreover, we have found some novel coexpression QTLs. For example, rs10519201 regulates the coexpression of PDGFB and GADD45A; rs9303511 is associated with the coexpression of GADD45A and CALML4; rs503314 influences the coexpression of CCND1 with PLCG2 and CALML3; rs7286558 may modify the coexpression of EGFR and PRKCA; rs759950 modulates the coexpression of MAP2K1 and HRAS; rs306098 influences the coexpression of PIK3CD and CAMK2D; rs25919 may alter the coexpression of CAMK2A and GADD45G. Coexpression QTL identification has recently sparked much interest, and these findings warrant in-depth investigations.
7 Conclusion
We have proposed a computationally efficient multi-task learning estimator for Gaussian graphical regression models, and proved that the error rate of our estimator improves upon that of the separate nodewise lasso estimator by a factor of p (i.e., the dimension of the graph). This remarkable improvement is possible because multi-task learning borrows information across tasks. Moreover, unlike the usual multi-task problems, our tasks are entangled in a complicated correlation structure, defying existing theoretical results; we address this by establishing a new tail probability bound for correlated heavy-tailed variables with an arbitrary correlation structure. Simulations and a gene network application have suggested the feasibility and utility of the proposal in high-dimensional settings, where both the number of response variables and the number of covariates exceed the sample size. Future work includes investigations of the optimal error rates and the conditions needed to ensure variable selection consistency, as well as extensions to accommodate more general types of responses, such as binary data.
References
 Batsios et al. (2019) Batsios, G., Viswanath, P., Subramani, E., Najac, C., Gillespie, A. M., Santos, R. D., Molloy, A. R., Pieper, R. O., and Ronen, S. M. (2019), “PI3K/mTOR inhibition of IDH1 mutant glioma leads to reduced 2HG production that is associated with increased survival,” Scientific Reports, 9, 1–15.
 Bickel et al. (2009) Bickel, P. J., Ritov, Y., and Tsybakov, A. B. (2009), “Simultaneous analysis of Lasso and Dantzig selector,” The Annals of Statistics, 37, 1705–1732.

 Boyd et al. (2011) Boyd, S., Parikh, N., Chu, E., Peleato, B., and Eckstein, J. (2011), “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends in Machine Learning, 3, 1–122.
 Brynedal et al. (2017) Brynedal, B., Choi, J., Raj, T., Bjornson, R., Stranger, B. E., Neale, B. M., Voight, B. F., and Cotsapas, C. (2017), “Large-scale trans-eQTLs affect hundreds of transcripts and mediate patterns of transcriptional co-regulation,” The American Journal of Human Genetics, 100, 581–591.
 Cai et al. (2012) Cai, T. T., Li, H., Liu, W., and Xie, J. (2012), “Covariate-adjusted precision matrix estimation with an application in genetical genomics,” Biometrika, 100, 139–156.
 Cai et al. (2019) Cai, T. T., Zhang, A., and Zhou, Y. (2019), “Sparse group lasso: Optimal sample complexity, convergence rate, and statistical inference,” arXiv preprint arXiv:1909.09851.
 Chen et al. (2016) Chen, M., Ren, Z., Zhao, H., and Zhou, H. (2016), “Asymptotically normal and efficient estimation of covariate-adjusted Gaussian graphical model,” Journal of the American Statistical Association, 111, 394–406.
 Chen et al. (2012) Chen, X., Lin, Q., Kim, S., Carbonell, J. G., and Xing, E. P. (2012), “Smoothing proximal gradient method for general structured sparse regression,” The Annals of Applied Statistics, 6, 719–752.
 Cheng et al. (2014) Cheng, J., Levina, E., Wang, P., and Zhu, J. (2014), “A sparse Ising model with covariates,” Biometrics, 70, 943–953.
 Danaher et al. (2014) Danaher, P., Wang, P., and Witten, D. M. (2014), “The joint graphical lasso for inverse covariance estimation across multiple classes,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76, 373–397.
 Guo et al. (2011) Guo, J., Levina, E., Michailidis, G., and Zhu, J. (2011), “Joint estimation of multiple graphical models,” Biometrika, 98, 1–15.
 Gusev et al. (2018) Gusev, Y., Bhuvaneshwar, K., Song, L., Zenklusen, J.-C., Fine, H., and Madhavan, S. (2018), “The REMBRANDT study, a large collection of genomic data from brain cancer patients,” Scientific Data, 5, 180158.
 Hao et al. (2018) Hao, N., Feng, Y., and Zhang, H. H. (2018), “Model selection for high-dimensional quadratic regression via regularization,” Journal of the American Statistical Association, 113, 615–625.
 Kanehisa and Goto (2000) Kanehisa, M. and Goto, S. (2000), “KEGG: kyoto encyclopedia of genes and genomes,” Nucleic Acids Research, 28, 27–30.
 Ko and Tang (2008) Ko, B. and Tang, Q. (2008), “Sums of dependent nonnegative random variables with subexponential tails,” Journal of Applied Probability, 45, 85–94.
 Kolar et al. (2010) Kolar, M., Parikh, A. P., and Xing, E. P. (2010), “On sparse nonparametric conditional covariance selection,” in Proceedings of the 27th International Conference on International Conference on Machine Learning, pp. 559–566.
 Lewis (2015) Lewis, R. (2015), A Guide to Graph Colouring, vol. 7, Springer.
 Li et al. (2012) Li, B., Chun, H., and Zhao, H. (2012), “Sparse estimation of conditional graphical models with application to gene networks,” Journal of the American Statistical Association, 107, 152–167.
 Li et al. (2017) Li, Y., Wen, Z., Yang, C., and Yuan, Y. (2017), “A semismooth Newton method for solving semidefinite programs in electronic structure calculations,” arXiv preprint arXiv:1708.08048.
 Liu et al. (2010) Liu, H., Chen, X., Wasserman, L., and Lafferty, J. D. (2010), “Graph-valued regression,” in Advances in Neural Information Processing Systems, pp. 1423–1431.
 Lounici et al. (2011) Lounici, K., Pontil, M., Van De Geer, S., and Tsybakov, A. B. (2011), “Oracle inequalities and optimal inference under group sparsity,” The Annals of Statistics, 39, 2164–2204.
 Meinshausen and Bühlmann (2006) Meinshausen, N. and Bühlmann, P. (2006), “High-dimensional graphs and variable selection with the lasso,” The Annals of Statistics, 34, 1436–1462.
 Merlevède et al. (2011) Merlevède, F., Peligrad, M., and Rio, E. (2011), “A Bernstein type inequality and moderate deviations for weakly dependent sequences,” Probability Theory and Related Fields, 151, 435–474.
 Negahban et al. (2012) Negahban, S. N., Ravikumar, P., Wainwright, M. J., and Yu, B. (2012), “A unified framework for highdimensional analysis of estimators with decomposable regularizers,” Statistical Science, 27, 538–557.
 Network et al. (2008) Network, C. G. A. R. et al. (2008), “Comprehensive genomic characterization defines human glioblastoma genes and core pathways,” Nature, 455, 1061.
 Ni et al. (2019) Ni, Y., Stingo, F. C., and Baladandayuthapani, V. (2019), “Bayesian graphical regression,” Journal of the American Statistical Association, 114, 184–197.
 Peng et al. (2009) Peng, J., Wang, P., Zhou, N., and Zhu, J. (2009), “Partial correlation estimation by joint sparse regression models,” Journal of the American Statistical Association, 104, 735–746.
 Samuels and Velculescu (2004) Samuels, Y. and Velculescu, V. E. (2004), “Oncogenic mutations of PIK3CA in human cancers,” Cell Cycle, 3, 1221–1224.
 Simon et al. (2013) Simon, N., Friedman, J., Hastie, T., and Tibshirani, R. (2013), “A sparsegroup lasso,” Journal of Computational and Graphical Statistics, 22, 231–245.
 van der Wijst et al. (2018) van der Wijst, M. G., Brugge, H., de Vries, D. H., Deelen, P., Swertz, M. A., and Franke, L. (2018), “Single-cell RNA sequencing identifies cell-type-specific cis-eQTLs and coexpression QTLs,” Nature Genetics, 50, 493–497.

 Wang et al. (2019) Wang, J., Zhang, Z., and Ye, J. (2019), “Two-layer feature reduction for sparse-group lasso via decomposition of convex sets,” Journal of Machine Learning Research, 20(163), 1–42.
 Wang et al. (2013) Wang, L., Zheng, W., Zhao, H., and Deng, M. (2013), “Statistical analysis reveals coexpression patterns of many pairs of genes in yeast are jointly regulated by interacting loci,” PLoS Genetics, 9, e1003414.
 Wang et al. (2012) Wang, Y., Joseph, S. J., Liu, X., Kelley, M., and Rekaya, R. (2012), “SNPxGE2: a database for human SNP–coexpression associations,” Bioinformatics, 28, 403–410.
 Yin and Li (2011) Yin, J. and Li, H. (2011), “A sparse conditional Gaussian graphical model for analysis of genetical genomics data,” The Annals of Applied Statistics, 5, 2630.
 Zhang and Li (2022) Zhang, J. and Li, Y. (2022), “HighDimensional Gaussian Graphical Regression Models with Covariates,” Journal of the American Statistical Association, published online March 14, 2022. https://doi.org/10.1080/01621459.2022.2034632.
 Zhang et al. (2019) Zhang, J., Sun, W. W., and Li, L. (2019), “Mixed-effect time-varying network model and application in brain connectivity analysis,” Journal of the American Statistical Association, 1–15.
 Zhang et al. (2020) Zhang, Y., Zhang, N., Sun, D., and Toh, K.-C. (2020), “An efficient Hessian based algorithm for solving large-scale sparse group Lasso problems,” Mathematical Programming, 179, 223–263.