1 Introduction
As the amount of data grows in volume and variety, data integration, or the analysis of multiple sources of data simultaneously, is becoming increasingly necessary in numerous disciplines. For example, in genomics, scientists can gather data from many related, yet distinct sources including gene expression, miRNA expression, point mutations, and DNA methylation. Since all of these genomic sources interact within the same biological system, it can be advantageous to analyze them together via data integration. Ultimately, the abundance and diversity of information captured by integrated data offers an invaluable opportunity to gain a better and more holistic understanding of the phenomena at hand.
In this work, we aim to perform feature selection for a common family of integrated data sets called high-dimensional multiview data. Multiview data refers to data collected on the same set of samples, but with features from multiple sources of potentially mixed types (e.g. categorical, binary, count, proportion, continuous, and skewed continuous values). More formally, suppose we observe multiview data with K high-dimensional views (or sources), X^(1), …, X^(K), which are measured on the same n samples but with features of mixed types. We seek to recover a sparse set of features from each view associated with the response y by considering:

(1.1)   min over β^(1), …, β^(K) of  −ℓ(β^(1), …, β^(K); X, y) + λ Σ_{k=1}^{K} ‖β^(k)‖₁.

Here, β^(k) are the coefficients associated with view k, λ is a tuning parameter which regulates the sparsity level, and ℓ is the generalized linear model (GLM) log-likelihood associated with y. Note we not only consider continuous (Gaussian) responses, but also the broader class of GLMs including the Poisson (log-linear) and Bernoulli (logistic) families.
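As a concrete illustration, with a single global penalty and a Gaussian response, an objective of the form (1.1) reduces to an ordinary Lasso on the column-concatenated views. The sketch below uses entirely hypothetical simulated views and an arbitrary penalty level:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 100
# Three hypothetical views of mixed type measured on the same n samples.
X1 = rng.normal(size=(n, 30))                           # continuous view
X2 = rng.binomial(1, 0.3, size=(n, 20)).astype(float)   # binary view
X3 = rng.poisson(2.0, size=(n, 25)).astype(float)       # count view

beta1 = np.zeros(30)
beta1[:3] = 2.0                                         # sparse signal in view 1
y = X1 @ beta1 + rng.normal(scale=0.5, size=n)

# With one global penalty, the multiview objective for a Gaussian response
# is simply the Lasso applied to the column-concatenated design matrix.
X = np.hstack([X1, X2, X3])
fit = Lasso(alpha=0.1).fit(X, y)
support = np.flatnonzero(fit.coef_)
```

The point is only structural: a single λ treats all views identically, which is precisely what the scaling discussion in Section 2 argues against.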
While there are many applications for multiview feature selection in genomics, imaging, national security, economics, and other fields, major difficulties, stemming from the heterogeneity of features and how to appropriately integrate such differences, have prevented the successful use of multiview feature selection in practice. To our knowledge, no one has proposed an effective practical solution to perform feature selection with multiview data. A plethora of works have studied feature selection in the high-dimensional setting via the Lasso or GLM Lasso (Tibshirani, 1996; Yuan and Lin, 2007; Tibshirani et al., 2013; Zhao and Yu, 2006), and others have studied various data integration problems (Hall and Llinas, 1997; Shen, Olshen and Ladanyi, 2009; Acar, Kolda and Dunlavy, 2011). However, there is limited research at the intersection of the two fields.
The one area that touches on multiview feature selection is in the context of mixed graphical models, which estimate sparse graphs between features in multiview data (Yang et al., 2014a, b; Lee and Hastie, 2013; Cheng et al., 2013; Haslbeck and Waldorp, 2015). Using the nodewise neighborhood estimation approach of Meinshausen and Bühlmann (2006), mixed graphical models estimate the neighborhood of each node (i.e. feature) separately via a penalized regression model (typically based on the Lasso or GLM Lasso) and combine neighborhoods using an "AND" or "OR" rule. However, though mixed graphical models perform well in idealized settings for which theoretical guarantees have been proven, we will demonstrate in Section 2 that there are severe limitations with these approaches in realistic settings with correlated, heterogeneous features, commonly found in multiview data.

To facilitate more effective integrative analyses in practice, we investigate the understudied problem of high-dimensional multiview feature selection, and we propose a practical solution. Our work is the first to identify and critically examine the fundamental challenges of multiview feature selection, and we leverage this deep understanding of the challenges to develop a new high-dimensional multiview selection method, the Block Randomized Adaptive Iterative Lasso (BRAIL). BRAIL is a practical tool for multiview feature selection with its roots grounded in theory, and it builds upon adaptive penalties, the randomized Lasso, and stability selection (Meinshausen and Bühlmann, 2010) to overcome the issues incurred by existing methods. Our method can be used for both regression and mixed graphical selection, thus lending itself to a host of important applications.
We organize this paper as follows. In Section 2, we investigate the major challenges of multiview feature selection and highlight the literature gaps relating to these issues. We also show that the culmination of these challenges leads to poor feature recovery in existing Lasso-type methods and mixed graphical models. In Section 3, we introduce our proposed method, BRAIL, which takes steps to address the challenges from Section 2. In Section 4, we showcase the strong empirical performance of BRAIL through simulations and contrast it to existing methods. In Section 5, we further demonstrate the effectiveness of BRAIL in a novel integrative genomics case study for ovarian cancer, and we provide concluding remarks in Section 6.
2 Challenges
Before introducing our proposed method, it is instructive to understand the challenges posed by feature selection in the multiview setting. These challenges have been overlooked in previous methods and thus contribute to many of their shortcomings. In this section, we focus on the challenges faced by linear models with Lasso-type penalties due to their overwhelming popularity and desirable statistical properties (Tibshirani, 1996; Yuan and Lin, 2007; Meinshausen and Yu, 2009; Tibshirani et al., 2013; Zhao and Yu, 2006; Zhang and Huang, 2008). Given data X and response y, recall that the (GLM) Lasso solves

(2.1)   β̂ = argmin_β  −ℓ(β; X, y) + λ ‖β‖₁,

where λ is a regularization parameter and ℓ is the GLM log-likelihood associated with the response. For clarity, we use the term "Lasso" to refer to the ℓ₁-penalized model with continuous (Gaussian) responses, "GLM Lasso" to mean the ℓ₁-penalized model with non-Gaussian GLM responses (e.g. binary, Poisson), and "Lasso-type" methods to mean either the Lasso or GLM Lasso with some form of ℓ₁ penalty (e.g. a global penalty, separate penalties, adaptive penalties).
Our focus here is not on deriving new theoretical guarantees for the Lasso in multiview settings. Rather, we highlight deep practical concerns, which are rooted in theory and commonly arise in feature selection for data integration. By identifying these practical challenges, we open up numerous avenues for future theoretical research and set the stage for the construction of a new method, which overcomes the identified issues.
2.1 Motivating Example
To first illustrate the current challenges and motivate the need for a solution, we present in Figure 3 the estimated graphs from common Lasso-type methods and our proposed method when applied to real ovarian cancer genomics data. Here, the samples carry features from three views: count-valued RNA-Seq data, continuous miRNA data, and proportion-valued methylation data (refer to Section 5 for data collection and preprocessing details). As in several previous graphical models and mixed graphical models (Meinshausen and Bühlmann, 2006; Ravikumar et al., 2010; Jalali et al., 2011), we estimated the graphs using nodewise neighborhood selection. We then combined neighborhoods using the "AND" rule and applied stability selection (Meinshausen and Bühlmann, 2010; Liu, Roeder and Wasserman, 2010) with a fixed stability threshold to select stable edges.
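The nodewise neighborhood selection procedure with the "AND" rule can be sketched as follows. For simplicity, this toy example uses Gaussian-only data and a single fixed penalty, whereas the analysis above uses mixed GLMs and stability selection; the chain-structured data and penalty level are hypothetical:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p = 200, 10
# Chain graph: each variable depends on its predecessor.
X = np.empty((n, p))
X[:, 0] = rng.normal(size=n)
for j in range(1, p):
    X[:, j] = 0.6 * X[:, j - 1] + rng.normal(scale=0.8, size=n)

lam = 0.1
neighbors = []
for j in range(p):
    # Regress node j on all other nodes; nonzero coefficients define
    # its estimated neighborhood.
    others = np.delete(np.arange(p), j)
    fit = Lasso(alpha=lam).fit(X[:, others], X[:, j])
    neighbors.append(set(others[np.flatnonzero(fit.coef_)]))

# "AND" rule: keep edge (j, k) only if each node selects the other.
edges = {(j, k) for j in range(p) for k in range(j + 1, p)
         if k in neighbors[j] and j in neighbors[k]}
```

Swapping the Gaussian Lasso for a GLM Lasso at each node, with a view-appropriate family, gives the mixed graphical model estimators compared in Figure 3.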
Figure 3 specifically compares three types of estimation schemas at each node: (a) GLM Lasso with one global penalty, (b) GLM Lasso with separate penalties for each view, and (c) our proposed BRAIL algorithm (introduced in Section 3). The first two methods have been proposed in several mixed graphical models (Yang et al., 2014a; Chen, Witten and Shojaie, 2014; Haslbeck and Waldorp, 2015) and satisfy strong theoretical guarantees in idealized settings. In the real data example, however, the Lasso-type methods are unstable (illustrated by the small number of selected edges), favor feature selection within one view, and select only a few edges between views. This overall instability indicates that the Lasso-type methods are not robust to small perturbations of the data and raises serious concerns about the reproducibility and reliability of the results (Yu, 2013). Our proposed BRAIL algorithm, in contrast, avoids these issues and exhibits greater stability as well as balance, selecting a larger number of within- and between-block edges under the same thresholding value. We will later see through extensive simulations in Section 4 that the issues with existing Lasso-type methods observed here are recurring problems in very general multiview scenarios.
To begin understanding why existing Lasso-type methods struggle in practice, we identify and study four major challenges of feature selection for high-dimensional multiview data: 1) scaling, 2) ultra-high-dimensionality, 3) signal interference, and 4) domain-specific beta-min. These issues stem from a combination of domain differences, signal differences, and the high-dimensionality of each view, and together, these challenges can have a significant adverse effect on feature recovery for data integration. We next examine each of these challenges in detail.
2.2 Scaling
The first and most obvious challenge with integrative analyses revolves around scaling. That is, each view in a multiview data set is often measured on a different scale, and it is unclear how to most effectively integrate such differences. Many believe that normalizing all features to mean 0 and variance 1 remedies the scaling differences, but this is not always the case. Even after centering and scaling, data views remain distinct if they differ in ways beyond the first and second moments. This issue is especially problematic with binary and count-valued data blocks, two common types in multiview data, since their distributions are characterized by much higher moments. We thus highly discourage using the ordinary (GLM) Lasso with a single penalty (2.1) on normalized multiview data.

Now while one can use different regularization parameters for each view to help alleviate the scaling differences, this generates another set of issues that are complicated by the following challenges. We will revisit the scaling issue in light of these complications later in this section.
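A quick numerical check illustrates why centering and scaling alone cannot remove domain differences: a standardized rare binary column matches a Gaussian column in its first two moments but not in its third. The sample size and success probability below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
gauss = rng.normal(size=n)
binary = rng.binomial(1, 0.1, size=n).astype(float)   # rare binary feature

def standardize(x):
    return (x - x.mean()) / x.std()

z_gauss, z_binary = standardize(gauss), standardize(binary)

# Both columns now have mean 0 and variance 1, but the binary column is
# still heavily skewed: the skewness of Bernoulli(q) is (1 - 2q)/sqrt(q(1-q)),
# roughly 2.67 at q = 0.1, versus 0 for the Gaussian column.
skew_gauss = np.mean(z_gauss ** 3)
skew_binary = np.mean(z_binary ** 3)
```

The same mismatch appears in the fourth and higher moments, which is why normalized binary and count blocks remain fundamentally different from Gaussian blocks.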
2.3 Ultra-High-Dimensionality
In addition to the scaling issue, performing exact feature selection with the Lasso is already difficult in the ordinary high-dimensional setting. For exact feature selection, the number of samples must be above a theoretical minimum known as the sample complexity. In the highly idealized scenario of an iid standard Gaussian design and a Gaussian response, Wainwright (2009) showed that the sample complexity scales at approximately 2s log(p − s), where p is the number of features and s is the number of nonzero features. This idealized lower bound can be difficult to attain in many applications including genomics, where typical values of p and s demand a number of patients attainable only through a large and highly expensive study. We informally refer to the regime where n exceeds the sample complexity as "high-dimensional" and the regime where n falls below it as "ultra-high-dimensional." Roughly, the Lasso can never perform exact feature selection in the ultra-high-dimensional regime.
For non-Gaussian responses and more realistic designs such as correlated, heterogeneous views in multiview data, the sample complexity is significantly higher than the idealized Gaussian bound (Chen, Witten and Shojaie, 2014; Ravikumar et al., 2010). As an example, the Poisson GLM's sample complexity grows substantially faster than the Gaussian bound (Yang et al., 2015), so even moderate values of p and s can require sample sizes that are practically unattainable. This problem is further exacerbated in multiview settings since combining multiple high-dimensional views for data integration almost always results in an ultra-high-dimensional problem.
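To get a feel for the idealized Gaussian bound discussed above, the approximate sample complexity 2s log(p − s) can be evaluated directly; the problem sizes below are hypothetical genomics-scale values:

```python
import math

def lasso_sample_complexity(p, s):
    """Approximate Wainwright (2009) threshold for exact support recovery
    with an iid standard Gaussian design: n on the order of 2 s log(p - s)."""
    return 2 * s * math.log(p - s)

# Hypothetical genomics-scale numbers: even a modest sparsity level pushes
# the required sample size into the hundreds of patients.
n_needed = lasso_sample_complexity(p=20_000, s=30)   # roughly 594 samples
```

The bound grows linearly in s, so doubling the sparsity level roughly doubles the required study size; non-Gaussian designs only make this worse.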
2.4 Signal Interference
The third challenge we identify stems from a problem with the Lasso known as shrinkage noise. Su, Bogdan and Candes (2015) showed that with high probability, no matter how strong the effect sizes, false discoveries appear early on the Lasso path due to pseudo-noise introduced by shrinkage in the high- and ultra-high-dimensional regimes. When the Lasso selects its first few features using large regularization parameters, the residuals still contain much of the signal associated with the selected features, and it is this extra noise which Su, Bogdan and Candes (2015) call shrinkage noise.

In the multiview context, shrinkage noise becomes a very complex and serious issue due to the different signals across blocks. Since the Lasso naturally selects features from the block with the highest signal first, the resulting shrinkage noise will mask the weaker signals from other blocks and compromise our ability to select from these weaker blocks. We refer to this adverse consequence of shrinkage noise as signal interference.
The problem of shrinkage noise has not been widely studied beyond the iid Gaussian design in Su, Bogdan and Candes (2015), but we provide strong empirical evidence in Figure 5 that confirms the existence of shrinkage noise and signal interference in non-Gaussian multiview settings. In the case of an iid Gaussian and an iid binary block, shown in Figure 5, we achieve perfect recovery in the Gaussian block when the signal-to-noise ratio (SNR) in the binary block is negligible, but as the SNR of the binary block increases, it interferes with our ability to recover the Gaussian features in the small-sample scenario. This signal interference is especially disastrous in the GLM Lasso with binary responses, where support recovery in the Gaussian block tends to zero. When we increase the sample size, however, there is no decline in the recovery of the Gaussian block in Figure 5(a). This agrees with the known result from Su, Bogdan and Candes (2015) that shrinkage noise occurs when the Lasso's theoretical conditions are violated and, in particular, when n is not sufficiently large.
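A small simulation illustrates the entry-order effect behind signal interference: when one block carries a much stronger signal, its features dominate the beginning of the Lasso path. The design below is a simplified, hypothetical version of the two-block experiment described above:

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(0)
n = 200
# Weak Gaussian block (features 0-99) and strong binary block (100-199).
Xg = rng.normal(size=(n, 100))
Xb = rng.binomial(1, 0.5, size=(n, 100)).astype(float)
X = np.hstack([Xg, Xb])
X = (X - X.mean(axis=0)) / X.std(axis=0)

beta = np.zeros(200)
beta[:5] = 1.0        # weak signal in the Gaussian block
beta[100:105] = 5.0   # strong signal in the binary block
y = X @ beta + rng.normal(size=n)

# Order of entry along the Lasso path: the strong binary features enter
# first, and their shrinkage noise is what masks the weaker Gaussian block.
_, entry_order, _ = lars_path(X, y, method="lasso")
first5 = list(entry_order[:5])
n_binary_first = sum(j >= 100 for j in first5)
```

At smaller sample sizes, the residual pseudo-noise left by these early selections is what degrades recovery of the weak block, consistent with Figure 5.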
2.5 Domain-Specific Beta-min Condition
Finally, analogous to how signal differences can exacerbate the Lasso's shrinkage noise, domain differences in multiview problems can complicate the Lasso's beta-min condition, which establishes a lower bound for the minimum amount of signal (i.e. SNR) required for feature recovery (Zhao and Yu, 2006; Meinshausen and Bühlmann, 2006; Bühlmann et al., 2013).
In Figure 6, we report our ability to recover the true features for a simple simulation with iid features from four data types (Gaussian, binary, uniform, and Poisson). In each subplot, the red and blue solid lines indicate the minimum SNR needed to recover nearly all of the true features at two different sample sizes. We observe that the minimum SNR requirement varies based upon the domain of the features and that the Gaussian predictors can tolerate the lowest SNR. These empirical results reveal that if two blocks have the same amount of signal but are from different domains, we can only recover the features that pass the minimum signal threshold dictated by the domains. Put concretely, if we were to perform feature selection on our simulated multiview data at a fixed sample size and SNR, we could recover most of the true features in the Gaussian block but only a fraction of the true features in the binary block. This observed phenomenon agrees with previous work, which has shown that an increase in sparsity effectively reduces the SNR in the high-dimensional setting (Wang, Wainwright and Ramchandran, 2010). Beyond this, however, the beta-min condition has been relatively unexplored for the GLM Lasso under domain differences and remains a ripe area for future theoretical work.
2.6 Additional Challenges
It is important to note that the four challenges above do not act independently from one another. In fact, the main source of difficulty with multiview feature selection is arguably the interactions between challenges. For instance, consider the problem of selecting features from high-dimensional discrete blocks with weak signals. The ultra-high-dimensionality issue can exacerbate the already existing problem of signal interference, which can then worsen scaling issues, increase minimum SNR requirements, and amplify the overall difficulty of the problem.

In conjunction with these complex interactions, the need to select an appropriate amount of regularization through model selection methods can also increase the difficulty of multiview feature selection. We will compare three common selection methods, namely, stability selection (Meinshausen and Bühlmann, 2010; Liu, Roeder and Wasserman, 2010), cross-validation (Stone, 1974; Allen, 1974; Shao, 1993), and extended BIC (Chen and Chen, 2012), and discuss their additional challenges in Section 4.
We lastly note that the majority of our discussion has been focused on the Lasso. Feature selection is even more challenging for the GLM Lasso. Chen, Witten and Shojaie (2014) investigated this for mixed graphical models and concluded that the predictors associated with Gaussian responses are easier to recover than those with responses from other exponential families. Specifically, Gaussian responses require fewer samples, allow for a wider range of tuning parameters, and generally have a higher probability of success.
2.7 Challenges with Existing Methods
Having identified a host of challenges, we return to address why common Lasso-type methods are not well-suited for multiview feature selection.
Beginning with its simplest form, the Lasso with a single global penalty (2.1) uses the same penalty for all views and does not alleviate the scaling or signal interference issues in multiview data. The consequences of these problems are evident in Figure 3(a), especially in the case of non-Gaussian blocks with weak signals, where fewer edges are selected within the proportion-valued methylation block.
By employing the Lasso with separate penalties for each data view, we can mitigate the issue of scaling. Nevertheless, model selection becomes more challenging with multiple penalties, and signal interference remains a driver of poor recovery. In fact, having a separate penalty for each view exacerbates signal interference and encourages selection from the block with the strongest signal and no selection from the blocks with weak signals. This signal interference is exemplified in Figure 3 by the extreme selection imbalance among views, with almost no selection in the miRNA block and heavy selection in the RNA-Seq block.
In the Adaptive Lasso (Zou, 2006), the amount of regularization associated with feature j is typically proportional to 1/|β̂_j|^γ for some initial estimate β̂ and constant γ > 0. This adaptive penalty mitigates the scaling issue by adjusting for signal differences through the initial estimate, but it also encourages selection of features with higher signals and penalizes features with weaker signals. The Adaptive Lasso hence compounds signal interference by treating weaker signals as noise and results in little to no selection in the blocks with weak signals.
While the previous methods all struggle with signal interference, one simple way to reduce the signal interference between blocks is to perform separate Lassos for each data view. Since independently estimated blocks cannot possibly interfere with one another, this method addresses both scaling and signal interference issues. It also avoids the problem of ultra-high-dimensionality. However, each view by itself usually does not contain sufficient information to explain much of the variability in the response, and we lose the advantages of data integration.
Beyond the Lasso-type methods, there are selection methods with nonconvex penalties such as SCAD and MCP (Fan and Li, 2001; Zhang et al., 2010). These nonconvex penalties tend to scale better than the Lasso-type penalties but are still not variable selection consistent in the ultra-high-dimensional regime, especially for non-Gaussian responses and highly correlated data. We investigate MCP/SCAD feature selection in Table 6 in the Appendix, but our primary focus in this paper is on the more commonly used Lasso-type penalties.
3 Block Randomized Adaptive Iterative Lasso
Driven by the many challenges and the lack of effective tools, we propose a new method for multiview feature selection, the Block Randomized Adaptive Iterative Lasso (BRAIL). For the sake of notation, suppose we observe the response vector y and multiview data X with K views of potentially mixed types, n samples, and p total features. We will assume p > n and typically p_k > n for each view k. Let S denote the indices of the support, and let X_S denote the columns of X indexed by S. We will introduce BRAIL in the context of regression and later discuss its extension to graph selection.

Under the regression framework, the goal of BRAIL can be viewed as twofold: 1) to select features from each view that are associated with the response y, and 2) to do so while avoiding the challenges discussed in Section 2. With this goal in mind, we briefly outline the BRAIL algorithm in Algorithm 1 and summarize the key steps taken to overcome the current challenges.
At a high level, BRAIL iterates across the data blocks and estimates each data block separately while holding all other blocks fixed. This iterative procedure is motivated by the advantages of performing separate Lassos, namely, that it mitigates the ultra-high-dimensionality and signal interference issues. Then, within each of the individual block estimations, BRAIL first estimates the block's support and subsequently the coefficient values given the support. Here, BRAIL leverages ideas from adaptive weighting schemes, stability selection, and the randomized Lasso in an attempt to reduce the scaling discrepancies and domain-specific beta-min issues.
We next provide the full BRAIL algorithm in Algorithm 2 and proceed to discuss each step of the BRAIL algorithm in greater detail.

Initialization:

Set t = 0.

Initialize the coefficient estimates β̂^(k, 0) for each block k = 1, …, K at a pre-specified sparsity level.

Reorder the blocks in the data X, if necessary.

Do:

Set t = t + 1.

For k = 1, …, K, estimate block k, holding blocks j < k (at iteration t) and blocks j > k (at iteration t − 1) fixed:

1. Update Ŝ_k, the estimated support for block k:

(a) Set the adaptive regularization weight λ_j for each feature j in block k:

(3.1) [multiplicative adaptive weight; see Section 3.3]

where

(3.2) [block correction factor; see Section 3.3]

and Î is the estimated Fisher information matrix.

(b) Perform stability selection:

i. Take B bootstrap samples (X̃^(b), ỹ^(b)), b = 1, …, B.

ii. Solve the randomized Lasso: for each b,

(3.3)   β̂^(b) = argmin_β −ℓ(β; X̃^(b), ỹ^(b)) + Σ_j λ_j W_j |β_j|,

where W_j is an independent random perturbation of the penalty weights.

iii. Select features at stability level τ:

(3.4)   Ŝ_k = { j : (1/B) Σ_{b=1}^{B} 1(β̂_j^(b) ≠ 0) ≥ τ }.

2. Update β̂^(k, t), the estimated nonzero coefficients for block k:

(3.5)   β̂^(k, t) = argmin over β with supp(β) ⊆ Ŝ_k of −ℓ(β; X, y) + ε ‖β‖₂², where ε > 0 is a small ridge penalty.

Until: S_±(β̂^(t)) = S_±(β̂^(t − 1)), where S_±(·) denotes the signed support of a vector.

Output: β̂^(t).
3.1 Initialization
In our proposed BRAIL algorithm, the coefficients are first initialized to a pre-specified sparsity level (e.g. a fixed number of nonzero features per block) by fitting separate Lasso path regressions for each block. We then must specify the order of the blocks to iterate over. This ordering can affect the performance of BRAIL, as accurate estimation of the first block improves subsequent estimations of other blocks. Since previous Lasso results guarantee a high probability of support recovery when n is sufficiently large compared to the block dimension, we strongly advise estimating the blocks with the smallest dimension first, especially if the block dimensions differ substantially. If the dimensions of all the blocks are of similar sizes or much larger than n, we recommend starting with Gaussian blocks, which tend to have better support recovery than non-Gaussian blocks.
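This initialization step might be sketched as follows, using the Lasso path to cap each block at a pre-specified number of features; the simulated views and the sparsity level s = 5 are hypothetical:

```python
import numpy as np
from sklearn.linear_model import lars_path

def init_support(X_block, y, s):
    """Fit the Lasso path on a single view and keep the first s features
    to enter, giving a pre-specified per-block sparsity level."""
    _, entry_order, _ = lars_path(X_block, y, method="lasso")
    return list(entry_order[:s])

rng = np.random.default_rng(0)
n = 80
views = [rng.normal(size=(n, p)) for p in (50, 40, 30)]   # hypothetical blocks
beta = np.zeros(50)
beta[:4] = 3.0                      # signal lives in the first view
y = views[0] @ beta + rng.normal(size=n)

supports = [init_support(Xk, y, s=5) for Xk in views]
```

Each block's initial coefficients would then be fit on its truncated support, before the iterative block-by-block sweeps begin.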
3.2 Estimating the Support
After initialization, we repeatedly iterate across the data blocks and estimate the support of each block, holding the estimates of all other blocks fixed. This block-wise support estimation is performed using stability selection with the randomized Lasso (Meinshausen and Bühlmann, 2010). As given by step 1(b) in Algorithm 2, we solve the Lasso repeatedly over bootstrap samples with randomized penalty terms, and we threshold the stability score (3.4) to select the most stable features. Since the randomized Lasso is known to be feature selection consistent even when the Lasso's irrepresentable condition is violated (Meinshausen and Bühlmann, 2010), BRAIL leverages the randomized penalties and stability in order to adequately handle correlated features.
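A minimal sketch of stability selection with the randomized Lasso, assuming a Gaussian response and hypothetical settings for the number of bootstrap samples B, the perturbation range, and the stability threshold (the randomized per-feature penalty is implemented equivalently by rescaling columns):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, B = 100, 50, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = 2.0
y = X @ beta + rng.normal(size=n)

lam, weakness, tau = 0.1, 0.5, 0.6
counts = np.zeros(p)
for _ in range(B):
    idx = rng.integers(0, n, size=n)            # bootstrap sample
    W = rng.uniform(weakness, 1.0, size=p)      # random penalty perturbation
    # Scaling column j by W_j under a uniform penalty lam is equivalent
    # (in selected support) to penalizing feature j at level lam / W_j.
    fit = Lasso(alpha=lam).fit(X[idx] * W, y[idx])
    counts += fit.coef_ != 0

stability = counts / B                          # stability score per feature
selected = np.flatnonzero(stability >= tau)
```

BRAIL additionally multiplies the random perturbations by the block-specific adaptive weights of Section 3.3, which this toy sketch omits.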
3.3 Adaptive Regularization
Like the usual randomized Lasso, our penalty term in (3.3) includes a random weight W_j. However, in order to account for the scaling discrepancies, signal variability, and domain differences between blocks, we introduce a block-specific adaptive penalty in (3.3) as well. For feature j in block k, we define the adaptive weight through the multiplicative update in (3.1), built from the block correction factor in (3.2). Here, Î is the estimated Fisher information matrix corresponding to the GLM of the response y, and Λ_max(·) denotes the maximum eigenvalue.
In this definition of the adaptive weight, there are two moving parts. First, the multiplicative scheme in (3.1) encourages previously selected features to remain selected while still allowing all features to freely enter or exit the model. Second, the correction factor in (3.2) accounts for the heterogeneity of multiview data and helps to mitigate the challenges of Section 2.
Though the exact form of the correction factor was derived experimentally, it can be interpreted as the product of three factors, each of which is rooted in solid theoretical foundations. For instance, part (c) of (3.2) is closely related to the theoretical bound on the regularization parameter needed for selection consistency of the Lasso (Zhao and Yu, 2006; Meinshausen and Bühlmann, 2006). The ratio of eigenvalues in part (a) (i.e. the domain correction term) is motivated by the theoretical conditions imposed on the Fisher information matrix for exponential family distributions (Yang et al., 2015), and part (b) of (3.2) (i.e. the signal correction term) can be viewed as the average signal in a block since it is derived from the theoretical sparsity level within each block (Bunea et al., 2007). While in theory this specific combination of weights should correct for domain differences and signal differences across views, we reinforce our choice of weights through strong empirical results in Section 4.
3.4 Coefficient Estimations
After estimating the block-wise support using the randomized Lasso with adaptive weights, we seek to estimate the coefficients of the support as accurately as possible, since these values are used in future block estimations and iterations of BRAIL. We thus fit a penalized regression model with a small ridge penalty in (3.5). The small penalty introduces only a little bias and ensures that we can still estimate coefficients when the selected support is larger than the sample size.
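The ridge refit on the estimated support can be sketched as below; the support, the small penalty value, and the data are hypothetical, and the point is that the refit stays well-posed even when the support exceeds the sample size:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:60] = 1.0                      # a support larger than n is possible
y = X @ beta + rng.normal(size=n)

support = np.arange(60)              # pretend this came from stability selection
# A small ridge penalty keeps the refit well-posed even with |support| > n,
# where an unpenalized least-squares refit would be rank-deficient.
fit = Ridge(alpha=1e-2).fit(X[:, support], y)
coef = np.zeros(p)
coef[support] = fit.coef_            # coefficients off the support stay zero
```

For non-Gaussian responses, the same idea applies with an ℓ₂-penalized GLM fit in place of the Gaussian ridge regression.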
3.5 Convergence
We finally declare convergence of BRAIL’s iterative blockbyblock estimation procedure when the estimated support remains unchanged. Our empirical analysis indicates that BRAIL has quick support convergence, and we provide one example of this fast convergence in Figure 17 in the Appendix. Using the ovarian cancer simulation (see Section 5) for three different responses (Gaussian, binary, and Poisson), we report that the average number of iterations until convergence is between 4 and 5 with the maximum number of iterations reaching 15 (over 100 runs). These ranges are similar for all designs, confirming BRAIL’s fast convergence.
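The convergence criterion compares signed supports across successive sweeps, which might look like the following; the coefficient vectors are hypothetical:

```python
import numpy as np

def signed_support(beta, tol=0.0):
    """Sign pattern of a coefficient vector; the iteration stops when this
    pattern is unchanged across a full sweep of the blocks."""
    return np.sign(np.where(np.abs(beta) > tol, beta, 0.0))

prev = np.array([0.0, 1.2, -0.3, 0.0])
curr = np.array([0.0, 0.9, -0.1, 0.0])   # values moved, signs did not
converged = bool(np.array_equal(signed_support(prev), signed_support(curr)))
```

Comparing signed supports rather than raw coefficient values makes the stopping rule insensitive to small numerical drift in the refitted coefficients.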
3.6 BRAIL Summary
While we have introduced BRAIL under the regression framework, BRAIL can be naturally extended to estimate mixed graphical models via a penalized nodewise regression approach (Meinshausen and Bühlmann, 2006). As in the motivating example in Section 2, we can use BRAIL to estimate the neighborhood of each node separately via penalized regressions and then combine the neighborhoods using an "AND" or "OR" rule to obtain the graph.
In either the regression or graph selection setting, our BRAIL algorithm deliberately takes steps to exploit the practical advantages of existing Lasso-type methods while avoiding the drawbacks described in Section 2. For instance, by performing iterative block-by-block estimations, BRAIL inherits the advantages of performing separate Lassos and avoids the issues of ultra-high-dimensionality and signal interference. We also mitigate the scaling and beta-min problems by engineering adaptive penalties in BRAIL to correct for domain and signal differences between blocks. Still, selecting an appropriate amount of regularization can be very challenging in practice due to highly correlated data. BRAIL thus incorporates randomized stability selection, which is known to be feature selection consistent under stronger and more complex dependencies than can be handled by the Lasso. This boosts the support estimation of correlated features, and together with the previous components, BRAIL effectively overcomes the many practical challenges of multiview feature selection and lends itself to a plethora of data integration applications.
4 Numerical Studies
We next reinforce the theoretically-guided choices in our BRAIL construction and demonstrate its effectiveness through extensive simulations. In these simulations, we evaluate BRAIL against four common Lasso-based parametric methods: (i) Lasso with a global penalty for all blocks, (ii) Lasso with separate penalties for each block, (iii) separate Lassos for each block, and (iv) Adaptive Lasso. For the Adaptive Lasso, we use ridge weights as they are better adapted to handle correlated features. Moreover, to avoid biases from penalty selection methods, we use oracle information to select features in the Lasso-based models. That is, if s is the number of true features in the simulation, we fit the full path of the Lasso and select the first s features. In the case of the Lasso with separate penalties, we select the combination of penalties yielding the largest number of true positives. We do not, however, use oracle information for BRAIL. Instead, BRAIL internally selects the number of features using stability selection with a fixed threshold as outlined in Algorithm 2.
To systematically compare these methods, we simulate from various designs of X with three blocks, namely, a Gaussian, a Bernoulli, and a Poisson block, and various types of GLM responses y. Due to the popular use of the Gaussian, Bernoulli, and Poisson GLMs, we run simulations with responses from each of these families. For the Gaussian response, we fit the linear model y = Xβ + ε with Gaussian noise ε. For the binary and Poisson responses, we use copula transformations (Nelsen, 1999) to simulate y.
In addition to these response models for y, we consider four simulation designs for X to understand model behavior under different assumptions. The four simulation designs are: (i) iid features, (ii) independent features with nonconstant variance, (iii) correlated features with covariance structure from a Block Directed Markov Random Field, and (iv) a real data inspired simulation with features from The Cancer Genome Atlas (TCGA) ovarian cancer study. We elaborate on each of these designs below.
Note that in all of the simulations, we set the number of true features in each covariate block to 10, and the magnitudes of the true features are drawn at random with random sign assignment. Unless stated otherwise, the numbers of samples and features are held fixed across simulations. We also center and scale the design matrix before estimation.
iid Design. For each of the three covariate blocks, we simulate samples with iid features, considering both a high-dimensional design and a low-dimensional design.
Heteroscedasticity Design. In this design, we assume that the features are independent but have nonconstant variance. For the Gaussian block, the entries in each column are simulated from a normal distribution with a column-specific variance. In the Bernoulli block, each column is simulated independently with entries drawn from a Bernoulli distribution with a column-specific success probability. Similarly, in the Poisson block, the mean of each column is drawn from a Gamma distribution (using the shape/scale parameterization).

Block Directed Graph Design. We next drop the independence assumption and use a Block Directed Markov Random Field (BDMRF) (Yang et al., 2014a) graph to simulate correlated features. In this case, X is simulated via Gibbs sampling with the partial ordering of the underlying mixed graph given by a pairwise Gaussian conditional random field (CRF), a pairwise Ising CRF (Ravikumar et al., 2010), and a pairwise Poisson Markov Random Field (MRF) (Yang et al., 2013, 2012). We set high correlations for the Gaussian and Poisson blocks and low correlations for the binary block and the between-block structure.
Ovarian Cancer Inspired Simulation Design. In an attempt to simulate data closest to real-world scenarios, we take the continuous-valued miRNA data, proportion-valued methylation data, and count-valued RNA-Seq data from The Cancer Genome Atlas (TCGA) ovarian cancer database (The Cancer Genome Atlas Research Network, 2011) to be our covariates. After merging and preprocessing the TCGA ovarian cancer data (refer to Section 5 for details), we arrive at samples and , , and features.
Under each of these simulation scenarios, we evaluate the performance of BRAIL and the oracle Lasso-type methods by reporting the true positive rate (TPR) and false discovery proportion (FDP) for overall feature recovery and for the individual block recoveries. Due to the large number of features, we use the FDP, defined as the number of false positives divided by the total number of recovered nonzero features, instead of the false discovery rate.
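The two recovery metrics just defined can be computed directly from the estimated and true supports; a minimal sketch (with the common convention that FDP is 0 when no features are selected):

```python
import numpy as np

def tpr_fdp(beta_hat, beta_true):
    """True positive rate and false discovery proportion for support
    recovery: TPR = (# true features recovered) / (# true features),
    FDP = (# false positives) / (# selected features), with FDP
    defined as 0 when nothing is selected."""
    selected = beta_hat != 0
    true_support = beta_true != 0
    tp = np.sum(selected & true_support)
    tpr = tp / max(true_support.sum(), 1)
    fdp = 0.0 if selected.sum() == 0 else (selected.sum() - tp) / selected.sum()
    return float(tpr), float(fdp)
```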
We summarize the results of our simulations with Gaussian responses in Table 1 and those with binary and Poisson responses in Table 2. Note that for the binary and Poisson responses, we show the block directed graph results here and provide the remaining simulation results in the Appendix. We also highlight in bold the best TPR/FDP combination for overall recovery, i.e. the highest TPR × (1 − FDP) value. In almost all scenarios, the results in Tables 1 and 2 indicate that BRAIL (with no oracle information) achieves a higher TPR and lower FDP than the competing Lasso-type methods with oracle information.
When oracle information is unavailable, model selection techniques can introduce additional errors and further complicate feature selection. Table 3 shows one such case and compares the block directed graph simulation performance of BRAIL against the Lasso-type methods using 5-fold cross-validation, extended BIC, and stability selection to select the penalty parameters. We also include the oracle estimators for the same set of simulations to emphasize the large decrease in performance when the Lasso-type methods do not have oracle information. These simulations indicate that cross-validation tends to overselect the number of features in the model while extended BIC underselects; stability selection performs the best of the three but pales in comparison to oracle selection. BRAIL, in contrast, outperforms the Lasso-type methods even when oracle selection is used for these competing methods. Additional simulations, confirming the strong empirical performance of BRAIL, are provided in the Appendix.
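To make the extended BIC comparison concrete, a sketch of eBIC-based penalty selection over a Lasso path for the Gaussian case is given below, following Chen and Chen (2012): eBIC(λ) = n·log(RSS/n) + df·log(n) + 2γ·df·log(p). The grid size and γ = 0.5 are illustrative choices, not the paper's exact settings.

```python
import numpy as np
from sklearn.linear_model import lasso_path

def ebic_select(X, y, gamma=0.5, n_alphas=50):
    """Select the Lasso penalty by minimizing the extended BIC,
    eBIC = n*log(RSS/n) + df*log(n) + 2*gamma*df*log(p).
    A sketch for the Gaussian case; gamma in [0, 1] adds an extra
    penalty that guards against overselection when p >> n."""
    n, p = X.shape
    alphas, coefs, _ = lasso_path(X, y, n_alphas=n_alphas)
    best = (np.inf, None, None)
    for k, alpha in enumerate(alphas):
        beta = coefs[:, k]
        df = int(np.count_nonzero(beta))
        rss = float(np.sum((y - X @ beta) ** 2))
        ebic = n * np.log(max(rss, 1e-12) / n) + df * np.log(n) + 2 * gamma * df * np.log(p)
        if ebic < best[0]:
            best = (ebic, alpha, beta)
    return best[1], best[2]
```

Replacing the eBIC score with a cross-validated prediction error yields the CV variant discussed above, which tends to retain more features.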
5 Case Study: Integrative Genomics of Ovarian Cancer
One promising practical application for our research on multiview feature selection lies in integrative cancer genomics. Here, scientists seek to integrate multiple sources of high-throughput genomic data to more holistically model the genomic systems in cancer cells, leading to a better understanding of disease mechanisms and possible therapies.
In this case study, we seek to integrate three different types of genomic data to study how epigenetics and short RNAs influence the gene regulatory system in ovarian cancer. Specifically, we are interested in discovering miRNAs and CpG sites that affect the gene expression of well-known oncogenes in ovarian cancer and hence can serve as potential drug targets for blocking or decreasing the expression of these oncogenes. Driven by this goal of discovering potential drug targets, we use our proposed BRAIL method to estimate the integrative ovarian cancer gene regulatory network with the specific intention of identifying miRNAs and CpG sites that are directly linked to known oncogenes of ovarian cancer.
In this investigation, we integrate the following three data sets from The Cancer Genome Atlas (TCGA) ovarian cancer study, which is publicly available (The Cancer Genome Atlas Research Network, 2011): (1) count-valued gene expression measured via RNA-Seq, (2) continuous (Gaussian) miRNA expression, and (3) proportion-valued DNA methylation data. The TCGA data originally contained genes, CpG sites, and miRNAs but only common patients across all three data sets of interest. We hence reduced the number of features to manageable sizes by first filtering features according to their association with several important clinical outcomes: survival via a univariate Cox model, chemoresistance via a univariate logistic model, and recurrence via a univariate logistic model. In addition, we transformed the RNA-Seq data using the Kolmogorov-Smirnov test to alleviate the problem of very large counts (up to 20,000). This preprocessing yielded genes, CpG sites, and miRNAs in the RNA-Seq, methylation, and miRNA data sets, respectively. Lastly, per the recommendation of scientists, we included 20 additional highly mutated genes that were experimentally identified as important in ovarian cancer, resulting in genes in the RNA-Seq data set.
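The univariate filtering step for a binary outcome (e.g. chemoresistance or recurrence) can be sketched as follows: fit a one-feature logistic model per column and keep the features with the largest likelihood-ratio statistics against the intercept-only model. This is a hedged sketch, not the paper's exact pipeline; the cutoff k is a hypothetical parameter, and the companion univariate Cox screening for survival is omitted.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def univariate_logistic_screen(X, y, k=100):
    """Sketch of univariate logistic screening: rank features by the
    likelihood-ratio statistic of a single-feature logistic model
    versus the intercept-only model, and keep the top k."""
    n, p = X.shape
    p0 = y.mean()
    null_ll = n * (p0 * np.log(p0) + (1 - p0) * np.log(1 - p0))
    scores = np.empty(p)
    for j in range(p):
        # large C approximates an unpenalized fit across sklearn versions
        clf = LogisticRegression(C=1e6, max_iter=1000).fit(X[:, [j]], y)
        prob = np.clip(clf.predict_proba(X[:, [j]])[:, 1], 1e-12, 1 - 1e-12)
        ll = np.sum(y * np.log(prob) + (1 - y) * np.log(1 - prob))
        scores[j] = 2 * (ll - null_ll)  # likelihood-ratio statistic
    keep = np.argsort(scores)[::-1][: min(k, p)]
    return np.sort(keep)
```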
To estimate the integrated ovarian cancer network, we fit a Block Directed Markov Random Field (BDMRF) model (Yang et al., 2014a) using BRAIL to estimate the neighborhood of each node in the graph. Note that since miRNAs and methylation are both gene regulatory mechanisms, miRNAs and methylation can affect expression levels (measured via RNA-Seq), but the converse is not possible. To agree with this known physical mechanism, we set the partial ordering of the mixed graph underlying the BDMRF as , where is a pairwise Ising MRF for the proportion-valued methylation data, is a pairwise Gaussian MRF for the continuous miRNA data, and is a pairwise Poisson CRF for the count-valued RNA-Seq data. However, we recall that only negative conditional dependencies are permitted in the Poisson MRF and CRF models. Since this constraint is unrealistic for genomics data, we fit a Sub-Linear Poisson CRF, in lieu of the usual Poisson CRF, to allow for both positive and negative conditional dependencies (Yang et al., 2013). Under this specified BDMRF model, we employ nodewise neighborhood selection (Meinshausen and Bühlmann, 2006; Yang et al., 2015) using BRAIL to learn the edge structure of the integrated network.
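The nodewise neighborhood selection scheme can be sketched as follows: each node is regressed on all remaining nodes with a sparse penalized fit, and the per-node neighborhoods are symmetrized into an edge set. This is a simplified stand-in: a plain Gaussian LassoCV replaces BRAIL's GLM losses matched to each data type, and the AND/OR symmetrization rule is shown explicitly.

```python
import numpy as np
from sklearn.linear_model import LassoCV

def neighborhood_selection(X, rule="and"):
    """Sketch of nodewise neighborhood selection (Meinshausen and
    Buhlmann, 2006): regress each node on the remaining nodes with a
    sparse fit (Gaussian LassoCV here, as a simplifying stand-in for
    BRAIL's type-matched GLM fits), then symmetrize the estimated
    neighborhoods with the AND or OR rule."""
    n, p = X.shape
    nbr = np.zeros((p, p), dtype=bool)
    for j in range(p):
        others = np.delete(np.arange(p), j)
        beta = LassoCV(cv=5).fit(X[:, others], X[:, j]).coef_
        nbr[j, others] = beta != 0  # estimated neighborhood of node j
    # AND rule: keep an edge only if both nodes select each other
    adj = (nbr & nbr.T) if rule == "and" else (nbr | nbr.T)
    return adj
```

The AND rule is the more conservative choice and is the one used for the mgm comparison in the Appendix.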
Our overall BRAIL-estimated network is presented in Figure 7, and in Figure 10, we more closely examine the relationships among the oncogenes, miRNAs, and CpG sites by zooming in on the subnetworks for the well-known oncogene BRCA1 and its direct neighbor, miRNA-23b. Both BRCA1 and miRNA-23b are well-known biomarkers and have been implicated in several ovarian cancer studies (Antoniou et al., 2003; King et al., 2003; BRCA, 1994; Li et al., 2014; Yan et al., 2016; Geng et al., 2012). Moreover, miRNA-23b is known to play a key role in p53 signaling (via TP53) (Boren et al., 2009), agreeing with the estimated edge between the TP53 oncogene and miRNA-23b in Figure 10(b).
Aside from this link, however, the estimated edges between genes, CpG sites, and miRNAs in Figure 7 are largely unexplored and unknown to researchers, since BRAIL is one of the first practical approaches for multiview feature selection. Nevertheless, we can partially validate our BRAIL-estimated network by highlighting the many genes with verified connections in the ovarian cancer and cancer proliferation/suppression literature. In Figure 10, we circle in red this collection of implicated genes, which includes LDOC1, SGCB, and miRNA-210 (Buchholtz et al., 2014; Obermayr et al., 2010; Giannakakis et al., 2008).
As we have noted, there is substantial evidence in the scientific literature suggesting that our proposed BRAIL algorithm successfully identified promising candidates as well as known biomarkers involved in ovarian cancer. By taking into account the relationships between genes, miRNAs, and CpG sites, our integrative analysis via BRAIL yields valuable insights beyond a single biomarker type and novel discoveries of direct connections between miRNAs, CpG sites, and known oncogenes, which may aid the development of targeted drug therapies for ovarian cancer. This is the first integrative analysis of its kind, and future experiments studying the connections between known ovarian cancer oncogenes and candidate miRNAs and CpG sites would be of great value for validating our findings.
6 Discussion
Though we have primarily focused on applications to integrative genomics in this work, BRAIL is not limited to this context. BRAIL can be applied to any field that yields high-dimensional multiview data, and with the rapid advances in data collection technologies, we expect BRAIL to have a growing and far-reaching impact in fields such as imaging genetics, national security, climate studies, spatial statistics, Internet data, marketing, and economics. BRAIL is also a versatile tool that can be used for any sparse regression or graph selection problem in the multiview context.
In addition to developing an effective data integration tool for multiview feature selection, our work addresses the many difficulties of performing multiview feature selection in practice. These practical challenges were severely understudied prior to this work; we partially resolve this gap by identifying four root challenges that interact with one another to impede recovery. Throughout our investigation of these practical challenges, we provide strong empirical evidence of both the existence and the adverse consequences of such challenges. However, the theoretical underpinnings of these issues are still unknown. Understanding exactly how challenges such as shrinkage noise and the beta-min condition are influenced by varying domains and signals would be of great benefit to the field of data integration as a whole. We also highlight that, while the Lasso has been well-studied under Gaussianity and idealized assumptions, the increasing abundance of correlated non-Gaussian data in multiview settings calls for a greater push toward theoretical studies on feature selection with heterogeneous data and the GLM Lasso.
Overall, we have demonstrated many challenges posed by multiview feature selection, and in our investigation of these challenges, we have opened up new avenues for future theoretical work to rigorously understand how the heterogeneity of multiview data complicates feature selection. Driven by these challenges and the ineffectiveness of existing methods, we developed a practical solution to overcome them. Our method, BRAIL, is one of the first practical tools for multiview feature selection and is grounded in deep theoretical foundations. With its versatility and strong empirical performance, BRAIL facilitates impactful integrative analyses across a broad spectrum of fields.
Acknowledgments
Y.B. acknowledges support from the NIH/NCI T32 CA096520 training program in Biostatistics for Cancer Research, grant number 5T32CA096520-11. T.T. acknowledges support from ARO, grant number W911NF-17-1-0005. G.A. acknowledges support from NSF DMS-1554821, NSF NeuroNex-1707400, and NSF DMS-1264058. We also thank Zhandong Liu, Ying-Wooi Wan, and Matthew L. Anderson at Baylor College of Medicine for thoughtful discussions related to this work.
References
Acar, E., Kolda, T. G. and Dunlavy, D. M. (2011). All-at-once optimization for coupled matrix and tensor factorizations. arXiv preprint arXiv:1105.3422.
Allen, D. M. (1974). The relationship between variable selection and data agumentation and a method for prediction. Technometrics 16 125–127.
Antoniou, A., Pharoah, P. D. P., Narod, S., Risch, H. A., Eyfjord, J. E., Hopper, J. L., Loman, N., Olsson, H., Johannsson, O., Borg, Å. et al. (2003). Average risks of breast and ovarian cancer associated with BRCA1 or BRCA2 mutations detected in case series unselected for family history: a combined analysis of 22 studies. The American Journal of Human Genetics 72 1117–1130.
Boren, T., Xiong, Y., Hakam, A., Wenham, R., Apte, S., Chan, G., Kamath, S. G., Chen, D.-T., Dressman, H. and Lancaster, J. M. (2009). MicroRNAs and their target messenger RNAs associated with ovarian cancer response to chemotherapy. Gynecologic Oncology 113 249–255.
BRCA Susceptibility Gene (1994). A strong candidate for the breast and ovarian cancer susceptibility gene BRCA1. Science 266 7.
Buchholtz, M.-L., Brüning, A., Mylonas, I. and Jückstock, J. (2014). Epigenetic silencing of the LDOC1 tumor suppressor gene in ovarian cancer cells. Archives of Gynecology and Obstetrics 290 149–154.
Bühlmann, P. et al. (2013). Statistical significance in high-dimensional linear models. Bernoulli 19 1212–1242.
Bunea, F., Tsybakov, A., Wegkamp, M. et al. (2007). Sparsity oracle inequalities for the Lasso. Electronic Journal of Statistics 1 169–194.
Chen, J. and Chen, Z. (2012). Extended BIC for small-n-large-P sparse GLM. Statistica Sinica 555–574.
Chen, S., Witten, D. M. and Shojaie, A. (2014). Selection and estimation for mixed graphical models. Biometrika 102 47–64.
Cheng, J., Li, T., Levina, E. and Zhu, J. (2013). High-dimensional mixed graphical models. arXiv preprint arXiv:1304.2810.
Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96 1348–1360.
Freier, C. (2016). Role of regulatory T cells and associated chemokines in human gynecological tumors. PhD thesis, LMU.
Gao, J.-Q., Tsuda, Y., Han, M., Xu, D.-H., Kanagawa, N., Hatanaka, Y., Tani, Y., Mizuguchi, H., Tsutsumi, Y., Mayumi, T. et al. (2009). NK cells are migrated and indispensable in the anti-tumor activity induced by CCL27 gene therapy. Cancer Immunology, Immunotherapy 58 291.
Geng, J., Luo, H., Pu, Y., Zhou, Z., Wu, X., Xu, W. and Yang, Z. (2012). Methylation mediated silencing of miR-23b expression and its role in glioma stem cells. Neuroscience Letters 528 185–189.
Giannakakis, A., Sandaltzopoulos, R., Greshock, J., Liang, S., Huang, J., Hasegawa, K., Li, C., O'Brien-Jenkins, A., Katsaros, D., Weber, B. L. et al. (2008). miR-210 links hypoxia with cell cycle regulation and is deleted in human epithelial ovarian cancer. Cancer Biology & Therapy 7 255–264.
Hall, D. L. and Llinas, J. (1997). An introduction to multisensor data fusion. Proceedings of the IEEE 85 6–23.
Haslbeck, J. and Waldorp, L. J. (2015). mgm: Structure estimation for time-varying mixed graphical models in high-dimensional data. arXiv preprint arXiv:1510.06871.
Jalali, A., Ravikumar, P., Vasuki, V. and Sanghavi, S. (2011). On learning discrete graphical models using group-sparse regularization. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics 378–387.
King, M.-C., Marks, J. H., Mandell, J. B. et al. (2003). Breast and ovarian cancer risks due to inherited mutations in BRCA1 and BRCA2. Science 302 643–646.
Lee, J. and Hastie, T. (2013). Structure learning of mixed graphical models. In Artificial Intelligence and Statistics 388–396.
Liu, H., Roeder, K. and Wasserman, L. (2010). Stability approach to regularization selection (StARS) for high dimensional graphical models. In Advances in Neural Information Processing Systems 1432–1440.
Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. The Annals of Statistics 1436–1462.
Meinshausen, N. and Bühlmann, P. (2010). Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 72 417–473.
Meinshausen, N. and Yu, B. (2009). Lasso-type recovery of sparse representations for high-dimensional data. The Annals of Statistics 246–270.
Nelsen, R. B. (1999). Introduction. In An Introduction to Copulas 1–4. Springer.
The Cancer Genome Atlas Research Network (2011). Integrated genomic analyses of ovarian carcinoma. Nature 474 609.
Obermayr, E., Sanchez-Cabo, F., Tea, M.-K. M., Singer, C. F., Krainer, M., Fischer, M. B., Sehouli, J., Reinthaller, A., Horvat, R., Heinze, G. et al. (2010). Assessment of a six gene panel for the molecular detection of circulating tumor cells in the blood of female cancer patients. BMC Cancer 10 666.
Ravikumar, P., Wainwright, M. J., Lafferty, J. D. et al. (2010). High-dimensional Ising model selection using ℓ1-regularized logistic regression. The Annals of Statistics 38 1287–1319.
Shao, J. (1993). Linear model selection by cross-validation. Journal of the American Statistical Association 88 486–494.
Shen, R., Olshen, A. B. and Ladanyi, M. (2009). Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis. Bioinformatics 25 2906–2912.
Stone, M. (1974). Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society: Series B (Methodological) 36 111–133.
Su, W., Bogdan, M. and Candes, E. (2015). False discoveries occur early on the lasso path. arXiv preprint arXiv:1511.01957.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological) 267–288.
Tibshirani, R. J. et al. (2013). The lasso problem and uniqueness. Electronic Journal of Statistics 7 1456–1490.
Tong, M., Wong, T. L., Luk, S. T.-C., Che, N., Wong, K. Y., Fung, T. M., Guan, X.-Y., Lee, N. P., Yuan, Y.-F., Lee, T. K. et al. (2017). Downregulation of 4-hydroxyphenylpyruvate dioxygenase (HPD) contributes to the pathogenesis of hepatocellular carcinoma (HCC) through ERK/BCL-2 signalling activation.
Toyama, T., Iwase, H., Watson, P., Muzik, H., Saettler, E., Magliocco, A., DiFrancesco, L., Forsyth, P., Garkavtsev, I., Kobayashi, S. et al. (1999). Suppression of ING1 expression in sporadic breast cancer. Oncogene 18.
Wainwright, M. J. (2009). Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming (Lasso). IEEE Transactions on Information Theory 55 2183–2202.
Wang, W., Wainwright, M. J. and Ramchandran, K. (2010). Information-theoretic limits on sparse signal recovery: Dense versus sparse measurement matrices. IEEE Transactions on Information Theory 56 2967–2979.
Yan, J., Jiang, J.-Y., Meng, X.-N., Xiu, Y.-L. and Zong, Z.-H. (2016). MiR-23b targets cyclin G1 and suppresses ovarian cancer tumorigenesis and progression. Journal of Experimental & Clinical Cancer Research 35 31.
Yang, E., Allen, G., Liu, Z. and Ravikumar, P. K. (2012). Graphical models via generalized linear models. In Advances in Neural Information Processing Systems 1358–1366.
Yang, E., Ravikumar, P. K., Allen, G. I. and Liu, Z. (2013). On Poisson graphical models. In Advances in Neural Information Processing Systems 1718–1726.
Yang, E., Ravikumar, P., Allen, G. I., Baker, Y., Wan, Y.-W. and Liu, Z. (2014a). A general framework for mixed graphical models. arXiv preprint arXiv:1411.0288.
Yang, E., Baker, Y., Ravikumar, P., Allen, G. and Liu, Z. (2014b). Mixed graphical models via exponential families. In Artificial Intelligence and Statistics 1042–1050.
Yang, E., Ravikumar, P., Allen, G. I. and Liu, Z. (2015). Graphical models via univariate exponential family distributions. Journal of Machine Learning Research 16 3813–3847.
Yu, B. (2013). Stability. Bernoulli 19 1484–1500. doi:10.3150/13-BEJSP14.
Yuan, M. and Lin, Y. (2007). Model selection and estimation in the Gaussian graphical model. Biometrika 94 19–35.
Zhang, C.-H. et al. (2010). Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics 38 894–942.
Zhang, C.-H. and Huang, J. (2008). The sparsity and bias of the lasso selection in high-dimensional linear regression. The Annals of Statistics 1567–1594.
Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. Journal of Machine Learning Research 7 2541–2563.
Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association 101 1418–1429.
Appendix A Additional Simulations
We provide the following figures and tables to supplement our simulations and to further support the strong empirical performance of BRAIL.
To augment the motivating example in Figure 3, we provide comparisons of BRAIL to two other mixed graphical selection methods from the mgm R package (Haslbeck and Waldorp, 2015) in Figure 13. mgm takes a nodewise neighborhood estimation approach: for each node, mgm selects the Lasso regularization parameter using either the extended BIC or CV, fits a penalized GLM, and applies additional thresholding to the estimated coefficients to remove noise. Here, we used the “AND” rule to combine the estimated neighborhoods for all three graphs. (Note that we had to convert the proportion-valued methylation values into 0–1 binary values in order to comply with mgm package restrictions.)
From Figure 13, we see that mgm with the extended BIC selection criterion tends to underselect features while mgm with CV often overselects. This agrees with our simulations and discussion of the Lasso-type model selection biases in Section 4. We also observe that, like the Lasso-type methods in Figure 3, mgm with CV and extended BIC can result in imbalanced selection between the blocks.
We next investigate the convergence of the BRAIL algorithm. Our empirical analysis indicates that BRAIL has quick support convergence for all simulation scenarios. We demonstrate this convergence for one type of simulation in Figure 17. Here, we simulate data using predictors from the TCGA ovarian cancer data (see Section 5) for three types of responses: (a) continuous, (b) binary, and (c) counts. We report the number of iterations over 100 runs of the BRAIL algorithm and denote the average number of iterations until convergence by the dashed vertical black line. We see that the average number of iterations is between 4 and 5 with the maximum number of iterations reaching 15. These ranges were similar for all simulation designs, confirming relatively fast convergence of BRAIL. Furthermore, we also show the true positive rates and the total number of selected features in Figure 17 to highlight BRAIL’s convergence to a relatively accurate solution.
Figure 23 duplicates the information in Table 1 but using boxplots for better visualization. We recall that these simulations compared BRAIL and various Lassotype methods (using oracle information) under four simulation designs with Gaussian responses (see Section 4 for further details). In almost all of these simulations, BRAIL is able to achieve a higher TPR while maintaining a low FDP.
We next verify that the above simulation results are not heavily dependent on our choice of and . In Figure 26, we ran the iid simulation design with Gaussian responses for different nonuniform values of and . These results show that BRAIL can successfully recover features from unequally sized blocks with different amounts of sparsity while other methods may struggle to account for biases introduced by the different ’s and ’s.
For Poisson and binary responses, Tables 4 and 5 provide the results from additional simulation designs to supplement Table 2. Here, the response and predictors were simulated according to the description in Section 4 with , , and for each block.
For easier visualization, we duplicate the results of Tables 4 and 5 using boxplots in Figure 33. As these plots show, BRAIL achieves higher TPR while maintaining low FDP across various simulations for both binary and Poisson responses.
In Table 6, we compare BRAIL to the nonconvex penalties, MCP and SCAD, under the block directed graph simulation design with three different types of responses. We see that MCP performs well with Gaussian responses, and in fact, the MCP penalty can be used instead of a Lasso penalty in the BRAIL algorithm for Gaussian responses. However, the BRAIL algorithm outperforms MCP and SCAD for nonGaussian responses. We thus chose to use a Lasso penalty when introducing the BRAIL algorithm for consistency.
Figure 38 provides the same information as Table 3 but using boxplots for easier visualization. While the model selection techniques (i.e. CV, extended BIC, and stability selection) give lower values of than their oracle selection counterparts, BRAIL outperforms even the oracle selection methods and yields the highest .