# Incorporating correlations between drugs and heterogeneity of multi-omics data in structured penalized regression for drug sensitivity prediction

Targeted cancer drugs have been developed to interfere with specific molecular targets, which are expected to affect the growth of cancer cells in a way that can be characterized by multi-omics data. The prediction of cancer drug sensitivity simultaneously for multiple drugs based on heterogeneous multi-omics data (e.g., mRNA expression, DNA copy number or DNA mutation) is an important but challenging task. We use joint penalized regression models for multiple cancer drugs rather than a separate model for each drug, thus being able to address the correlation structure between drugs. In addition, we employ integrative penalty factors (IPF) to allow penalizing data from different molecular data sources differently. By integrating IPF with tree-lasso, we create the IPF-tree-lasso method, which can capture the heterogeneity of multi-omics data and the correlation between drugs at the same time. Additionally, we generalize the IPF-lasso to the IPF-elastic-net, which combines ℓ_1- and ℓ_2-penalty terms and can lead to improved prediction performance. To make the computation of IPF-type methods more feasible, we present that the IPF-type methods are equivalent to the original lasso-type methods after augmenting the data matrix, and employ the Efficient Parameter Selection via Global Optimization (EPSGO) algorithm for optimizing multiple penalty factors efficiently. Simulation studies show that the new model, IPF-tree-lasso, can improve the prediction performance significantly. We demonstrate the performance of these methods on the Genomics of Drug Sensitivity in Cancer (GDSC) data.

## Authors

• 6 publications
• 6 publications
04/30/2019

### Composite local low-rank structure in learning drug sensitivity

The molecular characterization of tumor samples by multiple omics data s...
12/29/2015

### Sparse group factor analysis for biclustering of multiple data sources

Motivation: Modelling methods that find structure in data are necessary ...
11/16/2018

### Synergistic Drug Combination Prediction by Integrating Multi-omics Data in Deep Learning Models

Drug resistance is still a major challenge in cancer therapy. Drug combi...
09/08/2009

### Tree-guided group lasso for multi-response regression with structured sparsity, with an application to eQTL mapping

We consider the problem of estimating a sparse multi-response regression...
10/09/2020

### A Cross-Level Information Transmission Network for Predicting Phenotype from New Genotype: Application to Cancer Precision Medicine

An unsolved fundamental problem in biology and ecology is to predict obs...
11/16/2018

### PaccMann: Prediction of anticancer compound sensitivity with multi-modal attention-based neural networks

We present a novel approach for the prediction of anticancer compound se...
09/28/2021

### Improved prediction rule ensembling through model-based data generation

Prediction rule ensembles (PRE) provide interpretable prediction models ...
##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Many cancers have specific molecular causes, e.g., mutations in genes involved in the hallmark processes of cancer [16]. Targeted cancer drugs directly affect those particular cancer genes. However, the efficacy of a drug to block cancer cells’ growth may be determined by additional genes. For example, trastuzumab is a human epidermal growth factor receptor-2 (HER2) antibody targeting HER2-overexpressed breast cancer cells [8]. But Liu et al. [28] reported that 2-adrenergic receptor (2-AR) is also a molecular marker to affect the efficacy of trastuzumab in breast cancer. The other example is that the drug Vemurafenib targets metastatic melanoma BRAF mutations at codon 600 [29]. But the matrix metalloproteinase-2 (MMP-2) upregulation has been found in vemurafenib-resistant melanoma cells [34]. Therefore, exploring the multi-omics data which can characterize the whole biological system is helpful to evaluate drug efficacy. It is also potential to find new cancer therapy solutions. That is, cancer cells with the similar characterization of multi-omics are possibly affected by other drugs. Thus it is valuable to model multiple cancer drugs sensitivity and multi-omics data jointly.

In recent years, several groups and consortia have developed big datasets which include large-scale ex vivo pharmacological profiling of cancer drugs and the genomic information of corresponding cell lines [3, 9, 13, 15, 18]. The genomic data can for example consist of genome-wide measurements of mRNA expression, DNA copy numbers, DNA single-point and other mutations or CpG methylation of cell lines. They reflect different heterogeneous molecular profiles of the cancer cell lines with respect to effects, intra correlations, measurement scales and background noise [17]. The drug sensitivity data for some groups of drugs are expected to be correlated, due to their common targets and similar pharmacodynamic behaviours.

To analyze such data, one straightforward method is to use (penalized) linear regression methods, for example lasso

[42]

, regressing each drug on all molecular features in a linear manner. Lasso could select a few relevant features with nonzero regression coefficient estimates from a large number of features. But it cannot address the heterogeneity of different molecular data sources.

Boulesteix et al. [5] introduced integrative -penalized regression with penalty factors (IPF-lasso) to shrink the effects of features from different data sources with varying -penalties, to reflect their different relative contributions. While lasso or IPF-lasso can be extended to multivariate regression to jointly model multiple drugs sensitivity, the correlation of drugs is not reflected in the peanlization of regression coefficients. Kim and Xing [22]

proposed tree-lasso to estimate structured sparsity of multiple response variables assuming a hierarchical cluster structure in the response variables. Each cluster is likely to be influenced by some common features, for which the effects are similar between correlated responses.

In this article, we propose the IPF-tree-lasso which borrows the strength of varying penalty parameters from IPF-lasso and the cluster structure in multivariate regression from tree-lasso. Thus, IPF-tree-lasso can capture the different relative contributions of multiple omics input data sources and the group structure of correlated drug response variables. Since some targeted cancer drugs might have a similar mechanisms, for example the same target gene or signaling pathway, then these drugs are likely to have correlated sensitivities. IPF-tree-lasso is likely to select common relevant molecular features of these correlated drugs, and accordingly to shrink their coefficients with similar penalty parameters.

Elastic net [50] is also compared here, because it considers the grouping effect of correlated features and the -penalty can improve the prediction performance over lasso. Additionally, we also formulate the integrative elastic net with penalty factors (IPF-elastic-net) model to provide a flexible extension of the elastic net with varying complexity parameters ’s as well as varying parameters ’s.

However, IPF-tree-lasso and IPF-elastic-net have more complicated penalty terms which might require new optimization algorithms. We use augmented data matrix, so that the original smoothing proximal gradient descent method [22] and cyclical coordinate descent algorithm for lasso [11] can be employed directly. As elastic net and IPF-type methods have multiple penalty parameters to be optimized, the standard grid search [21] is computationally not efficient. Frohlich and Zell [12] proposed an interval-search algorithm, the Efficient Parameter Selection via Global Optimization (EPSGO), which is more efficient. Sill et al. [37] implemented the EPSGO algorithm in elastic net models and developed the R package c060. We have adapted the c060 package for efficient penalty parameters optimization for IPF-tree-lasso and IPF-elastic-net.

The rest of the paper is organized as follows. In Section 2, the standard penalized regression methods and their extensions with structured penalties are introduced briefly. Section 3 describes the simulation scenario based on multivariate responses and different types of features, and then we present the simulation results and discussion. In Section 4, the Genomics of Drug Sensitivity in Cancer [13] data are used to compare the prediction performance of all methods. Lastly, we discuss the main findings and conclude the paper in Section 5.

## 2 Structured penalties for multivariate regression

### 2.1 Lasso and elastic net

The pharmacological data are collected for samples (e.g., cell lines or patients) and response variables (typically drug sensitivity). The response variables are denoted by (, ), where means the response of the th cell line treated with the th drug. The high-dimensional (i.e., multi-omics) data contain features in total, and all features are available for all samples, denoted by (,

). The linear model mapping from high-dimensional data to multivariate responses is

 Y=1nβ⊤0+XB+E, (2.1)

where is an

-column vector,

is the intercept vector corresponding to response variables, is a regression coefficients matrix, and is a noise matrix. can be estimated by minimizing the sum of the residual sum of squares and a penalty function as following

 minβ0,B{12mn∥Y−1nβ⊤0−XB∥2F+pen(B)},

where is the Frobenius norm. When is very large, especially , and pressuming only a few true relevant features, the -penalized regression, or lasso [42] is a standard method to identify those features by their nonzero coefficient estimates. The multivariate lasso uses the -norm penalty function in , where () and is the given penalty parameter that controls the strength of penalizing coefficients.

Another regularization method is elastic net [50]

, which takes a bias-variance trade-off between lasso and the continuous shrinkage method ridge regression. The ridge penalty (

-penalty) tends to include or exclude strongly correlated features together. The penalty function of elastic net in is , where gives the compromise between ridge () and lasso ().
2.2 IPF-lasso and IPF-elastic-net
Lasso and elastic net penalize all coefficients of features by globally controlling penalty parameters and . In order to distinguish the contributions of heterogeneous data sources, Boulesteix et al. [5] proposed IPF-lasso to analyze multi-omics data. IPF-lasso allows varying penalty parameters to weight the norms of different sources’ coefficients. For multivariate responses, the penalty function is (), where , stacks by rows and is the coefficients matrix corresponding to the th data source. The IPF-lasso reduces to the lasso problem after transforming the matrix as follows.

###### Proposition 1.

Given response and data sources with numbers of features respectively, and the corresponding coefficients matrix , let

 X⋆ B⋆ =[B1\Shortstack...λ2λ1B2\Shortstack...⋯\Shortstack...λSλ1BS]∈R(p1+⋯+pS)×m, (^β0,^B) =argminβ0,B{12mn∥Y−1nβ⊤0−XB∥2F+S∑s=1λs∥Bs∥ℓ1}, (^β0,^B⋆) =argminβ0,B⋆{12mn∥Y−1nβ⊤0−X⋆B⋆∥2F+λ1∥B⋆∥ℓ1}.

Then where .

When there are two data sources, i.e., , Figure 1 shows that the prediction performance as cross-validated MSE is convex in versus when fixing the two penalty parameters ratio .

A simple integrative -penalty method with penalty factors (sIPF-elastic-net) has where () and . Here, different data sources share the same parameter , which forces the same strength to shrink the coefficients of strongly correlated features towards each other.

###### Theorem 1.

Given samples and responses , standardized data sources (variance 1 and mean 0 for each column) with numbers of features , respectively, and the corresponding coefficients matrix , let , and , be the sIPF-elastic-net estimates. Suppose two coefficients and from the th and th data sources (), respectively, and ; then

 |^βjk(λ,α)−^βj′k(λ,α)|≤√λ2s+λ2s′−2λsλs′ρmnλsλs′(1−α)∥Y∥ℓ1,

where sample correlation with the vector of samples of the th feature belonging to the th data source and .

Theorem 1 (proof in Supplementary S1) states that the difference of two nonzero and the same sign coefficient estimates tends to 0 if their corresponding features are highly positively correlated and penalized equally. Therefore, sIPF-elastic-net shows the similar grouping effect as elastic net. The fully flexible version of the IPF-elastic-net has penalty function where , (). Proposition 2 gives its equivalent lasso problem if .

###### Proposition 2.

Given response and data sources with numbers of features respectively, and the corresponding coefficients matrix is , let

 (^β0,^B) =argminβ0,B{12mn∥Y−1nβ⊤0−XB∥2F+S∑s=1λs(αs∥Bs∥ℓ1+12(1−αs)∥Bs∥2ℓ2)}, (2.3) (^β0,^B⋆) =argminβ0,B⋆{12mn∥Y⋆−1n+pβ⊤0−X⋆B⋆∥2F+λ⋆1∥B⋆∥ℓ1}.

Then where

 X⋆ Y⋆ =[Y⋮0⋮⋯⋮0]∈R(n+p)×m, B⋆ λ⋆1 =λ1/{2m(n+∑sps)}, αs ∈(0,1],s∈1,⋯,S.

### 2.2 Tree-lasso and IPF-tree-lasso

Kim and Xing [22] proposed the tree-lasso method which uses a hierarchical tree structure over the response variables in a group-lasso based penalty function. As illustrated in Figure 2 we hypothesize that highly correlated response variables in each cluster are likely to be influenced by a common set of features. The hierarchical tree structure of multiple response variables can be represented as a tree with a set of vertices and groups . It can be given by prior knowledge of the pharmacokinetic properties of the drugs, or be learned from the data, for example by hierarchical clustering. Given the tree and groups , the penalty function of the tree-lasso is defined as

 pen(B) =λp∑j=1∑ν∈Vων∥βGνj∥ℓ2 =λp∑j=1∑ν∈Vleaf∥βGνj∥ℓ2+λp∑j=1∑ν∈Vintων∥βGνj∥ℓ2 =λp∑j=1∑ν∈Vleaf|βGνj|+λp∑j=1∑ν∈Vint{hν⋅∑c∈Children(ν)ωc∥βGcj∥ℓ2+(1−hν)∥βGνj∥ℓ2},

where is the th row of associated with response variables in group , is either the weight associated with the height of each internal node in tree or for the leaf node, and are the internal nodes and leaves of the tree, respectively. For example, consider a case with two drugs and a tree of three nodes that consists of two leaf nodes and one root node. It is illustrated as the following subtree of tree in Figure 2, , , , . Then the penalty function for this tree is

 pen(B)=λp∑j=1{|βj1|%leaf$Gν1$+|βj2|leaf Gν2+hν4(|βj1|leaf Gν1+|βj2|leaf Gν2)+(1−hν4)√(βj1)2+(βj2)2internal Gν4}.

We can now define IPF-tree-lasso, where different ’s are employed for different data sources. Its penalty function is defined as follows

 pen(B)=S∑s=1λs⎛⎜⎝ps∑j=1∑ν∈Vintων∥βGνj,s∥ℓ2+ps∑j=1∑ν∈Vleaf∥βGνj,s∥ℓ2⎞⎟⎠, (2.5)

where is the th row of coefficients corresponding to th data source.

###### Proposition 3.

Given response and data sources with numbers of features , respectively, and the corresponding coefficients matrix is , let

 X⋆ B⋆ =[B1\Shortstack...λ2λ1B2\Shortstack...⋯\Shortstack...λSλ1BS]∈R(p1+⋯+pS)×n, (^β0,^B) =argminβ0,B⎧⎪⎨⎪⎩12mn∥Y−1nβ⊤0−XB∥2F+S∑s=1λs⎛⎜⎝ps∑j=1∑ν∈V% intων∥βGνj,s∥ℓ2+ps∑j=1∑ν∈Vleaf∥βGνj,s∥ℓ2⎞⎟⎠⎫⎪⎬⎪⎭, (^β0,^B⋆) =argminβ0,B⋆⎧⎪⎨⎪⎩12mn∥Y−1nβ⊤0−X⋆B⋆∥2F+λ1⎛⎜⎝p∑j∑ν∈Vintων∥β⋆Gνj∥ℓ2+p∑j∑ν∈Vleaf∥β⋆Gνj∥ℓ2⎞⎟⎠⎫⎪⎬⎪⎭.

Then where .

Without loss of generality, the proof for the case with three response variables and the tree of five nodes shown in Figure 2 is given in Supplementary S2.

###### Proposition 4.

Generally, the penalized objective function can be formulated as

 minβ0,B⎧⎨⎩12mn∥Y−1nβ⊤0−XB∥2F+λp∑j=1∑g∈Gwg∥Sgj(B)∥ℓqj,g⎫⎬⎭, (2.6)

where is the a submatrix of , , is a tuning parameter, contains pre-defined groups and all group weights ’s are known or can be pre-estimated. The submatrix is associated with specified set of rows and columns . Let with numbers of features , respectively, and . The corresponding IPF problem

 minβ0,B⎧⎨⎩12mn||Y−1nβ⊤0−XB||2F+S∑s=1ps∑j=1∑g∈Gλswg∥Sgj(Bs)∥ℓqj,g⎫⎬⎭ (2.7)

can be transformed into the equivalent original problem

 minβ0,B⋆⎧⎨⎩12mn∥Y−1nβ⊤0−X⋆B⋆∥2F+λ1p∑j=1∑g∈Gwg∥Sgj(B⋆)∥ℓqj,g⎫⎬⎭,

where , and , if and . If and (or) , then the elements

and the non-penalized features are remained in the Frobenius-norm loss function. The rows in

and the columns in need not be contiguous, and can be overlaping.

The Proposition 4 allows different norms for different submatrices in the penalty term. IPF-lasso and IPF-tree-lasso are special cases of , that is,

• IPF-lasso: , , , ;

• IPF-tree-lasso: , , .

In (2.6), is likely to seek a common subset of a submatrix , in which selected features will be relevant to multiple response variables simultaneously (Turlach and others, 2005). The Sparse Group Elastic Net [33] including , and is also a special case of (2.7). Since the penalty term in (2.7) allows the overlaps of submatrices, it can contain more integrated penalty cases. For example, sparse-group lasso [38] with and penalties actually allows the overlaps of coefficient groups (including singletons). Jacob, Obosinski and Vert [20] proposed group lasso allowing overlaps in combination with graphical lasso. Li, Nan and Zhu [27] proposed the multivariate sparse group lasso, where an arbitrary group structure such as overlapping or nested or multilevel hierarchical structure is considered. All these methods can be extended into corresponding IPF-type methods and be solved by Proposition 4. However, (2.7) doesn’t include graphical lasso or fused lasso. If there is a non-identity transformation of the submatrix of inside of the norm in the penalty term, the augmented matrix will be complicated.

### 2.3 Implementation

For the implementation, we give an initial decreasing sequence, starting at which shrinks all coefficients to zero [11], for lasso, elastic net, IPF-lasso and sIPF-elastic-net. The multivariate responses are vectorized to . The interval-search algorithm Efficient Parameter Selection vis Global Optimization (EPSGO) is applied to find the optimal in elastic net type methods [12, 37], as well as penalty parameters ratios in IPF-type methods. The EPSGO algorithm updates the tuning parameters through learning a Gaussian process model of the loss function surface from the points which have already been visited. Note that parameter tuning for IPF-elastic-net with varying ’s remains challenging even when using the EPSGO algorithm due to the large number of parameters in a non-convex situation. For IPF-tree-lasso, is optimized by exploring a given sequence of values, while the EPSGO algorithm is used to determine the optimal penalty parameters ratios (). The IPF-tree-lasso is implemented based on the equivalent tree-lasso problem, which is more efficient than directly adapting the original tree-lasso algorithm to iterate coefficients of different data sources respectively (see Supplementary S3). The tree structure is pre-estimated from the response data by hierarchical agglomerative clustering [14] and only using the nodes with normalized heights larger than a pre-determined threshold to ignore groups with weak correlations between response variables. Optimal penalty parameters are found by minimizing the MSE as loss function on the learning data, and independent validation data are used to obtain prediction mean squared errors (MSE) to evaluate prediction performance.

## 3 Simulations

### 3.1 Simulation scenario

We simulate data to demonstrate the prediction performance and variable selection performance of lasso, elastic net, tree-lasso and their corresponding IPF-type methods. We assume multiple responses with , two data sources of high-dimensional input features (), and samples. The data are simulated according to the linear model

 Y=[X1,~X2][B1B2]+E. (3.1)

The high-dimensional feature matrix

is generated from a multivariate normal distribution with mean

and a nondiagonal covariance matrix as Boulesteix and others (2017) suggested , where

 Σ=⎡⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢⎣Ap1/b(σ)…0\tabularcell@hboxBp1/b,p2/b(σ)\tabularcell@hbox…0⋮⋱⋮\tabularcell@hbox⋮\tabularcell@hbox⋱⋮0…Ap1/b(σ)\tabularcell@hbox0\tabularcell@hbox…Bp1/b,p2/b(σ)[2pt/2pt]Bp2/b,p1/b(σ)…0\tabularcell@hboxAp2/b(σ)\tabularcell@hbox…0⋮⋱⋮\tabularcell@hbox⋮\tabularcell@hbox⋱⋮0…Bp2/b,p1/b(σ)\tabularcell@hbox0\tabularcell@hbox…Ap2/b(σ)⎤⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥⎦.

Further dichotomizing where is an indicator function and let . The second data source is dichotomized to simulate the common situation that one data source represents binary gene mutations. In the covariance matrix , blocks and capture the covariances of features among the first and second data source, respectively. In each data source, there are latent groups of size and , respectively, represented by blocks and in which any two features have covariance . We set , , and the variance of each feature is one. Considering that different data sources might have different numbers of features, we simulate two situations, and . In (3.1), the noise term (), and .

We assume that multiple responses can be grouped and the group relationships can be addressed by a hierarchical tree structure. In our intended applications, different molecular information may explain different group effects in the drug response variables. In the first simulation scenario, we design two different hierarchical tree effects from the two data sources, as illustrated in Figure 3. The groups in the first 12 response variables are determined by the first data source, and the second data source determines the groups in the second 12 responses. The two hierarchical structures are generated by the two matrices and illustrated in Figure 3. In a second simulation scenario the two data sources do not determine the drug groups separately, but instead in an overlapping manner. For this we design two very different hierarchical structures with and , as illustrated in Figure 4. In scenario 3, to test the sensitivity to model misspecification, we set a control case which has no tree-structured and as shown in Figure 5. The patterns of and in Figure 5 correspond to the designed ”hotspots” in the simulation by Lewin et al. [25].

To evaluate the prediction performance of different methods, we calculate MSE from the simulated validation data and which are simulated independently in an identical manner to and . Additionally, we compare the accuracy of variable selection performance, where we use the terms sensitivity to denote the percentage of nonzero coefficients accurately estimated as nonzeros and specificity to denote the percentage of zero coefficients accurately estimated as zeros. Algorithm 1 below summarizes the simulation study setup.

### 3.2 Simulation results and discussion

We do 50 simulations and show in Figure 6 the prediction performance of all methods in the different scenarios. When the two data sources have the same number of features (i.e., ) and similar trees for and as in scenario 1 (Figure 3), then lasso, IPF-lasso, elastic net and sIPF-elastic-net have similar prediction performance in terms of MSE (Figure 6(a)). But tree-lasso and IPF-tree-lasso outperform other methods. However, when the two data sources are more different as in design scenario 2 (Figure 4), IPF-lasso and sIPF-elastic-net are slightly superior to lasso and elastic net in the situation of (Figure 6 (c)). IPF-tree-lasso further improves the prediction. In the control case, Figure 6 (e) (scenario 3), tree-lasso and IPF-tree-lasso still have better prediction.

When the two data sources have different feature numbers (i.e., , ), IPF-type methods (IPF-lasso, sIPF-elastic-net and IPF-tree-lasso) have lower MSE than their corresponding non-IPF-type methods (lasso, elastic net and tree-lasso) respectively in all three scenarios (Figure 6 (b), (d) and (f)). When comparing scenario 2 with scenario 1 in the case , IPF-type methods improve the prediction more significantly in scenario 2.

Table 1 displays the accuracy of coefficients matrix estimation and variable selection for scenario 1. According to the averaged absolute errors of estimated coefficients, , all methods performe similarly if but the IPF-type methods perform slightly better for . In both cases, and , IPF-lasso, sIPF-elastic-net, and IPF-tree-lasso select fewer features and achieve larger specificity than non-IPF-type methods, but they lose sensitivity.

In the simulations, we mainly focus on simulating hierarchically structured drug sensitivity according to the specifically designed coefficient matrices as in Figure 3 (scenario 1) and Figure 4 (scenario 2). In both scenarios 1 and 2, the tree-lasso model improves performance. But even in the case of non-structured (scenario 3) tree-lasso still outperforms lasso and elastic net in Figure 6(e), because some of the correlation structures among response variables can be captured by a tree structure. For example, the nonzero blocks in in Figure 5 illustrate different response variables that are likely to be similar since these responses can be explained by the same features. Kim and Xing [22] also showed that tree-lasso can take into account such correlations even when the tree structure is not fully realized. Compared to the top three panels of Figure 6, the bottom three panels reflect a greater contribution to prediction performance of IPF-type methods. In all situations, IPF-tree-lasso achieves the best prediction, at least as good as tree-lasso when , and the IPF-lasso and sIPF-elastic perform quite well when . It is because that IPF-tree-lasso does not only consider the correlations among responses, but also distinguishes the relative contributions of two data sources with varying penalty parameters and .

## 4 Real data analysis

The Genomics of Drug Sensitivity in Cancer (GDSC) database [49] was developed from a large-scale pharmacogenomic study. The drug sensitivity, measured as half-maximal inhibitory concentration (IC), is estimated by fitting a Bayesian sigmoid model to dose-response curves obtained for a range of drug concentrations corresponding to the 72 hours’ effect of drug treatment on cell viability, see Garnett et al. [13] for the modelling details. We use the data from their archived files in ftp://ftp.sanger.ac.uk/pub4/cancerrxgene/releases/release-5.0/, where there are 97 cancer drugs tested for 498 cell lines with complete availability of IC after excluding one cell line with an extremely small estimate for IC . These cell lines are from 13 cancer tissue types. For each cell line, the following genomic data are available as baseline measurements: genome-wide measurement of mRNA expression, copy numbers and DNA single point and other mutations. We preselected 2602 gene expression features with the largest variances over cell lines, which in total explain of the variation. Referring to the Cancer Gene Census (https://cancer.sanger.ac.uk/census), copy number variations for 426 genes and mutations in 68 genes are causally implicated in cancer.

We randomly select cell lines of each cancer tissue type for training data and the other as validation data. A multivariate regression model is fitted to the training data:

 Y=1nβ⊤0+X0B0+XB+E, (4.1)

where is the

to reduce the skewness of IC

, is the intercept vector,

represents the cancer tissue types by dummy variables, and

consists of the -transformed gene expression variables (), the copy number variables () and mutation (0/1) variables (). Let correspond to the coefficient matrices of the three data sources. Since tissue types are known to have large effects on drug sensitivity and is low-dimensional, we do not penalize the coefficients of cancer tissue types when fitting the model . Supplementary S5 outlines the estimation of non-penalized coefficients in the tree-lasso model. The validation data are used to evaluate prediction performance by MSE. The implementation procedure of (4.1) by all methods is the same as shown in the simulation Algorithm 1. Additionally, is computed as

 R2val≡1−SSEvalSSTval=1−∥Yval−1n^β⊤0−Xval^B∥2F∥Yval−1n¯Yval∥2F,

where and are the 20% validation data, is a column vector with averaged logIC of each drug over the validation data samples. For comparison we also fit two low-dimensional linear regression models: (1) the ”NULL” model which only includes an intercept vector, and (2) the ”OLS” model which includes the intercept vector and dummies of the cancer tissue types.

To eliminate the uncertainty of splitting the data randomly into training validation sets, Table 2 reports the averaged results of 10 different random splits of the GDSC data. When only using the 13 tissue categories as predictors (”OLS” model), the prediction performance is very poor, MSE=3.199 and averaged over the 10 repetitions. However, the genomic information improves the prediction in the lasso and elastic net models as shown in Table 2. By taking into account the hierarchical group relationship of 97 drugs (tree-lasso and IPF-tree-lasso) and the heterogeneity of different data sources (IPF-type methods), performance can be improved further. IPF-tree-lasso performs best, with prediction MSE=3.025 and . Nevertheless, all methods explain only a limited percentage of variation in the drug sensitivity data when averaging across all drugs. When looking at the individual explained variation and MSE, the results differ widely between drugs. For example, the molecular information via the IPF-tree-lasso model can explain variation of Nutlin- drug sensitivity (Figure S2 of Supplementary S6). More prediction results with discussion for individual drugs and tissues can be found in Supplementary S6.

In Table 2,

indicates the features selected at least twice over the

repetitions. We note that lasso, elastic net and IPF-lasso perform very sparse variable selection with fewer than 860 out of (0.29%) nonzero features estimated more than once over 10 repetitions. Although sIPF-elastic-net has better average prediction performance, it results in an almost full dense coefficients matrix with averaged , but elastic net with averaged . Tree-lasso obtains a much larger number of nonzero coefficients for copy number features than IPF-tree-lasso, but only one associated mutation variable: TP53 for drug Nutlin-. In contrast, IPF-tree-lasso selects 537 associated mutated gene features, in particular, targeted mutant B-Raf for drugs PLX4720 and SB590885, mutant EGFR/ErbB for BIBW2992 (Afatinib), and the chromosomal rearragement Bcr-Abl for drugs Nilotinib and AP-24534. All these mutations are in target genes of the corresponding drugs and were thus expected to be important for the prediction of drug sensitivity.

## 5 Conclusion

In this study, we used multi-omics data to predict sensitivity of cancer cell lines to multiple drugs jointly. The penalized regression should take into account the heterogeneity in the multi-omics data as well as (hierarchical) relationships between drugs. We extended the tree-lasso to IPF-tree-lasso, which can achieve the two purposes simultaneously. Through weighting the data by the penalty parameters, Proposition 3 makes the implementation of IPF-tree-lasso feasible. In addition, based on IPF-lasso [5], we formulated the IPF-elastic-net, which is an option to tune the varying penalty parameters and () in (2.3), thus allowing differing degrees of sparsity in the different data sources. If one has prior knowledge on the sparsity of the different data sources, IPF-elastic-net can specify some data sources as lasso for instance, i.e., those ’s as 1. Proposition 4 provides the transformation from a large class of general IPF-type penalized problems into the equivalent original penalized problems. Furthermore, we demonstrated how the interval-search algorithm EPSGO [12] can be used to optimize multiple penalty parameters efficiently.

To capture the heterogeneity of different features, Bergersen, Gland and Lyng [4] proposed weighted lasso to use external information (i.e., copy numbers) to generate penalty weights. Van de Wiel et al. [44] developed ”GRridge” regression to use related co-data (annotation or external values) to derive group-specific penalties by empirical Bayes estimation; only one global penalty parameter needs to be optimzed by e.g. cross-validation. These methods use other data as auxiliary data to determine (group) weights in the penalty term. However, in IPF-type methods all data information contributes to the outcomes directly. Dondelinger and Mukherjee [10] proposed joint lasso to penalize subgroup-specific (i.e., cancer tissue) coefficients differently with an additional fusion penalty term, but they used the same penalty for high-dimensional features. Klau et al. [23] presented priority-Lasso to construct blocks of multi-omics data sources and regress on each data source sequentially. Wu et al. [48] selectively reviewed multi-level omics data integration methods, but focused on univariate and survival outcomes.

In the GDSC data analysis, the low averaged implies the limitation to use genomic information to predict drug sensitivity. Wang et al. [45] and Ali et al. [1] showed proteomes of human cancer cell lines are more representative of primary tumors than genomic profiles alone to predict drug sensitivity. Chambliss and Chan [6] recommended to integrate pharmacoproteomic and pharmacogenomics profiles to identify the right therapeutic regimen. Also, the heterogeneity between cell lines even within one cancer tissue type might be large [2, 30].

One disadvantage of IPF-type methods is that they cannot address known associations between features in the different data sources, e.g., between gene expression and mutation status of the same gene. However, the sIPF-elastic-net employs a common for all data sources, which is likely to select the strongly correlated features over all data sources together if is small (Theorem 1). As for IPF-tree-lasso, specifying similar weights for coefficients of associated internal nodes in () across different data sources may select correlated features over multiple data sources. Furthermore, using biological pathways of genes related to the cancer may improve the biological interpretation and prediction of drug sensitivity as well. Lee et al. [24] and Li and Li [26] proposed pathway-based approaches to identify biologically relevant pathways related to interesting phenotype. The tree-lasso or IPF-tree-lasso can also be extended to include the pathway-group structure over features in the penalty term besides the hierarchical structure over response variables.

## Acknowledgements

The first author is supported by the Faculty of Medicine, University of Oslo, Norway. This work is associated with the Norges Forskningsråd project BIG INSIGHT: 237718. The authors thank Professor Arnoldo Frigessi for discussions. Conflict of Interest: None declared.

## Supplementary Material

Software in the form of R package IPFStructPenalty, together with a sample input data set and complete documentation is available on https://
github.com/zhizuio/IPFStructPenalty
. Other supplementary materials including proofs, extra algorithms and results with discussion are available online.

## References

• Ali et al. [2018] Ali, M., Khan, S.A., Wennerberg, K. and Aittokallio, T.(2018). Global proteomics profiling improves drug sensitivity prediction: results from a multi-omics, pan-cancer modeling approach. Bioinformatics 34, 1353-1362.
• Arul et al. [2017] Arul, M., Roslani, A.C. and Cheah, S.H.(2017). Heterogeneity in cancer cells: variation in drug response in different primary and secondary colorectal cancer cell lines in vitro. In Vitro Cellular & Developmental Biology - Animal 53, 435-447.
• Barretina et al. [2012] Barretina, J., Caponigro, G., Stransky, N., Venkatesan, K., Margolin, A.A., Kim, S., Wilson, C.J., Lehar, J., Kryukov, G.V., Sonkin, D. et al.(2012). The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603-607.
• Bergersen, Gland and Lyng [2011] Bergersen, L.C., Glad, I.K. and Lyng, H.(2011). Weighted lasso with data integration. Statistical Applications in Genetics and Molecular Biology 10, 1-29.
• Boulesteix et al. [2017] Boulesteix, A.L., De Bin, R., Jiang, X. and Fuchs, M.(2017). IPF-LASSO: integrative L1- penalized regression with penalty factors for prediction based on multi-omics data. Computational and Mathematical Methods in Medicine 7691937.
• Chambliss and Chan [2016] Chambliss, A.B. and Chan, D.W.(2016). Precision medicine: from pharmacogenomics to pharmacoproteomics. Clinical Proteomics 13-25.
• Chen et al. [2012] Chen, X., Lin, Q., Kim, S., Carbonell, J. and Xing, E.(2012). Smoothing proximal gradient method for general structured sparse regression. The Annals of Applied Statistics 6, 719-752.
• Chiu, Ueno and Lee [2004] Chiu, S.J., Ueno N.T. and Lee, R.J.(2004). Tumor-targeted gene delivery via anti-HER2 antibody (trastuzumab, Herceptin) conjugated polyethylenimine. Journal of Controlled Release 97, 357-369
• Daemen et al. [2013] Daemen, A., Griffith, O.L., Heiser, L.M., Wang, N.J., Enache, O.M., Sanborn, Z., Pepin, F., Durinck, S., Korkola, J.E., Griffith, M. et al. (2013). Modeling precision treatment of breast cancer. Genome Biology 14, R110.
• Dondelinger and Mukherjee [2018] Dondelinger, F. and Mukherjee, S. (2018). The joint lasso: high-dimensional regression for group structured data. Biostatistics kxy035.
• Friedman, Hastie and Tibshirani [2010] Friedman, J., Hastie, T. and Tibshirani, R.(2010). Regularization Paths for Generalized Linear methods via Coordinate Descent. Journal of Statistical Software 33, 1-22.
• Frohlich and Zell [2005] Frohlich, H. and Zell, A.

(2005). Efficient Parameter Selection for Support Vector Machines in Classification and Regression via Model-Based Global Optimization. Proceedings of the International Joint Conference of Neural Networks

pp 1431-1438.
• Garnett et al. [2012] Garnett, M., Edelman, E., Heidorn, S., Greenman, C.D., Dastur, A., Lau, K.W., Greninger, P., Thompson, I.R., Luo, X., Soares, J. et al.(2012). Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature 483, 570-575.
• Golub et al. [1999] Golub, T., Slonim, D., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J.P., Coller, H., Loh, M.L., Downing, J.R., Caligiuri, M.A., Bloomfield, C.D. and Lander, E.S. (1999). Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286, 531-537.
• Greshock et al. [2010] Greshock, J., Bachman, K.E., Degenhardt, Y.Y., Jing, J., Wen, Y.H., Eastman, S., McNeil, E., Moy, C., Wegrzyn, R., Auger, K., Hardwicke, M.A. and Wooster, R. (1999). Molecular target class is predictive of in vitro response profile. Cancer Research 70, 3677-3686.
• Hanahan and Weinberg [2011] Hanahan, D. and Weinber, R.A. (2011). Hallmarks of cancer: the next generation. Cell 144, 646-674.
• Hasin, Seldin and Lusin [2017] Hasin, Y., Seldin, M. and Lusis, A. (2017). Multi-omics approaches to disease. Genome Biology 18, 83.
• Haverty et al. [2016] Haverty, P.M., Lin, E., Tan, J., Yu, Y., Lam, B., Lianoglou, S., Neve, R.M., Martin, S., Settleman, J., Yauch, R.L. and Bourgon, R. (2016). Reproducible pharmacogenomic profiling of cancer cell line panels. Nature 533, 333-337.
• Iversen et al. [2009] Iversen, C., Larson, G., Lai, C., Yeh, L.T., Dadson, C., Weingarten, P., Appleby, T., Vo, T., Maderna, A., Vernier, J.M., Hamatake, R., Miner, J.N. and Quart, B. (2009). RDEA119/BAY 869766: a potent, selective, allosteric inhibitor of MEK1/2 for the treatment of cancer. Cancer Research 69, 6839-6847.
• Jacob, Obosinski and Vert [2009] Jacob, L., Obozinski, G. and Vert, J.(2009). Group Lasso with Overlap and Graph Lasso.

Proceedings of the 26th Annual International Conference on Machine Learning

pp, 433-440.
• Jones, Schonlau and Welch [1998] Jones, D., Schonlau, M. and Welch, W.(1998). Efficient Global Optimization of Expensive Black-Box Functions. Journal of Global Optimization 12, 455-492.
• Kim and Xing [2012] Kim, S. and Xing, E.(2012). Tree-guide group lasso for multi-response regression with structured sparsity, with an application to eQTL mapping. The Annals of Applied Statistics 6, 1095-1117.
• Klau et al. [2018] Klau, S., Jurinovic, V., Hornung, R., Herold, T. and Boulesteix, A.L.(2018). Priority-Lasso: a simple hierarchical approach to the prediction of clinical outcome using multi-omics data. BMC Bioinformatics 19, 322.
• Lee et al. [2016] Lee, S., Choi, S., Kim, Y.J., Kim, B.J., T2d-Genes Consortium, Hwang, H. and Park, T.(2016). Pathway-based approach using hierarchical components of collapsed rare variants. Bioinformatics 32 586-594.
• Lewin et al. [2016] Lewin, A., Saadi, H., Peters, J., Moreno-Moral, A., Lee, J.C., Smith, K.G., Petretto, E., Bottolo, L. and Richardson, S.(2016). MT-HESS: an efficient Bayesian approach for simultaneous association detection in OMICS datasets, with application to eQTL mapping in multiple tissues. Bioinformatics 32, 523-532.
• Li and Li [2008] Li, C. and Li, H.(2008). Network-constrained regularization and variable selection for analysis of genomic data. Bioinformatics 24, 1175-1182.
• Li, Nan and Zhu [2015] Li, Y., Nan, B. and Zhu, J.(2015). Multivariate Sparse Group Lasso for the Multi- variate Multiple Linear Regression with an Arbitrary Group Structure. Biometrics 71, 354-363.
• Liu et al. [2016] Liu, D., Yang, Z., Wang, T., Yang, Z., Chen, H., Hu, Y., Hu, C., Guo, L., Deng, Q., Liu, Y., Yu, M., Shi, M., Du, N. and Guo, N.(2016). -AR signaling controls trastuzumab resistance-dependent pathway. Oncogene 35, 47-58.
• Luke and Hodi [2012] Luke, J. and Hodi, F.S.(2012). Vemurafenib and BRAF Inhibition: A New Class of Treatment for Metastatic Melanoma. Clinical Cancer Reaserch 18, 9-14.
• Meascham and Morrison [2013] Meacham, C.E. and Morrison, S.J.(2013). Tumor heterogeneity and cancer cell plasticity. Nature 501, 328-337.
• Middleton et al. [2015] Middleton, M.R., Friedlander, P., Hamid, O., Daud, A., Plummer, R., Falotico, N., Chyla, B., Jiang, F., McKeegan, E., Mostafa, N.M. et al.(2015). Randomized phase II study evaluating veliparib (ABT-888) with temozolomide in patients with metastatic melanoma. Annals of Oncology 26, 2173-2179.
• Newman et al. [2012] Newman, B., Liu, Y., Lee, H.F., Sun, D. and Wang, Y.(2012). HSP90 inhibitor 17-AAG selectively eradicates lymphoma stem cells. Cell Reports 72, 4551-4561.
• Samarov et al. [2017] Samarov, D., Allen, D., Hwang, J., Lee, Y.J. and Litorja, M.(2017). A coordinate-descent-based approach to solving the sparse group elastic net. Technometrics 59, 437-445.
• Sandri et al. [2016] Sandri, S., Faiao-Flores, F., Tiago, M., Pennacchi, P.C., Massaro, R.R., Alves-Fernandes, D.K., Berardinelli, G.N., Evangelista, A.F., de Lima Vazquez, V., Reis, R.M. and Maria-Engler, S.S.(2016). Vemurafenib resistance increases melanoma invasiveness and modulates the tumor microenvironment by MMP-2 upregulation. Pharmacological Research 111, 523-533.
• Sebolt-Leopold [2000] Sebolt-Leopold, J.S.(2000). Development of anticancer drugs targeting the MAP kinase pathway. Oncogene 19, 6594-6599.
• Shi et al. [2015] Shi, Y., Ma, I.T., Patel, R.H., Shang, X., Chen, Z., Zhao, Y., Cheng, J., Fan, Y., Rojas, Y., Barbieri, E. et al.(2015). NSC-87877 inhibits DUSP26 function in neuroblastoma resulting in p53-mediated apoptosis. Cell Death & Disease 6, e1841.
• Sill et al. [2014] Sill, M., Hielscher, T., Becker, N. and Zucknick, M.(2014). c060: Extended Inference with Lasso and elastic net Regularized Cox and Generalized Linear methods. Journal of Statistical Software 62, 1-22.
• Simon et al. [2012] Simon, N., Friedman, J., Hastie, T. and Tibshirani, R.(2012). A sparse-group Lasso. Journal of Computational and Graphical Statistics 22, 231-245.
• Simon, Friedman and Hastie [2013] Simon, N., Friedman, J. and Hastie, T.(2013). A blockwise descent algorithm for group-penalized multiresponse and multinomial regression. arXiv 1311.6529.
• Sontake et al. [2017] Sontake ,V., Wang, Y., Kasam, R.K., Sinner, D., Reddy, G.B., Naren, A.P., McCormack, F.X., White, E.S., Jegga, A.G. and Madala, S.K.(2017). Hsp90 regulation of fibroblast activation in pulmonary fibrosis. JCI Insight 2, e91454.
• Tanaka et al. [2006] Tanaka, R., Tomosugi, M., Sakai, T. and Sowa, Y.(2006). MEK Inhibitor Suppresses Expression of the miR-17-92 Cluster with G1-Phase Arrest in HT-29 Human Colon Cancer Cells and MIA PaCa-2 Pancreatic Cancer Cells. Anticancer Research 36, 4537-4543.
• Tibshirani [1996] Tibshirani, R.(1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58, 267-288.
• Turlach, Venables and Wright [2005] Turlach, B., Venables, W. and Wright, S.(2005). Simultaneous Variable Selection. Technometrics 47, 349-363.
• Van de Wiel et al. [2016] Van de Wiel, M.A., Lien, T.G., Verlaat, W., Van Wieringen, W.N., Wilting, S.M.(2016). Better prediction by use of co-data: adaptive group-regularized ridge regression. Statistics in Medicine 35, 368-381.
• Wang et al. [2017] Wang, J., Mouradov, D., Wang, X., Jorissen. R.N., Chambers, M.C., Zimmerman, L.J., Vasaikar, S., Love, C.G., Li, S., Lowes, K. et al.(2017). Colorectal Cancer Cell Line Proteomes Are Representative of Primary Tumors and Predict Drug Sensitivity. Gastroenterology 153, 1082-1095.
• Weber et al. [2017] Weber, H., Valbuena, J.R., Barbhuiya, M.A., Stein, S., Kunkel, H., Garcia, P., Bizama, C., Riquelme, I., Espinoza, J.A. and Kurtz, S.D.(2017). Small molecule inhibitor screening identifified HSP90 inhibitor 17-AAG as potential therapeutic agent for gallbladder cancer. Oncotarget 8, 26169-16184.
• Weiss et al. [2018] Weiss, B., Plotkin, S., Widemann, B., Tonsgard, J., Blakeley, J., Allen, J., Schorry, E., Korf, B., Rosser, T., Goldman, S. et al.((2018). NFM-06. NF106: Phase 2 trial of the MEK inhibitor PD-0325901 in adolescents and adults with NF1-related plexiform neurofibromas: an NF clinical trials consortium study. Neuro-Oncology 20, i143.
• Wu et al. [2019] Wu, C., Zhou, F., Ren, J., Li, X., Jiang, Y. and Ma, S.(2017). A Selective review of multi-level omics data integration using variable selection. High-Throughput 8, 4.
• Yang et al. [2013] Yang, W., Soares, J., Greninger, P., Edelman, E.J., Lightfoot, H., Forbes, S., Bindal, N., Beare, D., Smith, J.A., Thompson, I.R. et al.(2013). Genomics of Drug Sensitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells. Nucleic Acids Reserch 41, D955-961.
• Zou and Hastie [2005] Zou, H. and Hastie, T.(2005). Regularization and variable selection via the elastic net. Journal of Royal Statistics Society, Series B 67, 301-320.