personalized_medicine
None
view repo
Personalized medicine has become an important part of medicine, for instance predicting individual drug responses based on genomic information. However, many current statistical methods are not tailored to this task, because they overlook the individual heterogeneity of patients. In this paper, we look at personalized medicine from a linear regression standpoint. We introduce a alternative version of the ridge estimator and target individuals by establishing a tuning parameter calibration scheme that minimizes prediction errors of individual patients. In stark contrast, classical schemes such as crossvalidation minimize prediction errors only on average. We show that our pipeline is optimal in terms of oracle inequalities, fast, and highly effective both in simulations and on real data.
READ FULL TEXT VIEW PDFNone
In the last decade, improvements in genomic, transcriptomic, and proteomic technologies have enabled personalized medicine (also called precision medicine) to become an essential part of contemporary medicine. Personalized medicine takes into account individual variability in genes, proteins, environment, and lifestyle to decide on optimal disease treatment and prevention [14]. The use of a patient’s genetic and epigenetic information has already proven to be highly effective to tailor drug therapies or preventive care in a number of applications, such as breast [7], prostate [23], ovarian [17], and pancreatic cancers [24], cardiovascular disease [11], cystic fibrosis [36], and psychiatry [10]. The subfield of pharmacogenomics studies specifically how genes affect a person’s response to particular drugs to develop more efficient and safer medications [37].
Genomic, epigenomic, and transcriptomic data used in precision medicine, such as gene expression, copy number variants, or methylation levels are typically highdimensional with a number of variables that rivals or exceeds the number of observations. Using such data to estimate and predict treatment response or risk of complications, therefore, requires regularization typically by the norm (lasso), the norm (ridge), or other terms. Ridge regression [18] yields good predictive performance for dense or nonsparse effects, that is, for outcomes related to systemic conditions, as the method does not perform variable selection. Ridge regression has become a standard tool for prediction based on genomic data, and it has been shown that ridge regression can outmatch competing methods for survival based on gene expression [2, 8].
However, regularization always introduces one or more tuning parameters. These tuning parameters are usually calibrated based on the averaged prediction risks. Most commonly used, fold crossvalidation (CV) divides the data into folds (typically ), predicts each fold outofsample, averages over all folds for a range of tuning parameters, and selects the value with the lowest averaged error [32, 12]. But this averaging removes the inherent individual heterogeneity of the patients and can, therefore, result in suboptimal prediction performance. This may ultimately lead to unsuitable treatment, administration of improper medication with adverse side effects, or lack of preventive care [14].
Hence, rather than minimizing an averaged prediction error, our goal is to minimize each patient’s individual (“personalized”) prediction error. A naïve twostage personalized procedure for ridge regression was recently proposed by [16]. In this paper, we introduce a alternative ridge estimator, referred to as euclidean distance ridge (), and calibrate the tuning parameter using adaptive validation [21, 31] individually for each patient. We show that this approach offers compelling theory, fast computations, and accurate prediction on data.
The specific motivation for our method is unravelling the relationship between gene expression and weight gain in kidney transplant recipients [5]. Kidney transplant recipients are known to often gain substantial weight during the first year after transplantation, which can result in adverse health effects [26]. Individual predictions of this weight gain could help in providing each patient with the best possible care.
The remainder of this paper is organized as follows: We introduce the linear regression framework and the problem statement in Section 2. We then introduce the main methodology of our approach, and present theoretical guarantees in Section 3. In addition, we discuss the algorithm and analyze its the performance through simulation studies using synthetic and real data in Section 4. We further apply our pipeline to kidney transplant data in Section 5. Finally, we discuss the results in Section 6 and defer the proofs to the Appendix. Our code is available at https://github.com/LedererLab/personalized_medicine.
We consider data () that follows a linear regression model
(2.1) 
Let denote the number of parameters, e.g. genes or genetic probes, and the number of samples, e.g. patients, then
is the vector of outcomes,
, for example, a person’s response to a treatment. We let denote the design matrix, where each row , , contains the genome information of the corresponding person. Each element , , of the regression vector models the gene’s influence on the person’s response. We ensure the uniqueness of by assuming that it is a projection onto the linear space generated by the rows of [4, 30]. For the random error vector, we make no assumptions on the probability distribution.
Our goal is to estimate the regression vector from data , or in terms of our application, predicting a person’s treatment response based on that person’s genome information. Mathematically, this amounts to estimating in terms of the personalized prediction error
(2.2) 
where is the person’s genome information.
Since the data in precision medicine is typically highdimensional, that is, the number of parameters (genes) exceeds the number of samples (patients) , we consider regularized leastsquares estimators of the form
(2.3) 
Here, denotes a function that takes into account prior information, such as sparsity or small regression vectors, and the tuning parameter balances the leastsquares term and the prior term.
Given an estimator (2.3), the main challenge is to find a good tuning parameter in line with our statistical goal. This means that we want to mimic the tuning parameter
which is the optimal tuning parameter in a given set of candidate parameters .
The optimal tuning parameter depends on the family of estimators (2.3), the unknown noise , and the patient’s genome information . The dependence on is integral to personalized medicine: different patients can respond very differently to the same treatment. But standard tuning parameter calibration such as CV schemes do not take this personalization into account but instead attempt to minimize the averaged prediction error rather than the personalized prediction error . We, therefore, develop a new prediction pipeline, that is tailored to the personalized prediction error and equip our methods with fast algorithms and sharp guarantees.
In this section, we introduce a alternative version of the ridge estimator [18] along with a calibration scheme tailored to personalized medicine. Two distinct features of the pipeline are its finitesample bounds and its computational efficiency. Our estimator is called euclidean distance ridge () and is defined as
(3.1) 
The replaces the ridge estimator’s squared prior term by its squareroot . This modification allows us to derive finitesample oracle inequalities that can be leveraged for tuning parameter calibration. At the same time, the preserves two of the ridge estimator’s most attractive features: it can model the influences of many parameters, and it can be computed without the need for elaborate descent algorithms (see Section 4).
Our first step is to establish finitesample guarantees for the . The key idea is that if the tuning parameter is large enough, the personalized prediction error (2.2) is bounded by a multiple of the tuning parameter. For ease of presentation, we assume an orthonormal design, that is, and defer the discussion of correlated covariates to the Appendix C. However, simulations with more general designs are carried out in Section 4. We establish the following guarantee for :
If , where
then it holds for orthonormal design that
Such guarantees are usually called oracle inequalities [20]. The given oracle inequality is an ideal starting point for our pipeline, because it gives us a mathematical handle on the quality of tuning parameters: a good tuning parameter should be large enough to meet the stated condition and yet small enough to give a sharp bound. The original ridge estimator lacks such inequalities for personalized prediction.
Our proof techniques, which are based on the optimality conditions of the estimator, also yield a similar bound for the original ridge estimator: if , then . The following pipeline can then be applied the same way as for the . But the crucial advantage of the ’s bound is that its righthand side is bounded by , which ensures that the results do not scale with .
The factor can be interpreted as the absolute value of the correlation between the person’s genome information and the estimator . This factor, and therefore , are included in our calibration scheme below, and our pipeline, hence, optimizes the prediction for particular study subjects.
Lemma 3.1 bounds the personalized prediction error of as a function of the tuning parameter . Given , the best tuning parameter in terms of the bound minimizes over all tuning parameters, that satisfy the lower bound
This tuning parameter value, which we call the oracle tuning parameter, can be interpreted as the closest theoretical mimic of the optimal tuning parameter .
Given a new person’s genome information , the oracle tuning parameter for personalized prediction in a candidate set is given by
The oracle tuning parameter is the best approximation of the optimal tuning parameter in view of the mathematical theory expressed by Lemma 3.1. In practice, however, one does not know the target nor the noise (typically not even its distribution), such that neither nor are accessible.
In the following our goal is, consequently, to match the prediction accuracy of (and, therefore, of essentially) with a completely datadriven scheme. Our proposal is based on pairwise tests along the tuning parameter path:
We select a tuning parameter by
(3.2) 
where the set of admissible tuning parameters is
The idea of using pairwise tests for tuning parameter calibration in highdimensional statistics has been introduced by
[6] under the name adaptive validation. A difference here is that the factors are not constant but depend both on and . The dependence on in particular reflects our focus on personalized prediction.The following result guarantees that the datadriven choice indeed provides—up to a constant factor 3—the same performance as the oracle tuning parameter .
Under the conditions of Lemma 3.1, it holds that
This result guarantees that our calibration pipeline selects an essentially optimal tuning parameter from any grid . Our pipeline is the only method for tuning parameter selection in personalized medicine that is equipped with such finitesample guarantees. It does, moreover, not require any knowledge about the regression vector nor the noise .
Our calibration method is fully adaptive to the noise distribution; however, it is instructive to exemplify our main result by considering Gaussian noise (see Appendix A.3 for the detailed derivations):
Suppose orthonormal design and Gaussian random noise . For any , it holds with probability at least that
The bound provides the usual parametric rate in the number of samples ; the factor entails the dependence on the number of parameters .
One of the main features of our pipeline is its efficient implementation. This implementation exploits a fundamental property of our estimator: there is a onetoone correspondence between the and the ridge estimator via the tuning parameters.
The ridge estimator is the regularized leastsquares estimator [18]
(4.1) 
where is a tuning parameter. Its computational efficiency, which is due to its closedform expression, provides a basis for the computation of our estimator. The closedform of the ridge estimator can be derived from the KarushKuhnTucker (KKT) conditions as
(4.2) 
noting that the matrix is always invertible if .
However, the inversion of the matrix
still deserves some thought: first, the matrix might be illconditioned, and second, the matrix needs to be computed for a range of tuning parameters rather than only for a single one. A standard approach to these two challenges is a singular value decomposition (svd) of the design matrix
.Let a singular value decomposition of be given by , where and are orthonormal matrices, and is an diagonal matrix of the corresponding singular values . Then, the ridge estimator can be computed as
(4.3) 
where is diagonal with .
The singular value decomposition of the design matrix does not depend on the tuning parameter; therefore, the ridge estimators can be readily computed for multiple tuning parameters just by substituting the value of in . The resulting set of ridge () estimators for a set of tuning parameters is called the ridge () path for .
Now, the crucial result is that the ridge estimator and the are computational siblings.
The onetoone mapping defined by
(4.4) 
transforms tuning parameters of the ridge estimator to tuning parameters of the estimator such that .
This mapping transforms, in particular, the optimal tuning parameter of the ridge estimator to a corresponding optimal tuning parameter of the estimator. More generally, it allows us to compute the estimator via the ridge estimator—see below.
The core idea of our proposed algorithm is to exploit the above onetoone mapping between estimator and ridge estimator. This correspondence allows us to compute solution paths efficiently via the ridge’s explicit formulation and svd.
First, consider a set of ridge tuning parameters and its corresponding set of tuning parameters given by
with cardinality . This set contains, in particular, the tuning parameter , whose optimality is guaranteed under Theorem 3.1. To compute the tuning parameter , given data , we first order the elements of such that
(4.5) 
The
method can then be formulated in terms of the binary random variables
for , and an algorithm is as follows:
The full pipeline can be summarized by the following four steps:
The algorithm can be readily implemented and is fast: it essentially only requires the computation of one ridge solution path (a single svd). In strong contrast, fold CV requires the computation of ridge solution paths. Consequently, the ridge estimator with can be computed approximately times faster than with fold CV, which we will confirm in the simulations. Moreover, CV still requires a tuning parameter, namely, the number of folds , while is completely parameterfree.
We evaluate the prediction performance of the method using (1) fully simulated data with random design and (2) a real data set with a simulated outcome. The results are compared to fold CV, which is a standard reference method.
The first setting is solely based on simulated data. The dimensions of the design matrix are . First, the entries of the design matrix are sampled i.i.d. from , where the mean itself is sampled according to , and the columns of the design matrix are then normalized to have Euclidean norm equal to one. The entries of the regression vector are sample i.i.d. from and then projected onto the row space of to ensure identifiability [4, 30]. The entries of the noise vector are sampled i.i.d. from , where to ensure a signaltonoise ratio of 0.5. Then, data testing vectors are sampled i.i.d. from . We generate a set of 300 tuning parameters .
(n,p)  Method  Mean error (sd)  Scaled run time 

(50,100)  166.78 (242.46)  1.00  
5fold CV  340.18 (888.28)  1.57  
10fold CV  474.58 (1220.44)  3.64  
(150,250)  433.50 (669.50)  1.00  
5fold CV  724.90 (1712.65)  3.43  
10fold CV  872.50 (2560.01)  8.04  
(200,500)  805.94 (1316.43)  1.00  
5fold CV  1098.68 (2821.35)  3.65  
10fold CV  1144.12 (2733.78)  8.44 
The results are summarized in Table 1. The mean personalized prediction errors for the testing vectors are averaged over 100 simulations as described above. The run time is shown relative to . We observe that in all considered cases, improves on CV both in terms of accuracy as well as in speed. A more detailed analysis of the scaled run time for CV relative to is shown in Figure 1. We fix with increasing and vice versa. Observing that the gain in speed is less than the factor , because the computations of the ridge estimator are then fast enough to compete with the sorting of the bounds in .
In the second setting, we simulate the outcome but based on real data as covariates. The basis is the genomic data from the application in Section 5 where the sample size or number of patient is and the number of genes is . The regression vector and the noise are then generated as in the first simulation setting above. The results are summarized in Table 2. We observe again that improves on CV both in terms of accuracy as well as in speed. The results of this section demonstrate that is a contender on data, which confirms and complements our theoretical findings from before.
Method  Mean error (sd)  Scaled run time 

33.71 (34.73)  1.00  
5fold CV  164.06 (120.06)  4.02 
10fold CV  132.90 (97.31)  9.58 
Kidney transplant recipients are known to gain substantial weight during the first year after transplantation, with a reported averaged increase of 12 kg [26]. Such substantial weight gain over a relatively short time period gives an increased risk for several adverse health effects, for instance, cardiovascular diseases, which may be detrimental for the overall health outcome of the patient. The weight gain has been explained by the use of prescribed steroids, which increase the appetite, but steroidfree protocols alone have not reduced the risk of obesity, suggesting alternative causes. Even though weight gain is fundamentally caused by a too high calorie intake relative to the energy expenditure, the heterogeneity in the individual response is substantial. Genetic variation has, therefore, been considered as a contributing factor, and several genes have been linked to obesity and weight gain.
[5] investigated if genomic data can be used to predict weight gain in kidney transplant recipients by measuring gene expression in subcutaneous adipose tissue. This was done as the relevant tissue may be easily obtained from the patients during surgery. The patients’ weight were recorded at the time of transplantation and at a 6months followup visit, resulting in a relative weight difference. The adipose tissue samples were collected from 26 transplant patients at the time of surgery, and mRNA levels were measured to obtain the gene expression profiles for gene probes using Affymetrix Human Gene 1.0 ST arrays. We restrict these covariates to the probe targets (corresponding to 1553 unique genes) identified by [5] as significantly correlated with the weight change, when adjusted for race and gender. The expression variability was also not associated with gender or race [5]. As excessive weight gain can have severe consequences for the patients, the goal is to predict the future weight increase based on the available gene expression profiles. If a large weight increase is predicted, additional measures such as diet restrictions or physiotherapy could be set into effect.


We compare the performance of our method in predicting weight gain for the kidney transplant patients to the prediction of standard ridge regression calibrated by CV. In detail, we make predictions for each patient both insample and outofsample, leaving out the observation and using the remaining data to fit the penalized regression model and select the optimal tuning parameter. Since we do not know the true parameter , we can only examine the performance of our method and CV by comparing their estimation errors, which is defined by
(5.1) 
As described in the previous section, the columns of the design matrix are normalized to have Euclidean norm one. Unlike in the Section 4.3, we take the whole gene probes into consideration for our real data analysis.
The averaged results are summarized in Table 2(a) and Table 2(b). We observe that clearly outperforms 5fold and 10fold CV for both insample and outofsample prediction of the kidney transplant data. For outofsample prediction, we observe an improvement of about in the estimation error and an improvement of
in the standard deviation compared to 5fold CV. These improvements, especially in standard deviation, reinforce the advantages of a personalized approach to tuning parameter calibration.
We have introduced a pipeline that calibrates ridge regression for personalized prediction. Its distinctive features are the finite sample guarantees (see Theorem 3.1) and the statistical and computational efficiency (see Tables 1 and 2). These features are echoed when predicting the weight gain of kidney transplant patients (see Table 3). Hence, our pipeline can improve personalized prediction and, thereby, further the cause of personalized medicine.
Assume and orthonormal design . According to the KKT conditions of the estimator, we have
Hence,
(A.1) 
Let and multiply from the left to obtain
where we use the assumption of orthonormal design. By taking absolute value on both sides and applying the triangle inequality, we derive the following bound for the personalized prediction error (2.2):
since by assumption. Finally, we obtain the bound
(A.2) 
with
∎
Let and suppose that the linear regression model (2.1) is under orthonormal design.
Bound on : First, we show that . Let
then by definition of , there must exist two tuning parameters with
such that
However, by Lemma (3.1), we have
and
Applying the triangle inequality to the above displays and combining the results yields
which leads to a contradiction to our assumption. Therefore, we obtain the following bound with respect to :
Bound on the personalized prediction error: Since , we have
Applying the triangle inequality, we ultimately find the bound
∎
For any standard normal variable , we have the following concentration bound
= for all . Now by Markov’s inequality,
For , we have
. Since the standard normal distribution is symmetric about 0, we obtain the desired result. ∎
Using this concentration bound, we derive the results of Example 3.1.
We consider the KKTconditions of (3.1) and replace the estimator with the ridge estimator to obtain
By taking the norm of both sides and with , we obtain
Thus, we can transform the ridge tuning parameter to the tuning parameter with respect to the same estimator.
Moreover, there is a onetoone relationship between and ridge. The ridge estimator in (4.2) implies that
and hence
Since
we have
and we finally conclude that when ∎
To avoid digression, we have restricted the theories in the main body of the paper to orthonormal design matrices. However, there are straightforward extensions along established lines in highdimensional theory. In general, the influence of correlation on regularized estimation has been studied extensively—see, for example, [9] and [15] for the lasso case. The most straightforward extension of our theories goes via the
restricted eigenvalue introduced in
[6]. This condition allows for design matrices, that satisfy for certain . We omit the details; importantly, our simulations demonstrate that our method provides accurate prediction far beyond orthonormal design.Statistics for highdimensional data: Methods, theory and applications
. Springer Series in Statistics. 2011.Contentbased tag propagation and tensor factorization for personalized item recommendation based on social tagging.
ACM Trans. Interact. Intell. Syst., 3(4):1–26, 2014.