Personalized medicine has become an important part of medicine, for instance predicting individual drug responses based on genomic information. However, many current statistical methods are not tailored to this task, because they overlook the individual heterogeneity of patients. In this paper, we look at personalized medicine from a linear regression standpoint. We introduce a alternative version of the ridge estimator and target individuals by establishing a tuning parameter calibration scheme that minimizes prediction errors of individual patients. In stark contrast, classical schemes such as cross-validation minimize prediction errors only on average. We show that our pipeline is optimal in terms of oracle inequalities, fast, and highly effective both in simulations and on real data.READ FULL TEXT VIEW PDF
In the last decade, improvements in genomic, transcriptomic, and proteomic technologies have enabled personalized medicine (also called precision medicine) to become an essential part of contemporary medicine. Personalized medicine takes into account individual variability in genes, proteins, environment, and lifestyle to decide on optimal disease treatment and prevention . The use of a patient’s genetic and epigenetic information has already proven to be highly effective to tailor drug therapies or preventive care in a number of applications, such as breast , prostate , ovarian , and pancreatic cancers , cardiovascular disease , cystic fibrosis , and psychiatry . The subfield of pharmacogenomics studies specifically how genes affect a person’s response to particular drugs to develop more efficient and safer medications .
Genomic, epigenomic, and transcriptomic data used in precision medicine, such as gene expression, copy number variants, or methylation levels are typically high-dimensional with a number of variables that rivals or exceeds the number of observations. Using such data to estimate and predict treatment response or risk of complications, therefore, requires regularization typically by the norm (lasso), the norm (ridge), or other terms. Ridge regression  yields good predictive performance for dense or non-sparse effects, that is, for outcomes related to systemic conditions, as the method does not perform variable selection. Ridge regression has become a standard tool for prediction based on genomic data, and it has been shown that ridge regression can outmatch competing methods for survival based on gene expression [2, 8].
However, regularization always introduces one or more tuning parameters. These tuning parameters are usually calibrated based on the averaged prediction risks. Most commonly used, -fold cross-validation (CV) divides the data into folds (typically ), predicts each fold out-of-sample, averages over all folds for a range of tuning parameters, and selects the value with the lowest averaged error [32, 12]. But this averaging removes the inherent individual heterogeneity of the patients and can, therefore, result in sub-optimal prediction performance. This may ultimately lead to unsuitable treatment, administration of improper medication with adverse side effects, or lack of preventive care .
Hence, rather than minimizing an averaged prediction error, our goal is to minimize each patient’s individual (“personalized”) prediction error. A naïve two-stage personalized procedure for ridge regression was recently proposed by . In this paper, we introduce a alternative ridge estimator, referred to as euclidean distance ridge (), and calibrate the tuning parameter using adaptive validation [21, 31] individually for each patient. We show that this approach offers compelling theory, fast computations, and accurate prediction on data.
The specific motivation for our method is unravelling the relationship between gene expression and weight gain in kidney transplant recipients . Kidney transplant recipients are known to often gain substantial weight during the first year after transplantation, which can result in adverse health effects . Individual predictions of this weight gain could help in providing each patient with the best possible care.
The remainder of this paper is organized as follows: We introduce the linear regression framework and the problem statement in Section 2. We then introduce the main methodology of our approach, and present theoretical guarantees in Section 3. In addition, we discuss the algorithm and analyze its the performance through simulation studies using synthetic and real data in Section 4. We further apply our pipeline to kidney transplant data in Section 5. Finally, we discuss the results in Section 6 and defer the proofs to the Appendix. Our code is available at https://github.com/LedererLab/personalized_medicine.
We consider data () that follows a linear regression model
Let denote the number of parameters, e.g. genes or genetic probes, and the number of samples, e.g. patients, then
is the vector of outcomes,, for example, a person’s response to a treatment. We let denote the design matrix, where each row , , contains the genome information of the corresponding person. Each element , , of the regression vector models the gene’s influence on the person’s response. We ensure the uniqueness of by assuming that it is a projection onto the linear space generated by the rows of [4, 30]. For the random error vector
, we make no assumptions on the probability distribution.
Our goal is to estimate the regression vector from data , or in terms of our application, predicting a person’s treatment response based on that person’s genome information. Mathematically, this amounts to estimating in terms of the personalized prediction error
where is the person’s genome information.
Since the data in precision medicine is typically high-dimensional, that is, the number of parameters (genes) exceeds the number of samples (patients) , we consider regularized least-squares estimators of the form
Here, denotes a function that takes into account prior information, such as sparsity or small regression vectors, and the tuning parameter balances the least-squares term and the prior term.
Given an estimator (2.3), the main challenge is to find a good tuning parameter in line with our statistical goal. This means that we want to mimic the tuning parameter
which is the optimal tuning parameter in a given set of candidate parameters .
The optimal tuning parameter depends on the family of estimators (2.3), the unknown noise , and the patient’s genome information . The dependence on is integral to personalized medicine: different patients can respond very differently to the same treatment. But standard tuning parameter calibration such as CV schemes do not take this personalization into account but instead attempt to minimize the averaged prediction error rather than the personalized prediction error . We, therefore, develop a new prediction pipeline, that is tailored to the personalized prediction error and equip our methods with fast algorithms and sharp guarantees.
In this section, we introduce a alternative version of the ridge estimator  along with a calibration scheme tailored to personalized medicine. Two distinct features of the pipeline are its finite-sample bounds and its computational efficiency. Our estimator is called euclidean distance ridge () and is defined as
The replaces the ridge estimator’s squared prior term by its square-root . This modification allows us to derive finite-sample oracle inequalities that can be leveraged for tuning parameter calibration. At the same time, the preserves two of the ridge estimator’s most attractive features: it can model the influences of many parameters, and it can be computed without the need for elaborate descent algorithms (see Section 4).
Our first step is to establish finite-sample guarantees for the . The key idea is that if the tuning parameter is large enough, the personalized prediction error (2.2) is bounded by a multiple of the tuning parameter. For ease of presentation, we assume an orthonormal design, that is, and defer the discussion of correlated covariates to the Appendix C. However, simulations with more general designs are carried out in Section 4. We establish the following guarantee for :
If , where
then it holds for orthonormal design that
Such guarantees are usually called oracle inequalities . The given oracle inequality is an ideal starting point for our pipeline, because it gives us a mathematical handle on the quality of tuning parameters: a good tuning parameter should be large enough to meet the stated condition and yet small enough to give a sharp bound. The original ridge estimator lacks such inequalities for personalized prediction.
Our proof techniques, which are based on the optimality conditions of the estimator, also yield a similar bound for the original ridge estimator: if , then . The following pipeline can then be applied the same way as for the . But the crucial advantage of the ’s bound is that its right-hand side is bounded by , which ensures that the results do not scale with .
The factor can be interpreted as the absolute value of the correlation between the person’s genome information and the estimator . This factor, and therefore , are included in our calibration scheme below, and our pipeline, hence, optimizes the prediction for particular study subjects.
Lemma 3.1 bounds the personalized prediction error of as a function of the tuning parameter . Given , the best tuning parameter in terms of the bound minimizes over all tuning parameters, that satisfy the lower bound
This tuning parameter value, which we call the oracle tuning parameter, can be interpreted as the closest theoretical mimic of the optimal tuning parameter .
Given a new person’s genome information , the oracle tuning parameter for personalized prediction in a candidate set is given by
The oracle tuning parameter is the best approximation of the optimal tuning parameter in view of the mathematical theory expressed by Lemma 3.1. In practice, however, one does not know the target nor the noise (typically not even its distribution), such that neither nor are accessible.
In the following our goal is, consequently, to match the prediction accuracy of (and, therefore, of essentially) with a completely data-driven scheme. Our proposal is based on pairwise tests along the tuning parameter path:
We select a tuning parameter by
where the set of admissible tuning parameters is
The idea of using pairwise tests for tuning parameter calibration in high-dimensional statistics has been introduced by under the name adaptive validation. A difference here is that the factors are not constant but depend both on and . The dependence on in particular reflects our focus on personalized prediction.
The following result guarantees that the data-driven choice indeed provides—up to a constant factor 3—the same performance as the oracle tuning parameter .
Under the conditions of Lemma 3.1, it holds that
This result guarantees that our calibration pipeline selects an essentially optimal tuning parameter from any grid . Our pipeline is the only method for tuning parameter selection in personalized medicine that is equipped with such finite-sample guarantees. It does, moreover, not require any knowledge about the regression vector nor the noise .
Our calibration method is fully adaptive to the noise distribution; however, it is instructive to exemplify our main result by considering Gaussian noise (see Appendix A.3 for the detailed derivations):
Suppose orthonormal design and Gaussian random noise . For any , it holds with probability at least that
The bound provides the usual parametric rate in the number of samples ; the factor entails the dependence on the number of parameters .
One of the main features of our pipeline is its efficient implementation. This implementation exploits a fundamental property of our estimator: there is a one-to-one correspondence between the and the ridge estimator via the tuning parameters.
The ridge estimator is the -regularized least-squares estimator 
where is a tuning parameter. Its computational efficiency, which is due to its closed-form expression, provides a basis for the computation of our estimator. The closed-form of the ridge estimator can be derived from the Karush-Kuhn-Tucker (KKT) conditions as
noting that the matrix is always invertible if .
However, the inversion of the matrix
still deserves some thought: first, the matrix might be ill-conditioned, and second, the matrix needs to be computed for a range of tuning parameters rather than only for a single one. A standard approach to these two challenges is a singular value decomposition (svd) of the design matrix.
Let a singular value decomposition of be given by , where and are orthonormal matrices, and is an diagonal matrix of the corresponding singular values . Then, the ridge estimator can be computed as
where is diagonal with .
The singular value decomposition of the design matrix does not depend on the tuning parameter; therefore, the ridge estimators can be readily computed for multiple tuning parameters just by substituting the value of in . The resulting set of ridge () estimators for a set of tuning parameters is called the ridge () path for .
Now, the crucial result is that the ridge estimator and the are computational siblings.
The one-to-one mapping defined by
transforms tuning parameters of the ridge estimator to tuning parameters of the estimator such that .
This mapping transforms, in particular, the optimal tuning parameter of the ridge estimator to a corresponding optimal tuning parameter of the estimator. More generally, it allows us to compute the estimator via the ridge estimator—see below.
The core idea of our proposed algorithm is to exploit the above one-to-one mapping between estimator and ridge estimator. This correspondence allows us to compute solution paths efficiently via the ridge’s explicit formulation and svd.
First, consider a set of ridge tuning parameters and its corresponding set of tuning parameters given by
with cardinality . This set contains, in particular, the tuning parameter , whose optimality is guaranteed under Theorem 3.1. To compute the tuning parameter , given data , we first order the elements of such that
method can then be formulated in terms of the binary random variables
for , and an algorithm is as follows:
The full pipeline can be summarized by the following four steps:
The algorithm can be readily implemented and is fast: it essentially only requires the computation of one ridge solution path (a single svd). In strong contrast, -fold CV requires the computation of ridge solution paths. Consequently, the ridge estimator with can be computed approximately times faster than with -fold CV, which we will confirm in the simulations. Moreover, CV still requires a tuning parameter, namely, the number of folds , while is completely parameter-free.
We evaluate the prediction performance of the method using (1) fully simulated data with random design and (2) a real data set with a simulated outcome. The results are compared to -fold CV, which is a standard reference method.
The first setting is solely based on simulated data. The dimensions of the design matrix are . First, the entries of the design matrix are sampled i.i.d. from , where the mean itself is sampled according to , and the columns of the design matrix are then normalized to have Euclidean norm equal to one. The entries of the regression vector are sample i.i.d. from and then projected onto the row space of to ensure identifiability [4, 30]. The entries of the noise vector are sampled i.i.d. from , where to ensure a signal-to-noise ratio of 0.5. Then, data testing vectors are sampled i.i.d. from . We generate a set of 300 tuning parameters .
|(n,p)||Method||Mean error (sd)||Scaled run time|
|5-fold CV||340.18 (888.28)||1.57|
|10-fold CV||474.58 (1220.44)||3.64|
|5-fold CV||724.90 (1712.65)||3.43|
|10-fold CV||872.50 (2560.01)||8.04|
|5-fold CV||1098.68 (2821.35)||3.65|
|10-fold CV||1144.12 (2733.78)||8.44|
The results are summarized in Table 1. The mean personalized prediction errors for the testing vectors are averaged over 100 simulations as described above. The run time is shown relative to . We observe that in all considered cases, improves on CV both in terms of accuracy as well as in speed. A more detailed analysis of the scaled run time for CV relative to is shown in Figure 1. We fix with increasing and vice versa. Observing that the gain in speed is less than the factor , because the computations of the ridge estimator are then fast enough to compete with the sorting of the bounds in .
In the second setting, we simulate the outcome but based on real data as covariates. The basis is the genomic data from the application in Section 5 where the sample size or number of patient is and the number of genes is . The regression vector and the noise are then generated as in the first simulation setting above. The results are summarized in Table 2. We observe again that improves on CV both in terms of accuracy as well as in speed. The results of this section demonstrate that is a contender on data, which confirms and complements our theoretical findings from before.
|Method||Mean error (sd)||Scaled run time|
|5-fold CV||164.06 (120.06)||4.02|
|10-fold CV||132.90 (97.31)||9.58|
Kidney transplant recipients are known to gain substantial weight during the first year after transplantation, with a reported averaged increase of 12 kg . Such substantial weight gain over a relatively short time period gives an increased risk for several adverse health effects, for instance, cardiovascular diseases, which may be detrimental for the overall health outcome of the patient. The weight gain has been explained by the use of prescribed steroids, which increase the appetite, but steroid-free protocols alone have not reduced the risk of obesity, suggesting alternative causes. Even though weight gain is fundamentally caused by a too high calorie intake relative to the energy expenditure, the heterogeneity in the individual response is substantial. Genetic variation has, therefore, been considered as a contributing factor, and several genes have been linked to obesity and weight gain.
 investigated if genomic data can be used to predict weight gain in kidney transplant recipients by measuring gene expression in subcutaneous adipose tissue. This was done as the relevant tissue may be easily obtained from the patients during surgery. The patients’ weight were recorded at the time of transplantation and at a 6-months follow-up visit, resulting in a relative weight difference. The adipose tissue samples were collected from 26 transplant patients at the time of surgery, and mRNA levels were measured to obtain the gene expression profiles for gene probes using Affymetrix Human Gene 1.0 ST arrays. We restrict these covariates to the probe targets (corresponding to 1553 unique genes) identified by  as significantly correlated with the weight change, when adjusted for race and gender. The expression variability was also not associated with gender or race . As excessive weight gain can have severe consequences for the patients, the goal is to predict the future weight increase based on the available gene expression profiles. If a large weight increase is predicted, additional measures such as diet restrictions or physiotherapy could be set into effect.
We compare the performance of our method in predicting weight gain for the kidney transplant patients to the prediction of standard ridge regression calibrated by CV. In detail, we make predictions for each patient both in-sample and out-of-sample, leaving out the observation and using the remaining data to fit the penalized regression model and select the optimal tuning parameter. Since we do not know the true parameter , we can only examine the performance of our method and CV by comparing their estimation errors, which is defined by
As described in the previous section, the columns of the design matrix are normalized to have Euclidean norm one. Unlike in the Section 4.3, we take the whole gene probes into consideration for our real data analysis.
The averaged results are summarized in Table 2(a) and Table 2(b). We observe that clearly outperforms 5-fold and 10-fold CV for both in-sample and out-of-sample prediction of the kidney transplant data. For out-of-sample prediction, we observe an improvement of about in the estimation error and an improvement of
in the standard deviation compared to 5-fold CV. These improvements, especially in standard deviation, reinforce the advantages of a personalized approach to tuning parameter calibration.
We have introduced a pipeline that calibrates ridge regression for personalized prediction. Its distinctive features are the finite sample guarantees (see Theorem 3.1) and the statistical and computational efficiency (see Tables 1 and 2). These features are echoed when predicting the weight gain of kidney transplant patients (see Table 3). Hence, our pipeline can improve personalized prediction and, thereby, further the cause of personalized medicine.
Assume and orthonormal design . According to the KKT conditions of the estimator, we have
Let and multiply from the left to obtain
where we use the assumption of orthonormal design. By taking absolute value on both sides and applying the triangle inequality, we derive the following bound for the personalized prediction error (2.2):
since by assumption. Finally, we obtain the bound
Let and suppose that the linear regression model (2.1) is under orthonormal design.
Bound on : First, we show that . Let
then by definition of , there must exist two tuning parameters with
However, by Lemma (3.1), we have
Applying the triangle inequality to the above displays and combining the results yields
which leads to a contradiction to our assumption. Therefore, we obtain the following bound with respect to :
Bound on the personalized prediction error: Since , we have
Applying the triangle inequality, we ultimately find the bound
For any standard normal variable , we have the following concentration bound
= for all . Now by Markov’s inequality,
For , we have
. Since the standard normal distribution is symmetric about 0, we obtain the desired result. ∎
Using this concentration bound, we derive the results of Example 3.1.
We consider the KKT-conditions of (3.1) and replace the estimator with the ridge estimator to obtain
By taking the -norm of both sides and with , we obtain
Thus, we can transform the ridge tuning parameter to the tuning parameter with respect to the same estimator.
Moreover, there is a one-to-one relationship between and ridge. The ridge estimator in (4.2) implies that
and we finally conclude that when ∎
To avoid digression, we have restricted the theories in the main body of the paper to orthonormal design matrices. However, there are straightforward extensions along established lines in high-dimensional theory. In general, the influence of correlation on regularized estimation has been studied extensively—see, for example,  and  for the lasso case. The most straightforward extension of our theories goes via the
-restricted eigenvalue introduced in. This condition allows for design matrices, that satisfy for certain . We omit the details; importantly, our simulations demonstrate that our method provides accurate prediction far beyond orthonormal design.
Statistics for high-dimensional data: Methods, theory and applications. Springer Series in Statistics. 2011.
Content-based tag propagation and tensor factorization for personalized item recommendation based on social tagging.ACM Trans. Interact. Intell. Syst., 3(4):1–26, 2014.