Tuning parameter calibration for prediction in personalized medicine

Personalized medicine has become an important part of medicine, for instance predicting individual drug responses based on genomic information. However, many current statistical methods are not tailored to this task, because they overlook the individual heterogeneity of patients. In this paper, we look at personalized medicine from a linear regression standpoint. We introduce an alternative version of the ridge estimator and target individuals by establishing a tuning parameter calibration scheme that minimizes prediction errors of individual patients. In stark contrast, classical schemes such as cross-validation minimize prediction errors only on average. We show that our pipeline is optimal in terms of oracle inequalities, fast, and highly effective both in simulations and on real data.


1 Introduction

In the last decade, improvements in genomic, transcriptomic, and proteomic technologies have enabled personalized medicine (also called precision medicine) to become an essential part of contemporary medicine. Personalized medicine takes into account individual variability in genes, proteins, environment, and lifestyle to decide on optimal disease treatment and prevention [14]. The use of a patient’s genetic and epigenetic information has already proven to be highly effective to tailor drug therapies or preventive care in a number of applications, such as breast [7], prostate [23], ovarian [17], and pancreatic cancers [24], cardiovascular disease [11], cystic fibrosis [36], and psychiatry [10]. The subfield of pharmacogenomics studies specifically how genes affect a person’s response to particular drugs to develop more efficient and safer medications [37].

Genomic, epigenomic, and transcriptomic data used in precision medicine, such as gene expression, copy number variants, or methylation levels, are typically high-dimensional, with a number of variables that rivals or exceeds the number of observations. Using such data to estimate and predict treatment response or risk of complications therefore requires regularization, typically by the ℓ1-norm (lasso), the squared ℓ2-norm (ridge), or other terms. Ridge regression [18] yields good predictive performance for dense or non-sparse effects, that is, for outcomes related to systemic conditions, as the method does not perform variable selection. Ridge regression has become a standard tool for prediction based on genomic data, and it has been shown that ridge regression can outmatch competing methods for survival prediction based on gene expression [2, 8].

However, regularization always introduces one or more tuning parameters. These tuning parameters are usually calibrated based on averaged prediction risks. The most commonly used scheme, K-fold cross-validation (CV), divides the data into K folds (typically K = 5 or K = 10), predicts each fold out-of-sample, averages the errors over all folds for a range of tuning parameters, and selects the value with the lowest averaged error [32, 12]. But this averaging removes the inherent individual heterogeneity of the patients and can, therefore, result in sub-optimal prediction performance. This may ultimately lead to unsuitable treatment, administration of improper medication with adverse side effects, or lack of preventive care [14].
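As a point of reference, the following minimal sketch (Python/NumPy; the function name, the grid, and the fold count are our own placeholders, not part of the paper) illustrates this averaged calibration for ridge regression:

    import numpy as np

    def cv_select_ridge(X, y, lambdas, n_folds=10, seed=0):
        """K-fold cross-validation for the ridge tuning parameter: the selected
        value minimizes the prediction error averaged over all held-out folds."""
        rng = np.random.default_rng(seed)
        n, p = X.shape
        folds = np.array_split(rng.permutation(n), n_folds)
        cv_error = np.zeros(len(lambdas))
        for fold in folds:
            train = np.setdiff1d(np.arange(n), fold)
            for k, lam in enumerate(lambdas):
                # closed-form ridge fit on the training part
                beta = np.linalg.solve(X[train].T @ X[train] + lam * np.eye(p),
                                       X[train].T @ y[train])
                cv_error[k] += np.sum((y[fold] - X[fold] @ beta) ** 2)
        return lambdas[int(np.argmin(cv_error))]

Note that the selected value is the same for every patient; the personalized scheme developed below instead targets the prediction error of the individual patient.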

Hence, rather than minimizing an averaged prediction error, our goal is to minimize each patient’s individual (“personalized”) prediction error. A naïve two-stage personalized procedure for ridge regression was recently proposed by [16]. In this paper, we introduce an alternative ridge estimator, referred to as the euclidean distance ridge estimator, and calibrate its tuning parameter using adaptive validation [21, 31] individually for each patient. We show that this approach offers compelling theory, fast computations, and accurate prediction in simulations and on real data.

The specific motivation for our method is unravelling the relationship between gene expression and weight gain in kidney transplant recipients [5]. Kidney transplant recipients are known to often gain substantial weight during the first year after transplantation, which can result in adverse health effects [26]. Individual predictions of this weight gain could help in providing each patient with the best possible care.

The remainder of this paper is organized as follows: We introduce the linear regression framework and the problem statement in Section 2. We then introduce the main methodology of our approach and present theoretical guarantees in Section 3. In addition, we discuss the algorithm and analyze its performance through simulation studies using synthetic and real data in Section 4. We further apply our pipeline to kidney transplant data in Section 5. Finally, we discuss the results in Section 6 and defer the proofs to the Appendix. Our code is available at https://github.com/LedererLab/personalized_medicine.

2 Problem Setup

We consider data (y, X) that follow the linear regression model

y = Xβ* + u .    (2.1)

Let p denote the number of parameters, e.g. genes or genetic probes, and n the number of samples, e.g. patients. Then y ∈ R^n is the vector of outcomes, for example, a person’s response to a treatment. We let X ∈ R^(n×p) denote the design matrix, where each row x_i, i ∈ {1, …, n}, contains the genome information of the corresponding person. Each element β*_j, j ∈ {1, …, p}, of the regression vector β* ∈ R^p models the corresponding gene’s influence on the person’s response. We ensure the uniqueness of β* by assuming that it is a projection onto the linear space generated by the rows of X [4, 30]. For the random error vector u ∈ R^n, we make no assumptions on the probability distribution.

Our goal is to estimate the regression vector β* from the data (y, X), or, in terms of our application, to predict a person’s treatment response based on that person’s genome information. Mathematically, this amounts to estimating β* in terms of the personalized prediction error

|x_new^⊤ β̂ − x_new^⊤ β*| ,    (2.2)

where x_new ∈ R^p is the person’s genome information and β̂ is an estimator of β*.

Since the data in precision medicine are typically high-dimensional, that is, the number of parameters (genes) p exceeds the number of samples (patients) n, we consider regularized least-squares estimators of the form

β̂_r ∈ argmin_{β ∈ R^p} { ‖y − Xβ‖₂² + r·h(β) } .    (2.3)

Here, h: R^p → [0, ∞) denotes a function that takes into account prior information, such as sparsity or small regression vectors, and the tuning parameter r ∈ [0, ∞) balances the least-squares term and the prior term.

Given an estimator of the form (2.3), the main challenge is to find a good tuning parameter in line with our statistical goal. This means that we want to mimic the tuning parameter

r* ∈ argmin_{r ∈ R} |x_new^⊤ β̂_r − x_new^⊤ β*| ,

which is the optimal tuning parameter in a given set of candidate parameters R ⊂ [0, ∞).

The optimal tuning parameter r* depends on the family of estimators (2.3), the unknown noise u, and the patient’s genome information x_new. The dependence on x_new is integral to personalized medicine: different patients can respond very differently to the same treatment. But standard tuning parameter calibration schemes such as CV do not take this personalization into account; instead, they attempt to minimize an averaged prediction error rather than the personalized prediction error (2.2). We therefore develop a new prediction pipeline that is tailored to the personalized prediction error and equip our methods with fast algorithms and sharp guarantees.

3 Methodology

In this section, we introduce an alternative version of the ridge estimator [18] along with a calibration scheme tailored to personalized medicine. Two distinct features of the pipeline are its finite-sample bounds and its computational efficiency. Our estimator is called the euclidean distance ridge estimator and is defined as

β̃_r ∈ argmin_{β ∈ R^p} { ‖y − Xβ‖₂² + r‖β‖₂ } .    (3.1)

The euclidean distance ridge estimator replaces the ridge estimator’s squared prior term ‖β‖₂² by its square root ‖β‖₂. This modification allows us to derive finite-sample oracle inequalities that can be leveraged for tuning parameter calibration. At the same time, the estimator preserves two of the ridge estimator’s most attractive features: it can model the influences of many parameters, and it can be computed without the need for elaborate descent algorithms (see Section 4).
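For concreteness, a minimal sketch of the two objective functions (Python/NumPy; the function names are ours, and the parameterizations follow (3.1) and the ridge objective of Section 4):

    import numpy as np

    def ridge_objective(beta, y, X, lam):
        # squared residuals plus the squared l2 prior term
        return np.sum((y - X @ beta) ** 2) + lam * np.sum(beta ** 2)

    def edr_objective(beta, y, X, r):
        # squared residuals plus the (unsquared) l2 prior term of (3.1)
        return np.sum((y - X @ beta) ** 2) + r * np.linalg.norm(beta)

The only difference is the exponent on the prior term; Section 4 shows that this small change does not complicate the computations.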

Our first step is to establish finite-sample guarantees for the euclidean distance ridge estimator. The key idea is that if the tuning parameter is large enough, the personalized prediction error (2.2) is bounded by a multiple of the tuning parameter. For ease of presentation, we assume an orthonormal design and defer the discussion of correlated covariates to Appendix B. However, simulations with more general designs are carried out in Section 4. We establish the following guarantee:

Lemma 3.1 (Oracle inequality for ).

If , where

then it holds for orthonormal design that

Such guarantees are usually called oracle inequalities [20]. The given oracle inequality is an ideal starting point for our pipeline, because it gives us a mathematical handle on the quality of tuning parameters: a good tuning parameter should be large enough to meet the stated condition and yet small enough to give a sharp bound. The original ridge estimator lacks such inequalities for personalized prediction.

Our proof techniques, which are based on the optimality conditions of the estimators, also yield a similar bound for the original ridge estimator. The following pipeline can then be applied in the same way as for the euclidean distance ridge estimator. But the crucial advantage of the euclidean distance ridge estimator’s bound is that its right-hand side remains bounded, which ensures that the results do not scale unfavorably.

This factor can be interpreted as the absolute value of the correlation between the person’s genome information x_new and the estimator. The factor, and therefore x_new, is included in our calibration scheme below; our pipeline hence optimizes the prediction for particular study subjects.

Lemma 3.1 bounds the personalized prediction error of the euclidean distance ridge estimator as a function of the tuning parameter. Given x_new, the best tuning parameter in terms of the bound minimizes it over all tuning parameters that satisfy the lower bound of Lemma 3.1.

This tuning parameter value, which we call the oracle tuning parameter, can be interpreted as the closest theoretical mimic of the optimal tuning parameter r*.

Definition 3.1 (Oracle tuning parameter for personalized prediction).

Given a new person’s genome information x_new, the oracle tuning parameter for personalized prediction in a candidate set R is given by

The oracle tuning parameter is the best approximation of the optimal tuning parameter r* in view of the mathematical theory expressed by Lemma 3.1. In practice, however, one knows neither the target β* nor the noise u (typically not even its distribution), so that neither the optimal nor the oracle tuning parameter is accessible.

In the following, our goal is consequently to match the prediction accuracy of the oracle tuning parameter (and, therefore, essentially of the optimal tuning parameter r*) with a completely data-driven scheme. Our proposal is based on pairwise tests along the tuning parameter path:

Definition 3.2 (Personalized adaptive validation for the euclidean distance ridge estimator).

We select a tuning parameter  by

(3.2)

where the set of admissible tuning parameters is

The idea of using pairwise tests for tuning parameter calibration in high-dimensional statistics has been introduced by [6] under the name adaptive validation. A difference here is that the factors in the tests are not constant but depend both on the estimators and on the person’s genome information x_new. The dependence on x_new in particular reflects our focus on personalized prediction.

The following result guarantees that the data-driven choice indeed provides—up to a constant factor 3—the same performance as the oracle tuning parameter .

Theorem 3.1 (Optimality for personalized adaptive validation for ).

Under the conditions of Lemma 3.1, it holds that

This result guarantees that our calibration pipeline selects an essentially optimal tuning parameter from any candidate grid. Our pipeline is the only method for tuning parameter selection in personalized medicine that is equipped with such finite-sample guarantees. Moreover, it does not require any knowledge about the regression vector β* or the noise u.

Our calibration method is fully adaptive to the noise distribution; however, it is instructive to exemplify our main result by considering Gaussian noise (see Appendix A.3 for the detailed derivations):

Example 3.1 (Gaussian noise).

Suppose orthonormal design and Gaussian random noise . For any , it holds with probability at least that

The bound provides the usual parametric rate in the number of samples n; a further factor captures the dependence on the number of parameters p.

4 Algorithm and Numerical Analysis

One of the main features of our pipeline is its efficient implementation. This implementation exploits a fundamental property of our estimator: there is a one-to-one correspondence between the euclidean distance ridge estimator and the ridge estimator via their tuning parameters.

4.1 Connections to the ridge estimator

The ridge estimator is the ℓ2-regularized least-squares estimator [18]

β̂_λ := argmin_{β ∈ R^p} { ‖y − Xβ‖₂² + λ‖β‖₂² } ,    (4.1)

where λ ∈ (0, ∞) is a tuning parameter. Its computational efficiency, which is due to its closed-form expression, provides a basis for the computation of our euclidean distance ridge estimator. The closed form of the ridge estimator can be derived from the Karush-Kuhn-Tucker (KKT) conditions as

β̂_λ = (X^⊤X + λ I_p)^{-1} X^⊤ y ,    (4.2)

noting that the matrix X^⊤X + λ I_p is always invertible if λ > 0.

However, the inversion of the matrix X^⊤X + λ I_p still deserves some thought: first, the matrix might be ill-conditioned, and second, the matrix needs to be computed for a range of tuning parameters rather than only for a single one. A standard approach to these two challenges is a singular value decomposition (SVD) of the design matrix X.

Lemma 4.1 (Computation of the ridge estimator through singular value decomposition).

Let a singular value decomposition of X be given by X = U D V^⊤, where U ∈ R^(n×n) and V ∈ R^(p×p) are orthonormal matrices and D ∈ R^(n×p) is a rectangular diagonal matrix containing the corresponding singular values d_1, …, d_min(n,p). Then, the ridge estimator can be computed as

β̂_λ = V D_λ U^⊤ y ,    (4.3)

where D_λ ∈ R^(p×n) is the rectangular diagonal matrix with diagonal entries d_i/(d_i² + λ).

The singular value decomposition of the design matrix does not depend on the tuning parameter; therefore, the ridge estimators can be readily computed for multiple tuning parameters just by substituting the value of λ in D_λ. The resulting set of ridge (respectively euclidean distance ridge) estimators for a set of tuning parameters is called the ridge (respectively euclidean distance ridge) path.
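A minimal NumPy sketch of this path computation (our own illustration; the reference implementation is in the linked repository):

    import numpy as np

    def ridge_path(X, y, lambdas):
        # one (thin) SVD of X suffices for all tuning parameters, cf. (4.3)
        U, d, Vt = np.linalg.svd(X, full_matrices=False)   # X = U diag(d) Vt
        Uty = U.T @ y
        betas = []
        for lam in lambdas:
            shrink = d / (d ** 2 + lam)          # diagonal entries of D_lambda
            betas.append(Vt.T @ (shrink * Uty))  # beta_hat = V D_lambda U^T y
        return np.array(betas)                   # one row per tuning parameter

(The thin SVD suffices because the ridge estimator always lies in the row space of X.)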

Now, the crucial result is that the ridge estimator and the euclidean distance ridge estimator are computational siblings.

Theorem 4.1 (One-to-one mapping between tuning parameters).

The one-to-one mapping defined by

r(λ) := 2λ ‖β̂_λ‖₂    (4.4)

transforms tuning parameters λ of the ridge estimator to tuning parameters r of the euclidean distance ridge estimator such that β̃_{r(λ)} = β̂_λ.

This mapping transforms, in particular, the optimal tuning parameter of the ridge estimator to a corresponding optimal tuning parameter of the euclidean distance ridge estimator. More generally, it allows us to compute the euclidean distance ridge estimator via the ridge estimator; see below.
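In code, the mapping is a one-liner (the factor 2 matches the parameterizations of (3.1) and (4.1) as written above and would change under a different scaling of the objectives):

    import numpy as np

    def edr_tuning_parameter(lam, beta_ridge):
        # r(lambda) = 2 * lambda * ||beta_hat_lambda||_2, cf. (4.4)
        return 2.0 * lam * np.linalg.norm(beta_ridge)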

4.2 Algorithm

The core idea of our proposed algorithm is to exploit the above one-to-one mapping between the euclidean distance ridge estimator and the ridge estimator. This correspondence allows us to compute solution paths efficiently via the ridge estimator’s explicit formulation and its SVD.

First, consider a set of ridge tuning parameters Λ and its corresponding set of euclidean distance ridge tuning parameters {r(λ) : λ ∈ Λ} with cardinality m. This set contains, in particular, the tuning parameter selected by Definition 3.2, whose optimality is guaranteed by Theorem 3.1. To compute this tuning parameter, given data (y, X), we first order the elements of the set such that

r_(1) ≤ r_(2) ≤ … ≤ r_(m) .    (4.5)

The personalized adaptive validation method of Definition 3.2 can then be formulated in terms of binary random variables that indicate, for each pair of tuning parameters, whether the corresponding pairwise test is passed, and an algorithm is as follows:

Input: …   Result: …
Set initial index: …
while … and … do
       Update index: …
end while
Set output: …
Algorithm 1: Algorithm for the selection rule of Definition 3.2.

The full pipeline can be summarized by the following four steps:

  1. Generate a set Λ of tuning parameters for ridge regression.

  2. Compute the ridge solution path with respect to Λ by using (4.3).

  3. Transform the ridge tuning parameters to their euclidean distance ridge counterparts using (4.4) and sort the resulting tuning parameters according to (4.5).

  4. Use the personalized adaptive validation method (Algorithm 1) to compute the selected tuning parameter and map it back to its ridge counterpart.

The algorithm can be readily implemented and is fast: it essentially only requires the computation of one ridge solution path (a single SVD). In strong contrast, K-fold CV requires the computation of K ridge solution paths. Consequently, the ridge estimator calibrated with our scheme can be computed approximately K times faster than with K-fold CV, which we will confirm in the simulations. Moreover, CV still requires a tuning parameter, namely the number of folds K, while our scheme is completely parameter-free.
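The following sketch (Python/NumPy, reusing the helper functions from above) shows the structure of these four steps. The exact pairwise test of Definition 3.2 is not reproduced here; it is abstracted into a user-supplied admissible function, and the scan direction over the sorted grid is our reading of Algorithm 1:

    import numpy as np

    def select_tuning_parameter(X, y, x_new, lambdas, admissible):
        # steps 1-2: ridge path from a single SVD
        betas = ridge_path(X, y, lambdas)
        # step 3: map ridge tuning parameters to their counterparts and sort
        rs = np.array([edr_tuning_parameter(lam, b) for lam, b in zip(lambdas, betas)])
        order = np.argsort(rs)
        lambdas = np.asarray(lambdas)[order]
        rs, betas = rs[order], betas[order]
        # step 4: adaptive-validation-style scan; admissible(...) is a placeholder
        # for the person-specific pairwise test of Definition 3.2 (it uses x_new)
        k = len(rs) - 1
        while k > 0 and all(admissible(x_new, betas[k - 1], betas[j], rs[k - 1], rs[j])
                            for j in range(k, len(rs))):
            k -= 1
        return lambdas[k], rs[k], betas[k]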

4.3 Simulation Study

We evaluate the prediction performance of our method using (1) fully simulated data with random design and (2) a real data set with a simulated outcome. The results are compared to K-fold CV, which is a standard reference method.

The first setting is solely based on simulated data. The dimensions (n, p) of the design matrix are the pairs (50, 100), (150, 250), and (200, 500) listed in Table 1. First, the entries of the design matrix X are sampled i.i.d. from a normal distribution whose mean is itself drawn at random, and the columns of the design matrix are then normalized to have Euclidean norm equal to one. The entries of the regression vector β* are sampled i.i.d. from a normal distribution and then projected onto the row space of X to ensure identifiability [4, 30]. The entries of the noise vector u are sampled i.i.d. from a centered normal distribution whose variance is chosen to ensure a signal-to-noise ratio of 0.5. Then, testing vectors x_new are sampled i.i.d. from a normal distribution as well. We generate a set of 300 tuning parameters.
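A sketch of this data-generating process (the specific distribution parameters and the number of test vectors below are placeholder assumptions, not the exact values used in the paper):

    import numpy as np

    def simulate_data(n, p, n_test=100, snr=0.5, seed=0):
        rng = np.random.default_rng(seed)
        # design with random per-column means, columns scaled to unit Euclidean norm
        X = rng.normal(loc=rng.normal(size=p), scale=1.0, size=(n, p))
        X /= np.linalg.norm(X, axis=0)
        # regression vector projected onto the row space of X for identifiability
        beta = rng.normal(size=p)
        beta = np.linalg.pinv(X) @ (X @ beta)
        # noise variance chosen to match the target signal-to-noise ratio
        signal = X @ beta
        sigma = np.sqrt(np.var(signal) / snr)
        y = signal + rng.normal(scale=sigma, size=n)
        # test vectors for the personalized prediction error
        x_test = rng.normal(loc=rng.normal(size=p), scale=1.0, size=(n_test, p))
        return X, y, beta, x_test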

(n, p)       Method        Mean error (sd)       Scaled run time
(50, 100)    our method     166.78  (242.46)      1.00
             5-fold CV      340.18  (888.28)      1.57
             10-fold CV     474.58 (1220.44)      3.64
(150, 250)   our method     433.50  (669.50)      1.00
             5-fold CV      724.90 (1712.65)      3.43
             10-fold CV     872.50 (2560.01)      8.04
(200, 500)   our method     805.94 (1316.43)      1.00
             5-fold CV     1098.68 (2821.35)      3.65
             10-fold CV    1144.12 (2733.78)      8.44
Table 1: For the first simulation setting, which entirely consists of artificial data, our method outperforms 5-fold and 10-fold CV in both accuracy and speed.

The results are summarized in Table 1. The mean personalized prediction errors for the testing vectors are averaged over 100 simulations as described above. The run time is shown relative to our method. We observe that, in all considered cases, our method improves on CV both in terms of accuracy and in terms of speed. A more detailed analysis of the run time of CV relative to our method is shown in Figure 1, where we fix n with p increasing and vice versa. We observe that the gain in speed is less than the factor K, because the computations of the ridge estimator are then fast enough to compete with the sorting of the bounds in our method.

Figure 1: Run time of 10-fold CV scaled by the run time of our method, for a fixed number of observations n with p increasing and for a fixed number of parameters p with n increasing, respectively. We observe that our method is faster than 10-fold CV.

In the second setting, we simulate the outcome but use real data as covariates. The basis is the genomic data from the application in Section 5, where the number of patients is n = 26 and the covariates are gene expression measurements. The regression vector and the noise are then generated as in the first simulation setting above. The results are summarized in Table 2. We observe again that our method improves on CV both in terms of accuracy and in terms of speed. The results of this section demonstrate that our method is a strong contender on real data, which confirms and complements our theoretical findings from before.

Method        Mean error (sd)     Scaled run time
our method     33.71  (34.73)      1.00
5-fold CV     164.06 (120.06)      4.02
10-fold CV    132.90  (97.31)      9.58
Table 2: In the second simulation setting, which consists of a mixture of real and artificial data, our method again outperforms 5-fold and 10-fold CV in both accuracy and speed.

5 Example: Application to kidney transplant patient data

Kidney transplant recipients are known to gain substantial weight during the first year after transplantation, with a reported average increase of 12 kg [26]. Such substantial weight gain over a relatively short time period increases the risk of several adverse health effects, for instance cardiovascular disease, which may be detrimental to the overall health outcome of the patient. The weight gain has been attributed to the use of prescribed steroids, which increase the appetite, but steroid-free protocols alone have not reduced the risk of obesity, suggesting alternative causes. Even though weight gain is fundamentally caused by a calorie intake that is too high relative to the energy expenditure, the heterogeneity in the individual response is substantial. Genetic variation has, therefore, been considered as a contributing factor, and several genes have been linked to obesity and weight gain.

[5] investigated whether genomic data can be used to predict weight gain in kidney transplant recipients by measuring gene expression in subcutaneous adipose tissue; this tissue was chosen because it can easily be obtained from the patients during surgery. The patients’ weights were recorded at the time of transplantation and at a 6-month follow-up visit, resulting in a relative weight difference. The adipose tissue samples were collected from 26 transplant patients at the time of surgery, and mRNA levels were measured to obtain gene expression profiles over a large number of gene probes using Affymetrix Human Gene 1.0 ST arrays. We restrict these covariates to the probe targets (corresponding to 1553 unique genes) identified by [5] as significantly correlated with the weight change when adjusted for race and gender. The expression variability was also not associated with gender or race [5]. As excessive weight gain can have severe consequences for the patients, the goal is to predict the future weight increase based on the available gene expression profiles. If a large weight increase is predicted, additional measures such as diet restrictions or physiotherapy could be put into effect.

Method        Mean error (sd)     Scaled run time
our method    0.0049 (0.0121)      1.00
5-fold CV     0.0646 (0.0540)      1.19
10-fold CV    0.0666 (0.0584)      3.04
(a) In-sample prediction

Method        Mean error (sd)     Scaled run time
our method    0.0622 (0.0309)      1.00
5-fold CV     0.0672 (0.0415)      1.10
10-fold CV    0.0678 (0.0475)      2.82
(b) Leave-one-out prediction

Table 3: In the kidney transplant data, regardless of in-sample or leave-one-out prediction, our method again outperforms 5-fold and 10-fold CV in both accuracy and speed.

We compare the performance of our method in predicting weight gain for the kidney transplant patients to that of standard ridge regression calibrated by CV. In detail, we make predictions for each patient both in-sample and out-of-sample, in the latter case leaving out the corresponding observation and using the remaining data to fit the penalized regression model and select the tuning parameter. Since we do not know the true parameter β*, we can only examine the performance of our method and of CV by comparing their estimation errors, which are defined by

(5.1)

As described in the previous section, the columns of the design matrix are normalized to have Euclidean norm one. Unlike in Section 4.3, we take all gene probes into consideration for this real-data analysis.
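A sketch of the leave-one-out evaluation loop (reusing the helper functions from Section 4; the error measure is abstracted into a squared difference between the held-out outcome and its prediction, which is an assumption about the exact form of (5.1)):

    import numpy as np

    def leave_one_out_errors(X, y, lambdas, admissible):
        errors = []
        for i in range(len(y)):
            mask = np.arange(len(y)) != i
            # fit and calibrate on the remaining patients, predict the held-out one
            lam_hat, r_hat, beta_hat = select_tuning_parameter(
                X[mask], y[mask], X[i], lambdas, admissible)
            errors.append((y[i] - X[i] @ beta_hat) ** 2)
        return np.array(errors)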

The averaged results are summarized in Table 3(a) and Table 3(b). We observe that our method clearly outperforms 5-fold and 10-fold CV for both in-sample and out-of-sample prediction of the kidney transplant data. For out-of-sample prediction, we observe an improvement of about 7% in the estimation error and of about 26% in the standard deviation compared to 5-fold CV. These improvements, especially in standard deviation, reinforce the advantages of a personalized approach to tuning parameter calibration.

6 Conclusion

We have introduced a pipeline that calibrates ridge regression for personalized prediction. Its distinctive features are the finite sample guarantees (see Theorem 3.1) and the statistical and computational efficiency (see Tables 1 and 2). These features are echoed when predicting the weight gain of kidney transplant patients (see Table 3). Hence, our pipeline can improve personalized prediction and, thereby, further the cause of personalized medicine.

Despite our focus on personalized medicine, we also envision applications in other areas that involve individual heterogeneity. Two examples are product recommendation [13, 28] and personalized marketing [33].

Appendix A Proofs

A.1 Proof of Lemma 3.1

Proof.

Assume the condition of Lemma 3.1 and an orthonormal design. According to the KKT conditions of the euclidean distance ridge estimator, we have

Hence,

(A.1)

Let and multiply from the left to obtain

where we use the assumption of orthonormal design. By taking absolute value on both sides and applying the triangle inequality, we derive the following bound for the personalized prediction error (2.2):

since by assumption. Finally, we obtain the bound

(A.2)

with

A.2 Proof of Theorem 3.1

Proof.

Let and suppose that the linear regression model (2.1) is under orthonormal design.

Bound on : First, we show that . Let

then by definition of , there must exist two tuning parameters  with

such that

However, by Lemma 3.1, we have

and

Applying the triangle inequality to the above displays and combining the results yields

which leads to a contradiction to our assumption. Therefore, we obtain the following bound with respect to :

Bound on the personalized prediction error: Since , we have

Applying the triangle inequality, we ultimately find the bound

A.3 Proof of Example 3.1

Lemma A.1 (Deviation inequality).

For any standard normal variable , we have the following concentration bound

Proof.

= for all . Now by Markov’s inequality,

For , we have

. Since the standard normal distribution is symmetric about 0, we obtain the desired result. ∎

Using this concentration bound, we derive the results of Example 3.1.

Proof.

Given a , Gaussian noise

with variance

, and suppose that the linear regression model (2.1) is under orthonormal design. We first show that for

using the concentration bound, Lemma A.1:

Hence, holds with at least probability . By Theorem 3.1, we have with at least probability :

A.4 Proof of Lemma 4.1

Proof.

Let X = U D V^⊤ be a singular value decomposition of X as given in Lemma 4.1. Then, by algebraic manipulation of Equation (4.2), the ridge estimator can be written as

where the matrix is defined as

A.5 Proof of Theorem 4.1

Proof.

We consider the KKT conditions of (3.1) and replace the euclidean distance ridge estimator with the ridge estimator to obtain

By taking the ℓ2-norm of both sides and using that the estimator is nonzero, we obtain

Thus, we can transform the ridge tuning parameter λ to the tuning parameter r(λ) that yields the same estimator.

Moreover, there is a one-to-one relationship between the euclidean distance ridge estimator and the ridge estimator. The ridge estimator in (4.2) implies that

and hence

Since

we have

and we finally conclude that when

Appendix B Beyond Orthogonality

To avoid digression, we have restricted the theories in the main body of the paper to orthonormal design matrices. However, there are straightforward extensions along established lines in high-dimensional theory. In general, the influence of correlation on regularized estimation has been studied extensively; see, for example, [9] and [15] for the lasso case. The most straightforward extension of our theories goes via the restricted-eigenvalue-type condition introduced in [6]. This condition allows for design matrices that satisfy the corresponding bound for certain constants. We omit the details; importantly, our simulations demonstrate that our method provides accurate prediction far beyond orthonormal designs.


References

  • [1] The Norwegian Cancer Genomics Consortium. https://kreftgenomikk.no.
  • [2] Hege M Bøvelstad, Ståle Nygård, Hege L Størvold, Magne Aldrin, Ørnulf Borgan, Arnoldo Frigessi, and Ole Christian Lingjærde. Predicting survival from microarray data-a comparative study. Bioinformatics, 23(16):2080–2087, 2007.
  • [3] P. Bühlmann and S. van de Geer. Statistics for high-dimensional data: Methods, theory and applications. Springer Series in Statistics, 2011.
  • [4] Peter Bühlmann. Statistical significance in high-dimensional linear models. Bernoulli, 19(4):1212–1242, 2013.
  • [5] Ann Cashion, Ansley Stanfill, Fridtjof Thomas, Lijing Xu, Thomas Sutter, James Eason, Mang Ensell, and Ramin Homayouni. Expression levels of obesity-related genes are associated with weight change in kidney transplant recipients. PloS one, 8(3):e59962, 2013.
  • [6] Michael Chichignoud, Johannes Lederer, and Martin J. Wainwright. A Practical Scheme and Fast Algorithm to Tune the Lasso With Optimality Guarantees. J. Mach. Learn. Res., 17(231):1–20, 2016.
  • [7] Sang-Hoon Cho, Jongsu Jeon, and Seung Il Kim. Personalized medicine in breast cancer: a systematic review. J. Breast. Canc., 15(3):265–272, 2012.
  • [8] Erika Cule and Maria De Iorio. Ridge regression in prediction problems: automatic choice of the ridge parameter. Genet. Epidemiol., 37(7):704–714, 2013.
  • [9] Arnak S Dalalyan, Mohamed Hebiri, and Johannes Lederer. On the prediction performance of the Lasso. Bernoulli, 23(1):552–581, feb 2017.
  • [10] U Demkow and T Wolańczyk. Genetic tests in major psychiatric disorders–integrating molecular medicine with clinical psychiatry–why is it so difficult? Transl. Psychiat., 7(6):1–9, 2017.
  • [11] Georg B Ehret, Patricia B Munroe, Kenneth M Rice, Murielle Bochud, Andrew D Johnson, Daniel I Chasman, Albert V Smith, Martin D Tobin, Germaine C Verwoert, Shih-Jen Hwang, et al. Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk. Nature, 478(7367):103–109, 2011.
  • [12] Gene H Golub, Michael Heath, and Grace Wahba. Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics, 21(2):215–223, 1979.
  • [13] Ido Guy, Naama Zwerdling, Inbal Ronen, David Carmel, and Erel Uziel. Social media recommendation based on people and tags. In Proceedings of the 33rd international ACM SIGIR conference on Research and Development in Information Retrieval, pages 194–201. ACM, 2010.
  • [14] Margaret A Hamburg and Francis S Collins. The path to personalized medicine. New Engl. J. Med., 363(4):301–304, 2010.
  • [15] Mohamed Hebiri and Johannes Lederer. How correlations influence lasso prediction. IEEE T. Inform. Theory, 59(3):1846–1854, 2013.
  • [16] Kristoffer H Hellton and Nils Lid Hjort. Fridge: Focused fine-tuning of ridge regression for personalized predictions. Stat. Med., 37(8):1290–1303, 2018.
  • [17] Julia Hippisley-Cox and Carol Coupland. Development and validation of risk prediction algorithms to estimate future risk of common cancers in men and women: prospective cohort study. BMJ Open, 5(3):1–25, 2015.
  • [18] Arthur E Hoerl and Robert W Kennard. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1):55–67, 1970.
  • [19] Vessela N Kristensen, Ole Christian Lingjærde, Hege G Russnes, Hans Kristian M Vollan, Arnoldo Frigessi, and Anne-Lise Børresen-Dale. Principles and methods of integrative genomic analyses in cancer. Nature Reviews Cancer, 14(5):299–313, 2014.
  • [20] Johannes Lederer, Lu Yu, Irina Gaynanova, et al. Oracle inequalities for high-dimensional prediction. Bernoulli, 25(2):1225–1255, 2019.
  • [21] Oleg V Lepskii. On problems of adaptive estimation in white gaussian noise. Topics in Nonparametric Estimation, 12:87–106, 1992.
  • [22] Junwei Lu, Matey Neykov, and Han Liu. Adaptive inferential method for monotone graph invariants. arXiv:1707.09114, 2017.
  • [23] Robert K Nam, Ants Toi, Laurence H Klotz, John Trachtenberg, Michael AS Jewett, Sree Appu, D Andrew Loblaw, Linda Sugar, Steven A Narod, and Michael W Kattan. Assessing individual risk for prostate cancer. J. Clin. Oncol., 25(24):3582–3588, 2007.
  • [24] Shuji Ogino, Jérôme Galon, Charles S Fuchs, and Glenn Dranoff. Cancer immunology—analysis of host and tumor factors for personalized medicine. Nat. Rev. Clin. Oncol., 8(12):711, 2011.
  • [25] Uzoezi Ozomaro, Claes Wahlestedt, and Charles B Nemeroff. Personalized medicine in psychiatry: problems and promises. BMC Med., 11(1):132, 2013.
  • [26] Mandakini G Patel. The effect of dietary intervention on weight gains after renal transplantation. J. Renal Nutr., 8(3):137–141, 1998.
  • [27] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2018.
  • [28] Dimitrios Rafailidis, Apostolos Axenopoulos, Jonas Etzold, Stavroula Manolopoulou, and Petros Daras. Content-based tag propagation and tensor factorization for personalized item recommendation based on social tagging. ACM Trans. Interact. Intell. Syst., 3(4):1–26, 2014.
  • [29] Jeffrey S Ross, Elzbieta A Slodkowska, W Fraser Symmans, Lajos Pusztai, Peter M Ravdin, and Gabriel N Hortobagyi. The her-2 receptor and breast cancer: ten years of targeted anti–her-2 therapy and personalized medicine. The Oncologist, 14(4):320–368, 2009.
  • [30] Jun Shao and Xinwei Deng. Estimation in high-dimensional linear models with deterministic design matrices. Ann. Stat., 40(2):812–831, 2012.
  • [31] V. G. Spokoiny, E. Mammen, and O. V. Lepski. Optimal spatial adaptation to inhomogeneous smoothness: an approach based on kernel estimates with variable bandwidth selectors. Ann. Stat., 25(3):929–947, 1997.
  • [32] Mervyn Stone. Cross-validatory choice and assessment of statistical predictions. J. R. Statist. Soc. B, 36(2):111–133, 1974.
  • [33] Heng Tang, Stephen Shaoyi Liao, and Sherry Xiaoyun Sun. A prediction framework based on contextual data to support mobile personalized marketing. Decis. Support Syst., 56:234–246, 2013.
  • [34] Robert Tibshirani. Regression shrinkage and selection via the lasso. J. R. Statist. Soc. B, 58(1):267–288, 1996.
  • [35] Sara van de Geer, Peter Bühlmann, and Shuheng Zhou. The adaptive and the thresholded Lasso for potentially misspecified models (and a lower bound for the Lasso). Electron. J. Stat., 5:688–749, 2011.
  • [36] Shafagh Al Nadaf Waters, Sharon Leh Ing Wong, Nikhil Tanaji Awatade, Christopher Kenta Hewson, Laura Katherine Fawcett, Anthony Kicic, and Adam Jaffe. Human primary epithelial cell models: promising tools in the era of cystic fibrosis personalized medicine. Front. Pharmacol., 9:1–11, 2018.
  • [37] Andreas Ziegler, Armin Koch, Katja Krockenberger, and Anika Großhennig. Personalized medicine using dna biomarkers: a review. Hum. Genet., 131(10):1627–1638, 2012.