A Single Index Model for Longitudinal Outcomes to Optimize Individual Treatment Decision Rules

by   Lanqiu Yao, et al.
NYU Langone Medical Center

A pressing challenge in medical research is to identify optimal treatments for individual patients. This is particularly challenging in mental health settings where mean responses are often similar across multiple treatments. For example, the mean longitudinal trajectories for patients treated with an active drug and placebo may be very similar but different treatments may exhibit distinctly different individual trajectory shapes. Most precision medicine approaches using longitudinal data often ignore information from the longitudinal data structure. This paper investigates a powerful precision medicine approach by examining the impact of baseline covariates on longitudinal outcome trajectories to guide treatment decisions instead of traditional scalar outcome measures derived from longitudinal data, such as a change score. We introduce a method of estimating "biosignatures" defined as linear combinations of baseline characteristics (i.e., a single index) that optimally separate longitudinal trajectories among different treatment groups. The criterion used is to maximize the Kullback-Leibler Divergence between different treatment outcome distributions. The approach is illustrated via simulation studies and a depression clinical trial. The approach is also contrasted with more traditional methods and compares performance in the presence of missing data.



page 1

page 2

page 3

page 4


Estimating Individualized Treatment Regimes from Crossover Designs

The field of precision medicine aims to tailor treatment based on patien...

Estimation of Individualized Decision Rules Based on an Optimized Covariate-Dependent Equivalent of Random Outcomes

Recent exploration of optimal individualized decision rules (IDRs) for p...

High dimensional precision medicine from patient-derived xenografts

The complexity of human cancer often results in significant heterogeneit...

Near-optimal Individualized Treatment Recommendations

Individualized treatment recommendation (ITR) is an important analytic f...

Dimensionality Reduction of Longitudinal 'Omics Data using Modern Tensor Factorization

Precision medicine is a clinical approach for disease prevention, detect...

An example of how false conclusions could be made with personalized health tracking and suggestions for avoiding similar situations

Personalizing interventions and treatments is a necessity for optimal me...

Representation Learning for Integrating Multi-domain Outcomes to Optimize Individualized Treatments

For mental disorders, patients' underlying mental states are non-observe...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

A major goal of precision medicine is to determine optimal treatment decisions rules for individual patients, instead of using the usual “one-size-fits-all” approach assuming one therapy works better than the others among all patients (Kosorok and Laber, 2019). The traditional strategy of treating all patients with the same treatment is often suboptimal since patients are often highly heterogeneous in many aspects, such as their clinical, genetic, environmental, and social characteristics and hence, there may not be a treatment option that is universally optimal for all patients. For example, there is substantial evidence of heterogeneity in treatment response among cancer patients (Dagogo-Jack and Shaw, 2018). Patients with advanced colorectal or breast cancer might not always benefit from molecular target treatment and comprehensive molecular aberrations studies are required  (Dienstmann et al., 2012; De Mattos-Arruda et al., 2013). Heterogeneous treatment response is also common in depression studies. For example, only about of patients with depression achieve remission after a single acute phase of treatment (Trivedi et al., 2006). This heterogeneity presents a significant challenge in medical research, making it difficult to find therapies that are effective in a majority of patients. Precision medicine is therefore a promising approach to solving this heterogeneity problem whereby treatment decisions are tailored to individual patients based on a patient’s characteristics (Petkova et al., 2017a).

Numerous methods have been developed to estimate treatment decision rules (TDRs) using baseline patient characteristics. For example,  Watkins (1989)

proposed Q-learning, where “Q” stands for “quality”. It is a reinforcement learning method that estimates the Q-function, which is defined as the conditional expectation of the sum of the current and future outcomes given that treatment decisions are made at all future stages based on the specific TDR. After 

Murphy et al. (2007) applied Q-learning in the clinical research area, it has been widely studied to optimize TDRs, and the Q-functions can be modeled parametrically, semiparametrically and nonparametrically (Zhao et al., 2009).  Murphy (2003) introduced a similar method known as Advantage learning (A-learning). It models a regret function that measures the loss caused by not following the optimal treatment regime at each stage and can be more robust to model misspecification than Q-learning (Schulte et al., 2014). Outcome weighted learning (OWL) is another widely used approach proposed by Zhao et al. (2012)

, where a weighted classification function calculated with baseline characteristics is applied to optimize the estimation of treatment rules and the computational problem is solved using support vector machines (SVM). Researchers have also utilized deep learning approaches on precision medicine, such as neural networks and evolutionary algorithms 

(Uddin et al., 2019)

. However, these machine learning and deep learning methods are very computationally burdensome and the results from these methods are often difficult to interpret.

Regression models have been investigated to develop TDRs (e.g., Petkova et al., 2020; Park et al., 2020; Tian et al., 2014; Henderson et al., 2010). Regression approaches have the advantage that the resulting TDRs are easy to use and easily interpretable. Additionally, TDRs based on regression modeling are often very competitive compared to more complex machine learning approaches. Petkova et al. (2017a) proposed a model that uses a linear combination or an single index of patients’ baseline characteristics, termed a generated effect modifier (GEM) or “biosignature" to optimize treatment decisions. The linear combination defining the biosignature in the GEM is determined by maximizing the interaction effect (or modifying effect) between treatment arms. Park et al. (2020) introduced a robust single-index model with multiple links (SIMML) functions, a generalization of the GEM, in which various treatment-specific nonparametric link functions were used to relate a single linear combination of baseline characteristics to outcomes.

Most medical studies have durations that last weeks and measures are typically collected longitudinally over the course of the study on participants. A shortcoming in the TDR literature is that the information available from the longitudinal assessments is typically ignored and only the baseline and last measurement is used to define an outcome, e.g., a change score, or controlling for baseline in an analysis of covariance (ANCOVA) model. TDR’s that ignore this temporal longitudinal aspect of the data may be missing important information that could be used to optimize the TDR. For example, one treatment may have a much quicker onset of action than another but the duration of the specific action of the treatment may wane quicker. In such a situation, the two treatments may look similar based on a simple change score outcome, but the distribution of trajectories for the two treatments may be quite distinct. An additional challenge in medical studies is that they are almost always plagued by missing data. Similar to the longitudinal structures that are often ignored, few methods precision medicine approaches account for missing data. Methods that are robust to missing data are potentially more powerful for developing well-performing TDRs.

Our objective is to create an effective technique for estimating TDRs that takes use of the data’s longitudinal structure and is robust when there is missing data. The rest of the paper is organized as follows: in Section 2 we introduce our motivating example, the EMBARC trial; Section 3 describes the algorithm we developed to identify an optimal linear combination of baseline covariates for treatment decisions; Section 4 provides a geometric illustration of the proposed TDR approach; to evaluate our method, a simulation study is reported in Section 5; Section 6 illustrates an application of our approach with the EMBARC example described in Section 2; a brief discussion is provided in Section 7.

2 Motivating example: the EMBARC trial

Major Depressive Disorder (MDD) is a chronic disease that persistently impacts people’s moods and behaviors. MDD is highly prevalent in the USA and current studies suggest that approximately 10–20 of the U.S. population will develop a depressive disorder in their lifetime (Richards, 2011). The antidepressants, such as Serotonin and noradrenaline reuptake inhibitors (SNRIs), were believed to work in 70 MDD patients (Dinoff et al., 2020). However, researchers found the effects of antidepressants were overestimated, i.e., the magnitude of symptom reduction with antidepressants was about only 40 (Khan et al., 2012). The resistance to medication is usually explained by the extreme heterogeneity. The clinical trial, Establishing Moderators and Biosignatures of Antidepressant Response in Clinical Care (EMBARC)  Trivedi et al. (2016) was designed to discover biomarkers for depression treatment response using a large collection of baseline modalities, such as clinical measures, behavior tests, brain imaging modalities. The primary goal was to discover biomarkers from this rich set of baseline data to deal with the heterogeneity and develop personalized treatment decision rules that use this rich source of information.

The EMBARC study was a multi-site randomized clinical trial with 296 participants recruited from four study sites (Petkova et al., 2017b). Participants were randomly assigned to receive either sertraline or placebo. The participants were followed up for 8 weeks and the primary outcome, Hamilton Depression Rating Scale (HDRS), was evaluated for each participant at each follow-up time. The HDRS is a measure for the severity of depression with lower scores indicating less severe depression. Quadratic mixed-effects models with linear and quadratic terms of time (weeks) were fit to the longitudinal data. The estimated outcome trajectories versus time are illustrated in Figure 1. The blue solid and red dashed curves represent the estimated trajectories for the active treatment and placebo group, respectively. The blue and red dots show the observed HDRS values for individual patients. The mean outcome trajectories are represented by the thick black curves (the mean trajectory for the active treatment group is the lower one). Note that the average trajectories for drug and placebo are very similar to one another.

Figure 1: The HDRS scores for the active treatment group (blue diamond points) and the placebo group (red circle points) at corresponding weeks. The data was analyzed using mixed effect models with quadratic effects of observation time. The blue solid and red dashed curves represent the treatment and placebo groups’ estimated outcome trajectories, respectively. The average trajectories for the two groups are shown by the two bold black curves (the active treatment mean curve is the lower curve).
Figure 1: Outcome Trajectories Plots for Participants in the EMBARC Study

The baseline measures that were collected in the EMBARC study include demographics (e.g., age and gender), clinical measurements (e.g., anxiety level, ranger attacks, hypersomnia, and fatigue), and cerebral cortical thickness data from MRI, task-based functional MRI (fMRI), and EEG from scalp electrodes. It is quite common in depression studies that no single baseline measure will explain a substantial portion of variability in the outcome measures, but instead, each baseline measure may contribute some minimal amount of information. Therefore, a driving goal of precision medicine is to discover powerful biosignatures for defining TDRs that combine information across multiple baseline features.

3 A single-index model to optimize Kullback-Leibler Divergence

The primary idea for this paper is to develop a linear biosignature (i.e., a linear combination of baseline features) that optimally separates treatment group trajectories in terms of Kullback-Leibler Divergence in order to develop strong performing TDRs. This work focuses on randomized clinical trials. Let denote the functional (e.g., longitudinal trajectory) outcome for subject in the th treatment group where is the number of participants for the th group; and

is the number of intervention arms. In a longitudinal study,

is only observed at a finite number of design time points (possibly with error). Denote as the observed outcome measured for the given subject at time , where is the index of observations and let

denote the vector of outcomes for the

th subject in the th group. In a balanced longitudinal study with no missing values, the measurement times will usually follow a protocol with a common set of follow-up times, i.e., with a total number of outcomes collected per subject. Denote the vector of observed times for subject as , where . That is, a subject may have missing time points or drop out of the study, the total observed time is less or equal to the total designed follow-up time points, . Let denote the set of baseline covariates. The TDRs will be developed for a linear biosignature defined as a linear combination of baseline covariates: . To ensure identifiability of the model, we impose the constraint . To incorporate a subject’s baseline characteristics into a longitudinal model, we consider the following mixed-effects model for :


with the biosignatrue and is its fixed-effect coefficient vector with dimension ; is also a coefficient of fixed effects, which represents the main effects in the th group; is the subject-specific random effects, assumed to be multivariate normal; is the vector of random errors, which is assumed independent of the random effects; is the design matrix with dimension , which is constructed with a set of suitable basis functions of the observation times , i.e. , where is a sequence of basis functions for time . Thus the biosignature is embedded in a longitudinal model and has an interaction with the observation time.

3.1 Purity function

TDRs are generally based on combinations of baseline features that have treatment modifying impacts on the outcome and in standard regression settings, this often corresponds to finding treatment effect modifiers that maximize interaction effects. Correspondingly, in order to optimize a TDR based on the functional nature of the outcome, an optimal choice of in (1) is determined in order to maximally distinguish the treatment groups in terms of the overall trajectories shapes. Let denote the coefficients for the trajectory of subject in the th group from (1), that is,


Conditional on a given biosignature , from the model assumptions, the coefficient distribution (2) is multivariate normal:


To measure the differences among the trajectory shape distributions as determined by the coefficients from different groups, the Kullback-Leibler Divergence (KLD) is applied, which measures the degree of divergence of one probability distribution

from another probability distribution



where and

denote the probability density functions of

and , respectively. Larger KLD between distributions corresponds to higher purity, i.e., the distributions overlap less. Note that is nonnegative. Here, we focus the scenario with two different treatment groups, i.e. . The purity function of the biosignature is then defined as:


which represents the KLD between the distribution of group 1 () and the distribution of group 2 (), given the biosignature . The function is a subject-level purity measure function since it depends on the subject-specific baseline covariates . In order to obtain a population-level of purity, we integrate over the entire population:


where is the probability density function (PDF) for the distribution of biosignature (which will be identical across treatment groups in a randomized trial). We note the dependence of the biosignature PDF on by . A larger purity, , represents a larger difference in outcomes over time between the two treatment populations. A larger difference between interventions indicates that it is easier to make a treatment decision, since one treatment is significantly different and superior than the others conditional on a set of baseline measures. Therefore, identifying a linear combination of baseline covariates to achieve the highest purity will lead to an optimal separation of the outcome distributions, which in turn will allow the estimation of well-performing TDRs. Given the mixed-effect model that incorporates the biosignature in (1), from the supporting information, the purity function can be expressed as:




That is, and are scalars defined in terms of . Note that by varying , the mixed-effect parameter values in these equations will vary as well, e.g., we could write, for example, instead of but we suppress this dependency here for notational convenience.

3.2 Optimization and estimation of

In this section, we describe the approach for estimating the model parameters, including the biosignature coefficients , in order to maximize the KLD. Since are baseline covariates, the estimation of the model parameters describing the distribution of , can be estimated independent of . We will use a working approximation model of in our derivations to follow. As noted above, the estimators and from (1) will depend on . In general, closed form expressions for the parameter estimates do not exist and we will use numerical methods to simultaneously maximize the likelihood for the mixed-effect model parameters in (1) (with a set of basis functions ) and estimating to maximize the KLD to maximize (7). Therefore, the that maximizes the separations (i.e., KLD) between distributions (e.g., active treatment verse placebo) is assessed from:


where parameters from the mixed-effect model in are determined by maximizing the likelihood function. Given a set of baseline covariates with dimension and a selection of basis functions , we apply Nelder-Mead algorithm Nelder and Mead (1965) to determine parameter estimates that numerically optimize the purity function. That is, is estimated using the Nelder-Mead algorithm on (9) and then is set to have unit norm (e.g. = 1).

3.3 Treatment decision rule

A treatment decision rule is defined as a function that maps a subject’s baseline characteristics to one of the treatment options. For illustration, here we consider the case of treatment options, coded as and . In order to make a TDR for a longitudinal data, we will extract a scalar summary from the outcome trajectories (curves of outcome vs observation time), since the curves do not have nature ordering. The most commonly used summary measure is the “change score", which is calculated as the differences between the last observation as the first measure. However, it only uses two outcomes for each subject and hence does not utilize all the information available in the data. It also ignores the information on the shape of the trajectory. In order to extract a meaningful scalar measurement that summarizes a function ,  Tarpey et al. (2021) considered the average tangent slope (ATS), which is the average of the tangent slope at each time point on the trajectory interval. Since the slope of the tangent line is the derivative of the function at time , the ATS extracts the average rate of change (e.g., improvement) across the study time interval. The ATS is given by


where and are the start and end time in the study, respectively. That is, the ATS is the average of the instantaneous rates of improvement across the time domain. The ATS is calculated from the estimated fixed-effects and hence gains efficiency by incorporating information from all the data.

The TDR of our single-index model based on maximizing KLD is then made by using the ATS criterion. Assume a larger value is preferred, i.e., a larger rate of improvement across time corresponds to a better outcome. After the estimation of , the TDR is then defined as:


where and are the first and last designed outcome measure time, respectively; is a sequence of basis function evaluated at time .  Tarpey et al. (2021) also demonstrated the properties of the ATS estimator via simulation studies under missing data scenarios. If a subject is missing their last scheduled assessment, the change score method of estimation can produced seriously biased estimates of treatment effects when the outcome trajectories are nonlinear. However, the TDR based on the KLD uses parameter estimates for the full data in (11) and hence suffers far less from the loss of efficiency or bias that hinders the change score method.

4 Geometric illustrations with quadratic trajectories

In this section, we illustrate geometrically how the KLD varies as a function of the biosignature using quadratic trajectories as an example. Many randomized clinical trials (RCTs) are of fairly short duration (e.g., 6-12 weeks). For many treatment studies (e.g., for pain and depression), it is common to see some subjects improve immediately (e.g., due to a placebo effect) followed by a deterioration while others may exhibit delayed response to therapy but later improve; on the other hand, some subjects exhibit a linear trend in improvement. These major characteristics of the therapy effects indicate that quadratic curves will often provide a good fit to the observed data and this motivates our use of quadratic curves for illustration. The coefficient vector in (2), , is then a vector consisting of an intercept, a coefficient for the linear term (“slope”) and a coefficient for the quadratic term (“concavity”). In a randomized trial, the intercept term represents the average outocme at baseline and should be the same across treatment groups. Also, the intercept provides no direct information on the shape of the trajectory. Thus, for our illustration, we focus on the shape information only, i.e., the slope and concavity values to represent the curves which can be graphically depicted in a 2-dimensional coordinate system (-axis for slope and

-axis for concavity). The bivariate normal distribution for these coefficients will be illustrated using elliptical contours of equal density. The center of the contour ellipses will represent the means of the distribution given by the second and third elements of the fixed-effect coefficient vectors

’s for groups

. The length of the major and minor axis of the ellipses are proportional to the eigenvalues of

, the covariance matrices of the random effects in .

The geometric illustrations are provided in Figure 2. Let denote the biosignature. The different panels in Figure 2 depict several scenarios showing the separation between the distributions for the treatment groups based on two characteristics:

  1. , the magnitude of the biosignature which increases across the columns in Figure 2 and

  2. angle between the biosignature coefficient vectors and which increases across the rows in Figure 2.

Figure 2: Contours of equal density for the joint slope and concavity distributions for two treatment groups that vary based on the magnitude of the biosignature (left-to-right) and the angle between the biosignature covariate directions and from model (1) (from top-to-bottom)
Figure 2: Illustration of Contour Plots for Trajectories Separations

As the biosignature increases in magnitude, the two distributions move apart (along the green dotted lines) as can be seen going from left to right in the rows of Figure 2 (unless the angle between the biosignature covariate directions and is zero as depicted in the panels in the top row of the figure). Additionally, the larger the angle between the biosignature covariate directions and , the quicker the two distributions separate as the biosignature

increases in magnitude. Also note that the orientations of the equal contour ellipses (as determined by the eigenvectors of

and ) impacts the degree of overlap between the two treatment distributions. The situation when the biosignature is zero, , for our illustration is shown in the very left column in Figure 2 – in this case, there is a large degree of overlap between two treatment distributions indicated by the large overlap in the equal contour ellipses. The goal of the TDR optimization is to determine that “moves” the two treatment distributions as far apart as possible based on these characteristics, that is, the goal is to maximize the longitudinal interaction effect.

5 Simulation studies

The performance of the TDRs based on determining a biosignature that maximizes the mean KLD is studied using simulation illustrations in this section. Because one of the primary motivations for developing the approach is the EMBARC depression study, we use quadratic outcome trajectories for illustration and the simulation parameter settings are set to mimic the parameter estimates from the EMBARC study. In all simulations, we consider the case with treatment groups. The outcomes are generated from the model (1) using the quadratic functional approximation, i.e., setting . Therefore, for a subject with outcomes measured at time , the design matrix has 3 columns: a column of ones, , and .

5.1 Longitudinal trajectory parameter settings

The assessment time points are set to , i.e., the outcomes are collected at baseline and subjects are followed up for 7 weeks. The fixed-effect coefficient parameters and are chosen to be similar to each other (as was the case for the estimated EMBARC study parameters),

The vectors of random effects are set as and with

The parameter provides direction information, i.e., how the coefficients of the trajectories vary with changes of the baseline biosignature . For this simulation illustration, we set

where ’s are chosen to be degrees, corresponding to the angles between vector and as . The error terms are simulated as and are all independent of each other. The average trajectories (with only fixed effects and without the effect of biosignatures) are illustrated in Figure 3:

Figure 3: Trajectories from the treatment group 1 and 2 were generated based on their average trajectories, which are the blue solid and red dashed curves, respectively.
Figure 3: Average Trajectories Plots for the Simulation

From Figure 3, the mean curves of treatment group 1 and 2, without regard to the impact of baseline covariates, have similar shapes and they have exactly the same average tangent slope. Thus, based on the average trajectories, one would conclude there is not difference between the two groups. However, incorporating the patients’ baseline information via the biosignature, distinctions between the two groups arise. That is, tailoring treatment to individual patient using baseline characteristics, an optimal treatment decision is possible even though on average, subjects respond similarly.

5.2 Covariate parameter settings

To generate the linear biosignature, we consider settings with the number of covariates to be . We set , standardized to have norm one. The baseline covariates were sampled from , where is a matrix and has 1’s on the diagonal and for the element on the th column and th row. The vector of mean values, , is set as , i.e., half are negative and the other half are positive, which makes the biosignature have mean .

5.3 Missing data settings

Different scenarios of missingness are considered in the simulations: (i) no missing data, i.e., all measures from all subjects are collected; (ii) missing completely at random (MCAR), that is, for each subject, each of his/her outcomes at week has a probability of to be missing; (iii) each patient has a chance of dropping out the trial at week (Dropout). Note that the outcomes at baseline are recorded for all patients.

5.4 TDRs and simulation settings

The proposed TDR based on a longitudinal single-index model to maximize the Kullback-Leibler Divergence (denoted as LS-KLD) is compared with a three other approaches for constructing TDRs The TDR methods considered are:

  1. Longitudinal single-index model based on Kullback-Leibler Divergence (denoted as LS-KLD),

  2. SIMML (Park et al., 2020) estimated from maximizing the profile likelihood,

  3. Linear GEM model (Petkova et al., 2017a) estimated under the criterion of maximizing the difference in the treatment-specific slopes, denoted as linGEM,

  4. The outcome weighted learning (OWL) method (Zhao et al., 2012)

    based on a Gaussian radial basis function kernel.

Note that the LS-KLD method estimates the TDR (11), based on the ATS estimator, while the SIMML and linGEM assess the TDR with the differences in the observed outcomes from baseline to last observation (change scores). Additionally, the TDRs estimated by plugging the actual in (11) (instead of the estimated ) are also presented for comparison. For each scenario, a training data set with sample size ( per group) was used to estimate the model parameters and the TDRs. Testing data sets with were generated to obtain an independent assessment of the performances of each of the TDRs. For the th simulated subject in the testing data, the true treatment assignment was calculated using


Note that the random effects were considered when determining the true group label since the random effects are available in the simulations, i.e. the true trajectories for each subject were known. The estimated group assignment was evaluated with in (11). The performances of models were assessed with Proportion of Correct Decision (PCD), as


The simulations were conducted in R 3.5.1 (R Core Team, 2021). The mixed-effect models were fitted with the package lme4 and the optimization to estimate is performed using the optim function in R. Altogether, this simulation experiment is examining different scenarios: three missing data scenarios, four different dimensions for the baseline features, and four angles between and . 200 repetitions were conducted for each scenario.

5.5 Simulation results

In order to compare the different TDRs, the proportion of correct decisions (PCD) of the test sample that is correctly classified to the optimal treatment is computed following (

13). That is, each subject in the test sample is simulated with known labels of group assignments. A correct decision is defined as the match between the estimated and true group assignment.

Figure 4: Plots of the proportion of correct decisions (PCD) of the treatment decision rules obtained from 200 training data sets fro each of the four methods. Each panel corresponds to one of the combinations of

and no missing data, MCAR, or Dropout. The dots represent the mean PCD estimated across the 200 repetitions and the bars represent the 1.96 standard deviations.

Figure 4: Proportion of Correct Decisions (PCD)

Figure 4 summarizes the results for all simulation scenarios. From left to right, the columns present the scenarios of no missing data, MCAR, and dropout. The panels across the rows show the PCD estimations under different number of covariates, . The mean PCDs with 1.96 standard deviations are plotted. From left to the right, the bars present the PCD estimated using the actual , the estimated longitudinal single-index model (LS-KLD), SIMML, linGEM and outcome weighted learning with Gaussian kernal (OWL-gaussian). For each panel, the -axis gives the PCD while the -axis represents the angle between and . When , all methods have PCD around across all scenarios as expected, i.e. the methods assign subjects to group 1 or group 2 essentially by chance (which is expected with ). For each panel, as increases (left to right), the separation of the groups increases and all methods perform better instead of OWL-gaussian. The performances in terms of different can be explained through the illustration in Figure 2. When and the ellipses shift in the same direction in which case the biosignature is not able to separate the groups; the opposite extreme is when corresponding to maximizing the separation between the treatment group distributions where the two distributions move in opposite directions as the biosignature varies. Across all the scenarios, the LS-KLD model has similar performance as using the true to estimate the PCD. When there is no missing data, the TDRs estimated by LS-KLD, SIMML, and linGEM perform similarly when the number of covariates is small (e.g. ). The OWL-gaussian is outperformed by the other three methods. When gets larger, the LS-KLD shows larger PCD values compared to SIMML and linGEM, especially when or . On the other hand, when there is missing data, the LS-KLD method performs much better than the other approaches, in terms of higher PCDs. The outcome weighted learning approach has bad performance across all scenarios. This may be caused by the weakness of this algorithm. The outcome weighted learning, which optimizes treatment decisions based on weighted support vector machine (SVM), is not robust against a perturbation of the clinical outcome. For example, it could break down by a simple shift of a clinical outcome (Zhou et al., 2017).

In addition to , the speed of separation of the two distributions is also determined by the norms of and . By applying the proposed optimization procedure for LS-KLD, estimates are obtained that maximally separate the treatment groups for an optimally identified biosignature . The resulting TDR then provides high confidence in assigning subjects to treatment groups among subjects who have a relatively high magnitude of .

6 Application to the EMBARC study

To illustrate the performance of the LS-KLD method to identify a linear combination of baseline covariates to define a TDR in the EMBARC study (see Section 2), 10 baseline covariates from the the Demographic, Clinical Measurement and Behavioral Phenotyping modality are used. The outcomes are the HDRS measured at week 0 (baseline), and weeks 1, 2, 3, 4, 6, and 8. Of 160 the patients who had data on these covariates, 87 were randomized to placebo and 73 to the antidepressant group – Table 1

summarizes these varaiables by treatment group. Summary statistics for the active treatment and placebo arms (e.g. mean and standard deviations for continuous covariates; counts and percentages for categorical covariates) are shown in the second and third columns, respectively. Linear regression models with the change scores (improvement in HDRS from week 0 to last week) as the outcome variable and the group indicator, the baseline covariate and their interaction as the predictor variables were fitted for each baseline covariate. The significance of the interaction terms were investigated using a likelihood ratio test. In Table

1, the -values of LRTs are summarized in column 4. Treatment had significant interaction effects for two baseline factors, “Age at Evaluation" and “Flanker Accuracy".

Mean (SD) / N () IPWE Value
Baseline Covariates Treatment Placebo value Linear Nonpar.
Age at Evaluation 37.44 (15.01) 38.30 (13.03) 0.006 5.71 5.13
Anger attack 3.08 (2.16) 2.97 (2.07) 0.839 6.66 7.23
Axis II 3.92 (1.40) 3.91 (1.50) 0.410 6.56 7.20
Chronicity 0.219 7.08 7.08
Yes 39 (53.4) 47 (54.0)
No 34 (46.6) 40 (46.0)
Hypersomnia 0.667 6.33 6.33
Yes 17 (23.3) 16 (18.4)
No 56 (76.7) 71 (81.6)
Sex 0.921 6.36 6.36
Female 26 (35.6) 32 (36.8)
Male 47 (64.4) 55 (63.2)
Number of correct responses 0.05 (0.82) 0.23 (0.71) 0.309 6.46 6.55
Median Reaction time 0.51 (1.78) 0.01 (1.10) 0.649 6.60 6.27
Flanker Reaction Time 61.36 (28.61) 59.52 (24.40) 0.075 5.65 5.77
Flanker Accuracy 0.22 (0.15) 0.22 (0.15) 0.005 5.44 6.00
  • Table 1: The second and third columns summarize the mean (SD) for continuous variables and the counts (

    ) for categorical variables in each group. “Number of correct responses" represents the number of correct responses in the “A not B” Working memory task. The

    -values for the interaction effects between outcomes and each covariate are provided in the fourth column. LRTs were used to calculate the values. The IPWE values obtained by fitting linear regressions or B-spline regressions are shown in the last two columns.

Table 1: Demographic Table.

To highlight the interaction effects of age at evaluation and flanker accuracy with treatment from Table 1, Figure 5

shows cubic B-spline regressions (with five degrees of freedom) fitted separately for the placebo (red dashed lines) and antidepressant treatment (blue solid lines) groups, with the 95

confidence bounds (represented as the red and blue areas in the plots). Despite evidence of the interactions between the curves in Figure 5

, the corresponding confidence intervals overlap substantially indicating that these two covariates individually have limited treatment-modifying effects and TDRs based on these variables individually may perform poorly.

Figure 5: The scatter plots present the relationship between the outcome improvement and the baseline covariates, “Age at Evaluation" (left panel) and “Flanker Accuracy" (right panel). The placebo and active treatment groups are shown by the red round and blue triangle points, respectively. The data points are overlaid with treatment-specific spline approximation spline regression curves with 5 degrees of freedom. The red dashed curves represent the placebo group, whereas the blue solid curves represent the active treatment group.
Figure 5: Treatment-specific Spline Approximated Regression Curves

In order to compare the performance of different TDRs, the value of the TDR will be used. The value of a TDR is defined to be the average outcome when treatments are assigned based on the TDR. We shall evaluate the effectiveness of a TDR using an empirical estimate of the value defined as the inverse probability weighted estimator (IPWE)  (Murphy, 2005):


where is the observed treatment assignment and is the treatment assignment based on the estimated TDR (1 = active treatment, 2 = placebo); is the change score, i.e., the difference between the first and last HDRS outcomes. Change score is being used here to compare TDRs since this is a measure that is extractable from all methods under consideration. Larger values of are indicative of an effective treatment (more improvement from baseline).

The treatment modifying effect for each baseline characteristic was evaluated with the IPWE (14). Simple linear regressions and -spline regressions of the HDRS improvement on each covariate were obtained separately in the antidepressant and the placebo groups. A 10-fold cross-validation (CV) was conducted to assess the TDR performances and then this 10-fold CV was repeated for 100 random splits of the data and then averaged over all random spits. The CV from applying linear regressions (Linear) and B-spline regressions (Nonpar.) are summarized in the fifth and sixth column in Table 1. The IPWEs estimated with the -spline regressions have slightly larger values than the linear regression. Additionally, an all-drug (i.e., a “one-size-fits-all”) policy was also applied based on the decision rule to give everyone the active treatment resulting in an IPWE . Thus, the policy to treat everyone with the active drug performs better than either regression approach (linear or -spline) when considering a single predictor variable. Although the two baseline covariates, “Age at Evaluation" and “Flanker Accuracy", have significant interaction effects with the treatment, they perform poorly in terms of informing an optimal treatment assignment.

Since no single predictor appears to be strong effect modifier by itself, results for the single-index models with a combination of all the baseline covariates were examined based on the change score. The four TDRs used in the simulation study in Section 5 were considered: LS-KLD, SIMML, linGEM and OWL-gaussian. These TDRs were compared using 10-fold CV with 100 random splits as was done for models with a single predictor. Results are also presented based on assigning all subjects to the active treatment group (AllDrug) or assigning all subjects to placebo (AllPbo).

Figure 6: Box plots of the estimated IPWE values across the 1000 splits (higher values are preferred). The mean and standard deviations for each method are: LS-KLD: 8.41 (1.69); SIMML: 7.94 (1.48): LinGEM: 7.87 (1.58); OWL-gaussian: 7.84 (1.67); AllDrug: 7.47 (1.7); AllPbo 6.00 (1.52) .
Figure 6: Evaluation of the IPWE

The IPWEs for the TDRs are summarized in Figure 6. The purple horizontal dashed line shows the median IPWE value across the 1000 repetitions (10-fold times 100 random splits) when all patients are assigned to the treatment group, while the red dashed line represents the median IPWE value of the LS-KLD. All single-index models have higher median IPWE values than the “one-size-fits-all’ approaches (AllDrug, AllPbo), implying that a treatment decision rule based on an individual’s biosignature can achieve better performance than the policy of administering the active treatment to everyone. The IPWE estimations of SIMML, LinGEM and OWL-gaussian are fairly similar, but their performance is worse than the LS-KLD approach.

The single index was calculated for each subject with the coefficients estimated from the three single-index models. The relationship between the outcome improvement and the single-index values are shown in Figure 7. Red round and blue triangle points represent the outcomes from the placebo and active treatment groups respectively. Cubic -spline regressions were fitted separately for the placebo (red dashed curves) and active treatment (blue solid curves) groups, with the 95 confidence bands represented as red and blue areas in the plot.

Figure 7: The scatter plots present the relationship between the outcome improvement and the estimated biosignatures by each single index model. The placebo and treatment groups are shown by the red and blue points, respectively. The data points are overlaid with treatment-specific spline approximation regression curves with 5 degrees of freedom. The red curves represent the placebo group, whereas the blue curves represent the treatment group.
Figure 7: Approximated Spline Regression with Biosignatures

6.1 Additional TDR Comparisons

The above results show that the LS-KLD approach shows the best in terms performed very well in terms of IPWE with the 10 baseline covariates that were selected for the illustration. However, the performances of the TDRs depend on the choice of the predictors included in the model. The results for different TDRs can vary substantially depending on which covariates are used in the TDR. In order to obtain a more robust picture of how different TDRs perform generally, we examined TDR performance for 500 models with various combinations of baseline covariates where the 500 models include random combinations of predictors with varying numbers of predictors. The demographic, clinical, and behavior phenotyping data sets contained 21 covariates at the baseline. For the th combination, the number of covariates

was chosen at random from a uniform distribution on the set

. Then covariates were randomly selected from the demographic, clinical, and behavior phenotyping covariates set. Since the “Age at Evaluation" and “Flanker Accuracy" have significant interactions with the outcome, we ensured each covariate combination for chosen models contain these two covariates, as well as the “Flanker Reaction Time", which is an important measure in clinical practice  (Dillon et al., 2015).

100 repetitions of 10-fold CV were conducted and the mean IPWE values were calculated across the 1000 training sets. Density plots of the IPWEs estimated by TDR approaches (LS-KLD, SIMML, linGEM, OWL-gaussian) across the 500 baseline covariate combinations are shown in Figure 8. By simply allocating all individuals to the active treatment group (AllDrug), the IPWE value (shown as the vertical dashed line). The density plots of LS-KLD, SIMML, linGEM, and OWL-gaussian are represented by the pink, yellow, green and blue curves, respectively. OWL-gaussian performs worst with mean IPWE , and a relatively large standard deviation 0.21. The SIMML and linGEM perform similarly in terms of IPWE estimation. For SIMML, the mean (SD) IPWE value across the 500 combinations is 7.38 (0.33), while for the linGEM model, it is 7.32 (0.35). SIMML has slightly better performance than the linGEM, in terms of a greater mean value of IPWE and a smaller standard deviation. However, those three mean values are smaller than the IPWE calculated by simply assigning all subjects to the active treatment group: SIMML and linGEM only perform better than the policy of AllDrug 35 and 28 times , respectively. On the other hand, the mean IPWE is 7.64 and 82 combinations have better performance than Alldrug; additionally, the IPWE values estimated by LS-KLD have much smaller standard deviation (0.18) compared with the other two methods.

Note that in this analysis, there are several examples of combinations of variables where the TDRs based on a single-index perform worse than the AllDrug policy. This is expected because some of the models may include variables that have negligible moderator effects and no combination of these variables may yield a strong TDR. The LS-KLD method, that takes into account the longitudinal structure of the data, performs considerably better compared to SIMML, linGEM, and OWL-gaussian and also performs better than the AllDrug policy in the majority of scenarios. Besides, we included “Age at Evaluation", “Flanker Reaction Time", and “Flanker Accuracy" in all combinations. Similar results are obtained without this restriction on containing these 3 covariates.

Figure 8: The density plots of the mean CV IPWE values across 500 combination of baseline covariates. The vertical dashed line represents the IPWE value by assigning all subjects to the drug group. The mean (SD) of each method is: LS-KLD: 7.64 (0.18); SIMML: 7.38 (0.33); LinGEM: 7.32 (0.35) ; OWL-gaussian: 7.30 (0.21).
Figure 8: Density Plots of the IPWEs

7 Discussion

In randomized clinical trials, patient heterogeneity is often a major concern, rendering a universal treatment strategy ineffective. Another significant issue in medical research is the problem of missing data. Most randomized trials are longitudinal in nature but most precision medicine approaches to develop TDRs ignore the longitudinal structure of data. The TDR approach developed and illustrated in this paper is based on the longitudinal structure of the data which, in addition to making more robust use of the data, also allows for a more powerful approach to deal with missing data compared to other popular TDRs.

Another feature of the LS-KLD treatment decision rule developed in this paper is that the optimal single-index biosignature (i.e., linear combination of baseline features) is determined based on the average tangent slope which is an overall measure of improvement (or deterioration) across the longitudinal trajectory. The estimator of the average tangent slope is more efficient and robust to missing data (e.g. MCAR) than other scalar summary measures (e.g. score change between last and first observation). Hence, the LS-KLD method has better performance, e.g., a higher proportion of correct decisions, when there is missing data.

There are several avenues for future research on the use of a Kullback-Leibler divergence to define treatment decision rules. For example, future work will focus on developing efficient and faster algorithms for implementing the LS-KLD TDR with variable selection. Extensions will also investigate the use of more flexible nonparametric link functions, i.e., consider a smooth function of the biosignature in (1), as is done in the SIMML model (Park et al., 2020) and incorporating functional predictors into the TDR (e.g., Ciarleglio et al., 2015; Park et al., 2021). Further investigations into the impact of missing data under different missing data mechanisms (e.g., missing not a random) is currently being studied.


  • Kosorok and Laber [2019] Michael R Kosorok and Eric B Laber. Precision medicine. Annual review of statistics and its application, 6:263–286, 2019.
  • Dagogo-Jack and Shaw [2018] Ibiayi Dagogo-Jack and Alice T Shaw. Tumour heterogeneity and resistance to cancer therapies. Nature reviews Clinical oncology, 15(2):81–94, 2018.
  • Dienstmann et al. [2012] Rodrigo Dienstmann, Danila Serpico, Jordi Rodon, Cristina Saura, Teresa Macarulla, Elena Elez, Maria Alsina, Jaume Capdevila, Jose Perez-Garcia, Gessamí Sánchez-Ollé, et al. Molecular profiling of patients with colorectal cancer and matched targeted therapy in phase i clinical trials. Molecular cancer therapeutics, 11(9):2062–2071, 2012.
  • De Mattos-Arruda et al. [2013] L De Mattos-Arruda, M Oliveira, A Navarro, M Vilaro, P Nuciforo, A Vivancos, J Seoane, J Rodon, J Cortes, and C Saura. Molecular profiling of advanced breast cancer patients and benefit obtained from matched targeted therapy in early phase clinical trials. In EUROPEAN JOURNAL OF CANCER, volume 49, pages S418–S418. ELSEVIER SCI LTD THE BOULEVARD, LANGFORD LANE, KIDLINGTON, OXFORD OX5 1GB …, 2013.
  • Trivedi et al. [2006] Madhukar H Trivedi, A John Rush, Stephen R Wisniewski, Andrew A Nierenberg, Diane Warden, Louise Ritz, Grayson Norquist, Robert H Howland, Barry Lebowitz, Patrick J McGrath, et al. Evaluation of outcomes with citalopram for depression using measurement-based care in STAR-D: implications for clinical practice. American journal of Psychiatry, 163(1):28–40, 2006.
  • Petkova et al. [2017a] Eva Petkova, Thaddeus Tarpey, Zhe Su, and R Todd Ogden. Generated effect modifiers (gem’s) in randomized clinical trials. Biostatistics, 18(1):105–118, 2017a.
  • Watkins [1989] Christopher John Cornish Hellaby Watkins. Learning from Delayed Rewards. PhD thesis, King’s College, Cambridge, UK, May 1989. URL :http://www.cs.rhul.ac.uk/$$̃chrisw/new_thesis.pdf.
  • Murphy et al. [2007] Susan A Murphy, David W Oslin, A John Rush, and Ji Zhu. Methodological challenges in constructing effective treatment sequences for chronic psychiatric disorders. Neuropsychopharmacology, 32(2):257–262, 2007.
  • Zhao et al. [2009] Yufan Zhao, Michael R Kosorok, and Donglin Zeng. Reinforcement learning design for cancer clinical trials. Statistics in medicine, 28(26):3294–3315, 2009.
  • Murphy [2003] Susan A Murphy. Optimal dynamic treatment regimes. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65(2):331–355, 2003.
  • Schulte et al. [2014] Phillip J Schulte, Anastasios A Tsiatis, Eric B Laber, and Marie Davidian. Q-and A-learning methods for estimating optimal dynamic treatment regimes. Statistical science: a review journal of the Institute of Mathematical Statistics, 29(4):640, 2014.
  • Zhao et al. [2012] Yingqi Zhao, Donglin Zeng, A John Rush, and Michael R Kosorok. Estimating individualized treatment rules using outcome weighted learning. Journal of the American Statistical Association, 107(499):1106–1118, 2012.
  • Uddin et al. [2019] Mohammed Uddin, Yujiang Wang, and Marc Woodbury-Smith. Artificial intelligence for precision medicine in neurodevelopmental disorders. NPJ digital medicine, 2(1):1–10, 2019.
  • Petkova et al. [2020] Eva Petkova, Hyung Park, Adam Ciarleglio, R Todd Ogden, and Thaddeus Tarpey. Optimising treatment decision rules through generated effect modifiers a precision medicine tutorial. BJPsych open, 6(1), 2020.
  • Park et al. [2020] Hyung Park, Eva Petkova, Thaddeus Tarpey, and R Todd Ogden. A single-index model with multiple-links. Journal of Statistical Planning and Inference, 205:115–128, 2020.
  • Tian et al. [2014] Lu Tian, Ash A Alizadeh, Andrew J Gentles, and Robert Tibshirani. A simple method for estimating interactions between a treatment and a large number of covariates. Journal of the American Statistical Association, 109(508):1517–1532, 2014.
  • Henderson et al. [2010] Robin Henderson, Phil Ansell, and Deyadeen Alshibani. Regret-regression for optimal dynamic treatment regimes. Biometrics, 66(4):1192–1201, 2010.
  • Richards [2011] Derek Richards. Prevalence and clinical course of depression: a review. Clinical psychology review, 31(7):1117–1125, 2011.
  • Dinoff et al. [2020] Adam Dinoff, Sean T Lynch, Nitin Sekhri, and Lidia Klepacz. A meta-analysis of the potential antidepressant effects of buprenorphine versus placebo as an adjunctive pharmacotherapy for treatment-resistant depression. Journal of Affective Disorders, 2020.
  • Khan et al. [2012] Arif Khan, James Faucett, Pesach Lichtenberg, Irving Kirsch, and Walter A Brown. A systematic review of comparative efficacy of treatments and controls for depression. PloS one, 7(7), 2012.
  • Trivedi et al. [2016] Madhukar H Trivedi, Patrick J McGrath, Maurizio Fava, Ramin V Parsey, Benji T Kurian, Mary L Phillips, Maria A Oquendo, Gerard Bruder, Diego Pizzagalli, Marisa Toups, et al. Establishing moderators and biosignatures of antidepressant response in clinical care (embarc): Rationale and design. Journal of psychiatric research, 78:11–23, 2016.
  • Petkova et al. [2017b] Eva Petkova, R Todd Ogden, Thaddeus Tarpey, Adam Ciarleglio, Bei Jiang, Zhe Su, Thomas Carmody, Philip Adams, Helena C Kraemer, Bruce D Grannemann, et al. Statistical analysis plan for stage 1 embarc (establishing moderators and biosignatures of antidepressant response for clinical care) study. Contemporary clinical trials communications, 6:22–30, 2017b.
  • Nelder and Mead [1965] John A Nelder and Roger Mead. A simplex method for function minimization. The computer journal, 7(4):308–313, 1965.
  • Tarpey et al. [2021] Thaddeus Tarpey, Eva Petkova, Adam Ciarleglio, and Robert Todd Ogden. Extracting scalar measures from functional data with applications to placebo response. Statistics and Its Interface, 14(3):255–265, 2021.
  • R Core Team [2021] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2021. URL https://www.R-project.org/.
  • Zhou et al. [2017] Xin Zhou, Nicole Mayer-Hamblett, Umer Khan, and Michael R Kosorok. Residual weighted learning for estimating individualized treatment rules. Journal of the American Statistical Association, 112(517):169–187, 2017.
  • Murphy [2005] Susan A Murphy. A generalization error for q-learning. Journal of Machine Learning Research, 6(Jul):1073–1097, 2005.
  • Dillon et al. [2015] Daniel G Dillon, Thomas Wiecki, Pia Pechtel, Christian Webb, Franziska Goer, Laura Murray, Madhukar Trivedi, Maurizio Fava, Patrick J McGrath, Myrna Weissman, et al. A computational analysis of flanker interference in depression. Psychological medicine, 45(11):2333–2344, 2015.
  • Ciarleglio et al. [2015] A. Ciarleglio, E. Petkova, R. T. Ogden, and T. Tarpey. Treatment decisions based on scalar and functional baseline covariates. Biometrics, 71:884–894, JUNE 2015.
  • Park et al. [2021] Hyung Park, Eva Petkova, Thaddeus Tarpey, and R. Todd Ogden. Functional additive models for optimizing individualized treatment rules. Biometrics, 2021. in press.