Estimation and Optimization of Composite Outcomes

11/28/2017 ∙ by Daniel J. Luckett, et al.

There is tremendous interest in precision medicine as a means to improve patient outcomes by tailoring treatment to individual characteristics. An individualized treatment rule formalizes precision medicine as a map from patient information to a recommended treatment. A regime is defined to be optimal if it maximizes the mean of a scalar outcome in a population of interest, e.g., symptom reduction. However, clinical and intervention scientists often must balance multiple and possibly competing outcomes, e.g., symptom reduction and the risk of an adverse event. One approach to precision medicine in this setting is to elicit a composite outcome which balances all competing outcomes; unfortunately, eliciting a composite outcome directly from patients is difficult without a high-quality instrument and an expert-derived composite outcome may not account for heterogeneity in patient preferences. We consider estimation of composite outcomes using observational data under the assumption that clinicians are approximately (i.e., imperfectly) making decisions to maximize individual patient utility. Estimated composite outcomes are subsequently used to construct an estimator of an individualized treatment rule that maximizes the mean of patient-specific composite outcomes. Furthermore, the estimated composite outcomes and estimated optimal individualized treatment rule can provide new insights into patient preference heterogeneity, clinician behavior, and the value of precision medicine in a given domain. We prove that the proposed estimators are consistent under mild conditions and demonstrate their finite sample performance through a suite of simulation experiments and an illustrative application to data from a study of bipolar depression.


1 Introduction

Precision medicine is an approach to healthcare that involves tailoring treatment based on individual patient characteristics (Hamburg and Collins, 2010; Collins and Varmus, 2015). Accounting for heterogeneity by tailoring treatment has the potential to improve patient outcomes in many therapeutic areas. An individualized treatment rule formalizes precision medicine as a map from the space of patient covariates into the space of allowable treatments (Murphy, 2003; Robins, 2004). Almost all methods for estimating individualized treatment rules have been designed to optimize a scalar outcome (exceptions will be discussed shortly). However, in practice, clinical decision making often requires balancing trade-offs between multiple outcomes. For example, clinicians treating patients with bipolar disorder must manage both depression and mania. Antidepressants may help correct depressive episodes but may also induce manic episodes (Sachs et al., 2007; Ghaemi, 2008; Goldberg, 2008). We propose a novel method for using observational data to estimate a composite outcome and the associated optimal individualized treatment rule.

The estimation of optimal individualized treatment rules has been studied extensively, leading to a wide range of estimators. These estimators include: regression-based methods like Q-learning (Murphy, 2005; Qian and Murphy, 2011; Schulte et al., 2014; Laber et al., 2014), A-learning (Murphy, 2003; Robins, 2004; Blatt et al., 2004; Moodie et al., 2007; Wallace and Moodie, 2015), and regret regression (Henderson et al., 2010); direct-search methods (Zhang et al., 2012; Zhao et al., 2012; Zhang et al., 2013; Zhou et al., 2017) based on inverse probability weighting (Robins, 1999; Murphy et al., 2001; van der Laan and Petersen, 2007; Robins et al., 2008); and hybrid methods (Taylor et al., 2015; Zhang et al., 2017). The preceding methods require specification of a single scalar outcome that will be used to define an optimal regime; were individual patient utilities known, they could be used as the outcome in any of these methods. However, in general such utilities are not known, though they can be elicited provided a high-quality instrument is available (Butler et al., 2017); in the absence of such an instrument, preference elicitation is difficult to apply.

We propose a new paradigm for estimating optimal individualized treatment rules from observational data without eliciting patient preferences. The key premise is that clinicians are attempting to act optimally with respect to each patient’s utility and thus the observed treatment decisions contain information about individual patient utilities. We construct estimators of individual patient utilities which do not require that clinicians are acting optimally, only that they approximately follow an optimal policy. This approach allows us to describe the goals of the decision maker and how these goals vary across patients, determine what makes a patient more or less likely to be treated optimally under standard care, and estimate the decision rule which optimizes patient-specific composite outcomes. We develop this approach in the context of a single-stage, binary decision in the presence of two outcomes.

Other methods for estimating individualized treatment rules in the presence of multiple outcomes include using an expert-derived composite outcome for all patients (Thall et al., 2002; Thall et al., 2007). However, this does not account for differences in the utility function across patients and in some cases it may not be possible to elicit a high-quality composite outcome from an expert. Alternatively, multiple outcomes can be incorporated using set-valued treatment regimes (Laber et al., 2014; Lizotte and Laber, 2016; Wu, 2016), constrained optimization (Linn et al., 2015; Laber et al., 2016), or inverse preference elicitation (Lizotte et al., 2012). Schnell et al. (2017) extend methods for estimating the benefiting subgroup to the case of multiple outcomes using the concept of admissibility (see also Schnell et al., 2016). However, none of these approaches provide a method for estimating an individual patient’s utility.

This work is closely related to inverse reinforcement learning (Kalman, 1964; Ng et al., 2000; Abbeel and Ng, 2004; Ratliff et al., 2006), which involves studying decisions made by an expert and constructing the utility function that is optimized by the expert’s decisions. Inverse reinforcement learning has been successfully applied in navigation (Ziebart et al., 2008) and human locomotion (Mombaur et al., 2009). Inverse reinforcement learning methods assume that decisions are made in a single environment. However, in the context of precision medicine, both the utility function and the probability of optimal treatment may vary across patients. Our approach is a version of inverse reinforcement learning with multiple environments.

In Section 2, we introduce a pseudo-likelihood method to estimate patient utility functions from observational data. In Section 3, we state a number of theoretical results pertaining to the proposed method, including consistency and inference for the maximum pseudo-likelihood estimators. Section 4 presents a series of simulation experiments used to evaluate the finite sample performance of the proposed methods. Section 5 presents an illustrative application using data from the STEP-BD bipolar disorder study. Conclusions and a discussion of future research are given in Section 6. Proofs are given in the appendix.

2 Pseudo-likelihood estimation of utility functions

Assume the available data are (X_i, A_i, Y_i, Z_i), i = 1, ..., n, which comprise independent and identically distributed copies of (X, A, Y, Z), where X ∈ R^p denotes patient covariates, A ∈ {-1, 1} is a binary treatment, and Y and Z are two real-valued outcomes for which higher values are more desirable. An individualized treatment rule is a function d: R^p → {-1, 1} such that under d a patient presenting with covariates X = x will be assigned to treatment d(x). Let Y*(a) and Z*(a) denote the potential outcomes under treatment a ∈ {-1, 1}, and for any regime d define Y*(d) = Y*(1)1{d(X) = 1} + Y*(-1)1{d(X) = -1}, with Z*(d) defined analogously. An optimal regime for the outcome Y, say d_Y^opt, satisfies E{Y*(d_Y^opt)} ≥ E{Y*(d)} for any other regime d. The optimal regime for the outcome Z, say d_Z^opt, is defined analogously. In order to identify these optimal regimes, and subsequently to identify the optimal regime across the class of utility functions introduced below, we make the following assumptions.

Assumption 1.

Consistency: Y = Y*(A) and Z = Z*(A).

Assumption 2.

Positivity: P(A = a | X = x) ≥ c for some constant c > 0 and all pairs (x, a).

Assumption 3.

Ignorability: {Y*(-1), Y*(1)} is conditionally independent of A given X, and {Z*(-1), Z*(1)} is conditionally independent of A given X.

These assumptions are standard in causal inference (Robins, 2004; Hernan and Robins, 2010). Assumption 3 is not empirically verifiable in observational studies (Rosenbaum and Rubin, 1983; Rosenbaum, 1984).

Define Q_Y(x, a) = E(Y | X = x, A = a); then, under the preceding assumptions, it can be shown that d_Y^opt(x) = arg max_{a ∈ {-1, 1}} Q_Y(x, a) (Zhang et al., 2012). Similarly, it follows that d_Z^opt(x) = arg max_{a ∈ {-1, 1}} Q_Z(x, a), where Q_Z(x, a) = E(Z | X = x, A = a). In general, d_Y^opt need not equal d_Z^opt; therefore, if both Y and Z are clinically relevant, neither d_Y^opt nor d_Z^opt may be acceptable. We assume that there exists an unknown and possibly covariate-dependent utility u(y, z; x), where u(y, z; x) measures the “goodness” of the outcome pair (y, z) for a patient with covariates x. The optimal regime with respect to u, say d_u^opt, satisfies E[u{Y*(d_u^opt), Z*(d_u^opt); X}] ≥ E[u{Y*(d), Z*(d); X}] for any other regime d. The goal is to use the observed data to estimate the utility u and subsequently d_u^opt. Define U = u(Y, Z; X). For the class of utility functions we consider below, U is a (possibly covariate-dependent) convex combination of Y and Z and is therefore identifiable under the stated causal assumptions; furthermore, d_u^opt(x) = arg max_{a ∈ {-1, 1}} E(U | X = x, A = a).
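To make this last step concrete, here is the one-line derivation under the convex-combination form u(y, z; x) = ω(x)y + {1 − ω(x)}z used in Sections 2.1 and 2.2 (with ω(x) ≡ ω constant in Section 2.1); the display is a sketch under that assumed form:

\[
E\{u(Y, Z; X) \mid X = x, A = a\} = \omega(x)\, Q_Y(x, a) + \{1 - \omega(x)\}\, Q_Z(x, a),
\qquad
d_u^{\mathrm{opt}}(x) = \operatorname*{arg\,max}_{a \in \{-1, 1\}} \bigl[ \omega(x)\, Q_Y(x, a) + \{1 - \omega(x)\}\, Q_Z(x, a) \bigr].
\]

Thus the optimal rule depends on the data-generating distribution only through the regression functions Q_Y and Q_Z, which are identifiable under Assumptions 1-3.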

We assume that clinicians act with the goal of optimizing each patient's utility and that their success in identifying the optimal treatment depends on individual patient characteristics. Thus, clinicians are approximately, i.e., imperfectly, assigning treatment according to d_u^opt. If the clinician were always able to correctly identify the optimal treatment and assign A_i = d_u^opt(X_i) for each patient, there would be no need to estimate the optimal treatment policy (Wallace et al., 2016). Instead, we assume that the decisions of the clinician are imperfect and that P{A = d_u^opt(X) | X = x} = expit(x^T θ), where θ ∈ R^p is an unknown parameter and expit(t) = e^t / (1 + e^t). We implicitly assume throughout that X may contain higher order terms, interactions, or basis functions constructed from the covariates.

2.1 Fixed utility

We begin by assuming that the utility function is constant across patients and takes the form u(y, z; ω) = ωy + (1 − ω)z for some ω ∈ [0, 1]. Lemma 1 of Butler et al. (2017) states that, for a broad class of utility functions, the optimal individualized treatment rule is equivalent to the optimal rule for a utility function of this form. Define Q_ω(x, a) = ωQ_Y(x, a) + (1 − ω)Q_Z(x, a) and define d_ω^opt(x) = arg max_{a ∈ {-1, 1}} Q_ω(x, a). Let Q̂_Y and Q̂_Z denote estimators of Q_Y and Q_Z obtained from regression models fit to the observed data (Qian and Murphy, 2011). For a fixed value of ω, let Q̂_ω(x, a) = ωQ̂_Y(x, a) + (1 − ω)Q̂_Z(x, a) and subsequently let d̂_ω(x) = arg max_{a ∈ {-1, 1}} Q̂_ω(x, a) be the plug-in estimator of d_ω^opt. Given Q̂_Y and Q̂_Z, d̂_ω(X_i) can be computed for each i = 1, ..., n.

The joint distribution of (X, A) is

p(x, a; ω, θ) = expit(x^T θ)^{1{a = d_ω^opt(x)}} {1 − expit(x^T θ)}^{1{a ≠ d_ω^opt(x)}} f_X(x),

where f_X denotes the marginal density of X. Assuming that f_X and the conditional distribution of (Y, Z) given (X, A) do not depend on ω or θ, the likelihood for (ω, θ) is

L_n(ω, θ) = ∏_{i=1}^n expit(X_i^T θ)^{1{A_i = d_ω^opt(X_i)}} {1 − expit(X_i^T θ)}^{1{A_i ≠ d_ω^opt(X_i)}},   (1)

which depends on the unknown function d_ω^opt. Plugging in d̂_ω for d_ω^opt in (1) yields the pseudo-likelihood

PL_n(ω, θ) = ∏_{i=1}^n expit(X_i^T θ)^{1{A_i = d̂_ω(X_i)}} {1 − expit(X_i^T θ)}^{1{A_i ≠ d̂_ω(X_i)}}.   (2)

If we let ω̂ and θ̂ denote the maximum pseudo-likelihood estimators obtained by maximizing (2), then an estimator of the utility function is u(y, z; ω̂) = ω̂y + (1 − ω̂)z, and expit(x^T θ̂) is an estimator of the probability that a patient presenting with covariates x would be treated optimally under standard care. An estimator of the optimal policy at x is d̂_{ω̂}(x).

Because the pseudo-likelihood given in (2) is non-smooth in ω, standard gradient-based optimization algorithms cannot be used. However, for a given ω, it is straightforward to compute the profile estimator θ̂_ω = arg max_θ PL_n(ω, θ). We can compute the profile pseudo-likelihood estimator over a grid of values for ω and select the point on the grid yielding the largest pseudo-likelihood. The algorithm to construct (ω̂, θ̂) is as follows.

  1. Set a grid ω_1, ..., ω_M spanning [0, 1]

  2. For each m = 1, ..., M: compute θ̂_{ω_m} = arg max_θ PL_n(ω_m, θ)

  3. Select ω̂ = arg max_{ω_m} PL_n(ω_m, θ̂_{ω_m})

  4. Set θ̂ = θ̂_{ω̂}.

Step (2) can be accomplished using logistic regression of the indicator 1{A_i = d̂_{ω_m}(X_i)} on X_i; a minimal implementation sketch is given below. The theoretical properties of this estimator are discussed in Section 3.
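The following is a minimal R sketch of the grid search; the linear working models for Q_Y and Q_Z, the variable names (x1, x2, a coded as -1/1, y, z), and the grid resolution are illustrative assumptions rather than choices taken from this paper.

  # Minimal sketch of the grid-search profile pseudo-likelihood estimator.
  estimate_fixed_utility <- function(dat, grid = seq(0, 1, by = 0.01)) {
    # Working models for Q_Y(x, a) and Q_Z(x, a) with treatment interactions
    fitY <- lm(y ~ (x1 + x2) * a, data = dat)
    fitZ <- lm(z ~ (x1 + x2) * a, data = dat)
    predict_at <- function(fit, a_val) {
      nd <- dat
      nd$a <- a_val
      predict(fit, newdata = nd)
    }
    qY1 <- predict_at(fitY, 1); qYm1 <- predict_at(fitY, -1)
    qZ1 <- predict_at(fitZ, 1); qZm1 <- predict_at(fitZ, -1)
    best <- list(logPL = -Inf)
    for (w in grid) {
      # Plug-in rule: argmax over a of w * Q_Y(x, a) + (1 - w) * Q_Z(x, a)
      d_w <- ifelse(w * (qY1 - qYm1) + (1 - w) * (qZ1 - qZm1) >= 0, 1, -1)
      # Profile step: logistic regression of 1{A = d_w(X)} on X
      dat$agree <- as.numeric(dat$a == d_w)
      fit_theta <- glm(agree ~ x1 + x2, family = binomial, data = dat)
      logPL <- as.numeric(logLik(fit_theta))  # maximized log pseudo-likelihood at w
      if (logPL > best$logPL) {
        best <- list(logPL = logPL, omega = w, theta = coef(fit_theta))
      }
    }
    best  # best$omega maximizes the profile pseudo-likelihood over the grid
  }

Because best$theta includes an intercept, the estimated probability that a patient with covariates (x1, x2) is treated optimally under standard care is plogis(best$theta[1] + best$theta[2] * x1 + best$theta[3] * x2).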

2.2 Patient-specific utility

Outcome preferences can vary widely across patients in some application domains, including schizophrenia (Kinter, 2009; Strauss et al., 2010) and pain management (Gan et al., 2004). To accommodate this setting, we assume that the utility function takes the form u(y, z; x) = ω(x)y + {1 − ω(x)}z, where ω(x) is a smooth function of the covariates. For illustration, we let ω(x) = expit(x^T β), where β ∈ R^p is an unknown parameter. Define Q_β(x, a) = ω(x)Q_Y(x, a) + {1 − ω(x)}Q_Z(x, a) and define d_β^opt(x) = arg max_{a ∈ {-1, 1}} Q_β(x, a). Let Q̂_Y and Q̂_Z denote estimators of Q_Y and Q_Z obtained from regression models fit to the observed data. For a fixed value of β, let Q̂_β(x, a) = expit(x^T β)Q̂_Y(x, a) + {1 − expit(x^T β)}Q̂_Z(x, a) and subsequently let d̂_β(x) = arg max_{a ∈ {-1, 1}} Q̂_β(x, a) be the plug-in estimator of d_β^opt. Assume that decisions are made according to the model P{A = d_β^opt(X) | X = x} = expit(x^T θ). We compute the estimators (β̂, θ̂) by maximizing the pseudo-likelihood

PL_n(β, θ) = ∏_{i=1}^n expit(X_i^T θ)^{1{A_i = d̂_β(X_i)}} {1 − expit(X_i^T θ)}^{1{A_i ≠ d̂_β(X_i)}}.   (3)

An estimator for the utility function is u(y, z; x) = expit(x^T β̂)y + {1 − expit(x^T β̂)}z, and an estimator for the optimal decision function is d̂_{β̂}(x).

As before, the pseudo-likelihood given in (3) is non-smooth in β, and standard gradient-based optimization methods cannot be used. It is again straightforward to compute the profile pseudo-likelihood estimator θ̂_β for any fixed β. However, because it is computationally infeasible to compute θ̂_β for all β on a grid if β is of moderate dimension, we generate a random walk through the parameter space using the Metropolis algorithm as implemented in the metrop function in the R package mcmc (Geyer and Johnson, 2017) and compute the profile pseudo-likelihood for each β on the random walk. Let PL_n(β) = PL_n(β, θ̂_β) denote the profile pseudo-likelihood; we can compute PL_n(β) by estimating θ using logistic regression as described in Section 2.1. The algorithm to construct a random walk through the parameter space is as follows.

  1. Set a chain length K, fix a proposal variance σ² > 0, and initialize β^(0) to a starting value in R^p

  2. For k = 1, ..., K:

    1. Generate ε_k ~ Normal(0, σ²I)

    2. Set β̃ = β^(k−1) + ε_k

    3. Compute ρ = PL_n(β̃) / PL_n(β^(k−1))

    4. Generate U_k ~ Uniform(0, 1); if U_k ≤ ρ, set β^(k) = β̃; otherwise, set β^(k) = β^(k−1).

After generating a chain β^(1), ..., β^(K), we select the β^(k) that leads to the largest value of PL_n(β^(k)) as the maximum pseudo-likelihood estimator. Standard practice is to choose the variance of the proposal distribution, σ², so that the acceptance proportion is between 0.25 and 0.5 (Geyer and Johnson, 2017); a minimal implementation sketch follows.
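The following R sketch writes out the random-walk search as a plain Metropolis loop rather than calling mcmc::metrop; the helper profile_logPL (the logistic-regression profile step of Section 2.1, returning the maximized log pseudo-likelihood at a given β), the starting value, and the proposal standard deviation are assumptions made for illustration.

  # Plain Metropolis random walk over beta; the best visited point is retained
  # because only the maximizer of the profile pseudo-likelihood is needed.
  search_beta <- function(dat, profile_logPL, beta0, K = 10000, sd_prop = 0.1) {
    p <- length(beta0)
    beta_curr <- beta0
    ll_curr <- profile_logPL(beta_curr, dat)
    best <- list(beta = beta_curr, logPL = ll_curr)
    n_accept <- 0
    for (k in seq_len(K)) {
      beta_prop <- beta_curr + rnorm(p, mean = 0, sd = sd_prop)  # random-walk proposal
      ll_prop <- profile_logPL(beta_prop, dat)
      if (log(runif(1)) <= ll_prop - ll_curr) {                  # accept with prob min(1, ratio)
        beta_curr <- beta_prop
        ll_curr <- ll_prop
        n_accept <- n_accept + 1
      }
      if (ll_curr > best$logPL) best <- list(beta = beta_curr, logPL = ll_curr)
    }
    best$acceptance <- n_accept / K  # tune sd_prop so this lands near 0.25-0.5
    best
  }

Here best$beta plays the role of β̂, and expit(x^T β̂) gives the estimated patient-specific utility weight ω̂(x).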

3 Theoretical results

Here we state a number of theoretical results pertaining to the proposed pseudo-likelihood estimation method for utility functions. We state results for a patient-specific utility function; the setting where the utility function is fixed is a special case. All proofs are deferred to the appendix.

We assume that A ∈ {−1, 1} and that the true utility function is u(y, z; x) = ω(x; β*)y + {1 − ω(x; β*)}z, where ω(·; ·) has bounded continuous derivative on compact sets and ω(X; β) = ω(X; β*) almost surely implies β = β*, i.e., the model introduced in Section 2.2 is well-defined and correctly specified with true parameters β* and θ*. We further assume that the estimators Q̂_Y and Q̂_Z are pointwise consistent for all ordered pairs (x, a). Along with Assumptions 1-3, we implicitly assume that the conditional densities of Y and Z given (X, A) exist. The following result states the consistency of the maximum pseudo-likelihood estimators for the utility function and the probability of optimal treatment. The proof involves verifying the conditions of Theorem 2.12 of Kosorok (2008).

Theorem 3.1 (Consistency with patient-specific utility).

Let the maximum pseudo-likelihood estimators (β̂_n, θ̂_n) be as in Section 2.2. Assume that the parameter space for (β, θ) is a compact set containing (β*, θ*) and that the conditions stated above hold. Then, β̂_n → β* and θ̂_n → θ* in probability as n → ∞.

Let V(d) denote the mean composite outcome in a population where decisions are made according to d. The following result establishes the consistency of the value of the estimated optimal policy. The proof uses general theory developed by Qian and Murphy (2011).

Theorem 3.2 (Value consistency with patient-specific utility).

Let β̂_n be the maximum pseudo-likelihood estimator for β* and let d̂_{β̂_n} be the associated estimated optimal policy. Then, under the given assumptions, V(d̂_{β̂_n}) → V(d_{β*}^opt) in probability as n → ∞.

Next, we derive the convergence rate and asymptotic distribution of . Assume that is a bounded subset of and let be the sup norm over , i.e., for , . Let . Assume that and that . Define , , and . Similarly, define , , and . Let and . Note that and . Let , , , and . We use the following regularity conditions.

Assumption 4.

There exist independent and identically distributed influence vectors

, and , and vector basis functions and such that both

and

Let , , , and . Furthermore, assume that , , , and are bounded by some . Let be positive definite and finite, where .

Assumption 5.

The following conditions hold.

  1. The random variable

    has a continuous density function in a neighborhood of 0 with ;

  2. The conditional distribution of given that converges to a non-degenerate distribution as ;

  3. There exist such that

    where is the -dimensional unit sphere.

Assumption 6.

Define, for , , and ,

(4)

Assume that has a unique, finite minimum over for all .

Let be the maximum pseudo-likelihood estimators given in Section 2.2. The following theorem states the asymptotic distribution of .

Theorem 3.3 (Asymptotic distribution).

Under the given regularity conditions

(5)

where , and .

Let denote convergence in probability over , as defined in Section 2.2.3 and Chapter 10 of Kosorok (2008). Theorem 3.4 below establishes the validity of a parametric bootstrap procedure for approximating the sampling distribution of .

Theorem 3.4 (Parametric bootstrap).

Assume and , where . Assume the regularity conditions given above hold. Let , where is an identity matrix and . Let . Let and define

where is the standard normal density. Define and . Then,

(6)

where is as defined in Theorem 3.3.

If we fix a large number of bootstrap replications, B, then the resulting bootstrap draws provide an approximation to the sampling distribution of the maximum pseudo-likelihood estimators. In Sections 4 and 5, we demonstrate the use of the bootstrap to test for heterogeneity of patient preferences.

4 Simulation experiments

4.1 Fixed utility simulations

To examine the finite sample performance of the proposed methods, we begin with the following simple generative model. Let X be a vector of independent normal random variables with mean 0 and standard deviation 0.5. Let treatment be assigned according to P{A = d_u^opt(X) | X} = p for a constant p ∈ (0, 1), i.e., the probability that the clinician correctly identifies the optimal treatment is constant across patients. Let ε_Y and ε_Z be independent normal random variables with mean 0 and standard deviation 0.5, and let Y and Z be generated as linear functions of X and A plus ε_Y and ε_Z, respectively. We estimated Q_Y and Q_Z using linear models, implemented the proposed method for a variety of n, ω, and p values, and examined the estimates of ω and of the probability of optimal treatment, along with the error rate of the estimated policy, across 500 Monte Carlo replications.
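A minimal R sketch of one simulated data set under this design is below; the outcome-model coefficients, the dimension of X, and the treatment-effect structure are placeholders chosen for illustration, not the values used in the experiments reported here.

  # One simulated data set under the fixed-utility design (placeholder coefficients).
  simulate_fixed <- function(n, omega, p_opt) {
    x1 <- rnorm(n, 0, 0.5)
    x2 <- rnorm(n, 0, 0.5)
    # Placeholder treatment-effect components of the outcome means
    effY <- function(a) a * (1 + 2 * x1)
    effZ <- function(a) a * (1 - 2 * x1)
    # Treatment that is optimal for the composite outcome omega*Y + (1 - omega)*Z
    d_opt <- ifelse(omega * (effY(1) - effY(-1)) + (1 - omega) * (effZ(1) - effZ(-1)) >= 0, 1, -1)
    # Clinician assigns the optimal treatment with constant probability p_opt
    a <- ifelse(runif(n) < p_opt, d_opt, -d_opt)
    y <- 0.5 * x2 + effY(a) + rnorm(n, 0, 0.5)
    z <- 0.5 * x2 + effZ(a) + rnorm(n, 0, 0.5)
    data.frame(x1 = x1, x2 = x2, a = a, y = y, z = z)
  }

For example, dat <- simulate_fixed(500, omega = 0.25, p_opt = 0.8) can be passed directly to the grid-search sketch of Section 2.1.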

Table 1 contains mean estimates of ω and of the probability of optimal treatment across replications, along with their associated root mean squared errors (RMSE), and the estimated error rate, defined as the proportion of the time the estimated optimal policy does not recommend the true optimal treatment.

n   ω   P(optimal)   Mean ω̂ (RMSE)   Mean P̂(optimal) (RMSE)   Error rate
100 0.25 0.6 0.321 (0.2413) 0.613 (0.0654) 0.114
0.8 0.252 (0.0574) 0.803 (0.0386) 0.028
0.75 0.6 0.656 (0.2614) 0.610 (0.0718) 0.121
0.8 0.752 (0.0519) 0.805 (0.0410) 0.026
200 0.25 0.6 0.272 (0.1618) 0.613 (0.0404) 0.069
0.8 0.250 (0.0273) 0.802 (0.0267) 0.014
0.75 0.6 0.719 (0.1778) 0.610 (0.0441) 0.076
0.8 0.751 (0.0325) 0.800 (0.0280) 0.016
300 0.25 0.6 0.264 (0.1356) 0.610 (0.0300) 0.056
0.8 0.250 (0.0202) 0.802 (0.0214) 0.011
0.75 0.6 0.736 (0.1328) 0.608 (0.0308) 0.057
0.8 0.749 (0.0198) 0.803 (0.0231) 0.011
500 0.25 0.6 0.249 (0.0823) 0.607 (0.0221) 0.034
0.8 0.250 (0.0151) 0.800 (0.0176) 0.008
0.75 0.6 0.750 (0.0855) 0.606 (0.0208) 0.035
0.8 0.751 (0.0127) 0.800 (0.0187) 0.007
Table 1: Estimation results for simulation where utility and probability of optimal treatment are fixed.

The pseudo-likelihood method performs well at estimating both ω and the probability of optimal treatment, with estimation improving with larger sample sizes as expected. Table 2 contains estimated values of the true optimal policy, a policy where the utility function is known and the Q-functions are estimated, a policy where both the utility and the Q-functions are estimated (the proposed method), a policy where the Q-functions are estimated and the utility is misspecified, and the standard of care. The policy with misspecified utility places all weight on Y, which corresponds to Q-learning for optimizing Y while ignoring Z. The value of the standard of care is the mean composite outcome under the generative model. For each policy, the value is estimated by generating a testing sample of size 500 with treatment assigned according to the policy and averaging utilities (calculated using the true ω) in the testing set; a small sketch of this step is given after Table 2.

n   ω   P(optimal)   Optimal   Known utility   Estimated utility   Misspecified utility   Standard of care
100 0.25 0.6 1.896 1.896 1.723 1.770 0.386
0.8 1.900 1.897 1.889 1.767 1.144
0.75 0.6 1.898 1.902 1.697 0.390 0.380
0.8 1.894 1.896 1.888 0.395 1.145
200 0.25 0.6 1.903 1.897 1.823 1.759 0.381
0.8 1.895 1.896 1.896 1.756 1.133
0.75 0.6 1.902 1.901 1.809 0.390 0.377
0.8 1.894 1.897 1.894 0.381 1.144
300 0.25 0.6 1.899 1.897 1.849 1.764 0.381
0.8 1.898 1.900 1.896 1.767 1.139
0.75 0.6 1.901 1.894 1.848 0.401 0.386
0.8 1.901 1.893 1.903 0.400 1.139
500 0.25 0.6 1.898 1.903 1.877 1.761 0.376
0.8 1.901 1.896 1.898 1.763 1.144
0.75 0.6 1.897 1.900 1.877 0.389 0.375
0.8 1.897 1.899 1.897 0.383 1.141
Table 2: Value results for simulation where utility and probability of optimal treatment are fixed.
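A short R sketch of the value-estimation step referenced above, reusing the placeholder outcome model from the simulate_fixed sketch; the policy argument is any function mapping a data frame of covariates to treatments in {-1, 1}.

  # Monte Carlo value of a policy: draw a test sample, assign treatment with the
  # policy, generate outcomes, and average the true composite outcome.
  estimate_value <- function(policy, omega, n_test = 500) {
    x1 <- rnorm(n_test, 0, 0.5)
    x2 <- rnorm(n_test, 0, 0.5)
    test <- data.frame(x1 = x1, x2 = x2)
    a <- policy(test)  # treatments recommended by the policy
    y <- 0.5 * x2 + a * (1 + 2 * x1) + rnorm(n_test, 0, 0.5)
    z <- 0.5 * x2 + a * (1 - 2 * x1) + rnorm(n_test, 0, 0.5)
    mean(omega * y + (1 - omega) * z)  # value with respect to the true utility
  }

The “Estimated utility” column corresponds to supplying, as policy, the rule implied by (ω̂, Q̂_Y, Q̂_Z) from the fitted model.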

The column labeled “Estimated utility” refers to the proposed method. We see that the proposed method produces values which increase with n and generally come close to those of the policy where the utility is known. In all settings, the proposed method offers significant improvement over the standard of care. The proposed method also offers significant improvement over the policy where the utility is misspecified.

To further examine the performance of the proposed method, we allow the probability of optimal treatment to depend on patient covariates, taking P{A = d_u^opt(X) | X = x} = expit(x^T θ*) for a nonzero θ*. Let X, Y, and Z be generated as described above. In this generative model, the probability that a patient is treated optimally in standard care is larger for patients with positive values of x^T θ* and smaller for patients with negative values of x^T θ*. We applied the proposed method to 500 replications of this generative model for various n and ω. Table 3 contains mean estimates of ω along with the associated RMSE of ω̂, the RMSE of θ̂, and the error rate.

n   ω   Mean ω̂ (RMSE)   RMSE of θ̂   Error rate
100 0.25 0.423 (0.3313) 1.410 0.171
0.75 0.586 (0.3339) 1.441 0.173
200 0.25 0.366 (0.2751) 0.887 0.126
0.75 0.639 (0.2745) 0.885 0.136
300 0.25 0.324 (0.2243) 0.674 0.091
0.75 0.698 (0.2131) 0.689 0.099
500 0.25 0.283 (0.1484) 0.470 0.054
0.75 0.736 (0.1483) 0.495 0.068
Table 3: Estimation results for simulation where utility is fixed and probability of optimal treatment is patient-dependent.

Estimation of the observational policy (as characterized by θ) improves with larger sample sizes, with the largest improvement coming from an increase from 100 patients to 200. The probability that the estimated policy assigns the optimal treatment also increases with the sample size. The true value of ω does not appreciably affect estimation of ω or θ.

Table 4 contains estimated values of the true optimal policy, a policy where the utility function is known and the Q-functions are estimated, a policy where both the utility and the Q-functions are estimated (the proposed method), a policy where the Q-functions are estimated and the utility is misspecified, and the standard of care. Values are estimated from independent testing sets of size 500 as described above. The policy with misspecified utility again places all weight on Y. The value under the standard of care is the mean composite outcome under the generative model.

n   ω   Optimal   Known utility   Estimated utility   Misspecified utility   Standard of care
100 0.25 1.901 1.892 1.576 1.763 -0.110
0.75 1.897 1.898 1.572 0.377 0.144
200 0.25 1.899 1.898 1.676 1.765 -0.107
0.75 1.901 1.894 1.674 0.382 0.140
300 0.25 1.898 1.904 1.755 1.765 -0.101
0.75 1.901 1.902 1.766 0.387 0.150
500 0.25 1.896 1.897 1.827 1.769 -0.108
0.75 1.901 1.894 1.835 0.386 0.153
Table 4: Value results for simulation where utility is fixed and probability of optimal treatment is patient-dependent.

The proposed method (found in the column labeled “Estimated utility”) produces values that are close to those of the policy where the utility function is known in large samples and a significant improvement over standard of care in small to moderate samples. Again, the proposed method offers significant improvement over the policy where the utility is misspecified. We note that value under the standard of care differs across ω. When ω is close to 1, the composite outcome places more weight on Y, for which the magnitude of the association with the covariates is larger. Because patients with larger values of x^T θ* are more likely to be treated optimally in this generative model, the standard of care produces larger composite outcomes when ω is closer to 1.

4.2 Patient-specific utility simulations

Next, we examine the case where the utility function is allowed to vary across patients. Let X, Y, and Z be generated as above. Again, assume that the probability of optimal treatment depends on the covariates through a logistic model, i.e., P{A = d_u^opt(X) | X = x} = expit(x^T θ*). Consider the composite outcome u(Y, Z; X) = ω(X)Y + {1 − ω(X)}Z, where ω(x) = expit(x^T β*), i.e., the utility weight varies across patients. We implemented the proposed method for various n and examined estimation of β and θ across 500 replications. Each replication is based on a simulated Markov chain of length 10,000 as described in Section 2.2. Results are summarized in Table 5.

n   RMSE of θ̂   RMSE of β̂   Error rate
100 1.959 3.481 0.188
200 1.171 3.418 0.177
300 0.874 3.389 0.167
500 0.613 3.314 0.150
Table 5: Estimation results for simulation where utility and probability of optimal treatment are patient-dependent.

Larger sample sizes produce only marginal decreases in the RMSE of β̂. Despite the large RMSEs of β̂, the estimated policy assigns the true optimal treatment more than 80% of the time for all sample sizes, and the error rate decreases as the sample size increases. Table 6 contains estimated values of the true optimal policy, the policy estimated using the proposed method, and standard of care.

n   Optimal   Estimated   Standard of care
100 1.563 1.311 -0.015
200 1.559 1.334 -0.011
300 1.562 1.358 -0.014
500 1.564 1.385 -0.016
Table 6: Value results for simulation where utility and probability of optimal treatment are patient-dependent.

The proposed method produces policies that achieve close to optimal value in large samples and significant improvement over the standard of care in small to moderate samples.

Finally, we examine the performance of the parametric bootstrap as described in Section 3. Let X be a bivariate vector of normal random variables with mean 0, standard deviation 0.5, and correlation zero. Let Y and Z be generated as above and let ω(x) = expit(x̃^T β), where x̃ is x with a prepended intercept, i.e., the first element of x̃ is an intercept. We consider two values of β, one with all non-intercept components equal to zero and one with a nonzero non-intercept component. Let β_{-1} denote the vector β with the first element removed. We are interested in testing the null hypothesis H_0: β_{-1} = 0, which corresponds to a test for heterogeneity of patient preferences. Table 7 below contains estimated power across 500 Monte Carlo replications under the null hypothesis, β_{-1} = 0, and the alternative hypothesis, β_{-1} ≠ 0, at a fixed nominal level for various sample sizes. Each replication is based on 1000 bootstrap samples.

n   Power under null   Power under alternative
100 0.010 0.858
200 0.018 0.904
300 0.044 0.922
500 0.066 0.948
Table 7: Bootstrap hypothesis test for preference heterogeneity.

The proposed bootstrap procedure demonstrated type I error rates near nominal levels under the null and high power in large samples under the alternative.

5 Case study: The STEP-BD standard care pathway

The Systematic Treatment Enhancement Program for Bipolar Disorder (STEP-BD) was a landmark study of the effects of antidepressants in patients with bipolar disorder (Sachs et al., 2007). In addition to a randomized trial assessing outcomes for patients given an antidepressant or placebo, the STEP-BD study also included a large-scale observational study, the standard care pathway. We apply the proposed method to data from the STEP-BD standard care pathway to estimate decision rules for the use of antidepressants in patients with bipolar disorder.

Although bipolar disorder is characterized by alternating episodes of depression and mania, recurrent depression is the leading cause of impairment among patients with bipolar disorder (Judd et al., 2002). However, the use of antidepressants has not become standard care in bipolar disorder due to the risk of antidepressants inducing manic episodes in certain patients (Ghaemi, 2008; Goldberg, 2008). Thus, the clinical decision in the treatment of bipolar disorder is whether to prescribe antidepressants to a specific patient in order to balance trade-offs between symptoms of depression, symptoms of mania, and other side effects of treatment.

We use the SUM-D score for depression symptoms and the SUM-M score for mania symptoms as outcomes. We consider a patient treated if they took any one of ten antidepressants that appear in the STEP-BD standard care pathway (Deseryl, Serzone, Citalopram, Escitalopram Oxalate, Prozac, Fluvoxamine, Paroxetine, Zoloft, Venlafaxine, or Bupropion). Covariates were screened using linear models and included if there was an interaction with treatment for either outcome. Covariates used for this analysis were age and substance abuse history (yes/no). Figure 1 contains box plots of SUM-D scores on the log scale by substance abuse and treatment. Figure 2 contains box plots of SUM-M scores on the log scale by substance abuse and treatment. For both outcomes, lower scores are more desirable.
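A sketch of the covariate-screening step is below, under the assumption that screening amounts to fitting, for each candidate covariate and each outcome, a linear model with a covariate-by-treatment interaction and retaining covariates whose interaction term is significant; the data-frame column names and the 0.05 threshold are hypothetical.

  # Keep covariates whose interaction with treatment is significant for either
  # outcome (column names sum_d, sum_m, treated and the threshold are illustrative).
  screen_covariates <- function(dat, candidates, alpha = 0.05) {
    keep <- character(0)
    for (v in candidates) {
      pvals <- sapply(c("sum_d", "sum_m"), function(outcome) {
        fit <- lm(reformulate(sprintf("%s * treated", v), response = outcome), data = dat)
        coefs <- summary(fit)$coefficients
        min(coefs[grep(":", rownames(coefs)), "Pr(>|t|)"])  # smallest interaction p-value
      })
      if (any(pvals < alpha)) keep <- c(keep, v)
    }
    keep
  }
  # e.g., screen_covariates(stepbd, c("age", "substance_abuse"))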

Figure 1: Box plots of log SUM-D by substance abuse and treatment.
Figure 2: Box plots of log SUM-M by substance abuse and treatment.

Figure 1 indicates that those without a history of substance abuse benefit from treatment with antidepressants. However, among those with a history of substance abuse, patients treated with antidepressants appear to have worse symptoms of depression. Figure 2 indicates that treatment has no effect on symptoms of mania among those without a history of substance abuse. However, among those with a history of substance abuse, it appears that treatment may be inducing manic episodes. Thus, a sensible treatment policy would be one that tends to prescribe antidepressants only to patients without a history of substance abuse.

We analyzed these data using the proposed method for optimizing composite outcomes. Results are summarized in Table 8 below. We estimated policies where both utility and probability of optimal treatment are fixed (fixed-fixed), where utility is fixed but probability of optimal treatment is assumed to vary between patients (fixed-variable), and where both utility and probability of optimal treatment are assumed to vary between patients (variable-variable). For both the fixed-variable policy and the variable-variable policy, we report the sample average of the estimated patient-specific probabilities of optimal treatment in place of a single estimated probability, and for the variable-variable policy, we report the sample average of the estimated patient-specific utility weights in place of a single estimated ω. Thus, for parameters that are assumed to vary across patients, Table 8 contains the mean estimate in the sample. Mean outcomes and value functions are averaged over five-fold cross-validation. For both SUM-D and SUM-M, lower scores are preferred. Value is reported as the percent improvement over standard of care, calculated using the estimated utility function. Large percent improvements in value are preferred.

Policy   Mean SUM-D   Mean SUM-M   Value (% improvement)   Est. ω   Est. P(optimal)
fixed-fixed 2.320 0.837 4.2% 0.000 0.404
fixed-variable 2.320 0.857 5.6% 0.115 0.403
variable-variable 2.319 0.858 7.4% 0.115 0.404
standard of care 2.480 0.868 0.0%
Table 8: Results of analysis of STEP-BD data for SUM-D and SUM-M.

All estimated policies produce more desirable SUM-D scores and SUM-M scores compared to standard of care and improved value according to the estimated utility. Allowing the probability of optimal treatment to vary between patients leads to further improvements in value, as does allowing the utility function to vary between patients. All policies produce similar estimates for the probability of optimal treatment averaged across patients.

The resulting decision rules can be written as the sign of a linear combination of the covariates. As an example, the fixed-fixed policy assigns treatment with antidepressants when the sign of the estimated linear combination of age and substance abuse history is equal to 1. The negative coefficient for substance abuse means that a history of substance abuse indicates that a patient should not be prescribed antidepressants. Prior research has shown that patients with a history of substance abuse are more likely to abuse antidepressants (Evans and Sullivan, 2014). This may contribute to the poor outcomes experienced by STEP-BD patients with a history of substance abuse who were treated with antidepressants. A test for preference heterogeneity based on 10,000 bootstrap replications generated according to Theorem 3.4 yielded a p-value of .

As a secondary analysis, we use the SUM-D score and a side effect score as outcomes. Nine side effects were recorded in the STEP-BD standard care pathway (tremors, dry mouth, sedation, constipation, diarrhea, headache, poor memory, sexual dysfunction, and increased appetite). Patients rated the severity of each side effect from 0 to 4, with larger values indicating more severe side effects. We took the mean score across side effects as the second outcome. Covariates were screened as in the above analyses. Patients in the standard care pathway were asked to report the percent of days over the past week that they experienced mood elevation, irritability, and anxiety; these were included as covariates along with age and history of substance abuse. Results are summarized in Table 9, reported analogously to those in Table 8.