1 Introduction
Identifying treatments with improved efficacy is one of the central tasks in clinical research. Yet effectiveness may not be the only consideration in practice. For example, treatments are sometimes found to have similar health benefits but very different treatment-related risks and costs. The randomized Prostate Cancer Intervention Versus Observation Trial (PIVOT) followed men with localized prostate cancer for nearly 20 years and found that prostatectomy was not associated with significantly lower all-cause or prostate-cancer mortality than expectant management (Wilt et al., 2017). However, prostatectomy costs three times as much as expectant management and induces a higher risk of complications. When two treatments have similar clinical benefits, healthcare policy makers often focus on comparing their costs, as the practical and ultimate treatment allocation plan is the one that optimizes the utility of limited healthcare resources. In health economic research, treatments are evaluated using cost-effectiveness (CE) analysis in terms of incremental CE, which is usually measured as the contrast between health gains and additional treatment-induced costs.
Conventional CE analysis focuses on informing the best treatment plan at a population level using the average incremental CE; however, this often leads to a “one-size-fits-all” treatment strategy that may not be optimal for patients whose responses to treatment vary from the population average. For instance, the National Lung Screening Trial (Black et al., 2015) showed that the CE of screening with low-dose computed tomography varies by age, sex, smoking status, and risk of lung cancer. Analogous to the fact that individuals tend to have heterogeneous treatment benefits, a treatment may also induce different amounts of cost for subjects with varying health status, e.g., additional medical procedures for managing complications (Phelps, 1997). All these sources of heterogeneity contribute to the variation in incremental CE among patients, which should be carefully considered to provide unique insights for personalized CE-based decision-making. Thus, our work estimates the optimal cost-effective individualized treatment rule (CE-ITR) that tailors treatment recommendations to subjects’ characteristics and maximizes mean cost-effectiveness when applied to the entire population.
When the objective is to maximize a single health outcome, the optimal ITR can be estimated using standard model-based approaches by inverting the estimates of treatment-covariate interaction terms. However, such methods are vulnerable to model misspecification, and they are developed to minimize prediction errors, which does not necessarily lead to optimal regimes (Murphy, 2005a). These concerns become more pressing when the goal is to maximize health gains while minimizing medical expenses. Even when both the health and cost outcome models are correctly specified, they are likely to suggest different optimal rules that one may not know how to synchronize. In contrast, classification-based machine learning methods provide direct estimation of the optimal ITR while imposing fewer constraints on models. These advantages are extremely beneficial in CE studies where multiple outcomes are involved. So far, several different formulations have been proposed in this direction to recast the ITR estimation problem as a classification problem, and a variety of classification tools have been implemented to find the optimal ITR as the classifier that minimizes a suitably weighted misclassification rate (Zhang et al., 2012a; Zhao et al., 2012; Cui, Zhu, and Kosorok, 2017). Yet the referenced methods are developed based on a single health outcome, so we aim to extend them to incorporate CE outcomes for health economic evaluations.

The determination of the best CE-ITR involves understanding both the treatment efficacy and the differential costs at a patient level. As clinical research often studies efficacy using a time-to-event endpoint that is subject to censoring, we follow the biomedical literature (Lueza et al., 2016) and focus on the restricted survival time measure, which is more sensible for chronic disease studies with limited follow-up times. The treatment effect on cost is the difference in cumulative costs over a follow-up period. The estimation of cost has two major challenges: 1) a positively skewed distribution (Malehi, Pourmotahari, and Angali, 2015): some prior work assumed that the sample means of cost are normally distributed, which requires a sufficiently large sample size that may not be attained in some studies (Mantopoulos et al., 2016); 2) induced informative censoring: Lin et al. (1997) pointed out that there is a positive correlation between cumulative costs at death and at censoring, so standard survival models may be inappropriate for estimating total costs. Inaccurate estimation of cost may result in substantial bias in both model-based and classification-based methods and mislead healthcare policy making; thus, more flexible approaches should be applied to resolve these difficulties.

Claims data are one important source of patient-level cost data, but when they are unavailable, researchers often obtain costs from microsimulations, which are also feasible for randomized trials. A majority of CE analyses are conducted using microsimulation models (e.g., national population simulation models) to help quantify short- and long-term ramifications of healthcare decisions (Abraham, 2013). Some microsimulation-based CE analyses are free of censoring issues under certain model assumptions, while others are performed alongside clinical trials (Ramsey et al., 2015) where nonrandom loss to follow-up may occur. There are also CE studies that employ a combination of trial and observational data (Harkanen et al., 2013), in which case the treatment-outcome confounding and censoring issues may coexist. As healthcare policy makers increasingly demand real-world evidence of the economic value of an intervention, observational data will be widely used in CE research in the near future. Therefore, robust statistical methods are desirable to accommodate the diverse nature of such data.
Motivated by the net monetary benefit (NMB) concept in CE analyses, our earlier work recommended estimating the best CE-ITR by modeling patients’ survival and cumulative cost simultaneously (Xu et al., 2020). However, in practice, this method may suffer unstable estimation when the true CE-ITR has a complicated form or a loss in efficiency when the censoring rate is extremely high. In this paper, we follow the composite outcome framework of our previous work and propose two improved approaches that account for confounding and induced informative censoring, with two layers of novelty. First, we propose a partitioned inverse-probability-weighting (IPW) estimator and a partitioned augmented IPW (AIPW) estimator for the NMB-based weights that are used in the classification step for identifying the optimal CE-ITR. The partitioned estimators more effectively incorporate available cost information from the observed data and therefore improve estimation efficiency; moreover, they are robust to the skewed distribution of cost and to model misspecification. Second, we combine our weight estimators with an advanced statistical learning approach that is more suitable for modeling complex CE-ITRs, more effective in mitigating the common overfitting issue in machine learning, and better at handling correlated tailoring variables in the rule.
The remainder of the paper is organized as follows. We introduce the proposed methods in Section 2 and evaluate their performance against several existing approaches via simulation studies in Section 3. In Section 4, we illustrate the top-performing approach from our simulations by evaluating the projected 15-year personalized CE of the intensive blood pressure therapy in the NIH-funded Systolic Blood Pressure Intervention Trial (SPRINT). Finally, in Section 5, we discuss the practical challenges in CE analysis and potential extensions of our methods.
2 Method
2.1 Notations and Assumptions
We consider an observational study with a two-level treatment $A \in \{0, 1\}$, where $A$ is not randomly assigned, so its distribution is driven by some of the baseline covariates $Z$. We denote the survival time by $T$ and the censoring time by $C$; the observed survival time is then $U = \min(T, C)$. To incorporate the practical issue of limited follow-up in most studies, we define the health outcome as the restricted survival time, i.e., $Y = \min(T, \tau)$, where $\tau$ is the predetermined study time horizon of interest. We denote by $M(t)$ the cost accumulated up to an arbitrary time $t$; our cost outcome is then the cumulative cost up to $\min(U, \tau)$, i.e., $M = M(\min(U, \tau))$. Finally, we define the composite NMB outcome $B = \lambda Y - M$, which scales the health outcome and the related cost in monetary value using the willingness-to-pay (WTP) parameter $\lambda$, the most one would be willing to spend to gain another life-year. Thus, our observed data are $O = \{Z, A, U, \delta, M\}$, where $\delta = I(T \le C)$ is the event indicator.
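As a toy illustration of the composite outcome just defined, the following Python sketch computes the restricted survival time and NMB for hypothetical subjects; all numbers and variable names are ours, not values from the paper.

```python
import numpy as np

# Sketch of the composite NMB outcome B = lambda * Y - M, where
# Y = min(T, tau) is the restricted survival time and M is the
# cumulative cost. tau and wtp values are illustrative only.
def nmb(survival_time, cum_cost, tau=10.0, wtp=50_000.0):
    """Net monetary benefit for an array of subjects."""
    restricted_time = np.minimum(survival_time, tau)  # Y = min(T, tau)
    return wtp * restricted_time - cum_cost           # B = lambda * Y - M

# A subject surviving 12 years is restricted to tau = 10 years.
b = nmb(survival_time=np.array([12.0, 4.0]),
        cum_cost=np.array([80_000.0, 30_000.0]))
```

Here the first subject's health gain is capped at $\tau$ before being converted to monetary units.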
To simplify the discussion of heterogeneity in treatment effects and facilitate the illustration of our proposed method, we adopt the counterfactual framework of causal inference. Let $T(a)$ denote the counterfactual survival time under treatment $a$; the potential restricted survival time is then $Y(a) = \min(T(a), \tau)$, and the treatment effect is $\Delta Y = Y(1) - Y(0)$. Analogously, $M(a, t)$ is the counterfactual cumulative cost up to time $t$, and the treatment-related incremental cost is $\Delta M = M(1) - M(0)$. Note that the treatment effects on cost may vary across individuals under two circumstances: one is when subjects have varied survival times, and the other is when subjects have similar survival times but varied comorbidity status that may accumulate cost at various speeds. For simplicity, we suppress the time dependency of $M(a, t)$ and write $M(a)$ in the following text when no confusion exists. We denote the counterfactual NMB as $B(a) = \lambda Y(a) - M(a)$; the treatment effect on NMB (i.e., the incremental NMB) is then $\Delta B = B(1) - B(0)$. To identify the counterfactual quantities related to treatment effects on NMB, we extend the classic identification assumptions in causal inference to this setting as follows (Robins and Greenland, 1992):

Consistency: $T = A T(1) + (1 - A) T(0)$ and $M = A M(1) + (1 - A) M(0)$;

Positivity: $P(A = a \mid Z) \ge \epsilon$ for some positive value $\epsilon$ and for $a = 0, 1$;

Ignorability: $T(a) \perp A \mid Z$ and $M(a) \perp A \mid Z$, for $a = 0, 1$.
Furthermore, we assume conditionally noninformative censoring for the survival time; i.e., $C \perp T \mid (A, Z)$.
2.2 Direct Estimation of the Optimal CE-ITR
We denote an arbitrary ITR by $d(Z) \in \{0, 1\}$, a function of subjects’ characteristics $Z$. Let $B(d)$ be the potential NMB when treatment is assigned according to the rule $d$. We define the optimal CE-ITR as the treatment rule that maximizes the average NMB; i.e., $d^{opt} = \arg\max_{d} E[B(d)]$. Zhang et al. (2012a) proposed a classification framework for estimating the ITR and showed that $d^{opt}$ is also the optimal classifier that minimizes the weighted misclassification error. We extend this idea to our CE setting as
$$d^{opt} = \arg\min_{d} \; E\big[\, |\Delta B| \cdot I\{ I(\Delta B > 0) \neq d(Z) \} \,\big]. \qquad (1)$$
Eq. (1) reformulates the estimation of the optimal CE-ITR as a weighted classification problem, in which $d$ classifies subjects into two classes defined by the sign of $\Delta B$. The class $\{\Delta B > 0\}$ includes patients for whom the treatment is cost-effective, so the treatment should be assigned from a CE perspective. Conversely, since the incremental CE for patients in the class $\{\Delta B \le 0\}$ is nonpositive, the baseline treatment or control should be assigned as their cost-effective option. We refer to $I(\Delta B > 0)$ as the class label and $|\Delta B|$ as the classification weight, which are both functions of the composite NMB; thus, our classification algorithm considers cost and effectiveness jointly. No misclassification error is incurred when $d$ is identical to the underlying true CE-ITR $d^{opt}$. However, when $d \neq d^{opt}$, a penalty is induced by patients who fail to receive their cost-effective interventions. Note that the classification weight is defined at an individual level, so misclassifying subjects with large incremental NMBs results in large penalties. The formulation in Eq. (1) is not unique, and alternative objective functions have been proposed by defining the class label and weight with other transformations of an outcome (Zhao et al., 2012).
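The weighted classification idea in Eq. (1) can be made concrete with a minimal Python sketch using scikit-learn as a stand-in classifier; the simulated incremental NMB `dB` and all parameter values below are purely illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy data: the sign of the incremental NMB supplies the class label and
# its magnitude the observation weight, so subjects with large |dB|
# dominate the weighted misclassification loss.
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 2))
dB = 3.0 * Z[:, 0] + rng.normal(scale=0.1, size=200)  # toy incremental NMB

label = (dB > 0).astype(int)   # class label I(dB > 0)
weight = np.abs(dB)            # classification weight |dB|

clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(Z, label, sample_weight=weight)
rule = clf.predict(Z)          # estimated rule d(Z)
accuracy = (rule == label).mean()
```

Any classifier accepting observation weights could replace the tree here; that substitutability is what motivates the classifier discussion in the next subsection.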
Since CE analysis adopts a composite outcome, the relationship between baseline covariates and NMB is more complicated than in typical comparative effectiveness studies that only consider a single health outcome. To capture the heterogeneity in incremental NMB, model-based methods usually specify complicated forms that include all possible treatment-covariate interaction terms. However, overly sophisticated models may mistake some of the noise for real signal and lead to overfitting, while excessively parsimonious models may be vulnerable to model misspecification. In contrast, the classification framework estimates ITRs in a data-driven fashion, so it has significant advantages in CE analyses where the underlying data structure is complex and nonlinear.
Next, we propose a two-step procedure that directly identifies the optimal CE-ITR. We first describe the use of statistical learning methods for solving the weighted classification problem in Eq. (1). As in the ITR literature, a broad spectrum of machine learning methods can be applied in our proposal for CE-ITR estimation. We first review the most common and popular choice, the decision tree, and summarize its limitations relative to forest-based methods. Then, we discuss the major concerns with the classic random forest algorithm and introduce the advanced conditional forest as our classifier. After that, we discuss the estimation of the classification weights in Eq. (1), where we propose two partitioned estimators to efficiently estimate the NMB-based weights in a survival setting.
2.2.1 Decision Tree versus Random Forest Classification Methods
Tree-based algorithms are a common choice in decision-making studies (Laber and Zhao, 2015; Shen, Wang, and Taylor, 2017; Tao et al., 2018; Xu et al., 2020). A decision tree (Breiman et al., 1984) is built using a greedy algorithm in which the best split point at each node is chosen by searching through all available features to minimize the misclassification error. Its hierarchical structure also naturally accounts for interaction terms between predictors (Elith, Leathwick, and Hastie, 2008). Since a tree algorithm puts more emphasis on subjects with larger treatment effects on NMB, individuals with small incremental NMBs are more likely to be misclassified when the tree size is small; yet fully grown trees may suffer from overfitting. Pruning is a machine learning technique that balances the complexity and the predictive accuracy of a tree. In our implementation using the R package rpart, we prune off any split that does not reduce the overall lack of fit by a certain amount, which is quantified using the complexity parameter (cp). We choose the cp that produces the smallest 10-fold cross-validation error. Decision trees are favored for properties such as interpretability, transparency, and straightforward implementation. Their major drawbacks are overfitting and instability (Li and Belford, 2002), which may result in estimators with low bias but high variance (“weak learners”).
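The cp tuning described above has a close analogue in Python's scikit-learn, where the cost-complexity penalty `ccp_alpha` plays the role of rpart's cp and can likewise be selected by 10-fold cross-validation; the data below are synthetic, and the whole block is an analogy rather than our actual rpart implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic data standing in for a real study sample.
X, y = make_classification(n_samples=300, n_features=5, random_state=1)

# Candidate pruning strengths from the fully grown tree (the last alpha
# collapses the tree to its root, so it is dropped; tiny negative values
# from floating point are clipped to zero).
path = DecisionTreeClassifier(random_state=1).cost_complexity_pruning_path(X, y)
alphas = np.unique(np.clip(path.ccp_alphas[:-1], 0.0, None))

# Pick the alpha minimizing 10-fold cross-validated misclassification,
# mirroring the cp selection done with rpart in the text.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=1),
    param_grid={"ccp_alpha": alphas},
    cv=10,
)
search.fit(X, y)
best_alpha = search.best_params_["ccp_alpha"]
```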
A random forest (Breiman, 2001), as an ensemble approach, is a collection of independently and identically built single-tree models, in which each tree casts a vote for the predicted class, and the class with the majority of the votes is the prediction of the forest. Compared to decision trees, random forests enhance prediction performance by reducing variance in two ways. First, bagging is a machine learning ensemble meta-algorithm designed to improve stability and accuracy by averaging many weak learners, such as decision trees. Second, random feature selection, on top of bagging, reduces variance by weakening the correlations between trees: unlike decision trees, random forests pick the best splitting variable among $m$ randomly sampled variables instead of searching through all $p$ covariates ($m < p$). These two techniques allow random forests to attenuate the overfitting issue and provide more generalizable models. Thus, forest-based algorithms have been widely used in decision-making research, such as identifying subjects with high disease risks in medicine, making advertising plans, and predicting stock market prices in marketing (Kim and Loh, 2001).

In CE-ITR estimation, it is crucial to identify the key tailoring variables from a potentially large pool of clinical characteristics and patient demographics. As the interpretation of tree models highly depends on the selected variables, we avoid the biased feature selection issue by applying the conditional forest approach (Hothorn, Hothorn, and Zeileis, 2006)
. To deal with various types of covariates, the conditional forest conducts variable selection and variable splitting in two separate steps. In the first step, a global null hypothesis test is executed to evaluate the independence between the candidate variables and the outcome and to identify the variable with the strongest association with the outcome. For hypothesis testing on a particular variable $Z_j$, the conditional forest employs a permutation test framework and constructs a linear test statistic using the mean and covariance conditional on all the other covariates and all possible permutations of $Z_j$. In the second step, the optimal split is chosen for the selected variable to maximize a two-sample linear statistic measuring the discrepancy between the two daughter nodes. Thus, the conditional forest conducts a conditional permutation scheme (Strobl and Zeileis, 2008) and provides unbiased variable selection even when predictors are of multiple types and correlated.

The permutation-based importance of $Z_j$ is typically computed as the decrease in prediction accuracy after shuffling the variable $Z_j$. To reflect the true importance of each variable, the conditional forest computes a conditional permutation-based importance in which the values of $Z_j$ are shuffled within groups of correlated covariates $Z^{(-j)}$ (Strobl et al., 2008). This is equivalent to conditioning on $Z^{(-j)}$, which resolves the confounding issue in the standard random forest. In practice, one may determine the correlated covariate set based on empirical association tests or experts’ domain knowledge. As $Z^{(-j)}$ may contain variables of different types, the conditional forest defines a grid using all the binary cutoff points used for splitting in the current tree and permutes $Z_j$ within this grid. Note that the conditional grid varies across trees, as each tree is built using different subsamples and randomly selected covariates. Moreover, the grid complexity depends on prespecified control parameters such as the depth of the tree and the number of trees. The final conditional importance is the aggregated mean over all trees.
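For intuition about permutation-based importance, here is a minimal Python sketch of the unconditional baseline (data and variable names are ours); the conditional variant described above would instead permute within strata defined by correlated covariates, which this sketch does not do.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Importance of feature j = drop in held-out accuracy after shuffling
# column j. Only feature 0 carries signal in this toy example.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = (X[:, 0] > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
base = rf.score(X_te, y_te)

def perm_importance(j, n_rep=10):
    drops = []
    for _ in range(n_rep):
        Xp = X_te.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])  # shuffle feature j only
        drops.append(base - rf.score(Xp, y_te))
    return float(np.mean(drops))

imp = [perm_importance(j) for j in range(3)]  # feature 0 should dominate
```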
In this paper, we implement the conditional forest classification using the R package party. We pre-prune each tree by setting the maximum depth to five and pick the number of candidate features at each node through 10-fold cross-validation. We further illustrate the performance of the conditional forest under our proposal in the SPRINT example, where various types of predictors are used. Even though the conditional forest has shown superior performance to a decision tree, it has limitations of its own, including higher model complexity, lower interpretability, and longer computation time. Hence, we build each forest with 50 trees in the simulation study and with 500 trees in the real data analysis, where more covariates are used.
Although our proposal is implemented via a conditional random forest, many more sophisticated statistical learning methods, such as convolutional neural networks and gradient boosting, can also be applied to perform the weighted classification and serve as the underlying algorithm for our proposed method. A neural network models a binary outcome as a function of linear combinations of input variables, with the cross-entropy loss minimized via backpropagation. Gradient boosting machines and Bayesian additive regression trees model outcomes by iteratively fitting the residuals at each step. Although these more complicated machine learning algorithms presumably perform well even in extreme cases, such as large and complex data, our proposal using the conditional forest presents a compromise between model interpretability and complexity.
2.3 Efficient Estimation of Classification Weights Using Partitioned Estimators
2.3.1 Review of the Existing Estimators
In Eq. (1), both the class label and the classification weight are defined in terms of $\Delta B$, which is a counterfactual quantity that needs to be estimated from the observed data. A natural idea for estimating the weights is to apply regression models for the survival time and cost outcomes separately, in which the treatment effect heterogeneity is modeled using treatment-by-covariate interaction terms. This weight proposal is referred to as the Reg-based estimator (Xu et al., 2020): the treatment effect on the restricted survival time is estimated as the contrast of restricted mean survival times implied by the fitted survival curves, where the survival function is estimated using a parametric survival model, and the incremental cost is estimated as the contrast of fitted means from a generalized linear model with a gamma distribution. Finally, the estimated weight is $\widehat{\Delta B} = \lambda \widehat{\Delta Y} - \widehat{\Delta M}$. Note that $\widehat{\Delta B}$ is the estimated individual-level treatment effect on NMB, so one may simply determine the optimal regime based on the sign of the weight, i.e., $\hat d(Z) = I(\widehat{\Delta B} > 0)$ (Xu et al., 2020). Reg-naive is a model-based approach since the estimation of the optimal ITR only involves inverting regression estimates without a classification step.

Xu et al. (2020) also proposed a non-partitioned AIPW estimator for the NMB weights, which we refer to as AIPW-NP. The AIPW-NP estimator employs inverse probability of treatment weighting (IPTW) and inverse probability of censoring weighting (IPCW) to account for confounding and right censoring, respectively.
In the AIPW-NP estimator, the estimated propensity scores (PSs) enter the weighting terms for the two different outcomes; the event indicator $\delta$ and the censoring weight $\hat K(\cdot)$ handle right censoring; and regression estimates of survival time and cost form the augmentation terms. Due to right censoring, the AIPW-NP estimator uses the uncensored subjects to represent censored ones, which is appropriate when only total cost data are available. However, many databases, such as Electronic Health Records and the Medical Expenditure Panel Survey, also record cost history data (e.g., monthly costs), which can be used to improve the estimation of the classification weights. In the following sections, we propose two partitioned estimators for the NMB-based weights that incorporate cost history information from censored subjects.
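The way these weighting pieces combine can be sketched in Python; everything below (a single propensity model, a constant censoring weight, toy NMB values) is a deliberate simplification of the AIPW-NP construction, intended only to show the mechanics.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy confounded data: treatment probability depends on a covariate.
rng = np.random.default_rng(1)
n = 400
Z = rng.normal(size=(n, 2))
A = rng.binomial(1, 1 / (1 + np.exp(-Z[:, 0])))

# Estimated propensity score pi(Z) = P(A = 1 | Z).
ps = LogisticRegression().fit(Z, A).predict_proba(Z)[:, 1]

delta = rng.binomial(1, 0.8, size=n)   # event indicator (toy)
K_hat = np.full(n, 0.8)                # constant uncensoring probability (toy)
B = rng.normal(50_000, 5_000, size=n)  # observed NMB (toy, unrelated to A)

# Signed IPW contribution of each subject to the incremental NMB:
# censoring weight times treatment contrast times outcome.
ipw_term = (delta / K_hat) * (A / ps - (1 - A) / (1 - ps)) * B
est = ipw_term.mean()
```

In the actual estimator the censoring weight would come from a fitted censoring model and augmentation terms from outcome regressions would be subtracted; this sketch keeps only the core reweighting.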
2.3.2 The Partitioned IPW Estimator Using Cost History Data
One limitation of the AIPW-NP estimator is information loss: it is incapable of using the incomplete outcome data of censored subjects. Thus, we follow the ideas of Bang and Tsiatis (2000) and propose a partitioned IPW (IPW-P) estimator for the NMB-based classification weights using cost history data.
We partition the entire study period $[0, \tau]$ into $J$ subintervals $[t_{j-1}, t_j)$, $j = 1, \ldots, J$, where $0 = t_0 < t_1 < \cdots < t_J = \tau$. Let $M_{ij}$ be the cost of subject $i$ accrued in the $j$th subinterval, so that $M_i = \sum_{j=1}^{J} M_{ij}$. With these subinterval-specific data, we construct the IPW-P estimator for the incremental cost as the sum of interval-level weights:
$$\widehat{\Delta M}_i = \sum_{j=1}^{J} \frac{\delta_{ij}}{\hat K_j} \left\{ \frac{A_i}{\hat\pi(Z_i)} - \frac{1 - A_i}{1 - \hat\pi(Z_i)} \right\} M_{ij},$$
where $\hat\pi(Z_i)$ and $1 - \hat\pi(Z_i)$ remain the same across subintervals, as the treatment is fixed over time, and $\delta_{ij}$ and $\hat K_j$ are the event indicator and censoring weight for the $j$th interval, respectively. Recall that a non-partitioned IPW estimator can only use the total cost of uncensored subjects and must discard the incomplete cost data of censored subjects, even though the latter are represented by uncensored participants with whom they share similar baseline characteristics. In contrast, our IPW-P approach improves the weight estimation in two ways. 1) Maximizing the utilization of cost data: by partitioning the entire follow-up, IPW-P employs the cost data of censored subjects from the subintervals before they are censored, so it makes use of incomplete data and improves estimation efficiency. As IPW-P gains more information from a longer pre-censoring period, it shows a greater efficiency improvement when subjects are censored later in the follow-up. 2) Use of subinterval-specific censoring weights: the IPW-P estimator allows uncensored or not-yet-censored subjects to represent the censored ones in each interval, which provides a richer pool for finding better representers. For instance, suppose subjects A and B share the most similar traits and are censored in intervals five and eight, respectively, and subject C is uncensored. A non-partitioned estimator can only use subject C to represent subject A for all subintervals, even if subject B is a better match. The IPW-P estimator, in contrast, can use subject B to represent subject A from subintervals five to seven and then identify the next closest match, an uncensored or not-yet-censored subject, for the subsequent intervals. This flexible reweighting scheme allows censored individuals to be accounted for by the most similar set of subjects available over time, which improves the matching quality and results in lower bias compared to situations where only one uncensored subject can be used for representation. Note that we describe this bias reduction under a specific setting in which the matching or representing process may be imperfect.
With sufficiently small subintervals, every censored subject may contribute one or more pre-censoring periods, which guarantees a meaningful classification weight for cost for every subject in the study sample. To be consistent, we construct an IPW-R estimator for the restricted survival time in which the weights of censored subjects are estimated using regression.
Thus, we estimate the NMB-based classification weight as $\widehat{\Delta B}_i = \lambda \widehat{\Delta Y}_i - \widehat{\Delta M}_i$. With a slight abuse of notation, we consider the IPW-P weight of subject $i$ as an approximation of $\Delta B_i$, since it asymptotically estimates the treatment effect on NMB (Tian et al., 2014).
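A schematic of the partitioned reweighting for the cost component follows; the numbers are toys, and the constant interval weights stand in for the propensity and Kaplan-Meier-type censoring weights described above.

```python
import numpy as np

# Split follow-up into K subintervals; censored subjects contribute the
# interval costs they accrued before censoring instead of being dropped.
rng = np.random.default_rng(2)
n, K = 6, 4
interval_cost = rng.gamma(shape=2.0, scale=500.0, size=(n, K))
censor_interval = np.array([4, 2, 4, 3, 4, 1])  # first unobserved interval

# delta_ij = 1 while subject i is still observed in interval j.
observed = np.arange(K)[None, :] < censor_interval[:, None]
K_hat = np.full((n, K), 0.9)  # toy interval-level censoring weights

# Partitioned IPW-style contribution: observed intervals are upweighted,
# unobserved intervals contribute nothing.
contrib = np.where(observed, interval_cost / K_hat, 0.0)
partitioned_cost = contrib.sum(axis=1)  # per-subject weighted total
```

The last subject, censored after one interval, still contributes its first-interval cost, which is exactly the information a non-partitioned estimator would discard.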
2.3.3 The Partitioned AIPW Estimator Using Cost History Data
A natural extension of the IPW-P estimator is a partitioned AIPW estimator. We follow the ideas of Li et al. (2018) but propose the AIPW-P estimator in a disaggregated form for estimating the individual-level NMB-based weights, in which the IPW-P terms are augmented by interval-specific outcome regressions.
Here, the augmentation term in each subinterval uses the regression estimate of the cost accrued in that subinterval. The AIPW-P estimator achieves a “double information gain”: compared with IPW-P, it incorporates extra information from outcome models that capture the relationship between covariates and outcomes; compared with AIPW-NP, it utilizes most of the cost data from censored subjects as an additional information source.
In principle, both ways of acquiring additional information can help improve estimation performance, though several factors may influence the gain from these modifications. The censoring distribution and censoring rate affect the amount of data that can be obtained from censored subjects. As mentioned earlier, the later subjects are censored during follow-up (e.g., censored at year 10 versus year 1), and the higher the censoring rate (e.g., 50% versus 10%), the more cost data, i.e., the more efficiency, may be gained by using a partitioned weight estimator. Since studies with early censoring (e.g., one-year pre-censoring time) are likely to have high censoring rates (50%), and studies with late censoring (e.g., 10-year pre-censoring time) tend to have low censoring rates (10%), the partitioned estimators gain either a small amount of data from many subjects or a large amount of data from a few patients. Furthermore, high censoring rates may affect the accuracy of the survival time estimator used in the augmentation terms, e.g., due to extrapolation in parametric survival models (Kearns et al., 2020). An alternative is to apply nonparametric survival methods such as random survival forests or Bayesian additive regression trees. In addition, conventional regression has limited ability to model high-dimensional covariates, so one may apply regularized regression, such as the lasso or elastic net, or nonparametric machine learning methods, such as random survival forests or causal survival forests.
To summarize, we propose a two-step procedure: in the first step, we estimate the NMB-based individual weights using the IPW-P or AIPW-P estimator; in the second step, we plug these estimated weights into Eq. (1) and solve it via the statistical learning algorithm of our choice to estimate the optimal CE-ITR.
3 Simulation Study
3.1 Simulation Schemes
In this section, we conduct simulation studies to evaluate the finite-sample performance of our proposed methods. We consider five independent and identically distributed covariates $Z_1, \ldots, Z_5$, a mix of continuous and binary variables. We generate the treatment assignment $A$ using a logistic regression model on the covariates. We simulate the NMB outcome in two parts, the survival time and the cumulative cost, and we consider two WTP values commonly used in CE analyses, $50,000 and $100,000 per year of life (Cameron, Ubels, and Norstrom, 2018).

We generate the potential survival times using an exponential proportional hazards model whose hazard rate depends on main-effect and treatment-covariate interaction terms. Every subject has at least 5 years of follow-up. Three of the covariates are confounders, as they predict both treatment and survival time, and two of them are also effect modifiers. We set the interaction coefficients to smaller and larger values to create small and large heterogeneous treatment effects (HTEs), respectively. We consider a wide range of censoring rates, from complete observations to heavy censoring, by generating the censoring time from an exponential proportional hazards model with different baseline hazards. We fix the restriction time $\tau$ (in years) across scenarios.
We follow a standard health economic simulation setting (Li et al., 2018) and construct the cumulative cost from three components: an initial cost, an ongoing cost accrued every 6 months, and a death-related cost. All costs are generated from gamma distributions with varied scale parameters, and the HTE on cost is made present or absent by varying the treatment-covariate interaction in the cost model. The potential cumulative cost is the sum of the three components. In the observed data, we only count the death-related cost for subjects whose deaths are observed before the end of the study.
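The three-part cost construction can be mimicked as follows; all shape and scale parameters below are illustrative placeholders, not the values used in our simulation.

```python
import numpy as np

# Three cost components: initial cost, recurring 6-month ongoing cost,
# and a death-related cost counted only if death occurs before the end
# of the study horizon tau.
rng = np.random.default_rng(3)
n = 1000
tau = 10.0
survival = rng.exponential(scale=8.0, size=n)

initial = rng.gamma(shape=2.0, scale=5_000.0, size=n)
n_periods = np.ceil(np.minimum(survival, tau) / 0.5).astype(int)  # 6-month cycles
ongoing = np.array([rng.gamma(2.0, 500.0, size=k).sum() for k in n_periods])
death_cost = rng.gamma(2.0, 10_000.0, size=n) * (survival <= tau)  # observed deaths only

total_cost = initial + ongoing + death_cost
```

The gamma draws reproduce the right-skewed cost distributions discussed in the introduction, which is what motivates weight estimators robust to skewness.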
3.2 Simulation Scenarios
We design 32 different simulation scenarios by varying four parameters: 1) the presence of treatment effect modification on both cost and survival time versus on survival time only, denoted EM-TM and EM-T; 2) the magnitude of heterogeneity in treatment effects on the hazard ratio scale for the restricted survival time, referred to as small HTE and large HTE, as defined above; 3) two WTP thresholds, $50K and $100K per year of life; and 4) four censoring rates: 0%, 20%, 50%, and 70%. Under each scenario, we simulate 500 data sets, each with 1000 samples. The entire simulation is conducted using parallel processing.
We apply seven direct methods, named DT-AIPW-NP, DT-IPW-P, DT-AIPW-P, CRF-AIPW-NP, CRF-IPW-P, CRF-AIPW-P, and Reg-naive, to estimate the optimal CE-ITR, of which CRF-IPW-P and CRF-AIPW-P are our proposed methods. For each name, the first segment indicates the classifier choice, a weighted decision tree (DT) or conditional random forest (CRF), and the remaining segments specify the type of classification weight estimator introduced in Section 2.3. To assess the impact of model misspecification on method performance, we correctly specify the treatment and censoring models but misspecify the survival time and cost outcome models by omitting a treatment-covariate interaction term.
We evaluate the performance of all seven methods in each of the 16 scenarios at a given WTP threshold using two metrics. First, we compare the classification accuracy of an estimated optimal ITR $\hat d$, where accuracy is defined as the proportion of subjects classified into the correct treatment group according to the true optimal ITR $d^{opt}$. Second, we compute the mean NMB under each estimated optimal ITR, $E[B(\hat d)]$, and compare it to the mean NMB under the true optimal regime, $E[B(d^{opt})]$.
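These two metrics can be sketched directly; the rules and potential NMBs below are synthetic, with the estimated rule constructed to disagree with the truth for exactly 10% of subjects.

```python
import numpy as np

# Metric 1: accuracy of the estimated rule against the true optimal rule.
rng = np.random.default_rng(4)
n = 500
true_rule = rng.binomial(1, 0.5, size=n)
est_rule = true_rule.copy()
flip = rng.choice(n, size=50, replace=False)
est_rule[flip] = 1 - est_rule[flip]            # flip 10% of assignments

accuracy = (est_rule == true_rule).mean()

# Metric 2: mean NMB realized when everyone follows the estimated rule,
# computed from potential outcomes (available in simulation only).
B1 = rng.normal(100.0, 1.0, size=n)            # potential NMB under a = 1
B0 = rng.normal(90.0, 1.0, size=n)             # potential NMB under a = 0
value = np.where(est_rule == 1, B1, B0).mean()  # estimate of E[B(d_hat)]
```

In simulation the same computation with `true_rule` in place of `est_rule` gives the benchmark value under the true optimal regime.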
3.3 Simulation Results
Table 1 shows the average classification accuracy under the 16 scenarios when the WTP is $50K. The standard error (SE) represents the variability of the estimates across the 500 simulated data sets. Since both the survival time and cost outcome models are misspecified, the Reg-naive method yielded the least accurate results among all methods. Given a fixed classifier, methods using partitioned AIPW-based NMB weights achieved higher accuracy than those using non-partitioned weights across all four levels of censoring. The partitioned IPW-based weights outperformed the non-partitioned estimators when the censoring was low (0% and 20%) and underperformed them when it was high (50% and 70%). On the other hand, with a fixed weight estimator, methods applying a conditional forest classifier achieved higher accuracy than those using a decision tree. Thus, the CRF-AIPW-P method, using both partitioned weights and the conditional random forest classifier, yielded the most accurate results among all methods and across all scenarios.

Reg-naive and the methods using non-partitioned weights (DT-AIPW-NP and CRF-AIPW-NP) showed lower correct classification rates (CCRs) at low censoring rates (0% and 20%) than at high censoring rates (50% and 70%). One possible factor is the misspecified outcome models used in the weight estimation step. In contrast, the IPW-based approaches (DT-IPW-P and CRF-IPW-P), which do not use outcome models, showed a decreasing trend in CCR as the censoring rate increased: with a high censoring rate, the pre-censoring time of censored subjects was reduced, so the IPW-P estimator had fewer subintervals from which to gain cost information. The AIPW-P-based methods (DT-AIPW-P and CRF-AIPW-P) showed consistently accurate and efficient performance across the different levels of censoring since they have two ways to gain information and thus remain robust. Methods using partitioned estimators showed improved performance when the HTE on survival time increased, which amplifies the heterogeneity in incremental NMB and yields stronger signals for the classifiers to detect. The results under scenarios EM-TM and EM-T have similar patterns. Results for a WTP of $100K are provided in Tables 4 and 5 of the Supplemental Materials.
EM  CR  HTE  Reg-naive  DT-AIPW-NP  DT-IPW-P  DT-AIPW-P  CRF-AIPW-NP  CRF-IPW-P  CRF-AIPW-P

TM  0%  S  51.9 (3.5)  75.6 (3.6)  85.3 (1.4)  86.6 (1.2)  78.8 (3.3)  86.3 (1.2)  87.2 (1.2) 
L  45.8 (3.2)  71.3 (4.3)  86.9 (1.4)  88.1 (1.1)  73.4 (5.0)  87.7 (1.2)  88.4 (1.1)  
20%  S  53.8 (3.3)  76.6 (3.5)  85.1 (1.4)  86.4 (1.3)  80.3 (3.3)  86.3 (1.2)  87.0 (1.2)  
L  47.3 (3.2)  73.4 (3.9)  86.9 (1.4)  87.9 (1.2)  76.3 (4.8)  87.7 (1.2)  88.3 (1.1)  
50%  S  72.4 (1.9)  76.2 (3.7)  60.7 (5.7)  86.3 (1.4)  80.6 (3.3)  55.9 (7.7)  86.8 (1.1)  
L  72.0 (2.1)  77.0 (4.0)  70.2 (5.2)  87.8 (1.3)  81.0 (3.3)  73.9 (5.8)  88.0 (1.2)  
70%  S  73.0 (1.4)  71.4 (3.9)  65.5 (3.4)  82.9 (1.8)  74.5 (4.4)  68.6 (3.1)  83.2 (1.6)  
L  74.2 (1.4)  77.7 (3.3)  63.4 (3.7)  85.6 (1.6)  81.5 (2.9)  66.4 (3.4)  85.5 (1.5)  
T  0%  S  55.0 (3.8)  77.2 (4.3)  89.1 (1.3)  90.0 (1.1)  81.9 (3.9)  90.2 (1.0)  90.6 (1.0) 
L  46.5 (3.5)  71.1 (4.9)  90.5 (1.2)  91.4 (1.0)  75.6 (5.6)  91.6 (1.0)  91.8 (1.0)  
20%  S  57.6 (3.6)  78.8 (4.1)  89.0 (1.3)  90.0 (1.2)  83.1 (3.7)  90.2 (1.1)  90.5 (1.1)  
L  48.8 (3.5)  73.9 (4.6)  90.5 (1.2)  91.4 (1.0)  78.6 (4.7)  91.6 (1.0)  91.8 (1.0)  
50%  S  76.6 (1.9)  80.2 (4.1)  64.9 (5.8)  89.8 (1.2)  85.2 (3.1)  66.9 (7.3)  90.2 (1.0)  
L  76.1 (2.1)  80.8 (4.5)  76.1 (5.0)  91.2 (1.1)  85.5 (3.3)  82.5 (4.2)  91.6 (1.0)  
70%  S  76.3 (1.3)  74.9 (4.4)  69.4 (3.1)  86.7 (1.6)  78.6 (4.4)  73.2 (2.8)  86.9 (1.5)  
L  77.5 (1.3)  81.4 (3.4)  67.9 (3.4)  89.0 (1.6)  85.3 (3.1)  71.1 (3.3)  89.0 (1.4) 
Table 2 shows that more accurate ITRs resulted in less biased mean NMBs, as expected. Figures 1 and 2 show that both the partitioned estimators and the conditional random forest classifier help improve estimation of the decision boundary (the true decision boundary is labeled "g.opt"). In Figure 1, even though the AIPW-P-based methods did not capture the unclear decision boundary in the upper-right corner, they outperformed all other methods in estimating the clear boundary in the lower-left corner. In addition, the estimated decision boundaries deviate more from the truth as the censoring rate increases. Among all seven methods, CRF-AIPW-P yielded the decision boundaries most similar to the truth. Decision boundary graphs for the other scenarios are provided in the Supplemental Materials. The R implementation of our proposed method is computationally efficient: a typical analysis estimating the optimal CE-ITR from a cohort of 1000 subjects took approximately 2.7 seconds on average on an Intel® Xeon® E5-2667 processor.
EM  CR  HTE  True optimal  Reg-naive  DT-AIPW-NP  DT-IPW-P  DT-AIPW-P  CRF-AIPW-NP  CRF-IPW-P  CRF-AIPW-P

TM  0%  S  17.1 (0.4)  9.7 (0.7)  14.4 (0.7)  16.0 (0.4)  16.2 (0.4)  14.8 (0.7)  16.1 (0.4)  16.3 (0.4) 
L  18.1 (0.4)  9.1 (0.7)  14.2 (0.9)  17.0 (0.4)  17.2 (0.4)  14.4 (1.0)  17.1 (0.4)  17.3 (0.4)  
20%  S  17.1 (0.4)  9.9 (0.6)  14.5 (0.7)  16.0 (0.4)  16.2 (0.4)  15.1 (0.7)  16.2 (0.4)  16.3 (0.4)  
L  18.1 (0.4)  9.3 (0.7)  14.6 (0.8)  17.0 (0.5)  17.2 (0.4)  15.0 (1.0)  17.1 (0.5)  17.2 (0.5)  
50%  S  17.1 (0.4)  13.3 (0.5)  14.4 (0.8)  11.5 (1.1)  16.2 (0.5)  15.1 (0.7)  10.5 (1.4)  16.3 (0.4)  
L  18.0 (0.4)  13.8 (0.6)  15.2 (0.9)  13.8 (1.1)  17.2 (0.4)  15.9 (0.7)  14.4 (1.2)  17.2 (0.4)  
70%  S  17.1 (0.4)  13.2 (0.4)  13.4 (0.8)  11.7 (0.9)  15.5 (0.5)  14.1 (0.9)  12.3 (0.8)  15.5 (0.5)  
L  18.0 (0.4)  14.0 (0.4)  15.2 (0.8)  12.0 (0.9)  16.6 (0.5)  16.0 (0.7)  12.7 (0.8)  16.6 (0.5)  
T  0%  S  19.4 (0.4)  11.4 (0.8)  16.1 (0.9)  18.6 (0.4)  18.8 (0.4)  17.0 (0.9)  18.7 (0.4)  18.8 (0.4) 
L  20.3 (0.3)  10.1 (0.8)  15.4 (1.1)  19.6 (0.4)  19.7 (0.4)  16.4 (1.3)  19.7 (0.4)  19.8 (0.4)  
20%  S  19.4 (0.3)  11.9 (0.8)  16.4 (0.9)  18.5 (0.4)  18.7 (0.4)  17.3 (0.8)  18.7 (0.4)  18.8 (0.4)  
L  20.3 (0.4)  10.6 (0.8)  16.1 (1.1)  19.6 (0.4)  19.7 (0.4)  17.0 (1.1)  19.7 (0.3)  19.8 (0.4)  
50%  S  19.4 (0.4)  16.0 (0.5)  16.7 (0.9)  13.6 (1.2)  18.7 (0.4)  17.7 (0.7)  14.0 (1.5)  18.8 (0.4)  
L  20.3 (0.4)  16.5 (0.6)  17.5 (1.0)  16.5 (1.1)  19.7 (0.4)  18.4 (0.8)  17.8 (0.9)  19.8 (0.4)  
70%  S  19.4 (0.4)  15.7 (0.4)  15.8 (1.0)  14.2 (0.8)  18.1 (0.5)  16.7 (0.9)  15.1 (0.7)  18.1 (0.4)  
L  20.3 (0.4)  16.5 (0.4)  17.7 (0.8)  14.6 (0.9)  19.2 (0.5)  18.6 (0.7)  15.4 (0.8)  19.2 (0.4) 
4 Application
High blood pressure increases the risk of cardiovascular disease (CVD) events, which are leading causes of death worldwide CDC (2018). About half of U.S. adults have hypertension, which accounts for nearly $131 billion in healthcare expenditures per year Kirkland et al. (2018). The Systolic Blood Pressure Intervention Trial (SPRINT) found that intensive treatment targeting a systolic blood pressure (SBP) of 120 mm Hg reduced CVD events and all-cause mortality compared with standard treatment targeting an SBP of 140 mm Hg in patients at high CVD risk The SPRINT Research Group (2015). However, these findings may be insufficient to inform healthcare policy making because they lack economic evidence on the intervention. Bress et al. (2017) projected lifetime costs and effects of the treatment in SPRINT using a microsimulation model and conducted a CE analysis showing that the intensive treatment is cost-effective at the population level. Our analysis extends these findings by studying the CE of the intensive treatment at the patient level. We adopted the same simulation model as Bress et al. (2017) and applied our proposed method to estimate the most cost-effective treatment rule tailored to subjects' characteristics. The model generated 10,000 patients by randomly sampling the SPRINT participants and projected 15-year overall survival and total cost under the assumption that adherence and treatment effects decline after 5 years. Our NMB outcome is a composite of restricted survival time and the total cost of hypertension treatment, CVD event hospitalizations, chronic CVD treatment, serious adverse events, and background healthcare. A 3% annual discount rate was applied to costs. The analysis was repeated for the two WTP thresholds recommended by the American Heart Association, $50K and $100K per life-year Anderson et al. (2014).
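A hedged sketch of how an NMB outcome of this kind might be assembled from restricted survival time and annually discounted costs; the function names and the year-0-undiscounted convention are our assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def discounted_cost(annual_costs, rate=0.03):
    """Total cost with yearly discounting; year 0 is taken undiscounted."""
    c = np.asarray(annual_costs, dtype=float)
    years = np.arange(c.size)
    return float(np.sum(c / (1.0 + rate) ** years))

def net_monetary_benefit(restricted_survival_years, annual_costs, wtp, rate=0.03):
    """NMB = WTP x restricted survival time - discounted total cost."""
    return wtp * restricted_survival_years - discounted_cost(annual_costs, rate)

# Hypothetical subject: 10 restricted life-years, $2,000/year in costs,
# valued at a WTP of $50K per life-year.
nmb_example = net_monetary_benefit(10, [2000] * 10, 50_000)
```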
We identified the optimal CE-ITR using the top-performing CRF-AIPW-P method indicated by our simulation study. Since the only source of censoring in our data is administrative censoring, all subjects had complete outcome data up to 15 years, so we used a censoring weight of one in the analysis. We used the same parametric models as in the simulation study to estimate the counterfactual restricted survival times and costs. For each outcome model, we specified both the main effect terms and the effect modification terms using the same 17 baseline covariates, which were predetermined based on background knowledge. These estimated outcomes were then used in the AIPW-P estimator. We conducted 10-fold cross-validation to obtain out-of-sample estimates of the optimal rules and the mean NMBs. In each of the 10 iterations, we used 9/10 of the data to estimate the classification weights and train a conditional forest, and used the remaining 1/10 as the test data for making predictions. To capture the uncertainty of our estimates, we computed 95% confidence intervals from 1000 bootstrap samples Efron and Tibshirani (1986), in which the lower and upper bounds are the 2.5th and 97.5th percentiles of the bootstrap sampling distribution.

Table 3 shows that the average treatment effects are positive at both WTP thresholds, which is the evidence on which a conventional CE analysis would recommend the intensive treatment for the entire study cohort Bress et al. (2017). However, accounting for individual heterogeneity, our method indicates that the intensive treatment is cost-effective for 69% of patients if they are willing to pay up to $50K per life-year gained, and this proportion increases to 74% at a threshold of $100K per life-year. Moreover, our evaluation of the mean NMBs under each treatment rule shows that the personalized regime yielded larger mean NMBs than the uniform rules at both WTP thresholds. This informs healthcare decision makers that individualized treatment rules should be used to maximize the utility of clinical resources for managing hypertension. Our estimates of the average treatment effects reflect the intervention adherence rate in the data and the assumption that the treatment effect fades after the first 5 years of follow-up.
Outcome  WTP  All patients assigned to intensive SBP control  All patients assigned to standard SBP control  AIPW-based CE-ITR (95% CI)
Proportion of Treated  50K  1  0  69% (49%, 84%)
Proportion of Treated  100K  1  0  74% (54%, 87%)
Mean NMB Outcome  50K  437212  429848  438166 (4339245, 442638)
Mean NMB Outcome  100K  1059798  1037198  1060619 (1053006, 1068140)
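The percentile bootstrap used for the confidence intervals above can be sketched as follows; the per-subject outcome values are simulated stand-ins, not SPRINT data:

```python
import numpy as np

rng = np.random.default_rng(0)

def percentile_ci(estimates, alpha=0.05):
    """Lower/upper bounds: 2.5th and 97.5th percentiles of the bootstrap distribution."""
    lo, hi = np.percentile(estimates, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lo), float(hi)

# Stand-in per-subject NMB outcomes under a hypothetical estimated rule.
nmb = rng.normal(loc=438_000, scale=2_500, size=1_000)

# 1000 bootstrap resamples of the mean NMB, as in the analysis above.
boot_means = [rng.choice(nmb, size=nmb.size, replace=True).mean()
              for _ in range(1_000)]
lo, hi = percentile_ci(boot_means)
```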
To further assess the capability of our method in handling censoring in a real-world setting, we followed the suggestion of one of the reviewers and added artificial censoring to the SPRINT data. We generated the censoring times from a discrete uniform distribution, which results in a 22% event rate. Under this setup, we applied four methods: CRF-AIPW-P, CRF-AIPW-NP, DT-AIPW-P, and DT-AIPW-NP. The results in Table 6 align with those in Table 3 and show that CE-ITRs yielded higher mean NMBs than the uniform regimes (treat all or treat none). In addition, using partitioned NMB weights helps achieve a more efficient treatment strategy, that is, treating fewer patients while attaining higher mean NMBs. Furthermore, conditional random forests produce better treatment rules than decision trees, especially when partitioned weights are not available, e.g., when there is no cost history data. Among the four estimators, CRF-AIPW-P showed the best performance for both WTP thresholds. Figure 9 displays the variable importance measure of each predictor. Both CRF-AIPW-P (top right) and DT-AIPW-P (bottom right) indicate that baseline statin use and baseline smoking status are the two most important variables for estimating the CE-ITR at both WTP thresholds.

5 Discussion
Interest in evaluating healthcare resource use has grown over the past decade. Many clinical trials that are primarily designed to study the clinical benefit of interventions now also value the collection of economic data. Some researchers gather cost data by linking patients' clinical information to claims data, such as all-payer claims databases; others derive costs from national guidelines, such as the National Average Drug Acquisition Cost (NADAC). To make use of such increasingly available and complex data and gain insight into the CE of a given intervention, we proposed a weighted classification approach to identify the optimal CE-ITR from a data-driven perspective. We adopted an advanced random forest implementation in which the forests are constructed from unbiased conditional inference trees. Furthermore, we proposed two partitioned estimators for the NMB-based classification weights, in which the incomplete cost data of censored subjects are utilized to improve estimation performance. To advocate for personalized treatment plans, our methods estimate the optimal regime as a function of individual-level characteristics in settings where there may be heterogeneity in treatment effects on both effectiveness and cost. We developed the methods in an observational study setting so that real-world challenges such as right censoring and confounding are addressed. Direct application of our proposed methods to randomized controlled trials is also warranted by simply replacing the inverse probability of treatment weights with the randomization probabilities Cui, Zhu, and Kosorok (2017); Chen et al. (2018).
One limitation of our proposed methods is that they may be sensitive to how the follow-up period is partitioned into subintervals. Oversized subintervals may lose some cost data from censored subjects due to rounding of time in the last subinterval, while very small intervals may include too few events and induce extreme censoring weights, especially in studies with low event rates. A large number of subintervals also increases the computational burden. When applying our methods, one should be aware of two practical challenges in CE analysis. First, CE analyses require a set of input parameters, such as unit costs or the discount rate, whose varied specifications may lead to different decision rules; sensitivity analyses should therefore be conducted to assess the uncertainty in these parameters Ramsey et al. (2015). Other input parameters, such as event rates or treatment benefits, are usually estimated from clinical trials or observational studies, which may be subject to selection and confounding bias, so it is also essential to review and assess the quality of the original analyses. Second, owing to the limited follow-up in study samples, within-trial parameter estimates are often projected to a lifetime horizon under certain assumptions. For example, one may assume that a treatment effect estimated from a 3-year study persists over a 100-year projection, which may be implausible and yield unreasonable results. High-quality CE analysis should be conducted and repeated with multiple justifiable input parameters reflecting the most realistic scenarios, and the uncertainty associated with the projection should be evaluated.
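The rounding loss from oversized subintervals can be illustrated with a small sketch; the follow-up length, censoring time, and interval counts below are hypothetical:

```python
def intervals_fully_observed(censor_time, tau, n_intervals):
    """Number of equal-width subintervals of [0, tau] fully observed
    before a subject's censoring time."""
    width = tau / n_intervals
    return int(censor_time // width)

tau = 15.0         # years of follow-up
censor_time = 7.8  # hypothetical subject censored at 7.8 years

# Coarse partition (three 5-year intervals): only the 0-5 interval is fully
# observed, so the cost history from years 5-7.8 is dropped.
coarse = intervals_fully_observed(censor_time, tau, 3)

# Fine partition (fifteen 1-year intervals): seven intervals are fully
# observed, retaining most of the subject's cost history.
fine = intervals_fully_observed(censor_time, tau, 15)
```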
We followed common practice in CE analysis alongside trials and adopted a clinical endpoint as the effectiveness outcome, whereas other CE analyses concerned with both the quantity and quality of patients' lives may use the quality-adjusted life-year (QALY) as an alternative. A QALY is measured as the area under the curve of a quality-of-life function defined by time and health state utilities. One challenge in estimating QALYs, pointed out by Gelber et al. (1989), is the induced informative censoring. Since our methods already account for the same issue for costs, they should extend readily to settings where NMBs are defined using QALYs. Moreover, in clinical practice, it has become increasingly attractive to extend ITR studies to identify the optimal dynamic treatment regime (DTR), which allows the treatment to change according to a patient's evolving disease status. Several methods have been developed to address various challenges in this extension when only a single health outcome is of interest Hernan et al. (2006); Shen, Wang, and Taylor (2017). Recently, classification-based direct estimation methods have also demonstrated their value for estimating optimal DTRs Liu et al. (2018). All of this pioneering work provides a solid foundation for our future work on extending the current proposal to estimate the most cost-effective DTR. At each stage, the optimal treatment decision would be identified based on patient-specific knowledge of historical treatment benefits and costs, so the most cost-effective DTR is determined by adjusting the current intervention plan to information updated over time. The development of DTRs is particularly beneficial to patients with chronic diseases Lavori and Dawson (2008), as long-term medication intake may increase the risk of drug toxicity and resistance. In addition, extending DTRs with a CE perspective helps control the potentially high medical expenses induced by long-term care and optimize resource allocation for chronic diseases.
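The QALY calculation mentioned above, the area under a quality-of-life curve, can be sketched as follows; the utility trajectory is hypothetical:

```python
def qaly(times, utilities):
    """QALY: area under the quality-of-life curve (utility vs. time),
    computed with the trapezoidal rule over time-utility pairs."""
    return sum((t1 - t0) * (u0 + u1) / 2.0
               for t0, t1, u0, u1 in zip(times, times[1:],
                                         utilities, utilities[1:]))

# Hypothetical subject: utility 1.0 for 2 years, then 0.6 for 3 more years.
# Duplicating the time point encodes the step: QALY = 2*1.0 + 3*0.6 = 3.8.
times = [0.0, 2.0, 2.0, 5.0]
utils = [1.0, 1.0, 0.6, 0.6]
```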
This work was directly supported by grant R01 HL139837 from the National Heart, Lung, and Blood Institute (NHLBI), Bethesda, MD, and by the Utah Center for Clinical and Translational Science. Dr. Bellows is supported by K01 HL140170 from the NHLBI.
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
References
Wilt et al. (2017) Wilt TJ, Jones KM, Barry MJ, et al. Follow-up of Prostatectomy versus Observation for Early Prostate Cancer. N Engl J Med 2017; 377: 132–142.
Black et al. (2015) Black WC, Keeler EB, Soneji SS, et al. Cost-effectiveness of CT Screening in the National Lung Screening Trial. N Engl J Med 2015; 372: 388.
Phelps (1997) Phelps CE. Good Technologies Gone Bad: How and Why the Cost-effectiveness of a Medical Intervention Changes for Different Populations. Med Decis Making 1997; 17: 107–117.
Murphy (2005a) Murphy SA. An Experimental Design for the Development of Adaptive Treatment Strategies. Stat Med 2005; 24: 1455–1481.
Zhang et al. (2012a) Zhang B, Tsiatis AA, Davidian M, et al. Estimating Optimal Treatment Regimes from a Classification Perspective. Stat 2012; 1: 103–114.
Zhao et al. (2012) Zhao Y, Zeng D, Rush AJ, et al. Estimating Individualized Treatment Rules Using Outcome Weighted Learning. J Am Stat Assoc 2012; 107: 1106–1118.
Cui, Zhu, and Kosorok (2017) Cui Y, Zhu R, and Kosorok M. Tree Based Weighted Learning for Estimating Individualized Treatment Rules with Censored Data. Electron J Stat 2017; 11: 107–117.
Lueza et al. (2016) Lueza B, Rotolo F, Bonastre J, et al. Bias and Precision of Methods for Estimating the Difference in Restricted Mean Survival Time from an Individual Patient Data Meta-analysis. BMC Med Res Methodol 2016; 16: 37.
Malehi, Pourmotahari and Angali (2015) Malehi AS, Pourmotahari F, Angali KA. Statistical Models for the Analysis of Skewed Healthcare Cost Data: A Simulation Study. Health Econ Rev 2015; 5: 11.
Mantopoulos et al. (2016) Mantopoulos T, Mitchell PM, Welton NJ, et al. Choice of Statistical Model for Cost-effectiveness Analysis and Covariate Adjustment: Empirical Application of Prominent Models and Assessment of their Results. Eur J Health Econ 2016; 17: 927–938.
Lin et al. (1997) Lin DY, Feuer EJ, Etzioni R, et al. Estimating Medical Costs from Incomplete Follow-up Data. Biometrics 1997; 53: 419–434.
Abraham (2013) Abraham JM. Using Microsimulation Models to Inform U.S. Health Policy Making. Health Serv Res 2013; 48: 686–695.
Ramsey et al. (2015) Ramsey SD, Willke RJ, Glick H, et al. Cost-effectiveness Analysis Alongside Clinical Trials II: An ISPOR Good Research Practices Task Force Report. Value Health 2015; 18: 161–172.
Harkanen et al. (2013) Harkanen T, Maljanen T, Lindfors O, et al. Confounding and Missing Data in Cost-effectiveness Analysis: Comparing Different Methods. Health Econ Rev 2013; 3: 8.
Xu et al. (2020) Xu Y, Greene TH, Bress AP, et al. Estimating the Optimal Individualized Treatment Rule from a Cost-effectiveness Perspective. Biometrics 2020. doi: 10.1111/biom.13406.
Robins and Greenland (1992) Robins JM and Greenland S. Identifiability and Exchangeability for Direct and Indirect Effects. Epidemiology 1992; 3: 143–155.
Laber and Zhao (2015) Laber EB and Zhao Y. Tree-based Methods for Individualized Treatment Regimes. Biometrika 2015; 102(3): 501–514.
Shen, Wang, and Taylor (2017) Shen J, Wang L, and Taylor JM. Estimation of the Optimal Regime in Treatment of Prostate Cancer Recurrence from Observational Data using Flexible Weighting Models. Biometrics 2017; 73(2): 635–645.

Tao et al. (2018) Tao Y, Wang L, and Almirall D. Tree-based Reinforcement Learning for Estimating Optimal Dynamic Treatment Regimes. Ann Appl Stat 2018; 12(3): 1914–1938.
Breiman et al. (1984) Breiman L, Friedman J, Stone CJ, et al. Classification and Regression Trees. 1st ed. Chapman and Hall/CRC, 1984.
Elith, Leathwick and Hastie (2008) Elith J, Leathwick JR, and Hastie T. A Working Guide to Boosted Regression Trees. J Anim Ecol 2008; 77: 802–813.
Li and Belford (2002) Li RH and Belford GG. Instability of Decision Tree Classification Algorithms. In: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, 2002, pp. 570–575.
Breiman (2001) Breiman L. Random Forests. Mach Learn 2001; 45: 5–32.
Wongvibulsin, Wu and Zeger (2020) Wongvibulsin S, Wu KC, and Zeger SL. Clinical Risk Prediction with Random Forests for Survival, Longitudinal, and Multivariate (RF-SLAM) Data Analysis. BMC Med Res Methodol 2020; 20: 1.
Radcliffe (2007) Radcliffe NJ. Using Control Groups to Target on Predicted Lift: Building and Assessing Uplift Models. Direct Mark Int 2007; 1: 14–21.
Basak et al. (2019) Basak S, Kar S, Saha S, et al. Predicting the Direction of Stock Market Prices Using Tree-based Classifiers. North Am J Econ Finance 2019; 47: 552–567.
Hothorn, Hornik, and Zeileis (2006) Hothorn T, Hornik K, and Zeileis A. Unbiased Recursive Partitioning: A Conditional Inference Framework. J Comput Graph Stat 2006; 15: 651–674.
Strobl and Zeileis (2008) Strobl C and Zeileis A. Danger: High Power! Exploring the Statistical Properties of a Test for Random Forest Variable Importance. In: Proceedings of the 18th International Conference on Computational Statistics, Porto, Portugal, 2008.
Strobl et al. (2008) Strobl C, Boulesteix AL, Zeileis A, et al. Conditional Variable Importance for Random Forests. BMC Bioinformatics 2008; 9: 307.
Bang and Tsiatis (2000) Bang H and Tsiatis AA. Estimating Medical Costs with Censored Data. Biometrika 2000; 87: 329–343.
Tian et al. (2014) Tian L, Alizadeh AA, Gentles AJ, et al. A Simple Method for Estimating Interactions between a Treatment and a Large Number of Covariates. J Am Stat Assoc 2014; 109: 1517–1532.
Li et al. (2018) Li J, Vachani A, Epstein A, et al. A Doubly Robust Approach for Cost-effectiveness Estimation from Observational Data. Stat Methods Med Res 2018; 27: 3126–3138.
Kearns et al. (2020) Kearns B, Stevens J, Ren S, et al. How Uncertain is the Survival Extrapolation? A Study of the Impact of Different Parametric Survival Models on Extrapolated Uncertainty About Hazard Functions, Lifetime Mean Survival and Cost Effectiveness. Pharmacoeconomics 2020; 38: 193–204.
Cameron, Ubels and Norstrom (2018) Cameron D, Ubels J, and Norstrom F. On What Basis are Medical Cost-effectiveness Thresholds Set? Clashing Opinions and an Absence of Data: A Systematic Review. Glob Health Action 2018; 11: 1447828.
CDC (2018) Centers for Disease Control and Prevention, National Center for Health Statistics. Underlying Cause of Death, 1999–2017. CDC WONDER Online Database. Atlanta, GA: Centers for Disease Control and Prevention; 2018. Accessed January 7, 2019.
Kirkland et al. (2018) Kirkland EB, Heincelman M, Bishu KG, et al. Trends in Healthcare Expenditures among US Adults with Hypertension: National Estimates, 2003–2014. J Am Heart Assoc 2018; 7: e008731.
The SPRINT Research Group (2015) The SPRINT Research Group. A Randomized Trial of Intensive versus Standard Blood-pressure Control. N Engl J Med 2015; 373: 2103–2116.
Bress et al. (2017) Bress AP, Bellows BK, King JB, et al. Cost-Effectiveness of Intensive versus Standard Blood-Pressure Control. N Engl J Med 2017; 377: 745–755.
Anderson et al. (2014) Anderson JL, Heidenreich PA, Barnett PG, et al. ACC/AHA Statement on Cost/Value Methodology in Clinical Practice Guidelines and Performance Measures: A Report of the American College of Cardiology/American Heart Association Task Force on Performance Measures and Task Force on Practice Guidelines. Circulation 2014; 129: 2329–2345.
Efron and Tibshirani (1986) Efron B and Tibshirani R. Bootstrap Methods for Standard Errors, Confidence Intervals, and Other Measures of Statistical Accuracy. Stat Sci 1986; 1: 54–75.
Chen et al. (2018) Chen J, Fu H, He X, et al. Estimating Individualized Treatment Rules for Ordinal Treatments. Biometrics 2018; 74: 924–933.
Gelber et al. (1989) Gelber RD, Gelman RS, and Goldhirsch A. A Quality-of-life Oriented Endpoint for Comparing Therapies. Biometrics 1989; 45: 781–795.
Hernan et al. (2006) Hernan MA, Lanoy E, Costagliola D, et al. Comparison of Dynamic Treatment Regimes via Inverse Probability Weighting. Basic Clin Pharmacol Toxicol 2006; 98: 237–242.
Liu et al. (2018) Liu Y, Wang Y, Kosorok MR, et al. Augmented Outcome-Weighted Learning for Estimating Optimal Dynamic Treatment Regimens. Stat Med 2018; 37(26): 3776–3788.
Lavori and Dawson (2008) Lavori PW and Dawson R. Adaptive Treatment Strategies in Chronic Disease. Annu Rev Med 2008; 59: 443–453.