Estimating heterogeneous treatment effects with right-censored data via causal survival forests

01/27/2020 ∙ by Yifan Cui, et al. ∙ University of North Carolina at Chapel Hill ∙ University of Illinois at Urbana-Champaign ∙ University of Pennsylvania ∙ Stanford University

There is a fast-growing literature on estimating heterogeneous treatment effects via random forests in observational studies. However, few approaches are available for right-censored survival data. In clinical trials, right-censored survival data are frequently encountered, and quantifying the causal relationship between a treatment and the survival outcome is of great interest. Random forests provide a robust, nonparametric approach to statistical estimation. In addition, recent developments allow forest-based methods to quantify the uncertainty of the estimated heterogeneous treatment effects. We propose causal survival forests that directly target the treatment effect in an observational study. We establish consistency and asymptotic normality of the proposed estimators and provide an estimator of the asymptotic variance that enables valid confidence intervals for the estimated treatment effect. The performance of our approach is demonstrated via extensive simulations and data from an HIV study.




1 Introduction

Recently, random forests have been considered to estimate heterogeneous treatment effects in observational studies. Several examples in statistical and biomedical settings include Athey et al. (2019), Friedberg et al. (2018), Lu et al. (2018), Künzel et al. (2019) and Oprescu et al. (2019). The advantages of forest and tree-based methods, as described in Breiman (2001) and Athey et al. (2019), are two-fold. First, random forests provide a robust nonparametric approach for estimating heterogeneous treatment effects. Second, random forests can quantify the uncertainty of the estimated heterogeneous treatment effects through recent developments (Mentch and Hooker, 2014; Wager and Athey, 2018).

In clinical trials and other biomedical research studies, right-censored survival data are frequently encountered. The literature on random forests is well developed for survival data, e.g., Leblanc and Crowley (1993); Hothorn et al. (2006b) studied survival tree models in the context of conditional inference trees; Hothorn et al. (2006a) proposed using inverse probability of censoring weighting to compensate for censoring in forest models; Ishwaran et al. (2008) proposed so-called random survival forests, which extend random forests to handle survival data by using the log-rank test at each split of an individual survival tree (Ciampi et al., 1986; Segal, 1988); Zhu and Kosorok (2012) studied the impact of recursive imputation of survival forests on model fitting; Steingrimsson et al. (2016) proposed doubly robust survival trees by constructing doubly robust loss functions that use more information to improve efficiency; Steingrimsson et al. (2019) constructed censoring unbiased regression trees and forests by considering a class of censoring unbiased loss functions. However, none of these methods directly targets heterogeneous treatment effects in observational studies.

Another topic related to estimating heterogeneous treatment effects is optimal treatment regimes. A significant amount of work has been devoted to estimating optimal treatment rules in randomized trials with complete data (Murphy, 2003; Qian and Murphy, 2011; Zhang et al., 2012; Zhao et al., 2012; Laber and Zhao, 2015). Adapting the outcome weighted learning framework (Zhao et al., 2012), Zhao et al. (2015) proposed two new approaches, inverse censoring weighted outcome weighted learning and doubly robust outcome weighted learning, both of which require semi-parametric estimation of the conditional censoring probability given the patient characteristics and treatment choice. Zhu et al. (2017) adopted the accelerated failure time model to estimate an interpretable single-tree treatment decision rule. Cui et al. (2017a) proposed a random forest approach for right-censored outcome weighted learning, which avoids both inverse probability of censoring weighting and restrictive modeling assumptions. However, for observational studies with censored survival outcomes, these methods may suffer from confounding and selection bias. In addition, none of the existing approaches estimates the heterogeneous treatment effect and the associated confidence interval. Hence, it is challenging to provide valid inference for the suggested treatment.

To address these limitations in the existing literature, the proposed random forest approach, namely the causal survival forest, aims at estimating heterogeneous treatment effects from right-censored observational survival data. Compared with existing approaches, the proposed causal survival forest enjoys two advantages. First, our random forest and its associated splitting rules directly target estimation of the treatment effect while adjusting for the bias caused by confounding variables and censoring. Second, we can provide valid inference for the heterogeneous treatment effect, which is much needed in the practical implementation of precision medicine. Our approach is motivated by classical results on semiparametric efficiency theory and survival analysis (Tsiatis, 2007). We construct local estimating equations for the conditional average treatment effect (Athey et al., 2019), which leads to an unbiased splitting rule that addresses the difference between the two potential treatments.

The proposed approach is quite general in the sense that it covers many missing data and causal inference problems. It can deal with coarsening and missing data as long as the functional of interest admits an unbiased estimating equation. We consider this broader class to demonstrate the flexibility of our method for various functionals that are generally of interest. Several functionals, such as the marginal mean of an outcome subject to missingness and the closely related marginal mean of a counterfactual outcome, are within our framework. Thus, the conditional average treatment effect under unconfoundedness can be viewed as a running example to develop the proposed methodology.

2 Causal survival forests

Our goal is to construct a survival forest model (Ishwaran et al., 2008) that can overcome the potential bias caused by observational data and right censoring. Suppose $X \in \mathcal{X}$ is a $d$-dimensional covariate, $A \in \{0, 1\}$ is the treatment label, $T$ is the survival time, and $C$ is the censoring time. As we focus on the observational study setting, throughout the paper, we make the following three assumptions on the failure time for identifiability of the conditional average treatment effect.

Assumption 1.

(Consistency) $T = T(A)$ almost surely.

Assumption 2.

(Unconfoundedness) $\{T(0), T(1)\} \perp A \mid X$.

Assumption 3.

(Positivity) almost surely if , where .

The consistency assumption states that we observe a realization of $T(a)$ only if the treatment $a$ is equal to a subject’s actual treatment assignment $A$. This assumption links the potential outcomes to the observed data because for any failure event, only one of the two potential outcomes $T(0)$ and $T(1)$ is observed. The unconfoundedness assumption states that, conditional on the covariate vector $X$, the treatment $A$ is independent of the potential outcomes. This assumption is satisfied if all prognostic factors used to determine the treatment label are recorded in $X$. Finally, the positivity assumption states that any subject with an observed value of $X$ has a positive probability of receiving both values of the treatment.

2.1 Causal forests without censoring

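Before discussing treatment effect estimation with censored outcomes, we briefly review the causal forest approach to treatment heterogeneity without censoring. Suppose we want to estimate a treatment effect $\tau(x) = E[h(T(1)) - h(T(0)) \mid X = x]$, where $h(\cdot)$ is some deterministic transformation of the survival time $T$. Typical choices of the outcome function $h$ include the thresholded survival time $h(T) = \min(T, t_0)$, whose expectation is the restricted mean survival time, and the indicator $h(T) = 1\{T > t_0\}$, whose expectation is the survivor function at $t_0$.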

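If we knew that the treatment effects were constant, i.e., $\tau(x) = \tau$ for all $x$, then the following estimator due to Robinson (1988) attains $1/\sqrt{n}$ rates of convergence, provided the three assumptions detailed above hold and that we estimate nuisance components sufficiently fast (Robins et al., 2017; Chernozhukov et al., 2018):

$$\sum_{i=1}^{n} \psi^{(c)}_{\hat\tau}(X_i, h(T_i), A_i; \hat e, \hat m) = 0, \qquad \psi^{(c)}_{\tau}(X_i, h(T_i), A_i; \hat e, \hat m) = [A_i - \hat e(X_i)]\,[h(T_i) - \hat m(X_i) - \tau (A_i - \hat e(X_i))], \tag{1}$$

where $e(x) = P(A = 1 \mid X = x)$, $m(x) = E[h(T) \mid X = x]$, and $\hat e$ and $\hat m$ are estimates of these quantities derived via cross-fitting (Schick, 1986). We use the superscript $(c)$ to remind ourselves that this estimator requires access to the complete (uncensored) data.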
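The residual-on-residual idea behind Robinson's estimator can be sketched in a few lines. This is a minimal illustration on simulated complete data, not the authors' implementation: the helper names `fit_linear` and `robinson_estimate`, the linear nuisance models, and the data-generating process are all assumptions made for the example.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with intercept; returns a prediction function."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef

def robinson_estimate(X, A, Y, n_folds=2, seed=0):
    """Residual-on-residual estimate of a constant treatment effect.

    e(x) = E[A | X = x] and m(x) = E[Y | X = x] are fit on the other folds
    (cross-fitting). Linear regression is an illustrative stand-in for any
    sufficiently accurate nuisance learner.
    """
    n = len(Y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    e_hat, m_hat = np.empty(n), np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        e_hat[test] = fit_linear(X[train], A[train])(X[test])
        m_hat[test] = fit_linear(X[train], Y[train])(X[test])
    a_res, y_res = A - e_hat, Y - m_hat
    # Closed-form root of sum_i a_res_i * (y_res_i - tau * a_res_i) = 0.
    return np.sum(a_res * y_res) / np.sum(a_res ** 2)

# Hypothetical uncensored data with a constant effect of 2 and confounding.
rng = np.random.default_rng(1)
n = 4000
X = rng.uniform(size=(n, 3))
propensity = 0.2 + 0.6 * X[:, 0]            # treatment probability depends on X
A = rng.binomial(1, propensity).astype(float)
Y = 1.0 + 3.0 * X[:, 0] + 2.0 * A + rng.normal(size=n)

tau_hat = robinson_estimate(X, A, Y)
naive = Y[A == 1].mean() - Y[A == 0].mean()  # ignores confounding
print(f"Robinson estimate: {tau_hat:.2f}, naive difference in means: {naive:.2f}")
```

In this simulation the naive difference in means is pushed upward by confounding through the first covariate, while the residualized estimate should land near the true value of 2.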

Here, however, our goal is not to estimate a constant treatment effect $\tau$, but rather to fit covariate-dependent treatment heterogeneity $\tau(x)$. For this purpose, we use forests. As background, recall that given a target point $x$, tree-based methods seek to find training examples that are close to $x$ and use the local kernel weights to obtain a weighted averaging estimator. An essential ingredient of tree-based methods is recursive partitioning of the covariate space $\mathcal{X}$, which induces the local weighting. When the splitting variables are adaptively chosen, the width of a leaf can be narrower along the directions where the causal effect is changing faster. After the tree fitting is completed, the closest points to $x$ are those that fall into the same terminal node as $x$. The observations that fall into the same node as the target point can be treated asymptotically as coming from a homogeneous group.

Athey et al. (2019) generalize the use of random forest-based weights to generic kernel estimation. The most closely related precedents from the perspective of adaptive nearest neighbor estimation are quantile regression forests (Meinshausen, 2006) and bagging survival trees (Hothorn et al., 2004), which can be viewed as special cases of generalized random forests. The idea of adaptive nearest neighbors also underlies theoretical analyses of random forests such as Lin and Jeon (2006); Biau et al. (2008); Arlot and Genuer (2014). The random forest-based weights are derived from the fraction of trees in which an observation appears in the same terminal node as the target point. Specifically, given a test point $x$, the weights $\alpha_i(x)$ are the frequency with which the $i$-th training example falls in the same leaf as $x$, i.e.,

$$\alpha_i(x) = \frac{1}{B} \sum_{b=1}^{B} \frac{1\{X_i \in L_b(x)\}}{|L_b(x)|}, \tag{2}$$

where $L_b(x)$ is the terminal node that contains $x$ in the $b$-th tree, $B$ is the number of trees, and $|\cdot|$ denotes the cardinality.
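The forest weights described above are simple to compute once each tree reports which leaf every point falls into. The sketch below assumes a hypothetical input layout (a matrix of leaf ids per tree and training point, plus the leaf id of the target point in each tree); it is not tied to any particular forest library.

```python
import numpy as np

def forest_weights(train_leaves, target_leaves):
    """Leaf co-membership weights for one target point.

    train_leaves: (B, n) integer array; entry [b, i] is the leaf id of
        training example i in tree b.
    target_leaves: length-B integer array; entry b is the leaf id of the
        target point in tree b.
    Returns a length-n array: the average over trees of
    1{X_i in leaf of target} / (size of that leaf).
    """
    train_leaves = np.asarray(train_leaves)
    target_leaves = np.asarray(target_leaves)
    same_leaf = train_leaves == target_leaves[:, None]    # (B, n) booleans
    leaf_sizes = same_leaf.sum(axis=1, keepdims=True)     # leaf size per tree
    per_tree = same_leaf / np.maximum(leaf_sizes, 1)      # guard empty leaves
    return per_tree.mean(axis=0)

# Toy forest: 2 trees, 4 training points (leaf ids are made up).
train_leaves = np.array([[0, 0, 1, 1],
                         [0, 1, 1, 1]])
target_leaves = np.array([0, 1])
alpha = forest_weights(train_leaves, target_leaves)
print(alpha)  # weights are non-negative and sum to one
```

In the toy example the second training point shares a leaf with the target in both trees, so it receives the largest weight.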

The crux of the causal forest algorithm presented in Athey et al. (2019) is to pair the kernel-based view of forests with Robinson’s estimating equation (1). Causal forests seek to grow a forest such that the resulting weighting function $\alpha_i(x)$ can be used to express heterogeneity in $\tau(\cdot)$, meaning that $\tau(\cdot)$ is roughly constant over observations given positive weight for predicting at $x$. Then, we estimate $\tau(x)$ by solving a localized version of (1):

$$\sum_{i=1}^{n} \alpha_i(x)\,[A_i - \hat e(X_i)]\,[h(T_i) - \hat m(X_i) - \hat\tau(x)(A_i - \hat e(X_i))] = 0.$$

For further details, including the choice of a splitting rule targeted for treatment heterogeneity, see Athey et al. (2019); further examples are given in Athey and Wager (2019). The idea of using Robinson’s transformation to fit treatment heterogeneity is further explored by Nie and Wager (2017).
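To make the localized estimating equation concrete, the following sketch solves it in closed form for a given weight vector. Everything here is an assumption for illustration: the simulated data, the use of oracle nuisance values, and the hand-made window weights that merely mimic what a fitted forest leaf would produce.

```python
import numpy as np

def local_robinson(alpha, A, Y, e_hat, m_hat):
    """Solve sum_i alpha_i (A_i - e_i) (Y_i - m_i - tau (A_i - e_i)) = 0 for tau."""
    a_res = A - e_hat
    y_res = Y - m_hat
    return np.sum(alpha * a_res * y_res) / np.sum(alpha * a_res ** 2)

# Hypothetical data: the effect is 1 for x < 0.5 and 3 otherwise.
rng = np.random.default_rng(0)
n = 4000
x = rng.uniform(size=n)
A = rng.binomial(1, 0.5, size=n).astype(float)
tau = np.where(x < 0.5, 1.0, 3.0)
Y = x + tau * A + rng.normal(scale=0.5, size=n)

# Oracle nuisances for this simulation (a real analysis would estimate them).
e_hat = np.full(n, 0.5)
m_hat = x + 0.5 * tau

def window_weights(x0, half_width=0.1):
    """Uniform weights on points near x0; a stand-in for forest weights."""
    inside = (np.abs(x - x0) <= half_width).astype(float)
    return inside / inside.sum()

tau_low = local_robinson(window_weights(0.25), A, Y, e_hat, m_hat)
tau_high = local_robinson(window_weights(0.75), A, Y, e_hat, m_hat)
print(f"tau_hat(0.25) = {tau_low:.2f}, tau_hat(0.75) = {tau_high:.2f}")
```

The two local estimates should differ markedly, which is the kind of heterogeneity a constant-effect estimator would average away.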

2.2 Adjusting for censoring

The central goal of this paper is to develop a causal forest algorithm that can be used despite censoring, i.e., despite the fact that we sometimes do not observe $T$, and instead only observe $U = \min(T, C)$ together with the event indicator $\Delta = 1\{T \le C\}$. Throughout, we assume that $T$ is the survival time up to a fixed maximum follow-up time and that the censoring is noninformative conditionally on $(X, A)$ (Fleming and Harrington, 2011).

Assumption 4.

(Conditionally independent censoring) $T \perp C \mid X, A$.

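Our approach builds on the following recipe for making estimating equations robust to censoring, described in Tsiatis (2007, Chapter 10.4). If the true value of our parameter of interest $\tau$ is identified by a complete data estimating equation, $E[\psi^{(c)}_{\tau}(X, h(T), A)] = 0$, then $\tau$ is also identified via the following estimating equation, which generalizes the celebrated augmented inverse-propensity weighting estimator of Robins et al. (1994): $E[\psi_{\tau}(X, U, A, \Delta)] = 0$ with

$$\psi_{\tau}(X, U, A, \Delta) = \frac{\Delta\,\psi^{(c)}_{\tau}(X, h(U), A)}{S_C(U \mid X, A)} + \frac{(1 - \Delta)\,E[\psi^{(c)}_{\tau}(X, h(T), A) \mid T > U, X, A]}{S_C(U \mid X, A)} - \int_0^{U} \frac{\lambda_C(s \mid X, A)}{S_C(s \mid X, A)}\,E[\psi^{(c)}_{\tau}(X, h(T), A) \mid T > s, X, A]\,ds,$$

where $U = \min(T, C)$ is the observed time, $\Delta = 1\{T \le C\}$ is the event indicator, $S_C(s \mid x, a) = P(C > s \mid X = x, A = a)$ is the conditional survival function for the censoring process, and $\lambda_C(s \mid x, a)$ is the associated conditional hazard function. In our case, the specific form of our complete data estimating equation enables us to simplify this expression resulting in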


where $\hat S_C$ and $\hat\lambda_C$ are the estimated conditional survival function and hazard function of the censoring process, respectively. The associated estimator $\hat\tau$ is characterized by $\sum_{i=1}^{n} \psi_{\hat\tau}(X_i, U_i, A_i, \Delta_i) = 0$. This estimator attains $1/\sqrt{n}$ rates for $\tau$ under the setting of Chernozhukov et al. (2018), i.e., with cross-fitting and 4-th root rates for the nuisance components, provided that Assumptions 1-3 and the conditionally independent censoring assumption hold.
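The role of the censoring survival function can be seen in a small simulation that uses only the inverse-weighting part of the correction (not the full augmented score discussed above). This is a sketch under stated assumptions: exponential survival and censoring times, a hypothetical follow-up limit `t0`, and a censoring survival function that is known rather than estimated.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
t0 = 2.0                                        # hypothetical maximum follow-up time
T = np.minimum(rng.exponential(1.0, n), t0)     # survival time truncated at t0
C = rng.exponential(2.0, n)                     # censoring time, independent of T here
U = np.minimum(T, C)                            # observed follow-up time
delta = (T <= C).astype(float)                  # 1 if the event time is observed

def censoring_survival(s, rate=0.5):
    """P(C > s) for the exponential censoring used above (assumed known)."""
    return np.exp(-rate * s)

# Inverse probability of censoring weighting: uncensored subjects are
# up-weighted by 1 / P(C > U) to stand in for similar censored subjects.
ipcw_mean = np.mean(delta * U / censoring_survival(U))
naive_mean = U[delta == 1].mean()               # uses uncensored subjects only
true_mean = 1.0 - np.exp(-t0)                   # E[min(T, t0)] for a unit exponential

print(f"true {true_mean:.3f}, IPCW {ipcw_mean:.3f}, naive {naive_mean:.3f}")
```

Long survival times are censored more often, so the naive average is biased downward, while the weighted average recovers the truncated mean. The augmented score adds the conditional-expectation terms so that the estimator remains well behaved when the censoring model is only estimated.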

2.3 Proposed causal survival forests

We now return to our main proposal, i.e., estimating heterogeneous treatment effects using causal survival forests. Following our discussion above, we proceed as follows. First, we estimate the nuisance components required to form the score (4). Then, instead of estimating a constant parameter, we pair this estimating equation with the forest weighting scheme (2), resulting in estimates characterized by


In order to use this estimator, we of course need to specify how to grow the forest, so that the resulting forest weights adequately express heterogeneity in the underlying signal . Here, for the splitting rule, we use the -criterion proposed in Athey et al. (2019). In particular, we generate pseudo-outcomes by the following relabeling strategy at each internal node.

where is a shorthand of . Next, the splitting criterion proceeds exactly the same as a regression tree (Breiman et al., 1984) problem by treating the pseudo-outcomes ’s as a continuous outcome variable. Specifically, we split the parent node into two child nodes and such as to maximize the following quantity:
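Once pseudo-outcomes are available, the split search is an ordinary regression-tree scan. The sketch below assumes the criterion takes the form used for generalized random forests in Athey et al. (2019), namely the sum over the two children of the squared sum of pseudo-outcomes divided by the child size; the function name `best_split` and the toy numbers are made up for illustration.

```python
import numpy as np

def best_split(x, rho, min_leaf=1):
    """Scan thresholds on one covariate for the best pseudo-outcome split.

    Maximizes sum over children of (sum of rho in child)^2 / (child size),
    which is equivalent to the usual squared-error regression-tree split
    when rho is treated as a continuous outcome.
    """
    order = np.argsort(x)
    x_sorted, rho_sorted = x[order], rho[order]
    n = len(x)
    left_sum = np.cumsum(rho_sorted)[:-1]       # sums over the first k points
    total = rho_sorted.sum()
    n_left = np.arange(1, n)
    n_right = n - n_left
    score = left_sum ** 2 / n_left + (total - left_sum) ** 2 / n_right
    # Thresholds must separate distinct x values and respect min_leaf.
    valid = (x_sorted[1:] > x_sorted[:-1]) & (n_left >= min_leaf) & (n_right >= min_leaf)
    score = np.where(valid, score, -np.inf)
    k = int(np.argmax(score))
    threshold = 0.5 * (x_sorted[k] + x_sorted[k + 1])
    return threshold, score[k]

# Toy node: pseudo-outcomes are negative for small x and positive for large x.
x = np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9])
rho = np.array([-1.0, -1.2, -0.8, 1.1, 0.9, 1.0])
threshold, score = best_split(x, rho)
print(threshold, score)
```

A full tree would repeat this scan over the candidate covariates at each node and recompute the pseudo-outcomes in every new parent node.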

2.4 Confidence intervals for the estimated treatment effects

Establishing a confidence interval allows proper statistical inference on the suggested treatment strategy. To build asymptotically valid confidence intervals for centered on , it suffices to derive an estimator for . As shown in Lemma 3.1 and Theorem 3.2 in Section 3, it is enough to study the variance of , where

and is the influence function of the -th observation with respect to the true parameter value , i.e.,


It is easy to see that

We estimate the above variance by

where and are consistent estimators of and , respectively. Many strategies are available for estimating . We estimate it by fitting honest and regular regression forests. To obtain a valid estimation , notice that the term is equivalent to the output of a regression forest with weights and effective outcomes . There are many methods which have been developed to estimate the variance of a regression forest, including work by Sexton and Laake (2009); Wager et al. (2014); Mentch and Hooker (2016); Wager and Athey (2018); Athey et al. (2019). We follow Athey et al. (2019) and use bootstrap of little bags in our implementation.

3 Theoretical results

In this section, we study the asymptotic normality of the estimated treatment effect . Throughout this section, we assume that the covariates are distributed according to a density that is bounded away from zero and infinity. The following assumption guarantees the smoothness of .

Assumption 5.

(Lipschitz continuity) The treatment effect function is -Lipschitz continuous in terms of . In addition, the propensity score , hazard function and conditional survival function are Lipschitz continuous in terms of .

In addition, our trees are symmetric, i.e., their output is invariant to permuting the indices of training samples. Our algorithm also guarantees honesty (Wager and Athey, 2018), and the following two conditions.

Random split tree: At each internal node, the probability of splitting at the -th dimension is greater than , where for .

Subsampling: Each child node contains at least a fraction of the data points in its parent node for some , and trees are grown on subsamples of size scaling as , where , .

Furthermore, we need the following Assumptions 6-9 to couple and , where is an oracle estimator, with and being the underlying truth. Assumptions 6-8 are commonly assumed in the causal inference and survival analysis literature.

Assumption 6.

The failure time and censoring time are bounded. The density functions of the failure time and censoring time are both bounded above for all .

Assumption 7.

Any individual has a positive probability of receiving both treatments. Furthermore, for any , and some .

Assumption 8.

There exists a fixed positive constant , such that for all .

Assumption 9.

Consistency of the non-parametric plug-in estimators: we have the following convergences in probability,

for each . Furthermore, we assume that for each , and for both , , .

Assumption 9 is quite general. Biau (2012); Wager and Walther (2015) show that for the random forest models, can be faster than as long as the intrinsic signal dimension is less than . As shown in Cui et al. (2017b), is achievable for survival forest models. Nonparametric kernel smoothing methods such as Sun et al. (2019) provide estimation with , where depending on the dimension .

Consequently, from Lemma in the Supplementary Material, we have for each , , for . The following lemma provides an intermediate result for our main theorem,

Lemma 3.1.

We assume Assumptions 6-9 hold and converges to zero, where is the minimum terminal node size and is the number of trees. Then for any , we have that , where is the number of trees fitted in the forest model.

The proof of Lemma 3.1 is collected in the Supplementary Material. The technical results in Wager and Athey (2018); Athey et al. (2019) paired with Lemma 3.1 lead to the following asymptotic Gaussianity result.

Theorem 3.2.

Assume Assumptions 5-9 hold. If converges faster than , where is a function that is bounded away from 0 and increases at most polynomially with the log-inverse sampling ratio . Then there exists a sequence such that for any ,

where .

The proof of Theorem 3.2 is deferred to the Appendix. This asymptotic Gaussianity result yields valid asymptotic confidence intervals for the true treatment effect .
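Given a point estimate and an estimate of its standard error, the asymptotic Gaussianity result translates into a standard Wald-type interval. The helper below is a minimal sketch; the function name and the numeric inputs are hypothetical, and in practice the standard error would come from the variance estimator of Section 2.4.

```python
from statistics import NormalDist

def wald_interval(tau_hat, sigma_hat, level=0.95):
    """Normal-approximation confidence interval tau_hat +/- z * sigma_hat."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return tau_hat - z * sigma_hat, tau_hat + z * sigma_hat

# Hypothetical estimate and standard error at one target point.
lower, upper = wald_interval(tau_hat=0.30, sigma_hat=0.10)
print(f"95% interval: ({lower:.3f}, {upper:.3f})")
```

An interval that excludes zero at a given covariate value suggests that the data distinguish the two treatments there, subject to the assumptions of the theorem.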

4 Simulation studies

We perform simulation studies to compare the proposed method with existing alternatives, including the Cox proportional hazards model using covariates , random survival forests using covariates , , respectively, and random survival forests fitted on treatment arm separately to learn the optimal treatment decision. Note that the last method essentially mimics the virtual twin method (Foster et al., 2011) applied to observational survival data. There are many existing implementations of random survival forests, including R packages randomForestSRC (Ishwaran and Kogalur, 2019), party (Hothorn et al., 2006b), ranger (Wright and Ziegler, 2017), RLT (Zhu, 2018), etc. However, to streamline our presentation and highlight the causal approach in observational studies, we only compare with Ishwaran and Kogalur (2019) to demonstrate the strength of the proposed causal survival forests among the survival forests designed for randomized experiments.

For each of the simulation settings, the optimal treatment assignment was learned based on the estimated treatment effects from a training dataset with sample size . A testing dataset with size 1000 was used to calculate the value function under the estimated rule. Each simulation was repeated 500 times. Tuning parameters need to be chosen for forest-based methods. The minimal number of observations in each terminal node was chosen as 15 (Ishwaran and Kogalur, 2019), , and , where denotes the least integer greater than or equal to . The number of variables available for splitting at each tree node was chosen from , and . The total number of trees was set to 500.

4.1 Simulation settings

We considered the following four scenarios. For each scenario, we generated covariates independently from a uniform distribution on

. In the first scenario, was generated from an accelerated failure time model, and was generated from a Cox model,

where the baseline hazard function , and

followed a standard normal distribution. The follow-up time

, and propensity score , where

is the density function of Beta distribution with shape parameters

and .

In the second scenario, was generated from a proportional hazard model with a non-linear structure, and was generated from an accelerated failure time model

where the baseline hazard function , and followed a standard normal distribution. The maximum follow-up time , and propensity score .

In the third scenario, was generated from a Poisson distribution with mean , and was generated from a Poisson distribution with mean . The maximum follow-up time , and propensity score .

In the fourth scenario, was generated from a Poisson distribution with mean , and was generated from a Poisson distribution with mean . The maximum follow-up time , and propensity score . Note that for subjects with , treatment does not affect survival time, and thus both treatments are defined as optimal.

In addition, we evaluated the coverage of the proposed 95% confidence intervals at points , , , , for the above four scenarios, respectively. The true treatment effect was estimated by the Monte Carlo method with sample size 100000. We used the default tuning parameters: the minimal number of observations in each terminal node was set to , and the number of variables available for splitting at each tree node was chosen from . In order to obtain valid confidence intervals, we followed the suggestion of Athey et al. (2019) and fit a large enough number of trees . Each simulation was repeated 500 times. The numerical results are summarized in Table 1.

4.2 Simulation results

Figures 1(a)-1(d) show the boxplots of correct classification rates in test samples with virtual twins as the reference level. In all figures, “VT” denotes the virtual twin method with random survival forests; “SRC1” denotes random survival forests using covariates ; “SRC2” denotes random survival forests using covariates ; “Cox” denotes the Cox proportional hazards model using covariates ; “CSF” denotes the proposed causal survival forests.

In Scenarios 1, 3 and 4, the proposed causal survival forest achieves the best performance among all competing methods. In Scenario 2, because the true model for the failure time is the Cox model, it is not surprising that the Cox model performs the best here. Our estimated treatment rule performs better than other random survival forest approaches. In addition, because the proposed method requires the estimation of nuisances and plug-in quantities, the standard deviations of the proposed causal survival forests are slightly larger than other forest-based methods.

Overall, the proposed causal survival forest is superior to ordinary random survival forests, which do not target the causal parameter directly. The reason is that the proposed forests directly model the causal effect of the treatment on the survival time. Thus, more splits are placed on the covariates interacting with the treatment rather than on the covariates that appear only in the main effect. Furthermore, as shown in Table 1, the proposed confidence intervals have relatively good coverage at different testing points in various settings.

Figure 1: Correct classification rate of different methods. Panels: (a) Scenario 1, (b) Scenario 2, (c) Scenario 3, (d) Scenario 4.

Scenario   Point 1   Point 2   Point 3   Point 4   Point 5
1          89.2      79.0      83.8      96.0      97.0
2          95.2      96.2      96.6      96.6      98.0
3          82.8      96.8      95.6      92.4      86.6
4          86.2      81.0      97.8      93.0      84.8
Table 1: Coverage in percent of the proposed 95% confidence intervals in the four scenarios (columns are the five test points)

5 HIV data analysis

We demonstrate the proposed method by an application to the data from AIDS Clinical Trials Group Protocol 175 (ACTG175) (Hammer et al., 1996). The original dataset consists of 2139 HIV-infected subjects. The enrolled subjects were randomized to four treatment groups: zidovudine (ZDV) monotherapy, ZDV+didanosine (ddI), ZDV+zalcitabine, and ddI monotherapy. We focus on the subset of patients receiving the treatment ZDV+ddI or ddI monotherapy as considered in Lu et al. (2013). Treatment indicator $A = 0$ denotes the treatment ddI with 561 subjects, and $A = 1$ denotes the treatment ZDV+ddI with 524 subjects. Though ACTG175 is a randomized study, there seem to be some selection effects in the subsets used here. For example, among subjects whose race covariate equals 1, 138 received ZDV+ddI and 173 received ddI. A binomial test with null probability 0.5 gives a p-value of 0.05. For this reason, we analyze the study as an observational rather than a randomized study.
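The binomial test mentioned above is easy to reproduce from the two counts given in the text. The sketch below computes an exact two-sided p-value under a null probability of 0.5; the doubling of the smaller tail is one common convention and is an assumption here, since the text does not say which variant was used.

```python
from math import comb

def binomial_two_sided_p(k, n):
    """Exact two-sided p-value for H0: success probability 0.5.

    Under p = 0.5 the distribution is symmetric, so the p-value is twice
    the probability of the smaller tail, capped at 1.
    """
    tail = sum(comb(n, j) for j in range(0, min(k, n - k) + 1))
    return min(1.0, 2.0 * tail / 2 ** n)

n_zdv_ddi, n_ddi = 138, 173          # counts reported in the text for race = 1
p_value = binomial_two_sided_p(n_zdv_ddi, n_zdv_ddi + n_ddi)
print(f"two-sided p-value: {p_value:.3f}")
```

The result is close to the 0.05 level quoted in the text, which is the basis for treating the subset as observational.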

Here we are interested in the causal effect of ZDV+ddI versus ddI on the survival time of HIV-infected patients. Twelve selected baseline covariates were studied in Tsiatis et al. (2008); Zhang et al. (2008); Lu et al. (2013); Fan et al. (2017). There are 5 continuous covariates: age (year), weight (kg), Karnofsky score (scale of 0-100), CD4 count (cells/mm$^3$) at baseline, and CD8 count (cells/mm$^3$) at baseline. There are 7 binary variables: gender (male = 1, female = 0), homosexual activity (yes = 1, no = 0), race (non-white = 1, white = 0), symptomatic status (symptomatic = 1, asymptomatic = 0), history of intravenous drug use (yes = 1, no = 0), and hemophilia (yes = 1, no = 0). As the outcome considered here is the survival time, we also include CD4 count (cells/mm$^3$) at 20 ± 5 weeks and CD8 count (cells/mm$^3$) at 20 ± 5 weeks as covariates, in addition to the 12 covariates above.

We applied the proposed causal survival forest to this dataset. We used the default tuning parameters: the minimal number of observations in each terminal node was set to , and the number of variables available for splitting at each tree node was chosen from . We fit a large enough number of trees . The point estimates and 95% confidence intervals from the causal survival forest are presented in Figure 2. The confidence intervals are wide, possibly because of the small sample size. According to the estimated optimal treatment rule obtained from the heterogeneous effect estimation, for the subgroup of patients with age less than 34 (median), 349 patients should be assigned to treatment ddI, while 205 patients should be assigned to treatment ZDV+ddI; for the subgroup of patients with age larger than 34, 272 should be assigned to treatment ZDV+ddI, while 257 patients should be assigned to treatment ddI. In addition, we varied the patients’ age while the other covariates were set to their median values. The estimated treatment effects are all negative when age is less than or equal to 48, while the estimated effects are positive when age is larger than 48. The results suggest that ZDV+ddI is more favorable for older HIV-infected patients. A similar finding was also observed in Lu et al. (2013) and Fan et al. (2017). We note, however, that the confidence intervals for the pointwise effect as reported in Figure 2 are wide, and so we ought not to over-interpret the shape of the fitted heterogeneity.

Figure 2: The point estimation (solid line) and 95% confidence intervals (dashed line) from the proposed causal survival forest


We thank Julie Tibshirani for helpful conversations and suggestions.


Appendix A Proof of Theorem 3.2


Given the set of forest weights used to define the generalized random forest estimation with unknown true nuisance parameters, we have the following linear approximation

where denotes the influence function of the -th observation with respect to the true parameter value , and is a pseudo-forest output with weights and outcomes .

Note that Assumptions 2-6 in Athey et al. (2019) hold immediately from the definition of the estimating equation . In particular, is Lipschitz continuous in terms of for their Assumption 4; The solution of always exists for their Assumption 5. By the results shown in Wager and Athey (2018), there exists a sequence for which

where and is a function that is bounded away from 0 and increases at most polynomially with the log-inverse sampling ratio .

Furthermore, by Lemma 4 in Athey et al. (2019),

Following Lemma 3.1, as long as goes faster than , we have


  • Arlot and Genuer (2014) Arlot, S. and Genuer, R. (2014), “Analysis of purely random forests bias,” arXiv preprint arXiv:1407.3939.
  • Athey et al. (2019) Athey, S., Tibshirani, J., and Wager, S. (2019), “Generalized Random Forests,” The Annals of Statistics, 47(2).
  • Athey and Wager (2019) Athey, S. and Wager, S. (2019), “Estimating Treatment Effects with Causal Forests: An Application,” Observational Studies, 5, 36–51.
  • Biau (2012) Biau, G. (2012), “Analysis of a random forests model,” Journal of Machine Learning Research, 13, 1063–1095.
  • Biau et al. (2008) Biau, G., Devroye, L., and Lugosi, G. (2008), “Consistency of random forests and other averaging classifiers,” Journal of Machine Learning Research, 9, 2015–2033.
  • Breiman (2001) Breiman, L. (2001), “Random forests,” Machine learning, 45, 5–32.
  • Breiman et al. (1984) Breiman, L., Friedman, J., Stone, C. J., and Olshen, R. A. (1984), Classification and regression trees, CRC press.
  • Chernozhukov et al. (2018) Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. (2018), “Double/debiased machine learning for treatment and structural parameters,” The Econometrics Journal, 21, 1–68.
  • Ciampi et al. (1986) Ciampi, A., Thiffault, J., Nakache, J.-P., and Asselain, B. (1986), “Stratification by stepwise regression, correspondence analysis and recursive partition: a comparison of three methods of analysis for survival data with covariates,” Computational Statistics & Data Analysis, 4, 185 – 204.
  • Cui et al. (2017a) Cui, Y., Zhu, R., and Kosorok, M. (2017a), “Tree based weighted learning for estimating individualized treatment rules with censored data,” Electronic Journal of Statistics, 11, 3927–3953.
  • Cui et al. (2017b) Cui, Y., Zhu, R., Zhou, M., and Kosorok, M. R. (2017b), “Consistency of survival tree and forest models: splitting bias and correction,” arXiv:1707.09631.
  • Fan et al. (2017) Fan, C., Lu, W., Song, R., and Zhou, Y. (2017), “Concordance-assisted learning for estimating optimal individualized treatment regimes,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79, 1565–1582.
  • Fleming and Harrington (2011) Fleming, T. R. and Harrington, D. P. (2011), Counting processes and survival analysis, vol. 169, John Wiley & Sons.
  • Foster et al. (2011) Foster, J. C., Taylor, J. M., and Ruberg, S. J. (2011), “Subgroup identification from randomized clinical trial data,” Statistics in medicine, 30, 2867–2880.
  • Friedberg et al. (2018) Friedberg, R., Tibshirani, J., Athey, S., and Wager, S. (2018), “Local linear forests,” arXiv preprint arXiv:1807.11408.
  • Hammer et al. (1996) Hammer, S. M., Katzenstein, D. A., Hughes, M. D., Gundacker, H., Schooley, R. T., Haubrich, R. H., Henry, W. K., Lederman, M. M., Phair, J. P., Niu, M., et al. (1996), “A trial comparing nucleoside monotherapy with combination therapy in HIV-infected adults with CD4 cell counts from 200 to 500 per cubic millimeter,” New England Journal of Medicine, 335, 1081–1090.
  • Hothorn et al. (2006a) Hothorn, T., Bühlmann, P., Dudoit, S., Molinaro, A., and Van Der Laan, M. J. (2006a), “Survival ensembles,” Biostatistics, 7, 355–373.
  • Hothorn et al. (2006b) Hothorn, T., Hornik, K., and Zeileis, A. (2006b), “Unbiased Recursive Partitioning: A Conditional Inference Framework,” Journal of Computational and Graphical Statistics, 15, 651–674.
  • Hothorn et al. (2004) Hothorn, T., Lausen, B., Benner, A., and Radespiel-Tröger, M. (2004), “Bagging survival trees,” Statistics in medicine, 23, 77–91.
  • Ishwaran and Kogalur (2019) Ishwaran, H. and Kogalur, U. (2019), Random Forests for Survival, Regression, and Classification (RF-SRC), r package version 2.8.0.
  • Ishwaran et al. (2008) Ishwaran, H., Kogalur, U. B., Blackstone, E. H., and Lauer, M. S. (2008), “Random survival forests,” The Annals of Applied Statistics, 2, 841–860.
  • Künzel et al. (2019) Künzel, S. R., Sekhon, J. S., Bickel, P. J., and Yu, B. (2019), “Metalearners for estimating heterogeneous treatment effects using machine learning,” Proceedings of the National Academy of Sciences, 116, 4156–4165.
  • Laber and Zhao (2015) Laber, E. and Zhao, Y. (2015), “Tree-based methods for individualized treatment regimes,” Biometrika, 102, 501–514.
  • Leblanc and Crowley (1993) Leblanc, M. and Crowley, J. (1993), “Survival Trees by Goodness of Split,” Journal of the American Statistical Association, 88, 457–467.
  • Lin and Jeon (2006) Lin, Y. and Jeon, Y. (2006), “Random forests and adaptive nearest neighbors,” Journal of the American Statistical Association, 101, 578–590.
  • Lu et al. (2018) Lu, M., Sadiq, S., Feaster, D., and Ishwaran, H. (2018), “Estimating Individual Treatment Effect in Observational Data Using Random Forest Methods,” Journal of Computational and Graphical Statistics, 1–11.
  • Lu et al. (2013) Lu, W., Zhang, H. H., and Zeng, D. (2013), “Variable selection for optimal treatment decision,” Statistical Methods in Medical Research, 22, 493–504.
  • Meinshausen (2006) Meinshausen, N. (2006), “Quantile regression forests,” Journal of Machine Learning Research, 7, 983–999.
  • Mentch and Hooker (2014) Mentch, L. and Hooker, G. (2014), “Ensemble trees and CLTs: Statistical inference for supervised learning,” stat, 1050, 25.
  • Mentch and Hooker (2016) Mentch, L. and Hooker, G. (2016), “Quantifying Uncertainty in Random Forests via Confidence Intervals and Hypothesis Tests,” Journal of Machine Learning Research, 17, 1–41.
  • Murphy (2003) Murphy, S. A. (2003), “Optimal dynamic treatment regimes,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65, 331–355.
  • Nie and Wager (2017) Nie, X. and Wager, S. (2017), “Quasi-Oracle Estimation of Heterogeneous Treatment Effects,” arXiv:1712.04912.
  • Oprescu et al. (2019) Oprescu, M., Syrgkanis, V., and Wu, Z. S. (2019), “Orthogonal Random Forest for Causal Inference,” in International Conference on Machine Learning, pp. 4932–4941.
  • Qian and Murphy (2011) Qian, M. and Murphy, S. A. (2011), “Performance guarantees for individualized treatment rules,” The Annals of Statistics, 39, 1180.
  • Robins et al. (2017) Robins, J. M., Li, L., Mukherjee, R., Tchetgen, E. T., van der Vaart, A., et al. (2017), “Minimax estimation of a functional on a structured high-dimensional model,” The Annals of Statistics, 45, 1951–1987.
  • Robins et al. (1994) Robins, J. M., Rotnitzky, A., and Zhao, L. P. (1994), “Estimation of regression coefficients when some regressors are not always observed,” Journal of the American Statistical Association, 89, 846–866.
  • Robinson (1988) Robinson, P. M. (1988), “Root-N-Consistent Semiparametric Regression,” Econometrica, 56, 931–954.
  • Schick (1986) Schick, A. (1986), “On asymptotically efficient estimation in semiparametric models,” The Annals of Statistics, 14, 1139–1151.
  • Segal (1988) Segal, M. R. (1988), “Regression Trees for Censored Data,” Biometrics, 44, 35–47.
  • Sexton and Laake (2009) Sexton, J. and Laake, P. (2009), “Standard errors for bagged and random forest estimators,” Computational Statistics & Data Analysis, 53, 801–811.
  • Steingrimsson et al. (2016) Steingrimsson, J. A., Diao, L., Molinaro, A. M., and Strawderman, R. L. (2016), “Doubly robust survival trees,” Statistics in Medicine, 35, 3595–3612.
  • Steingrimsson et al. (2019) Steingrimsson, J. A., Diao, L., and Strawderman, R. L. (2019), “Censoring Unbiased Regression Trees and Ensembles,” Journal of the American Statistical Association, 114, 370–383.
  • Sun et al. (2019) Sun, Q., Zhu, R., Wang, T., and Zeng, D. (2019), “Counting process-based dimension reduction methods for censored outcomes,” Biometrika, 106, 181–196.
  • Tsiatis (2007) Tsiatis, A. (2007), Semiparametric Theory and Missing Data, Springer Series in Statistics, Springer New York.
  • Tsiatis et al. (2008) Tsiatis, A. A., Davidian, M., Zhang, M., and Lu, X. (2008), “Covariate adjustment for two-sample treatment comparisons in randomized clinical trials: a principled yet flexible approach,” Statistics in Medicine, 27, 4658–4677.
  • Wager and Athey (2018) Wager, S. and Athey, S. (2018), “Estimation and inference of heterogeneous treatment effects using random forests,” Journal of the American Statistical Association, 113, 1228–1242.
  • Wager et al. (2014) Wager, S., Hastie, T., and Efron, B. (2014), “Confidence Intervals for Random Forests: The Jackknife and the Infinitesimal Jackknife,” Journal of Machine Learning Research, 15, 1625–1651.
  • Wager and Walther (2015) Wager, S. and Walther, G. (2015), “Adaptive Concentration of Regression Trees, with Application to Random Forests,” arXiv preprint arXiv:1503.06388.
  • Wright and Ziegler (2017) Wright, M. N. and Ziegler, A. (2017), “ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R,” Journal of Statistical Software, 77, 1–17.
  • Zhang et al. (2012) Zhang, B., Tsiatis, A. A., Laber, E. B., and Davidian, M. (2012), “A robust method for estimating optimal treatment regimes,” Biometrics, 68, 1010–1018.
  • Zhang et al. (2008) Zhang, M., Tsiatis, A. A., and Davidian, M. (2008), “Improving efficiency of inferences in randomized clinical trials using auxiliary covariates,” Biometrics, 64, 707–715.
  • Zhao et al. (2012) Zhao, Y., Zeng, D., Rush, A. J., and Kosorok, M. R. (2012), “Estimating individualized treatment rules using outcome weighted learning,” Journal of the American Statistical Association, 107, 1106–1118.
  • Zhao et al. (2015) Zhao, Y.-Q., Zeng, D., Laber, E. B., Song, R., Yuan, M., and Kosorok, M. R. (2015), “Doubly robust learning for estimating individualized treatment with censored data,” Biometrika, 102, 151–168.
  • Zhu (2018) Zhu, R. (2018), Reinforcement Learning Trees, R package version 3.2.2.
  • Zhu and Kosorok (2012) Zhu, R. and Kosorok, M. R. (2012), “Recursively imputed survival trees,” Journal of the American Statistical Association, 107, 331–340.
  • Zhu et al. (2017) Zhu, R., Zhao, Y.-Q., Chen, G., Ma, S., and Zhao, H. (2017), “Greedy outcome weighted tree learning of optimal personalized treatment rules,” Biometrics, 73, 391–400.