Survival analysis is widely used in the assessment of interventions in clinical trials. In order to provide guidance on future interventions or treatment plans, the estimation of causal effects is highly desirable. Double-blinded randomized trials are widely used to estimate the average causal effect (ACE) (Chalmers et al., 1981). The crude unadjusted estimator of the ACE is unbiased (Rubin, 1974). However, as pointed out by Fisher (1925), adjusting for (low dimensional) features that are correlated with the outcome using an ordinary least squares method can reduce the variance of the estimated treatment effect in randomized studies without introducing any bias. The theoretical results are complete for the low dimensional covariate case (Rosenbaum, 1987, 2002; Freedman, 2008; Cole and Hernan, 2004; Lin, Li and Li, 2014).
The adjustment using high dimensional covariates for treatment effect estimation in randomized survival trials is motivated by several considerations. First, the explosion of data has made this type of adjustment possible. For example, in many clinical trials, high dimensional baseline information such as genomics, metabolomics or proteomics data is available that can potentially be used as covariates for the adjustment. Second, the relationships between the covariates and the response are complicated. For example, it is often the case that the covariates related to the response are correlated, with possible interactions. Third, methods such as penalized survival regression (such as the Lasso (Tibshirani, 1997)) or survival random forests (Ishwaran et al., 2008, 2010) have been developed to deal with high dimensional covariates in survival analysis, making adjustment with high dimensional covariates feasible.
However, studies using high dimensional covariate adjustment to improve the efficiency of the estimation of the ACE have only emerged recently and are limited to linear models. Bloniarz et al. (2016) proposed adjustment with the Lasso, which, under a rather strong sparsity assumption, improves efficiency in the estimation of the ACE. Belloni, Chernozhukov and Hansen (2014) proposed a de-biased estimator which allows inference for the ACE, at the price of widening the confidence interval. More recently, Lei and Ding (2018) proposed a new de-biased estimator that performs inference under much milder assumptions and allows the number of covariates to diverge under the finite population framework. Under the super-population framework, Wager et al. (2016) showed that any intercept-free adjustment can produce unbiased estimation of the ACE, and that the estimation precision depends only on the prediction risk of the fitted regression adjustment. There is a gap in characterizing high dimensional covariate adjustment in survival analysis, and the time-varying censored outcome brings challenges for filling this gap. This paper fills this gap by proposing “doubly robust-type estimators” and providing theoretical guarantees for their consistency and asymptotic variance reduction. The assumptions needed are weak, so the adjustment can be carried out by many regularized regression methods and nonparametric machine learning methods, including penalized survival regression and the nonparametric survival random forest.
The remaining sections are structured as follows. In section 2, we first define the notation and propose the estimators; we then list the assumptions, provide the asymptotic distributions of the proposed estimators and study their relative efficiencies. In section 3, we present simulation results to study the finite sample properties of the different estimators. In section 4, we apply our method to a real data set for illustration. In section 5, we provide further discussion of the methods.
We denote the randomization indicator as , and the high dimensional covariates as . Under the stable unit treatment value assumption (SUTVA (Rubin, 1978)), we consider the potential time to event for an individual, should it be assigned treatment , as and the corresponding potential censoring time as for . We also make the consistency assumption (Cole and Frangakis, 2009) to link the observed and potential outcomes, i.e., and . Due to censoring, we cannot directly observe ; instead, we observe the composite outcome where and .
to quantify ACE. Here we consider the ACE on the survival probability scale and define the finite sample ACE as
When the finite sample is considered as randomly sampled from a super-population, we can define the super-population ACE (Imbens and Rubin, 2015) as
The randomness in the estimate of the population ACE involves two components: the variation of the design matrix, and the conditional variance given a fixed design where is fixed. In this work, we focus on the ACE when the covariate distribution is fixed, i.e.,
Ignoring the variability in , the two remaining sources of variation are from the potential outcome and the assignment mechanism.
Under the randomization assumption, we know can be consistently estimated without adjustment for covariate if we observe all ’s, i.e.,
So an estimated version using inverse probability censoring weighting (IPCW) for leads to a crude estimator
where and . Here and is a consistent estimator of . Under the random censoring assumption, we can estimate using where is the Kaplan-Meier estimator (Kaplan and Meier, 1958) for . When the censoring depends on or , we could use a kernel-based Kaplan-Meier estimator as in Uno et al. (2007). When , we have , so the above quantity can be further simplified as
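Because the display equations and symbols of this section are not reproduced here, the following minimal numpy sketch uses its own hypothetical names (`km_survival`, `ipcw_survival`, `crude_ace`, and a treatment indicator `treat`) to illustrate the crude IPCW estimator with a Kaplan-Meier estimate of the censoring distribution. It is an illustration under the random censoring assumption, not the authors' code.

```python
import numpy as np

def km_survival(t_query, time, event):
    """Kaplan-Meier survival probability at t_query (right-continuous)."""
    order = np.lexsort((1 - event, time))     # sort by time; events before censorings at ties
    t, d = time[order], event[order]
    at_risk = len(t) - np.arange(len(t))      # size of the risk set at each ordered time
    surv = np.cumprod(1.0 - d / at_risk)
    idx = np.searchsorted(t, t_query, side="right") - 1
    return 1.0 if idx < 0 else surv[idx]

def ipcw_survival(t0, time, event):
    """IPCW estimate of P(T > t0): observed survivors past t0, reweighted
    by the KM estimate of the censoring survival function G(t0)."""
    G = km_survival(t0, time, 1 - event)      # censoring KM: flip the event indicator
    return np.mean(time > t0) / G

def crude_ace(t0, time, event, treat):
    """Crude IPCW estimator of S1(t0) - S0(t0) in a randomized trial."""
    s1 = ipcw_survival(t0, time[treat == 1], event[treat == 1])
    s0 = ipcw_survival(t0, time[treat == 0], event[treat == 0])
    return s1 - s0
```

With exponential event and censoring times, the estimate recovers the true survival difference up to sampling noise.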
Another way to estimate is based on the representation:
By the fixed design, is just the empirical distribution of , so we have the pure model based estimator as below:
where is an estimator of and is defined as . When a nonparametric model is used to estimate , we usually consider the leave-one-out prediction, and then we have
Here is any estimator for that does not depend on the -th training data such as Cox model (Cox, 1972), penalized survival regression model (such as Lasso (Tibshirani, 1997)) or survival random forest (Ishwaran et al., 2008, 2010).
Alternatively, when there is no censoring, a model imputation based estimator could be written as
For a semi-parametric model such as the Cox model, there is no tuning parameter to choose, and including the -th individual when fitting the model is asymptotically equivalent to fitting without the -th individual. However, for high-dimensional regression models such as the random forest, it is necessary to use the out-of-bag prediction to remove the strong correlation between and .
In the linear regression setting, Wager et al. (2016) studied the performance of the doubly robust-type estimator (Bang and Robins, 2005) when the covariate effects are modeled with a high-dimensional regression technique via the random forest (Breiman, 2001). We define its counterpart in the survival scenario as
Using the inverse probability censoring weighting (IPCW) technique (Robins and Finkelstein, 2000), we modify to obtain two estimators based on the observable data:
Here can be viewed as using the model prediction to estimate the individual causal effect among the non-censored individuals and then using inverse probability censoring weighting (IPCW) to obtain the adjustment needed for these groups. Although has a form similar to , unlike the low dimensional version, it does not have the double robustness property in the usual sense. Here we do need the adjustment to be correct for the consistency of , i.e., needs to be a risk-consistent estimator.
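Since the paper's display equations are not reproduced here, the exact form of its doubly robust-type estimator cannot be restated; as a hedged illustration, one common augmented form combines a model-based term with IPCW-weighted residual corrections in each arm. All names below (`dr_ace`, `m1`, `m0`, `G`) are hypothetical:

```python
import numpy as np

def dr_ace(t0, time, event, treat, m1, m0, G):
    """Sketch of a doubly-robust-type estimator of S1(t0) - S0(t0):
    model-based term plus IPCW-weighted residual corrections per arm.
    m1, m0: predicted survival probabilities S(t0 | a=1, X), S(t0 | a=0, X);
    G: estimated censoring survival evaluated at min(time, t0) per subject."""
    surv_ind = (time > t0).astype(float)
    # the survival status at t0 is known if the subject died (event) or is still at risk
    observed = (event == 1) | (time > t0)
    w = observed / G                          # IPCW weights
    model_term = np.mean(m1 - m0)
    corr1 = np.sum(treat * w * (surv_ind - m1)) / np.sum(treat * w)
    corr0 = np.sum((1 - treat) * w * (surv_ind - m0)) / np.sum((1 - treat) * w)
    return model_term + corr1 - corr0
```

When the model predictions `m1`, `m0` are correct, the correction terms are centered near zero and the estimator tracks the true survival difference.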
To study the performance of , , , , several assumptions are involved beyond the well-known ones such as SUTVA and consistency. Here we give their formal definitions before building the theoretical results for these adjusted estimators.
Assumption 1 (Random censoring): We assume is independent of . Also, we assume that for the time of interest , there exist a time point and a positive number such that and .
Assumption 2 (Risk-consistency): An estimator is ‘Risk-consistent’ to if
uniformly over , for some with .
Assumption 3 (Jackknife-compatible): An estimator is called ‘jackknife-compatible’ if the expected jackknife estimator (Efron and Stein, 1981) of the variance process for converges to 0 uniformly over and .
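Assumption 3 refers to the jackknife estimator of variance of Efron and Stein (1981). A minimal sketch of that estimator for a generic statistic of an i.i.d. sample (function names are ours) is:

```python
import numpy as np

def jackknife_variance(stat, data):
    """Efron-Stein jackknife variance estimate of stat(data):
    (n-1)/n times the sum of squared deviations of the
    leave-one-out statistics around their mean."""
    n = len(data)
    loo = np.array([stat(np.delete(data, i)) for i in range(n)])
    return (n - 1) / n * np.sum((loo - loo.mean()) ** 2)
```

For the sample mean, this reproduces the usual variance estimate s²/n exactly, which is a convenient sanity check.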
Under Assumption 1, we know that the estimated IPCW weights are bounded away from 0 and that is uniformly consistent for over . Combined with the consistency assumption on , we are able to obtain the consistency of our proposed estimators.
Theorem 1: Assume that holds by design and Assumption 1 holds; then the crude estimator is consistent. If Assumptions 2 and 3 also hold, then the model based estimator and the high dimensional adjustment estimators , are uniformly consistent for for .
To further obtain asymptotic normality, we need additional assumptions for and , while the limiting distribution of is in general hard to obtain and might not be Gaussian. For example, when the Lasso estimator is used for , Knight and Fu (2000) showed that the limiting distribution is not tractable and can have mass at 0. So we will not discuss the asymptotic results for . Although also appears in the estimators and , its bias cancels thanks to the randomization of , as shown in the proof. Here we summarize the asymptotics for , and in the following two theorems. The proofs can be found in Web Appendix A.
Theorem 2: Assume that holds by design and Assumption 1 holds; then for the proposed estimator , weakly converges to a Gaussian process with mean and covariance process , whose form and a consistent estimator can be found in Web Appendix A.
Theorem 3: Assume that holds by design and Assumptions 1-3 hold; then the proposed estimators , satisfy that and weakly converge to Gaussian processes with mean and covariance processes and , whose forms and consistent estimators can be found in Web Appendix A.
In practice, for and , the asymptotic variance due to IPCW is often smaller than that due to the variation in . We then have the following results comparing the efficiency of the three estimators , and .
Theorem 4: Assume that holds by design and Assumptions 1-3 hold; then the estimators , are asymptotically more efficient than in the sense that, for any fixed non-negative weight function , holds asymptotically. The second equality holds only when has no effect on , i.e., with probability 1 for all where is non-zero. The proof can be found in Web Appendix A.
2.4 Survival Random Forest
Above, we established the performance of and under Assumptions 1-3. To make Assumptions 2-3 hold, we use the survival random forest (SRF) estimator (Ishwaran et al., 2008) as our nonparametric estimator for . Here we briefly review the algorithm used to obtain .
Like the traditional random forest, the SRF is an ensemble of multiple trees obtained through bootstrap samples. To be more specific, bootstrap samples are drawn from the original data and one survival tree is grown from each sample. For each tree, at each node, we randomly select candidate variables and split the node using the candidate variable that maximizes the survival difference between the children nodes. We then grow the tree to full size under the constraint that a terminal node should have no fewer than unique deaths. At each terminal node, the survival function is calculated using the Kaplan-Meier estimate. The average survival function over all out-of-bag data (bootstrap samples that do not contain sample ) is calculated as .
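The node-splitting step above scores candidate splits by the survival difference between children nodes, commonly via the two-sample log-rank statistic. The following self-contained numpy sketch (hypothetical names; not the authors' or the `randomForestSRC` implementation) shows that criterion for one candidate split:

```python
import numpy as np

def logrank_stat(time, event, group):
    """Two-sample log-rank Z-statistic used to score a candidate split:
    a larger |Z| indicates a bigger survival difference between the
    two child nodes defined by the boolean array `group`."""
    O = E = V = 0.0
    for t in np.unique(time[event == 1]):     # distinct event times
        at_risk = time >= t
        n_t = float(at_risk.sum())            # risk set size
        n1 = float((at_risk & group).sum())   # risk set size in group 1
        d = float(np.sum((time == t) & (event == 1)))          # deaths at t
        d1 = float(np.sum((time == t) & (event == 1) & group)) # deaths at t in group 1
        O += d1
        E += d * n1 / n_t
        if n_t > 1:
            V += d * (n1 / n_t) * (1 - n1 / n_t) * (n_t - d) / (n_t - 1)
    return (O - E) / np.sqrt(V) if V > 0 else 0.0
```

A split separating two groups with very different hazards yields a large |Z|, while a null split gives |Z| on the order of a standard normal.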
Under the assumption of random censoring and with all predictors categorical, Ishwaran et al. (2008) showed that the SRF provides a consistent estimator. Also, given the out-of-bag prediction, as with the standard random forest, the jackknife-compatibility assumption holds (Wager et al., 2016).
We study the performance of our proposed estimators in both low dimensional and high dimensional cases. For the high-dimensional adjustment model, we experiment with both SRF and Lasso-type estimators.
We set , and the number of variables () and the true sparsity () as , or . We generate from a multivariate normal distribution with a correlation structure whose autocorrelation parameter is set at . The survival time and are independently sampled from a Cox model with hazard functions and , where and for indicate the covariate effects and , represent the overall effect magnitudes. We use a censoring distribution to obtain an approximately event rate. We set the of interest as the median observation time calculated from a large sample (N=50,000) of observed , pooling the treatment and control groups. We vary from 0 to 1, and let or and with changing from 0 to 1. For each data setting, we perform 100 simulations to study the performance of the four estimators, , , , , defined in (5), (7), (8) and (9), paired with three different models (Cox, Lasso and Random Forest). Since the asymptotic distribution is intractable for some estimators, we use the bootstrap method to construct the nominal 95% confidence intervals and evaluate their empirical coverage rate (CR) and power (proportion rejected at significance level 0.05) in finite sample cases.
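A data-generating sketch for this kind of simulation design (a Cox model with an exponential baseline hazard, autocorrelated normal covariates and exponential censoring) can be written via inverse-transform sampling. All parameter names below are ours, since the paper's symbols are not reproduced here:

```python
import numpy as np

def simulate_cox(n, p, beta, lam0=1.0, cens_rate=0.3, rho=0.5, seed=0):
    """Generate (time, event, X) from a Cox model with exponential baseline
    hazard lam0, AR(1)-correlated normal covariates (Sigma_jk = rho^|j-k|)
    and independent exponential censoring."""
    rng = np.random.default_rng(seed)
    idx = np.arange(p)
    Sigma = rho ** np.abs(idx[:, None] - idx[None, :])
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    # inverse-transform sampling: T = -log(U) / (lam0 * exp(X @ beta))
    U = rng.uniform(size=n)
    T = -np.log(U) / (lam0 * np.exp(X @ beta))
    C = rng.exponential(1.0 / cens_rate, size=n)
    return np.minimum(T, C), (T <= C).astype(int), X
```

Tuning `cens_rate` relative to `lam0` controls the event rate, mirroring how the censoring distribution is chosen in the simulation above.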
For the high-dimensional settings, we focus only on the Lasso and Random Forest adjustments, because the Cox adjustment is not applicable. In figure 1, we show the power curves of the four estimators from the two different models (Lasso and Random Forest) when there is no effect () and when there is an effect (), under the high dimensional setting with sparsity, i.e., , and the high dimensional setting without sparsity, i.e., . From figure 1, the performance of Random Forest and Lasso is similar; outperforms and whether or not the covariate effect exists. When there is no covariate effect, has some slight power loss, but it is better than when covariate effects exist. The power of lies between those of and , which is consistent with our theoretical findings.
In figure 2, we show the relative MSEs of the four estimators, , , , , with respect to the change of the effect size of () when there is no interaction (left: ) or when there are interactions (right: , ), under the high-dimensional settings () for . From figure 2, we see that and are both less efficient than . The relative MSEs of the adjustment methods , and all decrease as the covariate effect increases. Similar results hold for other 's.
In Web Appendix B, we show additional simulation results comparing the Cox, Lasso and Random Forest adjustments in low dimensional settings. In figure S1, we show the power change of the four estimators paired with the three different models with respect to the change of the effect size, with or without the covariate effect, under the low dimensional setting where . We see that the overall performance of the three different models is about the same; for all of them, and perform better than whether or not the covariate effect exists. When there is no covariate effect, has some slight power loss, but it is better than when covariate effects exist.
In figure S2, we show the relative MSEs of the four estimators from the three different models with respect to the change of the covariate effect (), with interaction () or without interaction (), under the low dimensional setting with . From figure S2, we see results similar to those in the high dimensional settings. Similar results hold for other 's.
For some representative settings, we report the bias, SD, mean estimated standard error (ESE), and CR of these estimators in table 1. From the table, we can see that all estimators are approximately unbiased with CRs close to the nominal level, and the ESEs are close to the SDs under all scenarios. The Cox model usually does not converge when the number of covariates is large, so we do not present its results.
| (0.5, 10, 10, 0, 0) | 0.014 | 0.126 | 0.118 | 1.000 | 0.94 | 0.013 | 0.125 | 0.118 | 1.000 | 0.94 |
| (0.5, 10, 10, 0.5, 0.5) | 0.007 | 0.109 | 0.111 | 1.000 | 0.95 | 0.007 | 0.109 | 0.111 | 1.000 | 0.95 |
| (0, 50, 10, 0, 0) | 0.032 | 0.131 | 0.126 | 1.000 | 0.90 | 0.032 | 0.131 | 0.126 | 1.000 | 0.90 |
| (0.5, 50, 10, 0, 0) | 0.013 | 0.125 | 0.118 | 1.000 | 0.94 | 0.013 | 0.125 | 0.118 | 1.000 | 0.94 |
In conclusion, with low dimensional covariate information, adjustment by regression without variable selection improves the MSE, while when the number of covariates increases, the Cox model does not converge and therefore variable selection methods like the Lasso and Random Forest need to be considered for the adjustment. The proposed estimator with random forest adjustment outperforms in all scenarios when compared with and in terms of both power and relative efficiency, which is consistent with our theoretical results. Also, has performance similar to in the data settings we examined. Given the availability of its asymptotic properties, we recommend the use of with Random Forest.
4 Real data example
We applied our method to a real data example. Our data are from the Center for International Blood and Marrow Transplant Research (CIBMTR), on high risk Philadelphia-negative acute lymphoblastic leukemia (ALL) patients aged 16 or older who underwent allogeneic hematopoietic stem cell transplantation (allo-HCT) in first complete remission (CR1) or second complete remission (CR2) between 1995 and 2011. The CIBMTR is comprised of clinical and basic scientists who share data on their blood and bone marrow transplant patients, with the CIBMTR Data Collection Center located at the Medical College of Wisconsin. The CIBMTR has a repository of information regarding the results of transplants at more than 450 transplant centers worldwide. Allo-HCT is a potentially life-saving therapy for high risk ALL patients. We compare the overall survival probability between human leukocyte antigen (HLA) identical sibling donors (SIB) and 8/8 HLA-matched unrelated donors (MURD). It is impractical or impossible to conduct a randomized clinical trial for such a comparison, since SIB donors are available for only 20-30% of all eligible patients. For illustrative purposes, we use a balancing propensity score approach to mimic a randomized trial. The data set used for our example with complete information consists of 523 SIB patients and 210 MURD patients. The variables considered in the propensity score modeling and the mimicking-trial matching include cytogenetic abnormality (cytoabnorm), conditioning regimen (condtbi), Karnofsky score (kps), graft-versus-host disease (GVHD) prophylaxis (gvhdgpc), white blood count (wbcdxgp), graft type (graftype), patient age (age) and year of transplantation (yeargp). Our pseudo-randomized trial cohorts consist of 156 SIB patients and 156 MURD patients, and all the above adjusted variables are fully balanced between cohorts.
We leave specific cytogenetic risk categories unmatched due to their rare incidence (see table S1 in Web Appendix C for the detailed cytogenetic abnormality risk category list). The main purpose of this example is to demonstrate the efficiency improvement of the proposed estimators with adjustment for high dimensional covariates. Our main predictor of interest is MURD. The short covariate list includes condtbi, yeargp, del1q, trisx, t411. The medium list further includes kps, age, graftype, gvhdgpc, wbcdxgp. The long list further includes information on add5qdel5q, add12pdel12p, del7qm7, m17i17qdel17p, add7pi7q, tratriphyper, lohyperpo, add9pdel9p, t119, t1011, t1119, del6q, del11q, tris8, complex, lohyper, hihyper, tetra, mk, axcomplex, m13, tris6, tris10, tris22, add14q32, del1q, del17p.
For each list type, we compare the four estimators under the Cox, Lasso and Random Forest adjustments. We select the time points from to months since transplantation, and the estimated and corresponding point-wise 95% confidence intervals for the crude estimator and the random-forest adjusted estimator , with the short and long covariate lists used for adjustment, are shown in figure 3. We also compute the average over the period, and the results are shown in table 2. From the results, we can see that the adjusted analysis provides narrower confidence intervals in both the low and high-dimensional settings compared with the crude estimator.
|                        | Cox Adjustment | Lasso Adjustment | Random Forest Adjustment |
| Short List Adjustment  |                |                  |                          |
| Medium List Adjustment |                |                  |                          |
| Long List Adjustment   |                |                  |                          |
In this work, we studied how to effectively use high-dimensional covariate information to reduce the variance of the estimation of the ACE in randomized trials. Both the simulation studies and the real data example illustrate the efficiency gain compared with the crude IPCW estimator. The proposed estimators depend neither on a semiparametric model (e.g., Cox) for the survival outcome nor on the assumption of a homogeneous effect across the population.
One strong assumption we made is random censoring. This assumption ensures that (1) the model for the inverse probability of censoring part is correctly specified and (2) the random forest estimator satisfies the risk-consistency requirement. In general, as long as we can find a high-dimensional estimator that satisfies the risk-consistency requirement, we can use a semiparametric model to estimate the censoring probability and extend our results to dependent censoring.
We would also like to point out that, to keep our asymptotic results clean, the augmentation part of our proposed estimator is not optimal. The use of an augmented term similar to Tsiatis (2006) and Lok, Yang, Sharkey and Hughes (2014) could be considered to increase efficiency with the augmented part . Although known to be most efficient when the true is used, it is unclear whether an augmentation part estimated from a high-dimensional regression model will still achieve such efficiency. Because the conditional expectation follows a nonparametric model, the asymptotic behavior of such an estimator is nontrivial. Given that our simple estimator already shows a decent improvement in relative efficiency, we choose not to use the augmented IPCW estimator in its optimal form in this work.
In Web Appendix A of the Supplementary Materials, we provide the proofs of Theorems 1-4. In Web Appendix B, we provide additional simulation results (Figures S1 and S2). In Web Appendix C, we provide the full description of the cytogenetic abnormalities (Table S1).
- Bang and Robins (2005) Bang, H. and Robins, J. M. (2005) Doubly robust estimation in missing data and causal inference models. Biometrics 61, 962-973.
- Belloni, Chernozhukov and Hansen (2014) Belloni, A., Chernozhukov, V., and Hansen, C. (2014) Inference on treatment effects after selection among high-dimensional controls. The Review of Economic Studies 81, 608–650.
- Bloniarz et al. (2016) Bloniarz, A., Liu, H., Zhang, C-H., Sekhon, J. S., and Yu, B. (2016) Lasso adjustments of treatment effect estimates in randomized experiments. PNAS 113, 7383-7390.
- Breiman (2001) Breiman, L. (2001) Random forests. Machine Learning 45, 5-32.
- Chalmers et al. (1981) Chalmers, T. C., Smith, H. Jr., Blackburn, B., Silverman, B., Schroeder, B., Reitman, D., and Ambroz, A. (1981) A method for assessing the quality of a randomized control trial. Controlled Clinical Trials 2, 31–49.
- Chen and Tsiatis (2001) Chen, P. and Tsiatis A. A. (2001) Causal inference on the difference of the restricted mean life between two groups. Biometrics 57, 1030-1038.
- Cole and Hernan (2004) Cole, S. R. and Hernan, M. A. (2004) Adjusted survival curves with inverse probability weights. Computer Methods and Programs in Biomedicine 75, 45–49.
- Cole and Frangakis (2009) Cole, S. R. and Frangakis, C. E. (2009) The consistency statement in causal inference: a definition or an assumption. Epidemiology 20, 3-5.
- Cox (1972) Cox, D. R. (1972) Regression models and life-tables (with discussion). Journal of the Royal Statistical Society: Series B 34, 187-202.
- Efron and Stein (1981) Efron, B. and Stein, C. (1981) The jackknife estimate of variance. The Annals of Statistics 9, 586–596.
- Fisher (1925) Fisher, R. (1925) Statistical Methods for Research Workers Oliver and Boyd, Edinburgh, UK.
- Freedman (2008) Freedman, D. (2008) On regression adjustments in experiments with several treatments. Annals of Applied Statistics 2, 176–196.
- Hernan (2010) Hernan, M. A. (2010) The hazards of hazard ratios. Epidemiology 21, 13-15.
- Imbens and Rubin (2015) Imbens, G. W. and Rubin, D. B. (2015) Causal Inference for Statistics, Social, and Biomedical Sciences. Cambridge University Press, New York.
- Ishwaran et al. (2008) Ishwaran, H., Kogalur, U. B., Blackstone, E. H., and Lauer, M. S. (2008) Random survival forests. The Annals of Applied Statistics 2, 841-860.
- Ishwaran et al. (2010) Ishwaran, H., Kogalur, U. B., Gorodeski, E. Z., Minn, A. J., and Lauer, M.S. (2010) High-dimensional variable selection for survival data. Journal of the American Statistical Association 105, 205-217.
- Kaplan and Meier (1958) Kaplan, E. L. and Meier, P. (1958) Nonparametric estimation from incomplete observations. Journal of the American Statistical Association 53, 457-481.
- Knight and Fu (2000) Knight, K. and Fu, W. (2000) Asymptotics for Lasso-type estimators. Annals of Statistics 28, 1356-1378.
- Lei and Ding (2018) Lei, L. and Ding, P. (2018) Regression adjustment in randomized experiments with a diverging number of covariates. arXiv:1806.07585v2.
- Lin, Li and Li (2014) Lin, H., Li, Y., and Li, G. (2014) A semiparametric linear transformation model to estimate causal effects for survival data. Canadian Journal of Statistics 42, 18-35.
- Lok, Yang, Sharkey and Hughes (2014) Lok, J., Yang, S., Sharkey, B., and Hughes M.D. (2018) Estimation of the cumulative incidence function under multiple dependent and independent censoring mechanisms. Lifetime Data Analysis 24, 201-223.
- Robins and Finkelstein (2000) Robins, J. M. and Finkelstein, D. M. (2000) Correcting for noncompliance and dependent censoring in an AIDS clinical trial with inverse probability of censoring weighted (IPCW) log-rank tests. Biometrics 56, 779-788.
- Rosenbaum (1987) Rosenbaum, P. R. (1987) Model-based direct adjustment. Journal of the American Statistical Association 82, 387–394.
- Rosenbaum (2002) Rosenbaum, P. R. (2002) Covariance adjustment in randomized experiments and observational studies. Statistical Science 17, 286–327.
- Rubin (1974) Rubin, D. B. (1974) Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66, 688–701.
- Rubin (1978) Rubin, D. B. (1978) Bayesian inference for causal effects: The role of randomization. Annals of Statistics 6, 34-58.
- Tibshirani (1997) Tibshirani, R. (1997) The lasso method for variable selection in the Cox model. Statistics in Medicine 16, 385-395.
- Tsiatis (2006) Tsiatis, A.A. (2006) Semiparametric Theory and Missing Data. Springer, Berlin.
- Uno et al. (2007) Uno, H., Cai, T., Tian, L., and Wei, L. (2007) Evaluating prediction rules for -year survivors with censored regression models. Journal of the American Statistical Association 102, 527-537.
- VanderWeele (2011) VanderWeele, T. J. (2011) Causal mediation analysis with survival data. Epidemiology 22, 582-585.
- Wager, Hastie and Efron (2014) Wager, S., Hastie, T., and Efron, B. (2014) Confidence intervals for random forests: the jackknife and the infinitesimal jackknife. Journal of Machine Learning Research 15, 1625-1651.
- Wager et al. (2016) Wager, S., Du, W., Taylor, J., and Tibshirani, R. J. (2016) High-dimensional regression adjustments in randomized experiments. PNAS 113, 12673-12678.
Web Appendix A:
Proof of Theorem 1:
For the unadjusted estimator , since and are uniformly consistent, for the naive estimator we have
For the model based estimator , we have
For , we have . For , we have
So . Also, we have
So for , we have . This finishes the proof of the consistency of our proposed estimators.
Proof of Theorem 2:
For the crude estimator , we have