1 Introduction
Treebased methods are popular alternatives to semiparametric and parametric methods. The basic idea of trees is to partition the covariate space into subsets (nodes) where individuals belonging to the same node are alike regarding the outcome of interest. In the setting of classification and regression trees (CART) (Breiman et al., 1984), this can be achieved by a topdown and greedy approach that aims to minimize a measure of node impurity and the sum of squared deviations from the node mean, respectively; then a costcomplexity pruning algorithm is applied to select a rightsized tree. Moreover, trees are ideal candidates for ensemble methods such as bagging (Breiman, 1996) random forests (Breiman, 2001).
With the increasing focus on personalized risk prediction, tree-based methods for time-to-event data have received much attention. Survival trees have a few appealing features that make them a useful addition to conventional survival analysis. First, partitions arise naturally in many applications where individuals in the same group share similar event risks, and the tree structure retains this interpretability. For example, a single survival tree can identify different prognostic groups, so that treatment or prevention strategies can be tailored for patients with different failure risks. Second, survival trees serve as building blocks for ensemble methods and can easily be transformed into powerful risk prediction tools. Gordon and Olshen (1985) first adapted the idea of CART to right-censored survival data and defined the node impurity as the minimum Wasserstein distance between the Kaplan-Meier curve of the current node and that of a pure node. Other works that proposed a within-node impurity measure or an estimate of error include Davis and Anderson (1989), LeBlanc and Crowley (1992), Zhang (1995), Jin et al. (2004), Molinaro et al. (2004) and Steingrimsson et al. (2016). As an alternative to maximizing within-node homogeneity, Segal (1988) and LeBlanc and Crowley (1993), among others, adopted between-node separation approaches that quantify the dissimilarity between two child nodes using the log-rank statistic; other possible splitting criteria include Harrell's C-statistic (Schmid et al., 2016) and the integrated absolute difference between survival functions (Moradian et al., 2017). Readers are referred to Bou-Hamad et al. (2011b) for a comprehensive review of survival trees. In practice, a small perturbation in the training data may result in a large change in a tree-based prediction function. To address the instability of a single tree and improve prediction accuracy, researchers, including Hothorn et al. (2006), Ishwaran et al. (2008), Zhu and Kosorok (2012), Schmid et al. (2016) and Steingrimsson et al. (2018), have developed ensemble methods that enlarge the class of models and yield reduced variance.
In many applications, the use of time-dependent covariates offers opportunities for exploring the association between a failure event and risk factors that change over time. Applying standard methods such as the Cox model is technically challenging in the choice of covariate form and can often yield biased estimation (Fisher and Lin, 1999). In contrast, tree-based methods allow the event risk to depend on the covariates in a flexible way and can thus circumvent the difficulty of specifying the functional form of the time-dependence. However, the aforementioned survival tree methods only handle baseline covariates and cannot be directly applied. Bacchetti and Segal (1995) incorporated time-dependent covariates by using "pseudo-subjects", where the survival experience of one subject is viewed as the survival experiences of multiple pseudo-subjects on non-overlapping intervals, and the survival probability estimated with the truncation product-limit estimator (Wang et al., 1986) is reported as the node summary. The pseudo-subject idea was employed by most of the existing works dealing with time-dependent covariates (Huang et al., 1998; Bou-Hamad et al., 2011a; Wallace, 2014; Fu and Simonoff, 2017). However, the pseudo-subject approach has practical limitations because one subject can be classified into multiple nodes of a tree, leading to a loss of simple interpretation and possibly ambiguous prediction.
In this article, we propose a unified framework for tree-structured analysis with right-censored survival outcomes and possibly time-dependent covariates. Existing survival trees are typically implemented via a greedy algorithm that maximizes within-node homogeneity or between-node heterogeneity, which is not directly related to prediction accuracy. We define time-dependent ROC curves for survival trees using the idea of a randomized test and show that the target hazard function yields the highest ROC curve. The optimality of the target hazard function motivates us to use ROC-related summary measures for (i) evaluating the prediction performance of survival trees and (ii) guiding the tree-growing procedure. We further propose a novel risk prediction forest, where the ensemble is built on unbiased martingale estimating equations. Our forest algorithm shows great potential for reducing bias compared to algorithms that directly average node summaries such as Kaplan-Meier and Nelson-Aalen estimates.
The article is organized as follows. In Section 2, we rigorously define ROC curves that quantify the prediction performance of survival tree models. In Section 3, we develop ROC-guided splitting and pruning procedures for a right-censored survival outcome with time-dependent covariates. In Section 4, we extend the proposed survival trees to random forests to improve prediction accuracy. In Section 5, simulation studies are conducted to examine the performance of the proposed methods. In Section 6, the methods are applied to an AIDS study for illustration. We conclude the paper with a discussion in Section 7.
2 Evaluating Survival Trees with ROC Curves
In this section, we introduce a survival tree framework that can incorporate time-dependent covariates, and propose the use of ROC curves to evaluate prediction performance. The conventional definition of the incident/dynamic time-dependent ROC curve (Heagerty and Zheng, 2005) is appropriate for quantifying the prediction performance of a continuous disease marker, but may not be applicable to discrete-valued risk scores derived from a tree. For a discrete-valued marker, the ROC curve at a given time degenerates to a finite set of points, so important summary measures such as the area under the curve (AUC) are not well defined. We fill this gap by generalizing the definition of time-dependent ROC curves to evaluate continuous, discrete and mixed types of markers.
2.1 Survival trees that partition the survivor population
Suppose the outcome of interest is a continuous survival time and consider a vector of possibly time-dependent covariates. Denote the hazard function of the survival time given the covariate history; the hazard function characterizes the instantaneous risk of failure at each time among survivors. At a given time, consider the covariate space of the covariates in the survivor population (i.e., the subpopulation of subjects who have not yet failed). We consider a tree that divides the survivor population into subgroups according to a partition of this covariate space. The elements of the partition are disjoint subsets whose union is the whole space, and are called terminal nodes of the tree. A subject enters a terminal node at a given time if the subject is still at risk and its current covariate value falls in that node. For ease of discussion, we restrict attention to a fixed time interval and assume the covariate space does not change over this interval, so that the same partition can be applied at all time points. Since the covariates can vary over time, one subject can enter different terminal nodes at different time points. The partition induces the following model for the hazard function,
(1) 
where subjects in the same terminal node have the same hazard, given by the node-specific hazard function. Define the partition function that maps each covariate value to the terminal node it belongs to, so that the partition-based hazard at a given time is the hazard of the node the subject currently occupies. Under our framework, the time-dependent covariates are handled similarly to the Cox model, in the sense that the hazard at each time depends on the covariate values at that time. On the one hand, a time-invariant partition allows a sparse model and easy interpretation of the decision rule; on the other hand, as the number of terminal nodes increases, the partition-based hazard function approximates the true hazard function. In Section 3.3, we show that the time-invariant rule is asymptotically valid. Compared with the Cox model, which assumes multiplicative covariate effects and requires a correct choice of the functional form of the time-dependent covariates (Fisher and Lin, 1999), the partition-based hazard function can be more robust to model misspecification.
For a fixed partition, we consider the estimation of the node-specific hazards with right-censored survival data. Let the observed survival time be the minimum of the survival and censoring times, let the failure event indicator denote whether the failure was observed, and denote the covariate history up to the observed time. The observed training data are assumed to be independent and identically distributed replicates of these quantities. By assuming independent censoring within each terminal node, we have
(2) 
where the expectation defines the node-specific at-risk probability. Define the observed counting process for each subject. We estimate the node-specific hazard by the following kernel-type estimator,
where the kernel is a second-order kernel function with compact support and the bandwidth is a tuning parameter. When the time point lies within one bandwidth of the origin, we either truncate the kernel or use the second-order boundary kernel of Müller (1991) to avoid biased estimation in the boundary region. For a fixed node, the estimator is consistent as the sample size increases and the bandwidth shrinks at an appropriate rate. Using the empirical estimates from the training data, the node-specific hazard, and hence the partition-based hazard function, can be estimated.
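To make the kernel-type estimator concrete, the following Python sketch computes a smoothed node-specific hazard: a kernel-weighted count of observed failures divided by the number of node members at risk. The Epanechnikov kernel, the fixed node-membership vector, and all function names are illustrative assumptions rather than the paper's implementation (in particular, with time-dependent covariates the node membership would be evaluated along each subject's covariate path).

```python
import numpy as np

def epanechnikov(u):
    # Second-order kernel with support on [-1, 1]
    return 0.75 * (1.0 - u**2) * (np.abs(u) <= 1)

def node_hazard(t, times, events, in_node, h):
    """Kernel-smoothed hazard estimate within one terminal node.

    times   : observed (possibly censored) survival times
    events  : 1 = failure observed, 0 = censored
    in_node : boolean, whether the subject belongs to the node
              (a fixed vector here for simplicity)
    h       : bandwidth parameter
    """
    # Smoothed mass of observed failures in the node near time t ...
    num = np.sum(epanechnikov((t - times) / h) / h * events * in_node)
    # ... divided by the number of node members still at risk at t.
    den = np.sum((times >= t) * in_node)
    return num / den if den > 0 else 0.0
```

Near the origin, a boundary-corrected kernel (Müller, 1991) would replace `epanechnikov` to avoid the boundary bias noted in the text.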
If the domain of the time-dependent covariates among survivors changes over time, we can transform the covariates onto a fixed domain via a one-to-one function, construct the tree on the transformed covariates, and recover the hazard on the original scale through the transformation. For example, for each covariate, its empirical cumulative distribution function in the at-risk population is a possible choice of transformation. After transformation, the partition is time-varying on the original scale of the covariates.

2.2 Generalized ROC curves for survival trees
Since different tree algorithms usually generate different partitions, a natural question is how the accuracy of partition-based hazard functions can be properly measured. We propose the use of time-dependent ROC curves for this evaluation. Heuristically, at each time, the ROC curve is defined on the survivor population, where a subject is considered a case if failure occurs at that time and a control otherwise. This corresponds to the incident/dynamic ROC curve of Heagerty and Zheng (2005). Consider a scalar risk score that summarizes the covariate information, with a larger value being more indicative of failure. Following Heagerty and Zheng (2005), the false positive rate and the true positive rate are defined among controls and cases, respectively, and the ROC function maps the former to the latter. In the literature on binary classification, it has been recognized that, when predicting a binary disease outcome with multiple disease markers, the risk score (i.e., the probability of disease given the markers) yields the highest ROC curve (Green and Swets, 1966; McIntosh and Pepe, 2002). For survival outcomes, the hazard can be viewed as an analog of the risk score. Following the arguments of the Neyman-Pearson lemma, it can be shown that using the true hazard as the risk score yields the highest ROC curve, so a score with a higher curve is desired.

With a finite number of terminal nodes, the partition-based hazard at each time is a discrete-valued function of the covariates, so the ROC function becomes a finite set of points rather than a continuous curve. More generally, if the risk score has a point mass, the ROC function is not defined everywhere. In this case, we construct a continuous curve via linear interpolation. Specifically, each point on the interpolated curve corresponds to a randomized prediction rule: predict a control below the threshold, predict a case above the threshold, and predict a case with a given probability at the threshold. In the special case where the risk score is continuous, the interpolated curve reduces to the usual ROC curve. The formal definition is given in the Appendix. We establish the optimality of the true hazard function with respect to the interpolated ROC curve in Proposition 1; the proof is given in the Supplementary Material. Our result suggests that the interpolated curve can be used to evaluate the predictive ability of a risk score, and a higher curve is favorable.

Proposition 1
At each time, among all scalar risk score functions, the true hazard function is optimal in the sense that it yields the highest interpolated ROC curve.
The area under the interpolated ROC curve has the interpretation of a concordance measure (Pepe, 2003). For survival time data, it can be shown that the AUC at each time is equivalent to
where the two subjects in the comparison are i.i.d. replicates of the data. Based on Proposition 1, the AUC can be used as a summary measure to evaluate the predictive ability of a risk score at each time. It is worth pointing out that other summary measures of ROC curves, such as the partial AUC and specific ROC points (Pepe, 2003), can also be employed. Here we focus on the AUC, and the results can be extended to other summary measures. To evaluate prediction over time, a global measure over the follow-up period is needed. For survivors at each time, the hazard-based risk score characterizes the instantaneous risk of failure. To derive a global measure, we integrate the time-dependent AUC with respect to a weight function and define
Following Proposition 1, the true hazard function maximizes CON. Motivated by this fact, we propose to use CON as guidance for building survival trees, with the goal of constructing a partition whose CON is as large as possible. A detailed discussion is given in Section 3.
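The global concordance CON admits a simple pairwise interpretation. As a rough illustration under the simplifying assumption of no censoring, the sketch below scores each pair in which one subject fails before the other (and before the horizon) by whether the earlier failure has the higher risk at its failure time, with ties counted as one half. The `risk[i, j]` matrix interface (subject j's risk score evaluated at subject i's failure time) is hypothetical and used purely for illustration.

```python
import numpy as np

def concordance(times, events, risk, tau):
    """Pairwise estimate of a global concordance in the absence of
    censoring: among pairs where subject i fails before subject j and
    before tau, score 1 if i has the higher risk at i's failure time,
    1/2 for a tie, and 0 otherwise.
    """
    num = den = 0.0
    n = len(times)
    for i in range(n):
        if not events[i] or times[i] >= tau:
            continue  # only observed failures before the horizon define pairs
        for j in range(n):
            if times[j] > times[i]:
                den += 1.0
                ri, rj = risk[i, i], risk[i, j]
                num += 1.0 if ri > rj else (0.5 if ri == rj else 0.0)
    return num / den if den > 0 else float("nan")
```

With a perfectly concordant score (risk decreasing in the failure time) the measure is 1; reversing the ordering gives 0.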
In practice, investigators can use their own weight functions to reflect the costs of misclassification at different time points. A simple example is a constant weight. Moreover, with the weight chosen to involve the marginal density and survival function of the failure time, we have
(3) 
which measures the probability that, for a randomly chosen pair of subjects, the one who fails earlier has the higher risk at that failure time. In the special case where the risk score is a continuous random variable that does not change over time, this quantity reduces to the global summary measure in Heagerty and Zheng (2005).

Harrell's C-statistic (Harrell et al., 1982) has been commonly used to quantify the capacity of a baseline risk score to discriminate among subjects with different event times. When the event time is subject to right censoring, however, the population parameter corresponding to Harrell's C-statistic depends on the study-specific censoring distribution. Uno et al. (2011) studied a modified C-statistic that is consistent for a population concordance measure free of censoring under a working Cox model, where the linear combination of covariates maximizes the limiting value of Uno's C-statistic. Without the Cox model assumption, however, it is not clear how to combine the covariates so that the limiting C-statistic is maximized. The proposed CON is defined differently, so that it is maximized by the true hazard function and is therefore appropriate for guiding the tree-building procedure.
3 ROCGuided Survival Trees
In this section, we develop tree-growing algorithms based on the time-dependent AUC and CON. We note that although the assumption
(4)
is adopted for establishing the large-sample properties of a grown tree, where the diameters of the terminal nodes tend to zero, stronger assumptions are often needed to justify the splitting criteria, especially in the early steps of the algorithm. For example, when selecting the optimal split at the root node, the log-rank splitting rule (Segal, 1988; LeBlanc and Crowley, 1993) implicitly assumes an independence condition within the child nodes, which is not guaranteed by (4). For ease of discussion, we assume independent censoring in Sections 3.1 and 3.2, and assume (4) in Section 3.3.
3.1 Splitting based on CON
We first consider the use of CON as the splitting criterion. At each step, we split the node that leads to the largest increase in the CON of the tree. Under independent censoring, the estimation of the time-dependent AUC is developed based on the following expression,
Therefore, replacing the unknown quantities with the corresponding estimators from Section 2.1 yields a consistent estimator of the concordance measure. To estimate CON, we integrate this estimator with respect to a weight function that possibly depends on the data.
Consider a partition and a split on one of its nodes, and denote the partition after splitting accordingly. Proposition 2 shows that CON is a proper splitting criterion and can detect differences in the hazards of the two child nodes. The proof is given in the Supplementary Material.
Proposition 2
Splitting does not decrease the concordance. Specifically, the time-dependent AUC after splitting is no smaller than before, with equality if and only if the two child nodes have the same hazard at that time. Moreover, CON does not decrease after splitting, with equality if and only if the child-node hazards coincide over the whole time interval.
Based on Proposition 2, the concordance of the tree does not decrease after a split, and equality holds if and only if the two child nodes have equal hazards almost everywhere. The optimal split is thus defined as the one maximizing the estimated concordance. Note that the validity of Proposition 2 does not depend on the censoring distribution. In practice, the estimator may not correctly estimate CON if the independent censoring assumption is violated, but the true concordance usually increases after splitting.
One may also consider an alternative splitting criterion in which the optimal split on a node is chosen to maximize the increase of CON within that node. For a given node split into two child nodes, it can be shown that the within-node concordance is 0.5 before splitting and exceeds 0.5 after splitting. Hence the increase in CON within the node after splitting is
Moreover, this increase can be estimated by
The rule based on the within-node increase can be viewed as maximizing a weighted average of the splitting gain over time. Compared with the log-rank splitting rule, which maximizes a weighted average of the hazard difference over time, the proposed rule encourages more balanced child nodes and can better detect differences, especially when the hazards of the child nodes cross each other.
3.2 Pruning
Although splitting increases the concordance, a large tree often overfits the data. Following Breiman et al. (1984), we continue splitting until the tree is fully grown, then prune it from the bottom up to find the most predictive subtree. As guidance for pruning, we define a concordance-complexity measure,
where the penalty is the product of a complexity parameter and the number of terminal nodes, and controls the size of the tree. For each value of the complexity parameter, the optimal subtree is defined as the subtree with the largest concordance-complexity value; if more than one subtree attains the largest value, we define the smallest one to be optimal. In practice, the complexity parameter and the corresponding optimal subtree can be determined by cross-validation. Details of the splitting and pruning algorithm are given in the Supplementary Material.
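The subtree selection rule above can be sketched in a few lines. This is a minimal illustration, assuming the candidate subtrees and their estimated concordances have already been enumerated; the tuple interface is hypothetical.

```python
def best_subtree(subtrees, alpha):
    """Select the optimal subtree under the concordance-complexity
    measure: maximize (estimated concordance) - alpha * (number of
    terminal nodes), breaking ties toward the smaller tree.

    subtrees : list of (concordance, n_terminal_nodes, tree) tuples
    alpha    : complexity parameter (chosen by cross-validation in
               practice)
    """
    # Larger penalized concordance wins; for equal values, -size makes
    # the subtree with fewer terminal nodes win, matching the tie rule.
    return max(subtrees, key=lambda s: (s[0] - alpha * s[1], -s[1]))[2]
```

A larger `alpha` penalizes tree size more heavily and thus favors smaller subtrees, mirroring the role of the complexity parameter in cost-complexity pruning.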
3.3 Prediction
We use the notation of Section 2 for the partition constructed from the training sample. Given a new observation, we predict its hazard by the estimated hazard of the terminal node it enters. Theorem 3.3 summarizes the large-sample property of the estimated partition-based hazard function; the proof is given in the Supplementary Material. Under conditions (A1)-(A5) in the Supplementary Material, the estimated partition-based hazard function converges to its target with probability 1 as the sample size goes to infinity. As with most existing work on the consistency of tree-based estimators, the convergence result is independent of the splitting and pruning algorithm. Theorem 3.3 provides justification for the time-invariant partition under the commonly adopted assumption that the diameters of the terminal nodes go to zero as the sample size increases (Breiman et al., 1984; LeBlanc and Crowley, 1993). Large-sample results incorporating the tree-growing algorithm will be investigated in future work.
Although the above discussion focuses on the hazard function, we can also predict the survival probability when the covariates are external time-dependent covariates (Kalbfleisch and Prentice, 2011). Assume the hazard at each time depends on the covariate history only through the current covariate value. The prediction of the survival probability is based on the relation that the survival function equals the exponentiated negative cumulative hazard. We predict the survival probability for a subject with a given covariate path to be
(5) 
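The survival prediction in (5) integrates the partition-based hazard along the subject's covariate path. The sketch below does this numerically with a trapezoidal rule; the `hazard_path` callback (returning the estimated hazard of the node the subject occupies at each time) and the time grid are illustrative assumptions, not the paper's interface.

```python
import numpy as np

def predict_survival(t, hazard_path, grid):
    """Predict S(t) for a subject with a fixed (external) covariate
    path: S(t) = exp( - integral over [0, t] of the hazard of the
    node occupied at each time ).

    hazard_path : function u -> estimated hazard of the occupied node
    grid        : increasing time grid starting at 0, used for the
                  trapezoidal approximation of the integral
    """
    us = grid[grid <= t]
    lam = np.array([hazard_path(u) for u in us])
    # Trapezoidal approximation of the cumulative hazard on [0, t]
    cum_hazard = np.sum(0.5 * (lam[1:] + lam[:-1]) * np.diff(us))
    return float(np.exp(-cum_hazard))
```

With a constant hazard of 0.5 over two time units, the predicted survival is exp(-1), as expected from the exponential distribution.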
4 ROCGuided Random Survival Forests
The proposed survival trees can be further transformed into powerful risk prediction tools by applying ensemble methods such as bagging (Breiman, 1996) and random forests (Breiman, 2001). The essential idea of bagging is to average many noisy but approximately unbiased tree models to reduce variance. Random forests improve upon the variance reduction of bagging by decorrelating the trees via random selection of predictors in the tree-growing process. In the original random forests for regression and classification (Breiman, 2001), the prediction for a new data point is the average of the predictions of all trees in the forest, and trees are grown sufficiently deep to achieve low bias. For right-censored survival data, however, averaging estimated survival or cumulative hazard functions from deeply grown trees may result in large bias, because the node-specific estimate is biased when a node contains a small number of observations. We therefore propose to average unbiased martingale estimating equations rather than node summaries from Kaplan-Meier or Nelson-Aalen estimates (Ishwaran et al., 2008; Zhu and Kosorok, 2012; Steingrimsson et al., 2016; Schmid et al., 2016). Moreover, in light of Meinshausen (2006) and Athey et al. (2018), we treat forests as a type of adaptive nearest-neighbor estimator and construct forest-based local estimation of the survival and hazard functions.
Let a collection of survival trees be obtained by resampling the original training data. Each tree is constructed via a tree-growing process in which, at each split, a random subset of predictors is selected as candidates for splitting. For each partition and each of its nodes, one can solve an unbiased estimating equation for the node-specific hazard.

Let the partition function of each tree map a covariate value to its terminal node; then each tree induces the following estimating equation for the hazard,
(6) 
It is easy to see that solving Equation (6) yields the single-tree prediction of the hazard. Note that Equation (6) can be rewritten as the following weighted estimating equation,
(7) 
where the weight is a positive constant if the subject is at risk and falls in the node of interest, and is zero otherwise.
To obtain forest-based predictions, we average the estimating functions in (7) over all trees and obtain the following local martingale estimating equation,
where the weight function captures the frequency with which each training observation falls into the same node as the new observation. Therefore, the forest-based estimator of the hazard function is
When the covariates are external time-dependent covariates, the survival probability given a covariate path is predicted to be
(8) 
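The forest-based hazard estimator solves the averaged weighted estimating equation rather than averaging node-level Nelson-Aalen summaries. The sketch below illustrates one plausible instantiation: the `same_node` membership matrix and the within-tree node-size normalization are assumptions chosen for illustration, not the paper's exact weighting constant.

```python
import numpy as np

def forest_hazard(t, times, events, same_node, h):
    """Forest-based hazard estimate at time t for a new subject.

    same_node : boolean array of shape (B, n); same_node[b, i] is True
                when, in tree b, training subject i falls in the same
                terminal node as the new subject at time t
                (a hypothetical precomputed membership matrix)
    h         : bandwidth of the kernel smoother
    """
    B, n = same_node.shape
    # Weight of subject i: membership frequency across trees,
    # normalized by node size within each tree (one plausible choice
    # of the positive constant in the text).
    sizes = same_node.sum(axis=1, keepdims=True).clip(min=1)
    w = (same_node / sizes).mean(axis=0)
    u = (t - times) / h
    k = 0.75 * (1.0 - u**2).clip(min=0.0) / h  # Epanechnikov kernel
    num = np.sum(w * k * events)               # weighted failure mass at t
    den = np.sum(w * (times >= t))             # weighted at-risk total
    return num / den if den > 0 else 0.0
```

With a single tree whose terminal node contains all subjects, the weights are equal and the estimator reduces to the single-node kernel estimate of Section 2.1.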
To achieve good prediction performance, our algorithm relies on subsampling (without replacement) and sample-splitting techniques (Athey et al., 2018). Specifically, we divide each subsample of the original training data into two halves: the first half is used for tree-growing and the second half is used to calculate the weights. The algorithm for forest-based estimation is given in the Appendix.
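The subsampling and sample-splitting step can be sketched as follows; the function name and half-and-half split are illustrative, with the split proportion assumed rather than taken from the paper's algorithm.

```python
import numpy as np

def split_subsample(n, s, rng):
    """Draw a subsample of size s without replacement from n training
    observations and split it into two halves, in the spirit of the
    honest sample-splitting of Athey et al. (2018): one half grows
    the tree, the other half computes the terminal-node weights.
    """
    idx = rng.choice(n, size=s, replace=False)
    half = s // 2
    return idx[:half], idx[half:]  # (tree-growing half, weighting half)
```

Because the weighting half never influences the splits, the resulting forest weights are not overfit to the tree structure.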
5 Simulation Studies
We performed simulations to investigate the performance of the proposed methods. To cover some of the common survival models, we considered the following scenarios, in which the survival distribution depends only on baseline covariates:
 (I) Proportional hazards model, where the covariates follow a multivariate normal distribution with mean 0 and a structured covariance matrix.
 (II) Proportional hazards model with noise variables.
 (III) Proportional hazards model with nonlinear covariate effects.
 (IV) Accelerated failure time model.
 (V) Generalized gamma family.
Except for Scenario (I), the covariates were generated from uniform distributions. In Scenarios (I), (II), and (III), we set the baseline hazard to be that of a Weibull distribution. Additionally, we generated censoring times from a uniform distribution whose upper limit was tuned to yield censoring percentages of 25% and 50%. As described in Remark 2.1, we treat the baseline covariates as time-dependent covariates through the transformation based on the empirical cumulative distribution function of the at-risk subjects. Given a new baseline observation, we predict the survival probability as in (5). We compare the proposed methods with the relative risk tree of LeBlanc and Crowley (1992) and the conditional inference survival tree of Hothorn et al. (2006), which are implemented in the R (R Core Team, 2018) functions rpart and ctree in the packages rpart and party, respectively. We used tenfold cross-validation to choose the tuning parameter in the concordance-complexity measure as well as to select the right-sized tree in rpart. To evaluate the performance of the different methods, we use the integrated absolute error between the true and predicted survival functions, averaged over covariate values generated independently of the simulated data. The upper limit of integration is set to approximately the 95% quantile of the survival time. When predicting survival probabilities, the proposed algorithm is not sensitive to the choice of kernel function and bandwidth, so we use the Epanechnikov kernel in all scenarios.

With 500 replications, the top panel of Table 1 reports the mean integrated absolute error of the proposed methods, rpart, and ctree. For all scenarios considered, the integrated absolute errors decrease with the sample size and increase with the censoring percentage. The two proposed splitting criteria yield similar integrated absolute errors, and the proposed methods are as efficient as or more efficient than the competitors.
We next consider scenarios where the survival time depends on time-dependent covariates. In what follows, we considered one time-dependent covariate and one time-independent covariate, where the latter was generated from a uniform distribution.
 (VI) Dichotomous time-dependent covariate with at most one change in value: survival times were generated from a hazard model in which the covariate's change time follows an exponential distribution with rate 5.
 (VII) Dichotomous time-dependent covariate with multiple changes: survival times were generated from a hazard model in which the change times are the first three terms of a stationary Poisson process with rate 10.
 (VIII) Continuous time-dependent covariate: survival times were generated from a hazard model whose covariate parameters follow independent uniform distributions.
In these scenarios, we again apply the transformation in Remark 2.1 to map the covariates onto the unit interval. In the presence of time-dependent covariates, we define the integrated absolute error as
where the covariate histories are generated from the covariate distribution independently of the training sample. To our knowledge, no available software predicts the survival probability at a given time based on the covariate history up to that time. Although there is existing work on survival trees with time-dependent covariates, it is not clear how those methods can be applied for prediction. We therefore compare our methods with rpart and ctree, which can only handle baseline covariates. As expected, the lower panel of Table 1 shows that the proposed methods outperform rpart and ctree under Scenarios (VI)-(VIII). In particular, our methods show a substantial improvement over rpart and ctree in the presence of a continuous time-dependent covariate (Scenario (VIII)), indicating that incorporating time-dependent covariates improves prediction accuracy.
Table 1. Mean integrated absolute error (500 replications) of the proposed methods (CON and within-node splitting criteria), rpart, and ctree, under 0%, 25%, and 50% censoring.

              CON             within-node     rpart           ctree
 n    Sce   0%  25%  50%    0%  25%  50%    0%  25%  50%    0%  25%  50%

Scenarios with baseline covariates
 100  I    122  126  126   122  126  125   118  127  130   115  124  125
      II    91   99  108    90   96  107    96  105  114    96  103  111
      III   89  107  112    90  107  112    87  102  110    89  101  106
      IV   100  117  120    99  115  118    90  111  124    95  118  133
      V     77   94  108    77   93  107    97  115  132    97  113  127
 200  I    110  112  119   120  120  119   113  121  122   111  119  120
      II    78   83   88    76   80   86    79   86   94    78   84   91
      III   71   91  105    72   90  106    71   86   95    80   93   99
      IV    74   94  108    73  100  103    77   95  107    75   96  112
      V     67   80   93    66   78   93    91  109  118    89  106  114

Scenarios with a time-dependent covariate
 100  VI    71   86   99    70   85   98   138  159  178   136  156  174
      VII   81   92  110    80   90  108   149  200  238   150  198  234
      VIII  87   89   92    86   90   93   388  309  212   402  352  259
 200  VI    54   59   73    55   58   79   128  148  167   126  144  163
      VII   73   78   85    74   77   84   145  201  233   150  202  228
      VIII  79   78   80    78   78   80   402  329  220   447  396  274
We continue to use Scenarios (I)-(VIII) to investigate the performance of the proposed forest-based methods. To grow the trees in the forest, we considered subsampling, with the subsample size set to 80 for the smaller training sample size and 100 for the larger one. We randomly select a subset of variables at each split. For each tree, the minimum number of failures that must exist in a node for a split to be attempted is 3, and the minimum number of failures in any terminal node is 1. For each data set, we compute the integrated absolute error as before, but with the predicted survival function replaced by the forest-based estimate in (8). We compare the proposed forest methods with the random survival forests of Ishwaran et al. (2008) and Schmid et al. (2016), which are implemented in the R functions rfsrc (Ishwaran and Kogalur, 2018) and ranger (Wright and Ziegler, 2017) in the packages randomForestSRC and ranger, respectively. We fit rfsrc and ranger using the default parameter settings and also using smaller trees in the forest with larger minimum node sizes (i.e., nodesize = 10 in rfsrc and min.node.size = 10 in ranger). Since rfsrc and ranger cannot handle time-dependent covariates, we fit them with the baseline covariates. Table 2 shows the averaged integrated absolute error based on 500 replications. The proposed forest methods perform better than or similarly to their survival tree counterparts and outperform ranger and rfsrc in almost all of the settings. As expected, our methods have a substantial advantage over rfsrc and ranger under Scenarios (VI)-(VIII), echoing the importance of incorporating time-dependent covariates in prediction. Interestingly, for ranger and rfsrc, the default settings lead to larger errors than single trees, and their performances improve substantially after increasing the terminal node size. We conjecture that this is because within-node estimates are less biased with larger node sizes.
In summary, our proposed methods are competitive with existing methods when all the covariates are time-independent, and show superior performance in the presence of time-dependent covariates.
Table 2. Mean integrated absolute error (500 replications) of the proposed forest methods (CON and within-node splitting criteria), ranger, and rfsrc (default settings and smaller trees), under 0%, 25%, and 50% censoring.

              CON             within-node     ranger (default)  ranger (smaller)  rfsrc (default)  rfsrc (smaller)
 n    Sce   0%  25%  50%    0%  25%  50%    0%  25%  50%       0%  25%  50%       0%  25%  50%     0%  25%  50%

Scenarios with baseline covariates
 100  I    120  124  124   120  124  124   122  132  137   120  129  131   151  156  157   122  131  136
      II    65   67   69    65   67   69   119  123  126    71   75   79   117  120  122    72   77   82
      III   79   86   93    80   88   95   136  139  125    99  109  108   139  143  142    84   94   96
      IV    68   78   86    69   78   87   129  136  126    95  106  109   138  141  138    81   91   95
      V     64   71   76    64   71   76   120  128  121    72   90   99   128  132  131    80   97  109
 200  I    112  115  116   112  115  116   122  132  137   116  125  128   150  155  156   118  129  134
      II    50   52   54    50   52   54   119  123  126    64   67   72   117  119  122    67   71   77
      III   65   71   78    67   73   79   136  139  125    81   90   90   134  139  137    73   83   86
      IV    54   62   71    54   63   71   129  136  126    77   85   88   134  137  133    66   75   78
      V     48   54   59    49   55   59   120  128  121    67   83   92   126  129  127    72   89  101

Scenarios with a time-dependent covariate
 100  VI    57   64   70    57   64   70   186  201  205   123  138  150   199  215  218   128  143  156
      VII   61   71   80    61   71   80   205  238  251   151  189  209   164  203  220   220  249  260
      VIII  68   74   75    69   74   76   408  357  262   206  197  174   240  229  197   405  354  261
 200  VI    47   51   57    47   51   57   189  206  212   118  134  149   204  221  226   120  138  153
      VII   48   57   65    48   57   65   209  249  260   148  196  216   225  261  271   153  203  223
      VIII  54   58   59    54   58   59   436  389  286   254  244  206   432  386  285   291  277  229
6 Application
We illustrate the proposed methods through an application to a clinical trial conducted by the Terry Beirn Community Programs for Clinical Research on AIDS (Abrams et al., 1994; Fleming et al., 1995; Goldman et al., 1996). The trial compared didanosine (ddI) and zalcitabine (ddC) treatments for HIV-infected patients who were intolerant to or had failed zidovudine treatment. Of the 467 patients recruited for the study, 230 were randomized to receive ddI and the other 237 received ddC. The average follow-up time was 15.6 months, and 188 patients had died by the end of the study. Despite having longitudinal measurements taken at follow-up visits, Abrams et al. (1994) showed that the ddC treatment is as efficacious as the ddI treatment in prolonging survival, based on a proportional hazards model with covariates measured at the baseline visit. In what follows, we apply the proposed methods to investigate time-dependent risk factors for overall survival. We included baseline covariates at randomization, such as age, gender, treatment received (ddI/ddC), and AIDS diagnosis (yes/no). We also included time-dependent covariates, namely CD4 count, Karnofsky score, and the cumulative count of recurrent opportunistic infections. The CD4 count and Karnofsky score were measured at the baseline visit and at bimonthly follow-up visits; we adopt the last-value-carried-forward approach between visit times when constructing these time-dependent covariates. For opportunistic infections, we use the cumulative number of infections prior to each time as the covariate value at that time. Variables including the Karnofsky score and CD4 count are transformed onto the unit interval using the corresponding estimated cumulative distribution functions.
Figure 1 displays the proposed ROC-guided survival tree using the CON splitting criterion; the within-node splitting criterion yielded the same tree, so the result is not shown. With concordance-complexity pruning, there are three terminal nodes in the final tree: node 2, node 6, and node 7, defined by the transformed Karnofsky score (KSC) and the cumulative number of opportunistic infections (OP). The estimated hazard rates of the three nodes are plotted in panel (a), which clearly shows that a lower Karnofsky score is associated with higher mortality risk. For those with a high Karnofsky score (e.g., KSC above 0.396), previous opportunistic infections are also associated with higher mortality risk. Since a patient can enter different terminal nodes at different time points, one hazard curve does not necessarily represent the hazard of one patient. For example, a patient with a high Karnofsky score and no opportunistic infections enters node 6 and has a relatively low mortality risk. As time progresses, the patient could enter node 7 upon experiencing an opportunistic infection, or enter node 2 if the patient's Karnofsky score decreases. Either progression leads to a higher mortality risk.
We also applied the proposed forest-based method, using the same settings as in our simulation studies. For prediction, we considered hypothetical patients with Karnofsky scores that are either increasing linearly