1 Introduction
Wild bootstrap resampling has evolved as one of the stateoftheart choices for inferring cumulative incidences or hazards in nonparametric multistate models in event history analysis. Starting with the initial papers by Lin et al. (1993) and Lin et al. (1994) for Cox models and Lin (1997)
for competing risks setups, the basic idea is to consider martingale representations of the nonparametric estimators (particularly, the NelsonAalen or AalenJohansen) and to replace the nonobservable martingale residuals
with randomly weighted counting processes . This approach has been extended in various directions, allowing for arbitrary multipliers (Beyersmann et al., 2013; Dobler and Pauly, 2014; Dobler et al., 2017) and multiple, possibly recurrent, states (Dobler, 2016; Bluhmki et al., 2018a, b). In the current paper we like to transfer the latter results from the nonparametric case to semiparametric regression models. The most used regression model in survival analysis is Cox’s proportional hazard model. It is highly useful to estimate the survival function for specific covariates, e.g., to show how the model predicts survival. The survival function of interest is then typically provided with pointwise confidence intervals which is implemented in all major software packages. In reality, however, when interest is in the survival function as a whole, it would be preferable to report it together with uniform confidence intervals. These socalled confidence bands describe the uncertainty of the whole survival function. This is often not done in practice because there are few programs that construct such uniform bands. In addition, apart from only few exceptions such as
(Lin, 1997), systematic evaluations of finite sample results, that demonstrate the performance of such bands, are rarely available in the literature. We here provide such results and in addition investigate various new resampling bands that exhibit improved performance for smaller sample sizes compared to previously implemented bands for Cox’s regression model. This proportional hazards model (Cox, 1972) is given by an individualspecific intensity function of the form(1) 
Our main achievements are the introduction of valid resampling strategies that
jointly mimic the unknown distribution of baseline and parameter estimators for
Model (1) and corresponding multistate versions
(Martinussen and Scheike, 2006).
Different to existing approaches (Lin et al., 1993; Martinussen and Scheike, 2006),
we prove their theoretical validity by martingalebased
arguments which allow the simultaneous treatment of different mechanisms for
incomplete observations. In particular, the observations may be subject to
independent rightcensoring and lefttruncation.
How to resample? There exist plenty of possible approaches to achieve the above tasks in Model (1) with independent rightcensoring alone. A first corresponds to the nonparametric ansatz at the outset: here, we consider martingale representations of the Breslow estimator for the cumulative hazard and a parameter estimator that is found via a likelihood approach. Then we replace the involved martingale residuals with reweighted counting processes (e.g., Lin et al. 1993). Since the latter do not take the semiparametric nature into account, another possibility would be to replace them with (e.g., Spiekerman and Lin 1998). Here are estimators of the martingales , that exploit the involved covariates and allow for a greater range of applicability, for instance in rate estimations.
A novel and even more natural approach starts one step earlier by rewriting the score equations for the baseline function and the Euclidean parameter: after identifying a martingale representation of the score equations, both multiplier techniques from above lead to new equations which are solved by quantities depending on the . Hence, paralleling the same steps as for the original estimators, we receive their resampling counterparts in a primal way.
In all approaches we follow Beyersmann et al. (2013) and allow for general wild bootstrap weights, i.e., the
are i.i.d. random variables with zero mean and unit variance that are independent of the data.
For ease of presentation we exemplify the new methodology mainly for the rather simple Cox model but also explain their extensions to more general multistate or even other regression models. The theoretical derivations for the wild bootstrap approaches thereby utilize clever martingale arguments which are novel for bootstrapping in semiparametric regression models. In particular, we prove that the wild bootstrap counterparts share the martingale properties of the original estimators – and can therefore be handled in the same way, using convenient martingale central limit theorems. Thus, intricate derivations for verifying conditional tightness are no longer required. Moreover, beneath theoretical benefits, mirroring the martingale structure in the bootstrap world allows for a simple interpretation and easy incorporation of missing mechanisms (such as independent rightcensoring or lefttruncation). Consequently, our findings allow for a wide range of applications, which to some extent will be discussed in more detail in future papers. Such martingale representations for the wild bootstrap have first been made in the nonparametric context for resampling AalenJohansen estimators; see
Dobler (2016) and Bluhmki et al. (2018b) for details.The paper is organized as follows: Section 2 outlines how estimation is done for Cox’s regression model and lists the technical conditions that are needed in proving the validity of the considered resampling approaches. Section 3 contains a description of the various wild bootstrap procedures that we consider here with theoretical statements about their validity. In addition, we discuss several important extensions to more general multistate or other regression models. In Section 4 we present an extensive simulation study that compares the various resampling procedures. Section 5 has a brief demonstration of the methodology in a survival setting where interest is on constructing confidence bands for the survival function for patients with acute myocardial infarction. Finally, we discuss the results in Section 6.
All proofs are given in the Appendix, and these are a central part of this paper. Their novelty lies in the fact that we are able to show the performance of our resampling methods using martingale methods which considerably simplify the technical arguments.
2 Joint large sample properties in the Cox model
We consider the multiplicative Cox model (1) given by the intensity process of the counting process of subject given . Here, is the atrisk indicator of individual at time , is the baseline hazard function, is for each a possibly timedependent
dimensional vector of predictable covariates of individual
, and is an unknown dimensional regression parameter (Andersen et al., 1993). Let be a terminal evaluation time on the treatment timescale. Throughout we assume that all are contained in a bounded set and denote the cumulative baseline hazard function as which we assume to be finite for all .A series of standard arguments typically leads to the Breslow estimator for and the maximum likelihood parameter estimator for . To illustrate this, let us simplify the derivations in Scheike and Zhang (2002) for the CoxAalen model to the present Cox model (1): the score equation for the cumulative baseline function is given by
(2) 
which is solved by . Here, we used the definition of
where and for any vector , and is the indicator that any individual is under risk shortly before . For lucidity, the notion of will be suppressed most of the time. If we replace for in the score equation for ,
(3) 
and define , we obtain a solvable score equation for :
Denoting its solution by , we also obtain the Breslow estimator for the cumulative baseline hazard function. To explain their joint large sample properties we define by
the negative of the Jacobimatrix of , where and . Recall that the covariates are assumed to be uniformly bounded. Therefore, it follows from Theorems VII.2.2 and VII.2.3 of Andersen et al. (1993) that and are both asymptotically Gaussian as long as the following regularity conditions are fulfilled which we assume throughout; see also Condition VII.2.1 Andersen et al. (1993). Here and throughout,
denotes convergence in probability.
Condition 1.
There exist a neighbourhood of and functions , , and such that for each :


is a continuous function of uniformly in and bounded on ;

is bounded away from zero on ;

for ;

is positive definite, where .
Note that (a) and (b) immediately imply convergence in probability for each :
(4) 
as long as . Carefully checking the proofs of the forementioned theorems from Andersen et al. (1993), we obtain asymptotic representations of the normalized estimators which will motivate the first bootstrap approaches in the following section:
(5)  
(6) 
Here, denotes the limit (in probability) of , and defines a squareintegrable martingale in ; cf. Section VII.2.2 in Andersen et al. (1993).
3 Wild bootstrap approaches and main theorems
While inference about can be based on the asymptotic normality of its estimator (e.g., Martinussen and Scheike, 2006), the complicated limit process of the normalized Breslow estimator does not allow time simultaneous inference about the cumulative hazard function or functionals thereof (such as the survival function). To this end, we propose two general approaches to establish asymptotically valid resampling strategies. Since implicitly depends on , we have to ensure that their wild bootstrap counterparts mimic their distribution jointly.
3.1 The ‘classical’ wild bootstrap
The first method is inspired by the use of the wild bootstrap in Beyersmann et al. (2013) and is in line with the resampling procedures of Lin et al. (1993) or Spiekerman and Lin (1998) for the special choice of i.i.d. standard normal weights. This procedure is based on the above asymptotic representation of the normalized estimators and replaces the involved martingales by or together with plugin estimators for all unknown quantities. Here, is an estimate of the martingale increment . We exemplify the idea for : To this end, we introduce resampling versions of the score equation defining vector and the negative Jacobi matrix :
(7)  
(8) 
Following the above instruction we obtain from the asymptotic representations (5)–(6) the following wild bootstrap counterparts of the normalized estimators:
(9)  
(10) 
Alternatively, the Spiekerman and Lin (1998)type martingale increment estimates may replace in (7) and (10). A bootstraptype covariance estimate similar to (8) has been suggested by Dobler and Pauly (2014) in a nonparametric competing risks context. Here, it is additionally motivated from martingale arguments: defining and as in (7) and (8) with replaced with , it turns out that is the optional variation process of the squareintegrable martingale ; see the appendix for details. To motivate a different resampling strategy, we finally note that both wild bootstrap procedures ignore the terms in the asymptotic expansions (5) – (6).
3.2 Wild bootstrapping the score equations
A second, possibly more natural wild bootstrap approach does not ignore the terms. The idea is to replace martingale representations of score equations with their multiplier counterparts. To this end, paralleling the approach of jointly solving two score equations to find the estimators for the parametric as well as the nonparametric model components, we first expand the score equation in (2) to A wild bootstrap counterpart thereof is now given by replacing with , with , and with :
(11)  
Now, keeping fixed, the “solution” for is clearly
Next, to find an appropriate wild bootstrap version of , we consider a martingale representation of the score equation (3) for :
Again, a wild bootstrap version thereof is given by
Inserting for eventually yields the final wild bootstrap score equation
(12) 
The last equality is due to . Define as the solution of (12) and note that coincides with formula (7). In almost the same way as in the proof of Theorem VII.2.1 in Andersen et al. (1993) it can be shown that the probability of the existence of tends to one and that (conditionally) converge to zero in probability; see also the proof of Theorem 1 below for similar arguments.
Finally, a wild bootstrap version of the Breslow estimator is obtained via with normalized version
(13)  
A Taylor expansion around of the first term on the far righthand side and the martingale property of the second term reveal the striking similarity to decomposition (10). However, the current wild bootstrap approach does not ignore the term resulting from the Taylor expansion. Another nice property of this “estimating equation” approach is the similar treatment for bootstrap and original estimator which is in line with general recommendations for constructing resampling algorithms (Beran and Ducharme, 1991; Efron and Tibshirani, 1994). As above we have by the mean value theorem (cf. Feng et al. 2013)
where and each is on the line segment between and .
3.3 Consistency and confidence bands for the cumulative hazard
To prove (asymptotic) validity of both resampling strategies (based on asymptotic expansions or score equations) the following result is needed.
Lemma 1.
Under Conditions 1(a)(e) it holds that, given all observations,

is asymptotically normally distributed,

whenever ,

as in probability if the resampling is done via method (9) or (12).
The next theorem constitutes that both resampling approaches utilizing the Lin et al. (1993) approach (i.e. with ), have the correct asymptotic behaviour. Therein, denotes a distance that metrizes weak convergence on , e.g. the Prohorov distance (Dudley, 2002), and and are the unconditional and conditional distribution of a random variable , respectively.
Theorem 1.
The asymptotic variance function of (and thus also of ) can be found in Andersen et al. (1993, Corollary VII.2.4), where also a consistent estimator is given. In our simulation study in Section 4, our choice of a wild bootstrap counterpart of was the empirical variance function of the obtained wild bootstrap realizations of . We also studied variance estimators based on direct resampling of involving squared multipliers (results not shown) as proposed in Dobler and Pauly (2014). However, the empirical versions performed preferably.
The theorem is proven in the Appendix. Here, we use it to construct timesimultaneous confidence bands for on fixed intervals . In particular, we obtain results similar to those of Lin et al. (1994): denoting by a continuously differentiable function we get confidence bands of asymptotic level for on as
where is a possibly random weight function. Typical choices are
in case of the transformation , and
for the transformation . The resulting confidence bands correspond to the socalled equal precision (for or ) and HallWellner bands (for or ), respectively. Finally, the value of has been chosen as the quantile of the conditional distribution of , and the naïve choice for would have been the corresponding quantile of . Here, and are the wild bootstrap analogues of and , respectively, . However, this choice of results in some numerical instabilities, which is why we preferred the asymptotically equivalent choice .
Here, the “wild bootstrap analogues” refer to the use of for any of the bootstrap strategies (9) or (12), and its corresponding empirical variances. It follows from Theorem 1 that all confidence bands are valid for large sample sizes. To additionally asses their small sample properties, we compare them in MonteCarlo simulations in Section 4. There, we also analyze the analogue behaviour of the resampling approaches based on .
3.4 Extensions to more general models and more on inference
After having carefully checked the arguments used to establish the wild bootstrap consistency for the Cox survival model (1), it is apparent that the same approach directly carries over to more general models in multistate setups. In particular, as long as the counting process martingale methods can be mimicked with the help of wild bootstrap multipliers, the asymptotics of the resampled estimators can be argued in almost the same way as for the original estimators. Thus, the above methodology can straightforwardly be extended to multistate models with states and multiplicative intensity processes
(15) 
for each transition , where . Different to above this model allows for an arbitrary number of transitions between different states. However, following Dobler (2016) and Bluhmki et al. (2018b), the above wild bootstrap approach can also be applied here. The only major change is to replace the currently used multipliers
by more general white noise processes
with zero mean and unit variance (Bluhmki et al., 2018b) to randomly weight the increments of the counting processes, leading to . Since the martingale concept is still working in this case it can again be shown that the wild bootstrap mimics the joint limit distribution of the parameter and multivariate hazard transition estimators. Indeed, Dobler (2016) and Bluhmki et al. (2018b) have shown that, for different transitions and and thus independent white noise processes and , the processes and define orthogonal squareintegrable martingales in with respect to the filtrationHere, and are predictable random functions with respect to this filtration, i.e. in particular, they may be datadependent. The predictable variation processes of the above martingales are and . This property nicely reflects the situation for the original estimators, as the corresponding counting process martingales and are orthogonal and squareintegrable as well with predictable variation processes
In this sense, not only the wild bootstrap martingales resemble the original counting process martingales well but also the predictable variation processes of the wild bootstrap martingales are estimates of the original predictable variation processes. Using these findings in combination with the arguments presented in the proofs in the appendix, it is apparent that also in such more general multistate setups the arguments for the largesample properties of the estimators easily transfer to their wild bootstrap versions, as long as the original estimators allow for martingale representations. Therefore, these arguments even extend to more general models such as the CoxAalen multiplicativeadditive intensity model (Scheike and Zhang, 2002) or the Fine and Gray (1999) model for subdistribution functions.
Also, the incorporation of certain filtered (e.g., rightcensored) observations is again allowed and this yields several important inferential applications: apart from confidence bands for cumulative transition hazards or incidence functions (which are functionals thereof), tests for null hypotheses formulated in terms the parameters can be constructed as well. Here, new bootstrapbased versions of score or Waldtype test statistics
(Martinussen and Scheike, 2006) may be employed to ensure a proper finite sample behaviour. However, a detailed evaluation of all these applications would need additional extensive simulations and further elaborations. As a matter of lucidity, we leave them to future research and we focus below on the simple Cox model (1) to exploit the impact of the proposed methods in simulations.4 Simulation study
To compare the performances of the various resampling approaches described in Section 3.3, we conducted a simulation study in which we covered situations of small to large sample sizes: . The generated data follow the Cox survival model with baseline hazard rate , onedimensional covariates
which are normally distributed with standard deviation 4, and regression parameter
. The censoring times are the minima ofand standard exponentially distributed random variables. The considered time interval, along which 95% confidence bands for the cumulative baseline hazard function shall be constructed, was
. Here we chose the start time of because “the approximations tend to be poor for close to 0” (Lin et al., 1994, p. 77). As wild bootstrap multipliers , we considered the common choice , as well as centered unit Poisson variableswith unit skewness, and centered unit exponential variables
which have a skewness of 2. We simulated all the confidence bands for the cumulative baseline hazard function that were introduced in Section 3.3, i.e.  and non transformed HallWellner and equal precision bands. In particular, we also considered both resampling approaches in which the martingale increments were replaced with or , denoted in Tables 3–3 as “” and “”, respectively, and also both kinds of resampling algorithms, the direct resampling method of Section 3.1 and the method of Section 3.2 in which the estimating equations were bootstrapped. All of these bands were compared with the confidence band for that one obtains from the cox.aalen function in the R package timereg. For each considered setup and type of band, we constructed 10,000 confidence bands, each of which was based on 999 wild bootstrap iterations.HallWellner  equal precision  

timereg  estimating  direct  estimating  direct  
resampling  standard  equation  resampling  equation  resampling  
approach  band  id  log  id  log  id  log  id  log  
100  89.5  88.6  95.5  88.8  95.6  89.5  96.7  88.8  96.3  
86.5  88.7  95.4  88.7  95.5  89.4  96.6  88.8  96.1  
200  93.2  92.3  95.7  92.3  95.8  92.6  96.3  92.3  96.1  
91.7  92.2  95.7  92.3  95.7  92.4  96.4  92.2  95.9  
400  95.8  94.5  96.2  94.6  96.1  94.7  96.3  94.5  96.2  
95.1  94.4  96.2  94.5  96.2  94.6  96.6  94.6  96.4 
HallWellner  equal precision  

timereg  estimating  direct  estimating  direct  
resampling  standard  equation  resampling  equation  resampling  
approach  band  id  log  id  log  id  log  id  log  
100  89.5  90.0  96.6  90.0  96.5  94.3  99.2  93.4  98.9  
86.5  90.2  96.7  89.8  96.4  94.3  99.2  92.3  98.2  
200  93.2  92.9  96.3  92.9  96.2  95.1  98.4  94.6  98.0  
91.9  93.4  96.6  93.3  96.4  95.1  98.6  94.2  97.9  
400  95.8  94.8  96.3  94.8  96.3  96.0  97.6  95.7  97.3  
95.1  94.8  96.3  94.7  96.4  96.0  97.7  95.5  97.3 
HallWellner  equal precision  

timereg  estimating  direct  estimating  direct  
resampling  standard  equation  resampling  equation  resampling  
approach  band  id  log  id  log  id  log  id  log  
100  89.5  88.9  95.7  88.8  95.6  91.0  97.5  90.0  97.0  
86.5  88.8  95.6  89.1  95.6  91.0  97.6  89.6  96.9  
200  93.2  92.3  95.8  92.5  95.7  93.4  97.0  92.8  96.6  
91.9  92.9  96.1  92.9  96.2  93.4  97.2  92.9  96.7  
400  95.8  94.6  96.2  94.6  96.2  95.0  96.7  94.7  96.5  
95.1  94.5  96.2  94.5  96.2  95.1  96.8  94.8  96.6 
We note that when the sample size is all methods gives a reasonable performance. When the sample size is smaller there are notable differences, and it seems that the transform does improve the performance in this case. Whether the bootstrap is based on or does not seem important, and in terms of computations it is considerably easier and faster to use the multipliers based on .
Even though there are strong theoretical and practical advantages of the Poisson variables over standard normal multipliers in nonparametric competing risks models (Dobler et al., 2017), the choice of the bootstrap multipliers does not seem highly important here. Also, the choice of the particular resampling method, be it the direct approach of Section 3.1 or the estimating equation approach of Section 3.2, does not seem to have a clearly positive or negative impact on the outcomes.
Finally, we would like to note that our simulation results are only partially comparable to those of Lin et al. (1994): here, we construct confidence bands for the baseline cumulative hazard function, i.e. for an individual with covariate , whereas they consider bands for survival curves for multiple covariates and their utilized transformations result in different bands. Overall, however, the empirical reliability of the bands in both simulation studies, i.e. theirs and ours, are approximately the same.
5 Data example
In this section we briefly demonstrate how the confidence bands should be used in a standard survival setting. The key point is that they are most often the ones of interest unless focus is on a particular survival probability at a specific time such as for example year survival.
We consider the TRACE study (Jensen et al., 1997) where interest is on survival after acute myocardial infarction for 1878 consecutive patients included in the study. The dataset is available in the timereg Rpackage. Here for sake of illustration we focus interest on the covariates diabetes (1/0), sex and age. Due to the large sample size, we decided to use the timeregbands and show the survival predictions with uniform 95% equal precision bands based on the identity transformation (broken lines) and standard normal multipliers. We also computed 95% pointwise confidence intervals (dotted lines). We depict the confidence bands for a male with average age (66.9 years) and with or without diabetes, as well as the standard 95% pointwise confidence intervals.
We note that the hazard ratio related to diabetes is with 95% confidence interval . Thus reflecting that diabetes is a factor that leads to increased mortality. More interestingly, seen in connection with absolute level of mortality, this is then reflected in our estimated survival curves for males with average age and with diabetes (lower broken fat curve, with confidence bands and intervals) or without diabetes (upper solid fat curve). We note that the bands are a bit wider than the pointwise intervals. As the latter do not provide simultaneous coverage the bands should be used to provide uncertainty about the entire survival curve as shown in Figure 1.
We finally illustrate how the joint asymptotic distribution of the baseline and the covariates can be used with other functionals. To this end, consider the restricted residual mean
with estimator . To get a description of its uncertainty based on the wild bootstrap constructions we can simply apply the functional to the obtained bootstrap samples. It follows that has the same asymptotic distribution as due to Hadamard differentiability of the functional. Thus, we can easily construct symmetric 95% confidence intervals for the restricted residual mean and their differences based on the bootstrap. The key point being that these are very easy to get at when the bootstrap estimates are at hand.
For example, using the direct wild bootstrap approach based on and standard normal multipliers, we find that males with diabetes have a restricted residual mean within the first years at for males without diabetes and with diabetes. Males with diabetes thus lose years within the first years. In a similar way confidence intervals for other functionals can be obtained by means of the continuous mapping theorem or the functional delta method.
6 Discussion and further research
Despite their importance, confidence bands are not used much in practice even though there is considerable interest in making survival predictions based on semiparametric regression models such as the Cox model. This is probably due to the fact that the key software solutions do not have confidence bands implemented in this setting. The aim of this work is to investigate some natural and simple wild bootstrap approaches for filling this gap. In particular, we have shown in the Appendix that the proposed bootstrap solutions do asymptotically have the desired properties. A key point in our proofs is the fact that we show the properties of our bootstrap procedures relying solely on martingale arguments. This enormously facilitates the transfer of the classical proofs for the estimators to their wild bootstrap counterparts. It became apparent that this approach generalizes to much more complex models as long as they admit a martingale structure for the involved counting processes. This covers for example Cox models in multistate models or FineGray regression models for subdistribution functions. A future work will focus on how the procedure can be adapted to more complex designs.
In addition, we consider the finite sample performance of various confidence bands and we see that, when the sample size is too small, one needs to be cautious when constructing such bands. When the sample size is reasonable, however, the bands perform well and should be the preferred way of illustrating the uncertainty of the survival curves.
Another, nice feature of the bootstrap approach is that it provides a very simple tool for constructing confidence intervals for functionals of the parameters of interest. We illustrated this by computing the restricted residual mean based on estimates from the Cox model.
Acknowledgements
Markus Pauly likes to thank for the support from the German Research Foundation (Deutsche Forschungsgemeinschaft).
Appendix
Appendix A Proofs
Proof of Lemma 1.
To a large extent, it is possible to parallel the martingale arguments as used in the proofs of Theorems VII.2.1 and VII.2.2 in Andersen et al. (1993). We show the proof for the resampling scheme (12) only; once it has been understood how martingale methods can be applied here, it will be apparent how to conduct the proof for the classical wild bootstrap scheme (9) which entirely consists of martingales.
Proof of 3. We introduce the process such that , where again denotes the gradient with respect to . We wish to analyse the asymptotic behaviour of the process
whose compensator is
Comments
There are no comments yet.