Semiparametric multi-parameter regression survival modelling

01/15/2019
by   Kevin Burke, et al.
University of Limerick
0

We consider a log-linear model for survival data, where both the location and scale parameters depend on covariates and the baseline hazard function is completely unspecified. This model provides the flexibility needed to capture many interesting features of survival data at a relatively low cost in model complexity. Estimation procedures are developed and asymptotic properties of the resulting estimators are derived using empirical process theory. Finally, a resampling procedure is developed to estimate the limiting variances of the estimators. The finite sample properties of the estimators are investigated by way of a simulation study, and a practical application to lung cancer data is illustrated.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

11/16/2021

Multi-Parameter Regression Survival Modelling with Random Effects

We consider a parametric modelling approach for survival data where cova...
04/27/2022

Semiparametric transformation Model with measurement error in Covariates: An Instrumental variable approach

Linear transformation model provides a general framework for analyzing c...
12/18/2018

A Bayesian semiparametric Archimedean copula

An Archimedean copula is characterised by its generator. This is a real ...
09/13/2020

A simulation-extrapolation approach for the mixture cure model with mismeasured covariates

We consider survival data from a population with cured subjects in the p...
07/27/2018

Survival of the Fittest Group: Factorial Analyses of Treatment Effects under Independent Right-Censoring

This paper introduces new effect parameters for factorial survival desig...
03/09/2018

The nonparametric location-scale mixture cure model

We propose completely nonparametric methodology to investigate location-...
08/10/2021

Estimating a distribution function for discrete data subject to random truncation with an application to structured finance

The literature for estimating a distribution function from truncated dat...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

In the context of survival analysis, we often consider log-linear models of the form
, where and are location and scale parameters, respectively, and is a random error with an assumed parametric distribution on the real numbers. The familiar accelerated failure time model then arises by setting where and

are, respectively, vectors of regression coefficients and covariates (

Kalbfleisch and Prentice (2002, chap. 3) and Lawless (2003, chap. 6). As discussed in Burke and MacKenzie (2017), taking a multi-parameter regression approach (i.e., allowing both and to depend on covariates simultaneously) offers an intuitive and simple way of modelling complicated phenomena. For instance, the phenomenon of crossing survival curves is directly linked to the concentration of events at a given location which is governed by the scale parameter .

A limitation of fully parametric approaches is that the assumed baseline hazard may not always be realistic in practice. Thus, we propose to further extend the log-linear multi-parameter regression model by allowing the baseline hazard to vary freely. This semiparametric model therefore brings together the flexibility of multi-parameter regression with additional robustness afforded by relaxing the assumption of a parametric error distribution. The proposed extension not only generalises multi-parameter regression to semiparametric status but also generalises the semiparametric accelerated failure time model to multi-parameter regression status; the latter fact is noteworthy given that the semiparametric accelerated failure time model has been considered by many authors over the years (Miller, 1976; Prentice, 1978; Buckley and James, 1979; Tsiatis, 1990; Ritov, 1990; Lai and Ying, 1991; Ying, 1993a; Lin et al., 1998; Jin et al., 2003); see Martinussen and Scheike (2007, chap. 8) for a summary of developments in this area.

Other examples of semiparametric multi-parameter regression models, which differ from the model developed in this paper, exist in the literature. Chen and Jewell (2001) considered a model which combined the semiparametric accelerated failure time model and Cox’s (1972) proportional hazards model and, therefore, had two regression components; the model also contained the lesser-known accelerated hazards model (Chen and Wang, 2000) as a special case. Scheike and Zhang (2002a, b) developed a different hybrid model which incorporated the Cox model and the Aalen model (Aalen, 1980) leading to two regression components. Somewhat closer to our work is that of Zeng and Lin (2007b) who considered transformation models with a covariate-dependent scale parameter and, therefore, like us, had regression components corresponding to location and scale. However, whereas their transformation was unspecified with a parametric baseline distribution, we, conversely, focus on the log-transformation with an unspecified baseline distribution.

From a practical perspective, inference based on the semiparametric accelerated failure time model has historically been somewhat cumbersome. This is partly due to the non-smooth nature of the estimating equations involved, but, more importantly, the precision of the resulting estimators does not lend itself to direct (i.e., plug-in) estimation due to its intractability. However, with recent resampling techniques, this is no longer an obstacle, and, specifically, we adapt the method of Zeng and Lin (2008) to our setting to facilitate inference for the regression coefficients. Moreover, we expand Zeng and Lin’s approach to obtain the variance of the cumulative hazard estimator, and combine this with modern empirical process theory which permits straightforward inference for any functional of interest without the need for resolving estimating equations.

2 Model

2.1 Specification and interpretation

In line with the classical formulation of the accelerated failure time model (cf. Kalbfleisch and Prentice (2002, chap. 3)) we specify a regression model for with denoting the failure time. In particular, for the th individual, , we assume that

where the location and scale parameters, and , are related to covariates, , via

where and are vectors of regression coefficients. The error terms, , are assumed to be independent and identically distributed with cumulative hazard function which will be unspecified in our work.

The conditional quantile function for this model is given by

where and is the quantile function for the error distribution, i.e., is the quantile function for a baseline individual. Consider individuals and whose respective and vectors are denoted by . The ratio of their quantile functions is then

This quantile ratio provides insight into the interpretation of the location and scale regression coefficients and, indeed, can be used in practical applications to quantify the overall effect of a given covariate on lifetime. We immediately see that when , the quantile ratio reduces to the usual accelerated failure time constant, , so that the effect of covariates is quantile-independent, i.e., it applies across the whole lifetime, and, for example, implies reduced lifetime. Hence, the proposed model directly extends the accelerated failure time model, providing a lack of fit test of accelerated failure time effects.

It is worth noting that, since is an increasing function on  , the quantile ratio decreases with for , increases with for , and, in both cases, equals one for some value if . Therefore, when , the model implies crossing quantile functions and, hence, crossing survivor functions.

2.2 Derivation of estimation equations

We now reformulate the model in a counting process framework (Andersen et al., 1993) to adopt potential right censoring, and to derive estimation equations based on the resulting intensity processes. For this we denote by the censoring time, the observed event time, and the failure indicator. With these quantities in place, the counting and at risk processes are, respectively, defined as

We note that the hazard rate of is given by

where denotes the derivative of . Consequently, with independent right censoring (Andersen et al., 1993), the intensity process of is given by . Furthermore, for some monotone increasing function, , the time-transformed counting process

has intensity

where . In particular, with , we have that has intensity . This observation motivates a Nelson-Aalen type estimator of , that is, for a given value of , we estimate by

To estimate the regression parameters, , we propose the use of a likelihood based approach. For given , the score function for is given by

where is the gradient operator. By observing that

we rewrite the score function as

To arrive at an operational estimation procedure, we modify this score function as follows. Firstly, we substitute the quantities and with known deterministic functions which we denote by and , respectively. Secondly, we replace by . Thirdly, we truncate integration at an upper limit

, where there is still a positive probability of being at risk. Doing so, we arrive at the estimating equations

which, for ease of exposition in the developments to follow, is rewritten as

where is a diagonal matrix with diagonal elements given by repeated times followed by repeated times, and

The resulting estimate of the true parameter value is obtained as a minimizer of which in turn enables estimation of the cumulative hazard by .

2.3 Weight functions

From the above we see that the weight functions in the efficient score function obey the relationship . Accordingly, we suggest using using weights of the form and in practice as this choice mimics the efficient structure. As for the specific choice of , throughout the literature various authors have found rank-based estimation procedures, and associated variance estimators, to be quite insensitive to the choice of weight function (Lin et al., 1998; Chen and Jewell, 2001; Jin et al., 2003) and, in particular, typically suggest the use of (i) the log-rank weight, , which assigns equal weight to all observations and is efficient when Extreme Value (i.e., Weibull), or (ii) the Gehan weight, , which is somewhat more data-driven in that it assigns less weight to observations for which there is less information (i.e., those corresponding to survival times in the tail of the distribution).

Alternatively, a theoretically semiparametrically efficient procedure could be based on adaptively estimating directly from the data, perhaps using kernel smoothing (Tsiatis, 1990; Lai and Ying, 1991; Zeng and Lin, 2007a). However, this step introduces additional complexity beyond the use of a deterministic weight function, which can introduce some instability into the numerical estimation procedure, and, moreover, one must then consider the selection of an optimal bandwidth – for which there are no clear guidelines in this context, and to which the results (particularly the variance estimators) can be sensitive (Zeng and Lin, 2007a). Furthermore, the resulting efficiency gain is not large in practice (cf. Chen and Jewell (2001), Jin et al. (2003), and Zeng and Lin (2007a)). For these reasons we propose the use of rank-based procedures within our semiparametric multi-parameter regression setting, and investigate some choices of weight function in Section 4.1 and the Online Supporting Information.

3 Asymptotic properties

3.1 Key results

We show that is consistent, that

converges to a zero mean Gaussian distribution, and that

converges to a tight zero mean Gaussian process. Regularity conditions and proofs, extending the arguements of Nan et al. (2009) to the multi-parameter regression setting, can be found in the Online Supporting Information.

First we turn to the consistency. For this purpose let denote the limit of and let denote the limit of . Then we have the following result.

Theorem 1

Assume that is the unique solution of . Then an approximate root satisfying is consistent for .

Next, for detailing the weak convergence of , we adopt the following notation. Let denote what we observe on the th individual, and

so that . In line with this, we shall use the short notation for and also denote by . Moreover we define

so that , and

We also define the matrices and where the subscript in and serves as a reminder that these derivatives are taken with respect to .

Finally, we adopt the usual empirical process notation

where denotes some bounded function on the sample space.

Theorem 2

Let be an approximate root satisfying . Suppose that is differentiable with uniformly bounded and continuous derivative . Then if is non-singular,

Now we turn to the weak convergence of of . For this we define

where , and, furthermore, the associated vector of derivatives . We further define

With this notation we have the following result

Theorem 3

Let be an estimator of such that

converges weakly to a zero mean normal distribution. Then

converges weakly to a tight zero mean Gaussian process on and the following holds

3.2 A resampling procedure for estimating asymptotic variances

We note that the limiting covariance matrix for can be estimated by

where , and and are estimates of their theoretical counterparts. While is easily computed, estimation of would require an estimate of the error hazard which is difficult to obtain reliably. The classical solution to this problem is to produce a sample , , by solving perturbed estimating equations (often based on Parzen et al. (1994)), from which the limiting covariance matrix can be estimated directly; such procedures are, however, computationally intensive owing to solving estimating equations multiple times.

As the representation in Theorem 2 is of the form considered in Zeng and Lin (2008), we may apply their more modern resampling approach which requires re-evaluating (but not re-solving) estimating equations. Their approach is based on the fact that , where is a zero-mean Gaussian -vector independent of the data, which motivates the least squares estimate where and are matrices whose th rows are, respectively, given by the zero-mean Gaussian vector , and the vector , .

In a similar manner to the regression coefficients, we can estimate the limiting variance of using

where is estimated using least squares as described in the previous paragraph, and where . However, an estimator of is difficult to obtain directly. Instead we adapt the resampling idea of Zeng and Lin (2008) to obtain an estimator of . In particular, according to the asymptotic representation of in Theorem 3, we find that

which motivates the least squares estimate where is as before, and is a matrix whose th row is given by , .

With the estimates of the limiting variances of and

, we can straightforwardly produce Wald-type confidence intervals for the parameters, and confidence bands for the error cumulative hazard. In the latter case, as is standard, it is preferable to produce confidence bands on the

scale first and back-transform to the scale. For functionals of and , one could apply the functional delta method (Andersen et al., 1993). However, in line with the resampling approaches discussed above, we suggest the use of the conditional multiplier method from empirical process theory (van der Vaart and Wellner, 1996) which is described in the Online Supporting Information.

4 Numerical studies

4.1 Simulation

We now investigate the performance of our procedure in finite samples by way of a simulation study. In particular, we generated survival times according to the following setup: and with covariates and , parameter vector , and error distribution

. Furthermore, we considered a sample size of 100 where survival times were randomly censored according to a log-normal distribution with unit-scale and location set to achieve censored proportions of approximately 20% or 50% respectively.

Since the estimation equations are non-smooth step-functions of the parameters, we applied the Nelder-Mead optimisation procedure as implemented in the optim function in the R programming language. It is also worth highlighting the fact that the estimation equations, as we have presented them, depend on a threshold, , which was required for our asymptotic derivations to ensure a non-empty risk set so that denominators are theoretically bounded away from zero. To investigate the sensitivity of our approach to the inclusion of , we consider thresholds of 2 and . As for the choice of weight function we consider the log-rank weight, , and the normal weight, where and are the normal pdf and cdf functions, which is the true, efficient weight in our simulation study.

In total we present eight simulation scenarios comprising one sample size, two censored proportions, two threshold values, and two choices of weight function. Each of these scenarios was replicated 5000 times. Within each replicate we estimated the following quantities: (i) the parameter vector, , (ii) the cumulative error hazard at the median error, 0.6931, (iii) the conditional survivor function for covariate profile evaluated at the median time for this covariate profile, 0.5, and (iv) the ratio of the median for to the median for , 0.3679. For quantities (i) and (ii), Wald-type confidence intervals were produced where the limiting variances were estimated using least squares (as described in Section 3.2) with , and for quantites (iii) and (iv) the conditional multiplier method was used (as described in the Online Supporting Information) with .

Log-rank Normal (true, efficient)
Cens. Parameter Bias SE SEE Cov. Bias SE SEE Cov.
20% 0.001 0.099 0.099 94.8 -0.001 0.097 0.096 94.4
0.008 0.189 0.185 94.0 0.002 0.178 0.169 93.1
-0.001 0.174 0.179 94.1 0.004 0.176 0.157 91.8
-0.009 0.168 0.156 94.5 -0.002 0.164 0.153 94.4
0.001 94.6 0.001 94.3
0.003 95.1 0.004 94.8
50% 0.001 0.127 0.124 94.2 -0.006 0.120 0.121 94.8
0.005 0.216 0.210 93.3 0.003 0.201 0.193 93.7
-0.015 0.208 0.230 94.6 0.004 0.215 0.206 93.2
-0.010 0.219 0.203 95.3 -0.003 0.217 0.201 94.8
0.008 95.3 0.005 95.5
0.006 95.5 0.008 95.9
20% 0.001 0.100 0.099 94.3 -0.001 0.098 0.096 94.3
0.003 0.189 0.183 93.6 -0.002 0.179 0.169 93.4
-0.007 0.175 0.177 93.4 0.003 0.171 0.156 93.2
-0.004 0.165 0.156 94.3 -0.005 0.165 0.152 94.2
0.001 94.5 0.004 94.4
0.004 95.3 0.005 94.6
50% 0.003 0.126 0.124 94.5 0.001 0.124 0.121 94.1
0.001 0.215 0.208 93.1 -0.001 0.207 0.191 92.6
-0.016 0.215 0.230 93.7 0.008 0.224 0.206 92.0
-0.010 0.230 0.204 95.1 -0.008 0.223 0.201 94.5
0.009 95.3 0.005 94.4
0.005 95.4 0.007 95.3

Cens., censored proportion; Bias, median bias; SE, standard error of estimates; SEE, median of estimated standard error, Cov., empirical coverage percentage for 95% confidence interval;

; ; . Since the variances for the functionals and are not estimated directly within our scheme, SE and SEE are not shown in those cases.

Table 1: Results of simulation study

The results are summarised in Table 1. It is clear that the estimates are reasonably unbiased in all cases and the associated 95% confidence intervals achieve a coverage percentage which is close to the desired nominal level for both choices of weight function, and, in either case, the results for and are very similar. The estimated standard errors capture the true variations adequately, and, moreover, the efficiency based on the log-rank weights is very close to that of the true, efficient weights (which is in line with the findings of other authors in the simpler accelerated failure time model context). Additional simulation results are given in the Online Supporting Information which cover and , and Gehan weights; the results are comparable with those shown here.

4.2 Lung cancer data

We now apply our model to data arising from a lung cancer study which was the subject of a 1995 Queen’s University Belfast PhD thesis by P. Wilkinson (previously analysed in Burke and MacKenzie (2017)). This observational study pertains to 855 individuals who were diagnosed with lung cancer during the one-year period 1st October 1991 to 30th September 1992, and these individuals were followed up until 30th May 1993 (approximately 20% of survival times were right-censored). The primary interest was to investigate the differences between the following treatment groups: palliative care, surgery, chemotherapy, radiotherapy, and a combined treatment of chemotherapy and radiotherapy. While various other covariates were measured (see Burke and MacKenzie (2017)), the aim here is to illustrate our semiparametric multi-parameter regression methodology for the treatment model.

The results of the fitted model are given in Table 2 where the log-rank weights were used in the estimation procedure (see the Online Supporting Information for other choices of weights which yield numerically very similar results). Firstly note that the coefficients are all negative, and statistically significant, suggesting an improvement in survival relative to palliative care group. The coefficients of radiotherapy and the combined treatment of both chemotherapy and radiotherapy differ statistically from zero, indicating that the quantile ratios are non-constant. Furthermore, note that the coefficients are positive and thus the quantile ratios decrease over the timeframe, i.e., the effectiveness of these treatments diminishes over time. Recall that, this being an observational study, the estimated effects are not “treatment effects” in the sense of a randomised trial, but, notwithstanding this, analyses of observational effects are still useful in their own right.

Location Scale Joint
Treatment Group Sample size Est. SE P-val Est. SE P-val P-val
Palliative care 441 0.00 0.00
Surgery 79 -2.65 0.17 0.01 0.31 0.20 0.12 0.01
Chemotherapy 45 -0.54 0.27 0.04 0.04 0.11 0.73 0.06
Radiotherapy 256 -1.08 0.10 0.01 0.30 0.07 0.01 0.01
Chemo. & radio 34 -1.87 0.12 0.01 0.94 0.17 0.01 0.01

Est., estimated location () or scale () coefficient; SE, standard error; P-val, p-value. The joint p-value corresponds to testing that .

Table 2: Regression coefficients for model fitted to lung cancer data

It is also of interest to test whether the overall effect of a given treatment is statistically significant, i.e., testing for the th group. Asymptotic normality of the estimated parameter vector means that this can be achieved by comparing to a distribution where is the covariance matrix for the pair . The resulting p-values for this test are shown in the last column of Table 2.

Figure 1: Kaplan-Meier (solid) curves with model-based curves (dash) overlayed where P palliative, C chemotherapy, R radiotherapy, CR chemotherapy & radiotherapy combined, and S surgery, respectively.

Figure 1 compares the fitted survivor curves to the Kaplan-Meier curves. We can see that the model provides an excellent fit to the data. Furthermore, we see that, relative to the palliative care curve, the various curves have different shapes, particularly those of radiotherapy and the combined treatment. Indeed, the curves converge at a rate which is indicative of a reduction in treatment effectiveness over time, and is something which cannot be handled by a basic accelerated failure time model. This highlights the flexibility of the multi-parameter regression extension wherein the scale parameter depends on covariates thereby facilitating survivor curves corresponding to non-constant quantile ratios.

While the results of Table 2 provide useful information on the nature of the treatment effects, and which are statistically significant, we now consider the quantile ratios which allow us to quantify the effect of each treatment on lifetime. From Figure 2, we can see that the population assigned to surgery has the highest survival, and the difference is sustained with time. Even though it may appear to diminish with time, it is clear that a horizontal line corresponding to a constant quantile ratio easily fits within the confidence bands. This is in line with the non-significant scale coefficient seen in Table 2. The combined treatment of chemotherapy and radiotherapy is particularly effective early on but drops sharply in effectiveness over time. Radiotherapy provides a more modest improvement in lifetime but has a similar performance to the combined treatment later in time. Finally, chemotherapy appears to have a relatively weak effect over the whole lifetime.

Figure 2: Quantile ratios (relative to palliative care) for each treatment group relative to palliative care along with pointwise 95% confidence intervals (solid). Reference line at unity (dot) also shown.

5 Discussion

In this paper we have extended the semiparametric accelerated failure time model to multi-parameter regression status by jointly modelling the location and scale parameters of its log-linear model representation. This brings together the structural flexibility of multi-parameter regression modelling and the robustness of an unspecified baseline hazard. The resulting model can be interpreted on the lifetime scale through its quantile ratio. The form of the quantile ratio directly generalizes that of the accelerated failure time model in that it depends on the quantile in question (rather than being constant over all quantiles) which allows for effects which change over the individual lifetime. Moreover, this modelling framework produces a new semiparametric test of the accelerated lifetime property for a given covariate which is adjusted for other covariates in the model.

Clearly, the combination of multi-parameter regression and semiparametric modelling is fruitful, and some approaches in this direction exist in current literature, e.g., the proportional-hazards-accelerated-failure-time hybrid of Chen and Jewell (2001) and the proportional-hazards-Aalen hybrid of Scheike and Zhang (2002a, b)

. However, in the aforementioned models, the two regression components essentially both correspond to distributional scale-type parameters, i.e., the components play similar roles. Furthermore, the models do not have a natural scale for interpretation which is related to the previous point. In contrast, the choice of jointly modelling location and dispersion of the error distribution (or scale and shape of the survival distribution) provides somewhat more “orthogonal” components for the inclusion of covariates – heteroscedastic linear models are familiar in other areas such as econometrics – and, as mentioned above, yields an interpretation on the lifetime scale. Note also that the Aalen model

(Aalen, 1980), while not related to multi-parameter regression, is another flexible and robust survival regression model, being fully non-parametric, but which can be somewhat difficult to interpret (owing to the regression functions being on a cumulative hazard scale); Martinussen and Pipper (2013, 2014)

considered its interpretation through the “odds of concordance”.

On model interpretability, we could even go as far as criticizing hazard-based models in general, as did D.R. Cox when he stated that “accelerated life models are in many ways more appealing [than hazard modelling] because of their quite direct physical interpretation” (Reid, 1994), i.e., such models are interpretable on the lifetime scale as is the case for our model. While we might expect the basic accelerated failure time to be more popular on the basis of this “direct physical interpretation”, inference in semiparametric accelerated failure time models has, historically, been comparatively more difficult than in the Cox model, which we discuss in the following paragraph.

Estimation of parameters and baseline cumulative hazard function for our model is based on the creation of a time-transformed counting process (see Section 2.2), yielding a set of estimation equations which directly generalize those of the basic accelerated failure time model. It is noteworthy that our counting process formulation means that the estimating equations extend immediately to more general settings such as multiple events and time-varying covariates. Unlike those of the Cox model, however, the estimation equations for accelerated failure time models (and, hence, for our model) are such that the resulting covariance matrix for the estimators has a non-analytic form. However, we have overcome this by making use of modern empirical process theory in combination with a modification of a resampling procedure due to Zeng and Lin (2008) (as described in Section 3 and the Online Supplementary Information). In particular, our proposal does not require resolving of estimation equations, and permits straightforward inference for any (conditional) survival functional of interest.

In summary, the semiparametric multi-parameter regression model of this paper achieves flexibility through its basic model structure, robustness with respect to distributional assumptions as its baseline distribution is unspecified, and is interpretable on the lifetime scale similar to that of the accelerated failure time model which it directly extends. Our inferential framework combines modern approaches in a novel way which is useful for rank-based estimation in general (beyond our setting and the accelerated failure time model which we generalize), for example, those used in the estimation of transformation models.

Acknowledgement

This research was supported by the Irish Research Council (New Foundations Award) and the Royal Irish Academy (Charlemont Award).

References

  • Aalen (1980) Aalen, O. (1980).

    A model for nonparametric regression analysis of counting processes.

    In

    Mathematical statistics and probability theory

    , pp. 1–25. Springer.
  • Andersen et al. (1993) Andersen, P., Ø. Borgan, R. Gill, and N. Keiding (1993). Statistical models based on counting processes. New York: Springer.
  • Buckley and James (1979) Buckley, J. and I. James (1979). Linear regression with censored data. Biometrika 66, 429–436.
  • Burke and MacKenzie (2017) Burke, K. and G. MacKenzie (2017). Multi-parameter regression survival modelling: An alternative to proportional hazards. Biometrics 73, 678–686.
  • Chen and Jewell (2001) Chen, Y. and N. Jewell (2001). On a general class of semiparametric hazards regression models. Biometrika 88, 687–702.
  • Chen and Wang (2000) Chen, Y. and M. Wang (2000). Analysis of accelerated hazards models. J. Am. Statist. Ass. 95, 608–618.
  • Cox (1972) Cox, D. R. (1972). Regression models and life tables (with discussion). J. R. Statist. Soc. 34, 187–220.
  • Jin et al. (2003) Jin, Z., D. Lin, L. Wei, and Z. Ying (2003). Rank-based inference for the accelerated failure time model. Biometrika 90, 341–353.
  • Kalbfleisch and Prentice (2002) Kalbfleisch, J. D. and R. L. Prentice (2002). The Statistical Analysis of Failure Time Data (Second ed.). Wiley.
  • Kim and Zeng (2013) Kim, E. and D. Zeng (2013). Semiparametic ROC analysis using accelerated regression models. Stat. Sin. 23, 829–851.
  • Lai and Ying (1991) Lai, T. L. and Z. Ying (1991). Large sample theory of a modified buckley-james estimator for regression analysis with censored data. Ann. Statist. 19, 1370–1402.
  • Lawless (2003) Lawless, J. (2003). Statistical Models and Methods for Lifetime Data (Second ed.). Wiley.
  • Lin et al. (1998) Lin, D., L. Wei, and Z. Ying (1998). Accelerated failure time models for counting processes. Biometrika 85, 605–618.
  • Martinussen and Pipper (2013) Martinussen, T. and C. B. Pipper (2013). Estimation of odds of concordance based on the Aalen additive model. Lifetime Data Analysis 19, 100–116.
  • Martinussen and Pipper (2014) Martinussen, T. and C. B. Pipper (2014). Estimation of causal odds of concordance using the Aalen additive model. Scand. J. Statist. 41, 141–151.
  • Martinussen and Scheike (2007) Martinussen, T. and T. Scheike (2007). Dynamic regression models for survival data. Springer.
  • Miller (1976) Miller, R. G. (1976). Least squares regression with censored data. Biometrika 63, 449–464.
  • Nan et al. (2009) Nan, B., J. Kalbfleisch, and M. Yu (2009). Asymptotic theory for the semiparametric accelerated failure time model with missing data. Ann. Statist. 37, 2351–2376.
  • Parzen et al. (1994) Parzen, M., L. Wei, and Z. Ying (1994). A resampling method based on pivotal estimating functions. Biometrika 81, 341–350.
  • Prentice (1978) Prentice, R. L. (1978). Linear rank tests with right censored data. Biometrika 65, 167–179.
  • Reid (1994) Reid, N. (1994). A conversation with Sir David Cox. Statist. Sci. 9, 439–455.
  • Ritov (1990) Ritov, Y. (1990). Estimation in a linear regression model with censored data. Ann. Statist. 18, 303–328.
  • Scheike and Zhang (2002a) Scheike, T. H. and M. J. Zhang (2002a). An additive-multiplicative Cox-Aalen model. Scand. J. Statist. 28, 75–88.
  • Scheike and Zhang (2002b) Scheike, T. H. and M. J. Zhang (2002b). Extensions and applications of the Cox-Aalen survival model. Biometrics 59, 1033–1045.
  • Tsiatis (1990) Tsiatis, A. A. (1990). Estimating regression parameters using linear rank tests for censored data. Ann. Statist. 18, 354–372.
  • van der Vaart (1998) van der Vaart, A. (1998). Asymptotic statistics. Cambridge: Cambridge university press.
  • van der Vaart and Wellner (1996) van der Vaart, A. and J. Wellner (1996). Weak convergence and empirical processes : with applications to statistics. New York: Springer.
  • Ying (1993a) Ying, Z. (1993a). A large sample study of rank estimation for censored regression data. Ann. Statist. 21, 76–99.
  • Ying (1993b) Ying, Z. (1993b). A large sample study of rank estimation for censored regression data. Ann. Statist. 21, 76–99.
  • Zeng and Lin (2007a) Zeng, D. and D. Lin (2007a). Efficient estimation for the accelerated failure time model. J. Amer. Statist. Assoc. 102, 1387–1396.
  • Zeng and Lin (2007b) Zeng, D. and D. Lin (2007b). Maximum likelihood estimation in semiparametric regression models with censored data. J. R. Statist. Soc. 69, 507–564.
  • Zeng and Lin (2008) Zeng, D. and D. Lin (2008). Efficient resampling methods for nonsmooth estimating functions. Biostatistics 9, 355–363.

Appendix A Assumptions

Assumption 1

The parameter lies in the interior of a compact set .

Assumption 2

There exists constants and , such that for all , and .

Assumption 3

The covariates and are uniformly bounded with probability one.

Assumption 4

The error density and its derivative are bounded and .

Assumption 5

has uniformly bounded densities.

Assumption 6

The diagonal of is differentiable in with bounded continuous derivative and is a bounded Donsker class.

Assumption 7

has finite second order moment.

Remark 1

Assumptions 4 and 5 are those assumed in Nan et al. (2009) directly adapted from Ying (1993b).

Appendix B Asymptotic results

The asymptotic properties are established by extending the proofs of Nan et al. (2009) to the multi parameter regression setting. We show asymptotic linearity of in in a neighborhood of the true value . This will rely heavily on the fact that the function classes and defined below have bracketing numbers of polynomial order.

and

With and denoting the bracketing numbers of and , respectively, we have the following result

Lemma 1

There exist , so that for all ,

(B.1)

Proof. First note that due to the compactness of and Assumption 3 there exist so that . It follows as in van der Vaart (1998) Example 19.7 that there exist and so that the class can be covered by brackets of the form where . Now let be a partition of the interval so that and so that . From this partition define

and note that the brackets cover . Moreover note that from Assumptions 3 and 7 the class has an envelope which we shall term . It now follows that

for all , where the last inequality follows from direct application of Markov’s generalized inequality. Similarly for and for all

Using Markov’s inequality we get . Furthermore from Assumptions 4 and 5 there exists a constant such that for all and

Consequently, by choosing , we obtain the following bound for

Combining all the bounds we conclude that there exists and such that for all , . A rescaling then proves (B.1).

For the second part of the lemma note that is bounded as is and . Finally according to van der Vaart (1998, Example 19.7), the bracketing number of is less than of the order . Adding up we see that the bracketing number of is less than of the order .

 

Theorem 1

Assume that is the unique solution of . Then an approximate root satisfying is consistent for .

(i).

Let denote the supremum norm. Since is the unique solution to and is compact it follows that for any fixed , there exists a such that . If we can show that

(B.2)

then the consistency of follows.

We first show that