1 Introduction
In the context of survival analysis, we often consider log-linear models of the form
, where and are location and scale parameters, respectively, and is a random error with an assumed parametric distribution on the real numbers. The familiar accelerated failure time model then arises by setting where and
are, respectively, vectors of regression coefficients and covariates (Kalbfleisch and Prentice, 2002, chap. 3; Lawless, 2003, chap. 6). As discussed in Burke and MacKenzie (2017), taking a multiparameter regression approach (i.e., allowing both and to depend on covariates simultaneously) offers an intuitive and simple way of modelling complicated phenomena. For instance, the phenomenon of crossing survival curves is directly linked to the concentration of events at a given location, which is governed by the scale parameter.

A limitation of fully parametric approaches is that the assumed baseline hazard may not always be realistic in practice. Thus, we propose to further extend the log-linear multiparameter regression model by allowing the baseline hazard to vary freely. This semiparametric model therefore brings together the flexibility of multiparameter regression with the additional robustness afforded by relaxing the assumption of a parametric error distribution. The proposed extension not only generalises multiparameter regression to semiparametric status but also generalises the semiparametric accelerated failure time model to multiparameter regression status; the latter fact is noteworthy given that the semiparametric accelerated failure time model has been considered by many authors over the years (Miller, 1976; Prentice, 1978; Buckley and James, 1979; Tsiatis, 1990; Ritov, 1990; Lai and Ying, 1991; Ying, 1993a; Lin et al., 1998; Jin et al., 2003); see Martinussen and Scheike (2007, chap. 8) for a summary of developments in this area.
Other examples of semiparametric multiparameter regression models, which differ from the model developed in this paper, exist in the literature. Chen and Jewell (2001) considered a model which combined the semiparametric accelerated failure time model and Cox’s (1972) proportional hazards model and, therefore, had two regression components; the model also contained the lesser-known accelerated hazards model (Chen and Wang, 2000) as a special case. Scheike and Zhang (2002a, b) developed a different hybrid model which incorporated the Cox model and the Aalen model (Aalen, 1980), leading to two regression components. Somewhat closer to our work is that of Zeng and Lin (2007b), who considered transformation models with a covariate-dependent scale parameter and, therefore, like us, had regression components corresponding to location and scale. However, whereas their transformation was unspecified with a parametric baseline distribution, we, conversely, focus on the log-transformation with an unspecified baseline distribution.
From a practical perspective, inference based on the semiparametric accelerated failure time model has historically been somewhat cumbersome. This is partly due to the nonsmooth nature of the estimating equations involved, but, more importantly, the precision of the resulting estimators does not lend itself to direct (i.e., plug-in) estimation due to its intractability. However, with recent resampling techniques, this is no longer an obstacle, and, specifically, we adapt the method of Zeng and Lin (2008) to our setting to facilitate inference for the regression coefficients. Moreover, we extend Zeng and Lin’s approach to obtain the variance of the cumulative hazard estimator, and combine this with modern empirical process theory, which permits straightforward inference for any functional of interest without the need to re-solve estimating equations.
2 Model
2.1 Specification and interpretation
In line with the classical formulation of the accelerated failure time model (cf. Kalbfleisch and Prentice, 2002, chap. 3), we specify a regression model for with denoting the failure time. In particular, for the th individual, , we assume that
where the location and scale parameters, and , are related to covariates, , via
where and are vectors of regression coefficients. The error terms, , are assumed to be independent and identically distributed with cumulative hazard function which will be unspecified in our work.
The conditional quantile function for this model is given by
where and is the quantile function for the error distribution, i.e., is the quantile function for a baseline individual. Consider individuals and whose respective and vectors are denoted by . The ratio of their quantile functions is then
This quantile ratio provides insight into the interpretation of the location and scale regression coefficients and, indeed, can be used in practical applications to quantify the overall effect of a given covariate on lifetime. We immediately see that when , the quantile ratio reduces to the usual accelerated failure time constant, , so that the effect of covariates is quantile-independent, i.e., it applies across the whole lifetime, and, for example, implies reduced lifetime. Hence, the proposed model directly extends the accelerated failure time model, providing a lack-of-fit test of accelerated failure time effects.
It is worth noting that, since is an increasing function on , the quantile ratio decreases with for , increases with for , and, in both cases, equals one for some value if . Therefore, when , the model implies crossing quantile functions and, hence, crossing survivor functions.
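As a hedged worked example of the calculation behind this interpretation, assume the standard log-linear form $\log T = \mu + \sigma\varepsilon$ with error quantile function $Q_\varepsilon$ (illustrative only, and not necessarily the paper's exact notation); the conditional quantile function and the quantile ratio for two individuals then take the form
\[
Q(p \mid x_1) = \exp\{\mu_1 + \sigma_1 Q_\varepsilon(p)\},
\qquad
\frac{Q(p \mid x_1)}{Q(p \mid x_2)}
  = \exp\{(\mu_1 - \mu_2) + (\sigma_1 - \sigma_2)\, Q_\varepsilon(p)\},
\]
so that the ratio is constant in $p$ exactly when $\sigma_1 = \sigma_2$ (the accelerated failure time case), and otherwise is monotone in $p$ and, for an error distribution supported on the whole real line, crosses one at the value of $p$ solving $Q_\varepsilon(p) = -(\mu_1 - \mu_2)/(\sigma_1 - \sigma_2)$.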
2.2 Derivation of estimating equations
We now reformulate the model in a counting process framework (Andersen et al., 1993) to accommodate potential right censoring, and to derive estimating equations based on the resulting intensity processes. For this we denote by the censoring time, the observed event time, and the failure indicator. With these quantities in place, the counting and at-risk processes are, respectively, defined as
We note that the hazard rate of is given by
where denotes the derivative of . Consequently, with independent right censoring (Andersen et al., 1993), the intensity process of is given by . Furthermore, for some monotone increasing function, , the time-transformed counting process
has intensity
where . In particular, with , we have that has intensity . This observation motivates a Nelson-Aalen type estimator of , that is, for a given value of , we estimate by
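To make this estimator concrete, a minimal R sketch is given below; the residual construction, the log link for the scale, and all names are assumptions for illustration (ties are ignored for simplicity), not the paper's exact definitions.

```r
# Nelson-Aalen type estimate of the error cumulative hazard, computed on the
# residual (error) scale for a fixed value of the regression parameters.
cumhaz_error <- function(time, status, X, beta, alpha) {
  mu    <- drop(X %*% beta)          # location, per subject
  sigma <- exp(drop(X %*% alpha))    # scale, per subject (assumed log link)
  e     <- (log(time) - mu) / sigma  # residuals on the error scale
  ord   <- order(e)
  d     <- status[ord]
  n     <- length(e)
  at_risk <- n:1                     # number still at risk just before each ordered residual
  data.frame(e = e[ord], cumhaz = cumsum(d / at_risk))
}
```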
To estimate the regression parameters, , we propose the use of a likelihood-based approach. For given , the score function for is given by
where is the gradient operator. By observing that
we rewrite the score function as
To arrive at an operational estimation procedure, we modify this score function as follows. Firstly, we substitute the quantities and with known deterministic functions which we denote by and , respectively. Secondly, we replace by . Thirdly, we truncate integration at an upper limit
, where there is still a positive probability of being at risk. Doing so, we arrive at the estimating equations
which, for ease of exposition in the developments to follow, we rewrite as
where is a diagonal matrix with diagonal elements given by repeated times followed by repeated times, and
The resulting estimate of the true parameter value is obtained as a minimizer of which in turn enables estimation of the cumulative hazard by .
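As a rough code illustration of this estimation strategy, the sketch below implements a weighted rank-based score for the location coefficients only, with the weight expressed as a function of the proportion still at risk, and minimises its squared norm with Nelder-Mead. The paper's estimating equations additionally carry scale-regression components, a truncation point and a particular weight structure, none of which are reproduced here, so the simplifications and all names are illustrative assumptions.

```r
# Illustrative weighted rank-based estimating function for an AFT-type location
# model, plus estimation by minimising the squared norm of the score.
rank_score <- function(beta, time, status, X, phi) {
  e <- log(time) - drop(X %*% beta)               # residuals on the log-time scale
  U <- numeric(ncol(X))
  for (i in which(status == 1)) {                 # sum over observed failures only
    at_risk <- e >= e[i]                          # at risk on the residual scale
    U <- U + phi(mean(at_risk)) *
      (X[i, ] - colMeans(X[at_risk, , drop = FALSE]))
  }
  U / length(e)
}

fit_rank <- function(time, status, X, phi, beta_init = rep(0, ncol(X))) {
  obj <- function(b) sum(rank_score(b, time, status, X, phi)^2)
  optim(beta_init, obj, method = "Nelder-Mead")   # nonsmooth score: derivative-free search
}
```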
2.3 Weight functions
From the above we see that the weight functions in the efficient score function obey the relationship . Accordingly, we suggest using weights of the form and in practice, as this choice mimics the efficient structure. As for the specific choice of , throughout the literature various authors have found rank-based estimation procedures, and associated variance estimators, to be quite insensitive to the choice of weight function (Lin et al., 1998; Chen and Jewell, 2001; Jin et al., 2003) and, in particular, typically suggest the use of (i) the log-rank weight, , which assigns equal weight to all observations and is efficient when the error distribution is Extreme Value (i.e., the failure time is Weibull), or (ii) the Gehan weight, , which is somewhat more data-driven in that it assigns less weight to observations for which there is less information (i.e., those corresponding to survival times in the tail of the distribution).
Alternatively, a theoretically semiparametrically efficient procedure could be based on adaptively estimating directly from the data, perhaps using kernel smoothing (Tsiatis, 1990; Lai and Ying, 1991; Zeng and Lin, 2007a). However, this step adds complexity beyond the use of a deterministic weight function, can introduce some instability into the numerical estimation procedure, and, moreover, requires the selection of an optimal bandwidth – for which there are no clear guidelines in this context, and to which the results (particularly the variance estimators) can be sensitive (Zeng and Lin, 2007a). Furthermore, the resulting efficiency gain is not large in practice (cf. Chen and Jewell (2001), Jin et al. (2003), and Zeng and Lin (2007a)). For these reasons we propose the use of rank-based procedures within our semiparametric multiparameter regression setting, and investigate some choices of weight function in Section 4.1 and the Online Supporting Information.
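For completeness, the two standard weight choices discussed above can be written, in the simplified form expected by the code sketch at the end of Section 2.2, as follows; again this is an illustration rather than the paper's exact definitions.

```r
# Log-rank weight: every failure gets equal weight.
# Gehan weight: weight equals the proportion still at risk, downweighting the tail.
phi_logrank <- function(p) 1
phi_gehan   <- function(p) p

# e.g. (assuming time, status and a design matrix X are already available):
# fit_lr    <- fit_rank(time, status, X, phi_logrank)
# fit_gehan <- fit_rank(time, status, X, phi_gehan)
```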
3 Asymptotic properties
3.1 Key results
We show that is consistent, that
converges to a zero mean Gaussian distribution, and that
converges to a tight zero mean Gaussian process. Regularity conditions and proofs, extending the arguments of Nan et al. (2009) to the multiparameter regression setting, can be found in the Online Supporting Information.

First we turn to consistency. For this purpose let denote the limit of and let denote the limit of . Then we have the following result.
Theorem 1
Assume that is the unique solution of . Then an approximate root satisfying is consistent for .
Next, for detailing the weak convergence of , we adopt the following notation. Let denote what we observe on the th individual, and
so that . In line with this, we shall use the short notation for and also denote by . Moreover we define
so that , and
We also define the matrices and where the subscript in and serves as a reminder that these derivatives are taken with respect to .
Finally, we adopt the usual empirical process notation
where denotes some bounded function on the sample space.
Theorem 2
Let be an approximate root satisfying . Suppose that is differentiable with uniformly bounded and continuous derivative . Then if is nonsingular,
Now we turn to the weak convergence of . For this we define
where , and, furthermore, the associated vector of derivatives . We further define
With this notation we have the following result.
Theorem 3
Let be an estimator of such that
converges weakly to a zero mean normal distribution. Then
converges weakly to a tight zero mean Gaussian process on and the following holds:

3.2 A resampling procedure for estimating asymptotic variances
We note that the limiting covariance matrix for can be estimated by
where , and and are estimates of their theoretical counterparts. While is easily computed, estimation of would require an estimate of the error hazard, which is difficult to obtain reliably. The classical solution to this problem is to produce a sample , , by solving perturbed estimating equations (often based on Parzen et al. (1994)), from which the limiting covariance matrix can be estimated directly; such procedures are, however, computationally intensive owing to the need to solve the estimating equations multiple times.
As the representation in Theorem 2 is of the form considered in Zeng and Lin (2008), we may apply their more modern resampling approach, which requires re-evaluating (but not re-solving) the estimating equations. Their approach is based on the fact that , where is a zero-mean Gaussian vector independent of the data, which motivates the least squares estimate where and are matrices whose th rows are, respectively, given by the zero-mean Gaussian vector , and the vector , .
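In code, the device just described might be sketched as follows; the use of standard normal perturbations, the square-root-of-n scalings, and the final sandwich form are assumptions made for illustration and are not the paper's exact construction.

```r
# Zeng & Lin (2008) style least-squares device: perturb the estimate with
# zero-mean Gaussian vectors, re-evaluate (never re-solve) the estimating
# function, and regress the re-evaluated scores on the perturbations.
resample_slope <- function(score, theta_hat, n, B = 500) {
  p <- length(theta_hat)
  Z <- matrix(rnorm(B * p), nrow = B)   # B zero-mean Gaussian perturbations
  U <- t(apply(Z, 1, function(z) sqrt(n) * score(theta_hat + z / sqrt(n))))
  t(qr.solve(Z, U))                     # least-squares slope estimate (p x p)
}

# Sandwich covariance for theta_hat, given an estimate V_hat of the score variance.
cov_theta <- function(A_hat, V_hat, n) {
  A_inv <- solve(A_hat)
  (A_inv %*% V_hat %*% t(A_inv)) / n
}
```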
In a similar manner to the regression coefficients, we can estimate the limiting variance of using
where is estimated using least squares as described in the previous paragraph, and where . However, an estimator of is difficult to obtain directly. Instead we adapt the resampling idea of Zeng and Lin (2008) to obtain an estimator of . In particular, according to the asymptotic representation of in Theorem 3, we find that
which motivates the least squares estimate where is as before, and is a matrix whose th row is given by , .
With the estimates of the limiting variances of and
, we can straightforwardly produce Wald-type confidence intervals for the parameters, and confidence bands for the error cumulative hazard. In the latter case, as is standard, it is preferable to produce confidence bands on the
scale first and back-transform to the scale. For functionals of and , one could apply the functional delta method (Andersen et al., 1993). However, in line with the resampling approaches discussed above, we suggest the use of the conditional multiplier method from empirical process theory (van der Vaart and Wellner, 1996), which is described in the Online Supporting Information.

4 Numerical studies
4.1 Simulation
We now investigate the performance of our procedure in finite samples by way of a simulation study. In particular, we generated survival times according to the following setup: and with covariates and , parameter vector , and error distribution
. Furthermore, we considered a sample size of 100, where survival times were randomly censored according to a lognormal distribution with unit scale and location set to achieve censored proportions of approximately 20% and 50%, respectively.
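For concreteness, a placeholder version of such a data-generating recipe is sketched below; the covariate distributions, coefficient values, error distribution and censoring tuning used in the actual simulation are not reproduced here, so every number is an illustrative assumption.

```r
set.seed(1)
n  <- 100
x1 <- rbinom(n, 1, 0.5)
x2 <- runif(n)
X  <- cbind(1, x1, x2)
beta  <- c(0.5, -0.5, 1.0)                 # placeholder location coefficients
alpha <- c(-0.2, 0.3, 0.0)                 # placeholder scale coefficients
mu    <- drop(X %*% beta)
sigma <- exp(drop(X %*% alpha))            # assumed log link for the scale
eps   <- log(rexp(n))                      # placeholder error: extreme-value type
t_true <- exp(mu + sigma * eps)            # survival times from the log-linear model
cens   <- rlnorm(n, meanlog = quantile(log(t_true), 0.8), sdlog = 1)  # tune for target censoring
time   <- pmin(t_true, cens)
status <- as.numeric(t_true <= cens)
```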
Since the estimating equations are nonsmooth step functions of the parameters, we applied the Nelder-Mead optimisation procedure as implemented in the optim function in the R programming language. It is also worth highlighting the fact that the estimating equations, as we have presented them, depend on a threshold, , which was required for our asymptotic derivations to ensure a non-empty risk set so that denominators are theoretically bounded away from zero. To investigate the sensitivity of our approach to the inclusion of , we consider thresholds of 2 and . As for the choice of weight function, we consider the log-rank weight, , and the normal weight, where and are the normal pdf and cdf functions, which is the true, efficient weight in our simulation study.
In total we present eight simulation scenarios comprising one sample size, two censored proportions, two threshold values, and two choices of weight function. Each of these scenarios was replicated 5000 times. Within each replicate we estimated the following quantities: (i) the parameter vector, , (ii) the cumulative error hazard at the median error, 0.6931, (iii) the conditional survivor function for covariate profile evaluated at the median time for this covariate profile, 0.5, and (iv) the ratio of the median for to the median for , 0.3679. For quantities (i) and (ii), Wald-type confidence intervals were produced where the limiting variances were estimated using least squares (as described in Section 3.2) with , and for quantities (iii) and (iv) the conditional multiplier method was used (as described in the Online Supporting Information) with .
Log-rank  Normal (true, efficient)  
Cens.  Parameter  Bias  SE  SEE  Cov.  Bias  SE  SEE  Cov.  
20%  0.001  0.099  0.099  94.8  0.001  0.097  0.096  94.4  
0.008  0.189  0.185  94.0  0.002  0.178  0.169  93.1  
0.001  0.174  0.179  94.1  0.004  0.176  0.157  91.8  
0.009  0.168  0.156  94.5  0.002  0.164  0.153  94.4  
0.001  —  —  94.6  0.001  —  —  94.3  
0.003  —  —  95.1  0.004  —  —  94.8  
50%  0.001  0.127  0.124  94.2  0.006  0.120  0.121  94.8  
0.005  0.216  0.210  93.3  0.003  0.201  0.193  93.7  
0.015  0.208  0.230  94.6  0.004  0.215  0.206  93.2  
0.010  0.219  0.203  95.3  0.003  0.217  0.201  94.8  
0.008  —  —  95.3  0.005  —  —  95.5  
0.006  —  —  95.5  0.008  —  —  95.9  
20%  0.001  0.100  0.099  94.3  0.001  0.098  0.096  94.3  
0.003  0.189  0.183  93.6  0.002  0.179  0.169  93.4  
0.007  0.175  0.177  93.4  0.003  0.171  0.156  93.2  
0.004  0.165  0.156  94.3  0.005  0.165  0.152  94.2  
0.001  —  —  94.5  0.004  —  —  94.4  
0.004  —  —  95.3  0.005  —  —  94.6  
50%  0.003  0.126  0.124  94.5  0.001  0.124  0.121  94.1  
0.001  0.215  0.208  93.1  0.001  0.207  0.191  92.6  
0.016  0.215  0.230  93.7  0.008  0.224  0.206  92.0  
0.010  0.230  0.204  95.1  0.008  0.223  0.201  94.5  
0.009  —  —  95.3  0.005  —  —  94.4  
0.005  —  —  95.4  0.007  —  —  95.3 
Cens., censored proportion; Bias, median bias; SE, standard error of estimates; SEE, median of estimated standard errors; Cov., empirical coverage percentage for 95% confidence interval;
; ; . Since the variances for the functionals and are not estimated directly within our scheme, SE and SEE are not shown in those cases.

The results are summarised in Table 1. It is clear that the estimates are reasonably unbiased in all cases and the associated 95% confidence intervals achieve a coverage percentage which is close to the desired nominal level for both choices of weight function, and, in either case, the results for and are very similar. The estimated standard errors capture the true variation adequately, and, moreover, the efficiency based on the log-rank weights is very close to that of the true, efficient weights (which is in line with the findings of other authors in the simpler accelerated failure time model context). Additional simulation results are given in the Online Supporting Information, which cover and , and Gehan weights; the results are comparable with those shown here.
4.2 Lung cancer data
We now apply our model to data arising from a lung cancer study which was the subject of a 1995 Queen’s University Belfast PhD thesis by P. Wilkinson (previously analysed in Burke and MacKenzie (2017)). This observational study pertains to 855 individuals who were diagnosed with lung cancer during the one-year period 1st October 1991 to 30th September 1992, and these individuals were followed up until 30th May 1993 (approximately 20% of survival times were right-censored). The primary interest was to investigate the differences between the following treatment groups: palliative care, surgery, chemotherapy, radiotherapy, and a combined treatment of chemotherapy and radiotherapy. While various other covariates were measured (see Burke and MacKenzie (2017)), the aim here is to illustrate our semiparametric multiparameter regression methodology for the treatment model.
The results of the fitted model are given in Table 2, where the log-rank weights were used in the estimation procedure (see the Online Supporting Information for other choices of weights, which yield numerically very similar results). Firstly, note that the coefficients are all negative, and statistically significant, suggesting an improvement in survival relative to the palliative care group. The coefficients of radiotherapy and the combined treatment of chemotherapy and radiotherapy differ statistically from zero, indicating that the quantile ratios are non-constant. Furthermore, note that the coefficients are positive and thus the quantile ratios decrease over the time frame, i.e., the effectiveness of these treatments diminishes over time. Recall that, this being an observational study, the estimated effects are not “treatment effects” in the sense of a randomised trial, but, notwithstanding this, analyses of observational effects are still useful in their own right.
Location  Scale  Joint  
Treatment Group  Sample size  Est.  SE  Pval  Est.  SE  Pval  Pval 
Palliative care  441  0.00  —  —  0.00  —  —  — 
Surgery  79  2.65  0.17  0.01  0.31  0.20  0.12  0.01 
Chemotherapy  45  0.54  0.27  0.04  0.04  0.11  0.73  0.06 
Radiotherapy  256  1.08  0.10  0.01  0.30  0.07  0.01  0.01 
Chemo. & radio  34  1.87  0.12  0.01  0.94  0.17  0.01  0.01 
Est., estimated location () or scale () coefficient; SE, standard error; Pval, p-value. The joint p-value corresponds to testing that .
It is also of interest to test whether the overall effect of a given treatment is statistically significant, i.e., testing for the th group. Asymptotic normality of the estimated parameter vector means that this can be achieved by comparing to a distribution, where is the covariance matrix for the pair . The resulting p-values for this test are shown in the last column of Table 2.
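In code, this joint test is a standard Wald quadratic form; the sketch below assumes the relevant two-by-two covariance block has already been extracted from the resampling-based covariance estimate of Section 3.2.

```r
# Joint Wald test that both the location and scale coefficient of a treatment
# group are zero: compare the quadratic form to a chi-squared with 2 df.
joint_wald_test <- function(est, V) {
  stat <- drop(t(est) %*% solve(V) %*% est)
  c(statistic = stat,
    p_value   = pchisq(stat, df = length(est), lower.tail = FALSE))
}
# e.g. joint_wald_test(c(beta_g, alpha_g), V_g)   # V_g: 2 x 2 covariance block for group g
```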
Figure 1 compares the fitted survivor curves to the Kaplan-Meier curves. We can see that the model provides an excellent fit to the data. Furthermore, we see that, relative to the palliative care curve, the various curves have different shapes, particularly those of radiotherapy and the combined treatment. Indeed, the curves converge at a rate which is indicative of a reduction in treatment effectiveness over time, something which cannot be handled by a basic accelerated failure time model. This highlights the flexibility of the multiparameter regression extension wherein the scale parameter depends on covariates, thereby facilitating survivor curves corresponding to non-constant quantile ratios.
While the results of Table 2 provide useful information on the nature of the treatment effects, and which are statistically significant, we now consider the quantile ratios which allow us to quantify the effect of each treatment on lifetime. From Figure 2, we can see that the population assigned to surgery has the highest survival, and the difference is sustained with time. Even though it may appear to diminish with time, it is clear that a horizontal line corresponding to a constant quantile ratio easily fits within the confidence bands. This is in line with the nonsignificant scale coefficient seen in Table 2. The combined treatment of chemotherapy and radiotherapy is particularly effective early on but drops sharply in effectiveness over time. Radiotherapy provides a more modest improvement in lifetime but has a similar performance to the combined treatment later in time. Finally, chemotherapy appears to have a relatively weak effect over the whole lifetime.
5 Discussion
In this paper we have extended the semiparametric accelerated failure time model to multiparameter regression status by jointly modelling the location and scale parameters of its log-linear model representation. This brings together the structural flexibility of multiparameter regression modelling and the robustness of an unspecified baseline hazard. The resulting model can be interpreted on the lifetime scale through its quantile ratio. The form of the quantile ratio directly generalizes that of the accelerated failure time model in that it depends on the quantile in question (rather than being constant over all quantiles), which allows for effects that change over the individual lifetime. Moreover, this modelling framework produces a new semiparametric test of the accelerated lifetime property for a given covariate which is adjusted for other covariates in the model.
Clearly, the combination of multiparameter regression and semiparametric modelling is fruitful, and some approaches in this direction exist in the current literature, e.g., the proportional hazards-accelerated failure time hybrid of Chen and Jewell (2001) and the proportional hazards-Aalen hybrid of Scheike and Zhang (2002a, b). However, in the aforementioned models, the two regression components essentially both correspond to distributional scale-type parameters, i.e., the components play similar roles. Furthermore, and relatedly, these models do not have a natural scale for interpretation. In contrast, the choice of jointly modelling the location and dispersion of the error distribution (or the scale and shape of the survival distribution) provides somewhat more “orthogonal” components for the inclusion of covariates – heteroscedastic linear models are familiar in other areas such as econometrics – and, as mentioned above, yields an interpretation on the lifetime scale. Note also that the Aalen model (Aalen, 1980), while not related to multiparameter regression, is another flexible and robust survival regression model, being fully nonparametric, but one which can be somewhat difficult to interpret (owing to the regression functions being on a cumulative hazard scale); Martinussen and Pipper (2013, 2014) considered its interpretation through the “odds of concordance”.
On model interpretability, we could even go as far as criticizing hazard-based models in general, as did D. R. Cox when he stated that “accelerated life models are in many ways more appealing [than hazard modelling] because of their quite direct physical interpretation” (Reid, 1994), i.e., such models are interpretable on the lifetime scale, as is the case for our model. While we might expect the basic accelerated failure time model to be more popular on the basis of this “direct physical interpretation”, inference in semiparametric accelerated failure time models has, historically, been comparatively more difficult than in the Cox model, which we discuss in the following paragraph.
Estimation of the parameters and baseline cumulative hazard function for our model is based on the creation of a time-transformed counting process (see Section 2.2), yielding a set of estimating equations which directly generalize those of the basic accelerated failure time model. It is noteworthy that our counting process formulation means that the estimating equations extend immediately to more general settings such as multiple events and time-varying covariates. Unlike those of the Cox model, however, the estimating equations for accelerated failure time models (and, hence, for our model) are such that the resulting covariance matrix for the estimators has a non-analytic form. However, we have overcome this by making use of modern empirical process theory in combination with a modification of a resampling procedure due to Zeng and Lin (2008) (as described in Section 3 and the Online Supporting Information). In particular, our proposal does not require re-solving the estimating equations, and permits straightforward inference for any (conditional) survival functional of interest.
In summary, the semiparametric multiparameter regression model of this paper achieves flexibility through its basic model structure, robustness with respect to distributional assumptions as its baseline distribution is unspecified, and is interpretable on the lifetime scale similar to that of the accelerated failure time model which it directly extends. Our inferential framework combines modern approaches in a novel way which is useful for rankbased estimation in general (beyond our setting and the accelerated failure time model which we generalize), for example, those used in the estimation of transformation models.
Acknowledgement
This research was supported by the Irish Research Council (New Foundations Award) and the Royal Irish Academy (Charlemont Award).
References

 Aalen (1980) Aalen, O. (1980). A model for nonparametric regression analysis of counting processes. In Mathematical statistics and probability theory, pp. 1–25. Springer.
 Andersen et al. (1993) Andersen, P., Ø. Borgan, R. Gill, and N. Keiding (1993). Statistical models based on counting processes. New York: Springer.
 Buckley and James (1979) Buckley, J. and I. James (1979). Linear regression with censored data. Biometrika 66, 429–436.
 Burke and MacKenzie (2017) Burke, K. and G. MacKenzie (2017). Multiparameter regression survival modelling: An alternative to proportional hazards. Biometrics 73, 678–686.
 Chen and Jewell (2001) Chen, Y. and N. Jewell (2001). On a general class of semiparametric hazards regression models. Biometrika 88, 687–702.
 Chen and Wang (2000) Chen, Y. and M. Wang (2000). Analysis of accelerated hazards models. J. Am. Statist. Ass. 95, 608–618.
 Cox (1972) Cox, D. R. (1972). Regression models and life tables (with discussion). J. R. Statist. Soc. B 34, 187–220.
 Jin et al. (2003) Jin, Z., D. Lin, L. Wei, and Z. Ying (2003). Rankbased inference for the accelerated failure time model. Biometrika 90, 341–353.
 Kalbfleisch and Prentice (2002) Kalbfleisch, J. D. and R. L. Prentice (2002). The Statistical Analysis of Failure Time Data (Second ed.). Wiley.
 Kim and Zeng (2013) Kim, E. and D. Zeng (2013). Semiparametric ROC analysis using accelerated regression models. Stat. Sin. 23, 829–851.
 Lai and Ying (1991) Lai, T. L. and Z. Ying (1991). Large sample theory of a modified Buckley–James estimator for regression analysis with censored data. Ann. Statist. 19, 1370–1402.
 Lawless (2003) Lawless, J. (2003). Statistical Models and Methods for Lifetime Data (Second ed.). Wiley.
 Lin et al. (1998) Lin, D., L. Wei, and Z. Ying (1998). Accelerated failure time models for counting processes. Biometrika 85, 605–618.
 Martinussen and Pipper (2013) Martinussen, T. and C. B. Pipper (2013). Estimation of odds of concordance based on the Aalen additive model. Lifetime Data Analysis 19, 100–116.
 Martinussen and Pipper (2014) Martinussen, T. and C. B. Pipper (2014). Estimation of causal odds of concordance using the Aalen additive model. Scand. J. Statist. 41, 141–151.
 Martinussen and Scheike (2007) Martinussen, T. and T. Scheike (2007). Dynamic regression models for survival data. Springer.
 Miller (1976) Miller, R. G. (1976). Least squares regression with censored data. Biometrika 63, 449–464.
 Nan et al. (2009) Nan, B., J. Kalbfleisch, and M. Yu (2009). Asymptotic theory for the semiparametric accelerated failure time model with missing data. Ann. Statist. 37, 2351–2376.
 Parzen et al. (1994) Parzen, M., L. Wei, and Z. Ying (1994). A resampling method based on pivotal estimating functions. Biometrika 81, 341–350.
 Prentice (1978) Prentice, R. L. (1978). Linear rank tests with right censored data. Biometrika 65, 167–179.
 Reid (1994) Reid, N. (1994). A conversation with Sir David Cox. Statist. Sci. 9, 439–455.
 Ritov (1990) Ritov, Y. (1990). Estimation in a linear regression model with censored data. Ann. Statist. 18, 303–328.
 Scheike and Zhang (2002a) Scheike, T. H. and M. J. Zhang (2002a). An additive-multiplicative Cox-Aalen model. Scand. J. Statist. 28, 75–88.
 Scheike and Zhang (2002b) Scheike, T. H. and M. J. Zhang (2002b). Extensions and applications of the Cox-Aalen survival model. Biometrics 59, 1033–1045.
 Tsiatis (1990) Tsiatis, A. A. (1990). Estimating regression parameters using linear rank tests for censored data. Ann. Statist. 18, 354–372.
 van der Vaart (1998) van der Vaart, A. (1998). Asymptotic statistics. Cambridge: Cambridge University Press.
 van der Vaart and Wellner (1996) van der Vaart, A. and J. Wellner (1996). Weak convergence and empirical processes: with applications to statistics. New York: Springer.
 Ying (1993a) Ying, Z. (1993a). A large sample study of rank estimation for censored regression data. Ann. Statist. 21, 76–99.
 Zeng and Lin (2007a) Zeng, D. and D. Lin (2007a). Efficient estimation for the accelerated failure time model. J. Amer. Statist. Assoc. 102, 1387–1396.
 Zeng and Lin (2007b) Zeng, D. and D. Lin (2007b). Maximum likelihood estimation in semiparametric regression models with censored data. J. R. Statist. Soc. B 69, 507–564.
 Zeng and Lin (2008) Zeng, D. and D. Lin (2008). Efficient resampling methods for nonsmooth estimating functions. Biostatistics 9, 355–363.
Appendix A Assumptions
Assumption 1
The parameter lies in the interior of a compact set .
Assumption 2
There exist constants and , such that for all , and .
Assumption 3
The covariates and are uniformly bounded with probability one.
Assumption 4
The error density and its derivative are bounded and .
Assumption 5
has uniformly bounded densities.
Assumption 6
The diagonal of is differentiable in with bounded continuous derivative and is a bounded Donsker class.
Assumption 7
has finite second order moment.
Appendix B Asymptotic results
The asymptotic properties are established by extending the proofs of Nan et al. (2009) to the multiparameter regression setting. We show asymptotic linearity of in in a neighborhood of the true value . This will rely heavily on the fact that the function classes and defined below have bracketing numbers of polynomial order.
and
With and denoting the bracketing numbers of and , respectively, we have the following result.
Lemma 1
There exist , so that for all ,
(B.1)  
Proof. First note that due to the compactness of and Assumption 3 there exist so that . It follows as in van der Vaart (1998) Example 19.7 that there exist and so that the class can be covered by brackets of the form where . Now let be a partition of the interval so that and so that . From this partition define
and note that the brackets cover . Moreover note that from Assumptions 3 and 7 the class has an envelope which we shall term . It now follows that
for all , where the last inequality follows from direct application of Markov’s generalized inequality. Similarly for and for all
Using Markov’s inequality we get . Furthermore from Assumptions 4 and 5 there exists a constant such that for all and
Consequently, by choosing , we obtain the following bound for
Combining all the bounds we conclude that there exists and such that for all , . A rescaling then proves (B.1).
For the second part of the lemma note that is bounded as is and . Finally according to van der Vaart (1998, Example 19.7), the bracketing number of is less than of the order . Adding up we see that the bracketing number of is less than of the order .
Theorem 1
Assume that is the unique solution of . Then an approximate root satisfying is consistent for .
(i).
Let denote the supremum norm. Since is the unique solution to and is compact it follows that for any fixed , there exists a such that . If we can show that
(B.2) 
then the consistency of follows.
We first show that