Quantile regression on inactivity time

by   Lauren C. Balmert, et al.
Northwestern University

The inactivity time, or lost lifespan specifically for mortality data, concerns time from occurrence of an event of interest to the current time point and has recently emerged as a new summary measure for cumulative information inherent in time-to-event data. This summary measure provides several benefits over the traditional methods, including more straightforward interpretation yet less sensitivity to heavy censoring. However, there exists no systematic modeling approach to inferring the quantile inactivity time in the literature. In this paper, we propose a regression method for the quantiles of the inactivity time distribution under right censoring. The consistency and asymptotic normality of the regression parameters are established. To avoid estimation of the probability density function of the inactivity time distribution under censoring, we propose a computationally efficient method for estimating the variance-covariance matrix of the regression coefficient estimates. Simulation results are presented to validate the finite sample properties of the proposed estimators and test statistics. The proposed method is illustrated with a real dataset from a clinical trial on breast cancer.



There are no comments yet.


page 1

page 2

page 3

page 4


Smoothed quantile regression for censored residual life

We consider a regression modeling of the quantiles of residual life, rem...

Quantile Regression Modeling of Recurrent Event Risk

Progression of chronic disease is often manifested by repeated occurrenc...

Estimation and Inference for Multi-Kink Quantile Regression

The Multi-Kink Quantile Regression (MKQR) model is an important tool for...

Homogeneity Test for Functional Data basedon Data-Depth Plots

One of the classic concerns in statistics is determining if two samples ...

Multi-Kink Quantile Regression for Longitudinal Data with Application to the Progesterone Data Analysis

Motivated by investigating the relationship between progesterone and the...

On an extension of the promotion time cure model

We consider the problem of estimating the distribution of time-to-event ...

Deep Learning for Quantile Regression: DeepQuantreg

The computational prediction algorithm of neural network, or deep learni...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Time-to-event data can be encountered in many research areas such as engineering, economics, medicine, and social sciences. Statistical methods to analyze time-to-event data mainly utilize cumulative information up to the time of analysis while there also exists a long history of statistical methods for residual information such as mean residual life (Csörgö and Csörgö, 1987), or life expectancy (Deevey, 1947). On the other hand, inactivity time (Nanda et al., 2003; Li and Lu, 2003), also known as reversed residual life, has recently emerged as a new summary measure for cumulative information inherent in censored time-to-event data under the name of lost lifespan or life lost specifically for mortality data (Balmert and Jeong, 2016). Earlier Andersen (2013) defined “years lost” as a subtraction of the restricted mean lifetime from a prespecified time point under competing risks and extended it to a regression setting. The concept of inactivity time can be broadly applied to many research areas that involve time-to-event data such as survival analysis, reliability, and engineering. However, no systematic modeling approach exists to infer the quantile inactivity time under censoring, adjusting for confounding factors, in the literature.

When the primary outcome for a study is time to an event of interest, the popular hazard rates or survival probabilities may be compared between groups based on the cumulative information, say, up to year 10. On the other hand, the distributions of the remaining years beyond year 10 may be also compared, which might be an alternative summary measure with more intuitive and straightforward interpretation, but they can be heavily influenced by censored observations toward the tail of the distribution. The inactivity time is defined as the time lost due to an event that has occurred before a given time point (see Figure 1). A clear distinction between residual lifetime and inactivity time is that the subgroup of subjects targeted by the inactivity time analysis consists of subjects who do not survive up to time , while the subgroup addressed by the residual lifetime only consists of subjects who survive beyond time . Therefore, major advantages of using the inactivity time, or life lost, in survival analysis would be that (i) it is a new way of summarizing time-to-event data in terms of lifetime lost rather than using the hazard function, survival probability and its inverse as quantiles, or residual life and (ii) provides straightforward interpretation as it has a time dimension like days, weeks, or years lost before a given time point rather than more mathematical quantities like the hazard function defined as the limiting conditional probability of instantaneous failure rate or other limiting concepts of probability.

The quantity of life lost may be carefully interpreted in two ways; (i) for data analysis and (ii) for prediction. First, for the purpose of data analysis, suppose a clinical trial on a disease was performed, and data were collected on various patient characteristics, together with treatment group, time to an event of interest, and event status for a study period. To analyze this type of observed data set with some data points being right censored, the proposed quantile regression can provide a panoramic view of the treatment effect on years lost due to the event of interest, adjusted for some confounding factors, as the conditioning time progresses. In case of prediction, however, more care is needed since we are conditioning on a future time point of . For example, it can be stated carefully such as “If a patient fails within years after diagnosis of a disease, the median of the distribution of years lost would be years, so that if the patient gets treated with this medicine, it would decrease the years lost by years". Of course, the value of can vary for different practical scenarios, or a careful residual life analysis can be used in parallel to estimate a minimum value of as the estimated median residual lifetime of that particular patient.

Another potential application of the proposed model might be to epigenetic data from studies on biological age where subjects are treated with medications and followed until the end of the study period, at which time subjects’ genomes are measured to evaluate how many days, weeks, and years were reversed in biological clock (Fahy et al., 2019). This type of data would be different from the usual survival data in that events are not occurring as time progresses, but identified at the end of follow-up period as reversed biological clock. To maintain the feature of survival data, there could still be censored observations due to lost to follow-up, in which case the only available information would be that the length of reversed biological age would not have reached the observed censoring time point from the last follow-up. Under the setting of the proposed model, time-to-event can be defined as time from study entry to the time point where the age reverse has reached, and inactivity time can be renamed as reversed lifetime in this case.

Figure 1: Description of inactivity time (dashed lines with left arrows) and residual lifetime at a fixed time point

Quantile regression, originally developed by Koenker and Basset (1978), is a well-studied extension of the linear regression (Portnoy and Koenker, 1997). Methods have been also established for time-to-event data in the presence of censoring (Ying et al., 1995; Lindgren, 1997; McKeague et al., 2001; Yin and Cai, 2005; Peng and Huang, 2008). More recently, covariate effects on residual life were examined under the parametric proportional hazards and accelerated life models (Rao et al., 1992), and Bayesian modeling was also considered on the median residual life (Gelfand and Kottas, 2003). Jung et al. (2009) developed a method of the quantile regression on residual life, which has been extended to cause-specific quantile residual life regression (Lim and Jeong, 2015), and more recently to methods allowing for dynamic predictions (Li et al., 2016). In this paper, we propose a regression method on the quantiles of the distributions of the inactivity time adjusting for potential confounding factors under right censoring.

In Section 2, we define the quantile inactivity time function and provide the notation to be used throughout the paper. In Section 3, proposed are an estimating equation, variance estimator, and test statistics for the regression parameters. In Section 4, the proposed method is assessed via simulation studies, which is applied to a breast cancer dataset in Section 5. Finally, we provide concluding remarks in Section 6.

2 Quantile Inactivity Time Function

Throughout the paper, and will denote the potential event time and censoring time for the subject with survival functions of and

, respectively. The random variable

will represent the observed survival time as the minimum of and , and will be an event indicator if . The censoring distribution to be used in our estimating equation later will be estimated by the Kaplan-Meier estimator (Kaplan and Meier, 1958) denoted . We will assume independence between and .

The inactivity time, defined specifically as lost lifespan for mortality data in Balmert and Jeong (2016), considers the time lost due to an event occurring prior to a specified time point, . Here can be chosen such that the cumulative information up to can be statistically meaningful in terms of number of events as well as clinically meaningful in terms of milestones during the disease treatment period such as 5-, 10-, or, 15-year cancer-free survival as in the routine time-to-event analysis.

Let us define the -percentile of the inactivity time distribution as

Then satisfies or equivalently

which can be rewritten in terms of the survival function as

Here given observed data and , can be nonparametrically estimated after replacing with its consistent estimator (Balmert and Jeong, 2017). Practically, in clinical intervention studies, researchers might be interested in knowing what would be a robust measure of the center of the distribution of life years the patients lost due to their deaths given the data up to 5 years. These measures can be also compared among intervention groups to infer the intervention effect of a study drug with or without adjusting for confounding factors. The purpose of this paper is to propose the following log-linear quantile regression model for inactivity time to :



is a vector of the regression coefficients,

, and is a vector of covariates for the individual, . Here the regression parameter can be interpreted as the difference of the two quantile inactivity times on a log-scale when the corresponding covariate is binary. For a continuous covariate, it can be interpreted as an increment or decrement of the quantile inactivity time on a log-scale when the associated covariate increases by one unit.

3 Estimation and Inference

Since model (1) implies

we have

Assuming conditional independence between and given and the independence between and , which often occurs under administrative censoring in randomized clinical trials, it holds that


Note that the independence assumption between and can be relaxed so that can be replaced by through some additional regression modeling of given . Similarly we have


Now that, given observed data, the events and are equivalent to and respectively, equations (2) and (3) imply

Therefore the regression parameter can be estimated from the following equation under right censoring:


where is the Kaplan-Meier estimate (Kaplan-Meier, 1958) of the censoring distribution based on the observed data (, assuming and are independent. Note that the equation (4) is the estimating equation for the weighted quantile regression with the weight function of . More specifically, an individual term in the estimating equation (4) takes the form of , where and , which is the first derivative of the check function, i.e.

. The check function can be minimized by using a linear programming simplex-based method such as Barrodale-Roberts algorithm (Barrodale and Roberts, 1973), which was implemented as the default in the function

rq() with the weight option in the R package quantreg.

Suppose is the true value in the interior of a bounded convex region. Define


where denotes all observed data from the subject and

where is the limiting value of the risk process for all subjects and is the martingale process of the censoring time for the subject. Also define where

and . Then the following theorem states the uniform consistency of and asymptotic normalities of the estimating equation (4) and .

Theorem.  Under the regularity conditions that (i) there exists such that and , (ii) (Peng and Fine, 2009) and with probability 1, and (ii) the conditional probability density function of given is uniformly bounded,

  1. , a.s. as .

  2. follows a zero-mean Gaussian process in , with the variance-covariance matrix of where

  3. weakly converges to a Gaussian process with the variance-covariance matrix of


where is a different value of .

Assumptions in (i) are often satisfied in the presence of adminstrative censoring. In the general case, can be truncated by , with being chosen as a constant slightly smaller than the observed upper bound of ’s support, in which case those assumptions hold. As long as is only slightly smaller than , we expect truncating would incur very minimal information loss.

The form of the variance-covariance matrix in (5) is different from those of the standard likelihood theory or the Cox’s partial likelihood approach, where the inverse of the negative Hessian matrix is the corresponding variance-covariance matrix.

Under the regularity conditions (i)-(iii), we prove the consistency of in Appendix A, and establish the asymptotic normalities of the estimating equation and the proposed estimator in Appendix B.

Under the null hypothesis of

, a test statistic for the global test can be constructed based on the asymptotic distribution of the estimating function in (4) as

which approximately follows a -distribution with degrees of freedom, where is the number of covariates. However, a test statistic for a subset of , e.g. , would also include the remaining parameters not being tested as nuisance parameters. A variation of the minimum dispersion statistic (Basawa and Koul, 1988) can be adopted to eliminate the nuisance parameters, but the computational burden could be enormously heavy especially when a large number of regression coefficients are included in the model.

For this reason and also to avoid estimation of the probability density function of under censoring, we have employed a perturbation method (Jin et al., 2001) to estimate the limiting distribution of

, from which confidence intervals could be obtained using the normal approximation of

. Specifically, the weight function in the estimating equation (4

) was perturbed by a set of independent random variates from the unit exponential distribution, i.e.

, and the regression parameters were estimated from

where is obtained from perturbing the indicator functions for the risk sets and the event indicators in by the same exponential variates. Given data, the random variates were repetatively generated and a large number of realizations of were obtained. Following the arguments of Jin et al. (2001), we can show that the conditional distribution of given the observed data is asymptotically equivalent to the unconditional distribution of as a process of . For fixed , the variance-covariance matrix of can be estimated by the sample variance-covariance matrix of ’s, which can be used to infer an individual or a subset of the regression coefficients.

4 Simulation Studies

Several simulation studies were performed to assess the performance of the proposed estimators and test statistics with finite samples. We generated data from a parametric proportional hazards model (Cox, 1972) with a Weibull distribution as the baseline distribution and one group indicator as a covariate. Thus, the true survival function follows


where the Weibull parameters and are set to be 0.2 and 2, respectively, througout the simulation studies and is the regression parameter associated with the group indicator ( for the control and for an intervention). Under the parametric Cox model (6), the true median inactivity time equals


Potential censoring times

were generated from a uniform distribution on

, where and were chosen to render the desired censoring proportions. Observed survival times were then determined as the minimum of potential failure times and potential censoring times, i.e. .

First, we evaluate the estimation performance of our proposed method. The true values of in (7) when would be the same for both control and intervention groups as 10.8, 9.8, 8.8, and 7.8 at = 15, 14, 13, and 12, respectively. Let us consider a simple log-linear median regression model for inactivity time,


where is a binary covariate indicating intervention group or control group , and and are the intercept and a regression coefficient associated with , respectively. Following the invariance property of the log-transformation, the model is equivalent to

implying that and can be interpreted as the median inactivity time in the control group and in the intervention group, respectively. Thus, the difference in median inactivity times between two groups is given by , and the ratio of two inactivity times by , so that testing a null hypothesis of will be equivalent to testing whether the ratio of two median inactivity times equals 1.

In order to evaluate our parameter estimates, we compare to 0 and to the logarithm of the true median inactivity time from (7) under . At time point 15, for example, the true median inactivity time of 10.8 corresponds to = 2.38 and = 0 under the simple log-linear regression model (8). As described in Section 3, we estimated the regression coefficients using the rq() function with the weight function of . Then, we used the perturbation method to estimate the variance-covariance matrix of and construct confidence intervals for using the normal approximation. Four hundred (400) perturbations were implemented for each simulation.

Table 1 displays the results based on 1000 simulations with 200 observations per group. The bias and standard deviation of the parameter estimates were used to evaluate the empirical distribution of

and given various ’s (15, 14, 13, and 12) and censoring proportions (10%, 20%, and 30%). For each simulation, the SE’s for the parameter estimates were calculalted from 400 perturbations, which were used to construct confidence intervals for the true parameters. The average of those 1,000 SE’s are presented under the column of “ASE". One can notice that the biases are minimal under all scenarios, and the ASE’s are overall close to SD’s. Table 1 also presents the median inactivity time estimates for control and intervention groups. As the censoring proportion increases, the differences between parameter estimates and their true values slightly increase. The empirical standard deviations also inflate as the censoring proportion increases and as decreases.

Bias() SD() ASE() Bias() SD() ASE()
15 10 0.0005 0.0291 0.0296 0.0001 0.0428 0.0438 10.843 10.844
20 0.0005 0.0314 0.0312 0.0004 0.0451 0.0466 10.843 10.847
30 -0.0012 0.0332 0.0340 0.0011 0.0494 0.0500 10.825 10.837
14 10 0.0015 0.0328 0.0330 -0.0019 0.0466 0.0482 9.853 9.834
20 0.0007 0.0350 0.0347 0.0006 0.0513 0.0514 9.845 9.851
30 -0.0004 0.0365 0.0370 0.0002 0.0561 0.0552 9.835 9.836
13 10 -0.0004 0.0351 0.0361 -0.0014 0.0519 0.0530 8.837 8.825
20 -0.0013 0.0359 0.0389 -0.0007 0.0532 0.0577 8.829 8.823
30 -0.0021 0.0412 0.0414 0.0004 0.0602 0.0612 8.822 8.826
12 10 -0.0004 0.0391 0.0406 -0.0010 0.0567 0.0602 7.843 7.836
20 -0.0033 0.0427 0.0431 0.0017 0.0625 0.0646 7.821 7.834
30 -0.0003 0.0455 0.0467 -0.0005 0.0659 0.0691 7.844 7.840
Table 1: Bias and standard deviation of the empirical estimates of true regression parameters = 2.38, 2.29, 2.18, and 2.06 and =0 at = 15, 14, 13, and 12; , estimated median inactivity time in control group; , estimated median inactivity time in intervention group; c%, censoring proportion

We then assessed the proposed test statistic in terms of rejection probabilities of the null hypothesis of at a two-sided significance level of 0.05 for different values of , given various ’s (15, 14, 13, and 12), censoring proportions (10%, 20%, and 30%), and sample sizes (100 and 200). The rejection probability was calculated as the mean, over the 1,000 simulations, of the proportions that 95% confidence intervals from 400 perturbations do not include the null value of . Therefore, the column under

in Table 2 displays type I error probability for testing the null hypothesis of

. For power analysis, we have generated data under the parametric proportional hazards model in (6) by increasing the value of to induce differences between control and intervention groups. We set the true coefficient , and in (7), which is equivalent to increasing the differences in median inactivity time between control and intervention by 1, 2, and 3. The results are displayed in Table 2. Empirical type I error probabilities are generally close to 0.05 regardless of different censoring proportions or sample sizes. Power decreases as decreases since less observations are included in the analysis, and increases as decreases, indicating a greater power to detect a larger difference between groups. Power decreases slightly as the censoring proportion increases, but we still have reasonable power to detect small absolute differences under heavy censoring with a smaller sample size of 100. Power also increases as sample size increases, as expected.

0.0 -0.44 -0.82 -1.18 0.0 -0.44 -0.82 -1.18 0.0 -0.44 -0.82 -1.18
15 10 0.041 0.363 0.823 0.969 0.052 0.676 0.994 1.000 0.051 0.999 1.000 1.000
20 0.036 0.287 0.701 0.940 0.043 0.559 0.950 0.999 0.038 0.997 1.000 1.000
30 0.045 0.210 0.614 0.854 0.046 0.417 0.842 0.999 0.055 0.981 1.000 1.000
14 10 0.044 0.351 0.804 0.946 0.055 0.670 0.992 1.000 0.054 0.999 1.000 1.000
20 0.040 0.303 0.686 0.923 0.036 0.589 0.936 1.000 0.049 0.999 1.000 1.000
30 0.038 0.219 0.599 0.821 0.039 0.372 0.827 0.99 0.041 0.981 1.000 1.000
13 10 0.048 0.357 0.792 0.936 0.046 0.639 0.981 0.999 0.050 0.999 1.000 1.000
20 0.041 0.262 0.662 0.885 0.034 0.580 0.926 0.998 0.041 0.997 1.000 1.000
30 0.043 0.177 0.567 0.783 0.041 0.398 0.801 0.985 0.041 0.977 1.000 1.000
12 10 0.032 0.340 0.749 0.888 0.044 0.597 0.977 1.000 0.047 0.999 1.000 1.000
20 0.039 0.294 0.597 0.830 0.034 0.536 0.886 0.992 0.043 0.977 1.000 1.000
30 0.044 0.174 0.523 0.737 0.039 0.367 0.766 0.977 0.051 0.967 1.000 1.000
Table 2: Empirical rejection rates for values of

5 Application

In this section, we apply the proposed estimation procedure and test-statistic to a real dataset from a clinical trial on breast cancer, i.e. NSABP (National Surgical Adjuvant Breast and Bowel Project) B-04 dataset (Fisher et al. 2002), which contains survival information on 1,665 breast cancer patients. The primary outcome of interest in this analysis is time to death. In addition to follow-up information, surgery type, and nodal status, the dataset also contains other covariates including age at diagnosis and pathological tumor size. In our analysis, we consider the following covariates: nodal status as a binary covariate with 0 for node-negative and 1 for node-postive, and both age at diagnosis and pathological tumor size as continuous covariates. There were 1,079 node-negative women and 586 node-positive women. Age at diagnosis ranged from 20 to 87 years with the mean of 55.4, and pathological tumor size ranged from 0 to 250mm with the mean of 34.1mm. Additionally, the median follow-up was 26 years with the overall censoring proportion of 23%. In the models, the continuous covariates were multiplied by 0.01, for computational convenience. In our analysis, the main interest is how many more years the node-positive patients are expected to lose compared to the node-negative patients at various time points after surgery, adjusted for age at diagnosis and tumor size. In this particular cancer mortality dataset, the inactivity time, specifically referred to as lost lifespan in this section, is defined as the number of years lost due to death following a surgery. Our goal is to infer the effects of covariates on the median (or a quantile) of the lost lifespan distribution, and predict the median lost lifespan adjusting for significant covariate effects.

First, we used the proposed method to evaluate the significance of nodal status in the univariate log-linear quartile regression model (

1) () that only includes nodal status as a covariate. The test statistic was calculated at 3 time points ( = 15, 20, and 25 years after surgery). Table 3 summarizes the results, including the parameter estimates for the intercept and for the effect of nodal status, and their 95% confidence intervals calculated from the perturbation method. Significance of the nodal status parameter was indicated by a 95% confidence interval not containing 0. Note that regardless of different time points specified, the quartile lost lifespans were significantly different between the two nodal groups. The node positive group had consistently longer quartile lost lifespans across all time points indicating worse prognosis in survival. The difference between nodal status groups also increased as time point increased or decreased. The results from the simple log-linear median () regression model presented here are also consistent with the ones from the two-sample test statistic proposed in Balmert and Jeong (2016).

= 15 = 20 = 25
0.25 1.71 (1.60, 1.83) 2.03 (1.90, 2.16) 2.23 (2.07, 2.38)
0.30 (0.15, 0.45) 0.33 (0.16, 0.51) 0.42 (0.22, 0.63)
0.50 2.25 (2.20, 2.30) 2.58 (2.54, 2.63) 2.81 (2.76, 2.87)
0.12 (0.04, 0.20) 0.13 (0.06, 0.19) 0.15 (0.08, 0.23)
0.75 2.50 (2.47, 2.52) 2.81 (2.79, 2.83) 3.05 (3.03, 3.07)
0.07 (0.04, 0.11) 0.07 (0.04, 0.10) 0.07 (0.05, 0.10)
Table 3: Parameter estimates and 95% confidence intervals from the univariate log-linear quartile () regression models

Now we extend our analysis to a log-linear quartile regression model containing nodal status, age at diagnosis, and pathological tumor size as covariates. Using similar notations as before, let and

denote the effects of additional covariates, age at diagnosis and pathological tumor size, respectively. Each covariate was tested separately for its significance using the confidence interval approach as previously described. The parameter estimates and corresponding 95% confidence intervals are shown in Table 4. Except the median regression analysis at

, the nodal status remained statistically significant in all the other multivariate models. Additionally, the difference between node-negative and node-positive groups increased as increased, similarly to the results from the simple log-linear median regression models, except the 3 quartile () regression model. Age at diagnosis was mostly significant except the 1 quartile () regression model at while pathological tumor size was consistently significant in all models. The proposed regression model allows for predicting a patient’s median lost lifespan for a given time point based on significantly important factors, i.e. nodal status and age at diagnosis. For example, a 30-year old woman with positive lymph nodes and tumor size of 50mm is expected to have a median lost lifespan of 17.6 years ( at 20 years after diagnosis. In comparison, a 30-year old patient with negative lymph nodes and tumor size of 50mm is expected to have a median lost lifespan of 16.3 years at 20 years after diagnosis.

= 15 = 20 = 25
0.25 2.29 (1.87, 2.71) 2.87 (2.15, 3.23) 2.51 (1.95, 3.06)
0.24 (0.09, 0.38) 0.25 (0.09, 0.41) 0.37 (0.15, 0.58)
-1.31 (-2.02, -0.60) -1.55 (-2.12, -0.99) -0.68 (-1.44, 0.08)
0.47 (0.19, 0.74) 0.29 (-0.09, 0.67) 0.36 (-0.04, 0.76)
0.50 2.57 (2.40, 2.74) 2.86 (2.69, 3.02) 3.00 (2.87, 3.12)
0.06 (-0.01, 0.13) 0.08 (0.01, 0.15) 0.12 (0.07, 0.18)
-0.70 (-0.94, -0.46) -0.62 (-0.85, -0.40) -0.45 (-0.64, -0.25)
0.24 (0.08, 0.39) 0.24 (0.09, 0.39) 0.24 (0.12, 0.36)
0.75 2.55 (2.49, 2.62) 2.87 (2.81, 2.93) 3.11 (3.06, 3.17)
0.05 (0.02, 0.08) 0.05 (0.02, 0.08) 0.05 (0.02, 0.08)
-0.20 (-0.31, -0.09) -0.18 (-0.27, -0.08) -0.17 (-0.26, -0.08)
0.16 (0.09, 0.22) 0.13 (0.07, 0.19) 0.12 (0.05, 0.19)
Table 4: Parameter estimates and corresponding 95% confidence intervals from the multivariate log-linear quartile regression models () using the proposed perturbation method

6 Conclusions

The inactivity time, or lost lifespan specifically for mortality data, is a simple summary measure for time-to-event data that provides more straightforward interpretation yet is less sensitive to right ensoring compared to residual life. In this paper, we proposed a new regression method for analyzing covariate effects on the quantiles of the distribution of inactivity time. Asymptotic properties were derived for the regression parameter estimators and test statistics. Simulation studies validated the estimation and inference procedure under various scenarios, and the proposed method was illustrated with an application to a breast cancer dataset. The proposed model does not have strong assumption like proportional hazards, and provides a new and sensible perspective to understand treatment effects or covariate effects, so that it can be a useful alternative in survival modeling.

Even if a direct comparison between the proposed model and the popular Cox’s proportional hazards model would not be fair due to different model assumptions and simply because they are different summary measures of time-to-event data, both approaches could be useful for clinicians from different perspectives to communicate intervention options to patients. Another candidate model to be compared would be accelerated failure time (AFT) model (Kalbfleisch and Prentice, 2002), which is a log-linear model in failure time that is also different from our proposed model in this paper in both model assumptions and definitions of the summary measure. Possible extensions of the proposed model would be to include time-dependent covariates, competing risks, and random effects, which will merit future research.


Dr. Li’s research was supported in part by NIH grant 1R01DK117209. Dr. Peng’s research was supported in part by NIH grant R01HL-113548. Dr. Jeong’s research was supported in part by National Institute of Health (NIH) grant 5-U10-CA69651-11.

Appendix A: Consistency of

We start by defining

which is equivalent to


since the events and are equivalent to and , respectively, as introduced following the equation (3) in Section 3. When is replaced with , the true value in the interior of a bounded convex region D, the above equation reduces to 0 approximately. Following Csörgö and Horváth (1983), we know that for all ,

where is a constant satisfying and , with probability 1. This can be used to show that for ,



it follows that


Using the nonexistent mean value theorem (MEMVT) for vector-valued function (Feng et al., 2013) around and letting be some point between and , we have


where being the probability density function of , and hence is nonpositive definite. From the definition of , we know , and so by (10) will converge to 0, almost surely, as . Also with from (9) and being negative definite, equation (11) gives , a.s. as .

Appendix B: Asymptotic Normality of and

Recall that we have the estimating equation

Also define

We first derive the limiting distribution of , where


The first item, , clearly follows a zero-mean Gaussian process, because of the fact that is Donsker, where

The Donsker’s property holds because the class of indicator functions is Donsker, and due to the preservation properties of the Donsker’s class (Section 9.4, Kosorok, 2008).

Next, let us define , , , , , and is the martingale process of the censoring time for the subject. From Pepe (1991), we have

where denotes the influence function of and has been shown to be Donsker (Peng and Fine, 2009). It immediately follows that