A novel calibration framework for survival analysis when a binary covariate is measured at sparse time points

by Daniel Nevo, et al.

The goals of clinical and cohort studies often include evaluating the association of a time-dependent binary treatment or exposure with a survival outcome. Recently, several impactful studies have targeted the association between aspirin-taking and survival following colorectal cancer diagnosis. Because patients stop taking aspirin in preparation for surgery, the aspirin-taking covariate equals zero at baseline and may change to one at some later time point. Estimating this association is complicated by having only intermittent measurements of aspirin-taking. Naive, commonly used methods can lead to substantial bias. We present a class of calibration models for the distribution of the time of status change of the binary covariate. Estimates obtained from these models are then incorporated into the proportional hazards partial likelihood in a natural way. We develop nonparametric, semiparametric and parametric calibration models, derive asymptotic theory for the methods, and implement them in the aspirin and colorectal cancer study. Our methodology allows the inclusion of additional baseline variables in the calibration models for the status change time of the binary covariate. We further develop a risk-set calibration approach that is more useful in settings where the association between the binary covariate and survival is strong.






1 Introduction

One benefit of the Cox proportional hazards (PH) model for the analysis of time-to-event data (Cox, 1972) is the simplicity of including time-dependent covariates while preserving desirable theoretical properties. Classical methods assume that time-dependent covariates are measured continuously. However, in practice, they are often measured intermittently, leading to bias in effect estimates if treated naively (Andersen and Liestøl, 2003; Cao et al., 2015). We consider a time-dependent binary covariate having value zero at baseline that may change its value to one at some point; once a change has occurred, the covariate retains this value for the rest of the follow-up time. Real-life scenarios of this nature are widespread, including the onset of an irreversible medical condition (e.g., HIV infection; Langohr et al., 2004) or a treatment with a constant effect that is administered at a different time for each patient (Austin, 2012). For example, Goggins et al. (1999) described data arising from a clinical trial in which the goal was to study the effect of Cytomegalovirus (CMV) shedding on the risk of developing active CMV disease.

In the problems motivating this paper, the researchers were interested in the association between regularly taking aspirin, henceforth simply referred to as aspirin-taking, and survival time following the diagnosis of colorectal cancer (CRC). A series of studies published in leading clinical journals (Chan et al., 2009; Liao et al., 2012; Hamada et al., 2017) provided evidence of a differential association between post-diagnosis aspirin-taking and survival time according to tumor subtype classification, a baseline variable coded as categorical. The main dataset was obtained from two cohort studies: the Nurses' Health Study (NHS) and the Health Professionals Follow-Up Study (HPFS). Following their enrollment in the studies, participants have been receiving questionnaires biennially and answering questions about lifestyle and other personal characteristics. Patients diagnosed with colorectal cancer who had been taking aspirin typically stop taking aspirin at the time of diagnosis as part of their preparation for surgery. Post-diagnosis aspirin-taking is therefore a time-dependent binary covariate, and the data on it are incomplete. The time a participant started to take aspirin is known only to lie within the interval between the time of the last questionnaire reporting no aspirin-taking and the time of the first questionnaire reporting aspirin-taking. A popular strategy is to replace unknown values of the time-dependent covariate with the last available value. Another alternative is to impute the change time, the time of change in the covariate value from 0 to 1, using the middle point of the observed interval.

There is little existing literature relevant to this statistical challenge. For time-to-event data, Andersen and Liestøl (2003) studied the attenuation in effect estimates caused by infrequently measured covariates and, more recently, Cao et al. (2015) developed kernel-based weighted score methods. Goggins et al. (1999) considered the problem of an interval-censored change time of a binary covariate and proposed an EM algorithm to estimate the association between the binary covariate and a survival time. Langohr et al. (2004) developed a parametric log-linear survival model for the interval-censored change-time problem. None of these methods allows the use of covariates that are informative about the change time of the binary covariate.

In this paper, we propose a novel analysis framework for the problem of interval-censored change-time and suggest a two-stage approach. In the first stage, we fit a model for the status-changing time of the binary covariate. This model may be nonparametric, semiparametric or fully parametric, and may include baseline covariates that affect the change-time. For this first stage, we exploit existing methods, theory and efficient algorithms for interval-censored data (Sun, 2007; Turnbull, 1976; Finkelstein, 1986; Huang, 1996; Anderson-Bergman, 2016; Wang et al., 2016). In the second stage, we incorporate the first-stage calibration model into the main PH model. We construct a partial likelihood using the conditional hazard with respect to the available history, which, in the CRC and aspirin dataset, includes the timing of the last questionnaire, corresponding aspirin-taking status and baseline covariates.

Unlike existing methods, our flexible approach allows the inclusion of variables potentially related to the aspirin-taking status, utilizing the available data and subject-matter knowledge in a novel way. In addition, participants without data on post-diagnosis aspirin status, due to, e.g., death or censoring before submitting answers to the first post-diagnosis questionnaire, can be included in the analysis. Existing methods (e.g., Goggins et al., 1999) exclude these participants from the analysis.

Our goal in this paper is three-fold. First, we develop a conceptual framework for the analysis of time-to-event data under the common practice of infrequent updates of a binary covariate of interest. Second, we present a rigorous analysis of data arising from several impactful studies in the area of colorectal cancer. Third, we provide the R package ICcalib, which implements our flexible methodology under a wide range of models.

The rest of the paper is structured as follows. In Section 2, we describe the motivating studies. In Section 3, we define the main model, and in Section 4, we present our calibration and risk-set calibration methods. Section 5 contains asymptotic properties, with proofs deferred to the Appendix and the Supplementary Materials. In Section 6, we describe our simulation study. In Section 7, we present the analysis of the CRC data, and in Section 8, we reanalyze the CMV data of Goggins et al. (1999). We end with concluding remarks in Section 9.

2 Data description

The data were formed from two large cohorts: the NHS, in which 121,701 female nurses enrolled in 1976, and the HPFS, which began in 1986 with the enlistment of 51,529 men in various health professions. Descriptions of the studies, the data collection process and the eligibility conditions for inclusion in the CRC survival studies can be found in Chan et al. (2009), Liao et al. (2012) and Hamada et al. (2017).

Participants have been receiving questionnaires biennially. Within each questionnaire cycle, participants returned their answers at varying times. CRC researchers are interested in the first 5 or 10 years following diagnosis since, conditionally on 10-year survival, the survival rate among CRC patients is similar to that in the CRC-free population. In our analyses, we considered 10 years of follow-up. About 85% of stage 4 participants died within the first 5 years (163 of 190), and 108 of them (57%) had no data on aspirin-taking. Therefore, to avoid biasing or destabilizing the analysis results, we limited our analyses to the 1,371 participants with stage 1-3 CRC or missing stage data. Within the span of 10 years, 249 CRC-related deaths were observed among these participants. Table 1 presents basic descriptive statistics of the main variables. Data on post-diagnosis aspirin use were not available for 113 participants due to loss to follow-up or death before answering any post-diagnosis questionnaire. Our methodology allowed us to include them in the analysis.

                                         All Data     low CD274    PIK3CA       PTGS2
No. Participants (No. Events)            1371 (249)   278 (50)     171 (28)     672 (125)
Age at diagnosis: Mean (SD)              69.4 (9)     69.4 (8.97)  69.9 (9)     68 (8.73)
CRC Stage
  I                                      375 (27%)    74 (27%)     55 (32%)     177 (26%)
  II                                     453 (33%)    99 (36%)     62 (36%)     218 (32%)
  III                                    387 (28%)    74 (27%)     44 (26%)     199 (30%)
  Missing                                156 (11%)    31 (11%)     10 (6%)      78 (12%)
Pre-diagnosis Aspirin Status
  Taking                                 587 (43%)    118 (42%)    74 (43%)     272 (40%)
  Non-taking                             784 (57%)    160 (58%)    97 (57%)     400 (60%)
No. Available Questionnaires: Mean (SD)  3 (1.7)      3.1 (1.66)   3.2 (1.65)   3.3 (1.68)
No. Participants with no Questionnaires  113          21           11           47

Table 1: Summary of the main variables in the dataset

In the three CRC studies (Chan et al., 2009; Liao et al., 2012; Hamada et al., 2017), evidence was presented for a differential association of aspirin-taking with CRC mortality across the following molecular subtypes, by considering interaction terms between aspirin-taking status and these molecular subtypes. The three molecular subtypes were PTGS2 (cyclooxygenase-2) overexpression (Chan et al., 2009), PIK3CA mutation (Liao et al., 2012) and low CD274 (PD-L1) expression (Hamada et al., 2017). These subtype definitions are not exclusive; a tumor may be included in zero, one, two or three of these subtypes. The goal of our analyses was to evaluate the association between post-diagnosis aspirin and CRC-related mortality within the subpopulation of each tumor subtype. Therefore, each analysis included a different, although overlapping, subgroup of patients, with each subgroup defined by the molecular subtype for which evidence of an aspirin effect was previously reported. We also included an analysis of all tumors, regardless of tumor subtype classification.

The post-diagnosis time at which diagnosed participants started taking aspirin varied. Figure 1 presents estimated time-to-start-taking-aspirin curves for the entire sample (black, middle curve) and by pre-diagnosis aspirin-taking status (takers: red, bottom curve; non-takers: blue, top curve). The figure depicts the nonparametric maximum likelihood estimator (NPMLE) and the survival curve estimated under a Weibull distribution, both calculated from the interval-censored data on post-diagnosis aspirin-taking (Sun, 2007). With the exception of emergency surgeries, patients diagnosed with CRC were asked to stop taking aspirin in preparation for surgery, and were allowed to resume aspirin a few days after the surgery. It is likely that patients who had been taking aspirin prior to CRC diagnosis were more inclined to start taking aspirin once it was possible. This is consistent with the rapid drop right after baseline in the curve for pre-diagnosis aspirin-takers in Figure 1.

Figure 1: Estimated survival curves of time to starting aspirin for the entire sample (black, middle curve) and by pre-diagnosis (Predx) aspirin-taking status (takers: red, bottom curve; non-takers: blue, top curve). Solid lines are the NPMLE; dashed lines are survival curves obtained from a Weibull model fit.

3 The main model

Let $T$ be the time to the event of interest; in the CRC data, this is time to death following CRC diagnosis. $T$ is possibly right-censored by a censoring time $C$. We assume that, conditionally on measured baseline covariates, $T$ and $C$ are independent, and let $\tilde{T} = \min(T, C)$ and $\delta = I(T \le C)$ be the observed time and censoring indicator, respectively. Of main interest is the association of the monotone nondecreasing binary covariate $X(t)$ with $T$. For presentational simplicity, we henceforth refer to $X(t)$ as the exposure, although it can be a treatment or any other binary covariate. Let $V$ be the change time in the value of $X(t)$; that is, $V$ is the first time at which $X(t) = 1$. As described in the introduction, $X(0) = 0$ and $X(t) = 1$ for all $t \ge V$. In some applications, the exposure may change its value back to zero at some point after $V$. In studying aspirin-taking, this possibility is often neglected, to avoid misinterpretation of the results due to reverse causality, because deteriorating patients, who are more likely to die, may stop taking aspirin. In our study of aspirin-taking and survival among CRC patients, we can therefore refer to the exposure as ever aspirin-taking (since diagnosis).


Let $\mathbf{Z}$ be a column vector of covariates presumably associated with $T$. For simplicity of presentation, we will assume that $\mathbf{Z}$ is time-independent. The proposed methods can be straightforwardly applied to scenarios in which $\mathbf{Z}$ is time-dependent. The Cox PH model (Cox, 1972) for the hazard function of $T$ given $X(t)$ and $\mathbf{Z}$ is

$$\lambda(t \mid X(t), \mathbf{Z}) = \lambda_0(t) \exp\{\beta X(t) + \boldsymbol{\gamma}^{T} \mathbf{Z}\}, \qquad (1)$$

where $\lambda_0(t)$ is an unspecified baseline hazard function and $\beta$ and $\boldsymbol{\gamma}$ are parameters to be estimated. We are mainly interested in $\beta$. We refer to this PH model for $T$ as the main model. We assume that, given $\{X(t), \mathbf{Z}\}$, the measurement times of the exposure are not informative about the hazard at time $t$. The partial likelihood for $(\beta, \boldsymbol{\gamma})$ is

$$L(\beta, \boldsymbol{\gamma}) = \prod_{i=1}^{n} \left[ \frac{\exp\{\beta X_i(\tilde{T}_i) + \boldsymbol{\gamma}^{T} \mathbf{Z}_i\}}{\sum_{j=1}^{n} Y_j(\tilde{T}_i) \exp\{\beta X_j(\tilde{T}_i) + \boldsymbol{\gamma}^{T} \mathbf{Z}_j\}} \right]^{\delta_i}, \qquad (2)$$

where $Y_j(t) = I(\tilde{T}_j \ge t)$ is an at-risk indicator.

If $V$ were known for all participants, we could simply set the time-dependent exposure equal to zero for all $t < V$ and to one after that time, i.e., $X(t) = I(t \ge V)$. However, the data collected from questionnaires are often limited. The aspirin-taking status was measured only at a series of discrete time points $t_1 < t_2 < \dots < t_K$, with $K$ possibly differing between participants. If $X$ was not measured at any time point before $t$, the exposure status at $t$ is unknown. In the CRC data, these time points are the times at which questionnaires were answered by the participants. When the main event is terminal, as in our data, questionnaire data are available only until the event or censoring time; that is, $t_K \le \tilde{T}$. In other studies, e.g., the CMV data previously analyzed by Goggins et al. (1999), data on the exposure can be obtained even after the main event has occurred. The time points at which $X$ was measured are either fixed or random. We make the working assumption that their distribution is independent of the other random variables presented so far. This assumption would not hold in certain situations. The degree of deviation from this assumption, and its impact on the proposed methodology, will be discussed later in the paper.

Let $m(t)$ denote the last time at which $X$ was measured before time $t$, with $m(t) = 0$ if $X$ was not previously measured. If $X(m(t)) = 1$, then $X(t) = 1$. However, if $X(m(t)) = 0$, then $X(t)$ can be either one or zero. Therefore, the likelihood (2) cannot be calculated from the observed data. Two common methods for addressing this type of missing data are the last-value-carried-forward (LVCF) and midpoint imputation (MidI) methods.

The LVCF method imputes missing values as the last observed value. In our context, this is equivalent to setting $X(t) = X(m(t))$. That is, a participant is assumed not to be taking aspirin until the first time she reports aspirin-taking. The MidI method imputes the changepoint (i.e., $V$) at the middle of the interval in which the changepoint is known to have occurred. For example, if a participant reported that she was not taking aspirin 1 year after diagnosis, and then reported taking aspirin 2.5 years after diagnosis, then MidI would assume $V = 1.75$ for that participant, whereas LVCF would assume $V = 2.5$. These are ad hoc methods that may lead to substantially biased results. This calls for a new conceptual framework and methodology, which we now present.
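As an illustration of the two imputation rules (not code from the paper or the ICcalib package), the following Python sketch computes the LVCF status at a time point and the MidI change time from a participant's questionnaire records; the function names and data layout are hypothetical.

```python
# Illustrative sketch: imputing a binary exposure process from sparse
# questionnaire data by LVCF and midpoint imputation.
# Questionnaire records are (time, status) pairs with status in {0, 1}.

def lvcf_status(records, t):
    """Last-value-carried-forward: status at the last questionnaire before t."""
    status = 0  # exposure is zero at baseline by assumption
    for q_time, q_status in sorted(records):
        if q_time <= t:
            status = q_status
        else:
            break
    return status

def midpoint_change_time(records):
    """Midpoint imputation: place the 0 -> 1 change time in the middle of
    the interval in which it is known to have occurred (None if no change
    was observed)."""
    last_zero = 0.0  # baseline counts as a zero report
    for q_time, q_status in sorted(records):
        if q_status == 1:
            return (last_zero + q_time) / 2.0
        last_zero = q_time
    return None

# Example from the text: "no aspirin" at 1 year, "aspirin" at 2.5 years.
records = [(1.0, 0), (2.5, 1)]
assert lvcf_status(records, 2.0) == 0         # LVCF: still unexposed at t = 2
assert midpoint_change_time(records) == 1.75  # MidI: change imputed at 1.75
```

The two rules disagree for any time between the last zero report and the imputed midpoint, which is exactly where the bias of each method originates.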

4 A calibration approach

Let $\mathcal{H}(t) = \{m(t), X(m(t)), \mathbf{W}, T \ge t\}$ be the history of an at-risk participant until time $t$, where $\mathbf{W}$ is a possibly vector-valued set of covariates informative about $V$, and the event $T \ge t$ reflects the fact that the participant has been event-free so far. In our application, $\mathbf{W}$ includes, among other covariates, the pre-diagnosis aspirin-use status, which is clearly informative about $V$, as previously demonstrated in Figure 1. If a covariate affects both $T$ and $V$, it is included in both $\mathbf{Z}$ and $\mathbf{W}$.

Under model (1), the hazard function conditional on $\mathcal{H}(t)$ and $\mathbf{Z}$ is

$$\lambda(t \mid \mathcal{H}(t), \mathbf{Z}) = \lambda_0(t) E[\exp\{\beta X(t)\} \mid \mathcal{H}(t)] \exp(\boldsymbol{\gamma}^{T} \mathbf{Z}),$$

following the assumption that, conditionally on $X(t)$ and $\mathbf{Z}$, the probability of an event at time $t$ is independent of $m(t)$, $X(m(t))$, and $\mathbf{W}$.

A similar derivation was presented for the case of measurement error in a time-dependent continuous covariate (Prentice, 1982), and is exploited by the "regression calibration" method (Wang et al., 1997; Ye et al., 2008; Liao et al., 2011). It is readily seen that the conditional hazard above also defines a PH model (Prentice, 1982), and therefore we can consider the partial likelihood

$$L_C(\beta, \boldsymbol{\gamma}) = \prod_{i=1}^{n} \left[ \frac{E[\exp\{\beta X_i(\tilde{T}_i)\} \mid \mathcal{H}_i(\tilde{T}_i)] \exp(\boldsymbol{\gamma}^{T} \mathbf{Z}_i)}{\sum_{j=1}^{n} Y_j(\tilde{T}_i) E[\exp\{\beta X_j(\tilde{T}_i)\} \mid \mathcal{H}_j(\tilde{T}_i)] \exp(\boldsymbol{\gamma}^{T} \mathbf{Z}_j)} \right]^{\delta_i}. \qquad (3)$$
Unlike the regression calibration method, the fact that $X(t)$ is binary allows us to explicitly write the expectation in (3) as

$$E[\exp\{\beta X(t)\} \mid \mathcal{H}(t)] = \exp(\beta) \Pr(X(t) = 1 \mid \mathcal{H}(t)) + \Pr(X(t) = 0 \mid \mathcal{H}(t)), \qquad (4)$$

where $\Pr(X(t) = 1 \mid \mathcal{H}(t))$ can be expressed using the distribution of $V$, the time of exposure status change,

$$\Pr(X(t) = 1 \mid \mathcal{H}(t)) = \begin{cases} 1, & X(m(t)) = 1, \\ \Pr(V \le t \mid V > m(t), \mathbf{W}, T \ge t), & X(m(t)) = 0. \end{cases} \qquad (5)$$
In words, the probability of a positive aspirin-taking status at time $t$, conditionally on the personal history, equals one if the participant previously reported that she was taking aspirin; if she did not previously report aspirin-taking, it equals the probability of a change in aspirin-taking status between the time of the last questionnaire and time $t$, conditionally on no aspirin-taking at the time of the last questionnaire. Combining (4) and (5), it is evident that $L_C$ is a functional of the distribution of $V$ conditionally on $\mathbf{W}$ and survival status. If the distribution of $V$ were known, we could obtain valid estimates of $\beta$ and $\boldsymbol{\gamma}$ by substituting (4) and (5) into (3). However, the distribution of $V$ is unknown.

Let $\mathcal{H}^{\star}(t)$ be $\mathcal{H}(t)$ without the survival information, i.e., $\mathcal{H}^{\star}(t) = \{m(t), X(m(t)), \mathbf{W}\}$. Our first proposal is to estimate $\beta$ and $\boldsymbol{\gamma}$ by applying two modifications to $L_C$. First, we replace $\mathcal{H}(t)$ with $\mathcal{H}^{\star}(t)$ in the expectations in (3) to get $L_C^{\star}$. The second modification involves replacing expectations of the form $E[\exp\{\beta X(t)\} \mid \mathcal{H}^{\star}(t)]$ by estimators $\widehat{E}[\exp\{\beta X(t)\} \mid \mathcal{H}^{\star}(t)]$. Our ordinary calibration (OC) estimator is the maximizer of

$$\widehat{L}_C^{\star}(\beta, \boldsymbol{\gamma}) = \prod_{i=1}^{n} \left[ \frac{\widehat{E}[\exp\{\beta X_i(\tilde{T}_i)\} \mid \mathcal{H}_i^{\star}(\tilde{T}_i)] \exp(\boldsymbol{\gamma}^{T} \mathbf{Z}_i)}{\sum_{j=1}^{n} Y_j(\tilde{T}_i) \widehat{E}[\exp\{\beta X_j(\tilde{T}_i)\} \mid \mathcal{H}_j^{\star}(\tilde{T}_i)] \exp(\boldsymbol{\gamma}^{T} \mathbf{Z}_j)} \right]^{\delta_i} \qquad (6)$$

with respect to $\beta$ and $\boldsymbol{\gamma}$. Noting the simplicity of this likelihood function, maximization can be done in a straightforward way, e.g., using the Newton-Raphson algorithm.

The expectation $E[\exp\{\beta X(t)\} \mid \mathcal{H}^{\star}(t)]$ can be expressed as a functional of the distribution of $V$, as in (4) and (5), with $\mathcal{H}(t)$ replaced by $\mathcal{H}^{\star}(t)$ and the conditioning event $T \ge t$ omitted in (5). Therefore, we first estimate the distribution of $V$, and then calculate $\widehat{E}[\exp\{\beta X(t)\} \mid \mathcal{H}^{\star}(t)]$ using the estimated distribution.
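The computation in (4) and (5) can be sketched as follows, assuming an already-estimated survival function for the change time; `S`, `prob_exposed` and `calibrated_factor` are illustrative names, not part of the paper's software.

```python
import math

# Sketch: given an estimated survival function S(t) = P(V > t) for the
# change time V, compute p(t) = P(X(t) = 1 | last report) as in (5), and
# then the multiplicative factor E[exp(beta * X(t)) | history] as in (4).
# S can be any callable survival function here.

def prob_exposed(S, t, t_last, x_last):
    """P(X(t) = 1 | X measured as x_last at time t_last)."""
    if x_last == 1:
        return 1.0               # exposure is monotone: once 1, always 1
    # P(t_last < V <= t | V > t_last) = (S(t_last) - S(t)) / S(t_last)
    return (S(t_last) - S(t)) / S(t_last)

def calibrated_factor(S, beta, t, t_last, x_last):
    """exp(beta) * P(X(t)=1 | history) + P(X(t)=0 | history), as in (4)."""
    p = prob_exposed(S, t, t_last, x_last)
    return math.exp(beta) * p + (1.0 - p)

# Toy example with an exponential change-time model, S(t) = exp(-t).
S = lambda t: math.exp(-t)
p = prob_exposed(S, 2.0, 1.0, 0)          # = 1 - e^{-1} ~ 0.632
assert abs(p - (1 - math.exp(-1))) < 1e-12
```

Note that under the null ($\beta = 0$) the factor equals one for every participant, which is one way to see why naive methods remain valid there.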

4.1 Calibration models fitted from interval-censored data

For each participant, the data available on $V$ include the previously mentioned measurement times $t_1, \dots, t_K$ and the corresponding measurements $X(t_1), \dots, X(t_K)$. These data can be summarized in the form of the interval into which $V$ is censored, denoted by $(L, R]$, where

$$L = \max\left(\{0\} \cup \{t_k : X(t_k) = 0\}\right), \qquad R = \begin{cases} \min\{t_k : X(t_k) = 1\}, & \text{if } X(t_K) = 1, \\ \infty, & \text{otherwise.} \end{cases}$$

If $L = 0$, $V$ is left-censored, and if $R = \infty$, $V$ is right-censored. Note that measurements of $X$ before $L$ or after $R$ do not add information about $V$.
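A minimal sketch of this summarization step, under the convention that a participant never observed exposed is right-censored ($R = \infty$) and one already exposed at the first questionnaire is left-censored ($L = 0$); the helper name is hypothetical.

```python
# Sketch: turning the questionnaire history of one participant into the
# interval (L, R] known to contain the change time V.

INF = float("inf")

def censoring_interval(records):
    """records: (time, status) pairs; returns (L, R) with V in (L, R]."""
    L = 0.0
    for q_time, q_status in sorted(records):
        if q_status == 1:
            return (L, q_time)   # change occurred in (L, q_time]
        L = q_time               # still unexposed at q_time
    return (L, INF)              # never observed exposed: right-censored

assert censoring_interval([(1.0, 0), (2.5, 1)]) == (1.0, 2.5)  # interval-censored
assert censoring_interval([(1.0, 0), (3.0, 0)]) == (3.0, INF)  # right-censored
assert censoring_interval([(0.8, 1)]) == (0.0, 0.8)            # left-censored
```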

We refer to the model for the distribution of $V$ as the calibration model. Let $S_V$ be the survival function of $V$. The general form of the likelihood of interval-censored time-to-event data, under the independent censoring assumption, is (Sun, 2007)

$$L_{cal} = \prod_{i=1}^{n} \left\{S_V(L_i) - S_V(R_i)\right\}. \qquad (7)$$
The NPMLE of $S_V$ has been studied as an extension of the Kaplan-Meier estimator to interval-censored data. Algorithms suggested for finding the NPMLE include the self-consistency EM algorithm (Turnbull, 1976), the iterative convex minorant (ICM) algorithm (Groeneboom and Wellner, 1992), and an EM-ICM hybrid algorithm (Wellner and Zhan, 1997). Consistency and the asymptotic distribution (at an $n^{1/3}$ rate) were proved by Groeneboom and Wellner (1992).
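Turnbull's self-consistency EM can be sketched as follows; for simplicity, the candidate support is a user-supplied grid rather than the exact Turnbull intervals, so this illustrates the iteration, not a full NPMLE implementation.

```python
# Sketch of Turnbull's (1976) self-consistency EM for interval-censored
# data, on a fixed candidate grid (a simplification for illustration).

def turnbull_em(intervals, grid, n_iter=500):
    """intervals: list of (L, R] pairs; grid: candidate support points.
    Returns probability masses p[j] = P(V = grid[j])."""
    m = len(intervals)
    # alpha[i][j] = 1 if grid[j] lies in (L_i, R_i]
    alpha = [[1.0 if L < g <= R else 0.0 for g in grid] for (L, R) in intervals]
    p = [1.0 / len(grid)] * len(grid)
    for _ in range(n_iter):
        new = [0.0] * len(grid)
        for a in alpha:
            # E-step: expected membership of subject i in each support point
            denom = sum(a[j] * p[j] for j in range(len(grid)))
            for j in range(len(grid)):
                new[j] += a[j] * p[j] / denom
        # M-step: average memberships over subjects
        p = [v / m for v in new]
    return p

intervals = [(0.0, 1.0), (1.0, 2.0), (0.0, 2.0)]
p = turnbull_em(intervals, grid=[1.0, 2.0])
assert abs(sum(p) - 1.0) < 1e-9
```

Each iteration redistributes every subject's unit mass over the support points compatible with its interval, then averages; a fixed point of this map is the self-consistent estimator.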

One can instead postulate a parametric or semiparametric model for the distribution of $V$. This is especially appealing when the distribution of $V$ is likely to depend on additional covariates, previously denoted by $\mathbf{W}$. In the CRC studies, pre-diagnosis aspirin-taking status is strongly associated with the time from CRC diagnosis to post-diagnosis aspirin-taking (Figure 1). Additional covariates include risk factors for cardiovascular and cerebrovascular events, because aspirin is often taken to reduce the risk of these events among high-risk patients; see Section 7 for further discussion. Therefore, while our framework allows for a variety of time-to-event models, we focus in this paper on a calibration model with covariates, and specifically a PH model of this nature.

Let $\boldsymbol{\theta}$ denote a vector of parameters characterizing the distribution of $V$. An estimator of $\boldsymbol{\theta}$ is obtained by maximizing the equivalent of (7) under a model for $S_V$ that possibly accommodates the covariates $\mathbf{W}$. See Chapter 2 of Sun (2007) for a discussion of parametric models for interval-censored time-to-event data. A more flexible alternative is a PH regression model with an unspecified baseline hazard function.

Finkelstein (1986) discussed PH models for interval-censored data and suggested discretization of the baseline hazard function. Algorithms for computation of the MLE were previously proposed (Huang, 1996; Huang and Wellner, 1997; Pan, 1999), and asymptotic theory was studied (Huang, 1996; Huang and Wellner, 1997). Cai and Betensky (2003) developed a related method for fitting PH models from interval-censored data using a linear spline model for the log-baseline hazard and maximized a penalized log-likelihood.

We adopt the recently developed framework of Wang et al. (2016), which uses flexible I-splines (Ramsay, 1988) for the cumulative baseline hazard function. Wang et al. (2016) further developed a fast EM algorithm, which we apply in our simulations and data analysis. Let $\Lambda_0(t)$ and $S_V(t \mid \mathbf{W})$ be the cumulative baseline hazard function and the survival function of $V$, respectively, which under the PH model are

$$\Lambda_0(t) = \sum_{\ell=1}^{k} \eta_\ell I_\ell(t), \qquad S_V(t \mid \mathbf{W}) = \exp\left\{-\Lambda_0(t) \exp(\boldsymbol{\theta}^{T} \mathbf{W})\right\}, \qquad (8)$$

where $\eta_\ell \ge 0$ for all $\ell$ are unknown parameters to be estimated and $I_\ell(t)$ are integrated spline basis functions, which are nondecreasing. The spline basis functions are calculated according to the user's specification of interior knots and a polynomial degree for the basis functions. The resulting $\Lambda_0$ is guaranteed to be monotone nondecreasing. See Ramsay (1988) and Wang et al. (2016) for further details and discussion of I-splines in general and for the PH model, respectively, including the important practical issues of choosing the number and location of the knots.
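The monotonicity argument behind (8) can be illustrated with a simple stand-in basis: any nonnegative combination of nondecreasing basis functions yields a nondecreasing cumulative hazard. The sketch below uses truncated-linear bases in place of genuine I-splines; the knots and coefficients are arbitrary illustrative values.

```python
import math

# Sketch of the cumulative-hazard representation in (8), with
# truncated-linear ("hinge") bases standing in for I-splines.

knots = [0.0, 1.0, 2.0]                  # hypothetical interior knots
basis = [lambda t, k=k: max(t - k, 0.0)  # each basis is nondecreasing in t
         for k in knots]

def cum_hazard(t, coefs):
    """Lambda_0(t) = sum_l coef_l * I_l(t); nonnegative coefs keep
    Lambda_0 monotone nondecreasing."""
    return sum(c * b(t) for c, b in zip(coefs, basis))

def survival(t, coefs, theta_w):
    """PH model: S_V(t | W) = exp(-Lambda_0(t) * exp(theta' W))."""
    return math.exp(-cum_hazard(t, coefs) * math.exp(theta_w))

coefs = [0.5, 0.3, 0.2]
assert cum_hazard(2.0, coefs) <= cum_hazard(3.0, coefs)   # monotone
assert 0.0 < survival(1.0, coefs, 0.0) <= 1.0             # valid survival value
```

The same structure carries over when the hinge functions are replaced by true integrated spline bases; only the shapes of $I_\ell$ change, not the constraint $\eta_\ell \ge 0$.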

To summarize this section, the calibration model is fitted by maximizing the likelihood (7) while substituting a model for $S_V$, chosen based on the available data and subject-matter knowledge. In our case, we use the PH model (8) for the time until aspirin-taking following a CRC diagnosis.

4.2 Risk-set calibration

The OC estimator may suffer from asymptotic bias. It is calculated as the maximizer of $\widehat{L}_C^{\star}$, while the partial likelihood is $L_C$. The degree of divergence between $\widehat{L}_C^{\star}$ and $L_C$ depends on how different $\mathcal{H}(t)$ and $\mathcal{H}^{\star}(t)$ are. Recall that $\mathcal{H}^{\star}(t)$ was defined by omitting the event $T \ge t$ from $\mathcal{H}(t)$. If the probability of this event is close to one, as in the case of rare events, the bias should be attenuated. If $X(t)$ has no effect on $T$, i.e., under the null, $T$ is independent of $X(t)$ and $V$. This implies that as the absolute value of $\beta_0$, the true value of $\beta$, increases, a larger bias may be expected.

Another source of bias stems from fitting the calibration model under the independent interval-censoring assumption, that is, the assumption that the censoring interval endpoints $L$ and $R$ are determined by a mechanism independent of $V$. However, in our studies, the time to event, $T$, is informative about the censoring in the calibration model. If, for example, aspirin-taking reduces the risk of death, the time of aspirin-taking status change is more likely to be censored in non-aspirin-taking patients. This may cause bias in the estimation of the distribution of $V$ when fitting the calibration model. As before, if the event is rare, the censoring of $V$ is most likely due to administrative reasons, and hence the independent interval-censoring assumption approximately holds. Furthermore, under the null, the censoring interval is independent of $T$. As before, a larger $|\beta_0|$ typically implies more substantial bias. We investigate this point in the simulation studies in Section 6 and the Supplementary Materials. In studies with a non-terminal event, the independent censoring assumption for the calibration model may be more plausible, because data on $X$ can be collected after the main event has occurred.

To reduce the potential bias, we propose a risk-set calibration (RSC) procedure, an adaptation of risk-set regression calibration, previously developed in the context of error-prone covariates in survival analysis (Xie et al., 2001; Ye et al., 2008; Liao et al., 2011). This method uses $\mathcal{H}(t)$, and estimates the distribution of $V$ by refitting the calibration model at each observed event time, using only the members of the risk set at that time, so that only participants with $\tilde{T}_i \ge t$ are used. Then, at each risk set, we plug the estimated distribution of $V$ into (5), leading to $\widehat{\Pr}(X(t) = 1 \mid \mathcal{H}(t))$, which is then substituted into (4) to obtain $\widehat{E}[\exp\{\beta X(t)\} \mid \mathcal{H}(t)]$.
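The refitting scheme can be sketched as a loop over event times; `fit_calibration` is a placeholder for any interval-censored fitting routine (the toy check below just counts the risk set).

```python
# Sketch of the risk-set calibration (RSC) loop: at each observed event
# time, refit the calibration model using only participants still at risk.

def risk_set_calibration(event_times, observed_times, intervals, fit_calibration):
    """Returns {event_time: calibration fit on the risk set at that time}."""
    fits = {}
    for t in sorted(set(event_times)):
        # risk set: participants whose observed time is >= t
        risk_idx = [i for i, y in enumerate(observed_times) if y >= t]
        fits[t] = fit_calibration([intervals[i] for i in risk_idx])
    return fits

# Toy check with a "fitter" that just counts its input.
fits = risk_set_calibration(
    event_times=[2.0, 4.0],
    observed_times=[2.0, 3.0, 4.0, 5.0],
    intervals=["a", "b", "c", "d"],
    fit_calibration=len,
)
assert fits == {2.0: 4, 4.0: 2}
```

The shrinking risk sets in this loop are exactly the source of the variance inflation discussed below: each successive fit uses fewer participants.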

The RSC estimator is expected to have less bias than OC, especially when $|\beta_0|$ is large (Xie et al., 2001). However, some asymptotic bias in the RSC estimator may be expected due to model misspecification: even if the PH model for the distribution of $V$ holds at baseline, it is not expected to hold within the risk sets at all later event times. The RSC estimator is also expected to have larger variance, due to the increased number of parameters and the decreasing (in $t$) sample size available for the risk-set calibration models. Therefore, it is advised to use this estimator when the $X$-$T$ association is strong and the sample size is large.

5 Asymptotic properties

We focus on the PH calibration model under the I-splines representation (8) for $\Lambda_0$ of Wang et al. (2016). The results can be straightforwardly extended to parametric models for $V$. Let $\psi = (\beta, \boldsymbol{\gamma}^{T})^{T}$, and recall that the OC estimator $\widehat{\psi}$ is obtained by maximizing $\widehat{L}_C^{\star}$, or alternatively, by solving $U(\psi) = 0$, where

$$U(\psi) = \sum_{i=1}^{n} \int_0^{\tau} \left\{a_i(t, \psi) - \bar{a}(t, \psi)\right\} dN_i(t),$$

with $\tau$ being the study end-time, $N_i(t)$ the counting process associated with $\tilde{T}_i$, and $a_i$ and $\bar{a}$ as defined in the Appendix. Under assumptions A-E in the Appendix, $\widehat{\psi}$ converges in probability to a well-defined limit $\psi^{\star}$. The proof is outlined in the Appendix. Furthermore, $\sqrt{n}(\widehat{\psi} - \psi^{\star})$ converges to a mean-zero normal distribution, and its covariance matrix can be estimated by a sandwich estimator

$$\widehat{\Sigma} = \widehat{A}^{-1} \widehat{B} \widehat{A}^{-1},$$

with $\widehat{A}$ and $\widehat{B}$ being sample versions of matrices given in the Appendix. The proof is outlined in the Appendix.

For the RSC estimator, results of a similar nature are given and proved in the Supplementary Materials. The main modifications in the results are that the RSC estimator converges to a limit possibly different from $\psi^{\star}$, and that $\boldsymbol{\theta}$ is replaced by a time-dependent parameter vector $\boldsymbol{\theta}(t)$.

6 Simulation study

We considered a PH calibration model in our main simulation study. In the Supplementary Materials, we describe simulation studies for parametric and nonparametric calibration models in absence of covariates for calibration.

For the main model, the hazard function of $T$ was $\lambda(t \mid X(t), \mathbf{Z}) = \lambda_0(t) \exp\{\beta X(t) + \gamma_1 Z_1 + \gamma_2 Z_2\}$, with the covariates $Z_1$ and $Z_2$ and the change time $V$ all independent of each other. We took a Gompertz baseline hazard function $\lambda_0$. For $\beta_0$, we considered the values $0$, $\log 2$, $\log 5$ and $\log 7$. We simulated time-to-event data with a time-dependent covariate as described in Austin (2012). We took exponential censoring (mean = 5) and additional administrative censoring at time 5. The resulting censoring rate varied between 42% and 63%, depending on the value of $\beta_0$.

For the calibration model, we used the setup of Wang et al. (2016), with a PH model for $V$ whose covariates are the same covariates $Z_1$ and $Z_2$ as in the main model. For each observation, $K$ questionnaire time points were simulated, one from each interval of an equally spaced grid. For example, under a unit-spaced grid, the first questionnaire time point was simulated from $(0, 1)$ and the second from $(1, 2)$. However, to mimic the motivating studies, we considered a terminal main event and kept only questionnaire time points before $\tilde{T}$.
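A sketch of this questionnaire-time scheme, with an assumed grid width and number of points; only times before the observed time are kept, as in the terminal-event setting.

```python
import random

# Sketch: K questionnaire time points, the k-th drawn uniformly from the
# k-th cell of an equally spaced grid of width `step`, keeping only the
# points before the observed event/censoring time.

def questionnaire_times(K, step, observed_time, rng=random):
    times = [rng.uniform(k * step, (k + 1) * step) for k in range(K)]
    return [t for t in times if t < observed_time]

rng = random.Random(1)
qs = questionnaire_times(K=5, step=1.0, observed_time=3.2, rng=rng)
assert all(0 <= t < 3.2 for t in qs)
assert qs == sorted(qs)   # cells are disjoint, so times come out ordered
```

Truncating the schedule at the observed time is what induces the dependence between the censoring of $V$ and the event process discussed in Section 4.2.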

The PH calibration model fitting was done for each simulated dataset with equally spaced interior knots and a quadratic order for the basis functions of the I-spline cumulative baseline hazard. In the CRC data analysis (Section 7), we used the data, as recommended in Wang et al. (2016), to choose the number of knots more carefully, and used cubic splines. Standard errors were estimated by the sandwich estimator of Section 5, and confidence intervals were calculated using the normal distribution.

We simulated 1000 datasets per scenario. Table 2 summarizes the results for LVCF, the PH calibration model (PH-OC) and the PH risk-set calibration model (PH-RSC) for the estimation of $\beta$. Under the null, all three methods were valid. As the association between $X(t)$ and $T$ got stronger, more substantial bias was observed. The OC estimator performed generally well, with increased bias and lower coverage rates of the 95% confidence interval for the combination of a large $\beta_0$ and few questionnaire time points. The RSC estimator had lower bias in these scenarios. These observed biases were much smaller (in absolute value) than the bias of the LVCF estimator.

Table A.1 in the Supplementary Materials presents more results from this simulation scenario, including additional parameter values and the MidI estimator. In terminal main event scenarios, MidI had a very large bias, even under the null. This is because a finite interval is only observed for observations with left- or interval-censored exposure times. Right-censored exposure times are dealt with differently (e.g., using LVCF) from left- or interval-censored observations, and a right-censored exposure time is the result of the main event occurring before a change in $X$ was observed. This creates negative dependence between the imputed $V$ and $T$, even under the null. In studies with non-terminal events, the MidI method is valid under the null, but not when $\beta_0 \neq 0$. A small simulation study (not presented here) confirmed these claims.

β0     K  Method  Mean    EMP.SE  Est.SE  CP95%
0.000  2  LVCF    -0.002  0.141   0.135   0.944
          PH-OC    0.003  0.183   0.178   0.950
          PH-RSC   0.004  0.183   0.177   0.952
       5  LVCF     0.003  0.122   0.124   0.955
          PH-OC    0.007  0.138   0.146   0.956
          PH-RSC   0.007  0.138   0.142   0.956
0.693  2  LVCF     0.462  0.119   0.118   0.498
          PH-OC    0.680  0.175   0.179   0.936
          PH-RSC   0.684  0.177   0.175   0.938
       5  LVCF     0.572  0.107   0.110   0.810
          PH-OC    0.690  0.132   0.145   0.958
          PH-RSC   0.689  0.132   0.137   0.957
1.609  2  LVCF     0.968  0.110   0.109   0.000
          PH-OC    1.472  0.179   0.210   0.869
          PH-RSC   1.516  0.190   0.195   0.897
       5  LVCF     1.212  0.096   0.099   0.013
          PH-OC    1.577  0.139   0.166   0.951
          PH-RSC   1.575  0.137   0.151   0.948
1.946  2  LVCF     1.130  0.112   0.110   0.000
          PH-OC    1.695  0.182   0.230   0.723
          PH-RSC   1.773  0.198   0.205   0.816
       5  LVCF     1.410  0.097   0.098   0.000
          PH-OC    1.890  0.148   0.187   0.933
          PH-RSC   1.890  0.147   0.158   0.929
Table 2: Simulation study results under a PH calibration model, by the true effect size $\beta_0$ and the number of questionnaire time points $K$. Methods compared are LVCF, the PH calibration model (PH-OC) and the PH risk-set calibration model (PH-RSC). We present mean estimates (Mean), empirical standard deviations (EMP.SE), mean estimated standard errors (Est.SE) and the empirical coverage rate of 95% confidence intervals (CP95%) for the estimator of $\beta$. Results are based on 1000 simulated datasets per scenario.

To investigate the performance of our methods in other settings, we considered additional simulation studies without any covariates affecting the time to exposure (i.e., without covariates in the calibration model) and compared Weibull and nonparametric calibration models when the true calibration model was Weibull, and when it was piecewise exponential. The results, presented in the Supplementary Materials, generally agreed with the results described in this section.

7 Results of aspirin and CRC survival analyses

Our first step was to construct a calibration model, that is, a model for the time-to-start-aspirin-taking. The 113 participants without available questionnaire data were not used for fitting the calibration model, which was fitted using the remaining 1,258 participants. The potential baseline covariates included gender, age-at-diagnosis, pre-diagnosis body mass index (BMI), pre-diagnosis aspirin-taking status, and the following tumor characteristics: disease stage (1–3 and missing), differentiation (poor versus well-moderate) and location (proximal colon, distal colon or rectum). We considered three PH calibration models: (I) a model with all the aforementioned covariates, (II) a model with all non-tumor-related baseline covariates and disease stage, and (III) a model with all non-tumor-related baseline covariates. Model (III) minimized the BIC (Schwarz, 1978). In addition, including strong determinants of the terminal event, such as disease stage, is undesirable. An association between disease stage and aspirin-taking could be the result of a violation of the independent censoring assumption for the calibration model. That is, the association of disease stage with aspirin-taking could be due only to the association of disease stage with the censoring event, i.e., death.

Model (III) is also logically sound from a subject-matter perspective. Aspirin is preventive care for patients at high risk of vascular disease, so determinants of vascular disease should also increase the probability of aspirin-taking. The covariates in Model (III), age, gender, and BMI, are all well-established risk factors for vascular disease. We therefore adopted Model (III) as our calibration model. Table 3 presents the results of fitting this PH model to the data. We used the same PH calibration model for all the analyses, since there is no reason to believe that the tumor subtype, which was not even known at the time of diagnosis, informs the time at which aspirin-taking status changes.
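The BIC comparison used here can be sketched directly for interval-censored change times. The following hypothetical example compares exponential and Weibull calibration models on simulated data; the data-generating settings are ours for illustration (the paper's Models (I)-(III) differ in their covariates, not in the distributional family):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
n = 800
V = 3.0 * rng.weibull(1.7, n)            # true change times (Weibull, shape 1.7)
grid = np.arange(1.0, 8.0)               # yearly questionnaires
L = np.zeros(n)
R = np.full(n, np.inf)                   # V lies in (L, R]; R = inf if right-censored
for i in range(n):
    before, after = grid[grid < V[i]], grid[grid >= V[i]]
    if before.size: L[i] = before.max()
    if after.size:  R[i] = after.min()

def negloglik(log_scale, log_shape):
    # interval-censored Weibull log-likelihood; S(inf) = 0 handles right-censoring
    scale, shape = np.exp(log_scale), np.exp(log_shape)
    S = lambda t: np.exp(-(t / scale) ** shape)
    return -np.sum(np.log(S(L) - S(R) + 1e-300))

def bic(fix_shape):
    # shape fixed at 1 reduces the Weibull to an exponential calibration model
    obj = (lambda p: negloglik(p[0], 0.0)) if fix_shape else (lambda p: negloglik(p[0], p[1]))
    k = 1 if fix_shape else 2
    fit = minimize(obj, np.zeros(k), method="Nelder-Mead")
    return 2.0 * fit.fun + k * np.log(n)

bic_exp, bic_wei = bic(True), bic(False)
print(round(bic_exp, 1), round(bic_wei, 1))   # smaller BIC is preferred
```

With the true shape far from 1, the Weibull model attains the smaller BIC despite its extra parameter, mirroring how BIC trades fit against complexity in the model choice above.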

As suggested by Wang et al. (2016), we used BIC to choose the number of equally-spaced interior knots, and we used the flexible cubic order for the spline basis functions. The AIC criterion (Akaike, 1974) selected a different number of interior knots, but the final results did not substantially change when we used the AIC-selected number instead.

| Covariate | Est (SE) | HR | p-value |
|-----------|----------|----|---------|
| Predx-asp | 1.14 (0.074) | 3.11 | |
| Predx-BMI | 0.03 (0.006) | 1.03 | |
| Age-at-dx | 0.01 (0.001) | 1.01 | |
| Female | -0.18 (0.073) | 0.83 | 0.013 |

Table 3: The PH calibration model for time-to-start-taking-aspirin, using a cubic spline basis. Covariates include pre-diagnosis aspirin status (Predx-asp, taker versus non-taker), pre-diagnosis BMI (Predx-BMI), age-at-diagnosis (Age-at-dx) and gender (Female).

The simulation results presented in Section 6 and the Supplementary Materials have shown that when there are covariates affecting the time-to-exposure, the PH-OC estimator is preferable over the naive methods, namely LVCF and MidI. Figure 2 illustrates the difference between the methods. Its right panel plots the estimated probabilities of aspirin-taking against time, for the first nine participants in our data, using LVCF, MidI, nonparametric (NP) calibration, and the PH calibration model. Once aspirin-taking was observed, all methods assign probability one for the rest of the relevant risk sets. The left panel of Figure 2 presents the same probabilities as the right panel, but for the first participant only. This person did not report aspirin-taking during the 10 years of follow-up. From the data, this person was a 75-year-old (at diagnosis) male who was taking aspirin before diagnosis. From Table 3, we would therefore expect this person's probability of taking aspirin to be higher than in the general population. This aligns with this person having larger estimated probabilities under the PH model compared with the NP calibration model.

(a) First participant
(b) First 9 participants
Figure 2: Estimated probabilities of aspirin-taking over time, by the different considered methods, for the first 9 participants. Panel (a) corresponds to the top left corner of Panel (b).
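The calibration probabilities behind Figure 2 have a simple closed form: for a participant whose last "not taking" report was at time a and who has not yet reported taking aspirin, ordinary calibration evaluates P(V ≤ t | V > a) = (F(t) − F(a))/(1 − F(a)) under the fitted change-time distribution F, while LVCF keeps the probability at zero until a change is observed. A minimal sketch (the Weibull parameters below are illustrative, not fitted values from the aspirin data):

```python
import numpy as np

def p_exposed(t, last_zero, first_one, F):
    """P(X(t) = 1 | questionnaire history) under ordinary calibration:
    condition the fitted change-time distribution F on V > last_zero."""
    if t >= first_one:          # change already reported: probability one
        return 1.0
    if t <= last_zero:          # still known to be unexposed
        return 0.0
    return float((F(t) - F(last_zero)) / (1.0 - F(last_zero)))

# Hypothetical Weibull calibration model (illustrative parameters only)
F = lambda t: 1.0 - np.exp(-(t / 6.0) ** 1.5)

print(round(p_exposed(4.0, 2.0, np.inf, F), 3))   # LVCF would return 0 here
```

The risk-set calibration version refits F within each risk set; the conditioning step is the same.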

We included the covariates of Model (III) in the risk-set calibration models. To avoid destabilizing the calibration model fitting, we grouped the risk sets into intervals of 0.5 years; that is, we refitted the calibration model every 6 months.

Turning to the main model, we included baseline covariates known to be associated with CRC-related death: age-at-diagnosis, pre-diagnosis BMI, family history of CRC, and the following tumor characteristics: stage, differentiation and location. Table 4 shows estimates, with corresponding standard errors, confidence intervals, and p-values, for the association between aspirin-taking and survival in the three motivating studies and in the entire data. Compared with LVCF, stronger protective associations were estimated by our methods in all three subtype-specific analyses. Compared with the OC estimates, the RSC estimates were only slightly further from the null. This could be partially explained by the high censoring rate (about 80%). Even though the sample sizes and case numbers were low to moderate, there was evidence for a strong protective effect of aspirin in the PIK3CA subtype of CRC. The point estimates for aspirin in the low CD274 subtype imply a negative association between aspirin and death, but, possibly due to limited power, the null hypothesis was not rejected.

| Study | Method | Est (SE) | HR | p-value |
|-------|--------|----------|----|---------|
| All Data | LVCF | -0.34 (0.147) | 0.71 | 0.020 |
| | PH-OC | -0.32 (0.176) | 0.73 | 0.072 |
| | PH-RSC | -0.32 (0.177) | 0.73 | 0.073 |
| Low CD274 | LVCF | -0.64 (0.352) | 0.53 | 0.067 |
| | PH-OC | -0.77 (0.400) | 0.46 | 0.054 |
| | PH-RSC | -0.79 (0.401) | 0.45 | 0.049 |
| PIK3CA | LVCF | -2.13 (0.639) | 0.12 | 0.001 |
| | PH-OC | -2.22 (0.653) | 0.11 | 0.001 |
| | PH-RSC | -2.23 (0.657) | 0.11 | 0.001 |
| PTGS2 | LVCF | -0.30 (0.202) | 0.74 | 0.138 |
| | PH-OC | -0.32 (0.241) | 0.73 | 0.185 |
| | PH-RSC | -0.32 (0.241) | 0.72 | 0.181 |

Low CD274 study: Hamada et al. (JCO, 2017)
PIK3CA study: Liao et al. (NEJM, 2012)
PTGS2 study: Chan et al. (JAMA, 2009)

Table 4: Results for the main model in the three CRC studies and in all data. Results are presented for the aspirin effect. The main model also included family history of CRC, pre-diagnosis BMI, age-at-diagnosis, gender, disease stage, differentiation and location.
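As a reading aid for Tables 3 and 4, the HR column equals exp(Est), and a 95% confidence interval for the HR can be obtained as exp(Est ± 1.96·SE). For example, for the all-data LVCF row of Table 4:

```python
import math

est, se = -0.34, 0.147                      # aspirin, all data, LVCF (Table 4)
hr = math.exp(est)                          # hazard ratio
lo, hi = (math.exp(est - 1.96 * se),        # 95% confidence interval for the HR
          math.exp(est + 1.96 * se))
print(round(hr, 2), round(lo, 2), round(hi, 2))   # prints: 0.71 0.53 0.95
```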

8 Reanalyzing the data in Goggins et al. (1999)

To our knowledge, the only non-naive method developed for the problem of fitting a PH model with interval-censored time-to-exposure is that of Goggins et al. (1999). Their motivating application was an analysis of AIDS Clinical Trial Group (ACTG) protocol 181 (Finkelstein et al., 2002). The event of interest, active CMV disease, was non-terminal, and the binary covariate was CMV shedding. Based on the joint likelihood for the survival time and the distribution of the covariate change time, Goggins et al. (1999) proposed an EM algorithm in which the E-step is carried out with respect to the ordering of CMV shedding times among the study participants. They further developed a Gibbs sampler to improve computational efficiency.

Our methodology has several advantages over the method of Goggins et al. (1999). First, we exploit modern and fast estimation procedures for estimating the distribution of the change time by the NPMLE. Second, we can (but do not have to) impose parametric modeling assumptions on the distribution of the change time, if appropriate. Third, since we fit the calibration and main models separately, we can include in our analysis participants without data on the covariate of interest. Finally, and most importantly, we can include measured covariates affecting the time-to-exposure.
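To illustrate the first point, the NPMLE of the change-time distribution can be computed by Turnbull's (1976) self-consistency EM. The following is a minimal sketch with toy data: it places mass on the finite right endpoints plus one atom beyond follow-up, which is adequate when all interval endpoints lie on a common questionnaire grid (the general algorithm uses Turnbull's innermost intervals):

```python
import numpy as np

def npmle_changetime(Lb, Rb, tol=1e-8, max_iter=5000):
    """Self-consistency EM for the NPMLE of the change-time distribution
    from intervals (L, R]; R = inf marks a right-censored change time."""
    finite = np.unique(Rb[np.isfinite(Rb)])
    support = np.append(finite, finite.max() + 1.0)        # extra atom beyond follow-up
    A = ((support[None, :] > Lb[:, None]) &
         (support[None, :] <= Rb[:, None])).astype(float)  # s_j in (L_i, R_i]
    p = np.full(support.size, 1.0 / support.size)
    for _ in range(max_iter):
        denom = A @ p                                      # P(V in (L_i, R_i])
        new = (A / denom[:, None]).sum(axis=0) * p / len(Lb)
        if np.abs(new - p).max() < tol:
            return support, new
        p = new
    return support, p

# toy intervals from yearly questionnaires; two right-censored change times
Lb = np.array([0.0, 1.0, 2.0, 0.0, 3.0, 1.0])
Rb = np.array([1.0, 2.0, 3.0, 2.0, np.inf, np.inf])
s, p = npmle_changetime(Lb, Rb)
print(np.round(p, 3), round(float(p.sum()), 3))
```

Each EM update preserves nonnegativity and total mass one, and the fixed point satisfies the self-consistency equation characterizing the NPMLE.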

We analyzed the ACTG 181 data (Finkelstein et al., 2002) using our method under nonparametric calibration. The results are presented in the Supplementary Materials. We observed a divergence between the OC and RSC estimates. The RSC estimates were further away from the null, and similar to those obtained by Goggins et al. (1999). The confidence intervals were quite wide, as one may expect for hazard ratios of strong effects when the sample size is moderate.

9 Conclusion

We have presented a novel calibration approach for studying the association between a time-dependent binary exposure and survival time when the data on the monotone time-dependent exposure are available only intermittently. Our proposed approach allows for a wide range of calibration models. In practice, an adequate model should be chosen by a combination of subject-matter knowledge and the available data. The R package ICcalib implementing our methods is publicly and freely available at blinded.

When the association between exposure and survival is strong and the terminal event is non-rare, our calibration framework may suffer from bias that can be reduced, though not eliminated, by risk-set calibration. Unlike regression calibration methods for error-prone covariates, additional data (e.g., reliability or validation data) are not needed to fit the calibration model, as the covariate measurements are inherently part of the data collected to answer the question of interest.

The problem described in this paper and the proposed conceptual framework open the way for further research. A main question of interest is whether the regression coefficient can be estimated consistently from the data, and what further assumptions and methods would ensure consistency. Our model is a type of joint model for time-to-event and longitudinal data (Tsiatis and Davidian, 2004; Rizopoulos, 2012). Potential alternative methods may include developing joint modeling methods for binary longitudinal covariates (Faucett et al., 1998; Larsen, 2004; Rizopoulos et al., 2008) and adapting them to the problem of an interval-censored change time. The bias in the calibration model can be tackled using methods for dependent interval-censoring (Finkelstein et al., 2002). Another potential way to expand the research presented in this paper is to define alternative causal estimands and to investigate the identifiability assumptions needed to estimate the parameters of interest.

In conclusion, we have presented novel analyses to overcome a common problem in medical and epidemiological research, while presenting a new conceptual framework, accompanied by flexible and simple methodology, to perform time-to-event analysis with an infrequently updated binary covariate.


Section A presents the asymptotic theory for the RSC estimator, and Section B describes our R package. Additional simulation studies are described in Section C. Section D presents the analysis results of the CMV data.


  • Akaike (1974) Akaike, H. (1974), ‘A new look at the statistical model identification’, IEEE Transactions on Automatic Control 19(6), 716–723.
  • Andersen and Gill (1982) Andersen, P. K. and Gill, R. D. (1982), ‘Cox’s regression model for counting processes: a large sample study’, The Annals of Statistics pp. 1100–1120.
  • Andersen and Liestøl (2003) Andersen, P. K. and Liestøl, K. (2003), ‘Attenuation caused by infrequently updated covariates in survival analysis’, Biostatistics 4(4), 633–649.
  • Anderson-Bergman (2016) Anderson-Bergman, C. (2016), ‘An efficient implementation of the EMICM algorithm for the interval censored NPMLE’, Journal of Computational and Graphical Statistics (just-accepted).
  • Austin (2012) Austin, P. C. (2012), ‘Generating survival times to simulate Cox proportional hazards models with time-varying covariates’, Statistics in Medicine 31(29), 3946–3958.
  • Cai and Betensky (2003) Cai, T. and Betensky, R. A. (2003), ‘Hazard regression for interval-censored data with penalized spline’, Biometrics 59(3), 570–579.
  • Cao et al. (2015) Cao, H., Churpek, M. M., Zeng, D. and Fine, J. P. (2015), ‘Analysis of the proportional hazards model with sparse longitudinal covariates’, Journal of the American Statistical Association 110(511), 1187–1196.
  • Chan et al. (2009) Chan, A. T., Ogino, S. and Fuchs, C. S. (2009), ‘Aspirin use and survival after diagnosis of colorectal cancer’, JAMA 302(6), 649–658.
  • Cox (1972) Cox, D. R. (1972), ‘Regression models and life-tables’, Journal of the Royal Statistical Society. Series B (Methodological) 34(2), 187–220.
  • Faucett et al. (1998) Faucett, C. L., Schenker, N. and Elashoff, R. M. (1998), ‘Analysis of censored survival data with intermittently observed time-dependent binary covariates’, Journal of the American Statistical Association 93(442), 427–437.
  • Finkelstein (1986) Finkelstein, D. M. (1986), ‘A proportional hazards model for interval-censored failure time data’, Biometrics pp. 845–854.
  • Finkelstein et al. (2002) Finkelstein, D. M., Goggins, W. B. and Schoenfeld, D. A. (2002), ‘Analysis of failure time data with dependent interval censoring’, Biometrics 58(2), 298–304.
  • Goggins et al. (1999) Goggins, W. B., Finkelstein, D. M. and Zaslavsky, A. M. (1999), ‘Applying the Cox proportional hazards model when the change time of a binary time-varying covariate is interval censored’, Biometrics 55(2), 445–451.
  • Groeneboom and Wellner (1992) Groeneboom, P. and Wellner, J. A. (1992), Information bounds and nonparametric maximum likelihood estimation, Vol. 19, Springer Science & Business Media.
  • Hamada et al. (2017) Hamada, T., Cao, Y., Qian, Z. R., Masugi, Y., Nowak, J. A., Yang, J., Song, M., Mima, K., Kosumi, K., Liu, L. et al. (2017), ‘Aspirin use and colorectal cancer survival according to tumor CD274 (programmed cell death 1 ligand 1) expression status’, Journal of Clinical Oncology 35(16), 1836–1844.
  • Huang (1996) Huang, J. (1996), ‘Efficient estimation for the proportional hazards model with interval censoring’, The Annals of Statistics 24(2), 540–568.
  • Huang and Wellner (1997) Huang, J. and Wellner, J. A. (1997), Interval censored survival data: a review of recent progress, in ‘Proceedings of the First Seattle Symposium in Biostatistics’, Springer, pp. 123–169.
  • Langohr et al. (2004) Langohr, K., Gómez, G. and Muga, R. (2004), ‘A parametric survival model with an interval-censored covariate’, Statistics in Medicine 23(20), 3159–3175.
  • Larsen (2004) Larsen, K. (2004), ‘Joint analysis of time-to-event and multiple binary indicators of latent classes’, Biometrics 60(1), 85–92.
  • Liao et al. (2012) Liao, X., Lochhead, P., Nishihara, R., Morikawa, T., Kuchiba, A., Yamauchi, M., Imamura, Y., Qian, Z. R., Baba, Y., Shima, K. et al. (2012), ‘Aspirin use, tumor PIK3CA mutation, and colorectal-cancer survival’, New England Journal of Medicine 367(17), 1596–1606.
  • Liao et al. (2011) Liao, X., Zucker, D. M., Li, Y. and Spiegelman, D. (2011), ‘Survival analysis with error-prone time-varying covariates: A risk set calibration approach’, Biometrics 67(1), 50–58.
  • Lin and Wei (1989) Lin, D. Y. and Wei, L.-J. (1989), ‘The robust inference for the Cox proportional hazards model’, Journal of the American Statistical Association 84(408), 1074–1078.
  • Pan (1999) Pan, W. (1999), ‘Extending the iterative convex minorant algorithm to the cox model for interval-censored data’, Journal of Computational and Graphical Statistics 8(1), 109–120.
  • Prentice (1982) Prentice, R. L. (1982), ‘Covariate measurement errors and parameter estimation in a failure time regression model’, Biometrika 69(2), 331–342.
  • Ramsay (1988) Ramsay, J. O. (1988), ‘Monotone regression splines in action’, Statistical Science pp. 425–441.
  • Rizopoulos (2012) Rizopoulos, D. (2012), Joint models for longitudinal and time-to-event data: With applications in R, CRC Press.
  • Rizopoulos et al. (2008) Rizopoulos, D., Verbeke, G., Lesaffre, E. and Vanrenterghem, Y. (2008), ‘A two-part joint model for the analysis of survival and longitudinal binary data with excess zeros’, Biometrics 64(2), 611–619.
  • Schwarz (1978) Schwarz, G. (1978), ‘Estimating the dimension of a model’, The Annals of Statistics 6(2), 461–464.
  • Sun (2007) Sun, J. (2007), The statistical analysis of interval-censored failure time data, Springer Science & Business Media.
  • Tsiatis and Davidian (2004) Tsiatis, A. A. and Davidian, M. (2004), ‘Joint modeling of longitudinal and time-to-event data: an overview’, Statistica Sinica pp. 809–834.
  • Turnbull (1976) Turnbull, B. W. (1976), ‘The empirical distribution function with arbitrarily grouped, censored and truncated data’, Journal of the Royal Statistical Society. Series B (Methodological) pp. 290–295.
  • Wang et al. (1997) Wang, C., Hsu, L., Feng, Z. and Prentice, R. L. (1997), ‘Regression calibration in failure time regression’, Biometrics pp. 131–145.
  • Wang et al. (2016) Wang, L., McMahan, C. S., Hudgens, M. G. and Qureshi, Z. P. (2016), ‘A flexible, computationally efficient method for fitting the proportional hazards model to interval-censored data’, Biometrics 72(1), 222–231.
  • Wellner and Zhan (1997) Wellner, J. A. and Zhan, Y. (1997), ‘A hybrid algorithm for computation of the nonparametric maximum likelihood estimator from censored data’, Journal of the American Statistical Association 92(439), 945–959.
  • Xie et al. (2001) Xie, S. X., Wang, C. and Prentice, R. L. (2001), ‘A risk set calibration method for failure time regression by using a covariate reliability sample’, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 63(4), 855–870.
  • Ye et al. (2008) Ye, W., Lin, X. and Taylor, J. M. (2008), ‘Semiparametric modeling of longitudinal measurements and time-to-event data–a two-stage regression calibration approach’, Biometrics 64(4), 1238–1246.


We first show that . Denote and , where for any , is the expectation under the PH model (8) for , and is the expectation under the true distribution. Denote also

where and where for any vector , , and . Observe that is the true hazard function (conditionally on ). Define also and let and . We impose the following standard regularity assumptions:

  1. The number of knots does not grow with the sample size.

  2. , for some .

  3. are continuous and bounded functions of , for any and in the neighborhoods and of and , respectively, and for all . Furthermore, is bounded away from zero.

  4. The components of are bounded for all .

  5. The matrix is continuous in and positive definite at .

Assumption (A1) could be relaxed; see Wang et al. (2016). For simplicity, however, we take (A1) as given above. By the Weak Law of Large Numbers,

By the assumptions above and using arguments similar to those of Andersen and Gill (1982) as implemented by Lin and Wei (1989), it follows that for any and , . Recall that is the solution of and observe that

where the last equality holds by Assumption B and since is bounded for finite values of . Let be the solution of . By the assumptions above, and specifically Assumption E, . In particular, .

Regarding asymptotic normality, by a Taylor expansion

which can be rearranged as


Similarly to Andersen and Gill (1982) and Lin and Wei (1989), by the assumptions given above, , and by invoking similar arguments . Let . By arguments similar to those of Lin and Wei (1989), it can be shown that , where

Regarding , by Assumption A, it is a finite-size vector, and as explained in Wang et al. (2016), it can be treated with the standard tools for parametric models. Denote . By further incorporating standard theory of misspecified likelihood-based models and Assumption B, we may write

Substituting this in (10), we get


Therefore, by the multivariate central limit theorem and Slutsky’s theorem, we conclude that

is asymptotically normally distributed with covariance matrix

which can be consistently estimated by in Equation (9), by replacing parameters with their estimates and expectations with their corresponding sample versions.
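The argument above follows the usual sandwich pattern for (possibly misspecified) estimating equations; as a schematic reminder, in generic notation rather than the paper's symbols:

```latex
% Schematic M-estimation sandwich argument (generic notation):
% U_n is the score-type estimating function, \beta^\star its limiting root.
0 = U_n(\hat\beta)
  \approx U_n(\beta^\star)
  + \frac{\partial U_n}{\partial \beta}\bigg|_{\beta^\star} (\hat\beta - \beta^\star)
\quad\Longrightarrow\quad
\sqrt{n}\,(\hat\beta - \beta^\star)
  \approx \Big[-\tfrac{1}{n}\,\tfrac{\partial U_n}{\partial\beta}\Big|_{\beta^\star}\Big]^{-1}
  \tfrac{1}{\sqrt{n}}\, U_n(\beta^\star)
  \;\xrightarrow{\;d\;}\; N\!\big(0,\; A^{-1} B\, (A^{-1})^{\top}\big),
```

where A is the limit in probability of the bracketed matrix and B is the asymptotic covariance of the normalized score; when the calibration parameters are estimated rather than known, expanding the score in those parameters, as in the substitution into (10) above, contributes an additional term to B.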