Truncation is a statistical phenomenon that has been shown to occur in a wide range of applications, including survival analysis, epidemiology, economics, and astronomy. Individuals who are subject to truncation provide no information to the investigator. Left truncation occurs when data is only recorded for individuals whose event time exceeds a random time (i.e. left truncation time). Under left truncation, individuals with smaller event times are less likely to be observed, resulting in a study sample that is biased towards larger event times and risk factors associated with larger event times. Right truncation occurs when data is only recorded for individuals whose event time proceeds a random time (i.e. right truncation time). Under right truncation, individuals with larger event times are less likely to be observed, resulting in a study sample that is biased towards smaller event times and risk factors associated with smaller event times. When both left and right truncation are present, this is known as double truncation.
Double truncation is inherent in autopsy-confirmed studies of neurodegenerative diseases . Left truncation occurs because individuals enter the study after the onset of the disease, and therefore those who succumb to the disease before they enter the study are unobserved. The right truncation occurs because individuals who live past the end of the study date do not receive a pathological diagnosis of the disease. Since these subjects cannot be definitively diagnosed with a particular disease, they are excluded from the autopsy-confirmed study sample and therefore provide no information to the investigator. This is contrary to censored individuals, who provide partial information about their survival time. We note, however, that right censoring is not possible in autopsy-confirmed studies, since any individual who has an autopsy performed will also have a known survival time. This truncation scheme is illustrated in Figure 1, where only individuals whose time of death falls between the study entry time and end of study time are observed.
The aim of our data analysis is to get accurate estimates of the effect of risk factors on survival from disease symptom onset in subjects with autopsy-confirmed Alzheimer’s disease (AD), the most common neurodegenerative disease. Because individuals with shorter survival times are less likely to enter the study, left truncation leads to a study sample that is biased towards larger survival times and risk factors associated with larger survival times. Similarly, individuals with longer survival times are more likely to live past the end of the study, and therefore right truncation leads to a study sample that is biased towards smaller survival times and risk factors associated with smaller survival times. If double truncation is not accounted for, then the regression coefficient estimators from the Cox regression model  will be biased.
. The estimation procedure for all three methods rely on estimating the joint distribution of the left and right truncation times, which is used to compute the probability that a subject is observed (i.e. not truncated). These probabilities are then used as weights or offsets in the Cox model. However, the estimation of the truncation distribution relies on the assumption of independence between the observed survival and truncation times, which may not a reasonable assumption in practice. For example, according to the Alzheimer’s association and discussions with our clinical investigators, factors such as lower age of symptom onset, depression, and stress are associated with delayed study entry. Since these factors are associated with survival, this induces a dependence between the left truncation times and survival times. As shown in the simulation studies in Section 3, the regression coefficient estimators from[1, 3, 4] are sensitive to violations of this independence assumption. Therefore, the existing literature is unable to address the unique challenges present in our study.
In this paper, we propose a novel method to relax the assumption of independence between the observed survival and truncation times in the Cox proportional hazards model under left, right, and double truncation. Specifically, by conditioning on the observed truncation times, our method relaxes the independence assumption to an assumption of conditional independence. Treating the truncated survival times as missing, we introduce an expectation-maximization (EM) algorithm to estimate the regression coefficients and baseline hazard rates. This approach, which completely avoids the estimation of the truncation distribution, yields consistent and asymptotically normal regression coefficient estimators. We show through extensive simulation studies that our proposed estimators have little bias in small samples, while the estimators based on the methods introduced in [1, 3, 4], and the standard model which ignores double truncation, can be heavily biased under violations of the independence assumption. We show that even if the independence assumption is satisfied, our proposed method performs as well as the existing approaches. We illustrate our method by analyzing the effect of cognitive reserve on survival in an autopsy-confirmed AD cohort.
Cognitive reserve (CR) is a widely used hypothetical construct intended to account for individual differences in cognitive decline and clinical manifestations of dementia among individuals with AD [5, 6]. CR hypothesizes that individuals develop cognitive strategies and neural connections throughout their life times through experience such as occupation, education, and other forms of mental engagement . This may modulate the effects of AD because of compensatory strategies obtained from a higher level of professional performance or a good education . For example, CR may have a protective role in the brain and therefore lengthen survival from disease symptom onset .
Occupation, often used as a proxy for CR, has been shown to modulate survival in healthy aging and AD . Several studies in healthy aging have examined the possible protective influence of higher occupational attainment on survival [11, 12, 13]. However other studies have shown that for individuals with AD, those with a higher occupational attainment had a higher mortality rate than those with a lower occupation attainment [14, 15]. The caveat to previous studies assessing the effect of occupation on survival is that most consisted of populations with clinically diagnosed AD subjects, which can be unreliable . Due to the inaccuracy of clinical diagnosis of AD, autopsy-confirmation is used for a definitive diagnosis . Without an accurate diagnosis of AD, any estimates of factors affecting survival are not reliable. In this paper, we aim to get improved estimates of the effect of occupation on survival from an autopsy-confirmed AD sample, adjusting for both truncation and dependence.
The remainder of this paper is organized as follows. In Section 2 we introduce the proposed EM method, including the estimation procedure and the large sample properties of the resulting estimators. In Section 3, we conduct a simulation study to assess the finite sample performance of the proposed estimators under dependent truncation. In Section 4, we apply the proposed method to estimate the effect of occupation on survival in individuals with autopsy-confirmed AD. Discussion and concluding remarks are given in Section 5. Proofs of the large sample results are outlined in the Appendix.
We first introduce notation and assumptions. Let denote the survival time of interest (e.g. survival time from disease symptom onset), denote the left truncation time (e.g. time from disease symptom onset to entry into the study), denote the right truncation time (e.g. time from disease symptom onset to the end of study date), and denote a vector of covariates. Let denote the size of the target population – the population that would have been observed had there been no truncation present in the study. Due to double truncation, we only observe for individuals who live long enough to enter the study (i.e. ) and do not live past the end of the study (i.e.
). Here we have denoted the population random variables from the target population without subscripts, and the sampling random variables from the observed sample with subscripts.
The proportional hazards model  is considered the standard regression model for analyzing traditional right-censored survival data. The model assumes that the covariate-specific hazard function is given by , where is a regression parameter vector, and is the baseline hazard function and is unspecified. When the survival data are subject to selection bias, Cox’s partial likelihood approach  cannot be directly applied. This is because the observed data are not a representative sample of the target population, and therefore the observed, biased data do not follow the model that is assumed for the unbiased data from the target population. When the data is biased due to double truncation, the distribution of the observed survival time is given by:
which differs from the distribution of the survival time from the target population. Therefore the resulting estimates of the regression coefficients based on data from the observed sample will be biased estimators of the regression coefficients from the target population.
Under the assumption of independence between the survival and truncation times, [1, 3, 4] adjust for double truncation by estimating the probability that a subject with survival time is observed, defined by , . These probabilities are then used as weights or offsets in the Cox regression model. For example, under double truncation and independence between the survival times and truncation times, Rennert and Xie (2017) consistently estimate the true regression coefficient vector by , the solution to
where , , , and is the indicator function. Here is the maximum of the observed event times. The standard Cox regression estimator  which ignores double truncation, , is the solution to , where is the score equation from the standard Cox model.
The caveat of the approaches which adjust for double truncation is that they require estimating the distribution of the truncation times, which is needed to obtain the estimator of the selection probabilities . Existing methods to estimate the truncation distribution require independence between the survival and truncation times. When this independence assumption is violated, the estimator of the truncation distribution will be biased, and therefore the estimator of the selection probabilities will be biased. Because the methods in [1, 3, 4] depend on , the resulting regression coefficient estimators will also be biased. The severity of this bias is demonstrated by the simulation studies in the next section.
When the survival times are conditionally independent of the truncation times given the covariate , the likelihood of the observed survival times, conditional on the truncation times and covariates, is given by
where and . That is, is the probability of observing a random subject from the target sample with covariate and truncation times and . Conditioning on the truncation times allows us to utilize the information in the covariates to relax the assumption of independence to an assumption of conditional independence. Furthermore, this conditioning completely avoids the need to estimate the distribution of the truncation times.
The log-likelihood function, , can be expressed as
Due to the difficulties of maximizing the log-likelihood (2) over all absolutely continuous cumulative hazard functions, we allow the estimator of to be discrete. Because the maximum likelihood estimation (MLE) of and may be computationally intractable if directly solving the score equations for (2), we estimate and using an EM algorithm. This has the advantage that its maximization step (M-step) only involves the complete-data likelihood. Based on the EM algorithm, we provide a convenient estimation approach to obtain estimators of the regression coefficients and baseline hazard function under left, right, or double truncation. This approach allows the survival and truncation times to be dependent through the covariate vector . Furthermore, it does not require the estimation of the truncation time distribution. The estimation approach given here can easily be implemented using standard software for the Cox regression model.
2.1 Proposed EM Algorithm
Motivated by the approach in , who proposed EM algorithms for length-biased and right-censored data, Shen and Liu  proposed an EM algorithm to obtain pseudo MLEs of the regression coefficients from the Cox model under independent left and right truncation. They referred to their MLEs as pseudo because their proposed likelihood included the plug-in value of the estimator of the selection probabilities . However, as the authors point out, the estimated selection probabilities will be biased if the truncation times depend on the covariates Z. Hence, the resulting pseudo MLEs of the regression coefficients from the Cox model will also be biased.
We propose an EM algorithm for obtaining the MLE of based on (2). This allows us to relax the assumption of independence required by the methods in [1, 3, 4] to an assumption of conditional independence by avoiding the estimation of the truncation distribution (and corresponding selection probabilities). Similar to the approaches of  and , we let denote the ordered, distinct failure times for . We develop the EM algorithm based on the discrete version of , which we redefine as a step function only taking jumps at . Specifically, we set , were is the positive jump at time for .
Our observed data consists of , where for . Let denote the truncated latent data, where is the missing survival time for a subject with truncation times and covariate vector for and . For notational convenience, we set and define the density of at time , given , as , where . Assuming the latent survival times take their values in , the complete data log-likelihood is given by
To estimate the parameter , the EM algorithm begins by choosing an initial value for , say . In our setting, we can choose , which are the estimates from the standard Cox model. For , the expectation step (E-step) consists of calculating the expected value of the complete data log likelihood function with respect to the missing data , , conditional on the observed data , , under the current estimate . That is, we compute:
In the maximization step (M-step), we choose to maximize . That is, we set
The E- and M-steps are carried out again, but this time with replaced by . The E- and M-steps are then alternated repeatedly until for some prespecified error .
The EM algorithm described here is under the double truncation setting. If only left truncation is present, the algorithm is easily adjusted by setting for . When only right truncation is present, we set for . Note that when only left truncation is present, the standard Cox regression estimator can account for dependent left truncation by adjusting the risk set at a given time point to include all individuals who are alive and in the study at that time . We denote this estimator by . We show through simulations in the next section that when only left truncation is present, our proposed estimator and yield nearly identical results.
At the iteration, define . Then,
Since is the number of missing/truncated subjects with covariate values and truncation times and ,
follows a geometric distribution with success rate. Therefore when , .
The complete data log likelihood is then given by
Let . The complete data log likelihood can be written as
where and .
Treating as constant, we set to get a closed form solution to as a function of :
Differentiating with respect to yields
Setting the equation above equal to and inserting the equation for yields
The estimating equation (4) can be solved by specifying the “weights” option in the “coxph” function in R. First, a weight vector of length must be created: . The corresponding failure time data and covariate vectors are also created with length as follows: and . Letting be the identity vector of length , the solution to (4), which we denote by , can be obtained with the following command:
Plugging into (3) yields an updated estimator for , . We then set , and repeat the E- and M-steps. We continue to alternate between the E- and-M steps until , for some prespecified error . The MLE of the hazard ratio is then given by . We denote the corresponding baseline hazard by , and the cumulative baseline hazard function by .
The EM algorithm presented here falls into the general scheme of the ECM algorithm, and therefore its convergence to the local maximizer is guaranteed by the same conditions required for convergence of the ECM algorithm . The uniqueness of the resulting estimators are guaranteed by the regularity conditions in Appendix A.1.
2.2 Asymptotic Properties
In this section, we establish the strong consistency and asymptotic normality of the proposed EM estimators. Here we denote the proposed estimators by , and denote the true regression coefficients and cumulative baseline hazard function . The asymptotic properties of the proposed estimators refer to the situation when the total number of observed (non-truncated) subjects . The following theorems assume that the regularity assumptions in Appendix A.1 hold.
Theorem 1: Under the regularity assumptions given in the Appendix, is consistent: As , converges to , and converges to almost surely and uniformly in for .
The existence and uniqueness of the MLE can be proved based on the log-likelihood function
Theorem 1 can then be proved by applying the classical Kullback-Leibler information approach as in .
Theorem 2: Under the regularity assumptions given in the Appendix, converges weakly to a tight mean-zero Gaussian process.
Theorem 2 is proved using the Z-theorem for infinite dimensional equations . The proofs of Theorem 1 and Theorem 2 are outlined in Appendix A.2 and A.3, respectively.
To obtain an estimate of the standard deviation of, we apply the simple bootstrap technique. In our setting, the bootstrap sample is obtained by drawing independent vectors , , from the observed data vectors , , with replacement. These data vectors are then used to obtain an estimate of regression coefficients, denoted by . This process is repeated times to obtain the estimators . The estimate of the standard deviation of is computed by taking the standard deviation of the , . We denote this estimate by . We show through simulation studies in the next section that the standard deviation of is accurately estimated by .
In this section we examine the performance of the proposed estimator under dependent truncation. We compare our proposed estimator to the weighted estimator which adjust for double truncation but assumes independence between the survival and truncation times. We also compare the proposed estimator to the estimator from the standard Cox regression model. In all simulations, the survival times were generated from a proportional hazards model with hazard function , and follow a Weibull distribution with scale parameter and shape parameter . We set , and generated the risk factors and from independent Unif[0,5] distributions. The truncation times were also simulated from Weibull distributions with scale parameter and shape parameter . The left truncation times were generated from a proportional hazards model with hazard function , and the right truncation times were simulated from a proportional hazards model with hazard function . Here and were generated from independent Unif[0,5] distributions, with . To adjust the proportion of missing data due to left and right truncation, the truncation times were multiplied by constants and , respectively. A higher value of induced a higher proportion of missing data due to left truncation, while a lower value of induced a higher proportion of missing data due to right truncation. Because the survival, left, and right truncation times are all functions of for , , and , they are dependent. However, the survival and truncation times are conditionally independent given . To adjust the degree of dependence between and , and and , we varied the regression coefficients and , respectively.
We conducted 1000 simulation repetitions with sample sizes of = 100 and 250. To obtain observations after truncation, we simulated observations, where is the proportion of truncated data. For each simulation, we estimated , using the proposed EM estimator , the weighted Cox regression estimator , and the standard Cox regression estimator . Of the estimators which adjust for double truncation under the independence assumption, we only focus on from , as previous simulations (not shown here) have concluded that this estimator and that in  are nearly identical, and both outperform the estimators in 
. For each estimator, we calculated the estimated bias, observed sample standard deviations (SD), estimated standard errors (
), and the average empirical coverage probability of the 95% confidence intervals (Cov). To compare the efficiency of the estimators which adjust for double truncation to the efficiency of the standard estimator, we calculated the relative mean-squared error (MSE) ofto , . That is, we computed rMSE for and . We used 200 bootstrap resamples to estimate the standard error of and , .
Table 1 shows the results of the simulations described above. In the first model, we set to induce a negative dependence between the survival times and left truncation times, which resulted in a correlation of -0.35. In the second model, we set to induce independence between the survival and left truncation times. In the third model, we set to induce a positive dependence between the survival times and left truncation times, which resulted in a correlation of 0.35. Here and were chosen such that 25% of the survival times were left truncated and 25% of the survival times were right truncated, which resulted in 0.50. The parameter was set to in all models, which resulted in a correlation of 0.35 between the survival times and right truncation times.
In all models, the proposed EM estimators and had little bias, while the standard estimators and were biased. The weighted estimator was heavily biased in all models, while was biased in the first set of models (). The observed sample standard deviations of the proposed estimators were accurately estimated by the bootstrap technique, and the coverage probabilities of the proposed estimators were all close to the nominal level of 0.95. The coverage probabilities of and were well below the nominal level, as were the coverage probabilities for . Furthermore, the mean-squared errors of the proposed estimators were lower than those of the weighted and standard estimators in almost all settings, indicating that the proposed EM method is more efficient.
We further explored the bias and MSE of these estimators as a function of left and right truncation proportion (Figure 2). We set and , which corresponded to the setting of the last model in Table 1, inducing a positive dependency between the survival times and both left and right truncation times. The proposed estimators had little bias, regardless of truncation proportion. Even under mild truncation, the weighted estimator of the regression coefficient corresponding to , which is correlated with the truncation times, was biased. This bias increased drastically as the proportion of right truncation increased. The bias was relatively small for both the proposed and weighted estimator of the regression coefficient corresponding to , which is uncorrelated with the truncation times. Both standard estimators and were heavily biased in this setting. The MSE of was significantly lower than the MSE of for . Furthermore, in most cases the MSE of was lower than the MSE of the standard estimator , i.e. rMSE() for .
In Figure 3, we compared the bias and MSE of these estimators under varying truncation proportions, when the assumption of independence holds (i.e. ). The proposed EM estimators and weighted estimators had little bias, while the standard estimators were biased for . We also compared the rMSE of and to , . As indicated by the bottom row of Figure 3, the proposed EM estimators had similar efficiency to the weighted estimators when the independence assumption holds. When the proportion of missing data due to left and right truncation were approximately equal, the standard estimator was more efficient than the proposed EM estimator and the weighted estimator. This is because the bias due to left truncation canceled out with the bias due to right truncation when these proportions were equal, which yielded a lower MSE.
The standard Cox regression model can accommodate left truncation when the left truncation time is conditionally independent of the survival times given the observed risk factors. We compare the estimator from this model to our proposed estimator under dependent left truncation only in Appendix A.4. As shown in Figure 4, the results are nearly identical, and both estimators outperform the weighted estimators in this setting.
4 Application to Alzheimer’s Disease
We illustrate our method by considering an autopsy-confirmed AD study conducted by the Center for Neurodegenerative Disease Research at the University of Pennsylvania. The target population for the research purposes of this study consists of all subjects with AD symptom onset before 2012 that met the study criteria and therefore would have been eligible to enter the center. Our observed sample contains all subjects who entered the center between 1995 and 2012, and had an autopsy performed before July 1, 2012. Thus one criterion for a subject to be included in our sample is that they did not succumb to AD before they entered the study, yielding left truncated data. In addition, our sample only contains subjects who had an autopsy-confirmed diagnosis of AD, and therefore we have no knowledge of subjects who live past the end of the study. Thus our data is also right truncated. Our data consists of n=91 subjects, all of whom have event times. The event time of interest is the survival time () from AD symptom onset. The left truncation time () is the time between the onset of AD symptoms and entry into the study (i.e. initial clinic visit). The right truncation time () is the time between the onset of AD symptoms and the end of the study, which is taken to be July 1, 2012. Due to double truncation, we only observe subjects with .
We are interested in assessing the effect of occupation on survival in AD. Occupation is often used as a proxy for cognitive reserve (CR), which hypothesizes that individuals develop cognitive strategies and neural connections throughout their life times through experience such as occupation, education, and other forms of mental engagement . A common hypothesis in the literature is that CR has protective role in the brain and modulates the effects of AD because of compensatory strategies obtained from a higher level of professional performance and therefore lengthens survival during the course of the disease [8, 9].
However some studies have shown a higher mortality rate in AD individuals with higher occupational attainment [14, 15]. This supports an alternative theory of CR; individuals with higher CR tolerate more pathology which delays the onset of the disease. Because higher age of AD symptom onset is associated with an increased risk of mortality, this would support the hypothesis that those with higher CR would have an increased risk of mortality. There are two caveats to the studies described above. The first is that these studies consisted of populations with clinically diagnosed AD subjects, which can be unreliable. The second caveat is that the statistical analyses were subject to confounding, since age of AD symptom onset was not recorded nor adjusted for.
Here we are interested in obtaining improved estimates of the effect of occupation on survival from an autopsy-confirmed cohort of individuals with AD who have a known age of disease symptom onset. We use the highest occupational attainment for a given subject as a proxy for their CR. Primary occupation was classified and ranked based on the US census categories. In the following analyses, subjects who were classified as manager, business/government, and professional/technical workers were labeled as havinghigh occupational attainment in our study. Subjects classified as unskilled/semiskilled, skilled trade or craft, and clerical/office workers were classified as having low occupational attainment. This classification is consistent with previous studies [10, 14, 22]. Age at AD symptom onset was estimated based on a family report at first contact with the individual.
We first check the assumption of independence between the observed survival and truncation times using the conditional Kendall’s tau proposed by Martin and Betensky . The resulting p-value is 0.038, and therefore we reject this independence assumption. The corresponding Kendall’s tau statistic is , indicating positive dependence between the survival times and truncation times. The positive dependence between the left truncation times and survival times is clinically plausible because doctors often attribution the symptoms of early onset AD (onset of AD before 65 years of age) to other causes such as depression and stress, hence delaying the study entry time. Since younger age at onset is also associated with higher survival, this induces a positive dependence between the left truncation times and survival times.
Due to the dependence between the survival and truncation times, we apply the proposed method to estimate the effect of occupation on survival, adjusting for age at AD symptom onset and sex. Table 2 displays the results from the Cox regression model using the proposed EM estimators, weighted estimators, and the standard estimators. Using the proposed method, the estimated log hazard ratio for age at AD symptom onset is 0.029 (p-value = 0.016), indicating that AD individuals who have symptom onset one year later are roughly 3% more likely to die than subjects who have symptom onset a year earlier (). The estimated effect of female is -0.636 (p-value = 0.023), indicating that males are almost twice as likely to die than females (). These effects are nearly doubled using the weighted method which assumes independence, however the effects are not statistically significant (p-values = 0.117 and 0.088, respectively).
High occupational attainment is associated with increased survival in all models. Under the proposed method, the effect of high occupational attainment on survival is -0.673 (p-value = 0.009), indicating that those with a low occupational attainment are approximately twice as likely to die than those with a high occupational attainment (). This effect is attenuated under the weighted and standard methods, and neither method yielded statistically significant estimates (p-values are 0.186 and 0.158, respectively).
We proposed a novel method which relaxes the independence assumption between the observed survival and truncation times in the Cox model under left, right, or double truncation to an assumption of conditional independence between the observed survival and truncation times. We obtained consistent and asymptotically normal estimators of the regression coefficients and baseline hazard function by maximizing the conditional likelihood of the observed survival times using an EM algorithm. The simulation studies confirmed that the proposed estimators had little bias in small samples, while the naïve estimators from the Cox models which ignore truncation or assume independence were biased. The existing methods which adjust for truncation but assume independence resulted in heavily biased estimators of the regression coefficients for risk factors of survival that were also correlated with the truncation times. Furthermore, the proposed estimators were more efficient than the naïve estimators in most of the simulation settings.
We applied our proposed method to an autopsy-confirmed sample of individuals with Alzheimer’s disease (AD). AD is a major neurodegenerative disease which currently affects 5.3 million people in the United States according to the Alzheimer’s Association. In 2017 alone, AD and other dementias will have cost the nation an estimated $259 billion. Autopsy-confirmation is needed for a definitive diagnosis of AD, and a definitive diagnosis is necessary to accurately estimate the effect of potential risk factors associated with a given neurodegenerative disease. However, autopsy-confirmed samples of neurodegenerative diseases are subject to an inherent selection bias due to double truncation. Existing methods which adjust the Cox model in the presence of double truncation assume that the observed survival and truncation times are independent. This assumption may not be reasonable for studies of neurodegenerative diseases. In our data example, this independence assumption was rejected. Therefore, previous methods are not appropriate for our setting.
Given the severity of Alzheimer’s disease on patients, their caregivers, and society, accurate estimation of the effects of risk factors on survival is crucial. One such factor, cognitive reserve (CR), is hypothesized to lengthen survival during the course of the disease. Using occupation as a proxy for CR, we estimated the effect of CR on survival in an autopsy-confirmed AD sample. Using our proposed method to adjust for both left and right truncation and dependence between the survival and truncation times, we found that a low occupational attainment was associated with shortened survival. Compared to existing methods, the estimated hazard ratios for occupation on survival were larger under our proposed method. This is consistent with many studies concluding that an individual’s occupation may provide a protective effect and lengthen survival in AD. These findings suggest the importance of incorporating occupation in treatment trials and prognostic considerations in individuals with AD.
A limitation of our proposed method is that it in its current form, it cannot properly handle time-varying covariates measured after study entry, such as cognitive test scores. This is a consequence of the estimation procedure, which uses an expectation-maximization algorithm to estimate the latent survival times conditional on the observed truncation times and risk factors. This leads to predicting survival times based on risk factors measured after death for those missing subjects whose survival time is less than their left truncation time, which may yield biased regression coefficient estimators.
The proposed method has useful implications for observational studies. Double truncation has been shown to be present in a variety of studies, such as studies of clinically diagnosed Parkinson’s disease , childhood cancer , astronomy data , and studies based on registry data [3, 26]. In fact, any data pulled from a disease registry will be subject to inherent right truncation, since data is only recorded for subjects who have the disease and are entered in the registry by the time the data is extracted . In certain cases, the data will also be subject to left truncation [3, 26]. In a similar fashion, studies which only include data from individuals whose event times fall within the time course of the study are subject to double truncation . Therefore careful consideration of the study design must be taken into account when fitting the Cox proportional hazards model. Furthermore, the assumption of independence should always be tested, given the high sensitivity of existing methods to this assumption. For example, a quick application of a Kendall’s conditional Tau test  revealed this independence assumption is violated in the AIDS data used in . We therefore recommend using the proposed estimators in most practical settings, since they have little bias, and in most situations, have a lower mean-squared error compared to existing estimators under left, right, or double truncation, under a wide range of dependence structures.
We first define some notation. Let and , and set . The true parameter is denoted by .
A.1 Regularity Assumptions
The true hazard function is continuously differentiable, , and .
The true parameter vector lies in a compact set . The set contains all nondecreasing functions satisfying regularity assumption 1.
and are bounded, where .
The information matrix is positive definite. Here is used to emphasize the dependence on .
If for some constant , then .
Assumptions 1 and 2 are required for stochastic approximation. Assumptions 3 and 4 are needed to establish the asymptotic properties of the regression parameter estimates from the Cox model . Assumption 5 implies no covariate colinearity and thus ensures that the model is identifiable.
A.2 Consistency: Proof of Theorem 1
Since each function of in is concave or strictly concave, and the summation of concave functions is concave, the log-likelihood function is strictly concave in . Therefore we can find a unique maximizer of for each in a compact set . The existence of the NPMLE for follows by compactness of for the likelihood , which is continues in . Uniqueness is guaranteed by Assumption for in A.1 for large samples.
Here we show that if converges, it must converge to . As maximizes the log-likelihood given in (2), , the empirical Kullback-Leibler distance must be nonnegative. Suppose converges to some
. Then by the strong law of large numbers (SLLN),must converge to the negative Kullback-Leibler distance between and . Here is the probability measure under the parameter . Since the Kullback-Leibler distance and are nonnegative, the Kullback-Leibler distance between and must be zero. Therefore almost surely, and it then follows from model identifiability that . Therefore if converges, it must converge to .
The technical details to show that indeed converges are similar to those in . The idea is to find a further convergent subsequence for any subsequence of , and then apply Helly’s selection theorem. Here we provide only a sketch of the proof. The first step is to show that stays bounded. By regularity assumption 3, is in a compact set and is therefore bounded. To show is bounded, we make use of the fact that the empirical Kullback-Leibler distance is always non-negative for each in the parameter set. Using the approach of Murphy (1994), it can be shown that if does indeed diverge to , then it is possible to construct some sequence such that eventually becomes negative infinity, which contradicts the nonnegativity of the empirical Kullback-Leibler distance.
Since stays bounded, we can apply Helly’s selection principal to find a further convergent subsequence for any subsequence of indexed by . By the classical Kullback-Leibler information approach, and the SLLN, must converge to . It then follows from Helly’s selection theorem that the entire sequence must converge to for every , where is the maximum of the observed event times. Since is assumed to be monotone and continuous, the convergence of is uniformly in . Because the proof is carried out for a fixed in the underlying probability space , where the SLLN is applied countably many times, the convergence here is also almost surely a true convergence.
A.3 Asymptotic Normality: Proof of Theorem 2
Here we outline the proof for the weak convergence of , which follows the proof for weak convergence in . The proof consists of the application of empirical process theory and the Z-theorem for infinite dimensional estimating equations .
Denote the score equation for by . To obtain the score equation , we define the submodel , where is a bounded and integrable function. Setting , the score equation for is given by .
We denote the vector of the score functions by . The expectation under the true value is given by , where and .
By the definition of the MLE, . Since , we can show that . The estimating equation evaluated at , , is a sum of iid terms. We can therefore use empirical process theory to show that converges weakly to , where is a Gaussian random vector and is a tight Gaussian process. The covariance matrix for is given by , and the covariance between and is given .
Applying the Z-theorem for the infinite dimensional estimating equations (theorem 3.3.1 in ), we have that under the regularity conditions in A.1, converges weakly to a tight mean-zero Gaussian process .
Here is the Fréchet derivative of the map evaluated at . Using arguments similar to Appendix A.5 in , we can show is Fréchet differentiable and the Fréchet derivative, , is continuously invertible. By definition of the Fréchet derivative, we have that . This completes the proof.
A.4 Simulations under dependent left truncation
Under dependent left truncation only, we compared our proposed method to the weighted method and the standard method which accounts for dependent left truncation. To adjust the correlation between the left truncation times and survival times, we varied the parameter between and . In this setting, a value of , which yields a correlation of , indicates independence between the left truncation times and survival times. We denote the standard regression coefficient estimator which adjusts for dependent left truncation as