1 Introduction
Interval-censored data commonly arise in clinical trials and longitudinal studies, and are characterized by the fact that the event time of interest is not directly observable, but rather is known only relative to observation times. As a special case, current status data (or case 1 interval censoring) arise when there exists exactly one observation time per study unit; i.e., at the observation time one discovers whether or not the event of interest has occurred. Data of this structure often occur in resource-limited environments or as a result of destructive testing. Alternatively, general interval-censored data (or case 2 interval censoring) arise when multiple observation times are available for each study unit, and the event time can be ascertained relative to two observation times. It is well known that ignoring the structure of interval-censored data during an analysis can lead to biased estimation and inaccurate inference; see Odell et al (1992) and Dorey et al (1993). Further exacerbating this issue, some studies are subject to the occurrence of instantaneous failures; i.e., the event time of interest for a number of the study units occurs at time zero. This feature can occur as an artifact of the study design or may arise during an intent-to-treat analysis (Liu et al, 2016; Matsuzaki et al, 2005; Lamborn et al, 2008). For example, Chen et al (2015) describes a registry-based study of end-stage renal disease patients, with the time of enrollment corresponding to the time at which the patient first received dialysis. In this study, several of the patients expired during the first dialysis treatment, leading to the occurrence of instantaneous failures. Similarly, Liem et al (1997) describes an intent-to-treat clinical trial comparing conventional anterior surgery and laparoscopic surgery for repairing inguinal hernia. In this study, patients who did not receive the allocated intervention, due to issues such as consent withdrawal or procedural misfit, were not excluded from the analysis, and such cases naturally give rise to instantaneous failures. Survival data with instantaneous events are not uncommon in epidemiological and clinical studies, and for this reason, herein a general methodology under the proportional hazards (PH) model is developed for the analysis of interval-censored data subject to instantaneous failures.
Originally proposed by Cox (1972), the PH model has arguably become one of the most popular regression models for analyzing time-to-event data. For analyzing interval-censored data under the PH model, several notable contributions have been made in recent years; e.g., see Finkelstein (1986), Satten (1996), Goggins et al (1998), Groeneboom and Wellner (1992), Pan (1999), Pan (2000), Goetghebeur and Ryan (2000), Betensky et al (2002), Cai and Betensky (2003), Zhang et al (2010), Sun (2007), Zhang and Sun (2010), and Li and Ma (2013). More recently, Wang et al (2016) developed a methodology under the PH model which can be used to accurately and reliably analyze interval-censored data. In particular, this approach makes use of a monotone spline representation to approximate the cumulative baseline hazard function. In doing so, an expectation-maximization (EM) algorithm is developed through a data augmentation scheme involving latent Poisson random variables, which can be used to complete model fitting. It is worthwhile to note that none of the aforementioned techniques were designed to account for the effects associated with instantaneous failures.
The phenomenon of instantaneous (or early) failures occurs in many lifetime experiments, including, but not limited to, reliability studies and clinical trials. In reliability studies, instantaneous failures may be attributable to inferior quality or faulty manufacturing, whereas in clinical trials these events may manifest due to adverse reactions to treatments or clinical definitions of outcomes. When the failure times are exactly observed, as is often the case in reliability studies, it is common to incorporate instantaneous failures through a mixture of parametric models, with one being degenerate at time zero; e.g., see Muralidharan (1999), Kale and Muralidharan (2002), Murthy et al (2004), Muralidharan and Lathika (2006), Pham and Lai (2007), and Knopik (2011). In the case of interval-censored data, more common among epidemiological studies and clinical trials, accounting for instantaneous failures becomes a more tenuous task, with practically no guidance available in the existing literature. Arguably, in the context of interval-censored data, one could account for instantaneous failures by introducing an arbitrarily small constant as an observation time for each such failure, and subsequently treating the instantaneous failures as left-censored observations. In doing so, methods for interval-censored data, such as those discussed above, could be employed. While this approach may seem enticing, in the case of a relatively large number of instantaneous failures it has several pitfalls. In particular, through numerical studies (results not shown) it has been determined that this approach, when used in conjunction with modeling techniques such as those proposed in Pan (1999) and Wang et al (2016), may lead to inaccurate estimation of the survival curves and/or the covariate effects. Further, after an extensive literature review, it does not appear that any methodology has previously been developed to specifically address data of this structure. For these reasons, a new mixture model is proposed for the analysis of interval-censored data subject to instantaneous failures, which generalizes the semiparametric PH model studied in Wang et al (2016). The proposed model is developed under the standard PH assumption; i.e., the covariates provide for a multiplicative effect on the baseline risk of both experiencing a failure at time zero and thereafter. Two separate techniques are developed for the purposes of estimating the cumulative baseline hazard function.
The first allows a practitioner to specify a parametric form (up to a collection of unknown coefficients) for the unknown function, while the second provides more modeling flexibility through the use of the monotone splines of Ramsay (1988). Under either formulation, a two-stage data augmentation scheme involving latent Poisson random variables is used to develop an efficient EM algorithm which can be used to estimate all of the unknown parameters. Through extensive simulation studies, the proposed methodology is shown to provide reliable estimation and inference with respect to the covariate effects, the cumulative baseline hazard function, and the baseline probability of experiencing an instantaneous failure.
The remainder of this article is organized as follows. Section 2 presents the development of the proposed model, derives the EM algorithm, and outlines uncertainty quantification. The finite sample performance of the proposed approach is evaluated through extensive numerical studies, the features and results of which are provided in Section 3. Section 4 concludes with a summary discussion. Further, code implementing the proposed methodology has been added to the existing R software package ICsurv and is freely available from CRAN (i.e., http://cran.us.r-project.org/).
2 Model and Methodology
Let $T$ denote the failure time of interest. Under the PH model, the survival function can be generally written as

$$S(t \mid x) = S_0(t)^{\exp(x'\beta)}, \quad (1)$$

where $x$ is a $q$-dimensional vector of covariates, $\beta$ is the corresponding vector of regression coefficients, and $S_0(\cdot)$ is the baseline survival function. Under the phenomenon of interest, there is a baseline risk (probability) of experiencing an instantaneous failure; i.e., $p = P(T = 0 \mid x = 0_q)$, where $p$ is the baseline risk and $0_q$ is a $q$-dimensional vector of zeros. Thus, under the PH assumptions, the probability of experiencing an instantaneous failure, given the covariate information contained in $x$, can be ascertained from (1) as

$$P(T = 0 \mid x) = 1 - (1 - p)^{\exp(x'\beta)}.$$

Similarly, given that an instantaneous failure does not occur, it is assumed that the failure time conditionally follows the standard PH model; i.e.,

$$P(T \le t \mid T > 0, x) = 1 - \exp\{-\Lambda_0(t)\exp(x'\beta)\},$$

where $\Lambda_0(\cdot)$ is the usual cumulative baseline hazard function. Note, in order for this conditional distribution to be a proper cumulative distribution function, $\Lambda_0(\cdot)$ should be a monotone increasing function with $\Lambda_0(0) = 0$. Thus, through an application of the Law of Total Probability, one has that

$$F(t \mid x) = P(T = 0 \mid x) + P(T > 0 \mid x)\, P(T \le t \mid T > 0, x),$$

for $t \ge 0$. Based on these assumptions, the cumulative distribution function of $T$ can be expressed as the following mixture model,

$$F(t \mid x) = 1 - \exp[-\{\alpha + \Lambda_0(t)\}\exp(x'\beta)],$$

where, for reasons that will shortly become apparent, $p$ is reparametrized as $p = 1 - \exp(-\alpha)$, for $\alpha > 0$.
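As a sketch, the mixture distribution function above can be coded directly. The logarithmic baseline and all parameter values below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def mixture_ph_cdf(t, x, beta, alpha, Lambda0):
    """F(t | x) = 1 - exp{-(alpha + Lambda0(t)) exp(x'beta)}, t >= 0.

    alpha > 0 encodes the baseline instantaneous-failure probability
    p = 1 - exp(-alpha); Lambda0 is the cumulative baseline hazard,
    monotone increasing with Lambda0(0) = 0."""
    eta = np.exp(np.dot(x, beta))
    return 1.0 - np.exp(-(alpha + Lambda0(np.asarray(t))) * eta)

# Illustrative choices (not from the paper): logarithmic baseline, p = 0.1.
Lambda0 = lambda t: np.log(1.0 + t)
alpha = -np.log(0.9)
x = np.array([0.5, -0.5]); beta = np.array([0.5, -0.5])

F0 = mixture_ph_cdf(0.0, np.zeros(2), beta, alpha, Lambda0)  # baseline P(T = 0)
```

Evaluating the function at $t = 0$ with $x = 0_q$ recovers the baseline instantaneous failure probability $p$, and the curve is monotone increasing in $t$, as required of a distribution function.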
2.1 Observed data likelihood
In scenarios where interval-censored data arise, the failure time ($T$) is not directly observed, but rather is known only relative to two observation times, say $L \le R$. In general, the four different outcomes considered here can be represented through the values of $T$ relative to $L$ and $R$; i.e., an instantaneous failure ($T = 0$, in which case $L = R = 0$), $T$ is left-censored ($0 < T \le L$), $T$ is interval-censored ($L < T \le R$), and $T$ is right-censored ($T > R$). For notational convenience, let $\Delta = 1(T > 0)$ be an indicator denoting the event that $T$ is not an instantaneous failure, and let $\delta_1$, $\delta_2$, and $\delta_3$ be censoring indicators denoting left-, interval-, and right-censoring, respectively; i.e., $\delta_1 = \Delta\, 1(T \le L)$, $\delta_2 = 1(L < T \le R)$, $\delta_3 = 1(T > R)$, and $\Delta = \delta_1 + \delta_2 + \delta_3$.
In order to derive the observed data likelihood, it is assumed throughout that the individuals are independent, and that, conditional on the covariates, the failure time for an individual is independent of the observation process. This assumption is common in the survival literature; see, e.g., Zhang and Sun (2010), Liu and Shen (2009), and the references therein. The observed data collected on the $n$ individuals is given by $O = \{(L_i, R_i, \Delta_i, \delta_{1i}, \delta_{2i}, \delta_{3i}, x_i),\ i = 1, \dots, n\}$, which constitutes $n$ independent realizations of $(L, R, \Delta, \delta_1, \delta_2, \delta_3, x)$. Thus, under the aforementioned assumptions, the observed data likelihood is given by

$$L_{obs}(\theta) = \prod_{i=1}^{n} F(0 \mid x_i)^{1 - \Delta_i} \{F(L_i \mid x_i) - F(0 \mid x_i)\}^{\delta_{1i}} \{F(R_i \mid x_i) - F(L_i \mid x_i)\}^{\delta_{2i}} \{1 - F(R_i \mid x_i)\}^{\delta_{3i}}, \quad (2)$$

where $\theta$ represents the set of unknown parameters which are to be estimated.
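A naive evaluation of an observed-data log-likelihood of this form can be sketched as follows, looping over the four contribution types. The tuple encoding of the data and the string labels are illustrative choices, not the paper's notation.

```python
import numpy as np

def loglik_obs(theta, data, Lambda0):
    """Observed-data log-likelihood; theta = (alpha, beta); each data
    row is (L, R, kind, x) with kind in {'inst','left','interval','right'}."""
    alpha, beta = theta
    ll = 0.0
    for L, R, kind, x in data:
        eta = np.exp(np.dot(x, beta))
        F = lambda t: 1.0 - np.exp(-(alpha + Lambda0(t)) * eta)
        if kind == 'inst':            # T = 0, an instantaneous failure
            ll += np.log(F(0.0))
        elif kind == 'left':          # 0 < T <= L
            ll += np.log(F(L) - F(0.0))
        elif kind == 'interval':      # L < T <= R
            ll += np.log(F(R) - F(L))
        else:                         # 'right': T > R
            ll += np.log(1.0 - F(R))
    return ll
```

For example, with a linear baseline $\Lambda_0(t) = t$, a single right-censored observation at $R = 2$ with a null covariate effect contributes $-(\alpha + 2)$ to the log-likelihood.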
2.2 Representations of $\Lambda_0(\cdot)$
The unknown parameters in the observed likelihood involve the regression parameters $\beta$, the reparametrized baseline instantaneous failure probability $\alpha$, and the cumulative baseline hazard function $\Lambda_0(\cdot)$. Herein, two techniques for modeling the cumulative baseline hazard function are discussed. The first approach considers the use of a fully parametric model, which is known up to a set of unknown coefficients. For example, a linear, quadratic, or logarithmic parametric model can be specified by setting $\Lambda_0(t) = \gamma_1 t$, $\Lambda_0(t) = \gamma_1 t + \gamma_2 t^2$, or $\Lambda_0(t) = \gamma_1 \log(1 + t)$, respectively. Note, all of these models obey the constraints placed on $\Lambda_0(\cdot)$, as long as $\gamma_k \ge 0$, for all $k$. In general, a parametric form for the cumulative baseline hazard function can be specified as

$$\Lambda_0(t) = \sum_{k=1}^{K} \gamma_k b_k(t), \quad (3)$$

where each $b_k(\cdot)$ is a monotone increasing function with $b_k(0) = 0$, and $\gamma_k \ge 0$, for $k = 1, \dots, K$. Under these mild conditions, it is easily verified that $\Lambda_0(\cdot)$ inherits the same properties, and therefore adheres to the aforementioned constraints.
The second approach, which is inspired by the works of Wang et al (2016), Lin and Wang (2010), Wang and Dunson (2011), Cai et al (2011), McMahan et al (2013), and Wang et al (2015), views $\Lambda_0(\cdot)$ as an unknown function, and hence an infinite dimensional parameter. To reduce the dimensionality of the problem, the monotone splines of Ramsay (1988) are used to approximate $\Lambda_0(\cdot)$. Structurally, this representation is identical to that of (3), with the exception that $b_k(\cdot)$ is a spline basis function and $\gamma_k$ is an unknown spline coefficient, for $k = 1, \dots, K$. Again, it is required that $\gamma_k \ge 0$, for all $k$, to ensure that $\Lambda_0(\cdot)$ is a monotone increasing function. Briefly, the spline basis functions are piecewise polynomial functions, and are fully determined once a knot sequence and a degree are specified. The shape of the basis splines is predominantly determined by the placement of the knots, while the degree controls the smoothness (Cai et al, 2011). For instance, specifying the degree to be 1, 2, or 3 corresponds to the use of linear, quadratic, or cubic polynomials, respectively. Given the knots and the degree, the $K$ basis functions are fully determined, with $K$ being equal to the number of interior knots plus the degree. For further discussion on specifying the knots, as well as their placement, see Wang et al (2016), Ramsay (1988), and McMahan et al (2013).
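To illustrate the representation $\Lambda_0(t) = \sum_k \gamma_k b_k(t)$ concretely, note that degree-1 monotone splines reduce to simple normalized ramp functions. The sketch below uses this degree-1 special case as a stand-in for the general Ramsay (1988) basis; the knot placement and coefficients are arbitrary illustrative choices.

```python
import numpy as np

def ispline1_basis(t, knots):
    """Degree-1 monotone (I-spline) basis: the k-th function rises
    linearly from 0 to 1 over [knots[k], knots[k+1]] and is flat
    elsewhere, so any nonnegative combination of the basis is monotone
    increasing and, with knots[0] = 0, vanishes at the origin."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    B = np.empty((t.size, len(knots) - 1))
    for k in range(len(knots) - 1):
        lo, hi = knots[k], knots[k + 1]
        B[:, k] = np.clip((t - lo) / (hi - lo), 0.0, 1.0)
    return B

knots = [0.0, 1.0, 2.5, 5.0]        # boundary and interior knots (illustrative)
gamma = np.array([0.4, 0.3, 0.2])   # nonnegative spline coefficients
Lambda0 = lambda t: ispline1_basis(t, knots) @ gamma
```

Under this construction $\Lambda_0(0) = 0$, $\Lambda_0$ is monotone increasing, and it plateaus at $\sum_k \gamma_k$ beyond the last knot; higher-degree bases behave analogously but with smoother transitions.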
2.3 Data Augmentation
Under either of the representations of $\Lambda_0(\cdot)$ proposed in Section 2.2, the unknown parameters in the observed data likelihood consist of $\theta = (\alpha, \beta', \gamma')'$, where $\gamma = (\gamma_1, \dots, \gamma_K)'$. Since the observed data likelihood exists in closed form, the maximum likelihood estimator (MLE) of $\theta$ could be obtained by directly maximizing (2) with respect to $\theta$; i.e., one could obtain $\hat\theta$, the MLE of $\theta$, as $\hat\theta = \arg\max_\theta L_{obs}(\theta)$. It is worthwhile to point out, however, that this direct numerical maximization of (2) is often unstable and rarely performs well (Wang et al, 2015).
To circumvent these numerical instabilities, an EM algorithm was derived for the purposes of identifying the MLE. This algorithm was developed based on a two-stage data augmentation process, in which carefully structured latent Poisson random variables are introduced as missing data. The first stage relates both the instantaneous failure indicator and the censoring indicators to latent Poisson random variables; i.e., the $Y_i$, $Z_i$, and $W_i$ are introduced such that

$$Y_i \sim \text{Poisson}\{\alpha\exp(x_i'\beta)\},\quad Z_i \sim \text{Poisson}\{\Lambda_0(L_i)\exp(x_i'\beta)\},\quad W_i \sim \text{Poisson}[\{\Lambda_0(R_i) - \Lambda_0(L_i)\}\exp(x_i'\beta)],$$

subject to the following constraints: $\Delta_i = 1(Y_i = 0)$, $\delta_{1i} = \Delta_i 1(Z_i > 0)$, $\delta_{2i} = \Delta_i 1(Z_i = 0, W_i > 0)$, and $\delta_{3i} = \Delta_i 1(Z_i = 0, W_i = 0)$, where for observations that are not interval-censored one may take $R_i = L_i$, so that $W_i$ is degenerate at 0. At this stage of the data augmentation, the conditional likelihood is

$$L_C(\theta) = \prod_{i=1}^{n} f_P\{Y_i; \alpha\exp(x_i'\beta)\}\, f_P\{Z_i; \Lambda_0(L_i)\exp(x_i'\beta)\}\, f_P[W_i; \{\Lambda_0(R_i) - \Lambda_0(L_i)\}\exp(x_i'\beta)], \quad (4)$$

where $f_P(\cdot\,; \mu)$ is the probability mass function of a Poisson random variable with mean $\mu$, and the stated constraints are understood to hold. In the second and final stage, the $Z_i$ and $W_i$ are separately decomposed into sums of independent latent Poisson random variables; i.e., $Z_i = \sum_{k=1}^{K} Z_{ik}$ and $W_i = \sum_{k=1}^{K} W_{ik}$, where

$$Z_{ik} \sim \text{Poisson}\{\gamma_k b_k(L_i)\exp(x_i'\beta)\} \quad \text{and} \quad W_{ik} \sim \text{Poisson}[\gamma_k\{b_k(R_i) - b_k(L_i)\}\exp(x_i'\beta)].$$

At this stage, the augmented data likelihood is

$$L_A(\theta) = \prod_{i=1}^{n} f_P\{Y_i; \alpha\exp(x_i'\beta)\} \prod_{k=1}^{K} f_P\{Z_{ik}; \gamma_k b_k(L_i)\exp(x_i'\beta)\}\, f_P[W_{ik}; \gamma_k\{b_k(R_i) - b_k(L_i)\}\exp(x_i'\beta)]. \quad (5)$$

It is relatively easy to show that by integrating (5) over the latent random variables, subject to the stated constraints, one will obtain the observed data likelihood depicted in (2).
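The claim that marginalizing the constrained latent Poisson variables recovers the observed-data probabilities can be checked by simulation. The parameter values below are arbitrary illustrative choices; eta stands in for the covariate effect exp(x'beta).

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, eta = 0.2, 1.3            # eta stands in for exp(x'beta); illustrative
Lam_L, Lam_R = 0.5, 1.2          # Lambda0(L) and Lambda0(R), with L < R
n = 200_000

# First-stage latent Poisson variables of the augmentation
Y = rng.poisson(alpha * eta, n)
Z = rng.poisson(Lam_L * eta, n)
W = rng.poisson((Lam_R - Lam_L) * eta, n)

# Marginalizing the constraints recovers the observed-data probabilities
p_inst  = np.mean(Y >= 1)                          # should match F(0 | x)
p_right = np.mean((Y == 0) & (Z == 0) & (W == 0))  # should match 1 - F(R | x)

F0 = 1.0 - np.exp(-alpha * eta)        # mixture-model P(T = 0 | x)
SR = np.exp(-(alpha + Lam_R) * eta)    # mixture-model P(T > R | x)
```

With 200,000 draws the Monte Carlo frequencies agree with the mixture-model probabilities to well within sampling error, which is the content of the marginalization claim.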
2.4 EM algorithm
In general, the EM algorithm consists of two steps: the expectation step (E-step) and the maximization step (M-step). The E-step in this algorithm involves taking the expectation of $\log L_A(\theta)$ with respect to all latent variables, conditional on the current parameter value $\theta^{(d)}$ and the observed data $O$. This results in the function $Q(\theta, \theta^{(d)})$. The M-step then finds $\theta^{(d+1)} = \arg\max_\theta Q(\theta, \theta^{(d)})$. This process is repeated in turn until convergence of the algorithm is attained. In this particular setting, the E-step yields $Q(\theta, \theta^{(d)})$ as

$$Q(\theta, \theta^{(d)}) = \sum_{i=1}^{n}\Big[ E(Y_i)\{\log\alpha + x_i'\beta\} - \alpha\exp(x_i'\beta) + \sum_{k=1}^{K} E(Z_{ik} + W_{ik})\{\log\gamma_k + x_i'\beta\} - \gamma_k b_k(R_i)\exp(x_i'\beta) \Big] + c(\theta^{(d)}),$$

where $c(\theta^{(d)})$ is a function of $\theta^{(d)}$ but is free of $\theta$. Notice that in $Q(\theta, \theta^{(d)})$ we suppress, for notational convenience, the dependence of the expectations on the observed data and $\theta^{(d)}$; i.e., henceforth it should be understood that $E(\cdot) \equiv E(\cdot \mid O, \theta^{(d)})$.
An enticing feature, which makes the proposed approach computationally efficient, is that all of the expectations in $Q(\theta, \theta^{(d)})$ can be expressed in closed form, and moreover can be computed via simple matrix and vector operations. In particular, from (4) it can be ascertained that if $\Delta_i = 0$ then conditionally, given $O$ and $\theta^{(d)}$, $Y_i$ follows a zero-truncated Poisson distribution, and it follows a degenerate distribution at 0 for any other configuration of the observed data. Thus, the conditional expectation of $Y_i$, given $O$ and $\theta^{(d)}$, can be expressed as

$$E(Y_i) = (1 - \Delta_i)\, \frac{\mu_{Y i}}{1 - \exp(-\mu_{Y i})},$$

where $\mu_{Y i} = \alpha^{(d)}\exp(x_i'\beta^{(d)})$. Through a similar set of arguments one can obtain the necessary conditional expectations of $Z_i$ and $W_i$ as

$$E(Z_i) = \delta_{1i}\, \frac{\mu_{Z i}}{1 - \exp(-\mu_{Z i})} \quad \text{and} \quad E(W_i) = \delta_{2i}\, \frac{\mu_{W i}}{1 - \exp(-\mu_{W i})},$$

respectively, where $\mu_{Z i} = \Lambda_0^{(d)}(L_i)\exp(x_i'\beta^{(d)})$ and $\mu_{W i} = \{\Lambda_0^{(d)}(R_i) - \Lambda_0^{(d)}(L_i)\}\exp(x_i'\beta^{(d)})$. Further, from (5) it can be ascertained that conditionally, given $Z_i$ and $\theta^{(d)}$, $Z_{ik}$ follows a binomial distribution with $Z_i$ being the number of trials and $\gamma_k^{(d)} b_k(L_i)/\Lambda_0^{(d)}(L_i)$ being the success probability. Thus, through an application of the Law of Iterated Expectations, the conditional expectation of $Z_{ik}$, given $O$ and $\theta^{(d)}$, can be expressed as

$$E(Z_{ik}) = E(Z_i)\, \frac{\gamma_k^{(d)} b_k(L_i)}{\Lambda_0^{(d)}(L_i)}.$$

Through a similar set of arguments one can obtain the necessary conditional expectation of $W_{ik}$ as

$$E(W_{ik}) = E(W_i)\, \frac{\gamma_k^{(d)}\{b_k(R_i) - b_k(L_i)\}}{\Lambda_0^{(d)}(R_i) - \Lambda_0^{(d)}(L_i)}.$$
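The two closed forms used repeatedly in the E-step, the zero-truncated Poisson mean and the binomial thinning of a Poisson total, can be sketched as follows; all numeric values are illustrative.

```python
import numpy as np

def ztpois_mean(mu):
    """Mean of a zero-truncated Poisson(mu): mu / (1 - exp(-mu))."""
    return mu / (1.0 - np.exp(-mu))

# If Delta_i = 0, E(Y_i) is the zero-truncated Poisson mean at
# mu_i = alpha * exp(x_i'beta) (0.26 here is an arbitrary value):
EY = ztpois_mean(0.26)

# Second stage: given the total Z_i, each Z_ik | Z_i is binomial with
# success probability gamma_k * b_k(L_i) / Lambda0(L_i), so by iterated
# expectations E(Z_ik) = E(Z_i) * gamma_k * b_k(L_i) / Lambda0(L_i).
def e_zik(EZ_total, gamma_k, b_k_L, Lambda0_L):
    return EZ_total * gamma_k * b_k_L / Lambda0_L
```

The zero-truncated mean can be validated against a Monte Carlo average of Poisson draws conditioned on being positive, and the thinning expectation is a one-line consequence of the Poisson decomposition.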
The M-step of the algorithm then finds $\theta^{(d+1)} = \arg\max_\theta Q(\theta, \theta^{(d)})$. To this end, consider the partial derivatives of $Q(\theta, \theta^{(d)})$ with respect to $\gamma_k$, $\alpha$, and $\beta$, which are given by

$$\frac{\partial Q(\theta, \theta^{(d)})}{\partial \gamma_k} = \sum_{i=1}^{n} \left\{ \frac{E(Z_{ik} + W_{ik})}{\gamma_k} - b_k(R_i)\exp(x_i'\beta) \right\}, \quad (6)$$

$$\frac{\partial Q(\theta, \theta^{(d)})}{\partial \alpha} = \sum_{i=1}^{n} \left\{ \frac{E(Y_i)}{\alpha} - \exp(x_i'\beta) \right\}, \quad (7)$$

$$\frac{\partial Q(\theta, \theta^{(d)})}{\partial \beta} = \sum_{i=1}^{n} \left[ E(Y_i) + \sum_{k=1}^{K} E(Z_{ik} + W_{ik}) - \left\{ \alpha + \sum_{k=1}^{K} \gamma_k b_k(R_i) \right\} \exp(x_i'\beta) \right] x_i. \quad (8)$$

By setting (6) equal to zero and solving for $\gamma_k$, one can obtain

$$\gamma_k(\beta) = \frac{\sum_{i=1}^{n} E(Z_{ik} + W_{ik})}{\sum_{i=1}^{n} b_k(R_i)\exp(x_i'\beta)}, \quad (9)$$

for $k = 1, \dots, K$. Similarly, by setting (7) equal to zero and solving for $\alpha$, one can obtain

$$\alpha(\beta) = \frac{\sum_{i=1}^{n} E(Y_i)}{\sum_{i=1}^{n} \exp(x_i'\beta)}. \quad (10)$$

Notice that $\gamma_k(\beta)$ and $\alpha(\beta)$ depend on $\beta$. Thus, one can obtain $\beta^{(d+1)}$ by setting (8) equal to zero and solving the resulting system of equations for $\beta$, after replacing $\gamma_k$ and $\alpha$ by $\gamma_k(\beta)$ and $\alpha(\beta)$, respectively. Note, the aforementioned system of equations can easily be solved using a standard Newton-Raphson approach. Finally, one obtains $\gamma_k^{(d+1)}$ and $\alpha^{(d+1)}$ as $\gamma_k(\beta^{(d+1)})$ and $\alpha(\beta^{(d+1)})$, respectively.
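The Newton-Raphson solve referred to above can be sketched generically. The score function used below is a toy system with a known root, standing in for the profile score in the regression coefficients; it is not the paper's actual score equations.

```python
import numpy as np

def newton_solve(score, theta0, tol=1e-10, max_iter=50, h=1e-6):
    """Generic Newton-Raphson for a score system score(theta) = 0, using
    a forward-difference Jacobian; a stand-in for the M-step solve of
    the regression coefficients after profiling out the other parameters."""
    theta = np.array(theta0, dtype=float)
    for _ in range(max_iter):
        g = np.asarray(score(theta), dtype=float)
        J = np.empty((g.size, theta.size))
        for j in range(theta.size):
            step = np.zeros_like(theta); step[j] = h
            J[:, j] = (np.asarray(score(theta + step)) - g) / h
        delta = np.linalg.solve(J, g)
        theta -= delta
        if np.max(np.abs(delta)) < tol:
            break
    return theta

# Toy system with known root (1, 2); not the paper's score equations.
root = newton_solve(lambda b: np.array([b[0]**2 - 1.0, b[1] - 2.0]), [2.0, 0.0])
```

In practice an analytic Jacobian of the score would typically be preferred for speed and stability; the finite-difference version above keeps the sketch short.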
The proposed EM algorithm can now be succinctly stated. First, initialize $\theta^{(0)}$ and set $d = 0$; then repeat the following steps until the change in $\theta^{(d)}$ falls below a convergence tolerance.

1. Calculate $\beta^{(d+1)}$ by solving the system of equations obtained by setting (8) equal to zero, after replacing $\gamma_k$ and $\alpha$ by $\gamma_k(\beta)$ and $\alpha(\beta)$ given in (9) and (10), respectively.

2. Calculate $\gamma_k^{(d+1)} = \gamma_k(\beta^{(d+1)})$, for $k = 1, \dots, K$, and $\alpha^{(d+1)} = \alpha(\beta^{(d+1)})$.

3. Update $d$ to $d + 1$.

At the point of convergence, define $\hat\theta$ to be $\theta^{(d)}$, the proposed estimator, which is the MLE of $\theta$.
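To make the algorithm concrete, here is a minimal EM sketch for a simplified special case: current status data, no covariates, and a linear baseline $\Lambda_0(t) = \gamma t$, so that the E-step uses zero-truncated Poisson means and the M-step updates are closed-form averages. All constants are illustrative, and the simplifications mean this is an analogue of, not a substitute for, the full algorithm.

```python
import numpy as np

rng = np.random.default_rng(42)
alpha_true, gamma_true, n = 0.11, 0.5, 20_000

# Simulated current status data with instantaneous failures:
# F(t) = 1 - exp{-(alpha + gamma * t)}, no covariates.
inst = rng.random(n) < 1.0 - np.exp(-alpha_true)   # T = 0
T = rng.exponential(1.0 / gamma_true, n)           # T | T > 0 ~ Exp(gamma)
C = rng.exponential(2.0, n)                        # single observation time
left = (~inst) & (T <= C)                          # failed by C: 0 < T <= C
right = (~inst) & (T > C)                          # survived past C: T > C

def loglik(a, g):
    """Observed-data log-likelihood of (alpha, gamma)."""
    return (inst.sum() * np.log(1.0 - np.exp(-a)) - a * (~inst).sum()
            + np.log(1.0 - np.exp(-g * C[left])).sum() - g * C[right].sum())

a, g, lls = 0.5, 1.0, []
for _ in range(500):
    # E-step: zero-truncated Poisson means under the current parameters
    EY = np.where(inst, a / (1.0 - np.exp(-a)), 0.0)
    muZ = g * C
    EZ = np.where(left, muZ / (1.0 - np.exp(-muZ)), 0.0)
    # M-step: closed-form updates
    a, g = EY.mean(), EZ.sum() / C[~inst].sum()
    lls.append(loglik(a, g))
```

Two properties are worth checking: the observed-data log-likelihood is nondecreasing across iterations (a defining feature of EM), and in this no-covariate case the converged $\hat\alpha$ coincides with the closed-form MLE $-\log(1 - \hat p)$, where $\hat p$ is the observed fraction of instantaneous failures.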
2.5 Variance estimation
For the purposes of uncertainty quantification, several variance estimators were considered and evaluated; e.g., the inverse of the observed Fisher information matrix, the Huber sandwich estimator, and the outer product of gradients (OPG) estimator. After extensive numerical studies (results not shown), it was found that the most reliable variance estimator, among those considered, was that of the OPG estimator. In general, the OPG estimator is obtained as
$$\widehat{\text{Var}}(\hat\theta) = \left\{ \sum_{i=1}^{n} g_i(\hat\theta)\, g_i(\hat\theta)' \right\}^{-1},$$

where $g_i(\theta) = \partial \ell_i(\theta)/\partial\theta$ and $\ell_i(\theta)$ is the log-likelihood contribution of the $i$th individual. Using this estimator, one can conduct standard Wald-type inference.
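A sketch of the OPG computation follows, illustrated on a toy model (a normal mean with unit variance) where the answer is known; the per-observation score there is simply $g_i = x_i - \mu$, and the variance of the MLE (the sample mean) is $1/n$.

```python
import numpy as np

def opg_variance(scores):
    """OPG variance estimator: inverse of sum_i g_i g_i', where row i of
    `scores` is the score contribution of subject i evaluated at the MLE."""
    G = np.asarray(scores)
    return np.linalg.inv(G.T @ G)

# Toy check on N(mu, 1): score g_i = x_i - mu; Var(mu-hat) should be ~ 1/n.
rng = np.random.default_rng(3)
x = rng.normal(5.0, 1.0, 10_000)
mu_hat = x.mean()
V = opg_variance((x - mu_hat)[:, None])   # 1 x 1 variance matrix
```

For the proposed model, the rows of the score matrix would instead be the numerically or analytically computed gradients of the individual log-likelihood contributions at $\hat\theta$.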
3 Simulation Study
In order to investigate the finite sample performance of the proposed methodology, the following simulation study was conducted. The true distribution of the failure times was specified to be

$$F(t \mid x) = 1 - \exp[-\{\alpha + \Lambda_0(t)\}\exp(x_1\beta_1 + x_2\beta_2)],$$

where $\beta_1$ and $\beta_2$ each take on the values 0.5 and $-0.5$, resulting in 4 regression parameter configurations. Additionally, these studies consider two cumulative baseline hazard functions: a logarithmic and a linear specification. These choices were made so that the hazard functions behave similarly but have different shapes. In total, these specifications lead to eight separate data generating models for the failure times. Two generating processes were considered for the observation times: an exponential distribution with a mean of 10 and a discrete uniform distribution. In both cases, a single observation time, $C$, was generated for each failure time that was not instantaneous (i.e., $T > 0$), with the failure time being treated as left-censored (right-censored) if $T$ was smaller (greater) than $C$. A few comments are warranted on the selection of the observation processes. First, the former process attempts to match the baseline characteristics of the failure time distribution. Second, note that both specifications of the observation process result in case 1 interval-censored (i.e., current status) data. This was done primarily because data of this nature possess less information than general interval-censored data, and thus, if the proposed approach works well in this setting, it can be expected to perform at least as well in the case of general interval-censored data. In total, these data generating steps lead to sixteen generating mechanisms, and each was used to create 500 independent data sets.

In order to examine the performance of the proposed approach across a broad spectrum of characteristics, several different models were used to analyze each data set. First, following the development presented in Section 2.2, three different parametric forms were considered for the cumulative baseline hazard function: a logarithmic form, a linear form, and a quadratic form, which are henceforth referred to as models M1, M2, and M3, respectively. Note, these specifications allow one to examine the performance of the proposed approach when the cumulative baseline hazard function is correctly specified (e.g., M2 when the true baseline is linear), over specified (e.g., M3 when the true baseline is linear), and misspecified (e.g., M1 when the true baseline is linear). Further, for each data set a model (M4) was fit using the monotone spline representation of the cumulative baseline hazard function developed in Section 2.2. In order to specify the basis functions, the degree was set to 2, to provide adequate smoothness, and one interior knot was placed at the median of the observed finite nonzero interval end points, with boundary knots placed at the minimum and maximum of the same. The EM algorithm derived in Section 2.4 was used to complete model fitting for M1-M4. The starting value for all implementations was chosen such that $\beta^{(0)}$ is a vector of zeros and the remaining parameters are initialized at ones. Convergence was declared when the maximum absolute difference between successive parameter updates was less than a specified tolerance.
Table 1 summarizes the estimates of the regression coefficients and the baseline instantaneous failure probability for all considered simulation configurations and models, when the observation times were drawn from an exponential distribution. This summary includes the empirical bias, the sample standard deviation of the 500 point estimates, the average of the 500 standard error estimates, and the empirical coverage probabilities associated with 95% Wald confidence intervals. Table 2 provides the analogous results for the case in which the observation times are sampled from a discrete uniform distribution. From these results, one will first note that across all considered simulation settings the proposed approach performs very well for M4 and for the correctly specified parametric model (i.e., M1 when the true baseline is logarithmic and M2 when it is linear); i.e., the parameter estimates exhibit very little bias, the sample standard deviations of the 500 point estimates are in agreement with the averages of the standard error estimates, and the empirical coverage probabilities attain their nominal level. In summary, these findings tend to suggest that the proposed methodology can be used to reliably estimate the covariate effects and the instantaneous failure probability, and to quantify the uncertainty in the same. Additionally, these findings generally persist for the case in which the parametric model is over specified (e.g., M3 when the true baseline is linear), with the resulting estimates in some cases exhibiting a slightly larger bias and a bit more variability, as one would expect. Further, from these results one will also note that when the parametric model is misspecified (e.g., M2 and M3 when the true baseline is logarithmic) the estimates tend to exhibit more bias and less reliable inference, which is expected under misspecification of the cumulative baseline hazard function. Finally, the estimates obtained under M4 (i.e., the model which makes use of the monotone splines) exhibit little if any difference when compared to the estimates resulting from the correct parametric model. In totality, it is conjectured from these findings that the approach which makes use of the spline representation to approximate the unknown cumulative baseline hazard function (i.e., M4) would likely be preferable, since it avoids the potential for model misspecification while obtaining estimates of the unknown parameters, as well as their standard errors, that are equivalent to those obtained under the true parametric model, the form of which is generally not known.

True
M1(True)  M2(Misspecified)  M3(Misspecified)  M4(Spline)  
Parameter  Bias  SD  ESE  CP95  Bias  SD  ESE  CP95  Bias  SD  ESE  CP95  Bias  SD  ESE  CP95 
0.03  0.15  0.15  0.94  0.06  0.17  0.15  0.89  0.06  0.17  0.15  0.90  0.04  0.16  0.16  0.93  
0.01  0.29  0.28  0.94  0.05  0.33  0.27  0.89  0.05  0.33  0.28  0.90  0.02  0.30  0.29  0.93  
0.00  0.06  0.06  0.95  0.00  0.06  0.06  0.94  0.00  0.06  0.06  0.94  0.00  0.06  0.06  0.95  
0.03  0.15  0.14  0.94  0.07  0.17  0.14  0.87  0.07  0.17  0.14  0.88  0.04  0.15  0.15  0.94  
0.01  0.26  0.26  0.94  0.06  0.30  0.25  0.89  0.06  0.30  0.26  0.89  0.03  0.27  0.27  0.94  
0.00  0.06  0.06  0.94  0.01  0.06  0.05  0.92  0.01  0.06  0.06  0.93  0.01  0.06  0.06  0.94  
0.01  0.15  0.15  0.95  0.05  0.16  0.15  0.91  0.05  0.16  0.15  0.92  0.02  0.15  0.15  0.95  
0.01  0.28  0.28  0.94  0.03  0.32  0.27  0.90  0.03  0.32  0.27  0.91  0.01  0.29  0.29  0.94  
0.00  0.05  0.06  0.97  0.00  0.06  0.06  0.96  0.00  0.06  0.06  0.96  0.00  0.05  0.06  0.96  
0.02  0.14  0.14  0.95  0.07  0.15  0.14  0.89  0.07  0.15  0.14  0.90  0.03  0.14  0.15  0.95  
0.02  0.28  0.26  0.94  0.06  0.32  0.25  0.86  0.06  0.32  0.26  0.87  0.03  0.28  0.27  0.93  
0.00  0.06  0.06  0.93  0.01  0.06  0.05  0.90  0.01  0.06  0.06  0.91  0.00  0.06  0.06  0.93  
True  
M1(Misspecified)  M2(True)  M3(Over specified)  M4(Spline)  
Parameter  Bias  SD  ESE  CP95  Bias  SD  ESE  CP95  Bias  SD  ESE  CP95  Bias  SD  ESE  CP95 
0.01  0.15  0.16  0.96  0.03  0.16  0.16  0.94  0.05  0.17  0.16  0.94  0.04  0.16  0.17  0.94  
0.04  0.28  0.29  0.95  0.00  0.30  0.30  0.95  0.03  0.32  0.30  0.94  0.02  0.32  0.31  0.95  
0.00  0.06  0.06  0.95  0.00  0.06  0.06  0.95  0.00  0.06  0.06  0.95  0.00  0.06  0.06  0.95  
0.02  0.15  0.15  0.94  0.02  0.15  0.15  0.94  0.04  0.16  0.15  0.93  0.04  0.16  0.16  0.94  
0.02  0.26  0.27  0.96  0.02  0.28  0.27  0.93  0.03  0.29  0.28  0.93  0.03  0.29  0.29  0.95  
0.01  0.06  0.06  0.94  0.00  0.06  0.06  0.94  0.01  0.06  0.06  0.94  0.01  0.06  0.06  0.94  
0.02  0.14  0.16  0.96  0.02  0.15  0.16  0.96  0.03  0.16  0.16  0.95  0.03  0.16  0.17  0.95  
0.06  0.29  0.29  0.94  0.02  0.30  0.29  0.93  0.01  0.31  0.30  0.93  0.01  0.31  0.31  0.94  
0.00  0.05  0.06  0.97  0.00  0.05  0.06  0.96  0.00  0.05  0.06  0.96  0.00  0.06  0.06  0.97  
0.02  0.14  0.15  0.95  0.02  0.15  0.15  0.95  0.04  0.15  0.15  0.95  0.04  0.15  0.16  0.95  
0.02  0.27  0.27  0.94  0.02  0.29  0.27  0.94  0.04  0.30  0.28  0.93  0.04  0.30  0.29  0.93  
0.01  0.06  0.06  0.96  0.00  0.06  0.06  0.95  0.01  0.06  0.06  0.94  0.01  0.06  0.06  0.95 
True  
M1(True)  M2(Misspecified)  M3(Misspecified)  M4(Spline)  
Parameter  Bias  SD  ESE  CP95  Bias  SD  ESE  CP95  Bias  SD  ESE  CP95  Bias  SD  ESE  CP95 
0.02  0.14  0.15  0.95  0.03  0.15  0.15  0.92  0.03  0.15  0.15  0.93  0.03  0.15  0.15  0.94  
0.00  0.27  0.27  0.96  0.02  0.29  0.27  0.94  0.02  0.29  0.27  0.94  0.02  0.28  0.28  0.96  
0.01  0.06  0.06  0.95  0.01  0.06  0.06  0.93  0.01  0.06  0.06  0.94  0.01  0.06  0.06  0.94  
0.03  0.14  0.14  0.94  0.05  0.15  0.14  0.92  0.05  0.15  0.14  0.92  0.04  0.15  0.14  0.94  
0.00  0.26  0.25  0.94  0.03  0.28  0.25  0.92  0.03  0.28  0.25  0.93  0.02  0.27  0.26  0.93  
0.00  0.06  0.06  0.95  0.00  0.06  0.06  0.94  0.00  0.06  0.06  0.94  0.00  0.06  0.06  0.95  
0.02  0.16  0.15  0.94  0.03  0.16  0.15  0.94  0.03  0.16  0.15  0.94  0.03  0.16  0.15  0.94  
0.00  0.28  0.27  0.95  0.01  0.29  0.27  0.93  0.01  0.29  0.27  0.93  0.01  0.29  0.28  0.94  
0.00  0.06  0.06  0.95  0.00  0.06  0.06  0.95  0.00  0.06  0.06  0.95  0.00  0.06  0.06  0.95  
0.01  0.14  0.14  0.95  0.04  0.15  0.14  0.92  0.04  0.15  0.14  0.93  0.03  0.14  0.14  0.94  
0.03  0.27  0.25  0.94  0.05  0.29  0.25  0.91  0.05  0.29  0.25  0.91  0.04  0.28  0.26  0.92  
0.00  0.06  0.06  0.95  0.01  0.06  0.05  0.94  0.01  0.06  0.06  0.94  0.00  0.06  0.06  0.94  
True  
M1(Misspecified)  M2(True)  M3(Over specified)  M4(Spline)  
Parameter  Bias  SD  ESE  CP95  Bias  SD  ESE  CP95  Bias  SD  ESE  CP95  Bias  SD  ESE  CP95 
0.00  0.15  0.15  0.95  0.02  0.15  0.15  0.93  0.03  0.16  0.16  0.93  0.03  0.16  0.16  0.94  
0.02  0.27  0.28  0.96  0.00  0.28  0.28  0.96  0.01  0.29  0.29  0.96  0.01  0.29  0.29  0.96  
0.01  0.06  0.06  0.95  0.01  0.06  0.06  0.95  0.01  0.06  0.06  0.94  0.01  0.06  0.06  0.95  
0.00  0.14  0.14  0.95  0.02  0.14  0.14  0.94  0.03  0.15  0.15  0.94  0.03  0.15  0.15  0.93  
0.03  0.25  0.26  0.95  0.00  0.27  0.26  0.94  0.01  0.27  0.27  0.94  0.01  0.27  0.27  0.94  
0.01  0.06  0.06  0.96  0.00  0.06  0.06  0.95  0.00  0.06  0.06  0.95  0.00  0.06  0.06  0.95  
0.00  0.15  0.15  0.94  0.02  0.16  0.15  0.94  0.02  0.16  0.16  0.94  0.02  0.16  0.16  0.94  
0.01  0.28  0.28  0.95  0.01  0.29  0.28  0.95  0.01  0.30  0.28  0.94  0.01  0.30  0.29  0.94  
0.00  0.06  0.06  0.96  0.00  0.06  0.06  0.96  0.00  0.06  0.06  0.96  0.00  0.06  0.06  0.96  
0.02  0.13  0.14  0.96  0.01  0.14  0.14  0.95  0.02  0.14  0.15  0.95  0.02  0.14  0.15  0.96  
0.00  0.26  0.26  0.96  0.03  0.27  0.26  0.95  0.04  0.28  0.27  0.95  0.04  0.28  0.27  0.95  
0.00  0.06  0.06  0.96  0.00  0.06  0.06  0.95  0.00  0.06  0.06  0.95  0.00  0.06  0.06  0.96 
Figure 1 summarizes the estimates of the baseline survival function obtained from M1-M4 across all considered regression parameter configurations, for the first cumulative baseline hazard specification, when the observation times were sampled from the exponential distribution. Figures 2-4 summarize the analogous results for the other simulation configurations. In particular, these figures present the true baseline survival functions, the average of the pointwise estimates, and the pointwise 2.5th and 97.5th percentiles of the estimates. These figures reinforce the main findings discussed above; i.e., M4 and the correctly specified parametric model again provide reliable estimates of the baseline survival function, and hence the cumulative baseline hazard function, across all simulation configurations. Similarly, the over specified model also provides reliable estimates, but the same cannot be said for the misspecified models. It is worthwhile to point out that the baseline survival curves do not extend to unity as time approaches the origin; this is due to the fact that the baseline instantaneous failure probability is nonzero. Again, these findings support the recommendation that the spline-based representation of the cumulative baseline hazard function should be adopted in lieu of a parametric model, thus obviating the possible ramifications associated with misspecification.
In summary, this simulation study illustrates that the proposed methodology can be used to analyze current status data subject to instantaneous failures, and moreover that the monotone spline approach discussed in Section 2.2 should be adopted for approximating the unknown cumulative baseline hazard function. A few additional details about the numerics of the approach follow. First, the average time required to complete model fitting was approximately one second, supporting the claim that the proposed approach is computationally efficient. Lastly, in the interest of complete transparency, for a single data set among the 8000 considered, the OPG estimator under M4 resulted in a singular matrix, which prevented standard error estimation; this issue was resolved by slightly shifting the interior knot.
4 Discussion
This work proposed a new model, under the PH assumption, which can be used to analyze interval-censored data subject to instantaneous failures. Through the model development, two techniques for approximating the unknown cumulative baseline hazard function are illustrated. To complete model fitting, a novel EM algorithm is developed through a two-stage data augmentation process. The resulting algorithm is easy to implement and is computationally efficient. These features are largely attributable to the fact that the carefully structured data augmentation steps lead to closed-form expressions for all necessary quantities in the E-step of the algorithm. Moreover, in the M-step the regression coefficients are updated by solving a low-dimensional system of equations, while all other unknown parameters are updated in closed form. The finite sample performance of the proposed approach was exhibited through an extensive numerical study. This study suggests that the use of the monotone spline representation of the cumulative baseline hazard function would in general be preferable, in order to circumvent the possibility of model misspecification. To further disseminate this work, code, written in R, has been prepared and is available upon request from the corresponding author.
5 Acknowledgments
This research was partially supported by National Institutes of Health grant AI121351.
References
 Betensky et al (2002) Betensky RA, Lindsey JC, Ryan LM, Wand M (2002) A local likelihood proportional hazards model for interval censored data. Statistics in Medicine 21(2):263–275, DOI 10.1002/sim.993
 Cai et al (2011) Cai B, Lin X, Wang L (2011) Bayesian proportional hazards model for current status data with monotone splines. Computational Statistics & Data Analysis 55(9):2644–2651, DOI 10.1016/j.csda.2011.03.013
 Cai and Betensky (2003) Cai T, Betensky RA (2003) Hazard regression for interval-censored data with penalized spline. Biometrics 59(3):570–579, DOI 10.1111/1541-0420.00067
 Chen et al (2015) Chen CM, Lai CC, Cheng KC, Weng SF, Liu WL, Shen HN (2015) Effect of end-stage renal disease on long-term survival after a first-ever mechanical ventilation: a population-based study. Critical Care 19(1):354, DOI 10.1186/s13054-015-1071-x
 Cox et al (1972) Cox DR, et al (1972) Regression models and life tables (with discussion). Journal of the Royal Statistical Society: Series B 34:187–220
 Dorey et al (1993) Dorey FJ, Little RJ, Schenker N (1993) Multiple imputation for threshold-crossing data with interval censoring. Statistics in Medicine 12(17):1589–1603
 Finkelstein (1986) Finkelstein DM (1986) A proportional hazards model for interval-censored failure time data. Biometrics pp 845–854
 Goetghebeur and Ryan (2000) Goetghebeur E, Ryan L (2000) Semiparametric regression analysis of interval-censored data. Biometrics 56(4):1139–1144, DOI 10.1111/j.0006-341X.2000.01139.x
 Goggins et al (1998) Goggins WB, Finkelstein DM, Schoenfeld DA, Zaslavsky AM (1998) A Markov chain Monte Carlo EM algorithm for analyzing interval-censored data under the Cox proportional hazards model. Biometrics pp 1498–1507
 Groeneboom and Wellner (1992) Groeneboom P, Wellner JA (1992) Information bounds and nonparametric maximum likelihood estimation, vol 19. Springer Science & Business Media
 Kale and Muralidharan (2002) Kale B, Muralidharan K (2002) Optimal estimating equations in mixture distributions accommodating instantaneous or early failures. Quality Control and Applied Statistics 47(6):677–680
 Knopik (2011) Knopik L (2011) Model for instantaneous failures. Scientific Problems of Machines Operation and Maintenance 46(2):37–45
 Lamborn et al (2008) Lamborn KR, Yung WA, Chang SM, Wen PY, Cloughesy TF, DeAngelis LM, Robins HI, Lieberman FS, Fine HA, Fink KL, et al (2008) Progression-free survival: an important end point in evaluating therapy for recurrent high-grade gliomas. Neuro-Oncology 10(2):162–170, DOI 10.1215/15228517-2007-062
 Li and Ma (2013) Li J, Ma S (2013) Survival analysis in medicine and genetics. CRC Press
 Liem et al (1997) Liem MS, van der Graaf Y, van Steensel CJ, Boelhouwer RU, Clevers GJ, Meijer WS, Stassen LP, Vente JP, Weidema WF, Schrijvers AJ, et al (1997) Comparison of conventional anterior surgery and laparoscopic surgery for inguinal-hernia repair. New England Journal of Medicine 336(22):1541–1547, DOI 10.1056/NEJM199705293362201
 Lin and Wang (2010) Lin X, Wang L (2010) A semiparametric probit model for case 2 interval-censored failure time data. Statistics in Medicine 29(9):972–981, DOI 10.1002/sim.3832
 Liu et al (2016) Liu C, Yang W, Devidas M, Cheng C, Pei D, Smith C, Carroll WL, Raetz EA, Bowman WP, Larsen EC, et al (2016) Clinical and genetic risk factors for acute pancreatitis in patients with acute lymphoblastic leukemia. Journal of Clinical Oncology 34(18):2133–2140, DOI 10.1200/JCO.2015.64.5812
 Liu and Shen (2009) Liu H, Shen Y (2009) A semiparametric regression cure model for interval-censored data. Journal of the American Statistical Association 104(487):1168–1178, DOI 10.1198/jasa.2009.tm07494
 Matsuzaki et al (2005) Matsuzaki A, Nagatoshi Y, Inada H, Nakayama H, Yanai F, Ayukawa H, Kawakami K, Moritake H, Suminoe A, Okamura J (2005) Prognostic factors for relapsed childhood acute lymphoblastic leukemia: impact of allogeneic stem cell transplantation? A report from the Kyushu-Yamaguchi Children's Cancer Study Group. Pediatric Blood & Cancer 45(2):111–120, DOI 10.1002/pbc.20363
 McMahan et al (2013) McMahan CS, Wang L, Tebbs JM (2013) Regression analysis for current status data using the EM algorithm. Statistics in Medicine 32(25):4452–4466, DOI 10.1002/sim.5863
 Muralidharan (1999) Muralidharan K (1999) Tests for the mixing proportion in the mixture of a degenerate and exponential distribution. Journal of the Indian Statistical Association 37:105–119
 Muralidharan and Lathika (2006) Muralidharan K, Lathika P (2006) Analysis of instantaneous and early failures in Weibull distribution. Metrika 64(3):305–316, DOI 10.1007/s00184-006-0050-2
 Murthy et al (2004) Murthy DP, Xie M, Jiang R (2004) Weibull models, vol 505. John Wiley & Sons
 Odell et al (1992) Odell P, Anderson K, D'Agostino R (1992) Maximum likelihood estimation for interval-censored data using a Weibull-based accelerated failure time model. Biometrics pp 951–959, DOI 10.2307/2532360
 Pan (1999) Pan W (1999) Extending the iterative convex minorant algorithm to the Cox model for interval-censored data. Journal of Computational and Graphical Statistics 8(1):109–120
 Pan (2000) Pan W (2000) A multiple imputation approach to Cox regression with interval-censored data. Biometrics 56(1):199–203, DOI 10.1111/j.0006-341X.2000.00199.x
 Pham and Lai (2007) Pham H, Lai CD (2007) On recent generalizations of the Weibull distribution. IEEE Transactions on Reliability 56(3):454–458, DOI 10.1109/TR.2007.903352
 Ramsay (1988) Ramsay JO (1988) Monotone regression splines in action. Statistical Science pp 425–441, DOI 10.1214/ss/1177012761
 Satten (1996) Satten GA (1996) Rank-based inference in the proportional hazards model for interval censored data. Biometrika 83(2):355–370, DOI 10.1093/biomet/83.2.355
 Sun (2007) Sun J (2007) The statistical analysis of intervalcensored failure time data. Springer Science & Business Media
 Wang and Dunson (2011) Wang L, Dunson DB (2011) Semiparametric Bayes' proportional odds models for current status data with underreporting. Biometrics 67(3):1111–1118, DOI 10.1111/j.1541-0420.2010.01532.x
 Wang et al (2016) Wang L, McMahan CS, Hudgens MG, Qureshi ZP (2016) A flexible, computationally efficient method for fitting the proportional hazards model to interval-censored data. Biometrics 72(1):222–231, DOI 10.1111/biom.12389
 Wang et al (2015) Wang N, Wang L, McMahan CS (2015) Regression analysis of bivariate current status data under the gamma-frailty proportional hazards model using the EM algorithm. Computational Statistics & Data Analysis 83:140–150, DOI 10.1016/j.csda.2014.10.013
 Zhang et al (2010) Zhang Y, Hua L, Huang J (2010) A spline-based semiparametric maximum likelihood estimation method for the Cox model with interval-censored data. Scandinavian Journal of Statistics 37(2):338–354, DOI 10.1111/j.1467-9469.2009.00680.x
 Zhang and Sun (2010) Zhang Z, Sun J (2010) Interval censoring. Statistical Methods in Medical Research 19(1):53–70, DOI 10.1177/0962280209105023