1 Introduction
We study the estimation of a failure time distribution where the failure times can be observed directly, right-censored, or left-censored. This type of survival data arises, for example, in estimating the time to the appearance of a medical condition whose characteristic symptoms may or may not appear when the condition exists. Specific medical settings include relapse in childhood brain tumors, which may be observed due to clinical symptoms, right-censored due to periodic screening with a negative result (no tumor), or left-censored due to periodic screening with a positive result (Minn et al., 2001). Another medical setting is melanoma, which is observed if self-detected, right-censored after a negative screening (no melanoma), or left-censored if it goes undetected until screening. Additional examples can be found in Whitehead (1989).
The motivating example for this work comes from estimating customer patience in service systems which, as discussed by Mandelbaum and Zeltyn (2007), is a challenging problem. In our study, we focus on patients who wait for treatment in an emergency department (ED). Three categories of patients are observed. The first category consists of patients who get service, and thus their patience time is right-censored by the waiting time. The second category comprises those who leave the system and announce it, and thus their patience time is observed while the waiting time is right-censored. The third category consists of patients who leave the system without announcing it; their absence is revealed only when they are called to service, which is after they have already left. Formally, their patience time is left-censored.
Estimating the patience time is important, as the decision of patients to leave the system before being served might have a strong effect on their physical well-being. There has been considerable research on the reasons why patients leave an ED before being served; see Baker et al. (1991), Hunt et al. (2006), Bolandifar et al. (2014), and Batt and Terwiesch (2015). However, these and other authors have not proposed a model by which ED patience time (namely, the duration that a potential patient is willing to wait for ED service) can be estimated, and this is our goal here.
We propose novel parametric and nonparametric estimators of the unknown survival function for this three-type survival data, and we study their rates of convergence. The parametric estimator is based on both full and partial likelihoods. We provide conditions under which the parametric estimator is a linear asymptotically normal (LAN) estimator and converges to a normal distribution at a root-n rate. The nonparametric estimator is based on nonparametric kernel estimators for density functions and on a novel estimator of the cumulative probability function that has some similarities to the Nelson–Aalen estimator (e.g., Klein and Moeschberger, 2013, Chapter 4). We show that, under some regularity conditions, the nonparametric estimator converges pointwise to a normal distribution.
We perform a simulation study and compare the proposed parametric and nonparametric estimators. For the parametric model, we study both correctly specified and misspecified models and show the corresponding results. We show how the accuracy changes with sample size. We then carry out a case study based on data of patients waiting for treatment in an ED in the U.S. in 2008. We analyze each severity level separately (15106 observations in the emergency group, 43600 in the urgent group, and 26541 in the semi-urgent group). We conclude with a comparison of the parametric and nonparametric estimators for the three severity levels of this dataset.
2 Brief Literature Review
Developing screening methods for medical conditions, such as breast cancer and melanoma, has a long history (Wilson et al., 1968; Zelen and Feinleib, 1969). In the classical setting, the medical condition either already exists at the time of screening and is thus left-censored, or does not exist and is thus right-censored. The setting in which self-detection is possible, and thus the condition time is observed, has, surprisingly, been mostly ignored in the literature. For example, Minn et al. (2001) treat both self-detection times and screening times as event times, ignoring the censoring. The closest model to the one that we present here appears in Whitehead (1989). It is assumed there that the condition can be detected at screening or before screening due to symptoms. In both cases, the condition already exists at the time of detection. It is also assumed that screenings take place at a sequence of fixed time points. Whitehead (1989) recommends ignoring the extra knowledge gained from self-reporting and replacing these times with the time of the next screening. The survival function is then estimated only at the discrete fixed screening times using standard techniques (Prentice and Gloeckler, 1978).
There has been considerable research effort dedicated to the modeling and analysis of customer (im)patience while waiting for service. Here we describe several papers that, together with the references therein, provide the historical background and a state-of-the-art perspective. First, we recommend the literature review (Section 3) in the recent Batt and Terwiesch (2015), accompanied by Gans et al. (2003): these survey patience research from an operational/queueing viewpoint (mainly Section 6.3.3 in the latter), while connecting it to the medical literature on patients who left without being seen (LWBS) (mainly Section 3 in the former); see also Aksin et al. (2007) who, relative to Gans et al. (2003), expand on managerial challenges. Next we mention Mandelbaum and Zeltyn (2013), which is an exploratory data analysis of (im)patience in telephone call centers (it appears in a special issue devoted to models of queue abandonment). Finally, the following two studies are the most closely related to the present one. Brown et al. (2005) apply, in Section 5, the Kaplan–Meier estimator (Kaplan and Meier, 1958) to estimate the survival functions, and consequently the hazard rates, of both virtual waiting time and impatience; the data is that of a call center in which all times of abandonment are recorded, hence the data is right-censored. Wiler et al. (2013), which is also the source of our present ED case study data, estimate LWBS rates as a function of ED patient arrival rates, treatment times, and ED boarding times. There was no attempt in that work to estimate the patience-time distribution.
We conclude this brief survey with the observation that the estimation of customer (im)patience is relevant beyond screening, call centers, and EDs. For example, Nah (2004) studies the tolerance of Web users (during information retrieval). Yom-Tov et al. (2018) analyze chat services, in which customers may abandon at any phase during chat exchanges with a service center. One expects such services to give rise to the same options as in EDs: some customers receive service, others abandon without letting anyone know, and the rest announce their abandonment time.
3 The Model
In the standard setting of right-censored data one observes, for each patient, either the failure time or the censoring time. In terms of our motivating example, the failure time is the patience time while the censoring time is the waiting time. Patience time is observed when patients leave the ED while informing the system of their departure; waiting time is observed when a patient is called for service. However, unlike in standard right-censored data and as in current status data, there are also patients who leave without informing; in this case their absence is observed only when they are called for service, and this latter time provides an upper bound for their patience time. In other words, the (virtual) waiting time is observed, and the only information on the patience time is that it is less than this observed waiting time. Hence, in this case, the patience time is left-censored.
More formally, let be the patient’s failure time, i.e., the time until the patient loses patience. Let be the censoring time, i.e., the waiting time until the patient gets (or could have gotten) service. We assume that has a cumulative distribution function (cdf) and a probability density function (pdf) , and that has cdf and pdf . Let be the indicator ; i.e., if the patient loses patience before being called to service, and otherwise. Let be the indicator that is for a patient who leaves and informs when leaving, and otherwise. Denote by the conditional probability that a patient reports leaving given that the waiting time equals . In other words, . We assume that the waiting time and the patience time are independent. This assumption, which is common in the right-censored data literature (see Klein and Moeschberger, 2013, Chapter 3, pages 65–66), seems appropriate in our case study, as we stratify by acuity levels. We also assume that the announcement indicator is independent of the waiting time , as it seems reasonable that the decision of a patient to report when leaving does not depend on the waiting time. Summarizing, we assume that the pair is independent of the waiting time . When this assumption does not hold, different theoretical tools are needed for valid estimation.
Let be the recorded time: . The observed data consist of the triplets , , and there are three categories of patients:
Category 1: The patient gets service, hence the waiting time is observed, which serves as a lower bound on the patience time; thus the patience time is right-censored. Formally, , , and .

Category 2: The patient leaves without being treated and reports departure. The patience time is thus revealed: , , and .

Category 3: The patient leaves without reporting, hence the virtual waiting time (the time that the patient would have waited had they stayed in the ED) is observed, which provides an upper bound for the patience time; thus the patience time is left-censored. Formally, , , and .
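The three categories can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code: the function and variable names are hypothetical, both times are assumed exponential, and the announcement probability is assumed constant.

```python
import numpy as np

def simulate_triplets(n, patience_rate, waiting_rate, announce_prob, seed=0):
    """Simulate (recorded time, lost-patience flag, announced flag) triplets.

    Assumptions (illustrative only): exponential patience and waiting times,
    drawn independently, and a constant probability of announcing departure.
    """
    rng = np.random.default_rng(seed)
    patience = rng.exponential(1.0 / patience_rate, size=n)
    waiting = rng.exponential(1.0 / waiting_rate, size=n)
    # The patient loses patience before being called to service.
    lost = patience <= waiting
    # Only patients who actually leave can announce their departure.
    announced = lost & (rng.random(n) < announce_prob)
    # Patience time is recorded only for announced departures; otherwise the
    # (possibly virtual) waiting time is recorded.
    recorded = np.where(announced, patience, waiting)
    return recorded, lost, announced

def category(lost, announced):
    """Map the two indicators to the category labels 1, 2, 3 used in the text."""
    return np.where(~lost, 1, np.where(announced, 2, 3))
```

For example, `category(*simulate_triplets(1000, 1.0, 2.0, 0.5)[1:])` returns an array of labels in which 1 marks served patients, 2 marks announced departures, and 3 marks unannounced departures.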
Lemma 1.
The following equalities hold:

.

.

.
See the proof in Appendix A.1.
For , we introduce the following substochastic distribution functions
(1) 
From Lemma 1 above, we deduce that
Here, and are the survival functions of the patience time and the waiting time, respectively.
Define
(2) 
Then is the density function of the observed time given . Our model assumes that all denominators are positive.
To summarise what is known and what is to be estimated, there are two unknown distributions in our setting, and , and we aim to estimate them using both parametric and nonparametric techniques. For each patient, the waiting time is either observed or right-censored: if the patient reports and then leaves, the waiting time is longer than the observed patience time and is therefore right-censored. Consequently, parametric and nonparametric estimation of the waiting time distribution can be done by standard techniques for right-censored data. However, estimation of the distribution of the patience time is more complicated and is discussed in Sections 4 and 5.
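One such standard technique for right-censored data is the Kaplan–Meier estimator. The sketch below is a plain NumPy illustration, not the authors' implementation; the function name and arguments are hypothetical. For the waiting time, an observation would count as an event when the patient is called to service (Categories 1 and 3) and as censored when the patient announces departure (Category 2).

```python
import numpy as np

def kaplan_meier(times, event):
    """Kaplan-Meier survival estimate for right-censored data.

    times : recorded times (event or censoring times)
    event : True where the event time itself was observed, False if censored
    Returns the distinct event times and the estimated survival just after each.
    """
    times = np.asarray(times, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(times[event])
    survival = np.empty(event_times.size)
    current = 1.0
    for i, t in enumerate(event_times):
        at_risk = np.sum(times >= t)            # still under observation at t
        deaths = np.sum((times == t) & event)   # events occurring exactly at t
        current *= 1.0 - deaths / at_risk
        survival[i] = current
    return event_times, survival
```

The loop is quadratic in the sample size, which is adequate for illustration; a sorted, cumulative implementation would be preferable for samples as large as those in the case study.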
4 Parametric estimation
Assume now that the distributions of both the patience time and the waiting time belong to some parametric families. More formally, let where , where . We assume that the density of the patience time can be written as . We also assume that the density of the waiting time can be written as . Write , and similarly and .
The likelihood of the observed data can be written in terms of the functions , , and , as follows:
Using the explicit representations of , , , we obtain that is given by
The value of that maximizes this likelihood is independent of . Therefore, a maximum likelihood estimator (MLE) to can be constructed from this likelihood. However, maximizing the likelihood with respect to is difficult. Even if is given or estimated, the maximizer of depends on the unknown function . Therefore, we consider the partial likelihood of category ,
The value of that maximizes this partial likelihood depends on . We plug the MLE into this partial likelihood. In Theorem 1 below we show that, under standard regularity conditions, the maximizer of is a consistent and asymptotically normal estimator for .
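For intuition about likelihood-based estimation under right censoring, consider the simplest special case: an exponential family, which is the family assumed for the parametric method in the simulations of Section 6. This sketch covers only the standard right-censored MLE, as it would apply to the waiting time; it does not implement the partial likelihood for the patience time. The function name is hypothetical.

```python
import numpy as np

def exponential_rate_mle(recorded, observed):
    """MLE of an exponential rate from right-censored data.

    recorded : recorded times (event times or censoring times)
    observed : True where the event time itself was observed, False if censored

    The log-likelihood is d * log(rate) - rate * sum(recorded), where d is the
    number of observed events, so the maximizer is d / sum(recorded).
    """
    recorded = np.asarray(recorded, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    return observed.sum() / recorded.sum()
```

Censored observations contribute to the total time at risk in the denominator but not to the event count in the numerator.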
We need the following assumptions:

(A1) The derivative is continuous in for each , is continuous in for each .

(A2) For all , is unique, hence denote . It is assumed as well that for each , .

(A3) For all , is unique, hence denote . It is assumed as well that for each , .
Theorem 1.
Let be the maximizer of and let be the maximizer of . Then, as ,

(i) in probability.

(ii) in distribution.

(iii) in probability.

(iv) in distribution.
Here , are covariance matrices as defined in Appendix A.2.
The proof appears in Appendix A.2.
5 Nonparametric estimation
In this section we propose a nonparametric estimator for the survival function of the patience time and study its theoretical properties. For simplicity, we restrict the estimation to a segment for some , such that the probability of and being larger than is positive. This is a standard condition in survival estimation (see Kosorok, 2008, Chapter 4.2). Note that for observations of Categories 1 and 3, the waiting time is observed. For Category 2, only a lower bound of the waiting time is observed. Hence, the waiting time is either observed or right-censored. Therefore, estimating the waiting time distribution can be done by using standard survival analysis estimators such as the Kaplan–Meier estimator (see Klein and Moeschberger, 2013, Chapter 4). On the other hand, estimating the distribution of the patience time is more challenging since we cannot distinguish between the density function and the unknown function . Our goal is thus to estimate the distribution of the patience time F.
Assume that over all positive numbers, the waiting time density function is strictly positive. Recall that , , where the functions are defined as in (1). Therefore,
(3) 
which is well defined as . Reordering the terms in (3), we get that
Hence,
From the definitions in (2), it follows that
(4) 
Therefore, we propose to estimate F(t) by estimating the following terms:
(i) and ,
(ii) and ,
(iii) .
Estimating the expressions in (i) can be done by the empirical estimators: , . These estimators converge, by the central limit theorem (CLT), to and , respectively, at the rate of .
Since and are density functions, they can be estimated using a kernel estimator (Tsybakov, 2008, Chapter 1.2). Let and be kernel estimators of and , respectively. Assume that both and belong to a Sobolev function class of order . Then for each , both and converge at a rate of (see Tsybakov, 2008, Chapter 1.7, for both the definition of a Sobolev class and the proof).
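The kernel step can be sketched as follows. This is a generic Gaussian kernel density estimator written in NumPy, offered as an illustration under stated assumptions: the kernel choice, the bandwidth, and the function name are ours, not taken from the paper. It would be applied to the recorded times within a single category.

```python
import numpy as np

def kernel_density(sample, grid, bandwidth):
    """Gaussian kernel density estimate of `sample`, evaluated on `grid`.

    The Gaussian kernel and the fixed bandwidth are illustrative assumptions;
    no boundary correction is applied near zero, although the times are positive.
    """
    sample = np.asarray(sample, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Standardized distance between every grid point and every observation.
    z = (grid[:, None] - sample[None, :]) / bandwidth
    kernel_sum = np.exp(-0.5 * z ** 2).sum(axis=1)
    return kernel_sum / (sample.size * bandwidth * np.sqrt(2.0 * np.pi))
```

In practice the bandwidth would be chosen in a data-driven way (for example by cross-validation), since it governs the bias-variance trade-off behind the convergence rate.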
We now turn to estimating the term . We define a nonparametric estimator for this term and prove its consistency in the following lemma.
Lemma 2.
Let
Define . Then converges pointwise to , at a rate of , for every .
The proof is given in Appendix A.3.
Theorem 2.
The estimator converges pointwise to at a rate of , for every .
The proof appears in Appendix A.4.
6 Simulations
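Table 1: Summaries of the MSE for the parametric (P) and nonparametric (NP) estimators in Setting 1 (S1) and Setting 2 (S2), by sample size N.

| N | S1 P mean | S1 P median | S1 P sd | S1 NP mean | S1 NP median | S1 NP sd | S2 P mean | S2 P median | S2 P sd | S2 NP mean | S2 NP median | S2 NP sd |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 6.23 | 2.79 | 8.2 | 14.41 | 12.64 | 8.28 | 23.24 | 18.44 | 11.61 | 16.47 | 13.18 | 12.16 |
| 200 | 3.29 | 1.32 | 4.94 | 9.77 | 7.48 | 6.51 | 19.07 | 15.67 | 7.87 | 10.43 | 8.47 | 7.33 |
| 500 | 1.31 | 0.77 | 1.57 | 5.32 | 4.74 | 2.49 | 16.53 | 15.01 | 4.09 | 4.33 | 3.95 | 2.18 |
| 1000 | 0.72 | 0.25 | 0.99 | 3.55 | 3.36 | 1.32 | 15.38 | 14.16 | 3.68 | 2.49 | 2.2 | 1.18 |
| 2000 | 0.36 | 0.16 | 0.53 | 2.35 | 2.21 | 0.87 | 14.8 | 14.18 | 1.98 | 1.78 | 1.65 | 0.9 |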
We study the performance of both the parametric and nonparametric estimators that were proposed in Sections 4 and 5, respectively. Based on the setting of the case study discussed in Section 7, we consider two simulation settings. In the case study, the exponential and Weibull distributions seem to fit the waiting time and patience time distributions well, respectively. The case study data also indicated that the mean of the waiting time is smaller than that of the patience time . Thus, the two simulation settings consist of samples from exponential and Weibull distributions in which the waiting time has a smaller mean than the patience time. In the first setting, a sample was taken from the model in which the patience time follows an exponential distribution with rate , and the waiting time follows an exponential distribution with rate . In the second setting, a sample was taken from a model in which the patience time follows a Weibull distribution with rate and shape , and the waiting time follows an exponential distribution with rate . In both settings, the unknown probability of announcement is . Taking the probability of announcement to be the increasing function or the constant function yields similar results, which are omitted. Moreover, we experimented with additional numerical values. The behavior and conclusions, as reported here, remain consistent across these experiments.
In each setting, we calculated the parametric estimator for the rate of for five different sample sizes (). For each sample size, we repeated the simulation times. When using the parametric method, it was assumed that both and follow an exponential distribution with unknown parameters. Note that this assumption holds for the first setting but does not hold for the second one. In other words, the second setting is carried out under a misspecified model. The results are shown in Figure 1.
We compare , the estimator of the survival function of , to the true survival function . For the parametric estimation, , while for the nonparametric estimator is given by (A.4). The comparison is done using mean square error (MSE), which is defined by
where is the density of . The parametric and nonparametric survival function estimators are demonstrated in Figures 2 and 3. Figure 2 represents the results of the first setting in which and . Figure 3 represents the results of the second setting in which and . Summaries of the MSE are given in Table 1. Not surprisingly, for Setting 1, since the parametric model is correct, the MSE is smaller for the parametric estimator. Similarly, since in Setting 2 the parametric model is incorrect, the MSE is smaller for the nonparametric estimator.
7 Case study
Retrospective data were collected from all patient presentations to triage at an urban, academic, adult-only emergency department (ED) with visits in calendar year 2008. These data were used for the analysis in Wiler et al. (2013). The data consist of the waiting times of patients arriving at the ED. One of the variables recorded in the data is acuity. Since our model assumes that all patients follow the same distribution, we calculated the estimators for each level of acuity separately. We focused on the following three levels of acuity: emergency, urgent, and semi-urgent. The emergency level consists of 15106 patients, the urgent level consists of 43600 patients, and the semi-urgent level consists of 26541 patients.
The data consist of the triplets described in Section 3. At each acuity level, each observation falls into one of the three possible categories.
Parametric and nonparametric estimators for the survival function of each acuity level were calculated. The results of these estimators are given in Figures 4 and 5. As can be seen in Figure 4, the nonparametric estimators of the patience time are stochastically ordered by level of acuity. In other words, patients at the most severe (emergency) acuity level are less likely to lose patience than patients at the urgent level, who in turn are less likely to lose patience than patients at the semi-urgent level. The results for the parametric estimator seem unreasonable since one would expect that patients with more severe acuity level are more likely to lose patience, and lose it faster.
8 Discussion
In this paper, we consider survival data that combine observed, right-censored, and left-censored data. The setting we analyzed was that of patients who wait for treatment in an emergency department, where some patients may leave without being seen. We proposed both parametric and nonparametric estimators for the distribution of the patience time. Using simulation, we showed that when the parametric model holds, the parametric estimator estimates the patience time well. However, when the model is misspecified, the nonparametric estimator performed better. In our case study, we also observed that the nonparametric estimator performed better.
Our analysis does not include baseline covariates. Novel parametric and nonparametric estimators are needed for addressing settings that include them.
9 Acknowledgement
Y. Ritov was partially supported by the Israeli Science Foundation (grant No. 1770/15).
Y. Goldberg was partially supported by the Israeli Science Foundation (grant No. 849/17).
Appendix A Proofs
A.1 Proof of Lemma 1
where in the fourth equality we use the independence between and .
A.2 Proof of Theorem 1
The log of the full likelihood is
Given the data ,
(6) 
where is defined by
and
From Assumption A1 we obtain that, for each , . The that maximizes does not depend on the value of or the function . Define and . If, for a general function , and , then by Assumptions A1–A3, Theorem 5.7 in van der Vaart (2000) can be applied. Therefore , in probability, which concludes the proof of i).
Given the data , the term is a function of and does not depend on the unknown function . We also have
where is defined as
By Assumptions A1–A3, satisfies the conditions of Theorem 5.41 in van der Vaart (2000) and, therefore,
where . Hence is a linear asymptotically normal (LAN) estimator with influence function . From all of the above we get that ii) is proved with .
To prove iii), note that due to the term that appears in , the term depends on the unknown function . We therefore consider a partial likelihood function whose derivative with respect to does not depend on . The partial likelihood that satisfies this requirement is the partial likelihood of :
The log of the partial likelihood is
Given the data , the term is a function only of the parameters and . We also have
where is given by
Define , and . Then,
Theorem 5.7 in van der Vaart (2000) can be applied. Therefore in probability, and
in particular in probability, and iii) is proven.
In order to prove iv), note that
where is defined as
Using Assumptions A1–A3, together with Theorem 5.41 in van der Vaart (2000), we obtain that
Define and note that (since under the true parameters , where is the true distribution of category 1).
By Taylor’s theorem,
Elementary arithmetic leads to
where