1.1 Generalized exponential (GE) distribution
The random variable $X$ follows the GE distribution if its probability density function (pdf) and distribution function (cdf) are given by

$$f(x;\theta)=\alpha\lambda e^{-\lambda x}\bigl(1-e^{-\lambda x}\bigr)^{\alpha-1},\qquad x>0, \qquad (1)$$

$$F(x;\theta)=\bigl(1-e^{-\lambda x}\bigr)^{\alpha},\qquad x>0, \qquad (2)$$

where $\theta=(\alpha,\lambda)$ is the parameter vector ($\alpha>0$ is the shape parameter and $\lambda>0$ is the rate parameter). The family of GE distributions was introduced by Mudholkar and Srivastava (1993). For a comprehensive account of the theory and applications of the GE distribution, we refer the reader to Gupta and Kundu (2007).
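For concreteness, the pdf (1) and cdf (2) can be coded directly; the function names below are our own choice, not part of the original development:

```python
import math

def ge_pdf(x, alpha, lam):
    """GE density: alpha*lam*exp(-lam*x)*(1 - exp(-lam*x))**(alpha - 1), x > 0."""
    if x <= 0:
        return 0.0
    return alpha * lam * math.exp(-lam * x) * (1.0 - math.exp(-lam * x)) ** (alpha - 1.0)

def ge_cdf(x, alpha, lam):
    """GE distribution function: (1 - exp(-lam*x))**alpha, x > 0."""
    if x <= 0:
        return 0.0
    return (1.0 - math.exp(-lam * x)) ** alpha
```

With $\alpha=1$ the GE distribution reduces to the exponential distribution with rate $\lambda$, which gives a quick sanity check on both functions.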
1.2 Progressively type-I interval censoring scheme
Suppose $n$ subjects are placed on a life test simultaneously at time $t_0=0$ and inspected at pre-determined times $t_1<t_2<\dots<t_m$, where $t_m$ is the time to terminate the test. At the $i$-th inspection time $t_i$, the number $X_i$ of failures within $(t_{i-1},t_i]$ is recorded and $R_i$ items still alive are randomly removed from the test, for $i=1,\dots,m$. As pointed out by Chen and Lio (2010), since the number $Y_i$ of surviving items is a random variable and the exact number of items withdrawn should not be greater than $Y_i$ at time schedule $t_i$, $R_i$ can be determined by a pre-specified percentage $p_i$ of the remaining surviving units at $t_i$, or equivalently $R_i=\lfloor p_i Y_i\rfloor$, for $i=1,\dots,m$. Each progressively type-I interval censoring scheme is described by $(n, p_1,\dots,p_m, t_1,\dots,t_m)$, where $n$ is the sample size. If $p_1=\dots=p_{m-1}=0$ and $p_m=1$, then the progressively type-I interval censoring scheme is equivalent to a type-I interval censoring scheme with sample size $n$. Suppose a life test in which $n$ items, each independently following the cdf $F(\cdot;\theta)$, are under test. The likelihood function (see Aggarwala, 2001) is

$$L(\theta)\propto\prod_{i=1}^{m}\bigl[F(t_i;\theta)-F(t_{i-1};\theta)\bigr]^{X_i}\bigl[1-F(t_i;\theta)\bigr]^{R_i}. \qquad (3)$$
As the most commonly used tool, the maximum likelihood (ML) approach is employed to estimate $\theta$. However, equation (3) must be maximized through an iterative algorithm such as Newton-Raphson to obtain the ML estimators, and there is no guarantee that the Newton-Raphson method converges. Another technique is the expectation-maximization (EM) algorithm, which always converges; see McLachlan and Krishnan (1997). Moreover, if the practitioner is interested in the ML estimators, the first few steps of the EM algorithm can be used to obtain a good starting value for the Newton-Raphson algorithm; see, e.g., Little and Rubin (1983).
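As a sketch of how the likelihood (3) can be evaluated in practice, the following computes the log-likelihood from the observed counts; the data layout (lists of inspection times, failure counts, and removal counts) is our assumption:

```python
import math

def ge_cdf(x, alpha, lam):
    """GE distribution function (2)."""
    return (1.0 - math.exp(-lam * x)) ** alpha if x > 0 else 0.0

def interval_loglik(theta, t, X, R):
    """Log of likelihood (3) for progressively type-I interval-censored counts.

    t: inspection times t_1 < ... < t_m (t_0 = 0 is implicit);
    X[i]: number of failures observed in (t_{i-1}, t_i];
    R[i]: number of surviving items removed at t_i.
    """
    alpha, lam = theta
    ll, prev = 0.0, 0.0
    for ti, xi, ri in zip(t, X, R):
        pi = ge_cdf(ti, alpha, lam) - ge_cdf(prev, alpha, lam)
        ll += xi * math.log(pi) + ri * math.log(1.0 - ge_cdf(ti, alpha, lam))
        prev = ti
    return ll
```

The negative of this function can then be handed to any general-purpose optimizer (for example `scipy.optimize.minimize`) to obtain the ML estimators, subject to the convergence caveats noted above.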
1.3 EM algorithm
The EM algorithm, introduced by Dempster et al. (1977), is known as the most popular method for computing the ML estimators when we encounter an incomplete data problem. In other words, the EM algorithm covers cases in which we deal with latent variables, provided that the statistical model is formulated as a missing or latent variable problem. In what follows, we give a brief description of the EM algorithm. Let $\mathbf{y}=(\mathbf{x},\mathbf{z})$, $\mathbf{z}$, and $\mathbf{x}$ denote the complete data, the unobservable variables, and the observed data, respectively (the complete data consist of the observed values and the unobservable variables). The EM algorithm works by maximizing the conditional expectation of the complete data log-likelihood function given the observed data and a current estimate $\theta^{(k)}$ of the parameter vector,

$$Q\bigl(\theta\mid\theta^{(k)}\bigr)=E\bigl[\ell_c(\theta;\mathbf{y})\mid\mathbf{x},\theta^{(k)}\bigr],$$

where $\ell_c$ denotes the complete data log-likelihood function. Each iteration of the EM algorithm consists of two steps:

Expectation (E)-step: Compute $Q(\theta\mid\theta^{(k)})$ at the $k$-th iteration.

Maximization (M)-step: Maximize $Q(\theta\mid\theta^{(k)})$ with respect to $\theta$ to get $\theta^{(k+1)}$.
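The two steps above can be organized as a generic loop; `e_step` and `m_step` stand in for the model-specific computations, and the stopping rule (a simple change-in-parameters criterion) is our own choice for illustration:

```python
def em(theta0, e_step, m_step, tol=1e-6, max_iter=500):
    """Generic EM iteration: alternate E- and M-steps until the parameter
    vector changes by less than tol in every coordinate."""
    theta = theta0
    for _ in range(max_iter):
        q = e_step(theta)        # E-step: conditional expectations defining Q(. | theta)
        theta_new = m_step(q)    # M-step: argmax of Q given those expectations
        if max(abs(a - b) for a, b in zip(theta, theta_new)) < tol:
            return theta_new
        theta = theta_new
    return theta
```

Section 2 specializes `e_step` and `m_step` to the GE family under progressive type-I interval censoring.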
2 EM algorithm for GE family under progressive type-I interval censoring scheme
Suppose failure times follow the GE distribution with pdf and cdf given by expressions (1) and (2), respectively. For convenience, we use the notation of Chen and Lio (2010). Let $x_{ij}$, for $j=1,\dots,X_i$, denote the independent and identically distributed (iid) failure times within the subinterval $(t_{i-1},t_i]$, for $i=1,\dots,m$, and let $z_{ij}$, for $j=1,\dots,R_i$, denote the iid failure times of the randomly removed items alive at the end of the subinterval $(t_{i-1},t_i]$, for $i=1,\dots,m$. Then the complete data log-likelihood, $\ell_c(\theta)$, is (see Chen and Lio, 2010)

$$\ell_c(\theta)=n\ln\alpha+n\ln\lambda-\lambda\sum_{i=1}^{m}\Bigl(\sum_{j=1}^{X_i}x_{ij}+\sum_{j=1}^{R_i}z_{ij}\Bigr)+(\alpha-1)\sum_{i=1}^{m}\Bigl(\sum_{j=1}^{X_i}\ln\bigl(1-e^{-\lambda x_{ij}}\bigr)+\sum_{j=1}^{R_i}\ln\bigl(1-e^{-\lambda z_{ij}}\bigr)\Bigr). \qquad (4)$$

In expression (4), we show the unobservable (or missing) variables by capital letters $X_{ij}$ (for $i=1,\dots,m$; $j=1,\dots,X_i$) and $Z_{ij}$ (for $i=1,\dots,m$; $j=1,\dots,R_i$), in which $n=\sum_{i=1}^{m}(X_i+R_i)$. The progressive type-I censoring scheme is an incomplete data problem. The observed values are the $X_i$s and $R_i$s, for $i=1,\dots,m$, and the unobservable variables are the $X_{ij}$ (iid failure times during the subinterval $(t_{i-1},t_i]$) and the $Z_{ij}$ (iid survival times of the items withdrawn at $t_i$). Therefore, under the EM algorithm framework mentioned in subsection 1.3, the vector of observed data is $\mathbf{x}=(X_1,\dots,X_m,R_1,\dots,R_m)$ and the vector of unobservable variables is $\mathbf{z}=(X_{ij},Z_{ij})$, for $i=1,\dots,m$. Assuming that we are at the $k$-th iteration, the EM algorithm proceeds by the following two steps.
E-step: we need to compute the conditional expectation of the complete data log-likelihood function. It follows, from (4), that

$$Q\bigl(\theta\mid\theta^{(k)}\bigr)=C+n\ln\alpha+n\ln\lambda-\lambda\sum_{i=1}^{m}\bigl(X_iE_{1i}+R_iE_{3i}\bigr)+(\alpha-1)\sum_{i=1}^{m}\bigl(X_iE_{2i}+R_iE_{4i}\bigr), \qquad (5)$$

where $C$ is a constant independent of $\alpha$ and $\lambda$. We note that the lifetimes of the unobserved items failing during the subinterval $(t_{i-1},t_i]$ are conditionally independent, identically distributed, and follow the GE distribution doubly truncated on the subinterval $(t_{i-1},t_i]$. Also, the lifetimes of the unobservable subjects withdrawn at the end of the subinterval $(t_{i-1},t_i]$ are conditionally independent, identically distributed, and follow the GE distribution truncated on $(t_i,\infty)$, for $i=1,\dots,m$. Therefore, considering the right-hand side of (5), the required conditional expectations are

$$E_{1i}=E\bigl[X\mid t_{i-1}<X\le t_i,\theta^{(k)}\bigr]=\frac{1}{\Delta_i^{(k)}}\int_{t_{i-1}}^{t_i}x\,f\bigl(x;\theta^{(k)}\bigr)\,dx, \qquad (6)$$

$$E_{2i}=E\bigl[\ln\bigl(1-e^{-\lambda X}\bigr)\mid t_{i-1}<X\le t_i,\theta^{(k)}\bigr]=\frac{1}{\Delta_i^{(k)}}\int_{t_{i-1}}^{t_i}\ln\bigl(1-e^{-\lambda x}\bigr)f\bigl(x;\theta^{(k)}\bigr)\,dx, \qquad (7)$$

$$E_{3i}=E\bigl[X\mid X>t_i,\theta^{(k)}\bigr]=\frac{1}{1-F\bigl(t_i;\theta^{(k)}\bigr)}\int_{t_i}^{\infty}x\,f\bigl(x;\theta^{(k)}\bigr)\,dx, \qquad (8)$$

$$E_{4i}=E\bigl[\ln\bigl(1-e^{-\lambda X}\bigr)\mid X>t_i,\theta^{(k)}\bigr]=\frac{1}{1-F\bigl(t_i;\theta^{(k)}\bigr)}\int_{t_i}^{\infty}\ln\bigl(1-e^{-\lambda x}\bigr)f\bigl(x;\theta^{(k)}\bigr)\,dx, \qquad (9)$$

where $\Delta_i^{(k)}=F\bigl(t_i;\theta^{(k)}\bigr)-F\bigl(t_{i-1};\theta^{(k)}\bigr)$ and $t_0=0$. Note that $E_{2i}$ and $E_{4i}$ are functions of the free parameter $\lambda$, while the expectations themselves are taken under the current estimate $\theta^{(k)}$.
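The conditional expectations (6)-(9) generally require one-dimensional numerical integration; for the right-tail expectations (8)-(9), the integral can be truncated at a large upper bound. A minimal sketch using the midpoint rule (the function names and the quadrature choice are ours):

```python
import math

def ge_pdf(x, alpha, lam):
    """GE density (1)."""
    return alpha * lam * math.exp(-lam * x) * (1.0 - math.exp(-lam * x)) ** (alpha - 1.0)

def ge_cdf(x, alpha, lam):
    """GE distribution function (2)."""
    return (1.0 - math.exp(-lam * x)) ** alpha

def trunc_expect(g, lo, hi, alpha, lam, n=2000):
    """E[g(X) | lo < X <= hi] under GE(alpha, lam), by the midpoint rule.

    Used with g(x) = x for (6) and (8), and g(x) = log(1 - exp(-lam*x))
    for (7) and (9); for (8)-(9) pass a large hi in place of infinity.
    """
    h = (hi - lo) / n
    xs = [lo + (j + 0.5) * h for j in range(n)]   # midpoints avoid the endpoints
    num = sum(g(x) * ge_pdf(x, alpha, lam) for x in xs) * h
    return num / (ge_cdf(hi, alpha, lam) - ge_cdf(lo, alpha, lam))
```

The midpoint rule is convenient here because $\ln(1-e^{-\lambda x})$ diverges at $x=0$, and midpoints keep the integrand finite.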
M-step: substituting the computed conditional expectations $E_{1i}$, $E_{2i}$, $E_{3i}$, and $E_{4i}$ given in (6)-(9) into the right-hand side of (5), we continue the EM algorithm by setting the derivatives with respect to the parameters equal to zero:

$$\frac{\partial Q}{\partial\alpha}=\frac{n}{\alpha}+\sum_{i=1}^{m}\bigl(X_iE_{2i}+R_iE_{4i}\bigr)=0, \qquad (10)$$

$$\frac{\partial Q}{\partial\lambda}=\frac{n}{\lambda}-\sum_{i=1}^{m}\bigl(X_iE_{1i}+R_iE_{3i}\bigr)+(\alpha-1)\frac{\partial}{\partial\lambda}\sum_{i=1}^{m}\bigl(X_iE_{2i}+R_iE_{4i}\bigr)=0. \qquad (11)$$

Solving (10) gives the closed-form update

$$\alpha^{(k+1)}=\frac{-n}{\sum_{i=1}^{m}\bigl(X_iE_{2i}+R_iE_{4i}\bigr)}, \qquad (12)$$

while $\lambda^{(k+1)}$ is obtained by solving the one-dimensional equation (11) numerically. The M-step is complete.
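Putting the two steps together, one EM iteration can be sketched as follows. This is our own illustrative implementation, not the authors' code: the expectations are computed by the midpoint rule, the right-tail integrals are truncated at `upper`, and the M-step maximizes $Q$ profiled over a coarse grid of $\lambda$ values, with $\alpha(\lambda)=-n/\sum_i(X_iE_{2i}+R_iE_{4i})$ the exact maximizer in $\alpha$ for each fixed $\lambda$:

```python
import math

def ge_pdf(x, a, l):
    return a * l * math.exp(-l * x) * (1.0 - math.exp(-l * x)) ** (a - 1.0)

def ge_cdf(x, a, l):
    return (1.0 - math.exp(-l * x)) ** a

def cexp(g, lo, hi, a, l, n=200):
    """E[g(X) | lo < X <= hi] under GE(a, l), via the midpoint rule."""
    h = (hi - lo) / n
    s = sum(g(lo + (j + 0.5) * h) * ge_pdf(lo + (j + 0.5) * h, a, l) for j in range(n))
    return s * h / (ge_cdf(hi, a, l) - ge_cdf(lo, a, l))

def em_update(theta, t, X, R, upper=60.0):
    """One EM iteration for (alpha, lambda); t, X, R as in subsection 1.2."""
    a, l = theta
    n = sum(X) + sum(R)
    bounds = [((t[i - 1] if i else 0.0), t[i]) for i in range(len(t))]
    # E-step: conditional means of X on each subinterval and on the right tail
    S1 = sum(X[i] * cexp(lambda x: x, lo, hi, a, l) +
             R[i] * cexp(lambda x: x, hi, upper, a, l)
             for i, (lo, hi) in enumerate(bounds))
    # E-step: conditional means of log(1 - exp(-lam*X)); lam is left free while
    # the expectations themselves are taken under the current (a, l)
    def S2(lam):
        g = lambda x: math.log1p(-math.exp(-lam * x))
        return sum(X[i] * cexp(g, lo, hi, a, l) + R[i] * cexp(g, hi, upper, a, l)
                   for i, (lo, hi) in enumerate(bounds))
    # M-step: maximize Q profiled over lambda; alpha(lam) = -n / S2(lam)
    def profile_q(lam):
        s2 = S2(lam)
        return n * math.log(-n / s2) + n * math.log(lam) - lam * S1 - n - s2
    grid = [0.1 * k for k in range(1, 51)]   # coarse search; refine as needed
    l_new = max(grid, key=profile_q)
    return (-n / S2(l_new), l_new)
```

In practice, the grid search over $\lambda$ would be replaced by a proper one-dimensional root-finder or optimizer; the grid merely keeps the sketch self-contained.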
We mention that the EM algorithm proposed by Chen and Lio (2010) is incorrect, since they took the expectation of the complete data log-likelihood function after differentiating it with respect to the parameters, which is not usual in the EM framework. Starting from initial values $\theta^{(0)}=(\alpha^{(0)},\lambda^{(0)})$ and repeating the E-step and M-step described above, the EM estimators are obtained. Compare the updated shape and rate parameters at the $(k+1)$-th iteration given in (11) and (12) with those given by Chen and Lio (2010). The updated shape parameters are the same, but there is a significant difference between the updated rate parameter given here and that given by Chen and Lio (2010). Although the difference between the rate parameters is theoretically significant, we perform a simulation study in the next section to observe the differences visually.
3 Simulation study
Here, we perform a simulation study to compare the performance of three estimators, namely the proposed EM algorithm, the ML approach, and the EM algorithm proposed by Chen and Lio (2010), for estimating the parameters of the GE distribution when items are placed under a progressive type-I censoring scheme. For simulating a scheme, we use the algorithm proposed by Chen and Lio (2010). We consider four scenarios for the removal percentages $p_1,\dots,p_m$.
Under each of the above four scenarios, we simulate observations from the GE distribution with given shape parameter $\alpha$ and rate parameter $\lambda$ and pre-specified inspection times $t_1<\dots<t_m$, with termination time $t_m$. These settings were used by Chen and Lio (2010). We run the simulations 1000 times, with the ML method, the EM algorithm proposed in this paper (called here EM), and the EM algorithm proposed by Chen and Lio (2010) (called here EM-Chen) taking part in the competition. The same starting values and stopping criterion are used for both the EM and EM-Chen algorithms. The time series plots of the estimators are displayed in Figures 1-2. The summary statistics, including the bias and mean squared error (MSE) of the estimators, are given in Table 1. Recall that the EM and EM-Chen algorithms give the same estimator for the shape parameter, and hence one of the two traces disappears in the left-hand side subfigures of Figures 1-2. As seen from Table 1, the proposed EM algorithm outperforms the EM-Chen algorithm under the first, second, and fourth scenarios in terms of bias, and it outperforms the EM-Chen algorithm under all four scenarios in terms of MSE. Also, the EM algorithm shows better performance than the ML approach under the first scenario in terms of both the bias and MSE criteria.
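For generating a progressively type-I interval-censored sample, one convenient approach (our own sketch; Chen and Lio (2010) describe their generation algorithm in detail) draws the interval counts sequentially: given survival past $t_{i-1}$, an item fails in $(t_{i-1},t_i]$ with probability $q_i=(F(t_i)-F(t_{i-1}))/(1-F(t_{i-1}))$, so $X_i$ is binomial, after which $R_i=\lfloor p_iY_i\rfloor$ survivors are withdrawn:

```python
import math
import random

def ge_cdf(x, alpha, lam):
    """GE distribution function (2)."""
    return (1.0 - math.exp(-lam * x)) ** alpha if x > 0 else 0.0

def simulate_scheme(n, t, p, alpha, lam, rng=random):
    """Generate counts (X, R) under progressive type-I interval censoring."""
    X, R, alive, prev = [], [], n, 0.0
    for ti, pi in zip(t, p):
        # conditional failure probability for an item alive at t_{i-1}
        qi = (ge_cdf(ti, alpha, lam) - ge_cdf(prev, alpha, lam)) / (1.0 - ge_cdf(prev, alpha, lam))
        xi = sum(rng.random() < qi for _ in range(alive))   # Binomial(alive, qi)
        alive -= xi
        ri = math.floor(pi * alive)                          # withdraw floor(p_i * survivors)
        alive -= ri
        X.append(xi)
        R.append(ri)
        prev = ti
    return X, R
```

With $p_m=1$ every item is accounted for by the end of the test, so $\sum_i(X_i+R_i)=n$, which is a useful invariant for checking an implementation.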
We have discovered that the EM algorithm proposed by Chen and Lio (Computational Statistics and Data Analysis 54: 1581-1591, 2010) for estimating the parameters of the generalized exponential distribution under a progressive type-I censoring scheme is incorrect. Here, the corrected EM algorithm is proposed, and a comparison study has been made to uncover the differences. Theoretically, there is no difference between the shape estimator of our proposed EM algorithm and that of Chen and Lio (2010); for the rate parameter, however, the difference is quite significant. A simulation study has been performed to show visually the differences in performance between our proposed EM algorithm, the maximum likelihood estimators, and the EM algorithm proposed by Chen and Lio (2010). We note that both our proposed EM algorithm and the EM algorithm of Chen and Lio (2010) converge under all four scenarios within 20 iterations.
-  Aggarwala, R. (2001). Progressive interval censoring: Some mathematical results with applications to inference, Communications in Statistics-Theory and Methods, 30, 1921-1935.
-  Chen, D. G. and Lio, Y. L. (2010). Parameter estimations for generalized exponential distribution under progressive type-I interval censoring, Computational Statistics and Data Analysis, 54, 1581-1591.
-  Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society Series B, 39, 1-38.
-  Gupta, R. D. and Kundu, D. (2007). Generalized exponential distribution: Existing results and some recent developments, Journal of Statistical Planning and Inference, 137, 3537-3547.
-  Little, R. J. A. and Rubin, D. B. (1983). Incomplete data, in Encyclopedia of Statistical Sciences, S. Kotz and N.L. Johnson, eds., Vol. 4, John Wiley, New York, pp. 46-53.
-  McLachlan, G. J. and Krishnan, T. (1997). The EM Algorithm and Extensions, John Wiley, New York.
-  Mudholkar, G. S. and Srivastava, D. K. (1993). Exponentiated Weibull family for analyzing bathtub failure data, IEEE Transactions on Reliability, 42, 299-302.
-  Teimouri, M., Nadarajah, S., and Shou, H. S. (2014). EM algorithms for beta kernel distributions, Journal of Statistical Computation and Simulation, 84(2), 451-467.