1 Introduction
Statistical inference for remaining lifetimes is intuitively more appealing than inference based on the popular hazard rate function, whose interpretation as “the risk of immediate failure” can be difficult to grasp. The mean residual life (or mean excess loss) function, which represents “the average remaining time before failure”, is easier to understand. The mean residual life (MRL for short) function is of interest in many fields relating to time and finance, such as biomedical theory, survival analysis, and actuarial science.
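Let $X_1, X_2, \dots, X_n$ be independently and identically distributed absolutely continuous random variables supported on an interval $\Omega \subseteq \mathbb{R}$, where $\inf \Omega = a_1$, $\sup \Omega = a_2$, and $-\infty \le a_1 < a_2 \le \infty$. Also, let $f_X$ be the density function, $F_X$ be the cumulative distribution function, $S_X(t) = \Pr(X > t)$ be the survival function, and $\mathbb{S}_X(t) = \int_t^{\infty} S_X(x)\,dx$ be the cumulative survival function of $X$. Then
$$m_X(t) = E(X - t \mid X > t), \qquad t \in \Omega, \tag{1}$$
is the definition of the mean residual life function, which can also be written as
$$m_X(t) = \frac{\mathbb{S}_X(t)}{S_X(t)}. \tag{2}$$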
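As a quick numerical illustration of the ratio form of the MRL function, the following minimal sketch evaluates the cumulative survival function of an exponential lifetime by the trapezoidal rule and divides it by the survival function. The exponential model, the rate value, the truncation point `upper`, and all function names are assumptions chosen for illustration only. Because the exponential distribution is memoryless, the result should be close to the reciprocal of the rate for every evaluation point.

```python
import math


def survival(t, rate=2.0):
    """Survival function P(X > t) of an exponential lifetime (assumed model)."""
    return math.exp(-rate * t)


def cumulative_survival(t, rate=2.0, upper=40.0, steps=100_000):
    """Trapezoidal approximation of the integral of the survival function on [t, upper].

    The infinite upper limit is truncated at `upper`, where the exponential
    tail is negligible for the rate used here.
    """
    width = (upper - t) / steps
    total = 0.5 * (survival(t, rate) + survival(upper, rate))
    for k in range(1, steps):
        total += survival(t + k * width, rate)
    return total * width


def mean_residual_life(t, rate=2.0):
    """MRL as the ratio of the cumulative survival function to the survival function."""
    return cumulative_survival(t, rate) / survival(t, rate)


if __name__ == "__main__":
    # Both values should be close to 1 / rate = 0.5.
    print(mean_residual_life(0.0), mean_residual_life(1.5))
```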
For a detailed discussion about the MRL function, see Embrechts et al. [1] or Guess and Proschan [2]. Murari and Sujit [3] and Belzunce et al. [4] discussed the use of the MRL function for ordering and classifying distributions. On the other hand, Cox [5], Kotz and Shanbhag [6], and Zoroa et al. [7] proposed how to determine a distribution via an inversion formula of the MRL function. Ruiz and Navarro [8] considered the problem of characterizing the distribution function through the relationship between the MRL function and the hazard rate function. The MRL functions of finite mixtures and order statistics have been studied as well by Navarro and Hernandez [9].
Properties and applications of the MRL concept in operational research and reliability engineering have also received attention. While Nanda et al. [10] discussed the properties of associated orderings in the MRL function, Huynh et al. [11] studied the usefulness of MRL models for maintenance decision-making. Other examples include the MRL functions of a parallel system by Sadegh [12], the MRL for records by Raqab and Asadi [13], the MRL of a $k$-out-of-$n$:G system by Eryilmaz [14], the MRL of an $(n-k+1)$-out-of-$n$ system by Poursaeed [15], the MRL in reliability shock models by Eryilmaz [16], the MRL subjected to Marshall-Olkin type shocks by Bayramoglu and Ozkut [17], the MRL of coherent systems by Eryilmaz et al. [18] and Kavlak [19], the MRL for degrading systems by Zhao et al. [20], and the MRL of rail wagon bearings by Ghasemi and Hodkiewicz [21].
The natural estimator of the MRL function is the empirical one, defined as
$$m_n(t) = \frac{\sum_{i=1}^{n} (X_i - t)\, I(X_i > t)}{\sum_{i=1}^{n} I(X_i > t)}, \tag{3}$$
where $I(A)$ is the usual indicator function on set $A$. Yang [22], Ebrahimi [23], and Csörgő and Zitikis [24] studied the properties of $m_n(t)$. Even though it has several good attributes (e.g. unbiasedness and consistency), the empirical MRL function is only a rough estimate of the MRL function and lacks smoothness. Estimation is also impossible for large $t$, because $\sum_{i=1}^{n} I(X_i > t) = 0$ for $t > \max_i X_i$. Though we can simply define $m_n(t) = 0$ in such a case, this is a major disadvantage when the behaviour of the MRL function for large $t$ is of interest.
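The empirical estimator described above can be sketched as follows. This is a minimal illustration rather than a reference implementation; the function name and the toy sample are hypothetical, and the value 0 is returned when no observation exceeds the evaluation point, following the convention mentioned in the text.

```python
def empirical_mrl(sample, t):
    """Empirical mean residual life at t.

    Averages the residuals x - t over the observations that exceed t.
    Returns 0.0 when no observation exceeds t, which is the convention
    used when the denominator would otherwise be zero.
    """
    residuals = [x - t for x in sample if x > t]
    if not residuals:
        return 0.0
    return sum(residuals) / len(residuals)


if __name__ == "__main__":
    toy_sample = [1.0, 2.0, 3.0, 4.0]  # hypothetical data
    print(empirical_mrl(toy_sample, 2.0))  # residuals 1.0 and 2.0, average 1.5
    print(empirical_mrl(toy_sample, 5.0))  # beyond the largest observation: 0.0
```

The second call shows the drawback discussed in the text: beyond the largest observation the estimate collapses to the default value, whatever the true tail behaviour is.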
Various parametric models of the MRL have been discussed in the literature, for example the transformed parametric MRL models by Sun and Zhang [25], the upside-down bathtub-shaped MRL model by Shen et al. [26], the MRL order of convolutions of heterogeneous exponential random variables by Zhao and Balakrishnan [27], the proportional MRL model by Nanda et al. [28] and Chan et al. [29], and the MRL models with time-dependent coefficients by Sun et al. [30].
Several nonparametric estimators of the MRL function that are related to the empirical one have also been discussed in the literature. For example, Ruiz and Guillamón [31] estimated the numerator of the empirical estimator by a recursive kernel estimate and left the empirical survival function unchanged, while Chaubey and Sen [32] used Hille’s Theorem (Hille [33]) to smooth both the numerator and the denominator of the empirical estimator.
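Another approach to estimating the MRL function nonparametrically is the kernel method. Let $K$ be a symmetric continuous nonnegative kernel function with $\int_{-\infty}^{\infty} K(x)\,dx = 1$, and let $h > 0$ be a bandwidth satisfying $h \to 0$ and $nh \to \infty$ when $n \to \infty$. From $K$ we derive three other functions,
$$W(x) = \int_{-\infty}^{x} K(z)\,dz, \qquad V(x) = \int_{x}^{\infty} K(z)\,dz, \qquad \mathbb{V}(x) = \int_{x}^{\infty} V(z)\,dz. \tag{4}$$
Hence, the naive kernel MRL function estimator can be defined as
$$\widehat{m}_X(t) = \frac{h \sum_{i=1}^{n} \mathbb{V}\left(\frac{t - X_i}{h}\right)}{\sum_{i=1}^{n} V\left(\frac{t - X_i}{h}\right)}. \tag{5}$$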
Guillamón et al. [34] discussed the asymptotic properties of the naive kernel MRL function estimator in detail.
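A minimal sketch of a naive kernel MRL estimator is given below, assuming the estimator is obtained by integrating the ordinary kernel density estimate once (survival function) and twice (cumulative survival function). The closed forms are worked out here for the Epanechnikov kernel purely for illustration; the function names, the toy sample, and the bandwidth are hypothetical choices and are not taken from the original study.

```python
def kernel_survival(x):
    """Integral of the Epanechnikov kernel 0.75 * (1 - z^2) over [x, infinity)."""
    if x <= -1.0:
        return 1.0
    if x >= 1.0:
        return 0.0
    return 0.5 - 0.75 * x + 0.25 * x ** 3


def kernel_cumulative_survival(x):
    """Integral of kernel_survival over [x, infinity)."""
    if x <= -1.0:
        return -x
    if x >= 1.0:
        return 0.0
    return 0.1875 - 0.5 * x + 0.375 * x ** 2 - 0.0625 * x ** 4


def naive_kernel_mrl(sample, t, h):
    """Ratio of the kernel cumulative survival estimate to the kernel survival estimate."""
    numerator = h * sum(kernel_cumulative_survival((t - x) / h) for x in sample)
    denominator = sum(kernel_survival((t - x) / h) for x in sample)
    # Fall back to 0.0 when the smoothed survival estimate vanishes.
    return numerator / denominator if denominator > 0.0 else 0.0


if __name__ == "__main__":
    toy_sample = [1.0, 2.0, 3.0, 4.0]  # hypothetical data
    print(naive_kernel_mrl(toy_sample, 2.5, 0.25))
```

With this toy sample and a bandwidth small enough that no observation lies within one bandwidth of the evaluation point, the smoothed estimate coincides with the empirical one; the two differ once observations fall inside the kernel window.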
However, as the MRL function is usually used for time- or finance-related data, which lie on the nonnegative real line or a bounded interval, the naive kernel MRL function estimator suffers from the so-called boundary bias problem. In the case of (or ), the boundary effects of when (or ) is not as bad as in the kernel density estimator, but the problems still occur. It is because the term and in the and can never be since , which means causes the boundary problems for . Moreover, in the case of and (e.g. uniform distribution), not only , but also adds its share to the boundary problems for .
To make things worse, the naive kernel MRL function estimator does not preserve one of the most important properties of the MRL function, which is . It is reasonable if we expect . However, is less than and is smaller than the average value of , due to the weight that they still put on the outside of . Accordingly, there is no guarantee of how far or how close is to . Some simulations in Section 4.1 illustrate this statement.
Several articles have suggested methods to solve the boundary bias problem in density estimation, such as data reflection by Schuster [35]; simple nonnegative boundary correction by Jones and Foster [36]; boundary kernels by Müller [37], Müller [38], and Müller and Wang [39]; generating pseudo data by Cowling and Hall [40]; a hybrid method by Hall and Wehrly [41]; and local polynomial fitting by Fan and Gijbels [42]. Although only a few works have discussed how to extend these ideas to MRL function estimation, Abdous and Berred [43] successfully adopted the idea of local polynomial fitting (linear in their case) for MRL function estimation.
In this article we try another idea to remove the boundary effects, namely utilizing transformations that map the real line onto the support bijectively. In this situation there are no boundary effects at all, as we do not put any weight outside the support. Hence, instead of using $X_1, \dots, X_n$, we apply the kernel method to the transformed $Y_1, \dots, Y_n$, where $Y_i = g^{-1}(X_i)$ and $g$ is a bijective function. However, even though the idea is easy to understand, we cannot just substitute $t$ with $g^{-1}(t)$ and $X_i$ with $g^{-1}(X_i)$ in the formula of the naive kernel estimator, because doing so leads to nonintegrability. We need to modify the naive kernel MRL function estimator before substituting $g^{-1}(t)$ and $g^{-1}(X_i)$ in order to preserve the integrability and to ensure that the new formulas are good estimators of the mean excess loss function.
Before moving on to our main focus, we need to impose some conditions:
C1. The kernel $K$ is a continuous nonnegative function, symmetric about $0$, with $\int_{-\infty}^{\infty} K(x)\,dx = 1$.
C2. The bandwidth $h > 0$ satisfies $h \to 0$ and $nh \to \infty$ when $n \to \infty$.
C3. The function $g$ is continuous and strictly increasing.
C4. The density $f_X$ and the function $g$ are continuously differentiable at least twice.
C5. The integrals and are finite for all in an neighbourhood of the origin.
C6. The expectations , , and exist.
The first and the second conditions are standard assumptions for kernel methods, and C3 is needed for the bijectivity and the simplicity of the transformation. Since we will use some expansions of the survival and the cumulative survival functions, C4 is important to ensure the validity of our proofs. The last two conditions are necessary to make sure we can derive the bias and the variance formulas. In order to calculate the variances, we also define a new function
(6) 
for simpler notation. Some numerical studies are discussed in Section 4, and the detailed proofs can be found in the appendices.
2 Estimators of the survival function and the cumulative survival function
Before turning to the estimation of the mean residual life function, we first discuss the estimation of each of its components, namely the survival function and the cumulative survival function . In this article, we propose two sets of estimators using the idea of transformation. Based on those two sets of estimators, we will propose two estimators of the MRL function in Section 3.
Geenens [44] and Wen and Wu [45] used the probit transformation to eliminate the boundary bias problems in kernel density estimation for data on the unit interval. If we generalize their idea to any interval and any function $g$ that satisfies the conditions stated before, then we have
$$\widehat{f}_X(x) = \frac{1}{nh\,g'(g^{-1}(x))} \sum_{i=1}^{n} K\left(\frac{g^{-1}(x) - g^{-1}(X_i)}{h}\right)$$
as the generalized boundary-free kernel density estimator by transformation. Then, by a simple substitution technique, the first proposed survival function estimator is
(7) 
where
(8) 
Using the same approach, we define the first proposed cumulative survival function estimator as
(9) 
where
(10) 
Their biases and variances are given in the following theorem.
Theorem 2.1.
Under conditions C1-C6, the biases and the variances of and are
(11)  
(12) 
and
(13)  
(14) 
where
(15)  
(16) 
Furthermore, the covariance of them is
(17) 
Remark 2.2.
Because , it means that our first set of estimators preserves the relationship between the theoretical and .
We have utilized the relationship among the density, survival, and cumulative survival functions to construct the first set of estimators. We now use a different approach to build our second set of estimators. The second proposed survival function estimator is defined as
(18) 
where
(19) 
As we can see, is basically just the result of a simple substitution of and into the formula of . This can be done due to the change-of-variable property of the survival function (for a brief explanation of the change-of-variable property, see Lemma A.2). Though it is a bit trickier, the change-of-variable property of the cumulative survival function leads us to the construction of our second proposed cumulative survival function estimator, which is
(20) 
where
(21) 
In the above formula, multiplying with is necessary to make sure that is an estimator of (see equation (43)). Now, with and , their biases and variances are as follows.
Theorem 2.3.
Under conditions C1-C6, the biases and the variances of and are
(22)  
(23) 
and
(24)  
(25) 
where
(26) 
Furthermore, the covariance of them is
(27) 
Remark 2.4.
Remark 2.5.
We can prove that both and are always equal to (see Appendix G), and it is obvious that both and are . Hence, it is clear that their variances are when approaches the boundaries. This is one of the reasons our proposed methods outperform the naive kernel estimator.
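The substitution idea behind the second survival function estimator can be sketched as follows: the evaluation point and the observations are mapped to the real line by the inverse transformation, and an ordinary kernel survival estimate is computed there. The sketch assumes data on the positive half-line with the natural logarithm as the inverse transformation and uses the Epanechnikov kernel; these choices, the function names, and the toy sample are illustrative assumptions, not the paper's implementation.

```python
import math


def kernel_survival(x):
    """Integral of the Epanechnikov kernel 0.75 * (1 - z^2) over [x, infinity)."""
    if x <= -1.0:
        return 1.0
    if x >= 1.0:
        return 0.0
    return 0.5 - 0.75 * x + 0.25 * x ** 3


def transformed_survival(sample, t, h, g_inv=math.log):
    """Kernel survival estimate computed on the transformed scale.

    g_inv maps the support onto the real line; math.log is an assumed
    choice that is suitable for data on (0, infinity).
    """
    y = g_inv(t)
    return sum(kernel_survival((y - g_inv(x)) / h) for x in sample) / len(sample)


if __name__ == "__main__":
    toy_sample = [math.exp(k) for k in range(4)]  # hypothetical positive data
    for t in (1e-9, math.exp(1.5), 1e9):
        print(t, transformed_survival(toy_sample, t, 0.25))
```

In this sketch the estimate equals 1 for evaluation points close to the lower boundary and 0 far in the upper tail, because the transformed evaluation point moves towards minus or plus infinity and no kernel weight is placed outside the support.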
3 Estimators of the mean residual life function
In this section, we will discuss the estimation for the mean residual life function. As we already have defined the survival function and the cumulative survival function estimators, we just need to plug them into the MRL function formula. Hence, our proposed estimators of the mean excess loss function are
(28) 
and
(29) 
At first glance, seems more representative of the theoretical , since the mathematical relationship between and is the same as the relationship between the numerator and the denominator of , as stated in Remark 2.2. This is not a major problem for , as we stated in Remark 2.4 that the relationship between and is statistically the same as the relationship between and . However, when a statistician wants to keep the mathematical relationship between the survival and the cumulative survival functions in their estimates, it is suggested to use instead.
Theorem 3.1.
Under conditions C1-C6, the biases and the variances of , , are
(30)  
(31)  
(32) 
where
(33) 
Like most kernel-type estimators, our proposed estimators attain asymptotic normality, as stated in Theorem 3.2.
Theorem 3.2.
Under conditions C1-C6, the limiting distribution
(34) 
holds for .
Furthermore, we also establish strong consistency of the proposed estimators in the form of the following theorem.
Theorem 3.3.
Under conditions C1-C6, the consistency
(35) 
holds for .
The last property that we would like to discuss is the behaviour of our proposed estimators when is in the boundary regions. As stated in Section 1, we want our estimators to preserve the behaviour of the theoretical MRL function, specifically the property of . If we can prove this, then not only will our proposed methods be free of boundary problems, but also superior in the sense of them preserving the key property of the MRL function.
Theorem 3.4.
Let and be the transformed kernel mean residual life function estimators. Then
(36) 
and
(37) 
Remark 3.5.
Please note that, although for convenience it is written as (or ), we actually mean it as (or ), since (or ) might be undefined.
Remark 3.6.
From equation (37), we can say that is unbiased, because
In other words, its bias is exactly . On the other hand, even though is not exactly the same as , we can at least say they are close enough, and the rate of error is relatively small. From this we may conclude that is superior to in preserving the behaviour of the MRL function near the boundary.
4 Numerical studies
In this section, we show the results of our numerical studies. The studies are divided into two parts, the simulations and the real data analysis.
4.1 Simulation results
In this study, we calculated the average integrated squared error (AISE) and the average squared error (ASE) with several sample sizes, and repeated them times for each case. We compared four estimators: empirical ; naive kernel ; and our two proposed estimators and . The distributions which we generated are standard uniform , beta , gamma , Weibull , and absolute-normal distributions. For and , we took , the standard normal distribution function; and we chose for the rest. The kernel function we used here is the Epanechnikov kernel and the bandwidths were chosen by cross-validation. We also ran the same simulation study using the Gaussian kernel, but the results were quite similar, so we do not show them in this article.
(Table 1) compares the AISE in order to illustrate the general measure of error among the estimators. (Table 2) compares the ASE of each estimate when , as a representation of the error when is in the boundary region. For (Table 3), the ASE at represents the error when the point of evaluation is moderate. The last table represents the error of the estimators when is large enough.
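To make the error measure concrete, the following minimal sketch approximates an ASE by Monte Carlo: the squared difference between an estimate and the true MRL value at a fixed point is averaged over repeated samples. It assumes this standard reading of the ASE, uses the empirical estimator and a unit-rate exponential model (whose true MRL is 1 everywhere), and fixes the sample size, number of repetitions, and seed arbitrarily; none of these settings are those of the original study.

```python
import random


def empirical_mrl(sample, t):
    """Empirical mean residual life at t (0.0 when no observation exceeds t)."""
    residuals = [x - t for x in sample if x > t]
    return sum(residuals) / len(residuals) if residuals else 0.0


def average_squared_error(estimator, t, true_value, n=200, repetitions=200,
                          rate=1.0, seed=0):
    """Monte Carlo average of (estimate - true_value)^2 at the point t.

    Samples are drawn from an exponential distribution with the given rate;
    this data-generating model is an assumption made for illustration.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(repetitions):
        sample = [rng.expovariate(rate) for _ in range(n)]
        total += (estimator(sample, t) - true_value) ** 2
    return total / repetitions


if __name__ == "__main__":
    # The true MRL of a unit-rate exponential distribution equals 1 at every t.
    print(average_squared_error(empirical_mrl, 0.5, 1.0))
```

Any other estimator with the same `(sample, t)` signature can be passed in place of `empirical_mrl` to compare errors at the same evaluation point.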
As we can see in the tables, our proposed estimators gave the best results, and in most cases the best of all was our second proposed estimator. Though the performance of our first proposed estimator is not as good as that of the second one, it is still fairly comparable because the differences are not large. Furthermore, the first proposed estimator is better than the empirical and the naive kernel estimators in most cases.
(Table 2) is of particular interest, as the empirical MRL function gave results similar to those of our second proposed estimator. However, this is reasonable due to the fact that , same as according to Theorem 3.4. In (Table 3), even though our second estimator still outperformed the others, the margins of difference with the other estimators are not big. This can be explained by the fact that has high density, so there are neither boundary problems nor a lack of data as in the tail. However, (Table 4) tells another story. As the tail of the distribution has a lower density of data, the empirical and naive kernel estimators dropped to $0$ quickly. This explains why their ASE values are much larger than those of both proposed estimators.
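Table 1. AISE of the estimators. Columns: Distributions | Empirical | Naive | Proposed 1 | Proposed 2
Table 2. ASE of the estimators in the boundary region. Columns: Distributions | Empirical | Naive | Proposed 1 | Proposed 2
Table 3. ASE of the estimators at a moderate point of evaluation. Columns: Distributions | Empirical | Naive | Proposed 1 | Proposed 2
Table 4. ASE of the estimators at a large point of evaluation. Columns: Distributions | Empirical | Naive | Proposed 1 | Proposed 2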
As further illustrations, we also provide some graphs to compare the performance of our proposed estimators with that of the other estimators. (Figure 1) compares the graphs of the empirical, the naive kernel, and our two proposed estimators. (Figure 2) compares the pointwise simulated bias of the same estimators. From these, we can say that our proposed estimators outperformed the empirical and the naive kernel estimators.
There are three things that we want to emphasize from these figures. First, instead of resembling the theoretical shape, the graphs of the naive kernel estimator are more like a smoothed version of the graphs of the empirical estimator, especially in (Figure 1(a)) and (Figure 1(b)). This is somewhat interesting, as empirical-type estimators (e.g. the empirical distribution function), despite their lack of smoothness, usually resemble the shape of the theoretical ones quite well. However, in the case of the MRL function, the empirical MRL function cannot be used as a reference, because its shape is too unstable and too different from the theoretical shape (see (Figure 1(a)), (Figure 1(b)), and (Figure 1(d))). The same goes for the naive kernel MRL function estimator. Even though (Figure 1(c)) and (Figure 1(d)) show that the naive kernel estimator has nice graphs, it performed fairly poorly in (Figure 1(a)) and (Figure 1(b)). On the other hand, the graphs of our proposed estimators resemble the theoretical ones. The difference is quite striking in (Figure 1(a)), where the empirical and naive kernel estimators are jumpy, but the proposed estimators gave stable and almost straight-line graphs.
Second, from all figures we can see that the boundary bias problems affect the naive kernel estimator severely, as (Figure 2) shows that the simulated bias values of the naive kernel estimator near the boundary are the farthest from $0$. We can also conclude that the empirical MRL function does not suffer from the boundary bias problems, as its bias there is close to $0$. However, as $t$ becomes larger, its bias quickly drops to negative values, especially in (Figure 2(a)). In contrast, our estimators, especially the second one, gave an almost straight line at ordinate $0$ in (Figure 2(b)), which means its simulated bias is almost always $0$. Third, we can conclude that, though all of the graphs of the estimators presented here fade to $0$ when $t$ is large enough, our proposed estimators are more stable and fade to $0$ much more slowly than the other two estimators.
4.2 Real data analysis
In this analysis, we used the UIS Drug Treatment Study Data from [46] to show the performance of our proposed methods on real data. The data set records the results of an experiment on how long it takes someone who received drug treatment to relapse (use the drug again). The variable we used in the calculation is the "time" variable, which represents the number of days from admission to drug treatment until drug relapse.
(Figure 3) shows that, once again, the naive kernel estimator is just a smoothed version of the empirical MRL function. Furthermore, soon after the empirical estimate touches $0$, the naive kernel estimate also reaches $0$. In contrast, though our proposed estimators are decreasing as well, they decrease much more slowly than the other two.
5 Conclusion
This article has proposed two new estimators for the mean residual life function (as well as for the survival and the cumulative survival functions) when the data are not supported on the entire real line. First, we constructed two new estimators for both the survival function and the cumulative survival function using a bijective transformation. After deriving the formulas for the biases and the variances of the estimators, we defined two estimators for the MRL function. The properties of our proposed methods have been derived and discussed. Moreover, the results of the numerical studies reveal the superior performance of the proposed estimators. For future research, establishing new estimators using a similar idea for other functions, such as the distribution function, the hazard rate function, or regression of the MRL function, will be a valuable contribution to this field.
References
[1] Embrechts P, Klüppelberg C, Mikosch T. Modelling extremal events. Berlin: Springer; 1997.
[2] Guess F, Proschan F. Mean residual life. In: Krishnaiah PR, Rao CR, editors. Handbook of statistics. Vol. 7. Amsterdam: North-Holland; 1988. p. 215–224.
[3] Murari M, Sujit K. Change point estimation in non-monotonic aging models. Ann I Stat Math. 1995;3:483–491.
[4] Belzunce F, Ruiz JM, Pellerey F, Shaked M. The dilation order, the dispersion order, and orderings of residual lifes. Stat Probabil Lett. 1996;33:263–275.
[5] Cox DR. Renewal Theory. London: Methuen; 1962.
[6] Kotz S, Shanbhag D. Some new approaches to probability distributions. Adv Appl Probab. 1980;12:903–921.
[7] Zoroa P, Ruiz JM, Marlin J. A characterization based on conditional expectations. Commun Stat - Theor M. 1990;19:3127–3135.
[8] Ruiz JM, Navarro J. Characterization of distributions by relationships between the failure rate and the mean residual life. IEEE T Reliab. 1994;43:640–644.
[9] Navarro J, Hernandez PJ. Mean residual life functions of finite mixtures, order statistics, and coherent systems. Metrika. 2008;67:277–298.
[10] Nanda AK, Bhattacharjee S, Balakrishnan N. Mean residual life function, associated orderings, and properties. IEEE T Reliab. 2010;59(1):55–65.
[11] Huynh KT, Castro IT, Barros A, Bérenguer C. On the use of mean residual life as a condition index for condition-based maintenance decision-making. IEEE T Syst Man CyS. 2014;44(7):877–893.
[12] Sadegh MK. Mean past and mean residual life functions of a parallel system with nonidentical components. Commun Stat - Theor M. 2008;37(7):1134–1145.
[13] Raqab MZ, Asadi M. On the mean residual life of records. J Stat Plan Infer. 2008;138:3660–3666.
[14] Eryilmaz S. On the mean residual life of a k-out-of-n:G system with a single cold standby component. Eur J Oper Res. 2012;222:273–277.
[15] Poursaeed MH. A note on the mean past and the mean residual life of a (n-k+1)-out-of-n system under multi monitoring. Stat Pap. 2010;51:409–419.
[16] Eryilmaz S. Computing optimal replacement time and mean residual life in reliability shock models. Comput Ind Eng. 2017;103:40–45.
[17] Bayramoglu I, Ozkut M. Mean residual life and inactivity time of a coherent system subjected to Marshall–Olkin type shocks. J Comput Appl Math. 2016;298:190–200.
[18] Eryilmaz S, Coolen FPA, Coolen-Maturi T. Mean residual life of coherent systems consisting of multiple types of dependent components. Nav Res Log. 2018;65(1):86–97.
[19] Kavlak KB. Reliability and mean residual life functions of coherent systems in an active redundancy. Nav Res Log. 2017;64(1):19–28.
[20] Zhao S, Makis V, Chen S, Li Y. Evaluation of reliability function and mean residual life for degrading systems subject to condition monitoring and random failure. IEEE T Reliab. 2018;67(1):13–25.
[21] Ghasemi A, Hodkiewicz MR. Estimating mean residual life for a case study of rail wagon bearings. IEEE T Reliab. 2012;61(3):719–730.
[22] Yang GL. Estimation of a biometric function. Ann Stat. 1978;6:112–116.
[23] Ebrahimi N. On estimating change point in a mean residual life function. Sankhya Ser A. 1991;53:206–219.
[24] Csörgő M, Zitikis R. Mean residual life processes. Ann Stat. 1996;24:1717–1739.
[25] Sun L, Zhang Z. A class of transformed mean residual life models with censored survival data. J Am Stat Assoc. 2009;104(486):803–815.
[26] Shen Y, Tang LC, Xie M. A model for upside-down bathtub-shaped mean residual life and its properties. IEEE T Reliab. 2009;58(3):425–431.
[27] Zhao P, Balakrishnan N. Mean residual life order of convolutions of heterogeneous exponential random variables. J Multivariate Anal. 2009;100:1792–1801.
[28] Nanda AK, Bhattacharjee S, Alam SS. Properties of proportional mean residual life model. Stat Probabil Lett. 2006;76:880–890.
[29] Chan KCG, Chen YQ, Di CZ. Proportional mean residual life model for right-censored length-biased data. Biometrika. 2012;99(4):995–1000.
[30] Sun L, Song X, Zhang Z. Mean residual life models with time-dependent coefficients under right censoring. Biometrika. 2012;99(1):185–197.
[31] Ruiz JM, Guillamón A. Nonparametric recursive estimator for mean residual life and vitality function under dependence conditions. Commun Stat - Theor M. 1996;25:1997–2011.
[32] Chaubey YP, Sen PK. On smooth estimation of mean residual life. J Stat Plan Infer. 1999;75:223–236.
[33] Hille E. American mathematical society colloquium publications. Vol. 31. New York: American Mathematical Society; 1948. Functional analysis and semi-groups.
[34] Guillamón A, Navarro J, Ruiz JM. Nonparametric estimator for mean residual life and vitality function. Stat Pap. 1998;39:263–276.
[35] Schuster EF. Incorporating support constraints into nonparametric estimators of densities. Commun Stat - Theor M. 1985;14:1123–1136.
[36] Jones MC, Foster PJ. A simple nonnegative boundary correction method for kernel density estimation. Stat Sinica. 1996;6:1005–1013.
[37] Müller HG. Smooth optimum kernel estimators near endpoints. Biometrika. 1991;78:521–530.
[38] Müller HG. On the boundary kernel method for nonparametric curve estimation near endpoints. Scand J Stat. 1993;20:313–328.
[39] Müller HG, Wang JL. Hazard rate estimation under random censoring with varying kernels and bandwidths. Biometrics. 1994;50:61–76.
[40] Cowling A, Hall P. On pseudo-data methods for removing boundary effects in kernel density estimation. J R Stat Soc B. 1996;58:551–563.
[41] Hall P, Wehrly TE. A geometrical method for removing edge effects from kernel-type nonparametric regression estimators. J Am Stat Assoc. 1991;86:665–672.
[42] Fan J, Gijbels I. Monographs on statistics and applied probability. Vol. 66. London: Chapman & Hall; 1996. Local polynomial modeling and its applications.
[43] Abdous B, Berred A. Mean residual life estimation. J Stat Plan Infer. 2005;132:3–19.
[44] Geenens G. Probit transformation for kernel density estimation on the unit interval. J Am Stat Assoc. 2014;109:346–358.
[45] Wen K, Wu X. An improved transformation-based kernel estimator of densities on the unit interval. J Am Stat Assoc. 2015;110(510):773–783.
[46] Hosmer DW, Lemeshow S. Applied survival analysis: Regression modeling of time to event data. New York: John Wiley and Sons; 1998.
[47] Loeve M. Probability Theory. New Jersey: Van Nostrand-Reinhold; 1963.
[48] Nadaraya EA. Some new estimates for distribution functions. Theor Probab Appl +. 1964;9:497–500.
Appendix A Some lemmas needed to prove the theorems
Though sometimes not stated explicitly in the proofs of our theorems, the following lemmas are needed for the calculations.
Lemma A.1.
Under the condition C1, the following equations hold
(38)  
(39)  
(40) 
Proof.
All of the above equations can be proven using the integration by parts and the definitions of , , and . ∎
Lemma A.2.
Let and be the probability density function and the survival function of , and let and . Then, under condition C6, we have for ,
(41)  
(42)  
(43)  
(44) 
Proof.
Remark A.3.
Lemma A.4.
Let and
(45) 
be the naive kernel estimator of . If is an interval where both and are bounded, then .
Proof.
Since and are both bounded, nonincreasing, and continuous on , then for any , we can find number of points on such that
and , . For any , it is clear that there exists such that . For that particular , we have
which results in
Therefore,
Now, because is a naive kernel estimator, it is clear that for fixed , converges almost surely to . Thus, we get . Hence, for any , almost surely when , which concludes the proof. ∎
Appendix B Proof of Theorem 2.1
Utilizing the usual reasoning of i.i.d. random variables and the transformation property of expectation, and with the fact
we have