A Class of Tests for Trend in Time Censored Recurrent Event Data

02/22/2018 ∙ by Jan Terje Kvaløy, et al. ∙ 0

Statistical tests for trend in recurrent event data not following a Poisson process are generally constructed for event censored data. However, time censored data are more frequently encountered in practice. In this paper we contribute to filling an important gap in the literature on trend testing by presenting a class of statistical tests for trend in time censored recurrent event data, based on the null hypothesis of a renewal process. The class of tests is constructed by an adaption of a functional central limit theorem for renewal processes. By this approach a number of tests for time censored recurrent event data can be constructed, including among others a version of the classical Lewis-Robinson trend test and an Anderson-Darling type test. The latter test turns out to have attractive properties for general use by having good power properties against both monotonic and non-monotonic trends. Extensions to situations with several processes are considered. Properties of the tests are studied by simulations, and the approach is illustrated in two data examples.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Analyzing recurrent event data is a challenge encountered in many fields, for instance engineering, medicine and economy to mention some. Generally, recurrent event data arise when the phenomenon studied can occur repeatedly. Some examples are the occurrence of a failure in a repairable system or the outbreak of a recurrent disease. One aspect of the data which typically is of interest is to examine whether there are any systematic alterations, i.e., trends, in the pattern of events. For example, does a repairable system have a tendency to fail more often as it gets older? Or is there any improvement in how often a recurrent disease occurs for a particular patient? Visual inspections of the data can be very useful and give important information on systematic tendencies in the data, but generally, in order to distinguish actual systematic alterations from random fluctuations, statistical methods are needed.

There is a rich literature on trend testing, see for instance the overviews in coxlewis, ascherfeingold, kvaloeylindqvist and Lawless2012tmt. Trend tests are based on different assumptions for the data collection process and different definitions of trend. Many of the existing tests for trend are based on Poisson process theory and constructed for testing the null hypothesis of a homogeneous Poisson process (HPP), see for instance coxlewis, ascherfeingold, cohensackrowitz, kvaloeylindqvist, Lawless2012tmt and references therein. Such tests are, however, generally sensitive to departures from the Poisson process assumption. This fact was noted in the classical reference lewisrobinson, who observed that the commonly used Laplace trend test often led to rejection of the null hypothesis of no trend, even in cases where a trend could not exist. More specifically, the authors observed that false rejections were particularly occurring in cases of overdispersion of the interevent times with respect to the exponential distribution. Their idea was to modify the Laplace test statistic to account for this overdispersion, which led to the test known under the name of Lewis-Robinson test, to be further considered later in this paper.

The immediate conclusion to draw from this seems to be that, unless the Poisson assumption can be verified, trend tests need to be based on more general null hypotheses than the one of HPP. So how could one formalize a more useful null hypothesis? Lawless2012tmt concluded that there is no single definition which covers all cases that can naturally be thought of. lewisrobinson argued that a definition of no trend should state that the event process is stationary in some sense, possibly allowing some amount of serial correlation. On the other hand, because of analytical possibilities they found that the renewal process (RP) assumption would be the best choice for further investigations. Under this assumption they were able to repair the Laplace test and introduce the Lewis-Robinson test.

In this paper we shall consider trend tests assuming the null hypothesis of RP. In addition to the Lewis-Robinson test, there exist several trend tests in the literature based on this null hypothesis. We would like to mention first the nonparametric test by mann. Other tests are found in ascherfeingold, klRP, vaurio2, Lawless2012tmt and references therein.

RP based tests for trend, including the classical Lewis-Robinson test are, however, usually constructed for event censored data, which means that the recurrent event process is censored when it has completed a fixed number of renewal events. On the other hand, time censored data, where the event process is censored after a predetermined observation period, are far more naturally occurring in practice. As pointed out by Lawless2012tmt, there is still an unfortunate lack of available trend tests constructed for time censored data. The crucial issue when going from event censoring to time censoring is how to involve in a consistent manner the time interval from the last event to the censoring time. Lawless2012tmt argued that ignoring this interval may lead to considerable bias, see also the most interesting discussion of this and related issues in aalenhusebye. The latter authors, furthermore, pointed out that it is far less critical to ignore an incomplete time at the start of the observation, which will not introduce bias although it might incur a certain loss of efficiency.

With the above as our motivation and point of departure, we demonstrate in this paper how a flexible class of trend tests for time censored data can be constructed under the RP null hypothesis. We thereby complement the above mentioned literature on trend tests for event censored data, in particular the paper by Lawless2012tmt. Our construction is based on an adaption of Donsker’s theorem [Donsker1952] to renewal processes following the lines of billingsley1999convergence. Among other tests, the class turns out to include a time censored version of the Lewis-Robinson test, an Anderson-Darling type test with power against both monotonic and non-monotonic trends and an extension of the Lewis-Robinson test with power against non-monotonic trend. After having studied tests for trend in single processes, we consider extensions to trend tests based on the joint observation of several processes.

The paper is organized as follows. In Section 2 we define the necessary notation and give some key results for renewal processes. The general construction of tests is presented in Section 3 and several specific tests are derived. Section 4 discusses extensions to cases where several similar processes are observed. A simulation study is presented in Section 5, while two case studies are considered in Section 6. Some concluding remarks are given in Section 7

. The paper is ended by Appendix 1 and 2 providing detailed derivations of, respectively, parameter estimators and a specific trend test.

2 The Basic Convergence Results for Renewal Processes

2.1 Setup and Notation

Consider a renewal process observed from time . The successive event times are denoted and the corresponding interevent times, or gap times, are denoted where (with the convention ). The are independent and identically distributed, with and , where it will be assumed throughout the paper that

We use the standard notation where is the number of events in for all . For the theory of renewal processes we refer to, e.g., ross and gallager.

2.2 A Functional Central Limit Theorem for Renewal Processes

The key result in our approach is a functional central limit theorem given in billingsley1999convergence. With notation as above, define

Then [billingsley1999convergence, thm. 14.6],

(1)

where denotes weak convergence and is the Wiener measure [billingsley1999convergence, chap. 8].

Now define for , so that is a Brownian bridge [billingsley1999convergence, chap. 8]. It is straightforward to verify that (1) implies the following result which together with the succeeding corollary is the basis of our construction of trend tests.

Theorem 1

Define

(2)

Then .

Let the coefficient of variation of the interevent times be denoted . As will become clear, plays a special role in our construction of tests. First, define

(3)

Then Theorem 1 implies the following corollary:

Corollary 1

With notation as above we have

Proof: We can write

From standard renewal process theory [ross] it is well known that a.s. The result then follows by use of [thm. 3.1]billingsley1999convergence, sometimes called ’the converging together lemma’. The argument, using the uniform norm, is as follows:

where the convergence to 0 follows since the first factor tends to 0 a.s. and hence in probability, and the last factor converges in distribution to

which has the Kolmogorov distribution (and will be considered below).

3 The Class of Tests for Trend

In the present section we consider event data from a single counting process observed from time until time censoring at the given time . With notation as in Section 2, we thus observe a random number of events, at times , and with fully observed interevent times and a censored interevent time .

From Theorem 1 and Corollary 1 it follows that, under the null hypothesis of RP, and will approximately be Brownian bridges. Thus, if there is a trend in the data, these processes are likely to deviate from a Brownian bridge. Tests for trend can therefore be based on measures of deviation from a Brownian bridge of the two asymptotically equivalent processes and .

Since the parameters are generally unknown, they must be estimated. It is clear that the results of Theorem 1 and Corollary 1 continue to hold under the RP assumption if , and are replaced by consistent estimators, , and .

Below we first derive test statistics based on four different ways of measuring deviations from a Brownian bridge. This leads to test statistics of, respectively, Lewis-Robinson, Kolmogorov-Smirnov, Cramr-von Mises and Anderson-Darling types. In addition we propose an extension of the Lewis-Robinson test which can be used to construct tests for non-monotonic trend. The test constructions are based on applications of Corollary 1. Finally we discuss how to estimate the parameters and .

3.1 Lewis-Robinson Type Test

A classical measure of deviation from a Brownian bridge is the signed area under the path of the process. Using Corollary 1 this gives rise to the statistic , which converges in distribution to

, which is normally distributed with expectation 0 and variance 1/12.

In order to obtain the test statistic on the form that is most common for this test, we use instead the negative of the above suggested statistic, which will have the same limiting distribution. By scaling we obtain an asymptotically standard normally distributed test statistic given by

(4)

If the factor is ignored, we actually get the well known Laplace test statistic for the null hypothesis of HPP for the time censored case, which can be derived from properties of Poisson-processes. The division by corresponds to the correction obtained by lewisrobinson, who considered the event censored case.

The resulting test will primarily have power against deviations from an RP caused by monotonic trends. It is seen that positive (negative) values of the test statistic will correspond to an increasing (decreasing) trend.

3.2 Kolmogorov-Smirnov Type Test

Another classical measure of deviation from a Brownian bridge is the maximum deviation, giving rise to the statistic . By Corollary 1, this statistic converges in distribution to , which has the Kolmogorov distribution [kolmogorov, smirnov]. A Kolmogorov-Smirnov type test for trend in the time censored case is hence given by the test statistic

(5)

3.3 Cramr-von Mises Type Test

Using the Cramr-von Mises type measure we obtain

where the right hand side has the commonly known limit distribution of the Cramr-von Mises statistic [andersondarling]. Due to the squaring of it is clear that a test which rejects the null hypothesis of RP for large values of will have sensitivity against both monotonic and non-monotonic trends. Straightforward calculations give the statistic

(6)

3.4 Anderson-Darling Type Test

The Anderson-Darling type measure leads to

which has the limit distribution of the Anderson-Darling statistic [andersondarling, andersondarling2].  As for the Cramr-von Mises type test it is clear that this test will have sensitivity against both monotonic and non-monotonic trends. The difference between the Cramr-von Mises and the Anderson-Darling statistics is that the latter puts more weight on the information at the beginning and the end of the observation interval. Straightforward but somewhat tedious calculations give that

(7)

3.5 The Extended Lewis-Robinson Test for Non-Monotonic Trend

Recall that the Lewis-Robinson type test for the time censored case was based on the integral . This test is suited for alternatives of monotonic trend. Consider instead the expression

(8)

where . It is seen that in fact leads to the preferred test statistic (4) for the Lewis-Robinson test (of course, gives the negative of the LR statistic (4)).

A test based on (8) will obviously have power to detect non-monotonic trends where the trend in and are in opposite directions. Clearly, (8) converges in distribution to , which is normally distributed with expectation 0 and variance (see Appendix 2). It follows from a calculation in Appendix 2 that (8), after a scaling to give an asymptotically standard normal distribution under the null hypothesis, can be written

(9)

A disadvantage of the above test is that the value of has to be given. One possibility would of course be to allow an adaptive choice of . This will, however, destroy the above distributional properties, and we will therefore not pursue this approach here.

vaurio2 suggested on an ad hoc basis, and for the event censored case, a test statistic similar to (9) with .

3.6 Parameter Estimation

If one assumes the null hypothesis of HPP, then is known, and hence no estimation is needed in the use of Corollary 1

. If we more generally assume specific parametric models for the event process, then the parameters

may be estimated by maximum likelihood methods since they are functions of the model parameters. In the case studies of Section 6 we illustrate the parametric estimation by fitting Weibull RPs to the interevent times, taking into account also the censored time at the end of the observation. Since the Weibull distribution is a rather flexible distribution, the corresponding estimates of and may be satisfactory also under the null hypothesis of RP when no parametric assumptions are made. But strictly, when fitting Weibull distributions under , we test the null hypothesis that the events follow a Weibull RP.

While this paper is basically about nonparametric trend testing, it should be noted that fully parametric tests can be obtained by assuming a parametric model for the original event process, where the null hypothesis of RP refers to some parameter having a specific value. A trend test can then be constructed by the likelihood ratio method, see Section 6.2 for an example.

When no distributional assumptions are made on the process, obvious choices for estimators of and are the sample mean

and sample standard deviation

of the completely observed interevent times. These estimators are consistent as (see Appendix 1), but have the disadvantage of not utilizing the censored times at the end of the observation period. The corresponding estimator of is .

Alternative estimators which involve the censored time may be derived from standard renewal process theory. Again we refer to Appendix 1 for justification of the following estimators,

(10)

Another variance estimator (see Appendix 1 for its verification) is

(11)

The potential advantage of this estimator is that it tends to be smaller than and under alternatives with positive dependence between subsequent interevent times. This makes the estimated become smaller, which leads to larger (absolute) values of the test statistics and hence higher rejection probability under alternatives of monotonic trend, see for example vaurio2. We will, however, in our simulation and data examples use or and not , due to apparent less satisfactory significance level properties, as experienced in simulations.

4 Tests for Trend in Multiple Processes

Suppose now that similar processes are observed. Under the assumption that the processes are stochastically independent it may be of interest to test the null hypothesis that they all have no trend. One possible formulation of the null hypothesis is to let state that all the processes are independent RPs, but that they are not necessarily identically distributed. A stronger null hypothesis would be to state that the processes are independent RPs with the same distribution of the interevent times. We will below mostly stick to the former interpretation, but will consider the latter hypothesis in the example of Section 6.2.

Construction of the tests is based on the following fact, which we state as a lemma:

Lemma 1

Let be independent Brownian bridges and let be real numbers with . Then

is a Brownian bridge.

Proof: By linearity it is clear that is a Gaussian process with expectation . The result follows by a straightforward calculation of the covariance function.


Let , , and be, respectively, the censoring time, mean, standard deviation and coefficient of variation corresponding to process , . Let further

be random variables where

depends on the data from process only, and assume that , , where the are constants with . Then from Lemma 1, Corollary 1 and the already cited ’converging together lemma’ it follows that

(12)

Depending on the choice of weights , this can lead to different generalizations of the tests in Section 3. One way of constructing tests will be to perform the same transformations as in Section 3 to the left hand side of (12). This is a straightforward operation for the Lewis-Robinson type tests, but for the other types of tests, the derivation of the test statistics will be more cumbersome. For these we might therefore instead consider linear combinations of the tests for single processes.

4.1 Lewis-Robinson Type Test for Processes

By the same arguments as in Section 3.1, and with the assumption on the weights given above, the following statistic will be asymptotically standard normally distributed under ,

(13)

Here denotes the time until failure number in process , , .

Different choices of the weights will lead to different tests. For instance, , will mean equal weighting of the information from each process. This might, however, not be an optimal choice in cases where the processes have been observed for different lengths of time, or if there is a large variation in the number of events per process.

For the Poisson process case, kvaloeylindqvist suggested to generalize the Laplace test for a single process to a test statistic based on standardizing the sum . In the more general situation considered here, the form of the coefficients on the right hand side of (13) suggests the use of weights such that

(14)

Suppose now that the tend to infinity in such a manner that, for a tending to infinity, for positive constants , . Since the a.s. and , we have

(15)

Clearly, , so the statistic (13) will converge to a standard normal distribution under the null hypothesis .

Inserting the weights from (15) and rearranging we can write the test statistic (13) as

(16)

Lawless2012tmt considered a similar test statistic for the time censored case, but under the slightly different null hypothesis that all the processes have constant rate functions, and with asymptotics as . Let for . The test statistic of what they named the generalized Laplace test is

which under the null hypothesis is asymptotically standard normal as .

4.2 Other Tests for Processes

For the other tests considered in Section 3 it is in principle possible to replace the by and apply the same operations as for the case . This corresponds to what we did for the Lewis-Robinson test in the previous subsection, but here things are easy due to the linearity of integrals. This also applies to the extended Lewis-Robinson test. For the remaining tests it is not straightforward, however, to derive explicit expressions for the test statistics, and it is neither clear what would be the best weights to use. The problem associated with the Cramér-von Mises and Anderson-Darling tests are of course that the integrand is a square, while for the Kolmogorov-Smirnov test the various processes are mixed together before taking the absolute value, making tractable expressions impossible.

Another possibility for these last mentioned tests would therefore be to use (weighted) sums of the individual test statistics to define the new test statistics. Such an approach requires, on the other hand, the distributions of sums or linear combinations of the limiting distributions for the single process cases. These may be determined by simulations or, for larger , by normal approximations. Note also that scholzstephens have considered the distribution of sums of independent Anderson-Darling statistics.

For such linear combinations there are no obvious choices for the weights given to each process. A reasonable choice under the assumption of the same interevent distribution in all processes would be to let the weights be proportional to . Otherwise, it may be tempting to use weights like (14), hence taking into account the length of observation of each process as well as the number of observed events and the coefficient of variation of the interevent times. A problem would then of course be that these weights are random, making exact simulation of the distribution under the null hypothesis impossible.

In practice we have found that the normal approximation works fairly well for the Cramér-von Mises test, but less well for the Anderson-Darling test due to the very skew distribution of the Anderson-Darling statistic.

5 Simulation Study

We have done various simulations to study and compare the properties of the tests. When we report results for single processes we do not include the Cramr-von Mises test as this test had less power than the Anderson-Darling test, while for several processes we do not include the Anderson-Darling test as the Cramr-von Mises test had better level properties in this case as discussed in Section 4.2. For the extended Lewis-Robinson test we chose in (9) and we only report this test for non-monotonic trend as it has inferior power against monotonic trends.

In the reported simulations we estimated rejection probabilities by simulating 100 000 data sets for each choice of model and parameter values, and recorded the relative number of rejections of each test. The standard errors of the simulated rejection probabilities are then

. All simulations were done in R. The nominal significance level was set to 5%.

To simulate data with trend, we used the trend-renewal processes (TRP) [leh] which in short is defined as follows: Let be a non-negative function defined for and let . Then the process is a TRP with trend function and renewal distribution , if is an RP with interevent times having the distribution .

The RP, the non-homogeneous Poisson process (NHPP) and the HPP are all special cases of the TRP. For example, if the trend function is constant, then the TRP is an RP, while if the distribution is the unit exponential distribution, then the TRP is an NHPP with intensity function . The trend in a TRP is hence governed by the trend function , and by letting the distribution be any positive-valued distribution, we are left with a large class of processes with trend. In our simulations we will use parameterizations of the TRP where the renewal distribution is a Weibull-distribution and the trend function is either of so called power law or bath tube type, see Section 5.2 below.

5.1 One Process - Level Properties

First the level properties of the tests were studied by generating data sets from Weibull RPs with shape parameters respectively 0.75 and 1.5, corresponding respectively to a process which is overdispersed and a process which is underdispersed relative to an HPP. In Figure 1 the simulated level of the tests for systems with the expected number of events ranging from 10 to 60 are reported.

Figure 1: Simulated level properties as a function of expected number of events, with data generated from Weibull RPs with shape parameters respectively 0.75 (overdispersed RP) and 1.5 (underdispersed RP). Abbreviations: LR = Lewis-Robinson, KS = Kolmogorov-Smirnov, AD = Anderson-Darling, ELR = Extended Lewis-Robinson test.

The tests mostly have adequate level properties, but all tests are a bit non-conservative for small samples in the underdispersed case, while the Kolmogorov-Smirnov test is too conservative in the overdispersed case.

5.2 One Process - Power Properties

Data sets with a monotonic trend were generated by simulating data from TRPs with the renewal distribution being Weibull and the trend function being of the power law form . The rejection probability as a function of was simulated, where corresponds to a decreasing trend, corresponds to no trend and corresponds to an increasing trend.

Figure 2: Simulated power properties as a function of trend parameter , with data simulated from TRPs with trend function and Weibull renewal distribution with shape parameters respectively (overdispersed TRP) and (underdispersed TRP). The censoring time is adjusted such that the expected number of failures is 30. Abbreviations: LR = Lewis-Robinson, KS = Kolmogorov-Smirnov, AD = Anderson-Darling.

Two different values of the shape parameter of the Weibull renewal distribution were considered, and . The censoring times were adjusted such that the expected number of failures was 30. The results are displayed in Figure 2. We see in this figure that the Anderson-Darling test is the most powerful test against decreasing trend, but is a bit less powerful than the Lewis-Robinson test for increasing trend. The Kolmogorov-Smirnov test is less powerful than the other tests.

Data sets with a bathtub trend were generated by simulating data from TRPs with trend function on the form displayed in Figure 3.

Figure 3: Bathtub-shaped trend function.

Here is the average of over . The degree of bathtub shape can be expressed by the parameter , with corresponding to a horizontal line (no trend).

The rejection probability as a function of was simulated with and in each case set to values such that the expected number of failures in each phase (decreasing, no, increasing trend) were equal to 20. The shape parameter of the Weibull renewal distribution was set to respectively and . The results are displayed in Figure 4.

Figure 4: Simulated power properties as a function of trend parameter , with data simulated from TRPs with bathtub trend function (Figure 3), Weibull renewal distribution with shape parameter (overdispersed) and (underdispersed TRP), and expected number of failures in each phase equal to 20. Abbreviations: LR = Lewis-Robinson, KS = Kolmogorov-Smirnov, AD = Anderson-Darling, ELR = Extended Lewis-Robinson test

We see in Figure 4 that the extended Lewis-Robinson test and the Anderson-Darling test have the ability to detect this non-monotonic trend, while the other tests have no power in such cases. Not surprisingly, the trend is easier to detect in the underdispersed case. The extended Lewis-Robinson test with (9) is by its construction particularly well suited for picking up non-monotonic trends which are symmetric around the mid-point of the observation interval, , as we have in this case.

5.3 Several Processes

When considering several processes, the number of processes is one of the important factors for the behavior of the tests. We show here some simulations which illustrate power and level properties for the test with different number of processes. In this setting with several processes the generalized Laplace test also applies.

Figures 5 and 6 show power properties for cases with respectively 5 and 25 processes and with censoring time chosen such that the expected number of events in each process is 20. Simulations with other expected number of failures showed similar behavior, just with lower or higher power depending on whether the expected number of failures was lower or higher. These simulations show that the Lewis-Robinson type test has the best power properties in these monotonic trend cases. We also notice that the generalized Laplace test is very similar to the Lewis-Robinson test in the case with 25 processes.

Figure 5: Simulated power properties as a function of trend parameter , with data simulated from 5 TRPs with trend function and Weibull renewal distribution with shape parameters respectively (overdispersed TRP) and (underdispersed TRP). The censoring time is adjusted such that the expected number of failures in each process is 20. Abbreviations: LR = Lewis-Robinson, CvM = Cramr-von Mises, GL = Generalized Laplace Test.
Figure 6: Simulated power properties as a function of trend parameter , with data simulated from 25 TRPs with trend function and Weibull renewal distribution with shape parameters respectively (overdispersed TRP) and (underdispersed TRP). The censoring time is adjusted such that the expected number of failures in each process is 20. Abbreviations: LR = Lewis-Robinson, CvM = Cramr-von Mises, GL = Generalized Laplace Test.

6 Case Studies

6.1 Load-Haul-Dump Machine Data (Kumar et al., 1989)

kkg reported failure data for a load-haul-dump machine operating in a Swedish mine. For the purpose of this example we considered the data to be time censored at hours. The recorded failure times of the machine up to this time are reported in Table 1, and a plot of the observed process for is given in the left panel of Figure 7. The plot seems to indicate a non-monotonic trend, apparently in the form of a bathtub trend.

For illustration we also show, in the right panel of Figure 7, a plot of the function for . This is the transformed and tied down version of , and should, if the null hypothesis holds, be close to a Brownian bridge. However, this plot too indicates a non-monotonic trend with an upward deviation in the first part and a downward deviation in the second part.

16 39 71 95 98 110 114 226 294 344 555 599
757 822 963 1077 1167 1202 1257 1317 1345 1372 1402 1536
1625 1643 1675 1726 1736 1772 1796 1799 1814 1868 1894 1970
Table 1: Load-haul-dump data. Failure times in hours. The data are time censored at 2000 hours.

.

Estimators
Sample estimators - not including censored time 54.72 48.61 0.888
Sample estimators - including censored time 55.56 47.23 0.850
Parametric: Weibull - including censored time 55.46 47.22 0.851
Table 2: Load-haul-dump data. Parameter estimates using methods of Section 3.6

For estimation of the coefficient of variation under the null hypothesis, we estimated the parameters using methods considered in Section 3.6. The results are given in Table 2. It is seen that the estimates which use the censored time are very close, while the one that disregard this time gives a slightly higher estimated coefficient of variation. This might be a coincidence, however, and will not be generally valid.

In order to calculate the LR-test statistic (4), we first calculated the Laplace test statistic, and then divided by the estimated coefficient of variation, to get using the estimates in the first row of Table 2. This gave the -value 0.50 for a two-sided test. We also calculated the estimator of (11), which gave the result 42.77, which is lower than the estimates of in Table 2, and would give an estimated coefficient of variation of and a test statistic of and a -value of 0.44. This illustrates the effect of using , as estimator of , as discussed in Section 3.6, namely to possibly give a lower estimated coefficient of variation, and in turn a lower calculated -value.

Two-sided -values for all tests are reported in Table 3. In the extended Lewis-Robinson test we used , and it is interesting to see that this test detected a significant trend in the data while the tests for monotonic trend had fairly high -values. The example thus illustrates the need for trend tests with power against non-monotonic trend.

Figure 7: Load-haul-dump data. Left: Plot of for . Right: Plot of the process for .
LR KS CvM AD ELR
0.50 0.29 0.13 0.086 0.011
Table 3: Load-haul-dump data. The table reports -values for trend tests applied to the load-haul-dump machine data in Table 1. Abbreviations: LR = Lewis-Robinson, KS = Kolmogorov-Smirnov, CvM = Cramr-von Mises, AD = Anderson-Darling, ELR = Extended Lewis-Robinson test. In ELR was used.

6.2 Small Bowel Motility Data (Aalen and Husebye, 1991)

aalenhusebye studied data on small bowel motility measured on 19 persons. In particular they considered data on the length of a cyclic motility pattern observed during a fasting state. The data are time censored, and each person had from one to nine complete cycles observed before the censoring, see aalenhusebye for the complete data set.

Since the number of periods for each patient are small, and our methods are constructed for the case when censoring times and number of events tend to , we will consider testing of the null hypothesis that the 19 processes are independent RPs with the same distribution of interevent times. We therefore estimate common parameters using all the data.

It should be noted here that aalenhusebye fitted a model where the events for each patient follow a Weibull RP, with individual variation modeled by a gamma frailty model. The variation was, however, not found significant (-value 0.11), and this justifies to some extent our analysis. On the other hand, aalenhusebye did not check the data for a trend, which is the purpose of the present example.

Figure 8 shows the Nelson-Aalen estimate of the common mean function for the patients, see nelson88 and lawlessnadeau for the motivation and validity of the plot. As shown by lawlessnadeau, the Nelson-Aalen estimator is unbiased and consistent for under fairly general conditions. Here we present the plot as an illustration of an apparent increasing trend in the data.

Figure 8: Small bowel motility data. Plot of the the cumulative number of failures over time for the small bowel motility data. A parametric curve of the form has been fitted to the data.

In order to calculate test statistics we need an estimate for the coefficient of variation. By considering only the 80 fully observed periods we got , and from this . Since there are 19 censored interevent times in these data, one for each patient, we found that the estimators and are less satisfactory. Instead we fitted a Weibull RP to the data, taking into account the censored periods. The resulting estimates were and . There is thus a clear underdispersion in the interevent times when comparing to the exponential distribution. Using , the LR-statistic (16) equals , where 1.95 is the value of the corresponding Laplace test statistic. Thus the -value would be 0.051 for testing the null hypothesis of HPP versus a monotonic trend, while it is 0.00024 for the LR-test for the null hypothesis of RP. The -values obtained for different tests are reported in Table 4. We see that all the tests find a significant trend in these data.

We also performed a parametric trend test using the TRP with a Weibull renewal distribution and a power law trend function, see Section 5. Leaving out further details, we report a -value for trend of 0.041 using a standard asymptotic likelihood ratio test.

LR CvM AD GL
0.007
Table 4: Small bowel motility data. The table report -values for trend tests applied to the data. Abbreviations: LR = Lewis-Robinson, CvM = Cramr-von Mises, AD = Anderson-Darling, GL = Generalized Laplace Test.

7 Conclusion

We have presented a novel class of tests for trend in time censored recurrent event processes, based on the general null hypothesis of an RP. This class includes, among other tests, new versions of the Lewis-Robinson test and the Anderson-Darling test, extending these tests to time censored processes. For the single process case, the Anderson-Darling test turns out to have attractive properties when used as a test for general alternatives, both monotonic and non-monotonic trends. If power against monotonic trends is of main interest, the Lewis-Robinson type test is on the other hand a safe choice, both for single and multiple processes.

The derived test statistics are based on asymptotic results for renewal processes. The calculated critical values are hence only approximate when used in small and medium sized samples. The simulation study shows, however, satisfactory performance of the tests, with some exceptions in cases with very small sample. In such cases an alternative procedure would be to simulate the null distribution of the test statistic by a permutation approach, permuting the order of the completely observed interevent times. Lawless2012tmt showed that this is a valid approach even for time censored processes, and we have confirmed this in simulations not reported here.

It is clear that the basic result of Corollary 1 in principle may give rise to a very large class of tests. We have in Section 3 considered four tests based on standard goodness-of-fit statistics, and as an example of the variety of other possible tests we added and studied in some detail a non-standard test, which led to a further extension of the Lewis-Robinson test.

An interesting fact of the constructed test statistics is that they may be viewed as test statistics for the case of Poisson processes, with null hypothesis corresponding to HPP, that are adjusted according to the coefficient of variation of the observed interevent times. This is exactly the way lewisrobinson obtained their test statistic for the event censored case, starting from the Laplace test.

R-code for the tests can be obtained from the authors.


Appendix 1

Consistent Estimator of

It is clear from the strong law of large numbers for renewal processes (see, e.g., ross) that

since . Note that by standard renewal process theory we have

Thus another consistent estimator of is given by . Note that we can write , so we have .

Consistent Estimator of

By the strong law of large numbers we have

Writing

it follows from Slutsky’s theorem that

is a consistent estimator of .

A disadvantage of the estimator , as with , is that they do not take into account the censored time . [chap. 5]gallager shows that

(17)

Here the left hand side is the long run average length of time since the last previous event, and the result says that this equals where is the coefficient of variation of the distribution of .

We use (17) in the following way. A straightforward calculation shows that

which after substitution in (17), noting that , gives the following consistent estimator for ,


An alternative variance estimator, , was presented in Section 3.6, see equation (11). To prove consistency of

under the null hypothesis of RP, we can consider separately the sum over odd

and even and use the strong law of large numbers on each of the two resulting sums, which are now sums of i.i.d. variables.

Appendix 2

The Extended Lewis-Robinson Test

The test statistic (9) is obtained as follows. Note first that we can write

where is the indicator function of the set . From (3) it follows that we can consider the integration

(18)

and similarly

(19)

Subtracting (19) from (18), we get

The statistic (9) is hence obtained from (3).

We finally prove that is normal with mean 0 and variance . For this we use the fact that, for a Gaussian process with mean function and covariance function , we have

The covariance function of the Brownian bridge is . Hence is normal with mean 0 and variance

(20)

Similarly, is normal with mean 0 and variance

(21)

Now we want , and thus we seemingly need also . We use, however, the following trick. Since is known to have variance , we can solve for in the equation to get

where we used (20-21) and the fact that .

References

  • [1] Aalen  Husebye1991aalenhusebye Aalen, O.  Husebye, E. 1991. Statistical analysis of repeated events forming renewal processes, Statistics in Medicine 10: 1227–1240.
  • [2] Anderson  Darling1952andersondarling Anderson, T. W.  Darling, D. A. 1952. Asymptotic theory of certain goodness of fit criteria based on stochastic processes, Annals of Mathematical Statistics 23: 193–212.
  • [3] Anderson  Darling1954andersondarling2 Anderson, T. W.  Darling, D. A. 1954. A test of goodness of fit, Journal of the American Statistical Association 49: 765–769.
  • [4] Ascher  Feingold1984ascherfeingold Ascher, H.  Feingold, H. 1984. Repairable Systems Reliability. Modeling, Inference, Misconceptions and Their Causes, Marcel Dekker, Inc., New York.
  • [5] Billingsley1999billingsley1999convergence Billingsley, P. 1999. Convergence of Probability Measures., Wiley Series in Probability and Statistics.
  • [6] Cohen  Sackrowitz1993cohensackrowitz Cohen, A.  Sackrowitz, H. B. 1993. Evaluating tests for increasing intensity of a Poisson process, Technometrics 35: 446–448.
  • [7] Cox  Lewis1966coxlewis Cox, D. R.  Lewis, P. A. W. 1966. The Statistical Analysis of Series of Events, Methuen, London.
  • [8] Donsker1952Donsker1952 Donsker, M. D. 1952.

    Justification and extension of Doob’s heuristic approach to the Kolmogorov–Smirnov theorems,

    Annals of Mathematical Statistics 23: 277–281.
  • [9] Gallager2013gallager Gallager, R. G. 2013. Stochastic Processes: Theory for Applications, Cambridge University Press.
  • [10] Kolmogorov1933kolmogorov Kolmogorov, A. 1933. Sulla determinazione empirica di una legge di distribuzione, Giornale dell’ Istituto Italiano degli Attuari 4: 83–91.
  • [11] [Kumar et al.]Kumar, Klefsjö  Granholm1989kkg Kumar, U., Klefsjö, B.  Granholm, S. 1989. Reliability investigation for a fleet of load haul dump machines in a Swedish mine, Reliability Engineering and System Safety 24: 341–361.
  • [12] Kvaløy  Lindqvist2003klRP Kvaløy, J. T.  Lindqvist, B. 2003. A class of tests for renewal process versus monontonic and nonmonotonic trend in repairable systems data, in B. Lindqvist  K. Doksum (eds), Mathematical and Statistical Methods in Reliability, Series on Quality, Reliability and Engineering Statistics, Vol. 7, World Scientific Publishing, Singapore, pp. 401–414.
  • [13] Kvaløy  Lindqvist1998kvaloeylindqvist Kvaløy, J. T.  Lindqvist, B. H. 1998. TTT-based tests for trend in repairable systems data, Reliability Engineering and System Safety 60: 13–28.
  • [14] Lawless  Nadeau1995lawlessnadeau Lawless, J. F.  Nadeau, C. 1995. Some simple robust methods for the analysis of recurrent events, Technometrics 37(2): 158–168.
  • [15] [Lawless et al.]Lawless, Çiğşar  Cook2012Lawless2012tmt Lawless, J., Çiğşar, C.  Cook, R. 2012. Testing for monotone trend in recurrent event processes, Technometrics 54: 147–158.
  • [16] Lewis  Robinson1974lewisrobinson Lewis, P. A. W.  Robinson, D. W. 1974. Testing for a monotone trend in a modulated renewal process, in F. Proschan  R. J. Serfling (eds), Reliability and Biometry, SIAM, Philadelphia, pp. 163–182.
  • [17] [Lindqvist et al.]Lindqvist, Elvebakk  Heggland2003leh Lindqvist, B. H., Elvebakk, G.  Heggland, K. 2003. The trend-renewal process for statistical analysis of repairable systems, Technometrics 45: 31–44.
  • [18] Mann1945mann Mann, H. B. 1945. Nonparametric tests against trend, Econometrica 13: 245–259.
  • [19] Nelson1988nelson88 Nelson, W. 1988. Graphical analysis of system repair data, Journal of Quality Technology 20(1).
  • [20] Ross1983ross Ross, S. M. 1983. Stochastic Processes, John Wiley, New York.
  • [21] Scholz  Stephens1987scholzstephens Scholz, F.  Stephens, M. 1987. -sample anderson-darling tests, Journal of the American Statistical Association 82: 918–924.
  • [22] Smirnov1948smirnov Smirnov, N. 1948. Table for estimating the goodness of fit of empirical distributions, Annals of Mathematical Statistics 19: 279–281.
  • [23] Viertävä  Vaurio2009vaurio2 Viertävä, J.  Vaurio, J. K. 2009. Testing statistical significance of trends in learning, ageing and safety indicator, Reliability Engineering and System Safety 94: 1128–1132.
  • [24]