Let $X_1, X_2, \ldots$ be independent random variables observed sequentially. It is assumed that $X_1, \ldots, X_{\theta-1}$ have a common probability density $p_0$, while $X_\theta, X_{\theta+1}, \ldots$ are all distributed according to a probability density $p_1 \neq p_0$. This paper deals with the simplest change-point detection problem, in which $p_0$ and $p_1$ are assumed to be known but the change time $\theta$ is unknown, and the goal is to construct a stopping time $\tau$ that detects $\theta$ as soon as possible. The existing approaches to this problem rely essentially on some a priori information about $\theta$. For instance, in Bayes approaches, it is assumed that $\theta$ is a random variable with a known probability distribution, see e.g. . In methods related to hypothesis testing, this a priori information is hidden in the so-called average run length, see e.g. . Our main goal in this paper is to construct robust stopping times which do not make use of a priori information about $\theta$, but have detection delays close to the Bayes ones.
To be more precise, denote by $\mathbf{P}_\theta$ the probability distribution of $(X_1, X_2, \ldots)$ and by $\mathbf{E}_\theta$ the expectation with respect to this measure. In this paper, we characterize $\tau$ with the help of two functions of $\theta$:
false alarm probability
average detection delay
and our goal is to construct stopping times solving the following problem:
The main difficulty in this problem is related to the fact that, for a given stopping time $\tau$, the average delay depends on $\theta$. This means that in order to compare two stopping times $\tau_1$ and $\tau_2$, one has to compare two functions of $\theta$. In general, such a comparison is not feasible from a mathematical viewpoint, and the principal objective of this paper is to propose stopping times providing good approximate solutions to (1). Notice also that similar problems are common and well known in statistics, and there are reasonable approaches to obtaining their solutions.
In change-point detection, there are two standard methods for constructing stopping times.
A Bayes approach. The first Bayes change detection problem was stated in  for an on-line quality control problem for continuous technological processes. In detecting changes in distributions, this approach assumes that $\theta$ is a random variable with a known distribution
and the goal is to construct a stopping time that solves the averaged version of (1), i.e.,
We emphasize that, in contrast to (1), this problem is well defined from a mathematical viewpoint, but its solution depends on the a priori law of $\theta$.
A hypothesis testing approach. The first non-Bayesian change detection algorithm based on sequential hypothesis testing was proposed in . Denote by $X^n = (X_1, \ldots, X_n)$ the observations up to the moment $n$. The main idea in this approach is to test sequentially
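So, the stopping time $\tau$ is defined as follows:
if $H_0$ is accepted at moment $n$, the observations are continued, i.e., we proceed to test $H_0$ vs. $H_1$ at moment $n+1$;
if $H_1$ is accepted at moment $n$, then we stop and set $\tau = n$.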
In order to motivate our idea of robust stopping times, we briefly discuss the basic statistical properties of the above-mentioned approaches.
1.1 A Bayes approach
Usually in this approach the geometric a priori distribution
is used. The positive parameter of this distribution is assumed to be known. In this case, the optimal stopping time is given by the following famous theorem :
The optimal Bayes stopping time (see (2)) is given by
and is a constant.
Notice that the geometric a priori distribution results in the following recursive formula for the a posteriori probability (see, e.g., ):
So, if we denote for brevity
then (5) may be rewritten in the following equivalent form:
From this equation we see, in particular, that the Bayes stopping time depends on the parameter of the geometric a priori distribution, which is hardly ever known in practice. In statistics, in order to avoid such dependence, the uniform a priori distribution is usually used. Let us look at how this idea works in change-point detection. The uniform a priori distribution assumes that and in this case we obtain immediately from (6)
Hence, the optimal stopping time in the case of the uniform a priori distribution is given by
where is some constant. Fig. 1 shows a typical trajectory of
, in detecting change in the Gaussian distribution with.
Computing the false alarm probability for this stopping time is not difficult and is based on the following simple fact. Let
It follows immediately from the definition of that if , then . So, by this Lemma we get
Let Then for defined by (7) we have
Fig. 1 illustrates this theorem showing , in the case of the change in the mean of the Gaussian distribution with .
We would like to emphasize that the fact that is linear in is not good from practical and theoretical viewpoints. In order to understand why it is so, let us now turn back to the Bayes setting assuming that . In this case the following theorem holds true.
Suppose . Then for defined by (4) we have
This theorem may be proved with the help of the standard techniques described, e.g., in .
So, we see that this mean detection delay is far from the optimal Bayes one given by
Let us now briefly summarize the main facts related to the classical Bayes approach:
if , then the average detection delay of the Bayes stopping time grows linearly in ;
when , the maximal false alarm probability is not controlled.
In view of these facts it is clear that the standard Bayes technique cannot provide reasonable solutions to (1).
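To make the recursive computation of the a posteriori probability discussed in this subsection concrete, the following Python sketch implements the standard one-step update for a geometric a priori distribution. The parametrization $\mathbf{P}(\theta = k) = q(1-q)^{k-1}$, $k \ge 1$, the function names, and the Gaussian densities are assumptions made for illustration only; they may differ from the notation used in (5) and (6).

```python
import math


def gaussian_pdf(x, mean, sigma=1.0):
    """Density of N(mean, sigma^2) at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def posterior_update(pi_prev, x, q, p0, p1):
    """One step of pi_n = P(theta <= n | X_1, ..., X_n).

    Assumes the geometric prior P(theta = k) = q * (1 - q)**(k - 1), k >= 1,
    where X_theta is the first observation with density p1.
    """
    # Probability that the change has occurred by time n, before seeing X_n.
    changed = pi_prev + (1.0 - pi_prev) * q
    num = changed * p1(x)
    den = num + (1.0 - changed) * p0(x)
    return num / den


def bayes_stopping_time(xs, q, p0, p1, level):
    """First n with pi_n >= level, or None if the level is never reached."""
    pi = 0.0  # pi_0 = 0: no change before the first observation
    for n, x in enumerate(xs, start=1):
        pi = posterior_update(pi, x, q, p0, p1)
        if pi >= level:
            return n
    return None
```

When the two densities coincide, the update returns the prior probability $1 - (1-q)^n$, which is a convenient sanity check. The dependence of the rule on the hypothetical parameter `q` is exactly the practical drawback noted above.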
1.2 A hypothesis testing approach
In change-point detection, this approach has a rather heuristic character, since here we test a simple hypothesis versus a composite alternative whose complexity grows with the number of observations.
In sequential hypothesis testing there are two common methods.
The maximum likelihood test accepts hypothesis (see (3)) when
The threshold is computed as follows
where is the first type error probability. Notice that by Lemma 1.1
Therefore the maximum likelihood test results in the following stopping time:
Notice also that admits a simple recursive computation . Indeed, notice
This method is usually called the CUSUM algorithm. It is well known that it is optimal in Lorden's sense, i.e., for a properly chosen threshold, the CUSUM stopping time minimizes
in the class of stopping times , see .
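The recursive computation behind the CUSUM rule can be sketched in a few lines of Python. The sketch below uses Page's classical form of the recursion, $W_n = \max(0, W_{n-1} + \log(p_1(X_n)/p_0(X_n)))$ with $W_0 = 0$, which is assumed here and may differ in form from (10); the function names and the Gaussian mean-shift example are hypothetical.

```python
def gaussian_mean_shift_log_lr(mu):
    """Log-likelihood ratio log(p1(x)/p0(x)) for N(mu, 1) against N(0, 1)."""
    def log_lr(x):
        return mu * x - 0.5 * mu * mu
    return log_lr


def cusum_stopping_time(xs, log_lr, threshold):
    """First n at which the CUSUM statistic reaches the threshold, else None.

    Uses Page's recursion W_n = max(0, W_{n-1} + log_lr(X_n)), W_0 = 0.
    """
    w = 0.0
    for n, x in enumerate(xs, start=1):
        w = max(0.0, w + log_lr(x))
        if w >= threshold:
            return n
    return None
```

For the sequence `[0, 0, 0, 2, 2, 2]` with `mu = 1` and threshold `3.0`, the statistic stays at zero for the first three observations and then grows by 1.5 per step, so the rule stops at the fifth observation.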
However, this method cannot control the false alarm probability, as the following theorem shows.
The Bayesian test is based on the assumption that $\theta$ is uniformly distributed on $\{1, \ldots, n\}$. So, this test accepts the alternative when
The test statistic in (11) admits the following recursive computation:
So, the corresponding stopping time is given by
over all stopping times with
More detailed statistical properties of the SR procedure can be found in .
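As an illustration of the recursive computation of the SR statistic, the following Python sketch uses the standard Shiryaev-Roberts recursion $R_n = (1 + R_{n-1})\, p_1(X_n)/p_0(X_n)$ with $R_0 = 0$. This normalization, the function names, and the Gaussian likelihood ratio are assumptions for illustration and may differ by a constant factor from the statistic in (11).

```python
import math


def gaussian_mean_shift_lr(mu):
    """Likelihood ratio p1(x)/p0(x) for N(mu, 1) against N(0, 1)."""
    def lr(x):
        return math.exp(mu * x - 0.5 * mu * mu)
    return lr


def shiryaev_roberts_stopping_time(xs, lr, threshold):
    """First n at which the SR statistic reaches the threshold, else None.

    Uses the recursion R_n = (1 + R_{n-1}) * lr(X_n), R_0 = 0, so that
    R_n is the sum over k <= n of the likelihood ratios of X_k, ..., X_n.
    """
    r = 0.0
    for n, x in enumerate(xs, start=1):
        r = (1.0 + r) * lr(x)
        if r >= threshold:
            return n
    return None
```

With a likelihood ratio identically equal to one, the statistic equals $n$, so a constant threshold is eventually crossed even when no change occurs.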
As one can see in Fig. 3, in practice, there is no significant difference between the CUSUM and SR algorithms.
Notice also that a fact similar to Theorem 1.4 holds true for the SR method. So, the standard hypothesis testing methods result in stopping times with uncontrollable false alarm probabilities.
2 Robust stopping times
The main idea in this paper is to make use of multiple hypothesis testing methods for constructing stopping times. This can be done very easily by replacing the constant threshold in the ML test (9) by one depending on . So, we define the stopping time
In order to control the false alarm probability and to obtain a nearly minimal average detection delay, we are looking for a minimal function , such that
We begin our construction of with the following function:
and define -iterated by
Next, for given , define
Consider the following random variable:
The next theorem is the cornerstone of our construction of robust stopping times.
For any , , and
Therefore, we can define the quantile of order of by
Fig. 4 shows the distribution functions and quantiles of for computed with the help of the Monte Carlo method.
The next theorem describes principal properties of the stopping time
where is a solution to
The asymptotic behavior of the average delay is described by the following theorem
For any , as and
Remark. It is easy to check with simple algebra that for any given
The robustness of
w.r.t. the a priori geometric distribution of $\theta$ follows now almost immediately from (14). Indeed, suppose $\theta$ is a random variable with
Then, averaging (14) w.r.t. this distribution, we obtain
as , and with (8) we arrive at
where is the optimal Bayesian stopping time (see Theorem 1.1).
Appendix A
Proof of Lemma 1.1.
is a martingale with , we have
In what follows, we denote by i.i.d. standard exponential random variables.
For any and
where is defined by (12).
where random process is defined by (10).
Define random integers by
From (10) it is clear that these random variables are renewal points for the random process and therefore the random variables
are independent. Since is non-decreasing in and obviously , we get
Proof of Theorem 2.2.
It follows from (10) that for all
It is clear that is a convex function and therefore for any
Next, we obtain by Wald’s identity
Proof of Theorem 2.3.
It follows immediately from Theorem 2.1 that as
Next, by convexity of we obtain for any
we get by (13)
So, our next step is to upper bound . First, notice that
Therefore it follows immediately from this equation and (12) that as
It is also easy to check that
- Basseville, M. and Nikiforov, I. V. (1996). Detection of Abrupt Changes: Theory and Application. Prentice-Hall.
- Blackwell, D. (1946). On an equation of Wald. Ann. Math. Statist. 17 84–87.
- Feinberg, E. A. and Shiryaev, A. N. (2006). Quickest detection of drift change for Brownian motion in generalized and Bayesian settings. Statist. Decisions 24 445–470.
- Girshick, M. A. and Rubin, H. (1952). A Bayes approach to a quality control model. Ann. Math. Statist. 23 114–125.
- Lorden, G. (1971). Procedures for reacting to a change in distribution. Ann. Math. Statist. 42(6) 1897–1908.
- Moustakides, G. V. (1986). Optimal stopping times for detecting changes in distributions. Ann. Statist. 14(4) 1370–1387.
- Page, E. S. (1954). Continuous inspection schemes. Biometrika 41 100–115.
- Pollak, M. and Tartakovsky, A. G. (2009). Optimality properties of the Shiryaev-Roberts procedure. Statist. Sinica 19 1729–1739.
- Polunchenko, A. S. and Tartakovsky, A. G. (2010). On optimality of the Shiryaev-Roberts procedure for detecting a change in distribution. Ann. Statist. 38(6) 3445–3457.
- Roberts, S. W. (1966). A comparison of some control chart procedures. Technometrics 8 411–430.
- Shiryaev, A. N. (1961). The problem of the most rapid detection of a disturbance in a stationary process. Dokl. Math. 2 795–799.
- Shiryaev, A. N. (1978). Optimal Stopping Rules. Springer-Verlag, Berlin, Heidelberg.
- Shiryaev, A. N. (1963). On optimum methods in quickest detection problems. Theory Probab. Appl. 8 22–46.
- Wald, A. (1944). On cumulative sums of random variables. Ann. Math. Statist. 15(3) 283–296.
- Wald, A. (1945). Sequential tests of statistical hypotheses. Ann. Math. Statist. 16(2) 117–186.