On robust stopping times for detecting changes in distribution

04/24/2018 ∙ by Yuri Golubev, et al. ∙ KIT

Let X_1, X_2, ... be independent random variables observed sequentially and such that X_1, ..., X_{θ-1} have a common probability density p_0, while X_θ, X_{θ+1}, ... are all distributed according to p_1 ≠ p_0. It is assumed that p_0 and p_1 are known, but the change time θ∈Z^+ is unknown and the goal is to construct a stopping time τ that detects the change-point θ as soon as possible. The existing approaches to this problem rely essentially on some a priori information about θ. For instance, in Bayes approaches, it is assumed that θ is a random variable with a known probability distribution. In methods related to hypothesis testing, this a priori information is hidden in the so-called average run length. The main goal in this paper is to construct stopping times which do not make use of a priori information about θ, but have nearly Bayesian detection delays. More precisely, we propose stopping times solving approximately the following problem: Δ(θ;τ^α) → min_{τ^α} subject to α(θ;τ^α) < α for any θ > 1, where α(θ;τ) = P_θ{τ<θ} is the false alarm probability and Δ(θ;τ) = E_θ(τ-θ)_+ is the average detection delay, that and (1+o(1))(θ/α), as θ/α and explain why such stopping times are robust w.r.t. a priori information about θ.

1 Introduction

Let $X_1, X_2, \ldots$ be independent random variables observed sequentially. It is assumed that $X_1, \ldots, X_{\theta-1}$ have a common probability density $p_0$, while $X_\theta, X_{\theta+1}, \ldots$ are all distributed according to a probability density $p_1 \neq p_0$. This paper deals with the simplest change-point detection problem, where it is supposed that $p_0$ and $p_1$ are known, but the change time $\theta \in \mathbb{Z}^+$ is unknown, and the goal is to construct a stopping time $\tau$ that detects $\theta$ as soon as possible. The existing approaches to this problem rely essentially on some a priori information about $\theta$. For instance, in Bayes approaches, it is assumed that $\theta$ is a random variable with a known probability distribution, see e.g. [12]. In methods related to hypothesis testing, this a priori information is hidden in the so-called average run length, see e.g. [7]. Our main goal in this paper is to construct robust stopping times which do not make use of a priori information about $\theta$, but have detection delays close to Bayes ones.

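In order to be more precise, denote by $\mathbf{P}_\theta$ the probability distribution of $X_1, X_2, \ldots$ and by $\mathbf{E}_\theta$ the expectation with respect to this measure. In this paper, we characterize a stopping time $\tau$ with the help of two functions of $\theta \in \mathbb{Z}^+$:

  • false alarm probability $\alpha(\theta;\tau) = \mathbf{P}_\theta\{\tau < \theta\}$;

  • average detection delay $\Delta(\theta;\tau) = \mathbf{E}_\theta(\tau - \theta)_+$,

and our goal is to construct stopping times $\tau^\alpha$ solving the following problem:

$$\Delta(\theta;\tau^\alpha) \to \min_{\tau^\alpha} \quad \text{subject to} \quad \alpha(\theta;\tau^\alpha) < \alpha \ \text{ for any } \theta > 1. \tag{1}$$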

The main difficulty in this problem is related to the fact that, for a given stopping time $\tau$, the average delay $\Delta(\theta;\tau)$ depends on $\theta$. This means that in order to compare two stopping times $\tau_1$ and $\tau_2$, one has to compare two functions of $\theta$. Since such functions are, in general, not ordered, this is not feasible from a mathematical viewpoint, and the principal objective of this paper is to propose stopping times providing good approximate solutions to (1). Notice also that similar problems are common in statistics, and there are reasonable approaches to solving them.

In change-point detection, there are two standard methods for constructing stopping times.

  • A Bayes approach. The first Bayes change detection problem was stated in [4] for an on-line quality control problem in continuous technological processes. In detecting changes in distributions, this approach assumes that $\theta$ is a random variable with a known distribution

    $$\mu_k = \mathbf{P}\{\theta = k\}, \qquad k = 1, 2, \ldots,$$

    and the goal is to construct a stopping time that solves the averaged version of (1), i.e.,

    $$\sum_{k=1}^{\infty} \mu_k\, \Delta(k;\tau) \to \min_{\tau} \quad \text{subject to} \quad \sum_{k=1}^{\infty} \mu_k\, \alpha(k;\tau) \le \alpha. \tag{2}$$

    We emphasize that, in contrast to (1), this problem is well defined from a mathematical viewpoint, but its solution depends on the a priori law $\mu$.

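  • A hypothesis testing approach. The first non-Bayesian change detection algorithm based on sequential hypothesis testing was proposed in [7]. Denote by

    $$X^t = (X_1, \ldots, X_t)$$

    the observations up to time $t$. The main idea of this approach is to test sequentially

    $$H_0: \theta > t \quad \text{vs.} \quad H_1: \theta \le t, \qquad t = 1, 2, \ldots \tag{3}$$

    So, the stopping time $\tau$ is defined as follows:

    • if $H_0$ is accepted, the observations are continued, i.e., we test $H_0: \theta > t+1$ vs. $H_1: \theta \le t+1$;

    • if $H_1$ is accepted, then we stop and set $\tau = t$.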

In order to motivate our idea of robust stopping times, we briefly discuss the basic statistical properties of the above approaches.

1.1 A Bayes approach

Usually, this approach uses the geometric a priori distribution

$$\mathbf{P}\{\theta = k\} = \rho(1-\rho)^{k-1}, \qquad k = 1, 2, \ldots$$

The parameter $\rho \in (0,1)$ is assumed to be known. In this case, the optimal stopping time is given by the following well-known theorem [12]:

Theorem 1.1.

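The optimal Bayes stopping time (see (2)) is given by

$$\tau_\rho = \min\{t \ge 1 : \pi_t \ge h\}, \tag{4}$$

where $\pi_t = \mathbf{P}\{\theta \le t \mid X_1, \ldots, X_t\}$ is the a posteriori probability that the change has occurred by time $t$,

and $h$ is a constant.

Notice that the geometric a priori distribution results in the following recursive formula for the a posteriori probability (see, e.g., [12]):

$$\pi_t = \frac{[\pi_{t-1} + \rho(1-\pi_{t-1})]\,p_1(X_t)}{[\pi_{t-1} + \rho(1-\pi_{t-1})]\,p_1(X_t) + (1-\rho)(1-\pi_{t-1})\,p_0(X_t)}, \qquad \pi_0 = 0. \tag{5}$$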

So, if we denote for brevity

then (5) may be rewritten in the following equivalent form:

(6)
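A minimal Python sketch of the Bayes rule (4), computing the a posteriori probability through the standard Shiryaev recursion for a geometric prior, i.e. $\pi_t \propto [\pi_{t-1} + \rho(1-\pi_{t-1})]\,p_1(X_t)$ against $(1-\rho)(1-\pi_{t-1})\,p_0(X_t)$. The function names, the Gaussian example densities and the threshold are illustrative assumptions.

```python
import math
from typing import Callable, List, Optional, Sequence

Density = Callable[[float], float]


def gaussian_pdf(x: float, mean: float, sd: float = 1.0) -> float:
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def shiryaev_posterior(xs: Sequence[float], p0: Density, p1: Density,
                       rho: float) -> List[float]:
    """Return [pi_1, ..., pi_n], pi_t = P{theta <= t | X_1..X_t}, geometric prior."""
    pi = 0.0  # pi_0 = P{theta <= 0} = 0
    out = []
    for x in xs:
        # P{theta <= t | X_1..X_{t-1}}: the change already happened,
        # or it happens exactly at t (probability rho, by memorylessness).
        prior_t = pi + rho * (1.0 - pi)
        num = prior_t * p1(x)
        den = num + (1.0 - prior_t) * p0(x)  # 1 - prior_t = (1 - rho)(1 - pi_{t-1})
        pi = num / den
        out.append(pi)
    return out


def shiryaev_stop(xs: Sequence[float], p0: Density, p1: Density,
                  rho: float, h: float) -> Optional[int]:
    """First t with pi_t >= h, as in (4); None if the threshold is never reached."""
    for t, pi in enumerate(shiryaev_posterior(xs, p0, p1, rho), start=1):
        if pi >= h:
            return t
    return None
```

Working with probabilities is fine for short illustrative paths; for long runs one would usually track the log-odds instead to avoid underflow.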

From (6) we see, in particular, that the Bayes stopping time depends on the parameter $\rho$, which is hardly ever known in practice. In statistics, the uniform a priori distribution is commonly used to avoid such a dependence. Let us see how this idea works in change-point detection. The uniform a priori distribution corresponds to the limit $\rho \to 0$, and in this case we obtain immediately from (6)

Therefore, for

we get

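Hence, the optimal stopping time in the case of the uniform a priori distribution is given by

$$\tau_0 = \min\Big\{t \ge 1 : \prod_{s=1}^{t} \frac{p_1(X_s)}{p_0(X_s)} \ge h\Big\}, \tag{7}$$

where $h$ is some constant. Fig. 1 shows a typical trajectory of the statistic in (7) when detecting a change in a Gaussian distribution.

Computing the false alarm probability for this stopping time is not difficult and is based on the following simple fact. Let $X_1, X_2, \ldots$ be i.i.d. random variables with density $p_0$, and let

$$L_t = \prod_{s=1}^{t} \frac{p_1(X_s)}{p_0(X_s)}.$$

Lemma 1.1.

For any $h > 0$,

$$\mathbf{P}\{L_t \ge h \text{ for some } t \ge 1\} \le \frac{1}{h}.$$

It follows immediately from the definition of $\tau_0$ that if $\tau_0 < \theta$, then $\max_{t < \theta} L_t \ge h$, where, under $\mathbf{P}_\theta$, the observations $X_1, \ldots, X_{\theta-1}$ are i.i.d. with density $p_0$. So, by this lemma we get

$$\alpha(\theta;\tau_0) \le \frac{1}{h} \quad \text{for all } \theta \ge 1.$$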
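The bound of Lemma 1.1 can be checked by simulation. The sketch below assumes $p_0 = \mathcal{N}(0,1)$ and $p_1 = \mathcal{N}(\mu,1)$, for which $\log\frac{p_1(x)}{p_0(x)} = \mu x - \mu^2/2$, and truncates the time axis at a finite horizon, which can only decrease the estimated probability. The function name and all numerical settings are illustrative.

```python
import math
import random


def prob_lr_exceeds(h: float, mu: float = 1.0, horizon: int = 200,
                    n_rep: int = 2000, seed: int = 0) -> float:
    """Estimate P{L_t >= h for some t <= horizon} when X_1, X_2, ... are i.i.d. N(0, 1).

    L_t = prod_{s<=t} p1(X_s)/p0(X_s) with p1 = N(mu, 1) is tracked on the
    log scale to avoid overflow and underflow.
    """
    rng = random.Random(seed)
    log_h = math.log(h)
    hits = 0
    for _rep in range(n_rep):
        log_l = 0.0
        for _step in range(horizon):
            x = rng.gauss(0.0, 1.0)             # observations drawn from p0
            log_l += mu * x - 0.5 * mu * mu     # log p1(x)/p0(x), unit variances
            if log_l >= log_h:
                hits += 1
                break
    return hits / n_rep


# Should not exceed the bound 1/h = 0.1 beyond Monte Carlo error.
estimate = prob_lr_exceeds(10.0)
```

For Gaussian shifts the estimate typically falls noticeably below $1/h$, because the log-likelihood ratio overshoots the level when it crosses it.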

As to the average detection delay, it can easily be computed with the help of Wald's identity [14, 2]. The next theorem summarizes the principal properties of the stopping time (7). Let us assume that

Theorem 1.2.

Let Then for defined by (7) we have

Fig. 1 illustrates this theorem showing , in the case of the change in the mean of the Gaussian distribution with .

Figure 1: Detecting change in the mean of Gaussian distribution with the help of .
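The growth of the delay of rule (7) in $\theta$ can be seen in a short simulation. The sketch assumes the Gaussian model $p_0 = \mathcal{N}(0,1)$, $p_1 = \mathcal{N}(\mu,1)$ and implements (7) on the log scale, i.e., it stops at the first $t$ with $\sum_{s \le t} \log\frac{p_1(X_s)}{p_0(X_s)} \ge \log h$; the names and settings are hypothetical. Before the change this sum drifts downward, so the later the change, the deeper the level from which the statistic has to climb back after $\theta$.

```python
import math
import random


def delay_rule7(theta: int, h: float, mu: float = 1.0, n_rep: int = 500,
                max_steps: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of Delta(theta; tau) = E_theta (tau - theta)_+ for
    tau = min{t : sum_{s<=t} log p1(X_s)/p0(X_s) >= log h},
    with X_t ~ N(0, 1) for t < theta and X_t ~ N(mu, 1) for t >= theta."""
    rng = random.Random(seed)
    log_h = math.log(h)
    total = 0
    for _rep in range(n_rep):
        s = 0.0
        t = 0
        while t < max_steps:  # truncation guard, practically never reached here
            t += 1
            x = rng.gauss(mu if t >= theta else 0.0, 1.0)
            s += mu * x - 0.5 * mu * mu
            if s >= log_h:
                break
        total += max(t - theta, 0)
    return total / n_rep


delays = {theta: delay_rule7(theta, h=100.0) for theta in (1, 21, 41)}
```

With these settings the estimated delays increase by roughly the same amount for each additional 20 pre-change observations, in line with the linear growth discussed next.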

We emphasize that the linear growth in $\theta$ of the average detection delay of the stopping time (7) is undesirable from both practical and theoretical viewpoints. To understand why, let us now turn back to the Bayes setting assuming that . In this case the following theorem holds true.

Theorem 1.3.

Suppose . Then for defined by (4) we have

(8)

This theorem may be proved with the help of the standard techniques described, e.g., in [1].

Fig. 2 illustrates typical behavior of with . Notice that if is used in the considered case, then we obtain by (8)

So, we see that this mean detection delay is far from the optimal Bayes one given by

Figure 2: Detecting change in the mean of Gaussian distribution with the help of ().

Let us now briefly summarize the main facts related to the classical Bayes approach:

  • if , then the average detection delay of the Bayes stopping time grows linearly in ;

  • when , the maximal false alarm probability is not controlled.

In view of these facts it is clear that the standard Bayes technique cannot provide reasonable solutions to (1).

1.2 A hypothesis testing approach

The idea of this approach is based on the well-known sequential testing of two simple hypotheses [15]. However, we would like to emphasize that, in contrast to the standard setting in [15], in change-point detection this approach has a rather heuristic character, since here we test a simple hypothesis against a composite alternative whose complexity grows with the number of observations.

In sequential hypothesis testing, there are two common methods:

  • maximum likelihood;

  • Bayesian.

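The maximum likelihood test accepts the alternative hypothesis in (3) when

$$\max_{1 \le k \le t} \prod_{s=k}^{t} \frac{p_1(X_s)}{p_0(X_s)} \ge h$$

or, equivalently,

$$\max_{1 \le k \le t} \sum_{s=k}^{t} \ell_s \ge \log h,$$

where

$$\ell_s = \log\frac{p_1(X_s)}{p_0(X_s)}.$$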

The threshold is computed as follows

where $\alpha$ is the type I error probability. Notice that by Lemma 1.1

Therefore the maximum likelihood test results in the following stopping time:

(9)

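Notice also that the statistic $W_t = \max_{1 \le k \le t} \sum_{s=k}^{t} \log\frac{p_1(X_s)}{p_0(X_s)}$ admits a simple recursive computation [7]. Indeed,

$$W_t = \log\frac{p_1(X_t)}{p_0(X_t)} + \max\Big\{0, \max_{1 \le k \le t-1} \sum_{s=k}^{t-1} \log\frac{p_1(X_s)}{p_0(X_s)}\Big\}.$$

Therefore

$$W_t = \max\{W_{t-1}, 0\} + \log\frac{p_1(X_t)}{p_0(X_t)}, \qquad W_0 = 0. \tag{10}$$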

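This method is usually called the CUSUM algorithm. It is well known that it is optimal in the sense of Lorden [5], i.e., for a properly chosen threshold it minimizes

$$\sup_{\theta \ge 1} \mathrm{ess\,sup}\, \mathbf{E}_\theta\big[(\tau - \theta + 1)_+ \mid X_1, \ldots, X_{\theta-1}\big]$$

in the class of stopping times $\tau$ with $\mathbf{E}_\infty \tau \ge \gamma$, where $\mathbf{E}_\infty$ denotes the expectation when no change occurs and $\gamma$ is a given lower bound on the average run length to false alarm, see [6].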
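A minimal Python sketch of the CUSUM statistic, assuming the recursion $W_t = \max\{W_{t-1}, 0\} + \log\frac{p_1(X_t)}{p_0(X_t)}$ with $W_0 = 0$, together with the rule that raises an alarm when $W_t$ first reaches a given level on the log scale. The function names and the choice to pass precomputed log-likelihood ratios are illustrative assumptions.

```python
from typing import List, Optional, Sequence


def cusum_path(llrs: Sequence[float]) -> List[float]:
    """W_t = max_{1<=k<=t} sum_{s=k}^t llr_s via W_t = max(W_{t-1}, 0) + llr_t, W_0 = 0.

    llrs[t-1] is log p1(X_t)/p0(X_t)."""
    w = 0.0
    path = []
    for llr in llrs:
        w = max(w, 0.0) + llr  # restart the sum whenever it has dropped below zero
        path.append(w)
    return path


def cusum_stop(llrs: Sequence[float], log_threshold: float) -> Optional[int]:
    """First t (1-based) with W_t >= log_threshold, or None if no alarm is raised."""
    for t, w in enumerate(cusum_path(llrs), start=1):
        if w >= log_threshold:
            return t
    return None


# For p0 = N(0, 1) and p1 = N(mu, 1) one would use llr = mu * x - mu**2 / 2.
llrs = [0.4, -1.0, 0.3, 0.7, -0.2, 1.1]
print(cusum_stop(llrs, 1.5))  # 6
```

The recursion needs O(1) work per observation, whereas evaluating the maximum over $k$ directly costs O(t) at time $t$.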

However, with a fixed threshold this method cannot control the false alarm probability, as the following theorem shows.

Theorem 1.4.

For any

As
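Theorem 1.4 can be illustrated numerically: for a fixed threshold, the probability $\mathbf{P}_\theta\{\tau < \theta\}$ of a CUSUM alarm before the change approaches one as $\theta$ grows, because under $p_0$ the statistic keeps renewing and eventually crosses any fixed level. Since only $X_1, \ldots, X_{\theta-1}$ matter for this event, the sketch simulates pre-change data only. It assumes the CUSUM recursion $W_t = \max\{W_{t-1}, 0\} + \log\frac{p_1(X_t)}{p_0(X_t)}$; the Gaussian model, the threshold $\log 20$ and the sample sizes are illustrative assumptions.

```python
import math
import random


def cusum_false_alarm(theta: int, log_threshold: float, mu: float = 1.0,
                      n_rep: int = 400, seed: int = 0) -> float:
    """Monte Carlo estimate of P_theta{tau < theta} for CUSUM with a fixed threshold.

    Model: p0 = N(0, 1), p1 = N(mu, 1). The event {tau < theta} depends only on
    the pre-change observations X_1, ..., X_{theta-1}, all drawn from p0."""
    rng = random.Random(seed)
    alarms = 0
    for _rep in range(n_rep):
        w = 0.0
        for _t in range(1, theta):
            x = rng.gauss(0.0, 1.0)
            w = max(w, 0.0) + mu * x - 0.5 * mu * mu  # CUSUM update
            if w >= log_threshold:
                alarms += 1
                break
    return alarms / n_rep


rates = {theta: cusum_false_alarm(theta, math.log(20.0)) for theta in (10, 100, 1000)}
```

Even though $\log 20$ would be the natural threshold for a single test at level $0.05$, the estimated false alarm probability for $\theta = 1000$ is close to one.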

The Bayesian test is based on the assumption that $\theta$ is uniformly distributed on $\{1, \ldots, t\}$. So, this test accepts the alternative hypothesis in (3) when

(11)

Since

and

the test statistic in (11) admits the following recursive computation:

So, the corresponding stopping time is given by

In the literature, this method is known as the Shiryaev-Roberts (SR) algorithm. It was first proposed in [11] and [10]. In [8] and [3] it was shown that it minimizes the integral average delay

over all stopping times with . More detailed statistical properties of the SR procedure can be found in [9].

As one can see in Fig. 3, in practice there is no significant difference between the CUSUM and SR algorithms.

Figure 3: Detecting change in the mean of Gaussian distribution with the help of CUSUM and SR procedures.
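For comparison with CUSUM, here is a minimal Python version of the Shiryaev-Roberts recursion. It assumes the standard form of the SR statistic, $R_t = \sum_{k=1}^{t} \prod_{s=k}^{t} \frac{p_1(X_s)}{p_0(X_s)}$, which satisfies $R_t = (1 + R_{t-1})\frac{p_1(X_t)}{p_0(X_t)}$ with $R_0 = 0$, and an alarm at the first $t$ with $R_t \ge A$. The symbols $R_t$, $A$ and the function names are hypothetical, not the paper's.

```python
import math
from typing import List, Optional, Sequence


def sr_path(lrs: Sequence[float]) -> List[float]:
    """R_t = (1 + R_{t-1}) * lrs[t-1] with R_0 = 0, where lrs[t-1] = p1(X_t)/p0(X_t)."""
    r = 0.0
    path = []
    for lr in lrs:
        r = (1.0 + r) * lr
        path.append(r)
    return path


def sr_stop(lrs: Sequence[float], threshold: float) -> Optional[int]:
    """First t (1-based) with R_t >= threshold, or None."""
    for t, r in enumerate(sr_path(lrs), start=1):
        if r >= threshold:
            return t
    return None


def sr_brute(lrs: Sequence[float], t: int) -> float:
    """Direct evaluation of sum_{k=1}^t prod_{s=k}^t lrs[s-1], to check the recursion."""
    return sum(math.prod(lrs[k - 1:t]) for k in range(1, t + 1))


lrs = [1.5, 0.5, 2.0, 0.8]
print(sr_stop(lrs, 4.0))  # 3
```

Where CUSUM takes a maximum over candidate change points, SR sums the corresponding likelihood ratios, which matches the uniform-prior Bayesian test it comes from.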

Notice also that a result similar to Theorem 1.4 holds true for the SR method. Thus, the standard hypothesis testing methods result in stopping times with uncontrollable false alarm probabilities.

2 Robust stopping times

The main idea in this paper is to make use of multiple hypothesis testing methods for constructing stopping times. This can be done very easily by replacing the constant threshold in the ML test (9) by one depending on . So, we define the stopping time

In order to control the false alarm probability and to obtain a nearly minimal average detection delay, we are looking for a minimal function , such that

We begin our construction of with the following function:

and define -iterated by

Next, for given , define

(12)

Consider the following random variable:

The next theorem is the cornerstone of our construction of robust stopping times.

Theorem 2.1.

For any , , and

Therefore we can define the quantile of order

of by

Fig. 4 shows the distribution functions and quantiles of for computed by the Monte Carlo method.

Figure 4: Distribution functions and quantiles of .
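Quantiles such as those in Fig. 4 are typically read off the empirical distribution of simulated copies of the random variable. Because the random variable of Theorem 2.1 is not reproduced here, the sketch uses a hypothetical stand-in: the maximum over a finite horizon of the CUSUM statistic $W_t = \max\{W_{t-1}, 0\} + \log\frac{p_1(X_t)}{p_0(X_t)}$ under $p_0 = \mathcal{N}(0,1)$, $p_1 = \mathcal{N}(\mu,1)$. The point is only the quantile step: the smallest sample value at which the empirical distribution function reaches the requested level.

```python
import math
import random
from typing import Callable, List


def empirical_quantile(samples: List[float], level: float) -> float:
    """Smallest x with F_n(x) >= level, where F_n is the empirical distribution function."""
    if not 0.0 < level <= 1.0:
        raise ValueError("level must lie in (0, 1]")
    xs = sorted(samples)
    idx = math.ceil(level * len(xs)) - 1
    return xs[max(idx, 0)]


def monte_carlo_quantile(draw: Callable[[random.Random], float], level: float,
                         n_sim: int = 10_000, seed: int = 0) -> float:
    """Quantile of order `level` estimated from n_sim independent draws."""
    rng = random.Random(seed)
    return empirical_quantile([draw(rng) for _ in range(n_sim)], level)


def max_cusum_under_p0(rng: random.Random, horizon: int = 100, mu: float = 1.0) -> float:
    """Hypothetical stand-in statistic: max_{t <= horizon} W_t with X_t ~ N(0, 1)."""
    w, best = 0.0, -math.inf
    for _ in range(horizon):
        w = max(w, 0.0) + mu * rng.gauss(0.0, 1.0) - 0.5 * mu * mu
        best = max(best, w)
    return best


q95 = monte_carlo_quantile(max_cusum_under_p0, 0.95, n_sim=2000)
```

Replacing `max_cusum_under_p0` with a simulator of the actual random variable would give curves of the kind shown in Fig. 4.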

The next theorem describes the principal properties of the stopping time

Theorem 2.2.

For any

where is a solution to

(13)

The asymptotic behavior of the average delay is described by the following theorem.

Theorem 2.3.

For any , as and

(14)

Remark. Simple algebra shows that for any given

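The robustness of the stopping times $\tau^\alpha$ w.r.t. the a priori geometric distribution of $\theta$ now follows almost immediately from (14). Indeed, suppose $\theta$ is a random variable with

$$\mathbf{P}\{\theta = k\} = \rho(1-\rho)^{k-1}, \qquad k = 1, 2, \ldots$$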

Then, averaging (14) w.r.t. this distribution, we obtain

as , and with (8) we arrive at

Theorem 2.4.

As

where is the optimal Bayesian stopping time (see Theorem 1.1).

Appendix A Proofs

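Proof of Lemma 1.1.

Since $L_t = \prod_{s=1}^{t} p_1(X_s)/p_0(X_s)$ is a nonnegative martingale with $\mathbf{E} L_t = 1$ when $X_1, X_2, \ldots$ are i.i.d. with density $p_0$, Doob's maximal inequality gives, for any $T \ge 1$,

$$\mathbf{P}\Big\{\max_{1 \le t \le T} L_t \ge h\Big\} \le \frac{\mathbf{E} L_T}{h} = \frac{1}{h}.$$

Letting $T \to \infty$ completes the proof. ∎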

In what follows we denote by be i.i.d. standard exponential random variables.

Lemma A.1.

For any and

where is defined by (12).

Proof.

Simple algebra shows that for any

Therefore with this inequality we obtain

(15)

It follows immediately from the definition of , see (12), that

It is also easy to check numerically that for any and

Therefore, substituting the above equations in (15), we complete the proof. ∎

Lemma A.2.

For any

where random process is defined by (10).

Proof.

Define random integers by

From (10) it is clear that these random variables are renovation points for the random process and therefore random variables

are independent. Since is non-decreasing in and obviously , we get

Therefore, to finish the proof, it suffices to notice that by (10) and Lemma 1.1

Theorem 2.1 follows now immediately from Lemmas A.1, A.2.

Proof of Theorem 2.2.

It follows from (10) that for all

and therefore

where

Computing is based on the famous Wald’s identity [14] (see also [2]). For given , define function

It is clear that is a convex function and therefore for any

Hence,

Next, we obtain by Wald’s identity

and thus

(16)

To finish the proof, we choose (see (13)), and notice that . Hence, by (16)
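Wald's identity, used in the proof above, states in its basic form that $\mathbf{E} S_\tau = \mathbf{E}\ell_1 \cdot \mathbf{E}\tau$ for a random walk $S_n = \ell_1 + \cdots + \ell_n$ with i.i.d. increments and an integrable stopping time $\tau$. The proof may rely on a different form of the identity; the sketch below only checks the basic form numerically, with Gaussian increments and the first passage over a level $b$ as illustrative assumptions.

```python
import random
from typing import Tuple


def wald_check(drift: float = 0.5, level: float = 5.0, n_rep: int = 4000,
               seed: int = 0) -> Tuple[float, float, float]:
    """Simulate tau = min{n : S_n >= level}, S_n a sum of i.i.d. N(drift, 1) steps.

    Returns (mean of S_tau, drift * mean of tau, mean of S_tau - drift * tau).
    By Wald's identity the last quantity has expectation zero."""
    rng = random.Random(seed)
    sum_s, sum_tau, sum_diff = 0.0, 0.0, 0.0
    for _rep in range(n_rep):
        s, n = 0.0, 0
        while s < level:  # positive drift: tau is finite with E tau < infinity
            s += rng.gauss(drift, 1.0)
            n += 1
        sum_s += s
        sum_tau += n
        sum_diff += s - drift * n
    return sum_s / n_rep, drift * sum_tau / n_rep, sum_diff / n_rep


mean_s_tau, drift_times_mean_tau, gap = wald_check()
```

Since $S_\tau \ge b$ with a small overshoot, the identity gives $\mathbf{E}\tau \approx b / \mathbf{E}\ell_1$ plus an overshoot correction, which is how such identities turn into delay bounds.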

Proof of Theorem 2.3.

It follows immediately from Theorem 2.1 that as

(17)

Next, by convexity of we obtain for any

Therefore, choosing

we get by (13)

(18)

So, our next step is to upper bound . First, notice that

and thus

Therefore it follows immediately from this equation and (12) that as

(19)

It is also easy to check that

(20)

Finally, substituting (17), (19), and (20) in (18), we complete the proof. ∎

References

  • [1] Basseville, M. and Nikiforov, I. V. (1996). Detection of Abrupt Changes: Theory and Application. Prentice-Hall.
  • [2] Blackwell, D. (1946). On an equation of Wald. Ann. Math. Statist. 17 84–87.
  • [3] Feinberg, E. A. and Shiryaev, A. N. (2006). Quickest detection of drift change for Brownian motion in generalized and Bayesian settings. Statist. Decisions 24 445–470.
  • [4] Girshick, M. A. and Rubin, H. (1952). A Bayes approach to a quality control model. Ann. Math. Statist. 23 114–125.
  • [5] Lorden, G. (1971). Procedures for reacting to a change in distribution. Ann. Math. Statist. 42, No. 6, 1897–1908.
  • [6] Moustakides, G. V. (1986). Optimal stopping times for detecting changes in distributions. Ann. Statist. 14, No. 4, 1370–1387.
  • [7] Page, E. S. (1954). Continuous inspection schemes. Biometrika 41 100–115.
  • [8] Pollak, M. and Tartakovsky, A. G. (2009). Optimality properties of Shiryaev-Roberts procedure. Statist. Sinica 19 1729–1739.
  • [9] Polunchenko, A. S. and Tartakovsky, A. G. (2010). On optimality of the Shiryaev-Roberts procedure for detecting a change in distribution. Ann. Statist. 38, No. 6, 3445–3457.
  • [10] Roberts, S. W. (1966). A comparison of some control chart procedures. Technometrics 8 411–430.
  • [11] Shiryaev, A. N. (1961). The problem of the most rapid detection of a disturbance in a stationary process. Dokl. Math. 2 795–799.
  • [12] Shiryaev, A. N. (1978). Optimal Stopping Rules. Springer-Verlag, Berlin, Heidelberg.
  • [13] Shiryaev, A. N. (1963). On optimum methods in quickest detection problems. Theory Probab. Appl. 8 22–46.
  • [14] Wald, A. (1944). On cumulative sums of random variables. Ann. Math. Statist. 15, No. 3, 283–296.
  • [15] Wald, A. (1945). Sequential tests of statistical hypotheses. Ann. Math. Statist. 16, No. 2, 117–186.