# On robust stopping times for detecting changes in distribution

Let X_1, X_2, … be independent random variables observed sequentially and such that X_1, …, X_{θ−1} have a common probability density p_0, while X_θ, X_{θ+1}, … are all distributed according to p_1 ≠ p_0. It is assumed that p_0 and p_1 are known, but the change time θ ∈ Z^+ is unknown, and the goal is to construct a stopping time τ that detects the change-point θ as soon as possible. The existing approaches to this problem rely essentially on some a priori information about θ. For instance, in Bayes approaches, it is assumed that θ is a random variable with a known probability distribution. In methods related to hypothesis testing, this a priori information is hidden in the so-called average run length. The main goal in this paper is to construct stopping times which do not make use of a priori information about θ, but have nearly Bayesian detection delays. More precisely, we propose stopping times solving approximately the following problem: Δ(θ; τ^α) → min_{τ^α} subject to α(θ; τ^α) ≤ α for any θ ≥ 1, where α(θ; τ) = P_θ{τ < θ} is the false alarm probability and Δ(θ; τ) = E_θ(τ − θ)_+ is the average detection delay. We show that the average detection delays of the proposed stopping times are of order (1 + o(1)) log(θ/α), as θ/α → ∞, and explain why such stopping times are robust w.r.t. a priori information about θ.


## 1 Introduction

Let X_1, X_2, … be independent random variables observed sequentially. It is assumed that X_1, …, X_{θ−1} have a common probability density p_0, while X_θ, X_{θ+1}, … are all distributed according to a probability density p_1 ≠ p_0. This paper deals with the simplest change-point detection problem, where it is supposed that p_0 and p_1 are known, but the change time θ is unknown, and the goal is to construct a stopping time τ that detects θ as soon as possible. The existing approaches to this problem rely essentially on some a priori information about θ. For instance, in Bayes approaches, it is assumed that θ is a random variable with a known probability distribution, see e.g. [12]. In methods related to hypothesis testing, this a priori information is hidden in the so-called average run length, see e.g. [7]. Our main goal in this paper is to construct robust stopping times which do not make use of a priori information about θ, but have detection delays close to Bayes ones.

In order to be more precise, denote by P_θ the probability distribution of the observations with change-point θ, and by E_θ the expectation with respect to this measure. In this paper, we characterize a stopping time τ with the help of two functions in θ:

• false alarm probability

 α(θ; τ) = P_θ{τ < θ};
• average detection delay

 Δ(θ; τ) = E_θ(τ − θ)_+,  where (x)_+ = max{0, x},

and our goal is to construct stopping times solving the following problem:

 Δ(θ; τ^α) → min_{τ^α}   subject to   α(θ; τ^α) ≤ α  for any θ ≥ 1.  (1)

The main difficulty in this problem is related to the fact that for a given stopping time τ the average delay Δ(θ; τ) depends on θ. This means that in order to compare two stopping times, one has to compare two functions in θ. Obviously, this is not feasible from a mathematical viewpoint, and the principal objective in this paper is to propose stopping times providing good approximate solutions to (1). Notice also here that similar problems are common and well known in statistics, and there are reasonable approaches to obtain their solutions.

In change-point detection, there are two standard methods for constructing stopping times.

• A Bayes approach. The first Bayes change detection problem was stated in [4] for an on-line quality control problem for continuous technological processes. In detecting changes in distributions, this approach assumes that θ is a random variable with a known distribution

 π_m = P{θ = m},  m = 1, 2, …,

and the goal is to construct a stopping time τ^α_π that solves the averaged version of (1), i.e.,

 ∑_{m=1}^∞ π_m Δ(m; τ^α_π) → min_{τ^α_π}   subject to   ∑_{m=1}^∞ π_m α(m; τ^α_π) ≤ α.  (2)

We emphasize that, in contrast to (1), this problem is well defined from a mathematical viewpoint, but its solution depends on the a priori law π.

• A hypothesis testing approach. The first non-Bayesian change detection algorithm based on sequential hypothesis testing was proposed in [7]. Denote by X^n = (X_1, …, X_n) the observations till moment n. The main idea in this approach is to test sequentially the

 simple hypothesis  H^n_0: X^n ∼ ∏_{i=1}^n p_0(x_i)   vs. the compound alternative   H^n_1: X^n ∼ ∏_{i=1}^{m−1} p_0(x_i) ∏_{i=m}^n p_1(x_i),  m ≤ n.  (3)

So, the stopping time τ is defined as follows:

• if H^n_0 is accepted, the observations are continued, i.e., we test H^{n+1}_0 vs. H^{n+1}_1;

• if H^n_1 is accepted, then we stop and τ = n.

In order to motivate our idea of robust stopping times, we briefly discuss basic statistical properties of the above-mentioned approaches.

### 1.1 A Bayes approach

Usually in this approach the geometric a priori distribution

 π_m = γ(1 − γ)^{m−1},  m = 1, 2, …,  γ > 0,

is used. The positive parameter γ is assumed to be known. In this case, the optimal stopping time is given by the following famous theorem [12]:

###### Theorem 1.1.

The optimal Bayes stopping time (see (2)) is given by

 τ^α_γ = min{k : π̄_γ(X^k) ≥ 1 − α_γ},  (4)

where

 π̄_γ(X^k) = P{θ ≤ k | X^k},

and α_γ is a constant.

Notice that the geometric a priori distribution results in the following recursive formula for the a posteriori probability (see, e.g., [12]):

 π̄_γ(X^k) = {[γ + (1 − γ)π̄_γ(X^{k−1})] p_1(X_k)} / {[γ + (1 − γ)π̄_γ(X^{k−1})] p_1(X_k) + [1 − π̄_γ(X^{k−1})](1 − γ) p_0(X_k)}.  (5)

So, if we denote for brevity

 ρ_γ(X^k) = π̄_γ(X^k) / [1 − π̄_γ(X^k)],

then (5) may be rewritten in the following equivalent form:

 ρ_γ(X^k) = [γ + ρ_γ(X^{k−1})] / (1 − γ) × p_1(X_k)/p_0(X_k).  (6)

From this equation we see, in particular, that the Bayes stopping time depends on γ, which is hardly known in practice. In statistics, in order to avoid such dependence, the uniform a priori distribution is usually used. Let us look at how this idea works in change-point detection. The uniform a priori distribution corresponds to γ = 0, and in this case we obtain immediately from (6)

 ρ_0(X^k) = ρ_0(X^{k−1}) × p_1(X_k)/p_0(X_k).

Therefore, for

 L_0(X^k) = log[ρ_0(X^k)],

we get

 L_0(X^k) = ∑_{i=1}^k log[p_1(X_i)/p_0(X_i)].

Hence, the optimal stopping time in the case of the uniform a priori distribution is given by

 τ^α_∘ = min{k : L_0(X^k) ≥ t_α},  (7)

where t_α is some constant. Fig. 1 shows a typical trajectory of L_0(X^k) in detecting a change in the mean of the Gaussian distribution.

Computing the false alarm probability for this stopping time is not difficult and is based on the following simple fact. Let

 φ(λ) = E_∞ exp[λ log(p_1(X_1)/p_0(X_1))].
###### Lemma 1.1.

For any x > 0,

 P_∞{max_{k∈Z^+} L_0(X^k) ≥ x} ≤ exp(−x).

It follows immediately from the definition of τ^α_∘ that if τ^α_∘ < ∞, then L_0(X^{τ^α_∘}) ≥ t_α. So, by this lemma we get

 P_∞{τ^α_∘ < ∞} ≤ exp(−t_α).

As to the average detection delay, it can be easily computed with the help of the famous Wald identity [14, 2]. The next theorem summarizes the principal properties of τ^α_∘. Let us assume that

 μ_0 def= ∫ log[p_0(x)/p_1(x)] p_0(x) dx > 0   and   μ_1 def= ∫ log[p_1(x)/p_0(x)] p_1(x) dx > 0.
###### Theorem 1.2.

Let t_α = log(1/α). Then for τ^α_∘ defined by (7) we have

 α(θ; τ^α_∘) ≤ α,   Δ(θ; τ^α_∘) = [log(1/α) + θμ_0] / μ_1.

Fig. 1 illustrates this theorem in the case of a change in the mean of the Gaussian distribution.

We would like to emphasize that the linear growth of Δ(θ; τ^α_∘) in θ is not good from practical and theoretical viewpoints. In order to understand why it is so, let us now turn back to the Bayes setting, assuming that θ has the geometric distribution with parameter γ. In this case the following theorem holds true.

###### Theorem 1.3.

Suppose θ is geometrically distributed with parameter γ. Then for τ^α_γ defined by (4) we have

 max_{θ∈Z^+} α(θ; τ^α_γ) = 1,   Δ(θ; τ^α_γ) = log[1/(γα)] / μ_1 + O(1),  as γ, α → 0.  (8)

This theorem may be proved with the help of the standard techniques described, e.g., in [1].

Fig. 2 illustrates the typical behavior of τ^α_γ. Notice that if τ^α_∘ is used in the considered case, then we obtain from Theorem 1.2

 EΔ(θ; τ^α_∘) = log(1/α)/μ_1 + (μ_0/μ_1) × (1/γ).

So, we see that this mean detection delay is far away from the optimal Bayes one given by

 EΔ(θ; τ^α_γ) = log(1/α)/μ_1 + (1/μ_1) × log(1/γ) + O(1),  as γ, α → 0.

Let us now briefly summarize the main facts related to the classical Bayes approach.

• if γ = 0 (the uniform a priori distribution), then the average detection delay of the Bayes stopping time grows linearly in θ;

• when γ > 0, the maximal false alarm probability is not controlled.

In view of these facts it is clear that the standard Bayes technique cannot provide reasonable solutions to (1).

### 1.2 A hypothesis testing approach

The idea of this approach is based on the well-known sequential testing of two simple hypotheses [15]. However, we would like to emphasize that, in contrast to the standard setting in [15], in change-point detection this approach has a rather heuristic character, since here we test a simple hypothesis versus a compound alternative whose complexity grows with the volume of observations.

In sequential hypothesis testing there are two common methods

• maximum likelihood;

• Bayesian.

The maximum likelihood test accepts the hypothesis H^n_1 (see (3)) when

 max_{k≤n} [∏_{i=1}^{k−1} p_0(X_i) ∏_{i=k}^n p_1(X_i)] / [∏_{i=1}^n p_0(X_i)] ≥ t_α

or, equivalently,

 M(X^n) ≥ t_α,

where

 M(X^n) = max_{k≤n} ∑_{i=k}^n log[p_1(X_i)/p_0(X_i)].

The threshold is computed as follows

 t_α = min{t : P_∞{M(X^n) ≥ t} ≤ α},

where α is the type I error probability. Notice that by Lemma 1.1

 P_∞{M(X^n) ≥ x} ≤ exp(−x).

Therefore the maximum likelihood test results in the following stopping time:

 τ^α_ml = min{n : M(X^n) ≥ log(1/α)}.  (9)

Notice also that M(X^n) admits a simple recursive computation [7]. Indeed, notice that

 max_{k≤n} ∑_{i=k}^n log[p_1(X_i)/p_0(X_i)] = max{ log[p_1(X_n)/p_0(X_n)],  log[p_1(X_n)/p_0(X_n)] + max_{k≤n−1} ∑_{i=k}^{n−1} log[p_1(X_i)/p_0(X_i)] } = log[p_1(X_n)/p_0(X_n)] + max{0,  max_{k≤n−1} ∑_{i=k}^{n−1} log[p_1(X_i)/p_0(X_i)]}.

Therefore

 M(X^n) = log[p_1(X_n)/p_0(X_n)] + [M(X^{n−1})]_+.  (10)

This method is usually called the CUSUM algorithm. It is well known that it is optimal in the Lorden [5] sense, i.e., for a properly chosen t_α, it minimizes

 sup_{θ∈Z^+} ess sup E_θ[(τ − θ)_+ | X_1, …, X_{θ−1}]

in the class of stopping times τ with a prescribed lower bound on the average run length E_∞τ, see [6].
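Recursion (10) makes the CUSUM statistic trivial to update online. A minimal sketch (ours, not the paper's code), taking a generic stream of log-likelihood increments:

```python
def cusum_stop(loglik_increments, threshold):
    """CUSUM stopping time (9): stop at the first n with M(X^n) >= threshold,
    where M is updated by the recursion (10): M_n = l_n + max(M_{n-1}, 0)."""
    M = 0.0  # [M(X^0)]_+ = 0, so M(X^1) equals the first increment
    for n, l in enumerate(loglik_increments, start=1):
        M = l + max(M, 0.0)
        if M >= threshold:
            return n
    return None
```

With threshold = log(1/α) this is exactly the stopping time (9).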

However, this method cannot control the false alarm probability, as the following theorem shows.

###### Theorem 1.4.

For any t_α,

 max_{θ∈Z^+} α(θ; τ^α_ml) = 1.

As α → 0,

 Δ(θ; τ^α_ml) = [(1 + o(1))/μ_1] log(1/α).

The Bayesian test is based on the assumption that the a priori distribution of θ is uniform. So, this test accepts H^n_1 when

 S(X^n) def= ∑_{k=1}^n [∏_{i=1}^{k−1} p_0(X_i) ∏_{i=k}^n p_1(X_i)] / [∏_{i=1}^n p_0(X_i)] ≥ t_α.  (11)

Since

 S(X^n) = ∑_{k=1}^n ∏_{i=k}^n [p_1(X_i)/p_0(X_i)],

and

 ∑_{k=1}^n ∏_{i=k}^n [p_1(X_i)/p_0(X_i)] = ∑_{k=1}^{n−1} ∏_{i=k}^n [p_1(X_i)/p_0(X_i)] + p_1(X_n)/p_0(X_n) = [1 + ∑_{k=1}^{n−1} ∏_{i=k}^{n−1} p_1(X_i)/p_0(X_i)] × p_1(X_n)/p_0(X_n),

the test statistic in (11) admits the following recursive computation:

 S(X^n) = [1 + S(X^{n−1})] × p_1(X_n)/p_0(X_n).

So, the corresponding stopping time is given by

 τ^α_S = min{k : S(X^k) ≥ t_α}.

In the literature, this method is known as the Shiryaev–Roberts (SR) algorithm. It was first proposed in [11] and [10]. In [8] and [3] it was shown that it minimizes the integral average delay

 (1/E_∞τ) ∑_{θ=1}^∞ E_θ(τ − θ)_+

over all stopping times τ with a given value of E_∞τ. More detailed statistical properties of the SR procedure can be found in [9].
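Like CUSUM, the SR statistic costs O(1) per observation thanks to the recursion above. A minimal sketch (ours, not the paper's code), taking the likelihood ratios p_1(X_n)/p_0(X_n) as input:

```python
def sr_stop(lik_ratios, threshold):
    """Shiryaev-Roberts stopping time: S(X^n) = [1 + S(X^{n-1})] * p1(X_n)/p0(X_n),
    with S(X^0) = 0; stop at the first n with S(X^n) >= threshold."""
    S = 0.0
    for n, L in enumerate(lik_ratios, start=1):
        S = (1.0 + S) * L
        if S >= threshold:
            return n
    return None
```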

As one can see in Fig. 3, in practice there is no significant difference between the CUSUM and SR algorithms.

Notice also that for the SR method a fact similar to Theorem 1.4 holds true. So, the standard hypothesis testing methods result in stopping times with uncontrollable false alarm probabilities.

## 2 Robust stopping times

The main idea in this paper is to make use of multiple hypothesis testing methods for constructing stopping times. This can be done very easily by replacing the constant threshold in the ML test (9) by one depending on k. So, we define the stopping time

 τ̃^α = min{k : M(X^k) ≥ t_α(k)}.

In order to control the false alarm probability and to obtain a nearly minimal average detection delay, we look for a minimal function t_α(·) such that

 P_∞{max_{k∈Z^+} [M(X^k) − t_α(k)] ≥ 0} ≤ α.

We begin our construction of t_α(·) with the following function:

 φ(x) = 1 + log(x),  x ∈ R_+,

and define the m-iterated φ by

 Φ_m(x) = φ[Φ_{m−1}(x)],  with Φ_1(x) = φ(x).

Next, for given m ∈ Z^+ and ε > 0, define

 b_{m,ε}(x) = −log[(1/ε)Φ_m^{−ε}(x) − (1/ε)Φ_m^{−ε}(x + 1)],  x ∈ R_+.  (12)

Consider the following random variable:

 ζ_{m,ε} = max_{k∈Z^+} {M(X^k) − b_{m,ε}(k)}.

The next theorem plays a cornerstone role in our construction of robust stopping times.

###### Theorem 2.1.

For any m, ε, and x,

 P{ζ_{m,ε} ≥ x} ≤ 1 − exp{−e^{−x}[ε^{−1} + e^{−x}]}.

Therefore we can define the quantile of order 1 − α of ζ_{m,ε} by

 t^α_{m,ε} = min{x : P{ζ_{m,ε} ≥ x} ≤ α}.

Fig. 4 shows the distribution functions and quantiles of ζ_{m,ε} computed with the help of the Monte-Carlo method.

The next theorem describes principal properties of the stopping time

 τ̃^α_{m,ε} = min{k : M(X^k) ≥ b_{m,ε}(k) + t^α_{m,ε}}.
###### Theorem 2.2.

For any θ ∈ Z^+,

 α(θ; τ̃^α_{m,ε}) ≤ α   and   Δ(θ; τ̃^α_{m,ε}) ≤ d^α_{m,ε}(θ),

where d^α_{m,ε}(θ) is a solution to

 μ_1 d^α_{m,ε}(θ) = b_{m,ε}[θ + d^α_{m,ε}(θ)] + t^α_{m,ε}.  (13)
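The construction above is straightforward to compute. The following sketch (ours, not the paper's code) implements Φ_m, the boundary b_{m,ε} from (12), and the resulting stopping rule; the telescoping identity ∑_{k=1}^N e^{−b_{m,ε}(k)} = (1/ε)[Φ_m^{−ε}(1) − Φ_m^{−ε}(N + 1)], which drives Lemma A.1 below, can be checked numerically:

```python
import math

def Phi(m, x):
    """m-iterated logarithm: Phi_1(x) = 1 + log(x), Phi_m(x) = 1 + log(Phi_{m-1}(x))."""
    for _ in range(m):
        x = 1.0 + math.log(x)
    return x

def b(m, eps, k):
    """Boundary (12): b_{m,eps}(k) = -log[(Phi_m(k)^(-eps) - Phi_m(k+1)^(-eps)) / eps]."""
    return -math.log((Phi(m, k) ** (-eps) - Phi(m, k + 1) ** (-eps)) / eps)

def robust_stop(loglik_increments, m, eps, t_alpha):
    """Robust stopping time: the first k with M(X^k) >= b_{m,eps}(k) + t_alpha,
    where M is updated by the CUSUM recursion (10)."""
    M = 0.0
    for k, l in enumerate(loglik_increments, start=1):
        M = l + max(M, 0.0)
        if M >= b(m, eps, k) + t_alpha:
            return k
    return None
```

Since Φ_m(1) = 1, the partial sums converge to 1/ε, which is exactly the normalization used in the proof of Lemma A.1.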

The asymptotic behavior of the average delay is described by the following theorem.

###### Theorem 2.3.

For any θ ∈ Z^+, as α → 0 and θ → ∞,

 Δ(θ; τ̃^α_{m,ε}) ≤ (1/μ_1){log(θ/α) + ∑_{j=1}^m log[Φ_j(θ)] + ε log[Φ_m(θ)] + log(1/ε)} + o(1).  (14)

Remark. It is easy to check with simple algebra that for any given θ,

 lim_{j→∞} j log[Φ_j(θ)] = 2.
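The limit in the remark is easy to check numerically (our sketch): iterating φ drives Φ_j(θ) to the fixed point 1 at rate Φ_j(θ) − 1 ≈ 2/j, so j·log Φ_j(θ) → 2:

```python
import math

def iterate_phi(x, j):
    # apply phi(y) = 1 + log(y) to x exactly j times, i.e. compute Phi_j(x)
    for _ in range(j):
        x = 1.0 + math.log(x)
    return x

# j * log(Phi_j(theta)) approaches 2 for any fixed theta > 1
theta = 10.0
vals = [j * math.log(iterate_phi(theta, j)) for j in (10, 100, 10000)]
```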

The robustness of τ̃^α_{m,ε} w.r.t. the a priori geometric distribution of θ follows now almost immediately from (14). Indeed, suppose θ is a random variable with

 P{θ = k} = γ(1 − γ)^{k−1},  k ∈ Z^+.

Then, averaging (14) w.r.t. this distribution, we obtain

 EΔ(θ; τ̃^α_{m,ε}) ≤ (1/μ_1){log[1/(αγ)] + ∑_{j=1}^m log[Φ_j(1/γ)] + ε log[Φ_m(1/γ)] + log(1/ε)} + o(1)

as , and with (8) we arrive at

###### Theorem 2.4.

As γ, α → 0,

 EΔ(θ; τ̃^α_{m,ε}) ≤ EΔ(θ; τ^α_γ) + (1/μ_1){∑_{j=1}^m log[Φ_j(1/γ)] + ε log[Φ_m(1/γ)] + log(1/ε)} + O(1) = (1 + o(1)) EΔ(θ; τ^α_γ),

where τ^α_γ is the optimal Bayesian stopping time (see Theorem 1.1).

## Appendix A Appendix section

###### Proof of Lemma 1.1.

Since

 Y_k = exp{−k log[φ(λ)] + λL_0(X^k)}

is a martingale with E_∞Y_1 = 1, we have

 1 = E_∞Y_{τ^α_∘} = E_∞Y_{τ^α_∘}1(τ^α_∘ < ∞) + E_∞Y_{τ^α_∘}1(τ^α_∘ = ∞) ≥ E_∞Y_{τ^α_∘}1(τ^α_∘ < ∞) = E_∞ exp{−τ^α_∘ log[φ(λ)] + λA}1(τ^α_∘ < ∞).

In what follows, we denote by e_1, e_2, … i.i.d. standard exponential random variables.

###### Lemma A.1.

For any m, ε, and x ≥ 0,

 P{max_{k∈Z^+} [e_k − b_{m,ε}(k)] ≥ x} ≤ 1 − exp{−e^{−x}[ε^{−1} + e^{−x}]},

where is defined by (12).

###### Proof.

It is easy to check with simple algebra that for any u ∈ (0, 1),

 log(1 − u) ≥ −u − u²/(2(1 − u)).

Therefore with this inequality we obtain

 P{max_{k∈Z^+} [e_k − b_{m,ε}(k)] ≥ x} = 1 − ∏_{k=1}^∞ {1 − P{e_k ≥ x + b_{m,ε}(k)}} = 1 − exp{∑_{k=1}^∞ log[1 − e^{−x−b_{m,ε}(k)}]} ≤ 1 − exp{−e^{−x} ∑_{k=1}^∞ e^{−b_{m,ε}(k)} − [e^{−2x}/(2(1 − e^{−x}))] ∑_{k=1}^∞ e^{−2b_{m,ε}(k)}}.  (15)

It follows immediately from the definition of b_{m,ε}, see (12), that the sum telescopes:

 ∑_{k=1}^∞ e^{−b_{m,ε}(k)} = (1/ε)Φ_m^{−ε}(1) = 1/ε,

since Φ_m(1) = 1.

It is also easy to check numerically that for any m and ε,

 ∑_{k=1}^∞ e^{−2b_{m,ε}(k)} < 0.2075.

Therefore, substituting the above equations in (15), we complete the proof. ∎

###### Lemma A.2.

For any x,

 P_∞{max_{k∈Z^+} [M(X^k) − b_{m,ε}(k)] ≥ x} ≤ P{max_{k∈Z^+} [e_k − b_{m,ε}(k)] ≥ x},

where the random process M(X^k) is defined by (10).

###### Proof.

Define random integers κ_0 < κ_1 < κ_2 < … by

 κ_k = min{s > κ_{k−1} : M(X^s) ≤ 0},  κ_0 = 0.

From (10) it is clear that these random variables are renovation points for the random process M(X^s), and therefore the random variables

 μ_k = max_{κ_k < s ≤ κ_{k+1}} M(X^s)

are independent. Since b_{m,ε}(k) is non-decreasing in k and obviously κ_k ≥ k, we get

 max_{k∈Z^+} [M(X^k) − b_{m,ε}(k)] ≤ max_{k∈Z^+} [μ_k − b_{m,ε}(k)].

Therefore, to finish the proof, it suffices to notice that by (10) and Lemma 1.1

 P_∞{μ_k ≥ x} ≤ P_∞{max_{n∈Z^+} ∑_{s=1}^n log[p_1(X_s)/p_0(X_s)] ≥ x} ≤ exp(−x). ∎

Theorem 2.1 follows now immediately from Lemmas A.1, A.2.

###### Proof of Theorem 2.2.

It follows from (10) that for all k ≥ θ

 M(X^k) ≥ ∑_{s=θ}^k log[p_1(X_s)/p_0(X_s)],

and therefore

 Δ(θ; τ̃^α_{m,ε}) ≤ E_θτ_+,

where

 τ_+ = min{k ≥ 1 : ∑_{s=θ}^{θ+k} log[p_1(X_s)/p_0(X_s)] ≥ b_{m,ε}(θ + k) + t^α_{m,ε}}.

Computing E_θτ_+ is based on the famous Wald identity [14] (see also [2]). For a given k_0 ∈ Z^+, define the function

 B(k) = b_{m,ε}(θ + k) + t^α_{m,ε},  k ∈ Z^+.

It is clear that B(k) is a concave function, and therefore for any k_0

 B(k) ≤ B(k_0) + B′(k_0)(k − k_0).

Hence,

 τ_+ ≤ τ_{++} = min{k ≥ 1 : ∑_{s=θ}^{θ+k} log[p_1(X_s)/p_0(X_s)] ≥ B(k_0) + B′(k_0)(k − k_0)}.

Next, we obtain by Wald’s identity

 μ_1E_θτ_{++} ≤ B(k_0) + B′(k_0)(E_θτ_{++} − k_0)

and thus

 E_θτ_{++} ≤ [B(k_0) − B′(k_0)k_0] / [μ_1 − B′(k_0)].  (16)

To finish the proof, we choose k_0 = d^α_{m,ε}(θ) (see (13)) and notice that then B(k_0) = μ_1k_0. Hence, by (16),

 E_θτ_{++} ≤ k_0 = d^α_{m,ε}(θ). ∎

###### Proof of Theorem 2.3.

It follows immediately from Theorem 2.1 that, as α → 0,

 t^α_{m,ε} ≤ log[1/(αε)] + o(1).  (17)

Next, by concavity of b_{m,ε} we obtain for any x_0

 b_{m,ε}(θ + x) ≤ b_{m,ε}(θ + x_0) + b′_{m,ε}(θ + x_0)(x − x_0).

Therefore, choosing

 x_0 = [b_{m,ε}(θ) + t^α_{m,ε}] / μ_1,

we get by (13)

 d^α_{m,ε}(θ) ≤ [b_{m,ε}(θ + x_0) + t^α_{m,ε}] / [μ_1 − b′_{m,ε}(θ + x_0)].  (18)

So, our next step is to upper bound b_{m,ε}(k). First, notice that

 −(1/ε) dΦ_m^{−ε}(x)/dx = Φ_m^{−1−ε}(x)Φ′_m(x) = [Φ_m^{−ε}(x)/x] ∏_{j=1}^m [1/Φ_j(x)],

and thus

 −log[−(1/ε) dΦ_m^{−ε}(x)/dx] = log(x) + ∑_{j=1}^m log[Φ_j(x)] + ε log[Φ_m(x)].

Therefore it follows immediately from this equation and (12) that, as k → ∞,

 b_{m,ε}(k) = log(k) + ∑_{j=1}^m log[Φ_j(k)] + ε log[Φ_m(k)] + o(1).  (19)

It is also easy to check that

 b′_{m,ε}(k) = O(1/k).  (20)

Finally, substituting (17), (19), and (20) in (18), we complete the proof. ∎

## References

• [1] Basseville, M. and Nikiforov, I. V. (1996). Detection of Abrupt Changes: Theory and Application. Prentice-Hall.
• [2] Blackwell, D. (1946). On an equation of Wald, Annals of Math. Stat. 17 84–87.
• [3] Feinberg, E.A. and Shiryaev, A. N. (2006). Quickest detection of drift change for Brownian motion in generalized and Bayesian settings. Statist. Decisions. 24 445–470.
• [4] Girshick M. A. and Rubin, H. (1952). A Bayes approach to a quality control model. Annals Math. Statistics. 23 114–125.
• [5] Lorden, G. (1971). Procedures for reacting to a change in distribution. Ann. Math. Statist. 42 No. 6 1897–1908.
• [6] Moustakides, G. V. (1986). Optimal stopping times for detecting changes in distributions. Ann. of Statist. Vol. 14, No. 4, 1370–1387.
• [7] Page, E. S. (1954). Continuous inspection schemes, Biometrika, 41, 100–115.
• [8] Pollak, M. and Tartakovsky, A. G. (2009). Optimality properties of the Shiryaev–Roberts procedure. Statist. Sinica 19 1729–1739.
• [9] Polunchenko, A. S. and Tartakovsky, A. G. (2010). On optimality of the Shiryaev–Roberts procedure for detecting a change in distribution. Ann. Statist. 38 No. 6 3445–3457.
• [10] Roberts, S.W. (1966). A comparison of some control chart procedures. Technometrics 8, 411–430.
• [11] Shiryaev, A. N. (1961). The problem of the most rapid detection of a disturbance in a stationary process. Dokl. Math. 2 795–799.
• [12] Shiryaev, A. N. (1978). Optimal Stopping Rules, Springer-Verlag, Berlin, Heidelberg.
• [13] Shiryaev, A. N. (1963). On optimum methods in quickest detection problems. Theory Probab. Appl. 8 22–46.
• [14] Wald, A. (1944). On cumulative sums of random variables. The Annals of Math. Stat. 15 No. 3 283–296.
• [15] Wald, A. (1945). Sequential Tests of Statistical Hypotheses. Annals of Math. Stat. 16 No. 2 117–186.