# Non-Parametric Quickest Mean Change Detection

The problem of quickest detection of a change in the mean of a sequence of independent observations is studied. The pre-change distribution is assumed to be stationary, while the post-change distributions are allowed to be non-stationary. The case where the pre-change distribution is known is studied first, and then the extension where only the mean and variance of the pre-change distribution are known. No knowledge of the post-change distributions is assumed other than that their means are above some pre-specified threshold larger than the pre-change mean. For the case where the pre-change distribution is known, a test is derived that asymptotically minimizes the worst-case detection delay over all possible post-change distributions, as the false alarm rate goes to zero. Towards deriving this asymptotically optimal test, some new results are provided for the general problem of asymptotic minimax robust quickest change detection in non-stationary settings. Then, the limiting form of the optimal test as the gap between the pre- and post-change means goes to zero, called the Mean-Change Test (MCT), is studied. It is shown that the MCT can be designed with only knowledge of the mean and variance of the pre-change distribution. The performance of the MCT is also characterized when the mean gap is moderate, under the additional assumption that the distributions of the observations have bounded support. The analysis is validated through numerical results for detecting a change in the mean of a beta distribution. The use of the MCT in monitoring pandemics is also demonstrated.


## I Introduction

Quickest change detection (QCD) is a fundamental problem in mathematical statistics (see, e.g., [27] for an overview). Given a stochastic sequence whose distribution changes at some unknown change-point, the goal is to detect the change after it occurs as quickly as possible, subject to false alarm constraints. The QCD framework has seen a wide range of applications, including line-outage detection in power systems [2], dim-target manoeuvre detection [13], stochastic process control [23], structural health monitoring [3], and piece-wise stationary multi-armed bandits [1]. The two main formulations of the classical QCD problem are the Bayesian formulation [20, 25], where the change-point is assumed to follow a known prior distribution, and the minimax formulation [10, 18], where the worst-case detection delay is minimized over all possible change-points, subject to false alarm constraints. In both the Bayesian and minimax settings, if the pre- and post-change distributions are known, low-complexity efficient solutions to the QCD problem can be found [27].

In many practical situations, we may not know the exact distribution in the pre- or post-change regimes. While it is reasonable to assume that we can obtain a large amount of data in the pre-change regime, this may not be the case for the post-change regime. Also, in applications such as epidemic monitoring and piece-wise stationary multi-armed bandits, a change in a specific statistic (e.g., the mean) of the distribution is of interest. This is different from the classical QCD problem, where any distributional change needs to be detected. Furthermore, in many applications, the support of the distribution is bounded. For example, observations representing the fraction of some specific group in the entire population are bounded between 0 and 1. This is the case, for example, in the pandemic monitoring problem that we discuss in detail in Section IV. In many applications, including the pandemic monitoring problem, the system has usually reached some nominal steady-state distribution before the change-point. In these situations, the pre-change distribution can be assumed to be stationary.

In this paper, we study the problem of quickest detection of a change in the mean of a sequence of independent observations. The pre-change distribution is assumed to be stationary, while the post-change distributions are allowed to be non-stationary. We first study the case where the pre-change distribution is known, and then study the extension where only the mean and variance of the pre-change distribution are known. No knowledge of the post-change distributions is assumed other than that their means are above some threshold larger than the pre-change mean.

There have been a number of lines of work on the QCD problem when the pre- and/or post-change distributions are not completely known. The most prevalent is the generalized likelihood ratio (GLR) approach, introduced in [10] for the parametric case where the post-change distribution has an unknown parameter. This GLR approach is studied in detail in [21] for the problem of detecting a change in the mean of a Gaussian distribution with unknown post-change mean. A GLR test for the case where the pre- and post-change distributions come from a one-parameter exponential family, and both the pre- and post-change parameters are unknown, is analyzed in [7].

The QCD problem has also been studied in a non-parametric setting. In particular, for detecting a change in the mean of an observation sequence, one approach has been to use maximum scan statistics. The scan statistic of an observation sequence is defined as the absolute difference of the averages before and after a potential change-point. In [4], the case where the pre- and post-change distributions have finite moment generating functions in some neighborhood around zero is considered. At each time greater than a window size $w$, the scan statistic at each potential change-point is calculated using the last $w$ observations. The maximum scan statistic is then calculated over the set of potential change-points, and an alarm is raised if this maximum exceeds some threshold. In [12], the case of sub-Gaussian pre- and post-change distributions is studied. There, the scan statistic is calculated over the entire observation sequence, and the maximum is compared to a threshold determined by the current time and the desired false alarm rate. This approach is further applied to the piece-wise stationary multi-armed bandit problem in [1]. We compare our approach to mean-change detection with a test using scan statistics in Section IV.
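A rough sketch of the windowed scan-statistic test described above may help fix ideas. It is a minimal illustration under stated assumptions, not the exact procedure of [4]: the function name `windowed_scan_alarm`, the window length `window` (the $w$ above) and the threshold `threshold` are hypothetical, and scanning starts once $w$ observations are available. Each time-step costs $O(w)$ operations, which illustrates the complexity issue raised in the next paragraph.

```python
from collections import deque
from typing import Iterable, Optional


def windowed_scan_alarm(xs: Iterable[float], window: int, threshold: float) -> Optional[int]:
    """Return the first time t (1-indexed) at which the maximum scan statistic
    over the last `window` observations reaches `threshold`, or None.

    Hypothetical sketch: scan statistic at split k = |mean(before) - mean(after)|.
    """
    buf: deque = deque(maxlen=window)
    for t, x in enumerate(xs, start=1):
        buf.append(x)
        if len(buf) < window:
            continue  # wait until a full window of observations is available
        w = list(buf)
        total = sum(w)
        prefix = 0.0
        best = 0.0
        # Candidate change-points split the window into w[:k] and w[k:], both non-empty.
        for k in range(1, window):
            prefix += w[k - 1]
            before = prefix / k
            after = (total - prefix) / (window - k)
            best = max(best, abs(before - after))
        if best >= threshold:
            return t
    return None


if __name__ == "__main__":
    data = [0.0] * 20 + [1.0] * 20  # toy data: mean jumps from 0 to 1 at t = 21
    print(windowed_scan_alarm(data, window=10, threshold=0.9))
```

Recomputing the prefix sums over the whole window at every step keeps the sketch simple; it is exactly this per-step cost that the recursive CuSum-type tests below avoid.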

We note that for both the GLR and the scan statistics approaches, the complexity of computing the test statistic at each time-step grows at least linearly with the number of samples. In practice, a windowed version of the test statistic is often used to reduce computational complexity, at the cost of some loss in performance.

Still another line of work is the one based on a minimax robust approach [5], in which it is assumed that the distributions come from mutually exclusive uncertainty classes. Under certain conditions on the uncertainty classes, e.g., joint stochastic boundedness [15], low-complexity solutions to the minimax robust QCD problem can be found [26]. Under more general conditions, e.g., weak stochastic boundedness, a solution that is asymptotically close to the minimax solution can be found [13].

In this paper, we use an asymptotic version of the minimax robust QCD problem formulation [13] to develop algorithms for the non-parametric detection of a change in mean of an observation sequence. Our contributions are as follows:

1. We extend the asymptotic minimax robust QCD problem introduced in [13] to the more general non-stationary setting.

2. We study the problem of quickest detection of a change in the mean of an observation sequence under the assumption that no knowledge of the post-change distribution is available other than that its mean is above some threshold larger than the pre-change mean.

3. For the case where the pre-change distribution is known, we derive a test that asymptotically minimizes the worst-case detection delay over all possible post-change distributions, as the false alarm rate goes to zero.

4. We study the limiting form of the optimal test as the gap between the pre- and post-change means goes to zero, which we call the Mean-Change Test (MCT). We show that the MCT can be designed with only knowledge of the mean and variance of the pre-change distribution.

5. We also characterize the performance of the MCT when the mean gap is moderate, under the assumption that the distributions of the observations have bounded support.

6. We validate our analysis through numerical results for detecting a change in the mean of a beta distribution. We also demonstrate the use of the MCT for pandemic monitoring.

The rest of the paper is structured as follows. In Section II, we describe the quickest change detection problem under distributional uncertainty and provide some new results regarding asymptotically robust tests in the non-stationary setting. In Section III, we formulate the mean change detection problem, and propose and analyze the mean-change test (MCT), which solves the problem asymptotically. In Section IV, we validate our analysis through numerical results for detecting a change in the mean of a beta distribution, and also demonstrate the use of the MCT in monitoring pandemics. Finally, in Section V, we provide some concluding remarks.

## II Quickest Change Detection Under Distributional Uncertainty

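Let $\{X_t\}_{t\ge1}$ be a sequence of independent random variables, and let $\nu\ge1$ be a change-point. Let $\mathbf{P}_0=\{P_{0,t}\}_{t\ge1}$ and $\mathbf{P}_1=\{P_{1,t}\}_{t\ge1}$ be two sequences of probability measures, where $X_t\sim P_{0,t}$ for all $t<\nu$ and $X_t\sim P_{1,t}$ for all $t\ge\nu$. Further, assume that $P_{j,t}$ has probability density $p_{j,t}$ with respect to the Lebesgue measure on $\mathbb{R}$, for $j=0,1$ and $t\ge1$. Let $\mathbb{P}_\nu^{\mathbf{P}_0,\mathbf{P}_1}$ denote the probability measure on the entire sequence of observations when the pre-change distributions are $\mathbf{P}_0$ and the post-change distributions are $\mathbf{P}_1$, and let $\mathbb{E}_\nu^{\mathbf{P}_0,\mathbf{P}_1}$ denote the corresponding expectation. When $\mathbf{P}_0$ and $\mathbf{P}_1$ are stationary, i.e., $P_{0,t}=P_0$ and $P_{1,t}=P_1$ for all $t\ge1$ (with densities $p_0$ and $p_1$), we use the notations $\mathbb{P}_\nu^{P_0,P_1}$ and $\mathbb{E}_\nu^{P_0,P_1}$ in place of $\mathbb{P}_\nu^{\mathbf{P}_0,\mathbf{P}_1}$ and $\mathbb{E}_\nu^{\mathbf{P}_0,\mathbf{P}_1}$, respectively.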

The change-point $\nu$ is assumed to be unknown but deterministic. The problem is to detect the change quickly while not causing too many false alarms. Let $\tau$ be a stopping time [15] defined on the observation sequence associated with the detection rule, i.e., $\tau$ is the time at which we stop taking observations and declare that the change has occurred.

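For the case where both the pre- and post-change distributions are stationary and known, Lorden [10] proposed solving the following optimization problem to find the best stopping time $\tau$:

$$\inf_{\tau\in\mathcal{C}_\alpha}\mathrm{WADD}^{P_0,P_1}(\tau) \tag{1}$$

where

$$\mathrm{WADD}^{P_0,P_1}(\tau):=\sup_{\nu\ge1}\operatorname*{ess\,sup}\ \mathbb{E}_\nu^{P_0,P_1}\left[(\tau-\nu+1)^+\,\middle|\,X_1,\dots,X_{\nu-1}\right] \tag{2}$$

is a worst-case delay metric, and

$$\mathcal{C}_\alpha:=\{\tau:\mathrm{FAR}_{P_0}(\tau)\le\alpha\} \tag{3}$$

with

$$\mathrm{FAR}_{P_0}(\tau):=\frac{1}{\mathbb{E}_\infty^{P_0,P_1}[\tau]}. \tag{4}$$

Here $\mathbb{E}_\infty^{P_0,P_1}$ is the expectation operator when the change never happens, and $(\cdot)^+:=\max\{0,\cdot\}$.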

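Lorden also showed that Page’s Cumulative Sum (CuSum) algorithm [17], whose test statistic is given by

$$\Lambda_{P_0,P_1}(t)=\max_{1\le k\le t+1}\sum_{i=k}^{t}\ln L_{P_0,P_1}(X_i)=\big(\Lambda_{P_0,P_1}(t-1)+\ln L_{P_0,P_1}(X_t)\big)^+,\qquad\Lambda_{P_0,P_1}(0)=0, \tag{5}$$

solves the problem in (1) asymptotically. Here $L_{P_0,P_1}$ is the likelihood ratio:

$$L_{P_0,P_1}(x)=\frac{p_1(x)}{p_0(x)}. \tag{6}$$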

The CuSum stopping rule is given by

$$\tau(\Lambda_{P_0,P_1},b_\alpha):=\inf\{t:\Lambda_{P_0,P_1}(t)\ge b_\alpha\} \tag{7}$$

where $b_\alpha:=|\ln\alpha|$. It was shown by Moustakides [16] that the CuSum algorithm is exactly optimal for the problem in (1).
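For concreteness, the recursion in (5) and the stopping rule (7) can be sketched in Python. This is a minimal sketch assuming the per-sample log-likelihood ratio $\ln L_{P_0,P_1}$ is supplied as a function; the names `cusum_stop` and `threshold_for_far` are hypothetical.

```python
import math
from typing import Callable, Iterable, Optional


def cusum_stop(xs: Iterable[float], log_lr: Callable[[float], float], b: float) -> Optional[int]:
    """Stopping time (7): first t with Lambda(t) >= b, or None if no alarm.

    Uses the recursive form of (5): Lambda(t) = max(0, Lambda(t-1) + ln L(X_t)),
    started from Lambda(0) = 0, so each update costs O(1).
    """
    stat = 0.0
    for t, x in enumerate(xs, start=1):
        stat = max(0.0, stat + log_lr(x))
        if stat >= b:
            return t
    return None


def threshold_for_far(alpha: float) -> float:
    """Threshold b_alpha = |ln alpha| used with the stopping rule (7)."""
    return abs(math.log(alpha))


if __name__ == "__main__":
    # Assumed toy pair P0 = N(0, 1), P1 = N(1, 1), for which ln L(x) = x - 1/2.
    data = [0.0] * 5 + [2.0] * 10
    print(cusum_stop(data, lambda x: x - 0.5, threshold_for_far(0.05)))
```

The recursive form is used instead of the maximum over $k$ in (5) because both give the same value, while the recursion needs only the previous statistic.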

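When the pre-change and post-change distributions are unknown but belong to known uncertainty sets $\mathcal{P}_0$ and $\mathcal{P}_1$, and are possibly non-stationary, a minimax robust formulation can be used in place of (1):

$$\inf_{\tau\in\mathcal{C}_\alpha^{\mathcal{P}_0}}\ \sup_{\mathbf{P}_0,\mathbf{P}_1:\,P_{0,t}\in\mathcal{P}_0,\,P_{1,t}\in\mathcal{P}_1}\mathrm{WADD}^{\mathbf{P}_0,\mathbf{P}_1}(\tau) \tag{8}$$

where

$$\mathrm{WADD}^{\mathbf{P}_0,\mathbf{P}_1}(\tau):=\sup_{\nu\ge1}\operatorname*{ess\,sup}\ \mathbb{E}_\nu^{\mathbf{P}_0,\mathbf{P}_1}\left[(\tau-\nu+1)^+\,\middle|\,X_1,\dots,X_{\nu-1}\right] \tag{9}$$

and the feasible set is defined as

$$\mathcal{C}_\alpha^{\mathcal{P}_0}:=\Big\{\tau:\sup_{\mathbf{P}_0:\,P_{0,t}\in\mathcal{P}_0}\mathrm{FAR}_{\mathbf{P}_0}(\tau)\le\alpha\Big\} \tag{10}$$

with

$$\mathrm{FAR}_{\mathbf{P}_0}(\tau):=\frac{1}{\mathbb{E}_\infty^{\mathbf{P}_0,\mathbf{P}_1}[\tau]}. \tag{11}$$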

We now address the solution to the problem in (8). To this end, we use the following definitions.

###### Definition II.1.

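(see, e.g., [15]) A pair of uncertainty sets $(\mathcal{P}_0,\mathcal{P}_1)$ is said to be jointly stochastically (JS) bounded by $(\bar P_0,\bar P_1)\in\mathcal{P}_0\times\mathcal{P}_1$ if, for any $(P_0,P_1)\in\mathcal{P}_0\times\mathcal{P}_1$ and any $h\ge0$,

$$P_0\{L_{\bar P_0,\bar P_1}(X)>h\}\le\bar P_0\{L_{\bar P_0,\bar P_1}(X)>h\} \tag{12}$$

$$P_1\{L_{\bar P_0,\bar P_1}(X)>h\}\ge\bar P_1\{L_{\bar P_0,\bar P_1}(X)>h\} \tag{13}$$

where $L_{\bar P_0,\bar P_1}$ is the likelihood ratio between $\bar P_1$ and $\bar P_0$ (see (6)). The distributions $\bar P_0$ and $\bar P_1$ are called least favorable distributions (LFDs) within the classes $\mathcal{P}_0$ and $\mathcal{P}_1$, respectively.

If the pair of pre- and post-change uncertainty sets is JS bounded, the CuSum test statistic $\Lambda_{\bar P_0,\bar P_1}$ (see (5)), with stopping rule $\tau(\Lambda_{\bar P_0,\bar P_1},b_\alpha)$ (see (7)), solves (8) exactly, both when $\mathbf{P}_0$ and $\mathbf{P}_1$ are stationary [26] and when they are potentially non-stationary [14].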

###### Definition II.2.

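(see [13]) A pair of uncertainty sets $(\mathcal{P}_0,\mathcal{P}_1)$ is said to be weakly stochastically (WS) bounded by $(\tilde P_0,\tilde P_1)\in\mathcal{P}_0\times\mathcal{P}_1$ if

$$D(\tilde P_1\|\tilde P_0)\le D(P_1\|\tilde P_0)-D(P_1\|\tilde P_1) \tag{14}$$

for all $P_1\in\mathcal{P}_1$, and

$$\mathbb{E}_{P_0}[L_{\tilde P_0,\tilde P_1}(X)]\le\mathbb{E}_{\tilde P_0}[L_{\tilde P_0,\tilde P_1}(X)]=1 \tag{15}$$

for all $P_0\in\mathcal{P}_0$. Here, $\mathbb{E}_P$ denotes the expectation operator with respect to distribution $P$, and $D(P\|Q)$ denotes the KL-divergence between $P$ and $Q$ (with densities $p$ and $q$):

$$D(P\|Q)=\mathbb{E}_P[\ln L_{Q,P}(X)]=\mathbb{E}_P\left[\ln\frac{p(X)}{q(X)}\right]. \tag{16}$$

It is shown in [13] that if the pair of uncertainty sets is JS bounded by $(\bar P_0,\bar P_1)$, it is also WS bounded by $(\bar P_0,\bar P_1)$. It is also shown in [13] that if the pair of pre- and post-change uncertainty sets is WS bounded by $(\tilde P_0,\tilde P_1)$, the CuSum test statistic $\Lambda_{\tilde P_0,\tilde P_1}$ with stopping rule $\tau(\Lambda_{\tilde P_0,\tilde P_1},b_\alpha)$ solves (8) asymptotically as $\alpha\to0$ when $\mathbf{P}_0$ and $\mathbf{P}_1$ are both stationary.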

### II-A Asymptotically Optimal Solution in the Non-stationary Setting

Let $(\tilde P_0,\tilde P_1)\in\mathcal{P}_0\times\mathcal{P}_1$ be such that $(\mathcal{P}_0,\mathcal{P}_1)$ is WS bounded by $(\tilde P_0,\tilde P_1)$. In the following, we extend the result in [13] to the case where $\mathbf{P}_0$ and $\mathbf{P}_1$ are potentially non-stationary and derive an asymptotically optimal solution as $\alpha\to0$. Specifically, through Lemma II.1 we upper bound the asymptotic delay, through Lemma II.2 we control the false alarm rate, and in Theorem II.3 we combine the lemmas to provide an asymptotically optimal solution to the problem in (8) when $\mathbf{P}_0$ and $\mathbf{P}_1$ are potentially non-stationary.

###### Lemma II.1.

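Consider $(\mathcal{P}_0,\mathcal{P}_1)$ WS bounded by $(\tilde P_0,\tilde P_1)$. Let $\mathbf{P}_0$ and $\mathbf{P}_1$ be such that $P_{0,t}\in\mathcal{P}_0$ and $P_{1,t}\in\mathcal{P}_1$ for all $t\ge1$. Suppose that for all such $\mathbf{P}_1$,

$$\sup_{1\le t\le n}\mathrm{Var}_{P_{1,t}}\big(\ln L_{\tilde P_0,\tilde P_1}(X_t)\big)=o(n)\quad\text{as } n\to\infty$$

where $\mathrm{Var}_{P_{1,t}}(\cdot)$ denotes the variance when $X_t\sim P_{1,t}$. Then, $\tau(\Lambda_{\tilde P_0,\tilde P_1},b)$ satisfies

$$\mathrm{WADD}^{\mathbf{P}_0,\mathbf{P}_1}\big(\tau(\Lambda_{\tilde P_0,\tilde P_1},b)\big)\le\frac{b}{D(\tilde P_1\|\tilde P_0)}(1+o(1)) \tag{17}$$

as $b\to\infty$, where $o(1)\to0$ as $b\to\infty$.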

###### Lemma II.2.

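Consider the same assumptions as in Lemma II.1. Then, for any $\mathbf{P}_0$ with $P_{0,t}\in\mathcal{P}_0$ for all $t\ge1$,

$$\mathbb{E}_\infty^{\mathbf{P}_0,\mathbf{P}_1}\big[\tau(\Lambda_{\tilde P_0,\tilde P_1},b)\big]\ge e^b \tag{18}$$

for any threshold $b>0$.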

###### Theorem II.3.

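Consider the same assumptions as in Lemma II.1. Then, the CuSum test $\tau(\Lambda_{\tilde P_0,\tilde P_1},b_\alpha)$ solves the problem in (8) asymptotically as $\alpha\to0$, and

$$\inf_{\tau\in\mathcal{C}_\alpha^{\mathcal{P}_0}}\ \sup_{\mathbf{P}_0,\mathbf{P}_1}\mathrm{WADD}^{\mathbf{P}_0,\mathbf{P}_1}(\tau)=\sup_{\mathbf{P}_0,\mathbf{P}_1}\mathrm{WADD}^{\mathbf{P}_0,\mathbf{P}_1}\big(\tau(\Lambda_{\tilde P_0,\tilde P_1},b_\alpha)\big)(1+o(1))=\frac{|\ln\alpha|}{D(\tilde P_1\|\tilde P_0)}(1+o(1)) \tag{19}$$

where the suprema are over all $\mathbf{P}_0,\mathbf{P}_1$ with $P_{0,t}\in\mathcal{P}_0$ and $P_{1,t}\in\mathcal{P}_1$ for all $t$, and $o(1)\to0$ as $\alpha\to0$.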

The proofs of Lemma II.1, Lemma II.2 and Theorem II.3 are given in the appendix.

## III Mean-Change Detection Problem

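Until now, we have considered the general QCD problem formulated in (8). In this paper, we are mainly interested in a special case of the problem, described as follows. The pre-change distribution is stationary, i.e., $P_{0,t}=P_0$ for all $t\ge1$, with pre-change mean $\mu_0$ and variance $\sigma_0^2$. Thus, $\mathcal{P}_0=\{P_0\}$ is a singleton. The post-change distribution could be non-stationary, and at each time $t\ge\nu$ it belongs to the following uncertainty set:

$$\mathcal{P}_1=\mathcal{M}_1:=\{P:\mathbb{E}_P[X]\ge\eta>\mu_0\}. \tag{20}$$

In this expression, $X$ denotes a generic observation in the sequence, and $\eta$ is a pre-designed threshold. Define

$$\Delta:=\frac{\eta-\mu_0}{2} \tag{21}$$

which is half of the worst-case mean-change gap.

The minimax robust mean-change problem, which is the special case of (8) with $\mathcal{P}_0=\{P_0\}$ and $\mathcal{P}_1=\mathcal{M}_1$, is given by:

$$\inf_{\tau\in\mathcal{C}_\alpha}\ \sup_{\mathbf{P}_1:\,P_{1,t}\in\mathcal{M}_1}\mathrm{WADD}^{P_0,\mathbf{P}_1}(\tau) \tag{22}$$

where $\mathcal{C}_\alpha$ is defined in (3). Our goal is to find a stopping time that solves (22) asymptotically as the false alarm rate $\alpha\to0$.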

### III-A Known Pre-change Distribution

Define

$$\kappa_0(\lambda)=\ln\mathbb{E}_{P_0}\big[e^{\lambda X}\big] \tag{23}$$

to be the cumulant-generating function (cgf) of the observations under $P_0$. In the following theorem, we provide a solution to the problem stated in (22).

###### Theorem III.1.

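Consider $\mathcal{P}_0=\{P_0\}$ and $\mathcal{P}_1=\mathcal{M}_1$ as given in (20). Define

$$\tilde p_1(x)=p_0(x)\,e^{\lambda^*x-\kappa_0(\lambda^*)} \tag{24}$$

where $\kappa_0$ is the cgf under $P_0$ and $\lambda^*>0$ satisfies

$$\kappa_0'(\lambda^*)=\frac{\mathbb{E}_{P_0}[Xe^{\lambda^*X}]}{\mathbb{E}_{P_0}[e^{\lambda^*X}]}=\eta. \tag{25}$$

Then, with the CuSum statistic

$$\Lambda_{P_0,\tilde P_1}(t)=\max_{1\le k\le t+1}\sum_{i=k}^{t}\big(\lambda^*X_i-\kappa_0(\lambda^*)\big), \tag{26}$$

the stopping rule $\tau(\Lambda_{P_0,\tilde P_1},b_\alpha)$ (see (7)) with threshold $b_\alpha=|\ln\alpha|$ solves the minimax robust problem in (22) asymptotically as $\alpha\to0$, and

$$\sup_{\mathbf{P}_1:\,P_{1,t}\in\mathcal{M}_1}\mathrm{WADD}^{P_0,\mathbf{P}_1}\big(\tau(\Lambda_{P_0,\tilde P_1},b_\alpha)\big)=\frac{|\ln\alpha|}{\lambda^*\eta-\kappa_0(\lambda^*)}(1+o(1)). \tag{27}$$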

###### Proof.

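The proof follows from an application of Theorem II.3 if we can establish that $(\mathcal{P}_0,\mathcal{M}_1)$ is WS bounded by $(P_0,\tilde P_1)$. By [13, Prop. 1 (iii)], since $\mathcal{M}_1$ is convex and $\mathcal{P}_0$ is a singleton, if $\tilde P_1$ minimizes the KL-divergence $D(P_1\|P_0)$ over $P_1\in\mathcal{M}_1$, then $(\mathcal{P}_0,\mathcal{M}_1)$ is WS bounded by $(P_0,\tilde P_1)$. Therefore, it remains to show that $\tilde P_1$ specified in (24) minimizes $D(P_1\|P_0)$, subject to $\mathbb{E}_{P_1}[X]\ge\eta$. To this end, we follow the procedure outlined in [8, Sec. 6.4.1]. Consider the Lagrangian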

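$$\begin{aligned}J(p_1,\lambda,\mu)&=\mathbb{E}_{P_1}[\ln L_{P_0,P_1}(X)]+\lambda\big(\eta-\mathbb{E}_{P_1}[X]\big)+\mu\Big(1-\int p_1(x)\,dx\Big)\\&=\int\Big(\ln\frac{p_1(x)}{p_0(x)}-\lambda x-\mu\Big)p_1(x)\,dx+\lambda\eta+\mu\end{aligned}\tag{28}$$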

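where the Lagrange multiplier $\lambda\ge0$ corresponds to the constraint that the post-change mean is at least $\eta$, and $\mu$ corresponds to the constraint that $P_1$ is a probability measure. For an arbitrary direction $z$, we take the Gateaux derivative with respect to $p_1$:

$$\nabla_{p_1,z}J(p_1,\lambda,\mu):=\lim_{h\to0}\frac{J(p_1+hz,\lambda,\mu)-J(p_1,\lambda,\mu)}{h}=\int\Big(\ln\frac{p_1(x)}{p_0(x)}-\lambda x-\mu'\Big)z(x)\,dx \tag{29}$$

where $\mu':=\mu-1$ (the extra $-1$ comes from differentiating $p_1\ln p_1$). Since $z$ is arbitrary, setting (29) to zero we arrive at

$$\ln\frac{p_1(x)}{p_0(x)}-\lambda x-\mu'=0. \tag{30}$$

By the Generalized Kuhn–Tucker Theorem [11], since the Gateaux derivative in (29) is bounded and linear in $z$, $\nabla_{p_1,z}J(p_1,\lambda,\mu)=0$ is a necessary condition for optimality. Furthermore, since $J$ is convex in $p_1$, this is also a global optimum. To satisfy the constraints, we have

$$\mu'=-\ln\int p_0(x)e^{\lambda x}\,dx=-\kappa_0(\lambda) \tag{31}$$

and $\lambda=\lambda^*$ satisfies

$$\eta=\mathbb{E}_{P_1}[X]=\frac{\mathbb{E}_{P_0}[Xe^{\lambda^*X}]}{\mathbb{E}_{P_0}[e^{\lambda^*X}]}=\kappa_0'(\lambda^*). \tag{32}$$

Thus, $\tilde p_1$ in (24) minimizes $D(P_1\|P_0)$, subject to $\mathbb{E}_{P_1}[X]\ge\eta$.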

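Furthermore, the minimum KL-divergence is

$$D(\tilde P_1\|P_0)=\int\big(\lambda^*x-\kappa_0(\lambda^*)\big)\tilde p_1(x)\,dx=\lambda^*\eta-\kappa_0(\lambda^*). \tag{33}$$

Hence, by Theorem II.3, the worst-case delay satisfies

$$\sup_{\mathbf{P}_1:\,P_{1,t}\in\mathcal{M}_1}\mathrm{WADD}^{P_0,\mathbf{P}_1}\big(\tau(\Lambda_{P_0,\tilde P_1},b_\alpha)\big)=\frac{|\ln\alpha|}{\lambda^*\eta-\kappa_0(\lambda^*)}(1+o(1)) \tag{34}$$

as $\alpha\to0$. ∎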

Note that $\tilde P_1$ is an exponentially tilted version (or the Esscher transform) of $P_0$.
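As an illustration of Theorem III.1, the sketch below solves (25) for $\lambda^*$ by bisection and evaluates the minimum KL-divergence (33). It assumes, purely for illustration, that $P_0$ is the uniform distribution on $[0,1]$ (i.e., Beta(1,1)), for which $\kappa_0(\lambda)=\ln\big((e^\lambda-1)/\lambda\big)$ is available in closed form; all function names are hypothetical. Bisection is valid because $\kappa_0'$ is increasing: its derivative is the variance of the tilted distribution.

```python
import math


def kappa0(lam: float) -> float:
    """cgf (23) of Uniform[0, 1]: ln((e^lam - 1) / lam), with kappa0(0) = 0."""
    if lam == 0.0:
        return 0.0
    return math.log(math.expm1(lam) / lam)


def kappa0_prime(lam: float) -> float:
    """kappa0'(lam) = 1 / (1 - e^{-lam}) - 1 / lam, the mean of the tilted law."""
    if lam == 0.0:
        return 0.5
    return -1.0 / math.expm1(-lam) - 1.0 / lam


def solve_lambda_star(eta: float, tol: float = 1e-12) -> float:
    """Solve (25), kappa0'(lam) = eta, by bisection (kappa0' is increasing)."""
    if not 0.5 < eta < 1.0:
        raise ValueError("need mu0 = 0.5 < eta < 1 for Uniform[0, 1]")
    lo, hi = 0.0, 1.0
    while kappa0_prime(hi) < eta:  # grow the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kappa0_prime(mid) < eta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def min_kl(eta: float) -> float:
    """Minimum KL-divergence (33): lam* eta - kappa0(lam*)."""
    lam = solve_lambda_star(eta)
    return lam * eta - kappa0(lam)


def tilted_density(x: float, lam: float) -> float:
    """Exponentially tilted density (24) with p0 = 1 on [0, 1]."""
    return math.exp(lam * x - kappa0(lam)) if 0.0 <= x <= 1.0 else 0.0


if __name__ == "__main__":
    eta = 0.6
    lam = solve_lambda_star(eta)
    print(f"lambda* = {lam:.6f}, D = {min_kl(eta):.6f}")
```

A quick sanity check is that `tilted_density(., lam)` integrates to one and has mean $\eta$, which is exactly the content of (31) and (32).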

### III-B Approximation for Small Δ

Even though we have an expression for the test statistic when $P_0$ is known, as given in (26), the exact solution $\lambda^*$ of (25) is not available in closed form. Fortunately, if the mean-change gap $\Delta$ is small, we obtain a low-complexity test in terms of only the pre-change mean and variance that closely approximates the performance of the asymptotically minimax optimal test in the previous section.

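As $\Delta\to0$, $\eta\to\mu_0$, and hence $\lambda^*\to0$. From a second-order Taylor expansion of $\kappa_0$ around 0, using $\kappa_0(0)=0$, $\kappa_0'(0)=\mu_0$ and $\kappa_0''(0)=\sigma_0^2$, we obtain

$$\begin{aligned}\kappa_0(\lambda^*)&=\kappa_0(0)+\kappa_0'(0)\lambda^*+\frac{\kappa_0''(0)}{2}(\lambda^*)^2+o\big((\lambda^*)^2\big)\\&=\mu_0\lambda^*+\frac{\sigma_0^2}{2}(\lambda^*)^2+o\big((\lambda^*)^2\big).\end{aligned}\tag{35}$$

In this same regime, by continuity of $\kappa_0''$,

$$\lambda^*=\frac{\kappa_0'(\lambda^*)-\kappa_0'(0)}{\kappa_0''(0)}+o(\Delta)=\frac{\eta-\mu_0}{\sigma_0^2}+o(\Delta)=\frac{2\Delta}{\sigma_0^2}+o(\Delta) \tag{36}$$

where we have used $\kappa_0'(\lambda^*)=\eta$ from (25). Hence, the approximate log-likelihood ratio at time $t$ is

$$\begin{aligned}\lambda^*X_t-\kappa_0(\lambda^*)&=\lambda^*X_t-\Big(\mu_0\lambda^*+\frac{\sigma_0^2}{2}(\lambda^*)^2\Big)+o\big((\lambda^*)^2\big)\\&=\frac{2\Delta}{\sigma_0^2}(X_t-\mu_0)-\frac{\sigma_0^2}{2}\Big(\frac{2\Delta}{\sigma_0^2}\Big)^2+o(\Delta^2)\\&=\frac{2\Delta}{\sigma_0^2}\Big(X_t-\frac{\mu_0+\eta}{2}\Big)+o(\Delta^2)\end{aligned}\tag{37}$$

and the corresponding minimum KL-divergence is approximated as

$$D(\tilde P_1\|P_0)=\frac{2\Delta^2}{\sigma_0^2}+o(\Delta^2). \tag{38}$$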
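The quality of the approximations (36) and (38) can be checked numerically. The sketch below assumes a hypothetical pre-change law $P_0=\mathrm{Beta}(2,5)$ on $[0,1]$, computes $\kappa_0$ and $\kappa_0'$ by midpoint-rule quadrature, solves (25) by bisection, and compares $\lambda^*$ and $D(\tilde P_1\|P_0)=\lambda^*\eta-\kappa_0(\lambda^*)$ with $2\Delta/\sigma_0^2$ and $2\Delta^2/\sigma_0^2$. The grid size, tolerances and function names are arbitrary illustrative choices.

```python
import math
from typing import Tuple

A, B = 2.0, 5.0  # hypothetical pre-change law: P0 = Beta(A, B) on [0, 1]
N = 4000         # midpoint-rule grid size (arbitrary accuracy choice)
GRID = [(i + 0.5) / N for i in range(N)]
LOG_BETA_FN = math.lgamma(A) + math.lgamma(B) - math.lgamma(A + B)
P0 = [math.exp((A - 1) * math.log(x) + (B - 1) * math.log1p(-x) - LOG_BETA_FN) for x in GRID]
MU0 = A / (A + B)                            # pre-change mean mu0
VAR0 = A * B / ((A + B) ** 2 * (A + B + 1))  # pre-change variance sigma0^2


def cgf_and_slope(lam: float) -> Tuple[float, float]:
    """Quadrature approximations of kappa0(lam) and kappa0'(lam), cf. (23) and (25)."""
    w = [p * math.exp(lam * x) for p, x in zip(P0, GRID)]
    m0 = sum(w) / N                                 # E_{P0}[e^{lam X}]
    m1 = sum(wi * x for wi, x in zip(w, GRID)) / N  # E_{P0}[X e^{lam X}]
    return math.log(m0), m1 / m0


def lambda_star(eta: float, tol: float = 1e-9) -> float:
    """Bisection for kappa0'(lam) = eta; requires MU0 < eta < 1."""
    if not MU0 < eta < 1.0:
        raise ValueError("need mu0 < eta < 1")
    lo, hi = 0.0, 1.0
    while cgf_and_slope(hi)[1] < eta:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cgf_and_slope(mid)[1] < eta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def compare(delta: float) -> Tuple[float, float, float, float]:
    """Return (lambda*, 2*delta/sigma0^2, D, 2*delta^2/sigma0^2) for eta = mu0 + 2*delta."""
    eta = MU0 + 2.0 * delta
    lam = lambda_star(eta)
    k0, _ = cgf_and_slope(lam)
    return lam, 2.0 * delta / VAR0, lam * eta - k0, 2.0 * delta ** 2 / VAR0


if __name__ == "__main__":
    for delta in (0.1, 0.05, 0.01):
        lam, lam_apx, kl, kl_apx = compare(delta)
        print(f"Delta={delta}: lambda*={lam:.4f} (approx {lam_apx:.4f}), "
              f"D={kl:.6f} (approx {kl_apx:.6f})")
```

For this skewed $P_0$ one should see the relative errors of both approximations shrink as $\Delta$ decreases, consistent with (36) and (38).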

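Now

$$\frac{2\Delta}{\sigma_0^2}\Big(X_t-\frac{\mu_0+\eta}{2}\Big)>b_\alpha\iff X_t-\frac{\mu_0+\eta}{2}>\tilde b_\alpha \tag{39}$$

where

$$\tilde b_\alpha:=\frac{|\ln\alpha|\,\sigma_0^2}{2\Delta}=\frac{|\ln\alpha|\,\sigma_0^2}{\eta-\mu_0}. \tag{40}$$

Since the factor $2\Delta/\sigma_0^2$ is positive, the same equivalence holds for the cumulative sums in (26). Therefore, the stopping rule $\tau(\Lambda_{P_0,\tilde P_1},b_\alpha)$ can be approximated by the stopping rule $\tau(\tilde\Lambda_{\mu_0,\eta},\tilde b_\alpha)$, where

$$\tilde\Lambda_{\mu_0,\eta}(t)=\max_{1\le k\le t+1}\sum_{i=k}^{t}\Big(X_i-\frac{\mu_0+\eta}{2}\Big)=\Big(\tilde\Lambda_{\mu_0,\eta}(t-1)+X_t-\frac{\mu_0+\eta}{2}\Big)^+ \tag{41}$$

with $\tilde\Lambda_{\mu_0,\eta}(0)=0$. We call $\tau(\tilde\Lambda_{\mu_0,\eta},\tilde b_\alpha)$ the Mean-Change Test (MCT), and $\tilde\Lambda_{\mu_0,\eta}$ the MCT statistic.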
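A direct implementation of the MCT may be sketched as follows. It assumes only that the pre-change mean $\mu_0$, variance $\sigma_0^2$ and the design threshold $\eta$ are given; the function names are hypothetical. As in (41), the statistic itself uses only $\mu_0$ and $\eta$, while $\sigma_0^2$ enters only through the threshold (40), and each update costs $O(1)$ time and memory.

```python
import math
from typing import Iterable, Optional


def mct_threshold(alpha: float, mu0: float, var0: float, eta: float) -> float:
    """Threshold (40): |ln alpha| * sigma0^2 / (eta - mu0)."""
    if eta <= mu0:
        raise ValueError("eta must exceed the pre-change mean mu0")
    return abs(math.log(alpha)) * var0 / (eta - mu0)


def mct_stop(xs: Iterable[float], mu0: float, eta: float, b: float) -> Optional[int]:
    """MCT stopping time: first t with tilde-Lambda(t) >= b, or None.

    Recursion (41): stat <- max(0, stat + X_t - (mu0 + eta) / 2), stat(0) = 0.
    """
    centre = 0.5 * (mu0 + eta)
    stat = 0.0
    for t, x in enumerate(xs, start=1):
        stat = max(0.0, stat + x - centre)
        if stat >= b:
            return t
    return None


if __name__ == "__main__":
    mu0, var0, eta = 0.3, 0.02, 0.5          # hypothetical design values
    b = mct_threshold(1e-3, mu0, var0, eta)  # about 0.691
    data = [0.3] * 50 + [0.7] * 50           # noiseless toy data, change at t = 51
    print(b, mct_stop(data, mu0, eta, b))
```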

From (38), it follows that as $\alpha\to0$ and $\Delta\to0$, the worst-case delay satisfies

$$\sup_{\mathbf{P}_1:\,P_{1,t}\in\mathcal{M}_1}\mathrm{WADD}^{P_0,\mathbf{P}_1}\big(\tau(\tilde\Lambda_{\mu_0,\eta},\tilde b_\alpha)\big)\sim\frac{\sigma_0^2\,|\ln\alpha|}{2\Delta^2}. \tag{42}$$

Therefore, if $\Delta$ is small, it is sufficient to know only the mean $\mu_0$ and variance $\sigma_0^2$ of the pre-change distribution to construct a good approximation to the asymptotically minimax robust test. Furthermore, only the pre-change mean $\mu_0$ is needed to construct the MCT statistic; the variance enters only through the threshold $\tilde b_\alpha$. From the simulation results in Section IV, we see that the performance of the MCT can be very close to that of the asymptotically minimax robust test even for moderate values of $\Delta$. Since the mean and variance of a distribution are much easier to estimate accurately than the entire density, this test can be useful and accurate when only a moderate number of observations from the pre-change regime is available.

###### Remark.

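It is interesting that the form of the MCT statistic in (41) coincides, up to the positive scaling factor $2\Delta/\sigma_0^2$, with that of the CuSum statistic (see (5)) with known stationary pre- and post-change distributions $\mathcal{N}(\mu_0,\sigma_0^2)$ and $\mathcal{N}(\eta,\sigma_0^2)$, respectively. Here $\mathcal{N}(\mu,\sigma^2)$ denotes a Gaussian distribution with mean $\mu$ and variance $\sigma^2$.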
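The Remark can be verified in one line. Writing $\phi(x;m,s^2)$ for the Gaussian density with mean $m$ and variance $s^2$ (notation introduced only for this derivation), the per-sample log-likelihood ratio of $\mathcal{N}(\eta,\sigma_0^2)$ against $\mathcal{N}(\mu_0,\sigma_0^2)$ is

$$
\ln\frac{\phi(x;\eta,\sigma_0^2)}{\phi(x;\mu_0,\sigma_0^2)}
=\frac{(x-\mu_0)^2-(x-\eta)^2}{2\sigma_0^2}
=\frac{(\eta-\mu_0)(2x-\mu_0-\eta)}{2\sigma_0^2}
=\frac{2\Delta}{\sigma_0^2}\Big(x-\frac{\mu_0+\eta}{2}\Big),
$$

which is the MCT increment scaled by the positive constant $2\Delta/\sigma_0^2$. Since $(\cdot)^+$ commutes with positive scaling, the Gaussian CuSum statistic equals $(2\Delta/\sigma_0^2)\,\tilde\Lambda_{\mu_0,\eta}(t)$, and it matches the small-$\Delta$ expansion (37) without the $o(\Delta^2)$ term.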

### III-C Performance Analysis of MCT for Moderate Δ

We now study the asymptotic performance of the MCT for fixed $\Delta$, as $\alpha\to0$. For this part of the analysis, we assume that the pre- and post-change distributions have supports that are uniformly bounded, and without loss of generality, we assume that the bounding interval is $[0,1]$. This assumption holds in many practical applications, including the pandemic monitoring problem discussed in Section IV.

Define

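$$Z_t:=X_t-\frac{\mu_0+\eta}{2},\qquad\forall t\ge1. \tag{43}$$

Then the MCT statistic of (41) can be written as:

$$\tilde\Lambda_{\mu_0,\eta}(t)=\big(\tilde\Lambda_{\mu_0,\eta}(t-1)+Z_t\big)^+ \tag{44}$$

with $\tilde\Lambda_{\mu_0,\eta}(0)=0$. The MCT stopping time is given by:

$$\tau(\tilde\Lambda_{\mu_0,\eta},b)=\inf\{t:\tilde\Lambda_{\mu_0,\eta}(t)\ge b\} \tag{45}$$

where $b$ has to be chosen to meet the FAR constraint:

$$\mathrm{FAR}_{P_0}\big(\tau(\tilde\Lambda_{\mu_0,\eta},b)\big)=\frac{1}{\mathbb{E}_\infty^{P_0,\mathbf{P}_1}\big[\tau(\tilde\Lambda_{\mu_0,\eta},b)\big]}\le\alpha. \tag{46}$$

In what follows, we write $\tau(\tilde\Lambda_{\mu_0,\eta},b)$ as $\tau(b)$, with the understanding that the test statistic being used throughout is the MCT statistic $\tilde\Lambda_{\mu_0,\eta}$.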

#### III-C1 False Alarm Analysis

In Lemma III.2 below, we first control the boundary crossing probability of the random walk $S_t=\sum_{i=1}^{t}Z_i$ in the pre-change regime. Then, in Theorem III.3, we use Lemma III.2 to bound the false alarm rate of the MCT asymptotically, following the procedure outlined in [22].

###### Lemma III.2.

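Assume that the pre-change distribution $P_0$ has known pre-change mean $\mu_0$ and variance $\sigma_0^2$, and that the post-change distribution is non-stationary with $P_{1,t}\in\mathcal{M}_1$ for all $t\ge1$. For $b>0$, define the supplementary stopping time

$$\tau'(b):=\inf\{t:S_t\notin(0,b)\} \tag{47}$$

where $S_t:=\sum_{i=1}^{t}Z_i$, with $Z_i$ defined in (43). Then,

$$\begin{aligned}\mathbb{P}_\infty^{P_0,\mathbf{P}_1}\{S_{\tau'(b)}\ge b\}&\le\frac{2R_0b}{\Delta}\,K_1\Big(\frac{R_0^2b\Delta}{\sigma_0^2}\Big)\exp\Big(-\frac{R_0^2\Delta}{\sigma_0^2}b\Big)\\&=\sqrt{\frac{2\pi\sigma_0^2b}{\Delta^3}}\exp\Big(-\frac{2R_0^2\Delta}{\sigma_0^2}b\Big)(1+o(1)),\quad\text{as }b\to\infty,\end{aligned}\tag{48}$$

where

$$R_0=\frac{\sigma_0^2}{\sigma_0^2+\Delta\cdot\max\{\mu_0,1-\mu_0\}/3} \tag{49}$$

and $K_1$ is the modified Bessel function of the second kind of order 1.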
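Before turning to the proof, it is instructive to see how quickly the asymptotic form in (48) becomes accurate. The sketch below evaluates the logarithm of the Bessel-function bound in (48) and of its asymptotic form. Since Python's standard library has no modified Bessel function, it computes the scaled quantity $e^zK_1(z)$ from the integral representation $K_1(z)=\int_0^\infty e^{-z\cosh t}\cosh t\,dt$ with the trapezoidal rule; the step size, truncation point, parameter values and function names are illustrative assumptions.

```python
import math
from typing import Tuple


def scaled_k1(z: float, h: float = 1e-3, t_max: float = 20.0) -> float:
    """e^z * K_1(z) for z > 0 from K_1(z) = int_0^inf exp(-z cosh t) cosh t dt.

    The integrand is even in t, so the trapezoidal rule converges very fast;
    the factor e^z is folded into the exponent to avoid underflow for large z.
    """
    total = 0.5  # half-weight endpoint at t = 0, where the scaled integrand equals 1
    for i in range(1, int(t_max / h) + 1):
        c = math.cosh(i * h)
        total += math.exp(-z * (c - 1.0)) * c
    return total * h


def r0(mu0: float, var0: float, delta: float) -> float:
    """R_0 from (49)."""
    return var0 / (var0 + delta * max(mu0, 1.0 - mu0) / 3.0)


def log_bounds(b: float, mu0: float, var0: float, delta: float) -> Tuple[float, float]:
    """Natural logs of the Bessel bound in (48) and of its asymptotic form."""
    r = r0(mu0, var0, delta)
    z = r * r * b * delta / var0  # argument of K_1 in (48)
    log_exact = math.log(2.0 * r * b / delta) + math.log(scaled_k1(z)) - 2.0 * z
    log_asym = 0.5 * math.log(2.0 * math.pi * var0 * b / delta ** 3) - 2.0 * z
    return log_exact, log_asym


if __name__ == "__main__":
    for b in (5.0, 20.0, 100.0):
        exact, asym = log_bounds(b, mu0=0.3, var0=0.02, delta=0.05)
        print(f"b={b}: log bound={exact:.4f}, log asymptotic={asym:.4f}")
```

With, for instance, $\mu_0=0.3$, $\sigma_0^2=0.02$ and $\Delta=0.05$ (hypothetical values), the gap between the two log-values should shrink roughly like $3/(8z)$ with $z=R_0^2b\Delta/\sigma_0^2$, following the standard large-argument expansion of $K_1$.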

###### Proof.

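Note that $Z_i+\Delta=X_i-\mu_0$, which has mean 0 and variance $\sigma_0^2$ under $P_0$. Since $X_i\in[0,1]$, we have $|Z_i+\Delta|\le\max\{\mu_0,1-\mu_0\}$. Let $M:=\max\{\mu_0,1-\mu_0\}/3$; then $|Z_i+\Delta|\le3M$. Thus, we have

$$\begin{aligned}
\mathbb{P}_\infty^{P_0,\mathbf{P}_1}\{S_{\tau'(b)}\ge b\}&=P_0\{S_{\tau'(b)}\ge b\}=P_0\Big\{\sum_{i=1}^{\tau'(b)}Z_i\ge b\Big\}\\
&=\sum_{t=1}^{\infty}P_0\Big\{\sum_{i=1}^{t}Z_i\ge b,\ t=\tau'(b)\Big\}\le\sum_{t=1}^{\infty}P_0\Big\{\sum_{i=1}^{t}Z_i\ge b\Big\}\\
&=\sum_{t=1}^{\infty}P_0\Big\{\sum_{i=1}^{t}(Z_i+\Delta)\ge b+t\Delta\Big\}\\
&\overset{(i)}{\le}\sum_{t=1}^{\infty}\exp\Big(-\frac{(b+t\Delta)^2}{2\big(t\sigma_0^2+M(b+t\Delta)\big)}\Big)\\
&\overset{(ii)}{\le}\int_0^\infty\exp\Big(-\frac{(b+x\Delta)^2}{2\big(x\sigma_0^2+M(b+x\Delta)\big)}\Big)dx\\
&\le a\int_0^\infty\exp\Big(-\frac{(a\Delta y+C)^2}{2y}\Big)dy\\
&=a\,e^{-a\Delta C}\int_0^\infty e^{-\left((a^2\Delta^2/2)\,y+(C^2/2)\,y^{-1}\right)}dy\\
&\overset{(iii)}{=}\frac{2C}{\Delta}e^{-a\Delta C}K_1(a\Delta C)
\end{aligned}$$

where $a:=1/(\sigma_0^2+M\Delta)$ and $C:=ab\sigma_0^2=R_0b$. In the series of inequalities above, (i) follows from Bernstein’s inequality [28, p. 9], (ii) follows from bounding the sum with an integral, the next step follows from the substitution $y=x\sigma_0^2+M(b+x\Delta)$ and extending the lower limit of integration from $Mb$ to 0, and (iii) follows from Lemma .2 in the appendix, applied with the coefficients $a^2\Delta^2/2$ (of $y$) and $C^2/2$ (of $y^{-1}$). Since $a\Delta C=R_0^2b\Delta/\sigma_0^2$ and $K_1(z)=\sqrt{\pi/(2z)}\,e^{-z}(1+o(1))$ as $z\to\infty$, the asymptotic result follows. ∎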
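The change of variables and step (iii) in the proof above can be spelled out as follows, writing $M=\max\{\mu_0,1-\mu_0\}/3$, $a=1/(\sigma_0^2+M\Delta)$ and $C=ab\sigma_0^2$. The Bessel-integral identity is the standard one for constants $p,q>0$; the letters $p$ and $q$ are introduced only for this derivation.

$$
\begin{aligned}
&y:=x\sigma_0^2+M(b+x\Delta)=\frac{x}{a}+Mb\ \Longrightarrow\ dx=a\,dy,\qquad b+x\Delta=a(\Delta y+b\sigma_0^2)=a\Delta y+C,\\
&\int_0^\infty e^{-(py+q/y)}\,dy=2\sqrt{q/p}\;K_1\big(2\sqrt{pq}\big)\qquad(p,q>0),\\
&p=\frac{a^2\Delta^2}{2},\ q=\frac{C^2}{2}\ \Longrightarrow\ a\,e^{-a\Delta C}\cdot\frac{2C}{a\Delta}K_1(a\Delta C)=\frac{2C}{\Delta}e^{-a\Delta C}K_1(a\Delta C),\\
&a=\frac{R_0}{\sigma_0^2}\ \Longrightarrow\ C=R_0b,\qquad a\Delta C=\frac{R_0^2\Delta b}{\sigma_0^2}.
\end{aligned}
$$

Because the integrand is nonnegative and $y$ starts at $Mb>0$ when $x=0$, extending the lower limit of the $y$-integral to $0$ can only enlarge it, so that step yields an upper bound.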

###### Theorem III.3.

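Under the same assumptions as in Lemma III.2, let $\tilde b'_\alpha$ be such that

$$\sqrt{\frac{2\pi\sigma_0^2\tilde b'_\alpha}{\Delta^3}}\exp\Big(-\frac{2R_0^2\Delta}{\sigma_0^2}\tilde b'_\alpha\Big)=\alpha. \tag{50}$$

Then, the MCT with $b=\tilde b'_\alpha$, i.e., $\tau(\tilde b'_\alpha)$, meets the FAR constraint (46) asymptotically as $\alpha\to0$.

Furthermore, as $\alpha\to0$,

$$\tilde b'_\alpha=\frac{\tilde b_\alpha}{R_0^2}(1+o(1)) \tag{51}$$

where $\tilde b_\alpha$ is defined in (40) and $R_0$ is defined in (49).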
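Equation (50) has no closed-form solution, but it is easy to solve numerically. The sketch below, with hypothetical parameter values and function names, finds $\tilde b'_\alpha$ by bisection on the logarithm of (50), taking the root to the right of the maximizer of the left-hand side (the relevant one for small $\alpha$), and compares it with $\tilde b_\alpha/R_0^2$ from (51).

```python
import math
from typing import Tuple


def r0(mu0: float, var0: float, delta: float) -> float:
    """R_0 from (49)."""
    return var0 / (var0 + delta * max(mu0, 1.0 - mu0) / 3.0)


def solve_b_prime(alpha: float, mu0: float, var0: float, delta: float,
                  tol: float = 1e-10) -> float:
    """Solve (50) for the root to the right of the maximiser of its left-hand side."""
    rho = 2.0 * r0(mu0, var0, delta) ** 2 * delta / var0

    def g(b: float) -> float:
        # log(LHS of (50)) - log(alpha); strictly decreasing for b > 1 / (2 rho)
        return 0.5 * math.log(2.0 * math.pi * var0 * b / delta ** 3) - rho * b - math.log(alpha)

    lo = 1.0 / (2.0 * rho)  # maximiser of g
    if g(lo) <= 0.0:
        raise ValueError("alpha too large: (50) has no root right of the maximiser")
    hi = 2.0 * lo
    while g(hi) > 0.0:  # grow the bracket until g changes sign
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def compare_thresholds(alpha: float, mu0: float, var0: float,
                       delta: float) -> Tuple[float, float]:
    """Return (b'_alpha from (50), tilde b_alpha / R_0^2 from (40) and (51))."""
    b_tilde = abs(math.log(alpha)) * var0 / (2.0 * delta)
    return solve_b_prime(alpha, mu0, var0, delta), b_tilde / r0(mu0, var0, delta) ** 2


if __name__ == "__main__":
    for alpha in (1e-3, 1e-10, 1e-50):
        bp, approx = compare_thresholds(alpha, mu0=0.3, var0=0.02, delta=0.05)
        print(f"alpha={alpha:g}: b'={bp:.4f}, tilde b / R0^2={approx:.4f}")
```

The ratio $\tilde b'_\alpha R_0^2/\tilde b_\alpha$ approaches 1 only slowly, because the square-root prefactor in (50) contributes a relative correction of order $\ln|\ln\alpha|/|\ln\alpha|$.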

###### Proof.

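As $\alpha\to0$, $\tilde b'_\alpha\to\infty$. Recall the definition of $\tau'$ in (47). From Lemma III.2 and (50), $P_0\{S_{\tau'(\tilde b'_\alpha)}\ge\tilde b'_\alpha\}\le\alpha(1+o(1))$ as $\alpha\to0$. Then, using [22, Sec. 2.6], it can be shown that

$$\mathbb{E}_\infty^{P_0,\mathbf{P}_1}[\tau(\tilde b'_\alpha)]=\frac{\mathbb{E}_{P_0}[\tau'(\tilde b'_\alpha)]}{P_0\{S_{\tau'(\tilde b'_\alpha)}\ge\tilde b'_\alpha\}}\overset{(*)}{\ge}\frac{1}{P_0\{S_{\tau'(\tilde b'_\alpha)}\ge\tilde b'_\alpha\}}\ge\alpha^{-1}(1+o(1)) \tag{52}$$

where $(*)$ follows because $\tau'(\tilde b'_\alpha)\ge1$. Thus, (46) is satisfied asymptotically.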

For the second result, it is sufficient to show that