Quickest Change Detection with Non-stationary and Composite Post-change Distribution

October 4, 2021 · Yuchen Liang et al.

The problem of quickest detection of a change in the distribution of a sequence of independent observations is considered. The pre-change distribution is assumed to be known and stationary, while the post-change distributions are assumed to evolve in a pre-determined non-stationary manner with some possible parametric uncertainty. In particular, it is assumed that the cumulative KL divergence between the post-change and the pre-change distributions grows super-linearly with time after the change-point. For the case where the post-change distributions are known, a universal asymptotic lower bound on the delay is derived, as the false alarm rate goes to zero. Furthermore, a window-limited CuSum test is developed, and shown to achieve the lower bound asymptotically. For the case where the post-change distributions have parametric uncertainty, a window-limited generalized likelihood-ratio test is developed and is shown to achieve the universal lower bound asymptotically. Extensions to the case with dependent observations are discussed. The analysis is validated through numerical results on synthetic data. The use of the window-limited generalized likelihood-ratio test in monitoring pandemics is also demonstrated.


I Introduction

The problem of quickest change detection (QCD) is of fundamental importance in mathematical statistics (see, for example, [1, 2] for an overview). Given a sequence of observations whose distribution changes at some unknown change-point, the goal is to detect the change in distribution as quickly as possible after it occurs, while not making too many false alarms.

In the classical formulations of the QCD problem, it is assumed that the pre- and post-change distributions are known and stationary, i.e., the observations are independent and identically distributed (i.i.d.) before and after the change-point. In many practical situations, while it is reasonable to assume that we can accurately estimate the pre-change distribution, the post-change distribution is rarely completely known. Furthermore, while it is reasonable to assume that the system is in steady-state before the change-point and producing stationary observations, the post-change distribution may typically be non-stationary. For example, in the pandemic monitoring problem, the distribution of the number of people infected daily might have reached a steady-state before the start of a new wave. At the onset of the new wave, the post-change distribution is constantly evolving. Indeed, during the early phase of the new wave, the mean of the post-change distribution grows approximately exponentially. We will address the pandemic monitoring problem in detail in Section V.

In this paper, our main focus is the QCD problem with independent observations (the extension to the case of dependent observations is discussed in Section IV), where the pre-change distribution is assumed to be known and stationary, while the post-change distribution is allowed to be non-stationary and to have some parametric uncertainty.

There has been prior work on extensions of the classical QCD framework to the case where the pre- and/or the post-change distributions are not stationary. A prevalent approach is based on a minimax robust [3] formulation of the QCD problem, where it is assumed that the pre- and post-change distributions come from mutually exclusive uncertainty classes. Under certain conditions, e.g., joint stochastic boundedness [4] and weak stochastic boundedness [5], low-complexity tests that either coincide with [6] or asymptotically approach [7] the optimal test can be found. The essence of the minimax robust approach to dealing with non-stationarity is to identify stationary pre- and post-change distributions that are least favorable for detection, and design classical (i.i.d.) QCD tests for these least favorable distributions. A drawback of this approach is that the robust tests can have suboptimal performance for actual distributions encountered in practice.

There have also been extensions of the classical formulation to the case where the pre- and/or post-change distributions are not fully known. In the generalized likelihood ratio (GLR) approach, introduced in [8], the pre- and post-change distributions are assumed to be i.i.d. and to come from a one-parameter exponential family, with the post-change parameter being unknown. An alternative to the GLR test, the mixture-based test, was developed by Pollak [9] for the same setting as in [8]. The GLR approach is studied in detail in [10] for the problem of detecting a change in the mean of a Gaussian distribution with unknown post-change mean. Both the mixture and GLR approaches are studied in detail in [11] for the case where the pre- and post-change distributions are non-i.i.d. and the post-change distribution has parametric uncertainty; there it is assumed that the cumulative Kullback-Leibler (KL) divergence between the post-change and the pre-change distributions grows linearly in the number of observations. A universal lower bound on the worst-case delay is first developed, and a window-limited CuSum test that asymptotically achieves the lower bound is proposed in this regime. For the case where the post-change distribution has parametric uncertainty, a window-limited GLR test is proposed and analyzed.

In some applications (e.g., the pandemic monitoring problem), the post-change distributions are non-stationary in such a way that the cumulative KL divergence grows super-linearly after the change-point, in which case we say that the post-change distribution is detection-favorable. This is the setting we consider in this paper. Our contributions are as follows:

  1. We extend the universal lower bound on the worst-case delay given in [11] to the more general detection-favorable setting.

  2. We develop a window-limited CuSum test that asymptotically achieves the lower bound on the delay when the post-change distribution is detection-favorable and fully known.

  3. We develop and analyze a GLR test that asymptotically achieves the lower bound on the worst-case delay when the post-change distributions are detection-favorable and have parametric uncertainty.

  4. We validate our analysis through numerical results, and demonstrate the use of our approach in monitoring pandemics.

The rest of the paper is structured as follows. In Section II, we describe the information bounds and propose an asymptotically optimal CuSum test when the post-change distribution is detection-favorable and completely known. In Section III, we propose an asymptotically optimal GLR test when the post-change distribution is still detection-favorable but has unknown parameters. In Section IV, we discuss extensions to dependent observations and to change-point dependent post-change behavior. In Section V, we present some numerical results, including results on monitoring the COVID-19 pandemic. We conclude the paper in Section VI.

II Information Bounds and Optimal Detection

Let X_1, X_2, ... be a sequence of independent random variables, and let ν be the change-point. Assume that X_1, ..., X_{ν-1} all have density p_0 with respect to some measure μ. Furthermore, assume that X_ν, X_{ν+1}, ... have densities p_1, p_2, ..., respectively, with respect to μ, i.e., we are implicitly assuming that the post-change densities are time-invariant with respect to the change-point ν. Note that the distributions of the observations are allowed to be non-stationary after the change-point. Let P_ν denote the probability measure on the entire sequence of observations when the change-point is ν, and let E_ν denote the corresponding expectation.

The change-point ν is assumed to be unknown but deterministic. The problem is to detect the change quickly while not causing too many false alarms. Let τ be a stopping time [4] defined on the observation sequence and associated with the detection rule, i.e., τ is the time at which we stop taking observations and declare that the change has occurred.

II-A Classical Results under the i.i.d. Model

A special case of the model described above is where both the pre- and post-change observations are independent and identically distributed (i.i.d.), i.e., p_n = p_1 for all n ≥ 1. In this case, Lorden [8] proposed solving the following optimization problem to find the best stopping time τ:

    inf_{τ ∈ C_α} WADD(τ),     (1)

where

    WADD(τ) := sup_{ν ≥ 1} ess sup E_ν[ (τ - ν + 1)^+ | F_{ν-1} ]     (2)

characterizes the worst-case delay, and F_n denotes the sigma-algebra generated by X_1, ..., X_n, i.e., F_n = σ(X_1, ..., X_n). The constraint set is

    C_α := { τ : FAR(τ) ≤ α },     (3)

with FAR(τ) := 1 / E_∞[τ], which guarantees that the false alarm rate of the algorithm does not exceed α. Here, E_∞ is the expectation operator when the change never happens, i.e., when ν = ∞.

Lorden also showed that Page's Cumulative Sum (CuSum) algorithm [12], whose test statistic is given by:

    W(n) = max{ W(n-1) + Z(n), 0 },  W(0) = 0,     (4)

solves the problem in (1) asymptotically as α → 0. Here, Z(n) is the log-likelihood ratio defined as:

    Z(n) := log ( p_1(X_n) / p_0(X_n) ).     (5)

The CuSum stopping rule is given by:

    τ_Page := inf{ n ≥ 1 : W(n) ≥ b },     (6)

where the threshold is set as b = |log α|. It was shown by Moustakides [13] that the CuSum algorithm is exactly optimal for the problem in (1).
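
To make the recursion in (4) and the stopping rule in (6) concrete, the following sketch implements Page's CuSum test for a simple i.i.d. Gaussian mean-change. The Gaussian densities and the default parameter values are illustrative assumptions, not part of the model above; any pre-/post-change density pair could be substituted.

```python
# A minimal sketch of the CuSum recursion in (4) and the stopping rule in (6).
# The Gaussian pre-/post-change densities are assumed purely for illustration.
def cusum_stopping_time(observations, threshold, pre_mean=0.0, post_mean=1.0, sigma=1.0):
    """Return the first time n at which the CuSum statistic W(n) crosses the threshold,
    or None if it never does."""
    w = 0.0  # W(0) = 0
    for n, x in enumerate(observations, start=1):
        # log-likelihood ratio Z(n) = log p1(x) - log p0(x) for Gaussian densities
        llr = ((x - pre_mean) ** 2 - (x - post_mean) ** 2) / (2.0 * sigma ** 2)
        w = max(w + llr, 0.0)  # Page's recursion W(n) = max{W(n-1) + Z(n), 0}
        if w >= threshold:
            return n
    return None
```

Here, choosing the threshold as log(1/α) matches the threshold setting noted above.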

II-B Information Bounds for Non-stationary Post-Change Distributions

In the case where the observations are independent and the post-change distributions are non-stationary, let the log-likelihood ratio of observation X_n at time n, for hypothesized change-point k, be:

    Z_n(k) := log ( p_{n-k+1}(X_n) / p_0(X_n) ),     (7)

where n ≥ k. Here k is a hypothesized change-point, and X_n is drawn from the true distribution with change-point ν.

In the classical i.i.d. model described in Section II-A, the cumulative KL divergence after the change-point increases linearly in the number of observations. We generalize this condition as follows. Let the growth function g represent the cumulative Kullback-Leibler (KL) divergence under the true distribution. More specifically, let g be increasing and continuous. Note that the inverse of g, denoted by g^{-1}, exists and is also increasing and continuous. It is assumed that the expected sum of the log-likelihood ratios under P_ν matches the value of the growth function at all positive integers, i.e.,

    E_ν[ Σ_{i=ν}^{ν+n-1} Z_i(ν) ] = g(n),  n = 1, 2, ...     (8)

Note that the KL divergence is always positive, i.e., g(n) > 0 for all n ≥ 1.

In this paper, we are interested in the case where the post-change distributions are eventually persistently different from the pre-change distribution. Specifically, it is assumed that g is asymptotically lower bounded by some linear function, i.e., g(n) ≥ c n for some constant c > 0 as n → ∞.

Lemma II.1.

Consider the growth function g defined in (8). Suppose that the sum of variances of the log-likelihood ratios satisfies

(9)

where is equivalent to as . Further, suppose that

(10)

for all positive integers . Then,

(11)

and

(12)

for any .

The proof is given in the appendix.

Remark.

One can generalize condition (10) in a way that either or

holds for all positive integers .

Example II.1.

Consider the Gaussian exponential mean-change detection problem (with unit variance) as follows. Denote by the Gaussian distribution with mean and variance . Let be distributed as , and for all , let be distributed as . Here is some positive fixed constant. The log-likelihood ratio is given by:

(13)

Now, the growth function can be calculated as

Note that as for all . Also, the sum of variances of the log-likelihood ratios is

for all , which establishes condition (9). Further, for any and ,

which establishes condition (10). ∎
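
As a concrete illustration of this example, the following simulation sketch generates observations with a post-change mean that grows exponentially and evaluates the corresponding Gaussian log-likelihood ratio. The parametrization of the post-change mean used below is an assumption made here for illustration and may differ from the exact parametrization of Example II.1.

```python
# A simulation sketch in the spirit of Example II.1 (assumed parametrization):
# pre-change observations are N(0, 1); after the change-point nu, the mean of the
# i-th post-change observation is taken to be exp(theta * i), which makes the
# cumulative KL divergence grow super-linearly (a detection-favorable setting).
import numpy as np

def simulate_exponential_mean_change(n_total, nu, theta, seed=0):
    rng = np.random.default_rng(seed)
    means = np.zeros(n_total)
    idx = np.arange(nu - 1, n_total)             # 0-indexed positions at or after nu
    means[idx] = np.exp(theta * (idx - nu + 2))  # i-th post-change mean = exp(theta * i)
    return rng.normal(loc=means, scale=1.0)

def gaussian_llr(x_n, mean_n):
    # log-likelihood ratio of N(mean_n, 1) versus N(0, 1) for a single observation
    return mean_n * x_n - 0.5 * mean_n ** 2
```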

The following theorem gives a lower bound on the worst-case delay as the false alarm rate α → 0.

Theorem II.2.

Suppose that (11) holds with an increasing and continuous , where as for some . Then, as ,

(14)

where as .

Proof.

The proof follows the structure of [11, Thm 1]. Let . Fix such that . First, since ,

Therefore, it remains to show that

(15)

From the proof of [11, Thm 1], since , for any , there exists some such that and

(16)

Fix . Now consider the events:

(17)

and

(18)

where

(19)

such that . Choose . Thus, for large ,

where is explained as follows. By assumption, there exists some large such that for all , . Set . For any , . Specifically, this is true for above for all ’s large enough.

Therefore, for large , the chosen satisfies that and that .

Next, we will upper bound the probability of the events (defined in (17)) and (defined in (18)). We first consider . By a change-of-measure argument,

Since , we get

For large , , and therefore,

where the last inequality follows from (16), and thus

(20)

because for any fixed . We next turn to :

(21)

where follows because , follows by independence, and the limit follows from the assumption in (11) since as . Therefore, combining (20) and (21), we get, for some and all ,

or equivalently,

Since is arbitrary and is continuous, we can take the limit as . Recalling the definition of in (19), we obtain

(22)

Since is arbitrary, (15) is proved, and the proof is complete. ∎

II-C Asymptotically Optimal Detection with Non-stationary Post-Change Distributions

Recall that under the classical setting, Page's CuSum test (in (6)) is optimal and has the following structure:

    τ_Page = inf{ n ≥ 1 : max_{1 ≤ k ≤ n} Σ_{i=k}^{n} Z(i) ≥ b },     (23)

where Z(i) is the log-likelihood ratio when the post-change distributions are stationary (defined in (5)). When the post-change distributions are potentially non-stationary, let the modified CuSum stopping rule be:

    τ = inf{ n ≥ 1 : max_{1 ≤ k ≤ n} Σ_{i=k}^{n} Z_i(k) ≥ b },     (24)

where Z_i(k) represents the log-likelihood ratio between the densities p_{i-k+1} and p_0 for observation X_i (defined in (7)). Here i is the time index and k is the hypothesized change-point. Note that if the post-change distributions are indeed stationary, i.e., p_n = p_1 for all n ≥ 1, we would get Z_i(k) = Z(i) for all k ≤ i, and thus τ = τ_Page.

As shown in (4), Page's classical CuSum algorithm admits a recursive way to compute its test statistic. Unfortunately, even with independent observations, when the log-likelihood ratios depend on the hypothesized change-point k, the test statistic in (24) cannot be computed recursively. For computational tractability, we therefore consider a window-limited version of the test in (24):

    τ̃ = inf{ n ≥ 1 : max_{max(1, n-m_α+1) ≤ k ≤ n} Σ_{i=k}^{n} Z_i(k) ≥ b },     (25)

where m_α is the window size. We require that m_α satisfy the following conditions:

(26)

Since the range for the maximum is smaller in (25) than in (24), given any realization of the observations, if the test statistic of (25) crosses the threshold at some time n, so does that of (24). Therefore, for any fixed threshold b,

    τ ≤ τ̃     (27)

almost surely.
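
The window-limited statistic in (25) can be computed directly by brute force. The sketch below assumes independent observations and a user-supplied function llr(i, k, x_i) returning the log-likelihood ratio of observation x_i at time i under hypothesized change-point k (as in (7)); the function name and signature are illustrative conventions, not part of the formal development.

```python
# A minimal sketch of the window-limited CuSum rule in (25).
def window_limited_cusum(observations, llr, window, threshold):
    """Stop at the first n at which the maximum over hypothesized change-points k in the
    window of sum_{i=k}^{n} llr(i, k, x_i) reaches the threshold; return None otherwise."""
    for n in range(1, len(observations) + 1):
        k_lo = max(1, n - window + 1)  # hypothesized change-points within the window
        stat = max(
            sum(llr(i, k, observations[i - 1]) for i in range(k, n + 1))
            for k in range(k_lo, n + 1)
        )
        if stat >= threshold:
            return n
    return None
```

Taking the window at least as large as the number of observations recovers the unrestricted rule in (24).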

In the following, we first control the asymptotic false alarm rate of τ̃ with an appropriately chosen threshold in Lemma II.3. Then, we upper bound the asymptotic delay of τ̃ in Lemma II.4. Finally, we combine these two lemmas and provide an asymptotically optimal solution to the problem in (1) in Theorem II.5.

Lemma II.3.

Suppose that and that satisfies (26). Then,

where as .

Remark.

If , where , then as .

Lemma II.4.

Suppose that and that satisfies (26). Further, suppose that (12) holds for when . Then,

where as .

Remark.

Clearly, the asymptotic inequality still holds in the case where no window is applied, i.e., .

The proofs for the above two lemmas are given in the appendix. Using these lemmas, we obtain the following asymptotic result.

Theorem II.5.

Suppose that and that satisfies (26). Further, suppose that (11) and (12) hold for when . Then, the CuSum stopping rule in (24) solves the problem in (1) asymptotically as , and

(28)

where as .

Example II.2.

Consider the same setting as in Example II.1. We have shown that conditions (9) and (10) hold in this setting, and thus (11) and (12) also hold by Lemma II.1. Considering the growth function as , we obtain

where as . Thus,

where as . Therefore,

(29)

where as , and is as defined in Theorem II.5.

III Window-limited GLR with Unknown Parameters

We now study the case where the evolution of the post-change distribution is parametrized by . Let be distributed as , where the corresponding densities are with respect to the common measure . Let be the parameter set and . Note that does not need to be compact. The true post-change parameter is assumed to be unknown but deterministic. Let the log-likelihood ratio be re-defined as

(30)

for any and . Here is drawn from the distribution with true change-point and true post-change parameter . The problem is to solve (1) asymptotically as under parameter uncertainty.

Consider the following window-limited GLR stopping rule:

(31)

where as . Therefore, it is guaranteed that for all small enough . Further, let be compact for each , and thus the maximizing given the pair at the false-alarm rate , denoted by , is contained in . Note that we omit the dependency of on here for simplicity.

If the parameter set is discrete-valued, the supremum in (31) becomes a maximum, and the stopping time is equivalent to running multiple CuSum algorithms simultaneously, one for each parameter value, with the test stopping whenever one of them stops. Therefore, we only consider the case where the parameter set is continuous.
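
For a scalar unknown parameter, the statistic in (31) can be evaluated by numerically maximizing the cumulative log-likelihood ratio over the parameter for each hypothesized change-point in the window. The sketch below is a hypothetical implementation under that assumption; the function llr(i, k, theta, x_i), the bounded parameter interval, and the use of a generic scalar optimizer are all illustrative choices rather than the paper's prescribed procedure.

```python
# A minimal sketch of the window-limited GLR rule in (31) for a scalar post-change parameter.
from scipy.optimize import minimize_scalar

def window_limited_glr(observations, llr, window, threshold, theta_bounds=(1e-3, 1.0)):
    for n in range(1, len(observations) + 1):
        k_lo = max(1, n - window + 1)
        for k in range(k_lo, n + 1):
            # maximize the cumulative log-likelihood ratio over the unknown parameter theta
            res = minimize_scalar(
                lambda th: -sum(llr(i, k, th, observations[i - 1]) for i in range(k, n + 1)),
                bounds=theta_bounds,
                method="bounded",
            )
            if -res.fun >= threshold:
                return n  # declare that a change has occurred
    return None
```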

Finally, it is assumed that the largest absolute eigenvalue of the Hessian matrix (with respect to the post-change parameter) of the cumulative log-likelihood ratio exists and is finite in the neighborhood of the maximizing parameter when the false alarm rate is small. Specifically, there exists such that for any and ,

(32)

where represents the maximum eigenvalue of a matrix . Intuitively, this condition guarantees that the log-likelihood ratio is sufficiently smooth even when becomes large.

Example III.1.

Consider again the Gaussian exponential mean-change detection problem in Example II.1. This section considers the case where the exact value of the post-change exponent coefficient is unknown. Note that this coefficient characterizes the entire post-change evolution rather than a single post-change distribution.

In the following, we first upper bound the asymptotic delay of the window-limited GLR rule in Lemma III.1. Next, we control its asymptotic false alarm rate with a properly chosen threshold in Lemma III.2. Finally, we combine these two lemmas and provide an asymptotically optimal solution for the case where the post-change parameter is unknown in Theorem III.3.

Lemma III.1.

If , then

for any threshold .

Proof.

This follows directly from the definition of (see (24)) and (see (31)). ∎

Remark.

Suppose that . Since for all small ’s, the worst-case delay of satisfies

asymptotically as .

Lemma III.2.

Suppose that satisfies

(33)

where

is the volume coefficient corresponding to a -dimensional Euclidean ball. Here is the gamma function. Further, suppose that satisfies (26) and that satisfies (32) when . Then,

where as .

The proof of this lemma is given in the appendix.

Remark.

Taking log on both sides and re-arranging the terms, (33) becomes:

(34)

Since , it follows that as .

Theorem III.3.

Suppose that is as defined in Lemma III.2 and that satisfies (26). Further, suppose that (11), (12) and (32) hold for when . Then, solves the problem in (1) asymptotically as , and

(35)

where as .

Proof.

The asymptotic result follows directly from Lemma III.1 and Lemma III.2. ∎

IV Extensions

IV-A Dependent Observations

In [11], Lai considered the case where the observations are potentially dependent across time. Let the log-likelihood ratio be defined as:

(36)

It was assumed that, for any ,

(37)

and

(38)

Lai [11] also proposed and analyzed a window-limited version of the classical CuSum test, which was shown to be asymptotically optimal as .

Conditions (37) and (38) imply that the sum of the log-likelihood ratios in the post-change regime concentrates around a quantity that grows linearly in the number of observations. We can generalize these conditions to the detection-favorable setting with potentially dependent observations. Let the log-likelihood ratio be re-defined as:

(39)

Then, Theorem II.5 and Theorem III.3 still hold if one can establish that for any ,

(40)

and

(41)

in the place of (11) and (12). If the growth function g is linear in the number of observations, the conditions coincide with (37) and (38).

IV-B Change-point Dependent Post-change Behavior

We can also relax the time-invariance assumption in the distribution model. Assume that X_1, ..., X_{ν-1} all have density p_0 with respect to some measure μ, and that X_ν, X_{ν+1}, ... have post-change densities, with respect to μ, that are possibly dependent on the change-point ν. The log-likelihood ratio becomes:

(42)

where n is the time index and k is the hypothesized change-point. With this definition of the log-likelihood ratio, the results in Section II and Section III still hold as long as the corresponding conditions are satisfied.

V Numerical Results and Discussion

Fig. 1: Performance of the proposed tests with different window sizes. The Gaussian exponential mean-change problem is considered, with , , and . The change-point .

In Fig. 1, we study the performance of the proposed tests through simulations for the Gaussian exponential mean-change problem (see Example II.1). It is observed that the delay at a given false alarm rate is essentially the same for all window sizes considered, as characterized in Example II.2.

Fig. 2: Validation of the distribution model using past COVID-19 data. The plot shows the four-day moving average of the daily new cases of COVID-19 as a fraction of the population in Wayne County, MI from October 1, 2020 to February 1, 2021 (in blue). The shape of the pre-change distribution is estimated using data from the previous 20 days (from September 11, 2020 to September 30, 2020), where and . The mean of the Beta distributions with the best-fit parameters (defined in (45)), which minimize the mean-square distance between the daily incremental fraction and the mean of the Beta distributions, is also shown (in orange). The best-fit parameters are: , , and .

Next, we apply our GLR algorithm to monitoring the spread of COVID-19 using new case data from various counties in the US [14]. The goal is to detect the onset of a new wave of the pandemic based on the incremental daily cases. The problem is modeled as one of detecting a change in the mean of a Beta distribution, as described below. Let Beta(a, b) denote the Beta distribution with shape parameters a and b. Let

(43)

Here, is a parametric function such that . Note that if and is not too large,

(44)

for all . Therefore, this parametric function is designed to capture the shape of the average fraction of daily incremental cases. Let

(45)

where are all parameters. This specific choice has two advantages: 1) it guarantees rapid growth at the start of a new epidemic wave, with the mean profile growing like the left edge of a Gaussian density during the early phase; 2) it guarantees that the daily incremental cases eventually vanish at the end of the current epidemic wave, i.e., the mean profile decays to zero as time grows.
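
To illustrate the kind of observation model described above, the following sketch draws daily incremental case fractions from a Beta distribution whose mean follows an assumed bell-shaped profile. The exact form of the function in (45) is not reproduced here; the profile h_profile, its parameter names and values, and the parametrization of the Beta distribution are hypothetical stand-ins chosen only to satisfy the two properties just listed.

```python
# A hypothetical sketch of the Beta observation model for daily incremental case fractions.
import numpy as np

def h_profile(n, scale=5e-4, center=60.0, width=20.0):
    # assumed bell-shaped mean fraction of daily new cases n days into a wave:
    # rises rapidly at the onset and decays to zero as the wave ends
    return scale * np.exp(-((n - center) ** 2) / (2.0 * width ** 2))

def sample_daily_fraction(n, b=1e5, seed=None):
    rng = np.random.default_rng(seed)
    mean = h_profile(n)
    a = mean * b / (1.0 - mean)  # choose shape a so that the Beta mean a/(a+b) equals h_profile(n)
    return rng.beta(a, b)
```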

In Fig. 2, we validate the distribution model described above with data from the COVID-19 wave of Fall 2020. In the simulation, the pre-change parameters are estimated using observations from previous periods in which the increments remain low and roughly constant. It is observed that the mean of the daily fraction of incremental cases matches well with the mean of the fitted Beta distribution with the mean profile in (45).
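
A hypothetical sketch of the fitting step described above is given next: the parameters of the assumed profile are chosen to minimize the mean-square distance between the observed daily fractions and the model mean (which coincides with the mean of the Beta distributions in the sketch above). The function names and the bell-shaped form are again illustrative assumptions, not the exact procedure used to produce Fig. 2.

```python
# Hypothetical least-squares fit of the assumed mean profile to observed daily fractions.
import numpy as np
from scipy.optimize import curve_fit

def fit_mean_profile(days, observed_fractions):
    def model(n, scale, center, width):
        return scale * np.exp(-((n - center) ** 2) / (2.0 * width ** 2))
    p0 = (float(np.max(observed_fractions)), float(np.mean(days)), 20.0)  # rough initial guess
    params, _ = curve_fit(model, days, observed_fractions, p0=p0, maxfev=10000)
    return params  # best-fit (scale, center, width)
```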

Fig. 3: COVID-19 monitoring example. The upper row shows the four-day moving average of the daily new cases of COVID-19 as a fraction of the population in Wayne County, MI (left), New York City, NY (middle), and Hamilton County, OH (right). A pre-change distribution is estimated using data from the previous 20 days (from May 26, 2021 to June 14, 2021). The mean of the Beta distributions with the hypothesized change-point and estimated parameters from the GLR algorithm is also shown (in orange). The lower row shows the evolution of the GLR test statistic (defined in (31)), respectively. The FAR threshold is set to , and the corresponding GLR test threshold is also shown (in red). The post-change distribution at time with hypothesized change-point is modeled as , where is defined in (45). The parameters , and are assumed to be unknown. The window size . The threshold is set using equation (33).

In Fig. 3, we illustrate the use of our GLR algorithm, with the distribution model above, for detecting the onset of a new wave of COVID-19. We assumed a start date of June 15, 2021 for the monitoring, at which time the pandemic appeared to be in a steady state, with incremental cases staying relatively flat. We observe that the GLR statistic significantly and persistently crosses the test threshold around late July in all counties, which is a strong indication of a new wave of the pandemic. More importantly, unlike the raw observations, which are highly varying, the GLR statistic shows a clear dichotomy between the pre- and post-change settings, with the statistic staying near zero before the purported onset of the new wave and taking off nearly vertically after the onset.

VI Conclusion

In this paper, we considered the problem of quickest detection of a change in the distribution of a sequence of independent observations. We assumed that the pre-change distribution is known and stationary, and that the post-change distributions evolve in a pre-determined non-stationary manner with some possible parametric uncertainty. In particular, the cumulative KL divergence between the post-change and the pre-change distributions grows super-linearly with time after the change-point. We extended the universal lower bound on the worst-case delay given in [11] to this more general setting, developed a window-limited CuSum test that asymptotically achieves the lower bound on the delay, and developed and analyzed a GLR test that asymptotically achieves the lower bound when the post-change distributions have parametric uncertainty. We validated our analysis through numerical results and demonstrated the use of our approach in monitoring pandemics.

References

  • [1] V. V. Veeravalli and T. Banerjee, “Quickest change detection,” in Academic press library in signal processing: Array and statistical signal processing.   Cambridge, MA: Academic Press, 2013.
  • [2] L. Xie, S. Zou, Y. Xie, and V. V. Veeravalli, “Sequential (quickest) change detection: Classical results and new directions,” arXiv preprint arXiv:2104.04186, 2021.
  • [3] P. J. Huber, “A robust version of the probability ratio test,” The Annals of Mathematical Statistics, vol. 36, no. 6, pp. 1753–1758, Dec. 1965.
  • [4] P. Moulin and V. V. Veeravalli, Statistical Inference for Engineers and Data Scientists.   Cambridge, UK: Cambridge University Press, 2018.
  • [5] T. L. Molloy and J. J. Ford, “Misspecified and asymptotically minimax robust quickest change detection,” IEEE Transactions on Signal Processing, vol. 65, no. 21, pp. 5730–5742, 2017.
  • [6] T. L. Molloy and J. J. Ford, “Minimax robust quickest change detection in systems and signals with unknown transients,” IEEE Transactions on Automatic Control, vol. 64, no. 7, pp. 2976–2982, July 2019.
  • [7] Y. Liang and V. V. Veeravalli, “Non-parametric quickest detection of a change in the mean of an observation sequence,” in 2021 55th Annual Conference on Information Sciences and Systems (CISS), 2021, pp. 1–6.
  • [8] G. Lorden, “Procedures for reacting to a change in distribution,” The Annals of Mathematical Statistics, vol. 42, no. 6, pp. 1897–1908, Dec. 1971.
  • [9] M. Pollak, “Optimality and almost optimality of mixture stopping rules,” Annals of Statistics, vol. 6, no. 4, pp. 910–916, Jul. 1978.
  • [10] D. Siegmund and E. S. Venkatraman, “Using the generalized likelihood ratio statistic for sequential detection of a change-point,” The Annals of Statistics, vol. 23, no. 1, pp. 255–271, Feb. 1995.
  • [11] T. L. Lai, “Information bounds and quick detection of parameter changes in stochastic systems,” IEEE Transactions on Information Theory, vol. 44, no. 7, pp. 2917–2929, Nov. 1998.
  • [12] E. S. Page, “Continuous inspection schemes,” Biometrika, vol. 41, no. 1/2, pp. 100–115, Jun. 1954.
  • [13] G. V. Moustakides, “Optimal stopping times for detecting changes in distributions,” Annals of Statistics, vol. 14, no. 4, pp. 1379–1387, Dec. 1986.
  • [14] The New York Times, “Coronavirus in the U.S.: Latest map and case count.” [Online]. Available: https://www.nytimes.com/interactive/2021/us/covid-cases.html