The problem of quickest change detection (QCD) is of fundamental importance in mathematical statistics (see, for example, [1, 2] for an overview). Given a sequence of observations whose distribution changes at some unknown change-point, the goal is to detect the change in distribution as quickly as possible after it occurs, while not making too many false alarms.
In the classical formulations of the QCD problem, the pre- and post-change distributions are assumed to be known and stationary, and the observations are assumed to be independent and identically distributed (i.i.d.) before and after the change. In many practical situations, while it is reasonable to assume that we can accurately estimate the pre-change distribution, the post-change distribution is rarely completely known. Furthermore, while it is reasonable to assume that the system is in steady state before the change-point and producing stationary observations, the post-change distribution is often non-stationary. For example, in the pandemic monitoring problem, the distribution of the number of people infected daily might have reached a steady state before the start of a new wave. At the onset of the new wave, the post-change distribution is constantly evolving. Indeed, during the early phase of the new wave, the mean of the post-change distribution grows approximately exponentially. We address the pandemic monitoring problem in detail in Section V.
In this paper, our main focus is the QCD problem with independent observations (the extension to dependent observations is discussed in Section IV), where the pre-change distribution is assumed to be known and stationary, while the post-change distribution is allowed to be non-stationary and to have some parametric uncertainty.
There has been prior work on extensions of the classical QCD framework to the case where the pre- and/or post-change distributions are not stationary. A prevalent approach is based on a minimax robust formulation of the QCD problem, where it is assumed that the pre- and post-change distributions come from mutually exclusive uncertainty classes. Under certain conditions, e.g., joint stochastic boundedness and weak stochastic boundedness, low-complexity tests that either coincide with or asymptotically approach the optimal test can be found. The essence of the minimax robust approach to dealing with non-stationarity is to identify stationary pre- and post-change distributions that are least favorable for detection, and to design classical (i.i.d.) QCD tests for these least favorable distributions. A drawback of this approach is that the robust tests can perform suboptimally for the distributions actually encountered in practice.
There have also been extensions of the classical formulation to the case where the pre- and/or post-change distributions are not fully known. In the generalized likelihood ratio (GLR) approach, introduced in [8], the pre- and post-change observations are assumed to be i.i.d. with distributions from a one-parameter exponential family, and the post-change distribution has an unknown parameter. An alternative to the GLR test, the mixture-based test, was developed by Pollak [9] for the same setting as in [8]. The GLR approach is studied in detail in [10] for the problem of detecting a change in the mean of a Gaussian distribution with an unknown post-change mean. Both the mixture and the GLR approaches are studied in detail in [11] for the case where the pre- and post-change distributions are non-i.i.d. and the post-change distribution has parametric uncertainty; there, it is assumed that the cumulative Kullback-Leibler (KL) divergence between the post-change and the pre-change distributions grows linearly in the number of observations. In this regime, a universal lower bound on the worst-case delay is first developed, and a window-limited CuSum test that asymptotically achieves the lower bound is proposed. For the case where the post-change distribution has parametric uncertainty, a window-limited GLR test is proposed and analyzed.
In some applications (e.g., the pandemic monitoring problem), the post-change distributions are non-stationary in such a way that the cumulative KL divergence grows super-linearly after the change-point; in this case, we say that the post-change distribution is detection-favorable. This is the setting we consider in this paper. Our contributions are as follows:
We extend the universal lower bound on the worst-case delay given in [11] to the more general detection-favorable setting.
We develop a window-limited CuSum test that asymptotically achieves the lower bound on the delay when the post-change distribution is detection-favorable and fully known.
We develop and analyze a GLR test that asymptotically achieves the lower bound on the worst-case delay when the post-change distributions are detection-favorable and have parametric uncertainty.
We validate our analysis through numerical results, and demonstrate the use of our approach in monitoring pandemics.
The rest of the paper is structured as follows. In Section II, we derive information bounds and propose an asymptotically optimal CuSum test when the post-change distribution is detection-favorable and completely known. In Section III, we propose an asymptotically optimal GLR test when the post-change distribution is still detection-favorable but has unknown parameters. In Section IV, we discuss extensions to dependent observations and to post-change behavior that depends on the change-point. In Section V, we present numerical results, including results on monitoring the COVID-19 pandemic. We conclude the paper in Section VI.
II Information Bounds and Optimal Detection
Let $X_1, X_2, \dots$ be a sequence of independent random variables, and let $\nu$ be a change-point. Assume that $X_1, \dots, X_{\nu-1}$ all have density $p_0$ with respect to some measure $\mu$. Furthermore, assume that $X_\nu, X_{\nu+1}, \dots$ have densities $p_1, p_2, \dots$, respectively, with respect to $\mu$; i.e., we implicitly assume that the post-change densities do not depend on the change-point $\nu$ (the post-change distribution is time-invariant with respect to $\nu$). Note that the distributions of the observations are allowed to be non-stationary after the change-point. Let $\mathbb{P}_\nu$ denote the probability measure on the entire sequence of observations when the change-point is $\nu$, and let $\mathbb{E}_\nu$ denote the corresponding expectation.
The change-point $\nu$ is assumed to be unknown but deterministic. The problem is to detect the change quickly while not causing too many false alarms. Let $\tau$ be a stopping time defined on the observation sequence associated with the detection rule, i.e., $\tau$ is the time at which we stop taking observations and declare that the change has occurred.
II-A Classical Results under i.i.d. Model
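A special case of the model described above is where both the pre- and post-change observations are i.i.d., i.e., $p_k = p_1$ for all $k \ge 1$. In this case, Lorden [8] proposed solving the following optimization problem to find the best stopping time $\tau$:
$$\min_{\tau \in \mathcal{C}_\alpha} \mathrm{WADD}(\tau), \tag{1}$$
where
$$\mathrm{WADD}(\tau) := \sup_{\nu \ge 1} \operatorname{ess\,sup} \mathbb{E}_\nu\left[(\tau - \nu + 1)^+ \,\middle|\, \mathcal{F}_{\nu-1}\right] \tag{2}$$
characterizes the worst-case delay, and $\mathcal{F}_i$ denotes the sigma-algebra generated by $X_1, \dots, X_i$, i.e., $\mathcal{F}_i = \sigma(X_1, \dots, X_i)$. The constraint set is
$$\mathcal{C}_\alpha := \{\tau : \mathrm{FAR}(\tau) \le \alpha\}, \tag{3}$$
which guarantees that the false alarm rate of the algorithm does not exceed $\alpha$. Here, $\mathrm{FAR}(\tau) := 1/\mathbb{E}_\infty[\tau]$, $\mathbb{E}_\infty$ is the expectation operator when the change never happens, and $(\cdot)^+ := \max\{\cdot, 0\}$.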
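Lorden also showed that Page’s Cumulative Sum (CuSum) algorithm [12], whose test statistic is given by
$$W(n) = \max\{W(n-1), 0\} + Z_n, \quad W(0) = 0, \tag{4}$$
solves the problem in (1) asymptotically as $\alpha \to 0$. Here, $Z_n$ is the log-likelihood ratio defined as
$$Z_n := \log\frac{p_1(X_n)}{p_0(X_n)}. \tag{5}$$
The CuSum stopping rule is given by
$$\tau_{\text{Page}} := \inf\{n \ge 1 : W(n) \ge b\}, \tag{6}$$
where $b > 0$ is a threshold.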
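Page’s recursion and stopping rule translate into a few lines of code. The sketch below is a minimal illustration, not the authors’ implementation: it assumes, hypothetically, $p_0 = \mathcal{N}(0,1)$ and $p_1 = \mathcal{N}(\mu_1,1)$, so that $Z_n = \mu_1 X_n - \mu_1^2/2$, and the helper names `gaussian_llr`, `cusum_statistics`, and `page_stopping_time` are hypothetical.

```python
from typing import Callable, List, Optional, Sequence


def gaussian_llr(mu1: float) -> Callable[[float], float]:
    """Hypothetical example: Z = log p_1(x)/p_0(x) for p_1 = N(mu1, 1), p_0 = N(0, 1)."""
    return lambda x: mu1 * x - 0.5 * mu1 * mu1


def cusum_statistics(zs: Sequence[float]) -> List[float]:
    """Page's statistic W(1), ..., W(N) via W(n) = max{W(n-1), 0} + Z_n, W(0) = 0."""
    out: List[float] = []
    w = 0.0
    for z in zs:
        w = max(w, 0.0) + z
        out.append(w)
    return out


def page_stopping_time(zs: Sequence[float], b: float) -> Optional[int]:
    """First n (1-based) with W(n) >= b, or None if the threshold is never reached."""
    for n, w in enumerate(cusum_statistics(zs), start=1):
        if w >= b:
            return n
    return None
```

Because $W(n) = \max_{1 \le k \le n} \sum_{i=k}^{n} Z_i$, the recursion needs constant time and memory per observation. This is exactly the property that is lost once the log-likelihood ratios depend on the hypothesized change-point.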
II-B Information Bounds for Non-stationary Post-Change Distributions
In the case where both the pre- and post-change observations are independent and the post-change distributions are non-stationary, let the log-likelihood ratio be
$$Z_{i,k} := \log\frac{p_{i-k+1}(X_i)}{p_0(X_i)}, \tag{7}$$
where $i \ge k$. Here, $k$ is a hypothesized change-point, and $X_i$ is drawn from the true distribution $\mathbb{P}_\nu$.
In the classical i.i.d. model described in Section II-A, the cumulative KL divergence after the change-point increases linearly in the number of observations. We generalize this condition as follows. Let the growth function $g$ represent the cumulative Kullback-Leibler (KL) divergence under the true distribution. More specifically, let $g$ be increasing and continuous. Note that the inverse of $g$, denoted by $g^{-1}$, exists and is also increasing and continuous. It is assumed that the expected sum of the log-likelihood ratios under $\mathbb{P}_1$ matches the value of the growth function at all positive integers, i.e.,
$$\sum_{i=1}^{n} \mathbb{E}_1\left[Z_{i,1}\right] = g(n), \quad n = 1, 2, \dots$$
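Note that the KL divergence is always non-negative, i.e., for independent observations, $\mathbb{E}_1[Z_{i,1}] = D(p_i \,\|\, p_0) \ge 0$ for all $i \ge 1$, where $D(\cdot \,\|\, \cdot)$ denotes the KL divergence.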
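For independent observations, each term $\mathbb{E}_1[Z_{i,1}]$ in the growth condition is the KL divergence $D(p_i \,\|\, p_0)$, so $g(n)$ can be tabulated directly whenever these divergences have a closed form. The sketch below assumes, purely for illustration, Gaussian unit-variance densities $p_0 = \mathcal{N}(0,1)$ and $p_i = \mathcal{N}(m_i,1)$, for which $D(p_i \,\|\, p_0) = m_i^2/2$; the mean sequences and the helper names are hypothetical and are not necessarily the model of Example II.1.

```python
import math
from typing import Callable, List


def gaussian_kl(mu: float) -> float:
    """KL divergence D(N(mu, 1) || N(0, 1)) = mu**2 / 2."""
    return 0.5 * mu * mu


def growth_function(mean: Callable[[int], float], n_max: int) -> List[float]:
    """Return [g(1), ..., g(n_max)] with g(n) = sum_{i=1}^n D(p_i || p_0).

    Hypothetical model: p_0 = N(0, 1) and p_i = N(mean(i), 1), where mean(i) is the
    post-change mean of the i-th observation after the change.
    """
    g: List[float] = []
    total = 0.0
    for i in range(1, n_max + 1):
        total += gaussian_kl(mean(i))
        g.append(total)
    return g


def exponential_mean(theta: float) -> Callable[[int], float]:
    """Hypothetical post-change mean exp(theta * (i - 1)), growing exponentially in i."""
    return lambda i: math.exp(theta * (i - 1))
```

A constant post-change mean reproduces the classical linear growth, whereas `exponential_mean` yields a super-linear (detection-favorable) growth function.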
In this paper, we are interested in the case where the post-change distribution is eventually persistently different from the pre-change distribution. Specifically, it is assumed that $g$ is asymptotically lower bounded by a linear function, i.e., $g(n) \ge c\,n$ for some constant $c > 0$ as $n \to \infty$.
The proof is given in the appendix.
One can generalize condition (10) in a way that either or
holds for all positive integers .
Consider the Gaussian exponential mean-change detection problem (with unit variance) as follows. Denote by the Gaussian distribution with mean and variance . Let be distributed as , and for all , let be distributed as . Here is some positive fixed constant. The log-likelihood ratio is given by:
Now, the growth function can be calculated as
Note that as for all . Also, the sum of variances of the log-likelihood ratios is
for all , which establishes condition (9). Further, for any and ,
which establishes condition (10). ∎
The following theorem gives a lower bound on the worst-case delay as $\alpha \to 0$.
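Suppose that (11) holds with an increasing and continuous $g$, where $g(n) \ge c\,n$ as $n \to \infty$ for some $c > 0$. Then, as $\alpha \to 0$,
$$\inf_{\tau \in \mathcal{C}_\alpha} \mathrm{WADD}(\tau) \ge g^{-1}\left(|\log \alpha|\right)\left(1 + o(1)\right),$$
where $o(1) \to 0$ as $\alpha \to 0$.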
The proof follows the structure of [11, Thm 1]. Let . Fix such that . First, since ,
Therefore, it remains to show that
From the proof of [11, Thm 1], since , for any , there exists some such that and
Fix . Now consider the events:
such that . Choose . Thus, for large ,
where is explained as follows. By assumption, there exists some large such that for all , . Set . For any , . Specifically, this is true for above for all ’s large enough.
Therefore, for large , the chosen satisfies that and that .
Since , we get
For large , , and therefore,
where the last inequality follows from (16), and thus
because for any fixed . We next turn to :
Since is arbitrary and is continuous, we can take the limit as . Recalling the definition of in (19), we obtain
Since is arbitrary, (15) is proved, and the proof is complete. ∎
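Theorem II.2 expresses the delay through the inverse growth function evaluated at $|\log\alpha|$; for linear growth $g(n) = I n$ this reduces to the familiar $|\log\alpha|/I$ scale of [11]. The sketch below computes $g^{-1}(|\log\alpha|)$ numerically by bisection for any continuous, increasing, unbounded $g$ with $g(0) \le |\log\alpha|$. The function names and the example growth functions in the checks are hypothetical, chosen only to contrast linear and super-linear growth.

```python
import math
from typing import Callable


def inverse_growth(g: Callable[[float], float], y: float,
                   hi: float = 1.0, rel_tol: float = 1e-10) -> float:
    """Solve g(t) = y for t >= 0 by bisection.

    Assumes g is continuous and increasing on [0, inf) with g(0) <= y and
    g(t) -> inf, so a root exists and the bracket expansion terminates.
    """
    while g(hi) < y:  # expand the bracket [0, hi] until it contains the root
        hi *= 2.0
    lo = 0.0
    while hi - lo > rel_tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if g(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def delay_scale(g: Callable[[float], float], alpha: float) -> float:
    """First-order delay scale g^{-1}(|log alpha|) for a false alarm rate alpha in (0, 1)."""
    return inverse_growth(g, abs(math.log(alpha)))
```

With linear growth, halving $\log\alpha$ doubles the delay scale; with exponential growth the delay scale grows only logarithmically in $|\log\alpha|$, which is what makes such post-change behavior detection-favorable.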
II-C Asymptotically Optimal Detection with Non-stationary Post-Change Distributions
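Recall that under the classical setting, Page’s CuSum test (in (6)) is optimal and has the following structure:
$$\tau_{\text{Page}} = \inf\left\{n \ge 1 : \max_{1 \le k \le n} \sum_{i=k}^{n} Z_i \ge b\right\},$$
where $Z_i$ is the log-likelihood ratio when the post-change distributions are stationary (defined in (5)). When the post-change distributions are potentially non-stationary, let the modified CuSum stopping rule be:
$$\tau_C := \inf\left\{n \ge 1 : \max_{1 \le k \le n} \sum_{i=k}^{n} Z_{i,k} \ge b\right\}, \tag{24}$$
where $Z_{i,k}$ represents the log-likelihood ratio between densities $p_{i-k+1}$ and $p_0$ for observation $X_i$ (defined in (7)). Here, $i$ is the time index and $k$ is the hypothesized change-point. Note that if the post-change distributions are indeed stationary, i.e., $p_1 = p_2 = \cdots$, we would get $Z_{i,k} = Z_i$ for all $k \le i$, and thus $\tau_C = \tau_{\text{Page}}$.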
As shown in (4), the test statistic of Page’s classical CuSum algorithm can be computed recursively. Unfortunately, even though the observations are independent, the test statistic in (24) cannot be computed recursively when the log-likelihood ratios depend on the hypothesized change-point $k$. For computational tractability, we therefore consider a window-limited version of the test in (24):
$$\tau_W := \inf\left\{n \ge 1 : \max_{\max\{1,\, n-m_\alpha+1\} \le k \le n} \ \sum_{i=k}^{n} Z_{i,k} \ge b\right\}, \tag{25}$$
where $m_\alpha$ is the window size. We require that $m_\alpha$ satisfy the following conditions:
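Since the range of the maximum is smaller in the window-limited test (25) than in the test (24), given any realization of $X_1, X_2, \dots$, if the test statistic of the window-limited test crosses the threshold $b$ at some time $n$, so does that of the test in (24). Therefore, for any fixed threshold $b$, the stopping time of the test in (24) is never later than that of the window-limited test.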
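Although the statistics of (24) and (25) cannot be updated by Page’s recursion, they can be maintained with one running sum $S_k(n) = S_k(n-1) + Z_{n,k}$ per hypothesized change-point $k$. The minimal sketch below assumes a hypothetical callable `llr(j, x)` returning $\log p_j(x)/p_0(x)$ (so that $Z_{i,k}$ = `llr(i - k + 1, x_i)`); the function name `modified_cusum` is also hypothetical. With a window of size $m$, only $m$ running sums are kept, so the per-step cost and memory are $O(m)$ instead of growing with $n$.

```python
from collections import deque
from typing import Callable, Optional, Sequence

# Hypothetical interface: llr(j, x) returns log p_j(x) / p_0(x), i.e. the LLR of an
# observation x taken j - 1 steps after a hypothesized change-point (j = i - k + 1).
LLR = Callable[[int, float], float]


def modified_cusum(xs: Sequence[float], llr: LLR, b: float,
                   window: Optional[int] = None) -> Optional[int]:
    """Stopping time of the (optionally window-limited) modified CuSum test.

    Returns the first n (1-based) with max_k sum_{i=k}^n Z_{i,k} >= b, where k ranges
    over 1..n (window=None, as in (24)) or over max(1, n-window+1)..n (as in (25)).
    Returns None if the threshold is never crossed within xs.
    """
    active = deque()  # items are [k, S_k(n)], one running sum per hypothesized change-point
    for n, x in enumerate(xs, start=1):
        active.append([n, 0.0])  # open hypothesized change-point k = n
        if window is not None and len(active) > window:
            active.popleft()  # discard k = n - window, which left the window
        for entry in active:
            k = entry[0]
            entry[1] += llr(n - k + 1, x)  # S_k(n) = S_k(n-1) + Z_{n,k}
        if max(s for _, s in active) >= b:
            return n
    return None
```

For instance, with the hypothetical model $p_0 = \mathcal{N}(0,1)$ and $p_j = \mathcal{N}(0.5 j, 1)$, a too-small window can prevent the statistic from ever reaching the threshold even when the unwindowed test stops, in line with the ordering of the two stopping times noted above.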
In the following, we first control the asymptotic false alarm rate of the window-limited test in (25) with an appropriately chosen threshold in Lemma II.3. Then, we upper bound its asymptotic delay in Lemma II.4. Finally, we combine these two lemmas and provide an asymptotically optimal solution to the problem in (1) in Theorem II.5.
Suppose that and that satisfies (26). Then,
where $o(1) \to 0$ as $\alpha \to 0$.
If , where , then as .
Clearly, the asymptotic inequality still holds when no window is applied, i.e., for the test in (24).
The proofs of the above two lemmas are given in the appendix. Using these lemmas, we obtain the following asymptotic result.
III Window-limited GLR with Unknown Parameters
We now study the case where the evolution of the post-change distribution is parametrized by $\theta \in \Theta$, where $\Theta$ is the parameter set. Let $X_\nu, X_{\nu+1}, \dots$ be distributed as $P_{1,\theta}, P_{2,\theta}, \dots$, where the corresponding densities $p_{1,\theta}, p_{2,\theta}, \dots$ are with respect to the common measure $\mu$. Note that $\Theta$ does not need to be compact. The true post-change parameter $\theta^*$ is assumed to be unknown but deterministic. Let the log-likelihood ratio be re-defined as
$$Z_{i,k}(\theta) := \log\frac{p_{i-k+1,\theta}(X_i)}{p_0(X_i)}$$
for any $i \ge k$ and $\theta \in \Theta$. Here, $X_i$ is drawn from the distribution with true change-point $\nu$ and true post-change parameter $\theta^*$. The problem is to solve (1) asymptotically as $\alpha \to 0$ under parameter uncertainty.
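Consider the following window-limited GLR stopping rule:
$$\tau_G := \inf\left\{n \ge 1 : \max_{\max\{1,\, n-m_\alpha+1\} \le k \le n} \ \sup_{\theta \in \Theta_\alpha} \sum_{i=k}^{n} Z_{i,k}(\theta) \ge b\right\}, \tag{31}$$
where $\Theta_\alpha \uparrow \Theta$ as $\alpha \to 0$. Therefore, it is guaranteed that the true post-change parameter lies in $\Theta_\alpha$ for all small enough $\alpha$. Further, let $\Theta_\alpha$ be compact for each $\alpha$, so that the maximizing $\theta$ for a given pair $(n, k)$ at false alarm rate $\alpha$, denoted by $\hat{\theta}_{n,k}$, is contained in $\Theta_\alpha$. Note that we omit the dependence of $\hat{\theta}_{n,k}$ on $\alpha$ here for simplicity.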
If the parameter set is discrete-valued, the supremum in (31) becomes a maximum, and the stopping time is equivalent to running one CuSum algorithm per parameter value simultaneously and stopping as soon as any one of them stops. Therefore, we only consider the case where the parameter set is continuous.
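The equivalence just described can be checked directly. The sketch below assumes the parameter set is a finite grid (for a continuous parameter set this is only an approximation of the supremum in (31), which the text does not make) and a hypothetical callable `llr(theta, j, x)` returning $\log p_{j,\theta}(x)/p_0(x)$; `glr_stopping_time` is likewise a hypothetical name. Each hypothesized change-point keeps one running sum per grid point, and the statistic is the maximum over both.

```python
import math
from collections import deque
from typing import Callable, Optional, Sequence

# Hypothetical interface: llr(theta, j, x) returns log p_{j,theta}(x) / p_0(x), the LLR of
# an observation x taken j - 1 steps after a hypothesized change, under parameter theta.
LLRTheta = Callable[[float, int, float], float]


def glr_stopping_time(xs: Sequence[float], llr: LLRTheta, thetas: Sequence[float],
                      b: float, window: Optional[int] = None) -> Optional[int]:
    """Window-limited GLR stopping time with the supremum taken over a finite grid.

    Returns the first n (1-based) such that
        max_k max_{theta in thetas} sum_{i=k}^n Z_{i,k}(theta) >= b,
    with k in 1..n (window=None) or in max(1, n-window+1)..n; None if never crossed.
    """
    active = deque()  # items are (k, [S_k(n; theta) for theta in thetas])
    for n, x in enumerate(xs, start=1):
        active.append((n, [0.0] * len(thetas)))  # open hypothesized change-point k = n
        if window is not None and len(active) > window:
            active.popleft()  # discard k = n - window, which left the window
        stat = -math.inf
        for k, sums in active:
            for t, theta in enumerate(thetas):
                sums[t] += llr(theta, n - k + 1, x)  # add Z_{n,k}(theta)
                stat = max(stat, sums[t])
        if stat >= b:
            return n
    return None
```

Calling the same function with a single-element grid gives the per-parameter CuSum test, so the GLR stopping time on the grid equals the earliest of the per-parameter stopping times.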
Finally, it is assumed that the largest absolute eigenvalue of the Hessian matrix of the log-likelihood ratio (with respect to the post-change parameter) exists and is finite in a neighborhood of the true parameter when the false alarm rate is small. Specifically, there exists such that for any and ,
where represents the maximum eigenvalue of a matrix . Intuitively, this condition guarantees that the log-likelihood ratio is sufficiently smooth even when becomes large.
Consider again the Gaussian exponential mean-change detection problem in Example II.1. Here, we consider the case where the exact value of the post-change exponent coefficient is unknown. Note that this coefficient characterizes the entire post-change evolution rather than a single post-change distribution.
In the following, we first upper bound the asymptotic delay of the window-limited GLR test in (31) in Lemma III.1. Next, we control its asymptotic false alarm rate with an appropriately chosen threshold in Lemma III.2. Finally, we combine these two lemmas and provide an asymptotically optimal solution when the post-change parameter is unknown in Theorem III.3.
If , then
for any threshold .
Suppose that . Since for all small ’s, the worst-case delay of satisfies
asymptotically as $\alpha \to 0$.
The proof of this lemma is given in the appendix.
Taking the logarithm of both sides and rearranging terms, (33) becomes:
Since , it follows that as .
IV-A Dependent Observations
In [11], Lai considered the case where the pre- and post-change observations are potentially dependent. Let the log-likelihood ratio be defined as:
$$Z_i := \log\frac{p_1(X_i \mid X_1, \dots, X_{i-1})}{p_0(X_i \mid X_1, \dots, X_{i-1})}.$$
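It was assumed that, for any $\delta > 0$,
$$\lim_{n\to\infty} \sup_{\nu \ge 1} \operatorname{ess\,sup} \mathbb{P}_\nu\left(\max_{t \le n} \sum_{i=\nu}^{\nu+t} Z_i \ge I(1+\delta)n \,\middle|\, \mathcal{F}_{\nu-1}\right) = 0, \tag{37}$$
$$\lim_{n\to\infty} \sup_{k \ge \nu \ge 1} \operatorname{ess\,sup} \mathbb{P}_\nu\left(\frac{1}{n}\sum_{i=k}^{k+n} Z_i \le I - \delta \,\middle|\, \mathcal{F}_{k-1}\right) = 0, \tag{38}$$
where $I > 0$ is a constant.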
Lai [11] also proposed and analyzed a window-limited version of the classical CuSum test, which was shown to be asymptotically optimal as $\alpha \to 0$.
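In the dependent setting, the log-likelihood ratio is formed from the conditional density of each observation given the past. As a hypothetical illustration (a Gaussian AR(1) model chosen here for concreteness, not taken from [11] or from this paper), suppose $X_i = a X_{i-1} + e_i$ with $e_i \sim \mathcal{N}(0,1)$, where $a = a_0$ before the change and $a = a_1$ after it. The conditional densities are then Gaussian with unit variance, the normalizing constants cancel, and each $Z_i$ reduces to a difference of squared innovations. The function name and the initial value `x0` are assumptions of this sketch.

```python
from typing import List, Sequence


def ar1_conditional_llrs(xs: Sequence[float], a0: float, a1: float,
                         x0: float = 0.0) -> List[float]:
    """Z_i = log p_1(x_i | x_{i-1}) / p_0(x_i | x_{i-1}) for Gaussian AR(1) models.

    Hypothetical model: x_i = a * x_{i-1} + e_i with e_i ~ N(0, 1), where a = a0
    before the change and a = a1 after it; x0 is the assumed value preceding xs[0].
    """
    zs: List[float] = []
    prev = x0
    for x in xs:
        r0 = x - a0 * prev  # innovation under the pre-change model
        r1 = x - a1 * prev  # innovation under the post-change model
        zs.append(0.5 * (r0 * r0 - r1 * r1))  # Gaussian normalizing constants cancel
        prev = x
    return zs
```

Because these $Z_i$ do not depend on a hypothesized change-point, they can be fed directly into a CuSum-type recursion of the form (4).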
Conditions (37) and (38) imply that the sum of the log-likelihood ratios in the post-change regime concentrates around $I n$, which is linear in the number of observations $n$. We can generalize these conditions to the detection-favorable setting with potentially dependent observations. Let the log-likelihood ratio be re-defined as:
$$Z_{i,k} := \log\frac{p_{i-k+1}(X_i \mid X_1, \dots, X_{i-1})}{p_0(X_i \mid X_1, \dots, X_{i-1})}.$$
IV-B Change-point Dependent Post-change Behavior
We can also relax the time-invariance assumption in the distribution model. Assume that $X_1, \dots, X_{\nu-1}$ all have density $p_0$ with respect to some measure $\mu$, and that $X_\nu, X_{\nu+1}, \dots$ have densities $p_{1,\nu}, p_{2,\nu}, \dots$, respectively, with respect to $\mu$. Note that the densities may depend on the change-point $\nu$, i.e., $p_{i,\nu}$ may differ from $p_{i,\nu'}$ when $\nu \ne \nu'$. The log-likelihood ratio becomes:
$$Z_{i,k} := \log\frac{p_{i-k+1,k}(X_i)}{p_0(X_i)},$$
where $i$ is the time index and $k$ is the hypothesized change-point. With this definition of the log-likelihood ratio, the results in Sections II and III still hold as long as the corresponding conditions are satisfied.
V Numerical Results and Discussion
In Fig. 1, we study the performance of the proposed tests through simulations for the Gaussian exponential mean-change problem (see Example II.1). It is observed that the delay at is for all sizes of windows considered, as described in (II.2).
Next, we apply our GLR algorithm to monitoring the spread of COVID-19 using new case data from various counties in the US [14]. The goal is to detect the onset of a new wave of the pandemic based on the incremental daily cases. The problem is modeled as one of detecting a change in the mean of a Beta distribution, as described below. Let denote the Beta distribution with shape parameters and . Let
Here, is a parametric function such that . Note that if and is not too large,
for all . Therefore, is designed to capture the shape of the average fraction of daily incremental cases. Let
where are all model parameters. This specific choice of has two advantages: 1) It ensures rapid growth at the start of a new epidemic wave. When is small, grows like the left edge of a Gaussian density if is large. 2) It ensures that daily incremental cases eventually vanish at the end of the current epidemic wave, i.e., as .
In Fig. 2, we validate the choice of the distribution model described above with data from the Fall 2020 COVID-19 wave. In the simulation, and are estimated using observations from earlier periods in which the increments remain low and roughly constant. It is observed that the mean of the daily fraction of incremental cases matches well with the mean of the fitted Beta distribution with in (45).
In Fig. 3, we illustrate the use of our GLR algorithm with the distribution model described above for detecting the onset of a new wave of COVID-19. We assumed a monitoring start date of June 15, 2021, at which time the pandemic appeared to be in a steady state, with incremental cases staying relatively flat. We observe that the GLR statistic significantly and persistently crosses the test threshold around late July in all counties, which is a strong indication of a new wave of the pandemic. More importantly, unlike the raw observations, which are highly variable, the GLR statistic shows a clear dichotomy between the pre- and post-change regimes: it stays near zero before the purported onset of the new wave and rises nearly vertically after the onset.
In this paper, we considered the problem of quickest detection of a change in the distribution of a sequence of independent observations. We assumed that the pre-change distribution is known and stationary, and that the post-change distributions evolve in a pre-determined non-stationary manner with some possible parametric uncertainty. In particular, the cumulative KL divergence between the post-change and the pre-change distributions grows super-linearly with time after the change-point. We extended the universal lower bound on the worst-case delay given in [11] to this more general setting, developed a window-limited CuSum test that asymptotically achieves the lower bound on the delay, and developed and analyzed a window-limited GLR test that asymptotically achieves the lower bound when the post-change distributions have parametric uncertainty. We validated our analysis through numerical results and demonstrated the use of our approach in monitoring pandemics.
- [1] V. V. Veeravalli and T. Banerjee, “Quickest change detection,” in Academic Press Library in Signal Processing: Array and Statistical Signal Processing. Cambridge, MA: Academic Press, 2013.
- [2] L. Xie, S. Zou, Y. Xie, and V. V. Veeravalli, “Sequential (quickest) change detection: Classical results and new directions,” arXiv preprint arXiv:2104.04186, 2021.
- [3] P. J. Huber, “A robust version of the probability ratio test,” The Annals of Mathematical Statistics, vol. 36, no. 6, pp. 1753–1758, Dec. 1965.
- [4] P. Moulin and V. V. Veeravalli, Statistical Inference for Engineers and Data Scientists. Cambridge, UK: Cambridge University Press, 2018.
- [5] T. L. Molloy and J. J. Ford, “Misspecified and asymptotically minimax robust quickest change detection,” IEEE Transactions on Signal Processing, vol. 65, no. 21, pp. 5730–5742, 2017.
- [6] T. L. Molloy and J. J. Ford, “Minimax robust quickest change detection in systems and signals with unknown transients,” IEEE Transactions on Automatic Control, vol. 64, no. 7, pp. 2976–2982, Jul. 2019.
- [7] Y. Liang and V. V. Veeravalli, “Non-parametric quickest detection of a change in the mean of an observation sequence,” in 2021 55th Annual Conference on Information Sciences and Systems (CISS), 2021, pp. 1–6.
- [8] G. Lorden, “Procedures for reacting to a change in distribution,” The Annals of Mathematical Statistics, vol. 42, no. 6, pp. 1897–1908, Dec. 1971.
- [9] M. Pollak, “Optimality and almost optimality of mixture stopping rules,” The Annals of Statistics, vol. 6, no. 4, pp. 910–916, Jul. 1978.
- [10] D. Siegmund and E. S. Venkatraman, “Using the generalized likelihood ratio statistic for sequential detection of a change-point,” The Annals of Statistics, vol. 23, no. 1, pp. 255–271, Feb. 1995.
- [11] T. L. Lai, “Information bounds and quick detection of parameter changes in stochastic systems,” IEEE Transactions on Information Theory, vol. 44, no. 7, pp. 2917–2929, Nov. 1998.
- [12] E. S. Page, “Continuous inspection schemes,” Biometrika, vol. 41, no. 1/2, pp. 100–115, Jun. 1954.
- [13] G. V. Moustakides, “Optimal stopping times for detecting changes in distributions,” The Annals of Statistics, vol. 14, no. 4, pp. 1379–1387, Dec. 1986.
- [14] The New York Times, “Coronavirus in the U.S.: Latest Map and Case Count.” [Online]. Available: https://www.nytimes.com/interactive/2021/us/covid-cases.html