1 Introduction
In the classical problem of quickest change detection [1], [2], [3], a decision maker observes a stochastic process with a given distribution. At some point in time, the distribution of the process changes. The objective is to detect this change in distribution with the minimum possible delay, subject to a constraint on the rate of false alarms. This problem has applications in statistical process control [4], sensor networks [5], cyber-physical system monitoring [6], regime changes in neural data [7], traffic monitoring [8], and, in general, anomaly detection [8], [9].
In many applications of anomaly detection, the observed process has a periodic or regular statistical behavior. We observed such periodic statistical behavior in the multimodal data collected around the Tunnel To Towers 5K run in NYC [8], [9]. In those papers, our objective was to detect the 5K run using multimodal data from CCTV cameras, Twitter, and Instagram posts; the details of the data collected can be found there. In Fig. 1, we have plotted the average counts of the number of persons, the number of Instagram posts, and the number of vehicles captured by and around two CCTV cameras in NYC. One camera was off the path of the 5K run and the other was on the path of the run. The data corresponding to the off-path camera represents normal behavior. The object counts from the CCTV images were extracted using a convolutional neural network-based object detector. The data for persons and Instagram are from the off-path camera (the Instagram data is collected in a square grid around a camera), and the vehicle data is from the on-path camera. As can be seen from the figure, the average counts across different data collection days (here, four Sundays) show a similar pattern (growth and decay). Such periodic or cyclostationary behavior of the data can also be observed in neural spike data
[10], [7]. In a controlled experiment, where an animal is trained to perform a certain task repeatedly, one can expect a similarity in neural firing patterns [7]. The anomaly detection problem in such applications can be posed as the problem of detecting deviations from this regular or periodic statistical behavior. In this paper, we develop theory and algorithms to solve this change detection problem; precise problem formulations are given below. The quickest change detection literature is divided broadly into two parts: results for i.i.d. processes, with algorithms that can be computed recursively and enjoy strong optimality properties [11], and results for non-i.i.d. data, with algorithms that are hard to compute but are asymptotically optimal [12], [13], [14], [15]. We show in this paper that the algorithms for our non-i.i.d. setup can be computed recursively and are asymptotically optimal. A class of models of this type was first studied by us in [9]. In this paper, we study a much broader class of processes and also develop optimality theory.
2 Model to Capture Periodic Statistical Behavior
An independent and identically distributed (i.i.d.) process is a sequence of random variables that are independent and have the same distribution. We define a new category of stochastic processes called independent and periodically identically distributed (i.p.i.d.) processes:
Definition 1.
Let $\{X_n\}_{n \ge 1}$ be a sequence of random variables such that the variable $X_n$ has density $f_n$. The stochastic process $\{X_n\}$ is called independent and periodically identically distributed (i.p.i.d.) if the random variables $\{X_n\}$ are independent and there is a positive integer $T$ such that the sequence of densities $\{f_n\}$ is periodic with period $T$:
$$f_{n+T} = f_n, \quad \forall n \ge 1.$$
We say that the process is i.p.i.d. with the law $(f_1, \dots, f_T)$.
Note that the law of an i.p.i.d. process is completely characterized by the finite-dimensional product distribution involving the densities $(f_1, \dots, f_T)$. We assume that, in a normal regime, the data can be modeled as an i.p.i.d. process. At some point in time, due to an anomaly, the distribution of the i.p.i.d. process deviates from $(f_1, \dots, f_T)$. Our objective in this paper is to develop algorithms that can observe the process in real time and detect changes in the distribution as quickly as possible, subject to a constraint on the rate of false alarms. In Section 3, we define the change point model and develop algorithms and optimality theory for detecting changes in an i.p.i.d. process. In Section 4, we extend the results to the case when the post-change distribution is unknown. In Section 5, we comment on parametric i.p.i.d. models that are easier to learn than the full set of densities $(f_1, \dots, f_T)$.
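As a concrete illustration, an i.p.i.d. process can be simulated by cycling through $T$ densities. The following sketch (our own illustration, not from the paper; the Gaussian phase means and the function name are illustrative assumptions) generates a Gaussian i.p.i.d. process with period $T = 4$:

```python
import numpy as np

def simulate_ipid(means, sigma, n, seed=None):
    """Simulate a Gaussian i.p.i.d. process: X_k ~ N(means[k mod T], sigma^2).

    The density of X_k depends on k only through k mod T, so the
    sequence of densities is periodic with period T = len(means)."""
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    idx = np.arange(n) % len(means)   # periodic index into the T densities
    return rng.normal(loc=means[idx], scale=sigma)

# Period T = 4 with a growth-and-decay mean pattern, loosely mimicking Fig. 1
x = simulate_ipid(means=[0.0, 2.0, 4.0, 1.0], sigma=1.0, n=400, seed=0)
```

Each phase of the period then has its own sampling distribution, mirroring the repeating per-time-of-day behavior seen in the count data.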
3 A Change Detection Theory for General i.p.i.d. Processes
Consider another periodic sequence of densities $\{g_n\}$, with the same period $T$, such that
$$g_{n+T} = g_n, \quad \forall n \ge 1.$$
Thus, we essentially have $T$ distinct densities $(g_1, \dots, g_T)$. We assume that at some point in time $\nu$, called the change point in the following, the law of the i.p.i.d. process is governed not by the densities $(f_1, \dots, f_T)$, but by the new set of densities $(g_1, \dots, g_T)$ (a precise definition is given below). These densities need not all be different from the set $(f_1, \dots, f_T)$, but we assume that there exists at least one $i$ such that they are different:
$$g_i \neq f_i, \quad \text{for some } i \in \{1, \dots, T\}. \qquad (1)$$
The change point model is as follows. At a time point $\nu$, the distribution of the random variable $X_n$ changes from $f_n$ to $g_n$:
$$X_n \sim \begin{cases} f_n, & n < \nu, \\ g_n, & n \ge \nu. \end{cases} \qquad (2)$$
We emphasize that the densities $\{f_n\}$ and $\{g_n\}$ are periodic. This model is equivalent to saying that we have two i.p.i.d. processes, one governed by the densities $(f_1, \dots, f_T)$ and another governed by the densities $(g_1, \dots, g_T)$, and at the change point $\nu$, the process switches from one i.p.i.d. process to the other. A more general change point model, where the exact post-change density is unknown, will be discussed in Section 4.
We want to detect the change described in (2) as quickly as possible, subject to a constraint on the rate of false alarms. We are looking for a stopping time $\tau$ for the process $\{X_n\}$ that minimizes a metric on the delay $\tau - \nu$ while avoiding the false alarm event $\{\tau < \nu\}$. Specifically, we are interested in the popular false alarm and delay metrics of Pollak [16] and Lorden [17]. Let $P_\nu$ denote the probability law of the process $\{X_n\}$ when the change occurs at time $\nu$, and let $E_\nu$ denote the corresponding expectation. When there is no change, we use the notation $E_\infty$. The quickest change detection problem formulation of Pollak [16] is
$$\min_{\tau} \; \sup_{\nu \ge 1} \; E_\nu[\tau - \nu \mid \tau \ge \nu] \quad \text{subject to} \quad E_\infty[\tau] \ge \beta, \qquad (3)$$
where $\beta$ is a given constraint on the mean time to false alarm. Thus, the objective is to find a stopping time that minimizes the worst-case conditional average detection delay subject to a constraint on the mean time to false alarm. A popular alternative is the worst-worst-case delay metric of Lorden [17]:
$$\min_{\tau} \; \sup_{\nu \ge 1} \; \operatorname{ess\,sup} \; E_\nu\big[(\tau - \nu)^+ \,\big|\, X_1, \dots, X_{\nu-1}\big] \quad \text{subject to} \quad E_\infty[\tau] \ge \beta, \qquad (4)$$
where $\operatorname{ess\,sup}$ denotes the essential supremum, i.e., the supremum of the random variable outside a set of measure zero. Further motivation for, and comparison of, these and other problem formulations for change point detection can be found in the literature [3], [2], [1], [12].
We now propose a CUSUM-type scheme to detect the above change (see also [12]). We compute the sequence of statistics
$$W_n = \max_{1 \le k \le n} \; \sum_{i=k}^{n} \log \frac{g_i(X_i)}{f_i(X_i)}, \qquad (5)$$
and raise an alarm as soon as the statistic is above a threshold $A$:
$$\tau_c = \inf\{n \ge 1 : W_n > A\}. \qquad (6)$$
We show below that this scheme is asymptotically optimal in a well-defined sense. Before that, we prove an important property: the statistic $W_n$ can be computed recursively and using finite memory. The proofs of this and all other results are provided in Section 7.
Lemma 1.
The statistic sequence $\{W_n\}$ can be computed recursively as
$$W_n = \max\{W_{n-1}, 0\} + \log \frac{g_n(X_n)}{f_n(X_n)}, \qquad (7)$$
where $W_0 = 0$. Further, since the sets of pre- and post-change densities $(f_1, \dots, f_T)$ and $(g_1, \dots, g_T)$ are finite, the recursion (7) can be computed using the finite memory needed to store these densities.
In the rest of the paper, we refer to (7) as the PeriodicCUSUM algorithm.
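The recursion (7) is straightforward to implement. Below is a minimal sketch for the Gaussian case, where the pre- and post-change laws differ only in their phase means; the phase means, variance, change point, and threshold are illustrative assumptions, not values from the paper. For $N(\mu_1, \sigma^2)$ against $N(\mu_0, \sigma^2)$, the log likelihood ratio at $x$ is $(\mu_1 - \mu_0)(x - (\mu_0 + \mu_1)/2)/\sigma^2$.

```python
import numpy as np

def periodic_cusum(x, f_means, g_means, sigma, threshold):
    """PeriodicCUSUM sketch for Gaussian phases:
    pre-change  X_k ~ N(f_means[k mod T], sigma^2),
    post-change X_k ~ N(g_means[k mod T], sigma^2).
    Runs W_n = max(W_{n-1}, 0) + log g_n(X_n)/f_n(X_n) and returns the
    first n (1-indexed) with W_n > threshold, or None if no alarm."""
    T = len(f_means)
    W = 0.0
    for n, xn in enumerate(x):
        mu0, mu1 = f_means[n % T], g_means[n % T]   # periodic densities
        llr = (mu1 - mu0) * (xn - 0.5 * (mu0 + mu1)) / sigma**2
        W = max(W, 0.0) + llr
        if W > threshold:
            return n + 1
    return None

# Change at time nu = 200: every phase mean shifts up by 2
rng = np.random.default_rng(1)
T, nu, n_total = 4, 200, 400
f_means = np.array([0.0, 2.0, 4.0, 1.0])
g_means = f_means + 2.0
idx = np.arange(n_total) % T
means = np.where(np.arange(n_total) < nu, f_means[idx], g_means[idx])
x = rng.normal(means, 1.0)
alarm = periodic_cusum(x, f_means, g_means, 1.0, threshold=15.0)
```

In this setup, the statistic drifts negative before the change and climbs at roughly the information rate after it, so the alarm fires shortly after time 200.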
Towards proving the optimality of the PeriodicCUSUM scheme, we first obtain a universal lower bound on the performance of any stopping time for detecting changes in i.p.i.d. processes. Define the information number
$$I = \frac{1}{T} \sum_{i=1}^{T} D(g_i \,\|\, f_i), \qquad (8)$$
where $D(g_i \,\|\, f_i)$ is the Kullback–Leibler divergence between the densities $g_i$ and $f_i$. We assume that $D(g_i \,\|\, f_i) < \infty$ for each $i$ and that $I > 0$.
Theorem 3.1.
Let the information number $I$ defined in (8) satisfy $0 < I < \infty$. Then, for any stopping time $\tau$ satisfying the false alarm constraint $E_\infty[\tau] \ge \beta$, we have, as $\beta \to \infty$,
$$\sup_{\nu \ge 1} \; E_\nu[\tau - \nu \mid \tau \ge \nu] \;\ge\; \frac{\log \beta}{I} \, (1 + o(1)), \qquad (9)$$
where an $o(1)$ term is one that goes to zero in the limit as $\beta \to \infty$.
We now show that the PeriodicCUSUM scheme (5)–(7) is asymptotically optimal for both formulations (3) and (4).
Theorem 3.2.
Let $0 < I < \infty$. Then, setting $A = \log \beta$ in (6) ensures that $E_\infty[\tau_c] \ge \beta$, and, as $\beta \to \infty$,
$$\sup_{\nu \ge 1} \; \operatorname{ess\,sup} \; E_\nu\big[(\tau_c - \nu)^+ \,\big|\, X_1, \dots, X_{\nu-1}\big] \;\le\; \frac{\log \beta}{I} \, (1 + o(1)). \qquad (10)$$
We note that the algorithm is also optimal for various other formulations studied in the literature [12]. We do not report these here due to a paucity of space.
4 Change Detection With Unknown PostChange i.p.i.d. Process
In the previous section, we assumed that the post-change law $(g_1, \dots, g_T)$ is known to the decision maker. This information was used to design the PeriodicCUSUM algorithm (7). In practice, this information may not be available. We now show that if the post-change law belongs to a finite set of $M$ possible i.p.i.d. laws $\{(g_1^{(j)}, \dots, g_T^{(j)})\}_{j=1}^{M}$, then an asymptotically optimal test can be designed.
For $j = 1, \dots, M$, define the statistic
$$W_n^{(j)} = \max\{W_{n-1}^{(j)}, 0\} + \log \frac{g_n^{(j)}(X_n)}{f_n(X_n)}, \quad W_0^{(j)} = 0, \qquad (11)$$
and the stopping rule
$$\tau_c^{(j)} = \inf\{n \ge 1 : W_n^{(j)} > A\}, \qquad (12)$$
which is the PeriodicCUSUM stopping rule for the $j$th post-change law $(g_1^{(j)}, \dots, g_T^{(j)})$. Now, define
$$\tau_M = \min_{1 \le j \le M} \; \tau_c^{(j)}. \qquad (13)$$
Then, note that
$$\tau_M = \inf\Big\{n \ge 1 : \max_{1 \le j \le M} W_n^{(j)} > A\Big\}. \qquad (14)$$
The stopping rule $\tau_M$ is thus the rule under which we stop the first time any of the $M$ PeriodicCUSUMs raises an alarm.
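The rule (14) can be sketched directly for the Gaussian case: one PeriodicCUSUM statistic per candidate post-change law, stopping when the largest crosses the threshold. The two candidate laws, the change point, and the threshold below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def multi_periodic_cusum(x, f_means, g_means_list, sigma, threshold):
    """Run M PeriodicCUSUM statistics W^(j) in parallel, one per candidate
    post-change law, and stop the first time max_j W_n^(j) > threshold.
    Returns (alarm_time, index of the largest statistic) or None."""
    T, M = len(f_means), len(g_means_list)
    W = np.zeros(M)
    for n, xn in enumerate(x):
        i = n % T
        for j in range(M):
            mu0, mu1 = f_means[i], g_means_list[j][i]
            W[j] = max(W[j], 0.0) + (mu1 - mu0) * (xn - 0.5 * (mu0 + mu1)) / sigma**2
        if W.max() > threshold:
            return n + 1, int(W.argmax())
    return None

rng = np.random.default_rng(2)
f_means = np.array([0.0, 2.0, 4.0, 1.0])
candidates = [f_means + 2.0, f_means - 2.0]    # M = 2 candidate post-change laws
idx = np.arange(400) % 4
means = np.where(np.arange(400) < 200, f_means[idx], (f_means - 2.0)[idx])
x = rng.normal(means, 1.0)   # true post-change law is the second candidate
result = multi_periodic_cusum(x, f_means, candidates, 1.0, threshold=15.0)
```

Only the statistic matched to the true post-change law acquires a positive drift after the change, so the alarm typically also identifies which candidate occurred.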
We now show that this stopping rule is asymptotically optimal for both Lorden's and Pollak's criteria. Towards this end, we define, for each $j$, a Shiryaev–Roberts-type statistic
$$R_n^{(j)} = \sum_{k=1}^{n} \; \prod_{i=k}^{n} \frac{g_i^{(j)}(X_i)}{f_i(X_i)}, \qquad (15)$$
and a Shiryaev–Roberts-type stopping rule
$$\tau_s = \inf\Big\{n \ge 1 : \max_{1 \le j \le M} \log R_n^{(j)} > A\Big\}. \qquad (16)$$
Note that $W_n^{(j)} \le \log R_n^{(j)}$ for all $n$ and $j$ (a maximum of products is bounded by their sum), and therefore
$$\tau_s \le \tau_M. \qquad (17)$$
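Assuming the standard Shiryaev–Roberts form $R_n = \sum_{k=1}^{n} \prod_{i=k}^{n} g_i(X_i)/f_i(X_i)$, the statistic also admits a one-step recursion, $R_n = (1 + R_{n-1}) \, g_n(X_n)/f_n(X_n)$ with $R_0 = 0$, obtained by factoring the latest likelihood ratio out of the sum of products. A sketch cross-checking the two forms (the likelihood-ratio values are arbitrary illustrative numbers):

```python
import numpy as np

def sr_recursive(lr):
    """Shiryaev-Roberts recursion R_n = (1 + R_{n-1}) * L_n, with R_0 = 0,
    where L_n = g_n(X_n) / f_n(X_n) is the n-th likelihood ratio."""
    R, out = 0.0, []
    for L in lr:
        R = (1.0 + R) * L
        out.append(R)
    return np.array(out)

def sr_direct(lr):
    """Direct sum-of-products form: R_n = sum_{k=1}^n prod_{i=k}^n L_i."""
    lr = np.asarray(lr, dtype=float)
    return np.array([sum(np.prod(lr[k:m + 1]) for k in range(m + 1))
                     for m in range(len(lr))])

lrs = np.array([0.5, 2.0, 1.5, 0.8, 1.2])   # illustrative likelihood-ratio values
```

Like the PeriodicCUSUM recursion, this means the statistic needs only the previous value, the current observation, and the $T$ stored density pairs.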
We have the following theorem.
Theorem 4.1.
For each $j$, the process $\{R_n^{(j)} - n\}_{n \ge 1}$ is a martingale under $P_\infty$. If $A = \log(M\beta)$, then
$$E_\infty[\tau_M] \;\ge\; E_\infty[\tau_s] \;\ge\; \beta.$$
Further, if $(g_1^{(j)}, \dots, g_T^{(j)})$ is the true post-change i.p.i.d. law and
$$\limsup_{A \to \infty} \; \sup_{\nu \ge 1} \; E_\nu\big[W_{\tau_c^{(j)}}^{(j)} - A \,\big|\, \tau_c^{(j)} \ge \nu\big] < \infty, \qquad (18)$$
then
$$\sup_{\nu \ge 1} \; \operatorname{ess\,sup} \; E_\nu\big[(\tau_M - \nu)^+ \,\big|\, X_1, \dots, X_{\nu-1}\big] \;\le\; \frac{\log \beta}{I_j} \, (1 + o(1)) \quad \text{as } \beta \to \infty, \qquad (19)$$
where $I_j$ denotes the information number (8) computed with the $j$th post-change law.
Given the lower bound in Theorem 3.1, the stopping rule $\tau_M$ is thus asymptotically optimal with respect to the criteria of Lorden and Pollak, uniformly over each possible post-change hypothesis $j$, $j = 1, \dots, M$.
The condition (18) is equivalent to saying that the mean overshoot of the statistic over the threshold is finite. This assumption is satisfied, for example, if the likelihood ratios are bounded, and it may also be satisfied if the log likelihood ratios have finite variance; the latter will be verified in a future version of this paper. Note that our statistics are not random walks, but periodic versions of them. As a result, we cannot directly borrow such finiteness results from [18], for example.
5 Detection in Parametric i.p.i.d. Models
In practice, learning the pre- and post-change laws $(f_1, \dots, f_T)$ and $(g_1, \dots, g_T)$ can be hard. Thus, it is of interest to study low-dimensional parametric i.p.i.d. models. Such parametric models were the object of our study in [9], where we assumed a periodic sequence of parameters $\{\theta_n\}$ with period $T$, so that $X_n \sim f(\cdot \,; \theta_n)$. Another option is to assume that we have a smooth periodic function $\theta(\cdot)$ with period $T$ and
$$X_n \sim f(\cdot \,; \theta(n)). \qquad (20)$$
The batch parameter model studied in [9] is then equivalent to a step approximation of the function $\theta(\cdot)$ in (20). The change detection problem for this process is equivalent to detecting a change in the parametric function $\theta(\cdot)$ to some other periodic function. The PeriodicCUSUM algorithm can be easily applied to such models, and all the optimality results proved here are valid for the parametric models as well. We refer the reader to [9] for numerical results and the application of our algorithms to NYC data. We do not reproduce them here due to a paucity of space.
6 Conclusions and Future Work
We developed a general asymptotic theory for the quickest detection of changes in i.p.i.d. models. We also studied the case where the post-change i.p.i.d. law is unknown. In future work, we will apply the developed algorithms to real multimodal data, e.g., as collected in [8] and [9]. We will also study optimality theory for more general change point models in the i.p.i.d. setting.
7 Proofs
Proof of Lemma 1.
For any sequence of random variables $\{Z_i\}$, we can write
$$\max_{1 \le k \le n} \sum_{i=k}^{n} Z_i = Z_n + \max\Big\{0, \; \max_{1 \le k \le n-1} \sum_{i=k}^{n-1} Z_i\Big\}. \qquad (21)$$
Substituting $Z_i = \log \frac{g_i(X_i)}{f_i(X_i)}$ into the above equation, we get the desired recursion for the statistic $W_n$ in (5):
$$W_n = \max\{W_{n-1}, 0\} + \log \frac{g_n(X_n)}{f_n(X_n)}.$$
Note that the increment term $\log \frac{g_n(X_n)}{f_n(X_n)}$ is only a function of the current observation $X_n$. Also, since the processes are i.p.i.d. with laws $(f_1, \dots, f_T)$ and $(g_1, \dots, g_T)$, the likelihood ratio functions are not all distinct: there are only $T$ such functions, $g_1/f_1$ to $g_T/f_T$. Thus, we need only a finite amount of memory to store the past statistic, the current observation, and the densities needed to compute this statistic recursively. ∎
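The identity behind the proof can be checked numerically: the recursion (7) and the direct maximization (5) produce identical statistic values on any log likelihood ratio sequence. A small sketch (the input values are arbitrary illustrative numbers):

```python
import numpy as np

def cusum_recursive(z):
    """Recursion (7): W_n = max(W_{n-1}, 0) + Z_n, with W_0 = 0."""
    W, out = 0.0, []
    for zn in z:
        W = max(W, 0.0) + zn
        out.append(W)
    return np.array(out)

def cusum_direct(z):
    """Direct form (5): W_n = max over 1 <= k <= n of sum_{i=k}^n Z_i."""
    z = np.asarray(z, dtype=float)
    return np.array([max(z[k:m + 1].sum() for k in range(m + 1))
                     for m in range(len(z))])

zs = np.array([1.0, -2.0, 0.5, 3.0, -1.0])   # illustrative log likelihood ratios
```

The direct form costs $O(n^2)$ over $n$ samples, while the recursion is $O(1)$ per sample, which is the practical content of Lemma 1.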
Proof of Theorem 3.1.
Let $Z_i = \log \frac{g_i(X_i)}{f_i(X_i)}$ be the log likelihood ratio at time $i$. We show that the sequence $\{Z_i\}$ satisfies the following statement: for all $\delta > 0$,
$$\lim_{n \to \infty} \; \sup_{\nu \ge 1} \; \operatorname{ess\,sup} \; P_\nu\Big( \max_{t \le n} \sum_{i=\nu}^{\nu+t} Z_i \ge I (1+\delta) n \;\Big|\; X_1, \dots, X_{\nu-1} \Big) = 0, \qquad (22)$$
where $I$ is as defined in (8). The lower bound then follows from Theorem 1 in [12]. Towards proving (22), note that, as $n \to \infty$,
$$\frac{1}{n} \sum_{i=\nu}^{\nu+n-1} Z_i \longrightarrow I \quad \text{a.s. under } P_\nu. \qquad (23)$$
The above display is true because of the i.p.i.d. nature of the observation process. This implies that, as $n \to \infty$,
$$\frac{1}{n} \max_{t \le n} \sum_{i=\nu}^{\nu+t} Z_i \longrightarrow I \quad \text{a.s. under } P_\nu. \qquad (24)$$
To show this, note that
$$\frac{1}{n} \max_{t \le n} \sum_{i=\nu}^{\nu+t} Z_i \;\ge\; \frac{1}{n} \sum_{i=\nu}^{\nu+n-1} Z_i. \qquad (25)$$
For a fixed $\epsilon > 0$, because of (23) and (25), the LHS of (24) is greater than $I - \epsilon$ for $n$ large enough. Also, let the maximum on the LHS of (24) be achieved at a point $t_n$; then
$$\frac{1}{n} \max_{t \le n} \sum_{i=\nu}^{\nu+t} Z_i = \frac{t_n}{n} \cdot \frac{1}{t_n} \sum_{i=\nu}^{\nu+t_n} Z_i.$$
Now $t_n$ cannot stay bounded because of the presence of $n$ in the denominator. This implies $t_n \to \infty$, for any fixed $\epsilon$, and $\frac{1}{t_n} \sum_{i=\nu}^{\nu+t_n} Z_i \to I$ by (23). Since $t_n / n \le 1$, we have that the LHS of (24) is less than $I + \epsilon$ for $n$ large enough. This proves (24). To prove (22), note that, due to the i.p.i.d. nature of the process,
$$\sup_{\nu \ge 1} \; \operatorname{ess\,sup} \; P_\nu\Big( \max_{t \le n} \sum_{i=\nu}^{\nu+t} Z_i \ge I(1+\delta) n \;\Big|\; X_1, \dots, X_{\nu-1} \Big) = \max_{1 \le \nu \le T} P_\nu\Big( \max_{t \le n} \sum_{i=\nu}^{\nu+t} Z_i \ge I(1+\delta) n \Big). \qquad (26)$$
The right-hand side goes to zero because of (24) and because the maximum on the right-hand side of (26) is over only finitely many terms. ∎
Proof of Theorem 3.2.
Again with $Z_i = \log \frac{g_i(X_i)}{f_i(X_i)}$, we show that the sequence $\{Z_i\}$ satisfies the following statement: for all $\delta > 0$,
$$\lim_{n \to \infty} \; \sup_{k \ge \nu \ge 1} \; \operatorname{ess\,sup} \; P_\nu\Big( \frac{1}{n} \sum_{i=k}^{k+n-1} Z_i \le I - \delta \;\Big|\; X_1, \dots, X_{k-1} \Big) = 0. \qquad (27)$$
The upper bound then follows from Theorem 4 in [12]. To prove (27), note that, due to the i.p.i.d. nature of the process, we have
$$\sup_{k \ge \nu \ge 1} \; \operatorname{ess\,sup} \; P_\nu\Big( \frac{1}{n} \sum_{i=k}^{k+n-1} Z_i \le I - \delta \;\Big|\; X_1, \dots, X_{k-1} \Big) = \max_{1 \le k \le T} P_1\Big( \frac{1}{n} \sum_{i=k}^{k+n-1} Z_i \le I - \delta \Big). \qquad (28)$$
The right-hand side of the above equation goes to zero for any $\delta > 0$ because of (23) and also because of the finite number of maximizations. The false alarm result follows directly from [12] with $A = \log \beta$, because the likelihood ratios here also form a martingale. ∎
Proof of Theorem 4.1.
References
 [1] H. V. Poor and O. Hadjiliadis, Quickest detection. Cambridge University Press, 2009.
 [2] A. G. Tartakovsky, I. V. Nikiforov, and M. Basseville, Sequential Analysis: Hypothesis Testing and Change-Point Detection. Statistics, CRC Press, 2014.
 [3] V. V. Veeravalli and T. Banerjee, Quickest Change Detection. Academic Press Library in Signal Processing: Volume 3 – Array and Statistical Signal Processing, 2014. http://arxiv.org/abs/1210.5552.
 [4] G. Tagaras, “A survey of recent developments in the design of adaptive control charts,” Journal of Quality Technology, vol. 30, pp. 212–231, July 1998.
 [5] T. Banerjee and V. V. Veeravalli, “Data-efficient quickest change detection in sensor networks,” IEEE Transactions on Signal Processing, vol. 63, no. 14, pp. 3727–3735, 2015.
 [6] Y. C. Chen, T. Banerjee, A. D. Domínguez-García, and V. V. Veeravalli, “Quickest line outage detection and identification,” IEEE Transactions on Power Systems, vol. 31, pp. 749–758, Jan 2016.
 [7] T. Banerjee, S. Allsop, K. M. Tye, D. Ba, and V. Tarokh, “Sequential detection of regime changes in neural data,” arXiv preprint arXiv:1809.00358, 2018.
 [8] T. Banerjee, G. Whipps, P. Gurram, and V. Tarokh, “Sequential event detection using multimodal data in nonstationary environments,” in Proc. of the 21st International Conference on Information Fusion, July 2018.
 [9] T. Banerjee, G. Whipps, P. Gurram, and V. Tarokh, “Cyclostationary statistical models and algorithms for anomaly detection using multimodal data,” in Proc. of the 6th IEEE Global Conference on Signal and Information Processing, Nov. 2018.

 [10] Y. Zhang, N. Malem-Shinitski, S. A. Allsop, K. Tye, and D. Ba, “Estimating a separably-Markov random field (SMuRF) from binary observations,” Neural Computation, vol. 30, no. 4, pp. 1046–1079, 2018.
 [11] G. V. Moustakides, “Optimal stopping times for detecting changes in distributions,” Ann. Statist., vol. 14, pp. 1379–1387, Dec. 1986.
 [12] T. L. Lai, “Information bounds and quick detection of parameter changes in stochastic systems,” IEEE Trans. Inf. Theory, vol. 44, pp. 2917 –2929, Nov. 1998.
 [13] A. G. Tartakovsky and V. V. Veeravalli, “General asymptotic Bayesian theory of quickest change detection,” SIAM Theory of Prob. and App., vol. 49, pp. 458–497, Sept. 2005.
 [14] A. G. Tartakovsky, “On asymptotic optimality in sequential changepoint detection: Noniid case,” IEEE Transactions on Information Theory, vol. 63, no. 6, pp. 3433–3450, 2017.
 [15] S. Pergamenchtchikov and A. G. Tartakovsky, “Asymptotically optimal pointwise and minimax quickest change-point detection for dependent data,” Statistical Inference for Stochastic Processes, vol. 21, pp. 217–259, Apr 2018.
 [16] M. Pollak, “Optimal detection of a change in distribution,” Ann. Statist., vol. 13, pp. 206–227, Mar. 1985.
 [17] G. Lorden, “Procedures for reacting to a change in distribution,” Ann. Math. Statist., vol. 42, pp. 1897–1908, Dec. 1971.
 [18] M. Woodroofe, Nonlinear Renewal Theory in Sequential Analysis. CBMS-NSF regional conference series in applied mathematics, SIAM, 1982.