Quickest Detection Of Deviations From Periodic Statistical Behavior

10/30/2018
by Taposh Banerjee, et al.

A new class of stochastic processes called independent and periodically identically distributed (i.p.i.d.) processes is defined to capture periodically varying statistical behavior. Algorithms are proposed to detect changes in such i.p.i.d. processes. It is shown that the algorithms can be computed recursively and are asymptotically optimal. This problem has applications in anomaly detection in traffic data, social network data, and neural data, where periodic statistical behavior has been observed.



1 Introduction

In the classical problem of quickest change detection [1], [2], [3], a decision maker observes a stochastic process with a given distribution. At some point in time, the distribution of the process changes. The objective is to detect this change in distribution with the minimum possible delay, subject to a constraint on the rate of false alarms. This problem has applications in statistical process control [4], sensor networks [5], cyber-physical system monitoring [6], detection of regime changes in neural data [7], traffic monitoring [8], and, in general, anomaly detection [8], [9].

In many applications of anomaly detection, the observed process has a periodic or regular statistical behavior. We observed such periodic statistical behavior in the multimodal data collected around the Tunnel To Towers 5K run in NYC [8], [9]. In those papers, our objective was to detect the 5K run using multimodal data from CCTV cameras, Twitter, and Instagram posts; details of the data collection can be found there. In Fig. 1, we have plotted the average counts of the number of persons, the number of Instagram posts, and the number of vehicles captured through and around two CCTV cameras in NYC. One camera was off the path of the 5K run and the other was on the path of the run. The data corresponding to the off-path camera represents normal behavior. The object counts were extracted from the CCTV images using a convolutional neural network-based object detector. The person and Instagram data are from the off-path camera (the Instagram data is collected in a square grid around the camera), and the vehicle data is from the on-path camera. As can be seen from the figure, the average counts across different data collection days (here four Sundays) show a similar pattern of growth and decay. Such periodic or cyclostationary behavior can also be observed in neural spike data [10], [7]. In a controlled experiment where an animal is trained to perform a certain task repeatedly, one can expect a similarity in the neural firing patterns [7].

Figure 1: The average person, vehicle, and Instagram post counts for data collected in NYC in [8]. The figure shows that the average counts have similar statistical properties across different days. The vehicle data is from a CCTV camera on the path of the event and captures a decrease in the average counts on the event day, Sept. 24.

The anomaly detection problem in such applications can be posed as the problem of detecting deviations from this regular or periodic statistical behavior. In this paper, we develop theory and algorithms to solve this change detection problem. Precise problem formulations are given below. The quickest change detection literature is divided broadly into two parts: results for i.i.d. processes, with algorithms that can be computed recursively and enjoy strong optimality properties [11], and results for non-i.i.d. data, with algorithms that are hard to compute but are asymptotically optimal [12], [13], [14], [15]. We show in this paper that the algorithms for our non-i.i.d. setup can be computed recursively and are asymptotically optimal. A class of models of this type was first studied by us in [9]. In this paper, we study a much broader class of processes and also develop optimality theory.

2 Model to Capture Periodic Statistical Behavior

An independent and identically distributed (i.i.d.) process is a sequence of random variables that are independent and have the same distribution. We define a new category of stochastic processes, called independent and periodically identically distributed (i.p.i.d.) processes, as follows:

Definition 1.

Let $\{X_n\}_{n \ge 1}$ be a sequence of random variables such that the variable $X_n$ has density $f_n$. The stochastic process $\{X_n\}$ is called independent and periodically identically distributed (i.p.i.d.) if the random variables $\{X_n\}$ are independent and there is a positive integer $T$ such that the sequence of densities $\{f_n\}$ is periodic with period $T$:

$$f_{n+T} = f_n, \quad \forall n \ge 1.$$

We say that the process is i.p.i.d. with the law $(f_1, \dots, f_T)$.

Note that the law of an i.p.i.d. process is completely characterized by the finite-dimensional product distribution of the densities $(f_1, \dots, f_T)$. We assume that in the normal regime, the data can be modeled as an i.p.i.d. process. At some point in time, due to an anomaly, the distribution of the i.p.i.d. process deviates from $(f_1, \dots, f_T)$. Our objective in this paper is to develop algorithms that can observe the process in real time and detect changes in its distribution as quickly as possible, subject to a constraint on the rate of false alarms. In Section 3, we define the change point model and develop algorithms and optimality theory for detecting changes in an i.p.i.d. process. In Section 4, we extend the results to the case where the post-change distribution is unknown. In Section 5, we comment on parametric i.p.i.d. models that are easier to learn than the full set of densities $(f_1, \dots, f_T)$.
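The definition above can be made concrete with a short simulation. The sketch below, with Gaussian densities and all parameter values chosen purely for illustration (the paper does not fix a parametric family here), draws from an i.p.i.d. process whose per-phase densities repeat with period $T$:

```python
import random

def sample_ipid(means, n, sigma=1.0, rng=None):
    """Draw n observations from an i.p.i.d. Gaussian process.

    means[i] is the mean of density f_{i+1}; X_k has density
    f_{((k-1) mod T) + 1}, so the law repeats with period T = len(means).
    """
    rng = rng or random.Random(0)
    T = len(means)
    return [rng.gauss(means[(k - 1) % T], sigma) for k in range(1, n + 1)]

# Period T = 3: the densities cycle as f1, f2, f3, f1, f2, f3, ...
xs = sample_ipid([0.0, 2.0, -1.0], n=9)
```

Every third observation shares a density, so phase-wise empirical averages recover the per-phase means, which is exactly the "periodically identical" structure the definition captures.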

3 A Change Detection Theory for General i.p.i.d. Processes

Consider another periodic sequence of densities $\{g_n\}$ such that

$$g_{n+T} = g_n, \quad \forall n \ge 1.$$

Thus, we essentially have two distinct sets of $T$ densities, $(f_1, \dots, f_T)$ and $(g_1, \dots, g_T)$. We assume that at some point in time $\nu$, called the change point in the following, the law of the i.p.i.d. process is governed not by the densities $(f_1, \dots, f_T)$, but by the new set of densities $(g_1, \dots, g_T)$ (a precise definition is given below). These densities need not all be different from the densities $(f_1, \dots, f_T)$, but we assume that there exists at least one $i$ such that they are different:

$$\exists \, i \in \{1, \dots, T\} \text{ such that } g_i \ne f_i. \qquad (1)$$

The change point model is as follows. At the time point $\nu$, the distribution of the random variable $X_n$ changes from $f_n$ to $g_n$:

$$X_n \sim \begin{cases} f_n, & n < \nu, \\ g_n, & n \ge \nu. \end{cases} \qquad (2)$$

We emphasize that the densities $\{f_n\}$ and $\{g_n\}$ are periodic. This model is equivalent to saying that we have two i.p.i.d. processes, one governed by the densities $(f_1, \dots, f_T)$ and another governed by the densities $(g_1, \dots, g_T)$, and at the change point $\nu$, the process switches from the former to the latter. A more general change point model, in which the exact post-change densities are unknown, is discussed in Section 4.

We want to detect the change described in (2) as quickly as possible, subject to a constraint on the rate of false alarms. We are looking for a stopping time $\tau$ for the process $\{X_n\}$ that minimizes a metric on the delay $\tau - \nu$ while avoiding the false alarm event $\{\tau < \nu\}$. Specifically, we are interested in the popular false alarm and delay metrics of Pollak [16] and Lorden [17]. Let $\mathbb{P}_\nu$ denote the probability law of the process $\{X_n\}$ when the change occurs at time $\nu$, and let $\mathbb{E}_\nu$ denote the corresponding expectation. When there is no change, we use the notation $\mathbb{E}_\infty$. The quickest change detection problem formulation of Pollak [16] is defined as

$$\min_\tau \; \sup_{\nu \ge 1} \; \mathbb{E}_\nu[\tau - \nu \mid \tau \ge \nu] \quad \text{subject to} \quad \mathbb{E}_\infty[\tau] \ge \beta, \qquad (3)$$

where $\beta$ is a given constraint on the mean time to false alarm. Thus, the objective is to find a stopping time that minimizes the worst-case conditional average detection delay, subject to a constraint on the mean time to false alarm. A popular alternative is the worst-worst-case delay metric of Lorden [17]:

$$\min_\tau \; \sup_{\nu \ge 1} \; \operatorname*{ess\,sup} \; \mathbb{E}_\nu[(\tau - \nu)^+ \mid X_1, \dots, X_{\nu-1}] \quad \text{subject to} \quad \mathbb{E}_\infty[\tau] \ge \beta, \qquad (4)$$

where $\operatorname{ess\,sup}$ is used to denote the supremum of a random variable outside a set of measure zero. Further motivation for, and comparison of, these and other problem formulations for change point detection can be found in the literature [3], [2], [1], [12].

We now propose a CUSUM-type scheme to detect the above change (see also [12]). We compute the sequence of statistics

$$W_n = \max_{1 \le k \le n} \; \sum_{i=k}^{n} \log \frac{g_i(X_i)}{f_i(X_i)}, \qquad (5)$$

and raise an alarm as soon as the statistic is above a threshold $A$:

$$\tau_c = \inf\{n \ge 1 : W_n > A\}. \qquad (6)$$

We show below that this scheme is asymptotically optimal in a well-defined sense. Before that, we prove an important property: the statistic $W_n$ can be computed recursively and using finite memory. The proofs of this and all the other results are provided in Section 7.

Lemma 1.

The statistic sequence $\{W_n\}$ can be computed recursively as

$$W_n = (W_{n-1})^+ + \log \frac{g_n(X_n)}{f_n(X_n)}, \qquad (7)$$

where $W_0 = 0$ and $(x)^+ = \max(x, 0)$. Further, since the sets of pre- and post-change densities $(f_1, \dots, f_T)$ and $(g_1, \dots, g_T)$ are finite, the recursion (7) can be computed using only the finite memory needed to store these densities.

In the rest of the paper, we refer to (7) as the Periodic-CUSUM algorithm.
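The recursion (7) is cheap to run online: one statistic is updated per observation, cycling through the $T$ likelihood ratios. The sketch below, with Gaussian densities, a change point, and all parameter values chosen only for illustration, implements the Periodic-CUSUM stopping rule:

```python
import math
import random

def gauss_logpdf(mu, sigma=1.0):
    """Log density of N(mu, sigma^2)."""
    c = -math.log(sigma * math.sqrt(2.0 * math.pi))
    return lambda x: c - 0.5 * ((x - mu) / sigma) ** 2

def periodic_cusum(xs, f_logpdfs, g_logpdfs, threshold):
    """Recursion (7): W_n = (W_{n-1})^+ + log g_n(X_n) - log f_n(X_n).

    Only the T pre- and post-change log densities and the scalar W are
    stored, matching the finite-memory claim of Lemma 1.  Returns the
    1-based stopping time of (6), or None if W never crosses threshold.
    """
    T = len(f_logpdfs)
    w = 0.0
    for n, x in enumerate(xs, start=1):
        i = (n - 1) % T  # phase of time n within the period
        w = max(w, 0.0) + g_logpdfs[i](x) - f_logpdfs[i](x)
        if w > threshold:
            return n
    return None

# Change at nu = 100: every per-phase mean shifts up by 2.
rng = random.Random(1)
pre, post, nu = [0.0, 1.0, 2.0], [2.0, 3.0, 4.0], 100
xs = [rng.gauss((pre if n < nu else post)[(n - 1) % 3], 1.0)
      for n in range(1, 301)]
stop = periodic_cusum(xs, [gauss_logpdf(m) for m in pre],
                      [gauss_logpdf(m) for m in post], threshold=10.0)
```

Before the change the increments have negative drift, so $W$ hovers near zero; after the change the drift is positive and the statistic crosses the threshold within a few observations.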

Towards proving the optimality of the Periodic-CUSUM scheme, we first obtain a universal lower bound on the performance of any stopping time for detecting changes in i.p.i.d. processes. Define

$$I = \frac{1}{T} \sum_{i=1}^{T} D(g_i \,\|\, f_i), \qquad (8)$$

where $D(g_i \,\|\, f_i)$ is the Kullback-Leibler divergence between the densities $g_i$ and $f_i$. We assume that

$$D(g_i \,\|\, f_i) < \infty, \quad \forall \, i \in \{1, \dots, T\},$$

and that (1) holds, so that $0 < I < \infty$.

Theorem 3.1.

Let the information number $I$ as defined in (8) satisfy $0 < I < \infty$. Then, for any stopping time $\tau$ satisfying the false alarm constraint $\mathbb{E}_\infty[\tau] \ge \beta$, we have, as $\beta \to \infty$,

$$\sup_{\nu \ge 1} \; \mathbb{E}_\nu[\tau - \nu \mid \tau \ge \nu] \;\ge\; \frac{\log \beta}{I} \, (1 + o(1)), \qquad (9)$$

where an $o(1)$ term is one that goes to zero in the limit as $\beta \to \infty$.
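The information number (8) and the resulting first-order delay floor $\log \beta / I$ are easy to evaluate in closed form for simple families. The sketch below assumes unit-variance Gaussian phases (an illustrative choice, not fixed by the paper), for which $D(g_i \| f_i) = (\mu_{g,i} - \mu_{f,i})^2 / 2$:

```python
import math

def kl_gauss(mu_f, mu_g, sigma=1.0):
    """D(g || f) for g = N(mu_g, sigma^2) and f = N(mu_f, sigma^2)."""
    return (mu_g - mu_f) ** 2 / (2.0 * sigma ** 2)

def info_number(pre_means, post_means, sigma=1.0):
    """The number I of (8): per-phase KL divergences averaged over one period."""
    T = len(pre_means)
    return sum(kl_gauss(f, g, sigma) for f, g in zip(pre_means, post_means)) / T

# Phases 1 and 3 change; phase 2 keeps its density, but condition (1) still holds.
I = info_number([0.0, 1.0, 2.0], [2.0, 1.0, 4.0])
beta = 1e4
delay_floor = math.log(beta) / I  # first-order lower bound from (9)
```

Note that unchanged phases contribute zero divergence and dilute $I$; a change affecting only a few phases per period is therefore intrinsically slower to detect.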

We now show that the Periodic-CUSUM scheme (5)–(7) is asymptotically optimal for both of the formulations (3) and (4).

Theorem 3.2.

Let the information number $I$ as defined in (8) satisfy $0 < I < \infty$. Then, the Periodic-CUSUM stopping time (5)–(7) with threshold $A = \log \beta$ satisfies the false alarm constraint

$$\mathbb{E}_\infty[\tau_c] \ge \beta,$$

and, as $\beta \to \infty$,

$$\sup_{\nu \ge 1} \; \operatorname*{ess\,sup} \; \mathbb{E}_\nu[(\tau_c - \nu)^+ \mid X_1, \dots, X_{\nu-1}] \;\le\; \frac{\log \beta}{I} \, (1 + o(1)). \qquad (10)$$

We note that the algorithm is also optimal for various other formulations studied in the literature [12]. We do not report these here due to a paucity of space.

4 Change Detection With Unknown Post-Change i.p.i.d. Process

In the previous section, we assumed that the post-change law $(g_1, \dots, g_T)$ is known to the decision maker. This information was used to design the Periodic-CUSUM algorithm (7). In practice, this information may not be available. We now show that if the post-change law belongs to a finite set of $M$ possible i.p.i.d. laws, $\{(g_1^{(l)}, \dots, g_T^{(l)}) : 1 \le l \le M\}$, then an asymptotically optimal test can still be designed.

For $l \in \{1, \dots, M\}$, define the statistic

$$W_n^{(l)} = \max_{1 \le k \le n} \; \sum_{i=k}^{n} \log \frac{g_i^{(l)}(X_i)}{f_i(X_i)}, \qquad (11)$$

and the stopping rule

$$\tau_l = \inf\{n \ge 1 : W_n^{(l)} > A\}, \qquad (12)$$

which is the Periodic-CUSUM stopping rule for the $l$th post-change law $(g_1^{(l)}, \dots, g_T^{(l)})$. Now, define

$$\tau_{\min} = \min_{1 \le l \le M} \tau_l. \qquad (13)$$

Then, note that

$$\tau_{\min} = \inf\Big\{n \ge 1 : \max_{1 \le l \le M} W_n^{(l)} > A\Big\}. \qquad (14)$$

The stopping rule $\tau_{\min}$ is the stopping rule under which we stop the first time any of the $M$ Periodic-CUSUM statistics raises an alarm.
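The bank of statistics in (11)–(14) runs $M$ copies of the recursion (7) in parallel, one per candidate post-change law. The sketch below uses Gaussian phases with illustrative parameters (function names and values are mine, not from the paper); the index of the statistic that crosses also serves as a rough identification of which post-change law is in force:

```python
import math
import random

def gauss_logpdf(mu, sigma=1.0):
    """Log density of N(mu, sigma^2)."""
    c = -math.log(sigma * math.sqrt(2.0 * math.pi))
    return lambda x: c - 0.5 * ((x - mu) / sigma) ** 2

def multi_periodic_cusum(xs, f_logpdfs, post_banks, threshold):
    """Rule (14): run one Periodic-CUSUM statistic W^(l) per candidate
    post-change law and stop as soon as max_l W_n^(l) > threshold.

    Returns (stopping time, index of the statistic that crossed), or
    (None, None) if no statistic ever crosses.
    """
    T = len(f_logpdfs)
    ws = [0.0] * len(post_banks)
    for n, x in enumerate(xs, start=1):
        i = (n - 1) % T
        lf = f_logpdfs[i](x)
        for l, bank in enumerate(post_banks):
            ws[l] = max(ws[l], 0.0) + bank[i](x) - lf
        best = max(range(len(ws)), key=ws.__getitem__)
        if ws[best] > threshold:
            return n, best
    return None, None

# True post-change law is candidate l = 1 (all means shift down by 2).
rng = random.Random(3)
pre, cands, nu = [0.0, 0.0], [[2.0, 2.0], [-2.0, -2.0]], 50
xs = [rng.gauss(0.0 if n < nu else -2.0, 1.0) for n in range(1, 201)]
f_bank = [gauss_logpdf(m) for m in pre]
g_banks = [[gauss_logpdf(m) for m in c] for c in cands]
stop, which = multi_periodic_cusum(xs, f_bank, g_banks, threshold=10.0)
```

Only the statistic matched to the true post-change law acquires positive drift after the change, so it is the one that reaches the threshold first.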

We now show that this stopping rule is asymptotically optimal for both Lorden's and Pollak's criteria. Towards this end, we define a Shiryaev-Roberts-type statistic

$$R_n = \sum_{l=1}^{M} \; \sum_{k=1}^{n} \; \prod_{i=k}^{n} \frac{g_i^{(l)}(X_i)}{f_i(X_i)}, \qquad (15)$$

and a Shiryaev-Roberts-type stopping rule

$$\tau_{sr} = \inf\{n \ge 1 : R_n > e^A\}. \qquad (16)$$

Note that $R_n \ge \max_{1 \le l \le M} e^{W_n^{(l)}}$, and hence

$$\tau_{sr} \le \tau_{\min}. \qquad (17)$$

We have the following theorem.

Theorem 4.1.

The process $\{R_n - nM\}$ is a $\mathbb{P}_\infty$-martingale. If $A = \log(\beta M)$, then

$$\mathbb{E}_\infty[\tau_{\min}] \ge \mathbb{E}_\infty[\tau_{sr}] \ge \beta.$$

Further, if $(g_1^{(l)}, \dots, g_T^{(l)})$ is the true post-change i.p.i.d. law and the mean overshoot of the statistic $W^{(l)}$ over the threshold is finite, i.e.,

$$\sup_{A > 0} \; \mathbb{E}_1\big[W_{\tau_l}^{(l)} - A\big] < \infty, \qquad (18)$$

then, as $\beta \to \infty$,

$$\sup_{\nu \ge 1} \; \operatorname*{ess\,sup} \; \mathbb{E}_\nu[(\tau_{\min} - \nu)^+ \mid X_1, \dots, X_{\nu-1}] \;\le\; \frac{\log \beta}{I_l} \, (1 + o(1)), \qquad (19)$$

where $I_l$ is the information number (8) computed with the densities $(g_1^{(l)}, \dots, g_T^{(l)})$.

Given the lower bound in Theorem 3.1, the stopping rule $\tau_{\min}$ is thus asymptotically optimal with respect to the criteria of Lorden and Pollak, uniformly over each possible post-change hypothesis $l$, $1 \le l \le M$.

The condition (18) is equivalent to saying that the mean overshoot of the statistic over the threshold is finite. This assumption is satisfied, for example, if the likelihood ratios are bounded, and may also be satisfied if the log-likelihood ratios have finite variance; the latter will be verified in a future version of this paper. Note that our statistics are not random walks, but periodic versions of them. As a result, we cannot directly borrow such finiteness results from, for example, [18].

5 Detection in Parametric i.p.i.d. Models

In practice, learning the pre- and post-change laws $(f_1, \dots, f_T)$ and $(g_1, \dots, g_T)$ can be hard. Thus, it is of interest to study low-dimensional parametric i.p.i.d. models. Such parametric models were the object of our study in [9], where we assumed a periodic sequence of parameters $\{\theta_n\}$ with period $T$, $\theta_{n+T} = \theta_n$, and $X_n \sim f(\cdot\,; \theta_n)$. Another option is to assume that we have a smooth periodic function $\theta(t)$ and

$$X_n \sim f(\cdot\,; \theta(n)), \quad \theta(t + T) = \theta(t). \qquad (20)$$

The batch parameter model studied in [9] is then equivalent to a step approximation of $\theta(t)$ in (20). The change detection problem in this process is equivalent to detecting a change in the parametric function from $\theta(t)$ to some $\lambda(t) \ne \theta(t)$. The Periodic-CUSUM algorithm can easily be applied to such models, and all the optimality results proved here are valid for the parametric models as well. We refer the reader to [9] for numerical results and applications of our algorithms to the NYC data. We do not reproduce them here due to a paucity of space.
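A model of the form (20) can be sketched with Poisson counts driven by a smooth periodic rate, in the spirit of the daily count profiles of Fig. 1. The rate function below is a hypothetical choice of mine, not the parametric family fitted in [9]:

```python
import math
import random

def theta(t, period=24):
    """A hypothetical smooth periodic parameter function, theta(t + 24) = theta(t),
    mimicking a daily count profile as in (20)."""
    return 5.0 + 3.0 * math.sin(2.0 * math.pi * t / period)

def sample_poisson(lam, rng):
    """Knuth's product-of-uniforms Poisson sampler (requires lam > 0)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while p > limit:
        k += 1
        p *= rng.random()
    return k - 1

def sample_parametric_ipid(n, rate_fn, rng):
    """X_k ~ Poisson(theta(k)): an i.p.i.d. count process whose law is fully
    described by one low-dimensional rate function instead of T densities."""
    return [sample_poisson(rate_fn(k), rng) for k in range(1, n + 1)]

counts = sample_parametric_ipid(240, theta, random.Random(0))
```

Since $\theta(k + 24) = \theta(k)$ on the integer sampling grid, the count process is i.p.i.d. with period $T = 24$, yet only the few parameters of $\theta$ need to be learned from data.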

6 Conclusions and Future Work

We developed a general asymptotic theory for the quickest detection of changes in i.p.i.d. models. We also studied the case where the post-change i.p.i.d. law is unknown. In future work, we will apply the developed algorithms to real multimodal data, e.g., the data collected in [8] and [9]. We will also study optimality theory for more general change point models in the i.p.i.d. setting.

7 Proofs

Proof of Lemma 1.

For any sequence of random variables $\{Z_n\}$, we can write

$$\max_{1 \le k \le n} \sum_{i=k}^{n} Z_i = \Big(\max_{1 \le k \le n-1} \sum_{i=k}^{n-1} Z_i\Big)^+ + Z_n. \qquad (21)$$

Substituting $Z_i = \log \frac{g_i(X_i)}{f_i(X_i)}$ into the above equation, we get the desired recursion for the statistic $W_n$ in (5):

$$W_n = (W_{n-1})^+ + \log \frac{g_n(X_n)}{f_n(X_n)}.$$

Note that the increment term is only a function of the current observation $X_n$. Also, since the processes are i.p.i.d. with laws $(f_1, \dots, f_T)$ and $(g_1, \dots, g_T)$, the likelihood ratio functions are not all distinct: there are only $T$ such functions, $g_1/f_1$ to $g_T/f_T$. Thus, we need only a finite amount of memory to store the past statistic, the current observation, and the densities in order to compute the statistic recursively. ∎
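The identity (21) is pure algebra and can be checked numerically: computing $W_n$ from the max-over-$k$ definition (5) and from the recursion (7) must give the same path. A minimal check on arbitrary increments:

```python
import random

def cusum_direct(zs):
    """W_n from definition (5): max over k of the tail sums sum_{i=k}^{n} z_i."""
    return [max(sum(zs[k:n]) for k in range(n)) for n in range(1, len(zs) + 1)]

def cusum_recursive(zs):
    """W_n from recursion (7): W_n = (W_{n-1})^+ + z_n, with W_0 = 0."""
    out, w = [], 0.0
    for z in zs:
        w = max(w, 0.0) + z
        out.append(w)
    return out

rng = random.Random(42)
zs = [rng.uniform(-1.0, 1.0) for _ in range(200)]
```

The direct computation costs $O(n^2)$ over a path of length $n$, while the recursion is $O(n)$ with $O(1)$ memory, which is the practical content of Lemma 1.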

Proof of Theorem 3.1.

Let $Z_i = \log \frac{g_i(X_i)}{f_i(X_i)}$ be the log-likelihood ratio at time $i$. We show that the sequence $\{Z_i\}$ satisfies the following statement: for all $\delta > 0$,

$$\lim_{n \to \infty} \; \sup_{\nu \ge 1} \; \operatorname*{ess\,sup} \; \mathbb{P}_\nu\Big( \max_{t \le n} \sum_{i=\nu}^{\nu+t} Z_i \ge I (1 + \delta) \, n \;\Big|\; X_1, \dots, X_{\nu-1} \Big) = 0, \qquad (22)$$

where $I$ is as defined in (8). The lower bound then follows from Theorem 1 in [12]. Towards proving (22), note that, by the law of large numbers applied to the i.p.i.d. observation process, as $n \to \infty$,

$$\frac{1}{n} \sum_{i=\nu}^{\nu+n-1} Z_i \;\to\; I, \quad \text{a.s. under } \mathbb{P}_\nu. \qquad (23)$$

This implies that, as $n \to \infty$,

$$\frac{1}{n} \max_{t \le n} \sum_{i=\nu}^{\nu+t} Z_i \;\to\; I, \quad \text{a.s. under } \mathbb{P}_\nu. \qquad (24)$$

To show this, note that

$$\max_{t \le n} \sum_{i=\nu}^{\nu+t} Z_i \;\ge\; \sum_{i=\nu}^{\nu+n} Z_i. \qquad (25)$$

For a fixed $\epsilon > 0$, because of (23) and (25), the left-hand side (LHS) in (24) is greater than $I - \epsilon$ for $n$ large enough. Also, let the maximum on the LHS of (24) be achieved at a point $t_n$; then

$$\frac{1}{n} \max_{t \le n} \sum_{i=\nu}^{\nu+t} Z_i = \frac{t_n}{n} \cdot \frac{1}{t_n} \sum_{i=\nu}^{\nu+t_n} Z_i.$$

Now $\{t_n\}$ cannot be bounded, because of the presence of $n$ in the denominator: if it were, the LHS of (24) would converge to zero, contradicting the lower bound just obtained. This implies $t_n \to \infty$, and hence, by (23), $\frac{1}{t_n} \sum_{i=\nu}^{\nu+t_n} Z_i \to I$. Since $t_n \le n$, we have that the LHS in (24) is less than $I + \epsilon$ for $n$ large enough. This proves (24). To prove (22), note that, due to the i.p.i.d. nature of the process and the independence of the observations, the conditioning can be dropped and the law of the maximum depends on $\nu$ only through $\nu \bmod T$:

$$\sup_{\nu \ge 1} \; \operatorname*{ess\,sup} \; \mathbb{P}_\nu\Big( \max_{t \le n} \sum_{i=\nu}^{\nu+t} Z_i \ge I (1 + \delta) \, n \;\Big|\; X_1, \dots, X_{\nu-1} \Big) = \max_{1 \le \nu \le T} \mathbb{P}_\nu\Big( \max_{t \le n} \sum_{i=\nu}^{\nu+t} Z_i \ge I (1 + \delta) \, n \Big). \qquad (26)$$

The right-hand side goes to zero because of (24) and because the maximum on the right-hand side of (26) is over only finitely many terms. ∎

Proof of Theorem 3.2.

Again with $Z_i = \log \frac{g_i(X_i)}{f_i(X_i)}$, we show that the sequence $\{Z_i\}$ satisfies the following statement: for all $\delta > 0$,

$$\lim_{n \to \infty} \; \sup_{t \ge \nu \ge 1} \; \operatorname*{ess\,sup} \; \mathbb{P}_\nu\Big( \frac{1}{n} \sum_{i=t}^{t+n-1} Z_i \le I - \delta \;\Big|\; X_1, \dots, X_{t-1} \Big) = 0. \qquad (27)$$

The delay upper bound then follows from Theorem 4 in [12]. To prove (27), note that, due to the i.p.i.d. nature of the process, the conditioning can be dropped and the law of the sum depends on $t$ only through $t \bmod T$:

$$\sup_{t \ge \nu \ge 1} \; \operatorname*{ess\,sup} \; \mathbb{P}_\nu\Big( \frac{1}{n} \sum_{i=t}^{t+n-1} Z_i \le I - \delta \;\Big|\; X_1, \dots, X_{t-1} \Big) = \max_{1 \le t \le T} \mathbb{P}_1\Big( \frac{1}{n} \sum_{i=t}^{t+n-1} Z_i \le I - \delta \Big). \qquad (28)$$

The right-hand side of the above equation goes to zero for every $\delta > 0$ because of (23) and because of the finite number of maximizations. The false alarm result follows directly from [12] with $A = \log \beta$, because the likelihood ratios here also form a martingale. ∎

Proof of Theorem 4.1.

That $\{R_n - nM\}$ is a $\mathbb{P}_\infty$-martingale can be proved by direct verification. For the false alarm proof, we assume that $\mathbb{E}_\infty[\tau_{sr}] < \infty$; otherwise the proof is trivial. Since $\mathbb{E}_\infty[\tau_{sr}] < \infty$, the stopped variable $R_{\tau_{sr}} - \tau_{sr} M$ is integrable. Further, as $n \to \infty$,

$$\mathbb{E}_\infty\big[ |R_n - nM| \; \mathbb{1}_{\{\tau_{sr} > n\}} \big] \to 0. \qquad (29)$$

Thus, by the optional sampling theorem [18] and (17), we have

$$\mathbb{E}_\infty[\tau_{\min}] \;\ge\; \mathbb{E}_\infty[\tau_{sr}] \;=\; \frac{\mathbb{E}_\infty[R_{\tau_{sr}}]}{M} \;\ge\; \frac{e^A}{M} \;=\; \beta.$$

The delay result is true because of (14): $\tau_{\min} \le \tau_l$ for the true post-change law $l$, and the delay of the single Periodic-CUSUM $\tau_l$ is bounded as in Theorem 3.2. ∎
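The martingale structure behind the false alarm bound can be probed numerically. The sketch below assumes the Shiryaev-Roberts-type statistic (15) is the sum, over the $M$ candidate laws, of the usual running sums of likelihood-ratio products (consistent with $\mathbb{E}_\infty[R_n] = nM$); the Gaussian phases and parameter values are illustrative choices of mine:

```python
import math
import random

def gauss_lr(mu_f, mu_g):
    """Likelihood ratio g(x)/f(x) for unit-variance Gaussian densities."""
    return lambda x: math.exp((mu_g - mu_f) * x + 0.5 * (mu_f ** 2 - mu_g ** 2))

def sr_statistic_path(xs, f_means, cand_means):
    """Path of R_n as in (15), computed recursively per candidate law:
    r^(l)_n = (1 + r^(l)_{n-1}) * LR^(l)_n, and R_n = sum_l r^(l)_n."""
    T = len(f_means)
    rs = [0.0] * len(cand_means)
    path = []
    for n, x in enumerate(xs, start=1):
        i = (n - 1) % T
        for l, cm in enumerate(cand_means):
            rs[l] = (1.0 + rs[l]) * gauss_lr(f_means[i], cm[i])(x)
        path.append(sum(rs))
    return path

rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(10)]  # no change: x ~ f throughout
path = sr_statistic_path(xs, [0.0, 0.0], [[0.5, 0.5], [-0.5, -0.5]])
```

Averaging $R_n$ over many no-change sample paths should return approximately $nM$, the compensator removed in the martingale $\{R_n - nM\}$; this is the quantity the optional sampling argument converts into the bound $\mathbb{E}_\infty[\tau_{sr}] \ge e^A / M$.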

References

  • [1] H. V. Poor and O. Hadjiliadis, Quickest detection. Cambridge University Press, 2009.
  • [2] A. G. Tartakovsky, I. V. Nikiforov, and M. Basseville, Sequential Analysis: Hypothesis Testing and Change-Point Detection. Statistics, CRC Press, 2014.
  • [3] V. V. Veeravalli and T. Banerjee, Quickest Change Detection. Academic Press Library in Signal Processing: Volume 3 – Array and Statistical Signal Processing, 2014. http://arxiv.org/abs/1210.5552.
  • [4] G. Tagaras, “A survey of recent developments in the design of adaptive control charts,” Journal of Quality Technology, vol. 30, pp. 212–231, July 1998.
  • [5] T. Banerjee and V. V. Veeravalli, “Data-efficient quickest change detection in sensor networks,” IEEE Transactions on Signal Processing, vol. 63, no. 14, pp. 3727–3735, 2015.
  • [6] Y. C. Chen, T. Banerjee, A. D. Domínguez-García, and V. V. Veeravalli, “Quickest line outage detection and identification,” IEEE Transactions on Power Systems, vol. 31, pp. 749–758, Jan 2016.
  • [7] T. Banerjee, S. Allsop, K. M. Tye, D. Ba, and V. Tarokh, “Sequential detection of regime changes in neural data,” arXiv preprint arXiv:1809.00358, 2018.
  • [8] T. Banerjee, G. Whipps, P. Gurram, and V. Tarokh, “Sequential event detection using multimodal data in nonstationary environments,” in Proc. of the 21st International Conference on Information Fusion, July 2018.
  • [9] T. Banerjee, G. Whipps, P. Gurram, and V. Tarokh, “Cyclostationary statistical models and algorithms for anomaly detection using multi-modal data,” in Proc. of the 6th IEEE Global Conference on Signal and Information Processing, Nov. 2018.
  • [10] Y. Zhang, N. Malem-Shinitski, S. A. Allsop, K. Tye, and D. Ba, “Estimating a separably-Markov random field (SMuRF) from binary observations,” Neural Computation, vol. 30, no. 4, pp. 1046–1079, 2018.
  • [11] G. V. Moustakides, “Optimal stopping times for detecting changes in distributions,” Ann. Statist., vol. 14, pp. 1379–1387, Dec. 1986.
  • [12] T. L. Lai, “Information bounds and quick detection of parameter changes in stochastic systems,” IEEE Trans. Inf. Theory, vol. 44, pp. 2917 –2929, Nov. 1998.
  • [13] A. G. Tartakovsky and V. V. Veeravalli, “General asymptotic Bayesian theory of quickest change detection,” SIAM Theory of Prob. and App., vol. 49, pp. 458–497, Sept. 2005.
  • [14] A. G. Tartakovsky, “On asymptotic optimality in sequential changepoint detection: Non-iid case,” IEEE Transactions on Information Theory, vol. 63, no. 6, pp. 3433–3450, 2017.
  • [15] S. Pergamenchtchikov and A. G. Tartakovsky, “Asymptotically optimal pointwise and minimax quickest change-point detection for dependent data,” Statistical Inference for Stochastic Processes, vol. 21, pp. 217–259, Apr 2018.
  • [16] M. Pollak, “Optimal detection of a change in distribution,” Ann. Statist., vol. 13, pp. 206–227, Mar. 1985.
  • [17] G. Lorden, “Procedures for reacting to a change in distribution,” Ann. Math. Statist., vol. 42, pp. 1897–1908, Dec. 1971.
  • [18] M. Woodroofe, Nonlinear Renewal Theory in Sequential Analysis. CBMS-NSF regional conference series in applied mathematics, SIAM, 1982.