Detecting abrupt changes in time-series data has attracted researchers in the statistics and data mining communities for decades Basseville and Nikiforov (1993). Based on whether detection must be instantaneous, changepoint detection algorithms can be classified into two categories: online changepoint detection and offline changepoint detection. Online detection targets data that require an immediate response, whereas offline detection tolerates a delay, which usually yields more accurate results. This literature review mainly focuses on online changepoint detection algorithms.
Many changepoint detection algorithms have been proposed and proven useful in practice. The pioneering work of Basseville and Nikiforov (1993) compared the probability distributions of time-series samples over past and present intervals, and declared an abrupt change when the two distributions are significantly different. Several now well-known algorithms follow this approach, such as the generalized likelihood-ratio method Gustafsson (1996) and the change finder Takeuchi and Yamanishi (2006). More recently, subspace methods have been proposed, including subspace identification Kawahara et al. (2007) and Krylov subspace learning Idé and Tsuda (2007); see also Kawahara and Sugiyama (2012).
The aforementioned methods are all considered traditional: they rely on pre-designed parametric models, such as underlying probability distributions, auto-regressive models and state-space models, to track specific parameters Liu et al. (2013). As alternatives, several general, more ad-hoc model-free methods with no specific parametric assumptions have been proposed Desobry et al. (2005). These alternative methods include time-frequency approaches and kernel density estimation. However, a common weakness of these algorithms is that they tend to be less accurate in high-dimensional problems because of the curse of dimensionality Vapnik (1998). To overcome this problem, we also discuss a strategy called direct density-ratio estimation.
In summary, this survey focuses on the aforementioned changepoint detection methods and discusses in detail how each algorithm detects abrupt changes. In Section 2, we explore traditional model-based changepoint detection algorithms. Section 3 compares these traditional algorithms with alternative model-free changepoint detection algorithms. Section 4 concludes and presents some future research directions.
2 Model-based Change Detection Algorithms
2.1 Generalized Likelihood Ratio
The generalized likelihood ratio (GLR) test is widely used for detecting abrupt changes in linear systems Gustafsson (1996) and is treated in detail by Basseville and Nikiforov (1993). As summarized by Kerr (1987), the GLR test has an appealing analytical framework that suits systems monitored with Kalman filters. The test can also help locate the physical cause of a change when it occurs abruptly.
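In a linear state-space model, the abrupt occurrence of a change (a state jump) can be represented as
$$x_{t+1} = A_t x_t + B_{u,t} u_t + B_{v,t} v_t + \delta_{k-t}\,\nu, \qquad y_t = C_t x_t + D_{u,t} u_t + e_t,$$
where the observation is denoted by $y_t$, the input by $u_t$, and the state by $x_t$. Here, $x_0$, $v_t$ and $e_t$ are assumed to be mutually independent Gaussian random variables. The state jump $\nu$ occurs at an unknown instant $k$, and $\delta_{k-t}$ is a pulse function that takes the value one if $t = k$ and zero otherwise. The set of measurements $y_1, y_2, \dots, y_t$ is denoted by $y^t$.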
The likelihood function based on the observations up to time $t$, given a jump $\nu$ at time $k$, is denoted by $p(y^t \mid k, \nu)$; the same notation is used for the conditional density function of $y^t$ given $k$ and $\nu$. The likelihood ratio (LR) test is a multiple hypothesis test in which different jump hypotheses are compared pairwise with the no-jump null hypothesis. In the LR test, the jump magnitude $\nu$ is given. The hypotheses under consideration are
$$H_0:\ \text{no jump}, \qquad H_1(k, \nu):\ \text{a jump of magnitude } \nu \text{ at time } k.$$
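By introducing the log-likelihood ratio for the hypothesis test,
$$l_t(k, \nu) = 2 \log \frac{p\big(y^t \mid H_1(k, \nu)\big)}{p\big(y^t \mid H_0\big)},$$
the GLR test becomes a double optimization over $k$ and $\nu$:
$$\hat{\nu}(k) = \arg\max_{\nu}\, l_t(k, \nu), \qquad \hat{k} = \arg\max_{k}\, l_t\big(k, \hat{\nu}(k)\big).$$
The jump candidate $\hat{k}$ in the GLR test is accepted (a changepoint is detected) if
$$l_t\big(\hat{k}, \hat{\nu}(\hat{k})\big) > h,$$
where the threshold $h$ characterizes the hypothesis test.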
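To make the double optimization concrete, the following Python sketch applies the GLR test to the simplest special case of the state-space model above. It assumes a scalar state with $A_t = C_t = 1$, no input and no process noise, a known initial level `mu0`, and a known measurement-noise standard deviation `sigma`; these assumptions are made only for illustration. A jump of size $\nu$ then shifts every later observation by $\nu$. For a fixed candidate index, the maximizing $\hat{\nu}$ is the mean residual from that index on, and $l_t = n_k \hat{\nu}^2 / \sigma^2$ in closed form, where $n_k$ is the number of shifted samples. The function name `glr_mean_shift` and the default threshold `h` are hypothetical.

```python
import numpy as np


def glr_mean_shift(y, mu0=0.0, sigma=1.0, h=10.0):
    """GLR test for one jump in the mean of Gaussian data with known mu0 and sigma.

    Returns (detected, k_hat, nu_hat, stat), where k_hat is the index of the
    first shifted sample and stat = l_t(k_hat, nu_hat) = 2 * log-likelihood ratio.
    """
    resid = np.asarray(y, dtype=float) - mu0
    best_stat, k_hat, nu_hat = -np.inf, None, None
    for k in range(len(resid)):
        seg = resid[k:]
        nu = seg.mean()                       # inner maximization over the jump size
        stat = len(seg) * nu**2 / sigma**2    # 2 * log LR evaluated at nu
        if stat > best_stat:                  # outer maximization over the jump time
            best_stat, k_hat, nu_hat = stat, k, nu
    return best_stat > h, k_hat, nu_hat, best_stat


# Example: a level shift of 3 after 50 samples of pure noise.
rng = np.random.default_rng(0)
y = np.r_[rng.normal(0, 1, 50), rng.normal(3, 1, 50)]
print(glr_mean_shift(y))
```

On noisy data like the example, `k_hat` is expected to fall near index 50. The threshold `h` trades false alarms against missed or delayed detections. In the general state-space case, the residuals would come from a Kalman filter instead of a fixed level.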
2.2 Bayesian Online Changepoint Detection
The Bayesian approach to detecting abrupt changes in time series has been well studied. In this section, we summarize the works of Barry and Hartigan (1993), Paquet (2007), Adams and MacKay (2007), and Garnett et al. (2009) to give an overall picture of the Bayesian approach.
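Let $x_1, x_2, \dots, x_T$ be a sequence of observations that is divided into non-overlapping product partitions, where the changepoints are the delineations between these partitions. For each partition $\rho$, the data within it are assumed to be i.i.d. draws from a probability distribution $P(x_t \mid \eta_\rho)$, while the parameters $\eta_\rho$, $\rho = 1, 2, \dots$, are assumed to be i.i.d. as well. Define $x_t^{(r)}$ as the set of observations associated with the run $r_t$. The Bayesian approach estimates the posterior distribution over the current run length $r_t$ (i.e., the length of time since the last changepoint), given the data observed so far:
$$P(r_t \mid x_{1:t}) = \frac{P(r_t, x_{1:t})}{P(x_{1:t})}.$$
The model then computes the predictive distribution conditional on $r_t$ and integrates over the posterior distribution on the current run length to obtain the marginal predictive distribution
$$P(x_{t+1} \mid x_{1:t}) = \sum_{r_t} P\big(x_{t+1} \mid r_t, x_t^{(r)}\big)\, P(r_t \mid x_{1:t}).$$
A recursive message-passing algorithm is developed for the joint distribution over the current run length and the data,
$$P(r_t, x_{1:t}) = \sum_{r_{t-1}} P(r_t \mid r_{t-1})\, P\big(x_t \mid r_{t-1}, x_t^{(r)}\big)\, P(r_{t-1}, x_{1:t-1}),$$
based on two calculations: 1) the prior over $r_t$ given $r_{t-1}$, and 2) the predictive distribution over the newly observed datum, given the data since the last changepoint. The changepoint prior is specified through a hazard function $H(\tau)$: $P(r_t = 0 \mid r_{t-1}) = H(r_{t-1} + 1)$, $P(r_t = r_{t-1} + 1 \mid r_{t-1}) = 1 - H(r_{t-1} + 1)$, and zero otherwise. A recursive algorithm must define not only the recurrence relation but also the initialization conditions. Thus, the prior over the initial run length is the normalized survival function
$$P(r_0 = \tau) = \frac{1}{Z} S(\tau), \qquad S(\tau) = \sum_{t = \tau + 1}^{\infty} P_{\text{gap}}(g = t),$$
where $Z$ is a normalizing constant and $P_{\text{gap}}$ is the prior distribution of the gap between changepoints.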
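As a small numerical illustration of this initialization, assume a constant hazard $H$, so that the gaps are geometric, $P_{\text{gap}}(g = t) = H(1 - H)^{t - 1}$, and the survival function is $S(\tau) = (1 - H)^{\tau}$. The sketch below truncates the support to a finite `horizon` before normalizing; the truncation and the helper name `initial_run_length_prior` are assumptions made for illustration.

```python
import numpy as np


def initial_run_length_prior(hazard, horizon):
    """P(r_0 = tau) = S(tau) / Z for tau = 0, ..., horizon - 1 (constant hazard)."""
    tau = np.arange(horizon)
    survival = (1.0 - hazard) ** tau        # S(tau) = P(gap > tau) for geometric gaps
    return survival / survival.sum()        # Z normalizes over the truncated support


prior = initial_run_length_prior(hazard=0.1, horizon=50)
print(prior[:3])
```

Without truncation, $Z = \sum_{\tau \ge 0} (1 - H)^{\tau} = 1/H$, so the prior is exactly $H(1 - H)^{\tau}$; the truncated version differs only by the tail mass beyond `horizon`.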
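Furthermore, when the problem is addressed with conjugate-exponential models, the run-length-specific posteriors are summarized by sufficient statistics that are updated in closed form:
$$\nu_t^{(r)} = \nu_{\text{prior}} + r_t, \qquad \chi_t^{(r)} = \chi_{\text{prior}} + \sum_{t' \in r_t} u(x_{t'}),$$
where $u(\cdot)$ denotes the sufficient statistics of the exponential-family likelihood.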
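The whole algorithm therefore alternates between evaluating the predictive probability of each new datum under every run length, computing the growth and changepoint probabilities, normalizing them to obtain $P(r_t \mid x_{1:t})$, and updating the sufficient statistics.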
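The recursion can be sketched in Python for one concrete conjugate model, assumed here for illustration: Gaussian observations with known variance `var_x`, a Gaussian prior $N(\text{mu0}, \text{var0})$ on each segment mean, and a constant hazard. The sketch works in log space for numerical stability and initializes with $P(r_0 = 0) = 1$ (a changepoint assumed just before the first datum) rather than the survival-function prior. The function name and default parameters are hypothetical.

```python
import numpy as np


def bocpd_gaussian(x, hazard=0.01, mu0=0.0, var0=1.0, var_x=1.0):
    """Bayesian online changepoint detection for Gaussian data with known variance.

    Returns R with R[t, r] = P(r_t = r | x_{1:t}) for t = 0..T (lower triangular).
    """
    T = len(x)
    log_R = np.full((T + 1, T + 1), -np.inf)
    log_R[0, 0] = 0.0                          # P(r_0 = 0) = 1
    mu = np.array([mu0])                       # posterior mean of the segment mean, per run length
    prec = np.array([1.0 / var0])              # posterior precision, per run length
    log_h, log_1mh = np.log(hazard), np.log1p(-hazard)
    for t in range(1, T + 1):
        xt = x[t - 1]
        # Predictive probability pi_t^(r) of x_t under each run length r_{t-1}
        pred_var = 1.0 / prec + var_x
        log_pi = -0.5 * (np.log(2 * np.pi * pred_var) + (xt - mu) ** 2 / pred_var)
        prev = log_R[t - 1, :t]
        growth = prev + log_pi + log_1mh                     # r_t = r_{t-1} + 1
        change = np.logaddexp.reduce(prev + log_pi + log_h)  # r_t = 0
        joint = np.concatenate(([change], growth))
        log_R[t, :t + 1] = joint - np.logaddexp.reduce(joint)  # divide by the evidence
        # Conjugate update of the sufficient statistics; run length 0 restarts at the prior
        new_prec = prec + 1.0 / var_x
        new_mu = (mu * prec + xt / var_x) / new_prec
        mu = np.concatenate(([mu0], new_mu))
        prec = np.concatenate(([1.0 / var0], new_prec))
    return np.exp(log_R)


rng = np.random.default_rng(1)
x = np.r_[rng.normal(0, 1, 100), rng.normal(4, 1, 100)]
R = bocpd_gaussian(x)
map_run = R.argmax(axis=1)   # most probable run length at each time step
```

Row `R[t]` is the posterior $P(r_t \mid x_{1:t})$. A sudden drop of the most probable run length back toward zero signals a changepoint. Storing the full $(T+1) \times (T+1)$ matrix is convenient for inspection; a production version would usually keep only the current row.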
2.3 The Subspace Methods for Online Changepoint Detection
Changepoint detection in time-series data based on subspace identification employs geometric approaches to estimate a linear state-space model Kawahara et al. (2007). Takeuchi and Yamanishi (2006) proposed a framework in which an autoregressive (AR) model is fitted recursively, which allows it to handle non-stationary time series. In addition, changepoint detection algorithms based on singular-spectrum analysis (SSA) were proposed by Moskvina and Zhigljavsky (2003).
Consider a discrete-time, wide-sense stationary vector process $y_t$, which models the signal of an unknown stochastic system as a discrete-time linear state-space system:
$$x_{t+1} = A x_t + v_t, \qquad y_t = C x_t + w_t,$$
where $x_t$ is a state vector, $v_t$ and $w_t$ are the system and observation noises, respectively, and $A$ and $C$ are the system matrices. The key problem solved by subspace identification is the consistent estimation of the column space of the extended observability matrix
$$\mathcal{O}_k = \begin{bmatrix} C^\top & (CA)^\top & \cdots & (CA^{k-1})^\top \end{bmatrix}^\top.$$
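Once the extended observability matrix is obtained, the system matrices and the Kalman gain can be derived by rewriting the above equations in the innovation form
$$x_{t+1} = A x_t + K e_t, \qquad y_t = C x_t + e_t,$$
where $e_t$ is an innovation process (the error process of the model) and $K$ is the stationary Kalman gain. The extended observability matrix is then estimated as
$$\Sigma_{ff}^{-1/2}\, \Sigma_{fp}\, \Sigma_{pp}^{-1/2} = U S V^\top \approx U_1 S_1 V_1^\top, \qquad \hat{\mathcal{O}}_k = \Sigma_{ff}^{1/2}\, U_1 S_1^{1/2},$$
where the subscript $p$ denotes the past and $f$ the future, and the covariance matrices are computed from the matrices obtained by factorizing the block Hankel matrices of past and future observations.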
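The estimation of $\hat{\mathcal{O}}_k$ can be sketched in Python as a canonical-correlation computation between stacked past and future windows, following the form $\hat{\mathcal{O}}_k = \Sigma_{ff}^{1/2} U_1 S_1^{1/2}$. This is a minimal sketch under stated assumptions: the series is zero-mean, the covariances are formed directly from the block Hankel matrices rather than through a numerically more careful factorization, and the system order `n` is given. The names `estimate_observability` and `_sym_power` are hypothetical.

```python
import numpy as np


def _sym_power(S, p):
    """S**p for a symmetric positive-definite matrix S, via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * w**p) @ V.T


def estimate_observability(y, k, n):
    """Estimate O_k (shape (k*d, n)) and the canonical correlations s from a zero-mean series."""
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    N = len(y) - 2 * k + 1                      # number of Hankel columns

    def block_hankel(start):
        return np.column_stack([y[start + j:start + j + k].ravel() for j in range(N)])

    Yp, Yf = block_hankel(0), block_hankel(k)   # past and future block Hankel matrices
    S_pp, S_ff, S_fp = Yp @ Yp.T / N, Yf @ Yf.T / N, Yf @ Yp.T / N
    M = _sym_power(S_ff, -0.5) @ S_fp @ _sym_power(S_pp, -0.5)
    U, s, _ = np.linalg.svd(M)
    O_hat = _sym_power(S_ff, 0.5) @ U[:, :n] @ np.diag(np.sqrt(s[:n]))
    return O_hat, s


# Example: AR(1) state observed with small measurement noise (order n = 1).
rng = np.random.default_rng(0)
x = np.zeros(3000)
for t in range(1, len(x)):
    x[t] = 0.9 * x[t - 1] + rng.normal()
O_hat, s = estimate_observability(x + 0.1 * rng.normal(size=len(x)), k=5, n=1)
print(O_hat.ravel(), s[:2])
```

The singular values `s` are sample canonical correlations between past and future, so they lie in $[0, 1]$; a sharp drop after the first `n` of them is one informal way to choose the order.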
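A subsequence of length $k$ can be expressed as
$$y_k(t) = \mathcal{O}_k x_t + \Psi_k e_k(t),$$
where $y_k(t) = [y_t^\top, \dots, y_{t+k-1}^\top]^\top$, $e_k(t)$ is formed analogously from the innovations, and $\Psi_k$ is defined as
$$\Psi_k = \begin{bmatrix} I & 0 & \cdots & 0 \\ CK & I & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ CA^{k-2}K & CA^{k-3}K & \cdots & I \end{bmatrix}.$$
Moreover, aligning these subsequences as the columns of a block Hankel matrix gives
$$Y_{k,N}(t) = \mathcal{O}_k X_N(t) + \Psi_k E_{k,N}(t).$$
Hence, the subspace spanned by the column vectors of $Y_{k,N}(t)$ is contained in the span of the columns of $\mathcal{O}_k$ plus that of $\Psi_k$. The following distance, which quantifies the gap between subspaces, can then be used as a measure of change in the time series:
$$D(t) = \big\lVert Y_{k,N}(t) - U_1 U_1^\top Y_{k,N}(t) \big\rVert_F^2,$$
where $U_1$ is computed by the SVD of the extended observability matrix $\hat{\mathcal{O}}_k$, which is estimated by subspace identification using the data in the reference interval. In summary, the procedure estimates $\hat{\mathcal{O}}_k$ from the reference interval, extracts $U_1$, forms the Hankel matrix of the test interval, and signals a changepoint when $D(t)$ exceeds a threshold.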
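The following Python sketch computes a change score in the spirit of this distance, with one simplification stated plainly: instead of estimating $\hat{\mathcal{O}}_k$ by subspace identification, it takes $U_1$ as the leading `r` left singular vectors of the reference interval's Hankel matrix, which is closer to the SSA-based approach of Moskvina and Zhigljavsky (2003). The score is divided by the energy of the test Hankel matrix so that it is scale-free. The names `hankel` and `subspace_distance` and the parameters `k` and `r` are hypothetical.

```python
import numpy as np


def hankel(x, k):
    """Columns are the length-k windows of a 1-D series: shape (k, len(x) - k + 1)."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([x[i:i + k] for i in range(len(x) - k + 1)])


def subspace_distance(ref, test, k=20, r=2):
    """Relative squared residual of the test windows outside span(U_1) of the reference."""
    U, _, _ = np.linalg.svd(hankel(ref, k), full_matrices=False)
    U1 = U[:, :r]                         # orthonormal basis of the reference subspace
    Y = hankel(test, k)
    resid = Y - U1 @ (U1.T @ Y)           # component of Y orthogonal to span(U_1)
    return np.linalg.norm(resid) ** 2 / np.linalg.norm(Y) ** 2


t = np.arange(300)
ref = np.sin(2 * np.pi * t[:200] / 20)
print(subspace_distance(ref, np.sin(2 * np.pi * t[200:] / 20)))  # same regime
print(subspace_distance(ref, np.sin(2 * np.pi * t[200:] / 7)))   # changed frequency
```

A pure sinusoid has a rank-2 Hankel matrix, so `r=2` captures the unchanged regime almost exactly, while the changed frequency leaves most of its energy outside the reference subspace. In an online setting, both windows would slide forward and the score would be compared with a threshold.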
3 Alternative Model-Free Change Detection Algorithms
3.1 Online Kernel Change Detection Algorithm
In this section, we follow the works of Desobry et al. (2005) and Harchaoui et al. (2009) to present a general, model-free framework for online abrupt change detection called the kernel change detection algorithm. As in other model-free techniques, abrupt changes are detected from descriptors extracted from the signal of interest.
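Let $X_1, \dots, X_n$ be a time series of independent random variables. Changepoint detection based on the observed sample $(X_1, \dots, X_n)$ consists of two steps:
1) Decide between $H_0$: $P_{X_1} = \cdots = P_{X_n}$ and $H_A$: there exists $1 < k^\star < n$ such that $P_{X_1} = \cdots = P_{X_{k^\star}} \neq P_{X_{k^\star + 1}} = \cdots = P_{X_n}$.
2) Estimate $k^\star$ from the sample $(X_1, \dots, X_n)$ if $H_A$ is true.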
To conduct the kernel changepoint analysis, the running-maximum-partition strategy is employed in a reproducing kernel Hilbert space (RKHS). Let $(\mathcal{X}, d)$ be a separable measurable metric space, and let $X$ be an $\mathcal{X}$-valued random variable with probability measure $P$. The expectation with respect to $P$ is denoted by $\mathbb{E}[\cdot]$, while the covariance is denoted by $\mathrm{Cov}(\cdot)$. Consider an RKHS $(\mathcal{H}, \langle \cdot, \cdot \rangle_{\mathcal{H}})$ of functions from $\mathcal{X}$ to $\mathbb{R}$ with reproducing kernel $k(\cdot, \cdot)$. The model makes two assumptions on the kernel: 1) the kernel is bounded, i.e., $\sup_{x \in \mathcal{X}} k(x, x) < \infty$; 2) for all probability distributions $P$ on $\mathcal{X}$, the RKHS associated with $k$ is dense in $L^2(P)$.
An efficient strategy for changepoint analysis is to select the partition of the sample that yields maximum heterogeneity between the two segments. Given a sample $(X_1, \dots, X_n)$ and a candidate changepoint $k$ in the interval $a < k < n - b$, assume that we can compute a measure of heterogeneity $\Delta_{n,k}$ between the segments $\{X_1, \dots, X_k\}$ and $\{X_{k+1}, \dots, X_n\}$. The "running-maximum-partition strategy" then consists in using $\max_{a < k < n - b} \Delta_{n,k}$ as a building block for changepoint analysis.
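Consider a sequence of independent observations $X_1, \dots, X_n \in \mathcal{X}$. For any $1 \le i \le j \le n$, the corresponding empirical mean elements and covariance operators are defined as
$$\hat{\mu}_{i:j} = \frac{1}{j - i + 1} \sum_{\ell = i}^{j} k(X_\ell, \cdot), \qquad \hat{\Sigma}_{i:j} = \frac{1}{j - i + 1} \sum_{\ell = i}^{j} \big(k(X_\ell, \cdot) - \hat{\mu}_{i:j}\big) \otimes \big(k(X_\ell, \cdot) - \hat{\mu}_{i:j}\big).$$
For all $k \in \{2, \dots, n - 1\}$, the kernel Fisher discriminant ratio (KFDR) is defined as
$$\mathrm{KFDR}_{n,k;\gamma} = \frac{k(n-k)}{n} \Big\lVert \Big(\frac{k}{n}\hat{\Sigma}_{1:k} + \frac{n-k}{n}\hat{\Sigma}_{k+1:n} + \gamma I\Big)^{-1/2} \big(\hat{\mu}_{k+1:n} - \hat{\mu}_{1:k}\big) \Big\rVert_{\mathcal{H}}^2,$$
where $\gamma > 0$ is a regularization parameter.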
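This model applies the running-maximum-partition strategy to obtain the building block of the test statistic for changepoint analysis. Define the kernel test statistic
$$T_{n;\gamma} = \max_{a_n < k < n - b_n} \frac{\mathrm{KFDR}_{n,k;\gamma} - d_{1,n,k;\gamma}(\hat{\Sigma}^W_{n,k})}{\sqrt{2}\, d_{2,n,k;\gamma}(\hat{\Sigma}^W_{n,k})},$$
where $\hat{\Sigma}^W_{n,k} = \frac{k}{n}\hat{\Sigma}_{1:k} + \frac{n-k}{n}\hat{\Sigma}_{k+1:n}$. The quantities
$$d_{1,n,k;\gamma}(\hat{\Sigma}^W_{n,k}) = \mathrm{Tr}\big\{(\hat{\Sigma}^W_{n,k} + \gamma I)^{-1}\hat{\Sigma}^W_{n,k}\big\}, \qquad d_{2,n,k;\gamma}(\hat{\Sigma}^W_{n,k}) = \mathrm{Tr}\big\{(\hat{\Sigma}^W_{n,k} + \gamma I)^{-2}(\hat{\Sigma}^W_{n,k})^2\big\}^{1/2}$$
are the normalizing constants that make the standardized $\mathrm{KFDR}_{n,k;\gamma}$ have zero mean and unit variance as $n$ tends to infinity. The maximum is searched within the interval $[a_n, n - b_n]$ with $a_n > 1$ and $b_n < n$. The algorithm then decides whether an abrupt change has occurred and, if so, estimates where it occurred.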
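Because the RKHS quantities are infinite-dimensional, the Python sketch below approximates the Gaussian-kernel feature map $k(X, \cdot)$ with random Fourier features, a substitution made here for illustration and not part of the cited method. It then evaluates $\mathrm{KFDR}_{n,k;\gamma}$ for every admissible $k$ as in its definition, using $\lVert (\Sigma + \gamma I)^{-1/2} d \rVert^2 = d^\top (\Sigma + \gamma I)^{-1} d$. It omits the $d_1$/$d_2$ standardization, so it returns the raw KFDR profile, whose argmax estimates the changepoint. The function names and the parameters `n_features`, `bandwidth`, `gamma`, and `margin` (playing the role of $a_n$ and $b_n$) are assumptions.

```python
import numpy as np


def rff(x, n_features=100, bandwidth=1.0, seed=0):
    """Random Fourier features approximating exp(-||x - y||^2 / (2 bandwidth^2))."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    W = rng.normal(scale=1.0 / bandwidth, size=(x.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(x @ W + b)


def kfdr_profile(x, gamma=1e-2, margin=5, **rff_kw):
    """KFDR_{n,k;gamma} for k = margin..n - margin (first segment = X_1..X_k); -inf elsewhere."""
    Phi = rff(x, **rff_kw)
    n, D = Phi.shape
    scores = np.full(n, -np.inf)
    for k in range(margin, n - margin + 1):
        A, B = Phi[:k], Phi[k:]
        # Pooled within-segment covariance (1/N normalization, as in the definition)
        S_w = (k / n) * np.cov(A, rowvar=False, bias=True) \
            + ((n - k) / n) * np.cov(B, rowvar=False, bias=True)
        diff = B.mean(axis=0) - A.mean(axis=0)
        scores[k] = (k * (n - k) / n) * diff @ np.linalg.solve(S_w + gamma * np.eye(D), diff)
    return scores


rng = np.random.default_rng(0)
x = np.r_[rng.normal(0, 1, 100), rng.normal(2, 1, 100)]
k_hat = kfdr_profile(x).argmax()   # number of samples before the estimated change
```

With the random-feature approximation, the cost per candidate $k$ is dominated by one $D \times D$ linear solve. An exact implementation would instead work with kernel Gram matrices of size $n \times n$.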
3.2 Changepoint Detection by Direct Density Ratio Estimation
The aforementioned model-free changepoint detection algorithms tend to be less accurate in high-dimensional problems because of the curse of dimensionality Vapnik (1998). To address this problem, direct density-ratio estimation estimates the ratio of probability densities directly, without going through density estimation Liu et al. (2013). Following this idea, Kawahara and Sugiyama (2012) applied the Kullback-Leibler importance estimation procedure (KLIEP) to changepoint detection.
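Let $y(t) \in \mathbb{R}^d$ be a $d$-dimensional time-series sample at time $t$. The goal of this model is to detect whether there exists a changepoint between two consecutive time intervals, called the reference and test intervals. Let
$$Y(t) = \big[y(t)^\top, y(t+1)^\top, \dots, y(t+k-1)^\top\big]^\top \in \mathbb{R}^{dk}$$
be the forward subsequence of length $k$ at time $t$. The likelihood ratio of the sequence sample $Y$ is then
$$s(Y) = \frac{p_{\text{te}}(Y)}{p_{\text{rf}}(Y)},$$
where $p_{\text{rf}}(Y)$ and $p_{\text{te}}(Y)$ are the probability density functions of the reference and test sequence samples, respectively. Let $t_{\text{rf}}$ and $t_{\text{te}}$ be the starting points of the reference and test intervals, respectively, and suppose we have $n_{\text{rf}}$ and $n_{\text{te}}$ sequence samples in the reference and test intervals. Hence $t_{\text{te}} = t_{\text{rf}} + n_{\text{rf}}$, and the hypothesis test for this model is given as
$$H_0:\ p(Y(i)) = p_{\text{rf}}(Y(i)) \ \text{ for } t_{\text{rf}} \le i < t,$$
$$H_1:\ p(Y(i)) = p_{\text{rf}}(Y(i)) \ \text{ for } t_{\text{rf}} \le i < t_{\text{te}}, \qquad p(Y(i)) = p_{\text{te}}(Y(i)) \ \text{ for } t_{\text{te}} \le i < t,$$
where $t = t_{\text{te}} + n_{\text{te}}$.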
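The likelihood ratio between the hypotheses $H_0$ and $H_1$ is
$$\Lambda = \frac{\prod_{i = t_{\text{rf}}}^{t_{\text{te}} - 1} p_{\text{rf}}(Y(i)) \prod_{i = t_{\text{te}}}^{t - 1} p_{\text{te}}(Y(i))}{\prod_{i = t_{\text{rf}}}^{t - 1} p_{\text{rf}}(Y(i))} = \prod_{i = t_{\text{te}}}^{t - 1} \frac{p_{\text{te}}(Y(i))}{p_{\text{rf}}(Y(i))}.$$
Therefore, the model can decide whether there exists a changepoint between the reference and test intervals by monitoring the logarithm of the likelihood ratio
$$S = \sum_{i = t_{\text{te}}}^{t - 1} \ln \frac{p_{\text{te}}(Y(i))}{p_{\text{rf}}(Y(i))}.$$
Based on $S$, the model detects a change if $S > \mu$, where $\mu > 0$ is a predetermined threshold. Thus, only the density ratio
$$w(Y) = \frac{p_{\text{te}}(Y)}{p_{\text{rf}}(Y)}$$
needs to be estimated.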
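The model solves this problem by using KLIEP, which first models the density ratio with a non-parametric Gaussian kernel model
$$\hat{w}(Y) = \sum_{l = 1}^{n_{\text{te}}} \alpha_l K_\sigma\big(Y, Y_{\text{te}}(l)\big),$$
where $\{\alpha_l\}_{l=1}^{n_{\text{te}}}$ are parameters to be fitted from samples, and $K_\sigma(Y, Y') = \exp\big(-\lVert Y - Y' \rVert^2 / (2\sigma^2)\big)$ is the Gaussian kernel function with mean $Y'$ and standard deviation $\sigma$. The parameters $\{\alpha_l\}$ are determined such that the empirical Kullback-Leibler divergence from $p_{\text{te}}(Y)$ to $\hat{p}_{\text{te}}(Y)$ $(= p_{\text{rf}}(Y)\hat{w}(Y))$ is minimized.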
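The solution to this problem can be obtained by solving the following convex optimization problem:
$$\max_{\{\alpha_l\}} \sum_{j = 1}^{n_{\text{te}}} \log\Big(\sum_{l = 1}^{n_{\text{te}}} \alpha_l K_\sigma\big(Y_{\text{te}}(j), Y_{\text{te}}(l)\big)\Big) \quad \text{s.t.} \quad \frac{1}{n_{\text{rf}}} \sum_{i = 1}^{n_{\text{rf}}} \sum_{l = 1}^{n_{\text{te}}} \alpha_l K_\sigma\big(Y_{\text{rf}}(i), Y_{\text{te}}(l)\big) = 1, \quad \alpha_1, \dots, \alpha_{n_{\text{te}}} \ge 0.$$
The equality constraint comes from the requirement that $\hat{w}(Y)$ be properly normalized, since $\hat{p}_{\text{te}}(Y)$ $(= p_{\text{rf}}(Y)\hat{w}(Y))$ is a probability density function. The non-negativity constraint reflects the non-negativity of the density-ratio function. After this optimization problem is solved, the change score $S$ is evaluated with $\hat{w}$ in place of the true density ratio and compared with the threshold $\mu$, while the reference and test intervals slide forward as new samples arrive.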
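A compact Python sketch of KLIEP and the resulting change score $S$ follows. It uses a projected gradient-ascent scheme: a gradient step on the log-likelihood term, projection onto the equality constraint, clipping to enforce non-negativity, and renormalization. The step size `lr`, the iteration count `n_iter`, the kernel width `sigma`, and all function names are assumptions, and this batch fit re-solves the problem from scratch for each pair of windows rather than updating $\alpha$ online.

```python
import numpy as np


def subsequences(y, k):
    """Rows are forward subsequences Y(t) = [y(t), ..., y(t + k - 1)], flattened."""
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    return np.stack([y[t:t + k].ravel() for t in range(len(y) - k + 1)])


def gauss_kernel(A, B, sigma):
    """K[i, l] = exp(-||A_i - B_l||^2 / (2 sigma^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma**2))


def kliep(Y_rf, Y_te, sigma=1.0, lr=1e-3, n_iter=2000):
    """Fit alpha in w_hat(Y) = sum_l alpha_l K_sigma(Y, Y_te(l))."""
    K_te = gauss_kernel(Y_te, Y_te, sigma)              # K[j, l] at test points
    b = gauss_kernel(Y_rf, Y_te, sigma).mean(axis=0)    # normalization over reference
    alpha = np.ones(len(Y_te)) / b.sum()                # feasible start: b @ alpha = 1
    for _ in range(n_iter):
        alpha += lr * K_te.T @ (1.0 / (K_te @ alpha))   # ascend sum_j log w_hat(Y_te(j))
        alpha += (1.0 - b @ alpha) * b / (b @ b)        # project onto b @ alpha = 1
        alpha = np.maximum(alpha, 0.0)                  # alpha_l >= 0
        alpha /= b @ alpha                              # restore normalization
    return alpha


def change_score(Y_rf, Y_te, sigma=1.0, **kw):
    """S = sum over test samples of log w_hat(Y_te(j))."""
    alpha = kliep(Y_rf, Y_te, sigma, **kw)
    return np.log(gauss_kernel(Y_te, Y_te, sigma) @ alpha).sum()


rng = np.random.default_rng(0)
ref = rng.normal(size=(50, 1))
print(change_score(ref, rng.normal(size=(50, 1))))        # no change
print(change_score(ref, rng.normal(size=(50, 1)) + 3.0))  # mean shift
```

For a real series, `subsequences` would build the $Y(t)$ rows of the reference and test intervals before calling `change_score`. The kernel width `sigma` strongly affects the score and is usually chosen by cross-validation rather than fixed.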
4 Conclusion
Changepoint detection has long been a subject worth studying and exploring, and a rich body of earlier literature and traditional models is devoted to it. Over the years, more and more new methodologies have been introduced to tackle abrupt changes in data series. In this literature review, we have summarized a selection of the best-known and most effective methods for detecting changepoints. As for future research directions, the research community is moving toward non-parametric, model-free algorithms for detecting changepoints, such as singular spectrum analysis and direct density-ratio estimation.
- Adams and MacKay (2007) Adams, R. P., and MacKay, D. J. C. (2007). Bayesian Online Changepoint Detection. arXiv preprint arXiv:0710.3742.
- Barry and Hartigan (1993) Barry, D., & Hartigan, J. A. (1993). A Bayesian Analysis for Change Point Problems. Journal of the American Statistical Association 88, 309–319.
- Basseville and Nikiforov (1993) Basseville, M., and Nikiforov, I. V. (1993). Detection of Abrupt Changes: Theory and Application. Englewood Cliffs: Prentice Hall.
- Chernoff and Zacks (1964) Chernoff, H., and Zacks, S. (1964). Estimating the Current Mean of a Normal Distribution which is Subjected to Changes in Time. The Annals of Mathematical Statistics 35, 999–1018.
- Chopin (2007) Chopin, N. (2007). Dynamic Detection of Change Points in Long Time Series. Annals of the Institute of Statistical Mathematics 59, 349–366.
- Desobry et al. (2005) Desobry, F., Davy, M., and Doncarli, C. (2005). An Online Kernel Change Detection Algorithm. IEEE Trans. Signal Processing 53, 2961–2974.
- Fearnhead and Liu (2007) Fearnhead, P., and Liu, Z. (2007). Online Inference for Multiple Changepoint Problems. Journal of the Royal Statistical Society: Series B 69, 589–605.
- Garnett et al. (2009) Garnett, R., Osborne, M. A., and Roberts, S. J. (2009). Sequential Bayesian Prediction in The Presence of Changepoints. In Proceedings of The 26th Annual International Conference on Machine Learning, 345–352.
- Guralnik and Srivastava (1999) Guralnik, V., and Srivastava, J. (1999). Event Detection from Time Series Data. In Proceedings of The fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 33–42.
- Gustafsson (1996) Gustafsson, F. (1996). The Marginalized Likelihood Ratio Test for Detecting Abrupt Changes. IEEE Transactions on Automatic Control 41, 66–78.
- Harchaoui et al. (2009) Harchaoui, Z., Moulines, E., and Bach, F. R. (2009). Kernel Changepoint Analysis. In Advances in Neural Information Processing Systems, 609–616.
- Idé and Tsuda (2007) Idé, T., and Tsuda, K. (2007). Change-point Detection Using Krylov Subspace Learning. In Proceedings of the 2007 SIAM International Conference on Data Mining, 515–520.
- Kadirkamanathan et al. (2002) Kadirkamanathan, V., Li, P., Jaward, M. H., and Fabri, S. G. (2002). Particle Filtering-based Fault Detection in Non-linear Stochastic Systems. International Journal of Systems Science 33, 259–265.
- Kawahara et al. (2007) Kawahara, Y., Yairi, T., and Machida, K. (2007). Changepoint Detection in Time-series Data based on Subspace Identification. In Seventh IEEE International Conference on Data Mining (ICDM 2007). 559–564.
- Kawahara and Sugiyama (2012) Kawahara, Y., and Sugiyama, M. (2012). Sequential Changepoint Detection based on Direct Density Ratio Estimation. Statistical Analysis and Data Mining: The ASA Data Science Journal 5, 114–127.
- Kerr (1987) Kerr, T. (1987). Decentralized Filtering and Redundancy Management for Multisensor Navigation. IEEE Transactions on Aerospace and Electronic Systems 1, 83–119.
- Laurent and Doncarli (1998) Laurent, H., and Doncarli, C. (1998). Stationarity Index for Abrupt Changes Detection in The Time Frequency Plane. IEEE Signal Processing Letters 5, 43–45.
- Liu et al. (2013) Liu, S., Yamada, M., Collier, N., and Sugiyama, M. (2013). Changepoint Detection in Time Series Data by Relative Density Ratio Estimation. Neural Networks 43, 72–83.
- Moskvina and Zhigljavsky (2003) Moskvina, V., and Zhigljavsky, A. (2003). An Algorithm based on Singular Spectrum Analysis for Changepoint Detection. Communications in Statistics-Simulation and Computation 32, 319–352.
- Paquet (2007) Paquet, U. (2007). Empirical Bayesian Changepoint Detection. Graphical Models, 1–20.
- Reeves et al. (2007) Reeves, J., Chen, J., Wang, X. L., Lund, R., and Lu, Q. Q. (2007). A Review and Comparison of Changepoint Detection Techniques for Climate Data. Journal of Applied Meteorology and Climatology 46, 900–915.
- Takeuchi and Yamanishi (2006) Takeuchi, J. I., and Yamanishi, K. (2006). A Unifying Framework for Detecting Outliers and Change Points from Time Series. IEEE Transactions on Knowledge and Data Engineering 18, 482–492.
- Turner et al. (2009) Turner, R., Saatci, Y., and Rasmussen, C. E. (2009). Adaptive Sequential Bayesian Changepoint Detection. In Temporal Segmentation Workshop at NIPS.
- Vapnik (1998) Vapnik, V. (1998). The Support Vector Method of Function Estimation. In Nonlinear Modeling, 55–85. Springer, Boston, MA.
- Yamanishi and Takeuchi (2002) Yamanishi, K., and Takeuchi, J. I. (2002). A Unifying Framework for Detecting Outliers and Change Points from Non-stationary Time Series Data. In Proceedings of The Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 676–681.
- Yamanishi et al. (2004) Yamanishi, K., Takeuchi, J. I., Williams, G., and Milne, P. (2004). Online Unsupervised Outlier Detection Using Finite Mixtures with Discounting Learning Algorithms. Data Mining and Knowledge Discovery 8, 275–300.