1 Introduction
Detecting abrupt changes in time-series data has attracted researchers in the statistics and data mining communities for decades (Basseville and Nikiforov, 1993). Based on the immediacy of detection, changepoint detection algorithms can be classified into two categories: online and offline. Online change detection targets data that require instantaneous responses, whereas offline detection tolerates a delay, which in turn allows more accurate results. This literature review focuses mainly on online changepoint detection algorithms.
Plenty of changepoint detection algorithms have been proposed and proved pragmatic. The pioneering works of Basseville and Nikiforov (1993) compared the probability distributions of time-series samples over the past and present intervals: an abrupt change is declared when the two distributions differ significantly. Various now-famous algorithms follow this approach to detect changepoints, such as the generalized likelihood-ratio method (Gustafsson, 1996) and ChangeFinder (Takeuchi and Yamanishi, 2006). More recently, subspace methods have been proposed, including subspace identification (Kawahara et al., 2007) and Krylov subspace learning (Idé and Tsuda, 2007). The aforementioned methods are all considered traditional: they rely on pre-designed parametric models, such as underlying probability distributions, autoregressive models, and state-space models, to track specific parameters (Liu et al., 2013).
As alternatives, several general model-free methods with no specific parametric assumptions have been proposed (Desobry et al., 2005), including time-frequency approaches and kernel density estimation. A common weakness of these algorithms, however, is that they tend to be less accurate in high-dimensional problems because of the curse of dimensionality (Vapnik, 1998). To overcome this problem, we introduce a newer strategy called direct density-ratio estimation.
In summary, this survey focuses on the aforementioned changepoint detection methods and discusses in detail how the algorithms detect abrupt changes. In Section 2, we explore the traditional model-based changepoint detection algorithms. Section 3 compares the traditional algorithms with the alternative model-free changepoint detection methods. In Section 4, we draw conclusions and present some future research directions.
2 Model-based Change Detection Algorithms
2.1 Generalized Likelihood Ratio
The generalized likelihood ratio (GLR) test, which builds on the framework of Basseville and Nikiforov (1993), is widely used for detecting abrupt changes in linear systems (Gustafsson, 1996). As summarized by Kerr (1987), the GLR test has an appealing analytical framework that suits systems equipped with Kalman filters, and it can also locate the physical cause of a change when one occurs abruptly.
In a linear state-space model, we represent the occurrence of an abrupt change by

x_{t+1} = A x_t + B u_t + v_t + delta_{k,t} nu,
y_t = C x_t + e_t,

where the observation is denoted y_t, the input u_t, and the state x_t. Here, the noises v_t and e_t are assumed to be Gaussian distributed and mutually independent. The state jump nu occurs at an unknown instant k; delta_{k,t} is a pulse function that takes the value of one if t = k and takes the value of zero otherwise. The set of measurements up to time t is denoted y^t. The likelihood function based on the observations up to time t, given the jump nu at time k, is denoted p(y^t | k, nu); the same notation is used for the conditional density function of y^t where k and nu are given. The likelihood ratio (LR) test is a multiple-hypothesis test, where different jump hypotheses are compared to the no-jump null hypothesis in a pairwise manner. In the LR test, the jump magnitude nu is given. The hypotheses under consideration are

H_0: no jump,
H_1(k, nu): a jump of magnitude nu at time k.
By introducing the log-likelihood ratio for the hypothesis test,

l_t(k, nu) = 2 log [ p(y^t | H_1(k, nu)) / p(y^t | H_0) ],

the GLR test is a double maximization over k and nu,

g_t = max_k l_t(k, nu_hat(k)),

where nu_hat(k) is the maximum-likelihood estimate of the jump magnitude given a jump at time k. The no-jump null hypothesis is rejected (a changepoint is detected) if

g_t > h,

where the threshold h characterizes the hypothesis test.
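To make the double maximization concrete, here is a minimal sketch for the simplest scalar case: a single jump in the mean of Gaussian noise with known variance, where the inner maximization over the jump magnitude has a closed-form solution. This is only an illustration, not Gustafsson's full Kalman-filter formulation; the function name and default threshold are assumptions.

```python
def glr_mean_shift(y, sigma=1.0, threshold=10.0):
    """GLR test for a single jump in the mean of Gaussian data with
    known noise level sigma (a simplified scalar analogue of the
    state-space formulation).  Returns (max_stat, k_hat), where k_hat
    is the most likely jump instant or None if no jump is detected."""
    n = len(y)
    best_stat, best_k = 0.0, None
    # Null hypothesis: zero mean throughout.  For each candidate jump
    # instant k, the ML estimate of the jump magnitude nu is the sample
    # mean of y[k:], and the log-likelihood ratio then has the closed
    # form (n - k) * nu_hat**2 / sigma**2.
    for k in range(1, n):
        seg = y[k:]
        nu_hat = sum(seg) / len(seg)
        stat = len(seg) * nu_hat ** 2 / sigma ** 2
        if stat > best_stat:
            best_stat, best_k = stat, k
    if best_stat > threshold:
        return best_stat, best_k
    return best_stat, None
```

For instance, a sequence of fifty zeros followed by fifty fives yields its maximal statistic at the true jump instant k = 50.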
2.2 Bayesian Online Changepoint Detection
Using the Bayesian approach to detect abrupt changes in time series has been well studied. In this section, we summarize the works of Barry and Hartigan (1993), Paquet (2007), Adams and MacKay (2007), and Garnett et al. (2009) to give a full picture of the Bayesian approach.
Let x_1, x_2, ... be a sequence of observations divided into non-overlapping product partitions, where the changepoints are the delineations between these partitions. For each partition rho, the data within it are assumed to be i.i.d. from a probability distribution P(x_t | eta_rho), while the parameters eta_rho are assumed to be i.i.d. as well. Let r_t denote the current run length (i.e., the length of time since the last changepoint) and x_t^(r) the set of observations associated with the run r_t. The Bayesian approach is conducted by estimating the posterior distribution over the current run length given the data observed so far,

P(r_t | x_{1:t}) = P(r_t, x_{1:t}) / P(x_{1:t}).

The model then computes the predictive distribution conditional on the run length and integrates over the posterior distribution on the current run length to obtain its marginal predictive distribution,

P(x_{t+1} | x_{1:t}) = sum over r_t of P(x_{t+1} | r_t, x_t^(r)) P(r_t | x_{1:t}).

A recursive message-passing algorithm is developed for the joint distribution over the current run length and the data, based on two calculations: 1) the conditional prior over r_t given r_{t-1}, and 2) the predictive distribution over the newly observed datum, given the data since the last changepoint. Furthermore, a recursive algorithm must define not only the recurrence relation but also the initialization conditions; thus, the prior over the initial run length is taken to be the normalized survival function of the changepoint prior. By addressing the whole problem with conjugate-exponential models, the per-run predictive distributions stay in closed form, and the whole algorithm reduces to a simple recursion over run lengths.
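As an illustration, the run-length recursion can be sketched for Gaussian data with known observation variance and a conjugate Normal prior on the mean. This is a minimal sketch assuming a constant hazard rate; the parameter values and function names are illustrative, not taken from the original papers.

```python
import math

def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def bocpd(data, hazard=0.01, mu0=0.0, var0=1.0, var_x=1.0):
    """Bayesian online changepoint detection (after Adams and MacKay, 2007)
    for Gaussian data with known observation variance var_x and a conjugate
    Normal(mu0, var0) prior on the mean.  Returns the MAP run length at
    each time step."""
    mus, variances = [mu0], [var0]   # posterior parameters per run length
    r_probs = [1.0]                  # P(r_0 = 0) = 1
    map_runs = []
    for x in data:
        # Predictive probability of x under each run-length hypothesis.
        preds = [gaussian_pdf(x, m, v + var_x) for m, v in zip(mus, variances)]
        # The run continues with probability 1 - hazard, resets with hazard.
        growth = [r * p * (1 - hazard) for r, p in zip(r_probs, preds)]
        cp_mass = hazard * sum(r * p for r, p in zip(r_probs, preds))
        r_probs = [cp_mass] + growth
        z = sum(r_probs)
        r_probs = [r / z for r in r_probs]
        # Conjugate Normal update for every continuing run length;
        # run length 0 restarts from the prior.
        new_mus, new_vars = [mu0], [var0]
        for m, v in zip(mus, variances):
            post_var = 1.0 / (1.0 / v + 1.0 / var_x)
            new_vars.append(post_var)
            new_mus.append(post_var * (m / v + x / var_x))
        mus, variances = new_mus, new_vars
        map_runs.append(max(range(len(r_probs)), key=r_probs.__getitem__))
    return map_runs
```

On a sequence with a single mean shift, the most probable run length grows steadily and then collapses at the change, which is the behaviour the message-passing recursion is designed to expose.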
2.3 The Subspace Methods for Online Changepoint Detection
Detecting changepoints in time-series data via subspace identification employs geometric approaches to estimate a linear state-space model (Kawahara et al., 2007). Takeuchi and Yamanishi (2006) proposed a framework in which an autoregressive (AR) model is fitted recursively, thereby handling nonstationary time series. Along related lines, changepoint detection algorithms based on singular-spectrum analysis (SSA) were proposed by Moskvina and Zhigljavsky (2003).
Consider a discrete-time wide-sense stationary vector process y_t, which models the signal of the unknown stochastic system as a discrete-time linear state-space system

x_{t+1} = A x_t + w_t,
y_t = C x_t + v_t,

where x_t is a state vector, w_t and v_t are the system and observation noises, respectively, and A and C are the system matrices. The key problem solved by subspace identification is the consistent estimation of the column space of the extended observability matrix O_k = [C; CA; ...; CA^{k-1}].
Once the extended observability matrix is obtained, we can derive the system matrices and the Kalman gain by rewriting the model in innovation form

x_{t+1} = A x_t + K e_t,
y_t = C x_t + e_t,

where e_t is an innovation process (the error process of the model) and K is the stationary Kalman gain. The extended observability matrix is then obtained from a factorization of the covariance matrix between the past and the future of the process, where the suffix p denotes the past, f denotes the future, and the covariance matrices are computed using the matrices obtained by the factorization.
A subsequence of the observations can be expressed by stacking k consecutive samples, y_{t,k} = [y_t; y_{t+1}; ...; y_{t+k-1}], which, by the state-space equations, decomposes into a term driven by the state and a term driven by the noises. Moreover, by aligning these stacked vectors over time according to the structure of a Hankel matrix, the subspace spanned by the column vectors of the extended observability matrix is equivalent to the span of the noise-free Hankel matrix of the data. Then the following distance, which quantifies the gap between the subspace spanned by the reference data and the subspace spanned by the test data, can be used as a measure of change in the time series: the gap is computed from the principal angles between the two subspaces, where the reference subspace is obtained by the SVD of the extended observability matrix estimated by subspace identification using the data in the reference interval. The procedure for changepoint detection can be outlined as: estimate the reference subspace, compare it with the subspace spanned by the most recent data, and report a changepoint whenever the distance exceeds a threshold.
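The flavor of these subspace comparisons can be conveyed with a toy singular-spectrum-analysis score in pure Python: build lagged (Hankel-style) vectors from a reference and a test interval, extract the leading direction of the reference trajectory matrix by power iteration, and measure how much test energy falls outside it. This sketch uses a one-dimensional principal subspace and illustrative window sizes; it is not the full subspace-identification procedure.

```python
import math

def hankel_columns(series, window):
    """Lagged subsequences of `series` (columns of the trajectory matrix)."""
    return [series[i:i + window] for i in range(len(series) - window + 1)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def top_singular_vector(cols, iters=50):
    """Leading left singular vector of the matrix with columns `cols`,
    obtained by power iteration on X X^T."""
    v = [1.0] * len(cols[0])
    for _ in range(iters):
        coeffs = [dot(c, v) for c in cols]                 # X^T v
        w = [sum(c[i] * a for c, a in zip(cols, coeffs))   # X (X^T v)
             for i in range(len(v))]
        norm = math.sqrt(dot(w, w)) or 1.0
        v = [x / norm for x in w]
    return v

def ssa_change_score(series, t, window=5, n_ref=20, n_test=20):
    """SSA-style change score at time t: the fraction of the energy of the
    test-interval lagged vectors lying outside the one-dimensional principal
    subspace of the reference interval."""
    ref = hankel_columns(series[t - n_ref - window:t], window)
    test = hankel_columns(series[t:t + n_test + window], window)
    u = top_singular_vector(ref)
    num = sum(dot(c, c) - dot(c, u) ** 2 for c in test)
    den = sum(dot(c, c) for c in test) or 1.0
    return num / den
```

A constant signal that switches to an oscillation scores near zero before the change and near one at the change, since the oscillating lagged vectors are orthogonal to the constant principal direction.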
3 Alternative Model-Free Change Detection Algorithms
3.1 Online Kernel Change Detection Algorithm
In this section, we draw on the well-known works of Desobry et al. (2005) and Harchaoui et al. (2009) to present a general, model-free framework for online abrupt change detection called the kernel change detection (KCD) algorithm. As in other model-free techniques, the detection of abrupt changes is based on descriptors extracted from the signal of interest.
Let x_1, x_2, ... be a time series of independent random variables. Changepoint detection based on an observed sample x_1, ..., x_n consists of two steps:

1. Decide between the null hypothesis H_0: P_{x_1} = P_{x_2} = ... = P_{x_n} and the alternative H_A: there exists 1 < k* < n such that P_{x_1} = ... = P_{x_{k*}} differs from P_{x_{k*+1}} = ... = P_{x_n}.

2. Estimate k* from the sample if H_A is true.
To conduct the kernel changepoint analysis, a running maximum partition strategy is employed in a reproducing kernel Hilbert space. Let X be a separable measurable metric space, and let x be an X-valued random variable with probability measure P. The expectation with respect to P is denoted E[.] and the covariance operator Sigma. Consider a reproducing kernel Hilbert space (RKHS) H of functions on X with kernel k(., .). The model makes the following two assumptions on the kernel: 1) the kernel is bounded, i.e., sup_x k(x, x) is finite; 2) for all probability distributions P on X, the RKHS associated with k(., .) is dense in L2(P).
An efficient strategy for conducting the changepoint analysis is to select the partition of the sample that yields maximum heterogeneity between the segments before and after a candidate changepoint k within a given interval. Assuming we can compute a measure of heterogeneity Delta(k) between the segments x_1, ..., x_k and x_{k+1}, ..., x_n, the "running maximum partition strategy" consists in using max_k Delta(k) as a building block for changepoint analysis.
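A minimal sketch of the running maximum partition strategy for scalar data follows, using the squared maximum mean discrepancy (MMD) in the RKHS of an RBF kernel as the heterogeneity measure (a simpler stand-in for the kernel Fisher discriminant ratio used by the actual method); the function names, bandwidth, and minimum segment length are assumptions.

```python
import math

def rbf(x, y, gamma=1.0):
    return math.exp(-gamma * (x - y) ** 2)

def mmd2(xs, ys, gamma=1.0):
    """Squared maximum mean discrepancy between two samples, measured in
    the RKHS of the RBF kernel."""
    kxx = sum(rbf(a, b, gamma) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(rbf(a, b, gamma) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(rbf(a, b, gamma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy

def running_max_partition(sample, min_seg=5, gamma=1.0):
    """Running maximum partition strategy: scan every candidate split
    point, measure the kernel heterogeneity between the two segments,
    and return (best_score, best_split)."""
    best, best_k = -1.0, None
    for k in range(min_seg, len(sample) - min_seg):
        score = mmd2(sample[:k], sample[k:], gamma)
        if score > best:
            best, best_k = score, k
    return best, best_k
```

On a sample whose distribution shifts at index 20, the scan attains its maximum heterogeneity exactly at that split.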
Consider a sequence of independent observations x_1, ..., x_n. For any candidate split k, the corresponding empirical mean elements and covariance operators of the two segments x_1, ..., x_k and x_{k+1}, ..., x_n can be computed in the RKHS. The maximum kernel Fisher discriminant ratio (KFDR) is then defined as the regularized Fisher discriminant ratio between the two segments, computed in the RKHS. The model applies the running maximum partition strategy to obtain the building block of the test statistic for changepoint analysis: the kernel test statistic is the maximum, over candidate splits k, of the KFDR recentered and rescaled by normalizing constants chosen so that it has zero mean and unit variance as n tends to infinity. The maximum is searched within an interval [a_n, b_n] with a_n > 1 and b_n < n. The algorithm then yields the result of whether an abrupt change has occurred and, if so, where it occurred.

3.2 Changepoint Detection by Direct Density-Ratio Estimation
The aforementioned model-free changepoint detection algorithms tend to be less accurate in high-dimensional problems because of the curse of dimensionality (Vapnik, 1998). To solve this problem, we introduce a strategy called direct density-ratio estimation, which estimates the ratio of probability densities directly, without going through density estimation (Liu et al., 2013). Following this idea, models such as the Kullback-Leibler importance estimation procedure (KLIEP) were established (Kawahara and Sugiyama, 2012).
Let y(t) be a d-dimensional time-series sample at time t. The goal of this model is to detect whether there exists a changepoint between two consecutive time intervals, called the reference and test intervals. Let Y(t) = [y(t), y(t+1), ..., y(t+k-1)] be the forward subsequence of length k at time t. The likelihood ratio of a sequence sample Y is

w(Y) = p_te(Y) / p_rf(Y),

where p_rf(Y) and p_te(Y) are the probability density functions of the reference and test sequence samples, respectively. Let t_rf and t_te be the starting points of the reference and test intervals, respectively, and suppose we have n_rf and n_te sequence samples in the reference and test intervals, so that t_te = t_rf + n_rf. Accordingly, the hypothesis test for this model is given as

H_0: p(Y(i)) = p_rf(Y(i)) for all reference and test samples,
H_1: p(Y(i)) = p_rf(Y(i)) for the reference samples and p(Y(i)) = p_te(Y(i)) for the test samples.

The likelihood ratio between the hypotheses H_0 and H_1 is the product, over the test samples, of p_te(Y(i)) / p_rf(Y(i)). Therefore, the model can decide whether there exists a changepoint between the reference and test intervals by monitoring the logarithm of the likelihood ratio,

S = sum over the test samples of log [ p_te(Y(i)) / p_rf(Y(i)) ],

and it detects a change if S exceeds a predetermined threshold. The quantity to be estimated is thus the density ratio w(Y) = p_te(Y) / p_rf(Y).
The model solves this problem by using KLIEP. KLIEP first models the density ratio with a nonparametric Gaussian kernel model

w_hat(Y) = sum over l of alpha_l K_sigma(Y, Y(t_te + l - 1)),

where the alpha_l are the parameters to be fitted from the samples, and K_sigma(Y, Y') is the Gaussian kernel function with width sigma centered at Y'. The parameters alpha_l in this model are determined such that the empirical Kullback-Leibler divergence from p_te(Y) to p_hat_te(Y) (= w_hat(Y) p_rf(Y)) is minimized. The solution can be obtained by solving the following convex optimization problem: maximize, over the alpha_l, the sum over the test samples of log w_hat(Y(i)), subject to the equality constraint that the average of w_hat(Y(i)) over the reference samples equals one, and subject to alpha_l >= 0 for all l. The equality constraint comes from the requirement that p_hat_te(Y) (= w_hat(Y) p_rf(Y)) should be properly normalized, since it is a probability density function; the non-negativity constraint reflects the non-negativity of the density-ratio function. After solving this optimization problem, one can detect the changepoints in a data series by sliding the reference and test intervals along the series and monitoring the score S computed from the estimated ratio.
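The fitting and monitoring steps above can be sketched for scalar data with projected gradient ascent, using a rescaling step to enforce the normalization constraint. This is a simplified illustration of KLIEP, not the authors' implementation; the learning rate, kernel width, and function names are assumptions.

```python
import math

def gauss_k(x, c, sigma=1.0):
    return math.exp(-(x - c) ** 2 / (2 * sigma ** 2))

def kliep(ref, test, sigma=1.0, lr=0.01, iters=500):
    """Simplified one-dimensional KLIEP: fit w(x) = sum_l alpha_l K(x, c_l),
    with the test points as kernel centres, by projected gradient ascent on
    the mean test log-likelihood, rescaling after every step so that w
    averages to one over the reference sample (alpha_l >= 0)."""
    centres = list(test)
    K_te = [[gauss_k(x, c, sigma) for c in centres] for x in test]
    # Mean kernel activation over the reference sample (constraint row).
    b = [sum(gauss_k(x, c, sigma) for x in ref) / len(ref) for c in centres]
    alpha = [1.0 / len(centres)] * len(centres)
    for _ in range(iters):
        grad = [0.0] * len(alpha)
        for row in K_te:
            w_j = sum(a * k for a, k in zip(alpha, row))
            for l, k_val in enumerate(row):
                grad[l] += k_val / (w_j * len(K_te))
        alpha = [max(0.0, a + lr * g) for a, g in zip(alpha, grad)]
        # Rescale so the reference-sample mean of w equals one.
        z = sum(a * bb for a, bb in zip(alpha, b))
        alpha = [a / z for a in alpha]
    def w(x):
        return sum(a * gauss_k(x, c, sigma) for a, c in zip(alpha, centres))
    return w

def change_score(ref, test, **kw):
    """Mean log density ratio over the test interval; large positive values
    indicate a change between the two intervals."""
    w = kliep(ref, test, **kw)
    return sum(math.log(w(x)) for x in test) / len(test)
```

When the test interval sits far from the reference distribution, the normalization constraint forces the fitted ratio to be large on the test points, so the monitored log-ratio score grows; when the two intervals overlap, the score stays small.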
4 Conclusion
Changepoint detection has long been a subject worth studying and exploring. There is a wealth of classical literature and traditional models devoted to this subject, and over the years more and more new methodologies have been introduced to tackle abrupt changes in data series. In this literature review, we have summarized a portion of the most famous and effective methods for detecting changepoints. As for future research directions, the community is now heading toward nonparametric, model-free algorithms for detecting changepoints, such as singular-spectrum methods and direct density-ratio estimation methods.
References
Adams, R. P., and MacKay, D. J. C. (2007). Bayesian Online Changepoint Detection. arXiv preprint arXiv:0710.3742.
Barry, D., and Hartigan, J. A. (1993). A Bayesian Analysis for Change Point Problems. Journal of the American Statistical Association 88, 309-319.
Basseville, M., and Nikiforov, I. V. (1993). Detection of Abrupt Changes: Theory and Application. Englewood Cliffs: Prentice Hall.
Chernoff, H., and Zacks, S. (1964). Estimating the Current Mean of a Normal Distribution which is Subjected to Changes in Time. The Annals of Mathematical Statistics 35, 999-1018.
Chopin, N. (2007). Dynamic Detection of Change Points in Long Time Series. Annals of the Institute of Statistical Mathematics 59, 349-366.
Desobry, F., Davy, M., and Doncarli, C. (2005). An Online Kernel Change Detection Algorithm. IEEE Transactions on Signal Processing 53, 2961-2974.
Fearnhead, P., and Liu, Z. (2007). On-line Inference for Multiple Changepoint Problems. Journal of the Royal Statistical Society: Series B 69, 589-605.
Garnett, R., Osborne, M. A., and Roberts, S. J. (2009). Sequential Bayesian Prediction in the Presence of Changepoints. In Proceedings of the 26th Annual International Conference on Machine Learning, 345-352.
Guralnik, V., and Srivastava, J. (1999). Event Detection from Time Series Data. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 33-42.
Gustafsson, F. (1996). The Marginalized Likelihood Ratio Test for Detecting Abrupt Changes. IEEE Transactions on Automatic Control 41, 66-78.
Harchaoui, Z., Moulines, E., and Bach, F. R. (2009). Kernel Change-point Analysis. In Advances in Neural Information Processing Systems, 609-616.
Idé, T., and Tsuda, K. (2007). Change-point Detection Using Krylov Subspace Learning. In Proceedings of the 2007 SIAM International Conference on Data Mining, 515-520.
Kadirkamanathan, V., Li, P., Jaward, M. H., and Fabri, S. G. (2002). Particle Filtering-based Fault Detection in Nonlinear Stochastic Systems. International Journal of Systems Science 33, 259-265.
Kawahara, Y., Yairi, T., and Machida, K. (2007). Change-point Detection in Time-series Data based on Subspace Identification. In Seventh IEEE International Conference on Data Mining (ICDM 2007), 559-564.
Kawahara, Y., and Sugiyama, M. (2012). Sequential Change-point Detection based on Direct Density-Ratio Estimation. Statistical Analysis and Data Mining: The ASA Data Science Journal 5, 114-127.
Kerr, T. (1987). Decentralized Filtering and Redundancy Management for Multisensor Navigation. IEEE Transactions on Aerospace and Electronic Systems 1, 83-119.
Laurent, H., and Doncarli, C. (1998). Stationarity Index for Abrupt Changes Detection in the Time-Frequency Plane. IEEE Signal Processing Letters 5, 43-45.
Lee, H., Grosse, R., Ranganath, R., and Ng, A. Y. (2009). Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations. In Proceedings of the 26th Annual International Conference on Machine Learning, 609-616.
Liu, S., Yamada, M., Collier, N., and Sugiyama, M. (2013). Change-point Detection in Time-Series Data by Relative Density-Ratio Estimation. Neural Networks 43, 72-83.
Moskvina, V., and Zhigljavsky, A. (2003). An Algorithm based on Singular Spectrum Analysis for Change-point Detection. Communications in Statistics - Simulation and Computation 32, 319-352.
Paquet, U. (2007). Empirical Bayesian Change Point Detection. Graphical Models, 1-20.
Reeves, J., Chen, J., Wang, X. L., Lund, R., and Lu, Q. Q. (2007). A Review and Comparison of Changepoint Detection Techniques for Climate Data. Journal of Applied Meteorology and Climatology 46, 900-915.
Takeuchi, J. I., and Yamanishi, K. (2006). A Unifying Framework for Detecting Outliers and Change Points from Time Series. IEEE Transactions on Knowledge and Data Engineering 18, 482-492.
Turner, R., Saatci, Y., and Rasmussen, C. E. (2009). Adaptive Sequential Bayesian Change Point Detection. In Temporal Segmentation Workshop at NIPS.
Vapnik, V. (1998). The Support Vector Method of Function Estimation. In Nonlinear Modeling, 55-85. Springer, Boston, MA.
Yamanishi, K., and Takeuchi, J. I. (2002). A Unifying Framework for Detecting Outliers and Change Points from Non-stationary Time Series Data. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 676-681.
Yamanishi, K., Takeuchi, J. I., Williams, G., and Milne, P. (2004). On-line Unsupervised Outlier Detection Using Finite Mixtures with Discounting Learning Algorithms. Data Mining and Knowledge Discovery 8, 275-300.