1 Introduction
Conformal Martingales (martingales based on the conformal prediction framework, see Vovk et al. 2005, Section 7.1) are known as a valid tool for testing the exchangeability and i.i.d. assumptions. They were proposed in Vovk et al. (2003) and later generalized in Fedorova et al. (2012).
One fairly widespread example of non-i.i.d. data is data with change-points (CPs) (see Basseville and Nikiforov 1993; Tartakovsky et al. 2014): we assume an online observation scheme in which observations are i.i.d. before some moment of time (the change-point) and are also i.i.d. after it, but with a different distribution. Overall, therefore, the observations are not i.i.d., which is why Conformal Martingales (CMs) can be applied to CP detection.
CP detection problems span many applied areas and include automatic video surveillance based on motion features (Pham et al., 2014), intrusion detection in computer networks (Tartakovsky et al., 2006), anomaly detection in data transmission networks (Casas et al., 2010), anomaly detection for malicious activity (Burnaev et al., 2015a, b; Burnaev and Smolyakov, 2016), change-point detection in software-intensive systems (Artemov et al., 2016; Artemov and Burnaev, 2016a, b), fault detection in vehicle control systems (Malladi and Speyer, 1999; Alestra et al., 2014), detection of the onset of an epidemic (MacNeill and Mao, 1995), drinking water monitoring (Guépié et al., 2012) and many others.
Standard statistics for change-point detection, such as Cumulative Sum (CUSUM, Page 1954) and Shiryaev-Roberts (SR, Shiryaev 1963, Roberts 1966), make very strong assumptions about the data distributions both before and after the change-point. In practice, however, we usually do not know the change-point model.
The first attempt to use CMs for change-point detection was made in Ho (2005). However, only two different martingale tests were considered for CP detection.
A CM is defined by two main components: a conformity measure and a betting function (Vovk et al., 2003; Fedorova et al., 2012). Several approaches to defining conformity measures and betting functions now exist, so a whole zoo of CMs for CP detection can be constructed.
Therefore, the goal of our work is to

propose different versions of CMs for CP detection, based on available as well as newly designed conformity measures and betting functions, specially tailored for CP detection;

perform extensive comparison of these CMs with classical CP detection procedures.
As classical CP detection procedures we consider the CUSUM, Shiryaev-Roberts and Posterior Probability statistics. We also perform a comparison with CP detection oracles, which model a typical practical situation in which we have only imprecise information about the pre- and post-change data distributions. The CP detection statistics considered in the comparison have access to different amounts of information about the statistical characteristics of the data and the CP models.
Comparison is performed on simulated data, corresponding to a classical CP model (Basseville and Nikiforov, 1993):

i.i.d. Gaussian white noise signal,

as a CP we consider change in the mean from zero initial level.
The results of our statistical analyses show that, in terms of the mean delay until CP detection at the same level of false alarms, CMs are comparable with CP detection oracles and are not significantly worse than the optimal CP detection statistics (which require full information about the CP model). At the same time, as opposed to classical CP detection statistics, CP detection based on CMs is non-parametric and can be applied in the wild without significant parameter tuning, for both one-dimensional and multi-dimensional data streams.
The paper is organized as follows. In Section 2 we describe CMs. In Section 3 we state the quickest CP detection problem and describe optimal CP detection statistics, as well as CP detection approaches based on CMs, defined by different conformity measures and betting functions. In Section 4 we consider CP detection oracles. In Section 5 we describe the protocol of the experiments and provide the results of simulations. We list conclusions in Section 6.
2 Conformal Martingales
First we describe the Conformal Prediction framework (Vovk et al., 2005), which can be regarded as a tool, satisfying some natural properties of validity, for measuring the strangeness of observations.
2.1 Non-Conformity measures and p-values
Let us denote by $z_1, z_2, \ldots$ a sequence of observations, where each observation is represented as a vector in some vector space. Our first goal is to test whether the new observation $z_n$ fits the previously observed observations $z_1, \ldots, z_{n-1}$. In other words, we want to measure how strange $z_n$ is compared to the other observations. For this purpose, we use the Conformal Prediction framework (Vovk et al., 2005). The first step is the definition of a non-conformity measure, which is a function $A$ mapping pairs $(z, \{z_1, \ldots, z_n\})$ consisting of an observation and a finite multiset of observations to a real number $A(z, \{z_1, \ldots, z_n\})$ with the following meaning: the greater this value is, the stranger $z$ is relative to $\{z_1, \ldots, z_n\}$. As a simple example, one can consider the Nearest Neighbors non-conformity measure, where $A(z, \{z_1, \ldots, z_n\})$ is the average distance from $z$ to its $k$ nearest neighbors in $\{z_1, \ldots, z_n\}$.
The second step in the Conformal Prediction framework is the definition of the p-value for the observation $z_n$:
$$p_n = \frac{|\{i : \alpha_i > \alpha_n\}| + \theta_n\,|\{i : \alpha_i = \alpha_n\}|}{n}, \tag{1}$$
where $\theta_n$ is a random number in $[0,1]$ independent of $z_1, z_2, \ldots$, and the non-conformity scores $\alpha_i$ (including $\alpha_n$) are defined by
$$\alpha_i = A(z_i, \{z_1, \ldots, z_n\}), \quad i = 1, \ldots, n, \tag{2}$$
i.e., the p-value for the observation $z_n$ is defined, roughly, as the fraction of observations that have non-conformity scores greater than or equal to the non-conformity score $\alpha_n$. Intuitively, the smaller the p-value is, the stranger the observation is.
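The two steps above can be sketched in code. The following Python fragment is only a minimal illustration under stated assumptions, not the authors' implementation: the function names, the use of NumPy, the Euclidean distance, and the choice to score each observation against all the other observations (so that a point is not its own nearest neighbor) are assumptions made for the example.

```python
import numpy as np


def knn_score(z, others, k=1):
    """Average Euclidean distance from z to its k nearest points in `others`."""
    dists = np.sort(np.linalg.norm(others - z, axis=1))
    return float(dists[:k].mean())


def conformal_p_value(observations, rng, k=1):
    """Smoothed p-value of the last observation relative to the earlier ones."""
    obs = np.asarray(observations, dtype=float)
    if obs.ndim == 1:
        obs = obs[:, None]  # treat scalars as one-dimensional vectors
    n = len(obs)
    # Assumption: each point is scored against all the *other* points,
    # otherwise its nearest neighbour would be itself at distance zero.
    scores = np.array([knn_score(obs[i], np.delete(obs, i, axis=0), k)
                       for i in range(n)])
    greater = np.sum(scores > scores[-1])
    equal = np.sum(scores == scores[-1])  # includes the last observation itself
    theta = rng.uniform()                 # random tie-breaking number
    return (greater + theta * equal) / n
```

With `rng = np.random.default_rng(0)`, a clearly outlying last observation such as `[0.0, 0.1, 0.2, 0.3, 10.0]` receives a p-value of at most 1/5, since no other score exceeds its own.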
Theorem 2.1. If the observations $z_1, z_2, \ldots$ satisfy the i.i.d. assumption, the p-values $p_1, p_2, \ldots$ are independent and uniformly distributed in $[0,1]$.
The statement of Theorem 2.1 (proved in Vovk et al. 2003) provides grounds for CP detection:

observations are i.i.d.;

are also i.i.d.;

in the case , all the observations are i.i.d., and therefore CMs couldn’t be used for detecting a CP;

since at the distribution changes, the corresponding pvalues are not i.i.d. uniform in for .
We use this fact for constructing CMs for CP detection.
2.2 Definition of Exchangeability Martingales
Given a sequence of random vectors $Z_1, Z_2, \ldots$ taking values in some observation space $\mathbf{Z}$, the joint probability distribution of $Z_1, \ldots, Z_N$ for a finite $N$ is exchangeable if it is invariant under any permutation of these random vectors. The joint distribution of the infinite sequence of random vectors $Z_1, Z_2, \ldots$ is exchangeable if the marginal distribution of $Z_1, \ldots, Z_N$ is exchangeable for every $N$. By de Finetti’s theorem, every exchangeable distribution is a mixture of power distributions (i.e., distributions under which the sequence is i.i.d.).
A test exchangeability martingale is a sequence of non-negative random variables $S_0, S_1, S_2, \ldots$ such that $S_0 = 1$ and $\mathbb{E}(S_{n+1} \mid S_1, \ldots, S_n) = S_n$, where $\mathbb{E}$ is the expectation w.r.t. any exchangeable distribution (equivalently, any power distribution) on observations. According to Ville’s inequality (Ville, 1939), in this case $\mathbb{P}\{\exists n : S_n \ge C\} \le 1/C$ for any $C > 0$ under any exchangeable distribution. If the final value of the martingale is large, we can reject the i.i.d. (equivalently, exchangeability) assumption with the corresponding probability. In the next section we define a way to transform p-values (1) into test exchangeability martingales. An exchangeability martingale is defined similarly but dropping the requirements that $S_n$ should be non-negative and that $S_0 = 1$.
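2.3 Constructing Exchangeability Martingales from p-values
Given a sequence of p-values $p_1, p_2, \ldots$, we consider a martingale of the form
$$S_n = \prod_{i=1}^{n} g_i(p_i), \quad n = 1, 2, \ldots, \tag{3}$$
where each $g_i(p_i) = g_i(p_i \mid p_1, \ldots, p_{i-1})$ is a betting function required to satisfy the condition $\int_0^1 g_i(p)\,dp = 1$. We can easily verify the martingale property under any exchangeable distribution:
$$\mathbb{E}(S_{n+1} \mid S_1, \ldots, S_n) = \int_0^1 \prod_{i=1}^{n} g_i(p_i)\, g_{n+1}(p)\,dp = \prod_{i=1}^{n} g_i(p_i) \int_0^1 g_{n+1}(p)\,dp = S_n.$$
Test exchangeability martingales of the form (3) are conformal martingales. (It is an interesting question whether there are any other test exchangeability martingales apart from the conformal martingales.)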
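As an illustration of the product form (3), the sketch below accumulates the logarithm of a betting martingale from a sequence of p-values. It is a minimal example under stated assumptions: the power betting function `eps * p**(eps - 1)` is used only because it is a simple function that integrates to one on the unit interval, and the function names and the value `eps=0.5` are hypothetical choices for the example.

```python
import numpy as np


def power_betting(p, eps=0.5):
    """Betting function eps * p**(eps - 1); it integrates to 1 over [0, 1]."""
    return eps * p ** (eps - 1.0)


def log_martingale(p_values, betting=power_betting):
    """Running logarithm of the product of betting-function values."""
    p = np.asarray(p_values, dtype=float)
    # Working on the log scale avoids overflow and underflow of the product.
    return np.cumsum(np.log(betting(p)))
```

Small p-values make the path grow, while p-values spread uniformly over the unit interval make it drift downwards on average; this is the behavior the CP detectors of Section 3.3 rely on.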
3 Quickest Change-Point detection
3.1 Problem statement
We observe sequentially a series of independent observations whose distribution changes from to at some unknown point in time. Formally, are independent random variables such that are each distributed according to a distribution and are each distributed according to a distribution , where is unknown. The objective is to detect that a change has taken place “as soon as possible” after its occurrence, subject to a restriction on the rate of false detections.
Historically, the subject of change-point detection first began to emerge in the 1920s and 1930s, motivated by considerations of quality control. When a process is “in control,” observations are distributed according to . At an unknown point , the process jumps “out of control” and subsequent observations are distributed according to . We want to raise an alarm “as soon as possible” after the process jumps “out of control”.
Current approaches to changepoint detection were initiated by the pioneering work of Page (1954). In order to detect a change in a normal mean from to he proposed the following stopping rule : stop and declare the process to be “out of control” as soon as gets large, where and is suitably chosen. This and related procedures are known as CUSUM (cumulative sum) procedures (see Shiryaev 2010 for a survey).
There are different ways to formalize the restriction on false detections, as well as the objective of detecting a change “as soon as possible” after its occurrence. The restriction on false detections is usually formalized either as a rate restriction on the stopping rule , according to which we stop our observations and declare the process to be “out of control”, or as a probability restriction. The rate restriction is usually formalized by a requirement that , and the probability restriction by a requirement that for all . The objective of detecting a change “as soon as possible” after its occurrence is usually formalized in terms of functionals of (Shiryaev, 2010).
3.2 Optimal approaches to Change-Point detection
Let us describe main optimal statistics for CP detection. The main assumption here is that a known probability density of observations changes to another known probability density of observations at some unknown point . We denote by
(4) 
the likelihood of observations when , and by
(5) 
the likelihood of observations without CP.
Shiryaev (1963) solved the CP detection problem in a Bayesian framework. As the prior on the change-point the geometric distribution is used, i.e., , . A loss function has the form , where and . Shiryaev showed that it is optimal to stop observations as soon as the posterior probability of a change exceeds a fixed level , i.e., , where
(6) 
In the non-Bayesian (minimax) setting of the problem, the objective is to minimize the expected detection delay for some worst-case change-time distribution, subject to a cost or constraint on false alarms. Here the classical optimality result is due to Lorden, Ritov and Moustakides (Lorden, 1971; Moustakides, 1986; Ritov, 1990). They evaluate the speed of detection by under the restriction that the stopping rules must satisfy . In fact, it follows from the results of Lorden, Ritov and Moustakides that Page’s aforementioned stopping rule, which takes the form with
(7) 
is optimal.
Pollak (1985; 1987) considered another non-Bayesian setting: the speed of detection is evaluated by under the same restriction on the stopping rules, i.e. must satisfy . Pollak proved that the so-called Shiryaev-Roberts statistic (Shiryaev, 1963; Roberts, 1966) is asymptotically () minimax. The corresponding stopping rule has the form with
(8) 
As usual we select parameter for stopping moments , and in such a way that these stopping moments fulfill the corresponding restrictions on false detections.
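For known pre- and post-change densities, the CUSUM and Shiryaev-Roberts statistics can be updated recursively from the per-observation log-likelihood ratio. The sketch below is a minimal illustration under stated assumptions: it uses the standard recursive forms of these statistics (maximum, respectively sum, over candidate change-points of the likelihood ratio), a Gaussian mean shift with known variance, and hypothetical function names; it is not claimed to match the exact normalization of (7) and (8).

```python
import numpy as np


def gaussian_llr(x, mu1, mu0=0.0, sigma=1.0):
    """Log-likelihood ratio of N(mu1, sigma^2) against N(mu0, sigma^2)."""
    x = np.asarray(x, dtype=float)
    return ((mu1 - mu0) * x - 0.5 * (mu1 ** 2 - mu0 ** 2)) / sigma ** 2


def cusum_path(llr):
    """Log of the maximum over k of the likelihood ratio for a change at k."""
    log_w, path = 0.0, []
    for value in llr:
        log_w = max(0.0, log_w) + value
        path.append(log_w)
    return np.array(path)


def shiryaev_roberts_path(llr):
    """Sum over k of the likelihood ratio for a change at k."""
    r, path = 0.0, []
    for value in llr:
        r = (1.0 + r) * np.exp(value)
        path.append(r)
    return np.array(path)


def first_crossing(path, threshold):
    """1-based index of the first value at or above threshold, else None."""
    hits = np.nonzero(path >= threshold)[0]
    return int(hits[0]) + 1 if hits.size else None
```

In this sketch the threshold passed to `first_crossing` plays the role of the parameter that is tuned to meet the restriction on false detections.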
3.3 Adaptation of Conformal Martingales to the Change-Point detection problem
Let us describe a modification of Conformal Martingales tailored for the CP detection problem:

Instead of CMs we use their computationally efficient modification that we call Inductive Conformal Martingales (ICMs). The main difference of ICMs from CMs is that to compute a nonconformity measure we use some fixed initial training set , i.e., each time we receive a new observation we compute the nonconformity score according to the formula
(cf. original CMs where are defined by (2)). Intuitively, we fix some training set and evaluate to which extent new observations are strange w.r.t. this training set. This approach allows us to speed up computations without destroying the validity (see also section 3.4): one should not recompute all nonconformity scores at each iteration. Another advantage is the possibility of parallelization in the batch mode, i.e., when we receive observations in batches.

One drawback of the original CMs, from the point of view of the performance measures adopted in this paper, is that in the case of i.i.d. observations CMs decrease to almost zero values with time. As a result, since CMs are represented as a product of betting functions (see (3)), it takes CMs a lot of time to recover from zero to some significant value when “strange” observations appear. In order to deal with this problem we introduce
$$C_n = \max\{0,\; C_{n-1} + \log g_n(p_n)\}, \tag{9}$$
where $C_0 = 0$, $p_n$ is a p-value, and $g_n$ is a betting function. On each iteration we cut the logarithm of the martingale at zero. This modification performs better in terms of the mean delay until CP detection.
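The two modifications above can be combined in a short sketch. It rests on several assumptions that are not taken from the text: scalar observations, a nearest-neighbor score computed against the fixed training set, p-values computed among the scores of the stream observations seen so far, a cut-off at zero of the running log-martingale, and a hypothetical two-level betting function whose values (1.5 below one half, 0.5 above) are chosen only so that it integrates to one.

```python
import numpy as np


def two_level_betting(p, split=0.5, left=1.5):
    """Piecewise-constant betting function that integrates to 1 on [0, 1]."""
    right = (1.0 - left * split) / (1.0 - split)
    return left if p < split else right


def icm_cutoff_path(train, stream, rng, betting=two_level_betting, k=1):
    """Path of the cut-off log-martingale for a scalar stream."""
    train = np.asarray(train, dtype=float)
    scores, c, path = [], 0.0, []
    for z in stream:
        # Inductive score: distance to the k nearest points of the fixed training set.
        alpha = np.sort(np.abs(train - z))[:k].mean()
        scores.append(alpha)
        s = np.asarray(scores)
        p = (np.sum(s > alpha) + rng.uniform() * np.sum(s == alpha)) / len(s)
        c = max(0.0, c + np.log(betting(p)))  # cut the log-martingale at zero
        path.append(c)
    return np.array(path)
```

Because the training set is fixed, each new observation requires only one new score, and the cut-off keeps the statistic from drifting far below zero before a change, so it can react quickly afterwards.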
The complete procedure is summarized in Algorithm 1. The stopping rule for CP detection has the form , where is the modification of the corresponding CM, calculated according to (9). Notice that
An example of the martingale is given in Fig. 1. Here we consider observations from a normal distribution with unit variance, such that at its mean changes from to . We use two non-conformity measures: the Nearest Neighbor Non-Conformity Measure (1NN NCM) and the Likelihood Ratio Non-Conformity Measure (LR NCM), which are described in section 3.5.
3.4 Validity
Let us check empirically that our method is valid for small values of train set size (the theoretical validity is lost because of the transition from to ). For this purpose we generate observations from without CP and with CP (mean changes from to at ). Here is a value at point of a normal density with mean and variance . We use Nearest Neighbor nonconformity measure (see section 3.5 below). We plot ICM for train set sizes in Fig. 2. Results of simulations, provided in Fig. 2, confirm the validity of our approach.
3.5 Non-Conformity Measures
Let us describe nonconformity measures that we use:

Nearest Neighbors Non-Conformity Measure (kNN NCM). kNN NCM is computed as the average distance to the $k$ nearest neighbors. The advantage of this NCM is that it doesn’t depend on any assumptions and can be used in a multi-dimensional case; 
Likelihood Ratio Non-Conformity Measure (LR NCM). One way or another, the classical CP detection algorithms (see section 3.2) are based on a likelihood ratio, so it is worth considering an LR NCM. If we denote by and the probability density functions before and after the CP, then a reasonable LR NCM would be
However, we rarely know , , exactly. Thus, we should somehow model this lack of information. We assume that , , belong to some parametric class of densities, i.e., , where , are vectors of parameters. We estimate the value of by some using the training set . We also impose some prior on the parameter , i.e., we model the data distribution after the CP by . As a result LR NCM has the form
(10)
E.g., in the one-dimensional case for and we get that
(11)
where .
3.6 Betting Functions
Let us describe Betting Functions that we use:

Mixture Betting Function was proposed in the very first work on testing exchangeability (Vovk et al., 2003). It doesn’t depend on the previous p-values and has the form
$$g(p) = \int_0^1 \varepsilon p^{\varepsilon - 1}\,d\varepsilon.$$

Constant Betting Function. We split the interval into two parts at the point . We expect pvalues to be small if observations are strange:

Kernel Density Betting Function has the form
Here is a Parzen-Rosenblatt kernel density estimate (Rosenblatt et al., 1956) based on the previous p-values , being a window size. We use a Gaussian kernel. Since p-values lie in $[0,1]$, to reduce boundary effects we reflect the p-values to the left of zero and to the right of one, construct the density estimate, crop its support back to $[0,1]$ and normalize. Fedorova et al. (2012) prove that such an approach provides an asymptotically better growth rate of the exchangeability martingale than any martingale with a fixed betting function. The corresponding martingale is also called the plug-in martingale. Note that for quicker CP detection we use not all p-values, but only the last of them: . Increasing usually results in an increase of the mean delay, because after the CP we need to collect more observations to estimate the new distribution of p-values correctly. 
Precomputed Kernel Density Betting Function. To deal with the problem of high mean delay until CP detection, we propose to estimate the kernel density of p-values before constructing any martingale. For this purpose, we have to learn the betting function using some finite-length realization of , containing an example of a typical CP, and some training set . In other words, the realization should contain a CP whose position and intensity resemble those of the real CPs (say, to within an order of magnitude) that we are going to detect when applying the corresponding CM. Particular values of these parameters are specified in experimental Section 5. We compute p-values using (1) as in Algorithm 1. Using them we construct a kernel estimate of the p-value density. Further, we assume that for data of the same nature p-values will be distributed in a similar way, so we can use this precomputed kernel density betting function for new data realizations. Thus, thanks to the precomputed estimate we can

Detect CP faster;

Speed up computations (we don’t need to reconstruct the density of p-values for each position of the sliding window).
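Two of the betting functions listed above can be sketched as follows. This is a hedged illustration rather than the authors' code: the mixture function is evaluated through the closed form of the integral of `eps * p**(eps - 1)` over `eps` in the unit interval, and the kernel function uses a Gaussian kernel with reflection at zero and one. The fixed bandwidth of 0.1, the grid-based normalization and all function names are assumptions made for the example.

```python
import numpy as np


def mixture_betting(p):
    """Closed form of the integral of eps * p**(eps - 1) over eps in [0, 1], for p in (0, 1]."""
    p = np.asarray(p, dtype=float)
    log_p = np.log(p)
    near_one = np.abs(log_p) < 1e-4
    safe = np.where(near_one, -1.0, log_p)  # placeholder avoids 0/0 at p = 1
    closed = (p * safe - p + 1.0) / (p * safe ** 2)
    return np.where(near_one, 0.5, closed)  # the limit at p = 1 is 1/2


def reflected_kde_betting(past_p, bandwidth=0.1, grid_size=1001):
    """Betting function from a Gaussian kernel estimate with reflection."""
    past = np.asarray(past_p, dtype=float)
    # Reflect the sample to the left of zero and to the right of one.
    reflected = np.concatenate([-past, past, 2.0 - past])

    def raw_density(x):
        u = (np.atleast_1d(x)[:, None] - reflected[None, :]) / bandwidth
        return np.exp(-0.5 * u ** 2).sum(axis=1)

    # Crop to [0, 1] and normalize with the trapezoid rule on a grid.
    grid = np.linspace(0.0, 1.0, grid_size)
    vals = raw_density(grid)
    mass = np.sum(0.5 * (vals[1:] + vals[:-1])) * (grid[1] - grid[0])
    return lambda p: raw_density(p) / mass
```

A precomputed variant in this sketch would simply call `reflected_kde_betting` once on p-values from a reference realization and reuse the returned function, instead of rebuilding it for every position of a sliding window.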

4 Oracles for Change-Point detection
In the current section we describe Oracles for CP detection that we compare with CP detection based on Conformal Martingales.
4.1 Motivation to use Oracles
First we explain why we need to compare CP detection based on CMs with CP detection Oracles:

Classical CP detection statistics are optimal in terms of the mean delay (subject to a restriction on the rate of false detections) if data distributions before and after the CP are known. There is no need for them to learn the distributions and before and after the CP.

CMs are designed to solve another problem. As far as their validity is concerned, they assume nothing about the distributions , . They have to learn the distribution before the CP in order to detect a change.

The profound difference between the classical setting and the adaptive setting dealt with in conformal prediction can be seen clearly if instead of the problem of quickest CP detection we consider the related problem of gambling (formalized by constructing a test martingale) against the null hypothesis ( in the case of classical statistics and i.i.d. in the case of conformal prediction) in the presence of a CP. In the classical case, the growth rate of the optimal test martingale (likelihood ratio) will be exponential since the null hypothesis is simple, whereas in the i.i.d. case after an initial period of nearly exponential growth the growth rate will slow down as we start learning that is much closer to being the data-generating distribution than is. 
Thus for a fair comparison we should compare CP detection based on CMs not with the CP detectors from Section 3.2, which are optimal under known and , but with their modifications (oracles, defined in Section 4.2 below) that have plenty of information about the pre- and post-change data distributions but still face some uncertainty: the oracles only know the parametric models that and are coming from. The task of competing with them while making only a non-parametric assumption (i.i.d.) is challenging but not hopeless.
4.2 Description of Oracles
We assume that , , belong to some parametric class of densities, i.e., , where , are vectors of parameters. We impose the same prior on the parameters , . Thus, instead of likelihood (5) of observations without CP we use
(12) 
and instead of likelihood (4) of observations with CP at we use
(13) 
Oracles are obtained from optimal statistics (6), (7) and (8) by using from (12) instead of from (5), and by using from (13) instead of from (4) (cf. with section 2.4.2.1 and example 2.4.2 in Basseville and Nikiforov (1993)).
Let us consider a one-dimensional example. We set , and , and we get that
(14)  
(15) 
where , . In such a way we model a situation when the Oracle does not know exact values of , .
5 Experiments
In the current section we describe our experimental setup and provide results of experiments.
5.1 Experimental setup
We consider the following experimental setup:

We use observations as a training set for computation of nonconformity scores. We set in all experiments.

Observations are generated from .

Observations are generated from . We consider .
As performance characteristics we use:

Mean delay until CP detection ,

Probability of False Alarm (FA) .
In all experiments we use Monte Carlo simulations to estimate the dependence of the mean delay on the probability of the false alarm .
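A Monte Carlo estimate of these two performance characteristics can be sketched as follows. The sketch is illustrative only and makes several assumptions: the mean delay is taken over runs in which the alarm is raised after the change, a false alarm is an alarm raised at or before the change, the detector is a CUSUM rule that knows the true shift, and the stream length, change position, shift and number of runs are hypothetical values chosen for the example.

```python
import numpy as np


def simulate_stream(rng, n, change_point, shift):
    """Unit-variance Gaussian noise whose mean jumps to `shift` after `change_point`."""
    x = rng.normal(size=n)
    x[change_point:] += shift
    return x


def cusum_stopping_time(x, shift, threshold):
    """1-based alarm time of a CUSUM rule for a known mean shift, or None."""
    stat = 0.0
    for n, value in enumerate(x, start=1):
        stat = max(0.0, stat) + shift * value - 0.5 * shift ** 2
        if stat >= threshold:
            return n
    return None


def delay_and_false_alarm(rng, threshold, change_point=100, shift=1.0,
                          n=1000, runs=500):
    """Estimate (mean delay, false alarm probability) for one threshold."""
    delays, false_alarms = [], 0
    for _ in range(runs):
        x = simulate_stream(rng, n, change_point, shift)
        tau = cusum_stopping_time(x, shift, threshold)
        if tau is None:
            continue  # no alarm within the simulated horizon
        if tau <= change_point:
            false_alarms += 1
        else:
            delays.append(tau - change_point)
    mean_delay = float(np.mean(delays)) if delays else float("nan")
    return mean_delay, false_alarms / runs
```

Sweeping the threshold and plotting the resulting pairs gives a curve of mean delay against false alarm probability; any other detector can be substituted for `cusum_stopping_time` in the same loop.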
In the case of oracle detectors (see section 4.2) we use likelihoods from (14) and from (15) to obtain the Posterior Oracle from the optimal statistic (6), the CUSUM Oracle from the optimal statistic (7) and the SR Oracle from the optimal statistic (8). When calculating the Posterior Probability statistic (6) and the Posterior Oracle we set the parameter of the geometric distribution to .
5.2 Refinement of the experimental setup
When applying Conformal Martingales both original and inductive versions can be used. First let us check that the inductive version is not worse than the original one. In our comparison we use a simple NCM: . In Fig. 3 we plot estimated dependency of the mean delay on the probability of the false alarm for both ICM and CM with the constant betting function and different oracles. As we can see, there is almost no difference in the original and inductive versions. Later we consider only Inductive Conformal Martingales.
When calculating the Oracles we can either additionally use the training set or not. Let us check how adding the training set influences the results. The comparison is presented in Fig. 4. We can see that the results are practically the same. Later in the paper, when calculating the Oracles, we do not use the training set.
5.3 Constant Betting Function
Results for Constant Betting Function are in Fig. 5. Here SR stands for SR Oracle, PP for Posterior Oracle, CUSUM for CUSUM Oracle, ICM 7 NN for the ICM CP detector with the nearest neighbor NCM, and ICM LR for the ICM CP detector with the LR NCM. Mean delays for some values of false alarm probability are in Tab. 1.
Param.\Probab. of FA | 5%                                              | 10%
                     | ICM LR  | ICM kNN |         |         |         | ICM LR  | ICM kNN |         |         |
                     | 14.02   | 33.52   | 61.59   | 62.01   | 64.37   | 8.90    | 17.71   | 43.53   | 43.89   | 46.40
                     | 7.08    | 12.51   | 19.51   | 19.51   | 20.98   | 4.79    | 7.79    | 14.50   | 14.51   | 15.67
                     | 5.19    | 6.90    | 10.11   | 10.09   | 10.78   | 3.62    | 4.70    | 7.64    | 7.64    | 8.27
                     | 13.22   | 31.33   | 37.78   | 37.80   | 38.73   | 8.33    | 17.17   | 27.24   | 27.24   | 28.25
                     | 7.00    | 12.50   | 14.62   | 14.52   | 15.16   | 4.74    | 8.08    | 10.85   | 10.81   | 11.36
                     | 5.13    | 7.12    | 8.02    | 7.98    | 8.30    | 3.59    | 4.85    | 6.00    | 5.97    | 6.28
5.4 Mixture Betting Function
Results for Mixture Betting Function are in Fig. 6. Mean delays for some values of false alarm probability are in Tab. 2.
Param.\Probab. of FA | 5%                                              | 10%
                     | ICM LR  | ICM kNN |         |         |         | ICM LR  | ICM kNN |         |         |
                     | 132.58  | 193.27  | 61.59   | 62.01   | 64.37   | 66.34   | 124.34  | 43.53   | 43.89   | 46.40
                     | 32.73   | 71.01   | 19.51   | 19.51   | 20.98   | 12.63   | 30.77   | 14.50   | 14.51   | 15.67
                     | 11.37   | 16.60   | 10.11   | 10.09   | 10.78   | 5.45    | 7.57    | 7.64    | 7.64    | 8.27
                     | 151.61  | 244.65  | 37.78   | 37.80   | 38.73   | 77.10   | 175.08  | 27.24   | 27.24   | 28.25
                     | 29.50   | 65.29   | 14.62   | 14.52   | 15.16   | 16.56   | 32.13   | 10.85   | 10.81   | 11.36
                     | 14.49   | 19.12   | 8.02    | 7.98    | 8.30    | 8.20    | 11.16   | 6.00    | 5.97    | 6.28
5.5 Kernel Betting Function
Results for Kernel Betting Function are in Fig. 7. Mean delays for some values of false alarm probability are in Tab. 3. We use a sliding window of size to estimate the density of p-values.
We can see that the results for the Kernel Betting Function are worse than for the Mixture Betting Function. The main reason is that it takes a long time for the martingale to grow sufficiently. In fact, before the change-point the distribution of p-values is uniform on $[0,1]$. If for the current moment of time it holds that , the distribution of p-values , is also uniform. Thus, the martingale grows only when the change-point is inside the interval , p-values from which are used for density estimation. This is the reason why in section 3.6 we propose the new Precomputed Kernel Density Betting Function.
Param.\Probab. of FA | 5%                                              | 10%
                     | ICM LR  | ICM kNN |         |         |         | ICM LR  | ICM kNN |         |         |
                     | 33.10   | 65.26   | 61.59   | 62.01   | 64.37   | 22.92   | 38.70   | 43.53   | 43.89   | 46.40
                     | 15.08   | 22.03   | 19.51   | 19.51   | 20.98   | 11.15   | 15.65   | 14.50   | 14.51   | 15.67
                     | 9.04    | 11.62   | 10.11   | 10.09   | 10.78   | 6.66    | 8.55    | 7.64    | 7.64    | 8.27
                     | 30.06   | 54.14   | 37.78   | 37.80   | 38.73   | 22.90   | 36.57   | 27.24   | 27.24   | 28.25
                     | 15.44   | 22.02   | 14.62   | 14.52   | 15.16   | 12.08   | 17.13   | 10.85   | 10.81   | 11.36
                     | 10.00   | 12.81   | 8.02    | 7.98    | 8.30    | 7.83    | 10.15   | 6.00    | 5.97    | 6.28
5.6 Precomputed Kernel Betting Function
When learning the Precomputed Kernel Betting Function we use one realization of length with a CP at , such that for and for regardless of where the real CP is located and which amplitude it has.
Results for Precomputed Kernel Betting Function are in Fig. 8. Mean delays for some values of false alarm probability are in Tab. 4.
Param.\Probab. of FA | 5%                                              | 10%
                     | ICM LR  | ICM kNN |         |         |         | ICM LR  | ICM kNN |         |         |
                     | 15.20   | 34.41   | 61.59   | 62.01   | 64.37   | 10.08   | 20.27   | 43.53   | 43.89   | 46.40
                     | 7.47    | 11.12   | 19.51   | 19.51   | 20.98   | 5.02    | 7.32    | 14.50   | 14.51   | 15.67
                     | 4.95    | 6.22    | 10.11   | 10.09   | 10.78   | 3.28    | 4.11    | 7.64    | 7.64    | 8.27
                     | 14.14   | 28.70   | 37.78   | 37.80   | 38.73   | 9.65    | 18.91   | 27.24   | 27.24   | 28.25
                     | 7.24    | 10.80   | 14.62   | 14.52   | 15.16   | 4.92    | 7.39    | 10.85   | 10.81   | 11.36
                     | 4.90    | 6.15    | 8.02    | 7.98    | 8.30    | 3.29    | 4.18    | 6.00    | 5.97    | 6.28
5.7 Comparison with Optimal detectors
We also compare CP detection based on CMs with optimal detectors: the Cumulative Sum (CUSUM), Shiryaev-Roberts (SR) and Posterior Probability (PP) statistics. One can see from Tab. 5 and Fig. 9 that our results are comparable to the results of the optimal methods. CMs perform a little worse, but they require fewer assumptions (they do not know the true and ) and are more general (distribution-free).
Param.\Probab. of FA | 5%                                              | 10%
                     | ICM LR  | ICM kNN |         |         |         | ICM LR  | ICM kNN |         |         |
                     | 15.20   | 34.41   | 6.08    | 6.11    | 12.06   | 10.08   | 20.27   | 3.97    | 4.22    | 7.99
                     | 7.47    | 11.12   | 3.42    | 3.60    | 7.11    | 5.02    | 7.32    | 2.19    | 2.43    | 4.67
                     | 4.95    | 6.22    | 2.29    | 2.46    | 4.93    | 3.28    | 4.11    | 1.39    | 1.63    | 3.23
                     | 14.14   | 28.70   | 6.19    | 6.22    | 12.55   | 9.65    | 18.91   | 4.07    | 4.19    | 8.38
                     | 7.24    | 10.80   | 3.50    | 3.66    | 7.44    | 4.92    | 7.39    | 2.26    | 2.46    | 4.99
                     | 4.90    | 6.15    | 2.33    | 2.48    | 5.22    | 3.29    | 4.18    | 1.46    | 1.64    | 3.44
6 Conclusion
In this paper we describe an adaptation of Conformal Martingales to the change-point detection problem. We demonstrate the efficiency of this approach by comparing it with natural oracles, which are likelihood-based change-point detectors. Our results indicate that the efficiency of change-point detection based on conformal martingales is in most cases comparable with that of oracle detectors.
We propose and compare several approaches to calculating a betting function (a function that transforms p-values into a martingale) and a non-conformity measure (a function that defines strangeness and, therefore, p-values). We find that the Precomputed Kernel Betting Function provides the most efficient results and the Mixture Betting Function provides the worst results.
We also compare Inductive Conformal Martingales with methods that are optimal for known pre- and post-CP distributions, such as the CUSUM, Shiryaev-Roberts and Posterior Probability statistics. Our results are worse but still comparable. Some deterioration is inevitable, of course, since CMs are distribution-free methods and, therefore, require much weaker assumptions.
We are grateful for the support from the European Union’s Horizon 2020 Research and Innovation programme under Grant Agreement no. 671555 (ExCAPE project). The research presented in Section 5 of this paper was supported by the RFBR grants 16-01-00576 A and 16-29-09649 ofi_m. This work was also supported by the Russian Science Foundation grant (project 14-50-00150), the UK EPSRC grant (EP/K033344/1), and the Technology Integrated Health Management (TIHM) project awarded to the School of Mathematics and Information Security at Royal Holloway. We are indebted to Prof. Ilya Muchnik, School of Data Analysis, Yandex, and Royal Holloway, University of London, for the studentship support of one of the authors.
References
 Alestra et al. (2014) Stephane Alestra, Christophe Bordry, Christophe Brand, Evgeny Burnaev, Pavel Erofeev, Artem Papanov, and Cassiano Silveira-Freixo. Application of rare event anticipation techniques to aircraft health management. Advanced Materials Research, 1016:413–417, 2014.
 Artemov and Burnaev (2016a) Alexey Artemov and Evgeny Burnaev. Ensembles of detectors for online detection of transient changes. In Proceedings of the Eighth International Conference on Machine Vision (ICMV), pages 1–5, 2016a.
 Artemov and Burnaev (2016b) Alexey Artemov and Evgeny Burnaev. Detecting performance degradation of software-intensive systems in the presence of trends and long-range dependence. In Proceedings of the Sixteenth International Conference on Data Mining Workshops (ICDMW), pages 29–36. IEEE Conference Publications, 2016b.
 Artemov et al. (2016) Alexey Artemov, Evgeny Burnaev, and Andrey Lokot. Nonparametric decomposition of quasi-periodic time series for change-point detection. In Proceedings of the Eighth International Conference on Machine Vision (ICMV), pages 1–5, 2016.
 Basseville and Nikiforov (1993) Michèle Basseville and Igor V. Nikiforov. Detection of Abrupt Changes: Theory and Application. Prentice Hall, Englewood Cliffs, 1993.
 Burnaev and Smolyakov (2016) Evgeny Burnaev and Dmitry Smolyakov. One-class SVM with privileged information and its application to malware detection. In Proceedings of the Sixteenth International Conference on Data Mining Workshops (ICDMW), pages 273–280. IEEE Conference Publications, 2016.
 Burnaev et al. (2015a) Evgeny Burnaev, Pavel Erofeev, and Artem Papanov. Influence of resampling on accuracy of imbalanced classification. In Proceedings of the Eighth International Conference on Machine Vision (ICMV), pages 1–5, 2015a.
 Burnaev et al. (2015b) Evgeny Burnaev, Pavel Erofeev, and Dmitry Smolyakov. Model selection for anomaly detection. In Proceedings of the Eighth International Conference on Machine Vision (ICMV), pages 1–6, 2015b.
 Casas et al. (2010) Pedro Casas, Sandrine Vaton, Lionel Fillatre, and Igor Nikiforov. Optimal volume anomaly detection and isolation in large-scale IP networks using coarse-grained measurements. Computer Networks, 54:1750–1766, 2010.
 Fedorova et al. (2012) Valentina Fedorova, Alex Gammerman, Ilia Nouretdinov, and Vladimir Vovk. Plug-in martingales for testing exchangeability on-line. In Proceedings of the Twenty-Ninth International Conference on Machine Learning (ICML), 2012.
 Guépié et al. (2012) Blaise Kévin Guépié, Lionel Fillatre, and Igor Nikiforov. Sequential detection of transient changes. Sequential Analysis, 31:528–547, 2012.
 Ho (2005) Shen-Shyang Ho. A martingale framework for concept change detection in time-varying data streams. In Proceedings of the Twenty-Second International Conference on Machine Learning (ICML), pages 321–327. ACM, 2005.
 Lorden (1971) Gary Lorden. Procedures for reacting to a change in distribution. Annals of Mathematical Statistics, 42:1897–1908, 1971.
 MacNeill and Mao (1995) I. B. MacNeill and Y. Mao. Change-point analysis for mortality and morbidity rate. Applied Change Point Problems in Statistics, pages 37–55, 1995.
 Malladi and Speyer (1999) Durga P. Malladi and Jason L. Speyer. A generalized Shiryayev sequential probability ratio test for change detection and isolation. IEEE Transactions on Automatic Control, 44:1522–1534, 1999.
 Moustakides (1986) George V. Moustakides. Optimal stopping times for detecting changes in distributions. Annals of Statistics, 14:1379–1387, 1986.
 Page (1954) E. S. Page. Continuous inspection scheme. Biometrika, 41:100–115, 1954.
 Pham et al. (2014) Duc-Son Pham, Svetha Venkatesh, Mihai Lazarescu, and Saha Budhaditya. Anomaly detection in large-scale data stream networks. Data Mining and Knowledge Discovery, 28:145–189, 2014.
 Pollak (1985) Moshe Pollak. Optimal detection of a change in distribution. Annals of Statistics, 13:206–227, 1985.
 Pollak (1987) Moshe Pollak. Average run lengths of an optimal method of detecting a change in distribution. Annals of Statistics, 15:749–779, 1987.
 Ritov (1990) Ya’acov Ritov. Decision theoretic optimality of the cusum procedure. Annals of Statistics, 18:1464–1469, 1990.
 Roberts (1966) S. W. Roberts. A comparison of some control chart procedures. Technometrics, 8:411–430, 1966.
 Rosenblatt et al. (1956) Murray Rosenblatt et al. Remarks on some nonparametric estimates of a density function. Annals of Mathematical Statistics, 27:832–837, 1956.
 Shiryaev (1963) Albert N. Shiryaev. On optimum methods in quickest detection problems. Theory of Probability & Its Applications, 8:22–46, 1963.
 Shiryaev (2010) Albert N. Shiryaev. Quickest detection problems: fifty years later. Sequential Analysis, 29:345–385, 2010.
 Tartakovsky et al. (2014) Alexander Tartakovsky, Igor Nikiforov, and Michele Basseville. Sequential Analysis: Hypothesis Testing and Changepoint Detection. CRC Press, Boca Raton, FL, 2014.
 Tartakovsky et al. (2006) Alexander G. Tartakovsky, Boris L. Rozovskii, Rudolf B. Blažek, and Hongjoong Kim. A novel approach to detection of intrusions in computer networks via adaptive sequential and batch-sequential change-point detection methods. IEEE Transactions on Signal Processing, 54:3372–3381, 2006.
 Ville (1939) Jean Ville. Etude critique de la notion de collectif. Gauthier-Villars, Paris, 1939.
 Vovk et al. (2003) Vladimir Vovk, Ilia Nouretdinov, and Alexander Gammerman. Testing exchangeability on-line. In Proceedings of the Twentieth International Conference on Machine Learning (ICML), volume 12, pages 768–775, 2003.
 Vovk et al. (2005) Vladimir Vovk, Alex Gammerman, and Glenn Shafer. Algorithmic Learning in a Random World. Springer, New York, 2005.