Detecting changes in covariance via random matrix theory

08/16/2021
by Sean Ryan, et al.

A novel method is proposed for detecting changes in the covariance structure of moderate dimensional time series. This non-linear test statistic has a number of useful properties. Most importantly, it is independent of the underlying structure of the covariance matrix. We discuss how results from Random Matrix Theory can be used to study the behaviour of our test statistic in a moderate dimensional setting (i.e. the number of variables is comparable to the length of the data). In particular, we demonstrate that the test statistic converges pointwise to a normal distribution under the null hypothesis. We evaluate the performance of the proposed approach on a range of simulated datasets and find that it outperforms several recently proposed alternative methods. Finally, we use our approach to study changes in the amount of water on the surface of a plot of soil, which feeds into model development for degradation of surface piping.



1 Introduction

There is considerable infrastructure within the oil and gas industry that sits on the surface of the ground. These assets are exposed to changing weather conditions, and models are used to estimate their degradation and to direct monitoring. These degradation models rely on estimates of the conditions that the assets are under. As the surrounding soils evolve, so do the absorption properties of the soil. This may result in assets sitting in waterlogged soil when the models assume that water drains away. Thus estimating how these soil properties vary over time, specifically regarding water absorption, is an important task. Soil scientists typically measure water absorption through ground sensors at an individual location. This is not practical when assets are spread over large areas. Recent research has explored video capture to estimate changes in surface water as a surrogate for absorption. However, variations in ambient lighting due to time of day and cloud cover hamper traditional mean-based methods for identifying surface water. In this paper, we develop a method for identifying changes within an image sequence (viewed as a multivariate time series) that are linked to the changing presence of surface water.

One approach to modelling changing behaviour in data is to assume that the changes occur at a small number of discrete time points known as changepoints. Between changes, the data can be modelled as a set of stationary segments that satisfy standard modelling assumptions. Changepoint methods are relevant in a wide range of applications including genetics hocking2013learning, network traffic analysis rubin2016anomaly and oceanography tc-11-2149-2017. We consider the specific case where the covariance structure of the data changes at each changepoint. While our focus here is on a specific application, the problem has wide applicability. For example, stoehr2019detecting examine changes in the covariance structure of functional Magnetic Resonance Imaging (fMRI) data, where a failure to satisfy stationarity assumptions can significantly contaminate any analysis, while  Wied2013 and BERENS2015135 examine how changes in the covariance of financial data can be used to improve stock portfolio optimisation.

The changepoint problem has a long history in the statistical literature, and contains two distinct but related problems, depending on whether the data is observed sequentially (online setting) or as a single batch (offline setting). We focus on the latter and direct readers interested in the former to tartakovsky2014sequential for a thorough review. In the univariate setting there is a vast literature on different methods for estimating changepoints, and there are a number of state-of-the-art methods killick2012optimal, frick2014multiscale, fryzlewicz2014wild, maidstone2017optimal.

The literature on detecting changes in multivariate time series has grown substantially in the last few years. In particular, many authors consider changes in the moderate dimensional setting, that is, where the number of parameters of the model is of the order of the number of data points. Much of this work considers changes in expectation where the series are uncorrelated grundy2020high, horvath2012change. Furthermore, a number of authors have examined detecting changes in expectation where only a subset of the variables under observation change enikeeva2019, jirak2015uniform, wang2016high. Separately, a number of authors have considered changes in the second order structure of moderate dimensional time series models, including auto-covariance and cross-covariance cho2015multiple, changes in graphical models doi:10.1080/10618600.2017.1302340, 6854087 and changes in network structure wang2018optimal.

The problem of detecting changes in the covariance structure has been examined in both the low dimensional and high dimensional setting. In the low dimensional setting (), [chen2004statistical, lavielle2006detection] utilise a likelihood based test statistic and the Schwarz Information Criterion (SIC) to detect changes in the covariance of normally distributed data. [aue2009break] consider a nonparametric test statistic for changes in the covariance of linear and non-linear multivariate time series. [matteson2014nonparametric] study changes in the distribution of (possibly) multivariate time series using a clustering inspired nonparametric test statistic that claims to handle changes in covariance. In the high dimensional setting, [avanesov2018, wang2017optimal] study test statistics based on the distance between sample covariances, utilising the operator norm and norm respectively. Crucially, all of these approaches focus on the theoretical aspects of the proposed test statistics rather than the practical implications.

In this work we propose a novel method for detecting changes in the covariance structure of moderate dimensional time series, motivated by the practical challenges of estimating changes in surface water on soil. In Section 2, we introduce a test statistic inspired by a distance metric intuitively defined on the space of positive definite matrices. The primary advantage of this metric is that, under the null hypothesis of no change, it is independent of the underlying covariance structure. This is not the case for other methods in the literature, which require users to estimate this structure. In Section 3, we study the asymptotic properties of this test statistic when the dimension of the data is of comparable size to (but still smaller than) the sample size. In Section 4, we use these results to propose a new method for detecting multiple changes in the covariance structure of multivariate time series. In Section 5, we study the finite sample performance of the proposed approach on simulated datasets. Finally, in Section 6, we use our method to examine how changes in the covariance structure of pixel intensities can be used to detect changes in surface water.

2 Two Sample Tests for the Covariance

Let be independent

dimensional vectors with

(2.1)

where each is full rank. Furthermore, let denote an matrix defined by . Our primary interest in this paper is to develop a testing procedure that can identify a change in the covariance structure of the data over time. For now, let us consider the case of a single changepoint. We compare a null hypothesis of the data sharing the same covariance versus an alternative setting that allows a single change at time . Formally we have

(2.2)
(2.3)

where is unknown. We are interested in distinguishing between the null and alternative hypothesis, and under the alternative locating the changepoint , when the dimension of the data , is of comparable size to the length of the data, . In particular we require that for all pairs , the set

(2.4)

is non empty, where  is a problem dependent positive constant. Note that this defines the set of possible candidate changepoints, while  is the minimum distance between changepoints, or minimum segment length. Then for each candidate changepoint , a two sample test statistic can be used to determine if the data to the left and right of the changepoint have different distributions. If the two sample test statistic for a candidate exceeds some threshold, then we say a change has occurred, and an estimator for  is given by the value that maximises the two sample test statistic.

Let (or ) be a plug-in estimator for the covariance (or precision) of the data. Then to test for a changepoint at time , we can measure the magnitude of the matrix , where and are sequences of normalizing constants. If this matrix is large, then there is evidence for a change, and vice versa. This approach is well represented in the literature: for instance, [wang2017optimal, aue2009break, galeano2007covariance] measure the difference between sample covariance estimates, while in the high dimensional setting, [avanesov2018] measures the difference between debiased graphical LASSO estimates. Although this approach is intuitive, it can be challenging to use in practice. A good estimator will depend on the true covariance, which implies that the difference matrix above is dependent on the true covariance of the data. As a result, any test statistic based on a difference matrix must be a function of the underlying covariance, , and should be corrected to account for this. For example, [aue2009break, galeano2007covariance] normalize their test statistic using the sample covariance for the whole data, [avanesov2018] use a bootstrap procedure which assumes knowledge of the measure of , and [wang2017optimal] use a threshold which is a function of . All these approaches require estimating the underlying covariance in practice. This is impractical under the alternative setting, since estimating the segment covariances requires knowledge of the changepoint.

Therefore, it is natural to ask whether there are alternative ways of measuring the distance between covariance matrices. In the univariate setting, a common approach is to evaluate the logarithm of the ratio of the segment variances ([chen1997testing, inclan1994use, killick2010detection]). This is in contrast with the change in expectation problem, where it is more common to measure the difference between sample means. In the variance setting a ratio is more appropriate for two reasons. Firstly, since variances are strictly positive, if the underlying variance is quite small then the absolute difference between the values will also be small, whereas the ratio is not affected by the scale. Secondly, under the null hypothesis of no change, the variances will cancel and the test statistic will be independent of the variance. Thus, there is no need to estimate the variance when calculating the threshold.

We propose to extend this ratio idea from the univariate setting to the multivariate setting by studying the multivariate ratio matrix, , where . Ratio matrices are widely used in multivariate analysis to compare covariance matrices ([finn1974general]). In particular, we are often interested in functions of the eigenvalues of these matrices ([10.2307/2331979, 10.2307/2334137, 10.2307/2332232]). Here we are interested in the following test statistic,

(2.5)

where is the th largest eigenvalue of the matrix . The function has valuable properties that may not be immediately obvious.

Proposition 2.1.

Let , and , and define as in equation (2.5), then we have that:

  1. is symmetric i.e. ;

  2. is symmetric with respect to inversion of matrices i.e. ;

  3. If and , then .

The symmetry property is important for a changepoint analysis as the segmentation should be the same regardless of whether the data is read forwards or backwards. The second property states that the statistic is the same whether we examine the covariance matrix or the precision matrix. This ensures that differences between both small and large eigenvalues can be detected. The third property is particularly important as it implies that the statistic is independent of the underlying covariance of the data. In particular, let the sample covariance estimate be computed for each segment of the data. Under the null hypothesis, the data can be transformed so that its covariance is the identity matrix, and property 3 then implies that the statistic takes the same value, independent of the underlying covariance. In other words, under the null hypothesis the underlying covariance cancels out (as occurs with the ratio approach in the univariate variance setting). Furthermore, due to the square term, the statistic is non-negative, which is necessary to prevent changes cancelling out. These properties are clearly not unique to our chosen test statistic and, in fact, there are other possible choices. However, we argue that for an alternative function to be appropriate in the changepoint setting, it would also require these properties. Furthermore, this choice allows us to analytically derive relevant quantities such as the limiting moments.
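As an illustration of property 3, the following R sketch evaluates one candidate statistic of this form, the sum of squared log-eigenvalues of the ratio matrix (an assumed instantiation consistent with the properties in Proposition 2.1; the exact form of (2.5) may differ), and checks numerically that its value is unchanged when both samples share a common underlying covariance.

# R sketch: a ratio-matrix statistic of the assumed form sum_i log(lambda_i)^2.
ratio_stat <- function(X1, X2) {
  S1 <- cov(X1)                        # sample covariance of the left sample
  S2 <- cov(X2)                        # sample covariance of the right sample
  lambda <- eigen(solve(S1) %*% S2, only.values = TRUE)$values
  sum(log(Re(lambda))^2)               # square prevents changes cancelling out
}

set.seed(1)
n <- 200; p <- 10
Z1 <- matrix(rnorm(n * p), n, p)       # identity-covariance data
Z2 <- matrix(rnorm(n * p), n, p)
A  <- matrix(rnorm(p * p), p, p)       # common linear transform (general covariance)
ratio_stat(Z1, Z2)                     # statistic on identity-covariance data
ratio_stat(Z1 %*% t(A), Z2 %*% t(A))   # same value: the common covariance cancels

The second call transforms both samples identically, so the ratio matrix is similar to the original one and its eigenvalues (and hence the statistic) are unchanged, mirroring the cancellation argument above.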

It is both possible and interesting to study the properties of this test statistic in the finite dimensional setting (i.e., where the dimension is fixed). However, in this work our focus is on problems where the dimension of the data is of comparable size to the length of the data. Under this asymptotic setting, the eigenvalues of random matrices (and by extension the properties of the statistic) have different limiting behaviour, and a proper test should take this into account. For example, the two sample Likelihood Ratio Test (as used in [galeano2007covariance]) is a function of the log of the determinant of the covariance, or equivalently the sum of the log eigenvalues. In the moderate dimensional setting, this test has been shown to break down due to the differing limiting behaviour bai2009corrections. Therefore, in the next section we consider the properties of the statistic as a two sample test, and derive its asymptotic distribution under moderate dimensional asymptotics.

3 Random Matrix Theory

We now describe some foundational concepts in Random Matrix Theory (RMT), before discussing how these ideas are utilised to identify the asymptotic distribution of our test statistic under the null hypothesis. RMT concerns the study of matrices whose entries are random variables. In particular, RMT is often concerned with the behaviour of the eigenvalues and eigenvectors of such matrices. Interested readers should see [tao2012topics] for an introduction and [anderson2010introduction] for a more thorough review.

A key object of study in the field is the Empirical Spectral Distribution (ESD), defined for a matrix as

(3.6)

where  is an indicator function. In other words, the ESD of a matrix is a discrete uniform distribution placed on its eigenvalues. Several authors have established results on the limiting behaviour of the ESD as the dimension tends to infinity, the so-called Limiting Spectral Distribution (LSD). For example, [doi:10.1137/1009001] demonstrate that if the upper triangular entries of a Hermitian matrix have mean zero and unit variance, then its ESD (suitably rescaled) converges to the Wigner semicircular distribution.
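To make the ESD concrete, the short R sketch below (assuming the standard definition of the ESD as the proportion of eigenvalues at or below a point x) builds a symmetric matrix with independent mean-zero, unit-variance off-diagonal entries and compares the ESD of the rescaled matrix with the semicircle law on [-2, 2].

# R sketch: empirical spectral distribution of a Wigner-type matrix.
set.seed(1)
p <- 500
G <- matrix(rnorm(p * p), p, p)
W <- (G + t(G)) / sqrt(2)              # symmetric; off-diagonal entries mean 0, variance 1
lambda <- eigen(W / sqrt(p), symmetric = TRUE, only.values = TRUE)$values

esd <- function(x, lambda) mean(lambda <= x)   # ESD: uniform mass on the eigenvalues
esd(0, lambda)                                  # close to 0.5 by symmetry

# Semicircle law CDF on [-2, 2] for comparison
semicircle_cdf <- function(x)
  integrate(function(t) sqrt(pmax(4 - t^2, 0)) / (2 * pi), lower = -2, upper = x)$value
semicircle_cdf(0)                               # exactly 0.5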

The LSD of the ratio matrix was shown to exist in [yin1983limiting] and computed analytically in [silverstein1985limiting]. The following two assumptions are sufficient for the LSD of an F matrix to exist.

Assumption 3.1.

Let and be random matrices with independent not necessarily identically distributed entries and with mean 0, variance 1 and fourth moment . Furthermore, for any fixed ,

(3.7)
(3.8)

as tend to infinity subject to Assumption 3.2.

Assumption 3.2.

The sample sizes , and the dimension grow to infinity such that

We refer to the limiting scheme described in Assumption 3.2 as .

Let  be matrices satisfying Assumptions 3.1 and 3.2. These assumptions place restrictions on the mean and the tails of the data. The mean assumption is standard in the literature. If the data has non-zero mean, the data should be standardized, typically by removing the sample mean (assuming the mean is constant). The impact of this on our proposed method is examined in Appendix B.3. Note that although the matrices have identity covariance, these results also hold for data with general covariance , since by property 3 of Proposition 2.1 the covariance term cancels out under the null hypothesis and we do not require knowledge of . Furthermore, let  denote the ESD of . Then [silverstein1985limiting] demonstrate that  converges almost surely to the non-random distribution function

(3.9)
(3.10)
Figure 3.1: Test statistic defined in (2.5) applied to 100 different data sets before (left) and after standardisation (right) using (3.11), under the null setting (top) and alternative setting (bottom) with , and .

The LSD provides an asymptotic centering term for functions of the eigenvalues of random ratio matrices. In particular, for any function , we have that, as , by the definition of weak convergence. This allows us to account for the bias in the statistic seen in Figure 3.1.

The rate of convergence of  to zero was studied in [zheng2012] and found to be . In particular, the authors establish a central limit theorem for this quantity. We can apply this result to our problem in order to demonstrate that our two sample test statistic converges to a normal distribution with known mean and variance terms.

Theorem 3.1.

Let and be random matrices satisfying Assumptions 3.1 and 3.2 and be defined as in (2.5). Then we have that as ,

where the centring and scaling constants are explicit functions of the limiting ratios defined in Assumption 3.2.

Using Theorem 3.1, we can properly normalise the statistic so that it can be utilised within a changepoint analysis. In particular, we have that under the null hypothesis,

weakly as  tend to infinity, where the mean and variance are as defined in Theorem 3.1. Thus we utilise the normalised test statistic,

(3.11)

which under the null hypothesis converges pointwise to a standard normal random variable.

The asymptotic moments of the test statistic depend on the limiting ratio and, as the candidate changepoint approaches either end of the data, the mean and variance of the test statistic increase dramatically. In the context of changepoint analysis, this implies that the mean and variance increase at the edges of the data. We note that this is a common feature of changepoint test statistics. We can significantly reduce the impact of this via the above standardisation, as seen empirically in Figure 3.1. After standardisation, the test statistics for the series with no change do not appear to have any structure. Similarly, the test statistics for the series with a change show a clear peak at the changepoint. Importantly, we can now easily distinguish the test statistic under the null and alternative hypotheses, and this normalisation does not require knowledge of the underlying covariance structure.
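The sketch below illustrates how the standardisation in (3.11) would be applied in a scan over candidate changepoints. Here asym_mean and asym_sd are hypothetical placeholders for the centring and scaling terms of Theorem 3.1 (not reproduced here), and raw_stat is the unstandardised statistic of (2.5).

# R sketch of the standardisation in (3.11); asym_mean() and asym_sd() are
# hypothetical placeholders for the Theorem 3.1 centring and scaling terms.
standardised_scan <- function(X, minseg, raw_stat, asym_mean, asym_sd) {
  n <- nrow(X); p <- ncol(X)
  cands <- seq(minseg, n - minseg)
  sapply(cands, function(t) {
    raw <- raw_stat(X[1:t, , drop = FALSE], X[(t + 1):n, , drop = FALSE])
    (raw - asym_mean(p, t, n - t)) / asym_sd(p, t, n - t)   # centre and scale
  })
}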

4 Practical Considerations

Before we can apply our method to real and simulated data, we need to address three practical concerns: we must select a threshold for rejecting the null hypothesis, determine an appropriate minimum segment length, and address the issue of multiple changepoints.

4.1 Threshold for Detecting a Change

Firstly, we need to select an appropriate threshold for rejecting the null hypothesis. We choose to utilise the asymptotic distribution of the test statistic on a pointwise basis, that is, for each candidate we compare the statistic to a standard normal variable. This does not take into account whether or not we are in the limiting regime and, as a result, the method may be unreliable if the sample size is small (indeed we observe this pattern in Section 5). We then use a Bonferroni correction Haynes2013 to control the probability that any statistic exceeds the threshold. In particular, for a given significance level , we reject the null hypothesis for a single change in data of length  if the statistic exceeds the corresponding corrected quantile of the standard normal distribution for some candidate changepoint. In the case of multiple changepoints, we use a further correction to account for the extra hypothesis tests. We note that a Bonferroni correction is known to be conservative and, as a result, using this approach may have poor size (again, results from the simulation study validate this concern). Ideally, one would take account of the strong dependence between consecutive test statistics to obtain a better threshold, but this is challenging given the non-linear nature of the test statistic. Further work may wish to investigate whether finite sample results which exploit this dependence can be derived. Alternatively, practitioners could use several different thresholds and ascertain the appropriate threshold for the particular application at hand, as demonstrated in [lavielle2005].
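A minimal sketch of such a Bonferroni-corrected threshold in R, assuming one test per candidate changepoint; the exact number of candidates and the quantile used in the paper are assumptions here.

# Hypothetical sketch: Bonferroni-corrected threshold over the candidate changepoints.
bonferroni_threshold <- function(n, minseg, alpha = 0.05) {
  n_tests <- n - 2 * minseg + 1          # assumed number of candidate changepoints
  qnorm(1 - alpha / n_tests)             # upper-tail normal quantile after correction
}
bonferroni_threshold(n = 1000, minseg = 30)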

4.2 Minimum Segment Length

The proposed test statistic relies on an appropriate choice of the minimum segment length parameter, . If it is too small, the covariance estimates in short segments will elicit false detections; if it is too large, the changepoints will not be localised precisely enough to be useful.

In many applications, domain specific knowledge may be used to increase this parameter. However, it is also important to consider the smallest value that will give reliable results in the general case. The minimum segment length must grow sufficiently fast to ensure that the statistic converges to a normal distribution. Outside the asymptotic regime, it is possible for the ratio matrix to have very large eigenvalues. Thus for candidate changepoints close to the start (or, by symmetry, the end) of the data, the probability of observing spuriously large values of the statistic becomes much larger. This can be seen in Figure 4.2. When the minimum segment length takes the smallest possible value, we observe extremely large values of the test statistic, which would make identifying a true change almost impossible. However, for the larger value shown, the test statistic behaves reliably. Thus we need the ratio of dimension to segment length to converge for the asymptotic results to hold. In Appendix B.1, we analyse the effect of different sequences in the finite sample setting via a simulation study. Based on these results, we recommend using a default value of ; however, note that for moderate values of , smaller values can be taken without any corresponding decrease in performance.

Figure 4.2: Histogram of values of applied to 100 datasets of length with no change for with (top) and (bottom).

4.3 Multiple Changepoints

Finally, we also consider the extension to multiple changes. In this setting, we have a set of unknown ordered changepoints, such that  and  is the covariance matrix of the th time-vector. We are interested in estimating the number of changes  and the set of changepoints . The classic approach to this problem is to extend a method defined for the single changepoint setting to the multiple changepoint setting via an appropriate search method, such as dynamic programming or binary segmentation. For this work, we cannot apply the dynamic programming approach killick2012optimal, which minimises the within-segment variability through a cost function for each segment. This is because our distance metric is formulated as a two-sample test and cannot be readily expressed as a cost function for a single segment. Therefore we use the classic binary segmentation procedure scott1974cluster. The binary segmentation method extends a single changepoint test as follows. Firstly, the test is run on the whole data. If no change is found, then the algorithm terminates. If a changepoint is found, it is added to the list of estimated changepoints and the binary segmentation procedure is then run on the data to the left and right of the candidate change. This process continues until no more changes are found. Note that the threshold, , and the minimum segment length, , remain the same. We note that a number of extensions of the traditional binary segmentation procedure have been proposed in recent years fryzlewicz2014wild, olshen2004circular. Although we do not use these search methods in our simulations, due to additional parameters that affect performance, it is not difficult to incorporate our proposed test statistic into these adaptations of the original binary segmentation approach. The full proposed procedure is described in Algorithm 1.

Input : Data matrix X, segment start s, segment end e, minimum segment length l, significance level alpha
C <- {};
for t = s + l, ..., e - l do
       compute the sample covariances of X[s:t] and X[(t+1):e];
       compute the normalised test statistic T(t) as in (3.11);
end for
t* <- argmax_t T(t);
if T(t*) exceeds the Bonferroni-corrected threshold of Section 4.1 then
       C_L <- RatioBinSeg(X, s, t*, l, alpha);
       C_R <- RatioBinSeg(X, t* + 1, e, l, alpha);
       C <- {t*} union C_L union C_R;
end if
Output : Set of changepoints C.
Algorithm 1 Ratio Binary Segmentation (RatioBinSeg)
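A minimal R sketch of this recursion is given below, assuming a user-supplied function stat_fn(X_left, X_right) that returns the normalised two-sample statistic of (3.11); the function names and defaults are illustrative rather than taken from the paper's implementation.

# R sketch of Ratio binary segmentation; stat_fn(X_left, X_right) returns the
# normalised two-sample statistic and threshold is the rejection level.
ratio_binseg <- function(X, s, e, minseg, threshold, stat_fn) {
  if (e - s + 1 < 2 * minseg) return(integer(0))        # segment too short to split
  cands <- seq(s + minseg - 1, e - minseg)              # candidate changepoints
  stats <- sapply(cands, function(t)
    stat_fn(X[s:t, , drop = FALSE], X[(t + 1):e, , drop = FALSE]))
  best <- which.max(stats)
  if (stats[best] <= threshold) return(integer(0))      # no evidence of a change
  tau <- cands[best]
  c(tau,
    ratio_binseg(X, s, tau, minseg, threshold, stat_fn),      # recurse on left segment
    ratio_binseg(X, tau + 1, e, minseg, threshold, stat_fn))  # recurse on right segment
}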

5 Simulations

In this section, we compare our method with existing methods in the literature, namely the methods of [wang2017optimal, aue2009break, galeano2007covariance], which we refer to as the Wang, Aue and Galeano methods respectively. We do not consider [avanesov2018] as this method is intended for the high dimensional setting and so would be an unfair comparison. Software implementing these methods is not currently available and, as a result, we have implemented each of these methods according to the descriptions in their respective papers. All methods, simulations, visuals and analysis have been implemented in the R programming language Rref. The code to repeat our experiments is available at https://github.com/s-ryan1/Covariance_RMT_simulations.

Simulation studies in the current literature for changes in covariance structure are very limited. [wang2017optimal] do not include any simulations, while [aue2009break, avanesov2018] only consider the single changepoint setting and do not consider random parameters for the changes. Furthermore, to our knowledge, no papers compare the performance of different methods. While theoretical results are clearly important, it is also necessary to consider the finite sample performance of any estimator, and we now study the finite sample properties of our approach on simulated datasets. Further details on the general setup of our simulations are given in Appendix A. Note that the significance thresholds for each method are set to be favourable to competing methods, and we anticipate that performance would decrease on real data.

We begin by analysing the performance of our approach (which we refer to as the Ratio method) in the single changepoint setting. This allows us to directly examine the finite sample properties of the method, such as the power and size, as well as investigate how violations of our assumptions, such as autocorrelation and heavy-tailed errors, impact the method. We then compare our approach with current state-of-the-art methods for detecting multiple changepoints. Results for assessing the chosen default values for the minimum segment length parameter in Section 4, as well as a comparison of different methods in the single changepoint setting, are given in Appendix B. These demonstrate that the Ratio and Aue methods are well peaked whereas the Wang method is not, leading to less accurate changepoint localization. In both settings, the localization of the changepoints is more accurate for our approach than for the [aue2009break] method, whilst also being applicable to larger values of . However, we note that for smaller dimensions, , the Aue method would likely be more accurate due to the Bonferroni correction over-correcting. Finally, comparisons for the Ratio method based on whether or not the mean is known (under the null and alternative hypotheses) are provided in Appendix B. These results show that centering the data by subtracting the sample mean has a small impact on the performance of the method.

Performance Metrics

In the single changepoint setting, we are interested in whether the Ratio approach provides a valid hypothesis test. Therefore, for a given set of simulations, we measure how often the method incorrectly rejects the null (Type 1 error) and how often the method fails to correctly reject the null (Type 2 error). Furthermore, under the alternative hypothesis, we measure the absolute difference between the estimated changepoint and the true change, and refer to this throughout as the Changepoint Error. For the multiple changepoint setting, we use  and  to denote the set of true changepoints and the set of estimated changepoints respectively. We say that a changepoint has been detected correctly if an estimated changepoint lies within a given tolerance of it, and denote the set of correctly estimated changes by . Then we define the true discovery rate (TDR) as the proportion of true changepoints that are correctly detected, and the false discovery rate (FDR) as the proportion of estimated changepoints that do not correspond to a true change.
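The R sketch below shows one way these rates could be computed, assuming a change counts as detected when an estimate lies within a tolerance h of it; the tolerance value and the exact formulas used in the paper are assumptions here.

# Hypothetical sketch: true and false discovery rates with detection tolerance h.
tdr_fdr <- function(true_cpts, est_cpts, h) {
  if (length(est_cpts) == 0) return(c(TDR = 0, FDR = 0))
  detected  <- sapply(true_cpts, function(tau) any(abs(est_cpts - tau) <= h))
  true_disc <- sapply(est_cpts,  function(t)   any(abs(true_cpts - t) <= h))
  c(TDR = mean(detected),          # proportion of true changes that are detected
    FDR = mean(!true_disc))        # proportion of estimates far from any true change
}
tdr_fdr(true_cpts = c(100, 200, 300, 400), est_cpts = c(98, 205, 350), h = 10)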

A perfect method will have a TDR of 1 and an FDR of 0. We fix this tolerance for the study, although it should be noted that in reality the desired accuracy would be application specific and dependent on the minimum segment length . Although the specific values vary with this choice, the conclusions of the study do not. We also consider whether or not the resulting segmentation allows us to estimate the true underlying covariance matrices, and define the mean absolute error (MAE) between the estimated segment covariances and the true covariances.

5.1 Finite sample properties

We begin by examining the performance of our proposed approach on normally distributed datasets with length and dimension . Based on results from Section 4.2, the minimum segment length was set to . For each pair, we generated 1000 datasets with a single change at time as follows

(5.12)

where  is the identity matrix of dimension  and . Note that since our approach is invariant to the covariance of the data under the null hypothesis, this is equivalent to generating the data with some unknown covariance. We use the approach described in Section 4.1 to select the threshold with . A histogram of the test statistic values under the null hypothesis for  and  is shown in Figure 5.3. For , we observe large test values; however, this effect is not present for , indicating that we have not entered the limiting regime when . We computed the FPR for each pair and the results are shown in Table E.7. Across all dimensions, we observe low numbers of false positives. In particular, for  and  the method is conservative. For , we find that the test appears to have good size; however, this difference may be explained by the fact that we have not entered the limiting regime. We measure the power and accuracy of our method via the True Positive Rate (TPR) and the absolute difference between the estimated change and the true change (Changepoint Error). The TPR is given in Table E.7 for . As  increases, the probability of detection increases. However, for smaller values of , the method can have less power, such as  and . In these cases, using less data gives a better detection rate, implying that the method inefficiently utilises the data and a high dimensional approach may be preferable. Changepoint errors are given in Figure 5.3. As these parameters increase, the method more accurately locates the changepoint, as we would expect.
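One plausible instantiation of this data-generating scheme in R is sketched below; the exact form of (5.12) and of the post-change covariance are assumptions made for illustration.

# Hypothetical sketch: n x p Gaussian data with a single covariance change at tau.
set.seed(1)
n <- 500; p <- 10; tau <- 250; delta <- 1
Sigma_post <- diag(1 + delta, p)                 # assumed post-change covariance
pre  <- matrix(rnorm(tau * p), tau, p)           # identity covariance before the change
post <- matrix(rnorm((n - tau) * p), n - tau, p) %*% chol(Sigma_post)
X <- rbind(pre, post)                            # full data matrix with one change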

Serial Dependence

Our method does not allow for dependence between successive data points. To measure the impact of serial dependence, we generated data

(5.13)

where , , and . Results from this analysis are shown in Figure 5.4. Focusing on the top left plot, we can see that even under the null the test values increase as the autocorrelation increases, and we find that for  the proposed threshold is invalid, with FPRs of approximately 1. Thus the test will produce spurious false positives. This is well known in the univariate changepoint literature Shi2021 and can be mitigated by scaling the threshold by the observed autocorrelation.
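A sketch of how such autocorrelated errors can be generated in R, assuming each coordinate follows an independent AR(1) process with coefficient phi; the exact model (5.13) is not reproduced here.

# Hypothetical sketch: each column is an independent AR(1) series with coefficient phi.
set.seed(1)
n <- 500; p <- 10; phi <- 0.5
X_ar <- sapply(1:p, function(j)
  as.numeric(arima.sim(model = list(ar = phi), n = n)))   # stats::arima.sim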

Similarly, the power and accuracy of our method decrease when changepoints are present. Figure 5.4 shows the separation between the test statistic values under the null and alternative hypotheses. If these values are well separated, then the method will have good power given a valid threshold. We find that the separation between the null and alternative distributions decreases, and thus the changepoint error increases, as the autocorrelation increases. These results show that as autocorrelation increases, our method becomes less accurate, as expected.

Figure 5.3: Clockwise from top left: test values for , with the dashed line indicating the threshold; for higher dimensions we observe fewer outliers. Changepoint error under increasing data length (), dimension () and size of change (). The method becomes more accurate as these increase.

Model Misspecification

Our method places assumptions on the data which may not hold in practice. To measure the impact of this we generated data

(5.14)

Uniform, Exponential(1), Student t, , and . Note that the exponential and t-distributions do not satisfy Assumption 3.1 and, as a result, the threshold is invalid, producing FPRs of 0.512 and 0.248 respectively. Interestingly, although the t-distribution has heavier tailed errors than the exponential distribution, it gives a lower FPR. This is likely due to the skewness of the exponential distribution. We find that the method has less power for the heavier tailed distributions (Figure 5.4), as again the distributions under the null and alternative overlap more. This pattern is repeated for the changepoint error. Thus, the method will be less accurate in the presence of heavy tailed errors.

Figure 5.4: Clockwise from top left: test statistic values for AR(1) data with , with the dashed line indicating the threshold. Changepoint error for AR(1) data. Changepoint error for different error distributions. Test values for different error distributions, with the dashed line indicating the threshold. As the autocorrelation and the probability of outliers increase, the method becomes less accurate.
              FDR                           MAE                                    TDR
p    n        Aue   Ratio  Wang  Galeano    Aue     Ratio   Wang     Galeano       Aue   Ratio  Wang  Galeano
3    500      0.38  0.27   0.63  0.46       26.05   26.53   42.82    37.99         0.55  0.38   0.51  0.20
3    1000     0.49  0.29   0.77  0.51       18.44   15.36   36.81    27.94         0.58  0.48   0.48  0.23
3    2000     0.54  0.31   0.86  0.54       12.78   7.65    32.14    21.46         0.63  0.60   0.44  0.27
3    5000     0.59  0.31   0.90  0.57       7.95    2.93    23.85    13.97         0.65  0.68   0.45  0.31
10   500      0.25  0.16   0.28  0.46       304     303.68  316.80   541.90        0.56  0.55   0.51  0.22
10   1000     0.32  0.13   0.43  0.48       167.59  127.72  243.70   413.23        0.69  0.75   0.54  0.26
10   2000     0.34  0.10   0.53  0.51       99.52   46.44   186.25   292.50        0.76  0.87   0.58  0.29
10   5000     0.37  0.09   0.61  0.57       58.09   16.62   123.34   189.12        0.80  0.91   0.61  0.32
30   2000     -     0.02   0.31  0.47       -       143.71  1096.3   1550.72       -     0.98   0.45  0.36
30   5000     -     0.02   0.32  0.52       -       51.15   523.25   913.45        -     0.98   0.55  0.39
100  5000     -     0.00   0.44  0.51       -       209.36  7895.1   5969.70       -     1.00   0.32  0.48
Table 5.1: Results from multiple changepoint simulations based on the Ratio constraints described in Appendix A.2. For smaller values of , the Ratio method provides a lower FDR and a lower TDR than the Aue method, indicating a tradeoff between the methods. However, for larger values of  the Ratio method is the top performer. Confidence intervals for mean values are provided in Table E.5.

5.2 Multiple Change Points

We now compare the Ratio approach with other methods on simulated data sets with multiple changepoints. We consider datasets with 4 changepoints, uniformly sampled with minimum segment length , where  and . Covariance matrices are generated so that the distance between consecutive covariance matrices is sufficiently large to detect a change. We consider two separate metrics, where the first matches the assumptions from [wang2017optimal], while the second matches the distance metric used by our method. As such, the first set of simulations should favour the Wang method. Full details of how the covariances were generated are provided in Appendix A.2. For each pair and distance metric, we generated 1000 datasets and applied our method, the Aue ([aue2009break]) method and the Wang ([wang2017optimal]) method to each dataset. Due to its computational complexity, we do not run the Aue method for . Using the resulting segmentations, we then calculated the error metrics for each method. The worst performers across all metrics are the Galeano and Wang methods. Notably, the true discovery rate for the Wang method decreases as  grows. This is in striking contrast with the other methods, which become more accurate for larger values of , as one may expect. This may be due to the fact that the Wang method only considers the first principal component of the difference matrix, ignoring the remainder of the spectrum, or to the bias issue identified in Appendix B.2. The Galeano performance is equally surprising since, given the results in the single changepoint case (Appendix B.2), we would anticipate it to be the best performer. Furthermore, these methods also have the highest false discovery rates, indicating that adapting the threshold would not lead to more accurate changepoint estimates. The Ratio and Aue methods are more closely matched. We can see that the Ratio method is the more conservative of the two, producing a lower FDR and a correspondingly lower TDR when  and  are smaller. This is unsurprising since our results in the single changepoint case found that the Ratio method can be less reliable when . For scenarios with , the Ratio approach is extremely accurate. This indicates that for problems with smaller datasets the Aue method may be preferable, while our approach is suitable for larger datasets.

FDR MAE TDR
p n Aue Ratio Wang Galeano Aue Ratio Wang Galeano Aue Ratio Wang Galeano
3 500 0.30 0.66 0.94 21.21 41.86 40.47 0.64 0.41 0.02
3 1000 0.40 0.81 0.81 16.40 37.33 32.33 0.72 0.35 0.05
3 2000 0.47 0.88 0.68 12.14 30.59 25.83 0.77 0.3 0.09
3 5000 0.52 0.92 0.51 7.54 22.21 19.51 0.77 0.28 0.16
10 500 0.36 0.57 1.00 265.47 329.02 0.28 0.19 0.00
10 1000 0.45 0.73 1.00 141.09 201.80 253.15 0.41 0.14 0.00
10 2000 0.48 0.80 0.99 91.04 147.13 206.24 0.13 0.00
10 5000 0.48 0.81 0.95 50.98 90.40 165.90 0.63 0.14 0.01
30 2000 0.84 0.96 1127.84 1330.00 0.05 0.01
30 5000 0.84 0.87 671.94 925.27 0.06 0.04
100 5000 0.97 0.64 7796.28 7036.56 0.01 0.11
Table 5.2: Results from multiple changepoint simulations based on the assumptions in [wang2017optimal] described in Appendix A.2. For smaller values of , the Ratio method provides a lower FDR and a lower TDR than the Aue method, indicating a tradeoff between the methods. However, for larger values of  the Ratio method is the top performer. Confidence intervals for mean values are provided in Table E.6.

6 Detecting changes in moisture levels in soil

In this section, we investigate whether changes in the covariance structure of soil data correspond with shifts in the amount of moisture on the soil surface. There is significant interest in developing new techniques to better understand how water is absorbed and to model surface water. This is an important question and is relevant to a variety of industrial applications such as farming, construction and the oil and gas industry hillel2003introduction. A widely used approach is to place probes at different depths and locations in the soil to measure the level of moisture. To measure across a site more easily, scientists are investigating the use of cameras that capture the soil surface as a surrogate for moisture.

Figure 6.5: (a) Raw grayscale intensities for three pixels. (b) Standardised intensities for the same three pixels.

We analyse images from an experiment studying moisture on the surface of the soil. A camera was placed over a large plot of soil and took a set of 589 pictures over the course of the experiment. At different times, different amounts of rainfall are simulated and the amount of water on the soil surface changes. We wish to segment the data into periods of dry, wet and surface water. The intensity of a set of pixels over time is shown in Figure 6.5. We can see that the mean level is clearly nonstationary. This nonstationary behaviour may be attributed to two causes: changes in the background light intensity (due to a cloud passing by) and changes in the wetness of the soil, which changes how much light is reflected. Since changes in the mean intensity are not necessarily associated with changes in the wetness, we instead focus on changes in the covariance structure. If pixels become wet, we would expect the correlation between them to increase, as the surface becomes uniformly water rather than variable soil. Thus changes in the covariance structure of the pixels may correspond with changes in the wetness.

The data consist of 589 images, and we analyse two subsets of the pixels. The original images are in colour but were converted to grayscale for computational purposes. We run a multiple changepoint analysis on the smaller subset using our approach as well as the Aue and Wang methods. We also run a multiple changepoint analysis on the larger subset.

In order to analyse the covariance structure of the data, we first need to transform the data to have stationary mean. Estimating the mean of this series is challenging as there is stochastic volatility and the smoothness of the function appears to change over time. As a result, standard smoothing methods such as LOESS and windowed mean estimators may be inappropriate. We use a Bayesian Trend Filter with Dynamic Shrinkage kowal2019dynamic, implemented in the DSP R package dsp, which is robust to these issues. We then take the residuals. The transformed data for a subset of the pixels can be seen in Figure 6.5. The minimum segment length is set to 25. The thresholds for significance for each method were again set to the defaults as discussed in the previous section. The results of this analysis are shown in Table 6.3.

To validate our results we worked with scientists studying this data and identified three clear time points where there is a substantial change in the amount of water on the surface at the relevant pixels. The first change is somewhat gradual, going from very dry at time  to very wet from time . The second and third changes are more abrupt, with a substantial increase in the amount of water at time  and a corresponding sharp decrease at time . The Aue method reports 7 changepoints, the Wang method reports 5 changepoints and our method locates 8 changepoints. All methods detect the first and last changes. However, the Wang method does not detect any change near the second anticipated changepoint. All of the methods appear to overfit changepoints, in the sense that they report changes that do not correspond with clear changes in the amount of water on the surface. For our method and the Aue method, the majority of these overfitted changes occur when the soil is dry (before t=64 and after t=450). During these periods the scientists were moving around the site, increasing the variability in the amount of light exposure from image to image, which may explain these nuisance changes.

For the larger dataset, the minimum segment length was set to 60 (twice the number of variables) and the thresholds were set to their defaults. The results were broadly similar for our method and quite different for the Wang method. Our approach reports 6 changes again detecting the three obvious changes in the video. We note that the reduced number of changepoints is primarily due to the increased minimum segment length. The Wang method only reports a single changepoint. This drop in reported changes is caused by the largest eigenvalue of the sample covariance being much larger. As a result, the threshold for detecting a change is times larger to account for this and consequently, it appears that the method loses power.

Method   Small subset                            Larger subset
Aue      66, 101, 243, 354, 451, 514, 589        NA
Wang     52, 79, 184, 237, 445                   445
Ratio    49, 77, 244, 347, 452, 493, 532, 562    64, 125, 184, 255, 340, 450, 527
Table 6.3: Detected changepoints for each of the three methods when applied to the soil image data. Note the dimension of the larger subset means the Aue method can detect at most one changepoint.

7 Conclusion

In this work, we have presented a novel test statistic for detecting changes in the covariance structure of moderate dimensional data. This geometrically inspired test statistic has a number of desirable properties that are not features of competitor methods. Most notably our approach does not require knowledge of the underlying covariance structure. We utilise results from Random Matrix Theory to derive a limiting distribution for our test statistic. The proposed method outperforms other methods on simulated datasets, in terms of both accuracy in detecting changes and estimation of the underlying covariance model. We then use our method to analyse changes in the amount of surface water on a plot of soil. We find that our approach is able to detect changes in this dataset that are visible to the eye and locates a number of other changes. It is not clear whether these changes correspond to true changes in the surface water and we are investigating this further.

While our method has a number of advantages, it is important to recognise some limitations. Firstly, our method requires calculating the inverse of a matrix at each candidate time point, which is a computationally and memory intensive operation. As a result, our approach is challenging to apply to larger datasets that can be handled by other methods which only require the first principal component. This could be mitigated by faster approaches to matrix inversion alongside GPU computation. However, as we demonstrate through simulations, there are a wide range of settings where our method produces considerably better results for a marginal increase in computational time. Finally, we note that a limitation of our method is that the minimum segment length is bounded below by the dimension of the data. This means that the method cannot be applied to high dimensional datasets or datasets with short segments.


Appendix A Further Details on the Simulation Study

A.1 Random Seed Generation

To ensure reproducibility, throughout our simulation study we make use of seeds for generating random numbers. These seeds are chosen to ensure that, where appropriate, simulations under different settings are directly comparable. For example, the data generating mechanisms in Section 5.1 feature a parameter which measures the size of the change. If the random seed depended on this parameter, it would be harder to compare the results across different parameter values, since by random chance the noise generated for one case may be better behaved than for another. Therefore, for all our experiments the random seeds depend on the dimensions of the data; however, they do not depend on other parameters, such as the change size or the error distribution. Finally, in Section 5.2 the seeds for generating random covariances depend on the dimension but not the data length (since the dimension determines how hard the problem is).
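A minimal sketch of this seeding strategy in R; the exact mapping from the data dimensions to a seed is an assumption made for illustration.

# Hypothetical sketch: the seed depends only on the data dimensions, not on the
# change size or error distribution, so noise is shared across those settings.
seed_for <- function(n, p) 1000 * p + n
set.seed(seed_for(n = 500, p = 10))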

A.2 Data Generation Mechanism

We now provide full details of the data generating mechanisms used to simulate data in Section 5.2. In the study, random covariance matrices are generated such that the distance between consecutive matrices is sufficiently large. We incorporate constraints on the eigenvalues of the difference and ratio matrices, i.e.

The procedure for generating a set of matrices sufficiently far apart with respect to is as follows

  1. where for

  2. For the th segment, generate Uniform

  3. Then where is a random integer uniformly sampled from .

  4. Then where and is a random orthonormal matrix.

The procedure for generating a set of matrices sufficiently far apart with respect to is as follows

  1. where for

  2. For the th segment, generate where is an ordered list of Uniform random variables. Then .

  3. To ensure that the covariances get both bigger and smaller in scale, we set where each Bernoulli(1/2)

  4. Then and

  5. Finally, where is a random orthonormal matrix (a sketch of this step is given below).
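The final rotation step can be sketched in R as follows, generating a random orthonormal matrix and a covariance with prescribed eigenvalues; the eigenvalue construction in the earlier steps is not reproduced here and the example eigenvalues are illustrative.

# Hypothetical sketch: covariance with prescribed eigenvalues d and a random
# orthonormal eigenbasis Q, obtained from the QR decomposition of a Gaussian matrix.
set.seed(1)
p <- 10
d <- sort(runif(p, 0.5, 2), decreasing = TRUE)   # example eigenvalues
Q <- qr.Q(qr(matrix(rnorm(p * p), p, p)))        # random orthonormal matrix
Sigma <- Q %*% diag(d) %*% t(Q)                  # resulting covariance matrix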

A.3 Specifications for Competitor Methods

In Remark 2.1, [aue2009break] state that the asymptotic distribution of their test statistic after standardisation can be approximated by a standard normal distribution. Therefore we set the threshold for detecting a change to be the 97.5% quantile of the standard normal distribution, or 1.96. Note that this could be increased, reducing the probability of overfitting changes but also reducing the power of the method. This approach also requires a plug-in estimator for the long run covariance of the vectorized second moment of the data. For datasets with no temporal structure, this long run covariance is exactly the covariance of the vectorized second moment, and we use the empirical estimate as our plug-in estimator. Note this should improve the performance of the method compared with a generic plug-in estimator for the long run covariance. For examples with temporal dependence, following the recommendation provided in [aue2009break], we use the Bartlett estimator as implemented in [sandwich]. In both cases, the plug-in estimator has dimension  where  is the dimension of the data, and must be invertible, implying that . As a result, we do not include this method in simulations with large datasets.

[wang2017optimal] do not provide a practical default threshold for their method, instead providing an interval of consistent thresholds which is defined by theoretical quantities such as the minimum size of a change, the minimum distance between changes and a bound on the tails of the data, . A lower bound on the minimum threshold is given by . The value bounds the square root of the largest eigenvalue of the covariance of the underlying data, which implies the largest eigenvalue is a lower bound for . Note this value is not available in practice so we approximate this quantity with the largest eigenvalue of the data. Thus a lower bound for the threshold is given by . Again if this value was increased, the method would lose power but be less likely to overfit changes.

[galeano2007covariance] propose a number of methods, recommending a CUSUM-based test statistic in most settings. Therefore we compare our approach with their CUSUM method. The authors demonstrate that their statistic converges weakly to a standard Brownian bridge. We base our threshold on the  quantile of the asymptotic distribution.

Appendix B Further Simulations

B.1 Assessment of minimum segment length

In order to control the false positive rate of the method, we need appropriate choices of the minimum segment length, . We examined the impact of different functions for  on data of different dimensions. In particular, we consider datasets with ,  and . For each combination, we generated 1000 datasets and applied the Ratio method with . The results of this analysis can be seen in Figure B.6. As with the other simulations, the FPR is not well controlled for . For , the FPR is well controlled for . For , we can use much less conservative functions for , with no significant increase in FPR for . This is likely due to the fact that when , we are closer to the limiting regime and the test is better behaved. As a result of these simulations, we recommend setting ; however, note that less conservative functions are reasonable for larger values of .

Figure B.6: False Positive Rates for different minimum segment length functions and different values of . We can see that for larger values of , proportionally smaller minimum segment lengths can be used without an increase in FPR.

B.2 Single Changepoint

We now compare our approach with the methods mentioned in Section 5. For all our simulated examples, we let the minimum segment length, or distance between changes, be , as this is required by the Wang method. We compare the four approaches on a set of 100 datasets with a change at . We consider two settings, with the first case having a moderate value for  () and the second case having a larger value for  (). Importantly, in the second setting we should be closer to the asymptotic regime for our method as  and  are larger. For each dataset we computed the test statistic as well as the difference between the truth and the changepoint estimates for each method. Note that the Aue method is not computable for the  case, and as a result is not included for this case. The results of this simulation can be seen in Figure B.7.

In both the small and large cases, the Galeano method clearly outperforms the other methods, providing the lowest changepoint error. Looking at Figure B.7 (top left), the Wang and Aue methods are poorly peaked, indicating a change has not been detected, while the Ratio method produces a partial peak, indicating it is struggling to detect the change. These performances are reflected in the changepoint errors (Figure B.7, top right). In the large setting (bottom left and right), both the Galeano and Ratio methods are well peaked, with the Galeano method producing the lower error. The Wang method appears to partially locate the change but appears to be biased to the right and thus fails to localize the change. It is unclear what causes this bias. Finally, we note that the results in this setting differ from what we would expect based on Section 5.2. In particular, in that experiment we found that the Ratio method completely outperformed the Galeano approach. Again, the reason for this disparity is unclear; however, given that a greater range of changes is considered in the multiple change setting, this may indicate that the Galeano approach works better on certain types of changes.

Figure B.7: (a) Test statistic at each time point from 100 different data sets under the alternative setting with ,  and a changepoint at . (b) Histogram of the difference between the estimated changepoint location and the true changepoint. (c) Same as (a) for  and . (d) Same as (b) for  and .

B.3 Impact of Mean Centering

The Ratio method assumes that the data has mean zero (or equivalently, that the mean is known). This is unlikely to be true in practice and some mean centering will typically be required. When faced with data with an unknown stationary mean, we recommend centering the data by subtracting the sample mean. To evaluate the impact of this approach, we applied the Ratio method to datasets with known and unknown means, under both the null and alternative hypotheses. In particular, we consider two sets of dimensions, namely  and . For each dimension set and hypothesis, we generated 1000 datasets and applied the Ratio method (note that under the alternative ). Results from this analysis are shown in Table B.4. We can see that the mean centering has only a marginal impact on the performance of the Ratio method.

n     p    hypothesis  mean          TPR   Change Estimate
500   15   Alt         Mean Known    0.92  222.82
500   15   Alt         Mean Unknown  0.94  225.70
500   15   Null        Mean Known    0.04  238.88
500   15   Null        Mean Unknown  0.02  240.38
2500  100  Alt         Mean Known    1.00  1247.10
2500  100  Alt         Mean Unknown  1.00  1247.14
2500  100  Null        Mean Known    0.00  1159.56
2500  100  Null        Mean Unknown  0.00  1187.74
Table B.4: Average TPR and estimated change location depending on whether the data was centered by subtracting the sample mean. We can see that the centering has little impact on the results.

Appendix C Auxiliary Results

The results in this section are required for the proof of Theorem 3.1.

Lemma C.1.

Let and be the real valued function

Then

where

Proof.

Firstly we have that

Then