1 Introduction
There is considerable infrastructure within the oil and gas industry that is on the surface of the ground. These are exposed to changing weather conditions and models are used to estimate the degradation of the assets and direct monitoring. These degradation models rely on estimates of the conditions that the assets are under. As the surrounding soils evolve, so do the absorption properties of the soil. This may result in assets sitting in waterlogged soil when the models are assuming that water drains away. Thus estimating how these soil properties vary over time, specifically regarding water absorption, is an important task. Soil scientists typically measure water absorption through ground sensors at an individual location. This is not desirable as there may be large areas where assets are located. Recent research has explored video capture to estimate changes in surface water as a surrogate for absorption. Ambient lighting due to time of day and cloud cover hamper traditional meanbased methods for identifying surface water. In this paper, we develop a method for identifying changes within an image sequence (viewed as multivariate time series) that are linked to the changing presence of surface water.
One approach to modelling changing behaviour in data is to assume that the changes occur at a small number of discrete time points known as changepoints. Between changes, the data can be modelled as a set of stationary segments that satisfy standard modelling assumptions. Changepoint methods are relevant in a wide range of applications including genetics hocking2013learning, network traffic analysis rubin2016anomaly and oceanography tc1121492017. We consider the specific case where the covariance structure of the data changes at each changepoint. While our focus here is on a specific application, the problem has wide applicability. For example, stoehr2019detecting examine changes in the covariance structure of functional Magnetic Resonance Imaging (fMRI) data, where a failure to satisfy stationarity assumptions can significantly contaminate any analysis, while Wied2013 and BERENS2015135 examine how changes in the covariance of financial data can be used to improve stock portfolio optimisation.
The changepoint problem has a long history in the statistical literature, and contains two distinct but related problems, depending on whether the data is observed sequentially (online setting) or as a single batch (offline setting). We focus on the latter and direct readers interested in the former to tartakovsky2014sequential for a thorough review. In the univariate setting there is a vast literature on different methods for estimating changepoints, and there are a number of state of the art methods killick2012optimal,frick2014multiscale,fryzlewicz2014wild,maidstone2017optimal.
The literature on detecting changes in multivariate time series has grown substantially in the last few years. In particular, many authors consider changes in the moderate dimensional setting, that is, where the number of the parameters of the model, is of the order of the number of data points. Much of this work considers changes in expectation where the series are uncorrelated grundy2020high,horvath2012change. Furthermore a number of authors have examined detecting changes in expectation where only a subset of variables under observation change enikeeva2019,jirak2015uniform,wang2016high. Separately a number of authors have considered changes in second order structure of moderate dimensional time series models including autocovariance and crosscovariance cho2015multiple, changes in graphical models doi:10.1080/10618600.2017.1302340,6854087 and changes in network structure wang2018optimal.
The problem of detecting changes in the covariance structure has been examined in both the low dimensional and high dimensional setting. In the low dimensional setting () [chen2004statistical, lavielle2006detection] utilise a likelihood based test statistic and the Schwarz Information Criterion (SIC) to detect changes in covariance of normally distributed data. [aue2009break] consider a nonparameteric test statistic for changes in the covariance of linear and nonlinear multivariate time series. [matteson2014nonparametric] study changes in the distribution of (possibly) multivariate time series using a clustering inspired nonparametric test statistic that claims to handle covariances. In the high dimensional setting, [avanesov2018, wang2017optimal] study test statistics based on the distance between sample covariances, utilising the operator norm and norm respectively. Crucially all of these approaches are focused on exploring the theoretical aspects of the proposed test statistics rather than the practical implications.
In this work we propose a novel method for detecting changes in the covariance structure of moderate dimensional time series motivated by the practical challenges of implementing the approach for estimating changes in soil. In Section 2, we introduce a test statistic inspired by a distance metric intuitively defined on the space of positive definite matrices. The primary advantage of this metric is that under the null hypothesis of no change, it is independent of the underlying covariance structure. This is not the case for other methods in the literature which require users to estimate this. In Section 3, we study the asymptotic properties of this test statistic when, the dimension of the data is of comparable size to (but still smaller than) the sample size. In Section 4, we use these results to propose a new method for detecting multiple changes in the covariance structure of multivariate time series. In Section 5, we study the finite sample performance of the proposed approach on simulated datasets. Finally in Section 6, we use our method to examine how changes in the covariance structure of pixel intensities can be used to detect changes in surface water.
2 Two Sample Tests for the Covariance
Let be independent
dimensional vectors with
(2.1) 
where each is full rank. Furthermore, let denote an matrix defined by . Our primary interest in this paper is to develop a testing procedure that can identify a change in the covariance structure of the data over time. For now, let us consider the case of a single changepoint. We compare a null hypothesis of the data sharing the same covariance versus an alternative setting that allows a single change at time . Formally we have
(2.2)  
(2.3) 
where is unknown. We are interested in distinguishing between the null and alternative hypothesis, and under the alternative locating the changepoint , when the dimension of the data , is of comparable size to the length of the data, . In particular we require that for all pairs , the set
(2.4) 
is non empty, where is a problem dependent positive constant. Note defines the set of possible candidate changepoints, while is the minimum distance between changepoints or minimum segment length. Then for each candidate changepoint , a two sample test statistic can be used to determine if the data to the left and right of the changepoint have different distributions. If the two sample test statistic for a candidate exceeds some threshold, then we say a change has occured and an estimator for is given by the value that maximises .
Let (or ) be a plug in estimator for the covariance (or precision) of data . Then to test for a changepoint at time , we can measure the magnitude of the matrix where and are sequences of normalizing constants. If this matrix is large, then there is evidence for a change and vice versa. This approach is well represented in the literature, for instance [wang2017optimal, aue2009break, galeano2007covariance] measure the difference between sample covariance estimates, while in the high dimensional setting, [avanesov2018] measures the difference between debiased graphical LASSO estimates. Although this approach is intuitive, it can be challenging to use in practice. A good estimator will depend on the true covariance of which implies that the difference matrix above is dependent on the true covariance of . As a result, any test statistic based on a difference matrix must be a function of the underlying covariance, , and should be corrected to account for this. For example, [aue2009break, galeano2007covariance] normalize their test statistic using the sample covariance for the whole data, [avanesov2018] use a bootstrap procedure which assumes knowledge of the measure of and [wang2017optimal] use a threshold which is a function of . All these approaches require estimating in practice. This is impractical under the alternative setting, since estimating the segment covariances requires knowledge of the changepoint.
Therefore, it is natural to ask whether there are alternative ways of measuring the distance between covariance metrics. In the univariate setting, a common approach is to evaluate the logarithm of the ratio of the segment variances (
[chen1997testing, inclan1994use, killick2010detection]). This is in contrast with the change in expectation problem where it is more common to measure the difference between sample means. In the variance setting a ratio is more appropriate for two reasons. Firstly, since variances are strictly positive, if the underlying variance is quite small then the absolute difference between the values will also be small whereas the ratio is not affected by the scale. Secondly, under the null hypothesis of no change, the variances will cancel and the test statistic will be independent of the variance. Thus, there is no need to estimate the variance when calculating the threshold.We propose to extend this ratio idea from the univariate setting to the multivariate setting by studying the multivariate ratio matrix, , where
. Ratio matrices are widely used in multivariate analysis to compare covariance matrices (
[finn1974general]). In particular, we are often interested in functions of the eigenvalues of these matrices ([10.2307/2331979, 10.2307/2334137, 10.2307/2332232]). Here we are interested in the following test statistic,(2.5) 
where is the th largest eigenvalue of the matrix . The function has valuable properties that may not be immediately obvious.
Proposition 2.1.
Let , and , and define as in equation (2.5), then we have that:

is symmetric i.e. ;

is symmetric with respect to inversion of matrices i.e. ;

If and , then .
The symmetry property is important for a changepoint analysis as the segmentation should be the same regardless of whether the data is read forwards or backwards. The second property states that is the same whether we examine the covariance matrix or the precision matrix. This ensures that differences between both small and large eigenvalues can be detected. The third property is particularly important as it implies that provides a test statistic which is independent of the underlying covariance of the data. In particular, let be the sample covariance estimate for data i.e.
Under the null hypothesis for any , where the covariance of
is the identity matrix. Then property 3 implies that for any
, which is independent of . In other words, under the null hypothesis the underlying covariance cancels out (as occurs with the ratio approach in the univariate variance setting). Furthermore, due to the square term, is a positive definite function. This is necessary to prevent changes cancelling out. These properties are clearly not unique to our chosen test statistic and in fact, there are other possible choices (such as ). However we argue that for an alternative function to be appropriate in the changepoint setting, it would also require these properties. Furthermore, this choice ofallows us to analytically derive relevant quantities such as the limiting moments.
It is both possible and interesting to study the properties of this test statistic in the finite dimensional setting (i.e., where is fixed). However in this work, our focus is on problems where the dimension of the data is of comparable size to the length of the data. Under this asymptotic setting, the eigenvalues of random matrices (and by extension the properties of ) have different limiting behaviour and a proper test should take this into account. For example, the two sample Likelihood Ratio Test (as used in [galeano2007covariance]) is a function of the log of the determinant of the covariance, or equivalently the sum of the log eigenvalues. In the moderate dimensional setting, this test has been shown to breakdown due to the differing limiting behaviour bai2009corrections. Therefore in the next section, we consider the properties of as a two sample test, and derive the asymptotic distribution under moderate dimensional asymptotics.
3 Random Matrix Theory
We now describe some foundational concepts in Random Matrix Theory (RMT), before discussing how these ideas are utilised to idenfity the asymptotic distribution of our test statistic under the null hypothesis. RMT concerns the study of matrices where each entry is a random variable. In particular, RMT is often concerned with the behaviour of the eigenvalues and eigenvectors of such matrices. Interested readers should see
[tao2012topics] for an introduction and [anderson2010introduction] for a more thorough review.A key object of study in the field is the Empirical Spectral Distribution (ESD), defined for a matrix as
(3.6) 
where is an indicator function. In other words the ESD of
is a discrete uniform distribution placed on the eigenvalues of
. Several authors have established results on the limiting behaviour of the ESD as the dimension tends to infinity, the socalled Limiting Spectral Distribution (LSD). For example, [doi:10.1137/1009001] demonstrate that if the upper triangular entries of a Hermitian matrix have mean zero and unit variance, then converges to the Wigner semicircular distribution.The LSD of the ratio matrix , was shown to exist in [yin1983limiting] and computed analytically in [silverstein1985limiting]. The following two assumptions are sufficient for the LSD of an F matrix to exist.
Assumption 3.1.
Let and be random matrices with independent not necessarily identically distributed entries and with mean 0, variance 1 and fourth moment . Furthermore, for any fixed ,
(3.7)  
(3.8) 
as tend to infinity subject to Assumption 3.2.
Assumption 3.2.
The sample sizes , and the dimension grow to infinity such that
We refer to the limiting scheme described in Assumption 3.2 as .
Let be matrices satisfying Assumptions 3.1 and 3.2. These assumptions place restrictions on the mean of the data, and the tails of the data. The mean assumption is standard in the literature. If the data has non zero mean, the data should be standardized, typically by removing the sample mean (assuming the mean is constant). The impact of this on our proposed method is examined in Appendix B.3. Note that although the matrices have identity covariance, these results also hold for data with general covariance , since by property 3 of Proposition 2.1 the covariance term cancels out under the null hypothesis and we do not require knowledge of . Furthermore, let denote the ESD of . Then [silverstein1985limiting] demonstrate that converges almost surely to the non random distribution function
(3.9)  
(3.10) 
The LSD, provides an asymptotic centering term for functions of the eigenvalues of random ratio matrices. In particular, for any function , we have that,
as , by the definition of weak convergence. This allows us to account for bias in the statistic as seen in Figure 3.1.
The rate of convergence of to zero was studied in [zheng2012] and found to be
. In particular, the authors establish a central limit theorem for the quantity,
We can apply this result to our problem in order to demonstrate that our two sample test statistic converges to a normal distribution with known mean and variance terms.
Theorem 3.1.
Using Theorem 3.1, we can properly normalise such that it can be utilized within a changepoint analysis. In particular, we have that under the null hypothesis
weakly as tend to infinity, where and is as defined in Theorem 3.1. Thus we utilise the normalised test statistic, ,
(3.11) 
which under the null hypothesis converges pointwise to a standard normal random variable.
The asymptotic moments of the test statistic, , depend on the parameter and, as approaches (or equivalently ) the mean and variance of the test statistic dramatically increase. In the context of changepoint analysis, this implies that the mean and variance increase at the edges of the data. We note that this is a common feature of changepoint test statistics. We can significantly reduce the impact of this by the above standardisation. This can be seen empirically in Figure 3.1. After standardisation, the test statistics for the series with no change, do not appear to have any structure. Similarly, the test statistics for the series with a change show a clear peak at the changepoint. Importantly we can now easily distinguish the test statistic under the null and alternative hypotheses, and this normalization does not require knowledge of the underlying covariance structure.
4 Practical Considerations
Before we can apply our method to real and simulated data, we need to address three practical concerns, namely we must select a threshold for rejecting the null hypothesis, determine an appropriate minimum segment length and address the issue of multiple changepoints.
4.1 Threshold for Detecting a Change
Firstly, we need to select an appropriate threshold for rejecting the null hypothesis. We choose to utilise the asymptotic distribution of the test statistic on a pointwise basis, that is for each we say that , where is a standard normal variable. This do not take into account whether or not we are in the limiting regime and as a result, the method may be unreliable if is small (indeed we observe this pattern in Section 5
). We then use a Bonferroni correction Haynes2013 to control the probability that any
exceeds a threshold . In particular, for a given significance level , we reject the null hypothesis for a single change in data of length if for some , where is theth quantile of the standard normal distribution In the case of multiple changepoints, we use
to account for the extra hypothesis tests. We note that a Bonferroni correction is known to be conservative and as a result, using this approach may have poor size (again results from the simulation study validate this concern). Ideally, one would take account of the strong dependence between consecutive test statistics to get a better threshold, but this is challenging given the nonlinear nature of the test statistic. Further work may wish to investigate whether finite sample results which exploit this dependence can be derived. Alternatively practitioners could use several different thresholds and ascertain the appropriate threshold for a particular application at hand as demonstrated in [lavielle2005].4.2 Minimum Segment Length
The test statistic proposed relies on an appropriate choice for the minimum segment length parameter, . Too small and the covariance estimates in the small segments will elicit false detection, too large and the changepoints will not be localized enough to be useful.
In many applications, domain specific knowledge may be used to increase this parameter. However, it is also important to consider smallest value that will give reliable results in the general case. The minimum segment length must grow sufficiently fast to ensure that converges to a normal distribution. Outside the asymptotic regime, it is possible for the ratio matrix to have very large eigenvalues. Thus for candidate changepoints close to (or by symmetry ), the probability of observing spuriously large values of becomes much larger. This can be seen in Figure 4.2. When (the smallest possible value), we observe extremely large values of the test statistic, that would make identifying a true change almost impossible. However when , the test statistic behaves reliably. Thus we need to converge to or equivalently for the asymptotic results to hold. In Appendix B.1, we analyse the effect of different sequences in the finite sample setting via a simulation study. Based on these results, we recommend using a default value of , however note that for moderate values of , smaller values can be taken without any corresponding decrease in performance.
4.3 Multiple Changepoints
Finally, we also consider the extension to multiple changes. In this setting, we have a set of unknown ordered changepoints, such that, and is the covariance matrix of the th timevector. We are interested in estimating the number of changes , and the set of changepoints . The classic approach to this problem, is to extend a method defined for the single changepoint setting to the multiple changepoint setting, via an appropriate search method such as dynamic programming or binary segmentation. For this work, we cannot apply the dynamic programming approach killick2012optimal, which minimises the within segment variability through a cost function for each segment. This is because our distance metric is formulated as a twosample test and cannot be readily expressed as cost function for a single segment. Therefore we use the classic binary segmentation procedure scott1974cluster. The binary segmentation method extends a single changepoint test as follows. Firstly, the test is run on the whole data. If no change is found then the algorithm terminates. If a changepoint is found, it is added to the list of estimated changepoints and, the binary segmentation procedure is then run on the data to the left and right of the candidate change. This process continues until no more changes are found. Note the threshold, , and the minimum segment length, , remain the same. We note that a number of extensions of the traditional binary segmentation procedure have been proposed in recent years fryzlewicz2014wild, olshen2004circular. Although we do not use these search methods in our simulations, due to additional optional parameters that affect performance, it is not difficult to incorporate our proposed test statistic into these adaptations of the original binary segmentation approach. The full proposed procedure is described in Algorithm 1.
5 Simulations
In this section, we compare our method with existing methods in the literature, namely the methods of [wang2017optimal, aue2009break, galeano2007covariance], which we refer to as the Aue, Galeano and Wang methods respectively. We do not consider [avanesov2018] as this method is intended for the high dimensional setting and so would be an unfair comparison. Software implementing these methods is not currently available and as a result, we have implemented each of these methods according to the descriptions in their respective papers. All methods, simulations, visuals and analysis have been implemented in the R programming language Rref. The code to repeat our experiments is available at https://github.com/sryan1/Covariance_RMT_simulations.
Simulation studies in the current literature for changes in covariance structure are very limited. [wang2017optimal] do not include any simulations. [aue2009break, avanesov2018] only consider the single changepoint setting, and do not consider random parameters for the changes. Furthermore to our knowledge, no papers compare the performance of different methods. While theoretical results are clearly important, it is also necessary to consider the finite sample performance of any estimator, and we now study the finite sample properties of our approach on simulated datasets. Further details on the general setup of our simulations are given in Appendix A. Note that the significance thresholds for each method are set to be favourable to competing methods and we anticipate that performance would decrease in real data.
We begin by analysing the performance of our approach (which we refer to as the Ratio method) in the single changepoint setting. This allows us to directly examine the finite sample properties of the method, such as the power and size, as well as investigate how violations to our assumptions, such as autocorrelation and heavy tailed errors impact the method. We then compare our approach with current state of the art methods for detecting multiple changepoints. Results for assessing the chosen default values for the minimum segment length parameter in Section 4, as well as a comparison of different methods in the single changepoint setting, are given in Appendix B. These demonstrate that the Ratio and Aue methods are well peaked whereas the Wang method is not leading to accurate changepoint localization. In both settings, the localization of the changepoints is more accurate for our approach than the [aue2009break] method whilst also being applicable to larger values of . However, we note that for smaller dimensions, , the Aue method would likely be more accurate due to the Bonferroni correction over correcting. Finally, comparisons for the Ratio method based on whether or not the mean is known (under the null and alternate hypotheses) are provided in Appendix B. These results show that centering the data by subtracting the sample mean has a small impact on the performance of the method.
Performance Metrics
In the single changepoint setting, we are interested in whether the Ratio approach provides a valid hypothesis test. Therefore, for a given set of simulations, we measure how often the method incorrectly rejects the null (Type 1 error) and how often the method fails to correctly reject the null (Type 2 error). Furthermore, under the alternative hypothesis, we measure the absolute difference between the estimated changepoint and the true change, and refer to this throughout as the Changepoint Error. For the multiple changepoint setting, we use and to denote the set of true changepoints and the set of estimated changepoints respectively. We say that the changepoint has been detected correctly if for some and denote the set of correctly estimated changes by . Then we define the true discovery rate (TDR) and false discovery rate (FDR) as follows,
A perfect method will have a TDR of 1 and FDR of 0. We set although it should be noted that in reality the desired accuracy would be application specific and dependent on the minimum segment length . Although, whilst the specific values vary with the conclusions of the study do not. We also consider whether or not the resulting segmentation allows us to estimate the true underlying covariance matrices, and define the mean absolute error (MAE) as
5.1 Finite sample properties
We begin by examining the performance of our proposed approach on normally distributed datasets with length and dimension . Based on results from Section 4.2, the minimum segment length was set to . For each pair, we generated 1000 datasets with a single change at time as follows
(5.12) 
where is the identity matrix of dimension and . Note since our approach is invariant to the covariance of the data under the null hypothesis, this is equivalent to generating the data with some unknown covariance. We use the approach described in Section 4.1 to select the threshold with . A histogram of the test statistic values under the null hypothesis for and is shown in Figure 5.3. For , we observe large test values, however this effect is not present for , indicating that we have not entered the limiting regime when . We computed the FPR for each pair and the results are shown in Table E.7. Across all dimensions, we observe low numbers of false positives. In particular, for and the method is conservative. For , we find that the test appears to have good size, however this difference may be explained by the fact that we have not entered the limiting regime. We measure the power and accuracy of our method via the True Positive Rate (TPR) and the absolute difference between the estimated change and the true change (Changepoint Error). The TPR is given in Table E.7 for . As increases the probability of detection increases. However for smaller values of , the method can have less power, such as and . In these cases, using less data gives a better detection rate, implying that the method inefficiently utilises the data and a high dimensional approach may be preferable. Changepoint errors are given in Figure 5.3. As increase, the method more accurately locates the changepoint, as we would expect.
Serial Dependence
Our method does not allow for dependence between succesive data points. To measure the impact of serial dependence, we generated data,
(5.13) 
where , , and and . Results from this analysis are shown in Figure 5.4. Focusing on the top left plot, we can see that even under the null the test values increase as the autocorrelation increases, and we find that for the proposed threshold is invalid, with FPRs of approximately 1. Thus the test is invalid and will produce spurious false positives. This is well known in the univariate changepoint literature Shi2021 and can be mitigated by scaling the threshold by the autocorrelation observed.
Similarly the power and accuracy of our method decreases when changepoints are present. Figure 5.4 shows the separation between test statistic value under the null and alternative hypothesis. If these values are well separated then the method will have good power given a valid threshold. We find that the separation between the null and alternative distributions decreases and thus changepoint error increases as increases. These results show that as autocorrelation increases, our method becomes less accurate as expected.
, dashed line indicates threshold. For higher dimensions we observe less outliers. Changepoint error under increasing data length (
), dimension () and size of change (). The method becomes more accurate as these increase.Model Misspecification
Our method places assumptions on the data which may not hold in practice. To measure the impact of this we generated data
(5.14) 
Uniform, Exponential(1), Student t, , and . Note the the exponential and tdistributions do not satisfy assumption 3.1
and as a result, the threshold is invalid producing FPRs of .512 and .248 respectively. Interestingly, although the tdistribution has heavier tailed errors than the exponential distribution, it gives a lower FPR. This is likely due to the skewness of the exponential distribution. We find that the method has less power for the heavier tailed distributions (Figure
5.4), as again, the distributions under the null and alternative overlap more. This pattern is repeated for the changepoint error. Thus, the method will be less accurate in the presence of heavy tailed errors.FDR  MAE  TDR  

p  n  Aue  Ratio  Wang  Galeano  Aue  Ratio  Wang  Galeano  Aue  Ratio  Wang  Galeano 
3  500  0.38  0.27  0.63  0.46  26.05  26.53  42.82  37.99  0.55  0.38  0.51  0.20 
1000  0.49  0.29  0.77  0.51  18.44  15.36  36.81  27.94  0.58  0.48  0.48  0.23  
2000  0.54  0.31  0.86  0.54  12.78  7.65  32.14  21.46  0.63  0.60  0.44  0.27  
5000  0.59  0.31  0.90  0.57  7.95  2.93  23.85  13.97  0.65  0.68  0.45  0.31  
10  500  0.25  0.16  0.28  0.46  304  303.68  316.80  541.90  0.56  0.55  0.51  0.22 
1000  0.32  0.13  0.43  0.48  167.59  127.72  243.70  413.23  0.69  0.75  0.54  0.26  
2000  0.34  0.10  0.53  0.51  99.52  46.44  186.25  292.50  0.76  0.87  0.58  0.29  
5000  0.37  0.09  0.61  0.57  58.09  16.62  123.34  189.12  0.80  0.91  0.61  0.32  
30  2000  0.02  0.31  0.47  143.71  1096.3  1550.72  0.98  0.45  0.36  
5000  0.02  0.32  0.52  51.15  523.25  913.45  0.98  0.55  0.39  
100  5000  0.00  0.44  0.51  209.36  7895.1  5969.70  1.00  0.32  0.48 
the Ratio method is the top performer. Confidence intervals for mean values are provided in Table
E.5.5.2 Multiple Change Points
We now compare the Ratio approach with other methods on simulated data sets with multiple changepoints. We consider datasets with 4 changepoints, uniformly sampled with minimum segment length , where and . Covariance matrices are generated so that the distance between consecutive covariance matrices is sufficiently large to detect a change. We consider two separate metrics,
where matches the assumptions from [wang2017optimal], while matches the distance metric used by . As such, the first set of simulations should favour the Wang method. Full details for how the covariances were generated are provided in Appendix A.2. For each pair and distance metric, we generated 1000 datasets and applied our method, the Aue ([aue2009break]) method and the Wang ([wang2017optimal]) method to each dataset. Due to its computational complexity, we do not run the Aue method for . Using the resulting segmentations, we then calculated the error metrics for each method. The worst performers across all metrics are the Galeano and Wang methods. Notably the true positive rate for the Wang method decreases as grows. This is in striking contrast with the other methods which become more accurate for larger values of as one may expect. This may be due to the fact that the Wang method only considers the first principal component of the difference matrix, ignoring the remainder of the spectrum or the bias issue identified in Appendix B.2. The Galeano performance is equally surprising as given results in the single changepoint case (Appendix B.2) we would anticipate it to be the best performer. Furthermore, these methods also have the highest false positive rate, indicating that adapting the threshold would not lead to more accurate changepoint estimates. The Ratio and Aue methods are more closely matched. We can see that the Ratio method is the more conservative of the two, producing a lower FDR and corresponding lower TDR when and are smaller. This is unsurprising since our results in the single changepoint case found that the Ratio method can be less reliable when . For scenarios with , the Ratio approach is extremely accurate. This indicates that for problems with smaller datasets, the Aue method may be preferable, while our approach is suitable for larger datasets.
FDR  MAE  TDR  

p  n  Aue  Ratio  Wang  Galeano  Aue  Ratio  Wang  Galeano  Aue  Ratio  Wang  Galeano 
3  500  0.30  0.66  0.94  21.21  41.86  40.47  0.64  0.41  0.02  
3  1000  0.40  0.81  0.81  16.40  37.33  32.33  0.72  0.35  0.05  
3  2000  0.47  0.88  0.68  12.14  30.59  25.83  0.77  0.3  0.09  
3  5000  0.52  0.92  0.51  7.54  22.21  19.51  0.77  0.28  0.16  
10  500  0.36  0.57  1.00  265.47  329.02  0.28  0.19  0.00  
10  1000  0.45  0.73  1.00  141.09  201.80  253.15  0.41  0.14  0.00  
10  2000  0.48  0.80  0.99  91.04  147.13  206.24  0.13  0.00  
10  5000  0.48  0.81  0.95  50.98  90.40  165.90  0.63  0.14  0.01  
30  2000  0.84  0.96  1127.84  1330.00  0.05  0.01  
30  5000  0.84  0.87  671.94  925.27  0.06  0.04  
100  5000  0.97  0.64  7796.28  7036.56  0.01  0.11 
6 Detecting changes in moisture levels in soil
In this section, we investigate whether changes in the covariance structure of soil data correspond with shifts in the amount of moisture on the soil. There is significant interest in developing new techniques to better understand how water is absorbed and modelling surface water. This is an important question and is relevant to a variety of industrial applications such as farming, construction and the oil and gas industry hillel2003introduction. A widely used approach is to place probes at different depths and locations in the soil which measure the level of moisture. To measure across a site more easily, scientists are investigating the use of cameras to capture the soil surface as a surrogate for moisture.
We analyse images from an experiment studying moisture on the surface of the soil. A camera was placed over a large plot of soil and took a set of 589 pictures over the experiment. At different times, different amounts of rainfall are simulated and the amount of water on the soil surface changes. We wish to segment the data into periods of dry, wet and surface water. The intensity of a set of pixels over time is shown in Figure 6.5 . We can see that the mean level is clearly nonstationary. This nonstationary behaviour may be attributed to two causes, changes in the background light intensity (due to a cloud passing by) and changes in the wetness of the soil which changes how much light is reflected. Since changes in the mean intensity are not necessarily associated with changes in the wetness, we instead focus on changes in the covariance structure. If pixels become wet we would expect that the correlation between the pixels should increase as they become more alike as the surface becomes uniformly water instead of the variable soil surface. Thus changes in the covariance structure of the pixels may correspond with changes in the wetness.
The data consists of 589 images and we analyze two subsets with The original images are in colour but were transferred to grayscale for computational purposes. We run a multiple changepoint analysis on the smaller subset using our approach as well as the Aue and Wang methods. We also ran a multiple changepoint analysis on the larger subset.
In order to analyse the covariance structure of the data, we first need to transform the data to have stationary mean. Estimating the mean of this series is challenging as there is stochastic volatility and the smoothness of the function appears to change over time. As a result, standard smoothing methods such as LOESS and windowed mean estimators may be inappropriate. We use a Bayesian Trend Filter with Dynamic Shrinkage kowal2019dynamic, implemented in the DSP R package dsp, which is robust to these issues. We then take the residuals. The transformed data for a subset of the pixels can be seen in Figure 6.5. The minimum segment length is set to 25. The thresholds for significance for each method were again set to the defaults as discussed in the previous section. The results of this analysis are shown in Table 6.3.
To validate our results we worked with scientists studying this data and identified three clear time points where there is a substantial change in the amount of water on the surface at the relevant pixels. The first change is somewhat gradual going from very dry at time to very wet from time . The second and third changes are more abrupt, with a substantial increase in the amount of water at time and a corresponding sharp decrease at time . The Aue method reports 7 changepoints, the Wang method reports 5 changepoints and our method locates 8 changepoints. All methods detect the first and last changes. However the Wang method does not detect any change near the second anticipated changepoint. All of the methods appear to overfit changepoints, in the sense that they report changes that do not correspond with clear changes in the amount of water on the surface. For our method and the Aue method, the majority of these overfitted changes occur when the soil is dry (before t=64 and after t=450). During these periods the scientists were moving around the site increasing the variability in the amount of light exposure from image to image which may explain these nuisance changes.
For the larger dataset, the minimum segment length was set to 60 (twice the number of variables) and the thresholds were set to their defaults. The results were broadly similar for our method and quite different for the Wang method. Our approach reports 6 changes again detecting the three obvious changes in the video. We note that the reduced number of changepoints is primarily due to the increased minimum segment length. The Wang method only reports a single changepoint. This drop in reported changes is caused by the largest eigenvalue of the sample covariance being much larger. As a result, the threshold for detecting a change is times larger to account for this and consequently, it appears that the method loses power.
Method  Small subset  Larger subset 

Aue  66, 101, 243, 354, 451, 514, 589  NA 
Wang  52,79, 184, 237, 445  445 
Ratio  49, 77, 244, 347, 452, 493, 532,562  64, 125, 184, 255, 340, 450, 527 
7 Conclusion
In this work, we have presented a novel test statistic for detecting changes in the covariance structure of moderate dimensional data. This geometrically inspired test statistic has a number of desirable properties that are not features of competitor methods. Most notably our approach does not require knowledge of the underlying covariance structure. We utilise results from Random Matrix Theory to derive a limiting distribution for our test statistic. The proposed method outperforms other methods on simulated datasets, in terms of both accuracy in detecting changes and estimation of the underlying covariance model. We then use our method to analyse changes in the amount of surface water on a plot of soil. We find that our approach is able to detect changes in this dataset that are visible to the eye and locates a number of other changes. It is not clear whether these changes correspond to true changes in the surface water and we are investigating this further.
While our method has a number of advantages, it is important to recognise some limitations. Firstly, our method requires calculating the inverse of a matrix at each time point, which is a computationally and memory intensive operation. As a result, our approach is challenging for larger datasets that can be considered by other methods, which only require the first principle component. This could be mitigated by novel solutions to solving inverse matrices alongside GPU computation. However, as we demonstrate through simulations, there are a wide range of settings where our method produces considerable better results for a marginal increase in computational time. Finally we note that a limitation of our method is that the minimum segment length is bounded below by the dimension of the data. This means that the method cannot be applied to tall datasets or datasets with short segments.
[]
Appendix A Further Details on the Simulation Study
a.1 Random Seed Generation
To ensure reproducibility, throughout our simulation study we make use of seeds for generating random numbers. These seeds are chosen to ensure that, where appropriate, simulations under different settings are directly comparable. For example, the data generating mechanisms in Section 5.1 feature a parameter which measures the size of the change. If the random seed depended on the parameter, it would be harder to compare the results across different parameters, such as and , since by random chance the noise generated for the former case may be better behaved than for the latter. Therefore for all our experiments, the random seeds depend on the dimensions of the data, and , however they do not depend on other parameters, such as change size or error distribution. Finally in Section 5.2, the seeds for generating random covariances depends on but not (since determines how hard the problem is).
a.2 Data Generation Mechanism
We now provide full details for the data generating mechanisms used to simulate data in Section 5.2. In the study, random covariance matrices are generated such that distance between consecutive matrices is sufficiently large. We incorporate constraints on the eigenvalues of the difference and ratio matrix, i.e.
The procedure for generating a set of matrices sufficiently far apart with respect to is as follows

where for

For the th segment, generate Uniform

Then where is a random integer uniformly sampled from .

Then where and is a random orthonormal matrix.
The procedure for generating a set of matrices sufficiently far apart with respect to is as follows

where for

For the th segment, generate where is an ordered list of Uniform random variables. Then .

To ensure that the covariances get both bigger and smaller in scale, we set where each Bernoulli(1/2)

Then and

Finally, where is a random orthonormal matrix.
a.3 Specifications for Competitor Methods
In Remark 2.1, [aue2009break] state that the asymptotic distribution of their test statistic after standardisation can be approximated by a standard normal distribution. Therefore we set the threshold for detecting a change to be the quantile or 1.96. Note that this could be increased, reducing the probability of overfitting changes but also reducing the power of the method. This approach also requires a plug in estimator for the long run covariance of the vectorized second moment of the data. For datasets with no temporal structure, this long run covariance is exactly the covariance of the vectorized second moment and we use the empirical estimate as our plug in estimator. Note this should improve the performance of the method compared with a generic plug in estimator for the long run covariance. For examples with temporal dependence, following the recommendation provided in [aue2009break], we use the Bartlett estimator as implemented in [sandwich]. In both cases, the plug in estimator has dimension where is the dimension of the data, and must be invertable implying that . As a result, we do not include this method in simulations with large datasets.
[wang2017optimal] do not provide a practical default threshold for their method, instead providing an interval of consistent thresholds which is defined by theoretical quantities such as the minimum size of a change, the minimum distance between changes and a bound on the tails of the data, . A lower bound on the minimum threshold is given by . The value bounds the square root of the largest eigenvalue of the covariance of the underlying data, which implies the largest eigenvalue is a lower bound for . Note this value is not available in practice so we approximate this quantity with the largest eigenvalue of the data. Thus a lower bound for the threshold is given by . Again if this value was increased, the method would lose power but be less likely to overfit changes.
[galeano2007covariance] propose a number of methodsm, recommending a Cusum based test statistic in most settings. Therefore we compare our approach with the CUSUM method. The authors demonstrate that their method converges weakly to a standard Brownian Bridge. We base our threshold on the quantile of the asymptotic distribution.
Appendix B Further Simulations
b.1 Assessment of minimum segment length
In order to control the false positive rate of the method, we need appropriate choices of the minimum segment length, . We examined the impact of different functions for on data of different dimensions. In particular, we consider datasets with , and . For each combination, we generated 1000 datasets and applied the Ratio method with . The results of this analysis can be seen in Figure B.6. As with other simulations the FPR is not well controlled for . For , the FPR is well controlled for . For , we can use much less conservative functions for , with no significant increase in FPR for . This is likely due to the fact that when , we are closer to the limiting regime and the test is better behaved. As a result of these simulations, we recommend setting , however note that less conservative functions are reasonable for larger values of .
b.2 Single Changepoint
We now compare our approach with the methods mentioned in Section 5. For all our simulated examples, we let the minimum segment length or distance between changes be as this is required by the wang method. We compare the four approaches on a set of 100 datasets with a change at . We consider two settings with the first case having a moderate value for (), and the second case having a larger value for (). Importantly in the second setting we should be closer to the asymptotic regime for our method as and are larger. For each dataset we computed the test statistic as well as the difference between the truth and the changepoint estimates for each method. Note that the Aue method is not computable for the case, and as a result is not included for this case. The results of this simulation can be seen in Figure B.7.
In both the small and large cases, the Galeano method clearly outperforms the other methods, providing the lowest changepoint error. Looking at Figure B.7 (top left), the Wang and Aue methods are poorly peaked indicating a change has not beed detected, while the Ratio method produces a partial peak indicating it is struggling to detect the change. These performances are reflected in the changepoint errors (B.7 top right) In the large setting (bottom left and right), both the Galeano and Ratio methods are well peaked, with the Galeano method producing the lower error. The wang method appears to partially locate the change but appears to be biased to the right and thus fails to localize the change. It is unclear what causes this bias. Finally, we note that the results in this setting differ from what we would expect based on Section 5.2. In particular, in that experiment we found that the Ratio method completely outperformed the Galeano approach. Again the reason for this disparity is unclear, however given that a greater range of changes are considered in the multiple change setting they may indicate that the Galeano approach works better on certain types of changes.
b.3 Impact of Mean Centering
The Ratio method assumes that data has mean zero ( or equivalently that the mean is known). This is unlikely to be true in practice and some mean centering will typically be required. When faced with data with an unknown stationary mean, we recommend centering the data by subtracting the sample mean. To evaluate the impact of this approach, we applied the Ratio method to datasets with known and unknown means, under both the null and alternate hypothesis. In particular, we consider two sets of dimensions, namely and . For each dimension set and hypothesis, we generated 1000 datasets and applied the Ratio method (note under then alternative ). Results from this analysis are shown in Table B.4. We can see that the mean centering, has a very marginal impact on the performance of the Ratio method.
n  p  hypothesis  mean  TPR  Change Estimate  

1  500.00  15.00  Alt  Mean Known  0.92  222.82 
2  500.00  15.00  Alt  Mean Unknown  0.94  225.70 
3  500.00  15.00  Null  Mean Known  0.04  238.88 
4  500.00  15.00  Null  Mean Unknown  0.02  240.38 
5  2500.00  100.00  Alt  Mean Known  1.00  1247.10 
6  2500.00  100.00  Alt  Mean Unknown  1.00  1247.14 
7  2500.00  100.00  Null  Mean Known  0.00  1159.56 
8  2500.00  100.00  Null  Mean Unknown  0.00  1187.74 
Appendix C Auxillary Results
The results in this section are required for the proof of Theorem 3.1.
Lemma C.1.
Let and be the real valued function
Then
where
Proof.
Firstly we have that
Then
Comments
There are no comments yet.