I. Introduction
Detecting a changepoint from high-dimensional streaming data is a fundamental problem in applications such as video surveillance, sensor networks, and seismic event detection. In many scenarios, the change happens to the covariance structure and can be represented through a linear subspace. For example, the covariance matrix may shift from an identity matrix to a spiked covariance model [1, 2].
Given a sequence of observed vectors x_1, x_2, …,
where x_t ∈ ℝ^k and k is the signal dimension, there may be a changepoint time τ when the distribution of the data stream changes. Our goal is to detect this change as quickly as possible from streaming (sequentially obtained) data using online techniques. We are particularly interested in structured changes occurring in the signal covariance. We study two related settings: the emerging subspace, meaning that the change is a subspace emerging from a noisy background; and the switching subspace, meaning that the change is a switch in the direction of the subspace. The emerging subspace problem can arise, for instance, in coherent weak signal detection with seismic sensor arrays, and switching subspace detection can be used for principal component analysis of streaming data. In these settings, the changes can be shown to be equivalent to a low-rank component added to the original covariance matrix. Moreover, the switching subspace problem, as we will see, can be reduced to the emerging subspace problem if we are willing to tolerate a performance loss. Therefore, we will focus on the analysis of the emerging subspace problem.
In this paper, we consider three detection procedures. We start with the exact CUSUM, which is known to be optimum when we have complete knowledge of the pre- and post-change statistics and parameters. Since the post-change parameters are usually unknown, we propose two alternatives to deal with the case of unknown parameters. Specifically, we consider the largest eigenvalue procedure, where we use as test statistic the largest eigenvalue of the sample covariance matrix of the data contained within a sliding time window. This can be regarded as a straightforward extension of its offline counterpart [1]. The second method, which we call Subspace-CUSUM, uses the structure of the exact CUSUM but, in place of the parameters that are unknown, it uses their estimates, which are computed using data within, again, a sliding window. We perform a theoretical analysis of the largest eigenvalue procedure. A similar analysis for Subspace-CUSUM, which is far more complicated and lengthy, is postponed for a future article. The three algorithms are compared using simulations.
The rest of the paper is organized as follows. Section II details the two problems of emerging and switching subspaces and how they are related. Section III presents the three sequential change detection procedures. In Section IV we develop theoretical bounds for the average run length and the expected detection delay of the largest eigenvalue procedure. In Section V we present numerical results and comparisons for the competing algorithms. Finally, Section VI contains our concluding remarks.
I-A. Related Work
Classical approaches to covariance change detection usually consider generic settings without assuming any structure. The CUSUM statistic can be derived if the pre-change and post-change distributions are known. For the multivariate case, the Hotelling control chart is the traditional way to detect a covariance change. The determinant of the sample covariance matrix was also used in [3] to detect a change in the determinant of the covariance matrix. A multivariate CUSUM based on likelihood functions of multivariate Gaussians is studied in [4], but it only considers a covariance change of the form Σ → cΣ for a constant c. Offline detection of a change from one covariance matrix to another is studied in [5] using the Schwarz information criterion [6], where the changepoint location must satisfy a certain regularity condition to ensure the existence of the maximum likelihood estimator. In [7] we find a hypothesis testing approach to detect a shift in an off-diagonal submatrix of the covariance matrix using likelihood ratios. Recently, [8] studied a CUSUM-like procedure for the detection of switching subspaces when the distributions (as well as the subspaces) before and after the changepoint are exactly known; this is different from our work since we assume the subspace after the change is unknown.
The work most closely related to our present effort is the hypothesis testing method developed in [1], which uses the largest eigenvalue of the sample covariance matrix to detect a sparse spiked covariance model given a fixed number of samples. The largest eigenvalue statistic is shown to be asymptotically minimax optimal for determining whether there exists a sparse and low-rank component in the offline setting. A natural sequential version of this idea is to use a sliding window and compute the largest eigenvalue of the corresponding sample covariance matrix. However, under the sequential setting this approach does not enjoy any form of (asymptotic) optimality.
A different test statistic, the so-called Kac-Rice statistic [9], has been considered for testing the spiked covariance model. The Kac-Rice statistic is the conditional survival function of the largest observed singular value conditioned on all the other observed singular values, and is characterized by a simple asymptotic distribution (uniform in [0, 1]). However, the statistic involves the computation of an integral over the whole real line, and it is not clear how this can be carried over to the sequential formulation.

II. Problem Setup
We first introduce the spiked covariance model [2], which assumes that a small number of directions explain most of the variance. In particular, we consider the rank-one spiked covariance matrix, which is given by

Σ = σ² I_k + θ uuᵀ,

where I_k denotes an identity matrix of size k; θ > 0 is the signal strength; u ∈ ℝ^k represents a basis for the subspace with unit norm ‖u‖ = 1; σ² is the noise variance, which will be considered known since it can be estimated from training data^{1}^{1}1In fact it is possible to consider σ² unknown as well and provide estimates of this parameter along with the necessary estimates of u. However, in order to simplify our presentation, we decided to consider σ² known.. The Signal-to-Noise Ratio (SNR) is defined as ρ = θ/σ².
Formally, the emerging subspace problem can be cast as follows:

x_t ~ N(0, σ² I_k),           t = 1, 2, …, τ,
x_t ~ N(0, σ² I_k + θ uuᵀ),   t = τ+1, τ+2, …,    (1)

where τ is the unknown changepoint that we would like to detect from data that are acquired sequentially.
Similarly, the switching subspace problem can be formulated as follows:

x_t ~ N(0, σ² I_k + θ u₁u₁ᵀ),   t = 1, 2, …, τ,
x_t ~ N(0, σ² I_k + θ u₂u₂ᵀ),   t = τ+1, τ+2, …,    (2)

where u₁, u₂ represent unit-norm bases for the subspaces before and after the change, with u₁ ≠ u₂, and u₁ is considered known. In both settings, our goal is to detect the change as quickly as possible.
The switching subspace problem (2) can be reduced to the emerging subspace problem (1) by a simple data projection. Specifically, select any orthonormal matrix A ∈ ℝ^{(k−1)×k} such that

A u₁ = 0,

which means that all rows of A are orthogonal to u₁, they are orthogonal to each other, and they have unit norm. Then, using the matrix A, we project each observation onto a (k−1)-dimensional space and obtain a new sequence

z_t = A x_t,   t = 1, 2, ….

Then z_t is a zero-mean random vector with covariance matrix σ² I_{k−1} before the change and σ² I_{k−1} + θ (A u₂)(A u₂)ᵀ after the change. Let θ̃ = θ ‖A u₂‖², and ũ = A u₂ / ‖A u₂‖.
Thus, problem (2) can be reduced to the following:

z_t ~ N(0, σ² I_{k−1}),            t = 1, 2, …, τ,
z_t ~ N(0, σ² I_{k−1} + θ̃ ũũᵀ),   t = τ+1, τ+2, ….    (3)

Note that in this way the switching subspace problem is reduced to the emerging subspace problem, where the new signal power θ̃ = θ(1 − (u₁ᵀu₂)²) depends on the angle between u₁ and u₂, which is consistent with our intuition.
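To make the reduction concrete, the following sketch (our illustration, not code from the paper; the dimension k, signal strength θ, and the specific directions u₁, u₂ are arbitrary example values) builds a projection matrix A with A u₁ = 0 and verifies that the projected signal strength equals θ(1 − (u₁ᵀu₂)²):

```python
import numpy as np

rng = np.random.default_rng(0)
k, theta = 5, 2.0

# unit-norm subspace directions before/after the change (arbitrary example values)
u1 = np.eye(k)[0]
u2 = u1 + 0.7 * np.eye(k)[1]
u2 /= np.linalg.norm(u2)

# rows of A: an orthonormal basis of the orthogonal complement of u1,
# obtained from a QR factorization whose first column is u1
Q, _ = np.linalg.qr(np.column_stack([u1, rng.standard_normal((k, k - 1))]))
A = Q[:, 1:].T                     # (k-1) x k, satisfies A @ u1 = 0

# effective signal strength after projection: theta * ||A u2||^2
theta_eff = theta * np.linalg.norm(A @ u2) ** 2
theta_pred = theta * (1.0 - (u1 @ u2) ** 2)   # theta times sin^2 of the angle
```

The equality of `theta_eff` and `theta_pred` follows because the rows of A together with u₁ form an orthonormal basis of ℝ^k.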
We would like to emphasize that by projecting the observations onto a lower dimensional space we lose information, suggesting that the two versions of the problem are not equivalent. Indeed, the optimum detector for the transformed data in (3) and the one for the original data in (2) do not coincide. This can be easily verified by computing the corresponding CUSUM tests and their optimum performance. Despite this difference, it is clear that with the proposed approach we put both problems under the same framework, offering, as we will see, computationally simple methods to solve the original problem in (2). Consequently, in the following analysis, we focus solely on problem (1).
III. Detection Procedures
As mentioned before, we consider three methods: the exact CUSUM procedure where all parameters are considered known, the largest eigenvalue procedure, and the Subspace-CUSUM procedure. Since CUSUM is optimum, it will be regarded as the point of reference for the other two approaches. We first introduce some necessary notation. Denote with P_τ and E_τ the probability measure and the expectation induced when there is a changepoint at the deterministic time τ. Under this definition, P_∞ and E_∞ are the probability and the expectation under the nominal regime (the change never happens), while P₀ and E₀ are the probability and the expectation under the alternative regime (the change happens before we take any data).

III-A. Optimal CUSUM Procedure
The CUSUM procedure [10, 11] is the most popular sequential test for change detection. When the observations are i.i.d. before and after the change, CUSUM is known to be exactly optimum [12] in the sense that it solves a very well defined constrained optimization problem introduced in [13]. However, the CUSUM procedure can be applied only when we have exact knowledge of the pre- and post-change distributions. Thus, for our problem, it requires complete specification of all parameters, namely the subspace u, the noise power σ², and the SNR ρ.
To derive the CUSUM procedure, let f_∞(x) and f₀(x)
denote the pre- and post-change probability density functions (pdfs) of the observations. Then the CUSUM statistic is defined by maximizing the log-likelihood ratio statistic over all possible changepoint locations:

S_t = max_{1 ≤ τ ≤ t} Σ_{i=τ}^{t} log( f₀(x_i) / f_∞(x_i) ),    (4)
which has the recursive implementation

S_t = (S_{t−1})⁺ + log( f₀(x_t) / f_∞(x_t) ),   S₀ = 0,   where (x)⁺ = max(x, 0),    (5)
that enables its efficient calculation [12]. The CUSUM stopping time in turn is defined as

T_C = inf{ t > 0 : S_t ≥ b },    (6)
where b > 0 is a threshold selected to meet a suitable false alarm constraint. For our problem of interest we can derive that

log( f₀(x) / f_∞(x) ) = ½ [ xᵀ( Σ_∞^{−1} − Σ₀^{−1} ) x − log( det Σ₀ / det Σ_∞ ) ]
                      = ( ρ / (2σ²(1+ρ)) ) (uᵀx)² − ½ log(1+ρ),    (7)

where Σ_∞ = σ² I_k and Σ₀ = σ² I_k + θ uuᵀ denote the pre- and post-change covariance matrices.
The second equality is due to the matrix inversion lemma [14] that allows us to write

( σ² I_k + θ uuᵀ )^{−1} = (1/σ²) ( I_k − (θ / (σ² + θ)) uuᵀ ),

which, after substitution into the equation, yields the desired result. Note that the multiplicative factor ρ / (2σ²(1+ρ)) is positive, so we can omit it from the log-likelihood ratio when forming the CUSUM statistic. This leads to

S_t = (S_{t−1})⁺ + (uᵀx_t)² − d,   d = σ² (1+ρ) log(1+ρ) / ρ.    (8)
Remark 1.
We can show that the increment in (8), i.e., (uᵀx_t)² − d,
has the following property: its expected value is negative under the pre-change and positive under the post-change probability measure. The proof relies on a simple argument based on Jensen's inequality. Due to this property, before the change the CUSUM statistic will oscillate near 0, while after the occurrence of the change it will exhibit, on average, a positive linear drift, forcing it, eventually, to hit or exceed the threshold.
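As an illustration of Remark 1 and of the recursion (8), the following minimal sketch (ours, with arbitrary example values for σ² and ρ) implements one CUSUM update and checks numerically that the drift d lies strictly between the pre- and post-change means of (uᵀx_t)²:

```python
import math

def cusum_drift(sigma2, rho):
    # drift term d = sigma^2 (1 + rho) log(1 + rho) / rho from (8)
    return sigma2 * (1 + rho) / rho * math.log(1 + rho)

def cusum_update(s_prev, y, d):
    # one step of the recursion (8); y plays the role of (u' x_t)^2
    return max(s_prev, 0.0) + y - d

sigma2, rho = 1.0, 0.5
d = cusum_drift(sigma2, rho)
pre_mean = sigma2                  # E_inf[(u' x)^2]
post_mean = sigma2 * (1 + rho)     # E_0[(u' x)^2]
```

The inequality pre_mean < d < post_mean holds for every ρ > 0 since ρ/(1+ρ) < log(1+ρ) < ρ, which is exactly the drift property stated in Remark 1.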
III-B. Largest Eigenvalue Procedure
Motivated by the offline test in [1], a natural strategy to detect the change is to use the largest eigenvalue of the sample covariance matrix. Under the sequential setting, we adopt a sliding-window approach and form the sample covariance matrix using observations that lie within a time window of length w. For each time t, the unnormalized sample covariance matrix using the available samples is given by

Σ̂_t = Σ_{i = max(1, t−w+1)}^{t} x_i x_iᵀ.    (9)
We note that for t = 1 the matrix contains a single outer product, and as time progresses the number of outer products increases linearly until it reaches w. After this point, namely for t ≥ w, the number of outer products remains equal to w.
Let λ_max(·) denote the largest eigenvalue of a symmetric matrix. We define the largest eigenvalue procedure as the one that stops according to the following rule:

T_LE = inf{ t > 0 : λ_max(Σ̂_t) ≥ b },    (10)

where b is a constant threshold selected to meet a suitable false alarm constraint. We need to emphasize that we do not divide by min(t, w) when forming the unnormalized sample covariance matrix. As we explain in Section III-D, it is better for T_LE to always divide by w instead of min(t, w). Consequently, we can omit the normalization with w from our detection statistic by absorbing it into the threshold.
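A minimal sketch of the procedure (10) (our illustration; the function names and the toy data are ours) computes λ_max over the growing-then-sliding window and stops at the first threshold crossing:

```python
import numpy as np

def largest_eig_stat(xs, t, w):
    # lambda_max of the unnormalized sample covariance over the window ending at time t (1-indexed)
    window = xs[max(0, t - w) : t]
    return np.linalg.eigvalsh(window.T @ window)[-1]

def stopping_time(xs, w, b):
    # first t with lambda_max(Sigma_hat_t) >= b, or None if no alarm is raised
    for t in range(1, len(xs) + 1):
        if largest_eig_stat(xs, t, w) >= b:
            return t
    return None

# deterministic sanity example: every observation equals the first basis vector
e1 = np.zeros(3); e1[0] = 1.0
xs = np.tile(e1, (6, 1))
stat = largest_eig_stat(xs, 5, 4)   # window holds 4 copies of e1, so lambda_max = 4
tau = stopping_time(xs, 4, 3.0)     # statistic reaches 3 at t = 3
```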
III-C. Subspace-CUSUM Procedure
Usually the subspace u and the SNR ρ are unknown. In this case it is impossible to form the exact CUSUM statistic depicted in (8). One option is to estimate the unknown parameters and substitute them back into the likelihood function. Here we propose to estimate only u and call the estimate û_t, which leads to the following Subspace-CUSUM update:

S̃_t = (S̃_{t−1})⁺ + (û_tᵀ x_t)² − d,    (11)

where d is a drift parameter to be specified below.
We compute the estimate û_t from the samples x_{t+1}, …, x_{t+w}. This means that at time t the estimate relies on data that are in the “future” of x_t. Practically, this is always possible by properly delaying our data by w samples. Stopping occurs similarly to CUSUM, that is,

T_SC = inf{ t > 0 : S̃_t ≥ b }.

Of course, in order to be fair, at the time of stopping we must make the appropriate correction: if S̃_t exceeds the threshold at t for the first time, then the actual stopping takes place at t + w. The reason we use estimates based on “future” data is to make û_t and x_t independent, which in turn will help us decide what is the appropriate choice for the drift constant d.
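The update (11) with the future-window eigenvector estimate can be sketched as follows (our illustration; the toy data below are chosen so the behavior is easy to verify by hand):

```python
import numpy as np

def subspace_cusum_path(xs, w, d):
    # Subspace-CUSUM path; u_hat_t is the unit-norm top eigenvector of the
    # unnormalized sample covariance of the future window x_{t+1}, ..., x_{t+w}
    s, path = 0.0, []
    for t in range(len(xs) - w):            # only times with a full future window
        window = xs[t + 1 : t + 1 + w]
        _, vecs = np.linalg.eigh(window.T @ window)
        u_hat = vecs[:, -1]
        s = max(s, 0.0) + (u_hat @ xs[t]) ** 2 - d
        path.append(s)
    return path

# toy data: all observations along e1 with (u' x)^2 = 2, so each increment is 2 - d
e1 = np.zeros(3); e1[0] = 1.0
xs = np.sqrt(2.0) * np.tile(e1, (6, 1))
path = subspace_cusum_path(xs, w=2, d=1.0)
```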
For the drift parameter d we need the following double inequality to hold:

E_∞[ (û_tᵀ x_t)² ] < d < E₀[ (û_tᵀ x_t)² ].    (12)

With (12) we can guarantee that S̃_t mimics the behavior of the exact CUSUM statistic mentioned in Remark 1, namely, it exhibits a negative drift before and a positive drift after the change. To apply (11), we need to specify d and, of course, provide the estimate û_t. Regarding the latter, we simply use the unit-norm eigenvector corresponding to the largest eigenvalue of the sample covariance matrix of x_{t+1}, …, x_{t+w}, formed as in (9). As we mentioned, the main advantage of using future data is that the resulting û_t is independent of x_t. This independence property allows for the straightforward computation of the two expectations in (12) and contributes towards the proper selection of d. Note that under the pre-change distribution we can write
E_∞[ (û_tᵀ x_t)² ] = E_∞[ û_tᵀ x_t x_tᵀ û_t ] = σ² E_∞[ ‖û_t‖² ] = σ²,    (13)

where the first equality is due to the independence of û_t and x_t, the next one due to x_t having covariance σ² I_k, and the last equality due to û_t being of unit norm.
Under the post-change regime, we need to specify the statistical behavior of û_t for the computation of E₀[(û_tᵀ x_t)²]. We will assume that the window size w
is sufficiently large so that Central Limit Theorem (CLT) approximations [15, 16] are possible for û_t. The required result appears in the next lemma.

Lemma 1.
Suppose the vectors x₁, …, x_w are of dimension k and follow the distribution N(0, σ² I_k + θ uuᵀ). Let v_w be the eigenvector corresponding to the largest eigenvalue of the sample covariance matrix; then, as w → ∞, we have the following CLT version for v_w:

√w ( v_w − u )  →_d  N( 0, ((1+ρ)/ρ²) ( I_k − uuᵀ ) ).
Proof.
The proof is detailed in the Appendix. ∎
Lemma 1 provides an asymptotic statistical description of the estimate of u. More precisely, it characterizes the estimation error v_w − u. In our case we estimate the eigenvector from the matrix Σ̂ in (9) but, as mentioned before, we adopt a normalized (unit-norm) version û_t. Therefore, if we fix w at a sufficiently large value and e_t denotes the estimation error of the unnormalized estimate at time t, then, from Lemma 1, we can deduce

û_t = ( u + e_t ) / ‖u + e_t‖,   e_t ~ N( 0, (1/w) ((1+ρ)/ρ²) ( I_k − uuᵀ ) ).    (14)

Note that e_t is also independent of x_t and orthogonal to u, the latter being true because the covariance matrix of e_t is proportional to I_k − uuᵀ. This implies uᵀe_t = 0. Combining the above results, we have

E₀[ (û_tᵀ x_t)² ] = E₀[ û_tᵀ ( σ² I_k + θ uuᵀ ) û_t ] = σ² + θ E₀[ (uᵀ û_t)² ].

Because uᵀe_t = 0, the above expression simplifies to

E₀[ (û_tᵀ x_t)² ] = σ² + θ E₀[ 1 / (1 + ‖e_t‖²) ].    (15)

For the expectation in (15), using the Gaussian approximation from Lemma 1, we have

E₀[ ‖e_t‖² ] = (k−1)(1+ρ) / (ρ² w),

and

E₀[ 1 / (1 + ‖e_t‖²) ] = 1 − E₀[ ‖e_t‖² ] + o(1/w).

Consequently,

E₀[ (û_tᵀ x_t)² ] = σ² (1+ρ) − σ² (1+ρ)(k−1) / (ρ w) + o(1/w),    (16)

with the o(1/w) term being negligible compared to the other two when w → ∞.
Consider now the case where ρ is unknown but exceeds some preset minimal SNR ρ_min. From the above derivation, given the worst-case SNR ρ_min and an estimate of the noise variance σ², we can give a lower bound for E₀[(û_tᵀ x_t)²]. Consequently, the drift d can be anything between σ² and this lower bound where, we observe, the latter quantity exceeds σ² when w > (1+ρ_min)(k−1)/ρ_min². Below, for simplicity, for d we use the average of the two bounds.
Alternatively, and in particular when w does not satisfy this requirement, we can estimate E₀[(û_tᵀ x_t)²] by Monte Carlo simulation. This method requires: (i) an estimate of the noise level σ², which can be obtained from training data without a changepoint; (ii) the preset worst-case SNR ρ_min; (iii) a unit-norm vector ũ that is generated randomly. Under the nominal regime we have E_∞[(û_tᵀ x_t)²] = σ². Under the alternative, E₀[(û_tᵀ x_t)²] depends only on the SNR, as shown in (16). We can therefore simulate the worst-case scenario using the randomly generated vector ũ by generating samples from the distribution N(0, σ² I_k + ρ_min σ² ũũᵀ).
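A sketch of this Monte Carlo calibration (ours; the parameter values are arbitrary and the trial count is kept small for illustration) estimates E₀[(û_tᵀx_t)²] at the worst-case SNR and sets the drift to the midpoint of the two bounds:

```python
import numpy as np

def estimate_post_change_mean(sigma2, rho_min, k, w, n_trials=2000, seed=0):
    # Monte Carlo estimate of E_0[(u_hat' x)^2] under the worst-case SNR rho_min;
    # a randomly generated unit-norm direction stands in for the unknown u
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(k)
    u /= np.linalg.norm(u)
    cov = sigma2 * np.eye(k) + rho_min * sigma2 * np.outer(u, u)
    total = 0.0
    for _ in range(n_trials):
        xs = rng.multivariate_normal(np.zeros(k), cov, size=w + 1)
        window = xs[1:]                        # "future" window used for the estimate
        _, vecs = np.linalg.eigh(window.T @ window)
        u_hat = vecs[:, -1]
        total += (u_hat @ xs[0]) ** 2          # xs[0] is independent of the window
    return total / n_trials

sigma2, rho_min, k, w = 1.0, 1.0, 5, 50
post_mean = estimate_post_change_mean(sigma2, rho_min, k, w)
pre_mean = sigma2
drift = 0.5 * (pre_mean + post_mean)           # midpoint choice discussed in the text
```

For these example values, (16) predicts a post-change mean near σ²(1+ρ)(1 − (k−1)/(ρw)), which the simulation should approximately reproduce.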
Even though the average of the update in (11) does not depend on u, the computation of the test statistic requires the estimate û_t
of the eigenvector. This can be accomplished by applying the singular value decomposition (SVD) (or the power method [17]) to the unnormalized sample covariance matrix.

Remark 2.
An alternative possibility is to use the generalized likelihood ratio (GLR) statistic, where both θ and u are estimated for each possible change location τ. The GLR statistic is

max_{1 ≤ τ ≤ t} Σ_{i=τ}^{t} log( f₀(x_i; θ̂_τ, û_τ) / f_∞(x_i) ),

where θ̂_τ, û_τ are estimated from the samples x_τ, …, x_t. However, this computation is more intensive since there is no recursive implementation for the GLR statistic; furthermore, it requires growing memory^{2}^{2}2There are finite memory versions [18] which, unfortunately, are equally complicated in their implementation.. Therefore, we do not consider the GLR statistic in this paper.
III-D. Calibration
To fairly compare the detection procedures discussed in the previous section we need to properly calibrate them. Clearly the calibration process must be consistent with the performance measure we are interested in. It is exactly this point we are discussing next.
For a given stopping time T we measure false alarms through the Average Run Length (ARL), expressed as E_∞[T]. For the detection capability of T we use the (worst-case) Expected Detection Delay (EDD) proposed by Lorden [13]:

EDD(T) = sup_{τ ≥ 0} esssup E_τ[ (T − τ)⁺ | x_1, …, x_τ ],    (17)

which considers the worst possible data before the change (expressed through the esssup) and the worst possible change-time τ.
We now consider scenarios that lead to the worst-case detection delay. For the largest eigenvalue procedure, assume τ < t ≤ τ + w. Since for detection we use λ_max(Σ̂_t) and compare it to a threshold, it is clear that the worst-case data before τ are the ones that make λ_max(Σ̂_t) as small as possible. We observe that

λ_max(Σ̂_t) ≥ λ_max( Σ_{i=τ+1}^{t} x_i x_iᵀ ),    (18)

with equality corresponding to the data before the change being all equal to zero. In fact, the worst-case scenario at any time instant is equivalent to forgetting all data before and including τ and restarting the procedure from τ + 1 using, initially, one, then two, etc. outer products in the unnormalized sample covariance matrix, exactly as we do when we start at time 0. Due to stationarity, this suggests that we can limit ourselves to the case τ = 0 and compute E₀[T_LE]; this constitutes the worst-case EDD. Furthermore, the fact that in the beginning we do not normalize with the number of outer products is beneficial for T_LE since it improves its ARL.
We should emphasize that if we do not force the data before the change to be zero and use simulations to evaluate the detector with a change occurring at some time different from 0, then it is possible to arrive at misleading conclusions. Indeed, it is not uncommon for this test to appear to outperform the exact CUSUM test for low ARL values. Of course this is impossible, since the exact CUSUM is optimum for any ARL in the sense that it minimizes the worst-case EDD depicted in (17).
Let us now consider the worst-case scenario for Subspace-CUSUM. We observe that the statistic S̃_t is an increasing function of its value at the changepoint, so the worst the pre-change data can do is force S̃_τ to restart from zero. We therefore understand that the well-known worst-case property of the exact CUSUM carries over to Subspace-CUSUM. Again, because of stationarity, this allows us to fix the change-time at τ = 0. Of course, as mentioned before, because û_t uses data coming from the future of x_t, if our detector stops at some time t (namely when for the first time we experience S̃_t ≥ b), then the actual time of stopping must be corrected to t + w. A similar correction is not necessary for CUSUM because this test has exact knowledge of all parameters.
The threshold b is chosen so that the ARL meets a prespecified value. In practice, b is determined by simulation. A very convenient tool for accelerating the estimation of the ARL (which is usually large) is the following formula, which connects the ARL of CUSUM to the average of the SPRT stopping time [11]:

E_∞[T_C] = E_∞[T_SPRT] / P_∞( S_{T_SPRT} ≥ b ),    (19)

where the SPRT stopping time is defined as

T_SPRT = inf{ t > 0 : S_t ∉ [0, b) }.

The validity of this formula relies on the CUSUM property that, after each restart, S_t is independent of the data before the time of the restart. Unfortunately, this key characteristic is no longer true in the proposed Subspace-CUSUM scheme, due to the fact that û_t uses data from the future of x_t. We could, however, argue that this dependence is weak. Indeed, as we have seen in Lemma 1, each û_t is basically equal to u plus a small random perturbation (estimation error with power of the order of 1/w), with these perturbations being practically independent in time. As we observed in numerous simulations, estimating the ARL directly and through (19) (with S_t replaced by S̃_t) results in almost indistinguishable values, even for moderate window sizes w. This suggests that we can use (19) to estimate the ARL of Subspace-CUSUM as well. As we mentioned, in the final result we need to add w to account for the future data used by the estimate û_t.
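The relation (19) can be exploited numerically as in the following sketch (ours; the Gaussian increments with negative drift merely stand in for the pre-change increments (û_tᵀx_t)² − d):

```python
import random

def arl_via_sprt(increment_sampler, b, n_runs=4000, seed=1):
    # estimate E_inf[T] through (19): run the statistic from 0 until it leaves
    # [0, b), then divide the mean run length by the top-exit probability
    rng = random.Random(seed)
    total_len, top_exits = 0, 0
    for _ in range(n_runs):
        s, t = 0.0, 0
        while 0.0 <= s < b:
            s += increment_sampler(rng)
            t += 1
        total_len += t
        top_exits += s >= b
    if top_exits == 0:
        return float("inf")
    return (total_len / n_runs) / (top_exits / n_runs)

# toy pre-change increments with negative drift
arl = arl_via_sprt(lambda rng: rng.gauss(-0.2, 1.0), b=5.0)
```

Since each run is short (the statistic quickly exits below zero under the negative drift), this is much cheaper than simulating the full CUSUM run length directly.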
IV. Analysis of the Largest Eigenvalue Procedure
It is clear that in this work we are interested in promoting the Subspace-CUSUM detection procedure for the change detection problem of interest. Therefore, a theoretical analysis with formulas for the corresponding ARL and EDD would strongly support this method. Even though such an analysis is possible, it is unfortunately overly lengthy; for this reason, we postpone its presentation for a future publication. In this section we characterize the ARL and EDD of the largest eigenvalue procedure, which turns out to be simpler. In doing so, we also introduce some of the mathematical tools we are going to use in the (future) analysis of Subspace-CUSUM.
IV-A. Link with Random Matrix Theory
Since the study of the ARL requires understanding the behavior of the largest eigenvalue under the null, i.e., when the samples are i.i.d. Gaussian random vectors with zero mean and identity covariance matrix, we first review some related results from random matrix theory.
There has been an extensive literature on the distribution of the largest eigenvalue of the sample covariance matrix; see, e.g., [2, 19, 20]. There are two kinds of results typically available for eigenvalue distributions: the so-called bulk [21], which treats a continuum of eigenvalues, and the extremes, which are the (first few) largest and smallest eigenvalues. Assume there are w samples which are k-dimensional Gaussian random vectors with zero mean and identity covariance matrix. Let Σ̂ denote the unnormalized sample covariance matrix. If k/w → c > 0, the largest eigenvalue of Σ̂/w converges to (1 + √c)² almost surely [22]. To characterize the distribution of the largest eigenvalue, [2] uses the Tracy-Widom law [23]. Define the center and scaling constants

μ(w, k) = ( √(w−1) + √k )²,
σ(w, k) = ( √(w−1) + √k ) ( 1/√(w−1) + 1/√k )^{1/3}.    (20)

If k/w → c > 0, then the centered and scaled largest eigenvalue converges in distribution to a random variable Z₁
with the so-called Tracy-Widom law of order one [2]:

( λ_max(Σ̂) − μ(w, k) ) / σ(w, k)  →_d  Z₁.    (21)
The Tracy-Widom law can be described in terms of a partial differential equation and the Airy function, and its tail can be computed numerically (using, for example, the R package RMTstat).

IV-B. Approximation of ARL Ignoring Temporal Correlation
If we ignore the temporal correlation of the largest eigenvalues produced by the sliding window, we can obtain a simple approximation for the ARL. If we call p = P_∞( λ_max(Σ̂_t) ≥ b ) for t ≥ w, then the stopping time is geometric and it is easy to see that the ARL can be expressed as 1/p. Clearly, to obtain this result we must assume the same stopping probability for t < w as well, which is clearly not true. Since for t < w the unnormalized sample covariance has fewer than w terms, the corresponding probability is smaller than p. This suggests that 1/p is actually a lower bound to the ARL, while w + 1/p is an upper bound. If 1/p ≫ w, then approximating the ARL with 1/p is quite acceptable. We can use the Tracy-Widom law to obtain an asymptotic expression relating the ARL to the threshold b. The desired formula is depicted in the following proposition.
Proposition 1 (Approximation of ARL by ignoring temporal correlation).
For any γ > 1 we have E_∞[T_LE] ≥ γ if we select

b = μ(w, k) + σ(w, k) q_{1/γ},    (22)

where q_α denotes the upper α-percentage point of Z₁, namely P(Z₁ ≥ q_α) = α.
Proof.
The proof is straightforward and therefore omitted. ∎
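The centering constants (20) and the threshold rule (22) translate directly into code (our sketch; the Tracy-Widom quantile must be supplied externally, e.g., from the RMTstat tables, so it appears here as a plain parameter):

```python
import math

def centering(w, k):
    # center mu(w, k) and scale sigma(w, k) from (20)
    a, c = math.sqrt(w - 1), math.sqrt(k)
    mu = (a + c) ** 2
    sigma = (a + c) * (1 / a + 1 / c) ** (1 / 3)
    return mu, sigma

def threshold_for_arl(w, k, tw_quantile):
    # threshold b from (22): b = mu + sigma * q, with q the upper 1/gamma
    # quantile of the Tracy-Widom law of order one (supplied externally)
    mu, sigma = centering(w, k)
    return mu + sigma * tw_quantile

def arl_ignoring_correlation(p):
    # geometric approximation: ARL ~ 1/p, acceptable when 1/p >> w
    return 1.0 / p

mu, sigma = centering(50, 10)
b = threshold_for_arl(50, 10, 2.0)   # 2.0 is a placeholder quantile value
```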
IV-C. Approximation of ARL Including Temporal Correlation
Now we aim to capture the temporal correlation between detection statistics due to overlapping time windows. We leverage a proof technique developed in [24], which can obtain satisfactory approximations for the tail probability of the maximum of a random field. For each t ≥ w, define the standardized statistic

Z_t = ( λ_max(Σ̂_t) − μ(w, k) ) / σ(w, k).    (23)
Fig. 1 illustrates the overlap of two sample covariance matrices and provides necessary notation.
We note that for any given m ≥ w,

P_∞( T_LE ≤ m ) = P_∞( max_{w ≤ t ≤ m} Z_t ≥ b̃ ),

where b̃ denotes the threshold b after the same standardization; this is the maximum over a set of correlated variables Z_t. Capturing the temporal dependence of Z_t is challenging. For our analysis we recall Pearson's correlation between two random variables X and Y:

Corr(X, Y) = Cov(X, Y) / √( Var(X) Var(Y) ).
We then have the following lemma that addresses the problem of interest.
Lemma 2 (Approximation of local correlation).
Let
(24) 
where and . Then when and ,
(25) 
Proof.
The proof is given in the Appendix. ∎
By leveraging the properties of the local approximation in (25), we can obtain an asymptotic approximation using the localization theorem [25, 24]. Define a special function ν(·), which is closely related to the Laplace transform of the overshoot over the boundary of a random walk [26]:

ν(x) ≈ ( (2/x) ( Φ(x/2) − 1/2 ) ) / ( (x/2) Φ(x/2) + φ(x/2) ),

where φ and Φ
are the pdf and cdf of the standard normal distribution.

Proposition 2 (ARL with temporal correlation).
For large values of b we can write
(26) 
where
Proof.
The proof is detailed in the Appendix. ∎
We performed simulations to verify the accuracy of the threshold value obtained without and with the temporal correlation taken into account (Propositions 1 and 2, respectively). The results are shown in Table I. We find that, indeed, the threshold obtained when the temporal correlation is considered, as in (26), is more accurate than its counterpart obtained by using the Tracy-Widom law, as in (22).
Target ARL   | 5k    | 10k   | 20k   | 30k   | 40k   | 50k
Simulation   | 1.633 | 1.661 | 1.688 | 1.702 | 1.713 | 1.722
Approx (22)  | 1.738 | 1.763 | 1.787 | 1.800 | 1.809 | 1.816
Approx (26)  | 1.699 | 1.713 | 1.727 | 1.735 | 1.740 | 1.744
IV-D. Lower Bound on EDD Using Marginal Power
We now focus on the detection performance and present a tight lower bound for the EDD of the largest eigenvalue procedure.
Proposition 3.
For large values of b we have
(27) 
where
Proof.
The proof is based on a known result for CUSUM [11]
and requires the derivation of the Kullback-Leibler divergence for our problem. Details are given in the Appendix. ∎
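For reference, the Kullback-Leibler divergence mentioned in the proof can be computed directly for the two Gaussian densities in (1) (our derivation, using Σ_∞ = σ²I_k and Σ₀ = σ²I_k + θuuᵀ):

```latex
\begin{aligned}
D(f_0 \,\|\, f_\infty)
  &= \tfrac{1}{2}\Big[\operatorname{tr}\!\big(\Sigma_\infty^{-1}\Sigma_0\big) - k
     + \log\frac{\det \Sigma_\infty}{\det \Sigma_0}\Big] \\
  &= \tfrac{1}{2}\big[(k+\rho) - k - \log(1+\rho)\big]
   = \tfrac{1}{2}\big[\rho - \log(1+\rho)\big],
\end{aligned}
```

where the trace term uses tr(Σ_∞^{-1}Σ₀) = k + θ/σ² = k + ρ and the determinant ratio equals 1/(1+ρ).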
V. Numerical Examples
In this section, numerical results are presented to compare the three detection procedures. The tests are first applied to synthetic data, and the performance of Subspace-CUSUM and of the largest eigenvalue test is compared against the optimum CUSUM performance. Then the performance of Subspace-CUSUM is optimized by selecting the most appropriate window size.
V-A. Performance Comparison
We perform simulations to compare the largest eigenvalue procedure, the Subspace-CUSUM procedure, and the exact CUSUM procedure. The threshold for each procedure is determined by Monte Carlo simulation, as discussed in Section III-D. Fig. 3 depicts EDD versus the logarithm of the ARL for the selected parameter values and window length w. The black line corresponds to the exact CUSUM procedure, which is clearly the best and lies below the other curves. Subspace-CUSUM always has smaller EDD than the largest eigenvalue procedure, and the difference increases with increasing ARL.
V-B. Optimal Window Size
We also consider the EDD/ARL curve where w is optimized to minimize the detection delay at every ARL. We first compute the EDD for a range of window sizes at each ARL value. Then we plot in Fig. 4 the lower envelope of the EDDs, corresponding to the optimal EDD achieved by varying w. We also plot the optimal value of w as a function of the ARL in Fig. 5. Even though the best EDD of Subspace-CUSUM diverges from the performance enjoyed by CUSUM, we believe this divergence is slower than the increase of the optimum CUSUM EDD. One of the goals of the future publication on the analysis of Subspace-CUSUM is to show that this is indeed the case, which in turn will demonstrate that this detection structure is first-order asymptotically optimum.
VI. Conclusion
In this paper, we considered three detection procedures for a rank-one change in the covariance matrix: the largest eigenvalue procedure, the exact CUSUM procedure, and the Subspace-CUSUM procedure. For Subspace-CUSUM we perform a simultaneous estimation of the required subspace in parallel with the sequential detection. We avoid estimating all unknown parameters by following a worst-case analysis with respect to the subspace power. We were able to derive theoretical expressions for the ARL and an interesting lower bound for the EDD of the largest eigenvalue procedure. In particular, we were able to handle the correlations resulting from the use of a sliding window, an issue that is not present in the offline version of the same procedure. For the comparison of the three competing detectors, we discussed how each detector must be calibrated so that comparisons are fair. Comparisons were performed using simulated data, and Subspace-CUSUM was found to exhibit significantly better performance than the largest eigenvalue procedure. Ongoing work involves establishing the first-order asymptotic optimality of the Subspace-CUSUM procedure by determining the optimal drift parameter and by relating the sliding window length to the desired ARL.
Acknowledgment
The work of Liyan Xie and Yao Xie was supported by the US National Science Foundation under Grants CCF 1442635, CMMI 1538746, DMS 1830210, and the Career Award CCF 1650913. The work of George Moustakides was supported by the US National Science Foundation under Grant CIF 1513373, through Rutgers University.
References
 [1] Q. Berthet, P. Rigollet et al., “Optimal detection of sparse principal components in high dimension,” The Annals of Statistics, vol. 41, no. 4, pp. 1780–1815, 2013.
 [2] I. M. Johnstone, “On the distribution of the largest eigenvalue in principal components analysis,” Annals of statistics, pp. 295–327, 2001.
 [3] F. B. Alt, “Multivariate quality control,” Encyclopedia of Statistical Sciences, 2004.
 [4] J. D. Healy, “A note on multivariate cusum procedures,” Technometrics, vol. 29, no. 4, pp. 409–412, 1987.
 [5] J. Chen and A. Gupta, “Statistical inference of covariance change points in gaussian model,” Statistics, vol. 38, no. 1, pp. 17–28, 2004.
 [6] G. Schwarz et al., “Estimating the dimension of a model,” The annals of statistics, vol. 6, no. 2, pp. 461–464, 1978.
 [7] E. AriasCastro, S. Bubeck, G. Lugosi et al., “Detection of correlations,” The Annals of Statistics, vol. 40, no. 1, pp. 412–435, 2012.
 [8] Y. Jiao, Y. Chen, and Y. Gu, “Subspace changepoint detection: A new model and solution,” IEEE Journal of Selected Topics in Signal Processing, 2018.
 [9] J. Taylor, J. Loftus, and R. J. Tibshirani, “Inference in adaptive regression via the Kac–Rice formula,” The Annals of Statistics, vol. 44, no. 2, pp. 743–770, 2016.
 [10] E. S. Page, “Continuous inspection schemes,” Biometrika, vol. 41, no. 1/2, pp. 100–115, 1954.

 [11] D. Siegmund, Sequential Analysis: Tests and Confidence Intervals. Springer Science & Business Media, 1985.
 [12] G. V. Moustakides, “Optimal stopping times for detecting changes in distributions,” The Annals of Statistics, pp. 1379–1387, 1986.
 [13] G. Lorden, “Procedures for reacting to a change in distribution,” Annals of Mathematical Statistics, pp. 1897–1908, 1971.
 [14] M. A. Woodbury, “Inverting modified matrices,” Memorandum report, vol. 42, no. 106, p. 336, 1950.
 [15] T. W. Anderson, “Asymptotic theory for principal component analysis,” The Annals of Mathematical Statistics, vol. 34, no. 1, pp. 122–148, 1963.
 [16] D. Paul, “Asymptotics of sample eigenstructure for a large dimensional spiked covariance model,” Statistica Sinica, pp. 1617–1642, 2007.
 [17] R. Mises and H. PollaczekGeiringer, “Praktische verfahren der gleichungsauflösung.” ZAMMJournal of Applied Mathematics and Mechanics/Zeitschrift für Angewandte Mathematik und Mechanik, vol. 9, no. 1, pp. 58–77, 1929.
 [18] T. L. Lai and J. Z. Shan, “Efficient recursive algorithms for detection of abrupt changes in signals and control systems,” IEEE Transactions on Automatic Control, vol. 44, no. 5, pp. 952–966, 1999.
 [19] Y.Q. Yin, Z.D. Bai, and P. R. Krishnaiah, “On the limit of the largest eigenvalue of the large dimensional sample covariance matrix,” Probability theory and related fields, vol. 78, no. 4, pp. 509–521, 1988.

 [20] J. Baik and J. W. Silverstein, “Eigenvalues of large sample covariance matrices of spiked population models,” Journal of Multivariate Analysis, vol. 97, no. 6, pp. 1382–1408, 2006.
 [21] A. Edelman and Y. Wang, “Random matrix theory and its innovative applications,” in Advances in Applied Mathematics, Modeling, and Computational Science. Springer, 2013, pp. 91–116.
 [22] S. Geman, “A limit theorem for the norm of random matrices,” The Annals of Probability, pp. 252–261, 1980.
 [23] C. Tracy and H. Widom, “On orthogonal and symplectic matrix ensembles,” Comm. Math. Phys., vol. 177, pp. 727–754, 1996.
 [24] D. Siegmund, B. Yakir, and N. Zhang, “Tail approximations for maxima of random fields by likelihood ratio transformations,” Sequential Analysis, vol. 29, no. 3, pp. 245–262, 2010.
 [25] ——, “Tail approximations for maxima of random fields by likelihood ratio transformations,” Sequential Analysis, vol. 29, no. 3, pp. 245–262, 2010.
 [26] D. Siegmund and B. Yakir, The statistics of gene mapping. Springer Science & Business Media, 2007.
 [27] S. Li, Y. Xie, H. Dai, and L. Song, “Mstatistic for kernel changepoint detection,” in Advances in Neural Information Processing Systems, 2015, pp. 3366–3374.
 [28] O. Guédon and R. Vershynin, “Community detection in sparse networks via grothendieck’s inequality,” Probability Theory and Related Fields, vol. 165, no. 34, pp. 1025–1049, 2016.