 # Offline detection of change-points in the mean for stationary graph signals

This paper addresses the problem of segmenting a stream of graph signals: we aim to detect changes in the mean of a multivariate signal defined over the nodes of a known graph. We propose an offline algorithm that relies on the concept of graph signal stationarity and allows the convenient translation of the problem from the original vertex domain to the spectral domain (Graph Fourier Transform), where it is much easier to solve. Although the obtained spectral representation is sparse in real applications, to the best of our knowledge this property has not been much exploited in the existing related literature. Our main contribution is a change-point detection algorithm that adopts a model selection perspective, takes into account the sparsity of the spectral representation, and determines automatically the number of change-points. Our detector comes with a proof of a non-asymptotic oracle inequality, and numerical experiments demonstrate the validity of our method.


## 1 Introduction

One of the most common tasks in Signal Processing is segmentation. Identifying time intervals where a signal is homogeneous is a strategy to uncover latent features of its source. The signal segmentation problem can be restated as a change-point detection task: delimiting a segment consists in fixing the timestamps where it starts and ends. This subject has been extensively investigated, leading to a vast literature and applications in many domains including computer science, finance, medicine, geology, meteorology, etc. The majority of the work done so far in signal segmentation focuses on temporal signals Basseville1993 ; Balzano2010 ; Chen2012 ; Tartakovsky2014 ; Aminikhanghahi2016 ; truong2020 .

In this work we study a different kind of object: graph signals appearing as a stream. In general terms, a graph signal is a function defined over the nodes of a given graph. Intuitively, the graph partially encodes the variability of the function: nodes that are connected tend to take similar values. This applies to real situations: for instance, contacts in social networks would share similar tastes; two neighboring sensors in a sensor network would provide similar measurements. Moreover, this behavior is not limited to the case where the graph is explicitly given. In some applications, the graph itself has to be inferred, and most algorithms are built over this local similarity property that corresponds to signal smoothness. This can be seen in graphical models or in networks used to approximate manifolds Perraudin2017 ; Friedman2007 ; LeBars2019arxiv ; Tenenbaum2000 .

There is certainly a plethora of change-point detectors; nevertheless, despite their many applications in different contexts, the development of detectors specifically designed for graph signals is still limited in the literature Balzano2010 ; Angelosante2011 ; Chen2018 ; LeBars2019arxiv . To the best of our knowledge, the existing methods do not yet take into account the interplay between the signal and the graph structure. The main contribution of this article is an offline change-point detector aiming to spot jumps in the mean of a stream of graph signals (SGS). Our algorithm leverages many of the techniques developed in Graph Signal Processing (GSP), a relatively new field aiming to generalize the tools commonly used in classical Signal Processing Shuman2013 ; Ortega2018 . More specifically, our algorithm depends on the concept of the Graph Fourier Transform (GFT) that, similarly to the usual Fourier Transform, induces a spectral domain and a sparse representation of the signal. The main idea behind our approach is to translate the problem from the vertex domain to the spectral domain, and to design a change-point detector operating in this space that accounts for the sparsity of the data and automatically infers the number of change-points. This is done by adding two penalization terms: one aiming to recover the sparsity and another penalizing models with a high number of change-points. The performance of the algorithm and the design of these penalization terms are based on the framework introduced in Birge2001 and the innovative perspective of the ℓ1-norm analyzed in Massart2011 .

The organization of the paper is as follows. In Sec. 2 we present basic definitions and tools that are used in the rest of the paper. In Sec. 3 we formulate the change-point detection problem in the context of graph signals and propose a Lasso-based change-point detection algorithm. In Sec. 4 we provide theoretical guarantees for the algorithms introduced in the previous section and, finally, in Sec. 5 we test our method in experiments on properly generated synthetic data.

## 2 Basic concepts and notations

In this section we introduce notations and key concepts. Let and denote the -th row and -th column of matrix , respectively. and stand for the transpose and the conjugate transpose (i.e. the transpose with complex-conjugated entries) of matrix . denotes the -th entry of vector , represents the observed vector at time , and stands for the GFT of , which is introduced in Definition 3. A graph is defined by an ordered tuple , where and stand for the vertex and edge sets respectively, and is the number of graph nodes.

###### Definition 1.

A graph signal is a tuple , where and is a function .

###### Definition 2.

A graph shift operator (GSO) associated with a graph is a matrix whose entry iff or , and it admits an eigenvector decomposition .

###### Definition 3.

For a given GSO associated with a graph , the Graph Fourier Transform (GFT) of a graph signal is defined as .

The frequencies of the GFT correspond to the elements of the diagonal matrix , that is . Moreover, the eigenvectors also provide an orthogonal basis for the graph signals defined over the graph . Finally, the GFT is the basic tool that allows us to translate operations from the vertex domain to the spectral domain Sandryhaila2013 .
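To make the transform concrete, here is a minimal sketch in Python; the toy graph, the helper names, and the use of the combinatorial Laplacian as GSO are our own choices for illustration:

```python
import numpy as np

def laplacian(adjacency):
    """Combinatorial Laplacian L = D - A of an undirected graph."""
    degrees = adjacency.sum(axis=1)
    return np.diag(degrees) - adjacency

def gft_basis(gso):
    """Eigendecomposition of a symmetric GSO; columns of U form the basis."""
    eigenvalues, eigenvectors = np.linalg.eigh(gso)
    return eigenvalues, eigenvectors

def gft(signal, eigenvectors):
    """Graph Fourier Transform: project the signal onto the eigenbasis."""
    return eigenvectors.T @ signal

def igft(coefficients, eigenvectors):
    """Inverse GFT: reconstruct the vertex-domain signal."""
    return eigenvectors @ coefficients

# Path graph on 4 nodes.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
freqs, U = gft_basis(laplacian(A))
x = np.array([1.0, 2.0, 3.0, 4.0])
assert np.allclose(igft(gft(x, U), U), x)  # the eigenbasis is orthonormal
```

Since the Laplacian is symmetric, the basis is orthonormal and the transform is exactly invertible, which is what makes the translation between the two domains lossless.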

Graph signal stationarity over the vertex domain is a property aiming to formalize the notion that the graph structure explains to a large degree the inter-dependencies observed in a graph signal. For the rest of the paper, we refer to stationarity with respect to a GSO . The definitions and the properties listed below can be found in Marques2017 ; Perraudin2017 .

###### Definition 4.

Stationarity with respect to the vertex domain: Given a normal GSO , a zero-mean graph signal with covariance matrix is stationary with respect to the vertex domain encoded by , iff and are simultaneously diagonalizable, i.e. . The vector is known as the graph power spectral density (PSD).

The following two properties are used in the derivation of our change-point detection algorithm, in the generation of the synthetic scenarios, and in the estimation of .

###### Property 1.

Let be a stationary graph signal with respect to ; then , which means that the GFT of will have a covariance matrix .

###### Property 2.

Let be a stationary graph signal with covariance matrix and PSD . The output of a graph filter , with a frequency response , applied to the graph signal is and has the following properties:

1. It is stationary on with covariance .

2. .
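Property 2 can be checked numerically. The sketch below is our own illustration (the cycle graph and the low-pass frequency response are arbitrary choices): it filters white noise through a graph filter and verifies that the GFT coefficients of the output are uncorrelated with variances given by the squared frequency response.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small graph: Laplacian of a 5-node cycle, used as GSO.
n = 5
A = np.roll(np.eye(n), 1, axis=1) + np.roll(np.eye(n), -1, axis=1)
L = np.diag(A.sum(axis=1)) - A
eigenvalues, U = np.linalg.eigh(L)

# Illustrative low-pass frequency response.
h = 1.0 / (1.0 + eigenvalues)

# Filter white noise: y = U diag(h) U^T w. By Property 2 the output is
# stationary with PSD h**2 (white noise has unit PSD at every frequency).
m = 200_000
w = rng.standard_normal((n, m))
y = U @ (h[:, None] * (U.T @ w))

# Empirical check: per-frequency variances of the GFT of y match h**2.
y_hat = U.T @ y
empirical_psd = y_hat.var(axis=1)
assert np.allclose(empirical_psd, h**2, rtol=0.05)
```

This is also exactly how the synthetic scenarios of Sec. 5 are generated: a chosen spectral profile applied to standard white noise.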

## 3 Change-point detection for a stream of graph signals

Problem formulation. Suppose we observe a multivariate time series , where and let its mean value be . We suppose that there is an ordered set of change-points, with and . The elements of define the following set of matrices:

$$\mathcal{F}_\tau = \left\{\mu \in \mathbb{R}^{T\times p} \;\middle|\; \mu_{\tau_{l-1}+1} = \ldots = \mu_{\tau_l}\right\}. \quad (1)$$

We additionally suppose that the elements of the time series are graph signals defined over the same graph , that is, a stream of graph signals (SGS). Our goal is to infer the set of change-points and the set of parameters .

We make the following hypotheses over the SGS:

1. The graph signals are i.i.d. with respect to the temporal domain.

2. The graph signals follow a multivariate normal distribution.

3. If , then is stationary with respect to the GSO . This derives from the stationarity of itself.

4. The graph signals admit a sparse representation with respect to the basis defined by the eigenvectors of . That is, there exists such that for all , where is the complement of the set , .

5. is a normal matrix with all its eigenvalues different, and remains constant throughout the observation time-horizon.

The problem is illustrated in Fig. 1 through an example where we can identify four different segments, i.e. four change-points. Figure 1: An example stream of graph signals (SGS) with four change-points in the mean (according to our problem formulation, we count the end of the sequence as a change-point). Successive segments have different colors. The color of the graph nodes represents the mean of the signal during the first segment of the observed signal. The signal observed at each node evolves through time as shown in the line plots next to the nodes. At some timestamps the mean of the graph signal exhibits a change in a subset of the nodes. The change-points signify changes in the spectral representation of the signals.

As we suppose that does not change over time, the stationarity of the graph signals with respect to the graph implies that the covariance matrix remains unchanged too. Then the average log-likelihood of the SGS can be written as:

$$\mathcal{L}(\mu,\tau) = -\sum_{l=1}^{D_\tau}\,\sum_{t=\tau_{l-1}+1}^{\tau_l}\,\sum_{i=1}^{p}\left[\frac{\big(\tilde{y}^{(i)}_t-\tilde{\mu}^{(i)}_{\tau_l}\big)^2}{2\,T\,P^{(i)}_y} + \frac{\log P^{(i)}_y}{2\,T}\right]. \quad (2)$$

This formulation can be seen as a way to translate the signal from the vertex domain to the spectral domain, where the sample becomes independent of the graph structure according to Property 1.

Penalized cost function for an SGS with sparse GFT representation. The log-likelihood of the SGS can be used to define the cost function to minimize in order to detect the change-points. Since many graph signals observed in real applications can be accurately approximated by a subset of Graph Fourier frequencies Perraudin2017 ; Marques2017 ; Huang2016 , it is necessary to further account for this feature in the means of each segment. This justifies adding an ℓ1 penalization term in the formulation of the problem. Furthermore, to address the issue that the number of change-points can also be unknown, we add a second penalization term .

The overall optimization problem for the change-point detection is written as:

$$(\hat{d},\hat{\tau}(\hat{d}),\hat{\tilde{\mu}}_{\hat{\tau}(\hat{d})}) = (\hat{d},\{\hat{\tau}_0,\hat{\tau}_1,\ldots,\hat{\tau}_{\hat{d}}\},\{\hat{\tilde{\mu}}_0,\hat{\tilde{\mu}}_1,\ldots,\hat{\tilde{\mu}}_{\hat{d}}\}) \quad (3)$$
$$:= \operatorname*{arg\,min}_{d\in\{1,\ldots,T\}}\ \operatorname*{arg\,min}_{\tau\in\mathcal{T}^T_d}\ \operatorname*{arg\,min}_{\tilde{\mu}_{\tau_1},\ldots,\tilde{\mu}_{\tau_d}} C_T(\tau,\tilde{\mu},\tilde{Y}) + \mathrm{pen}(d)$$
$$= \operatorname*{arg\,min}_{d\in\{1,\ldots,T\}}\ \operatorname*{arg\,min}_{\tau\in\mathcal{T}^T_d} \sum_{l=1}^{d}\left\{\min_{\tilde{\mu}_{\tau_l}}\left[\sum_{t=\tau_{l-1}+1}^{\tau_l}\sum_{i=1}^{p}\frac{\big(\tilde{y}^{(i)}_t-\tilde{\mu}^{(i)}_{\tau_l}\big)^2}{T\,P^{(i)}_y} + \lambda_l\,\frac{I_l\sum_{i=1}^{p}\big|\tilde{\mu}^{(i)}_{\tau_l}\big|}{T}\right]\right\} + \frac{d}{T}\left(c_1+c_2\log\frac{T}{d}\right),$$

where represents the penalized least-squares cost function, is the penalization constant leading to the desired sparsity of the GFT (which a priori is segment-specific), denotes the length of the -th segment, and is the set of all possible segmentations of the set of size .

Problem (3) requires estimating the GFT of the mean of the graph signals, which remains segment-wise constant. The separability of the cost function implies that this parameter depends only on the observations belonging to each of the segments delimited by the change-points. Moreover, this formulation leads to a closed-form solution for :

$$\hat{\tilde{\mu}}^{(i)}_{\tau_l} = \operatorname{sign}\!\left(\frac{\sum_{t=\tau_{l-1}+1}^{\tau_l}\tilde{y}^{(i)}_t}{I_l}\right)\max\!\left(\left|\frac{\sum_{t=\tau_{l-1}+1}^{\tau_l}\tilde{y}^{(i)}_t}{I_l}\right| - \frac{\lambda_l\,P^{(i)}_y}{2},\; 0\right). \quad (4)$$
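In code, this closed form is a per-frequency soft-thresholding of the empirical segment mean. A minimal sketch (function and variable names are ours):

```python
import numpy as np

def segment_mean_estimate(y_tilde_segment, psd, lam):
    """Closed form of Eq. (4): soft-threshold the empirical mean of the
    GFT coefficients within one segment.

    y_tilde_segment : (length, p) GFT coefficients of the segment
    psd             : (p,) power spectral density P_y
    lam             : l1 penalization constant for this segment
    """
    empirical_mean = y_tilde_segment.mean(axis=0)
    threshold = lam * psd / 2.0
    return np.sign(empirical_mean) * np.maximum(np.abs(empirical_mean) - threshold, 0.0)

# Coefficients below their threshold are zeroed; the rest shrink toward 0.
seg = np.array([[3.0, 0.2, -2.0],
                [5.0, -0.2, -2.0]])   # empirical mean: [4.0, 0.0, -2.0]
mu = segment_mean_estimate(seg, psd=np.array([1.0, 1.0, 1.0]), lam=2.0)
assert np.allclose(mu, [3.0, 0.0, -1.0])
```

Frequencies with a larger PSD get a larger threshold, so noisier frequencies are more aggressively set to zero.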

Thanks to this formulation, it is easy to see how we can find the precise change-points using dynamic programming. The final method can be found in Alg. 1.
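A sketch of the dynamic program behind such a detector follows. This is our own simplified illustration, not Alg. 1 itself: the per-segment constant `beta` stands in for the pen(d) term of Problem (3), the global 1/T scaling is dropped, and all constants are illustrative.

```python
import numpy as np

def segment_cost(y_tilde, start, end, psd, lam):
    """Penalized least-squares cost of segment [start, end) in the
    spectral domain, using the closed-form estimate of Eq. (4)."""
    seg = y_tilde[start:end]
    mean = seg.mean(axis=0)
    mu = np.sign(mean) * np.maximum(np.abs(mean) - lam * psd / 2.0, 0.0)
    return (((seg - mu) ** 2) / psd).sum() + lam * (end - start) * np.abs(mu).sum()

def detect_change_points(y_tilde, psd, lam, beta):
    """O(T^2) dynamic program: best[t] is the optimal penalized cost of
    segmenting the first t observations; beta penalizes each new segment."""
    T = len(y_tilde)
    best = np.full(T + 1, np.inf)
    best[0] = 0.0
    last_cut = np.zeros(T + 1, dtype=int)
    for t in range(1, T + 1):
        for s in range(t):
            cost = best[s] + segment_cost(y_tilde, s, t, psd, lam) + beta
            if cost < best[t]:
                best[t], last_cut[t] = cost, s
    cuts, t = [], T                      # backtrack the change-points
    while t > 0:
        cuts.append(t)
        t = last_cut[t]
    return sorted(cuts)[:-1]             # drop the trailing boundary T

# Toy stream: a single jump in the spectral mean at t = 30.
rng = np.random.default_rng(1)
p, psd = 3, np.ones(3)
y = np.vstack([rng.normal(0.0, 0.1, (30, p)),
               rng.normal(4.0, 0.1, (30, p))])
assert detect_change_points(y, psd, lam=0.1, beta=1.0) == [30]
```

The separability of the cost across segments is exactly what makes this dynamic program valid: the optimal segmentation of the first t observations reuses the optimal segmentation of every shorter prefix.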

Choosing the right constants , , . Even if Problem (3) is easy to solve, it requires setting the parameter , related to the sparsity of the graph signals, and a penalization term that would allow us to infer the number of change-points. This problem is not trivial since the number of possible solutions depends on the time-horizon and the number of nodes ; this feature hinders an asymptotic analysis. We require penalization terms that have good performance in practice and depend on and . Following the model selection approach, we can obtain an oracle-type inequality for the estimators and . Nevertheless, such an analysis only allows us to infer the shape of depending on unknown constants , and a lower bound for . These elements are not enough to use the method in practice, therefore we propose the alternative Alg. 2, which is further detailed in Sec. 4.

Both algorithms require knowledge of the PSD of the SGS. We can estimate it via a maximum likelihood approach on observations belonging to a segment where we know the graph signals share the same mean. However, it has been shown that the variance of the maximum likelihood estimator requires too many observations before achieving a good approximation. The estimator proposed by Perraudin2017 requires a smaller number of samples and its computation scales with the number of connections in the graph (sparse in most applications). The idea of the estimator is based on Property 2: once the vertex domain of a stationary graph signal is known, it is possible to use different filters to focus on different regions of the graph, and then use this information to reconstruct the PSD.
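For intuition, on a single segment with constant mean the PSD can also be estimated naively in the spectral domain; this is essentially the maximum-likelihood baseline mentioned above, not the filter-bank estimator of Perraudin2017. The sketch and names below are ours:

```python
import numpy as np

def naive_psd_estimate(y, U):
    """Naive PSD estimate on one segment with constant mean: center the
    observations, take their GFT, and use the per-frequency sample
    variance (by Property 1 the GFT coefficients are uncorrelated)."""
    y_hat = U.T @ (y - y.mean(axis=1, keepdims=True))
    return (y_hat ** 2).mean(axis=1)

# Simulate a stationary signal with known PSD and check the estimate.
rng = np.random.default_rng(3)
n, m = 6, 100_000
A = np.roll(np.eye(n), 1, axis=1) + np.roll(np.eye(n), -1, axis=1)  # 6-cycle
L = np.diag(A.sum(axis=1)) - A
eigenvalues, U = np.linalg.eigh(L)
true_psd = np.exp(-eigenvalues)              # arbitrary smooth PSD
w = rng.standard_normal((n, m))
y = U @ (np.sqrt(true_psd)[:, None] * (U.T @ w))
assert np.allclose(naive_psd_estimate(y, U), true_psd, rtol=0.05)
```

As the text notes, the drawback of this simple estimator is its sample hunger: the large `m` used here is what makes the 5% tolerance attainable.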

## 4 Model selection approach

The problem of detecting a change in the mean of an SGS can be written as a generalized linear Gaussian model, after preprocessing the data and under the hypothesis of normality. With regards to the preprocessing, we detect the change-points over the GFT of the SGS, that is instead of , and we suppose that has been standardized such that the variance of all the GFT coefficients is . Under the aforementioned conditions, we define as follows an isonormal process , where is a matrix whose rows follow a centered multivariate Gaussian distribution with covariance matrix . The generalized Gaussian process related to the SGS can be written as:

$$\hat{Y}_\epsilon(\tilde{\mu}) = \frac{\operatorname{tr}(\tilde{\mu}^{*\top}\tilde{\mu})}{T} + \epsilon\,W(\tilde{\mu}). \quad (6)$$

This formulation enables us to use techniques from the model selection literature Massart2003 in order to design the penalization term related to the number of change-points and to derive oracle-type inequalities for the performance of the proposed estimators described in Alg. 1 and Alg. 2.

Theorem 1 is an oracle inequality that provides insights on how Alg. 2 behaves with respect to the time-horizon and the size of the network . Furthermore, it gives us a guideline for choosing and the number of change-points in order to minimize the penalized mean-squared criterion. This is one of the differences of our work from the change-point detection algorithms analyzed in Lebarbier2003 ; Arlot2019 , which are also based on model selection but focus on the mean-squared criterion.

###### Theorem 1.

Assume that:

$$\lambda_l = \lambda \ge 3\sqrt{2}\,\epsilon\,\sqrt{\frac{\log p + L}{T}} \quad\text{and}\quad \mathrm{pen}(D_\tau) = \frac{D_\tau}{T}\left(c_1 + c_2\log\frac{T}{D_\tau}\right), \quad (7)$$

where , and is such that . Then, there exists an absolute constant such that

$$\mathbb{E}\left[\frac{\big\|\hat{\tilde{\mu}}_{\hat{\tau}}-\tilde{\mu}^*\big\|_F^2}{T} + \lambda\big\|\hat{\tilde{\mu}}_{\hat{\tau}}\big\|_{[\hat{\tau}]} + \mathrm{pen}(\hat{d})\right] \le C(K)\left[\inf_{\tau\in\mathcal{T}}\left(\inf_{\substack{\tilde{\mu}\in\mathcal{F}_\tau\\ \|\tilde{\mu}\|_{[\tau]}<+\infty}}\frac{\|\tilde{\mu}-\tilde{\mu}^*\|_F^2}{T} + \lambda\|\tilde{\mu}\|_{[\tau]} + \mathrm{pen}(D_\tau)\right) + 2\lambda\epsilon + \left(1+\frac{1}{(e^\gamma-1)(e-1)}\right)\epsilon^2\right], \quad (8)$$

where , is the set of all possible segmentations of the SGS of length , and is a given constant.

The proof, which can be found in the supplementary material, follows arguments similar to Massart2011 . Specifically, it first requires defining the set of models of our interest. In this case, the list of candidate models is indexed by the possible segmentations and by ℓ1-balls of radius , where .

The following lemma is a direct consequence of Corollary 4.3 in Giraud2015 .

###### Lemma 1.

For any , the solution to Problem (3) with tuning parameter

$$\lambda = 3\epsilon\sqrt{2(\log p + L)}, \quad (9)$$

fulfills, with probability at least , the risk bound:

$$\frac{\big\|\hat{\tilde{\mu}}_{\hat{\tau}}-\tilde{\mu}^*\big\|_F^2}{T} \le \sum_{l=1}^{D_\tau}\sum_{t=\tau_{l-1}+1}^{\tau_l}\ \inf_{\tilde{\mu}\neq 0,\ \tilde{\mu}\in\mathbb{R}^p}\left[\frac{\|\tilde{\mu}-\tilde{\mu}^*_t\|_2^2}{T} + \frac{18\,\epsilon^2(L+\log p)}{T\,\Phi(\tilde{\mu})^2}\,\|\tilde{\mu}\|_0\right], \quad (10)$$

where is known in the literature as the compatibility constant.

Both results, Theorem 1 and Lemma 1, provide details on the performance of the algorithm, when applied in practice, with respect to . Theorem 1 concludes that the value of should be the same for all the segments, while Lemma 1 relates the value of to the sparsity of the signal. We can see that there is a trade-off between the performance of the estimator and its ability to recover the sparsity of the signal: on the one hand, we need a low value of in order to reduce the bias of the estimator (see Ineq. 8), while on the other hand we need a higher value of that allows us to recover the sparsity of the signal with higher probability (see Ineq. 10).

Theorem 1 provides lower bounds for the values of and . Nevertheless, in practice, when fixing and at these values, the obtained results were not satisfactory. Finding the right constants in model selection is a common difficult problem Arlot2019_b . In some cases, it is possible to use a technique called the slope heuristic, which recovers the constants using a linear regression of the empirical risk against the elements of a penalization term. However, the curve defined by the cost function including the term does not tend to remain constant as the number of change-points increases, a feature that the slope heuristic relies on.

For that reason, we propose the alternative Alg. 2. The idea is to replace the penalization term by a Variable Selection penalization term. For each element of a given set of penalization parameters , we solve a Lasso problem over the whole stream of graph signals. This allows us to keep all the relevant frequencies. Then, we solve multiple change-point detection problems for different levels of sparsity. We can deduce the right level of sparsity, as well as two constants and related to the number of change-points, via the slope heuristic. This last statement is validated by Theorem 2 and the experiments in Sec. 5.

###### Theorem 2.

Let us denote by the space generated by specific elements of the standard basis, and let us define the set as:

$$\mathcal{S}(D_m,\tau) := \{\tilde{\mu}\in\mathcal{F}_\tau \mid \tilde{\mu}_{\tau_l}\in S_{D_m},\ \forall l\in\{0,\ldots,D_\tau\}\}. \quad (11)$$

Let and be solutions to the optimization problem (5). Then, there exist constants , , such that, if the penalty is defined for all , where , as:

$$\mathrm{pen}(m,\tau) = K_1\,\frac{D_m}{T} + \frac{D_\tau}{T}\left(K_2 + K_3\log\frac{T}{D_\tau}\right), \quad (12)$$

then there exists a positive constant such that:

$$\mathbb{E}\left[\frac{\big\|\hat{\tilde{\mu}}_{\hat{\tau}}-\tilde{\mu}^*\big\|_F^2}{T}\right] \le C(K)\left[\inf_{(m,\tau)\in\mathcal{M}}\left(\inf_{\tilde{\mu}\in\mathcal{S}(D_m,\tau)}\frac{\|\tilde{\mu}-\tilde{\mu}^*\|_F^2}{T} + \mathrm{pen}(m,\tau)\right) + \left(1+\frac{1}{(e^\gamma-1)(e-1)}\right)\epsilon^2\right]. \quad (13)$$

Theorem 2 is proved in the supplementary material; it is a consequence of Theorem 4.18 of Massart2003 .

It is important to mention that the results stated in both theorems do not apply only to detecting change-points in an SGS, but to any case where the problem can be restated as in Eq. 6.

## 5 Numerical experiments

As mentioned earlier, a key hypothesis in our approach is the stationarity of the graph signals. An alternative definition of graph stationarity says that if we apply a graph filter with a frequency response to white noise following a standard normal distribution, we get a stationary signal. This definition and the one given in Definition 4 are equivalent when the GSO is normal and all its eigenvalues are different Perraudin2017 . In this case we use the Laplacian of the graph as GSO. The distance between two adjacent change-points is generated as observations of an exponential distribution with expectation , to which we add to guarantee a minimum distance of time stamps. We generate the SGS over a graph of nodes. We generate different random instances of each scenario. The particularities of each scenario are described below:

Scenario I: We generate Erdős–Rényi (ER) graphs with a fixed link creation probability and the frequency response of the filter defined as . We generate change-points via a Poisson distribution with expectation 5. Before the first change-point, the mean of the graph signals is a linear combination of the first eigenvectors of the Laplacian matrix; random coefficients of the Graph Fourier transform are changed after each of the change-points, and the mean is then this new linear combination of eigenvectors. In all cases the coefficients of the linear combinations were generated uniformly at random in the interval .

Scenario II: The graph structure is generated by a Barabási–Albert (BA) model in which each incoming node is connected to nodes, . The spectral profile of the filter is proportional to the density function of a Gamma distribution, . Then, change-points are generated. Before the first change-point, the mean of the graph signals is a linear combination of the first eigenvectors of the Laplacian matrix; after the first change, the node with the highest degree and all its neighbors change their mean; after the second change-point, the nodes with the highest degrees modify their mean; after the third change-point, nodes of the graph chosen at random get their mean changed. In all cases, the mean is generated uniformly at random in the interval .
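The placement of change-points used in both scenarios (exponential inter-arrival gaps plus a minimum distance) can be sketched as follows; since the expectation and the minimum gap are elided in the text above, both appear here as hypothetical parameters:

```python
import numpy as np

def generate_change_points(n_changes, mean_gap, min_gap, rng):
    """Change-point locations: exponential inter-arrival gaps shifted by
    a minimum distance. mean_gap and min_gap are illustrative parameters;
    the paper's actual values are not specified in this excerpt."""
    gaps = min_gap + rng.exponential(scale=mean_gap, size=n_changes)
    return np.cumsum(np.round(gaps).astype(int))

rng = np.random.default_rng(7)
cps = generate_change_points(4, mean_gap=50, min_gap=10, rng=rng)
assert len(cps) == 4
assert np.all(np.diff(np.concatenate(([0], cps))) >= 10)  # min gap holds
```

The additive shift guarantees every segment is long enough for the segment-wise mean estimate of Eq. (4) to be meaningful.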

In this section, we analyze the performance of our algorithm. We also analyze the differences in performance with the kernel-based detector introduced in Harchaoui2008 , which also uses model selection and the slope heuristic to identify the number of change-points Arlot2019 . As we are interested in detecting changes in the mean, we show the results obtained by using the linear kernel , the Laplacian-based kernel , and the Gaussian kernel , where is chosen according to the median heuristic. The detectors built using these kernels are referred to as Linear, Laplacian and Gaussian. Alg. 2 is referred to as Variable Selection when we use the real values of , and as Approx. Variable Selection when we approximate it.

In order to estimate the PSD of the signal, a parameter required by our proposed algorithms, we follow the technique described in Perraudin2017 . We use graph Gaussian filters over the observed signal and over white noise. In the synthetic scenarios, we use the first observations of the SGS, that is of the number of nodes.

We implement the slope heuristic described in Arlot2019 to recover the parameters , , and : that is, we make a robust linear regression of the cost functions of the list of models with high complexity against the penalization terms, and then we multiply the linear coefficients by . By higher complexity we refer to the models whose inferred number of change-points is bigger than .
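A simplified sketch of this calibration step follows. It is our own illustration: ordinary least squares stands in for the robust regression, and we assume the classical rescaling factor of 2 since the exact factor is elided in the text.

```python
import numpy as np

def slope_heuristic(empirical_costs, penalty_terms, complexity,
                    min_complexity, factor=2.0):
    """Slope heuristic sketch: regress the empirical cost of the
    high-complexity models on the penalty shape, then rescale the slopes.

    empirical_costs : (n_models,) cost of each candidate segmentation
    penalty_terms   : (n_models, k) penalty shape, e.g. columns
                      [D_tau/T, (D_tau/T) * log(T/D_tau)]
    complexity      : (n_models,) number of change-points per model
    factor          : rescaling of the minimal penalty (2 is the classical
                      choice; an assumption here)
    """
    mask = complexity > min_complexity
    X = np.column_stack([np.ones(mask.sum()), penalty_terms[mask]])
    coef, *_ = np.linalg.lstsq(X, empirical_costs[mask], rcond=None)
    return -factor * coef[1:]   # drop intercept; flip the negative slopes

# Toy check: a cost that decreases linearly in the penalty features is
# recovered (up to the factor) by the regression.
D = np.arange(1, 21, dtype=float)
T = 100.0
features = np.column_stack([D / T, (D / T) * np.log(T / D)])
costs = 5.0 - 1.5 * features[:, 0] - 0.5 * features[:, 1]
assert np.allclose(slope_heuristic(costs, features, D, min_complexity=5),
                   [3.0, 1.0])
```

Restricting the regression to high-complexity models is essential: only there does the empirical cost decrease linearly in the penalty shape, which is the regime the heuristic exploits.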

Results. In both considered scenarios our method performs very well and is not affected by the estimation of the PSD: the distance with respect to the real change-points (Hausdorff distance) is small given the minimum gap between change-points. Almost all the points are correctly classified as to whether they are change-points or not (Rand Index close to ). All the change-points are recovered (Recall equals ). However, the method tends to slightly overestimate the number of change-points (Precision around ). These spurious change-points could be easily filtered out as they define segments of very small length, as clearly indicated by the Hausdorff distance.

For the kernel-based detectors, we estimate the equivalent of the parameters and via the slope heuristic and obtain a slightly better performance. Nevertheless, this method does not allow us to extract any information about the mean of the signal in each of the segments, let alone its sparse GFT representation.

## 6 Conclusion

In this work we presented an offline change-point detection approach for shifts in the mean of a stream of graph signals that automatically infers the number of change-points and the level of sparsity of the signal in its Graph Fourier representation. The formulation has the advantage of being easy to solve via dynamic programming, and it comes with interesting theoretical guarantees such as an oracle-type inequality. The performance of our algorithm is comparable to that of state-of-the-art kernel-based methods for changes in the mean of a multivariate signal, with the advantage that we can also recover the coefficients of the Graph Fourier Transform, which can be used to interpret the change. The techniques and results of this paper can be generalized to similar situations where we aim to spot change-points in a stream of multivariate signals that admits a sparse representation in a given basis. Proving the consistency of the detected change-points is among our plans for future work.

## References

• (1) Arlot, S.: Rejoinder on: Minimal penalties and the slope heuristics: a survey. J. de la Societe Française de Statistique 160(3), 158–168 (2019)
• (2) Aminikhanghahi, S., Cook, D.: A survey of methods for time series change point detection. Knowledge and Information Systems 51, 339–367 (2016)
• (3) Angelosante, D., Giannakis, G.: Sparse graphical modeling of piecewise-stationary time series. In: IEEE Int. Conf. on Acoustics, Speech and Signal Processing. pp. 1960–1963 (2011)
• (4) Arlot, S., Celisse, A., Harchaoui, Z.: A kernel multiple change-point algorithm via model selection. J. of Machine Learning Research 20(162), 1–56 (2019)
• (5) Balzano, L., Recht, B., Nowak, R.: High-dimensional matched subspace detection when data are missing. In: IEEE Int. Symp. on Information Theory. pp. 1638–1642 (2010)
• (6) Basseville, M., Nikiforov, I.: Detection of abrupt changes: Theory and application. Prentice-Hall, Inc. (1993)
• (7) Birgé, L., Massart, P.: Gaussian model selection. J. European Mathematical Society 3, 203–268 (08 2001)
• (8) Chen, J., Gupta, A.: Parametric Statistical Change Point Analysis: With Applications to Genetics, Medicine, and Finance. Birkhäuser Basel, 2nd edn. (2012)
• (9) Chen, Y., Mao, X., Ling, D., Gu, Y.: Change-point detection of gaussian graph signals with partial information. In: IEEE Int. Conf. on Acoustics, Speech and Signal Processing. pp. 3934–3938 (2018)
• (10) Friedman, J., Hastie, T., Tibshirani, R.: Sparse inverse covariance estimation with the graphical Lasso. Biostatistics 9(3), 432–441 (2007)
• (11) Giraud, C.: Introduction to high-dimensional statistics. Monographs on Statistics and Applied Probability 139, CRC Press (2015)
• (12) Harchaoui, Z., Moulines, E., Bach, F.R.: Kernel change-point analysis. In: Koller, D., Schuurmans, D., Bengio, Y., Bottou, L. (eds.) Advances in Neural Information Processing Systems, pp. 609–616. Curran Associates, Inc. (2009)
• (13) Huang, W., Goldsberry, L., Wymbs, N.F., Grafton, S.T., Bassett, D.S., Ribeiro, A.: Graph frequency analysis of brain signals. IEEE Journal of Selected Topics in Signal Processing 10(7), 1189–1203 (Oct 2016). https://doi.org/10.1109/jstsp.2016.2600859, https://doi.org/10.1109/jstsp.2016.2600859
• (14) Le Bars, B., Humbert, P., Kalogeratos, A., Vayatis, N.: Detecting multiple change-points in the time-varying Ising model. arXiv e-prints arXiv:1910.08512 (2019)
• (15) Lebarbier, E.: Detecting multiple change-points in the mean of gaussian process by model selection. Tech. Rep. RR-4740, INRIA (2003)
• (16) Marques, A.G., Segarra, S., Leus, G., Ribeiro, A.: Stationary graph processes and spectral estimation. IEEE Trans. on Signal Processing 65(22), 5911–5926 (2017)
• (17) Massart, P., Picard, J.: Concentration Inequalities and Model Selection: Ecole d’Eté de Probabilités de Saint-Flour XXXIII - 2003. Springer Berlin Heidelberg (2003)
• (18) Massart, P., Meynet, C.: The Lasso as an l1-ball model selection procedure. Electr. J. of Statistics 5 (2011)
• (19) Ortega, A., Frossard, P., Kovačević, J., Moura, J., Vandergheynst, P.: Graph signal processing: Overview, challenges, and applications. Proceedings of the IEEE 106(5), 808–828 (2018)
• (20) Perraudin, N., Vandergheynst, P.: Stationary signal processing on graphs. IEEE Trans. on Signal Processing 65(13), 3462–3477 (2017)
• (21) Sandryhaila, A., Moura, J.: Discrete signal processing on graphs. IEEE Trans. on Signal Processing 61(7), 1644–1656 (2013)
• (22) Shuman, D., Narang, S., Frossard, P., Ortega, A., Vandergheynst, P.: The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Processing Magazine 30(3), 83–98 (2013)
• (23) Tartakovsky, A., Nikiforov, I., Basseville, M.: Sequential Analysis: Hypothesis Testing and Changepoint Detection. Chapman & Hall/CRC Monographs on Statistics & Applied Probability, Taylor & Francis, CRC Press (2014)
• (24) Tenenbaum, J.B.: A global geometric framework for nonlinear dimensionality reduction. Science 290(5500), 2319–2323 (2000)
• (25) Truong, C., Oudre, L., Vayatis, N.: Selective review of offline change point detection methods. Signal Processing 167, 107299 (2020)

## Appendix A Proof of Theorems 1 and 2

In this appendix, we present the proofs of Theorem 1 and Theorem 2. For the sake of completeness, we introduce basic concepts of the model selection literature and restate some results that are a key component to prove the oracle inequalities presented in this work.

The model selection framework offers an answer to the question: how to choose the function and the parameter so that we recover the right number of change-points and the sparsity of the signal in its Graph Fourier representation at the same time.

###### Definition 5.

Given a separable Hilbert space , a generalized linear Gaussian model is defined as:
