# Change Point Methods on a Sequence of Graphs

The present paper considers a finite sequence of graphs, e.g., coming from technological, biological, and social networks, each of which is modelled as a realization of a graph-valued random variable, and proposes a methodology to identify possible changes in stationarity in its generating stochastic process. In order to cover a large class of applications, we consider a general family of attributed graphs, chatacterized by a possible variable topology (edges and vertices) also in the stationary case. A Change Point Method (CPM) approach is proposed, that (i) maps graphs into a vector domain; (ii) applies a suitable statistical test; (iii) detects the change --if any-- according to a confidence level and provides an estimate for its time of occurrence. Two specific CPMs are proposed: one detecting shifts in the distribution mean, the other addressing generic changes affecting the distribution. We ground our proposal with theoretical results showing how to relate the inference attained in the numerical vector space to the graph domain, and vice versa. Finally, simulations on epileptic-seizure detection problems are conducted on real-world data providing evidence for the CPMs effectiveness.

• 9 publications
• 31 publications
• 36 publications
06/21/2017

### Concept Drift and Anomaly Detection in Graph Streams

Graph representations offer powerful and intuitive ways to describe data...
02/11/2021

### Optimality of Graph Scanning Statistic for Online Community Detection

Sequential change-point detection for graphs is a fundamental problem fo...
04/25/2018

### Quickest Search for a Change Point

This paper considers a sequence of random variables generated according ...
03/05/2014

### Detecting change points in the large-scale structure of evolving networks

Interactions among people or objects are often dynamic in nature and can...
06/04/2021

### Principled change point detection via representation learning

Change points are abrupt alterations in the distribution of sequential d...
03/25/2016

### Exact Bayesian inference for off-line change-point detection in tree-structured graphical models

We consider the problem of change-point detection in multivariate time-s...
02/10/2020

### gfpop: an R Package for Univariate Graph-Constrained Change-point Detection

In a world with data that change rapidly and abruptly, it is important t...

## I Introduction

A graph representation for data is appropriate in several fields, including physics, chemistry, neuroscience, and sociology [newman2010networks], where the phenomena under investigations can be observed as a sequence of measurements whose pairwise relationships are relevant too and thus included in the data representation [li2017fundamental]. In these application scenarios, the identification of a possible change in the system behavior, a situation associated with anomalies or events to be detected in the sequence, is of particular interest; examples of applications that can be cast in this framework are functional brain networks [KHAMBHATI20161170] and power grids [powerlosses1]. Further relevant applications cover data acquired from cyber-physical systems and the Internet of Things [alippi2017not].

In all above applications the topology and the number of vertexes at different time steps may vary, and attributes can be associated with both vertexes and edges; moreover, attributes are not limited to numeric ones and may include categorical data, strings and even a mix of multiple types. In order to cover all these scenarios, we formalize graphs as objects of a graph domain belonging to the family of graph alignment spaces (GASs) [jain2016geometry]. GASs provide a metric structure which is also capable to deal with isomorphic graphs, i.e., they account for the case where a one-to-one correspondence among vertexes of different graphs is unavailable or missing.

An illustrative example of graph sequence is provided by Figure 1, where the graphs resemble the ‘A’ character in the first part of the sequence, until a certain time step after which the graph changes to a new one, resembling the ‘E’ character.

### I-a Problem formulation

Consider an unknown stochastic process that generates a finite sequence of attributed graphs, , , where is a GAS.

Each graph of the sequence is interpreted as the realization of a graph-valued random variable. Under the stationarity hypothesis for process , the graphs , are independent and identically distributed111 Two graphs are identically distributed if for any set ; they are independent if , for any pair . (i.i.d.) according to an unknown distribution [zambon2017concept]. Even when the distribution is stationary, the graphs of the sequence are allowed to vary (in terms of both structure and vertex/edge attributes) from a time step to another; this is, in fact, the random component associated with , which can be seen as the generator of the “normal operating conditions” of the monitored system. Conversely, we say that process undergoes a change in stationarity if it exists a time such that

 {gt∼Q0t

with being a graph distribution different from ; time is said to be the change point. The type of change is abrupt when commutes from to in a single time-step.

As finite sequence is given, to address the detection of a single change in stationarity, we propose to adopt a Change Point Method (CPM) [cabrieto2018testing, hawkins2003changepoint, fan2015identifying, harchaoui2013kernel], which relies on a series of two-sample statistical tests

 [s,pval]=Test(g(1,t−1),g(t,T))

applied to the pairs of subsets and , , and returns a statistic together with an associated -value . If at least a test yields a -value lower than a significance level , then a change in stationarity is detected in the sequence and the estimated change point is the one with the lowest -value. It is important to notice that the CPM framework can be implemented, in principle, by using any two-sample statistical test designed to assess differences between distribution functions.

Not rarely, the driving process undergoes multiple abrupt changes in the same finite sequence. In these cases, the proposed methodology can be extended by following the E-divisive technique [matteson2014nonparametric]. Whenever the stationarity hypothesis is met for each sub-sequence in between the change points, the methodology addresses the estimation of both the number of change points and their location in the sequence.

### I-B Contribution and paper organization

The novelty of our contribution can be summarized as follows:

• A methodology to perform change-point analysis on a sequence of attributed graphs by relying on graph embedding. We propose to map each observed graph belonging to a GAS onto a -dimensional point, , in some vector space where multivariate nonparametric222Throughout the present paper, the term nonparametric test indicate a test which does not assume a predefined model distribution.statistical hypothesis testing can be applied.

• Theoretical developments that allow to control the confidence level of the inference in the graph domain and the vector embedding space. In fact, in Proposition 1 we prove that the statistical confidence of the test attained in the embedding space is related to the confidence level that a change has taken place in the graph domain. Our theoretical results show how this relation, in principle, holds true for any graph embedding method.

• We propose two different CPM tests for graphs. The first one addresses the common type of change that involves a shift in the mean of the graph distribution; this nontrivially extends the CDT in [zambon2017concept] to the off-line case, and significantly improves the theoretical framework delineated there by removing the strong bi-Lipschitz assumption for the embedding map. The second test aims at identifying any kind of change in distribution (Proposition 3

) by relying on the energy distance between probability distributions introduced by Szekely

et al. [szekely2013energy], which also ensures the consistency of the test.

• We propose an extension of the E-divisive approach [matteson2014nonparametric] in order to identify multiple change points in a graph sequence; the approach is able to estimate both the number of change points and their position in the sequence.

The remainder of the paper is organized as follows. Section II describes related works. Preliminary definitions and assumptions are discussed in Section III. Section IV introduces the proposed methodology for performing CPMs on graph sequences. Section V presents the theoretical results related to the proposed CPMs. Section VI shows how to extend the proposed methodology for the identification of multiple changes; related theoretical results are presented in the same section. To demonstrate the practical usefulness of what proposed, in Section VII we perform simulations on both synthetic data and several real-world data sets of graphs. In particular, we take into account also a relevant real application scenario involving the detection of the onset of epileptic seizures in functional brain connectivity networks. Section VIII concludes the paper and provides pointers to future research. Finally, proofs of all theoretical results are provided in Appendix A.

## Ii Related works

In the literature, CPMs have been initially applied to scalar, normally distributed sequences to monitor shifts in the mean

[hawkins2003changepoint]

or variance

[hawkins2005change]. Extensions have been introduced for nonparametric inference [hawkins2010nonparametric, ross2012two], multivariate data [zamba2006multivariate, doi:10.1080/02664763.2013.800471, shi2017consistent], kernel-based inference [li2015m, NIPS2008_3556, NIPS2012_4727]. The design of CPMs for sequences of graphs, instead, is still a significantly underdeveloped research area. The problem of detecting multiple changes, instead, is more challenging [jandhyala2013inference]. One reason lies in the fact that the number of change points is often unknown; furthermore, the identification of one change point may rely on the the identification of the others. To address this problem, it is possible to optimize an objective function for identifying the location of the change and consider a penalty term that takes into account the number of change points [7938741]. A second direction in the literature aims at tackling the problem in a incremental way, by recursively splitting the original sequence in two parts [matteson2014nonparametric].

Considering graph sequences, we report the recent contribution in [barnett2016change], where the authors provide a method to monitor functional magnetic resonance recordings to identify changes. The recordings are modeled by correlation networks and a CPM is applied to detect changes in stationarity. The technique there proposed, however, is designed for graphs of fixed size with numerical weights associated with edges. Few other works address changes in sequences of graphs/networks [peel2015detecting, wilson2016modeling], but none of them operates on the very generic family of attributed graphs considered here.

Some of the already proposed CPMs can be applied to more general input spaces and hence are constrained to operate in vector spaces. For example, [matteson2014nonparametric, li2015m] rely on a distance or a kernel function, which makes them in principle applicable to graphs. However, the theoretical results related to testing for any distribution change in the graph sequence applies when the graph distance is metric of strong negative type [lyons2013distance] and the graph kernel is universal [smola2007hilbert].

Finally, we note that another relevant type of statistical test is known as Change Detection Test (CDT) [basseville1993detection]. A CDT acts differently from a CPM as it is designed for sequential monitoring.

## Iii Background and assumptions

### Iii-a Attributed graphs and the graph alignment space

An attributed graph is defined as a triple , where is a set of vertexes, is a set of edges and is a labeling function that associates to each vertex and edge an attribute taken from a predefined set . We denote with , or simply , the set of all graphs with attributes in and with at most vertexes. Notice that can be taken arbitrarily large, but needs to be finite. Simple examples of attribute sets are as in the case of correlation graphs and in the case of transport networks. Set can be more complex and include vectors, categorical data and strings. Moreover, even combination of multiple types are possible, like in the case of chemical compounds where, e.g., can contain all chemical elements and possible numbers of valence electrons involved in a bond.

In general, sets and are not vector spaces. Hence, operations between graphs, such as computing distances between pairs of graphs , are not trivial [gm_survey, emmert2016fifty]. In this work, we consider a graph alignment metric (GAM), [jain2016geometry] to evaluate the distance between two graphs. A GAM is defined by means of an attribute kernel , which assesses the similarity between attributes as an inner product in an implicit Hilbert space333Onto which the kernel trick is applied. , and a partial function444While a function associates an element to every element , a partial function is not necessarily defined for every element , hence is a (proper) function only from a subset to set . called alignment, which associates the vertexes of with vertexes of . A GAM is hence defined as:

 δ(gi,gj):=[κπ∗(gi,gi)+κπ∗(gj,gj)−2κπ∗(gi,gj)]1/2,

where

 κπ(gi,gj)=∑v,v′∈Dom(π)ka(a((v,v′)),a((π(v),π(v′)))),

indicates the domain of , and

 π∗=argmaxπ∈{Vi→Vj}kπ(gi,gj),

is the optimal alignment. In order to simplify the notation, we assume no self loops, , and that for any vertex . The set equipped with is called a graph alignment space (GAS). Under the mild assumptions that the attribute kernel is positive semidefinite, for any and for some , can be shown to be a metric space [jain2016geometry, Theo. 4.7,5.2]. This is a sufficient condition and corresponds to Assumption (A1) made in Section III-C.

We note that several common graph spaces are GASs, e.g., the spaces of weighted graphs equipped with Frobenius norm as distance, and the numeric vector-attributed graph spaces whose distance is based on Euclidean attribute kernels.

### Iii-B Graph mean and variation

In metric –not vector– spaces the notions of mean and variance have to be adapted, due to the possibly ill-defined operation of summation. This issue can be addressed by considering the formulation given by Fréchet [frechet1948elements], and also adopted in [jiang2001median, jain2016statistical]. Given a metric GAS , consider a graph-valued random variable taking values in and distributed according to probability function . Assume then a sample , with , being independent realizations of .

The concepts of graph mean and graph variation (extension of the variance concept) are formalized by the Fréchet function defined for any as

 FQ(g0)=∫Gδ(g0,g)2dQ(g)=Eg∼Q[δ(g,g0)2].

Function is positive and, if is finite at some graph , then it is finite for any other graph. Function is sometimes termed Fréchet population function to be distinguished by its empirical counterpart: the Fréchet sample function , which is computed over sample ,

 Fg(1,n)(g0)=1nn∑i=1δ(g0,gi)2.

The Fréchet sample variation is defined as the infimum

 Vf[g(1,n)]=infg0∈GFg(1,n)(g0).

Such infimum is attained in a finite set of graphs [jain2016statistical, Prop. 3.2], called sample Fréchet mean graphs. Accordingly, the Fréchet (population) variation is defined as

 Vf[Q]=infg0∈GFQ(g0).

A graph attaining the infimum is called Fréchet mean graph, and exists whenever is a complete metric space [bhattacharya2012nonparametric, Theorem 3.3]; accordingly, denotes the set of all Fréchet mean graphs. Notice that, whenever a Euclidean space is taken into account, the infimum of the Fréchet population and sample functions corresponds to the classical expected value and the arithmetic mean , respectively, where is a probability function on and are drawn i.i.d. from .

Verifying the uniqueness of the Fréchet mean graph in both population and sample cases is more involving; here we limit to a brief discussion and refer to [jain2016statistical] for details. A sufficient condition requires that the support of is bounded in a ball [Assumption (A2), Sec. III-C]. Such a ball has to be centered on a graph , and has a radius proportional to the degree of asymmetry, , defined as

 χ(g):=√2[kid(g,g)−minπ∈{V→V}kπ(g,g)]1/2,

with being the identity alignment so that , . Graphs with a non-null degree of asymmetry –namely, asymmetric or ordinary graphs– are spread over the entire graph space [jain2016statistical, Cor. 4.19], and their degree depends on the particular location. Here, we consider a ball, however, this can be extended to a cone surrounding that ball [jain2016statistical]. As final remark, we comment that each graph in can be represented by means of a matrix and , relying on the kernel embedding. It then follows that the Fréchet mean exists unique in .

In the rest of the paper, when the mean is assumed to exist and be unique, with little abuse of notation we denote as a graph instead of a singleton set.

### Iii-C Assumptions

The current section presents and comments on the assumptions we make throughout the paper. As mentioned in the Introduction, here we consider graphs belonging to a graph alignment space . In order to ensure that has a metric structure, we assume that:

• GAM is built on an positive semi-definite attribute kernel for any and for some .

Assumption (A1) is mild, yet grants space to be metric, as mentioned in Section III-B. The second assumption we make concerns the probability distribution over the graph domain. In particular, we bound the support of the distributions as follows:

• The Fréchet variation is finite, , and there exists a sufficiently asymmetric graph so that is bounded by a ball centered in and with radius proportional to , as requested in [jain2016statistical].

Assumption (A2) grants that the Fréchet mean exists unique (see Section III-B), hence making the mathematics more amenable. At the same time, this hypothesis enables Theorem 4.23 in [jain2016geometry] and, accordingly, the ball of graphs is proven isometric to an Euclidean space. We comment that, as mentioned in Section III-B, a given GAS is entirely covered by such balls and, moreover, this assumption can be relaxed.

## Iv CPMs on a graph sequence

Under the stationarity hypothesis for process , sequence is composed of i.i.d. graphs distributed according to the stationary probability function . Following (1), the statistical hypothesis test for detecting a single change in stationarity can be formulated as

 (2)

We consider a map between the graph domain and the Euclidean space, which associates graph with point . The methodology proposed here, and the related results shown in Section V, are valid for any embedding function that the user chooses, thus making the proposed methodology very general. However, in practical applications of our methodology, the distortion introduced by plays a relevant role that should be taken into account regardless of the validity of the theoretical results.

By applying the mapping to each graph of the sequence , we generate a new transformed sequence of vectors , . A multivariate CPM test can then be applied on . In CPMs, one performs multiple two-sample tests. In particular, for each time index , a statistic

 [se(t),pval(t)]=Test(x(1,t−1),x(t,T))

is computed on sequences . Notice that, in order to improve readability, and whenever it is clear from the context, we may write instead of , and instead of .

Statistical test depends on the detection problem at hand; for instance, one could design specific tests to identify a change in the mean, or in the variance. Two relevant examples, one addressing changes in the distribution mean, and the other generic changes in the distribution, are discussed later in Sections V-A and V-B, respectively, and Figure 3 provides a visual description of their application. The following pseudo-code outlines the CPM for a generic two-sample test.

1:A sequence of observed vectors ; a significance level .
2:Whether a change has been detected or not and the estimated change point .
3:for all  do
4:     Compute -value of ;
5:end for
6:;
7:if  then
8: is rejected;
9:     return Change detected at time .
10:else
11:     Null hypothesis is not rejected;
12:     return No change detected.
13:end if

The for-loop in Line 3 explores all possible subdivision of the sequence. Line 6 estimates the candidate change point , and considers graph to be the first one drawn by . Line 7 checks the actual presence of a change; in most cases, the rejection criterion can also be implemented by monitoring the statistic , instead of , e.g.,

 if se(^t)>γe(^t) ⇒ reject H0 at significance level αe, (3)

provided that

is the quantile of order

associated with . The significance level

 αe=P(reject H0|H0)

coincides with the tolerated rate of detected changes (false positive rate) when the null hypothesis holds true: the proposed methodology allows the designer to define the significance level according to the application at hand. Conversely, the rate of unrecognized changes (false negative rate),

 βe=P(do not reject H0|H1)

defines the power (i.e., ) of the test. Parameter is characteristic of the adopted test and depends on the family of possible distributions . While can be obtained in non-parametric tests, the value of is often unavailable. Given a significance level , the designer can improve the power by selecting a suitable test; in fact, the identification of a particular type of change, e.g., a change in the distribution variance, is better addressed when a test specifically designed for that change is adopted.

If a change is detected in sequence with significance level , then we say that a change has taken place also in the graph sequence at time . However, the significance level of the inference in the graph domain differs, a priori, from in . Proposition 1 shows how the significance levels are related and, accordingly, how a change in the embedding space implies a change in the graph domain and vice-versa.

We point out that some two-sample tests require a minimum sample size, e.g., the Welch’s t-test. In those cases, one can add a margin

and apply the procedure for . Considering a margin is useful also to reduce the false negative rate ; in fact, often approaches when the test (3) is applied with very close to or , i.e., when the sample size of one of the two subsets is small.

## V Theoretical results

Inferring whether a change in stationarity occurred or not in a sequence of attributed graphs, , is a difficult problem. As we do not make assumptions about embedding map , the resulting sequence does not necessarily encode the same statistical properties of . Nonetheless, here we prove some general results connecting changes in stationarity occurring in the graph sequence with those detected in the embedded sequence , and vice-versa. In order to improve readability, technical details of the various proofs are delivered in Section A.

The core of our argument is that, if statistic is related to the chosen statistic defined in the graph domain, then also their distributions must be related. By proving this, we can claim that a change occurring in one space can be detected in the other space as well, possibly with different confidence levels. We mention that, throughout the paper, subscripted ‘e’ and ‘g’ denote quantities associated with the embedding space and the graph domain, respectively. The following Proposition 1 shows how to relate statistics and in probabilistic terms. In particular, when decision rule (3) is applied to with at significance level , a decision rule of the type

 if sg(t)>γg(t) ⇒ reject H0 at % significance level αg (4)

holds in the graph domain by considering statistic . The significance level for the test on graphs can be bound by means of two significance levels and related to the multivariate test in the embedding space. Significance levels and are associated with two threshold , respectively, that allow the user to perform statistical inference in the embedding space, while controlling the significance level of the corresponding inference in graph domain.

###### Proposition 1.

Consider a sequence of graph-valued random variables that are i.i.d. according to probability function , and assume that (A1), (A2) hold true. Let us define a sequence of random vectors obtained from through map . Let and be the cumulative density functions of statistics and , respectively. Chosen constants555Constants and depend on distribution , but not from . and satisfying

 Pg∼QT0(|sg(t;g)−se(t;ϕ(g))|≤λ)≥q, (5)

then, for any real value , we have that

 qΨe(γ−λ)≤Ψg(γ)≤q−1Ψe(γ+λ). (6)

With Proposition 1, the significance levels, and respective thresholds, can be identified. In fact, by evaluating the bounds in (6) at in (4)

 qΨe(γg(t)−λ)≤1−αg≤q−1Ψe(γg(t)+λ),

and defining , , we obtain that ; the associated thresholds are those for which and . We conclude that, with a confidence at least , if then no change has taken place in the graph domain and, conversely, if then a change has occurred in the graph domain. Finally, it is worth observing that, when , it is not possible to make a reliable decision as a consequence of the severe distortion introduced by the embedding procedure. If the embedding is isometric, instead, for any , (5) holds with probability and (6) reduces to equality for any .

A similar reasoning can be done in terms of -values. The subsequent Proposition 2 shows how to bound the -value associated with graph statistic , evaluated on a specific observed sequence , with two -values concerning the vector statistic .

###### Proposition 2.

Let us consider the assumptions made in Proposition 1, and let and be realizations of random sequences and , respectively. Further, let

 pg =Pg∼QT0(sg(t;g)>sg(t;g∗)) p′e =Pg∼QT0(se(t;ϕ(g))>se(t;x∗)+2λ) p′′e =Pg∼QT0(se(t;ϕ(g))>se(t;x∗)−2λ)

be the -values associated with , , respectively. Then, with probability

 q−1p′e+1−q−1≤pg≤qp′′e+1−q. (7)

The following subsections propose two CPM tests based on choices of distances for graphs and vectors that are relevant in common applications, for which (i) terms in (5) can be made explicit, and (ii) the distribution of can be determined. It follows that Propositions 1 and 2 hold true and the distribution of can be derived from that of , together with the confidence level .

### V-a A CPM test for a shift in the Fréchet mean

The first CPM test we propose addresses the detection of a shift in the mean of the graph distribution. The derived statistical hypotheses are

 H0 :μQ0=μQ1 H1 :μQ0≠μQ1.

The adopted statistic for the embedding space is defined as:

 se(t;x)=Td2M(μx(1,t−1),μx(t,T)),

which is based on the squared Mahalanobis distance:

 d2M(μx(1,t−1),μx(t,T))=(μx(1,t−1)−μx(t,T))⊤M−1(μx(1,t−1)−μx(t,T)). (8)

and are the sample means and is the pooled sampling covariance matrix. Under the stationarity hypothesis, the distribution of statistic

can be determined; in fact, by applying the central limit theorem,

is asymptotically distributed as a , where denotes the embedding space dimension. As the distribution of is now available in closed-form, a threshold can be set to control the false positive rate.

Accordingly, we define the graph statistic as the squared GAM between the mean graphs and ,

 sg(t;g)=Tδ2(μg(1,t−1),μg(t,T)).

Let us recall that, as we are considering attributed graphs, possibly with a variable number of vertexes, the mean graph elements are intended according to Fréchet, as described in Section III-B. Further, we highlight that and are consistent estimators of and , respectively, where is the distribution of random vector , for and . In fact, the ordinary and Fréchet sample means are consistent estimators of their population counterparts [jain2016statistical].

With the above selection for statistics and , the claim of Proposition 1 can be refined and made more explicit. This is done in following Lemma 1, which explicitly provides a for any (see Eq. 5).

###### Lemma 1.

Under the assumptions of Proposition 1, there exists a positive constant depending on distribution and time , such that, for any

 P(|se(t;x)−sg(t;g)|≤λ)≥1−λ−1V1(t),

with

is the smallest eigenvalue of matrix

, and is the Fréchet variation, see Section III-B; see Section A-C for a proof and detailed explanation. From above lemma, it follows that Proposition 1 holds true for any positive value of , with . We point out that the constant is proportional to the sum of Fréchet variation of and and therefore it can be considered as a measure for the data spread in both graph and embedding spaces.

### V-B A CPM test to assess generic distribution changes

The second proposed CPM test allows to identify any type of changes in stationarity affecting the distribution. As such, the hypothesis test can be formalized as against . The multivariate two-sample test adopted in this CPM test is based on the energy statistic [szekely2004testing] and, accordingly, the statistic in the embedding space is

 se(t;x)=(t−1)(T−t+1)TE(x(1,t−1),x(t,T)), (9)

with

 E(x(1,t−1),x(t,T)):=2∑t−1i=1∑Tj=t∣∣xi−xj∣∣2(t−1)(T−t+1)−∑t−1i,j=1∣∣xi−xj∣∣2(t−1)2−∑Ti,j=t∣∣xi−xj∣∣2(T−t+1)2. (10)

Asymptotically follows a weighted sum of distributions, provided the variance of is finite and associated -values can be computed via permutation [szekely2004testing]. Székely and Rizzo [szekely2004testing] showed also that tests based on are consistent when testing equality of distributions and against the hypothesis, implying that the test is able to detect any discrepancy between distributions. This follows from the fact that statistic is the empirical version of the energy distance , which is proven to be a metric distance between distributions and with support on Euclidean spaces,

 E2(F0,F1):=2E[|x0−x1|2]−E[∣∣x0−x′0∣∣2]−E[∣∣x1−x′1∣∣2], (11)

with and independent random vectors. Such a property can be extended to more general metric spaces [szekely2013energy] by substituting in Equation (11) the associated metric distance. In particular, as stated in Proposition 3, this holds for whenever the graph domain is a proper GAS.

###### Proposition 3.

Let us define as the set of all probability functions on a measurable space over , so that fulfills the support condition of (A2). Then, use in of (11) the metric distance between samples . It follows that is a metric space.

Supported by this fact, we consider as graph statistic

 sg(t;g)= (t−1)(T−t+1)T⎧⎨⎩2∑t−1i=1∑Tj=tδ(gi,gj)(t−1)(T−t+1) −∑t−1i,j=1δ(gi,gj)(t−1)2−∑Ti,j=tδ(gi,gj)(T−t+1)2⎫⎬⎭.

Similarly to what we proved for Lemma 1, Lemma 2 connects statistics and in probability. This result will then be used in Proposition 1 to obtain an explicit relation between confidence levels for the test based on the energy distance.

###### Lemma 2.

Under the assumptions of Proposition 1, there exists a positive constant depending on distribution , such that, for any

 P(|se(t;ϕ(g))−sg(t;g)|≤λ|H0)≥1−λ−1V2,

with

Lemma 2 provides a way to define constant in terms of , i.e., . In this sense, Lemma 2 is analogous to Lemma 1; moreover, quantities and are measures of distribution spread and, accordingly, also Lemma 2 can be interpreted in terms of the data uncertainty.

## Vi Identifying multiple change points

What discussed so far assumes that input sequences contain at most one change point. Here, we elaborate over the E-divisive approach [matteson2014nonparametric] and design a CPM test able to detect multiple abrupt change points (if present) in a sequence of attributed graphs. The E-divisive approach relies on a two-sample test and produces, for a generic sequence , a statistic based on the energy distance defined in Equation 10:

 se(t;x(a,b))=maxt≤r≤bE(x(a,t−1),x(t,r)). (12)

As commented below, the maximum over variable is introduced to take into account the possible presence of multiple change points in the same sequence.

Multiple abrupt change points are detected incrementally. The algorithm initially takes into account the entire (embedded) sequence , and selects time step

so as to maximize the test statistic

,

 ^t=argmax2≤t≤Tse(t;x(1,T)). (13)

Time index is the first discovered change point, provided that the associated -value is lower than a predefined significance level . This step looks fairly similar to a typical CPM test for a single change, as the idea is to sweep over all bi-partitions induced by . However, we stress a fundamental difference introduced by the auxiliary variable . In fact, by varying we can mitigate possible side effects deriving, e.g., from the presence of multiple distributions in , making it statistically indistinguishable from .

In order to describe a generic iteration of the E-divisive technique, let us assume that a set of different change points, , has been already identified (to simplify the notation, endpoints and are considered change points as well). The next iteration consists in applying the procedure described above for , to a sub-sequence , obtaining via (13) a new candidate change point . Among these candidate change points, the new change point is the time index maximizing the associated statistic, i.e., , with

 j=argmaxi∈{0,…,k−1}se(^t(i);x(^ti,^ti+1−1)).

Again, if the -value associated with is lower than a predefined significance level , then is retained as an additional change point. The procedure is repeated until the outcome of a test is not statistically significant, meaning that all change points present in the sequence have been identified.

The statistic used for the detection in the embedding space can be associated with the graph statistic:

Similarly to what discussed in Section V, here we prove the following Lemma 3 which demonstrates how to relate the significance levels in the graph and embedding domains.

###### Lemma 3.

Let us consider again the assumptions made in Proposition 1. Let , , be a sequence of graphs and let be the associated sequence in the embedding space. Then, there exists a positive constant that depends on distribution and time , such that, for any

 Pg∼Qb−a(|se(t;ϕ(g))−sg(t;g)|≤λ|H0)≥1−λ−1V3(t),

with .

Lemma 3 is used every time a new candidate change point is found via Eq. 13, and with extrema and . The bound above takes different values depending on the candidate change point, as it happens also with Lemma 2. In particular, the ratio in can be interpreted as the inverse of the relative location of in the interval under analysis. Notice that its value is unbounded when approaches , as the estimation of the expectation of in Eq. 12 involves computing a maximum value. This issue can be mitigated by considering a margin so that is selected in the range . Moreover, as mentioned in Section IV, such a margin would also be useful to avoid issues related to the power of the test.

## Vii Experiments

Here, we perform simulations in order to assess the effectiveness of the proposed CPMs. We will consider both synthetically generated sequences of graphs and real data. The real-world data come from different application domains, including bio-molecules and electro-electroencephalograms (EEG), thus covering different case studies of practical relevance.

### Vii-a Data

Table I summarizes the main characteristics of the data sets taken into account, which are furhter described in the following sections.

#### Vii-A1 Delaunay graphs

As synthetic –controlled– data, we take into account the Delaunay graphs first introduced in [zambon2017detecting]. Delaunay graphs are geometric graphs composed by 7 vertexes and 2-dimensional real coordinates as vertex attributes; the topology of the graph is defined by the Delaunay triangulation of the vertexes. Different classes of graphs can be generated by considering different coordinates for the vertexes. For a detailed description of the generation process, we refer the reader to Zambon et al. [zambon2017detecting]. Changes in stationarity along a sequence of Delaunay graphs are simulated by inducing a transition between different classes of graphs. Delaunay data set constains several classes. In particular, here we will consider a reference class, called “class 0”. Class 1 contains instance graphs very different from those in class 0. As the class index increases, the graph instances of that class become similar to those of class 0, so that, e.g., distinguishing class 0 from class 8 is easier than distinguishing class 0 from class 12. Accordingly, detecting a change is more difficult in the latter case. In this paper, we consider classes 0, 6, 8, 10, 12, 14, 16, 18, and 20 as reported in Table I.

#### Vii-A2 IAM database

We will experiment on three data sets from the IAM graph database [riesen+bunke2008], namely the Letter, AIDS and Mutagenicity data sets. Letter data set is composed of handwriting letters represented as graphs. There are 15 classes, each of which is associated to a different letter for a total of graphs. Graphs are characterized by a variable topology and number of vertexes (from 2 to 9 vertexes); a 2-dimensional vector is associated to each vertex as attribute. AIDS and Mutagenicity data sets contain graph representations of biological molecules. Both data sets contain two classes of graphs. Originally, the Letter data set comes in three different versions; here we consider the data set having the highest variability in order to make the problem more difficult. The AIDS data set contains 1600 inactive graphs and 400 graphs representing active molecules. Mutagenicity data set contains 1963 nonmutagenic molecules and 2401 mutagenic molecules. In both data sets, the graphs are characteriezd by chemical symbols as vertex attributes (i.e., categorical data) and valence of the chemical links as edge attributes. The AIDS data set contains graphs with as much as 95 vertexes; Mutagenicity data set contains larger graphs with up to 417 vertexes.

#### Vii-A3 Detection of epileptic seizures from iEEGs

The final case study we take into account refers to the problem of detecting epileptic seizures from intracranial electro-encephalogram (iEEG) recordings. Notably, we use the “Detect seizures in intracranial EEG recordings” database by UPenn and Mayo Clinic. The database contains recordings related to different subjects (8 humans and 4 dogs). The recordings belong to two classes, denoting interictal and ictal segments. The first one refers to recordings denoting normal brain activity, while the second one refers to seizure events. Each subject data set contains a pre-defined split in training and a test set, but only the training clips are labeled. Accordingly, in order to rely on a ground truth change point, here we will consider the training set of each subject only.

For each subject, the data is available as a sequence of one-second clips with a variable number of channels (from 16 to 72), giving rise to a multivariate stream of iEEGs. In order to model (statistical) coupling among the activity recorded in different brain regions, it is common to represent iEEG data as functional connectivity networks [bastos2016tutorial]. Functional connectivity networks are weighted graphs, where (usually) the vertexes correspond to the signals recorded by the electrodes (or channels of the electrodes) and the edge weights represent their coupling strength. Many connectivity measures have been proposed for this purpose: here we consider Pearson correlation computed in the high-gamma band (70-100Hz). We also characterize each vertex with the leading four wavelet coefficients [bastos2016tutorial] computed from the related raw signals by means of the discrete wavelet transform. Figure 2 provides a visual representation of two example graphs associated with different regimes.

### Vii-B Experimental setup and implementation details

To obtain a stationary sequence of graphs from one of the above-mentioned data sets, we select all graphs of one class and randomly arrange them in sequence. In order to simulate change points, we generate stationary sequences from different classes and, finally, concatenate them to form a single longer sequence . Throughout the rest of the paper, we indicate a particular sequence with the ID of the considered data set (see Table I) and with sub-scripted the list of classes indicating the order by which they occur in the sequence . For example, experiment “LetA,E,F” refers to the Letter data set and considers a sequence , where contains all graphs of class A, contains all graphs of class E, and, finally, contains all graphs of class F.

Half of the graphs in each original sequence are randomly selected without repetition as training graphs, and are used to learn the graph embedding map . The remaining graphs constitute the actual sequence on which the tests are run to identify the presence of change points. The adopted graph embedding technique is the dissimilarity representation [pekalska2005dissimilarity], which considers prototype graphs and maps each graph to a vector containing GAM evaluations of with respect to each prototype. Here, is built on the Euclidean kernel for real-valued attributes. When the attribute set is of categorical data, the GAM is based on the delta-like kernel, which assigns 1 if the attributes are equal and 0 otherwise. The prototypes are selected among the training graphs according to the -centers method [riesen2007graph], with being the number of centers (prototypes) and, hence, corresponds to the embedding dimension . The embedding dimension is a critical parameters that impacts on the quality of the embedding, affecting the sharpness of bounds of the form in (5). Since the optimal value of embedding dimension depends on the specific data sets at hand, here we set to , which turned out to be an effective choice in most of the cases. We leave a more focused study on the impact of the embedding dimension as future work.

The methodology presented in Section IV is evaluated on sequences with zero, one and multiple change points generated on the above data sets. To properly assess the performance, the experiments are run on the three proposed methods -CPM (Section V-A), -CPM (Section V-B), and E-div (Section VI) as instances of the proposed methodology. The significance level of the test is set to .

Four performance metrics are considered to evaluate the effectiveness of the methods. The first two metrics are the true positive rate (TPR) and false positive rate (FPR), where “positive” refers to an actual change in the sequence. Here TPR ranges in and assesses the rate of detected changes over the total number of ground truth changes. Conversely, the FPR can be greater than 1, as it is the rate of changes identified beyond the ground truth ones over the total number of stationary sequences. The third metric quantifies the mean distance of the detected change from the ground truth change-point. In order to make it independent from the length of sequence , we consider the normalized discrepancy between change point and the estimated change point . We call such a measure relative time-step error (RTE). The last metric is the adjusted Rand index (ARI) [morey1984measurement], which is used for comparing two partitions of the same set. ARI ranges in the interval, with 1 corresponding to partitions in perfect agreement; an ARI equal to 0 is expected when the partitions are completely random; negative values denote partitions in disagreement. To robustly estimate the aforementioned metrics and assess their variability, we repeated each experiment 100 times.