Nomenclature
Graph domain  
Graph distance  
Generic graph in generated at time  
Set of graphs  
Dissimilarity domain  
Distance  
Dissimilarity representation of generic graph  
Set of diss. representations of  
Dissimilarity representation  
Set of prototypes  
Stochastic process generating graphs  
Change time and its estimate 

Probability space for in nominal regime  
Probability space for in nominal regime  
Null and alternative hypotheses of a statistical test  
Statistic used by the change detection test  
Threshold for the change detection test  
Significance level of the change detection test  
Sample mean and expected value w.r.t.  
Fréchet sample and population means w.r.t. 
I Introduction
Learning in nonstationary environments is becoming a hot research topic, as proven by the increasing body of literature on the subject, e.g., see [7296710, gama2014survey] for a survey. Within this learning framework, it is of particular relevance the detection of changes in stationarity of the data generating process. This can be achieved by means of either passive approaches [elwell2011incremental], which follow a pure online adaptation strategy, or active ones [alippi2008justI, alippi2009just], enabling learning only as a proactive reaction to a detected change in stationarity. In this paper, we follow this last learning strategy, though many results are general and can be suitably integrated in passive learning approaches as well.
Most change detection mechanisms have been proposed for numeric i.i.d. sequences and either rely on change point methods or change detection tests. Both change point methods and change detection tests are statistical tests; the former works offline over a finite number of samples [hawkins2003changepoint] while the latter employs a sequential analysis of incoming observations [basseville1993detection]
to detect possible changes. These techniques were originally designed for univariate normal distributed variables, and only later developments extended the methodology to nonGaussian distributions
[alippi2006adaptive, ross2012two] and multivariate datastreams [zamba2006multivariate, golosnoy2009multivariate].A somehow related field to change detection tests is oneclass classification (e.g., see [eocc_mi]
and references therein). There, the idea is to model only the nominal state of a given system and detect nonnominal conditions (e.g., outliers, anomalies, or faults) by means of inference mechanisms. However, oneclass classifiers typically process data batches with no specific presentation order, while change detection problems are sequential in nature.
The important role played, nowadays, by graphs as description of dynamic systems is boosting, also thanks to recent discoveries of theoretical frameworks for performing signal processing on graphs [7407399, shuman2013emerging] and for analysing temporal (complex) networks [holme2012temporal, holme2015modern, masuda2016guide]. However, very few works address the problem of detecting changes in stationarity in streams (i.e., sequences) of graphs [wilson2016modeling, barnett2016change, masoller2015quantifying]
, and, to the best of our knowledge, none of them tackles the problem by considering a generic family of graphs (e.g., graphs with a varying number of vertices/edges, and arbitrary attributes on them). In our opinion, the reason behind such a lack of research in this direction lies in the difficulty of defining a sound statistical framework for generic graphs that, once solved, would permit to also detect changes in time variance in timedependent graphs. In fact, statistics grounds on concepts like average and expectation, which are not standard concepts in graph domains. Fortunately, recent studies
[jain2016geometry, jain2016statistical] have provided some basic mathematical tools that allow us to move forward in this direction, hence, addressing the problem of detecting changes in stationarity in sequences of graphs.A key problem in analysing generic graphs refers to assessing their dissimilarity, which is a wellknown hard problem [gm_survey]. The literature proposes two main approaches for designing such a measure of dissimilarity [foggia2012graph, emmert2016fifty, roy2014modeling]. In the first case, graphs are analysed in their original domain , whereas the second approach consists of mapping (either explicitly or implicitly) graphs to numeric vectors. A wellknown family of algorithms used to assess dissimilarity between graphs relies on the Graph Edit Distance (GED) approach [bunke1983inexact]. More specifically, GED algorithms count and weight the edit operations that are needed in order to make two input graphs equal. Differently, other techniques take advantage of kernel functions [7947106], spectral graph theory [qiu2006graph, bai2015quantum], or assess graph similarity by looking for recurring motifs [costa2010fast]
. The computational complexity associated with the graph matching problem inspired researchers to develop heuristics and approximations, e.g., see
[fankhauser2011speeding, fischer2015approximation, bougleux2017graph] and references therein.Ia Problem formulation
In this paper, we consider sequences of attributed graphs, i.e., directed or undirected labelled graphs characterized by a variable number of vertices and edges [jain2016geometry]. Attributed graphs associate vertices and edges with generic labels, e.g., scalars, vectors, categorical, and userdefined data structures. In addition, multiple attributes can be associated to the same vertex/edge, whenever requested by the application. By considering attributed graphs, we position ourselves on a very general framework covering most of application scenarios. However, generality requires a new operational framework, since all assumptions made in the literature to make the mathematics amenable, e.g., graphs with a fixed number of vertices and/or scalar attributes, cannot be accepted anymore. In order to cover all applications modellable through attributed graphs, we propose the following general problem formulation for change detection.
Given a generic premetric distance on , we construct a algebra containing at least all open balls of and associate a generic probability measure to . The generated probability space
allows us to consider graphs as a realisation of a structured random variable
on . Define to be the process generating a generic graph at timeaccording to a stationary probability distribution
(nominal distribution). We say that a change in stationarity occurs at (unknown) time when, from time on, starts generating graphs according to a nonnominal distribution , i.e.,In this paper, we focus on persistent (abrupt) changes in stationarity affecting the population mean. However, our methodology is general and potentially can detect other types of change, including drifts and transient anomalies lasting for a reasonable lapse of time.
IB Contribution and paper organization
A schematic description of the proposed methodology to design change detection tests for attributed graphs is shown in Figure 1, and consists of two steps: (i) mapping each graph to a numeric vector through a prototypebased embedding, and (ii) using a multivariate change detection test operating on the stream for detecting changes in stationarity.
The novelty content of the paper can be summarized as:

A methodology to detect changes in stationarity in streams of attributed graphs. To the best of our knowledge, this is the first research contribution tackling change detection problems in streams of varyingsize graphs with nonidentified vertices and userdefined vertex/edge attributes;

A method derived from the methodology to detect changes in stationarity in attributed graphs. We stress that the user can design his/her own change detection method by taking advantage of the proposed methodology;

A set of theoretic results grounding the proposed methodology on a firm basis.
The proposed methodology is general and advances the few existing approaches for change detection in graph sequences mostly relying on the extraction and processing of topological features of fixedsize graphs, e.g., see [ranshous2015anomaly].
It is worth emphasising that the proposed approach assumes neither onetoone nor partial correspondence between vertices/edges across time steps (i.e., vertices do not need to be uniquely identified). This fact has important practical implications in several applications. As a very relevant example, we refer to the identification problem of neurons in extracellular brain recordings based on their activity
[Rossant2016]. In fact, each electrode usually records the activity of a neuron cluster, and single neurons need to be disentangled by a procedure called spike sorting. Hence, a precise identification of neurons (vertices) is virtually impossible in such an experimental setting, stressing the importance of methods that do not require onetoone correspondence between vertices over time.The remainder of this paper is structured as follows. Section II contextualizes our contribution and discusses related works. Section IIIA presents the proposed methodology for change detection in generic streams of graphs. Theoretical results are sketched in Section IIIB; related proofs are given in Appendix C. A specific implementation of the methodology is presented in Sections IV and related proofs in Appendix D. Section V shows experimental results conducted on datasets of attributed graphs. Finally, Section VI draws conclusions and offers future directions. Appendices A and B provide further technical details regarding the problem formulation.
Ii Related work
The relatively new field of temporal networks deals with graphlike structures that undergo events across time [holme2015modern, masuda2016guide]. Such events mostly realise in instantaneous or persistent contacts between pairs of vertices. With such structures one can study dynamics taking place on a network, like epidemic and information spreading, and/or dynamics of the network itself, i.e., structural changes affecting vertices and edges over time. Further relevant directions in temporal networks include understanding the (hidden) driving mechanisms and generative models [holme2012temporal].
The literature in statistical inference on timevarying graphs (or networks) is rather limited [holme2015modern, jain2016statistical], especially when dealing with attributed graphs and nonidentified vertices. Among the many, anomaly detection in graphs emerged as a problem of particular relevance, as a consequence of the ever growing possibility to monitor and collect data coming from natural and manmade systems of various size. An overview of proposed approaches for anomaly and change detection on timevariant graphs is reported in [ranshous2015anomaly, akoglu2015graph], where the authors distinguish the level of influence of a change. They identify changes affecting vertices and edges, or involving entire subnetworks of different size; this type of change usually concerns static networks, where the topology is often fixed. Other changes have a global influence, or might not be ascribed to specific vertices or edges.
We report that there are several applications in which the vertices are labelled in such a way that, from a time step to another, we are always able to create a partial onetoone correspondence (identified vertices). This case arises, e.g., when the identity of vertices plays a crucial role and must be preserved over time. Here, we put ourself in the more general scenario where vertices are not necessarily onetoone identifiable through time.
Within the anomaly detection context, only few works tackle the problem in a classical change detection framework. Among the already published works in detecting changes in stationarity, we mention Barnett et al. [barnett2016change], whose paper deals with the problem of monitoring correlation networks by means of a change point method. In particular, at every time step , the authors construct the covariance matrix computed from the signals up to time and the covariance matrix of the remaining data. As statistic for the change point model, they adopt the Frobenius norm between the covariance matrices. The authors evaluate their method on functional magnetic resonance imaging and stock returns. A different way to approach the problem consists in modeling the networkgenerating process within a probabilistic framework. Graphs with vertices and disjoint communities can be described by the degree corrected stochastic block model, where some parameters represent the tendency of single vertices to be connected and communities to interact. This model has been adopted by Wilson et al. for monitoring the U.S. Senate covoting network [wilson2016modeling]
. As monitoring strategy, they consider the standard deviation of each community, and then apply exponential weighted moving average control chart. A further example of change point method for fixedsize graphs combines a generative hierarchical random graph model with a Bayesian hypothesis test
[peel2015detecting].Iii Change detection in a stream of graphs
The structure of the section is as follows. Section IIIA describes the proposed methodology at a high level to ease the understanding. In Section IIIB, we present theoretical results grounding our proposal; their proofs are given in the appendices.
Iiia The proposed methodology
The methodology operates on an input sequence of attributed graphs and, as sketched in Figure 1, it performs two steps: (i) Map (embed) input graphs to a vector domain . Embedding is carried out by means of the dissimilarity representation , which embeds a generic graph to a vector . (ii) Once a multivariate i.i.d. vector stream is formed, change detection is carried out by inspecting such a numerical sequence with the designer favourite method. The two phases are detailed in the sequel.
IiiA1 Dissimilarity representation
The embedding of a generic graph is achieved by computing the dissimilarity between and the prototype graphs in ,
(1) 
The vector is referred to as the dissimilarity representation of . Set has to be suitably chosen to induce informative embedding vectors. For a detailed discussion about dissimilarity representations and possible ways to select prototypes, we suggest [pkekalska+duin2005].
In order to make the mathematics more amenable, here we assume to be a metric distance; nevertheless, in practical applications, one can choose more general dissimilarity measures.
IiiA2 Multivariate vector stream
At time step the process generates graph , and the map embeds onto vector , inducing a multivariate stream whose elements lie in . Figure 1 depicts the continuous embedding of graph process.
Under the nominal condition for process , graphs are i.i.d. and drawn from probability space . Consequently, also vectors are i.i.d.. We now define a second probability space associated with embedded vectors ; in particular, here we propose to consider for the Borel’s algebra generated by all open sets in . is the push forward probability function of by means of , namely
(2) 
With such a choice of , we demonstrate in Appendix A that is a probability measure on .
IiiA3 Change detection test
By observing the i.i.d. vector stream over time we propose a multivariate change detection procedure to infer whether a change has occurred in the vector stream and, in turn, in the graph stream.
The change detection test is the statistical hypothesis test
where is the expected value during the nominal – stationary – regime and is the expectation operator. Statistic , which is applied to windows of the vector stream, is userdefined and is requested to increase when the process becomes non stationary.
Often, the test comes with a threshold so that if
(3) 
and the estimated change time is
Whenever the distribution of under hypothesis is available – or can be estimated – the threshold can be related to an userdefined significance level so that .
IiiB Theoretical results
In this section, we show some theoretical results related to the methodology presented in Section IIIA. In particular, we prove the following claims:

Once a change is detected in the dissimilarity space according to a significance level , then a change occurs in probability with significance level also in the graph domain;

If a change occurs in the graph domain having set a significance level , then, with a significance level , a change occurs also in the dissimilarity space.
Figure 2 depicts the central idea behind the methodology. Through transformation , we map graphs to vectors. In the transformed space, we consider the expectation and the sample mean associated with set obtained by embedding the graph set , and design a hypothesis test of the form
(4) 
where is a positive threshold and hypothesis is associated with a nominal, changefree condition. In this paper, we relate the test of (4) to a correspondent test operating in the graph domain, where are, respectively, the population and sample mean defined according to Fréchet [frechet1948elements]; for further details about Fréchet statistics refer to Appendix B.
Define and to be two significance levels, such that:
(5) 
In the sequel we relate the threshold to , so that also the significance levels and are in turn related to each other.
In order to address our problem, we introduce mild assumptions to obtain closedform expressions. Such assumptions are satisfied in most applications. More specifically,

We assume that the attributed graph space and the dissimilarity space are metric spaces; in particular, is chosen as a graph alignment space [jain2016statistical] – i.e., a general metric space of attributed graphs – and has to be induced by a norm.

We put ourselves in the conditions of [jain2016statistical] in order to take advantage of results therein; specifically, we assume that the Fréchet function is finite for any , and there exists a sufficiently asymmetric graph such that the support of is contained in a cone around . In this way, we are under the hypotheses of Theorems 4.1 and 4.2 of [jain2016statistical], which ensure the existence and uniqueness of the Fréchet population and sample mean in .

The embedding function is bilipschitz, i.e., there exist two constants , such that for any pair
(6) (7)
Let and be the cumulative density functions (CDFs) of and , respectively. Proposition 1 bounds the distribution in terms of . This fact yields the possibility to derive significance levels and thresholds in Equation (5) that are related.
In order to prove Proposition 1, we make use of two auxiliary results, Lemmas 1 and 2. At first we need to comment that, although in general and , differences are bounded in practice, as shown by Lemma 1.
Lemma 1.
Considering a set of i.i.d. random graphs and the associated embedded set , there exists a constant such that
Lemma 2
is used to derive bounds on the marginal distributions from bounds on the joint distributions that are useful in Proposition
1 to relate the threshold and the significance level in the graph space with the ones in the dissimilarity space.Lemma 2.
Consider a random variable and two statistics with associated CDFs , , respectively. If function is increasing and bijective and is a constant in , then
Claims (C1) and (C2) now follow from the relation between the CDFs and proven by the subsequent Proposition 1. In particular, regarding Claim (C1) Proposition 1 provides a criterion for setting a specific threshold for the test (4) for which we can state that is unexpectedly large with a significance level at most ; similarly, we obtain Claim (C2).
Proposition 1.
Consider a sample of i.i.d. graphs. Under assumptions (A1)–(A3), if and are the CDFs of statistics and , respectively, then for every there exist two values and , depending on but independent of , such that
The proofs are given in Appendix C. Results of Proposition 1 allows us to state the major claim (C1): given a sample , for any significance level and threshold , as in (5), we can set up a such that the confidence level of detecting a change in is at least . Specifically, for we have that
(8) 
Equation (8) states that if no change occurs in with confidence level , then a change will not be detected in according to threshold at least with confidence level . Indeed, this proves (C1) by contraposition. Similarly, Proposition 1 allows to prove Claim (C2). In fact, for any and as in (5), we can set so that and obtain
Iv Implementations of the methodology
This section describes two examples showing how to implement the proposed methodology and derive actual change detection tests. In Section IVA we present a specific method for generic families of graphs, whereas in Section IVB we further specialize the methodological results to a special case considering graphs with identified vertices.
Iva Change detection based on central limit theorem
Here, we consider specific techniques for prototype selection and change detection test. For the sake of clarity, we keep the subsection structure of Section IIIA. Both the prototype selection and change detection test require a training phase. For this reason, the first observed graphs of the stream will serve as training set , that we assume to be all drawn under nominal conditions.
IvA1 Dissimilarity representation
Since the change detection method operates in the dissimilarity space, we need to define the embedding that, at each time step, maps a generic graph to a vector .
We comment that the embedding is completely defined once the graph distance metric and prototype set are chosen. Here, we adopt a metric GED as graph distance, since it meets Assumption (A1) of being a graph alignment space.
Many approaches have been proposed to deal with the prototype selection problem, e.g., see [riesen+bunke2010] and references therein. While the proposed methodology is general and one can choose any solution to this problem, here we adopt the kCentres method [riesen2007graph]. The method selects prototypes so as to cover training data with balls of equal radius. Specifically, the algorithm operates as follows:

select random prototypes ;

for each , consider the set of all such that ;

for update the prototypes with a graph minimising ;

if the prototype set did not change from the previous iteration step then exit, otherwise go to step 2.
In order to improve the robustness of the kCentres algorithm, we repeat steps 1–4 by randomizing initial conditions and select the final prototype set to be
where is the collection of prototype sets found at each repetition.
IvA2 Multivariate vector stream
Every time the process generates a graph , we embed it as by using the prototype set identified with the kCentres approach. This operation results in a multivariate vector stream on which we apply the change detection test.
IvA3 Change detection test
We consider here a variation of the cumulative sum (CUSUM) test [alippi2006adaptive] to design the change detection test. CUSUM is based on the cumulative sum chart [page1954continuous], it has been proven to be asymptotically optimal [basseville1993detection] and allows for a simple graphical interpretation of results [manly2000cumulative]. Here, the CUSUM is adapted to the multivariate case. However, we remind that, in principle, any change detection test can be used on the embedded sequence.
We batch the observed embedding vectors into nonoverlapping samples of length , where index represents the th data window. For each window, we compare the sample mean estimated in the training set with that estimated in the th window, i.e., and compute the discrepancy
By assuming that are i.i.d., and given sufficiently large and
, the central limit theorem grants that
and are normally distributed. In particular, and share the same expectation , while covariance matrices are and , respectively.As a specific choice of , we adopt the Mahalanobis’ distance, i.e., where
(9) 
with matrix , i.e., the covariance matrix of . In our implementation, we consider as covariance matrix
.For each stationary window , the squared Mahalanobis’ distance is distributed as a .
The final statistic inspired by the CUSUM test is defined as
(10) 
The difference increases with the increase of the discrepancy in (9) and positive values support the hypothesis that a change occurred, whereas negative values suggest that the system is still operating under nominal conditions. This behaviour resembles that of the original CUSUM increment associated with the loglikelihood ratio. In particular, the parameter can be used to tune the sensitivity of the change detection; in fact, if is the
, then produces a negative increment with probability and a positive one with probability .The last parameter to be defined is the threshold as requested in (3). Since the nonnominal distribution is unknown, a common criterion suggests to control the rate of false alarms, . In sequential detection tests, a related criterion requires a specific average run length
under the null hypothesis (ARL0)
[frisen2003statistical]. ARL0 is connected to the false positive rate in the sense that setting a false alarm rate to yields an ARL0 of. Since we propose a sequential test, statistic
depends on statistics at preceding times. As a consequence, since we wish to keep fixed, we end up with a timedependent threshold . As done in [zamba2006multivariate, golosnoy2009multivariate], we numerically determine thresholds through Monte Carlo sampling by simulating a large number of processes of repeated independent realisations. Threshold is then the quantile of the estimated distribution of .We point out that, when setting a significance level for the random variable , we are implicitly conditioning to the event ; in fact, when exceeds , we raise an alarm and reconfigure the detection procedure.
IvA4 Theoretical results
The choice of Mahalanobis’ distance ensures that almost all assumptions in Section IIIB are met. In particular, the Mahalanobis’ distance meets the requirements of Assumption (A1). Then, the following Lemma 3 provides a lower bound of the form (6) in Assumption (A3); specifically, the lemma shows that, up to a positive factor, the distance between two graphs is larger then the one between the associated embedding vectors.
From these properties, we can apply Proposition 1 to state that Claim (C1) holds. Hence for any , we can set a specific threshold yielding a confidence level at least .
Lemma 3.
The proof is reported in Appendix D. Distance is welldefined only when is positive definite, a condition implying that selected prototypes are not redundant.
IvB A special case: graphs with identified vertices
Here we take into account the particular scenario where the attribute function of assigns numerical attributes in to vertices and edges of ; the vertex set is a subset of a predefined finite set , with . The peculiarity of this space resides in the fact that any vertex permutation of a graph leads to a different graph. Many realworld applications fall in this setting, for instance correlation graphs obtained by signals coming from sensors or graphs generated by transportation networks.
We show an example of method for this setting, which complies with the methodology and satisfies Assumption (A3). This fact follows from the existence of an injective map from to the matrix set. Indeed, we represent each graph with its weighted adjacency matrix whose row/column indices univocally correspond to the vertices in . By endowing with the Frobenius distance^{1}^{1}1In , is a graph alignment distance, as formally shown in Appendix DB. , map is an isometry.
Being the codomain of an Euclidean space, we compute a matrix whose columns constitute a dimensional vector configuration related to the prototype graphs; this is done via Classical Scaling [pkekalska+duin2005], that is, for all pairs of prototypes . As usual, we consider the smallest possible that preserves the data structure as much as possible. Successively, for any dissimilarity vector , we define
to be a linear transformation of
, obtained by squaring all the components of ; the matrix is the centering matrix . We apply the same procedure of Section IV, considering instead of : matrix is derived from the nonsingular covariance matrix^{2}^{2}2 is nonsingular, since there are no isolated points in and as a consequence of the selection of ., and the statistic is the Mahalanobis distance .Considering the space and the above transform, we claim that the following lemma holds. Lemma 4 proves the fulfilment of Assumption (A3).
Lemma 4.
For any positive definite matrix ,
where , . is the th eigenvalue in descending order of magnitude.
The proof is reported in Appendix D.
V Experiments
The proposed methodology can operate on very general families of graphs. Besides the theoretical foundations around which the paper is built, we provide some experimental results showing the effectiveness of what proposed in real change detection problems in streams of graphs. In particular, we consider the method introduced in Section IVA as an instance of our methodology. Source code for replicating the experiments is available in [zambon2017cdg].
Va Experiment description
VA1 Data
The experimental evaluation is performed on the wellknown IAM benchmarking databases [riesen+bunke2008]. The IAM datasets contain attributed graphs representing drawings and biochemical compounds. Here we consider the Letters, Mutagenicity, and AIDS datasets.
The Letters dataset contains 2dimensional geometric graphs. As such, each vertex of the graphs is characterized by a real vector representing its location on a plane. The edges define lines such that the graphical planar representation of the graphs resembles a latinscript letter. The dataset is composed of 15 classes (one for each letter^{3}^{3}3The IAM letters database [riesen+bunke2008] considers only noncurved letters; hence, e.g., letters A and E are considered, whereas B and C are excluded.) containing 150 instances each.
Conversely, the Mutagenicity and AIDS datasets contain biological molecules. Molecules are represented as graphs by considering each atom as a vertex and each chemical link as an edge. Each vertex is attributed with a chemical symbol, whereas the edges are labelled according to the valence of the link. Both datasets contain two classes of graphs: mutagenic/nonmutagenic for Mutagenicity and active/inactive for AIDS. The two datasets are imbalanced in terms of size of each classes, in particular Mutagenicity has 2401 mutagenic and 1963 nonmutagenic molecules; AIDS contains 400 active and 1600 inactive molecules.
We considered these datasets because they contain different types of graphs with variegated attributes (numerical and categorical). We refer the reader to [riesen+bunke2008] and references therein for a more indepth discussion about the datasets.
VA2 Simulating the generating process
For each experiment in Table I, we consider two collections of graphs containing all possible observations in the nominal and nonnominal regimes, respectively. Each collection is composed by graphs present in one or more predefined classes of the dataset under investigation. The collections have to be different, but they do not need to be disjoint; as such, some graphs can belong to both collections.
Next, we simulate the process by bootstrapping graphs from the first collection up to the predefined change time ; this is the nominal regime. After , we bootstrap objects from the second collection hence modelling a change.
Regarding molecular datasets, we considered two distinct experiments. For the Mutagenicity (AIDS) experiment, we set the nonmutagenic (inactive) class as nominal collection and mutagenic (active) class as nonnominal one. On the other side, for the Letter dataset we design four different experiments depending on which classes will populate the collections. Table I reports the settings of all the experiments and next section describes the relevant parameters.
Collection  
Experiment ID  Dataset  Nominal  Nonnominal 
LD2  Letter  A,E  F,H 
LD5  Letter  A,E,F,H,I  K,L,M,N,T 
LO  Letter  A,E,F,H  F,H,I,K 
LS  Letter  A,E,F,H,I  F,H,I 
MUT  Mutagenicity  nonmutagenic  mutagenic 
AIDS  AIDS  inactive  active 
VA3 Parameters setting
For all experiments, the offset
is set to the third quartile of the
distribution. The timedependent threshold has been numerically estimated by Monte Carlo sampling. We drew one million processes of i.i.d. random variables by taking the square root of i.i.d. random variables distributed as . For each obtained process (stream), we computed the sequence of cumulative sums like in (10) and estimated the threshold as the quantile of order 1/ARL0, with ARL0 (windows).We divided the training set in two disjoint subsets, , used during the prototype selection and change detection learning phases, respectively. We set and , afterwards we generated a stream of graphs containing ARL0 observations associated with the operational phase. The change is forced at time . As for the distance , we considered the bipartite GED implemented in [riesen2013novel], where we selected the Volgenant and Jonker assignment algorithm. The other GED parameters are set according to the type of graphs under analysis, i.e., for geometric graphs we consider the Euclidean distance between numerical attributes, and a binary (01 distance) for categorical attributes. The kCentres procedure is repeated 20 times.
We believe that the selected parameter settings are reasonable. Nevertheless, a proper investigation of their impact w.r.t. performance metrics is performed in a companion paper [zambon2017detecting], hence it is outside the focus of the present one.
VA4 Figures of merit
We assess the performance of the proposed methodology by means of the figures of merit here described. Such measurements are obtained by replicating each experiment one hundred times; we report the average of the observed measures with their estimated 95% confidence interval (95CI) or standard deviation (std).
First of all, we consider the observed ARL0 introduced in Section IIIA3 and the delay of detection (DoD). Both of them are computed as the average time lapses between consecutive alarms, but limiting to those alarms raised before and after time , respectively. From them we estimate the rate of detected changes (DCR), by assessing the rate of simulations in which the DoD is less then the observed ARL0. Finally, we consider also the estimated rate of false anomalies within 1000 samples (FA1000). This is computed as the ratio between the count of raised false alarms and the total number of thousands of time steps under the nominal condition.
We point out that the measures ARL0, DoD, and FA1000 are computed with the window as unitary time step.
VA5 Baseline methods
As previously mentioned, stateoftheart change detection methods for graph streams usually assume a given topology with a fixed number of vertices and/or simple numeric features for vertices/edges. As reported in [ranshous2015anomaly], considering a variable topology, a common methodology for anomaly detection on graphs consists of extracting some topological features at each time step and then applying a more general anomaly detector on the resulting numeric sequence. Accordingly, in addition to the method proposed in Section IV, we consider two baseline methods for comparison. More precisely, we considered two topological features: the density of edges and the spectral gap of the Laplacian matrix [chung1994]. The particular choices of and can be justified by considering that both features are suitable for describing graphs with a variable number of vertices and edges. We implemented two CUSUMlike change detection tests as in Section IV where, for , the statistic is now given by , and is numerically estimated in the training phase.
In addition, we consider a further baseline implemented as a degenerate case of our method by selecting only prototype and window size . This last baseline is introduced to show that the strength of our methodology resides also in the embedding procedure, and not only in the graph distance .
VB Results on IAM graph database
Experiment  DCR  ARL0  DoD  FA1000  
dataset  mean  95CI  mean  95CI  mean  95CI  mean  std  
LD2  4  5  1.000  [1.000, 1.000]  189  [103, 333]  26  [8, 66]  1.135  0.368 
LD5  4  5  0.380  [0.290, 0.480]  175  [91, 271]  411  [9, 2149]  1.207  0.390 
LD5  4  25  0.970  [0.930, 1.000]  183  [88, 306]  41  [1, 184]  0.233  0.086 
LD5  4  125  1.000  [1.000, 1.000]  305  [61, 1148]  2  [1, 5]  0.045  0.038 
LD5  8  25  0.990  [0.970, 1.000]  175  [93, 372]  10  [1, 54]  0.245  0.078 
LD5  8  125  1.000  [1.000, 1.000]  255  [50, 928]  1  [1, 1]  0.054  0.047 
LO  4  5  0.230  [0.150, 0.310]  191  [120, 344]  410  [21, 1136]  1.096  0.318 
LO  4  25  0.790  [0.710, 0.870]  205  [101, 355]  123  [3, 723]  0.205  0.070 
LO  4  125  0.940  [0.890, 0.980]  291  [55, 725]  37  [1, 405]  0.045  0.035 
LO  8  25  0.980  [0.950, 1.000]  163  [92, 306]  24  [2, 132]  0.260  0.080 
LO  8  125  1.000  [1.000, 1.000]  238  [52, 620]  2  [1, 5]  0.052  0.053 
LS  4  5  0.720  [0.630, 0.810]  164  [90, 359]  132  [33, 349]  1.321  0.418 
LS  4  25  1.000  [1.000, 1.000]  193  [103, 372]  28  [4, 123]  0.223  0.075 
AIDS  4  5  1.000  [1.000, 1.000]  175  [100, 337]  1  [1, 1]  1.221  0.364 
MUT  4  5  0.050  [0.010, 0.100]  37  [18, 89]  70  [39, 124]  6.294  2.159 
MUT  4  25  0.800  [0.720, 0.880]  52  [19, 145]  53  [4, 348]  1.075  0.521 
MUT  4  125  0.980  [0.950, 1.000]  153  [8, 820]  5  [1, 22]  0.276  0.284 
MUT  8  25  0.920  [0.860, 0.970]  42  [15, 91]  15  [2, 75]  1.242  0.680 
MUT  8  125  1.000  [1.000, 1.000]  112  [9, 540]  2  [1, 4]  0.292  0.267 
Experiment  DCR  ARL0  DoD  FA1000  

dataset  index  mean  95CI  mean  95CI  mean  95CI  mean  std  
LD2  M1  25  1.000  [1.000, 1.000]  238  [106, 594]  15  [2, 87]  0.194  0.082 
LD2  den  1  1.000  [1.000, 1.000]  194  [68, 556]  41  [22, 104]  6.475  3.337 
LD2  SG  1  0.850  [0.780, 0.920]  210  [75, 394]  133  [55, 274]  6.460  3.342 
LD5  M1  25  0.780  [0.700, 0.860]  222  [96, 407]  85  [2, 542]  0.200  0.098 
LD5  den  1  0.180  [0.110, 0.260]  219  [70, 485]  369  [91, 1147]  5.775  3.346 
LD5  SG  1  1.000  [1.000, 1.000]  209  [74, 674]  33  [18, 53]  6.058  2.971 
LO  M1  25  0.500  [0.400, 0.600]  218  [100, 459]  304  [7, 1199]  0.203  0.082 
LO  den  1  1.000  [1.000, 1.000]  194  [70, 549]  5  [4, 5]  6.853  3.428 
LO  SG  1  0.130  [0.070, 0.200]  221  [89, 613]  352  [115, 973]  5.558  2.555 
LS  M1  25  0.880  [0.810, 0.940]  230  [116, 469]  93  [4, 636]  0.189  0.066 
LS  den  1  1.000  [1.000, 1.000]  188  [79, 385]  32  [20, 48]  6.237  2.777 
LS  SG  1  0.080  [0.030, 0.140]  189  [91, 455]  308  [124, 771]  6.100  2.247 
AIDS  M1  25  1.000  [1.000, 1.000]  229  [95, 489]  1  [1, 1]  0.198  0.081 
AIDS  den  1  1.000  [1.000, 1.000]  207  [67, 505]  4  [2, 6]  6.312  3.245 
AIDS  SG  1  1.000  [1.000, 1.000]  197  [79, 443]  93  [48, 152]  6.077  2.744 
MUT  M1  25  0.260  [0.180, 0.350]  154  [24, 575]  484  [13, 2208]  0.506  0.441 
MUT  den  1  0.230  [0.150, 0.310]  194  [69, 381]  267  [96, 741]  6.225  3.175 
MUT  SG  1  0.030  [0.000, 0.070]  208  [72, 437]  688  [134, 2342]  5.940  2.997 
For the sake of readability, we show results for our method and baselines in two different tables, that is, Table II and III, respectively.
In all experiments shown in Table I, there is a parameter setting achieving a detection rate (DCR) statistically larger than . Indeed, in Table II the 95% confidence interval (95CI) of the DCR is above (see symbol in the table). Looking at Table II more in detail, we notice that both the window size and the number of prototypes yield higher DCR. In particular, this can be seen in LO experiments, where all DCR estimates have disjoint 95CIs (i.e., differences are statistically significant). The same phenomenon appears also for LD5, LS, and MUT, e.g., see symbols and in Table II. As far as other figures of merit are concerned, we do not observe statistical evidence of any trend. Still, with the exception of a few cases in MUT, all 95CIs related to ARL0 contain the target value ARL0=200; hence, we may say that the threshold estimation described in Section VA3 completed as expected.
Here, we limit the analysis to the proposed parameter settings for or , since we already reach the highest possible DCR, achieving one hundred detections out of one hundred. We believe that, by increasing the window size the false alarms will decrease, as our method relies on the central limit theorem. Concerning the number of prototypes, we point out that, in the current implementation of the methodology, the number of parameters to be estimated scales as ; accordingly, we need to increase the number of samples.
The second experimental analysis addresses the performance assessment of the three baselines of Section VA5 on the experiments of Table I. The results reported in Table III show that, in some cases, the considered baselines achieve sound performance, which is comparable to the one shown in Table II. Comparing Table III with Table II, by intersecting the 95% confidence intervals, we notice that there is always one of the proposed methods which attains DCR that is statistically equivalent or better than the baselines, see symbol in Table II. In particular, the proposed method performs significantly better than the baselines on the MUT dataset.
Finally, Table III shows that the method based on edge density is significantly more accurate in terms of DCR than the one based on spectral gap in almost all experiments; confidence intervals at level 95% do not intersect. The first method performed better than the degenerate one (M1) with only one prototype: see LO and LS. Conversely, M1 outperforms on LD5.
Vi Conclusions
In this paper, we proposed a methodology for detecting changes in stationarity in streams of graphs. Our approach allows to handle general families of attributed graphs and is not limited to graphs with fixed number of vertices or graphs with (partial) onetoone correspondence between vertices at different time steps (uniquely identified vertices). The proposed methodology consists of an embedding step, which maps input graphs onto numeric vectors, bringing the change detection problem back to a more manageable setting.
We designed and tested a specific method as an instance of the proposed methodology. The embedding has been implemented here as a dissimilarity space representation, hence relying on a suitable set of prototype graphs, which, in our case, provide also a characterization of the nominal condition, and a dissimilarity measure between graphs, here implemented as a graph edit distance. The method then computes the Mahalanobis’ distance between the mean values in two windows of embedded graphs and adopts a CUSUMlike procedure to identify changes in stationarity.
We provided theoretical results proving that, under suitable assumptions, the methodology is able to relate the significance level with which we detect changes in the dissimilarity space with a significance level that changes also occurred in the graph domain; also the vice versa has been proven. We also showed that our methodology can handle more basic, yet relevant scenarios with uniquely identified vertices.
Finally, we performed experiments on IAM graph datasets showing that the methodology can be applied both to geometric graphs (2dimensional drawings) and graphs with categorical attributes (molecules), as instances of possible data encountered in realworld applications. Results show that the proposed method attains at least comparable (and often better) results w.r.t. other change detectors for graph streams.
In conclusion, we believe that the proposed methodology opens the way to designing sound change detection methods for sequences of attributed graphs with possibly timevarying topology and nonidentified vertices. In future studies, we plan to work on realworld applications and focus on the automatic optimization of relevant parameters affecting the performance.
Appendix A Correspondence between probability spaces
Consider the measurable space introduced in Section IIIA2, and the probability space of Section IA.
Let us define the preimage function , with and consider the smallest algebra containing all open balls^{4}^{4}4A ball is defined as a set of all graphs having distance w.r.t. a reference graph smaller than the radius . and preimage sets w.r.t. any
and a generic probability density function
on . Then, we can define the function as in Equation (2).The triple is a probability space. The following three properties provide a proof that is a probability measure on :

,

, and

for any countable collection of pairwise disjoint sets, , hence the sets are pairwise disjoint and .
Notice also that, indicating with the image set of , we have for any such that .
Appendix B Fréchet mean
Given a probability space defined on a metric space , we consider a random sample .
Ba Definition of Fréchet mean and variation
For any object , let us define the functions and . A Fréchet sample (population) mean is any object attaining the minimum of the function (). We point out that the minimum might not exist in and, if it does, it can be attained at multiple objects. Whenever the minimum exists and is unique, we refer to it as (). In addition, we define Fréchet sample (population) variation as the infimum ().
BB Fréchet mean in Euclidean spaces
In the case of a set and distance , the space is Euclidean. Firstly, we show that and , then we show that
(11) 
1. The following equality holds
(12) 
as we will show below, in Part 3. This result proves that the minimum is attained in the expectation .
2. Similarly, in the ‘sample’ case
proving that .
3. The results of previous Parts 1 and 2 are derived from the following three equalities: for any
4. Let us move on proving the second result. Notice that
5. Then,
Now, thanks to the independence of the observations