Anomaly and Change Detection in Graph Streams through Constant-Curvature Manifold Embeddings

05/03/2018 ∙ by Daniele Zambon, et al. ∙ University of Exeter ∙ USI Università della Svizzera italiana

Mapping complex input data into suitable lower-dimensional manifolds is a common procedure in machine learning. This step is beneficial mainly for two reasons: (1) it reduces the data dimensionality and (2) it provides a new data representation possibly characterised by convenient geometric properties. Euclidean spaces are by far the most widely used embedding spaces, thanks to their well-understood structure and the wide availability of consolidated inference methods. However, recent research has demonstrated that many types of complex data (e.g., those represented as graphs) are actually better described by non-Euclidean geometries. Here, we investigate how embedding graphs on constant-curvature manifolds (hyper-spherical and hyperbolic manifolds) affects the ability to detect changes in sequences of attributed graphs. The proposed methodology consists in embedding graphs into a geometric space and performing change detection there by means of conventional methods for numerical streams. The curvature of the space is a parameter that we learn in order to reproduce the geometry of the original application-dependent graph space. Preliminary experimental results show the potential of representing graphs by means of curved manifolds, in particular for change and anomaly detection problems.


1 Introduction

Not all data structures are well described by Euclidean geometry. Important examples include the hyperbolic geometry underlying complex networks [wu2015emergent, boguna2010sustaining], numeric features augmented with pairwise relations [poincare_embedding], and the irregular/curved shapes typical of computer vision [bronstein2016geometric, straub2015dirichlet].

Here, we elaborate on our previous contribution on change detection in sequences of attributed graphs [2017arXiv170606941Z]. We propose and experimentally evaluate the effectiveness of using non-Euclidean embedding spaces for graphs in anomaly and change detection problems. More precisely, in addition to the embedding based on the dissimilarity representation (see, e.g., [2017arXiv170606941Z]), here we take into account mappings into three additional embedding spaces characterised by a constant curvature: (i) Euclidean $d$-space, (ii) the $d$-sphere and (iii) $d$-dimensional hyperbolic space [bridson2013metric]. Constant-curvature manifolds allow the designer to easily learn the curvature from the application data, since it is controlled by a single scalar parameter $\kappa$. In turn, this allows one to identify the appropriate embedding as (i) Euclidean, $\kappa = 0$; (ii) spherical, $\kappa > 0$; or (iii) hyperbolic, $\kappa < 0$. We denote a generic manifold with constant curvature $\kappa$ as $\mathcal{M}_\kappa$. In particular, we show that the family of graphs considered in this paper is better embedded in a hyperbolic manifold than in a spherical one.

In this paper, we address the problem of monitoring a stream $g_1, g_2, \dots, g_t, \dots$ of i.i.d. attributed graphs generated by a process $\mathcal{P}$ according to a nominal distribution $Q_0$. We say that a change in stationarity occurs at change time $\tau$ if graph $g_t$ for $t \ge \tau$ is drawn according to a non-nominal distribution $Q_1 \ne Q_0$, i.e.,

$$g_t \sim \begin{cases} Q_0, & t < \tau, \\ Q_1, & t \ge \tau. \end{cases}$$

In the same framework, an anomalous event is a graph $g_t$ which is very unlikely to have been generated by the nominal distribution $Q_0$, given a confidence level. The proposed monitoring methodology is the same for anomaly and change detection; the difference lies in the final statistic adopted for the detection (see Section 4).

In the sequel, we consider the space $\mathcal{G}$ of attributed graphs and a distance $\delta(\cdot,\cdot)$ between two graphs (e.g., the graph edit distance [riesen2013novel]). Since the space of attributed graphs is rarely described by a Euclidean geometry, acting directly on $\mathcal{G}$ is non-trivial and usually computationally expensive due to the cost of the graph matching procedures [gm_survey, emmert2016fifty]. In order to process graphs for change detection, this paper proposes an embedding procedure where the generic graph $g_t$ is mapped onto a point $x_t$ lying on a constant-curvature manifold $\mathcal{M}_\kappa$. A numeric stream $x_1, x_2, \dots$ is hence generated by the embedding, through the map $\phi: \mathcal{G} \to \mathcal{M}_\kappa$, and a detection test is finally applied to the $x$-stream.

The present paper contributes by exploiting the embedding of graphs onto Riemannian manifolds in order to subsequently apply anomaly and change detection tests on the manifold. It also proposes a prototype-based embedding map that can deal with the out-of-sample problem, that is, the problem of embedding newly observed graphs without reconfiguring the system.

The paper is structured as follows. Section 2 introduces the considered constant-curvature manifolds. In Section 3, we describe the embedding techniques and how they are extended in the out-of-sample case. Section 4 provides details on the anomaly and change detection tests taken into account in this paper. Preliminary experiments and related discussions follow in Section 5. Conclusions and future directions are provided in Section 6.

2 Constant-curvature manifolds

A Riemannian manifold is a differentiable manifold with a scalar product defined on the tangent bundle. By means of the scalar product, a metric distance is derived. We denote a generic manifold with $\mathcal{M} = (M, \rho)$, where $M$ is the domain space and $\rho(\cdot,\cdot)$ is the geodesic distance between two points in $M$ [bridson2013metric]. Manifold $\mathcal{M}$ is also locally homeomorphic to a $d$-dimensional Euclidean space; accordingly, we say that the dimension of the manifold is $d$. $\mathcal{M}_\kappa$ denotes a special family of Riemannian manifolds characterised by a constant curvature $\kappa$ that defines the shape of the manifold.

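We explore three constant-curvature manifolds by acting on $\kappa$: Euclidean, $\kappa = 0$; spherical, $\kappa > 0$; and hyperbolic, $\kappa < 0$. Manifold $\mathcal{M}_0$ is the usual $d$-dimensional Euclidean space equipped with the metric distance

$$\rho(x, y) = \|x - y\|_2 = \sqrt{\langle x - y, x - y \rangle}, \tag{1}$$

where $\langle x, y \rangle$ is the inner product $x^\top y$.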

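The $d$-dimensional spherical manifold $\mathcal{M}_\kappa$ ($\kappa > 0$) has support on the $d$-sphere of radius $r = \kappa^{-1/2}$,

$$\{x \in \mathbb{R}^{d+1} : \langle x, x \rangle = r^2\}.$$

As a distance between two points $x, y$ on the sphere, the geodesic metric distance is

$$\rho(x, y) = r \arccos\left(\frac{\langle x, y \rangle}{r^2}\right). \tag{2}$$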

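Hyperbolic manifolds are constructed over a $(d+1)$-dimensional pseudo-Euclidean space equipped with the scalar product (see footnote 1)

$$\langle x, y \rangle = \sum_{i=1}^{d} x_i y_i - x_{d+1} y_{d+1}.$$

Footnote 1: Not being positive definite, this is not a proper inner product. With a slight abuse of notation, we denote by $\langle \cdot, \cdot \rangle$ both the Euclidean and the pseudo-Euclidean scalar products. This should not mislead the reader, since the curvature makes clear which scalar product is involved.

Similarly to the spherical case, for every $\kappa < 0$, we define the hyperbolic manifold $\mathcal{M}_\kappa$ as

$$\{x \in \mathbb{R}^{d+1} : \langle x, x \rangle = -r^2\},$$

with $r = (-\kappa)^{-1/2}$. The geodesic metric distance is

$$\rho(x, y) = r \operatorname{arccosh}\left(-\frac{\langle x, y \rangle}{r^2}\right). \tag{3}$$

Quantity $r$ plays a role analogous to that of the radius in the spherical case; as such, we will call it the radius also in the hyperbolic case.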

As for conventional Euclidean spaces, the distance between two points on constant-curvature manifolds can be obtained by means of the scalar product. Further, we can represent the scalar product as a matrix multiplication in the form $\langle x, y \rangle = x^\top I_\kappa y$, where $I_\kappa$ is the identity matrix for $\kappa \ge 0$, and the identity matrix except for the element in position $(d+1, d+1)$, which is set to $-1$, in the $\kappa < 0$ case.
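As an illustration of how the three geodesic distances reduce to a single scalar product, the following minimal sketch evaluates Equations (1), (2) and (3) with NumPy. The function names are hypothetical, and the code assumes that the coordinate carrying the minus sign in the hyperbolic case is the last one and that the radius is $r = |\kappa|^{-1/2}$; the clipping steps are only a guard against round-off and are not part of the definitions.

```python
import numpy as np

def scalar_product(x, y, kappa):
    """<x, y> = x^T I_kappa y; the last coordinate gets a minus sign when kappa < 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    signs = np.ones(x.shape[-1])
    if kappa < 0:
        signs[-1] = -1.0  # assumed position of the negative entry of I_kappa
    return float(np.sum(signs * x * y))

def geodesic_distance(x, y, kappa):
    """Geodesic distance on the constant-curvature manifold with curvature kappa."""
    if kappa == 0:
        diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
        return float(np.sqrt(scalar_product(diff, diff, 0.0)))
    r = 1.0 / np.sqrt(abs(kappa))
    s = scalar_product(x, y, kappa) / r**2
    if kappa > 0:
        # Round-off may push the argument slightly outside [-1, 1].
        return float(r * np.arccos(np.clip(s, -1.0, 1.0)))
    # Hyperbolic case: -<x, y> / r^2 >= 1 for points on the same sheet.
    return float(r * np.arccosh(max(-s, 1.0)))

# Two points on the unit sphere (kappa = 1), a quarter of a great circle apart.
d_sphere = geodesic_distance([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], kappa=1.0)

# Two points on the unit hyperboloid (kappa = -1), on the curve t -> (sinh t, 0, cosh t).
p = [0.0, 0.0, 1.0]
q = [np.sinh(2.0), 0.0, np.cosh(2.0)]
d_hyper = geodesic_distance(p, q, kappa=-1.0)
```

In this toy example the spherical distance equals $\pi/2$ and the hyperbolic distance equals the parameter difference, 2, up to floating-point error.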

3 Embedding based on dissimilarity matrix

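The proposed methodology consists in mapping a sequence of attributed graphs, $g_1, g_2, \dots, g_t, \dots$ with $g_t \in \mathcal{G}$, into a sequence of numeric vectors, $x_1, x_2, \dots, x_t, \dots$ with $x_t \in \mathcal{M}_\kappa$. Following [wilson2014spherical], we consider a symmetric distance measure $\delta(\cdot,\cdot)$ operating in $\mathcal{G}$ and propose to adopt an embedding that aims at preserving the distances, i.e.,

$$\delta(g_i, g_j) \approx \rho(\phi(g_i), \phi(g_j)), \tag{4}$$

where $\phi: \mathcal{G} \to \mathcal{M}_\kappa$ is the embedding map and $\rho(\cdot,\cdot)$ is the corresponding distance operating in $\mathcal{M}_\kappa$; see Equations (1), (2) and (3).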

The configuration phase of the method is composed of three conceptually different steps. In the first one, we learn the underlying manifold by processing graphs taken from a finite training set $T$. Since there is a one-to-one correspondence between the curvature $\kappa$ and the manifolds $\mathcal{M}_\kappa$ introduced above, this step aims at estimating the curvature $\kappa$ for the application at hand. In the second step, we select a set of embedded points associated with $T$, say $\mathcal{X}$, satisfying (4). In other terms, the distances between pairs of vectors should match those of the original graphs as closely as possible. The third step addresses the out-of-sample problem, that is, how to apply map $\phi$ to all graphs not in the training set. To solve this problem, we propose to adopt a prototype-based technique, where prototypes serve as landmarks to find the corresponding position of graphs on $\mathcal{M}_\kappa$. The next subsections discuss the three steps.

3.1 Embedding the training set

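The first two steps of the procedure, namely defining a curvature $\kappa$ and determining a configuration $\mathcal{X}$ for the training graphs $T$, are carried out simultaneously; as such, they are treated as a single step.

Let us arrange the vectors of configuration $\mathcal{X}$ row-wise, so that set $\mathcal{X}$ is represented as the matrix $X$. Let $D$ be the square matrix containing the pair-wise distances between graphs in $T$, in the sense that the generic element $(i, j)$ of $D$ is $\delta(g_i, g_j)$.

As mentioned in Section 2, the distance $\rho(x, y)$ can be derived from the scalar product $\langle x, y \rangle$; as such, we consider matrix $C$, collecting all pair-wise scalar products. Matrix $C$ can be written as $X I_\kappa X^\top$; however, due to the relation between distances and scalar products, matrix $C$ can also be computed from $D$ (Equations (1), (2), (3)) as

$$C = \begin{cases} -\frac{1}{2} J D^{\circ 2} J, & \kappa = 0, \\ r^2 \cos(D / r), & \kappa > 0, \\ -r^2 \cosh(D / r), & \kappa < 0, \end{cases} \tag{5}$$

where $\circ 2$ represents exponentiation to the power 2, and operations $(\cdot)^{\circ 2}$, $\cos(\cdot)$, and $\cosh(\cdot)$ are applied component-wise to matrix $D$. $J = I - \frac{1}{n} \mathbf{1}\mathbf{1}^\top$ is the centering matrix, with $n$ the number of training graphs. Further details can be found in [pekalska2005dissimilarity, wilson2014spherical].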
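The conversion from a distance matrix to a matrix of scalar products can be sketched as follows. This is a minimal illustration rather than the implementation used in the experiments: it assumes the relations obtained by inverting Equations (2) and (3), namely $r^2 \cos(D/r)$ and $-r^2 \cosh(D/r)$, together with the classical double-centering formula for the Euclidean case. The function name and the toy data are hypothetical.

```python
import numpy as np

def scalar_products_from_distances(D, kappa):
    """Matrix of pair-wise scalar products implied by the distance matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    if kappa == 0:
        J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
        return -0.5 * J @ (D ** 2) @ J        # classical double centering
    r = 1.0 / np.sqrt(abs(kappa))
    if kappa > 0:
        return r**2 * np.cos(D / r)           # inverse of Equation (2)
    return -r**2 * np.cosh(D / r)             # inverse of Equation (3)

# Spherical check: three points on the unit circle and their geodesic distances.
X_circle = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
D_circle = np.arccos(np.clip(X_circle @ X_circle.T, -1.0, 1.0))
C_sphere = scalar_products_from_distances(D_circle, kappa=1.0)

# Euclidean check: the result equals the Gram matrix of the centred points.
P = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
D_plane = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
C_plane = scalar_products_from_distances(D_plane, kappa=0.0)
P_centred = P - P.mean(axis=0)
```

When the distances are exactly realisable on the chosen manifold, as in these two toy cases, the recovered matrix coincides with the Gram matrix of the points up to round-off.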

Ideally, we would like to solve the equation $X I_\kappa X^\top = C$ w.r.t. $X$ to select the embedded points, where $C$ is given by (5). In practical applications, however, obtaining isometric embeddings is difficult, and one has to solve instead the optimisation problem

$$\min_{X} \| X I_\kappa X^\top - C \|_F, \tag{6}$$

subject to the constraint that every row $x_i$ of $X$ lies on $\mathcal{M}_\kappa$. Problem (6) can be solved thanks to the eigen-decomposition $C = U \Lambda U^\top$ of the symmetric matrix $C$. In the sequel, we adopt the convention that the computed diagonal matrix $\Lambda$ stores the eigenvalues in ascending order. We solve (6) by first assuming a null-curvature space and then considering the non-null curvature case.

In the case of null curvature, the problem can be solved by means of classical Multi-Dimensional Scaling (MDS) [pekalska2005dissimilarity]. The vector configuration of the solution is hence

$$X = U_d \Lambda_d^{1/2},$$

where $U_d$ and $\Lambda_d$ are the matrices reduced to the $d$ largest eigenvalues (see footnote 2). Here the problem is unconstrained and $X$ can be any matrix in $\mathbb{R}^{n \times d}$.

Footnote 2: Negative eigenvalues can be discarded. Their presence is a strong indicator that the data do not fit well in a Euclidean space.
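The null-curvature case can be illustrated with a short sketch of classical MDS: double-centre the squared distances, take the eigen-decomposition, and keep the $d$ largest eigenvalues. The function name and the random toy data are hypothetical; negative eigenvalues among the retained ones are clipped to zero, in line with the remark that they can be discarded.

```python
import numpy as np

def classical_mds(D, d):
    """Embed n points in R^d from their (n, n) distance matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    C = -0.5 * J @ (D ** 2) @ J                 # scalar products w.r.t. the centroid
    eigvals, eigvecs = np.linalg.eigh(C)        # eigh returns ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:d]         # indices of the d largest eigenvalues
    lam = np.clip(eigvals[top], 0.0, None)      # discard negative eigenvalues
    return eigvecs[:, top] * np.sqrt(lam)       # X = U_d Lambda_d^{1/2}

# Toy check: distances between planar points are reproduced exactly with d = 2.
rng = np.random.default_rng(0)
points = rng.normal(size=(6, 2))
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
X = classical_mds(D, d=2)
D_rec = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
```

The recovered configuration `X` matches the original points only up to a rigid motion, which is why the check compares distance matrices rather than coordinates.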

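To solve (6) in curved spaces, we exploit a different formulation,

$$\min_{X} \| U^\top X I_\kappa X^\top U - \Lambda \|_F,$$

which is an equivalent formulation of (6) where $C = U \Lambda U^\top$. Assuming $U^\top X I_\kappa X^\top U$ to be diagonal, with diagonal vector $\lambda^*$, the problem becomes

$$\min_{\lambda^*} \| \lambda^* - \lambda_0 \|_2, \tag{7}$$

where $\lambda_0$ is the diagonal vector containing the eigenvalues of $C$. The solution of (7) depends on the sign of the curvature $\kappa$, which also requires the application of different constraints on $\lambda^*$ [wilson2014spherical].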

In particular, for $\kappa > 0$, it is required that holds for all and . The solution vector lies on the line passing through and , and can always be provided in closed form. Finally, a solution to (6), given (7), is obtained by dropping the smallest components of and reconstructing from it, i.e., .

In the hyperbolic case, (7) can be formulated likewise, with the difference that we need to take into account the constraints coming from the pseudo-Euclidean geometry. The constraint is for all , , and . Unlike in the spherical case, finding a solution to the original problem (6) by means of (7) is not always possible; this happens when , where the line passing through and does not intersect the feasible set. Except for that case, a solution is always found in closed form on the line touching and . A solution is constructed from and by dropping the smallest components of except for ; therefore, the solution is .

In both the spherical and hyperbolic cases, dropping some components of might lead to a vector configuration lying outside the feasible set. When the data points are close to set , this can be mitigated by projecting the points back onto the manifold.

The procedure described in this section is performed for a fixed radius $r$. A possible way to identify a suitable curvature $\kappa$ is to compare the distance matrix of the resulting configuration with the graph distance matrix $D$; a suitable $\kappa$ is then selected as the one minimising the quantity

(8)

Such a measure estimates the distortion of the embedding on the training set and guides the designer towards the most appropriate embedding.

3.2 Out-of-sample embedding

During the test (operational) phase of the method, every time a new graph $g_t$ is observed, we propose to map it onto a vector $x_t$ lying on the manifold $\mathcal{M}_\kappa$. Here, we consider a set of suitably chosen landmarks (prototypes) on the manifold $\mathcal{M}_\kappa$, and find the position of the new graph by means of them. The prototype set $R$ is composed of $M$ graphs $r_1, \dots, r_M$. Set $R$ is associated with a set $X_R$ collecting the corresponding positions on the manifold. As done before, we assume $X_R$ to be in matrix form, with the $m$-th row being given by $\phi(r_m)$.

Let us assume that set $R$ is given. In order to embed graph $g_t$, the first step consists in computing its dissimilarity representation $y_t$: the generic $m$-th component of $y_t$ is the graph distance $\delta(g_t, r_m)$. Once $y_t$ is computed, the embedded vector is

where is the vector obtained by applying (5) to .

The selection of the prototype set is performed only once during the training phase and, hence, does not change during the operational one. We adopt the $k$-centres algorithm on the configuration of embedded training points. The algorithm covers the configuration points using balls of equal radius centred on the prototype images; this is done by minimising the maximal distance, over the configuration points, between each point and its closest prototype image, until a suitable convergence criterion is met [pkekalska2006prototype].
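To give a concrete feeling for prototype selection by covering, the sketch below uses the farthest-first heuristic, a simple greedy approximation of the $k$-centres objective. It is not the iterative procedure of [pkekalska2006prototype]; it is swapped in here only because it is short and deterministic. The function name and the toy configuration are hypothetical, and the input is any matrix of pair-wise distances between configuration points (e.g., geodesic distances on the manifold).

```python
import numpy as np

def k_centres_farthest_first(dist, k, first=0):
    """Greedily pick k centres; returns their indices and the covering radius.

    dist: (n, n) matrix of pair-wise distances between configuration points.
    """
    dist = np.asarray(dist, dtype=float)
    centres = [first]
    closest = dist[first].copy()          # distance of each point to its nearest centre
    for _ in range(k - 1):
        nxt = int(np.argmax(closest))     # the worst-covered point becomes a centre
        centres.append(nxt)
        closest = np.minimum(closest, dist[nxt])
    return centres, float(closest.max())

# Toy configuration: two tight clusters on the real line.
pts = np.array([0.0, 0.1, 0.2, 10.0, 10.1])
dist = np.abs(pts[:, None] - pts[None, :])
centres, radius = k_centres_farthest_first(dist, k=2)
```

On the toy data one centre is picked in each cluster, and the covering radius is the largest within-cluster distance to the chosen centre.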

4 Detection test

In this paper, we monitor a process generating attributed graphs in order to detect changes in its distribution as well as anomalous events that might occur. As we have seen in the previous section, the proposed methodology consists in embedding the input graphs into a manifold $\mathcal{M}_\kappa$ and performing the detection on the numeric stream $x_1, x_2, \dots$. Denote by $F_0$ the nominal distribution of $x_t$ induced by the nominal graph distribution $Q_0$ through the mapping $\phi$. Accordingly, the non-nominal distribution of $x_t$ induced by $Q_1$ is denoted by $F_1$.

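The detection method requires the evaluation of the mean of the random vector $x_t$ which, for a Riemannian manifold, is defined differently from the Euclidean case. In particular, here, we adopt the Fréchet mean [bhattacharya2017omnibus]:

$$\mu = \arg\min_{x \in \mathcal{M}_\kappa} \mathbb{E}\left[\rho(x, x_t)^2\right]. \tag{9}$$

In the case of a finite i.i.d. sample $x_1, \dots, x_n$, we can estimate $\mu$ using its sample counterpart:

$$\hat{\mu} = \arg\min_{x \in \mathcal{M}_\kappa} \sum_{i=1}^{n} \rho(x, x_i)^2. \tag{10}$$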
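The sample Fréchet mean generally has no closed form on a curved manifold. As a minimal sketch, the code below approximates it on the unit sphere ($\kappa = 1$) with a standard fixed-point iteration based on the logarithm and exponential maps; this particular solver is an assumption made for illustration and is not prescribed by the method. The function names are hypothetical, and the points are assumed to lie in a common open hemisphere, so that the minimiser is unique and the initial guess is well defined.

```python
import numpy as np

def log_map(m, x):
    """Tangent vector at m pointing towards x, with norm equal to the geodesic distance."""
    c = np.clip(np.dot(m, x), -1.0, 1.0)
    theta = np.arccos(c)
    if theta < 1e-12:
        return np.zeros_like(m)
    return theta * (x - c * m) / np.sin(theta)

def exp_map(m, v):
    """Point reached from m by following the geodesic with initial velocity v."""
    norm = np.linalg.norm(v)
    if norm < 1e-12:
        return m
    return np.cos(norm) * m + np.sin(norm) * v / norm

def frechet_mean_sphere(points, n_iter=100, tol=1e-10):
    """Approximate minimiser of the sum of squared geodesic distances on the unit sphere."""
    points = np.asarray(points, dtype=float)
    m = points.mean(axis=0)
    m = m / np.linalg.norm(m)              # initial guess: projected Euclidean mean
    for _ in range(n_iter):
        step = np.mean([log_map(m, x) for x in points], axis=0)
        m = exp_map(m, step)
        if np.linalg.norm(step) < tol:
            break
    return m

# Three points on a great circle at angles 0, 0.2 and 1.0: the mean sits at angle 0.4.
angles = np.array([0.0, 0.2, 1.0])
pts = np.stack([np.cos(angles), np.sin(angles), np.zeros(3)], axis=1)
mu_hat = frechet_mean_sphere(pts)
```

For points on a short arc of a great circle, the Fréchet mean coincides with the point at the average angle, which gives a simple way to check the sketch.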

In order to identify changes, we apply a change detection test inspired by the CUSUM accumulation process. At every time step $t$, we compute a statistic $s_t$ and sequentially update another statistic $S_t$ that aggregates information from different time steps. Statistic $S_t$ is computed as follows:

$$S_t = \max\{0,\; S_{t-1} + s_t - q\},$$

where $q$ is a sensitivity parameter. During the nominal regime, statistic $S_t$ is expected to be zero. The hypothesis test is of the form:

$$H_0: \mathbb{E}[S_t] = 0, \qquad H_1: \mathbb{E}[S_t] > 0.$$

When statistic $S_t$ exceeds a certain threshold $h_t$, a change is detected with a given confidence. The estimated time of the change is

$$\hat{\tau} = \inf\{t : S_t > h_t\}.$$

Quantity $h_t$ is set to yield a user-defined significance level $\alpha$:

$$\alpha = \mathbb{P}\left(S_t > h_t \mid H_0,\; S_i \le h_i \text{ for all } i < t\right). \tag{11}$$

The detection of an anomalous graph is performed similarly, but relies on a different hypothesis test. An anomaly is detected at time $t$ if the observed $s_t$ (the same statistic as the one used for change detection) is larger than a threshold. The critical value can be estimated according to a user-defined significance level.
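The accumulation process can be sketched in a few lines. The code below assumes the common CUSUM-style recursion $S_t = \max\{0, S_{t-1} + s_t - q\}$ with $S_0 = 0$ and, for simplicity, a constant threshold `h` rather than thresholds tuned as in (11); the function name, the synthetic stream and the values of `q` and `h` are all hypothetical.

```python
import numpy as np

def cusum_change_time(s, q, h):
    """Return the first (0-based) time step at which the accumulated statistic
    exceeds the threshold h, or None if no change is detected."""
    S = 0.0
    for t, s_t in enumerate(s):
        S = max(0.0, S + s_t - q)   # accumulate only the excess over q
        if S > h:
            return t
    return None

# Synthetic scalar stream: the statistic jumps from 1 to 2 at time step 50.
s = np.concatenate([np.full(50, 1.0), np.full(50, 2.0)])
detected = cusum_change_time(s, q=1.0, h=5.0)
no_change = cusum_change_time(np.full(100, 1.0), q=1.0, h=5.0)
```

With these toy values the accumulated statistic stays at zero before the change and grows by one unit per step afterwards, so the alarm is raised a few steps after the true change time; the detection delay grows with `h` and shrinks with the size of the shift.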

5 Experiments

Embedding | Difficulty | | | DCR mean | DCR 95% C.I. | ARL_0 mean | ARL_0 95% C.I. | ARL_1 mean | ARL_1 95% C.I.
Graph Domain | 2 | - | - | 1.000 | [1.000, 1.000] | 124 | [41, 435] | 1 | [1, 2]
Graph Domain | 4 | - | - | 0.990 | [0.970, 1.000] | 124 | [41, 435] | 6 | [2, 38]
Graph Domain | 6 | - | - | 0.890 | [0.830, 0.950] | 124 | [41, 435] | 33 | [4, 178]
Graph Domain | 8 | - | - | 0.650 | [0.560, 0.740] | 124 | [41, 435] | 85 | [19, 248]
Graph Domain | 10 | - | - | 0.600 | [0.500, 0.690] | 124 | [41, 435] | 114 | [25, 489]
Graph Domain | 12 | - | - | 0.560 | [0.460, 0.660] | 124 | [41, 435] | 115 | [33, 398]
Spherical Man. | 2 | 30 | 15 | 0.970 | [0.930, 1.000] | 93 | [37, 239] | 3 | [1, 10]
Spherical Man. | 4 | 30 | 15 | 0.960 | [0.920, 0.990] | 93 | [37, 239] | 6 | [1, 31]
Spherical Man. | 6 | 30 | 15 | 0.970 | [0.930, 1.000] | 93 | [37, 239] | 11 | [3, 44]
Spherical Man. | 8 | 30 | 15 | 0.820 | [0.740, 0.890] | 93 | [37, 239] | 67 | [14, 317]
Spherical Man. | 10 | 30 | 15 | 0.600 | [0.500, 0.690] | 93 | [37, 239] | 90 | [22, 257]
Spherical Man. | 12 | 30 | 15 | 0.580 | [0.480, 0.680] | 93 | [37, 239] | 89 | [27, 237]
Euclidean Man. | 2 | 30 | 15 | 0.040 | [0.010, 0.080] | 94 | [40, 209] | 137 | [5, 369]
Euclidean Man. | 4 | 30 | 15 | 0.980 | [0.950, 1.000] | 94 | [40, 209] | 19 | [3, 87]
Euclidean Man. | 6 | 30 | 15 | 1.000 | [1.000, 1.000] | 94 | [40, 209] | 21 | [7, 60]
Euclidean Man. | 8 | 30 | 15 | 0.550 | [0.450, 0.650] | 94 | [40, 209] | 86 | [28, 215]
Euclidean Man. | 10 | 30 | 15 | 0.500 | [0.400, 0.600] | 94 | [40, 209] | 107 | [32, 328]
Euclidean Man. | 12 | 30 | 15 | 0.600 | [0.500, 0.690] | 94 | [40, 209] | 102 | [31, 312]
Hyperbolic Man. | 2 | 30 | 15 | 0.960 | [0.920, 0.990] | 102 | [38, 254] | 8 | [2, 42]
Hyperbolic Man. | 4 | 30 | 15 | 1.000 | [1.000, 1.000] | 102 | [38, 254] | 4 | [2, 11]
Hyperbolic Man. | 6 | 30 | 15 | 0.990 | [0.970, 1.000] | 102 | [38, 254] | 11 | [4, 48]
Hyperbolic Man. | 8 | 30 | 15 | 0.740 | [0.650, 0.820] | 102 | [38, 254] | 73 | [17, 285]
Hyperbolic Man. | 10 | 30 | 15 | 0.690 | [0.600, 0.780] | 102 | [38, 254] | 82 | [25, 227]
Hyperbolic Man. | 12 | 30 | 15 | 0.590 | [0.490, 0.690] | 102 | [38, 254] | 90 | [28, 276]
Dissimilarity Repr. | 2 | 15 | -1 | 1.000 | [1.000, 1.000] | 68 | [35, 131] | 1 | [1, 1]
Dissimilarity Repr. | 4 | 15 | -1 | 1.000 | [1.000, 1.000] | 68 | [35, 131] | 5 | [3, 7]
Dissimilarity Repr. | 6 | 15 | -1 | 0.450 | [0.350, 0.550] | 68 | [35, 131] | 108 | [16, 441]
Dissimilarity Repr. | 8 | 15 | -1 | 0.050 | [0.010, 0.100] | 68 | [35, 131] | 262 | [0, 575]
Dissimilarity Repr. | 10 | 15 | -1 | 0.040 | [0.010, 0.080] | 68 | [35, 131] | 132 | [0, 345]
Dissimilarity Repr. | 12 | 15 | -1 | 0.050 | [0.010, 0.100] | 68 | [35, 131] | 199 | [0, 560]
Table 1: Comparison between embedding on constant-curvature manifolds and change detection operating directly in the graph domain and on the dissimilarity representation. Tests are performed on different datasets of increasing difficulty (larger numbers indicate more difficult detection problems). Parameters $d$ and $M$ represent the dimension of the manifold and the number of prototypes, respectively. Symbol highlights when the Euclidean method performed statistically worse than the spherical or hyperbolic method. Symbol shows when the hyperbolic case performed statistically better than the spherical one.
Figure 1: Embedding distortion of Equation (8) observed for different curvatures $\kappa$. The logarithm of the distortion is computed in the first step of the training stage. The curve is composed of three distinct parts, corresponding to the hyperbolic, Euclidean and spherical manifolds. The curve is not continuous at $\kappa = 0$, and the central filled circle represents the distortion measured for the Euclidean embedding ($\kappa = 0$). The range of considered curvatures has been selected according to the fact that the maximal distance observable on a $d$-sphere of radius $r$ is $\pi r$.

The experimental campaign is designed to assess the impact of using constant-curvature manifolds as embedding spaces when dealing with anomaly and change detection problems in graph streams. In this section, the experiments are limited to the change detection problem, anomaly detection being a simpler one.

In principle, we should select the mapping $\phi$ by estimating the curvature $\kappa$ for the problem at hand. However, for comparison and discussion, here we report the performance for three values of $\kappa$: $\kappa = 0$, and the two values estimated on the spherical and hyperbolic manifolds. These settings are denoted in Table 1 as Euclidean Man., Spherical Man., and Hyperbolic Man., respectively.

In order to compare results, we applied the change detection test in the graph domain (Graph Domain), as explained later, which we consider as a sort of “ground truth”. In fact, all embedding methods considered here introduce a distortion of the original pair-wise distances, which we expect to affect change detection results. We also consider the dissimilarity representation (Dissimilarity Repr.) as a further element of comparison [2017arXiv170606941Z].

Experiments in the graph domain are performed by monitoring the graph edit distance $\delta(\cdot,\cdot)$ [riesen2013novel] between each incoming graph $g_t$ and a mean graph estimated on a set of training graphs. The experiments denoted by Spherical Man., Euclidean Man., and Hyperbolic Man. consider the numeric $x$-stream, where each point $x_t$ is the image of graph $g_t$ embedded through $\phi$. Here, the monitored statistic is the associated geodesic distance $s_t = \rho(\hat{\mu}, x_t)$, where $\hat{\mu}$ is the Fréchet sample mean computed w.r.t. the manifold geometry. Finally, Dissimilarity Repr. monitors the distance between the dissimilarity representation of $g_t$ and the ordinary sample mean computed on the dissimilarity representations of the training set.

We stress that this change detection strategy, based on a scalar score, is adopted here only for comparison purposes. In general, multivariate monitoring would result in more effective detection. However, implementing such a multivariate test in the graph domain is computationally expensive, since it requires graph matching.

The graph streams processed here are based on the Delaunay graph dataset already adopted in [zambon2017detecting]. The dataset contains graphs whose node attributes represent points in the plane, and edges are determined by the Delaunay triangulation of the points. The dataset is composed of several classes. Each class is characterised by an increasing difficulty in distinguishing it from a reference class. Further details can be found in [zambon2017detecting] and are not reported here for the sake of brevity.

5.1 Figures of merit

In monitoring time series for change detection, a common index for performance assessment is the average run length (ARL). This quantity measures the average number of time steps between consecutive alarms raised by the change detector. One of its advantages is its independence from the length of the input stream. The average run length assessed during the nominal regime is denoted by $\mathrm{ARL}_0$, and can be thought of as the inverse of the false alarm rate. Conversely, $\mathrm{ARL}_1$ represents the ARL during the non-nominal regime.

We also considered a further statistic, the detected change rate (DCR). In order to infer whether or not a change has occurred in a sequence, we estimate both $\mathrm{ARL}_0$ and $\mathrm{ARL}_1$; then, we say that a change is detected if $\mathrm{ARL}_0 > \mathrm{ARL}_1$.

As DCR approaches 1, the performance of the detector improves, whereas low values of $\mathrm{ARL}_1$ indicate prompt detection. The threshold is set to yield a user-defined significance level $\alpha$, as in (11). Therefore, $\mathrm{ARL}_0$ is expected to be constant.
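The figures of merit can be sketched as follows. The code makes two simplifying assumptions that are ours and not necessarily those of the experimental setup: the ARL of a segment is estimated as the segment length divided by the number of alarms raised in it, and a change counts as detected in a sequence when the estimated $\mathrm{ARL}_0$ is strictly larger than the estimated $\mathrm{ARL}_1$. The function names and the toy alarm sequences are hypothetical.

```python
import numpy as np

def average_run_length(alarms):
    """Average number of time steps per alarm in a boolean alarm sequence
    (infinity if no alarm is raised)."""
    alarms = np.asarray(alarms, dtype=bool)
    n_alarms = int(alarms.sum())
    return len(alarms) / n_alarms if n_alarms > 0 else float("inf")

def detected_change_rate(alarm_seqs, change_time):
    """Fraction of sequences in which ARL_0 (before the change) exceeds ARL_1 (after)."""
    detected = 0
    for alarms in alarm_seqs:
        arl0 = average_run_length(alarms[:change_time])
        arl1 = average_run_length(alarms[change_time:])
        detected += int(arl0 > arl1)
    return detected / len(alarm_seqs)

# Sequence A: one false alarm before the change, an alarm every 5 steps after it.
seq_a = np.zeros(200, dtype=bool)
seq_a[10] = True
seq_a[100::5] = True
# Sequence B: the detector never fires.
seq_b = np.zeros(200, dtype=bool)

arl0_a = average_run_length(seq_a[:100])
arl1_a = average_run_length(seq_a[100:])
dcr = detected_change_rate([seq_a, seq_b], change_time=100)
```

In this toy case the change is detected in the first sequence only, so the resulting DCR is one half.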

5.2 Parameters setting

Independently of the particular embedding, we have to compute distances between graphs and the prototypes. Here we used a Graph Edit Distance (GED) algorithm with polynomial complexity based on the Volgenant and Jonker algorithm [riesen2013novel]. Distances on the Euclidean, spherical, and hyperbolic manifolds are computed by means of Equations (1), (2), and (3), respectively.

We generated one hundred sequences of graphs. Each sequence is produced by bootstrapping graphs from a class until the change time $\tau$ is reached. Then, graphs are bootstrapped from a different class, allowing us to simulate an i.i.d. sequence of graphs. The first graphs are reserved to learn the curvature $\kappa$ (i.e., the manifold $\mathcal{M}_\kappa$) and for prototype selection. Further graphs are used to train the change detector: on these data, the Fréchet mean and the threshold are estimated. In general, we are not able to compute the population mean (9) in the graph domain or on a manifold; hence we estimate it with the sample mean (10). The threshold is set so as to yield a 99% confidence level for the change detection test. The sensitivity parameter $q$ of the test is set to the estimated third quartile of the statistic $s_t$.
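As a small illustration of how such quantities can be estimated from training data, the sketch below computes the third quartile of a scalar training statistic, used as sensitivity parameter, and an empirical quantile that could serve as a critical value for the anomaly test at significance level $\alpha$. Using an empirical quantile for the anomaly threshold is our assumption; the variable names and the synthetic statistic are hypothetical.

```python
import numpy as np

# Synthetic training statistic (e.g., distances of training points from the mean).
s_train = np.arange(1, 101, dtype=float)

alpha = 0.01
q = float(np.quantile(s_train, 0.75))                # sensitivity parameter: third quartile
h_anomaly = float(np.quantile(s_train, 1 - alpha))   # empirical critical value (assumption)

# Flag new observations whose statistic exceeds the critical value.
s_new = np.array([50.0, 99.5, 120.0])
is_anomaly = s_new > h_anomaly
```

With default linear interpolation, `np.quantile` returns values between order statistics, so both estimates depend smoothly on the training sample.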

Finally, graphs have been generated to emulate the operational phase. In particular, the first part of each sequence is produced according to the nominal distribution, whereas the second part simulates the change and hence is representative of the non-nominal regime.

5.3 Results

Table 1 shows the results of the experiments. First of all, the 95% confidence intervals related to the estimated $\mathrm{ARL}_0$ provide evidence that threshold learning completed successfully; indeed, the 95% confidence intervals contain the target value of $\mathrm{ARL}_0$, here $1/\alpha = 100$. Notice that the $\mathrm{ARL}_0$ estimates are identical when computed within the same embedding. This is because a seed for the pseudo-random generator has been set for experiment reproducibility.

In Table 1, we observe that different curvatures of the embedding space produce different DCRs. Our results show that the hyperbolic and spherical spaces performed better than the Euclidean one in the considered application. On the other hand, spherical and hyperbolic embeddings attain comparable results and, in one case, the hyperbolic embedding outperformed the spherical one.

In Figure 1, the distortion (8) introduced by the embedding is analysed as a function of the curvature. In the region of positive curvature, we spot a local minimum around . Looking at negative curvatures, there is a minimum around which is lower than the distortion at . The Euclidean case, with $\kappa = 0$, produces a distortion worse than the hyperbolic and spherical ones. The behaviour observed in terms of distortion (8) is in agreement with the detection rates in Table 1; notice, however, that the distortion assesses the goodness of the embedding on the training set only, and hence is not directly related to the change detection performance. We observe also that, although the three types of embedding are obtained from different optimisation problems (6) and (7), the distortion appears to be continuous around $\kappa = 0$.

Finally, we conclude that the nominal class turned out to favour embedding on curved spaces, in particular those with negative curvatures. However, the results reported here focus only on a particular dataset and hence more experiments are needed to confirm these findings.

The method Graph Domain appears, in general, to perform well, even though not all differences are statistically significant. Regarding Dissimilarity Repr., the detection is effective as long as the problem is sufficiently simple. Once the two distributions $Q_0$ and $Q_1$ have the same support, the detection fails completely. Results are different from the ones published in [zambon2017detecting]; this is justified by the different experimental setting adopted here. First of all, the change detection test in [zambon2017detecting] is a multivariate one, which increases the amount of information extracted and monitored from the graph stream. As described above, the scalar setting adopted in the present paper is necessary to attain a fair comparison with Graph Domain. Secondly, the multivariate change detection test monitors windows of data, making it possible to exploit a known distribution and, consequently, to set virtually exact thresholds.

Overall, we conclude that when the mean graphs of the two distributions, $Q_0$ and $Q_1$, are far apart, simpler approaches, like Dissimilarity Repr., are more effective than manifold-based ones. In fact, the manifold is learned on the nominal distribution, hence it may generalise poorly to very diverse graphs. Conversely, when the problem at hand gets more challenging and the two distributions almost overlap (as in smooth drift types of changes), we observe the effectiveness, in change detection problems, of embedding onto manifolds of constant, non-zero curvature.

6 Conclusions

Performing change detection in graph domains is a challenging problem from both a theoretical and a computational point of view. Recently, we have proposed a methodology to perform change detection on sequences of graphs based on an embedding procedure [2017arXiv170606941Z]: graphs are mapped to numeric vectors so that change detection can be performed in a standard geometric setting. The embedding was realised by means of the dissimilarity representation. In this paper, we elaborated on our previous contribution and, in particular, we performed embedding onto three constant-curvature manifolds having planar, spherical, and hyperbolic geometry. Our motivation comes from the fact that complex data representations such as graphs might not be described by a simple Euclidean structure. This intuition is corroborated by several recent results (e.g., see [wu2015emergent, boguna2010sustaining]) suggesting that the geometry underlying complex networks can be hyperbolic.

Our results indicate that (i) varying the manifold curvature can reduce the distortion of the distances; furthermore, the curvature can be treated as a learnable parameter for adapting the manifold to the specific application of interest in a data-driven fashion. Moreover, (ii) embedding on such manifolds can be effectively employed to detect change events and anomalies in a stream of graphs. In particular, results showed that the performance of the proposed change detection method is comparable to that of the corresponding ground-truth version operating directly on graphs.

Future directions include a better experimental evaluation of the proposed embedding procedure on constant-curvature manifolds, accounting also for analytical solutions to the involved optimisation problems and automatic optimisation of relevant parameters. In addition, we plan to work on theoretical aspects related to asymptotic distribution estimations on manifolds and their application in change detection for graphs.

Acknowledgements

This research is funded by the Swiss National Science Foundation project 200021_172671: “ALPSFORT: A Learning graPh-baSed framework FOr cybeR-physical sysTems”.

References