
Using Laplacian Spectrum as Graph Feature Representation

12/02/2019
by   Edouard Pineau, et al.

Graphs possess exotic features like variable size and absence of natural ordering of the nodes that make them difficult to analyze and compare. To circumvent this problem and learn on graphs, graph feature representation is required. A good graph representation must preserve structural information, with two particular key attributes: consistency under deformation and invariance under isomorphism. While state-of-the-art methods seek such properties with powerful graph neural networks, we propose to leverage a simple graph feature: the graph Laplacian spectrum (GLS). We first recall and show that the GLS satisfies the aforementioned key attributes, using a graph perturbation approach. In particular, we derive bounds for the distance between two GLS that are related to the divergence to isomorphism, a standard computationally expensive graph divergence. We finally evaluate the GLS as a graph representation through consistency tests and classification tasks, and show that it is a strong graph feature representation baseline.



1 Introduction

No matter where and at which scale we look, graphs are present. Social networks, public transport, information networks, molecules: any structural dependency between elements of a global system is a graph. An important task is to extract information from these graphs in order to understand whether they contain certain structural properties that can be represented and used in downstream machine learning tasks. In general, graphs are difficult to use as input of standard algorithms because of their exotic features, like variable size and absence of natural orientation. Consequently, a graph feature representation with equal dimensionality and dimension-wise alignment is required to learn on graphs.

We know that any embedding method must preserve structural information; in particular, for graphs, it must satisfy two key attributes: consistency under deformation and invariance under isomorphism. The first forces the embedding to discriminate two graphs consistently with their structural dissimilarity. The second ensures one representation per graph, which can be a challenge since one graph has many possible orientations. In this paper, we propose to analyze the importance of satisfying these criteria through a known, simple, expressive and efficient candidate graph feature representation: the graph Laplacian spectrum (GLS).

The Laplacian matrix of a graph is a major object in spectral learning [4]. However, most of the attention is usually directed to its eigenvectors rather than its spectrum, and spectral learning is generally applied to node clustering or classification, not to whole-graph representation. Yet, the GLS holds interesting properties for graph representation. First, the Laplacian eigenvalues carry rich structural information, like the presence of communities and partitions [25], the regularity, the closed-walk enumeration, the diameter or the connectedness of the graph [7]. The GLS is also interpretable in terms of signal processing [35] or mechanics [5]. Second, it is backed by efficient and robust approximate eigendecomposition algorithms enabling it to scale to large graphs and huge datasets [16]. Third, the GLS is invariant under graph isomorphism. Finally, each eigenvalue of the GLS can be seen as a graph feature representation by itself, containing specific structural information. Hence any subset of Laplacian eigenvalues is a meaningful and valuable embedding. This enables the usage of the truncated Laplacian spectrum (t-GLS) instead of the full GLS as a whole-graph feature representation. Using the t-GLS reduces the embedding time thanks to eigenvalue algorithms that do not require a full diagonalization to compute a partial spectrum [16].
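To make this concrete, here is a minimal sketch (not the authors' code; the helper name and the SciPy-based routine are our own illustration) of how the GLS, or its truncation t-GLS, could be computed from an adjacency matrix:

```python
# Sketch (not the paper's implementation): computing the GLS / truncated GLS
# of a graph from its adjacency matrix, using SciPy's sparse eigensolvers.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def laplacian_spectrum(adjacency, k=None):
    """Return the k largest Laplacian eigenvalues (all of them if k is None),
    sorted in descending order."""
    A = sp.csr_matrix(adjacency, dtype=float)
    degrees = np.asarray(A.sum(axis=1)).ravel()
    L = sp.diags(degrees) - A                      # combinatorial Laplacian L = D - A
    n = L.shape[0]
    if k is None or k >= n - 1:
        # small graph: full dense eigendecomposition
        eigvals = np.linalg.eigvalsh(L.toarray())
    else:
        # large graph: only the k largest eigenvalues (t-GLS)
        eigvals = eigsh(L, k=k, which='LM', return_eigenvectors=False)
    return np.sort(eigvals)[::-1]

# Example on a 4-node path graph
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
print(laplacian_spectrum(A))        # full GLS
print(laplacian_spectrum(A, k=2))   # truncated GLS
```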

These properties tell us that GLS is a good graph feature representation candidate. In this paper we go further and analyze the interesting properties of the Laplacian spectrum through the following contributions: (1) we build a perturbation-based framework to analyze the representation capacity of the GLS, (2) we analyze the consistency between structural deformation of the graph and its GLS by deriving bounds for the distance between the GLS of two graphs, (3) we validate the consistency and the representational power of the GLS with different experiments on synthetic and real graphs.

The rest of the paper is organized as follows. The mathematical framework and the theoretical analysis are presented in Sections 2 and 3, respectively. Section 4 proposes experiments to illustrate the theoretical results and show the representational power of the GLS. Finally, Section 5 describes related work on graph representation.

2 Perturbation approach and problem setup

We consider two undirected and weighted graphs $G$ and $G'$ with respective adjacency matrices $A_G$ and $A_{G'}$ and degree matrices $D_G$ and $D_{G'}$. These matrices are set with respect to an arbitrary indexing of the nodes. The Laplacian matrix of $G$ is defined as $L_G = D_G - A_G$. We aim at using the GLS to build a fixed-dimensional representation that encodes structural information and allows the comparison of graphs $G$ and $G'$ that are neither aligned nor equally sized. For the rest of the paper, and without loss of generality, we assume that $G$ has $n$ nodes, $G'$ has $N$ nodes and $n \le N$. The rest of this section introduces the definitions, hypotheses and notations needed for our theoretical analysis of the GLS.

Definition 1

Let $G$ be a weighted graph with $n$ nodes and weighted adjacency matrix $A_G$. We define a symmetric $N$-square matrix $E$, with $N \ge n$, such that the perturbed matrix below remains a valid weighted adjacency matrix. We define the two following perturbations applied on graph $G$:

  • Adding isolated nodes: $A_G \mapsto \tilde{A}_G$, where $\tilde{A}_G$ is the $N$-square matrix obtained by padding $A_G$ with $N - n$ zero rows and columns.

  • Adding or removing edges: $\tilde{A}_G \mapsto \tilde{A}_G + E$.

We call edge-perturbation the addition or removal of edges, and node-perturbation the addition of nodes. A complete perturbation is done by adding isolated nodes and then perturbing the augmented graph with an edge-perturbation. We note that the withdrawal of a node is equivalent to the removal of all edges incident to this node. Moreover, if graph $G$ is unweighted, i.e. with binary adjacency, then the edge perturbations have entries in $\{-1, 0, 1\}$.
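As an illustration of Definition 1, the following sketch (our own helper names, not taken from the paper's implementation) applies the two perturbations to a binary graph:

```python
# Sketch of the two perturbations of Definition 1 (helper names are ours):
# padding a graph with isolated nodes, and perturbing edges with a symmetric matrix E.
import numpy as np

def pad_with_isolated_nodes(A, n_new):
    """Add n_new isolated nodes: the adjacency matrix is padded with zero rows/columns."""
    n = A.shape[0]
    A_tilde = np.zeros((n + n_new, n + n_new))
    A_tilde[:n, :n] = A
    return A_tilde

def perturb_edges(A, rng, n_flips=3):
    """For a binary graph, flip n_flips random off-diagonal entries symmetrically,
    i.e. apply a symmetric perturbation E with entries in {-1, 0, +1}."""
    n = A.shape[0]
    E = np.zeros_like(A, dtype=float)
    for _ in range(n_flips):
        i, j = rng.choice(n, size=2, replace=False)
        flip = 1.0 - 2.0 * A[i, j]        # add the edge if absent, remove it otherwise
        E[i, j] = E[j, i] = flip
    return A + E, E

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
A_tilde = pad_with_isolated_nodes(A, n_new=2)     # node-perturbation
A_prime, E = perturb_edges(A_tilde, rng)          # edge-perturbation of the padded graph
```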

Remark 1

If $\tilde{A}_G + E = P \tilde{A}_G P^\top$ for some permutation matrix $P$, then the perturbation is only a permutation of the node indexing. Such a perturbation is not interesting, and the part of an edge-perturbation that is due to node indexing has to be annihilated by a permutation matrix, as in the following definition.

Definition 2

We say that $G'$ is a perturbed version of $G$ if there exist a permutation matrix $P$ and a perturbation matrix $E$ such that $A_{G'} = P(\tilde{A}_G + E)P^\top$, where $E$ is the sparsest possible, i.e. does not include permutations.

Notations

We denote $E$ the sparsest perturbation as defined in Definition 2. We denote $\tilde{G}$ the completion of $G$ with $N - n$ isolated nodes. If $M_G$ is a matrix associated to $G$, we denote $\tilde{M}_G$ the equivalent matrix for $\tilde{G}$. We denote $\lambda_i(M)$ the $i$-th eigenvalue of a square matrix $M$ in ascending order, $\lambda_1(M)$ being the smallest eigenvalue.

Hypothesis 1

Without loss of generality, we assume that $G'$ is a perturbed version of $G$, i.e. there exist a sparsest $N$-square perturbation matrix $E$ and an $N$-square permutation matrix $P$ such that $A_{G'} = P(\tilde{A}_G + E)P^\top$. $E$ is an $N$-square block matrix: the top-left block is an $n$-square perturbation matrix for graph $G$, the bottom-right block is the $(N-n)$-square adjacency matrix of the additional nodes, and the off-diagonal blocks form the adjacency matrix representing the links between graph $G$ and the additional nodes.

We have defined a notion of continuous deformation of graphs. This deformation has a natural and simple interpretation: any graph $G'$ is a perturbed version of any graph $G$, and the larger the perturbation $E$, the higher the structural dissimilarity between $G$ and $G'$.

The next section uses this mathematical framework to analyze the consistency of the Laplacian spectrum as a graph representation and its natural link to the graph isomorphism problem.

3 Laplacian spectrum as graph feature representation

We place ourselves under Hypothesis 1, which states that the difference between graphs $G$ and $G'$ is characterized by the unknown deformation $E$. A good embedding of these graphs should be close when the level of deformation is low, and far otherwise. This level of deformation can be quantified by the global and node-wise entries of $E$. These features are by construction present in the Laplacian of $E$, denoted $L_E$. We use this idea to propose an analysis of the distance between two GLS.

All proofs are detailed in the appendix.

3.1 Consistency under deformation and relation to graph isomorphism

Two graphs $G$ and $G'$ are isomorphic if and only if there exists a permutation matrix $P$ such that $A_{G'} = P A_G P^\top$ [24], hence when they are structurally equivalent irrespective of the vertex ordering. Several papers have proposed to use a notion of divergence to graph isomorphism (DGI) to compare graphs [15, 31]. The DGI between graphs $G$ and $G'$ is generally the minimal Frobenius norm of the difference between their matrix representations, e.g. between $L_{G'}$ and $P \tilde{L}_G P^\top$, taken over all permutation matrices $P$. Considering this definition, the following lemma links the graph-isomorphism problem to the Laplacian of the hypothetical perturbation and shows that this divergence is the norm of $L_E$:

Lemma 1

Using the notations from Hyp. 1, we have $L_{G'} = P(\tilde{L}_G + L_E)P^\top$, with $L_E = \mathrm{diag}(E\,\mathbf{1}_N) - E$ the Laplacian of $E$ and $\mathbf{1}_N$ the $N$-dimensional unit vector. In particular, $\lVert L_{G'} - P\tilde{L}_G P^\top\rVert_F = \lVert L_E\rVert_F$.
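For intuition, the divergence to graph isomorphism can be computed by brute force on very small graphs; the snippet below is our own illustration (not part of the paper), and its factorial cost motivates the relaxation discussed next.

```python
# Brute-force divergence to graph isomorphism for tiny, equally sized graphs
# (our own illustration; the factorial cost makes it unusable beyond a few nodes).
import itertools
import numpy as np

def laplacian(A):
    return np.diag(A.sum(axis=1)) - A

def dgi(A1, A2):
    """min over permutations P of ||L2 - P L1 P^T||_F."""
    n = A1.shape[0]
    L1, L2 = laplacian(A1), laplacian(A2)
    best = np.inf
    for perm in itertools.permutations(range(n)):
        P = np.eye(n)[list(perm)]
        best = min(best, np.linalg.norm(L2 - P @ L1 @ P.T))
    return best

A1 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)   # path 0-1-2
A2 = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)   # same path, relabeled
print(dgi(A1, A2))   # 0.0: the two graphs are isomorphic
```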

We remind that graph isomorphism is at best solved in quasipolynomial time [2] and cannot be used in practice for large graphs and datasets. The following proposition shows how the distance between GLS relaxes the isomorphism-based graph divergence.

Proposition 1

Under Hypothesis 1 and using Lemma 1, the distance between the GLS of $\tilde{G}$ and $G'$ is upper-bounded by a norm of the perturbation Laplacian $L_E$ (see Appendix 0.B).

The above result tells us that the larger the difference between GLS, the larger the hypothetical perturbation $E$, i.e. the higher the structural dissimilarity.
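This qualitative relation can be checked numerically. The snippet below is our own illustration, not the formal statement of Proposition 1: for random binary edge perturbations, it compares the $L_2$ distance between the two GLS with the Frobenius norm of the perturbation Laplacian $L_E$.

```python
# Empirical comparison (illustration only): GLS distance vs. the Frobenius
# norm of the perturbation Laplacian L_E, for random symmetric edge flips.
import numpy as np

def gls(A):
    return np.sort(np.linalg.eigvalsh(np.diag(A.sum(axis=1)) - A))

rng = np.random.default_rng(0)
n = 25
A = (rng.random((n, n)) < 0.15).astype(float)
A = np.triu(A, 1); A = A + A.T                     # random undirected binary graph

for _ in range(5):
    E = (rng.random((n, n)) < 0.05).astype(float)
    E = np.triu(E, 1); E = E + E.T
    E = E * (1.0 - 2.0 * A)                        # flip existing / absent edges
    L_E = np.diag(E.sum(axis=1)) - E
    lhs = np.linalg.norm(gls(A) - gls(A + E))      # distance between GLS
    rhs = np.linalg.norm(L_E, 'fro')               # norm of the perturbation Laplacian
    print(f"GLS distance = {lhs:.3f}   ||L_E||_F = {rhs:.3f}")
```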

We now study the implication of GLS closeness. This problem touches the notion of non-isomorphic $L$-cospectrality, i.e. the fact that two graphs can have equal Laplacian eigenvalues while having different Laplacian matrices [7]. The following proposition gives a simple insight into the problem of spectral characterization within our perturbation-based framework:

Proposition 2

We denote $U\Sigma V^\top$ the singular value decomposition (SVD), with the diagonal of $\Sigma$ in ascending order, of the matrix that aligns the eigenvectors of the two Laplacians. The resulting inequality bounds the distance between the two Laplacian matrices by the distance between their spectra plus a term that vanishes only when their eigenvectors coincide.

This proposition shows that equal spectra imply equal graphs only when the eigenvectors are also equal. Otherwise, $L$-cospectrality for non-isomorphic graphs tells us that there exist families of graphs that are not fully determined by their spectrum. These families are characterized by structural properties such that two non-isomorphic graphs with equal Laplacian spectrum share these properties but not their adjacency [36]. In practice, this is not a problem. First, almost all graphs are determined by their spectrum [7]. Second, equal GLS carries the precious information that the graphs share common structural properties, no matter the adjacency matrices. These properties might be exactly what we seek to capture when representing graphs for ML tasks. Third, non-isomorphic $L$-cospectrality concerns equally sized graphs, which is unlikely with respect to all possible real-life graphs. When the studied dataset specifically contains $L$-cospectral non-isomorphic graphs and when the task requires a unique representation, the GLS is not appropriate and more sophisticated and powerful embedding methods, taking for example eigenvectors into account [37], should be studied and used. Otherwise, i.e. in almost every situation, according to the previously presented results, the GLS characterizes the graph and is directly related to the hypothetical perturbation $E$.

Nevertheless, we propose Proposition 3 to better understand GLS proximity even when graphs are non-isomorphic and cospectral.

Proposition 3

The closer the GLS, the closer the Laplacian matrices are to being unitarily similar.

We remind that two real $N$-square matrices $A$ and $B$ are unitarily similar if there exists an orthogonal matrix $U$ such that $B = U A U^\top$. Similarity is an equivalence relation on the space of square matrices. Moreover, the divergence to unitary similarity is a relaxed version of the divergence to graph isomorphism [15], where the space of permutation matrices is replaced by the space of unitary matrices. Finally, from Propositions 1 and 3, we can bound the distance between GLS from both sides by quantities related to the structural divergence between the graphs.

In this section, we have shown that the structural similarity (resp. divergence) between graphs can be reasonably approximated by the similarity (resp. divergence) between their GLS.

3.2 Laplacian spectrum as whole-graph representation in practice

The previous section showed the capacity of the distance between Laplacian spectra to serve as a proxy for graph similarity. In practice, a fixed embedding dimension $k$ must be chosen for all graphs in a dataset. According to the previous analysis, the most obvious dimension is the size of the largest graph of the dataset, and all graphs with fewer nodes may be padded with isolated nodes. We note that padding with isolated nodes is equivalent to adding zeros in the GLS. Nevertheless, in some datasets, some graphs can be significantly larger than the others and the padding can become abusive. We therefore propose for these graphs to use a smaller dimension $k$: we simply truncate the GLS such that we keep only the $k$ highest eigenvalues. This method also saves computation time.

The drawback of this method is that we may lose information for graphs with more than $k$ nodes. In practice, for large graphs, the contribution of the lowest eigenvalues to the distance between GLS, used as a proxy for graph divergence, is negligible. In particular, large graphs have many sparse areas, so that many eigenvalues are very small; truncating the bottom part of the GLS is therefore rarely a problem. We assess the impact of the truncation in the experimental section.
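A minimal sketch of this fixed-dimension embedding (helper names are ours, not the paper's): the $k$ largest eigenvalues are kept, and graphs with fewer than $k$ nodes are zero-padded, which is equivalent to adding isolated nodes.

```python
# Sketch of the fixed-dimension embedding described above: keep the k largest
# Laplacian eigenvalues, zero-padding graphs with fewer than k nodes.
import numpy as np

def gls_embedding(A, k):
    L = np.diag(A.sum(axis=1)) - A
    eigvals = np.sort(np.linalg.eigvalsh(L))[::-1]       # descending order
    if eigvals.shape[0] >= k:
        return eigvals[:k]                               # truncation (t-GLS)
    return np.pad(eigvals, (0, k - eigvals.shape[0]))    # padding with zeros

# Graphs of different sizes all map to vectors of dimension k = 5
small = np.array([[0, 1], [1, 0]], dtype=float)
big = np.ones((8, 8)) - np.eye(8)                        # complete graph K8
X = np.vstack([gls_embedding(small, 5), gls_embedding(big, 5)])
print(X.shape)                                           # (2, 5)
```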

We could also propose several ways to avoid this loss, such as summarizing the lowest eigenvalues with simple statistics like moments or histograms. We do not use this trick in the experimental section.

4 Experiments

All experiments can be reproduced using the code provided at the following address: https://github.com/edouardpineau/Using-Laplacian-Spectrum-as-Graph-Feature-Representation

4.1 Preliminary experiments

As a first illustration of the deformation-based results presented in Section 3, we propose to use Erdős–Rényi random graphs [12] with edge probability parameter $p$. We focus on three simple experiments.

First, the distance between the Laplacian spectrum of a graph and that of a perturbed version of this graph is related to the number of perturbations. The experimental illustration can be found in Figure 1 (similar to those in [40]). We see that the number of perturbations is directly related to the distance between GLS features, for both edge addition and edge withdrawal. A relation between graph sparsity and Laplacian eigenvalues can be seen for example through the Gershgorin circle theorem [13].


Figure 1: Experimental results illustrating how the GLS behaves under edge addition (left) and edge withdrawal (right). In this case, the studied adjacency and perturbation matrices are binary.
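The following script is our own sketch of the experiment of Figure 1, with assumed parameter values since the exact settings are not reproduced here: it tracks the distance to the reference GLS while edges are progressively added to or removed from an Erdős–Rényi graph.

```python
# Sketch reproducing the spirit of Figure 1 (parameters are assumptions):
# GLS distance to the original Erdős–Rényi graph as edges are added or removed.
import numpy as np

def gls(A):
    return np.sort(np.linalg.eigvalsh(np.diag(A.sum(axis=1)) - A))

rng = np.random.default_rng(42)
n, p = 60, 0.1
A = (rng.random((n, n)) < p).astype(float)
A = np.triu(A, 1); A = A + A.T
ref = gls(A)

def progressive_flip(A, pairs, value):
    """Distances to the reference GLS after setting 1, 2, ... edges to `value`."""
    B, dists = A.copy(), []
    for i, j in pairs:
        B[i, j] = B[j, i] = value
        dists.append(np.linalg.norm(gls(B) - ref))
    return dists

iu = np.column_stack(np.triu_indices(n, 1))
absent = [tuple(ij) for ij in iu if A[ij[0], ij[1]] == 0]
present = [tuple(ij) for ij in iu if A[ij[0], ij[1]] == 1]
rng.shuffle(absent); rng.shuffle(present)

add_curve = progressive_flip(A, absent[:30], 1.0)       # edge addition
rem_curve = progressive_flip(A, present[:30], 0.0)      # edge withdrawal
print(add_curve[::10], rem_curve[::10])                  # distances typically grow
```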

Second, we mentioned that when a graph is significantly bigger than the other graphs of a dataset, we can use a truncated GLS (t-GLS). This method both saves computation time, thanks to iterative eigenvalue algorithms, and avoids the addition of isolated nodes to all the other graphs. In Figure 2, we show experiments indicating that the t-GLS is consistent with node addition. As experimental setup, we take a reference graph with $n$ nodes and compute its GLS. Then we add a randomly connected node and compute the t-GLS of the new graph, keeping only the $n$ largest eigenvalues. We repeat this 20 times. We compute the $L_2$-distance to the reference GLS, for different levels of connectivity of the additional nodes. We first observe that the t-GLS is consistent with node addition. We also confirm our previous theoretical results by observing that the more connected the additional nodes, the higher the GLS divergence.


Figure 2: Experimental results illustrating how the truncated GLS behaves under the iterative addition of 20 new nodes with respectively 0, 1, 2 and 3 random connections to the graph, for a synthetic 80-node Erdős–Rényi graph (left) and a 28-node molecular graph from the MUTAG dataset (right). Horizontal dotted lines (right figure) are the quartiles 25, 50, 75 and 100 of the distances between the GLS of the 28-node graph and those of the other 187 graphs of the dataset.
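A sketch of this node-addition test (our own script; graph size, edge probability and seeds are assumptions): starting from a reference graph, nodes with $c$ random connections are added one by one and the distance between the reference GLS and the t-GLS of the growing graph is recorded.

```python
# Sketch of the node-addition consistency test (parameters are assumptions):
# t-GLS distance to the reference GLS as nodes with c random connections are added.
import numpy as np

def top_eigenvalues(A, k):
    L = np.diag(A.sum(axis=1)) - A
    return np.sort(np.linalg.eigvalsh(L))[::-1][:k]

rng = np.random.default_rng(7)
n, p = 80, 0.05
A = (rng.random((n, n)) < p).astype(float)
A = np.triu(A, 1); A = A + A.T
ref = top_eigenvalues(A, n)

for c in (0, 1, 2, 3):                                  # connectivity of added nodes
    B = A.copy()
    dists = []
    for _ in range(20):                                 # add 20 nodes one by one
        m = B.shape[0]
        B = np.pad(B, ((0, 1), (0, 1)))                 # new isolated node
        for j in rng.choice(m, size=c, replace=False):
            B[m, j] = B[j, m] = 1.0                     # c random connections
        dists.append(np.linalg.norm(top_eigenvalues(B, n) - ref))
    print(c, [round(d, 2) for d in dists[::5]])         # distance typically grows with c
```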

4.2 Classification of molecular and social network graphs

We evaluate spectral feature embedding with a classification task on molecular graphs and social network graphs. The experimental setup for the classification task is given in Appendix 0.E. We assume here that two structurally close graphs belong to the same class. We challenge this assumption with the following experiments.

We propose to compare GLS-based classification results to those obtained by feature-based and deep learning methods. Standard graph feature representation methods are: Earth Mover's Distance [28] (EMD), Pyramid Match [28] (PM), Feature-Based [3] (FB) and Dynamic-Based Features [14] (DyF). All of these methods use a support vector classifier (SVC) over the extracted features. Deep learning methods are: Variational Recurrent Graph Classifier [30] (VRGC), Graph Convolutional Network [19] (GCN), Deep Graph CNN [46] (DGCNN), Capsule GNN [41] (CapsGNN), Graph Isomorphism Network [42] (GIN) and GraphSAGE [17]. All deep learning methods are end-to-end graph classifiers. A description of these models is given in the related work, Section 5.

All values reported in Table 1 and Table 2 are taken from the above-mentioned papers.

Molecular graphs

We use five datasets for the experiments: Mutag (MT), Enzymes (EZ), Proteins Full (PF), Dobson and Doig (DD) and National Cancer Institute (NCI1) [18]. All graphs are chemical compounds. Nodes are atoms or molecules and edges represent chemical or electrostatic bindings. We note that molecular graphs contain node attributes, which are used by some models presented in Table 1. We leave the question of the relevance of comparing models with slightly different inputs to the discretion of the reader. Descriptions and statistics of the molecular datasets are presented in Table 3, Appendix 0.F.

MT EZ PF DD NCI1
EMD + SVC 86.1 ± 0.8 36.8 ± 0.8 - - 72.7 ± 0.2
PM + SVC 85.6 ± 0.6 28.2 ± 0.4 - 75.6 ± 0.6 69.7 ± 0.1
FB + SVC 84.7 ± 2.0 29.0 ± 1.2 70.0 ± 1.3 - 62.9 ± 1.0
DyF + SVC 86.3 ± 1.3 26.6 ± 1.2 73.1 ± 0.4 - 66.6 ± 0.3
FGSD + SVC 92.1 - 73.4 77.1 79.8
VRGC 86.3 ± 8.6 48.4 ± 6.2 74.8 ± 3.0 - 80.7 ± 2.2
GCN* 85.6 ± 5.8 - 76.0 ± 3.2 - 80.2 ± 2.0
DGCNN* 85.8 ± 1.7 51.0 ± 7.3 75.5 ± 0.9 79.4 ± 0.9 74.4 ± 0.5
CapsGNN* 86.7 ± 6.9 54.7 ± 5.7 76.3 ± 3.6 75.4 ± 4.2 78.4 ± 1.6
GIN-0* 89.4 ± 5.6 - 76.2 ± 2.8 - 82.7 ± 1.7
GraphSAGE* 85.1 ± 7.6 - 75.9 ± 3.2 - 77.7 ± 1.5
GLS + SVC 87.9 ± 7.0 40.7 ± 6.3 75.3 ± 3.5 74.3 ± 3.5 73.3 ± 2.1
Table 1: Accuracy (%) of classification with different graph representations on molecular graphs. SVC stands for support vector classifier. Comparative models are divided into two groups: feature + SVC and end-to-end deep learning. *Models using node attributes.

Social network graphs

We use five datasets for the experiments: IMDB-Binary (IMDB-B), IMDB-Multi (IMDB-M), REDDIT-Binary (REDDIT-B), REDDIT-5K-Multi (REDDIT-M) and COLLAB. All graphs are social networks. The graphs of these datasets do not contain node attributes. Therefore, we can more appropriately compare GLS + SVC to deep learning based classification. Statistics about the social network datasets are presented in Table 4, Appendix 0.F.

IMDB-B IMDB-M REDDIT-B REDDIT-M COLLAB
GCN 74.0 ± 3.4 51.9 ± 3.8 - - 79.0 ± 1.8
DGCNN 70.0 ± 0.9 47.8 ± 0.9 76.0 ± 1.7 - 73.8 ± 0.5
CapsGNN 73.1 ± 4.8 50.3 ± 2.7 - 52.9 ± 1.5 79.6 ± 0.9
GIN-0 75.1 ± 5.1 52.3 ± 2.8 92.4 ± 2.5 57.5 ± 1.5 80.2 ± 1.9
GraphSAGE 72.3 ± 5.3 50.9 ± 2.2 - - -
GLS + SVC 73.2 ± 4.2 48.5 ± 2.5 87.4 ± 3.4 52.0 ± 1.8 78.5 ± 1.1
Table 2: Classification accuracy (%) of different deep learning based models plus ours over standard social network datasets. The graphs of these datasets do not have node features. SVC stands for support vector classifier.

Analysis of the results

The classification results above illustrate the capacity of the GLS to capture graph structural information, under the assumption that structurally close graphs belong to the same class. The graph neural networks are globally more expressive since, being end-to-end, they can leverage task-specific information for graph classification. In particular, they obtain strong results when node labels are available (see the molecular experiments in Section 4.2). Nevertheless, the GLS is a simple way to represent graphs in an unsupervised manner, with theoretical backing, simplicity of implementation (eigendecomposition is readily available on any computer) and competitive downstream classification results.

On the reasonability of using truncated GLS

We assess the impact of truncating the GLS. Using the truncated GLS (t-GLS) makes it possible to (1) reduce the computational cost for large graphs and (2) reduce the dimensionality of the graph representation for all graphs. Results are presented in Figure 3 for the molecular datasets.


Figure 3: Illustration of the impact of the truncation in terms of classification accuracy on the molecular graphs. We report the impact relative to the 95th-percentile truncation adopted for the classification experiments.

We see that truncating the GLS does not strongly impact the classification results. Only ENZYMES multi-class classification, which is a particularly difficult task (see the experiments in Section 4.2), suffers from the truncation. Additional insight about the t-GLS is given in Appendix 0.G.

Computation analysis

GLS extraction is a fast operation, thanks to very efficient eigendecomposition algorithms for sparse graph matrices [16]. For example, the complete set of molecular experiments (embedding + SVC) took approximately 5 minutes on a single CPU, most of it dedicated to the computation for the largest graphs of DD.

5 Related work

We propose to divide graph feature representation into three categories: graph kernel methods, feature-based methods and deep learning.

Graph kernel methods

Kernel methods create a high-dimensional feature representation of the data. The kernel trick [33] avoids computing the coordinates in the feature space explicitly; only the inner products between all pairs of data images are needed: it is an implicit embedding method. These methods have been applied to graphs [27, 28]. They consist in performing pairwise comparisons between atomic substructures of the graphs until a good representative dictionary is found. The embedding of a graph is then the number of occurrences of these substructures within it. These substructures can be graphlets [43], subtree patterns [34], random walks [38] or paths [6]. The main difficulty lies in the choice of an appropriate algorithm and kernel that accepts graphs of variable size and captures features useful for the downstream task. Moreover, kernel methods can be computationally expensive, but techniques like the Nyström algorithm [39] allow lowering the number of comparisons with a low-rank approximation of the similarity matrix.

Feature-based methods

Feature-based representation methods [3] represent each graph as a concatenation of features. Generally, feature-based representations offer a certain degree of interpretability and transparency. The most basic features are the number of nodes or edges and the histogram of node degrees. These simple graph-level features offer by construction the sought isomorphism invariance but suffer from low expressiveness. More sophisticated algorithms consider features based on attributes of random walks on the graph [14], while others are graphlet-based [21]. [20] explicitly builds permutation-invariant features by mapping the adjacency matrix to a function on the symmetric group. [37] proposed a family of graph spectral distances to build graph features. Experimental work in [22] used the normalized Laplacian spectrum with a random forest for graph classification, with promising results. [40] analyzes the cospectrality of different graph matrices and studies experimentally the representational power of their spectra. These two last works are directly related to the current work. Nevertheless, in both cases, the theoretical analysis is absent and the comparative experiments with current benchmarks and methods are limited. In this paper we propose a response to these concerns.

Deep learning based methods

GNNs learn representations of the nodes of a graph by jointly leveraging their attributes, information on neighboring nodes and the attributes of the connecting edges. When graphs have no vertex features, the node degrees are used instead. To create graph-level representations instead of node representations, node embeddings are pooled by a permutation-invariant readout function like summation or more sophisticated information-preserving ones [44, 46]. A condition of optimality for the readout function is presented in [42]. Recently, [41] leveraged capsule networks [32], neural units designed to better preserve information at pooling time. Other popular evolutions of GNNs formulate convolution-like operations on graphs. Formulations in the spectral domain [8, 10] are limited to the processing of different signals on a single graph structure, because they rely on the fixed spectrum of the Laplacian. Conversely, formulations in the spatial domain are not limited to one graph structure [1, 11, 26, 17] and can infer information from unseen graph structures. At the same time, alternatives to GNNs exist and are related to random-walk embeddings. In [23], neural networks help to sample paths that preserve significant graph properties. Other approaches transform graphs into sequences of node embeddings passed to a recurrent neural network (RNN) [45, 30] to get useful embeddings. These models do not inherently include isomorphism invariance but greedily learn it by seeing the same graph numerous times with different node orderings and embeddings. These methods are powerful and globally obtain a high level of expressiveness (see the experimental Section 4.2).

6 Conclusion

In this paper, we analyzed the graph Laplacian spectrum (GLS) as a whole-graph representation. In particular, we showed that comparing two GLS is a good proxy for the divergence between two graphs in terms of structural information. We coupled these results with the natural invariance to isomorphism, the simplicity of implementation, the computational efficiency offered by modern randomized algorithms and the rare occurrence of detrimental $L$-cospectral non-isomorphic graphs to propose the GLS as a strong baseline graph feature representation.

References

  • [1] Atwood, J., Towsley, D.: Diffusion-convolutional neural networks. In: Advances in Neural Information Processing Systems. pp. 1993–2001 (2016)
  • [2] Babai, L.: Graph isomorphism in quasipolynomial time. In: Proceedings of the forty-eighth annual ACM symposium on Theory of Computing. pp. 684–697. ACM (2016)

  • [3] Barnett, I., Malik, N., Kuijjer, M.L., Mucha, P.J., Onnela, J.P.: Feature-based classification of networks. arXiv preprint arXiv:1610.05868 (2016)
  • [4] Belkin, M., Niyogi, P.: Laplacian eigenmaps and spectral techniques for embedding and clustering. In: Advances in neural information processing systems. pp. 585–591 (2002)
  • [5] Bonald, T., Hollocou, A., Lelarge, M.: Weighted spectral embedding of graphs. In: 2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton). pp. 494–501. IEEE (2018)
  • [6] Borgwardt, K.M., Kriegel, H.P.: Shortest-path kernels on graphs. In: Data Mining, Fifth IEEE International Conference on. pp. 8–pp. IEEE (2005)
  • [7] Brouwer, A.E., Haemers, W.H.: Spectra of graphs. Springer Science & Business Media (2011)
  • [8] Bruna, J., Zaremba, W., Szlam, A., LeCun, Y.: Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203 (2013)
  • [9] Cawley, G.C., Talbot, N.L.: On over-fitting in model selection and subsequent selection bias in performance evaluation. Journal of Machine Learning Research 11(Jul), 2079–2107 (2010)
  • [10] Defferrard, M., Bresson, X., Vandergheynst, P.: Convolutional neural networks on graphs with fast localized spectral filtering. In: Advances in Neural Information Processing Systems. pp. 3844–3852 (2016)
  • [11] Duvenaud, D.K., Maclaurin, D., Iparraguirre, J., Bombarell, R., Hirzel, T., Aspuru-Guzik, A., Adams, R.P.: Convolutional networks on graphs for learning molecular fingerprints. In: Advances in neural information processing systems. pp. 2224–2232 (2015)
  • [12] Erdős, P., Rényi, A.: On random graphs i. Publ. Math. Debrecen 6, 290–297 (1959)
  • [13] Gershgorin, S.A.: Über die Abgrenzung der Eigenwerte einer Matrix (6), 749–754 (1931)
  • [14] Gómez, L.G., Delvenne, J.C.: Dynamics based features for graph classification. In: Benelearn 2017: Proceedings of the Twenty-Sixth Benelux Conference on Machine Learning, Technische Universiteit Eindhoven, 9-10 June 2017. p. 131
  • [15] Grohe, M., Rattan, G., Woeginger, G.J.: Graph similarity and approximate isomorphism. In: 43rd International Symposium on Mathematical Foundations of Computer Science (MFCS 2018). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik (2018)
  • [16] Halko, N., Martinsson, P.G., Tropp, J.A.: Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM review 53(2), 217–288 (2011)
  • [17] Hamilton, W., Ying, Z., Leskovec, J.: Inductive representation learning on large graphs. In: Advances in Neural Information Processing Systems. pp. 1024–1034 (2017)
  • [18] Kersting, K., Kriege, N.M., Morris, C., Mutzel, P., Neumann, M.: Benchmark data sets for graph kernels (2016), http://graphkernels.cs.tu-dortmund.de
  • [19] Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016)
  • [20] Kondor, R., Borgwardt, K.M.: The skew spectrum of graphs. In: Proceedings of the 25th international conference on Machine learning. pp. 496–503. ACM (2008)

  • [21] Kondor, R., Shervashidze, N., Borgwardt, K.M.: The graphlet spectrum. In: Proceedings of the 26th Annual International Conference on Machine Learning. pp. 529–536. ACM (2009)
  • [22] de Lara, N., Pineau, E.: A simple baseline algorithm for graph classification. Relational Representation Learning, NeurIPS Workshop (2018)
  • [23] Li, C., Ma, J., Guo, X., Mei, Q.: Deepcas: An end-to-end predictor of information cascades. In: Proceedings of the 26th International Conference on World Wide Web. pp. 577–586. International World Wide Web Conferences Steering Committee (2017)
  • [24] Merris, R.: Laplacian matrices of graphs: a survey. Linear algebra and its applications 197, 143–176 (1994)
  • [25] Newman, M.E.: Spectral methods for community detection and graph partitioning. Physical Review E 88(4), 042822 (2013)
  • [26] Niepert, M., Ahmed, M., Kutzkov, K.: Learning convolutional neural networks for graphs. In: International conference on machine learning. pp. 2014–2023 (2016)
  • [27] Nikolentzos, G., Meladianos, P., Limnios, S., Vazirgiannis, M.: A degeneracy framework for graph similarity. In: IJCAI. pp. 2595–2601 (2018)
  • [28] Nikolentzos, G., Meladianos, P., Vazirgiannis, M.: Matching node embeddings for graph similarity. In: AAAI. pp. 2429–2435 (2017)
  • [29] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al.: Scikit-learn: Machine learning in python. Journal of machine learning research 12(Oct), 2825–2830 (2011)
  • [30] Pineau, E., de Lara, N.: Variational recurrent neural networks for graph classification. In: Representation Learning on Graphs and Manifolds Workshop (2019)
  • [31] Rameshkumar, A., Palanikumar, R., Deepa, S.: Laplacian matrix in algebraic graph theory. Journal Impact Factor pp. 0–489 (2013)
  • [32] Sabour, S., Frosst, N., Hinton, G.E.: Dynamic routing between capsules. In: Advances in neural information processing systems. pp. 3856–3866 (2017)
  • [33] Shawe-Taylor, J., Cristianini, N., et al.: Kernel methods for pattern analysis. Cambridge university press (2004)
  • [34] Shervashidze, N., Schweitzer, P., Leeuwen, E.J.v., Mehlhorn, K., Borgwardt, K.M.: Weisfeiler-lehman graph kernels. Journal of Machine Learning Research 12(Sep), 2539–2561 (2011)
  • [35] Shuman, D.I., Ricaud, B., Vandergheynst, P.: Vertex-frequency analysis on graphs. Applied and Computational Harmonic Analysis 40(2), 260–291 (2016)
  • [36] Van Dam, E.R., Haemers, W.H.: Which graphs are determined by their spectrum? Linear Algebra and its applications 373, 241–272 (2003)
  • [37] Verma, S., Zhang, Z.L.: Hunt for the unique, stable, sparse and fast feature learning on graphs. In: Advances in Neural Information Processing Systems. pp. 88–98 (2017)
  • [38] Vishwanathan, S.V.N., Schraudolph, N.N., Kondor, R., Borgwardt, K.M.: Graph kernels. Journal of Machine Learning Research 11(Apr), 1201–1242 (2010)
  • [39] Williams, C.K., Seeger, M.: Using the nyström method to speed up kernel machines. In: Advances in neural information processing systems. pp. 682–688 (2001)
  • [40] Wilson, R.C., Zhu, P.: A study of graph spectra for comparing graphs and trees. Pattern Recognition 41(9), 2833–2841 (2008)
  • [41] Xinyi, Z., Chen, L.: Capsule graph neural network. International Conference on Learning Representations (2018)
  • [42] Xu, K., Hu, W., Leskovec, J., Jegelka, S.: How powerful are graph neural networks? International Conference on Learning Representations (2019)
  • [43] Yanardag, P., Vishwanathan, S.: Deep graph kernels. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 1365–1374. ACM (2015)
  • [44] Ying, Z., You, J., Morris, C., Ren, X., Hamilton, W., Leskovec, J.: Hierarchical graph representation learning with differentiable pooling. In: Advances in Neural Information Processing Systems. pp. 4800–4810 (2018)
  • [45] You, J., Ying, R., Ren, X., Hamilton, W., Leskovec, J.: Graphrnn: Generating realistic graphs with deep auto-regressive models. In: International Conference on Machine Learning. pp. 5694–5703 (2018)
  • [46] Zhang, M., Cui, Z., Neumann, M., Chen, Y.: An end-to-end deep learning architecture for graph classification. In: Thirty-Second AAAI Conference on Artificial Intelligence (2018)

Appendix 0.A Proof of Lemma 1

Since $A_{G'} = P(\tilde{A}_G + E)P^\top$ and $P^\top \mathbf{1}_N = \mathbf{1}_N$, the degree matrix of $G'$ is $D_{G'} = \mathrm{diag}(A_{G'}\mathbf{1}_N) = P(\tilde{D}_G + \mathrm{diag}(E\,\mathbf{1}_N))P^\top$, with $\mathbf{1}_N$ the unit vector.

Therefore, $L_{G'} = D_{G'} - A_{G'} = P(\tilde{L}_G + L_E)P^\top$ with $L_E = \mathrm{diag}(E\,\mathbf{1}_N) - E$, and since the Frobenius norm is invariant under permutation, $\lVert L_{G'} - P\tilde{L}_G P^\top\rVert_F = \lVert L_E\rVert_F$.

Appendix 0.B Proof of Proposition 1

From Lemma 1 we have $L_{G'} = P(\tilde{L}_G + L_E)P^\top$. Moreover, from Weyl's eigenvalue inequalities and since eigenvalues are isomorphism invariant (conjugation by $P$ does not change them):

$\lambda_i(\tilde{L}_G) + \lambda_1(L_E) \le \lambda_i(L_{G'}) \le \lambda_i(\tilde{L}_G) + \lambda_N(L_E)$.

Hence: $|\lambda_i(L_{G'}) - \lambda_i(\tilde{L}_G)| \le \max(|\lambda_1(L_E)|, |\lambda_N(L_E)|)$.

Now let $(\lambda, v)$ be any eigencouple of a matrix $M$. We can always pick $v$ with unit norm, so that $|\lambda| = \lVert Mv\rVert_2 \le \lVert M\rVert_F$.

Using the previous results, we get a bound on the distance between the GLS in terms of $\lVert L_E\rVert_F$, with $\lVert\cdot\rVert_F$ the Frobenius norm.
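As a quick numerical sanity check of the Weyl step used above (our own snippet, not part of the original proof), one can verify on random symmetric edge perturbations that each eigenvalue of the perturbed Laplacian stays within the extreme eigenvalues of $L_E$ of the corresponding eigenvalue of the unperturbed Laplacian.

```python
# Numerical sanity check (ours) of the Weyl inequality step:
# lambda_i(L) + lambda_min(L_E) <= lambda_i(L + L_E) <= lambda_i(L) + lambda_max(L_E)
import numpy as np

rng = np.random.default_rng(3)
n = 15
A = (rng.random((n, n)) < 0.3).astype(float)
A = np.triu(A, 1); A = A + A.T
E = (rng.random((n, n)) < 0.1).astype(float)
E = np.triu(E, 1); E = E + E.T
E = E * (1.0 - 2.0 * A)                               # symmetric edge flips

L   = np.diag(A.sum(axis=1)) - A
L_E = np.diag(E.sum(axis=1)) - E

lam   = np.linalg.eigvalsh(L)                         # ascending order
lam_p = np.linalg.eigvalsh(L + L_E)
lo, hi = np.linalg.eigvalsh(L_E)[[0, -1]]
assert np.all(lam + lo <= lam_p + 1e-9) and np.all(lam_p <= lam + hi + 1e-9)
print("Weyl bounds hold for every eigenvalue index")
```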

Appendix 0.C Proof of Proposition 2

We recall that the Frobenius norm is unitarily invariant, thanks to the cyclic property of the trace. Expanding the Frobenius distance between the two Laplacians with the decomposition introduced in Proposition 2, and using this invariance, yields the stated inequality.

Appendix 0.D Proof of Proposition 3

Denoting $\mathcal{O}(N)$ the group of real $N$-orthogonal matrices, we want to show that the closer the two GLS, the smaller $\min_{W \in \mathcal{O}(N)} \lVert L_{G'} - W \tilde{L}_G W^\top\rVert_F$.

We write the eigendecompositions $\tilde{L}_G = U \Lambda U^\top$ and $L_{G'} = V \Lambda' V^\top$ with eigenvalues in ascending order. We know that $U$ and $V$ are orthogonal since they are respectively eigenvector matrices of the symmetric matrices $\tilde{L}_G$ and $L_{G'}$.

Taking $W = V U^\top \in \mathcal{O}(N)$ and using the unitary invariance of the Frobenius norm, we get $\min_{W \in \mathcal{O}(N)} \lVert L_{G'} - W \tilde{L}_G W^\top\rVert_F \le \lVert \Lambda' - \Lambda\rVert_F$, which is the distance between the two GLS. In particular, if the two spectra are equal then the two Laplacians are unitarily similar.

Appendix 0.E Experimental setup for classification of graphs

For classification, we use a standard 10-fold cross-validation setup. Each dataset is divided into 10 folds such that the class proportions are preserved in each fold. These folds are then used for cross-validation, i.e. one fold serves as the testing set while the other ones compose the training set. Results are averaged over all testing sets. All figures gathered in the tables of results are built using this setup. For the dimension $k$, representing the number of eigenvalues we keep to build the truncated GLS, we choose the 95th percentile of the distribution of graph sizes in each dataset, i.e. only the 5% largest graphs have their smallest eigenvalues truncated. Considering the weak truncation impact (see Section 4.2), when we have large datasets containing large graphs, like the two REDDIT datasets, we can truncate more severely to make the problem computationally more efficient, in particular considering that the GLS is approached as a simple baseline more than as a final graph representation for large-scale usage.

We use the support vector classifier (SVC) from scikit-learn [29]. We impose the Radial Basis Function (RBF) kernel, i.e. $k(x, x') = \exp(-\gamma \lVert x - x'\rVert_2^2)$. It is a similarity measure related to the $L_2$-norm between GLS; hence, our theoretical results remain consistent with our experiments. The hyperparameters $C$ and $\gamma$ are tuned within predefined pools of values, one for the molecular datasets and one for the social network datasets. In practice, using a global pool for all the datasets gives equivalent results, but hyperparameter inference becomes expensive with a too large grid, in particular in a 10-fold cross-validation setup. We use a nested hyperparameter-search cross-validation for each of the 10 folds: in each 90% training fold we perform a 5-fold random-search cross-validation before training. We therefore avoid the problem of overfitting related to model selection that appears when using non-nested cross-validation [9].
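A condensed sketch of this protocol (our own code; the hyperparameter pools below are placeholders, not the ones used in the paper), assuming the t-GLS features have already been stacked into a matrix X with labels y:

```python
# Condensed sketch of the evaluation protocol (placeholder hyperparameter pools):
# nested cross-validation of an RBF-kernel SVC on precomputed (t-)GLS features.
import numpy as np
from sklearn.model_selection import StratifiedKFold, RandomizedSearchCV, cross_val_score
from sklearn.svm import SVC

def nested_cv_accuracy(X, y, seed=0):
    param_pool = {"C": np.logspace(-2, 3, 10),          # placeholder pools
                  "gamma": np.logspace(-4, 1, 10)}
    inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    outer = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    model = RandomizedSearchCV(SVC(kernel="rbf"), param_distributions=param_pool,
                               n_iter=20, cv=inner, random_state=seed)
    scores = cross_val_score(model, X, y, cv=outer)     # nested cross-validation
    return scores.mean(), scores.std()

# Example call with random features, just to show the signature
X = np.random.default_rng(0).normal(size=(100, 20))
y = np.random.default_rng(1).integers(0, 2, size=100)
print(nested_cv_accuracy(X, y))
```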

Appendix 0.F Characteristics of the real datasets

We use five molecular datasets and five social network datasets for the experiments [18]. Tables 3 and 4 give statistics of the different datasets. All datasets can be found at the following address: https://ls11-www.cs.tu-dortmund.de/staff/morris/graphkerneldatasets [18].

Molecular graph datasets are Mutag (MT), Enzymes (EZ), Proteins Full (PF), Dobson and Doig (DD) and National Cancer Institute (NCI1). In MT, the graphs are either mutagenic or not mutagenic. EZ graphs are tertiary structures of proteins from the 6 Enzyme Commission top-level classes. In DD, compounds are secondary structures of proteins that are enzymes or not. PF is a subset of DD without the largest graphs. In NCI1, graphs are anti-cancer or not. The graphs of these datasets have node labels that can be leveraged by graph neural networks.

MT EZ PF DD NCI1
graphs 188 600 1113 1178 4110
classes 2 6 2 2 2
bias (%) 66.5 16.7 59.6 58.7 50.0
min./max. nodes 10/28 2/125 4/620 30/5736 3/106
avg. nodes 18 33 39 284 30
avg. edges 39 124 146 1431 65
Node attributes yes yes yes yes yes
Table 3: Molecular datasets statistics. Bias indicates the proportion of the dominant class.

Social network datasets are IMDB-Binary (IMDB-B), IMDB-Multi (IMDB-M), REDDIT-Binary (REDDIT-B), REDDIT-5K-Multi (REDDIT-M) and COLLAB. REDDIT-B and REDDIT-M contain graphs representing discussion threads, with edges between users (nodes) when one responded to the other's comment. Classes are the subreddit topics from which the threads originated. IMDB-B and IMDB-M contain networks of actors that appeared together within the same movie. IMDB-B contains two classes, for action and romance genres, and IMDB-M three classes, for comedy, romance and sci-fi. COLLAB graphs represent scientific collaborations, with an edge between two researchers meaning that they co-authored a paper. Labels of the graphs correspond to subfields of Physics. The graphs of these datasets have no node attributes and therefore enable a fair comparison with deep learning methods.

IMDB-B IMDB-M REDDIT-B REDDIT-M COLLAB
graphs 1000 1500 2000 4999 5000
classes 2 3 2 5 3
bias (%) 50.0 33.3 50.0 20.0 52.0
min./max. nodes 12/136 7/89 3/3760 22/3606 32/492
avg. nodes 20 13 426 501 75
avg. edges 97 66 496 590 2458
Node attributes no no no no no
Table 4: Social network datasets statistics. Bias indicates the proportion of the dominant class.

Appendix 0.G Additional insight on the acceptability of using truncated GLS

Figure 4 illustrates the reasonability of using only the highest eigenvalues of the Laplacian spectrum as a whole-graph feature representation. We take the original and final graphs of the deformation-consistency test presented in Figure 2. We compute the distance between the t-GLS of dimension $k$ and divide it by $k$, for $k$ varying from 1 to the size of the original graph. The objective is to confirm that the first (largest) eigenvalues are relatively more important to discriminate two structurally different graphs, which is the case. We note that for the Erdős–Rényi case with few connected additional nodes, the first eigenvalues are not as relatively important as for the other example. In fact, adding nodes with stochastic connections is the construction process of Erdős–Rényi graphs. Hence, discriminating the augmented graph from the original one based only on the structural information is difficult.


Figure 4: Illustration of the relative importance of the dimensionality of the GLS embedding, after the iterative addition of 20 new nodes with respectively 0, 1, 2 and 3 random connections to the graph, for a synthetic 80-node Erdős–Rényi graph (left) and a 28-node molecular graph from the MUTAG dataset (right). We see that the first largest eigenvalues of the Laplacian are the most important to discriminate a graph from its perturbed version.