1 Introduction
No matter where and at which scale we look, graphs are present. Social networks, public transport, information networks, molecules: any structural dependency between elements of a global system is a graph. An important task is to extract information from these graphs in order to understand whether they contain certain structural properties that can be represented and used in downstream machine learning tasks. In general, graphs are difficult to use as inputs to standard algorithms because of exotic features like variable size and the absence of a natural orientation. Consequently, a graph feature representation with equal dimensionality and dimension-wise alignment is required to learn on graphs.
We know that any embedding method must preserve structural information, and for graphs in particular must satisfy two key attributes: consistency under deformation and invariance under isomorphism. The first forces the embedding to discriminate two graphs consistently with their structural dissimilarity. The second ensures one representation per graph, which can be a challenge since one graph has many possible orientations. In this paper, we propose to analyze the importance of satisfying these criteria through a known, simple, expressive and efficient candidate graph feature representation: the graph Laplacian spectrum (GLS).
The Laplacian matrix of a graph is a major object in spectral learning [4]. However, most of the attention is usually directed to its eigenvectors rather than its spectrum, and spectral learning is generally applied to node clustering or classification, not whole-graph representation. Yet, the GLS holds interesting properties for graph representation. First, the Laplacian eigenvalues convey much structural information, such as the presence of communities and partitions [25], the regularity, the closed-walk enumeration, the diameter or the connectedness of the graph [7]. The spectrum is also interpretable in terms of signal processing [35] or mechanics [5]. Second, it is backed by efficient and robust approximate eigendecomposition algorithms, enabling it to scale to large graphs and huge datasets [16]. Third, the GLS is invariant under graph isomorphism. Finally, each eigenvalue of the GLS can be seen as a graph feature by itself, containing specific structural information; hence any subset of Laplacian eigenvalues is a meaningful and valuable embedding. This enables the usage of the truncated Laplacian spectrum (tGLS) instead of the full GLS as whole-graph feature representation. Using tGLS reduces the embedding time thanks to eigenvalue algorithms that do not require an entire diagonalization to give a partial spectrum [16]. These properties tell us that the GLS is a good graph feature representation candidate. In this paper we go further and analyze the interesting properties of the Laplacian spectrum through the following contributions: (1) we build a perturbation-based framework to analyze the representation capacity of the GLS, (2) we analyze the consistency between structural deformation of a graph and its GLS by deriving bounds for the distance between the GLS of two graphs, (3) we validate the consistency and the representational power of the GLS with different experiments on synthetic and real graphs.
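As a concrete illustration of how the (possibly truncated) GLS can be computed with standard sparse eigensolvers, the sketch below uses SciPy; the function name and the example graph are ours, not from the paper:

```python
import numpy as np
from scipy.sparse import csr_matrix, diags
from scipy.sparse.linalg import eigsh

def laplacian_spectrum(adj, k=None):
    """Return the (optionally truncated) Laplacian spectrum of a graph.

    adj : dense or sparse symmetric adjacency matrix.
    k   : number of largest eigenvalues to keep (None = full spectrum).
    """
    A = csr_matrix(adj, dtype=float)
    degrees = np.asarray(A.sum(axis=1)).ravel()
    L = diags(degrees) - A                       # L = D - A
    if k is None or k >= A.shape[0] - 1:
        # small graphs: full dense eigendecomposition
        return np.sort(np.linalg.eigvalsh(L.toarray()))
    # large graphs: only the k largest eigenvalues, via sparse Lanczos iterations
    vals = eigsh(L, k=k, which='LM', return_eigenvectors=False)
    return np.sort(vals)

# path graph on 4 nodes: spectrum is {0, 2 - sqrt(2), 2, 2 + sqrt(2)}
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
spec = laplacian_spectrum(A)
top2 = laplacian_spectrum(A, k=2)   # truncated GLS: two largest eigenvalues
```

Extracting only the top of the spectrum is what makes the tGLS cheap to compute on large graphs.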
The rest of the paper is organized as follows. The mathematical framework and the theoretical analysis are presented in Sections 2 and 3, respectively. Section 4 proposes experiments to illustrate the theoretical results and show the representational power of the GLS. Finally, Section 5 describes related work on graph representation.
2 Perturbation approach and problem setup
We consider two undirected and weighted graphs $G$ and $G'$ with respective adjacency matrices $A$ and $A'$ and degree matrices $D$ and $D'$. These matrices are set with respect to an arbitrary indexing of the nodes. The Laplacian matrix of $G$ is defined as $L = D - A$. We aim at using the GLS to build a fixed-dimensional representation that encodes structural information to compare any graphs $G$ and $G'$ that are neither aligned nor equally sized. For the rest of the paper, and without loss of generality, we postulate that $G'$ is at least as large as $G$. The rest of this section introduces the definitions, hypotheses and notations needed for our theoretical analysis of the GLS.
Definition 1
Let $G$ be a weighted graph with $n$ nodes and weighted adjacency matrix $A$. We define a symmetric perturbation matrix $P$ of the same size, such that $A + P$ remains a valid (symmetric, nonnegative) weighted adjacency matrix. We define the two following perturbations applied on graph $G$:

Adding isolated nodes: the adjacency matrix is padded with zero rows and columns, one per additional node.

Adding or removing edges: the adjacency matrix $A$ is replaced by $A + P$.

We call edge-perturbation the addition or removal of edges, and node-perturbation the addition of nodes. A complete perturbation is done by adding isolated nodes and then perturbing the augmented graph with an edge-perturbation. We note that the withdrawal of a node is equivalent to the removal of all edges around this node. Moreover, if graph $G$ is unweighted, i.e. with binary adjacency, then edge perturbations have entries in $\{-1, 0, 1\}$.
Remark 1
If $A + P = \Pi^\top A \Pi$ for some permutation matrix $\Pi$, then the perturbation is merely a permutation of the node indexing. Such a perturbation is not interesting, and edge perturbation due to node indexing has to be annihilated by a permutation matrix, as in the following definition.
Definition 2
We say that $G'$ is a perturbed version of $G$ if there exist a perturbation matrix $P$ and a permutation matrix $\Pi$ such that the adjacency of $G'$ is obtained from a complete perturbation of $G$, where $P$ is the sparsest possible, i.e. does not include permutations.
Notations
We denote $P$ the sparsest perturbation as defined in Definition 2. We denote $\bar{G}$ the completion of $G$ with isolated nodes. If $M$ is a matrix associated to $G$, we denote $\bar{M}$ the equivalent matrix for $\bar{G}$. We denote $\lambda_i(M)$ the $i$-th eigenvalue of a square matrix $M$ in ascending order, $\lambda_1$ being the smallest eigenvalue.
Hypothesis 1
Without loss of generality, we assume that $G'$ is a perturbed version of $G$, i.e. there exist a sparsest square perturbation matrix $P$ and a square permutation matrix $\Pi$ such that $\bar{A}' = \Pi^\top (\bar{A} + P) \Pi$. $P$ is a square block matrix whose top-left block is a square perturbation matrix for graph $G$. The bottom-right block is the square adjacency matrix of the additional nodes. The off-diagonal block is the adjacency matrix representing the links between graph $G$ and the additional nodes.
We have defined a notion of continuous deformation of graphs. This deformation has a natural and simple interpretation: any graph $G'$ is a perturbed version of a graph $G$, and the larger the perturbation, the higher the structural dissimilarity between $G$ and $G'$.
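A minimal sketch of the two elementary perturbations, assuming the conventions of Definition 1 (edge-perturbation adds a symmetric matrix to the adjacency; node-perturbation pads with zero rows and columns); the helper names are ours:

```python
import numpy as np

def add_isolated_nodes(A, m):
    """Node-perturbation: pad the adjacency with m isolated nodes (zero block)."""
    n = A.shape[0]
    A_bar = np.zeros((n + m, n + m))
    A_bar[:n, :n] = A
    return A_bar

def edge_perturb(A, P):
    """Edge-perturbation: add a symmetric perturbation P to the adjacency."""
    assert np.allclose(P, P.T), "perturbation must be symmetric"
    A_new = A + P
    assert (A_new >= 0).all(), "perturbed weights must stay nonnegative"
    return A_new

# withdrawing a node is equivalent to removing all edges around it:
A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], float)   # triangle
P = np.zeros((3, 3)); P[0, 1:] = P[1:, 0] = -1           # cut node 0's edges
A_pert = edge_perturb(A, P)

# a complete perturbation: first pad with isolated nodes, then edge-perturb
A_bar = add_isolated_nodes(A, 2)
```

After the edge-perturbation, node 0 is isolated, illustrating the equivalence noted above.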
The next section uses the previously presented mathematical framework to analyze the consistency of the Laplacian spectrum as a graph representation and its natural link to the graph isomorphism problem.
3 Laplacian spectrum as graph feature representation
We place ourselves under Hypothesis 1, saying that the difference between graphs $G$ and $G'$ is characterized by the unknown deformation $P$. A good embedding of these graphs should be close when the level of deformation is low, and far otherwise. This level of deformation can be quantified by the global and node-wise entries of $P$. These features are by construction present in the Laplacian of $P$, denoted $L_P$. We use this idea to propose an analysis of the distance between two GLS.

All proofs are detailed in the appendix.
3.1 Consistency under deformation and relation to graph isomorphism
Two graphs $G$ and $G'$ are isomorphic if and only if there exists a permutation matrix $\Pi$ such that $A' = \Pi^\top A \Pi$ [24], hence when they are structurally equivalent irrespective of the vertex ordering. Several papers have proposed to use a notion of divergence to graph isomorphism (DGI) to compare graphs [15, 31]. The DGI between graphs $G$ and $G'$ is generally the minimal Frobenius norm of the difference between $A'$ and $\Pi^\top A \Pi$ over permutation matrices $\Pi$. Considering this definition, the following Lemma links the graph-isomorphism problem and the Laplacian of the hypothetical perturbation, and shows that this divergence is the norm of $L_P$:
Lemma 1
We remind that graph isomorphism is at best solved in quasi-polynomial time [2] and cannot be used in practice for large graphs and datasets. The following Proposition shows how the distance between GLS relaxes the isomorphism-based graph divergence.
The above result tells us that the higher the difference between GLS, the larger the hypothetical perturbation, i.e. the higher the structural dissimilarity.
We now study the implication of GLS closeness. This problem relates to the notion of non-isomorphic cospectrality, i.e. the idea that two graphs can have equal eigenvalues while having different Laplacian matrices [7]. The following proposition gives a simple insight into the problem of spectral characterization in our perturbation-based framework:
Proposition 2
We denote the singular value decomposition (SVD) of the Laplacian, with singular values in ascending order, and obtain the stated inequality. This proposition shows that an equal spectrum implies equal graphs only when the eigenvectors are also equal. Otherwise, cospectrality for non-isomorphic graphs tells us that there exist families of graphs that are not fully determined by their spectrum. These families are characterized by structural properties such that two non-isomorphic graphs with equal Laplacian spectrum share these properties but not their adjacency [36]. In practice, this is not a problem. First, almost all graphs are determined by their spectrum [7]. Second, an equal GLS carries the precious information that the graphs share common structural properties, no matter the adjacency matrices. These properties might be what we seek to represent when representing graphs for ML tasks. Third, non-isomorphic cospectrality concerns equally sized graphs, which is unlikely among all possible real-life graphs. When the studied dataset specifically contains cospectral non-isomorphic graphs and when the task requires a unique representation property, GLS is not appropriate, and more sophisticated and powerful embedding methods, taking for example eigenvectors into account [37], should be studied and used. Otherwise, i.e. in almost every situation, according to the previously presented results, the GLS characterizes the graph and is directly related to the hypothetical perturbation.
Nevertheless, we accordingly propose Proposition 3 to better understand GLS proximity even when graphs are non-isomorphic cospectral.
Proposition 3
The closer the GLS, the closer to unitarysimilarity the Laplacian matrices.
We remind that two real square matrices $A$ and $B$ are unitarily similar if there exists an orthogonal matrix $U$ such that $B = U^\top A U$. Unitary similarity is an equivalence relation on the space of square matrices. Moreover, divergence to unitary similarity is a relaxed version of the divergence to graph isomorphism [15], where the permutation matrix space is replaced by a unitary matrix space. Finally, from Propositions 1 and 3 we can bound the distance between GLS.

In this section, we have shown that structural similarity (divergence) between graphs can be reasonably approximated by the similarity (divergence) between their GLS.
3.2 Laplacian spectrum as wholegraph representation in practice
The previous section showed the capacity of the distance between Laplacian spectra to serve as a proxy for graph similarity. In practice, a fixed embedding dimension must be chosen for all graphs in the dataset. According to the previous analysis, the most obvious dimension is the size of the largest graph, and all smaller graphs may be padded with isolated nodes. We note that padding with isolated nodes is equivalent to adding zeros in the GLS. Nevertheless, in some datasets, some graphs can be significantly larger than the rest, and the padding can become abusive. We therefore propose for such datasets to use a smaller embedding dimension: we simply truncate the GLS such that we keep only the highest eigenvalues. This method also saves computation time. The drawback is that we may lose information for graphs with more nodes than the chosen dimension. In practice, for large graphs, the contribution of the lowest eigenvalues to the distance between GLS as a proxy for graph divergence is negligible. In particular, large graphs have many sparse areas, such that many eigenvalues are very low; hence truncating the bottom part of the GLS may not be a problem. We assess the impact of the truncation in the experimental section.
We can also propose several ways to avoid this problem, such as summarizing the lowest eigenvalues with simple statistics (moments or histograms). In the experimental section, we do not use this trick.
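The fixed-dimension scheme described above (keep the largest eigenvalues, pad smaller graphs with zeros) can be sketched as follows; the function name is ours:

```python
import numpy as np

def fixed_dim_spectrum(spectra, k):
    """Map variable-length GLS vectors to a common dimension k.

    Keep the k largest eigenvalues of each graph; graphs with fewer than k
    eigenvalues are padded with zeros, which is equivalent to adding
    isolated nodes to the graph.
    """
    out = np.zeros((len(spectra), k))
    for i, s in enumerate(spectra):
        s = np.sort(np.asarray(s, dtype=float))[::-1]   # descending order
        top = s[:k]
        out[i, :len(top)] = top
    return out

# two graphs of different sizes mapped to the same 4-dimensional embedding
specs = [[0.0, 1.0, 3.0], [0.0, 0.5, 1.5, 2.5, 4.0]]
X = fixed_dim_spectrum(specs, k=4)
```

Here the 3-node graph is zero-padded while the 5-node graph is truncated, so both rows of `X` are directly comparable.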
4 Experiments
All experiments can be reproduced using the code provided at the following address: https://github.com/edouardpineau/UsingLaplacianSpectrumasGraphFeatureRepresentation
4.1 Preliminary experiments
As a first illustration of the deformation-based results presented in Section 3, we propose to use Erdős–Rényi random graphs [12] with edge probability parameter $p$. We focus on three simple experiments.
First, the distance between the Laplacian spectrum of a graph and a perturbed version of this graph is related to the number of perturbations. The experimental illustration can be found in Figure 1 (similar to those in [40]). We see that the number of perturbations is directly related to the distance between GLS features, both for edge addition and edge withdrawal. A relation between graph sparsity and Laplacian eigenvalues can be seen, for example, through the Gershgorin circle theorem [13].
Second, we mentioned that when a graph is significantly larger than the other graphs of a dataset, we can use a truncated GLS (tGLS). This method both saves computation time thanks to iterative eigenvalue algorithms and avoids the addition of isolated nodes to all other graphs. In Figure 3, we show the results of experiments demonstrating that the tGLS is consistent with node addition. As experimental setup, we take a reference graph and compute its GLS. Then we add a randomly connected node and compute the tGLS of the new graph, keeping only the largest eigenvalues. We repeat this 20 times. We compute the distance to the reference GLS for different levels of connectivity of the additional nodes. We first observe that the tGLS is consistent with node addition. We also confirm our previous theoretical results by observing that the more connected the additional nodes, the higher the GLS divergence.
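The first preliminary experiment can be reproduced in a few lines; the sketch below uses our own helper names and an arbitrary graph size and edge probability, not the exact settings of Figure 1:

```python
import numpy as np

rng = np.random.default_rng(0)

def er_graph(n, p):
    """Sample a symmetric Erdos-Renyi adjacency matrix with edge probability p."""
    upper = np.triu(rng.random((n, n)) < p, 1)
    return (upper + upper.T).astype(float)

def gls(A):
    """Full Laplacian spectrum in ascending order."""
    L = np.diag(A.sum(1)) - A
    return np.sort(np.linalg.eigvalsh(L))

def perturb_edges(A, n_flips):
    """Flip n_flips random off-diagonal pairs (edge additions or withdrawals)."""
    A = A.copy()
    n = A.shape[0]
    for _ in range(n_flips):
        i, j = rng.choice(n, size=2, replace=False)
        A[i, j] = A[j, i] = 1.0 - A[i, j]
    return A

A = er_graph(60, 0.1)
ref = gls(A)
# the distance between spectra typically grows with the perturbation level
d_small = np.linalg.norm(gls(perturb_edges(A, 5)) - ref)
d_large = np.linalg.norm(gls(perturb_edges(A, 50)) - ref)
```

Averaging `d_small` and `d_large` over many random draws reproduces the monotone trend shown in the figure.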
4.2 Classification of molecular and social network graphs
We evaluate spectral feature embedding with a classification task on molecular graphs and social network graphs. The experimental setup for the classification task is given in Appendix 0.E. We assume here that two structurally close graphs belong to the same class, and we challenge this assumption with the following experiments.
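A sketch of the downstream pipeline (detailed in Appendix 0.E): a GLS-like feature matrix fed to an RBF-kernel SVC with nested cross-validation. The data below is synthetic and the hyperparameter grid purely illustrative:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
# stand-in for truncated GLS features: rows are sorted spectra, labels are classes
X = np.sort(rng.gamma(2.0, size=(120, 8)), axis=1)
y = rng.integers(0, 2, size=120)

# illustrative grid; the actual per-dataset grids are not reproduced here
grid = {'C': [0.1, 1, 10], 'gamma': [0.01, 0.1, 1]}
inner = GridSearchCV(SVC(kernel='rbf'), grid, cv=5)    # inner loop: model selection
outer = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(inner, X, y, cv=outer)        # outer loop: unbiased estimate
```

Nesting the hyperparameter search inside each outer fold avoids the selection bias discussed in Appendix 0.E.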
We propose to compare GLS-based classification results to those obtained by feature-based and deep learning methods. Standard graph feature representation methods are: Earth Mover's Distance [28] (EMD), Pyramid Match [28] (PM), Feature-Based [3] (FB) and Dynamic-Based Features [14] (DyF). All of these methods use a support vector classifier (SVC) over extracted features. Deep learning methods are: Variational Recurrent Graph Classifier [30] (VRGC), Graph Convolutional Network [19] (GCN), Deep Graph CNN [46] (DGCNN), Capsule GNN [41] (CapsGNN), Graph Isomorphism Network [42] (GIN) and GraphSAGE [17]. All deep learning methods are end-to-end graph classifiers. A description of these models is given in the related work, Section 5.

Molecular graphs
We use five datasets for the experiments: Mutag (MT), Enzymes (EZ), Proteins Full (PF), Dobson and Doig (DD) and National Cancer Institute (NCI1) [18]. All graphs are chemical compounds. Nodes are atoms or molecules and edges represent chemical or electrostatic bindings. We note that molecular graphs contain node attributes, which are used by some of the models presented in Table 1. We leave the question of the relevance of comparing models with slightly different inputs to the discretion of the reader. Descriptions and statistics of the molecular datasets are presented in Table 3, Appendix 0.F.
MT  EZ  PF  DD  NCI1  
EMD + SVC  86.1 ± 0.8  36.8 ± 0.8  –  –  72.7 ± 0.2
PM + SVC  85.6 ± 0.6  28.2 ± 0.4  –  75.6 ± 0.6  69.7 ± 0.1
FB + SVC  84.7 ± 2.0  29.0 ± 1.2  70.0 ± 1.3  –  62.9 ± 1.0
DyF + SVC  86.3 ± 1.3  26.6 ± 1.2  73.1 ± 0.4  –  66.6 ± 0.3
FGSD + SVC  92.1  –  73.4  77.1  79.8
VRGC  86.3 ± 8.6  48.4 ± 6.2  74.8 ± 3.0  –  80.7 ± 2.2
GCN*  85.6 ± 5.8  –  76.0 ± 3.2  –  80.2 ± 2.0
DGCNN*  85.8 ± 1.7  51.0 ± 7.3  75.5 ± 0.9  79.4 ± 0.9  74.4 ± 0.5
CapsGNN*  86.7 ± 6.9  54.7 ± 5.7  76.3 ± 3.6  75.4 ± 4.2  78.4 ± 1.6
GIN0*  89.4 ± 5.6  –  76.2 ± 2.8  –  82.7 ± 1.7
GraphSAGE*  85.1 ± 7.6  –  75.9 ± 3.2  –  77.7 ± 1.5
GLS + SVC  87.9 ± 7.0  40.7 ± 6.3  75.3 ± 3.5  74.3 ± 3.5  73.3 ± 2.1
Social network graphs
We use five datasets for the experiments: IMDB-Binary (IMDB-B), IMDB-Multi (IMDB-M), REDDIT-Binary (REDDIT-B), REDDIT-Multi-5K (REDDIT-M) and COLLAB. All graphs are social networks. The graphs of these datasets do not contain node attributes; therefore, we can more appropriately compare GLS + SVC to deep-learning-based classification. Statistics about the social network datasets are presented in Table 4, Appendix 0.F.
IMDBB  IMDBM  REDDITB  REDDITM  COLLAB  

GCN  74.0 ± 3.4  51.9 ± 3.8  –  –  79.0 ± 1.8
DGCNN  70.0 ± 0.9  47.8 ± 0.9  76.0 ± 1.7  –  73.8 ± 0.5
CapsGNN  73.1 ± 4.8  50.3 ± 2.7  –  52.9 ± 1.5  79.6 ± 0.9
GIN0  75.1 ± 5.1  52.3 ± 2.8  92.4 ± 2.5  57.5 ± 1.5  80.2 ± 1.9
GraphSAGE  72.3 ± 5.3  50.9 ± 2.2  –  –  –
GLS + SVC  73.2 ± 4.2  48.5 ± 2.5  87.4 ± 3.4  52.0 ± 1.8  78.5 ± 1.1
Analysis of the results
The classification results above illustrate the capacity of the GLS to capture graph structural information, under the assumption that structurally close graphs belong to the same class. Graph neural networks are globally more expressive since they are trained end-to-end and can leverage task-specific information for graph classification. In particular, they obtain strong results when node labels are available (see the molecular experiments in Section 4.2). Nevertheless, GLS is a simple way to represent graphs in an unsupervised manner, with theoretical backing, simplicity of implementation (eigendecomposition routines are readily available on any computer) and competitive downstream classification results.
On the reasonability of using truncated GLS
We assess the impact of truncating the GLS. Using the truncated GLS (tGLS) makes it possible to (1) reduce the computational cost for large graphs and (2) reduce the dimensionality of the graph representation for all graphs. Results are presented in Figure 3 for the molecular datasets.
Computation analysis
GLS extraction is quick, thanks to very efficient eigendecomposition algorithms for sparse graph matrices [16]. For example, the complete set of molecular experiments (embedding + SVM) took approximately 5 minutes on a single CPU, most of it dedicated to the computation for the largest graphs of DD.
5 Related work
We propose to divide graph feature representation into three categories: graph kernel methods, featurebased methods and deep learning.
Graph kernel methods
Kernel methods create a high-dimensional feature representation of data. The kernel trick [33] avoids explicitly computing the coordinates in the feature space; only the inner products between all pairs of data images are computed: it is an implicit embedding method. These methods have been applied to graphs [27, 28]. They consist in performing pairwise comparisons between atomic substructures of the graphs until a good representative dictionary is found. The embedding of a graph is then the number of occurrences of these substructures within it. These substructures can be graphlets [43], subtree patterns [34], random walks [38] or paths [6]. The main difficulty lies in the choice of an appropriate algorithm and kernel that accepts graphs of variable size and captures features useful for the downstream task. Moreover, kernel methods can be computationally expensive, but techniques like the Nyström algorithm [39] lower the number of comparisons with a low-rank approximation of the similarity matrix.
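For instance, scikit-learn's Nyström transformer builds such a low-rank kernel approximation from a subset of landmark points; the toy data and parameters below are ours:

```python
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# toy feature matrix standing in for precomputed graph features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] > 0).astype(int)

# Nystroem approximates the RBF kernel map from n_components landmarks,
# avoiding the full pairwise kernel matrix computation
clf = make_pipeline(
    Nystroem(kernel='rbf', gamma=0.5, n_components=50, random_state=0),
    LinearSVC(),
)
clf.fit(X, y)
acc = clf.score(X, y)
```

The approximate kernel map followed by a linear classifier trades a little accuracy for a large reduction in the number of kernel evaluations.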
Featurebased methods
Feature-based representation methods [3] represent each graph as a concatenation of features. Generally, feature-based representations offer a certain degree of interpretability and transparency. The most basic features are the number of nodes or edges and the histogram of node degrees. These simple graph-level features offer by construction the sought isomorphism-invariance but suffer from low expressiveness. More sophisticated algorithms consider features based on attributes of random walks on the graph [14], while others are graphlet-based [21]. [20] explicitly built permutation-invariant features by mapping the adjacency matrix to a function on the symmetric group. [37] proposed a family of graph spectral distances to build graph features. Experimental work in [22] used the normalized Laplacian spectrum with random forests for graph classification, with promising results. [40] analyzes the cospectrality of different graph matrices and studies experimentally the representational power of their spectra. These last two works are directly related to the current work. Nevertheless, in both cases, the theoretical analysis is absent and the comparative experiments with current benchmarks and methods are limited. In this paper we propose a response to these concerns.

Deep learning based methods
GNNs learn representations of the nodes of a graph by leveraging together their attributes, information on neighboring nodes and the attributes of the connecting edges. When graphs have no vertex features, the node degrees are used instead. To create a graph-level representation instead of node representations, node embeddings are pooled by a permutation-invariant readout function like summation, or more sophisticated information-preserving ones [44, 46]. A condition of optimality for readout functions is presented in [42]. Recently, [41] leveraged capsule networks [32], neural units designed to better preserve information at pooling time. Other popular evolutions of GNNs formulate convolution-like operations on graphs. Formulations in the spectral domain [8, 10] are limited to the processing of different signals on a single graph structure, because they rely on the fixed spectrum of the Laplacian. Conversely, formulations in the spatial domain are not limited to one graph structure [1, 11, 26, 17] and can infer information from unseen graph structures. At the same time, alternatives to GNNs exist and are related to random-walk embeddings. In [23], neural networks help to sample paths which preserve significant graph properties. Other approaches transform graphs into sequences of node embeddings passed to a recurrent neural network (RNN) [45, 30] to get useful embeddings. These models do not inherently include isomorphism-invariance but greedily learn it by seeing the same graph numerous times with different node orderings and embeddings. These methods are powerful and globally obtain a high level of expressiveness (see experimental section 4.2).

6 Conclusion
In this paper, we analyzed the graph Laplacian spectrum (GLS) as a whole-graph representation. In particular, we showed that comparing two GLS is a good proxy for the divergence between two graphs in terms of structural information. We coupled these results with the natural invariance to isomorphism, the simplicity of implementation, the computational efficiency offered by modern randomized algorithms and the rare occurrence of detrimental cospectral non-isomorphic graphs to propose the GLS as a strong baseline graph feature representation.
References

 [1] Atwood, J., Towsley, D.: Diffusion-convolutional neural networks. In: Advances in Neural Information Processing Systems. pp. 1993–2001 (2016)
 [2] Babai, L.: Graph isomorphism in quasipolynomial time. In: Proceedings of the forty-eighth annual ACM symposium on Theory of Computing. pp. 684–697. ACM (2016)
 [3] Barnett, I., Malik, N., Kuijjer, M.L., Mucha, P.J., Onnela, J.P.: Feature-based classification of networks. arXiv preprint arXiv:1610.05868 (2016)
 [4] Belkin, M., Niyogi, P.: Laplacian eigenmaps and spectral techniques for embedding and clustering. In: Advances in neural information processing systems. pp. 585–591 (2002)
 [5] Bonald, T., Hollocou, A., Lelarge, M.: Weighted spectral embedding of graphs. In: 2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton). pp. 494–501. IEEE (2018)
 [6] Borgwardt, K.M., Kriegel, H.P.: Shortest-path kernels on graphs. In: Data Mining, Fifth IEEE International Conference on. pp. 8–pp. IEEE (2005)
 [7] Brouwer, A.E., Haemers, W.H.: Spectra of graphs. Springer Science & Business Media (2011)
 [8] Bruna, J., Zaremba, W., Szlam, A., LeCun, Y.: Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203 (2013)
 [9] Cawley, G.C., Talbot, N.L.: On overfitting in model selection and subsequent selection bias in performance evaluation. Journal of Machine Learning Research 11(Jul), 2079–2107 (2010)
 [10] Defferrard, M., Bresson, X., Vandergheynst, P.: Convolutional neural networks on graphs with fast localized spectral filtering. In: Advances in Neural Information Processing Systems. pp. 3844–3852 (2016)
 [11] Duvenaud, D.K., Maclaurin, D., Iparraguirre, J., Bombarell, R., Hirzel, T., Aspuru-Guzik, A., Adams, R.P.: Convolutional networks on graphs for learning molecular fingerprints. In: Advances in neural information processing systems. pp. 2224–2232 (2015)
 [12] Erdős, P., Rényi, A.: On random graphs i. Publ. Math. Debrecen 6, 290–297 (1959)
 [13] Gershgorin, S.A.: Uber die abgrenzung der eigenwerte einer matrix (6), 749–754 (1931)
 [14] Gómez, L.G., Delvenne, J.C.: Dynamics based features for graph classification. In: Benelearn 2017: Proceedings of the Twenty-Sixth Benelux Conference on Machine Learning, Technische Universiteit Eindhoven, 9-10 June 2017. p. 131
 [15] Grohe, M., Rattan, G., Woeginger, G.J.: Graph similarity and approximate isomorphism. In: 43rd International Symposium on Mathematical Foundations of Computer Science (MFCS 2018). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik (2018)
 [16] Halko, N., Martinsson, P.G., Tropp, J.A.: Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM review 53(2), 217–288 (2011)
 [17] Hamilton, W., Ying, Z., Leskovec, J.: Inductive representation learning on large graphs. In: Advances in Neural Information Processing Systems. pp. 1024–1034 (2017)
 [18] Kersting, K., Kriege, N.M., Morris, C., Mutzel, P., Neumann, M.: Benchmark data sets for graph kernels (2016), http://graphkernels.cs.tudortmund.de
 [19] Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016)

 [20] Kondor, R., Borgwardt, K.M.: The skew spectrum of graphs. In: Proceedings of the 25th international conference on Machine learning. pp. 496–503. ACM (2008)
 [21] Kondor, R., Shervashidze, N., Borgwardt, K.M.: The graphlet spectrum. In: Proceedings of the 26th Annual International Conference on Machine Learning. pp. 529–536. ACM (2009)
 [22] de Lara, N., Pineau, E.: A simple baseline algorithm for graph classification. Relational Representation Learning, NeurIPS Workshop (2018)
 [23] Li, C., Ma, J., Guo, X., Mei, Q.: Deepcas: An endtoend predictor of information cascades. In: Proceedings of the 26th International Conference on World Wide Web. pp. 577–586. International World Wide Web Conferences Steering Committee (2017)
 [24] Merris, R.: Laplacian matrices of graphs: a survey. Linear algebra and its applications 197, 143–176 (1994)
 [25] Newman, M.E.: Spectral methods for community detection and graph partitioning. Physical Review E 88(4), 042822 (2013)
 [26] Niepert, M., Ahmed, M., Kutzkov, K.: Learning convolutional neural networks for graphs. In: International conference on machine learning. pp. 2014–2023 (2016)
 [27] Nikolentzos, G., Meladianos, P., Limnios, S., Vazirgiannis, M.: A degeneracy framework for graph similarity. In: IJCAI. pp. 2595–2601 (2018)
 [28] Nikolentzos, G., Meladianos, P., Vazirgiannis, M.: Matching node embeddings for graph similarity. In: AAAI. pp. 2429–2435 (2017)
 [29] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al.: Scikitlearn: Machine learning in python. Journal of machine learning research 12(Oct), 2825–2830 (2011)
 [30] Pineau, E., de Lara, N.: Variational recurrent neural networks for graph classification. In: Representation Learning on Graphs and Manifolds Workshop (2019)
 [31] Rameshkumar, A., Palanikumar, R., Deepa, S.: Laplacian matrix in algebraic graph theory. Journal Impact Factor pp. 0–489 (2013)
 [32] Sabour, S., Frosst, N., Hinton, G.E.: Dynamic routing between capsules. In: Advances in neural information processing systems. pp. 3856–3866 (2017)
 [33] ShaweTaylor, J., Cristianini, N., et al.: Kernel methods for pattern analysis. Cambridge university press (2004)
 [34] Shervashidze, N., Schweitzer, P., Leeuwen, E.J.v., Mehlhorn, K., Borgwardt, K.M.: Weisfeiler-Lehman graph kernels. Journal of Machine Learning Research 12(Sep), 2539–2561 (2011)
 [35] Shuman, D.I., Ricaud, B., Vandergheynst, P.: Vertex-frequency analysis on graphs. Applied and Computational Harmonic Analysis 40(2), 260–291 (2016)
 [36] Van Dam, E.R., Haemers, W.H.: Which graphs are determined by their spectrum? Linear Algebra and its applications 373, 241–272 (2003)
 [37] Verma, S., Zhang, Z.L.: Hunt for the unique, stable, sparse and fast feature learning on graphs. In: Advances in Neural Information Processing Systems. pp. 88–98 (2017)
 [38] Vishwanathan, S.V.N., Schraudolph, N.N., Kondor, R., Borgwardt, K.M.: Graph kernels. Journal of Machine Learning Research 11(Apr), 1201–1242 (2010)
 [39] Williams, C.K., Seeger, M.: Using the Nyström method to speed up kernel machines. In: Advances in neural information processing systems. pp. 682–688 (2001)

 [40] Wilson, R.C., Zhu, P.: A study of graph spectra for comparing graphs and trees. Pattern Recognition 41(9), 2833–2841 (2008)
 [41] Xinyi, Z., Chen, L.: Capsule graph neural network. International Conference on Learning Representations (2018)
 [42] Xu, K., Hu, W., Leskovec, J., Jegelka, S.: How powerful are graph neural networks? International Conference on Learning Representations (2019)
 [43] Yanardag, P., Vishwanathan, S.: Deep graph kernels. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 1365–1374. ACM (2015)
 [44] Ying, Z., You, J., Morris, C., Ren, X., Hamilton, W., Leskovec, J.: Hierarchical graph representation learning with differentiable pooling. In: Advances in Neural Information Processing Systems. pp. 4800–4810 (2018)
 [45] You, J., Ying, R., Ren, X., Hamilton, W., Leskovec, J.: GraphRNN: Generating realistic graphs with deep autoregressive models. In: International Conference on Machine Learning. pp. 5694–5703 (2018)

 [46] Zhang, M., Cui, Z., Neumann, M., Chen, Y.: An end-to-end deep learning architecture for graph classification. In: Thirty-Second AAAI Conference on Artificial Intelligence (2018)
Appendix 0.A Proof of Lemma 1
with and the unit vector.
Therefore,
Appendix 0.B Proof of Proposition 1
From Lemma 1 we have . Moreover, from Weyl's eigenvalue inequalities and since eigenvalues are isomorphism invariant:
Hence: .
Now let be any eigenpair of a matrix . We can always pick and build such that and . Hence:
Using previous results we get:
with the Frobenius norm.
Appendix 0.C Proof of Proposition 2
We remind that the Frobenius norm is unitarily invariant thanks to the cyclic property of the trace. For any we have:
In particular if :
We also have that
Hence: .
Appendix 0.D Proof of Proposition 3
Denoting the group of orthogonal matrices (orthogonal since real), we want to show that:
We denote the eigendecomposition of the Laplacian with . Since is unitary and using property of Frobenius norm, we have, :
We know that and are orthogonal since they are respectively eigenvector matrices of symmetric matrix and . We therefore have:
Moreover , if then .
Hence,
with the permutation group of .
Appendix 0.E Experimental setup for classification of graphs
For classification, we use a standard 10-fold cross-validation setup. Each dataset is divided into 10 folds such that the class proportions are preserved in each fold. These folds are then used for cross-validation, i.e. one fold serves as the testing set while the other ones compose the training set. Results are averaged over all testing sets. All figures gathered in the tables of results are built using this setup. For the dimension of the truncated GLS, i.e. the number of eigenvalues we keep, we chose the 95th percentile of the distribution of graph sizes in each dataset, i.e. we truncate the 5% smallest eigenvalues. Considering the weak truncation impact (see Section 4.2), for large datasets containing large graphs, like the two REDDIT datasets, we can truncate more severely to make the problem computationally more efficient, in particular considering that the GLS is approached as a simple baseline more than as a final graph representation for large-scale usage.
We use the support vector classifier (SVC) from scikit-learn [29]. We impose the Radial Basis Function kernel, i.e. $k(x, x') = \exp(-\gamma \lVert x - x' \rVert_2^2)$. It is a similarity measure related to the $\ell_2$ norm between GLS; hence, our theoretical results remain consistent with our experiments. Hyperparameters $C$ and $\gamma$ are tuned over separate grids for the molecular datasets and for the social network datasets. In practice, using a global pool for all the datasets gives equivalent results, but hyperparameter inference becomes expensive with too large a grid, in particular in a 10-fold cross-validation setup. We use a nested hyperparameter-search cross-validation for each of the 10 folds: in each 90% training fold we perform a 5-fold random-search cross-validation before training. We therefore avoid the overfitting problem related to model selection that appears when using non-nested cross-validation
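The nested cross-validation setup can be sketched with scikit-learn as follows; the placeholder data and the grid values are illustrative, not the paper's:

```python
# Minimal sketch of nested CV: an inner 5-fold random search over (C, gamma)
# wrapped in an outer stratified 10-fold cross-validation.
# X stands for row-stacked truncated GLS embeddings, y for class labels;
# both are synthetic placeholders here.
import numpy as np
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))   # placeholder tGLS embeddings (60 graphs, d = 8)
y = np.tile([0, 1], 30)        # placeholder binary labels

grid = {"C": [0.1, 1.0, 10.0, 100.0], "gamma": [1e-3, 1e-2, 1e-1, 1.0]}

# Inner loop: 5-fold random search on each 90% training fold.
inner = RandomizedSearchCV(SVC(kernel="rbf"), grid, n_iter=8, cv=5, random_state=0)
# Outer loop: stratified 10-fold CV, preserving class proportions per fold.
outer = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(inner, X, y, cv=outer)
mean_accuracy = scores.mean()
```

Because model selection happens only inside each training fold, the outer test folds never influence the chosen hyperparameters.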
[9].
Appendix 0.F Characteristics of the real datasets
We use five molecular datasets and five social network datasets for the experiments [18]. Tables 3 and 4 give statistics of the different datasets. All used datasets can be found at the following address: https://ls11-www.cs.tu-dortmund.de/staff/morris/graphkerneldatasets [18].
Molecular graph datasets are Mutag (MT), Enzymes (EZ), Proteins Full (PF), Dobson and Doig (DD) and National Cancer Institute (NCI1). In MT, the graphs are either mutagenic or not. EZ graphs are tertiary structures of proteins from the 6 Enzyme Commission top-level classes. In DD, compounds are secondary structures of proteins that are enzymes or not. PF is a subset of DD without the largest graphs. In NCI1, graphs are anticancer agents or not. The graphs of these datasets have node labels that can be leveraged by graph neural networks.
                 MT     EZ     PF     DD       NCI1
graphs           188    600    1113   1178     4110
classes          2      6      2      2        2
bias (%)         66.5   16.7   59.6   58.7     50.0
min./max. |V|    10/28  2/125  4/620  30/5736  3/106
avg. |V|         18     33     39     284      30
avg. |E|         39     124    146    1431     65
node attributes  ✓      ✓      ✓      ✓        ✓
Social network datasets are IMDB-Binary (IMDB-B), IMDB-Multi (IMDB-M), REDDIT-Binary (REDDIT-B), REDDIT-Multi-5K (REDDIT-M) and COLLAB. REDDIT-B and REDDIT-M contain graphs representing discussion threads, with edges between users (nodes) when one responded to the other's comment. Classes are the subreddit topics from which threads have originated. IMDB-B and IMDB-M contain networks of actors that appeared together within the same movie. IMDB-B contains two classes for action or romance genres and IMDB-M three classes for comedy, romance and sci-fi. COLLAB graphs represent scientific collaborations, with an edge between two researchers meaning that they co-authored a paper. Labels of the graphs correspond to subfields of Physics. The graphs of these datasets have no node attributes and therefore enable fair comparison with deep learning methods.
                 IMDB-B  IMDB-M  REDDIT-B  REDDIT-M  COLLAB
graphs           1000    1500    2000      4999      5000
classes          2       3       2         5         3
bias (%)         50.0    33.3    50.0      20.0      52.0
min./max. |V|    12/136  7/89    3/3760    22/3606   32/492
avg. |V|         20      13      426       501       75
avg. |E|         97      66      496       590       2458
node attributes  ✗       ✗       ✗         ✗         ✗
Appendix 0.G Additional insight on the acceptability of using truncated GLS
Figure 4 illustrates why it is reasonable to use only the highest eigenvalues of the Laplacian spectrum as whole-graph feature representation. We take the original and final graphs of the deformation-consistency test presented in Figure 3. We compute the distance between tGLS of dimension $d$ and divide it by $d$, for increasing values of $d$. The objective is to confirm that the first eigenvalues are relatively more important to discriminate two structurally different graphs, which is the case. We note that for the Erdős-Rényi case with a few connected additional nodes, the first eigenvalues are not as relatively important as for the other example. In fact, adding nodes with stochastic connections is precisely the construction process of Erdős-Rényi graphs. Hence, discriminating the augmented graph from the original one is difficult based only on the structural information.
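The per-dimension diagnostic of Figure 4 can be reproduced in a few lines; the Erdős-Rényi generator, the edge-flip perturbation, and the range of $d$ below are illustrative assumptions, not the paper's exact deformation protocol:

```python
# Rough sketch of the Figure 4 diagnostic: normalized distance between
# truncated Laplacian spectra of a graph and a perturbed copy, per dimension d.
import numpy as np

def spectrum(adj):
    lap = np.diag(adj.sum(axis=1)) - adj
    return np.linalg.eigvalsh(lap)[::-1]  # eigenvalues, descending

def erdos_renyi(n, p, rng):
    upper = np.triu(rng.random((n, n)) < p, k=1).astype(float)
    return upper + upper.T  # symmetric 0/1 adjacency

rng = np.random.default_rng(0)
n = 30
g1 = erdos_renyi(n, 0.2, rng)
g2 = g1.copy()
# Perturb g2 by flipping a few random off-diagonal edges on/off.
for _ in range(10):
    i, j = rng.integers(0, n, size=2)
    if i != j:
        g2[i, j] = g2[j, i] = 1.0 - g2[i, j]

s1, s2 = spectrum(g1), spectrum(g2)
# Distance between d-truncated spectra, normalized by d.
curve = [np.linalg.norm(s1[:d] - s2[:d]) / d for d in range(1, n + 1)]
```

If the first (largest) eigenvalues carry most of the discriminative structure, the normalized curve decreases with $d$, which is the behavior Figure 4 probes.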