# Towards a practical k-dimensional Weisfeiler-Leman algorithm

The k-dimensional Weisfeiler-Leman algorithm is a well-known heuristic for the graph isomorphism problem. Moreover, it recently emerged as a powerful tool for supervised graph classification. The algorithm iteratively partitions the set of k-tuples, defined over the set of vertices of a graph, by considering neighboring k-tuples. Here, we propose a local variant which considers a subset of the original neighborhood in each iteration step. The cardinality of this local neighborhood, unlike the original one, only depends on the sparsity of the graph. Surprisingly, we show that the local variant has at least the same power as the original algorithm in terms of distinguishing non-isomorphic graphs. In order to demonstrate the practical utility of our local variant, we apply it to supervised graph classification. Our experimental study shows that our local algorithm leads to improved running times and classification accuracies on established benchmark datasets.


## 1 Introduction

Most practically successful solvers for the graph isomorphism problem, such as nauty [19, 17], as well as the theoretically fastest state-of-the-art graph isomorphism algorithm [3], are based on vertex refinement. Moreover, in recent years, vertex refinement has been successfully applied to machine learning in the area of (supervised) graph classification. Here the aim is to learn a model from a set of labeled graphs in order to infer the labels of unlabeled graphs. Possible applications include the classification of molecules or social networks [24, 6, 21, 23]. Recently, connections to neural networks for graphs [22] and dimension reduction in linear programming [12] have been shown.

Given two graphs $G$ and $H$, the idea of vertex refinement algorithms is to discard bijections between the vertices of $G$ and $H$ that do not induce an isomorphism between them. Hence, if the algorithm discards all possible bijections, we can be sure that the two graphs are not isomorphic. A well-known instance of this class of algorithms is the $k$-dimensional Weisfeiler-Leman algorithm ($k$-WL), which iteratively partitions or colors the set of $k$-tuples defined over the set of vertices of a graph by considering neighboring $k$-tuples. This coloring encodes valid bijections. The cardinality of the neighborhood of a $k$-tuple is fixed to $k \cdot n$, where $n$ denotes the number of vertices of a given graph. Hence, the running time of each iteration of the algorithm does not take the sparsity of a given graph into account.

#### Our Contribution.

We propose a local variant of the $k$-WL. This variant, the local $\delta$-$k$-dimensional Weisfeiler-Leman algorithm ($\delta$-$k$-LWL), considers a subset of the original neighborhood in each iteration. The cardinality of this local neighborhood only depends on the sparsity of the graph, i.e., the degrees of the vertices of a given $k$-tuple. We theoretically analyze the strength of our new local variant and prove that it has the same power as a variant of the $k$-WL [18] in terms of distinguishing non-isomorphic graphs, which in turn has at least the same power as the original algorithm.

We apply our algorithm to supervised graph classification and show that our local algorithm is several orders of magnitude faster than the original algorithms while achieving higher accuracies for the graph classification problem on real-world benchmark datasets.

#### Related work

The $k$-WL has been heavily investigated in the theory community. Equivalence to logic [14] and to Sherali-Adams relaxations of the natural integer linear program for the graph isomorphism problem [2, 13, 18] has been shown. In their seminal paper [8], Cai et al. showed that for each $k$ there exists a pair of non-isomorphic graphs of size $\mathcal{O}(k)$ each that cannot be distinguished by the $k$-WL. A thorough overview of these results can be found in [12]. For $k = 1$, the power of the algorithm has been completely characterized [1]. The algorithm plays a prominent role in the recent result of Babai [3] improving the best-known running time for the graph isomorphism problem. Moreover, tight upper bounds on the running time have been shown for $k = 1$ and $k = 2$ [4, 16]. A weaker variant of the local $k$-dimensional Weisfeiler-Leman algorithm based on $k$-sets has been suggested in [23].

## 2 Preliminaries

In the following, we fix notation and outline the $k$-WL and its variant introduced in [18].

### 2.1 Mathematical preliminaries

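A graph $G$ is a pair $(V, E)$ with a finite set of vertices $V$ and a set of edges $E \subseteq \{\{u,v\} \subseteq V \mid u \neq v\}$. We denote the set of vertices and the set of edges of $G$ by $V(G)$ and $E(G)$, respectively. For ease of notation we denote the edge $\{u,v\}$ in $E(G)$ by $(u,v)$ or $(v,u)$. In the case of directed graphs $E \subseteq \{(u,v) \in V \times V \mid u \neq v\}$. A labeled graph $G$ is a triple $(V, E, l)$ with a label function $l \colon V(G) \cup E(G) \to \Sigma$, where $\Sigma$ is some finite alphabet. Then $l(v)$ is a label of $v$ for $v$ in $V(G) \cup E(G)$. The neighborhood of $v$ in $V(G)$ is denoted by $\delta(v) = \{u \in V(G) \mid (v,u) \in E(G)\}$. Moreover, its complement is $\bar\delta(v) = \{u \in V(G) \mid (v,u) \notin E(G)\}$. Let $S \subseteq V(G)$, then $G[S] = (S, E_S)$ is the subgraph induced by $S$ with $E_S = \{(u,v) \in E(G) \mid u, v \in S\}$.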

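We say that two graphs $G$ and $H$ are isomorphic if there exists an edge preserving bijection $\varphi \colon V(G) \to V(H)$, i.e., $(u,v)$ is in $E(G)$ if and only if $(\varphi(u), \varphi(v))$ is in $E(H)$. If $G$ and $H$ are isomorphic, we write $G \simeq H$ and call $\varphi$ an isomorphism between $G$ and $H$. Moreover, we call the equivalence classes induced by $\simeq$ isomorphism types, and denote the isomorphism type of $G$ by $\tau_G$. In the case of labeled graphs, we additionally require that $l(v) = l(\varphi(v))$ for $v$ in $V(G)$ and $l((u,v)) = l((\varphi(u), \varphi(v)))$ for $(u,v)$ in $E(G)$. In the case that $G$ and $H$ are directed, isomorphic trees rooted at $v$ in $V(G)$ and $w$ in $V(H)$, respectively, we write $G \simeq_{v \to w} H$. Let $\mathbf{v}$ be a tuple in $V(G)^k$ for $k > 0$, then $G[\mathbf{v}]$ is the subgraph induced by the components of $\mathbf{v}$, where the vertices are labeled with integers from $\{1, \dots, k\}$ corresponding to indices of $\mathbf{v}$. Moreover, let $[n] = \{1, \dots, n\}$, for $n \geq 1$, and let $\{\!\{ \dots \}\!\}$ denote a multiset.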

### 2.2 Vertex refinement algorithms

Here we are interested in two vertex refinement algorithms: the $k$-WL due to László Babai, see, e.g., [8], and the $\delta$-$k$-dimensional Weisfeiler-Leman algorithm ($\delta$-$k$-WL), which is a variant of the $k$-dimensional combinatorial vertex coloring algorithm due to Malkin [18]. We first formally define the $k$-WL, largely following the exposition due to Grohe [11].

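Let $G$ be a graph, and let $k \geq 2$. In each iteration $i \geq 0$, the algorithm computes a coloring $C^k_i \colon V(G)^k \to S$, where $S$ is some arbitrary codomain. In the first iteration ($i = 0$), two tuples $\mathbf{v}$ and $\mathbf{w}$ in $V(G)^k$ get the same color if the map $v_j \mapsto w_j$ induces an isomorphism between $G[\mathbf{v}]$ and $G[\mathbf{w}]$. Now, for $i \geq 0$, $C^k_{i+1}$ is defined by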

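$$C^k_{i+1}(\mathbf{v}) = \bigl(C^k_i(\mathbf{v}), M_i(\mathbf{v})\bigr), \tag{1}$$

where the multiset

$$M_i(\mathbf{v}) = \{\!\{ \bigl(C^k_i(\phi_1(\mathbf{v}, w)), \dots, C^k_i(\phi_k(\mathbf{v}, w))\bigr) \mid w \in V(G) \}\!\}, \tag{2}$$

and

$$\phi_j(\mathbf{v}, w) = (v_1, \dots, v_{j-1}, w, v_{j+1}, \dots, v_k).$$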

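That is, $\phi_j(\mathbf{v}, w)$ replaces the $j$-th component of the tuple $\mathbf{v}$ with the vertex $w$. We say that $\phi_j(\mathbf{v}, w)$ for $w$ in $V(G)$ is a $j$-neighbor of $\mathbf{v}$. We run the algorithm until convergence, i.e., until

$$C^k_i(\mathbf{v}) = C^k_i(\mathbf{w}) \iff C^k_{i+1}(\mathbf{v}) = C^k_{i+1}(\mathbf{w})$$

holds for all $\mathbf{v}$ and $\mathbf{w}$ in $V(G)^k$, and call the partition of $V(G)^k$ induced by $C^k_i$ the stable partition. For such $i$, we define $C^k_\infty(\mathbf{v}) = C^k_i(\mathbf{v})$ for $\mathbf{v}$ in $V(G)^k$.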

For two graphs $G$ and $H$, we run the algorithm in “parallel” on both graphs. Then the $k$-WL distinguishes between them if

$$\bigl|V(G)^k \cap (C^k_\infty)^{-1}(c)\bigr| \neq \bigl|V(H)^k \cap (C^k_\infty)^{-1}(c)\bigr|$$

for some color $c$ in the codomain of $C^k_\infty$. Hence, if the $k$-WL distinguishes two graphs, the graphs are not isomorphic.
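The refinement and the distinguishing test above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes unlabeled, undirected graphs given as dictionaries mapping each vertex to its set of neighbors, and the helper names `atomic_type`, `compress`, and `k_wl` are hypothetical. The initial color records the equality and adjacency pattern among the components of a tuple, which is one way to encode the isomorphism type of the induced, index-labeled subgraph.

```python
from collections import Counter
from itertools import product


def atomic_type(adj, v):
    """Initial color: equality/adjacency pattern among the components of v."""
    k = len(v)
    return tuple((v[a] == v[b], v[b] in adj[v[a]])
                 for a in range(k) for b in range(k))


def compress(raw):
    """Rename arbitrary sortable colors to small integers."""
    palette = {c: i for i, c in enumerate(sorted(set(raw.values())))}
    return {key: palette[c] for key, c in raw.items()}


def k_wl(graphs, k):
    """Run the k-WL in parallel on all graphs; return one color histogram per graph."""
    keys = [(g, v) for g, adj in enumerate(graphs)
            for v in product(sorted(adj), repeat=k)]
    color = compress({(g, v): atomic_type(graphs[g], v) for g, v in keys})
    while True:
        raw = {}
        for g, v in keys:
            # One entry per vertex w: the colors of the k tuples phi_j(v, w).
            multiset = sorted(
                tuple(color[(g, v[:j] + (w,) + v[j + 1:])] for j in range(k))
                for w in graphs[g]
            )
            raw[(g, v)] = (color[(g, v)], tuple(multiset))
        refined = compress(raw)
        # The partition only gets finer, so it is stable once no new color appears.
        if len(set(refined.values())) == len(set(color.values())):
            break
        color = refined
    return [Counter(c for (g, _), c in color.items() if g == i)
            for i in range(len(graphs))]
```

Two graphs are distinguished exactly when their histograms differ. For instance, a cycle on six vertices and two disjoint triangles receive different histograms for `k = 2`, since only the triangles contain a vertex adjacent to both endpoints of an edge.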

For $k = 1$, the classical Weisfeiler-Leman algorithm is based on the usual neighborhood of a vertex. That is, in the first iteration, we color the vertices uniformly. For $i \geq 0$, $C^1_{i+1}$ is defined by

$$C^1_{i+1}(v) = \bigl(C^1_i(v), \{\!\{ C^1_i(w) \mid w \in \delta(v) \}\!\}\bigr).$$

Hence, two vertices with the same color in iteration $i$ get a different color in the next iteration if the number of neighbors colored with a certain color is different. Observe that it is straightforward to extend the $1$-WL to labeled, directed graphs.
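As a small illustration of this update rule, the following Python sketch runs the classical algorithm in parallel on several unlabeled graphs. The adjacency-dictionary input format and the function name `color_refinement` are assumptions made for this example only.

```python
from collections import Counter


def color_refinement(graphs):
    """1-WL on graphs given as {vertex: set_of_neighbors}; returns color histograms."""
    # Uniform initial coloring.
    color = {(g, v): 0 for g, adj in enumerate(graphs) for v in adj}
    while True:
        # New color = (old color, sorted multiset of the neighbors' colors).
        raw = {
            (g, v): (c, tuple(sorted(color[(g, w)] for w in graphs[g][v])))
            for (g, v), c in color.items()
        }
        palette = {c: i for i, c in enumerate(sorted(set(raw.values())))}
        refined = {key: palette[c] for key, c in raw.items()}
        # Stop as soon as the partition is stable.
        if len(set(refined.values())) == len(set(color.values())):
            return [Counter(c for (g, _), c in color.items() if g == i)
                    for i in range(len(graphs))]
        color = refined
```

A path on three vertices and a triangle end up with different histograms, whereas any two regular graphs of the same degree and size, such as a six-cycle and two disjoint triangles, keep the uniform coloring and are not distinguished.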

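The $\delta$-$k$-WL follows the same rationale but uses

$$C^{k,\delta,\bar\delta}_{i+1}(\mathbf{v}) = \bigl(C^{k,\delta,\bar\delta}_i(\mathbf{v}), M^{\delta,\bar\delta}_i(\mathbf{v})\bigr),$$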

where,

 (3)

instead of Equation 1 and Equation 2, respectively, where

$$1_\delta((u,w)) = \begin{cases} \mathrm{L} & \text{if } w \in \delta(u), \\ \mathrm{G} & \text{otherwise,} \end{cases}$$

for $u$ and $w$ in $V(G)$. For $i = 0$, we set $C^{k,\delta,\bar\delta}_0 = C^k_0$. We say that $\phi_j(\mathbf{v}, w)$ is a local $j$-neighbor if $w$ is in $\delta(v_j)$, and otherwise it is a global $j$-neighbor, which is indicated by L and G in Equation 3, respectively. See Figure 1 for an example. Hence, the difference between the two above algorithms is that the $k$-WL does not distinguish between local and global neighbors of a $k$-tuple. Observe that for $k = 1$, the above algorithm and the classical Weisfeiler-Leman algorithm have the same power.

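Let $A$ and $B$ denote two vertex refinement algorithms. We write $A \sqsubseteq B$ if $A$ distinguishes between all non-isomorphic pairs that $B$ does. The following result relates the two algorithms from above. Since for a graph $G$, $M^{\delta,\bar\delta}_i(\mathbf{v}) = M^{\delta,\bar\delta}_i(\mathbf{w})$ implies $M_i(\mathbf{v}) = M_i(\mathbf{w})$ for all $\mathbf{v}$ and $\mathbf{w}$ in $V(G)^k$ and $i \geq 0$, the following holds.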

###### Proposition 1

For all graphs $G$ and $k \geq 2$, the following holds:

$$\delta\text{-}k\text{-WL} \sqsubseteq k\text{-WL}.$$

## 3 Local δ-k-dimensional Weisfeiler-Leman algorithm

In this section, we define the new local $\delta$-$k$-dimensional Weisfeiler-Leman algorithm ($\delta$-$k$-LWL), which is a variant of the $\delta$-$k$-WL considering only local neighbors. That is, instead of Equation 3, it uses

$$M^\delta_i(\mathbf{v}) = \{\!\{ s_\delta(\mathbf{v}, w) \mid w \in V(G) \}\!\},$$

where

$$s_\delta(\mathbf{v}, w) = \{ (\phi_j(\mathbf{v}, w), j) \mid j \in [k] \text{ and } w \in \delta(v_j) \}.$$

Hence, the labeling function is defined by

$$C^{k,\delta}_{i+1}(\mathbf{v}) = \bigl(C^{k,\delta}_i(\mathbf{v}), M^\delta_i(\mathbf{v})\bigr).$$

Therefore, the algorithm only considers the local $j$-neighbors of the tuple $\mathbf{v}$ in each iteration. In the following, we show that the $\delta$-$k$-WL and the $\delta$-$k$-LWL have the same power. That is, we prove the following theorem.

###### Theorem 3.1

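For all connected graphs $G$ and $k \geq 2$, the following holds:

$$\delta\text{-}k\text{-LWL} \sqsubseteq \delta\text{-}k\text{-WL} \quad \text{and} \quad \delta\text{-}k\text{-WL} \sqsubseteq \delta\text{-}k\text{-LWL}.$$

Moreover, using Proposition 1, it immediately follows that the $\delta$-$k$-LWL has at least the same power as the $k$-WL.

###### Corollary 1

For all connected graphs $G$ and $k \geq 2$, the following holds:

$$\delta\text{-}k\text{-LWL} \sqsubseteq k\text{-WL}.$$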

### 3.1 Proof of Theorem 3.1

The idea of the proof is to show that both algorithms, the local and the global one, can be “simulated” on infinite, directed, labeled trees by the $1$-WL by recursively unrolling the local or global neighborhood of each $k$-tuple. We then show that two such local trees are isomorphic if and only if the corresponding global trees are isomorphic. Since the $1$-WL computes the isomorphism type for trees, the result follows. All proofs can be found in the appendix.

In order to formalize the above idea, we need to introduce some terminology. We introduce the $k$-tuple graph and the unrolling of the neighborhood around a vertex. Together, these two definitions enable us to reduce the equivalence of both algorithms to a tree isomorphism problem. The $k$-tuple graph essentially contains the set of all $k$-tuples as vertices. Two such vertices are joined by an edge if the associated $k$-tuples are neighbors. The formal definition of a $k$-tuple graph is as follows.

###### Definition 1

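Let $G$ be a graph, and let $\mathbf{s}$ and $\mathbf{t}$ be tuples in $V(G)^k$. The directed, labeled $k$-tuple graph $T$ of $G$ has a vertex $v_{\mathbf{t}}$ for each tuple $\mathbf{t}$ in $V(G)^k$ and the edge set $E_T$, where

$$(v_{\mathbf{s}}, v_{\mathbf{t}}) \in E_T \iff \mathbf{t} = \phi_j(\mathbf{s}, w), \tag{4}$$

for $j$ in $[k]$ and some $w$ in $V(G)$. The label of the edge $(v_{\mathbf{s}}, v_{\mathbf{t}})$ indicates whether $\mathbf{t}$ is a local $j$-neighbor of $\mathbf{s}$ or a global one. Finally, the map $\mathrm{ex}$ labels each edge with the exchanged vertex, i.e.,

$$\mathrm{ex}((v_{\mathbf{s}}, v_{\mathbf{t}})) = w.$$

Analogously, we define the local $k$-tuple graph that uses

$$(v_{\mathbf{s}}, v_{\mathbf{t}}) \in E_T \iff \mathbf{t} = \phi_j(\mathbf{s}, w) \text{ for } w \in \delta(s_j).$$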
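To make the construction concrete, the following Python sketch enumerates the labeled edges of the (local) tuple graph for a small graph. It is only an illustration under stated assumptions: the function name `tuple_graph_edges`, the adjacency-dictionary input, the 0-based index `j`, and the encoding of the local/global tag as the strings `"L"` and `"G"` are choices made here, not taken from the paper.

```python
from itertools import product


def tuple_graph_edges(adj, k, local_only=False):
    """List edges (s, t, j, kind, w) with t = phi_j(s, w).

    kind is "L" if w is adjacent to the j-th component of s and "G" otherwise;
    w is the exchanged vertex. With local_only=True only "L" edges are produced.
    """
    edges = []
    for s in product(sorted(adj), repeat=k):
        for j in range(k):
            candidates = adj[s[j]] if local_only else adj
            for w in sorted(candidates):
                kind = "L" if w in adj[s[j]] else "G"
                t = s[:j] + (w,) + s[j + 1:]
                edges.append((s, t, j, kind, w))
    return edges
```

For a triangle and `k = 2`, the global tuple graph has 9 · 2 · 3 = 54 labeled edges, of which 36 are local; the local tuple graph consists of exactly those 36 edges.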

The following lemma states that the $\delta$-$k$-WL can be simulated on the $k$-tuple graph using a variant of the $1$-WL.

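###### Lemma 1

Let $G$ be a graph and let $T$ be the corresponding $k$-tuple graph. Moreover, let $\mathbf{s}$ and $\mathbf{t}$ be $k$-tuples in $V(G)^k$, then there exists a variant of the $1$-WL with coloring $C^{1,*}_i$ such that

$$C^{k,\delta,\bar\delta}_i(\mathbf{s}) = C^{k,\delta,\bar\delta}_i(\mathbf{t}) \iff C^{1,*}_i(v_{\mathbf{s}}) = C^{1,*}_i(v_{\mathbf{t}}),$$

for all $i \geq 0$. The same result holds for the $\delta$-$k$-LWL and the local $k$-tuple graph.

The unrolling of a neighborhood around a vertex of a given graph to a tree is defined as follows, see Figure 2 for an illustration.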

###### Definition 2

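Let $G$ be a labeled (directed) graph and let $v$ be in $V(G)$. Then $U^i_{G,v}$, with vertex set $W_i$ and edge set $F_i$, for $i \geq 0$ denotes the unrolled tree around $v$ at depth $i$, where

$$W_i = \begin{cases} \{ v_{(0,v)} \} & \text{if } i = 0, \\ W_{i-1} \cup \{ u_{(i,w)} \mid u \in \delta(w) \text{ for } w_{(i-1,p)} \in W_{i-1} \} & \text{otherwise,} \end{cases}$$

and

$$F_i = \begin{cases} \emptyset & \text{if } i = 0, \\ F_{i-1} \cup \{ (w_{(i-1,p)}, u_{(i,w)}) \mid u \in \delta(w) \text{ for } w_{(i-1,p)} \in W_{i-1} \} & \text{otherwise.} \end{cases}$$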

The label function is defined as for in , and . Furthermore, for in and in .

In the following, we use the unrolled tree for the above defined (local) $k$-tuple graphs. For $k \geq 2$, we denote the directed, unrolled tree in the $k$-tuple graph of $G$ around the vertex $v_{\mathbf{t}}$ at depth $i$ for the tuple $\mathbf{t}$ in $V(G)^k$ by $U^i_{T,v_{\mathbf{t}}}$; the analogous local tree is denoted by $U^i_{T,v_{\mathbf{t}},L}$. Note that we write $U^i_{T,v_{\mathbf{s}}} \simeq_{v_{\mathbf{s}} \to v_{\mathbf{t}}} U^i_{T,v_{\mathbf{t}}}$ if there exists an isomorphism between the two unrolled trees that also respects the mapping $\mathrm{ex}$. The same holds for the local trees.

Finally, we need the following two results. The first one states that the $1$-WL can distinguish any two directed, labeled non-isomorphic trees.

###### Theorem 3.2 ([7, 26])

The $1$-WL distinguishes any two directed, labeled non-isomorphic trees.

Using the first result, the second one states that the (local) $\delta$-$k$-WL can be simulated by the $1$-WL on the unrolled tree of the $k$-tuple graph, and hence can be reduced to a tree isomorphism problem.

###### Lemma 2

Let $G$ be a connected graph, then the $\delta$-$k$-WL assigns the tuples $\mathbf{s}$ and $\mathbf{t}$ the same color if and only if the corresponding unrolled $k$-tuple trees are isomorphic, i.e.,

$$C^{k,\delta,\bar\delta}_i(\mathbf{s}) = C^{k,\delta,\bar\delta}_i(\mathbf{t}) \iff U^i_{T,v_{\mathbf{s}}} \simeq_{v_{\mathbf{s}} \to v_{\mathbf{t}}} U^i_{T,v_{\mathbf{t}}},$$

for all $i \geq 0$. The same holds for the $\delta$-$k$-LWL, i.e.,

$$C^{k,\delta}_i(\mathbf{s}) = C^{k,\delta}_i(\mathbf{t}) \iff U^i_{T,v_{\mathbf{s}},L} \simeq_{v_{\mathbf{s}} \to v_{\mathbf{t}}} U^i_{T,v_{\mathbf{t}},L},$$

for all $i \geq 0$.

We can now state the essential lemma for the proof of Theorem 3.1. It states that the trees of the unrolled neighborhoods of two vertices in the local $k$-tuple graph are isomorphic if and only if the same holds for the corresponding global trees.

###### Lemma 3

Let $G$ be a connected graph. Moreover, let $\mathbf{s}$ and $\mathbf{t}$ be $k$-tuples from $V(G)^k$, and let $v$ and $u$, respectively, be the corresponding vertices in the unrolled tree of the $k$-tuple graph $T$. Then for all $i \geq 0$ there exists an $l$ such that

$$U^l_{G,v,L} \simeq_{v \to u} U^l_{G,u,L} \iff U^i_{G,v} \simeq_{v \to u} U^i_{G,u}.$$

Together with Lemma 2, the above lemma directly implies Theorem 3.1.

### 3.2 Practicality

As Theorem 3.1 shows, the $\delta$-$k$-WL and the $\delta$-$k$-LWL have the same power in terms of distinguishing non-isomorphic graphs. Although for dense graphs the local algorithm will have the same running time, for sparse graphs the running time of each iteration can be upper-bounded in terms of $d$, where $d$ denotes the maximum or average degree of the graph. Hence, the local algorithm takes the sparsity of the underlying graph into account, resulting in improved computation times compared to the non-local $\delta$-$k$-WL and the $k$-WL, see Section 5. Moreover, this also allows us to employ implementations based on sparse linear algebra routines [15].
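The effect of sparsity can be made tangible by counting how many neighboring tuples one refinement round has to inspect. The sketch below is an illustration only; the helper names `neighborhood_sizes` and `path_graph` and the adjacency-dictionary format are assumptions of this example. It sums, over all $k$-tuples, the $k \cdot n$ global neighbors on the one hand and the local neighbors, whose number is the sum of the degrees of the tuple's components, on the other.

```python
from itertools import product


def neighborhood_sizes(adj, k):
    """Return (global, local) neighbor counts summed over all k-tuples."""
    n = len(adj)
    global_total = 0
    local_total = 0
    for s in product(adj, repeat=k):
        global_total += k * n                        # every vertex, every position
        local_total += sum(len(adj[u]) for u in s)   # only neighbors of s_j
    return global_total, local_total


def path_graph(n):
    """Path on vertices 0..n-1 as an adjacency dictionary."""
    return {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}
```

On a path with 50 vertices and `k = 2` this gives 250000 global versus 9800 local neighbors, whereas on the complete graph with five vertices the two counts are 250 and 200, which matches the remark that dense graphs gain little.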

## 4 Application to supervised graph classification

Supervised graph classification is an active area in the machine learning community. Standard machine learning algorithms, such as SVMs or neural networks, require vectorial input or input with a regular structure, e.g., images. Hence, the aim of graph classification approaches is to map a graph to a usually high-dimensional vector space, where standard machine learning approaches can then be applied. In order to avoid solving optimization problems in high-dimensional vector spaces, the so-called kernel trick is employed. Here a (positive-semidefinite) kernel function for a pair of graphs is computed, which can be interpreted as a similarity score between the two graphs. This function can then be fed into an SVM, e.g., see [20] for further details.

The idea of the Weisfeiler-Lehman subtree graph kernel [24] is to compute the $1$-WL for $h \geq 0$ iterations, resulting in a label function $C^1_i \colon V(G) \to \Sigma_i$ for each iteration $i$. Now after each iteration, we compute a feature vector $\phi_i(G)$ in $\mathbb{R}^{|\Sigma_i|}$ for each graph $G$. Each component $\phi_i(G)_c$ counts the number of occurrences of vertices labeled with $c$ in $\Sigma_i$. The overall feature vector $\phi_{\mathrm{WL}}(G)$ is defined as the concatenation of the feature vectors of all iterations, i.e., $\phi_{\mathrm{WL}}(G) = [\phi_0(G), \dots, \phi_h(G)]$. The Weisfeiler-Lehman subtree kernel for $h$ iterations is then computed as $k_{\mathrm{WL}}(G, H) = \langle \phi_{\mathrm{WL}}(G), \phi_{\mathrm{WL}}(H) \rangle$, where $\langle \cdot, \cdot \rangle$ denotes the standard inner product. The running time for a single feature vector computation is in $\mathcal{O}(hm)$ and $\mathcal{O}(Nhm + N^2hn)$ for the computation of the gram matrix for a set of $N$ graphs [24], where $n$ and $m$ denote the maximum number of vertices and edges over all graphs, respectively. This approach can be naturally lifted to the $k$-dimensional case, leading to more expressive kernel functions.
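The feature map and the kernel described above can be sketched as follows. This is a simplified illustration for unlabeled graphs, assuming adjacency dictionaries as input; the names `wl_features` and `wl_subtree_kernel` are hypothetical, and the feature vectors are stored sparsely as counters keyed by `(iteration, color)` rather than as dense vectors.

```python
from collections import Counter


def wl_features(graphs, h):
    """Sparse feature vectors {(iteration, color): count} for iterations 0..h."""
    color = {(g, v): 0 for g, adj in enumerate(graphs) for v in adj}
    feats = [Counter() for _ in graphs]
    for i in range(h + 1):
        # Count how often each color occurs in each graph in iteration i.
        for (g, _), c in color.items():
            feats[g][(i, c)] += 1
        # One round of 1-WL, with color names shared across all graphs.
        raw = {(g, v): (c, tuple(sorted(color[(g, w)] for w in graphs[g][v])))
               for (g, v), c in color.items()}
        palette = {c: n for n, c in enumerate(sorted(set(raw.values())))}
        color = {key: palette[c] for key, c in raw.items()}
    return feats


def wl_subtree_kernel(fa, fb):
    """Inner product of two sparse feature vectors."""
    return sum(count * fb[key] for key, count in fa.items())
```

Running the refinement jointly on all graphs keeps the color names comparable, so the concatenation over iterations reduces to using the iteration index as part of the key.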

## 5 Experimental evaluation

Our intention here is to investigate the benefits of the $\delta$-$k$-LWL kernel compared to the $\delta$-$k$-WL and the $k$-WL kernel. More precisely, we address the following questions:

• Q1: How much does the local algorithm speed up the computation time compared to the non-local algorithms?
• Q2: Does the local algorithm lead to improved classification accuracies on real-world benchmark datasets?
• Q3: Does the local algorithm prevent overfitting to the training set?

### 5.1 Datasets and graph kernels

We used the following well-known datasets to evaluate our kernels: Enzymes, IMDB-Binary, IMDB-Multi, NCI1, NCI109, PTC_FM, Proteins, and Reddit-Binary.¹ See the appendix for descriptions, statistics and properties.

We implemented the $\delta$-$k$-LWL, the $\delta$-$k$-WL, and the $k$-WL kernel for $k$ in $\{2, 3\}$. We compare our kernels to the Weisfeiler-Lehman subtree kernel [24], the graphlet kernel [25], and the shortest-path kernel [5]. All kernels were (re-)implemented in C++.²

¹ All datasets can be obtained from http://graphkernels.cs.tu-dortmund.de.

² The source code can be obtained from https://github.com/chrsmrrs/localwl.

### 5.2 Experimental protocol

For each kernel, we computed the normalized gram matrix. We computed the classification accuracies using the $C$-SVM implementation of LIBSVM [9], using 10-fold cross validation. The $C$-parameter was selected by 10-fold cross validation on the training folds.
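The text does not spell out the normalization. A common choice, assumed here purely for illustration, is cosine normalization, which divides each entry $K_{ij}$ by $\sqrt{K_{ii} K_{jj}}$ so that every graph has unit self-similarity. The function name `normalize_gram` is hypothetical, and the sketch assumes strictly positive diagonal entries.

```python
import math


def normalize_gram(K):
    """Cosine-normalize a square gram matrix given as a list of lists."""
    n = len(K)
    diag = [math.sqrt(K[i][i]) for i in range(n)]  # assumes K[i][i] > 0
    return [[K[i][j] / (diag[i] * diag[j]) for j in range(n)] for i in range(n)]
```

After this step all diagonal entries equal one and the matrix stays symmetric, which keeps graphs of very different sizes on a comparable scale before the matrix is passed to the SVM as a precomputed kernel.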

We repeated each 10-fold cross validation ten times with different random folds, and report average accuracies and standard deviations. We report computation times for the $k$-WL, the $\delta$-$k$-LWL, the $\delta$-$k$-WL, and the $1$-WL with three refinement steps. For the graphlet kernel we counted (labeled) connected subgraphs of size three. For measuring the classification accuracy, the number of iterations of the $k$-WL, the $\delta$-$k$-LWL, the $\delta$-$k$-WL, and the $1$-WL was selected using 10-fold cross validation on the training folds only.³

³ As already shown in [24], choosing the number of iterations too large will lead to overfitting.

To answer Question 3 we used a single 10-fold cross validation with the hyperparameters found in the former experiment and report average training and test accuracies. All experiments were conducted on a workstation with an Intel Xeon E5-2690v4 with 2.60 GHz and 384 GB of RAM running Ubuntu 16.04.6 LTS using a single core. Moreover, we used the GNU Compiler 5.5.0 with the flag -O2.

### 5.3 Results and discussion

• The local algorithm substantially speeds up the computation compared to the $k$-WL and the $\delta$-$k$-WL for $k = 2$ and $k = 3$. For example, on the Enzymes dataset the $\delta$-2-LWL is over times faster than the 2-WL, the same holds for the $\delta$-3-LWL. The improvement of the computation times can be observed across all datasets. For some datasets, the $k$-WL and the $\delta$-$k$-WL did not finish within the given time limit or ran out of memory.

• The local algorithm for $k = 2$ and $k = 3$ substantially improves the classification accuracy compared to the $k$-WL and the $\delta$-$k$-WL. For example, on the Enzymes dataset the $\delta$-2-LWL achieves an improvement of %, and the $\delta$-3-LWL achieves the best accuracies over all employed kernels, improving over the 3-WL and the $\delta$-3-WL by almost %.

• As Table 3 shows, the $\delta$-$k$-WL reaches slightly higher training accuracies over all datasets compared to the local algorithm, while its test accuracies are much lower. This indicates that the $\delta$-$k$-WL overfits on the training set. The higher test accuracies of the local algorithm are likely due to the smaller neighborhood, which makes the number of colors grow more slowly than in the global algorithm. Hence, the smaller neighborhood of the local algorithms acts as a graph-based regularization.

## 6 Conclusion

We introduced a variant of the $k$-dimensional Weisfeiler-Leman algorithm, the local $\delta$-$k$-dimensional algorithm, and showed that it has at least the same power as the $k$-dimensional WL. Moreover, we argued that the $\delta$-$k$-LWL takes the sparsity of the underlying graphs into account, which leads to vastly reduced computation times in practice. We evaluated our theoretical findings by applying our algorithm to (supervised) graph classification. We showed that our algorithm runs much faster than the $\delta$-$k$-WL and the $k$-WL while achieving higher classification accuracies on a wide range of benchmark datasets.

## Acknowledgement

This work has been supported by the German Science Foundation (DFG) within the Collaborative Research Center SFB 876 “Providing Information by Resource-Constrained Data Analysis”, project A6 “Resource-efficient Graph Mining”.

## References

• [1] V. Arvind, J. Köbler, G. Rattan, and O. Verbitsky. On the power of color refinement. In 20th International Symposium on Fundamentals of Computation Theory, volume 9210 of Lecture Notes in Computer Science, pages 339–350. Springer, 2015.
• [2] A. Atserias and E. N. Maneva. Sherali-Adams relaxations and indistinguishability in counting logics. SIAM Journal on Computing, 42(1):112–137, 2013.
• [3] L. Babai. Graph isomorphism in quasipolynomial time. In 48th ACM SIGACT Symposium on Theory of Computing, pages 684–697. ACM, 2016.
• [4] C. Berkholz, P. S. Bonsma, and M. Grohe. Tight lower and upper bounds for the complexity of canonical colour refinement. In 21st European Symposium on Algorithms, volume 8125 of Lecture Notes in Computer Science, pages 145–156. Springer, 2013.
• [5] K. M. Borgwardt and H.-P. Kriegel. Shortest-path kernels on graphs. In 5th IEEE International Conference on Data Mining, pages 74–81. IEEE Computer Society, 2005.
• [6] K. M. Borgwardt, C. S. Ong, S. Schönauer, S. V. N. Vishwanathan, A. J. Smola, and H.-P. Kriegel. Protein function prediction via graph kernels. Bioinformatics, 21(Supplement 1):i47–i56, 2005.
• [7] R. G. Busacker and T. L. Saaty. Finite graphs and networks: an introduction with applications. McGraw-Hill, 1965.
• [8] J. Cai, M. Fürer, and N. Immerman. An optimal lower bound on the number of variables for graph identifications. Combinatorica, 12(4):389–410, 1992.
• [9] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
• [10] A. Feragen, N. Kasenburg, J. Petersen, M. D. Bruijne, and K. M. Borgwardt. Scalable kernels for graphs with continuous attributes. In Advances in Neural Information Processing Systems, pages 216–224, 2013. Erratum available at http://image.diku.dk/aasa/papers/graphkernels_nips_erratum.pdf.
• [11] M. Grohe. Descriptive Complexity, Canonisation, and Definable Graph Structure Theory. Lecture Notes in Logic. Cambridge University Press, 2017.
• [12] M. Grohe, K. Kersting, M. Mladenov, and E. Selman. Dimension reduction via colour refinement. In 22nd European Symposium on Algorithms, volume 8737 of Lecture Notes in Computer Science, pages 505–516. Springer, 2014.
• [13] M. Grohe and M. Otto. Pebble games and linear equations. In International Workshop on Computer Science Logic, volume 16 of LIPIcs. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2012.
• [14] N. Immerman and E. Lander. Describing Graphs: A First-Order Approach to Graph Canonization, pages 59–81. 1990.
• [15] K. Kersting, M. Mladenov, R. Garnett, and M. Grohe. Power iterated color refinement. In 28th AAAI Conference on Artificial Intelligence, pages 1904–1910, 2014.
• [16] S. Kiefer and P. Schweitzer. Upper bounds on the quantifier depth for graph differentiation in first order logic. In 31st ACM/IEEE Symposium on Logic in Computer Science, pages 287–296, 2016.
• [17] J. L. López-Presa and A. Fernández Anta. Fast algorithm for graph isomorphism testing. In Experimental Algorithms, pages 221–232, 2009.
• [18] P. N. Malkin. Sherali–Adams relaxations of graph isomorphism polytopes. Discrete Optimization, 12:73–97, 2014.
• [19] B. D. McKay and A. Piperno. Practical graph isomorphism, II. Journal of Symbolic Computation, 60:94–112, 2014.
• [20] M. Mohri, A. Rostamizadeh, and A. Talwalkar. Foundations of Machine Learning. MIT Press, 2012.
• [21] C. Morris, N. M. Kriege, K. Kersting, and P. Mutzel. Faster kernel for graphs with continuous attributes via hashing. In 16th IEEE International Conference on Data Mining, pages 1095–1100, 2016.
• [22] C. Morris, M. Ritzert, M. Fey, W. L. Hamilton, J. E. Lenssen, G. Rattan, and M. Grohe. Weisfeiler and Leman go neural: Higher-order graph neural networks. In 33rd AAAI Conference on Artificial Intelligence, 2019.
• [23] C. Morris, K. Kersting, and P. Mutzel. Glocalized Weisfeiler-Lehman graph kernels: Global-local feature maps of graphs. In 17th IEEE International Conference on Data Mining, pages 327–336. IEEE Computer Society, 2017.
• [24] N. Shervashidze, P. Schweitzer, E. J. van Leeuwen, K. Mehlhorn, and K. M. Borgwardt. Weisfeiler-Lehman graph kernels. Journal of Machine Learning Research, 12:2539–2561, 2011.
• [25] N. Shervashidze, S. V. N. Vishwanathan, T. H. Petri, K. Mehlhorn, and K. M. Borgwardt. Efficient graphlet kernels for large graph comparison. In 12th International Conference on Artificial Intelligence and Statistics, pages 488–495, 2009.
• [26] G. Valiente. Algorithms on Trees and Graphs. Springer, 2002.
• [27] N. Wale, I. A. Watson, and G. Karypis. Comparison of descriptor spaces for chemical compound retrieval and classification. Knowledge and Information Systems, 14(3):347–375, 2008.
• [28] P. Yanardag and S. V. N. Vishwanathan. Deep graph kernels. In 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1365–1374, 2015.

## Appendix

In the following we outline the proofs and describe the used datasets.

## Proofs

###### Proof of Lemma 1

We show the results by induction on the number of iterations. For , the result follows by the definition of the label function . Now assume the result holds for some . Let be a tuple in and the corresponding vertex in the -tuple graph. We show how to construct from the labels of the former. The other direction follows by the same means. Let be a vertex in . We first collect the vertices in the neighborhood of that can be reached by edges labeled by the function , i.e.,

Hence, we can now construct a tuple from the above, which is equivalent to an element in the multiset corresponding to the vertex . Hence, can define the coloring .

The coloring of the needed variant of the $1$-WL is defined as

$$C^{1,*}_{i+1}(t) = \bigl(C^{1,*}_i(t), M^1_i(t)\bigr), \tag{5}$$

where the multiset

$$M^1_i(t) = \{\!\{ s^1_i(t, w) \mid w \in V(G) \}\!\}.$$

###### Proof of Lemma 2

First, by Lemma 1, we can simulate the (local) $\delta$-$k$-WL for the graph $G$ in the $k$-tuple graph by the $1$-WL. Secondly, consider a vertex $v$ in the $k$-tuple graph and a corresponding vertex in the unrolled tree around $v$. Observe that the neighborhoods of both vertices are identical. By definition, this holds for all vertices (excluding the leaves) in the unrolled tree. Hence, by Lemma 1, we can simulate the (local) $\delta$-$k$-WL for each tuple $\mathbf{t}$ by running the $1$-WL in the unrolled tree around $v_{\mathbf{t}}$ in the $k$-tuple graph. Since the $1$-WL solves the isomorphism problem for trees, cf. Theorem 3.2, the result follows. ∎

###### Proof of Lemma 3

The implication “$\Leftarrow$” follows by definition. Hence, we show the case “$\Rightarrow$”. To this end, assume that $U^l_{G,v,L} \simeq_{v \to u} U^l_{G,u,L}$ holds for some large enough $l$. The exact choice of $l$ will be determined later, see below.

Assume that there is a bijection between the two trees and that is a (labeled) tree isomorphism. We now extend this isomorphism to the map between the global trees and argue that it is again a tree isomorphism, namely an isomorphism between and .

First, observe that we can assume that maps each tuple containing a particular set of vertices to a tuple in the other local tree that always contains the same vertices throughout the local tree. Let be a vertex from , and let be a global -neighbor of for which we like to define the map . We search, starting at the vertex , for the first occurrence of a vertex in , reachable on a path, where each edge has a label of the form , that represents the same tuple as . We call such a path a -path. Since the graph is connected there must exist such a path. We denote such vertex by . We now consider the vertex . Since was reached on a -path, and the two local trees are isomorphic, there exists a vertex in the neighborhood of that represents the same tuple as the vertex . By assumption on , the trees rooted at and , respectively, are isomorphic, i.e.,

$$U^l_{G,t^*,L} \simeq_{t^* \to \bar{t}^*} U^l_{G,\bar{t}^*,L}.$$

We now set . By construction, it follows that

$$U^l_{G,t,L} \simeq_{t \to \bar{t}} U^l_{G,\bar{t},L}.$$

By applying the above procedure to every global neighbor in a top-down fashion, we eventually get the desired map for the global trees. We now set $l$ to the length of the longest path in the above construction. ∎

## Datasets

Enzymes and Proteins contain graphs representing proteins according to the graph model of [6]. Each vertex is annotated with a discrete label. The datasets are subdivided into six and two classes, respectively. Note that this is the same dataset as used in [10], which does not contain all the annotations described and used in [6].

IMDB-BINARY (IMDB-MULTI) is a movie collaboration dataset first used in [28] based on data from IMDB. Each vertex represents an actor or an actress, and there exists an edge between two vertices if the corresponding actors or actresses appear in the same movie. Each graph represents an ego network of an actor or actress. The vertices are unlabeled and the dataset is divided into two (three) classes corresponding to movie genres.

NCI1 and NCI109 are (balanced) subsets of datasets made available by the National Cancer Institute [27, 24], consisting of chemical compounds screened for activity against non-small cell lung cancer and ovarian cancer cell lines, respectively. The vertices are annotated with discrete labels.

PTC_FM is a dataset from the Predictive Toxicology Challenge (PTC) containing chemical compounds labeled according to carcinogenicity on female mice (FM). The vertices are annotated with discrete labels. It is divided into two classes.

Reddit-Binary is a social network dataset based on data from the content-aggregation website Reddit [28]. Each vertex represents a user and two vertices are connected by an edge if one user responded to the other user's comment. The vertices are unlabeled and the dataset is divided into two classes representing question-answer-based or discussion-based communities.