# The expressive power of kth-order invariant graph networks

The expressive power of graph neural network formalisms is commonly measured by their ability to distinguish graphs. For many formalisms, the k-dimensional Weisfeiler-Leman (k-WL) graph isomorphism test is used as a yardstick. In this paper we consider the expressive power of kth-order invariant (linear) graph networks (k-IGNs). It is known that k-IGNs are expressive enough to simulate k-WL. This means that for any two graphs that can be distinguished by k-WL, one can find a k-IGN which also distinguishes those graphs. The question remains whether k-IGNs can distinguish more graphs than k-WL. This was recently answered in the negative for k = 2. Here, we generalise this result to arbitrary k. In other words, we show that k-IGNs are bounded in expressive power by k-WL. This implies that k-IGNs and k-WL are equally powerful in distinguishing graphs.


## 1 Introduction

Graph neural networks (GNNs) have become a standard means to analyse graph data. One of the most widely adopted formalisms are the so-called message-passing neural networks (MPNNs) (Scarselli et al., 2009; Gilmer et al., 2017). In MPNNs, features of vertices are iteratively updated based on the features of neighbouring vertices and the current feature of the vertex itself. In their simplest form, when only the features of vertices are taken into account, the capability of MPNNs to distinguish vertices and graphs is rather limited. Indeed, Xu et al. (2019) and Morris et al. (2019) show that the expressive power of MPNNs is bounded by the 1-dimensional (Folklore) Weisfeiler-Leman (1-FWL) graph isomorphism test (Cai et al., 1992), or equivalently, the 2-dimensional Weisfeiler-Leman (2-WL) test (Grohe & Otto, 2015; Grohe, 2017). (In works related to Weisfeiler-Leman one has to carefully consider whether or not the Folklore test is used; in some papers, k-WL refers to k-FWL. For general k ≥ 2, k-WL is equivalent to (k−1)-FWL (Grohe & Otto, 2015).) That is, when two graphs cannot be distinguished by 2-WL, then neither can they be distinguished by any MPNN. The expressive power of 2-WL is well understood. For example, when two graphs cannot be distinguished by 2-WL, then they can also not be distinguished by sentences in the two-variable fragment, C², of first-order logic with counting. More relevant in the context of MPNNs is the complete characterisation of 2-WL in terms of invariant graph properties (Fürer, 2017; Arvind et al., 2020). For example, 2-WL is unable to detect cycles of length greater than four or triangles in graphs. We also like to point out connections between 2-WL and homomorphism profiles. More specifically, two graphs are indistinguishable by 2-WL if and only if they have the same number of homomorphisms from graphs of treewidth at most one (Dell et al., 2018). Finally, one can rephrase indistinguishability by 2-WL in terms of the agreement of functions defined using linear algebra operators (Geerts, 2019).

The limited expressive power of MPNNs is primarily due to the fact that vertices are anonymous, i.e., two vertices with the same feature are regarded as equivalent, and that only neighbouring vertices are considered. When, for example, MPNNs are degree-aware, meaning that they can distinguish vertices based on both their features and degrees, they get a slight jump start when compared to standard MPNNs and can potentially distinguish graphs one iteration earlier than 2-WL (Geerts et al., 2020). Notable examples of degree-aware MPNNs are the graph convolutional networks by Kipf & Welling (2017). More powerful variants of MPNNs can be obtained by incorporating port numbering, which allows MPNNs to treat features from different neighbours differently (Sato et al., 2019), by assigning random initial features (Sato et al., 2020), and by having static vertex identifiers (Loukas, 2020). We refer to Sato (2020) for a more detailed overview of these and other variations of MPNNs.

Instead of considering 2-WL or variations of standard MPNNs, this paper concerns GNNs inspired by the k-dimensional Weisfeiler-Leman (k-WL) graph isomorphism test, for k ≥ 2. These tests iteratively update features of k-tuples of vertices, based on the features of neighbouring k-tuples of vertices. It is known that the expressive power of k-WL grows with increasing k (Cai et al., 1992). As such, they provide a promising basis for the development of more expressive GNNs. Of particular interest is the ability of k-WL, for k > 2, to distinguish graphs based on the presence or absence of specific graph patterns, such as cycles and cliques. For example, 3-WL can distinguish graphs based on their number of cycles up to a certain length and their number of triangles (Fürer, 2017; Geerts, 2019; Arvind et al., 2020). Furthermore, graphs that are indistinguishable by k-WL satisfy the same sentences in C^k, the k-variable fragment of first-order logic with counting (Cai et al., 1992), and this in turn is equivalent to the two graphs having the same number of homomorphisms from graphs of treewidth at most k − 1 (Dell et al., 2018). The latter correspondence has led NT & Maehara (2020) to define GNNs based on graph homomorphism convolutions. We refer to Grohe (2020) for other interesting interpretations of k-WL and relationships to embeddings of graphs, and more generally, structured data.

Given the promise of an increase in expressive power, Morris et al. (2019) propose GNNs based on a set-variant of k-WL. We will not consider this set-variant of k-WL in this paper and only mention that the GNNs of Morris et al. (2019) match the set-variant of k-WL in expressive power. More relevant to this paper is the work by Maron et al. (2019b) in which it is shown that the class of kth-order invariant graph networks (k-IGNs) is as powerful as k-WL in expressive power, for each k ≥ 2. In other words, when two graphs can be distinguished by k-WL, then there exists a k-IGN which also distinguishes those graphs. Invariant graph networks (IGNs) are built up from equivariant linear layers defined over kth-order tensors (Kondor et al., 2018; Maron et al., 2019c). By contrast to k-WL, k-IGNs update features of k-tuples of vertices based on the features of all k-tuples, i.e., not only those that are neighbours as in k-WL. As a consequence, it is not immediately clear that k-IGNs are bounded by k-WL in expressive power. We remark, however, that in a k-IGN not all (features of) k-tuples are treated the same, due to the equivariance of its layers. More precisely, given a k-tuple ¯v of vertices, the space of all k-tuples of vertices is partitioned according to which equality and inequality conditions are satisfied together with ¯v. Then, during the feature update process of a k-IGN, two k-tuples of vertices with the same feature may be treated differently if the two k-tuples belong to different parts of the partition relative to ¯v.

Maron et al. (2019a) raise the natural question whether, despite the fact that k-IGNs use more information than k-WL, the expressive power of k-IGNs is still limited to that of k-WL. In other words, can there be graphs that can be distinguished by a k-IGN but which cannot be distinguished by k-WL? This question was recently answered by Chen et al. (2020) for k = 2. More precisely, they show that, for undirected graphs, the expressive power of 2-IGNs is indeed bounded by 2-WL. Furthermore, there is a one-to-one correspondence between the layers in a 2-IGN and the iterations of 2-WL. That is, when two graphs cannot be distinguished by 2-WL in t iterations, then neither can they be distinguished by a 2-IGN using t equivariant layers.

In this paper, we generalise this result to arbitrary k ≥ 2. More precisely, we show that the expressive power of k-IGNs is indeed bounded by k-WL. What is interesting to note is that the one-to-one correspondence between iterations of k-WL and layers in k-IGNs needs to be revisited. As it turns out, for general k, each layer of a k-IGN can be seen to correspond to k − 1 iterations of k-WL. We remark that when k = 2, the one-to-one correspondence from Chen et al. (2020) is recovered. This implies that, in principle, a k-IGN can distinguish graphs a factor of k − 1 faster than k-WL. Of course, this comes at the cost of a more intensive feature update process involving all k-tuples of vertices. Chen et al. (2020) establish their result for k = 2 in a purely combinatorial way and by means of a case analysis, which is feasible for a fixed k. For general k, we borrow ideas from Chen et al. (2020) but additionally rely on the known connection between k-WL and the logic C^k mentioned earlier. We remark that connections between logic, k-WL and GNNs have been used before to assess the logical expressiveness of MPNNs (Barceló et al., 2020).

We also remark that k-IGNs incur a large cost in memory and computation. Alternatives to k-IGNs have been put forward based on the folklore k-dimensional Weisfeiler-Leman (k-FWL) test, which is known to be more efficient to implement. For example, Maron et al. (2019b) propose provably powerful graph networks (PPGNs) that are able to simulate 2-FWL (and thus 3-WL) by using 2nd-order tensors only, but in which the layers are allowed to use tensor multiplication. For k = 2, a single matrix multiplication suffices. The impact of matrix multiplication in layers has been further investigated in Geerts (2020). In that work, inspired by the work of Lichter et al. (2019), walk MPNNs are proposed as a general formalism for GNNs based on 2-FWL. It is readily verified that walk MPNNs are bounded in expressive power by 2-FWL, and since PPGNs can be seen as instances of walk MPNNs, they are bounded in expressive power by 2-FWL as well (Geerts, 2020). This has been generalised by Azizian & Lelarge (2020), who show that kth-order PPGNs are bounded by k-FWL, for arbitrary k. We also note that allowing more than one matrix multiplication in PPGN layers does not increase their expressive power. Instead, multiple matrix multiplications may result in PPGNs that can distinguish graphs faster than 2-FWL (Geerts, 2020). In this paper, we only consider k-WL and k-IGNs.

#### Structure of the paper.

We start by describing k-WL, C^k, and k-IGNs in Section 2. Then, in Section 3 we prove that k-IGNs are bounded by k-WL in expressive power. We conclude in Section 4.

## 2 Background

We first describe k-WL and its connection to logic, followed by the definition of k-IGNs. We use {⋅} to denote sets and {{⋅}} to denote multisets. The sets of natural and real numbers are denoted by ℕ and ℝ, respectively. For n ∈ ℕ with n ≥ 1, we define [n] := {1, …, n}. A (directed) graph G = (V(G), E(G)) consists of a vertex set V(G) and an edge set E(G) ⊆ V(G) × V(G). A (vertex-)coloured graph is a graph in which every vertex v is assigned a colour χ(v) in some set C of colours. In the following, when we refer to graphs we always mean coloured graphs. Without loss of generality we assume that V(G) = [n] for some n ∈ ℕ. Furthermore, if A is a kth-order tensor in ℝ^(n^k × p), then we denote by A_{¯v,a}, with ¯v ∈ [n]^k and a ∈ [p], the value of A in entry (¯v, a), and A_{¯v,•} denotes the vector (A_{¯v,1}, …, A_{¯v,p}) in ℝ^p.

### 2.1 Weisfeiler-Leman

The k-dimensional Weisfeiler-Leman (k-WL) graph isomorphism test iteratively produces colourings of the k-tuples of vertices of a given graph G. We follow here the presentation given in Morris et al. (2019). Given G, we denote by χ^(t)_{G,k} the colouring of k-tuples generated by k-WL after t rounds. For t = 0, χ^(0)_{G,k} is a colouring in which each k-tuple is coloured with the isomorphism type of its induced subgraph. More specifically, χ^(0)_{G,k}(¯v) = χ^(0)_{G,k}(¯w) if and only if for all i ∈ [k] we have χ(v_i) = χ(w_i), and for all i, j ∈ [k] it holds that v_i = v_j if and only if w_i = w_j, and (v_i, v_j) ∈ E(G) if and only if (w_i, w_j) ∈ E(G). Then, for t > 0, we define the colouring χ^(t)_{G,k} as

$$\chi^{(t)}_{G,k}(\bar v) := \textsc{Hash}\Bigl(\chi^{(t-1)}_{G,k}(\bar v),\bigl(C^{(t)}_{1}(\bar v),\ldots,C^{(t)}_{k}(\bar v)\bigr)\Bigr),$$

in which, for i ∈ [k],

$$C^{(t)}_{i}(\bar v) := \textsc{Hash}\Bigl(\{\{\chi^{(t-1)}_{G,k}(\bar v[v_i/v']) \mid v' \in [n]\}\}\Bigr),$$

where ¯v[v_i/v'] denotes the k-tuple obtained from ¯v by replacing its ith entry v_i by v', and Hash is a hash function that maps its input in an injective manner to a colour in C.

Let χ₁ and χ₂ be colourings of the k-tuples of vertices in G. We say that χ₁ refines χ₂, denoted by χ₁ ⊑ χ₂, if for all ¯v, ¯w ∈ (V(G))^k we have that χ₁(¯v) = χ₁(¯w) implies χ₂(¯v) = χ₂(¯w). When χ₁ ⊑ χ₂ and χ₂ ⊑ χ₁ hold, we say that χ₁ and χ₂ are equivalent and we denote this by χ₁ ≡ χ₂.

We note that, by definition, χ^(t)_{G,k} ⊑ χ^(t−1)_{G,k} for all t > 0. We define χ^(∞)_{G,k} as χ^(t)_{G,k} for the smallest t for which χ^(t)_{G,k} ≡ χ^(t−1)_{G,k} holds. It is known that this “stable” colouring is obtained in at most n^k rounds. For two graphs G and H, one says that k-WL distinguishes G and H in round t if

$$\{\{\chi^{(t)}_{G,k}(\bar v) \mid \bar v \in (V(G))^k\}\} \neq \{\{\chi^{(t)}_{H,k}(\bar w) \mid \bar w \in (V(H))^k\}\}.$$

We write G ≡^t_{k-WL} H if k-WL does not distinguish G and H in round t. When G ≡^t_{k-WL} H for all t, we write G ≡_{k-WL} H and say that G and H cannot be distinguished by k-WL.
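To make the refinement concrete, the following Python sketch implements the k-WL test as just described (all names are ours; the injective Hash is realised by a dictionary that is shared across the input graphs, so the resulting colours are comparable between graphs):

```python
from itertools import product

def k_wl_colours(graphs, k, rounds):
    """Run k-WL jointly on several graphs so that colours are comparable.

    Each graph is a pair (n, adj): vertex set range(n) and a set `adj`
    of directed edge pairs.  Returns, per graph, the sorted multiset of
    k-tuple colours after `rounds` rounds.
    """
    def iso_type(adj, t):
        # Isomorphism type of the ordered subgraph induced by the k-tuple t.
        eq = tuple(t[i] == t[j] for i in range(k) for j in range(k))
        ed = tuple((t[i], t[j]) in adj for i in range(k) for j in range(k))
        return (eq, ed)

    state = []
    for n, adj in graphs:
        tuples = list(product(range(n), repeat=k))
        state.append((n, tuples, {t: iso_type(adj, t) for t in tuples}))

    for _ in range(rounds):
        table = {}  # shared injective "Hash": description -> fresh colour
        new_state = []
        for n, tuples, colour in state:
            new = {}
            for t in tuples:
                # C_i(t): multiset of colours over substitutions of position i.
                cs = tuple(
                    tuple(sorted(colour[t[:i] + (v,) + t[i + 1:]]
                                 for v in range(n)))
                    for i in range(k)
                )
                new[t] = table.setdefault((colour[t], cs), len(table))
            new_state.append((n, tuples, new))
        state = new_state

    return [sorted(colour.values()) for _, _, colour in state]

def undirected(edges):
    return {(u, v) for u, v in edges} | {(v, u) for u, v in edges}

# The 6-cycle versus two disjoint triangles is a classic pair that 2-WL
# cannot separate; the 6-vertex path, however, is distinguished.
c6   = (6, undirected([(i, (i + 1) % 6) for i in range(6)]))
tri2 = (6, undirected([(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]))
p6   = (6, undirected([(i, i + 1) for i in range(5)]))
col_c6, col_tri2, col_p6 = k_wl_colours([c6, tri2, p6], k=2, rounds=3)
assert col_c6 == col_tri2 and col_c6 != col_p6
```

Running the graphs jointly is what makes the final multisets comparable; comparing two independently computed colourings would require a globally consistent Hash.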

### 2.2 Counting logics

The k-dimensional Weisfeiler-Leman graph isomorphism test is closely tied to C^k, the k-variable fragment of first-order logic with counting, on graphs. This logic is defined over a finite set of variables, {x_1, …, x_k}, and a formula φ in C^k is formed according to the following grammar:

$$\varphi ::= x_i = x_j \;\mid\; \mathsf{Col}_c(x_i) \;\mid\; \mathsf{Edge}(x_i,x_j) \;\mid\; \neg\varphi \;\mid\; \varphi_1 \wedge \varphi_2 \;\mid\; \exists^{\geq r} x_i\,\varphi,$$

for i, j ∈ [k], colours c, and r ∈ ℕ with r ≥ 1. The first three cases in the grammar correspond to so-called atomic formulas. For a formula φ, we define its free variables free(φ) in an inductive way, i.e., free(x_i = x_j) := {x_i, x_j}, free(Col_c(x_i)) := {x_i}, free(Edge(x_i, x_j)) := {x_i, x_j}, free(¬φ) := free(φ), free(φ₁ ∧ φ₂) := free(φ₁) ∪ free(φ₂), and free(∃^{≥r} x_i φ) := free(φ) \ {x_i}. We write φ(x_{i₁}, …, x_{i_l}) to indicate that all free variables of φ are among x_{i₁}, …, x_{i_l}. A sentence is a formula without free variables. We further need the quantifier rank of a formula φ, denoted by qr(φ). It is defined as follows: qr(φ) := 0 if φ is atomic, qr(¬φ) := qr(φ), qr(φ₁ ∧ φ₂) := max{qr(φ₁), qr(φ₂)}, and qr(∃^{≥r} x_i φ) := qr(φ) + 1.

Let G be a graph and let φ be a formula in C^k. Consider an assignment ν from the variables to vertices in V(G). We denote by ν[x_i ↦ v] the assignment which is equal to ν except that x_i is mapped to v. We define the satisfaction of a formula by a graph, relative to an assignment ν, denoted by (G, ν) ⊨ φ, in an inductive manner. That is, (G, ν) ⊨ x_i = x_j if and only if ν(x_i) = ν(x_j); (G, ν) ⊨ Col_c(x_i) if and only if χ(ν(x_i)) = c; (G, ν) ⊨ Edge(x_i, x_j) if and only if (ν(x_i), ν(x_j)) ∈ E(G); (G, ν) ⊨ ¬φ if and only if not (G, ν) ⊨ φ; (G, ν) ⊨ φ₁ ∧ φ₂ if and only if (G, ν) ⊨ φ₁ and (G, ν) ⊨ φ₂; and finally, (G, ν) ⊨ ∃^{≥r} x_i φ if and only if there are at least r distinct vertices v in V(G) such that (G, ν[x_i ↦ v]) ⊨ φ holds.
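The inductive satisfaction relation translates directly into a recursive evaluator. Here is a minimal Python sketch for the grammar above (the tuple-based formula encoding and all names are ours):

```python
# Formulas as nested tuples, e.g. ('exists', r, i, phi) reads:
# "there are at least r vertices v such that phi holds with x_i := v".
def sat(G, nu, phi):
    """Evaluate a C^k formula phi on G = (n, adj, colour) under the
    assignment nu, a dict from variable index to vertex."""
    n, adj, colour = G
    op = phi[0]
    if op == 'eq':                     # x_i = x_j
        return nu[phi[1]] == nu[phi[2]]
    if op == 'col':                    # Col_c(x_i)
        return colour[nu[phi[2]]] == phi[1]
    if op == 'edge':                   # Edge(x_i, x_j)
        return (nu[phi[1]], nu[phi[2]]) in adj
    if op == 'not':
        return not sat(G, nu, phi[1])
    if op == 'and':
        return sat(G, nu, phi[1]) and sat(G, nu, phi[2])
    if op == 'exists':                 # counting quantifier
        _, r, i, psi = phi
        return sum(sat(G, {**nu, i: v}, psi) for v in range(n)) >= r
    raise ValueError(op)

# Directed path 0 -> 1 -> 2, all vertices coloured 'a'.
G = (3, {(0, 1), (1, 2)}, {0: 'a', 1: 'a', 2: 'a'})
# "x_1 has at least one out-neighbour", with x_1 assigned vertex 1 ...
assert sat(G, {1: 1}, ('exists', 1, 2, ('edge', 1, 2)))
# ... but not two:
assert not sat(G, {1: 1}, ('exists', 2, 2, ('edge', 1, 2)))
```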

When G and H satisfy the same sentences in C^k of quantifier rank at most t, we denote this by G ≡^t_{C^k} H. If G ≡^t_{C^k} H holds for all t, then we write G ≡_{C^k} H and say that G and H are indistinguishable by C^k. The connection to k-WL is as follows.

###### Theorem 1 ((Cai et al., 1992)).

Let G and H be two graphs. Then, G ≡^t_{C^k} H if and only if G ≡^t_{k-WL} H. As a consequence, G ≡_{C^k} H if and only if G ≡_{k-WL} H.∎

Of particular interest is that the proof of this theorem shows that, for every round t and every k-tuple ¯v ∈ (V(G))^k, there exists a formula φ^(t)_{¯v}(x_1, …, x_k) in C^k of quantifier rank at most t such that (H, ν_{¯w}) ⊨ φ^(t)_{¯v} if and only if χ^(t)_{H,k}(¯w) = χ^(t)_{G,k}(¯v), with ν_{¯w} defined as ν_{¯w}(x_i) := w_i for i ∈ [k].

Later in the paper we also use the shorthand notation ∃^{≥r}(x_{i₁}, …, x_{i_l}) φ to indicate that there are at least r distinct l-tuples of vertices satisfying φ. It is readily verified (I would like to acknowledge Jan Van den Bussche for pointing this out) that if φ is a formula in C^k of quantifier rank t, then ∃^{≥r}(x_{i₁}, …, x_{i_l}) φ is equivalent to a formula in C^k of quantifier rank at most t + l. Here, two formulas φ₁ and φ₂ are equivalent if (G, ν) ⊨ φ₁ if and only if (G, ν) ⊨ φ₂ for all assignments ν and graphs G. As a consequence, quantifiers of the form ∃^{≥r}(x_{i₁}, …, x_{i_l}) do not add expressive power to C^k. In what follows, for a formula φ and a k-tuple ¯v, we write (G, ¯v) ⊨ φ instead of (G, ν) ⊨ φ with ν such that ν(x_i) = v_i.

### 2.3 Invariant graph neural networks

Let S_n denote the symmetric group over [n], i.e., S_n consists of all permutations of [n]. Let σ ∈ S_n and let A be a tensor in ℝ^(n^k × p). We define σ · A such that (σ · A)_{σ(¯v),a} := A_{¯v,a} for all ¯v ∈ [n]^k and a ∈ [p], where σ(¯v) := (σ(v_1), …, σ(v_k)). A kth-order equivariant linear layer is a linear mapping L: ℝ^(n^k × p) → ℝ^(n^k × q) such that L(σ · A) = σ · L(A) for all σ ∈ S_n. When the output space is ℝ^q, and thus L(σ · A) = L(A) for all σ ∈ S_n, one refers to L as an invariant layer. An explicit description of equivariant linear layers was provided by Maron et al. (2019c) and is based on the observation that such a layer is constant on the equivalence classes of [n]^{2k} defined by equality patterns. More specifically, let ¯u and ¯u′ be l-tuples in [n]^l. Then ¯u and ¯u′ are said to have the same equality pattern, denoted by ¯u ∼ ¯u′, if for all i, j ∈ [l], u_i = u_j if and only if u′_i = u′_j. We denote the set of equivalence classes of [n]^l induced by ∼ by [n]^l/∼. Given this, an equivariant layer is of the form

$$L(A)_{\bar v,a} = \sum_{\mu \in [n]^{2k}/\sim} L_\mu(A)_{\bar v,a} + \sum_{\substack{\tau \in [n]^{k}/\sim \\ \bar v \in \tau}} c_{\tau,a}, \qquad\text{with}\qquad L_\mu(A)_{\bar v,a} = \sum_{\substack{\bar v' \in [n]^k \\ (\bar v,\bar v') \in \mu}} \Bigl(\sum_{b \in [p]} c_{\mu,a,b}\, A_{\bar v',b}\Bigr)$$

for ¯v ∈ [n]^k, a ∈ [q], and constants c_{μ,a,b}, c_{τ,a} ∈ ℝ. An equality pattern μ ∈ [n]^{2k}/∼ can be equivalently described by a partition γ = {γ_1, …, γ_s} of {1, …, 2k}, with the interpretation that a 2k-tuple ¯u belongs to the class described by γ if and only if u_i = u_j precisely when i and j belong to the same block γ_l of γ. We will use this representation of equality patterns later in the paper.
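As a sanity check on this representation, the equality classes can be enumerated by canonicalising tuples. For n ≥ 2k the classes of [n]^{2k} correspond exactly to the partitions of {1, …, 2k}, whose number is the Bell number b(2k); for k = 2 this gives b(4) = 15 basis coefficients c_{μ,a,b} per pair of channels, matching Maron et al. (2019c). A small sketch (names ours):

```python
from itertools import product

def equality_pattern(t):
    """Canonical form of the equality pattern of a tuple: map each entry
    to the index of its first occurrence, e.g. (7, 3, 7) -> (0, 1, 0)."""
    first = {}
    return tuple(first.setdefault(x, len(first)) for x in t)

# Enumerate the classes of [n]^{2k} under ~ for k = 2.  With n >= 2k,
# each class corresponds to a partition of {1, ..., 2k}.
n, k = 4, 2
patterns = {equality_pattern(u) for u in product(range(n), repeat=2 * k)}
assert len(patterns) == 15  # Bell number b(4)
```

Two 2k-tuples share a class exactly when `equality_pattern` returns the same canonical tuple, which is the partition-based description given above.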

Maron et al. (2019c) define a kth-order invariant (linear) graph network (k-IGN) as a function F: ℝ^(n^k × p_0) → ℝ that can be decomposed as

$$F = M \circ I \circ \sigma \circ L^{(T)} \circ \sigma \circ L^{(T-1)} \circ \cdots \circ \sigma \circ L^{(1)},$$

where, for s ∈ [T], each layer L^(s) is an equivariant linear layer from ℝ^(n^k × p_{s−1}) to ℝ^(n^k × p_s), σ is a pointwise non-linear activation function such as the ReLU function, I is a linear invariant layer from ℝ^(n^k × p_T) to ℝ^(p_{T+1}), and M is a multi-layer perceptron (MLP) from ℝ^(p_{T+1}) to ℝ.

We next use k-IGNs to define equivalence relations on graphs. To do so, we first turn a graph G into a tensor A_G ∈ ℝ^(n^k × p_0). More precisely, we first consider the initial colouring χ^(0)_{G,k} (recall that we identified V(G) with [n]). Then, suppose that χ^(0)_{G,k} assigns p_0 distinct colours to the k-tuples in [n]^k. We identify the ath of these colours with the ath basis vector in ℝ^(p_0) and define, for ¯v ∈ [n]^k and b ∈ [p_0], (A_G)_{¯v,b} := 1 if χ^(0)_{G,k}(¯v) is the ath colour and b = a, and (A_G)_{¯v,b} := 0 otherwise. Given this, we say that two graphs G and H are indistinguishable by a k-IGN F, denoted by G ≡_F H, if and only if F(A_G) = F(A_H). We also consider another equivalence relation defined in terms of the equivariant part of an IGN. More precisely, for t ∈ [T], let F^(t) := σ ∘ L^(t) ∘ ⋯ ∘ σ ∘ L^(1), the composition of the first t equivariant layers and activations. We let F^(0) be the identity mapping on ℝ^(n^k × p_0). We then denote by G ≡^t_F H that

$$\{\{F^{(t)}(A_G)_{\bar v,\bullet} \mid \bar v \in (V(G))^k\}\} = \{\{F^{(t)}(A_H)_{\bar w,\bullet} \mid \bar w \in (V(H))^k\}\}.$$

In other words, when viewing the tensors F^(t)(A_G) and F^(t)(A_H) as colourings of k-tuples, i.e., ¯v is assigned the “colour” F^(t)(A_G)_{¯v,•} and, similarly, ¯w is assigned the “colour” F^(t)(A_H)_{¯w,•}, then G ≡^t_F H just says that these labelings are equivalent. In the remainder of the paper we establish correspondences between k-WL and k-IGNs, and the equivalence relations ≡^t_{k-WL} and ≡^t_F.
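The construction of A_G can be sketched in Python as follows (a minimal rendition with our own names, storing the one-hot vectors in a dictionary rather than a dense tensor). Note that to compare two graphs one would assign colour indices jointly, since the identification of colours with basis vectors must agree across G and H:

```python
from itertools import product

def graph_to_tensor(n, adj, k):
    """One-hot encode the initial colouring chi^(0) of a graph: returns
    (A, p0), where A maps each k-tuple to its one-hot colour vector of
    length p0, the number of distinct isomorphism types."""
    def iso_type(t):
        # Isomorphism type of the ordered subgraph induced by t.
        eq = tuple(t[i] == t[j] for i in range(k) for j in range(k))
        ed = tuple((t[i], t[j]) in adj for i in range(k) for j in range(k))
        return (eq, ed)

    tuples = list(product(range(n), repeat=k))
    colours = {}  # isomorphism type -> basis-vector index
    for t in tuples:
        colours.setdefault(iso_type(t), len(colours))
    p0 = len(colours)
    A = {t: [1.0 if b == colours[iso_type(t)] else 0.0 for b in range(p0)]
         for t in tuples}
    return A, p0

# Directed triangle 0 -> 1 -> 2 -> 0: each ordered pair is a diagonal,
# a forward edge, or a backward edge, so chi^(0) uses p0 = 3 colours.
A, p0 = graph_to_tensor(3, {(0, 1), (1, 2), (2, 0)}, k=2)
assert p0 == 3
assert A[(0, 1)] != A[(1, 0)]  # forward vs backward edge
```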

## 3 The expressive power of k-IGNs

Let us start by recalling what is known about the relationship between the equivalence relations ≡_{k-WL} and ≡_F. For every k and any two graphs G and H, it is known that there exists a k-IGN F such that G ≢_{k-WL} H implies G ≢_F H (Maron et al., 2019b). In other words, if G and H can be distinguished by k-WL, then there is a k-IGN that distinguishes them as well. Hence, the class of k-IGNs is powerful enough to match k-WL in expressive power. The k-IGN used by Maron et al. (2019b) consists of T equivariant layers, where T is such that k-WL reaches the stable colourings χ^(∞)_{G,k} and χ^(∞)_{H,k} of G and H, respectively, in T rounds. In fact, Maron et al. (2019b) show that G ≢^t_{k-WL} H ⇒ G ≢^t_F H holds as well, for t ∈ [T], so the rounds of k-WL and the layers of F are in one-to-one correspondence. It was posed as an open problem in Maron et al. (2019a) whether or not k-IGNs can distinguish more graphs than k-WL. More specifically, the question is whether the implication G ≡_{k-WL} H ⇒ G ≡_F H also holds, and this for any k-IGN F. This question was recently answered for k = 2. Indeed, Chen et al. (2020) show that G ≡_{2-WL} H ⇒ G ≡_F H holds for any 2-IGN F. As a consequence, 2-IGNs and 2-WL have equal distinguishing power. In proving this, Chen et al. (2020) show first that, when F consists of T equivariant layers, then G ≡^t_{2-WL} H ⇒ G ≡^t_F H for each t ∈ [T]. By leveraging this, they then verify G ≡^T_{2-WL} H ⇒ G ≡_F H. Since G ≡_{2-WL} H implies G ≡^t_{2-WL} H for all t, the implication follows. We remark that Chen et al. (2020) consider undirected graphs only. We next generalise this result to arbitrary k and to directed graphs. In other words, our main result is:

###### Theorem 2.

For any two graphs G and H, G ≡_{k-WL} H ⇒ G ≡_F H for any k-IGN F.

This theorem will be proved, in analogy with the proof by Chen et al. (2020), by using Lemmas 3 and 4 below. The first lemma is the counterpart, for general k, of the implication G ≡^t_{2-WL} H ⇒ G ≡^t_F H by Chen et al. (2020). We see, however, that the correspondence between rounds of k-WL and layers in k-IGNs is slightly more involved.

###### Lemma 3.

Let F be a k-IGN consisting of T equivariant layers and consider graphs G and H. Then for any t,

$$G \equiv^{t}_{k\text{-WL}} H \;\Rightarrow\; G \equiv^{\lfloor t/(k-1) \rfloor}_{F} H. \tag{†}$$

Note that when k = 2, ⌊t/(k−1)⌋ = t, and hence the known implication for 2-IGNs from Chen et al. (2020) is recovered. Since F consists of T layers, we limit t to be in the range such that ⌊t/(k−1)⌋ ≤ T. As part of the proof of Lemma 3 we show a stronger implication. More precisely, we show that if χ^(t)_{G,k}(¯v) = χ^(t)_{H,k}(¯w) holds, then F^(⌊t/(k−1)⌋)(A_G)_{¯v,•} = F^(⌊t/(k−1)⌋)(A_H)_{¯w,•} for any ¯v ∈ (V(G))^k and ¯w ∈ (V(H))^k. We use this property in the next lemma.

###### Lemma 4.

Let F be a k-IGN consisting of T equivariant layers and consider graphs G and H. Let t := (k−1)T and assume that the following implication holds for all ¯v ∈ (V(G))^k and ¯w ∈ (V(H))^k: if χ^(t)_{G,k}(¯v) = χ^(t)_{H,k}(¯w), then F^(T)(A_G)_{¯v,•} = F^(T)(A_H)_{¯w,•}. Then

$$G \equiv^{t}_{k\text{-WL}} H \;\Rightarrow\; G \equiv_{F} H.$$

These two lemmas suffice to prove Theorem 2:

###### Proof.

Indeed, suppose that G ≡_{k-WL} H holds. By definition, this implies G ≡^t_{k-WL} H for all t. In particular, this holds for t = (k−1)T, where T is the number of equivariant layers of F. As mentioned above, as part of proving Lemma 3 we obtain, for ¯v ∈ (V(G))^k and ¯w ∈ (V(H))^k, the implication χ^(t)_{G,k}(¯v) = χ^(t)_{H,k}(¯w) ⇒ F^(T)(A_G)_{¯v,•} = F^(T)(A_H)_{¯w,•}. Then, Lemma 4 implies G ≡_F H, as desired. ∎

Before showing the lemmas, we provide some intuition behind the implication (†) in Lemma 3. In a nutshell, it reflects that a single (equivariant) layer of a k-IGN corresponds to k − 1 rounds of k-WL. This is because k-IGNs propagate information to a k-tuple from all other k-tuples, whereas k-WL only propagates information from neighbouring k-tuples.

To see this, consider k = 3 and let ¯v = (v_1, v_2, v_3) be a triple in (V(G))³. When a 3-IGN applies a layer L, the vector L(A)_{¯v,•} is computed based on all vectors A_{¯w,•} for ¯w ∈ (V(G))³. For example, L(A)_{¯v,•} depends on A_{¯w,•} with ¯w = (w_1, w_2, v_3), with w_1 and w_2 being different from v_1, v_2 and v_3. By contrast, in round t, 3-WL updates the label of ¯v only based on the labels, computed in round t − 1, of triples of the form (v′, v_2, v_3), (v_1, v′, v_3) and (v_1, v_2, v′) for v′ ∈ V(G). We observe that the triple ¯w is not included here and hence the label of ¯v is not updated in round t based on the label, computed in round t − 1, of ¯w. We note, however, that in round t, 3-WL also updates the label of the triple (w_1, v_2, v_3) based on the label, computed in round t − 1, of ¯w, as (w_1, v_2, v_3) is one of the neighbours of ¯w. As a consequence, in round t + 1, 3-WL will update the label of ¯v based on the label, computed in round t, of (w_1, v_2, v_3). The latter now depends on the label, computed in round t − 1, of ¯w. Hence, only in round t + 1 does the label of ¯v include information about the label, computed in round t − 1, of ¯w. By contrast, as we have seen earlier, a 3-IGN immediately takes into account the information from ¯w. We thus see that 3-WL needs two rounds to match a single application of an equivariant layer in a 3-IGN. In other words, t rounds of 3-WL correspond to ⌊t/2⌋ applications of equivariant layers in a 3-IGN. This holds more generally for any k.

Furthermore, it is thanks to the invariance and equivariance of the layers in k-IGNs that the information propagation happens in a controlled way. More specifically, a k-IGN propagates information from k-tuples with the same equality pattern in the same way. As we will see shortly, this is crucial for showing Lemmas 3 and 4.

### 3.1 Proof of Lemma 3

We show (†) by induction on t. The proof strategy is similar to the one used by Chen et al. (2020), except that we rely on a more general key lemma in the inductive step. As mentioned earlier, we will show a stronger induction hypothesis. More specifically, we show that for any t and k-tuples ¯v ∈ (V(G))^k and ¯w ∈ (V(H))^k, if χ^(t)_{G,k}(¯v) = χ^(t)_{H,k}(¯w), then

$$F^{(\lfloor t/(k-1)\rfloor)}(A_G)_{\bar v,\bullet} = F^{(\lfloor t/(k-1)\rfloor)}(A_H)_{\bar w,\bullet}. \tag{‡}$$

It is an easy observation that the implication (‡) implies (†). Indeed, suppose that G ≡^t_{k-WL} H holds. By definition, this is equivalent to

$$\{\{\chi^{(t)}_{G,k}(\bar v) \mid \bar v \in (V(G))^k\}\} = \{\{\chi^{(t)}_{H,k}(\bar w) \mid \bar w \in (V(H))^k\}\}.$$

In other words, with every ¯v ∈ (V(G))^k one can associate a corresponding ¯w ∈ (V(H))^k such that χ^(t)_{G,k}(¯v) = χ^(t)_{H,k}(¯w). Then, (‡) implies F^(⌊t/(k−1)⌋)(A_G)_{¯v,•} = F^(⌊t/(k−1)⌋)(A_H)_{¯w,•}. Since this holds for any ¯v and its corresponding ¯w, we have

$$\{\{F^{(\lfloor t/(k-1)\rfloor)}(A_G)_{\bar v,\bullet} \mid \bar v \in (V(G))^k\}\} = \{\{F^{(\lfloor t/(k-1)\rfloor)}(A_H)_{\bar w,\bullet} \mid \bar w \in (V(H))^k\}\},$$

that is, G ≡^(⌊t/(k−1)⌋)_F H.