One of the most popular methods for deep learning on graphs is the class of message passing neural networks (MPNNs) introduced by Gilmer et al. (2017). An MPNN iteratively propagates vertex features based on the adjacency structure of a graph in a number of rounds. In each round, every vertex receives messages from its neighbouring vertices, based on the features computed in the previous round. Then, each vertex aggregates the received messages and performs an additional update based on the feature of the vertex itself. As such, new features are obtained for every vertex and the MPNN proceeds to the next round. When the features consist of tuples of real numbers, an MPNN can be regarded as a means of computing an embedding of the vertices of a graph into a real vector space. An MPNN can also include an additional read-out phase in which the embedded vertices are combined to form a single representation of the entire graph. Important questions in this context relate to the expressive power of MPNNs, such as: “When can two vertices be distinguished by means of the computed embedding?” and “When can two graphs be distinguished?”.
In two independent works (Morris et al., 2019; Xu et al., 2019) such expressivity questions were addressed by connecting MPNNs to the one-dimensional Weisfeiler-Leman (1-WL) graph isomorphism test. Like MPNNs, 1-WL iteratively updates vertex features based on the graph’s adjacency structure. Morris et al. (2019) and Xu et al. (2019) show that MPNNs cannot distinguish more vertices by means of the computed embeddings than 1-WL does. In other words, the expressive power of MPNNs is bounded by that of 1-WL.
Furthermore, Morris et al. (2019) identify a simple class of MPNNs that is as expressive as 1-WL. In other words, for every graph there exists an MPNN in that class whose distinguishing power matches that of 1-WL. Similarly, by applying MPNNs on the direct sum of two graphs, these MPNNs can only distinguish the component graphs when 1-WL can distinguish them. In Geerts et al. (2020), similar results were established for an even simpler class of MPNNs and generalised to MPNNs that can use degree information (such as the graph convolutional networks by Kipf and Welling (2017)). There is a close correspondence between 1-WL and logic. More precisely, two graphs are indistinguishable by 1-WL if and only if no sentence in the two-variable fragment of first-order logic with counting can distinguish those graphs. A more refined analysis of MPNNs based on this connection to logic can be found in Barceló et al. (2020). The impact of random features on the expressive power of MPNNs is considered in Sato et al. (2020).
Xu et al. (2019) propose another way of letting MPNNs match the expressive power of 1-WL. More specifically, they propose so-called graph isomorphism networks (GINs) and show that GINs can distinguish any two graphs (in some collection of graphs) whenever 1-WL does so. GINs crucially rely on the use of multilayer perceptrons (MLPs) and their universality (Cybenko, 1989; Hornik, 1991). To leverage this universality, the collection of graphs should have bounded degree and all features combined should originate from a finite set.
Since 1-WL fails to distinguish even very simple graphs, the above results imply that MPNNs have limited expressive power. To overcome this limitation, higher-dimensional Weisfeiler-Leman graph isomorphism tests have recently been considered as inspiration for constructing graph embeddings. For a given dimension $k$, the $k$-WL test iteratively propagates features for $k$-tuples of vertices and again relies on the adjacency structure of the graph (Grohe and Otto, 2015; Grohe, 2017). (What we refer to as $k$-WL is sometimes referred to as the “folklore” $k$-dimensional Weisfeiler-Leman test.) From a logic perspective, two graphs are indistinguishable by $k$-WL if and only if they are indistinguishable by sentences in the $(k+1)$-variable fragment of first-order logic with counting, and the expressive power of $k$-WL is known to increase with increasing $k$ (Cai et al., 1992).
The focus of this paper is on 2-WL. By using a graph product construction, MPNNs can be used to match the distinguishing power of 2-WL (Morris et al., 2019). The vertices on which the MPNN acts are now triples of vertices and a notion of adjacency between such triples is considered. (To be more precise: a set-based version was considered in Morris et al. (2019) where “vertices” correspond to a set of three vertices, and two such sets $s$ and $t$ are adjacent if and only if $|s \cap t| = 2$.) A disadvantage of this approach is that one has to deal with $O(n^3)$ many embeddings, where $n$ is the number of vertices. On the positive side, the dimension of the features remains small. More closely in spirit to GINs, Maron et al. (2019c) introduced higher-order (linear) invariant graph neural networks (GNNs) that use third-order tensors and MLPs to simulate 2-WL (Maron et al., 2019b). Also here, $O(n^3)$ many embeddings are used. It is not known whether third-order GNNs are also bounded in expressive power by 2-WL. (We remark that it has recently been shown in Chen et al. (2020) that second-order linear GNNs are bounded in expressive power by 1-WL on undirected graphs.) We remark that the constructions provided in Morris et al. (2019) and Maron et al. (2019b) generalise to $k$-WL by using multiple graph products and higher-order tensors, respectively. A more detailed overview of these approaches and results can be found in the recent survey by Sato (2020).
Perhaps the most promising approach related to 2-WL is the one presented in Maron et al. (2019b). In that paper, simple second-order invariant GNNs are introduced, using second-order tensors and MLPs, which can simulate 2-WL. A crucial ingredient in these networks is that the layers are non-linear. More specifically, the non-linearity stems from the use of a single matrix multiplication in each layer. This approach only requires $O(n^2)$ many embeddings, where $n$ is the number of vertices, making it more applicable than previous approaches. The downside is that the dimension of features needed increases in each round. In this paper we zoom in on those second-order non-linear GNNs and aim to provide some deeper insights. The contributions made in this paper can be summarised as follows.
We first introduce $\ell$-walk MPNNs in order to model second-order non-linear invariant GNNs. Walk MPNNs operate on pairs of vertices and can aggregate feature information along walks of a certain length in graphs. We show that $\ell$-walk MPNNs are bounded in expressive power by the $\ell$-walk refinement procedure recently introduced by Lichter et al. (2019). Furthermore, we show that $\ell$-walk MPNNs match the expressive power of $\ell$-walk refinement.
We verify that second-order non-linear invariant GNNs are instances of 2-walk MPNNs. A direct consequence is that their expressive power is bounded by 2-walk refinement, which is known to correspond to 2-WL (Lichter et al., 2019). Intuitively, walks of length two correspond to the use of a single matrix multiplication in GNNs. (We recall that for an adjacency matrix $A$ of a graph, the entries in $A^k$ correspond to the number of walks of length $k$ between pairs of vertices.) We recall from Maron et al. (2019b) that second-order non-linear invariant GNNs are also as expressive as 2-WL.
We generalise second-order non-linear invariant GNNs by allowing $\ell - 1$ matrix multiplications in each layer, for $\ell \geq 2$, and verify that these networks can be seen as instances of $\ell$-walk MPNNs. They are thus bounded in expressive power by $\ell$-walk refinement. We generalise the construction given in Maron et al. (2019b) and show that they also match $\ell$-walk refinement in expressive power.
Based on the properties of $\ell$-walk refinement and 2-WL reported in Lichter et al. (2019), we observe that allowing for multiple matrix multiplications does not increase the expressive power of second-order GNNs, but vertices and graphs can potentially be distinguished faster (in a smaller number of rounds) than when using only a single matrix multiplication.
In order to reduce the feature dimensions needed we consider the setting in which the features are taken from a countable domain, just as in Xu et al. (2019). In this setting, we observe that a constant feature dimension suffices to model 2-WL and $\ell$-walk refinement. We recall that when the features are taken from the reals, the second-order GNNs mentioned earlier require increasing feature dimensions in each round, just as in Maron et al. (2019b). We obtain learnable architectures, similar to GINs, matching $\ell$-walk refinement in expressive power.
Finally, we show that the results in Morris et al. (2019) can be generalised by using non-linearity. As a consequence, we obtain a simple form of -walk MPNNs that can simulate (and thus also ) on a given graph using only many embeddings. We recall that the higher-order graph neural networks in Morris et al. (2019) require many embeddings. Furthermore, we preserve the nice property that the dimension of the features is of size .
Our results can be seen as a partial answer to the question raised by Maron et al. (2019a), whether polynomial layers (of degree greater than two) increase the expressive power of second-order invariant GNNs. We answer this negatively in the restricted setting in which each layer consists of multiple matrix multiplications rather than general equivariant polynomial layers. Indeed, the use of multiple matrix multiplications can be simulated by a single matrix multiplication at the cost of introducing additional layers.
For readers familiar with GNNs we summarise the proposed architectures in Table 1 and refer for details to Section 6. All architectures generalise to match in expressive power. We note that the last architecture in Table 1 is the one proposed by Maron et al. (2019b).
Organisation of the paper.
We start by introducing notation and describing the 2-dimensional Weisfeiler-Leman (2-WL) graph isomorphism test and the $\ell$-walk refinement procedure in Section 2. To model $\ell$-walk refinement as a kind of MPNN we introduce $\ell$-walk MPNNs in Section 3. In Section 4 we verify that $\ell$-walk MPNNs are bounded in expressive power by $\ell$-walk refinement. Matching lower bounds on the expressive power of $\ell$-walk MPNNs are provided in Section 5, both in the case when labels originate from a countable domain and when they come from an uncountable domain. The obtained insights are used in Section 6 to build learnable graph neural networks that match $\ell$-walk refinement (and 2-WL in particular) in expressive power. We conclude the paper in Section 7.
We use and to indicate sets and multisets, respectively. The sets of natural, rational, and real numbers are denoted by , , and , respectively. We write to denote the subset of numbers from which are strictly positive, e.g., . For , we denote with the set of numbers .
A labelled directed graph is given by with vertex set , edge relation , and where is an edge labelling function into some set of labels. Without loss of generality we identify with . For , a walk in from vertex to vertex of length is a sequence of vertices such that each consecutive pair of vertices is an edge in . For we denote by the set of walks of length in starting in and ending at .
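To make the notion of a walk concrete, the following Python sketch enumerates all walks of a given length between two vertices and compares their number with the corresponding entry of a power of the adjacency matrix. The encoding of a graph as a 0/1 adjacency matrix over vertices 0, ..., n-1 and the function names `walks` and `matrix_power` are illustrative assumptions, not notation from this paper.

```python
from itertools import product


def walks(adj, v, w, length):
    """Return all walks of the given length (>= 1) from v to w.

    adj is an n x n 0/1 adjacency matrix given as nested lists. A walk
    of length k is a sequence of k + 1 vertices in which every
    consecutive pair is an edge; vertices may repeat.
    """
    n = len(adj)
    found = []
    for inner in product(range(n), repeat=length - 1):
        seq = (v, *inner, w)
        if all(adj[a][b] for a, b in zip(seq, seq[1:])):
            found.append(seq)
    return found


def matrix_power(adj, k):
    """Compute the k-th power of adj (k >= 1) by repeated multiplication."""
    n = len(adj)
    result = [row[:] for row in adj]
    for _ in range(k - 1):
        result = [
            [sum(result[i][u] * adj[u][j] for u in range(n)) for j in range(n)]
            for i in range(n)
        ]
    return result


# Undirected triangle on vertices 0, 1, 2.
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
```

On the triangle, `walks(triangle, 0, 0, 2)` returns the two closed walks through vertex 1 and vertex 2, which agrees with the diagonal entry of the squared adjacency matrix.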
We opt to work with edge-labelled graphs rather than the more standard vertex-labelled graphs. This does not impose any restriction since we can always turn a vertex-labelled graph into an edge-labelled graph. More specifically, given a vertex-labelled graph with one can define the corresponding edge-labelling by , and then simply consider instead of . ∎
Refinements of labellings.
We will need to be able to compare two edge labellings and we do this as follows. Given two labellings and we say that refines , denoted by , if for every and , implies that . If and hold, then and are said to be equivalent, and we denote this by .
We next describe two procedures which iteratively generate refinements of edge labellings. First, we consider the 2-dimensional Weisfeiler-Leman () procedure. This procedure iteratively generates edge labellings, starting from an initial labelling , until no further changes to the edge labelling is made. The labelling produced in round is denoted by . Since generates labellings for all pairs of vertices, it is commonly assumed that the input graph is a complete graph, i.e., . We remark that an incomplete graph can always be regarded as a complete graph in which the (extended) edge labelling assigns a special label to non-edges, i.e., those pairs in .
Let be a (complete) labelled graph. Then the initial labelling produced by is defined as . For and we define:
where Hash injectively maps with and a multiset of pairs of labels in to a unique label in . It is known that , for all , and thus the procedure indeed generates refinements of labellings. We denote by the labelling such that . It is known that is reached using at most rounds, where (Lichter et al., 2019).
One can simplify by assuming that the initial labelling assigns different labels to loops (i.e., pairs of the form for ) than it does to other edges. In other words, when for every such that , holds. Under this assumption, one can equivalently consider:
In the following, we always assume that treats loops differently from non-loops. One can always ensure this by modifying the labels of a given edge labelling.
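The iteration described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the formulation used in the paper: the input is an undirected graph on vertices 0, ..., n-1, the initial labelling uses the hypothetical labels 0 (loop), 1 (edge) and 2 (non-edge), and the injective Hash function is replaced by renaming distinct signatures to distinct integers.

```python
from collections import Counter
from itertools import product


def initial_labelling(n, edges):
    """Label loops 0, edges 1 and non-edges 2 on the complete graph."""
    sym = set(edges) | {(w, v) for v, w in edges}
    return {
        (v, w): 0 if v == w else (1 if (v, w) in sym else 2)
        for v, w in product(range(n), repeat=2)
    }


def two_wl_round(n, lab):
    """One round: combine the label of (v, w) with the multiset of
    label pairs (lab[v, u], lab[u, w]) taken over all vertices u."""
    sigs = {}
    for v, w in product(range(n), repeat=2):
        pairs = Counter((lab[v, u], lab[u, w]) for u in range(n))
        sigs[v, w] = (lab[v, w], tuple(sorted(pairs.items())))
    # Renaming distinct signatures to distinct integers stands in for
    # the injective Hash function.
    palette = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
    return {pair: palette[s] for pair, s in sigs.items()}


def two_wl(n, lab):
    """Iterate until the partition of vertex pairs stops changing.

    Returns the stable labelling and the number of rounds needed.
    """
    rounds = 0
    while True:
        new = two_wl_round(n, lab)
        # Each round refines the previous partition, so an unchanged
        # number of classes means the partition is stable.
        if len(set(new.values())) == len(set(lab.values())):
            return lab, rounds
        lab, rounds = new, rounds + 1
```

For the path 0 - 1 - 2, calling `two_wl(3, initial_labelling(3, [(0, 1), (1, 2)]))` separates the loop at the middle vertex from the loops at the two endpoints, while the two endpoint loops keep the same label.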
To make invariant under graph isomorphisms one additionally requires that the initial edge-labelling respects transpose equivalence, i.e., for any , implies that . In the following we always assume that this assumption holds. One can again ensure this by applying an appropriate modification to a given edge labelling. We also note that this assumption is satisfied when the edge labelling originates from a vertex labelling, as explained in Remark 2.1.
The second procedure which we consider is the -walk refinement procedure (), recently introduced by Lichter et al. (2019). Similar to , it iteratively generates labellings. The labelling produced by in round is denoted by . The initial labelling is defined as , just as for . For and we define:
where Hash now injectively maps multisets of pairs of labels in to a unique label in .
We observe that . Furthermore, for every , and thus also generates refinements of labellings. We define as the labelling such that . We further recall from Lichter et al. (2019) that for and that for all . In particular, .
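As an illustration of the walk refinement procedure and of the relationship between different walk lengths, the sketch below implements one possible reading of it in Python. It assumes an undirected input graph, hypothetical initial labels 0 (loop), 1 (edge) and 2 (non-edge), a walk length parameter `ell`, and integer renaming in place of the injective Hash function; the names are not taken from the paper or from Lichter et al. (2019).

```python
from collections import Counter
from itertools import product


def initial_labelling(n, edges):
    """Label loops 0, edges 1 and non-edges 2 on the complete graph."""
    sym = set(edges) | {(w, v) for v, w in edges}
    return {
        (v, w): 0 if v == w else (1 if (v, w) in sym else 2)
        for v, w in product(range(n), repeat=2)
    }


def walk_refinement_round(n, lab, ell):
    """Relabel (v, w) by the multiset of label sequences read along
    all vertex sequences v, u_1, ..., u_{ell-1}, w."""
    sigs = {}
    for v, w in product(range(n), repeat=2):
        seqs = Counter(
            tuple(lab[a, b] for a, b in zip((v, *inner), (*inner, w)))
            for inner in product(range(n), repeat=ell - 1)
        )
        sigs[v, w] = tuple(sorted(seqs.items()))
    palette = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
    return {pair: palette[s] for pair, s in sigs.items()}


def walk_refinement(n, lab, ell):
    """Iterate until stable; return the labelling and the round count.

    The stopping test relies on loops carrying their own labels, so
    that every round refines the previous partition.
    """
    rounds = 0
    while True:
        new = walk_refinement_round(n, lab, ell)
        if len(set(new.values())) == len(set(lab.values())):
            return lab, rounds
        lab, rounds = new, rounds + 1


def partition(lab):
    """Return the partition of vertex pairs induced by a labelling."""
    classes = {}
    for pair, label in lab.items():
        classes.setdefault(label, set()).add(pair)
    return {frozenset(c) for c in classes.values()}
```

Running `walk_refinement` with `ell=2` and with `ell=4` on the same graph should yield the same stable partition, with the longer walks needing no more rounds than the shorter ones, which matches the properties recalled from Lichter et al. (2019).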
We thus see that both procedures generate the same labelling after a (possibly different) number of rounds. The labellings obtained by the two procedures may be different, however, in each round, except for , as is illustrated in Lichter et al. (2019). Furthermore, if is reached in rounds by the procedure, then it is reached in rounds by the procedure. ∎
Labellings and matrices.
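Given a tensor $X \in \mathbb{R}^{n \times n \times d}$ we denote by $X_{ijk} \in \mathbb{R}$ its entry at position $(i,j) \in [n]^2$ and $k \in [d]$, by $X_{ij\bullet} \in \mathbb{R}^{d}$ the vector at position $(i,j)$, and by $X_{\bullet\bullet k} \in \mathbb{R}^{n \times n}$ the matrix at position $k$. Similar notions are in place for matrices and higher-order tensors. A tensor $X$ naturally corresponds to an edge labelling $\eta$ by letting $\eta(i,j) := X_{ij\bullet}$ for $i,j \in [n]$. Conversely, when given an edge labelling $\eta$, we assume that we can encode its labels as vectors in some $\mathbb{R}^{d}$. A common way to do this is by one-hot encoding the labels by basis vectors in $\mathbb{R}^{d}$ for some $d$. In this way, $\eta$ can be regarded as a tensor in $\mathbb{R}^{n \times n \times d}$. We interchangeably consider edge labels and edge labellings as vectors and tensors, respectively.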
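The correspondence between edge labellings and tensors can be sketched as follows. The function names and the use of nested Python lists for tensors are assumptions made for illustration; vertices are taken to be 0, ..., n-1.

```python
def one_hot_tensor(n, labelling):
    """Encode an edge labelling as an n x n x d tensor (nested lists).

    labelling maps each pair (i, j) to a label; d is the number of
    distinct labels, and each label is sent to a basis vector.
    """
    labels = sorted(set(labelling.values()))
    index = {label: k for k, label in enumerate(labels)}
    d = len(labels)
    tensor = [[[0.0] * d for _ in range(n)] for _ in range(n)]
    for (i, j), label in labelling.items():
        tensor[i][j][index[label]] = 1.0
    return tensor, labels


def tensor_to_labelling(tensor):
    """Read a tensor as an edge labelling: pair (i, j) gets the vector
    stored at position (i, j), as a hashable tuple."""
    n = len(tensor)
    return {(i, j): tuple(tensor[i][j]) for i in range(n) for j in range(n)}
```

Two pairs receive the same vector exactly when they carried the same label, so the partition of vertex pairs is preserved by the encoding.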
3 Walk Message Passing Neural Networks
We start by extending MPNNs such that they can easily model the walk-refinement procedure described above. This generalisation of MPNNs is such that message passing occurs between pairs of vertices and is restricted by walks in graphs, rather than between single vertices and their adjacent vertices as in standard MPNNs (Gilmer et al., 2017). We will refer to this generalisation as walk MPNNs.
Walk MPNNs iteratively compute edge labellings starting from an input labelled graph . We refer to each iteration as a round. Walk MPNNs are parametrised by a number , with , which bounds the length of walks considered, and we refer to them as -walk MPNNs. We assume that the edge labelling of the input graph is of the form for some . In what follows we fix the number of vertices to be .
After round , the labelling returned by an -walk MPNN is denoted by and is of the form , for some . We omit the dependency on the input graph in the labellings unless specified otherwise. We next detail how is computed.
We let .
Then, for every round we define , as follows:
- Message Passing.
Each pair receives messages from ordered sequences of edges on walks in of length starting in and ending at . These messages are subsequently aggregated. Formally, if is a walk of length in then the function receives the labels (computed in the previous round) , of the edges in this walk, and outputs a label in , for some . Then, for every pair we aggregate by summing all the received labels:
Each pair further updates based on its current label :
Here, the message functions and update functions are arbitrary functions. When a walk MPNN only iterates for a finite number of rounds , we define the final labelling with returned by on , as for every . If further aggregation over the entire graph is needed, e.g., for graph classification, an additional readout function can be applied. We ignore the read-out function in this paper as most of the computation happens by means of the message and update functions. We do comment on read-out functions in Remark 6.4 in Section 6.
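A single round of a walk MPNN, as described above, can be sketched in Python. The sketch assumes a complete input graph on vertices 0, ..., n-1 (so that every vertex sequence is a walk), labels given as lists of floats, and caller-supplied message and update functions; `walk_mpnn_round`, `product_message` and `keep_aggregate` are hypothetical names.

```python
from itertools import product


def walk_mpnn_round(n, lab, ell, message, update):
    """Compute the next labelling from lab.

    lab maps each pair (v, w) to a list of floats. For every pair, the
    messages of all vertex sequences v, u_1, ..., u_{ell-1}, w are
    summed componentwise, and the update function combines the sum
    with the pair's current label.
    """
    new = {}
    for v, w in product(range(n), repeat=2):
        total = None
        for inner in product(range(n), repeat=ell - 1):
            seq = (v, *inner, w)
            msg = message(*[lab[a, b] for a, b in zip(seq, seq[1:])])
            total = msg if total is None else [x + y for x, y in zip(total, msg)]
        new[v, w] = update(lab[v, w], total)
    return new


def product_message(first, second):
    """Example message for ell = 2: product of the two edge labels."""
    return [first[0] * second[0]]


def keep_aggregate(current, aggregated):
    """Example update: ignore the current label, keep the sum."""
    return aggregated
```

With one-dimensional labels holding the entries of an adjacency matrix, `walk_mpnn_round(n, lab, 2, product_message, keep_aggregate)` computes the entries of the squared matrix, which illustrates why walks of length two correspond to a single matrix multiplication.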
4 Upper bound on the expressive power of walk MPNNs
We start by showing that the expressive power of -walk MPNNs is bounded by the expressive power of just as MPNNs are bounded in expressive power by . The proof of the following proposition is a straightforward modification of the proofs given in Xu et al. (2019) and Morris et al. (2019).
For any -walk MPNN , any graph , and every , .
Let be an -walk MPNN. We verify the proposition by induction on the number of rounds . Clearly, when , , so we can focus on . Suppose that holds. We need to show that holds as well.
Let be vertices for which is satisfied. By definition of this implies that
or in other words, there exists a bijection such that for every in ,
with . By induction, this also implies that for every there are unique such that
holds. This in turn implies that for every there are unique such that
is satisfied. As a consequence, since these are defined by summing up the messages over all and , respectively. We also note that if holds, then (Lichter et al., 2019). Hence also holds by induction. We may thus conclude that
holds, as desired. ∎
As already mentioned in the preliminaries, and for all . We may thus also infer the following.
For every -walk MPNN , any graph , and , .∎
We may thus conclude that for $\ell > 2$, $\ell$-walk MPNNs are limited in their distinguishing power by 2-WL, but they may reach the final labelling faster than 2-walk MPNNs. This comes at the cost, however, of a computationally more intensive message passing phase. We next show that $\ell$-walk MPNNs can also simulate $\ell$-walk refinement, from which we can infer that $\ell$-walk MPNNs match $\ell$-walk refinement in their expressive power.
5 Lower bound on the expressive power of -walk MPNNs
We next show how to simulate by means of -walk MPNNs. In particular, we show that they can simulate on all graphs of a fixed size (). We provide two simulations, one for when the labels come from a countable domain, and one for when the labels come from an uncountable domain, such as for some .
The challenge is to simulate the hash function used in $\ell$-walk refinement by means of message and update functions, thereby taking into consideration that $\ell$-walk MPNNs always perform a sum aggregation over the received messages. (One could also extend the aggregate/combine formalisms used in Xu et al. (2019) and Morris et al. (2019), in which messages are combined by an arbitrary aggregate function rather than by a sum. To simulate $\ell$-walk refinement, it then suffices to take as aggregate function the hash function used in $\ell$-walk refinement. As mentioned already, walk MPNNs only allow sum aggregation.) For the countable case we generalise the technique underlying GINs (Xu et al., 2019); for the uncountable case we use multi-symmetric polynomials underlying higher-order graph neural networks (Maron et al., 2019b).
5.1 Simulating : Countable case
We first consider the setting in which the labels of graphs come from some countable domain. Without loss of generality we assume that the labels are natural numbers. Indeed, since the domain is countable, its elements can be mapped to natural numbers by means of an injection. The following result shows that $\ell$-walk MPNNs can simulate $\ell$-walk refinement on the set of graphs with $n$ vertices and labels from this domain.
For every , , there exists an -walk MPNN such that holds for all , on any given an input graph with and .
We define the -walk MPNN by induction on . More specifically, we inductively define the message and update functions of and verify that holds for all on any given an input graph with . Furthermore, along the way we verify that for , , i.e., the labels remain to be elements in .
Clearly, by definition, so we can focus on . Assume that we have specified up to round such that holds, where . We next consider round .
The labels of a walk of length $\ell$ correspond to an $\ell$-tuple of labels. We want to map these tuples to single labels by means of an injection. We can use any pairing function for this purpose. (Since we defined walk MPNNs over the reals, we assume that the pairing function extends to a function on real-valued tuples.) Given such a pairing function, we define the function as
Then, any multiset consisting of at most elements in can be mapped to a number in by means of the injective function
Indeed, we here just represent a multiset by its unique representation in a fixed base, just as in Xu et al. (2019). It now suffices to define the walk MPNN to consist of the following message and update functions in the current round. (Strictly speaking, the message and update functions depend on $n$, which is not allowed by the definition of walk MPNNs. Since we consider graphs of fixed size, we treat $n$ as a constant. Alternatively, one can incorporate $n$ in the initial labelling and ensure that this value is propagated to all consecutive labellings. In this way, the message and update functions have access to $n$ in every round.) For every :
and for every ,
It remains to verify that holds. In other words, we need to show that for every ,
We define for every , the multiset
Hence, if and only if . It now suffices to observe that
Since the multiplicity of every element in the multisets is bounded by , is an injection and thus if and only if if and only if , from which the proposition follows. We note that when the labels assigned by belong to , then so do the labels assigned by , by the definition of . As a consequence, the -walk MPNN generates labels in in every round. ∎
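The two ingredients of the proof, a pairing function for label tuples and a base representation for multisets, can be sketched as follows. The concrete choices here (the Cantor pairing function, folding it over a tuple, and encoding a multiset as a sum of powers of a base) are assumptions for illustration and need not coincide with the exact functions used in the proof.

```python
def cantor_pair(a, b):
    """Cantor pairing: a bijection from pairs of naturals to naturals."""
    return (a + b) * (a + b + 1) // 2 + b


def encode_tuple(labels):
    """Injectively map a fixed-length tuple of naturals to a natural
    by folding the pairing function from left to right."""
    code = labels[0]
    for x in labels[1:]:
        code = cantor_pair(code, x)
    return code


def encode_multiset(elements, base):
    """Map a multiset of naturals to a natural by summing base ** x.

    This is injective as long as every element occurs fewer than
    base times: the digit at position x is the multiplicity of x.
    """
    return sum(base ** x for x in elements)


def decode_multiset(code, base):
    """Recover the sorted multiset from its code by reading digits."""
    elements, x = [], 0
    while code:
        code, multiplicity = divmod(code, base)
        elements.extend([x] * multiplicity)
        x += 1
    return elements
```

The sum in `encode_multiset` is exactly the kind of sum aggregation a walk MPNN performs once each message is set to a power of the base. The bound on multiplicities matters: with base 5, five copies of 0 and a single 1 both encode to 5, so the base has to exceed the largest possible multiplicity.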
We note that in the simulation above the message and update functions can be fixed, independent of .
The function used in the proof of Proposition 5.1 is similar to the one motivating the definition of GINs (Xu et al., 2019). The difference is that Xu et al. (2019) incorporate the initial injective mapping from to in the first round, and that instead of a representation in , a representation in is used. Translated to our setting this corresponds to defining as with a pairing function and an injection from to . Since labels now take rational values, one needs to incorporate an injective mapping from to in each round . By contrast, our simulation produces labels in for all . ∎
In the standard MPNN setting, MPNNs are known to simulate on all graphs with labels in and that have bounded degree. As such MPNNs can simulate on graphs of arbitrary size. In our setting, assigns labels to all pairs of vertices and the degree is thus always because the input graphs are complete graphs. Hence, the bounded degree condition reduces to the graphs having a fixed size.∎
5.2 Simulating : Uncountable case
We next consider graphs with for some . We first recall from Maron et al. (2019b) how to, by using multi-symmetric polynomials, assign a unique value in to multisets of elements in for some . Let and let be a multi-index, i.e., with for . For an element we write and define . Consider a multiset with each . We represent such a multiset by a matrix, also denoted by , by choosing an arbitrary order on the elements . More precisely, and corresponds to one of the ’s for each . We next define and let , where corresponds to the number of multi-indexes with . More precisely, . Then, for and in , if and only if there exists a permutation of such that for all (see Proposition 1 in Maron et al. (2019b)). In other words, by regarding and as multisets, if and only if and represent the same multiset.
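The multi-symmetric power sums recalled above can be illustrated with a short Python sketch. It assumes a multiset is given as a list of equal-length tuples of numbers and uses all multi-indices of total degree at most the number of elements; the function names are hypothetical, and the code only checks the invariance on examples rather than proving the characterisation from Maron et al. (2019b).

```python
from itertools import product
from math import prod


def multi_indices(m, max_degree):
    """All tuples of m non-negative integers with sum <= max_degree."""
    return [
        alpha
        for alpha in product(range(max_degree + 1), repeat=m)
        if sum(alpha) <= max_degree
    ]


def power_sums(rows, max_degree=None):
    """Evaluate the power sums of a multiset of vectors.

    rows is a list of n tuples of length m. For each multi-index alpha
    the monomial x^alpha is evaluated on every row and the values are
    summed, so the result does not depend on the order of the rows.
    """
    m = len(rows[0])
    if max_degree is None:
        max_degree = len(rows)
    return [
        sum(prod(x ** a for x, a in zip(row, alpha)) for row in rows)
        for alpha in multi_indices(m, max_degree)
    ]
```

Reordering the rows leaves `power_sums` unchanged, whereas replacing one row by a different vector changes at least one of the sums in the examples tried; integer inputs keep the comparison exact.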
For every , , there exists an -walk MPNN such that holds for all , on any given an input graph with and .
For each , , and we define an -walk MPNN such that holds on any given an input graph with and . We define by induction on . More specifically, we inductively define the message and update functions of and verify that holds for all on any given an input graph with and .
Clearly, by definition, so we can focus on . Assume that we have specified up to round such that holds, where . We next consider round .
We use the injective function as described above. We will apply it to the setting whether and . More precisely, we consider the multi-index set of cardinality . We denote the elements in this set by for . We define for in ,
When evaluated on an input graph , for any :