Reconstruction for Powerful Graph Representations

10/01/2021
by   Leonardo Cotta, et al.
Purdue University

Graph neural networks (GNNs) have limited expressive power, failing to represent many graph classes correctly. While more expressive graph representation learning (GRL) alternatives can distinguish some of these classes, they are significantly harder to implement, may not scale well, and have not been shown to outperform well-tuned GNNs in real-world tasks. Thus, devising simple, scalable, and expressive GRL architectures that also achieve real-world improvements remains an open challenge. In this work, we show the extent to which graph reconstruction – reconstructing a graph from its subgraphs – can mitigate the theoretical and practical problems currently faced by GRL architectures. First, we leverage graph reconstruction to build two new classes of expressive graph representations. Second, we show how graph reconstruction boosts the expressive power of any GNN architecture while being a (provably) powerful inductive bias for invariances to vertex removals. Empirically, we show how reconstruction can boost GNNs' expressive power – while maintaining their invariance to permutations of the vertices – by solving seven graph property tasks not solvable by the original GNN. Further, we demonstrate how it boosts state-of-the-art GNNs' performance across nine real-world benchmark datasets.


1 Introduction

Supervised machine learning for graph-structured data, i.e., graph classification and regression, is ubiquitous across application domains ranging from chemistry and bioinformatics (Barabasi and Oltvai, 2004; Stokes et al., 2020) to image analysis (Simonovsky and Komodakis, 2017) and social network analysis (Easley and Kleinberg, 2010). Consequently, machine learning on graphs is an active research area with numerous proposed approaches—with GNNs (Chami et al., 2020; Gilmer et al., 2017; Grohe, 2020) being the most representative class of GRL methods.

Arguably, GRL’s most interesting results arise from a cross-over between graph theory and representation learning. For instance, the representational limits of GNNs are upper-bounded by a simple heuristic for the graph isomorphism problem (Morris et al., 2019; Xu et al., 2019), the 1-dimensional Weisfeiler-Leman algorithm (1-WL) (Grohe, 2017; Morris, 2021; Weisfeiler, 1976; Weisfeiler and Leman, 1968), which might miss crucial structural information in the data (Arvind et al., 2015). Further works show that GNNs cannot approximate graph properties such as diameter, radius, girth, and subgraph counts (Chen et al., 2020; Garg et al., 2020), inspiring architectures (Azizian and Lelarge, 2020; Maron et al., 2019a; Morris et al., 2019, 2020b) based on the more powerful higher-dimensional Weisfeiler-Leman algorithms (Grohe, 2017).[1]

[1] Throughout, we write the order of the Weisfeiler-Leman hierarchy explicitly (1-WL, 2-WL, and so on) to avoid confusion with the subgraph-size hyperparameter of our models.

On the other hand, despite their limited expressiveness, GNNs can still overfit the training data, offering limited generalization performance (Xu et al., 2019). Hence, devising GRL architectures that are simultaneously sufficiently expressive and avoid overfitting remains an open problem.

An under-explored connection between graph theory and GRL is graph reconstruction, which studies graphs and graph properties uniquely determined by their subgraphs. In this direction, both the pioneering work of Shawe-Taylor (1993) and the more recent work of Bouritsas et al. (2020) show that, assuming the reconstruction conjecture (see Conjecture 1) holds, their models are most-expressive representations (universal approximators) of graphs. Unfortunately, Shawe-Taylor’s computational graph grows exponentially with the number of vertices, and Bouritsas et al.’s full representation power requires performing multiple graph isomorphism tests on potentially large subgraphs. Moreover, these methods were not inspired by the more general subject of graph reconstruction; instead, they rely on the reconstruction conjecture to prove their architectures’ expressive power.

Contributions. In this work, we directly connect graph reconstruction to GRL. We first show how the k-reconstruction of graphs—reconstruction from induced k-vertex subgraphs—induces a natural class of expressive GRL architectures for supervised learning with graphs, denoted k-Reconstruction Neural Networks. We then show how several existing works have their expressive power limited by k-reconstruction. Further, we show how the reconstruction conjecture’s insights lead to a provably most-expressive representation of graphs. Unlike Shawe-Taylor (1993) and Bouritsas et al. (2020), which, for graph tasks, require fixed-size unattributed graphs and multiple (large) graph isomorphism tests, respectively, our method represents bounded-size graphs with vertex attributes and does not rely on isomorphism tests.

To make our models scalable, we propose k-Reconstruction GNNs, a general tool for boosting the expressive power and performance of GNNs with graph reconstruction. Theoretically, we characterize their expressive power, showing that k-Reconstruction GNNs can distinguish graph classes that the 1-WL and higher-order WL algorithms cannot, such as cycle graphs and strongly regular graphs, respectively. Further, to explain gains in real-world tasks, we show how reconstruction can act as a lower-variance risk estimator when the graph-generating distribution is invariant to vertex removals. Empirically, we show that reconstruction enhances GNNs’ expressive power, making them solve multiple synthetic graph property tasks in the literature not solvable by the original GNN. On real-world datasets, we show that the increase in expressive power coupled with the lower-variance risk estimator boosts GNNs’ performance by up to 25%. Our combined theoretical and empirical results make another important connection between graph theory and GRL.

1.1 Related work

We review related work from GNNs, their limitations, data augmentation, and the reconstruction conjecture in the following. See Appendix A for a more detailed discussion.

GNNs. Notable instances of this architecture include, e.g., (Duvenaud et al., 2015; Hamilton et al., 2017; Velickovic et al., 2018), and the spectral approaches proposed in, e.g., (Bruna et al., 2014; Defferrard et al., 2016; Kipf and Welling, 2017; Monti et al., 2017)—all of which descend from early work in (Baskin et al., 1997; Kireev, 1995; Merkwirth and Lengauer, 2005; Micheli, 2009; Micheli and Sestito, 2005; Scarselli et al., 2009; Sperduti and Starita, 1997). Aligned with the field’s recent rise in popularity, there exists a plethora of surveys on recent advances in GNN methods. Some of the most recent ones include (Chami et al., 2020; Wu et al., 2018; Zhou et al., 2018).

Limits of GNNs. Recently, connections to Weisfeiler-Leman type algorithms have been shown (Barceló et al., 2020; Chen et al., 2019c; Geerts et al., 2020; Geerts, 2020; Maehara and NT, 2019; Maron et al., 2019a; Morris et al., 2019, 2020b; Xu et al., 2019). Specifically, the authors of (Morris et al., 2019; Xu et al., 2019) show how the 1-WL limits the expressive power of any possible GNN architecture. Morris et al. (2019) introduce higher-dimensional GNNs which rely on a more expressive message-passing scheme between subgraphs of bounded cardinality. Later, this was refined in (Azizian and Lelarge, 2020; Maron et al., 2019a) and in (Morris and Mutzel, 2019) by deriving models equivalent to the more powerful higher-dimensional Weisfeiler-Leman algorithms. Chen et al. (2019c) connect the theory of universal approximation of permutation-invariant functions and graph isomorphism testing, further introducing a more powerful variation of the WL algorithm. Recently, a large body of work proposes enhancements to GNNs, e.g., see Albooyeh et al. (2019); Beaini et al. (2020); Bodnar et al. (2021); Bouritsas et al. (2020); Murphy et al. (2019b); Vignac et al. (2020); You et al. (2021), making them more powerful than the 1-WL; see Appendix A for an in-depth discussion. For clarity, throughout this work, we will use the term GNNs to denote the class of message-passing architectures limited by the 1-WL algorithm, where the class of distinguishable graphs is well understood (Arvind et al., 2015).

Data augmentation, generalization and subgraph-based inductive biases. Few works propose data augmentation for GNNs on graph classification. Kong et al. (2020) introduce a simple feature perturbation framework to achieve this, while Rong et al. (2020) and Feng et al. (2020) focus on vertex-level tasks. Garg et al. (2020) study the generalization abilities of GNNs, showing bounds on the Rademacher complexity, while Liao et al. (2020) offer a refined analysis within the PAC-Bayes framework. Recently, Bouritsas et al. (2020) proposed to use subgraph counts as vertex and edge features in GNNs. Although the authors show an increase in expressiveness, the extent of that increase, e.g., which graph classes their model can distinguish, remains mostly unclear. Moreover, Yehudai et al. (2020) investigate GNNs’ ability to generalize to larger graphs. Concurrently, Bevilacqua et al. (2021) show how subgraph densities can be used to build size-invariant graph representations. However, the performance of such models on in-distribution tasks, their expressiveness, and their scalability remain unclear. Finally, Yuan et al. (2021) show how GNNs’ decisions can be explained by (often large) subgraphs, further motivating our use of graph reconstruction as a powerful inductive bias for GRL.

Reconstruction conjecture. The reconstruction conjecture is a longstanding open problem in graph theory, which has been solved in many particular settings. Such results come in two flavors: either proving that graphs from a specific class are reconstructible, or determining which graph functions are reconstructible. Known results of the former kind are, for instance, that regular graphs, disconnected graphs, and trees are reconstructible (Bondy, 1991; Kelly et al., 1957). In particular, we highlight that outerplanar graphs, which account for most molecule graphs, are known to be reconstructible (Giles, 1974). For a comprehensive review of graph reconstruction results, see Bondy (1991).

2 Preliminaries

Figure 1: A graph and its deck; faded-out vertices are not part of each card in the deck.

Here, we introduce notation and give an overview of the main results in graph reconstruction theory (Bondy, 1991; Godsil, 1993), including the reconstruction conjecture (Ulam, 1960), which forms the basis of the models in this work.

Notation and definitions. As usual, let [n] = {1, ..., n} for n ≥ 1, and let {{...}} denote a multiset. In an abuse of notation, for a set X with x in X, we denote by X - x the set X \ {x}. We also assume elementary definitions from graph theory, such as graphs, directed graphs, vertices, edges, neighbors, trees, isomorphism, et cetera; see Appendix B. The vertex and the edge set of a graph G are denoted by V(G) and E(G), respectively. The size of a graph is equal to its number of vertices; unless indicated otherwise, we write n for the size of the graph under consideration. If not otherwise stated, we assume that vertices and edges are annotated with attributes, i.e., real-valued vectors.

We denote by 𝒢 the set of all finite and simple graphs, and consider separately its subset of graphs without edge attributes (or edge directions). We write G ≃ H if the graphs G and H are isomorphic. Further, the isomorphism type of a graph is its equivalence class under the isomorphism relation. For a vertex subset S ⊆ V(G), G[S] denotes the subgraph induced by S, whose edge set consists of all edges of G with both endpoints in S. We will refer to induced subgraphs simply as subgraphs in this work.

Consider a family of graph representation functions, each of which assigns a fixed-dimensional representation vector to a graph. We say the family can distinguish a graph G if it contains a function that assigns a unique representation to the isomorphism type of G, i.e., two graphs receive the same value under that function if and only if they are isomorphic. Further, we say the family distinguishes a pair of non-isomorphic graphs G and H if it contains some function that assigns them different representations. Moreover, we say one family is at least as expressive as another if it distinguishes all graphs the other does, equally expressive if both directions hold, and strictly more expressive if, additionally, it distinguishes some pair the other cannot. Finally, we say a family is a most-expressive representation of a class of graphs if it distinguishes all non-isomorphic graphs in that class.

Graph reconstruction. Intuitively, the reconstruction conjecture states that an undirected edge-unattributed graph can be fully recovered, up to its isomorphism type, from the multiset of its vertex-deleted subgraphs’ isomorphism types. This multiset of subgraphs is usually referred to as the deck of the graph; see Figure 1 for an illustration. Formally, for a graph G, we define its deck as the multiset {{G - v : v in V(G)}} of its vertex-deleted subgraphs. We often call an element of the deck a card. We define the graph reconstruction problem as follows.

Definition 1.

Let G and H be graphs. Then H is a reconstruction of G if G and H have the same deck. A graph G is reconstructible if every reconstruction of G is isomorphic to G, i.e., if having the same deck implies G ≃ H.

Similarly, we define function reconstruction, which relates functions that map two graphs to the same value if they have the same deck.

Definition 2.

Let f be a function on graphs. Then f is reconstructible if any two graphs with the same deck are mapped to the same value, i.e., equal decks imply f(G) = f(H).

We can now state the reconstruction conjecture, which in short says that every undirected, simple graph with at least three vertices is reconstructible.

Conjecture 1 (Kelly (1942); Ulam (1960)).

Let G and H be two finite, undirected, simple graphs with at least three vertices. If H is a reconstruction of G, then G and H are isomorphic.

We note here that the reconstruction conjecture does not hold for directed graphs, hypergraphs, and infinite graphs (Bondy, 1991; Stockmeyer, 1977, 1981). In particular, edge directions can be seen as edge attributes; thus, the reconstruction conjecture does not hold for the class of graphs with edge attributes. In contrast, the conjecture has been proved for practically relevant graph classes, such as disconnected graphs, regular graphs, trees, and outerplanar graphs (Bondy, 1991). Further, computational searches show that graphs with up to 11 vertices are reconstructible (McKay, 1997). Finally, many graph properties are known to be reconstructible, such as subgraph counts of every size, the degree sequence, the number of edges, and the characteristic polynomial (Bondy, 1991).
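
To make the deck concrete, the following small sketch (our own illustration using networkx, not code from the paper) computes the multiset of vertex-deleted subgraphs of a graph:

```python
import networkx as nx

def deck(G):
    # The deck of G: one vertex-deleted induced subgraph ("card") per vertex.
    # Strictly, the deck is the multiset of the cards' isomorphism types;
    # here we simply return the concrete subgraphs.
    return [G.subgraph(set(G.nodes) - {v}).copy() for v in G.nodes]

G = nx.petersen_graph()
cards = deck(G)
print(len(cards))                   # 10 cards, one per vertex
print(cards[0].number_of_nodes())   # each card has n - 1 = 9 vertices
```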

Graph k-reconstruction. Kelly et al. (1957) generalized graph reconstruction, considering the multiset of induced subgraphs of size k instead of n - 1. This multiset, the k-deck, is {{G[S] : S a k-size subset of V(G)}}, and we often call an element of the k-deck a k-card. From the k-deck definition, it is easy to extend the concept of graph and function reconstruction, cf. Definitions 1 and 2, to graph and function k-reconstruction.

Definition 3.

Let G and H be graphs. Then H is a k-reconstruction of G if G and H have the same k-deck. A graph G is k-reconstructible if every k-reconstruction of G is isomorphic to G, i.e., if having the same k-deck implies G ≃ H.

Accordingly, we define function k-reconstruction as follows.

Definition 4.

Let f be a function on graphs. Then f is k-reconstructible if any two graphs with the same k-deck are mapped to the same value, i.e., equal k-decks imply f(G) = f(H).

Results for k-reconstruction usually state the least subgraph size k, as a function of the number of vertices n, such that all graphs (or all graphs in some subset) are k-reconstructible (Nỳdl, 2001). There exist extensive partial results in this direction, mostly describing k-reconstructibility (as a function of n) for particular families of graphs, such as trees, disconnected graphs, complete multipartite graphs, and paths, see (Nỳdl, 2001; Kostochka and West, 2020). More concretely, Nỳdl (1981); Spinoza and West (2019) exhibited graphs that are not k-reconstructible even when k is not much smaller than the number of vertices. In practice, these results imply that for some fixed k there will be graphs with not many more vertices than k that are not k-reconstructible. Further, k-reconstructible graph functions such as the degree sequence and connectedness have been studied in (Manvel, 1974; Spinoza and West, 2019), depending on how k compares to n. In Appendix C, we discuss further such results.

3 Reconstruction Neural Networks

Building on the previous section, we propose two neural architectures based on graph k-reconstruction and graph reconstruction. First, we look at k-Reconstruction Neural Networks, the most natural way to use graph k-reconstruction. Second, we look at Full Reconstruction Neural Networks, where we leverage the reconstruction conjecture to build a most-expressive representation for the class of graphs of bounded size and unattributed edges.

k-Reconstruction Neural Networks. Intuitively, the key idea of k-Reconstruction Neural Networks is that of learning a joint representation based on subgraphs induced by k vertices. Formally, let f_θ be a (row-wise) permutation-invariant function, and let Γ be a graph representation function such that two graphs on k vertices are mapped to the same vectorial representation if and only if they are isomorphic. We define a k-Reconstruction Neural Network over a graph G as a function with parameters θ of the form

$$\bar{f}_{k,\theta}(G) = f_\theta\Big(\mathrm{Concat}\big(\{\!\!\{\, \Gamma(G[S]) : S \in \tbinom{V(G)}{k} \,\}\!\!\}\big)\Big),$$

where $\tbinom{V(G)}{k}$ is the set of all k-size subsets of V(G) for some k ≤ n, and Concat denotes row-wise concatenation of a multiset of vectors in some arbitrary order. Note that Γ might also be a function with learnable parameters; in that case, we require it to be most-expressive for graphs with k vertices. The following results characterize the expressive power of the above architecture.

Proposition 1.

Let f_θ be a universal approximator of multisets (Zaheer et al., 2017; Wagstaff et al., 2019; Murphy et al., 2019a). Then, a k-Reconstruction Neural Network can approximate a function if and only if the function is k-reconstructible.

Moreover, we can observe the following.

Observation 1 (Nỳdl (2001); Kostochka and West (2020)).

For any graph, its k-deck determines its (k-1)-deck.

From Observation 1, we can derive a hierarchy in the expressive power of k-Reconstruction Neural Networks with respect to the subgraph size k: for every k, (k+1)-Reconstruction Neural Networks are at least as expressive as k-Reconstruction Neural Networks.

In Appendix D, we show how many existing architectures have their expressive power limited by k-reconstruction. We also refer to Appendix D for the proofs, a discussion of the model’s computational complexity, approximation methods, and the relation to existing work.
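
As a toy illustration of the definition above (our own sketch, not the paper's implementation), the following code instantiates a k-Reconstruction representation for a very small k: brute-force canonical forms of the k-vertex induced subgraphs play the role of the most-expressive representation Γ, and a plain multiset of these forms stands in for the learnable permutation-invariant function f_θ:

```python
from itertools import combinations, permutations
from collections import Counter
import networkx as nx

def canonical_form(G, nodes):
    # Most-expressive representation of a tiny induced subgraph: the
    # lexicographically smallest upper-triangular adjacency tuple over
    # all orderings of its vertices (feasible only for very small k).
    best = None
    for perm in permutations(nodes):
        adj = tuple(int(G.has_edge(perm[i], perm[j]))
                    for i in range(len(perm)) for j in range(i + 1, len(perm)))
        best = adj if best is None or adj < best else best
    return best

def k_reconstruction_repr(G, k):
    # Multiset (as a Counter) of canonical forms of all k-vertex induced subgraphs.
    return Counter(canonical_form(G, S) for S in combinations(G.nodes, k))

# A 6-cycle and two disjoint triangles are indistinguishable by 1-WL/GNNs,
# but their 3-decks already differ (only the latter contains triangles).
C6 = nx.cycle_graph(6)
two_triangles = nx.disjoint_union(nx.cycle_graph(3), nx.cycle_graph(3))
assert k_reconstruction_repr(C6, 3) != k_reconstruction_repr(two_triangles, 3)
```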

Full Reconstruction Neural Networks. Here, we propose a recursive scheme based on the reconstruction conjecture to build a most-expressive representation for graphs. Intuitively, Full Reconstruction Neural Networks recursively compute subgraph representations based on smaller subgraph representations. Formally, consider the class of undirected graphs with unattributed edges and bounded size. Further, let the functions f_θ^(k) be (row-wise) permutation-invariant, and let the base case be a most-expressive representation Γ_2(G[{u, v}]) of the two-vertex subgraph induced by a pair of vertices u and v. We can now define the representation of a graph recursively, for subgraph sizes k > 2, as

$$\Gamma_k\big(G[S]\big) = f^{(k)}_{\theta}\Big(\mathrm{Concat}\big(\{\!\!\{\, \Gamma_{k-1}\big(G[S - v]\big) : v \in S \,\}\!\!\}\big)\Big), \qquad |S| = k.$$

Again, Concat denotes row-wise concatenation in some arbitrary order. Note that in practice, it is easier to build the subgraph representations in a bottom-up fashion: first, use the two-vertex subgraph representations to compute all three-vertex subgraph representations; then, proceed inductively until we arrive at a single whole-graph representation. In Appendix E, we prove the expressive power of Full Reconstruction Neural Networks, i.e., we show that if the reconstruction conjecture holds, it is a most-expressive representation of undirected edge-unattributed graphs. Finally, we discuss its quadratic number of parameters, exponential computational complexity, and relation to existing work.
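
The bottom-up computation can be sketched as follows (a minimal, untrained PyTorch sketch with our own placeholder networks; it only illustrates the recursion, not the paper's exact parametrization):

```python
import itertools
import torch
import torch.nn as nn

class FullReconstructionNet(nn.Module):
    # Sketch: representations of k-vertex subgraphs are built from the
    # representations of their (k-1)-vertex subgraphs, for k = 3, ..., n.
    def __init__(self, dim=16, max_size=8):
        super().__init__()
        self.pair_embed = nn.Linear(1, dim)  # base case: edge / non-edge indicator
        self.phi = nn.ModuleList([nn.Linear(dim, dim) for _ in range(max_size)])
        self.rho = nn.ModuleList([nn.Linear(dim, dim) for _ in range(max_size)])

    def forward(self, adj):                  # adj: (n, n) 0/1 float tensor
        n = adj.shape[0]
        reps = {frozenset(p): self.pair_embed(adj[p[0], p[1]].view(1))
                for p in itertools.combinations(range(n), 2)}
        for k in range(3, n + 1):            # inductive step, bottom-up
            new_reps = {}
            for S in itertools.combinations(range(n), k):
                cards = torch.stack([reps[frozenset(set(S) - {v})] for v in S])
                # permutation-invariant aggregation of the (k-1)-card representations
                new_reps[frozenset(S)] = self.rho[k - 3](self.phi[k - 3](cards).sum(0))
            reps = new_reps
        return reps[frozenset(range(n))]     # whole-graph representation

adj = torch.tensor([[0., 1., 1., 0.], [1., 0., 1., 0.],
                    [1., 1., 0., 1.], [0., 0., 1., 0.]])
print(FullReconstructionNet()(adj).shape)    # torch.Size([16])
```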

4 Reconstruction Graph Neural Networks

Although Full Reconstruction Neural Networks provide a most-expressive representation for undirected, unattributed-edge graphs, they are impractical due to their computational cost. Similarly, k-Reconstruction Neural Networks are not scalable, since increasing their expressive power requires computing most-expressive representations of larger k-size subgraphs. Hence, to circumvent the computational cost, we replace the most-expressive subgraph representations of k-Reconstruction Neural Networks with GNN representations, resulting in what we name k-Reconstruction GNNs. This change allows scaling the model to larger subgraph sizes, such as k = n - 1, n - 2, and so on.

Since, in the general case, graph reconstruction assumes most-expressive representations of subgraphs, it cannot capture k-Reconstruction GNNs’ expressive power directly. Hence, we provide a theoretical characterization of the expressive power of k-Reconstruction GNNs by coupling graph reconstruction with the GNN expressive-power characterization based on the 1-WL algorithm. Nevertheless, in Section F.2, we devise conditions under which k-Reconstruction GNNs have the same power as k-Reconstruction Neural Networks. Finally, we show how graph reconstruction can act as a (provably) powerful inductive bias for invariances to vertex removals, which boosts the performance of GNNs even in tasks where all graphs are already distinguishable by them (see Appendix G). We refer to Appendix F for a discussion of the model’s relation to existing work.

Formally, let f_θ be a (row-wise) permutation-invariant function and GNN a GNN representation. Then, for k < n, a k-Reconstruction GNN takes the form

$$\bar{f}_{k,\theta}(G) = f_\theta\Big(\mathrm{Concat}\big(\{\!\!\{\, \mathrm{GNN}(G[S]) : S \in \tbinom{V(G)}{k} \,\}\!\!\}\big)\Big),$$

with parameters θ, where $\tbinom{V(G)}{k}$ is the set of all k-size subsets of V(G), and Concat is row-wise concatenation in some arbitrary order.
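
A possible realization with PyTorch Geometric is sketched below; it assumes an existing graph-level GNN encoder `gnn` (e.g., a GIN with global pooling) that maps a `Data` object to a fixed-size embedding. Class and argument names are our own, not the paper's:

```python
from itertools import combinations
import torch
import torch.nn as nn
from torch_geometric.data import Data
from torch_geometric.utils import subgraph

class KReconstructionGNN(nn.Module):
    # Sketch of a k-Reconstruction GNN: run a GNN on k-vertex induced subgraphs
    # and aggregate the resulting embeddings with a Deep Sets readout.
    def __init__(self, gnn, hidden_dim, out_dim, k):
        super().__init__()
        self.gnn = gnn          # assumed to return a (hidden_dim,) embedding per graph
        self.k = k
        self.phi = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU())
        self.rho = nn.Linear(hidden_dim, out_dim)

    def forward(self, data, subsets=None):
        n = data.num_nodes
        if subsets is None:     # exact model: all k-vertex subsets (costly)
            subsets = list(combinations(range(n), self.k))
        embeddings = []
        for S in subsets:
            idx = torch.tensor(S, dtype=torch.long)
            edge_index, edge_attr = subgraph(idx, data.edge_index, data.edge_attr,
                                             relabel_nodes=True, num_nodes=n)
            card = Data(x=data.x[idx], edge_index=edge_index, edge_attr=edge_attr)
            embeddings.append(self.gnn(card))
        h = torch.stack(embeddings)                # (num_subgraphs, hidden_dim)
        return self.rho(self.phi(h).sum(dim=0))    # Deep Sets: rho(sum_S phi(GNN(G[S])))
```

During training, `subsets` can instead be a small uniform sample of k-vertex subsets, as discussed next.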

Approximating the k-deck. By design, k-Reconstruction GNNs require computing GNN representations for all k-vertex subgraphs, which might not be feasible for large graphs or datasets. To address this, we discuss a direction to circumvent computing all subgraphs, i.e., approximating the representation by sampling.

One possible choice for f_θ is Deep Sets (Zaheer et al., 2017), which we use for the experiments in Section 5, where the representation is a sum decomposition taking the form $\rho\big(\sum_{S \in \binom{V(G)}{k}} \phi(\mathrm{GNN}(G[S]))\big)$, with ρ and φ permutation-sensitive functions, such as feed-forward networks. We can learn the k-Reconstruction GNN model over a training dataset of graphs G_i with targets y_i and a loss function L by minimizing the empirical risk

$$\hat{R}_k(\theta) = \frac{1}{m} \sum_{i=1}^{m} L\big(y_i,\, \bar{f}_{k,\theta}(G_i)\big). \qquad (1)$$

Equation 1 is impractical for all but the smallest graphs, since the inner representation is a sum over all k-vertex induced subgraphs of each graph. Hence, we approximate this sum using a sample of k-size vertex subsets drawn uniformly at random at every gradient step. Due to the non-linearities in ρ and L, plugging the sampled estimate into Equation 1 does not provide an unbiased estimate of the empirical risk. However, if the loss is convex in the model output, in expectation we will be minimizing a proper upper bound of our loss (by Jensen's inequality). In practice, many models rely on this approximation and provide scalable and reliable training procedures, cf. (Murphy et al., 2019a, b; Zaheer et al., 2017; Hinton et al., 2012).
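
A sketch of such a sampled training step, continuing the hypothetical `KReconstructionGNN` above (the sample size and loss are illustrative choices, not the paper's settings):

```python
import random
import torch.nn.functional as F

def sample_subsets(n, k, num_samples):
    # Uniformly sample k-vertex subsets; a fresh sample is drawn at every step.
    return [tuple(random.sample(range(n), k)) for _ in range(num_samples)]

def training_step(model, optimizer, data, y, k, num_samples=10):
    optimizer.zero_grad()
    subsets = sample_subsets(data.num_nodes, k, num_samples)
    pred = model(data, subsets=subsets)   # sampled k-Reconstruction forward pass
    loss = F.mse_loss(pred, y)            # convex in pred, so in expectation we
    loss.backward()                       # minimize an upper bound of the full loss
    optimizer.step()
    return loss.item()
```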

4.1 Expressive power

Now, we analyze the expressive power of k-Reconstruction GNNs. It is clear that k-Reconstruction GNNs are at most as expressive as k-Reconstruction Neural Networks; however, the relationship between k-Reconstruction GNNs and GNNs is not as straightforward. At first, one might expect a well-defined hierarchy—such as the one for k-Reconstruction Neural Networks (see Observation 1)—between GNNs, (n-1)-Reconstruction GNNs, (n-2)-Reconstruction GNNs, and so on. However, there is no such hierarchy, as we see next.

Are GNNs more expressive than k-Reconstruction GNNs? It is well known that GNNs cannot distinguish regular graphs (Arvind et al., 2015; Morris et al., 2019). By leveraging the fact that regular graphs are reconstructible (Kelly et al., 1957), we show that cycles and circular skip link (CSL) graphs—two classes of regular graphs—can indeed be distinguished by k-Reconstruction GNNs, implying that k-Reconstruction GNNs are not less expressive than GNNs. We start by showing that k-Reconstruction GNNs can distinguish the class of cycle graphs.

Theorem 1 (k-Reconstruction GNNs can distinguish cycles).

Let G be a cycle graph with n vertices and let k < n. A k-Reconstruction GNN assigns a unique representation to G if conditions i) and ii) hold.

The following result shows that (n-1)-Reconstruction GNNs can distinguish the class of CSL graphs.

Theorem 2 ((n-1)-Reconstruction GNNs can distinguish CSL graphs).

Let G and H be two non-isomorphic circular skip link (CSL) graphs (a class of 4-regular graphs, cf. (Chen et al., 2019a; Murphy et al., 2019b)). Then, (n-1)-Reconstruction GNNs can distinguish G and H.

Hence, if the conditions in Theorem 1 hold, GNNs are not more expressive than k-Reconstruction GNNs. Figure 2 (cf. Appendix F) depicts how k-Reconstruction GNNs can distinguish a graph that GNNs cannot. The process essentially breaks the local symmetries that make GNNs struggle by removing one (or a few) vertices from the graph. By doing so, we arrive at distinguishable subgraphs. Since we can reconstruct the original graph from its unique subgraph representations, we can identify it. See Appendix F for the complete proofs of Theorems 1 and 2.
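
This intuition can be checked directly with the 1-WL colorings that upper-bound GNNs; the following small experiment (ours, using networkx's WL hash) contrasts a 6-cycle with two disjoint triangles:

```python
import networkx as nx

C6 = nx.cycle_graph(6)
two_triangles = nx.disjoint_union(nx.cycle_graph(3), nx.cycle_graph(3))

# Both graphs are 2-regular, so 1-WL (and hence any GNN bounded by it)
# assigns them the same representation.
assert nx.weisfeiler_lehman_graph_hash(C6) == nx.weisfeiler_lehman_graph_hash(two_triangles)

def deck_wl_hashes(G):
    # 1-WL hashes of the vertex-deleted subgraphs, i.e., of the (n-1)-deck.
    return sorted(nx.weisfeiler_lehman_graph_hash(G.subgraph(set(G) - {v}))
                  for v in G)

# Removing a vertex breaks the symmetry: a path vs. a triangle plus an edge.
# 1-WL tells these cards apart, so the decks differ and an (n-1)-Reconstruction
# GNN can separate the two graphs.
assert deck_wl_hashes(C6) != deck_wl_hashes(two_triangles)
```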

Are GNNs less expressive than k-Reconstruction GNNs? We now show that GNNs can distinguish graphs that k-Reconstruction GNNs with small k cannot, as stated in Proposition 2.

Proposition 2.

For sufficiently small subgraph sizes k, k-Reconstruction GNNs are not at least as expressive as GNNs, i.e., there are graphs that GNNs can distinguish but such k-Reconstruction GNNs cannot.

On the other hand, the analysis is more interesting for larger subgraph sizes, e.g., k = n - 1, where there are no known examples of (undirected, edge-unattributed) non-reconstructible graphs. There are graphs distinguishable by GNNs with at least one subgraph not distinguishable by them; see Appendix F. However, the relevant question is whether the multiset of all subgraphs’ representations can distinguish the original graph. Since we could not find any counter-examples, we conjecture that every graph distinguishable by a GNN is also distinguishable by a k-Reconstruction GNN with k = n - 1, or possibly more generally with any k close enough to n. In Appendix F, we state and discuss the conjecture, which we name the WL reconstruction conjecture. If true, the conjecture implies that (n-1)-Reconstruction GNNs are at least as expressive as GNNs. Moreover, if we use the original GNN representation together with a k-Reconstruction GNN, Theorems 1 and 2 imply that the resulting model is strictly more powerful than the original GNN.

Are k-Reconstruction GNNs less expressive than higher-order (2-WL) GNNs?

Recently, a line of work, e.g., Azizian and Lelarge (2020); Maron et al. (2019b); Morris and Mutzel (2019), explored higher-order GNNs aligned with the WL hierarchy. Such architectures have, in principle, the same power as the corresponding higher-order WL algorithm in distinguishing non-isomorphic graphs. Hence, one might wonder how k-Reconstruction GNNs stack up against such algorithms. The following result shows that pairs of non-isomorphic graphs exist that a k-Reconstruction GNN can distinguish but the 2-WL cannot.

Proposition 3.

Let 2-GNNs be neural architectures with the same expressiveness as the 2-WL algorithm. Then, 2-GNNs cannot distinguish all pairs of graphs that k-Reconstruction GNNs can.

As a result of Proposition 3, using a k-Reconstruction GNN representation together with a 2-GNN increases the original 2-GNN’s expressive power.

4.2 Reconstruction as a powerful extra invariance for general graphs

An essential feature of modern machine learning models is capturing invariances of the problem of interest (Lyle et al., 2020). Doing so reduces degrees of freedom while allowing for better generalization (Bloem-Reddy and Teh, 2020; Lyle et al., 2020). GRL is predicated on invariance to vertex permutations, i.e., assigning the same representation to isomorphic graphs. But are there other invariances that could improve generalization error?

k-reconstruction is an extra invariance. Let p(G, y) be the joint probability of observing a graph G with label y. Any k-reconstruction-based model, such as k-Reconstruction Neural Networks and k-Reconstruction GNNs, by definition assumes p to be invariant to the k-deck, i.e., graphs with the same k-deck have the same joint distribution over labels. Hence, our neural architectures for k-Reconstruction Neural Networks and k-Reconstruction GNNs directly encode this extra invariance beyond permutation invariance. How do we know it is an extra invariance and not a consequence of permutation invariance? Because it does not hold for directed graphs (Stockmeyer, 1981), where permutation invariance still holds.

Hereditary property variance reduction. We now show that the invariance imposed by k-reconstruction helps in tasks based on hereditary properties (Borowiecki et al., 1997). A graph property p is called hereditary if it is invariant to vertex removals, i.e., p(G) = p(G - v) for every graph G and every vertex v. By induction, the property is then invariant to every induced subgraph, i.e., p(G) = p(G[S]) for every vertex subset S, of any size. For example, every subgraph of a planar graph is also planar, every subgraph of an acyclic graph is also acyclic, and any subgraph of a c-colorable graph is also c-colorable. A more practically interesting (weaker) invariance is invariance to only a few vertex removals. Next, we define k-hereditary properties; a hereditary property is a special case of a k-hereditary property. In short, a property is k-hereditary if it behaves like a hereditary property on subgraphs with more than k vertices.

Definition 5 (k-hereditary property).

A graph property p is said to be k-hereditary if p(G) = p(G[S]) for every vertex subset S of G with more than k vertices. That is, p is uniform over G and all of its induced subgraphs with more than k vertices.

Consider the task of predicting such a property. Theorem 3 shows that the invariance imposed by k-Reconstruction GNNs reduces the variance of the empirical risk associated with k-hereditary property tasks. See Appendix F for the proof.

Theorem 3 (Reconstruction GNNs reduce the variance of k-hereditary tasks).

Let the graph-generating distribution be k-hereditary, i.e., the label of each graph is given by a k-hereditary property, and assume every observed graph has more than k vertices. Then, for Reconstruction GNNs of the Deep Sets form above over subgraphs with more than k vertices, if the loss is convex in the model output, the variance of the empirical risk of the Reconstruction GNN (cf. Equation 1) is at most the variance of the empirical risk of GNNs.

5 Experimental Evaluation

In this section, we investigate the benefits of k-Reconstruction GNNs against GNN baselines on both synthetic and real-world tasks. Concretely, we address the following questions:
Q1. Does the increase in expressive power from reconstruction (cf. Section 4.1) make k-Reconstruction GNNs solve graph property tasks not originally solvable by GNNs?
Q2. Can reconstruction boost the original GNNs' performance on real-world tasks? If so, why?
Q3. What is the influence of the subgraph size k in both graph property and real-world tasks?

Synthetic graph property datasets. For Q1 and Q3, we chose the synthetic graph property tasks in Table 1, which GNNs are provably incapable of solving due to their limited expressive power (Garg et al., 2020; Murphy et al., 2019c). The tasks are csl (Dwivedi et al., 2020), where we classify CSL graphs; the cycle detection tasks 4 cycles, 6 cycles, and 8 cycles (Vignac et al., 2020); and the multi-task regression from Corso et al. (2020), where we determine whether a graph is connected, its diameter, and its spectral radius. See Appendix H for dataset statistics.
Real-world datasets. To address Q2 and Q3, we evaluated k-Reconstruction GNNs on a diverse set of large-scale, standard benchmark instances (Hu et al., 2020; Morris et al., 2020a). Specifically, we used the zinc (10K) (Dwivedi et al., 2020), alchemy (10K) (Chen et al., 2019a), ogbg-molfreesolv, ogbg-molesol, and ogbg-mollipo (Hu et al., 2020) regression datasets. For graph classification, we used ogbg-molhiv, ogbg-molpcba, ogbg-moltox21, and ogbg-moltoxcast (Hu et al., 2020). See Appendix H for dataset statistics.
Neural architectures. We used the GIN (Xu et al., 2018), GCN (Kipf and Welling, 2017), and PNA (Corso et al., 2020) architectures as GNN baselines. We always replicated the exact architectures from the original papers, building on the respective PyTorch Geometric implementations (Fey and Lenssen, 2019). For the ogbg regression datasets, we noticed that using a jumping knowledge layer yields better validation and test results for GIN and GCN; thus, we made this small change. For each of these three architectures, we implemented k-Reconstruction GNNs for several subgraph sizes k, using a Deep Sets function (Zaheer et al., 2017) over the exact same original GNN architecture. For more details, see Appendix G.
Experimental setup. To establish fair comparisons, we retain all hyperparameters and training procedures from the original GNNs to train the corresponding k-Reconstruction GNNs. Tables 1 and 2 and Table 6 in Appendix I present results with the same number of runs as previous work (Corso et al., 2020; Dwivedi et al., 2020; Hu et al., 2020; Morris et al., 2020b; Vignac et al., 2020), i.e., five runs for all datasets except the ogbg datasets, where we use ten runs. For more details, such as the number of subgraphs sampled for each k-Reconstruction GNN and each dataset, see Appendix G.
Non-GNN baselines. For the graph property tasks, prior work used vertex identifiers or Laplacian embeddings to make GNNs solve them. This trick is effective for these tasks but violates an important premise of graph representations: invariance to vertex permutations. To illustrate this line of work, we compare against Positional GIN, which uses Laplacian embeddings (Dwivedi et al., 2020) for the csl task and vertex identifiers for the others (Vignac et al., 2020; Corso et al., 2020). To compare against other methods that, like k-Reconstruction GNNs, are invariant to vertex permutations and increase the expressive power of GNNs, we also evaluate Ring-GNNs (Chen et al., 2019c) and (3-WL) PPGNs (Maron et al., 2019a). For real-world tasks, Table 6 in Appendix I shows the results of GRL alternatives that incorporate higher-order representations in different ways: LRP (Chen et al., 2019c), GSN (Bouritsas et al., 2020), δ-2-LGNN (Morris et al., 2020b), and SMP (Vignac et al., 2020).

All results are fully reproducible from the source and are available at https://github.com/PurdueMINDS/reconstruction-gnns.

Results and discussion.

Multi-task Invariant to
csl 4 cycles 6 cycles 8 cycles connectivity diameter spectral radius
(Accuracy % %) (Accuracy % %) (Accuracy %) (Accuracy %) ( MSE) ( MSE) ( MSE) vertex permutations?
GIN (orig.) 4.66 4.00 93.0 92.7 92.5 -3.419 0.320 0.588 0.354 -2.130 1.396

Reconstr.

88.66 22.66 95.17 4.91 97.35 0.74 94.69 2.34 -3.575 0.395 -0.195 0.714 -2.732 0.793
78.66 22.17 94.06 5.10 97.50 0.72 95.04 2.69 -3.799 0.187 -0.207 0.381 -2.344 0.569
73.33 16.19 96.61 1.40 97.84 1.37 94.48 2.13 -3.779 0.064 0.105 0.225 -1.908 0.860
40.66 9.04 75.13 0.26 63.28 0.59 63.53 1.14 -3.765 0.083 0.564 0.025 -2.130 0.166
GCN(orig.) 6.66 2.10 98.336 0.24 95.73 2.72 87.14 12.73 -3.781 0.075 0.087 0.186 -2.204 0.362

Reconstr.

100.00 0.00 99.00 0.10 97.63 0.19 94.99 2.31 -4.039 0.101 -1.175 0.425 -3.625 0.536
100.00 0.00 98.77 0.61 97.89 0.69 97.82 1.10 -3.970 0.059 -0.577 0.135 -3.397 0.273
96.00 6.46 99.11 0.19 98.31 0.52 97.18 0.58 -3.995 0.031 -0.333 0.117 -3.105 0.286
49.33 7.42 75.19 0.19 66.04 0.59 63.66 0.51 -3.693 0.063 0.8518 0.016 -1.838 0.054
PNA (orig.) 10.00 2.98 81.59 19.86 95.57 0.36 84.81 16.48 -3.794 0.155 -0.605 0.097 -3.610 0.137

Reconstr.

100.00 0.00 97.88 2.19 99.18 0.20 98.92 0.72 -3.904 0.001 -0.765 0.032 -3.954 0.118
95.33 7.77 99.12 0.28 99.10 0.57 99.22 0.27 -3.781 0.085 -0.090 0.135 -3.478 0.206
95.33 5.81 89.36 0.22 99.34 0.26 93.92 8.15 -3.710 0.209 0.042 0.047 -3.311 0.067
42.66 11.03 75.34 0.18 65.58 0.95 64.01 0.30 -2.977 0.065 1.445 0.037 -1.073 0.075
Positional GIN 99.33 1.33 88.3 96.1 95.3 -1.61 -2.17 -2.66
Ring-GNN 10.00 0.00 99.9 100.0 71.4
PPGN (3-WL) 97.80 10.91 99.8 87.1 76.5
Table 1: Synthetic graph property tasks. We highlight in green k-Reconstruction GNNs boosting the original GNN architecture. Standard deviations not reported in the original work are omitted. Positional GIN uses Laplacian embeddings (csl) or vertex identifiers (other tasks) as positional features.

A1 (Graph property tasks). Table 1 confirms Theorem 2: the increase in expressive power from reconstruction allows k-Reconstruction GNNs to distinguish CSL graphs, a task that GNNs cannot solve. Here, k-Reconstruction GNNs boost the accuracy of standard GNNs dramatically (e.g., from below 11% to up to 100% on csl). Theorem 2 only guarantees a GNN expressiveness boost for one choice of subgraph size, but our empirical results also show benefits for other subgraph sizes. Table 1 also confirms Theorem 1, where k-Reconstruction GNNs provide significant accuracy boosts on all cycle detection tasks (4 cycles, 6 cycles, and 8 cycles). See Section J.1 for a detailed discussion of the results for connectivity, diameter, and spectral radius, which also show boosts.

ogbg-moltox21 ogbg-moltoxcast ogbg-molfreesolv ogbg-molesol ogbg-mollipo ogbg-molpcba
(ROC-AUC %) (ROC-AUC %) (RMSE) (RMSE) (RMSE) (AP %)
GIN (orig.) 74.91 0.51 63.41 0.74 2.411 0.123 1.111 0.038 0.754 0.010 21.16 0.28

Reconstr.

75.15 1.40 63.95 0.53 2.283 0.279 1.026 0.033 0.716 0.020 23.60 0.02
76.84 0.62 65.36 0.49 2.117 0.181 1.006 0.030 0.736 0.025 23.25 0.00
76.78 0.64 64.84 0.71 2.370 0.326 1.055 0.031 0.738 0.018 23.33 0.09
74.40 0.75 62.29 0.28 2.531 0.206 1.343 0.053 0.842 0.020 13.50 0.32
GCN (orig.) 75.29 0.69 63.54 0.42 2.417 0.178 1.106 0.036 0.793 0.040 20.20 0.24

Reconstr.

76.46 0.77 64.51 0.60 2.524 0.300 1.096 0.045 0.760 0.015 21.25 0.25
75.58 0.99 64.38 0.39 2.467 0.231 1.086 0.048 0.766 0.025 20.10 0.08
75.88 0.73 64.70 0.81 2.345 0.261 1.114 0.047 0.754 0.021 19.04 0.03
74.03 0.63 62.80 0.77 2.599 0.161 1.372 0.048 0.835 0.020 11.69 1.41
PNA (orig.) 74.28 0.52 62.69 0.63 2.192 0.125 1.140 0.032 0.759 0.017 25.45 0.04

Reconstr.

73.64 0.74 64.14 0.76 2.341 0.070 1.723 0.145 0.743 0.015 23.11 0.05
74.89 0.29 65.22 0.47 2.298 0.115 1.392 0.272 0.794 0.065 22.10 0.03
75.10 0.73 65.03 0.58 2.133 0.086 1.360 0.163 0.785 0.041 20.05 0.15
73.71 0.61 61.25 0.49 2.185 0.231 1.157 0.056 0.843 0.018 12.33 1.20
Table 2: ogbg molecule graph classification and regression tasks. We highlight in green k-Reconstruction GNNs boosting the original GNN architecture.

A2 (Real-world tasks). Table 2 and Table 6 in Appendix I show that applying k-reconstruction to GNNs significantly boosts their performance across the real-world tasks. In particular, in Table 2 we see a boost of up to 5% while achieving the best results in five out of six datasets. Reconstruction applied to GIN gives the best results on the ogbg tasks, with the exception of ogbg-mollipo and ogbg-molpcba, where a different subgraph size performs better. The only settings where we did not observe any boost were PNA on ogbg-molesol and ogbg-molpcba. Table 6 in Appendix I also shows consistent boosts in GNNs' performance of up to 25% on other datasets. On zinc, reconstruction yields better results than the higher-order alternatives LRP and δ-2-LGNN. While GSN gives the best zinc results, we note that GSN requires application-specific features. On ogbg-molhiv, reconstruction is able to boost both GIN and GCN. The results in Appendix G show that nearly all of the graphs in our real-world datasets are distinguishable by the 1-WL algorithm; thus, we can conclude that traditional GNNs are expressive enough for all our real-world tasks. Hence, the real-world boosts of reconstruction over GNNs can be attributed to the gains from invariance to vertex removals (cf. Section 4.2) rather than to the boost in expressive power (cf. Section 4.1).

A3 (Subgraph sizes). Overall, we observe that removing one vertex (k = n - 1) is enough to improve the performance of GNNs in most experiments. At the other extreme of vertex removals, i.e., very small subgraphs, there is a significant loss in expressiveness compared to the original GNN. In most real-world tasks, Table 2 and Table 6 in Appendix I show a variety of performance boosts also for smaller subgraph sizes. For GCN and PNA on ogbg-molesol, specifically, we only see reconstruction boosts for smaller subgraphs, which might be due to the task's need for more invariance to vertex removals (cf. Section 4.2). In the graph property tasks (Table 1), we see significant boosts also for smaller subgraph sizes in all models across most tasks, except PNA. However, as in the real-world tasks, the extreme case of very small subgraphs significantly harms the ability of k-Reconstruction GNNs to solve the tasks.

6 Conclusions

Our work connected graph (k-)reconstruction and modern GRL. We first showed how this connection results in two natural classes of expressive graph representations. To make our models practical, we combined insights from graph reconstruction and GNNs, resulting in k-Reconstruction GNNs. Our theory shows that reconstruction boosts the expressiveness of GNNs and yields a lower-variance risk estimator for distributions invariant to vertex removals. Empirically, we showed how the theoretical gains of k-Reconstruction GNNs translate into practice, solving graph property tasks not originally solvable by GNNs and boosting their performance on real-world tasks.

Acknowledgements

This work was funded in part by the National Science Foundation (NSF) awards CAREER IIS-1943364 and CCF-1918483. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsors. Christopher Morris is funded by the German Academic Exchange Service (DAAD) through a DAAD IFI postdoctoral scholarship (57515245). We want to thank our reviewers, who gave excellent suggestions to improve the paper.

References

Appendix A Related work (expanded)

GNNs. Recently, graph neural networks (Gilmer et al., 2017; Scarselli et al., 2009) emerged as the most prominent (supervised) GRL architectures. Notable instances of this architecture include, e.g., (Duvenaud et al., 2015; Hamilton et al., 2017; Velickovic et al., 2018), and the spectral approaches proposed in, e.g., (Bruna et al., 2014; Defferrard et al., 2016; Kipf and Welling, 2017; Monti et al., 2017)—all of which descend from early work in (Kireev, 1995; Merkwirth and Lengauer, 2005; Sperduti and Starita, 1997; Scarselli et al., 2009). Recent extensions and improvements to the GNN framework include approaches to incorporate different local structures (around subgraphs), e.g., (Abu-El-Haija et al., 2019; Flam-Shepherd et al., 2020; Jin et al., 2019; Niepert et al., 2016; Xu et al., 2018), novel techniques for pooling vertex representations in order to perform graph classification, e.g., (Cangea et al., 2018; Gao and Ji, 2019; Ying et al., 2018; Zhang et al., 2018), incorporating distance information (You et al., 2019), and non-Euclidean geometry approaches (Chami et al., 2019). Moreover, empirical studies on neighborhood aggregation functions for continuous vertex features (Corso et al., 2020), edge-based GNNs leveraging physical knowledge (Anderson et al., 2019; Klicpera et al., 2020), and sparsification methods (Rong et al., 2020) have recently emerged. A survey of recent advancements in GNN techniques can be found, e.g., in (Chami et al., 2020; Wu et al., 2019; Zhou et al., 2018).

Limits of GNNs. Chen et al. (2020) study the substructure counting abilities of GNNs. Dasoulas et al. (2020); Abboud et al. (2020) investigate the connection between random coloring and universality. Recent works have extended GNNs' expressive power by encoding vertex identifiers (Murphy et al., 2019b; Vignac et al., 2020), adding random features (Sato et al., 2020), using higher-order topology as features (Bouritsas et al., 2020), considering simplicial complexes (Albooyeh et al., 2019; Bodnar et al., 2021), encoding ego-networks (You et al., 2021), and encoding distance information (Li et al., 2020). Although these works increase the expressiveness of GNNs, their generalization abilities are understood to a lesser extent. Further, works such as Vignac et al. (2020, Lemma 6) and the more recent Beaini et al. (2020) and Bodnar et al. (2021) prove the boost in expressiveness with a single pair of graphs, giving no insights into the extent of their expressive power or their generalization abilities. For clarity, throughout this work, we use the term GNNs to denote the class of message-passing architectures limited by the 1-WL algorithm, where the class of distinguishable graphs is well understood (Arvind et al., 2015).

Appendix B Notation (expanded)

As usual, let [n] = {1, ..., n} for n ≥ 1, and let {{...}} denote a multiset. In an abuse of notation, for a set X with x in X, we denote by X - x the set X \ {x}.

Graphs. A graph G is a pair (V, E) with a finite set of vertices V and a set of edges E between pairs of distinct vertices. We denote the set of vertices and the set of edges of G by V(G) and E(G), respectively. For ease of notation, we denote an edge {u, v} in E(G) by (u, v) or (v, u). In the case of directed graphs, edges are ordered pairs of vertices. An attributed graph is a graph equipped with an attribute function that assigns a real-valued vector to each vertex (and, possibly, each edge); this vector is the attribute of the vertex or edge. The neighborhood of a vertex v in G is the set of vertices adjacent to v. Unless indicated otherwise, we use n for the number of vertices of the graph under consideration.

We say that two graphs G and H are isomorphic, G ≃ H, if there exists an adjacency-preserving bijection φ from V(G) to V(H), i.e., (u, v) is in E(G) if and only if (φ(u), φ(v)) is in E(H), and we call φ an isomorphism from G to H. If the graphs have vertex or edge attributes, the isomorphism is additionally required to match these attributes accordingly.

We denote by 𝒢 the set of all finite and simple graphs, and consider separately its subset of graphs without edge attributes. Further, the isomorphism type of a graph is its equivalence class under the isomorphism relation. For a vertex subset S ⊆ V(G), G[S] denotes the subgraph induced by S, whose edge set consists of all edges of G with both endpoints in S. We will refer to induced subgraphs simply as subgraphs in this work.

Appendix C More on reconstruction

After formulating the Reconstruction Conjecture, it is natural to wonder whether it holds for other relational structures, such as directed graphs. Interestingly, directed graphs, hypergraphs, and infinite graphs are not reconstructible in general (Bondy, 1991; Stockmeyer, 1977). Thus, in particular, the Reconstruction Conjecture does not hold for the class of graphs with attributed (or directed) edges.

Another question is how many cards from the deck are sufficient to reconstruct a graph. Bollobás (1990) shows that almost every graph, in a probabilistic sense, can be reconstructed from only three subgraphs of the deck. For example, the graph shown in Figure 1 (Section 2) is reconstructible from the three leftmost cards.

For an extensive survey on reconstruction, we refer the reader to  Bondy (1991); Godsil (1993). From there, we highlight a significant result, Kelly’s Lemma (cf. Lemma 1). In short, the lemma states that the deck of a graph completely defines its subgraph count of every size.

Lemma 1 (Kelly’s Lemma (Kelly et al., 1957)).

Let s(F, G) be the number of copies of F in G. For any pair of graphs F and G with F having fewer vertices than G, s(F, G) is reconstructible.

In fact, its proof is very simple once we realize that every copy of F appears in exactly |V(G)| - |V(F)| cards of the deck, i.e.,

$$\big(|V(G)| - |V(F)|\big)\, s(F, G) \;=\; \sum_{v \in V(G)} s(F, G - v).$$

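As a sanity check of this counting identity (our own illustration, not from the paper), it can be verified numerically with networkx; the quantity counted below is the number of induced-subgraph isomorphism mappings, which is proportional to the number of copies:

```python
import networkx as nx
from networkx.algorithms.isomorphism import GraphMatcher

def count_induced_mappings(F, G):
    # Number of induced-subgraph isomorphism mappings of F into G.
    return sum(1 for _ in GraphMatcher(G, F).subgraph_isomorphisms_iter())

G = nx.erdos_renyi_graph(8, 0.4, seed=0)
F = nx.path_graph(3)                 # the pattern F, with |V(F)| = 3
n, f = G.number_of_nodes(), F.number_of_nodes()

lhs = (n - f) * count_induced_mappings(F, G)
rhs = sum(count_induced_mappings(F, G.subgraph(set(G) - {v})) for v in G)
assert lhs == rhs                    # Kelly's counting identity
```
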
Manvel (1974) started the study of graph reconstruction from the k-deck, which has recently been reviewed by Kostochka and West (2020) and Nỳdl (2001). Related work also parametrizes k-reconstruction by the number of deleted vertices, n - k, rather than by the subgraph size k (Kostochka and West, 2020).

Here, in Lemma 2, we highlight a generalization of Kelly’s Lemma (Lemma 1), established in Nỳdl (2001), where the count of any subgraph of size at most k is k-reconstructible.

Lemma 2 (Nỳdl (2001)).

For any pair of graphs F and G with F of size at most k, s(F, G) is k-reconstructible.

Appendix D More on k-Reconstruction Neural Networks

Here, we give more background on k-Reconstruction Neural Networks.

D.1 Properties

We start by showing that the k-ary Relational Pooling framework of Murphy et al. (2019b) is a specific case of k-Reconstruction Neural Networks and thus limited by k-reconstruction. Then, we show how k-Reconstruction Neural Networks are limited by k-GNNs at initialization, which implies that k-GNNs (Morris et al., 2019) at initialization can approximate any k-reconstructible function.

Observation 2 (k-ary Relational Pooling is a k-Reconstruction Neural Network).

The k-ary pooling approach in the Relational Pooling (RP) framework (Murphy et al., 2019b) defines a graph representation that averages, over all k-size subsets S of V(G), a most-expressive representation of G[S] obtained by averaging a permutation-sensitive universal approximator, e.g., a feed-forward neural network, over the k! vertex orderings of the subgraph. Thus, k-ary RP can be cast as a k-Reconstruction Neural Network with f_θ instantiated as mean pooling and Γ as the permutation-averaged subgraph representation. Note that for k-ary RP to be as expressive as k-Reconstruction Neural Networks, we would need to replace the average pooling by a universal multiset approximator or simply add a feed-forward neural network after it.

Observation 3 (k-Reconstruction Neural Networks are bounded by the k-WL at initialization).

The k-WL test, which limits architectures such as Morris et al. (2020b, 2019); Maron et al. (2019a), at initialization, i.e., with zero iterations, considers one-hot encodings of the isomorphism types of k-tuples of vertices. Note that each k-size subgraph is completely defined by its corresponding vertex tuples. Thus, it follows that the k-WL with zero iterations is at least as expressive as k-Reconstruction Neural Networks. Further, by combining Proposition 1 and the result from Lemma 2 (Nỳdl, 2001), it follows that the k-WL (Morris et al., 2020b) at initialization can count subgraphs of size at most k, which gives a simple proof of the recent result (Chen et al., 2019a, Theorem 3.7).

Now, we discuss the computational complexity of k-Reconstruction Neural Networks and how to circumvent it through subgraph sampling.

Computational complexity. As outlined in Section 2, we would need subgraphs of size almost n to obtain a most-expressive representation of graphs with k-Reconstruction Neural Networks. This would imply performing isomorphism testing on arbitrarily large graphs, as in Bouritsas et al. (2020), making the model computationally infeasible.

A graph with n vertices has $\binom{n}{k}$ induced subgraphs of size k. Let T(k) be an upper bound on the cost of computing the most-expressive subgraph representation Γ on a k-vertex graph. Then, computing the full k-Reconstruction representation takes $O\big(\binom{n}{k}\, T(k)\big)$ time. Although Babai (2016) has shown how to do isomorphism testing in quasi-polynomial time, an efficient (polynomial-time) algorithm remains unknown. More generally, expressive representations of graphs (Keriven and Peyré, 2019; Murphy et al., 2019b) and isomorphism-class hashing algorithms (Junttila and Kaski, 2007) still require exponential time in the graph size. Thus, if we choose a small value of k relative to n, the $\binom{n}{k}$ factor dominates, while if we choose k close to n, the T(k) factor dominates. In both cases, the time complexity is exponential in n.

D.2 Relation to previous work

Recently, Bouritsas et al. (2020) proposed using subgraph isomorphism type counts as features of vertices and edges in a GNN architecture. The authors comment that, if the reconstruction conjecture holds, their architecture is most expressive for graphs without edge attributes. Here, we point out two things. First, their architecture is at least as powerful as reconstruction from vertex-deleted subgraphs. Second, the reconstruction conjecture does not hold for directed graphs; since edge directions can be seen as edge attributes, their architecture is not most expressive for graphs with attributed edges. Finally, to make their architecture scalable in practice, the authors choose only specific hand-engineered subgraph types, which makes the model incomparable to k-reconstruction.

D.3 Proof of Proposition 1

We start by giving a more formal statement of Proposition 1.

Let g be a continuous function over a compact set of graph inputs, and consider the uniform (sup) norm. Proposition 1 states that for every ε > 0 there exist parameters θ such that the k-Reconstruction Neural Network is within ε of g in sup norm if and only if g is k-reconstructible.

Proof.

Since Γ is required to be most expressive, we can see the input of k-Reconstruction Neural Networks as a multiset of unique identifiers of isomorphism types. Thus, it follows from Definition 4 that any function approximated in this way is k-reconstructible. The other direction, i.e., that a function can be approximated if it is k-reconstructible, follows from the Stone–Weierstrass theorem, see Zaheer et al. (2017).∎

Appendix E More on Full Reconstruction Neural Networks

The following result captures the expressive power of Full Reconstruction Neural Networks.

Proposition 4.

If the aggregation functions are universal approximators of multisets (Murphy et al., 2019a; Wagstaff et al., 2019; Zaheer et al., 2017) and the Reconstruction Conjecture holds, Full Reconstruction Neural Networks can approximate a function if the function is reconstructible.

Proof.

We use induction on the subgraph size k to show that every subgraph representation in Full Reconstruction Neural Networks is a most-expressive representation if the Reconstruction Conjecture holds.

  • Base case: k = 2. It follows from the model definition that the two-vertex representation is most expressive.

  • Inductive step: k > 2. If all subgraph representations in