1 Introduction
Supervised machine learning for graph-structured data, i.e., graph classification and regression, is ubiquitous across application domains ranging from chemistry and bioinformatics (Barabasi and Oltvai, 2004; Stokes et al., 2020) to image analysis (Simonovsky and Komodakis, 2017) and social network analysis (Easley and Kleinberg, 2010). Consequently, machine learning on graphs is an active research area with numerous proposed approaches, with graph neural networks (GNNs) (Chami et al., 2020; Gilmer et al., 2017; Grohe, 2020) being the most representative case of graph representation learning (GRL) methods.

Arguably, GRL's most interesting results arise from a crossover between graph theory and representation learning. For instance, the representational limits of GNNs are upper-bounded by a simple heuristic for the graph isomorphism problem (Morris et al., 2019; Xu et al., 2019), the $1$-dimensional Weisfeiler-Leman algorithm ($1$-WL) (Grohe, 2017; Morris, 2021; Weisfeiler, 1976; Weisfeiler and Leman, 1968), which might miss crucial structural information in the data (Arvind et al., 2015). Further works show how GNNs cannot approximate graph properties such as diameter, radius, girth, and subgraph counts (Chen et al., 2020; Garg et al., 2020), inspiring architectures (Azizian and Lelarge, 2020; Maron et al., 2019a; Morris et al., 2019, 2020b) based on the more powerful $\ell$-dimensional Weisfeiler-Leman algorithm ($\ell$-WL) (Grohe, 2017). (Footnote 1: We opt for using $\ell$ instead of $k$, i.e., $\ell$-WL instead of $k$-WL, to not confuse the reader with the hyperparameter $k$ of our models.) On the other hand, despite the limited expressiveness of GNNs, they can still overfit the training data, offering limited generalization performance (Xu et al., 2019). Hence, devising GRL architectures that are simultaneously sufficiently expressive and avoid overfitting remains an open problem.

An underexplored connection between graph theory and GRL is graph reconstruction, which studies graphs and graph properties uniquely determined by their subgraphs. In this direction, both the pioneering work of Shawe-Taylor (1993) and the more recent work of Bouritsas et al. (2020) show that, assuming the reconstruction conjecture (see Conjecture 1) holds, their models are most-expressive representations (universal approximators) of graphs. Unfortunately, Shawe-Taylor's computational graph grows exponentially with the number of vertices, and Bouritsas et al.'s full representation power requires performing multiple graph isomorphism tests on potentially large graphs (with $n-1$ vertices). Moreover, these methods were not inspired by the more general subject of graph reconstruction; instead, they rely on the reconstruction conjecture to prove their architecture's expressive powers.
Contributions. In this work, we directly connect graph reconstruction to GRL. We first show how the $k$-reconstruction of graphs, i.e., reconstruction from induced $k$-vertex subgraphs, induces a natural class of expressive GRL architectures for supervised learning with graphs, denoted $k$-Reconstruction Neural Networks. We then show how several existing works have their expressive power limited by $k$-reconstruction. Further, we show how the reconstruction conjecture's insights lead to a provably most-expressive representation of graphs. Unlike Shawe-Taylor (1993) and Bouritsas et al. (2020), which, for graph tasks, require fixed-size unattributed graphs and multiple (large) graph isomorphism tests, respectively, our method represents bounded-size graphs with vertex attributes and does not rely on isomorphism tests.

To make our models scalable, we propose $k$-Reconstruction GNNs, a general tool for boosting the expressive power and performance of GNNs with graph reconstruction. Theoretically, we characterize their expressive power, showing that $k$-Reconstruction GNNs can distinguish graph classes that the $1$-WL and $2$-WL cannot, such as cycle graphs and strongly regular graphs, respectively. Further, to explain gains in real-world tasks, we show how reconstruction can act as a lower-variance risk estimator when the graph-generating distribution is invariant to vertex removals. Empirically, we show that reconstruction enhances GNNs' expressive power, allowing them to solve multiple synthetic graph property tasks in the literature that the original GNNs cannot solve. On real-world datasets, we show that the increase in expressive power coupled with the lower-variance risk estimator boosts GNNs' performance by up to 25%. Our combined theoretical and empirical results make another important connection between graph theory and GRL.
1.1 Related work
In the following, we review related work on GNNs, their limitations, data augmentation, and the reconstruction conjecture. See Appendix A for a more detailed discussion.
GNNs. Notable instances of this architecture include, e.g., (Duvenaud et al., 2015; Hamilton et al., 2017; Velickovic et al., 2018), and the spectral approaches proposed in, e.g., (Bruna et al., 2014; Defferrard et al., 2016; Kipf and Welling, 2017; Monti et al., 2017)—all of which descend from early work in (Baskin et al., 1997; Kireev, 1995; Merkwirth and Lengauer, 2005; Micheli, 2009; Micheli and Sestito, 2005; Scarselli et al., 2009; Sperduti and Starita, 1997). Aligned with the field’s recent rise in popularity, there exists a plethora of surveys on recent advances in GNN methods. Some of the most recent ones include (Chami et al., 2020; Wu et al., 2018; Zhou et al., 2018).
Limits of GNNs. Recently, connections to Weisfeiler-Leman type algorithms have been shown (Barceló et al., 2020; Chen et al., 2019c; Geerts et al., 2020; Geerts, 2020; Maehara and NT, 2019; Maron et al., 2019a; Morris et al., 2019, 2020b; Xu et al., 2019). Specifically, Morris et al. (2019) and Xu et al. (2019) show how the $1$-WL limits the expressive power of any possible GNN architecture. Morris et al. (2019) introduce $\ell$-dimensional GNNs, which rely on a more expressive message-passing scheme between subgraphs of cardinality $\ell$. Later, this was refined in (Azizian and Lelarge, 2020; Maron et al., 2019a) and in (Morris and Mutzel, 2019) by deriving models equivalent to the more powerful $\ell$-dimensional Weisfeiler-Leman algorithm. Chen et al. (2019c) connect the theory of universal approximation of permutation-invariant functions and graph isomorphism testing, further introducing a variation of the $2$-WL. Recently, a large body of work has proposed enhancements to GNNs, e.g., see Albooyeh et al. (2019); Beaini et al. (2020); Bodnar et al. (2021); Bouritsas et al. (2020); Murphy et al. (2019b); Vignac et al. (2020); You et al. (2021), making them more powerful than the $1$-WL; see Appendix A for an in-depth discussion. For clarity, throughout this work, we will use the term GNNs to denote the class of message-passing architectures limited by the $1$-WL algorithm, where the class of distinguishable graphs is well understood (Arvind et al., 2015).
Data augmentation, generalization and subgraph-based inductive biases. There exist few works proposing data augmentation for GNNs for graph classification. Kong et al. (2020) introduce a simple feature perturbation framework to achieve this, while Rong et al. (2020) and Feng et al. (2020) focus on vertex-level tasks. Garg et al. (2020) study the generalization abilities of GNNs, showing bounds on the Rademacher complexity, while Liao et al. (2020) offer a refined analysis within the PAC-Bayes framework. Recently, Bouritsas et al. (2020) proposed to use subgraph counts as vertex and edge features in GNNs. Although the authors show an increase in expressiveness, its extent, e.g., which graph classes their model can distinguish, is still mostly unclear. Moreover, Yehudai et al. (2020) investigate GNNs' ability to generalize to larger graphs. Concurrently, Bevilacqua et al. (2021) show how subgraph densities can be used to build size-invariant graph representations. However, the performance of such models in in-distribution tasks, their expressiveness, and their scalability remain unclear. Finally, Yuan et al. (2021) show how GNNs' decisions can be explained by (often large) subgraphs, further motivating our use of graph reconstruction as a powerful inductive bias for GRL.
Reconstruction conjecture. The reconstruction conjecture is a longstanding open problem in graph theory, which has been solved in many particular settings. Such results come in two flavors: either proving that graphs from a specific class are reconstructible or determining which graph functions are reconstructible. Known results of the former kind are, for instance, that regular graphs, disconnected graphs, and trees are reconstructible (Bondy, 1991; Kelly et al., 1957). In particular, we highlight that outerplanar graphs, which account for most molecule graphs, are known to be reconstructible (Giles, 1974). For a comprehensive review of graph reconstruction results, see Bondy (1991).
2 Preliminaries
Here, we introduce notation and give an overview of the main results in graph reconstruction theory (Bondy, 1991; Godsil, 1993), including the reconstruction conjecture (Ulam, 1960), which forms the basis of the models in this work.
Notation and definitions. As usual, let $[n] = \{1, \dots, n\} \subset \mathbb{N}$ for $n \geq 1$, and let $\{\!\!\{ \dots \}\!\!\}$ denote a multiset. In an abuse of notation, for a set $X$ with $x$ in $X$, we denote by $X - x$ the set $X \setminus \{x\}$. We also assume elementary definitions from graph theory, such as graphs, directed graphs, vertices, edges, neighbors, trees, isomorphism, et cetera; see Appendix B. The vertex and the edge set of a graph $G$ are denoted by $V(G)$ and $E(G)$, respectively. The size of a graph $G$ is equal to its number of vertices. Unless indicated otherwise, we use $n := |V(G)|$. If not otherwise stated, we assume that vertices and edges are annotated with attributes, i.e., real-valued vectors.
We denote the set of all finite and simple graphs by $\mathcal{G}$. The subset of $\mathcal{G}$ without edge attributes (or edge directions) is denoted $\mathfrak{G} \subset \mathcal{G}$. We write $G \simeq H$ if the graphs $G$ and $H$ are isomorphic. Further, we denote the isomorphism type, i.e., the equivalence class of the isomorphism relation, of a graph $G$ as $\mathcal{I}(G)$. Let $S \subseteq V(G)$, then $G[S]$ is the induced subgraph with edge set $E(G)[S] = S^2 \cap E(G)$. We will refer to induced subgraphs simply as subgraphs in this work.
Let $\mathcal{R}$ be a family of graph representations, such that for $d \geq 1$, $r$ in $\mathcal{R}$, $r \colon \mathcal{G} \to \mathbb{R}^{d}$, assigns a $d$-dimensional representation vector $r(G)$ for a graph $G$ in $\mathcal{G}$. We say $\mathcal{R}$ can distinguish a graph $G$ if there exists $r$ in $\mathcal{R}$ that assigns a unique representation to the isomorphism type of $G$, i.e., $r(G) = r(H)$ if and only if $G \simeq H$. Further, we say $\mathcal{R}$ distinguishes a pair of non-isomorphic graphs $G$ and $H$ if there exists some $r$ in $\mathcal{R}$ such that $r(G) \neq r(H)$. Moreover, we write $\mathcal{R}_1 \sqsubseteq \mathcal{R}_2$ if $\mathcal{R}_2$ distinguishes between all graphs $\mathcal{R}_1$ does, and $\mathcal{R}_1 \equiv \mathcal{R}_2$ if both directions hold. The corresponding strict relation is denoted by $\sqsubset$. Finally, we say $\mathcal{R}$ is a most-expressive representation of a class of graphs if it distinguishes all non-isomorphic graphs in that class.
Graph reconstruction. Intuitively, the reconstruction conjecture states that an undirected edge-unattributed graph can be fully recovered up to its isomorphism type given the multiset of its vertex-deleted subgraphs' isomorphism types. This multiset of subgraphs is usually referred to as the deck of the graph; see Figure 1 for an illustration. Formally, for a graph $G$, we define its deck as $\mathcal{D}_{n-1}(G) = \{\!\!\{ \mathcal{I}(G[V(G) \setminus \{v\}]) : v \in V(G) \}\!\!\}$. We often call an element in $\mathcal{D}_{n-1}(G)$ a card. We define the graph reconstruction problem as follows.
Definition 1.
Let $G$ and $H$ be graphs, then $H$ is a reconstruction of $G$ if $H$ and $G$ have the same deck, denoted $H \sim G$. A graph $G$ is reconstructible if every reconstruction of $G$ is isomorphic to $G$, i.e., $G \sim H$ implies $G \simeq H$.
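To make the deck concrete, the following is a minimal sketch that computes it for tiny graphs. It assumes unattributed, undirected graphs given as vertex and edge lists, and it obtains isomorphism types by brute force over all relabelings, which is only feasible for a handful of vertices. The helper names `iso_type` and `deck` are illustrative and do not come from the paper.

```python
from collections import Counter
from itertools import permutations


def iso_type(vertices, edges):
    """Brute-force isomorphism type: the smallest sorted edge list over all
    relabelings of the vertices (only practical for very small graphs)."""
    vertices = list(vertices)
    best = None
    for perm in permutations(range(len(vertices))):
        relabel = dict(zip(vertices, perm))
        key = tuple(sorted(tuple(sorted((relabel[u], relabel[v]))) for u, v in edges))
        if best is None or key < best:
            best = key
    return (len(vertices), best)


def deck(vertices, edges):
    """Multiset of isomorphism types of all vertex-deleted subgraphs."""
    cards = Counter()
    for removed in vertices:
        rest = [v for v in vertices if v != removed]
        kept = [(u, v) for u, v in edges if removed not in (u, v)]
        cards[iso_type(rest, kept)] += 1
    return cards


# A path on four vertices, the same path with shuffled labels, and a star.
path = ([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3)])
relabeled_path = ([0, 1, 2, 3], [(2, 0), (0, 3), (3, 1)])
star = ([0, 1, 2, 3], [(0, 1), (0, 2), (0, 3)])

same_deck = deck(*path) == deck(*relabeled_path)
different_deck = deck(*path) != deck(*star)
```

Isomorphic graphs necessarily share a deck, while the path and the star are told apart by their decks, in line with the fact that trees are reconstructible.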
Similarly, we define function reconstruction, which relates functions that map two graphs to the same value if they have the same deck.
Definition 2.
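Let $f \colon \mathcal{G} \to \mathcal{Y}$ be a function, then $f$ is reconstructible if $f(G) = f(H)$ for all graphs in $\{(H, G) \in \mathcal{G}^2 : G \sim H\}$, i.e., $G \sim H$ implies $f(G) = f(H)$.
We can now state the reconstruction conjecture, which in short says that every $G$ in $\mathfrak{G}$ with $|V(G)| \geq 3$ is reconstructible.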
Conjecture 1 (Kelly (1942); Ulam (1960)).
Let $H$ and $G$ in $\mathfrak{G}$ be two finite, undirected, simple graphs with at least three vertices. If $H$ is a reconstruction of $G$, then $H$ and $G$ are isomorphic.
We note here that the reconstruction conjecture does not hold for directed graphs, hypergraphs, and infinite graphs (Bondy, 1991; Stockmeyer, 1977, 1981). In particular, edge directions can be seen as edge attributes. Thus, the reconstruction conjecture does not hold for the class $\mathcal{G}$. In contrast, the conjecture has been proved for practically relevant graph classes, such as disconnected graphs, regular graphs, trees, and outerplanar graphs (Bondy, 1991). Further, computational searches show that graphs with up to 11 vertices are reconstructible (McKay, 1997). Finally, many graph properties are known to be reconstructible, such as every size-$k$ subgraph count for $k < n$, the degree sequence, the number of edges, and the characteristic polynomial (Bondy, 1991).
Graph $k$-reconstruction. Kelly et al. (1957) generalized graph reconstruction, considering the multiset of subgraphs of size $k$ instead of $n-1$, which we denote $\mathcal{D}_k(G) = \{\!\!\{ \mathcal{I}(G[S]) : S \in \mathcal{S}^{(k)} \}\!\!\}$, where $\mathcal{S}^{(k)}$ is the set of all size-$k$ subsets of $V(G)$. We often call an element in $\mathcal{D}_k(G)$ a $k$-card. From the $k$-deck definition, it is easy to extend the concept of graph and function reconstruction, cf. Definitions 1 and 2, to graph and function $k$-reconstruction.
Definition 3.
Let $G$ and $H$ be graphs, then $H$ is a $k$-reconstruction of $G$ if $H$ and $G$ have the same $k$-deck, denoted $H \sim_k G$. A graph $G$ is $k$-reconstructible if every $k$-reconstruction of $G$ is isomorphic to $G$, i.e., $G \sim_k H$ implies $G \simeq H$.
Accordingly, we define function $k$-reconstruction as follows.
Definition 4.
Let $f \colon \mathcal{G} \to \mathcal{Y}$ be a function, then $f$ is $k$-reconstructible if $f(G) = f(H)$ for all graphs in $\{(H, G) \in \mathcal{G}^2 : G \sim_k H\}$, i.e., $G \sim_k H$ implies $f(G) = f(H)$.
Results for $k$-reconstruction usually state the least $k$ as a function of $n$ such that all graphs $G$ in $\mathcal{G}$ (or some subset) are $k$-reconstructible (Nỳdl, 2001). There exist extensive partial results in this direction, mostly describing $k$-reconstructibility (as a function of $n$) for a particular family of graphs, such as trees, disconnected graphs, complete multipartite graphs, and paths; see (Nỳdl, 2001; Kostochka and West, 2020). More concretely, Nỳdl (1981) and Spinoza and West (2019) exhibited graphs with $2k$ vertices that are not $k$-reconstructible. In practice, these results imply that for some fixed $k$ there will be graphs with not many more vertices than $k$ that are not $k$-reconstructible. Further, $k$-reconstructible graph functions such as degree sequence and connectedness have been studied in (Manvel, 1974; Spinoza and West, 2019) depending on the size of $k$. In Appendix C, we discuss further such results.
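A small computation illustrates why small decks can fail to identify a graph. The sketch below compares a path on six vertices with a 4-cycle plus a disjoint edge (one pair of non-isomorphic graphs on $2k$ vertices for $k = 3$; the choice of this pair and the helper names are assumptions made here for illustration). Isomorphism types are again obtained by brute force, so this is only meant for tiny graphs.

```python
from collections import Counter
from itertools import combinations, permutations


def iso_type(vertices, edges):
    """Brute-force isomorphism type via the smallest relabeled edge list."""
    vertices = list(vertices)
    best = None
    for perm in permutations(range(len(vertices))):
        relabel = dict(zip(vertices, perm))
        key = tuple(sorted(tuple(sorted((relabel[u], relabel[v]))) for u, v in edges))
        if best is None or key < best:
            best = key
    return (len(vertices), best)


def k_deck(n, edges, k):
    """Multiset of isomorphism types of all subgraphs induced by k vertices."""
    cards = Counter()
    for subset in combinations(range(n), k):
        inside = set(subset)
        induced = [(u, v) for u, v in edges if u in inside and v in inside]
        cards[iso_type(subset, induced)] += 1
    return cards


path6 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
cycle4_plus_edge = [(0, 1), (1, 2), (2, 3), (3, 0), (4, 5)]

same_3_deck = k_deck(6, path6, 3) == k_deck(6, cycle4_plus_edge, 3)
same_4_deck = k_deck(6, path6, 4) == k_deck(6, cycle4_plus_edge, 4)
```

The two graphs agree on every 3-card (four induced paths, twelve single edges with an isolated vertex, four empty triples), so neither is 3-reconstructible, whereas the 4-decks differ because only one of them contains an induced 4-cycle.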
3 Reconstruction Neural Networks
Building on the previous section, we propose two neural architectures based on graph $k$-reconstruction and graph reconstruction. First, we look at $k$-Reconstruction Neural Networks, the most natural way to use graph $k$-reconstruction. Secondly, we look at Full Reconstruction Neural Networks, where we leverage the reconstruction conjecture to build a most-expressive representation for the class of graphs of bounded size and unattributed edges.
$k$-Reconstruction Neural Networks. Intuitively, the key idea of $k$-Reconstruction Neural Networks is that of learning a joint representation based on subgraphs induced by $k$ vertices. Formally, let $f^{(W)}$ be a (row-wise) permutation-invariant function and $\mathcal{G}_k$ be the set of graphs with exactly $k$ vertices. Further, let $h^{(k)}$ be a graph representation function such that two graphs $G$ and $H$ on $k$ vertices are mapped to the same vectorial representation if and only if they are isomorphic, i.e., $h^{(k)}(G) = h^{(k)}(H) \iff G \simeq H$ for all $G$ and $H$ in $\mathcal{G}_k$. We define $k$-Reconstruction Neural Networks over $\mathcal{G}$ as a function with parameters $W$ in the form
$$r_k^{(W)}(G) = f^{(W)}\Big(\text{Concat}\big(\{\!\!\{ h^{(k)}(G[S]) : S \in \mathcal{S}^{(k)} \}\!\!\}\big)\Big),$$
where $\mathcal{S}^{(k)}$ is the set of all size-$k$ subsets of $V(G)$ for some $k < n$, and Concat denotes row-wise concatenation of a multiset of vectors in some arbitrary order. Note that $h^{(k)}$ might also be a function with learnable parameters. In that case, we require it to be most-expressive for $\mathcal{G}_k$. The following results characterize the expressive power of the above architecture.
Proposition 1.
Moreover, we can observe the following.
Observation 1 (Nỳdl (2001); Kostochka and West (2020)).
For any graph $G$ in $\mathcal{G}$, its $k$-deck $\mathcal{D}_k(G)$ determines its $(k-1)$-deck $\mathcal{D}_{k-1}(G)$.
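One way to see why this holds is a double-counting argument, sketched here as an illustration and not necessarily the proof given in the cited works. Write $\mathcal{D}_k(G)$ for the $k$-deck of a graph $G$ on $n$ vertices, $\uplus$ for multiset union, and $c \cdot M$ for the multiset $M$ with all multiplicities scaled by $c$. Every $(k-1)$-subset of $V(G)$ is contained in exactly $n-k+1$ subsets of size $k$, so taking the $(k-1)$-deck of every $k$-card counts each $(k-1)$-card exactly $n-k+1$ times:

```latex
\biguplus_{C \in \mathcal{D}_k(G)} \mathcal{D}_{k-1}(C)
  \;=\; (n-k+1) \cdot \mathcal{D}_{k-1}(G),
\qquad 2 \le k \le n .
```

Since $|\mathcal{D}_k(G)| = \binom{n}{k}$ and $k$ is known from the size of the cards, $n$ itself can be read off the $k$-deck, so the right-hand side can be recovered from $\mathcal{D}_k(G)$ alone.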
From Observation 1, we can derive a hierarchy in the expressive power of $k$-Reconstruction Neural Networks with respect to the subgraph size $k$. That is, $k$-Reconstruction Neural Networks $\sqsubseteq$ $(k+1)$-Reconstruction Neural Networks $\sqsubseteq \dots \sqsubseteq$ $(n-1)$-Reconstruction Neural Networks.
In Appendix D, we show how many existing architectures have their expressive power limited by $k$-reconstruction. We also refer to Appendix D for the proofs, a discussion of the model's computational complexity, approximation methods, and the relation to existing work.
Full Reconstruction Neural Networks. Here, we propose a recursive scheme based on the reconstruction conjecture to build a most-expressive representation for graphs. Intuitively, Full Reconstruction Neural Networks recursively compute subgraph representations based on smaller subgraph representations. Formally, let $\mathcal{G}^{\dagger}_{n^*}$ be the class of undirected graphs with unattributed edges and maximum size $n^*$. Further, let $f^{(W)}$ be a (row-wise) permutation-invariant function and let $h_{\{i,j\}}$ be a most-expressive representation of the two-vertex subgraph induced by vertices $i$ and $j$. We can now define the representation $r(G) := r(G[V(G)])$ of a graph $G$ in $\mathcal{G}^{\dagger}_{n^*}$ in a recursive fashion as
$$r(G[S]) = \begin{cases} f^{(W)}\Big(\text{Concat}\big(\{\!\!\{ r(G[S \setminus \{v\}]) : v \in S \}\!\!\}\big)\Big), & \text{if } |S| > 2, \\ h_{\{i,j\}}, & \text{if } S = \{i, j\}. \end{cases}$$
Again, Concat is row-wise concatenation in some arbitrary order. Note that in practice, it is easier to build the subgraph representations in a bottom-up fashion. First, use two-vertex subgraph representations to compute all three-vertex subgraph representations. Then, perform this inductively until we arrive at a single whole-graph representation. In Appendix E, we prove the expressive power of Full Reconstruction Neural Networks, i.e., we show that, if the reconstruction conjecture holds, it is a most-expressive representation of undirected edge-unattributed graphs. Finally, we show its quadratic number of parameters, exponential computational complexity, and relation to existing work.
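The recursion can be sketched without any learning by replacing the permutation-invariant function with an injective multiset encoding (sorting the child representations) and the two-vertex representation with an edge indicator. Both substitutions, as well as the function name `full_reconstruction_repr`, are assumptions made for this illustration; the sketch is not the parametrized model itself.

```python
from functools import lru_cache


def full_reconstruction_repr(n, edges):
    """Recursive representation of an undirected, unattributed graph on
    vertices 0..n-1 (n >= 2), built from all vertex subsets of size >= 2."""
    edge_set = {frozenset(e) for e in edges}

    @lru_cache(maxsize=None)
    def rep(subset):
        if len(subset) == 2:
            # Base case: a two-vertex subgraph is either an edge or a non-edge.
            return ("pair", subset in edge_set)
        # Stand-in for f(Concat(...)): sorting makes the result independent
        # of the order in which the vertex-deleted subgraphs are visited.
        children = sorted(rep(subset - {v}) for v in subset)
        return (len(subset), tuple(children))

    return rep(frozenset(range(n)))


path = [(0, 1), (1, 2), (2, 3)]
relabeled_path = [(2, 0), (0, 3), (3, 1)]
star = [(0, 1), (0, 2), (0, 3)]

path_repr = full_reconstruction_repr(4, path)
relabeled_repr = full_reconstruction_repr(4, relabeled_path)
star_repr = full_reconstruction_repr(4, star)
```

Memoization corresponds to the bottom-up construction described above. The cache holds one entry per vertex subset of size at least two, which makes the exponential cost of the scheme visible.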
4 Reconstruction Graph Neural Networks
Although Full Reconstruction Neural Networks provide a most-expressive representation for undirected, unattributed-edge graphs, they are impractical due to their computational cost. Similarly, $k$-Reconstruction Neural Networks are not scalable since increasing their expressive power requires computing most-expressive representations of larger $k$-size subgraphs. Hence, to circumvent the computational cost, we replace the most-expressive representations of subgraphs from $k$-Reconstruction Neural Networks with GNN representations, resulting in what we name $k$-Reconstruction GNNs. This change allows for scaling the model to larger subgraph sizes, such as $n-1$, $n-2$, and so on.
Since, in the general case, graph $k$-reconstruction assumes most-expressive representations of subgraphs, it cannot capture $k$-Reconstruction GNNs' expressive power directly. Hence, we provide a theoretical characterization of the expressive power of $k$-Reconstruction GNNs by coupling graph reconstruction and the GNN expressive power characterization based on the $1$-WL algorithm. Nevertheless, in Section F.2, we devise conditions under which $k$-Reconstruction GNNs have the same power as $k$-Reconstruction Neural Networks. Finally, we show how graph reconstruction can act as a (provably) powerful inductive bias for invariances to vertex removals, which boosts the performance of GNNs even in tasks where all graphs are already distinguishable by them (see Appendix G). We refer to Appendix F for a discussion of the model's relation to existing work.
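Formally, let $f^{(W_1)}$ be a (row-wise) permutation-invariant function and $h^{(W_2)}_{\text{GNN}}$ a GNN representation. Then, for $k < n$, a $k$-Reconstruction GNN takes the form
$$r_k^{(W)}(G) = f^{(W_1)}\Big(\text{Concat}\big(\{\!\!\{ h^{(W_2)}_{\text{GNN}}(G[S]) : S \in \mathcal{S}^{(k)} \}\!\!\}\big)\Big),$$
with parameters $W = \{W_1, W_2\}$, where $\mathcal{S}^{(k)}$ is the set of all size-$k$ subsets of $V(G)$, and Concat is row-wise concatenation in some arbitrary order.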
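The structure of this architecture can be sketched in a few lines. The example below is a simplification under stated assumptions: the GNN is replaced by a fixed, hand-written subgraph readout (edge count and sum of squared degrees), the permutation-invariant function is a plain mean, and no parameters are learned. The optional `num_samples` argument, a hypothetical name, replaces the full enumeration of size-$k$ subsets by uniformly drawn subsets, which gives an unbiased estimate of the mean.

```python
import random
from itertools import combinations


def subgraph_features(subset, edges):
    """Stand-in for a GNN readout on G[S]: (edge count, sum of squared degrees)."""
    inside = set(subset)
    degree = {v: 0 for v in subset}
    num_edges = 0
    for u, v in edges:
        if u in inside and v in inside:
            degree[u] += 1
            degree[v] += 1
            num_edges += 1
    return (num_edges, sum(d * d for d in degree.values()))


def reconstruction_readout(n, edges, k, num_samples=None, seed=0):
    """Mean of subgraph features over all size-k vertex subsets, or over
    num_samples subsets drawn uniformly at random when num_samples is set."""
    if num_samples is None:
        subsets = list(combinations(range(n), k))
    else:
        rng = random.Random(seed)
        subsets = [tuple(rng.sample(range(n), k)) for _ in range(num_samples)]
    feats = [subgraph_features(s, edges) for s in subsets]
    return tuple(sum(column) / len(feats) for column in zip(*feats))


cycle6 = [(i, (i + 1) % 6) for i in range(6)]
exact = reconstruction_readout(6, cycle6, k=5)
sampled = reconstruction_readout(6, cycle6, k=5, num_samples=3)
```

On a cycle every vertex-deleted subgraph is the same path, so the sampled and the exact readouts coincide here; on less symmetric graphs the sampled value only matches the exact one in expectation.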
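Approximating $r_k^{(W)}$. By design, $k$-Reconstruction GNNs require computing GNN representations for all $k$-vertex subgraphs, which might not be feasible for large graphs or datasets. To address this, we discuss a direction to circumvent computing all subgraphs, i.e., approximating $r_k^{(W)}$ by sampling.
One possible choice for $f^{(W_1)}$ is Deep Sets (Zaheer et al., 2017), which we use for the experiments in Section 5, where the representation is a sum decomposition taking the form $r_k^{(W)}(G) = \rho\Big(\frac{1}{|\mathcal{S}^{(k)}|} \sum_{S \in \mathcal{S}^{(k)}} \phi\big(h^{(W_2)}_{\text{GNN}}(G[S])\big)\Big)$, where $\rho$ and $\phi$ are permutation-sensitive functions, such as feed-forward networks. We can learn the $k$-Reconstruction GNN model over a training dataset $\mathcal{D}^{(\text{tr})} = \{(G_i, y_i)\}_{i=1}^{N}$ and a loss function $\ell$ by minimizing the empirical risk
$$\widehat{R}_k\big(\mathcal{D}^{(\text{tr})}; W\big) = \frac{1}{N} \sum_{i=1}^{N} \ell\big(r_k^{(W)}(G_i), y_i\big). \quad (1)$$
Equation 1 is impractical for all but the smallest graphs, since $r_k^{(W)}(G)$ is a sum over all $k$-vertex induced subgraphs of $G$. Hence, we approximate $r_k^{(W)}(G)$ using a sample $\mathcal{S}^{(k)}_B \subset \mathcal{S}^{(k)}$ drawn uniformly at random at every gradient step, i.e., $\widehat{r}_k^{(W)}(G) = \rho\Big(\frac{1}{|\mathcal{S}^{(k)}_B|} \sum_{S \in \mathcal{S}^{(k)}_B} \phi\big(h^{(W_2)}_{\text{GNN}}(G[S])\big)\Big)$. Due to non-linearities in $\rho$ and $\ell$, plugging $\widehat{r}_k^{(W)}$ into Equation 1 does not provide us with an unbiased estimate of $\widehat{R}_k$. However, if $\ell(\rho(a), y)$ is convex in $a$, in expectation we will be minimizing a proper upper bound of our loss, i.e., by Jensen's inequality, $\frac{1}{N} \sum_{i=1}^{N} \ell\big(r_k^{(W)}(G_i), y_i\big) \leq \frac{1}{N} \sum_{i=1}^{N} \mathbb{E}\big[\ell\big(\widehat{r}_k^{(W)}(G_i), y_i\big)\big]$. In practice, many models rely on this approximation and provide scalable and reliable training procedures, cf. (Murphy et al., 2019a, b; Zaheer et al., 2017; Hinton et al., 2012).
4.1 Expressive power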
Now, we analyze the expressive power of $k$-Reconstruction GNNs. It is clear that $k$-Reconstruction GNNs $\sqsubseteq$ $k$-Reconstruction Neural Networks; however, the relationship between $k$-Reconstruction GNNs and GNNs is not as straightforward. At first, one might expect a well-defined hierarchy, such as the one for $k$-Reconstruction Neural Networks (see Observation 1), between GNNs, $(n-1)$-Reconstruction GNNs, $(n-2)$-Reconstruction GNNs, and so on. However, there is no such hierarchy, as we see next.
Are GNNs more expressive than $k$-Reconstruction GNNs? It is well-known that GNNs cannot distinguish regular graphs (Arvind et al., 2015; Morris et al., 2019). By leveraging the fact that regular graphs are reconstructible (Kelly et al., 1957), we show that cycles and circular skip link (CSL) graphs, two classes of regular graphs, can indeed be distinguished by $k$-Reconstruction GNNs, implying that $k$-Reconstruction GNNs are not less expressive than GNNs. We start by showing that $k$-Reconstruction GNNs can distinguish the class of cycle graphs.
Theorem 1 (Reconstruction GNNs can distinguish cycles).
Let be a cycle graph with vertices and . An Reconstruction GNN assigns a unique representation to if
i) and ii) hold.
The following result shows that $k$-Reconstruction GNNs can distinguish the class of CSL graphs.
Theorem 2 (Reconstruction GNNs can distinguish CSL graphs).
Hence, if the conditions in Theorem 1 hold, GNNs $\not\sqsupseteq$ $k$-Reconstruction GNNs. Figure 2 (cf. Appendix F) depicts how $k$-Reconstruction GNNs can distinguish a graph that GNNs cannot. The process essentially breaks the local symmetries that make GNNs struggle by removing one (or a few) vertices from the graph. By doing so, we arrive at distinguishable subgraphs. Since we can reconstruct the original graph from its unique subgraph representations, we can identify it. See Appendix F for the complete proofs of Theorems 1 and 2.
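The symmetry-breaking effect can be reproduced on a toy pair of 2-regular graphs. The sketch below uses 1-WL colour refinement as a stand-in for a GNN, motivated by the 1-WL upper bound on GNN expressiveness discussed earlier, and compares a 6-cycle with two disjoint triangles. The graph pair, the helper names, and the use of colour histograms instead of learned representations are assumptions made for this illustration.

```python
from collections import Counter


def wl_histogram(vertices, edges):
    """Histogram of vertex colours after 1-WL colour refinement, starting
    from uniform colours and running one round per vertex."""
    vertices = list(vertices)
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    colors = {v: () for v in vertices}
    for _ in range(len(vertices)):
        colors = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in vertices
        }
    return tuple(sorted(Counter(colors.values()).items()))


def deck_of_wl_histograms(vertices, edges):
    """Multiset of 1-WL histograms of all vertex-deleted subgraphs."""
    cards = Counter()
    for removed in vertices:
        rest = [v for v in vertices if v != removed]
        kept = [(u, v) for u, v in edges if removed not in (u, v)]
        cards[wl_histogram(rest, kept)] += 1
    return cards


V = list(range(6))
hexagon = [(i, (i + 1) % 6) for i in range(6)]
two_triangles = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]

wl_alone_separates = wl_histogram(V, hexagon) != wl_histogram(V, two_triangles)
wl_on_deck_separates = (
    deck_of_wl_histograms(V, hexagon) != deck_of_wl_histograms(V, two_triangles)
)
```

Colour refinement alone assigns both graphs the same histogram, since every vertex has degree two. After deleting one vertex, the 6-cycle becomes a path while the two triangles become a triangle plus an edge, and colour refinement separates these cards after two rounds.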
Are GNNs less expressive than $k$-Reconstruction GNNs? We now show, in Proposition 2, that there exist graphs that GNNs can distinguish but $k$-Reconstruction GNNs with small $k$ cannot.
Proposition 2.
GNNs Reconstruction GNNs for .
On the other hand, the analysis is more interesting for larger subgraph sizes, e.g., $n-1$, where there are no known examples of (undirected, edge-unattributed) non-reconstructible graphs. There are graphs distinguishable by GNNs with at least one subgraph not distinguishable by them; see Appendix F. However, the relevant question is whether the multiset of all subgraphs' representations can distinguish the original graph. Since we could not find any counterexamples, we conjecture that every graph distinguishable by a GNN is also distinguishable by a $k$-Reconstruction GNN with $k = n-1$, or possibly more generally with any $k$ close enough to $n$. In Appendix F, we state and discuss the conjecture, which we name the WL reconstruction conjecture. If true, the conjecture implies GNNs $\sqsubset$ $(n-1)$-Reconstruction GNNs. Moreover, if we use the original GNN representation together with $k$-Reconstruction GNNs, Theorems 1 and 2 imply that the resulting model is strictly more powerful than the original GNN.
Are $k$-Reconstruction GNNs less expressive than higher-order ($\ell$-WL) GNNs?
Recently, a line of work, e.g., Azizian and Lelarge (2020); Maron et al. (2019b); Morris and Mutzel (2019), explored higher-order GNNs aligning with the $\ell$-WL hierarchy. Such architectures have, in principle, the same power as the $\ell$-WL algorithm in distinguishing non-isomorphic graphs. Hence, one might wonder how $k$-Reconstruction GNNs stack up against $\ell$-WL-based algorithms. The following result shows that pairs of non-isomorphic graphs exist that a $k$-Reconstruction GNN can distinguish but the $2$-WL cannot.
Proposition 3.
Let GNNs be neural architectures with the same expressiveness as the WL algorithm. Then, .
As a result of Proposition 3, using a $k$-Reconstruction GNN representation together with a $2$-GNN increases the original $2$-GNN's expressive power.
4.2 Reconstruction as a powerful extra invariance for general graphs
An essential feature of modern machine learning models is capturing invariances of the problem of interest (Lyle et al., 2020). Doing so reduces degrees of freedom while allowing for better generalization (Bloem-Reddy and Teh, 2020; Lyle et al., 2020). GRL is predicated on invariance to vertex permutations, i.e., assigning the same representation to isomorphic graphs. But are there other invariances that could improve generalization error?

$k$-reconstruction is an extra invariance. Let $P(G, Y)$ be the joint probability of observing a graph $G$ with label $Y$. Any $k$-reconstruction-based model, such as $k$-Reconstruction Neural Networks and $k$-Reconstruction GNNs, by definition assumes $P(G, Y)$ to be invariant to the $k$-deck, i.e., $P(G, Y) = P(H, Y)$ if $\mathcal{D}_k(G) = \mathcal{D}_k(H)$. Hence, our neural architectures for $k$-Reconstruction Neural Networks and $k$-Reconstruction GNNs directly define this extra invariance beyond permutation invariance. How do we know it is an extra invariance and not a consequence of permutation invariance? It does not hold on directed graphs (Stockmeyer, 1981), where permutation invariance still holds.

Hereditary property variance reduction. We now show that the invariance imposed by $k$-reconstruction helps in tasks based on hereditary properties (Borowiecki et al., 1997). A graph property $\mu(G)$ is called hereditary if it is invariant to vertex removals, i.e., $\mu(G) = \mu(G[V(G) \setminus \{v\}])$ for every $v \in V(G)$ and $G \in \mathcal{G}$. By induction, the property is invariant to every size-$k$ subgraph, i.e., $\mu(G) = \mu(G[S])$ for every $S \in \mathcal{S}^{(k)}$, $k \in [n]$, where $\mathcal{S}^{(k)}$ is the set of all size-$k$ subsets of $V(G)$. Here, the property is invariant to any given subgraph. For example, every subgraph of a planar graph is also planar, every subgraph of an acyclic graph is also acyclic, and every subgraph of a $k$-colorable graph is also $k$-colorable. A more practically interesting (weaker) invariance would be invariance to a few vertex removals. Next, we define $\delta$-hereditary properties (a special case of a hereditary property). In short, a property is $\delta$-hereditary if it is a hereditary property for graphs with more than $\delta$ vertices.
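Definition 5 ($\delta$-hereditary property).
A graph property $\mu$ is said to be $\delta$-hereditary if $\mu(G) = \mu(G[V(G) \setminus \{v\}])$ for every $v \in V(G)$ and every $G \in \mathcal{G}$ with $|V(G)| > \delta$. That is, $\mu$ is uniform in $G$ and all subgraphs of $G$ with more than $\delta$ vertices.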
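The plain hereditary case is easy to probe by brute force on small graphs. The sketch below checks whether a property takes the same value on a graph and on all of its vertex-deleted subgraphs, using acyclicity as the example property. The helper names are illustrative, and the check covers a single graph only, not a whole class.

```python
def is_forest(vertices, edges):
    """True if the undirected graph has no cycle (union-find check)."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False  # this edge closes a cycle
        parent[root_u] = root_v
    return True


def unchanged_by_vertex_removal(prop, vertices, edges):
    """True if prop has the same value on G and on every vertex-deleted subgraph."""
    value = prop(vertices, edges)
    for removed in vertices:
        rest = [v for v in vertices if v != removed]
        kept = [(u, v) for u, v in edges if removed not in (u, v)]
        if prop(rest, kept) != value:
            return False
    return True


star = (list(range(5)), [(0, i) for i in range(1, 5)])
square = (list(range(4)), [(0, 1), (1, 2), (2, 3), (3, 0)])

star_invariant = unchanged_by_vertex_removal(is_forest, *star)
square_invariant = unchanged_by_vertex_removal(is_forest, *square)
```

For the star, acyclicity survives every vertex removal. For the 4-cycle the value flips once any vertex is removed, which shows that the invariance is a statement about the graphs on which the property holds.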
Consider the task of predicting $Y = \mu(G)$ for a $\delta$-hereditary property $\mu$. Theorem 3 shows that the invariance imposed by $k$-Reconstruction GNNs reduces the variance of the empirical risk associated with $\delta$-hereditary property tasks. See Appendix F for the proof.
Theorem 3 (Reconstruction GNNs for variance reduction of hereditary tasks).
Let be a hereditary distribution, i.e., where is a hereditary property. Further, let for all with , . Then, for Reconstruction GNNs taking the form , if is convex in , we have
where is the empirical risk of Reconstruction GNNs with (cf. Equation 1) and is the empirical risk of GNNs.
5 Experimental Evaluation
In this section, we investigate the benefits of $k$-Reconstruction GNNs against GNN baselines on both synthetic and real-world tasks. Concretely, we address the following questions:
Q1. Does the increase in expressive power from reconstruction (cf. Section 4.1) make $k$-Reconstruction GNNs solve graph property tasks not originally solvable by GNNs?
Q2. Can reconstruction boost the original GNNs' performance on real-world tasks? If so, why?
Q3. What is the influence of the subgraph size $k$ in both graph property and real-world tasks?
Synthetic graph property datasets. For Q1 and Q3, we chose the synthetic graph property tasks in Table 1, which GNNs are provably incapable of solving due to their limited expressive power (Garg et al., 2020; Murphy et al., 2019c). The tasks are csl (Dwivedi et al., 2020), where we classify CSL graphs, the cycle detection tasks 4 cycles, 6 cycles and 8 cycles (Vignac et al., 2020), and the multi-task regression from Corso et al. (2020), where we want to determine whether a graph is connected, its diameter, and its spectral radius. See Appendix H for dataset statistics.

Real-world datasets. To address Q2 and Q3, we evaluated $k$-Reconstruction GNNs on a diverse set of large-scale, standard benchmark instances (Hu et al., 2020; Morris et al., 2020a). Specifically, we used the zinc (10K) (Dwivedi et al., 2020), alchemy (10K) (Chen et al., 2019a), ogbg-molfreesolv, ogbg-molesol, and ogbg-mollipo (Hu et al., 2020) regression datasets. For graph classification, we used ogbg-molhiv, ogbg-molpcba, ogbg-moltox21, and ogbg-moltoxcast (Hu et al., 2020). See Appendix H for dataset statistics.
Neural architectures. We used the GIN (Xu et al., 2018), GCN (Kipf and Welling, 2017), and PNA (Corso et al., 2020) architectures as GNN baselines. We always replicated the exact architectures from the original papers, building on the respective PyTorch Geometric implementations (Fey and Lenssen, 2019). For the ogbg regression datasets, we noticed that using a jumping knowledge layer yields better validation and test results for GIN and GCN, so we made this small change. For each of these three architectures, we implemented $k$-Reconstruction GNNs for several values of the subgraph size $k$, using a Deep Sets function (Zaheer et al., 2017) over the exact same original GNN architecture. For more details, see Appendix G.

Experimental setup. To establish fair comparisons, we retain all hyperparameters and training procedures from the original GNNs to train the corresponding $k$-Reconstruction GNNs. Tables 1 and 2 and Table 6 in Appendix I present results with the same number of runs as previous work (Corso et al., 2020; Dwivedi et al., 2020; Hu et al., 2020; Morris et al., 2020b; Vignac et al., 2020), i.e., five for all datasets except the ogbg datasets, where we use ten runs. For more details, such as the number of subgraphs sampled for each $k$-Reconstruction GNN and each dataset, see Appendix G.
Non-GNN baselines. For the graph property tasks, original work used vertex identifiers or Laplacian embeddings to make GNNs solve them. This trick is effective for the tasks but violates an important premise of graph representations, namely invariance to vertex permutations. To illustrate this line of work, we compare against Positional GIN, which uses Laplacian embeddings (Dwivedi et al., 2020) for the csl task and vertex identifiers for the others (Vignac et al., 2020; Corso et al., 2020). To compare against other methods that, like $k$-Reconstruction GNNs, are invariant to vertex permutations and increase the expressive power of GNNs, we compare against Ring-GNNs (Chen et al., 2019c) and (3-WL) PPGNs (Maron et al., 2019a). For real-world tasks, Table 6 in Appendix I shows the results from GRL alternatives that incorporate higher-order representations in different ways: LRP (Chen et al., 2019c), GSN (Bouritsas et al., 2020), 2LGNN (Morris et al., 2020b), and SMP (Vignac et al., 2020).
All results are fully reproducible from the source and are available at https://github.com/PurdueMINDS/reconstructiongnns.
Results and discussion.
Model  |  csl (Accuracy %)  |  4 cycles (Accuracy %)  |  6 cycles (Accuracy %)  |  8 cycles (Accuracy %)  |  Multitask: connectivity (MSE)  |  Multitask: diameter (MSE)  |  Multitask: spectral radius (MSE)  |  Invariant to vertex permutations?
GIN (orig.)  |  4.66 ± 4.00  |  93.0  |  92.7  |  92.5  |  3.419 ± 0.320  |  0.588 ± 0.354  |  2.130 ± 1.396  |
Reconstr. (GIN)  |  88.66 ± 22.66  |  95.17 ± 4.91  |  97.35 ± 0.74  |  94.69 ± 2.34  |  3.575 ± 0.395  |  0.195 ± 0.714  |  2.732 ± 0.793  |
Reconstr. (GIN)  |  78.66 ± 22.17  |  94.06 ± 5.10  |  97.50 ± 0.72  |  95.04 ± 2.69  |  3.799 ± 0.187  |  0.207 ± 0.381  |  2.344 ± 0.569  |
Reconstr. (GIN)  |  73.33 ± 16.19  |  96.61 ± 1.40  |  97.84 ± 1.37  |  94.48 ± 2.13  |  3.779 ± 0.064  |  0.105 ± 0.225  |  1.908 ± 0.860  |
Reconstr. (GIN)  |  40.66 ± 9.04  |  75.13 ± 0.26  |  63.28 ± 0.59  |  63.53 ± 1.14  |  3.765 ± 0.083  |  0.564 ± 0.025  |  2.130 ± 0.166  |
GCN (orig.)  |  6.66 ± 2.10  |  98.336 ± 0.24  |  95.73 ± 2.72  |  87.14 ± 12.73  |  3.781 ± 0.075  |  0.087 ± 0.186  |  2.204 ± 0.362  |
Reconstr. (GCN)  |  100.00 ± 0.00  |  99.00 ± 0.10  |  97.63 ± 0.19  |  94.99 ± 2.31  |  4.039 ± 0.101  |  1.175 ± 0.425  |  3.625 ± 0.536  |
Reconstr. (GCN)  |  100.00 ± 0.00  |  98.77 ± 0.61  |  97.89 ± 0.69  |  97.82 ± 1.10  |  3.970 ± 0.059  |  0.577 ± 0.135  |  3.397 ± 0.273  |
Reconstr. (GCN)  |  96.00 ± 6.46  |  99.11 ± 0.19  |  98.31 ± 0.52  |  97.18 ± 0.58  |  3.995 ± 0.031  |  0.333 ± 0.117  |  3.105 ± 0.286  |
Reconstr. (GCN)  |  49.33 ± 7.42  |  75.19 ± 0.19  |  66.04 ± 0.59  |  63.66 ± 0.51  |  3.693 ± 0.063  |  0.8518 ± 0.016  |  1.838 ± 0.054  |
PNA (orig.)  |  10.00 ± 2.98  |  81.59 ± 19.86  |  95.57 ± 0.36  |  84.81 ± 16.48  |  3.794 ± 0.155  |  0.605 ± 0.097  |  3.610 ± 0.137  |
Reconstr. (PNA)  |  100.00 ± 0.00  |  97.88 ± 2.19  |  99.18 ± 0.20  |  98.92 ± 0.72  |  3.904 ± 0.001  |  0.765 ± 0.032  |  3.954 ± 0.118  |
Reconstr. (PNA)  |  95.33 ± 7.77  |  99.12 ± 0.28  |  99.10 ± 0.57  |  99.22 ± 0.27  |  3.781 ± 0.085  |  0.090 ± 0.135  |  3.478 ± 0.206  |
Reconstr. (PNA)  |  95.33 ± 5.81  |  89.36 ± 0.22  |  99.34 ± 0.26  |  93.92 ± 8.15  |  3.710 ± 0.209  |  0.042 ± 0.047  |  3.311 ± 0.067  |
Reconstr. (PNA)  |  42.66 ± 11.03  |  75.34 ± 0.18  |  65.58 ± 0.95  |  64.01 ± 0.30  |  2.977 ± 0.065  |  1.445 ± 0.037  |  1.073 ± 0.075  |
Positional GIN  |  99.33 ± 1.33  |  88.3  |  96.1  |  95.3  |  1.61  |  2.17  |  2.66  |
Ring-GNN  |  10.00 ± 0.00  |  99.9  |  100.0  |  71.4  |  -  |  -  |  -  |
PPGN (3-WL)  |  97.80 ± 10.91  |  99.8  |  87.1  |  76.5  |  -  |  -  |  -  |
A1 (Graph property tasks). Table 1 confirms Theorem 2, where the increase in expressive power from reconstruction allows Reconstruction GNNs to distinguish CSL graphs, a task that GNNs cannot solve. Here, Reconstruction GNNs boost the accuracy of standard GNNs by a factor of between 10 and 20. Theorem 2 only guarantees GNN expressiveness boosting for Reconstruction, but our empirical results also show benefits for Reconstruction with . Table 1 also confirms Theorem 1, where Reconstruction GNNs provide significant accuracy boosts on all cycle detection tasks (4 cycles, 6 cycles and 8 cycles). See Section J.1 for a detailed discussion of the results for connectivity, diameter, and spectral radius, which also show boosts.
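The size of the csl accuracy boost can be checked with simple arithmetic on the Table 1 entries. The short sketch below uses the mean accuracies of each original GNN and of the first Reconstruction row listed under it; the dictionary layout and variable names are ours and purely illustrative.

```python
# Mean csl accuracy (%) from Table 1: (original GNN, first Reconstruction row).
csl_accuracy = {
    "GIN": (4.66, 88.66),
    "GCN": (6.66, 100.00),
    "PNA": (10.00, 100.00),
}

# Multiplicative boost of the Reconstruction variant over the original model.
boost = {model: recon / orig for model, (orig, recon) in csl_accuracy.items()}

for model, factor in boost.items():
    print(f"{model}: {factor:.1f}x")
```

All three ratios fall between 10 and 20, with PNA at the lower end and GIN at the upper end.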
Table 2 (real-world ogbg tasks).
| Model | ogbg-moltox21 (ROC-AUC %) | ogbg-moltoxcast (ROC-AUC %) | ogbg-molfreesolv (RMSE) | ogbg-molesol (RMSE) | ogbg-mollipo (RMSE) | ogbg-molpcba (AP %) |
| --- | --- | --- | --- | --- | --- | --- |
| GIN (orig.) | 74.91 ± 0.51 | 63.41 ± 0.74 | 2.411 ± 0.123 | 1.111 ± 0.038 | 0.754 ± 0.010 | 21.16 ± 0.28 |
| Reconstr. | 75.15 ± 1.40 | 63.95 ± 0.53 | 2.283 ± 0.279 | 1.026 ± 0.033 | 0.716 ± 0.020 | 23.60 ± 0.02 |
|  | 76.84 ± 0.62 | 65.36 ± 0.49 | 2.117 ± 0.181 | 1.006 ± 0.030 | 0.736 ± 0.025 | 23.25 ± 0.00 |
|  | 76.78 ± 0.64 | 64.84 ± 0.71 | 2.370 ± 0.326 | 1.055 ± 0.031 | 0.738 ± 0.018 | 23.33 ± 0.09 |
|  | 74.40 ± 0.75 | 62.29 ± 0.28 | 2.531 ± 0.206 | 1.343 ± 0.053 | 0.842 ± 0.020 | 13.50 ± 0.32 |
| GCN (orig.) | 75.29 ± 0.69 | 63.54 ± 0.42 | 2.417 ± 0.178 | 1.106 ± 0.036 | 0.793 ± 0.040 | 20.20 ± 0.24 |
| Reconstr. | 76.46 ± 0.77 | 64.51 ± 0.60 | 2.524 ± 0.300 | 1.096 ± 0.045 | 0.760 ± 0.015 | 21.25 ± 0.25 |
|  | 75.58 ± 0.99 | 64.38 ± 0.39 | 2.467 ± 0.231 | 1.086 ± 0.048 | 0.766 ± 0.025 | 20.10 ± 0.08 |
|  | 75.88 ± 0.73 | 64.70 ± 0.81 | 2.345 ± 0.261 | 1.114 ± 0.047 | 0.754 ± 0.021 | 19.04 ± 0.03 |
|  | 74.03 ± 0.63 | 62.80 ± 0.77 | 2.599 ± 0.161 | 1.372 ± 0.048 | 0.835 ± 0.020 | 11.69 ± 1.41 |
| PNA (orig.) | 74.28 ± 0.52 | 62.69 ± 0.63 | 2.192 ± 0.125 | 1.140 ± 0.032 | 0.759 ± 0.017 | 25.45 ± 0.04 |
| Reconstr. | 73.64 ± 0.74 | 64.14 ± 0.76 | 2.341 ± 0.070 | 1.723 ± 0.145 | 0.743 ± 0.015 | 23.11 ± 0.05 |
|  | 74.89 ± 0.29 | 65.22 ± 0.47 | 2.298 ± 0.115 | 1.392 ± 0.272 | 0.794 ± 0.065 | 22.10 ± 0.03 |
|  | 75.10 ± 0.73 | 65.03 ± 0.58 | 2.133 ± 0.086 | 1.360 ± 0.163 | 0.785 ± 0.041 | 20.05 ± 0.15 |
|  | 73.71 ± 0.61 | 61.25 ± 0.49 | 2.185 ± 0.231 | 1.157 ± 0.056 | 0.843 ± 0.018 | 12.33 ± 1.20 |
A2 (Real-world tasks). Table 2 and Table 6 in Appendix I show that applying reconstruction to GNNs significantly boosts their performance across all eight real-world tasks. In particular, in Table 2 we see a boost of up to 5% while achieving the best results in five out of six datasets. The reconstruction applied to GIN gives the best results in the ogbg tasks, with the exception of ogbg-mollipo and ogbg-molpcba where reconstruction performs better. The only settings where we did not get any boost were PNA for ogbg-molesol and ogbg-molpcba. Table 6 in Appendix I also shows consistent boosts in GNNs’ performance of up to 25% in other datasets. On zinc, Reconstruction yields better results than the higher-order alternatives LRP and 2LGNN. While GSN gives the best zinc results, we note that GSN requires application-specific features. In ogbg-molhiv, reconstruction is able to boost both GIN and GCN. The results in Appendix G show that nearly all of the graphs in our real-world datasets are distinguishable by the WL algorithm; thus, we can conclude that traditional GNNs are expressive enough for all our real-world tasks. Hence, real-world boosts of reconstruction over GNNs can be attributed to the gains from invariances to vertex removals (cf. Section 4.2) rather than the boost in expressive power (cf. Section 4.1).
A3 (Subgraph sizes). Overall, we observe that removing one vertex () is enough to improve the performance of GNNs in most experiments. At the other extreme end of vertex removals, , there is a significant loss in expressiveness compared to the original GNN. In most real-world tasks, Table 2 and Table 6 in Appendix I show a variety of performance boosts also with . For GCN and PNA in ogbg-molesol, specifically, we only see Reconstruction boosts over smaller subgraphs such as , which might be due to the task’s need for more invariance to vertex removals (cf. Section 4.2). In the graph property tasks (Table 1), we see significant boosts also for in all models across most tasks, except PNA. However, as in real-world tasks, the extreme case of small subgraphs significantly harms the ability to solve tasks with Reconstruction GNNs.
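The choice of subgraph size also changes how many induced subgraphs exist in the first place, which is one reason sampling becomes relevant. A small sketch, assuming exhaustive enumeration of induced subgraphs on an n-vertex graph; the helper name `deck_size` and the value n = 20 are illustrative choices of ours, not settings from the experiments.

```python
import math


def deck_size(n, removed):
    """Number of induced subgraphs obtained by deleting `removed` of the n vertices."""
    return math.comb(n, removed)


n = 20  # hypothetical graph size, chosen only for illustration
sizes = {removed: deck_size(n, removed) for removed in (1, 2, 3, n // 2)}

for removed, count in sizes.items():
    print(f"remove {removed:2d} of {n} vertices -> {count} subgraphs")
```

Removing a handful of vertices keeps the number of subgraphs polynomial in n, whereas removing half of the vertices already yields hundreds of thousands of subgraphs for this small n.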
6 Conclusions
Our work connected graph ()reconstruction and modern GRL. We first showed how such a connection results in two natural expressive graph representation classes. To make our models practical, we combined insights from graph reconstruction and GNNs, resulting in Reconstruction GNNs. Our theory shows that reconstruction boosts the expressiveness of GNNs and has a lower-variance risk estimator in distributions invariant to vertex removals. Empirically, we showed how the theoretical gains of Reconstruction GNNs translate into practice, solving graph property tasks not originally solvable by GNNs and boosting their performance on real-world tasks.
Acknowledgements
This work was funded in part by the National Science Foundation (NSF) awards CAREER IIS1943364 and CCF1918483. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsors. Christopher Morris is funded by the German Academic Exchange Service (DAAD) through a DAAD IFI postdoctoral scholarship (57515245). We want to thank our reviewers, who gave excellent suggestions to improve the paper.
References
 Abboud et al. (2020) Abboud, R., Ceylan, İ. İ., Grohe, M., and Lukasiewicz, T. (2020). The surprising power of graph neural networks with random node initialization. CoRR, abs/2010.01179.
 AbuElHaija et al. (2019) AbuElHaija, S., Perozzi, B., Kapoor, A., Alipourfard, N., Lerman, K., Harutyunyan, H., Steeg, G. V., and Galstyan, A. (2019). Mixhop: Higherorder graph convolutional architectures via sparsified neighborhood mixing. In International Conference on Machine Learning, pages 21–29.
 Albooyeh et al. (2019) Albooyeh, M., Bertolini, D., and Ravanbakhsh, S. (2019). Incidence networks for geometric deep learning. CoRR, abs/1905.11460.
 Anderson et al. (2019) Anderson, B. M., Hy, T., and Kondor, R. (2019). Cormorant: Covariant molecular neural networks. In Advances in Neural Information Processing Systems, pages 14510–14519.
 Arvind et al. (2015) Arvind, V., Köbler, J., Rattan, G., and Verbitsky, O. (2015). On the power of color refinement. In International Symposium on Fundamentals of Computation Theory, pages 339–350.
 Azizian and Lelarge (2020) Azizian, W. and Lelarge, M. (2020). Characterizing the expressive power of invariant and equivariant graph neural networks. arXiv preprint arXiv:2006.15646.

 Babai (2016) Babai, L. (2016). Graph isomorphism in quasipolynomial time. In ACM SIGACT Symposium on Theory of Computing, pages 684–697.
 Barabasi and Oltvai (2004) Barabasi, A.L. and Oltvai, Z. N. (2004). Network biology: Understanding the cell’s functional organization. Nature Reviews Genetics, 5(2):101–113.
 Barceló et al. (2020) Barceló, P., Kostylev, E. V., Monet, M., Pérez, J., Reutter, J. L., and Silva, J. P. (2020). The logical expressiveness of graph neural networks. In International Conference on Learning Representations.
 Baskin et al. (1997) Baskin, I. I., Palyulin, V. A., and Zefirov, N. S. (1997). A neural device for searching direct correlations between structures and properties of chemical compounds. Journal of Chemical Information and Computer Sciences, 37(4):715–721.
 Beaini et al. (2020) Beaini, D., Passaro, S., Létourneau, V., Hamilton, W. L., Corso, G., and Liò, P. (2020). Directional graph networks. CoRR, abs/2010.02863.
 Bevilacqua et al. (2021) Bevilacqua, B., Zhou, Y., and Ribeiro, B. (2021). Sizeinvariant graph representations for graph classification extrapolations. arXiv preprint arXiv:2103.05045.
 BloemReddy and Teh (2020) BloemReddy, B. and Teh, Y. W. (2020). Probabilistic symmetries and invariant neural networks. Journal of Machine Learning Research, 21(90):1–61.
 Bodnar et al. (2021) Bodnar, C., Frasca, F., Wang, Y. G., Otter, N., Montúfar, G., Lio, P., and Bronstein, M. (2021). Weisfeiler and lehman go topological: Message passing simplicial networks. arXiv preprint arXiv:2103.03212.
 Bollobás (1990) Bollobás, B. (1990). Almost every graph has reconstruction number three. Journal of Graph Theory, 14(1):1–4.
 Bondy (1991) Bondy, J. A. (1991). A graph reconstructor’s manual. Surveys in combinatorics, 166:221–252.
 Borowiecki et al. (1997) Borowiecki, M., Broere, I., Frick, M., Mihok, P., and Semanišin, G. (1997). A survey of hereditary properties of graphs. Discussiones Mathematicae Graph Theory, 17(1):5–50.
 Bouritsas et al. (2020) Bouritsas, G., Frasca, F., Zafeiriou, S., and Bronstein, M. M. (2020). Improving graph neural network expressivity via subgraph isomorphism counting. CoRR, abs/2006.09252.
 Bruna et al. (2014) Bruna, J., Zaremba, W., Szlam, A., and LeCun, Y. (2014). Spectral networks and deep locally connected networks on graphs. In International Conference on Learning Representation.
 Cangea et al. (2018) Cangea, C., Velickovic, P., Jovanovic, N., Kipf, T., and Liò, P. (2018). Towards sparse hierarchical graph classifiers. CoRR, abs/1811.01287.
 Chami et al. (2020) Chami, I., AbuElHaija, S., Perozzi, B., Ré, C., and Murphy, K. (2020). Machine learning on graphs: A model and comprehensive taxonomy. CoRR, abs/2005.03675.

 Chami et al. (2019) Chami, I., Ying, Z., Ré, C., and Leskovec, J. (2019). Hyperbolic graph convolutional neural networks. In Advances in Neural Information Processing Systems, pages 4869–4880.
 Chen et al. (2019a) Chen, G., Chen, P., Hsieh, C., Lee, C., Liao, B., Liao, R., Liu, W., Qiu, J., Sun, Q., Tang, J., Zemel, R. S., and Zhang, S. (2019a). Alchemy: A quantum chemistry dataset for benchmarking AI models. CoRR, abs/1906.09427.
 Chen et al. (2019b) Chen, S., Dobriban, E., and Lee, J. H. (2019b). Invariance reduces variance: Understanding data augmentation in deep learning and beyond. arXiv preprint arXiv:1907.10905.
 Chen et al. (2020) Chen, Z., Chen, L., Villar, S., and Bruna, J. (2020). Can graph neural networks count substructures? In Advances in Neural Information Processing Systems.
 Chen et al. (2019c) Chen, Z., Villar, S., Chen, L., and Bruna, J. (2019c). On the equivalence between graph isomorphism testing and function approximation with GNNs. In Advances in Neural Information Processing Systems, pages 15868–15876.
 Corso et al. (2020) Corso, G., Cavalleri, L., Beaini, D., Liò, P., and Velickovic, P. (2020). Principal neighbourhood aggregation for graph nets. In Advances in Neural Information Processing Systems.

 Dasoulas et al. (2020) Dasoulas, G., Santos, L. D., Scaman, K., and Virmaux, A. (2020). Coloring graph neural networks for node disambiguation. In International Joint Conference on Artificial Intelligence, pages 2126–2132.
 Defferrard et al. (2016) Defferrard, M., Bresson, X., and Vandergheynst, P. (2016). Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, pages 3844–3852.
 Duvenaud et al. (2015) Duvenaud, D. K., Maclaurin, D., Iparraguirre, J., Bombarell, R., Hirzel, T., AspuruGuzik, A., and Adams, R. P. (2015). Convolutional networks on graphs for learning molecular fingerprints. In Advances in Neural Information Processing Systems, pages 2224–2232.
 Dwivedi et al. (2020) Dwivedi, V. P., Joshi, C. K., Laurent, T., Bengio, Y., and Bresson, X. (2020). Benchmarking graph neural networks. CoRR, abs/2003.00982.
 Easley and Kleinberg (2010) Easley, D. and Kleinberg, J. (2010). Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press.

 Feng et al. (2020) Feng, W., Zhang, J., Dong, Y., Han, Y., Luan, H., Xu, Q., Yang, Q., Kharlamov, E., and Tang, J. (2020). Graph random neural networks for semisupervised learning on graphs. In Advances in Neural Information Processing Systems.
 Fey and Lenssen (2019) Fey, M. and Lenssen, J. E. (2019). Fast graph representation learning with PyTorch Geometric. CoRR, abs/1903.02428.
 FlamShepherd et al. (2020) FlamShepherd, D., Wu, T., Friederich, P., and AspuruGuzik, A. (2020). Neural message passing on high order paths. CoRR, abs/2002.10413.
 Gao and Ji (2019) Gao, H. and Ji, S. (2019). Graph UNets. In International Conference on Machine Learning, pages 2083–2092.
 Garg et al. (2020) Garg, V. K., Jegelka, S., and Jaakkola, T. S. (2020). Generalization and representational limits of graph neural networks. In International Conference on Machine Learning, pages 3419–3430.
 Geerts (2020) Geerts, F. (2020). The expressive power of kthorder invariant graph networks. CoRR, abs/2007.12035.
 Geerts et al. (2020) Geerts, F., Mazowiecki, F., and Pérez, G. A. (2020). Let’s agree to degree: Comparing graph convolutional networks in the messagepassing framework. CoRR, abs/2004.02593.
 Giles (1974) Giles, W. B. (1974). The reconstruction of outerplanar graphs. Journal of Combinatorial Theory, Series B, 16(3):215 – 226.
 Gilmer et al. (2017) Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., and Dahl, G. E. (2017). Neural message passing for quantum chemistry. In International Conference on Machine Learning.
 Godsil (1993) Godsil, C. (1993). Algebraic combinatorics, volume 6. CRC Press.
 Grohe (2017) Grohe, M. (2017). Descriptive Complexity, Canonisation, and Definable Graph Structure Theory. Lecture Notes in Logic. Cambridge University Press.
 Grohe (2020) Grohe, M. (2020). Word2vec, Node2vec, Graph2vec, X2vec: Towards a theory of vector embeddings of structured data. CoRR, abs/2003.12590.
 Grohe and Neuen (2020) Grohe, M. and Neuen, D. (2020). Recent advances on the graph isomorphism problem. CoRR, abs/2011.01366.
 Hamilton et al. (2017) Hamilton, W. L., Ying, R., and Leskovec, J. (2017). Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pages 1025–1035.
 Hinton et al. (2012) Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. R. (2012). Improving neural networks by preventing coadaptation of feature detectors. CoRR, abs/1207.0580.
 Hu et al. (2020) Hu, W., Fey, M., Zitnik, M., Dong, Y., Ren, H., Liu, B., Catasta, M., and Leskovec, J. (2020). Open graph benchmark: Datasets for machine learning on graphs. In Advances in Neural Information Processing Systems.
 Jin et al. (2019) Jin, Y., Song, G., and Shi, C. (2019). GraLSP: Graph neural networks with local structural patterns. CoRR, abs/1911.07675.
 Junttila and Kaski (2007) Junttila, T. and Kaski, P. (2007). Engineering an efficient canonical labeling tool for large and sparse graphs. In Workshop on Algorithm Engineering and Experiments, pages 135–149.
 Kelly (1942) Kelly, P. J. (1942). On isometric transformations. PhD thesis, University of WisconsinMadison.
 Kelly et al. (1957) Kelly, P. J. et al. (1957). A congruence theorem for trees. Pacific Journal of Mathematics, 7(1):961–968.
 Keriven and Peyré (2019) Keriven, N. and Peyré, G. (2019). Universal invariant and equivariant graph neural networks. In Advances in Neural Information Processing Systems, pages 7090–7099.
 Kiefer et al. (2015) Kiefer, S., Schweitzer, P., and Selman, E. (2015). Graphs identified by logics with counting. In International Symposium on Mathematical Foundations of Computer Science, pages 319–330.
 Kipf and Welling (2017) Kipf, T. N. and Welling, M. (2017). Semisupervised classification with graph convolutional networks. In International Conference on Learning Representation.
 Kireev (1995) Kireev, D. B. (1995). Chemnet: A novel neural network based method for graph/property mapping. Journal of Chemical Information and Computer Sciences, 35(2):175–180.
 Klicpera et al. (2020) Klicpera, J., Groß, J., and Günnemann, S. (2020). Directional message passing for molecular graphs. In International Conference on Learning Representations.
 Kong et al. (2020) Kong, K., Li, G., Ding, M., Wu, Z., Zhu, C., Ghanem, B., Taylor, G., and Goldstein, T. (2020). FLAG: adversarial data augmentation for graph neural networks. CoRR, abs/2010.09891.
 Kostochka and West (2020) Kostochka, A. V. and West, D. B. (2020). On reconstruction of nvertex graphs from the multiset of (n)vertex induced subgraphs. IEEE Transactions on Information Theory, PP:1–1.
 Li et al. (2020) Li, P., Wang, Y., Wang, H., and Leskovec, J. (2020). Distance encoding: Design provably more powerful neural networks for graph representation learning. Advances in Neural Information Processing Systems.
 Liao et al. (2020) Liao, R., Urtasun, R., and Zemel, R. S. (2020). A PACbayesian approach to generalization bounds for graph neural networks. CoRR, abs/2012.07690.
 Lyle et al. (2020) Lyle, C., van der Wilk, M., Kwiatkowska, M., Gal, Y., and BloemReddy, B. (2020). On the benefits of invariance in neural networks. arXiv preprint arXiv:2005.00178.
 Maehara and NT (2019) Maehara, T. and NT, H. (2019). A simple proof of the universality of invariant/equivariant graph neural networks. CoRR, abs/1910.03802.
 Manvel (1974) Manvel, B. (1974). Some basic observations on kelly’s conjecture for graphs. Discrete Mathematics, 8(2):181–185.
 Maron et al. (2019a) Maron, H., BenHamu, H., Serviansky, H., and Lipman, Y. (2019a). Provably powerful graph networks. In Advances in Neural Information Processing Systems, pages 2153–2164.
 Maron et al. (2019b) Maron, H., Fetaya, E., Segol, N., and Lipman, Y. (2019b). On the universality of invariant networks. In International Conference on Machine Learning, volume 97, pages 4363–4371. PMLR.
 McKay (1997) McKay, B. D. (1997). Small graphs are reconstructible. Australasian Journal of Combinatorics, 15:123–126.
 Merkwirth and Lengauer (2005) Merkwirth, C. and Lengauer, T. (2005). Automatic generation of complementary descriptors with molecular graph networks. Journal of Chemical Information and Modeling, 45(5):1159–1168.
 Micheli (2009) Micheli, A. (2009). Neural network for graphs: A contextual constructive approach. IEEE Transactions on Neural Networks, 20(3):498–511.
 Micheli and Sestito (2005) Micheli, A. and Sestito, A. S. (2005). A new neural network model for contextual processing of graphs. In Italian Workshop on Neural Nets Neural Nets and International Workshop on Natural and Artificial Immune Systems, volume 3931 of Lecture Notes in Computer Science, pages 10–17. Springer.

 Monti et al. (2017) Monti, F., Boscaini, D., Masci, J., Rodolà, E., Svoboda, J., and Bronstein, M. M. (2017). Geometric deep learning on graphs and manifolds using mixture model CNNs. In IEEE Conference on Computer Vision and Pattern Recognition, pages 5425–5434.
 Morris (2021) Morris, C. (2021). The power of the Weisfeiler-Leman algorithm for machine learning with graphs. In International Joint Conference on Artificial Intelligence, page TBD.
 Morris et al. (2020a) Morris, C., Kriege, N. M., Bause, F., Kersting, K., Mutzel, P., and Neumann, M. (2020a). TUDataset: A collection of benchmark datasets for learning with graphs. CoRR, abs/2007.08663.
 Morris and Mutzel (2019) Morris, C. and Mutzel, P. (2019). Towards a practical dimensional WeisfeilerLeman algorithm. CoRR, abs/1904.01543.
 Morris et al. (2020b) Morris, C., Rattan, G., and Mutzel, P. (2020b). Weisfeiler and leman go sparse: Towards higherorder graph embeddings. In Advances in Neural Information Processing Systems.
 Morris et al. (2019) Morris, C., Ritzert, M., Fey, M., Hamilton, W. L., Lenssen, J. E., Rattan, G., and Grohe, M. (2019). Weisfeiler and Leman go neural: Higherorder graph neural networks. In AAAI Conference on Artificial Intelligence, pages 4602–4609.
 Murphy et al. (2019a) Murphy, R. L., Srinivasan, B., Rao, V., and Ribeiro, B. (2019a). Janossy pooling: Learning deep permutationinvariant functions for variablesize inputs. International Conference on Learning Representations.
 Murphy et al. (2019b) Murphy, R. L., Srinivasan, B., Rao, V., and Ribeiro, B. (2019b). Relational pooling for graph representations. In International Conference on Machine Learning, pages 4663–4673.
 Murphy et al. (2019c) Murphy, R. L., Srinivasan, B., Rao, V. A., and Ribeiro, B. (2019c). Relational pooling for graph representations. In International Conference on Machine Learning, pages 4663–4673.
 Niepert et al. (2016) Niepert, M., Ahmed, M., and Kutzkov, K. (2016). Learning convolutional neural networks for graphs. In International Conference on Machine Learning, pages 2014–2023.
 Nỳdl (1981) Nỳdl, V. (1981). Finite graphs and digraphs which are not reconstructible from their cardinality restricted subgraphs. Commentationes Mathematicae Universitatis Carolinae, 22(2):281–287.
 Nỳdl (2001) Nỳdl, V. (2001). Graph reconstruction from subgraphs. Discrete Mathematics, 235(13):335–341.
 Rong et al. (2020) Rong, Y., Huang, W., Xu, T., and Huang, J. (2020). DropEdge: Towards deep graph convolutional networks on node classification. In International Conference on Learning Representations.
 Sato et al. (2020) Sato, R., Yamada, M., and Kashima, H. (2020). Random features strengthen graph neural networks. CoRR, abs/2002.03155.
 Scarselli et al. (2009) Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M., and Monfardini, G. (2009). The graph neural network model. IEEE Transactions on Neural Networks, 20(1):61–80.
 ShaweTaylor (1993) ShaweTaylor, J. (1993). Symmetries and discriminability in feedforward network architectures. IEEE Transactions on Neural Networks, 4(5):816–826.
 Simonovsky and Komodakis (2017) Simonovsky, M. and Komodakis, N. (2017). Dynamic edgeconditioned filters in convolutional neural networks on graphs. In IEEE Conference on Computer Vision and Pattern Recognition, pages 29–38.
 Sperduti and Starita (1997) Sperduti, A. and Starita, A. (1997). Supervised neural networks for the classification of structures. IEEE Transactions on Neural Networks, 8(2):714–35.
 Spinoza and West (2019) Spinoza, H. and West, D. B. (2019). Reconstruction from the deck ofvertex induced subgraphs. Journal of Graph Theory, 90(4):497–522.
 Stockmeyer (1977) Stockmeyer, P. K. (1977). The falsity of the reconstruction conjecture for tournaments. Journal of Graph Theory, 1(1):19–25.
 Stockmeyer (1981) Stockmeyer, P. K. (1981). A census of nonreconstructable digraphs, i: Six related families. Journal of Combinatorial Theory, Series B, 31(2):232–239.
 Stokes et al. (2020) Stokes, J., Yang, K., Swanson, K., Jin, W., CubillosRuiz, A., Donghia, N., MacNair, C., French, S., Carfrae, L., BloomAckerman, Z., Tran, V., ChiappinoPepe, A., Badran, A., Andrews, I., Chory, E., Church, G., Brown, E., Jaakkola, T., Barzilay, R., and Collins, J. (2020). A deep learning approach to antibiotic discovery. Cell, 180:688–702.e13.
 Taylor (1990) Taylor, R. (1990). Reconstructing degree sequences from kvertexdeleted subgraphs. Discrete mathematics, 79(2):207–213.
 Ulam (1960) Ulam, S. M. (1960). A collection of mathematical problems, volume 8. Interscience Publishers.
 Velickovic et al. (2018) Velickovic, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., and Bengio, Y. (2018). Graph attention networks. In International Conference on Learning Representations.
 Vignac et al. (2020) Vignac, C., Loukas, A., and Frossard, P. (2020). Building powerful and equivariant graph neural networks with structural messagepassing. In Advances in Neural Information Processing Systems.
 Vinyals et al. (2015) Vinyals, O., Bengio, S., and Kudlur, M. (2015). Order matters: Sequence to sequence for sets. arXiv preprint arXiv:1511.06391.
 Wagstaff et al. (2019) Wagstaff, E., Fuchs, F., Engelcke, M., Posner, I., and Osborne, M. A. (2019). On the limitations of representing functions on sets. International Conference on Machine Learning, pages 6487–6494.
 Weisfeiler (1976) Weisfeiler, B. (1976). On Construction and Identification of Graphs. Lecture Notes in Mathematics, Vol. 558. Springer.
 Weisfeiler and Leman. (1968) Weisfeiler, B. and Leman., A. (1968). The reduction of a graph to canonical form and the algebra which appears therein. NauchnoTechnicheskaya Informatsia, 2(9):12–16. English translation by G. Ryabov is available at https://www.iti.zcu.cz/wl2018/pdf/wl_paper_translation.pdf.
 Wu et al. (2019) Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., and Yu, P. S. (2019). A comprehensive survey on graph neural networks. CoRR, abs/1901.00596.
 Wu et al. (2018) Wu, Z., Ramsundar, B., Feinberg, E. N., Gomes, J., Geniesse, C., Pappu, A. S., Leswing, K., and Pande, V. (2018). MoleculeNet: A benchmark for molecular machine learning. Chemical Science, 9:513–530.
 Xu et al. (2019) Xu, K., Hu, W., Leskovec, J., and Jegelka, S. (2019). How powerful are graph neural networks? In International Conference on Learning Representations.
 Xu et al. (2018) Xu, K., Li, C., Tian, Y., Sonobe, T., Kawarabayashi, K., and Jegelka, S. (2018). Representation learning on graphs with jumping knowledge networks. In International Conference on Machine Learning, pages 5453–5462.
 Yehudai et al. (2020) Yehudai, G., Fetaya, E., Meirom, E. A., Chechik, G., and Maron, H. (2020). On size generalization in graph neural networks. CoRR, abs/2010.08853.
 Ying et al. (2018) Ying, R., You, J., Morris, C., Ren, X., Hamilton, W. L., and Leskovec, J. (2018). Hierarchical graph representation learning with differentiable pooling. In Advances in Neural Information Processing Systems, pages 4800–4810.
 You et al. (2021) You, J., GomesSelman, J., Ying, R., and Leskovec, J. (2021). Identityaware graph neural networks. arXiv preprint arXiv:2101.10320.
 You et al. (2019) You, J., Ying, R., and Leskovec, J. (2019). Positionaware graph neural networks. In International Conference on Machine Learning, pages 7134–7143.
 Yuan et al. (2021) Yuan, H., Yu, H., Wang, J., Li, K., and Ji, S. (2021). On explainability of graph neural networks via subgraph explorations. arXiv preprint arXiv:2102.05152.
 Zaheer et al. (2017) Zaheer, M., Kottur, S., Ravanbakhsh, S., Poczos, B., Salakhutdinov, R. R., and Smola, A. J. (2017). Deep sets. In Advances in neural information processing systems, pages 3391–3401.
 Zhang et al. (2018) Zhang, M., Cui, Z., Neumann, M., and Yixin, C. (2018). An endtoend deep learning architecture for graph classification. In AAAI Conference on Artificial Intelligence, pages 4428–4435.
 Zhou et al. (2018) Zhou, J., Cui, G., Zhang, Z., Yang, C., Liu, Z., Wang, L., Li, C., and Sun, M. (2018). Graph neural networks: A review of methods and applications. CoRR, abs/1812.08434.
Appendix A Related work (expanded)
GNNs. Recently, graph neural networks (Gilmer et al., 2017; Scarselli et al., 2009) emerged as the most prominent (supervised) GRL architectures. Notable instances of this architecture include, e.g., (Duvenaud et al., 2015; Hamilton et al., 2017; Velickovic et al., 2018), and the spectral approaches proposed in, e.g., (Bruna et al., 2014; Defferrard et al., 2016; Kipf and Welling, 2017; Monti et al., 2017), all of which descend from early work in (Kireev, 1995; Merkwirth and Lengauer, 2005; Sperduti and Starita, 1997; Scarselli et al., 2009). Recent extensions and improvements to the GNN framework include approaches to incorporate different local structures (around subgraphs), e.g., (AbuElHaija et al., 2019; FlamShepherd et al., 2020; Jin et al., 2019; Niepert et al., 2016; Xu et al., 2018), novel techniques for pooling vertex representations in order to perform graph classification, e.g., (Cangea et al., 2018; Gao and Ji, 2019; Ying et al., 2018; Zhang et al., 2018), incorporating distance information (You et al., 2019), and non-Euclidean geometry approaches (Chami et al., 2019). Moreover, empirical studies on neighborhood aggregation functions for continuous vertex features (Corso et al., 2020), edge-based GNNs leveraging physical knowledge (Anderson et al., 2019; Klicpera et al., 2020), and sparsification methods (Rong et al., 2020) have recently emerged. A survey of recent advancements in GNN techniques can be found, e.g., in (Chami et al., 2020; Wu et al., 2019; Zhou et al., 2018).
Limits of GNNs. Chen et al. (2020) study the substructure counting abilities of GNNs. Dasoulas et al. (2020); Abboud et al. (2020) investigate the connection between random coloring and universality. Recent works have extended GNNs’ expressive power by encoding vertex identifiers (Murphy et al., 2019b; Vignac et al., 2020), adding random features (Sato et al., 2020), using higher-order topology as features (Bouritsas et al., 2020), considering simplicial complexes (Albooyeh et al., 2019; Bodnar et al., 2021), encoding ego-networks (You et al., 2021), and encoding distance information (Li et al., 2020). Although these works increase the expressiveness of GNNs, their generalization abilities are understood to a lesser extent. Further, works such as Vignac et al. (2020, Lemma 6) and the most recent Beaini et al. (2020) and Bodnar et al. (2021) prove the boost in expressiveness with a single pair of graphs, giving no insights into the extent of their expressive power or their generalization abilities. For clarity, throughout this work, we use the term GNNs to denote the class of message-passing architectures limited by the WL algorithm, where the class of distinguishable graphs is well understood (Arvind et al., 2015).
Appendix B Notation (expanded)
As usual, let for , and let denote a multiset. In an abuse of notation, for a set with in , we denote by the set .
Graphs. A graph is a pair with a finite set of vertices and a set of edges . We denote the set of vertices and the set of edges of by and , respectively. For ease of notation, we denote the edge in by or . In the case of directed graphs . An attributed graph is a triple with an attribute function for . Then is an attribute of for in . The neighborhood of in is denoted by . Unless indicated otherwise, we use .
We say that two graphs and are isomorphic, , if there exists an adjacency preserving bijection , i.e., is in if and only if is in , and call an isomorphism from to . If the graphs have vertex or edge attributes, the isomorphism is additionally required to match these attributes accordingly.
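For very small graphs, the definition of isomorphism can be checked directly by trying every bijection between the vertex sets. The sketch below is an illustration only: it assumes simple undirected graphs without attributes on vertices 0..n-1, the function name `are_isomorphic` is ours, and the factorial running time makes it unsuitable beyond toy examples.

```python
import itertools


def are_isomorphic(n, edges_g, edges_h):
    """Brute-force isomorphism test for simple undirected graphs on vertices 0..n-1."""
    eg = {frozenset(e) for e in edges_g}
    eh = {frozenset(e) for e in edges_h}
    if len(eg) != len(eh):
        return False  # a bijection preserves the number of edges
    for perm in itertools.permutations(range(n)):
        # perm maps vertex u of the first graph to vertex perm[u] of the second.
        mapped = {frozenset(perm[u] for u in e) for e in eg}
        if mapped == eh:
            return True  # adjacency is preserved in both directions
    return False


path_a = [(0, 1), (1, 2)]          # path with middle vertex 1
path_b = [(1, 0), (0, 2)]          # path with middle vertex 0
triangle = [(0, 1), (1, 2), (0, 2)]
c6 = [(i, (i + 1) % 6) for i in range(6)]
two_triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
```

Here `are_isomorphic(3, path_a, path_b)` holds, while the 6-cycle and the two disjoint triangles are not isomorphic even though they have the same number of vertices and edges.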
We denote the set of all finite and simple graphs by . The subset of without edge attributes is denoted . Further, we denote the isomorphism type, i.e., the equivalence class of the isomorphism relation, of a graph as . Let , then is the induced subgraph with edge set . We will refer to induced subgraphs simply as subgraphs in this work.
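Induced subgraphs and isomorphism types can likewise be made concrete for small graphs. In this sketch, `induced_subgraph` keeps exactly the edges with both endpoints in the chosen vertex subset, and `canonical_form` picks one representative per isomorphism class by brute force. Both names are hypothetical helpers of ours, attributes are ignored, and the canonical form is only practical for a handful of vertices.

```python
import itertools


def induced_subgraph(edges, subset):
    """Edges of the subgraph induced by `subset`, relabeled to 0..len(subset)-1."""
    index = {v: i for i, v in enumerate(sorted(subset))}
    return [(index[u], index[v]) for u, v in edges if u in index and v in index]


def canonical_form(k, edges):
    """Smallest sorted edge tuple over all relabelings of a graph on k vertices.

    Two graphs on k vertices get the same value exactly when they are isomorphic.
    """
    best = None
    for perm in itertools.permutations(range(k)):
        key = tuple(sorted(tuple(sorted((perm[u], perm[v]))) for u, v in edges))
        if best is None or key < best:
            best = key
    return best


path = [(0, 1), (1, 2), (2, 3)]             # path on four vertices
sub_a = induced_subgraph(path, {0, 1, 3})   # one edge plus an isolated vertex
sub_b = induced_subgraph(path, {0, 2, 3})   # also one edge plus an isolated vertex
sub_c = induced_subgraph(path, {1, 2, 3})   # path on three vertices
```

The subgraphs `sub_a` and `sub_b` use different vertices of the path but share one isomorphism type, whereas `sub_c` belongs to a different one.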
Appendix C More on reconstruction
After formulating the Reconstruction Conjecture, it is natural to wonder whether it holds for other relational structures, such as directed graphs. Interestingly, directed graphs, hypergraphs, and infinite graphs are not reconstructible (Bondy, 1991; Stockmeyer, 1977). Thus, in particular, the Reconstruction Conjecture does not hold for the class .
Another question is how many cards from the deck are sufficient to reconstruct a graph. Bollobás (1990) shows that almost every graph, in a probabilistic sense, can be reconstructed with only three subgraphs from the deck. For example, the graph shown in Figure 1 (Section 2) is reconstructible from the three leftmost cards.
For an extensive survey on reconstruction, we refer the reader to Bondy (1991) and Godsil (1993). From there, we highlight a significant result, Kelly’s Lemma (cf. Lemma 1). In short, the lemma states that the deck of a graph completely determines the count of each of its proper subgraphs, whatever their size.
Lemma 1 (Kelly’s Lemma (Kelly et al., 1957)).
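Let $\nu(H, G)$ be the number of copies of $H$ in $G$. For any pair of graphs $G, H$ with $|V(H)| < |V(G)|$, $\nu(H, G)$ is reconstructible.
In fact, its proof is very simple once we realize that every copy of $H$ appears in exactly $|V(G)| - |V(H)|$ cards from the deck, i.e.,
$$\nu(H, G) = \frac{1}{|V(G)| - |V(H)|} \sum_{v \in V(G)} \nu\bigl(H, G[V(G) \setminus \{v\}]\bigr).$$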
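The counting argument behind Kelly’s Lemma can be checked numerically on a small example. The sketch below counts induced copies of a graph both directly and from the cards alone, relying on the fact that each copy survives in exactly $|V(G)| - |V(H)|$ cards. All function names and the edge encoding are hypothetical, and the brute-force isomorphism test is only meant for tiny graphs.

```python
from itertools import combinations, permutations


def induced_edges(edges, subset):
    """Edges with both endpoints in `subset`, i.e., the edge set of G[S]."""
    s = set(subset)
    return {e for e in edges if e <= s}


def isomorphic(vs_g, es_g, vs_h, es_h):
    """Brute-force isomorphism test; fine for the tiny graphs used here."""
    if len(vs_g) != len(vs_h) or len(es_g) != len(es_h):
        return False
    vs_g = list(vs_g)
    for image in permutations(vs_h):
        phi = dict(zip(vs_g, image))
        if all(frozenset(phi[u] for u in e) in es_h for e in es_g):
            return True
    return False


def count_copies(vs_h, es_h, vs_g, es_g):
    """Number of vertex subsets S of G such that G[S] is isomorphic to H."""
    return sum(
        isomorphic(S, induced_edges(es_g, S), vs_h, es_h)
        for S in combinations(vs_g, len(vs_h))
    )


def count_from_deck(vs_h, es_h, vs_g, es_g):
    """Recover the count of H in G using only the vertex-deleted cards."""
    n, h = len(vs_g), len(vs_h)
    total = 0
    for v in vs_g:
        card_vs = [u for u in vs_g if u != v]
        total += count_copies(vs_h, es_h, card_vs, induced_edges(es_g, card_vs))
    # Each copy of H is counted once per card that keeps all of its vertices,
    # which happens for exactly n - h of the n cards.
    return total // (n - h)


# G is the 5-cycle, H is the path on three vertices.
c5_vs = [0, 1, 2, 3, 4]
c5_es = {frozenset({i, (i + 1) % 5}) for i in range(5)}
p3_vs = ["a", "b", "c"]
p3_es = {frozenset({"a", "b"}), frozenset({"b", "c"})}
direct = count_copies(p3_vs, p3_es, c5_vs, c5_es)
via_deck = count_from_deck(p3_vs, p3_es, c5_vs, c5_es)
```

For this example both `direct` and `via_deck` equal 5: the 5-cycle contains five induced three-vertex paths, each card is a four-vertex path containing two of them, and $5 \cdot 2 / (5 - 3) = 5$.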
Manvel (1974) started the study of graph reconstruction with the $k$-deck, which has been recently reviewed by Kostochka and West (2020) and Nỳdl (2001). Related work also refers to $k$-reconstruction as $\ell$-reconstruction (Kostochka and West, 2020), where $\ell = n - k$ is the number of deleted vertices from the original graph.
Here, in Lemma 2, we highlight a generalization of Kelly’s Lemma (Lemma 1) established in Nỳdl (2001), where the count of any subgraph of size at most $k$ is $k$-reconstructible.
Lemma 2 (Nỳdl (2001)).
For any pair of graphs $G, H$ with $|V(H)| \leq k$, the number of copies of $H$ in $G$ is $k$-reconstructible.
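The counting argument used for Kelly’s Lemma carries over to the $k$-deck. As a sketch, write $\nu(H, G)$ for the number of copies of $H$ in $G$ and $\mathcal{S}^{(k)}$ for the set of size-$k$ subsets of $V(G)$ (both symbols are assumed here for illustration). Every copy of $H$ is then contained in exactly $\binom{n - |V(H)|}{k - |V(H)|}$ of the size-$k$ induced subgraphs, which gives:

```latex
\nu(H, G)
  = \binom{n - |V(H)|}{k - |V(H)|}^{-1}
    \sum_{S \in \mathcal{S}^{(k)}} \nu\bigl(H, G[S]\bigr),
  \qquad |V(H)| \leq k \leq n .
```

For $k = n - 1$ the binomial coefficient equals $n - |V(H)|$, which recovers the identity behind Lemma 1.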
Appendix D More on Reconstruction Neural Networks
Here, we give more background on Reconstruction Neural Networks.
D.1 Properties
We start by showing how the $k$-ary Relational Pooling framework (Murphy et al., 2019b) is a specific case of $k$-Reconstruction Neural Networks and thus limited by $k$-reconstruction. Then, we show how $k$-Reconstruction Neural Networks are limited by $k$-GNNs at initialization, which implies that $k$-GNNs (Morris et al., 2019) at initialization can approximate any $k$-reconstructible function.
Observation 2 ($k$-ary Relational Pooling $\preceq$ $k$-Reconstruction Neural Networks).
The $k$-ary pooling approach in the Relational Pooling (RP) framework (Murphy et al., 2019b) defines a graph representation of the form
$$\frac{1}{\binom{n}{k}} \sum_{S \in \mathcal{S}^{(k)}} \bar{h}(G[S]),$$
where $\mathcal{S}^{(k)}$ is the set of all size-$k$ subsets of $V(G)$ and $\bar{h}(G[S])$ is a most-expressive graph representation given by the average of a permutation-sensitive universal approximator, e.g., a feed-forward neural network, applied over the $k!$ permutations of the subgraph. Thus, $k$-ary RP can be cast as a $k$-Reconstruction Neural Network with mean pooling as the multiset aggregation and $\bar{h}$ as the subgraph representation. Note that for $k$-ary RP to be as expressive as $k$-Reconstruction Neural Networks, i.e., $k$-ary RP $\equiv$ $k$-Reconstruction Neural Networks, we would need to replace the average pooling by a universal multiset approximator or simply add a feed-forward neural network after it.
Observation 3 ($k$-Reconstruction Neural Networks $\preceq$ $k$-WL at initialization).
The $k$-WL test, which limits architectures such as Morris et al. (2020b, 2019); Maron et al. (2019a), at initialization, with zero iterations, considers one-hot encodings of $k$-tuples of vertices. Note that each size-$k$ subgraph is completely defined by its corresponding $k$-vertex tuples. Thus, it follows that $k$-WL with zero iterations is at least as expressive as $k$-Reconstruction Neural Networks. Further, by combining Proposition 1 and the result from Lemma 2 (Nỳdl, 2001), it follows that $k$-WL (Morris et al., 2020b) at initialization can count subgraphs of size $k$, which is a simple proof for the recent result (Chen et al., 2019a, Theorem 3.7).
Now, we discuss the computational complexity of $k$-Reconstruction Neural Networks and how to circumvent it through subgraph sampling.
Computational complexity. As outlined in Section 2, we would need subgraphs of size almost $n$ to have a most-expressive representation of graphs with $k$-Reconstruction Neural Networks. This would imply performing isomorphism testing for arbitrarily large graphs, as in Bouritsas et al. (2020), making the model computationally infeasible.
A graph with $n$ vertices has $\binom{n}{k}$ induced subgraphs of size $k$. Let $T(k)$ be an upper bound on the time needed to compute the representation of a single size-$k$ subgraph. Thus, computing the full representation would take $\mathcal{O}\bigl(\binom{n}{k}\, T(k)\bigr)$ time. Although Babai (2016) has shown how to do isomorphism testing in quasi-polynomial time, an efficient (polynomial-time) algorithm remains unknown. More generally, expressive representations of graphs (Keriven and Peyré, 2019; Murphy et al., 2019b) and isomorphism class hashing algorithms (Junttila and Kaski, 2007) still require exponential time in the graph size. Thus, if we choose a small value for $k$, i.e., $k \ll n$, the $\binom{n}{k}$ factor dominates, while if we choose $k \approx n$ the $T(k)$ factor dominates. In both cases, the time complexity is exponential in $k$, i.e., $\mathcal{O}(n^k)$.
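To get a feel for the subset-counting factor in this analysis, the following sketch tabulates the number of size-$k$ vertex subsets of an $n$-vertex graph next to the looser bound $n^k$. The helper name `num_induced_subgraphs` and the chosen values of $n$ and $k$ are illustrative assumptions only.

```python
from math import comb


def num_induced_subgraphs(n: int, k: int) -> int:
    """Number of size-k vertex subsets, i.e., induced subgraphs on k vertices."""
    return comb(n, k)


n = 30
rows = []
for k in (2, 3, 5, 10, 15):
    # comb(n, k) never exceeds n**k, so n**k is a valid (if loose) upper bound.
    rows.append((k, num_induced_subgraphs(n, k), n ** k))

for k, subsets, bound in rows:
    print(f"k={k:2d}  subsets={subsets:>12d}  n^k={bound:.3e}")
```

Even for a 30-vertex graph the number of subgraphs grows quickly with $k$, which is why the per-subgraph cost and the number of subgraphs cannot both be kept small.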
D.2 Relation to previous work
Recently, Bouritsas et al. (2020) proposed using subgraph isomorphism type counts as features of vertices and edges used in a GNN architecture. The authors comment that if the reconstruction conjecture holds, their architecture is most expressive for $k = n - 1$. Here, we point out two things. First, their architecture is at least as powerful as $k$-reconstruction. Secondly, the reconstruction conjecture does not hold for directed graphs. Since edge directions can be seen as edge attributes, their architecture is not the most expressive for graphs with attributed edges. Finally, to make their architecture scalable, in practice, the authors choose only specific hand-engineered subgraph types, which makes the model incomparable to $k$-reconstruction.
D.3 Proof of Proposition 1
We start by giving a more formal statement of Proposition 1.
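Let $f$ be a continuous function over a compact set of $\mathfrak{G}$ and $\lVert \cdot \rVert$ the uniform (sup) norm. Writing $r_W$ for a $k$-Reconstruction Neural Network with parameters $W$, Proposition 1 states that for every $\epsilon > 0$ there exists some $W$ such that $\lVert f - r_W \rVert < \epsilon$ if and only if $f$ is $k$-reconstructible.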
Proof.
Since the subgraph representation is required to be most expressive, we can see the input of $k$-Reconstruction Neural Networks as a multiset of unique identifiers of isomorphism types. Thus, it follows from Definition 4 that only $k$-reconstructible functions can be approximated by $k$-Reconstruction Neural Networks. The other direction, i.e., a function can be approximated by $k$-Reconstruction Neural Networks if it is $k$-reconstructible, follows from the Stone–Weierstrass theorem, see Zaheer et al. (2017).∎
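The view of the model input as a multiset of isomorphism-type identifiers can be illustrated with a small sketch. Below, a brute-force canonical form stands in for a most-expressive subgraph representation, and a `Counter` plays the role of the multiset. The names `canonical_form` and `k_deck` are hypothetical, and the approach is only feasible for very small $k$.

```python
from collections import Counter
from itertools import combinations, permutations


def canonical_form(vertices, edges):
    """Identifier of the isomorphism type of a small graph.

    Relabels the vertices with 0..k-1 in every possible order and keeps the
    lexicographically smallest sorted edge list, so two graphs receive the
    same identifier exactly when they are isomorphic.
    """
    vertices = list(vertices)
    best = None
    for order in permutations(range(len(vertices))):
        label = dict(zip(vertices, order))
        form = tuple(sorted(tuple(sorted(label[u] for u in e)) for e in edges))
        if best is None or form < best:
            best = form
    return (len(vertices), best)


def k_deck(vertices, edges, k):
    """Multiset of isomorphism types of all size-k induced subgraphs."""
    deck = Counter()
    for S in combinations(vertices, k):
        s = set(S)
        sub_edges = [e for e in edges if e <= s]
        deck[canonical_form(S, sub_edges)] += 1
    return deck


# A path and a star on four vertices: same number of edges, different shape.
vs = [0, 1, 2, 3]
path_es = {frozenset({0, 1}), frozenset({1, 2}), frozenset({2, 3})}
star_es = {frozenset({0, 1}), frozenset({0, 2}), frozenset({0, 3})}
same_2_deck = k_deck(vs, path_es, 2) == k_deck(vs, star_es, 2)
same_3_deck = k_deck(vs, path_es, 3) == k_deck(vs, star_es, 3)
```

In this example `same_2_deck` is `True` and `same_3_deck` is `False`: the two graphs agree on their size-2 subgraphs (three edges and three non-edges each) but differ on their size-3 subgraphs, so any function of the 2-deck alone cannot separate them.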
Appendix E More on Full Reconstruction Neural Networks
The following result captures the expressive power of Full Reconstruction Neural Networks.
Proposition 4.
Proof.
We use induction on to show that every subgraph representation in Full Reconstruction Neural Networks is a most expressive representation if the Reconstruction Conjecture holds.

Base case: . It follows from the model definition that is a most expressive representation if .

Inductive step: . If all subgraph representations in