# Counting Substructures with Higher-Order Graph Neural Networks: Possibility and Impossibility Results

While message passing based Graph Neural Networks (GNNs) have become increasingly popular architectures for learning with graphs, recent works have revealed important shortcomings in their expressive power. In response, several higher-order GNNs have been proposed, which substantially increase the expressive power, but at a large computational cost. Motivated by this gap, we introduce and analyze a new recursive pooling technique of local neighborhoods that allows different tradeoffs of computational cost and expressive power. First, we show that this model can count subgraphs of size k, and thereby overcomes a known limitation of low-order GNNs. Second, we prove that, in several cases, the proposed algorithm can greatly reduce computational complexity compared to the existing higher-order k-GNN and Local Relational Pooling (LRP) networks. We also provide a (near) matching information-theoretic lower bound for graph representations that can provably count subgraphs, and discuss time complexity lower bounds as well.


## 1 Introduction

Graph Neural Networks (GNNs) are powerful tools for graph representation learning [1, 2, 3], and have been successfully used in applications such as encoding molecules, simulating physics, social network analysis, knowledge graphs, and many others [4, 5, 6, 7]. An important class of GNNs is the set of Message Passing Graph Neural Networks (MPNNs) [8, 2, 3, 9, 1], which follow an iterative message passing scheme to compute a graph representation.

Despite the empirical success of MPNNs, their expressive power has been shown to be limited. For example, their discriminative power, at best, corresponds to the one-dimensional Weisfeiler-Leman (1-WL) graph isomorphism test [9, 10], so they cannot, e.g., distinguish regular graphs. Moreover, they also cannot count any induced subgraph with at least three vertices [11], or learn graph parameters such as clique information, diameter, or shortest cycle [12]. Still, in several applications, e.g. in computational chemistry, materials design or pharmacy [13, 14, 7], we aim to learn functions that depend on the presence or count of specific substructures.

To strengthen the expressive power of GNNs, higher-order representations such as $k$-GNNs [10] and Invariant Graph Networks (IGNs) [15] have been proposed. $k$-GNNs are inspired by the $k$-dimensional WL ($k$-WL) graph isomorphism test, a message passing algorithm on tuples of vertices, and IGNs are based on equivariant linear layers of a feed-forward neural network applied to the input graph as a matrix, and are at least as powerful as $k$-GNNs. These models are provably more powerful than MPNNs and can, e.g., count any induced substructure with at most $k$ vertices. But this power comes at the computational cost of at least $\Omega(n^k)$ operations for $n$ vertices. The necessary tradeoffs between expressive power and computational complexity are still an open question.

The expressive power of a GNN is often measured in terms of a hierarchy of graph isomorphism tests, i.e., by comparing it to a WL test. Yet, there is limited knowledge about how the expressive power of higher-order graph isomorphism tests relates to various functions of interest [16]. A different approach is to take the perspective of specific functions that are of practical interest, and quantify a GNN’s expressive power via those. Here, we focus on counting induced substructures to measure the power of a GNN, as proposed in [11]. In particular, we study whether it is possible to count given substructures with a GNN whose complexity is between that of MPNNs and the existing higher-order GNNs.

To this end, we study the scheme of many higher-order GNNs [10, 11]: select a collection of subgraphs of the input graph, encode these, and (possibly iteratively) compute a learned function on this collection. First, we propose a new such class of GNNs, Recursive Neighborhood Pooling Graph Neural Networks (RNP-GNNs). Specifically, RNP-GNNs represent each vertex by a representation of its neighborhood of a specific radius. Importantly, this neighborhood representation is computed recursively from its subgraphs. As we show, RNP-GNNs can count any induced substructure with at most $k$ vertices. Moreover, for any set of substructures with at most $k$ vertices, there is a specifiable RNP-GNN that can count them. This flexibility allows one to design a GNN that is adapted to the power needed for the task of interest, in terms of counting (induced) substructures.

The Local Relational Pooling (LRP) architecture has also been introduced with the goal of counting substructures [11]. While it can do so, it is polynomial-time only if the encoded neighborhoods are of size $o(\log n)$. In contrast, RNP-GNNs use almost linear operations, i.e., $n^{1+o(1)}$, if the size of each encoded neighborhood is $n^{o(1)}$. This is an exponential theoretical improvement in the tolerable size of neighborhoods, and a significant improvement over the complexity of $O(n^k)$ in $k$-GNN and IGN.

Finally, we take a broader perspective and provide an information-theoretic lower bound on the complexity of a general class of GNNs that can provably count substructures with at most vertices. This class includes GNNs that represent a given graph by aggregating a number of encoded graphs, where the encoded graphs are related to the given graph with an arbitrary function.

In short, in this paper, we make the following contributions:

• We introduce Recursive Neighborhood Pooling Graph Neural Networks (RNP-GNNs), a flexible class of higher-order graph neural networks that provably allows the design of graph representation networks with any expressive power of interest, in terms of counting (induced) substructures.

• We show that RNP-GNNs offer computational gains over existing models that count substructures: an exponential improvement in terms of the “tolerable” size of the encoded neighborhoods compared to LRP networks, and much lower complexity on sparse graphs compared to $k$-GNN and IGN.

• We provide an information-theoretic lower bound on the complexity of a general class of GNNs that can count (induced) substructures with at most $k$ vertices.

## 2 Background

##### Message Passing Graph Neural Networks.

Let $G = (V, E, X)$ be a labeled graph with $|V| = n$ vertices. Here, $x_v \in \mathcal{X}$ denotes the initial label of $v \in V$, where $\mathcal{X}$ is a (countable) domain.

A typical Message Passing Graph Neural Network (MPNN) first computes a representation of each vertex, and then aggregates the vertex representations via a readout function into a representation of the entire graph $G$. The representation of each vertex $v$ is computed iteratively by aggregating the representations of the neighboring vertices $u \in \mathcal{N}(v)$:

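$$m_v^{(i)} = \mathrm{AGGREGATE}^{(i)}\big(\{\!\{h_u^{(i-1)} : u \in \mathcal{N}(v)\}\!\}\big), \tag{1}$$

$$h_v^{(i)} = \mathrm{COMBINE}^{(i)}\big(h_v^{(i-1)}, m_v^{(i)}\big), \tag{2}$$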

for any vertex $v$, over a fixed number of iterations $i$, and with $h_v^{(0)}$ set to the initial label of $v$. The AGGREGATE/COMBINE functions are parametrized, learnable functions, and $\{\!\{\cdot\}\!\}$ denotes a multi-set, i.e., a set with (possibly) repeating elements. A graph-level representation can be computed by applying a READOUT function to the multi-set of final vertex representations, where READOUT is a learnable function. For representational power, it is important that the learnable functions above are injective, which can be achieved, e.g., if the AGGREGATE function is a summation and COMBINE is a weighted sum concatenated with an MLP [9].
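The message passing scheme in (1) and (2) can be illustrated with a minimal Python sketch. It is not a trainable model: as an assumption made purely for illustration, sorted nested tuples stand in for injective AGGREGATE, COMBINE and READOUT functions, and the function name `mpnn_encode` is hypothetical. The graph is assumed to be a dict mapping each vertex to the set of its neighbors.

```python
def mpnn_encode(adj, labels, rounds):
    """Toy MPNN with injective stand-ins for the learnable functions.

    adj:    dict vertex -> set of neighboring vertices
    labels: dict vertex -> initial label (all labels of the same type)
    rounds: number of message passing iterations
    """
    h = dict(labels)
    for _ in range(rounds):
        # AGGREGATE: sorted tuple of neighbor states (a canonical multi-set).
        # COMBINE: pair of (previous own state, aggregated message).
        h = {v: (h[v], tuple(sorted(h[u] for u in adj[v]))) for v in adj}
    # READOUT: canonical multi-set of the final vertex states.
    return tuple(sorted(h.values()))


# Two disjoint triangles and a 6-cycle are both 2-regular graphs on 6 vertices.
two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
hexagon = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
zeros = {v: 0 for v in range(6)}

# Even with injective functions, the two regular graphs receive the same encoding.
print(mpnn_encode(two_triangles, zeros, 3) == mpnn_encode(hexagon, zeros, 3))
```

The comparison at the end reflects the limitation discussed in the introduction: the two graphs differ in their triangle counts, yet message passing on vertex multi-sets cannot separate them.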

##### Higher-Order GNNs.

To increase the representational power of GNNs, several higher-order GNNs have been proposed. In a $k$-GNN, a message passing algorithm is applied to the $k$-tuples of vertices, in a similar fashion as GNNs do on vertices [10]. At initialization, each tuple is labeled with its type, that is, two tuples are labeled differently if their induced subgraphs are not isomorphic. As a result, $k$-GNNs can count (induced) substructures with at most $k$ vertices even at initialization. Another class of higher-order networks are IGNs, which are constructed with linear invariant/equivariant feed-forward layers, whose inputs consider graphs via adjacency matrices [15]. IGNs are at least as powerful as $k$-GNNs, and hence they too can count substructures with at most $k$ vertices. However, both methods need $\Omega(n^k)$ operations.

Specifically for counting substructures, [11] propose Local Relational Pooling (LRP) networks. LRPs apply Relational Pooling (RP) networks [17, 18] on the neighborhoods around each vertex. RP networks use permutation-variant functions and convert them to a permutation-invariant function by summing over all permutations. This summation is computationally expensive.
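To make the cost of this summation concrete, the following Python sketch symmetrizes a permutation-sensitive function by summing over all vertex orderings. The function `position_sensitive` is a hypothetical toy chosen only for illustration and is not the function used in [11, 17]; the graph is assumed to be given as a dense adjacency matrix (list of lists).

```python
from itertools import permutations


def permute(A, perm):
    """Relabel the vertices of adjacency matrix A according to perm."""
    n = len(A)
    return [[A[perm[i]][perm[j]] for j in range(n)] for i in range(n)]


def position_sensitive(A):
    """Toy permutation-sensitive function: each entry is weighted by its position."""
    n = len(A)
    return sum((i * n + j + 1) * A[i][j] for i in range(n) for j in range(n))


def relational_pool(A, f=position_sensitive):
    """Make f permutation-invariant by summing it over all n! vertex orderings."""
    return sum(f(permute(A, p)) for p in permutations(range(len(A))))


path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
relabeled = permute(path, (2, 0, 1))

print(position_sensitive(path), position_sensitive(relabeled))  # values differ
print(relational_pool(path) == relational_pool(relabeled))      # pooled values agree
```

The loop in `relational_pool` visits $n!$ orderings, which is why this kind of pooling is only affordable on very small neighborhoods.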

## 3 Other Related Works

Expressive power. Several other works have studied the expressive power of GNNs [19]. [20] extend universal approximation from feedforward networks to GNNs, using the notion of unfolding equivalence. [21] establish an equivalence between the graph isomorphism problem and the power to approximate permutation invariant functions on graphs. [15] and [22] propose higher-order, tensor-based GNN models that provably achieve universal approximation of permutation-invariant functions on graphs, and [23] studies expressive power under depth and width restrictions. Studying GNNs from the perspective of local algorithms, [24] show that GNNs can approximate solutions to certain combinatorial optimization problems.

Subgraphs and GNNs. The idea of using local neighborhoods to obtain better representations than MPNNs has been explored in several works [25, 26, 27, 28, 29, 30, 31, 32]. For example, in link prediction, one can use local neighborhoods around links and apply GNNs, as suggested in [33]. A novel method based on combining GNNs and a clustering algorithm is proposed in [34]. For graph comparison (i.e., testing whether a given possibly large subgraph exists in the given model), [35] compare the outputs of GNNs for small subgraphs of the two graphs. To improve the expressive power of GNNs, [36] use features that are counts of specific subgraphs of interest. Another related work is [37], where an MPNN is strengthened by learning local context matrices around vertices.

## 4 Recursive Neighborhood Pooling

Next, we construct Recursive Neighborhood Pooling Graph Neural Networks (RNP-GNNs), GNNs that can count any set of induced substructures of interest, with lower complexity than previous models. We represent each vertex by a representation of its radius-$r_1$ neighborhood, and then combine these representations. The key question is how to encode these local neighborhoods in a vector representation. To do so, we introduce a new idea: we view local neighborhoods as small subgraphs, and recursively apply our model to encode these neighborhood subgraphs. When encoding the local subgraphs, we use a different radius $r_2$, and, recursively, a sequence of radii $r_3, \dots, r_t$ to obtain the final representation of vertices after $t$ recursion steps.

While MPNNs also encode a representation of a local neighborhood of a certain radius, the recursive representations differ as they essentially take into account intersections of neighborhoods. As a result, as we will see in Section 5.1, they retain more structural information and are more expressive. Models such as $k$-GNN and LRP also compute encodings of subgraphs, and then update the resulting representations via message passing. We can do the same with the neighborhood representations computed by RNP-GNNs to encode more global information, although our representation results in Section 5.1 hold even without that. In Section 6, we will compare the computational complexity of RNP-GNN and these other models.

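Formally, an RNP-GNN is a parametrized learnable function $f(\cdot;\theta)$ defined on the set of all labeled graphs on $n$ vertices. Let $G$ be a labeled graph with $n$ vertices and edge set $E$, and let $h_v^{(0)}$ be the initial representation of each vertex $v$. Let $\mathcal{N}_r(v)$ denote the neighborhood of radius $r$ of vertex $v$, and let $G_v^{(t)}(\mathcal{N}_r(v))$ denote the induced subgraph of $G$ on the set of vertices $\mathcal{N}_r(v)$, with augmented vertex label $(h_u^{(t)}, \mathbb{1}[(u,v) \in E])$ for any $u \in \mathcal{N}_r(v)$. This means we add information about whether vertices are direct neighbors (with distance one) of $v$. Given a recursion sequence $(r_1, \dots, r_t)$ of radii, the representations are updated as

$$m_v^{(t)} = \text{RNP-GNN}^{(t-1)}\big(G_v^{(t-1)}(\mathcal{N}_{r_1}(v) \setminus \{v\})\big), \tag{4}$$

$$h_v^{(t)} = \mathrm{COMBINE}^{(t)}\big(h_v^{(t-1)}, m_v^{(t)}\big), \tag{5}$$

for any vertex $v$, and the graph-level representation is obtained by applying a READOUT function to the multi-set of final vertex representations $h_v^{(t)}$.

Different from MPNNs, the recursive update (4) is in general applied to a subgraph, and not a multi-set of vertex representations. Here, $\text{RNP-GNN}^{(t-1)}$ is an RNP-GNN with recursion parameters $(r_2, \dots, r_t)$. The final READOUT is an injective, permutation-invariant learnable multi-set function.

If $t = 1$, then

$$m_v^{(t)} = \mathrm{AGGREGATE}^{(t)}\big(\{\!\{(h_u^{(t-1)}, \mathbb{1}[(u,v) \in E]) : u \in \mathcal{N}_{r_1}(v)\}\!\}\big) \tag{7}$$

is a permutation-invariant aggregation function as used in MPNNs, only over a potentially larger neighborhood. For $r_1 = 1$ and $t = 1$, RNP-GNN reduces to MPNN.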

In Figure 1, we illustrate an RNP-GNN with recursion parameters as an example. We also provide pseudocode for RNP-GNNs in Appendix C.
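The recursion in (4), (5) and (7) can be sketched in a few lines of Python. This is only an illustration of the control flow under stated assumptions, not the pseudocode of Appendix C: sorted nested tuples replace the injective learnable functions, each recursion level performs a single update, the neighborhood in (7) is taken without the center vertex, and the names `ball` and `rnp_encode` are hypothetical. Graphs are assumed to be dicts mapping each vertex to the set of its neighbors.

```python
from collections import deque


def ball(adj, v, radius):
    """Vertices within shortest-path distance `radius` of v (v included)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)


def rnp_encode(adj, labels, radii):
    """Canonical encoding of a labeled graph by recursive neighborhood pooling."""
    r1, rest = radii[0], radii[1:]
    states = []
    for v in adj:
        nbhd = ball(adj, v, r1) - {v}
        # Augmented labels: previous state plus "is u adjacent to v?".
        aug = {u: (labels[u], u in adj[v]) for u in nbhd}
        if rest:
            # Recurse on the induced neighborhood subgraph, as in (4).
            sub = {u: adj[u] & nbhd for u in nbhd}
            m_v = rnp_encode(sub, aug, rest)
        else:
            # Base case: multi-set aggregation, as in (7).
            m_v = tuple(sorted(aug.values(), key=repr))
        states.append((labels[v], m_v))  # COMBINE, as in (5)
    return tuple(sorted(states, key=repr))  # READOUT


two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
hexagon = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
zeros = {v: 0 for v in range(6)}

print(rnp_encode(two_triangles, zeros, (1,)) == rnp_encode(hexagon, zeros, (1,)))
print(rnp_encode(two_triangles, zeros, (1, 1)) == rnp_encode(hexagon, zeros, (1, 1)))
```

With radii `(1,)` the sketch behaves like one round of message passing and cannot tell the two graphs apart. With radii `(1, 1)` the recursive call sees whether the two neighbors of a vertex are themselves adjacent, which separates the triangles from the 6-cycle.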

## 5 Expressive Power

In this section, we analyze the expressive power of RNP-GNNs.

### 5.1 Counting (Induced) Substructures

In contrast to MPNNs, which, in general, cannot count substructures of three vertices or more, in this section we prove that for any set of substructures, there is an RNP-GNN that provably counts them. We begin with a few definitions.

###### Definition 1.

Let $G$ and $H$ be arbitrary labeled simple graphs, where $V$ is the set of vertices in $G$. Also, for any $S \subseteq V$, let $G(S)$ denote the subgraph of $G$ induced by $S$. The induced subgraph count function is defined as

$$C(G; H) := \sum_{S \subseteq V} \mathbb{1}\{G(S) \cong H\}, \tag{8}$$

i.e., the number of induced subgraphs of $G$ isomorphic to $H$. For unlabeled $H$, the function is defined analogously.
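As a reference point for Definition 1, the count in (8) can be evaluated by brute force on small unlabeled graphs. The sketch below is an assumption-laden illustration (hypothetical function names, adjacency dicts of sets, exhaustive isomorphism testing) and is exponential in the pattern size; it is meant for checking small examples, not as an efficient counter.

```python
from itertools import combinations, permutations


def induced_subgraph(adj, S):
    """Subgraph of adj induced by the vertex set S."""
    S = set(S)
    return {v: adj[v] & S for v in S}


def isomorphic(a, b):
    """Brute-force isomorphism test for small unlabeled graphs."""
    va, vb = list(a), list(b)
    if len(va) != len(vb):
        return False
    if sum(len(a[u]) for u in a) != sum(len(b[u]) for u in b):
        return False  # different numbers of edges
    edges_a = [(u, w) for u in a for w in a[u]]
    edges_b = {frozenset((u, w)) for u in b for w in b[u]}
    for perm in permutations(vb):
        m = dict(zip(va, perm))
        if all(frozenset((m[u], m[w])) in edges_b for u, w in edges_a):
            return True
    return False


def count_induced(G, H):
    """Evaluate C(G; H): only subsets S with |S| = |V(H)| can contribute."""
    return sum(isomorphic(induced_subgraph(G, S), H) for S in combinations(G, len(H)))


K4 = {i: {j for j in range(4) if j != i} for i in range(4)}
C4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path3 = {0: {1}, 1: {0, 2}, 2: {1}}

print(count_induced(K4, triangle), count_induced(C4, path3), count_induced(K4, path3))
```

The last value illustrates the difference between induced and non-induced counting: the complete graph contains many paths on three vertices as subgraphs, but none as induced subgraphs.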

We also need to define a notion of covering for graphs. Our definition uses distances on graphs.

###### Definition 2.

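Let $H$ be a (possibly labeled) simple connected graph. For any vertex $v$ of $H$ and any set $S$ of vertices of $H$, define

$$\bar{d}_H(v; S) := \max_{u \in S} d(u, v), \tag{9}$$

where $d(\cdot, \cdot)$ is the shortest-path distance in $H$.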

###### Definition 3.

Let $H$ be a (possibly labeled) simple connected graph. A permutation of its vertices, such as $(v_1, v_2, \dots)$, is called a vertex covering sequence, with respect to a sequence $r = (r_1, r_2, \dots)$ called a covering sequence, if and only if

$$\bar{d}_{H'_i}(v_i; S_i) \le r_i, \tag{10}$$

for any index $i$, where $S_i = \{v_i, v_{i+1}, \dots\}$ is the set of vertices from position $i$ onward and $H'_i$ is the subgraph of $H$ induced by the set of vertices $S_i$. We also say that $H$ admits the covering sequence $r$ if there is a vertex covering sequence for $H$ with respect to $r$.

In particular, in a covering sequence we first consider the whole graph as a local neighborhood of one of its vertices with radius $r_1$. Then, we remove that vertex and compute the covering sequence of the remaining graph. Figure 3 shows an example of a covering sequence computation. An important property, which holds by definition, is that if $r$ is a covering sequence for $H$, then any $r' \ge r$ (in a point-wise sense) is also a covering sequence for $H$.

Note that any connected graph on vertices admits at least one covering sequence, which is . To observe this fact, note that in a connected graph, there is at least one vertex that can be removed and the remaining graph still remains connected. Therefore, we may take this vertex as the first element of a vertex covering sequence, and inductively find the other elements. Since the diameter of a connected graph with vertices is always bounded by , we achieve the desired result. However, we will see in the next section that, when using covering sequences to identify sufficiently powerful RNP-GNNs, it is desirable to have covering sequences with low , since the complexity of the resulting RNP-GNN depends on . We provide an algorithm in Appendix D to find such covering sequences in polynomial time

More generally, if $H$ and $H'$ are (possibly labeled) simple graphs on the same vertex set and $H' \subseteq H$, i.e., $H'$ is a subgraph of $H$ (not necessarily an induced subgraph), then it follows from the definition that any covering sequence for $H'$ is also a covering sequence for $H$. As a side remark, as illustrated in Figure 4, covering sequences need not always be decreasing.
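Definition 3 can be checked mechanically for a given vertex ordering. The Python sketch below computes, for each position in the ordering, the smallest radius that satisfies (10), and reports failure when the remaining induced subgraph is disconnected. The function names are hypothetical, and as a simplifying assumption the trivial final step (a single remaining vertex, radius 0) is omitted from the returned sequence.

```python
from collections import deque


def eccentricity_within(adj, v, S):
    """Largest distance from v to a vertex of S inside the subgraph induced by S.

    Returns None if some vertex of S cannot be reached from v within S.
    """
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in S and w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return max(dist.values()) if len(dist) == len(S) else None


def tightest_radii(adj, order):
    """Point-wise smallest covering sequence for the ordering, or None if invalid."""
    radii = []
    for i in range(len(order) - 1):
        ecc = eccentricity_within(adj, order[i], set(order[i:]))
        if ecc is None:
            return None
        radii.append(ecc)
    return radii


def is_vertex_covering_sequence(adj, order, r):
    """Check condition (10) for every position of the ordering."""
    tight = tightest_radii(adj, order)
    return (tight is not None and len(r) == len(tight)
            and all(t <= ri for t, ri in zip(tight, r)))


triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path3 = {0: {1}, 1: {0, 2}, 2: {1}}

print(tightest_radii(triangle, [0, 1, 2]))  # every ordering of a triangle works
print(tightest_radii(path3, [0, 1, 2]))     # start from an endpoint
print(tightest_radii(path3, [1, 0, 2]))     # removing the middle vertex first fails
```

The path example shows why the ordering matters: removing the middle vertex first leaves a disconnected graph, so that ordering admits no covering sequence at all.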

Using covering sequences, we can show the following result.

###### Theorem 1.

Consider a set of (labeled or unlabeled) graphs on vertices, such that any admits the covering sequence . Then, there is an RNP-GNN with recursion parameters that can count any . In other words, if there exists such that , then . The same result also holds for the non-induced subgraph count function.

Theorem 1 states that, with appropriate recursion parameters, any set of (labeled or unlabeled) substructures can be counted by an RNP-GNN. Interestingly, induced and non-induced subgraphs can both be counted by RNP-GNNs. (Footnote 1: For simplicity, we assume that all graphs in the set have the same number of vertices. If the set includes graphs with strictly fewer vertices, we can simply add a sufficient number of zeros to the right-hand side of their covering sequences.)

The theorem holds for any covering sequence that is valid for all graphs in . For any graph, one can compute a covering sequence by computing a spanning tree, and sequentially pruning the leaves of the tree. The resulting sequence of nodes is a vertex covering sequence, and the corresponding covering sequence can be obtained from the tree too (Appendix D). A valid covering sequence for all the graphs in is the coordinate-wise maximum of all these sequences.
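The spanning-tree construction described above can be sketched as follows. This is an illustrative reading of the procedure, not the algorithm of Appendix D: it builds a BFS spanning tree, repeatedly prunes a leaf, and records the eccentricity of that leaf in the remaining tree. Since tree distances are at least as large as distances in the induced subgraph, the recorded radii are valid but not necessarily tight. The function names are hypothetical, the trivial final step is omitted, and vertices are assumed to be comparable (e.g., integers) so that ties can be broken deterministically.

```python
from collections import deque


def bfs_dist(adj, src, allowed):
    """Shortest-path distances from src, using only vertices in `allowed`."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in allowed and w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def spanning_tree(adj, root):
    """BFS spanning tree of a connected graph, as an adjacency dict."""
    tree = {v: set() for v in adj}
    seen = {root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in sorted(adj[u]):
            if w not in seen:
                seen.add(w)
                tree[u].add(w)
                tree[w].add(u)
                queue.append(w)
    return tree


def leaf_pruning_sequence(adj, root):
    """Vertex covering sequence and (tree-based) radii obtained by pruning leaves."""
    tree = spanning_tree(adj, root)
    remaining = set(adj)
    order, radii = [], []
    while len(remaining) > 1:
        # A leaf of the remaining tree can be removed without disconnecting it.
        leaf = min(v for v in remaining if len(tree[v] & remaining) == 1)
        radii.append(max(bfs_dist(tree, leaf, remaining).values()))
        order.append(leaf)
        remaining.remove(leaf)
    order.extend(remaining)
    return order, radii


path4 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}

print(leaf_pruning_sequence(path4, 0))
print(leaf_pruning_sequence(star, 0))
```

For a set of graphs, the coordinate-wise maximum of the radii returned for each graph then gives a sequence that is valid for all of them, as stated above.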

For large substructures, the sequence can be long or include large numbers, and this will affect the computational complexity of RNP-GNNs. For small, e.g., constant-size substructures, the recursion parameters are also small (i.e., for all ), raising the hope to count these structures efficiently. In particular, is an important parameter. In Section 6, we analyze the complexity of RNP-GNNs in more detail.

### 5.2 A Universal Approximation Result for Local Functions

Theorem 1 shows that RNP-GNNs can count substructures if their recursion parameters are chosen carefully. Next, we provide a universal approximation result, which shows that they can learn any function related to local neighborhoods or small subgraphs in a graph.

First, we recall that for a graph $G$, $G(S)$ denotes the subgraph of $G$ induced by the set of vertices $S$.

###### Definition 4.

A function $\ell$ on graphs is called an $r$-local graph function if

$$\ell(G) = \phi\big(\{\!\{\psi(G(S)) : S \subseteq V, |S| \le r\}\!\}\big), \tag{11}$$

where $\psi$ is a function on graphs and $\phi$ is a multi-set function.

In other words, a local function only depends on small substructures.
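Definition 4 translates directly into a generic evaluator that enumerates all vertex subsets of size at most $r$. The sketch below is a brute-force illustration with hypothetical names; `psi` and `phi` are arbitrary callables supplied by the user, and `phi` is assumed to ignore the order of its input list so that it acts as a multi-set function.

```python
from itertools import combinations


def local_graph_function(adj, r, psi, phi):
    """Evaluate phi over the multi-set {psi(G(S)) : S subset of V, |S| <= r}."""
    vertices = list(adj)
    values = []
    for size in range(r + 1):
        for subset in combinations(vertices, size):
            S = set(subset)
            values.append(psi({v: adj[v] & S for v in S}))
    return phi(values)


def is_triangle(sub):
    """Indicator of a 3-vertex induced subgraph in which every vertex has degree 2."""
    return int(len(sub) == 3 and all(len(nb) == 2 for nb in sub.values()))


def is_edge(sub):
    """Indicator of a 2-vertex induced subgraph containing an edge."""
    return int(len(sub) == 2 and all(len(nb) == 1 for nb in sub.values()))


K4 = {i: {j for j in range(4) if j != i} for i in range(4)}

print(local_graph_function(K4, 3, is_triangle, sum))  # triangle count
print(local_graph_function(K4, 2, is_edge, sum))      # edge count
```

Both examples are counting functions, but any choice of `psi` on small induced subgraphs combined with an order-independent `phi` fits the same template.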

###### Theorem 2.

For any local graph function , there exists an RNP-GNN with recursion parameters such that for any .

As a result, we can provably learn all the local information in a graph with an appropriate RNP-GNN. Note that we still need recursions, because the function may be an arbitrarily difficult graph function. However, to achieve the full generality of such a universal approximation result, we need to consider large recursion parameters () and injective aggregations in the RNP-GNN network. For universal approximation, we may also need high dimensions if feedforward network layers are used for aggregation (see the proof of the theorem for more details).

As a remark, for , achieving universal approximation on graphs implies solving the graph isomorphism problem. But, in this extreme case, the computational complexity of the model in general is not a polynomial in .

## 6 Computational Complexity

The computational complexity of RNP-GNNs is graph-dependent. For instance, we need to compute the set of local neighborhoods, which is cheaper for sparse graphs. A complexity measure existing in the literature is the tensor order. For higher-order networks, e.g., IGNs, we need to consider tensors in . The space complexity is then and the time complexity can be even more, dependent on the algorithm used to process tensors. In general, for a message passing algorithm on graphs, the complexity of the model depends linearly on the number of vertices (if the graph is sparse). Therefore, to bound the complexity of a method, we need to bound the number of node representation updates, which we do in the following theorem.

###### Theorem 3.

Let be an RNP-GNN with the recursion parameters . Assume that the observed graphs , whose representations we compute, satisfy the following property:

$$\max_{v \in [n]} |\mathcal{N}_{r_1}(v)| \le c, \tag{12}$$

where is a graph independent constant. Then, the number of node updates in the RNP-GNN is .

In other words, if and , then RNP-GNN requires relatively few updates (that is, ), compared to the higher-order networks (). Also, in this case, finding neighborhoods is not difficult, since neighborhoods are small (). Note that if the maximum degree of the given graphs is , then . Therefore, similarly, if then we can count with at most updates.

The above results show that when using RNP-GNNs with sparse graphs, we can learn functions of substructures with vertices without requiring order tensors. LRPs also encode neighborhoods of distance around nodes. In particular, all permutations of the nodes in a neighborhood of size are considered to obtain the representation. As a result, LRP networks only have polynomial complexity if . Thus, RNP-GNNs can provide an exponential improvement in terms of the tolerable size of neighborhoods with distance in the graph.

Moreover, Theorem 3 suggests aiming for small $r_1$. The other $r_i$'s may be larger than $r_1$, as shown in Figure 4, but do not affect the upper bound on the complexity.
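To get a feel for how the number of node updates grows, the following Python sketch counts updates for a recursive pooling pass on a concrete graph. The counting convention is an assumption made for this illustration (one update per vertex per recursive call, one pass per recursion level, neighborhoods taken without the center vertex), so the numbers are indicative only and are not the exact quantity bounded in Theorem 3. Under this convention, if every radius-$r_1$ neighborhood has at most $c$ vertices and there are $t$ radii, the count is at most $n(1 + c + \dots + c^{t-1})$. The function names are hypothetical.

```python
from collections import deque


def ball(adj, v, radius):
    """Vertices within shortest-path distance `radius` of v (v included)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)


def count_updates(adj, radii):
    """Number of vertex updates performed by one recursive pooling pass."""
    total = len(adj)  # one update per vertex at this recursion level
    r1, rest = radii[0], radii[1:]
    if rest:
        for v in adj:
            nbhd = ball(adj, v, r1) - {v}
            sub = {u: adj[u] & nbhd for u in nbhd}
            total += count_updates(sub, rest)
    return total


path4 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
K4 = {i: {j for j in range(4) if j != i} for i in range(4)}

print(count_updates(path4, (1, 1)))   # sparse graph: few nested updates
print(count_updates(K4, (1, 1)))      # dense graph: every neighborhood is large
print(count_updates(K4, (1, 1, 1)))   # one more recursion level
```

Only the first radius determines the size of the top-level neighborhoods, and all nested subgraphs live inside them, which is consistent with the remark that the later radii do not affect the upper bound.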

## 7 An Information-Theoretic Lower Bound

In this section, we provide a general information-theoretic lower bound for graph representations that encode a given graph $G$ by first encoding a number of (possibly small) graphs and then aggregating the resulting representations. The sequence of graphs $G_1, \dots, G_t$ may be obtained in an arbitrary way from $G$. For example, in an MPNN, $G_i$ can be the computation tree (rooted tree) at node $i$. As another example, in LRP, $G_i$ is the local neighborhood around node $i$.

Formally, consider a graph representation as

$$f(G;\theta) = \Phi\big(\{\!\{\psi(G_i) : i \in [t]\}\!\}\big), \qquad [t] = \{1, \dots, t\} \tag{13}$$

for any , where is a multi-set function, where is a function from one graph to graphs, and is a function on graphs taking values. In short, we encode graphs, and each encoding takes one of values. We call this graph representation function an -good graph representation.

###### Theorem 4.

Consider a parametrized class of good representations that is able to count any (not necessarily induced) substructure with vertices. More precisely, for any graph with vertices, there exists such that if , then . Then , . (Footnote 2: The theorem also holds for induced subgraphs, with or without vertex labels. Footnote 3: The bound is up to poly-logarithmic factors.)

In particular, for any good graph representation with , i.e., binary encoding functions, we need encoded graphs. This implies that, for , enumerating all subgraphs and deciding for each whether it equals is near optimal. Moreover, if , then small graphs would not suffice to enable counting.

More interestingly, if , then it is impossible to perform the substructure counting task with . As a result, in this case, considering encoded graphs (as is done in GNNs or LRP networks) cannot be exponentially improved.

The lower bound in this section is information-theoretic and hence applies to any algorithm. It may be possible to strengthen it by considering computational complexity, too. For binary encodings, i.e., , however, we know that the bound cannot be improved since manual counting of subgraphs matches the lower bound.

## 8 Time Complexity Lower Bounds for Counting Subgraphs

In this section, we put our results in the context of known hardness results for subgraph counting.

In general, the subgraph isomorphism problem is known to be NP-complete. Going further, the Exponential Time Hypothesis (ETH) is a conjecture in complexity theory [38], and states that several NP-complete problems cannot be solved in sub-exponential time. ETH, a stronger assumption than $P \neq NP$, is widely believed to hold. Assuming that ETH holds, the $k$-clique detection problem requires at least $n^{\Omega(k)}$ time [39]. This means that if a graph representation can count any subgraph of size $k$, then computing it requires at least $n^{\Omega(k)}$ time.

###### Corollary 1.

Assuming the ETH conjecture holds, any graph representation that can count any substructure on $k$ vertices with appropriate parametrization needs $n^{\Omega(k)}$ time to compute.

The above bound matches the complexity of the higher-order GNNs. Comparing with Theorem 4 above, Corollary 1 is more general, while Theorem 4 has fewer assumptions and offers a refined result for aggregation-based graph representations.

Given that Corollary 1 is a worst-case bound, a natural question is whether we can do better for subclasses of graphs. Regarding , even if is a random Erdös-Rényi graph, it can only be counted in time [40].

Regarding the input graph in which we count, consider two classes of sparse graphs: strongly sparse graphs have maximum degree , and weakly sparse graphs have average degree . We argued in Theorem 3 that RNP-GNNs achieve almost linear complexity for the class of strongly sparse graphs. For weakly sparse graphs, in contrast, the complexity of RNP-GNNs is generally not linear, but still polynomial, and can be much better than . One may ask whether it is possible to achieve a learnable graph representation such that its complexity for weakly sparse graphs is still linear. Recent results in complexity theory imply that this is impossible:

###### Corollary 2 ([41, 42]).

There is no graph representation algorithm that runs in linear time on weakly sparse graphs and is able to count any substructure on vertices (with appropriate parametrization).

Hence, RNP-GNNs are close to optimal for several cases of counting substructures with parametrized learnable functions.

## 9 Conclusion

In this paper, we studied the theoretical possibility of counting substructures (induced-subgraphs) by a graph representation network. We proposed an architecture, called RNP-GNN, and we proved that for reasonably sparse graphs we can efficiently count substructures. Characterizing the expressive power of GNNs via the set of functions they can learn on substructures may be useful for developing new architectures. In the end, we proved a general lower bound for any graph representation which counts subgraphs and works by aggregating representations of a collection of graphs derived from the graph.

## Acknowledgements

This project was funded by NSF CAREER award 1553284 and an ONR MURI.

## References

• [1] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini, “The graph neural network model,” IEEE Transactions on Neural Networks, vol. 20, no. 1, pp. 61–80, 2008.
• [2] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in International Conference on Learning Representations, 2017.
• [3] W. Hamilton, Z. Ying, and J. Leskovec, “Inductive representation learning on large graphs,” in Advances in neural information processing systems, pp. 1024–1034, 2017.
• [4] D. K. Duvenaud, D. Maclaurin, J. Iparraguirre, R. Bombarell, T. Hirzel, A. Aspuru-Guzik, and R. P. Adams, “Convolutional networks on graphs for learning molecular fingerprints,” in Advances in neural information processing systems, pp. 2224–2232, 2015.
• [5] M. Defferrard, X. Bresson, and P. Vandergheynst, “Convolutional neural networks on graphs with fast localized spectral filtering,” in Advances in neural information processing systems, pp. 3844–3852, 2016.
• [6] P. Battaglia, R. Pascanu, M. Lai, D. J. Rezende, et al., “Interaction networks for learning about objects, relations and physics,” in Advances in neural information processing systems, pp. 4502–4510, 2016.
• [7] W. Jin, K. Yang, R. Barzilay, and T. Jaakkola, “Learning multimodal graph-to-graph translation for molecule optimization,” in International Conference on Learning Representations, 2018.
• [8] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl, “Neural message passing for quantum chemistry,” in Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 1263–1272, 2017.
• [9] K. Xu, W. Hu, J. Leskovec, and S. Jegelka, “How powerful are graph neural networks?,” in International Conference on Learning Representations, 2019.
• [10] C. Morris, M. Ritzert, M. Fey, W. L. Hamilton, J. E. Lenssen, G. Rattan, and M. Grohe, “Weisfeiler and leman go neural: Higher-order graph neural networks.,” in AAAI, 2019.
• [11] Z. Chen, L. Chen, S. Villar, and J. Bruna, “Can graph neural networks count substructures?,” arXiv preprint arXiv:2002.04025, 2020.
• [12] V. Garg, S. Jegelka, and T. Jaakkola, “Generalization and representational limits of graph neural networks,” in Int. Conference on Machine Learning (ICML), pp. 5204–5215, 2020.
• [13] D. C. Elton, Z. Boukouvalas, M. D. Fuge, and P. W. Chung, “Deep learning for molecular design—a review of the state of the art,” Molecular Systems Design & Engineering, vol. 4, no. 4, pp. 828–849, 2019.
• [14] M. Sun, S. Zhao, C. Gilvary, O. Elemento, J. Zhou, and F. Wang, “Graph convolutional networks for computational drug development and discovery,” Briefings in bioinformatics, vol. 21, no. 3, pp. 919–935, 2020.
• [15] H. Maron, E. Fetaya, N. Segol, and Y. Lipman, “On the universality of invariant networks,” in International Conference on Machine Learning, pp. 4363–4371, 2019.
• [16] V. Arvind, F. Fuhlbrück, J. Köbler, and O. Verbitsky, “On weisfeiler-leman invariance: subgraph counts and related graph properties,” Journal of Computer and System Sciences, 2020.
• [17] R. Murphy, B. Srinivasan, V. Rao, and B. Ribeiro, “Relational pooling for graph representations,” in International Conference on Machine Learning (ICML 2019), 2019.
• [18] R. Murphy, B. Srinivasan, V. Rao, and B. Ribeiro, “Janossy pooling: Learning deep permutation-invariant functions for variable-size inputs,” in International Conference on Learning Representations, 2019.
• [19] W. Azizian and M. Lelarge, “Characterizing the expressive power of invariant and equivariant graph neural networks,” arXiv preprint arXiv:2006.15646, 2020.
• [20] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini, “Computational capabilities of graph neural networks,” IEEE Transactions on Neural Networks, vol. 20, no. 1, pp. 81–102, 2009.
• [21] Z. Chen, S. Villar, L. Chen, and J. Bruna, “On the equivalence between graph isomorphism testing and function approximation with gnns,” in Advances in Neural Information Processing Systems, pp. 15894–15902, 2019.
• [22] N. Keriven and G. Peyré, “Universal invariant and equivariant graph neural networks,” in Advances in Neural Information Processing Systems (NeurIPS), pp. 7092–7101, 2019.
• [23] A. Loukas, “What graph neural networks cannot learn: depth vs width,” in International Conference on Learning Representations, 2019.
• [24] R. Sato, M. Yamada, and H. Kashima, “Approximation ratios of graph neural networks for combinatorial problems,” in Advances in Neural Information Processing Systems, pp. 4081–4090, 2019.
• [25] S. Liu, M. F. Demirel, and Y. Liang, “N-gram graph: Simple unsupervised representation for graphs, with applications to molecules,” in Advances in Neural Information Processing Systems, pp. 8466–8478, 2019.
• [26] F. Monti, K. Otness, and M. M. Bronstein, “Motifnet: a motif-based graph convolutional network for directed graphs,” in 2018 IEEE Data Science Workshop (DSW), pp. 225–228, IEEE, 2018.
• [27] X. Liu, H. Pan, M. He, Y. Song, X. Jiang, and L. Shang, “Neural subgraph isomorphism counting,” in Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1959–1969, 2020.
• [28] Y. Yu, K. Huang, C. Zhang, L. M. Glass, J. Sun, and C. Xiao, “Sumgnn: Multi-typed drug interaction prediction via efficient knowledge graph summarization,” arXiv preprint arXiv:2010.01450, 2020.
• [29] C. Meng, S. C. Mouli, B. Ribeiro, and J. Neville, “Subgraph pattern neural networks for high-order graph evolution prediction,” in AAAI, pp. 3778–3787, 2018.
• [30] L. Cotta, C. H. C. Teixeira, A. Swami, and B. Ribeiro, “Unsupervised joint k-node graph representations with compositional energy-based models,” arXiv preprint arXiv:2010.04259, 2020.
• [31] E. Alsentzer, S. Finlayson, M. Li, and M. Zitnik, “Subgraph neural networks,” Advances in Neural Information Processing Systems, vol. 33, 2020.
• [32] K. Huang and M. Zitnik, “Graph meta learning via local subgraphs,” arXiv preprint arXiv:2006.07889, 2020.
• [33] M. Zhang and Y. Chen, “Link prediction based on graph neural networks,” in Advances in Neural Information Processing Systems, pp. 5165–5175, 2018.
• [34] Z. Ying, J. You, C. Morris, X. Ren, W. Hamilton, and J. Leskovec, “Hierarchical graph representation learning with differentiable pooling,” in Advances in neural information processing systems, pp. 4800–4810, 2018.
• [35] R. Ying, Z. Lou, J. You, C. Wen, A. Canedo, and J. Leskovec, “Neural subgraph matching,” arXiv preprint arXiv:2007.03092, 2020.
• [36] G. Bouritsas, F. Frasca, S. Zafeiriou, and M. M. Bronstein, “Improving graph neural network expressivity via subgraph isomorphism counting,” arXiv preprint arXiv:2006.09252, 2020.
• [37] C. Vignac, A. Loukas, and P. Frossard, “Building powerful and equivariant graph neural networks with message-passing,” arXiv preprint arXiv:2006.15107, 2020.
• [38] R. Impagliazzo and R. Paturi, “On the complexity of k-sat,” Journal of Computer and System Sciences, vol. 62, no. 2, pp. 367–375, 2001.
• [39] J. Chen, B. Chor, M. Fellows, X. Huang, D. Juedes, I. A. Kanj, and G. Xia, “Tight lower bounds for certain parameterized np-hard problems,” Information and Computation, vol. 201, no. 2, pp. 216–231, 2005.
• [40] M. Dalirrooyfard, T. D. Vuong, and V. V. Williams, “Graph pattern detection: Hardness for all induced patterns and faster non-induced cycles,” in Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pp. 1167–1178, 2019.
• [41] L. Gishboliner, Y. Levanzov, and A. Shapira, “Counting subgraphs in degenerate graphs,” arXiv preprint arXiv:2010.05998, 2020.
• [42] S. K. Bera, N. Pashanasangi, and C. Seshadhri, “Linear time subgraph counting, graph degeneracy, and the chasm at size six,” arXiv preprint arXiv:1911.05896, 2019.
• [43] P. Kelly et al., “A congruence theorem for trees,” Pacific Journal of Mathematics, vol. 7, no. 1, pp. 961–968, 1957.
• [44] B. D. McKay, “Small graphs are reconstructible,” Australasian Journal of Combinatorics, vol. 15, pp. 123–126, 1997.
• [45] K. Hornik, M. Stinchcombe, H. White, et al., “Multilayer feedforward networks are universal approximators,” Neural networks, vol. 2, no. 5, pp. 359–366, 1989.
• [46] K. Hornik, “Approximation capabilities of multilayer feedforward networks,” Neural networks, vol. 4, no. 2, pp. 251–257, 1991.
• [47] J. Kleinberg and E. Tardos, Algorithm design. Pearson Education India, 2006.

## Appendix A Proofs

### A.1 Proof of Theorem 1

#### A.1.1 Preliminaries

Let us first state a few definitions about graph functions. Note that for any graph function $f$, we have $f(G_1) = f(G_2)$ for any $G_1 \cong G_2$.

###### Definition 5.

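Given two graph functions $f, g$, we write $f \sqsupseteq g$ if and only if, for any $n$,

$$\forall G_1, G_2 \in \mathcal{G}_n : g(G_1) \neq g(G_2) \implies f(G_1) \neq f(G_2), \tag{14}$$

or, equivalently,

$$\forall G_1, G_2 \in \mathcal{G}_n : f(G_1) = f(G_2) \implies g(G_1) = g(G_2). \tag{15}$$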
###### Proposition 1.

Consider graph functions $f, g, h$ such that $f \sqsupseteq g$ and $g \sqsupseteq h$. Then, $f \sqsupseteq h$. In other words, $\sqsupseteq$ is transitive.

###### Proof.

The proposition holds by definition. ∎

###### Proposition 2.

Consider graph functions $f, g$ such that $f \sqsupseteq g$. Then, there is a function $\xi$ such that $\xi \circ f = g$.

###### Proof.

Let $P_f$ be the partitioning induced by the equality relation with respect to the function $f$ on $\mathcal{G}_n$. Similarly define $P_g$ for $g$. Note that, due to the definition, $P_f$ is a refinement of $P_g$. Define $\xi$ to be the unique mapping from $P_f$ to $P_g$ which respects the equality relation, i.e., which sends each block of $P_f$ to the block of $P_g$ that contains it. One can observe that such a $\xi$ satisfies the requirement in the proposition. ∎
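The relation in Definition 5 and the map from Proposition 2 can be checked by brute force on a small graph family. The following is a minimal Python sketch, not part of the original proof: the helper names (`all_graphs`, `refines`, `build_xi`) and the choice of the sorted degree sequence as the finer function and the edge count as the coarser one are illustrative assumptions.

```python
from itertools import combinations


def all_graphs(n):
    """Yield every simple graph on vertices 0..n-1 as (n, frozenset_of_edges)."""
    pairs = list(combinations(range(n), 2))
    for mask in range(2 ** len(pairs)):
        yield n, frozenset(p for i, p in enumerate(pairs) if (mask >> i) & 1)


def degree_sequence(G):
    """Sorted degree sequence; plays the role of the finer function f."""
    n, edges = G
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return tuple(sorted(deg))


def num_edges(G):
    """Number of edges; plays the role of the coarser function g."""
    return len(G[1])


def refines(f, g, graphs):
    """Check condition (15): f(G1) == f(G2) implies g(G1) == g(G2)."""
    table = {}
    for G in graphs:
        if table.setdefault(f(G), g(G)) != g(G):
            return False
    return True


def build_xi(f, g, graphs):
    """Tabulate xi with xi[f(G)] == g(G); well defined only if refines(f, g, graphs)."""
    return {f(G): g(G) for G in graphs}


graphs = list(all_graphs(4))
xi = build_xi(degree_sequence, num_edges, graphs)
```

On this family the degree sequence determines the edge count, so the lookup table `xi` is well defined, while the converse direction fails: a three-vertex path plus an isolated vertex and two disjoint edges share an edge count but not a degree sequence.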

###### Definition 6.

An RNP-GNN is called maximally expressive, if and only if

• all the aggregate functions are injective as mappings from a multi-set on a countable ground set to their codomain.

• all the combine functions are injective mappings.

###### Proposition 3.

Consider two RNP-GNNs $f, g$ with the same recursion parameters $(r_1, r_2, \ldots, r_t)$, where $f$ is maximally expressive. Then, $f \sqsupseteq g$.

###### Proof.

The proposition holds by definition. ∎

###### Proposition 4.

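Consider a sequence of graph functions $f, g_1, g_2, \ldots, g_k$. If $f \sqsupseteq g_i$ for all $i \in [k]$, then

$$f \sqsupseteq \sum_{i=1}^{k} c_i g_i, \tag{16}$$

for any $c_i \in \mathbb{R}$, $i \in [k]$.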

###### Proof.

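Since $f \sqsupseteq g_i$, we have

$$\forall G_1, G_2 \in \mathcal{G}_n : f(G_1) = f(G_2) \implies g_i(G_1) = g_i(G_2), \tag{17}$$

for all $i \in [k]$. This means that for any $G_1, G_2 \in \mathcal{G}_n$, if $f(G_1) = f(G_2)$ then $g_i(G_1) = g_i(G_2)$, $i \in [k]$, and consequently $\sum_{i=1}^{k} c_i g_i(G_1) = \sum_{i=1}^{k} c_i g_i(G_2)$. Therefore, from the definition we conclude $f \sqsupseteq \sum_{i=1}^{k} c_i g_i$. Note that the same proof also holds in the case of countable summations as long as the summation is bounded. ∎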

###### Definition 7.

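Let $H$ be a labeled connected simple graph on $k$ vertices. For any labeled graph $G \in \mathcal{G}_n$, the induced subgraph count function $C(G;H)$ is defined as

$$C(G;H) := \sum_{S \subseteq [n]} \mathbb{1}\{G(S) \cong H\}. \tag{18}$$

Also, let $\bar{C}(G;H)$ denote the number of non-induced subgraphs of $G$ which are isomorphic to $H$. It can be defined with the homomorphisms from $H$ to $G$. Formally, if $n > k$, define

$$\bar{C}(G;H) := \sum_{\substack{S \subseteq [n] \\ |S| = k}} \bar{C}(G(S);H). \tag{19}$$

Otherwise, $n = k$, and we define

$$\bar{C}(G;H) := \sum_{\tilde{H} \in \tilde{\mathcal{H}}(H)} c_{\tilde{H},H} \times \mathbb{1}\{G \cong \tilde{H}\}, \tag{20}$$

where

$$\tilde{\mathcal{H}}(H) := \{\tilde{H} \in \mathcal{G}_k : \tilde{H} \Supset H\}, \tag{21}$$

is defined with respect to graph isomorphism, and $c_{\tilde{H},H}$ denotes the number of subgraphs in $\tilde{H}$ identical to $H$. Note that $\tilde{\mathcal{H}}(H)$ is a finite set and $\Supset$ denotes being a (not necessarily induced) subgraph.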
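To make the two count functions of Definition 7 concrete, here is a minimal brute-force Python sketch for small unlabeled graphs. It is an illustration under stated assumptions rather than the paper's method: labels are ignored, isomorphism is tested by trying all vertex permutations, the pattern is assumed connected, and the function names are hypothetical.

```python
from itertools import combinations, permutations


def normalize(edges):
    """Store an undirected edge list as a frozenset of 2-element frozensets."""
    return frozenset(frozenset(e) for e in edges)


def induced(G, S):
    """Edges of the subgraph of G induced by S, with vertices renamed to 0..|S|-1."""
    idx = {v: i for i, v in enumerate(sorted(S))}
    return frozenset(
        frozenset((idx[u], idx[v])) for u, v in G if u in idx and v in idx
    )


def isomorphic(E1, E2, k):
    """Brute-force isomorphism test for unlabeled graphs on vertices 0..k-1."""
    if len(E1) != len(E2):
        return False
    return any(
        frozenset(frozenset((p[u], p[v])) for u, v in E1) == E2
        for p in permutations(range(k))
    )


def count_induced(n, G_edges, k, H_edges):
    """Number of k-subsets S with G(S) isomorphic to H, as in (18)."""
    G, H = normalize(G_edges), normalize(H_edges)
    return sum(isomorphic(induced(G, S), H, k) for S in combinations(range(n), k))


def count_noninduced(n, G_edges, k, H_edges):
    """Number of (not necessarily induced) subgraphs of G isomorphic to H.

    Follows (19)-(20): for each k-subset S, count the edge subsets of G(S)
    that form a copy of H on all k vertices (H is assumed connected).
    """
    G, H = normalize(G_edges), normalize(H_edges)
    total = 0
    for S in combinations(range(n), k):
        for F in combinations(induced(G, S), len(H)):
            total += isomorphic(frozenset(F), H, k)
    return total


# Hypothetical example: a 4-cycle with one chord, and two 3-vertex patterns.
G_edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
path = [(0, 1), (1, 2)]
triangle = [(0, 1), (1, 2), (0, 2)]
```

In this example the graph contains two induced triangles and two induced three-vertex paths, while the non-induced path count is larger because every triangle additionally contains three such paths.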

###### Proposition 5.

Let be a family of graphs. If for any , there is an RNP-GNN with recursion parameters such that , then there exists an RNP-GNN with recursion parameters such that .

###### Proof.

Let be a maximally expressive RNP-GNN. Note that by the definition for any . Since is transitive, for all , and using Proposition 4, we conclude that . ∎

The following proposition shows that there is no difference between counting induced labeled graphs and counting induced unlabeled graphs in RNP-GNNs.

###### Proposition 6.

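Let $H_0$ be an unlabeled connected graph. Assume that for any labeled graph $H$, which is constructed by adding arbitrary labels to $H_0$, there exists an RNP-GNN $f_H$ such that $f_H \sqsupseteq C(G;H)$. Then, for its unlabeled counterpart $H_0$, there exists an RNP-GNN $f$ with the same recursion parameters as $f_H$ such that $f \sqsupseteq C(G;H_0)$.

###### Proof.

If there exists an RNP-GNN $f_H$ such that $f_H \sqsupseteq C(G;H)$, then for a maximally expressive RNP-GNN $f$ with the same recursion parameters as $f_H$ we also have $f \sqsupseteq C(G;H)$. Let $\{H_i\}_{i \in \mathbb{N}}$ be the set of all labeled graphs obtained from $H_0$, up to graph isomorphism, where the labels belong to a countable set $\mathcal{X}$. Note that $\{H_i\}_{i \in \mathbb{N}}$ is a countable set. Now we write

\begin{align}
C(G;H_0) &= \sum_{\substack{S \subseteq [n] \\ |S| = k}} \mathbb{1}\{G(S) \cong H_0\} \tag{22} \\
&= \sum_{\substack{S \subseteq [n] \\ |S| = k}} \sum_{i \in \mathbb{N}} \mathbb{1}\{G(S) \cong H_i\} \tag{23} \\
&= \sum_{i \in \mathbb{N}} \sum_{\substack{S \subseteq [n] \\ |S| = k}} \mathbb{1}\{G(S) \cong H_i\} \tag{24} \\
&= \sum_{i \in \mathbb{N}} C(G;H_i). \tag{25}
\end{align}

Now using Proposition 4 we conclude that $f \sqsupseteq C(G;H_0)$, since $C(G;H_0)$ is always finite. ∎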
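The decomposition in (22)-(25) can be illustrated for a triangle pattern. The Python sketch below is a hedged example with hypothetical function names. It relies on the fact that every permutation of a triangle's vertices is an automorphism, so two labeled triangles are isomorphic exactly when they carry the same multiset of labels, and each labeled class can be indexed by a sorted label triple.

```python
from collections import Counter
from itertools import combinations


def triangle_count(n, edges):
    """Unlabeled induced triangle count: 3-subsets whose three pairs are all edges."""
    E = {frozenset(e) for e in edges}
    return sum(
        all(frozenset(p) in E for p in combinations(S, 2))
        for S in combinations(range(n), 3)
    )


def labeled_triangle_counts(n, edges, labels):
    """Map each labeled triangle class (sorted label triple) to its induced count."""
    E = {frozenset(e) for e in edges}
    counts = Counter()
    for S in combinations(range(n), 3):
        if all(frozenset(p) in E for p in combinations(S, 2)):
            counts[tuple(sorted(labels[v] for v in S))] += 1
    return counts


# Hypothetical example: the complete graph on 4 vertices with two labels.
edges = list(combinations(range(4), 2))
labels = ["a", "a", "b", "b"]
per_class = labeled_triangle_counts(4, edges, labels)
```

Summing `per_class` over all label classes recovers `triangle_count(4, edges)`, which is the content of (25) for this pattern.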

###### Definition 8.

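Let $H$ be a (possibly labeled) simple connected graph. For any set of vertices $S$ of $H$ and any vertex $v$ of $H$, define

$$\bar{d}_H(v;S) := \max_{u \in S} d(u,v), \tag{27}$$

where $d(u,v)$ denotes the shortest-path distance between $u$ and $v$ in $H$.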
###### Definition 9.

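Let $H$ be a (possibly labeled) connected simple graph on $k$ vertices. A permutation of vertices, such as $(v_1, v_2, \ldots, v_k)$, is called a vertex covering sequence, with respect to a sequence $r = (r_1, r_2, \ldots, r_{k-1})$, called a covering sequence, if and only if

$$\bar{d}_{H'_i}(v_i; S_i) \le r_i, \tag{28}$$

for $i \in [k-1]$, where $H'_i = H(S_i)$ and $S_i = \{v_i, v_{i+1}, \ldots, v_k\}$. Let $\mathcal{C}_H(r)$ denote the set of all vertex covering sequences with respect to the covering sequence $r$ for $H$.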
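Definitions 8 and 9 can be checked mechanically on small graphs. The following minimal Python sketch makes several assumptions that are not stated in the original: the graph is given as an adjacency dictionary, the distance in condition (28) is measured inside the subgraph induced by the remaining vertices, and vertices that become unreachable are treated as infinitely far away. All function names are hypothetical.

```python
from collections import deque
from itertools import permutations


def covering_distance(adj, v, S):
    """Largest shortest-path distance from v to any u in S, inside the subgraph induced by S.

    Vertices of S that cannot be reached from v count as infinitely far (sketch assumption).
    """
    S = set(S)
    dist = {v: 0}
    queue = deque([v])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y in S and y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return max(dist.get(u, float("inf")) for u in S)


def is_vertex_covering_sequence(adj, order, r):
    """Check condition (28) for the first len(r) positions of the vertex order."""
    return all(
        covering_distance(adj, order[i], order[i:]) <= r[i] for i in range(len(r))
    )


def vertex_covering_sequences(adj, r):
    """All vertex orders of the graph that satisfy (28) for the sequence r."""
    return {p for p in permutations(adj) if is_vertex_covering_sequence(adj, p, r)}


# Hypothetical examples: a 4-cycle, and the same cycle with the chord 0-2 added.
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
chorded = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
```

For the 4-cycle the order `(0, 1, 2, 3)` is a vertex covering sequence for `r = (2, 2, 1)` but not for `r = (1, 2, 1)`; adding the chord makes the second sequence feasible as well, since extra edges can only shorten distances.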

###### Proposition 7.

For any $G, H \in \mathcal{G}_k$, if $G \Supset H$ (non-induced subgraph), then

$$\mathcal{C}_H(r) \subseteq \mathcal{C}_G(r), \tag{29}$$

for any sequence $r$.

###### Proof.

The proposition follows from the fact that the function $\bar{d}_H$ is non-increasing when new edges are introduced. ∎

###### Proposition 8.

Assume that Theorem 1 holds for induced-subgraph count functions. Then, it also holds for the non-induced subgraph count functions.

###### Proof.

Assume that for a connected (labeled or unlabeled) graph , there exists an RNP-GNN with appropriate recursion parameters such that , then we prove there exists an RNP-GNN with the same recursion parameters as such that .

If there exists an RNP-GNN such that , then for a maximally expressive RNP-GNN with the same recursion parameters as we also have . Note that

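\begin{align}
\bar{C}(G;H) &= \sum_{\substack{S \subseteq [n] \\ |S| = k}} \bar{C}(G(S);H) \tag{30} \\
&= \sum_{\substack{S \subseteq [n] \\ |S| = k}} \sum_{\tilde{H} \in \tilde{\mathcal{H}}(H)} c_{\tilde{H},H} \times \mathbb{1}\{G(S) \cong \tilde{H}\} \tag{31} \\
&= \sum_{\tilde{H} \in \tilde{\mathcal{H}}(H)} c_{\tilde{H},H} \sum_{\substack{S \subseteq [n] \\ |S| = k}} \mathbb{1}\{G(S) \cong \tilde{H}\} \tag{32} \\
&= \sum_{i \in \mathbb{N}} c_{H_i,H} \times C(G;H_i), \tag{33}
\end{align}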

where .

###### Claim 1.

for any .

Using Proposition 4 and Claim 1 we conclude that since is finite and for any , and the proof is complete. The missing part which we must show here is that for any the sequence which covers also covers . This follows from Proposition 7. We are done. ∎

At the end of this part, let us introduce an important notation. For any labeled connected simple graph $H$ on $k$ vertices and any vertex $v \in [k]$, let $H^*_v$ be the induced graph obtained after removing $v$ from $H$, with the new labels defined as

$$X^*_u := (X_u, \mathbb{1}\{(u,v) \in E\}), \tag{34}$$

for each $u \in [k] \setminus \{v\}$. We may also write $X^{*v}_u$ to make the dependence on $v$ explicit.
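The vertex-removal operation with the marked labels of (34) is easy to state in code. A minimal Python sketch follows, assuming an adjacency-dictionary representation and a hypothetical helper name `remove_and_mark`.

```python
def remove_and_mark(adj, labels, v):
    """Remove vertex v and mark every remaining vertex u with whether (u, v) was an edge.

    Returns the adjacency dictionary of the induced graph on the remaining
    vertices, together with the new labels (X_u, 1 if u was adjacent to v else 0).
    """
    new_adj = {u: {w for w in nbrs if w != v} for u, nbrs in adj.items() if u != v}
    new_labels = {u: (labels[u], int(v in adj[u])) for u in adj if u != v}
    return new_adj, new_labels


# Hypothetical example: the path 0 - 1 - 2 with labels a, b, c; vertex 0 is removed.
path_adj = {0: {1}, 1: {0, 2}, 2: {1}}
path_labels = {0: "a", 1: "b", 2: "c"}
star_adj, star_labels = remove_and_mark(path_adj, path_labels, 0)
```

The second label component records adjacency to the removed vertex, so the pair of the removed vertex's label and the marked remainder retains the edges incident to that vertex.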

#### A.1.2 Proof of Theorem 1

We use an inductive proof on $t$, the length of the covering sequence of $H$. Equivalently, due to the definition, $t = k - 1$, where $k$ is the number of vertices in $H$. First, we note that due to Proposition 8, without loss of generality, we can assume that $H$ is a simple connected labeled graph and the goal is to achieve the induced-subgraph count function $C(G;H)$ via an RNP-GNN with appropriate recursion parameters. We also consider only maximally expressive networks here to prove the desired result.

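Induction base. For the induction base, i.e., $t = 1$, $H$ is a two-vertex graph. This means that we only need to count the number of occurrences of a specific (labeled) edge in the given graph $G$. Note that in this case we apply an RNP-GNN with recursion parameter $r_1$. Denote the two labels of the vertices in $H$ by $X^H_1, X^H_2$. The output of an RNP-GNN is

$$f(G;\theta) = \phi\big(\{\{\psi\big(X^G_v, \varphi(\{\{X^{*v}_u : u \in N_{r_1}(v)\}\})\big) : v \in [n]\}\}\big), \tag{35}$$

where we assume that $f$ is maximally expressive. The goal is to show that $f \sqsupseteq C(G;H)$. Using the transitivity of $\sqsupseteq$, we only need to choose appropriate $\phi, \psi, \varphi$ to achieve $C(G;H)$ as the final representation. Let

\begin{align}
\phi(\{\{z_v : v \in [n]\}\}) &:= \frac{1}{2 + 2 \times \mathbb{1}\{X^H_1 = X^H_2\}} \sum_{v=1}^{n} z_v \tag{36} \\
\psi(X, (z, z')) &:= z \times \mathbb{1}\{X = X^H_1\} + z' \times \mathbb{1}\{X = X^H_2\} \tag{37} \\
\varphi(\{\{z_u : u \in [n']\}\}) &:= \Big(\sum_{u=1}^{n'} \mathbb{1}\{z_u = (X^H_2, 1)\}, \sum_{u=1}^{n'} \mathbb{1}\{z_u = (X^H_1, 1)\}\Big). \tag{38}
\end{align}

Then, a simple computation shows that

\begin{align}
\hat{f}(G;\theta) &= \phi\big(\{\{\psi\big(X^G_v, \varphi(\{\{X^{*v}_u : u \in N_{r_1}(v)\}\})\big) : v \in [n]\}\}\big) \tag{39} \\
&= C(G;H). \tag{40}
\end{align}

Indeed, each edge of $G$ that matches $H$ contributes to the sum twice when $X^H_1 \neq X^H_2$ (once from each endpoint) and four times when $X^H_1 = X^H_2$, which is exactly what the normalization in (36) cancels.

Since $\hat{f}$ is an RNP-GNN with recursion parameter $r_1$, and for any maximally expressive RNP-GNN $f$ with the same recursion parameter as $\hat{f}$ we have $f \sqsupseteq \hat{f}$ and $\hat{f} \sqsupseteq C(G;H)$, we conclude that $f \sqsupseteq C(G;H)$ and this completes the proof.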
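The base-case construction in (36)-(38) can be checked numerically against a direct edge count. The Python sketch below is only an illustration under stated assumptions: the graph is an adjacency dictionary, the neighborhood of a vertex excludes the vertex itself, and the names `ball`, `count_labeled_edges_rnp` and `count_labeled_edges_brute` are hypothetical.

```python
from collections import deque


def ball(adj, v, r):
    """Vertices within distance r of v, excluding v itself (sketch assumption)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        x = queue.popleft()
        if dist[x] == r:
            continue
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return [u for u in dist if u != v]


def count_labeled_edges_rnp(adj, labels, x1, x2, r1=1):
    """Count edges whose endpoint labels are {x1, x2} via the maps in (36)-(38)."""

    def varphi(marked):  # (38): count marked neighbors carrying each target label
        return (
            sum(z == (x2, 1) for z in marked),
            sum(z == (x1, 1) for z in marked),
        )

    def psi(x, pair):  # (37): keep the count that matches the center's label
        z, z_prime = pair
        return z * (x == x1) + z_prime * (x == x2)

    def phi(values):  # (36): undo the double (or fourfold) counting
        return sum(values) / (2 + 2 * (x1 == x2))

    per_vertex = []
    for v in adj:
        marked = [(labels[u], int(u in adj[v])) for u in ball(adj, v, r1)]
        per_vertex.append(psi(labels[v], varphi(marked)))
    return phi(per_vertex)


def count_labeled_edges_brute(adj, labels, x1, x2):
    """Direct count of edges whose unordered label pair equals {x1, x2}."""
    target = sorted((x1, x2))
    return sum(
        1
        for u in adj
        for v in adj[u]
        if u < v and sorted((labels[u], labels[v])) == target
    )


# Hypothetical example: the path 0 - 1 - 2 - 3 with labels a, b, a, a.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
labels = {0: "a", 1: "b", 2: "a", 3: "a"}
```

Because the second label component marks adjacency to the center vertex, enlarging `r1` beyond 1 adds only vertices marked 0 to the multiset and leaves the count unchanged.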

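Induction step. Assume that the desired result holds for $t - 1$ ($t \ge 2$). We show that it also holds for $t$. Let us first define

\begin{align}
\mathcal{H}^* &:= \{H^*_{v_1} : \exists v_2, \ldots, v_t \in [k] : (v_1, v_2, \ldots, v_t) \in \mathcal{C}_H(r)\} \tag{41} \\
c^*(H^*) &:= \mathbb{1}\{H^* \in \mathcal{H}^*\} \times \#\{v \in [k] : H^*_v \cong H^*\}. \tag{42}
\end{align}

Note that $\mathcal{H}^* \neq \emptyset$ by the assumption. Let

$$\|\mathcal{H}^*\| := \sum_{H^* \in \mathcal{H}^*} c^*(H^*). \tag{43}$$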

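For all $H^* \in \mathcal{H}^*$, using the induction hypothesis, there is a (universal) RNP-GNN $\hat{f}$ with recursion parameters $(r_2, r_3, \ldots, r_t)$ such that $\hat{f} \sqsupseteq C(G;H^*)$. Using Proposition 4 we conclude

$$\hat{f} \sqsupseteq \sum_{u \in [k] : H^*_u \in \mathcal{H}^*} C(G;H^*_u). \tag{44}$$

Define a maximally expressive RNP-GNN with the recursion parameters $(r_1, r_2, \ldots, r_t)$ as follows:

$$f(G;\theta) = \phi\big(\{\{\psi\big(X^G_v, \hat{f}(G^*(N_{r_1}(v)); \hat{\theta})\big) : v \in [n]\}\}\big). \tag{45}$$

Similar to the proof for $t = 1$, here we only need to propose a (not necessarily maximally expressive) RNP-GNN which achieves the function $C(G;H)$.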

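Let us define

$$f_{H^*_u}(G;\theta) := \phi\big(\{\{\psi_{H^*_u}\big(X^G_v, \xi \circ \hat{f}(G^*(N_{r_1}(v)); \hat{\theta})\big) : v \in [n]\}\}\big), \tag{46}$$

where

\begin{align}
\phi(\{\{z_v : v \in [n]\}\}) &:= \frac{1}{\|\mathcal{H}^*\|} \sum_{v=1}^{n} z_v \tag{47} \\
\psi_{H^*_u}(X, z) &:= z \times \mathbb{1}\{X = X^H_u\}, \tag{48}
\end{align}

and

$$\xi \circ \hat{f} = C(G;H^*_u). \tag{49}$$

Note that the existence of such a function $\xi$ is guaranteed due to Proposition 2. Now we write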

 ∥H∗∥×C(G;H) =∥H∗∥∑S⊆[n]1{G(S)≅H} (50) =∑S⊆[n]∑v∈S1{∃u∈[k]:(G(S∖{v}))∗v≅H∗u∈H∗∧XGv=XHu} (51) =∑v∈[n]∑v∈S⊆[n]1{∃u∈[k]:(G(S∖{v}))∗v≅H∗u∈H∗∧XGv=XHu} (52) =∑v∈[n]∑v∈S⊆Nr1(v)1{∃u∈[k]:(G(S∖{v}))∗v≅H∗u∈H∗∧XGv=XHu} (53) =∑v∈[n]∑v∈S⊆Nr1(v)∑u∈[k]:H∗u∈H∗