1 Introduction
Graph Neural Networks (GNNs) are powerful tools for graph representation learning [1, 2, 3], and have been successfully used in applications such as encoding molecules, simulating physics, social network analysis, knowledge graphs, and many others [4, 5, 6, 7]. An important class of GNNs is the set of Message Passing Graph Neural Networks (MPNNs) [8, 2, 3, 9, 1], which follow an iterative message passing scheme to compute a graph representation. Despite the empirical success of MPNNs, their expressive power has been shown to be limited. For example, their discriminative power, at best, corresponds to the one-dimensional Weisfeiler-Leman (1-WL) graph isomorphism test [9, 10], so they cannot, e.g., distinguish regular graphs. Moreover, they also cannot count any induced subgraph with at least three vertices [11], or learn graph parameters such as clique information, diameter, or shortest cycle [12]. Still, in several applications, e.g., in computational chemistry, materials design or pharmacy [13, 14, 7], we aim to learn functions that depend on the presence or count of specific substructures.
To strengthen the expressive power of GNNs, higher-order representations such as $k$-GNNs [10] and Invariant Graph Networks (IGNs) [15] have been proposed. $k$-GNNs are inspired by the $k$-dimensional WL ($k$-WL) graph isomorphism test, a message passing algorithm on $k$-tuples of vertices, and $k$-IGNs are based on equivariant linear layers of a feedforward neural network applied to the input graph as a matrix, and are at least as powerful as $k$-GNNs. These models are provably more powerful than MPNNs and can, e.g., count any induced substructure with at most $k$ vertices. But, this power comes at the computational cost of at least $n^k$ operations for $n$ vertices. The necessary tradeoffs between expressive power and computational complexity are still an open question. The expressive power of a GNN is often measured in terms of a hierarchy of graph isomorphism tests, i.e., by comparing it to a $k$-WL test. Yet, there is limited knowledge about how the expressive power of higher-order graph isomorphism tests relates to various functions of interest [16]. A different approach is to take the perspective of specific functions that are of practical interest, and quantify a GNN's expressive power via those. Here, we focus on counting induced substructures to measure the power of a GNN, as proposed in [11]. In particular, we study whether it is possible to count given substructures with a GNN whose complexity is between that of MPNNs and the existing higher-order GNNs.
To this end, we study the scheme of many higher-order GNNs [10, 11]: select a collection of subgraphs of the input graph, encode these, and (possibly iteratively) compute a learned function on this collection. First, we propose a new such class of GNNs, Recursive Neighborhood Pooling Graph Neural Networks (RNP-GNNs). Specifically, RNP-GNNs represent each vertex by a representation of its neighborhood of a specific radius. Importantly, this neighborhood representation is computed recursively from its subgraphs. As we show, RNP-GNNs with appropriate recursion parameters can count any induced substructure with at most $k$ vertices. Moreover, for any set of substructures with at most $k$ vertices, there is a specifiable RNP-GNN that can count them. This flexibility allows us to design a GNN that is adapted to the power needed for the task of interest, in terms of counting (induced) substructures.
The Local Relational Pooling (LRP) architecture, too, has been introduced with the goal of counting substructures [11]. While it can do so, it is polynomial-time only if the encoded neighborhoods are of size $O(\log n)$. In contrast, RNP-GNNs use an almost linear number of operations, i.e., $\widetilde{O}(n)$, if the size of each encoded neighborhood is $O(\log n)$. This is an exponential theoretical improvement in the tolerable size of neighborhoods, and a significant improvement over the complexity of $O(n^k)$ in $k$-GNNs and $k$-IGNs.
Finally, we take a broader perspective and provide an information-theoretic lower bound on the complexity of a general class of GNNs that can provably count substructures with at most $k$ vertices. This class includes GNNs that represent a given graph by aggregating a number of encoded graphs, where the encoded graphs are related to the given graph via an arbitrary function.
In short, in this paper, we make the following contributions:

- We introduce Recursive Neighborhood Pooling Graph Neural Networks (RNP-GNNs), a flexible class of higher-order graph neural networks that provably allow us to design graph representation networks with any expressive power of interest, in terms of counting (induced) substructures.

- We show that RNP-GNNs offer computational gains over existing models that count substructures: an exponential improvement in terms of the "tolerable" size of the encoded neighborhoods compared to LRP networks, and much lower complexity on sparse graphs compared to $k$-GNNs and $k$-IGNs.

- We provide an information-theoretic lower bound on the complexity of a general class of GNNs that can count (induced) substructures with at most $k$ vertices.
2 Background
Message Passing Graph Neural Networks.
Let $G = (V, E, x)$ be a labeled graph with $n$ vertices. Here, $x_v \in \mathcal{X}$ denotes the initial label of vertex $v \in V$, where $\mathcal{X}$ is a (countable) domain.
A typical Message Passing Graph Neural Network (MPNN) first computes a representation of each vertex, and then aggregates the vertex representations via a readout function into a representation $h_G$ of the entire graph $G$. The representation $h_v^{(t)}$ of each vertex $v$ is computed iteratively by aggregating the representations of the neighboring vertices $N(v)$:

(1) $m_v^{(t)} = \mathrm{AGGREGATE}\big(\{\!\{h_u^{(t-1)} : u \in N(v)\}\!\}\big)$

(2) $h_v^{(t)} = \mathrm{COMBINE}\big(h_v^{(t-1)}, m_v^{(t)}\big)$

for any $v \in V$, for $t = 1, \dots, T$ iterations, and with $h_v^{(0)} = x_v$. The AGGREGATE/COMBINE functions are parametrized, learnable functions, and $\{\!\{\cdot\}\!\}$ denotes a multiset, i.e., a set with (possibly) repeating elements. A graph-level representation can be computed as

(3) $h_G = \mathrm{READOUT}\big(\{\!\{h_v^{(T)} : v \in V\}\!\}\big),$

where READOUT is a learnable function. For representational power, it is important that the learnable functions above are injective, which can be achieved, e.g., if the AGGREGATE function is a summation and COMBINE is a weighted sum concatenated with an MLP [9].
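As a concrete illustration, the updates (1)-(3) can be sketched in a few lines of Python. This is a minimal sketch with hand-picked functions standing in for the learned AGGREGATE/COMBINE/READOUT; the names `mpnn_layer` and `readout` are illustrative, not from any particular library.

```python
def mpnn_layer(adj, h, combine):
    """One message-passing iteration over a graph given as an adjacency dict.

    AGGREGATE is a sum over the multiset of neighbor representations
    (injective on countable feature domains under a suitable encoding);
    COMBINE is passed in, standing in for the learned update function.
    """
    return {v: combine(h[v], sum(h[u] for u in nbrs)) for v, nbrs in adj.items()}

def readout(h):
    """Permutation-invariant graph-level readout: sum over all vertices."""
    return sum(h.values())

# Usage: one layer on a triangle with unit initial labels and
# COMBINE(x, m) = 2 * x + m as a stand-in for a learned weighted sum.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
h0 = {v: 1 for v in triangle}
h1 = mpnn_layer(triangle, h0, lambda x, m: 2 * x + m)
```

Stacking several such layers and applying `readout` to the final representations yields the graph-level output of equation (3).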
Higher-Order GNNs.
To increase the representational power of GNNs, several higher-order GNNs have been proposed. In a $k$-GNN, a message passing algorithm is applied to $k$-tuples of vertices, in a similar fashion as MPNNs do on vertices [10]. At initialization, each $k$-tuple is labeled with its type, that is, two tuples are labeled differently if their induced subgraphs are not isomorphic. As a result, $k$-GNNs can count (induced) substructures with at most $k$ vertices even at initialization. Another class of higher-order networks are $k$-IGNs, which are constructed from linear invariant/equivariant feedforward layers whose inputs represent graphs via adjacency matrices [15]. $k$-IGNs are at least as powerful as $k$-GNNs, and hence they too can count substructures with at most $k$ vertices. However, both methods need at least $n^k$ operations.
Specifically for counting substructures, [11] propose Local Relational Pooling (LRP) networks. LRPs apply Relational Pooling (RP) networks [17, 18] to the neighborhoods around each vertex. RP networks take permutation-variant functions and convert them into permutation-invariant functions by summing over all permutations of the vertices. This summation is computationally expensive.
3 Other Related Works
Expressive power. Several other works have studied the expressive power of GNNs [19]. [20] extend universal approximation from feedforward networks to GNNs, using the notion of unfolding equivalence. [21] establish an equivalence between the graph isomorphism problem and the power to approximate permutation-invariant functions on graphs. [15] and [22] propose higher-order, tensor-based GNN models that provably achieve universal approximation of permutation-invariant functions on graphs, and [23] studies expressive power under depth and width restrictions. Studying GNNs from the perspective of local algorithms, [24] show that GNNs can approximate solutions to certain combinatorial optimization problems.
Subgraphs and GNNs. The idea of considering local neighborhoods to obtain better representations than MPNNs appears in several works [25, 26, 27, 28, 29, 30, 31, 32]. For example, in link prediction, one can use local neighborhoods around links and apply GNNs, as suggested in [33]. A method based on combining GNNs and a clustering algorithm is proposed in [34]. For graph comparison (i.e., testing whether a given, possibly large, query graph exists as a subgraph of a given target graph), [35] compare the outputs of GNNs for small subgraphs of the two graphs. To improve the expressive power of GNNs, [36] use features that are counts of specific subgraphs of interest. Another related work is [37], where an MPNN is strengthened by learning local context matrices around vertices.
4 Recursive Neighborhood Pooling
Next, we construct Recursive Neighborhood Pooling Graph Neural Networks (RNP-GNNs), GNNs that can count any set of induced substructures of interest, with lower complexity than previous models. We represent each vertex by a representation of its radius-$r_1$ neighborhood, and then combine these representations. The key question is how to encode these local neighborhoods in a vector representation. To do so, we introduce a new idea: we view local neighborhoods as small subgraphs, and recursively apply our model to encode these neighborhood subgraphs. When encoding the local subgraphs, we use a different radius $r_2$, and, recursively, a sequence of radii $(r_2, \dots, r_t)$ to obtain the final representation of vertices after $t$ recursion steps. While MPNNs also encode a representation of a local neighborhood of a certain radius, the recursive representations differ, as they essentially take into account intersections of neighborhoods. As a result, as we will see in Section 5.1, they retain more structural information and are more expressive. Models such as $k$-GNNs and LRP also compute encodings of subgraphs, and then update the resulting representations via message passing. We can do the same with the neighborhood representations computed by RNP-GNNs to encode more global information, although our representation results in Section 5.1 hold even without that. In Section 6, we compare the computational complexity of RNP-GNNs and these other models.
Formally, an RNP-GNN is a parametrized learnable function $f: \mathcal{G}_n \to \mathbb{R}^d$, where $\mathcal{G}_n$ is the set of all labeled graphs on $n$ vertices. Let $G \in \mathcal{G}_n$ be a labeled graph with $n$ vertices, and let $h_v^{(0)} = x_v$ be the initial representation of each vertex $v \in V$. Let $N_r(v)$ denote the neighborhood of radius $r$ of vertex $v$, and let $\widetilde{G}_r(v)$ denote the induced subgraph of $G$ on the set of vertices $N_r(v) \setminus \{v\}$, with augmented vertex label $(x_u, \mathbb{1}[u \in N_1(v)])$ for any $u \in N_r(v) \setminus \{v\}$. This means we add information about whether vertices are direct neighbors (with distance one) of $v$. Given a recursion sequence of radii $(r_1, \dots, r_t)$, the representations are updated as
(4) $m_v = f_{(r_2, \dots, r_t)}\big(\widetilde{G}_{r_1}(v)\big)$

(5) $h_v = \mathrm{COMBINE}\big(x_v, m_v\big)$

for any $v \in V$, and

(6) $h_G = \mathrm{READOUT}\big(\{\!\{h_v : v \in V\}\!\}\big).$
Different from MPNNs, the recursive update (4) is in general applied to an (augmented) subgraph, and not to a multiset of vertex representations; the subgraph is encoded by recursively applying an RNP-GNN with the remaining radii $(r_2, \dots, r_t)$. The resulting network $f_{(r_1, \dots, r_t)}$ is an RNP-GNN with recursion parameters $(r_1, \dots, r_t)$. The final READOUT is an injective, permutation-invariant learnable multiset function.
If $t = 1$, then

(7) $h_v = \mathrm{COMBINE}\big(x_v, \mathrm{AGGREGATE}(\{\!\{x_u : u \in N_{r_1}(v) \setminus \{v\}\}\!\})\big)$

is a permutation-invariant aggregation function as used in MPNNs, only over a potentially larger neighborhood. For $t = 1$ and $r_1 = 1$, an RNP-GNN reduces to an MPNN.
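To make the recursion concrete, the following is a minimal non-learned sketch of the recursive encoding: hashable sorted tuples stand in for the injective learnable AGGREGATE/COMBINE functions, the Boolean flag plays the role of the augmented direct-neighbor label, and we follow the reading in which the center vertex is removed before recursing. The names `ball`, `induced`, and `rnp_encode` are hypothetical.

```python
def ball(adj, v, r):
    """Vertices within shortest-path distance r of v (BFS)."""
    seen, frontier = {v}, {v}
    for _ in range(r):
        frontier = {u for w in frontier for u in adj[w]} - seen
        seen |= frontier
    return seen

def induced(adj, verts):
    """Induced subgraph on verts, as an adjacency dict."""
    return {v: [u for u in adj[v] if u in verts] for v in verts}

def rnp_encode(adj, v, radii):
    """Encode v's radius-radii[0] neighborhood subgraph by recursively
    encoding each of its vertices with the remaining radii; the flag
    records whether a vertex is a direct neighbor of v (augmented label)."""
    if not radii:
        return len(adj[v])  # base representation, e.g. a degree or label
    verts = ball(adj, v, radii[0]) - {v}
    sub = induced(adj, verts)
    return tuple(sorted((rnp_encode(sub, u, radii[1:]), u in adj[v])
                        for u in sub))

# C3 and C6 are both 2-regular, so 1-WL (and hence any MPNN) assigns all
# of their vertices the same color; the recursion with radii (1, 1) sees
# whether a vertex's neighbors are themselves adjacent, and separates them.
c3 = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
c6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
```

The example illustrates the claim of Section 5.1: the recursive encoding retains intersection information about neighborhoods (here, edges among the neighbors of $v$) that plain message passing discards.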
5 Expressive Power
In this section, we analyze the expressive power of RNP-GNNs.
5.1 Counting (Induced) Substructures
In contrast to MPNNs, which, in general, cannot count substructures of three or more vertices, in this section we prove that for any set of substructures, there is an RNP-GNN that provably counts them. We begin with a few definitions.
Definition 1.
Let $G$ and $H$ be arbitrary labeled simple graphs, where $V(G)$ is the set of vertices of $G$. Also, for any $S \subseteq V(G)$, let $G[S]$ denote the subgraph of $G$ induced by $S$. The induced subgraph count function is defined as

(8) $\mathrm{ind}_H(G) = \big|\{S \subseteq V(G) : G[S] \simeq H\}\big|,$

i.e., the number of induced subgraphs of $G$ isomorphic to $H$. For unlabeled $H$, the function is defined analogously.
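Definition 1 can be checked directly by brute force for small graphs. The following hypothetical reference implementation is exponential in the pattern size and meant only to make the definition concrete; graphs are adjacency dicts, and all function names are illustrative.

```python
from itertools import combinations, permutations

def edge_set(adj):
    """Undirected edge set of an adjacency dict."""
    return {frozenset((a, b)) for a in adj for b in adj[a]}

def isomorphic(adj1, adj2):
    """Brute-force isomorphism test for tiny graphs (tries all k! bijections)."""
    if len(adj1) != len(adj2) or len(edge_set(adj1)) != len(edge_set(adj2)):
        return False
    v1, e2 = sorted(adj1), edge_set(adj2)
    for perm in permutations(sorted(adj2)):
        m = dict(zip(v1, perm))
        if {frozenset(m[x] for x in e) for e in edge_set(adj1)} == e2:
            return True
    return False

def count_induced(G, H):
    """Number of vertex subsets S of G whose induced subgraph is isomorphic to H."""
    k = len(H)
    return sum(
        isomorphic({v: [u for u in G[v] if u in S] for v in S}, H)
        for S in combinations(sorted(G), k)
    )

# Usage: K4 contains 4 induced triangles, but no induced path on 3 vertices
# (every 3-subset of K4 induces a triangle).
k4 = {v: [u for u in range(4) if u != v] for v in range(4)}
tri = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
p3 = {0: [1], 1: [0, 2], 2: [1]}
```

The non-induced count would instead test whether $H$ is a subgraph of $G[S]$ (edge subset rather than edge equality), matching the distinction drawn later in the paper.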
We also need to define a notion of covering for graphs. Our definition uses distances on graphs.
Definition 2.
Let $G = (V, E)$ be a (possibly labeled) simple connected graph. For any $v \in V$ and $r \geq 0$, define

(9) $B_r(v) = \{u \in V : d_G(u, v) \leq r\},$

where $d_G$ is the shortest-path distance in $G$.
Definition 3.
Let $G$ be a (possibly labeled) simple connected graph on $k$ vertices. A permutation of its vertices, $(v_1, v_2, \dots, v_k)$, is called a vertex covering sequence with respect to a sequence $(r_1, \dots, r_{k-1})$, called a covering sequence, if and only if

(10) $B_{r_i}(v_i) = V(G_i)$

for any $i \in \{1, \dots, k-1\}$, where the ball $B_{r_i}(v_i)$ is taken in $G_i$, and $G_i$ is the subgraph of $G$ induced by the set of vertices $\{v_i, v_{i+1}, \dots, v_k\}$. We also say that $G$ admits the covering sequence $(r_1, \dots, r_{k-1})$ if there is a vertex covering sequence for $G$ with respect to $(r_1, \dots, r_{k-1})$.
In particular, in a covering sequence we first consider the whole graph as a local neighborhood of one of its vertices with radius $r_1$. Then, we remove that vertex and compute the covering sequence of the remaining graph. Figure 3 shows an example of a covering sequence computation. An important property, which holds by definition, is that if $(r_1, \dots, r_{k-1})$ is a covering sequence for $G$, then any $(r'_1, \dots, r'_{k-1}) \geq (r_1, \dots, r_{k-1})$ (in a pointwise sense) is also a covering sequence for $G$.
Note that any connected graph on $k$ vertices admits at least one covering sequence, namely $(k-1, k-2, \dots, 1)$. To observe this fact, note that in a connected graph, there is at least one vertex that can be removed such that the remaining graph still stays connected. Therefore, we may take this vertex as the first element of a vertex covering sequence, and inductively find the other elements. Since the diameter of a connected graph with $m$ vertices is always bounded by $m - 1$, we achieve the desired result. However, we will see in the next section that, when using covering sequences to identify sufficiently powerful RNP-GNNs, it is desirable to have covering sequences with small radii, since the complexity of the resulting RNP-GNN depends on the sizes of the corresponding neighborhoods. We provide an algorithm in Appendix D to find such covering sequences in polynomial time.
More generally, if $H_1$ and $H_2$ are (possibly labeled) simple graphs on $k$ vertices and $H_1 \subseteq H_2$, i.e., $H_1$ is a subgraph of $H_2$ (not necessarily induced), then it follows from the definition that any covering sequence for $H_1$ is also a covering sequence for $H_2$, since adding edges can only shrink distances. As a side remark, as illustrated in Figure 4, covering sequences need not always be decreasing.
Using covering sequences, we can show the following result.
Theorem 1.
Consider a set $\mathcal{H}$ of (labeled or unlabeled) graphs on $k$ vertices, such that every $H \in \mathcal{H}$ admits the covering sequence $(r_1, \dots, r_{k-1})$. Then, there is an RNP-GNN $f$ with recursion parameters $(r_1, \dots, r_{k-1})$ that can count any $H \in \mathcal{H}$. In other words, if two graphs $G_1, G_2$ differ in the induced subgraph count of some $H \in \mathcal{H}$, i.e., $\mathrm{ind}_H(G_1) \neq \mathrm{ind}_H(G_2)$, then $f(G_1) \neq f(G_2)$. The same result also holds for the non-induced subgraph count function.
Theorem 1 states that, with appropriate recursion parameters, any set of (labeled or unlabeled) substructures can be counted by an RNP-GNN. Interestingly, induced and non-induced subgraphs can both be counted by RNP-GNNs. (For simplicity, we assume that $\mathcal{H}$ only contains $k$-vertex graphs. If $\mathcal{H}$ includes graphs with strictly fewer than $k$ vertices, we can simply add a sufficient number of zeros to the right-hand side of their covering sequences.)
The theorem holds for any covering sequence that is valid for all graphs in $\mathcal{H}$. For any graph, one can compute a covering sequence by computing a spanning tree and sequentially pruning the leaves of the tree. The resulting sequence of nodes is a vertex covering sequence, and the corresponding covering sequence can be obtained from the tree, too (Appendix D). A valid covering sequence for all the graphs in $\mathcal{H}$ is the coordinate-wise maximum of all these sequences.
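The pruning idea above can be sketched as follows. This is a simplified variant that, instead of building an explicit spanning tree, removes any vertex whose removal keeps the remaining graph connected (such a vertex is a leaf of some spanning tree) and records that vertex's eccentricity in the current graph as a valid radius. All names (`is_connected`, `eccentricity`, `covering_sequence`) are hypothetical, and the input is assumed connected.

```python
def is_connected(adj):
    """BFS connectivity check on an adjacency dict."""
    if not adj:
        return True
    seen = {next(iter(adj))}
    frontier = set(seen)
    while frontier:
        frontier = {u for w in frontier for u in adj[w]} - seen
        seen |= frontier
    return len(seen) == len(adj)

def eccentricity(adj, v):
    """BFS depth from v until the whole (connected) graph is reached."""
    seen, frontier, depth = {v}, {v}, 0
    while len(seen) < len(adj):
        frontier = {u for w in frontier for u in adj[w]} - seen
        seen |= frontier
        depth += 1
    return depth

def covering_sequence(adj):
    """Return (vertex covering sequence, covering sequence) for a connected graph.

    At each step, remove a vertex whose removal keeps the rest connected; its
    eccentricity in the current remaining graph is a valid radius, since the
    ball of that radius around it covers all remaining vertices."""
    adj = {v: set(ns) for v, ns in adj.items()}
    order, radii = [], []
    while len(adj) > 1:
        for v in sorted(adj):
            rest = {u: adj[u] - {v} for u in adj if u != v}
            if is_connected(rest):
                break
        order.append(v)
        radii.append(eccentricity(adj, v))
        adj = rest
    order.append(next(iter(adj)))
    return order, radii

# Usage: the triangle admits (1, 1); the path 0-1-2-3 yields (3, 2, 1)
# under this greedy removal order.
tri = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
p4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
```

Taking the eccentricity is a safe but possibly loose choice of radius; smaller radii, found as in Appendix D, directly reduce the cost of the resulting RNP-GNN.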
For large substructures, the sequence $(r_1, \dots, r_{k-1})$ can be long or include large numbers, and this affects the computational complexity of RNP-GNNs. For small, e.g., constant-size substructures, the recursion parameters are also small (i.e., $r_i = O(1)$ for all $i$), raising the hope to count these structures efficiently. In particular, the sum of the radii is an important parameter. In Section 6, we analyze the complexity of RNP-GNNs in more detail.
5.2 A Universal Approximation Result for Local Functions
Theorem 1 shows that RNP-GNNs can count substructures if their recursion parameters are chosen carefully. Next, we provide a universal approximation result, which shows that they can learn any function of local neighborhoods or small subgraphs in a graph.
First, we recall that for a graph $G$, $G[S]$ denotes the subgraph of $G$ induced by the set of vertices $S$.
Definition 4.
A function $g$ on graphs is called an $r$-local graph function if

(11) $g(G) = \rho\big(\{\!\{\phi\big(G[B_r(v)]\big) : v \in V\}\!\}\big),$

where $\phi$ is a function on graphs and $\rho$ is a multiset function.
In other words, an $r$-local function only depends on small substructures.
Theorem 2.
For any $r$-local graph function $g$, there exists an RNP-GNN $f$ with appropriate recursion parameters such that $f(G) = g(G)$ for any $G \in \mathcal{G}_n$.
As a result, we can provably learn all the local information in a graph with an appropriate RNP-GNN. Note that we still need multiple recursions, because the function $\phi$ may be an arbitrarily difficult graph function. However, to achieve the full generality of such a universal approximation result, we need to consider large recursion parameters (large $r_i$) and injective aggregations in the RNP-GNN network. For universal approximation, we may also need high dimensions if feedforward network layers are used for aggregation (see the proof of the theorem for more details).
As a remark, for $r = n$, achieving universal approximation on graphs implies solving the graph isomorphism problem. But, in this extreme case, the computational complexity of the model in general is not polynomial in $n$.
6 Computational Complexity
The computational complexity of RNP-GNNs is graph-dependent. For instance, we need to compute the set of local neighborhoods, which is cheaper for sparse graphs. A complexity measure existing in the literature is the tensor order. For higher-order networks, e.g., $k$-IGNs, we need to consider tensors in $\mathbb{R}^{n^k}$. The space complexity is then $O(n^k)$, and the time complexity can be even higher, depending on the algorithm used to process the tensors. In general, for a message passing algorithm on graphs, the complexity of the model depends linearly on the number of vertices (if the graph is sparse). Therefore, to bound the complexity of a method, we need to bound the number of node representation updates, which we do in the following theorem.
Theorem 3.
Let $f$ be an RNP-GNN with recursion parameters $(r_1, \dots, r_t)$. Assume that the observed graphs $G$, whose representations we compute, satisfy the following property:

(12) $\max_{v \in V} \big|B_{r_1 + \cdots + r_t}(v)\big| \leq c,$

where $c$ is a graph-independent quantity. Then, the number of node updates in the RNP-GNN is $O(n c^t)$.
In other words, if $c = O(\log n)$ and $t = O(1)$, then the RNP-GNN requires relatively few updates (that is, $\widetilde{O}(n)$), compared to the higher-order networks ($O(n^k)$). Also, in this case, finding neighborhoods is not difficult, since neighborhoods are small ($\mathrm{poly}\log n$ vertices). Note that if the maximum degree of the given graphs is $d$, then $c = O(d^{\,r_1 + \cdots + r_t})$. Therefore, similarly, if $d = O(1)$, then we can count with at most $O(n)$ updates.
The above results show that, when using RNP-GNNs with sparse graphs, we can learn functions of substructures with $k$ vertices without requiring order-$k$ tensors. LRPs also encode neighborhoods of a certain distance around nodes. In particular, all permutations of the nodes in each encoded neighborhood are considered to obtain the representation, so the cost grows factorially in the neighborhood size. As a result, LRP networks only have polynomial complexity if the neighborhoods are of size $O(\log n)$. Thus, RNP-GNNs can provide an exponential improvement in terms of the tolerable size of neighborhoods of a given distance in the graph.
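The update count behind Theorem 3 can be checked empirically on small graphs. The following hypothetical sketch simply counts how many vertex encodings the recursion of Section 4 touches, under the reading in which the center vertex is removed before recursing; `num_updates` is an illustrative name.

```python
def num_updates(adj, radii):
    """Count recursive vertex updates of an RNP-style encoder on a graph."""
    def ball(a, v, r):
        # vertices within shortest-path distance r of v (BFS)
        seen, frontier = {v}, {v}
        for _ in range(r):
            frontier = {u for w in frontier for u in a[w]} - seen
            seen |= frontier
        return seen

    def rec(a, v, rs):
        if not rs:
            return 1  # base representation: one update
        verts = ball(a, v, rs[0]) - {v}
        sub = {u: [w for w in a[u] if w in verts] for u in verts}
        # one update for v, plus the recursive encodings of its neighborhood
        return 1 + sum(rec(sub, u, rs[1:]) for u in verts)

    return sum(rec(adj, v, radii) for v in adj)

# On the 6-cycle with radii (1, 1): each vertex costs 1 + 2 * 1 = 3 updates,
# i.e. 18 in total -- linear in n for bounded-degree graphs, far below n^k.
c6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
```

Doubling the cycle length doubles the update count, consistent with the $O(n c^t)$-style bound for bounded-degree graphs.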
7 An InformationTheoretic Lower Bound
In this section, we provide a general information-theoretic lower bound for graph representations that encode a given graph $G$ by first encoding a number of (possibly small) graphs $G_1, \dots, G_N$ and then aggregating the resulting representations. The sequence of graphs may be obtained in an arbitrary way from $G$. For example, in an MPNN, $G_v$ can be the computation tree (rooted tree) at node $v$. As another example, in LRP, $G_v$ is the local neighborhood around node $v$.
Formally, consider a graph representation of the form

(13) $f(G) = \rho\big(\{\!\{\psi(G_1), \dots, \psi(G_N)\}\!\}\big), \quad (G_1, \dots, G_N) = g(G),$

for any $G \in \mathcal{G}_n$, where $\rho$ is a multiset function, $g$ is a function from one graph to $N$ graphs, and $\psi$ is a function on graphs taking at most $C$ distinct values. In short, we encode $N$ graphs, and each encoding takes one of $C$ values. We call such a graph representation function an $(N, C)$-good graph representation.
Theorem 4.
Consider a parametrized class of $(N, C)$-good representations that is able to count any (not necessarily induced; the theorem also holds for induced subgraphs, with or without vertex labels) substructure with $k$ vertices. More precisely, for any graph $H$ with $k$ vertices, there exists a parametrization $f$ in the class such that whenever the (non-induced) subgraph counts of $H$ in $G_1$ and $G_2$ differ, $f(G_1) \neq f(G_2)$. Then, $N \log C = \widetilde{\Omega}(n^k)$, where $\widetilde{\Omega}(\cdot)$ hides polylogarithmic factors.
In particular, for any $(N, C)$-good graph representation with $C = 2$, i.e., binary encoding functions, we need $N = \widetilde{\Omega}(n^k)$ encoded graphs. This implies that, for $C = 2$, enumerating all $\binom{n}{k}$ subgraphs and deciding for each whether it equals $H$ is near optimal. Moreover, if $N$ is much smaller, then small encoded graphs (taking only few distinct values) would not suffice to enable counting.
More interestingly, if $C = \mathrm{poly}(n)$, then it is impossible to perform the substructure counting task with $N = \widetilde{o}(n^k)$ encoded graphs. As a result, in this case, considering $O(n^k)$ encoded graphs (as is done in $k$-GNNs or LRP networks) cannot be exponentially improved.
The lower bound in this section is information-theoretic and hence applies to any algorithm. It may be possible to strengthen it by also considering computational complexity. For binary encodings, i.e., $C = 2$, however, we know that the bound cannot be improved, since manual counting of subgraphs matches the lower bound.
8 Time Complexity Lower Bounds for Counting Subgraphs
In this section, we put our results in the context of known hardness results for subgraph counting.
In general, the subgraph isomorphism problem is known to be NP-complete. Going further, the Exponential Time Hypothesis (ETH) is a conjecture in complexity theory [38], which states that several NP-complete problems cannot be solved in subexponential time. ETH, a strengthening of the P $\neq$ NP conjecture, is widely believed to hold. Assuming that ETH holds, detecting a clique on $k$ vertices requires at least $n^{\Omega(k)}$ time [39]. This means that if a graph representation can count any subgraph of size $k$, then computing it requires at least $n^{\Omega(k)}$ time.
Corollary 1.
Assuming the ETH conjecture holds, any graph representation that can count any substructure on $k$ vertices (with appropriate parametrization) needs $n^{\Omega(k)}$ time to compute.
The above bound matches the complexity of the higher-order GNNs. Compared with Theorem 4 above, Corollary 1 is more general, while Theorem 4 has fewer assumptions and offers a refined result for aggregation-based graph representations.
Given that Corollary 1 is a worst-case bound, a natural question is whether we can do better for subclasses of graphs. Regarding the pattern $H$, hardness is not specific to cliques: even if $H$ is a random Erdős-Rényi graph, it can only be counted in super-polynomial time [40].
Regarding the input graph in which we count, consider two classes of sparse graphs: strongly sparse graphs have maximum degree $O(1)$, and weakly sparse graphs have average degree $O(1)$. We argued in Theorem 3 that RNP-GNNs achieve almost linear complexity for the class of strongly sparse graphs. For weakly sparse graphs, in contrast, the complexity of RNP-GNNs is generally not linear, but still polynomial, and can be much better than $O(n^k)$. One may ask whether it is possible to achieve a learnable graph representation whose complexity for weakly sparse graphs is still linear. Recent results in complexity theory imply that this is impossible:
Corollary 2 ([41, 42]).
There is no graph representation algorithm that runs in linear time on weakly sparse graphs and is able to count any substructure on $k$ vertices (with appropriate parametrization), for large enough $k$.
Hence, RNP-GNNs are close to optimal for several cases of counting substructures with parametrized learnable functions.
9 Conclusion
In this paper, we studied the theoretical possibility of counting substructures (induced subgraphs) with a graph representation network. We proposed an architecture, called RNP-GNN, and proved that for reasonably sparse graphs we can efficiently count substructures. Characterizing the expressive power of GNNs via the set of functions they can learn on substructures may be useful for developing new architectures. Finally, we proved a general lower bound for any graph representation that counts subgraphs by aggregating representations of a collection of graphs derived from the input graph.
Acknowledgements
This project was funded by NSF CAREER award 1553284 and an ONR MURI.
References
 [1] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini, “The graph neural network model,” IEEE Transactions on Neural Networks, vol. 20, no. 1, pp. 61–80, 2008.
 [2] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in International Conference on Learning Representations, 2017.
 [3] W. Hamilton, Z. Ying, and J. Leskovec, “Inductive representation learning on large graphs,” in Advances in neural information processing systems, pp. 1024–1034, 2017.
 [4] D. K. Duvenaud, D. Maclaurin, J. Iparraguirre, R. Bombarell, T. Hirzel, A. Aspuru-Guzik, and R. P. Adams, “Convolutional networks on graphs for learning molecular fingerprints,” in Advances in neural information processing systems, pp. 2224–2232, 2015.

 [5] M. Defferrard, X. Bresson, and P. Vandergheynst, “Convolutional neural networks on graphs with fast localized spectral filtering,” in Advances in neural information processing systems, pp. 3844–3852, 2016.
 [6] P. Battaglia, R. Pascanu, M. Lai, D. J. Rezende, et al., “Interaction networks for learning about objects, relations and physics,” in Advances in neural information processing systems, pp. 4502–4510, 2016.
 [7] W. Jin, K. Yang, R. Barzilay, and T. Jaakkola, “Learning multimodal graphtograph translation for molecule optimization,” in International Conference on Learning Representations, 2018.

 [8] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl, “Neural message passing for quantum chemistry,” in Proceedings of the 34th International Conference on Machine Learning - Volume 70, pp. 1263–1272, 2017.
 [9] K. Xu, W. Hu, J. Leskovec, and S. Jegelka, “How powerful are graph neural networks?,” in International Conference on Learning Representations, 2019.
 [10] C. Morris, M. Ritzert, M. Fey, W. L. Hamilton, J. E. Lenssen, G. Rattan, and M. Grohe, “Weisfeiler and leman go neural: Higher-order graph neural networks,” in AAAI, 2019.
 [11] Z. Chen, L. Chen, S. Villar, and J. Bruna, “Can graph neural networks count substructures?,” arXiv preprint arXiv:2002.04025, 2020.
 [12] V. Garg, S. Jegelka, and T. Jaakkola, “Generalization and representational limits of graph neural networks,” in Int. Conference on Machine Learning (ICML), pp. 5204–5215, 2020.

 [13] D. C. Elton, Z. Boukouvalas, M. D. Fuge, and P. W. Chung, “Deep learning for molecular design—a review of the state of the art,” Molecular Systems Design & Engineering, vol. 4, no. 4, pp. 828–849, 2019.
 [14] M. Sun, S. Zhao, C. Gilvary, O. Elemento, J. Zhou, and F. Wang, “Graph convolutional networks for computational drug development and discovery,” Briefings in bioinformatics, vol. 21, no. 3, pp. 919–935, 2020.
 [15] H. Maron, E. Fetaya, N. Segol, and Y. Lipman, “On the universality of invariant networks,” in International Conference on Machine Learning, pp. 4363–4371, 2019.
 [16] V. Arvind, F. Fuhlbrück, J. Köbler, and O. Verbitsky, “On weisfeiler-leman invariance: subgraph counts and related graph properties,” Journal of Computer and System Sciences, 2020.
 [17] R. Murphy, B. Srinivasan, V. Rao, and B. Ribeiro, “Relational pooling for graph representations,” in International Conference on Machine Learning (ICML 2019), 2019.
 [18] R. Murphy, B. Srinivasan, V. Rao, and B. Ribeiro, “Janossy pooling: Learning deep permutation-invariant functions for variable-size inputs,” in International Conference on Learning Representations, 2019.
 [19] W. Azizian and M. Lelarge, “Characterizing the expressive power of invariant and equivariant graph neural networks,” arXiv preprint arXiv:2006.15646, 2020.
 [20] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini, “Computational capabilities of graph neural networks,” IEEE Transactions on Neural Networks, vol. 20, no. 1, pp. 81–102, 2009.
 [21] Z. Chen, S. Villar, L. Chen, and J. Bruna, “On the equivalence between graph isomorphism testing and function approximation with gnns,” in Advances in Neural Information Processing Systems, pp. 15894–15902, 2019.
 [22] N. Keriven and G. Peyré, “Universal invariant and equivariant graph neural networks,” in Advances in Neural Information Processing Systems (NeurIPS), pp. 7092–7101, 2019.
 [23] A. Loukas, “What graph neural networks cannot learn: depth vs width,” in International Conference on Learning Representations, 2019.
 [24] R. Sato, M. Yamada, and H. Kashima, “Approximation ratios of graph neural networks for combinatorial problems,” in Advances in Neural Information Processing Systems, pp. 4081–4090, 2019.

 [25] S. Liu, M. F. Demirel, and Y. Liang, “N-gram graph: Simple unsupervised representation for graphs, with applications to molecules,” in Advances in Neural Information Processing Systems, pp. 8466–8478, 2019.
 [26] F. Monti, K. Otness, and M. M. Bronstein, “Motifnet: a motif-based graph convolutional network for directed graphs,” in 2018 IEEE Data Science Workshop (DSW), pp. 225–228, IEEE, 2018.
 [27] X. Liu, H. Pan, M. He, Y. Song, X. Jiang, and L. Shang, “Neural subgraph isomorphism counting,” in Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1959–1969, 2020.
 [28] Y. Yu, K. Huang, C. Zhang, L. M. Glass, J. Sun, and C. Xiao, “Sumgnn: Multi-typed drug interaction prediction via efficient knowledge graph summarization,” arXiv preprint arXiv:2010.01450, 2020.
 [29] C. Meng, S. C. Mouli, B. Ribeiro, and J. Neville, “Subgraph pattern neural networks for highorder graph evolution prediction.,” in AAAI, pp. 3778–3787, 2018.
 [30] L. Cotta, C. H. C. Teixeira, A. Swami, and B. Ribeiro, “Unsupervised joint node graph representations with compositional energybased models,” arXiv preprint arXiv:2010.04259, 2020.
 [31] E. Alsentzer, S. Finlayson, M. Li, and M. Zitnik, “Subgraph neural networks,” Advances in Neural Information Processing Systems, vol. 33, 2020.
 [32] K. Huang and M. Zitnik, “Graph meta learning via local subgraphs,” arXiv preprint arXiv:2006.07889, 2020.
 [33] M. Zhang and Y. Chen, “Link prediction based on graph neural networks,” in Advances in Neural Information Processing Systems, pp. 5165–5175, 2018.
 [34] Z. Ying, J. You, C. Morris, X. Ren, W. Hamilton, and J. Leskovec, “Hierarchical graph representation learning with differentiable pooling,” in Advances in neural information processing systems, pp. 4800–4810, 2018.
 [35] R. Ying, Z. Lou, J. You, C. Wen, A. Canedo, and J. Leskovec, “Neural subgraph matching,” arXiv preprint arXiv:2007.03092, 2020.
 [36] G. Bouritsas, F. Frasca, S. Zafeiriou, and M. M. Bronstein, “Improving graph neural network expressivity via subgraph isomorphism counting,” arXiv preprint arXiv:2006.09252, 2020.
 [37] C. Vignac, A. Loukas, and P. Frossard, “Building powerful and equivariant graph neural networks with messagepassing,” arXiv preprint arXiv:2006.15107, 2020.
 [38] R. Impagliazzo and R. Paturi, “On the complexity of k-sat,” Journal of Computer and System Sciences, vol. 62, no. 2, pp. 367–375, 2001.
 [39] J. Chen, B. Chor, M. Fellows, X. Huang, D. Juedes, I. A. Kanj, and G. Xia, “Tight lower bounds for certain parameterized NP-hard problems,” Information and Computation, vol. 201, no. 2, pp. 216–231, 2005.

 [40] M. Dalirrooyfard, T. D. Vuong, and V. V. Williams, “Graph pattern detection: Hardness for all induced patterns and faster non-induced cycles,” in Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pp. 1167–1178, 2019.
 [41] L. Gishboliner, Y. Levanzov, and A. Shapira, “Counting subgraphs in degenerate graphs,” arXiv preprint arXiv:2010.05998, 2020.
 [42] S. K. Bera, N. Pashanasangi, and C. Seshadhri, “Linear time subgraph counting, graph degeneracy, and the chasm at size six,” arXiv preprint arXiv:1911.05896, 2019.
 [43] P. Kelly et al., “A congruence theorem for trees.,” Pacific Journal of Mathematics, vol. 7, no. 1, pp. 961–968, 1957.
 [44] B. D. McKay, “Small graphs are reconstructible,” Australasian Journal of Combinatorics, vol. 15, pp. 123–126, 1997.
 [45] K. Hornik, M. Stinchcombe, H. White, et al., “Multilayer feedforward networks are universal approximators,” Neural networks, vol. 2, no. 5, pp. 359–366, 1989.
 [46] K. Hornik, “Approximation capabilities of multilayer feedforward networks,” Neural networks, vol. 4, no. 2, pp. 251–257, 1991.
 [47] J. Kleinberg and E. Tardos, Algorithm design. Pearson Education India, 2006.
Appendix A Proofs
A.1 Proof of Theorem 1
A.1.1 Preliminaries
Let us first state a few definitions about graph functions. Note that any graph function is isomorphism-invariant, i.e., it takes the same value on any two isomorphic graphs.
Definition 5.
Given two graph functions $f_1, f_2$, we write $f_1 \preceq f_2$ if and only if for any $G_1, G_2$,
(14) $f_2(G_1) = f_2(G_2) \implies f_1(G_1) = f_1(G_2)$,
or, equivalently,
(15) $f_1(G_1) \neq f_1(G_2) \implies f_2(G_1) \neq f_2(G_2)$.
Proposition 1.
Consider graph functions $f_1, f_2, f_3$ such that $f_1 \preceq f_2$ and $f_2 \preceq f_3$. Then, $f_1 \preceq f_3$. In other words, $\preceq$ is transitive.
Proof.
The proposition holds by definition. ∎
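As a sanity check, the transitivity in Proposition 1 can be illustrated on a toy finite domain, with integers standing in for graphs and residue maps standing in for graph functions. The helper `refines` below is a hypothetical illustration, not notation from the paper:

```python
def refines(f_coarse, f_fine, domain):
    """True iff f_fine(x) == f_fine(y) implies f_coarse(x) == f_coarse(y),
    i.e., f_coarse "precedes" f_fine in the sense of Definition 5."""
    return all(f_coarse(x) == f_coarse(y)
               for x in domain for y in domain
               if f_fine(x) == f_fine(y))

# Toy domain: integers as stand-ins for graphs.
D = range(24)
f1, f2, f3 = (lambda x: x % 2), (lambda x: x % 6), (lambda x: x % 12)

assert refines(f1, f2, D) and refines(f2, f3, D)
assert refines(f1, f3, D)  # transitivity
```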
Proposition 2.
Consider graph functions $f_1, f_2$ such that $f_1 \preceq f_2$. Then, there is a function $g$ such that $f_1 = g \circ f_2$.
Proof.
Let $\mathcal{P}_2$ be the partition induced by the equality relation with respect to the function $f_2$ on the domain of graphs. Similarly define $\mathcal{P}_1$ for $f_1$. Note that, due to the definition, $\mathcal{P}_2$ is a refinement of $\mathcal{P}_1$. Define $g$ to be the unique mapping from the image of $f_2$ to the image of $f_1$ which respects the equality relation. One can observe that such a $g$ satisfies the requirement in the proposition. ∎
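The factoring argument in this proof can be sketched computationally: on a finite domain, the map $g$ is built by tabulating, for each value of $f_2$, the corresponding value of $f_1$; the refinement property is exactly what makes this table well defined. The helper `factor_through` is a hypothetical illustration under this finite-domain assumption:

```python
def factor_through(f1, f2, domain):
    """Return g (as a dict) with f1(x) == g[f2(x)] for all x in the domain,
    or raise ValueError if f2 does not refine f1."""
    g = {}
    for x in domain:
        key = f2(x)
        if key in g and g[key] != f1(x):
            raise ValueError("f2 does not refine f1")
        g[key] = f1(x)
    return g

# Toy example: f2 = residue mod 6 refines f1 = parity.
domain = range(12)
f1 = lambda x: x % 2
f2 = lambda x: x % 6
g = factor_through(f1, f2, domain)
assert all(g[f2(x)] == f1(x) for x in domain)
```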
Definition 6.
An RNP-GNN is called maximally expressive if and only if
(i) all the aggregate functions are injective as mappings from a multiset on a countable ground set to their codomain, and
(ii) all the combine functions are injective mappings.
Proposition 3.
Consider two RNP-GNNs $\mathcal{N}_1, \mathcal{N}_2$ with the same recursion parameters, where $\mathcal{N}_2$ is maximally expressive. Then, $\mathcal{N}_1 \preceq \mathcal{N}_2$.
Proof.
The proposition holds by definition. ∎
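For intuition on Definition 6, one concrete injective aggregation over an ordered countable ground set is the sorted-tuple encoding of a multiset: two finite multisets produce the same sorted tuple only if they are equal. This is a hypothetical illustration, not the paper's construction:

```python
def aggregate(multiset):
    """Sorted-tuple encoding of a finite multiset: injective, since the
    sorted tuple determines the multiset exactly (order is forgotten,
    multiplicities are kept)."""
    return tuple(sorted(multiset))

assert aggregate([2, 1, 1]) == aggregate([1, 1, 2])  # same multiset
assert aggregate([1, 2]) != aggregate([1, 2, 2])     # different multiplicities
```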
Proposition 4.
Consider graph functions $f, f_1, \ldots, f_k$. If $f_i \preceq f$ for all $i \in [k]$, then
(16) $\sum_{i=1}^{k} a_i f_i \preceq f$
for any $a_i \in \mathbb{R}$, $i \in [k]$.
Proof.
Since $f_i \preceq f$, we have
(17) $f(G_1) = f(G_2) \implies f_i(G_1) = f_i(G_2)$
for all $i \in [k]$. This means that for any $G_1, G_2$, if $f(G_1) = f(G_2)$, then $f_i(G_1) = f_i(G_2)$ for all $i \in [k]$, and consequently $\sum_i a_i f_i(G_1) = \sum_i a_i f_i(G_2)$. Therefore, from the definition we conclude $\sum_i a_i f_i \preceq f$. Note that the same proof also holds in the case of countable summations, as long as the summation is bounded. ∎
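Proposition 4 can again be checked on a toy finite domain: if $f$ determines each $f_i$, it also determines any fixed linear combination of them. A minimal sketch, with integers as stand-ins for graphs and all function choices hypothetical:

```python
# If f determines each f_i (equal f-values imply equal f_i-values), then f
# also determines their sum.
domain = range(20)
f  = lambda x: x % 4            # the refining function
f1 = lambda x: x % 2            # determined by f, since x % 2 == (x % 4) % 2
f2 = lambda x: (x % 4) // 2     # determined by f by construction
combined = lambda x: f1(x) + f2(x)

for x in domain:
    for y in domain:
        if f(x) == f(y):
            assert combined(x) == combined(y)
```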
Definition 7.
Let be a labeled connected simple graph on vertices. For any labeled graph , the induced subgraph count function is defined as
(18) 
Also, let denote the number of non-induced subgraphs of which are isomorphic to . It can be defined via the homomorphisms from to . Formally, if define
(19) 
Otherwise, , and we define
(20) 
where
(21) 
is defined with respect to the graph isomorphism, and denotes the number of subgraphs in identical to . Note that is a finite set and denotes being a (not necessarily induced) subgraph.
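To make the induced versus non-induced distinction concrete, a brute-force counter can enumerate injective vertex maps and divide by the number of automorphisms of the pattern. The sketch below is a hypothetical illustration (not the paper's algorithm); for example, a triangle contains no induced path on three vertices but three non-induced copies of it:

```python
from itertools import combinations, permutations

def embeddings(G_edges, n, H_edges, k, induced):
    """Count injective maps phi : V(H) -> V(G) matching the pattern:
    edges agree exactly (induced) or H-edges are present (non-induced)."""
    G = {frozenset(e) for e in G_edges}
    H = {frozenset(e) for e in H_edges}
    count = 0
    for S in combinations(range(n), k):
        for phi in permutations(S):
            ok = True
            for u, v in combinations(range(k), 2):
                g = frozenset((phi[u], phi[v])) in G
                h = frozenset((u, v)) in H
                if (induced and g != h) or (not induced and h and not g):
                    ok = False
                    break
            if ok:
                count += 1
    return count

def count_copies(G_edges, n, H_edges, k, induced=True):
    """Subgraph copies of H in G: embeddings divided by |Aut(H)|."""
    aut = embeddings(H_edges, k, H_edges, k, True)
    return embeddings(G_edges, n, H_edges, k, induced) // aut

triangle = [(0, 1), (1, 2), (0, 2)]
path = [(0, 1), (1, 2)]
assert count_copies(triangle, 3, path, 3, induced=True) == 0
assert count_copies(triangle, 3, path, 3, induced=False) == 3
```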
Proposition 5.
Let be a family of graphs. If for any , there is an RNP-GNN with recursion parameters such that , then there exists an RNP-GNN with recursion parameters such that .
Proof.
Let be a maximally expressive RNP-GNN. Note that by the definition for any . Since $\preceq$ is transitive, for all , and using Proposition 4, we conclude that . ∎
The following proposition shows that there is no difference between counting induced labeled graphs and counting induced unlabeled graphs with RNP-GNNs.
Proposition 6.
Let be an unlabeled connected graph. Assume that for any labeled graph , which is constructed by adding arbitrary labels to , there exists an RNP-GNN such that . Then, for its unlabeled counterpart , there exists an RNP-GNN with the same recursion parameters as such that .
Proof.
If there exists an RNP-GNN such that , then for a maximally expressive RNP-GNN with the same recursion parameters as we also have . Let be the set of all labeled graphs up to graph isomorphism, where for a countable set . Note that is a countable set. Now we write
(22)  
(23)  
(24)  
(25) 
Now using Proposition 4 we conclude that since is always finite. ∎
Definition 8.
Let be a (possibly labeled) simple connected graph. For any and , define
(27) 
Definition 9.
Let be a (possibly labeled) connected simple graph on vertices. A permutation of vertices, such as , is called a vertex covering sequence, with respect to a sequence , called a covering sequence, if and only if
(28) 
for , where and . Let denote the set of all vertex covering sequences with respect to the covering sequence for .
Proposition 7.
For any , if (non-induced subgraph), then
(29) 
for any sequence .
Proof.
The proposition follows from the fact that the function is decreasing as new edges are introduced. ∎
Proposition 8.
Assume that Theorem 1 holds for induced-subgraph count functions. Then, it also holds for non-induced subgraph count functions.
Proof.
Assume that for a connected (labeled or unlabeled) graph , there exists an RNP-GNN with appropriate recursion parameters such that . We then prove that there exists an RNP-GNN with the same recursion parameters as such that .
If there exists an RNP-GNN such that , then for a maximally expressive RNP-GNN with the same recursion parameters as we also have . Note that
(30)  
(31)  
(32)  
(33) 
where .
Claim 1.
for any .
To conclude this part, let us introduce an important piece of notation. For any labeled connected simple graph on vertices , let be the induced graph obtained after removing from , with the new labels defined as
(34) 
for each . We may also use for more clarification.
A.1.2 Proof of Theorem 1
We proceed by an inductive proof on , which is the length of the covering sequence of . Equivalently, due to the definition, , where is the number of vertices in . First, we note that due to Proposition 8, without loss of generality, we can assume that is a simple connected labeled graph, and the goal is to achieve the induced-subgraph count function via an RNP-GNN with appropriate recursion parameters. We also consider only maximally expressive networks here to prove the desired result.
Induction base. For the induction base, i.e., , is a two-vertex graph. This means that we only need to count the number of occurrences of a specific (labeled) edge in the given graph . Note that in this case we apply an RNP-GNN with recursion parameter . Denote the two labels of the vertices in by . The output of an RNP-GNN is
(35) 
where we assume that is maximally expressive. The goal is to show that . Using the transitivity of , we only need to choose appropriate to achieve as the final representation. Let
(36)  
(37)  
(38) 
Then, a simple computation shows that
(39)  
(40) 
Since is an RNP-GNN with recursion parameter and for any maximally expressive RNP-GNN with the same recursion parameter as we have and , we conclude that , and this completes the proof.
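The two-vertex base case can be illustrated directly: counting edges with a prescribed pair of endpoint labels requires only one round of neighborhood aggregation. A minimal sketch under an adjacency-dictionary representation, with the helper name `count_labeled_edges` being hypothetical:

```python
def count_labeled_edges(adj, labels, a, b):
    """Count edges whose endpoint labels are {a, b}: each vertex with label a
    aggregates the number of its neighbors with label b (one message-passing
    round); an edge with a == b is seen from both ends, hence the halving."""
    total = sum(1 for v, nbrs in adj.items() if labels[v] == a
                  for u in nbrs if labels[u] == b)
    return total // 2 if a == b else total

# Path 0-1-2 with labels x, y, x: two (x, y)-edges.
adj = {0: [1], 1: [0, 2], 2: [1]}
labels = {0: "x", 1: "y", 2: "x"}
assert count_labeled_edges(adj, labels, "x", "y") == 2

# Triangle with all labels x: three (x, x)-edges.
tri = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
assert count_labeled_edges(tri, {0: "x", 1: "x", 2: "x"}, "x", "x") == 3
```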
Induction step. Assume that the desired result holds for (). We show that it also holds for . Let us first define
(41)  
(42) 
Note that by the assumption. Let
(43) 
For all , using the induction hypothesis, there is a (universal) RNP-GNN with recursion parameters such that . Using Proposition 4 we conclude
(44) 
Define a maximally expressive RNP-GNN with the recursion parameters as follows:
(45) 
Similar to the proof for , here we only need to propose a (not necessarily maximally expressive) RNP-GNN which achieves the function .
Let us define
(46) 
where
(47)  
(48) 
and . Note that the existence of such a function is guaranteed by Proposition 2. Now we write
(50)  
(51)  
(52)  
(53)  