1 Introduction
1.1 Background and Motivation
In this study, we consider a graph regression problem. Let $\mathcal{G}$ be the set of simple directed graphs with vertex and edge weights.
Problem 1.
We are given pairs of input graphs and outcomes $(G_1, y_1), \dots, (G_N, y_N)$. The task is to learn a hypothesis $f$ such that $f(G_i) \approx y_i$ for $i = 1, \dots, N$.
This problem naturally arises in practice. For example, in a toxicity detection problem [23], we want to learn a function $f: \mathcal{G} \to \mathbb{R}$ such that $f(G) = 1$ if $G$ has a toxicity and $f(G) = 0$ otherwise. For another example, in a community detection problem [1], we want to learn a function $f: \mathcal{G} \to \mathbb{R}^{V^{\underline{2}}}$ such that $f(G)(u, v) = 1$ if the vertices $u$ and $v$ are in the same community and $f(G)(u, v) = 0$ otherwise. Here, for a finite set $V$ and a positive integer $k$, we denote by $V^{\underline{k}}$ the set of $k$-tuples of pairwise distinct elements of $V$. (We only consider the values of equivariant functions at pairwise distinct indices; the general case is easily handled by considering each pattern separately, but this complicates the notation, so we concentrate on this case.)
We are often interested in a hypothesis that depends only on the topology of the graph (see Section 1.3). Mathematically, this condition is represented by invariance and equivariance. A function $f: \mathcal{G} \to \mathbb{R}$ is invariant if for any graph $G \in \mathcal{G}$ and any permutation $\pi$ of its vertices, the following equation holds:
$$f(G^{\pi}) = f(G), \tag{1}$$
where $G^{\pi}$ is the graph whose vertex indices are permuted by $\pi$, i.e., its weighted adjacency matrix $A^{\pi}$ satisfies $A^{\pi}(u, v) = A(\pi(u), \pi(v))$. A function $f: \mathcal{G} \to \mathbb{R}^{V^{\underline{2}}}$ is equivariant if for any graph $G \in \mathcal{G}$ and any permutation $\pi$ of its vertices, the following equation holds:
$$f(G^{\pi}) = f(G)^{\pi}, \tag{2}$$
where $f(G)^{\pi}$ is defined by the relation $f(G)^{\pi}(u, v) = f(G)(\pi(u), \pi(v))$ for all $u, v$. Invariance and equivariance mean that the output of the function is determined up to isomorphism.
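As a quick numerical illustration of (1) and (2) (a minimal sketch, not part of the paper; the two toy functions below are our own examples), one can permute a weighted adjacency matrix and check that an invariant output is unchanged while an equivariant output is permuted in the same way.

```python
import numpy as np

def permute_graph(A, perm):
    """Return A^pi with A^pi[i, j] = A[perm[i], perm[j]]."""
    P = np.eye(len(perm))[perm]          # permutation matrix
    return P @ A @ P.T

def f_inv(A):
    """An invariant toy function: total edge weight."""
    return A.sum()

def f_eqv(A):
    """An equivariant toy function: out-degree of i plus in-degree of j at (i, j)."""
    return A.sum(axis=1)[:, None] + A.sum(axis=0)[None, :]

rng = np.random.default_rng(0)
A = rng.random((4, 4))
perm = rng.permutation(4)

assert np.isclose(f_inv(permute_graph(A, perm)), f_inv(A))      # Eq. (1)
assert np.allclose(f_eqv(permute_graph(A, perm)),
                   permute_graph(f_eqv(A), perm))                # Eq. (2)
```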
One desirable property of a hypothesis space is the universal approximation property (universality for short) [3, 5], i.e., any continuous function can be approximated arbitrarily accurately by a function in the hypothesis space. Maron et al. [14] introduced a feed-forward invariant neural network and proved that this model has universality for continuous invariant functions. They also characterized all the invariant linear layers [13]. Keriven and Peyré [6] extended the tensorized graph neural network (Figure 1(a)) to represent equivariant functions and proved its universality for the first-order (vertex-wise) output case. They left the universality for the higher-order output case as an open problem.
[Figure 1: the tensorized graph neural network.]
1.2 Contribution
In this study, we give a simple proof of the universality of tensorized neural networks for both invariant and (higher-order) equivariant cases; the latter solves an open problem posed in [6]. Our proof relies on a result in graph theory (see Section 3 for a comparison of proof techniques in the existing studies).
Let $\mathcal{F}$ be the set of simple unweighted graphs, and let $F = (V(F), E(F)) \in \mathcal{F}$. Let $G = (V(G), A)$ be a weighted graph, where $A$ is the weighted adjacency matrix. Let $V(G)^{V(F)}$ be the set of all mappings from $V(F)$ to $V(G)$. Then, the homomorphism number is defined by
$$\hom(F, G) = \sum_{\phi \in V(G)^{V(F)}} \ \prod_{(u, v) \in E(F)} A(\phi(u), \phi(v)). \tag{3}$$
Similarly, for a $k$-labeled simple graph $F$ (a graph with $k$ distinguished vertices labeled $1, \dots, k$) and a given tuple $(i_1, \dots, i_k)$ of pairwise distinct vertices of $G$, the $k$-labeled homomorphism number is defined by
$$\hom_{i_1, \dots, i_k}(F, G) = \sum_{\substack{\phi \in V(G)^{V(F)} \\ \phi(1) = i_1, \dots, \phi(k) = i_k}} \ \prod_{(u, v) \in E(F)} A(\phi(u), \phi(v)). \tag{4}$$
By definition, homomorphism numbers and $k$-labeled homomorphism numbers are continuous functions of $A$ that are invariant and equivariant, respectively.
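The following brute-force sketch (our own illustration; the function names and the edge-list representation of $F$ are assumptions, not the paper's notation) computes the homomorphism number (3) by summing over all vertex maps $V(F) \to V(G)$, and the $k$-labeled version (4) by fixing the images of the $k$ labeled vertices.

```python
import itertools
import numpy as np

def hom(F_edges, n_F, A):
    """Homomorphism number hom(F, G): F given by an edge list on vertices 0..n_F-1,
    G given by its weighted adjacency matrix A."""
    n = A.shape[0]
    total = 0.0
    for phi in itertools.product(range(n), repeat=n_F):    # all maps V(F) -> V(G)
        total += np.prod([A[phi[u], phi[v]] for (u, v) in F_edges])
    return total

def hom_labeled(F_edges, n_F, A, targets):
    """k-labeled homomorphism number: vertices 0..k-1 of F must map to `targets`."""
    n, k = A.shape[0], len(targets)
    total = 0.0
    for tail in itertools.product(range(n), repeat=n_F - k):
        phi = tuple(targets) + tail
        total += np.prod([A[phi[u], phi[v]] for (u, v) in F_edges])
    return total

# Example: F is the directed path 0 -> 1 -> 2, G is a random weighted digraph.
A = np.random.default_rng(1).random((3, 3))
print(hom([(0, 1), (1, 2)], 3, A))                   # invariant under relabeling of G
print(hom_labeled([(0, 1), (1, 2)], 3, A, (0, 2)))   # 2-labeled, equivariant
```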
Let $\mathcal{G}_n$ be the set of weighted directed graphs whose edge weights are bounded by one and whose number of vertices is $n$. Let $\mathcal{H}_{\mathrm{inv}}$ and $\mathcal{H}_{\mathrm{eqv}}$ be the sets of functions of the following forms:
$$f(G) = \sum_{F} \alpha_F \, \hom(F, A + I), \tag{5}$$
$$f(G)(i_1, \dots, i_k) = \sum_{F} \alpha_F \, \hom_{i_1, \dots, i_k}(F, A + I), \tag{6}$$
where both sums are finite with coefficients $\alpha_F \in \mathbb{R}$, the sum in (5) runs over simple unweighted graphs $F$ and the sum in (6) over $k$-labeled simple unweighted graphs $F$, $A$ is the weighted adjacency matrix of $G$, and $I$ denotes the identity matrix of order $n$. We prove the following theorems.

Theorem 2.
$\mathcal{H}_{\mathrm{inv}}$ is dense in the set of continuous invariant functions.
Theorem 3.
$\mathcal{H}_{\mathrm{eqv}}$ is dense in the set of continuous equivariant functions.
The translated homomorphism number $\hom(F, A + I)$ and the translated $k$-labeled homomorphism number $\hom_{i_1, \dots, i_k}(F, A + I)$ are an invariant and an equivariant linear function of the $|E(F)|$-fold tensor power of $A + I$, respectively. Therefore, we can implement them in a tensorized neural network. This means that our model is at most as powerful as the tensorized graph neural network. On the other hand, because our models have universality, we obtain the following.
Corollary 4.
The invariant (resp., equivariant) tensorized graph neural network has universality for continuous invariant (resp., equivariant) functions. ∎
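To make the invariant model class (5) concrete, the following sketch (illustrative, not the authors' code; the motif set, the random target, and the least-squares read-out are our own choices) maps each graph to a feature vector of translated homomorphism numbers $\hom(F, A + I)$ over a few small motifs $F$ and fits a linear read-out on top, which is exactly a function of the form (5).

```python
import itertools
import numpy as np

def hom(F_edges, n_F, A):
    """Brute-force homomorphism number hom(F, A)."""
    n = A.shape[0]
    return sum(
        np.prod([A[phi[u], phi[v]] for (u, v) in F_edges])
        for phi in itertools.product(range(n), repeat=n_F)
    )

MOTIFS = [([], 1),                            # single vertex
          ([(0, 1)], 2),                      # directed edge
          ([(0, 1), (1, 2)], 3),              # directed 2-path
          ([(0, 1), (1, 2), (2, 0)], 3)]      # directed triangle

def features(A):
    """Translated homomorphism numbers hom(F, A + I), as in Eq. (5)."""
    return np.array([hom(E, k, A + np.eye(A.shape[0])) for (E, k) in MOTIFS])

rng = np.random.default_rng(0)
graphs = [rng.random((4, 4)) for _ in range(50)]
y = np.array([A.sum() ** 2 for A in graphs])  # some invariant target to regress
X = np.stack([features(A) for A in graphs])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # linear read-out over the motif features
print(np.abs(X @ coef - y).mean())            # fit error of this invariant model
```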
1.3 Related Work
In practice, the design of a machine learning model (e.g., neural networks) usually follows some prior knowledge about the target functions, since restriction bias helps to simplify the learning process. For instance, in image processing, convolutional neural networks [10] are designed to be translation invariant [9] or shift-invariant [22]. Therefore, much research has been conducted to address the universality of general invariant neural networks. More recently, graph neural networks predicting labels of vertices [4, 7, 19] have hinted at the importance of equivariant models. A natural question in learning theory is whether these aforementioned models are universal [5, 13]. Here, we discuss related work that answered this question.

Invariant models
The invariance property of a model is usually discussed in the context of learning from point clouds and sets [16, 21, 18], then generalized to symmetries [13] and group actions [2]. While universality analyses for models on sets are well developed, the analysis for graphs is limited [13, 14, 8]. Recently, Maron et al. [14] proved that a neural network that is $G$-invariant (for a permutation group $G$) is universal. Similarly, Keriven and Peyré [6] obtained the universality result on tensorized graph neural networks by a more direct application of the Stone–Weierstrass theorem.
Equivariant models
The existence of equivariant models only makes practical sense in some specific cases, for example, learning on graphs’ vertices. Therefore, compared with the invariant case, there are only a limited number of works addressing equivariance [17, 6, 18]. Consequently, the universality of equivariant graph models was only recently proven by Keriven and Peyré [6].
2 Proofs
To prove the universality of a class of functions, we use the Stone–Weierstrass theorem:
Theorem 5 (Stone–Weierstrass Theorem [15, Theorem 1.1]).
Let $X$ be a compact Hausdorff space and $C(X)$ be the set of continuous functions from $X$ to $\mathbb{R}$, equipped with the $\infty$-norm. If a subalgebra $\mathcal{A} \subseteq C(X)$ satisfies the following two conditions:
- $\mathcal{A}$ separates points, i.e., for any $x \neq y \in X$, there exists $f \in \mathcal{A}$ such that $f(x) \neq f(y)$, and
- there exists $f \in \mathcal{A}$ that is bounded away from zero, i.e., $\inf_{x \in X} |f(x)| > 0$,
then $\mathcal{A}$ is dense in $C(X)$. ∎
The proofs are devoted to verifying the conditions of the Stone–Weierstrass theorem.
2.1 Proof of Theorem 2 (Invariant Case)
We first define the graph space. Let $n$ be the number of vertices of the input graphs. Let
$$\mathcal{X} = [0, 1]^{n \times n} \tag{7}$$
be the set of the weighted adjacency matrices. We denote by $\|\cdot\|$ a norm on $\mathcal{X}$ (because $\mathcal{X}$ lies in a finite-dimensional vector space, all norms are equivalent; thus, the result in this section is invariant with respect to the choice of the norm); for concreteness, we take the entrywise maximum norm
$$\|A\| = \max_{u, v \in [n]} |A(u, v)|. \tag{8}$$
Then, we introduce the edit distance by
$$d(A, B) = \min_{\pi \in S_n} \|A - B^{\pi}\|, \tag{9}$$
where $S_n$ is the set of all permutations on $[n] = \{1, \dots, n\}$ and $B^{\pi}(u, v) = B(\pi(u), \pi(v))$. The edit distance is nonnegative and satisfies the triangle inequality, i.e., it is a pseudo-metric. We define the graph space by the metric identification as $\mathcal{X}/{\sim}$, where $A \sim B$ if and only if $d(A, B) = 0$. This forms a metric space.
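A brute-force sketch of the edit distance (9) (illustrative, not from the paper; the entrywise maximum norm is one valid choice, since all norms on $\mathcal{X}$ are equivalent): minimize the norm of the difference over all relabelings of the second graph.

```python
import itertools
import numpy as np

def edit_distance(A, B):
    """Edit distance (9): smallest max-norm difference over relabelings of B."""
    n = A.shape[0]
    best = np.inf
    for perm in itertools.permutations(range(n)):
        P = np.eye(n)[list(perm)]                       # permutation matrix
        best = min(best, np.abs(A - P @ B @ P.T).max()) # ||A - B^pi||
    return best

A = np.array([[0., 1.], [0., 0.]])
B = np.array([[0., 0.], [1., 0.]])   # the same digraph with the two vertices swapped
print(edit_distance(A, B))           # 0.0: A and B are identified in the graph space
```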
Any invariant function is identified with a function on $\mathcal{X}/{\sim}$. Our goal is to prove the universality of $\mathcal{H}_{\mathrm{inv}}$ for the set of continuous functions from $\mathcal{X}/{\sim}$ to $\mathbb{R}$. Now, we check the conditions of the Stone–Weierstrass theorem.
First, we check the condition of the space.
Lemma 6.
The graph space is a compact Hausdorff space.
Proof.
It is Hausdorff because it is a metric space. We show sequential compactness. Let $(A_m)_{m \in \mathbb{N}}$ be an arbitrary sequence in $\mathcal{X}/{\sim}$, which is also identified with a sequence in $\mathcal{X}$. Because $\mathcal{X}$ is compact in the norm, we can choose a convergent subsequence. Such a subsequence is also a convergent subsequence in $\mathcal{X}/{\sim}$. Thus, $\mathcal{X}/{\sim}$ is compact. ∎
Next, we check the conditions on $\mathcal{H}_{\mathrm{inv}}$.
Lemma 7.
$\mathcal{H}_{\mathrm{inv}}$ forms an algebra.
Proof.
Clearly, it is closed under addition and scalar multiplication. It is closed under the product because of the following identity:
$$\hom(F_1, G)\, \hom(F_2, G) = \hom(F_1 \sqcup F_2, G), \tag{10}$$
where $F_1 \sqcup F_2$ is the disjoint union of $F_1$ and $F_2$. ∎
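Identity (10) holds because a homomorphism from a disjoint union is exactly a pair of independent homomorphisms into $G$. The following small numerical check (our own sketch, with arbitrary example motifs) verifies it on a random weighted digraph.

```python
import itertools
import numpy as np

def hom(F_edges, n_F, A):
    """Brute-force homomorphism number hom(F, A)."""
    n = A.shape[0]
    return sum(
        np.prod([A[phi[u], phi[v]] for (u, v) in F_edges])
        for phi in itertools.product(range(n), repeat=n_F)
    )

A = np.random.default_rng(2).random((3, 3))
F1, k1 = [(0, 1)], 2                                   # a single directed edge
F2, k2 = [(0, 1), (1, 2)], 3                           # a directed 2-path
F12 = F1 + [(u + k1, v + k1) for (u, v) in F2]         # disjoint union F1 ⊔ F2
assert np.isclose(hom(F1, k1, A) * hom(F2, k2, A), hom(F12, k1 + k2, A))
```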
Lemma 8.
$\mathcal{H}_{\mathrm{inv}}$ contains an element that is bounded away from zero.
Proof.
Let $F_0$ be the singleton graph (one vertex, no edges). Then, $\hom(F_0, A + I) = n \geq 1$ is bounded away from zero. ∎
To prove that $\mathcal{H}_{\mathrm{inv}}$ separates points, we use the following theorem.
Theorem 9 ([11, Lemma 2.4], the $k = 0$ case, in our terminology).
Let $A, B \in \mathbb{R}^{n \times n}$ be matrices with positive diagonal elements. Then, $A$ and $B$ are isomorphic if and only if $\hom(F, A) = \hom(F, B)$ for all simple unweighted graphs $F$. ∎
Lemma 10.
$\mathcal{H}_{\mathrm{inv}}$ separates points in $\mathcal{X}/{\sim}$.
Proof.
If $A, B \in \mathcal{X}$ are non-isomorphic, then $A + I$ and $B + I$ are also non-isomorphic. Because $A + I$ and $B + I$ satisfy the condition in Theorem 9 (their diagonal elements are positive), there exists a simple graph $F$ such that $\hom(F, A + I) \neq \hom(F, B + I)$. This means that $\mathcal{H}_{\mathrm{inv}}$ separates points in $\mathcal{X}/{\sim}$. ∎
Therefore, we proved Theorem 2.
2.2 Proof of Theorem 3 (Equivariant Case)
We identify an array-valued function $f: \mathcal{X} \to \mathbb{R}^{[n]^{\underline{k}}}$ with a two-argument function $\tilde f: \mathcal{X} \times [n]^{\underline{k}} \to \mathbb{R}$. Let $\mathcal{X}_k = \mathcal{X} \times [n]^{\underline{k}}$. Then, each element $(A, i_1, \dots, i_k) \in \mathcal{X}_k$ is identified with a $k$-labeled graph, which is a graph with $k$ distinguished vertices $i_1, \dots, i_k$.
For a permutation $\pi \in S_n$, we define
$$(A, i_1, \dots, i_k)^{\pi} = (A^{\pi}, \pi^{-1}(i_1), \dots, \pi^{-1}(i_k)), \tag{11}$$
where $A^{\pi}(u, v) = A(\pi(u), \pi(v))$ as before. Then, $f$ is equivariant if and only if $\tilde f$ is invariant in the sense that $\tilde f((A, i_1, \dots, i_k)^{\pi}) = \tilde f(A, i_1, \dots, i_k)$. We say that $(A, i_1, \dots, i_k)$ and $(B, j_1, \dots, j_k)$ are isomorphic if $(B, j_1, \dots, j_k)^{\pi} = (A, i_1, \dots, i_k)$ for some permutation $\pi$.
Now we define the $k$-labeled edit distance by
$$d_k\big((A, i_1, \dots, i_k), (B, j_1, \dots, j_k)\big) = \min_{\substack{\pi \in S_n \\ \pi(i_l) = j_l \ (l = 1, \dots, k)}} \|A - B^{\pi}\|. \tag{12}$$
Then, we define the $k$-labeled graph space by the metric identification as $\mathcal{X}_k/{\sim}$, where $(A, i_1, \dots, i_k) \sim (B, j_1, \dots, j_k)$ if and only if $d_k((A, i_1, \dots, i_k), (B, j_1, \dots, j_k)) = 0$, i.e., these are isomorphic. This forms a metric space.
Any equivariant function is identified with a function on $\mathcal{X}_k/{\sim}$. Our goal is to prove the universality of $\mathcal{H}_{\mathrm{eqv}}$ for the set of continuous functions from $\mathcal{X}_k/{\sim}$ to $\mathbb{R}$. Now, we check the conditions of the Stone–Weierstrass theorem. This part is very similar to the invariant case.
First, we check the condition of the space.
Lemma 11.
The $k$-labeled graph space is compact.
Proof.
It is Hausdorff because it is a metric space. We show sequential compactness. Let $((A_m, i_1^{(m)}, \dots, i_k^{(m)}))_{m \in \mathbb{N}}$ be an arbitrary sequence in $\mathcal{X}_k/{\sim}$, which is also identified with a sequence in $\mathcal{X}_k$. Because the number of possibilities of the label tuple $(i_1^{(m)}, \dots, i_k^{(m)})$ is finite, we can select an (infinite) subsequence that has the same value of the label tuple. The remaining part is the same as the proof of Lemma 6. ∎
Next, we check the conditions on $\mathcal{H}_{\mathrm{eqv}}$.
Lemma 12.
$\mathcal{H}_{\mathrm{eqv}}$ forms an algebra.
Proof.
Clearly, it is closed under addition and scalar multiplication. It is closed under the product because of the following identity:
$$\hom_{i_1, \dots, i_k}(F_1, G)\, \hom_{i_1, \dots, i_k}(F_2, G) = \hom_{i_1, \dots, i_k}(F_1 F_2, G), \tag{13}$$
where $F_1 F_2$ is the graph obtained from the disjoint union of $F_1$ and $F_2$ by glueing the labeled vertices. ∎
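Identity (13) holds because, once the labeled vertices are pinned to $i_1, \dots, i_k$, the unlabeled parts of $F_1$ and $F_2$ map independently. The following numerical check (our own sketch with arbitrary example motifs, redefining the brute-force labeled homomorphism number for self-containedness) verifies it for $k = 2$.

```python
import itertools
import numpy as np

def hom_labeled(F_edges, n_F, A, targets):
    """Brute-force k-labeled homomorphism number: vertices 0..k-1 map to `targets`."""
    n, k = A.shape[0], len(targets)
    total = 0.0
    for tail in itertools.product(range(n), repeat=n_F - k):
        phi = tuple(targets) + tail
        total += np.prod([A[phi[u], phi[v]] for (u, v) in F_edges])
    return total

A = np.random.default_rng(3).random((3, 3))
# Two 2-labeled graphs; the labeled vertices are 0 and 1.
F1, k1 = [(0, 2), (2, 1)], 3          # path 0 -> 2 -> 1 through one unlabeled vertex
F2, k2 = [(0, 1)], 2                  # the edge 0 -> 1
# Glueing identifies the labeled vertices; the unlabeled vertex of F1 stays vertex 2.
F_glued, k_glued = [(0, 2), (2, 1), (0, 1)], 3
for (i, j) in itertools.permutations(range(3), 2):
    lhs = hom_labeled(F1, k1, A, (i, j)) * hom_labeled(F2, k2, A, (i, j))
    rhs = hom_labeled(F_glued, k_glued, A, (i, j))
    assert np.isclose(lhs, rhs)       # Eq. (13)
```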
Lemma 13.
$\mathcal{H}_{\mathrm{eqv}}$ contains an element that is bounded away from zero.
Proof.
Let $F_0$ be the $k$-labeled graph consisting of $k$ isolated labeled vertices. Then, $\hom_{i_1, \dots, i_k}(F_0, A + I) = 1$ is bounded away from zero. ∎
To prove that $\mathcal{H}_{\mathrm{eqv}}$ separates points, we use the following theorem.
Theorem 14 ([11, Lemma 2.4] in our terminology).
Let $A, B \in \mathbb{R}^{n \times n}$ be matrices with positive diagonal elements. Let $(i_1, \dots, i_k), (j_1, \dots, j_k) \in [n]^{\underline{k}}$. Then, $(A, i_1, \dots, i_k)$ and $(B, j_1, \dots, j_k)$ are isomorphic if and only if $\hom_{i_1, \dots, i_k}(F, A) = \hom_{j_1, \dots, j_k}(F, B)$ for all $k$-labeled simple unweighted graphs $F$. ∎
Lemma 15.
$\mathcal{H}_{\mathrm{eqv}}$ separates points in $\mathcal{X}_k/{\sim}$; the proof is analogous to that of Lemma 10, with Theorem 14 in place of Theorem 9.
Therefore, we proved Theorem 3.
3 Comparison with Other Proofs
Comparison with Keriven and Peyré [6].
Our proofs are similar to theirs. For the invariant case, they used the standard Stone–Weierstrass theorem and verified the point-separation property by constructing functions on a higher-order tensor space. For the equivariant case, they developed a new Stone–Weierstrass-type theorem and verified the corresponding point-separation property by a technique similar to the invariant case. On the other hand, for both cases, we used the standard Stone–Weierstrass theorem and verified the point-separation property using a property of the homomorphism number. This unified treatment allows us to establish the result for arbitrary higher-order outputs.
One advantage of their method is that it is applicable to hypergraphs. Our method could also be applicable to hypergraphs; however, there is a gap because the theory of weighted homomorphism numbers of hypergraphs is not as well established as that of graphs.
Note that they considered graphs with different but bounded numbers of vertices. However, this is not an essential generalization because such a space is disconnected, and each connected component corresponds to the graphs having the same number of vertices. If we have to consider a set of graphs with different numbers of vertices, it is promising to consider graphons; see Section 4.
Comparison with Maron et al. [13].
They only considered the invariant case. They used the universality of symmetric polynomials by Yarotsky [20]. Then, they approximated the polynomials by a tensorized neural network.
One advantage of their method is that one can bound the order of tensors. Our method can also bound the order of tensors by bounding the size of the subgraphs [12, Theorem 5.33]; however, this may give a loose bound. On the other hand, our method shows that a very restricted form of the linear invariant (or equivariant) layers is sufficient to obtain universality.
4 Concluding Remarks
In this study (and the existing studies [13, 6]), the number of vertices in the input graphs is fixed. This is reasonable because the graph space is disconnected and each connected component corresponds to the graphs with the same number of vertices; hence, a continuous function on the graph space of graphs with different numbers of vertices is just a collection of continuous functions on the connected components.
If we want to consider graphs of different numbers of vertices, it is promising to consider graphons [12]. Below we explain that all the results obtained in this paper can be extended to graphons.
An (asymmetric) graphon is a measurable function $W: [0, 1]^2 \to [0, 1]$. This is a continuous generalization of the weighted adjacency matrix. The set of graphons is denoted by $\mathcal{W}$. The cut-norm is defined by
$$\|W\|_{\square} = \sup_{S, T \subseteq [0, 1]} \left| \int_{S \times T} W(x, y) \, dx \, dy \right|, \tag{14}$$
where the supremum is over measurable sets $S$ and $T$,
and the cut-distance is defined by
$$\delta_{\square}(W, W') = \inf_{\phi} \|W - W'^{\phi}\|_{\square}, \tag{15}$$
where $\phi$ runs over all measure-preserving bijections on $[0, 1]$ and $W'^{\phi}(x, y) = W'(\phi(x), \phi(y))$. The graphon space is defined by the metric identification.
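As an illustration of definition (14) (our own sketch, not from the paper): for a step-function graphon given by an $n \times n$ matrix of blocks of measure $1/n$, the supremum in the cut-norm is attained on unions of blocks, so it can be computed by brute force over subsets of blocks.

```python
import itertools
import numpy as np

def cut_norm_step(W):
    """Cut-norm (14) of the step graphon whose value on block (i, j) is W[i, j]."""
    n = W.shape[0]
    best = 0.0
    for S in itertools.product([0, 1], repeat=n):      # S as a union of row blocks
        for T in itertools.product([0, 1], repeat=n):  # T as a union of column blocks
            val = abs((np.outer(S, T) * W).sum()) / n ** 2   # |∫_{S×T} W dx dy|
            best = max(best, val)
    return best

W = np.array([[0.9, 0.1],
              [0.1, 0.9]])        # a 2-block step graphon
print(cut_norm_step(W))           # 0.5 here: all entries are nonnegative, so S = T = [0, 1]
```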
The graphon space contains arbitrarily large graphs (as step functions). However, it is still compact with respect to the cut distance. Note that this does not hold for the edit distance.
Theorem 16 ([12, Theorem 9.23]).
The graphon space is compact.
For graphons, we use the homomorphism density instead of the homomorphism number. Let $F$ be a simple unweighted graph. Then, the homomorphism density is given by
$$t(F, W) = \int_{[0, 1]^{V(F)}} \prod_{(u, v) \in E(F)} W(x_u, x_v) \prod_{u \in V(F)} dx_u. \tag{16}$$
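A Monte Carlo sketch of the homomorphism density (16) (illustrative; the graphon $W$ below is an arbitrary example of our own): sample the vertex positions $x_u$ uniformly from $[0, 1]$ and average the product of $W(x_u, x_v)$ over the edges of $F$.

```python
import numpy as np

def hom_density_mc(F_edges, n_F, W, n_samples=200_000, seed=0):
    """Monte Carlo estimate of t(F, W) as in Eq. (16)."""
    rng = np.random.default_rng(seed)
    x = rng.random((n_samples, n_F))          # x_u ~ Uniform[0, 1] for each vertex u
    vals = np.ones(n_samples)
    for (u, v) in F_edges:
        vals *= W(x[:, u], x[:, v])           # multiply the edge factors W(x_u, x_v)
    return vals.mean()

W = lambda x, y: 0.5 * (x + y)                # a simple example graphon
print(hom_density_mc([(0, 1), (1, 2)], 3, W)) # density of the directed 2-path, ≈ 13/48
```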
The set of finite linear combinations of the homomorphism densities also forms a unital algebra. To show the point-separation property, we can use the following theorem.
Theorem 17 (Directed version of [12, Corollary 10.34]).
Let $W, W' \in \mathcal{W}$. Then, $W$ and $W'$ are isomorphic (i.e., $\delta_{\square}(W, W') = 0$) if and only if $t(F, W) = t(F, W')$ for all simple unweighted graphs $F$. ∎
Therefore, we obtain the following result.
Theorem 18.
The set of finite linear combinations of the homomorphism densities is dense in the set of continuous invariant graphon functions. ∎
Note that this fact has already been proved in [12, Theorem 17.6] in a different context (for symmetric graphons).
The equivariant case is also handled by considering $k$-labeled graphons. The $k$-labeled homomorphism density is given by
$$t_{x_1, \dots, x_k}(F, W) = \int_{[0, 1]^{V(F) \setminus [k]}} \prod_{(u, v) \in E(F)} W(x_u, x_v) \prod_{u \in V(F) \setminus [k]} dx_u, \tag{17}$$
where the first $k$ vertices of $F$ are labeled and their positions $x_1, \dots, x_k \in [0, 1]$ are fixed.
Then, we obtain the following result.
Theorem 19.
The set of finite linear combinations of the $k$-labeled homomorphism densities is dense in the set of continuous equivariant graphon functions. ∎
References
- [1] Joan Bruna and X Li. Community detection with graph neural networks. stat, 1050:27, 2017.
- [2] Taco Cohen and Max Welling. Group equivariant convolutional networks. In International conference on machine learning, pages 2990–2999, 2016.
- [3] George Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2(4):303–314, 1989.
- [4] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in neural information processing systems, pages 3844–3852, 2016.
- [5] Kurt Hornik. Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2):251–257, 1991.
- [6] Nicolas Keriven and Gabriel Peyré. Universal invariant and equivariant graph neural networks. arXiv preprint arXiv:1905.04943, 2019.
- [7] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations, 2017.
- [8] Risi Kondor and Shubhendu Trivedi. On the generalization of equivariance and convolution in neural networks to the action of compact groups. arXiv preprint arXiv:1802.03690, 2018.
- [9] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.
- [10] Yann LeCun, Bernhard Boser, John S Denker, Donnie Henderson, Richard E Howard, Wayne Hubbard, and Lawrence D Jackel. Backpropagation applied to handwritten zip code recognition. Neural computation, 1(4):541–551, 1989.
- [11] László Lovász. The rank of connection matrices and the dimension of graph algebras. European Journal of Combinatorics, 27(6):962–970, 2006.
- [12] László Lovász. Large networks and graph limits, volume 60. American Mathematical Soc., 2012.
- [13] Haggai Maron, Heli Ben-Hamu, Nadav Shamir, and Yaron Lipman. Invariant and equivariant graph networks. arXiv preprint arXiv:1812.09902, 2018.
- [14] Haggai Maron, Ethan Fetaya, Nimrod Segol, and Yaron Lipman. On the universality of invariant networks. arXiv preprint arXiv:1901.09342, 2019.
- [15] L. D. Nel. Theorems of Stone–Weierstrass type for non-compact spaces. Mathematische Zeitschrift, 104(3):226–230, 1968.
- [16] Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. PointNet: Deep learning on point sets for 3D classification and segmentation. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
- [17] Siamak Ravanbakhsh, Jeff Schneider, and Barnabas Poczos. Equivariance through parameter-sharing. In Proceedings of the 34th International Conference on Machine Learning, pages 2892–2901. JMLR.org, 2017.
- [18] Akiyoshi Sannai, Yuuki Takai, and Matthieu Cordonnier. Universal approximations of permutation invariant/equivariant functions by deep neural networks. arXiv preprint arXiv:1903.01939, 2019.
- [19] Petar Veličković, William Fedus, William L. Hamilton, Pietro Liò, Yoshua Bengio, and R. Devon Hjelm. Deep graph infomax. International Conference on Learning Representations, 2019.
- [20] Dmitry Yarotsky. Universal approximations of invariant maps by neural networks. arXiv preprint arXiv:1804.10306, 2018.
- [21] Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Ruslan R Salakhutdinov, and Alexander J Smola. Deep sets. In Advances in neural information processing systems, pages 3391–3401, 2017.
- [22] Richard Zhang. Making convolutional networks shift-invariant again. arXiv preprint arXiv:1904.11486, 2019.
- [23] Marinka Zitnik and Jure Leskovec. Predicting multicellular function through multi-layer tissue networks. Bioinformatics, 33(14):i190–i198, 2017.