# A Simple Proof of the Universality of Invariant/Equivariant Graph Neural Networks

We present a simple proof of the universality of invariant and equivariant tensorized graph neural networks. Our approach considers a restricted intermediate hypothetical model, named the Graph Homomorphism Model, to reach the universality conclusions, including an open case for higher-order outputs. We find that our proposed technique not only leads to simple proofs of the universality properties but also gives a natural explanation for the tensorization of the previously studied models. Finally, we give some remarks on the connection between our model and the continuous representation of graphs.


## 1 Introduction

### 1.1 Background and Motivation

In this study, we consider a graph regression problem. Let 𝒢 be the set of simple directed graphs with vertex and edge weights.

###### Problem 1.

We are given pairs (G1, y1), …, (Gm, ym) of input graphs Gi ∈ 𝒢 and outcomes yi. The task is to learn a hypothesis h such that h(Gi) ≈ yi.

This problem naturally arises in practice. For example, in a toxicity detection problem [23], we want to learn a function h: 𝒢 → {0, 1} such that h(G) = 1 if G has a toxicity and h(G) = 0 otherwise. For another example, in a community detection problem [1], we want to learn a function h: 𝒢 → {0, 1}^{n×n} such that h(G)(u, v) = 1 if the vertices u and v are in the same community and h(G)(u, v) = 0 otherwise. Here, for a finite set S and a positive integer k, we denote by S^k the set of k-tuples of pairwise distinct elements of S. (We only consider the values of equivariant functions at pairwise different indices. The general case is easily handled by considering each pattern separately, but this complicates the notation; thus we concentrate on this case.)

We are often interested in a hypothesis that depends only on the topology of the graph (see Section 1.3). Mathematically, this condition is represented by invariance and equivariance. A function h: 𝒢 → R is invariant if for any graph G and any permutation σ of [n], the following equation holds:

 h(Gσ)=h(G), (1)

where Gσ is the graph whose vertex indices are permuted by σ, i.e., Gσ has the adjacency matrix Wσ with Wσ(u, v) = W(σ(u), σ(v)). A function h: 𝒢 → R^{[n]^k} is equivariant if for any graph G and any permutation σ on [n], the following equation holds:

 h(Gσ)=h(G)σ, (2)

where h(G)σ is defined by the relation (h(G)σ)(→x) = h(G)(σ(→x)) for all →x ∈ [n]^k, with σ(→x) = (σ(x1), …, σ(xk)). Invariance and equivariance mean that the output of the function is determined up to isomorphism.
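As an illustration (not part of the paper's model), the two conditions can be checked numerically for simple hand-picked functions; `permute`, `h_inv`, and `h_eqv` are illustrative names introduced here:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
W = rng.uniform(-1, 1, size=(n, n))  # weighted adjacency matrix of G

def permute(W, sigma):
    """Return W^sigma with W^sigma(u, v) = W(sigma(u), sigma(v))."""
    return W[np.ix_(sigma, sigma)]

def h_inv(W):
    """An invariant function: the sum of all entries."""
    return W.sum()

def h_eqv(W):
    """An equivariant function: the vector of row sums."""
    return W.sum(axis=1)

sigma = rng.permutation(n)
assert np.isclose(h_inv(permute(W, sigma)), h_inv(W))          # Eq. (1)
assert np.allclose(h_eqv(permute(W, sigma)), h_eqv(W)[sigma])  # Eq. (2)
```

Permuting the graph leaves the invariant output unchanged and permutes the equivariant output along with the vertices.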

One desirable property of a hypothesis space is the universal approximation property (universality for short) [3, 5], i.e., any continuous function can be approximated arbitrarily accurately by a function in the hypothesis space. Maron et al. [14] introduced a feed-forward invariant neural network and proved that this model is universal in the space of continuous invariant functions. They also characterized all the invariant linear layers [13]. Keriven and Peyré [6] extended the tensorized graph neural network (Figure 0(a)) to represent equivariant functions and proved the universality for the one-dimensional output case. They left the universality for the higher-order output case as an open problem.

### 1.2 Contribution

In this study, we give a simple proof of the universality of tensorized neural networks for both invariant and (higher-order) equivariant cases; the latter solves an open problem posed in [6]. Our proof relies on a result in graph theory (see Section 3 for a comparison of proof techniques in the existing studies).

Let ℱ be the set of simple unweighted graphs, and let [n] = {1, …, n}. Let G = ([n], W) be a weighted graph, where W is the weighted adjacency matrix. Then, for F ∈ ℱ, the homomorphism number hom(F, W) is defined by

 hom(F,W)=∑π:V(F)→V(G) ∏i∈V(F)W(π(i),π(i)) × ∏(i,j)∈E(F)W(π(i),π(j)). (3)

Similarly, for a given →x ∈ [n]^k, the →x-labeled homomorphism number hom→x(F, W) is defined by

 hom→x(F,W)=∑π:V(F)→V(G)π(i)=xi (i∈[k]) ∏i∈V(F)W(π(i),π(i)) × ∏(i,j)∈E(F)W(π(i),π(j)). (4)

By definition, homomorphism numbers and →x-labeled homomorphism numbers are continuous functions of W that are invariant and equivariant, respectively.
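For concreteness, both quantities can be computed by brute force directly from Eqs. (3) and (4); this sketch (with illustrative helper names) enumerates all vertex maps π, which is exponential in |V(F)| and intended only for tiny graphs:

```python
import itertools
import numpy as np

def hom(edges, num_nodes, W):
    """hom(F, W) of Eq. (3): sum over all maps pi: V(F) -> [n] of the
    product of vertex weights W(pi(i), pi(i)) and edge weights
    W(pi(i), pi(j)). F is given by its edge list and vertex count."""
    n = W.shape[0]
    total = 0.0
    for pi in itertools.product(range(n), repeat=num_nodes):
        term = 1.0
        for i in range(num_nodes):
            term *= W[pi[i], pi[i]]
        for i, j in edges:
            term *= W[pi[i], pi[j]]
        total += term
    return total

def hom_labeled(edges, num_nodes, W, x):
    """x-labeled version of Eq. (4): the first len(x) vertices of F
    are pinned to the vertices listed in x."""
    n = W.shape[0]
    total = 0.0
    for tail in itertools.product(range(n), repeat=num_nodes - len(x)):
        pi = tuple(x) + tail
        term = 1.0
        for i in range(num_nodes):
            term *= W[pi[i], pi[i]]
        for i, j in edges:
            term *= W[pi[i], pi[j]]
        total += term
    return total

W = np.ones((3, 3))
print(hom([(0, 1)], 2, W))                # 9.0: all 3*3 maps contribute 1
print(hom_labeled([(0, 1)], 2, W, (0,)))  # 3.0: vertex 0 is pinned
```

On the all-ones matrix every map contributes 1, so the counts reduce to the number of admissible maps.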

Let G0 be the set of weighted directed graphs whose edge weights are bounded by one in absolute value and whose number of vertices is n. Let A and A′ be the sets of functions of the following forms:

 A = {W ↦ ∑F∈ℱ aF hom(F,W+2I) : aF ∈ R, finitely many aF ≠ 0}, (5)
 A′ = {W ↦ ∑F∈ℱ aF hom→x(F,W+2I) : aF ∈ R, finitely many aF ≠ 0}, (6)

where I denotes the identity matrix of size n. We prove the following theorems.

###### Theorem 2.

A is dense in the set of continuous invariant functions.

###### Theorem 3.

A′ is dense in the set of continuous equivariant functions.

The translated homomorphism number hom(F, W+2I) and the translated →x-labeled homomorphism number hom→x(F, W+2I) are an invariant and an equivariant linear function on a suitable tensor power of W, respectively. Therefore, we can implement them in a tensorized neural network. This means that our model is at most as powerful as the tensorized graph neural network. On the other hand, because our models have the universality, we obtain the following.

###### Corollary 4.

The invariant (resp., equivariant) tensorized graph neural network has the universality in continuous invariant (resp., equivariant) functions. ∎

### 1.3 Related Work

In practice, the design of a machine learning model (e.g., a neural network) usually follows some prior knowledge about the target functions, since a restriction bias helps to simplify the learning process. For instance, in image processing, convolutional neural networks [10] are designed to be translation invariant [9] or shift-invariant [22]. Therefore, much research has been conducted to address the universality of general invariant neural networks. More recently, graph neural networks predicting labels of vertices [4, 7, 19] have hinted at the importance of equivariant models. A natural question in learning theory is whether these models are universal [5, 13]. Here, we discuss related work that answered this question.

#### Invariant models

The invariance property of a model is usually discussed in the context of learning from point clouds and sets [16, 21, 18], then generalized to symmetries [13] and group actions [2]. While universality analyses for models on sets are well developed, the analysis for graphs is limited [13, 14, 8]. Recently, Maron et al. [14] proved that a permutation-invariant neural network is universal. Similarly, Keriven and Peyré [6] obtained the universality result on tensorized graph neural networks by a more direct application of the Stone–Weierstrass theorem.

#### Equivariant models

The existence of equivariant models only makes practical sense in some specific cases, for example, learning on the vertices of graphs. Therefore, compared with the invariant case, only a limited number of works address equivariance [17, 6, 18]. Consequently, the universality of equivariant graph models was only recently proven, by Keriven and Peyré [6].

## 2 Proofs

To prove the universality of a class of functions, we use the Stone–Weierstrass theorem:

###### Theorem 5 (Stone–Weierstrass Theorem [15, Theorem 1.1]).

Let X be a compact Hausdorff space and C(X) be the set of continuous functions from X to R, equipped with the supremum norm. If a subalgebra A ⊆ C(X) satisfies the following two conditions:

• A separates points, i.e., for any x, y ∈ X with x ≠ y, there exists f ∈ A such that f(x) ≠ f(y), and

• A contains an element that is bounded away from zero, i.e., there exists f ∈ A with inf_{x∈X} |f(x)| > 0,

then A is dense in C(X). ∎

The proofs are devoted to verifying the conditions of the Stone–Weierstrass theorem.

### 2.1 Proof of Theorem 2 (Invariant Case)

We first define the graph space. Let n be the number of vertices in the input graphs. Let

 G0={ W∈Rn×n:|W(i,j)|≤1, ∀i,j} (7)

be the set of the weighted adjacency matrices; we identify a graph with its adjacency matrix W ∈ G0. The norm of a graph is given by the entrywise 1-norm (because R^{n×n} is a finite-dimensional vector space, all norms are equivalent; thus, the results in this section do not depend on the choice of the norm):

 ∥W∥1=∑u,v∈[n]|W(u,v)|. (8)

Then, we introduce the edit distance δ1 by

 δ1(W1,W2)=minσ∈S([n])∥W1−Wσ2∥1, (9)

where S([n]) is the set of all permutations on [n]. The edit distance is nonnegative, symmetric, and satisfies the triangle inequality, i.e., it is a pseudometric. We define the graph space 𝒢 by the metric identification 𝒢 = G0/∼, where W1 ∼ W2 if and only if δ1(W1, W2) = 0. This forms a metric space.
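A brute-force sketch of this pseudometric (illustrative code, factorial in n, so only for tiny graphs):

```python
import itertools
import numpy as np

def edit_distance(W1, W2):
    """delta_1 of Eq. (9): minimize the entrywise 1-norm of the
    difference over all relabelings sigma of the second graph."""
    n = W1.shape[0]
    best = np.inf
    for sigma in itertools.permutations(range(n)):
        idx = list(sigma)
        W2s = W2[np.ix_(idx, idx)]  # the relabeled matrix W2^sigma
        best = min(best, np.abs(W1 - W2s).sum())
    return best

# Isomorphic graphs are identified (distance 0): the same directed
# edge under two different vertex labelings.
W1 = np.array([[0.0, 1.0], [0.0, 0.0]])  # edge 0 -> 1
W2 = np.array([[0.0, 0.0], [1.0, 0.0]])  # edge 1 -> 0
assert edit_distance(W1, W2) == 0.0
```

The metric identification glues together exactly those pairs for which this minimum is zero.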

Any invariant function on G0 is identified as a function h: 𝒢 → R. Our goal is to prove the universality of A in the set of continuous functions from 𝒢 to R. Now, we check the conditions of the Stone–Weierstrass theorem.

First, we check the condition of the space.

###### Lemma 6.

The graph space 𝒢 is a compact Hausdorff space.

###### Proof.

It is Hausdorff because it is a metric space. We show sequential compactness. Let (Wt) be an arbitrary sequence in 𝒢, which is also identified as a sequence in G0. Because G0 is compact in the norm, we can choose a convergent subsequence. Such a subsequence also converges in 𝒢. Thus, 𝒢 is compact. ∎

Next, we check the conditions on A.

###### Lemma 7.

A forms an algebra.

###### Proof.

Clearly, it is closed under addition and scalar multiplication. It is closed under products because of the following identity:

 hom(F1,W) hom(F2,W)=hom(F1∪F2,W), (10)

where F1∪F2 is the disjoint union of F1 and F2. ∎
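The identity can be spot-checked numerically; `hom` below is the same brute-force evaluation of Eq. (3), redefined here to keep the sketch self-contained:

```python
import itertools
import numpy as np

def hom(edges, num_nodes, W):
    """Brute-force hom(F, W) as in Eq. (3)."""
    n = W.shape[0]
    total = 0.0
    for pi in itertools.product(range(n), repeat=num_nodes):
        term = 1.0
        for i in range(num_nodes):
            term *= W[pi[i], pi[i]]
        for i, j in edges:
            term *= W[pi[i], pi[j]]
        total += term
    return total

rng = np.random.default_rng(1)
W = rng.uniform(0.5, 1.5, size=(3, 3))

# F1: a single directed edge on 2 vertices; F2: a single vertex.
# Their disjoint union has 3 vertices and the same single edge, and a
# vertex map of the union factors into independent maps of the parts.
lhs = hom([(0, 1)], 2, W) * hom([], 1, W)
rhs = hom([(0, 1)], 3, W)
assert np.isclose(lhs, rhs)  # Eq. (10)
```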

###### Lemma 8.

A contains an element that is bounded away from zero.

###### Proof.

Let F be the singleton graph (one vertex and no edges). Then, hom(F, W+2I) = ∑i∈[n] (W(i,i)+2) ≥ n is bounded away from zero. ∎

To prove the separating-points property, we use the following theorem.

###### Theorem 9 ([11, Lemma 2.4], k=0 case, in our terminology).

Let W1, W2 be matrices with positive diagonal elements. Then, W1 and W2 are isomorphic if and only if hom(F, W1) = hom(F, W2) for all simple unweighted graphs F. ∎

###### Lemma 10.

A separates points in 𝒢.

###### Proof.

If W1, W2 ∈ 𝒢 are non-isomorphic, then W1+2I and W2+2I are also non-isomorphic. Because W1+2I and W2+2I have positive diagonal elements, they satisfy the condition of Theorem 9, so there exists F ∈ ℱ such that hom(F, W1+2I) ≠ hom(F, W2+2I). This means that A separates points in 𝒢. ∎

Therefore, we have proved Theorem 2.

### 2.2 Proof of Theorem 3 (Equivariant Case)

We identify an array-valued function h: G0 → R^{[n]^k} with a two-argument function h: G0 × [n]^k → R. Let G0′ = G0 × [n]^k. Then, each element (W, →x) ∈ G0′ is identified as an →x-labeled graph, i.e., a graph with k distinguished vertices x1, …, xk.

For a permutation σ ∈ S([n]), we define

 (W,→x)σ:=(Wσ,→xσ), (11)

where →xσ is defined by the relation σ((→xσ)i) = xi for all i ∈ [k], i.e., →xσ = (σ⁻¹(x1), …, σ⁻¹(xk)). Then, h is equivariant if and only if the corresponding two-argument function is invariant in the sense that h((W, →x)σ) = h(W, →x). We say that (W1, →x1) and (W2, →x2) are isomorphic if (W1, →x1) = (W2, →x2)σ for some σ ∈ S([n]).

Now we define the -labeled edit distance by

 δ′1((W1,→x1),(W2,→x2)) := min{σ∈S([n]) : →x1=→xσ2} ∥W1−Wσ2∥1. (12)

Then, we define the →x-labeled graph space 𝒢′ by the metric identification 𝒢′ = G0′/∼, where (W1, →x1) ∼ (W2, →x2) if and only if δ′1((W1, →x1), (W2, →x2)) = 0, i.e., they are isomorphic. This forms a metric space.
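Under the relabeling convention →xσ = σ⁻¹∘→x (an assumption made explicit here), the labeled edit distance can be sketched by restricting the minimization to label-preserving relabelings:

```python
import itertools
import numpy as np

def labeled_edit_distance(W1, x1, W2, x2):
    """delta'_1 of Eq. (12): minimize over only those relabelings
    sigma for which the label tuples agree (x1 = x2^sigma)."""
    n = W1.shape[0]
    best = np.inf
    for sigma in itertools.permutations(range(n)):
        # sigma_inv[v] is the new name of old vertex v.
        sigma_inv = [0] * n
        for u, v in enumerate(sigma):
            sigma_inv[v] = u
        if tuple(sigma_inv[v] for v in x2) != tuple(x1):
            continue  # labels disagree; this sigma is not admissible
        idx = list(sigma)
        best = min(best, np.abs(W1 - W2[np.ix_(idx, idx)]).sum())
    return best

W1 = np.array([[0.0, 1.0], [0.0, 0.0]])  # edge 0 -> 1, label at vertex 0
W2 = np.array([[0.0, 0.0], [1.0, 0.0]])  # edge 1 -> 0, label at vertex 1
assert labeled_edit_distance(W1, (0,), W2, (1,)) == 0.0  # isomorphic as labeled graphs
```

Moving the label to the other vertex breaks the isomorphism: with `x2 = (0,)` only the identity relabeling is admissible and the distance becomes 2.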

Any equivariant function is identified as a function h: 𝒢′ → R. Our goal is to prove the universality of A′ in the set of continuous functions from 𝒢′ to R. Now, we check the conditions of the Stone–Weierstrass theorem. This part is very similar to the invariant case.

First, we check the condition of the space.

###### Lemma 11.

The →x-labeled graph space 𝒢′ is compact.

###### Proof.

It is Hausdorff because it is a metric space. We show sequential compactness. Let ((Wt, →xt)) be an arbitrary sequence in 𝒢′, which is also identified as a sequence in G0′. Because the number of possibilities of →x is finite, we can select an (infinite) subsequence on which →x takes the same value. The remaining part is the same as the proof of Lemma 6. ∎

Next, we check the conditions on A′.

###### Lemma 12.

A′ forms an algebra.

###### Proof.

Clearly, it is closed under addition and scalar multiplication. It is closed under products because of the following identity:

 hom→x(F1,W) hom→x(F2,W)=hom→x(F1F2,W), (13)

where F1F2 is the graph obtained from the disjoint union of F1 and F2 by gluing the labeled vertices. ∎

###### Lemma 13.

A′ contains an element that is bounded away from zero.

###### Proof.

Let F be the graph of k isolated (labeled) vertices. Then, hom→x(F, W+2I) = ∏i∈[k] (W(xi,xi)+2) ≥ 1 is bounded away from zero. ∎

To prove the separating-points property, we use the following theorem.

###### Theorem 14 ([11, Lemma 2.4] in our terminology).

Let W1, W2 be matrices with positive diagonal elements, and let →x1, →x2 ∈ [n]^k. Then, (W1, →x1) and (W2, →x2) are isomorphic if and only if hom→x1(F, W1) = hom→x2(F, W2) for all simple unweighted graphs F. ∎

###### Lemma 15.

A′ separates points in 𝒢′.

###### Proof.

It is proved in the same way as Lemma 10, using Theorem 14 in place of Theorem 9. ∎

Therefore, we have proved Theorem 3.

## 3 Comparison with Other Proofs

#### Comparison with Keriven and Peyré [6]

Our proofs are similar to theirs. For the invariant case, they used the standard Stone–Weierstrass theorem and verified the separating-points property by constructing functions on a higher-order tensor space. For the equivariant case, they developed a new Stone–Weierstrass-type theorem and verified the corresponding separating-points property by a technique similar to the invariant case. In contrast, for both cases, we used the standard Stone–Weierstrass theorem and verified the separating-points property using properties of the homomorphism number. This unified treatment allows us to establish the result for arbitrary higher-order outputs.

One advantage of their method is that it is applicable to hypergraphs. Our method could also be applicable to hypergraphs; however, there is a gap because the theory of weighted homomorphism numbers of hypergraphs is not as well established as that of graphs.

Note that they considered graphs with different but bounded numbers of vertices. However, this adds no generality because such a space is disconnected, and each connected component corresponds to the graphs having the same number of vertices. If we have to consider a set of graphs with different numbers of vertices, it is promising to consider graphons; see Section 4.

#### Comparison with Maron et al. [13]

They considered only the invariant case. They used the universality of symmetric polynomials due to Yarotsky [20] and then approximated the polynomials by a tensorized neural network.

One advantage of their method is that one can bound the order of tensors. Our method can also bound the order of tensors by bounding the size of the subgraphs [12, Theorem 5.33]; however, this may give a loose bound. On the other hand, our method shows that a very restricted form of linear invariant (or equivariant) layers is sufficient to obtain universality.

## 4 Concluding Remarks

In this study (and the existing studies [13, 6]), the number of vertices in the input graphs is fixed. This is reasonable because the graph space is disconnected and each connected component corresponds to the graphs with the same number of vertices; hence, a continuous function on the graph space with varying numbers of vertices is just a collection of continuous functions on each connected component.

If we want to consider graphs of different numbers of vertices, it is promising to consider graphons [12]. Below we explain that all the results obtained in this paper can be extended to graphons.

An (asymmetric) graphon is a measurable function W: [0,1]² → [0,1]. This is a continuous generalization of the weighted adjacency matrix. The set of graphons is denoted by 𝒲0. The cut-norm is defined by

 ∥W∥□=supS,T⊆[0,1]∣∣∣∫S×TW(x,y)dxdy∣∣∣. (14)

and the cut-distance is defined by

 δ□(W1,W2)=infσ∥W1−Wσ2∥□, (15)

where σ runs over all measure-preserving bijections on [0,1]. The graphon space is defined by the metric identification 𝒲 = 𝒲0/∼, where W1 ∼ W2 if and only if δ□(W1, W2) = 0.

The graphon space contains graphs with arbitrarily many vertices. However, it is still compact with respect to the cut-distance. Note that the corresponding statement does not hold for the edit distance.

###### Theorem 16 ([12, Theorem 9.23]).

The graphon space is compact.

For graphons, we use the homomorphism density instead of the homomorphism number. Let F be a simple unweighted graph. Then, the homomorphism density t(F, W) is given by

 t(F,W)=∫∏(i,j)∈E(F)W(xi,xj)∏i∈V(F)dxi. (16)
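A Monte Carlo sketch of Eq. (16) (illustrative code; `t_density` and the constant test graphon are assumptions, not the paper's construction):

```python
import numpy as np

def t_density(edges, num_nodes, W, samples=100_000, seed=0):
    """Monte Carlo estimate of t(F, W), Eq. (16): draw the vertex
    positions x_i uniformly from [0, 1] and average the product of
    W over the edges of F."""
    rng = np.random.default_rng(seed)
    xs = rng.uniform(size=(samples, num_nodes))
    vals = np.ones(samples)
    for i, j in edges:
        vals *= W(xs[:, i], xs[:, j])
    return vals.mean()

# Constant graphon W(x, y) = p: every edge contributes p, so
# t(F, W) = p^{|E(F)|}. Check with a directed triangle and p = 0.5.
p = 0.5
W = lambda x, y: np.full_like(x, p)
triangle = [(0, 1), (1, 2), (2, 0)]
assert abs(t_density(triangle, 3, W) - p**3) < 1e-12  # exact: W is constant
```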

The set of finite linear combinations of the homomorphism densities also forms a unital algebra. To show the separating-points property, we can use the following theorem.

###### Theorem 17 (Directed version of [12, Corollary 10.34]).

Let W1 and W2 be graphons. Then, W1 and W2 are isomorphic if and only if t(F, W1) = t(F, W2) for all simple unweighted graphs F. ∎

Therefore, we obtain the following result.

###### Theorem 18.

The set of finite linear combinations of the homomorphism densities is dense in the set of continuous invariant graphon functions. ∎

Note that this fact was already proved in [12, Theorem 17.6] in a different context (for symmetric graphons).

The equivariant case is also handled by considering →x-labeled graphons, where now →x ∈ [0,1]^k. The →x-labeled homomorphism density t→x(F, W) is given by

 t→x(F,W)=∫∏(i,j)∈E(F)W(xi,xj)∏i∈V(F)∖[k]dxi. (17)
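Eq. (17) admits the same Monte Carlo sketch, with the first k coordinates pinned to the labels (illustrative code; `t_labeled` and the test graphon W(x, y) = xy are assumptions):

```python
import numpy as np

def t_labeled(edges, num_nodes, W, x, samples=100_000, seed=0):
    """Monte Carlo estimate of t_x(F, W), Eq. (17): the first len(x)
    vertex positions are pinned to x; the rest are integrated out."""
    rng = np.random.default_rng(seed)
    k = len(x)
    xs = np.empty((samples, num_nodes))
    xs[:, :k] = np.asarray(x)                          # pinned labels
    xs[:, k:] = rng.uniform(size=(samples, num_nodes - k))
    vals = np.ones(samples)
    for i, j in edges:
        vals *= W(xs[:, i], xs[:, j])
    return vals.mean()

# W(x, y) = x*y: for F a single edge 0 -> 1 with vertex 0 pinned at
# position x, t_x(F, W) = x * ∫ y dy = x / 2.
W = lambda x, y: x * y
est = t_labeled([(0, 1)], 2, W, x=(0.8,))
assert abs(est - 0.8 / 2) < 0.01  # Monte Carlo tolerance
```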

Then, we obtain the following result.

###### Theorem 19.

The set of finite linear combinations of the →x-labeled homomorphism densities is dense in the set of continuous equivariant graphon functions. ∎