## 1 Introduction

Graph neural networks (GNNs) have emerged as a dominant paradigm for representation learning on graph-structured data [hamilton2020grl].
GNNs have been used to achieve state-of-the-art results on a wide range of tasks, including predicting the properties of molecules [gilmer2017neural], forecasting traffic in a large-scale production system (https://deepmind.com/blog/article/traffic-prediction-with-advanced-graph-neural-networks), classifying protein functions [hamilton2017inductive], and powering social recommendation systems [ying2018graph]. However, despite their empirical successes, GNNs are known to have serious limitations. In particular, the theoretical power of standard GNNs is bounded by the classical Weisfeiler-Lehman (WL) isomorphism test [morris2019weisfeiler, xu2018powerful], and, in signal processing terms, GNNs are limited to simple forms of graph convolutions [bronstein2017geometric, defferrard2016convolutional].

One prominent example of the limits of GNNs, and a consequence of their power being bounded by the WL test, is their inability to consistently detect the presence of closed triangles in graphs [hamilton2020grl]. This can be proved by showing that, as graph patterns, cycles of length three are not invariant under the color refinement procedure studied by Arvind et al. [arvind2020weisfeiler]. This is a major limitation, as transitivity is known to be a critical property in many real-world networks [newman2018networks, watts1998collective]. In other words, GNNs cannot correctly distinguish the ego-graphs around each node (i.e., the subgraph induced by a node and its immediate neighbors), since GNNs cannot consistently detect whether two neighbors of a node are connected.
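This limitation is easy to verify directly. The following minimal color-refinement (1-WL) sketch in plain Python (the dictionary graph encoding and the `color_refinement` helper are our own, not from any GNN library) compares two disjoint triangles against a six-cycle: both graphs are 2-regular, so refinement never separates any nodes, and the test cannot tell them apart even though only the former contains closed triangles.

```python
from collections import Counter

def color_refinement(adj, rounds=5):
    """Iteratively refine node colors by hashing each node's color
    together with the multiset of its neighbors' colors (1-WL)."""
    n = len(adj)
    colors = [0] * n  # constant initial features
    for _ in range(rounds):
        signatures = [
            (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in range(n)
        ]
        # relabel signatures with small integers
        table = {s: i for i, s in enumerate(sorted(set(signatures)))}
        colors = [table[s] for s in signatures]
    return Counter(colors)

# two disjoint triangles (2*C3) vs. one six-cycle (C6), both 2-regular
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
six_cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}

# since every node is 2-regular, refinement never splits the colors:
# both graphs end up with the same (monochromatic) color histogram
assert color_refinement(two_triangles) == color_refinement(six_cycle)
```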

Present work. We propose an approach to explicitly imbue GNNs with information about the ego-structure of graphs. The basic idea of our approach is to combine the standard message-passing used in GNNs with a form of ego-messages, which are propagated only within ego-graphs. Our approach is theoretically motivated, both in terms of addressing known representational limitations of GNNs and via multiplex generalizations of graph convolutions. Experiments on both real and synthetic data highlight how this approach can alleviate some of the known shortcomings of GNNs.

## 2 Related Work

Our work is closely related to several recent attempts to improve the theoretical capabilities of GNNs. Early work in this direction focused on elucidating the key properties necessary for a GNN to achieve power equal to the basic WL test [morris2019weisfeiler, xu2018powerful], as well as on approaches to design GNNs that can achieve the power of higher-order k-WL algorithms [maron2019provably, morris2019weisfeiler]. Other works have expanded on these ideas, elucidating connections to logic [barcelo2019logical], dynamic programming [xu2019can], and statistical learning theory [garg2020generalization].

Our work is distinguished from these prior contributions in that we focus on a particular limitation of GNNs: their inability to exploit ego-structures. This limitation is connected to the known representational limits of GNNs: the inability of GNNs to count triangles can be viewed as a consequence of the representational bound from the WL algorithm [morris2019weisfeiler] (or, equivalently, as a consequence of GNNs' restriction to graded modal logic [barcelo2019logical]). However, whereas GNNs that are provably more powerful than the WL test in general (e.g., [maron2019provably]) are known to suffer from scalability and stability issues [dwivedi2020benchmarking], our goal is to provide a targeted theoretical improvement for GNNs with real-world implications.

## 3 Proposed Approach

Our key proposal is to design a GNN algorithm that can naturally exploit the ego-structure of graphs. In doing so, we hope to improve the theoretical capacity of the model. In addition, we hope to address known empirical limitations, such as the over-smoothing problem, in which node signals become uninformative after several rounds of message-passing [hamilton2020grl].

Our approach involves performing message-passing over the ego-graphs of all the nodes in a graph, rather than simply over the graph itself. In the following section, we formalize how to construct a model with this desired behavior on a graph $G$, which is defined by a set of vertices $V$, an adjacency matrix $A \in \{0,1\}^{|V| \times |V|}$, and a matrix of node features $X \in \mathbb{R}^{|V| \times d}$.

### 3.1 Conceptual Motivation

One way of motivating our Ego-GNN approach is based on the idea of message-passing over a multiplex graph defined over the original graph. In particular, starting from the original graph, we construct a multiplex graph with $|V|$ layers, where the $u^{th}$ layer corresponds to node $u$'s ego-graph, and the inter-layer connections are based on the original adjacency structure. Message-passing on such a multiplex involves passing messages between nodes in the same ego-graph (via the intra-layer edges), while also propagating information between ego-structures (via the inter-layer edges).

If we first define $A_u \in \{0,1\}^{|V| \times |V|}$ as the adjacency matrix of the ego-graph of node $u$ (with zero entries outside the ego-graph), we can then formally define the multiplex ego-graph using a supra-adjacency matrix $\bar{A}$, constructed from the original graph as follows:

$$\bar{A} \;=\; \bigoplus_{u \in V} A_u \;+\; A \otimes I_{|V|}. \tag{1}$$

Here, the direct sum $\bigoplus_{u \in V} A_u$ creates the intra-layer adjacency structure from the ego-graph adjacency matrices of all the nodes. The term $A \otimes I_{|V|}$ in Equation (1) defines the inter-layer adjacency structure and is based on the Kronecker product between the original adjacency matrix and the identity, which creates a group of product graphs [sandryhaila2014big]. Intuitively, the intra-layer diagonal blocks correspond to each ego-graph, and the off-diagonal blocks connect two ego-graphs if they share a node (i.e., if the corresponding nodes are connected in the original graph).
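As a concrete, purely expository sketch of Equation (1), the following dense NumPy code builds the supra-adjacency matrix; the helper names `ego_adjacency` and `supra_adjacency` are ours, and each ego adjacency is kept at full size so the blocks line up.

```python
import numpy as np

def ego_adjacency(A, u):
    """Adjacency of node u's ego-graph, kept at full n x n size
    (zero outside the ego-graph) so that the direct sum lines up."""
    n = A.shape[0]
    ego = np.flatnonzero(A[u]).tolist() + [u]   # u and its neighbors
    mask = np.zeros((n, n))
    mask[np.ix_(ego, ego)] = 1.0
    return A * mask

def supra_adjacency(A):
    """Supra-adjacency of the multiplex ego-graph: a direct sum of the
    per-node ego adjacencies (intra-layer blocks) plus A (x) I
    (inter-layer edges between the copies of each node)."""
    n = A.shape[0]
    intra = np.zeros((n * n, n * n))
    for u in range(n):
        intra[u * n:(u + 1) * n, u * n:(u + 1) * n] = ego_adjacency(A, u)
    inter = np.kron(A, np.eye(n))
    return intra + inter

A = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])   # a triangle: every ego-graph is the whole graph
S = supra_adjacency(A)         # 9 x 9 supra-adjacency matrix
```

For a triangle, every ego-graph is the whole graph, so each intra-layer diagonal block of `S` equals `A` itself.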

The core idea of our approach is to perform message-passing over $\bar{A}$ rather than over the original graph. However, naively implementing a GNN on the $|V|^2 \times |V|^2$ matrix $\bar{A}$ would be computationally expensive due to the quadratic increase in the size of the graph. Thus, in the following sections, we describe incremental relaxations and refinements that allow us to implement our Ego-GNN approach in a tractable manner.

### 3.2 Sequencing the Intra- and Inter-layer Messages

In this section, we describe how we approximate message-passing over the full multiplex, defined by the supra-adjacency matrix $\bar{A}$, using a sequential approach in which we run the intra-layer (i.e., within-ego-graph) and inter-layer (i.e., between-ego-graph) operations in sequence. Note also that we assume our goal is to maintain a single representation for each node in the graph, and we use $H^{(l)} \in \mathbb{R}^{|V| \times d}$ to denote the matrix of node representations at layer $l$ of the model.

Message-passing within ego-graphs. We first run message-passing independently within each ego-graph. To do so, we tile the node representations vertically $|V|$ times before the matrix multiplication:

$$\tilde{H}^{(l)} = \underbrace{\left[H^{(l)};\; \cdots;\; H^{(l)}\right]}_{|V| \text{ copies}}.$$

Multiplying this tiled representation by a power of the intra-layer portion of the supra-adjacency matrix, we get

$$\bar{H} \;=\; \left(\bigoplus_{u \in V} A_u\right)^{\!k} \tilde{H}^{(l)}, \tag{2}$$

where $k$ is the desired scale of the message-passing within each ego-graph (e.g., $k=2$ corresponds to aggregating over two-hop neighborhoods in the ego-graphs).
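A dense sketch of Equation (2) follows; the name `intra_layer_pass` is ours, and materializing the block-diagonal matrix is for exposition only (a real implementation would keep it sparse).

```python
import numpy as np

def intra_layer_pass(A_egos, H, k=1):
    """Eq. (2): multiply the vertically tiled representations by the k-th
    power of the block-diagonal intra-layer matrix (the direct sum of the
    ego-graph adjacencies).  Row block u of the result holds every node's
    representation inside node u's ego-graph."""
    n, d = H.shape
    intra = np.zeros((n * n, n * n))
    for u, A_u in enumerate(A_egos):
        intra[u * n:(u + 1) * n, u * n:(u + 1) * n] = A_u
    H_tiled = np.vstack([H] * n)                      # (n*n, d)
    return np.linalg.matrix_power(intra, k) @ H_tiled

# a 3-node path graph 0 - 1 - 2, with one (full-size) ego adjacency per node
H = np.eye(3)
A_egos = [np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0.]]),   # ego(0) = {0, 1}
          np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0.]]),   # ego(1) = whole path
          np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0.]])]   # ego(2) = {1, 2}
out = intra_layer_pass(A_egos, H, k=1)
```

Because the intra-layer matrix is block diagonal, block `u` of the output is exactly `A_egos[u] @ H`: the ego-graphs never exchange messages in this step.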

Aggregating across ego-graphs. After Equation (2), $\bar{H}$ contains the representation of every node in every ego-graph. We therefore need to aggregate information across the different ego-graphs to collapse $\bar{H}$ back into a $|V| \times d$ matrix.

In our conceptual motivation, we envisioned connecting the ego-graph layers in the supra-adjacency matrix based on the original adjacency structure of the graph. In intuitive terms, we would like the overall representation of each node to depend on its representation in each ego-graph to which it belongs. With this in mind, a natural way to approximate the inter-layer message-passing of the full multiplex is to define each node $v$'s representation in our final matrix $H^{(l+1)}$ as the average of that node's representation in all of the ego-graphs that contain it. We therefore define each row of $H^{(l+1)}$ as follows:

$$H^{(l+1)}_v \;=\; \frac{1}{|\mathcal{N}(v)|} \sum_{u \in \mathcal{N}(v)} \bar{H}_{(u-1)|V| + v}, \tag{3}$$

where $\mathcal{N}(v)$ is the set of all nodes whose ego-graphs contain node $v$, and $\bar{H}_{(u-1)|V| + v}$ is node $v$'s row within the block of $\bar{H}$ corresponding to node $u$'s ego-graph.
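Equation (3) can be sketched as follows, assuming the per-ego-graph messages from Equation (2) are given as a list of per-node matrices; the function name and this list-based interface are ours.

```python
import numpy as np

def aggregate_across_egos(A, per_ego_H):
    """Eq. (3): node v's new representation is the mean of its rows across
    the ego-graphs that contain it (its own ego-graph and its neighbors').
    per_ego_H[u] holds the messages computed inside node u's ego-graph."""
    n = A.shape[0]
    d = per_ego_H[0].shape[1]
    H_next = np.zeros((n, d))
    for v in range(n):
        members = [u for u in range(n) if A[u, v] or u == v]  # egos containing v
        H_next[v] = np.mean([per_ego_H[u][v] for u in members], axis=0)
    return H_next

# sanity check on a single edge: if the message matrices inside both
# ego-graphs coincide, averaging leaves them unchanged
A = np.array([[0., 1.], [1., 0.]])
M = np.arange(4.).reshape(2, 2)
H_next = aggregate_across_egos(A, [M, M])
```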

This approach, based on sequencing the within-ego-graph and between-ego-graph operations, is sound and produces node representations that leverage the graph's ego-structure. However, it has two major flaws. First, it is memory-intensive, since the tiled matrix $\tilde{H}^{(l)}$ is dense and cannot be stored as a sparse matrix. Second, the method is not properly normalized, which can lead to stability issues.

### 3.3 The Ego-GNN Model

In this section, we build upon the sequential approach proposed above and remedy its key limitations to describe our full Ego-GNN approach.

Addressing the memory limitations. The memory problem caused by the tiled matrix $\tilde{H}^{(l)}$ can be solved by performing the within-ego-graph operations iteratively rather than all at once. In particular, instead of carrying out Equation (2), we perform a combination of operations that calculates $H^{(l+1)}$ without ever materializing $\tilde{H}^{(l)}$ or $\bar{H}$. These operations are captured in the following sum, which is equivalent to Equation (3):

$$H^{(l+1)}_v \;=\; \frac{1}{|\mathcal{N}(v)|} \sum_{u \in \mathcal{N}(v)} \left(A_u^k H^{(l)}\right)_v. \tag{4}$$
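A sketch of Equation (4) follows (names are ours); it visits one ego-graph at a time and therefore never forms the $|V|^2 \times |V|^2$ supra-adjacency matrix or the tiled representations. We assume every node has at least one neighbor so the averaging counts are nonzero.

```python
import numpy as np

def ego_layer_iterative(A_egos, A, H, k=1):
    """Eq. (4): accumulate A_u^k H one ego-graph at a time, then average
    each node's contributions over the ego-graphs it belongs to."""
    n, d = H.shape
    acc = np.zeros((n, d))
    count = np.zeros(n)
    for u, A_u in enumerate(A_egos):
        msgs = np.linalg.matrix_power(A_u, k) @ H      # messages inside ego(u)
        members = np.flatnonzero(A[u]).tolist() + [u]  # nodes in ego(u)
        for v in members:
            acc[v] += msgs[v]
            count[v] += 1
    return acc / count[:, None]

# triangle graph: every ego-graph is the whole triangle, and every node
# belongs to all three ego-graphs, so the average reduces to plain A @ H
A = np.array([[0., 1., 1.], [1., 0., 1.], [1., 1., 0.]])
H = np.eye(3)
out = ego_layer_iterative([A, A, A], A, H, k=1)
```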

Adding normalization. Finally, we need only apply an appropriate normalization to the model to further improve its performance. For generality, we assume that each ego-adjacency matrix $A_u$ can be replaced by an appropriately normalized counterpart $\hat{A}_u$ (e.g., the popular symmetric normalization $\hat{A}_u = D_u^{-1/2} A_u D_u^{-1/2}$, where $D_u$ is the diagonal degree matrix of $A_u$).

Putting it all together. Combining all of these ideas, along with the fact that the set of nodes whose ego-graphs contain $v$ is equal to $\mathcal{N}(v)$ when we add self-loops to the graph, we can formulate the entire Ego-GNN model with one equation:

$$H^{(l+1)}_v \;=\; \frac{1}{|\mathcal{N}(v)|} \sum_{u \in \mathcal{N}(v)} \left(\hat{A}_u^{\,k} H^{(l)}\right)_v. \tag{5}$$
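Putting the normalization and the iterative aggregation together, one Ego-GNN layer in the spirit of Equation (5) can be sketched densely as follows; `sym_normalize` and `ego_gnn_layer` are our own illustrative names, and we omit learnable weights and nonlinearities for clarity.

```python
import numpy as np

def sym_normalize(A_u):
    """Symmetric normalization D^{-1/2} A D^{-1/2}; zero-degree rows are
    left untouched (their entries are already zero)."""
    deg = A_u.sum(axis=1)
    inv_sqrt = np.where(deg > 0, deg, 1.0) ** -0.5
    return inv_sqrt[:, None] * A_u * inv_sqrt[None, :]

def ego_gnn_layer(A, H, k=1):
    """One (weight-free) Ego-GNN layer: normalized k-step message passing
    inside every ego-graph, then a mean over the ego-graphs containing
    each node."""
    n, d = H.shape
    acc, count = np.zeros((n, d)), np.zeros(n)
    for u in range(n):
        members = np.flatnonzero(A[u]).tolist() + [u]
        A_u = np.zeros((n, n))                          # full-size ego adjacency
        A_u[np.ix_(members, members)] = A[np.ix_(members, members)]
        msgs = np.linalg.matrix_power(sym_normalize(A_u), k) @ H
        acc[members] += msgs[members]
        count[members] += 1
    return acc / count[:, None]

# on a triangle every ego-graph is the whole graph with degrees 2, so the
# normalized layer reduces to (A / 2) @ H
A = np.array([[0., 1., 1.], [1., 0., 1.], [1., 1., 0.]])
out = ego_gnn_layer(A, np.eye(3), k=1)
```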

Interleaving with standard GNN layers. Finally, we note that the ego-message passing in Equation (5) can be interleaved with layers of standard GNN message-passing, such as the simple propagation rule proposed in [kipf2016semi].

Table 1: Node classification accuracy (%).

| | Cora [mccallum2000automating] | Citeseer [getoor2005link] | Pubmed [namata2012query] | Amazon Computers [shchur2018pitfalls] | Amazon Photos [shchur2018pitfalls] |
| --- | --- | --- | --- | --- | --- |
| GCN | 84.61 | 70.88 | 86.67 | 83.32 | 91.86 |
| GIN | 83.76 | 69.77 | 84.10 | 85.27 | 90.72 |
| GAT | 81.86 | 69.65 | 85.44 | 88.61 | 92.74 |
| Ego-GNN | 86.20 | 72.22 | 85.92 | 89.17 | 92.28 |

## 4 Theoretical Motivations

We briefly remark on some of the theoretical motivations behind the Ego-GNN approach, both in terms of identifying closed triangles and in terms of graph signal processing.

Ego-GNNs and closed triangles. First, we can note that Ego-GNN layers can be trivially used to recognize the existence of closed triangles in a graph. For example, assuming constant features as node inputs, counting triangles can be performed via two rounds of message-passing in the ego-graphs: one round to compute node degrees and a second round for the central node to count the degree of each of its neighbors in the ego-graph. This simple approach suffices due to the fact that the number of closed triangles in a node’s neighborhood is directly computable from the degrees of the nodes in its ego-graph. Note, however, that this does not guarantee that an Ego-GNN will learn such a function easily from data.
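The degree-based argument above can be checked directly: a small NumPy sketch (the helper name is ours) recovers the number of closed triangles through a node from the degrees inside its ego-graph alone.

```python
import numpy as np

def triangles_at_node(A, v):
    """Count closed triangles through v using only degrees inside v's
    ego-graph: each neighbor's ego-degree minus 1 (its edge to v) counts
    the edges it has to other neighbors of v; halving corrects for
    double-counting each such edge from both endpoints."""
    nbrs = np.flatnonzero(A[v])
    ego = np.append(nbrs, v)
    # degree of each neighbor of v, restricted to the ego-graph of v
    ego_degrees = A[np.ix_(nbrs, ego)].sum(axis=1)
    return int((ego_degrees - 1).sum()) // 2

K3 = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])      # one triangle
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])    # a wedge, no triangle
K4 = (np.ones((4, 4)) - np.eye(4)).astype(int)        # four triangles total
```

Each node of `K3` sits on one triangle, the center of `path` sits on none, and each node of `K4` sits on three, matching the count of edges among its neighbors.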

Ego-GNNs and the classical WL test. A critical facet of Ego-GNNs is that they are provably more powerful than the classical 1-WL subtree kernel test, especially when interleaved with standard GNN layers that match the power of the 1-WL subtree kernel. We highlight the ability of our Ego-GNN model to recognize closed triangles with a simple toy problem, where the task is to distinguish Graph-1, two disjoint 3-cycles ($2C_3$), from Graph-2, a 6-cycle ($C_6$). Figure 1 shows the outputs of the models on both graphs at different message-passing steps. Unsurprisingly, standard GNNs are not able to distinguish the two 3-cycles from the 6-cycle, as the output of every message-passing round is the same. In contrast, the Ego-GNN model distinguishes the two graphs by distinguishing the wedges and the closed triangles in $C_6$ and $2C_3$, respectively.

This can be seen in Figure 2, which illustrates the first message-passing step of an Ego-GNN, where every node in $C_6$ is able to recognize that its neighbors are not themselves connected within its ego-graph.
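This behavior is easy to reproduce. The following sketch (helper names and graph constructions are ours) runs two rounds of message passing confined to each node's ego-graph, starting from constant features, and yields different outputs on the two graphs even though their degree sequences are identical.

```python
import numpy as np

def ego_outputs(A, k=2):
    """k rounds of summed message passing confined to each node's
    ego-graph, from constant features; returns each node's own value."""
    n = A.shape[0]
    out = np.zeros(n)
    for v in range(n):
        members = np.flatnonzero(A[v]).tolist() + [v]
        A_v = np.zeros((n, n))                          # ego-graph of v
        A_v[np.ix_(members, members)] = A[np.ix_(members, members)]
        h = np.ones(n)
        for _ in range(k):
            h = A_v @ h
        out[v] = h[v]
    return out

# two disjoint triangles vs. a six-cycle (both 2-regular)
C3 = np.array([[0., 1., 1.], [1., 0., 1.], [1., 1., 0.]])
two_C3 = np.block([[C3, np.zeros((3, 3))], [np.zeros((3, 3)), C3]])
C6 = np.roll(np.eye(6), 1, axis=1) + np.roll(np.eye(6), -1, axis=1)

# in 2*C3 each neighbor keeps ego-degree 2; in C6 the neighbors are not
# connected inside the ego-graph and keep ego-degree 1
assert not np.allclose(sorted(ego_outputs(two_C3)), sorted(ego_outputs(C6)))
```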

Ego-GNNs and graph convolutions. Ego-GNNs can also be motivated as a graph convolution on the multiplex graph defined by Equation (1). Interestingly, from the theory of multiplex networks, we know that the spectrum of this multiplex graph has a close relationship to the original graph. In particular, based on the perturbation analysis of Sole et al. [sole2013spectral], we know that there are natural conditions under which the spectrum of this multiplex contains frequencies corresponding to the original graph (via the inter-layer structure) as well as frequencies from the ego-graphs (via the intra-layer structure). Moreover, because the ego-graphs are all subgraphs of the original graph, the well-known eigenvalue interlacing theorem [chung1997spectral] implies that the intra-layer frequencies will interlace the original graph spectrum. This suggests that the Ego-GNN approach can be motivated as a way to provide access to a broader set of meaningful frequencies over which to perform graph convolutions.

## 5 Experimental results

In the previous section, we saw that Ego-GNNs have theoretical benefits compared to standard GNNs (e.g., Ego-GNNs can distinguish two triangles from a six-cycle). We now evaluate the empirical performance of Ego-GNNs. We first examine a synthetic task to demonstrate the ability of Ego-GNNs to reduce the over-smoothing problem, and we then examine Ego-GNNs' performance on five real-world datasets.

### 5.1 Combating over-smoothing

Our first experiment shows that Ego-GNNs are capable of effectively combating the over-smoothing problems that regularly crop up with classical GNNs. We demonstrate this using the stochastic block model (SBM), a random-graph generator whose cluster sizes and inter-connectivity probabilities can be pre-specified [holland1983stochastic]. To simulate increasing graph signal noise, we gradually increased the inter-cluster connectivity of an artificial SBM graph and measured the node classification performance of the Ego-GNN, GIN, and GCN models, where the goal is to classify each node into its underlying community in the SBM graph.

As we can see from the results in Figure 3, the Ego-GNN greatly outperformed the other models even when faced with a mounting density of high-degree nodes connecting disparate clusters. This kind of stability shows how Ego-GNNs can be useful in practice when applied to graphs that are susceptible to over-smoothing.
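For reference, an SBM of this kind is straightforward to sample with NumPy alone; the `sbm` helper and the particular sweep values below are illustrative, not the exact experimental configuration used in our study.

```python
import numpy as np

def sbm(sizes, p_in, p_out, rng):
    """Sample a symmetric stochastic block model: edges appear with
    probability p_in inside a block and p_out between blocks."""
    labels = np.repeat(np.arange(len(sizes)), sizes)
    n = labels.size
    probs = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    upper = rng.random((n, n)) < probs
    A = np.triu(upper, 1)              # keep one direction, drop self-loops
    return (A + A.T).astype(float), labels

rng = np.random.default_rng(0)
# sweep the inter-cluster density upward to inject structural noise
for p_out in (0.01, 0.05, 0.10):
    A, y = sbm([50, 50], p_in=0.3, p_out=p_out, rng=rng)
    # ... train Ego-GNN / GCN / GIN on (A, y) and record accuracy ...
```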

### 5.2 Classifying nodes on benchmark datasets

Finally, we compared the performance of the Ego-GNN model with commonly used GNNs on a standard node classification task across five well-studied benchmarks. The results presented in Table 1 show that the Ego-GNN model achieves performance competitive with the state of the art, alongside the theoretical properties already discussed in this paper. This makes the Ego-GNN model usable in a wide variety of contexts and applications.

## 6 Conclusion

We have shown that Ego-GNNs are capable of more complex graph analysis than other widely used GNN models because of their desirable theoretical properties, specifically their ability to surpass the power of the standard WL test by distinguishing $2C_3$ from $C_6$. This result lays the groundwork for further exploration of similar models and presents new ways to test these properties directly.

While constructing deeper convolutional networks in fields like image processing has recently led to large gains in performance, the same cannot be said for classical GNNs. Higher-order GNNs, like the Ego-GNN model, may carve a path toward overcoming this issue by allowing the creation of deeper and more scalable graph models that do not suffer from over-smoothing.
