
Make Heterophily Graphs Better Fit GNN: A Graph Rewiring Approach

09/17/2022
by   Wendong Bi, et al.
Microsoft

Graph Neural Networks (GNNs) are popular machine learning methods for modeling graph data. Many GNNs perform well on homophily graphs while having unsatisfactory performance on heterophily graphs. Recently, some researchers have turned their attention to designing GNNs for heterophily graphs by adjusting the message passing mechanism or enlarging the receptive field of message passing. Different from existing works that mitigate the issues of heterophily from the model design perspective, we propose to study heterophily graphs from an orthogonal perspective: rewiring the graph structure to reduce heterophily so that traditional GNNs perform better. Through comprehensive empirical studies and analysis, we verify the potential of graph rewiring. To fully exploit this potential, we propose a method named Deep Heterophily Graph Rewiring (DHGR), which rewires graphs by adding homophilic edges and pruning heterophilic edges. The rewiring decisions are made by comparing the similarity of the label/feature-distributions of node neighbors. Besides, we design a scalable implementation of DHGR to guarantee high efficiency. DHGR can be easily used as a plug-in module, i.e., a graph pre-processing step, for any GNN, including GNNs designed for homophily and for heterophily, to boost their performance on the node classification task. To the best of our knowledge, this is the first work studying graph rewiring for heterophily graphs. Extensive experiments on 11 public graph datasets demonstrate the superiority of our proposed method.



1. Introduction

Graph-structured data is ubiquitous in representing complex interactions between objects (Du et al., 2018; Chen et al., 2020b; Song et al., 2020). Graph Neural Networks (GNNs), as powerful tools for graph data modeling, have been widely developed for various real-world applications (Du et al., 2021; Yao et al., 2022; Wang et al., 2020). Based on the message passing mechanism, GNNs update node representations by aggregating messages from neighbors, thereby concurrently exploiting the rich information inherent in the graph structure and node attributes.

Traditional GNNs (Kipf and Welling, 2016; Veličković et al., 2017; Hamilton et al., 2017) mainly focus on homophily graphs, which satisfy the property of homophily (i.e., most connected nodes belong to the same class). However, these GNNs usually cannot perform well on graphs with heterophily (i.e., most connected nodes belong to different classes) for the node classification problem, because message passing between nodes from different classes makes their representations less distinguishable, leading to poor node classification performance. These issues have motivated considerable work on GNNs for heterophily graphs. For example, some studies (Wang et al., 2021; Du et al., 2022b; Yan et al., 2021) adjust the message passing mechanism for heterophilic edges, while others (Abu-El-Haija et al., 2019; Zhu et al., 2020a; Chien et al., 2020; Pei et al., 2020) enlarge the receptive field for message passing. Note that all these works mitigate the distinguishability issue caused by heterophily from the perspective of GNN model design. However, there is an orthogonal perspective for mitigating the issue, i.e., rewiring the graph to reduce heterophily or increase homophily, which is still under-explored.

Graph rewiring (Alon and Yahav, 2020; Topping et al., 2021; Franceschi et al., 2019; Chen et al., 2020c) is a family of methods that decouple the input graph from the graph used for message passing and boost the performance of GNNs on node classification tasks by changing the message passing structure. Many works have utilized graph rewiring for different tasks. However, most existing graph rewiring techniques have been developed for graphs under homophily assumptions (sparsity (Louizos et al., 2017), smoothness (Ortega et al., 2018; Kalofolias, 2016) and low rank (Zhu et al., 2020b)), and therefore cannot be directly transferred to heterophily graphs. Different from existing solutions that design specific GNN architectures adapted to heterophily graphs, in this paper we conduct a comprehensive study on graph rewiring and propose an effective rewiring algorithm that reduces graph heterophily, making GNNs perform better on both heterophily and homophily graphs.

First, we demonstrate the effects of increasing the homophily level of heterophily graphs in Sec. 3 with comprehensive controlled experiments. Note that the homophily (and heterophily) level can be measured with the Homophily Ratio (HR) (Pei et al., 2020; Zhu et al., 2020a), formally defined as the average consistency of labels over connected node-pairs. From the analysis in Sec. 3.1, we find that both the node-level homophily ratio (Du et al., 2022b; Pei et al., 2020) and the node degree (which reflects the recall of nodes from the same class) affect the performance of GCN on the node classification task, where increasing either of the two leads to better GCN performance. This finding, i.e., that the classification performance of GCN on heterophily graphs can be improved by reducing the heterophily level of the graph, motivates us to design a graph rewiring strategy that increases the homophily level of heterophily graphs so that GNNs perform better on the rewired graphs.
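As a concrete reference, the node-level homophily ratio described above can be computed directly from edges and labels. The following is a minimal sketch; function and variable names are ours, not from the paper, and edges are taken as directed (source pointing to target), matching the in-degree convention used later:

```python
from collections import defaultdict

# Node-level homophily ratio: for each node, the fraction of its (incoming)
# neighbors that share its label.
def node_homophily_ratio(edges, labels):
    same, total = defaultdict(int), defaultdict(int)
    for src, dst in edges:
        total[dst] += 1
        same[dst] += int(labels[src] == labels[dst])
    return {v: same[v] / total[v] for v in total}

edges = [(1, 0), (2, 0), (3, 0), (2, 1)]
labels = {0: "a", 1: "a", 2: "b", 3: "a"}
hr = node_homophily_ratio(edges, labels)
# node 0 has in-neighbors 1, 2, 3 with labels a, b, a -> HR(0) = 2/3
```

Averaging these per-node ratios over all nodes gives the graph-level homophily ratio reported for the datasets later in the paper.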

Then, we propose a learning-based graph rewiring approach for heterophily graphs, namely Deep Heterophily Graph Rewiring (DHGR). DHGR rewires the graph by adding/pruning edges on the input graph to reduce its heterophily level. It can be viewed as a plug-in module for graph pre-processing that can work together with many kinds of GNN models, including GNNs designed for homophily and for heterophily, to boost their performance on node classification tasks.

Figure 1. Pipeline of Graph Rewiring for heterophily graphs. Red and blue circles denote nodes from different classes.

The key idea of DHGR is to reduce heterophily while preserving effectiveness, by adding homophilic edges and removing heterophilic edges. However, simply adding homophilic edges and removing heterophilic edges between nodes in the training set may increase the risk of overfitting and lead to poor performance (we show this in Sec. 5.4). Another challenge is that, unlike homophily graphs, which can leverage Laplacian smoothing to enhance the correlation between node features and labels, heterophily graphs do not satisfy the property of smoothness (Kalofolias, 2016; Ortega et al., 2018). In this paper, we propose to use the label/feature-distribution of neighbors on the input graph as guidance signals to identify edge polarity (homophily/heterophily), and we verify their effectiveness in Sec. 3.2.

Under the guidance of the neighbors' label-distribution, DHGR learns the similarity between each node-pair, forming a similarity matrix. Based on the learned similarity matrix, we rewire the graph by adding edges between high-similarity node-pairs and pruning edges connecting low-similarity node-pairs. The learned graph structure can then be fed into GNNs for node classification tasks. Besides, we design a scalable implementation of DHGR that avoids quadratic time and memory complexity with respect to the number of nodes, making our method applicable to large-scale graphs. Finally, extensive experiments on 11 real-world graph datasets, including both homophily and heterophily graphs, demonstrate the superiority of our method.

We summarize the contributions of this paper as follows:

  1. We propose a new perspective, i.e., graph rewiring, to deal with heterophily graphs by reducing heterophily, making GNNs perform better.

  2. We propose to use the neighbors' label-distribution as a guidance signal to identify homophilic and heterophilic edges, and validate it with comprehensive experiments.

  3. We design a learnable plug-in module for graph rewiring on heterophily graphs, namely DHGR, along with a highly efficient, scalable training algorithm for it.

  4. We conduct extensive experiments on 11 real-world graphs, including both heterophily and homophily graphs. The results show that GNNs with DHGR consistently outperform their vanilla versions. In addition, DHGR provides additional gains even when combined with GNNs specifically designed for heterophily graphs.

(a) Cora.
(b) Chameleon
(c) Actor
Figure 2. Graph rewiring validation experiments on three datasets. Each block in the heatmap denotes a rewired graph with a given node degree and node-level homophily ratio. The value in each block denotes the node classification accuracy of vanilla GCN on the test set (average accuracy over 3 runs).

2. Preliminary

In this section, we give the definitions of some important terminologies and concepts appearing in this paper.

2.1. Graph Neural Networks

Let $G = (V, E)$ denote a graph, where $V$ is the node set, $E$ is the edge set, and $n = |V|$ is the number of nodes in $G$. Let $X \in \mathbb{R}^{n \times d}$ denote the feature matrix, where the $i$-th row of $X$, denoted as $x_i$, is the $d$-dimensional feature vector of node $v_i$, and $(v_i, v_j) \in E$ iff $v_i$ and $v_j$ are connected. GNNs aim to learn representations for nodes in the graph. Typically, GNN models follow a neighborhood aggregation framework, where node representations are updated by aggregating information from neighboring nodes. Let $h_v^{(k)}$ denote the output vector of node $v$ at the $k$-th hidden layer and let $h_v^{(0)} = x_v$. The $k$-th iteration of the aggregation step can be written as:

$$m_v^{(k)} = \mathrm{AGG}\big(\{\, h_u^{(k-1)} : u \in \mathcal{N}(v) \,\}\big), \qquad h_v^{(k)} = \mathrm{COMBINE}\big(h_v^{(k-1)},\, m_v^{(k)}\big)$$

where $\mathcal{N}(v)$ is the set of neighbors of $v$. The AGG function gathers information from neighbors, and the COMBINE function fuses the information from the neighbors and the central node. For graph-level tasks, an additional READOUT function is required to obtain a global representation of the graph.
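The aggregation framework above can be sketched in a few lines. This toy version uses element-wise mean for AGG and element-wise addition for COMBINE, which is one possible instantiation for illustration rather than the paper's exact choice:

```python
# Toy instantiation of the AGG/COMBINE scheme:
# AGG = element-wise mean over neighbors, COMBINE = element-wise sum.
def mean_agg(vectors):
    return [sum(c) / len(c) for c in zip(*vectors)]

def gnn_layer(h, neighbors):
    """h: node -> feature list; neighbors: node -> list of neighbor nodes."""
    new_h = {}
    for v, hv in h.items():
        nbrs = neighbors.get(v, [])
        m = mean_agg([h[u] for u in nbrs]) if nbrs else [0.0] * len(hv)  # AGG
        new_h[v] = [a + b for a, b in zip(hv, m)]                        # COMBINE
    return new_h

h0 = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
nbrs = {0: [1, 2], 1: [0], 2: [0]}
h1 = gnn_layer(h0, nbrs)
# h1[0] = [1.0, 0.0] + mean([0,1], [1,1]) = [1.5, 1.0]
```

Stacking such layers gives each node a receptive field of its k-hop neighborhood, which is exactly why heterophilic edges pollute representations: the mean in AGG mixes in features of different-class neighbors.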

2.2. Graph Rewiring

Given a graph $G = (V, E)$ with node features $X$ as input, Graph Rewiring (GR) aims at learning an optimal graph $G' = (V, E')$ under a given criterion, where the edge set is updated and the node set remains constant. Let $A$ and $A'$ denote the adjacency matrices of $G$ and $G'$, respectively. The rewired graph $G'$ is used as the input of GNNs and is expected to be more effective than directly inputting the original graph $G$. As shown in Fig. 1, the pipeline of graph rewiring models usually involves two stages: similarity learning, and graph rewiring based on the learned similarity between pairs of nodes. Clearly, the design of the criterion (i.e., the objective function) plays a critical role in the similarity learning stage. Thus, in the next section we first mine knowledge from data to derive an effective criterion for graph rewiring.

3. Observations from data

We observe from data that two important graph properties, i.e., the node-level homophily ratio (the homophily ratio of one specific node, which equals the fraction of same-class neighbors among all its neighbors) and the node degree, are strongly correlated with the performance of GNNs. These two properties provide vital guidance for optimizing the graph structure via graph rewiring. However, we cannot directly compute the node-level homophily ratio because labels are only partially observable during training. Therefore, we introduce two other effective signals, i.e., the neighbors' observable label/feature-distributions, which have strong correlations with the node-level homophily ratio. In this section, we first verify the relations between the two properties and the performance of GNNs. Then we verify the correlations between the neighbor distributions and the node-level homophily ratio.

3.1. Effects of Node-level Homophily Ratio and Degree

First, we conduct validation experiments to verify the effects of the node-level homophily ratio (Pei et al., 2020; Du et al., 2022b) and the node degree on the performance of GCN, as guidance for graph rewiring. Specifically, we first construct graphs by quantitatively controlling the node-level homophily ratio and the node degree, and then measure the performance of GCN on the constructed graphs as a basis for assessing the quality of the constructed graph structure. Note that since the direction of message passing is from source nodes to target nodes, the node degree mentioned in this paper refers to the in-degree. For example, given a node degree $k$ and a node-level homophily ratio $h$, we construct a directed graph where each node has $k$ neighboring nodes pointing to it, of which $h \cdot k$ are same-class nodes, with the remaining neighbors randomly selected from different-class nodes on the graph.
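The controlled construction just described might be implemented as follows; the sampling details (e.g., rounding $h \cdot k$ to an integer, sampling without replacement) are our assumptions for illustration:

```python
import random

# Build a directed graph where every node gets k in-neighbors,
# round(h * k) of them from its own class (h = target homophily ratio).
def build_controlled_graph(labels, k, h, seed=0):
    rng = random.Random(seed)
    by_class = {}
    for v, c in labels.items():
        by_class.setdefault(c, []).append(v)
    edges = []
    for v in labels:
        same = [u for u in by_class[labels[v]] if u != v]
        diff = [u for u in labels if labels[u] != labels[v]]
        n_same = round(h * k)
        srcs = rng.sample(same, n_same) + rng.sample(diff, k - n_same)
        edges += [(u, v) for u in srcs]   # directed: sources point to v
    return edges

edges = build_controlled_graph({i: i % 2 for i in range(10)}, k=3, h=2 / 3)
# every node ends up with in-degree 3 and two same-class in-neighbors
```

Sweeping `k` and `h` over a grid and training a GCN on each resulting graph reproduces the kind of heatmap shown in Fig. 2.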

As shown in Fig. 2, we conduct validation experiments on three different graph datasets, including one homophily graph (Cora) and two heterophily graphs (Chameleon, Actor). In these experiments, we construct graphs over a grid of node degrees and node-level homophily ratios, yielding 35 constructed graphs for each dataset. For each constructed graph, we train a vanilla GCN (Kipf and Welling, 2016) three times and report the average test accuracy on the node classification task. From Fig. 2, we find that both the homophily graph and the heterophily graphs follow the same rule: when the degree is fixed, the accuracy of GCN increases with the node-level homophily ratio; when the homophily ratio is fixed, the accuracy of GCN increases with the degree. It should be noted that when the homophily ratio equals 0 (i.e., all neighboring nodes are from different classes), GCN accuracy may be higher than when the homophily ratio is very small but nonzero. Besides, when the homophily ratio is larger than a threshold, the GCN accuracy converges to 100%. In general, GCN accuracy varies almost monotonically with the node-level homophily ratio and the node degree, which motivates us to use graph rewiring as a way of increasing both.

(a) Cora.
(b) Chameleon
(c) Actor
Figure 3. Mutual Information (MI) between different signals and edge polarity (i.e.homophily or heterophily).
Figure 4. Overview of the Similarity Learner for Graph Rewiring in DHGR. $X$ denotes the raw feature matrix and $A$ denotes the adjacency matrix. Note that Node Pair-wise Cosine Similarity in the yellow block indicates the cosine similarity with decentralization calculated for each pair of nodes in the graph, which is defined in Eq. 4.

3.2. Effects of Neighbor’s Label/Feature Distribution

From Sec. 3.1, we conclude that graph rewiring can be used as a way of reducing heterophily to make GNNs perform well on both homophily and heterophily graphs. However, it is not easy to accurately identify the edge polarity (homophily or heterophily) on a heterophily graph so that we can estimate the node-level homophily ratio. For a homophily graph, we can leverage its homophily property and use Laplacian smoothing (Ortega et al., 2018; Kalofolias, 2016) to make node representations more distinguishable. However, heterophily graphs do not satisfy the property of smoothness, so the available information is limited. A straightforward idea is to use node features to identify edge polarity; however, the information in this single signal is limited. In this paper, we propose to use the similarity between the neighbors' label-distributions of a node-pair as a measure of edge polarity. Besides, considering that not all node labels are observable, we also introduce the neighbors' feature-distribution (the mean of neighbor features), which is completely observable, as an additional signal.

Up to now, we have three signals (i.e., raw node features, and the label-distribution and feature-distribution of neighbors) that can be used as measures of edge polarity. We quantitatively evaluate the effectiveness of the three signals and find, through the following empirical experiments and analysis, that the distribution signals are more informative than the raw node features. To be specific, we consider the label/feature-distributions of the 1st-order and 2nd-order neighbors. We then calculate the similarity between each node-pair under each of these signals and compute the mutual information between the node-pair similarity and the edge polarity on the graph. The mutual information is defined as:

(1)  $I(X; Y) = \int_Y \int_X p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}\, dx\, dy$

In the case of discrete random variables, the integrals are replaced by sums.

As shown in Fig. 3, we conduct this statistical analysis on three datasets (i.e., Cora, Chameleon, Actor). From Fig. 3, we find that the similarities of both the neighbors' label-distribution and the neighbors' feature-distribution have a stronger correlation with edge polarity than the raw node feature similarity, and that the neighbors' label-distribution has a stronger correlation than the neighbors' feature-distribution in most cases. This rule applies to both homophily graphs and heterophily graphs.
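A discrete estimate of the mutual information in Eq. 1 for this kind of analysis can be computed by binning node-pair similarities against binary edge polarity; the binning scheme and the assumption that similarities lie in [0, 1] are our illustrative choices:

```python
import math
from collections import Counter

# Discrete MI between binned similarity S and edge polarity P (Eq. 1 with sums).
def mutual_information(sims, polarities, bins=4):
    binned = [min(int(s * bins), bins - 1) for s in sims]  # assumes s in [0, 1]
    n = len(sims)
    p_s, p_p = Counter(binned), Counter(polarities)
    p_sp = Counter(zip(binned, polarities))
    # sum over joint bins: p(x,y) * log( p(x,y) / (p(x) p(y)) )
    return sum(
        (c / n) * math.log((c * n) / (p_s[s] * p_p[p]))
        for (s, p), c in p_sp.items()
    )

# perfectly aligned signals give MI = log 2; independent ones give MI = 0
mi_dep = mutual_information([0.9, 0.9, 0.1, 0.1], [1, 1, 0, 0], bins=2)
mi_ind = mutual_information([0.9, 0.1, 0.9, 0.1], [1, 1, 0, 0], bins=2)
```

A signal whose similarity carries more information about edge polarity yields a higher MI, which is how the comparison in Fig. 3 ranks the three candidate signals.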

4. Method

Based on the observations above, we design the Deep Heterophily Graph Rewiring (DHGR) method for heterophily graphs, which can be easily plugged into various existing GNN models. Following the pipeline in Fig. 1, DHGR first learns a similarity matrix representing the similarity between each node-pair based on the neighbor distributions (i.e., the label-distribution and feature-distribution of neighbors). We then rewire the graph structure by adding edges between high-similarity node-pairs and pruning low-similarity edges on the original graph. Finally, the rewired graph is fed into other GNN models for node classification tasks.

4.1. Similarity Learner Based on Neighbor Distribution

Before rewiring the graph, we first learn the similarity between each pair of nodes. According to the analysis in Sec. 3.2, we design a graph learner that learns the node-pair similarity based on the neighbor distributions. Considering that only the labels of nodes in the training set are available during training, we cannot observe the full label-distribution of neighbors. Therefore, we also leverage the feature-distribution of neighbors, which is fully observable, to enhance the similarity learning process, with the intuition that node features are correlated with labels in an attributed graph. The results in Sec. 3.2 also validate the effectiveness of the neighbors' feature-distribution.
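Under one natural reading of the distribution signals described above — powers of the row-normalized adjacency applied to the training-masked one-hot label matrix or the feature matrix — a dependency-free sketch is (all names are ours):

```python
# k-order neighbor distributions as P^k M, with P = D^{-1} A and M either the
# (training-masked) one-hot label matrix or the raw feature matrix.
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def neighbor_distributions(A, M, K=2):
    """A[i][j] = 1 if j is an (in-)neighbor of i; returns [P M, ..., P^K M]."""
    P = [[a / max(sum(row), 1) for a in row] for row in A]
    out, cur = [], M
    for _ in range(K):
        cur = matmul(P, cur)
        out.append(cur)
    return out

A = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]   # node 0 <- {1, 2}; nodes 1, 2 <- {0}
Y = [[1, 0], [0, 1], [0, 1]]            # one-hot labels
D1, D2 = neighbor_distributions(A, Y, K=2)
# D1[0] = mean label of node 0's neighbors = [0, 1]
```

Rows of unlabeled nodes in `M` would simply be zero vectors, which is why the mask introduced below is needed before these distributions can be trusted as supervision.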

The overview of the similarity learner used in DHGR is shown in Fig. 4. Specifically, for an attributed graph, we first calculate the observable label-distribution and feature-distribution of the $k$-hop neighbors of each node, using the node labels in the training set and all node features:

(2)  $D_y^{(k)} = (D^{-1} A)^k\, \hat{Y}, \qquad D_x^{(k)} = (D^{-1} A)^k\, X, \qquad k = 1, \dots, K$

where $D_y^{(k)}$ and $D_x^{(k)}$ are respectively the label-distribution and feature-distribution of the $k$-order neighbors in the graph, and $K$ is the maximum neighbor order (we use $K = 2$ in this paper). $\hat{Y}$ is the one-hot label matrix: the $i$-th row of $\hat{Y}$ is the one-hot label vector of node $v_i$ if $v_i$ belongs to the training set, and a zero vector otherwise. $A$ is the adjacency matrix and $D$ is the corresponding diagonal degree matrix. Then for each node $v_i$ we obtain the observed label-distribution vector $d_{y,i}$ and feature-distribution vector $d_{x,i}$ of its neighbors (the concatenation of the $i$-th rows of $D_y^{(1)}, \dots, D_y^{(K)}$ and of $D_x^{(1)}, \dots, D_x^{(K)}$, respectively). Next, we calculate the cosine similarity between each node-pair with respect to both the label-distribution and the feature-distribution, obtaining the label-distribution similarity matrix $S_y$ and the feature-distribution similarity matrix $S_x$:

(3)  $S_y[i, j] = \cos(d_{y,i},\, d_{y,j}), \qquad S_x[i, j] = \cos(d_{x,i},\, d_{x,j})$

where

(4)  $\cos(u, v) = \dfrac{(u - \bar{\mu})^\top (v - \bar{\mu})}{\lVert u - \bar{\mu} \rVert_2\, \lVert v - \bar{\mu} \rVert_2}$, with $\bar{\mu}$ the mean of the corresponding vectors over all nodes.

Note that before calculating the cosine similarity, we first decentralize the input variable by subtracting its mean over all nodes. Considering that not all nodes have an observed label-distribution (e.g., if no neighbor of node $v_i$ belongs to the training set, then the observed label-distribution of $v_i$ is a zero vector), we compensate with the feature-distribution of neighbors. In addition, we restrict the use of the neighbor label-distribution with a mask: for node $v_i$, we leverage its neighbor label-distribution only when the percentage of its neighbors in the training set is larger than a threshold $\eta$:

(5)  $m_i = \mathbb{1}\!\left[\, \dfrac{|\mathcal{N}(v_i) \cap V_{train}|}{|\mathcal{N}(v_i)|} > \eta \,\right]$

where $m$ is the mask vector, $\mathcal{N}(v_i)$ is the neighbor set of $v_i$, and $V_{train}$ is the set of nodes in the training set.
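A minimal sketch of this masking rule; the concrete threshold value is illustrative:

```python
# Use a node's neighbor label-distribution only if the labeled fraction of
# its neighbors exceeds a threshold eta (Eq. 5).
def label_mask(neighbors, train_nodes, eta=0.5):
    mask = {}
    for v, nbrs in neighbors.items():
        frac = sum(u in train_nodes for u in nbrs) / len(nbrs) if nbrs else 0.0
        mask[v] = 1 if frac > eta else 0
    return mask

m = label_mask({0: [1, 2, 3], 1: [0]}, train_nodes={1, 2}, eta=0.5)
# node 0: 2/3 of neighbors labeled -> 1; node 1: 0/1 labeled -> 0
```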

Our similarity learner then targets learning the similarity of node-pairs based on the neighbor distributions. Specifically, the similarity learner first aggregates and transforms the features of neighboring nodes, and then uses the aggregated node representations to calculate the cosine similarity for each node-pair:

(6)  $z_i = f_\theta\big(d_{x,i}\big), \qquad S[i, j] = \cos(z_i,\, z_j)$

where $f_\theta$ is a learnable transformation of the aggregated neighbor features, $S[i, j]$ denotes the learned similarity between $v_i$ and $v_j$, and the similarities of all node-pairs form the similarity matrix $S$. In practice, we also optionally concatenate the distribution feature with the transformed feature of the node itself for the similarity calculation in Eq. 6. Finally, we use the $S_x$ and $S_y$ calculated in advance to guide the training of $S$. We have the following two objective functions with respect to $S_x$ and $S_y$:

(7)  $\mathcal{L}_x = \lVert S - S_x \rVert_F^2$
(8)  $\mathcal{L}_y = \lVert M \odot (S - S_y) \rVert_F^2, \qquad M = m\, m^\top$

In practice, we first use $\mathcal{L}_x$ to reconstruct $S_x$ as a pretraining process, and then use $\mathcal{L}_y$ to reconstruct $S_y$ under the mask $M$ as a finetuning process.
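The pretraining/finetuning objectives can be sketched as (masked) Frobenius reconstruction losses. This pure-Python version assumes the mask weights entry $(i, j)$ by $m_i \cdot m_j$, which is our reading of the masked objective:

```python
# (Masked) squared Frobenius reconstruction loss, in the spirit of Eqs. 7-8.
def frobenius_loss(S, target, mask=None):
    n = len(S)
    return sum(
        (mask[i] * mask[j] if mask else 1.0) * (S[i][j] - target[i][j]) ** 2
        for i in range(n)
        for j in range(n)
    )

S  = [[1.0, 0.2], [0.2, 1.0]]
Sy = [[1.0, 0.8], [0.8, 1.0]]
l_pre  = frobenius_loss(S, Sy)               # pretraining-style: all pairs
l_fine = frobenius_loss(S, Sy, mask=[1, 0])  # finetuning-style: node 1 masked out
```

In the actual learner these losses would be differentiated with respect to the parameters producing `S`; the sketch only shows the loss values themselves.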

4.2. A Scalable Implementation of DHGR

However, directly optimizing the objective functions above has quadratic computational complexity: for $n$ nodes, the $O(n^2)$ cost of materializing the full similarity matrix is unacceptable for large graphs. So we design a scalable training strategy with stochastic mini-batches. Specifically, we randomly select node-pairs as a batch and optimize a $b \times b$ sub-block of the similarity matrix in each iteration, where the batch size $b$ can be kept small. We give the pseudocode in Algorithm 1.

Input: graph $G$, node set $V$, labels $Y$, batch size $b$, min-percentage $\eta$, max neighbor order $K$, MaxIteration, Epoch1, Epoch2.
Output: similarity matrix $S$
1  for epoch from 1 to Epoch1 do
2        for i from 1 to MaxIteration do
3              Sample $b$ nodes $V_s$ from $V$;
4              Sample $b$ nodes $V_t$ from $V$;
5              Calculate the similarity matrix between $V_s$ and $V_t$ (see Eq. 6 and Eq. 3);
6              Update the learner parameters with the gradient of $\mathcal{L}_x$ (see Eq. 7);
7  Form the node set $V_m$ of nodes whose mask value $m_i = 1$ (see Eq. 5).
8  for epoch from 1 to Epoch2 do
9        Sample $b$ nodes $V_s$ from $V_m$;
10       Sample $b$ nodes $V_t$ from $V_m$;
11       Calculate the similarity matrix between $V_s$ and $V_t$ (see Eq. 6 and Eq. 3);
12       Update the learner parameters with the gradient of $\mathcal{L}_y$ (see Eq. 8);
13 Obtain each entry of the final similarity matrix $S$ with Eq. 6.
Algorithm 1 Training DHGR with stochastic mini-batch

4.3. Graph Rewiring with Learned Similarity

After obtaining the similarity of each node-pair, we use the learned similarity to rewire the graph. Specifically, we add edges between node-pairs with high similarity and remove edges with low similarity from the original graph. Three parameters control this process: $K_{max}$, the maximum number of edges that can be added for each node; a growing threshold $\epsilon_g$, requiring that the similarity of a node-pair to be connected must be larger than $\epsilon_g$; and a pruning threshold $\epsilon_p$, for pruning edges with similarity smaller than $\epsilon_p$. The details of the graph rewiring process are given in Algorithm 2. Finally, we can feed the rewired graph into any GNN-based model for node classification tasks.

Input: original graph $G = (V, E)$, learned similarity matrix $S$, max number of added edges $K_{max}$, growing threshold $\epsilon_g$, pruning threshold $\epsilon_p$.
Output: rewired graph $G' = (V, E')$
1  $E' \leftarrow E$;
2  foreach node $v \in V$ do
3        Select the $K_{max}$ nodes from $V$ with the largest similarity to $v$ to form a node set $V_{top}$;
4        Calculate the candidate node set $C = \{u \in V_{top} : S[u, v] > \epsilon_g\}$;
5        Add edges $\{(u, v) : u \in C\}$ to $E'$;
6  foreach edge $(u, v) \in E$ do
7        if $S[u, v] < \epsilon_p$ then
8              Remove edge $(u, v)$ from $E'$;
Algorithm 2 Graph Rewiring with DHGR
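An illustrative, dense implementation of this rewiring step (parameter names are ours; the real implementation uses a Ball-Tree for the top-K queries, as discussed in Sec. 4.4):

```python
# Prune existing edges below the pruning threshold, then add up to k_max
# high-similarity edges per node above the growing threshold.
def rewire(edges, S, k_max, eps_g, eps_p):
    n = len(S)
    kept = {(u, v) for u, v in edges if S[u][v] >= eps_p}   # prune
    for v in range(n):
        top = sorted((u for u in range(n) if u != v), key=lambda u: -S[u][v])[:k_max]
        kept |= {(u, v) for u in top if S[u][v] > eps_g}    # grow
    return kept

S = [[1.0, 0.9, 0.1],
     [0.9, 1.0, 0.2],
     [0.1, 0.2, 1.0]]
new_edges = rewire({(0, 2)}, S, k_max=1, eps_g=0.5, eps_p=0.3)
# (0, 2) is pruned (0.1 < 0.3); (1, 0) and (0, 1) are added (0.9 > 0.5)
```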
Dataset  | Chameleon | Squirrel | Actor | FB100   | Flickr | Cornell | Texas | Wisconsin | Cora  | CiteSeer | PubMed
Nodes    | 2277      | 5201     | 7600  | 41554   | 89250  | 183     | 183   | 251       | 2708  | 3327     | 19717
Edges    | 36101     | 217073   | 30019 | 2724458 | 899756 | 298     | 325   | 511       | 10556 | 9104     | 88648
Features | 2325      | 2089     | 932   | 4814    | 500    | 1703    | 1703  | 1703      | 1433  | 3703     | 500
Classes  | 5         | 5        | 5     | 2       | 7      | 5       | 5     | 5         | 7     | 6        | 3
H.R.     | 23.5%     | 22.4%    | 21.9% | 47.0%   | 31.9%  | 30.5%   | 10.8% | 19.6%     | 81.0% | 73.6%    | 80.2%
Table 1. Statistical information of the datasets used to evaluate our model. H.R. indicates the overall homophily ratio (Pei et al., 2020) of the dataset, i.e., the percentage of homophilic edges among all edges of the graph.

4.4. Complexity Analysis

We analyze the computational complexity of Algorithm 1 and Algorithm 2 with respect to the number of nodes $n$. For Algorithm 1, the complexity of randomly sampling $b$ nodes is $O(b)$. Let the feature dimension be $d$ and the one-hot label dimension be $c$. Considering that the complexity of calculating the cosine similarity between two $d$-dimensional vectors is $O(d)$, the complexity of calculating the similarity matrices $S$, $S_x$ and $S_y$ over a batch is $O(b^2(d + c))$, and the complexity of calculating $\mathcal{L}_x$ and $\mathcal{L}_y$ equals $O(b^2)$. Therefore, the final computational complexity of one epoch of Algorithm 1 is $O(T\, b^2 (d + c))$, where the iteration count $T$ and batch size $b$ are two constants. For Algorithm 2, we use a Ball-Tree to compute the top-K nearest neighbors, where the complexity of one top-K query is $O(\log n)$. Therefore, the time complexity of the first loop, which performs the top-K queries, is approximately $O(n \log n)$. The second loop filters each edge in the original graph, so its complexity is $O(|E|)$. Therefore, the final complexity of Algorithm 2 is $O(n \log n + |E|)$.

GNN Model | Variant | Chameleon  | Squirrel   | Actor      | Flickr     | FB100      | Cornell    | Texas      | Wisconsin
GCN       | vanilla | 37.68±3.06 | 26.39±0.88 | 28.90±0.57 | 49.68±0.45 | 74.34±0.20 | 55.56±3.21 | 61.96±1.27 | 52.35±7.07
GCN       | DHGR    | 70.83±2.03 | 67.15±1.43 | 36.29±0.12 | 51.01±0.25 | 77.01±0.14 | 67.38±5.33 | 81.78±0.89 | 76.47±3.62
GAT       | vanilla | 44.34±1.42 | 29.82±0.98 | 29.10±0.57 | 49.67±0.81 | 70.01±0.66 | 56.22±6.02 | 60.36±5.55 | 49.61±6.20
GAT       | DHGR    | 72.11±2.87 | 62.37±1.78 | 34.71±0.48 | 50.40±0.09 | 79.41±5.13 | 70.09±6.77 | 83.78±3.37 | 73.20±4.89
GraphSAGE | vanilla | 49.06±1.88 | 36.73±1.21 | 35.07±0.15 | 50.21±0.31 | 75.99±0.09 | 80.08±2.96 | 82.03±2.77 | 81.36±3.91
GraphSAGE | DHGR    | 69.57±1.28 | 68.08±1.55 | 37.17±0.11 | 50.85±0.05 | 76.56±0.10 | 82.88±5.56 | 85.68±2.72 | 83.16±1.72
APPNP     | vanilla | 40.44±2.02 | 29.20±1.45 | 30.02±0.89 | 49.05±0.10 | 74.22±0.11 | 56.76±4.58 | 55.10±6.23 | 54.59±6.13
APPNP     | DHGR    | 70.35±2.62 | 60.31±1.51 | 36.93±0.86 | 49.36±0.05 | 75.46±0.11 | 68.11±6.59 | 81.58±4.36 | 77.65±3.06
GCNII     | vanilla | 57.37±2.35 | 39.51±1.63 | 31.05±0.14 | 50.34±0.22 | 77.06±0.12 | 61.70±5.91 | 62.43±7.37 | 52.75±4.23
GCNII     | DHGR    | 74.57±2.56 | 58.38±1.79 | 36.03±0.12 | 50.73±0.31 | 78.38±0.91 | 72.97±6.73 | 81.08±6.02 | 78.24±4.99
GPRGNN    | vanilla | 41.56±1.66 | 30.03±1.11 | 35.72±0.19 | 49.76±0.10 | 78.58±0.23 | 72.78±6.05 | 69.37±1.27 | 76.08±5.86
GPRGNN    | DHGR    | 71.58±1.59 | 64.82±2.07 | 37.43±0.78 | 50.56±0.32 | 82.28±0.56 | 76.56±5.77 | 83.98±2.54 | 79.41±4.98
H2GCN     | vanilla | 49.21±2.57 | 34.58±1.61 | 35.61±0.31 | —          | —          | 79.06±6.36 | 80.27±5.41 | 80.20±4.51
H2GCN     | DHGR    | 69.19±1.91 | 72.24±1.52 | 36.51±0.67 | —          | —          | 82.06±6.27 | 84.86±5.01 | 85.01±5.51
Avg Gain  |         | 25.51      | 32.44      | 4.23       | 0.70       | 3.15       | 8.27       | 15.89      | 15.17
Table 2. Node classification accuracy (%) on the test set of heterophily graph datasets. The dash symbols indicate experiments we were not able to run due to memory issues.

5. Experiments

In this section, we first give the experimental configurations, including the introduction of datasets, baselines and setups used in this paper. Then we give the results of experiments comparing DHGR with other graph rewiring methods on the node classification task under transductive learning scenarios. Besides, we also conduct extensive hyper-parameter studies and ablation studies to validate the effectiveness of DHGR.

5.1. Datasets

We evaluate the performance of DHGR and existing methods on eleven real-world graphs. To demonstrate the effectiveness of DHGR, we select eight heterophily graph datasets (i.e., Chameleon, Squirrel, Actor, Cornell, Texas, Wisconsin (Pei et al., 2020), FB100 (Traud et al., 2012), Flickr (Zeng et al., 2019)) and three homophily graph datasets (i.e., Cora, CiteSeer, PubMed (Kipf and Welling, 2016)). Detailed information on these datasets is presented in Table 1. For graph rewiring methods, we use both the original graphs and the rewired graphs as the input of GNN models to validate their performance on the node classification task.

5.2. Baselines

DHGR can be viewed as a plug-in module for other state-of-the-art GNN models. We select five GNN models tackling homophily: GCN (Kipf and Welling, 2016), GAT (Veličković et al., 2017), GraphSAGE (Hamilton et al., 2017), APPNP (Klicpera et al., 2018) and GCNII (Chen et al., 2020a). To demonstrate the significant improvement DHGR brings on heterophily graphs, we also choose two GNNs tackling heterophily (i.e., GPRGNN (Chien et al., 2020) and H2GCN (Zhu et al., 2020a)). Besides, to validate the effectiveness of DHGR as a graph rewiring method, we compare DHGR with two Graph Structure Learning (GSL) methods (i.e., LDS (Franceschi et al., 2019) and IDGL (Chen et al., 2020c)) and one graph rewiring method (i.e., SDRF (Topping et al., 2021)), all of which aim to optimize the graph structure. For GPRGNN and H2GCN, we use the implementations from the benchmark of (Lim et al., 2021), and we use the official implementations of the other GNNs provided by PyTorch Geometric. For the graph rewiring and GSL baselines, we use the official implementations from the original papers, except for SDRF, whose code is not available.

5.3. Experimental Setup

For all datasets in this paper, we use their publicly released data splits. For Chameleon, Squirrel, Actor, Cornell, Texas, and Wisconsin, ten randomly generated splits are provided by (Pei et al., 2020); we therefore train models on each data split with 3 random seeds for model initialization (30 trials per dataset in total) and report the average and standard deviation over all 30 results. We use the official splits of the other datasets (i.e., Cora (Kipf and Welling, 2016), PubMed (Kipf and Welling, 2016), CiteSeer (Kipf and Welling, 2016), Flickr (Zeng et al., 2019), FB100 (Lim et al., 2021)) from the corresponding papers. We train our DHGR models with 200 epochs of pretraining and 30 epochs of finetuning on all datasets, and we search the hyper-parameters of DHGR in the same space for all datasets: the max order of neighbors $K$ is searched in {1, 2}, the maximum number of added edges $K_{max}$ is searched in {3, 6, 8, 16}, and the pruning threshold $\epsilon_p$ is searched in {0, 0.3, 0.6}; we do not prune edges for homophily datasets, which is equivalent to setting $\epsilon_p$ to -1.0. The batch size for training DHGR is searched in {5000, 10000}. For the other GSL methods (i.e., LDS (Franceschi et al., 2019), IDGL (Chen et al., 2020c)), we adjust their hyper-parameters according to the configurations used in their papers. For the GNNs used in this paper, we adjust the hyper-parameters in the same search space for fairness: we search the hidden dimensions in {32, 64} for all GNNs and set the number of model layers to 2, except for GCNII (Chen et al., 2020a), which is designed to be deeper, so we search its number of layers in {2, 64} according to its official implementation. We train 200/300/400 epochs for all models and select the best parameters via the validation set. The learning rate is searched in {1e-2, 1e-3, 1e-4} and the weight decay in {1e-4, 1e-3, 5e-3}, and we use the Adam optimizer, running all models on an Nvidia Tesla V100 GPU.

5.4. Main Results

We conduct node classification experiments on both heterophily and homophily graph datasets; the results are presented in Table 2 and Table 3 respectively. We evaluate DHGR by comparing the classification accuracy of each GNN on the original graph and on the graph rewired by DHGR. We also calculate the average gain (AG) of DHGR over all models on each dataset. The formula of average gain is given as follows:

\mathrm{AG} = \frac{1}{|\mathcal{M}|} \sum_{m \in \mathcal{M}} \left( \mathrm{Acc}(m, \mathcal{G}') - \mathrm{Acc}(m, \mathcal{G}) \right) \qquad (9)

where \mathcal{M} is the set of GNN models, Acc is the short form of accuracy, \mathcal{G} is the original graph, and \mathcal{G}' is the graph rewired by DHGR. We also compare the proposed DHGR with other graph rewiring methods in terms of performance and running time; the results of the different graph rewiring methods are reported in Table 4 and Fig. 5. By analyzing these results, we have the following observations:

(1) All GNNs enhanced by DHGR, including GNNs designed for homophily and GNNs designed for heterophily, outperform their vanilla versions on the eight heterophily graph datasets. The average gain of DHGR on heterophily graphs reaches up to 32.44% on Squirrel. By contrast, vanilla GCN achieves only 26.39% classification accuracy on the Squirrel test set, and even the state-of-the-art GNNs for heterophily (i.e., GPRGNN, H2GCN) achieve a test accuracy of no more than 40%. H2GCN enhanced by DHGR achieves an astonishing 72.24% test accuracy on Squirrel, almost double. For most other heterophily datasets, GNNs with DHGR also obtain significant accuracy improvements. This demonstrates the importance of the graph rewiring strategy for improving GNN performance on heterophily graphs, and the significant average gain further demonstrates the effectiveness of DHGR. For large-scale and edge-dense datasets such as Flickr and FB100, graph rewiring with DHGR can still provide a competitive boost for GNNs, which verifies the effectiveness and scalability of DHGR on large-scale graphs.
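The average gain (AG) metric used above is a simple mean of per-model accuracy differences; the sketch below computes it, with hypothetical model names and accuracy values for illustration:

```python
def average_gain(acc_rewired, acc_original):
    """Average gain: mean accuracy difference over the same set of GNN models."""
    assert acc_rewired.keys() == acc_original.keys()
    models = acc_rewired.keys()
    return sum(acc_rewired[m] - acc_original[m] for m in models) / len(models)

# Hypothetical accuracies (%) for a set of GNN models on one dataset.
acc_g = {"GCN": 26.39, "GAT": 30.10, "H2GCN": 38.50}    # original graph G
acc_gp = {"GCN": 58.80, "GAT": 60.20, "H2GCN": 72.24}   # rewired graph G'

print(round(average_gain(acc_gp, acc_g), 2))  # -> 32.08
```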

GNN Model            Cora        CiteSeer    PubMed
GCN       vanilla    81.09±0.39  70.13±0.45  78.38±0.39
          DHGR       82.70±0.41  70.79±0.12  79.10±0.33
GAT       vanilla    81.90±0.73  69.60±0.63  78.1±0.63
          DHGR       82.93±0.51  70.43±0.65  78.81±0.93
GraphSAGE vanilla    80.62±0.47  70.30±0.57  77.1±0.23
          DHGR       81.30±0.26  71.11±0.65  77.63±0.16
APPNP     vanilla    83.25±0.42  70.46±0.31  78.9±0.45
          DHGR       83.86±0.40  71.60±0.35  79.61±0.53
GCNII     vanilla    83.11±0.37  70.90±0.73  79.46±0.33
          DHGR       83.93±0.28  71.96±0.67  79.49±0.39
Avg Gain             0.95        0.90        0.54
Table 3. Node classification accuracy (%) on the test set of homophily graphs. The bold numbers indicate that our method improves the base model.

(2) For homophily graphs (i.e., Cora, CiteSeer, PubMed), the proposed DHGR still provides a competitive gain in node classification performance for the GNNs. Note that homophily graphs usually have a high homophily ratio already (i.e., 81%, 74%, and 80% for Cora, CiteSeer, and PubMed), so even vanilla GCNs achieve strong results; the benefit of adjusting the graph structure toward a higher homophily ratio is therefore smaller than on heterophily graphs. Specifically, DHGR attains its best average gain on Cora, e.g., improving the classification accuracy of vanilla GCN on Cora from 81.1% to 82.6%. On the other two datasets, DHGR also provides an average gain of no less than 0.5% accuracy across all GNN models. These results demonstrate that our method provides significant improvements on heterophily graphs while maintaining competitive improvements on homophily graphs.
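The homophily ratios quoted above follow the standard edge-homophily definition: the fraction of edges whose endpoints share a label. A minimal sketch, with a toy graph of our own construction:

```python
import numpy as np

def edge_homophily(edges, labels):
    """Fraction of edges connecting same-label nodes (edge homophily ratio)."""
    src, dst = np.asarray(edges).T
    labels = np.asarray(labels)
    return float((labels[src] == labels[dst]).mean())

# Toy graph (illustrative): 4 of the 5 edges join same-class nodes.
labels = [0, 0, 1, 1, 0]
edges = [(0, 1), (0, 4), (1, 4), (2, 3), (0, 2)]
print(edge_homophily(edges, labels))  # 0.8
```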

Methods       Chameleon   Squirrel    Actor       Texas
Vanilla GCN   37.68±3.06  26.39±0.88  28.90±0.57  61.96±1.27
RandAddEdge   32.17±6.06  22.77±5.05  26.68±2.26  55.85±1.68
RandDropEdge  39.01±2.47  26.48±1.09  29.54±0.36  66.76±1.52
              37.01±3.36  27.89±2.28  29.57±1.17  60.08±2.13
LDS           36.12±2.89  28.02±1.78  27.58±0.97  58.75±5.57
IDGL          37.28±3.36  23.57±2.07  27.17±0.85  67.57±5.85
SDRF*         44.46±0.17  41.47±0.21  29.85±0.07  70.35±0.60
DHGR          70.83±2.03  67.15±1.43  36.29±0.12  81.78±0.89
Table 4. Node classification accuracy (%) of GCN with different graph rewiring methods. A model marked with * means we use the results from the original paper (under the same dataset settings) because its code is unavailable. The bold numbers indicate that our method improves the base model.

(3) To demonstrate the effectiveness of DHGR as a graph rewiring method, we also compare the proposed approach with other graph rewiring methods (i.e., LDS, IDGL, SDRF). Besides, we use two random graph structure transformations that add or remove edges on the original graph with a probability of 0.5, namely RandAddEdge and RandDropEdge. To validate the effect of adding edges between same-class nodes using training labels, we also design a method that randomly adds edges between same-class nodes within the training set (since we can only observe the labels of nodes in the training set) with a probability of 0.5. As shown in Table 4, GCN with DHGR outperforms GCN with the other graph transformation methods on the four presented heterophily datasets. Note that this last method, which only uses training labels to add edges, increases the homophily ratio but cannot add edges beyond the training set. Only adding homophilic edges within the training set cannot guarantee an improvement in GCN's performance; it merely makes training-set nodes easier to distinguish, which increases the risk of overfitting. The significant improvements made by DHGR demonstrate its effectiveness as a graph rewiring method.
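The two random baselines can be sketched in a few lines; the exact sampling scheme is not specified in the text, so the reading below (dropping each edge independently, and adding a matching expected number of random new edges) is our assumption:

```python
import random

def rand_drop_edge(edges, p=0.5, seed=0):
    """RandDropEdge: keep each edge independently with probability 1 - p."""
    rng = random.Random(seed)
    return [e for e in edges if rng.random() >= p]

def rand_add_edge(edges, num_nodes, p=0.5, seed=0):
    """RandAddEdge (one plausible reading): add p * |E| new edges between
    uniformly sampled node pairs that are not already connected."""
    rng = random.Random(seed)
    existing = set(map(tuple, edges))
    added = []
    while len(added) < int(p * len(edges)):
        u, v = rng.sample(range(num_nodes), 2)
        if (u, v) not in existing and (v, u) not in existing:
            existing.add((u, v))
            added.append((u, v))
    return list(edges) + added

# Toy usage on a path graph (illustrative only).
edges = [(i, i + 1) for i in range(10)]
kept = rand_drop_edge(edges)
grown = rand_add_edge(edges, num_nodes=11)
print(len(kept), len(grown))
```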

(a) Cornell.
(b) Chameleon.
(c) Actor.
Figure 5. Running time of GCN with DHGR and other GSL methods (i.e., LDS, IDGL). Note that for DHGR, we report the sum of the running time of DHGR and vanilla GCN as the final running time, for a fair comparison with the GSL methods. We train 200 epochs for all methods.

(4) Note that the traditional paradigm of GSL methods (e.g., LDS, IDGL) is to train a graph learner and a GNN in an end-to-end manner based on dense matrix optimization, which has higher complexity. The running times of DHGR and the two other GSL methods are presented in Fig. 5; we find that the running time of DHGR is significantly smaller than that of the GSL methods under the same device environment. We do not present the running time of SDRF because its code has not been publicly released yet.

5.5. Hyper-Parameter Study

To demonstrate the robustness of the proposed approach, in this section we study the effect of the four main hyper-parameters of DHGR: the batch size, the maximum number of added edges for each node, the lowest-similarity threshold for adding edges, and the training ratio of the datasets.

Dataset            Squirrel               FB100
Batchsize
100×100            64.57  64.01  63.31    75.36  75.02  74.78
1000×1000          66.01  65.68  64.53    76.21  76.30  75.01
5000×5000          66.57  66.21  66.17    76.58  76.37  75.97
10000×10000        67.79  67.66  66.32    77.32  76.57  76.32
N×N (full-batch)   67.79  67.66  66.32    77.23  76.87  76.21
Table 5. Node classification accuracy of GCN enhanced by DHGR with different training ratios and batch sizes. For each dataset under a certain training ratio, we randomly generate 3 data splits and report the average accuracy.

5.5.1. The effect of batchsize and training set ratio

Table 5 shows the results of GCN with DHGR on two heterophily datasets, varying the batchsize of DHGR and the training ratio (the percentage of nodes in the training set). The batchsize ranges from 100 to N, where N is the number of nodes and N×N indicates full-batch training. Note that for the Squirrel dataset, which has only 5201 nodes, a batchsize of 10000 already equals full-batch. The results in Table 5 show that the proposed approach yields stable improvements across different batchsizes and training ratios. Specifically, GCN with DHGR loses only about 3% accuracy when the batchsize is decreased to 100, which is extremely small, and no more than 2% accuracy when the training ratio drops from 40% to 10%. Besides, we usually set the batchsize of DHGR between 5000 and 10000 in real applications, because the overhead of storing and operating on a 10000×10000 matrix is entirely acceptable. These results demonstrate the robustness of DHGR with respect to the batchsize and training ratio.
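The batched computation described above can be sketched as processing the pairwise similarities in batch-sized blocks so the full N×N matrix is never materialized; this is our illustrative reconstruction (cosine similarity and the function name are assumptions, not the official DHGR implementation):

```python
import numpy as np

def topk_similar_pairs(feat, batch=1000, k=5):
    """Find each node's k most similar nodes using (batch x N) similarity
    blocks, so the full N x N matrix is never held in memory at once."""
    x = feat / (np.linalg.norm(feat, axis=1, keepdims=True) + 1e-12)
    n = x.shape[0]
    out = np.zeros((n, k), dtype=int)
    for i in range(0, n, batch):
        block = x[i:i + batch] @ x.T  # (batch, N) cosine-similarity block
        np.fill_diagonal(block[:, i:i + batch], -np.inf)  # exclude self-pairs
        out[i:i + batch] = np.argsort(-block, axis=1)[:, :k]
    return out

# Toy usage: two identical feature pairs; each node's nearest is its twin.
feat = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
print(topk_similar_pairs(feat, batch=2, k=1).ravel().tolist())  # [1, 0, 3, 2]
```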

(a) Homophily ratio of rewired graphs.
(b) GCN accuracy on rewired graphs.
Figure 6. Results of experiments with different values of the two rewiring hyper-parameters: the maximum number of edges that can be added for each node, and the minimum similarity threshold of node pairs between which edges can be added. Note that we remove all edges of the original graph for this experiment, so as to verify only the effect of the edges added by DHGR.

5.5.2. The effect of the edge-adding hyper-parameters

DHGR has two important hyper-parameters when rewiring graphs: the maximum number of edges added for each node, and the lowest-similarity threshold for adding edges. Given the similarity learned by DHGR, these two hyper-parameters largely determine the degree and homophily ratio of the rewired graph. Motivated by the observations presented in Sec. 3, we verify the effectiveness of DHGR for graph rewiring under different values of the two hyper-parameters. Fig. 6 (a) shows the homophily ratio of the rewired graphs, and Fig. 6 (b) shows the node classification accuracy of GCN on the rewired graphs. We observe that the homophily ratio usually increases when the similarity threshold increases with the maximum edge number fixed, while it decreases when the maximum edge number increases with the threshold fixed. Besides, the change in GCN node classification accuracy closely tracks the change in homophily ratio under the different settings. This demonstrates the effectiveness and robustness of the rewired graphs learned by DHGR.
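The interaction of the two hyper-parameters can be illustrated with a small sketch: given a learned similarity matrix, each node receives at most k_max new edges, and only to nodes whose similarity reaches the threshold tau. The names k_max and tau and the code below are our illustrative notation, not the paper's exact procedure:

```python
import numpy as np

def add_edges(sim, k_max, tau):
    """Add, for each node, edges to its k_max most similar other nodes
    whose learned similarity is at least tau (self-loops excluded)."""
    s = sim.copy().astype(float)
    np.fill_diagonal(s, -np.inf)  # never add self-loops
    edges = []
    for u in range(s.shape[0]):
        for v in np.argsort(-s[u])[:k_max]:  # top-k_max candidates for node u
            if s[u, v] >= tau:
                edges.append((u, int(v)))
    return edges

# Toy learned-similarity matrix (illustrative values).
sim = np.array([[1.0, 0.9, 0.2],
                [0.9, 1.0, 0.5],
                [0.2, 0.5, 1.0]])
print(add_edges(sim, k_max=1, tau=0.4))  # [(0, 1), (1, 0), (2, 1)]
```

Raising tau prunes low-similarity candidates (fewer, higher-quality edges, hence higher homophily), while raising k_max admits more, lower-ranked candidates (higher degree, potentially lower homophily), matching the trends in Fig. 6.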

5.6. Ablation Study

Considering that DHGR leverages three different types of information (i.e., raw features, label-distribution, and feature-distribution), we verify the effectiveness of each type of information by removing it from DHGR, yielding three variants: the first removes the use of neighbors' label-distribution (the finetuning process); the second removes the use of neighbors' feature-distribution (the pretraining process); the third does not use the concatenation of the distribution feature and the transformed feature of the node itself for similarity calculation in Eq. 6, using only the distribution feature. As shown in Table 6, the node classification accuracy of GCN with the rewired graphs from almost all variants deteriorates to some extent on the four selected datasets (i.e., Cora, Cornell, Texas, FB100). For the Texas dataset, the variant that does not utilize the neighbors' feature-distribution shows a slight improvement over the full DHGR; we attribute this to the poor performance of the feature-distribution on this dataset, as reflected by the results of the variant that only leverages the feature-distribution and the node's own features. Moreover, the result of DHGR on Texas decreases only slightly, by 0.2% accuracy, compared with that variant. The results of the ablation study demonstrate the effectiveness of neighbor label-distribution for modeling heterophily graphs, and show that the proposed approach makes full use of the useful information from neighbor distributions and raw features.

Methods   Cora        Cornell     Texas       FB100
          80.97±0.05  65.38±5.53  79.67±1.79  75.95±0.16
          81.3±0.13   67.08±6.08  82.02±1.06  76.68±0.56
          81.7±0.11   62.21±4.49  67.85±1.02  75.65±0.26
DHGR      82.63±0.41  67.38±5.33  81.78±0.89  77.01±0.14
Table 6. Node classification accuracy (%) of the ablation studies comparing GCN with DHGR and its variants, each of which removes a certain component from the original DHGR architecture.

6. Related work

6.1. Graph Representation Learning

Graph Neural Networks (GNNs) have become popular for modeling graph data (Bi et al., 2022; Yang et al., 2021; Chen et al., 2021; Wang et al., 2019b; Du et al., 2022a). GCN (Kipf and Welling, 2016) proposed graph convolution based on neighborhood aggregation. GAT (Veličković et al., 2017) proposed an attention mechanism to learn weights for neighbors. GraphSAGE (Hamilton et al., 2017) introduced graph sampling for inductive learning on graphs. These early methods are designed for homophily graphs and perform poorly on heterophily graphs. Recently, some studies (Abu-El-Haija et al., 2019; Pei et al., 2020; Zhu et al., 2020a; Chien et al., 2020; Du et al., 2022b) have designed GNNs for modeling heterophily graphs. MixHop (Abu-El-Haija et al., 2019) aggregates representations from multi-hop neighbors to alleviate heterophily. Geom-GCN (Pei et al., 2020) proposed a bi-level aggregation scheme considering both node embeddings and structural neighborhoods. GPR-GNN (Chien et al., 2020) adaptively learns Generalized PageRank (GPR) weights to jointly optimize node feature and structural information extraction. More recently, GBK-GNN (Du et al., 2022b) was designed with bi-kernels for homophilic and heterophilic neighbors respectively.

6.2. Graph Rewiring

Traditional message passing GNNs usually assume that messages are propagated on the original graph (Kipf and Welling, 2016; Veličković et al., 2017; Hamilton et al., 2017; Chen et al., 2020a). Recently, there is a trend to decouple the input graph from the graph used for message passing: examples include graph sampling methods for inductive learning (Hamilton et al., 2017; Zhang et al., 2019), motif-based methods (Monti et al., 2018), graph filters leveraging multi-hop neighbors (Abu-El-Haija et al., 2019), and methods that change the graph either as a preprocessing step (Klicpera et al., 2019; Alon and Yahav, 2020) or adaptively for the downstream task (Kazi et al., 2022; Wang et al., 2019a). Besides, Graph Structure Learning (GSL) methods (Li et al., 2018; Franceschi et al., 2019; Chen et al., 2020c; Zhu et al., 2022; Gao et al., 2020; Wan and Kokel, 2021) aim at jointly learning an optimized graph structure and its corresponding node representations. Such methods of changing graphs for better performance on downstream tasks are often generically named graph rewiring (Topping et al., 2021). The works of (Alon and Yahav, 2020; Topping et al., 2021) proposed rewiring the graph to reduce the bottleneck, a structural property of the graph that leads to over-squashing. Some GSL methods (Wan and Kokel, 2021; Gao et al., 2020) directly make the adjacency matrix a learnable parameter and optimize it together with the GNN. Other GSL methods (Franceschi et al., 2019; Chen et al., 2020c) use a bilevel optimization pipeline, in which the inner loop handles the downstream task and the outer loop learns the optimal graph structure with a structure learner. Some studies (Ying et al., 2021; Dwivedi et al., 2021) also use transformer-like GNNs to construct global connections between all nodes. However, both GSL methods and graph transformer-based methods usually have higher time and space complexity than other graph rewiring methods.
Most existing graph rewiring methods rely on similar assumptions on graphs (e.g., sparsity (Louizos et al., 2017), low-rank structure (Zhu et al., 2020b), smoothness (Ortega et al., 2018; Kalofolias, 2016)). However, the low-rank and smoothness properties are not satisfied by heterophily graphs. Thus, graph rewiring methods for modeling heterophily graphs still need to be explored.

7. Conclusion

In this paper, we propose a new perspective for modeling heterophily graphs by graph rewiring, which aims to improve the homophily ratio and degree of the original graph so that GNNs gain better performance on the node classification task. We design a learnable plug-in graph rewiring module for heterophily graphs, named DHGR, which can be easily plugged into any GNN model to improve its performance on heterophily graphs. DHGR improves the homophily of a graph by adjusting the structure of the original graph based on neighbors' label-distributions, and we design a scalable optimization strategy for training DHGR that guarantees linear computational complexity. Experiments on eleven real-world datasets demonstrate that DHGR provides significant performance gains for GNNs under heterophily while achieving competitive performance under homophily. Extensive ablation studies further demonstrate the effectiveness of the proposed approach.

References

  • S. Abu-El-Haija, B. Perozzi, A. Kapoor, N. Alipourfard, K. Lerman, H. Harutyunyan, G. Ver Steeg, and A. Galstyan (2019) Mixhop: higher-order graph convolutional architectures via sparsified neighborhood mixing. In Proceedings of International Conference on Machine Learning, pp. 21–29. Cited by: §1, §6.1, §6.2.
  • U. Alon and E. Yahav (2020) On the bottleneck of graph neural networks and its practical implications. arXiv preprint arXiv:2006.05205. Cited by: §1, §6.2.
  • W. Bi, L. Du, Q. Fu, Y. Wang, S. Han, and D. Zhang (2022) MM-gnn: mix-moment graph neural network towards modeling neighborhood feature distribution. arXiv preprint arXiv:2208.07012. Cited by: §6.1.
  • M. Chen, Z. Wei, Z. Huang, B. Ding, and Y. Li (2020a) Simple and deep graph convolutional networks. In Proceedings of International Conference on Machine Learning, pp. 1725–1735. Cited by: §5.2, §5.3, §6.2.
  • X. Chen, L. Du, M. Chen, Y. Wang, Q. Long, and K. Xie (2021) Fast hierarchy preserving graph embedding via subspace constraints. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3580–3584. Cited by: §6.1.
  • X. Chen, Y. Zhang, L. Du, Z. Fang, Y. Ren, K. Bian, and K. Xie (2020b) TSSRGCN: temporal spectral spatial retrieval graph convolutional network for traffic flow forecasting. In 2020 IEEE International Conference on Data Mining (ICDM), Vol. , pp. 954–959. External Links: Document Cited by: §1.
  • Y. Chen, L. Wu, and M. Zaki (2020c) Iterative deep graph learning for graph neural networks: better and robust node embeddings. Advances in Neural Information Processing Systems 33, pp. 19314–19326. Cited by: §1, §5.2, §5.3, §6.2.
  • E. Chien, J. Peng, P. Li, and O. Milenkovic (2020) Adaptive universal generalized pagerank graph neural network. In Proceedings of International Conference on Learning Representations, Cited by: §1, §5.2, §6.1.
  • L. Du, X. Chen, F. Gao, Q. Fu, K. Xie, S. Han, and D. Zhang (2022a) Understanding and improvement of adversarial training for network embedding from an optimization perspective. In Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, pp. 230–240. Cited by: §6.1.
  • L. Du, F. Gao, X. Chen, R. Jia, J. Wang, J. Zhang, S. Han, and D. Zhang (2021) TabularNet: a neural network architecture for understanding semantic structures of tabular data. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 322–331. Cited by: §1.
  • L. Du, X. Shi, Q. Fu, X. Ma, H. Liu, S. Han, and D. Zhang (2022b) GBK-gnn: gated bi-kernel graph neural networks for modeling both homophily and heterophily. In Proceedings of the ACM Web Conference 2022, pp. 1550–1558. Cited by: §1, §1, §6.1.
  • L. Du, G. Song, Y. Wang, J. Huang, M. Ruan, and Z. Yu (2018) Traffic events oriented dynamic traffic assignment model for expressway network: a network flow approach. IEEE Intelligent Transportation Systems Magazine 10 (1), pp. 107–120. Cited by: §1.
  • V. P. Dwivedi, A. T. Luu, T. Laurent, Y. Bengio, and X. Bresson (2021) Graph neural networks with learnable structural and positional representations. arXiv preprint arXiv:2110.07875. Cited by: §6.2.
  • L. Franceschi, M. Niepert, M. Pontil, and X. He (2019) Learning discrete structures for graph neural networks. In Proceedings of International conference on machine learning, pp. 1972–1982. Cited by: §1, §5.2, §5.3, §6.2.
  • X. Gao, W. Hu, and Z. Guo (2020) Exploring structure-adaptive graph learning for robust semi-supervised classification. In Proceedings of 2020 IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6. Cited by: §6.2.
  • W. L. Hamilton, R. Ying, and J. Leskovec (2017) Inductive representation learning on large graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 1025–1035. Cited by: §1, §5.2, §6.1, §6.2.
  • V. Kalofolias (2016) How to learn a graph from smooth signals. In Proceedings of Artificial Intelligence and Statistics, pp. 920–929. Cited by: §1, §1, §3.2, §6.2.
  • A. Kazi, L. Cosmo, S. Ahmadi, N. Navab, and M. Bronstein (2022) Differentiable graph module (dgm) for graph convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–1. Cited by: §6.2.
  • T. N. Kipf and M. Welling (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907. Cited by: §1, §3.1, §5.1, §5.2, §5.3, §6.1, §6.2.
  • J. Klicpera, A. Bojchevski, and S. Günnemann (2018) Predict then propagate: graph neural networks meet personalized pagerank. arXiv preprint arXiv:1810.05997. Cited by: §5.2.
  • J. Klicpera, S. Weißenberger, and S. Günnemann (2019) Diffusion improves graph learning. arXiv preprint arXiv:1911.05485. Cited by: §6.2.
  • R. Li, S. Wang, F. Zhu, and J. Huang (2018) Adaptive graph convolutional neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, pp. 3546–3553. Cited by: §6.2.
  • D. Lim, X. Li, F. Hohne, and S. Lim (2021) New benchmarks for learning on non-homophilous graphs. arXiv preprint arXiv:2104.01404. Cited by: §5.2, §5.3.
  • C. Louizos, M. Welling, and D. P. Kingma (2017) Learning sparse neural networks through L0 regularization. arXiv preprint arXiv:1712.01312. Cited by: §1, §6.2.
  • F. Monti, K. Otness, and M. M. Bronstein (2018) Motifnet: a motif-based graph convolutional network for directed graphs. In Proceedings of 2018 IEEE Data Science Workshop (DSW), pp. 225–228. Cited by: §6.2.
  • A. Ortega, P. Frossard, J. Kovačević, J. M. Moura, and P. Vandergheynst (2018) Graph signal processing: overview, challenges, and applications. Proceedings of the IEEE 106 (5), pp. 808–828. Cited by: §1, §1, §3.2, §6.2.
  • H. Pei, B. Wei, K. C. Chang, Y. Lei, and B. Yang (2020) Geom-gcn: geometric graph convolutional networks. arXiv preprint arXiv:2002.05287. Cited by: §1, §1, §3.1, Table 1, §5.1, §5.3, §6.1.
  • G. Song, Y. Li, J. Wang, and L. Du (2020) Inferring explicit and implicit social ties simultaneously in mobile social networks. Science China Information Sciences 63 (4), pp. 1–3. Cited by: §1.
  • J. Topping, F. Di Giovanni, B. P. Chamberlain, X. Dong, and M. M. Bronstein (2021) Understanding over-squashing and bottlenecks on graphs via curvature. arXiv preprint arXiv:2111.14522. Cited by: §1, §5.2, §6.2.
  • A. L. Traud, P. J. Mucha, and M. A. Porter (2012) Social structure of facebook networks. Physica A: Statistical Mechanics and its Applications 391 (16), pp. 4165–4180. Cited by: §5.1.
  • P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio (2017) Graph attention networks. arXiv preprint arXiv:1710.10903. Cited by: §1, §5.2, §6.1, §6.2.
  • G. Wan and H. Kokel (2021) Graph sparsification via meta-learning. DLG@ AAAI. Cited by: §6.2.
  • T. Wang, R. Wang, D. Jin, D. He, and Y. Huang (2021) Powerful graph convolutional networks with adaptive propagation mechanism for homophily and heterophily. arXiv preprint arXiv:2112.13562. Cited by: §1.
  • Y. Wang, L. Du, E. Shi, Y. Hu, S. Han, and D. Zhang (2020) Cocogum: contextual code summarization with multi-relational gnn on umls. Microsoft, Tech. Rep. MSR-TR-2020-16. Cited by: §1.
  • Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, and J. M. Solomon (2019a) Dynamic graph cnn for learning on point clouds. Acm Transactions On Graphics (tog) 38 (5), pp. 1–12. Cited by: §6.2.
  • Y. Wang, L. Du, G. Song, X. Ma, L. Jin, W. Lin, and F. Sun (2019b) Tag2Gauss: learning tag representations via gaussian distribution in tagged networks. In IJCAI, pp. 3799–3805. Cited by: §6.1.
  • Y. Yan, M. Hashemi, K. Swersky, Y. Yang, and D. Koutra (2021) Two sides of the same coin: heterophily and oversmoothing in graph convolutional neural networks. arXiv preprint arXiv:2102.06462. Cited by: §1.
  • S. Yang, G. Song, Y. Jin, and L. Du (2021) Domain adaptive classification on heterogeneous information networks. In Proceedings of the Twenty-Ninth International Conference on International Joint Conferences on Artificial Intelligence, pp. 1410–1416. Cited by: §6.1.
  • D. Yao, H. Hu, L. Du, G. Cong, S. Han, and J. Bi (2022) TrajGAT: a graph-based long-term dependency modeling approach for trajectory similarity computation. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 2275–2285. Cited by: §1.
  • C. Ying, T. Cai, S. Luo, S. Zheng, G. Ke, D. He, Y. Shen, and T. Liu (2021) Do transformers really perform badly for graph representation?. Advances in Neural Information Processing Systems 34. Cited by: §6.2.
  • H. Zeng, H. Zhou, A. Srivastava, R. Kannan, and V. Prasanna (2019) Graphsaint: graph sampling based inductive learning method. arXiv preprint arXiv:1907.04931. Cited by: §5.1, §5.3.
  • Y. Zhang, S. Pal, M. Coates, and D. Ustebay (2019) Bayesian graph convolutional neural networks for semi-supervised classification. In Proceedings of the AAAI conference on artificial intelligence, Vol. 33, pp. 5829–5836. Cited by: §6.2.
  • J. Zhu, Y. Yan, L. Zhao, M. Heimann, L. Akoglu, and D. Koutra (2020a) Beyond homophily in graph neural networks: current limitations and effective designs. Advances in Neural Information Processing Systems 33, pp. 7793–7804. Cited by: §1, §1, §5.2, §6.1.
  • Y. Zhu, W. Xu, J. Zhang, Y. Du, J. Zhang, Q. Liu, C. Yang, and S. Wu (2022) A survey on graph structure learning: progress and opportunities. arXiv preprint arXiv:2103.03036. Cited by: §6.2.
  • Y. Zhu, Y. Xu, F. Yu, S. Wu, and L. Wang (2020b) Cagnn: cluster-aware graph neural networks for unsupervised graph representation learning. arXiv preprint arXiv:2009.01674. Cited by: §1, §6.2.