
Effective Eigendecomposition based Graph Adaptation for Heterophilic Networks

by   Vijay Lingam, et al.

Graph Neural Networks (GNNs) exhibit excellent performance when graphs have a strong homophily property, i.e. connected nodes have the same labels. However, they perform poorly on heterophilic graphs. Several approaches address the issue of heterophily by proposing models that adapt the graph by optimizing a task-specific loss function using labelled data. These adaptations are made either via attention or by attenuating or enhancing various low-frequency/high-frequency signals, as needed for the task at hand. More recent approaches adapt the eigenvalues of the graph. One important interpretation of this adaptation is that these models select/weigh the eigenvectors of the graph. Based on this interpretation, we present an eigendecomposition based approach and propose EigenNetwork models that improve the performance of GNNs on heterophilic graphs. Performance improvement is achieved by learning flexible graph adaptation functions that modulate the eigenvalues of the graph. Regularization of these functions via parameter sharing helps to improve the performance even more. Our approach achieves up to 11% improvement in performance over the state-of-the-art methods on heterophilic graphs.



1 Introduction

Homophily (McPherson et al., 2001) is a principle in sociology that suggests that similarity breeds connections in real life. In the context of semi-supervised classification, this implies that nodes with similar labels are likely to be connected. Several real-world networks exhibit homophily; for example, people on a social network connect based on similar interests. Several real-world networks exhibit the opposite as well, which is heterophily. For example, the Wikipedia page on homophily is linked to other pages from sociology and connected to various pages from mathematics, graph theory, and statistics.

Graph Neural Networks (GNNs) (Kipf and Welling, 2017; Hamilton et al., 2017; Veličković et al., 2018) leverage network information along with node features to improve their semi-supervised classification performance. GNNs are primarily dependent on network homophily to be able to give improved performance. For heterophilic networks, their performance can degrade significantly. Several approaches have been proposed in the literature to mitigate this degradation in performance in the presence of heterophily. Pei et al. (2020) aggregate both over the graph neighbourhood and the neighbours in the latent space. However, the neighbours still influence the self-embedding of the central node and could bring in noise. Zhu et al. (2020) keep the self-embedding separate from the neighbour embeddings during aggregation, while also similarly incorporating higher-order neighbour embeddings. Kim and Oh (2021) proposed several simple attention models trained on an additional auxiliary task. Bo et al. (2021) propose to learn an attention mechanism that captures the proportion of low-frequency and high-frequency signals per edge. Chien et al. (2021) propose an adaptive polynomial filter to pick up which low-frequency or high-frequency signals are helpful for the task. There have also been other approaches involving a label-label compatibility matrix (Zhu et al., 2021).

For homophilic networks, existing models (Klicpera et al., 2019; Chien et al., 2021) already prove to be excellent. Our interest lies in heterophilic networks. We mainly focus on the class of methods that aim at modifying or adapting the graph to obtain better performance (Kim and Oh, 2021; Bo et al., 2021; Chien et al., 2021) in heterophilic networks. The more recent among these approaches adjust the eigenvalues of the graph to learn improved representation. Another interpretation for this adaptation is that it is suppressing some eigenvectors while accentuating others. We use this insight and propose simple yet effective methods of weighting the eigenvectors to improve task performance. We make the following contributions in this work:

  1. We present a simple eigendecomposition based approach and propose EigenNetwork models to learn flexible graph adaptation functions. We show that our models achieve significantly improved performance where the graphs are heterophilic.

  2. While most GNNs are aggregation-based models, we propose a simple and efficient concatenation model that is quite competitive with neighborhood aggregation models on several datasets.

  3. Finally, we propose a weight-tying based regularization to learn better adaptation functions to avoid any possible over-fitting, when learning from limited data.

  4. We conduct extensive experimentation and our approach achieves up to 11% improvement in performance over the state-of-the-art methods on heterophilic graphs.

In the following sections, we discuss related works (Section 2). We motivate our work with the recently proposed GPR-GNN model (Chien et al., 2021) in Section 3. We give details about our proposed approach in Section 4. Finally, we give our experimental results, ablation studies and conclusion in Sections 5 and 6.

2 Related Works

In recent times, Graph Neural Networks (GNNs) have become an increasingly popular method for semi-supervised classification with graphs. Bruna et al. (2014) set the stage for early GNN models, followed by various modifications (Defferrard et al., 2016; Kipf and Welling, 2017). GCN (Kipf and Welling, 2017) provided the fastest and simplest variant, where the convolution operation reduces to aggregating features over the neighbourhood. Improving the aggregation mechanism (Hamilton et al., 2017; Veličković et al., 2018) and incorporating random walk information (Abu-El-Haija et al., 2019b, a; Li et al., 2018) gave further improvements in these models, but they still suffered from over-smoothing. To circumvent this problem, APPNP (Klicpera et al., 2019) proposed an approach derived from personalized PageRank.

Most of the development in GNNs has targeted homophilic graphs, and the resulting models perform poorly in the heterophily setting. One of the early works to address heterophily in GNNs was Geom-GCN (Pei et al., 2020). The authors identified two key weaknesses in GNNs in the context of heterophily. Firstly, since the aggregation over the neighbourhood is permutation-invariant, it is difficult to identify which neighbours contribute positively and which negatively to the final performance. Secondly, long-range information is difficult to aggregate. To mitigate these issues, they proposed aggregating over two sets of neighbours: one from the graph and the other inferred in the latent space. H2GCN (Zhu et al., 2020) proposed to separate the self-embeddings from neighbour embeddings. To avoid mixing of information, they concatenate self-embeddings and neighbour embeddings instead of aggregating them. Higher-order neighbourhood embeddings are similarly combined to capture long-range information.

Recent approaches address the weaknesses by adapting the graph itself. SuperGAT (Kim and Oh, 2021) gave several simple attention models trained on the classification task and an additional auxiliary task. They suggest that these attention models can improve model performance across several graphs with varying homophily scores. FAGCN (Bo et al., 2021) uses the attention mechanism and learns the weight of an edge as the difference in the proportion of low-frequency and high-frequency signals. They empirically show that negative edge-weights identify edges that connect nodes with different labels. GPR-GNN (Chien et al., 2021) takes the idea proposed in APPNP and generalizes the PageRank model so that it works well for graphs with varying homophily scores. Our proposed approach is closely related to these methods that adapt the graph for the task at hand. Mainly, we take inspiration from GPR-GNN. We show that GPR-GNN effectively adapts the eigenvalues of the graph for the desired task via a polynomial with learnable coefficients (Section 4). We propose to replace this polynomial with a graph adaptation function, which allows our eigengraph based network model (EigenNetwork) to learn any sharp changes in the importance of eigenvectors. It enables our model to give better performance on several datasets. Additionally, we observe that instead of aggregating features over the graph, we can get competitive performance with simpler models on some datasets if we concatenate our adapted graph with the features.

In the following section, we give a brief overview of the problem setting and GPR-GNN model focussing on the key elements and ideas to motivate our work.

3 Problem Setup and Motivation

We focus on the problem of semi-supervised node classification on a simple graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ is the set of vertices and $\mathcal{E}$ is the set of edges. Let $A \in \{0, 1\}^{n \times n}$ be the adjacency matrix associated with $\mathcal{G}$, where $n = |\mathcal{V}|$ is the number of nodes. Let $\mathcal{Y}$ be the set of all possible class labels. Let $X \in \mathbb{R}^{n \times d}$ be the $d$-dimensional feature matrix for all the nodes in the graph. Given a training set of nodes $\mathcal{D} \subset \mathcal{V}$ whose labels are known, along with $A$ and $X$, our goal is to predict the labels of the remaining nodes. The proportion of edges that connect two nodes with the same labels in a graph is called the homophily score of the graph. In our problem, we are particularly concerned with graphs that exhibit low homophily scores. In the next sub-section, we provide background material on the GPR-GNN modelling method and its approach to graph adaptation.
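The homophily score defined above can be computed directly from an adjacency matrix and a label vector. The following is a minimal sketch, assuming an undirected graph stored as a dense symmetric NumPy array; the function name `edge_homophily` and the toy graph are illustrative and not taken from the paper.

```python
import numpy as np

def edge_homophily(adj: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of undirected edges whose two endpoints share a label."""
    # Use the strict upper triangle so each undirected edge is counted once.
    rows, cols = np.nonzero(np.triu(adj, k=1))
    if rows.size == 0:
        return 0.0
    return float(np.mean(labels[rows] == labels[cols]))

# Toy path graph 0-1-2-3 with labels [0, 0, 1, 1]:
# edges (0,1) and (2,3) join same-label nodes, edge (1,2) does not.
toy_adj = np.array([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]])
toy_labels = np.array([0, 0, 1, 1])
toy_score = edge_homophily(toy_adj, toy_labels)
```

On this toy graph two of the three edges are homophilous, so the score is 2/3. Graphs in the heterophilic regime studied here have scores closer to 0.1 to 0.3.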

3.1 GPR-GNN Model

The GPR-GNN (Chien et al., 2021) model consists of two core components: (a) a non-linear network that transforms the raw feature input $X$: $Z^{(0)} = f(X; W)$, and (b) a generalized PageRank (GPR) component, $G$, that essentially aggregates the transformed output recursively as $Z^{(k)} = \tilde{A} Z^{(k-1)}$, where $\tilde{A}$ denotes the normalized adjacency matrix. Notice that there is no nonlinear operation involved after each aggregation step over $k$. Therefore, the functionality of the GPR component can be written using an operator defined as $G(\tilde{A}; \gamma) = \sum_{k=0}^{K} \gamma_k \tilde{A}^k$, and we obtain the aggregated node embedding by applying $G$ to the nonlinear network output: $S = G(\tilde{A}; \gamma) Z^{(0)}$. Using the eigendecomposition $\tilde{A} = U \Lambda U^\top$, Chien et al. (2021) presented an interpretation that the GPR component essentially performs a graph filtering operation: $G(\tilde{A}; \gamma) = U h_K(\Lambda) U^\top$, where $h_K$ is a polynomial graph filter applied element-wise, $h_K(\lambda_i) = \sum_{k=0}^{K} \gamma_k \lambda_i^k$, and $\lambda_i$ is the $i$-th eigenvalue. As explained in Chien et al. (2021), learning the filter coefficients (i.e., $\gamma$) helps to get improved performance. Since the coefficients can take negative values, the GPR-GNN model is able to capture high-frequency components of the graph signals, enabling the model to achieve improved performance on heterophilic graphs. In the following sub-section, we analyze the importance of eigenvectors in heterophilic graphs.
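The equivalence between the recursive GPR aggregation and polynomial filtering of eigenvalues can be checked numerically. The sketch below is not the GPR-GNN implementation; it assumes a small random undirected graph, a symmetric normalized adjacency with self-loops, and hypothetical names (`gpr_propagate`, `gpr_spectral`, `gamma`) chosen for illustration.

```python
import numpy as np

def gpr_propagate(a_hat, z0, gamma):
    """Recursive form: sum_k gamma[k] * a_hat^k @ z0, with no nonlinearity between hops."""
    z_k = z0
    out = gamma[0] * z0
    for g in gamma[1:]:
        z_k = a_hat @ z_k          # one more hop of aggregation
        out = out + g * z_k
    return out

def gpr_spectral(a_hat, z0, gamma):
    """Spectral form: U h(Lambda) U^T z0 with h(lam) = sum_k gamma[k] * lam^k."""
    lam, u = np.linalg.eigh(a_hat)                      # a_hat is symmetric
    h = np.polynomial.polynomial.polyval(lam, gamma)    # polynomial filter per eigenvalue
    return u @ (h[:, None] * (u.T @ z0))

rng = np.random.default_rng(0)
upper = np.triu(rng.integers(0, 2, size=(6, 6)), k=1)
adj = (upper + upper.T).astype(float)                   # random undirected toy graph
a_bar = adj + np.eye(6)                                 # add self-loops
d_inv_sqrt = 1.0 / np.sqrt(a_bar.sum(axis=1))
a_hat = a_bar * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

z0 = rng.normal(size=(6, 3))                            # stand-in for the network output
gamma = np.array([0.5, -0.3, 0.2, -0.1])                # coefficients may be negative

s_recursive = gpr_propagate(a_hat, z0, gamma)
s_spectral = gpr_spectral(a_hat, z0, gamma)
```

Both routes give the same embedding up to floating-point error, which is the observation the rest of the paper builds on: the learnable coefficients only act through a function of the eigenvalues.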

3.2 Eigenvector Analysis

We conduct the following experiment to study empirically why adapting eigenvalues is important in heterophilic settings. We consider the eigendecomposition of the given graph as $A = U \Lambda U^\top$ and we assign the node features as $X = U$. The columns in this feature matrix are essentially the eigenvectors. We construct new training data with these features using the known labels from the training set. We train a Logistic Regression model with this data. Figure 1 plots the per-class (indicated by different colors) weights learnt by the model. The x-axis is the index of the eigenvalues sorted in descending order. The y-axis denotes the weight assigned to the corresponding eigenvector. These weights give the importance of eigenvectors. These plots show the spread of useful signals across the spectrum. For instance, the Chameleon and Crocodile datasets exhibit a dumbbell-like distribution. Both low-frequency and high-frequency components have high (absolute) weights, suggesting that selecting/weighing eigenvectors is the key. Correctly selecting/weighing them will give us good performance on heterophilic graphs. In the next section, we give our proposed approach for adapting the graph based on the analysis presented here.

Figure 1: Weight Plots. Panels: (a) Crocodile, (b) Chameleon, (c) Squirrel.
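The eigenvector analysis can be imitated on a toy graph. The sketch below is an assumption-laden stand-in, not the experiment behind Figure 1: it uses a complete bipartite graph whose two sides are the two classes (a fully heterophilic graph), the unnormalized adjacency matrix, and a plain softmax regression trained by gradient descent in place of a library Logistic Regression. The function name `eigenvector_weights` is hypothetical.

```python
import numpy as np

def eigenvector_weights(adj, labels, train_idx, num_classes, lr=0.5, steps=500):
    """Fit softmax regression on eigenvector features; return eigenvalues and weights."""
    lam, u = np.linalg.eigh(adj)
    order = np.argsort(-lam)                 # sort eigenvalues in descending order
    lam, u = lam[order], u[:, order]
    feats = u[train_idx]                     # row i holds node i's eigenvector coordinates
    onehot = np.eye(num_classes)[labels[train_idx]]
    w = np.zeros((u.shape[1], num_classes))
    for _ in range(steps):
        logits = feats @ w
        logits -= logits.max(axis=1, keepdims=True)     # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        w -= lr * feats.T @ (p - onehot) / len(train_idx)
    return lam, w

# Complete bipartite graph K_{3,3}: every edge joins nodes with different labels.
ones, zeros = np.ones((3, 3)), np.zeros((3, 3))
bip_adj = np.block([[zeros, ones], [ones, zeros]])
bip_labels = np.array([0, 0, 0, 1, 1, 1])
lam, w = eigenvector_weights(bip_adj, bip_labels, np.arange(6), num_classes=2)

# Importance of each eigenvector: total absolute weight across classes.
importance = np.abs(w).sum(axis=1)
most_important = int(np.argmax(importance))
```

In this toy case the only informative eigenvector is the one with the most negative eigenvalue (index 5 after sorting in descending order), so all the learned weight concentrates at the high-frequency end of the spectrum.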

4 Proposed Approach: Eigen Network Models

In this section, we present an alternate interpretation of the GPR-GNN model and suggest a simple eigendecomposition based graph adaptation approach. We present several model variants, each motivated by a different aspect of the problem.

4.1 Eigen Network Model

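We start by observing closely the GPR component output given by:

$$S = \sum_{k=0}^{K} \gamma_k \tilde{A}^k Z^{(0)} = U h_K(\Lambda) U^\top Z^{(0)}, \tag{1}$$

where $\gamma_k$ are the learnable filter coefficients, $Z^{(0)}$ is the output of the nonlinear feature network, $\tilde{A} = U \Lambda U^\top$ is the eigendecomposition of the normalized adjacency matrix, and $h_K(\lambda) = \sum_{k=0}^{K} \gamma_k \lambda^k$ is applied element-wise. Our first observation is that learning the filter coefficients is equivalent to learning a new graph, which depends on the fixed set of eigenvectors and eigenvalues but is parameterised using $\gamma$. Therefore, the GPR-GNN model may be interpreted as adapting the original adjacency matrix $\tilde{A}$. Next, as noted in the previous section, using the structures present in the eigendecomposition and the polynomial function, we can expand (1) by unrolling over eigenvalues and interchanging the summation as:

$$S = \sum_{j=1}^{n} \Big( \sum_{k=0}^{K} \gamma_k \lambda_j^k \Big) u_j u_j^\top Z^{(0)} = \sum_{j=1}^{n} h_K(\lambda_j) \, u_j u_j^\top Z^{(0)}, \tag{2}$$

where $u_j$ is the $j$-th column of $U$. Since $h_K(\lambda_j)$ depends only on $\lambda_j$, our proposal is to replace the polynomial function with any general smooth function, $H(\lambda_j; \theta)$, which need not be a polynomial and which we call the graph adaptation function. We discuss several choices of such functions shortly. We rewrite (2) in matrix form, after substituting the graph adaptation matrix $H(\Lambda; \theta)$ and leaving out the input embedding ($Z^{(0)}$), as:

$$\tilde{A}(\theta) = U H(\Lambda; \theta) U^\top. \tag{3}$$

We refer to (3) as EigenNetwork as it involves eigenvectors and a learnable graph adaptation function that depends on the eigenvalues. This network forms the basis of our eigendecomposition based modeling approach. Note that (3) is essentially a single layer network.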
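To make the idea of an adapted graph concrete, the sketch below builds a graph of the form "eigenvectors times a diagonal matrix of adapted eigenvalues times eigenvectors transposed" for a 4-cycle. It is an illustration under assumptions rather than the authors' code: the helper `adapted_graph` and the step-shaped adaptation are hypothetical, and the normalization with self-loops follows the description in Section 5.2.

```python
import numpy as np

def adapted_graph(u, h_values):
    """Return U diag(h) U^T, where h holds one adapted value per eigenvector."""
    return (u * h_values) @ u.T      # scaling column j of U by h[j] equals U @ diag(h)

# 4-cycle 0-1-2-3-0 with self-loops and symmetric normalization.
cycle_adj = np.array([[0, 1, 0, 1],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [1, 0, 1, 0]], dtype=float)
a_bar = cycle_adj + np.eye(4)
d_inv_sqrt = 1.0 / np.sqrt(a_bar.sum(axis=1))
a_hat = a_bar * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
lam, u = np.linalg.eigh(a_hat)       # eigenvalues here: -1/3, 1/3, 1/3, 1

# Leaving the eigenvalues unchanged recovers the original graph.
identity_graph = adapted_graph(u, lam)

# A sharp, non-polynomial adaptation: keep only the negative (high-frequency) part.
high_freq_graph = adapted_graph(u, (lam < 0).astype(float))
```

The second graph is dense and assigns a weight of -0.25 to every pair of adjacent nodes, a small example of how adapting eigenvalues can produce negative edges without any attention mechanism.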

Choice of Graph Adaptation Function. We use graph adaptation function that is a non-negative function of eigenvalue. One general function is:

where we use subscripts to differentiate scaling and exponent activation functions. We find that

is useful for both scaling and exponentiation, and is also useful for scaling purpose. Composition with two functions is quite flexible and helps to adapt for graphs having diverse eigenvalue decay rates. Note that we can recover original eigenvalues for suitable choices of and

. There may be other choices of graph adaptive functions that perform better. From a practical viewpoint, several functions can be evaluated with the best function selected using traditional hyperparameter optimization strategy.
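The exact formula for the graph adaptation function did not survive in this copy of the text, so the following is only one plausible reading: a scaling term multiplied by the eigenvalue raised to a learnable exponent, each passed through its own activation. The choice of ReLU and sigmoid, and the names `adapt`, `alpha`, and `beta`, are assumptions made for illustration. The inputs are taken to be non-negative (for instance singular values), since fractional powers of negative numbers are undefined over the reals.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adapt(values, alpha, beta, scale_act=relu, exp_act=relu):
    """Assumed form: scale_act(alpha) * values ** exp_act(beta), element-wise.

    values : non-negative eigenvalues or singular values, shape (k,)
    alpha  : scaling parameters, shape (k,) or scalar
    beta   : exponent parameters, shape (k,) or scalar
    """
    return scale_act(alpha) * np.power(values, exp_act(beta))

spectrum = np.array([1.0, 0.6, 0.3, 0.1])

# With ReLU activations, alpha = 1 and beta = 1 leave the spectrum unchanged.
unchanged = adapt(spectrum, np.ones(4), np.ones(4))

# A zero exponent discards the original magnitudes and keeps only the learned scales.
pure_weights = adapt(spectrum, np.array([2.0, 0.0, 0.0, 1.5]), np.zeros(4))

# A sigmoid on the scaling term bounds each weight to (0, 1).
bounded = adapt(spectrum, np.array([3.0, -3.0, 0.0, 3.0]), np.ones(4), scale_act=sigmoid)
```

Under this reading the output is always non-negative, the exponent controls how fast the spectrum decays, and the scaling term can switch individual eigenvectors on or off.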

Regularization of Graph Adaptation Function. While it is possible to use a separate set of parameters for each eigenvalue, the number of model parameters can go up significantly, depending on the number of eigenvectors. We can mitigate this problem, for example, by using the same parameter for several eigenvalues. Weight tying is a popular mechanism of regularization used in CNNs (Lecun et al., 1998), statistical relational learning (Getoor and Taskar, 2007), Markov logic networks (Domingos and Lowd, 2009), probabilistic soft logic (Bach et al., 2017), language models (Vaswani et al., 2017), etc. In Figure 1, we observe that nearby eigenvalues tend to have similar importance. Based on this observation, we divide the sorted set of eigenvalues into fixed-length bins and assign one variable to each bin. We call this model RegEigen-EigenNetwork. This allows us to reduce the number of learnable parameters in the model and offers a simple but effective regularization.
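The binning scheme can be sketched as an index lookup that expands one parameter per bin into one value per eigenvalue. This is a minimal illustration assuming the eigenvalues are already sorted; the helper name `tie_parameters` is hypothetical.

```python
import numpy as np

def tie_parameters(bin_params, num_eigs, bin_size):
    """Expand per-bin parameters so that contiguous eigenvalues share one value."""
    bin_index = np.arange(num_eigs) // bin_size     # eigenvalue position -> bin id
    return bin_params[bin_index]

# Seven sorted eigenvalues, bins of length three: only three free parameters remain.
num_eigs, bin_size = 7, 3
num_bins = -(-num_eigs // bin_size)                 # ceiling division
bin_params = np.array([1.0, 2.0, 3.0])
per_eig_params = tie_parameters(bin_params, num_eigs, bin_size)
```

In a trainable model the lookup would be applied to the learnable parameter vector on every forward pass, so gradients from all eigenvalues in a bin accumulate on the single shared parameter.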

Computational Aspect. One main difficulty arises when the graph adaptation parameters are jointly learned with the nonlinear feature transformation and classifier model weights. This is because computing the node embedding (1) using (3) to transform the input features involves dense matrix multiplications and is expensive. We can reduce the computational cost in several ways: (1) We use the raw input features. This helps to pre-compute the projected features. The other possibility is to use a pre-trained feature embedding instead. (2) It also helps to reduce the dimension of the input features whenever the dimension of the raw features or the pre-trained embedding is high. We provide more details in Section 4.2. (3) Since we are adapting the eigenvalues, we may not need a large number of eigenvectors to get good performance, as validated in our experiments.
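The saving from pre-computing projected features can be sketched as follows. The code assumes the embedding has the form "truncated eigenvectors, times adapted eigenvalues, times the projection of the raw features onto those eigenvectors"; the names `precompute_projection` and `node_embedding`, the random data, and the choice of keeping the components with largest absolute eigenvalue are illustrative assumptions.

```python
import numpy as np

def precompute_projection(u_k, x):
    """Project raw features onto the k retained eigenvectors once: shape (k, d)."""
    return u_k.T @ x

def node_embedding(u_k, h_values, projected):
    """Per-step work: rescale k rows, then one (n x k) by (k x d) product."""
    return u_k @ (h_values[:, None] * projected)

rng = np.random.default_rng(1)
n, d, k = 8, 5, 3
sym = rng.normal(size=(n, n))
sym = (sym + sym.T) / 2.0                       # stand-in for a symmetric graph matrix
lam, u = np.linalg.eigh(sym)
top = np.argsort(-np.abs(lam))[:k]              # keep the k largest-magnitude components
u_k = u[:, top]
x = rng.normal(size=(n, d))

projected = precompute_projection(u_k, x)       # done once, before training
h_values = np.array([0.9, 0.1, 1.3])            # current adapted eigenvalues
fast = node_embedding(u_k, h_values, projected)

# Reference: form the dense n x n adapted graph explicitly, then multiply.
dense = ((u_k * h_values) @ u_k.T) @ x
```

The two results agree, but the first route never materializes the dense adapted graph, which matters when the number of nodes is much larger than the number of retained components.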

Remarks. We observe that the learned graph is dense, so all nodes in the graph are essentially used to learn each node embedding. From the experimental study presented in Section 3.2, we see that the adaptation function required to improve performance can be quite complex for heterophilic graphs. Therefore, GPR-GNN, which uses polynomial functions to learn the adaptation function, may not be able to adapt the graph effectively in some situations. On the other hand, our approach has the ability to learn more complex adaptation functions and is expected to perform better. It is worth noting that the necessity of negative edges has been highlighted in Bo et al. (2021) and Chien et al. (2021) using the graph filtering concept. Bo et al. (2021) use an attention mechanism to learn negative edges, while Chien et al. (2021) use a polynomial function with negative weights to obtain negative edges. In contrast, our approach to learning negative edges is quite different. It emerges from adapting eigengraphs, starting with a good initialization obtained from approximating the graph using eigendecomposition. Our experimental results show that the proposed approach outperforms both FAGCN and GPR-GNN on several heterophilic datasets.

4.2 Eigen-Eigen Network Model

It is often useful to reduce dimension of raw features using principal component analysis. Let

be the eigendecomposition of . Using this decomposition, we define parameterised node embedding for as: . Upon substituting node embedding for raw features in the EigenNetwork model (3), we get:


where . We refer (4) as Eigen-EigenNetwork as it involves eigenvectors of both and , and, involves learning weights for eigenvectors. Making use of the fact that and are diagonal, we rewrite (4) as:


where and denote element-wise product operation. and

denote vectors of diagonal entries. Model learning involves learning

using labeled data by optimizing cross-entropy loss function. In this paper, we are primarily interested in adapting the graph. Therefore, we optimize only in (5), keeping fixed to eigen values. We leave conducting an experimental study with optimization as a future work.
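Equations (4) and (5) did not survive in this copy, so the following is a hedged reading of the surrounding description rather than the paper's formulas. It assumes that the feature side uses the eigenvectors of the feature Gram matrix (features transposed times features), that both weight matrices are diagonal, and that the rewrite in (5) is the standard identity turning two diagonal multiplications into an element-wise product with an outer product. All names (`eigen_eigen_embedding`, `h`, `q`, `m`) are hypothetical.

```python
import numpy as np

def eigen_eigen_embedding(u, h, x, v, q):
    """Compute U diag(h) (U^T X V) diag(q) via an element-wise product.

    u : graph eigenvectors, shape (n, k)     h : graph-side weights, shape (k,)
    v : feature eigenvectors, shape (d, r)   q : feature-side weights, shape (r,)
    """
    m = u.T @ x @ v                    # fixed (k x r) matrix, computable once
    return u @ (np.outer(h, q) * m)    # diag(h) @ m @ diag(q) == outer(h, q) * m

rng = np.random.default_rng(2)
n, d, k, r = 7, 6, 4, 3
sym = rng.normal(size=(n, n))
sym = (sym + sym.T) / 2.0                          # stand-in for the graph matrix
_, u_full = np.linalg.eigh(sym)
u = u_full[:, -k:]                                 # k graph eigenvectors
x = rng.normal(size=(n, d))
sigma, v_full = np.linalg.eigh(x.T @ x)            # PCA directions of the features
v, q = v_full[:, -r:], sigma[-r:]                  # feature weights fixed to eigenvalues
h = rng.uniform(0.1, 1.0, size=k)                  # graph-side weights (learned in practice)

s_fast = eigen_eigen_embedding(u, h, x, v, q)
s_reference = u @ np.diag(h) @ u.T @ x @ v @ np.diag(q)
```

If this reading is right, only the vector of graph-side weights changes during training when the feature-side weights are held fixed, and the small projected matrix is reused across updates.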

4.3 Eigen-Concat Models

We suggest a simple alternate modeling approach that works quite well for heterophilic graphs. The motivation for this approach is two-fold. While conducting the empirical study discussed in Section 3.2, we found that eigenvectors of the graph alone can give good performance for heterophilic graphs, and that neighborhood aggregation using traditional methods only degrades the performance. Furthermore, difficulties arise when the graph and the node features are incompatible, in the sense that neighborhood aggregation degrades the performance because its underlying assumptions are violated. Though graph adaptation methods mitigate the effect of any violation, they still operate within the framework of improving neighborhood aggregation. Therefore, it may be difficult to improve beyond some limit under the neighborhood aggregation restriction. Also, it may only add more computational burden. In this context, we explored the approach of concatenating node features (or transformed features) with fixed or adapted eigenvectors of the graph, and learning a classifier model. Note that learning the adaptation function using only the graph is possible. Since the features are decoupled now, there is a significant reduction in computational cost. Therefore, the Eigen-Concat model is faster to train. We found this simple approach to be competitive on several heterophily benchmark datasets. Therefore, it is a simple but important baseline to include in any work that aims at improving performance on heterophilic graphs.
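The concatenation step can be sketched in a few lines. This is an illustrative reading, assuming that "adapted eigenvectors" means eigenvectors whose columns are rescaled by adapted eigenvalues; the function name `eigen_concat_features` and the random inputs are hypothetical, and the classifier that would consume the result is omitted.

```python
import numpy as np

def eigen_concat_features(x, u_k, h_values):
    """Concatenate node features with column-scaled eigenvectors: shape (n, d + k)."""
    return np.concatenate([x, u_k * h_values], axis=1)

rng = np.random.default_rng(3)
n, d, k = 6, 4, 2
x = rng.normal(size=(n, d))                       # node features
sym = rng.normal(size=(n, n))
sym = (sym + sym.T) / 2.0                         # stand-in for a symmetric graph matrix
_, u = np.linalg.eigh(sym)
u_k = u[:, -k:]                                   # k retained eigenvectors
h_values = np.array([0.5, 2.0])                   # fixed or learned per-eigenvector weights

concat = eigen_concat_features(x, u_k, h_values)
```

No product between the graph side and the feature side is needed here, which is where the reduction in training cost comes from.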

4.4 Model Training and Complexity

Let $p_i$ and $y_i$ denote the model-predicted probability vector and the 1-hot binary representation of the true class label of node $i$. We use the following standard cross-entropy based loss function:

$$\mathcal{L} = -\sum_{i \in \mathcal{D}} \sum_{c=1}^{C} y_{i,c} \log p_{i,c},$$

where $\mathcal{D}$ and $C$ denote the set of labeled examples and the number of classes respectively, and indices $i$ and $c$ index examples and classes respectively. The model-predicted probabilities are computed using the Softmax function on the classifier model scores.
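A minimal sketch of this loss, assuming the cross-entropy is summed (not averaged) over the labeled nodes and that the classifier produces one row of scores per node; the function names and the small stabilizing constant are illustrative choices.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(scores, labels, labeled_idx, num_classes):
    """Sum of -log p[i, true class] over the labeled nodes only."""
    probs = softmax(scores[labeled_idx])
    onehot = np.eye(num_classes)[labels[labeled_idx]]
    return float(-(onehot * np.log(probs + 1e-12)).sum())

# Four nodes, three classes, uniform scores: each labeled node contributes log(3).
scores = np.zeros((4, 3))
labels = np.array([0, 1, 2, 0])
labeled_idx = np.array([0, 2])          # only these nodes have known labels
loss = cross_entropy(scores, labels, labeled_idx, num_classes=3)
```

Unlabeled nodes never enter the loss; they only influence training through the graph-dependent embedding.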

Let and denote the number of eigen components used in our model. Assume that the graph adaptive function is parameterized with parameters for each of the component. Then, the number of model parameters needed in the EigenNetwork model (5) is . As explained earlier, we do not optimize in our graph adaptation experiments. Given , the cost of computing in (5) is . Likewise, given , the embedding computing cost per node is . Since (5) requires fresh computation whenever is updated, our method is computationally expensive compared to the GPR-GNN method.

5 Experiments

We validate our proposed models by comparing against several baselines and state-of-the-art heterophily graph networks on the node classification task. In Section 5.1 we describe the baseline models and the hyper-parameter tuning setup. In Section 5.2 we describe the implementation details of our proposed models. In Section 5.3, we present our main experimental results on heterophilic datasets. Although our key focus is on heterophilic datasets, we present results on homophilic datasets in Section 5.4. Finally, in Section 5.6, we present analysis and ablation studies.

5.1 Baselines

We list the methods in comparison along with the hyper-parameter ranges for each model. For all the models, we sweep the common hyper-parameters over the same ranges. Learning rate is swept over [0.001, 0.003, 0.005, 0.008, 0.01], dropout over [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8], weight decay over [1e-4, 5e-4, 1e-3, 5e-3, 1e-2, 5e-2, 1e-1], and hidden dimensions over [16, 32, 64]. For model-specific hyper-parameters, we tune over author-prescribed ranges. We use undirected graphs with symmetric normalization for all graph networks in comparison. For all models, test accuracy is reported for the configuration that achieves the highest validation accuracy. We report standard deviation wherever applicable.

LR and MLP: We trained a Logistic Regression classifier and a Multi-Layer Perceptron on the given node features. For MLP, we limit the number of hidden layers to one.

SGCN: SGCN (Wu et al., 2019) is a spectral method that models a low pass filter and uses a linear classifier. The number of layers in SGCN is treated as a hyper-parameter and swept over [1, 2].

SuperGAT: SuperGAT (Kim and Oh, 2021) is an improved graph attention model designed to also work with noisy graphs. SuperGAT employs a link-prediction based self-supervised task to learn attention on edges. As suggested by the authors, on datasets with homophily levels lower than 0.2 we use SuperGATSD. For other datasets, we use SuperGATMX. We rely on the authors' code for our experiments.

Geom-GCN: Geom-GCN (Pei et al., 2020) proposes a geometric aggregation scheme that can capture structural information of nodes in neighborhoods and also capture long-range dependencies. We quote author-reported numbers for Geom-GCN. We could not run Geom-GCN on other benchmark datasets because the required pre-processing function is not publicly available.

H2GCN: H2GCN (Zhu et al., 2020) proposes an architecture, specially for heterophilic settings, that incorporates three design choices: (i) ego- and neighbor-embedding separation, (ii) higher-order neighborhoods, and (iii) combination of intermediate representations. We quote author-reported numbers where available, and sweep over author-prescribed hyper-parameters for reporting results on the remaining datasets. We rely on the authors' code for our experiments.

FAGCN: FAGCN (Bo et al., 2021) adaptively aggregates different low-frequency and high-frequency signals from neighbors belonging to the same and different classes to learn better node representations. We rely on the authors' code for our experiments.

APPNP: APPNP (Klicpera et al., 2019) is an improved message propagation scheme derived from personalized PageRank. APPNP's addition of a probability of teleporting back to the root node permits it to use more propagation steps without oversmoothing. We use GPR-GNN's implementation of APPNP for our experiments.

GPR-GNN: GPR-GNN (Chien et al., 2021) adaptively learns weights to jointly optimize node representations and the level of information to be extracted from graph topology. We rely on the authors' code for our experiments.

5.2 Implementation Details

In this subsection, we present several important points that are useful for practical implementation of our proposed methods, along with other experiment-related details. The eigendecomposition approach is based on adaptation of eigengraphs constructed using eigen components. Following Kipf and Welling (2017), we use a symmetric normalized version ($\tilde{A}$) of the adjacency matrix with self-loops: $\tilde{A} = \tilde{D}^{-1/2} \bar{A} \tilde{D}^{-1/2}$, where $\bar{A} = A + I$, and $\tilde{D}_{ii} = \sum_j \bar{A}_{ij}$. We work with the eigen matrix $U$ and eigenvalues $\Lambda$ of $\tilde{A}$. From a practical viewpoint, it is difficult to work with all eigen components for two reasons: (a) the method becomes infeasible for large graphs and (b) in many applications, we do not need all eigen components, and a fairly small to moderate number of components is sufficient to get excellent performance (see Figure 3). Noting the relation between singular and eigen vectors/values of a symmetric matrix (Golub and Van Loan, 1996), we use the top-$k$ singular vectors/values of $\tilde{A}$ in all our experiments. $U$ and $\Lambda$ consist of the top-$k$ singular vectors and singular values respectively. We used

unless otherwise specified, where denotes the numbers of nodes. We provide additional details on our proposed models below.
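The preprocessing described above can be sketched with dense NumPy routines. This is an illustration for small graphs only, assuming the standard Kipf and Welling normalization; the helper names are hypothetical, the number of components used in the paper is not reproduced here, and a large graph would need a sparse truncated solver instead of a full SVD.

```python
import numpy as np

def sym_normalize(adj):
    """Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    a_bar = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_bar.sum(axis=1))
    return a_bar * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def top_k_components(a_hat, k):
    """Top-k left singular vectors and singular values (sorted in descending order)."""
    u, s, _ = np.linalg.svd(a_hat)
    return u[:, :k], s[:k]

# Path graph on five nodes.
path_adj = np.diag(np.ones(4), k=1)
path_adj = path_adj + path_adj.T
a_hat = sym_normalize(path_adj)
u_k, s_k = top_k_components(a_hat, k=3)
```

For a symmetric matrix the singular values are the absolute values of the eigenvalues, so the largest singular value of the normalized adjacency is 1 and the returned values are non-negative, which is convenient when they are later raised to learnable powers.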

EigenNetwork. Recall that EigenNetwork  uses only the graph, , to learn embedding and subsequently, the classifier model. Specifically, the score matrix is to be computed as: where is a learnable adaptation function. However, learning this model is quite expensive for large graphs. Therefore, we simplify our model by substitution: = and can learn directly for linear models. This helps to avoid expensive matrix multiplication with . Using the same idea, we simplify our nonlinear model and learn EigenNetwork  embedding as: that is fed to a non-linear network; here, denotes the number of eigen components. Note that this model uses only the topological/graph features and do not leverage the available node features.

Eigen-EigenNetwork. Eigen-EigenNetwork  is defined as where and denote element-wise product operation. Recall that where is the eigen matrix of . and denote vectors of diagonal entries. Since we are interested in graph adaptation primarily in this work, we used fixed (i.e., fixed low dimensional node features). The reduced dimensionality of node features was set to where is the dimension of features in . We next present details regarding choices of adaptation functions.

Adaptation Functions. The adaptation functions can take the form . We experimented with two different adaptation functions.

  1. . Each eigenvalue , is adapted by an individual scaling coefficient , but the exponentiation coefficient is shared across all the eigenvalues. lies in the range . Combined with suitable choice of , different parts of the eigen spectrum can be suppressed or enhanced, as needed for the supervised task.

  2. . Each eigenvalue , is adapted by individual scaling and exponentiation coefficients and respectively. is in the range . This function also has the ability to suppress and enhance different parts of the eigen spectrum. Unlike the above adaptation function, we learn individual exponentiation parameters for each eigenvalue which makes this adaptation function very powerful.

Note that even though the second adaptation function is powerful, as it learns an exponentiation coefficient for each eigenvalue, it may be overparameterized for some datasets. This can result in overfitting, especially in limited-label settings. In contrast, the first adaptation function learns a single global exponentiation coefficient, which may be insufficient for some datasets. Therefore, the two functions operate at two extremes. To get the best of both parameterizations, we propose RegEigen-EigenNetwork.

RegEigen-EigenNetwork. To reduce the number of learnable parameters, RegEigen-EigenNetwork partitions the eigenvalues into several contiguous bins and uses shared parameters for each of the bins. This partitioning is done for both the scaling and the exponentiation parameters discussed in the adaptation functions above. We treat the number of bins as a hyper-parameter and sweep it in the range [10% of the no. of nodes, 90% of the no. of nodes]. For RegEigen-EigenNetwork models, an additional weight regularization is applied, with its strength swept in the range [1e-3, 1e3] in logarithmic steps.

In our experiments, we feed embedding outputs of all our networks to a fully-connected non-linear network with a single hidden layer with ReLu activation and Softmax final layer. We observed that using a scaled helps to get improved performance for datasets like Chameleon and Squirrel (with a scaling value, ). We treated the two adaptation functions described above as hyperparameters.

All models use the Adam optimizer (Kingma and Ba, 2015). For our proposed models that involve learning, we set early stopping to 30 and the maximum number of epochs to 300. We utilize learning rate decay, with the decay factor set to 0.99 and the decay frequency set to 50. All our experiments were performed on a machine with an Intel Xeon 2.60 GHz processor, 112 GB RAM, an Nvidia Tesla P-100 GPU with 16 GB of memory, Python 3.6, and TensorFlow 1.15 (Abadi et al., 2015). We used Optuna (Akiba et al., 2019) to optimize the hyperparameter search.
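The training schedule can be sketched in plain Python. Two points are assumptions rather than statements from the text: the decay is taken to be a staircase applied every 50 steps, and "early stopping set to 30" is read as a patience of 30 epochs without validation improvement. The function names are hypothetical and the validation accuracies are made up for the example.

```python
def decayed_learning_rate(base_lr, step, decay_factor=0.99, decay_every=50):
    """Staircase decay: multiply by decay_factor once every decay_every steps."""
    return base_lr * decay_factor ** (step // decay_every)

def select_epoch(val_accuracies, patience=30, max_epochs=300):
    """Return (best_epoch, best_accuracy), stopping after `patience` epochs without gain."""
    best_epoch, best_acc = -1, float("-inf")
    for epoch, acc in enumerate(val_accuracies[:max_epochs]):
        if acc > best_acc:
            best_epoch, best_acc = epoch, acc
        elif epoch - best_epoch >= patience:
            break                       # no improvement for `patience` epochs
    return best_epoch, best_acc

lr_at_step_100 = decayed_learning_rate(0.01, step=100)

# Validation accuracy peaks at epoch 1 and never recovers afterwards.
history = [0.50, 0.60] + [0.55] * 100
chosen_epoch, chosen_acc = select_epoch(history)
```

The model checkpoint from the selected epoch is the one whose test accuracy would be reported, in line with the protocol of choosing the configuration with the highest validation accuracy.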

5.3 Experiments on Heterophilic Datasets

| Dataset | Texas | Wisconsin | Actor | Squirrel | Chameleon | Crocodile | Cornell |
|---|---|---|---|---|---|---|---|
| Homophily level | 0.11 | 0.21 | 0.22 | 0.22 | 0.23 | 0.26 | 0.30 |
| #Nodes | 183 | 251 | 7600 | 5201 | 2277 | 11631 | 183 |
| #Edges | 492 | 750 | 37256 | 222134 | 38328 | 191506 | 478 |
| #Features | 1703 | 1703 | 932 | 2089 | 500 | 500 | 1703 |
| #Classes | 5 | 5 | 5 | 5 | 5 | 6 | 5 |
| #Train/Val/Test | 87/59/37 | 120/80/51 | 3648/2432/1520 | 2496/1664/1041 | 1092/729/456 | 120/180/11331 | 87/59/37 |

Table 1: Datasets Statistics

Datasets. We evaluate on seven heterophilic datasets to show the effectiveness of our approach. Detailed statistics of the datasets used are provided in Table 1. We borrowed Texas, Cornell, and Wisconsin from WebKB, where nodes represent web pages and edges denote hyperlinks between them. Actor is a co-occurrence network borrowed from Tang et al. (2009), where each node corresponds to an actor and an edge represents co-occurrence on the same Wikipedia page. Chameleon, Squirrel, and Crocodile are borrowed from Rozemberczki et al. (2021). Nodes correspond to web pages and edges capture mutual links between pages. For all benchmark datasets, we use feature vectors and class labels from Kim and Oh (2021). For Texas, Wisconsin, Cornell, Chameleon, Squirrel, and Actor, we use 10 random splits (48%/32%/20% of nodes for train/validation/test set) from Pei et al. (2020). For Crocodile, we create 10 random splits following Kim and Oh (2021).
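The 48%/32%/20% split sizes in Table 1 can be reproduced with integer cut points at 48% and 80% of the shuffled node list. The sketch below is only an illustration that matches the reported counts; the actual splits are the ones distributed by Pei et al. (2020), and the function name, seeding, and purely random (non-stratified) shuffling are assumptions.

```python
import numpy as np

def random_split(num_nodes, seed):
    """Shuffle node ids and cut at 48% and 80% using integer arithmetic."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_nodes)
    train_end = num_nodes * 48 // 100
    val_end = num_nodes * 80 // 100
    return perm[:train_end], perm[train_end:val_end], perm[val_end:]

def split_sizes(num_nodes, seed=0):
    train, val, test = random_split(num_nodes, seed)
    return len(train), len(val), len(test)

texas_sizes = split_sizes(183)        # Texas and Cornell both have 183 nodes
chameleon_sizes = split_sizes(2277)
squirrel_sizes = split_sizes(5201)
```

Integer arithmetic is used for the cut points to avoid floating-point rounding at the boundaries. Crocodile follows a different protocol (120/180/11331) and is not covered by this sketch.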

| Model | Texas | Wisconsin | Actor | Squirrel | Chameleon | Crocodile | Cornell |
|---|---|---|---|---|---|---|---|
| LR | 81.35 (6.33) | 84.12 (4.25) | 34.70 (0.89) | 34.73 (1.39) | 48.25 (2.67) | 48.25 (2.67) | 83.24 (5.64) |
| MLP | 81.24 (6.35) | 84.43 (5.36) | 36.06 (1.11) | 35.38 (1.38) | 51.64 (1.89) | 54.47 (1.99) | 83.78 (5.80) |
| SGCN | 62.43 (4.43) | 55.69 (3.53) | 30.44 (0.91) | 45.72 (1.55) | 60.77 (2.11) | 51.54 (1.47) | 62.43 (4.90) |
| GCN | 61.62 (6.14) | 53.53 (4.73) | 30.32 (1.05) | 46.04 (1.61) | 61.43 (2.70) | 52.34 (2.61) | 62.97 (5.41) |
| SuperGAT | 61.08 (4.97) | 56.47 (3.90) | 29.32 (1.00) | 31.84 (1.26) | 43.22 (1.71) | 52.41 (1.92) | 57.30 (8.53) |
| Geom-GCN | 67.57* | 64.12* | 31.63* | 38.14* | 60.90* | NA | 60.81* |
| H2GCN | 84.86 (6.77)* | 86.67 (4.69)* | 35.86 (1.03)* | 37.90 (2.02)* | 58.40 (2.77) | 53.17 (1.21) | 82.16 (4.80)* |
| FAGCN | 82.43 (6.89) | 82.94 (7.95) | 34.87 (1.25) | 42.59 (0.79) | 55.22 (3.19) | 54.35 (1.05) | 79.19 (9.79) |
| APPNP | 81.89 (5.85) | 85.49 (4.45) | 35.93 (1.04) | 39.15 (1.88) | 47.79 (2.35) | 53.13 (1.93) | 81.89 (6.25) |
| GPR-GNN | 81.35 (5.32) | 82.55 (6.23) | 35.16 (0.90) | 46.31 (2.46) | 62.59 (2.04) | 52.71 (1.84) | 78.11 (6.55) |
| Eigen Network Models | | | | | | | |
| EigenNetwork | 58.92 (3.78) | 53.14 (4.84) | 25.37 (0.88) | 54.62 (1.5) | 67.28 (2.21) | 45.54 (2.08) | 57.30 (5.10) |
| Eigen-EigenNetwork | 82.70 (6.42) | 82.75 (4.79) | 35.04 (0.91) | 57.11 (1.94) | 65.79 (1.16) | 54.51 (1.93) | 77.30 (6.19) |
| RegEigen-EigenNetwork | 84.05 (5.76) | 89.80 (4.22) | 34.84 (0.53) | 57.61 (1.92) | 66.45 (2.77) | 55.03 (2.12) | 84.86 (4.80) |
| Eigen-ConcatNetwork | 78.11 (3.72) | 85.69 (4.81) | 34.75 (0.83) | 53.66 (1.76) | 66.51 (1.21) | 54.20 (1.61) | 80.81 (6.10) |

Table 2: Comparison of various Eigen Networks with Baselines. Entries are test accuracy with standard deviation in parentheses. The results marked with "*" are obtained from the corresponding paper.

We propose the following models. i) EigenNetwork: we observe from Table 2 that EigenNetwork models, which rely only on topological information, perform better than all baselines on the Squirrel and Chameleon datasets. This indicates that graph features by themselves are useful for a few datasets.

ii) Eigen-EigenNetwork: these models extend EigenNetwork models to incorporate aggregation. In our experiments, we restrict adaptation to only the topology by setting to 1. This also allows us to study the effect of graph adaptation. Eigen-EigenNetwork models are more powerful than EigenNetwork models mainly because they can also extract information from the feature space. Compared against the baselines, this approach achieves the best accuracy on Squirrel, Chameleon, and Crocodile (Table 2). The greatest gains are on the Squirrel and Chameleon datasets, with accuracy gains of up to 11%. We believe that the graph adaptation function is the reason for these gains, as it is able to highlight important signals in the topology that complement the aggregation. Empirically, this way of aggregating is effective for heterophilic datasets.

iii) RegEigen-EigenNetwork is a regularized version of Eigen-EigenNetwork, which gains further improvements by reducing the number of learnable parameters. Specifically, we observe that the RegEigen-EigenNetwork model outperforms Eigen-EigenNetwork on six of the seven datasets in Table 2. We empirically observe that grouping contiguous eigenvalues and learning shared coefficients provides a regularizing effect and improves the model's generalizability. It can also be inferred from Table 2 that, although several baselines were proposed to address heterophily in graphs, no single baseline consistently achieves good performance across the benchmark datasets.
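
The idea of grouping contiguous eigenvalues and sharing one coefficient per group can be sketched as follows. This is an illustrative assumption rather than our exact parameterization: the near-equal block sizes, the function names, and the example coefficients are hypothetical, and in practice the partition is tuned as a hyperparameter.

```python
def contiguous_groups(num_eigenvalues, num_groups):
    """Assign each eigenvalue index to one of num_groups contiguous, near-equal blocks."""
    base, extra = divmod(num_eigenvalues, num_groups)
    group_ids = []
    for g in range(num_groups):
        size = base + (1 if g < extra else 0)  # first `extra` blocks get one more
        group_ids.extend([g] * size)
    return group_ids


def tied_coefficients(group_ids, group_coeffs):
    """Expand one shared coefficient per group into one coefficient per eigenvalue."""
    return [group_coeffs[g] for g in group_ids]


# Six eigenvalues share three coefficients (hypothetical values),
# so only three parameters would be learned instead of six.
group_ids = contiguous_groups(num_eigenvalues=6, num_groups=3)
coeffs = tied_coefficients(group_ids, [1.0, 0.5, 2.0])
```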

iv) Eigen-ConcatNetwork: these models deviate from the popular aggregation scheme and offer an effective solution that leverages signals from both the given topology and the features. For instance, on Wisconsin and Chameleon, these models outperform non-graph based methods and several aggregation methods, including GCN, SGCN, SuperGAT, and even Geom-GCN. With respect to our proposed models, we make the global observation that graph adaptation improves performance on most datasets, with gains of up to 11%, as shown in Table 6.

5.4 Experiments on Homophilic Datasets

Our paper mainly focuses on heterophilic datasets. However, it is also important to understand the performance of our model on homophilic datasets. Towards that end, we ran experiments on some homophilic datasets. The datasets used and their corresponding statistics are available in Table 3. We borrowed Cora, Citeseer, and Pubmed datasets and the corresponding train/val/test set splits from Pei et al. (2020). The remaining datasets were borrowed from Kim and Oh (2021). We follow the same dataset setup mentioned in Kim and Oh (2021) to create 10 random splits for each of these datasets.

Statistics Flickr Cora-Full Wiki-CS Citeseer Pubmed Cora Computer Photos
Homophily Score 0.32 0.59 0.68 0.74 0.80 0.81 0.81 0.85
Number of Nodes 89250 19793 11701 3327 19717 2708 13752 7650
Number of Edges 989006 83214 302220 12431 108365 13264 259613 126731
Number of Features 500 500 300 3703 500 1433 767 745
Classes 7 70 10 6 3 7 10 8
#Train 44625 1395 580 1596 9463 1192 200 160
#Validation 22312 2049 1769 1065 6310 796 300 240
#Test 22313 16349 5847 666 3944 497 13252 7250
Table 3: Datasets Statistics

Performance on Homophilic vs. Heterophilic Datasets. The performance results of the various baselines and our approach are given in Table 4. We observe that on homophilic datasets, Eigen-EigenNetwork, which is an aggregation based model, tends to perform better than Eigen-ConcatNetwork models. This trend is expected: as the homophily level of the graph increases, the discord between the node features (X) and the topology (A) reduces, leading to improvements in aggregation methods. Baseline models including GPR-GNN, SuperGAT, FAGCN, and APPNP, which are also aggregation based methods, perform better on homophilic datasets. However, our proposed models are not far off.

Test Acc Cora-Full Wiki-CS Citeseer Pubmed Cora Computer Photos
LR 39.10 (0.43) 72.28 (0.59) 72.22 (1.54) 87.00 (0.40) 73.94 (2.47) 64.92 (2.59) 77.57 (2.29)
MLP 43.03 (0.82) 73.74 (0.71) 73.83 (1.73) 87.77 (0.27) 77.06 (2.16) 64.95 (3.57) 76.96 (2.46)
GCN 45.44 (1.01) 77.64 (0.49) 76.47 (1.33) 87.86 (0.47) 87.28 (1.34) 78.16 (1.85) 86.38 (1.71)
SGCN 61.31 (0.78) 78.30 (0.75) 76.77 (1.52) 88.48 (0.45) 86.96 (0.78) 80.65 (2.78) 89.99 (0.69)
SuperGAT 57.75 (0.97) 77.92 (0.82) 76.58 (1.59) 87.19 (0.50) 86.75 (1.24) 83.04 (1.02) 90.31 (1.22)
GeomGCN NA NA 77.99* 90.05* 85.27* NA NA
H2GCN 57.83 (1.47) OOM 77.07 (1.64)* 89.59 (0.33)* 87.81 (1.35)* OOM 91.17 (0.89)
FAGCN 60.07 (1.43) 79.23 (0.66) 76.80 (1.63) 89.04 (0.50) 88.21 (1.37) 82.16 (1.48) 90.91 (1.11)
GPR-GNN 61.37 (0.96) 79.68 (0.50) 76.84 (1.69) 89.08 (0.39) 87.77 (1.31) 82.38 (1.60) 91.43 (0.89)
APPNP 60.83 (0.55) 79.13 (0.50) 76.86 (1.51) 89.57 (0.53) 88.13 (1.53) 82.03 (2.04) 91.68 (0.62)
Our Models
EigenNetwork 43.93 (1.19) 61.75 (1.31) 65.53 (3.49) 81.07 (0.41) 80.12 (1.54) 64.17 (6.19) 74.79 (3.44)
Eigen-EigenNetwork 56.10 (1.03) 77.96 (0.53) 76.67 (1.83) 89.30 (0.42) 87.10 (1.10) 78.86 (1.86) 88.50 (0.92)
Eigen-ConcatNetwork 47.47 (0.92) 74.13 (0.87) 74.86 (1.90) 88.38 (0.15) 84.43 (1.77) 66.20 (3.33) 76.97 (2.63)
RegEigen-EigenNetwork 58.19 (0.62) 78.31 (0.69) 77.20 (1.36) 89.22 (0.43) 87.17 (1.18) 81.06 (1.80) 89.01 (1.05)
Table 4: Homophily Datasets Results. We bold the results of the best performing models for each dataset. We underline and italicize the best performing Eigen models for ease of comparison. "*" indicates that the results were borrowed from the corresponding papers.

5.5 Experiments on Large Dataset

We additionally performed one large-scale experiment on the Flickr dataset. We use the publicly available fixed split for this dataset. We show the results of all the models on Flickr in Table 5. We first note that SuperGAT gives the best performance among all baselines, followed closely by GCN. However, three of our models, EigenNetwork, Eigen-EigenNetwork, and RegEigen-EigenNetwork, outperform all the baselines. Amongst our models, EigenNetwork gives the best performance with 54.4% test accuracy. Another thing to note is that the Eigen-ConcatNetwork model is not far behind the other EigenNetwork models and does better than APPNP and SGCN. This suggests that concatenation based models can offer an effective alternative to aggregation based approaches.

Model Test Acc
LR 46.51
MLP 46.93
GCN 53.4
SGCN 50.75
SuperGAT 53.47
GeomGCN NA
H2GCN OOM
FAGCN OOM
GPR-GNN 52.74
APPNP 50.33
EigenNetwork 54.4
Eigen-EigenNetwork 53.78
Eigen-ConcatNetwork 51.78
RegEigen-EigenNetwork 53.83
Table 5: Results on Flickr Dataset. We report Test Accuracy on the standard split for all the models.

5.6 Analysis

Effect of Adaptation: We carry out an ablation study in Table 6 by freezing the original eigenvalues. We observe that adaptation helps on most of the datasets for all the proposed EigenNetwork variants. There is a significant improvement for Eigen-EigenNetwork over the frozen variant on datasets like Texas, Wisconsin, and Cornell. As observed in Section 3.2, there exist different weightings of the eigenvalues that are more useful for the supervised task. Additionally, in Figure 2, we plot the ratio of the adapted eigenvalues to the original eigenvalues of the graph. We see that different regions of the eigenvalue spectrum are given high weights, as seen in the Crocodile and Squirrel plots. It might be difficult for models like GPR-GNN to learn such behavior with a polynomial. Our proposed approaches are able to learn it, which is also reflected in the performance, where we gain 3% over GPR-GNN on Crocodile and up to 11% on Squirrel.
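
The quantity plotted in Figure 2 can be illustrated with a small NumPy sketch. It assumes a simple multiplicative adaptation of the top eigenvalues (selected here by magnitude) on a toy 4-node cycle graph; the scaling values and function names are hypothetical and stand in for the learned adaptation function. Setting all scales to 1 corresponds to the frozen (no adaptation) variant.

```python
import numpy as np


def top_k_eigenpairs(adj, k):
    """Eigenpairs of a symmetric matrix, keeping the k largest by magnitude (assumed choice)."""
    eigvals, eigvecs = np.linalg.eigh(adj)
    order = np.argsort(-np.abs(eigvals))[:k]
    return eigvals[order], eigvecs[:, order]


def adapted_graph(eigvals, eigvecs, scales):
    """Rescale eigenvalues and rebuild the graph operator U diag(lambda') U^T."""
    new_vals = scales * eigvals
    return new_vals, (eigvecs * new_vals) @ eigvecs.T


# Toy graph: a cycle on four nodes, with eigenvalues 2, 0, 0, -2.
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
eigvals, eigvecs = top_k_eigenpairs(adj, k=2)
scales = np.array([1.5, 0.5])  # hypothetical "learned" scales
new_vals, adj_adapted = adapted_graph(eigvals, eigvecs, scales)
ratio = new_vals / eigvals  # adapted-to-original eigenvalue ratio
```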

Figure 2: We plot the ratio of the tuned eigenvalues to the original eigenvalues of the graph. Panels: (a) Crocodile, (b) Chameleon, (c) Squirrel.

Model Texas Wisconsin Actor Squirrel Chameleon Crocodile Cornell
EigenNetwork w/o adaptation 56.76 (4.83) 50.78 (4.92) 25.24 (0.84) 53.67 (1.4) 65.61 (1.63) 44.96 (1.78) 55.14 (7.47)
EigenNetwork 58.92 (3.78) 53.14 (4.84) 25.37 (0.88) 54.62 (1.5) 67.28 (2.21) 45.54 (2.08) 57.30 (5.10)
Eigen-ConcatNetwork w/o adaptation 78.38 (5.92) 84.9 (5.12) 34.93 (0.63) 47.57 (1.75) 63.79 (2.14) 54.42 (1.62) 81.89 (5.41)
Eigen-ConcatNetwork 78.11 (3.72) 85.69 (4.81) 34.75 (0.83) 53.66 (1.76) 66.51 (1.21) 54.20 (1.61) 80.81 (6.10)
Eigen-EigenNetwork w/o adaptation 65.41 (6.71) 69.41 (4.04) 27.66 (0.78) 56.01 (1.48) 60.29 (2.54) 53.89 (0.96) 67.03 (4.49)
Eigen-EigenNetwork 82.70 (6.42) 82.75 (4.79) 35.04 (0.91) 57.11 (1.94) 65.79 (1.16) 54.51 (1.93) 77.30 (6.19)
Table 6: Effect of Eigenvalue Adaptation

Node and Graph Features: We conduct a study to answer the question: are both node and graph features helpful and needed? We experimented with three different models: a fully connected nonlinear network (MLP) with a single hidden layer whose dimension is swept over [16, 32, 64], EigenNetwork, and Eigen-ConcatNetwork. MLP uses only the node features, EigenNetwork uses only the graph/topological information, and Eigen-ConcatNetwork leverages both topological and node features. In Table 7, we observe across several datasets that the Eigen-ConcatNetwork model performs better than either of the individual models. We see about 4% improvement on the Cora-Full and Cora datasets over the best performing individual model (EigenNetwork). This highlights the fact that both sources are helpful and can be used to build simple and efficient Eigen-ConcatNetwork models.
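
A rough sketch of how node and graph features can be combined by concatenation is given below. It assumes that the graph features are the top eigenvectors scaled by their eigenvalues and that the joint matrix is then passed to a standard classifier; both choices, as well as the function name `concat_features`, are illustrative assumptions rather than a description of our exact model.

```python
import numpy as np


def concat_features(node_feats, adj, k):
    """Concatenate node features with k spectral (graph) features per node."""
    eigvals, eigvecs = np.linalg.eigh(adj)
    order = np.argsort(-np.abs(eigvals))[:k]  # top-k by magnitude (assumed)
    graph_feats = eigvecs[:, order] * eigvals[order]
    return np.concatenate([node_feats, graph_feats], axis=1)


# Toy example: a 3-node path graph with 2-dimensional node features.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
node_feats = np.array([[1.0, 0.0],
                       [0.0, 1.0],
                       [1.0, 1.0]])
joint = concat_features(node_feats, adj, k=2)  # shape: (3 nodes, 2 + 2 features)
```

A model that sees only `node_feats` plays the role of the MLP baseline, and one that sees only the spectral columns plays the role of EigenNetwork.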

Test Acc Cora-Full Wiki-CS Citeseer Pubmed Cora Computer Photos
MLP 43.03 (0.82) 73.74 (0.71) 73.83 (1.73) 87.77 (0.27) 77.06 (2.16) 64.95 (3.57) 76.96 (2.46)
EigenNetwork 43.93 (1.19) 61.75 (1.31) 65.53 (3.49) 81.07 (0.41) 80.12 (1.54) 64.17 (6.19) 74.79 (3.44)
Eigen-ConcatNetwork 47.47 (0.92) 74.13 (0.87) 74.86 (1.90) 88.38 (0.15) 84.43 (1.77) 66.20 (3.33) 76.97 (2.63)
Table 7: Complementary Information: Node and Graph Features

Using Node Features with Aggregation Models: Recall that the GPR-GNN propagation model uses a polynomial graph operator. We observe that the Order-0 monomial of this polynomial is independent of neighborhood aggregation, which is carried out by the higher powers of the adjacency matrix. This could be a very useful signal to include, in particular when the node features are of high quality and possess relatively strong discriminative power. Eigen-EigenNetwork models do not consume the node features directly, but these can easily be included. In Table 8, we report performance numbers on a few datasets, with and without explicitly augmenting Eigen-EigenNetwork with the Order-0 monomial. We see about 7.5% and 5% improvement on Actor and Pubmed, respectively. However, this improvement was not observed on several other datasets.
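
The role of the Order-0 monomial can be seen in a simplified polynomial propagation of the form sum_k gamma_k A^k X, following the generalized PageRank form of GPR-GNN (Chien et al., 2021). The sketch below applies the polynomial directly to the raw features X, which is a simplification, and the coefficients, toy graph, and function name are hypothetical.

```python
import numpy as np


def polynomial_propagate(adj, feats, gammas, include_order0=True):
    """Compute sum_k gammas[k] * adj^k @ feats, optionally dropping the k = 0 term."""
    out = np.zeros_like(feats, dtype=float)
    power_feats = feats.astype(float)  # adj^0 @ feats = feats
    for k, gamma in enumerate(gammas):
        if k > 0:
            power_feats = adj @ power_feats  # one more hop of aggregation
        if k == 0 and not include_order0:
            continue  # skip the term that uses the node features directly
        out += gamma * power_feats
    return out


adj = np.array([[0.0, 1.0],
                [1.0, 0.0]])
feats = np.array([[1.0, 0.0],
                  [0.0, 2.0]])
gammas = [0.5, 0.25, 0.125]  # hypothetical coefficients
with_order0 = polynomial_propagate(adj, feats, gammas, include_order0=True)
without_order0 = polynomial_propagate(adj, feats, gammas, include_order0=False)
```

The two outputs differ exactly by `gammas[0] * feats`, the only term that does not involve any neighborhood aggregation.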

Eigen-EigenNetwork Actor Pubmed
Without Order-0 Monomial 27.51 (0.91) 84.24 (0.50)
With Order-0 Monomial 35.04 (0.91) 89.30 (0.42)
Table 8: Effect of Order-0 Monomial on selected datasets.

Figure 3: Varying Eigen Components Plots. Panels: (a) Chameleon, (b) Squirrel, (c) Cora, (d) Cora-Full.

Are EigenNetwork models lacking on Homophilic Datasets? Neighborhood aggregation based methods work with the adjacency matrix $A$ directly, which means that they have access to the entire eigenvalue spectrum. For example, in GCN, when we multiply $A$ with the node features $X$, we get

$$AX = U \Lambda U^\top X = \sum_{i=1}^{n} \lambda_i u_i u_i^\top X,$$

where $\lambda_i$ and $u_i$ are the eigenvalues and eigenvectors of $A$ and $n$ is the number of nodes, thereby leveraging the entire eigenvalue spectrum. However, in our EigenNetwork models, we restrict the number of components to only the top few eigen components. We believe this may be a cause of the performance gap on homophilic datasets. To study this, we vary the number of eigen components and observe the test performance. The experimental results are given in Figure 3. On the heterophilic datasets, Chameleon and Squirrel, we do not see any clear trend in performance. On the homophilic datasets, Cora and Cora-Full, the performance does go up with the number of components, although it also drops occasionally. For example, in Cora-Full, the test accuracy dropped by 2% going from 2048 to 4096 components. This suggests that while the number of components may play a role in the lower performance on homophilic datasets, it may not be the only contributor. This requires further investigation, which we plan to make part of our future work.
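
The difference between full-spectrum aggregation and aggregation restricted to a few eigen components can be illustrated numerically. The sketch below uses a small random symmetric graph and random features, both hypothetical, selects components by eigenvalue magnitude (an assumption), and measures how far the truncated product is from the full product as the number of components grows.

```python
import numpy as np


def truncated_aggregate(adj, feats, k):
    """Aggregate features using only the k largest-magnitude eigen components."""
    eigvals, eigvecs = np.linalg.eigh(adj)
    order = np.argsort(-np.abs(eigvals))[:k]
    u, lam = eigvecs[:, order], eigvals[order]
    return u @ (lam[:, None] * (u.T @ feats))


rng = np.random.default_rng(0)
upper = np.triu(rng.integers(0, 2, size=(8, 8)), k=1)
adj = (upper + upper.T).astype(float)  # random undirected toy graph
feats = rng.normal(size=(8, 3))        # random toy node features
full = adj @ feats                     # uses the entire spectrum
errors = [np.linalg.norm(full - truncated_aggregate(adj, feats, k))
          for k in range(1, 9)]
```

The error shrinks as components are added and vanishes when all of them are used. This only shows that truncation discards part of the aggregation signal; it does not by itself explain the accuracy trends in Figure 3.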

Utility of weight-tying in the limited labeled data setting: To analyze the importance of this regularization, we study the effect of varying the training set size for Eigen-EigenNetwork and RegEigen-EigenNetwork and plot the results in Figure 4. On Squirrel, we observe that even with 20% labeled data, RegEigen-EigenNetwork performs as well as it does with 48% labeled data. On Chameleon, we observe something even more interesting. The base Eigen-EigenNetwork model does not improve much in performance even with more labeled data, whereas the RegEigen-EigenNetwork model continues to improve as the amount of labeled data increases. Weight-tying depends on finding good partitions of the eigenvalues, which are currently tuned as a hyperparameter. We believe that on Chameleon, with more labeled data, the model does better on the validation set and is thus able to find even better partitions of the eigenvalues, leading to improved performance.

Figure 4: Varying Label Plots. Panels: (a) Chameleon, (b) Squirrel.

Model Homophily Rank Heterophily Rank
LR 11.25 7.29
MLP 10.75 5.29
GCN 7.88 8.71
SGCN 6.13 8.57
SuperGAT 6.25 10.71
H2GCN 6.88 4.43
FAGCN 4.38 5.86
GPR-GNN 2.88 5.57
APPNP 3.50 5.86
EigenNetwork 10.75 9.00
Eigen-EigenNetwork 5.88 4.29
RegEigen-EigenNetwork 4.00 2.00
Table 9: Average Ranking of Models on Homophilic and Heterophilic datasets.

Model Comparison: Varying levels of Homophily: We group all the datasets with a homophily score below 0.50 and refer to them as heterophilic datasets. The remaining datasets are referred to as homophilic datasets. For all the models in comparison, we compute their rank on each dataset and report the average rank on homophilic and heterophilic datasets in Table 9. We used the heterophilic results (Table 2) and the homophilic results (Table 4) to compute the ranks.
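
The average-rank computation can be sketched as follows. The tie handling (a model's rank is one plus the number of models with strictly higher accuracy) and the function name are assumptions. The example uses only three models and the Squirrel and Chameleon columns of Table 2, so the resulting ranks are not the values reported in Table 9.

```python
def average_ranks(accuracy):
    """accuracy maps model name -> list of accuracies, one per dataset. Rank 1 is best."""
    models = list(accuracy)
    num_datasets = len(next(iter(accuracy.values())))
    totals = {m: 0.0 for m in models}
    for d in range(num_datasets):
        for m in models:
            # Rank = 1 + number of models that are strictly better on this dataset.
            better = sum(accuracy[o][d] > accuracy[m][d] for o in models)
            totals[m] += 1 + better
    return {m: totals[m] / num_datasets for m in models}


# Squirrel and Chameleon accuracies taken from Table 2.
table2_subset = {
    "GCN": [46.04, 61.43],
    "GPR-GNN": [46.31, 62.59],
    "RegEigen-EigenNetwork": [57.61, 66.45],
}
ranks = average_ranks(table2_subset)
```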

We further group the models for ease of comparison. We observe that the simple non-graph based models, linear and nonlinear networks that use only node features, perform reasonably well on heterophilic datasets. The second class of models, such as SGCN and GCN, belongs to the popular neighborhood aggregation based methods. We see that these models perform better on homophilic datasets, as expected. However, on heterophilic datasets, they perform poorly because the assumption that connected neighbors have the same class labels is violated.

The third group of models, including SuperGAT, GPR-GNN, FAGCN, and APPNP, was specifically designed to work across datasets with varying levels of homophily. We notice that SuperGAT performs poorly on heterophilic datasets. We believe that attention trained on an auxiliary task alone may not be sufficient to address heterophily. A detailed investigation is required to understand the gaps and is beyond the scope of this work. However, the other models in this group, GPR-GNN, FAGCN, and APPNP, have a better average ranking than common GNN methods like GCN and SGCN. For homophilic datasets, we observe that models like APPNP and GPR-GNN perform the best. The GPR-GNN paper reports that GPR-GNN performs better than APPNP on several datasets. However, APPNP has an edge over GPR-GNN on some datasets in our experiments. We believe this is due to the amount of labeled data and the different splits used in our experiments. Specifically, the GPR-GNN paper reports numbers using 60% training data for heterophilic datasets, while we follow Pei et al. (2020) and use 48% training data.

The last set in Table 9 corresponds to the proposed models. We observe that the EigenNetwork model does not perform well on several datasets. This is not surprising because EigenNetwork relies solely on topological features for solving the task at hand. However, the merits of this model can be observed on datasets like Squirrel and Chameleon. Therefore, EigenNetwork is still useful in some scenarios. Eigen-EigenNetwork makes use of node features with neighborhood aggregation and performs significantly better than EigenNetwork and other baselines. RegEigen-EigenNetwork offers the best performance on heterophilic datasets and is competitive on homophilic datasets.

6 Conclusion and Future Work

In this paper, we presented an eigendecomposition based approach and proposed the EigenNetwork models. These models are inspired by the GPR-GNN model (Chien et al., 2021), which we show can be interpreted as selecting/weighing the eigenvectors by scaling the corresponding eigenvalues. We propose a weight-tying based regularization that enables our model to avoid overfitting on the data and generalize better. We show that our models do well across all heterophilic datasets. We plan to study the optimization of variable in Eigen-EigenNetwork model and the behaviour of this model on homophilic datasets as part of our future work. In this paper, we also propose an alternative concatenation based model that is competitive with the aggregation based approach on heterophilic datasets. This model is simple and computationally cheaper. It raises the question of whether there are alternative ways to model Graph Neural Networks that work across varying homophily scores. We leave this for future work.


  • Abadi et al. (2015) Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015.
  • Abu-El-Haija et al. (2019a) Sami Abu-El-Haija, Bryan Perozzi, Amol Kapoor, Hrayr Harutyunyan, Nazanin Alipourfard, Kristina Lerman, Greg Ver Steeg, and Aram Galstyan. Mixhop: Higher-order graph convolution architectures via sparsified neighborhood mixing. In International Conference on Machine Learning (ICML), 2019a.
  • Abu-El-Haija et al. (2019b) Sami Abu-El-Haija, Bryan Perozzi, Amol Kapoor, and Joonseok Lee. N-gcn: Multi-scale graph convolution for semi-supervised node classification. In Conference on Uncertainty in Artificial Intelligence (UAI), 2019b.
  • Akiba et al. (2019) Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, and Masanori Koyama. Optuna: A next-generation hyperparameter optimization framework. ArXiv, abs/1907.10902, 2019.
  • Bach et al. (2017) Stephen H. Bach, Matthias Broecheler, Bert Huang, and Lise Getoor. Hinge-loss markov random fields and probabilistic soft logic. Journal of Machine Learning Research (JMLR), 2017.
  • Bo et al. (2021) Deyu Bo, X. Wang, Chuan Shi, and Hua-Wei Shen. Beyond low-frequency information in graph convolutional networks. In Association for the Advancement of Artificial Intelligence (AAAI), 2021.
  • Bruna et al. (2014) Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. Spectral networks and locally connected networks on graphs. In International Conference on Learning Representations (ICLR), 2014.
  • Chien et al. (2021) Eli Chien, Jianhao Peng, Pan Li, and Olgica Milenkovic. Adaptive universal generalized pagerank graph neural network. In International Conference on Learning Representations (ICLR), 2021.
  • Defferrard et al. (2016) Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Neural Information Processing Systems (NeurIPS), 2016.
  • Domingos and Lowd (2009) Pedro Domingos and Daniel Lowd. Markov Logic: An Interface Layer for Artificial Intelligence. Morgan & Claypool, 2009.
  • Getoor and Taskar (2007) Lise Getoor and Ben Taskar. Introduction to Statistical Relational Learning. MIT Press, 2007.
  • Golub and Van Loan (1996) Gene H. Golub and Charles F. Van Loan. Matrix Computations (3rd Ed.). Johns Hopkins University Press, USA, 1996. ISBN 0801854148.
  • Hamilton et al. (2017) William L. Hamilton, Rex Ying, and Jure Leskovec. Inductive representation learning on large graphs. In Neural Information Processing Systems (NeurIPS), 2017.
  • Kim and Oh (2021) Dongkwan Kim and Alice Oh. How to find your friendly neighborhood: Graph attention design with self-supervision. In International Conference on Learning Representations (ICLR), 2021.
  • Kingma and Ba (2015) Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR), 2015.
  • Kipf and Welling (2017) Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR), 2017.
  • Klicpera et al. (2019) Johannes Klicpera, Aleksandar Bojchevski, and Stephan Günnemann. Combining neural networks with personalized pagerank for classification on graphs. In International Conference on Learning Representations (ICLR), 2019.
  • Lecun et al. (1998) Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998. doi: 10.1109/5.726791.
  • Li et al. (2018) Qimai Li, Zhichao Han, and Xiao-Ming Wu. Deeper insights into graph convolutional networks for semi-supervised learning. In Association for the Advancement of Artificial Intelligence (AAAI), 2018.
  • McPherson et al. (2001) Miller McPherson, Lynn Smith-Lovin, and James M Cook. Birds of a feather: Homophily in social networks. Annual Review of Sociology, 2001.
  • Pei et al. (2020) Hongbin Pei, Bingzhe Wei, Kevin Chen-Chuan Chang, Yu Lei, and Bo Yang. Geom-gcn: Geometric graph convolutional networks. In International Conference on Learning Representations (ICLR), 2020.
  • Rozemberczki et al. (2021) Benedek Rozemberczki, Carl Allen, and Rik Sarkar. Multi-scale attributed node embedding. Journal of Complex Networks, 2021.
  • Tang et al. (2009) Jie Tang, Jimeng Sun, Chi Wang, and Zi Yang. Social influence analysis in large-scale networks. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2009.
  • Vaswani et al. (2017) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Neural Information Processing Systems (NeurIPS), 2017.
  • Veličković et al. (2018) Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph Attention Networks. In International Conference on Learning Representations (ICLR), 2018.
  • Wu et al. (2019) Felix Wu, Amauri Souza, Tianyi Zhang, Christopher Fifty, Tao Yu, and Kilian Weinberger. Simplifying graph convolutional networks. In International Conference on Machine Learning (ICML), 2019.
  • Zhu et al. (2020) Jiong Zhu, Yujun Yan, Lingxiao Zhao, Mark Heimann, Leman Akoglu, and Danai Koutra. Beyond homophily in graph neural networks: Current limitations and effective designs. In Neural Information Processing Systems (NeurIPS), 2020.
  • Zhu et al. (2021) Jiong Zhu, Ryan A. Rossi, Anup Rao, Tung Mai, Nedim Lipka, Nesreen K. Ahmed, and Danai Koutra. Graph Neural Networks with Heterophily. In Association for the Advancement of Artificial Intelligence (AAAI), 2021.