1 Introduction
Homophily (McPherson et al., 2001) is a principle in sociology suggesting that similarity breeds connection in real life. In the context of semi-supervised classification, this implies that nodes with similar labels are likely to be connected. Many real-world networks exhibit homophily; for example, people on a social network connect based on shared interests. Many real-world networks exhibit the opposite property as well, namely heterophily. For example, the Wikipedia page on homophily links to other pages from sociology as well as to pages from mathematics, graph theory, and statistics.
Graph Neural Networks (GNNs) (Kipf and Welling, 2017; Hamilton et al., 2017; Veličković et al., 2018) leverage network information along with node features to improve semi-supervised classification performance. GNNs depend primarily on network homophily to deliver this improvement; on heterophilic networks, their performance can degrade significantly. Several approaches have been proposed in the literature to mitigate this degradation in the presence of heterophily. Pei et al. (2020) aggregate both over the graph neighbourhood and over neighbours in the latent space. However, the neighbours still influence the self-embedding of the central node and can introduce noise. Zhu et al. (2020) keep the self-embedding separate from the neighbour embeddings during aggregation, and incorporate higher-order neighbour embeddings in the same way. Kim and Oh (2021) proposed several simple attention models trained on an additional auxiliary task. Bo et al. (2021) propose to learn an attention mechanism that captures the proportion of low-frequency and high-frequency signals per edge. Chien et al. (2021) propose an adaptive polynomial filter that learns which low-frequency or high-frequency signals are helpful for the task. There have also been approaches involving a label-label compatibility matrix (Zhu et al., 2021).

For homophilic networks, existing models (Klicpera et al., 2019; Chien et al., 2021) already perform excellently. Our interest lies in heterophilic networks. We mainly focus on the class of methods that modify or adapt the graph to obtain better performance (Kim and Oh, 2021; Bo et al., 2021; Chien et al., 2021) on heterophilic networks. The more recent of these approaches adjust the eigenvalues of the graph to learn improved representations. Another interpretation of this adaptation is that it suppresses some eigenvectors while accentuating others. We use this insight and propose simple yet effective methods of weighting the eigenvectors to improve task performance. We make the following contributions in this work:

We present a simple eigendecomposition-based approach and propose EigenNetwork models to learn flexible graph adaptation functions. We show that our models achieve significantly improved performance when the graphs are heterophilic.

While most GNNs are aggregation-based models, we propose a simple and efficient concatenation model that is quite competitive with neighborhood aggregation models on several datasets.

Finally, we propose a weight-tying based regularization that learns better adaptation functions and avoids possible overfitting when learning from limited data.

We conduct extensive experiments, and our approach achieves up to 11% improvement in performance over state-of-the-art methods on heterophilic graphs.
In the following sections, we discuss related work (Section 2), motivate our work via the recently proposed GPR-GNN model (Chien et al., 2021) in Section 3, and give details of our proposed approach in Section 4. Finally, we present our experimental results, ablative studies, and conclusion in Sections 5 and 6.
2 Related Works
In recent times, Graph Neural Networks (GNNs) have become an increasingly popular method for semi-supervised classification on graphs. Bruna et al. (2014) set the stage for early GNN models, followed by various modifications (Defferrard et al., 2016; Kipf and Welling, 2017). GCN (Kipf and Welling, 2017) provided the fastest and simplest variant, where the convolution operation reduces to aggregating features over the neighbourhood. Improving the aggregation mechanism (Hamilton et al., 2017; Veličković et al., 2018) and incorporating random-walk information (Abu-El-Haija et al., 2019a,b; Li et al., 2018) gave further improvements, but these models still suffered from over-smoothing. To circumvent this problem, APPNP (Klicpera et al., 2019) proposed an approach derived from personalized PageRank.
Most of the development in GNNs was for homophilic graphs, and these models performed poorly in the heterophily setting. One of the early works to address heterophily in GNNs was Geom-GCN (Pei et al., 2020). They identified two key weaknesses of GNNs in the context of heterophily. First, since the aggregation over the neighbourhood is permutation-invariant, it is difficult to identify which neighbours contribute positively and which negatively to the final performance. Second, long-range information is difficult to aggregate. To mitigate these issues, they proposed aggregating over two sets of neighbourhoods: one from the graph and the other inferred in the latent space. H_{2}GCN (Zhu et al., 2020) proposed to separate self-embeddings from neighbour embeddings. To avoid mixing of information, they concatenate self-embeddings and neighbour embeddings instead of aggregating them. Higher-order neighbourhood embeddings are combined similarly to capture long-range information.
Recent approaches address these weaknesses by adapting the graph itself. SuperGAT (Kim and Oh, 2021) proposed several simple attention models trained on the classification task and an additional auxiliary task. They suggest that these attention models can improve performance across graphs with varying homophily scores. FAGCN (Bo et al., 2021) uses an attention mechanism and learns the weight of an edge as the difference in the proportion of low-frequency and high-frequency signals. They empirically show that negative edge weights identify edges that connect nodes with different labels. GPR-GNN (Chien et al., 2021) takes the idea proposed in APPNP and generalizes the PageRank model so that it works well for graphs with varying homophily scores. Our proposed approach is closely related to these methods that adapt the graph for the task at hand. In particular, we take inspiration from GPR-GNN. We show that GPR-GNN effectively adapts the eigenvalues of the graph for the desired task via a polynomial with learnable coefficients (Section 4). We propose to replace this polynomial with a graph adaptation function, which allows our eigengraph-based network model (EigenNetwork) to learn sharp changes in the importance of eigenvectors. This enables our model to give better performance on several datasets. Additionally, we observe that on some datasets, instead of aggregating features over the graph, we can get competitive performance with simpler models that concatenate our adapted graph with the features.
In the following section, we give a brief overview of the problem setting and the GPR-GNN model, focusing on the key elements and ideas that motivate our work.
3 Problem Setup and Motivation
We focus on the problem of semi-supervised node classification on a simple graph G = (V, E), where V is the set of vertices and E is the set of edges. Let A denote the adjacency matrix associated with G, and let n be the number of nodes. Let Y be the set of all possible class labels, and let X be the n x d feature matrix for all the nodes in the graph. Given a training set of nodes whose labels are known, along with A and X, our goal is to predict the labels of the remaining nodes. The proportion of edges that connect two nodes with the same label is called the homophily score of the graph. In this work, we are particularly concerned with graphs that exhibit low homophily scores. In the next subsection, we provide background on the GPR-GNN model and its approach to graph adaptation.
3.1 GPR-GNN Model
The GPR-GNN (Chien et al., 2021) model consists of two core components: (a) a nonlinear network f_θ(·) that transforms the raw feature input X as H^(0) = f_θ(X), and (b) a generalized PageRank (GPR) component, Z = Σ_{k=0}^{K} γ_k H^(k), that aggregates the transformed output recursively as H^(k) = Ã H^(k-1). Notice that there is no nonlinear operation involved after each aggregation step over Ã. Therefore, the functionality of the GPR component can be written using an operator defined as Φ(Ã, γ) = Σ_{k=0}^{K} γ_k Ã^k, and we obtain the aggregated node embedding by applying Φ on the nonlinear network output: Z = Φ(Ã, γ) f_θ(X). Using the eigendecomposition Ã = U Λ U^T, Chien et al. (2021) presented an interpretation that the GPR component essentially performs a graph filtering operation, Φ(Ã, γ) = U g_γ(Λ) U^T, where g_γ(λ) = Σ_{k=0}^{K} γ_k λ^k is a polynomial graph filter applied elementwise and λ is an eigenvalue of Ã. As explained in Chien et al. (2021), learning the filter coefficients (i.e., γ_k) helps to get improved performance. Since the coefficients can take negative values, the GPR-GNN model is able to capture high-frequency components of the graph signals, enabling it to achieve improved performance on heterophilic graphs. In the following subsection, we analyze the importance of eigenvectors in heterophilic graphs.
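The filtering interpretation above can be checked numerically: applying a polynomial to a symmetric matrix is the same as filtering its eigenvalues elementwise. A minimal sketch (illustrative shapes and coefficients, not the authors' code):

```python
import numpy as np

# Verify: sum_k gamma_k A_hat^k == U diag(g(lambda)) U^T
# with g(x) = sum_k gamma_k x^k, for a symmetric A_hat.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
A_hat = (A + A.T) / 2                      # any symmetric "adjacency"
gamma = np.array([0.4, 0.3, 0.2, 0.1])     # illustrative GPR coefficients

# Left side: explicit matrix polynomial.
poly = sum(g * np.linalg.matrix_power(A_hat, k) for k, g in enumerate(gamma))

# Right side: eigendecomposition + elementwise polynomial on eigenvalues.
lam, U = np.linalg.eigh(A_hat)
filtered = U @ np.diag(np.polyval(gamma[::-1], lam)) @ U.T

assert np.allclose(poly, filtered)
```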
3.2 Eigenvector Analysis
We conduct the following experiment to study empirically why adapting eigenvalues is important in heterophilic settings. We consider the eigendecomposition of the given graph, Ã = U Λ U^T, and assign the node features as X = U; the columns of this feature matrix are essentially the eigenvectors. We construct new training data with these features using the known labels from the training set and train a logistic regression model on it. Figure 1 plots the per-class (indicated by different colors) weights learnt by the model. The x-axis is the index of the eigenvalues sorted in descending order; the y-axis denotes the weight assigned to the corresponding eigenvector. These weights give the importance of the eigenvectors, and the plots show that useful signal is spread across the spectrum. For instance, the Chameleon and Crocodile datasets exhibit a dumbbell-like distribution: both low-frequency and high-frequency components have high (absolute) weights, suggesting that selecting/weighting eigenvectors is key. Correctly selecting/weighting them will give good performance on heterophilic graphs. In the next section, we give our proposed approach for adapting the graph based on the analysis presented here.

4 Proposed Approach: Eigen Network Models
In this section, we present an alternate interpretation of the GPR-GNN model and suggest a simple eigendecomposition-based graph adaptation approach. We present several model variants, each motivated by a different aspect of the problem.
4.1 Eigen Network Model
We start by closely observing the GPR component output:

Z = Φ(Ã, γ) f_θ(X) = Σ_{k=0}^{K} γ_k Ã^k f_θ(X).   (1)

Our first observation is that learning the filter coefficients γ is equivalent to learning a new graph, which depends on the fixed set of eigenvectors and eigenvalues but is parameterised by γ. Therefore, the GPR-GNN model may be interpreted as adapting the original adjacency matrix Ã. Next, as noted in the previous section, using the structure present in the eigendecomposition and the polynomial function, we can expand (1) by unrolling over eigenvalues and interchanging the summations:

Z = Σ_{i=1}^{n} g_γ(λ_i) u_i u_i^T f_θ(X),   (2)

where u_i is the eigenvector corresponding to eigenvalue λ_i. Since g_γ(λ_i) depends only on λ_i, our proposal is to replace the polynomial function with a general smooth function h_α(·), which need not be a polynomial; we call h_α the graph adaptation function. We discuss several choices of such functions shortly. We rewrite (2) in matrix form, after substituting the graph adaptation matrix h_α(Λ) and leaving out the input embedding f_θ(X), as:

Z = U h_α(Λ) U^T X.   (3)

We refer to (3) as an EigenNetwork, as it involves eigenvectors and a learnable graph adaptation function of the eigenvalues. This network forms the basis of our eigendecomposition-based modeling approach. Note that (3) is essentially a single-layer network.
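A minimal sketch of the EigenNetwork forward pass: the polynomial filter is replaced by a free per-eigenvalue weighting h_α(λ). The shapes, the choice of tanh as h_α, and the component selection below are illustrative, not the authors' implementation:

```python
import numpy as np

def eigen_network_forward(U, lam, X, h_alpha):
    """U: (n, m) top eigenvectors, lam: (m,) eigenvalues, X: (n, d) features.
    Computes U diag(h_alpha(lam)) U^T X without forming the dense n x n graph."""
    return U @ (h_alpha(lam)[:, None] * (U.T @ X))

rng = np.random.default_rng(0)
n, m, d = 8, 4, 3
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
lam, U = np.linalg.eigh(A)
idx = np.argsort(-np.abs(lam))[:m]                 # keep m leading components
U, lam = U[:, idx], lam[idx]
Z = eigen_network_forward(U, lam, rng.standard_normal((n, d)), np.tanh)
assert Z.shape == (n, d)
```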
Choice of Graph Adaptation Function. We use a graph adaptation function that is a nonnegative function of the eigenvalue. One general form is h_α(λ) = σ_s(α_s) λ^{σ_e(α_e)}, where the subscripts differentiate the scaling and exponent activation functions σ_s and σ_e. We find that suitable activations are useful for both scaling and exponentiation, and some are useful for scaling alone. Composing two functions in this way is quite flexible and helps to adapt to graphs having diverse eigenvalue decay rates. Note that we can recover the original eigenvalues for suitable choices of α_s and α_e. There may be other choices of graph adaptation functions that perform better; from a practical viewpoint, several functions can be evaluated and the best one selected using a traditional hyperparameter optimization strategy.
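One possible instantiation of such a function is sketched below. The specific activation choices (softplus for scaling, sigmoid to bound the exponent) and the use of |λ| to keep fractional powers well-defined are our illustrative assumptions, not necessarily the paper's exact choices:

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def h_alpha(lam, a_scale, a_exp):
    # h(lambda) = sigma_s(a_scale) * |lambda| ** sigma_e(a_exp)
    # Nonnegative by construction; |lambda| handles negative eigenvalues.
    return softplus(a_scale) * np.abs(lam) ** sigmoid(a_exp)

lam = np.linspace(-1.0, 1.0, 5)     # eigenvalues of a normalized adjacency
out = h_alpha(lam, a_scale=0.0, a_exp=0.0)
assert np.all(out >= 0)
```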
Regularization of Graph Adaptation Function. While it is possible to use a separate set of parameters for each eigenvalue, the number of model parameters can then grow significantly with the number of eigenvectors. We can mitigate this problem, for example, by sharing the same parameter across several eigenvalues. Weight tying is a popular regularization mechanism used in CNNs (Lecun et al., 1998), statistical relational learning (Getoor and Taskar, 2007), Markov logic networks (Domingos and Lowd, 2009), probabilistic soft logic (Bach et al., 2017), Transformers (Vaswani et al., 2017), etc. In Figure 1, we observe that nearby eigenvalues tend to have similar importance. Based on this observation, we divide the sorted set of eigenvalues into fixed-length bins and assign one variable to each bin. We call this model RegEigenEigenNetwork. It reduces the number of learnable parameters in the model and offers a simple but effective regularization.

Computational Aspect.
One main difficulty arises when the graph adaptation parameters are jointly learned with the nonlinear feature transformation and classifier model weights: computing the node embedding using (3) to transform the input features involves dense matrix multiplications and is expensive. We can reduce the computational cost in several ways. (1) We use the raw input features X, which allows us to precompute the projected features U^T X; another possibility is to use a pre-trained feature embedding in place of X. (2) It also helps to reduce the dimension of the input features whenever the dimension of the raw features or pre-trained embedding is high; we provide more details in Section 4.2. (3) Since we are adapting the eigenvalues, we may not need a large number of eigenvectors to get good performance, as validated in our experiments.

Remarks. We observe that the learned graph is U h_α(Λ) U^T, which is dense, so all nodes in the graph are essentially used to learn the node embeddings. From the experimental study presented in Section 3.2, we see that the adaptation function required to improve performance can be quite complex for heterophilic graphs. Therefore, GPR-GNN, which uses polynomial functions to learn the adaptation function, may not be able to adapt the graph effectively in some situations. Our approach, on the other hand, can learn more complex adaptation functions and is expected to perform better. It is worth noting that the necessity of negative edges has been highlighted in Bo et al. (2021) and Chien et al. (2021) using the graph filtering concept. Bo et al. (2021) use an attention mechanism to learn negative edges, while Chien et al. (2021) use a polynomial function with negative weights to obtain them. In contrast, our approach to learning negative edges is quite different: it emerges from adapting the eigengraphs (u_i u_i^T), starting with a good initialization obtained from approximating the graph using its eigendecomposition. Our experimental results show that the proposed approach is highly effective and outperforms both FAGCN and GPR-GNN on several heterophilic datasets.
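The precomputation trick from the computational discussion can be sketched as follows: with fixed raw features X, the projection U^T X is computed once, so each update of the adaptation parameters only costs an (n x m) times (m x d) product. Shapes and names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, d = 200, 16, 8
U = np.linalg.qr(rng.standard_normal((n, m)))[0]   # stand-in orthonormal basis
lam = rng.uniform(-1, 1, m)                        # stand-in eigenvalues
X = rng.standard_normal((n, d))

X_tilde = U.T @ X                                  # precomputed once, (m, d)

def embed(h_of_lam):
    # Cheap per training step: U diag(h) (U^T X) without touching X again.
    return U @ (h_of_lam[:, None] * X_tilde)

Z = embed(np.tanh(lam))
assert np.allclose(Z, U @ np.diag(np.tanh(lam)) @ U.T @ X)
```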
4.2 EigenEigen Network Model
It is often useful to reduce the dimension of the raw features using principal component analysis. Let X X^T = V Σ V^T be the eigendecomposition of X X^T. Using this decomposition, we define a parameterised node embedding for X as V g_β(Σ). Substituting this node embedding for the raw features in the EigenNetwork model (3), we get:

Z = U h_α(Λ) U^T V g_β(Σ).   (4)

We refer to (4) as an EigenEigenNetwork, as it involves eigenvectors of both Ã and X X^T and learns weights for the eigenvectors. Making use of the fact that h_α(Λ) and g_β(Σ) are diagonal, we rewrite (4) as:

Z = U ((h_α(λ) g_β(σ)^T) ⊙ (U^T V)),   (5)

where ⊙ denotes the elementwise product operation, and λ and σ denote the vectors of diagonal entries of Λ and Σ respectively. Model learning involves optimizing the adaptation parameters and classifier weights using labeled data with the cross-entropy loss. In this paper, we are primarily interested in adapting the graph; therefore, we optimize only α in (5), keeping β fixed to the eigenvalues. We leave an experimental study with β optimization as future work.

4.3 EigenConcat Models
We suggest a simple alternative modeling approach that works quite well for heterophilic graphs. The motivation is twofold. First, while conducting the empirical study discussed in Section 3.2, we found that the eigenvectors of Ã alone can give good performance on heterophilic graphs, and neighborhood aggregation using traditional methods only degrades it. Furthermore, difficulties arise when Ã and X are incompatible, in the sense that neighborhood aggregation degrades performance because its underlying assumptions are violated. Though graph adaptation methods mitigate the effect of any violation, they still operate within the realm of improving neighborhood aggregation; it may therefore be difficult to improve beyond some limit under the neighborhood aggregation restriction, and doing so may only add computational burden. In this context, we explore concatenating the node features X (or transformed features f_θ(X)) with fixed or adapted eigenvectors of Ã, and learning a classifier model on the result. Note that the adaptation function can still be learned using Ã alone. Since the graph and the features are now decoupled, there is a significant reduction in computational cost, and the EigenConcat model is faster to train. We found this simple approach to be competitive on several heterophilic benchmark datasets. It is therefore a simple but important baseline to include in any work that aims at improving performance on heterophilic graphs.
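The concatenation idea can be sketched in a few lines: instead of aggregating X over the graph, append the leading eigenvectors of the (optionally adapted) graph to the feature matrix and feed the result to an ordinary classifier. All names and shapes below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, d = 100, 16, 8
A = rng.random((n, n)); A = ((A + A.T) > 1.0).astype(float)  # toy symmetric graph
X = rng.standard_normal((n, d))                              # toy node features

lam, U = np.linalg.eigh(A)
idx = np.argsort(-np.abs(lam))[:m]                           # m leading eigenvectors

# EigenConcat input: features and topology side by side, no aggregation.
features = np.concatenate([X, U[:, idx]], axis=1)            # (n, d + m)
assert features.shape == (n, d + m)
```

Any off-the-shelf classifier (e.g. logistic regression or an MLP) can then be trained on `features`.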
4.4 Model Training and Complexity
Let p_i and y_i denote the model-predicted probability vector and the one-hot binary representation of the true class label for example i. We use the standard cross-entropy loss:

L = − Σ_{i∈S} Σ_{c=1}^{C} y_{i,c} log p_{i,c},

where S and C denote the set of labeled examples and the number of classes respectively, and the indices i and c run over examples and classes respectively. The model-predicted probabilities are computed by applying the softmax function to the classifier model scores.
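The loss above can be sketched directly (numerically stabilized softmax; toy scores and labels for illustration):

```python
import numpy as np

def softmax(scores):
    # Subtract the row max for numerical stability before exponentiating.
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(scores, y_onehot):
    # Mean negative log-likelihood over the labeled set.
    p = softmax(scores)
    return -np.mean(np.sum(y_onehot * np.log(p + 1e-12), axis=1))

scores = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
y = np.array([[1, 0, 0], [0, 0, 1]])
loss = cross_entropy(scores, y)
assert loss > 0
```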
Let m and q denote the numbers of eigen components of Ã and X X^T used in our model, and assume that the graph adaptation function is parameterized with a constant number of parameters per component. Then the number of model parameters in the EigenNetwork model (5) grows linearly in m and q. As explained earlier, we do not optimize β in our graph adaptation experiments. Given α, the cost of computing the adapted graph term in (5) is linear in the number of retained components, and the embedding computation cost per node scales similarly. Since (5) requires fresh computation whenever α is updated, our method is computationally more expensive than the GPR-GNN method.
5 Experiments
We validate our proposed models by comparing against several baselines and state-of-the-art heterophily graph networks on the node classification task. In Section 5.1 we describe the baseline models and the hyperparameter tuning setup. In Section 5.2 we describe our proposed models' implementation details. In Section 5.3, we present our main experimental results on heterophilic datasets. Although our key focus is on heterophilic datasets, we also present results on homophilic datasets in Section 5.4. Finally, in Section 5.6, we present analysis and ablative studies.
5.1 Baselines
We list the methods in comparison along with the hyperparameter ranges for each model. For all models, we sweep the common hyperparameters over the same ranges: learning rate over [0.001, 0.003, 0.005, 0.008, 0.01], dropout over [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8], weight decay over [1e-4, 5e-4, 1e-3, 5e-3, 1e-2, 5e-2, 1e-1], and hidden dimension over [16, 32, 64]. For model-specific hyperparameters, we tune over the author-prescribed ranges. We use undirected graphs with symmetric normalization for all graph networks in comparison. For all models, test accuracy is reported for the configuration that achieves the highest validation accuracy. We report standard deviation wherever applicable.
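The shared sweep above amounts to an exhaustive grid over the four common hyperparameters. A minimal sketch of enumerating that grid (the training loop itself is a placeholder and not shown):

```python
import itertools

# Common hyperparameter ranges as described in the text.
grid = {
    "lr": [0.001, 0.003, 0.005, 0.008, 0.01],
    "dropout": [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
    "weight_decay": [1e-4, 5e-4, 1e-3, 5e-3, 1e-2, 5e-2, 1e-1],
    "hidden": [16, 32, 64],
}

# One dict per configuration; the best one is picked by validation accuracy.
configs = [dict(zip(grid, vals)) for vals in itertools.product(*grid.values())]
assert len(configs) == 5 * 7 * 7 * 3   # 735 shared configurations
```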
LR and MLP:
We trained a logistic regression classifier and a multi-layer perceptron on the given node features. For the MLP, we limit the number of hidden layers to one.
SGCN: SGCN (Wu et al., 2019) is a spectral method that models a low-pass filter and uses a linear classifier. The number of layers in SGCN is treated as a hyperparameter and swept over [1, 2].
SuperGAT: SuperGAT (Kim and Oh, 2021) is an improved graph attention model designed to also work with noisy graphs. SuperGAT employs a link-prediction based self-supervised task to learn attention on edges. As suggested by the authors, on datasets with homophily levels lower than 0.2 we use SuperGAT_{SD}; for the other datasets, we use SuperGAT_{MX}. We rely on the authors' code (https://github.com/dongkwankim/SuperGAT) for our experiments.
Geom-GCN: Geom-GCN (Pei et al., 2020) proposes a geometric aggregation scheme that can capture the structural information of nodes in neighborhoods and also capture long-range dependencies. We quote author-reported numbers for Geom-GCN. We could not run Geom-GCN on the other benchmark datasets because a required preprocessing function is not publicly available.
H_{2}GCN: H_{2}GCN (Zhu et al., 2020) proposes an architecture, specifically for heterophilic settings, that incorporates three design choices: i) ego- and neighbor-embedding separation, ii) higher-order neighborhoods, and iii) combining intermediate representations. We quote author-reported numbers where available, and sweep over author-prescribed hyperparameters for the remaining datasets. We rely on the authors' code (https://github.com/GemsLab/H2GCN) for our experiments.
FAGCN: FAGCN (Bo et al., 2021) adaptively aggregates different low-frequency and high-frequency signals from neighbors belonging to the same and different classes to learn better node representations. We rely on the authors' code (https://github.com/bdy9527/FAGCN) for our experiments.
APPNP: APPNP (Klicpera et al., 2019) is an improved message propagation scheme derived from personalized PageRank. APPNP's probability of teleporting back to the root node permits it to use more propagation steps without over-smoothing. We use GPR-GNN's implementation of APPNP for our experiments.
GPR-GNN: GPR-GNN (Chien et al., 2021) adaptively learns weights to jointly optimize node representations and the level of information to be extracted from the graph topology. We rely on the authors' code (https://github.com/jianhao2016/GPRGNN) for our experiments.
5.2 Implementation Details
In this subsection, we present several points that are important for the practical implementation of our proposed methods, along with other experimental details. Our eigendecomposition approach is based on adapting eigen graphs constructed from eigen components. Following Kipf and Welling (2017), we use a symmetrically normalized version of the adjacency matrix with self-loops: Ã = D̂^{-1/2} Â D̂^{-1/2}, where Â = A + I and D̂ is the diagonal degree matrix of Â. We work with the eigenvector matrix and eigenvalues of Ã. From a practical viewpoint, it is difficult to work with all eigen components for two reasons: (a) the method becomes infeasible for large graphs, and (b) in many applications we do not need all eigen components; a fairly small to moderate number of components is sufficient to get excellent performance (see Figure 3). Noting the relation between singular and eigen vectors/values of a symmetric matrix (Golub and Van Loan, 1996), we use the top singular vectors/values of Ã in all our experiments; U and Λ then consist of the top singular vectors and singular values respectively. We set the number of retained components relative to n, the number of nodes, unless otherwise specified. We provide additional details on our proposed models below.

EigenNetwork. Recall that EigenNetwork uses only the graph Ã to learn the embedding and, subsequently, the classifier model: the score matrix is computed from U h_α(Λ) U^T, where h_α is a learnable adaptation function. However, learning this model is quite expensive for large graphs. Therefore, we simplify the model by substituting a reparameterized classifier weight that absorbs the trailing U^T, which we learn directly for linear models; this avoids the expensive matrix multiplication with U^T. Using the same idea, we simplify our nonlinear model and learn the EigenNetwork embedding as U h_α(Λ), an n x m matrix that is fed to a nonlinear network; here m denotes the number of eigen components. Note that this model uses only the topological/graph features and does not leverage the available node features.
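The preprocessing described above can be sketched as follows: symmetrically normalize the adjacency matrix with self-loops, then keep only the top singular vectors/values. The graph below is a random stand-in and the component count is illustrative:

```python
import numpy as np

def normalize_adjacency(A):
    # A_tilde = D_hat^{-1/2} (A + I) D_hat^{-1/2}
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))  # degrees >= 1 after self-loops
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

rng = np.random.default_rng(0)
n, k = 50, 10
A = rng.random((n, n)); A = ((A + A.T) > 1.2).astype(float)
np.fill_diagonal(A, 0)
A_tilde = normalize_adjacency(A)

# Top-k singular vectors/values; for a symmetric matrix these correspond to
# eigenvectors and eigenvalue magnitudes.
U, s, _ = np.linalg.svd(A_tilde)
U_k, s_k = U[:, :k], s[:k]
assert np.all(s_k[:-1] >= s_k[1:])                 # sorted descending
```

For large sparse graphs, a truncated solver (e.g. `scipy.sparse.linalg.svds`) would replace the dense SVD.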
EigenEigenNetwork. EigenEigenNetwork is defined by (5), Z = U ((h_α(λ) g_β(σ)^T) ⊙ (U^T V)), where ⊙ denotes the elementwise product operation, V is the eigenvector matrix of X X^T, and λ and σ denote the vectors of diagonal entries of Λ and Σ respectively. Since we are primarily interested in graph adaptation in this work, we used fixed g_β(σ) (i.e., fixed low-dimensional node features). The reduced dimensionality of the node features was set relative to the dimension of the features in X. We next present details regarding the choices of adaptation functions.
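The elementwise form used here relies on a standard identity for diagonal scalings: diag(a) M diag(b) = (a b^T) ⊙ M, which collapses the two diagonal matrices into a single Hadamard product. A quick numerical check with illustrative shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(4)       # stand-in for h_alpha(lambda)
b = rng.standard_normal(5)       # stand-in for g_beta(sigma)
M = rng.standard_normal((4, 5))  # stand-in for U^T V

lhs = np.diag(a) @ M @ np.diag(b)   # two diagonal multiplications
rhs = np.outer(a, b) * M            # one outer product + Hadamard product
assert np.allclose(lhs, rhs)
```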
Adaptation Functions. The adaptation functions take the form of a scaling coefficient times the eigenvalue raised to an exponentiation coefficient. We experimented with two variants.

h_1(λ_i) = α_i λ_i^β. Each eigenvalue λ_i is adapted by an individual scaling coefficient α_i, but the exponentiation coefficient β is shared across all eigenvalues and constrained to a bounded range. Combined with a suitable choice of α_i, different parts of the eigen spectrum can be suppressed or enhanced, as needed for the supervised task.

h_2(λ_i) = α_i λ_i^{β_i}. Each eigenvalue λ_i is adapted by individual scaling and exponentiation coefficients α_i and β_i respectively, with β_i constrained to a bounded range. This function also has the ability to suppress and enhance different parts of the eigen spectrum. Unlike h_1, it learns an individual exponentiation parameter for each eigenvalue, which makes it very powerful.

Note that even though h_2 is powerful, as it learns an exponentiation coefficient per eigenvalue, it may be over-parameterized for some datasets. This can result in overfitting, especially in limited-label settings. Conversely, h_1 learns a single global exponentiation coefficient, which may be insufficient for some datasets. The two functions therefore operate at two extremes. To get the best of both parameterizations, we propose RegEigenEigenNetwork.
RegEigenEigenNetwork. To reduce the number of learnable parameters, RegEigenEigenNetwork partitions the eigenvalues into several contiguous bins and uses shared parameters within each bin. This partitioning is done for both the scaling and exponentiation parameters discussed in the aforementioned adaptation functions. We treat the number of bins as a hyperparameter and sweep it in the range [10% of the no. of nodes, 90% of the no. of nodes]. For RegEigenEigenNetwork models, an additional weight regularization is applied to the adaptation parameters and swept in the range [1e-3, 1e3] in logarithmic steps.
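The binning scheme can be sketched as follows: sort the eigenvalues, cut them into contiguous fixed-length bins, and broadcast one shared parameter per bin back to the eigenvalues. Names and sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = np.sort(rng.uniform(-1, 1, 100))[::-1]   # eigenvalues, descending
num_bins = 10

# Contiguous fixed-length bins over the sorted spectrum.
bin_ids = np.minimum(np.arange(lam.size) // (lam.size // num_bins),
                     num_bins - 1)

alpha = rng.standard_normal(num_bins)          # one learnable scale per bin
per_eigen_scale = alpha[bin_ids]               # shared weight broadcast back

assert per_eigen_scale.size == lam.size
assert np.unique(bin_ids).size == num_bins
```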
In our experiments, we feed the embedding outputs of all our networks to a fully-connected nonlinear network with a single ReLU hidden layer and a softmax final layer. We observed that using a scaled input helps to get improved performance on datasets like Chameleon and Squirrel. We treated the choice between the two adaptation functions described above as a hyperparameter.
All models use the Adam optimizer (Kingma and Ba, 2015). For our proposed models that involve learning, we set early stopping to 30 and the maximum number of epochs to 300. We use a decaying learning rate, with the decay factor set to 0.99 and the decay frequency set to 50. All our experiments were performed on a machine with an Intel Xeon 2.60GHz processor, 112GB RAM, an Nvidia Tesla P100 GPU with 16GB of memory, Python 3.6, and TensorFlow 1.15 (Abadi et al., 2015). We used Optuna (Akiba et al., 2019) for the hyperparameter search.

5.3 Experiments on Heterophilic Datasets
Dataset  Texas  Wisconsin  Actor  Squirrel  Chameleon  Crocodile  Cornell 

Homophily level  0.11  0.21  0.22  0.22  0.23  0.26  0.30 
#Nodes  183  251  7600  5201  2277  11631  183 
#Edges  492  750  37256  222134  38328  191506  478 
#Features  1703  1703  932  2089  500  500  1703 
#Classes  5  5  5  5  5  6  5 
#Train/Val/Test  87/59/37  120/80/51  3648/2432/1520  2496/1664/1041  1092/729/456  120/180/11331  87/59/37 
Datasets. We evaluate on seven heterophilic datasets to show the effectiveness of our approach. Detailed statistics of the datasets used are provided in Table 1. We borrowed Texas, Cornell, and Wisconsin from WebKB (http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo11/www/wwkb), where nodes represent web pages and edges denote hyperlinks between them. Actor is a co-occurrence network borrowed from Tang et al. (2009), where each node corresponds to an actor and an edge represents co-occurrence on the same Wikipedia page. Chameleon, Squirrel, and Crocodile are borrowed from Rozemberczki et al. (2021); nodes correspond to web pages and edges capture mutual links between pages. For all benchmark datasets, we use the feature vectors and class labels from Kim and Oh (2021). For Texas, Wisconsin, Cornell, Chameleon, Squirrel, and Actor, we use the 10 random splits (48%/32%/20% of nodes for the train/validation/test sets) from Pei et al. (2020). For Crocodile, we create 10 random splits following Kim and Oh (2021).
Texas  Wisconsin  Actor  Squirrel  Chameleon  Crocodile  Cornell  
LR  81.35 (6.33)  84.12 (4.25)  34.70 (0.89)  34.73 (1.39)  48.25 (2.67)  48.25 (2.67)  83.24 (5.64) 
MLP  81.24 (6.35)  84.43 (5.36)  36.06 (1.11)  35.38 (1.38)  51.64 (1.89)  54.47 (1.99)  83.78 (5.80) 
SGCN  62.43 (4.43)  55.69 (3.53)  30.44 (0.91)  45.72 (1.55)  60.77 (2.11)  51.54 (1.47)  62.43 (4.90) 
GCN  61.62 (6.14)  53.53 (4.73)  30.32 (1.05)  46.04 (1.61)  61.43 (2.70)  52.34 (2.61)  62.97 (5.41) 
SuperGAT  61.08 (4.97)  56.47 (3.90)  29.32 (1.00)  31.84 (1.26)  43.22 (1.71)  52.41 (1.92)  57.30 (8.53) 
GeomGCN  67.57*  64.12*  31.63*  38.14*  60.90*  NA  60.81* 
H_{2}GCN  84.86 (6.77)*  86.67 (4.69)*  35.86 (1.03)*  37.90 (2.02)*  58.40 (2.77)  53.17 (1.21)  82.16 (4.80)* 
FAGCN  82.43 (6.89)  82.94 (7.95)  34.87 (1.25)  42.59 (0.79)  55.22 (3.19)  54.35 (1.05)  79.19 (9.79) 
APPNP  81.89 (5.85)  85.49 (4.45)  35.93 (1.04)  39.15 (1.88)  47.79 (2.35)  53.13 (1.93)  81.89 (6.25) 
GPRGNN  81.35 (5.32)  82.55 (6.23)  35.16 (0.90)  46.31 (2.46)  62.59 (2.04)  52.71 (1.84)  78.11 (6.55) 
Eigen Network Models  
EigenNetwork  58.92 (3.78)  53.14 (4.84)  25.37 (0.88)  54.62 (1.5)  67.28 (2.21)  45.54 (2.08)  57.30 (5.10) 
EigenEigenNetwork  82.70 (6.42)  82.75 (4.79)  35.04 (0.91)  57.11 (1.94)  65.79 (1.16)  54.51 (1.93)  77.30 (6.19) 
RegEigenEigenNetwork  84.05 (5.76)  89.80 (4.22)  34.84 (0.53)  57.61 (1.92)  66.45 (2.77)  55.03 (2.12)  84.86 (4.80) 
EigenConcatNetwork  78.11 (3.72)  85.69 (4.81)  34.75 (0.83)  53.66 (1.76)  66.51 (1.21)  54.20 (1.61)  80.81 (6.10) 
We make the following observations about our proposed models. i) EigenNetwork: We observe from Table 2 that EigenNetwork models, which rely only on topological information, perform better than the baselines on the Squirrel and Chameleon datasets. This indicates that graph features by themselves are useful for a few datasets.
ii) EigenEigenNetwork: These models extend EigenNetwork models to incorporate aggregation. In our experiments, we restrict adaptation to the topology alone by keeping the feature-side coefficients fixed; this also allows us to study the effect of graph adaptation. EigenEigenNetwork models are more powerful than EigenNetwork models mainly because they can also extract information from the feature space. Compared against the baselines, this approach clearly outperforms them, with the greatest gains on the Squirrel and Chameleon datasets (accuracy gains of up to 11%). We believe the graph adaptation function is responsible for these gains, as it highlights important signals in the topology that complement the aggregation. We empirically observe that this proposed form of aggregation is effective for heterophilic datasets.
iii) RegEigenEigenNetwork is a regularized version of EigenEigenNetwork, which is able to gain further improvements by reducing the number of learning parameters. In specific, we observe that RegEigenEigenNetwork model consistently outperform across several datasets. We empirically observe that grouping contiguous eigenvalues and learning shared coefficients provides a regularizing effect and improves model’s generalizability. It can be inferred from Table 2 that although several baselines were proposed to address heterophily in graphs, there is no single baseline that consistently achieves good performance across the benchmark datasets.
iv) EigenConcatNetwork: these models deviate from the popular aggregation scheme and offers an effective solution that leverage signals from the given topology and features. For instance, on Wisconsin and Chameleon, these models outperform nongraph based methods and several aggregation methods including GCN, SGCN, SuperGAT and even GeomGCN. With respect to our proposed models, we make a global observation that graph adaptation persistently improves performance with gains of up to 11% as showcased in Table 6.
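The core computations behind these variants can be sketched in a few lines. The NumPy sketch below is illustrative only: the function names, the choice of the symmetrically normalized adjacency, and the forward-pass-only view are our assumptions rather than the authors' exact implementation; in practice the adapted eigenvalues would be learnable parameters trained jointly with a downstream classifier.

```python
import numpy as np

def sym_norm_adj(A):
    """Symmetrically normalized adjacency with self-loops:
    A_hat = D^{-1/2} (A + I) D^{-1/2}."""
    A = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def top_k_eigen(A_hat, k):
    """Top-k eigenpairs of the symmetric normalized adjacency,
    ordered by decreasing |eigenvalue|."""
    lam, U = np.linalg.eigh(A_hat)
    idx = np.argsort(-np.abs(lam))[:k]
    return lam[idx], U[:, idx]

def eigen_network_features(U, lam_adapted):
    """EigenNetwork-style graph features: eigenvectors reweighted by
    the adapted eigenvalues (lam_adapted would be learnable, init = lam)."""
    return U * lam_adapted

def eigen_concat_features(U, lam_adapted, X):
    """EigenConcatNetwork-style features: graph features concatenated
    with the raw node features X."""
    return np.concatenate([eigen_network_features(U, lam_adapted), X], axis=1)
```

A linear or softmax classifier trained on `eigen_network_features` corresponds to the topology-only model, and on `eigen_concat_features` to the concatenation model.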
5.4 Experiments on Homophilic Datasets
Our paper focuses mainly on heterophilic datasets. However, it is also important to understand the performance of our model on homophilic datasets. To that end, we ran experiments on several homophilic datasets; their statistics are given in Table 3. We take the Cora, Citeseer, and Pubmed datasets, with the corresponding train/val/test splits, from Pei et al. (2020). The remaining datasets come from Kim and Oh (2021), and we follow the setup described there to create 10 random splits for each of them.
Statistics  Flickr  CoraFull  WikiCS  Citeseer  Pubmed  Cora  Computer  Photos 

Homophily Score  0.32  0.59  0.68  0.74  0.80  0.81  0.81  0.85 
Number of Nodes  89250  19793  11701  3327  19717  2708  13752  7650 
Number of Edges  989006  83214  302220  12431  108365  13264  259613  126731 
Number of Features  500  500  300  3703  500  1433  767  745 
Classes  7  70  10  6  3  7  10  8 
#Train  44625  1395  580  1596  9463  1192  200  160 
#Validation  22312  2049  1769  1065  6310  796  300  240 
#Test  22313  16349  5847  666  3944  497  13252  7250 
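For reference, a homophily score of the kind reported above can be computed as edge homophily, the fraction of edges joining same-label endpoints. This is a common definition; we assume, rather than know, that it matches the exact measure used for the table.

```python
def edge_homophily(edges, labels):
    """Edge homophily: fraction of edges whose endpoints share a label.
    Scores near 1 indicate homophily; scores near 0 indicate heterophily."""
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)
```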
Performance on Homophilic v/s Heterophilic Datasets. The performance results of the various baselines and our approach are given in Table 4. We observe that on homophilic datasets, EigenEigenNetwork, which is an aggregation-based model, tends to perform better than the EigenConcatNetwork models. This trend is expected: as the homophily level of the graph increases, the discord between the node features (X) and the topology (A) reduces, benefiting aggregation methods. Baseline models such as GPRGNN, SuperGAT, FAGCN, and APPNP, which are also aggregation-based, perform better on homophilic datasets; however, our proposed models are not far behind.
Test Acc  CoraFull  WikiCS  Citeseer  Pubmed  Cora  Computer  Photos 

LR  39.10 (0.43)  72.28 (0.59)  72.22 (1.54)  87.00 (0.40)  73.94 (2.47)  64.92 (2.59)  77.57 (2.29) 
MLP  43.03 (0.82)  73.74 (0.71)  73.83 (1.73)  87.77 (0.27)  77.06 (2.16)  64.95 (3.57)  76.96 (2.46) 
GCN  45.44 (1.01)  77.64 (0.49)  76.47 (1.33)  87.86 (0.47)  87.28 (1.34)  78.16 (1.85)  86.38 (1.71) 
SGCN  61.31 (0.78)  78.30 (0.75)  76.77 (1.52)  88.48 (0.45)  86.96 (0.78)  80.65 (2.78)  89.99 (0.69) 
SuperGAT  57.75 (0.97)  77.92 (0.82)  76.58 (1.59)  87.19 (0.50)  86.75 (1.24)  83.04 (1.02)  90.31 (1.22) 
GeomGCN  NA  NA  77.99*  90.05*  85.27*  NA  NA 
H2GCN  57.83 (1.47)  OOM  77.07 (1.64)*  89.59 (0.33)*  87.81 (1.35)*  OOM  91.17 (0.89) 
FAGCN  60.07 (1.43)  79.23 (0.66)  76.80 (1.63)  89.04 (0.50)  88.21 (1.37)  82.16 (1.48)  90.91 (1.11) 
GPRGNN  61.37 (0.96)  79.68 (0.50)  76.84 (1.69)  89.08 (0.39)  87.77 (1.31)  82.38 (1.60)  91.43 (0.89) 
APPNP  60.83 (0.55)  79.13 (0.50)  76.86 (1.51)  89.57 (0.53)  88.13 (1.53)  82.03 (2.04)  91.68 (0.62) 
Our Models  
EigenNetwork  43.93 (1.19)  61.75 (1.31)  65.53 (3.49)  81.07 (0.41)  80.12 (1.54)  64.17 (6.19)  74.79 (3.44) 
EigenEigenNetwork  56.10 (1.03)  77.96 (0.53)  76.67 (1.83)  89.30 (0.42)  87.10 (1.10)  78.86 (1.86)  88.50 (0.92) 
EigenConcatNetwork  47.47 (0.92)  74.13 (0.87)  74.86 (1.90)  88.38 (0.15)  84.43 (1.77)  66.20 (3.33)  76.97 (2.63) 
RegEigenEigenNetwork  58.19 (0.62)  78.31 (0.69)  77.20 (1.36)  89.22 (0.43)  87.17 (1.18)  81.06 (1.80)  89.01 (1.05) 
5.5 Experiments on Large Dataset
We additionally performed one large-scale experiment, on the Flickr dataset, using its publicly available fixed split. Table 5 shows the results of all models on Flickr. Among the baselines, SuperGAT gives the best performance, followed by GPRGNN. However, three of our models, EigenNetwork, EigenEigenNetwork, and RegEigenEigenNetwork, outperform all the baselines. Among our models, EigenNetwork performs best, with 54.4% test accuracy. Note also that the EigenConcatNetwork model is not far behind the other EigenNetwork models and does better than APPNP and SGCN. This suggests that concatenation-based models can offer an effective alternative to aggregation-based approaches.
Model  Test Acc  Model  Test Acc  Model  Test Acc 

LR  46.51  SuperGAT  53.47  EigenNetwork  54.4 
MLP  46.93  GeomGCN  NA  EigenEigenNetwork  53.78 
GCN  53.4  H2GCN  OOM  EigenConcatNetwork  51.78 
SGCN  50.75  FAGCN  OOM  RegEigenEigenNetwork  53.83 
GPRGNN  52.74  
APPNP  50.33 
5.6 Analysis
Effect of Adaptation: We carry out an ablation study in Table 6 by freezing the original eigenvalues. We observe that adaptation helps on most datasets for all the proposed EigenNetwork variants; in particular, there is a significant improvement for EigenEigenNetwork over the frozen variant on datasets like Texas, Wisconsin, and Cornell. As observed in Section 3.2, there exist alternative weightings of the eigenvalues that are more useful for the supervised task. Additionally, in Figure 2 we plot the ratio of the adapted eigenvalues to the original eigenvalues of the graph. Different regions of the spectrum receive high weights, as seen in the Crocodile and Squirrel plots. It may be difficult for models like GPRGNN to learn such behaviour with a polynomial. Our proposed approaches can learn it, which is reflected in the performance: we gain 3% over GPRGNN on Crocodile and up to 11% on Squirrel.
Texas  Wisconsin  Actor  Squirrel  Chameleon  Crocodile  Cornell  

EigenNetwork w/o Adaptation  56.76 (4.83)  50.78 (4.92)  25.24 (0.84)  53.67 (1.4)  65.61 (1.63)  44.96 (1.78)  55.14 (7.47) 
EigenNetwork  58.92 (3.78)  53.14 (4.84)  25.37 (0.88)  54.62 (1.5)  67.28 (2.21)  45.54 (2.08)  57.30 (5.10) 
EigenConcatNetwork w/o adaptation  78.38 (5.92)  84.9 (5.12)  34.93 (0.63)  47.57 (1.75)  63.79 (2.14)  54.42 (1.62)  81.89 (5.41) 
EigenConcatNetwork  78.11 (3.72)  85.69 (4.81)  34.75 (0.83)  53.66 (1.76)  66.51 (1.21)  54.20 (1.61)  80.81 (6.10) 
EigenEigenNetwork w/o adaptation  65.41 (6.71)  69.41 (4.04)  27.66 (0.78)  56.01 (1.48)  60.29 (2.54)  53.89 (0.96)  67.03 (4.49) 
EigenEigenNetwork  82.70 (6.42)  82.75 (4.79)  35.04 (0.91)  57.11 (1.94)  65.79 (1.16)  54.51 (1.93)  77.30 (6.19) 
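The diagnostic plotted in Figure 2 is simply an elementwise ratio between the adapted and original spectra. A minimal sketch (the function name and the masking of near-zero eigenvalues are our choices):

```python
import numpy as np

def adaptation_ratio(lam_adapted, lam_original, eps=1e-8):
    """Ratio of adapted to original eigenvalues; regions where the ratio is
    far from 1 are the parts of the spectrum the model up- or down-weights.
    Near-zero original eigenvalues are masked to avoid division blow-ups."""
    denom = np.where(np.abs(lam_original) < eps, np.nan, lam_original)
    return np.asarray(lam_adapted, float) / denom
```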
Node and Graph Features: We conduct a study to answer the question: are both node and graph features helpful and needed? We experimented with three models: a fully connected nonlinear network (MLP) with a single hidden layer whose dimension is swept over {16, 32, 64}, EigenNetwork, and EigenConcatNetwork. The MLP uses only the node features, EigenNetwork uses only the graph/topological information, and EigenConcatNetwork leverages both. In Table 7, we observe across several datasets that the EigenConcatNetwork model performs better than either individual model. We see a 4% improvement on the CoraFull and Cora datasets over the best-performing individual model (EigenNetwork). This highlights that both sources are helpful for building simple and efficient EigenConcatNetwork models.
Test Acc  CoraFull  WikiCS  Citeseer  Pubmed  Cora  Computer  Photos 

MLP  43.03 (0.82)  73.74 (0.71)  73.83 (1.73)  87.77 (0.27)  77.06 (2.16)  64.95 (3.57)  76.96 (2.46) 
EigenNetwork  43.93 (1.19)  61.75 (1.31)  65.53 (3.49)  81.07 (0.41)  80.12 (1.54)  64.17 (6.19)  74.79 (3.44) 
EigenConcatNetwork  47.47 (0.92)  74.13 (0.87)  74.86 (1.90)  88.38 (0.15)  84.43 (1.77)  66.20 (3.33)  76.97 (2.63) 
Using Node Features with Aggregation Models: Recall that the GPRGNN propagation model applies a polynomial of the graph operator. The order-0 monomial of this polynomial is independent of neighborhood aggregation: it involves no powers of the operator and acts on the node features directly. This can be a very useful signal to include, particularly when the node features are of high quality and possess relatively strong discriminative power. EigenEigenNetwork models do not consume the node features directly, but the features can easily be included. In Table 8, we report performance on a few datasets with and without explicitly augmenting the order-0 monomial to EigenEigenNetwork. We see 7.5% and 5% improvements on Actor and Pubmed, respectively; however, this improvement was not observed on several other datasets.
EigenEigenNetwork  Actor  Pubmed 

Without Order-0 Monomial  27.51 (0.91)  84.24 (0.50) 
With Order-0 Monomial  35.04 (0.91)  89.30 (0.42) 
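A hedged sketch of a GPR-GNN-style polynomial filter makes the role of the order-0 term concrete: it contributes `gamma[0] * X` regardless of the graph. The function name and NumPy formulation are ours, not the original implementation.

```python
import numpy as np

def gpr_propagate(A_hat, X, gamma):
    """GPR-GNN-style polynomial filter: sum_k gamma[k] * (A_hat^k @ X).
    The k = 0 monomial contributes gamma[0] * X and involves no aggregation."""
    out = np.zeros_like(X)
    power = X
    for g in gamma:
        out = out + g * power     # add current monomial's contribution
        power = A_hat @ power     # next power of the operator
    return out
```

With `gamma = [1.0]` the filter reduces to the raw node features, which is exactly the signal the order-0 augmentation adds back.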
Are EigenNetwork models lacking on Homophilic Datasets? Neighborhood-aggregation-based methods work with the normalized adjacency $\tilde{A}$ directly, which means they have access to the entire eigenvalue spectrum. For example, in GCN, when we multiply the node features $X$ with $\tilde{A} = U \Lambda U^{\top}$, we get

$$\tilde{A}X = U \Lambda U^{\top} X = \sum_{i=1}^{n} \lambda_i \, u_i u_i^{\top} X, \qquad (6)$$

thereby leveraging the entire eigenvalue spectrum. In our EigenNetwork models, however, we restrict ourselves to only the top few eigen components. We believe this may be a cause of the performance gap on homophilic datasets. To study this, we run the following experiment: we vary the number of eigen components and observe the test performance. The results are given in Figure 3. On the heterophilic datasets Chameleon and Squirrel, we see no observable trend in performance. On the homophilic datasets Cora and CoraFull, performance does go up with the number of components, but it also drops occasionally; for example, on CoraFull the test accuracy dropped by 2% going from 2048 to 4096 components. This suggests that, while the number of components may play a role in the lower performance on homophilic datasets, it alone may not be the only contributor. This requires further investigation, which we plan to make part of our future work.
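The spectral identity for a symmetric operator, and the information lost by truncating to the top-k components, can be checked numerically. The snippet below is a self-contained illustration; a random symmetric matrix stands in for the actual normalized adjacency.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(6, 6))
A_hat = (M + M.T) / 2                    # any symmetric operator works here
X = rng.normal(size=(6, 3))

lam, U = np.linalg.eigh(A_hat)           # A_hat = U diag(lam) U^T
full = U @ np.diag(lam) @ U.T @ X        # equals A_hat @ X: whole spectrum

# Keeping only the top-k components (by |lam|) discards part of the spectrum.
k = 2
top = np.argsort(-np.abs(lam))[:k]
truncated = U[:, top] @ np.diag(lam[top]) @ U[:, top].T @ X
```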
Utility of weight-tying in the limited-labeled-data setting: To analyze the importance of this regularization, we study the effect of varying the training set size for EigenEigenNetwork and RegEigenEigenNetwork and plot the results in Figure 4. On Squirrel, we observe that RegEigenEigenNetwork with 20% labeled data is as good as EigenEigenNetwork with 48% labeled data. On Chameleon, we observe something even more interesting: the base EigenEigenNetwork model does not improve much even with more labeled data, whereas the RegEigenEigenNetwork model continues to improve. Weight-tying depends on finding good partitions of the eigenvalues, which is currently tuned as a hyperparameter. We believe that on Chameleon, with more labeled data, the model does better on the validation set and is thus able to find even better partitions of the eigenvalues, leading to improved performance.
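As we understand it, the weight-tying scheme amounts to learning one shared coefficient per contiguous eigenvalue group and expanding it to a per-eigenvalue vector. A minimal sketch (function name ours):

```python
import numpy as np

def expand_tied(group_coeffs, group_sizes):
    """Weight tying: one shared (learnable) coefficient per contiguous
    eigenvalue group, expanded into a full per-eigenvalue vector."""
    return np.repeat(np.asarray(group_coeffs, dtype=float), group_sizes)
```

For example, `expand_tied([0.5, 2.0], [3, 2])` yields `[0.5, 0.5, 0.5, 2.0, 2.0]`: two parameters cover five eigenvalues, which is the parameter reduction that acts as the regularizer.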
Homophily Rank  Heterophily Rank  

LR  11.25  7.29 
MLP  10.75  5.29 
GCN  7.88  8.71 
SGCN  6.13  8.57 
SuperGAT  6.25  10.71 
H2GCN  6.88  4.43 
FAGCN  4.38  5.86 
GPRGNN  2.88  5.57 
APPNP  3.50  5.86 
EigenNetwork  10.75  9.00 
EigenEigenNetwork  5.88  4.29 
RegEigenEigenNetwork  4.00  2.00 
Model Comparison: Varying levels of Homophily: We group all datasets with homophily score at most 0.50 and refer to them as heterophilic datasets; the rest are referred to as homophilic datasets. For each model in the comparison, we compute its rank on every dataset and report the average rank on homophilic and heterophilic datasets in Table 9. We used the heterophilic-dataset results (Table 2) and Table 4 to compute the ranks.
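Average ranks of the kind reported in Table 9 can be computed as follows. This is a generic sketch (ties broken arbitrarily by sort order), not necessarily the exact procedure used for the table.

```python
import numpy as np

def average_ranks(acc):
    """acc: (n_models, n_datasets) accuracy matrix. Rank the models within
    each dataset (1 = best accuracy; ties broken arbitrarily), then average
    each model's rank across datasets."""
    n_models, n_datasets = acc.shape
    order = np.argsort(-acc, axis=0)             # best model first, per column
    ranks = np.empty_like(order, dtype=float)
    cols = np.arange(n_datasets)
    ranks[order, cols] = np.arange(1, n_models + 1)[:, None]
    return ranks.mean(axis=1)
```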
We further group the models for ease of comparison. The simple non-graph-based models, i.e., linear and nonlinear networks that use only node features, perform reasonably well on heterophilic datasets. The second class of models, such as GCN and SGCN, belongs to the popular neighborhood-aggregation-based methods. These models perform better on homophilic datasets, as expected; on heterophilic datasets, however, they perform poorly because the underlying assumption, that connected neighbors have the same class label, is violated.
The third group of models, including SuperGAT, GPRGNN, FAGCN, and APPNP, was specifically designed to work across datasets with varying levels of homophily. We notice that SuperGAT performs poorly on heterophilic datasets; we believe that attention trained on an auxiliary task alone may not be sufficient to address heterophily, and a detailed investigation of this gap is beyond the scope of this work. The other models in this group, GPRGNN, FAGCN, and APPNP, have better average rankings than common GNN methods like GCN and SGCN. On homophilic datasets, models like APPNP and GPRGNN perform best. The GPRGNN paper reports better results than APPNP on several datasets, whereas APPNP has an edge over GPRGNN in our experiments. We believe this is due to the amount of labeled data and the different splits used: GPRGNN reports numbers using 60% training data for the heterophilic datasets, while we follow Pei et al. (2020) and use 48% training data.
The last group in Table 9 corresponds to the proposed models. The EigenNetwork model does not perform well on several datasets. This is not surprising, since it relies solely on topological features; however, its merits can be observed on datasets like Squirrel and Chameleon, so EigenNetwork is still useful in some scenarios. EigenEigenNetwork makes use of the node features via neighborhood aggregation and performs significantly better than EigenNetwork and the other baselines. RegEigenEigenNetwork offers the best performance on heterophilic datasets and is competitive on homophilic datasets.
6 Conclusion and Future Work
In this paper, we presented an eigendecomposition-based approach and proposed the EigenNetwork models. These models are inspired by the GPRGNN model (Chien et al., 2021), which we show can be interpreted as selecting/weighting the eigenvectors by scaling the corresponding eigenvalues. We propose a weight-tying-based regularization scheme that enables our model to avoid overfitting and generalize better, and we show that our models do well across all heterophilic datasets. We plan to study the optimization of the adaptation variables in the EigenEigenNetwork model, and the behaviour of this model on homophilic datasets, as part of our future work. We also propose an alternative concatenation-based model that is competitive with aggregation-based approaches on heterophilic datasets; this model is simple and computationally cheaper. It raises the question of whether there are alternative ways to design Graph Neural Networks that work across varying homophily scores, which we leave for future work.
References

Abadi et al. (2015) Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. URL https://www.tensorflow.org/. Software available from tensorflow.org.
Abu-El-Haija et al. (2019a) Sami Abu-El-Haija, Bryan Perozzi, Amol Kapoor, Hrayr Harutyunyan, Nazanin Alipourfard, Kristina Lerman, Greg Ver Steeg, and Aram Galstyan. MixHop: Higher-order graph convolution architectures via sparsified neighborhood mixing. In International Conference on Machine Learning (ICML), 2019a.

Abu-El-Haija et al. (2019b) Sami Abu-El-Haija, Bryan Perozzi, Amol Kapoor, and Joonseok Lee. N-GCN: Multi-scale graph convolution for semi-supervised node classification. In Conference on Uncertainty in Artificial Intelligence (UAI), 2019b.
Akiba et al. (2019) Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, and Masanori Koyama. Optuna: A next-generation hyperparameter optimization framework. ArXiv, abs/1907.10902, 2019.
Bach et al. (2017) Stephen H. Bach, Matthias Broecheler, Bert Huang, and Lise Getoor. Hinge-loss Markov random fields and probabilistic soft logic. Journal of Machine Learning Research (JMLR), 2017.
Bo et al. (2021) Deyu Bo, X. Wang, Chuan Shi, and Hua-Wei Shen. Beyond low-frequency information in graph convolutional networks. In Association for the Advancement of Artificial Intelligence (AAAI), 2021.
 Bruna et al. (2014) Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. Spectral networks and locally connected networks on graphs. In International Conference on Learning Representations (ICLR), 2014.
 Chien et al. (2021) Eli Chien, Jianhao Peng, Pan Li, and Olgica Milenkovic. Adaptive universal generalized pagerank graph neural network. In International Conference on Learning Representations (ICLR), 2021.
 Defferrard et al. (2016) Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Neural Information Processing Systems (NeurIPS), 2016.
 Domingos and Lowd (2009) Pedro Domingos and Daniel Lowd. Markov Logic: An Interface Layer for Artificial Intelligence. Morgan & Claypool, 2009.
 Getoor and Taskar (2007) Lise Getoor and Ben Taskar. Introduction to Statistical Relational Learning. MIT Press, 2007.
 Golub and Van Loan (1996) Gene H. Golub and Charles F. Van Loan. Matrix Computations (3rd Ed.). Johns Hopkins University Press, USA, 1996. ISBN 0801854148.
 Hamilton et al. (2017) William L. Hamilton, Rex Ying, and Jure Leskovec. Inductive representation learning on large graphs. In Neural Information Processing Systems (NeurIPS), 2017.
 Kim and Oh (2021) Dongkwan Kim and Alice Oh. How to find your friendly neighborhood: Graph attention design with selfsupervision. In International Conference on Learning Representations (ICLR), 2021.
 Kingma and Ba (2015) Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR), 2015.
Kipf and Welling (2017) Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR), 2017.
 Klicpera et al. (2019) Johannes Klicpera, Aleksandar Bojchevski, and Stephan Günnemann. Combining neural networks with personalized pagerank for classification on graphs. In International Conference on Learning Representations (ICLR), 2019.
LeCun et al. (1998) Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998. doi: 10.1109/5.726791.

Li et al. (2018) Qimai Li, Zhichao Han, and Xiao-Ming Wu. Deeper insights into graph convolutional networks for semi-supervised learning. In Association for the Advancement of Artificial Intelligence (AAAI), 2018.
McPherson et al. (2001) Miller McPherson, Lynn Smith-Lovin, and James M. Cook. Birds of a feather: Homophily in social networks. Annual Review of Sociology, 2001.
Pei et al. (2020) Hongbin Pei, Bingzhe Wei, Kevin Chen-Chuan Chang, Yu Lei, and Bo Yang. Geom-GCN: Geometric graph convolutional networks. In International Conference on Learning Representations (ICLR), 2020.
Rozemberczki et al. (2021) Benedek Rozemberczki, Carl Allen, and Rik Sarkar. Multi-scale attributed node embedding. Journal of Complex Networks, 2021.
Tang et al. (2009) Jie Tang, Jimeng Sun, Chi Wang, and Zi Yang. Social influence analysis in large-scale networks. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2009.
 Vaswani et al. (2017) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Neural Information Processing Systems (NeurIPS), 2017.
 Veličković et al. (2018) Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph Attention Networks. In International Conference on Learning Representations (ICLR), 2018.
 Wu et al. (2019) Felix Wu, Amauri Souza, Tianyi Zhang, Christopher Fifty, Tao Yu, and Kilian Weinberger. Simplifying graph convolutional networks. In International Conference on Machine Learning (ICML), 2019.
 Zhu et al. (2020) Jiong Zhu, Yujun Yan, Lingxiao Zhao, Mark Heimann, Leman Akoglu, and Danai Koutra. Beyond homophily in graph neural networks: Current limitations and effective designs. In Neural Information Processing Systems (NeurIPS), 2020.
 Zhu et al. (2021) Jiong Zhu, Ryan A. Rossi, Anup Rao, Tung Mai, Nedim Lipka, Nesreen K. Ahmed, and Danai Koutra. Graph Neural Networks with Heterophily. In Association for the Advancement of Artificial Intelligence (AAAI), 2021.