SLGCN: Structure Learning Graph Convolutional Networks for Graphs under Heterophily

by   Mengying Jiang, et al.

The performance of GNNs for representation learning on graph-structured data is generally limited by the fact that existing GNNs rely on one assumption, i.e., that the original graph structure is reliable. However, since real-world graphs are inevitably noisy or incomplete, this assumption is often unrealistic. In this paper, we propose structure learning graph convolutional networks (SLGCNs) to alleviate the issue from two aspects, and the proposed approach is applied to node classification. The first aspect is node features: we design an efficient-spectral-clustering-with-anchors (ESC-ANCH) approach to efficiently aggregate feature representations from all similar nodes, no matter how far away they are. The second aspect is edges: our approach generates a re-connected adjacency matrix according to the similarities between nodes, optimized for the downstream prediction task, so as to make up for the shortcomings of the original adjacency matrix, considering that the original adjacency matrix usually provides misleading information for the aggregation step of GCN on graphs with a low level of homophily. Both the re-connected adjacency matrix and the original adjacency matrix are applied in SLGCNs to aggregate feature representations from nearby nodes. Thus, SLGCNs can be applied to graphs with various levels of homophily. Experimental results on a wide range of benchmark datasets illustrate that the proposed SLGCNs outperform state-of-the-art GNN counterparts.




1 Introduction

Graph Neural Networks (GNNs) are powerful tools for representation learning on graphs, with a variety of applications ranging from knowledge graphs to financial networks (Gilmer et al., 2017; Scarselli et al., 2020; Bronstein et al., 2017). In recent years, many GNN architectures have been developed, e.g., the Graph Convolutional Network (GCN) (Kipf & Welling, 2017), Graph Attention Network (GAT) (Velickovic et al., 2018), Graph Isomorphism Network (GIN) (Xu et al., 2018), and GraphSAGE (Hamilton et al., 2017). In GNNs, each node iteratively updates its feature representation by aggregating the representations of the node itself and its neighbors (Kipf & Welling, 2017). The neighbors are usually defined as all adjacent nodes in the graph, and a variety of aggregation functions can be adopted in GNNs (Pei et al., 2020; Corso et al., 2020), e.g., summation, maximum, and mean.

GCNs are among the most attractive GNNs and have been widely applied in a variety of scenarios (Pei et al., 2020; Hamilton et al., 2017b; Bronstein et al., 2017). However, one fundamental weakness limits the representation ability of GCNs on graph-structured data: GCNs may not capture long-range dependencies in graphs, since they update the feature representations of nodes by simply summing the normalized feature representations from all one-hop neighbors (Velickovic et al., 2018; Pei et al., 2020). Furthermore, this weakness is magnified in graphs with heterophily, i.e., with low or medium levels of homophily.

Homophily is a very important principle of many real-world graphs, whereby linked nodes tend to have similar features and belong to the same class (Jiong Zhu et al., 2020). For instance, papers are more likely to cite papers from the same research area, and friends tend to have similar ages or political beliefs. However, there are also "opposites attract" settings in the real world, leading to graphs with low homophily, i.e., proximal nodes are usually from different classes and have dissimilar features (Jiong Zhu et al., 2020). For example, most people tend to chat with people of the opposite gender on dating websites, and fraudsters are more likely to contact accomplices than other fraudsters in online gambling networks. Most existing GNNs, including GCNs, assume strong homophily; therefore they generalize poorly to graphs under high heterophily, performing even worse than the MLP (Hornik et al., 1991), which relies only on the node features for classification (Shashank Pandit et al., 2007; Jiong Zhu et al., 2020).

To address the mentioned weakness, a straightforward strategy is to utilize a multi-layer GCN to aggregate the features from distant nodes, but this strategy may cause over-smoothing and over-fitting (Rong et al., 2020). Recently, some related approaches have been built to alleviate this curse, such as the Geometric Graph Convolutional Network (GEOM-GCN) (Pei et al., 2020) and HGCN (Jiong Zhu et al., 2020). Although GEOM-GCN improves the representation learning performance of GCNs, its node classification performance is often unsatisfactory on graphs with low homophily (Pei et al., 2020). HGCN improves the classification performance of GCN, but it is only able to aggregate information from nearby nodes, and thus lacks the ability to capture features from distant but similar nodes. Hence, HGCN still leaves considerable room for improvement.

In this paper, we propose a novel GNN approach to solve the above-mentioned problem, referred to as structure learning graph convolutional networks (SLGCNs). Following the spectral clustering (SC) method (Nie et al., 2011), the nodes of a graph are mapped into a new feature space in which nodes that are closely connected or possess similar features in the graph are usually proximal. Therefore, if SC is employed to process graph-structured data, nodes can aggregate features from similar nodes, which allows GCN to capture long-range dependencies. It should be noted, however, that the computational complexity of SC is prohibitively high for large-scale graphs; hence, we design an efficient-spectral-clustering-with-anchors (ESC-ANCH) method to efficiently extract SC features. Then, the extracted SC features are combined with the original node features as enhanced features (EF), and the EF is utilized to train GNNs.

Following the research in Jiong Zhu et al. (2020), the nodes of the same class always possess highly similar features, no matter whether the homophily of the graph is high or low. Our approach can learn a re-connected graph related to the similarities between nodes and optimized for the downstream prediction task, and it then uses the original adjacency matrix and the re-connected adjacency matrix to obtain multiple intermediate representations corresponding to different rounds of aggregation. Then, we combine several key intermediate representations as the new node embedding, and use a learnable weighted vector to highlight the important dimensions of the new node embedding. Finally, we set the result of this calculation as the final node embedding for node classification.

Compared with other GNNs, the contributions of SLGCNs can be summarized as follows. 1) SC is integrated into GNNs for capturing long-range dependencies on graphs, and an ESC-ANCH algorithm is proposed to efficiently implement SC on graph-structured data. 2) SLGCNs can learn a re-connected adjacency matrix that not only relates to the node similarities but also benefits the downstream prediction task. 3) SLGCNs propose improvements for handling heterophily from the aspects of node features and edges, respectively; meanwhile, SLGCNs combine the two improvements and make them supplement each other.

2 Related Work

Our work is directly related to SC and GCN. The theories of SC and GCN are reviewed in subsections 2.1 and 2.2, respectively.

2.1 Spectral Clustering

Spectral clustering (SC) is an advanced algorithm evolved from graph theory that has attracted much attention (Nie et al., 2011). Compared with most traditional clustering methods, the implementation of SC is much simpler (Nie et al., 2011). Remarkably, SC utilizes a weighted graph to partition the dataset. Assume that $X = \{x_1, x_2, \dots, x_n\}$ represents a dataset. The task of clustering is to segment $X$ into $c$ clusters. The cluster assignment matrix is denoted as $F \in \mathbb{R}^{n \times c}$, where $f_i$ is the cluster assignment vector for the pattern $x_i$. From another perspective, $f_i$ can be considered as the feature representation of $x_i$ in the $c$-dimensional feature space. SC comes in different types; our work focuses on SC with the $c$-way normalized cut, considering the overall performance of the algorithm, where the related concepts are explained in S. X. Yu et al. (2003).

Let $G = (V, E)$ be an undirected weighted graph, where $V$ denotes the set of nodes, $S \in \mathbb{R}^{n \times n}$ denotes the affinity matrix, and $n$ is the number of nodes in $G$. Note that $S$ is a symmetric matrix, and each element $s_{ij}$ represents the affinity of a pair of nodes in $G$. The most common way to build $S$ is the full connection method. Following the description in Nie et al. (2011), $s_{ij}$ can be represented as:

$$s_{ij} = \exp\!\left( -\frac{\| x_i - x_j \|^2}{2 \sigma^2} \right), \qquad (1)$$

where $x_i$ and $x_j$ represent the features of the $i$-th node and the $j$-th node in $G$, respectively, and $\sigma$ controls the degree of similarity between nodes. Then, the Laplacian matrix is defined as $L = D - S$, where the degree matrix $D$ is a diagonal matrix whose diagonal elements are $d_{ii} = \sum_{j} s_{ij}$.

The objective function of SC with the normalized cut is

$$\min_{F} \ \operatorname{Tr}\!\left( F^{\top} L F \right) \quad \text{s.t.} \quad F^{\top} D F = I,$$

where $F$ is the clustering indicator matrix. After a change of variables, the objective function can be rewritten as:

$$\max_{F^{\top} F = I} \ \operatorname{Tr}\!\left( F^{\top} D^{-1/2} S D^{-1/2} F \right).$$

The optimal solution $F$ of the objective function is constructed by the eigenvectors corresponding to the $c$ largest eigenvalues of the normalized affinity matrix $\tilde S = D^{-1/2} S D^{-1/2}$. In general, $F$ can not only be considered as the clustering result of the nodes, but also be regarded as a new feature matrix of the nodes, in which each node has $c$ feature elements.
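As a concrete illustration of this procedure, the following numpy sketch (illustrative, not the paper's implementation; the function name is ours) builds the fully connected RBF affinity of Eq. (1), normalizes it, and takes the eigenvectors of the $c$ largest eigenvalues as the feature matrix $F$:

```python
import numpy as np

def spectral_embedding(X, c, sigma=1.0):
    """Normalized-cut spectral embedding: build a fully connected RBF
    affinity matrix S (Eq. 1), normalize it as D^{-1/2} S D^{-1/2}, and
    return the eigenvectors of the c largest eigenvalues as features F."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    S = np.exp(-sq_dists / (2.0 * sigma ** 2))
    d = S.sum(axis=1)                         # degrees
    S_norm = S / np.sqrt(np.outer(d, d))      # D^{-1/2} S D^{-1/2}
    _, vecs = np.linalg.eigh(S_norm)          # eigenvalues in ascending order
    return vecs[:, -c:]                       # top-c eigenvectors

# two well-separated clusters: rows of F become cluster-discriminative
X = np.vstack([np.random.randn(10, 3), np.random.randn(10, 3) + 8.0])
F = spectral_embedding(X, c=2)
```

The rows of $F$ can then be clustered (e.g., with k-means) or, as in this paper, used directly as new node features.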

Figure 1: SLGCNs pipeline. It consists of three stages: (S1) efficient spectral clustering with anchors, (S2) re-connected graph, and (S3) structure learning graph convolutional networks. In (S1), we employ ESC-ANCH to generate the SC feature $X_{sc}$. In (S2), we construct a re-connected graph according to the similarities between nodes, and the re-connected graph can be gradually optimized with the training of the SLGCNs model. In (S3), we combine the original feature $X$ and the SC feature $X_{sc}$ as the enhanced feature $H^{(0)}$, and perform feature aggregation on $H^{(0)}$ by using the re-connected adjacency matrix $\hat A_{re}$ and the original $\hat A$, respectively. Then, the results of aggregation and $H^{(0)}$ are combined as $H_{com}$, and a weighted vector is used to highlight the important dimensions of $H_{com}$ so as to make SLGCNs adapt to graphs with various levels of homophily.

2.2 GCN

Following the work in Scarselli et al. (2020), in GCN, the updates of node features adopt an isotropic averaging operation over the feature representations of one-hop neighbors. Let $h_i^{(l)}$ be the feature representation of node $i$ in the $l$-th GCN layer, and we have

$$h_i^{(l+1)} = \sigma\!\left( W^{(l)} \sum_{j \in \mathcal{N}(i)} \frac{h_j^{(l)}}{\sqrt{\tilde d_i \tilde d_j}} \right),$$

where $\mathcal{N}(i)$ represents the set of one-hop neighbors of node $i$ (including node $i$ itself via the self-loop), $W^{(l)}$ is a weight matrix, and $\sigma(\cdot)$ is employed as the activation function. Note that $\tilde d_i$ and $\tilde d_j$ are the degrees of the $i$-th node and the $j$-th node (with self-loops), respectively. Furthermore, the forward model of a 2-layer GCN can be represented as:

$$Z = \operatorname{softmax}\!\big( \hat A \, \operatorname{ReLU}( \hat A X W^{(0)} ) \, W^{(1)} \big),$$

where $X$ is the feature matrix of nodes and is also the input of the first GCN layer, and $A$ is the adjacency matrix. The adjacency matrix with self-loops is $\tilde A = A + I_n$, where $I_n$ is an identity matrix. Here, the adjacency matrix with self-loops can be normalized by $\hat A = \tilde D^{-1/2} \tilde A \tilde D^{-1/2}$, and the normalized adjacency matrix $\hat A$ is employed for aggregating the representations of neighbors, where $\tilde D$ is the degree matrix of $\tilde A$. Each element of $\hat A$ is defined as:

$$\hat a_{ij} = \frac{\tilde a_{ij}}{\sqrt{\tilde d_i \tilde d_j}}.$$
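The two-layer forward model can be sketched in a few lines of numpy (an illustrative re-implementation; `normalize_adj` and `gcn_forward` are our names, not the paper's):

```python
import numpy as np

def normalize_adj(A):
    """GCN propagation matrix: A_hat = D~^{-1/2} (A + I) D~^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)
    inv_sqrt_d = 1.0 / np.sqrt(d)
    return A_tilde * np.outer(inv_sqrt_d, inv_sqrt_d)

def gcn_forward(X, A, W0, W1):
    """Two-layer GCN: Z = softmax(A_hat . ReLU(A_hat X W0) . W1)."""
    A_hat = normalize_adj(A)
    H = np.maximum(A_hat @ X @ W0, 0.0)              # first layer + ReLU
    logits = A_hat @ H @ W1                          # second layer
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)          # row-wise softmax

# toy usage: a 3-node path graph, one-hot features, random weights
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
Z = gcn_forward(np.eye(3), A, rng.standard_normal((3, 4)), rng.standard_normal((4, 2)))
```

Each row of `Z` is a probability distribution over classes for one node.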
3 Proposed SLGCN Approach

In this section, we present a novel GNN, structure learning graph convolutional networks (SLGCNs), for node classification on graph-structured data. The pipeline of SLGCNs is shown in Figure 1. The remainder of this section is organized as follows. Subsection 3.1 describes the proposed ESC-ANCH approach. Subsection 3.2 gives the generation details of the re-connected adjacency matrix. Subsection 3.3 describes the proposed SLGCNs in detail.

3.1 Efficient Spectral Clustering with Anchors

In graph-structured data, a large number of nodes of the same class possess similar features but are far apart from each other. However, GCN simply aggregates the information from one-hop neighbors, and the depth of GCN is usually limited. Therefore, the information from distant but similar nodes is always ignored, whereas SC can partition nodes according to the affinities between nodes. Specifically, closely connected and similar nodes are more likely to be proximal in the new feature space, and vice versa. Thus, it is very appropriate to combine GCN with SC to extract the features of distant but similar nodes. Following subsection 2.1, the object of performing SC is to generate the cluster assignment matrix $F$. However, $F$ can only be calculated by eigenvalue decomposition on the normalized similarity matrix $\tilde S$, which takes $O(n^2 c)$ time, where $n$ and $c$ are the numbers of nodes and clusters, respectively. For some large-scale graphs, this computational complexity is an unbearable burden.

In order to overcome this problem, we propose efficient spectral clustering (ESC) to efficiently perform SC. In the proposed ESC, instead of calculating $S$ by equation (1), we employ the inner product to construct the affinity matrix, i.e., $S = X X^{\top}$. Thus, the normalized similarity matrix in the ESC method can be represented as:

$$\tilde S = D^{-1/2} X X^{\top} D^{-1/2}.$$

Then, we define $B = D^{-1/2} X$, thus $\tilde S = B B^{\top}$. The singular value decomposition (SVD) of $B$ can be represented as follows:

$$B = U \Sigma V^{\top},$$

where $U$, $\Sigma$, and $V$ are the left singular vector matrix, the singular value matrix, and the right singular vector matrix, respectively. It is obvious that the column vectors of $U$ are the eigenvectors of $\tilde S$. Therefore, we can easily construct $F$ by using the eigenvectors from $U$, where the eigenvectors used correspond to the $c$ largest eigenvalues of $\tilde S$. The computational complexity of the SVD performed on $B$ ($n \times d$) is much lower, compared with directly performing eigenvalue decomposition on $\tilde S$ ($n \times n$).
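The equivalence that justifies ESC — the left singular vectors of $B$ are the eigenvectors of $\tilde S = B B^{\top}$, and the eigenvalues of $\tilde S$ are the squared singular values of $B$ — can be checked numerically (an illustrative sketch; we use non-negative random features so the degrees stay positive):

```python
import numpy as np

# ESC rests on a standard fact: if B = D^{-1/2} X and S_tilde = B B^T,
# then the left singular vectors of B are the eigenvectors of S_tilde,
# and the eigenvalues of S_tilde are the squared singular values of B.
rng = np.random.default_rng(0)
X = np.abs(rng.standard_normal((200, 8)))      # non-negative features -> positive degrees
d = (X @ X.T).sum(axis=1)                      # degrees of S = X X^T
B = X / np.sqrt(d)[:, None]                    # B = D^{-1/2} X  (200 x 8)
U, sing, _ = np.linalg.svd(B, full_matrices=False)

# the expensive alternative: eigendecomposition of the 200 x 200 matrix
vals = np.sort(np.linalg.eigvalsh(B @ B.T))[::-1]
```

The SVD operates on the thin $n \times d$ matrix instead of the dense $n \times n$ one, which is where the speedup comes from.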

However, the dimension of the nodes' original features is usually high in many graph-structured datasets, so the efficiency of the ESC method still needs to be improved.

Therefore, we propose ESC-ANCH. Specifically, we first randomly select $m$ nodes from the set of nodes as anchor nodes, where $m \ll n$. Then, we calculate the node-anchor similarity matrix $R \in \mathbb{R}^{n \times m}$, where the chosen similarity metric function is the cosine. Here, $R$ is employed as the new feature representation of the nodes. Thus, in ESC-ANCH, $B$ can be redefined as $B = D^{-1/2} R$, where $D$ is the degree matrix of $R R^{\top}$. Therefore, ESC-ANCH takes $O(n m^2)$ time, which is much lower than SC and ESC.
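A minimal sketch of the ESC-ANCH procedure as described (the function name, the fixed random seed for anchor selection, and the `abs` guard on the degree vector are our assumptions, not details from the paper):

```python
import numpy as np

def esc_anch(X, m, c, seed=0):
    """ESC-ANCH sketch: pick m random anchor nodes, form the n x m cosine
    node-anchor similarity matrix R, treat R as the new node features, and
    take the top-c left singular vectors of B = D^{-1/2} R, where D is the
    degree matrix of R R^T (computed without forming the n x n matrix)."""
    rng = np.random.default_rng(seed)
    anchors = rng.choice(X.shape[0], size=m, replace=False)
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    R = Xn @ Xn[anchors].T                  # cosine similarities, n x m
    d = R @ R.sum(axis=0)                   # (R R^T) 1 = R (R^T 1), cost O(nm)
    B = R / np.sqrt(np.abs(d))[:, None]     # abs() guards against negative row sums
    U, _, _ = np.linalg.svd(B, full_matrices=False)
    return U[:, :c]                         # SC feature matrix, n x c

F = esc_anch(np.abs(np.random.default_rng(1).standard_normal((50, 16))), m=10, c=3)
```

Note that the degree vector is obtained as $R(R^{\top}\mathbf{1})$, so the $n \times n$ matrix $R R^{\top}$ is never materialized.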

3.2 Re-connected Graph

Figure 2: Reconnected graph is learned from original node features via similarity learning.

Most existing GNNs are designed for graphs with high homophily, where linked nodes are likely to possess similar feature representations and belong to the same class, e.g., community networks and citation networks (Newman, 2002).

However, there are a large number of real-world graphs under heterophily, where linked nodes usually possess dissimilar features and belong to different classes, e.g., webpage linking networks (Ribeiro et al., 2017). Thus, GNNs designed under the assumption of homophily are greatly inappropriate for graphs under heterophily.

Fortunately, regardless of the homophily level of a graph, the nodes of the same class always possess highly similar features. Therefore, as shown in Figure 2, to help GCN capture information from nodes of the same class, we learn a re-connected adjacency matrix according to the similarities between nodes and the downstream task.

A good similarity metric function should be expressively powerful and learnable (Chen et al., 2020). Here, we set cosine similarity as our metric function. The similarity between a pair of nodes can be represented as:

$$s_{ij} = \cos\!\big( w \odot x_i, \; w \odot x_j \big),$$

where $x_i$ and $x_j$ are the feature representations of nodes $i$ and $j$, $w$ is a learnable weighted vector, and $\odot$ denotes the Hadamard product. Afterwards, we can generate the similarity matrix $S_{re}$. The generated similarity matrix $S_{re}$ is symmetric, and its elements range within $[-1, 1]$.

However, an adjacency matrix is supposed to be non-negative and sparse, since a fully connected graph is computationally expensive and might introduce noise. Therefore, we need to extract a non-negative and sparse adjacency matrix $A_{re}$ from $S_{re}$. Specifically, we define a non-negative threshold $\tau$ and set those elements in $S_{re}$ that are smaller than $\tau$ to zero.
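The construction can be sketched as follows (illustrative; `reconnect` is our name, and the learnable vector $w$ is represented by a fixed array rather than a trained parameter):

```python
import numpy as np

def reconnect(X, w, tau):
    """Re-connected adjacency sketch: weighted cosine similarity
    s_ij = cos(w * x_i, w * x_j), then zero out entries below the
    non-negative threshold tau so the matrix stays sparse."""
    Xw = X * w                                   # elementwise (Hadamard) weighting
    Xw = Xw / np.linalg.norm(Xw, axis=1, keepdims=True)
    S = Xw @ Xw.T                                # entries in [-1, 1]
    return np.where(S >= tau, S, 0.0)            # non-negative, sparse

X = np.random.default_rng(0).standard_normal((6, 4))
A_re = reconnect(X, w=np.ones(4), tau=0.3)
```

In the full model, gradients flow back through `S` into `w`, so the re-connected graph is optimized jointly with the downstream task.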

3.3 Structure Learning Graph Convolutional Networks

  Input: $X$: feature matrix of nodes, $Y$: label matrix of nodes for training, $A$: adjacency matrix, $W_1, W_2, W_{out}$: trainable weight matrices, $q$: trainable weighted vector.

  Output: $Z$: probability distribution of nodes over a set of classes.

  1. Compute the SC feature $X_{sc}$ via ESC-ANCH in 3.1.
  2. Learn the re-connected adjacency matrix $A_{re}$ in 3.2.
  3. Construct $H^{(0)}$ in the way of average in Eq. 10, or in the way of concatenation in Eq. 11.
  4. Normalize the adjacency matrices in Eq. 12.
  repeat (initialize $k$ to 1)
  5. Update $H_{re}^{(k)}$, $H^{(k)}$ in Eq. 13.
  6. Combine the intermediate representations into $H_{com}$ in Eq. 14.
  7. Compute $H_{fin}$ in Eq. 15.
  8. Compute $Z$ in Eq. 17.
  9. Compute the LOSS.
  10. Back-propagate to update model weights.
  until convergence
Algorithm 1 SLGCNs

We first utilize the original feature $X$ and the SC feature $X_{sc}$ to construct the EF. The first layer of the proposed SLGCNs is represented as:

$$H^{(0)} = \sigma\!\Big( \tfrac{1}{2}\big( X W_1 + X_{sc} W_2 \big) \Big),$$

where $W_1$ and $W_2$ are the trainable weight matrices of the first layer. Meanwhile, we can also use concatenation to combine $X$ and $X_{sc}$:

$$H^{(0)} = \sigma\!\big( \big[\, X W_1 \,\|\, X_{sc} W_2 \,\big] \big),$$

where $\|$ represents concatenation. Here, we call the EF as EF_avg if the average method is adopted, and call the EF as EF_con if the concatenation method is adopted.

After the first layer is constructed, we use $\hat A_{re}$ to aggregate and update the features of nodes to obtain the intermediate representations:

$$H_{re}^{(k)} = \hat A_{re} H_{re}^{(k-1)},$$

where $\hat A_{re}$ is the row-normalized $A_{re}$ and $H_{re}^{(0)} = H^{(0)}$. Similarly, we use the normalized original adjacency matrix $\hat A$ to aggregate and update features to obtain the intermediate representations of nodes:

$$H^{(k)} = \hat A H^{(k-1)},$$

where $k = 1, \dots, K$, and $K$ represents the number of rounds of feature aggregation. After several rounds of feature aggregation, we combine the most key intermediate representations as the new node embeddings as follows:

$$H_{com} = \big[\, H^{(0)} \,\|\, H^{(1)} \,\|\, \cdots \,\|\, H^{(K)} \,\|\, H_{re}^{(1)} \,\|\, \cdots \,\|\, H_{re}^{(K)} \,\big].$$

For graphs with high homophily, $H^{(1)}$ and $H^{(2)}$ are sufficient for representing the embeddings of nodes, as evidenced by GCN (Kipf & Welling, 2017) and GAT (Velickovic et al., 2018). In addition, $H^{(0)}$ can be treated as a supplement to $H^{(1)}$ and $H^{(2)}$. For graphs under heterophily, $H^{(0)}$ and $H_{re}^{(k)}$ can also perform well on learning feature representations. In order to exploit the advantages of these intermediate representations to the full, we use concatenation to combine them; in this way, the adopted intermediate representations are not mixed.

Afterwards, we generate a learnable weight vector $q$ that has the same dimension as $H_{com}$, and take the Hadamard product between $q$ and $H_{com}$ as the final feature representations of the nodes:

$$H_{fin} = q \odot H_{com}.$$

The purpose of this step is to highlight the important dimensions of $H_{com}$. Afterward, nodes are classified based on their final embedding $H_{fin}$ as follows:

$$Z = \operatorname{softmax}\!\big( H_{fin} W_{out} \big),$$

where $W_{out}$ is the trainable weight matrix of the last layer. The classification loss is as follows:

$$\mathcal{L}_{pred} = \ell\big( Z, Y \big),$$

where $Y$ represents the labels of the nodes, and $\ell(\cdot,\cdot)$ is the cross-entropy function for measuring the difference between the predictions $Z$ and the true labels $Y$.

Since re-connected graph learning enables SLGCNs to possess a stronger ability to fit the downstream task, SLGCNs are more likely to suffer from over-fitting. Thus, we apply a regularization term to the learned graph.

Since the re-connected graph is learned according to the similarities between nodes, the homophily of the re-connected graph is high. In addition to homophily, the connectivity and sparsity of $A_{re}$ are also important (Chen et al., 2020). Thus, the regularization term is defined as follows:

$$\mathcal{L}_{reg} = -\frac{\alpha}{n} \mathbf{1}^{\top} \log\!\big( A_{re} \mathbf{1} \big) + \frac{\beta}{n^2} \,\| A_{re} \|_F^2,$$

where $\alpha$ and $\beta$ are non-negative hyperparameters. The first term encourages $A_{re}$ to be connected via the logarithmic barrier, and the second term encourages $A_{re}$ to be sparse (Chen et al., 2020).
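The two terms can be sketched in numpy, assuming the common degree log-barrier plus squared-Frobenius form from Chen et al. (2020); the $1/n$ and $1/n^2$ scalings are our assumptions, not taken from the paper:

```python
import numpy as np

def graph_regularizer(A_re, alpha, beta):
    """Connectivity + sparsity regularizer in the style of Chen et al. (2020):
    a logarithmic barrier on node degrees discourages isolated nodes, and a
    squared Frobenius norm encourages sparsity. The 1/n and 1/n^2 scalings
    are assumptions, not taken from the paper."""
    n = A_re.shape[0]
    degrees = A_re.sum(axis=1)
    connectivity = -(alpha / n) * np.log(degrees + 1e-12).sum()
    sparsity = (beta / n ** 2) * np.square(A_re).sum()
    return connectivity + sparsity
```

The barrier blows up as any degree approaches zero, so minimizing the total loss keeps every node connected while the Frobenius term shrinks weak edges.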

Ultimately, the loss of SLGCNs is as follows:

$$\mathcal{L} = \mathcal{L}_{pred} + \mathcal{L}_{reg}.$$

In this paper, the SLGCN that uses EF_avg is called SLGCN_avg, and similarly, the SLGCN that uses EF_con is named SLGCN_con. A pseudocode of the proposed SLGCNs is given in Algorithm 1.

4 Experimental Results

Dataset Cora Citeseer Pubmed Squirrel Chameleon Cornell Texas Wisconsin
 Hom.ratio 0.81 0.74 0.8 0.22 0.23 0.3 0.11 0.21
# Nodes 2708 3327 19717 5201 2277 183 183 251
# Edges 5429 4732 44338 198493 31421 295 309 499
# Features 1433 3703 500 2089 2325 1703 1703 1703
# Classes 7 6 3 5 5 5 5 5
Table 1: Summary of the datasets utilized in our experiments.

Here, we validate the merit of SLGCNs by comparing SLGCNs with some state-of-the-art GNNs on transductive node classification tasks on a variety of open graph datasets.

4.1 Datasets

In simulations, we adopt three common citation networks, three sub-networks of the WebKB networks, and two Wikipedia networks to validate the proposed SLGCN.

Citation networks, i.e., Cora, Citeseer, and Pubmed are standard citation network benchmark datasets (Sen et al., 2008),  (Namata et al., 2012). In these datasets, nodes correspond to papers and edges correspond to citations. Node features represent the bag-of-words representation of the paper, and the label of each node is the academic topics of the paper.

Sub-networks of the WebKB networks , i.e., Cornell, Texas, and Wisconsin. They are collected from various universities’ computer science departments (Pei et al., 2020). In these datasets, nodes correspond to webpages, and edges represent hyperlinks between webpages. Node features denote the bag-of-words representation of webpages. These webpages can be divided into 5 classes.

Wikipedia networks, i.e., Chameleon and Squirrel, are page-page networks on specific topics in Wikipedia. Nodes correspond to pages, and edges correspond to mutual links between pages. Node features represent informative nouns on the Wikipedia pages. The nodes are classified into five categories based on the amount of their average traffic.

For all datasets, we randomly split the nodes of each class into training, validation, and test sets, and the experimental results are the mean and standard deviation over ten runs. Testing is performed when the validation loss achieves its minimum on each run. An overview of the characteristics of all datasets is shown in Table 1.


The level of homophily is one of the most important characteristics of a graph, and it is significant for analyzing and employing graphs. Here, we utilize the edge homophily ratio $h = |\{(u,v) \in E : y_u = y_v\}| \,/\, |E|$ to describe the homophily level of a graph, where $E$ denotes the set of edges, and $y_u$ and $y_v$ respectively represent the labels of nodes $u$ and $v$. The ratio $h$ is the fraction of edges in a graph that link nodes with the same class label (i.e., intra-class edges). This definition is proposed in Jiong Zhu et al. (2020). Obviously, graphs have strong homophily when $h$ is high ($h \to 1$), and strong heterophily or weak homophily when $h$ is low ($h \to 0$). The $h$ of each graph is listed in Table 1. From the homophily ratios of all adopted graphs, we find that the citation networks are all highly homophilous, while the WebKB and Wikipedia networks all have low homophily.
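The edge homophily ratio is straightforward to compute; a small sketch:

```python
def edge_homophily(edges, labels):
    """Fraction of edges whose endpoints share a class label."""
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)

# toy graph: 3 of the 4 edges are intra-class
labels = [0, 0, 0, 1, 1]
edges = [(0, 1), (1, 2), (3, 4), (0, 3)]
h = edge_homophily(edges, labels)   # 0.75 -> strong homophily
```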

4.2 Baselines

We compare our proposed SLGCN_avg and SLGCN_con with various baselines:

MLP (Hornik et al., 1991) is the simplest deep neural network. It makes predictions based only on the feature vectors of nodes, without considering any local or non-local information.

GCN (Kipf & Welling, 2017) is the most common GNN. As introduced in Section 2.2, GCN makes predictions by aggregating only local information.

GAT (Velickovic et al., 2018) is one of the most common GNNs. GAT enables assigning different weights to different nodes in a neighborhood by employing the attention mechanism.

MixHop (Sami Abu-El-Haija, 2019) model can learn relationships by repeatedly mixing feature representations of neighbors at various distances.

Geom-GCN (Pei et al., 2020) is recently proposed and can capture long-range dependencies. Geom-GCN employs three embedding methods, Isomap, struc2vec, and Poincare embedding, which make for three Geom-GCN variants: Geom-GCN-I, Geom-GCN-S, and Geom-GCN-P. In this paper, we report the best results of three Geom-GCN variants without specifying the employed embedding method.

HGCN (Jiong Zhu et al., 2020) identifies a set of key designs: ego- and neighbor-embedding separation and higher-order neighborhoods. HGCN adapts to both heterophily and homophily by effectively synthesizing these designs. We consider two variants: HGCN-1, which uses one embedding round (K = 1), and HGCN-2, which uses two rounds (K = 2).

4.3 Experimental Setup

For comparison, we use six state-of-the-art node classification algorithms, i.e., MLP, GCN, GAT, MixHop, GEOM-GCN, and HGCN. For these six baselines, all hyper-parameters are set according to Jiong Zhu et al. (2020). For the proposed SLGCNs, we perform a hyper-parameter search on the validation set. The hyper-parameters that need to be searched are described in Table 5 in the supplementary material. Table 6 summarizes the hyper-parameters and accuracy of the SLGCNs with the best accuracy on different datasets. We utilize ReLU as the activation function. All models are trained to minimize cross-entropy on the training nodes, and the Adam optimizer is employed for all models (Kingma & Ba, 2017).

Dataset Cora Citeseer Pubmed Squirrel Chameleon Cornell Texas Wisconsin
SLGCN_avg w/o $\hat A$ 74.7±1.9 72.1±1.8 87.1±1.4 29.4±2.9 49.4±2.9 82.6±5.5 82.3±3.9 86.5±2.1
SLGCN_avg w/o $\hat A_{re}$ 89.6±0.4 77.3±2.2 88.5±1.1 37.1±3.2 57.5±2.8 76.6±8.6 81.2±6.2 81.8±5.3
SLGCN_avg 88.8±1.3 77.2±2.5 89.7±1.2 37.5±4.1 59.4±2.2 87.4±4.1 86.5±4.9 86.7±4.8
SLGCN_con w/o $\hat A$ 75.2±2.7 71.7±2.7 86.9±0.6 30.2±1.4 48.7±3.1 82.4±5.9 82.1±4.8 86.2±2.2
SLGCN_con w/o $\hat A_{re}$ 88.7±0.6 77±1.5 89.9±0.8 35.6±3.0 55.9±3.7 76.8±5.9 81.9±3.2 84±6.0
SLGCN_con 89.1±1.3 76.9±1.1 89.5±0.8 35.1±1.2 57.2±2.3 85.4±6.3 85.9±3.9 87.1±4.2
Table 2: Classification Accuracies (%) of the Proposed SLGCN_avg and SLGCN_con for All Graphs

4.4 Does Re-connected Adjacency Matrix Work?

In this experiment, we first examine the effectiveness of the structure learning of SLGCNs. Figure 3 shows the homophily ratios of the original graph, the randomly initialized graph, and the re-connected graph on the WebKB and Wikipedia networks. We can observe that the homophily ratio of the re-connected graph is much higher than those of the original graph and the randomly initialized graph for each network. This is because the re-connected graphs are constructed in terms of the similarities between nodes, and the weight parameters involved in similarity learning are optimized with the training of the model. Therefore, the proposed SLGCNs can aggregate features from nodes of the same class by exploiting the re-connected adjacency matrix.

Afterwards, we explore the impact of the learned re-connected graph on the accuracy of the proposed SLGCNs through an ablation study. Table 2 exhibits the accuracies of SLGCNs on all adopted networks. We can see that the original adjacency matrix is very important for SLGCNs on the citation networks. However, the impact of the original adjacency matrix is very limited, and even harmful, on the WebKB and Wikipedia networks. This is because the citation networks are graphs with a high level of homophily, while the WebKB and Wikipedia networks have low homophily ratios; it is difficult for SLGCNs to aggregate useful information by only using the original adjacency matrix in graphs with low homophily ratios. By contrast, the introduction of the re-connected adjacency matrix is helpful for SLGCNs on the WebKB and Wikipedia networks, and does not hurt the performance of SLGCNs on the citation networks. This is owing to the fact that the re-connected graph is learned according to the similarities of nodes and the downstream task. Therefore, the learned re-connected graph can adapt to all levels of homophily.

Figure 3: Comparisons of homophily between the original graph, initialized graph and the re-connected graph on Squirrel, Chameleon, Cornell, Texas, and Wisconsin.

4.5 Effect of SC Feature on Accuracy

Figure 4: Ablation study about SC feature on the proposed SLGCNs.
Method Cora Citeseer Pubmed Squirrel Chameleon Cornell Texas Wisconsin
SC 58.1 63.2 OM 98.5 29.3 1.7 1.7 1.8
ESC 1.7 4.6 2.0 2.5 2.2 1.8 1.8 1.9
ESC-ANCH 1.2 0.98 1.4 1.1 0.96 0.96 0.83 0.89
Table 3: Running Time (Seconds) of SC, the Proposed ESC, and ESC-ANCH on All Graph Datasets (OM: Out-of-Memory Error)

In this experiment, we explore the impact of the SC feature extracted by the proposed ESC-ANCH method on the classification accuracy of SLGCNs. Here, we again use an ablation study. The classification accuracies of SLGCN_avg, SLGCN_con, and SLGCNs without the SC feature are shown in Figure 4, where all graph datasets are adopted. As can be seen, SLGCN_avg and SLGCN_con achieve better performance than SLGCNs without the SC feature. This is owing to the fact that the SC features reflect not only the ego-embedding of each node but also that of similar nodes.

As a further insight, we focus on the running times of the original SC, the proposed ESC, and ESC-ANCH on all adopted graph datasets. From Table 3, we can see that ESC and ESC-ANCH are much more efficient than SC. Meanwhile, ESC-ANCH is faster than ESC due to the introduction of anchor nodes. Specifically, ESC-ANCH takes only 1.2 s on the Cora dataset, which is 48 times faster than the original SC method. Moreover, ESC-ANCH takes only 1.4 s on the PubMed dataset, where the original SC cannot work due to an out-of-memory (OM) error. On the Squirrel network, ESC-ANCH takes 1.1 s, which is 90 times faster than the original SC method. ESC-ANCH takes about half as long as the original SC on the WebKB networks. In conclusion, ESC-ANCH is a very efficient SC method.

4.6 Comparison Among Different GNNs

Figure 5: Performance of multi-layer GCNs and the proposed SLGCN_avg and SLGCN_con on Cora.

In Figure 5, we implement a 3-layer GCN, a 4-layer GCN, a 5-layer GCN, and the proposed SLGCN_avg and SLGCN_con on the Cora dataset. As can be seen, the performance of the multi-layer GCNs gets worse as the depth increases. This may be caused by over-fitting and over-smoothing. Meanwhile, for SLGCN_avg and SLGCN_con, the hyper-parameter $K$ that controls the rounds of aggregation is set so that SLGCNs aggregate the same features of neighbors as a 5-layer GCN. However, SLGCNs perform well and do not suffer from over-fitting. Thus, we can conclude that increasing the number of GCN layers leads to over-fitting, whereas SLGCNs do not need to increase the depth of the network to aggregate more node features; SLGCNs can therefore remain immune to over-fitting no matter how many node features are aggregated.

Table 4 gives the classification results per benchmark. We observe that most GNN models achieve satisfactory results on the citation networks. However, some GNN models, i.e., GCN, GAT, Geom-GCN, and MixHop, perform even worse than MLP on the WebKB and Wikipedia networks. The main reason for this phenomenon is that these GNN models aggregate useless information from neighborhoods and do not separate the ego-embedding from the useless neighbor-embedding. By contrast, no matter which graphs are adopted, HGCN and SLGCNs obtain good results. Meanwhile, the results of SLGCNs are relatively better than those of HGCN, thanks to the re-connected graph of SLGCNs. In addition, the introduction of the SC feature improves the ego-embedding of nodes by clustering similar nodes together.

Method Cora Citeseer Pubmed Squirrel Chameleon Cornell Texas Wisconsin
MLP 74.8±2.2 72.4±2.2 86.7±0.4 29.7±1.8 46.4±2.5 81.1±6.4 81.9±4.8 85.3±3.6
GCN 87.3±1.3 76.7±1.6 87.4±0.7 36.9±1.3 59.0±4.7 57.0±4.7 59.5±5.2 59.8±7
GAT 82.7±1.8 75.5±1.7 84.7±0.4 30.6±2.1 54.7±2.0 58.9±3.3 58.4±4.5 55.3±8.7
Geom-GCN* 85.3 78.0 90.1 38.1 60.9 60.8 67.6 64.1
MixHop 87.6±0.9 76.3±1.3 85.3±0.6 43.8±1.5 60.5±2.5 73.5±6.3 77.8±7.7 75.9±4.9
HGCN-1 86.9±1.4 77.1±1.6 89.4±0.3 36.4±1.9 57.1±1.6 82.2±4.8 84.9±6.8 86.7±4.7
HGCN-2 87.8±1.4 76.9±1.7 89.6±0.3 37.9±2.0 59.4±2.0 82.2±6.0 82.2±5.3 85.9±4.2
SLGCN 89.3±1.3 77.2±2.5 89.7±1.2 37.9±3.1 59.5±3.2 87.9±5.1 86.6±6.2 87.8±4.8
SLGCN 89.1±1.1 77.0±2.1 89.5±0.8 37.6±2.2 57.9±2.5 86.3±6.3 86.0±5.9 86.0±4.2
Table 4: Classification Accuracies (%) of MLP, GCN, GAT, Geom-GCN*, MixHop, HGCN, and the Proposed SLGCNs on All Graphs.

5 Conclusion

In this paper, we propose an effective GNN approach, referred to as SLGCNs. Compared with other GCNs, this paper makes three main contributions: 1) SC is integrated into GNNs to capture long-range dependencies on graphs, and we propose an ESC-ANCH algorithm for dealing with graph-structured data efficiently; 2) the proposed SLGCNs can learn a re-connected adjacency matrix to improve SLGCNs from the aspect of edges; 3) SLGCNs are appropriate for all levels of homophily by combining multiple node embeddings. Experimental results have illustrated that the proposed SLGCNs are superior to other existing counterparts. In the future, we plan to explore effective models to handle more challenging scenarios where both the graph adjacency matrix and the node features are noisy.


  • Kipf & Welling (2017) Kipf, T. N., and Welling, M. Semi-supervised classification with graph convolutional networks. In ICLR, 2017.
  • Velickovic et al. (2018) Velickovic, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., and Bengio, Y. Graph attention networks. In ICLR, 2018.
  • Xu et al. (2019) Xu, K., Hu, W., Leskovec, J., and Jegelka, S. How powerful are graph neural networks? In ICLR, 2019.
  • Hamilton et al. (2017) Hamilton, W., Ying, Z., and Leskovec, J. Inductive representation learning on large graphs. In NIPS, 2017.
  • Gilmer et al. (2017) Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., and Dahl, G. E. Neural message passing for quantum chemistry. In ICML, 2017.
  • Pei et al. (2020) Pei, H., Wei, B., Chang, C., et al. Geom-GCN: Geometric Graph Convolutional Networks. In ICLR, 2020.
  • Corso et al. (2020) Corso, G., Cavalleri, L., Beaini, D. Principal neighbourhood aggregation for graph nets. arXiv preprint arXiv:2004.05718, 2020.
  • Hamilton et al. (2017b) Hamilton, W., Ying, Z., and Leskovec, J. Representation learning on graphs: Methods and applications. IEEE Data Engineering Bulletin, 2017b.
  • Bronstein et al. (2017) Bronstein, M. M., Bruna, J., LeCun, Y., Szlam, A., and Vandergheynst, P. Geometric deep learning: Going beyond Euclidean data. IEEE Signal Processing Magazine, Jul 2017.
  • Rong et al. (2020) Rong, Y., Huang, W., Xu, T., and Huang, J. DropEdge: Towards deep graph convolutional networks on node classification. In ICLR, 2020.
  • Li et al. (2018) Li, R., Wang, S., Zhu, F., and Huang, J. Adaptive graph convolutional neural networks. In AAAI, 2018.
  • Klicpera et al. (2019) Klicpera, J., Bojchevski, A., and Günnemann, S. Predict then propagate: Graph neural networks meet personalized PageRank. In ICLR, 2019.
  • Nie et al. (2011) Nie, F., Zeng, Z., Tsang, I. W., Xu, D., and Zhang, C. Spectral Embedded Clustering: A Framework for In-Sample and Out-of-Sample Spectral Clustering. IEEE Transactions on Neural Networks, vol. 22, no. 11, Nov. 2011.
  • Dwivedi et al. (2020) Dwivedi, V. P., Joshi, C. K., Laurent, T., Bengio, Y., and Bresson, X. Benchmarking graph neural networks. arXiv preprint arXiv:2003.00982v3, Jul 2020.
  • Kingma & Ba (2014) Kingma, D. P., and Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • Hornik (1991) Hornik, K. Approximation capabilities of multilayer feedforward networks. Neural Networks, 1991.
  • Yang et al. (2019) Yang, L., Chen, Z., Gu, J., and Guo, Y. Dual self-paced graph convolutional network: towards reducing attribute distortions induced by topology. IJCAI, 2019.
  • Keyulu et al. (2018) Xu, K., Li, C., Tian, Y., Sonobe, T., Kawarabayashi, K, and Jegelka, S. Representation learning on graphs with jumping knowledge networks. In ICML, 2018a.
  • Newman (2002) Newman, M. Assortative mixing in networks. Physical review letters, 2002.
  • Ribeiro et al. (2017) Ribeiro, L. F. R., Saverese, P. H. P., and Figueiredo, D. R. struc2vec: Learning node representations from structural identity. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 385-394, 2017.
  • Zhu et al. (2020) Zhu, J., Yan, Y., Zhao, L., Heimann, M., Akoglu, L., and Koutra, D. Beyond homophily in graph neural networks: Current limitations and effective designs. In NeurIPS, 2020.
  • Pandit et al. (2007) Pandit, S., Chau, D. H., Wang, S., and Faloutsos, C. NetProbe: A fast and scalable system for fraud detection in online auction networks. In Proceedings of the 16th International Conference on World Wide Web, pp. 201-210, 2007.
  • Sen et al. (2008) Sen, P., Namata, G., Bilgic, M., Getoor, L., Gallagher, B., and Eliassi-Rad, T. Collective classification in network data. AI Magazine, 29(3):93-106, 2008.
  • Namata et al. (2012) Namata, G., London, B., Getoor, L., Huang, B. Query-driven active surveying for collective classification. In 10th International Workshop on Mining and Learning with Graphs, pp. 8, 2012.
  • Huang et al. (2018) Huang, W., Zhang, T., Rong, Y., and Huang, J. Adaptive sampling towards fast graph representation learning. In NeurIPS, pp. 4558-4567, 2018.
  • Chen et al. (2018) Chen, J., Ma, T., and Cao, X. Fastgcn: Fast learning with graph convolutional networks via importance sampling. In ICLR, 2018.
  • Yu & Shi (2003) Yu, S. X., and Shi, J. Multiclass spectral clustering. In Proc. Int. Conf. Comput. Vis., Beijing, China, pp. 313-319, 2003.
  • Chen et al. (2020) Chen, Y., Wu, L., and Zaki, M. J. Iterative deep graph learning for graph neural networks: Better and robust node embeddings. In NeurIPS, 2020.
  • Liu et al. (2020) Liu, M., Wang, Z., and Ji, S. Non-local graph neural networks. arXiv preprint arXiv:2005.14612, 2020.
  • Abu-El-Haija et al. (2019) Abu-El-Haija, S., Perozzi, B., Kapoor, A., Harutyunyan, H., Alipourfard, N., Lerman, K., Ver Steeg, G., and Galstyan, A. MixHop: Higher-order graph convolution architectures via sparsified neighborhood mixing. In ICML, 2019.

6 Appendix


Hyper-para. Description
learning rate
weight decay
dropout rate
output dimension of weighted matrix
non-negative threshold in similarity matrix
number of anchor nodes
dimension of SC features
the rounds of aggregation
number of hidden units in GCN layer
random seed
, regularization parameters

Table 5: Hyper-parameter Description
Method Dataset Accuracy Hyper-parameters
Cora 89.3±1.1 :0.02, :5e-4, :0.5, :16, :0.55, :700, :75, :4, :32, = 42, :0.01, :0.1
Citeseer 77.2±2.5 :0.02, :5e-4, :0.4, :16, :0.05, :500, :30, :3, :32, = 42, :0.01, :0.1
Pubmed 89.7±1.2 :0.02, :5e-4, :0.5, :16, :0.25, :500, :25, :2, :64, = 42, :0, :0
Squirrel 37.9±3.1 :0.01, :5e-4, :0.4, :16, :0.5, :500, :15, :2, :32, = 42, :0.03, :0.3
Chameleon 59.5±3.2 :0.005, :5e-4, :0.6, :16, :0.5, :500, :25, :2, :32, = 42, :0.5, :0.3
Cornell 87.9±5.1 :0.01, :5e-4, :0.4, :50, :0.05, :100, :30, :1, :32, = 42, :0.1, :0.1
Texas 86.6±6.2 :0.005, :5e-4, :0.3, :40, :0.15, :100, :35, :1, :32, = 42, :0.2, :0.5
Wisconsin 87.8±4.9 :0.01, :5e-4, :0.3, :40, :0.1, :100, :25, :1, :32, = 42, :0.2, :0.1
Cora 89.1±1.3 :0.02, :5e-4, :0.4, :16, :0.55, :700, :75, :4, :32, = 42, :0.01, :0.1
Citeseer 77.0±2.1 :0.02, :5e-4, :0.4, :16, :0.05, :500, :30, :3, :32, = 42, :0.01, :0.1
Pubmed 89.7±1.2 :0.02, :5e-4, :0.5, :16, :0.25, :500, :25, :2, :64, = 42, :0, :0
Squirrel 37.6±2.2 :0.01, :5e-4, :0.4, :16, :0.5, :500, :15, :2, :32, = 42, :0.03, :0.3
Chameleon 57.5±2.5 :0.005, :5e-4, :0.6, :16, :0.5, :500, :25, :2, :32, = 42, :0.5, :0.3
Cornell 86.3±6.3 :0.01, :5e-4, :0.4, :50, :0.05, :100, :50, :1, :32, = 42, :0.1, :0.9
Texas 86.0±5.9 :0.005, :5e-4, :0.3, :40, :0.15, :100, :35, :1, :32, = 42, :0.2, :0.5
Wisconsin 86.0±4.9 :0.01, :5e-4, :0.3, :40, :0.1, :100, :25, :1, :32, = 42, :0.2, :0.1

Table 6: The hyper-parameter settings that achieve the best accuracy for SLGCNs on all datasets.