1 Introduction
Graph Neural Networks (GNNs) are powerful tools for representation learning on graphs, with a variety of applications ranging from knowledge graphs to financial networks (Gilmer et al., 2017; Scarselli et al., 2020; Bronstein et al., 2017). In recent years, many GNN architectures have been developed, e.g., the Graph Convolutional Network (GCN) (Kipf & Welling, 2017), Graph Attention Network (GAT) (Velickovic et al., 2018), Graph Isomorphism Network (GIN) (Xu et al., 2018), and GraphSAGE (Hamilton et al., 2017). In a GNN, each node iteratively updates its feature representation by aggregating the representations of the node itself and its neighbors (Kipf & Welling, 2017). The neighbors are usually defined as all adjacent nodes in the graph, and a diversity of aggregation functions can be adopted, e.g., summation, maximum, and mean (Pei et al., 2020; Corso et al., 2020).
GCNs are among the most attractive GNNs and have been widely applied in a variety of scenarios (Pei et al., 2020; Hamilton et al., 2017b; Bronstein et al., 2017). However, one fundamental weakness limits the representation ability of GCNs on graph-structured data: GCNs may not capture long-range dependencies in graphs, since they update the feature representations of nodes by simply summing the normalized feature representations of all one-hop neighbors (Velickovic et al., 2018; Pei et al., 2020). Furthermore, this weakness is magnified in graphs with heterophily or low/medium levels of homophily.
Homophily is an important principle of many real-world graphs, whereby linked nodes tend to have similar features and belong to the same class (Zhu et al., 2020). For instance, papers are more likely to cite papers from the same research area, and friends tend to have similar ages or political beliefs. However, there are also "opposites attract" settings in the real world, leading to graphs with low homophily, i.e., proximal nodes usually come from different classes and have dissimilar features (Zhu et al., 2020). For example, on dating websites most people tend to chat with people of the opposite gender, and in online gambling networks fraudsters prefer to contact accomplices rather than other fraudsters. Most existing GNNs, including GCNs, assume strong homophily; they therefore generalize poorly to graphs under high heterophily, performing even worse than an MLP (Hornik, 1991) that relies only on the node features for classification (Pandit et al., 2007; Zhu et al., 2020).
To address this weakness, the straightforward strategy is to use a multi-layer GCN to aggregate features from distant nodes, but this strategy may cause over-smoothing and over-fitting (Rong et al., 2020). Recently, several related approaches have been built to alleviate this curse, such as the Geometric Graph Convolutional Network (Geom-GCN) (Pei et al., 2020) and H2GCN (Zhu et al., 2020). Although Geom-GCN improves the representation learning of GCNs, its node classification performance is often unsatisfactory when the concerned datasets are graphs with low homophily (Pei et al., 2020). H2GCN improves the classification performance of GCN, but it can only aggregate information from nearby nodes and thus lacks the ability to capture features from distant but similar nodes. Hence H2GCN still leaves considerable room for improvement.
In this paper, we propose a novel GNN approach to solve the above problem, referred to as structure learning graph convolutional networks (SLGCNs). Following the spectral clustering (SC) method (Nie et al., 2011), the nodes of a graph are mapped into a new feature space in which nodes that are closely connected or possess similar features in the graph are usually proximal. Therefore, if SC is employed to process graph-structured data, nodes can aggregate features from similar nodes, enabling GCN to capture long-range dependencies. It should be noted, however, that the computational complexity of SC is prohibitively high for large-scale graphs; hence we design an efficient spectral clustering with anchors (ESC-ANCH) method to extract SC features efficiently. The extracted SC features are then combined with the original node features as enhanced features (EF), and the EF are utilized to train GNNs.
Following the research in Zhu et al. (2020), the nodes of the same class always possess highly similar features, no matter whether the homophily of the graph is high or low. Our approach learns a reconnected graph that is related to the similarities between nodes and optimized for the downstream prediction task; it then uses the original adjacency matrix and the reconnected adjacency matrix to obtain multiple intermediate representations corresponding to different rounds of aggregation. We combine several key intermediate representations as the new node embedding and use a learnable weight vector to highlight its important dimensions. The result of this calculation serves as the final node embedding for node classification.
Compared with other GNNs, the contributions of SLGCNs can be summarized as follows: 1) SC is integrated into GNNs to capture long-range dependencies on graphs, and an ESC-ANCH algorithm is proposed to implement SC efficiently on graph-structured data; 2) SLGCNs can learn a reconnected adjacency matrix that not only relates to node similarity but also benefits the downstream prediction task; 3) SLGCNs propose improvements for handling heterophily from the aspects of node features and edges, respectively, and combine the two improvements so that they supplement each other.
2 Related Work
Our work is directly related to SC and GCN. The theories of SC and GCN are reviewed in subsections 2.1 and 2.2, respectively.
2.1 Spectral Clustering
Spectral clustering (SC) is an advanced algorithm that evolved from graph theory and has attracted much attention (Nie et al., 2011). Compared with most traditional clustering methods, the implementation of SC is much simpler (Nie et al., 2011). Notably, SC partitions the dataset by means of a weighted graph. Assume that $X = \{x_1, \dots, x_n\}$ represents a dataset. The task of clustering is to segment $X$ into $c$ clusters. The cluster assignment matrix is denoted as $F = [f_1, \dots, f_n]^{\top} \in \mathbb{R}^{n \times c}$, where $f_i$ is the cluster assignment vector for the pattern $x_i$. From another perspective, $f_i$ can be considered as the feature representation of $x_i$ in the $c$-dimensional feature space. There are different types of SC; considering the overall performance of the algorithm, our work focuses on SC with the $k$-way normalized cut, whose related concepts are explained in Yu & Shi (2003).
Let $G = (V, S)$ be an undirected weighted graph, where $V$ denotes the set of nodes, $S \in \mathbb{R}^{n \times n}$ denotes the affinity matrix, and $n$ is the number of nodes in $G$. Note that $S$ is a symmetric matrix, and each element $S_{ij}$ represents the affinity of a pair of nodes in $G$. The most common way to build $S$ is the full-connection method. Following the description in Nie et al. (2011), $S_{ij}$ can be represented as:
$$S_{ij} = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right), \qquad (1)$$
where $x_i$ and $x_j$ represent the features of the $i$-th and $j$-th nodes in $G$, respectively, and $\sigma$ controls the degree of similarity between nodes. Then, the Laplacian matrix is defined as $L = D - S$, where the degree matrix $D$ is diagonal with diagonal elements $D_{ii} = \sum_{j} S_{ij}$.
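For concreteness, Eq. (1) and the Laplacian $L = D - S$ can be computed as follows. This is a minimal NumPy sketch with illustrative toy data, not code from the paper.

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    """Full-connection affinity of Eq. (1): S_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = np.sum(X ** 2, axis=1)
    # pairwise squared distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def laplacian(S):
    """Graph Laplacian L = D - S, with degree matrix D_ii = sum_j S_ij."""
    return np.diag(S.sum(axis=1)) - S

# toy data: two nearby points and one far-away point
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
S = gaussian_affinity(X, sigma=1.0)
L = laplacian(S)
```

A useful sanity check: the rows of any Laplacian built this way sum to zero, and nearby points receive higher affinity than distant ones.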
The objective function of SC with normalized cut is
$$\min_{F^{\top} D F = I} \ \mathrm{Tr}\big(F^{\top} L F\big), \qquad (2)$$
where $F$ is the clustering indicator matrix. With the substitution $\tilde{F} = D^{1/2} F$, the objective function can be rewritten as:
$$\max_{\tilde{F}^{\top} \tilde{F} = I} \ \mathrm{Tr}\big(\tilde{F}^{\top} D^{-1/2} S D^{-1/2} \tilde{F}\big). \qquad (3)$$
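Problem (3) is solved by taking the leading eigenvectors of the normalized affinity matrix. A minimal NumPy sketch (function name and toy affinity matrix are ours):

```python
import numpy as np

def sc_features(S, c):
    """Solve Eq. (3): return the eigenvectors of D^{-1/2} S D^{-1/2}
    corresponding to its c largest eigenvalues."""
    d = S.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    S_norm = d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(S_norm)  # eigenvalues returned in ascending order
    return vecs[:, -c:]                  # columns for the c largest eigenvalues

# two tight pairs of nodes -> a clear 2-cluster structure
S = np.array([[1.0, 0.9, 0.1, 0.1],
              [0.9, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.9],
              [0.1, 0.1, 0.9, 1.0]])
F = sc_features(S, 2)
```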
The optimal solution $\tilde{F}$ of objective function (3) is constructed from the eigenvectors corresponding to the $c$ largest eigenvalues of $D^{-1/2} S D^{-1/2}$. In general, $\tilde{F}$ can be considered not only as the clustering result of the nodes, but also as a new feature matrix of the nodes, in which each node has $c$ feature elements.
2.2 GCN
Following the work in Scarselli et al. (2020), the updates of node features in GCN adopt an isotropic averaging operation over the feature representations of one-hop neighbors. Let $h_i^{(l)}$ be the feature representation of node $i$ in the $l$-th GCN layer; then we have
$$h_i^{(l+1)} = \sigma\Big(\sum_{j \in \mathcal{N}(i)} \frac{1}{\sqrt{d_i d_j}}\, W^{(l)} h_j^{(l)}\Big), \qquad (4)$$
where $\mathcal{N}(i)$ represents the set of one-hop neighbors of node $i$, $W^{(l)}$ is the weight matrix, $\sigma(\cdot)$ is employed as the activation function, and $d_i$ and $d_j$ are the in-degrees of the $i$-th and $j$-th nodes, respectively. Furthermore, the forward model of a 2-layer GCN can be represented as:
$$Z = \mathrm{softmax}\big(\hat{A}\,\mathrm{ReLU}(\hat{A} X W^{(0)})\, W^{(1)}\big), \qquad (5)$$
where $X$ is the feature matrix of the nodes and also the input of the first GCN layer, and $A$ is the adjacency matrix. The adjacency matrix with self-loops is $\tilde{A} = A + I_n$, where $I_n$ is an identity matrix. The adjacency matrix with self-loops can be normalized as $\hat{A} = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$, and the normalized adjacency matrix $\hat{A}$ is employed for aggregating the representations of neighbors, where $\tilde{D}$ is the degree matrix of $\tilde{A}$. Each element of $\hat{A}$ is defined as:
$$\hat{A}_{ij} = \frac{\tilde{A}_{ij}}{\sqrt{\tilde{D}_{ii}\tilde{D}_{jj}}}. \qquad (6)$$
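Equations (5) and (6) can be sketched directly in NumPy; the random weights below are stand-ins for trained parameters.

```python
import numpy as np

def normalize_adj(A):
    """Eq. (6): A_hat = D~^{-1/2} (A + I) D~^{-1/2}, with self-loops added."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]

def gcn_forward(A, X, W0, W1):
    """Eq. (5): Z = softmax(A_hat ReLU(A_hat X W0) W1)."""
    A_hat = normalize_adj(A)
    H = np.maximum(A_hat @ X @ W0, 0.0)           # first layer + ReLU
    logits = A_hat @ H @ W1                        # second layer
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)        # row-wise softmax

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # 3-node path graph
Z = gcn_forward(A, np.eye(3), rng.normal(size=(3, 4)), rng.normal(size=(4, 2)))
```

Each row of `Z` is a probability distribution over the two classes, which is what the softmax in Eq. (5) produces.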
3 Proposed SLGCN Approach
In this section, we present a novel GNN, structure learning graph convolutional networks (SLGCNs), for node classification on graph-structured data. The pipeline of SLGCNs is shown in Figure 1. The remainder of this section is organized as follows: subsection 3.1 describes the proposed ESC-ANCH approach; subsection 3.2 gives the generation details of the reconnected adjacency matrix; subsection 3.3 describes the proposed SLGCNs in detail.
3.1 Efficient Spectral Clustering with Anchors
In graph-structured data, a large number of nodes of the same class possess similar features but are far apart from each other. However, GCN simply aggregates information from one-hop neighbors, and the depth of GCN is usually limited. Therefore, information from distant but similar nodes is always ignored, whereas SC can divide nodes according to the affinities between them: closely connected and similar nodes are more likely to be proximal in the new feature space, and vice versa. It is thus very appropriate to combine GCN with SC to extract the features of distant but similar nodes. Following subsection 2.1, the objective of performing SC is to generate the cluster assignment matrix $\tilde{F}$, which can only be calculated by eigenvalue decomposition of the normalized affinity matrix $D^{-1/2} S D^{-1/2}$; this takes $O(n^3)$ time in general ($O(n^2 c)$ for the top $c$ eigenvectors), where $n$ and $c$ are the numbers of nodes and clusters, respectively. For large-scale graphs, this computational complexity is an unbearable burden.
To overcome this problem, we propose efficient spectral clustering (ESC). In ESC, instead of calculating $S$ by equation (1), we employ the inner product to construct the affinity matrix, i.e., $S = X X^{\top}$. Thus, the normalized affinity matrix in the ESC method can be represented as:
$$\tilde{S} = D^{-1/2} X X^{\top} D^{-1/2}. \qquad (7)$$
Then, we define $B = D^{-1/2} X$, so that $\tilde{S} = B B^{\top}$. The singular value decomposition (SVD) of $B$ can be represented as follows:
$$B = U \Sigma V^{\top}, \qquad (8)$$
where $U$, $\Sigma$, and $V$ are the left singular vector matrix, the singular value matrix, and the right singular vector matrix, respectively. It is obvious that the column vectors of $U$ are the eigenvectors of $\tilde{S} = B B^{\top}$. Therefore, we can easily construct $\tilde{F}$ from the eigenvectors in $U$ corresponding to the $c$ largest singular values in $\Sigma$. The computational complexity of SVD performed on $B$ ($n \times d$) is much lower than that of directly performing eigenvalue decomposition on $\tilde{S}$ ($n \times n$). However, since the dimension $d$ of the nodes' original features is usually high in many graph-structured datasets, the efficiency of the ESC method still needs to be improved.
Therefore, we propose ESC-ANCH. Specifically, we first randomly select $m$ nodes from the set of nodes as anchor nodes, where $m \ll n$. Then, we calculate the node-anchor similarity matrix $Z \in \mathbb{R}^{n \times m}$, with cosine similarity as the chosen metric function. Here, $Z$ is employed as the new feature representation of the nodes. Thus, in ESC-ANCH, $B$ can be redefined as $B = D^{-1/2} Z$, where $D$ is the degree matrix of $Z Z^{\top}$. Therefore, ESC-ANCH takes roughly $O(nmd + nm^2)$ time, which is much lower than SC and ESC.
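The steps above can be sketched as follows. This is a minimal, assumed implementation (variable names and the small epsilon guards are ours): random anchors, cosine node-anchor similarities, then an SVD of $B = D^{-1/2} Z$ in place of the full eigendecomposition.

```python
import numpy as np

def esc_anch(X, m, c, seed=0):
    """ESC-ANCH sketch: SC features from an n x m node-anchor similarity
    matrix instead of an n x n affinity matrix (m << n)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    anchors = rng.choice(n, size=m, replace=False)
    # cosine similarity between every node and the m anchor nodes
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    Z = Xn @ Xn[anchors].T                        # (n, m) new node features
    # degrees of the implicit affinity S = Z Z^T, without forming it:
    # d_i = sum_j (Z Z^T)_ij = z_i . (sum_j z_j)
    d = Z @ Z.sum(axis=0)
    B = Z / np.sqrt(np.abs(d) + 1e-12)[:, None]   # B = D^{-1/2} Z (abs guards negative sums)
    U, s, Vt = np.linalg.svd(B, full_matrices=False)  # singular values descend
    return U[:, :c]                               # top-c eigenvectors of B B^T

X = np.random.default_rng(1).normal(size=(60, 8))
F = esc_anch(X, m=10, c=3)
```

The SVD acts on an $n \times m$ matrix rather than an $n \times n$ one, which is where the speedup over plain SC comes from.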
3.2 Reconnected Graph
Most existing GNNs are designed for graphs with high homophily, where linked nodes are more likely to possess similar feature representations and belong to the same class, e.g., community networks and citation networks (Newman, 2002).
However, a large number of real-world graphs exhibit heterophily, where linked nodes usually possess dissimilar features and belong to different classes, e.g., webpage linking networks (Ribeiro et al., 2017). Thus, GNNs designed under the assumption of homophily are highly inappropriate for graphs under heterophily.
Fortunately, regardless of the homophily level of a graph, nodes of the same class always possess highly similar features. Therefore, as shown in Figure 2, to help GCN capture information from nodes of the same class, we learn a reconnected adjacency matrix according to the similarities between nodes and the downstream task.
A good similarity metric function should be expressively powerful and learnable (Chen et al., 2020). Here, we adopt cosine similarity as our metric function. The similarity between a pair of nodes can be represented as:
$$s_{ij} = \cos\big(w \odot h_i,\; w \odot h_j\big), \qquad (9)$$
where $h_i$ and $h_j$ are the feature representations of nodes $i$ and $j$, and $w$ is a learnable weight vector. Afterwards, we can generate the similarity matrix $S_{rec}$, which is symmetric with elements ranging in $[-1, 1]$.
However, an adjacency matrix is supposed to be nonnegative and sparse, because a fully connected graph is computationally expensive and might introduce noise. Therefore, we need to extract a nonnegative and sparse adjacency matrix $A_{rec}$ from $S_{rec}$. Specifically, we define a nonnegative threshold $\epsilon$ and set those elements of $S_{rec}$ that are smaller than $\epsilon$ to zero.
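The reconnection step of Eq. (9) plus thresholding can be sketched as below; the learnable weight vector $w$ is held fixed here for illustration.

```python
import numpy as np

def reconnect(H, w, eps):
    """Build a nonnegative, sparse reconnected adjacency matrix:
    weighted cosine similarity (Eq. (9)), then zero out entries below eps."""
    Hw = H * w                                             # reweight feature dims by w
    Hn = Hw / (np.linalg.norm(Hw, axis=1, keepdims=True) + 1e-12)
    S = Hn @ Hn.T                                          # cosine similarities in [-1, 1]
    A_rec = np.where(S >= eps, S, 0.0)                     # threshold -> nonnegative, sparse
    np.fill_diagonal(A_rec, 0.0)                           # drop trivial self-similarities
    return A_rec

H = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])         # node 2 dissimilar to 0 and 1
A_rec = reconnect(H, w=np.ones(2), eps=0.5)
```

With this toy input, the two similar nodes stay connected while the dissimilar one is cut off, which is exactly the behavior the threshold is meant to enforce.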
3.3 Structure Learning Graph Convolutional Networks
We first utilize the original features $X$ and the SC features $\tilde{F}$ to construct the EF. The first layer of the proposed SLGCNs is represented as:
$$H^{(1)} = \sigma\Big(\tfrac{1}{2}\big(X W_X + \tilde{F} W_F\big)\Big), \qquad (10)$$
where $W_X$ and $W_F$ are trainable weight matrices of the first layer. Alternatively, we can use concatenation to combine $X$ and $\tilde{F}$:
$$H^{(1)} = \sigma\big(\big[X W_X \,\|\, \tilde{F} W_F\big]\big), \qquad (11)$$
where $\|$ represents concatenation. We refer to the EF obtained by averaging and the EF obtained by concatenation as the two EF variants.
After the first layer is constructed, we use the original graph to aggregate and update node features, obtaining intermediate representations:
$$H_{a}^{(k)} = \bar{A}\, H_{a}^{(k-1)}, \quad H_{a}^{(0)} = H^{(1)}, \qquad (12)$$
where $\bar{A}$ is the row-normalized $\tilde{A}$. Similarly, we use the reconnected adjacency matrix to aggregate and update features, obtaining the intermediate representations:
$$H_{r}^{(k)} = \bar{A}_{rec}\, H_{r}^{(k-1)}, \quad H_{r}^{(0)} = H^{(1)}, \qquad (13)$$
where $\bar{A}_{rec}$ is the row-normalized $A_{rec}$, and $k = 1, \dots, K$ indexes the rounds of feature aggregation. After several rounds of feature aggregation, we combine the most important intermediate representations as the new node embedding:
$$H_{comb} = \big[H^{(1)} \,\|\, H_{a}^{(1)} \,\|\, H_{r}^{(1)} \,\|\, \cdots \,\|\, H_{a}^{(K)} \,\|\, H_{r}^{(K)}\big]. \qquad (14)$$
For graphs with high homophily, $H^{(1)}$ and $H_{a}^{(k)}$ are sufficient for representing node embeddings, as evidenced by GCN (Kipf & Welling, 2017) and GAT (Velickovic et al., 2018), and $H_{r}^{(k)}$ can be treated as a supplement. For graphs under heterophily, $H^{(1)}$ and $H_{r}^{(k)}$ can also perform well at learning feature representations. To fully exploit the advantages of these intermediate representations, we combine them by concatenation, which keeps the adopted intermediate representations separated rather than mixed.
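The aggregation-and-concatenation scheme of Eqs. (12)-(14) can be sketched as follows (identity matrices stand in for the two normalized adjacency matrices, so the example stays self-contained):

```python
import numpy as np

def combine_representations(A_bar, A_rec_bar, H1, K):
    """Eqs. (12)-(14): K rounds of aggregation over the original and the
    reconnected graph, then concatenation of all kept intermediate
    representations (they stay separated, not mixed)."""
    reps = [H1]
    Ha, Hr = H1, H1
    for _ in range(K):
        Ha = A_bar @ Ha          # aggregation over the original graph, Eq. (12)
        Hr = A_rec_bar @ Hr      # aggregation over the reconnected graph, Eq. (13)
        reps += [Ha, Hr]
    return np.concatenate(reps, axis=1)   # Eq. (14)

n, dim, K = 4, 3, 2
H_comb = combine_representations(np.eye(n), np.eye(n), np.ones((n, dim)), K)
```

The concatenated embedding has width $(2K + 1)\,d$: one block for $H^{(1)}$ plus one block per graph per round.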
Afterwards, we generate a learnable weight vector $q$ with the same dimension as $H_{comb}$, and take the Hadamard product between $q$ and $H_{comb}$ as the final feature representations of the nodes:
$$H_{final} = q \odot H_{comb}. \qquad (15)$$
The purpose of this step is to highlight the important dimensions of $H_{comb}$. Afterwards, nodes are classified based on their final embeddings as follows:
$$Z = \mathrm{softmax}\big(H_{final}\, W_{o}\big), \qquad (16)$$
where $W_{o}$ is the trainable weight matrix of the last layer. The classification loss is as follows:
$$\mathcal{L}_{pred} = \ell\big(Z, Y\big), \qquad (17)$$
where $Y$ represents the labels of the nodes, and $\ell(\cdot, \cdot)$ is the cross-entropy function measuring the difference between the predictions $Z$ and the true labels $Y$.
Since reconnected graph learning gives SLGCNs a stronger ability to fit the downstream task, SLGCNs are more likely to suffer from over-fitting. Thus, we apply a regularization term to the learned graph. Because the reconnected graph is learned according to the similarity between nodes, its homophily is high. In addition to homophily, the connectivity and sparsity of $A_{rec}$ are also important (Chen et al., 2020). Thus, the regularization term is defined as follows:
$$\mathcal{L}_{reg} = -\frac{\alpha}{n}\, \mathbf{1}^{\top} \log\big(A_{rec}\mathbf{1}\big) + \frac{\beta}{n^2}\, \|A_{rec}\|_F^2, \qquad (18)$$
where $\alpha$ and $\beta$ are nonnegative hyperparameters. The first term encourages $A_{rec}$ to be connected via the logarithmic barrier, and the second term encourages $A_{rec}$ to be sparse (Chen et al., 2020). Ultimately, the loss of SLGCNs is as follows:
$$\mathcal{L} = \mathcal{L}_{pred} + \mathcal{L}_{reg}. \qquad (19)$$
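A sketch of the regularizer in Eq. (18); the $1/n$ and $1/n^2$ scalings follow the form used in Chen et al. (2020) and are an assumption here.

```python
import numpy as np

def graph_regularizer(A_rec, alpha, beta):
    """Eq. (18): log-barrier connectivity term plus Frobenius sparsity term."""
    n = A_rec.shape[0]
    deg = A_rec.sum(axis=1)                                # A_rec @ 1
    barrier = -alpha * np.log(deg + 1e-12).sum() / n       # keeps every node connected
    sparsity = beta * np.sum(A_rec ** 2) / (n * n)         # discourages dense graphs
    return barrier + sparsity

A = np.array([[0.0, 0.5], [0.5, 0.0]])
r = graph_regularizer(A, alpha=0.1, beta=0.1)
```

Note the trade-off: increasing $\beta$ pushes the learned graph toward sparsity, while the barrier term forbids rows of $A_{rec}$ from collapsing to zero.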
In this paper, the SLGCN that uses the averaged EF and the SLGCN that uses the concatenated EF are treated as two variants. A pseudocode of the proposed SLGCNs is given in Algorithm 1.
4 Experimental Results
Dataset  Cora  Citeseer  Pubmed  Squirrel  Chameleon  Cornell  Texas  Wisconsin
Hom. ratio  0.81  0.74  0.80  0.22  0.23  0.30  0.11  0.21
# Nodes  2708  3327  19717  5201  2277  183  183  251
# Edges  5429  4732  44338  198493  31421  295  309  499
# Features  1433  3703  500  2089  2325  1703  1703  1703
# Classes  7  6  3  5  5  5  5  5
Here, we validate the merits of SLGCNs by comparing them with some state-of-the-art GNNs on transductive node classification tasks over a variety of open graph datasets.
4.1 Datasets
In the simulations, we adopt three common citation networks, three subnetworks of the WebKB networks, and two Wikipedia networks to validate the proposed SLGCNs.
Citation networks, i.e., Cora, Citeseer, and Pubmed, are standard citation network benchmark datasets (Sen et al., 2008; Namata et al., 2012). In these datasets, nodes correspond to papers and edges correspond to citations. Node features are the bag-of-words representation of the paper, and the label of each node is the academic topic of the paper.
Subnetworks of the WebKB networks, i.e., Cornell, Texas, and Wisconsin, are collected from various universities' computer science departments (Pei et al., 2020). In these datasets, nodes correspond to webpages, and edges represent hyperlinks between webpages. Node features are the bag-of-words representation of the webpages, which are divided into 5 classes.
Wikipedia networks, i.e., Chameleon and Squirrel, are page-page networks on specific topics in Wikipedia. Nodes correspond to pages, and edges correspond to mutual links between pages. Node features represent informative nouns on the Wikipedia pages. The nodes are classified into five categories based on the amount of their average traffic.
For all datasets, we randomly split the nodes of each class into training, validation, and testing sets, and the experimental results are the mean and standard deviation over ten runs. Testing is performed when the validation loss achieves its minimum on each run. An overview of the characteristics of all datasets is shown in Table 1.
The homophily level of a graph is one of its most important characteristics and is significant for analyzing and employing graphs. Here, we utilize the edge homophily ratio $h = |\{(u, v) \in E : y_u = y_v\}| / |E|$ to describe the homophily level of a graph, where $E$ denotes the set of edges and $y_u$ and $y_v$ represent the labels of nodes $u$ and $v$, respectively. That is, $h$ is the fraction of edges in a graph that link nodes with the same class label (i.e., intra-class edges). This definition is proposed in Zhu et al. (2020). Obviously, graphs have strong homophily when $h$ is high ($h \to 1$), and strong heterophily or weak homophily when $h$ is low ($h \to 0$). The $h$ of each graph is listed in Table 1. From the homophily ratios of all adopted graphs, we find that all the citation networks are highly homophilous graphs, while all the WebKB and Wikipedia networks are graphs with low homophily.
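The edge homophily ratio $h$ can be computed directly; a small self-contained example:

```python
def edge_homophily(edges, labels):
    """h = fraction of edges whose endpoints share a class label (Zhu et al., 2020)."""
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)

# toy graph: 3 of the 4 edges are intra-class, so h = 0.75
edges = [(0, 1), (1, 2), (2, 3), (0, 2)]
labels = {0: "a", 1: "a", 2: "a", 3: "b"}
h = edge_homophily(edges, labels)
```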
4.2 Baselines
We compare the two proposed SLGCN variants with various baselines:
MLP (Hornik, 1991) is the simplest deep neural network. It makes predictions based only on the feature vectors of nodes, without considering any local or non-local information.
GCN (Kipf & Welling, 2017) is the most common GNN. As introduced in Section 2.2, GCN makes predictions solely by aggregating local information.
GAT (Velickovic et al., 2018) is one of the most common GNNs. GAT enables specifying different weights for different nodes in a neighborhood by employing the attention mechanism.
MixHop (Abu-El-Haija et al., 2019) can learn neighborhood relationships by repeatedly mixing feature representations of neighbors at various distances.
Geom-GCN (Pei et al., 2020) was recently proposed and can capture long-range dependencies. Geom-GCN employs three embedding methods, Isomap, struc2vec, and Poincaré embedding, which yield three variants: Geom-GCN-I, Geom-GCN-S, and Geom-GCN-P. In this paper, we report the best result among the three variants without specifying the employed embedding method.
H2GCN (Zhu et al., 2020) identifies a set of key designs: ego- and neighbor-embedding separation and higher-order neighborhoods. H2GCN adapts to both heterophily and homophily by effectively synthesizing these designs. We consider two variants: H2GCN-1, which uses one embedding round (K = 1), and H2GCN-2, which uses two rounds (K = 2).
4.3 Experimental Setup
For comparison, we use six state-of-the-art node classification algorithms, i.e., MLP, GCN, GAT, MixHop, Geom-GCN, and H2GCN, with all hyperparameters set according to Zhu et al. (2020). For the proposed SLGCNs, we perform a hyperparameter search on the validation set. The hyperparameters that need to be searched are described in Table 5 in the supplementary material, and Table 6 summarizes the hyperparameters and accuracies of the best-performing SLGCNs on the different datasets. We utilize ReLU as the activation function. All models are trained to minimize cross-entropy on the training nodes, using the Adam optimizer (Kingma & Ba, 2017).
Method  Uses A  Uses A_rec  Cora  Citeseer  Pubmed  Squirrel  Chamele.  Cornell  Texas  Wiscons.
SLGCN  –  –  74.7±1.9  72.1±1.8  87.1±1.4  29.4±2.9  49.4±2.9  82.6±5.5  82.3±3.9  86.5±2.1
SLGCN  ✓  –  89.6±0.4  77.3±2.2  88.5±1.1  37.1±3.2  57.5±2.8  76.6±8.6  81.2±6.2  81.8±5.3
SLGCN  ✓  ✓  88.8±1.3  77.2±2.5  89.7±1.2  37.5±4.1  59.4±2.2  87.4±4.1  86.5±4.9  86.7±4.8
SLGCN  –  –  75.2±2.7  71.7±2.7  86.9±0.6  30.2±1.4  48.7±3.1  82.4±5.9  82.1±4.8  86.2±2.2
SLGCN  ✓  –  88.7±0.6  77.0±1.5  89.9±0.8  35.6±3.0  55.9±3.7  76.8±5.9  81.9±3.2  84.0±6.0
SLGCN  ✓  ✓  89.1±1.3  76.9±1.1  89.5±0.8  35.1±1.2  57.2±2.3  85.4±6.3  85.9±3.9  87.1±4.2
4.4 Does Reconnected Adjacency Matrix Work?
In this experiment, we first examine the effectiveness of the structure learning of SLGCNs. Figure 3 shows the homophily ratios of the original graph, the initialized graph, and the reconnected graph for the WebKB and Wikipedia networks. We can observe that the homophily ratios of the reconnected graphs are much higher than those of the original and randomly initialized graphs for each network. This is because the reconnected graphs are constructed in terms of the similarities between nodes, and the weight parameters involved in similarity learning are optimized during training of the model. Therefore, the proposed SLGCNs can aggregate features from nodes of the same class by exploiting the reconnected adjacency matrix.
Afterwards, we explore the impact of the learned reconnected graph on the accuracy of the proposed SLGCNs via an ablation study. Table 2 exhibits the accuracies of SLGCNs on all adopted networks. We can see that the original adjacency matrix is very important for SLGCN on the citation networks. However, its impact is very limited, and even harmful, on the WebKB and Wikipedia networks. This is because the citation networks are graphs with high levels of homophily, whereas the WebKB and Wikipedia networks have low homophily ratios; it is difficult for SLGCNs to aggregate useful information using only the original adjacency matrix in graphs with low homophily. By contrast, the introduction of the reconnected adjacency matrix is helpful for SLGCN on the WebKB and Wikipedia networks, and does not hurt performance on the citation networks. This is owing to the reconnected graph being learned according to node similarity and the downstream task. Therefore, the learned reconnected graph can adapt to all levels of homophily.
4.5 Effect of SC Feature on Accuracy
Method (running time, s)  Cora  Citeseer  Pubmed  Squirrel  Chameleon  Cornell  Texas  Wisconsin
SC  58.1  63.2  OM  98.5  29.3  1.7  1.7  1.8
ESC  1.7  4.6  2.0  2.5  2.2  1.8  1.8  1.9
ESC-ANCH  1.2  0.98  1.4  1.1  0.96  0.96  0.83  0.89
In this experiment, we explore the impact of the SC features extracted by the proposed ESC-ANCH method on the classification accuracy of SLGCN, again via an ablation study. The classification accuracies of the two SLGCN variants and of SLGCN without SC features are shown in Figure 4, where all graph datasets are adopted. As can be seen, both SLGCN variants achieve better performance than SLGCN without SC features. This is owing to the SC features reflecting not only the ego-embedding of a node but also that of similar nodes.
As a further insight, we focus on the running times of the original SC, the proposed ESC, and ESC-ANCH on all adopted graph datasets. From Table 3, we can see that ESC and ESC-ANCH are much more efficient than SC. Meanwhile, ESC-ANCH is faster than ESC due to the introduction of anchor nodes. Specifically, ESC-ANCH takes only 1.2 s on the Cora dataset, 48 times faster than the original SC method. Moreover, ESC-ANCH takes only 1.4 s on the Pubmed dataset, on which the original SC cannot run at all due to an out-of-memory (OM) error. On the Squirrel network, ESC-ANCH takes 1.1 s, 90 times faster than the original SC method, and on the WebKB networks ESC-ANCH takes about half as long as the original SC. In conclusion, ESC-ANCH is a very efficient SC method.
4.6 Comparison Among Different GNNs
In Figure 5, we evaluate 3-layer, 4-layer, and 5-layer GCNs and the two proposed SLGCN variants on the Cora dataset. As can be seen, the performance of multi-layer GCNs worsens as the depth increases, which may be caused by over-fitting and over-smoothing. For both SLGCN variants, the hyperparameter controlling the rounds of aggregation is set so that SLGCNs aggregate the same neighborhood features as a 5-layer GCN. Nevertheless, SLGCNs perform well and do not suffer from over-fitting. Thus, we conclude that the increase of GCN layers leads to over-fitting, whereas SLGCNs do not need to increase the network depth to aggregate more node features and remain immune to over-fitting no matter how many node features are aggregated.
Table 4 gives the classification results per benchmark. We observe that most GNN models achieve satisfactory results on the citation networks. However, some GNN models, i.e., GCN, GAT, Geom-GCN, and MixHop, perform even worse than MLP on the WebKB and Wikipedia networks. The main reason for this phenomenon is that these models aggregate useless information from neighborhoods and do not separate the ego-embedding from the useless neighbor-embedding. By contrast, no matter which graph is adopted, H2GCN and SLGCNs achieve good results, and the results of SLGCNs are relatively better than those of H2GCN. This is thanks to the reconnected graph of SLGCNs; in addition, the introduction of SC features improves the ego-embedding of nodes by clustering similar nodes together.
Method  Cora  Citeseer  Pubmed  Squirrel  Chameleon  Cornell  Texas  Wisconsin
MLP  74.8±2.2  72.4±2.2  86.7±0.4  29.7±1.8  46.4±2.5  81.1±6.4  81.9±4.8  85.3±3.6
GCN  87.3±1.3  76.7±1.6  87.4±0.7  36.9±1.3  59.0±4.7  57.0±4.7  59.5±5.2  59.8±7.0
GAT  82.7±1.8  75.5±1.7  84.7±0.4  30.6±2.1  54.7±2.0  58.9±3.3  58.4±4.5  55.3±8.7
Geom-GCN*  85.3  78.0  90.1  38.1  60.9  60.8  67.6  64.1
MixHop  87.6±0.9  76.3±1.3  85.3±0.6  43.8±1.5  60.5±2.5  73.5±6.3  77.8±7.7  75.9±4.9
H2GCN-1  86.9±1.4  77.1±1.6  89.4±0.3  36.4±1.9  57.1±1.6  82.2±4.8  84.9±6.8  86.7±4.7
H2GCN-2  87.8±1.4  76.9±1.7  89.6±0.3  37.9±2.0  59.4±2.0  82.2±6.0  82.2±5.3  85.9±4.2
SLGCN  89.3±1.3  77.2±2.5  89.7±1.2  37.9±3.1  59.5±3.2  87.9±5.1  86.6±6.2  87.8±4.8
SLGCN  89.1±1.1  77.0±2.1  89.5±0.8  37.6±2.2  57.9±2.5  86.3±6.3  86.0±5.9  86.0±4.2
5 Conclusion
In this paper, we propose an effective GNN approach, referred to as SLGCNs, with three main contributions compared with other GCNs: 1) SC is integrated into GNNs to capture long-range dependencies on graphs, and an ESC-ANCH algorithm is proposed to deal with graph-structured data efficiently; 2) the proposed SLGCNs can learn a reconnected adjacency matrix to improve SLGCNs from the aspect of edges; 3) SLGCNs are appropriate for all levels of homophily by combining multiple node embeddings. Experimental results illustrate that the proposed SLGCNs are superior to existing counterparts. In the future, we plan to explore effective models for more challenging scenarios where both the graph adjacency matrix and the node features are noisy.
References
 Kipf & Welling (2017) Kipf, T. N., and Welling, M. Semi-supervised classification with graph convolutional networks. In ICLR, 2017.
 Velickovic et al. (2018) Velickovic, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., and Bengio, Y. Graph attention networks. In ICLR, 2018.
 Xu et al. (2018) Xu, K., Hu, W., Leskovec, J., and Jegelka, S. How powerful are graph neural networks? In ICLR, 2019.
 Hamilton et al. (2017) Hamilton, W., Ying, Z., and Leskovec, J. Inductive representation learning on large graphs. In NIPS, 2017.
 Gilmer et al. (2017) Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., and Dahl, G. E. Neural message passing for quantum chemistry. In ICML, 2017.
 Pei et al. (2020) Pei, H., Wei, B., Chang, C., et al. Geom-GCN: Geometric graph convolutional networks. In ICLR, 2020.
 Corso et al. (2020) Corso, G., Cavalleri, L., and Beaini, D. Principal neighbourhood aggregation for graph nets. arXiv preprint arXiv:2004.05718, 2020.
 Hamilton et al. (2017b) Hamilton, W., Ying, Z., and Leskovec, J. Representation learning on graphs: Methods and applications. IEEE Data Engineering Bulletin, 2017b.
 Bronstein et al. (2017) Bronstein, M. M., Bruna, J., LeCun, Y., Szlam, A., and Vandergheynst, P. Geometric deep learning: Going beyond Euclidean data. IEEE Signal Processing Magazine, 2017.
 Rong et al. (2020) Rong, Y., Huang, W., Xu, T., and Huang, J. DropEdge: Towards deep graph convolutional networks on node classification. In ICLR, 2020.
 Li et al. (2018) Li, R., Wang, S., Zhu, F., and Huang, J. Adaptive graph convolutional neural networks. In AAAI, 2018.
 Klicpera et al. (2019) Klicpera, J., Bojchevski, A., and Günnemann, S. Predict then propagate: Graph neural networks meet personalized PageRank. In ICLR, 2019.
 Nie et al. (2011) Nie, F., Zeng, Z., Tsang, I. W., Xu, D., and Zhang, C. Spectral embedded clustering: A framework for in-sample and out-of-sample spectral clustering. IEEE Transactions on Neural Networks, 22(11), 2011.
 Scarselli et al. (2020) Dwivedi, V. P., Joshi, C. K., Laurent, T., Bengio, Y., and Bresson, X. Benchmarking graph neural networks. arXiv:2003.00982v3, 2020.
 Kingma & Ba (2017) Kingma, D. P., and Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
 Hornik (1991) Hornik, K. Approximation capabilities of multilayer feedforward networks. Neural Networks, 1991.
 Yang et al. (2019) Yang, L., Chen, Z., Gu, J., and Guo, Y. Dual self-paced graph convolutional network: Towards reducing attribute distortions induced by topology. In IJCAI, 2019.
 Xu et al. (2018a) Xu, K., Li, C., Tian, Y., Sonobe, T., Kawarabayashi, K., and Jegelka, S. Representation learning on graphs with jumping knowledge networks. In ICML, 2018a.
 Newman (2002) Newman, M. Assortative mixing in networks. Physical Review Letters, 2002.
 Ribeiro et al. (2017) Ribeiro, L. F. R., Saverese, P. H. P., and Figueiredo, D. R. struc2vec: Learning node representations from structural identity. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 385-394, 2017.
 Zhu et al. (2020) Zhu, J., Yan, Y., Zhao, L., Heimann, M., Akoglu, L., and Koutra, D. Beyond homophily in graph neural networks: Current limitations and effective designs. In NeurIPS, 2020.
 Pandit et al. (2007) Pandit, S., Chau, D. H., Wang, S., and Faloutsos, C. NetProbe: A fast and scalable system for fraud detection in online auction networks. In Proceedings of the 16th International Conference on World Wide Web, pp. 201-210, 2007.
 Sen et al. (2008) Sen, P., Namata, G., Bilgic, M., Getoor, L., Gallagher, B., and Eliassi-Rad, T. Collective classification in network data. AI Magazine, 29(3), 2008.
 Namata et al. (2012) Namata, G., London, B., Getoor, L., and Huang, B. Query-driven active surveying for collective classification. In 10th International Workshop on Mining and Learning with Graphs, 2012.
 Huang et al. (2018) Huang, W., Zhang, T., Rong, Y., and Huang, J. Adaptive sampling towards fast graph representation learning. In NeurIPS, pp. 4558-4567, 2018.
 Chen et al. (2018) Chen, J., Ma, T., and Cao, X. FastGCN: Fast learning with graph convolutional networks via importance sampling. In ICLR, 2018.
 Yu & Shi (2003) Yu, S. X., and Shi, J. Multiclass spectral clustering. In Proc. Int. Conf. Comput. Vis., pp. 313-319, 2003.
 Chen et al. (2020) Chen, Y., Wu, L., and Zaki, M. J. Iterative deep graph learning for graph neural networks: Better and robust node embeddings. In NeurIPS, 2020.
 Liu et al. (2020) Liu, M., Wang, Z., and Ji, S. Non-local graph neural networks. arXiv preprint arXiv:2005.14612, 2020.
 Abu-El-Haija et al. (2019) Abu-El-Haija, S., Perozzi, B., Kapoor, A., Harutyunyan, H., Alipourfard, N., Lerman, K., Ver Steeg, G., and Galstyan, A. MixHop: Higher-order graph convolution architectures via sparsified neighborhood mixing. In ICML, 2019.
6 Appendix
MORE DETAILS IN EXPERIMENTS
Hyperparameter  Description

learning rate
weight decay
dropout rate
output dimension of the weight matrix
nonnegative threshold ε in the similarity matrix
number of anchor nodes m
dimension of SC features
rounds of aggregation K
number of hidden units in the GCN layer
random seed
α, β  regularization parameters

Method  Dataset  Accuracy  Hyperparameters (in the order of Table 5: learning rate, weight decay, dropout, output dim., ε, m, SC dim., K, hidden units, seed, α, β)

SLGCN  Cora  89.3±1.1  0.02, 5e-4, 0.5, 16, 0.55, 700, 75, 4, 32, 42, 0.01, 0.1
SLGCN  Citeseer  77.2±2.5  0.02, 5e-4, 0.4, 16, 0.05, 500, 30, 3, 32, 42, 0.01, 0.1
SLGCN  Pubmed  89.7±1.2  0.02, 5e-4, 0.5, 16, 0.25, 500, 25, 2, 64, 42, 0, 0
SLGCN  Squirrel  37.9±3.1  0.01, 5e-4, 0.4, 16, 0.5, 500, 15, 2, 32, 42, 0.03, 0.3
SLGCN  Chameleon  59.5±3.2  0.005, 5e-4, 0.6, 16, 0.5, 500, 25, 2, 32, 42, 0.5, 0.3
SLGCN  Cornell  87.9±5.1  0.01, 5e-4, 0.4, 50, 0.05, 100, 30, 1, 32, 42, 0.1, 0.1
SLGCN  Texas  86.6±6.2  0.005, 5e-4, 0.3, 40, 0.15, 100, 35, 1, 32, 42, 0.2, 0.5
SLGCN  Wisconsin  87.8±4.9  0.01, 5e-4, 0.3, 40, 0.1, 100, 25, 1, 32, 42, 0.2, 0.1
SLGCN  Cora  89.1±1.3  0.02, 5e-4, 0.4, 16, 0.55, 700, 75, 4, 32, 42, 0.01, 0.1
SLGCN  Citeseer  77.0±2.1  0.02, 5e-4, 0.4, 16, 0.05, 500, 30, 3, 32, 42, 0.01, 0.1
SLGCN  Pubmed  89.7±1.2  0.02, 5e-4, 0.5, 16, 0.25, 500, 25, 2, 64, 42, 0, 0
SLGCN  Squirrel  37.6±2.2  0.01, 5e-4, 0.4, 16, 0.5, 500, 15, 2, 32, 42, 0.03, 0.3
SLGCN  Chameleon  57.5±2.5  0.005, 5e-4, 0.6, 16, 0.5, 500, 25, 2, 32, 42, 0.5, 0.3
SLGCN  Cornell  86.3±6.3  0.01, 5e-4, 0.4, 50, 0.05, 100, 50, 1, 32, 42, 0.1, 0.9
SLGCN  Texas  86.0±5.9  0.005, 5e-4, 0.3, 40, 0.15, 100, 35, 1, 32, 42, 0.2, 0.5
SLGCN  Wisconsin  86.0±4.9  0.01, 5e-4, 0.3, 40, 0.1, 100, 25, 1, 32, 42, 0.2, 0.1
