I Introduction
Graph neural networks (GNNs) have emerged as a powerful technique for the representation learning of graphs [57, 59]. Current GNN models usually follow a neighborhood aggregation scheme, where the feature representation of a node is updated by recursively aggregating the representation vectors of its neighbors
[31, 52]. Benefiting from this promising ability to model graph-structured data, traditional learning problems can be revisited from the perspective of graph representation learning, ranging from recommendation [13] and computer vision [36, 5, 11] to combinatorial optimization [16, 50].

In recent years, increasing attention has been paid to generalizing convolution to the graph domain. Existing graph convolutional networks (GCNs) are often broadly categorized into spectral approaches and spatial approaches [59, 52]. Spectral models commonly use spectral graph theory to design spectral filters, such as ChebyNet [9] and the vanilla GCN [27], while spatial models define graph convolutions based on a node's spatial relations, e.g., GraphSAGE [19, 10] and Graph Attention Networks (GAT) [46]. Owing to their good theoretical guarantees, spectral models have been extensively studied by the recent mainstream. Despite their success, most state-of-the-art (SOTA) GCN models remain shallow and simple because of the over-smoothing problem [31, 35]. To circumvent this problem, many efforts have been devoted to developing deeper and more robust models. Proven techniques include residual connections [35], randomly dropping nodes [14] or edges [43], and data augmentation [58], etc.

More recent attempts at simplifying GCNs have collectively suggested that a GCN model can be decoupled into two successive stages: parameter-free neighborhood propagation and task-specific prediction. Following this scheme, Wu et al. [51] initially proposed Simple Graph Convolution (SGC)
by simplifying the vanilla GCN into a low-pass filter followed by a linear classifier, while Approximate Personalized Propagation of Neural Predictions (APPNP) [28] swaps the two stages and establishes a connection between GCN and the well-known PageRank [40]. These successful attempts signify several helpful tendencies. First, simplified GCNs have a capability similar to elaborate ones in handling structural data. Second, a linear GCN is as powerful as its nonlinear counterpart in most graph scenarios. Nonetheless, few studies have managed to develop a unified framework between traditional linear models and GCNs. Moreover, gradient-descent-based training of simplified GCNs often suffers from local optima, tedious hyperparameter tuning, and training tricks. Hence, a GCN would become even simpler if its globally optimal solution could be provided in closed form.

Earlier linear models, such as ridge regression [2] and subspace clustering [12], have been frequently applied in practice owing to their simplicity, efficiency, and efficacy. However, these classic models essentially work in the Euclidean domain, leading to insufficient handling of graph-structured data. Before GCNs emerged, graph Laplacian regularization (or manifold regularization) [3, 6] had been widely applied in various linear models to incorporate structural information. This inspired a series of classic graph-regularized semi-supervised approaches, e.g., Laplacian Regularized Least Squares (LapRLS) [3], and graph-regularized unsupervised approaches, e.g., Laplacian Regularized Low-Rank Representation [55]. Despite being a useful technique, Laplacian regularization has three shortcomings: 1) it typically depends on the Euclidean domain and is thus hard to generalize directly to real-world graph data; 2) it merely considers first-order neighborhoods while ignoring important long-range interdependencies; 3) its additional regularization coefficient needs to be appropriately adjusted.
In this paper, we propose a very simple and unified framework, referred to as Fully Linear Graph Convolution (FLGC), for both semi-supervised learning (SSL) and unsupervised learning (USL). Our goal is to further simplify GCNs and generalize them to existing linear models, finally enabling traditional models to explore graphs directly. Specifically, we linearize GCN and then decouple it into neighborhood propagation and prediction stages, resulting in a flexible framework. This simplification makes it possible to efficiently calculate a globally optimal solution during training and easy to incorporate various previous linear models. On the basis of the resulting framework, we further present a semi-supervised FLGC for node classification problems and an unsupervised FLGC for subspace clustering problems. To prevent the over-smoothing issue, we introduce an initial residual in neighborhood propagation to capture long-range dependencies across a graph.
To sum up, the main contributions of this paper are as follows:

We present a simple yet effective FLGC framework to deal with graph data and regular data. The framework consists of parameter-free neighborhood propagation and a task-specific linear model with a closed-form solution. It not only simplifies the training of existing GCNs but also makes them easier to implement and apply in practice.

We implement two FLGCs for semi-supervised node classification and unsupervised clustering tasks, respectively. Also, we show that the semi-supervised and unsupervised FLGCs act as generalizations of classic ridge regression and subspace clustering in the non-Euclidean domain. Such generalization enables traditional linear models to explore graph structure without losing their original simplicity and efficacy.

We extend the personalized propagation scheme to balance the contributions of structure and node features, which endows FLGC with the capability of capturing long-range neighborhoods, thereby reducing the risk of over-smoothing.

We empirically show that FLGCs consistently outperform a large number of previous SOTA approaches on both semi-supervised learning and clustering tasks across real-world graph data and regular grid data. Such superiority further offers a promising scheme to revisit traditional linear models in a purely non-Euclidean domain.
The rest of the paper is structured as follows. In Section II, we briefly review related concepts of recent GCN models and subspace clustering. Section III systematically introduces the motivation, framework, formulation, and implementation of the proposed FLGCs. Extensive qualitative and quantitative evaluations and comparisons are presented in Sections IV and V. Conclusions and future work are given in Section VI.
II Notation and Concepts
II-A Notations
In this paper, boldface lowercase italic symbols (e.g., $\mathbf{x}$), boldface uppercase roman symbols (e.g., $\mathbf{X}$), regular italic symbols (e.g., $x$), and calligraphic symbols (e.g., $\mathcal{V}$) denote vectors, matrices, scalars, and sets, respectively. A graph is represented as $\mathcal{G}=(\mathcal{V},\mathcal{E})$, where $\mathcal{V}$ denotes the node set with $|\mathcal{V}|=N$, $\mathcal{E}$ indicates the edge set, and $\mathbf{X}\in\mathbb{R}^{N\times D}$ signifies the $D$-dimensional node feature matrix of $\mathcal{G}$, while the corresponding one-hot class target matrix is $\mathbf{Y}\in\mathbb{R}^{N\times C}$. We define $\mathbf{A}\in\mathbb{R}^{N\times N}$ as the adjacency matrix of $\mathcal{G}$ and $\mathbf{D}$ as the diagonal degree matrix with $D_{ii}=\sum_{j}A_{ij}$. The graph Laplacian matrix is defined as $\mathbf{L}=\mathbf{D}-\mathbf{A}$, and its normalized version is given by $\mathbf{L}_{\mathrm{sym}}=\mathbf{I}_N-\mathbf{D}^{-1/2}\mathbf{A}\mathbf{D}^{-1/2}$, which has an eigendecomposition $\mathbf{L}_{\mathrm{sym}}=\mathbf{U}\mathbf{\Lambda}\mathbf{U}^{\top}$. Here, $\mathbf{\Lambda}$ is a diagonal matrix of the eigenvalues of $\mathbf{L}_{\mathrm{sym}}$, and $\mathbf{U}$ is a unitary matrix consisting of its eigenvectors. Besides, $\mathbf{X}^{\top}$ denotes the transpose of matrix $\mathbf{X}$, and $\mathbf{I}_N$ denotes an identity matrix of size $N$. The trace operation and the Frobenius norm of a matrix are denoted as $\mathrm{tr}(\cdot)$ and $\|\cdot\|_F$, respectively.

II-B Vanilla GCN
We refer to the GCN model proposed by Kipf et al. [27] as the vanilla GCN because of its great success and numerous followers. The vanilla GCN suggests that the graph convolution operation can be approximated by a $K$-th order polynomial of Laplacians, i.e.,

$g_{\theta} \star \mathbf{x} \approx \mathbf{U} \Big( \sum_{k=0}^{K} \theta_k \mathbf{\Lambda}^{k} \Big) \mathbf{U}^{\top} \mathbf{x} = \sum_{k=0}^{K} \theta_k \mathbf{L}_{\mathrm{sym}}^{k} \mathbf{x}, \qquad (1)$

where $g_{\theta}$ is a graph filter parameterized by $\theta$ in the Fourier domain and $\theta_k$ denotes a polynomial coefficient. The vanilla GCN [27] adopts two crucial strategies to simplify and enhance Eq. (1). First, it uses a first-order polynomial with the settings $K=1$ and $\theta = \theta_0 = -\theta_1$ to approximate Eq. (1), resulting in a simplified convolution operation, i.e., $g_{\theta} \star \mathbf{x} \approx \theta (\mathbf{I}_N + \mathbf{D}^{-1/2}\mathbf{A}\mathbf{D}^{-1/2}) \mathbf{x}$. Second, it introduces a renormalization trick to guarantee its stability. Specifically, the trick can be expressed as
$\hat{\mathbf{A}} = \tilde{\mathbf{D}}^{-1/2} \tilde{\mathbf{A}} \tilde{\mathbf{D}}^{-1/2}, \quad \tilde{\mathbf{A}} = \mathbf{A} + \mathbf{I}_N, \quad \tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}. \qquad (2)$

We call $\hat{\mathbf{A}}$ the propagation matrix. As a result, in analogy to convolutional neural networks (CNNs) [18, 4], a general layer-wise graph convolution propagation rule can be defined by

$\mathbf{H}^{(l+1)} = \sigma\big( \hat{\mathbf{A}} \mathbf{H}^{(l)} \mathbf{W}^{(l)} \big). \qquad (3)$

Here, $\mathbf{H}^{(l)}$ is the $l$-th layer's graph embedding ($\mathbf{H}^{(0)} = \mathbf{X}$) and $\mathbf{W}^{(l)}$ is a trainable parameter matrix. However, many works [59, 31, 51, 28, 52, 35] have demonstrated that stacking multiple GCN layers leads to the over-smoothing effect, that is, all vertices converge to the same value. Thus, the vanilla GCN usually adopts a shallow architecture, e.g., two GCN layers.
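For concreteness, the renormalization trick of Eq. (2) and the layer-wise rule of Eq. (3) can be sketched in a few lines of NumPy (a minimal dense-matrix sketch; the function names `gcn_norm` and `gcn_layer` are ours, and a real implementation would use sparse matrices):

```python
import numpy as np

def gcn_norm(A):
    """Renormalization trick of Eq. (2): add self-loops, then symmetrically
    normalize, A_hat = D~^{-1/2} (A + I) D~^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A_hat, H, W):
    """One propagation step of Eq. (3) with sigma = ReLU."""
    return np.maximum(A_hat @ H @ W, 0.0)
```

The broadcasting in `gcn_norm` multiplies row `i` and column `j` of `A_tilde` by the corresponding inverse square-root degrees, which is exactly the left and right multiplication by the diagonal matrix in Eq. (2).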
II-C SGC
SGC [51] removes the nonlinear activations in the vanilla GCN and collapses all trainable weights into a single matrix. This makes it possible to collapse the repeated multiplications with the normalized adjacency matrix into its $K$-th power, i.e.,

$\hat{\mathbf{Y}} = \mathrm{softmax}\big( \hat{\mathbf{A}}^{K} \mathbf{X} \mathbf{W} \big). \qquad (4)$

Furthermore, SGC can be regarded as a fixed feature extraction/smoothing component $\bar{\mathbf{X}} = \hat{\mathbf{A}}^{K} \mathbf{X}$ followed by a linear logistic regression classifier $\hat{\mathbf{Y}} = \mathrm{softmax}(\bar{\mathbf{X}} \mathbf{W})$. In [51], Wu et al. suggested that SGC acts as a low-pass filter and that such a simplified model performs comparably to many SOTA models.

II-D APPNP
Personalized Propagation of Neural Predictions (PPNP) and its fast approximation, APPNP [28], consider the relationship between GCN and PageRank to derive an improved propagation scheme based on personalized PageRank. Let $\mathbf{H} = f_{\theta}(\mathbf{X})$ be a multilayer perceptron (MLP) parameterized by $\theta$. Then PPNP is defined as

$\mathbf{Z} = \alpha \big( \mathbf{I}_N - (1-\alpha) \hat{\mathbf{A}} \big)^{-1} \mathbf{H}, \qquad (5)$

where $\alpha \in (0, 1]$ is the teleport (or restart) probability of the topic-sensitive PageRank. Similar to SGC, PPNP separates the neural network used for generating predictions from the propagation scheme, while APPNP further approximates topic-sensitive PageRank via power iteration, i.e.,

$\mathbf{Z}^{(k+1)} = (1-\alpha) \hat{\mathbf{A}} \mathbf{Z}^{(k)} + \alpha \mathbf{H}, \quad \mathbf{Z}^{(0)} = \mathbf{H}. \qquad (6)$
One of the great advantages of PPNP and APPNP is that they decouple feature transformation and propagation procedures of the vanilla GCN without increasing the number of trainable parameters.
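The power iteration of Eq. (6) is straightforward to implement; the sketch below (our own minimal NumPy version, with the MLP output replaced by a precomputed prediction matrix `H`) also makes explicit that the loop converges to the exact PPNP solution of Eq. (5):

```python
import numpy as np

def appnp_propagate(A_hat, H, alpha=0.1, K=10):
    """APPNP power iteration, Eq. (6):
    Z^(0) = H;  Z^(k+1) = (1 - alpha) * A_hat @ Z^(k) + alpha * H."""
    Z = H.copy()
    for _ in range(K):
        Z = (1.0 - alpha) * (A_hat @ Z) + alpha * H
    return Z
```

As K grows, the iterate approaches the closed form of Eq. (5), since the geometric series in $(1-\alpha)\hat{\mathbf{A}}$ converges when the spectral radius of $\hat{\mathbf{A}}$ is at most one.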
II-E Classic Linear Models
We broadly divide classic linear models into supervised and unsupervised methods. Similar to SGC, a typical supervised linear model can be treated as a fully linearized MLP, given by

$\min_{\mathbf{W}} \ \|\mathbf{X}\mathbf{W} - \mathbf{Y}\|_F^2 + \lambda \|\mathbf{W}\|_F^2. \qquad (7)$

Here, $\mathbf{X}$ and $\mathbf{Y}$ denote the training samples and the corresponding target matrix, respectively. Such a model is also known as a ridge regression classifier [2, 20]. Besides, logistic regression and softmax regression are its two most frequently used variants in deep neural networks [17, 57, 29, 18, 7].

The unsupervised fashion of a linear model often follows a common assumption, i.e., data points lie in a union of linear subspaces, and the subspace representation coefficients can be obtained by solving the following linear self-expressive model:

$\min_{\mathbf{Z}} \ \|\mathbf{X} - \mathbf{X}\mathbf{Z}\|_F^2 + \lambda \Omega(\mathbf{Z}), \quad \mathrm{s.t.} \ \mathrm{diag}(\mathbf{Z}) = \mathbf{0}. \qquad (8)$

Notably, the main difference between Eq. (7) and Eq. (8) is that the former considers combinations of features, while the latter considers combinations of samples. To achieve an effective solution, various norm constraints $\Omega(\mathbf{Z})$ are often imposed on $\mathbf{Z}$: Sparse Subspace Clustering (SSC) [12] utilizes an $\ell_1$ norm $\|\mathbf{Z}\|_1$, while Low-Rank Subspace Clustering (LRSC) [48] adopts a nuclear norm $\|\mathbf{Z}\|_*$, just to name a few. Despite their success, the objective functions derived from these constraints are not smooth, leading to inefficient solutions. In contrast, the Frobenius norm results in a closed-form solution for linear models.
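For reference, the Frobenius-norm (ridge) case of the supervised model admits the textbook closed form; a minimal NumPy sketch (the function name is ours):

```python
import numpy as np

def ridge_fit(X, Y, lam=1.0):
    """Closed-form ridge regression: W* = (X^T X + lam * I)^{-1} X^T Y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)
```

Using `np.linalg.solve` instead of an explicit inverse is the standard numerically stable choice for this kind of linear system.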
III Fully Linear Graph Convolutional Networks
We present the general FLGC framework for semi-supervised and unsupervised classification problems, as illustrated in Fig. 1 (a). The core idea behind FLGC is to generalize GCNs to traditional linear models so that a) linear GCN models can be trained with globally optimal solutions, b) traditional linear models can work on graph-structured data, and c) existing GCN models are further simplified.
III-A Full Linearization of GCN
Inspired by the SGC [51] model, we further remove all nonlinear operations of a $K$-layer GCN, including the logistic regression classifier. (Note that both the logistic (sigmoid) and softmax functions are often used as nonlinear activations in neural networks; thus, we consider SGC not fully linear, which is why it cannot yield a closed-form solution.) This derives the following linear GCN:

$\hat{\mathbf{Y}} = \hat{\mathbf{A}}^{K} \mathbf{X} \mathbf{W}. \qquad (9)$

This linearization brings three major benefits. First, it is easy to decouple the fully linear GCN into two stages: a parameter-free feature propagation stage (i.e., $\mathbf{H} = \hat{\mathbf{A}}^{K} \mathbf{X}$, where $\mathbf{H}$ denotes the graph embedding) and a target-dependent prediction stage (i.e., $\hat{\mathbf{Y}} = \mathbf{H}\mathbf{W}$). The former aggregates $K$-hop neighborhoods based on a predefined propagation matrix $\hat{\mathbf{A}}$, while the latter acts as a ridge regression classifier parameterized by $\mathbf{W}$. Second, it establishes a relationship between traditional linear models and GCN models. This relationship enables us to reconsider traditional linear models from the graph representation learning point of view, and it endows the classic ridge regression classifier with the ability to handle graphs directly. Third, the linearization makes it possible to efficiently obtain the globally optimal solution of a GCN without using gradient descent. We refer to such a GCN as Fully Linear Graph Convolution (FLGC). A matrix-form data flow of FLGC is depicted in Fig. 1 (b). More details are introduced as follows.
III-B Multi-hop Propagation
In light of the aforementioned linearization, we can define various propagation strategies and incorporate them into Eq. (9). Here we introduce a stable propagation scheme for our FLGC. According to spectral graph theory, $\hat{\mathbf{A}}^{K} \mathbf{X}$ converges to a stationary state as the number of propagation steps increases [28, 35], and the node representations on the same connected component of a graph become indistinguishable, i.e., the over-smoothing problem [28, 35]. Moreover, $\hat{\mathbf{A}}^{K} \mathbf{X}$ serves as a structure aggregation term and ignores the importance of the initial node features. The initial node features often carry unique discriminant information, especially for data without directly available graph structures.

To balance structure against the initial features, we adopt an extended version of APPNP's propagation scheme to propagate multi-hop neighboring information. The propagation procedure with power iteration is recursively calculated by

$\mathbf{H}^{(k+1)} = (1-\alpha) \hat{\mathbf{A}} \mathbf{H}^{(k)} + \alpha \mathbf{X}, \quad \mathbf{H}^{(0)} = \mathbf{X}. \qquad (10)$

Instead of using a neural network to generate predictions as is done in APPNP [28], we directly use the initial features $\mathbf{X}$ as the topics to be ranked in topic-sensitive PageRank [40]. Let the resulting final graph embedding be $\mathbf{H} = \mathbf{H}^{(K)}$. This ensures that $\mathbf{H}$ is always contributed by both structure and initial features with a fixed proportion $\alpha$. It is trivial to prove that SGC's propagation (i.e., Eq. (9)) is a special case of Eq. (10) with $\alpha = 0$. Furthermore, when $\alpha = 1$, the model degrades into an ordinary neural network, in which no structural information is used. From the residual network's point of view [27, 35, 21], our propagation scheme is a special form of residual connection, where each forward step connects with the initial inputs weighted by $\alpha$ [as shown in Fig. 1 (b)]. Thus, our propagation mechanism is also called initial residual [35].
III-C FLGC for Semi-Supervised Learning
Having introduced the FLGC framework, we are ready to calculate the closed-form solution for a specific downstream task. We first use FLGC for the semi-supervised node classification problem. Let $\tilde{\mathbf{Y}} \in \mathbb{R}^{N \times C}$ be an augmented target matrix, where labeled nodes are presented as one-hot vectors while unlabeled nodes are marked as zero vectors. Further let $\mathbf{M} \in \mathbb{R}^{N \times N}$ be a diagonal mask matrix associated with the labeled and unlabeled nodes, i.e., $M_{ii} = 1$ if node $i$ is labeled and $M_{ii} = 0$ otherwise. Thus, the semi-supervised FLGC can be denoted by

$\hat{\mathbf{Y}} = \mathbf{H}\mathbf{W}, \quad \mathrm{s.t.} \ \mathbf{M}\mathbf{H}\mathbf{W} \approx \mathbf{M}\tilde{\mathbf{Y}}. \qquad (11)$

To effectively solve this problem, we rewrite the objective function as a Frobenius norm minimization problem, i.e.,

$\min_{\mathbf{W}} \ \|\mathbf{M}(\mathbf{H}\mathbf{W} - \tilde{\mathbf{Y}})\|_F^2. \qquad (12)$

The problem can be further expressed in the following form by using the Lagrangian multiplier:

$\mathcal{L}(\mathbf{W}) = \|\mathbf{M}(\mathbf{H}\mathbf{W} - \tilde{\mathbf{Y}})\|_F^2 + \lambda \|\mathbf{W}\|_F^2, \qquad (13)$

where $\lambda$ denotes a regularization coefficient. The partial derivative of $\mathcal{L}$ with respect to $\mathbf{W}$ is (using $\mathbf{M}^{\top}\mathbf{M} = \mathbf{M}$ for the 0/1 diagonal mask)

$\frac{\partial \mathcal{L}}{\partial \mathbf{W}} = 2 \mathbf{H}^{\top} \mathbf{M} (\mathbf{H}\mathbf{W} - \tilde{\mathbf{Y}}) + 2 \lambda \mathbf{W}. \qquad (14)$

The globally optimal solution is derived when $\frac{\partial \mathcal{L}}{\partial \mathbf{W}} = \mathbf{0}$, i.e.,

$(\mathbf{H}^{\top} \mathbf{M} \mathbf{H} + \lambda \mathbf{I}) \mathbf{W} = \mathbf{H}^{\top} \mathbf{M} \tilde{\mathbf{Y}}. \qquad (15)$

Consequently, we denote the solution in closed form as

$\mathbf{W}^{*} = (\mathbf{H}^{\top} \mathbf{M} \mathbf{H} + \lambda \mathbf{I})^{-1} \mathbf{H}^{\top} \mathbf{M} \tilde{\mathbf{Y}}. \qquad (16)$
Finally, we infer the labels of test nodes in a transductive manner, i.e., $\hat{\mathbf{Y}} = \mathbf{H}\mathbf{W}^{*}$.
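Training the semi-supervised FLGC therefore amounts to a single linear solve. A minimal NumPy sketch of the closed-form solution of Eq. (16) (the function name `flgc_ssl_fit` is ours; `H` is the propagated embedding and `mask` holds the 0/1 diagonal of the mask matrix):

```python
import numpy as np

def flgc_ssl_fit(H, Y_aug, mask, lam=1.0):
    """Closed form of Eq. (16): W* = (H^T M H + lam I)^{-1} H^T M Y_aug,
    where M = diag(mask) selects the labeled nodes."""
    HtM = H.T * mask  # H^T M, broadcasting the 0/1 mask over columns
    return np.linalg.solve(HtM @ H + lam * np.eye(H.shape[1]), HtM @ Y_aug)
```

Predictions for all nodes, labeled and unlabeled, are then simply `H @ W`.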
III-D FLGC for Clustering
Our unsupervised FLGC follows classic subspace clustering by assuming that the $K$-hop graph embeddings lie in a union of linear subspaces. More precisely, every node belonging to a certain subspace can be explicitly represented using other nodes in this subspace, while subspaces do not interact with each other. We refer to this property of data as self-expressiveness. However, using the initial input to model such a property may lead to unreliable estimates of the subspace coefficients due to outliers and noisy points. Instead, we model our unsupervised FLGC based on the graph embedding of the inputs. The motivation behind our method is that the intra-class variation of the initial inputs can be smoothed by the graph convolution. Formally, following the sample-as-column convention of Eq. (8), we express our unsupervised FLGC as

$\min_{\mathbf{Z}} \ \|\mathbf{H} - \mathbf{H}\mathbf{Z}\|_F^2 + \lambda \Omega(\mathbf{Z}), \quad \mathrm{s.t.} \ \mathrm{diag}(\mathbf{Z}) = \mathbf{0}. \qquad (17)$

Here, $\mathbf{Z} \in \mathbb{R}^{N \times N}$ is an affinity matrix, in which the $i$-th column denotes the representation coefficient vector of node $i$, and $\Omega(\cdot)$ indicates a matrix norm. By analogy with classic subspace clustering models, $\Omega(\mathbf{Z}) = \|\mathbf{Z}\|_1$ leads to a standard sparse subspace representation, while replacing it with the nuclear norm $\|\mathbf{Z}\|_*$ derives a low-rank subspace representation.

In this paper, we aim to calculate a dense subspace representation by adopting the Frobenius norm, which also maintains the consistency of our FLGC framework. It has been proven [41] that the constraint $\mathrm{diag}(\mathbf{Z}) = \mathbf{0}$ can be relaxed and discarded when using a Frobenius norm. Hence, our unsupervised FLGC can be rewritten as

$\min_{\mathbf{Z}} \ \|\mathbf{H} - \mathbf{H}\mathbf{Z}\|_F^2 + \lambda \|\mathbf{Z}\|_F^2. \qquad (18)$
We further compute the partial derivative of the objective with respect to $\mathbf{Z}$ as

$\frac{\partial \mathcal{L}}{\partial \mathbf{Z}} = -2 \mathbf{H}^{\top} (\mathbf{H} - \mathbf{H}\mathbf{Z}) + 2 \lambda \mathbf{Z}. \qquad (19)$

Similar to the semi-supervised FLGC, we can give the globally optimal closed-form solution of Eq. (18), i.e.,

$\mathbf{Z}^{*} = (\mathbf{H}^{\top} \mathbf{H} + \lambda \mathbf{I})^{-1} \mathbf{H}^{\top} \mathbf{H}. \qquad (20)$

Finally, we perform spectral clustering on $\mathbf{Z}^{*}$ to segment the subspaces after applying a block-structure heuristic.
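Computing the subspace coefficients is likewise a single solve. A minimal NumPy sketch of Eq. (20) (our naming; here the rows of `H` hold the node embeddings, so the Gram matrix `G = H @ H.T` plays the role of the inner-product matrix over nodes in Eq. (20)):

```python
import numpy as np

def flgc_usl_fit(H, lam=1.0):
    """Closed form of Eq. (20): Z* = (G + lam I)^{-1} G with G = H @ H.T,
    the minimizer of the Frobenius-regularized self-expressive objective."""
    G = H @ H.T
    return np.linalg.solve(G + lam * np.eye(G.shape[0]), G)
```

Since `G` is symmetric and commutes with `(G + lam I)^{-1}`, the resulting dense `Z*` is symmetric and can be fed directly to spectral clustering.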
III-E Remarks on FLGC
In Algorithm 1, we provide the pseudocode for our semi-supervised and unsupervised FLGC. Notice that both methods share a unified learning procedure and are easy to implement and train. In reality, our proposed FLGC models can be treated as natural generalizations of classic linear models in the non-Euclidean domain.
Algorithm 1 Pseudocode of FLGC in a PyTorch-like style. 

# lambda: regularization coefficient 
# alpha: teleport (or restart) probability 
# 
# fit: calculate closed form solutions 
# sc: spectral clustering 
# mm: matrixmatrix multiplication 
# gcn_norm: normalize adjacency matrix 
# compute augmented normalized adjacency 
P = gcn_norm(A) # NxN 
# compute K-hop graph embedding 
H = X # NxD 
for i in range(K): 
# propagate neighborhood using Eq.(10) 
H = (1 - alpha) * mm(P, H) + alpha * X # NxD 
# SEMI-SUPERVISED FLGC 
if task == 'ssl': 
# compute W using Eq.(16) 
W = fit(H, Y_aug, M) # DxC 
# predict unlabeled nodes 
y = mm(H, W) 
# UNSUPERVISED FLGC 
elif task == 'usl': 
# compute Z using Eq.(20) 
Z = fit(H, X) 
# assign node labels through spectral clustering 
y = sc(Z) 
Let $f(\cdot; \mathbf{\Theta})$ be a generalized linear learning model defined on a graph $\mathcal{G}$. FLGC can be broadly denoted as a classic model applied to features transformed by a $K$-hop propagation matrix $\mathbf{P}^{(K)}$, i.e.,

$\hat{\mathbf{Y}} = f\big( \mathbf{P}^{(K)} \mathbf{X}; \mathbf{\Theta} \big). \qquad (21)$

The only trainable parameter, $\mathbf{\Theta}$, is derived from the classic model and can be solved in a similar manner to the existing models. It is easy to prove that FLGC is a generalization of classic linear models in the non-Euclidean domain: when $\mathbf{P}^{(K)} = \mathbf{I}_N$, signifying that $\mathcal{G}$ does not have any edges except for self-loops, FLGC is equivalent to the classic linear model. Benefiting from the decoupled design, $\mathbf{P}^{(K)} \mathbf{X}$ is target-independent and can be precomputed. Thus, FLGC does not increase the computational burden compared with classic linear models. Below, we show the connections of FLGC to existing models.

FLGC vs. Manifold Regularization. Let the objective function of a manifold-regularized model consist of an empirical error term, a structural risk term, and a manifold prior term. As suggested in [60], such models and the proposed FLGC are derived from the same optimization framework. Nonetheless, there is a considerable difference between them: our FLGC directly works in the non-Euclidean domain, while manifold regularization was proposed for Euclidean data. During learning, manifold-regularized models use the graph structure only as prior knowledge, generally defined as a graph Laplacian penalty, which is incorporated as a single term balanced by an additional regularization coefficient. In particular, manifold regularization cannot model long-range relationships. In contrast, our FLGC propagates multi-hop structural information in a more general and flexible manner.

FLGC vs. Autoencoder.
Our unsupervised FLGC is highly related to linear autoencoders [5]. We define a linear graph autoencoder as $\hat{\mathbf{X}} = g(h(\mathbf{X}))$, where $g(\cdot)$ and $h(\cdot)$ are the decoder and encoder, respectively. By collapsing the weight matrices of a multi-layer encoder into a single matrix, an autoencoder with a self-expressive layer reconstructs the encoded features from themselves. By replacing the collapsed encoder weights with a fixed unitary matrix, such an encoder becomes the propagation stage of the unsupervised FLGC, while our $\mathbf{Z}$ acts as a single self-expressive layer. Furthermore, our FLGC aims to reconstruct node features rather than the structure, as adopted in the graph autoencoder [26].
IV Experiments of SSL
In this section, we compare the proposed semi-supervised FLGC model against many SOTA and classic methods on three real-world graph benchmark datasets and 24 regular UCI datasets. Furthermore, numerous ablation experiments are conducted to systematically and comprehensively analyze the effectiveness and robustness of the proposed FLGC.
Dataset  #Nodes  #Edges  #Classes  #Features  Train/Dev/Test 

Cora  2,708  5,429  7  1433  140/500/1,000 
Citeseer  3,327  4,732  6  3703  120/500/1,000 
Pubmed  19,717  44,338  3  500  60/500/1,000 
Method  Cora  Citeseer  PubMed 

GCN  81.4±0.4  70.9±0.5  79.0±0.4 
GAT  83.3±0.7  72.6±0.6  78.5±0.3 
FastGCN  79.8±0.3  68.8±0.6  77.4±0.3 
GIN  77.6±1.1  66.1±0.9  77.0±1.2 
LNet  80.2±3.0  67.3±0.5  78.3±0.6 
AdaLNet  81.9±1.9  70.6±0.8  77.8±0.7 
DGI  82.5±0.7  71.6±0.7  78.4±0.7 
SGC  81.0±0.0  71.9±0.1  78.9±0.0 
MixHop  81.9±0.4  71.4±0.8  80.8±0.6 
DropEdge  82.8  72.3  79.6 
GNN  82.5±0.2  74.4±0.3  77.9±0.4 
GCN  81.1±0.2  69.8±0.2  79.4±0.1 
GCN-Linear  80.8±0.0  68.7±0.1  79.4±0.1 
SGC  81.7±0.0  71.1±0.0  76.6±0.0 
APPNP  82.5±0.2  70.4±0.1  79.4±0.3 
FLGC  82.9±0.0  72.7±0.0  79.2±0.0 
FLGC*  84.0±0.0  73.2±0.0  81.1±0.0 
Method  Random Split  Full Split  

Cora  Citeseer  PubMed  Cora  Citeseer  PubMed  
GCN  79.1±1.8  67.9±1.2  76.9±2.9  86.4  75.4  85.9 
GCN-Linear  79.8±2.1  68.4±2.1  76.7±1.4  87.0  75.7  83.9 
SGC  81.3±1.7  68.5±2.2  76.4±3.3  86.2  77.6  83.7 
APPNP  81.0±0.0  68.5±0.0  75.1±0.0  88.4  78.6  82.3 
FLGC  81.5±0.6  71.0±0.9  77.6±0.3  87.0  78.1  87.9 
FLGC*  82.0±0.1  72.1±0.0  77.7±0.0  88.5  79.2  88.3 
Fig. 2. Classification accuracy vs. training time with varying propagation step $K$ on (a) Cora, (b) Citeseer, and (c) Pubmed datasets. We keep the other hyperparameters fixed except for $K$ for each model so that the running time is affected only by the propagation step. In the figures, each curve contains 15 points (i.e., 15 propagation steps), each corresponding to the running time (x-axis) and test accuracy (y-axis) yielded by a specific $K$-hop model, where $K$ increases with running time. The insets show zoom-ins of SGC.

IV-A Results on Real-World Benchmarks
IV-A1 Dataset Description
We evaluate our proposed FLGCs on three standard citation network datasets available from the PyTorch Geometric library, including the Cora, Citeseer, and PubMed [44]. The summary of these datasets is reported in Table I. In these datasets, nodes correspond to documents, and edges correspond to citations; each node feature corresponds to the bagofwords representation of the document and belongs to one of the academic topics [54]. Given a portion of nodes and their labeled categories, e.g., history and science, the task is to predict the category for other unlabeled nodes over the same graph.
IV-A2 Baselines and Setup
For the citation network datasets, the proposed FLGCs are compared against numerous SOTA graph neural network models, including the vanilla GCN [27], GAT [46], FastGCN [8], GIN [53], LNet, AdaLNet [32], DGI [47], SGC [51], MixHop [1], DropEdge [43], and GNN [34]. For these models, we give the results reported in the corresponding literature. Moreover, we reproduce the vanilla GCN with and without nonlinear activation (GCN and GCN-Linear), SGC, and APPNP [28]. In our reproduction, we follow the settings suggested in the corresponding papers. Specifically, we implement GCN and GCN-Linear using two graph convolution layers and apply an $\ell_2$ regularization (weight decay) to the trainable parameters. For APPNP, we adopt a two-layer MLP. For a fair comparison, we discard other training tricks involved in backpropagation except for weight decay.
We implement two variants of our FLGC model with the PyTorch library (relying on PyTorch Geometric 1.6.3): FLGC* indicates our method with our propagation mechanism, and FLGC denotes our model with the SGC propagation. The hyperparameters of our models are determined by a grid search over $\lambda$, $\alpha$, and $K$. We train and test all baselines with the same data splits and random seeds on an NVIDIA GTX 1080 Ti GPU, and report the average accuracy over repeated runs. In our experiments, we provide three types of data splits: public splits as described in [44]; random splits, where training/validation/test sets are generated randomly with the same proportions as the public splits; and full splits, where all remaining nodes are considered as the training set.
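Because every candidate model has a closed-form solution, this grid search amounts to repeated linear solves rather than gradient-descent retraining. A hedged end-to-end sketch (all function names and the synthetic setup below are ours, not the paper's code):

```python
import numpy as np
from itertools import product

def propagate(A_hat, X, alpha, K):
    # multi-hop propagation of Eq. (10)
    H = X.copy()
    for _ in range(K):
        H = (1.0 - alpha) * (A_hat @ H) + alpha * X
    return H

def fit(H, Y, mask, lam):
    # closed-form solution of Eq. (16): W = (H^T M H + lam I)^{-1} H^T M Y
    HtM = H.T * mask
    return np.linalg.solve(HtM @ H + lam * np.eye(H.shape[1]), HtM @ Y)

def grid_search(A_hat, X, Y, mask, val_idx, y_val, lams, alphas, Ks):
    """Pick (lam, alpha, K) by validation accuracy; each candidate is a
    single linear solve, so no retraining loop is needed."""
    best, best_acc = None, -1.0
    for alpha, K in product(alphas, Ks):
        H = propagate(A_hat, X, alpha, K)  # precompute once per (alpha, K)
        for lam in lams:
            W = fit(H, Y, mask, lam)
            pred = (H[val_idx] @ W).argmax(axis=1)
            acc = (pred == y_val).mean()
            if acc > best_acc:
                best, best_acc = (lam, alpha, K), acc
    return best, best_acc
```

Note that the propagated embedding is shared by all regularization values, so the inner loop over `lam` is essentially free.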
IV-A3 Comparison with SOTAs
Table II reports the classification accuracies of node classification with public splits. The results shown in the top part of Table II are collected from [27, 46, 8, 53, 32, 47, 51, 1, 43, 34], while the results shown in the middle part are reproduced in our experiments. It can be seen that our FLGC models consistently outperform the baselines by a large margin across all datasets. Through a point-by-point comparison, FLGC improves upon SGC on Cora, Citeseer, and Pubmed, while FLGC* improves upon APPNP on all three datasets. Through a vertical comparison, FLGC* achieves a further improvement over FLGC on each dataset.
In Table III, we further report the comparison results using the random splits and full splits. We can observe that the proposed FLGCs collectively outperform the competitors in terms of average classification accuracy. It should be noted that our FLGCs tend to obtain a more stable result than other baselines because of their ability to offer closedform solutions. In a nutshell, the above experiments demonstrate that our FLGC framework is capable of achieving the SOTA performance.
IV-A4 Running Time and Over-smoothing Analysis
Fig. 2 depicts the interaction between training time and classification accuracy. To ensure a fair comparison, all training times are collected on the same hardware. In particular, the precomputation time of the graph embedding is taken into account for SGC and the FLGCs. We use fixed hyperparameters to train each model and let $K$ be the only variable, increasing from 1 to 15. Thus, the running time of each model definitely increases with $K$, which then indirectly affects the test accuracy. Remarkably, GCN and linear GCN suffer from unstable performance as $K$ grows because of over-smoothing. On the contrary, both FLGC and FLGC* are robust to the propagation steps. In particular, there is no significant over-smoothing effect for FLGC* across the three figures, signifying the effectiveness of our propagation scheme. It also suggests that the residual connection is helpful in addressing the over-smoothing problem.
For a given graph, the training time of a graph model is positively associated with the number of propagation steps (or layers). Nonetheless, our FLGCs show a notable advantage over other methods (e.g., GCN and APPNP). In particular, there is almost no increase in computation time when using our multi-hop propagation scheme. Instead, this strategy improves FLGC significantly, particularly for large $K$. Despite its computational efficiency, SGC suffers from over-smoothing, and its training relies on the optimizer and its parameter settings. In summary, our FLGC model achieves a good balance between classification accuracy and training time.
Dataset  #classes  #instances  #features  #train  #test  Dataset  #classes  #instances  #features  #train  #test 

air  3  359  64  37  322  appendicitis  2  106  7  12  94 
ecoli  8  336  7  38  298  heart  2  270  13  27  243 
iris  3  150  4  15  135  cleve  2  296  13  30  266 
fertility  2  100  9  11  89  segmentation  7  210  18  21  189 
wine  3  178  13  19  159  x8d5k  5  1000  8  100  900 
wdbc  2  569  30  58  511  vote  2  435  16  44  391 
haberman  3  306  3  32  274  wbc  2  683  9  69  614 
spectf  2  267  44  28  239  WBC  2  683  9  69  614 
cotton  6  356  21  37  319  breast  2  277  9  29  248 
seeds  3  210  7  21  189  australian  2  690  14  70  620 
glass  6  214  10  23  191  diabetes  2  768  8  78  690 
zoo  7  101  16  13  88  dnatest  3  1186  180  120  1066 
IV-A5 Impact of $\lambda$ and $\alpha$
We study the sensitivity of FLGC to $\lambda$ and $\alpha$. As depicted in Fig. 3 (a)-(f), both hyperparameters have a significant effect on the accuracy. Due to differences in neighborhood structure, the optima of $\lambda$ and $\alpha$ vary across datasets. Usually, a large $\lambda$ tends to produce a compact model, while a small value increases the risk of overfitting. For the Cora and Citeseer datasets, a large $\lambda$ is desired by the FLGC models, while this value should be small on the Pubmed dataset. It should be noticed that FLGC* is equivalent to FLGC and to the classic ridge regression classifier when $\alpha = 0$ and $\alpha = 1$, respectively, and the best settings for $\alpha$ lie between these two extremes. The teleport probability $\alpha$ serves as a proportion indicating to what extent the original features contribute to the propagation. Our further analysis reveals the following tendencies. First, compared with the two endpoints ($\alpha = 0$ and $\alpha = 1$) in Fig. 3, FLGC improves upon classic ridge regression by a significant margin, which means that structural information is quite useful for the traditional linear model. Second, the original node features help improve FLGC, which makes it possible to aggregate higher-hop neighbors.

IV-B Results on Regular Datasets
IV-B1 Dataset Description
To further explore the generalization ability of FLGCs, we conduct a series of experiments on widely used regular classification datasets taken from the University of California at Irvine (UCI) repository (http://archive.ics.uci.edu/ml/index.php). These datasets include a number of binary-class and multi-class classification problems. In the preprocessing, all datasets are scaled into the range of [0, 1] using the min-max standardization technique. For each dataset, we randomly take a fixed number of samples from each class as the labeled set and use the rest as the unlabeled set. A detailed description of these datasets is provided in Table IV.
Dataset  SS-ELM  TSVM  LapRLS  GCN-Linear  GCN  DropEdge  SGC  GCNII  APPNP  FLGC  FLGC*

air  76.23±4.08  80.19±3.65  76.09±6.08  90.09±4.09  90.12±3.80  86.43±4.13  78.94±3.71  89.16±2.53  90.31±4.17  90.65±2.89  91.02±3.44
ecoli  77.41±5.31  79.43±4.34  85.86±1.99  83.41±3.90  83.55±3.87  80.71±2.85  77.13±1.47  83.28±2.61  81.32±3.96  84.09±2.10  84.73±1.92
iris  80.19±4.26  92.56±3.67  91.93±3.34  91.63±4.63  90.81±4.89  88.81±5.95  88.59±6.29  92.67±2.05  91.41±3.49  96.30±0.66  96.81±0.47
Fertility  75.28±10.03  71.74±8.72  77.25±5.80  73.03±9.91  74.38±10.31  83.48±4.69  88.76±0.00  76.18±8.92  77.87±9.36  88.76±0.00  88.76±0.00
wine  79.53±7.66  94.09±2.10  95.13±2.12  93.08±1.57  93.14±1.14  90.44±2.00  91.13±4.50  93.33±2.69  91.64±2.22  94.03±1.69  94.28±1.47
wdbc  89.17±5.90  93.64±1.83  85.92±2.41  93.56±1.75  93.60±2.05  93.68±1.60  91.78±1.68  93.52±0.98  93.62±1.86  94.58±0.90  95.60±0.67
Haberman  71.73±2.44  62.57±5.73  70.02±3.15  70.51±2.15  70.40±2.83  67.96±4.15  73.72±0.00  70.88±2.05  69.71±2.63  73.72±0.00  73.72±0.00
SPECTF  77.41±1.83  74.33±3.41  49.77±4.59  77.24±2.55  78.49±1.93  76.99±4.24  79.50±0.00  79.50±0.00  78.58±2.07  79.50±0.00  79.50±0.00
CAR  74.18±3.00  86.46±2.19  83.78±1.20  79.36±1.18  85.52±1.23  82.77±1.80  73.89±0.73  85.40±1.10  85.03±1.17  78.91±0.78  78.96±0.84
cotton  60.64±4.00  78.10±3.24  76.10±3.90  74.83±3.12  75.45±3.26  71.25±4.43  72.88±2.05  75.49±2.03  75.86±2.30  73.70±5.70  78.24±4.37
Seeds  87.22±7.08  91.93±2.58  92.83±1.76  90.63±2.40  90.42±2.83  78.31±20.88  89.89±0.69  91.48±1.60  89.47±2.93  92.06±0.63  93.17±1.12
glass  64.26±7.54  77.16±4.27  73.58±4.79  78.74±4.18  77.85±4.09  56.39±20.35  71.47±5.92  79.63±3.20  76.34±5.79  74.55±5.49  74.87±5.47
zoo  78.78±10.07  98.58±1.24  97.03±1.99  92.59±3.57  92.82±2.65  85.06±4.34  91.06±3.12  92.59±3.41  92.24±2.94  93.06±3.30  93.06±1.11
appendicitis  81.44±5.92  75.11±11.76  72.55±5.83  80.43±7.86  80.32±7.81  81.17±5.81  82.34±2.29  82.66±7.93  81.49±6.83  83.19±6.86  83.30±5.69
heart  71.71±6.32  73.27±4.16  73.85±3.18  76.09±5.46  76.13±4.12  75.51±9.46  79.14±3.39  76.67±6.58  77.28±6.13  81.11±1.31  81.65±1.38
cleve  71.07±5.87  73.98±5.49  71.60±3.15  76.28±2.81  75.79±3.63  71.50±4.59  76.39±1.48  76.88±2.98  77.03±3.79  77.37±1.93  77.44±2.35
segmentation  55.63±5.85  76.06±6.44  80.61±5.02  76.98±2.91  75.98±3.55  69.21±7.27  71.32±3.03  78.20±3.84  74.66±3.31  76.51±4.13  79.10±4.16
X8D5K  94.33±4.29  100.0±0.0  100.0±0.0  100.0±0.0  100.0±0.0  100.0±0.0  100.0±0.0  100.0±0.0  100.0±0.0  100.0±0.0  100.00±0.00
vote  80.54±5.32  90.52±1.75  84.90±3.02  89.23±2.48  89.64±2.42  87.37±2.50  87.34±1.04  90.20±2.06  89.85±2.19  91.10±1.51  91.79±1.46
WBC  92.73±3.43  92.38±2.68  93.98±1.33  95.70±1.56  95.88±1.07  96.69±0.51  95.90±0.51  95.64±0.50  95.65±0.81  96.19±0.47  96.48±0.66
breast  69.96±3.86  63.47±4.85  71.73±2.52  66.25±4.18  66.90±4.05  71.13±3.29  70.97±0.00  67.50±3.13  68.79±3.07  72.10±1.41  73.35±1.49
austra  78.54±5.48  77.65±3.83  81.49±2.37  85.44±1.64  85.23±1.98  76.97±11.37  81.87±1.37  84.35±1.06  83.98±3.10  85.16±1.54  85.66±0.68
diabetes  69.93±2.96  66.75±2.75  70.69±1.55  68.13±0.92  69.10±1.05  67.13±1.81  65.01±0.87  70.09±1.13  69.10±1.50  69.31±1.55  69.48±1.10
dnatest  48.96±2.27  82.95±1.53  83.47±1.19  85.30±1.07  85.54±1.05  81.91±1.42  74.40±1.85  87.80±1.00  82.03±5.27  84.92±0.92  85.39±1.27
Average  75.29±5.20  81.37±3.84  80.84±3.01  82.86±3.16  83.21±3.15  80.04±5.39  81.39±1.92  83.88±2.64  83.05±3.37  84.62±1.91  85.27±1.71
SS-ELM  TSVM  LapRLS  GCN-Linear  GCN  DropEdge  SGC  GCNII  APPNP  FLGC  FLGC*

SS-ELM    66.0  54.0  23.0  18.0  56.0  15.0  8.0  10.0  1.0  1.0
TSVM  234.0    136.0  74.0  66.0  157.0  142.0  48.0  80.0  53.0  37.0 
LapRLS  246.0  140.0    110.0  100.0  167.0  146.0  74.0  106.0  67.0  41.0 
GCN-Linear  277.0  202.0  166.0    99.0  215.0  191.0  42.5  135.5  41.5  22.0
GCN  282.0  210.0  176.0  177.0    231.0  197.0  48.0  172.5  63.0  38.0 
DropEdge  244.0  119.0  109.0  61.0  45.0    101.0  34.0  37.0  13.0  9.0 
SGC  285.0  134.0  130.0  85.0  79.0  175.0    65.5  71.0  5.0  5.0 
GCNII  292.0  228.0  202.0  257.5  228.0  242.0  234.5    208.0  104.5  65.5 
APPNP  290.0  196.0  170.0  140.5  127.5  239.0  205.0  68.0    46.0  29.0 
FLGC  299.0  223.0  209.0  235.5  214.0  263.0  295.0  195.5  230.0    5.0 
FLGC*  299.0  239.0  235.0  254.0  238.0  267.0  295.0  234.5  247.0  271.0   
IV-B2 Baselines and Setup
In this experiment, the selected baselines include GCN variants, i.e., GCN-Linear, GCN [27], DropEdge [43], SGC [51], GCNII [35], and APPNP [28], and classic semi-supervised classification models, i.e., SS-ELM [22], TSVM [25], and LapRLS [3]. These regular datasets have no off-the-shelf graph structure; thus, we construct NN graphs [5, 6] to represent the structural information. Specifically, we adopt the Euclidean distance to measure the similarity between sample pairs and take the top neighbors centered on a given sample as its edges. To avoid the hyperparameter , we empirically set it as .
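As a concrete reference, the nearest-neighbor construction described above (Euclidean distances, top neighbors per sample, symmetrized edges) can be sketched as follows. This is our own minimal version, not the paper's code, and it omits edge weighting:

```python
import numpy as np

def knn_graph(X, k):
    """Build a symmetric k-nearest-neighbor adjacency from Euclidean
    distances: connect each sample to its k closest samples (excluding
    itself), then symmetrize the directed edges."""
    n = X.shape[0]
    sq = (X ** 2).sum(axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared distances
    np.fill_diagonal(D, np.inf)                      # no self-edges
    A = np.zeros((n, n))
    nearest = np.argsort(D, axis=1)[:, :k]
    for i in range(n):
        A[i, nearest[i]] = 1.0
    return np.maximum(A, A.T)                        # symmetrize

# Two well-separated pairs: each point's nearest neighbor is its partner.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
A = knn_graph(X, k=1)
```

In practice the resulting adjacency would then be normalized (e.g., D^{-1/2}(A + I)D^{-1/2}) before being used for propagation.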
IV-B3 Comparison with SOTAs and Statistical Test
In Table V, we provide the comparative results on the 24 UCI datasets. All results are obtained by averaging over independent runs. At the bottom of the table, we summarize the arithmetic mean accuracy over the 24 datasets. Remarkably, our FLGC models consistently outperform not only the classic semi-supervised models but also the recent GCN variants. Specifically, FLGC* achieves the highest accuracy on out of datasets. On average, FLGC and FLGC* respectively obtain and accuracy across the 24 datasets, improving upon SGC and APPNP by margins of and , respectively. Furthermore, we notice that GCN variants are generally superior to classic semi-supervised models even on regular grid datasets. This is a valuable clue for designing semi-supervised models on regular datasets.
To further rank all baselines, we carry out a non-parametric statistical test on the results reported in Table V. To this end, we follow the suggestion of Garcia et al. [15] and adopt the Wilcoxon signed-ranks test (conducted with the KEEL (Knowledge Extraction based on Evolutionary Learning) tool, available from http://www.keel.es/) to compute the sum of ranks for each pair of methods. Table VI shows the detailed statistical results. According to the exact critical value table of the Wilcoxon test, the critical values on datasets for confidence levels of and correspond to (lower diagonal) and (upper diagonal), respectively. We can observe that our proposed FLGC* is significantly better than all the other competitors at the different confidence levels, while FLGC performs comparably to GCNII and GCN at the confidence levels of and , respectively. These results demonstrate that our proposed FLGC models generalize to regular Euclidean data and achieve promising performance.
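For readers reproducing rank sums of the kind shown in Table VI without KEEL, the computation fits in a few lines. The sketch below uses synthetic accuracies rather than the paper's data, and scipy's `wilcoxon` as an equivalent of the KEEL test:

```python
import numpy as np
from scipy.stats import rankdata, wilcoxon

def signed_rank_sums(acc_a, acc_b):
    """Wilcoxon signed-ranks: rank the absolute per-dataset differences,
    then sum the ranks of positive and negative differences (ranks of
    zero differences are split evenly between the two sums)."""
    d = np.asarray(acc_a, float) - np.asarray(acc_b, float)
    ranks = rankdata(np.abs(d))
    r_plus = ranks[d > 0].sum() + 0.5 * ranks[d == 0].sum()
    r_minus = ranks[d < 0].sum() + 0.5 * ranks[d == 0].sum()
    return r_plus, r_minus

# Synthetic per-dataset accuracies for two hypothetical methods.
a = [0.90, 0.80, 0.85, 0.70]
b = [0.88, 0.81, 0.80, 0.66]
r_plus, r_minus = signed_rank_sums(a, b)
# scipy's two-sided statistic is min(r_plus, r_minus).
stat, p = wilcoxon(a, b)
```

The larger of the two sums indicates the better-performing method of the pair; significance is then read off the exact critical-value table for the given number of datasets.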
IV-B4 Comparison w.r.t. Different Sizes of Training Samples
We conduct an experiment to compare the performance of the baselines with varying labeled sample sizes. Fig. 4 (a)-(c) illustrate the comparative results on three selected datasets, i.e., Iris, Wine, and WDBC. We gradually increase the number of training samples per class from 1 to 20 and plot the test accuracy curves of the competitive methods. Both of our FLGC models show competitive performance w.r.t. the other baselines under different training sizes. In particular, our methods remarkably outperform many baselines (e.g., SS-ELM, TSVM, and LapRLS) on all datasets when the training size is extremely small, e.g., only 1 labeled sample per class.
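The labeled/unlabeled split used throughout this section (a fixed number of labeled samples per class, the rest unlabeled) can be sketched as follows; the function name and the use of numpy's random generator are our own choices:

```python
import numpy as np

def sample_labels_per_class(y, n_per_class, rng):
    """Randomly pick n_per_class labeled indices from every class;
    the remaining indices form the unlabeled set."""
    labeled = []
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        labeled.extend(rng.choice(idx, size=n_per_class, replace=False))
    labeled = np.sort(np.asarray(labeled))
    unlabeled = np.setdiff1d(np.arange(len(y)), labeled)
    return labeled, unlabeled

rng = np.random.default_rng(0)
y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
labeled, unlabeled = sample_labels_per_class(y, n_per_class=1, rng=rng)
```

Sampling per class (rather than uniformly) guarantees that every class is represented even at the extreme setting of 1 labeled sample per class.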
IV-B5 Study on Over-Smoothing
Fig. 5 (a)-(c) show the effect of the propagation step on the selected datasets. Several tendencies can be observed. First, GCNII, APPNP, FLGC, and FLGC* face less risk of falling into the over-smoothing dilemma, whereas the other methods do not. Second, the linear GCN tends to outperform the nonlinear GCN on the selected datasets. A conceivable reason is that the nonlinear activation accelerates over-smoothing; this is also affected by the predefined graph structure. Third, by adding the initial residual, FLGC* benefits from longer-range propagation and thus improves significantly upon FLGC.
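A toy calculation illustrates both tendencies. The following is our own APPNP-style sketch, with a teleport probability `alpha` standing in for the initial residual; it is not the paper's code:

```python
import numpy as np

def propagate(S, X, alpha, num_steps):
    """APPNP-style propagation: H <- (1 - alpha) * S @ H + alpha * X.
    alpha = 0 is pure smoothing; alpha = 1 keeps only the initial
    features."""
    H = X.copy()
    for _ in range(num_steps):
        H = (1.0 - alpha) * S @ H + alpha * X
    return H

# 3-node path graph with self-loops and symmetric normalization.
A_hat = np.array([[1, 1, 0], [1, 1, 1], [0, 1, 1]], dtype=float)
d = A_hat.sum(axis=1)
S = A_hat / np.sqrt(np.outer(d, d))        # D^{-1/2} (A + I) D^{-1/2}
X = np.eye(3)

H_smooth = propagate(S, X, alpha=0.0, num_steps=50)  # over-smoothed
H_keep   = propagate(S, X, alpha=1.0, num_steps=50)  # initial features only
H_mix    = propagate(S, X, alpha=0.1, num_steps=50)  # residual balances both
```

With alpha = 0, fifty steps collapse every node onto a degree-weighted multiple of the same vector, which is exactly the over-smoothing dilemma; any nonzero alpha keeps the node identities separable regardless of the propagation depth.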
Datasets  Metric  K-Means  NMF  NCut  CAN  CLR  SSR  KMM  RRCSL  EGCSC  FLGC  FLGC*

Iris  ACC  0.797  0.636  0.586  0.573  0.846  0.686  0.817  0.873  0.940  0.953  0.960 
NMI  0.637  0.404  0.572  0.427  0.723  0.580  0.682  0.754  0.851  0.874  0.886  
Wine  ACC  0.960  0.722  0.959  0.926  0.915  0.955  0.541  0.972  0.949  0.949  0.983 
NMI  0.862  0.447  0.881  0.780  0.732  0.848  0.361  0.881  0.820  0.820  0.928  
Yale  ACC  0.472  0.339  0.511  0.521  0.509  0.593  0.442  0.600  0.515  0.546  0.630 
NMI  0.540  0.412  0.561  0.549  0.582  0.584  0.506  0.631  0.558  0.557  0.657 
Noise Type  Noise Intensity  ACC  NMI  

EDSC  EGCSC  FLGC  FLGC*  EDSC  EGCSC  FLGC  FLGC*  
Gaussian  0.853  0.863  0.863  0.901  0.897  0.881  0.881  0.919  
0.838  0.830  0.830  0.891  0.884  0.855  0.855  0.906  
0.472  0.748  0.748  0.824  0.748  0.780  0.780  0.866  
0.680  0.643  0.643  0.774  0.774  0.710  0.710  0.813  
Salt & Pepper  0.845  0.891  0.891  0.908  0.886  0.901  0.901  0.922  
0.842  0.862  0.862  0.896  0.885  0.883  0.883  0.913  
0.843  0.847  0.847  0.894  0.885  0.870  0.870  0.912  
0.838  0.826  0.826  0.877  0.881  0.850  0.850  0.901 
IV-B6 Effect of the Neighborhood Size
We further explore the effect of the predefined graph structure on the classification performance of FLGC. Fig. 6 shows how accuracy varies with the neighborhood size. When , meaning , FLGC performs identically to a ridge regression classifier. At this point, FLGC's performance becomes more unstable and worse than when neighborhood information is considered. This demonstrates that pairwise relationships define the intrinsic structure of regular data. Unlike real-world graph data, however, the predefined edges (i.e., the NN graph) cannot perfectly describe such structures. As a result, the performance of FLGC varies with the neighborhood size. Empirically, a larger neighborhood leads to more performance improvement since it enlarges the first-order receptive field. However, an overly large neighborhood size inevitably degrades performance because of the increased risk of noisy edges and over-smoothing. Finding an optimal neighborhood size remains an open problem.
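The degenerate case can be verified directly. The sketch below is our simplified rendering of the propagate-then-solve idea (fitting on all rows for brevity, whereas the actual model solves only on the labeled rows); with zero propagation steps it reduces exactly to ridge regression:

```python
import numpy as np

def propagate_then_ridge(S, X, Y, K, lam):
    """Sketch: K steps of parameter-free propagation, then the ridge
    solution W = (Z^T Z + lam I)^{-1} Z^T Y on the smoothed features."""
    Z = X.copy()
    for _ in range(K):
        Z = S @ Z
    d = Z.shape[1]
    W = np.linalg.solve(Z.T @ Z + lam * np.eye(d), Z.T @ Y)
    return Z @ W                        # predictions for all samples

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
Y = rng.normal(size=(6, 2))
S = np.eye(6)                           # stand-in normalized adjacency

# K = 0: the adjacency is never applied, so this is plain ridge regression.
pred = propagate_then_ridge(S, X, Y, K=0, lam=0.1)
ridge = X @ np.linalg.solve(X.T @ X + 0.1 * np.eye(3), X.T @ Y)
```

The closed-form solve is what makes the training a single linear-algebra step rather than an iterative optimization.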
V Experiments of Clustering
In this section, we extensively evaluate our proposed unsupervised FLGC on several challenging clustering benchmarks and compare it with many previous clustering models.
V-A Dataset Description
In addition to two simple UCI datasets (i.e., Iris and Wine), we add four challenging image clustering benchmarks for performance comparison, i.e., the COIL20 object image dataset and the Yale, Extended Yale B, and ORL face image datasets. The COIL20 dataset contains grayscale image samples with a spatial size of and consists of distinct objects, e.g., duck, cat, and car models. The ORL dataset is composed of human face images, with subjects each having samples. Following [24], we downsample the original face images from to . The Yale and Extended Yale B datasets are popular benchmarks for subspace clustering. The former includes face images collected from individuals. The latter is more challenging because it contains images of human subjects acquired under different poses and illumination conditions. The resolutions of the two face databases are scaled to and , respectively. Some sample images from COIL20, ORL, and Extended Yale B are illustrated in Fig. 7.
V-B Baselines and Setup
We divide our experiment into two parts. In the first part, we evaluate our methods on three small and simple datasets, i.e., Iris, Wine, and Yale, following the settings suggested in [49]. We compare our method with the following baselines: K-Means, NMF [30], Normalized Cuts (NCut) [45], CAN [38], CLR [39], SSR [23], K-Multiple-Means (KMM) [37], EGCSC [5], and RRCSL [49]. As in Section IV, we use FLGC* to denote our method with the initial residual. In the second part, we conduct experiments on three challenging image datasets (i.e., COIL20, ORL, and Extended Yale B). The baselines compared with FLGCs in this part include Low-Rank Representation (LRR) [33], LRSC [48], SSC [12], KSSC [42], SSC by Orthogonal Matching Pursuit (SSC-OMP) [56], EDSC [41], and EGCSC [5]. We follow the experimental setups reported in [24]. For our FLGC, we search for the optimal parameter setting among , , , and a NN graph of .
V-C Quantitative and Qualitative Results
Table VII gives a quantitative comparison of different clustering models on the three small datasets. Our FLGC models consistently achieve superior ACC and NMI with significant margins compared with many existing clustering models. Specifically, FLGC* obtains and ACC on the Iris and Wine datasets, respectively, outperforming the advanced RRCSL ( and ) and KMM ( and ) by large margins. In Fig. 8, we provide a visual comparison of the clustering performance on the three challenging datasets. The results reveal that the proposed FLGC models can markedly improve upon many self-expressiveness-based clustering models. As an extension of subspace clustering, our unsupervised FLGC reduces intra-class variations through neighborhood propagation, making it more robust in finding the inherent subspace structure of the data. Taking the Iris and Extended Yale B datasets as examples, we visualize the affinity matrices yielded by FLGC* in Fig. 9. The visualizations exhibit distinctly block-diagonal structures, which closely match the corresponding ground truths.
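Block-diagonal affinities of the kind shown in Fig. 9 arise from a ridge-regularized self-representation, which admits a closed-form solution. The following is a minimal EGCSC-style sketch in our own notation (with `Z` standing for the propagated features), minimizing ||Z - CZ||_F^2 + λ||C||_F^2:

```python
import numpy as np

def self_expressive_affinity(Z, lam):
    """Closed-form self-representation: C = (Z Z^T + lam I)^{-1} Z Z^T
    minimizes ||Z - C Z||_F^2 + lam ||C||_F^2; the symmetrized |C| is
    then used as an affinity for spectral clustering."""
    n = Z.shape[0]
    G = Z @ Z.T                                  # sample Gram matrix
    C = np.linalg.solve(G + lam * np.eye(n), G)
    return 0.5 * (np.abs(C) + np.abs(C.T))       # symmetric affinity

# Two one-dimensional subspaces: samples 0-1 span one, samples 2-3 the other.
Z = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 3.0]])
W = self_expressive_affinity(Z, lam=0.1)
```

Samples from different subspaces receive (numerically) zero affinity, which is what produces the block-diagonal structure; smoothing the features by neighborhood propagation before this step tightens each block.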
V-D Analysis of Robustness
To analyze the robustness of FLGCs, we conduct experiments comparing clustering performance under different noise conditions. Specifically, we adopt Gaussian noise and salt-and-pepper noise to corrupt the images. The variance of the Gaussian noise and the proportion of pixels corrupted by the salt-and-pepper noise are treated as the noise intensity. We test our method with intensities of 0.01, 0.05, 0.1, and 0.2. It can be observed from Table VIII that our FLGC* is more robust to noise than the other methods. This superiority benefits from the graph structure as well as the initial-residual propagation scheme. FLGC and EGCSC have the same performance because FLGC obtains its best performance when , at which point FLGC degrades into EGCSC. Compared with EDSC, the other three methods show lower sensitivity to noise, demonstrating the robustness of the graph convolution.

V-E Influence of and
To study the effect of over-smoothing, we show clustering accuracy under different propagation steps on the COIL20, ORL, and Extended Yale B datasets in Fig. 10 (a) and (b). We find that FLGC* (Fig. 10 (a)) is robust to large propagation steps, since its performance is almost unaffected by a large on the Extended Yale B dataset. In comparison, the accuracy obtained by FLGC drops by about 0.70 in terms of ACC for . This robustness to large propagation steps further demonstrates that the initial features are crucial for regular data. This conclusion is further supported by Fig. 10 (c), where the clustering ACC tends to increase with large for the ORL and Extended Yale B datasets. Notably, the structure information contributes more than the initial features on the COIL20 dataset. A conceivable reason is that the samples within COIL20 have strong inter-class differences, which result in a more accurate structure.
VI Conclusions
In this paper, we have presented a unified and simple graph convolutional framework, i.e., fully linear graph convolution networks (FLGC), which incorporate multi-hop neighborhood aggregation into classic linear models to further simplify the training, application, and implementation of GCNs. Technically, we train FLGC by computing a globally optimal closed-form solution, resulting in efficient computation. Based on this framework, we developed a semi-supervised FLGC for semi-supervised node classification tasks and an unsupervised FLGC for clustering tasks. Furthermore, we showed that FLGC acts as a generalization of traditional linear models to non-Euclidean data. In comparison with existing approaches, our FLGCs achieve superior performance on real-world graphs and regular grid data concurrently. The success of FLGC establishes a connection between GCNs and classic linear models. Future work includes exploring more scalable linear models to deal with large-scale graphs, inductive learning, and extending FLGC to different GCNs.
Acknowledgment
The authors would like to thank the anonymous reviewers for their constructive suggestions and criticisms.
References

[1] (2019) Mixhop: higher-order graph convolutional architectures via sparsified neighborhood mixing. In International Conference on Machine Learning, pp. 21–29. Cited by: §IVA2, §IVA3.
 [2] (2007) Face recognition using kernel ridge regression. In 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–7. Cited by: §I, §IIE.
 [3] (2006) Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research 7 (Nov), pp. 2399–2434. Cited by: §I, §IVB2.
 [4] (2020) BSnets: an endtoend framework for band selection of hyperspectral image. IEEE Transactions on Geoscience and Remote Sensing 58 (3), pp. 1969–1984. Cited by: §IIB.
 [5] (2021) Graph convolutional subspace clustering: a robust subspace clustering framework for hyperspectral image. IEEE Transactions on Geoscience and Remote Sensing 59 (5), pp. 4191–4202. Cited by: §I, 2nd item, §IIID, §IVB2, §VB, §VB.
 [6] (2021) Hypergraphstructured autoencoder for unsupervised and semisupervised classification of hyperspectral image. IEEE Geoscience and Remote Sensing Letters (), pp. 1–5. Note: doi: 10.1109/LGRS.2021.3054868 Cited by: §I, §IVB2.
 [7] (2020) Densely connected convolutional extreme learning machine for hyperspectral image classification. Neurocomputing. Note: doi:10.1016/j.neucom.2020.12.064 External Links: ISSN 09252312 Cited by: §IIE.
 [8] (2018) FastGCN: fast learning with graph convolutional networks via importance sampling. In International Conference on Learning Representations, Cited by: §IVA2, §IVA3.
 [9] (2016) Convolutional neural networks on graphs with fast localized spectral filtering. In Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS’16, Red Hook, NY, USA, pp. 3844–3852. External Links: ISBN 9781510838819 Cited by: §I.
 [10] (2021) Graph sample and aggregateattention network for hyperspectral image classification. IEEE Geoscience and Remote Sensing Letters (), pp. 1–5. Note: doi: 10.1109/LGRS.2021.3062944 Cited by: §I.
 [11] (2021) Multiscale graph sample and aggregate network with contextaware learning for hyperspectral image classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14 (), pp. 4561–4572. External Links: Document Cited by: §I.
 [12] (201311) Sparse subspace clustering: algorithm, theory, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (11), pp. 2765–2781. External Links: ISSN 01628828 Cited by: §I, §IIE, §VB.
 [13] (2019) Graph neural networks for social recommendation. In The World Wide Web Conference, WWW ’19, pp. 417–426. External Links: Document Cited by: §I.
 [14] (2020) Graph random neural networks for semisupervised learning on graphs. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (Eds.), Vol. 33, pp. 22092–22103. Cited by: §I.
 [15] (2008) An extension on“statistical comparisons of classifiers over multiple data sets”for all pairwise comparisons. Journal of machine learning research 9 (Dec), pp. 2677–2694. Cited by: §IVB3.
 [16] (2019) Exact combinatorial optimization with graph convolutional neural networks. In Advances in Neural Information Processing Systems, Vol. 32, pp. 15580–15592. Cited by: §I.

[17] (2018) New frontiers in spectral-spatial hyperspectral image classification: the latest advances based on mathematical morphology, Markov random fields, segmentation, sparse representation, and deep learning. IEEE Geoscience and Remote Sensing Magazine 6 (3), pp. 10–43. Cited by: §IIE.
 [18] (2018) Recent advances in convolutional neural networks. Pattern Recognition 77, pp. 354–377. External Links: ISSN 00313203 Cited by: §IIB, §IIE.
 [19] (2017) Inductive representation learning on large graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, Red Hook, NY, USA, pp. 1025–1035. External Links: ISBN 9781510860964 Cited by: §I.
 [20] (2017) Graph regularized nonlinear ridge regression for remote sensing data analysis. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 10 (1), pp. 277–285. External Links: ISSN 19391404 21511535 Cited by: §IIE.
 [21] (201606) Deep residual learning for image recognition. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §IIIB.
 [22] (201412) Semisupervised and unsupervised extreme learning machines. IEEE Transactions on Cybernetics 44 (12), pp. 2405–2417. Cited by: §IVB2.
 [23] (2015) A new simplex sparse learning model to measure data similarity for clustering. In TwentyFourth International Joint Conference on Artificial Intelligence, Cited by: §VB.
 [24] (2017) Deep subspace clustering networks. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), pp. 24–33. Cited by: §VA, §VB.

[25] (1999) Transductive inference for text classification using support vector machines. In ICML, Vol. 99, pp. 200–209. Cited by: §IVB2.
 [26] (2016) Variational graph auto-encoders. arXiv preprint arXiv:1611.07308. Cited by: 2nd item.
 [27] (2017) Semisupervised classification with graph convolutional networks. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 2426, 2017, Conference Track Proceedings, Cited by: §I, §IIB, §IIIB, §IVA2, §IVA3, §IVB2.
 [28] (2019) Predict then propagate: graph neural networks meet personalized pagerank. In International Conference on Learning Representations (ICLR), Cited by: §I, §IIB, §IID, §IIIB, §IIIB, §IVA2, §IVB2.
 [29] (2015) Deep learning. Nature 521 (7553), pp. 436–444. External Links: ISSN 00280836 Cited by: §IIE.
 [30] (1999) Learning the parts of objects by nonnegative matrix factorization. Nature 401 (6755), pp. 788–791. Cited by: §VB.
 [31] (2018) Deeper insights into graph convolutional networks for semisupervised learning. In ThirtySecond AAAI Conference on Artificial Intelligence, Cited by: §I, §I, §IIB.
 [32] (2019) LanczosNet: multiscale deep graph convolutional networks. In 7th International Conference on Learning Representations, ICLR 2019, Cited by: §IVA2, §IVA3.
 [33] (2012) Robust recovery of subspace structures by lowrank representation. IEEE transactions on pattern analysis and machine intelligence 35 (1), pp. 171–184. Cited by: §VB.
 [34] (2019) A flexible generative framework for graphbased semisupervised learning. In 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Cited by: §IVA2, §IVA3.
 [35] (2020) Simple and deep graph convolutional networks. In Proceedings of the 37th International Conference on Machine Learning, Cited by: §I, §IIB, §IIIB, §IIIB, §IVB2.
 [36] (201707) Geometric deep learning on graphs and manifolds using mixture model cnns. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §I.
 [37] (2019) Kmultiplemeans: a multiplemeans clustering method with specified k clusters. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 959–967. Cited by: §VB.
 [38] (2014) Clustering and projected clustering with adaptive neighbors. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 977–986. Cited by: §VB.
 [39] (2016) The constrained laplacian rank algorithm for graphbased clustering. In Proceedings of the AAAI conference on artificial intelligence, Vol. 30. Cited by: §VB.
 [40] (199911) The pagerank citation ranking: bringing order to the web.. Technical Report Technical Report 199966, Stanford InfoLab. Note: Previous number = SIDLWP19990120 Cited by: §I, §IIIB.
 [41] (201403) Efficient dense subspace clustering. In IEEE Winter Conference on Applications of Computer Vision, Vol. , pp. 461–468. External Links: ISSN Cited by: §IIID, §VB.
 [42] (2014) Kernel sparse subspace clustering. In 2014 ieee international conference on image processing (icip), pp. 2849–2853. Cited by: §VB.
 [43] (2019) DropEdge: towards deep graph convolutional networks on node classification. In International Conference on Learning Representations, Cited by: §I, §IVA2, §IVA3, §IVB2.
 [44] (2008) Collective classification in network data. AI magazine 29 (3), pp. 93–93. Cited by: §IVA1, §IVA2.
 [45] (2000) Normalized cuts and image segmentation. IEEE Transactions on pattern analysis and machine intelligence 22 (8), pp. 888–905. Cited by: §VB.
 [46] (2018) Graph attention networks. In International Conference on Learning Representations, Cited by: §I, §IVA2, §IVA3.
 [47] (2019) Deep graph infomax.. In ICLR (Poster), Cited by: §IVA2, §IVA3.
 [48] (2014) Low rank subspace clustering (lrsc). Pattern Recognition Letters 43, pp. 47 – 61. Note: ICPR2012 Awarded Papers External Links: ISSN 01678655 Cited by: §IIE, §VB.
 [49] (2021) Robust rankconstrained sparse learning: a graphbased framework for single view and multiview clustering. IEEE Transactions on Cybernetics (), pp. 1–12. Note: doi: 10.1109/TCYB.2021.3067137 Cited by: §VB, §VB.
 [50] (2019) End to end learning and optimization on graphs. In Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. dAlchéBuc, E. Fox, and R. Garnett (Eds.), Vol. 32, pp. 4672–4683. Cited by: §I.
 [51] (2019) Simplifying graph convolutional networks. In International Conference on Machine Learning, pp. 6861–6871. Cited by: §I, §IIB, §IIC, §IIIA, §IVA2, §IVA3, §IVB2.
 [52] (2020) A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems (), pp. 1–21. Cited by: §I, §I, §IIB.
 [53] (2018) How powerful are graph neural networks?. In International Conference on Learning Representations, Cited by: §IVA2, §IVA3.
 [54] (2016) Revisiting semisupervised learning with graph embeddings. In International conference on machine learning, pp. 40–48. Cited by: §IVA1.
 [55] (201603) Laplacian regularized lowrank representation and its applications. IEEE Transactions on Pattern Analysis and Machine Intelligence 38 (3), pp. 504–517. External Links: ISSN 01628828 Cited by: §I.
 [56] (2016) Scalable sparse subspace clustering by orthogonal matching pursuit. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3918–3927. Cited by: §VB.
 [57] (2020) Deep learning on graphs: a survey. IEEE Transactions on Knowledge and Data Engineering (), pp. 1–1. Cited by: §I, §IIE.
 [58] (2021) Data augmentation for graph neural networks. In The ThirtyFifth AAAI Conference on Artificial Intelligence, pp. . Cited by: §I.
 [59] (2018) Graph neural networks: A review of methods and applications. CoRR abs/1812.08434. Cited by: §I, §I, §IIB.
 [60] (2021) Interpreting and unifying graph neural networks with an optimization framework. arXiv preprint arXiv:2101.11859. Cited by: 1st item.