1 Introduction
In the era of the Internet, network-structured data has penetrated into every corner of life. Representative examples include shopping networks Shchur et al. (2018), social networks Piao et al. (2021), recommendation systems Huang et al. (2021), citation networks Wan et al. (2021), etc. Real-world scenarios such as these can be modeled as attributed graphs, i.e., topological graph structures with node attributes (or features). Due to the non-Euclidean topological graph structure and complex node attributes, most existing machine learning approaches cannot be directly applied to analyze such data. To this end, graph neural networks (GNNs) Kipf and Welling (2017) have emerged and developed rapidly in recent years. A GNN aims to learn low-dimensional node representations for downstream tasks by simultaneously encoding the topological graph and node attributes. In this article, we study the attributed graph clustering problem, one of the most challenging tasks in the field of AI.
Attributed graph clustering, i.e., node clustering, aims to divide massive numbers of nodes into several disjoint clusters without intense manual guidance. To date, numerous attributed graph clustering methods have been proposed Wang et al. (2017); Zhang et al. (2019); Park et al. (2019); Cheng et al. (2020); Fan et al. (2020); Xia et al. (2021); Lin and Kang (2021), most of which are based on the graph autoencoder (GAE) and variational GAE (VGAE) Kipf and Welling (2016). For example, to learn robust node representations, variants of GAE and VGAE were proposed by Pan et al. (2018, 2020), namely the adversarially regularized graph autoencoder (ARGA) and the adversarially regularized variational graph autoencoder (ARVGA). To build a clustering-directed network, inspired by deep embedding clustering (DEC) Xie et al. (2016), Wang et al. (2019) minimized the mismatch between the clustering distribution and a target distribution to improve the quality of node representations, and proposed the deep attentional embedded graph clustering (DAEGC) approach. Similarly, Bo et al. (2020) presented the structural deep clustering network (SDCN) to embed topological structure into deep clustering. SDCN uses a traditional autoencoder to obtain new node features by encoding node attributes, and then uses a GNN to simultaneously encode the topological structure and the new node features to learn the final node representations for clustering. Tu et al. (2021) proposed the deep fusion clustering network (DFCN), which uses a dynamic cross-modality fusion mechanism to obtain consensus node representations, thereby generating a more robust target distribution for network optimization. Although the aforementioned methods have made encouraging progress, how to mine the highly heterogeneous information embedded in the attributed graph remains to be explored.
Recently, due to its powerful unsupervised representation learning ability, contrastive learning (CL) has made vast inroads into the computer vision community Chen et al. (2020); He et al. (2020). Motivated by this, several recent studies Velickovic et al. (2019); Sun et al. (2020); Zhang et al. (2021); Qiu et al. (2020); You et al. (2020); Zhu et al. (2021); Jin et al. (2021); Zhao et al. (2021) show promising results on unsupervised graph representation learning (GRL) using approaches related to CL; we call this kind of method graph contrastive representation learning (GCRL for short in this paper). For example, Velickovic et al. (2019) proposed deep graph infomax (DGI) to learn node representations by contrasting local node-level representations and the global graph-level representation. Similarly, Sun et al. (2020) proposed to learn graph-level representations by maximizing the mutual information between the graph-level representation and the representations of substructures. Based on the contrastive loss in SimCLR Chen et al. (2020), You et al. (2020) proposed a new graph contrastive learning network with various graph augmentation approaches (GraphCL) to facilitate node representation learning. More recently, Zhu et al. (2021) first used adaptive graph augmentation schemes to construct different graph views, then extracted node representations by maximizing the agreement of node representations between graph views. Though driven by various motivations and achieving commendable results, many existing GCRL methods still face the following challenges:

They are task-agnostic and thus need post-processing to obtain clustering labels, resulting in suboptimal node representations for the downstream node clustering task.

They fail to benefit from imprecise clustering labels, thus suffering from inferior performance.

They cannot handle out-of-sample (OOS) nodes, which limits their application in practical engineering.
As shown in Figure 1, we propose self-supervised contrastive attributed graph clustering (SCAGC), a new attributed graph clustering approach that aims to address the aforementioned limitations. In SCAGC, we first leverage graph augmentation methods to generate abundant attributed graph views; each augmented attributed graph then has two compact representations: a clustering assignment probability produced by the clustering module, and a low-dimensional node representation produced by the graph representation learning module. The two representations interact with each other and jointly evolve in an end-to-end framework. Specifically, the clustering module is trained via a contrastive clustering loss to maximize the agreement between representations of the same cluster. The graph representation learning module is trained using the proposed self-supervised contrastive loss on pseudo labels, i.e., clustering labels, where nodes within the same cluster are trained to have similar representations. We perform experiments on four attributed graph datasets and compare with 11 state-of-the-art GRL and GCRL methods. The proposed SCAGC substantially outperforms all baselines across all benchmarks. The main contribution of the proposed SCAGC is twofold:
To the best of our knowledge, SCAGC could be the first contrastive attributed graph clustering work that requires no post-processing. SCAGC can directly predict the clustering assignment of a given unlabeled attributed graph. For OOS nodes, SCAGC can also directly calculate the clustering labels without retraining on the entire attributed graph, which accelerates the deployment of SCAGC in practical engineering.

By benefiting from the clustering labels, we propose a new self-supervised CL loss, which facilitates graph representation learning. Extensive experimental results demonstrate its effectiveness for attributed graph clustering.
2 Methodology
In this section, we first formalize the node clustering task on attributed graphs. Then, the overall framework of the proposed SCAGC will be introduced. Finally, we detail each component of the proposed network.
2.1 Problem Formalization
Given an arbitrary attributed graph $\mathcal{G} = \{\mathcal{V}, E, X\}$, where $\mathcal{V}$ is the vertex set, $E$ is the edge set, $X \in \mathbb{R}^{N \times d}$ is the node attribute matrix, $N$ is the number of nodes, and $d$ is the dimension of the node attributes. $A \in \mathbb{R}^{N \times N}$ is the adjacency matrix of $\mathcal{G}$, and $A_{ij} = 1$ iff $(v_i, v_j) \in E$, i.e., there is an edge from node $v_i$ to node $v_j$.
In this article, we study one of the most representative downstream tasks of GNNs, i.e., node clustering. The goal of node clustering is to divide the given $N$ unlabeled nodes into $K$ disjoint clusters, such that nodes in the same cluster have high similarity to each other Cui et al. (2020); Xia et al. (2021).
2.2 Overall Network Architecture
As shown in Figure 1, the network architecture of the proposed SCAGC consists of the following jointly optimized components: a shared graph convolutional encoder, a contrastive clustering module, and a self-supervised graph contrastive representation learning module.

Shared Graph Convolutional Encoder: It aims to simultaneously map the augmented node attributes and topological graph structure into a new low-dimensional space for the downstream node clustering task.

Self-Supervised GCRL Module: To learn more discriminative graph representations and utilize the useful information embedded in inaccurate clustering labels, this module is designed to maximize the similarities of intra-cluster nodes, i.e., positive pairs, while minimizing the similarities of inter-cluster nodes, i.e., negative pairs.

Contrastive Clustering Module: To directly obtain clustering labels, this module builds a clustering network by contrasting the representations of different clusters.
2.3 Shared Graph Convolutional Encoder
Graph contrastive representation learning has attracted much attention due to its ability to utilize graph augmentation schemes to generate positive and negative node pairs for representation learning You et al. (2020); Zhu et al. (2021). Specifically, given an arbitrary attributed graph with node attributes $X$ and topological graph $G$, two stochastic graph augmentation schemes $t^1 \sim \mathcal{T}$ and $t^2 \sim \mathcal{T}$ are leveraged to construct two correlated attributed graph views $\{X^1, G^1\}$ and $\{X^2, G^2\}$, where $\{X^v, G^v\} = t^v(X, G)$, $t^v$ is the $v$-th graph augmentation, and $\mathcal{T}$ denotes the set of all graph augmentation methods, including attribute masking and edge perturbation. To be specific, attribute masking randomly adds noise to node attributes, and edge perturbation randomly adds or drops edges in the topological graph. The underlying prior of these two graph augmentation schemes is that the intrinsic topological structure and node attributes of the attributed graph remain unchanged. Based on this prior, the learned node representations will be robust to perturbations of insignificant attributes and edges. In this article, we implement the graph augmentations following the settings in GCA Zhu et al. (2021).
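As an illustrative sketch, the two augmentation schemes described above can be implemented as follows. Function names, masking by zeroing (rather than adding noise), and the rates are our own simplifications, not the paper's GCA-based implementation.

```python
import numpy as np

def attribute_masking(x, mask_rate=0.2, rng=None):
    """Randomly mask (zero out) a fraction of attribute dimensions for all nodes."""
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape[1]) < mask_rate  # per-dimension Bernoulli mask
    x_aug = x.copy()
    x_aug[:, mask] = 0.0
    return x_aug

def edge_perturbation(adj, drop_rate=0.2, rng=None):
    """Randomly drop a fraction of existing edges in an undirected graph."""
    rng = rng or np.random.default_rng()
    adj_aug = adj.copy()
    rows, cols = np.triu_indices(adj.shape[0], k=1)  # upper triangle only
    drop = (adj[rows, cols] > 0) & (rng.random(rows.size) < drop_rate)
    adj_aug[rows[drop], cols[drop]] = 0
    adj_aug[cols[drop], rows[drop]] = 0  # keep the adjacency symmetric
    return adj_aug
```

Applying two independent draws of these functions to the same `(x, adj)` pair yields the two correlated views used by the rest of the pipeline.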
After obtaining the two augmented attributed graph views $\{X^1, G^1\}$ and $\{X^2, G^2\}$, we utilize a shared two-layer graph convolutional network to simultaneously encode the node attributes and topological graphs of the augmented views. Thus, we have

$$Z^{(1)v} = \sigma\!\left(\hat{D}^{-\frac{1}{2}} \hat{A}^v \hat{D}^{-\frac{1}{2}} X^v W^{(1)}\right), \quad (1)$$

$$Z^{v} = \sigma\!\left(\hat{D}^{-\frac{1}{2}} \hat{A}^v \hat{D}^{-\frac{1}{2}} Z^{(1)v} W^{(2)}\right), \quad (2)$$

where $Z^{(1)v}$ is the 1st layer's output of the shared GNN; $Z^v$ is the node representation under the $v$-th graph augmentation; $W^{(1)}$ and $W^{(2)}$ denote the trainable parameters of the graph convolutional encoder; $\hat{A}^v = A^v + I$; $\hat{D}_{ii} = \sum_j \hat{A}^v_{ij}$; $I$ is an identity matrix; and $\sigma(\cdot)$ represents the non-linear ReLU activation function.

So far, we have obtained the node representations $Z^1$ and $Z^2$ of the two augmented attributed graph views.
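A minimal NumPy sketch of this shared two-layer encoder, Eqs. (1)-(2), is shown below; the weight matrices `w1` and `w2` stand in for the trainable parameters, which would be learned by backpropagation in the actual model.

```python
import numpy as np

def normalize_adj(adj):
    """Symmetric GCN normalization D^{-1/2} (A + I) D^{-1/2} with self-loops."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_encoder(x, adj, w1, w2):
    """Two-layer GCN: Z = ReLU(A_norm @ ReLU(A_norm @ X @ W1) @ W2)."""
    a_norm = normalize_adj(adj)
    z1 = np.maximum(a_norm @ x @ w1, 0.0)      # first layer + ReLU
    return np.maximum(a_norm @ z1 @ w2, 0.0)   # second layer + ReLU
```

Because the weights are shared, the same `gcn_encoder` is called once per augmented view to produce the two node representations.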
2.4 SelfSupervised GCRL Module
In the field of GRL, contrastive learning has become an effective paradigm that maximizes the similarities of positive pairs while minimizing the similarities of negative pairs to learn discriminative graph representations. For a given attributed graph with $N$ nodes, there are $2N$ augmented nodes. Traditional CL regards the representations of a node under two different augmentations as a positive pair, and treats the other $2N-2$ pairs as negative (see Figure 2 (a)). While achieving promising performance, this assumption runs counter to the criterion of clustering. In node clustering, we hope that nodes in the same cluster have high similarity to each other while nodes in different clusters have low similarity to each other. However, existing methods fail to consider this criterion, i.e., they neglect the existence of false-negative pairs.
In this article, by leveraging the pseudo clustering labels, we can easily obtain the sample indices of different clusters. As shown in Figure 2 (b), we aim to maximize the similarities of intra-cluster nodes, i.e., positive pairs, while minimizing the similarities of inter-cluster nodes, i.e., negative pairs. To this end, we first map the node representations $Z^1$ and $Z^2$ to enhanced node representations $H^1$ and $H^2$ via a shared two-layer fully connected network, which also helps to preserve more of the information in $Z^1$ and $Z^2$, where $H^v \in \mathbb{R}^{N \times d'}$ and $d'$ is the dimension of the new node representation. After that, for the $i$-th node, we propose a new self-supervised contrastive loss function, which is defined as

$$\ell_i = \frac{-1}{|C(i)|} \sum_{j \in C(i)} \log \frac{\exp\!\left(\mathrm{sim}(h_i, h_j)/\tau\right)}{\sum_{t \in T(i)} \exp\!\left(\mathrm{sim}(h_i, h_t)/\tau\right)}, \quad (3)$$

where $\tau$ is the temperature parameter and $h_i$ represents the $i$-th row of the node representation $H$. $C(i)$ represents the set of nodes that belong to the same cluster as the $i$-th node, and $|C(i)|$ is its cardinality, which can be obtained from the pseudo clustering assignment matrix $M$. $T(i)$ is the set of indices of all nodes except the $i$-th node.

Then, taking all nodes into account, the self-supervised contrastive loss is

$$\mathcal{L}_{SSC} = \frac{1}{2N} \sum_{i=1}^{2N} \ell_i. \quad (4)$$
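The idea behind this loss can be sketched in NumPy as follows. This is a simplified, hypothetical implementation operating on one stacked matrix of embeddings with hard pseudo labels; in SCAGC the labels come from the clustering module and the loss is optimized with autograd.

```python
import numpy as np

def self_supervised_contrastive_loss(h, labels, tau=0.5):
    """Supervised-contrastive-style loss: for each node, every node with the
    same pseudo clustering label is a positive; all other nodes are negatives."""
    h = h / np.linalg.norm(h, axis=1, keepdims=True)  # rows unit-norm -> cosine sim
    sim = np.exp(h @ h.T / tau)
    n = h.shape[0]
    total = 0.0
    for i in range(n):
        pos = labels == labels[i]
        pos[i] = False                    # a node is not its own positive
        if not pos.any():
            continue                      # singleton cluster: no positives
        denom = sim[i].sum() - sim[i, i]  # sum over all nodes except i
        total += -np.log(sim[i, pos] / denom).mean()
    return total / n
```

When the embeddings of same-cluster nodes are close, the per-positive ratios approach their maximum and the loss drops, which is exactly the behavior Eq. (3) rewards.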
2.5 Contrastive Clustering Module
How to obtain the clustering labels is crucial for the downstream clustering task. Most existing methods directly apply classical clustering algorithms, e.g., K-means or spectral clustering, to the learned node representations to get clustering results. However, such a strategy executes node representation learning and clustering in two separate steps, which limits clustering performance. To this end, we build a clustering network to directly obtain the clustering labels. Specifically, as shown in Figure 1, the clustering network transforms the pattern structures of $Z^1$ and $Z^2$ into probability distributions over clustering labels, $Q^1$ and $Q^2$. To share the parameters across augmentations, we feed $Z^1$ and $Z^2$ through a shared two-layer fully connected network; under this setting, we can ensure $Q^1$ and $Q^2$ share the same coding scheme. Thus, $Q^1 \in \mathbb{R}^{N \times K}$ is the output of the clustering network under the 1st augmented attributed graph view, and $Q^2 \in \mathbb{R}^{N \times K}$ for the 2nd augmented attributed graph view, where $K$ is the number of clusters and $Q^v_{ik}$ represents the probability of assigning the $i$-th node to the $k$-th cluster.
For the obtained assignment matrices $Q^1$ and $Q^2$, in the column direction, the $k$-th column $q^v_k$ of $Q^v$ is the representation of the $k$-th cluster. Thus, we should push the cluster representations of the same class closer together, and push the cluster representations of different classes apart. That is to say, for the $k$-th cluster in each augmented attributed graph view, there is only one positive pair $(q^1_k, q^2_k)$, and $2K-2$ negative pairs. To this end, motivated by the great success of contrastive learning Chen et al. (2020), we leverage the contrastive loss function to implement this constraint. Thus, for the $k$-th cluster in the 1st augmentation, we have

$$\tilde{\ell}^{\,1}_k = -\log \frac{\exp\!\left(\mathrm{sim}(q^1_k, q^2_k)/\tau'\right)}{\sum_{j=1}^{K} \exp\!\left(\mathrm{sim}(q^1_k, q^2_j)/\tau'\right) + \sum_{j \neq k} \exp\!\left(\mathrm{sim}(q^1_k, q^1_j)/\tau'\right)}, \quad (5)$$

where $\tau'$ is a temperature parameter that controls the softness. Given two vectors $f$ and $s$, $\mathrm{sim}(f, s) = \frac{f^\top s}{\|f\| \|s\|}$ is the cosine similarity between them; in this article, we use this function to measure the similarity of node pairs. Then, taking all positive pairs into account, the contrastive clustering loss is defined as

$$\mathcal{L}_{CC} = \frac{1}{2K} \sum_{k=1}^{K} \left(\tilde{\ell}^{\,1}_k + \tilde{\ell}^{\,2}_k\right). \quad (6)$$
Moreover, to avoid a trivial solution, i.e., to make sure that all nodes can be evenly assigned to all clusters, similar to Li et al. (2021); Mao et al. (2021), we introduce a clustering regularizer $\mathcal{L}_{REG}$, which is defined as

$$\mathcal{L}_{REG} = \sum_{v=1}^{2} \sum_{k=1}^{K} p^v_k \log p^v_k, \quad (7)$$

where $p^v_k = \frac{1}{N} \sum_{i=1}^{N} Q^v_{ik}$ is the average probability of assigning nodes to the $k$-th cluster under the $v$-th view.
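For a single view, this regularizer can be sketched as follows (the small epsilon inside the logarithm is our own numerical-safety addition, not part of the formulation):

```python
import numpy as np

def cluster_regularizer(q):
    """Negative entropy of the mean cluster assignment; it is minimized when
    nodes are spread evenly over the K clusters, discouraging the trivial
    solution of collapsing everything into one cluster."""
    p = q.mean(axis=0)   # average assignment probability per cluster
    p = p / p.sum()      # renormalize
    return float((p * np.log(p + 1e-12)).sum())
```

With a perfectly balanced assignment the regularizer attains its minimum value of -log K, while a degenerate assignment pushes it toward 0.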
In the training process of the proposed SCAGC, when we take the unaugmented attributed graph as the input of SCAGC, we can get the clustering assignment matrix $M$ by discretizing the continuous output probabilities $Q$.
Remark 1
Handling out-of-sample nodes. For OOS nodes, SCAGC can directly take them as input to calculate the clustering assignment matrix. In contrast, existing GRL- and GCRL-based methods are inefficient for OOS nodes, as they require retraining on the whole attributed graph, i.e., $\{X, G\}$.
2.6 Optimization
Finally, we integrate the aforementioned three sub-modules into an end-to-end optimization framework. The overall objective function of SCAGC can be formulated as

$$\mathcal{L} = \mathcal{L}_{SSC} + \mathcal{L}_{CC} + \lambda \mathcal{L}_{REG}, \quad (8)$$

where $\lambda$ is a trade-off parameter. By optimizing Eq. (8), nodes with correct labels propagate useful information for graph representation learning, which in turn is used to conduct the subsequent clustering. By this strategy, node clustering and graph representation learning are seamlessly connected, with the aim of achieving better clustering results. We employ the Adam optimizer Kingma and Ba (2015) to optimize the proposed SCAGC, i.e., Eq. (8). Algorithm 1 presents the pseudo-code for optimizing SCAGC.
Dataset  # Nodes  # Attribute dimension  # Edges  # Classes  Type  Scale

ACM Tang et al. (2008)  3,025  1,870  29,281  3  Paper relationship  Small
DBLP Pan et al. (2016)  4,057  334  5,000,495  4  Author relationship  Small
Amazon-Photo Shchur et al. (2018)  7,650  745  119,081  8  Commodity purchase relationship  Medium
Amazon-Computers Shchur et al. (2018)  13,752  767  245,861  10  Commodity purchase relationship  Large
3 Experiments
3.1 Experiment Setup
3.1.1 Benchmark Datasets
In this article, we use four real-world attributed graph datasets from different domains, e.g., academic networks and shopping networks, to evaluate the effectiveness of the proposed SCAGC, including ACM (http://dl.acm.org), DBLP (https://dblp.uni-trier.de/), Amazon-Photo (https://github.com/shchur/gnn-benchmark/raw/master/data/npz/amazon_electronics_photo.npz) and Amazon-Computers (https://github.com/shchur/gnn-benchmark/raw/master/data/npz/amazon_electronics_computers.npz). Table 1 presents detailed statistics of these datasets.
3.1.2 Baseline Methods
We compare clustering performance of the proposed SCAGC with 11 stateoftheart node clustering methods, including the following three categories:

Classical clustering methods: K-means and spectral clustering (SC);

GAE-based methods: GAE, VGAE, ARGA, ARVGA, DAEGC, SDCN, and DFCN;

GCRL-based methods: GraphCL and GCA.

For the first category, K-means takes the raw node attributes as input, and SC takes the raw topological graph structure as input. The second and third categories take both the raw node attributes and the topological graph structure as input. For GAE, VGAE, ARGA, ARVGA, SDCN, DFCN, GraphCL and GCA, the clustering assignment matrix is obtained by running K-means on the extracted node representations.
3.1.3 Evaluation Metrics
Similar to Bo et al. (2020); Tu et al. (2021), we leverage four commonly used metrics to evaluate the performance of all methods, i.e., accuracy (ACC), normalized mutual information (NMI), adjusted Rand index (ARI), and macro F1-score (F1). For these metrics, the higher the value, the better the performance.
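Since cluster indices are arbitrary, ACC and F1 require matching predicted clusters to ground-truth classes before scoring. A sketch of this common evaluation recipe, using the Hungarian algorithm for the matching (our own helper, not the paper's released code), is:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn import metrics

def evaluate_clustering(y_true, y_pred):
    """ACC/NMI/ARI/F1 with optimal cluster-to-class mapping for ACC and F1."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    k = int(max(y_true.max(), y_pred.max())) + 1
    cost = np.zeros((k, k), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cost[t, p] += 1                          # contingency table
    rows, cols = linear_sum_assignment(-cost)    # maximize matched counts
    mapping = {c: r for r, c in zip(rows, cols)}
    y_mapped = np.array([mapping[p] for p in y_pred])
    return {
        "ACC": float((y_mapped == y_true).mean()),
        "NMI": metrics.normalized_mutual_info_score(y_true, y_pred),
        "ARI": metrics.adjusted_rand_score(y_true, y_pred),
        "F1": metrics.f1_score(y_true, y_mapped, average="macro"),
    }
```

NMI and ARI are permutation-invariant, so they are computed directly on the raw predicted labels.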
Dataset  ACM  DBLP  

Metric  ACC (%)  NMI (%)  F1 (%)  ARI (%)  ACC (%)  NMI (%)  F1 (%)  ARI (%)
K-Means  67.26±0.75  31.91±0.35  54.47±0.32  30.76±0.62  39.08±0.36  10.11±0.21  38.01±0.37  7.28±0.29
SC  36.80±0.00  0.75±0.00  42.63±0.00  0.58±0.00  29.57±0.01  0.08±0.00  40.86±0.00  0.70±0.00
GAE (NeurIPS' 16)  82.47±0.92  50.29±1.86  82.65±0.89  54.59±1.99  59.25±0.40  26.37±0.29  59.84±0.32  20.95±0.43
VGAE (NeurIPS' 16)  82.85±0.63  50.22±1.24  82.85±0.62  55.56±1.15  62.22±0.83  26.62±1.37  60.70±0.85  25.08±1.23
ARGA (IEEE TC' 20)  86.85±0.64  58.05±1.53  86.84±0.60  64.77±1.53  64.60±0.95  28.65±0.63  64.49±0.63  27.44±1.27
ARVGA (IEEE TC' 20)  84.84±0.36  52.89±0.84  84.86±0.35  59.67±0.85  64.10±0.96  31.01±0.89  64.36±1.01  25.69±1.51
DAEGC (IJCAI' 19)  87.18±0.05  59.32±0.12  87.27±0.05  65.46±0.12  75.87±0.46  42.45±0.58  75.41±0.45  46.80±0.87
SDCN (WWW' 20)  89.44±0.26  65.89±0.95  89.40±0.28  71.47±0.67  71.91±0.57  37.80±1.06  71.21±0.73  40.45±1.18
DFCN (AAAI' 21)  90.15±0.05  67.98±0.18  90.14±0.05  73.25±0.14  75.42±0.82  43.20±0.74  75.31±0.71  45.07±1.91
GraphCL (NeurIPS' 20)  90.18±0.04  68.24±0.12  90.04±0.05  73.38±0.09  74.90±0.10  45.14±0.14  74.51±0.10  45.86±0.19
GCA (WWW' 21)  88.95±0.26  65.33±0.56  89.07±0.26  69.82±0.67  73.90±0.48  41.35±0.79  72.91±0.76  43.65±0.65
SCAGC  91.83±0.03  71.28±0.06  91.84±0.03  77.29±0.07  79.42±0.02  49.05±0.02  78.88±0.02  54.04±0.03
3.1.4 Implementation Details
The proposed SCAGC and the baseline methods are implemented on a Windows 10 machine with an Intel(R) Xeon(R) Gold 6230 CPU and dual NVIDIA Tesla P100-PCIE GPUs. The deep learning environment consists of PyTorch 1.6.0, PyTorch Geometric 1.6.1, and TensorFlow 1.13.1. To ensure the availability of the initial pseudo clustering assignment matrix, we pre-train the shared graph convolutional encoder and the graph contrastive representation learning module via a classic contrastive learning loss. The hyperparameters of the proposed method on each dataset are reported in the supplementary material. In this article, we use the adaptive graph augmentation functions proposed by Zhu et al. (2021) to augment the node attributes and topological structure. Notably, degree centrality is used as the node centrality function to generate different topological graph views. The output size of the shared graph convolutional encoder is set to 256, the output size of the graph contrastive representation learning sub-network is set to 128, and the output size of the contrastive clustering sub-network is set to the number of clusters K.
For all baseline methods, we follow the hyperparameter settings reported in their articles and run their released code to obtain the clustering results. To account for the randomness of the clustering results, we repeat each experiment of SCAGC and the baseline methods 10 times and report the average values and the corresponding standard deviations.
Dataset  Amazon-Photo  Amazon-Computers

Metric  ACC (%)  NMI (%)  F1 (%)  ARI (%)  ACC (%)  NMI (%)  F1 (%)  ARI (%)
K-Means  36.53±4.11  19.31±3.75  32.63±1.90  12.61±3.54  36.44±2.64  16.64±4.59  28.08±1.44  2.71±1.98
SC  25.58±0.02  0.60±0.02  5.50±0.00  0.03±0.00  36.47±0.01  0.37±0.02  5.81±0.00  0.59±0.00
GAE (NeurIPS' 16)  42.03±0.54  31.87±0.51  34.01±0.42  19.31±0.53  43.14±1.74  35.47±1.58  27.06±2.63  19.61±1.85
VGAE (NeurIPS' 16)  40.67±0.92  31.46±2.03  38.01±2.67  15.70±1.18  42.44±0.16  37.62±0.23  24.94±0.14  22.16±0.35
ARGA (IEEE TC' 20)  57.79±2.26  48.01±1.65  52.56±2.68  34.44±1.58  45.67±0.37  37.21±0.92  40.02±1.29  26.28±1.02
ARVGA (IEEE TC' 20)  47.89±1.36  41.37±1.39  42.96±1.46  27.72±1.06  47.16±0.26  38.84±0.96  41.51±0.83  27.27±0.84
DAEGC (IJCAI' 19)  60.14±0.93  58.03±1.25  52.37±2.39  43.55±1.76  49.26±0.49  39.28±4.97  33.71±5.76  35.29±1.97
SDCN (WWW' 20)  71.43±0.31  64.13±0.10  68.74±0.22  51.17±0.13  54.12±1.13  39.90±1.51  28.84±4.20  31.59±1.08
DFCN (AAAI' 21)  73.43±0.61  64.74±1.04  69.96±0.49  52.39±1.01  56.24±0.16  41.83±0.40  33.39±1.11  33.02±0.39
GraphCL (NeurIPS' 20)  66.61±0.56  57.35±0.32  58.52±0.55  45.13±0.44  50.22±0.66  41.78±2.44  32.89±2.16  36.94±3.20
GCA (WWW' 21)  71.17±0.27  60.70±0.41  64.12±1.21  49.09±0.62  54.92±0.55  44.36±0.86  40.43±0.45  35.61±0.62
SCAGC  75.25±0.10  67.18±0.13  72.77±0.16  56.86±0.23  58.43±0.12  49.92±0.08  43.14±0.09  38.29±0.07
3.2 Node Clustering Performance
Table 2 and Table 3 present the node clustering results of the proposed SCAGC and all baseline methods. From these results, we have the following observations:

The proposed SCAGC and the other GCN-based methods (GAE, VGAE, ARGA, ARVGA, DAEGC, SDCN, DFCN, GraphCL, GCA) significantly and consistently outperform K-Means and SC. The reason may be that GCN-based methods simultaneously explore the information embedded in the node attributes and the topological graph structure, whereas the classical clustering methods only use the node attributes or the topological structure. Moreover, compared with classical clustering methods, GCN-based methods use a multi-layer non-linear graph neural network as the feature extractor, mapping the input data into a new subspace in which to carry out downstream clustering. These results demonstrate the effectiveness of GCNs in processing attributed graph data.

The proposed SCAGC achieves much better clustering results than representative graph autoencoders (GAE, VGAE, ARGA, ARVGA). This is because, compared with traditional graph autoencoders, SCAGC leverages graph augmentation schemes to generate useful attributed graph views and takes the relationships between positive and negative pairs into account. These strategies help improve the quality of the node representations.

In some cases, the clustering performance of the GCRL-based baselines, i.e., GraphCL and GCA, is inferior to that of clustering-directed methods, i.e., DAEGC, SDCN, DFCN and the proposed SCAGC. This is because SCAGC integrates node clustering and representation learning into an end-to-end framework, which helps to better explore the cluster structure. In contrast, GraphCL and GCA execute node representation learning and clustering in two separate steps, which limits their performance.

The proposed SCAGC consistently outperforms all the state-of-the-art baselines on all four datasets. In particular, SCAGC surpasses the closest competitor GCA by 5.95% on ACM and 7.7% on DBLP in terms of NMI. This remarkable performance verifies the clustering ability of SCAGC, and demonstrates that the contrastive clustering module and the self-supervised graph contrastive representation learning module effectively benefit node representation learning and clustering.
3.3 Ablation Studies
To illustrate the effectiveness of the different components in SCAGC, we implement two ablation scenarios to verify the contribution of the contrastive clustering module and of the proposed self-supervised GCRL loss.
3.3.1 Effect of Contrastive Clustering Module
To illustrate the effectiveness of the contrastive clustering module, we compare the clustering results of SCAGC and SCAGC without the contrastive clustering module (termed SCAGC w/o CCM) on the ACM and DBLP datasets. Note that, in this scenario, SCAGC w/o CCM is trained using a traditional contrastive loss Chen et al. (2020); Zhu et al. (2021), i.e., SCAGC w/o CCM is clustering-agnostic. As shown in Figure 3 (a-b), the clustering performance of SCAGC (see the red bars) is substantially superior to that of SCAGC w/o CCM (see the yellow bars). This is because SCAGC can better extract node representations by benefiting from the contrastive clustering module, whereas, in the absence of the specific clustering task, SCAGC w/o CCM fails to explore the cluster structure, resulting in a sharp performance drop.
3.3.2 Importance of the Proposed SelfSupervised GCRL Loss
To verify the importance of the proposed self-supervised GCRL loss, we compare the clustering performance of SCAGC and SCAGC without the self-supervised GCRL loss (termed SCAGC w/o SSC) on the ACM and DBLP datasets. Note that, in this scenario, SCAGC w/o SSC is trained by replacing the first term of Eq. (8), i.e., Eq. (3), with a standard contrastive loss Chen et al. (2020); Zhu et al. (2021). As reported in Figure 3 (a-b), SCAGC (see the red bars) always achieves the best performance in terms of all four metrics. These results demonstrate that pseudo-label supervision guides the GCRL; thus, leveraging clustering labels is a promising strategy for the unsupervised clustering task.
3.4 Model Discussion
3.4.1 Visualizations of Clustering Results
By simultaneously exploiting the good properties of GCRL and taking advantage of the clustering labels, SCAGC ought to learn discriminative node representations and desirable clustering labels at the same time. To illustrate how SCAGC achieves this goal, as shown in Figure 4, we apply t-SNE van der Maaten and Hinton (2008) to the learned $M$ at four different training iterations on the ACM and DBLP datasets, where different colors indicate different clustering labels predicted by SCAGC. As observed, the cluster assignments become more reasonable, and different clusters scatter and gather more distinctly. These results indicate that the learned node representations become more compact and discriminative as the number of iterations increases.
3.4.2 Convergence Analysis
Taking the ACM dataset as an example, we investigate the convergence of SCAGC. We record the objective values and clustering results of SCAGC over iterations and plot them in Figure 5. As shown in Figure 5, the objective value (see the blue line) decreases sharply in the first 100 iterations, then continuously decreases until convergence. Moreover, the ACC of SCAGC increases to a maximum within the first 200 iterations, and then remains generally stable with slight variation. The curve of the NMI metric has a similar trend. These observations clearly indicate that SCAGC converges quickly.
4 Conclusion and Future Work
To conclude, we propose a novel self-supervised contrastive attributed graph clustering (SCAGC) approach, which can directly predict the clustering labels of an unlabeled attributed graph and handle out-of-sample nodes. We also propose a new self-supervised contrastive loss based on imprecise clustering labels to improve the quality of node representations. We believe the proposed SCAGC will help facilitate the exploration of attributed graphs whose labels are time- and labor-consuming to acquire. In the future, we will study how to better explore the reliable information embedded in imprecise clustering labels and use it to improve the contrastive loss.
References
Bo et al. (2020) Structural deep clustering network. In WWW, pp. 1400–1410.
Chen et al. (2020) A simple framework for contrastive learning of visual representations. In ICML, pp. 1597–1607.
Cheng et al. (2020) Multi-view attribute graph convolution networks for clustering. In IJCAI, pp. 2973–2979.
Cui et al. (2020) Adaptive graph encoder for attributed graph embedding. In ACM SIGKDD, pp. 976–985.
Fan et al. (2020) One2Multi graph autoencoder for multi-view graph clustering. In WWW, pp. 3070–3076.
He et al. (2020) Momentum contrast for unsupervised visual representation learning. In IEEE CVPR, pp. 9726–9735.
Huang et al. (2021) Knowledge-aware coupled graph neural network for social recommendation. In AAAI, pp. 4115–4122.
Jin et al. (2021) Multi-scale contrastive siamese networks for self-supervised graph representation learning. In IJCAI, pp. 1477–1483.
Kingma and Ba (2015) Adam: a method for stochastic optimization. In ICLR.
Kipf and Welling (2016) Variational graph auto-encoders. In NeurIPS Workshop on Bayesian Deep Learning.
Kipf and Welling (2017) Semi-supervised classification with graph convolutional networks. In ICLR.
Li et al. (2021) Contrastive clustering. In AAAI, pp. 8547–8555.
Lin and Kang (2021) Graph filter-based multi-view attributed graph clustering. In IJCAI, pp. 2723–2729.
Mao et al. (2021) Deep mutual information maximin for cross-modal clustering. In AAAI, pp. 8893–8901.
Pan et al. (2020) Learning graph embedding with adversarial training methods. IEEE Trans. Cybern. 50(6), pp. 2475–2487.
Pan et al. (2018) Adversarially regularized graph autoencoder for graph embedding. In IJCAI, pp. 2609–2615.
Pan et al. (2016) Tri-party deep network representation. In IJCAI, pp. 1895–1901.
Park et al. (2019) Symmetric graph convolutional autoencoder for unsupervised graph representation learning. In IEEE ICCV, pp. 6518–6527.
Piao et al. (2021) Predicting customer value with social relationships via motif-based graph attention networks. In WWW, pp. 3146–3157.
Qiu et al. (2020) GCC: graph contrastive coding for graph neural network pre-training. In ACM SIGKDD, pp. 1150–1160.
Shchur et al. (2018) Pitfalls of graph neural network evaluation. In NeurIPS Workshop on Relational Representation Learning.
Sun et al. (2020) InfoGraph: unsupervised and semi-supervised graph-level representation learning via mutual information maximization. In ICLR.
Tang et al. (2008) ArnetMiner: extraction and mining of academic social networks. In ACM SIGKDD, pp. 990–998.
Tu et al. (2021) Deep fusion clustering network. In AAAI, pp. 9978–9987.
van der Maaten and Hinton (2008) Visualizing data using t-SNE. Journal of Machine Learning Research 9(86), pp. 2579–2605.
Velickovic et al. (2019) Deep graph infomax. In ICLR.
Wan et al. (2021) Contrastive and generative graph convolutional networks for graph-based semi-supervised learning. In AAAI, pp. 10049–10057.
Wang et al. (2019) Attributed graph clustering: a deep attentional embedding approach. In IJCAI, pp. 3670–3676.
Wang et al. (2017) MGAE: marginalized graph autoencoder for graph clustering. In CIKM, pp. 889–898.
Xia et al. (2021) Self-supervised graph convolutional network for multi-view clustering. IEEE Trans. Multim., doi: 10.1109/TMM.2021.3094296.
Xie et al. (2016) Unsupervised deep embedding for clustering analysis. In ICML, Vol. 48, pp. 478–487.
You et al. (2020) Graph contrastive learning with augmentations. In NeurIPS.
Zhang et al. (2021) Deep contrastive graph representation via adaptive homotopy learning. CoRR abs/2106.09244.
Zhang et al. (2019) Attributed graph clustering via adaptive graph convolution. In IJCAI, pp. 4327–4333.
Zhao et al. (2021) Graph debiased contrastive learning with joint representation clustering. In IJCAI, pp. 3434–3440.
Zhu et al. (2021) Graph contrastive learning with adaptive augmentation. In WWW, pp. 2069–2080.