1 Introduction
Graphs are essential tools for mining connections among data. Owing to their non-Euclidean topological structure, Graph Convolutional Networks (GCNs) can better handle complex graph data in which the connection information among nodes is unordered and variable in size. Analyzing graph data plays an important role in various data mining tasks, including node classification [Kipf and Welling, 2016a], link prediction [Pan et al., 2018] and node clustering [Zhang et al., 2019]. Graph neural networks are roughly divided into four categories: recurrent graph neural networks (RecGNNs), convolutional graph neural networks (ConvGNNs), graph autoencoders (GAEs), and spatial-temporal graph neural networks (STGNNs) [Wu et al., 2019a].
The graph autoencoder (GAE) is an effective unsupervised learning framework that encodes the node features and graph structure into latent representations and decodes the graph structure from them. GAEs can be used to learn network embeddings and graph generative distributions. For network embedding, a GAE learns the latent representations mainly by reconstructing graph structure such as the graph adjacency matrix, and many approaches build GAEs for network embedding learning, such as DNGR [Cao et al., 2016], SDNE [Wang et al., 2016] and ARVGA [Pan et al., 2018]. For graph generation, a GAE encodes graphs into latent representations and decodes a graph structure from them to learn the generative distribution of graphs [Li et al., 2018].
Network embedding (NE) methods aim to learn low-dimensional latent representations of the nodes in a network. These representations can be used as features for a wide range of tasks on graphs, such as classification, clustering and link prediction. Classical methods such as PCA [Wold et al., 1987; Zhang and Tong, 2019] and Laplacian eigenmaps (LE) [Belkin and Niyogi, 2002] have been widely used for network embedding. GAEs can also address network embedding problems; the main distinction between GAEs and network embedding is that a GAE is designed for various tasks, while network embedding covers various kinds of methods targeting the same task.
Clustering plays an important role in detecting communities and analyzing the structure of networks, and is an important task in network embedding. Compared with classical methods such as Spectral Clustering [Ng et al., 2002] and CAN [Nie et al., 2014], network embedding algorithms such as DEC [Xie et al., 2016] are more effective and highly nonlinear. However, some early network embedding methods, such as DeepWalk [Perozzi et al., 2014] and node2vec [Grover and Leskovec, 2016], are only applicable to unweighted graphs, which restricts their development in the weighted-graph field. Some GAE methods such as [Kipf and Welling, 2016b] focus only on reconstructing the graph adjacency matrix while ignoring the reconstruction of node features for network embedding, which may lead to overfitting given the capacity of the autoencoders. Moreover, the fixed adjacency matrix prevents these methods from properly representing the connections among nodes in latent space. To solve these problems, we propose a novel Graph Convolutional Autoencoder with Bi-decoder and Adaptive-sharing Adjacency method, namely BAGA. The contributions are summarized below:
Equipped with the bi-decoder, our framework can minimize the reconstruction losses of node features and graph structure simultaneously, avoiding the overfitting caused by reconstructing only the graph adjacency matrix.
The proposed method updates the adjacency matrix adaptively from the latent representations, which results in a clearer graph structure and more accurate connections.
The embedded adjacency matrix can not only be adaptively updated but also be shared with the Laplacian graph construction. Experimental results support the effectiveness of updating the adjacency matrix.
2 Related Work
In this section, we outline the background of graph autoencoder and network embedding by listing related literature.
2.1 Graph Autoencoder
As an important unsupervised learning framework, the graph autoencoder (GAE) has aroused considerable research interest; an elaborate survey can be found in [Wu et al., 2019b]. A GAE is a deep neural architecture that maps graph data into a latent space and decodes graph structure information from the latent representations. Earlier approaches such as DNGR [Cao et al., 2016] and SDNE [Wang et al., 2016] build GAE frameworks for network embedding mainly with multi-layer perceptrons. They employ deep autoencoders to preserve graph proximities and to model positive pointwise mutual information (PPMI). Nevertheless, DNGR and SDNE only consider node structure information and ignore node feature information. The Variational Graph Autoencoder (VGAE) [Kipf and Welling, 2016b] is a variational version of GAE that learns the distribution of the data.
2.2 Network Embedding
The goal of network embedding is to find latent low-dimensional representations of nodes that preserve their topological information. Some methods preserve observed pairwise similarity and structural equivalence. For example, DeepWalk [Perozzi et al., 2014] uses random walks to generate sequences of nodes from a network, transforming unweighted graph structure information into linear sequences. Inspired by DeepWalk, DRNE [Tu et al., 2018] adopts a Long Short-Term Memory (LSTM) network to aggregate a node's neighbors. Similar to DRNE, NetRA [Yu et al., 2018] also uses an LSTM network with random walks rooted at each node and regularizes the learned network embeddings toward a prior distribution via adversarial training.
3 Problem and Framework
In this section, we define problem-related concepts and propose our framework.
3.1 Problem Formulation
Given an undirected weighted graph $G = (V, E, X)$, where $V = \{v_1, v_2, \ldots, v_n\}$ is the set of nodes with $|V| = n$, $E$ is the set of edges among the nodes, which can be represented as an adjacency matrix $A \in \mathbb{R}^{n \times n}$, and $X \in \mathbb{R}^{n \times d}$ is the feature matrix of all nodes, i.e., $X = [x_1; x_2; \ldots; x_n]$, where $x_i \in \mathbb{R}^{d}$ is the feature vector of node $v_i$. $n$ is the number of nodes and $d$ is the dimension of the input data. Edge weights in the adjacency matrix $A$ are real numbers denoting the degree of connection among nodes rather than simply $\{0, 1\}$ as in an unweighted graph. Although edge weights in weighted graphs can be negative, we only consider nonnegative weights in this paper, and the weights between a node and the other nodes connected to it sum to 1.
Our goal is to learn a latent representation of each node in the formal format $Z = [z_1; z_2; \ldots; z_n] \in \mathbb{R}^{n \times m}$, where $z_i$ is the $i$-th row of the matrix $Z$ and $m$ is the dimension of the embedding. $Z$ is the embedding matrix in latent space, which should preserve the topological structure as well as the node feature information. Furthermore, the latent representations should represent the connections among nodes accurately enough to partition the nodes of the graph into clusters.
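As a quick illustration of the constraint above (nonnegative weights that sum to 1 per node), the rows of a weighted adjacency matrix can be normalized as follows. This is a minimal NumPy sketch; the function name and the toy matrix are ours, not from the paper:

```python
import numpy as np

def row_normalize_adjacency(A):
    """Scale each row of a nonnegative weighted adjacency matrix so that
    the weights from a node to its neighbours sum to 1."""
    A = np.asarray(A, dtype=float)
    assert (A >= 0).all(), "only nonnegative edge weights are considered"
    row_sums = A.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # isolated nodes keep an all-zero row
    return A / row_sums

# toy weighted graph with 3 nodes
A = np.array([[0.0, 2.0, 2.0],
              [1.0, 0.0, 3.0],
              [0.0, 4.0, 0.0]])
A_norm = row_normalize_adjacency(A)
print(A_norm.sum(axis=1))  # each row sums to 1
```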
3.2 Overall Framework
Our objective is to learn better latent representations of the nodes and more accurate connections in the graph. To this end, we propose a novel graph convolutional autoencoder with bi-decoder and adaptive-sharing adjacency whose adjacency matrix can be adaptively updated. Fig. 1 shows the workflow of BAGA, which consists of three modules: the graph convolutional encoder, the bi-decoder, and the update of the adjacency matrix $A$.
Graph Convolutional Encoder The proposed framework takes the adjacency matrix $A$ and feature matrix $X$ as input to learn the latent representations $Z$.
Bi-decoder The framework reconstructs the node features and the adjacency matrix from the learned $Z$ simultaneously, which harnesses the overall loss of the network.
Adjacency Matrix Update Since the feature matrix is refined into better latent representations during the iterations, the adjacency matrix also needs to be adaptively updated so as to better represent the connections among nodes in latent space. The adaptive update of the adjacency matrix is the most important contribution of this paper.
4 Proposed Algorithm
The graph convolutional autoencoder in the framework has a graph convolutional encoder and a bi-decoder, which includes node feature reconstruction and graph structure reconstruction. To better maintain the topological structure, the latent representations are embedded into the Laplacian graph construction.
4.1 Graph Convolutional Encoder Model
The neural network of BAGA consists of $L+1$ layers: $L$ transformation layers and an initial layer, where $L$ is an even number. The first $L/2$ hidden layers form the encoder, which learns a layer-wise transformation by a spectral convolution function

$Z^{(l+1)} = f(Z^{(l)}, A; W^{(l)})$,   (1)

where $Z^{(l)}$ is the input for the convolution, $Z^{(l+1)}$ is the output after the convolution, and $l$ indexes the layers of the encoder network. $W^{(l)}$ is the weight parameter matrix to be learned by the neural network. In this paper, $Z^{(0)} = X$ is the input node feature matrix. Specifically, each layer of our graph convolutional network can be calculated as follows:

$f(Z^{(l)}, A; W^{(l)}) = \phi\big(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} Z^{(l)} W^{(l)}\big)$.   (2)

Here, $\tilde{A} = A + I_n$, $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$, $I_n$ is the identity matrix of size $n$, and $\phi(\cdot)$ is the activation function, such as $\mathrm{Relu}(t) = \max(0, t)$ or $\mathrm{sigmoid}(t) = 1/(1 + e^{-t})$. Two transformations are performed alternately in the encoder: an activation layer with a $\mathrm{Relu}(\cdot)$ activation function and a linear layer. The graph encoder is constructed as follows:

$Z^{(1)} = f_{\mathrm{Relu}}(Z^{(0)}, A; W^{(0)}), \quad Z = Z^{(2)} = f_{\mathrm{linear}}(Z^{(1)}, A; W^{(1)})$.   (3)
This process encodes both the graph structure and the node features into the latent representations $Z$, which are shared by the encoder and the decoder.
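The encoder described above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' implementation: the helper names, layer sizes and random toy inputs are ours, and we use the standard symmetrically normalized propagation rule of Eq. (2):

```python
import numpy as np

def relu(t):
    return np.maximum(0.0, t)

def gcn_layer(Z, A, W, activation=None):
    """One spectral graph convolution, Eq. (2):
    phi(D^{-1/2} (A + I) D^{-1/2} Z W)."""
    n = A.shape[0]
    A_tilde = A + np.eye(n)                    # add self-loops
    d = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt  # symmetric normalization
    out = A_hat @ Z @ W
    return activation(out) if activation is not None else out

# two-layer encoder as in Eq. (3): a ReLU layer followed by a linear layer
rng = np.random.default_rng(0)
n, d_in, h, m = 5, 8, 4, 2                     # toy sizes (ours)
X = rng.normal(size=(n, d_in))
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.maximum(A, A.T)                         # make the toy graph undirected
W0 = rng.normal(size=(d_in, h))
W1 = rng.normal(size=(h, m))
Z = gcn_layer(gcn_layer(X, A, W0, relu), A, W1)  # latent representations
```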
4.2 Bidecoder Model
The last $L/2$ layers are the decoders, which reconstruct both the graph structure and the node features from the learned latent representations. The node feature decoder is the mirror-symmetric process of the encoder, which also has a linear layer and an activation layer with a $\mathrm{Relu}(\cdot)$ function. The node feature decoder is constructed as follows:

$Z^{(3)} = f_{\mathrm{linear}}(Z^{(2)}, A; W^{(2)}), \quad \hat{X} = Z^{(4)} = f_{\mathrm{Relu}}(Z^{(3)}, A; W^{(3)})$,   (4)

where the superscripts index the layers of the decoder network. The reconstruction loss of the node features is calculated as

$\mathcal{L}_X = \|X - \hat{X}\|_F^2$,   (5)

where $\hat{X}$ is the reconstructed node feature matrix.
For the graph structure decoder, the reconstructed graph can be presented as follows:

$\hat{A} = \mathrm{sigmoid}(Z Z^{\top})$,   (6)

where $Z = Z^{(L/2)}$ is the embedded latent representation. The reconstruction loss of the graph structure is calculated as

$\mathcal{L}_A = \|A - \hat{A}\|_F^2$.   (7)
By reconstructing both graph structure and node features simultaneously, the framework can exploit both latent node features and graph structure information.
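The bi-decoder can be sketched as follows, assuming Frobenius-norm reconstruction losses for both outputs and the common inner-product decoder $\mathrm{sigmoid}(ZZ^\top)$ for the graph structure; the function name, toy shapes and the simple two-layer MLP feature decoder are our own illustrative choices:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def bidecoder_losses(X, A, Z, W_dec1, W_dec2):
    """Reconstruct node features (small mirror MLP of the encoder) and the
    graph structure (inner-product decoder); return both losses."""
    X_hat = np.maximum(0.0, Z @ W_dec1) @ W_dec2    # feature decoder
    A_hat = sigmoid(Z @ Z.T)                        # structure decoder
    loss_x = np.linalg.norm(X - X_hat, 'fro') ** 2  # Eq. (5)
    loss_a = np.linalg.norm(A - A_hat, 'fro') ** 2  # Eq. (7)
    return loss_x, loss_a

# toy demonstration with random latent codes and features
rng = np.random.default_rng(0)
Z = rng.normal(size=(4, 2))
X = rng.normal(size=(4, 3))
A = np.eye(4)
loss_x, loss_a = bidecoder_losses(X, A, Z,
                                  rng.normal(size=(2, 3)),
                                  rng.normal(size=(3, 3)))
```

Minimizing both losses jointly is what lets the latent code carry feature information as well as structure information.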
4.3 Laplacian Graph Structure and Loss Function
As [Roweis and Saul, 2000] proved, the aforementioned reconstruction criterion can smoothly capture the data manifolds and thus preserve the similarity among nodes. Nevertheless, we also want nodes that have similar neighborhood structures to be close in latent space. Therefore, we equip our framework with a Laplacian graph structure that shares the same latent representations:

$\min_{A} \sum_{i,j=1}^{n} \|z_i - z_j\|_2^2\, a_{ij} + \gamma \|A\|_F^2 = \min_{A} 2\,\mathrm{tr}(Z^{\top} L Z) + \gamma \|A\|_F^2$   (8)

s.t. $a_i^{\top} \mathbf{1} = 1, \; 0 \le a_{ij} \le 1$.

Here, $z_i$ is the latent representation of node $v_i$, $a_i$ denotes the $i$-th column vector of the adjacency matrix $A$, and $\gamma$ is the regularization parameter. Additionally, it is worth emphasizing that $\gamma$ is not a hyperparameter but a parameter that can be learned adaptively, which will be shown in the next section. $L$ is the Laplacian matrix, which can be calculated as $L = D - A$, where $D$ is the diagonal matrix whose $i$-th diagonal element is $D_{ii} = \sum_j a_{ij}$. Therefore, the loss of the whole framework can be expressed as

$\mathcal{L} = \mathcal{L}_X + \mathcal{L}_A + \lambda\,\mathrm{tr}(Z^{\top} L Z) + \beta \mathcal{L}_{reg}$,   (9)

where $\mathcal{L}_{reg}$ is a 2-norm regularizer term with coefficient $\beta$ to prevent overfitting, defined as follows:

$\mathcal{L}_{reg} = \sum_{l} \|W^{(l)}\|_2^2$.   (10)
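The Laplacian smoothness term can be computed either through the Laplacian matrix or as a weighted sum of pairwise latent distances; the two forms agree for a symmetric adjacency matrix, which is a useful sanity check. A small NumPy sketch (function names and toy data are ours):

```python
import numpy as np

def laplacian_term(Z, A):
    """tr(Z^T L Z) with L = D - A, where D is the degree matrix of A."""
    D = np.diag(A.sum(axis=1))
    return np.trace(Z.T @ (D - A) @ Z)

def pairwise_form(Z, A):
    """(1/2) sum_ij a_ij ||z_i - z_j||^2; equals tr(Z^T L Z) for symmetric A."""
    diff = Z[:, None, :] - Z[None, :, :]
    return 0.5 * (A * (diff ** 2).sum(-1)).sum()

rng = np.random.default_rng(0)
A = rng.random((6, 6))
A = (A + A.T) / 2          # symmetrize the toy weights
np.fill_diagonal(A, 0.0)   # no self-loops
Z = rng.normal(size=(6, 3))
assert np.isclose(laplacian_term(Z, A), pairwise_form(Z, A))
```

The pairwise form makes explicit why minimizing this term pulls strongly connected nodes together in latent space.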
5 Optimization
To the best of our knowledge, few methods can update the adjacency matrix during the optimization process of a graph convolutional network (GCN). In this section, we introduce the update of $A$ and the optimization of the framework.
5.1 Update The Adjacency Matrix
Although the elements of the adjacency matrix denote the degree of connection among nodes, the initial values may not be optimal. Since we embed the adjacency matrix update in the optimization, the learned adjacency matrix can represent more accurate connections among nodes in latent space. In practical applications, a sparse adjacency matrix tends to bring better results, which is one reason why we do not update the adjacency matrix directly by back-propagation: doing so would produce a meaningless dense matrix.
Although the adjacency matrix appears in both the graph reconstruction loss and the node feature reconstruction loss, to obtain a sparse adjacency matrix we focus only on the Laplacian graph structure when updating it. The update of the adjacency matrix is obtained, for each node, from

$\min_{a_i^{\top}\mathbf{1} = 1,\; a_{ij} \ge 0} \sum_{j=1}^{n} \|z_i - z_j\|_2^2\, a_{ij} + \gamma_i a_{ij}^2$.   (11)

Denote the distance between two nodes $v_i$, $v_j$ as $d_{ij} = \|z_i - z_j\|_2^2$, let $d_i$ be the vector with $j$-th element $d_{ij}$, and let $a_i$ be the vector with $j$-th element $a_{ij}$. The Lagrangian of problem (11) is represented as

$\mathcal{L}(a_i, \eta, \beta_i) = \frac{1}{2}\left\| a_i + \frac{d_i}{2\gamma_i} \right\|_2^2 - \eta\,(a_i^{\top}\mathbf{1} - 1) - \beta_i^{\top} a_i$,   (12)

where $\eta$ and $\beta_i$ are the Lagrangian multipliers. Using the Karush-Kuhn-Tucker (KKT) conditions, we can derive the optimal solution of $a_{ij}$ as

$a_{ij} = \left( -\frac{d_{ij}}{2\gamma_i} + \eta \right)_+$,   (13)

where $(t)_+ = \max(0, t)$. For a sparse similarity matrix, when calculating $a_i$ only the $k$ sample points closest to $z_i$ are taken into consideration. Therefore, assuming without loss of generality that $d_{i1} \le d_{i2} \le \cdots \le d_{in}$ are ordered from small to large, $a_i$ satisfies $a_{ik} > 0$ and $a_{i,k+1} = 0$, and together with the constraint $a_i^{\top}\mathbf{1} = 1$ we have

$\eta = \frac{1}{k} + \frac{1}{2k\gamma_i} \sum_{j=1}^{k} d_{ij}, \qquad \gamma_i = \frac{k}{2} d_{i,k+1} - \frac{1}{2} \sum_{j=1}^{k} d_{ij}$.   (14)

Based on Eq. (14), the overall $\gamma$ is set to the mean of the $\gamma_i$ and can be learned adaptively as

$\gamma = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{k}{2} d_{i,k+1} - \frac{1}{2} \sum_{j=1}^{k} d_{ij} \right)$,   (15)

which verifies that $\gamma$ is not a hyperparameter. The adjacency matrix can then be solved as

$a_{ij} = \begin{cases} \dfrac{d_{i,k+1} - d_{ij}}{k\, d_{i,k+1} - \sum_{h=1}^{k} d_{ih}}, & j \le k, \\[2mm] 0, & j > k. \end{cases}$   (16)

Based on Eq. (16), the adjacency matrix can be updated by the latent representations into a clearer graph structure.
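The closed-form update of Eq. (16) is straightforward to implement: for each node, compute squared latent distances, keep only the $k$ nearest neighbors, and assign weights that sum to 1. A hedged NumPy sketch with dense distances and a naive loop (our own naming; a practical implementation would vectorize this):

```python
import numpy as np

def update_adjacency(Z, k):
    """Closed-form update of Eq. (16): each node keeps only its k nearest
    neighbours in latent space, with weights that sum to 1."""
    n = Z.shape[0]
    # squared Euclidean distances between latent representations
    D = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(D, np.inf)          # a node is not its own neighbour
    A_new = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(D[i])[:k + 1]   # k nearest plus the (k+1)-th
        d = D[i, idx]                    # sorted ascending
        # a_ij = (d_{i,k+1} - d_ij) / (k d_{i,k+1} - sum_{h<=k} d_ih)
        denom = k * d[k] - d[:k].sum()
        A_new[i, idx[:k]] = np.maximum((d[k] - d[:k]) / denom, 0.0)
    return A_new

rng = np.random.default_rng(1)
Z = rng.normal(size=(10, 3))
A = update_adjacency(Z, 3)   # each row has at most 3 positive weights
```

By construction each row of the updated matrix sums to 1 and is $k$-sparse, which matches the normalization and sparsity assumptions of the problem formulation.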
In the experiments, the adjacency matrix is updated in each iteration, and the change of the adjacency matrix is shown in Fig. 2, where the adjacency matrices at iterations 20, 60, 100, 150, 200 and 250 are selected for demonstration. Since the datasets used in this paper are arranged in cluster order, the higher the degree of diagonalization of the adjacency matrix, the more accurate the connections among nodes. The results in Fig. 2 show that the graph construction becomes clearer and clearer, which verifies the effectiveness of updating the adjacency matrix in our method.
5.2 Optimize The Model
To optimize the aforementioned model, the goal is to minimize the overall loss as a function of the neural network parameters. The partial derivatives are estimated using back-propagation, and the proposed framework is optimized with adaptive moment estimation (Adam) for a maximum number of iterations. The pseudocode of our method is summarized in Algorithm 1.
6 Experiments
We show experimental results of the proposed method against baseline algorithms on several datasets. The results not only demonstrate the advantage of the bi-decoder and adaptive-sharing adjacency but also support the effectiveness of updating the adjacency matrix with the latent representations.
6.1 Datasets
Since the proposed method deals with weighted graphs rather than unweighted graphs, we do not use graph datasets such as Cora, Citeseer and PubMed, which have been widely used by other graph convolutional network methods. The datasets used in this paper and the neurons in each layer are listed in Table 1.

Dataset | Nodes | Features | Neurons in each layer
UMIST | 575 | 32 x 32 | 1024-256-80-256-1024
COIL20 | 1440 | 32 x 32 | 1024-256-80-256-1024
ORL | 400 | 32 x 32 | 1024-256-80-256-1024
ATT40 | 400 | 32 x 32 | 1024-256-80-256-1024
YALEB | 16128 | 64 x 64 | 4096-1024-256-1024-4096
FEI | 2800 | 80 x 80 | 6400-1024-256-1024-6400

Table 1: Datasets and network architectures.
Methods | UMIST (ACC/NMI) | COIL20 (ACC/NMI) | YALEB (ACC/NMI) | ORL (ACC/NMI) | ATT40 (ACC/NMI) | FEI (ACC/NMI)
OURS | 55.99/73.66 | 59.81/72.32 | 50.30/54.23 | 59.45/78.22 | 64.02/79.93 | 42.30/74.11
OURS_NA | 52.69/72.54 | 64.19/78.64 | 46.33/49.61 | 56.64/74.27 | 58.65/73.74 | 41.09/73.34
GAE | 47.90/65.25 | 60.83/74.88 | 45.21/51.51 | 59.83/78.21 | 56.26/75.98 | 37.74/71.60
ARGA | 48.61/68.77 | 65.75/79.11 | 40.24/42.06 | 51.87/72.80 | 53.75/74.33 | 31.63/66.28
SAE | 41.23/60.19 | 55.21/70.34 | 40.18/42.77 | 48.03/65.67 | 52.75/73.74 | 31.38/66.16
DEC | 36.47/56.96 | 52.33/67.13 | 42.30/43.52 | 37.45/38.93 | 52.63/72.64 | 30.52/60.38
SC | 46.59/64.39 | 62.34/78.47 | 45.81/50.58 | 55.97/72.82 | 60.30/78.31 | 39.43/62.19
k-means | 42.06/63.95 | 58.54/73.16 | 39.30/45.14 | 51.62/72.48 | 52.12/72.60 | 35.30/69.48

Table 2: Clustering performance (ACC and NMI) on all datasets.
6.2 Baselines
We compare our algorithm against several stateoftheart algorithms for the clustering task on the learned latent representations.
GAE [Kipf and Welling, 2016b] is an autoencoder-based unsupervised framework for graph data, which naturally leverages both topological and content information.
ARGA [Pan et al., 2018] is an adversarially regularized autoencoder algorithm that uses a graph autoencoder to learn the embedding.
SAE [Bengio et al., 2007] is a greedy layer-wise unsupervised training strategy that stacks multiple layers of autoencoders.
DEC [Xie et al., 2016] is a well-known autoencoder-based deep clustering model which simultaneously learns feature representations and cluster assignments using a deep neural network.
Spectral Clustering (SC) [Ng et al., 2002] uses the similarity matrix constructed by a linear kernel to perform dimensionality reduction before clustering and is widely used in graph clustering.
k-means is the base of many clustering methods. Here we run k-means on the original node features as a benchmark.
To verify the effectiveness of updating the adjacency matrix in our method, we deliberately construct a baseline, OURS_NA, which has the same loss function as our method but does not update the adjacency matrix. It is worth explaining that methods which are only suitable for unweighted graphs, such as DeepWalk [Perozzi et al., 2014], are not included in the baselines.
6.3 Experimental Settings
Since not all baselines rely on the graph structure, we do not take the adjacency matrix as the similarity matrix to perform classical spectral clustering on the latent representations as in [Zhang et al., 2019]. For the fairness of the experiment, we run the k-means algorithm on the learned latent representations 10 times with different initializations and report the average result for all baselines.
In terms of parameter settings, all the hyperparameter settings in our method are listed in Table 3. For the other baselines, the parameters are set to the values that give the best experimental results. We optimize all methods with the Adam algorithm and report the best results during training.
Para. |  |  |  |  |  |
Value | 4 | 15 | 0.001 | 0.001 | 1e-4 | 300

Table 3: Hyperparameter settings.
6.4 Experimental Results and Analysis
In the experiments, we run all methods to generate latent representations that are used for the clustering task. Accuracy (ACC) [Papadimitriou and Steiglitz, 1982] and normalized mutual information (NMI) [Fan, 1949] are used as the metrics to measure performance. The best result on each dataset is highlighted and the second-best result is underlined in Table 2.
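For reference, clustering accuracy (ACC) matches predicted cluster indices to ground-truth labels under the best label permutation. A small sketch (brute force over permutations, so only suitable for a handful of clusters; standard implementations use the Hungarian algorithm instead):

```python
import numpy as np
from itertools import permutations

def clustering_accuracy(y_true, y_pred):
    """ACC: fraction of correctly assigned samples under the best
    one-to-one mapping from predicted cluster ids to true labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    labels = np.unique(y_true)
    best = 0.0
    for perm in permutations(labels):
        mapping = dict(zip(labels, perm))  # predicted id -> true label
        acc = np.mean([mapping[p] == t for p, t in zip(y_pred, y_true)])
        best = max(best, acc)
    return best

print(clustering_accuracy([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```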
Fig. 3 shows partial visualization results on COIL20, where each row corresponds to a cluster. As the results in Fig. 3 show, our method matches the clusters very well, except for confusing cats and cars. The running times of our method and the baseline ARGA under the same experimental conditions are given in Table 4.
Methods | Ours | ARGA
Running Time | 180.09 | 217.36

Table 4: Running time comparison.
The observations are as follows:
1) BAGA outperforms the other methods, which exploit only either latent node features or graph structure, by a very large margin on many datasets. The clear reason is that BAGA integrates both kinds of information through the node feature decoder and the graph structure decoder, which complement each other and greatly improve clustering performance.
2) BAGA outperforms GAE and OURS_NA by a very large margin, which verifies the effectiveness of updating the adjacency matrix with the latent representations. As shown in Fig. 2, the graph structure of the latent representations becomes clearer and the connections among nodes become more accurate as the adjacency matrix A is updated.
3) The embedded adjacency matrix can not only be adaptively updated but also be shared with the Laplacian graph structure, which embeds the latent representations into the Laplacian graph construction and results in more correct connections.
7 Conclusion
In this paper, we propose a novel bi-decoder and adaptive-sharing adjacency method based on a graph convolutional autoencoder for network embedding. We address the fact that most existing graph autoencoder algorithms neglect the reconstruction of node features and suffer from incorrect connections among latent representations due to the fixed adjacency matrix. We propose an effective way to update the adjacency matrix so that it can more accurately represent the connections among nodes in latent space. The embedded adjacency matrix can not only be adaptively updated but also be shared with the Laplacian graph structure. Furthermore, the unique bi-decoder mechanism harnesses the overall loss of the network and avoids the overfitting caused by the capacity of the autoencoders. Therefore, the learned latent representations better represent the information of the nodes, which enables our method to achieve competitive performance compared to classical and state-of-the-art methods. Experimental results support the effectiveness of our method in comparison with other state-of-the-art algorithms.
References
[Belkin and Niyogi, 2002] Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. In Advances in Neural Information Processing Systems, pages 585–591, 2002.
[Bengio et al., 2007] Yoshua Bengio, Pascal Lamblin, Dan Popovici, and Hugo Larochelle. Greedy layer-wise training of deep networks. In Advances in Neural Information Processing Systems, pages 153–160, 2007.

[Cao et al., 2016] Shaosheng Cao, Wei Lu, and Qiongkai Xu. Deep neural networks for learning graph representations. In Thirtieth AAAI Conference on Artificial Intelligence, 2016.
[Fan, 1949] Ky Fan. On a theorem of Weyl concerning eigenvalues of linear transformations I. Proceedings of the National Academy of Sciences of the United States of America, 35(11):652, 1949.
[Grover and Leskovec, 2016] Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 855–864. ACM, 2016.
[Kipf and Welling, 2016a] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
[Kipf and Welling, 2016b] Thomas N Kipf and Max Welling. Variational graph autoencoders. arXiv preprint arXiv:1611.07308, 2016.
[Li et al., 2018] Yujia Li, Oriol Vinyals, Chris Dyer, Razvan Pascanu, and Peter Battaglia. Learning deep generative models of graphs. arXiv preprint arXiv:1803.03324, 2018.

[Ng et al., 2002] Andrew Y Ng, Michael I Jordan, and Yair Weiss. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems, pages 849–856, 2002.
[Nie et al., 2014] Feiping Nie, Xiaoqian Wang, and Heng Huang. Clustering and projected clustering with adaptive neighbors. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 977–986. ACM, 2014.
[Pan et al., 2018] Shirui Pan, Ruiqi Hu, Guodong Long, Jing Jiang, Lina Yao, and Chengqi Zhang. Adversarially regularized graph autoencoder for graph embedding. arXiv preprint arXiv:1802.04407, 2018.
[Papadimitriou and Steiglitz, 1982] Christos H Papadimitriou and Kenneth Steiglitz. Combinatorial Optimization, volume 24. Prentice Hall, Englewood Cliffs, 1982.
[Perozzi et al., 2014] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 701–710. ACM, 2014.
[Roweis and Saul, 2000] Sam T Roweis and Lawrence K Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000.
[Tu et al., 2018] Ke Tu, Peng Cui, Xiao Wang, Philip S Yu, and Wenwu Zhu. Deep recursive network embedding with regular equivalence. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 2357–2366. ACM, 2018.
[Wang et al., 2016] Daixin Wang, Peng Cui, and Wenwu Zhu. Structural deep network embedding. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1225–1234. ACM, 2016.
[Wold et al., 1987] Svante Wold, Kim Esbensen, and Paul Geladi. Principal component analysis. Chemometrics and Intelligent Laboratory Systems, 2(1-3):37–52, 1987.
[Wu et al., 2019a] Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S Yu. A comprehensive survey on graph neural networks. arXiv preprint arXiv:1901.00596, 2019.
[Wu et al., 2019b] Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S Yu. A comprehensive survey on graph neural networks. arXiv: Learning, 2019.

[Xie et al., 2016] Junyuan Xie, Ross Girshick, and Ali Farhadi. Unsupervised deep embedding for clustering analysis. In International Conference on Machine Learning, pages 478–487, 2016.
[Yu et al., 2018] Wenchao Yu, Cheng Zheng, Wei Cheng, Charu C Aggarwal, Dongjin Song, Bo Zong, Haifeng Chen, and Wei Wang. Learning deep network representations with adversarially regularized autoencoders. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 2663–2671. ACM, 2018.
[Zhang and Tong, 2019] Rui Zhang and Hanghang Tong. Robust principal component analysis with adaptive neighbors. In Advances in Neural Information Processing Systems, pages 6959–6967, 2019.
[Zhang et al., 2019] Xiaotong Zhang, Han Liu, Qimai Li, and Xiao-Ming Wu. Attributed graph clustering via adaptive graph convolution. arXiv preprint arXiv:1906.01210, 2019.