Graph Convolutional Auto-encoder with Bi-decoder and Adaptive-sharing Adjacency

03/10/2020 ∙ by Rui Zhang, et al.

Graph autoencoder (GAE) serves as an effective unsupervised learning framework to represent graph data in a latent space for network embedding. Most existing approaches typically focus on minimizing the reconstruction loss of the graph structure but neglect the reconstruction of node features, which may result in overfitting due to the capacity of the autoencoders. Additionally, the adjacency matrix in these methods is always fixed, so it cannot properly represent the connections among nodes in the latent space. To solve this problem, we propose a novel Graph Convolutional Auto-encoder with Bi-decoder and Adaptive-sharing Adjacency method, namely BAGA. The framework encodes the topological structure and node features into latent representations, on which a bi-decoder is trained to reconstruct the graph structure and node features simultaneously. Furthermore, the adjacency matrix can be adaptively updated by the learned latent representations to better represent the connections among nodes in the latent space. Experimental results on several datasets validate the superiority of our method over state-of-the-art network embedding methods on the clustering task.


1 Introduction

Graphs are essential tools for mining connections among data. Owing to their non-Euclidean topological structure, Graph Convolutional Networks (GCNs) can better handle complex graph data in which the connection information among nodes is unordered and variable in size. For example, analyzing graph data plays an important role in various data mining tasks, including node classification [Kipf and Welling2016a], link prediction [Pan et al.2018] and node clustering [Zhang et al.2019]. Graph neural networks are roughly categorized into four categories: recurrent graph neural networks (RecGNN), convolutional graph neural networks (ConvGNN), graph autoencoders (GAE), and spatial-temporal graph neural networks (STGNN) [Wu et al.2019a].

Graph autoencoder (GAE) is an effective unsupervised learning framework that encodes the node features and graph structure into latent representations and decodes the graph structure from them. GAE can be used to learn network embeddings and graph generative distributions. For network embedding, GAE mainly learns the latent representations by reconstructing the graph structure, such as the graph adjacency matrix, and many approaches build GAEs for network embedding learning, such as DNGR [Cao et al.2016], SDNE [Wang et al.2016] and ARVGA [Pan et al.2018]. For graph generation, GAE encodes graphs into latent representations and decodes a graph structure from the latent representations to learn the generative distribution of graphs [Li et al.2018].

Network embedding (NE) methods aim at learning low-dimensional latent representations of the nodes in a network. These representations can be used as features for a wide range of tasks on graphs, such as classification, clustering and link prediction. Classical methods such as PCA [Wold et al.1987] [Zhang and Tong2019] and Laplacian eigenmaps (LE) [Belkin and Niyogi2002] have been widely used in network embedding. GAE can also address network embedding problems; the main distinction is that GAE is a framework designed for various tasks, whereas network embedding covers various kinds of methods that target this single task.

Clustering plays an important role in detecting communities and analyzing the structure of such networks, and it is an important task in network embedding. Compared with classical methods such as Spectral Clustering [Ng et al.2002] and CAN [Nie et al.2014], network embedding algorithms such as DEC [Xie et al.2016] perform more effectively and capture higher non-linearity. However, some early network embedding methods, such as DeepWalk [Perozzi et al.2014] and node2vec [Grover and Leskovec2016], are only applicable to unweighted graphs, which restricts their development in the weighted-graph field.

Some GAE methods such as [Kipf and Welling2016b] only focus on the reconstruction of the graph adjacency matrix while ignoring the reconstruction of node features for network embedding, which may lead to overfitting due to the capacity of the autoencoders. Moreover, the fixed adjacency matrix prevents these methods from properly representing the connections among nodes in the latent space. To solve this problem, in this paper, we propose a novel Graph Convolutional Auto-encoder with Bi-decoder and Adaptive-sharing Adjacency method, namely BAGA. The contributions can be summarized below:

Equipped with the bi-decoder, our framework can minimize the reconstruction loss of node features and graph structure simultaneously and avoid the overfitting caused by simply reconstructing the graph adjacency matrix.

The proposed method can update the adjacency matrix adaptively from the latent representations, which results in a clearer graph structure and more accurate connections.

The embedded adjacency matrix can not only be adaptively updated but also be shared with the Laplacian graph construction. Experimental results support the effectiveness of updating the adjacency matrix.

2 Related Work

In this section, we outline the background of graph autoencoder and network embedding by listing related literature.

2.1 Graph Autoencoder

As an important unsupervised learning framework, the graph autoencoder (GAE) has attracted considerable research interest; an elaborate survey can be found in [Wu et al.2019b]. GAE is a deep neural architecture that maps graph data into a latent space and decodes graph structure information from the latent representations. Earlier approaches such as DNGR [Cao et al.2016] and SDNE [Wang et al.2016] mainly build GAE frameworks for network embedding with multi-layer perceptrons. They employ deep autoencoders to preserve the graph proximities and model positive pointwise mutual information (PPMI). Nevertheless, DNGR and SDNE only consider node structure information and ignore the feature information of nodes. Variational Graph Autoencoder (VGAE) [Kipf and Welling2016b] is a variational version of GAE that learns the distribution of the data.

2.2 Network Embedding

The goal of network embedding is to find latent low-dimensional representations of nodes that preserve the topological information of the nodes. Some recent methods preserve observed pair-wise similarity and structural equivalence. For example, DeepWalk [Perozzi et al.2014] uses random walks to generate sequences of nodes from a network and transforms unweighted graph structure information into linear sequences. Inspired by DeepWalk, DRNE [Tu et al.2018] adopts a Long Short-Term Memory (LSTM) network to aggregate a node's neighbors. Similar to DRNE, NetRA [Yu et al.2018] also uses an LSTM network with random walks rooted at each node and regularizes the learned network embeddings towards a prior distribution via adversarial training.

3 Problem and Framework

In this section, we define the problem-related concepts and present our framework.

Figure 1: The architecture of the graph convolutional autoencoder with bi-decoder and adaptive-sharing adjacency (BAGA). The left tier is a graph convolutional encoder that encodes the input feature matrix X and adjacency matrix A into the latent representations Z. The right tier is a bi-decoder that reconstructs the node features \hat{X} and the adjacency matrix \hat{A}. The learned latent representations Z are in the middle tier, where the adjacency matrix can be updated for the next iteration (see Algorithm 1 for details).

3.1 Problem Formulation

Given an undirected weighted graph G = (V, E, X), where V is the set of nodes with |V| = n, E is the set of edges among nodes, which can be represented as an adjacency matrix A \in R^{n \times n}, and X \in R^{n \times d} is the feature matrix of all nodes, i.e., X = [x_1; x_2; \dots; x_n], where x_i is the feature vector of node v_i. Here n is the number of nodes and d is the dimension of the input data. Edge weights in the adjacency matrix A are real numbers denoting the degree of connection among nodes, rather than simply {0, 1} as in an unweighted graph. Although edge weights in weighted graphs can be negative, we only consider non-negative weights in this paper, and the weights between a node and the other nodes connected to it sum to 1.

Our goal is to learn a latent representation for each node, formally z_i \in R^m, where z_i is the i-th row of the embedding matrix Z \in R^{n \times m} and m is the dimension of the embedding. The matrix Z lies in the latent space and should preserve the topological structure as well as the node feature information. Furthermore, the latent representations should represent the connections among nodes more accurately, so that the nodes of the graph can be partitioned into c clusters.
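As an illustration of an adjacency matrix with the properties stated above (non-negative weights, each row summing to 1), the NumPy sketch below builds a k-nearest-neighbor similarity graph from node features. The function name, the Gaussian-style weighting and the choice of k are illustrative assumptions, not necessarily the construction used for the datasets in Section 6.

```python
import numpy as np

def knn_row_normalized_adjacency(X, k=4):
    """Build a non-negative, row-normalized weighted adjacency matrix from
    node features: each node connects to its k nearest neighbors, weights
    decay with squared Euclidean distance, and each row sums to 1."""
    n = X.shape[0]
    # pairwise squared Euclidean distances
    sq_norms = np.sum(X ** 2, axis=1)
    dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(dist, np.inf)          # exclude self-connections

    A = np.zeros((n, n))
    for i in range(n):
        nn = np.argsort(dist[i])[:k]         # the k closest nodes
        w = np.exp(-dist[i, nn] / (dist[i, nn].mean() + 1e-12))
        A[i, nn] = w / w.sum()               # non-negative, row sums to 1
    return A
```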

3.2 Overall Framework

Our objective is to learn better latent representations Z of the nodes and more accurate connections in the graph G. To this end, we propose a novel graph convolutional autoencoder with bi-decoder and adaptive-sharing adjacency whose adjacency matrix can be adaptively updated. Fig. 1 shows the workflow of BAGA, which consists of three modules: the graph convolutional encoder, the bi-decoder and the update of the adjacency matrix A.

Graph Convolutional Encoder The proposed framework takes the adjacency matrix A and the feature matrix X as input to learn the latent representations Z.

Bi-decoder The framework reconstructs the node features and the adjacency matrix from the learned Z simultaneously, and both reconstruction terms enter the overall loss of the network.

Adjacency Matrix Update Since the feature matrix X is mapped to progressively better latent representations over the iterations, the adjacency matrix A also needs to be adaptively updated so as to better represent the connections among nodes in the latent space. This adaptive update of the adjacency matrix is the most important contribution of this paper.

4 Proposed Algorithm

The graph convolutional autoencoder in the framework has a graph convolutional encoder and a bi-decoder, which includes node feature reconstruction and graph structure reconstruction. To better maintain the topological structure, the latent representations are embedded into the Laplacian graph construction.

4.1 Graph Convolutional Encoder Model

The neural network of BAGA consists of L + 1 layers: L transformation layers and an initial layer, where L is an even number. The first L/2 hidden layers are the encoders, which learn a layer-wise transformation by a spectral convolution function

Z^{(l+1)} = f(Z^{(l)}, A; W^{(l)}),    (1)

where Z^{(l)} is the input to the convolution, Z^{(l+1)} is the output after the convolution, and l is the layer index of the encoder network. W^{(l)} is the weight matrix to be learned in the neural network, and in this paper Z^{(0)} = X is the input node feature matrix. Specifically, each layer of our graph convolutional network can be calculated as follows:

f(Z^{(l)}, A; W^{(l)}) = \phi( \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} Z^{(l)} W^{(l)} ).    (2)

Here, \tilde{A} = A + I_n, \tilde{D}_{ii} = \sum_j \tilde{A}_{ij}, I_n is the identity matrix of size n, and \phi(\cdot) is the activation function, such as ReLU(t) = max(0, t) or sigmoid(t) = 1 / (1 + e^{-t}). Two transformations are performed alternately in the encoder: an activation layer with a ReLU(\cdot) activation function and a linear layer. The graph encoder is constructed as follows:

Z^{(1)} = f_{ReLU}(X, A; W^{(0)}),    Z = f_{linear}(Z^{(1)}, A; W^{(1)}).    (3)

This process encodes both the graph structure and the node features into the latent representations Z, which are shared by the encoder and the decoder.
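To make Eqs. (1)-(3) concrete, the following minimal NumPy sketch implements the symmetrically normalized graph convolution and the two-layer ReLU/linear encoder described above. It is illustrative only: a real implementation would use an automatic-differentiation framework so that the weights W can be trained, and the layer sizes (e.g., 1024-256-80 as in Table 1) are supplied by the caller.

```python
import numpy as np

def gcn_layer(Z, A, W, activation=None):
    """One spectral graph convolution layer as in Eq. (2):
    phi(D^{-1/2} (A + I) D^{-1/2} Z W). activation=None gives the linear layer."""
    n = A.shape[0]
    A_tilde = A + np.eye(n)                          # add self-loops
    d = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_norm = D_inv_sqrt @ A_tilde @ D_inv_sqrt       # symmetric normalization
    H = A_norm @ Z @ W
    return activation(H) if activation is not None else H

def encode(X, A, W0, W1):
    """Two-layer encoder of Eq. (3): a ReLU layer followed by a linear layer."""
    relu = lambda t: np.maximum(0.0, t)
    Z1 = gcn_layer(X, A, W0, activation=relu)
    return gcn_layer(Z1, A, W1)                      # latent representations Z
```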

4.2 Bi-decoder Model

The last L/2 layers are the decoders, which reconstruct both the graph structure and the node features from the learned latent representations. The node feature decoder is the mirror-symmetric counterpart of the encoder, which also has a linear layer and an activation layer with a ReLU(\cdot) function. The node feature decoder is constructed as follows:

\hat{Z}^{(l+1)} = f(\hat{Z}^{(l)}, A; W^{(l)}),    (4)

where l is the layer index of the decoder network. The reconstruction loss of the node features is calculated as

L_X = \| X - \hat{X} \|_F^2,    (5)

where \hat{X} is the reconstructed node feature matrix.

For the graph structure decoder, the reconstructed graph can be presented as follows:

\hat{A} = sigmoid(Z Z^T),    (6)

where Z is the embedded latent representation produced by the encoder. The reconstruction loss of the graph structure is calculated as

L_A = \| A - \hat{A} \|_F^2.    (7)

By reconstructing both graph structure and node features simultaneously, the framework can exploit both latent node features and graph structure information.
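As a rough sketch of the bi-decoder, the snippet below mirrors the encoder for the feature reconstruction (Eq. 4) and uses an inner-product decoder for the structure reconstruction (Eq. 6), with squared-error losses standing in for Eqs. (5) and (7). It reuses the hypothetical gcn_layer helper from the encoder sketch above; the weight names W2 and W3 are placeholders, and the exact loss forms are assumptions where the source equations were not recoverable.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def bi_decoder(Z, A, X, W2, W3):
    """Bi-decoder sketch: mirror feature decoder plus inner-product
    structure decoder, returning reconstructions and their summed losses.
    gcn_layer is the helper defined in the encoder sketch above."""
    relu = lambda t: np.maximum(0.0, t)
    # feature decoder: ReLU layer then linear layer, mirroring the encoder
    H = gcn_layer(Z, A, W2, activation=relu)
    X_hat = gcn_layer(H, A, W3)
    # structure decoder: inner product of the latent representations
    A_hat = sigmoid(Z @ Z.T)
    loss_X = np.sum((X - X_hat) ** 2)     # squared error, cf. Eq. (5)
    loss_A = np.sum((A - A_hat) ** 2)     # squared error, cf. Eq. (7)
    return X_hat, A_hat, loss_X + loss_A
```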

4.3 Laplacian Graph Structure and Loss Function

As [Roweis and Saul2000] showed, the aforementioned reconstruction criterion can smoothly capture the data manifolds and thus preserve the similarity among nodes. Nevertheless, we also want nodes with similar neighborhood structures to have similar representations in the latent space. Therefore, we equip our framework with a Laplacian graph structure that shares the same latent representations as follows:

\min_{A} \sum_{i,j=1}^{n} \| z_i - z_j \|_2^2 a_{ij} + \gamma \| A \|_F^2    (8)
s.t.  a_i^T \mathbf{1} = 1,  a_{ij} \geq 0.

Here, z_i is the latent representation of node v_i, a_i denotes the i-th column vector of the adjacency matrix A, and \gamma is the regularization parameter. Additionally, it is worth emphasizing that \gamma is not a hyper-parameter but a parameter that can be learned adaptively, which will be shown in the next section. L_G is the Laplacian matrix, which can be calculated as L_G = D - (A^T + A)/2, where D is the diagonal matrix whose i-th diagonal element is \sum_j (a_{ij} + a_{ji})/2. Therefore, the loss of the whole framework can be expressed as:

L_{total} = L_X + \lambda_1 L_A + 2 \lambda_2 \, tr(Z^T L_G Z) + \lambda_3 L_{reg},    (9)

where L_{reg} is a 2-norm regularizer term with coefficient \lambda_3 to prevent overfitting, which is defined as follows:

L_{reg} = \sum_{l} \| W^{(l)} \|_2^2.    (10)
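For reference, the graph-regularization term in Eq. (8) can be evaluated either element-wise or through the graph Laplacian; a small NumPy sketch of the equivalent trace form 2 tr(Z^T L_G Z) is given below. The symmetrization of A is an implementation detail, since the squared distance is symmetric in i and j.

```python
import numpy as np

def laplacian_term(Z, A):
    """Graph-regularization term of Eq. (8): sum_ij ||z_i - z_j||^2 * a_ij,
    computed equivalently as 2 * trace(Z^T L Z) with L = D - (A + A^T)/2."""
    S = 0.5 * (A + A.T)                   # symmetrize a possibly asymmetric A
    D = np.diag(S.sum(axis=1))
    L = D - S                             # combinatorial graph Laplacian
    return 2.0 * np.trace(Z.T @ L @ Z)
```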

5 Optimization

To the best of our knowledge, few methods can update the adjacency matrix during the optimization process of a graph convolutional network (GCN). In this section, we introduce the update of the adjacency matrix A and the optimization of the framework.

5.1 Update The Adjacency Matrix

Although the elements of the adjacency matrix denote the degree of connection among nodes, the initial values may not be optimal. Since we embed the update of the adjacency matrix in the optimization, the learned adjacency matrix can represent the connections among nodes in the latent space more accurately. In practical applications, a sparse adjacency matrix tends to bring better results, which is the reason why we do not update the adjacency matrix directly by the back-propagation algorithm: doing so would produce a meaningless dense matrix.

Although the adjacency matrix appears in both the graph reconstruction loss and the node feature reconstruction loss, in order to obtain a sparse adjacency matrix we only focus on the Laplacian graph structure when updating it. The process of updating the adjacency matrix is represented as follows:

\min_{a_i} \sum_{j=1}^{n} \| z_i - z_j \|_2^2 a_{ij} + \gamma a_{ij}^2    s.t.  a_i^T \mathbf{1} = 1,  a_{ij} \geq 0.    (11)

Denote the distance between two nodes z_i, z_j as d_{ij} = \| z_i - z_j \|_2^2, let d_i be the vector with its j-th element d_{ij}, and let a_i be the vector with its j-th element a_{ij}. The Lagrangian of problem (11) is represented as:

L(a_i, \eta, \beta_i) = \frac{1}{2} \Big\| a_i + \frac{d_i}{2\gamma} \Big\|_2^2 - \eta (a_i^T \mathbf{1} - 1) - \beta_i^T a_i,    (12)

where \eta and \beta_i \geq 0 are the Lagrangian multipliers. Using the Karush-Kuhn-Tucker (KKT) conditions, we can derive the optimal solution of a_{ij} as

a_{ij} = \Big( -\frac{d_{ij}}{2\gamma} + \eta \Big)_+,    (13)

where (\cdot)_+ = \max(\cdot, 0). For a sparse similarity matrix A, when calculating the similarity vector a_i, only the k sample points closest to z_i are taken into consideration. Therefore, the optimal a_i satisfies a_{ik} > 0 and a_{i,k+1} = 0, and we have

\gamma_i = \frac{k}{2} d_{i,k+1} - \frac{1}{2} \sum_{j=1}^{k} d_{ij}.    (14)

Based on Eq. (14), the overall \gamma is set to the mean of the \gamma_i and it can be learned adaptively as:

\gamma = \frac{1}{n} \sum_{i=1}^{n} \Big( \frac{k}{2} d_{i,k+1} - \frac{1}{2} \sum_{j=1}^{k} d_{ij} \Big),    (15)

which verifies that \gamma is not a hyper-parameter. Without loss of generality, suppose d_{i1}, d_{i2}, \dots, d_{in} are ordered from small to large. The adjacency matrix can then be solved as:

a_{ij} = \begin{cases} \dfrac{d_{i,k+1} - d_{ij}}{k \, d_{i,k+1} - \sum_{h=1}^{k} d_{ih}}, & j \leq k, \\ 0, & j > k. \end{cases}    (16)

Based on Eq. (16), the adjacency matrix A can be updated from the latent representations Z, yielding a clearer graph structure.
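A small NumPy sketch of this closed-form update is shown below: each row of the new adjacency matrix keeps only the k nearest latent neighbors, with non-negative weights that sum to 1, following Eq. (16). Reading k = 4 from Table 3 is a plausible but unconfirmed interpretation, so treat the default as an assumption.

```python
import numpy as np

def update_adjacency(Z, k=4):
    """Closed-form adjacency update in the spirit of Eq. (16): each row keeps
    only the k nearest latent neighbors; weights are non-negative and sum to 1."""
    n = Z.shape[0]
    sq = np.sum(Z ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T    # d_ij = ||z_i - z_j||^2
    np.fill_diagonal(dist, np.inf)                      # exclude self-distance

    A_new = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(dist[i])                       # ascending distances
        d_k = dist[i, idx[:k]]                          # k smallest distances
        d_k1 = dist[i, idx[k]]                          # (k+1)-th distance
        denom = k * d_k1 - d_k.sum() + 1e-12
        A_new[i, idx[:k]] = (d_k1 - d_k) / denom        # Eq. (16); row sums to 1
    return A_new
```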

In the experiments, the adjacency matrix is updated in each iteration, and the change of the adjacency matrix is shown in Fig. 2. The adjacency matrices at iterations 20, 60, 100, 150, 200 and 300 are selected for demonstration. Since the datasets used in this paper are arranged in cluster order, the higher the degree of diagonalization of the adjacency matrix, the more accurate the connections among nodes. The results in Fig. 2 show that the graph structure becomes clearer and clearer, which verifies the effectiveness of updating the adjacency matrix in our method.
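One hedged way to put a number on the degree of diagonalization discussed above is to measure how much of the total edge weight falls inside the ground-truth clusters when the nodes are arranged in cluster order. The helper below is illustrative only and is not part of the original evaluation protocol.

```python
import numpy as np

def within_cluster_mass(A, labels):
    """Fraction of total edge weight that falls inside ground-truth clusters,
    i.e. inside the diagonal blocks when nodes are arranged in cluster order."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]     # block-diagonal mask
    return A[same].sum() / (A.sum() + 1e-12)
```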

5.2 Optimize The Model

To optimize the aforementioned model, the goal is to minimize the overall loss L_{total} as a function of the neural network parameters W. The partial derivatives of L_{total} with respect to W are computed by back-propagation. Furthermore, the proposed framework is optimized with adaptive moment estimation (Adam), where T is the maximum number of iterations. The pseudocode of our method is summarized in Algorithm 1.

Input: a graph G = (V, E, X) with node features X and adjacency matrix A; maximum iteration number T; parameters \lambda_1, \lambda_2 and \lambda_3.
Output: latent representations Z.
1:  for iterator = 1, 2, 3, ..., T do
2:    Minimize L_{total} using Eq. (9):
3:      Compute the loss of reconstructing node features using Eq. (5)
4:      Compute the loss of reconstructing the graph structure using Eq. (7)
5:      Compute the Laplacian graph structure term using Eq. (8)
6:      Compute the whole loss using Eq. (9)
7:      Backpropagate the loss and update the network weights W
8:    Update the adjacency matrix A using Eq. (16)
9:  end for
Algorithm 1: The BAGA algorithm
Figure 2: The change of the adjacency matrix on the COIL20 dataset at iterations (a) 20, (b) 60, (c) 100, (d) 150, (e) 200 and (f) 300. Best viewed on screen.

6 Experiments

We show experimental results of our proposed method against baseline algorithms on several datasets. The results not only demonstrate the advantage of the bi-decoder and the adaptive-sharing adjacency but also support the effectiveness of updating the adjacency matrix from the latent representations.

6.1 Datasets

Since the proposed method deals with weighted graphs rather than unweighted graphs, we do not use graph datasets such as Cora, Citeseer and PubMed, which have been widely used by other graph convolutional network methods. The datasets used in this paper and the number of neurons in each layer are listed in Table 1.

Dataset Nodes Features Neurons in each layer
UMIST 575 32×32 1024-256-80-256-1024
COIL20 1440 32×32 1024-256-80-256-1024
ORL 400 32×32 1024-256-80-256-1024
ATT40 400 32×32 1024-256-80-256-1024
YALEB 16128 64×64 4096-1024-256-1024-4096
FEI 2800 80×80 6400-1024-256-1024-6400
Table 1: Information of the datasets.
Methods UMIST COIL20 YALEB ORL ATT40 FEI
ACC NMI ACC NMI ACC NMI ACC NMI ACC NMI ACC NMI
OURS 55.99 73.66 59.81 72.32 50.30 54.23 59.45 78.22 64.02 79.93 42.30 74.11
OURS_NA 52.69 72.54 64.19 78.64 46.33 49.61 56.64 74.27 58.65 73.74 41.09 73.34
GAE 47.90 65.25 60.83 74.88 45.21 51.51 59.83 78.21 56.26 75.98 37.74 71.60
ARGA 48.61 68.77 65.75 79.11 40.24 42.06 51.87 72.80 53.75 74.33 31.63 66.28
SAE 41.23 60.19 55.21 70.34 40.18 42.77 48.03 65.67 52.75 73.74 31.38 66.16
DEC 36.47 56.96 52.33 67.13 42.30 43.52 37.45 38.93 52.63 72.64 30.52 60.38
SC 46.59 64.39 62.34 78.47 45.81 50.58 55.97 72.82 60.30 78.31 39.43 62.19
k-means 42.06 63.95 58.54 73.16 39.30 45.14 51.62 72.48 52.12 72.60 35.30 69.48
Table 2: Clustering performance (%).

6.2 Baselines

We compare our algorithm against several state-of-the-art algorithms for the clustering task on the learned latent representations.

GAE [Kipf and Welling2016b] is an autoencoder-based unsupervised framework for graph data, which naturally leverages both topological and content information.

ARGA [Pan et al.2018] is an adversarially regularized autoencoder algorithm that uses graph autoencoder to learn the embedding.

SAE [Bengio et al.2007] stacks multiple autoencoders and trains them with a greedy layer-wise unsupervised strategy.

DEC [Xie et al.2016] is a well-known autoencoder-based deep clustering model that simultaneously learns feature representations and cluster assignments using a deep neural network.

Spectral Clustering (SC) [Ng et al.2002] uses a similarity matrix constructed with a linear kernel to perform dimensionality reduction before clustering and is widely used in graph clustering.

k-means is the basis of many clustering methods. Here we run k-means on the original node features as a benchmark.

To verify the effectiveness of updating the adjacency matrix in our method, we deliberately construct a baseline, OURS_NA, which has the same loss function as our method except that the adjacency matrix is not updated.

It is worth noting that methods that are only suitable for unweighted graphs, such as DeepWalk [Perozzi et al.2014], are not included in the baselines.

6.3 Experimental Settings

Since not all baselines rely on the graph structure, we do not take the adjacency matrix as the similarity matrix to perform classical spectral clustering on the latent representations as in [Zhang et al.2019]. For the fairness of the experiment, we run the k-means algorithm on the learned latent representations 10 times with different initializations and report the average result for all baselines.

In terms of parameter settings, all the hyper-parameter settings of our method are listed in Table 3. For the other baselines, the parameters are set to the values that give the best experimental results. We optimize the methods with the Adam algorithm and report the best results obtained within the maximum number of iterations T.

Para.
Value 4 15 0.001 0.001 1e-4 300
Table 3: Parameter settings.

6.4 Experimental Results and Analysis

In the experiments, we run all methods to generate latent representations that are used for the clustering task. The accuracy (ACC) [Papadimitriou and Steiglitz1982] and normalized mutual information (NMI) [Fan1949] are used as the metrics to measure performance. The best result on each dataset is highlighted and the second-best result is underlined in Table 2.
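For readers reimplementing the evaluation, a common way to compute these clustering metrics is sketched below: ACC is obtained by matching predicted cluster labels to ground-truth classes with the Hungarian algorithm, and NMI comes directly from scikit-learn. This is a generic recipe, not necessarily the exact implementation used here.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """Clustering accuracy (ACC): find the best one-to-one mapping between
    predicted cluster labels and ground-truth classes, then count matches."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n_clusters = max(y_true.max(), y_pred.max()) + 1
    cost = np.zeros((n_clusters, n_clusters), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cost[p, t] += 1                                  # co-occurrence counts
    row, col = linear_sum_assignment(-cost)              # maximize matches
    return cost[row, col].sum() / y_true.size

if __name__ == "__main__":
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [1, 1, 0, 0, 2, 2]
    print("ACC:", clustering_accuracy(y_true, y_pred))
    print("NMI:", normalized_mutual_info_score(y_true, y_pred))
```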

Fig. 3 shows partial visualization results on COIL20, where each row corresponds to a cluster. As Fig. 3 shows, our method recovers the clusters very well, except for confusing the cat and car objects. The running time under the same experimental conditions for our method and the baseline ARGA is given in Table 4.

Methods Ours ARGA
Running Time 180.09 217.36
Table 4: Time comparison (seconds) on COIL20

The observations are as follows:

1) BAGA outperforms the other methods, which exploit either latent node features or the graph structure alone, by a very large margin on many datasets. The clear reason is that BAGA integrates both kinds of information through the node feature decoder and the graph structure decoder, which complement each other and greatly improve the clustering performance.

2) BAGA outperforms GAE and OURS_NA by a large margin, which verifies the effectiveness of updating the adjacency matrix with the latent representations. As shown in Fig. 2, the graph structure of the latent representations becomes clearer and the connections among nodes become more accurate as the adjacency matrix A is updated.

3) The embedded adjacency matrix can not only be adaptively updated but also shared with the Laplacian graph structure, which embeds the latent representations into the Laplacian graph construction and results in more accurate connections.

Figure 3: Visualization of clustering results of COIL20.

7 Conclusion

In this paper, we propose a novel bi-decoder and adaptive-sharing adjacency method based on a graph convolutional autoencoder for network embedding. We address the issue that most existing graph autoencoder algorithms neglect the reconstruction of node features and suffer from incorrect connections among the latent representations due to the fixed adjacency matrix. We propose an effective way to update the adjacency matrix so that it can more accurately represent the connections among nodes in the latent space. The embedded adjacency matrix can not only be adaptively updated but also shared with the Laplacian graph structure. Furthermore, the bi-decoder mechanism constrains the overall loss of the network and avoids overfitting due to the capacity of the autoencoders. Therefore, the learned latent representations better represent the information of the nodes, which enables our method to achieve competitive performance compared with classical and state-of-the-art methods. Experimental results support the effectiveness of our method in comparison with other state-of-the-art algorithms.

References