Graph representation and learning, which aims to represent each graph node as a low-dimensional feature vector, is a fundamental problem in the pattern recognition and machine learning area. Recently, Graph Convolutional Networks (GCNs) have been widely studied for graph representation and learning duvenaud2015convolutional ; atwood2016diffusion ; adaptive_GCN ; velickovic2017graph . These methods can be categorized into spatial and spectral methods. Spatial methods generally define the graph convolution operation by designing an operator on node neighbors, while spectral methods usually define the graph convolution operation based on spectral analysis of graphs. For example, Monti et al. monti2017geometric present mixture model CNNs (MoNet) for graphs. Velickovic et al. velickovic2017graph present Graph Attention Networks (GAT) for graph semi-supervised learning. For spectral methods, Bruna et al. bruna2014spectral propose to define graph convolution based on the eigen-decomposition of the graph Laplacian matrix. Henaff et al. henaff2015deep further introduce spatially constrained spectral filters. Defferrard et al. defferrard2016convolutional propose to approximate the spectral filters based on Chebyshev expansion. Kipf et al. kipf2016semi propose a simpler Graph Convolutional Network (GCN) based on a first-order approximation of spectral filters.
However, in many applications, data come with multiple graphs, which is known as multi-graph learning nie2016parameter . In this paper, we focus on multiple graphs that share a common node set but have multiple different edge structures. The aim of our multi-graph representation is to find a consistent node representation across all graphs. One main challenge for this problem is how to effectively integrate the multiple graph (edge) structures together in the graph node representation. The existing GCNs above generally cannot be used to deal with multiple graphs. To generalize GCNs to multiple graphs, one popular way is to use some heuristic fusion strategy and transform multi-graph learning into traditional single-graph learning pham2017column ; simonovsky2017dynamic ; schlichtkrull2018modeling . However, since the fusion process is usually independent of graph representation/learning, this strategy may lead to weakly optimal solutions. In addition, some works propose to conduct the convolution operation on multiple graphs by sharing common convolution parameters across different graphs duvenaud2015convolutional ; atwood2016diffusion ; zhuang2018dual . This mechanism can propagate some information/knowledge across multiple graphs. However, the learned representations of the individual graphs in these methods are still not explicitly guaranteed to be consistent.
In this paper, we propose a novel Multiple Graph Adversarial Learning (MGAL) framework for multi-graph representation and learning. MGAL aims to learn an optimal structure-invariant, and thus consistent, representation for multiple graphs in a common subspace via an adversarial learning architecture. Based on MGAL, we further provide a unified network for the multi-graph based semi-supervised learning task. Overall, the main contributions of this paper are summarized as follows:
We propose a novel Multiple Graph Adversarial Learning (MGAL) framework for multi-graph representation and learning. The proposed MGAL is a general framework which allows generalizing any learnable/parameterized graph representation model to deal with multiple graphs.
We present a unified network for the semi-supervised learning task based on multi-graph representation.
We develop a general generative adversarial learning architecture (‘multiple generators + one discriminator’) to address the general multi-view representation and learning problem.
Experimental results on several datasets demonstrate the effectiveness and benefits of the proposed MGAL model and semi-supervised learning method.
2 Problem Formulation
Graph representation. Let $G = (X, A)$ be an attributed graph, where $X \in \mathbb{R}^{n \times d}$ denotes the collection of node features and $A \in \mathbb{R}^{n \times n}$ encodes the pairwise relationships (such as similarities) between node pairs. The aim of graph representation is to learn a latent representation $Z \in \mathbb{R}^{n \times k}$, where $k \ll d$, in a low-dimensional space that encodes both graph structure and node content together. One popular approach is to use Graph Convolutional Networks (GCNs) kipf2016semi ; velickovic2017graph ; henaff2015deep ; monti2017geometric , which provide a unified framework to define the representation function $Z = F(X, A)$. Based on representation $Z$, we can then conduct various learning tasks, such as node classification, clustering and semi-supervised classification.
Multi-graph representation. In many real applications, data come with multiple graphs, which is known as the multi-graph representation/learning problem. In this paper, we focus on multiple graphs that share a common node set and the same node content but have multiple different edge structures. Formally, given $G = (X, A^{(1)}, \dots, A^{(m)})$, with $X$ denoting the node content and $\{A^{(v)}\}_{v=1}^{m}$ representing the multiple edge structures, the aim of our multi-graph representation is to learn a consistent latent representation $Z \in \mathbb{R}^{n \times k}$ in a low-dimensional space that encodes the multiple graph structures and node content together. Based on representation $Z$, we can then conduct learning tasks such as node classification, clustering and semi-supervised learning. In this paper, we focus on semi-supervised learning. The main challenge for multi-graph representation is how to exploit the information of each individual graph while simultaneously capturing the correlation among multiple graphs in the final representation $Z$. A simple and direct way to use multiple graphs is to average them into a new one and then feed it into a standard GCN model. Obviously, since the graph averaging process is independent of graph learning, this may lead to weakly optimal local solutions.
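As a concrete illustration of this naive fusion baseline (toy NumPy code of our own, not the paper's implementation): the adjacency matrices are averaged once, before and independently of any learning.

```python
import numpy as np

# Hypothetical illustration of the naive baseline criticized above:
# average the m adjacency matrices into a single graph, then hand the
# result to any standard single-graph GCN. The fusion step is fixed
# and independent of the subsequent representation learning.
rng = np.random.default_rng(1)
n, m = 6, 3
graphs = [rng.random((n, n)) for _ in range(m)]
graphs = [(A + A.T) / 2 for A in graphs]   # make each graph symmetric

A_avg = sum(graphs) / m                    # fused once, before learning
print(A_avg.shape)
```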
3 Related Works
Multiple graph convolutional representation. To generalize GCNs to multiple graphs, one can first obtain a representation $Z^{(v)}$ for each graph individually by using GCNs and then concatenate or average them to obtain the final representation $Z$. Obviously, this two-stage strategy neglects the correlation information among different graphs. To overcome this issue, some works propose to fuse the different representations in the middle layers of GCNs so as to share/communicate information across graphs during learning pham2017column ; simonovsky2017dynamic ; schlichtkrull2018modeling . Since the fusion in each layer is independent of graph learning, this strategy may still lead to weakly optimal representations.
Another line of work proposes to conduct the convolution operation on multiple graphs by sharing common convolution parameters across different graphs, e.g., duvenaud2015convolutional ; atwood2016diffusion ; zhuang2018dual . This mechanism can propagate some correlation information across graphs via the common parameters $\{W^{(l)}\}$. However, although the parameters are shared across the different graphs, the learned representations $Z^{(v)}$ are still not guaranteed to be well consistent, because each $Z^{(v)}$ is determined not only by the parameters $\{W^{(l)}\}$ but also by the individual graph $A^{(v)}$.
Adversarial learning model. Our multiple graph adversarial learning model is inspired by the Generative Adversarial Network (GAN) goodfellow2014generative , which consists of a generator $G$ and a discriminator $D$. The generator is trained to generate samples that convince the discriminator, while the discriminator aims to distinguish the samples returned by the generator. Recently, adversarial learning has been explored in graph representation tasks. Wang et al. wang2018graphgan propose a graph representation model with GANs (GraphGAN). Dai et al. dai2018adversarial propose adversarial network embedding (ANE), which employs adversarial learning to regularize graph representation. Pan et al. pan2018adversarially also propose an adversarially regularized graph autoencoder model for graph embedding.
Different from previous works, our aim in this paper is to derive a general adversarial learning framework for multi-graph representation and learning. To the best of our knowledge, this is the first effort to develop an adversarial learning framework for the multi-graph learning problem.
4 The Proposed Model
4.1 Overall Framework
Given an input feature matrix $X$ and the associated multiple graph structures $\{A^{(v)}\}_{v=1}^{m}$, our aim is to learn a consistent latent representation $Z$ and then conduct graph node semi-supervised classification in a unified network model. Figure 1 illustrates the overall architecture of our Multiple Graph Adversarial Learning (MGAL), which consists of three main modules: graph convolutional representation, an adversarial learning module and a perceptron layer for node label prediction.
Graph convolutional representation. We employ several graph convolutional layers to learn a latent low-dimensional representation $Z^{(v)}$ for each individual graph by incorporating both the graph structure and the node features.
Adversarial learning. The adversarial learning module enforces the latent representations $\{Z^{(v)}\}$ of the different graphs to be consistent, generating structure-invariant representations in a common subspace.
Perceptron layer. We obtain the final representation $Z$ by concatenating $\{Z^{(v)}\}_{v=1}^{m}$ together and use a perceptron layer to predict the label of each node, thus conducting semi-supervised classification for graph nodes.
4.2 Graph Convolutional Representation
The graph convolutional representation aims to learn a latent representation for the nodes of each graph by exploiting both the graph structure and the node features. In this paper, we employ graph convolutional networks (GCNs) kipf2016semi ; duvenaud2015convolutional ; atwood2016diffusion , which have been widely studied in recent years. The main property of the GCN architecture is that it can represent both graph structure and node content in a unified framework. Generally, given a graph $(X, A^{(v)})$, GCNs conduct layer-wise propagation in the hidden layers as
$$H^{(v,l+1)} = \mathcal{F}\big(A^{(v)}, H^{(v,l)}; W^{(l)}\big), \quad l = 0, 1, \dots, L-1, \qquad (1)$$
where $L$ is the number of layers and $H^{(v,0)} = X$ is the initial input feature matrix. $H^{(v,l)}$ denotes the input feature map of the $l$-th layer and $H^{(v,l+1)}$ is the output feature representation after conducting the graph convolutional operation on graph $A^{(v)}$. The parameters $\{W^{(0)}, \dots, W^{(L-1)}\}$ are layer-specific trainable weight matrices to be learned. We can thus use the final output of the GCN as a compact representation of each individual graph, i.e.,
$$Z^{(v)} = H^{(v,L)}, \quad v = 1, 2, \dots, m. \qquad (2)$$
Many methods have been proposed that provide various kinds of graph convolution operators. In this paper, we employ the widely used spectral convolution function kipf2016semi defined as
$$\mathcal{F}(A, H; W) = \sigma\big(\hat{A} H W\big), \qquad (3)$$
where $\hat{A} = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$, $\tilde{A} = A + I$ and $\tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}$. Here $I$ is the identity matrix and $\sigma(\cdot)$ denotes an activation function, such as $\mathrm{ReLU}(\cdot) = \max(0, \cdot)$.
Remark. Note that, in our graph convolutional architecture, the weight parameters $\{W^{(l)}\}$ of the hidden layers are shared across the different graphs, which can thus propagate some information/knowledge between graphs for node representation, as suggested in duvenaud2015convolutional ; atwood2016diffusion .
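The shared-parameter propagation above can be sketched in NumPy as follows (a minimal illustration under our own toy data and function names, not the paper's implementation):

```python
import numpy as np

def normalize_adj(A):
    """Symmetric normalization: A_hat = D^-1/2 (A + I) D^-1/2, as in Eq.(3)."""
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

def gcn_layer(A_hat, H, W):
    """One propagation step H' = ReLU(A_hat @ H @ W), as in Eqs.(1),(3)."""
    return np.maximum(0.0, A_hat @ H @ W)

# Toy data: n=5 nodes with d=8 features, shared by m=2 graphs.
rng = np.random.default_rng(0)
n, d, k = 5, 8, 4
X = rng.standard_normal((n, d))
graphs = [rng.integers(0, 2, size=(n, n)) for _ in range(2)]
graphs = [np.triu(A, 1) + np.triu(A, 1).T for A in graphs]  # symmetric, no self-loops

# The SAME weight matrix W is applied to every graph, which is what
# propagates knowledge across graphs; each graph still yields its own Z^(v).
W = rng.standard_normal((d, k))
Z = [gcn_layer(normalize_adj(A), X, W) for A in graphs]
print([z.shape for z in Z])  # each Z^(v) is n x k
```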
4.3 Adversarial Learning
As mentioned in §2, for our multi-graph learning problem it is necessary to generate a consistent representation across different graphs in a common subspace. However, the learned representations (Eq.(2)) cannot be guaranteed to be consistent, because each $Z^{(v)}$ is determined not only by the common shared parameters but also by the individual graph $A^{(v)}$. To overcome this issue, we design an adversarial learning module, which consists of a generator $G$ and a discriminator $D$ goodfellow2014generative . Our adversarial model is built on a standard multi-layer perceptron (MLP), which acts as a discriminator to distinguish whether a representation is generated from graph $A^{(v)}$ or from some other graph $A^{(u)}$, where $u \neq v$.
Generator $G$ aims to generate structure-invariant/consistent representations for the graphs in a common subspace. In this paper, we employ the above GCNs as our generator module, i.e., $Z^{(v)} = G(X, A^{(v)})$. For each node representation, we denote
$$z_i^{(v)} = G(x_i, A^{(v)}), \qquad (4)$$
where $x_i$ is the input feature vector of a graph node and $z_i^{(v)}$ is the corresponding generated representation. The generator is supervised and optimized by both the cross-graph discrimination loss and the node (semi-supervised) classification loss. The cross-graph discrimination loss minimizes the gap between different graph representations (as shown in Eq.(5)), while the classification loss separates the representations of graph nodes for the node semi-supervised classification task (as shown in Eq.(7)).
Discriminator $D$ aims to discriminate the representations $z_i^{(v)}$ obtained from the generator. It is built on a standard multi-layer perceptron (MLP) and defined as a multi-class classifier. It outputs a class indication vector $D(z) \in \mathbb{R}^{m}$ in which $[D(z)]_v$ represents the probability of representation $z$ being generated from the $v$-th graph, and $\sum_{v=1}^{m} [D(z)]_v = 1$.
In our adversarial learning, the generator $G$ and discriminator $D$ act as two opponents. The generator tries to generate an indistinguishable representation for each graph, while the discriminator, on the contrary, tries to discriminate whether a representation is generated from the $v$-th graph or from some other graph. This can be achieved by optimizing the following cross-entropy loss:
$$\mathcal{L}_{adv} = -\frac{1}{mn} \sum_{v=1}^{m} \sum_{i=1}^{n} \log \big[D(z_i^{(v)})\big]_v, \qquad (5)$$
where the discriminator is trained to minimize $\mathcal{L}_{adv}$ and the generator is trained to maximize it.
Remark. In our MGAL, since the convolutional parameters $\{W^{(l)}\}$ are shared across the multiple graphs, only one generator is designed. Note that one can also use multiple generators, with each generator operating on an individual graph. This ‘multiple generators + one discriminator’ adversarial learning provides a new feasible architecture to address general multi-graph (multi-view) representation and learning tasks.
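A minimal sketch of this adversarial objective, assuming a softmax MLP discriminator as described (all names, sizes and toy data here are illustrative, not the authors' code):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def discriminator(Z, W1, W2):
    """Toy MLP discriminator: for every row of Z, output a probability
    distribution over the m graphs it could have been generated from."""
    return softmax(np.maximum(0.0, Z @ W1) @ W2)

rng = np.random.default_rng(2)
n, k, m, h = 10, 4, 3, 8
W1, W2 = rng.standard_normal((k, h)), rng.standard_normal((h, m))

# Z_list[v]: representations of the n nodes generated from graph v.
Z_list = [rng.standard_normal((n, k)) for _ in range(m)]

# Multi-class cross-entropy in the spirit of Eq.(5): the discriminator
# minimizes it (identify the source graph), while the generator maximizes
# it (make the representations structure-invariant).
loss_D = 0.0
for v, Z in enumerate(Z_list):
    p = discriminator(Z, W1, W2)
    loss_D += -np.log(p[:, v] + 1e-12).mean()
loss_D /= m
loss_G = -loss_D
print(round(loss_D, 4))
```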
4.4 Perceptron Layer
After adversarial training, all the representations $\{Z^{(v)}\}_{v=1}^{m}$ lie in a common low-dimensional space. In the final perceptron layer, we can use one of them or aggregate them for the final node representation. In this paper, we first aggregate them by concatenation as $Z = [Z^{(1)} \,\|\, Z^{(2)} \,\|\, \cdots \,\|\, Z^{(m)}]$, where $\|$ denotes the concatenation operation, and then employ a fully-connected layer to conduct label prediction as
$$\hat{Y} = \mathrm{softmax}(Z W_f), \qquad (6)$$
where $W_f$ is a trainable weight matrix and $c$ denotes the number of node classes. The final output $\hat{Y}$ denotes the predicted label vectors for the graph nodes. Note that one can also use a graph convolutional layer to conduct label prediction as $\hat{Y} = \mathrm{softmax}(\hat{A} Z W_f)$, where $\hat{A}$ denotes a normalized graph structure obtained as in Eq.(3). For semi-supervised learning, we aim to minimize the following cross-entropy loss over all the labeled nodes:
$$\mathcal{L}_{semi} = -\sum_{i \in \mathcal{L}} \sum_{j=1}^{c} Y_{ij} \ln \hat{Y}_{ij}, \qquad (7)$$
where $\mathcal{L}$ indicates the set of labeled nodes and $Y_i$ denotes the label indication vector of the $i$-th labeled node, i.e., $Y_{ij} = 1$ if node $i$ belongs to class $j$ and $Y_{ij} = 0$ otherwise.
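The concatenation, label prediction and labeled-node loss above can be sketched as follows (illustrative NumPy with hypothetical toy sizes; `W_f` would be trained in practice):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(3)
n, k, m, c = 8, 4, 2, 3

# After adversarial training: one representation Z^(v) per graph,
# all lying in a common k-dimensional space.
Z_list = [rng.standard_normal((n, k)) for _ in range(m)]
Z = np.concatenate(Z_list, axis=1)            # Z = [Z^(1) || ... || Z^(m)]

W_f = rng.standard_normal((m * k, c))         # trainable weight matrix
Y_hat = softmax(Z @ W_f)                      # predicted label vectors, Eq.(6)

# Cross-entropy over the labeled nodes only, Eq.(7).
labeled = [0, 2, 5]                           # indices of labeled nodes
Y = np.zeros((n, c))
Y[labeled, [0, 1, 2]] = 1.0                   # one-hot ground-truth labels
loss = -(Y[labeled] * np.log(Y_hat[labeled] + 1e-12)).sum()
print(Y_hat.shape, round(loss, 4))
```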
Figure 2 shows the 2D t-SNE Geoffrey2017Visualizing visualization of the representations output by GCN on each individual graph (denoted GCN(v)) and by MGAL on multiple graphs, respectively, on the MSRC-v1 dataset winn2005locus (see Experiments). Different colors denote different classes. Intuitively, one can observe that the data of different classes are separated more clearly in the MGAL representation, which demonstrates the benefits of MGAL for multi-graph representation and learning.
[Table 1: semi-supervised classification accuracy on the four datasets under label ratios of 10%, 20% and 30%.]
5 Experiments
To evaluate the effectiveness of the proposed MGAL and semi-supervised learning method, we implement it and compare it with other methods on four datasets.
5.1 Datasets
We test our MGAL on four datasets: MSRC-v1 winn2005locus , Caltech101-7 li2015large ; nie2016parameter , Handwritten numerals AsuncionNewman2007 and CiteSeer AsuncionNewman2007 . The details of these datasets and their usage in our experiments are introduced below.
MSRC-v1 dataset winn2005locus contains 8 classes of 240 images. Following nie2016parameter , in our experiments we select 7 classes (tree, building, airplane, cow, face, car and bicycle), each containing 30 images. Following the experimental setting of nie2016parameter , five graphs are built for this dataset using five different kinds of visual descriptors: 24-dimensional Color Moment, 576-dimensional Histogram of Oriented Gradient, 512-dimensional GIST, 256-dimensional Local Binary Pattern and 254-dimensional CENTRIST features.
Caltech101-7 dataset li2015large ; nie2016parameter is an object recognition dataset containing images of 101 categories. We follow the experimental setup of previous work nie2017multi and select the 7 widely used classes (Dolla-Bill, Face, Garfield, Motorbikes, Snoopy, Stop-Sign and Windsor-Chair), obtaining 1474 images for our experiments. Six neighbor graphs are built for this dataset using six different visual feature descriptors: a 48-dimensional Gabor feature, 40-dimensional wavelet moments (WM), a 254-dimensional CENTRIST feature, a 1984-dimensional HOG feature, a 512-dimensional GIST feature and a 928-dimensional LBP feature.
Handwritten numerals dataset AsuncionNewman2007 contains 2,000 data points for the ten digit classes 0 to 9, with 200 data points per class. We construct six graphs using six published feature descriptors: 76 Fourier coefficients of the character shapes, 216 profile correlations, 64 Karhunen-Loève coefficients, 240 pixel averages in windows, 47 Zernike moments and morphological features.
CiteSeer dataset AsuncionNewman2007 consists of 3,312 scientific publications classified into six classes: Agents, AI, DB, IR, ML and HCI. For our multi-graph learning, two graphs are built: one from a 3,703-dimensional vector indicating whether each keyword is present (the text view) and the other from a 3,279-dimensional vector recording the citation relations between every two documents nie2016parameter .
5.2 Experimental Setup
Parameter setting. For the generator, we use a three-layer graph convolutional network in which the numbers of units in the hidden layers are set to 64 and 16, respectively. For the discriminator, we use a four-layer fully connected network with the numbers of units in the layers set to 16, 64, 16 and $m$, respectively, where $m$ denotes the number of graphs, as discussed in §4.3. We train our generator for a maximum of 500 epochs using Adam Adam with learning rate 0.005, and our discriminator for a maximum of 500 epochs using Stochastic Gradient Descent (SGD) with learning rate 0.01. All network weights are initialized using Glorot initialization glorot2010understanding . We stop training if the validation loss does not decrease for 50 consecutive epochs.
Data setting. For all datasets, we select 10%, 20% and 30% of nodes per class as labeled data and use the remaining data as unlabeled samples. Among the unlabeled samples, we use 5% of nodes for validation to determine the convergence criterion, as suggested in kipf2016semi , and use the remaining 85%, 75% and 65% of nodes, respectively, as test samples. All accuracy results are averaged over 5 runs with different data splits.
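The split protocol above can be sketched as follows (the helper `split_nodes` is our illustrative name and structure, not the paper's code; shown here for the 10% labeled / 5% validation case):

```python
import random

def split_nodes(labels, label_ratio=0.10, val_ratio=0.05, seed=0):
    """Per-class split: `label_ratio` of each class is labeled,
    `val_ratio` of all nodes is held out for validation, rest is test."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train = []
    for idxs in by_class.values():
        idxs = idxs[:]                          # copy before shuffling
        rng.shuffle(idxs)
        n_lab = max(1, int(round(label_ratio * len(idxs))))
        train += idxs[:n_lab]                   # labeled nodes, per class
    rest = [i for i in range(len(labels)) if i not in set(train)]
    rng.shuffle(rest)
    n_val = int(round(val_ratio * len(labels)))
    return train, rest[:n_val], rest[n_val:]

labels = [i % 4 for i in range(200)]            # 4 balanced classes, 50 each
train, val, test = split_nodes(labels)
print(len(train), len(val), len(test))          # 20 10 170
```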
Baselines. We compare our method (MGAL) against state-of-the-art methods as follows:
GCN(v) that conducts the traditional graph convolutional network kipf2016semi on the $v$-th graph $A^{(v)}$ and node content features $X$.
GCN-M that conducts the traditional graph convolutional network kipf2016semi on the averaged graph $\bar{A} = \frac{1}{m}\sum_{v=1}^{m} A^{(v)}$ and node features $X$.
Multi-GCN that first learns representations $\{Z^{(v)}\}$ for the multiple graphs by using/sharing common parameters, as suggested in duvenaud2015convolutional , and then selects the representation $Z^{(v)}$ with the lowest training loss as the final multi-graph representation.
MGL that removes the adversarial learning module in our MGAL network. We implement it as a baseline to demonstrate the effectiveness of the proposed adversarial learning.
In addition, we also compare our method with some other recent traditional multiple graph learning and semi-supervised learning methods, including AMGL nie2016parameter and MLAN nie2017multi .
5.3 Comparison Results
Table 1 summarizes the comparison results of semi-supervised classification on the four datasets. One can note that: (1) The proposed MGAL performs obviously better than the traditional single-graph GCN method conducted on each individual graph, which clearly demonstrates the benefit of the proposed multi-graph representation and semi-supervised learning on multiple graphs. (2) Compared with the other baseline methods GCN-M and Multi-GCN, the proposed MGAL returns the best performance, which indicates the effectiveness of the proposed multi-graph representation architecture. (3) MGAL performs better than the recent multiple graph learning and semi-supervised learning methods AMGL nie2016parameter and MLAN nie2017multi , which indicates the benefit of the proposed multi-graph learning and semi-supervised learning model.
Figure 3 shows the classification accuracy of the proposed MGAL method across different numbers of graphs. For each graph number, we conduct semi-supervised experiments on all possible graph subsets and report the average performance. One can note that: (1) as the number of graphs increases, MGAL obtains better learning performance, which clearly demonstrates the desired ability of MGAL to integrate multiple graphs for robust learning, a main issue in the multi-graph learning task; (2) MGAL performs consistently better than the other baseline methods for different sizes of the graph set, which further indicates the advantage of the proposed MGAL.
Figure 4 shows the classification accuracy of the proposed MGAL method compared with the baseline method MGL on the four datasets. One can note that MGAL generally performs better than MGL in all experiments, which clearly indicates the benefit of incorporating the adversarial learning architecture into multi-graph representation and learning.
6 Conclusion
This paper proposes a novel Multiple Graph Adversarial Learning (MGAL) framework for multi-graph representation and learning. MGAL aims to learn an optimal structure-invariant, and thus consistent, representation for multiple graphs in a common subspace, thereby incorporating both the intra-graph structure information and the inter-graph correlation information simultaneously. Based on MGAL, we then provide a unified network for the semi-supervised learning task. Promising experimental results demonstrate the effectiveness of the proposed MGAL model. The proposed MGAL is a general framework which allows generalizing any learnable/parameterized graph representation model to deal with multiple graphs.
- (1) A. Asuncion and D. Newman. UCI machine learning repository. 2007.
- (2) J. Atwood and D. Towsley. Diffusion-convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1993–2001, 2016.
- (3) J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun. Spectral networks and locally connected networks on graphs. In International Conference on Learning Representations, 2014.
- (4) Q. Dai, Q. Li, J. Tang, and D. Wang. Adversarial network embedding. In AAAI Conference on Artificial Intelligence, 2018.
- (5) M. Defferrard, X. Bresson, and P. Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, pages 3844–3852, 2016.
- (6) D. K. Duvenaud, D. Maclaurin, J. Iparraguirre, R. Bombarell, T. Hirzel, A. Aspuru-Guzik, and R. P. Adams. Convolutional networks on graphs for learning molecular fingerprints. In Advances in Neural Information Processing Systems, pages 2224–2232, 2015.
- (7) X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. In International conference on artificial intelligence and statistics, pages 249–256, 2010.
- (8) I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014.
- (9) M. Henaff, J. Bruna, and Y. LeCun. Deep convolutional networks on graph-structured data. arXiv preprint arXiv:1506.05163, 2015.
- (10) D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations, 2015.
- (11) T. N. Kipf and M. Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
- (12) Y. Li, F. Nie, H. Huang, and J. Huang. Large-scale multi-view spectral clustering via bipartite graph. In AAAI, pages 2750–2756, 2015.
- (13) F. Monti, D. Boscaini, J. Masci, E. Rodola, J. Svoboda, and M. M. Bronstein. Geometric deep learning on graphs and manifolds using mixture model CNNs. In IEEE Conference on Computer Vision and Pattern Recognition, pages 5423–5434, 2017.
- (14) F. Nie, G. Cai, and X. Li. Multi-view clustering and semi-supervised classification with adaptive neighbours. In AAAI, pages 2408–2414, 2017.
- (15) F. Nie, J. Li, X. Li, et al. Parameter-free auto-weighted multiple graph learning: A framework for multiview clustering and semi-supervised classification. In IJCAI, pages 1881–1887, 2016.
- (16) S. Pan, R. Hu, G. Long, J. Jiang, L. Yao, and C. Zhang. Adversarially regularized graph autoencoder. In International Joint Conference on Artificial Intelligence, 2018.
- (17) T. Pham, T. Tran, D. Q. Phung, and S. Venkatesh. Column networks for collective classification. In AAAI, pages 2485–2491, 2017.
- (18) R. Li, S. Wang, F. Zhu, and J. Huang. Adaptive graph convolutional neural networks. In AAAI Conference on Artificial Intelligence, pages 3546–3553, 2018.
- (19) M. Schlichtkrull, T. N. Kipf, P. Bloem, R. van den Berg, I. Titov, and M. Welling. Modeling relational data with graph convolutional networks. In European Semantic Web Conference, pages 593–607, 2018.
- (20) M. Simonovsky and N. Komodakis. Dynamic edge-conditioned filters in convolutional neural networks on graphs. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 29–38, 2017.
- (21) L. van der Maaten and G. Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, pages 2579–2605, 2008.
- (22) P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio. Graph attention networks. arXiv preprint arXiv:1710.10903, 2017.
- (23) H. Wang, J. Wang, J. Wang, M. Zhao, W. Zhang, F. Zhang, X. Xie, and M. Guo. Graphgan: Graph representation learning with generative adversarial nets. In AAAI Conference on Artificial Intelligence, 2018.
- (24) J. Winn and N. Jojic. Locus: Learning object classes with unsupervised segmentation. In Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on, volume 1, pages 756–763, 2005.
- (25) C. Zhuang and Q. Ma. Dual graph convolutional networks for graph-based semi-supervised classification. In World Wide Web Conference on World Wide Web, pages 499–508, 2018.