1 Introduction
Graph representation learning, which aims to represent each graph node as a low-dimensional feature vector, is a fundamental problem in the pattern recognition and machine learning areas. Recently, Graph Convolutional Networks (GCNs) have been widely studied for graph representation and learning duvenaud2015convolutional; atwood2016diffusion; adaptive_GCN; velickovic2017graph. These methods can be categorized into spatial and spectral methods. Spatial methods generally define the graph convolution operation by designing an operator on node neighbors, while spectral methods usually define it based on spectral analysis of graphs. For example, Monti et al. monti2017geometric present mixture model CNNs (MoNet) for graphs, and Velickovic et al. velickovic2017graph present Graph Attention Networks (GAT) for graph semi-supervised learning. Among spectral methods, Bruna et al. bruna2014spectral propose to define graph convolution based on the eigen-decomposition of the graph Laplacian matrix. Henaff et al. henaff2015deep further introduce spatially constrained spectral filters. Defferrard et al. defferrard2016convolutional propose to approximate the spectral filters via Chebyshev expansion. Kipf et al. kipf2016semi propose a simpler Graph Convolutional Network (GCN) based on a first-order approximation of spectral filters.
However, in many applications, data come with multiple graphs, which is known as multi-graph learning nie2016parameter. In this paper, we focus on multiple graphs that share a common node set but have multiple different edge structures. The aim of our multi-graph representation is to find a consistent node representation across all graphs. One main challenge for this problem is how to effectively integrate the multiple graph (edge) structures in the node representation. The existing GCNs above generally cannot deal with multiple graphs. To generalize GCNs to multiple graphs, one popular way is to use some heuristic fusion strategy and transform multi-graph learning into traditional single-graph learning pham2017column; simonovsky2017dynamic; schlichtkrull2018modeling. However, since the fusion process is usually independent of graph representation/learning, this strategy may lead to weakly optimal solutions. In addition, some works propose to conduct the convolution operation on multiple graphs by sharing the common convolution parameters across different graphs duvenaud2015convolutional; atwood2016diffusion; zhuang2018dual. This mechanism can propagate some information/knowledge across multiple graphs. However, the learned representations of the individual graphs in these methods are still not explicitly guaranteed to be consistent.
In this paper, we propose a novel Multiple Graph Adversarial Learning (MGAL) framework for multi-graph representation and learning. MGAL aims to learn an optimal structure-invariant, and thus consistent, representation for multiple graphs in a common subspace via an adversarial learning architecture. Based on MGAL, we further provide a unified network for the multi-graph based semi-supervised learning task. Overall, the main contributions of this paper are summarized as follows:

We propose a novel Multiple Graph Adversarial Learning (MGAL) framework for multi-graph representation and learning. MGAL is a general framework that can generalize any learnable/parameterized graph representation model to deal with multiple graphs.

We present a unified network for the semi-supervised learning task based on multi-graph representation.

We develop a general generative adversarial learning architecture ('multiple generators + one discriminator') to address the general multi-view representation and learning problem.
Experimental results on several datasets demonstrate the effectiveness and benefits of the proposed MGAL model and semi-supervised learning method.
2 Problem Formulation
Graph representation. Let $G(X, A)$ be an attributed graph, where $X \in \mathbb{R}^{n \times d}$ denotes the collection of node features and $A \in \mathbb{R}^{n \times n}$ encodes the pairwise relationships (such as similarities) between node pairs. The aim of graph representation is to learn a latent representation $Z = \mathcal{F}(X, A) \in \mathbb{R}^{n \times c}$ with $c \ll d$ in a low-dimensional space that takes both graph structure and node content into account. One popular way is to use Graph Convolutional Networks (GCNs) kipf2016semi; velickovic2017graph; henaff2015deep; monti2017geometric, which provide a unified framework to define the representation function $\mathcal{F}$. Based on the representation $Z$, we can then conduct learning tasks such as node classification, clustering, and semi-supervised classification.
Multi-graph representation. In many real applications, data come with multiple graphs, which is known as the multi-graph representation/learning problem. In this paper, we focus on multiple graphs that share a common node set and the same node content but have multiple different edge structures. Formally, given $G(X, \mathbb{A})$ with $X \in \mathbb{R}^{n \times d}$ denoting the node content and $\mathbb{A} = \{A^{(1)}, A^{(2)}, \dots, A^{(M)}\}$ representing the multiple edge structures, the aim of our multi-graph representation is to learn a consistent latent representation $Z \in \mathbb{R}^{n \times c}$ with $c \ll d$ in a low-dimensional space that takes the multiple graph structures and node content into account together. Based on the representation $Z$, we can then conduct learning tasks such as node classification, clustering, and semi-supervised learning; in this paper, we focus on semi-supervised learning. The main challenge of multi-graph representation is how to exploit the information of each individual graph while simultaneously capturing the correlation cues among the multiple graphs in the final representation $Z$. A simple and direct way to use multiple graphs is to average them into a new one and then feed it to a standard GCN model. Obviously, since the graph averaging process is independent of graph learning, this may lead to weakly (locally) optimal solutions.
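The simple graph-averaging baseline mentioned above can be sketched in a few lines. This is an illustrative NumPy fragment, not part of the MGAL model, and the function name `average_graphs` is our own:

```python
import numpy as np

def average_graphs(adjs):
    """Fuse multiple adjacency matrices over a common node set by
    elementwise averaging, producing a single graph that can be fed
    to a standard single-graph GCN."""
    return np.mean(np.stack(adjs, axis=0), axis=0)
```

Because the averaging happens before, and independently of, any learning, structure that is present in only one of the graphs is diluted in the fused graph, which is exactly the weakness pointed out above.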
3 Related Works
Multiple graph convolutional representation. To generalize GCNs to multiple graphs, one can first obtain the representation $Z^{(v)}$ for each graph individually by using GCNs and then concatenate or average them to obtain the final representation $Z$. Obviously, this two-stage strategy neglects the correlation information among different graphs. To overcome this issue, some works propose to ensemble the different representations in middle layers of GCNs, so as to share/communicate information across the different graphs during learning pham2017column; simonovsky2017dynamic; schlichtkrull2018modeling. Since the fusion in each layer is still independent of graph learning, this strategy may still lead to weakly optimal representations.
Another line of works proposes to conduct the convolution operation on multiple graphs by sharing the common convolution parameters $W$ across the different graphs, e.g., duvenaud2015convolutional; atwood2016diffusion; zhuang2018dual. This mechanism can propagate some correlation information across graphs via the common parameters $W$. However, although the parameters $W$ are common to the different graphs, the learned representations $Z^{(v)}$ are still not guaranteed to be well consistent, because each $Z^{(v)}$ is determined not only by the parameters $W$ but also by the individual graph $A^{(v)}$.
Adversarial learning model. Our multiple graph adversarial learning model is inspired by the Generative Adversarial Network (GAN) goodfellow2014generative, which consists of a generator $G$ and a discriminator $D$. The generator is trained to generate samples that convince the discriminator, while the discriminator aims to discriminate the samples returned by the generator. Recently, adversarial learning has been explored in graph representation tasks. Wang et al. wang2018graphgan propose a graph representation model with GANs (GraphGAN). Dai et al. dai2018adversarial propose an adversarial network embedding (ANE), which employs adversarial learning to regularize graph representation. Pan et al. pan2018adversarially also propose an adversarially regularized graph autoencoder model for graph embedding.
Different from previous works, our aim in this paper is to derive a general adversarial learning framework for multiple graph representation and learning. To the best of our knowledge, this is the first effort to develop an adversarial learning framework for the multi-graph learning problem.
4 The Proposed Model
4.1 Overall Framework
Given an input feature matrix $X$ and associated multiple graph structures $\{A^{(1)}, \dots, A^{(M)}\}$, our aim is to learn a consistent latent representation $Z$ and then conduct graph node semi-supervised classification in a unified network model. Figure 1 demonstrates the overall architecture of our Multiple Graph Adversarial Learning (MGAL), which consists of three main modules, i.e., graph convolutional representation, an adversarial learning module, and a perceptron layer for node label prediction.

Graph convolutional representation. We employ several graph convolutional layers to learn a latent low-dimensional representation $Z^{(v)}$ for each individual graph by incorporating both the graph structure and the node features.

Adversarial learning. The adversarial learning module aims to enforce the latent representations $Z^{(v)}$ of the graphs to be consistent and to generate structure-invariant representations for the different graphs in a common subspace.

Perceptron layer. We obtain the final representation $Z$ by concatenating $Z^{(1)}, \dots, Z^{(M)}$ and use a perceptron layer to predict the label of each node, thus conducting semi-supervised classification of graph nodes.
4.2 Graph Convolutional Representation
The graph convolutional representation aims to learn a latent representation for the nodes of each graph by exploiting both the graph structure and the node features. In this paper, we employ graph convolutional networks (GCNs) kipf2016semi; duvenaud2015convolutional; atwood2016diffusion, which have been widely studied in recent years. The main property of the GCN architecture is that it can represent both graph structure and node content in a unified framework. Generally, given a graph $G(X, A^{(v)})$, GCNs conduct the layer-wise propagation in hidden layers as
$$H^{(v,k+1)} = \sigma\big(\hat{A}^{(v)} H^{(v,k)} W^{(k)}\big), \qquad (1)$$
where $k = 0, 1, \dots, K-1$, $K$ is the number of layers, and $\hat{A}^{(v)}$ is a normalized version of the adjacency matrix $A^{(v)}$ (defined below). $H^{(v,0)} = X$ is the initial input feature matrix. $H^{(v,k)}$ denotes the input feature map of the $k$-th layer, and $H^{(v,k+1)}$ is the output feature representation after conducting the graph convolutional operation on graph $A^{(v)}$. The parameters $W^{(k)}$ are layer-specific trainable weight matrices to be learned. We can thus use the final output of the GCNs as a compact representation for each individual graph as
$$Z^{(v)} = H^{(v,K)}, \quad v = 1, 2, \dots, M. \qquad (2)$$
Many methods have been proposed in recent years to provide various kinds of graph convolution operators. In this paper, we employ the widely used spectral convolution function kipf2016semi defined by $\hat{A} = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$, where $\tilde{A} = A + I$ and $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$. Here $I$ is the identity matrix, and $\sigma(\cdot)$ denotes an activation function, such as $\mathrm{ReLU}(\cdot) = \max(0, \cdot)$.
Remark. Note that, in our graph convolutional architecture, the weight parameters $W^{(k)}$ of the hidden layers are shared across the different graphs, which can thus propagate information/knowledge across graphs for node representation, as suggested in duvenaud2015convolutional; atwood2016diffusion.
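As a concrete illustration of the propagation rule of Eq. (1) with the renormalization trick of kipf2016semi, one layer can be sketched as follows. This is a NumPy sketch with a ReLU activation; the function names are our own:

```python
import numpy as np

def normalize_adjacency(A):
    """Renormalization trick: A_hat = D~^{-1/2} (A + I) D~^{-1/2},
    where D~ is the degree matrix of A + I."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    # Scale rows and columns by D~^{-1/2} without forming a dense diagonal.
    return (A_tilde * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]

def gcn_layer(A_hat, H, W):
    """One layer-wise propagation step: H' = ReLU(A_hat @ H @ W)."""
    return np.maximum(A_hat @ H @ W, 0.0)
```

In the shared-parameter setting of the Remark, the same weight matrix `W` would be applied to every graph's normalized adjacency `A_hat`, while each graph keeps its own hidden states `H`.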
4.3 Adversarial Learning
As mentioned in §2, for our multi-graph learning problem, it is necessary to generate a consistent representation across the different graphs in a common subspace. However, the representations $Z^{(v)}$ learned above (Eq. (2)) cannot be guaranteed to be consistent, because each $Z^{(v)}$ is determined not only by the common shared parameters $W^{(k)}$ but also by the individual graph $A^{(v)}$. To overcome this issue, we design an adversarial learning module, which consists of a generator $G$ and a discriminator $D$ goodfellow2014generative. Our adversarial model is built on a standard multi-layer perceptron (MLP), which acts as a discriminator to distinguish whether a representation is generated from graph $A^{(v)}$ or from some other graph $A^{(u)}$, where $u \neq v$.
Generator $G$ aims to generate structure-invariant/consistent representations for the graphs in a common subspace. In this paper, we employ the above GCNs as our generator module, i.e.,
$$Z^{(v)} = G(X, A^{(v)}), \quad v = 1, 2, \dots, M. \qquad (3)$$
For each node representation, we denote
$$z_i^{(v)} = G(x_i, A^{(v)}), \qquad (4)$$
where $x_i$ is the input feature vector of a graph node and $z_i^{(v)}$ is the corresponding generated representation. The generator is supervised and optimized by both the cross-graph discrimination loss and the node (semi-supervised) classification loss. The cross-graph discrimination loss minimizes the gap among the different graph representations (as shown in Eq. (5)), while the classification loss separates the representations of graph nodes for the node semi-supervised classification task (as shown in Eq. (8)).
Discriminator $D$ aims to discriminate the representations obtained from the generator. It is built on a standard multi-layer perceptron (MLP) and defined as a multi-class classifier. It outputs a class probability vector $D(z) \in \mathbb{R}^{M}$, in which $D_v(z)$ represents the probability of representation $z$ being generated from the $v$-th graph, with $\sum_{v=1}^{M} D_v(z) = 1$.
In our adversarial learning, the generator and discriminator act as two opponents. The generator tries to generate an indistinctive representation for each graph, while the discriminator, on the contrary, tries to discriminate whether a representation is generated from the $v$-th graph or from some other graph. This can be achieved by optimizing the following cross-entropy loss:
$$\min_G \max_D \; \sum_{v=1}^{M} \sum_{i=1}^{n} \log D_v\big(G(x_i, A^{(v)})\big). \qquad (5)$$
Remark. In our MGAL, since the convolutional parameters $W^{(k)}$ are shared across the multiple graphs, only one generator is designed. Note that one can also use multiple generators, with each generator operating on an individual graph. This 'multiple generators + one discriminator' adversarial learning provides a new feasible architecture for addressing general multi-graph (multi-view) representation and learning tasks.
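To make the adversarial objective concrete, the two loss terms can be sketched as below. This is an illustrative NumPy fragment under our own assumptions: the discriminator is a multi-class classifier over graph indices, and the generator is driven towards making the discriminator's output uniform, a common surrogate; the paper's exact generator objective may differ.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def discriminator_loss(logits, graph_ids):
    """Multi-class cross-entropy: D predicts which of the M graphs each
    representation came from. logits: (n, M); graph_ids: (n,) true indices."""
    p = softmax(logits)
    n = logits.shape[0]
    return -np.mean(np.log(p[np.arange(n), graph_ids] + 1e-12))

def generator_loss(logits):
    """Surrogate for G: push D's output towards the uniform distribution
    over the M graphs, so representations become structure-invariant."""
    p = softmax(logits)
    M = logits.shape[1]
    return np.mean(np.sum(-(1.0 / M) * np.log(p + 1e-12), axis=1))
```

Training would alternate the two steps: update the discriminator by descending `discriminator_loss` on representations paired with their true graph indices, then update the generator (the shared-parameter GCN) by descending `generator_loss`.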
4.4 Perceptron Layer
After adversarial training, all the representations $Z^{(v)}$ lie in a common low-dimensional space. In the final perceptron layer, we can use one of them or aggregate them for the final node representation. In this paper, we first aggregate them by concatenation as $Z = [Z^{(1)}, Z^{(2)}, \dots, Z^{(M)}]$, where $[\cdot]$ denotes the concatenation operation, and then employ a fully-connected layer to conduct label prediction as
$$Z' = \mathrm{softmax}(Z W'), \qquad (6)$$
where $W'$ is a trainable weight matrix whose output dimension equals the number of node classes $c'$. The final output $Z' \in \mathbb{R}^{n \times c'}$ denotes the predicted label vectors for the graph nodes. Note that one can also use a graph convolutional layer to conduct label prediction as
$$Z' = \mathrm{softmax}(\hat{A} Z W'), \qquad (7)$$
where $\hat{A}$ is a normalized adjacency matrix as defined in §4.2. For semi-supervised learning, we aim to minimize the following cross-entropy loss over all the labeled nodes, i.e.,
$$\mathcal{L}_{semi} = -\sum_{i \in L} \sum_{j=1}^{c'} Y_{ij} \ln Z'_{ij}, \qquad (8)$$
where $L$ indicates the set of labeled nodes and $Y_{i\cdot}$ denotes the corresponding label indication vector for the $i$-th labeled node, i.e.,
$$Y_{ij} = 1 \text{ if the } i\text{-th node belongs to the } j\text{-th class, and } Y_{ij} = 0 \text{ otherwise.} \qquad (9)$$
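A minimal sketch of the perceptron layer and the loss of Eq. (8), assuming the representations $Z^{(v)}$ are simply concatenated and the labels are given as integer class indices (NumPy; the function names are ours):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def predict_labels(Z_list, W):
    """Concatenate per-graph representations and apply a linear layer
    followed by softmax, as in Eq. (6)."""
    Z = np.concatenate(Z_list, axis=1)  # (n, M*c)
    return softmax(Z @ W)               # (n, c') class probabilities

def semi_supervised_loss(Z_prime, labels, labeled_idx):
    """Cross-entropy of Eq. (8), evaluated only over the labeled nodes."""
    p = Z_prime[labeled_idx, labels[labeled_idx]]
    return -np.sum(np.log(p + 1e-12))
```

Restricting the sum to `labeled_idx` is what makes the training semi-supervised: gradients flow through all node representations via the graph convolutions, but only labeled nodes contribute to this loss.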
Figure 2 demonstrates the 2D t-SNE Geoffrey2017Visualizing visualizations of the representations output by GCN on each individual graph (denoted as GCN(v)) and by MGAL on the multiple graphs of the MSRC-v1 dataset winn2005locus (see Experiments). Different colors denote different classes. Intuitively, one can observe that the data of different classes are separated more clearly in the MGAL representation, which demonstrates the benefits of MGAL for multi-graph representation and learning.
Table 1: Semi-supervised classification accuracy (%, mean±std) on the four datasets. '–' means the corresponding single graph does not exist for that dataset.

Dataset | Handwritten numerals | Caltech101-7
Ratio of label | 10% | 20% | 30% | 10% | 20% | 30%
GCN(1) | 85.88±0.91 | 87.51±0.77 | 87.65±0.60 | 83.55±0.81 | 84.12±1.26 | 84.18±1.37
GCN(2) | 93.32±1.41 | 95.33±0.49 | 95.66±0.51 | 82.09±0.40 | 83.59±0.59 | 85.45±1.39
GCN(3) | 95.14±0.66 | 96.71±0.47 | 97.17±0.51 | 84.92±0.78 | 86.18±2.34 | 86.56±1.85
GCN(4) | 95.26±0.93 | 97.19±0.22 | 97.46±0.50 | 93.20±1.01 | 95.06±0.59 | 95.76±0.77
GCN(5) | 84.61±0.38 | 96.71±0.48 | 86.40±0.64 | 91.64±1.91 | 92.47±3.04 | 94.18±2.24
GCN(6) | 71.01±1.87 | 71.44±1.69 | 73.03±1.28 | 90.45±1.98 | 92.36±2.09 | 94.27±0.87
GCN-M | 95.89±0.67 | 95.85±2.48 | 97.38±0.63 | 90.01±2.44 | 92.42±2.87 | 93.87±1.55
Multi-GCN | 96.89±0.35 | 97.47±0.38 | 97.89±0.29 | 93.75±2.32 | 95.60±0.82 | 95.28±3.01
AMGL | 91.82±0.55 | 94.15±0.32 | 95.65±0.73 | 88.83±0.69 | 92.79±0.34 | 94.62±0.42
MLAN | 97.32±0.23 | 97.48±0.19 | 97.91±0.28 | 93.52±0.58 | 94.84±0.19 | 95.46±0.18
MGAL | 97.64±0.39 | 98.25±0.34 | 98.48±0.42 | 95.00±1.07 | 96.41±0.81 | 97.14±0.44

Dataset | MSRC-v1 | CiteSeer
Ratio of label | 10% | 20% | 30% | 10% | 20% | 30%
GCN(1) | 42.75±11.2 | 45.09±4.45 | 45.00±7.38 | 71.08±1.83 | 75.62±1.14 | 77.95±1.59
GCN(2) | 77.69±8.15 | 83.98±2.47 | 83.43±5.45 | 73.38±2.89 | 77.44±1.16 | 79.15±0.96
GCN(3) | 81.21±4.52 | 84.84±1.65 | 89.57±2.15 | – | – | –
GCN(4) | 70.66±2.77 | 75.65±2.87 | 76.00±4.49 | – | – | –
GCN(5) | 71.10±4.64 | 78.88±5.39 | 79.57±2.84 | – | – | –
GCN-M | 80.77±4.13 | 86.21±3.37 | 90.29±2.15 | 72.84±2.32 | 77.93±1.49 | 80.13±1.82
Multi-GCN | 83.30±2.21 | 88.70±1.69 | 89.29±2.43 | 72.04±1.37 | 76.15±1.42 | 77.69±1.73
AMGL | 82.29±2.43 | 89.35±2.09 | 90.21±1.08 | 64.78±1.58 | 68.00±1.41 | 70.92±2.93
MLAN | 83.66±2.38 | 87.92±1.18 | 89.47±1.68 | 63.24±6.54 | 59.49±6.72 | 60.47±6.16
MGAL | 88.68±0.66 | 89.93±2.40 | 91.29±1.65 | 75.18±1.64 | 78.84±1.57 | 78.77±1.25
5 Experiments
To evaluate the effectiveness of the proposed MGAL and its semi-supervised learning method, we implement it and compare it with related methods on four datasets.
5.1 Datasets
We test our MGAL on four datasets, including MSRC-v1 winn2005locus, Caltech101-7 li2015large; nie2016parameter, Handwritten numerals AsuncionNewman2007, and CiteSeer AsuncionNewman2007. The details of these datasets and their usage in our experiments are introduced below.
MSRC-v1 dataset winn2005locus contains 240 images in 8 classes. Following nie2016parameter, in our experiments we select 7 classes, including tree, building, airplane, cow, face, car, and bicycle; each class contains 30 images. Following the experimental setting of nie2016parameter, five graphs are constructed for this dataset by using five different kinds of visual descriptors, i.e., 24-dimensional Color Moment, 576-dimensional Histogram of Oriented Gradients, 512-dimensional GIST, 256-dimensional Local Binary Pattern, and 254-dimensional CENTRIST features.
Caltech101-7 dataset li2015large; nie2016parameter is an object recognition dataset containing 101 categories of images. We follow the experimental setup of previous work nie2017multi and select the 7 widely used classes (Dolla-Bill, Face, Garfield, Motorbikes, Snoopy, Stop-Sign, and Windsor-Chair), obtaining 1,474 images for our experiments. Six neighbor graphs are constructed for this dataset by using six different kinds of visual feature descriptors, including 48-dimensional Gabor features, 40-dimensional wavelet moments (WM), 254-dimensional CENTRIST features, 1,984-dimensional HOG features, 512-dimensional GIST features, and 928-dimensional LBP features.
Handwritten numerals dataset AsuncionNewman2007 contains 2,000 data points for the ten digit classes 0 to 9, with 200 data points per class. We construct six graphs by using six published feature descriptors, including 76 Fourier coefficients of the character shapes, 216 profile correlations, 64 Karhunen-Loève coefficients, 240 pixel averages in windows, 47 Zernike moments, and morphological features.
CiteSeer dataset AsuncionNewman2007 consists of 3,312 documents on scientific publications, which are classified into six classes, i.e., Agents, AI, DB, IR, ML, and HCI. For our multi-graph learning, two graphs are built: one from a 3,703-dimensional vector indicating whether each keyword is included (the text view), and the other from a 3,279-dimensional vector recording the citation relations between documents nie2016parameter.
5.2 Experimental Setup
Parameter setting. For the generator, we use a three-layer graph convolutional network, and the numbers of units in the hidden layers are set to 64 and 16, respectively. For the discriminator, we use a four-layer fully-connected network, with the numbers of units in the layers set to 16, 64, 16, and $M$, respectively, where $M$ denotes the number of graphs, as discussed in §4.3. We train our generator for a maximum of 500 epochs by using Adam Adam with learning rate 0.005. We train our discriminator for a maximum of 500 epochs by using Stochastic Gradient Descent (SGD) with learning rate 0.01. All the network weights are initialized using Glorot initialization glorot2010understanding. We stop training if the validation loss does not decrease for 50 consecutive epochs.
Data setting. For all datasets, we select 10%, 20%, and 30% of nodes per class as labeled data and use the remaining data as unlabeled samples. Among the unlabeled samples, we use 5% of nodes for validation purposes to determine the convergence criterion, as suggested in kipf2016semi, and use the remaining 85%, 75%, and 65% of nodes, respectively, as test samples. All accuracy results are averaged over 5 runs with different data splits.
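The data split described above can be sketched as follows. This is a per-class NumPy approximation; the function name and the per-class validation split are our own simplification of the 5%-of-unlabeled-nodes rule:

```python
import numpy as np

def split_indices(labels, label_ratio, val_ratio=0.05, seed=0):
    """Per class: `label_ratio` of nodes as labeled training data,
    `val_ratio` as validation nodes, and the remainder as test nodes."""
    rng = np.random.default_rng(seed)
    train, val, test = [], [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        n_train = max(1, int(round(label_ratio * len(idx))))
        n_val = max(1, int(round(val_ratio * len(idx))))
        train.extend(idx[:n_train])
        val.extend(idx[n_train:n_train + n_val])
        test.extend(idx[n_train + n_val:])
    return np.array(train), np.array(val), np.array(test)
```

Averaging results over several values of `seed` corresponds to the 5-run protocol with different data splits.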
Baselines. We compare our method (MGAL) against the following state-of-the-art methods:
GCN(v), which conducts a traditional graph convolutional network kipf2016semi on the $v$-th graph $A^{(v)}$ and the node content features $X$.
GCN-M, which conducts a traditional graph convolutional network kipf2016semi on the averaged graph and the node features $X$.
Multi-GCN, which first learns representations for the multiple graphs by sharing common parameters, as suggested in duvenaud2015convolutional, and then selects the representation $Z^{(v)}$ with the lowest training loss as the final multi-graph representation.
MGL, which removes the adversarial learning module from our MGAL network. We implement it as a baseline to demonstrate the effectiveness of the proposed adversarial learning.
In addition, we also compare our method with some other recent traditional multiple graph learning and semi-supervised learning methods, including:
AMGL nie2016parameter, a parameter-free auto-weighted multiple graph learning and label propagation method for semi-supervised learning problems.
MLAN nie2017multi, a multi-view learning model for semi-supervised classification.
5.3 Comparison Results
Table 1 summarizes the comparison results of semi-supervised classification on the four datasets. One can note that: (1) The proposed MGAL obviously performs better than the traditional single-graph GCN method conducted on each individual graph $A^{(v)}$. This clearly demonstrates the benefit of the proposed multi-graph representation and semi-supervised learning on multiple graphs. (2) Compared with the other baseline methods GCN-M and Multi-GCN, the proposed MGAL returns the best performance, which indicates the effectiveness of the proposed multi-graph representation architecture. (3) MGAL performs better than the recent multiple graph learning and semi-supervised learning methods AMGL nie2016parameter and MLAN nie2017multi, which indicates the benefit of the proposed multi-graph learning and semi-supervised learning model.
Figure 3 shows the classification accuracy of the proposed MGAL method across different numbers of graphs. For each graph number, we conduct semi-supervised experiments on all possible graph subsets and then compute the average performance. One can note that: (1) as the number of graphs increases, MGAL obtains better learning performance, which clearly demonstrates the desired ability of MGAL to integrate multiple graphs for robust learning, the main issue in the multi-graph learning task; (2) MGAL performs consistently better than the other baseline methods for different sizes of the graph set, which further indicates the advantage of the proposed MGAL.
Figure 4 shows the classification accuracy of the proposed MGAL method compared with the baseline method MGL on the four datasets. One can note that MGAL generally performs better than MGL in all experiments, which clearly indicates the benefit of incorporating the adversarial learning architecture into multi-graph representation and learning.
6 Conclusion
This paper proposes a novel Multiple Graph Adversarial Learning (MGAL) framework for multi-graph representation and learning. MGAL aims to learn an optimal structure-invariant, and thus consistent, representation for multiple graphs in a common subspace, thereby incorporating both the intra-graph structure information and the inter-graph correlation information simultaneously. Based on MGAL, we then provide a unified network for the semi-supervised learning task. Promising experimental results demonstrate the effectiveness of the proposed MGAL model. The proposed MGAL is a general framework that can generalize any learnable/parameterized graph representation model to deal with multiple graphs.
References
(1) A. Asuncion and D. Newman. UCI machine learning repository, 2007.
(2) J. Atwood and D. Towsley. Diffusion-convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1993–2001, 2016.
(3) J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun. Spectral networks and locally connected networks on graphs. In International Conference on Learning Representations, 2014.
(4) Q. Dai, Q. Li, J. Tang, and D. Wang. Adversarial network embedding. In AAAI Conference on Artificial Intelligence, 2018.
(5) M. Defferrard, X. Bresson, and P. Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, pages 3844–3852, 2016.
(6) D. K. Duvenaud, D. Maclaurin, J. Iparraguirre, R. Bombarell, T. Hirzel, A. Aspuru-Guzik, and R. P. Adams. Convolutional networks on graphs for learning molecular fingerprints. In Advances in Neural Information Processing Systems, pages 2224–2232, 2015.
(7) X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. In International Conference on Artificial Intelligence and Statistics, pages 249–256, 2010.
(8) I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
(9) M. Henaff, J. Bruna, and Y. LeCun. Deep convolutional networks on graph-structured data. arXiv preprint arXiv:1506.05163, 2015.
(10) D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations, 2015.
(11) T. N. Kipf and M. Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
(12) Y. Li, F. Nie, H. Huang, and J. Huang. Large-scale multi-view spectral clustering via bipartite graph. In AAAI, pages 2750–2756, 2015.
(13) F. Monti, D. Boscaini, J. Masci, E. Rodola, J. Svoboda, and M. M. Bronstein. Geometric deep learning on graphs and manifolds using mixture model CNNs. In IEEE Conference on Computer Vision and Pattern Recognition, pages 5423–5434, 2017.
(14) F. Nie, G. Cai, and X. Li. Multiview clustering and semi-supervised classification with adaptive neighbours. In AAAI, pages 2408–2414, 2017.
(15) F. Nie, J. Li, X. Li, et al. Parameter-free auto-weighted multiple graph learning: A framework for multiview clustering and semi-supervised classification. In IJCAI, pages 1881–1887, 2016.
(16) S. Pan, R. Hu, G. Long, J. Jiang, L. Yao, and C. Zhang. Adversarially regularized graph autoencoder. In International Joint Conference on Artificial Intelligence, 2018.
(17) T. Pham, T. Tran, D. Q. Phung, and S. Venkatesh. Column networks for collective classification. In AAAI, pages 2485–2491, 2017.
(18) R. Li, S. Wang, F. Zhu, and J. Huang. Adaptive graph convolutional neural networks. In AAAI Conference on Artificial Intelligence, pages 3546–3553, 2018.
(19) M. Schlichtkrull, T. N. Kipf, P. Bloem, R. van den Berg, I. Titov, and M. Welling. Modeling relational data with graph convolutional networks. In European Semantic Web Conference, pages 593–607, 2018.
(20) M. Simonovsky and N. Komodakis. Dynamic edge-conditioned filters in convolutional neural networks on graphs. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 29–38, 2017.
(21) L. van der Maaten and G. Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9:2579–2605, 2008.
(22) P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio. Graph attention networks. arXiv preprint arXiv:1710.10903, 2017.
(23) H. Wang, J. Wang, J. Wang, M. Zhao, W. Zhang, F. Zhang, X. Xie, and M. Guo. GraphGAN: Graph representation learning with generative adversarial nets. In AAAI Conference on Artificial Intelligence, 2018.
(24) J. Winn and N. Jojic. LOCUS: Learning object classes with unsupervised segmentation. In IEEE International Conference on Computer Vision, volume 1, pages 756–763, 2005.
(25) C. Zhuang and Q. Ma. Dual graph convolutional networks for graph-based semi-supervised classification. In International World Wide Web Conference, pages 499–508, 2018.