Labeled Graph Generative Adversarial Networks

06/07/2019 ∙ by Shuangfei Fan, et al. ∙ Virginia Polytechnic Institute and State University

As a new way to train generative models, generative adversarial networks (GANs) have achieved considerable success in image generation, and this framework has also recently been applied to data with graph structures. We identify the drawbacks of existing deep frameworks for generating graphs, and we propose labeled-graph generative adversarial networks (LGGAN) to train deep generative models for graph-structured data with node labels. We test the approach on various types of graph datasets, such as collections of citation networks and protein graphs. Experiment results show that our model can generate diverse labeled graphs that match the structural characteristics of the training data and outperforms all baselines in terms of quality, generality, and scalability. To further evaluate the quality of the generated graphs, we apply it to a downstream task for graph classification, and the results show that LGGAN can better capture the important aspects of the graph structure.




1 Introduction

Labeled graphs are powerful complex data structures that can describe collections of related objects. Such collections could be atoms forming molecular graphs, users connecting on online social networks, and papers connected by citations. The connected objects, or nodes, may be of different types or classes, and the graphs themselves may belong to particular categories. Methods that reason about this flexible and rich representation can empower analyses of important, complex real-world phenomena. One key approach for reasoning about such graphs is to learn the probability distributions over graphs. In this paper, we introduce a method that learns generative models for labeled graphs in which the nodes and the graphs may have categorical labels.

A high-quality generative model should be able to synthesize labeled graphs that preserve global structural properties of realistic graphs. Such a tool could be valuable in various settings. One motivating example application is in situations where data owners wish to share graph data but must protect sensitive information. For example, online social network providers may want to enable the scientific community to study the structural aspects of their user networks, but revealing structure could allow reidentification or other privacy-invading inferences (Zheleva et al., 2012). A generative model that can create realistic graphs that do not represent real-world users could allow for this kind of study.

Recently, Goodfellow et al. (2014) described generative adversarial networks (GANs), which have been widely explored in computer vision and natural language processing (Zhang et al., 2017; Yu et al., 2017) for generating realistic images and text, as well as performing tasks such as style transfer. GANs are composed of two neural networks. The first is a generator network that learns to map from a latent space to the distribution of the target data, and the second is a discriminator network that tries to distinguish real data from candidates synthesized by the generator. The two networks compete with each other during training, and each improves based on feedback from the other. The success of this general framework has shown it to be a powerful tool for learning the distributions of complex data.
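For reference, the two-player game described above is usually written as the minimax objective of Goodfellow et al. (2014). The symbols $G$, $D$, $p_{\mathrm{data}}$, and $p_z$ are introduced here for illustration (generator, discriminator, data distribution, and latent prior):

```latex
\min_{G} \max_{D} \;
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\left[ \log D(x) \right]
  + \mathbb{E}_{z \sim p_{z}}\left[ \log\left( 1 - D(G(z)) \right) \right]
```

The discriminator is rewarded for assigning high probability to real samples and low probability to generated ones, while the generator is rewarded for producing samples that the discriminator accepts as real.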

Motivated by the power of GANs, researchers have also used them for generating graphs. Bojchevski et al. (2018) proposed NetGAN, which uses the GAN framework to generate random walks on graphs. De Cao and Kipf (2018) proposed MolGAN, which generates molecular graphs using the combination of a GAN framework and a reinforcement learning objective. However, existing methods have important limitations, including limited generality to graphs with different structures and limited scalability to graphs of different sizes. Furthermore, they are unable to generate graphs with node labels, a critical feature of some graph-structured data.

The rapid development of deep learning techniques has also led to advances in representation learning on graphs. Many works use deep architectures to extract high-level features from nodes and their neighborhoods, capturing both node and structure information (Kipf and Welling, 2017; Hamilton et al., 2017; Fan and Huang, 2017). These methods have been shown to be useful for many applications, such as link prediction and collective classification.

Building on these advances, we propose the labeled graph generative adversarial network (LGGAN), a deep generative model trained using a GAN framework to generate graph-structured data with node labels. LGGAN can be used to generate various kinds of graph-structured data, such as citation graphs, knowledge graphs, and protein graphs. Specifically, the generator in an LGGAN generates an adjacency matrix as well as labels for the nodes, and its discriminator uses a graph convolutional network (Kipf and Welling, 2017) with residual connections to identify real graphs using adaptive, structure-aware higher-level graph features. Our approach is the first deep generative method that addresses the generation of labeled graph-structured data. In experiments, we demonstrate that our model can generate realistic graphs that preserve important properties of the training graphs. We evaluate our model on various datasets with different graph types (such as ego networks and proteins) and with different sizes. Our experiments demonstrate that LGGAN effectively learns distributions of different graph structures and that it can scale up to generate large graphs without losing much quality.

2 Related Work

Generative graph models were pioneered by Erdös and Rényi (1959), who introduced random graphs where each possible edge appears with a fixed independent probability. More realistic models followed, such as the preferential attachment model of Albert and Barabási (2002), which grows graphs by adding nodes and connecting them to existing nodes with probability proportional to their current degrees. Goldenberg et al. (2010) proposed the stochastic block model (SBM), and Airoldi et al. (2008) proposed the mixed-membership stochastic block model (MMSB). The SBM is a more complex version of the Erdös-Rényi (E-R) model that can generate graphs with multiple communities. Instead of assuming that each pair of nodes connects with identical probability, an SBM predefines the number of communities in the generated graph and uses a probability matrix of connections among the different types of nodes. Compared to the E-R model, SBMs are more useful since they can learn more nuanced distributions of graphs from data. However, SBMs are still limited in that they can only generate graphs with this kind of community structure.
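The difference between the E-R model and the SBM can be made concrete with a minimal sampling sketch. The function names and the edge-set representation below are illustrative choices, not taken from any of the cited works:

```python
import random


def sample_er(n, p, rng):
    """E-R model: each of the n*(n-1)/2 possible edges appears independently with probability p."""
    return {(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p}


def sample_sbm(block_sizes, probs, rng):
    """SBM: the edge probability depends on the communities of the two endpoints.

    block_sizes[k] is the number of nodes in community k, and probs[a][b]
    is the connection probability between communities a and b.
    """
    # Community membership of every node, e.g. [0, 0, 1, 1, 1].
    membership = [k for k, size in enumerate(block_sizes) for _ in range(size)]
    n = len(membership)
    edges = {
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if rng.random() < probs[membership[i]][membership[j]]
    }
    return membership, edges


rng = random.Random(0)
er_edges = sample_er(20, 0.2, rng)
membership, sbm_edges = sample_sbm([10, 10], [[0.8, 0.05], [0.05, 0.8]], rng)
```

With a single block the SBM reduces to the E-R model, which is why it is described as a generalization: the only extra modeling power comes from the block-to-block probability matrix.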

With the recent development of deep learning, some works have proposed deep models to represent the distribution of graphs. Li et al. (2018) proposed DeepGMG, which introduced a framework based on graph neural networks. It generates graphs by expansion, adding new structures at each step. You et al. (2018b) proposed GraphRNN, which decomposes graph generation into generating node and edge sequences from a hierarchical recurrent neural network. Simultaneously, researchers have been developing other implicit yet powerful methods for generating graphs, especially based on the success of generative adversarial networks (Goodfellow et al., 2014). For example, Bojchevski et al. (2018) proposed NetGAN, which uses the GAN framework to generate random walks on graphs from which structure can be inferred, and De Cao and Kipf (2018) proposed MolGAN to generate molecular graphs using the combination of the GAN framework and reinforcement learning.

However, these recently proposed deep models are limited either to generating small graphs with 40 or fewer nodes (Li et al., 2018) or to generating specific types of graphs such as molecular graphs (You et al., 2018a; De Cao and Kipf, 2018), with no straightforward generalization to other domains because they rely on specialized tools to calculate molecule-specific losses. More broadly, most of these methods cannot generate labeled graphs.

3 Model

In this section, we introduce LGGAN and how it trains deep generative models for graph-structured data with node labels.

3.1 Alternate GAN Frameworks

Since the GAN framework was introduced by Goodfellow et al. (2014), many variations have been proposed that have proved powerful for generation tasks. We therefore adopt three popular variations and compare how well they perform on our task of labeled graph generation. The first is the original GAN approach. The other two incorporate extra information in the form of classification labels for the graphs themselves. The second follows the conditional GAN framework (Mirza and Osindero, 2014), which feeds the graph label as an extra input to the generator in addition to the noise $z$, so that the label can be used to generate graphs of different types. The third uses the auxiliary classifier GAN (AC-GAN) framework (Odena et al., 2017), in which the discriminator not only distinguishes whether the graph is real or fake but also incorporates a classifier of the graph labels. Due to space constraints, details on these tests are in the appendix. Based on the results, we use the AC-GAN framework in our model in the following experiments.
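For orientation, the AC-GAN objective as formulated by Odena et al. (2017) splits into a source term $L_S$ (real versus fake) and a class term $L_C$. The notation below follows that paper rather than this one, and LGGAN replaces the log-likelihood source term with a Wasserstein-based loss, so this is a sketch of the general idea only:

```latex
L_S = \mathbb{E}\left[ \log P(S = \mathrm{real} \mid X_{\mathrm{real}}) \right]
    + \mathbb{E}\left[ \log P(S = \mathrm{fake} \mid X_{\mathrm{fake}}) \right]

L_C = \mathbb{E}\left[ \log P(C = c \mid X_{\mathrm{real}}) \right]
    + \mathbb{E}\left[ \log P(C = c \mid X_{\mathrm{fake}}) \right]
```

The discriminator is trained to maximize $L_S + L_C$ and the generator to maximize $L_C - L_S$, so both networks are pushed to make the class of a sample recognizable.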

3.2 Architecture

In this section, we provide details on the LGGAN architecture. As in the standard GAN framework, LGGAN consists of two main components: a generator $G$ and a discriminator $D$. The generator takes a sample from a prior distribution and generates a labeled graph $g$ represented by an adjacency matrix $A$ and a node label matrix $L$. The discriminator is then trained to distinguish samples drawn from the dataset from samples produced by the generator. In LGGAN, both the generator and the discriminator are trained using CT-GAN (Wei et al., 2018), which builds on the improved Wasserstein GAN approach (Gulrajani et al., 2017).


Generator   LGGAN’s generative model uses a multi-layer perceptron (MLP) to produce the graph. The generator $G$ takes a random vector $z$ sampled from a standard normal distribution and outputs two matrices: (1) the node label matrix $L$, whose rows are one-hot vectors that define the node labels; and (2) the adjacency matrix $A$, which defines the connections among nodes in the graph. The architecture uses a fixed maximum number of nodes $n$, but it can generate structures with fewer nodes by dropping nodes that are not connected to any other node in the generated graph. Since both the adjacency matrix and the label matrix are discrete and the categorical sampling procedure is non-differentiable, we directly use the continuous versions of $A$ and $L$ during training. The original GAN objective measures the distance between distributions with the Jensen-Shannon (JS) divergence, which is poorly suited to generating discrete data. Therefore, we use GAN variants based on the Wasserstein distance, such as CT-GAN (Wei et al., 2018) and WGAN-GP (Gulrajani et al., 2017), which have been shown to be applicable to discrete data such as text.
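A minimal NumPy sketch of such a generator may help fix ideas. Everything specific here is an assumption made for illustration: the single hidden layer, the `tanh` activation, the symmetrization step (which presumes undirected graphs), the 0.5 threshold, and all function and parameter names.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_generator(z_dim, hidden, n, c, rng):
    """Random weights for an MLP with one hidden layer, an adjacency head, and a label head."""
    return {
        "W1": rng.normal(0.0, 0.1, (z_dim, hidden)),
        "b1": np.zeros(hidden),
        "Wa": rng.normal(0.0, 0.1, (hidden, n * n)),
        "Wl": rng.normal(0.0, 0.1, (hidden, n * c)),
    }


def generate(params, z, n, c):
    """Map a noise vector z to continuous adjacency and label matrices."""
    h = np.tanh(z @ params["W1"] + params["b1"])
    logits_a = (h @ params["Wa"]).reshape(n, n)
    logits_a = (logits_a + logits_a.T) / 2.0       # assumption: undirected graphs
    a_soft = 1.0 / (1.0 + np.exp(-logits_a))       # edge probabilities in (0, 1)
    np.fill_diagonal(a_soft, 0.0)                  # no self-loops
    l_soft = softmax((h @ params["Wl"]).reshape(n, c), axis=1)  # each row sums to 1
    return a_soft, l_soft


def discretize(a_soft, l_soft, threshold=0.5):
    """Threshold edges, pick the most likely label, and drop isolated nodes."""
    a = (a_soft > threshold).astype(int)
    keep = a.sum(axis=1) > 0                       # nodes with at least one edge
    return a[np.ix_(keep, keep)], l_soft[keep].argmax(axis=1)
```

During training the discriminator would see `a_soft` and `l_soft` directly, and `discretize` would only be applied when sampling final graphs, which is how a fixed maximum size $n$ can still yield graphs with fewer nodes.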

Figure 1: The structure of the LGGAN discriminator with residual connections.

Discriminator   The discriminator $D$ takes a graph sample as input, represented by an adjacency matrix $A$ and a node label matrix $L$, and it outputs a scalar value and a one-hot vector for the class label. For the discriminator, we use a graph convolutional network (GCN) (Kipf and Welling, 2017), which is a powerful neural network architecture for representation learning with complex graph structures. GCNs propagate information along graph edges with graph convolution layers. GCNs are also permutation-invariant, which is important when analyzing graphs because they are usually considered unordered objects.

We add residual connections between hidden layers of the GCN to allow the model to fuse information from previous layers. We find that these residual connections circumvent the issues reported by Kipf and Welling (2017) that limit their GCNs to only two or three layers. Allowing more depth is important because some graph types, such as proteins and molecules, have complex structures that require incorporating information from nodes farther away in graph distance than can be reached with only three graph convolutions.

With residual connections, each layer of our GCN discriminator propagates with the following rule:

$$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right) + H^{(l)},$$

where $H^{(l)}$ is the output matrix at the $l$th layer, $I$ is the identity matrix, $\tilde{A} = A + I$ is the adjacency matrix of the graph with self-connections added, $\tilde{D}$ is the diagonal degree matrix of $\tilde{A}$ for graph $g$, $W^{(l)}$ is the trainable weight matrix at the $l$th layer, and $\sigma$ denotes an activation function (such as the sigmoid or ReLU (Nair and Hinton, 2010)). Since we do not include node attributes, we set $H^{(0)} = I$, where $I$ is the identity matrix.
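A residual graph-convolution layer of this kind can be sketched in a few lines of NumPy. The sketch assumes the symmetric normalization of Kipf and Welling (2017), a ReLU activation, and a skip connection added after the activation whenever input and output shapes match; the exact placement of the skip connection in LGGAN is an assumption here, as are the function names.

```python
import numpy as np


def normalize_adjacency(a):
    """Add self-connections and apply symmetric degree normalization."""
    a_tilde = a + np.eye(a.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def residual_gcn_layer(a_hat, h, w):
    """One graph convolution with ReLU, plus a skip connection when shapes allow it."""
    out = np.maximum(a_hat @ h @ w, 0.0)
    if out.shape == h.shape:
        out = out + h  # residual connection (assumed to be additive)
    return out


def gcn_forward(a, weights):
    """Run all layers starting from identity features; return every layer's output."""
    a_hat = normalize_adjacency(a)
    h = np.eye(a.shape[0])  # no node attributes, so the input features are the identity
    outputs = []
    for w in weights:
        h = residual_gcn_layer(a_hat, h, w)
        outputs.append(h)
    return outputs
```

Because the identity input has one column per node, the first weight matrix has as many rows as there are nodes; the skip connection is simply skipped on any layer that changes the feature width.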

After $K$ layers of propagation via graph convolutions, we aggregate the outputs from each layer with an aggregation function $\mathrm{agg}$, such as concatenation or max-pooling. We then concatenate the aggregated matrix with the node label matrix $L$ and output $H_g$ as the final representation we learned for graph $g$:

$$H_g = \mathrm{concat}\left(\mathrm{agg}\left(H^{(1)}, \ldots, H^{(K)}\right), L\right).$$

The representation of the graph is further processed by a linear layer to produce the outputs of the discriminator: the graph-level scalar probability of the input being real data and a classifier output that predicts the category of the graph as a one-hot vector. We illustrate the structure of this discriminative model in Figure 1.
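The aggregation and readout steps can be sketched as follows. This is one possible reading under stated assumptions: max-pooling is taken element-wise across layer outputs of equal shape, node representations are summed to obtain a graph-level vector (a choice made here so the result does not depend on node order), and `w_real` and `w_class` are hypothetical weights of the two linear output heads.

```python
import numpy as np


def readout(layer_outputs, labels_onehot, w_real, w_class):
    """Aggregate per-layer GCN outputs and produce the two discriminator outputs."""
    # Element-wise max over layers; every output is assumed to have shape (n, d).
    pooled = np.max(np.stack(layer_outputs), axis=0)
    # Append the node label matrix along the feature axis: shape (n, d + c).
    node_repr = np.concatenate([pooled, labels_onehot], axis=1)
    # Summing over nodes gives a graph vector that is invariant to node order.
    graph_repr = node_repr.sum(axis=0)
    real_score = float(graph_repr @ w_real)  # scalar real-versus-fake score
    class_logits = graph_repr @ w_class      # one logit per graph category
    return real_score, class_logits
```

Max-pooling keeps the feature width at `d` regardless of depth, whereas concatenating the layers would grow it to `K * d` and enlarge the output heads accordingly.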

In Section 4.4, we evaluate the influence of the depth of GCN with and without residual connections and also with different aggregate functions to guide how to choose from different settings of LGGAN for the experiments.

3.3 Training

GANs (Goodfellow et al., 2014) train via a min-max game in which two players compete to improve themselves. In theory, the method converges when it reaches a Nash equilibrium, where the samples produced by the generator match the data distribution. However, this process is highly unstable and often results in problems such as mode collapse (Goodfellow, 2016). To deal with the most common problems in training GANs, such as mode collapse and unstable training, we use the CT-GAN framework (Wei et al., 2018), which is one of the state-of-the-art approaches. CT-GAN adds a consistency term to the improved Wasserstein GAN (WGAN-GP) (Gulrajani et al., 2017) to better enforce Lipschitz continuity during training. We also adopt several techniques, such as feature matching and minibatch discrimination, that were shown to encourage convergence and help avoid mode collapse (Salimans et al., 2016). Details are shown in the appendix.
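As background, the WGAN-GP critic loss of Gulrajani et al. (2017), on which CT-GAN builds, can be written as below. The notation follows that paper: $P_r$ and $P_g$ are the real and generated distributions, $\hat{x}$ is sampled along straight lines between real and generated samples, and $\lambda$ is the penalty weight. The additional consistency term of CT-GAN is not reproduced here.

```latex
L = \mathbb{E}_{\tilde{x} \sim P_g}\left[ D(\tilde{x}) \right]
  - \mathbb{E}_{x \sim P_r}\left[ D(x) \right]
  + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}\left[
      \left( \left\lVert \nabla_{\hat{x}} D(\hat{x}) \right\rVert_2 - 1 \right)^2
    \right]
```

The first two terms estimate the Wasserstein distance between the two distributions, and the gradient penalty softly enforces the 1-Lipschitz constraint on the critic.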

3.4 Node Ordering

A common representation for graph structure uses adjacency matrices. However, using matrices to train a generative model introduces the issue of how to define the node ordering in the adjacency matrix. There are $n!$ permutations of $n$ nodes, and it is time-consuming to train over all of them.

For LGGAN, we use the framework of GCN with residual connections and a node aggregation operator (De Cao and Kipf, 2018) as the discriminator. This discriminator is invariant to node ordering, avoiding the issue. However, for the generator, we use an MLP, which does depend on node ordering. Therefore, we adapt the approach by You et al. (2018b) where we arrange the nodes in a breadth-first-search (BFS) ordering for each training graph.

In particular, we preprocess the adjacency matrix $A$ and node label matrix $L$ by feeding them into a BFS function. This function takes a random permutation of the nodes in graph $g$ as input, picks a node $v$ as the starting node, and then outputs another permutation that is a BFS ordering of the nodes in $g$ starting from $v$. By specifying a structure-determined node ordering for the graph, we only need to consider all possible BFS orderings, rather than all possible node permutations. This reduction makes a significant difference in computational complexity when graphs are large.
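The preprocessing step can be sketched with the standard library alone. The representation (dense adjacency as nested lists, labels as a list) and the function names are illustrative assumptions; the handling of disconnected graphs, where BFS restarts from the next unvisited node, is also a choice made for this sketch.

```python
from collections import deque
import random


def bfs_order(adj, start):
    """Return node indices in BFS order from `start`, restarting on unvisited nodes."""
    n = len(adj)
    seen = [False] * n
    order = []
    for root in [start] + [v for v in range(n) if v != start]:
        if seen[root]:
            continue
        seen[root] = True
        queue = deque([root])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in range(n):
                if adj[u][v] and not seen[v]:
                    seen[v] = True
                    queue.append(v)
    return order


def reorder(adj, labels, order):
    """Permute the rows and columns of adj, and the labels, according to `order`."""
    new_adj = [[adj[u][v] for v in order] for u in order]
    new_labels = [labels[u] for u in order]
    return new_adj, new_labels


def bfs_reorder(adj, labels, rng):
    """Apply a random permutation, then a BFS ordering starting from its first node."""
    perm = list(range(len(adj)))
    rng.shuffle(perm)
    adj_p, labels_p = reorder(adj, labels, perm)
    return reorder(adj_p, labels_p, bfs_order(adj_p, 0))
```

Calling `bfs_reorder` repeatedly with different random states yields different BFS orderings of the same graph, which is the reduced set of orderings the generator has to cope with.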

4 Experiments

In this section, we compare LGGAN with other graph generation methods to demonstrate its ability to generate high-quality labeled graphs in diverse settings. We further evaluate the quality of the generated graphs by applying them to a downstream task, graph classification.

4.1 Baselines

We compare our model against various traditional generative models for graphs, as well as some recently proposed deep graph generative models. For traditional baselines, we compare against the Erdös-Rényi model (E-R) (Erdös and Rényi, 1959), the Barabási-Albert (B-A) model (Albert and Barabási, 2002), and mixed-membership stochastic block models (MMSB) (Airoldi et al., 2008). We also compare with recently proposed deep graph generative models, namely DeepGMG (Li et al., 2018) and GraphRNN (You et al., 2018b). Few current approaches are designed to generate labeled graphs. One exception is MolGAN (De Cao and Kipf, 2018), which is designed to generate only molecular graphs and needs specialized evaluation methods specific to that task, so we do not compare against it.

4.2 Datasets

We perform experiments on different kinds of datasets with varying sizes and characteristics. Details about the statistics of these datasets are given in the appendix.

Citation graphs   We test on scientific citation networks, using the Cora and Citeseer datasets (Sen et al., 2008). The Cora dataset is a collection of 2,708 machine learning publications categorized into seven classes, and the Citeseer dataset is a collection of 3,312 research publications crawled from the CiteSeer repository. To test the scalability of LGGAN, we extracted subsets with different graph sizes by constraining the number of nodes in each graph. For the small datasets (denoted Cora_small and Citeseer_small), we extract two-hop and three-hop ego networks under a smaller bound on the number of nodes. For the large datasets (denoted Cora and Citeseer), we extract three-hop ego networks under a larger bound on the number of nodes. For the graph label, we use the node label of the center node in the ego network.
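Extracting ego networks of a bounded size can be sketched as below. The adjacency-list representation and the parameters `min_nodes` and `max_nodes` are hypothetical placeholders for the size constraint; no particular bound values are implied.

```python
from collections import deque


def ego_network(neighbors, center, hops):
    """Return the nodes within `hops` steps of `center` and the edges among them."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        if dist[u] == hops:
            continue  # do not expand beyond the hop limit
        for v in neighbors[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    nodes = set(dist)
    edges = {(u, v) for u in nodes for v in neighbors[u] if v in nodes and u < v}
    return nodes, edges


def extract_dataset(neighbors, node_label, hops, min_nodes, max_nodes):
    """Build one ego network per center node, keeping those within the size bounds."""
    graphs = []
    for center in neighbors:
        nodes, edges = ego_network(neighbors, center, hops)
        if min_nodes <= len(nodes) <= max_nodes:
            # The graph label is the node label of the center node.
            graphs.append({"nodes": nodes, "edges": edges, "graph_label": node_label[center]})
    return graphs
```

Every node of the citation network is a candidate center, so one large network yields a whole collection of smaller labeled graphs.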

Protein graphs   We also test on multiple collections of protein molecular graphs. The ENZYMES dataset consists of 600 enzymes (Schomburg et al., 2004). Each enzyme in the dataset is labeled with one of the six enzyme commission (EC) code top-level classes. The PROTEINS dataset includes proteins from the dataset of enzymes and non-enzymes created by Dobson and Doig (2003). There are two graph labels: enzymes and non-enzymes.

Figure 2: Comparison of the results with different GCN layers with or without residual connections.

4.3 Metrics for the Quality of Generated Labeled Graphs

To evaluate the quality of the generated graphs, we follow the approach used by You et al. (2018b): we compare a distribution of generated graphs with that of real ones by measuring the maximum mean discrepancy (MMD) (Gretton et al., 2012) of graph statistics, capturing how close their distributions are. We use four graph statistics to evaluate the generated graphs: degree distribution, clustering coefficient distribution, node-label distribution, and average orbit count statistics.
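The MMD comparison can be illustrated for the degree statistic with a short, self-contained sketch. It uses the biased estimator of squared MMD and a plain Gaussian kernel on histogram vectors; the kernel choice, `sigma`, and the helper names are assumptions made for illustration, and the kernel used in the cited evaluation protocol may differ.

```python
import math


def degree_histogram(edges, n, max_degree):
    """Normalized degree histogram of an undirected graph with nodes 0..n-1."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    hist = [0.0] * (max_degree + 1)
    for d in deg:
        hist[min(d, max_degree)] += 1.0 / n  # degrees above max_degree share the last bin
    return hist


def gaussian_kernel(p, q, sigma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_squared(xs, ys, kernel=gaussian_kernel):
    """Biased estimate of squared MMD between two samples of descriptors."""
    def mean_kernel(us, vs):
        return sum(kernel(u, v) for u in us for v in vs) / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)
```

Here `xs` would hold one histogram per training graph and `ys` one per generated graph; the same function applies to clustering-coefficient or label histograms, and a value near zero indicates closely matching distributions.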

Since we are generating labeled graphs, we also want to evaluate the graph distribution in each class. To do this, we extract subgraphs for each class from both the training graphs and the generated graphs and evaluate them on the three structural metrics (excluding the label distribution). These per-label tests help check whether the model simply assigns the class based on the label distribution without considering the underlying graph structure. Due to space constraints, the details are shown in the appendix.

4.4 Evaluating the Design of the Residual GCN Discriminator

To evaluate the influence of residual connections and the depth of the GCN discriminator, we report the results for graph generation on the ENZYMES dataset based on the four evaluation metrics mentioned in Section 4.3. Through these tests, we aim to investigate the following design aspects: (1) the GCN depth, (2) residual connections, and (3) different aggregate functions (i.e., max-pooling and concatenation). We plot the results in Figure 2.

According to the plots, performance does not improve noticeably with more than two or three GC layers unless we include residual connections. With residual connections, however, either aggregate function allows us to train deeper GCNs with more than five or six layers, achieving high-quality results that outperform the baseline models we compare to in Section 4.5. There is no notable difference between the aggregate functions. Since max-pooling does not introduce any additional parameters to learn, we use max-pooling as the aggregation function for residual connections in the remaining experiments.

To further evaluate the performance of the GCN-based discriminative model, we also compare LGGAN with a simpler version that uses an MLP as the discriminator. Due to space constraints, the details are in the appendix.

4.5 Comparing to Other Models

We compare LGGAN to other methods for generating graphs: traditional generative models such as E-R, B-A, and MMSB, as well as recently proposed deep generative models such as GraphRNN and DeepGMG. DeepGMG cannot be used to generate large graphs due to its high computational complexity, so its results on large graph datasets are not available. For each method, we measure three aspects. The first is the quality of the generated graphs, which should mimic the typical topology of the training graphs. The second is generality: a good generative model should generalize to different and complex graph-structured data. The third is scalability: the model should be able to scale up to generate large networks instead of being restricted to relatively small graphs.

Table 1 lists results from our comparison. LGGAN achieves the best overall performance across datasets, with lower MMD on average than the traditional baselines and than the state-of-the-art deep learning baseline GraphRNN. Although GraphRNN performs well on the two smaller protein-related datasets, ENZYMES and PROTEINS, it does not maintain the same performance on large datasets, such as Cora and Citeseer.

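| Model | Cora_small Degree | Cora_small Clustering | Cora_small Orbit | Cora_small Label | Citeseer_small Degree | Citeseer_small Clustering | Citeseer_small Orbit | Citeseer_small Label | Cora Degree | Cora Clustering | Cora Orbit | Cora Label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| E-R | 0.68 | 0.94 | 0.48 | N/A | 0.63 | 0.86 | 0.12 | N/A | 0.88 | 1.45 | 0.27 | N/A |
| B-A | 0.31 | 0.53 | 0.11 | N/A | 0.37 | 0.18 | 0.11 | N/A | 0.54 | 1.06 | 0.16 | N/A |
| MMSB | 0.21 | 0.68 | 0.07 | 0.48 | 0.17 | 0.50 | 0.11 | 0.32 | 0.12 | 0.68 | 0.09 | 0.49 |
| DeepGMG | 0.34 | 0.44 | 0.27 | N/A | 0.27 | 0.36 | 0.20 | N/A | - | - | - | - |
| GraphRNN | 0.26 | 0.38 | 0.39 | N/A | 0.19 | 0.20 | 0.39 | N/A | 0.20 | 0.46 | 0.11 | N/A |
| LGGAN | 0.13 | 0.08 | 0.03 | 0.11 | 0.17 | 0.13 | 0.04 | 0.09 | 0.15 | 0.21 | 0.06 | 0.01 |

| Model | Degree | Clustering | Orbit | Label | Degree | Clustering | Orbit | Label | Degree | Clustering | Orbit | Label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| E-R | 0.31 | 1.06 | 0.28 | N/A | 0.38 | 1.26 | 0.08 | N/A | 0.82 | 1.57 | 0.06 | N/A |
| B-A | 0.93 | 0.88 | 0.05 | N/A | 1.17 | 1.08 | 0.51 | N/A | 0.32 | 1.04 | 0.08 | N/A |
| MMSB | 0.46 | 1.05 | 0.21 | 0.01 | 0.55 | 1.08 | 0.05 | 0.92 | 0.08 | 0.50 | 0.11 | 0.32 |
| DeepGMG | 0.96 | 0.63 | 0.16 | N/A | 0.43 | 0.38 | 0.08 | N/A | - | - | - | - |
| GraphRNN | 0.04 | 0.18 | 0.06 | N/A | 0.06 | 0.20 | 0.07 | N/A | 0.20 | 1.15 | 0.14 | N/A |
| LGGAN | 0.18 | 0.15 | 0.02 | 0.01 | 0.09 | 0.17 | 0.03 | 0.01 | 0.25 | 0.12 | 0.06 | 0.15 |

Table 1: Comparison of LGGAN and other generative models on different graph-structured data using MMD evaluation metrics.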

4.6 Downstream Task: Graph Classification

To further evaluate the quality of LGGAN’s generated graphs, we apply the generated examples to a downstream task: we use the synthetic graphs to train a model for graph classification. We first compute a kernel matrix $K$ for each of a set of graph kernels, where $K_{ij}$ represents an inner product between representations of graphs $g_i$ and $g_j$. Then we train kernel support vector machines (SVMs) to classify the graphs. In our experiment, we choose three popular graph kernels: the graphlet kernel (GK) based on subgraph patterns, the shortest-path kernel (SP) based on shortest paths, and the Weisfeiler-Lehman subtree kernel (WL) based on subtrees.
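To make the kernel-matrix step concrete, here is a simplified, uncompressed Weisfeiler-Lehman subtree kernel written with the standard library only. It is a sketch of the general technique rather than the implementation used in the experiments; the graph representation (adjacency lists plus a label dictionary) and the function names are assumptions.

```python
from collections import Counter


def wl_features(neighbors, labels, iterations):
    """Count the node labels produced by rounds of Weisfeiler-Lehman relabeling.

    neighbors maps each node to a list of its neighbors, and labels maps each
    node to an initial label (all labels must be mutually comparable).
    """
    current = {v: labels[v] for v in neighbors}
    features = Counter((0, lab) for lab in current.values())
    for it in range(1, iterations + 1):
        # New label = own label plus the sorted multiset of neighbor labels.
        current = {
            v: (current[v], tuple(sorted(current[u] for u in neighbors[v])))
            for v in neighbors
        }
        features.update((it, lab) for lab in current.values())
    return features


def wl_kernel(f1, f2):
    """Inner product of two label-count feature maps."""
    return sum(count * f2[key] for key, count in f1.items())


def kernel_matrix(graphs, iterations=2):
    """Gram matrix over a list of (neighbors, labels) pairs."""
    feats = [wl_features(nb, lab, iterations) for nb, lab in graphs]
    return [[wl_kernel(fi, fj) for fj in feats] for fi in feats]
```

The resulting Gram matrix is symmetric and can be handed to any kernel SVM implementation that accepts a precomputed kernel; isomorphic graphs with matching labels produce identical rows.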

We compare performance when training on synthetic graphs from LGGAN, MMSB (the only baseline model that can generate labeled graphs), and real graphs. We run this procedure with three datasets: Cora_small, ENZYMES and PROTEINS. For each dataset, we run ten trials and calculate the average value of the accuracy.

We list results in Table 2. The accuracy of models trained with graphs generated by LGGAN is close to that of models trained using the real graphs, especially when compared to the models trained with graphs generated by MMSB. These results suggest that the graphs generated by LGGAN better capture the important aspects of graph structure.

gen_MMSB 22.62 21.34 23.15 15.38 14.29 17.86 64.32 65.61 64.29
gen_LGGAN 23.44 26.56 26.92 23.08 26.92 30.19 65.61 66.54 69.23
real_graphs 26.56 29.69 34.32 23.08 29.68 35.71 69.05 73.81 76.19
Table 2: Comparison of graph classification accuracy with different kernels: graphlet kernel (GK), shortest-path kernel (SP) and Weisfeiler-Lehman subtree kernel (WL) on citation and protein datasets with the other labeled graph generation model MMSB.

4.7 Scalability

To evaluate the scalability of these methods, we perform experiments on two subsets of the Cora dataset with different graph sizes: Cora_small and Cora. As listed in Table 1, the traditional models all show a large gap between these two datasets on three evaluation metrics. Among the deep generative models, DeepGMG cannot generate large graphs because of the computational complexity of its generation procedure, which adds nodes one at a time. Compared to GraphRNN, LGGAN’s MMD scores barely increase relative to the smaller dataset, suggesting that our model is more reliable and has the best ability to scale up to large graphs.

4.8 Generality

To evaluate the ability of LGGAN to adapt to different graph-structured data, we evaluate the results of all methods on the different domains of citation ego networks (Cora) and molecular protein graphs (ENZYMES). From Table 1, LGGAN achieves more consistent results across datasets than the other models. Some existing models suffer from limited generalization, such as MolGAN, which can only generate specific or limited types of graph-structured data.

Some examples are visualized in Figure 3, which contains graphs generated by our model and the baselines. Although it is not as intuitive for humans to assess as, e.g., natural images, one can still see that LGGAN appears to capture the typical structures of datasets better than other models.

Figure 3: Visualization of training graphs (first row); graphs generated by traditional models (second row): E-R model, B-A model, MMSB model; graphs generated by deep models (third row): DeepGMG, GraphRNN, LGGAN for different datasets.

4.9 Diversity

A good labeled graph generative model should generate diverse examples. Two types of diversity are important: (1) diversity among generated examples would capture the natural variations in real graphs; and (2) diversity compared to training examples ensures that the generative model is doing more than exactly memorizing some training examples and outputting copies of them. Generative models should balance the need for generated outputs to be new graphs unseen during training while retaining important properties of the real data.

Therefore, to investigate to what extent our model maintains these types of diversity, we calculate the Weisfeiler-Lehman kernel value for both the training graphs and the generated graphs (by MMSB and LGGAN) and compute the kernel distance between any two graphs $g$ and $g'$ as

$$d(g, g') = \sqrt{k(g, g) - 2\,k(g, g') + k(g', g')},$$

where $k$ denotes the Weisfeiler-Lehman kernel.
We plot histograms of the minimum distances between each generated example and the training set in Figure 4 for four datasets, Cora_small, Citeseer_small, PROTEINS and ENZYMES. In each plot, the left column shows the minimum distance of each training graph to any other graph; the middle column shows the minimum distance for each graph generated by LGGAN to the training graphs; and the right column shows the same for MMSB. These plots suggest that the graphs generated by our model are more similar to training graphs than the examples generated by MMSB, yet they are not exact copies of training graphs and have a similar diversity of graph distances as the real data.
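The nearest-neighbor distances behind such histograms can be sketched as follows, assuming the standard kernel-induced distance (the norm of the difference between two graphs in the kernel's feature space). The kernel `k` is passed in as a function; the `dot` kernel on plain feature vectors is only a stand-in for a graph kernel, and all names are illustrative.

```python
import math


def kernel_distance(k, g1, g2):
    """Distance induced by kernel k: sqrt(k(g1, g1) - 2 k(g1, g2) + k(g2, g2))."""
    squared = k(g1, g1) - 2.0 * k(g1, g2) + k(g2, g2)
    return math.sqrt(max(squared, 0.0))  # clamp tiny negatives caused by rounding


def min_distances(k, generated, training):
    """For each generated graph, the distance to its closest training graph."""
    return [min(kernel_distance(k, g, t) for t in training) for g in generated]


def min_distances_within(k, graphs):
    """For each graph, the distance to its closest *other* graph in the same set."""
    return [
        min(kernel_distance(k, g, other) for j, other in enumerate(graphs) if j != i)
        for i, g in enumerate(graphs)
    ]


def dot(u, v):
    """Stand-in kernel on feature vectors, used only to exercise the functions."""
    return sum(a * b for a, b in zip(u, v))
```

A minimum distance of exactly zero for a generated graph would indicate a copy of a training graph (up to what the kernel can distinguish), so a histogram concentrated at small but nonzero values matches the behavior described above.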

Figure 4: Histogram of the distances between training graphs and graphs generated by LGGAN and MMSB.

5 Conclusion

In this work, we proposed a deep generative model using a GAN framework that generates labeled graphs. These labeled graphs can mimic distributions of citation graphs, knowledge graphs, social networks, and more. We also introduced an evaluation method for labeled graphs to measure how well the model learns the sub-structure of the labeled graphs. Our model can be useful for simulation studies, especially when access to labeled graph data is limited by availability or privacy concerns. We can use these models to generate synthetic datasets or augment existing datasets for graph-based analyses such as communication segmentation, node classification, anomaly detection, and link prediction. The experiments show that LGGAN outperforms other state-of-the-art models for generating graphs while also being capable of the previously unaddressed task of generating labels for nodes.

6 Acknowledgements

We thank NVIDIA for donating hardware through their GPU Grant Program and Amazon for donating cloud computing resources through their AWS Cloud Credits for Research Program. We appreciate their support of our research.


  • Airoldi et al. [2008] Edoardo M. Airoldi, David M. Blei, Stephen E. Fienberg, and Eric P. Xing. Mixed Membership Stochastic Blockmodels. Journal of Machine Learning Research, 9(Sep):1981–2014, 2008.
  • Albert and Barabási [2002] Réka Albert and Albert-László Barabási. Statistical Mechanics of Complex Networks. Reviews of Modern Physics, 74(1):47, 2002.
  • Arjovsky et al. [2017] Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein GAN. arXiv preprint arXiv:1701.07875, 2017.
  • Bojchevski et al. [2018] Aleksandar Bojchevski, Oleksandr Shchur, Daniel Zügner, and Stephan Günnemann. NetGAN: Generating graphs via random walks. In International Conference on Learning Representations (ICLR), 2018.
  • De Cao and Kipf [2018] Nicola De Cao and Thomas Kipf. MolGAN: An implicit generative model for small molecular graphs. arXiv preprint arXiv:1805.11973, 2018.
  • Dobson and Doig [2003] Paul D Dobson and Andrew J Doig. Distinguishing Enzyme Structures from Non-enzymes Without Alignments. Journal of molecular biology, 330(4):771–783, 2003.
  • Erdös and Rényi [1959] P. Erdös and A. Rényi. On Random Graphs I. Publicationes Mathematicae Debrecen, 6:290–297, 1959.
  • Fan and Huang [2017] Shuangfei Fan and Bert Huang. Recurrent Collective Classification. Knowledge and Information Systems, pages 1–15, 2017.
  • Goldenberg et al. [2010] Anna Goldenberg, Alice X. Zheng, Stephen E. Fienberg, and Edoardo M. Airoldi. A survey of statistical network models. Foundations and Trends® in Machine Learning, 2(2):129–233, 2010.
  • Goodfellow [2016] Ian Goodfellow. NIPS 2016 tutorial: Generative adversarial networks. arXiv preprint arXiv:1701.00160, 2016.
  • Goodfellow et al. [2014] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
  • Gretton et al. [2012] Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Schölkopf, and Alexander Smola. A kernel two-sample test. Journal of Machine Learning Research, 13(Mar):723–773, 2012.
  • Gulrajani et al. [2017] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C Courville. Improved training of Wasserstein GANs. In Advances in Neural Information Processing Systems, pages 5767–5777, 2017.
  • Hamilton et al. [2017] Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pages 1024–1034, 2017.
  • Kersting et al. [2016] Kristian Kersting, Nils M. Kriege, Christopher Morris, Petra Mutzel, and Marion Neumann. Benchmark data sets for graph kernels, 2016. URL
  • Kipf and Welling [2017] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR), 2017.
  • Li et al. [2018] Yujia Li, Oriol Vinyals, Chris Dyer, Razvan Pascanu, and Peter Battaglia. Learning deep generative models of graphs. arXiv preprint arXiv:1803.03324, 2018.
  • Mirza and Osindero [2014] Mehdi Mirza and Simon Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.
  • Nair and Hinton [2010] Vinod Nair and Geoffrey E. Hinton. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning, pages 807–814, 2010.
  • Odena et al. [2017] Augustus Odena, Christopher Olah, and Jonathon Shlens. Conditional image synthesis with auxiliary classifier GANs. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 2642–2651. JMLR.org, 2017.
  • Salimans et al. [2016] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training GANs. In Advances in Neural Information Processing Systems, pages 2234–2242, 2016.
  • Schomburg et al. [2004] Ida Schomburg, Antje Chang, Christian Ebeling, Marion Gremse, Christian Heldt, Gregor Huhn, and Dietmar Schomburg. BRENDA, the enzyme database: updates and major new developments. Nucleic Acids Research, 32(suppl_1):D431–D433, 2004.
  • Sen et al. [2008] Prithviraj Sen, Galileo Namata, Mustafa Bilgic, Lise Getoor, Brian Gallagher, and Tina Eliassi-Rad. Collective classification in network data. AI Magazine, 29(3):93, 2008.
  • Wei et al. [2018] Xiang Wei, Boqing Gong, Zixia Liu, Wei Lu, and Liqiang Wang. Improving the improved training of Wasserstein GANs: A consistency term and its dual effect. In International Conference on Learning Representations (ICLR), 2018.
  • You et al. [2018a] Jiaxuan You, Bowen Liu, Rex Ying, Vijay Pande, and Jure Leskovec. Graph convolutional policy network for goal-directed molecular graph generation. In Advances in Neural Information Processing Systems, 2018a.
  • You et al. [2018b] Jiaxuan You, Rex Ying, Xiang Ren, William Hamilton, and Jure Leskovec. GraphRNN: Generating realistic graphs with deep auto-regressive models. In International Conference on Machine Learning, pages 5694–5703, 2018b.
  • Yu et al. [2017] Lantao Yu, Weinan Zhang, Jun Wang, and Yong Yu. SeqGAN: Sequence generative adversarial nets with policy gradient. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.
  • Zhang et al. [2017] Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, and Dimitris N Metaxas. StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 5907–5915, 2017.
  • Zheleva et al. [2012] Elena Zheleva, Evimaria Terzi, and Lise Getoor. Privacy in social networks. Synthesis Lectures on Data Mining and Knowledge Discovery, 3(1):1–85, 2012.

Appendix A Different variants of LGGAN

A.1 Alternate GAN frameworks

Since the GAN framework was introduced by Goodfellow et al. [2014], many variations have been proposed that have proved powerful for generation tasks. We therefore adopt three popular variations and compare how well they perform on our task of labeled graph generation. The first is the original GAN. The other two incorporate extra information in the form of classification labels for the graphs themselves. The conditional GAN [Mirza and Osindero, 2014] framework feeds the graph label as an extra input to the generator in addition to the noise $z$, so that the label can be used to generate graphs of different types. Our last variation uses the auxiliary classifier GAN (AC-GAN) [Odena et al., 2017] framework, in which the discriminator not only distinguishes whether a graph is real or fake but also incorporates a classifier of the graph labels. The structure of all three variations is illustrated in Figure 5.

Generative adversarial networks

Generative adversarial networks (GANs) [Goodfellow et al., 2014] train implicit generative models by competitively training two neural networks. The first is the generative model $G$, which learns to map from a latent space to the distribution of the target data. The second is the discriminative model $D$, which tries to separate the real data from the candidates produced by the generator. The two networks compete during training through different objective functions, and each adapts to improve itself based on feedback from the other. The generator and the discriminator can be seen as two players in a minimax game in which the generator tries to produce samples realistic enough to fool the discriminator, and the discriminator tries to correctly distinguish real samples from generated ones. The standard objective for GAN training is

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))].$$
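As a rough illustration of the minimax value function, the following sketch estimates it from discriminator outputs on a batch. The function name `gan_value` and the probability values are hypothetical and chosen only for this example; they do not come from the experiments.

```python
import math

def gan_value(d_real, d_fake):
    """Monte Carlo estimate of V(D, G) from discriminator outputs in (0, 1).

    d_real holds D(x) for real samples, d_fake holds D(G(z)) for generated ones.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# Hypothetical outputs: a discriminator that separates the two sets well...
confident = gan_value([0.9, 0.8], [0.1, 0.2])
# ...and one that is fully fooled and outputs 0.5 everywhere.
fooled = gan_value([0.5, 0.5], [0.5, 0.5])
print(confident, fooled)
```

The discriminator tries to push this value up, while the generator tries to push it down, which is why the fooled case yields the lower value.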


Conditional GAN

After the original GAN was proposed, Mirza and Osindero [2014] introduced a conditional variant, constructed by modifying the GAN framework to feed the class label $y$ to both the generator and the discriminator. The model can then generate data conditioned on class labels, which gives more control over the generated data. The objective function of the conditional GAN is similar to the original version, only adding the condition $y$ to both the generator and the discriminator:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x \mid y)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z \mid y)))].$$
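One common way to feed a class label to a network is to append a one-hot encoding of the label to its input. The sketch below shows this for the generator input; the concatenation scheme, the helper names, and the dimensions are assumptions made for illustration and are not necessarily how LGGAN encodes the label.

```python
import random

def one_hot(label, num_classes):
    """Return a one-hot list encoding of an integer class label."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def conditional_generator_input(label, num_classes, noise_dim, rng):
    """Concatenate Gaussian noise z with the one-hot class label y."""
    z = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return z + one_hot(label, num_classes)

rng = random.Random(0)
# Hypothetical setting: 6 graph classes and a 4-dimensional noise vector.
g_in = conditional_generator_input(label=2, num_classes=6, noise_dim=4, rng=rng)
print(len(g_in), g_in[4:])
```

The discriminator input can be conditioned in the same way, by appending the same one-hot vector to the representation of the sample it scores.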



Beyond the conditional GAN, Odena et al. [2017] proposed another variant called AC-GAN that extends the idea of the conditional GAN. In the AC-GAN framework, each generated sample has a predefined (usually randomly chosen) class label $c$ in addition to the noise $z$ as input to the generator $G$. The generator uses both to generate samples $X_{\text{fake}} = G(c, z)$. The discriminator outputs probability distributions over both the data source and the class labels, $P(S \mid X), P(C \mid X) = D(X)$. The objective function of AC-GAN has two parts: the log-likelihood of the correct source, $L_S$, and the log-likelihood of the correct class, $L_C$:

$$L_S = \mathbb{E}[\log P(S = \text{real} \mid X_{\text{real}})] + \mathbb{E}[\log P(S = \text{fake} \mid X_{\text{fake}})],$$

$$L_C = \mathbb{E}[\log P(C = c \mid X_{\text{real}})] + \mathbb{E}[\log P(C = c \mid X_{\text{fake}})].$$

The discriminator is trained to maximize $L_S + L_C$, while the generator is trained to maximize $L_C - L_S$. From this structure, we can see that the procedure of learning a mapping from $z$ to the representation in AC-GAN is independent of the class label. There are two main differences between AC-GAN and the conditional GAN. First, the conditional GAN feeds the class label to both the generator and the discriminator, whereas AC-GAN feeds it only to the generator. Second, in AC-GAN the discriminator not only classifies whether a sample comes from the real data but also includes a classifier that outputs a probability distribution over the class labels. These modifications to the standard GAN formulation can produce excellent results and also help to stabilize the training procedure.
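The two AC-GAN log-likelihood terms and the resulting objectives can be sketched as follows, assuming the discriminator's output probabilities for the correct source and the correct class are already available. The function names and the probability values are hypothetical.

```python
import math

def mean_log(probs):
    """Average log-probability over a batch."""
    return sum(math.log(p) for p in probs) / len(probs)

def acgan_objectives(p_real_on_real, p_fake_on_fake, p_class_on_real, p_class_on_fake):
    """Return L_S, L_C, and the quantities each player maximizes.

    Each argument is a list of probabilities that the discriminator assigns
    to the correct answer (source or class) for each sample in the batch.
    """
    l_s = mean_log(p_real_on_real) + mean_log(p_fake_on_fake)
    l_c = mean_log(p_class_on_real) + mean_log(p_class_on_fake)
    return {"L_S": l_s, "L_C": l_c,
            "discriminator": l_s + l_c,  # D maximizes L_S + L_C
            "generator": l_c - l_s}      # G maximizes L_C - L_S

# Hypothetical discriminator outputs for a batch of two real and two fake graphs.
obj = acgan_objectives([0.9, 0.8], [0.7, 0.6], [0.8, 0.9], [0.5, 0.6])
print(obj)
```

Both players benefit from a higher class likelihood, while they pull in opposite directions on the source likelihood.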

Figure 5: The adversarial training framework of LGGAN using different GAN structures. Conditional GANs and auxiliary classifier GANs incorporate increasing amounts of secondary information to aid the training process.

A.2 Different models for the discriminator

For the discriminator, we compare two different models: a simple multi-layer perceptron (MLP) and the model proposed in our paper, a GCN with residual connections. The latter is composed of a series of graph convolution layers and a layer aggregation operator that integrates useful information from each layer to learn more powerful graph representations. We refer to the framework using the simple model as LGGAN_s and the framework using our proposed model simply as LGGAN.
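A minimal sketch of a GCN-style encoder with residual connections and layer aggregation is given below. Several details are assumptions made for illustration and may differ from the actual LGGAN discriminator: the symmetric adjacency normalization with self-loops, the ReLU nonlinearity, square weight matrices so that the residual sum is well defined, and aggregation by concatenating the mean-pooled output of every layer.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalized_adjacency(adj):
    """Compute D^{-1/2} (A + I) D^{-1/2} for an adjacency matrix A."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def residual_gcn_layer(a_norm, h, w):
    """One graph convolution followed by ReLU, plus a skip connection from the input."""
    out = matmul(matmul(a_norm, h), w)
    return [[max(0.0, o) + x for o, x in zip(o_row, h_row)] for o_row, h_row in zip(out, h)]

def graph_embedding(adj, features, weights):
    """Stack residual GCN layers and concatenate the mean-pooled output of each layer."""
    a_norm = normalized_adjacency(adj)
    h, pooled = features, []
    for w in weights:
        h = residual_gcn_layer(a_norm, h, w)
        pooled.extend(sum(col) / len(h) for col in zip(*h))
    return pooled

# Hypothetical 3-node path graph with 2-dimensional node features and two layers.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, -0.5], [0.5, 0.5]]]
embedding = graph_embedding(adj, features, weights)
print(embedding)
```

The resulting vector would then feed a final scoring or classification head, which is omitted here.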

We evaluate these six different variants for LGGAN (three different GAN frameworks for either LGGAN or LGGAN_s) on different graph-structured data to measure the quality of the learned generative models. We run experiments on two datasets: Cora_small and ENZYMES. The results are listed in Table 3.

Among the three GAN frameworks, LGGAN_ACGAN achieves the best overall results on both datasets regardless of which discriminative model is used. This result matches our expectations, since the AC-GAN framework incorporates the class information, allowing it to learn a better embedding and to propagate that information to the generator.

We also observe that using a GCN with residual connections can improve the quality of generated graphs regardless of which GAN framework is used. Interestingly, the gap between LGGAN_s and LGGAN is much larger on the ENZYMES dataset than on the citation networks. This difference may arise because the Cora_small dataset is composed of many small two-hop and three-hop ego networks whose structure is quite simple and uniform, and therefore easier to learn, whereas the structure of the ENZYMES graphs is more complicated and diverse. This result suggests that LGGAN_s is unable to generalize to complex data. In contrast, the quality of graphs generated by LGGAN using a GCN with residual connections is more consistent across datasets, which suggests that LGGAN can adapt to different graph-structured data. Based on these results, we use LGGAN_ACGAN in the remaining experiments and refer to it as LGGAN when comparing with other baselines.

                 Cora_small                              ENZYMES
GAN frameworks   Degree  Clustering  Orbit  Label        Degree  Clustering  Orbit  Label
LGGAN_GAN_s      0.27    0.18        0.03   0.37         0.67    0.88        0.004  0.01
LGGAN_GAN        0.21    0.14        0.01   0.15         0.31    0.20        0.01   0.01
LGGAN_CGAN_s     0.18    0.18        0.006  0.35         0.53    0.69        0.04   0.004
LGGAN_CGAN       0.10    0.24        0.01   0.19         0.23    0.13        0.02   0.01
LGGAN_ACGAN_s    0.14    0.009       0.06   0.13         0.51    0.29        0.03   0.01
LGGAN_ACGAN      0.13    0.08        0.03   0.11         0.09    0.17        0.01   0.01
Table 3: Comparison of LGGAN with different GAN frameworks and discriminative models on Cora_small and ENZYMES datasets.

Appendix B Datasets

We perform experiments on different kinds of datasets with varying sizes and characteristics, including protein graph datasets such as ENZYMES, PROTEINS, and D&D [Dobson and Doig, 2003, Kersting et al., 2016] and citation graph datasets such as Cora and Citeseer [Sen et al., 2008]. Summary statistics for these datasets are shown in Table 4.

Graph types      Datasets        # Graphs  # Graph classes  Avg. |V|  Avg. |E|  # Node labels
Citation graphs  Cora_small      256       7                38.7      61.6      7
                 Citeseer_small  256       6                44.2      82.7      6
                 Cora_large      128       7                175.3     326.3     7
                 Citeseer_large  128       6                172.5     414.7     6
Protein graphs   PROTEINS        384       2                28.1      53.4      3
                 ENZYMES         256       6                39.4      77.7      3
Table 4: Details of the graph datasets.

Appendix C Training

For training, we adopt several techniques such as feature matching and minibatch discrimination that were shown to encourage convergence and help avoid mode collapse.

Wasserstein GAN

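We build on the Wasserstein GAN (WGAN) framework [Arjovsky et al., 2017], as it helps prevent mode collapse and leads to more stable training. WGANs minimize an approximation of the Wasserstein distance between the real data distribution and the distribution of the generated samples. Arjovsky et al. [2017] proposed weight clipping as a way to enforce the 1-Lipschitz constraint on the discriminator and help the WGAN converge. In follow-up work, Gulrajani et al. [2017] proposed a better method that uses a gradient penalty as a soft constraint in place of clipping. The loss function for the discriminator is thus modified to

$$L = \mathbb{E}_{\tilde{x} \sim P_g}[D(\tilde{x})] - \mathbb{E}_{x \sim P_r}[D(x)] + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}\big[(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1)^2\big],$$

where $P_r$ is the real data distribution, $P_g$ is the generator distribution, and $\hat{x}$ is sampled uniformly along straight lines between pairs of real and generated samples.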
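The gradient-penalty loss can be sketched with a toy critic whose input gradient is available in closed form. The critic $D(x) = \tanh(w \cdot x)$, the sample values, and the coefficient used here are assumptions chosen to keep the example dependency-free; a real implementation would obtain the gradient by automatic differentiation.

```python
import math
import random

def critic(w, x):
    """Toy critic D(x) = tanh(w . x)."""
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)))

def critic_grad(w, x):
    """Analytic gradient of the toy critic with respect to its input x."""
    s = 1.0 - critic(w, x) ** 2
    return [s * wi for wi in w]

def wgan_gp_loss(w, real, fake, lam, rng):
    """Critic loss: E[D(fake)] - E[D(real)] + lam * E[(||grad D(x_hat)|| - 1)^2]."""
    n = len(real)
    wasserstein = sum(critic(w, f) for f in fake) / n - sum(critic(w, r) for r in real) / n
    penalty = 0.0
    for r, f in zip(real, fake):
        eps = rng.random()  # interpolation coefficient drawn uniformly from [0, 1)
        x_hat = [eps * ri + (1.0 - eps) * fi for ri, fi in zip(r, f)]
        norm = math.sqrt(sum(g * g for g in critic_grad(w, x_hat)))
        penalty += (norm - 1.0) ** 2
    return wasserstein + lam * penalty / n

# Hypothetical 2-dimensional samples and critic weights.
real = [[1.0, 0.5], [0.8, 1.2]]
fake = [[-1.0, 0.0], [-0.5, -0.7]]
w = [0.6, 0.4]
loss_plain = wgan_gp_loss(w, real, fake, 0.0, random.Random(0))
loss_gp = wgan_gp_loss(w, real, fake, 10.0, random.Random(0))
print(loss_plain, loss_gp)
```

With the penalty coefficient set to zero the loss reduces to the plain Wasserstein estimate; the penalty only adds a non-negative term that grows as the gradient norm moves away from 1.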



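CT-GAN [Wei et al., 2018] is a recently proposed model based on WGAN with gradient penalty (WGAN-GP). It improves on WGAN-GP by adding a soft consistency term to enforce the Lipschitz constraint. We train our model with CT-GAN since it has been shown to further stabilize the training procedure. The objective function for CT-GAN is

$$L = \mathbb{E}_{\tilde{x} \sim P_g}[D(\tilde{x})] - \mathbb{E}_{x \sim P_r}[D(x)] + \lambda_1 \, GP|_{\hat{x}} + \lambda_2 \, CT|_{x', x''},$$

where $GP|_{\hat{x}}$ is the gradient penalty of WGAN-GP and $CT|_{x', x''}$ is the consistency term evaluated on two perturbed versions $x'$ and $x''$ of a real sample $x$.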


Feature matching

To stabilize GAN training, Salimans et al. [2016] proposed feature matching, a technique that prevents the generator from overtraining on the current discriminator. It specifies a new objective function for the generator:

$$\big\lVert \mathbb{E}_{x \sim p_{\text{data}}} f(x) - \mathbb{E}_{z \sim p_z(z)} f(G(z)) \big\rVert_2^2,$$

where $f(x)$ denotes the activations of an intermediate layer of the discriminator. Instead of directly maximizing the output of the discriminator, the generator is trained to match the expected value of the features on an intermediate layer of the discriminator. The discriminator is thus trained to find the features that best discriminate real samples from samples produced by the generative model.
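The feature-matching objective compares batch means of intermediate discriminator features. A minimal sketch, assuming those features have already been extracted as lists of floats (the function names and feature values are hypothetical):

```python
def mean_features(batch):
    """Average each feature dimension over a batch of feature vectors."""
    return [sum(col) / len(batch) for col in zip(*batch)]

def feature_matching_loss(real_features, fake_features):
    """Squared L2 distance between mean real and mean generated features."""
    mu_real = mean_features(real_features)
    mu_fake = mean_features(fake_features)
    return sum((r - f) ** 2 for r, f in zip(mu_real, mu_fake))

# Hypothetical intermediate-layer activations f(x) and f(G(z)) for two samples each.
real_f = [[1.0, 2.0], [3.0, 4.0]]
fake_f = [[0.0, 0.0], [2.0, 2.0]]
fm_loss = feature_matching_loss(real_f, fake_f)
print(fm_loss)
```

Because only the batch means are compared, the generator is rewarded for matching feature statistics rather than for maximizing the discriminator's output on individual samples.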

Appendix D Evaluating Labeled Graphs

To better evaluate the structure of the generated labeled graphs, we also calculate the MMD of the three graph statistics for the sub-graphs centered around each node class and take the average MMD value across all classes. Since MMSB is the only existing method that can directly generate labels, we compare LGGAN to it on the ENZYMES dataset. The results are listed in Table 5. LGGAN not only learns a good distribution of the labels but also learns the structure within each class much more reliably than the MMSB model.

        Graph statistics                       Sub-graph statistics
        Degree  Clustering  Orbit  Label       Avg. D  Avg. C  Avg. O
MMSB    0.55    1.08        0.05   0.92        0.14    0.20    0.03
LGGAN   0.09    0.17        0.03   0.01        0.13    0.15    0.01
Table 5: Comparison of LGGAN with MMSB on both the graph statistics and average sub-graph statistics of different classes using MMD evaluation metrics on the ENZYMES dataset.
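To make the MMD-based evaluation concrete, the sketch below computes a biased estimate of the squared MMD [Gretton et al., 2012] between two sets of degree histograms. The Gaussian kernel on Euclidean distance, the bandwidth, and the helper names are assumptions made for illustration; the kernel used in the experiments may differ.

```python
import math

def degree_histogram(adj, max_degree):
    """Normalized histogram of node degrees, with degrees capped at max_degree."""
    hist = [0.0] * (max_degree + 1)
    for row in adj:
        hist[min(int(sum(row)), max_degree)] += 1.0
    total = sum(hist)
    return [h / total for h in hist]

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel on the Euclidean distance between two histograms."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))

def mmd_squared(xs, ys, kernel=gaussian_kernel):
    """Biased estimate of squared MMD between two samples of histograms."""
    k_xx = sum(kernel(a, b) for a in xs for b in xs) / len(xs) ** 2
    k_yy = sum(kernel(a, b) for a in ys for b in ys) / len(ys) ** 2
    k_xy = sum(kernel(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2.0 * k_xy

# Hypothetical graphs: a 3-node path and a triangle.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
h_path = degree_histogram(path, max_degree=2)
h_tri = degree_histogram(triangle, max_degree=2)
same = mmd_squared([h_path], [h_path])
different = mmd_squared([h_path], [h_tri])
print(same, different)
```

The same estimator could be applied per node class by computing the statistic on the sub-graph around each class and averaging the resulting values.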