1 Introduction
Graphs are used to capture relational structure in many domains, including knowledge bases (Hamaguchi et al., 2017), social networks (Hamilton et al., 2017; Kipf and Welling, 2016), protein interaction networks (Fout et al., 2017), and physical systems (Batagelj and Zaversnik, 2003). Generating graphs using suitable probabilistic models has many applications, such as drug design (Duvenaud et al., 2015; Gómez-Bombarelli et al., 2018; Li et al., 2018a), creating computation graphs for architecture search (Xie et al., 2019), as well as research in network science (Watts and Strogatz, 1998; Albert and Barabási, 2002; Leskovec et al., 2010).
While many stochastic models of graphs have been proposed, the idea of learning statistical generative models of graphs from data has recently gained significant attention. One approach is to use latent variable generative models similar to variational autoencoders
(Kingma and Welling, 2013). Examples include GraphVAE (Simonovsky and Komodakis, 2018), Graphite (Grover et al., 2018), and junction tree variational autoencoders (Jin et al., 2018). These models typically use a graph neural network (GNN) (Gori et al., 2005; Scarselli et al., 2008) to encode graph data into a latent space, and generate samples by decoding latent variables sampled from a prior distribution. The second paradigm is autoregressive graph generative models (Li et al., 2018a; You et al., 2018a; Liao et al., 2019), where graphs are generated sequentially, one node (or one subgraph) at a time. Although these models have achieved great success, they are not satisfying in terms of capturing the permutation invariance properties of graphs. Permutation invariance is a fundamental inductive bias of graph-structured data. For a graph with $n$ nodes, there are up to $n!$
different adjacency matrices that are equivalent representations of the same graph. Therefore, a graph generative model should ideally assign the same probability to each of these equivalent adjacency matrices. It is challenging, however, to enforce permutation invariance in variational autoencoders or autoregressive models. Some previous approaches only approximately induce permutation invariance: GraphVAE
(Simonovsky and Komodakis, 2018) uses inexact graph matching techniques requiring up to $O(n^4)$ operations, whereas the model in Li et al. (2018a) augments the training data by randomly permuting the nodes of existing data. Other approaches instead focus on selecting a specific node ordering based on heuristics: GraphRNN
(You et al., 2018b) uses a random breadth-first search (BFS) to determine an ordering, and GRAN (Liao et al., 2019) adaptively chooses an ordering for the input graph from a family of predefined node orderings. To better capture the permutation invariance of graphs, we propose a new graph generative model using the framework of score-based generative modeling (Song and Ermon, 2019)
. Intuitively, this approach trains a model to capture the vector field of gradients of the log data density of graphs (a.k.a., scores). Contrary to likelihood-based models such as variational autoencoders and autoregressive models, score-based generative modeling imposes fewer constraints on the model architecture (e.g., a score does not have to be normalized). This enables the use of function families with desirable inductive biases, such as permutation invariance. In particular, we leverage graph neural networks
(Scarselli et al., 2008) to build a permutation equivariant model for the scores of the distribution over graphs we wish to learn. As shown later in the paper, this implicitly defines a permutation invariant distribution over adjacency matrices representing graphs. As in other classes of deep generative models, the neural architecture used in score-based generative modeling is critical to its success. In this work, we introduce a new type of graph neural network, named EDP-GNN, with learnable multi-channel adjacency matrices. In our experiments, we first test the effectiveness of EDP-GNN on the task of learning graph algorithms, where it significantly outperforms traditional GNNs. Next, we evaluate the generation quality of our score-based models using MMD (Gretton et al., 2012) metrics on several graph datasets, where we achieve performance comparable to GraphRNN (You et al., 2018b), a competitive method for generative modeling of graphs.
2 Preliminaries
2.1 Notations
For each weighted undirected graph, we can choose an ordering of its nodes $\pi$ and represent the graph with an adjacency matrix $A^{\pi}$. Here we use the superscript $\pi$ to indicate that the rows/columns of $A^{\pi}$ are arranged in accordance with the specific node ordering $\pi$. When the graph is undirected, the corresponding adjacency matrix is symmetric. We denote the set of such adjacency matrices as $\mathcal{A}$.
A distribution of graphs can be represented as a distribution of adjacency matrices $p(A^{\pi})$. Since graphs are invariant to permutations, $A^{\pi_1}$ and $A^{\pi_2}$ always represent the same graph for any two different node orderings $\pi_1$ and $\pi_2$. This permutation invariance also implies that $p(A^{\pi_1}) = p(A^{\pi_2})$, i.e., the distribution of adjacency matrices is invariant to node permutations. In the sequel, we often omit the superscript $\pi$ in $A^{\pi}$ when not emphasizing any specific node ordering.
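As a concrete illustration of this equivalence, the following NumPy sketch (function names are ours, not from the paper) permutes the rows and columns of an adjacency matrix: the result is a different matrix representing the same graph, so any permutation-invariant statistic agrees on both.

```python
import numpy as np

def permute_adjacency(A, perm):
    """A^pi: reorder rows/columns of A by the node ordering `perm`
    (equivalent to P A P^T for the corresponding permutation matrix P)."""
    return A[np.ix_(perm, perm)]

# A path graph on 3 nodes: edges (0, 1) and (1, 2).
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
A_pi = permute_adjacency(A, np.array([2, 0, 1]))

# Different orderings give different matrices but the same graph, so any
# permutation-invariant statistic (e.g. the sorted degree sequence) agrees.
assert not np.array_equal(A, A_pi)
assert np.array_equal(np.sort(A.sum(axis=0)), np.sort(A_pi.sum(axis=0)))
```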
2.2 Graph Neural Network (GNN)
Graph neural networks are a family of neural networks that map graphs to vector representations using message-passing type operations on node features (Gori et al., 2005; Scarselli et al., 2008). They are natural models for graph-structured data; for example, GIN (Xu et al., 2018a) is one type of GNN that is proven to be as expressive as the Weisfeiler-Lehman graph isomorphism test (WL test). The message-passing mechanism guarantees that the output representation of an input adjacency matrix $A^{\pi}$ is equivariant to permutations of the node ordering $\pi$.
2.3 Score-Based Generative Modeling
Score-based generative modeling (Song and Ermon, 2019) is a class of generative models. For a probability density function $p(x)$, the score function is defined as $\nabla_x \log p(x)$. Instead of directly modeling the density function of the data distribution, score-based generative modeling estimates the data score function $\nabla_x \log p_{\mathrm{data}}(x)$. The advantage is that the score function can be easier to model than the density function. For better score estimation, following Song and Ermon (2019) we perturb the data with Gaussian noise of different intensities, and estimate the scores jointly for all noise levels. We train a noise conditional model $s_\theta(x, \sigma)$ (e.g., a neural network parameterized by $\theta$) to approximate the score function corresponding to noise level $\sigma$. Given a data distribution $p_{\mathrm{data}}(x)$, a noise distribution $q_\sigma(\tilde{x} \mid x)$ (e.g., $\mathcal{N}(\tilde{x} \mid x, \sigma^2 I)$), and a sequence of noise levels $\{\sigma_i\}_{i=1}^{L}$, the training loss $\ell(\theta; \sigma)$ is defined as:
(1) $\ell(\theta; \sigma) \triangleq \frac{1}{2}\, \mathbb{E}\!\left[ \left\| s_\theta(\tilde{x}, \sigma) + \frac{\tilde{x} - x}{\sigma^2} \right\|_2^2 \right]$
where the expectation is taken with respect to the sampling process $x \sim p_{\mathrm{data}}(x)$, $\tilde{x} \sim q_\sigma(\tilde{x} \mid x)$. We note that all expectations in (1) can be estimated with i.i.d. samples from $p_{\mathrm{data}}$ and $q_\sigma$, which are easy to obtain. The overall objective is $\mathcal{L}(\theta) \triangleq \frac{1}{L} \sum_{i=1}^{L} \sigma_i^2\, \ell(\theta; \sigma_i)$.
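The per-noise-level loss above can be sketched in a few lines of NumPy; the toy linear score function and all names here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def dsm_loss(score_fn, data, sigma):
    """Denoising score matching loss for one noise level sigma.

    score_fn(x_tilde, sigma) should approximate the score of the perturbed
    distribution; the regression target is -(x_tilde - x) / sigma**2.
    """
    noise = rng.normal(scale=sigma, size=data.shape)
    x_tilde = data + noise
    target = -(x_tilde - data) / sigma**2
    diff = score_fn(x_tilde, sigma) - target
    return 0.5 * np.mean(np.sum(diff**2, axis=-1))

# Toy check: for data from N(0, I), the perturbed distribution is
# N(0, (1 + sigma^2) I), whose true score is s(x) = -x / (1 + sigma^2).
data = rng.normal(size=(4096, 2))
sigma = 0.5
loss_true = dsm_loss(lambda x, s: -x / (1 + s**2), data, sigma)
loss_zero = dsm_loss(lambda x, s: np.zeros_like(x), data, sigma)
assert loss_true < loss_zero  # the true score achieves a lower loss
```

The loss is minimized (in expectation) exactly when the model matches the score of the perturbed data distribution, which is why the toy comparison holds.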
3 Score-Based Generative Modeling for Graphs
Contrary to the weighted graphs we used to define the probability density function in Section 2.1, in real-world problems unweighted graphs are much more common, meaning that entries of the adjacency matrix can only be either 0 or 1. While the score-based method (Song and Ermon, 2019) was initially proposed for handling continuous data, it can be adapted to generate discrete data as well. Below, we first describe our modifications of score-based generative modeling for graph generation, and then introduce our specialized neural network architecture, EDP-GNN, for the noise conditional score model $s_\theta(\tilde{A}, \sigma_i)$, where $i \in \{1, \dots, L\}$.
3.1 Noise Distribution
We add Gaussian perturbations to adjacency matrices and define the noise distribution as follows:
(2) $q_\sigma(\tilde{A} \mid A) \triangleq \prod_{i < j} \mathcal{N}\big(\tilde{A}_{ij}\,;\, A_{ij}, \sigma^2\big), \qquad \tilde{A}_{ji} = \tilde{A}_{ij}$
Intuitively, we only add Gaussian noise to the upper triangular part of the adjacency matrix, because we focus on undirected graphs whose adjacency matrices are symmetric.
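A minimal sketch of this perturbation, assuming noise is added strictly above the diagonal and mirrored (whether the diagonal is also perturbed is an implementation detail we leave out):

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb(A, sigma):
    """Add Gaussian noise to the upper-triangular entries of a symmetric
    adjacency matrix and mirror it to the lower triangle."""
    n = A.shape[0]
    upper = np.triu(rng.normal(scale=sigma, size=(n, n)), k=1)
    return A + upper + upper.T  # keeps the matrix symmetric

A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
A_tilde = perturb(A, sigma=0.2)
assert np.allclose(A_tilde, A_tilde.T)            # symmetry preserved
assert np.allclose(np.diag(A_tilde), np.diag(A))  # diagonal untouched
```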
Since $\nabla_{\tilde{A}} \log q_\sigma(\tilde{A} \mid A) = -\frac{\tilde{A} - A}{\sigma^2}$, the training loss of $s_\theta(\tilde{A}, \sigma)$ is
(3) $\ell(\theta; \sigma) \triangleq \frac{1}{2}\, \mathbb{E}\!\left[ \left\| s_\theta(\tilde{A}, \sigma) + \frac{\tilde{A} - A}{\sigma^2} \right\|_2^2 \right]$
where the expectation is over the sampling process defined via $A \sim p_{\mathrm{data}}(A)$ and $\tilde{A} \sim q_\sigma(\tilde{A} \mid A)$. The overall objective is $\mathcal{L}(\theta) \triangleq \frac{1}{L} \sum_{i=1}^{L} \sigma_i^2\, \ell(\theta; \sigma_i)$.
Note that the supports of the noise distributions span $\mathbb{R}^{N(N-1)/2}$ (the space of upper-triangular entries), where $N$ is the number of nodes of the input graph. Therefore, the scores of the perturbed distributions corresponding to all noise levels are well-defined, regardless of whether the training samples are discrete or not.
3.2 Sampling
To generate a graph, we first sample $N$, the number of nodes to be generated, and then sample an adjacency matrix $\tilde{A}$ with annealed Langevin dynamics. This amounts to factorizing $p(\tilde{A}, N) = p(N)\, p(\tilde{A} \mid N)$. Implementation-wise, we sample $N$ from the empirical distribution of the number of nodes in the training dataset, as done in Li et al. (2018b). When doing annealed Langevin dynamics, we first initialize $\tilde{A}$ using a folded normal distribution, i.e., each upper-triangular entry is initialized as $\tilde{A}_{ij} = |z_{ij}|$ with $z_{ij} \sim \mathcal{N}(0, \sigma_1^2)$ for all $i < j$. Then, we update $\tilde{A}$ by iteratively sampling from the series of trained conditional score models $s_\theta(\tilde{A}, \sigma_i)$ using Langevin dynamics. For each conditional score model, we run Langevin dynamics for a fixed number of steps, with the noise levels annealed down over the process such that $\sigma_1$ is large but $\sigma_L$ is small enough that its effect can be ignored. As a minor modification, we change the noise term in Algorithm 1 to a symmetric one, obtained by sampling i.i.d. Gaussian noise for the upper-triangular entries and mirroring them to the lower triangle, which accounts for the symmetry of adjacency matrices.
Score-based generative modeling provides samples in a continuous space, whereas graph data are often discrete. In order to obtain discrete samples, we quantize the generated continuous adjacency matrix (denoted as $\tilde{A}$) to a binary one (denoted as $\hat{A}$) at the end of annealed Langevin dynamics. Formally, this quantization operation is defined as
(4) $\hat{A}_{ij} \triangleq \mathbb{1}\big[\tilde{A}_{ij} > 0.5\big]$
where $\mathbb{1}[\cdot]$ is an indicator function that evaluates to 1 when the condition holds and 0 otherwise.
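The full sampling-plus-quantization procedure can be sketched as follows. We use a toy analytic score function that pulls samples toward a fixed target graph; the step-size schedule, constants, and all names are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def symmetric_noise(n):
    """Gaussian noise on the upper triangle, mirrored to keep symmetry."""
    z = np.triu(rng.normal(size=(n, n)), k=1)
    return z + z.T

def annealed_langevin(score_fn, n, sigmas, steps=30, eps=0.01):
    """Annealed Langevin dynamics over symmetric matrices, followed by
    thresholding at 0.5 to obtain a binary adjacency matrix."""
    A = np.abs(symmetric_noise(n)) * sigmas[0]   # folded-normal init
    for sigma in sigmas:
        alpha = eps * sigma**2 / sigmas[-1]**2   # annealed step size
        for _ in range(steps):
            A = A + 0.5 * alpha * score_fn(A, sigma) \
                  + np.sqrt(alpha) * symmetric_noise(n)
    return (A > 0.5).astype(float)               # quantization step

# Toy score pulling samples toward a fixed target graph (hypothetical model).
target = np.array([[0., 1., 1., 0.],
                   [1., 0., 1., 0.],
                   [1., 1., 0., 1.],
                   [0., 0., 1., 0.]])
sample = annealed_langevin(lambda A, s: (target - A) / s**2,
                           n=4, sigmas=np.geomspace(1.0, 0.1, 6))
```

With a well-trained score model in place of the toy one, the same loop produces binary symmetric adjacency matrices.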
3.3 Permutation Equivariance and Invariance
Permutation invariance is a desirable property of graph generative models, since the true distribution over graphs is inherently permutation invariant. We show that by using a permutation equivariant score function $s_\theta(A)$, the corresponding implicitly defined distribution is permutation invariant.
Theorem 1.
If $s(\cdot)$ is a permutation equivariant function, then the scalar function $f(A) \triangleq \int_{\gamma(A_0, A)} \langle s(A'), \mathrm{d}A' \rangle_F + C$ is permutation invariant, where $\langle \cdot, \cdot \rangle_F$ is the Frobenius inner product, $\gamma(A_0, A)$ is any curve from $A_0$ to $A$, and $C$ is a constant.
Proof.
See Appendix B. ∎
Since our estimated gradient of the log-likelihood is permutation equivariant, the implicitly defined log-likelihood function, given by the line integral of the score network as in Theorem 1, is permutation invariant.
3.4 Edgewise Dense Prediction Graph Neural Network (EDP-GNN)
Below, we introduce a GNN-based score network that can effectively model the scores of graph distributions while being permutation equivariant.
3.4.1 Multi-Channel GNN Layer
We introduce the multi-channel GNN layer, an extended version of the GIN (Xu et al., 2018a) layer, which serves as a basic component of our EDP-GNN model. The intuition is to run message-passing simultaneously on many different graphs (channels), and collect the node features from all channels via concatenation. For a $C$-channel GNN layer with $K$ message-passing steps, the $k$-th message-passing step can be expressed as follows:

$h_i^{(k+1)} = \mathrm{MLP}^{(k)}\Big( \mathrm{CONCAT}\Big( \Big\{ (1 + \epsilon^{(k)})\, h_i^{(k)} + \sum_{j} \tilde{A}[c, i, j]\, h_j^{(k)} \Big\}_{c=1}^{C} \Big) \Big),$

where $i$ is the index of nodes, $C$ is the number of channels, $\tilde{A} \in \mathbb{R}^{C \times N \times N}$ is the multi-channel adjacency matrix, and $h_i^{(k)}$ is the vector of node features. Here $\epsilon^{(k)}$ is a learnable parameter, the same as in the original GIN, $\mathrm{CONCAT}$ stands for the concatenation operation, and $\mathrm{MLP}^{(k)}$ transforms each node feature using a multilayer perceptron.
After $K$ steps of message-passing, we use the same concatenation operation as GIN to obtain the final node features. Specifically, for each node $i$, the output feature is given by

$h_i^{\mathrm{out}} = \mathrm{CONCAT}\big( \{ h_i^{(k)} \}_{k=0}^{K} \big).$
Henceforth, we denote our multi-channel GNN layer as $\{h_i'\} = \mathrm{MultiChannelGNN}\big(\tilde{A}, \{h_i\}\big)$.
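One message-passing step of this layer can be sketched as follows; the identity `mlp`, the function name, and all shapes are hypothetical, chosen only to illustrate the per-channel aggregation and concatenation.

```python
import numpy as np

def multi_channel_gin_step(A_multi, H, eps, mlp):
    """One message-passing step run on C adjacency channels in parallel.

    A_multi: (C, N, N) multi-channel adjacency; H: (N, d) node features.
    For each channel c: (1 + eps) * h_i + sum_j A[c, i, j] * h_j, then the
    C results are concatenated and passed through a shared MLP.
    """
    C = A_multi.shape[0]
    per_channel = [(1.0 + eps) * H + A_multi[c] @ H for c in range(C)]
    return mlp(np.concatenate(per_channel, axis=-1))  # (N, C * d) -> (N, d')

# Toy usage: two channels (identity graph and complete graph with loops).
A_multi = np.stack([np.eye(3), np.ones((3, 3))])
H = np.arange(6, dtype=float).reshape(3, 2)
out = multi_channel_gin_step(A_multi, H, eps=0.0, mlp=lambda x: x)
assert out.shape == (3, 4)          # features from both channels concatenated
assert np.allclose(out[:, :2], 2 * H)  # identity channel: h_i + sum = 2 h_i
```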
3.4.2 EDP-GNN Layer
The EDP-GNN layer is the key component of our model. It transforms the input adjacency matrix into another one, allowing us to adaptively change the process of message passing. The intuition is similar to neural networks for dense image prediction tasks (e.g., semantic segmentation), where convolutional layers transform the input image into a feature map in a pixel-wise manner, leveraging local information around each pixel location. Similarly, we want our GNN layer to extract edge-wise features and map them to a new adjacency matrix, using local information (defined in terms of connectivity) of each node in the graph.
One EDP-GNN layer has two steps:

Node feature inference: using MultiChannelGNN to encode the local structure of the different channels of the graph into node features, given by

(5) $\{h_i^{(l+1)}\} = \mathrm{MultiChannelGNN}^{(l)}\big(\tilde{A}^{(l)}, \{h_i^{(l)}\}\big)$
Edge feature inference: updating the feature vector of each edge based on the current features of the edge and the updated features of its two endpoints. For each edge $(i, j)$, this operation is given by

$\hat{A}^{(l+1)}[\cdot, i, j] = \mathrm{MLP}\big( \tilde{A}^{(l)}[\cdot, i, j],\, h_i^{(l+1)},\, h_j^{(l+1)} \big),$

where $\mathrm{MLP}$ denotes a multilayer perceptron applied to edge features. To ensure symmetry, the new adjacency matrix is given by

(6) $\tilde{A}^{(l+1)}[\cdot, i, j] = \frac{1}{2}\big( \hat{A}^{(l+1)}[\cdot, i, j] + \hat{A}^{(l+1)}[\cdot, j, i] \big)$
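A sketch of the edge feature inference step, assuming symmetry is enforced by averaging the $(i, j)$ and $(j, i)$ outputs (the exact symmetrization and the toy single-output "MLP" are our assumptions, not the paper's code):

```python
import numpy as np

def edge_update(A_multi, H, edge_mlp):
    """Edge-wise feature inference: for every pair (i, j), feed the current
    edge features A[:, i, j] together with the endpoint features h_i, h_j
    through an MLP, then symmetrize by averaging (i, j) and (j, i) outputs.
    """
    C, N, _ = A_multi.shape
    Hi = np.repeat(H[:, None, :], N, axis=1)        # (N, N, d): features of i
    Hj = np.repeat(H[None, :, :], N, axis=0)        # (N, N, d): features of j
    E = np.transpose(A_multi, (1, 2, 0))            # (N, N, C): edge features
    raw = edge_mlp(np.concatenate([E, Hi, Hj], axis=-1))  # (N, N, C_out)
    sym = 0.5 * (raw + np.transpose(raw, (1, 0, 2)))      # enforce symmetry
    return np.transpose(sym, (2, 0, 1))             # back to (C_out, N, N)

A_multi = np.random.default_rng(0).random((2, 4, 4))
H = np.random.default_rng(1).random((4, 3))
out = edge_update(A_multi, H, lambda x: x[..., :1])  # toy 1-channel "MLP"
assert np.allclose(out[0], out[0].T)                 # new adjacency is symmetric
```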
3.4.3 Input and Output Layers
Input layer: input graphs need to be preprocessed before they can be fed into our EDP-GNN model. In particular, we take adjacency matrices with two channels as the input, where the first channel is the original adjacency matrix of an input graph, and the other channel is the negated version of the same adjacency matrix, in which each entry is flipped. The node features are initialized using the weighted degrees. Formally,

$\tilde{A}^{(0)} = [A,\; 1 - A], \qquad h_i^{(0)} = \sum_{j} A[i, j],$

where $A$ is the adjacency matrix of the input graph. If node features are available from the data, they are concatenated into the initialization of each node as well.
Output layer: to produce the output, we employ an approach similar to Xu et al. (2018b), aggregating the information from all previous layers to produce a set of permutation equivariant edge features. This effectively collects the information extracted by shallower layers. Formally, for each edge $(i, j)$, the output features are given by

$s[i, j] = \mathrm{MLP}\Big( \mathrm{CONCAT}\big( \{ \tilde{A}^{(l)}[\cdot, i, j] \}_{l} \big) \Big).$
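The input preprocessing described above can be sketched as follows (whether the flipped channel also negates the diagonal is an implementation detail; the function name is ours):

```python
import numpy as np

def preprocess(A):
    """Build the two-channel input of EDP-GNN: the original adjacency and
    its entry-wise flipped version, plus weighted-degree node features."""
    flipped = 1.0 - A                     # "negated" second channel
    A_multi = np.stack([A, flipped])      # (2, N, N)
    h0 = A.sum(axis=1, keepdims=True)     # weighted degree of each node
    return A_multi, h0

A = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 0., 0.]])
A_multi, h0 = preprocess(A)
assert A_multi.shape == (2, 3, 3)
assert np.allclose(h0.ravel(), [2., 1., 1.])  # degrees of the star graph
```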
3.4.4 Noise Level Conditioning
The framework of score-based generative modeling proposed in Song and Ermon (2019) requires a score network conditioned on a series of noise levels. We hope to provide the conditioning on noise levels with as few extra parameters as possible. To this end, we add gain and bias terms conditioned on the index of the noise level in all MLP layers, and share all other parameters across different noise levels. A conditional MLP layer for noise level $\sigma_i$ is denoted as

$\mathrm{MLP}_{\sigma_i}(x) = \alpha_i \odot \phi(W x + b) + \beta_i,$

where $\alpha_i$ and $\beta_i$ are learnable gain and bias parameters for each noise level, and $\phi$ denotes the activation function. We empirically found that this implementation of noise conditioning achieves performance similar to separately training a score network for each noise level.
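A hypothetical parameterization of this conditioning, with shared weights and per-level gain/bias vectors (class and parameter names are ours):

```python
import numpy as np

class ConditionalLayer:
    """A layer whose linear weights are shared across noise levels, with a
    per-noise-level gain and bias applied after the activation."""

    def __init__(self, d_in, d_out, n_levels, rng):
        self.W = rng.normal(size=(d_in, d_out)) / np.sqrt(d_in)
        self.b = np.zeros(d_out)
        self.gain = np.ones((n_levels, d_out))   # one gain vector per level
        self.bias = np.zeros((n_levels, d_out))  # one bias vector per level

    def __call__(self, x, level):
        h = np.tanh(x @ self.W + self.b)         # shared transformation
        return self.gain[level] * h + self.bias[level]

layer = ConditionalLayer(4, 8, n_levels=3, rng=np.random.default_rng(0))
x = np.ones((2, 4))
assert layer(x, level=0).shape == (2, 8)
# Before training (unit gains, zero biases), all levels coincide.
assert np.allclose(layer(x, level=0), layer(x, level=2))
```

The extra parameter count is only two vectors per noise level per layer, which is the point of this design choice.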
3.4.5 Permutation Equivariance of EDP-GNN
The message-passing operations in a graph neural network are guaranteed to be permutation equivariant (Keriven and Peyré, 2019), as are edge-wise and node-wise operations on graphs. Since all operations in EDP-GNN are either message passing or edge-wise/node-wise transformations, the edge features produced by EDP-GNN are guaranteed to be permutation equivariant. In the last EDP-GNN layer, each edge feature is one component of the estimated score; hence Theorem 1 applies to this score network.
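The equivariance of message passing can be checked numerically; this sketch uses a parameter-free aggregation step, but the same identity holds for any layer built from such operations.

```python
import numpy as np

def message_pass(A, H):
    """One parameter-free message-passing step: aggregate neighbor features."""
    return A @ H

rng = np.random.default_rng(0)
A = rng.random((5, 5))
A = 0.5 * (A + A.T)                  # symmetric weighted adjacency
H = rng.random((5, 3))               # node features
P = np.eye(5)[rng.permutation(5)]    # a random permutation matrix

# Equivariance: permuting the input then applying the layer equals
# applying the layer then permuting the output.
lhs = message_pass(P @ A @ P.T, P @ H)
rhs = P @ message_pass(A, H)
assert np.allclose(lhs, rhs)
```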
4 Related Work
Flow-Based Graph Generative Models
In addition to the models mentioned in Section 1, there is also an emerging class of graph generative models based on invertible mappings, such as GNF (Liu et al., 2019) and GraphNVP (Madhawa et al., 2019). These models modify the architecture of a graph neural network (GNN) using coupling layers (Dinh et al., 2016) to enable maximum likelihood learning via the change of variables formula. Since GNNs are permutation invariant, both GNF and GraphNVP could be permutation invariant in principle. However, GraphNVP opts not to be, because making the model fully permutation invariant hurts its empirical performance. In contrast, GNF is a permutation invariant model: it achieves permutation invariance by first using a permutation equivariant autoencoder to encode the graph structure into a set of node features, and then modeling the distribution of the node features using reversible graph neural networks.
GNNs that Learn Edge Features
Although the majority of GNNs focus on node feature learning (e.g., node classification tasks), there are GNNs prior to our EDP-GNN that produce intermediate edge features as well. For example, Graph Attention Networks (Veličković et al., 2017) compute an attention coefficient for each edge during message passing (MP) steps. Gong and Cheng (2019) further explored methods to utilize edge features during MP steps, such as using normalized attention coefficients to construct a new adjacency matrix for the next MP step, and passing messages simultaneously on multiple input adjacency matrices. However, the model in Gong and Cheng (2019) is not designed for predicting edge features, and its capability to make edge-wise predictions is limited by the normalizing operation and the restrictive form of attention. Kipf et al. (2018) proposed a GNN-based VAE model for relational inference in interacting systems. Contrary to their model, which predicts edge information based only on node features, our model takes a weighted graph without node features as input.
5 Experiments
5.1 Learning Graph Algorithms
In this section, we empirically demonstrate the power of the proposed EDP-GNN model on edge-wise prediction tasks. In particular, we reduce several classic graph algorithms to the task of predicting whether each edge is in the solution set or not. The training data consist of graphs and the corresponding solution sets, and we train our models to fit the solution sets by minimizing the cross-entropy loss.
Setup
To verify the ability of EDP-GNN to make edge-wise dense predictions, we tested it on learning classic graph algorithms, labeling every edge in a graph to indicate whether it is in the solution set or not. We choose two simple tasks: 1) Shortest Path (SP) between a given pair of nodes, and 2) Maximum Spanning Tree (MST) of a given graph. The solution set of SP is a path of minimum length connecting the pair of nodes, while the solution set of MST is the collection of all edges in the maximum spanning tree. For both tasks, all graphs are randomly sampled from the Erdős–Rényi (E-R) model (Erdős and Rényi, 1960). For weighted graphs, all edge weights are sampled uniformly at random. A prediction is considered correct if and only if all edge labels of the graph are correct, and we report accuracy over a fixed test set as the metric. As the baseline model, we use vanilla GIN (Xu et al., 2018a).
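For illustration, ground-truth MST edge labels of the kind described above can be generated with Kruskal's algorithm; this is a self-contained sketch (the paper does not specify its data-generation code).

```python
def mst_edge_labels(n, weighted_edges):
    """Label each edge 1 if Kruskal's algorithm puts it in a maximum
    spanning tree, else 0, using a union-find structure."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    labels = {}
    # Sort by decreasing weight for a *maximum* spanning tree.
    for u, v, w in sorted(weighted_edges, key=lambda e: -e[2]):
        ru, rv = find(u), find(v)
        labels[(u, v)] = 1 if ru != rv else 0
        if ru != rv:
            parent[ru] = rv
    return labels

edges = [(0, 1, 0.9), (1, 2, 0.5), (0, 2, 0.8), (2, 3, 0.3)]
labels = mst_edge_labels(4, edges)
assert sum(labels.values()) == 3   # a spanning tree of 4 nodes has 3 edges
assert labels[(1, 2)] == 0         # the lightest cycle edge is excluded
```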
Training
During training, we generate the training data dynamically on the fly and use the cross-entropy loss as the training objective for both tasks.
Model     SP (unweighted)   SP (weighted)   MST (weighted)
GIN       0.57              0.12            0.20
EDP-GNN   0.60              0.92            0.84
Results
All results are provided in Tab. 1. We observe that EDP-GNN performs similarly to GIN on unweighted graphs, but achieves much better performance when the graphs are weighted. This confirms that EDP-GNN is more effective for edge-wise predictions.
5.2 Graph Generation Task
In this section, we demonstrate that EDP-GNN is capable of producing high-quality graph samples via score-based generative modeling. To better understand the learnable multi-channel adjacency matrices in our model, we visualize the intermediate channels in Figure 2 and perform extensive ablation studies.
Datasets and Baselines
We tested our model on two datasets, Community-small and Ego-small, which are also used by You et al. (2018b) and Liu et al. (2019). See Appendix A for more details. Our baselines include GraphRNN (You et al., 2018b), Graph Normalizing Flows (GNF) (Liu et al., 2019), GraphVAE (Simonovsky and Komodakis, 2018), and DeepGMG (Li et al., 2018a).
Metrics
To evaluate generation quality, we used maximum mean discrepancy (MMD) over graph statistics, as proposed by You et al. (2018b). We calculated MMD for three graph statistics: 1) the degree distribution, 2) the clustering coefficient distribution, and 3) the count of 4-node orbits.
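A minimal sketch of MMD with a Gaussian kernel over fixed-length graph statistics (the bandwidth and toy data below are illustrative, not the paper's evaluation setup):

```python
import numpy as np

def gaussian_mmd(X, Y, sigma=1.0):
    """Squared MMD (V-statistic) between two samples of feature vectors
    under a Gaussian kernel."""
    def kernel(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma**2))
    return kernel(X, X).mean() + kernel(Y, Y).mean() - 2 * kernel(X, Y).mean()

rng = np.random.default_rng(0)
# Toy stand-ins for per-graph statistics (e.g. degree histograms).
X = rng.normal(0.0, 1.0, size=(200, 4))
Y_same = rng.normal(0.0, 1.0, size=(200, 4))
Y_diff = rng.normal(2.0, 1.0, size=(200, 4))
# MMD is near zero for matching distributions, large for mismatched ones.
assert gaussian_mmd(X, Y_same) < gaussian_mmd(X, Y_diff)
```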
Results
We compare EDP-GNN against the baselines and summarize the results in Tab. 2. Our model performs comparably to GraphRNN and GNF with respect to most MMD metrics, and outperforms all other methods in terms of the overall average of MMDs on the two datasets.
5.2.1 Understanding Intermediate Channels
Intuitively, the intermediate channels of EDP-GNN should be analogous to the channels of feature maps in convolutional neural networks (CNNs). Since channels of feature maps can be visualized as images in CNNs, we propose to visualize each channel of the multi-channel adjacency matrices as a graph. The EDP-GNN layers should then map an input graph to intermediate graphs that possess interpretable semantics.
In Figure 2, we visualize the channels of the intermediate adjacency matrices for an EDP-GNN model trained on the Community-small dataset. We observe that the model maps a perturbed community graph with no clearly visible structure to a graph with a structure of two equal-sized communities.
As implied by the training objective (3), the score network can perfectly predict the ground truth score, i.e., $s_\theta(\tilde{A}, \sigma) = -\frac{\tilde{A} - A}{\sigma^2}$, if it can map the noise-perturbed graph to the true (noise-free) graph in some of the intermediate channels. Therefore, an ideal score network should be able to 1) understand the structure of a given graph, before 2) mapping a perturbed graph to the corresponding denoised graph. While previous GNNs are designed for the former task, EDP-GNN is especially capable of solving the latter.
5.2.2 Ablation Studies
To verify the importance of the intermediate adjacency matrices in EDP-GNN being 1) learnable and 2) multi-channel, we conducted ablation studies on the Community-small and Ego-small datasets. We switched the two properties on/off respectively, and provide the performance comparison in Tab. 3. Note that EDP-GNN is equivalent to vanilla GIN when the intermediate adjacency matrices are single-channel and non-learnable. As shown in Tab. 3, both properties improve the expressivity for score modeling, in the sense of reducing the training and test score matching losses. As expected, performance is best when both properties are combined.
6 Conclusion
We propose a permutation invariant generative model for graphs based on the framework of score-based generative modeling. In particular, we implicitly define a permutation invariant distribution over graph adjacency matrices by modeling the corresponding permutation equivariant score function and sampling with Langevin dynamics. For effective score modeling of graph distributions, we propose a new permutation equivariant GNN architecture, named EDP-GNN, leveraging trainable, multi-channel adjacency matrices as intermediate layers. Empirically, we demonstrate that EDP-GNNs are more expressive than vanilla GNNs at predicting edge-wise features, as evidenced by better performance on the task of learning classic graph algorithms such as shortest paths. Moreover, we show that our model can produce samples with quality comparable to existing state-of-the-art models. As one future direction, we hope to improve the scalability of our model by reducing its computational complexity, using techniques such as graph pooling (Ying et al., 2018).
Acknowledgements
This research was supported by Intel Corporation, Amazon AWS, TRI, NSF (#1651565, #1522054, #1733686), ONR (N000141912145), AFOSR (FA95501910024).
References
 Albert and Barabási (2002) Albert, R. and Barabási, A.-L. (2002). Statistical mechanics of complex networks. Reviews of Modern Physics, 74(1):47.
 Batagelj and Zaversnik (2003) Batagelj, V. and Zaversnik, M. (2003). An o(m) algorithm for cores decomposition of networks. arXiv preprint cs/0310049.
 Dinh et al. (2016) Dinh, L., Sohl-Dickstein, J., and Bengio, S. (2016). Density estimation using Real NVP. arXiv preprint arXiv:1605.08803.
 Dobson and Doig (2003) Dobson, P. D. and Doig, A. J. (2003). Distinguishing enzyme structures from nonenzymes without alignments. Journal of molecular biology, 330(4):771–783.
 Duvenaud et al. (2015) Duvenaud, D. K., Maclaurin, D., Iparraguirre, J., Bombarell, R., Hirzel, T., AspuruGuzik, A., and Adams, R. P. (2015). Convolutional networks on graphs for learning molecular fingerprints. In Advances in neural information processing systems, pages 2224–2232.
 Erdős and Rényi (1960) Erdős, P. and Rényi, A. (1960). On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci, 5(1):17–60.
 Fout et al. (2017) Fout, A., Byrd, J., Shariat, B., and Ben-Hur, A. (2017). Protein interface prediction using graph convolutional networks. In Advances in neural information processing systems, pages 6530–6539.
 Golomb (1996) Golomb, S. W. (1996). Polyominoes: puzzles, patterns, problems, and packings, volume 16. Princeton University Press.
 Gómez-Bombarelli et al. (2018) Gómez-Bombarelli, R., Wei, J. N., Duvenaud, D., Hernández-Lobato, J. M., Sánchez-Lengeling, B., Sheberla, D., Aguilera-Iparraguirre, J., Hirzel, T. D., Adams, R. P., and Aspuru-Guzik, A. (2018). Automatic chemical design using a data-driven continuous representation of molecules. ACS central science, 4(2):268–276.
 Gong and Cheng (2019) Gong, L. and Cheng, Q. (2019). Exploiting edge features for graph neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 9211–9219.
 Gori et al. (2005) Gori, M., Monfardini, G., and Scarselli, F. (2005). A new model for learning in graph domains. In Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005., volume 2, pages 729–734. IEEE.
 Gretton et al. (2012) Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., and Smola, A. (2012). A kernel two-sample test. Journal of Machine Learning Research, 13(Mar):723–773.
 Grover et al. (2018) Grover, A., Zweig, A., and Ermon, S. (2018). Graphite: Iterative generative modeling of graphs. arXiv preprint arXiv:1803.10459.
 Hamaguchi et al. (2017) Hamaguchi, T., Oiwa, H., Shimbo, M., and Matsumoto, Y. (2017). Knowledge transfer for outofknowledgebase entities: A graph neural network approach. arXiv preprint arXiv:1706.05674.
 Hamilton et al. (2017) Hamilton, W., Ying, Z., and Leskovec, J. (2017). Inductive representation learning on large graphs. In Advances in neural information processing systems, pages 1024–1034.
 Jin et al. (2018) Jin, W., Barzilay, R., and Jaakkola, T. (2018). Junction tree variational autoencoder for molecular graph generation. In International Conference on Machine Learning, pages 2328–2337.
 Keriven and Peyré (2019) Keriven, N. and Peyré, G. (2019). Universal invariant and equivariant graph neural networks. In Advances in Neural Information Processing Systems, pages 7090–7099.
 Kingma and Ba (2014) Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
 Kingma and Welling (2013) Kingma, D. P. and Welling, M. (2013). Autoencoding variational bayes. arXiv preprint arXiv:1312.6114.
 Kipf et al. (2018) Kipf, T., Fetaya, E., Wang, K.C., Welling, M., and Zemel, R. (2018). Neural relational inference for interacting systems. In International Conference on Machine Learning, pages 2693–2702.
 Kipf and Welling (2016) Kipf, T. N. and Welling, M. (2016). Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.
 Leskovec et al. (2010) Leskovec, J., Chakrabarti, D., Kleinberg, J., Faloutsos, C., and Ghahramani, Z. (2010). Kronecker graphs: An approach to modeling networks. Journal of Machine Learning Research, 11(Feb):985–1042.
 Li et al. (2018a) Li, Y., Vinyals, O., Dyer, C., Pascanu, R., and Battaglia, P. (2018a). Learning deep generative models of graphs. arXiv preprint arXiv:1803.03324.
 Li et al. (2018b) Li, Y., Zhang, L., and Liu, Z. (2018b). Multi-objective de novo drug design with conditional graph generative model. Journal of cheminformatics, 10(1):33.
 Liao et al. (2019) Liao, R., Zhao, Z., Urtasun, R., and Zemel, R. S. (2019). Lanczosnet: Multiscale deep graph convolutional networks. arXiv preprint arXiv:1901.01484.
 Liu et al. (2019) Liu, J., Kumar, A., Ba, J., Kiros, J., and Swersky, K. (2019). Graph normalizing flows.
 Madhawa et al. (2019) Madhawa, K., Ishiguro, K., Nakago, K., and Abe, M. (2019). Graphnvp: An invertible flow model for generating molecular graphs. arXiv preprint arXiv:1905.11600.
 Paszke et al. (2019) Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al. (2019). Pytorch: An imperative style, highperformance deep learning library. In Advances in Neural Information Processing Systems, pages 8024–8035.
 Scarselli et al. (2008) Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M., and Monfardini, G. (2008). The graph neural network model. IEEE Transactions on Neural Networks, 20(1):61–80.
 Sen et al. (2008) Sen, P., Namata, G., Bilgic, M., Getoor, L., Galligher, B., and EliassiRad, T. (2008). Collective classification in network data. AI magazine, 29(3):93–93.
 Simonovsky and Komodakis (2018) Simonovsky, M. and Komodakis, N. (2018). Graphvae: Towards generation of small graphs using variational autoencoders. arXiv preprint arXiv:1802.03480.
 Song and Ermon (2019) Song, Y. and Ermon, S. (2019). Generative modeling by estimating gradients of the data distribution. arXiv preprint arXiv:1907.05600.
 Veličković et al. (2017) Veličković, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., and Bengio, Y. (2017). Graph attention networks. arXiv preprint arXiv:1710.10903.
 Watts and Strogatz (1998) Watts, D. J. and Strogatz, S. H. (1998). Collective dynamics of 'small-world' networks. Nature, 393(6684):440.
 Xie et al. (2019) Xie, S., Kirillov, A., Girshick, R., and He, K. (2019). Exploring randomly wired neural networks for image recognition. arXiv preprint arXiv:1904.01569.
 Xu et al. (2018a) Xu, K., Hu, W., Leskovec, J., and Jegelka, S. (2018a). How powerful are graph neural networks? arXiv preprint arXiv:1810.00826.
 Xu et al. (2018b) Xu, K., Li, C., Tian, Y., Sonobe, T., Kawarabayashi, K.-i., and Jegelka, S. (2018b). Representation learning on graphs with jumping knowledge networks. In International Conference on Machine Learning, pages 5449–5458.
 Ying et al. (2018) Ying, Z., You, J., Morris, C., Ren, X., Hamilton, W., and Leskovec, J. (2018). Hierarchical graph representation learning with differentiable pooling. In Advances in neural information processing systems, pages 4800–4810.
 You et al. (2018a) You, J., Liu, B., Ying, Z., Pande, V., and Leskovec, J. (2018a). Graph convolutional policy network for goaldirected molecular graph generation. In Advances in neural information processing systems, pages 6410–6421.
 You et al. (2018b) You, J., Ying, R., Ren, X., Hamilton, W., and Leskovec, J. (2018b). Graphrnn: Generating realistic graphs with deep autoregressive models. In ICML, pages 5694–5703.
References
Albert, R. and Barabási, A.-L. (2002). Statistical mechanics of complex networks. Reviews of Modern Physics, 74(1):47.
Batagelj, V. and Zaversnik, M. (2003). An O(m) algorithm for cores decomposition of networks. arXiv preprint cs/0310049.
Dinh, L., Sohl-Dickstein, J., and Bengio, S. (2016). Density estimation using Real NVP. arXiv preprint arXiv:1605.08803.
Dobson, P. D. and Doig, A. J. (2003). Distinguishing enzyme structures from non-enzymes without alignments. Journal of Molecular Biology, 330(4):771–783.
Duvenaud, D. K., Maclaurin, D., Iparraguirre, J., Bombarell, R., Hirzel, T., Aspuru-Guzik, A., and Adams, R. P. (2015). Convolutional networks on graphs for learning molecular fingerprints. In Advances in Neural Information Processing Systems, pages 2224–2232.
Erdos, P. and Rényi, A. (1960). On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci, 5(1):17–60.
Fout, A., Byrd, J., Shariat, B., and Ben-Hur, A. (2017). Protein interface prediction using graph convolutional networks. In Advances in Neural Information Processing Systems, pages 6530–6539.
Golomb, S. W. (1996). Polyominoes: Puzzles, Patterns, Problems, and Packings, volume 16. Princeton University Press.
Gómez-Bombarelli, R., Wei, J. N., Duvenaud, D., Hernández-Lobato, J. M., Sánchez-Lengeling, B., Sheberla, D., Aguilera-Iparraguirre, J., Hirzel, T. D., Adams, R. P., and Aspuru-Guzik, A. (2018). Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Science, 4(2):268–276.
Gong, L. and Cheng, Q. (2019). Exploiting edge features for graph neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 9211–9219.
Gori, M., Monfardini, G., and Scarselli, F. (2005). A new model for learning in graph domains. In Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, volume 2, pages 729–734. IEEE.
Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., and Smola, A. (2012). A kernel two-sample test. Journal of Machine Learning Research, 13(Mar):723–773.
Grover, A., Zweig, A., and Ermon, S. (2018). Graphite: Iterative generative modeling of graphs. arXiv preprint arXiv:1803.10459.
Hamaguchi, T., Oiwa, H., Shimbo, M., and Matsumoto, Y. (2017). Knowledge transfer for out-of-knowledge-base entities: A graph neural network approach. arXiv preprint arXiv:1706.05674.
Hamilton, W., Ying, Z., and Leskovec, J. (2017). Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pages 1024–1034.
Jin, W., Barzilay, R., and Jaakkola, T. (2018). Junction tree variational autoencoder for molecular graph generation. In International Conference on Machine Learning, pages 2328–2337.
Keriven, N. and Peyré, G. (2019). Universal invariant and equivariant graph neural networks. In Advances in Neural Information Processing Systems, pages 7090–7099.
Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
Kipf, T., Fetaya, E., Wang, K.-C., Welling, M., and Zemel, R. (2018). Neural relational inference for interacting systems. In International Conference on Machine Learning, pages 2693–2702.
Kipf, T. N. and Welling, M. (2016). Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.
Leskovec, J., Chakrabarti, D., Kleinberg, J., Faloutsos, C., and Ghahramani, Z. (2010). Kronecker graphs: An approach to modeling networks. Journal of Machine Learning Research, 11(Feb):985–1042.
Li, Y., Vinyals, O., Dyer, C., Pascanu, R., and Battaglia, P. (2018a). Learning deep generative models of graphs. arXiv preprint arXiv:1803.03324.
Li, Y., Zhang, L., and Liu, Z. (2018b). Multi-objective de novo drug design with conditional graph generative model. Journal of Cheminformatics, 10(1):33.
Liao, R., Zhao, Z., Urtasun, R., and Zemel, R. S. (2019). LanczosNet: Multi-scale deep graph convolutional networks. arXiv preprint arXiv:1901.01484.
Liu, J., Kumar, A., Ba, J., Kiros, J., and Swersky, K. (2019). Graph normalizing flows.
Madhawa, K., Ishiguro, K., Nakago, K., and Abe, M. (2019). GraphNVP: An invertible flow model for generating molecular graphs. arXiv preprint arXiv:1905.11600.
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al. (2019). PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, pages 8024–8035.
Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M., and Monfardini, G. (2008). The graph neural network model. IEEE Transactions on Neural Networks, 20(1):61–80.
Sen, P., Namata, G., Bilgic, M., Getoor, L., Galligher, B., and Eliassi-Rad, T. (2008). Collective classification in network data. AI Magazine, 29(3):93–93.
Simonovsky, M. and Komodakis, N. (2018). GraphVAE: Towards generation of small graphs using variational autoencoders. arXiv preprint arXiv:1802.03480.
Song, Y. and Ermon, S. (2019). Generative modeling by estimating gradients of the data distribution. arXiv preprint arXiv:1907.05600.
Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., and Bengio, Y. (2017). Graph attention networks. arXiv preprint arXiv:1710.10903.
Watts, D. J. and Strogatz, S. H. (1998). Collective dynamics of ‘small-world’ networks. Nature, 393(6684):440.
Xie, S., Kirillov, A., Girshick, R., and He, K. (2019). Exploring randomly wired neural networks for image recognition. arXiv preprint arXiv:1904.01569.
Xu, K., Hu, W., Leskovec, J., and Jegelka, S. (2018a). How powerful are graph neural networks? arXiv preprint arXiv:1810.00826.
Xu, K., Li, C., Tian, Y., Sonobe, T., Kawarabayashi, K.-i., and Jegelka, S. (2018b). Representation learning on graphs with jumping knowledge networks. In International Conference on Machine Learning, pages 5449–5458.
Ying, Z., You, J., Morris, C., Ren, X., Hamilton, W., and Leskovec, J. (2018). Hierarchical graph representation learning with differentiable pooling. In Advances in Neural Information Processing Systems, pages 4800–4810.
You, J., Liu, B., Ying, Z., Pande, V., and Leskovec, J. (2018a). Graph convolutional policy network for goal-directed molecular graph generation. In Advances in Neural Information Processing Systems, pages 6410–6421.
You, J., Ying, R., Ren, X., Hamilton, W., and Leskovec, J. (2018b). GraphRNN: Generating realistic graphs with deep autoregressive models. In ICML, pages 5694–5703.
Appendix A Experimental Details
We implement our model using PyTorch (Paszke et al., 2019). The optimization algorithm is Adam (Kingma and Ba, 2014). Our code is available at https://github.com/ermongroup/GraphScoreMatching.
A.1 Hyperparameters
For the noise levels, we chose settings that we found empirically to work well across all the generation experiments. Note that since all edge weights in the training data (i.e., in (2)) are either 0 or 1, the smallest noise level is small enough for the quantization operation (4) to perfectly recover the perturbed graph with high probability.
In the sampling process, we used a fixed number of sampling steps for each noise level. Apart from the coefficient in the step size of Langevin dynamics, we added another scaling coefficient, as is common practice when applying Langevin dynamics. We chose the hyperparameter values based on the MMD metrics on the validation set, which contains 32 samples from the training set.
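The sampling procedure described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the names (`annealed_langevin_sample`, `score_fn`, `steps_per_level`, `eps`) and the specific constants are our own assumptions, with the step size rescaled per noise level and a final quantization to binary edge weights.

```python
import numpy as np

def annealed_langevin_sample(score_fn, n_nodes, sigmas, steps_per_level=100,
                             eps=0.01, rng=None):
    """Annealed Langevin dynamics over decreasing noise levels `sigmas`.

    `score_fn(A, sigma)` is assumed to return the (permutation equivariant)
    score estimate for the adjacency matrix A at noise level sigma.
    """
    rng = np.random.default_rng() if rng is None else rng
    A = rng.uniform(size=(n_nodes, n_nodes))
    A = (A + A.T) / 2                            # keep the matrix symmetric
    for sigma in sigmas:                         # sigmas sorted large -> small
        alpha = eps * (sigma / sigmas[-1]) ** 2  # step size for this level
        for _ in range(steps_per_level):
            z = rng.standard_normal((n_nodes, n_nodes))
            z = (z + z.T) / np.sqrt(2)           # symmetric Gaussian noise
            A = A + alpha / 2 * score_fn(A, sigma) + np.sqrt(alpha) * z
    return (A > 0.5).astype(int)                 # quantize edges to {0, 1}
```

With a simple Gaussian-like score such as `lambda A, s: -A`, the loop produces a symmetric binary matrix of the requested size.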
For the network architecture, we used 4 message-passing steps for each GIN and stacked 5 EDP-GNN layers. The maximum number of channels across all EDP-GNN layers is 4, and the maximum size of the node features is 16.
A.2 Datasets

Community-small: Each graph consists of two equal-sized communities, each generated by the E-R model (Erdos and Rényi, 1960). For each graph, we randomly add edges between the two communities.
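A generator in this spirit can be sketched in pure Python. The intra-community edge probability `p` and the number of inter-community edges below are illustrative placeholders, not the paper's settings:

```python
import random

def community_graph(n_per_community, p=0.7, inter_edges=None, seed=None):
    """Two equal-sized Erdos-Renyi communities joined by a few random edges."""
    rng = random.Random(seed)
    n = 2 * n_per_community
    edges = set()
    # Intra-community E-R edges: each pair inside a block appears with prob p.
    for block in (range(n_per_community), range(n_per_community, n)):
        block = list(block)
        for i in range(len(block)):
            for j in range(i + 1, len(block)):
                if rng.random() < p:
                    edges.add((block[i], block[j]))
    # A handful of random inter-community edges.
    k = inter_edges if inter_edges is not None else max(1, n // 10)
    def n_inter():
        return sum(1 for u, v in edges if (u < n_per_community) != (v < n_per_community))
    while n_inter() < k:
        u = rng.randrange(n_per_community)
        v = rng.randrange(n_per_community, n)
        edges.add((u, v))
    return n, sorted(edges)
```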

Ego-small: One-hop ego graphs extracted from the Citeseer network (Sen et al., 2008).
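One-hop ego-graph extraction can be sketched as follows; `one_hop_ego_graph` is a generic helper of ours operating on an adjacency dict, not the paper's Citeseer preprocessing code:

```python
def one_hop_ego_graph(adj, center):
    """Induced subgraph on a node and its one-hop neighborhood.

    `adj` maps each node to the set of its neighbors; returns the node set
    and the induced edge set (each undirected edge stored once as (u, v), u < v).
    """
    nodes = {center} | set(adj[center])
    edges = {(u, v) for u in nodes for v in adj[u] if v in nodes and u < v}
    return nodes, edges
```

For example, with `adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}`, the ego graph of node 0 keeps nodes {0, 1, 2} and all three edges among them.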
Appendix B Properties of Permutation Invariant Functions
B.1 Permutation
Definition 1.
(Permutation Operation on Matrix) Let $A \in \mathbb{R}^{N \times N}$. Denote the set of permutations of $\{1, \dots, N\}$ as $S_N$. The node permutation operation of $\pi \in S_N$ on a matrix $A$ is defined by $\pi(A)_{i,j} = A_{\pi(i),\pi(j)}$.
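In code, this operation is simultaneous row and column reindexing; a minimal NumPy sketch (the helper name `permute` is ours):

```python
import numpy as np

def permute(A, pi):
    """Node permutation on an adjacency matrix: permute(A, pi)[i, j] = A[pi[i], pi[j]]."""
    pi = np.asarray(pi)
    return A[np.ix_(pi, pi)]
```

Equivalently, with the permutation matrix P whose i-th row is the standard basis vector indexed by pi[i], the operation is P A Pᵀ.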
B.2 Permutation Invariance
Definition 2.
(Permutation Invariant Function) A function $f$ with $\mathbb{R}^{N \times N}$ as its domain is permutation invariant iff $f(\pi(A)) = f(A)$ for all $\pi \in S_N$ and all $A \in \mathbb{R}^{N \times N}$.
B.3 Permutation Equivariance
Definition 3.
(Permutation Equivariant Function) A function $f: \mathbb{R}^{N \times N} \to \mathbb{R}^{N \times N}$ is permutation equivariant iff $f(\pi(A)) = \pi(f(A))$ for all $\pi \in S_N$ and all $A \in \mathbb{R}^{N \times N}$.
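Both definitions can be checked numerically. In the sketch below, the entrywise sum serves as an example of a permutation invariant function and the matrix square (A mapped to A @ A) as an example of a permutation equivariant one:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((5, 5))
pi = rng.permutation(5)
P = np.eye(5)[pi]

def perm(M):
    """Apply the node permutation pi to M via its permutation matrix."""
    return P @ M @ P.T

# g(A) = sum of all entries is permutation invariant: g(pi(A)) = g(A).
assert np.isclose(perm(A).sum(), A.sum())

# f(A) = A @ A is permutation equivariant: f(pi(A)) = pi(f(A)).
assert np.allclose(perm(A) @ perm(A), perm(A @ A))
```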
B.4 Relationship between Permutation Invariance and Permutation Equivariance
Definition 4.
(Implicitly Defined Scalar Function) A function $s: \mathbb{R}^{N \times N} \to \mathbb{R}^{N \times N}$ defines a gradient vector field on $\mathbb{R}^{N \times N}$. View $s$ as the gradient of a scalar-valued function $f$. Define $f(A) = \int_{\gamma} \langle s(X), \mathrm{d}X \rangle_F + C$, where $A_0 \in \mathbb{R}^{N \times N}$ is a fixed base point, $\gamma$ is any curve from $A_0$ to $A$, and $C$ is a constant; since $s$ is a gradient field, the integral does not depend on the choice of curve.
Under this definition, a vector-valued function $s$ implicitly defines a scalar function $f$.
Lemma 1.
(Permutation Invariance of the Frobenius Inner Product) For any $A, B \in \mathbb{R}^{N \times N}$, the Frobenius inner product of $A$ and $B$ is $\langle A, B \rangle_F = \sum_{i,j} A_{i,j} B_{i,j}$. The Frobenius inner product is permutation invariant, i.e., $\langle \pi(A), \pi(B) \rangle_F = \langle A, B \rangle_F$ for all $\pi \in S_N$.
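The lemma is easy to verify numerically, representing the permutation by a permutation matrix P and computing the permuted matrices as P A Pᵀ and P B Pᵀ:

```python
import numpy as np

def frobenius(A, B):
    """Frobenius inner product <A, B>_F = trace(A^T B) = sum_ij A_ij * B_ij."""
    return np.sum(A * B)

rng = np.random.default_rng(1)
A, B = rng.random((4, 4)), rng.random((4, 4))
pi = rng.permutation(4)
P = np.eye(4)[pi]

# Permuting both arguments by the same pi leaves the inner product unchanged.
assert np.isclose(frobenius(P @ A @ P.T, P @ B @ P.T), frobenius(A, B))
```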
B.5 Proof of Theorem 1
Proof.
Choose the base point $A_0$ in Definition 4 so that $\pi(A_0) = A_0$ for every $\pi \in S_N$ (e.g., $A_0 = 0$). Fix $\pi \in S_N$ and let $\gamma: [0, 1] \to \mathbb{R}^{N \times N}$ be a curve from $A_0$ to $A$. Since the permutation operation is linear, $\pi \circ \gamma$ is a curve from $A_0$ to $\pi(A)$ with $\frac{\mathrm{d}}{\mathrm{d}t} \pi(\gamma(t)) = \pi(\gamma'(t))$. Therefore
$f(\pi(A)) = \int_0^1 \langle s(\pi(\gamma(t))), \pi(\gamma'(t)) \rangle_F \, \mathrm{d}t + C = \int_0^1 \langle \pi(s(\gamma(t))), \pi(\gamma'(t)) \rangle_F \, \mathrm{d}t = \int_0^1 \langle s(\gamma(t)), \gamma'(t) \rangle_F \, \mathrm{d}t + C = f(A),$
where the second equality uses the permutation equivariance of $s$ and the third uses Lemma 1. Hence the implicitly defined scalar function $f$ is permutation invariant. ∎
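The relationship can be illustrated numerically under the assumption that the score field is s(A) = A, the gradient of ‖A‖²_F / 2, which is trivially permutation equivariant: the line integral of Definition 4 from the zero matrix yields the same value at A and at π(A). The helper name `f_via_line_integral` and the midpoint-rule discretization are our own:

```python
import numpy as np

def f_via_line_integral(score, A, n_steps=1000):
    """f(A) = integral over t in [0,1] of <score(t*A), A>_F dt: the line
    integral of the score field along the straight path from 0 to A."""
    ts = (np.arange(n_steps) + 0.5) / n_steps      # midpoint rule
    return sum(np.sum(score(t * A) * A) for t in ts) / n_steps

def score(X):
    """Permutation equivariant field: gradient of ||X||_F^2 / 2."""
    return X

rng = np.random.default_rng(2)
A = rng.random((4, 4))
pi = rng.permutation(4)
P = np.eye(4)[pi]

# The implicitly defined f agrees on A and pi(A): permutation invariance.
assert np.isclose(f_via_line_integral(score, A),
                  f_via_line_integral(score, P @ A @ P.T))
```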