Permutation Invariant Graph Generation via Score-Based Generative Modeling

Chenhao Niu et al. · March 2, 2020

Learning generative models for graph-structured data is challenging because graphs are discrete, combinatorial, and the underlying data distribution is invariant to the ordering of nodes. However, most of the existing generative models for graphs are not invariant to the chosen ordering, which might lead to an undesirable bias in the learned distribution. To address this difficulty, we propose a permutation invariant approach to modeling graphs, using the recent framework of score-based generative modeling. In particular, we design a permutation equivariant, multi-channel graph neural network to model the gradient of the data distribution at the input graph (a.k.a., the score function). This permutation equivariant model of gradients implicitly defines a permutation invariant distribution for graphs. We train this graph neural network with score matching and sample from it with annealed Langevin dynamics. In our experiments, we first demonstrate the capacity of this new architecture in learning discrete graph algorithms. For graph generation, we find that our learning approach achieves better or comparable results to existing models on benchmark datasets.


1 Introduction

Graphs are used to capture relational structure in many domains, including knowledge bases (Hamaguchi et al., 2017), social networks (Hamilton et al., 2017; Kipf and Welling, 2016), protein interaction networks (Fout et al., 2017), and physical systems (Batagelj and Zaversnik, 2003). Generating graphs using suitable probabilistic models has many applications, such as drug design (Duvenaud et al., 2015; Gómez-Bombarelli et al., 2018; Li et al., 2018a), creating computation graphs for architecture search (Xie et al., 2019), as well as research in network science (Watts and Strogatz, 1998; Albert and Barabási, 2002; Leskovec et al., 2010).

While many stochastic models of graphs have been proposed, the idea of learning statistical generative models of graphs from data has recently gained significant attention. One approach is to use latent variable generative models similar to variational autoencoders (Kingma and Welling, 2013). Examples include GraphVAE (Simonovsky and Komodakis, 2018), Graphite (Grover et al., 2018), and junction tree variational autoencoders (Jin et al., 2018). These models typically use a graph neural network (GNN) (Gori et al., 2005; Scarselli et al., 2008) to encode graph data into a latent space, and generate samples by decoding latent variables sampled from a prior distribution. The second paradigm is autoregressive graph generative models (Li et al., 2018a; You et al., 2018a; Liao et al., 2019), where graphs are generated sequentially, one node (or one subgraph) at a time.

Although these models have achieved great success, they are not satisfying in terms of capturing the permutation invariance properties of graphs. Permutation invariance is a fundamental inductive bias of graph-structured data. For a graph with $n$ nodes, there are up to $n!$ different adjacency matrices that are equivalent representations of the same graph. Therefore, a graph generative model should ideally assign the same probability to each of these equivalent adjacency matrices. It is challenging, however, to enforce permutation invariance in variational autoencoders or autoregressive models. Some previous approaches only approximately induce permutation invariance: GraphVAE (Simonovsky and Komodakis, 2018) uses inexact graph matching techniques requiring up to $O(n^4)$ operations, whereas the model in Li et al. (2018a) augments the training data by randomly permuting the nodes of existing data. Other approaches instead focus on selecting a specific node ordering based on heuristics: GraphRNN (You et al., 2018b) uses random breadth-first search (BFS) to determine an ordering, and GRAN (Liao et al., 2019) adaptively chooses an ordering depending on the input graph from a family of pre-defined node orderings.

To better capture the permutation invariance of graphs, we propose a new graph generative model using the framework of score-based generative modeling (Song and Ermon, 2019). Intuitively, this approach trains a model to capture the vector field of gradients of the log data density of graphs (a.k.a., scores). Contrary to likelihood-based models such as variational auto-encoders and autoregressive models, score-based generative modeling imposes fewer constraints on the model architecture (e.g., a score does not have to be normalized). This enables the use of function families with desirable inductive biases, such as permutation invariance. In particular, we leverage graph neural networks (Scarselli et al., 2008) to build a permutation equivariant model for the scores of the distribution over graphs we wish to learn. As shown later in the paper, this implicitly defines a permutation invariant distribution over adjacency matrices representing graphs.

As in other classes of deep generative models, the neural architecture used in score-based generative modeling is critical to its success. In this work, we introduce a new type of graph neural network, named EDP-GNN (Edgewise Dense Prediction Graph Neural Network), with learnable multi-channel adjacency matrices. In our experiments, we first test the effectiveness of EDP-GNN for the task of learning graph algorithms, where it significantly outperforms traditional GNNs. Next, we evaluate the generation quality of our score-based models using MMD (Gretton et al., 2012) metrics on several graph datasets, where we achieve performance comparable to GraphRNN (You et al., 2018b), a competitive method for generative modeling of graphs.

2 Preliminaries

2.1 Notations

For each weighted undirected graph, we can choose an ordering of nodes $\pi$ and represent the graph with an adjacency matrix $A^{\pi}$. Here we use the superscript $\pi$ to indicate that the rows/columns of $A^{\pi}$ are arranged in accordance with the specific node ordering $\pi$. Because the graph is undirected, the corresponding adjacency matrix is symmetric. We denote the set of all such adjacency matrices as $\mathcal{A}$.

A distribution of graphs can be represented as a distribution of adjacency matrices $p(A^{\pi})$. Since graphs are invariant to permutations, $A^{\pi_1}$ and $A^{\pi_2}$ always represent the same graph for any two node orderings $\pi_1$ and $\pi_2$. This permutation invariance also implies that $p(A^{\pi_1}) = p(A^{\pi_2})$, i.e., the distribution of adjacency matrices is invariant to node permutations. In the sequel, we often omit the superscript $\pi$ in $A^{\pi}$ when not emphasizing any specific node ordering.

2.2 Graph Neural Network (GNN)

Graph neural networks are a family of neural networks that map graphs to vector representations using message-passing type operations on node features (Gori et al., 2005; Scarselli et al., 2008). They are natural models for graph-structured data; for example, GIN (Xu et al., 2018a) is one type of GNN that is provably as expressive as the Weisfeiler-Lehman graph isomorphism test (WL-test). The message passing mechanism guarantees that the node representations computed from an input adjacency matrix $A^{\pi}$ are equivariant to permutations of the node ordering $\pi$.

2.3 Score-Based Generative Modeling

Score-based generative modeling (Song and Ermon, 2019) is a class of generative models. For a probability density function $p(x)$, the score function is defined as $\nabla_x \log p(x)$. Instead of directly modeling the density function of the data distribution $p_{\text{data}}(x)$, score-based generative modeling estimates the data score function $\nabla_x \log p_{\text{data}}(x)$. The advantage is that the score function can be easier to model than the density function, since it does not need to be normalized.

For better score estimation, following Song and Ermon (2019) we perturb the data with Gaussian noise of different intensities, and estimate the scores jointly for all noise levels. We train a noise conditional model $s_\theta(\tilde{x}, \sigma)$ (e.g., a neural network parameterized by $\theta$) to approximate the score function corresponding to noise level $\sigma$. Given a data distribution $p_{\text{data}}(x)$, a noise distribution $q_\sigma(\tilde{x} \mid x)$ (e.g., $\mathcal{N}(\tilde{x} \mid x, \sigma^2 I)$), and a sequence of noise levels $\{\sigma_i\}_{i=1}^{L}$, the training loss is defined as

$$\mathcal{L}\big(\theta; \{\sigma_i\}_{i=1}^{L}\big) = \frac{1}{2L} \sum_{i=1}^{L} \sigma_i^{2}\, \mathbb{E}\left[ \left\| s_\theta(\tilde{x}, \sigma_i) - \nabla_{\tilde{x}} \log q_{\sigma_i}(\tilde{x} \mid x) \right\|_2^{2} \right], \tag{1}$$

where the expectation is taken with respect to the sampling process $x \sim p_{\text{data}}(x)$, $\tilde{x} \sim q_{\sigma_i}(\tilde{x} \mid x)$. We note that all expectations in (1) can be estimated with i.i.d. samples from $p_{\text{data}}(x)$ and $q_{\sigma_i}(\tilde{x} \mid x)$, which are easy to obtain. The objective is to minimize $\mathcal{L}\big(\theta; \{\sigma_i\}_{i=1}^{L}\big)$ over $\theta$.

After the conditional score model has been trained, we use annealed Langevin dynamics (Song and Ermon, 2019) for sample generation (see Algorithm 1).

1: Require: $\{\sigma_i\}_{i=1}^{L}$, $\epsilon$, $T$. ▷ $\epsilon$ is the smallest step size; $T$ is the number of iterations for each noise level.
2: Initialize $x_0$
3: for $i \leftarrow 1$ to $L$ do
4:     $\alpha_i \leftarrow \epsilon \cdot \sigma_i^2 / \sigma_L^2$ ▷ $\alpha_i$ is the step size.
5:     for $t \leftarrow 1$ to $T$ do
6:         Draw $z_t \sim \mathcal{N}(0, I)$
7:         $x_t \leftarrow x_{t-1} + \frac{\alpha_i}{2}\, s_\theta(x_{t-1}, \sigma_i) + \sqrt{\alpha_i}\, z_t$
8:     end for
9:     $x_0 \leftarrow x_T$
10: end for
11: return $x_T$
Algorithm 1 Annealed Langevin dynamics sampling.
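For illustration, the following is a minimal PyTorch sketch of Algorithm 1; the function name, the argument names, and the default values of eps and T are placeholders of ours rather than the settings used in our experiments.

import torch

def annealed_langevin_dynamics(score_fn, x0, sigmas, eps=2e-5, T=100):
    """Sample with annealed Langevin dynamics (Algorithm 1).

    score_fn(x, sigma) should approximate the score of the data
    distribution perturbed with Gaussian noise of standard deviation sigma.
    """
    x = x0.clone()
    for sigma in sigmas:                            # sigmas sorted from largest to smallest
        alpha = eps * (sigma / sigmas[-1]) ** 2     # step size for this noise level
        for _ in range(T):
            z = torch.randn_like(x)                 # standard Gaussian noise
            x = x + 0.5 * alpha * score_fn(x, sigma) + alpha ** 0.5 * z
    return x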

3 Score-Based Generative Modeling for Graphs

Contrary to the weighted graphs we used to define the probability density function in Section 2.1, unweighted graphs are much more common in real-world problems, which means entries in the adjacency matrix can only be either 0 or 1. While the score-based method (Song and Ermon, 2019) was initially proposed for continuous data, it can be adapted to generate discrete data as well. Below, we first show our modifications of score-based generative modeling for graph generation, and then introduce our specialized neural network architecture EDP-GNN for the noise conditional score model $s_\theta(\tilde{A}, \sigma_i)$, where $i = 1, \dots, L$.

3.1 Noise Distribution

We add Gaussian perturbations to adjacency matrices and define the noise distribution as follows:

$$q_\sigma(\tilde{A} \mid A) = \prod_{1 \le i < j \le N} \mathcal{N}\big(\tilde{A}_{ij} \mid A_{ij}, \sigma^2\big). \tag{2}$$

Intuitively, we only add Gaussian noise to the upper triangular part of the adjacency matrix, because we focus on undirected graphs whose adjacency matrices are symmetric.

Since $\nabla_{\tilde{A}_{ij}} \log q_\sigma(\tilde{A} \mid A) = -\big(\tilde{A}_{ij} - A_{ij}\big)/\sigma^2$, the training loss of $s_\theta(\tilde{A}, \sigma)$ is

$$\mathcal{L}\big(\theta; \{\sigma_i\}_{i=1}^{L}\big) = \frac{1}{2L} \sum_{i=1}^{L} \sigma_i^{2}\, \mathbb{E}\left[ \left\| s_\theta(\tilde{A}, \sigma_i) + \frac{\tilde{A} - A}{\sigma_i^{2}} \right\|_2^{2} \right], \tag{3}$$

where the expectation is over the sampling process defined via $A \sim p_{\text{data}}(A)$ and $\tilde{A} \sim q_{\sigma_i}(\tilde{A} \mid A)$. The objective is to minimize $\mathcal{L}\big(\theta; \{\sigma_i\}_{i=1}^{L}\big)$ over $\theta$.

Note that the supports of the noise distributions $\{q_{\sigma_i}(\tilde{A} \mid A)\}_{i=1}^{L}$ span the whole space of symmetric matrices (i.e., $\mathbb{R}^{N(N-1)/2}$ for the free upper-triangular entries), where $N$ is the number of nodes of the input graph. Therefore, the scores of the perturbed distributions corresponding to all noise levels are well-defined, regardless of whether the training samples are discrete or not.
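As an illustration, here is a minimal PyTorch sketch of the perturbation in (2) and the denoising score matching loss in (3); the score network score_net, the batching convention, and the function names are assumptions of ours and do not reproduce the released implementation.

import torch

def perturb(A, sigma):
    """Add symmetric Gaussian noise to the upper-triangular part of A (Eq. 2)."""
    noise = torch.randn_like(A) * sigma
    upper = torch.triu(noise, diagonal=1)           # noise only on entries with i < j
    return A + upper + upper.transpose(-1, -2)      # keep the matrix symmetric

def dsm_loss(score_net, A, sigmas):
    """Denoising score matching loss averaged over noise levels (Eq. 3)."""
    loss = 0.0
    for sigma in sigmas:
        A_tilde = perturb(A, sigma)
        target = -(A_tilde - A) / sigma ** 2        # score of q_sigma(A_tilde | A)
        pred = score_net(A_tilde, sigma)
        # For simplicity, the squared error here is taken over all matrix entries.
        loss = loss + 0.5 * sigma ** 2 * ((pred - target) ** 2).sum(dim=(-1, -2)).mean()
    return loss / len(sigmas)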

3.2 Sampling

To generate a graph, we first sample the number of nodes $N$, and then sample an adjacency matrix $\tilde{A}$ with annealed Langevin dynamics. This amounts to factorizing $p(\tilde{A}, N) = p(N)\, p(\tilde{A} \mid N)$. Implementation-wise, we sample $N$ from the empirical distribution of the number of nodes in the training dataset, as done in Li et al. (2018b). When doing annealed Langevin dynamics, we first initialize $\tilde{A}$ using folded normal distributions, i.e.,

$$\tilde{A}_{ij} = \tilde{A}_{ji} = |z_{ij}|, \qquad 1 \le i < j \le N,$$

where all $z_{ij}$ are drawn independently from a Gaussian. Then, we update $\tilde{A}$ by iteratively sampling from the series of trained conditional score models $\{s_\theta(\cdot, \sigma_i)\}_{i=1}^{L}$ using Langevin dynamics. For each conditional score model $s_\theta(\cdot, \sigma_i)$, we run Langevin dynamics for $T$ steps, where the noise levels $\sigma_1 > \sigma_2 > \dots > \sigma_L$ are annealed down over the process such that $\sigma_1$ is large but $\sigma_L$ is small enough that its effect can be ignored. As a minor modification, we change the noise term $z_t$ in Algorithm 1 to a symmetric one $\hat{z}_t$, given by

$$[\hat{z}_t]_{ij} = [\hat{z}_t]_{ji} \sim \mathcal{N}(0, 1), \qquad 1 \le i < j \le N,$$

which accounts for the symmetry of adjacency matrices.

Score-based generative modeling provides samples in a continuous space, whereas graph data are often discrete. In order to obtain discrete samples, we quantize the generated continuous adjacency matrix (denoted as $\tilde{A}$) to a binary one (denoted as $\hat{A}$) at the end of annealed Langevin dynamics. Formally, this quantization operation is defined as

$$\hat{A}_{ij} = \mathbb{1}\big[\tilde{A}_{ij} > 0.5\big], \tag{4}$$

where $\mathbb{1}[\cdot]$ is an indicator function that evaluates to 1 when the condition holds and 0 otherwise.
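The sketch below ties the pieces of this subsection together: folded-normal initialization, symmetric Langevin noise, and thresholding as in (4). The names score_net, sigmas, eps, and T are placeholders of ours, and the exact hyper-parameter values are not those used in our experiments.

import torch

def symmetric_noise(n):
    """Symmetric standard Gaussian noise with zero diagonal."""
    z = torch.triu(torch.randn(n, n), diagonal=1)
    return z + z.t()

def sample_graph(score_net, n, sigmas, eps=2e-5, T=100):
    """Sample one n-node graph with annealed Langevin dynamics, then quantize (Eq. 4)."""
    A = torch.abs(symmetric_noise(n))                  # folded-normal initialization
    for sigma in sigmas:                               # annealed from large to small
        alpha = eps * (sigma / sigmas[-1]) ** 2
        for _ in range(T):
            z = symmetric_noise(n)
            A = A + 0.5 * alpha * score_net(A, sigma) + alpha ** 0.5 * z
    return (A > 0.5).float()                           # binary adjacency matrix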

3.3 Permutation Equivariance and Invariance

Permutation invariance is a desirable property of graph generative models, since the true distribution is inherently permutation invariant. We show that by using a permutation equivariant score model $s_\theta(A)$, the corresponding implicitly defined distribution is permutation invariant.

Theorem 1.

If $s: \mathbb{R}^{N \times N} \to \mathbb{R}^{N \times N}$ is a permutation equivariant function, then the scalar function $f(A) = \int_{\gamma[A_0, A]} \langle s(B), \mathrm{d}B \rangle + C$ is permutation invariant, where $\langle \cdot, \cdot \rangle$ is the Frobenius inner product, $\gamma[A_0, A]$ is any curve from $A_0$ to $A$, and $C$ is a constant.

Proof.

See Appendix B. ∎

Since our model of the score (the gradient of the log-likelihood) is permutation equivariant, the implicitly defined log-likelihood function, given by the line integral of the score as in Theorem 1, is permutation invariant.
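As a quick numerical illustration of Theorem 1 (not part of the original derivation), the toy check below integrates a hand-crafted permutation equivariant vector field along a straight line and verifies that the resulting scalar is unchanged under a random node permutation; the field s and all names are ours.

import torch

def s(A):
    """A permutation equivariant vector field: the gradient of f(A) = sum(A**3) / 3."""
    return A * A

def line_integral(A, steps=1000):
    """Numerically integrate <s(B), dB> along the straight line from 0 to A."""
    ts = torch.linspace(0.0, 1.0, steps)
    return sum((s(t * A) * A).sum() for t in ts) / steps

torch.manual_seed(0)
n = 6
A = torch.rand(n, n)
A = (A + A.t()) / 2                       # a symmetric "adjacency" matrix
perm = torch.randperm(n)
A_perm = A[perm][:, perm]                 # the same graph under a node permutation

print(line_integral(A), line_integral(A_perm))   # the two values agree (up to numerics)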

3.4 Edgewise Dense Prediction Graph Neural Network (EDP-GNN)

Figure 1: This figure shows an EDP-GNN with three layers. The input is the adjacency matrix of an $N$-node graph under a fixed node ordering, and the outputs are edge representations. The dashed lines are preprocessing steps, and solid lines represent network computations.

Below, we introduce a GNN-based score network that can effectively model the scores of graph distributions while being permutation equivariant.

3.4.1 Multi-Channel GNN Layer

We introduce the Multi-Channel GNN layer, an extended version of the GIN (Xu et al., 2018a) layer, which serves as a basic component of our EDP-GNN model. The intuition is to run message passing simultaneously on many different graphs (channels), and collect the node features from all the channels via concatenation. For a $C$-channel GNN layer with $K$ message-passing steps, the $k$-th message-passing step can be expressed as follows:

$$\tilde{Z}^{(k+1)}_{i,c} = \Big(1 + \epsilon^{(k)}\Big) Z^{(k)}_{i} + \sum_{j} \mathbf{A}_{c,ij}\, Z^{(k)}_{j}, \qquad c = 1, \dots, C,$$
$$Z^{(k+1)}_{i} = \text{MLP}^{(k)}\Big(\text{CONCAT}\Big(\big\{\tilde{Z}^{(k+1)}_{i,c}\big\}_{c=1}^{C}\Big)\Big),$$

where $i$ is the index of nodes, $C$ is the number of channels, $\mathbf{A} \in \mathbb{R}^{C \times N \times N}$ is the multi-channel adjacency matrix, and $Z^{(k)}_{i}$ is the feature vector of node $i$ at step $k$. Here $\epsilon^{(k)}$ is a learnable parameter, the same as in the original GIN, $\text{CONCAT}$ stands for the concatenation operation, and $\text{MLP}^{(k)}$ transforms each node feature using a multilayer perceptron.

After $K$ steps of message passing, we use the same concatenation operation as GIN to obtain node features. Specifically, for each node $i$, the output feature is given by

$$Z^{\text{out}}_{i} = \text{CONCAT}\Big(\big\{Z^{(k)}_{i}\big\}_{k=0}^{K}\Big).$$

Henceforth, we denote our Multi-Channel GNN layer as $Z^{\text{out}} = \text{MultiChannelGNN}(\mathbf{A}, Z)$, where $\mathbf{A}$ is the multi-channel adjacency matrix and $Z$ the input node features.
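A minimal PyTorch sketch of one such multi-channel message-passing step is given below; the hidden sizes, the single-step simplification, and all variable names are assumptions of ours rather than the released implementation.

import torch
import torch.nn as nn

class MultiChannelGNNStep(nn.Module):
    """One GIN-style message-passing step run on C adjacency channels in parallel."""

    def __init__(self, in_dim, out_dim, channels):
        super().__init__()
        self.eps = nn.Parameter(torch.zeros(1))            # learnable epsilon, as in GIN
        self.mlp = nn.Sequential(
            nn.Linear(channels * in_dim, out_dim), nn.ReLU(),
            nn.Linear(out_dim, out_dim),
        )

    def forward(self, A, Z):
        # A: (C, N, N) multi-channel adjacency matrix, Z: (N, in_dim) node features
        msgs = torch.einsum('cij,jd->cid', A, Z)           # aggregate neighbors per channel
        out = (1.0 + self.eps) * Z.unsqueeze(0) + msgs     # (C, N, in_dim)
        out = out.permute(1, 0, 2).reshape(Z.shape[0], -1) # concatenate the C channels
        return self.mlp(out)                               # (N, out_dim)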

3.4.2 EDP-GNN Layer

The EDP-GNN layer is the key component of our model. It transforms the input adjacency matrix into another one, allowing us to adaptively change the message passing process. The intuition is similar to neural networks for dense image prediction tasks (e.g., semantic segmentation), where convolutional layers transform the input image into a feature map in a pixelwise manner, leveraging local information around each pixel location. Similarly, we want our GNN layer to extract edgewise features and map them to a new adjacency matrix, using the local information (defined in terms of connectivity) of each node in the graph.

One EDP-GNN layer has two steps:

  1. Node feature inference: use a MultiChannelGNN layer to encode the local structure of the different channels of the graph into node features:

    $$Z^{(l+1)} = \text{MultiChannelGNN}\big(\mathbf{A}^{(l)}, Z^{(l)}\big). \tag{5}$$

  2. Edge feature inference: update the feature vector of each edge based on the current features of the edge and the updated features of its two endpoints. For each edge $(i, j)$, this operation is given by

    $$\tilde{\mathbf{A}}^{(l+1)}_{\cdot, ij} = \text{MLP}\Big(\text{CONCAT}\big(\mathbf{A}^{(l)}_{\cdot, ij}, Z^{(l+1)}_{i}, Z^{(l+1)}_{j}\big)\Big),$$

    where $\text{MLP}$ denotes a multilayer perceptron applied to edge features. To ensure symmetry, the new multi-channel adjacency matrix is given by

    $$\mathbf{A}^{(l+1)}_{\cdot, ij} = \tilde{\mathbf{A}}^{(l+1)}_{\cdot, ij} + \tilde{\mathbf{A}}^{(l+1)}_{\cdot, ji}. \tag{6}$$
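The sketch below combines the two steps into one layer, reusing the MultiChannelGNNStep class from the earlier sketch; the dimensions, the symmetrization by summation, and all names are assumptions of ours.

import torch
import torch.nn as nn

class EDPGNNLayer(nn.Module):
    """One EDP-GNN layer: node feature inference followed by edge feature inference."""

    def __init__(self, node_dim, channels_in, channels_out):
        super().__init__()
        self.node_gnn = MultiChannelGNNStep(node_dim, node_dim, channels_in)
        self.edge_mlp = nn.Sequential(
            nn.Linear(channels_in + 2 * node_dim, channels_out), nn.ReLU(),
            nn.Linear(channels_out, channels_out),
        )

    def forward(self, A, Z):
        # A: (C_in, N, N) multi-channel adjacency matrix, Z: (N, node_dim) node features
        n = Z.shape[0]
        Z_new = self.node_gnn(A, Z)                                   # Eq. (5)
        zi = Z_new.unsqueeze(1).expand(n, n, -1)                      # endpoint i features
        zj = Z_new.unsqueeze(0).expand(n, n, -1)                      # endpoint j features
        edge_in = torch.cat([A.permute(1, 2, 0), zi, zj], dim=-1)     # per-edge inputs
        A_new = self.edge_mlp(edge_in).permute(2, 0, 1)               # (C_out, N, N)
        return A_new + A_new.transpose(-1, -2), Z_new                 # symmetrize, Eq. (6)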

3.4.3 Input and Output Layers

Input layer: input graphs need to be preprocessed before they can be fed into our EDP-GNN model. In particular, we take a two-channel adjacency matrix as input, where the first channel is the original adjacency matrix of the input graph, and the other channel is the negated version of the same adjacency matrix, with each entry flipped. The node features are initialized using the weighted degrees. Formally,

$$\mathbf{A}^{(0)} = \text{CONCAT}(A,\, 1 - A), \qquad Z^{(0)}_{i} = \sum_{j} A_{ij},$$

where $A$ is the adjacency matrix of the input graph. If we have node features $x_i$ from data, then we use the following initialization for each node $i$:

$$Z^{(0)}_{i} = \text{CONCAT}\Big(\sum_{j} A_{ij},\, x_i\Big).$$
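A small sketch of this preprocessing step, under the assumptions above (two channels A and 1 − A, weighted-degree node features); the function name is ours.

import torch

def preprocess(A, x=None):
    """Build the two-channel input adjacency matrix and the initial node features."""
    A0 = torch.stack([A, 1.0 - A], dim=0)          # channels: original and flipped adjacency
    degrees = A.sum(dim=-1, keepdim=True)          # weighted degree of each node
    Z0 = degrees if x is None else torch.cat([degrees, x], dim=-1)
    return A0, Z0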

Output layer: to produce the output, we employ an approach similar to Xu et al. (2018b), aggregating information from all previous layers into a set of permutation equivariant edge features. This effectively collects information extracted in shallower layers. Formally, for each edge $(i, j)$, the output features are given by

$$\mathbf{A}^{\text{out}}_{ij} = \text{MLP}\Big(\text{CONCAT}\Big(\big\{\mathbf{A}^{(l)}_{\cdot, ij}\big\}_{l=0}^{L_{\text{layers}}}\Big)\Big),$$

where $L_{\text{layers}}$ is the number of EDP-GNN layers.

3.4.4 Noise Level Conditioning

The framework of score-based generative modeling proposed in Song and Ermon (2019) requires a score network conditioned on a series of noise levels. We hope to provide the conditioning on noise levels with as few extra parameters as possible. To this end, we add gains and bias terms conditioned on the index of the noise level in all MLP layers, and share all other parameters across different noise levels. A conditional MLP layer for noise level $\sigma_i$ is denoted as

$$\text{MLP}_{\sigma_i}(h) = \alpha_i \odot \phi(W h + b) + \beta_i,$$

where $\alpha_i$ and $\beta_i$ are learnable parameters for each noise level and $\phi$ denotes the activation function. We empirically found that this implementation of noise conditioning achieves similar performance to separately training a score network for each noise level.
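The following sketch shows one way to realize such a conditional layer in PyTorch; the placement of the per-level gain and bias after the activation, and the class name, are assumptions of ours.

import torch
import torch.nn as nn

class NoiseConditionalLinear(nn.Module):
    """Linear layer with a per-noise-level gain and bias; the weights are shared."""

    def __init__(self, in_dim, out_dim, num_noise_levels):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.gain = nn.Parameter(torch.ones(num_noise_levels, out_dim))
        self.bias = nn.Parameter(torch.zeros(num_noise_levels, out_dim))

    def forward(self, h, noise_idx):
        h = torch.relu(self.linear(h))                           # shared transformation
        return self.gain[noise_idx] * h + self.bias[noise_idx]   # level-specific modulation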

3.4.5 Permutation Equivariance of EDP-GNN

The message passing operations in a graph neural network are guaranteed to be permutation equivariant (Keriven and Peyré, 2019), as are edgewise and nodewise operations on graphs. Since every operation in EDP-GNN is either message passing or an edgewise/nodewise transformation, the edge features produced by EDP-GNN are guaranteed to be permutation equivariant. In the last EDP-GNN layer, each edge feature is one component of the estimated score. Hence Theorem 1 applies to this score network.

4 Related Work

Flow-Based Graph Generative Models

In addition to the models mentioned in Section 1, there is an emerging class of graph generative models based on invertible mappings, such as GNF (Liu et al., 2019) and GraphNVP (Madhawa et al., 2019). These models modify the architecture of a graph neural network (GNN) using coupling layers (Dinh et al., 2016) to enable maximum likelihood learning via the change of variables formula. Since GNNs are permutation invariant, both GNF and GraphNVP could be permutation invariant in principle. However, GraphNVP opts not to be, because making the model fully permutation invariant hurts its empirical performance. In contrast, GNF is a permutation invariant model: it first uses a permutation equivariant auto-encoder to encode the graph structure into a set of node features, and then models the distribution of the node features using reversible graph neural networks.

GNNs that Learn Edge Features

Although the majority of GNNs focus on node feature learning (e.g., for node classification tasks), there are GNNs, prior to our EDP-GNN, that compute intermediate edge features as well. For example, Graph Attention Networks (Veličković et al., 2017) compute an attention coefficient for each edge during message passing (MP) steps. Gong and Cheng (2019) further explored methods to utilize edge features during the MP steps, such as using normalized attention coefficients to construct a new adjacency matrix for the next MP step, and passing messages simultaneously on multiple input adjacency matrices. However, the model in Gong and Cheng (2019) is not designed for predicting edge features, and its capability to make edgewise predictions is limited by the normalization operation and the restrictive form of attention. Kipf et al. (2018) proposed a GNN-based VAE model for relational inference in interacting systems. Contrary to their model, which predicts edge information based only on node features, our model takes a weighted graph without node features as input.

5 Experiments

Figure 2: Visualization of channels for a pre-trained EDP-GNN model on the Community-small dataset. The model is trained with a single noise level $\sigma$, and the input is a community graph perturbed with Gaussian noise of that level. The edge weights of each adjacency matrix are standardized to zero mean and unit variance. Since our model is agnostic to permutations of the nodes, we chose a specific ordering so that the adjacency matrices of community graphs possess a block diagonal form. We visualize one adjacency matrix for each layer. When a graph is less visually interpretable, we instead visualize its complementary graph and mark it with "C". By comparing the graph visualizations for the 3rd, 4th, and the input layers, we observe that the model maps a perturbed graph with no visible structure to a graph with clear "community" structure.

5.1 Learning Graph Algorithms

In this section, we empirically demonstrate the power of the proposed EDP-GNN model on edgewise prediction tasks. In particular, we reduce several classic graph algorithms to the task of predicting whether each edge is in the solution set or not. The training data include a graph and the corresponding solution set, and we train our models to fit the solution set by minimizing the cross-entropy loss.

Setup

To verify the ability of EDP-GNN to make edgewise dense predictions, we tested it on learning classic graph algorithms, labeling every edge in a graph to indicate whether it is in the solution set or not. We chose two simple tasks: 1) Shortest Path (SP) between a given pair of nodes, and 2) Maximum Spanning Tree (MST) of a given graph. The solution set of SP is a path connecting the pair of nodes with the shortest length, while the solution set of MST is the collection of all edges inside the maximum spanning tree. For both tasks, all graphs are randomly sampled from the Erdős-Rényi (E-R) model (Erdős and Rényi, 1960). For weighted graphs, all edge weights are sampled from a uniform distribution. A prediction is considered correct if and only if all edge labels of the graph are correct, and we report accuracy over a fixed test set as the metric. For the baseline model, we use vanilla GIN (Xu et al., 2018a).

Training

During training, we generate the training data dynamically on the fly and use the cross-entropy loss as the training objective for both tasks.
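For reference, one possible way to generate such training examples on the fly is sketched below using networkx; the graph sizes, edge probability, and weight range are placeholders of ours, not the settings used in our experiments.

import random
import networkx as nx

def sample_mst_example(n=12, p=0.5):
    """Sample a weighted E-R graph and label each edge by membership in its maximum spanning tree."""
    G = nx.erdos_renyi_graph(n, p)
    for u, v in G.edges():
        G[u][v]['weight'] = random.random()          # placeholder weight range
    mst_edges = set(frozenset(e) for e in nx.maximum_spanning_tree(G).edges())
    labels = {e: int(frozenset(e) in mst_edges) for e in G.edges()}
    return G, labels

def sample_sp_example(n=12, p=0.5):
    """Sample a weighted E-R graph and label the edges on a shortest path between two random nodes."""
    G, _ = sample_mst_example(n, p)
    s, t = random.sample(list(G.nodes()), 2)
    try:
        path = nx.shortest_path(G, s, t, weight='weight')
    except nx.NetworkXNoPath:
        return sample_sp_example(n, p)               # resample if s and t are disconnected
    on_path = set(frozenset((path[i], path[i + 1])) for i in range(len(path) - 1))
    labels = {e: int(frozenset(e) in on_path) for e in G.edges()}
    return G, (s, t), labels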

Model SP (UW) SP (W) MST (W)
GIN 0.57 0.12 0.20
EDP-GNN 0.60 0.92 0.84
Table 1: Test set accuracy of EDP-GNN vs. GIN on learning the shortest path (SP) and maximum spanning tree (MST) algorithms. "UW" and "W" stand for "unweighted" and "weighted", respectively. Since the training set is dynamically generated, performance on the (newly generated) training set and on the test set does not differ. Note that for unweighted graphs there can be more than one shortest path between a given pair of nodes; the accuracy is therefore underestimated, as we randomly pick one path as the ground truth, which makes an accuracy of 0.6 non-trivial.
Results

All results are provided in Tab. 1. We observe that EDP-GNN performs similarly to GIN for unweighted graphs, but achieves much better performance when graphs are weighted. This confirms that EDP-GNN is more effective for edgewise predictions.

5.2 Graph Generation Task

(a) Training data
(b) EDP-GNN samples
(c) GraphRNN samples
(d) Training data
(e) EDP-GNN samples
(f) GraphRNN samples
Figure 3: Samples from the training data, EDP-GNN, and GraphRNN, on Community-small (top row) and Ego-small (bottom row).

In this section, we demonstrate that our EDP-GNN is capable of producing high-quality graph samples via score-based generative modeling. To better understand learnable multi-channel adjacency matrices in our model, we visualize the intermediate channels in Figure 2, and perform extensive ablation studies.

Datasets and Baselines

We tested our model on two datasets, Community-small and Ego-small, which are also used by You et al. (2018b) and Liu et al. (2019). See Appendix A for more details. Our baselines include GraphRNN (You et al., 2018b), Graph Normalizing Flow (GNF) (Liu et al., 2019), GraphVAE (Simonovsky and Komodakis, 2018), and DeepGMG (Li et al., 2018a).

Metrics

To evaluate generation quality, we used maximum mean discrepancy (MMD) over a set of graph statistics, as proposed by You et al. (2018b). We calculated MMD for three graph statistics: 1) the degree distribution, 2) the clustering coefficient distribution, and 3) the distribution of orbit counts for orbits with 4 nodes.
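As a rough illustration of this evaluation protocol (not the exact implementation of You et al. (2018b)), the sketch below computes an MMD between two sets of graphs based on their degree histograms; the Gaussian kernel, its bandwidth, and the histogram cap are assumptions of ours.

import numpy as np
import networkx as nx

def degree_histogram(G, max_degree=20):
    """Normalized degree histogram of a graph, used as its summary statistic."""
    h = np.zeros(max_degree + 1)
    for _, d in G.degree():
        h[min(d, max_degree)] += 1
    return h / max(h.sum(), 1)

def gaussian_mmd(stats_a, stats_b, sigma=1.0):
    """Squared MMD between two sets of statistic vectors under a Gaussian kernel."""
    def k(x, y):
        return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))
    xx = np.mean([k(x, y) for x in stats_a for y in stats_a])
    yy = np.mean([k(x, y) for x in stats_b for y in stats_b])
    xy = np.mean([k(x, y) for x in stats_a for y in stats_b])
    return xx + yy - 2 * xy

# Example usage (test_graphs and generated_graphs are lists of networkx graphs):
# mmd_deg = gaussian_mmd([degree_histogram(g) for g in test_graphs],
#                        [degree_histogram(g) for g in generated_graphs])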

Results

We compare EDP-GNN against baselines and summarize results in Tab. 2. Our model performs comparably to GraphRNN and GNF with respect to most MMD metrics, and outperforms all other methods when considering the overall average of MMDs on two datasets.

Model               Community-small                  Ego-small                        Overall
                    Deg.    Clus.   Orbit   Avg.     Deg.    Clus.   Orbit   Avg.     Avg.
GraphVAE            0.350   0.980   0.540   0.623    0.130   0.170   0.050   0.117    0.370
DeepGMG             0.220   0.950   0.400   0.523    0.040   0.100   0.020   0.053    0.288
GraphRNN            0.080   0.120   0.040   0.080    0.090   0.220   0.003   0.104    0.092
GNF                 0.200   0.200   0.110   0.170    0.030   0.100   0.001   0.044    0.107
EDP-GNN             0.053   0.144   0.026   0.074    0.052   0.093   0.007   0.050    0.062
GraphRNN (1024)     0.030   0.010   0.010   0.017    0.040   0.050   0.060   0.050    0.033
GNF (1024)          0.120   0.150   0.020   0.097    0.010   0.030   0.001   0.014    0.055
EDP-GNN (1024)      0.006   0.127   0.018   0.050    0.010   0.025   0.003   0.013    0.031

Table 2: MMD results of various graph generative models. Rows marked with (1024) use 1024 generated samples; otherwise, the number of samples equals the size of the test set. Apart from the three MMD statistics, we also report their average, denoted "Avg.". The rightmost column is the overall average of all MMDs on the two datasets. For the baselines, we directly ported the results from You et al. (2018b) and Liu et al. (2019). For a fair comparison, we followed the evaluation settings of Liu et al. (2019).

5.2.1 Understanding Intermediate Channels

Intuitively, the intermediate channels of EDP-GNN should be analogous to the feature maps of convolutional neural networks (CNNs). Since channels of feature maps can be visualized as images in CNNs, we propose to visualize each channel of the multi-channel adjacency matrices as a graph. The EDP-GNN layers should thus be able to map an input graph to intermediate graphs that possess interpretable semantics.

In Figure 2, we visualize the channels of intermediate adjacency matrices for an EDP-GNN model trained on the Community-small dataset. We observe that the model maps a perturbed community graph with no clearly visible structure to a graph with a structure of two equal-sized communities.

As implied by the training objective (3), the score network can perfectly predict the ground truth score, i.e., $s_\theta(\tilde{A}, \sigma) = -(\tilde{A} - A)/\sigma^2$, if it can map the noise-perturbed graph $\tilde{A}$ to the true (noise-free) graph $A$ in some of its intermediate channels. Therefore, an ideal score network should be able to 1) understand the structure of a given graph, before 2) mapping a perturbed graph to the corresponding denoised graph. While previous GNNs are designed for the former task, EDP-GNN is especially capable of solving the latter.

5.2.2 Ablation Studies

A*   C*   Community-small           Ego-small
          Train loss   Test loss    Train loss   Test loss
N    N    140          140          14           17
Y    N    120          120          12           15
N    Y    110          120          13           15
Y    Y    98           96           10           12

Table 3: Ablation experiments on the Community-small and Ego-small datasets. The training and test losses are defined by (3). A* indicates whether the adjacency matrices are learnable, and C* indicates whether the intermediate adjacency matrices have multiple channels.

To verify the importance of making the intermediate adjacency matrices of EDP-GNN 1) learnable and 2) multi-channel, we conducted ablation studies on the Community-small and Ego-small datasets. We switched the two properties on and off independently, and report the performance comparison in Tab. 3. Note that EDP-GNN is equivalent to vanilla GIN when the intermediate adjacency matrices are single-channel and non-learnable. As shown in Tab. 3, both properties improve the expressivity for score modeling, in the sense of reducing the training and test score matching losses. As expected, the performance is best when both properties are combined.

6 Conclusion

We propose a permutation invariant generative model for graphs based on the framework of score-based generative modeling. In particular, we implicitly define a permutation invariant distribution over graph adjacency matrices by modeling the corresponding permutation equivariant score function and sampling with Langevin dynamics. For effective score modeling of graph distributions, we propose a new permutation equivariant GNN architecture, named EDP-GNN, leveraging trainable, multi-channel adjacency matrices as intermediate layers. Empirically, we demonstrate that EDP-GNNs are more expressive than vanilla GNNs on predicting edgewise features, as evidenced by better performance on the task of learning classic graph algorithms such as shortest paths. Moreover, we show our model can produce samples with quality comparable to existing state-of-the-art models. As one future direction, we hope to improve the scalability of our model by reducing the computational complexity, using techniques such as graph pooling (Ying et al., 2018).

Acknowledgements

This research was supported by Intel Corporation, Amazon AWS, TRI, NSF (#1651565, #1522054, #1733686), ONR (N00014-19-1-2145), AFOSR (FA9550-19-1-0024).

References


Appendix A Experimental Details

We implement our model using PyTorch (Paszke et al., 2019). The optimization algorithm is Adam (Kingma and Ba, 2014). Our code is available at https://github.com/ermongroup/GraphScoreMatching.

A.1 Hyperparameters

For the noise levels $\{\sigma_i\}_{i=1}^{L}$, we used a fixed decreasing schedule; empirically, we found these settings work well for all the generation experiments. Note that since all the edge weights in the training data (i.e., $A_{ij}$ in (2)) are either 0 or 1, $\sigma_L$ is small enough for the quantization operation (4) to perfectly recover the clean graph from its perturbed version with high probability.

In the sampling process, we fixed the number of sampling steps $T$ for each noise level. Apart from the coefficient $\epsilon$ in the step size of Langevin dynamics, we added another scaling coefficient, as is common practice when applying Langevin dynamics. We chose the values of these hyper-parameters based on the MMD metrics on a validation set, which contains 32 samples from the training set.

For the network architecture, we used 4 message-passing steps for each GIN, and stacked 5 EDP-GNN layers. The maximum number of channels across all EDP-GNN layers is 4, and the maximum size of the node features is 16.

A.2 Datasets

  • Community-small: graphs consisting of two equal-sized communities, each generated by the E-R model (Erdős and Rényi, 1960). For each graph, we randomly add a small number of edges between the two communities. The total number of nodes per graph lies in a small range.

  • Ego-small: one-hop ego graphs extracted from the Citeseer network (Sen et al., 2008). Each graph contains a small number of nodes.

Appendix B Properties of Permutation Invariant Functions

B.1 Permutation

Definition 1.

(Permutation Operation on a Matrix) Let $A \in \mathbb{R}^{N \times N}$, and denote the set of permutations of $\{1, \dots, N\}$ by $\Pi_N$. For $\pi \in \Pi_N$, the node permutation operation on a matrix is defined by $(\pi \circ A)_{ij} = A_{\pi(i)\pi(j)}$.

B.2 Permutation Invariance

Definition 2.

(Permutation Invariant Function) A function $f$ with $\mathbb{R}^{N \times N}$ as its domain is permutation invariant if and only if $f(\pi \circ A) = f(A)$ for all $\pi \in \Pi_N$ and all $A \in \mathbb{R}^{N \times N}$.

B.3 Permutation Equivariance

Definition 3.

(Permutation Equivariant Function) A function $f: \mathbb{R}^{N \times N} \to \mathbb{R}^{N \times N}$ is permutation equivariant if and only if $f(\pi \circ A) = \pi \circ f(A)$ for all $\pi \in \Pi_N$ and all $A \in \mathbb{R}^{N \times N}$.

B.4 Relationship between Permutation Invariance and Permutation Equivariance

Definition 4.

(Implicitly Defined Scalar Function) A function $s: \mathbb{R}^{N \times N} \to \mathbb{R}^{N \times N}$ defines a gradient vector field on $\mathbb{R}^{N \times N}$. View $s$ as the gradient of a scalar-valued function $f$. Define $f(A) = \int_{\gamma[A_0, A]} \langle s(B), \mathrm{d}B \rangle + C$, where $\langle \cdot, \cdot \rangle$ is the Frobenius inner product, $\gamma[A_0, A]$ is any curve from $A_0$ to $A$, and $C$ is a constant.

Under this definition, a vector-valued function $s$ implicitly defines a scalar function $f$.

Lemma 1.

(Permutation Invariance of the Frobenius Inner Product) For any $A, B \in \mathbb{R}^{N \times N}$, the Frobenius inner product of $A$ and $B$ is $\langle A, B \rangle = \sum_{i,j} A_{ij} B_{ij}$. The Frobenius inner product is permutation invariant, i.e., $\langle \pi \circ A, \pi \circ B \rangle = \langle A, B \rangle$ for any $\pi \in \Pi_N$.

B.5 Proof of Theorem 1

Proof. Since $s$ is a gradient field (Definition 4), the line integral defining $f$ is path-independent. Fix a permutation $\pi \in \Pi_N$ and a curve $\gamma$ from $A_0$ to $A$; then $\pi \circ \gamma$ is a curve from $\pi \circ A_0$ to $\pi \circ A$. By the permutation equivariance of $s$ and Lemma 1, $\langle s(\pi \circ B), \mathrm{d}(\pi \circ B) \rangle = \langle \pi \circ s(B), \pi \circ \mathrm{d}B \rangle = \langle s(B), \mathrm{d}B \rangle$, so the integral along $\pi \circ \gamma$ equals the integral along $\gamma$. Decomposing a path from $A_0$ to $\pi \circ A$ into a path from $A_0$ to $\pi \circ A_0$ followed by $\pi \circ \gamma$, we obtain $f(\pi \circ A) = f(A) + c_\pi$, where $c_\pi$ is the path-independent integral from $A_0$ to $\pi \circ A_0$ and does not depend on $A$. Iterating this identity gives $f(\pi^m \circ A) = f(A) + m\, c_\pi$ for every $m \ge 1$; since $\pi$ has finite order, the left-hand side equals $f(A)$ for some $m$, so $c_\pi = 0$ and $f(\pi \circ A) = f(A)$. ∎

Appendix C Extra Samples

(a) Training data
(b) EDP-GNN samples
(c) GraphRNN samples
Figure 4: Extra samples from the training data, EDP-GNN, and GraphRNN, on Ego-small.
(a) Training data
(b) EDP-GNN samples
(c) GraphRNN samples
Figure 5: Extra samples from the training data, EDP-GNN, and GraphRNN, on Community-small.
(a) Training data
(b) EDP-GNN samples
Figure 6: Extra samples from the training data and EDP-GNN, on the Protein dataset (Dobson and Doig, 2003).
(a) Training data
(b) EDP-GNN samples
Figure 7: Extra samples from the training data and EDP-GNN, on the Lobster graph dataset (Golomb, 1996).