Continuous Graph Flow for Flexible Density Estimation

08/07/2019 ∙ Zhiwei Deng et al. ∙ Simon Fraser University, Borealis AI

In this paper, we propose Continuous Graph Flow, a generative continuous flow based method for modeling distributions of graph-structured complex data. The model is formulated as an ordinary differential equation system with shared and reusable functions that operate over the graph structure, leading to a new type of neural graph message passing scheme that performs continuous message passing over time. This class of models offers several advantages: (1) modeling complex graphical distributions without rigid assumptions on the distributions; (2) not being limited to data of fixed dimensions, generalizing probability evaluation and data generation to unseen subsets of variables; and (3) a reversible and memory-efficient underlying continuous graph message passing process. We demonstrate the effectiveness of our model on two generation tasks, namely image puzzle generation and layout generation from scene graphs. Compared to unstructured and structured latent-space VAE models, we show that our proposed model achieves significant performance improvement (up to 400% improvement in negative log-likelihood).


1 Introduction

* indicates equal contribution.

Accurate modeling of probability densities over sets of random variables is a central part of AI research. Core tasks such as reasoning, planning, and generation benefit from the ability to effectively characterize the probability density over variables pertinent to a domain of interest. As such, the construction of these densities has been a preoccupation of research ranging from seminal work on probabilistic graphical models through to recent efforts utilizing flow-based models.

In this paper, we focus on learning a single representation that can be used to model a density over an arbitrary set of random variables. This task arises in inductive learning scenarios for the aforementioned tasks – learning a model for reasoning over a variable number of related entities, generating layouts of scenes composed of variable numbers of objects. The desiderata of a powerful and expressive model for this task include: (1) flexibility: architectures to enable generalization to new instances with varying numbers of random variables, (2) relations: ability to model dependencies among the random variables to enhance modeling capabilities, (3) high capacity: potential to model complex data distributions, and (4) tractability: exact and efficient computation of the probability density.

Significant research effort has been devoted in this vein. The classic probabilistic graphical model (PGM) framework Koller and Friedman (2009) allows many algorithms to learn structured representations for complex data. However, PGMs often suffer from rigid assumptions on distributions Pearl (2014); Wainwright et al. (2008), and performing exact inference can be computationally difficult, which has led to the development of several techniques for approximate inference Sudderth et al. (2010); Noorshams and Wainwright (2012); Wainwright and Jordan (2008). Alternatively, impressive advances in generative modeling within the variational autoencoder (VAE) formalism Kingma and Welling (2013) offer promise for learning flexible data distributions. In addition, there has been a surge of interest in designing models that bestow generative models with the ability to learn structured latent spaces by unifying PGMs and VAEs Lin et al. (2018); He et al. (2018); Grover et al. (2018); Johnson et al. (2016).

Despite the promising success of these modeling efforts, their capacity is still very much limited. This is mainly due to modeling choices enforced by the approximate techniques necessary to alleviate computational difficulty (in case of PGMs) or intractability (in case of VAEs). An immediate consequence of such approximations is restricted distributions or latent space structures, thereby compromising flexibility in the representations obtained from these models.

On that premise, the approach of normalizing flows provides several tools for exact likelihood computation and efficient inference in generative models Rezende and Mohamed (2015); Kingma and Dhariwal (2018); Dinh et al. (2014, 2016). This line of work applies iterative updates through a composition of invertible transformations at discrete time instances to map points from a simple distribution to a complex one. However, none of the existing flow-based generative models extends to learning flexible representations for variable-sized inputs, as the invertible mappings are bijections over spaces of fixed dimensionality. This limits the ability of these approaches to generalize to new, unseen tasks relevant to the given data.

Recent extensions of normalizing flows to continuous time versions using ordinary differential equations (ODEs) Chen et al. (2018); Grathwohl et al. (2018) lift all restrictions on the architecture of the flow based generative model. We build on continuous normalizing flow based approaches to accomplish our goal of learning structure-aware flexible models that can generalize across different graph structures while still permitting efficient and exact likelihood computation of variables.

In this paper, we present a new class of models – continuous graph flow, a continuous flow based generative model formulated over a set of related variables with continuous time dynamics. We demonstrate the effectiveness of our model on image puzzle generation and scene graph based layout generation tasks. We illustrate the ability of our model to generalize over variable graph sizes and to perform efficient conditional inference. Experimental results clearly show that continuous graph flow achieves significant performance improvement (up to 400% improvement in negative log-likelihood) over state-of-the-art unstructured and structured latent-space VAE models.

2 Preliminaries

Probabilistic graphical models.

A graphical model represents a joint probability distribution over a set of random variables $\mathbf{X} = \{x_1, \ldots, x_n\}$. Each random variable $x_i$ in the set is represented by a node in the graph. The joint distribution is defined by a set of factor functions $\{\phi_w : w \in \mathcal{W}\}$, where $\mathcal{W}$ is a set containing sets of variables over which factor function instances are defined. Each instance of $\phi_w$ corresponds to a type of dependency between a set or subset of graph nodes. In this work we focus on factor function based PGMs, which are typically formulated as:

$$p(\mathbf{X}) = \frac{1}{Z} \prod_{w \in \mathcal{W}} \phi_w(\mathbf{X}_w) \qquad (1)$$

where $Z$ is the normalization factor and each function $\phi_w$ is defined over a set of variables $\mathbf{X}_w \subseteq \mathbf{X}$. Typically, these models permit flexible inference only when the distributions of the variables take simplified forms such as log-linear or Gaussian distributions.
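To make Eq. (1) concrete, the following is a minimal sketch (our own illustration, not from the paper) of evaluating the unnormalized log-density of a small pairwise MRF; the names unary, pairwise, and edges are hypothetical:

```python
import numpy as np

# Unnormalized log-density of a pairwise MRF over binary variables:
# log p(X) = sum_w log phi_w(X_w) - log Z, with log Z omitted here
# because its exact computation is generally intractable.
def unnormalized_log_density(x, unary, pairwise, edges):
    # x: list of n binary states; unary: (n, 2) log-potentials
    # pairwise: dict mapping edge (i, j) -> (2, 2) log-potential table
    logp = sum(unary[i, x[i]] for i in range(len(x)))
    logp += sum(pairwise[(i, j)][x[i], x[j]] for (i, j) in edges)
    return logp

edges = [(0, 1), (1, 2)]
unary = np.zeros((3, 2))
pairwise = {e: np.array([[0.5, -0.5], [-0.5, 0.5]]) for e in edges}
print(unnormalized_log_density([0, 1, 1], unary, pairwise, edges))
```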

Graph neural networks.

Relational networks such as Graph Neural Networks (GNNs) facilitate learning of non-linear interactions using neural networks. In every layer $t$ of a GNN, the embedding $\mathbf{h}_v^{(t)}$ of a graph node $v$ recursively accumulates information from its neighbors in the previous layer:

$$\mathbf{h}_v^{(t)} = g\Big(\big\{ f\big(\mathbf{h}_u^{(t-1)}, \mathbf{h}_v^{(t-1)}\big) : u \in \mathcal{N}(v) \big\},\ \mathbf{x}_v \Big) \qquad (2)$$

where $g$ is an aggregator function, $\mathcal{N}(v)$ is the set of neighbour nodes of node $v$, $f$ is the message function from node $u$ to node $v$, and $\mathbf{x}_v$ denotes the input node features. Our model uses a form of GNN in which the dimensionality of the graph node embeddings is kept the same across iterations. This allows their use in the flow-based models described below, which require the dimensionality of the variables to be preserved across transformations.
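As an illustration only (not the paper's exact architecture), a dimension-preserving message passing layer in the spirit of Eq. (2) could be sketched in PyTorch as follows; all names here are our own:

```python
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    """One GNN layer that keeps node embeddings at a fixed dimension."""
    def __init__(self, dim):
        super().__init__()
        self.message = nn.Sequential(nn.Linear(2 * dim, dim), nn.Tanh())
        self.update = nn.Sequential(nn.Linear(2 * dim, dim), nn.Tanh())

    def forward(self, h, edges):
        # h: (n_nodes, dim) node embeddings; edges: list of (src, dst) pairs
        agg = torch.zeros_like(h)
        for u, v in edges:
            agg[v] = agg[v] + self.message(torch.cat([h[u], h[v]], dim=-1))
        return self.update(torch.cat([h, agg], dim=-1))  # same shape as h

h = torch.randn(4, 64)
layer = MessagePassingLayer(64)
h_next = layer(h, [(0, 1), (1, 2), (2, 3)])
```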

Normalizing flows. Unlike likelihood-based generative models that parameterize distributions with an explicit form, flow-based models construct complex distributions from simple ones (e.g. Gaussian) through a sequence of invertible mappings Rezende and Mohamed (2015), applying the rule for change of variables. The final state $\mathbf{z}_K$ and its density are obtained by successively transforming a random variable $\mathbf{z}_0$ through a chain of $K$ invertible functions $f_k$:

$$\mathbf{z}_K = f_K \circ \cdots \circ f_2 \circ f_1(\mathbf{z}_0) \qquad (3)$$

$$\log p(\mathbf{z}_K) = \log p(\mathbf{z}_0) - \sum_{k=1}^{K} \log \left| \det \frac{\partial f_k}{\partial \mathbf{z}_{k-1}} \right| \qquad (4)$$

where $\partial f_k / \partial \mathbf{z}_{k-1}$ is the Jacobian of $f_k$ and $\mathbf{z}_k = f_k(\mathbf{z}_{k-1})$.

Instead of using a predetermined number of transformations, continuous normalizing flows (CNFs) Chen et al. (2018); Grathwohl et al. (2018) replace the composition of discrete transformations with an integral of continuous-time dynamics:

$$\mathbf{z}(t_1) = \mathbf{z}(t_0) + \int_{t_0}^{t_1} f(\mathbf{z}(t), t)\, dt, \qquad \log p(\mathbf{z}(t_1)) = \log p(\mathbf{z}(t_0)) - \int_{t_0}^{t_1} \mathrm{Tr}\left( \frac{\partial f}{\partial \mathbf{z}(t)} \right) dt \qquad (5)$$

where the trace computation is computationally cheaper than the Jacobian determinant in Equation (4). Building on CNF based approaches, we present continuous graph flow, which effectively models the continuous-time dynamics of probabilistic graphical models.
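Below is a minimal CNF sketch using the torchdiffeq package accompanying Chen et al. (2018); this is our own illustration, with an exact trace computation that is only practical for small dimensionality (Sec. 3.2 replaces it with a stochastic estimator):

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint  # pip install torchdiffeq

class CNF(nn.Module):
    """Toy CNF dynamics; the state carries (z, accumulated delta log p)."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.Tanh(), nn.Linear(64, dim))

    def forward(self, t, state):
        z, _ = state
        with torch.enable_grad():
            z = z.requires_grad_(True)
            dz = self.net(z)
            # exact Tr(df/dz): one vector-Jacobian product per dimension
            tr = sum(torch.autograd.grad(dz[:, i].sum(), z, create_graph=True)[0][:, i]
                     for i in range(z.shape[1]))
        return dz, -tr  # d(log p)/dt = -Tr(df/dz), cf. Eq. (5)

cnf = CNF(dim=2)
z0 = torch.randn(16, 2)  # samples from the Gaussian base distribution
with torch.no_grad():  # density evaluation only; training typically uses an adjoint solver
    zs, dlogp = odeint(cnf, (z0, torch.zeros(16)), torch.tensor([0.0, 1.0]))
log_p_z1 = torch.distributions.Normal(0.0, 1.0).log_prob(z0).sum(1) + dlogp[-1]
```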

Figure 1: Illustration of evolution of message passing mechanisms from discrete updates (left) to our proposed continuous density transformations from simple distributions such as Gaussian to data distributions (right).

3 Continuous graph flow

Given a set of random variables $\mathbf{X} = \{x_1, \ldots, x_n\}$, with each variable $x_i$ associated with a graph node $i$, our goal is to learn the joint distribution $p(\mathbf{X})$. To model the interactions between graph nodes, we define a set of message functions, each of which takes a subset of graph nodes as input and outputs an aggregated message for a target variable.

To define the transformations of the probability density, we formulate a multi-variate ordinary differential equation system. The form of this system follows the template:

$$\frac{d x_i(t)}{dt} = f_i(\mathbf{X}(t), t), \quad i = 1, \ldots, n \qquad (6)$$

$$\mathbf{X}(t_1) = \mathbf{X}(t_0) + \int_{t_0}^{t_1} F(\mathbf{X}(t), t)\, dt \qquad (7)$$

where $F = \{f_1, \ldots, f_n\}$ and $\mathbf{X}(t) = \{x_1(t), \ldots, x_n(t)\}$ is the set of variables at time $t$. The random variables at time $t_0$ follow a simple prior distribution, i.e., $\mathbf{X}(t_0) \sim p(\mathbf{X}(t_0))$, which can have a simple form such as a Gaussian. In the above transformation, the functions $F$ implicitly define the interactions and the density transformation over the variables: they transform the values of the variables at time $t_0$ to the values at time $t_1$, which follow the data distribution to be learned.

3.1 Continuous message passing over graphs

The above form in Eq. (7) represents a generic multi-variate update. However, it does not take the structure between nodes into account, and the form of the update functions does not naturally permit generalization to new random variables.

To alleviate these shortcomings, we define a message passing process that operates over graphs, making the update functions reusable and defined over variables according to the graph structure. The updates can be seen as discrete steps that modify the set of node variables $\mathbf{X}$: the process begins at a timestep $t_0$ where the variables contain only local information, and after a fixed number of message passing steps each node variable has gathered information from the other relevant graph nodes in the graphical model. Instead of treating message passing as a discrete process, we formulate it as a continuous one, which eliminates the requirement of a predetermined number of message passing steps. By pushing the message passing process to update at ever smaller steps, continued for an arbitrarily large number of steps, each variable update can be represented as an ordinary differential equation (ODE) system with shared and reusable functions:

$$\frac{d x_i(t)}{dt} = g_i\Big(\big\{ \hat{f}\big(x_j(t), x_i(t)\big) : j \in \mathcal{N}(i) \big\},\ t\Big) \qquad (8)$$

where $x_i(t)$ is the value of variable $i$ at time $t$, $\hat{f}$ is a reusable message function, $\mathcal{N}(i)$ is the set of neighbours of node $i$, and $g_i$ is a function that aggregates the information passed to a variable. The above formulation describes the case of pairwise message functions, though it can be generalized to higher-order interactions. Which function is used to pass information between nodes $i$ and $j$ depends on the nodes; we implement this as a fixed library of functions, selected according to the node types, to allow generalization to new graph structures. Performing message passing to derive the final states is equivalent to solving an initial value problem for an ODE system. Following the ODE formulation, the final states of the variables can be computed as:

$$x_i(t_1) = x_i(t_0) + \int_{t_0}^{t_1} \frac{d x_i(t)}{dt}\, dt \qquad (9)$$

This formulation can be solved with an ODE solver.
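As a sketch of how the continuous message passing of Eqs. (8)-(9) can be realized (our own illustration; the module and variable names are hypothetical), the dynamics function sums reusable pairwise messages and is integrated with an off-the-shelf ODE solver:

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint

class GraphDynamics(nn.Module):
    """dx_i/dt aggregates reusable pairwise messages, cf. Eq. (8)."""
    def __init__(self, dim, n_edge_types, edges):
        super().__init__()
        # one shared message function per edge type (the "function library")
        self.msg = nn.ModuleList(
            nn.Sequential(nn.Linear(2 * dim, dim), nn.Tanh())
            for _ in range(n_edge_types))
        self.edges = edges  # list of (src, dst, edge_type) triples

    def forward(self, t, x):
        # x: (n_nodes, dim) variable values at time t
        dx = torch.zeros_like(x)
        for j, i, e in self.edges:
            dx[i] = dx[i] + self.msg[e](torch.cat([x[j], x[i]], dim=-1))
        return dx

edges = [(0, 1, 0), (1, 0, 0), (1, 2, 1), (2, 1, 1)]
dyn = GraphDynamics(dim=8, n_edge_types=2, edges=edges)
x0 = torch.randn(3, 8)                              # initial values, one row per node
x1 = odeint(dyn, x0, torch.tensor([0.0, 1.0]))[-1]  # solves Eq. (9)
```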

3.2 Continuous message passing as density transformation

Continuous graph flow leverages the continuous message passing mechanism (described in Sec. 3.1) and formulates message passing as an implicit density transformation of the variables in the graphical model (illustrated in Figure 1). Given a set of variables $\mathbf{X}$ with dependencies between them (as in a graphical model), the goal is to fit a model of the true distribution over $\mathbf{X}$ based on data sampled from it. Assume that at the initial time $t_0$, the joint distribution $p(\mathbf{X}(t_0))$ has a simple form, such as independent Gaussian distributions over the different variables, where the means and variances of the Gaussians are learnable. The continuous message passing process transforms the variables from $\mathbf{X}(t_0)$ to $\mathbf{X}(t_1)$; moreover, this process also converts the distribution over the variables from the simple base distribution (e.g. Gaussian) to a complex data distribution. Building on the single-variable continuous dynamics proposed by Chen et al. (2018), in which each variable in the system has a single coherent meaning, we define the following dynamics over the joint distribution of multiple variables, which can subsequently be solved as an ODE initial value problem:

$$\frac{\partial \log p(\mathbf{X}(t))}{\partial t} = -\mathrm{Tr}\left( \frac{\partial F}{\partial \mathbf{X}(t)} \right) \qquad (10)$$

where the set of variables $\mathbf{X}(t)$ is a concatenation of all variables in the graph (whose size varies with the size of the input graph) and $F$ represents the set of factor functions. The change in log probability along the sequential change of the variables thus reduces to the trace computation of a Jacobian matrix Chen et al. (2018).

In this work, we use two forms of density transformations for message passing: (1) generic message transformations – transformations with generic update functions whose trace (refer Eq. 10) can be approximated instead of computed by brute force, and (2) multi-scale message transformations – transformations with generic update functions at multiple scales of information.

Generic message transformations. We represent a message passing block as a neural network. The trace of the Jacobian matrix of a generic neural network function can be estimated using the trace estimator proposed in Grathwohl et al. (2018):

$$\mathrm{Tr}\left( \frac{\partial F}{\partial \mathbf{X}(t)} \right) = \mathbb{E}_{p(\epsilon)}\left[ \epsilon^{\top} \frac{\partial F}{\partial \mathbf{X}(t)}\, \epsilon \right] \qquad (11)$$

where $F$ denotes a generic neural network based message passing block and $\epsilon$ is a noise vector, usually sampled from a standard Gaussian or Rademacher distribution.
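A minimal sketch of the estimator in Eq. (11) (our own illustration), computed with a single vector-Jacobian product per noise sample:

```python
import torch

def hutchinson_trace(f_out, z, dist="rademacher"):
    """Unbiased estimate of Tr(df/dz) = E[eps^T (df/dz) eps], cf. Eq. (11).

    f_out must have the same shape as z and be computed from z with
    autograd enabled.
    """
    if dist == "rademacher":
        eps = torch.randint_like(z, 0, 2) * 2 - 1
    else:
        eps = torch.randn_like(z)
    # vector-Jacobian product eps^T (df/dz): one backward pass
    vjp = torch.autograd.grad(f_out, z, grad_outputs=eps, create_graph=True)[0]
    return (vjp * eps).sum(dim=1)

z = torch.randn(16, 10, requires_grad=True)
f_out = torch.tanh(z @ torch.randn(10, 10))
print(hutchinson_trace(f_out, z).shape)  # torch.Size([16])
```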

Multi-scale message transformations. As a generalization of generic message transformations, we design a model with multi-scale message passing to encode different levels of information in the variables. As in Dinh et al. (2016), we construct the overall multi-scale flow model by stacking several blocks, each of which contains message passing blocks based on generic message transformations. After passing the input through a block, we factor out a portion of the output and feed the remainder as input to the subsequent block. Assume that at time instance $t_k$ (the end of block $k$, with $k = 1, \ldots, B-1$ and $B$ the total number of blocks), the variable $\mathbf{X}(t_k)$ is factored out into two parts:

$$\mathbf{X}(t_k) = \big[ \mathbf{X}^{(1)}(t_k),\ \mathbf{X}^{(2)}(t_k) \big] \qquad (12)$$

Let $\mathbf{X}^{(2)}(t_k)$ be the input to the next block; the density transformation then decomposes over the two parts:

$$\log p\big(\mathbf{X}(t_k)\big) = \log p\big(\mathbf{X}^{(1)}(t_k)\big) + \log p\big(\mathbf{X}^{(2)}(t_k)\big) \qquad (13)$$
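The factor-out scheme of Eqs. (12)-(13) can be sketched as follows (our illustration; block is a hypothetical continuous graph flow block returning the transformed variables and the accumulated change in log-density):

```python
import torch

def multiscale_forward(blocks, x, delta_logp):
    """Stack flow blocks, factoring out half the channels between blocks."""
    factored = []
    for k, block in enumerate(blocks):
        x, delta_logp = block(x, delta_logp)  # continuous graph flow block
        if k < len(blocks) - 1:
            x_out, x = x.chunk(2, dim=1)      # factor out part of the output
            factored.append(x_out)            # modeled by the base density, cf. Eq. (13)
    factored.append(x)
    return torch.cat(factored, dim=1), delta_logp

# usage with trivial identity blocks, for shape checking only
blocks = [lambda x, d: (x, d) for _ in range(3)]
x, d = multiscale_forward(blocks, torch.randn(4, 8), torch.zeros(4))
```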

4 Experiments

To show the effectiveness and generalizability of our Continuous Graph Flow (CGF) on graphical distributions, we evaluate our model on two tasks: (1) image puzzle generation; and (2) layout generation from scene graphs. These two tasks have high complexity in variable distributions and diverse potential function types and offer a challenging evaluation setup for our model.

4.1 Graphical distribution – density estimation

4.1.1 Image puzzle generation

Task description. We design image puzzles for standard image datasets. Given an image, we construct a puzzle by dividing the original image into non-overlapping unique patches, each forming a puzzle piece. Each image is divided into $p$ puzzle patches both horizontally and vertically, and each of the $p \times p$ patches forms a node in the graph. To evaluate the performance of our model on dynamic graph sizes, instead of training the model with all $p \times p$ nodes, we sample $k$ adjacent patches, with $k$ uniformly sampled from the valid puzzle sizes, as input to the model during training, and test using the same setup. In our experiments, we use puzzle sizes $p \in \{2, 3, 4\}$ (see Table 1) and one edge function for each direction (left, right, up, down) within the neighbourhood of a node. Additional details are provided in the Appendix.
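As an aside, a small sketch (our own, not the paper's preprocessing code) of splitting an image tensor into a p x p grid of puzzle patches:

```python
import torch

def image_to_puzzle_patches(img, p):
    """Split img (C, H, W) into a p x p grid of non-overlapping patches."""
    C, H, W = img.shape
    ph, pw = H // p, W // p
    patches = img.unfold(1, ph, ph).unfold(2, pw, pw)  # (C, p, p, ph, pw)
    return patches.permute(1, 2, 0, 3, 4).reshape(p * p, C, ph, pw)

nodes = image_to_puzzle_patches(torch.randn(3, 32, 32), p=2)
print(nodes.shape)  # torch.Size([4, 3, 16, 16])
```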

Datasets and baselines. We design the image puzzle generation task (described above) for each of three image datasets: MNIST LeCun et al. (1998), CIFAR10 Krizhevsky and Hinton (2009), and CelebA Liu et al. (2015). For MNIST and CIFAR10, we train on the training set and test on the validation set provided with the original datasets. CelebA does not have a validation set, so we split the original dataset into a training set of 27,000 images and a test set of 3,000 images. We compare our model with six state-of-the-art VAE-based models that focus on learning structured representations: (1) GraphVAE He et al. (2018); (2) Graphite Grover et al. (2018); (3) variational message passing using structured inference networks (VMP-SIN) Lin et al. (2018); (4) BiLSTM + VAE, a bidirectional LSTM used to model the interactions between node latent variables (obtained after serializing the graph) in an autoregressive manner similar to Gregor et al. (2015); (5) variational graph autoencoder (GAE) Kipf and Welling (2016b); and (6) neural relational inference (NRI) Kipf et al. (2018), which we adapt to model data at a single time instance and to model interactions between the nodes.

Quantitative results. We report the negative log-likelihood (NLL) in bits/dimension for each of the three datasets (lower is better). As shown in Table 1, our model outperforms the VAE-based baselines by a significant margin.

Method MNIST CIFAR-10 CelebA-HQ
2x2 3x3 4x4 2x2 3x3 4x4 2x2 3x3 4x4
BiLSTM + VAE 4.97 4.77 4.42 6.02 5.20 4.53 5.72 5.66 5.48
GraphVAE He et al. (2018) 4.89 4.65 3.82 6.03 5.02 4.70 5.66 5.43 5.27
Graphite Grover et al. (2018) 4.90 4.64 4.02 6.06 5.09 4.61 5.71 5.50 5.32
VMP-SIN Lin et al. (2018) 5.13 4.92 4.44 6.00 4.96 4.34 5.70 5.43 5.27
GAE Kipf and Welling (2016b) 4.91 4.89 4.17 5.83 4.95 4.21 5.71 5.63 5.28
NRI Kipf et al. (2018) 4.58 4.35 4.11 5.44 4.82 4.70 5.36 5.43 5.28
Ours 1.24 1.21 1.20 2.42 2.31 2.00 3.44 3.17 3.16
Table 1: Quantitative results on image puzzle generation. Comparison of our CGF model with standard VAE and state-of-the-art VAE based models in bits/dimension (lower is better). These results are for unconditional generation using multi-scale version of continuous graph flow.
   
Figure 2: Qualitative results on MNIST for image puzzle generation. Samples generated using our model for 2x2 MNIST puzzles in (a) unconditional generation and (b) conditional generation settings. For setting (b), generated patches (highlighted in green bounding boxes) are conditioned on the remaining patches (obtained from ground truth). Best viewed in color.
     
Figure 3: Qualitative results on CelebA-HQ for image puzzle generation. Samples generated using our model for 3x3 CelebA-HQ puzzles in (a) unconditional generation and (b) conditional generation settings. For setting (b), generated patches (highlighted in green bounding boxes) are conditioned on the remaining patches (obtained from ground truth). Best viewed in color.
Settings MNIST CIFAR-10 CelebA-HQ
Odd to even 1.33 2.81 3.31
Less to more 1.37 2.91 3.66
More to less 1.34 2.83 3.44
Table 2: Generalizability of our CGF model in three evaluation settings, for 3x3 image puzzles on three image datasets, in bits/dimension. These results are for unconditional generation using the multi-scale version of continuous graph flow.

Qualitative evaluation. In addition to the quantitative results, we perform two types of sampling-based evaluation: (1) unconditional generation, where samples from a base distribution (e.g. Gaussian) are transformed into the learnt data distribution using the flow; and (2) conditional generation, where we map the known variables to points in the base distribution using the reverse flow, concatenate Gaussian samples for the unknown variables so that the result matches the dimensionality of the desired graph, and then generate samples by transforming the combined representation back through the trained graph flow. For our experiments, we perform two sets of generation experiments using CGF: (1) unconditional generation: given a puzzle size, all puzzle patches are generated from a vector sampled from the Gaussian base distribution (see Figs. 2(a), 3(a)); and (2) conditional generation: given a subset of patches from an image puzzle, we generate the remaining patches of the puzzle using our model (see Figs. 2(b), 3(b)). We believe the task of conditional generation is easier than unconditional generation, as there is more relevant information in the input while performing the flow-based transformations.
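Schematically, conditional generation can be sketched as follows (our own illustration; flow.forward and flow.inverse are hypothetical handles to the learned transformation and its reverse, which exists because the flow is invertible):

```python
import torch

def conditional_sample(flow, x_obs, n_new, dim):
    """Generate unknown variables conditioned on observed ones.

    x_obs: (batch, n_observed, dim) observed node variables.
    """
    z_obs = flow.inverse(x_obs)                       # map known variables to the base space
    z_new = torch.randn(x_obs.shape[0], n_new, dim)   # Gaussian samples for the rest
    z = torch.cat([z_obs, z_new], dim=1)              # match the desired graph dimensions
    return flow.forward(z)                            # transform back to the data space
```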

4.1.2 Layout generation from scene graphs

Task description. Layout generation from scene graphs bridges the gap between a symbolic graph-based scene description and the object layout of the scene Johnson et al. (2018). Scene graphs represent scenes as directed graphs, where nodes are objects and edges give relationships between objects. Object layouts are described by a set of corresponding bounding box annotations Johnson et al. (2018); Zhao et al. (2019). Continuous graph flow is computed on a graph that resembles the scene graph (nodes correspond to objects and relations to directed edges), with one edge function defined for each relationship type. The output is a set of object bounding boxes described by $[x, y, w, h]$, where $(x, y)$ are the top-left coordinates and $w$ and $h$ are the bounding box width and height, respectively. This task is challenging in general because a single scene graph can correspond to multiple valid scene layouts. Additional details are provided in the Appendix.

Datasets and baselines. We evaluate our proposed model on the Visual Genome Krishna et al. (2017) and COCO-Stuff Caesar et al. (2018) datasets, using the same preprocessing as Johnson et al. (2018). Visual Genome contains 175 object and 45 relation types; the training, validation, and test sets contain 62,565, 5,506, and 5,088 images respectively, with on average 10 objects and 5 relations per image. COCO-Stuff contains 80 thing and 91 stuff categories. Object relations are defined by geometric positions and are categorized into 6 types: left of, right of, above, below, inside, and surrounding. The final dataset contains 24,972 training, 1,024 validation, and 2,048 test scene graphs. We use the same baselines as for the image puzzle generation task in Sec. 4.1.1.

Figure 4: Visualization for layout generation on Visual Genome. Our CGF model can generate diverse layouts for the same scene graph. The upper row shows four layouts with unconditional generation; the lower row shows three generated layouts conditioned on known bounding boxes. Please zoom in to see the category of each object.

Results and analysis. We report the main results in Table 3, using negative log-likelihood per node to evaluate models on scene layout generation (lower is better). The results indicate that our CGF model significantly outperforms the VAE-based baselines, illustrating the benefits of exact likelihood computation and flexible architectures. Moreover, the results indicate that our model is able to reason over the varied numbers of relation types (edge types) described by the scene graphs. Fig. 4 shows qualitative results.

Method Visual Genome COCO-Stuff
BiLSTM + VAE -1.20 -1.60
GraphVAE He et al. (2018) -1.05 -1.36
Graphite Grover et al. (2018) -1.17 -0.93
VMP-SIN Lin et al. (2018) -0.61 -0.85
GAE Kipf and Welling (2016b) -1.85 -1.92
NRI Kipf et al. (2018) -0.76 -0.91
Ours -4.24 -6.21
Table 3: Quantitative results for layout generation from scene graphs, in negative log-likelihood (lower is better). These results are for unconditional generation using CGF with generic message transformations.

4.2 Graphical distribution – generalization test

Can CGF generalize to unseen subsets of variables? To test the generalizability of our model to variable graph sizes (variable numbers of nodes), we design three evaluation settings and test them on the image puzzle task: (1) odd to even: training on graphs with odd numbers of nodes and testing on graphs with even numbers of nodes; (2) less to more: training on smaller graphs and testing on larger ones; and (3) more to less: training on larger graphs and testing on smaller ones. The less to more setting tests the model's ability to apply factors learned from small graphs to more complicated ones, whereas the more to less setting evaluates the model's ability to learn disentangled functions without explicitly seeing the corresponding configurations during training. In our experiments, the less to more setting trains on graph sizes below a threshold and tests on sizes above it, relative to the size G of the full graph, and the more to less setting reverses this split. Tab. 2 reports the NLL for these settings. The NLL in these settings is close to the performance in the original experiments with variable graph sizes, indicating that our model is able to generalize to unseen graph sizes.

5 Related work

Probabilistic graphical models. Probabilistic graphical models (PGMs) serve as a statistical framework for modeling dependencies among random variables and building multivariate statistical models Wainwright et al. (2008). Traditional modeling algorithms include belief propagation networks Yedidia et al. (2001); Ping and Ihler (2017); Murphy et al. (1999), Markov random fields Rue and Held (2005); Boykov et al. (1998); Ishikawa (2003), conditional random fields (CRFs) Lafferty et al. (2001); Zheng et al. (2015); Quattoni et al. (2007), and restricted Boltzmann machines (RBMs) Sutskever et al. (2009); Hinton (2012). Neural networks have also been explored for inference in PGMs, modeling the inference steps as feedforward or recurrent networks Li and Zemel (2014); Dai et al. (2016). Based on the similarities between the message passing mechanism of graph neural networks (GNNs) and the iterative updates over dependent variables in PGMs Gilmer et al. (2017), we use GNN-based architectures to represent graphical models in our work.

Generative modeling. Recent generative modeling techniques such as autoregressive methods Oord et al. (2016); Van Den Oord et al. (2016), variational autoencoders (VAEs) Kingma and Welling (2013); Lin et al. (2018); He et al. (2018); Grover et al. (2018); Kipf and Welling (2016b); Kipf et al. (2018), and generative adversarial networks (GANs) Goodfellow et al. (2014) have demonstrated considerable success in modeling complex data distributions. These models focus on learning a joint probability distribution over the variables in the data; however, VAEs and GANs do not allow exact computation of the data likelihood. More recently, flow-based generative models Rezende and Mohamed (2015); Kingma and Dhariwal (2018); Dinh et al. (2014, 2016) have been designed to allow exact likelihood computation. Continuous-time extensions of normalizing flows Chen et al. (2018); Grathwohl et al. (2018) allow flow-based models to have completely unrestricted architectures, but have been shown to be effective only for modeling the dynamics of a single variable (with coherent meaning). In this work, we use continuous normalizing flows to model multivariate dynamics and obtain data distributions that incorporate the statistical dependencies of complex graphical models.

Learning structured representations. Recently, there has been a surge of interest in modeling efforts that improve generative models with the ability to learn structured latent spaces by unifying PGMs and VAEs Lin et al. (2018); He et al. (2018); Johnson et al. (2016); Simonovsky and Komodakis (2018). However, these models still have limited capacity, mainly because they rely on approximate computation of the data likelihood.

Learning graph representations. Random walk approaches Perozzi et al. (2014); Grover and Leskovec (2016), factorization-based approaches Belkin and Niyogi (2002); Cao et al. (2015), and GNN-based approaches Scarselli et al. (2009); Veličković et al. (2017); Kipf and Welling (2016a) are the three main approaches for representation learning on graphs. Our work is closely related to GNNs, in which nodes repeatedly accumulate information from their neighbors in a recursive manner. However, as opposed to conventional discrete-time updates, our model employs continuous normalizing flow techniques to build flexible graph representations.

6 Conclusion

In this paper, we presented a new class of models – continuous graph flow, a continuous flow-based generative model for modeling distributions of graph-structured complex data. In particular, we proposed a novel message passing scheme that performs continuous-time message passing, formulated as an ordinary differential equation system with shared and reusable functions that operate over the graph structure. Our model effectively learns flexible representations for generalization to new tasks, models dependencies in complex data distributions, and performs exact computation of the likelihood of the variables in the data. We conducted empirical evaluation on two generation tasks, namely image puzzle generation and layout generation from scene graphs. Experimental results showed that continuous graph flow achieves significant performance improvement (up to 400% improvement in negative log-likelihood) over state-of-the-art unstructured and structured latent-space VAE models.

References

  • [1] A. Ahmed, N. Shervashidze, S. Narayanamurthy, V. Josifovski, and A. J. Smola (2013) Distributed large-scale natural graph factorization. In WWW,
  • [2] D. Beck, G. Haffari, and T. Cohn (2018) Graph-to-sequence learning using gated graph neural networks. arXiv preprint arXiv:1806.09835.
  • [3] M. Belkin and P. Niyogi (2002) Laplacian eigenmaps and spectral techniques for embedding and clustering. In NIPS, Cited by: §5.
  • [4] Y. Boykov, O. Veksler, and R. Zabih (1998) Markov random fields with efficient approximations. In CVPR, Cited by: §5.
  • [5] H. Caesar, J. Uijlings, and V. Ferrari (2018) COCO-Stuff: thing and stuff classes in context. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1209–1218. Cited by: §4.1.2.
  • [6] S. Cao, W. Lu, and Q. Xu (2015) Grarep: learning graph representations with global structural information. In Proceedings of the 24th ACM international on conference on information and knowledge management, Cited by: §5.
  • [7] T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duvenaud (2018) Neural ordinary differential equations. In Advances in Neural Information Processing Systems, Cited by: §A.1, §1, §2, §3.2, §3.2, §5.
  • [8] H. Dai, B. Dai, and L. Song (2016) Discriminative embeddings of latent variable models for structured data. In ICML, Cited by: §5.
  • [9] L. Dinh, D. Krueger, and Y. Bengio (2014) Nice: non-linear independent components estimation. arXiv preprint arXiv:1410.8516. Cited by: §1, §5.
  • [10] L. Dinh, J. Sohl-Dickstein, and S. Bengio (2016) Density estimation using real nvp. arXiv preprint arXiv:1605.08803. Cited by: §1, §3.2, §5.
  • [11] D. K. Duvenaud, D. Maclaurin, J. Iparraguirre, R. Bombarell, T. Hirzel, A. Aspuru-Guzik, and R. P. Adams (2015) Convolutional networks on graphs for learning molecular fingerprints. In NIPS,
  • [12] A. Fout, J. Byrd, B. Shariat, and A. Ben-Hur (2017) Protein interface prediction using graph convolutional networks. In NIPS,
  • [13] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl (2017) Neural message passing for quantum chemistry. In ICML, Cited by: §5.
  • [14] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In NIPS, Cited by: §5.
  • [15] W. Grathwohl, R. T. Chen, J. Bettencourt, I. Sutskever, and D. Duvenaud (2018) FFJORD: free-form continuous dynamics for scalable reversible generative models. arXiv preprint arXiv:1810.01367. Cited by: §1, §2, §3.2, §5.
  • [16] K. Gregor, I. Danihelka, A. Graves, D. J. Rezende, and D. Wierstra (2015) DRAW: a recurrent neural network for image generation. arXiv preprint arXiv:1502.04623. Cited by: §4.1.1.
  • [17] A. Grover and J. Leskovec (2016) Node2vec: scalable feature learning for networks. In KDD, Cited by: §5.
  • [18] A. Grover, A. Zweig, and S. Ermon (2018) Graphite: iterative generative modeling of graphs. arXiv preprint arXiv:1803.10459. Cited by: §1, §4.1.1, Table 1, Table 3, §5.
  • [19] T. Hamaguchi, H. Oiwa, M. Shimbo, and Y. Matsumoto (2017) Knowledge transfer for out-of-knowledge-base entities: a graph neural network approach. arXiv preprint arXiv:1706.05674.
  • [20] W. Hamilton, Z. Ying, and J. Leskovec (2017) Inductive representation learning on large graphs. In NIPS,
  • [21] W. L. Hamilton, R. Ying, and J. Leskovec (2017) Representation learning on graphs: methods and applications. arXiv preprint arXiv:1709.05584.
  • [22] J. He, Y. Gong, J. Marino, G. Mori, and A. Lehrmann (2018) Variational autoencoders with jointly optimized latent dependency structure. In ICLR, Cited by: §1, §4.1.1, Table 1, Table 3, §5, §5.
  • [23] G. E. Hinton (2012) A practical guide to training restricted boltzmann machines. In Neural networks: Tricks of the trade, Cited by: §5.
  • [24] H. Ishikawa (2003) Exact optimization for markov random fields with convex priors. PAMI. Cited by: §5.
  • [25] J. Johnson, A. Gupta, and L. Fei-Fei (2018) Image generation from scene graphs. In CVPR, Cited by: §4.1.2, §4.1.2.
  • [26] M. Johnson, D. K. Duvenaud, A. Wiltschko, R. P. Adams, and S. R. Datta (2016) Composing graphical models with neural networks for structured representations and fast inference. In NIPS, Cited by: §1, §5.
  • [27] A. Jyothi, T. Durand, J. He, L. Sigal, and G. Mori (2019) LayoutVAE: stochastic scene layout generation from a label set. In ICCV,
  • [28] T. Karras, T. Aila, S. Laine, and J. Lehtinen (2017) Progressive growing of gans for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196.
  • [29] D. P. Kingma and M. Welling (2013) Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114. Cited by: §1, §5.
  • [30] D. P. Kingma and P. Dhariwal (2018) Glow: generative flow with invertible 1x1 convolutions. In NIPS, Cited by: §1, §5.
  • [31] T. Kipf, E. Fetaya, K. Wang, M. Welling, and R. Zemel (2018) Neural relational inference for interacting systems. arXiv preprint arXiv:1802.04687. Cited by: §4.1.1, Table 1, Table 3, §5.
  • [32] T. N. Kipf and M. Welling (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907. Cited by: §5.
  • [33] T. N. Kipf and M. Welling (2016) Variational graph auto-encoders. arXiv preprint arXiv:1611.07308. Cited by: §4.1.1, Table 1, Table 3, §5.
  • [34] D. Koller and N. Friedman (2009) Probabilistic graphical models: principles and techniques. MIT press. Cited by: §1.
  • [35] R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata, J. Kravitz, S. Chen, Y. Kalantidis, L. Li, D. A. Shamma, et al. (2017) Visual genome: connecting language and vision using crowdsourced dense image annotations. IJCV. Cited by: §4.1.2.
  • [36] A. Krizhevsky and G. Hinton (2009) Learning multiple layers of features from tiny images. Technical report Citeseer. Cited by: §4.1.1.
  • [37] J. Lafferty, A. McCallum, and F. C. Pereira (2001) Conditional random fields: probabilistic models for segmenting and labeling sequence data. Cited by: §5.
  • [38] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, et al. (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE. Cited by: §4.1.1.
  • [39] R. Li, M. Tapaswi, R. Liao, J. Jia, R. Urtasun, and S. Fidler (2017) Situation recognition with graph neural networks. In ICCV,
  • [40] Y. Li and R. Zemel (2014) Mean-field networks. arXiv preprint arXiv:1410.5884. Cited by: §5.
  • [41] W. Lin, N. Hubacher, and M. E. Khan (2018) Variational message passing with structured inference networks. arXiv preprint arXiv:1803.05589. Cited by: §1, §4.1.1, Table 1, Table 3, §5, §5.
  • [42] Z. Liu, P. Luo, X. Wang, and X. Tang (2015) Deep learning face attributes in the wild. In ICCV, Cited by: §4.1.1.
  • [43] D. Marcheggiani, J. Bastings, and I. Titov (2018) Exploiting semantics in neural machine translation with graph convolutional networks. arXiv preprint arXiv:1804.08313.
  • [44] K. P. Murphy, Y. Weiss, and M. I. Jordan (1999) Loopy belief propagation for approximate inference: an empirical study. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, Cited by: §5.
  • [45] M. Nickel, K. Murphy, V. Tresp, and E. Gabrilovich (2016) A review of relational machine learning for knowledge graphs. Proceedings of the IEEE.
  • [46] N. Noorshams and M. J. Wainwright (2012) Stochastic belief propagation: a low-complexity alternative to the sum-product algorithm. IEEE Transactions on Information Theory. Cited by: §1.
  • [47] A. v. d. Oord, N. Kalchbrenner, and K. Kavukcuoglu (2016) Pixel recurrent neural networks. arXiv preprint arXiv:1601.06759. Cited by: §5.
  • [48] J. Pearl (2014) Probabilistic reasoning in intelligent systems: networks of plausible inference. Elsevier. Cited by: §1.
  • [49] B. Perozzi, R. Al-Rfou, and S. Skiena (2014) Deepwalk: online learning of social representations. In KDD, Cited by: §5.
  • [50] W. Ping and A. Ihler (2017) Belief propagation in conditional rbms for structured prediction. arXiv preprint arXiv:1703.00986. Cited by: §5.
  • [51] A. Quattoni, S. Wang, L. Morency, M. Collins, and T. Darrell (2007) Hidden conditional random fields. IEEE Transactions on Pattern Analysis & Machine Intelligence (10), pp. 1848–1852. Cited by: §5.
  • [52] D. J. Rezende and S. Mohamed (2015) Variational inference with normalizing flows. arXiv preprint arXiv:1505.05770. Cited by: §1, §2, §5.
  • [53] H. Rue and L. Held (2005) Gaussian markov random fields: theory and applications. Chapman and Hall/CRC. Cited by: §5.
  • [54] A. Santoro, F. Hill, D. Barrett, A. Morcos, and T. Lillicrap (2018) Measuring abstract reasoning in neural networks. In ICML,
  • [55] A. Santoro, D. Raposo, D. G. Barrett, M. Malinowski, R. Pascanu, P. Battaglia, and T. Lillicrap (2017) A simple neural network module for relational reasoning. In NIPS,
  • [56] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini (2009) The graph neural network model. IEEE Transactions on Neural Networks. Cited by: §5.
  • [57] X. Shen, S. Pan, W. Liu, Y. Ong, and Q. Sun (2018) Discrete network embedding. In IJCAI,
  • [58] M. Simonovsky and N. Komodakis (2018) Graphvae: towards generation of small graphs using variational autoencoders. In International Conference on Artificial Neural Networks, Cited by: §5.
  • [59] E. B. Sudderth, A. T. Ihler, M. Isard, W. T. Freeman, and A. S. Willsky (2010) Nonparametric belief propagation. Communications of the ACM. Cited by: §1.
  • [60] I. Sutskever, G. E. Hinton, and G. W. Taylor (2009) The recurrent temporal restricted boltzmann machine. In NIPS, Cited by: §5.
  • [61] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei (2015) Line: large-scale information network embedding. In WWW,
  • [62] A. Van Den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. W. Senior, and K. Kavukcuoglu (2016) WaveNet: a generative model for raw audio.. SSW. Cited by: §5.
  • [63] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio (2017) Graph attention networks. arXiv preprint arXiv:1710.10903. Cited by: §5.
  • [64] M. J. Wainwright, T. S. Jaakkola, and A. S. Willsky (2003) Tree-based reparameterization framework for analysis of sum-product and related algorithms. IEEE Transactions on information theory.
  • [65] M. J. Wainwright, M. I. Jordan, et al. (2008) Graphical models, exponential families, and variational inference. Foundations and Trends® in Machine Learning. Cited by: §1, §5.
  • [66] M. J. Wainwright and M. I. Jordan (2008) Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning 1. External Links: Link Cited by: §1.
  • [67] T. Yao, Y. Pan, Y. Li, and T. Mei (2018) Exploring visual relationship for image captioning. In ECCV,
  • [68] J. S. Yedidia, W. T. Freeman, and Y. Weiss (2001) Generalized belief propagation. In NIPS, Cited by: §5.
  • [69] M. Zhang, Z. Cui, M. Neumann, and Y. Chen (2018) An end-to-end deep learning architecture for graph classification. In AAAI,
  • [70] B. Zhao, L. Meng, W. Yin, and L. Sigal (2019) Image generation from layout. In CVPR, Cited by: §4.1.2.
  • [71] S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su, D. Du, C. Huang, and P. H. Torr (2015) Conditional random fields as recurrent neural networks. In Proceedings of the IEEE international conference on computer vision, Cited by: §5.

Appendix A: Continuous Graph Flow for Flexible Density Estimation

We provide supplementary materials to support the contents of the main paper. In this part, we describe implementation details of our model. We also provide additional qualitative results for both the generation tasks, namely, image puzzle generation and layout generation from scene graphs.

A.1 Implementation Details

The ODE formulation of the continuous graph flow (CGF) model was solved using the ODE solver provided by NeuralODE [7]. In this section, we provide specific details of the configuration of our CGF model for the two generation tasks used for evaluation in the paper.

Image puzzle generation. Each graph for this task comprises nodes corresponding to the puzzle pieces. Pieces that share an edge in the puzzle grid are considered connected, and an edge function is defined over those connections. In our experiments, each node is transformed to an embedding of size 64 using a convolutional layer, and graph message passing is performed over these node embeddings. The image puzzle generation model uses a multi-scale continuous graph flow architecture with two levels of downscaling, each of which factors out the channel dimension of the random variable by 2. We use two blocks of continuous graph flow before each downscaling, with four convolutional message passing blocks in each of them. Each message passing block has a unary message passing function and binary message passing functions based on the edge types, all with hidden dimension 64.

Layout generation for scene graphs. For scene graph layout generation, a graph comprises nodes corresponding to object bounding boxes described by $[x, y, w, h]$, where $(x, y)$ are the top-left coordinates and $w$ and $h$ are the bounding box width and height respectively; edge functions are defined based on the relation types. In our experiments, the layout generation model uses two blocks of continuous graph flow units, with four linear graph message passing blocks in each of them. The message passing functions use 64 hidden dimensions and take the embedding of the node label (in the unary message passing function) or of the edge label (in the binary message passing function) as input; the embedding dimension is also set to 64. For the binary message passing function, we pass messages both along the direction of an edge and in the reverse direction to increase model capacity.
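A minimal sketch of such a bidirectional binary message passing function (our own illustration; the class and parameter names are hypothetical, and 45 matches the number of Visual Genome relation types):

```python
import torch
import torch.nn as nn

class BinaryMessage(nn.Module):
    """One MLP per direction, conditioned on a learned edge-label embedding."""
    def __init__(self, dim, n_relations, emb_dim=64):
        super().__init__()
        self.rel_emb = nn.Embedding(n_relations, emb_dim)
        self.fwd = nn.Sequential(nn.Linear(2 * dim + emb_dim, dim), nn.Tanh())
        self.rev = nn.Sequential(nn.Linear(2 * dim + emb_dim, dim), nn.Tanh())

    def forward(self, x_src, x_dst, rel):
        e = self.rel_emb(rel)
        m_fwd = self.fwd(torch.cat([x_src, x_dst, e], dim=-1))  # along the edge
        m_rev = self.rev(torch.cat([x_dst, x_src, e], dim=-1))  # reverse direction
        return m_fwd, m_rev

msg = BinaryMessage(dim=64, n_relations=45)
m1, m2 = msg(torch.randn(1, 64), torch.randn(1, 64), torch.tensor([3]))
```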

A.2 Image Puzzle Generation: Additional Qualitative Results for CelebA-HQ

Fig. 5 and Fig. 6 present the image puzzles generated using unconditional and conditional generation, respectively.

Figure 5: Qualitative results on CelebA-HQ for image puzzle generation. Samples generated using our model for 3x3 CelebA-HQ puzzles in unconditional generation setting. Best viewed in color.
Figure 6: Qualitative results on CelebA-HQ for image puzzle generation. Samples generated using our model for 3x3 CelebA-HQ puzzles in conditional generation setting. Generated patches are highlighted in green. Best viewed in color.

A.3 Layout Generation from Scene Graphs: Qualitative Results

Fig. 7 and Fig. 8 show qualitative results for unconditional and conditional layout generation from scene graphs on the COCO-Stuff dataset, respectively. Fig. 9 and Fig. 10 show the corresponding results for the Visual Genome dataset. The generated results show diverse layouts corresponding to a single scene graph.

Figure 7: Examples of unconditional generation of layouts from scene graphs for the COCO-Stuff dataset. We sample 4 layouts. The generated results have different layouts while sharing the same scene graph. Best viewed in color. Please zoom in to see the category of each object.
Figure 8: Conditional generation of layouts from scene graphs for the COCO-Stuff dataset. We sample 4 layouts. The generated results have different layouts, except for the conditioned objects in (b), while sharing the same scene graph. Best viewed in color. Please zoom in to see the category of each object.
Figure 9: Unconditional generation of layouts from scene graphs for the Visual Genome dataset. We sample 4 layouts for each scene graph. The generated results have different layouts while sharing the same scene graph. Best viewed in color. Please zoom in to see the category of each object.
Figure 10: Conditional generation of layouts from scene graphs for the Visual Genome dataset. We sample 4 layouts for each scene graph. The generated results have different layouts, except for the conditioned objects in (b), while sharing the same scene graph. Best viewed in color. Please zoom in to see the category of each object.