Efficient Generation of Structured Objects with Constrained Adversarial Networks

07/26/2020 · Luca Di Liello et al. · Università di Trento

Generative Adversarial Networks (GANs) struggle to generate structured objects like molecules and game maps. The issue is that structured objects must satisfy hard requirements (e.g., molecules must be chemically valid) that are difficult to acquire from examples alone. As a remedy, we propose Constrained Adversarial Networks (CANs), an extension of GANs in which the constraints are embedded into the model during training. This is achieved by penalizing the generator proportionally to the mass it allocates to invalid structures. In contrast to other generative models, CANs support efficient inference of valid structures (with high probability) and allow the learned constraints to be turned on and off at inference time. CANs handle arbitrary logical constraints and leverage knowledge compilation techniques to efficiently evaluate the disagreement between the model and the constraints. Our setup is further extended to hybrid logical-neural constraints for capturing very complex requirements, like graph reachability. An extensive empirical analysis shows that CANs efficiently generate valid structures that are both high-quality and novel.


1 Introduction

Many key applications require generating objects that satisfy hard structural constraints, like drug molecules, which must be chemically valid, and game levels, which must be playable. Despite their impressive success (Karras et al., 2018; Zhang et al., 2017; Zhu et al., 2017), Generative Adversarial Networks (GANs) (Goodfellow et al., 2014) struggle in these applications. The reason is that data alone, especially if noisy, are often insufficient to capture the structural constraints and convey them to the model.

As a remedy, we derive Constrained Adversarial Networks (CANs), which extend GANs to generating valid structures with high probability. Given a set of arbitrary discrete constraints, CANs achieve this by penalizing the generator for allocating mass to invalid objects during training. The penalty term is implemented using the semantic loss (SL) (Xu et al., 2018), which turns the discrete constraints into a differentiable loss function implemented as an arithmetic circuit (i.e., a polynomial). The SL is probabilistically sound, can be evaluated exactly, and supports end-to-end training. Importantly, the arithmetic circuit – which can be quite large, depending on the complexity of the constraints – can be thrown away after training. CANs handle complex constraints, like reachability on graphs, by first embedding configurations in a space in which the constraints can be encoded compactly, and then applying the SL to the embeddings.

Since the constraints are embedded directly into the generator, high-quality structures can be sampled efficiently (in time practically independent of the complexity of the constraints) with a simple forward pass on the generator, as in regular GANs. No costly sampling or optimization steps are needed. We additionally show how to equip CANs with the ability to switch constraints on and off dynamically during inference, at no run-time cost.

Contributions. Summarizing, we contribute: 1) CANs, an extension of GANs in which the generator is encouraged at training time to generate valid structures while supporting efficient sampling, 2) native support for intractably complex constraints, 3) conditional CANs, an effective solution for dynamically turning constraints on and off at inference time, 4) a thorough empirical study on real-world data showing that CANs generate structures that are likely valid and coherent with the training data.

2 Related Work

Structured generative tasks have traditionally been tackled using probabilistic graphical models (Koller and Friedman, 2009) and grammars (Talton et al., 2012), which lack support for representation learning and efficient sampling under constraints. Tractable probabilistic circuits (Poon and Domingos, 2011; Kisa et al., 2014) are a recent alternative that makes use of ideas from knowledge compilation (Darwiche and Marquis, 2002) to provide efficient generation of valid structures. These approaches generate valid objects by constructing a circuit (a polynomial) that encodes both the hard constraints and the probabilistic structure of the problem. Although inference is linear in the size of the circuit, the latter can grow very large if the constraints are complex enough. In contrast, CANs model the probabilistic structure of the problem using a neural architecture, while relying on knowledge compilation for encoding the hard constraints during training. For this reason, the resulting circuit is much more compact. Moreover, the circuit can be discarded at inference time. The time and space complexity of sampling from CANs is therefore roughly independent of the complexity of the constraints in practice.

Deep generative models developed for structured tasks are special-purpose, in that they rely on ad-hoc architectures, tackle specific applications, or do not support efficient sampling (Guimaraes et al., 2017; De Cao and Kipf, 2018; Xue and van Hoeve, 2019; Torrado et al., 2019). Some recent approaches have focused on incorporating a constraint learning component in the training of deep generative models, using reinforcement learning (De Cao and Kipf, 2018) or inverse reinforcement learning (Hu et al., 2018) techniques. This direction is complementary to ours and is useful when constraints are not known in advance or cannot be easily formalized as functions of the generator output. Indeed, our experiment on molecule generation shows the advantages of enriching CANs with constraint learning to generate high-quality and diverse molecules.

Other general approaches for injecting knowledge into neural nets (like deep statistical-relational models (Lippi and Frasconi, 2009; Manhaeve et al., 2018; Marra and Kuželka, 2019), tensor-based models (Rocktäschel and Riedel, 2017; Donadello et al., 2017), and fuzzy logic-based models (Marra et al., 2019)) are either not generative or require the constraints to be available at inference time.

3 Unconstrained GANs

GANs (Goodfellow et al., 2014) are composed of two neural nets: a discriminator $d$ trained to recognize “real” objects sampled from the data distribution $p_{data}$, and a generator $g$ that maps random latent vectors $z \sim p(z)$ to objects that fool the discriminator. Learning amounts to solving the minimax game with value function:

$$\min_g \max_d \; V(g, d) \;=\; \mathbb{E}_{x \sim p_{data}}\!\left[\log p_d(x)\right] \;+\; \mathbb{E}_{x \sim p_g}\!\left[\log\big(1 - p_d(x)\big)\right] \qquad (1)$$

Here $p_g$ and $p_d$ are the distributions induced by the generator and the discriminator, respectively. New objects can be sampled by mapping random latent vectors through the generator, i.e., $x = g(z)$ with $z \sim p(z)$. Under idealized assumptions, the learned generator matches the data distribution:

Theorem 1 (Goodfellow et al. (2014)).

If $g$ and $d$ are non-parametric and the leftmost expectation in Eq. 1 is approximated arbitrarily well by the data, the global equilibrium of Eq. 1 satisfies $p_g = p_{data}$ and $p_d \equiv \tfrac{1}{2}$.

In practice, training GANs is notoriously hard (Salimans et al., 2016; Mescheder et al., 2018). The most common failure mode is mode collapse, in which the generated objects are clustered in a tiny region of the object space. Remedies include using alternative objective functions Goodfellow et al. (2014), divergences (Nowozin et al., 2016; Arjovsky et al., 2017) and regularizers (Miyato et al., 2018). In our experiments, we apply some of these techniques to stabilize training.

In structured tasks, the objects of interest are usually discrete. In the following, we focus on stochastic generators that output a categorical distribution $P_g(x \mid z)$ over the discrete object space $\mathcal{X}$, and objects are sampled from the latter. In this case, $p_g(x) = \mathbb{E}_{z \sim p(z)}\!\left[P_g(x \mid z)\right]$.
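To make the setup concrete, the following is a minimal TensorFlow sketch (our own illustration, not the architecture used in the experiments) of a stochastic generator that outputs per-variable probabilities for a vector of binary features, together with the standard non-saturating GAN losses; layer sizes and feature counts are arbitrary placeholders.

```python
import tensorflow as tf

# Illustrative only: a stochastic generator over N_VARS binary features and a
# discriminator with the standard non-saturating GAN losses.
N_VARS, LATENT = 16, 32

generator = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(LATENT,)),
    tf.keras.layers.Dense(N_VARS, activation="sigmoid"),  # P(x_i = 1 | z)
])
discriminator = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(N_VARS,)),
    tf.keras.layers.Dense(1),  # logit of "x is real"
])

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def gan_losses(real_batch, z):
    probs = generator(z)                        # soft objects (probabilities)
    d_real = discriminator(real_batch)
    d_fake = discriminator(probs)
    d_loss = bce(tf.ones_like(d_real), d_real) + bce(tf.zeros_like(d_fake), d_fake)
    g_loss = bce(tf.ones_like(d_fake), d_fake)  # non-saturating generator loss
    return d_loss, g_loss
```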

4 Generating Structures with CANs

Our goal is to learn a deep generative model that outputs structures consistent with validity constraints and an unobserved distribution $P^*$. We assume to be given: i) a feature map that extracts binary features $x_1, \ldots, x_n$ from each object, and ii) a single validity constraint $\varphi$ encoded as a Boolean formula over these features. Any discrete structured space can be encoded this way.

4.1 Limitations of GANs

Standard GANs are likely to output invalid structures, for two main reasons. First, the VC dimension of unrestricted discrete formulas is exponential in the number of variables (Vapnik and Chervonenkis, 2015). Hence, the number of examples necessary to capture any non-trivial constraint can be intractably large. This rules out learning the rules of chemical validity or, worse still, node reachability from even moderately large data sets. Second, in many cases of interest the examples are noisy and do violate $\varphi$. In this more challenging case, it can be shown that the data lure GANs into learning not to satisfy the constraint:

Corollary 1.

Under the assumptions of Theorem 1, given a target distribution $P^*$, a constraint $\varphi$ consistent with it, and a dataset of examples sampled i.i.d. from a corrupted distribution $\tilde{P}$ inconsistent with $\varphi$, GANs associate non-zero mass to infeasible objects.

Indeed, by Theorem 1 the optimal generator satisfies $p_g = \tilde{P}$, which is inconsistent with $\varphi$. Since Theorem 1 captures the intent of GAN training, this corollary shows that GANs are by design incapable of handling invalid examples.

4.2 Constrained Adversarial Networks

Constrained Adversarial Networks (CANs) avoid these issues by taking both the data and the target structural constraint $\varphi$ as inputs. The value function is designed so that the generator maximizes the probability of generating valid structures. In order to derive CANs it is convenient to start from the following alternative GAN value function (Goodfellow et al., 2014): $\min_g \; \mathbb{E}_{x \sim p_g}\!\left[-\log p_d(x)\right]$.

Let $(g, d)$ be a GAN and $\varphi$ be a fixed discriminator that distinguishes between valid and invalid structures, i.e., it accepts $x$ iff $x \models \varphi$, where $\models$ indicates logical entailment. Ideally, we wish the generator to never output invalid structures. This can be achieved by using an aggregate discriminator $d \wedge \varphi$ that only accepts configurations that are both valid and high-quality w.r.t. $d$. Let $\mathbb{1}_d(x)$ be the indicator that $d$ classifies $x$ as real, and similarly for $\mathbb{1}_\varphi(x)$ and $\mathbb{1}_{d \wedge \varphi}(x)$. By definition:

$$\mathbb{1}_{d \wedge \varphi}(x) \;=\; \mathbb{1}_d(x) \cdot \mathbb{1}_\varphi(x) \qquad (2)$$

Plugging the aggregate discriminator into the alternative value function gives:

$$\min_g \; \mathbb{E}_{x \sim p_g}\!\left[-\log p_{d \wedge \varphi}(x)\right] \qquad (3)$$
$$=\; \min_g \; \mathbb{E}_{x \sim p_g}\!\left[-\log\big(p_d(x)\,\mathbb{1}_\varphi(x)\big)\right] \qquad (4)$$
$$=\; \min_g \; \mathbb{E}_{x \sim p_g}\!\left[-\log p_d(x)\right] + \mathbb{E}_{x \sim p_g}\!\left[-\log \mathbb{1}_\varphi(x)\right] \qquad (5)$$
$$=\; \min_g \; \mathbb{E}_{x \sim p_g}\!\left[-\log p_d(x)\right] + \mathbb{E}_{z \sim p(z)}\,\mathbb{E}_{x \sim P_g(\cdot \mid z)}\!\left[-\log \mathbb{1}_\varphi(x)\right] \qquad (6)$$

The second step holds because $\mathbb{1}_\varphi$ does not depend on $d$. If $p_g$ allocates non-zero mass to any measurable subset of invalid structures, the second term becomes $+\infty$. This is consistent with our goal but problematic for learning. A better alternative is to optimize the lower bound (obtained by Jensen's inequality):

$$\mathbb{E}_{z \sim p(z)}\!\left[-\log \mathbb{E}_{x \sim P_g(\cdot \mid z)}\!\left[\mathbb{1}_\varphi(x)\right]\right] \;=\; \mathbb{E}_{z \sim p(z)}\!\left[-\log P\big(x \models \varphi \mid x \sim P_g(\cdot \mid z)\big)\right] \qquad (7)$$

This term is the semantic loss (SL) proposed in (Xu et al., 2018) to inject knowledge into neural networks. The SL is much smoother than the original term and only evaluates to $+\infty$ if $P_g(\cdot \mid z)$ allocates all the mass to infeasible configurations. This immediately leads to the CAN value function:

$$V_{\mathrm{CAN}}(g, d) \;=\; V(g, d) \;+\; \lambda \, \mathbb{E}_{z \sim p(z)}\!\left[\mathrm{SL}\big(\varphi, P_g(\cdot \mid z)\big)\right] \qquad (8)$$

where $\lambda > 0$ is a newly-introduced hyper-parameter controlling the importance of the constraint. The SL can be viewed as the negative log-likelihood of the constraint $\varphi$ under the generator's distribution. This shows that it rewards the generator proportionally to the mass it allocates to valid structures. The SL can be rewritten as:

$$\mathrm{SL}\big(\varphi, p\big) \;=\; -\log \sum_{x \,\models\, \varphi} \; \prod_{i \,:\, x_i = 1} p_i \prod_{i \,:\, x_i = 0} \big(1 - p_i\big) \qquad (9)$$

Since the SL is the negative logarithm of a polynomial in $p$, it is fully differentiable (so long as the argument of the logarithm is strictly positive, which is always the case in practice). In practice, below we apply the semantic loss term directly to the probabilities output by the generator, i.e., $p = g(z)$.
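As a concrete illustration of Eq. 9, the sketch below computes the SL by brute-force enumeration of the models of a toy constraint (feasible only for a handful of variables; the following subsection explains how knowledge compilation avoids this blow-up). The constraint and the probabilities are hypothetical examples, not taken from the experiments.

```python
import itertools
import tensorflow as tf

def semantic_loss(probs, constraint, n_vars):
    """Brute-force SL: -log of the weighted model count of `constraint`.

    `probs` holds the generator's per-variable probabilities P(x_i = 1 | z);
    `constraint` is any Boolean function over a 0/1 assignment tuple.
    """
    total = 0.0
    for x in itertools.product([0, 1], repeat=n_vars):
        if not constraint(x):
            continue  # only satisfying assignments contribute
        weight = 1.0
        for i, xi in enumerate(x):
            weight *= probs[i] if xi == 1 else (1.0 - probs[i])
        total += weight
    return -tf.math.log(total + 1e-12)

# Toy example: "exactly one of three variables is on".
exactly_one = lambda x: sum(x) == 1
loss = semantic_loss(tf.constant([0.7, 0.2, 0.1]), exactly_one, n_vars=3)
```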

If the SL is given a large enough weight then it gets closer to the ideal “hard” discriminator, and therefore more strongly encourages the CAN to generate valid structures. Under the preconditions of Theorem 1, it can be shown that for a large enough $\lambda$ CANs generate valid structures only:

Proposition 1.

Under the assumptions of Corollary 1, CANs associate zero mass to infeasible objects, irrespective of the discrepancy between $\tilde{P}$ and $P^*$.

This holds because any global equilibrium of Eq. 8 must minimize the second term. If $g$ is non-parametric, then the minimum is attained by $\mathrm{SL}(\varphi, P_g(\cdot \mid z)) = 0$ for all $z$, or equivalently $P(x \models \varphi \mid x \sim P_g(\cdot \mid z)) = 1$, which implies $p_g(x) = 0$ for every $x \not\models \varphi$, proving the claim. Of course, as with standard GANs, the prerequisites are often violated in practice. Regardless, Proposition 1 works as a sanity check, and shows that, in contrast to GANs, CANs are appropriate for structured generative tasks.

Figure 1: Left: fuzzy logic encoding (using the Łukasiewicz t-norm) of XOR in CNF format as a function of the probabilities $p_1$ and $p_2$ of the two variables. Middle: encoding of XOR in DNF format. Right: SL of either encoding.

A possible alternative for introducing a differentiable knowledge-based loss into the value function consists in relaxing the constraints using fuzzy logic, as done in a number of recent works on deep discriminative learning (Donadello et al., 2017; Marra et al., 2019). Apart from lacking a formal derivation in terms of the expected probability of satisfying the constraints, the issue is that fuzzy logic is not semantically sound, meaning that equivalent encodings of the same constraint may give different loss functions (Giannini et al., 2018). Figure 1 illustrates this inconsistency using an XOR constraint: the “fuzzy loss” and its gradient change radically depending on whether the XOR is encoded as CNF (left) or DNF (middle), while the SL is unaffected (right).
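The following toy computation (our own illustration of Figure 1) makes the inconsistency concrete: the Łukasiewicz relaxations of the CNF and DNF encodings of XOR disagree at the same point, while the SL depends only on the models of the formula.

```python
import math

# Numerical illustration of Figure 1: the Łukasiewicz relaxation of XOR changes
# with the chosen encoding (CNF vs. DNF), while the semantic loss depends only
# on the models of XOR, namely {10, 01}.
def luk_and(a, b): return max(0.0, a + b - 1.0)   # Łukasiewicz t-norm
def luk_or(a, b):  return min(1.0, a + b)          # Łukasiewicz t-conorm
def luk_not(a):    return 1.0 - a

def xor_cnf(p1, p2):   # (p1 OR p2) AND (NOT p1 OR NOT p2)
    return luk_and(luk_or(p1, p2), luk_or(luk_not(p1), luk_not(p2)))

def xor_dnf(p1, p2):   # (p1 AND NOT p2) OR (NOT p1 AND p2)
    return luk_or(luk_and(p1, luk_not(p2)), luk_and(luk_not(p1), p2))

def sl_xor(p1, p2):    # -log of the weighted model count of XOR
    return -math.log(p1 * (1 - p2) + (1 - p1) * p2)

p1 = p2 = 0.6
print(xor_cnf(p1, p2), xor_dnf(p1, p2), sl_xor(p1, p2))
# 0.8 vs. 0.0: the two encodings give very different degrees of satisfaction,
# whereas the SL is the same regardless of how the formula is written.
```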

Evaluating the Semantic Loss

The sum in Eq. 9 represents the unnormalized probability of sampling a valid configuration from $p$. Computing it amounts to Weighted Model Counting (WMC) (Chavira and Darwiche, 2008), i.e. summing over all solutions of $\varphi$, each weighted according to its probability with respect to $p$. Naïvely implementing the SL as in Eq. 9 is infeasible in most cases, as it involves summing over exponentially many configurations. Knowledge compilation (KC) (Darwiche and Marquis, 2002) is a well-known approach in automated reasoning, and solving WMC through KC is a state-of-the-art technique for answering probabilistic queries in many discrete graphical models (Chavira and Darwiche, 2008; Fierens et al., 2015; Van den Broeck et al., 2011). These techniques work by compiling the problem into a more compact representation and are particularly effective when the logical knowledge does not change over time. As pointed out in (Xu et al., 2018), KC comes to the rescue by making the SL much more efficient to evaluate during training, at the cost of an off-line compilation phase.
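The sketch below illustrates the idea behind circuit-based evaluation: once the constraint has been compiled into an arithmetic circuit of sums, products and literal weights, the WMC (and hence the SL) is obtained with a single bottom-up pass. The tiny XOR circuit here is hand-built for illustration; in practice the circuit is produced automatically by a knowledge compiler such as PySDD.

```python
import math

# The compiled constraint is an arithmetic circuit: internal nodes are sums and
# products, leaves are literal weights p_i or (1 - p_i). Evaluating the circuit
# bottom-up yields the WMC in time linear in the circuit size.
def evaluate(node, p):
    kind, payload = node
    if kind == "lit":                      # ("lit", (var_index, polarity))
        i, positive = payload
        return p[i] if positive else 1.0 - p[i]
    vals = [evaluate(child, p) for child in payload]
    return sum(vals) if kind == "+" else math.prod(vals)

# XOR(x1, x2) compiled by hand as p1*(1 - p2) + (1 - p1)*p2.
xor_circuit = ("+", [
    ("*", [("lit", (0, True)),  ("lit", (1, False))]),
    ("*", [("lit", (0, False)), ("lit", (1, True))]),
])

p = [0.6, 0.6]
semantic_loss = -math.log(evaluate(xor_circuit, p))
```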

The main downside of KC is that, depending on the complexity of , the compiled circuit may be very large. This is less of an issue during training, which is often performed on powerful machines, but it can be problematic for inference, especially on embedded devices. A major advantage of CANs is that the circuit is not required for inference (as the latter consists of a simple forward pass over the generator), and can thus be thrown away after training. This means that CANs incur no space penalty during inference compared to GANs.

The embedding function

When fed a particularly complex constraint, knowledge compilation may produce a circuit that is too large even for the training stage. For such intractable constraints, we approximate the semantic loss by first mapping the objects from the original space to an application-specific space in which $\varphi$ can be expressed in compact form, and then applying the semantic loss on top of the transformed objects. We successfully employed this technique to synthesize Mario levels in which the goal tile is reachable from the starting tile; all details are provided below. The same technique can be exploited for dealing with very complex logical formulas beyond the reach of state-of-the-art knowledge compilation.

4.3 Conditional CANs

So far we described how to use the SL for enforcing structural constraints on the generator's output. Since the SL can be applied to any distribution over binary variables, it can also be used to enforce conditional constraints that can be turned on and off at inference time. Specifically, we notice that the constraint can also involve latent variables, and we show how this can be leveraged for different purposes. Similarly to InfoGANs (Chen et al., 2016), the generator's input is augmented with an additional binary vector $y$. Instead of maximizing (an approximation of) the mutual information between $y$ and the generator's output, the SL is used to logically bind the input codes to semantic features or constraints of interest. Let $\psi_1, \ldots, \psi_k$ be constraints of interest. In order to make them switchable, we extend the latent vector with fresh variables $y_1, \ldots, y_k$ and train the CAN using the constraint:

$$\bigwedge_{i=1}^{k} \big(y_i \rightarrow \psi_i\big)$$

where the prior $p(y)$ used during training is estimated from data.
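As a toy illustration of switchable constraints (hypothetical constraint and probabilities, not those used in the experiments), the sketch below evaluates by enumeration the SL of a single implication $y_1 \rightarrow \psi_1$: when the code $y_1$ is on the constraint is enforced, when it is off the SL vanishes.

```python
import itertools
import math

# Toy illustration: y_1 is a fresh latent code and psi_1 requires at least one
# of x_1, x_2 to be set. The switchable constraint is the implication
# y_1 -> psi_1.
def psi_1(x):
    return x[0] == 1 or x[1] == 1

def switchable(assignment):          # assignment = (y_1, x_1, x_2)
    y1, x = assignment[0], assignment[1:]
    return (not y1) or psi_1(x)

def semantic_loss(probs, formula, n):
    wmc = sum(
        math.prod(p if v else 1 - p for p, v in zip(probs, a))
        for a in itertools.product([0, 1], repeat=n)
        if formula(a)
    )
    return -math.log(wmc)

# probs = [P(y_1 = 1)] + generator probabilities for x_1, x_2.
print(semantic_loss([1.0, 0.3, 0.4], switchable, n=3))  # code on: psi_1 enforced
print(semantic_loss([0.0, 0.3, 0.4], switchable, n=3))  # code off: loss is 0
```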

Using a conditional SL term during training results in a model that can be conditioned to generate objects with desired, arbitrarily complex properties at inference time. Additionally, this feature has a beneficial effect in mitigating mode collapse during training, as reported in Section 5.2.

5 Experiments

Our experimental evaluation aims at answering the following questions:

  • (Q1) Can CANs with tractable constraints achieve better results than GANs?

  • (Q2) Can CANs with intractable constraints achieve better results than GANs?

  • (Q3) Can constraints be combined with rewards to achieve better results than using rewards only?

We implemented CANs using TensorFlow and used PySDD (pypi.org/project/PySDD/) to perform knowledge compilation. We tested CANs using different generator architectures on three real-world structured generative tasks; details and code can be found in the Supplementary material. In all cases, we evaluated the objects generated by CANs and those of the baselines using three metrics (adopted from Samanta et al. (2018)): validity is the proportion of sampled objects that are valid; novelty is the proportion of valid sampled objects that are not present in the training data; and uniqueness is the proportion of valid unique (non-repeated) sampled objects.
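For clarity, the three metrics can be computed as in the simple sketch below; `is_valid` stands for the domain-specific validity checker (an assumption of this sketch) and samples are assumed to be hashable (e.g. strings or tuples of tiles).

```python
# Sketch of the three evaluation metrics used throughout the experiments.
def evaluate_batch(samples, training_set, is_valid):
    valid = [s for s in samples if is_valid(s)]
    validity = len(valid) / len(samples)
    novelty = sum(s not in training_set for s in valid) / max(len(valid), 1)
    uniqueness = len(set(valid)) / max(len(valid), 1)
    return validity, novelty, uniqueness
```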

5.1 Super Mario Bros level generation

In this experiment we show how CANs can help in the challenging task of learning to generate videogame levels from user-authored content. While procedural approaches to videogame level generation have been used successfully for decades, the application of machine learning techniques to the creation of (functional) content is a relatively new area of research (Summerville et al., 2018). On the one hand, modern video game levels are characterized by aesthetic features that cannot be formally encoded and are thus difficult to implement in a procedure, which motivates the use of ML techniques for the task. On the other hand, the levels often have to satisfy a set of functional (hard) constraints that are easy to guarantee when the generator is hand-coded but pose challenges for current machine learning models.

Architectures for Super Mario Bros level generation include LSTMs (Summerville and Mateas, 2016), probabilistic graphical models (Guzdial and Riedl, 2016), and multi-dimensional MCMC (Snodgrass and Ontanón, 2016). MarioGANs Torrado et al. (2019) are specifically designed for level generation, but they only constrain the mixture of tiles appearing in the level. This technique cannot be easily generalized to arbitrary constraints.

In the following, we show how the semantic loss can be used to encode useful hard constraints in the context of videogame level generation. These constraints might be functional requirements that apply to every generated object or might be contextually used to steer the generation towards objects with certain properties. In our empirical analysis, we focus on Super Mario Bros (SMB), possibly one of the most studied video games in tile-based level generation.

Recently, Volz et al. (2018) applied Wasserstein GANs (WGANs) (Arjovsky et al., 2017) to SMB level generation. The approach works by first training a generator in the usual way, and then using an evolutionary algorithm called Covariance Matrix Adaptation Evolution Strategy (CMA-ES) to search for the best latent vectors according to a user-defined fitness function on the corresponding levels. We stress that this technique is orthogonal to CANs and the two can be combined together. We adopt the same experimental setting, WGAN architecture and training procedure as Volz et al. (2018). The structured objects are tile-based representations of SMB levels (e.g. Fig. 2) and the training data is obtained by sliding a fixed-size tile window over levels from the Video Game Level Corpus (Summerville et al., 2016).

We run all the experiments on a machine with a single 1080Ti GPU.

5.1.1 CANs with tractable constraints: generating SMB levels with pipes

In this experiment, the focus is on showing how CANs can effectively deal with constraints that can be directly encoded over the generator output. Pipes are made of four different types of tiles. They can have a variable height but the general structure is always the same: two tiles (top-left and top-right) on top and one or more pairs of body tiles (body-left and body-right) below (see the CAN - pipes picture in Fig. 2 for examples of valid pipes). Since encoding all possible dispositions and combinations of pipes in a level would result in an extremely large propositional formula, we apply the constraint locally to a small window that is slid, horizontally and vertically, by one tile at a time (notice that all structural properties of pipes are covered using this method). The constraint is a conjunction of implications of the type “if this is a top-left tile, then the tile below must be a body-left one” (see the Supplementary material for the full formula). The relative importance of the constraint is determined by the hyper-parameter $\lambda$ (see Eq. 8).
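The sketch below shows how such a locally-compiled constraint can be applied to the generator output by sliding a window over the tile probabilities; it is a simplified illustration, where the 2x2 window size and the per-window loss `window_sl` are placeholders rather than the exact setup used in the experiments.

```python
import tensorflow as tf

# Apply the SL of a constraint compiled over a single window to every window
# position of the generator's tile probabilities (illustrative only).
def sliding_window_sl(level_probs, window_sl, wh=2, ww=2):
    # level_probs: (height, width, n_tile_types) probabilities output by g(z).
    h, w, _ = level_probs.shape
    losses = []
    for i in range(h - wh + 1):        # slide vertically, one tile at a time
        for j in range(w - ww + 1):    # slide horizontally, one tile at a time
            losses.append(window_sl(level_probs[i:i + wh, j:j + ww, :]))
    return tf.reduce_mean(losses)      # aggregated SL penalty for this level
```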

Figure 2: Examples of SMB levels generated by GAN and CAN (panels, left to right: GAN - pipes, CAN - pipes, GAN - playable, CAN - playable). Left: generating levels containing pipes; right: generating reachable levels. For each of the two settings we report prototypical examples of levels generated by GAN (first and third picture) and CAN (second and fourth picture). Notice how all pipes generated by CAN are valid, contrary to what happens for GAN, and that the GAN generates a level that is not playable (because of the big jump at the start of the map).

There are two major problems in the application of the constraint on pipes when using a large $\lambda$: i) vanishing pipes: the generator can satisfy the constraint by simply generating levels without pipes; ii) mode collapse: the generator may learn to place pipes always in the same positions. We address both issues by introducing the SL only after an initial bootstrap phase in which the generator learns to generate sensible objects, and by linearly increasing its weight from zero to its final value. The final value of $\lambda$ was chosen as the highest value allowing to retain at least 80% of pipe tiles on average with respect to a plain GAN. All experiments were run for the same number of epochs.

Table 1 reports experimental results comparing GAN and CAN trained on all levels containing pipes. CAN manages to almost double the validity of the generated levels (see the two left pictures in Fig. 2 for some prototypical examples) while retaining about 82% of the pipe tiles and without any significant loss in terms of diversity (as measured by the L1 norm of the difference between each pair of levels in the generated batch), at the cost of roughly doubled training time. Inference is real-time (< 40 ms) for both architectures.

These results allow us to answer Q1 affirmatively.

Model # Maps Validity Avg pipe tiles / level L1 Norm Training time
GAN 7 47.6% 7.8 0.0115 1h 12m
CAN 7 83.2% 6.4 0.0110 2h 2m
Table 1: Comparison between GAN and CAN on SMB level generation with pipes. The 7 maps containing pipes are mario-1-1, mario-2-1, mario-3-1, mario-4-1, mario-4-2, mario-6-2 and mario-8-1, for a total of training samples. Results report validity, average number of pipe tiles per level, L1 norm on the difference between each pair of levels in the generated batch and training time. Inference is real-time (< 40 ms) for both architectures.

5.1.2 CANs with intractable constraints: generating playable SMB levels

In the following we show how CANs can be successfully applied in settings where constraints are too complex to be directly encoded over the generator output. A level is playable if there is a feasible path (according to the game's physics) from the left-most to the right-most column of the level. We refer to this property as reachability. We compare CANs with CMA-ES, as both techniques can be used to steer the network towards the generation of playable levels. In CMA-ES, the fitness function does not have to be differentiable, and playability is computed on the output of an A* agent (the same used in (Volz et al., 2018)) playing the level. Using the SL to steer the generation towards playable levels is not trivial, since it requires a differentiable definition of playability. Directly encoding the constraint in propositional logic is intractable: consider the size of a propositional formula describing all possible paths a player can follow in the level. We thus define the playability constraint on the output of an embedding function (modelled as a feedforward NN) that approximates tile reachability. The function is trained to predict whether each tile is reachable from the left-most column using traces obtained from the A* agent. See the Supplementary material for the details.
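The sketch below illustrates one possible way to turn the output of such a pretrained reachability network into a differentiable playability penalty; it is a hedged simplification of the idea described above, not the exact formulation used in the experiments.

```python
import tensorflow as tf

# A pretrained embedding network maps tile probabilities to per-tile
# reachability probabilities; the penalty is the negative log-probability that
# at least one tile of the right-most column is reachable.
def playability_loss(level_probs, reachability_net):
    reach = reachability_net(level_probs[None, ...])[0]  # (height, width) probs
    last_col = reach[:, -1]
    p_playable = 1.0 - tf.reduce_prod(1.0 - last_col)    # P(some tile reachable)
    return -tf.math.log(p_playable + 1e-12)              # SL-style penalty
```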

Network type Level Tested samples Validity Training time Inference time per sample
GAN mario-1-3 1000 9.80% 1 h 15 min 40 ms
GAN + CMA-ES mario-1-3 1000 65.90% 1 h 15 min 22 min
CAN mario-1-3 1000 71.60% 1 h 34 min 40 ms
GAN mario-3-3 1000 13.00% 1 h 11 min 40 ms
GAN + CMA-ES mario-3-3 1000 64.20% 1 h 11 min 22 min
CAN mario-3-3 1000 62.30% 1 h 27 min 40 ms
Table 2: Results on the generation of playable SMB levels. Levels mario-1-3 ( training samples) and mario-3-3 ( training samples) were chosen due to their high solving complexity. Results compare a baseline GAN, a GAN combined with CMA-ES and a CAN. Validity is defined as the ability of the A* agent to complete the level. Note that inference time for GAN and CAN is measured in milliseconds while time for GAN + CMA-ES is in minutes.

Table 2 shows the validity of a batch of levels generated respectively by a plain GAN, a GAN combined with CMA-ES using the default parameters for the search, and a forward pass of CAN. Each training run lasted the same number of epochs with all the default hyper-parameters defined in Volz et al. (2018), and the SL was activated after an initial bootstrap phase with a weight that validation experiments showed to be a reasonable trade-off between the SL and the generator loss. Results show that CANs achieve better (mario-1-3) or comparable (mario-3-3) validity with respect to GAN + CMA-ES at a fraction of the inference time. At the cost of pretraining the reachability function, CANs avoid the execution of the A* agent during generation and sample high-quality objects in milliseconds (as compared to minutes), thus enabling applications that create new levels at run time. Moreover, no significant quality degradation can be seen on the generated levels compared to the ones generated by the plain GAN (which, on the other hand, fails most of the time to generate reachable levels), as can be seen in Fig. 2.

With these results, we can answer Q2 affirmatively.

5.2 Molecule generation

Most approaches to molecule generation use variational autoencoders (VAEs) (Gómez-Bombarelli et al., 2018; Kusner et al., 2017; Dai et al., 2018; Samanta et al., 2019), or more expensive techniques like MCMC (Seff et al., 2019). Closest to CANs are ORGANs (Guimaraes et al., 2017) and MolGANs (De Cao and Kipf, 2018), which respectively combine Sequence GANs (SeqGANs) and Graph Convolutional Networks (GCNs) with a reward network that optimizes specific chemical properties. Albeit comparing favorably with both sequence models (Jaques et al., 2017; Guimaraes et al., 2017) (using SMILES representations) and likelihood-based methods, MolGANs are reported to be susceptible to mode collapse.

In this experiment, we investigate Q3 by combining MolGAN’s adversarial training and reinforcement learning objective with a conditional SL term on the task of generating molecules with certain desirable chemical properties.

In contrast with our previous experimental settings, here the structured objects are undirected graphs of bounded maximum size, represented by discrete tensors that encode the atom/node type (padding atom (no atom), Carbon, Nitrogen, Oxygen, Fluorine) and the bond/edge type (padding bond (no bond), single, double, triple and aromatic bond). During training, the network implicitly rewards validity and the maximization of three chemical properties at once: QED (druglikeness), SA (synthesizability) and logP (solubility). The training is stopped once the uniqueness drops below a given threshold. We augment the MolGAN architecture with a conditional SL term, making use of additional latent dimensions to control the presence of each of the atom types considered in the experiment, as shown in Section 4.3.
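One natural encoding of the atom-presence condition described above is sketched below (a hypothetical, closed-form illustration over the generator's node-type marginals, far smaller than the real model): latent code $y_c = 1$ should force at least one node to carry atom type $c$.

```python
import math

# node_probs[v][c] is the generator's probability that node v has atom type c;
# y_prob is P(y_c = 1) for the latent code associated with atom type c.
def atom_presence_sl(y_prob, node_probs, atom_type):
    p_absent = math.prod(1.0 - p[atom_type] for p in node_probs)
    # WMC of (y_c -> "type present"): satisfied when y_c = 0, or y_c = 1 and present.
    wmc = (1.0 - y_prob) + y_prob * (1.0 - p_absent)
    return -math.log(wmc + 1e-12)
```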

Conditioning the generation of molecules on specific atoms at training time mitigates the drop in uniqueness caused by the reward network during training. This allows the model to be trained for more epochs and results in higher quality molecules, as reported in Table 3. The experimental setting and evaluation metrics are identical to De Cao and Kipf (2018) except for the introduction of the SL; we thus report the same results for the baseline.

In this experiment, we train the model on an NVIDIA RTX 2080 Ti. The total training time is around 1 hour, and inference is real-time. Using CANs produced a negligible overhead during training with respect to the original model, providing further evidence that the technique does not heavily impact training.

These results suggest that coupling CANs with a reinforcement learning objective is beneficial, answering Q3 affirmatively.

Reward for        SL     Validity  Uniqueness  Diversity  QED   SA    logP
QED + SA + logP   False  97.4      2.4         91.0       47.0  84.0  65.0
QED + SA + logP   True   96.6      2.5         98.8       51.8  90.7  73.6
Table 3: Results of using the semantic loss on the MolGAN architecture. The diversity score is obtained by comparing sub-structures of generated samples against a random subset of the dataset. A lower score indicates a higher amount of repetitions between the generated samples and the dataset. The first row refers to the results reported in the MolGAN paper.

6 Conclusion

We presented Constrained Adversarial Networks (CANs), a generalization of GANs in which the generator is encouraged during training to output valid structures. CANs make use of the semantic loss (Xu et al., 2018) to penalize the generator proportionally to the mass it allocates to invalid structures. As in GANs, generating valid structures (on average) requires a simple forward pass on the generator. Importantly, the data structures used by the SL, which can be large if the structural constraints are very complex, are discarded after training. CANs proved effective in improving the quality of the generated structures without significantly affecting inference run-time, and conditional CANs proved useful in promoting diversity of the generator's outputs.

Broader Impact

Broadly speaking, this work aims at improving the reliability of structures and configurations generated via machine learning approaches. This can have a strong impact on a wide range of research fields and application domains, from drug design and protein engineering to layout synthesis and urban planning. Indeed, the lack of reliability of machine-generated outcomes is one of the main obstacles to a wider adoption of machine learning technology in our societies. On the other hand, there is a risk of overestimating the reliability of the outputs of CANs, which are only guaranteed to satisfy constraints in expectation. For applications in which invalid structures should be avoided, like safety-critical applications, the objects output by CANs should always be validated before use.

From an artificial intelligence perspective, this work supports the line of thought that in order to overcome the current limitations of AI there is a need for combining machine learning and especially deep learning technology with approaches from knowledge representation and automated reasoning, and that principled ways to achieve this integration should be pursued.

References

  • M. Arjovsky, S. Chintala, and L. Bottou (2017) Wasserstein generative adversarial networks. In International conference on machine learning, pp. 214–223. Cited by: §3, §5.1, Super Mario Bros Level Generation.
  • M. Chavira and A. Darwiche (2008) On probabilistic inference by weighted model counting. Artificial Intelligence 172 (6-7), pp. 772–799. Cited by: §4.2.
  • X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel (2016) Infogan: interpretable representation learning by information maximizing generative adversarial nets. In Advances in neural information processing systems, pp. 2172–2180. Cited by: §4.3.
  • H. Dai, Y. Tian, B. Dai, S. Skiena, and L. Song (2018) Syntax-directed variational autoencoder for molecule generation. In Proceedings of the International Conference on Learning Representations, Cited by: §5.2.
  • A. Darwiche and P. Marquis (2002) A knowledge compilation map. Journal of Artificial Intelligence Research 17, pp. 229–264. Cited by: §2, §4.2.
  • N. De Cao and T. Kipf (2018) MolGAN: An implicit generative model for small molecular graphs. arXiv preprint arXiv:1805.11973. Cited by: §2, §5.2, Molecule Generation, footnote 5.
  • I. Donadello, L. Serafini, and A. d’Avila Garcez (2017) Logic tensor networks for semantic image interpretation. Cited by: §2, §4.2.
  • D. Fierens, G. Van den Broeck, J. Renkens, D. Shterionov, B. Gutmann, I. Thon, G. Janssens, and L. De Raedt (2015) Inference and learning in probabilistic logic programs using weighted boolean formulas. Theory and Practice of Logic Programming 15 (3), pp. 358–401. Cited by: §4.2.
  • F. Giannini, M. Diligenti, M. Gori, and M. Maggini (2018) On a convex logic fragment for learning and reasoning. IEEE Transactions on Fuzzy Systems 27 (7), pp. 1407–1416. Cited by: §4.2.
  • R. Gómez-Bombarelli, J. N. Wei, D. Duvenaud, J. M. Hernández-Lobato, B. Sánchez-Lengeling, D. Sheberla, J. Aguilera-Iparraguirre, T. D. Hirzel, R. P. Adams, and A. Aspuru-Guzik (2018) Automatic chemical design using a data-driven continuous representation of molecules. ACS central science 4 (2), pp. 268–276. Cited by: §5.2.
  • I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in neural information processing systems, pp. 2672–2680. Cited by: §1, §3, §3, §4.2, Theorem 1.
  • G. L. Guimaraes, B. Sanchez-Lengeling, C. Outeiral, P. L. C. Farias, and A. Aspuru-Guzik (2017) Objective-reinforced generative adversarial networks (ORGAN) for sequence generation models. arXiv preprint arXiv:1705.10843. Cited by: §2, §5.2.
  • M. Guzdial and M. Riedl (2016) Game level generation from gameplay videos. In Twelfth Artificial Intelligence and Interactive Digital Entertainment Conference, Cited by: §5.1.
  • Z. Hu, Z. Yang, R. R. Salakhutdinov, L. Qin, X. Liang, H. Dong, and E. P. Xing (2018) Deep generative models with learnable knowledge constraints. In Advances in Neural Information Processing Systems, pp. 10501–10512. Cited by: §2.
  • N. Jaques, S. Gu, D. Bahdanau, J. M. Hernández-Lobato, R. E. Turner, and D. Eck (2017) Sequence tutor: conservative fine-tuning of sequence generation models with kl-control. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 1645–1654. Cited by: §5.2.
  • T. Karras, T. Aila, S. Laine, and J. Lehtinen (2018) Progressive growing of gans for improved quality, stability, and variation. Cited by: §1.
  • D. Kisa, G. Van den Broeck, A. Choi, and A. Darwiche (2014) Probabilistic sentential decision diagrams. In Fourteenth International Conference on the Principles of Knowledge Representation and Reasoning, Cited by: §2.
  • D. Koller and N. Friedman (2009) Probabilistic graphical models: principles and techniques. MIT press. Cited by: §2.
  • M. J. Kusner, B. Paige, and J. M. Hernández-Lobato (2017) Grammar variational autoencoder. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 1945–1954. Cited by: §5.2.
  • M. Lippi and P. Frasconi (2009) Prediction of protein -residue contacts by markov logic networks with grounding-specific weights. Bioinformatics 25 (18), pp. 2326–2333. Cited by: §2.
  • R. Manhaeve, S. Dumancic, A. Kimmig, T. Demeester, and L. De Raedt (2018) DeepProbLog: Neural probabilistic logic programming. In Advances in Neural Information Processing Systems, pp. 3749–3759. Cited by: §2.
  • G. Marra, F. Giannini, M. Diligenti, and M. Gori (2019) LYRICS: a General Interface Layer to Integrate AI and Deep Learning. arXiv preprint arXiv:1903.07534. Cited by: §2, §4.2.
  • G. Marra and O. Kuželka (2019) Neural markov logic networks. arXiv preprint arXiv:1905.13462. Cited by: §2.
  • L. Mescheder, A. Geiger, and S. Nowozin (2018) Which training methods for gans do actually converge?. arXiv preprint arXiv:1801.04406. Cited by: §3.
  • T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida (2018) Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957. Cited by: §3.
  • S. Nowozin, B. Cseke, and R. Tomioka (2016) f-GAN: Training generative neural samplers using variational divergence minimization. In Advances in neural information processing systems, pp. 271–279. Cited by: §3.
  • H. Poon and P. Domingos (2011) Sum-product networks: a new deep architecture. In 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), pp. 689–690. Cited by: §2.
  • R. Ramakrishnan, P. O. Dral, M. Rupp, and O. A. von Lilienfeld (2014) Quantum chemistry structures and properties of 134 kilo molecules. Scientific Data 1. Cited by: Molecule Generation.
  • T. Rocktäschel and S. Riedel (2017) End-to-end differentiable proving. In Advances in Neural Information Processing Systems, pp. 3788–3800. Cited by: §2.
  • L. Ruddigkeit, R. van Deursen, L. C. Blum, and J. Reymond (2012) Enumeration of 166 billion organic small molecules in the chemical universe database gdb-17. Journal of Chemical Information and Modeling 52 (11), pp. 2864–2875. Note: PMID: 23088335 External Links: Document, Link, https://doi.org/10.1021/ci300415d Cited by: Molecule Generation.
  • T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen (2016) Improved techniques for training gans. In Advances in neural information processing systems, pp. 2234–2242. Cited by: §3.
  • B. Samanta, D. Abir, G. Jana, P. K. Chattaraj, N. Ganguly, and M. G. Rodriguez (2019) Nevae: a deep generative model for molecular graphs. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 1110–1117. Cited by: §5.2.
  • B. Samanta, A. De, N. Ganguly, and M. Gomez-Rodriguez (2018) Designing random graph models using variational autoencoders with applications to chemical design. arXiv preprint arXiv:1802.05283. Cited by: §5.
  • A. Seff, W. Zhou, F. Damani, A. Doyle, and R. P. Adams (2019) Discrete object generation with reversible inductive construction. arXiv preprint arXiv:1907.08268. Cited by: §5.2.
  • S. Snodgrass and S. Ontanón (2016) Controllable procedural content generation via constrained multi-dimensional markov chain sampling. In IJCAI, pp. 780–786. Cited by: §5.1.
  • A. J. Summerville, S. Snodgrass, M. Mateas, and S. Ontanón (2016) The vglc: the video game level corpus. arXiv preprint arXiv:1606.07487. Cited by: §5.1.
  • A. Summerville and M. Mateas (2016) Super mario as a string: platformer level generation via lstms. arXiv preprint arXiv:1603.00930. Cited by: §5.1.
  • A. Summerville, S. Snodgrass, M. Guzdial, C. Holmgård, A. K. Hoover, A. Isaksen, A. Nealen, and J. Togelius (2018) Procedural content generation via machine learning (pcgml). IEEE Transactions on Games 10 (3), pp. 257–270. Cited by: §5.1.
  • J. Talton, L. Yang, R. Kumar, M. Lim, N. Goodman, and R. Měch (2012) Learning design patterns with bayesian grammar induction. In Proceedings of the 25th annual ACM symposium on User interface software and technology, pp. 63–74. Cited by: §2.
  • R. R. Torrado, A. Khalifa, M. C. Green, N. Justesen, S. Risi, and J. Togelius (2019) Bootstrapping conditional gans for video game level generation. arXiv preprint arXiv:1910.01603. Cited by: §2, §5.1.
  • G. Van den Broeck, N. Taghipour, W. Meert, J. Davis, and L. De Raedt (2011) Lifted probabilistic inference by first-order knowledge compilation. In Proceedings of the Twenty-Second international joint conference on Artificial Intelligence, pp. 2178–2185. Cited by: §4.2.
  • V. N. Vapnik and A. Y. Chervonenkis (2015) On the uniform convergence of relative frequencies of events to their probabilities. In Measures of complexity, pp. 11–30. Cited by: §4.1.
  • V. Volz, J. Schrum, J. Liu, S. M. Lucas, A. Smith, and S. Risi (2018) Evolving mario levels in the latent space of a deep convolutional generative adversarial network. In Proceedings of the Genetic and Evolutionary Computation Conference, pp. 221–228. Cited by: §5.1.2, §5.1.2, §5.1, Super Mario Bros Level Generation.
  • J. Xu, Z. Zhang, T. Friedman, Y. Liang, and G. Van den Broeck (2018) A semantic loss function for deep learning with symbolic knowledge. In International Conference on Machine Learning, pp. 5498–5507. Cited by: §1, §4.2, §4.2, §6.
  • Y. Xue and W. van Hoeve (2019) Embedding decision diagrams into generative adversarial networks. In International Conference on Integration of Constraint Programming, Artificial Intelligence, and Operations Research, pp. 616–632. Cited by: §2.
  • H. Zhang, T. Xu, H. Li, S. Zhang, X. Wang, X. Huang, and D. N. Metaxas (2017) Stackgan: text to photo-realistic image synthesis with stacked generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 5907–5915. Cited by: §1.
  • J. Zhu, T. Park, P. Isola, and A. A. Efros (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE international conference on computer vision, pp. 2223–2232. Cited by: §1.

Supplementary Material: implementation details

Super Mario Bros Level Generation

The deep neural network for this experiment is based on the DCGANs used in [Volz et al., 2018]. Batch normalization and ReLU are applied between the layers of the generator G, while batch normalization and Leaky ReLU have been used for the discriminator D. In the last layer of the generator we apply a softmax activation function to obtain probabilities that are finally given as input to the Semantic Loss. The generation of samples, instead, is done by applying to the same output a stretched softmax function followed by an argmax, as in [Volz et al., 2018].

The networks have been trained following the WGAN guidelines [Arjovsky et al., 2017]. Thus, several discriminator iterations are performed for each iteration on the generator. RMSProp has been used as the optimizer, with a constant learning rate, and the batch size has been kept fixed across experiments. Layers have been initialized using a normal initializer for both the generator and the discriminator. Moreover, weight clipping is applied to the weights of the discriminator. Finally, the size of the latent vector has been set to 32 and it is sampled from a normal distribution.

Table 4 shows the network architectures of the generator G and the discriminator D.

Part Input Shape Output Shape Layer Type Kernel Stride
(32) (1, 1, 32) Reshape. - -
(1, 1, 32) (, , 16) Deconv.
(, , 16) (, , 8) Deconv.
(, , 8) (, , 4) Deconv.
(, , 4) (, , 13) Deconv.
(, , 13) (, , 64) Conv.
(, , 64) (, , 128) Conv.
(, , 128) (, , 256) Conv.
(, , 256) (1, 1, 1) Conv.
Table 4: Super Mario Bros Level Generation network architecture.

Details about the pipes constraint

Figure 3: A decomposed pipe

As reported in the main paper, the experiment with the constraint on pipes has been run for a fixed number of epochs. Figure 3 shows the various parts composing a pipe and their disposition. Call X the tensor of Boolean variables corresponding to the output of the generator. Remember that we apply the constraint separately to small windows, with each pixel having five channels. The channels represent the probabilities of the tiles: [top-left, top-right, body-left, body-right, others]. In particular, in the last channel we collapse all the probabilities of the tiles that do not belong to pipes (air, monsters, walls, ...). Then, given these Boolean variables, the list of clauses composing the final constraint can be written as:

top-left tile requires top-right tile on the right and vice-versa
body-left tile requires body-right tile on the right and vice-versa
top-left tile requires body-left tile below
top-right tile requires body-right tile below
body-left tile requires a body-left or a top-left tile above
body-right tile requires a body-right or a top-right tile above
One-hot encoding over all the four positions of the window

Notice that the first two indices describe the position within the window, e.g. (0, 0) means the upper left corner of the window, and the third index defines the tile type.
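A compact sketch of these clauses in code follows (hypothetical variable encoding over an assumed 2x2 window; the one-hot constraint on the last line of the list is omitted for brevity).

```python
from itertools import product

# X[i][j][t] is the Boolean variable "the tile at window position (i, j) has
# type t", with types ordered as the channels above. Each clause is a
# disjunction of literals; ("not", v) denotes a negated literal.
TL, TR, BL, BR, OTHER = range(5)

def pipe_cnf(X, h=2, w=2):
    clauses = []
    for i, j in product(range(h), range(w)):
        if j + 1 < w:  # top-left <-> top-right and body-left <-> body-right pairs
            clauses.append([("not", X[i][j][TL]), X[i][j + 1][TR]])
            clauses.append([("not", X[i][j + 1][TR]), X[i][j][TL]])
            clauses.append([("not", X[i][j][BL]), X[i][j + 1][BR]])
            clauses.append([("not", X[i][j + 1][BR]), X[i][j][BL]])
        if i + 1 < h:  # a top tile requires the matching body tile below
            clauses.append([("not", X[i][j][TL]), X[i + 1][j][BL]])
            clauses.append([("not", X[i][j][TR]), X[i + 1][j][BR]])
        if i > 0:      # a body tile requires a body or top tile of the same side above
            clauses.append([("not", X[i][j][BL]), X[i - 1][j][BL], X[i - 1][j][TL]])
            clauses.append([("not", X[i][j][BR]), X[i - 1][j][BR], X[i - 1][j][TR]])
    return clauses
```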

Details about the reachability constraint

The feedforward neural network is implemented as a CNN with two final dense layers, whose architecture is described in Table 5.

Input Shape Output Shape Layer Type Kernel Stride
(, , 13) (, , 8) Conv.
(, , 8) (, , 16) Conv.
(, , 16) (, , 24) Conv.
(, , 24) (, , 32) Conv.
(, , 32) (, , 64) Conv.
(, , 64) (, , 96) Conv.
(, , 96) (, , 128) Conv.
(, , 128) (, , 192) Conv.
(, , 192) (, , 32) Dense - -
(, , 32) (, , 2) Dense - -
Table 5: Reachability Network architecture.

The approximation network achieves high accuracy and F1-score. Figure 4 shows examples of how reachability maps have been approximated by the network. The first column contains levels generated by the GAN. The binary maps have been obtained by summing the probabilities of all the solid tiles (ground, pipes, ...), shown in white. The second column contains the reachability maps, computed by the A* agent and averaged over different runs. Finally, the third column shows the reachability maps approximated by the neural network. Notice how well the approximation works: the second and third columns are almost indistinguishable.

Figure 4: Given some generated levels (first column), this picture compares the real reachability maps computed by the A* agent (second column) with those approximated by the network (third column).

Molecule Generation

The MolGAN architecture is composed of three networks: the generator G, the discriminator D and the reward network R. D and R share the same architecture, but are trained with different objectives. R is trained to predict the product of the QED, SA and logP metrics (normalized to [0, 1]). G is optimized to produce samples that maximize the output of R and are convincing to D; on top of that, the conditional semantic loss is also applied.
While R is trained in parallel with G and D, the loss of G with respect to the output of R is activated only after 150 epochs, at which point the adversarial loss stops being used. The semantic loss is applied to G from the start to the end of the training, and the semantic and adversarial losses of G are combined with fixed weights.
Improved WGAN is used as the adversarial loss between G and D (with 5 discriminator updates per generator update), whereas R is trained to estimate the desired target by mean squared error, and G is trained with deep deterministic policy gradient w.r.t. the output of R, which is seen as a reward to maximize.
Training proceeds until the uniqueness of the generated batch falls below a given threshold. During training, the learning rate is kept at a constant value, the batch size is fixed and there is no dropout. Adam with beta1 = 0.9 and beta2 = 0.999 is the optimizer of choice. Batch discrimination is used.
Results are obtained by evaluating a batch of generated samples.
The input noise has dimension 32 if the semantic loss is not applied, and 36 if it is applied, with the first 32 dimensions sampled from a standard normal distribution and the last four from a uniform distribution.
The maximum number of nodes for each molecule is 9, with 5 possible atom types and 5 bond types. Each molecule is represented by two matrices, one mapping each node to a label, and one adjacency matrix informing about the presence or lack of edges between nodes, and their type.
The input noise is received by G and processed by three fully connected layers with non-linear activations; a linear projection followed by a softmax is then applied to the output of the last layer to match the size of the adjacency matrices.
D and R share the same architecture (no parameters are shared), based on two relational graph convolutions [De Cao and Kipf, 2018], followed by an aggregation as in [De Cao and Kipf, 2018] to obtain a graph-level representation of 128 features. Two fully connected layers then reduce the graph embedding to a single output value, with a non-linear activation for the hidden layer and with a sigmoid being applied to the output in the case of R. The MolGAN architecture is trained on the QM9 dataset [Ramakrishnan et al., 2014; Ruddigkeit et al., 2012] of roughly 134k small molecules.