Controllable and Compositional Generation with Latent-Space Energy-Based Models

by Weili Nie, et al.

Controllable generation is one of the key requirements for successful adoption of deep generative models in real-world applications, but it remains a great challenge. In particular, the compositional ability to generate novel concept combinations is out of reach for most current models. In this work, we use energy-based models (EBMs) to handle compositional generation over a set of attributes. To make them scalable to high-resolution image generation, we introduce an EBM in the latent space of a pre-trained generative model such as StyleGAN. We propose a novel EBM formulation that represents the joint distribution of data and attributes, and we show how sampling from it can be formulated as solving an ordinary differential equation (ODE). Given a pre-trained generator, all we need for controllable generation is to train an attribute classifier. Sampling with ODEs is done efficiently in the latent space and is robust to hyperparameters. Thus, our method is simple, fast to train, and efficient to sample from. Experimental results show that our method outperforms the state of the art in both conditional sampling and sequential editing. In compositional generation, our method excels at zero-shot generation of unseen attribute combinations. Also, by composing energy functions with logical operators, this work is the first to achieve such compositionality in generating photo-realistic images of resolution 1024x1024.
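To make the idea of a latent-space energy built from attribute classifiers more concrete, here is a minimal toy sketch. It is not the authors' formulation or their ODE. It assumes a 2-D latent with a standard Gaussian prior, hypothetical logistic "classifiers" with hand-picked weights, a joint energy formed by summing the prior energy and one classifier energy per attribute (a conjunction of attributes), and a plain gradient flow dz/dt = -∇E(z) integrated with explicit Euler steps. All function names and parameters are illustrative.

```python
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attribute_energy(z, w, b, target):
    # E(z) = -log p(c = target | z) for a toy logistic classifier
    # with logit w.z + b (hypothetical, hand-picked weights).
    sign = 1.0 if target == 1 else -1.0
    return softplus(-sign * (z @ w + b))

def attribute_energy_grad(z, w, b, target):
    # Analytic gradient of attribute_energy with respect to z.
    sign = 1.0 if target == 1 else -1.0
    return -sign * sigmoid(-sign * (z @ w + b)) * w

def joint_energy(z, attrs):
    # Gaussian prior energy plus one classifier energy per attribute.
    # Summing energies corresponds to requiring all attributes at once.
    return 0.5 * (z @ z) + sum(attribute_energy(z, w, b, t) for w, b, t in attrs)

def joint_energy_grad(z, attrs):
    return z + sum(attribute_energy_grad(z, w, b, t) for w, b, t in attrs)

def integrate_gradient_flow(z0, attrs, step=0.05, n_steps=400):
    # Explicit Euler integration of dz/dt = -grad E(z).
    # This is a simplification chosen for the sketch, not the paper's sampler.
    z = np.array(z0, dtype=float)
    for _ in range(n_steps):
        z = z - step * joint_energy_grad(z, attrs)
    return z

# Two toy attributes: the first should be present (target 1),
# the second absent (target 0).
attrs = [
    (np.array([4.0, 0.0]), 0.0, 1),
    (np.array([0.0, 4.0]), 0.0, 0),
]
z0 = np.array([-1.0, 1.0])
z_final = integrate_gradient_flow(z0, attrs)

p_first = sigmoid(z_final @ attrs[0][0] + attrs[0][1])
p_second = sigmoid(z_final @ attrs[1][0] + attrs[1][1])
print("final latent:", z_final)
print("p(attribute 1 present):", p_first)
print("p(attribute 2 present):", p_second)
```

In a real pipeline the final latent would be passed to a pre-trained generator to produce an image; that step is omitted here because no generator is assumed. The prior term keeps the latent close to the region the generator was trained on while the classifier terms pull it toward the requested attribute combination.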




Seen to Unseen: Exploring Compositional Generalization of Multi-Attribute Controllable Dialogue Generation

Existing controllable dialogue generation work focuses on the single-att...

Exploring the Effectiveness of Mask-Guided Feature Modulation as a Mechanism for Localized Style Editing of Real Images

The success of Deep Generative Models at high-resolution image generatio...

Exploring Compositional Visual Generation with Latent Classifier Guidance

Diffusion probabilistic models have achieved enormous success in the fie...

Latent Constraints: Learning to Generate Conditionally from Unconditional Generative Models

Deep generative neural networks have proven effective at both conditiona...

OptGAN: Optimizing and Interpreting the Latent Space of the Conditional Text-to-Image GANs

Text-to-image generation intends to automatically produce a photo-realis...

Evaluation of Latent Space Disentanglement in the Presence of Interdependent Attributes

Controllable music generation with deep generative models has become inc...

MacLaSa: Multi-Aspect Controllable Text Generation via Efficient Sampling from Compact Latent Space

Multi-aspect controllable text generation aims to generate fluent senten...
