Compositional Transformers for Scene Generation

11/17/2021
by   Drew A. Hudson, et al.

We introduce the GANformer2 model, an iterative object-oriented transformer for generative modeling. The network incorporates strong and explicit structural priors that reflect the compositional nature of visual scenes, and it synthesizes images through a sequential process. It operates in two stages: a fast and lightweight planning phase, where we draft a high-level scene layout, followed by an attention-based execution phase, where the layout is refined into a rich and detailed picture. Our model moves away from conventional black-box GAN architectures that feature a flat and monolithic latent space, towards a transparent design that encourages efficiency, controllability and interpretability. We demonstrate GANformer2's strengths through a careful evaluation over a range of datasets, from multi-object CLEVR scenes to the challenging COCO images, showing that it achieves state-of-the-art performance in terms of visual quality, diversity and consistency. Further experiments demonstrate the model's disentanglement and provide deeper insight into its generative process, as it proceeds step by step from a rough initial sketch, to a detailed layout that accounts for objects' depths and dependencies, and up to the final high-resolution depiction of vibrant and intricate real-world scenes. See https://github.com/dorarad/gansformer for the model implementation.
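As a rough illustration of the two-stage idea described in the abstract, the toy sketch below first plans a coarse layout object by object, then executes it into a larger image. It is not the GANformer2 implementation: it has no attention and no learned weights, and every name in it (`plan_layout`, `execute`, `generate`, the scalar "latents", the rectangle shapes) is a hypothetical stand-in chosen only to show the control flow.

```python
import random

def plan_layout(num_objects, size, rng):
    """Planning stage (toy): place objects one at a time on a coarse grid.

    Each hypothetical object gets a random rectangle and a depth value;
    where rectangles overlap, the object with the smaller depth wins.
    """
    layout = [[0] * size for _ in range(size)]            # 0 = background
    depth = [[float("inf")] * size for _ in range(size)]  # nearest depth seen so far
    for obj_id in range(1, num_objects + 1):
        x0, y0 = rng.randrange(size), rng.randrange(size)
        w, h = rng.randint(1, size), rng.randint(1, size)
        d = rng.random()
        for y in range(y0, min(size, y0 + h)):
            for x in range(x0, min(size, x0 + w)):
                if d < depth[y][x]:
                    depth[y][x] = d
                    layout[y][x] = obj_id
    return layout

def execute(layout, latents, scale):
    """Execution stage (toy): upsample the layout and fill each pixel
    with the latent value of the segment it belongs to.

    A real model would refine the layout with learned, attention-based
    layers; this stand-in only copies per-segment values.
    """
    image = []
    for row in layout:
        up_row = []
        for seg in row:
            up_row.extend([latents[seg]] * scale)
        for _ in range(scale):
            image.append(list(up_row))
    return image

def generate(num_objects=3, size=4, scale=2, seed=0):
    rng = random.Random(seed)
    # One scalar "latent" per segment, including the background (index 0).
    latents = [rng.random() for _ in range(num_objects + 1)]
    layout = plan_layout(num_objects, size, rng)
    image = execute(layout, latents, scale)
    return layout, latents, image

if __name__ == "__main__":
    layout, latents, image = generate()
    for row in layout:
        print(row)
```

Because each pixel depends only on the latent of its own segment, changing one entry of `latents` in this toy alters only that object's region, which loosely mirrors the kind of per-object control a structured latent space is meant to allow.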


research
03/01/2021

Generative Adversarial Transformers

We introduce the GANsformer, a novel and efficient type of transformer, ...
research
03/21/2023

CC3D: Layout-Conditioned Generation of Compositional 3D Scenes

In this work, we introduce CC3D, a conditional generative model that syn...
research
06/02/2022

Modeling Image Composition for Complex Scene Generation

We present a method that achieves state-of-the-art results on challengin...
research
08/29/2022

Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis

Diffusion models (DMs) have shown great potential for high-quality image...
research
03/08/2023

Transformer-based Image Generation from Scene Graphs

Graph-structured scene descriptions can be efficiently used in generativ...
research
02/18/2021

Multi-Agent Reinforcement Learning of 3D Furniture Layout Simulation in Indoor Graphics Scenes

In the industrial interior design process, professional designers plan t...
research
07/19/2018

Compositional GAN: Learning Conditional Image Composition

Generative Adversarial Networks (GANs) can produce images of surprising ...
