Transformer-based Image Generation from Scene Graphs

03/08/2023
by Renato Sortino, et al.

Graph-structured scene descriptions can be used effectively in generative models to control the composition of the generated image. Previous approaches combine graph convolutional networks for layout prediction with adversarial methods for image generation. In this work, we show that encoding the graph information with multi-head attention and generating images with a transformer-based model in the latent space improves the quality of the sampled data. It also removes the need for adversarial models, which makes training more stable. Specifically, the proposed approach relies entirely on transformer architectures, both for encoding scene graphs into intermediate object layouts and for decoding these layouts into images, passing through a lower-dimensional space learned by a vector-quantized variational autoencoder. Our approach improves image quality over state-of-the-art methods and yields a higher degree of diversity among multiple generations from the same scene graph. We evaluate our approach on three public datasets: Visual Genome, COCO, and CLEVR. We achieve Inception Scores of 13.7 and 12.8 and FIDs of 52.3 and 60.3 on COCO and Visual Genome, respectively. We perform ablation studies to assess the impact of each of our contributions. Code is available at https://github.com/perceivelab/trf-sg2im
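To make two ideas in the abstract concrete, here is a minimal sketch in plain Python: a scene graph stored as (subject, predicate, object) triples and flattened into a token sequence, and the nearest-codebook lookup that a vector-quantized autoencoder uses to turn continuous latents into discrete codes. The names `Triple`, `graph_to_tokens`, and `quantize`, the token format, and the toy codebook are all assumptions made for illustration; they are not taken from the paper or its repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One scene-graph edge; subject and obj are indices into the object list."""
    subject: int
    predicate: str
    obj: int


def graph_to_tokens(objects, triples):
    """Flatten a scene graph into a token sequence.

    Hypothetical serialization (not the paper's): one token per object,
    followed by three tokens per relationship.
    """
    tokens = [f"obj:{name}" for name in objects]
    for t in triples:
        tokens += [
            f"obj:{objects[t.subject]}",
            f"rel:{t.predicate}",
            f"obj:{objects[t.obj]}",
        ]
    return tokens


def quantize(vector, codebook):
    """Return the index of the codebook entry nearest to `vector` (squared L2)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))


# Toy scene graph: "sheep standing on grass", "sky above grass".
objects = ["sheep", "grass", "sky"]
triples = [Triple(0, "standing on", 1), Triple(2, "above", 1)]
tokens = graph_to_tokens(objects, triples)

# Toy 2-D codebook and latents; real codebooks are learned and much larger.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
latents = [[0.1, -0.2], [0.9, 0.8], [0.2, 0.7]]
codes = [quantize(z, codebook) for z in latents]
```

In a pipeline of the kind the abstract describes, a sequence like `tokens` would be embedded and passed to an attention-based encoder, while discrete indices like `codes` are what a latent-space transformer would predict before a decoder maps them back to pixels.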


research
03/16/2020

Object-Centric Image Generation from Layouts

Despite recent impressive results on single-object and single-domain ima...
research
08/29/2022

Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis

Diffusion models (DMs) have shown great potential for high-quality image...
research
05/05/2019

PasteGAN: A Semi-Parametric Method to Generate Image from Scene Graph

Despite some exciting progress on high-quality image generation from str...
research
06/02/2022

Modeling Image Composition for Complex Scene Generation

We present a method that achieves state-of-the-art results on challengin...
research
02/15/2018

Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction

Structured prediction is concerned with predicting multiple inter-depend...
research
11/17/2021

Compositional Transformers for Scene Generation

We introduce the GANformer2 model, an iterative object-oriented transfor...
research
10/22/2021

MIGS: Meta Image Generation from Scene Graphs

Generation of images from scene graphs is a promising direction towards ...
