Visual-Relation Conscious Image Generation from Structured-Text

08/05/2019
by Duc Minh Vo, et al.

Generating realistic images from text descriptions is a challenging problem with many applications, such as image editing and computer-aided design. Despite recent progress in GAN-based text-to-image generation, the literature has not yet achieved realistic image generation from complex descriptions of general scenes that contain many entities. When multiple entities are present, the relationships between them become important because they condition where each entity is located. We propose a GAN-based end-to-end network that learns the visual-relation layout between entities from the given text and conditions image generation on this layout. The network consists of a visual-relation layout module and stacking-GANs. The visual-relation layout module predicts a bounding box for every entity in the input text, so that each box corresponds uniquely to one entity while preserving the relationships that entity is involved in. Aggregating all the bounding boxes yields the visual-relation layout, which reflects the scene structure described in the text. The stacking-GANs are a stack of three GANs, each conditioned on the output of the previous GAN and on the visual-relation layout, so that the scene structure is captured consistently across stages. Our network renders entity details realistically at high resolution while preserving the scene structure. Experimental results on two public datasets show that our method outperforms state-of-the-art methods.
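The abstract does not say how the predicted bounding boxes are aggregated into a layout or how that layout is fed to each GAN stage. The following is a minimal sketch under stated assumptions, not the authors' implementation: boxes are assumed to be normalized `(x0, y0, x1, y1)` tuples, the layout is assumed to be one binary mask channel per entity, and the three generator stages are assumed to run at 64, 128 and 256 pixels. The names `box_to_mask`, `aggregate_layout` and `layouts_per_stage` are hypothetical. In the paper the boxes themselves are learned from text; here they are simply given.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical box format: (x0, y0, x1, y1), normalized to [0, 1].
Box = Tuple[float, float, float, float]
Mask = List[List[int]]


def box_to_mask(box: Box, height: int, width: int) -> Mask:
    """Rasterize one normalized bounding box into a binary height x width mask.

    A pixel is set to 1 when its center lies inside the half-open box
    [x0, x1) x [y0, y1); all other pixels are 0.
    """
    x0, y0, x1, y1 = box
    mask = []
    for i in range(height):
        cy = (i + 0.5) / height  # vertical coordinate of the pixel center
        row = []
        for j in range(width):
            cx = (j + 0.5) / width  # horizontal coordinate of the pixel center
            row.append(1 if x0 <= cx < x1 and y0 <= cy < y1 else 0)
        mask.append(row)
    return mask


def aggregate_layout(boxes: Sequence[Box], height: int, width: int) -> List[Mask]:
    """Stack one mask per entity into a (num_entities, height, width) layout.

    Keeping a separate channel per entity (assumption) means overlapping
    entities remain distinguishable, so each box still maps to one entity.
    """
    return [box_to_mask(b, height, width) for b in boxes]


def layouts_per_stage(
    boxes: Sequence[Box], resolutions: Sequence[int] = (64, 128, 256)
) -> Dict[int, List[Mask]]:
    """Rasterize the same layout at each assumed generator-stage resolution."""
    return {r: aggregate_layout(boxes, r, r) for r in resolutions}


if __name__ == "__main__":
    # Two hypothetical entities, e.g. "a man riding a horse".
    boxes = [(0.3, 0.1, 0.7, 0.6), (0.2, 0.4, 0.8, 0.95)]
    stages = layouts_per_stage(boxes)
    for res, layout in stages.items():
        print(res, len(layout), len(layout[0]), len(layout[0][0]))
```

Rasterizing the same boxes at every stage's resolution is one simple way to condition each GAN in the stack on a consistent scene structure; a learned or soft (non-binary) layout would be an equally plausible alternative.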

research
04/04/2018

Image Generation from Scene Graphs

To truly understand the visual world our models should be able not only ...
research
12/25/2019

Controllable and Progressive Image Extrapolation

Image extrapolation aims at expanding the narrow field of view of a give...
research
04/13/2023

ALR-GAN: Adaptive Layout Refinement for Text-to-Image Synthesis

We propose a novel Text-to-Image Generation Network, Adaptive Layout Ref...
research
11/28/2016

Generating Holistic 3D Scene Abstractions for Text-based Image Retrieval

Spatial relationships between objects provide important information for ...
research
09/04/2019

An Efficient and Layout-Independent Automatic License Plate Recognition System Based on the YOLO detector

In this paper, we present an efficient and layout-independent Automatic ...
research
09/07/2019

Relationships from Entity Stream

Relational reasoning is a central component of intelligent behavior, but...
research
04/10/2018

Imagine This! Scripts to Compositions to Videos

Imagining a scene described in natural language with realistic layout an...