Text-to-Image Generation Grounded by Fine-Grained User Attention

11/07/2020
by Jing Yu Koh, et al.

Localized Narratives is a dataset with detailed natural language descriptions of images paired with mouse traces that provide a sparse, fine-grained visual grounding for phrases. We propose TReCS, a sequential model that exploits this grounding to generate images. TReCS uses descriptions to retrieve segmentation masks and predict object labels aligned with mouse traces. These alignments are used to select and position masks to generate a fully covered segmentation canvas; the final image is produced by a segmentation-to-image generator using this canvas. This multi-step, retrieval-based approach outperforms existing direct text-to-image generation models on both automatic metrics and human evaluations: overall, its generated images are more photo-realistic and better match descriptions.
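The canvas-building step described in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the TReCS implementation: it assumes each retrieved mask is a binary grid, that each mask is positioned at the centroid of its aligned mouse trace, that later masks overwrite earlier ones, and that pixels left uncovered receive a background label. The names `trace_centroid`, `paste_mask`, and `build_canvas` are hypothetical, and the retrieval, label prediction, and segmentation-to-image stages are not modeled.

```python
from typing import List, Sequence, Tuple

Mask = List[List[int]]        # binary mask: 1 marks an object pixel
Point = Tuple[float, float]   # (row, col) trace point in canvas pixels


def trace_centroid(trace: Sequence[Point]) -> Point:
    """Mean position of a mouse trace (assumed placement rule)."""
    rows = [p[0] for p in trace]
    cols = [p[1] for p in trace]
    return sum(rows) / len(rows), sum(cols) / len(cols)


def paste_mask(canvas: List[List[int]], mask: Mask, label: int,
               center: Point) -> None:
    """Write `label` into the canvas wherever the mask is set,
    with the mask roughly centered on `center`. Pixels that fall
    outside the canvas are skipped."""
    h, w = len(mask), len(mask[0])
    top = int(round(center[0])) - h // 2
    left = int(round(center[1])) - w // 2
    for r in range(h):
        for c in range(w):
            rr, cc = top + r, left + c
            inside = 0 <= rr < len(canvas) and 0 <= cc < len(canvas[0])
            if mask[r][c] and inside:
                canvas[rr][cc] = label


def build_canvas(height: int, width: int,
                 items: Sequence[Tuple[Mask, int, Sequence[Point]]],
                 background_label: int = 0) -> List[List[int]]:
    """Compose (mask, label, trace) triples into one label canvas.
    Starting from a background label keeps every pixel labeled
    (an assumption standing in for the paper's full coverage)."""
    canvas = [[background_label] * width for _ in range(height)]
    for mask, label, trace in items:
        paste_mask(canvas, mask, label, trace_centroid(trace))
    return canvas


if __name__ == "__main__":
    # One hypothetical 2x2 mask with label 7, traced around (2, 2).
    square = [[1, 1], [1, 1]]
    demo = build_canvas(5, 5, [(square, 7, [(2.0, 2.0), (2.0, 2.0)])])
    for row in demo:
        print(row)
```

In a full system, the resulting label canvas would be handed to a segmentation-to-image generator; that stage is omitted here because the abstract does not specify it.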


research
09/24/2021

Fine-Grained Image Generation from Bangla Text Description using Attentional Generative Adversarial Network

Generating fine-grained, realistic images from text has many application...
research
09/16/2019

Controllable Text-to-Image Generation

In this paper, we propose a novel controllable text-to-image generative ...
research
09/10/2021

Panoptic Narrative Grounding

This paper proposes Panoptic Narrative Grounding, a spatially fine and g...
research
05/12/2021

Connecting What to Say With Where to Look by Modeling Human Attention Traces

We introduce a unified framework to jointly model images, text, and huma...
research
02/09/2021

Telling the What while Pointing the Where: Fine-grained Mouse Trace and Language Supervision for Improved Image Retrieval

Existing image retrieval systems use text queries to provide a natural a...
research
09/09/2019

Neural Naturalist: Generating Fine-Grained Image Comparisons

We introduce the new Birds-to-Words dataset of 41k sentences describing ...
research
06/21/2018

Fashion-Gen: The Generative Fashion Dataset and Challenge

We introduce a new dataset of 293,008 high definition (1360 x 1360 pixel...
