Text2Scene: Generating Abstract Scenes from Textual Descriptions

09/04/2018
by   Fuwen Tan, et al.

In this paper, we propose an end-to-end model that learns to interpret natural language describing a scene and to generate an abstract pictorial representation of it. The pictorial representations generated by our model comprise the spatial distribution and attributes of the objects in the described scene. Our model uses a sequence-to-sequence network with a double attentive mechanism and introduces a regularization strategy. These scene representations can be sampled from our model in much the same way as text is sampled from language-generation models. We show that the proposed model, initially designed to generate cartoon-like pictorial representations in the Abstract Scenes Dataset, can also generate, with minimal modifications, semantic layouts corresponding to real images in the COCO dataset. Human evaluations using a visual entailment task show that pictorial representations generated with our full model entail at least one out of three input visual descriptions 94% of the time and at least two out of three 62% of the time.
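The idea of sampling a scene the way a language model samples text can be illustrated with a toy sketch: objects are drawn one at a time, each conditioned on the description and on what has already been placed, until an end symbol is drawn. This is only an illustration under stated assumptions, not the paper's model. The vocabulary, grid size, attribute set, and the hand-written scoring function `toy_next_step` are all hypothetical stand-ins for the learned network.

```python
import random
import re

# Hypothetical vocabulary, grid, and attributes; none of these come from the paper.
OBJECTS = ["boy", "girl", "dog", "ball", "tree", "<end>"]
GRID_W, GRID_H = 4, 3
ATTRIBUTES = ["small", "medium", "large"]


def toy_next_step(description, scene_so_far, rng):
    """Stand-in for a learned distribution p(next object | text, scene so far).

    A real model would score candidates with a neural network. This toy
    version simply prefers objects named in the description that have not
    been placed yet.
    """
    placed = {obj for obj, _, _ in scene_so_far}
    words = set(re.findall(r"[a-z]+", description.lower()))
    weights = []
    for obj in OBJECTS:
        if obj == "<end>":
            # Stopping becomes more likely as the scene fills up.
            weights.append(1.0 + 2.0 * len(scene_so_far))
        elif obj in placed:
            weights.append(0.0)  # never place the same object twice
        elif obj in words:
            weights.append(5.0)
        else:
            weights.append(0.5)
    obj = rng.choices(OBJECTS, weights=weights, k=1)[0]
    if obj == "<end>":
        return None
    location = (rng.randrange(GRID_W), rng.randrange(GRID_H))
    attribute = rng.choice(ATTRIBUTES)
    return obj, location, attribute


def sample_scene(description, max_objects=5, seed=None):
    """Sample a scene as a list of (object, (x, y), attribute) tuples."""
    rng = random.Random(seed)
    scene = []
    while len(scene) < max_objects:
        step = toy_next_step(description, scene, rng)
        if step is None:
            break
        scene.append(step)
    return scene


if __name__ == "__main__":
    for obj, loc, attr in sample_scene("The boy throws a ball to the dog.", seed=0):
        print(obj, loc, attr)
```

Calling `sample_scene` repeatedly with different seeds yields different scenes for the same description, which is the sense in which a scene is sampled rather than predicted deterministically.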


