Visual Madlibs: Fill in the blank Image Generation and Question Answering

05/31/2015
by Licheng Yu, et al.

In this paper, we introduce a new dataset consisting of 360,001 focused natural language descriptions for 10,738 images. This dataset, the Visual Madlibs dataset, is collected using automatically produced fill-in-the-blank templates designed to gather targeted descriptions about people and objects, their appearances, activities, and interactions, as well as inferences about the general scene or its broader context. We provide several analyses of the Visual Madlibs dataset and demonstrate its applicability to two new description generation tasks: focused description generation and multiple-choice question-answering for images. Experiments using joint-embedding and deep learning methods show promising results on these tasks.
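As a rough illustration of how multiple-choice question-answering could work in a joint embedding space, the sketch below scores each candidate answer by its cosine similarity to an image vector and picks the highest-scoring one. It is a minimal sketch under stated assumptions, not the authors' method: the function names `cosine` and `pick_answer`, the three-dimensional toy vectors, and the example answer strings are all hypothetical, and a real system would obtain the vectors from trained image and text encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pick_answer(image_vec, choice_vecs):
    """Return (index of best choice, list of scores) using cosine similarity.

    Assumes image_vec and every entry of choice_vecs already live in the
    same (hypothetical) joint embedding space.
    """
    scores = [cosine(image_vec, c) for c in choice_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy, hand-made vectors standing in for learned embeddings (assumption).
image_vec = [0.9, 0.1, 0.0]
choices = ["playing frisbee", "cooking dinner"]
choice_vecs = [
    [0.8, 0.2, 0.1],  # hypothetical embedding of "playing frisbee"
    [0.0, 0.1, 0.9],  # hypothetical embedding of "cooking dinner"
]

best, scores = pick_answer(image_vec, choice_vecs)
print(choices[best])
```

With these toy vectors the first candidate lies closest to the image vector, so it is the one selected. The same scoring function could be reused to rank any number of filled-in blanks for a given template.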


