VQA: Visual Question Answering

05/03/2015
by   Aishwarya Agrawal, et al.
0

We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language answer. Mirroring real-world scenarios, such as helping the visually impaired, both the questions and answers are open-ended. Visual questions selectively target different areas of an image, including background details and underlying context. As a result, a system that succeeds at VQA typically needs a more detailed understanding of the image and complex reasoning than a system producing generic image captions. Moreover, VQA is amenable to automatic evaluation, since many open-ended answers contain only a few words or a closed set of answers that can be provided in a multiple-choice format. We provide a dataset containing 0.25M images, 0.76M questions, and 10M answers (www.visualqa.org), and discuss the information it provides. Numerous baselines and methods for VQA are provided and compared with human performance. Our VQA demo is available on CloudCV (http://cloudcv.org/vqa).

READ FULL TEXT

page 1

page 3

page 6

page 18

page 19

page 22

page 23

page 24

research
06/03/2018

On the Flip Side: Identifying Counterexamples in Visual Question Answering

Visual question answering (VQA) models respond to open-ended natural lan...
research
10/20/2016

Proposing Plausible Answers for Open-ended Visual Question Answering

Answering open-ended questions is an essential capability for any intell...
research
06/08/2023

Knowledge Detection by Relevant Question and Image Attributes in Visual Question Answering

Visual question answering (VQA) is a Multidisciplinary research problem ...
research
04/23/2020

Visual Question Answering Using Semantic Information from Image Descriptions

Visual question answering (VQA) is a task that requires AI systems to di...
research
10/24/2020

Beyond VQA: Generating Multi-word Answer and Rationale to Visual Questions

Visual Question Answering is a multi-modal task that aims to measure hig...
research
05/07/2023

OpenViVQA: Task, Dataset, and Multimodal Fusion Models for Visual Question Answering in Vietnamese

In recent years, visual question answering (VQA) has attracted attention...
research
09/21/2016

The Color of the Cat is Gray: 1 Million Full-Sentences Visual Question Answering (FSVQA)

Visual Question Answering (VQA) task has showcased a new stage of intera...

Please sign up or login with your details

Forgot password? Click here to reset