Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering

04/11/2017
by Vahid Kazemi, et al.

This paper presents a new baseline for the visual question answering task. Given an image and a question in natural language, our model produces accurate answers according to the content of the image. While being architecturally simple and relatively small in terms of trainable parameters, our model sets a new state of the art on both the unbalanced and balanced VQA benchmarks. On the VQA 1.0 open-ended challenge, our model achieves 64.6% on the test-standard set without using additional data, an improvement of 0.4% over the state of the art, and on the newly released VQA 2.0, it scores 59.7% on the validation set, outperforming the best previously reported results by 0.5%. The results presented in this paper are especially interesting because very similar models have been tried before, but significantly lower performance was reported. In light of the new results, we hope to see more meaningful research on visual question answering in the future.
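The abstract describes an architecturally simple baseline: CNN image features, a recurrent question encoder, and question-conditioned soft attention over image regions feeding an answer classifier. The sketch below illustrates that general pattern with random stand-in parameters; the specific dimensions, the number of attention glimpses, and the layer shapes are assumptions for illustration, not values quoted from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed dimensions for illustration only: a 14x14 grid of 2048-d CNN
# features, a 1024-d question embedding, two attention glimpses, and a
# softmax over a fixed answer vocabulary.
REGIONS, IMG_DIM = 14 * 14, 2048
Q_DIM, HIDDEN, GLIMPSES, N_ANSWERS = 1024, 512, 2, 3000

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Random stand-in parameters; a real model would learn these.
W_img = rng.normal(0.0, 0.01, (IMG_DIM, HIDDEN))
W_q = rng.normal(0.0, 0.01, (Q_DIM, HIDDEN))
W_att = rng.normal(0.0, 0.01, (HIDDEN, GLIMPSES))
W_out = rng.normal(0.0, 0.01, (GLIMPSES * IMG_DIM + Q_DIM, N_ANSWERS))

def attend_and_answer(img_feats, q_emb):
    """Soft attention over image regions conditioned on the question.

    img_feats: (REGIONS, IMG_DIM) grid of CNN features for one image.
    q_emb:     (Q_DIM,) question embedding from a recurrent encoder.
    Returns a probability distribution over candidate answers.
    """
    # Project image regions and the question into a shared space,
    # combine them, and score every region for each attention glimpse.
    h = np.tanh(img_feats @ W_img + q_emb @ W_q)   # (REGIONS, HIDDEN)
    att = softmax(h @ W_att, axis=0)               # (REGIONS, GLIMPSES)
    # Attention-weighted sums of the image features, one per glimpse.
    glimpses = (att.T @ img_feats).reshape(-1)     # (GLIMPSES * IMG_DIM,)
    # Classify from the concatenated glimpses and question embedding.
    logits = np.concatenate([glimpses, q_emb]) @ W_out
    return softmax(logits)

probs = attend_and_answer(rng.normal(size=(REGIONS, IMG_DIM)),
                          rng.normal(size=(Q_DIM,)))
print(probs.shape)
```

With random inputs the output is of course uninformative; the point is the data flow: attention lets the classifier focus on the image regions most relevant to the question, which is the mechanism the title's "Attend" refers to.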


Related research:

Answer Them All! Toward Universal Visual Question Answering Models (03/01/2019)
Visual Question Answering (VQA) research is split into two camps: the fi...

Survey of Recent Advances in Visual Question Answering (09/24/2017)
Visual Question Answering (VQA) presents a unique challenge as it requir...

Ask Your Neurons: A Deep Learning Approach to Visual Question Answering (05/09/2016)
We address a question answering task on real-world images that is set up...

Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge (08/09/2017)
This paper presents a state-of-the-art model for visual question answeri...

CS-VQA: Visual Question Answering with Compressively Sensed Images (06/08/2018)
Visual Question Answering (VQA) is a complex semantic task requiring bot...

Human-Adversarial Visual Question Answering (06/04/2021)
Performance on the most commonly used Visual Question Answering dataset ...

DualNet: Domain-Invariant Network for Visual Question Answering (06/20/2016)
Visual question answering (VQA) task not only bridges the gap between im...
