Ask Your Neurons: A Deep Learning Approach to Visual Question Answering

05/09/2016
by Mateusz Malinowski, et al.

We address a question answering task on real-world images that is set up as a Visual Turing Test. By combining the latest advances in image representation and natural language processing, we propose Ask Your Neurons, a scalable, jointly trained, end-to-end formulation of this problem. In contrast to previous efforts, we face a multi-modal problem in which the language output (the answer) is conditioned on both visual and natural language inputs (the image and the question). We provide additional insights into the problem by analyzing how much information is contained in the language part alone, for which we provide a new human baseline. To study human consensus, which is related to the ambiguities inherent in this challenging task, we propose two novel metrics and collect additional answers that extend the original DAQUAR dataset to DAQUAR-Consensus. Moreover, we extend our analysis to VQA, a large-scale dataset for question answering about images, where we investigate several design choices and show the importance of stronger visual models. At the same time, our model achieves strong performance while still using only a global image representation. Finally, based on this analysis, we refine Ask Your Neurons on DAQUAR, which also leads to better performance on this challenging task.
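
The idea of scoring answers against several human annotators can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not the paper's definition of its metrics. It assumes each question has several independently collected human answer sets (one per annotator), treats an answer as a set of words, and uses exact string matching where a softer word-similarity measure would normally go. `average_consensus` averages agreement over all annotators, while `min_consensus` only asks the prediction to agree with its best-matching annotator, so it is never lower than the average score. All function names are hypothetical.

```python
from typing import Sequence


def set_score(pred: Sequence[str], truth: Sequence[str]) -> float:
    """Symmetric agreement between a predicted and a reference answer set.

    Every predicted word must be matched in the reference set and vice versa.
    Exact string matching stands in for a soft word-similarity measure
    (an assumption of this sketch).
    """
    def mu(a: str, t: str) -> float:
        # Hypothetical similarity: 1 for identical words, 0 otherwise.
        return 1.0 if a == t else 0.0

    def coverage(xs: Sequence[str], ys: Sequence[str]) -> float:
        # Product over xs of the best match each word finds in ys.
        score = 1.0
        for x in xs:
            score *= max((mu(x, y) for y in ys), default=0.0)
        return score

    return min(coverage(pred, truth), coverage(truth, pred))


def average_consensus(preds, human_answers) -> float:
    """Mean over questions of the agreement averaged over all annotators.

    preds: one predicted answer set per question.
    human_answers: per question, a list of answer sets (one per annotator).
    """
    total = 0.0
    for pred, annotations in zip(preds, human_answers):
        total += sum(set_score(pred, t) for t in annotations) / len(annotations)
    return total / len(preds)


def min_consensus(preds, human_answers) -> float:
    """Mean over questions of the agreement with the best-matching annotator."""
    total = 0.0
    for pred, annotations in zip(preds, human_answers):
        total += max(set_score(pred, t) for t in annotations)
    return total / len(preds)
```

For example, with `preds = [["chair"], ["2"]]` and three annotators per question, `[[["chair"], ["chair"], ["stool"]], [["3"], ["2"], ["3"]]]`, the average score is 0.5 and the min score is 1.0: each prediction agrees with at least one annotator, but not with all of them.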
