Component Analysis for Visual Question Answering Architectures

02/12/2020
by Camila Kolling, et al.

Recent research advances in Computer Vision and Natural Language Processing have introduced novel tasks that are paving the way for solving AI-complete problems. One of those tasks is Visual Question Answering (VQA). A VQA system must take an image and a free-form, open-ended natural language question about the image, and produce a natural language answer as the output. The task has drawn great attention from the scientific community, which has generated a plethora of approaches aiming to improve VQA predictive accuracy. Most of them comprise three major components: (i) independent representation learning of images and questions; (ii) feature fusion, so the model can use information from both sources to answer visual questions; and (iii) the generation of the correct answer in natural language. With so many approaches introduced in recent years, the real contribution of each component to the final performance of the model has become unclear. The main goal of this paper is to provide a comprehensive analysis of the impact of each component in VQA models. Our extensive set of experiments covers both visual and textual elements, as well as the combination of these representations in the form of fusion and attention mechanisms. Our major contribution is to identify core components for training VQA models so as to maximize their predictive performance.
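
To make the three components named in the abstract concrete, the sketch below shows a deliberately minimal VQA pipeline: an image branch over precomputed visual features, a recurrent question encoder, element-wise-product fusion, and a classifier over a fixed answer vocabulary. All module names, dimensions, and the specific fusion operator are illustrative assumptions for this sketch, not the configurations evaluated in the paper.

```python
# Minimal sketch of the generic three-component VQA pipeline, assuming
# precomputed image features and a fixed answer vocabulary. Names, sizes,
# and the fusion choice are assumptions made for illustration only.
import torch
import torch.nn as nn

class SimpleVQA(nn.Module):
    def __init__(self, vocab_size, num_answers,
                 img_feat_dim=2048, embed_dim=300, hidden_dim=1024):
        super().__init__()
        # (i) independent representation learning
        self.img_proj = nn.Linear(img_feat_dim, hidden_dim)   # visual branch
        self.embedding = nn.Embedding(vocab_size, embed_dim)  # textual branch
        self.question_rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        # (iii) answer prediction, framed here as classification
        self.classifier = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_answers),
        )

    def forward(self, img_feats, question_tokens):
        v = torch.tanh(self.img_proj(img_feats))               # (B, hidden_dim)
        _, h = self.question_rnn(self.embedding(question_tokens))
        q = h.squeeze(0)                                       # (B, hidden_dim)
        # (ii) feature fusion via element-wise product (one of many options)
        fused = v * q
        return self.classifier(fused)

# Example usage with random tensors standing in for real features and tokens.
model = SimpleVQA(vocab_size=10000, num_answers=3000)
img_feats = torch.randn(4, 2048)                     # e.g. pooled CNN features
question_tokens = torch.randint(0, 10000, (4, 14))   # padded word indices
logits = model(img_feats, question_tokens)           # shape (4, 3000)
```

Framing answer prediction as classification over a fixed set of frequent answers is a common simplification in the VQA literature and keeps the example short; attention mechanisms and alternative fusion operators would replace the single element-wise product shown here.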

