Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge

08/09/2017
by Damien Teney, et al.

This paper presents a state-of-the-art model for visual question answering (VQA), which won first place in the 2017 VQA Challenge. VQA is a task of significant importance for research in artificial intelligence, given its multimodal nature, clear evaluation protocol, and potential real-world applications. The performance of deep neural networks for VQA depends heavily on the choice of architecture and hyperparameters. To help further research in the area, we describe in detail our high-performing, though relatively simple, model. Through a massive exploration of architectures and hyperparameters representing more than 3,000 GPU-hours, we identified the tips and tricks that led to its success, namely: sigmoid outputs, soft training targets, image features from bottom-up attention, gated tanh activations, output embeddings initialized using GloVe and Google Images, large mini-batches, and smart shuffling of training data. We provide a detailed analysis of their impact on performance to assist others in making an appropriate selection.
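The abstract names its ingredients without defining them. As a rough illustration of three of them (gated tanh activations, sigmoid outputs, and soft training targets), here is a minimal sketch in plain Python. It assumes the common formulations: a gated tanh layer computed as a tanh branch multiplied elementwise by a sigmoid gate, and soft targets given as per-answer scores in [0, 1] that are trained with a binary cross-entropy summed over candidate answers. The function names and argument layouts are hypothetical and are not taken from the authors' code.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_tanh(x, W, b, W_gate, b_gate):
    """Assumed gated tanh layer: tanh(W x + b) * sigmoid(W_gate x + b_gate).

    x is a list of n floats; W and W_gate are lists of m rows of n floats;
    b and b_gate are lists of m floats. Returns a list of m floats.
    """
    out = []
    for w_row, b_i, g_row, c_i in zip(W, b, W_gate, b_gate):
        pre = sum(w * xi for w, xi in zip(w_row, x)) + b_i
        gate_pre = sum(g * xi for g, xi in zip(g_row, x)) + c_i
        out.append(math.tanh(pre) * sigmoid(gate_pre))
    return out


def soft_target_loss(logits, soft_targets, eps=1e-12):
    """Binary cross-entropy between sigmoid(logits) and soft scores in [0, 1].

    Each candidate answer gets its own independent sigmoid output, so several
    answers can be scored as (partially) correct for the same question.
    """
    total = 0.0
    for z, s in zip(logits, soft_targets):
        p = sigmoid(z)
        total -= s * math.log(p + eps) + (1.0 - s) * math.log(1.0 - p + eps)
    return total


# Toy usage: 2 input features, 3 hidden units, then 3 candidate answers.
hidden = gated_tanh(
    [0.5, -1.0],
    W=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    b=[0.0, 0.0, 0.0],
    W_gate=[[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]],
    b_gate=[0.0, 0.0, 0.0],
)
loss = soft_target_loss(logits=hidden, soft_targets=[1.0, 0.0, 0.3])
```

Unlike a softmax over answers, independent sigmoid outputs do not force the candidate scores to compete, which is what lets a target such as 0.3 for a partially agreed-upon answer sit alongside a fully correct one.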


