Recurrent and Contextual Models for Visual Question Answering

03/23/2017
by   Abhijit Sharang, et al.
0

We propose a series of recurrent and contextual neural network models for multiple choice visual question answering on the Visual7W dataset. Motivated by divergent trends in model complexities in the literature, we explore the balance between model expressiveness and simplicity by studying incrementally more complex architectures. We start with LSTM-encoding of input questions and answers; build on this with context generation by LSTM-encodings of neural image and question representations and attention over images; and evaluate the diversity and predictive power of our models and the ensemble thereof. All models are evaluated against a simple baseline inspired by the current state-of-the-art, consisting of involving simple concatenation of bag-of-words and CNN representations for the text and images, respectively. Generally, we observe marked variation in image-reasoning performance between our models not obvious from their overall performance, as well as evidence of dataset bias. Our standalone models achieve accuracies up to 64.6%, while the ensemble of all models achieves the best accuracy of 66.67%, within 0.5% of the current state-of-the-art for Visual7W.

READ FULL TEXT

page 6

page 7

page 8

research
12/07/2015

Simple Baseline for Visual Question Answering

We describe a very simple bag-of-words baseline for visual question answ...
research
09/19/2016

Graph-Structured Representations for Visual Question Answering

This paper proposes to improve visual question answering (VQA) with stru...
research
06/05/2018

Focal Visual-Text Attention for Visual Question Answering

Recent insights on language and vision with neural networks have been su...
research
02/01/2018

Dual Recurrent Attention Units for Visual Question Answering

We propose an architecture for VQA which utilizes recurrent layers to ge...
research
12/10/2015

Neural Self Talk: Image Understanding via Continuous Questioning and Answering

In this paper we consider the problem of continuously discovering image ...
research
07/20/2016

Neural Contextual Conversation Learning with Labeled Question-Answering Pairs

Neural conversational models tend to produce generic or safe responses i...
research
10/22/2018

A Fully Attention-Based Information Retriever

Recurrent neural networks are now the state-of-the-art in natural langua...

Please sign up or login with your details

Forgot password? Click here to reset