Training Recurrent Answering Units with Joint Loss Minimization for VQA

06/12/2016
by Hyeonwoo Noh, et al.

We propose a novel algorithm for visual question answering based on a recurrent deep neural network, where every module in the network corresponds to a complete answering unit with its own attention mechanism. The network is optimized by minimizing the loss aggregated from all units, which share model parameters while receiving different information to compute their attention probabilities. During training, the model attends to a region within the image feature map, updates its memory based on the question and the attended image feature, and answers the question from its memory state; this procedure is repeated at every step to compute a per-step loss. The approach is motivated by our observation that answering a question often requires multi-step inference, while each problem may have its own desirable number of steps, which is difficult to identify in practice. Hence, we always make the first unit in the network solve problems, but allow it to learn knowledge from the rest of the units through backpropagation unless doing so degrades the model. To implement this idea, we early-stop training each unit as soon as it starts to overfit. Note that, since more complex models tend to overfit on easier questions quickly, the last answering unit in the unfolded recurrent neural network is typically killed first, while the first one survives the longest. At test time, we make a single-step prediction for a new question using the shared model. This strategy works better than the alternatives within our framework, since the selected model is trained effectively by all units without overfitting. Using only single-step prediction, the proposed algorithm outperforms other multi-step attention-based approaches on the VQA dataset.
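The abstract compresses several mechanisms into one paragraph, so the sketch below illustrates one way the step-wise structure could look in PyTorch. The module names, the GRU-based memory update, and the patience-based early-stopping criterion are illustrative assumptions, not the authors' published implementation.

```python
# Minimal sketch of recurrent answering units with a joint loss (PyTorch).
# Names, dimensions, and the overfitting criterion are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AnsweringUnit(nn.Module):
    """One answering step: attend to the image, update memory, answer.
    A single instance is reused at every step, so parameters are shared."""
    def __init__(self, feat_dim, mem_dim, num_answers):
        super().__init__()
        self.att_score = nn.Linear(feat_dim + mem_dim, 1)  # per-region attention logit
        self.update = nn.GRUCell(feat_dim, mem_dim)        # memory update
        self.classifier = nn.Linear(mem_dim, num_answers)  # answer prediction

    def forward(self, img_feats, memory):
        # img_feats: (B, R, feat_dim) regions of the image feature map
        # memory:    (B, mem_dim) memory state, initialized from the question
        B, R, _ = img_feats.shape
        mem = memory.unsqueeze(1).expand(B, R, memory.size(-1))
        logits = self.att_score(torch.cat([img_feats, mem], dim=-1)).squeeze(-1)
        alpha = F.softmax(logits, dim=1)                      # attention probabilities
        attended = (alpha.unsqueeze(-1) * img_feats).sum(1)   # attended image feature
        memory = self.update(attended, memory)
        return self.classifier(memory), memory                # answer logits, new state

def joint_loss(unit, img_feats, q_memory, answers, num_steps, alive):
    """Aggregate per-step losses; early-stopped units no longer contribute."""
    memory, total = q_memory, 0.0
    for t in range(num_steps):
        logits, memory = unit(img_feats, memory)
        if alive[t]:  # a killed unit is excluded from the joint loss
            total = total + F.cross_entropy(logits, answers)
    return total

def update_alive(alive, val_losses, best, patience, max_patience=3):
    """Kill unit t once its validation loss stops improving (assumed criterion).
    Later, more complex units tend to die first; the first unit survives."""
    for t, loss in enumerate(val_losses):
        if not alive[t]:
            continue
        if loss < best[t]:
            best[t], patience[t] = loss, 0
        else:
            patience[t] += 1
            alive[t] = patience[t] < max_patience
    return alive
```

At test time only the first step would be run, matching the single-step prediction described above; the `alive` flags would be maintained during training by monitoring each unit's validation loss and permanently disabling a unit once it begins to overfit.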


Related research

03/28/2017 · An Analysis of Visual Question Answering Algorithms
In visual question answering (VQA), an algorithm must answer text-based ...

02/01/2018 · Dual Recurrent Attention Units for Visual Question Answering
We propose an architecture for VQA which utilizes recurrent layers to ge...

11/17/2015 · Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering
We address the problem of Visual Question Answering (VQA), which require...

07/23/2018 · Question Relevance in Visual Question Answering
Free-form and open-ended Visual Question Answering systems solve the pro...

04/03/2018 · Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering
A key solution to visual question answering (VQA) exists in how to fuse ...

02/22/2017 · Task-driven Visual Saliency and Attention-based Visual Question Answering
Visual question answering (VQA) has witnessed great progress since May, ...
