Visual Question Answering as a Multi-Task Problem

07/03/2020
by Amelia Elizabeth Pollard, et al.

Visual Question Answering (VQA) is a highly complex problem set, relying on many sub-problems to produce reasonable answers. In this paper, we present the hypothesis that Visual Question Answering should be viewed as a multi-task problem, and provide evidence to support this hypothesis. We demonstrate this by reformatting two commonly used Visual Question Answering datasets, COCO-QA and DAQUAR, into a multi-task format and training two baseline networks on the reformatted datasets, one of which is designed specifically to eliminate other possible causes for performance changes as a result of the reformatting. Though the networks demonstrated in this paper do not achieve strongly competitive results, we find that the multi-task approach to Visual Question Answering results in performance increases of 5-9% and that the networks reach convergence much faster than in the single-task case. Finally, we discuss possible reasons for the observed difference in performance, and perform additional experiments which rule out causes not associated with the learning of the dataset as a multi-task problem.

