Augmenting Visual Question Answering with Semantic Frame Information in a Multitask Learning Approach

01/31/2020
by   Mehrdad Alizadeh, et al.
0

Visual Question Answering (VQA) concerns providing answers to Natural Language questions about images. Several deep neural network approaches have been proposed to model the task in an end-to-end fashion. Whereas the task is grounded in visual processing, if the question focuses on events described by verbs, the language understanding component becomes crucial. Our hypothesis is that models should be aware of verb semantics, as expressed via semantic role labels, argument types, and/or frame elements. Unfortunately, no VQA dataset exists that includes verb semantic information. Our first contribution is a new VQA dataset (imSituVQA) that we built by taking advantage of the imSitu annotations. The imSitu dataset consists of images manually labeled with semantic frame elements, mostly taken from FrameNet. Second, we propose a multitask CNN-LSTM VQA model that learns to classify the answers as well as the semantic frame elements. Our experiments show that semantic frame element classification helps the VQA system avoid inconsistent responses and improves performance.

READ FULL TEXT

page 2

page 4

page 6

research
10/20/2016

Proposing Plausible Answers for Open-ended Visual Question Answering

Answering open-ended questions is an essential capability for any intell...
research
04/23/2020

Visual Question Answering Using Semantic Information from Image Descriptions

Visual question answering (VQA) is a task that requires AI systems to di...
research
01/25/2023

Towards a Unified Model for Generating Answers and Explanations in Visual Question Answering

Providing explanations for visual question answering (VQA) has gained mu...
research
08/15/2017

VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation

Rich and dense human labeled datasets are among the main enabling factor...
research
03/04/2021

Visual Question Answering: which investigated applications?

Visual Question Answering (VQA) is an extremely stimulating and challeng...
research
09/12/2018

The Wisdom of MaSSeS: Majority, Subjectivity, and Semantic Similarity in the Evaluation of VQA

We introduce MASSES, a simple evaluation metric for the task of Visual Q...
research
06/10/2020

Estimating semantic structure for the VQA answer space

Since its appearance, Visual Question Answering (VQA, i.e. answering a q...

Please sign up or login with your details

Forgot password? Click here to reset