Visual Question Answering Using Semantic Information from Image Descriptions

04/23/2020
by Tasmia Tasrin, et al.

Visual question answering (VQA) is a task that requires AI systems to display multi-modal understanding. A system must reason over both the question being asked and the image itself to determine reasonable answers. In many cases, reasoning over the image and the question alone is not enough to achieve good performance. Beyond region-based visual information and the natural language question, external textual knowledge extracted from the image can also be used to generate correct answers. With this in mind, we propose a deep neural network model with an attention mechanism that uses image features, the natural language question, and semantic knowledge extracted from the image to produce open-ended answers to the given questions. Combining image features with contextual information about the image helps a model respond to questions more accurately, and potentially with less training data. We evaluate our proposed architecture on a VQA task against a strong baseline and show that our method achieves excellent results on this task.
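
To make the idea of attending over both image features and image-derived text concrete, here is a minimal sketch in plain Python. It is not the architecture from the paper, whose details are not given in this abstract. It assumes a single question vector, a list of region feature vectors, and a list of caption token vectors that all share one dimension, and it uses scaled dot-product attention followed by concatenation as a stand-in fusion step. The names `attend` and `fuse` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(query, items):
    """Scaled dot-product attention of one query over a list of vectors.

    Returns the weighted sum of the items and the attention weights.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, item) / scale for item in items])
    dim = len(items[0])
    context = [
        sum(w * item[i] for w, item in zip(weights, items))
        for i in range(dim)
    ]
    return context, weights


def fuse(question_vec, region_feats, caption_token_vecs):
    """Hypothetical fusion step (an assumption, not the paper's method).

    The question attends separately over image regions and over caption
    tokens; the question vector and both context vectors are concatenated.
    """
    visual_ctx, visual_w = attend(question_vec, region_feats)
    text_ctx, text_w = attend(question_vec, caption_token_vecs)
    fused = question_vec + visual_ctx + text_ctx  # list concatenation
    return fused, visual_w, text_w


# Toy inputs: 2-dimensional vectors chosen only for illustration.
question = [1.0, 0.0]
regions = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
caption = [[0.5, 0.5], [1.0, 0.0]]
fused, visual_weights, text_weights = fuse(question, regions, caption)
```

In a trained model the fused vector would be passed to an answer classifier or decoder, and the vectors would come from learned encoders rather than hand-picked values.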


research
05/03/2015

VQA: Visual Question Answering

We propose the task of free-form and open-ended Visual Question Answerin...
research
09/23/2018

Textually Enriched Neural Module Networks for Visual Question Answering

Problems at the intersection of language and vision, like visual questio...
research
04/26/2021

InfographicVQA

Infographics are documents designed to effectively communicate informati...
research
01/31/2020

Augmenting Visual Question Answering with Semantic Frame Information in a Multitask Learning Approach

Visual Question Answering (VQA) concerns providing answers to Natural La...
research
05/30/2023

Generate then Select: Open-ended Visual Question Answering Guided by World Knowledge

The open-ended Visual Question Answering (VQA) task requires AI models t...
research
06/13/2023

AVIS: Autonomous Visual Information Seeking with Large Language Models

In this paper, we propose an autonomous information seeking visual quest...
research
06/04/2021

Visual Question Rewriting for Increasing Response Rate

When a human asks questions online, or when a conversational virtual age...
