Sentence Attention Blocks for Answer Grounding

09/20/2023
by Seyedalireza Khoshsirat, et al.

Answer grounding is the task of locating the relevant visual evidence for the Visual Question Answering (VQA) task. Although a wide variety of attention methods have been introduced for this task, they suffer from one or more of three problems: designs that cannot use pre-trained networks and therefore do not benefit from large-scale pre-training; custom designs that are not based on well-established prior designs, which limits the learning power of the network; and complicated designs that are difficult to re-implement or improve. In this paper, we propose a novel architectural block, which we term the Sentence Attention Block, to solve these problems. The proposed block re-calibrates channel-wise image feature maps by explicitly modeling the inter-dependencies between the image feature maps and the sentence embedding. We visually demonstrate how this block filters out irrelevant feature-map channels based on the sentence embedding. We start our design from a well-known attention method and, by making minor modifications, improve the results to achieve state-of-the-art accuracy. The flexibility of our method makes it easy to use different pre-trained backbone networks, and its simplicity makes it easy to understand and re-implement. We demonstrate the effectiveness of our method on the TextVQA-X, VQS, VQA-X, and VizWiz-VQA-Grounding datasets, and we perform multiple ablation studies to show the effectiveness of our design choices.
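
To make the idea of channel-wise re-calibration concrete, the following is a minimal sketch of a sentence-conditioned channel gate in plain Python. It is not the authors' implementation: the abstract does not specify the block's internals, so the choice to concatenate globally pooled image features with the sentence embedding, the two-layer gating network, the function name `sentence_channel_gate`, and all shapes and random weights are assumptions made purely for illustration.

```python
import math
import random


def sigmoid(x):
    """Logistic function; maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def linear(weights, bias, x):
    """Dense layer: weights has shape (out, in), x has length in."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def sentence_channel_gate(feature_maps, sentence_emb, w1, b1, w2, b2):
    """Scale each image channel by a gate derived from image and sentence.

    feature_maps: list of C channels, each an H x W nested list of floats.
    sentence_emb: list of D floats (hypothetical sentence embedding).
    Returns the gated feature maps and the per-channel gates.
    """
    # Summarize each channel by its global average (assumed pooling step).
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
              for ch in feature_maps]
    # Assumption: image summary and sentence embedding are concatenated.
    joint = pooled + list(sentence_emb)
    # Small two-layer network: ReLU hidden layer, sigmoid output per channel.
    hidden = [max(0.0, h) for h in linear(w1, b1, joint)]
    gates = [sigmoid(g) for g in linear(w2, b2, hidden)]
    # Re-calibrate: multiply every value in a channel by that channel's gate.
    gated = [[[g * v for v in row] for row in ch]
             for ch, g in zip(feature_maps, gates)]
    return gated, gates


# Toy example with random (untrained) weights; all sizes are illustrative.
random.seed(0)
C, H, W, D, HIDDEN = 3, 2, 2, 4, 5
feature_maps = [[[random.random() for _ in range(W)] for _ in range(H)]
                for _ in range(C)]
sentence_emb = [random.uniform(-1, 1) for _ in range(D)]
w1 = [[random.uniform(-1, 1) for _ in range(C + D)] for _ in range(HIDDEN)]
b1 = [0.0] * HIDDEN
w2 = [[random.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(C)]
b2 = [0.0] * C

gated, gates = sentence_channel_gate(feature_maps, sentence_emb, w1, b1, w2, b2)
print("per-channel gates:", gates)
```

In a trained model, a gate close to zero would suppress a channel that is irrelevant to the sentence, which is the filtering behavior the abstract describes. Because the gate only rescales existing channels, a block of this kind could in principle be placed after the layers of a pre-trained backbone without changing their output shapes.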

