Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations

06/30/2022
by Ziyan Yang, et al.

We propose a margin-based loss for vision-language model pretraining that encourages gradient-based explanations consistent with region-level annotations. We refer to this objective as Attention Mask Consistency (AMC) and demonstrate that it produces superior visual grounding performance compared to models that instead use region-level annotations to explicitly train an object detector such as Faster R-CNN. AMC works by encouraging gradient-based explanation masks to concentrate their attention scores within the annotated regions of interest for images that carry such annotations. In particular, a model trained with AMC on top of standard vision-language modeling objectives obtains a state-of-the-art accuracy of 86.59% on a visual grounding benchmark, an absolute improvement of 5.48% over the best previous model. Our approach also performs strongly on established benchmarks for referring expression comprehension and, by design, offers the added benefit of gradient-based explanations that better align with human annotations.
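The abstract describes AMC only at a high level: a margin-based loss that pushes the peak of a gradient-based explanation map (e.g., a Grad-CAM heatmap) to fall inside the annotated region rather than outside it. The exact formulation is not given here, so the following is an illustrative sketch of one such margin loss; the function name `amc_margin_loss` and the choice of a max-score margin are assumptions for illustration, not the paper's verbatim objective.

```python
import numpy as np

def amc_margin_loss(heatmap, region_mask, margin=0.5):
    """Illustrative margin loss: penalize explanation maps whose peak
    score outside the annotated region comes within `margin` of the
    peak score inside it.

    heatmap:     2-D array of non-negative explanation scores
                 (e.g., a Grad-CAM map resized to the image grid).
    region_mask: 2-D binary array, 1 inside the annotated box.
    """
    inside = heatmap[region_mask.astype(bool)]
    outside = heatmap[~region_mask.astype(bool)]
    # Hinge: zero loss once the inside peak beats the outside peak by `margin`.
    return max(0.0, float(outside.max()) - float(inside.max()) + margin)

# A map concentrated inside the annotated box incurs no loss ...
focused = np.array([[0.9, 0.1],
                    [0.2, 0.1]])
box = np.array([[1, 0],
                [0, 0]])
print(amc_margin_loss(focused, box))   # 0.0

# ... while a map that peaks outside the box is penalized.
diffuse = np.array([[0.3, 0.8],
                    [0.1, 0.1]])
print(amc_margin_loss(diffuse, box))   # 1.0
```

In training, a term like this would be added to the standard vision-language pretraining objectives for images that carry region annotations, leaving other images unaffected.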


Related research

ODAM: Gradient-based instance-specific visual explanations for object detection (04/13/2023)
We propose the gradient-weighted Object Detector Activation Maps (ODAM), ...

Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded (02/11/2019)
Many vision and language models suffer from poor visual grounding - ofte...

Interpretable Visual Question Answering by Visual Grounding from Attention Supervision Mining (08/01/2018)
A key aspect of VQA models that are interpretable is their ability to gr...

Distance-Aware eXplanation Based Learning (09/11/2023)
eXplanation Based Learning (XBL) is an interactive learning approach tha...

AttnGrounder: Talking to Cars with Attention (09/11/2020)
We propose Attention Grounder (AttnGrounder), a single-stage end-to-end ...

Tell Model Where to Attend: Improving Interpretability of Aspect-Based Sentiment Classification via Small Explanation Annotations (02/21/2023)
Gradient-based explanation methods play an important role in the field o...

Evaluating neural network explanation methods using hybrid documents and morphological prediction (01/19/2018)
We propose two novel paradigms for evaluating neural network explanation...
