Causal Attention for Vision-Language Tasks

03/05/2021
by   Xu Yang, et al.

We present a novel attention mechanism, Causal Attention (CATT), to remove the ever-elusive confounding effect in existing attention-based vision-language models. This effect causes harmful bias that misleads the attention module to focus on spurious correlations in the training data, damaging model generalization. As the confounder is unobserved in general, we use the front-door adjustment to realize the causal intervention, which does not require any knowledge of the confounder. Specifically, CATT is implemented as a combination of 1) In-Sample Attention (IS-ATT) and 2) Cross-Sample Attention (CS-ATT), where the latter forcibly brings other samples into every IS-ATT, mimicking the causal intervention. CATT abides by the Q-K-V convention and hence can replace any attention module, such as top-down attention or self-attention in Transformers. CATT improves various popular attention-based vision-language models by considerable margins. In particular, we show that CATT has great potential in large-scale pre-training, e.g., it can promote the lighter LXMERT <cit.>, which uses less data and computational power, to be comparable to the heavier UNITER <cit.>. Code is published at <https://github.com/yangxuntu/catt>.
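To make the IS-ATT/CS-ATT combination concrete, below is a minimal PyTorch sketch of the idea described in the abstract. It is not the authors' implementation (see the repository above for that): the learnable global dictionary standing in for "other samples" and the simple additive fusion of the two attention outputs are assumptions made purely for illustration.

    # Minimal sketch (not the official CATT code) of combining In-Sample
    # Attention (IS-ATT) with Cross-Sample Attention (CS-ATT) under the
    # standard Q-K-V convention.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def scaled_dot_product(q, k, v):
        # q: (n_q, d), k/v: (n_kv, d); standard softmax attention over keys
        scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
        return F.softmax(scores, dim=-1) @ v

    class CausalAttention(nn.Module):
        """Hypothetical CATT block: IS-ATT attends within the current
        sample; CS-ATT attends over a dictionary of features that stands
        in for other training samples."""
        def __init__(self, d_model, dict_size=1000):
            super().__init__()
            self.q_proj = nn.Linear(d_model, d_model)
            self.k_proj = nn.Linear(d_model, d_model)
            self.v_proj = nn.Linear(d_model, d_model)
            # Cross-sample dictionary; in practice it could be maintained
            # from features of other samples in the training set.
            self.global_dict = nn.Parameter(torch.randn(dict_size, d_model))

        def forward(self, x):
            # x: (n_tokens, d_model) features of the current sample
            q = self.q_proj(x)
            # IS-ATT: queries, keys and values all come from this sample
            in_sample = scaled_dot_product(q, self.k_proj(x), self.v_proj(x))
            # CS-ATT: the same queries attend over other-sample features,
            # bringing other samples into every IS-ATT step
            cross_sample = scaled_dot_product(
                q, self.k_proj(self.global_dict), self.v_proj(self.global_dict))
            # Fuse both estimates (additive fusion chosen for simplicity)
            return in_sample + cross_sample

Because the block keeps the Q-K-V interface, a module like this could drop in wherever a top-down or self-attention module is used, which is the property the abstract highlights.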

