Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers

03/29/2021
by   Hila Chefer, et al.

Transformers are increasingly dominating multi-modal reasoning tasks, such as visual question answering, achieving state-of-the-art results thanks to their ability to contextualize information using the self-attention and co-attention mechanisms. These attention modules also play a role in other computer vision tasks, including object detection and image segmentation. Unlike Transformers that only use self-attention, Transformers with co-attention must consider multiple attention maps in parallel in order to highlight the information in the model's input that is relevant to the prediction. In this work, we propose the first method to explain predictions made by any Transformer-based architecture, including bi-modal Transformers and Transformers with co-attention. We provide generic solutions and apply them to the three most commonly used of these architectures: (i) pure self-attention, (ii) self-attention combined with co-attention, and (iii) encoder-decoder attention. We show that our method is superior to all existing methods, which are adapted from single-modality explainability.
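To make the idea concrete, the core relevance-propagation rule used by this line of work combines each layer's attention map with its gradient and accumulates the result across layers: R ← R + Ā·R, where Ā is the head-averaged positive part of the gradient-weighted attention, ∇A ⊙ A. Below is a minimal NumPy sketch of that update for the pure self-attention case; the function name, shapes, and random inputs are illustrative, not the authors' implementation:

```python
import numpy as np

def relevance_update(R, attn, grad):
    """One layer of the relevance update R <- R + A_bar @ R.

    attn, grad: (heads, tokens, tokens) attention maps and their
    gradients w.r.t. the target class score.
    """
    # A_bar: average over heads of the positive part of grad * attn
    A_bar = np.maximum(grad * attn, 0).mean(axis=0)
    return R + A_bar @ R

# Toy example: 2 layers, 3 heads, 4 tokens, with random stand-ins
# for the attention maps and gradients a real backward pass would give.
rng = np.random.default_rng(0)
n_heads, n_tokens = 3, 4

R = np.eye(n_tokens)  # relevance starts as the identity (each token explains itself)
for _ in range(2):
    attn = rng.random((n_heads, n_tokens, n_tokens))
    attn = attn / attn.sum(axis=-1, keepdims=True)  # rows behave like softmax outputs
    grad = rng.standard_normal((n_heads, n_tokens, n_tokens))
    R = relevance_update(R, attn, grad)

print(R.shape)  # (4, 4): row i scores how much each token contributed to token i
```

For the bi-modal case the paper additionally maintains cross-modal relevance maps (e.g. text-to-image) updated through the co-attention layers; the self-attention rule above is the building block those updates extend.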


Related research

07/07/2022
Vision Transformers: State of the Art and Research Challenges
Transformers have achieved great success in natural language processing....

08/20/2023
Generic Attention-model Explainability by Weighted Relevance Accumulation
Attention-based transformer models have achieved remarkable progress in ...

04/23/2022
Grad-SAM: Explaining Transformers via Gradient Self-Attention Maps
Transformer-based language models significantly advanced the state-of-th...

12/17/2020
Transformer Interpretability Beyond Attention Visualization
Self-attention techniques, and specifically Transformers, are dominating...

07/21/2022
Focused Decoding Enables 3D Anatomical Detection by Transformers
Detection Transformers represent end-to-end object detection approaches ...

06/01/2022
Fair Comparison between Efficient Attentions
Transformers have been successfully used in various fields and are becom...

03/02/2023
Self-attention in Vision Transformers Performs Perceptual Grouping, Not Attention
Recently, a considerable number of studies in computer vision involves d...
