Human Interpretation and Exploitation of Self-attention Patterns in Transformers: A Case Study in Extractive Summarization

12/10/2021
by Raymond Li, et al.

The transformer multi-head self-attention mechanism has been thoroughly investigated recently. On one hand, researchers are interested in understanding why and how transformers work; on the other, they propose new attention augmentation methods to make transformers more accurate, efficient, and interpretable. In this paper, we synergize these two lines of research in a human-in-the-loop pipeline that first finds important task-specific attention patterns. Those patterns are then applied not only to the original model but also to smaller models, as a human-guided knowledge distillation process. The benefits of our pipeline are demonstrated in a case study on the extractive summarization task. After three meaningful attention patterns are identified in the popular BERTSum model, experiments indicate that injecting such patterns improves both the original and the smaller model in terms of performance and, arguably, interpretability.
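
The abstract does not spell out how the discovered patterns are injected into the model, so the following is only a minimal sketch, assuming each pattern is expressed as a fixed row-stochastic attention matrix that is blended into selected heads' attention distributions. The helper names, the mixing weight alpha, and the two example patterns are illustrative assumptions, not the paper's actual implementation.

# Sketch: blending a predefined attention pattern into a head's attention.
# Assumed mechanism and names; not the paper's actual code.
import torch
import torch.nn.functional as F


def diagonal_pattern(seq_len: int) -> torch.Tensor:
    """Fixed pattern: each token attends only to itself."""
    return torch.eye(seq_len)


def global_cls_pattern(seq_len: int) -> torch.Tensor:
    """Fixed pattern: every token attends to the first ([CLS]) position."""
    pattern = torch.zeros(seq_len, seq_len)
    pattern[:, 0] = 1.0
    return pattern


def inject_pattern(scores: torch.Tensor, pattern: torch.Tensor,
                   alpha: float = 0.5) -> torch.Tensor:
    """Blend a predefined attention pattern into raw attention scores.

    scores:  (batch, heads, seq_len, seq_len) pre-softmax attention logits
    pattern: (seq_len, seq_len) row-stochastic target pattern
    alpha:   interpolation weight for the injected pattern (assumed hyperparameter)
    """
    attn = F.softmax(scores, dim=-1)                   # the model's own attention
    fixed = pattern.to(attn.device).unsqueeze(0).unsqueeze(0)
    return (1.0 - alpha) * attn + alpha * fixed        # convex mix remains a distribution


if __name__ == "__main__":
    batch, heads, seq_len = 2, 8, 16
    scores = torch.randn(batch, heads, seq_len, seq_len)
    mixed = inject_pattern(scores, global_cls_pattern(seq_len), alpha=0.3)
    print(mixed.sum(-1))  # each attention row still sums to 1

In practice, such a mix would presumably be applied only to the heads found to exhibit a given pattern, in either the original BERTSum model or a smaller student model; the choice of heads and the value of alpha are left open here.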

Related research

12/03/2020 · Do We Really Need That Many Parameters In Transformer For Extractive Summarization? Discourse Can Help!
The multi-head self-attention of popular transformer models is widely us...

12/31/2020 · MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers
We generalize deep self-attention distillation in MiniLM (Wang et al., 2...

08/03/2021 · Armour: Generalizable Compact Self-Attention for Vision Transformers
Attention-based transformer networks have demonstrated promising potenti...

04/27/2022 · DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers
Transformers are successfully applied to computer vision due to their po...

05/27/2022 · Transformers from an Optimization Perspective
Deep learning models such as the Transformer are often constructed by he...

03/27/2023 · Evaluating self-attention interpretability through human-grounded experimental protocol
Attention mechanisms have played a crucial role in the development of co...

11/03/2021 · PhyloTransformer: A Discriminative Model for Mutation Prediction Based on a Multi-head Self-attention Mechanism
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused ...
