Do Transformer Attention Heads Provide Transparency in Abstractive Summarization?

07/01/2019
by Joris Baan, et al.

Learning algorithms are becoming more powerful, often at the cost of increased complexity. In response, the demand for transparent algorithms is growing. In NLP, the attention distributions learned by attention-based deep learning models are often used to gain insight into a model's behavior. But to what extent is this perspective valid across NLP tasks? We investigate whether the distributions computed by the different attention heads in a Transformer architecture can be used to improve transparency in the task of abstractive summarization. To this end, we present both a qualitative and a quantitative analysis of the behavior of the attention heads. We show that some attention heads do specialize towards syntactically and semantically distinct input. We propose an approach to evaluate the extent to which the Transformer model relies on specifically learned attention distributions, and we discuss what this implies for using attention distributions as a means of transparency.
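
As an illustration of the head-level analysis the abstract describes, the minimal sketch below pulls per-head attention distributions out of a Transformer and scores each head by the entropy of its distribution, a simple quantitative proxy for specialization (a low-entropy head attends sharply to few tokens). The paper studies a Transformer summarization model; using `bert-base-uncased` via the Hugging Face `transformers` library here is an assumption made only to keep the example self-contained.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Assumption: the paper analyzes its own Transformer summarizer; a generic
# pre-trained encoder is used here only to keep the example runnable.
name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name, output_attentions=True)
model.eval()

inputs = tokenizer("The quick brown fox jumps over the lazy dog.",
                   return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer, each of shape
# (batch, num_heads, seq_len, seq_len); each row is a distribution over keys.
for layer, att in enumerate(outputs.attentions):
    probs = att[0]                                          # (heads, seq, seq)
    # Entropy per head, averaged over query positions: low entropy means the
    # head focuses on few tokens, a crude signal of specialization.
    entropy = -(probs * (probs + 1e-9).log()).sum(-1).mean(-1)
    print(f"layer {layer}: {[round(h, 2) for h in entropy.tolist()]}")
```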

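The reliance evaluation the abstract alludes to can be framed as an ablation: at inference time, replace the learned attention weights of selected heads with a fixed distribution over the keys and measure how much the output changes. The function below is a hedged sketch of that idea in plain PyTorch, using a uniform substitute distribution; `attention_with_ablation` and its arguments are hypothetical names for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def attention_with_ablation(q, k, v, ablate_heads=()):
    """Scaled dot-product attention over (batch, heads, seq, d_head) tensors.
    Heads listed in `ablate_heads` have their learned attention weights
    replaced by a uniform distribution over the keys."""
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    weights = F.softmax(scores, dim=-1)        # (batch, heads, q_len, k_len)
    if ablate_heads:
        idx = torch.tensor(list(ablate_heads))
        weights = weights.clone()
        weights[:, idx] = 1.0 / weights.size(-1)   # uniform over keys
    return weights @ v

# Toy comparison: how much does the output move when head 0 of 4 is ablated?
q, k, v = (torch.randn(1, 4, 6, 16) for _ in range(3))
delta = (attention_with_ablation(q, k, v)
         - attention_with_ablation(q, k, v, ablate_heads=(0,))).abs().max()
print(f"max output change from ablating head 0: {delta:.4f}")
```

A small change under ablation would suggest the model does not rely on that head's specifically learned distribution, which bears directly on whether its attention pattern is a meaningful object of interpretation.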
Related research

- The NLP Task Effectiveness of Long-Range Transformers (02/16/2022)
- Understanding Multi-Head Attention in Abstractive Summarization (11/10/2019)
- Deep Learning Models for Automatic Summarization (05/25/2020)
- Analyzing the Structure of Attention in a Transformer Language Model (06/07/2019)
- A Study of the Attention Abnormality in Trojaned BERTs (05/13/2022)
- Dodrio: Exploring Transformer Models with Interactive Visualization (03/26/2021)
- Attentional Bottleneck: Towards an Interpretable Deep Driving Network (05/08/2020)
