Learning to Deceive with Attention-Based Explanations

09/17/2019
by Danish Pruthi, et al.

Attention mechanisms are ubiquitous components in neural architectures applied in natural language processing. In addition to yielding gains in predictive accuracy, researchers often claim that attention weights confer interpretability, purportedly useful both for providing insights to practitioners and for explaining why a model makes its decisions to stakeholders. We call the latter use of attention mechanisms into question, demonstrating a simple method for training models to produce deceptive attention masks: the total weight assigned to designated impermissible tokens is diminished, even as the models can be shown to still rely on these features to drive predictions. Across multiple models and datasets, our approach manipulates attention weights while paying surprisingly little cost in accuracy. Although our results do not rule out potential insights due to organically-trained attention, they cast doubt on attention's reliability as a tool for auditing algorithms, as, for example, in the context of fairness and accountability.
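
The core idea described in the abstract is to add a term to the training objective that penalizes attention mass assigned to a designated set of impermissible tokens while the task loss keeps accuracy high. Below is a minimal PyTorch sketch of one way such an objective could be implemented; the toy model, the log-based penalty form, the lam coefficient, and the placeholder rule for marking impermissible tokens are illustrative assumptions, not the authors' exact formulation.

```python
# Minimal sketch (PyTorch): training with a penalty that discourages attention
# on designated "impermissible" tokens. The penalty form and lam are
# illustrative assumptions, not the paper's exact specification.
import torch
import torch.nn as nn

class AttnClassifier(nn.Module):
    """Toy classifier: embeddings -> additive attention -> linear head."""
    def __init__(self, vocab_size, embed_dim=64, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.score = nn.Linear(embed_dim, 1)
        self.out = nn.Linear(embed_dim, num_classes)

    def forward(self, tokens):
        h = self.embed(tokens)                                    # (batch, seq, dim)
        attn = torch.softmax(self.score(h).squeeze(-1), dim=-1)   # (batch, seq)
        context = (attn.unsqueeze(-1) * h).sum(dim=1)             # (batch, dim)
        return self.out(context), attn

def deceptive_loss(logits, labels, attn, impermissible_mask, lam=0.1, eps=1e-8):
    """Task loss plus a penalty on attention mass over impermissible tokens."""
    task_loss = nn.functional.cross_entropy(logits, labels)
    masked_attn = (attn * impermissible_mask).sum(dim=-1)   # attention on flagged tokens
    penalty = -torch.log(1.0 - masked_attn + eps).mean()    # grows as that mass grows
    return task_loss + lam * penalty

# Usage: tokens (batch, seq), labels (batch,); impermissible_mask is 1.0 at
# positions of designated tokens (e.g., gendered words) and 0.0 elsewhere.
model = AttnClassifier(vocab_size=1000)
tokens = torch.randint(0, 1000, (8, 20))
labels = torch.randint(0, 2, (8,))
impermissible_mask = (tokens < 50).float()   # placeholder rule, for illustration only
logits, attn = model(tokens)
loss = deceptive_loss(logits, labels, attn, impermissible_mask)
loss.backward()
```

Minimizing this combined loss drives the attention weights on impermissible positions toward zero while leaving the model free to exploit those features through other pathways, which is the deceptive behavior the paper studies.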


Related research

05/06/2021 · Improving the Faithfulness of Attention-based Explanations with Task-specific Information for Text Classification
Neural network architectures in natural language processing often use at...

01/26/2022 · Attention cannot be an Explanation
Attention based explanations (viz. saliency maps), by providing interpre...

06/09/2019 · Is Attention Interpretable?
Attention mechanisms have recently boosted performance on a range of NLP...

11/10/2019 · Location Attention for Extrapolation to Longer Sequences
Neural networks are surprisingly good at interpolating and perform remar...

07/26/2022 · Is Attention Interpretation? A Quantitative Assessment On Sets
The debate around the interpretability of attention mechanisms is center...

12/20/2016 · Exploring Different Dimensions of Attention for Uncertainty Detection
Neural networks with attention have proven effective for many natural la...

06/02/2017 · Latent Attention Networks
Deep neural networks are able to solve tasks across a variety of domains...
