- Explaining Black Box Predictions and Unveiling Data Artifacts through Influence Functions
  Modern deep learning models for NLP are notoriously opaque. This has mot...
- Certifiably Robust Interpretation in Deep Learning
  Although gradient-based saliency maps are popular methods for deep learn...
- Sparsity Emerges Naturally in Neural Language Models
  Concerns about interpretability, computational resources, and principled...
- Rethinking Positive Aggregation and Propagation of Gradients in Gradient-based Saliency Methods
  Saliency methods interpret the prediction of a neural network by showing...
- Is Attention Interpretable?
  Attention mechanisms have recently boosted performance on a range of NLP...
- Gradient-based Taxis Algorithms for Network Robotics
  Finding the physical location of a specific network node is a prototypic...
- AllenNLP Interpret: A Framework for Explaining Predictions of NLP Models
  Neural NLP models are increasingly accurate but are imperfect and opaque...
Gradient-based Analysis of NLP Models is Manipulable
Gradient-based analysis methods, such as saliency map visualizations and adversarial input perturbations, have found widespread use in interpreting neural NLP models due to their simplicity, flexibility, and most importantly, their faithfulness. In this paper, however, we demonstrate that the gradients of a model are easily manipulable, and thus bring into question the reliability of gradient-based analyses. In particular, we merge the layers of a target model with a Facade that overwhelms the gradients without affecting the predictions. This Facade can be trained to have gradients that are misleading and irrelevant to the task, such as focusing only on the stop words in the input. On a variety of NLP tasks (text classification, NLI, and QA), we show that our method can manipulate numerous gradient-based analysis techniques: saliency maps, input reduction, and adversarial perturbations all identify unimportant or targeted tokens as being highly important. Code and a tutorial for this paper are available at http://ucinlp.github.io/facade.
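One simple way to see why logit gradients are manipulable: adding a class-independent score c(x) to every logit leaves the softmax prediction unchanged, yet the gradient of any single logit with respect to the input now contains dc/dx, which can be made arbitrarily large. The toy PyTorch sketch below illustrates this; it is not the authors' released implementation (that lives at the URL above), and all module names, sizes, and the weight-inflation trick are illustrative assumptions standing in for a trained Facade.

```python
# Toy sketch (not the authors' code): a class-independent "facade" score
# added to every logit leaves predictions unchanged but dominates the
# input gradients that vanilla logit-gradient saliency maps rely on.
import torch
import torch.nn as nn

class MergedModel(nn.Module):
    def __init__(self, target: nn.Module, facade: nn.Module):
        super().__init__()
        self.target = target   # the model whose predictions we keep
        self.facade = facade   # exists only to produce misleading gradients

    def forward(self, embeds):
        logits = self.target(embeds)   # (batch, num_classes)
        c = self.facade(embeds)        # (batch, 1) scalar score per example
        # Broadcasting adds the same constant to every class logit,
        # so argmax (and softmax probabilities) are unchanged.
        return logits + c

def saliency(model, embeds, cls):
    """Vanilla-gradient saliency: gradient of the class logit w.r.t. inputs."""
    embeds = embeds.clone().requires_grad_(True)
    score = model(embeds)[0, cls]
    score.backward()
    return embeds.grad.norm(dim=-1)    # per-token importance scores

# Toy setup: 5 tokens of 8-dim embeddings, 3 classes.
torch.manual_seed(0)
target = nn.Sequential(nn.Flatten(), nn.Linear(5 * 8, 3))
facade = nn.Sequential(nn.Flatten(), nn.Linear(5 * 8, 1))
# Crude stand-in for training the facade: inflate its weights so that
# dc/dx swamps the target's gradients. (The paper instead *trains* the
# facade, e.g. to concentrate gradients on stop words.)
with torch.no_grad():
    facade[1].weight.mul_(1000.0)

x = torch.randn(1, 5, 8)
merged = MergedModel(target, facade)
pred = merged(x).argmax(-1)
assert torch.equal(pred, target(x).argmax(-1))  # predictions intact
print(saliency(target, x, pred.item()))         # honest saliency
print(saliency(merged, x, pred.item()))         # manipulated saliency
```

Note that this toy only fools saliency computed from the raw logit: since softmax probabilities are invariant to a per-example constant shift of all logits, their gradients would be unaffected here. The paper's trained Facade is more general than this simplified sketch.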