An Empirical Comparison of Instance Attribution Methods for NLP

04/09/2021
by   Pouya Pezeshkpour, et al.
0

Widespread adoption of deep models has motivated a pressing need for approaches to interpret network outputs and to facilitate model debugging. Instance attribution methods constitute one means of accomplishing these goals by retrieving training instances that (may have) led to a particular prediction. Influence functions (IF; Koh and Liang 2017) provide machinery for doing this by quantifying the effect that perturbing individual train instances would have on a specific test prediction. However, even approximating the IF is computationally expensive, to the degree that may be prohibitive in many cases. Might simpler approaches (e.g., retrieving train examples most similar to a given test point) perform comparably? In this work, we evaluate the degree to which different potential instance attribution agree with respect to the importance of training samples. We find that simple retrieval methods yield training instances that differ from those identified via gradient-based methods (such as IFs), but that nonetheless exhibit desirable characteristics similar to more complex attribution methods. Code for all methods and experiments in this paper is available at: https://github.com/successar/instance_attributions_NLP.

READ FULL TEXT

page 3

page 8

research
07/01/2021

Combining Feature and Instance Attribution to Detect Artifacts

Training the large deep neural networks that dominate NLP requires large...
research
03/23/2022

An Empirical Study of Memorization in NLP

A recent study by Feldman (2020) proposed a long-tail theory to explain ...
research
12/27/2020

Inserting Information Bottlenecks for Attribution in Transformers

Pretrained transformers achieve the state of the art across tasks in nat...
research
03/29/2021

Efficient Explanations from Empirical Explainers

Amid a discussion about Green AI in which we see explainability neglecte...
research
05/31/2023

A Bayesian Perspective On Training Data Attribution

Training data attribution (TDA) techniques find influential training dat...
research
11/08/2017

Picasso, Matisse, or a Fake? Automated Analysis of Drawings at the Stroke Level for Attribution and Authentication

This paper proposes a computational approach for analysis of strokes in ...
research
08/23/2021

Longitudinal Distance: Towards Accountable Instance Attribution

Previous research in interpretable machine learning (IML) and explainabl...

Please sign up or login with your details

Forgot password? Click here to reset