Revisiting Methods for Finding Influential Examples

11/08/2021
by Karthikeyan K, et al.

Several instance-based explainability methods for finding influential training examples behind test-time decisions have been proposed recently, including Influence Functions, TracIn, Representer Point Selection, Grad-Dot, and Grad-Cos. Typically, these methods are evaluated using leave-one-out (LOO) influence (Cook's distance) as a gold standard, or using various heuristics. In this paper, we show that all of the above methods are unstable: extremely sensitive to initialization, ordering of the training data, and batch size. We suggest that this is a natural consequence of the literature's implicit assumption that the influence of an example is independent of model state and of the other training examples, and we argue that this assumption does not hold. We show that, as a result, LOO influence and heuristics are poor metrics for measuring the quality of instance-based explanations, and instead propose to evaluate such explanations by their ability to detect poisoning attacks. Further, we provide a simple yet effective baseline that improves all of the above methods and show that it leads to significant improvements on downstream tasks.
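As a rough illustration of the gradient-similarity family named in the abstract, the sketch below (not the authors' code) computes Grad-Dot and Grad-Cos scores for a single training/test pair in PyTorch: the influence of a training example on a test prediction is approximated by the dot product, or cosine similarity, of the two examples' loss gradients with respect to the model parameters. The names model, loss_fn, train_ex, and test_ex are placeholder assumptions, not identifiers from the paper.

import torch
import torch.nn.functional as F

def flat_grad(model, loss_fn, x, y):
    # Loss gradient for a single example, flattened into one vector.
    loss = loss_fn(model(x), y)
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

def grad_dot(model, loss_fn, train_ex, test_ex):
    # Grad-Dot: inner product of the train and test loss gradients.
    return torch.dot(flat_grad(model, loss_fn, *train_ex),
                     flat_grad(model, loss_fn, *test_ex)).item()

def grad_cos(model, loss_fn, train_ex, test_ex):
    # Grad-Cos: cosine similarity of the same two gradients.
    return F.cosine_similarity(flat_grad(model, loss_fn, *train_ex),
                               flat_grad(model, loss_fn, *test_ex), dim=0).item()

Ranking the training set by either score for a fixed test example yields the "most influential" examples these methods return; the paper's instability finding is that such rankings shift with initialization, training-data order, and batch size.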


Related research

10/24/2022
Analyzing the Use of Influence Functions for Instance-Specific Data Filtering in Neural Machine Translation
Customer feedback can be an important signal for improving commercial ma...

03/25/2020
RelatIF: Identifying Explanatory Training Examples via Relative Influence
In this work, we focus on the use of influence functions to identify rel...

11/10/2020
Debugging Tests for Model Explanations
We investigate whether post-hoc model explanations are effective for dia...

08/21/2022
Inferring Sensitive Attributes from Model Explanations
Model explanations provide transparency into a trained machine learning ...

01/25/2022
Identifying a Training-Set Attack's Target Using Renormalized Influence Estimation
Targeted training-set attacks inject malicious instances into the traini...

06/07/2021
Interactive Label Cleaning with Example-based Explanations
We tackle sequential learning under label noise in applications where a ...

11/29/2021
A General Framework for Defending Against Backdoor Attacks via Influence Graph
In this work, we propose a new and general framework to defend against b...
