Understanding Black-box Predictions via Influence Functions

03/14/2017
by Pang Wei Koh et al.

How can we explain the predictions of a black-box model? In this paper, we use influence functions -- a classic technique from robust statistics -- to trace a model's prediction through the learning algorithm and back to its training data, thereby identifying training points most responsible for a given prediction. To scale up influence functions to modern machine learning settings, we develop a simple, efficient implementation that requires only oracle access to gradients and Hessian-vector products. We show that even on non-convex and non-differentiable models where the theory breaks down, approximations to influence functions can still provide valuable information. On linear models and convolutional neural networks, we demonstrate that influence functions are useful for multiple purposes: understanding model behavior, debugging models, detecting dataset errors, and even creating visually-indistinguishable training-set attacks.
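
A concrete reading of that recipe: the influence of upweighting a training point z on the loss at a test point z_test is -∇_θ L(z_test, θ̂)ᵀ H⁻¹ ∇_θ L(z, θ̂), where H is the Hessian of the total training loss at the fitted parameters θ̂, and H⁻¹v is estimated from a stream of Hessian-vector products rather than by forming H. The sketch below illustrates this in PyTorch under stated assumptions: the LiSSA-style recursion, its hyperparameters (steps, damping, scale), and the toy logistic-regression setup are illustrative choices, not the authors' reference implementation.

```python
import torch

def hvp(loss, params, vec):
    # Hessian-vector product via double backprop (Pearlmutter's trick):
    # differentiate <grad(loss), vec> once more w.r.t. params.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    dot = sum((g * v).sum() for g, v in zip(grads, vec))
    return torch.autograd.grad(dot, params)

def inverse_hvp_lissa(loss_fn, params, vec, batches,
                      steps=100, damping=0.01, scale=25.0):
    # LiSSA-style recursion  h <- v + (1 - damping) * h - H h / scale.
    # The returned h / scale approximates (H + scale*damping*I)^{-1} v,
    # i.e. a damped inverse-Hessian-vector product. Convergence needs
    # scale to exceed the largest Hessian eigenvalue.
    h = [v.clone() for v in vec]
    for t in range(steps):
        batch = batches[t % len(batches)]
        hv = hvp(loss_fn(batch), params, h)
        h = [v + (1.0 - damping) * h_i - hv_i / scale
             for v, h_i, hv_i in zip(vec, h, hv)]
    return [h_i / scale for h_i in h]

def influence(loss_fn, params, z_train, z_test, batches):
    # I(z, z_test) = -grad L(z_test)^T  H^{-1}  grad L(z).
    v = torch.autograd.grad(loss_fn(z_test), params)
    s_test = inverse_hvp_lissa(loss_fn, params, v, batches)
    g = torch.autograd.grad(loss_fn(z_train), params)
    return -sum((s * gi).sum() for s, gi in zip(s_test, g)).item()

# Toy usage: logistic regression on random data (illustrative only).
torch.manual_seed(0)
w = torch.zeros(5, requires_grad=True)
X, y = torch.randn(100, 5), torch.randint(0, 2, (100,)).float()

def loss_fn(batch):
    xb, yb = batch
    return torch.nn.functional.binary_cross_entropy_with_logits(xb @ w, yb)

batches = [(X[i:i + 20], y[i:i + 20]) for i in range(0, 100, 20)]
score = influence(loss_fn, [w], (X[:1], y[:1]), (X[1:2], y[1:2]), batches)
print(f"influence of train point 0 on test point 1: {score:+.4f}")
```

The point of the recursion is that it only ever touches H through Hessian-vector products, which cost about as much as a gradient; this is what lets the estimator scale to models whose Hessian is far too large to materialize or invert.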

Related research

11/01/2019 · Second-Order Group Influence Functions for Black-Box Predictions
With the rapid adoption of machine learning systems in sensitive applica...

11/23/2018 · Representer Point Selection for Explaining Deep Neural Networks
We propose to explain the predictions of a deep neural network, by point...

10/12/2020 · Explaining Neural Matrix Factorization with Gradient Rollback
Explaining the predictions of neural black-box models is an important pr...

10/23/2018 · Interpreting Black Box Predictions using Fisher Kernels
Research in both machine learning and psychology suggests that salient e...

06/25/2020 · Influence Functions in Deep Learning Are Fragile
Influence functions approximate the effect of training samples in test-t...

08/07/2023 · Studying Large Language Model Generalization with Influence Functions
When trying to gain better visibility into a machine learning model in o...

05/22/2023 · Relabel Minimal Training Subset to Flip a Prediction
Yang et al. (2023) discovered that removing a mere 1% of training points can often lead to the f...
