Understanding Influence Functions and Datamodels via Harmonic Analysis

10/03/2022
by Nikunj Saunshi, et al.

Influence functions estimate the effect of individual training points on a model's predictions on test data, and were adapted to deep learning by Koh and Liang [2017]. They have been used to detect data poisoning, to identify helpful and harmful examples, and to estimate the influence of groups of datapoints, among other applications. Recently, Ilyas et al. [2022] introduced datamodels, a linear regression method that predicts the effect of training points on a model's outputs on test data. This paper seeks a better theoretical understanding of these empirical phenomena. The primary tools are harmonic analysis and the notion of noise stability. Contributions include: (a) an exact characterization of the learnt datamodel in terms of Fourier coefficients; (b) an efficient method to estimate the residual error and quality of the optimal linear datamodel without having to train the datamodel; (c) new insights into when influences of groups of datapoints may or may not add up linearly.
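To make contributions (a) and (b) concrete, here is a minimal sketch of the underlying harmonic-analysis fact, not the paper's own procedure. It assumes training-set membership is encoded as x in {-1, +1}^n under the uniform distribution, uses a hypothetical `toy_output` function as a stand-in for a model's test output, and enumerates all inputs exhaustively, which is only feasible for tiny n. Under those assumptions, the unregularized least-squares linear fit coincides with the degree-0 and degree-1 Fourier coefficients, and its residual error follows from Parseval's identity without fitting anything. The paper's actual setting (for example, its sampling distribution and its noise-stability estimator) may differ.

```python
import itertools
import numpy as np


def toy_output(x):
    # Hypothetical stand-in for a model's output on one test point, as a
    # function of which training points are included (+1) or excluded (-1).
    # The product terms are interactions that no linear datamodel can capture.
    return 0.5 + 1.0 * x[0] - 2.0 * x[1] + 0.7 * x[0] * x[1] + 0.3 * x[2] * x[3]


n = 4  # number of training points (tiny, so all 2**n subsets can be enumerated)
X = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))
y = np.array([toy_output(x) for x in X])

# Linear datamodel: ordinary least squares with an intercept column.
A = np.hstack([np.ones((len(X), 1)), X])
theta, *_ = np.linalg.lstsq(A, y, rcond=None)

# Degree-0 and degree-1 Fourier coefficients under the uniform distribution:
# f_hat(empty) = E[f(x)], f_hat({i}) = E[f(x) * x_i].
f_hat_empty = y.mean()
f_hat_single = (X * y[:, None]).mean(axis=0)

# Residual of the fitted datamodel versus the Parseval-based prediction:
# E[f^2] minus the squared coefficients of degree at most 1.
residual_fit = np.mean((A @ theta - y) ** 2)
residual_fourier = np.mean(y ** 2) - f_hat_empty ** 2 - np.sum(f_hat_single ** 2)

print("datamodel weights:   ", np.round(theta, 6))
print("Fourier coefficients:", np.round(np.r_[f_hat_empty, f_hat_single], 6))
print("residual (fit):      ", round(float(residual_fit), 6))
print("residual (Parseval): ", round(float(residual_fourier), 6))
```

In this toy, the residual is exactly the squared weight of the two interaction terms. The `x[0] * x[1]` term is also a small illustration of point (c): the joint effect of flipping points 0 and 1 is not the sum of their individual effects.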


