Relabel Minimal Training Subset to Flip a Prediction

05/22/2023
by Jinghan Yang, et al.

Yang et al. (2023) discovered that removing a mere 1% of the training data can often lead to the flipping of a prediction. Given the prevalence of noisy training data in machine learning, we pose the question: can we also flip a test prediction by relabeling a small subset of the training data before the model is trained? In this paper, utilizing the extended influence function, we propose an efficient procedure for identifying and relabeling such a subset, and we demonstrate its consistent success. This mechanism serves multiple purposes: (1) it provides a complementary approach to challenging model predictions by recovering potentially mislabeled training points; (2) it offers a way to evaluate model resilience, as our research uncovers a significant relationship between the size of the subset and the ratio of noisy data in the training set; and (3) it yields insights into bias within the training set. To the best of our knowledge, this work represents the first investigation into the problem of identifying and relabeling the minimal training subset required to flip a given prediction.
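The abstract does not spell out the procedure, but the core idea it describes (scoring candidate label flips with influence functions and greedily accumulating the highest-impact flips until the test prediction changes) can be sketched for binary logistic regression. The sketch below is an illustration under stated assumptions, not the authors' code: the names relabel_influence and minimal_flip_subset are invented here, the Hessian system is solved exactly rather than approximated, and the paper's "extended influence function" may differ in detail.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hessian(w, X, lam):
    # Hessian of the mean L2-regularized logistic loss over the training set.
    p = sigmoid(X @ w)
    d = p * (1.0 - p)
    return (X * d[:, None]).T @ X / len(X) + lam * np.eye(X.shape[1])

def relabel_influence(w, X, y, x_test, lam=1e-2):
    # First-order estimate of how the test margin w^T x_test moves if each
    # training label y_i (in {0, 1}) were flipped. For logistic loss the
    # per-example gradient is (sigmoid(w^T x_i) - y_i) * x_i, so flipping
    # y_i changes it by (2*y_i - 1) * x_i; the induced parameter change is
    # -H^{-1} @ grad_diff / n, and the margin change is x_test^T times that.
    v = np.linalg.solve(hessian(w, X, lam), x_test)
    grad_diff = (2.0 * y - 1.0)[:, None] * X
    return -(grad_diff @ v) / len(X)

def minimal_flip_subset(w, X, y, x_test, lam=1e-2):
    # Greedily relabel the points whose estimated influence pushes the test
    # margin hardest toward the opposite sign, stopping once the estimated
    # margin crosses zero.
    margin = float(x_test @ w)
    scores = relabel_influence(w, X, y, x_test, lam)
    order = np.argsort(np.sign(margin) * scores)  # most margin-reducing first
    subset, est = [], margin
    for i in order:
        if est * margin < 0:  # estimated prediction has flipped
            break
        subset.append(int(i))
        est += scores[i]
    return subset

# Toy usage: fit a logistic model by gradient descent, then score flips.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.3 * rng.normal(size=200) > 0).astype(float)
w, lam = np.zeros(5), 1e-2
for _ in range(500):
    w -= 0.5 * ((sigmoid(X @ w) - y) @ X / len(X) + lam * w)
x_test = rng.normal(size=5)
print(int(x_test @ w > 0), minimal_flip_subset(w, X, y, x_test, lam))
```

Since the influence estimate is only a first-order approximation, in practice one would retrain the model on the relabeled set to verify that the prediction actually flips.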

Related research

How Many and Which Training Points Would Need to be Removed to Flip this Prediction? (02/04/2023)
We consider the problem of identifying a minimal subset of training data...

Training Data Subset Selection for Regression with Controlled Generalization Error (06/23/2021)
Data subset selection from a large number of training instances has been...

Understanding Black-box Predictions via Influence Functions (03/14/2017)
How can we explain the predictions of a black-box model? In this paper, ...

Outlier-Aware Training for Improving Group Accuracy Disparities (10/27/2022)
Methods addressing spurious correlations such as Just Train Twice (JTT, ...

Dataset Pruning: Reducing Training Data by Examining Generalization Influence (05/19/2022)
The great success of deep learning heavily relies on increasingly larger...

The Impact of Dormant Defects on Defect Prediction: a Study of 19 Apache Projects (05/26/2021)
Defect prediction models can be beneficial to prioritize testing, analys...

Less Is Better: Unweighted Data Subsampling via Influence Function (12/03/2019)
In the time of Big Data, training complex models on large-scale data set...
