DIVINE: Diverse Influential Training Points for Data Visualization and Model Refinement

07/13/2021
by   Umang Bhatt, et al.
0

As the complexity of machine learning (ML) models increases, resulting in a lack of prediction explainability, several methods have been developed to explain a model's behavior in terms of the training data points that most influence the model. However, these methods tend to mark outliers as highly influential points, limiting the insights that practitioners can draw from points that are not representative of the training data. In this work, we take a step towards finding influential training points that also represent the training data well. We first review methods for assigning importance scores to training points. Given importance scores, we propose a method to select a set of DIVerse INfluEntial (DIVINE) training points as a useful explanation of model behavior. As practitioners might not only be interested in finding data points influential with respect to model accuracy, but also with respect to other important metrics, we show how to evaluate training data points on the basis of group fairness. Our method can identify unfairness-inducing training points, which can be removed to improve fairness outcomes. Our quantitative experiments and user studies show that visualizing DIVINE points helps practitioners understand and explain model behavior better than earlier approaches.

READ FULL TEXT

page 20

page 23

page 29

research
12/13/2022

Fair Infinitesimal Jackknife: Mitigating the Influence of Biased Training Data Points Without Refitting

In consequential decision-making applications, mitigating unwanted biase...
research
10/28/2022

On the Vulnerability of Data Points under Multiple Membership Inference Attacks and Target Models

Membership Inference Attacks (MIAs) infer whether a data point is in the...
research
11/07/2020

On the Privacy Risks of Algorithmic Fairness

Algorithmic fairness and privacy are essential elements of trustworthy m...
research
03/02/2022

PUMA: Performance Unchanged Model Augmentation for Training Data Removal

Preserving the performance of a trained model while removing unique char...
research
02/04/2023

How Many and Which Training Points Would Need to be Removed to Flip this Prediction?

We consider the problem of identifying a minimal subset of training data...
research
10/03/2022

Data Budgeting for Machine Learning

Data is the fuel powering AI and creates tremendous value for many domai...
research
11/23/2022

Reconnoitering the class distinguishing abilities of the features, to know them better

The relevance of machine learning (ML) in our daily lives is closely int...

Please sign up or login with your details

Forgot password? Click here to reset