Beta Shapley: a Unified and Noise-reduced Data Valuation Framework for Machine Learning

10/26/2021
by   Yongchan Kwon, et al.

Data Shapley has recently been proposed as a principled framework to quantify the contribution of each individual datum in machine learning. It can effectively identify helpful or harmful data points for a learning algorithm. In this paper, we propose Beta Shapley, which is a substantial generalization of Data Shapley. Beta Shapley arises naturally by relaxing the efficiency axiom of the Shapley value, which is not critical for machine learning settings. Beta Shapley unifies several popular data valuation methods and includes Data Shapley as a special case. Moreover, we prove that Beta Shapley has several desirable statistical properties and propose efficient algorithms to estimate it. We demonstrate that Beta Shapley outperforms state-of-the-art data valuation methods on several downstream ML tasks, such as: 1) detecting mislabeled training data; 2) learning with subsamples; and 3) identifying points whose addition or removal has the largest positive or negative impact on the model.
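As a rough illustration of the idea in the abstract, the Python sketch below computes exact Beta(alpha, beta)-Shapley values on a four-point toy problem by enumerating all subsets. The cardinality weight is recalled from the full paper rather than stated in the abstract, so treat it as an assumption; the toy utility, the point indices, and all function names are hypothetical. This is not the efficient estimator the paper proposes, only a brute-force check that is feasible for a handful of points.

```python
from itertools import combinations
from math import comb, exp, lgamma, sqrt


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_weight(j, n, alpha, beta):
    """Normalized weight on coalition size j (1 <= j <= n).

    Assumed form: n * C(n-1, j-1) * B(j+beta-1, n-j+alpha) / B(alpha, beta).
    It equals 1 for every j when alpha = beta = 1 (the Data Shapley case).
    """
    log_ratio = log_beta(j + beta - 1, n - j + alpha) - log_beta(alpha, beta)
    return n * comb(n - 1, j - 1) * exp(log_ratio)


def beta_shapley(n, utility, alpha=1.0, beta=1.0):
    """Exact values for points 0..n-1 by full enumeration (small n only)."""
    values = []
    for i in range(n):
        others = [k for k in range(n) if k != i]
        total = 0.0
        for j in range(1, n + 1):
            # Average marginal contribution of point i to subsets of size j-1.
            gains = [
                utility(frozenset(s) | {i}) - utility(frozenset(s))
                for s in combinations(others, j - 1)
            ]
            total += beta_weight(j, n, alpha, beta) * sum(gains) / len(gains)
        values.append(total / n)
    return values


# Hypothetical utility: points 0-2 are "good" with diminishing returns,
# point 3 is "harmful" and always costs 0.5.
def utility(subset):
    good = len(subset & {0, 1, 2})
    return sqrt(good) - (0.5 if 3 in subset else 0.0)


full = frozenset(range(4))
shapley = beta_shapley(4, utility)                    # alpha = beta = 1
beta41 = beta_shapley(4, utility, alpha=4.0, beta=1.0)

print("Data Shapley :", shapley, "sum =", sum(shapley))
print("Beta(4, 1)   :", beta41, "sum =", sum(beta41))
print("U(D) - U({}) :", utility(full) - utility(frozenset()))
```

With alpha = beta = 1 the values sum to U(D) minus the utility of the empty set, which is the efficiency axiom. Under the assumed weights, Beta(4, 1) emphasizes small coalitions, so the good points score higher and the sum no longer matches; the harmful point is still ranked lowest in both cases.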


