Robust Data Valuation via Variance Reduced Data Shapley

10/30/2022
by   Mengmeng Wu, et al.
0

Data valuation, especially quantifying data value in algorithmic prediction and decision-making, is a fundamental problem in data trading scenarios. The most widely used method is to define the data Shapley and approximate it by means of the permutation sampling algorithm. To make up for the large estimation variance of the permutation sampling that hinders the development of the data marketplace, we propose a more robust data valuation method using stratified sampling, named variance reduced data Shapley (VRDS for short). We theoretically show how to stratify, how many samples are taken at each stratum, and the sample complexity analysis of VRDS. Finally, the effectiveness of VRDS is illustrated in different types of datasets and data removal applications.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/28/2023

Sample Complexity of Variance-reduced Distributionally Robust Q-learning

Dynamic decision making under distributional shifts is of fundamental in...
research
02/20/2018

Sample Complexity of Stochastic Variance-Reduced Cubic Regularization for Nonconvex Optimization

The popular cubic regularization (CR) method converges with first- and s...
research
03/30/2021

Greedy-GQ with Variance Reduction: Finite-time Analysis and Improved Complexity

Greedy-GQ is a value-based reinforcement learning (RL) algorithm for opt...
research
02/01/2022

PAGE-PG: A Simple and Loopless Variance-Reduced Policy Gradient Method with Probabilistic Gradient Estimation

Despite their success, policy gradient methods suffer from high variance...
research
11/18/2021

C-OPH: Improving the Accuracy of One Permutation Hashing (OPH) with Circulant Permutations

Minwise hashing (MinHash) is a classical method for efficiently estimati...
research
01/07/2020

Reanalysis of Variance Reduced Temporal Difference Learning

Temporal difference (TD) learning is a popular algorithm for policy eval...
research
05/22/2023

Sequential Estimation using Hierarchically Stratified Domains with Latin Hypercube Sampling

Quantifying the effect of uncertainties in systems where only point eval...

Please sign up or login with your details

Forgot password? Click here to reset