What is the Value of Data? On Mathematical Methods for Data Quality Estimation

01/09/2020
by   Netanel Raviv, et al.

Data is one of the most important assets of the information age, and its societal impact is undisputed. Yet, rigorous methods of assessing the quality of data are lacking. In this paper, we propose a formal definition for the quality of a given dataset. We assess a dataset's quality by a quantity we call the expected diameter, which measures the expected disagreement between two randomly chosen hypotheses that explain it, and has recently found applications in active learning. We focus on Boolean hyperplanes, and utilize a collection of Fourier analytic, algebraic, and probabilistic methods to come up with theoretical guarantees and practical solutions for the computation of the expected diameter. We also study the behaviour of the expected diameter on algebraically structured datasets, conduct experiments that validate this notion of quality, and demonstrate the feasibility of our techniques.
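To make the notion of expected diameter concrete, here is a minimal brute-force sketch. It is not the paper's algorithm, and it rests on several assumptions that do not come from the abstract: a hypothesis is a weight vector w in {-1,+1}^n that labels a point x by the sign of the inner product of w and x; n is odd, so there are no ties; the two hypotheses are drawn uniformly and independently from the set of hypotheses consistent with the labeled data; and disagreement is the fraction of points in {-1,+1}^n on which the two hypotheses differ. The function names are illustrative only.

```python
from itertools import product


def sign_label(w, x):
    # With n odd and all entries in {-1, +1}, the inner product is never zero.
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1


def version_space(data, n):
    """All w in {-1,+1}^n that agree with every labeled example (x, y)."""
    return [
        w for w in product((-1, 1), repeat=n)
        if all(sign_label(w, x) == y for x, y in data)
    ]


def disagreement(w1, w2, n):
    """Fraction of points in {-1,+1}^n that w1 and w2 label differently."""
    points = list(product((-1, 1), repeat=n))
    differing = sum(sign_label(w1, x) != sign_label(w2, x) for x in points)
    return differing / len(points)


def expected_diameter(data, n):
    """Average disagreement over all ordered pairs of consistent hypotheses.

    Assumption: both hypotheses are sampled uniformly and independently
    from the version space, so a pair may contain the same hypothesis twice.
    """
    if n % 2 == 0:
        raise ValueError("this sketch assumes odd n to avoid ties")
    vs = version_space(data, n)
    if not vs:
        raise ValueError("no hypothesis is consistent with the data")
    total = sum(disagreement(a, b, n) for a in vs for b in vs)
    return total / len(vs) ** 2


if __name__ == "__main__":
    n = 3
    target = (1, 1, 1)  # hypothetical ground-truth hypothesis
    all_points = list(product((-1, 1), repeat=n))
    full_data = [(x, sign_label(target, x)) for x in all_points]

    print(expected_diameter([], n))             # no data at all
    print(expected_diameter(full_data[:2], n))  # two labeled points
    print(expected_diameter(full_data, n))      # every point labeled
```

Under this reading, a lower value means the dataset leaves less room for disagreement among the hypotheses that explain it. The enumeration is exponential in n, so the sketch only illustrates the definition on toy sizes.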

research
03/13/2018

Revisiting Radius, Diameter, and all Eccentricity Computation in Graphs through Certificates

We introduce notions of certificates allowing to bound eccentricities in...
research
08/16/2020

Diameter Polytopes of Feasible Binary Programs

Feasible binary programs often have multiple optimal solutions, which is...
research
02/25/2022

A note on Cops and Robbers, independence number, domination number and diameter

We study relations between diameter D(G), domination number γ(G), indepe...
research
06/22/2020

Effective Version Space Reduction for Convolutional Neural Networks

In active learning, sampling bias could pose a serious inconsistency pro...
research
07/30/2014

Targeting Optimal Active Learning via Example Quality

In many classification problems unlabelled data is abundant and a subset...
research
01/28/2023

Computing expected moments of the Rényi parking problem on the circle

A highly accurate and efficient method to compute the expected values of...
