Fundamentals of Task-Agnostic Data Valuation

08/25/2022
by Mohammad Mohammadi Amiri, et al.

We study how to value the data of a data owner (seller) for a data seeker (buyer). Data valuation is often carried out for a specific task under a particular utility metric, such as test accuracy on a validation set, which may not be available in practice. In this work, we focus on task-agnostic data valuation without any validation requirements. The buyer has access to a limited amount of data (which could be publicly available) and seeks more data samples from a seller. We formulate the problem as estimating the differences between the statistical properties of the seller's data and those of the baseline data available at the buyer. We capture these statistical differences through second moments by measuring the diversity and relevance of the seller's data for the buyer, and we estimate these measures through queries to the seller without requesting raw data. We design the queries so that the seller is blind to the buyer's raw data and has no information with which to fabricate responses that produce a desired outcome of the diversity-relevance trade-off. Through extensive experiments on real tabular and image datasets, we show that the proposed estimates capture the diversity and relevance of the seller's data for the buyer.
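To make the second-moment idea concrete, here is a minimal sketch in Python with NumPy. It is a hypothetical construction for illustration, not the authors' estimator: it assumes the buyer sends the top-k principal directions of its own data as queries, the seller replies only with the variance of its data along each direction, and relevance and diversity are scored with simple ratios of matched variances. The function names `buyer_queries`, `seller_response` and `relevance_diversity` are our own. Neither party exchanges raw samples; only directions and scalar variances cross the boundary.

```python
import numpy as np


def buyer_queries(X_buyer: np.ndarray, k: int):
    """Buyer side: top-k eigenvalues/eigenvectors of its own data covariance.

    Hypothetical query design: the eigenvectors are sent to the seller,
    the eigenvalues stay with the buyer as its baseline second moments.
    """
    Xc = X_buyer - X_buyer.mean(axis=0)
    cov = Xc.T @ Xc / (len(Xc) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # ascending order
    order = np.argsort(eigvals)[::-1][:k]   # keep the k largest
    return eigvals[order], eigvecs[:, order]


def seller_response(X_seller: np.ndarray, directions: np.ndarray) -> np.ndarray:
    """Seller side: variance of its (centered) data along each query direction.

    Only these k scalars are returned; no raw seller data is revealed.
    """
    Xc = X_seller - X_seller.mean(axis=0)
    proj = Xc @ directions                  # shape (n_seller, k)
    return proj.var(axis=0, ddof=1)


def relevance_diversity(buyer_vars: np.ndarray, seller_vars: np.ndarray):
    """Hypothetical scores comparing second moments direction by direction.

    relevance: overlap of the two variance profiles, in [0, 1].
    diversity: mismatch of the two profiles, in [0, 1].
    Since min(b, s) + |b - s| = max(b, s), these two scores sum to 1.
    """
    lo = np.minimum(buyer_vars, seller_vars)
    hi = np.maximum(buyer_vars, seller_vars)
    relevance = float(lo.sum() / hi.sum())
    diversity = float(np.abs(buyer_vars - seller_vars).sum() / hi.sum())
    return relevance, diversity
```

Because the two illustrative scores sum to one, they trade off directly, echoing the diversity-relevance trade-off described above. This sketch does not address a seller who reports fabricated variances; the query design in the paper is what guards against that.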
