2D-Shapley: A Framework for Fragmented Data Valuation

06/18/2023
by   Zhihong Liu, et al.

Data valuation (quantifying the contribution of individual data sources to certain predictive behaviors of a model) is of great importance to enhancing the transparency of machine learning and designing incentive systems for data sharing. Existing work has focused on evaluating data sources that share a feature space or a sample space. How to value fragmented data sources, each of which contains only a subset of the features and samples, remains an open question. We start by presenting a method to calculate the counterfactual of removing a fragment from the aggregated data matrix. Based on this counterfactual calculation, we further propose 2D-Shapley, a theoretical framework for fragmented data valuation that uniquely satisfies some appealing axioms in the fragmented data context. 2D-Shapley enables a range of new use cases, such as selecting useful data fragments, interpreting sample-wise data values, and diagnosing data issues at a fine granularity.
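
To make the idea of a fragment-level counterfactual concrete, here is a minimal sketch for a tiny data matrix. The abstract does not spell out the formula, so this assumes the counterfactual of a block (sample i, feature j) is a second-order difference of a utility function over sample subsets S and feature subsets F, averaged over all orderings of samples and features. The names `two_d_shapley`, `utility`, `weights`, and `additive_utility` are hypothetical and chosen only for illustration; the toy utility stands in for a real model-performance score.

```python
from fractions import Fraction
from itertools import permutations


def two_d_shapley(n_rows, n_cols, utility):
    """Exact block values by brute-force enumeration (tiny matrices only).

    `utility(S, F)` is a hypothetical callable that scores the sub-matrix
    made of sample indices S and feature indices F (both frozensets).
    """
    values = [[Fraction(0)] * n_cols for _ in range(n_rows)]
    row_perms = list(permutations(range(n_rows)))
    col_perms = list(permutations(range(n_cols)))

    for row_perm in row_perms:
        for col_perm in col_perms:
            for a, i in enumerate(row_perm):
                S = frozenset(row_perm[:a])        # samples preceding i
                for b, j in enumerate(col_perm):
                    F = frozenset(col_perm[:b])    # features preceding j
                    # Assumed counterfactual: second-order difference that
                    # isolates the effect of adding block (i, j).
                    gain = (
                        utility(S | {i}, F | {j})
                        - utility(S | {i}, F)
                        - utility(S, F | {j})
                        + utility(S, F)
                    )
                    values[i][j] += gain

    n_orderings = len(row_perms) * len(col_perms)
    return [[v / n_orderings for v in row] for row in values]


# Toy utility: each block contributes a fixed weight, so the value of
# block (i, j) should come out as weights[i][j].
weights = [[1, 2], [3, 4]]


def additive_utility(S, F):
    return sum(weights[i][j] for i in S for j in F)


values = two_d_shapley(2, 2, additive_utility)
```

With the additive toy utility, each block's value equals its own weight, which is a quick sanity check on the averaging. The enumeration grows factorially in both dimensions, so anything beyond a handful of samples and features would need sampling of orderings instead of full enumeration.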


