Data Valuation for Vertical Federated Learning: An Information-Theoretic Approach

12/15/2021
by Xiao Han, et al.

Federated learning (FL) is a promising machine learning paradigm that enables cross-party data collaboration for real-world AI applications in a privacy-preserving and regulation-compliant way. How to value each party's data is a critical but challenging problem in FL. Existing data valuation methods either rely on running specific models for a given task or are task-agnostic; however, party selection for a specific task often requires valuation before the FL model has been determined. This work fills the gap and proposes FedValue, to the best of our knowledge the first privacy-preserving, task-specific, yet model-free data valuation method for vertical FL tasks. Specifically, FedValue incorporates a novel information-theoretic metric, termed Shapley-CMI, to assess the data values of multiple parties from a game-theoretic perspective. Moreover, a novel server-aided federated computation mechanism is designed to compute Shapley-CMI while protecting each party from data leakage. We also propose several techniques to accelerate Shapley-CMI computation in practice. Extensive experiments on six open datasets validate the effectiveness and efficiency of FedValue for data valuation in vertical FL tasks. In particular, Shapley-CMI, as a model-free metric, performs comparably with measures that depend on running an ensemble of well-performing models.
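The abstract names Shapley-CMI but does not define it, so the following is only a rough illustration of the general idea, not the authors' method. It assumes that a party's marginal contribution to a coalition is the empirical conditional mutual information between that party's features and the label given the coalition's features, averaged with standard Shapley weights. It is a plain centralized computation on discrete data: the server-aided privacy-preserving protocol and the acceleration techniques are not modeled, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import factorial, log2


def entropy(rows):
    """Empirical Shannon entropy (in bits) of a list of hashable rows."""
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def cond_mutual_info(x, y, z):
    """Empirical I(X; Y | Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return (entropy(list(zip(x, z))) + entropy(list(zip(y, z)))
            - entropy(list(zip(x, y, z))) - entropy(list(z)))


def joint(features, parties, n):
    """Combine the chosen parties' columns into one tuple per sample."""
    return [tuple(features[p][k] for p in parties) for k in range(n)]


def shapley_cmi(features, labels):
    """Assumed definition: Shapley-weighted average of I(X_i; Y | X_S).

    `features` maps a party name to its (discrete) feature column.
    The loop enumerates all coalitions, so the cost is exponential
    in the number of parties.
    """
    names = sorted(features)
    m, n = len(names), len(labels)
    values = {}
    for i in names:
        others = [p for p in names if p != i]
        total = 0.0
        for size in range(m):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for coalition in combinations(others, size):
                z = joint(features, coalition, n)
                total += weight * cond_mutual_info(features[i], labels, z)
        values[i] = total
    return values
```

For example, with `features = {"A": [0, 0, 1, 1], "B": [0, 1, 0, 1]}` and `labels = [0, 1, 1, 0]` (the XOR of the two columns), neither party is informative alone, yet each receives a value of 0.5 bits, and the two values sum to the 1 bit of label information the parties hold jointly.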
