Scalable Data Point Valuation in Decentralized Learning

05/01/2023
by Konstantin D. Pandl, et al.

Existing research on data valuation in federated and swarm learning focuses on valuing client contributions and works best when data across clients is independent and identically distributed (IID). In practice, data is rarely IID. We develop an approach called DDVal for decentralized data valuation, capable of valuing individual data points in federated and swarm learning. DDVal is based on sharing deep features and approximating Shapley values through a k-nearest neighbor approximation method. This enables novel applications, for example, simultaneously rewarding institutions and individuals for providing data to a decentralized machine learning task. Valuing data points through DDVal also allows hierarchical conclusions to be drawn about the contribution of institutions, and we empirically show that DDVal estimates institutional contributions more accurately than existing Shapley value approximation methods for federated learning. Specifically, it reaches a cosine similarity in approximating Shapley values of 99.969% compared with 99.301% for existing methods. DDVal scales with the number of data points instead of the number of clients and has loglinear complexity, which is more favorable than the exponential complexity of existing approaches. We show that DDVal is especially efficient in data distribution scenarios with many clients that have few data points, for example, more than 16 clients with 8,000 data points each. By integrating DDVal into a decentralized system, we show that it is suitable not only for centralized federated learning but also for decentralized swarm learning, which aligns well with research on emerging internet technologies such as web3 for rewarding users for providing data to algorithms.
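
To make the idea of k-nearest-neighbor Shapley valuation more concrete, the following is a minimal sketch, not the DDVal implementation. It assumes the closed-form recursion for unweighted KNN classifiers known as KNN-Shapley (Jia et al., 2019); whether DDVal uses exactly this variant is an assumption. The function names `knn_shapley`, `client_values`, and `cosine_similarity` are hypothetical, and the inputs stand in for deep features that clients would share. The per-test-point sort is what gives a loglinear cost in the number of data points.

```python
import math
from collections import defaultdict


def knn_shapley(train_x, train_y, test_x, test_y, k=5):
    """Shapley value of each training point for an unweighted KNN classifier.

    Assumption: the utility of a subset is the fraction of its k nearest
    neighbors (of a test point) that carry the test point's label.
    """
    n = len(train_x)
    values = [0.0] * n
    for x, y in zip(test_x, test_y):
        # Sort training indices by distance to the test point: O(n log n).
        order = sorted(range(n), key=lambda i: math.dist(train_x[i], x))
        match = [1.0 if train_y[i] == y else 0.0 for i in order]
        # Recursion runs from the farthest point to the nearest one.
        s = [0.0] * n
        s[n - 1] = match[n - 1] / n
        for pos in range(n - 2, -1, -1):
            rank = pos + 1  # 1-based rank of the point at sorted position pos
            s[pos] = s[pos + 1] + (match[pos] - match[pos + 1]) / k * min(k, rank) / rank
        # Average the per-test-point values over all test points.
        for pos, i in enumerate(order):
            values[i] += s[pos] / len(test_x)
    return values


def client_values(point_values, client_ids):
    """Sum data point values per client to obtain institution-level values."""
    totals = defaultdict(float)
    for value, client in zip(point_values, client_ids):
        totals[client] += value
    return dict(totals)


def cosine_similarity(a, b):
    """Cosine similarity between two value vectors, e.g. estimate vs. reference."""
    dot = sum(p * q for p, q in zip(a, b))
    return dot / (math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b)))


# Toy example: three one-dimensional "features" held by two hypothetical clients.
features = [(0.0,), (1.0,), (5.0,)]
labels = [0, 1, 0]
values = knn_shapley(features, labels, [(0.1,)], [0], k=1)
per_client = client_values(values, ["A", "A", "B"])
```

In this toy setup, the nearest correctly labeled point receives the largest value, the mislabeled neighbor receives a negative value, and the point values sum to the utility of the full training set. Summing per client illustrates how point-level values could be rolled up into institution-level contributions.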

