Federated Learning on Non-IID Data Silos: An Experimental Study

02/03/2021
by Qinbin Li, et al.
Machine learning services are emerging in many data-intensive applications, and their effectiveness relies heavily on large volumes of high-quality training data. However, due to increasing privacy concerns and data regulations, training data have become increasingly fragmented, forming distributed databases of multiple data silos (e.g., within different organizations and countries). To develop effective machine learning services, it is necessary to exploit data from such distributed databases without exchanging the raw data. Recently, federated learning (FL) has attracted growing interest as a solution, since it enables multiple parties to collaboratively train a machine learning model without exchanging their local data. A key and common challenge in distributed databases is the heterogeneity of the data distribution among the parties (i.e., non-IID data). Many FL algorithms have been proposed to improve learning effectiveness under non-IID data settings. However, an experimental study that systematically examines their advantages and disadvantages is lacking, because previous studies use very rigid data partitioning strategies among parties, which are neither representative nor thorough. In this paper, to help researchers better understand and study the non-IID data setting in federated learning, we propose comprehensive data partitioning strategies that cover the typical non-IID data cases. Moreover, we conduct extensive experiments to evaluate state-of-the-art FL algorithms. We find that non-IID data does pose significant challenges to the learning accuracy of FL algorithms, and that none of the existing state-of-the-art FL algorithms outperforms the others in all cases. Our experiments provide insights for future studies on addressing the challenges of data silos.
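
To make the idea of a non-IID partition concrete, the following is a minimal sketch of one widely used way to simulate label distribution skew: for each class, the samples are divided among parties according to proportions drawn from a symmetric Dirichlet distribution. The abstract does not state which partitioning strategies the paper uses, so this is an illustrative assumption rather than the authors' method. The names `label_skew_partition`, `num_parties`, and `alpha` are hypothetical.

```python
import random
from collections import Counter


def dirichlet(alpha, k, rng):
    """Draw one sample from a symmetric Dirichlet(alpha) over k categories."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    if total == 0.0:  # extremely unlikely; fall back to uniform shares
        return [1.0 / k] * k
    return [d / total for d in draws]


def label_skew_partition(labels, num_parties, alpha, seed=0):
    """Assign every sample index to exactly one party with skewed label mixes.

    Hypothetical helper for illustration only. Smaller alpha gives more
    skewed (more strongly non-IID) label distributions per party.
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_label = {}
    for idx, y in enumerate(labels):
        by_label.setdefault(y, []).append(idx)

    parties = [[] for _ in range(num_parties)]
    for y in sorted(by_label):
        idxs = by_label[y]
        rng.shuffle(idxs)
        shares = dirichlet(alpha, num_parties, rng)

        # Convert the shares into contiguous slices of this class's samples.
        start, cumulative = 0, 0.0
        for p in range(num_parties):
            cumulative += shares[p]
            if p == num_parties - 1:
                end = len(idxs)  # last party takes whatever remains
            else:
                end = int(round(cumulative * len(idxs)))
                end = min(max(end, start), len(idxs))
            parties[p].extend(idxs[start:end])
            start = end
    return parties


if __name__ == "__main__":
    # Toy dataset: 1000 samples with 10 balanced classes.
    toy_labels = [i % 10 for i in range(1000)]
    split = label_skew_partition(toy_labels, num_parties=5, alpha=0.5, seed=1)
    for p, idxs in enumerate(split):
        counts = Counter(toy_labels[i] for i in idxs)
        print(f"party {p}: {len(idxs)} samples, label counts {dict(sorted(counts.items()))}")
```

Under this sketch, a large `alpha` yields nearly uniform class proportions across parties, while a small `alpha` concentrates each class in a few parties. Other kinds of heterogeneity, such as skew in features or in the amount of data per party, would need different partitioning logic.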

research
03/30/2021

Model-Contrastive Federated Learning

Federated learning enables multiple parties to collaboratively train a m...
research
12/04/2020

Mitigating Bias in Federated Learning

As methods to create discrimination-aware models develop, they focus on ...
research
07/22/2020

IBM Federated Learning: an Enterprise Framework White Paper V0.1

Federated Learning (FL) is an approach to conduct machine learning witho...
research
09/03/2023

martFL: Enabling Utility-Driven Data Marketplace with a Robust and Verifiable Federated Learning Architecture

The development of machine learning models requires a large amount of tr...
research
12/06/2020

SoK: Training Machine Learning Models over Multiple Sources with Privacy Preservation

Nowadays, gathering high-quality training data from multiple data contro...
research
01/18/2022

Towards Federated Clustering: A Federated Fuzzy c-Means Algorithm (FFCM)

Federated Learning (FL) is a setting where multiple parties with distrib...
research
02/03/2023

Vertical Federated Learning: Taxonomies, Threats, and Prospects

Federated learning (FL) is the most popular distributed machine learning...
