The advancement of machine learning (ML) is heavily dependent on the processing of massive amounts of data. The most timely and relevant data are often generated at different devices all over the world, e.g., data collected by mobile phones and video cameras. Because of communication and privacy constraints, gathering all these data for centralized processing is often impractical, motivating the need for ML training over widely distributed data (decentralized learning). For example, geo-distributed learning DBLP:conf/nsdi/HsiehHVKGGM17 trains a global ML model over data spread across geo-distributed data centers. Similarly, federated learning DBLP:conf/aistats/McMahanMRHA17 trains a centralized model over data from a large number of devices (e.g., mobile phones).
Key Challenges in Decentralized Learning. There are two key challenges in decentralized learning. First, training a model over decentralized data using traditional training approaches (i.e., those designed for centralized data) requires massive communication, which drastically slows down the training process because the communication is bottlenecked by the limited wide-area or mobile network bandwidth DBLP:conf/nsdi/HsiehHVKGGM17; DBLP:conf/aistats/McMahanMRHA17. Second, decentralized data are typically generated at different contexts, which can lead to significant differences in the distribution of the data across data partitions. For example, facial images collected by cameras will reflect the demographics of each camera’s location, and images of kangaroos will be collected only from cameras in Australia or zoos. Unfortunately, existing decentralized learning algorithms (e.g., DBLP:conf/nsdi/HsiehHVKGGM17; DBLP:conf/aistats/McMahanMRHA17; DBLP:conf/nips/SmithCST17; DBLP:conf/ICLR/LinHMWD18; DBLP:conf/icml/TangLYZL18) mostly focus on reducing communication, as they either (i) assume that data are independent and identically distributed (IID) across different partitions or (ii) conduct only a very limited study on non-IID data partitions. This leaves a key question mostly unanswered: What happens to different ML applications and decentralized learning algorithms when their training data partitions are not IID?
Our Goal and Key Findings. In this work, we aim to answer the above key question by conducting the first detailed empirical study of the impact of non-IID data partitions on decentralized learning. Our study covers various ML applications, ML models, training datasets, decentralized learning algorithms, and degrees of deviation from IID. We focus on deep neural networks (DNNs) as they are the most relevant solutions for our applications. Our study reveals three key findings:
Training over non-IID data partitions is a fundamental and pervasive problem for decentralized learning. All three decentralized learning algorithms in our study suffer from major model quality loss (or even divergence) when run on non-IID data partitions, across all applications, models, and training datasets in our study.
DNNs with batch normalization DBLP:conf/icml/IoffeS15 are particularly vulnerable to non-IID data partitions, suffering significant model quality loss even under BSP, the most communication-heavy approach to decentralized learning.
The degree of deviation from IID (the skewness) is a key determinant of the difficulty level of the problem. These findings reveal that non-IID data is an important yet heavily understudied challenge in decentralized learning, worthy of extensive study.
Solutions. As two initial steps towards addressing this vast challenge, we first show that among the many proposed alternatives to batch normalization, group normalization DBLP:conf/eccv/WuH18 avoids the skew-induced accuracy loss of batch normalization under BSP. With this fix, all models in our study perform well under BSP for non-IID data, and the problem can be viewed as a trade-off between accuracy and communication frequency. Intuitively, there is a tug-of-war among the different data partitions, with each partition pulling the model to reflect its data, and only frequent communication, tuned to the skew-induced accuracy loss, can preserve the overall model accuracy of the algorithms in our study. Accordingly, we present a system-level approach that adapts the communication frequency of decentralized learning algorithms to accuracy loss, by cross-validating model accuracy across data partitions (model traveling). Our experimental results show that this adaptive approach automatically reduces communication by 9.6× (under high skew) to 34.1× (under mild skew) while retaining the accuracy of BSP.
Contributions. We make the following contributions. First, we conduct a detailed empirical study on the problem of non-IID data partitions. To our knowledge, this is the first study to show that non-IID data partitions pose a fundamental and pervasive challenge for decentralized learning. Second, we make the new observation that the challenge of non-IID data partitions is particularly problematic for DNNs with batch normalization, even under BSP. We discuss the root cause of this problem and find that it can be addressed by using an alternative normalization technique. Third, we show that the difficulty of this problem varies with the data skew. Finally, we design and evaluate a system-level approach that adapts the communication frequency to reflect the skewness in the data, seeking to maximize communication savings while preserving model accuracy.
Our study focuses on label-based partitioning of data, in which the distribution of labels varies across partitions. The paper concludes with a broader taxonomy of regimes of non-IID data (sec:discussion), the study of which is left to future work.
2 Background and Setup
We provide background on decentralized learning and popular algorithms for this learning setting (subsec:decentral) and then describe our study’s experimental setup (subsec:setup).
2.1 Decentralized Learning
In a decentralized learning setting, we aim to train an ML model based on all the training data samples that are generated and stored in one of the partitions. The goal of the training is to fit the model to all data samples. Most decentralized learning algorithms assume the data samples are independent and identically distributed (IID) among the different partitions, and we refer to such a setting as the IID setting. Conversely, we call it the Non-IID setting if this assumption does not hold.
We evaluate three popular decentralized learning algorithms to see how they perform on different applications in the IID and Non-IID settings. These algorithms can be used with a variety of stochastic gradient descent (SGD) approaches, and aim to reduce communication, either among data partitions or between the data partitions and a centralized server.
DBLP:conf/nsdi/HsiehHVKGGM17, a geo-distributed learning algorithm that dynamically eliminates insignificant communication among data partitions. Each partition accumulates updates to each model weight locally, and communicates the accumulated update to all other data partitions only when its relative magnitude exceeds a predefined threshold (Algorithm 1 in Appendix A).
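The significance filter described above can be sketched as follows. This is a minimal, hypothetical simplification (function and variable names are ours, not from the paper; the actual algorithm is Algorithm 1 in Appendix A):

```python
import numpy as np

def significance_filter_step(local_update, accumulated, weights, threshold):
    """Accumulate a local update per weight and emit only the accumulated
    updates whose relative magnitude (vs. the current weight value)
    exceeds the significance threshold; emitted entries are reset."""
    accumulated = accumulated + local_update
    significant = np.abs(accumulated) > threshold * np.abs(weights)
    to_send = np.where(significant, accumulated, 0.0)      # communicated
    accumulated = np.where(significant, 0.0, accumulated)  # kept local
    return to_send, accumulated
```

For example, with unit weights and a 1% threshold, an accumulated update of 0.05 is communicated while an update of 0.001 keeps accumulating locally.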
DBLP:conf/aistats/McMahanMRHA17, a popular algorithm for federated learning that combines local SGD on each client with model averaging. Specifically, it selects a subset of the partitions in each epoch, runs a prespecified number of local SGD steps on each selected partition, and communicates the resulting models back to a centralized server. The server averages all these models and uses the averaged model as the starting point for the next epoch (Algorithm 2 in Appendix A).
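One round of the local-SGD-plus-averaging scheme described above can be sketched as follows (a simplified illustration with hypothetical names, assuming all partitions are selected; the full algorithm is Algorithm 2 in Appendix A):

```python
import numpy as np

def federated_averaging_round(global_w, partitions, local_steps, lr, grad_fn):
    """One round: each partition runs `local_steps` of SGD starting
    from the global model, then the server averages the local models."""
    local_models = []
    for data in partitions:
        w = global_w.copy()
        for _ in range(local_steps):
            w -= lr * grad_fn(w, data)  # local SGD step on this partition
        local_models.append(w)
    return np.mean(local_models, axis=0)  # server-side model averaging
```

With a toy quadratic loss per partition (gradient `w - c`), two partitions pulling towards c = 0 and c = 2 average out to a model halfway between their local optima after one round.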
, a popular algorithm that communicates only a pre-specified amount of gradients each epoch, with various techniques to retain model quality such as momentum correction, gradient clipping DBLP:conf/icml/PascanuMB13, momentum factor masking, and warm-up training DBLP:journals/corr/GoyalDGNWKTJH17 (Algorithm 3 in Appendix A).
In addition to these decentralized learning algorithms, we also show results for BSP (bulk synchronous parallel) DBLP:journals/cacm/Valiant90 in the IID and Non-IID settings. BSP is significantly slower than the above algorithms because it does not seek to reduce communication: all updates from each partition are accumulated and shared among all data partitions after each training iteration. As noted earlier, for decentralized learning there is a natural tension between the frequency of communication and the quality of the resulting model: differing distributions among the partitions pull the model in different directions, and more frequent communication helps mitigate this “tug-of-war” so that the model represents all the data well. Thus, BSP, with its full communication every iteration, is used as the target baseline for model quality for each application.
2.2 Experimental Setup
Our study consists of three dimensions: (i) ML applications/models, (ii) decentralized learning algorithms, and (iii) degree of deviation from IID. We explore all three dimensions with rigorous experimental methodologies. In particular, we make sure the accuracy of our trained ML models on IID data matches the reported accuracy in corresponding papers. To our knowledge, this is the first detailed empirical study on ML over non-IID data partitions.
We evaluate different deep learning applications, DNN model structures, and training datasets:
Image classification with four DNN models (AlexNet DBLP:conf/nips/KrizhevskySH12, GoogLeNet DBLP:conf/cvpr/SzegedyLJSRAEVR15, LeNet lecun1998gradient, and ResNet DBLP:conf/cvpr/HeZRS16) over two datasets (CIFAR-10 krizhevsky2009learning and ImageNet ILSVRC15). We use validation accuracy as the model quality metric.
Face recognition with the center-loss face model DBLP:conf/eccv/WenZL016 over the CASIA-WebFace DBLP:journals/corr/YiLLL14a dataset. We use verification accuracy on the LFW dataset LFWTech as the model quality metric.
For all applications, we tune the training parameters (e.g., learning rate, minibatch size, number of epochs, etc.) such that the baseline model (BSP in the IID setting) achieves the model quality of the original paper. We then use these training parameters in all other settings. We further ensure that training/validation accuracy has stopped improving by the end of all our experiments. Appendix B lists all major training parameters in our study.
Non-IID Data Partitions. We create non-IID data partitions by partitioning datasets using the labels
on the data, i.e., using image classes for image classification and person identities for face recognition. This partitioning emulates real-world non-IID settings, which often involve highly unbalanced label distributions across locations (e.g., kangaroos only in Australia or zoos, a person's face in only a few locations worldwide). We control the degree of deviation from IID by controlling the fraction of data that is non-IID. For example, 20% non-IID indicates that 20% of the dataset is partitioned by labels, while the remaining 80% is partitioned randomly.
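The label-based partitioning scheme above can be sketched as follows. This is an illustrative reconstruction (names and the exact assignment of label ranges to partitions are our assumptions, not the paper's code):

```python
import numpy as np

def partition_non_iid(labels, num_partitions, non_iid_frac, seed=0):
    """Split sample indices into partitions where `non_iid_frac` of the
    data is assigned by label (contiguous label ranges per partition)
    and the remaining fraction is assigned randomly."""
    rng = np.random.default_rng(seed)
    idx = np.arange(len(labels))
    rng.shuffle(idx)
    n_skew = int(len(idx) * non_iid_frac)
    skew_idx, iid_idx = idx[:n_skew], idx[n_skew:]
    # Label-sorted portion: contiguous label ranges go to each partition.
    skew_sorted = skew_idx[np.argsort(labels[skew_idx], kind="stable")]
    parts = [list(chunk) for chunk in np.array_split(skew_sorted, num_partitions)]
    # Remaining portion: spread randomly across partitions.
    for k, chunk in enumerate(np.array_split(iid_idx, num_partitions)):
        parts[k].extend(chunk)
    return [np.array(p) for p in parts]
```

With `non_iid_frac=1.0` and as many partitions as label groups, each partition receives disjoint labels; with `non_iid_frac=0.0`, the split is purely random.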
Hyper-Parameters Selection. The algorithms we study provide the following hyper-parameters (see Appendix A for the detail of these algorithms) to control the amount of communication (and hence the training time):
uses an initial threshold to determine if an update is significant. Starting from this initial value, the significance threshold decreases whenever the learning rate decreases.
uses a hyper-parameter to control the number of local SGD steps on each selected partition.
uses a hyper-parameter to control the sparsity of updates (only update magnitudes in the top percentile are exchanged). Following the original paper DBLP:conf/ICLR/LinHMWD18, the sparsity follows a warm-up schedule: 75%, 93.75%, 98.4375%, 99.6%, 99.9%. We use a hyper-parameter, the number of epochs for each warm-up sparsity, to control the duration of the warm-up. For example, if it is set to 4, the sparsity is 75% in epochs 1–4, 93.75% in epochs 5–8, 98.4375% in epochs 9–12, 99.6% in epochs 13–16, and 99.9% in epochs 17+.
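The warm-up schedule above maps an epoch number to a sparsity level; a small sketch (the function name and 1-indexed epoch convention are our assumptions):

```python
def warmup_sparsity(epoch, epochs_per_stage):
    """Warm-up sparsity schedule: 75%, 93.75%, 98.4375%, 99.6%, 99.9%,
    advancing one stage every `epochs_per_stage` epochs (1-indexed)."""
    schedule = [0.75, 0.9375, 0.984375, 0.996, 0.999]
    stage = min((epoch - 1) // epochs_per_stage, len(schedule) - 1)
    return schedule[stage]
```

For instance, with `epochs_per_stage=4`, epochs 1–4 use 75% sparsity and epoch 17 onwards uses the final 99.9%.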
We select the hyper-parameter of each decentralized learning algorithm so that each algorithm (i) achieves the same model quality as BSP in the IID setting and (ii) achieves communication savings similar to the other two algorithms. We study the sensitivity of our findings to this choice in subsec:decentral_parameter.
3 Non-IID Study: Results Overview
This paper seeks to answer the question of what happens to ML applications, ML models, and decentralized learning algorithms when their training data partitions are not IID. In this section, we provide an overview of our findings, showing that non-IID data partitions cause major model quality loss across all applications, models, and algorithms in our study. We discuss the results for image classification in subsec:overview_cifar10 and subsec:overview_imagenet, and for face recognition in subsec:overview_face.
3.1 Image Classification with CIFAR-10
We first present the model quality with different decentralized learning algorithms over the IID and Non-IID settings for image classification on the CIFAR-10 dataset. We use five partitions in this evaluation. As the CIFAR-10 dataset consists of ten object classes, each data partition has two object classes in the Non-IID setting. Figure 1 shows the results with four popular DNNs (AlexNet, GoogLeNet, LeNet, and ResNet). We select the hyper-parameter of each algorithm according to the criteria in subsec:setup. We make two major observations.
1) It is a pervasive problem. All three decentralized learning algorithms lose significant model quality for all four DNNs in the Non-IID setting. We see that while these algorithms retain the validation accuracy of BSP in the IID setting with 15–20× communication savings (agreeing with the results reported in the original papers for these algorithms), they lose 3% to 74% validation accuracy in the Non-IID setting. Simply running these algorithms for more epochs would not help, because the training/validation accuracy has already stopped improving. Furthermore, training completely diverges in some cases, such as with GoogLeNet and ResNet20 (one of the algorithms with ResNet20 also diverges in the IID setting). The pervasiveness of the problem is quite surprising, as we have a diverse set of decentralized learning algorithms and DNNs. This result shows that Non-IID data is a pervasive and challenging problem for decentralized learning, and this problem has been heavily understudied. sec:decentral discusses the cause of this problem.
2) Even BSP cannot completely solve this problem. We see that even BSP, with its full communication every iteration, cannot retain model quality for some DNNs in the Non-IID setting. In particular, the validation accuracy of ResNet20 in the Non-IID setting is 39% lower than in the IID setting. This finding suggests that, for some DNNs, it may not be possible to solve the Non-IID data challenge by communicating more frequently between data partitions. We find that this problem exists not only in ResNet20, but in all DNNs we study with batch normalization layers (ResNet10, BN-LeNet DBLP:conf/icml/IoffeS15, and Inception-v3 DBLP:conf/cvpr/SzegedyVISW16). We discuss this problem and potential solutions in sec:batch_norm.
3.2 Image Classification with ImageNet
We study image classification on the ImageNet dataset ILSVRC15 (1,000 image classes) to see if the Non-IID data problem exists in different datasets. We use two partitions in this experiment, so that each partition gets 500 image classes. We select the hyper-parameter of each algorithm according to the criteria in subsec:setup.
The same trend in different datasets. Figure 2 illustrates the validation accuracy in the IID and Non-IID settings. Interestingly, we observe the same problems on the ImageNet dataset, which has two orders of magnitude more classes than CIFAR-10. First, two of the three algorithms lose significant validation accuracy (8.1% to 27.2%) for both DNNs in the Non-IID setting, and while the remaining algorithm retains the validation accuracy for GoogLeNet in the Non-IID setting, it cannot converge to a useful model for ResNet10. Second, BSP also cannot retain the validation accuracy for ResNet10 in the Non-IID setting, which concurs with our observation in the CIFAR-10 study. These results show that the Non-IID data problem exists not only across decentralized learning algorithms and DNNs, but also across datasets.
3.3 Face Recognition
We further examine another popular ML application, face recognition, to see if the Non-IID data problem is a fundamental challenge across different applications. We again use two partitions in this evaluation, and we store different people's faces in different partitions in the Non-IID setting. We select the hyper-parameter of each algorithm according to the criteria in subsec:setup. It is worth noting that the verification process of face recognition is fundamentally different from image classification, as face recognition does not
use the classification layer (and thus the training labels) at all in the verification process. Instead, for each pair of verification images, the trained DNN is used to compute a feature vector for each image, and the distance between these feature vectors is used to determine if the two images belong to the same person.
The same problem in different applications. Figure 3 presents the LFW verification accuracy using different decentralized learning algorithms in the IID and Non-IID settings. Again, the same problem occurs in this application: the decentralized learning algorithms work well in the IID setting, but they lose significant accuracy in the Non-IID setting. In fact, two of the algorithms cannot converge to a useful model in the Non-IID setting, and their 50% accuracy comes from random guessing (the verification process is a series of binary questions). This result is particularly noteworthy because face recognition uses a vastly different verification process that does not rely on the training labels, which are used to create the Non-IID setting in the first place. We conclude that Non-IID data is a fundamental and pervasive problem across applications, datasets, models, and decentralized learning algorithms.
4 Problems of Decentralized Learning Algorithms
The results in sec:overview show that three diverse decentralized learning algorithms all suffer drastic accuracy loss in the Non-IID setting. We investigate the potential reasons for this (subsec:decentral_cause) and the sensitivity to hyper-parameter choice (subsec:decentral_parameter).
4.1 Reasons for Model Quality Loss
Gaia. We extract the Gaia-trained models from both partitions (denoted DC-0 and DC-1) for image classification on the ImageNet dataset, and then evaluate the validation accuracy of each model on the image classes in each partition. As Figure 4 shows, the validation accuracy is consistent across the two sets of image classes when the model is trained in the IID setting (the results for the IID DC-0 Model are shown; the IID DC-1 Model behaves the same). However, the validation accuracy varies drastically in the Non-IID setting (Non-IID DC-0 Model and Non-IID DC-1 Model). Specifically, both models perform well for the image classes in their respective partitions, but they perform very poorly for the image classes that are not in their respective partitions. This reveals that using Gaia in the Non-IID setting results in completely different models across data partitions, and each model is only good at recognizing the image classes in its own partition.
This raises the following question: How does Gaia produce completely different models in the Non-IID setting, given that it synchronizes all significant updates to ensure that the difference across models in each weight is insignificant (sec:background)? To answer this, we first compare each weight in the Non-IID DC-0 and DC-1 Models, and find that the average difference among all the weights is only 0.5% (reflecting that the threshold for significance in the last epoch was 1%). However, we find that given the same input image, the neuron
values are vastly different (an average difference of 173%). This finding suggests that small weight differences can result in completely different models. Mathematically, this is because weights can be both positive and negative: a small percentage difference in the individual weights of a neuron can lead to a large percentage difference in its value. As Gaia eliminates insignificant communication, it creates an opportunity for the model in each data partition to specialize for the image classes in that partition, at the expense of other classes.
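The sign-cancellation effect described above is easy to demonstrate with a toy neuron whose two weights nearly cancel (the specific numbers below are illustrative, not from the paper's models):

```python
import numpy as np

# Two nearly cancelling weights: a 0.5% relative change in each weight
# produces a ~100% relative change in the neuron's pre-activation value.
x = np.array([1.0, 1.0])
w_a = np.array([1.00, -0.99])            # neuron value: 1.00 - 0.99 = 0.01
w_b = w_a * np.array([1.005, 0.995])     # each weight moved by only 0.5%
neuron_a = w_a @ x                       # 0.01
neuron_b = w_b @ x                       # 1.005 - 0.98505 = 0.01995
rel_diff = abs(neuron_b - neuron_a) / abs(neuron_a)  # ~0.995, i.e. ~99.5%
```

This is why a 0.5% average weight difference between the two partitions' models is compatible with a 173% average difference in neuron values.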
DeepGradientCompression. While local model specialization explains why Gaia performs poorly in the Non-IID setting, it is still unclear why other decentralized learning algorithms exhibit the same problem. More specifically, FederatedAveraging and DeepGradientCompression always maintain one global model, so there is no room for local model specialization. To understand why these algorithms perform poorly, we study the average residual update delta with DeepGradientCompression. This number represents the magnitude of the gradients that have not yet been exchanged among the data partitions, because DeepGradientCompression communicates only a fixed number of gradients in each epoch (sec:background). Thus, it can be seen as the amount of gradient divergence among the data partitions.
Figure 5 depicts the average residual update delta for the first 20 training epochs when training ResNet20 over the CIFAR-10 dataset. We show only the first 20 epochs because the training diverges after that in the Non-IID setting. As the figure shows, the average residual update delta is an order of magnitude higher in the Non-IID setting (283%) than in the IID setting (27%). Hence, each partition generates large gradients in the Non-IID setting, which is not surprising as each partition sees vastly different training data. However, these large gradients are not synchronized, because DeepGradientCompression sparsifies the gradients at a fixed rate. When they are finally synchronized, they may have diverged so much from the global model that they lead to the divergence of the whole model. Our experiments also support this proposition, as we see that DeepGradientCompression diverges much more often in the Non-IID setting.
FederatedAveraging. The above analysis for DeepGradientCompression also applies to FederatedAveraging, which delays communication from each partition by a fixed number of local iterations. If the weights in different partitions diverge too much, the synchronized global model can lose accuracy or completely diverge DBLP:journals/corr/abs-1806-00582. We validate this by plotting the average local weight update delta for FederatedAveraging at each global synchronization (the average relative difference between each partition's local weights and the averaged global model weights). Figure 6 depicts this number for the first 25 training epochs when training AlexNet over the CIFAR-10 dataset. As the figure shows, the average local weight update delta in the Non-IID setting (48.5%) is much higher than in the IID setting (20.2%), which explains why Non-IID data partitions lead to major accuracy loss for FederatedAveraging. The difference is less pronounced than with DeepGradientCompression, so the impact on accuracy is smaller.
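A sketch of the divergence metric discussed above, under our reading of the definition (a per-weight relative difference averaged over all weights; the function name and the epsilon guard against zero weights are our assumptions):

```python
import numpy as np

def local_update_delta(local_weights, global_weights, eps=1e-12):
    """Average relative difference between a partition's locally trained
    weights and the averaged global model weights."""
    return np.mean(np.abs(local_weights - global_weights)
                   / (np.abs(global_weights) + eps))
```

For example, local weights [1.2, 0.8] against global weights [1.0, 1.0] give a delta of 0.2 (20%).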
4.2 Algorithm Hyper-Parameters
We now study the sensitivity of the non-IID problem to hyper-parameter choice. Table 1 presents the results of varying the algorithm's hyper-parameter (subsec:setup) when training on CIFAR-10; we leave the results for the other two algorithms and more models to Appendix C. As the table shows, we study seven hyper-parameter choices and compare the results with BSP. We make two main observations.
First, almost all hyper-parameter settings lose significant accuracy in the Non-IID setting (relative to BSP in the IID setting). Even with a relatively conservative hyper-parameter setting (the most communication-intensive of the choices shown), we still see 3.3% and 21.9% accuracy losses. On the other hand, the exact same hyper-parameter choice in the IID setting achieves close to BSP-level accuracy (within 1.1%). We see the same trend with much more aggressive hyper-parameter settings as well. This shows that the problem of Non-IID data partitions is not specific to particular hyper-parameter settings, and that hyper-parameter settings that work well in the IID setting may perform poorly in the Non-IID setting.
Second, more conservative hyper-parameter settings (which imply more frequent communication among the data partitions) often greatly decrease the accuracy loss in the Non-IID setting. For example, the validation accuracy with more conservative settings is significantly higher than with aggressive ones. This suggests that we may be able to use more frequent communication among the data partitions for higher model quality in the Non-IID setting, mitigating the “tug-of-war” among the partitions (subsec:decentral).
5 Batch Normalization: Problem and Solution
As sec:overview discusses, even BSP cannot retain model quality in the Non-IID setting for DNNs with batch normalization layers. In this section, we first discuss why batch normalization is particularly vulnerable in the Non-IID setting (subsec:batch_norm_problem) and then study alternative normalization techniques, including one—Group Normalization—that works better in this setting (subsec:batch_norm_solution).
5.1 The Problem of Batch Normalization in the Non-IID Setting
Batch normalization DBLP:conf/icml/IoffeS15 (BatchNorm) is one of the most popular mechanisms in deep learning, and it has been employed by default in most deep learning models (more than 11,000 citations). BatchNorm enables faster and more stable DNN training because it enables larger learning rates, which in turn make convergence much faster and help avoid sharp local minimum (hence, the model generalizes better).
How BatchNorm works. BatchNorm aims to stabilize a DNN by normalizing the input distribution of selected layers such that the inputs on each channel of the layer have zero mean and unit variance. Because the global mean and variance are unattainable with stochastic training, BatchNorm uses the minibatch mean and variance as estimates of the global mean and variance. Specifically, for each minibatch, BatchNorm calculates the minibatch mean and variance, and uses them to normalize each input in the minibatch DBLP:conf/icml/IoffeS15. Recent work shows that BatchNorm enables larger learning rates because (i) BatchNorm corrects large gradient updates that could result in divergence DBLP:conf/nips/BjorckGSW18 and (ii) BatchNorm makes the underlying problem's landscape significantly smoother DBLP:conf/nips/SanturkarTIM18.
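The training-time normalization described above can be sketched in a few lines (a minimal per-channel version without the learned scale/shift parameters of full BatchNorm):

```python
import numpy as np

def batchnorm_train(x, eps=1e-5):
    """Normalize a minibatch per channel with its own mean and variance,
    as BatchNorm does during training. x has shape (batch, channels).
    Returns the normalized batch and the minibatch statistics."""
    mu = x.mean(axis=0)    # per-channel minibatch mean
    var = x.var(axis=0)    # per-channel minibatch variance
    return (x - mu) / np.sqrt(var + eps), mu, var
```

At validation time, estimated global statistics are used in place of `mu` and `var`, which is exactly where the train/validation mismatch discussed below arises.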
BatchNorm and the Non-IID setting. While BatchNorm is effective in practice, its dependence on the minibatch mean and variance is known to be problematic in certain settings. BatchNorm uses the minibatch mean and variance during training, but typically uses an estimated global mean and variance for validation. If there is a major mismatch between these statistics, validation accuracy will be low because the input distribution during validation does not match the distribution during training. This can happen if the minibatch size is small or the sampling of minibatches is not IID DBLP:conf/nips/Ioffe17. The Non-IID setting in our study exacerbates this problem because each data partition sees very different training samples. Hence, the minibatch means and variances in each partition can vary significantly in the Non-IID setting, and the synchronized global model may not work well for any partition's data. Worse still, we cannot simply increase the minibatch size or sample minibatches more carefully to solve this problem, because in the Non-IID setting the underlying training data in each partition does not represent the global training dataset.
We validate whether there is indeed a major divergence in the minibatch means and variances among different partitions in the Non-IID setting. We calculate the divergence of the minibatch mean as the difference between the minibatch means of two partitions over their average, i.e., |mu_0 - mu_1| / ((mu_0 + mu_1) / 2) for two partitions with minibatch means mu_0 and mu_1. We average over every 100 minibatches in each partition to get a better estimate. Figure 7 depicts this divergence for each channel of the first layer of BN-LeNet, which is constructed by inserting BatchNorm into LeNet after each convolutional layer. As we can see, the divergence of the minibatch mean is significantly larger in the Non-IID setting (between 6% and 51%) than in the IID setting (between 1% and 5%). We observe the same trend in minibatch variances (not shown). As discussed earlier, this phenomenon is detrimental to training: each partition uses very different minibatch means and variances to normalize its model, yet the resulting global model can use only one global mean and variance, which cannot match all of these diverse minibatch statistics. As this problem has nothing to do with the frequency of communication among partitions, it explains why even BSP cannot retain model accuracy for BatchNorm in the Non-IID setting.
5.2 Alternatives to Batch Normalization
As the problem of BatchNorm in the Non-IID setting is due to its dependence on minibatches, the natural solution is to replace BatchNorm with alternative normalization mechanisms that are not dependent on minibatches. Unfortunately, most existing alternative normalization mechanisms have their own drawbacks. We first discuss the normalization mechanisms that have major shortcomings, and then we discuss a particular mechanism that may be used instead.
Weight Normalization DBLP:conf/nips/SalimansK16. Weight Normalization (WeightNorm) is a normalization scheme that normalizes the weights in a DNN, as opposed to the neurons (which is what BatchNorm and most other normalization techniques do). Because it normalizes the weights, WeightNorm does not depend on minibatches. However, while WeightNorm can effectively control the variance of the neurons, it still needs a mean-only BatchNorm in many cases to achieve the model quality and training speed of BatchNorm DBLP:conf/nips/SalimansK16. This mean-only BatchNorm makes WeightNorm vulnerable to the Non-IID setting again, because there is a large divergence in minibatch means among the partitions in the Non-IID setting (subsec:batch_norm_problem).
Layer Normalization DBLP:journals/corr/BaKH16. Layer Normalization (LayerNorm) is a technique inspired by BatchNorm. Instead of computing the mean and variance of a minibatch for each channel, LayerNorm computes the mean and variance across all channels for each sample. Specifically, if the inputs are four-dimensional (batch × channel × width × height), BatchNorm produces means and variances along the (batch, width, height) dimensions for each channel, whereas LayerNorm produces means and variances along the (channel, width, height) dimensions for each sample (per-sample mean and variance). As the normalization is done on a per-sample basis, LayerNorm does not depend on minibatches. However, LayerNorm makes a key assumption that all inputs make similar contributions to the final prediction, and this assumption does not hold for some models such as convolutional neural networks, where activated neurons should not be normalized together with non-activated neurons. As a result, BatchNorm still outperforms LayerNorm for these models DBLP:journals/corr/BaKH16.
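The difference in normalization axes between the two schemes can be seen directly from the shapes of the resulting statistics (a small sketch; the random input is illustrative):

```python
import numpy as np

# Input of shape (N, C, H, W): batch, channels, height, width.
x = np.random.default_rng(0).normal(size=(8, 4, 5, 5))

# BatchNorm: one mean/variance per channel, computed over (N, H, W).
bn_mean = x.mean(axis=(0, 2, 3))   # shape (C,) = (4,)

# LayerNorm: one mean/variance per sample, computed over (C, H, W).
ln_mean = x.mean(axis=(1, 2, 3))   # shape (N,) = (8,)
```

BatchNorm's statistics couple samples within a minibatch (and thus across data partitions), while LayerNorm's are entirely per-sample.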
Batch Renormalization DBLP:conf/nips/Ioffe17. Batch Renormalization (BatchReNorm) is an extension to BatchNorm that aims to alleviate the problem of small minibatches (i.e., inaccurate minibatch means and variances). BatchReNorm achieves this by incorporating the estimated global mean and variance during training, and introducing two hyper-parameters to bound the difference between the minibatch statistics and the global statistics. These two hyper-parameters are gradually relaxed so that the earlier training phase behaves more like BatchNorm and the later phase behaves more like BatchReNorm.
We evaluate BatchReNorm with BN-LeNet over CIFAR-10 to see if BatchReNorm can solve the problem of Non-IID data partitions. We replace all BatchNorm layers with BatchReNorm layers, and we carefully select the BatchReNorm hyper-parameters so that BatchReNorm achieves the highest validation accuracy in both the IID and Non-IID settings. Table 2 shows the Top-1 validation accuracy. We see that while BatchNorm and BatchReNorm achieve similar accuracy in the IID setting, they both perform worse in the Non-IID setting. In particular, while BatchReNorm performs much better than BatchNorm in the Non-IID setting (75.3% vs. 65.4%), BatchReNorm still loses accuracy compared to the IID setting. This is not surprising, because BatchReNorm still relies on minibatches to a certain degree, and prior work has shown that BatchReNorm's performance still degrades when the minibatch size is small DBLP:conf/nips/Ioffe17. Hence, BatchReNorm cannot completely solve the problem of Non-IID data partitions, which is a more challenging problem than small minibatches.
Group Normalization DBLP:conf/eccv/WuH18. Group Normalization (GroupNorm) is an alternative normalization mechanism that aims to overcome the shortcomings of BatchNorm and LayerNorm. GroupNorm divides adjacent channels into groups of a prespecified size and computes the per-group mean and variance for each input sample. Specifically, for a four-dimensional input of shape (N, C, H, W), GroupNorm partitions the C channels into multiple groups and computes means and variances over each group's channels and the spatial dimensions. Hence, GroupNorm does not depend on minibatches for normalization (the shortcoming of BatchNorm), and GroupNorm does not assume that all channels make equal contributions (the shortcoming of LayerNorm).
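The grouping scheme can be sketched in a few lines of NumPy; this is our illustrative version (again without the learnable gain/bias), not the original implementation:

```python
import numpy as np

def group_norm(x, num_groups, eps=1e-5):
    """GroupNorm over a four-dimensional input of shape (N, C, H, W).

    Channels are split into `num_groups` adjacent groups; mean and
    variance are computed per sample and per group, over the group's
    channels and all spatial positions. No minibatch statistics are
    involved, so the output is independent of the batch composition.
    """
    n, c, h, w = x.shape
    assert c % num_groups == 0, "channels must divide evenly into groups"
    g = x.reshape(n, num_groups, c // num_groups, h, w)
    mean = g.mean(axis=(2, 3, 4), keepdims=True)
    var = g.var(axis=(2, 3, 4), keepdims=True)
    return ((g - mean) / np.sqrt(var + eps)).reshape(n, c, h, w)
```

Note the two extremes: with one group, this normalizes over all channels at once (LayerNorm-like); with one channel per group, each channel is normalized on its own. GroupNorm's intermediate group sizes are what let it avoid LayerNorm's equal-contribution assumption.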
We evaluate GroupNorm with BN-LeNet over CIFAR-10 to see if we can use GroupNorm as an alternative to BatchNorm in the Non-IID setting. We carefully select the group size that works best with this DNN. Figure 8 shows the Top-1 validation accuracy with GroupNorm and BatchNorm across decentralized learning algorithms. We make two main observations.
First, GroupNorm successfully recovers the accuracy loss of BatchNorm with BSP in the Non-IID setting. As the figure shows, GroupNorm with BSP achieves 79.2% validation accuracy in the Non-IID setting, which is as good as the accuracy in the IID setting. This shows that GroupNorm can be used as an alternative to BatchNorm to overcome the Non-IID data challenge for BSP. Second, GroupNorm also dramatically helps the decentralized learning algorithms improve model accuracy in the Non-IID setting. With GroupNorm, the accuracy losses for the three decentralized learning algorithms are 14.4%, 8.9%, and 8.7%, respectively. While these losses are still significant, they are better than their BatchNorm counterparts by an additive 10.7%, 19.8%, and 60.2%, respectively.
Summary. Overall, our study shows that GroupNorm DBLP:conf/eccv/WuH18
can be a good alternative to BatchNorm in the Non-IID setting, especially for computer vision tasks. For BSP, it fixes the problem entirely; for decentralized learning algorithms, it greatly decreases the accuracy loss. However, it is worth noting that BatchNorm is widely adopted in many DNNs; hence, more study should be done to see if GroupNorm can always replace BatchNorm across different applications and DNN models. As for other tasks, such as recurrent (e.g., LSTM DBLP:journals/neco/HochreiterS97) and generative (e.g., GAN DBLP:conf/nips/GoodfellowPMXWOCB14) models, other normalization techniques such as LayerNorm DBLP:journals/corr/BaKH16 can be good options because (i) they have been shown to be effective in these tasks and (ii) they do not depend on minibatches, and hence are unlikely to suffer the problems of BatchNorm in the Non-IID setting.
6 Degree of Deviation from IID
Our study in previous sections (sec:overview–sec:batch_norm) assumes a strict case of non-IID data partitions, where each training label exists exclusively in a single data partition. While this assumption may be a reasonable approximation for some applications (e.g., a person's face image may exist only in the data partition for the geo-region in which the person lives), it could be an extreme case for other applications. Here, we study how the problem of non-IID data changes with the degree of deviation from IID (the skewness) by controlling the fraction of data that are non-IID (subsec:setup). Figure 10 illustrates the CIFAR-10 Top-1 validation accuracy of AlexNet and GN-LeNet (our name for BN-LeNet with GroupNorm replacing BatchNorm, Figure 8) in the 20%, 40%, 60%, and 80% non-IID settings. We make two main observations.
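The partial non-IID settings can be produced by controlling what fraction of each label's data is assigned in a label-skewed manner, with the remainder shuffled uniformly. The following sketch is an illustrative construction (the function and its round-robin label assignment are ours, not necessarily the exact procedure of subsec:setup):

```python
import random
from collections import defaultdict

def partition(labels, num_partitions, non_iid_frac, seed=0):
    """Assign sample indices to partitions so that `non_iid_frac` of
    each label's data is label-skewed (each label mapped to one
    partition, round-robin) and the rest is spread uniformly (IID).
    non_iid_frac=1.0 gives the strict setting where each label lives
    in exactly one partition; 0.0 gives fully IID partitions.
    """
    rng = random.Random(seed)
    parts = defaultdict(list)
    by_label = defaultdict(list)
    for idx, y in enumerate(labels):
        by_label[y].append(idx)
    iid_pool = []
    for y, idxs in by_label.items():
        rng.shuffle(idxs)
        cut = int(len(idxs) * non_iid_frac)
        parts[y % num_partitions].extend(idxs[:cut])  # skewed share
        iid_pool.extend(idxs[cut:])                   # IID share
    rng.shuffle(iid_pool)
    for i, idx in enumerate(iid_pool):
        parts[i % num_partitions].append(idx)
    return dict(parts)
```

For example, `partition(cifar_labels, 5, 0.2)` would give each of five partitions a 20% label-skewed share plus an 80% uniformly shuffled share, matching the mildest setting studied in this section.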
1) Partial non-IID data is also problematic. We see that for all three decentralized learning algorithms, partially non-IID data still causes major accuracy loss. Even with a small fraction of non-IID data such as 20%, we still see 5.8% and 3.4% accuracy loss for two of the algorithms on AlexNet (Figure 10(a)). The only exception is AlexNet with the third algorithm, which retains model accuracy in partial non-IID settings. However, the same technique suffers significant accuracy loss for GN-LeNet in partial non-IID settings (Figure 10(b)). We conclude that the problem of non-IID data does not occur only with exclusive non-IID data partitioning, and hence, the problem exists in the vast majority of practical decentralized settings.
2) The degree of deviation from IID often determines the difficulty of the problem. We observe that the degree of skew changes the landscape of the problem significantly. In most cases, model accuracy gets worse with higher degrees of skew, and the accuracy gap between 80% and 20% non-IID data can be as large as 7.4% (GN-LeNet). We see that while most decentralized learning algorithms can retain model quality within certain degrees of skew, there is usually a limit. For example, when training over 20% non-IID data, all three decentralized learning algorithms stay within 1.3% accuracy loss for GN-LeNet (Figure 10(b)). However, their accuracy losses become unacceptable when dealing with 40% or more non-IID data.
7 Our Approach
To address the problem of non-IID data partitions, we introduce a generic, system-level approach that enables communication-efficient decentralized learning over arbitrarily non-IID data partitions. We provide an overview of the approach (subsec:solution_overview), describe its key mechanisms (subsec:comm_control), and present evaluation results (subsec:solution_results).
7.1 Overview
Our goal is a system-level solution that (i) enables high-accuracy, communication-efficient decentralized learning over arbitrarily non-IID data partitions; and (ii) is general enough to be applicable to a wide range of ML applications, ML systems, and decentralized learning algorithms. To this end, we design our approach as a system-level module that can be integrated with various decentralized learning algorithms and ML systems.
Figure 11 overviews the design.
Estimate the degree of deviation from IID. As sec:skewness shows, knowing the degree of skew is very useful for determining an appropriate solution. To learn this key information, our system periodically moves the ML model from one data partition to another during training (model traveling, ❶ in Figure 11). It then evaluates how well the model performs on the remote data partition by measuring its accuracy on a subset of the training data on the remote node. As we already know the training accuracy of this model on its originating data partition, we can infer the accuracy loss on the remote data partition (❷). The accuracy loss is essentially the performance gap of the same model over two different data partitions, which can be used as an approximation of the degree of skew. For example, a remote data partition very likely has very different data characteristics if the model has reached 60% training accuracy on its local partition but achieves only 30% accuracy on the remote one. More importantly, accuracy loss directly captures the extent to which the model underperforms on the different data partition.
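The accuracy-loss signal from model traveling can be sketched as follows; `evaluate` is a placeholder for whatever routine measures a model's training accuracy on a subset of a partition's data:

```python
def accuracy_loss(model, local_data, remote_data, evaluate):
    """Degree-of-skew proxy from model traveling.

    `evaluate(model, data)` is a placeholder returning the training
    accuracy of `model` on a subset of `data`. The gap between the
    model's accuracy on its home partition and on the remote
    partition approximates how differently the two are distributed.
    """
    local_acc = evaluate(model, local_data)    # known at the source
    remote_acc = evaluate(model, remote_data)  # measured after traveling
    return local_acc - remote_acc

# Example from the text: 60% accuracy at home vs. 30% remotely gives
# a 30% accuracy loss, suggesting a heavily skewed remote partition.
```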
Adaptive communication control (❸). Based on the accuracy loss learned from model traveling, our system controls the tightness of communication among data partitions to retain model quality. It does so by automatically tuning the hyper-parameters of the decentralized learning algorithm (subsec:decentral_parameter). This tuning process essentially solves an optimization problem that aims to minimize communication among data partitions while keeping the accuracy loss within a reasonable threshold (subsec:comm_control provides further details).
Our approach handles non-IID data partitions in a manner that is transparent to ML applications and decentralized learning algorithms, and it controls communication based on the accuracy loss across partitions. Thus, we do not need to use the most conservative mechanism (e.g., BSP) all the time, and the system can adapt to whatever skew is present for the particular ML application and its training data partitions (sec:skewness).
7.2 Mechanism Details
We now discuss these mechanisms in detail.
Accuracy Loss. The accuracy loss between data partitions represents the degree of model divergence. As subsec:decentral_cause discusses, ML models in different data partitions tend to specialize for their training data, especially when we use decentralized learning algorithms to relax communication.
Figure 12 demonstrates the above observation by plotting the accuracy loss between different data partitions when training GoogLeNet over CIFAR-10. Two observations are in order. First, the accuracy loss changes drastically from the IID setting (0.4% on average) to the Non-IID setting (39.6% on average). This is expected, as each data partition sees very different training data in the Non-IID setting, which leads to very different models across data partitions. Second, more conservative hyper-parameters lead to a smaller accuracy loss in the Non-IID setting: the accuracy loss for tighter hyper-parameter settings is significantly smaller than for looser ones.
Based on the above observation, we can use accuracy loss (i) to estimate how much the models diverge from each other (reflecting training data differences); and (ii) to serve as an objective function for communication control. With accuracy loss, we do not need any domain-specific information from each ML application to learn and adapt to different degrees of deviation from IID, which makes our approach much more widely applicable.
Communication Control. The goal of communication control is to retain model quality while minimizing communication among data partitions. Specifically, given a set of hyper-parameters \theta_i for each iteration (or minibatch) i, the optimization problem is to minimize the total amount of communication for a data partition:

\min_{\{\theta_i\}} \; \sum_{i=1}^{T} C(\theta_i) + \frac{T}{p} \cdot M \quad (2)

where T is the total number of iterations to achieve the target model accuracy given all hyper-parameters throughout the training, C(\theta_i) is the amount of communication given \theta_i, p is the period size (in iterations) for model traveling, and M is the communication cost of the ML model (for model traveling).
In practice, however, it is impossible to optimize Equation 2 with one-pass training, because we cannot know T under different hyper-parameter choices unless we train the model multiple times. We solve this problem by optimizing a proxy problem, which aims to minimize communication while keeping the accuracy loss within a small threshold, so that we can control the model divergence caused by non-IID data partitions. Specifically, our target function for each tuning step is:

\min_{\theta_i} \; \alpha \cdot L(\hat{\theta}_i) + \beta \cdot C(\theta_i) \quad (3)

where L(\hat{\theta}_i) is the accuracy loss based on the previously selected hyper-parameter \hat{\theta}_i (we memoize the most recent value for each choice), and \alpha and \beta are given parameters that determine the weights of accuracy loss and communication, respectively. We can employ various auto-tuning algorithms with Equation 3 to select \theta_i, such as hill climbing, stochastic hill climbing russell2016artificial, and simulated annealing van1987simulated. Note that we do not make the model traveling period tunable here, to further simplify the tuning.
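As a concrete illustration, a plain hill-climbing tuner over an ordered list of candidate hyper-parameter settings might look as follows; the `measure` callback and the weighted-sum score are simplifications of the tuning loop described above, not the system's actual implementation:

```python
def hill_climb(candidates, measure, alpha=1.0, beta=1.0):
    """Greedy hill climbing over an ordered list of hyper-parameter
    settings (e.g., loosest communication first).

    `measure(theta)` is a placeholder returning a pair
    (accuracy_loss, communication_cost) observed when running a
    communication period with hyper-parameter theta. The score
    mirrors the proxy objective: a weighted sum of accuracy loss
    and communication.
    """
    def score(theta):
        loss, comm = measure(theta)
        return alpha * loss + beta * comm

    i = 0
    best = score(candidates[0])
    while i + 1 < len(candidates):
        nxt = score(candidates[i + 1])
        if nxt >= best:  # stop at a local minimum of the score
            break
        i, best = i + 1, nxt
    return candidates[i]
```

Stochastic hill climbing and simulated annealing differ only in occasionally accepting a worse-scoring neighbor, which helps escape local minima at the cost of extra measurement periods.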
Model Traveling Overhead. Using model traveling to learn accuracy loss can lead to heavy communication overhead if we need to do so for each pair of data partitions, especially when there is a large number of data partitions. For broadcast-based decentralized learning settings (e.g., geo-distributed learning), we leverage an overlay network to reduce the communication overhead of model traveling. Specifically, we use hubs to combine and broadcast models DBLP:conf/nsdi/HsiehHVKGGM17. The extra hops incurred are acceptable because model traveling is not latency sensitive. As for server-client decentralized learning settings (e.g., federated learning), the system only needs to control the communication frequency between the server and clients, and the overhead of model traveling can be folded into model downloading at the beginning of each communication round.
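To see why hubs help, consider a simple message-count model (ours, for illustration only): all-pairs model traveling needs one transfer per ordered pair of partitions, whereas a hub overlay needs each partition to send to and receive from its hub, plus hub-to-hub exchanges.

```python
def transfers_all_pairs(num_partitions):
    """Model transfers if every partition travels to every other."""
    return num_partitions * (num_partitions - 1)

def transfers_with_hubs(num_partitions, num_hubs):
    """Model transfers with a hub overlay: each partition uploads to
    its hub and later receives the combined broadcast (2 transfers
    per partition), and hubs exchange models pairwise among
    themselves.
    """
    return 2 * num_partitions + num_hubs * (num_hubs - 1)
```

With, say, 100 partitions and 4 hubs, the overlay replaces 9,900 pairwise transfers with a few hundred, at the cost of extra hops that model traveling can tolerate.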
7.3 Evaluation Results
We implement and evaluate our approach in a GPU parameter server system DBLP:conf/eurosys/CuiZGGX16 based on Caffe DBLP:journals/corr/JiaSDKLGGD14. We evaluate the aforementioned auto-tuning algorithms and find that hill climbing provides the best results. We compare against two baselines: (1) BSP, the most communication-heavy approach, which retains model quality in all Non-IID settings; and (2) Oracle, an ideal yet unrealistic approach that selects the most communication-efficient hyper-parameters that retain model quality, by running all possible settings in each scenario prior to the measured execution. Figure 13 shows the communication savings over BSP for both our approach and Oracle. Note that all results achieve the same validation accuracy as BSP. We make two main observations.
First, our approach is much more effective than BSP in handling Non-IID settings. Overall, it achieves 9.6×–34.1× communication savings over BSP in various Non-IID settings without sacrificing model accuracy. As expected, it saves more communication with less skewed data, because it can loosen communication in these settings (sec:skewness).
Second, our approach is not far from the ideal Oracle baseline. Overall, it requires only 1.1×–1.5× more communication than Oracle to achieve the same model accuracy. It cannot fully match the communication savings of Oracle because (i) it needs to perform model traveling periodically, which adds some overhead; and (ii) for some hyper-parameter settings, high accuracy loss at the beginning of training can still lead to a high-quality final model, which cannot be foreseen in a single pass. As Oracle requires many training runs in practice, we conclude that our approach is an effective, one-pass solution for decentralized learning over non-IID data partitions.
8 Related Work
To our knowledge, this is the first detailed study on the problem of non-IID data partitions for decentralized learning. Our study shows that this is a fundamental and pervasive problem, and we investigate various aspects of this problem, such as decentralized learning algorithms, batch normalization, and the degree of non-IID data, as well as present our approach. Here, we discuss related work.
Large-scale systems for centralized learning. There are many large-scale ML systems that aim to enable efficient ML training with centralized datasets (e.g., DBLP:conf/nips/RechtRWN11; LowGKBGH12; DBLP:conf/osdi/ChilimbiSAK14; DeanCMCDLMRSTYN12; HoCCLKGGGX13; CuiTWXDHHGGGX14; DBLP:conf/osdi/LiAPSAJLSS14; DBLP:journals/corr/ChenLLLWWXXZZ15; DBLP:journals/corr/GoyalDGNWKTJH17), where the data reside within a single data center. Some of them propose communication-efficient designs, such as relaxing synchronization requirements DBLP:conf/nips/RechtRWN11; HoCCLKGGGX13; DBLP:journals/corr/GoyalDGNWKTJH17 or sending fewer updates to parameter servers DBLP:conf/osdi/LiAPSAJLSS14; DBLP:conf/nips/LiASY14. These works assume the training data are centralized so they can be easily partitioned among the machines performing the training in an IID manner (e.g., by random shuffling). Hence, they are neither designed nor validated on non-IID data partitions.
Communication-efficient training for specific algorithms. A large body of prior work proposes ML training algorithms to reduce the dependency on intensive parameter updates to enable more efficient parallel training (e.g., DBLP:conf/nips/JaggiSTTKHJ14; DBLP:journals/jmlr/ZhangDW13; DBLP:conf/nips/ZinkevichWSL10; DBLP:conf/icml/TakacBRS13; DBLP:conf/uai/NeiswangerWX14; DBLP:conf/icml/ShamirS014; DBLP:conf/icml/ZhangL15a; DBLP:journals/jmlr/Shalev-Shwartz013). These works reduce communication overhead by proposing algorithm-specific approaches, such as solving a dual problem DBLP:journals/jmlr/Shalev-Shwartz013; DBLP:conf/icml/TakacBRS13; DBLP:conf/nips/JaggiSTTKHJ14 or employing a different optimization algorithm DBLP:conf/icml/RakhlinSS12; DBLP:conf/icml/ZhangL15a. The main drawback of these approaches is that they are not general and their applicability depends on the ML application. Besides, these works also assume centralized, IID data partitions. Thus, their effectiveness over decentralized, non-IID data partitions needs much more study.
Decentralized learning. As most training data are generated in a decentralized manner, prior work proposes communication-efficient algorithms (e.g., DBLP:conf/nsdi/HsiehHVKGGM17; DBLP:conf/aistats/McMahanMRHA17; DBLP:conf/ccs/ShokriS15; DBLP:conf/ICLR/LinHMWD18; DBLP:conf/icml/TangLYZL18) for ML training over decentralized datasets. However, the major focus of these works is to reduce communication overhead among data partitions, and they either (i) assume the data partitions are IID DBLP:conf/nsdi/HsiehHVKGGM17; DBLP:conf/ccs/ShokriS15; DBLP:conf/ICLR/LinHMWD18 or (ii) conduct only a limited study on non-IID data partitions DBLP:conf/aistats/McMahanMRHA17; DBLP:conf/icml/TangLYZL18. As our study shows, these decentralized learning algorithms lose significant model quality in the Non-IID setting (sec:overview). Some recent work DBLP:journals/corr/abs-1806-00582; DBLP:conf/nips/SmithCST17 investigates the problem of non-IID data partitions. For example, instead of training a global model to fit non-IID data partitions, federated multi-task learning DBLP:conf/nips/SmithCST17 proposes training local models for each data partition while leveraging other data partitions to improve model accuracy. However, this approach does not solve the problem for global models, which are essential when a local model is unavailable (e.g., a brand new partition without training data) or ineffective (e.g., a partition with too few training examples for a class). The closest to our work is Zhao et al.'s study DBLP:journals/corr/abs-1806-00582 of training over non-IID data, which shows significant accuracy loss in the Non-IID setting. While the result of their study aligns with our observations, our study (i) broadens the problem scope to a variety of decentralized learning algorithms, ML applications, DNN models, and datasets; (ii) explores the problem of batch normalization and possible solutions; and (iii) designs and evaluates our system-level approach.
9 Discussion: Regimes of Non-IID Data
Our study has focused on label-based partitioning of data, in which the distribution of labels varies across partitions. In this section, we present a broader taxonomy of regimes of non-IID data, as well as various possible strategies for dealing with non-IID data, the study of which is left to future work. We assume a general setting in which there may be many disjoint partitions, with each partition holding data collected from devices (mobile phones, video cameras, etc.) in a particular geographic region and time window.
Violations of Independence. Common ways in which data tend to deviate from being independently drawn from an overall distribution are:
Intra-partition correlation: If the data within a partition are processed in an insufficiently-random order, e.g., ordered by collection device and/or by time, then independence is violated. For example, consecutive frames in a video are highly correlated, even if the camera is moving.
Inter-partition correlation: Devices sharing a common feature can have correlated data across partitions. For example, neighboring geo-locations have the same diurnal effects (daylight, workday patterns), have correlated weather patterns (major storms), and can witness the same phenomena (eclipses).
Violations of Identicalness. Common ways in which data tend to deviate from being identically distributed are:
Quantity skew: Different partitions can hold vastly different amounts of data. For example, some partitions may collect data from fewer devices or from devices that produce less data.
Label distribution skew: Because partitions are tied to particular geo-regions, the distribution of labels varies across partitions. For example, kangaroos are only in Australia or zoos, and a person’s face is only in a few locations worldwide. The study in this paper focused on this setting.
Same label, different features: The same label can have very different “feature vectors” in different partitions, e.g., due to cultural differences, weather effects, standards of living, etc. For example, images of homes can vary dramatically around the world and items of clothing vary widely. Even within the U.S., images of parked cars in the winter will be snow-covered only in certain parts of the country. The same label can also look very different at different times, at different time scales: day vs. night, seasonal effects, natural disasters, fashion and design trends, etc.
Same features, different label: Because of personal preferences, the same feature vectors in a training data item can have different labels. For example, labels that reflect sentiment or next word predictors have personal/regional biases.
As noted in some of the above examples, non-IID-ness can occur over both time (often called concept drift) and space (geo-location).
Strategies for dealing with non-IID data. The above taxonomy of the many regimes of non-IID data partitions naturally leads to the question: what should the objective function of the DNN model be? In our study, we have focused on obtaining a global model that minimizes an objective function over the union of all the data. An alternative objective function might instead include some notion of "fairness" among the partitions in the final accuracy on their local data. There could also be different strategies for treating different non-IID regimes.
As noted in Section 8, multi-task learning approaches have been proposed for jointly training local models for each partition, but a global model is essential whenever a local model is unavailable or ineffective. A hybrid approach would be to train a “base” global model that can be quickly “specialized” to local data via a modest amount of further training on that local data. This approach would be useful for differences across space and time. For example, a global model trained under normal circumstances could be quickly adapted to natural disaster settings such as hurricanes, flash floods and forest fires.
As one proceeds down the path towards more local/specialized models, it may make sense to cluster partitions that hold similar data, with one model for each cluster. The goal is to avoid a proliferation of too many models that must be trained, stored, and maintained over time.
Finally, another alternative for handling non-IID data partitions is to use multi-modal training that combines DNNs with key attributes about the data partition pertaining to its geo-location. A challenge with this approach is determining what the attributes should be, in order to have an accurate yet reasonably compact model (otherwise, in the extreme, the model could devolve into local models for each geo-location).
10 Conclusion
As most timely and relevant ML data are generated at different places, decentralized learning provides an important path for ML applications to leverage such data. However, decentralized data are often generated in different contexts, which leads to a heavily understudied problem: non-IID training data partitions. We conduct a detailed empirical study of this problem, revealing three key findings. First, we show that training over non-IID data partitions is a fundamental and pervasive problem for decentralized learning, as all decentralized learning algorithms in our study suffer major accuracy loss in the Non-IID setting. Second, we find that DNNs with batch normalization are particularly vulnerable in the Non-IID setting, with even the most communication-heavy approach being unable to retain model quality. We further discuss the cause of and a potential solution to this problem. Third, we show that the difficulty of the non-IID data problem varies greatly with the degree of deviation from IID. Based on these findings, we present a system-level approach that minimizes communication while retaining model quality even under non-IID data. We hope that the findings and insights in this paper, as well as our open source code, will spur further research into the fundamental and important problem of non-IID data in decentralized learning.
Appendix A Details of Decentralized Learning Algorithms
This section presents pseudocode for the decentralized learning algorithms used in our study.
Appendix B Training Parameters
Model      | Minibatch size | Momentum | Weight decay | Learning rate schedule            | Total epochs
AlexNet    | 20             | 0.9      | 0.0005       | divided by 10 at epochs 64 and 96 | 128
GoogLeNet  | 20             | 0.9      | 0.0005       | divided by 10 at epochs 64 and 96 | 128
GN-LeNet   | 20             | 0.9      | 0.0005       | divided by 10 at epochs 64 and 96 | 128
ResNet-20  | 20             | 0.9      | 0.0005       | divided by 10 at epochs 64 and 96 | 128

Model      | Learning rate schedule | Total epochs
           | decay, power = 0.5     | 60
           | decay, power = 1       | 64

Model       | Minibatch size | Momentum | Weight decay | Learning rate schedule          | Total epochs
center-loss | 64             | 0.9      | 0.0005       | divided by 10 at epochs 4 and 6 | 7
Appendix C More Algorithm Hyper-Parameter Results
subsec:decentral_parameter presents hyper-parameter sensitivity results on two DNNs. Here, we expand the sensitivity study to the remaining algorithms. We make the same observation as in subsec:decentral_parameter for these algorithms. The results are shown in Tables 6, 7, and 8.