The Non-IID Data Quagmire of Decentralized Machine Learning

by Kevin Hsieh, et al.
Carnegie Mellon University

Many large-scale machine learning (ML) applications need to train ML models over decentralized datasets that are generated at different devices and locations. These decentralized datasets pose a fundamental challenge to ML because they are typically generated in very different contexts, which leads to significant differences in data distribution across devices/locations (i.e., they are not independent and identically distributed (IID)). In this work, we take a step toward better understanding this challenge, by presenting the first detailed experimental study of the impact of such non-IID data on the decentralized training of deep neural networks (DNNs). Our study shows that: (i) the problem of non-IID data partitions is fundamental and pervasive, as it exists in all ML applications, DNN models, training datasets, and decentralized learning algorithms in our study; (ii) this problem is particularly difficult for DNN models with batch normalization layers; and (iii) the degree of deviation from IID (the skewness) is a key determinant of the difficulty level of the problem. With these findings in mind, we present SkewScout, a system-level approach that adapts the communication frequency of decentralized learning algorithms to the (skew-induced) accuracy loss between data partitions. We also show that group normalization can recover much of the skew-induced accuracy loss of batch normalization.
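
The abstract treats skewness, the degree of deviation from IID, as the key knob controlling how hard decentralized training becomes. A common way to simulate this in experiments is to assign some fraction of the examples to partitions by label, so each partition sees only a few classes, and to spread the rest uniformly at random. The sketch below assumes that label-based scheme. The names `skewed_partition`, `label_histogram`, and the `skew` parameter are hypothetical illustrations. They are not the paper's code and not part of SkewScout.

```python
import random
from collections import Counter


def skewed_partition(labels, num_partitions, skew, seed=0):
    """Split example indices into `num_partitions` partitions (a sketch).

    Assumption (hypothetical scheme): a fraction `skew` of the examples is
    sorted by label and cut into contiguous chunks, so each partition gets
    only a few labels (non-IID). The remaining examples are shuffled and
    dealt round-robin (IID). skew=0.0 is an IID split; skew=1.0 is fully
    label-sorted.
    """
    if not 0.0 <= skew <= 1.0:
        raise ValueError("skew must be in [0, 1]")
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)
    num_skewed = int(round(skew * len(indices)))
    skewed, iid = indices[:num_skewed], indices[num_skewed:]

    partitions = [[] for _ in range(num_partitions)]
    # Non-IID portion: sort by label, then give each partition one contiguous chunk.
    skewed.sort(key=lambda i: labels[i])
    chunk = -(-len(skewed) // num_partitions)  # ceiling division
    for p in range(num_partitions):
        partitions[p].extend(skewed[p * chunk:(p + 1) * chunk])
    # IID portion: deal the shuffled remainder round-robin across partitions.
    for k, i in enumerate(iid):
        partitions[k % num_partitions].append(i)
    return partitions


def label_histogram(labels, partition):
    """Count how many examples of each label a partition holds."""
    return Counter(labels[i] for i in partition)


if __name__ == "__main__":
    labels = [i % 10 for i in range(1000)]  # 10 classes, 100 examples each
    parts = skewed_partition(labels, num_partitions=5, skew=1.0)
    print(sorted(label_histogram(labels, parts[0]).items()))
```

With `skew=1.0`, the first partition holds only labels 0 and 1 (100 examples each), so the script prints `[(0, 100), (1, 100)]`. With `skew=0.0`, every partition instead receives a roughly uniform mix of all ten labels. Sweeping `skew` between these extremes gives a simple way to vary the degree of non-IID-ness that the study identifies as a key determinant of accuracy loss.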
