Distributed Training with Heterogeneous Data: Bridging Median and Mean Based Algorithms

06/04/2019
by Xiangyi Chen, et al.

Recently, there has been growing interest in the study of median-based algorithms for distributed non-convex optimization. Two prominent such algorithms are signSGD with majority vote, an effective approach for communication reduction via 1-bit compression of the local gradients, and medianSGD, an algorithm recently proposed to ensure robustness against Byzantine workers. The convergence analyses of these algorithms critically rely on the assumption that all the distributed data are drawn iid from the same distribution. However, in applications such as Federated Learning, the data across different nodes or machines can be inherently heterogeneous, which violates this iid assumption. This work analyzes signSGD and medianSGD in distributed settings with heterogeneous data. We show that these algorithms are non-convergent whenever there is some disparity between the expected median and mean of the local gradients. To overcome this gap, we provide a novel gradient correction mechanism that perturbs the local gradients with noise, together with a series of results that provably close the gap between the mean and median of the gradients. The proposed methods largely preserve the desirable properties of the original algorithms, such as the low per-iteration communication complexity of signSGD, and further enjoy global convergence to stationary solutions. Our perturbation technique may be of independent interest whenever one wishes to estimate the mean through a median estimator.
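
As a rough illustration of the perturbation idea described above, the Python sketch below (an assumption-laden toy, not the authors' algorithm; the names signsgd_majority_vote and noise_scale and the choice of uniform noise are made up for this example) simulates three workers whose gradient median and mean disagree in sign: without noise, signSGD's majority vote follows the median, while a wide zero-mean perturbation makes the expected vote track the sign of the mean.

    # Minimal illustrative sketch (not the paper's implementation): adding
    # zero-mean noise to each worker's gradient before a sign/median-style
    # aggregation can pull the aggregate toward the mean of the local gradients.
    import numpy as np

    rng = np.random.default_rng(0)

    def signsgd_majority_vote(local_grads, noise_scale=0.0):
        # Each worker sends the sign of its (optionally perturbed) gradient
        # (1-bit compression); the server takes an element-wise majority vote.
        noise = rng.uniform(-noise_scale, noise_scale, size=local_grads.shape)
        votes = np.sign(local_grads + noise)
        return np.sign(votes.sum(axis=0))

    # Heterogeneous toy data: the median (-1) and the mean (+1) of the local
    # gradients disagree in sign, so the plain vote follows the "wrong" sign.
    local_grads = np.array([[-1.0], [-1.0], [5.0]])
    print(signsgd_majority_vote(local_grads))        # [-1.]  follows the median
    repeated = [signsgd_majority_vote(local_grads, noise_scale=10.0)
                for _ in range(2000)]
    print(np.sign(np.mean(repeated, axis=0)))        # [1.]   tracks the mean on average

With uniform noise of half-width b much larger than the gradients, each worker votes positive with probability roughly 1/2 + g_i/(2b), so the expected vote is proportional to the mean of the local gradients rather than their median; this is only a heuristic sketch of the effect, not the paper's analysis.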

