Gradient Scheduling with Global Momentum for Non-IID Data Distributed Asynchronous Training

02/21/2019
by   Chengjie Li, et al.
0

Distributed asynchronous offline training has received widespread attention in recent years because of its high performance on large-scale data and complex models. As data are processed from cloud-centric positions to edge locations, a big challenge for distributed systems is how to handle native and natural non-independent and identically distributed (non-IID) data for training. Previous asynchronous training methods do not have a satisfying performance on non-IID data because it would result in that the training process fluctuates greatly which leads to an abnormal convergence. We propose a gradient scheduling algorithm with global momentum (GSGM) for non-IID data distributed asynchronous training. Our key idea is to schedule the gradients contributed by computing nodes based on a white list so that each training node's update frequency remains even. Furthermore, our new momentum method can solve the biased gradient problem. GSGM can make model converge effectively, and maintain high availability eventually. Experimental results show that for non-IID data training under the same experimental conditions, GSGM on popular optimization algorithms can achieve an 20 improvement in accuracy on Fashion-Mnist and CIFAR-10 datasets. Meanwhile, when expanding distributed scale on CIFAR-100 dataset that results in sparse data distribution, GSGM can perform an 37 Moreover, only GSGM can converge well when the number of computing nodes is 30, compared to the state-of-the-art distributed asynchronous algorithms.

READ FULL TEXT

page 2

page 3

page 4

page 5

page 6

page 7

page 10

page 11

research
07/26/2019

Taming Momentum in a Distributed Asynchronous Environment

Although distributed computing can significantly reduce the training tim...
research
09/24/2019

Gap Aware Mitigation of Gradient Staleness

Cloud computing is becoming increasingly popular as a platform for distr...
research
10/24/2019

Gradient Sparification for Asynchronous Distributed Training

Modern large scale machine learning applications require stochastic opti...
research
05/22/2018

Gradient Energy Matching for Distributed Asynchronous Gradient Descent

Distributed asynchronous SGD has become widely used for deep learning in...
research
07/16/2014

Online Asynchronous Distributed Regression

Distributed computing offers a high degree of flexibility to accommodate...
research
05/20/2020

Consensus Driven Learning

As the complexity of our neural network models grow, so too do the data ...
research
01/15/2021

CPU Scheduling in Data Centers Using Asynchronous Finite-Time Distributed Coordination Mechanisms

We propose an asynchronous iterative scheme which allows a set of interc...

Please sign up or login with your details

Forgot password? Click here to reset