Oscars: Adaptive Semi-Synchronous Parallel Model for Distributed Deep Learning with Global View

02/17/2021
by   Sheng Huang, et al.

Deep learning has become an indispensable part of daily life, powering applications such as face recognition and natural language processing, but training deep models remains a challenge, and in recent years the complexity of training data and models has grown explosively, so training has gradually shifted to distributed settings. The classical synchronous strategy guarantees accuracy, but its frequent communication slows training; the asynchronous strategy trains faster but cannot guarantee accuracy. Moreover, neither works efficiently on heterogeneous clusters: on the one hand they waste resources, and on the other hand frequent communication further slows training. This paper therefore proposes a semi-synchronous training strategy based on local SGD that improves the utilization of heterogeneous cluster resources and reduces communication overhead, accelerating training while preserving model accuracy.
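The core idea behind local SGD, on which the proposed strategy builds, is that each worker takes several local gradient steps between global parameter averages, so communication happens once per synchronization period rather than once per step. A minimal sketch of that pattern is below; it is an illustration under assumptions, not the paper's Oscars algorithm (the function name `local_sgd` and the fixed `sync_period` are invented here, whereas Oscars additionally adapts synchronization to heterogeneous worker speeds):

```python
import numpy as np

def local_sgd(num_workers, steps, sync_period, lr, grad_fn, w0):
    """Simulate local SGD: workers step independently, averaging every sync_period steps."""
    # Each worker holds its own copy of the model parameters.
    params = [w0.copy() for _ in range(num_workers)]
    for t in range(1, steps + 1):
        # Local phase: every worker updates with its own gradient, no communication.
        for k in range(num_workers):
            params[k] -= lr * grad_fn(params[k], k)
        # Synchronization phase: average all replicas (one communication round).
        if t % sync_period == 0:
            avg = np.mean(params, axis=0)
            params = [avg.copy() for _ in range(num_workers)]
    return np.mean(params, axis=0)

# Toy usage: each worker k minimizes (w - c_k)^2; the consensus optimum is mean(c_k).
targets = np.array([1.0, 2.0, 3.0, 4.0])
w = local_sgd(num_workers=4, steps=200, sync_period=5, lr=0.1,
              grad_fn=lambda w, k: 2.0 * (w - targets[k]),
              w0=np.array([0.0]))
```

With a larger `sync_period`, communication rounds become rarer (faster wall-clock training) at the cost of replicas drifting apart between averages, which is exactly the accuracy/speed trade-off the abstract describes.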


Related research

12/10/2020 — A Mechanism for Distributed Deep Learning Communication Optimization
Intensive communication and synchronization cost for gradients and param...

07/16/2023 — Accelerating Distributed ML Training via Selective Synchronization
In distributed training, deep neural networks (DNNs) are launched over m...

03/12/2019 — A Distributed Hierarchical SGD Algorithm with Sparse Global Reduction
Reducing communication overhead is a big challenge for large-scale distr...

11/19/2015 — SparkNet: Training Deep Networks in Spark
Training deep networks is a time-consuming process, with networks for ob...

08/16/2019 — Dynamic Stale Synchronous Parallel Distributed Training for Deep Learning
Deep learning is a popular machine learning technique and has been appli...

10/23/2021 — Scalable Smartphone Cluster for Deep Learning
Various deep learning applications on smartphones have been rapidly risi...

11/16/2021 — Task allocation for decentralized training in heterogeneous environment
The demand for large-scale deep learning is increasing, and distributed ...
