Distributed Training of Deep Learning Models: A Taxonomic Perspective

07/08/2020
by Matthias Langer, et al.

Distributed deep learning systems (DDLS) train deep neural network models by utilizing the distributed resources of a cluster. Developers of DDLS must make many decisions to process their particular workloads efficiently in their chosen environment. The advent of GPU-based deep learning and the ever-increasing size of datasets and deep neural network models, combined with the bandwidth constraints of cluster environments, force developers of DDLS to be innovative in order to train high-quality models quickly. Comparing DDLS side by side is difficult due to their extensive feature lists and architectural deviations. We aim to shed light on the fundamental principles at work when training deep neural networks on a cluster of independent machines by analyzing the general properties of deep learning workloads and how such workloads can be distributed across a cluster to achieve collaborative model training. We thereby provide an overview of the techniques used by contemporary DDLS and discuss their influence on, and implications for, the training process. To conceptualize and compare DDLS, we group the different techniques into categories, thus establishing a taxonomy of distributed deep learning systems.
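One of the most widely used techniques covered by systems of this kind is synchronous data parallelism: each worker holds a full model replica, processes a different shard of the data, and gradients are all-reduced before every optimizer step so the replicas stay identical. The sketch below is a minimal illustration of that idea using PyTorch's DistributedDataParallel; it is not taken from the paper, and the model, data, and hyperparameters are placeholders. It assumes launch via `torchrun --nproc_per_node=N train.py`, which sets the rank and world-size environment variables.

```python
# Minimal sketch of synchronous data-parallel training (illustrative only).
# Assumes launch with: torchrun --nproc_per_node=N train.py
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # One process per device; torchrun provides RANK/WORLD_SIZE/MASTER_ADDR.
    dist.init_process_group(backend="gloo")  # use "nccl" on GPU clusters
    rank = dist.get_rank()

    model = torch.nn.Linear(10, 1)        # placeholder model
    ddp_model = DDP(model)                # wraps the replica for gradient sync
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()

    for step in range(100):
        # Each worker would consume a different data shard; random here.
        inputs = torch.randn(32, 10)
        targets = torch.randn(32, 1)

        optimizer.zero_grad()
        loss = loss_fn(ddp_model(inputs), targets)
        loss.backward()                   # gradients are all-reduced here
        optimizer.step()                  # all replicas apply the same update

    if rank == 0:
        print("training finished, final loss:", loss.item())
    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

A real DDLS layers much more on top of this pattern (sharded data loading, fault tolerance, topology-aware communication, and asynchronous or compressed update schemes), which is precisely the design space the taxonomy organizes.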

