Distributed Training Large-Scale Deep Architectures

08/10/2017
by Shang-Xuan Zou, et al.

Scale of data and scale of computation infrastructures together enable the current deep learning renaissance. However, training large-scale deep architectures demands both algorithmic improvement and careful system configuration. In this paper, we focus on employing the system approach to speed up large-scale training. Via lessons learned from our routine benchmarking effort, we first identify bottlenecks and overheads that hinder data parallelism. We then devise guidelines that help practitioners configure an effective system and fine-tune parameters to achieve the desired speedup. Specifically, we develop a procedure for setting minibatch size and choosing computation algorithms. We also derive lemmas for determining the quantity of key components such as the number of GPUs and parameter servers. Experiments and examples show that these guidelines help effectively speed up large-scale deep learning training.

research
10/16/2021

Hydra: A System for Large Multi-Model Deep Learning

In many deep learning (DL) applications, the desire for ever higher accu...
research
02/20/2021

GIST: Distributed Training for Large-Scale Graph Convolutional Networks

The graph convolutional network (GCN) is a go-to solution for machine le...
research
05/05/2022

dPRO: A Generic Profiling and Optimization System for Expediting Distributed DNN Training

Distributed training using multiple devices (e.g., GPUs) has been widely...
research
08/02/2016

Horn: A System for Parallel Training and Regularizing of Large-Scale Neural Networks

I introduce a new distributed system for effective training and regulari...
research
05/19/2020

Sparsity-based audio declipping methods: overview, new algorithms, and large-scale evaluation

Recent advances in audio declipping have substantially improved the stat...
research
10/01/2019

Training Kinetics in 15 Minutes: Large-scale Distributed Training on Videos

Deep video recognition is more computationally expensive than image reco...
research
10/09/2015

Large-scale Artificial Neural Network: MapReduce-based Deep Learning

Faced with continuously increasing scale of data, original back-propagat...
