DBS: Dynamic Batch Size For Distributed Deep Neural Network Training

07/23/2020
by Qing Ye, et al.

Synchronous strategies with data parallelism, such as Synchronous Stochastic Gradient Descent (S-SGD) and model averaging methods, are widely used in the distributed training of Deep Neural Networks (DNNs), largely owing to their easy implementation and promising performance. In these strategies, each worker in the cluster hosts a copy of the DNN and an evenly divided share of the dataset with a fixed mini-batch size, so that the training of the DNN converges. Because of synchronization and network-transmission delays, workers with different computational capabilities must wait for one another, which inevitably causes the high-performance workers to waste computation; consequently, the utilization of the cluster is relatively low. To alleviate this issue, we propose the Dynamic Batch Size (DBS) strategy for the distributed training of DNNs. Specifically, the performance of each worker is first evaluated based on its actual behavior in the previous epoch, and then the batch size and dataset partition are dynamically adjusted according to the worker's current performance, thereby improving the utilization of the cluster. To verify the effectiveness of the proposed strategy, extensive experiments have been conducted, and the results indicate that the proposed strategy can fully utilize the performance of the cluster, reduce the training time, and remain robust under disturbance from irrelevant tasks. Furthermore, a rigorous theoretical analysis is provided to prove the convergence of the proposed strategy.
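The core mechanism described in the abstract, measuring each worker's throughput in the previous epoch and then redistributing the global batch (and the corresponding dataset share) in proportion to it, can be sketched in a few lines. The snippet below is a minimal illustration under our own assumptions, not the authors' reference implementation; the function and variable names (rebalance_batch_sizes, samples_processed, epoch_times) are hypothetical.

from typing import List


def rebalance_batch_sizes(global_batch_size: int,
                          samples_processed: List[int],
                          epoch_times: List[float]) -> List[int]:
    """Assign each worker a share of the global batch size proportional to
    its throughput (samples/second) observed in the previous epoch, so that
    faster workers receive more samples and all workers finish an epoch at
    roughly the same time."""
    throughputs = [n / t for n, t in zip(samples_processed, epoch_times)]
    total = sum(throughputs)
    # Proportional split, rounded down; the remainder goes to the fastest
    # worker so the global batch size (and the learning dynamics tied to it)
    # is preserved exactly.
    sizes = [int(global_batch_size * tp / total) for tp in throughputs]
    sizes[throughputs.index(max(throughputs))] += global_batch_size - sum(sizes)
    return sizes


if __name__ == "__main__":
    # Hypothetical example: three workers, the third slowed down by an
    # unrelated task during the previous epoch.
    samples_processed = [85, 85, 86]   # samples handled per worker last epoch
    epoch_times = [1.0, 1.0, 2.0]      # seconds per epoch (worker 3 is 2x slower)
    print(rebalance_batch_sizes(256, samples_processed, epoch_times))
    # prints [103, 102, 51]: the slow worker's share shrinks, the total stays 256

In an actual distributed setup, the same proportions would also drive how the dataset is re-partitioned for the next epoch, which is the second adjustment the abstract mentions.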


Related research

09/06/2020 · PSO-PS: Parameter Synchronization with Particle Swarm Optimization for Distributed Training of Deep Neural Networks
Parameter updating is an important stage in parallelism-based distribute...

09/06/2020 · HLSGD: Hierarchical Local SGD With Stale Gradients Featuring
While distributed training significantly speeds up the training process ...

01/04/2019 · A Scalable Framework for Acceleration of CNN Training on Deeply-Pipelined FPGA Clusters with Weight and Workload Balancing
Deep Neural Networks (DNNs) have revolutionized numerous applications, b...

06/07/2018 · Fast Distributed Deep Learning via Worker-adaptive Batch Sizing
Deep neural network models are usually trained in cluster environments, ...

10/23/2021 · Scalable Smartphone Cluster for Deep Learning
Various deep learning applications on smartphones have been rapidly risi...

12/12/2017 · Integrated Model and Data Parallelism in Training Neural Networks
We propose a new integrated method of exploiting both model and data par...

12/12/2017 · Integrated Model, Batch and Domain Parallelism in Training Neural Networks
We propose a new integrated method of exploiting model, batch and domain...
