Large Batch Training of Convolutional Networks

08/13/2017
by   Yang You, et al.
0

A common way to speed up training of large convolutional networks is to add computational units. Training is then performed using data-parallel synchronous Stochastic Gradient Descent (SGD) with mini-batch divided between computational units. With an increase in the number of nodes, the batch size grows. But training with large batch size often results in the lower model accuracy. We argue that the current recipe for large batch training (linear learning rate scaling with warm-up) is not general enough and training may diverge. To overcome this optimization difficulties we propose a new training algorithm based on Layer-wise Adaptive Rate Scaling (LARS). Using LARS, we scaled Alexnet up to a batch size of 8K, and Resnet-50 to a batch size of 32K without loss in accuracy.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/01/2017

Don't Decay the Learning Rate, Increase the Batch Size

It is common practice to decay the learning rate. Here we show one can u...
research
12/07/2018

Nonlinear Conjugate Gradients For Scaling Synchronous Distributed DNN Training

Nonlinear conjugate gradient (NLCG) based optimizers have shown superior...
research
07/09/2020

AdaScale SGD: A User-Friendly Algorithm for Distributed Training

When using large-batch training to speed up stochastic gradient descent,...
research
11/27/2020

Improving Layer-wise Adaptive Rate Methods using Trust Ratio Clipping

Training neural networks with large batch is of fundamental significance...
research
07/26/2019

Anonymity Mixes as (Partial) Assembly Queues: Modeling and Analysis

Anonymity platforms route the traffic over a network of special routers ...
research
10/21/2022

A New Perspective for Understanding Generalization Gap of Deep Neural Networks Trained with Large Batch Sizes

Deep neural networks (DNNs) are typically optimized using various forms ...
research
10/06/2020

A Closer Look at Codistillation for Distributed Training

Codistillation has been proposed as a mechanism to share knowledge among...

Please sign up or login with your details

Forgot password? Click here to reset