Large Batch Training Does Not Need Warmup

02/04/2020
by Zhouyuan Huo, et al.

Training deep neural networks with a large batch size has shown promising results and benefits many real-world applications. However, the optimizer converges slowly at early epochs, and there is a gap between large-batch optimization heuristics and their theoretical underpinnings. In this paper, we propose a novel Complete Layer-wise Adaptive Rate Scaling (CLARS) algorithm for large-batch training. We also analyze the convergence rate of the proposed method by introducing a new fine-grained analysis of gradient-based methods. Based on this analysis, we bridge the gap and provide theoretical insights into three popular large-batch training techniques: linear learning rate scaling, gradual warmup, and layer-wise adaptive rate scaling. Extensive experiments demonstrate that the proposed algorithm outperforms the gradual warmup technique by a large margin and converges faster than the state-of-the-art large-batch optimizer when training advanced deep neural networks (ResNet, DenseNet, MobileNet) on the ImageNet dataset.
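To make the techniques named in the abstract concrete, below is a minimal NumPy sketch of a gradual warmup schedule and a LARS-style layer-wise adaptive rate scaling step. It is illustrative only, not the paper's CLARS algorithm, and the hyperparameter names and values (base_lr, weight_decay, trust_coef, warmup_steps) are assumptions made for the example.

```python
import numpy as np

def warmup_lr(step, warmup_steps, base_lr):
    """Gradual warmup: ramp the learning rate linearly from near zero to base_lr."""
    return base_lr * min(1.0, (step + 1) / warmup_steps)

def lars_sgd_step(params, grads, base_lr=0.1, weight_decay=5e-4, trust_coef=0.001):
    """One SGD step with layer-wise adaptive rate scaling (LARS-style sketch)."""
    updated = []
    for w, g in zip(params, grads):
        w_norm = np.linalg.norm(w)
        g_norm = np.linalg.norm(g)
        # Trust ratio: rescales the global learning rate per layer so the
        # update magnitude stays proportional to the layer's weight norm.
        if w_norm > 0 and g_norm > 0:
            trust = trust_coef * w_norm / (g_norm + weight_decay * w_norm)
        else:
            trust = 1.0
        updated.append(w - base_lr * trust * (g + weight_decay * w))
    return updated

# Toy usage: two "layers" of random weights and gradients, with warmup at step 0.
rng = np.random.default_rng(0)
params = [rng.normal(size=(4, 4)), rng.normal(size=(4,))]
grads = [rng.normal(size=(4, 4)), rng.normal(size=(4,))]
params = lars_sgd_step(params, grads,
                       base_lr=warmup_lr(step=0, warmup_steps=5, base_lr=0.1))
```

Under linear learning rate scaling, base_lr would additionally be multiplied by the ratio of the large batch size to a reference batch size; the paper argues that its CLARS algorithm removes the need for the warmup phase sketched above.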

