A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay

03/26/2018
by Leslie N. Smith
U.S. Navy

Although deep learning has produced dazzling successes for applications of image, speech, and video processing in the past few years, most training is still done with suboptimal hyper-parameters, requiring unnecessarily long training times. Setting the hyper-parameters remains a black art that requires years of experience to acquire. This report proposes several efficient ways to set the hyper-parameters that significantly reduce training time and improve performance. Specifically, this report shows how to examine the validation/test loss during training for subtle clues of underfitting and overfitting, and suggests guidelines for moving toward the optimal balance point. It then discusses how to increase/decrease the learning rate/momentum to speed up training. Our experiments show that it is crucial to balance every manner of regularization for each dataset and architecture. Weight decay is used as a sample regularizer to show how its optimal value is tightly coupled with the learning rate and momentum.
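The schedule the report advocates is a single learning-rate cycle with the momentum cycled inversely (the "1cycle" policy). Below is a minimal PyTorch sketch of that idea; the specific bounds (0.01-0.1 for the learning rate, 0.85-0.95 for momentum), the toy model and data, and the helper name one_cycle_values are illustrative assumptions, not values taken from the paper.

```python
import torch

def one_cycle_values(step, total_steps, lr_min=0.01, lr_max=0.1,
                     mom_min=0.85, mom_max=0.95):
    """Return (lr, momentum) for one triangular cycle: the learning rate
    rises then falls, while momentum moves in the opposite direction."""
    half = total_steps // 2
    frac = step / half if step < half else (total_steps - step) / half
    lr = lr_min + frac * (lr_max - lr_min)
    momentum = mom_max - frac * (mom_max - mom_min)
    return lr, momentum

# Toy model and data, only to make the loop runnable.
model = torch.nn.Linear(10, 2)
x, y = torch.randn(64, 10), torch.randint(0, 2, (64,))
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.95, weight_decay=1e-4)

total_steps = 200
for step in range(total_steps):
    lr, momentum = one_cycle_values(step, total_steps)
    for group in optimizer.param_groups:   # apply the schedule in-place
        group["lr"] = lr
        group["momentum"] = momentum
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```

PyTorch also ships this schedule as torch.optim.lr_scheduler.OneCycleLR, which cycles momentum inversely by default; the manual loop above just makes the mechanics explicit. Note that the weight_decay value here is a placeholder: as the abstract states, its optimal value must be re-tuned whenever the learning rate and momentum change.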


Related Research

11/01/2017
Don't Decay the Learning Rate, Increase the Batch Size
It is common practice to decay the learning rate. Here we show one can u...

10/16/2019
An Exponential Learning Rate Schedule for Deep Learning
Intriguing empirical evidence exists that deep learning can work well wi...

09/30/2022
Adaptive Weight Decay: On The Fly Weight Decay Tuning for Improving Robustness
We introduce adaptive weight decay, which automatically tunes the hyper-...

08/03/2022
Empirical Study of Overfitting in Deep FNN Prediction Models for Breast Cancer Metastasis
Overfitting is defined as the fact that the current model fits a specifi...

04/02/2022
AdaSmooth: An Adaptive Learning Rate Method based on Effective Ratio
It is well known that we need to choose the hyper-parameters in Momentum...

06/15/2020
Spherical Motion Dynamics of Deep Neural Networks with Batch Normalization and Weight Decay
We comprehensively reveal the learning dynamics of deep neural networks ...

08/30/2021
Tune It or Don't Use It: Benchmarking Data-Efficient Image Classification
Data-efficient image classification using deep neural networks in settin...

Code Repositories

Deep-Learning

A few notebooks about deep learning in PyTorch.


cyclical-learning-activations-pytorch

PyTorch implementation of cyclical learning rates, along with custom activation functions and easy testing and logging of multiple models.

