Training Aware Sigmoidal Optimizer

02/17/2021
by David Macêdo, et al.

Proper optimization of deep neural networks is an open research question, since an optimal procedure for changing the learning rate throughout training is still unknown. Manually defining a learning rate schedule involves troublesome, time-consuming trial-and-error procedures to determine hyperparameters such as learning rate decay epochs and learning rate decay rates. Although adaptive learning rate optimizers automate this process, recent studies suggest they may produce overfitting and reduce performance compared to fine-tuned learning rate schedules. Considering that the loss landscapes of deep neural networks contain many more saddle points than local minima, we propose the Training Aware Sigmoidal Optimizer (TASO), which consists of a two-phase automated learning rate schedule. The first phase uses a high learning rate to quickly traverse the numerous saddle points, while the second phase uses a low learning rate to slowly approach the center of the local minimum previously found. We compared the proposed approach with commonly used adaptive learning rate optimizers such as Adam, RMSProp, and Adagrad. Our experiments showed that TASO outperformed all competing methods in both optimal (i.e., performing hyperparameter validation) and suboptimal (i.e., using default hyperparameters) scenarios.

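A minimal sketch of what such a two-phase sigmoidal schedule could look like is shown below. The abstract does not give the exact TASO formula, so the function name `sigmoidal_lr` and the `lr_high`, `lr_low`, `midpoint`, and `steepness` parameters are illustrative assumptions rather than the paper's definitions.

```python
import math


def sigmoidal_lr(step, total_steps, lr_high=0.1, lr_low=1e-3,
                 midpoint=0.5, steepness=12.0):
    """Illustrative two-phase sigmoidal learning rate schedule.

    Early in training the rate stays near lr_high (fast traversal of
    saddle points); late in training it decays toward lr_low (slow
    approach to the local minimum). Parameter names and defaults are
    assumptions for illustration, not values from the paper.
    """
    progress = step / max(1, total_steps)  # training progress in [0, 1]
    # Logistic gate that moves smoothly from ~1 to ~0 around `midpoint`.
    gate = 1.0 / (1.0 + math.exp(steepness * (progress - midpoint)))
    return lr_low + (lr_high - lr_low) * gate


# Example: sample the schedule at a few points of a 100-epoch run.
for epoch in (0, 25, 50, 75, 100):
    print(epoch, round(sigmoidal_lr(epoch, 100), 4))
```

In a real training loop, a function like this would set the optimizer's learning rate at each epoch or step, for example by rescaling it into the multiplicative factor expected by PyTorch's `LambdaLR` scheduler.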
