Training Aware Sigmoidal Optimizer

by David Macêdo, et al.

Proper optimization of deep neural networks is an open research question because an optimal procedure for changing the learning rate throughout training is still unknown. Manually defining a learning rate schedule involves troublesome, time-consuming trial-and-error procedures to determine hyperparameters such as learning rate decay epochs and learning rate decay rates. Although adaptive learning rate optimizers automate this process, recent studies suggest they may produce overfitting and reduce performance compared to fine-tuned learning rate schedules. Considering that the loss functions of deep neural networks present landscapes with many more saddle points than local minima, we propose the Training Aware Sigmoidal Optimizer (TASO), which consists of a two-phase automated learning rate schedule. The first phase uses a high learning rate to quickly traverse the numerous saddle points, while the second phase uses a low learning rate to slowly approach the center of the local minimum previously found. We compared the proposed approach with commonly used adaptive learning rate optimizers such as Adam, RMSProp, and Adagrad. Our experiments showed that TASO outperformed all competing methods in both optimal (i.e., performing hyperparameter validation) and suboptimal (i.e., using default hyperparameters) scenarios.
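
The two-phase behaviour described in the abstract can be illustrated with a short sketch. The abstract does not state the actual TASO formula, so the function below is only one plausible sigmoid-shaped schedule, not the authors' method. The names sigmoidal_lr, lr_high, lr_low, midpoint_frac, and steepness, along with their default values, are hypothetical and chosen purely for illustration.

```python
import math


def sigmoidal_lr(epoch, total_epochs, lr_high=0.1, lr_low=0.001,
                 midpoint_frac=0.5, steepness=10.0):
    """Illustrative sigmoid-shaped learning rate schedule (assumed form).

    Early epochs stay close to lr_high (fast traversal phase); late epochs
    stay close to lr_low (slow approach phase). All parameter names and
    defaults are assumptions, not values taken from the paper.
    """
    # Fraction of training completed, in [0, 1].
    progress = epoch / total_epochs
    # Logistic gate: near 1 before the midpoint, near 0 after it.
    gate = 1.0 / (1.0 + math.exp(steepness * (progress - midpoint_frac)))
    # Blend between the low and high learning rates.
    return lr_low + (lr_high - lr_low) * gate


if __name__ == "__main__":
    # Print the schedule at a few points of a hypothetical 100-epoch run.
    for epoch in (0, 25, 50, 75, 100):
        print(epoch, sigmoidal_lr(epoch, total_epochs=100))
```

In this sketch, steepness controls how abruptly the schedule switches between the two phases and midpoint_frac controls when the switch happens. A larger steepness approaches a single step decay, while a smaller one gives a gradual transition.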



