AdaSmooth: An Adaptive Learning Rate Method based on Effective Ratio

04/02/2022
by   Jun Lu, et al.

It is well known that we need to choose hyper-parameters in Momentum, AdaGrad, AdaDelta, and other stochastic optimizers. In many cases, these hyper-parameters are tuned tediously by trial and error, making the process more of an art than a science. We present a novel per-dimension learning rate method for gradient descent called AdaSmooth. The method is insensitive to its hyper-parameters and thus requires no manual tuning, unlike the Momentum, AdaGrad, and AdaDelta methods. We show promising results compared to other methods on different convolutional neural networks, multi-layer perceptrons, and other machine learning tasks. Empirical results demonstrate that AdaSmooth works well in practice and compares favorably to other stochastic optimization methods in neural networks.
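For intuition, the sketch below shows how a per-dimension, effective-ratio-driven smoothing scheme can be wired into an SGD-style update. It is a minimal illustration, not the paper's exact AdaSmooth formulation: the class name, the fast/slow smoothing constants, the window length, and the precise way the ratio scales the squared-gradient accumulator are all assumptions introduced here.

```python
import numpy as np

class EffectiveRatioSGD:
    """Illustrative per-dimension adaptive learning rate (assumed sketch,
    not the paper's exact AdaSmooth update).

    Idea sketched: an effective ratio (net parameter movement divided by total
    parameter movement over a recent window) interpolates a smoothing constant
    between a fast and a slow value; that constant decays a per-dimension
    squared-gradient accumulator used to normalize the step, in the spirit of
    AdaGrad/AdaDelta-style methods.
    """

    def __init__(self, dim, lr=1.0, window=10, fast=0.5, slow=0.99, eps=1e-6):
        self.lr = lr
        self.window = window               # steps over which the ratio is measured (assumed)
        self.fast, self.slow = fast, slow  # assumed fast/slow smoothing constants
        self.eps = eps
        self.acc = np.zeros(dim)           # per-dimension accumulated squared gradients
        self.history = []                  # recent parameter values for the ratio

    def step(self, x, grad):
        self.history.append(x.copy())
        if len(self.history) > self.window + 1:
            self.history.pop(0)

        # Effective ratio per dimension: |net change| / (sum of |per-step changes|).
        if len(self.history) > 1:
            net = np.abs(self.history[-1] - self.history[0])
            total = sum(np.abs(self.history[i + 1] - self.history[i])
                        for i in range(len(self.history) - 1))
            er = net / (total + self.eps)
        else:
            er = np.zeros_like(x)

        # Interpolate the smoothing constant: a large ratio (steady progress in one
        # direction) pulls the decay toward the fast value, shortening the memory.
        c = (1.0 - er) * self.slow + er * self.fast

        # AdaDelta-like accumulator with a per-dimension, ratio-driven decay.
        self.acc = c * self.acc + (1.0 - c) * grad ** 2

        # Per-dimension normalized gradient step.
        return x - self.lr * grad / np.sqrt(self.acc + self.eps)
```

As a usage note, each parameter tensor would keep its own optimizer state; the design choice being illustrated is simply that the accumulator's effective memory length adapts per dimension from observed parameter trajectories rather than from a hand-tuned decay constant.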

Related research

07/06/2022
BFE and AdaBFE: A New Approach in Learning Rate Automation for Stochastic Optimization
In this paper, a new gradient-based optimization approach by automatical...

09/02/2016
SEBOOST - Boosting Stochastic Learning Using Subspace Optimization Techniques
We present SEBOOST, a technique for boosting the performance of existing...

07/09/2022
Improved Binary Forward Exploration: Learning Rate Scheduling Method for Stochastic Optimization
A new gradient-based optimization approach by automatically scheduling t...

06/24/2012
Practical recommendations for gradient-based training of deep architectures
Learning algorithms related to artificial neural networks and in particu...

12/31/2019
A Dynamic Sampling Adaptive-SGD Method for Machine Learning
We propose a stochastic optimization method for minimizing loss function...

05/12/2020
Unified Framework for the Adaptive Operator Selection of Discrete Parameters
We conduct an exhaustive survey of adaptive selection of operators (AOS)...

03/26/2018
A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay
Although deep learning has produced dazzling successes for applications ...
