HyperAdam: A Learnable Task-Adaptive Adam for Network Training

11/22/2018
by Shipeng Wang, et al.

Deep neural networks are traditionally trained with human-designed stochastic optimization algorithms, such as SGD and Adam. Recently, learning to optimize network parameters has emerged as a promising research direction. However, these learned black-box optimizers sometimes fail to fully exploit the experience embedded in human-designed optimizers and therefore have limited generalization ability. In this paper, a new optimizer, dubbed HyperAdam, is proposed that combines the idea of "learning to optimize" with the traditional Adam optimizer. Given a network to train, the parameter update that HyperAdam generates in each iteration is an adaptive combination of multiple updates produced by Adam with varying decay rates. The combination weights and decay rates in HyperAdam are learned adaptively for the task at hand. HyperAdam is modeled as a recurrent neural network composed of an AdamCell, a WeightCell and a StateCell. It is shown to achieve state-of-the-art performance in training various networks, such as multilayer perceptrons, CNNs and LSTMs.
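To make the core idea concrete, below is a minimal sketch of the update rule described in the abstract: each step forms several Adam-style candidate updates, one per second-moment decay rate, and mixes them with adaptive weights. The decay rates `betas2` and combination `weights` are placeholders here; in the actual model they are produced by the learned recurrent cells (AdamCell, WeightCell, StateCell), so this is an illustration of the combination mechanism, not the authors' implementation.

    import numpy as np

    def hyperadam_step(theta, grad, m, v, t, betas2, weights,
                       beta1=0.9, lr=1e-3, eps=1e-8):
        """One update mixing K Adam-style candidates.

        m       : first-moment estimate (shared across candidates for simplicity)
        v       : list of K second-moment estimates, one per decay rate in betas2
        betas2  : K candidate second-moment decay rates (assumed task-adaptive)
        weights : K combination weights (assumed produced adaptively, summing to 1)
        """
        # Standard Adam first-moment update with bias correction.
        m = beta1 * m + (1.0 - beta1) * grad
        m_hat = m / (1.0 - beta1 ** t)

        # One Adam-style candidate update per decay rate.
        candidates = []
        for k, beta2 in enumerate(betas2):
            v[k] = beta2 * v[k] + (1.0 - beta2) * grad ** 2
            v_hat = v[k] / (1.0 - beta2 ** t)
            candidates.append(m_hat / (np.sqrt(v_hat) + eps))

        # Adaptive combination of the K candidate updates.
        update = sum(w * u for w, u in zip(weights, candidates))
        return theta - lr * update, m, v

For example, with hypothetical decay rates `betas2 = [0.9, 0.99, 0.999]` and `weights` obtained from a softmax over a learned state, the step interpolates between Adam variants that react quickly or slowly to changes in gradient magnitude.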


Related research

10/21/2022
Amos: An Adam-style Optimizer with Adaptive Weight Decay towards Model-Oriented Scale
We present Amos, a stochastic gradient-based optimizer designed for trai...

10/21/2019
Learning to Learn by Zeroth-Order Oracle
In the learning to learn (L2L) framework, we cast the design of optimiza...

12/02/2022
Transformer-Based Learned Optimization
In this paper, we propose a new approach to learned optimization. As com...

06/02/2021
A Generalizable Approach to Learning Optimizers
A core issue with learning to optimize neural networks has been the lack...

04/11/2018
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
In several recently proposed stochastic optimization methods (e.g. RMSPr...

02/28/2017
Learning What Data to Learn
Machine learning is essentially the sciences of playing with data. An ad...

11/29/2022
Learning to Optimize with Dynamic Mode Decomposition
Designing faster optimization algorithms is of ever-growing interest. In...
