An Isometric Stochastic Optimizer

07/24/2023
by Jacob Jackson, et al.

The Adam optimizer is the standard choice in deep learning applications. I propose a simple explanation of Adam's success: it makes each parameter's step size independent of the norms of the other parameters. Based on this principle I derive Iso, a new optimizer which makes the norm of a parameter's update invariant to the application of any linear transformation to its inputs and outputs. I develop a variant of Iso called IsoAdam that allows optimal hyperparameters to be transferred from Adam, and demonstrate that IsoAdam obtains a speedup over Adam when training a small Transformer.
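The abstract's explanation of Adam rests on a per-parameter property: each entry's step is scaled only by that entry's own gradient moments, so it is unaffected by the norms of the model's other parameters. The sketch below is a minimal NumPy implementation of the standard Adam update to illustrate that property; it is not the paper's Iso or IsoAdam rule (those are defined in the full text), and the function and variable names are illustrative.

```python
import numpy as np

def adam_step(p, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single parameter tensor.

    Each entry of `p` is rescaled by its own running gradient statistics
    (m, v), so its step size does not depend on the norms of any other
    parameter in the model.
    """
    m = beta1 * m + (1 - beta1) * g          # first-moment EMA
    v = beta2 * v + (1 - beta2) * g ** 2     # second-moment EMA
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    p = p - lr * m_hat / (np.sqrt(v_hat) + eps)
    return p, m, v

# Toy usage: the step taken on w1 is the same regardless of how large
# any other parameter tensor in the model happens to be.
w1 = np.ones(3)
g1 = np.array([0.1, -0.2, 0.3])
w1_new, m1, v1 = adam_step(w1, g1, np.zeros(3), np.zeros(3), t=1)
```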


