Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering

11/09/2020 ∙ by Ricky T. Q. Chen, et al.
Standard first-order stochastic optimization algorithms base their updates solely on the average mini-batch gradient, and it has been shown that tracking additional quantities such as the curvature can help de-sensitize common hyperparameters. Based on this intuition, we explore the use of exact per-sample Hessian-vector products and gradients to construct optimizers that are self-tuning and hyperparameter-free. Starting from a dynamics model of the gradient, we derive a procedure that yields a curvature-corrected, noise-adaptive online gradient estimate. The smoothness of our updates makes them more amenable to simple step size selection schemes, which we also base on our estimated quantities. We prove that our model-based procedure converges in the noisy quadratic setting. Though we do not see similar gains in deep learning tasks, we can match the performance of well-tuned optimizers, and ultimately this is an interesting step toward constructing self-tuning optimizers.
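The abstract names two ingredients: exact per-sample gradients and Hessian-vector products, and a curvature-corrected, noise-adaptive online gradient estimate evaluated in a noisy quadratic setting. The sketch below illustrates those ingredients in JAX on a toy noisy quadratic; it is not the paper's algorithm. The loss function, the "predict with curvature, blend by estimated noise" filter, the prior variance, and the learning rate are all assumptions made for illustration only.

```python
# Illustrative sketch (not the paper's method): exact per-sample gradients and
# Hessian-vector products via vmap, fed into a toy curvature-corrected,
# noise-adaptive gradient filter on a noisy quadratic.
import jax
import jax.numpy as jnp

def loss(theta, x):
    # Hypothetical per-sample quadratic loss; x plays the role of a noisy data point.
    return 0.5 * jnp.sum((theta - x) ** 2)

# Exact per-sample gradients: vmap over the data axis only.
per_sample_grad = jax.vmap(jax.grad(loss), in_axes=(None, 0))

def hvp(theta, x, v):
    # Exact Hessian-vector product via forward-over-reverse differentiation.
    return jax.jvp(lambda t: jax.grad(loss)(t, x), (theta,), (v,))[1]

per_sample_hvp = jax.vmap(hvp, in_axes=(None, 0, None))

key = jax.random.PRNGKey(0)
dim, batch = 5, 32
theta = jnp.ones(dim)
g_hat = jnp.zeros(dim)       # filtered gradient estimate
prev_step = jnp.zeros(dim)   # last parameter update, Delta theta
lr = 0.1                     # assumed fixed step size for this toy example

for t in range(100):
    key, sub = jax.random.split(key)
    x = 0.5 * jax.random.normal(sub, (batch, dim))  # noisy "data" around the optimum 0

    grads = per_sample_grad(theta, x)               # (batch, dim) exact per-sample gradients
    g_obs = grads.mean(axis=0)                      # noisy mini-batch gradient observation
    obs_var = grads.var(axis=0) / batch             # estimated variance of that observation

    # Curvature correction: propagate the previous estimate through the last step,
    # g_t ~ g_{t-1} + H * Delta theta, using exact mini-batch-averaged HVPs.
    h_step = per_sample_hvp(theta, x, prev_step).mean(axis=0)
    g_pred = g_hat + h_step

    # Noise-adaptive blend of prediction and observation (assumed fixed prior variance).
    prior_var = jnp.ones(dim)
    gain = prior_var / (prior_var + obs_var)
    g_hat = g_pred + gain * (g_obs - g_pred)

    prev_step = -lr * g_hat
    theta = theta + prev_step

print("final parameters:", theta)
```

The Hessian-vector product is computed with forward-over-reverse differentiation (`jax.jvp` of `jax.grad`), so it never materializes the Hessian; the per-sample variance of the gradients is what makes the blending step noise-adaptive in this toy version.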


