MetaGrad: Adaptation using Multiple Learning Rates in Online Learning

02/12/2021
by Tim van Erven, et al.

We provide a new adaptive method for online convex optimization, MetaGrad, that is robust to general convex losses but achieves faster rates for a broad class of special functions, including not only exp-concave and strongly convex functions but also various types of stochastic and non-stochastic functions without any curvature. We prove this by drawing a connection to the Bernstein condition, which is known to imply fast rates in offline statistical learning. MetaGrad further adapts automatically to the size of the gradients. Its main feature is that it simultaneously considers multiple learning rates, which are weighted in direct proportion to their empirical performance on the data using a new meta-algorithm. We provide three versions of MetaGrad. The full-matrix version maintains a full covariance matrix and is applicable to learning tasks for which we can afford an update time quadratic in the dimension. The other two versions provide speed-ups for high-dimensional learning tasks with an update time that is linear in the dimension: one is based on sketching, the other on running a separate copy of the basic algorithm per coordinate. We evaluate all versions of MetaGrad on benchmark online classification and regression tasks, on which they consistently outperform both online gradient descent and AdaGrad.
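The abstract describes the central mechanism: a grid of learning rates, one candidate iterate per rate, combined by a meta-algorithm that weights each rate by its empirical performance on a surrogate loss. Below is a minimal sketch of that idea only, assuming projected online gradient descent as a simplified slave update; it is not the paper's full-matrix, sketched, or coordinate-wise algorithms, and the class name MetaGradSketch, the constants, and the slave step are illustrative assumptions.

```python
import numpy as np

class MetaGradSketch:
    """Sketch: one slave iterate per learning rate; a master mixes them
    with exponential weights based on a quadratic surrogate loss."""

    def __init__(self, dim, diameter=1.0, grad_bound=1.0, n_slaves=10):
        self.D = diameter                                 # radius of feasible ball (assumed domain)
        # Exponentially spaced grid of learning rates, one per slave.
        self.etas = np.array([2.0 ** -i / (5.0 * diameter * grad_bound)
                              for i in range(n_slaves)])
        self.pi = np.full(n_slaves, 1.0 / n_slaves)       # master weights
        self.W = np.zeros((n_slaves, dim))                # slave iterates

    def predict(self):
        # Master prediction: eta-tilted weighted average of the slave iterates.
        tilt = self.pi * self.etas
        return tilt @ self.W / tilt.sum()

    def update(self, w, grad):
        # Quadratic surrogate loss of slave i evaluated at its iterate w_i:
        #   ell_i = -eta_i * <w_i - w, g> + (eta_i * <w_i - w, g>)^2
        diffs = (self.W - w) @ grad
        surrogate = -self.etas * diffs + (self.etas * diffs) ** 2
        # Reweight the learning rates in proportion to surrogate performance.
        self.pi = self.pi * np.exp(-surrogate)
        self.pi = self.pi / self.pi.sum()
        # Simplified slave step (assumption): projected gradient descent with a
        # step size proportional to the slave's eta; the paper's slaves instead
        # use a second-order update on the surrogate loss.
        for i, eta in enumerate(self.etas):
            w_i = self.W[i] - self.D * eta * grad
            norm = np.linalg.norm(w_i)
            if norm > self.D:                             # project back onto the D-ball
                w_i *= self.D / norm
            self.W[i] = w_i


# Usage: online linear losses f_t(w) = <g_t, w> on the unit ball.
rng = np.random.default_rng(0)
alg = MetaGradSketch(dim=5)
for t in range(100):
    w = alg.predict()
    g = rng.normal(size=5)          # gradient of the round-t loss at w
    alg.update(w, g)
```

The point of weighting the slaves by their exponentiated surrogate losses is that the master gravitates toward whichever learning rate happens to perform best on the observed data, rather than requiring that rate to be tuned in advance.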


Related research

Adaptivity and Optimality: A Universal Algorithm for Online Convex Optimization (05/15/2019)
In this paper, we study adaptive online convex optimization, and aim to ...

Analysis of a Natural Gradient Algorithm on Monotonic Convex-Quadratic-Composite Functions (04/18/2012)
In this paper we investigate the convergence properties of a variant of ...

Online Submodular Maximization via Online Convex Optimization (09/08/2023)
We study monotone submodular maximization under general matroid constrai...

No More Pesky Learning Rates (06/06/2012)
The performance of stochastic gradient descent (SGD) depends critically ...

Optimal rates for first-order stochastic convex optimization under Tsybakov noise condition (07/12/2012)
We focus on the problem of minimizing a convex function f over a convex ...

Projective Quadratic Regression for Online Learning (11/25/2019)
This paper considers online convex optimization (OCO) problems - the par...

DoWG Unleashed: An Efficient Universal Parameter-Free Gradient Descent Method (05/25/2023)
This paper proposes a new easy-to-implement parameter-free gradient-base...
