On the Effectiveness of Richardson Extrapolation in Machine Learning

02/07/2020
by   Francis Bach, et al.
0

Richardson extrapolation is a classical technique from numerical analysis that can improve the approximation error of an estimation method by combining linearly several estimates obtained from different values of one of its hyperparameters, without the need to know in details the inner structure of the original estimation method. The main goal of this paper is to study when Richardson extrapolation can be used within machine learning, beyond the existing applications to step-size adaptations in stochastic gradient descent. We identify two situations where Richardson interpolation can be useful: (1) when the hyperparameter is the number of iterations of an existing iterative optimization algorithm, with applications to averaged gradient descent and Frank-Wolfe algorithms (where we obtain asymptotically rates of O(1/k^2) on polytopes, where k is the number of iterations), and (2) when it is a regularization parameter, with applications to Nesterov smoothing techniques for minimizing non-smooth functions (where we obtain asymptotically rates close to O(1/k^2) for non-smooth functions), and ridge regression. In all these cases, we show that extrapolation techniques come with no significant loss in performance, but with sometimes strong gains, and we provide theoretical justifications based on asymptotic developments for such gains, as well as empirical illustrations on classical problems from machine learning.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/10/2013

Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n)

We consider the stochastic approximation problem where a convex function...
research
04/03/2022

Understanding the unstable convergence of gradient descent

Most existing analyses of (stochastic) gradient descent rely on the cond...
research
09/03/2023

Modified Step Size for Enhanced Stochastic Gradient Descent: Convergence and Experiments

This paper introduces a novel approach to enhance the performance of the...
research
08/10/2022

Adaptive Learning Rates for Faster Stochastic Gradient Methods

In this work, we propose new adaptive step size strategies that improve ...
research
01/18/2021

Screening for Sparse Online Learning

Sparsity promoting regularizers are widely used to impose low-complexity...
research
07/03/2020

Mathematical Perspective of Machine Learning

We take a closer look at some theoretical challenges of Machine Learning...
research
05/12/2023

Distributed Gradient Descent for Functional Learning

In recent years, different types of distributed learning schemes have re...

Please sign up or login with your details

Forgot password? Click here to reset