Super-efficiency of automatic differentiation for functions defined as a minimum

02/10/2020
by Pierre Ablin, et al.

In min-min optimization or max-min optimization, one has to compute the gradient of a function defined as a minimum. In most cases, the minimum has no closed form, and an approximation is obtained via an iterative algorithm. There are two common ways to estimate the gradient of such a function: either an analytic formula obtained by assuming exactness of the approximation, or automatic differentiation through the algorithm. In this paper, we study the asymptotic error made by these estimators as a function of the optimization error. We find that the error of the automatic estimator is close to the square of the error of the analytic estimator, reflecting a super-efficiency phenomenon. The convergence of the automatic estimator greatly depends on the convergence of the Jacobian of the algorithm. We analyze it for gradient descent and stochastic gradient descent and derive convergence rates for the estimators in these cases. Our analysis is backed by numerical experiments on toy problems and on Wasserstein barycenter computation. Finally, we discuss the computational complexity of these estimators and give practical guidelines for choosing between them.
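As a minimal sketch (not the authors' code), the contrast between the two estimators can be illustrated on a hypothetical toy problem f(x) = min_z g(x, z) with a quadratic g, where the inner minimum is approximated by a few gradient descent steps. The function names (g, inner_gd, f_unrolled), the step size, and the iteration count below are illustrative assumptions; JAX is used only because it provides automatic differentiation out of the box.

```python
import jax
import jax.numpy as jnp

# Toy inner objective: g(x, z) = 0.5 * ||z - A x||^2 + 0.5 * ||z||^2,
# chosen so that f(x) = min_z g(x, z) is smooth and easy to check.
A = jnp.array([[1.0, 2.0],
               [0.0, 1.0]])

def g(x, z):
    return 0.5 * jnp.sum((z - A @ x) ** 2) + 0.5 * jnp.sum(z ** 2)

def inner_gd(x, n_iter=20, lr=0.3):
    """Approximate argmin_z g(x, z) with a few gradient descent steps."""
    z = jnp.zeros_like(x)
    for _ in range(n_iter):
        z = z - lr * jax.grad(g, argnums=1)(x, z)
    return z

def f_unrolled(x):
    # The minimum value computed through the iterative algorithm.
    z_t = inner_gd(x)
    return g(x, z_t)

x = jnp.array([1.0, -2.0])

# Analytic estimator: gradient of g with respect to x at the approximate
# minimizer, treating z_t as if it were the exact minimum (envelope formula).
z_t = inner_gd(x)
grad_analytic = jax.grad(g, argnums=0)(x, z_t)

# Automatic estimator: differentiate through the unrolled inner iterations.
grad_auto = jax.grad(f_unrolled)(x)

print("analytic:", grad_analytic)
print("automatic:", grad_auto)
```

Under the assumptions above, increasing n_iter drives both estimates toward the true gradient, and one can empirically compare how fast each error shrinks with the inner optimization error.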


