The Curse of Unrolling: Rate of Differentiating Through Optimization

09/27/2022
by Damien Scieur, et al.

Computing the Jacobian of the solution of an optimization problem is a central problem in machine learning, with applications in hyperparameter optimization, meta-learning, optimization as a layer, and dataset distillation, to name a few. Unrolled differentiation is a popular heuristic that approximates the solution with an iterative solver and differentiates through the solver's computational path. This work provides a non-asymptotic convergence-rate analysis of this approach on quadratic objectives for gradient descent and the Chebyshev method. We show that to ensure convergence of the Jacobian, we can either 1) choose a large learning rate, which leads to fast asymptotic convergence but may entail an arbitrarily long burn-in phase, or 2) choose a smaller learning rate, which leads to immediate but slower convergence. We refer to this phenomenon as the curse of unrolling. Finally, we discuss open problems related to this approach, such as deriving a practical update rule for the optimal unrolling strategy and making novel connections with the field of Sobolev orthogonal polynomials.
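
As a concrete illustration of the setup the abstract describes, the sketch below unrolls gradient descent on a ridge-regularized quadratic and differentiates the final iterate with respect to the regularization strength, comparing against the closed-form Jacobian. It is a minimal sketch only: the matrix A, vector b, step sizes, and iteration counts are illustrative assumptions, not values from the paper.

# Minimal sketch of unrolled differentiation on a quadratic objective.
# All problem data (A, b, theta) and hyperparameters are illustrative
# assumptions, not taken from the paper.
import jax
import jax.numpy as jnp

dim = 10
key = jax.random.PRNGKey(0)
M = jax.random.normal(key, (dim, dim))
A = M @ M.T / dim + jnp.eye(dim)   # symmetric positive-definite matrix
b = jnp.ones(dim)

def solve(theta, step_size, n_steps):
    # Unroll n_steps of gradient descent on the ridge objective
    #   f(x) = 0.5 x^T A x - b^T x + 0.5 theta ||x||^2,
    # whose exact minimizer is x*(theta) = (A + theta I)^{-1} b.
    x = jnp.zeros(dim)
    for _ in range(n_steps):
        x = x - step_size * (A @ x + theta * x - b)
    return x

theta = 1.0
H = A + theta * jnp.eye(dim)
L = jnp.linalg.eigvalsh(H).max()                        # smoothness constant
J_exact = -jnp.linalg.solve(H, jnp.linalg.solve(H, b))  # dx*/dtheta = -(A + theta I)^{-2} b

for step_size in (0.5 / L, 1.5 / L):
    for n_steps in (10, 50, 200):
        # jax.jacobian traces the whole unrolled loop, i.e. it differentiates
        # through the solver's computational path (unrolled differentiation).
        J = jax.jacobian(solve)(theta, step_size, n_steps)
        err = jnp.linalg.norm(J - J_exact)
        print(f"step={float(step_size):.3f}  n_steps={n_steps:3d}  "
              f"Jacobian error={float(err):.2e}")

Per the trade-off stated in the abstract, the larger of the two step sizes should eventually drive the Jacobian error down at a faster rate, though that error can transiently grow during the burn-in phase, whereas the smaller step size decreases the error from the first iterations, only more slowly.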

research · 07/31/2021
Bilevel Optimization for Machine Learning: Algorithm Design and Convergence Analysis
Bilevel optimization has become a powerful framework in various machine ...

research · 03/14/2017
Online Learning Rate Adaptation with Hypergradient Descent
We introduce a general method for improving the convergence rate of grad...

research · 01/22/2019
DTN: A Learning Rate Scheme with Convergence Rate of O(1/t) for SGD
We propose a novel diminishing learning rate scheme, coined Decreasing-T...

research · 09/12/2023
ELRA: Exponential learning rate adaption gradient descent optimization method
We present a novel, fast (exponential rate adaption), ab initio (hyper-p...

research · 07/06/2020
Refined Analysis of the Asymptotic Complexity of the Number Field Sieve
The classical heuristic complexity of the Number Field Sieve (NFS) is th...

research · 09/19/2022
BOME! Bilevel Optimization Made Easy: A Simple First-Order Approach
Bilevel optimization (BO) is useful for solving a variety of important m...

research · 05/25/2021
Saddle Point Optimization with Approximate Minimization Oracle and its Application to Robust Berthing Control
We propose an approach to saddle point optimization relying only on an o...
