On the Iteration Complexity of Hypergradient Computation

06/29/2020
by Riccardo Grazzi et al.

We study a general class of bilevel problems, which consist in minimizing an upper-level objective that depends on the solution of a parametric fixed-point equation. Important instances arising in machine learning include hyperparameter optimization, meta-learning, and certain graph and recurrent neural networks. Typically, the gradient of the upper-level objective (the hypergradient) is hard or even impossible to compute exactly, which has motivated interest in approximation methods. We investigate popular approaches to computing the hypergradient, based on reverse-mode iterative differentiation and approximate implicit differentiation. Under the hypothesis that the fixed-point equation is defined by a contraction mapping, we present a unified analysis which, for the first time, makes it possible to compare these methods quantitatively, providing explicit bounds on their iteration complexity. This analysis suggests a hierarchy among the methods in terms of computational efficiency, with approximate implicit differentiation based on conjugate gradient performing best. We present an extensive experimental comparison of the methods, which confirms the theoretical findings.
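To make the two approximation schemes concrete: writing the problem as minimizing f(λ) = E(w(λ), λ) subject to the fixed-point equation w(λ) = Φ(w(λ), λ), the implicit function theorem gives the hypergradient ∇f(λ) = ∇₂E + ∂₂Φᵀ (I − ∂₁Φ)⁻ᵀ ∇₁E, evaluated at (w(λ), λ). Below is a minimal sketch (in JAX, not the authors' code) of both approaches on a toy ridge-regression bilevel problem where the regularization strength is the hyperparameter: iterative differentiation (ITD) backpropagates through K unrolled fixed-point iterations, while approximate implicit differentiation (AID) solves the linear system above with conjugate gradient. All names, constants, and data (g, E, phi, alpha, K) are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the authors' code) of the two
# hypergradient approximations discussed in the abstract, in JAX.
# Toy problem: lower level = ridge regression, upper level = validation
# loss, hyperparameter lam = regularization strength.
import jax
import jax.numpy as jnp
from jax.scipy.sparse.linalg import cg

# Training and validation data for the toy problem (assumed values).
X = jnp.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = jnp.array([1.0, 2.0, 3.0])
Xv = jnp.array([[2.0, 1.0]])
yv = jnp.array([1.5])

def g(w, lam):  # lower-level objective, strongly convex in w for lam > 0
    return 0.5 * jnp.sum((X @ w - y) ** 2) + 0.5 * lam * jnp.sum(w ** 2)

def E(w, lam):  # upper-level objective (validation loss)
    return 0.5 * jnp.sum((Xv @ w - yv) ** 2)

alpha = 0.01      # step size small enough that phi is a contraction
def phi(w, lam):  # fixed-point map: one gradient-descent step on g
    return w - alpha * jax.grad(g)(w, lam)

K = 500  # number of fixed-point iterations

def itd_hypergrad(lam, w0):
    """Reverse-mode iterative differentiation: backprop through K unrolled steps."""
    def f(lam):
        w = w0
        for _ in range(K):
            w = phi(w, lam)
        return E(w, lam)
    return jax.grad(f)(lam)

def aid_cg_hypergrad(lam, w0):
    """Approximate implicit differentiation: solve (I - d_w phi)^T v = d_w E
    with conjugate gradient at an approximate fixed point, then assemble
    the hypergradient d_lam E + (d_lam phi)^T v."""
    w = w0
    for _ in range(K):               # approximate the fixed point w(lam)
        w = phi(w, lam)
    dE_dw = jax.grad(E, argnums=0)(w, lam)
    _, vjp_w = jax.vjp(lambda u: phi(u, lam), w)
    A = lambda v: v - vjp_w(v)[0]    # v -> (I - d_w phi(w, lam))^T v
    v, _ = cg(A, dE_dw)              # A is symmetric positive definite here
    _, vjp_lam = jax.vjp(lambda l: phi(w, l), lam)
    return jax.grad(E, argnums=1)(w, lam) + vjp_lam(v)[0]

w0 = jnp.zeros(2)
lam = 0.1
print(itd_hypergrad(lam, w0))    # the two estimates agree as K grows
print(aid_cg_hypergrad(lam, w0))
```

Since the gradient-descent map of a strongly convex lower-level problem is a contraction for a small enough step size, both estimates converge to the exact hypergradient as K grows; the paper's analysis bounds how fast, and the CG-based AID variant attains the best rate per unit of computation.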

Related research

02/07/2022 · Bilevel Optimization with a Lower-level Contraction: Optimal Sample Complexity without Warm-Start
We analyze a general class of bilevel problems, in which the upper-level...

11/13/2020 · Convergence Properties of Stochastic Hypergradients
Bilevel optimization problems are receiving increasing attention in mach...

07/31/2021 · Bilevel Optimization for Machine Learning: Algorithm Design and Convergence Analysis
Bilevel optimization has become a powerful framework in various machine ...

11/29/2021 · Amortized Implicit Differentiation for Stochastic Bilevel Optimization
We study a class of algorithms for solving bilevel optimization problems...

03/31/2023 · Scalable Bayesian Meta-Learning through Generalized Implicit Gradients
Meta-learning owns unique effectiveness and swiftness in tackling emergi...

06/09/2021 · Convergence of parallel overlapping domain decomposition methods for the Helmholtz equation
We analyse parallel overlapping Schwarz domain decomposition methods for...

12/22/2022 · Automatically Bounding the Taylor Remainder Series: Tighter Bounds and New Applications
We present a new algorithm for automatically bounding the Taylor remaind...
