Nyström Method for Accurate and Scalable Implicit Differentiation

02/20/2023
by Ryuichiro Hataya, et al.

The essential difficulty of gradient-based bilevel optimization using implicit differentiation is estimating the inverse Hessian vector product with respect to neural network parameters. This paper tackles the problem with the Nyström method and the Woodbury matrix identity, exploiting the low-rank structure of the Hessian. Unlike existing methods based on iterative approximation, such as conjugate gradient and the Neumann series approximation, the proposed method avoids numerical instability and can be computed efficiently with matrix operations and no iterations. As a result, it works stably across various tasks and is faster than iterative approximations. In experiments including large-scale hyperparameter optimization and meta-learning, we demonstrate that the Nyström method consistently achieves performance comparable or even superior to other approaches. The source code is available at https://github.com/moskomule/hypergrad.
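The combination of the Nyström method and the Woodbury identity described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (that lives in the linked repository). The function name `nystrom_ihvp`, the callable `hvp`, the damping term `rho`, and the uniform random choice of `k` columns are all assumptions made for illustration. The sketch approximates H by H_K H_KK^{-1} H_K^T, where H_K holds k sampled columns of H, and then applies the Woodbury identity to (H_K H_KK^{-1} H_K^T + rho I)^{-1} v.

```python
import numpy as np


def nystrom_ihvp(hvp, v, k, rho, rng):
    """Approximate (H + rho * I)^{-1} v from k sampled columns of H.

    hvp : callable mapping a vector x to H @ x (hypothetical interface)
    v   : vector of shape (p,)
    k   : number of Hessian columns to sample (assumed k <= rank of H)
    rho : positive damping constant (assumption of this sketch)
    rng : numpy.random.Generator used to pick column indices
    """
    p = v.shape[0]
    idx = rng.choice(p, size=k, replace=False)

    # Column i of H is H @ e_i, so k Hessian-vector products give H[:, idx].
    H_K = np.empty((p, k))
    for j, i in enumerate(idx):
        e = np.zeros(p)
        e[i] = 1.0
        H_K[:, j] = hvp(e)

    # The k-by-k block H[idx][:, idx] is just the sampled rows of H_K.
    H_KK = H_K[idx, :]

    # Woodbury identity applied to rho * I + H_K @ inv(H_KK) @ H_K.T:
    #   v / rho - H_K @ inv(H_KK + H_K.T @ H_K / rho) @ H_K.T @ v / rho**2
    small = H_KK + (H_K.T @ H_K) / rho
    return v / rho - H_K @ np.linalg.solve(small, H_K.T @ v) / rho**2


# Toy usage on an explicit low-rank matrix standing in for a Hessian.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 4))
H = A @ A.T  # symmetric positive semidefinite, rank 4
v = rng.standard_normal(30)
approx = nystrom_ihvp(lambda x: H @ x, v, k=4, rho=0.1, rng=rng)
exact = np.linalg.solve(H + 0.1 * np.eye(30), v)
print(np.max(np.abs(approx - exact)))
```

In this sketch only a k-by-k linear system is solved, so the cost is dominated by the k Hessian-vector products and a few matrix products, with no iterative solver involved.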

research
09/19/2022

BOME! Bilevel Optimization Made Easy: A Simple First-Order Approach

Bilevel optimization (BO) is useful for solving a variety of important m...
research
03/31/2023

Scalable Bayesian Meta-Learning through Generalized Implicit Gradients

Meta-learning owns unique effectiveness and swiftness in tackling emergi...
research
10/06/2021

Online Hyperparameter Meta-Learning with Hypergradient Distillation

Many gradient-based meta-learning methods assume a set of parameters tha...
research
11/06/2019

Optimizing Millions of Hyperparameters by Implicit Differentiation

We propose an algorithm for inexpensive gradient-based hyperparameter op...
research
05/21/2018

Small steps and giant leaps: Minimal Newton solvers for Deep Learning

We propose a fast second-order method that can be used as a drop-in repl...
research
08/22/2023

Understanding Hessian Alignment for Domain Generalization

Out-of-distribution (OOD) generalization is a critical ability for deep ...
research
09/05/2018

IKA: Independent Kernel Approximator

This paper describes a new method for low rank kernel approximation call...
