ES-Based Jacobian Enables Faster Bilevel Optimization

10/13/2021
by Daouda Sow, et al.

Bilevel optimization (BO) has arisen as a powerful tool for solving many modern machine learning problems. However, due to the nested structure of BO, existing gradient-based methods require second-order derivative approximations via Jacobian- and/or Hessian-vector computations, which can be very costly in practice, especially with large neural network models. In this work, we propose a novel BO algorithm that adopts an Evolution Strategies (ES) based method to approximate the response Jacobian matrix in the hypergradient of BO, and hence fully eliminates all second-order computations. We call our algorithm ESJ (the ES-based Jacobian method) and further extend it to the stochastic setting as ESJ-S. Theoretically, we characterize the convergence guarantees and computational complexity of our algorithms. Experimentally, we demonstrate the superiority of our proposed algorithms over state-of-the-art methods on various bilevel problems. In particular, in the few-shot meta-learning experiment, we meta-learn the twelve million parameters of a ResNet-12 network over the miniImageNet dataset, which demonstrates the scalability of our ES-based bilevel approach and its feasibility in the large-scale setting.
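To make the idea concrete, the sketch below shows one common way an ES-style, zeroth-order estimator can stand in for the response-Jacobian term of the hypergradient, so that no Hessian- or Jacobian-vector products are needed. This is only an illustration under stated assumptions, not the authors' ESJ implementation: the helpers inner_solve, grad_f_x, and grad_f_y, and the parameters sigma and num_dirs, are hypothetical placeholders introduced here for exposition.

```python
import numpy as np

# Minimal sketch (not the paper's official algorithm): estimate the
# hypergradient term J(x)^T v, where J(x) = d y*(x) / dx is the response
# Jacobian and v = grad_y f(x, y*(x)), using antithetic Gaussian
# perturbations of the upper-level variable x.

def es_jacobian_vector(x, v, inner_solve, sigma=0.01, num_dirs=16, rng=None):
    """ES-style zeroth-order estimate of J(x)^T v."""
    rng = rng or np.random.default_rng(0)
    est = np.zeros_like(x)
    for _ in range(num_dirs):
        u = rng.standard_normal(x.shape)        # random search direction
        y_plus = inner_solve(x + sigma * u)     # approximate inner solutions
        y_minus = inner_solve(x - sigma * u)    # at perturbed upper variables
        # Central-difference estimate of the directional change of v^T y*(x)
        scalar = v @ (y_plus - y_minus) / (2.0 * sigma)
        est += scalar * u
    return est / num_dirs

def hypergradient(x, grad_f_x, grad_f_y, inner_solve, **es_kwargs):
    """Approximate d f(x, y*(x)) / dx without any second-order computation."""
    y_star = inner_solve(x)                     # e.g. a few inner gradient steps
    v = grad_f_y(x, y_star)                     # upper-level gradient w.r.t. y
    # Chain rule: grad_x f + J(x)^T grad_y f, with the second term from ES
    return grad_f_x(x, y_star) + es_jacobian_vector(x, v, inner_solve, **es_kwargs)
```

In this reading, each perturbation only requires re-solving (approximately) the inner problem at a shifted x, so the estimator uses purely first-order and function-value information; sigma and num_dirs trade off bias, variance, and cost of the estimate.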


