Efficiently Escaping Saddle Points in Bilevel Optimization

by Minhui Huang, et al.

Bilevel optimization is one of the fundamental problems in machine learning and optimization. Recent theoretical developments in bilevel optimization focus on finding first-order stationary points in the nonconvex-strongly-convex case. In this paper, we analyze algorithms that can escape saddle points in nonconvex-strongly-convex bilevel optimization. Specifically, we show that perturbed approximate implicit differentiation (AID) with a warm-start strategy finds an ϵ-approximate local minimum of the bilevel problem in Õ(ϵ^-2) iterations with high probability. Moreover, we propose the inexact NEgative-curvature-Originated-from-Noise algorithm (iNEON), a pure first-order method that can escape saddle points and find a local minimum of stochastic bilevel optimization. As a by-product, we provide the first nonasymptotic analysis of the perturbed multi-step gradient descent ascent (GDmax) algorithm, which converges to a local minimax point of minimax problems.
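To make the perturbation idea concrete, the following is a minimal toy sketch (our own construction, not the paper's algorithm or guarantees): a bilevel problem whose lower level is strongly convex, solved inexactly by warm-started gradient descent; the hypergradient is formed via the implicit-differentiation (AID) formula; and isotropic noise is injected whenever the hypergradient is small, which lets the iterate escape a strict saddle of the hyperobjective. All problem data (`A`, `H`) and step sizes here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy bilevel problem (illustrative only):
#   min_x f(x, y*(x))  where  y*(x) = argmin_y g(x, y),
#   g(x, y) = 0.5 * ||y - A x||^2   (strongly convex in y, so y*(x) = A x),
#   f(x, y) = 0.5 * y^T H y         (H indefinite, so x = 0 is a strict saddle
#                                    of the hyperobjective 0.5 x^T A^T H A x).
A = np.eye(2)
H = np.diag([1.0, -1.0])

def lower_level_solve(x, y0, steps=50, lr=0.5):
    """Warm-started gradient descent on g(x, .), approximating y*(x)."""
    y = y0.copy()
    for _ in range(steps):
        y -= lr * (y - A @ x)      # grad_y g(x, y) = y - A x
    return y

def hypergradient(x, y):
    """AID formula: grad_x f - grad_xy g @ inv(grad_yy g) @ grad_y f.
    Here grad_x f = 0, grad_yy g = I, grad_xy g = -A^T, grad_y f = H y."""
    v = H @ y                      # solve (grad_yy g) v = grad_y f (trivial here)
    return A.T @ v

# Perturbed hypergradient descent: add noise at near-stationary points.
x = np.zeros(2)                    # start exactly at the saddle
y = np.zeros(2)
lr, eps, radius = 0.1, 1e-3, 1e-2
for _ in range(500):
    y = lower_level_solve(x, y)    # warm start from the previous lower-level iterate
    g = hypergradient(x, y)
    if np.linalg.norm(g) <= eps:   # near-stationary: perturb to probe curvature
        x = x + rng.uniform(-radius, radius, size=2)
    else:
        x = x - lr * g
```

After the perturbation, the gradient steps amplify the component of `x` along the negative-curvature direction of `A^T H A` (the second coordinate here) while shrinking the positive-curvature component, so the iterate leaves the saddle at the origin.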
