
Towards Gradient-based Bilevel Optimization with Non-convex Followers and Beyond

by   Risheng Liu, et al.
Southern University of Science & Technology
Dalian University of Technology
NetEase, Inc

In recent years, Bi-Level Optimization (BLO) techniques have received extensive attention from both the learning and vision communities. A variety of BLO models arising in complex and practical tasks have a non-convex follower structure in nature (i.e., they are without Lower-Level Convexity, LLC for short). However, this challenging class of BLOs lacks developments in both efficient solution strategies and solid theoretical guarantees. In this work, we propose a new algorithmic framework, named Initialization Auxiliary and Pessimistic Trajectory Truncated Gradient Method (IAPTT-GM), to partially address the above issues. In particular, by introducing an auxiliary variable as the initialization to guide the optimization dynamics and designing a pessimistic trajectory truncation operation, we construct a reliable approximate version of the original BLO in the absence of the LLC hypothesis. Our theoretical investigations establish the convergence of solutions returned by IAPTT-GM towards those of the original BLO without LLC. As an additional bonus, we also theoretically justify the quality of IAPTT-GM embedded with Nesterov's accelerated dynamics under LLC. The experimental results confirm both the convergence of our algorithm without LLC and the theoretical findings under LLC.
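The two ingredients named in the abstract can be illustrated on a toy problem. The sketch below is NOT the authors' implementation: the objectives F and f, the hyperparameters, and the use of finite differences in place of a true hypergradient are all illustrative assumptions. It only shows the mechanism: the lower-level iterates start from a learnable auxiliary initialization z, and the upper-level objective is evaluated pessimistically at the trajectory iterate that maximizes it, rather than only at the final iterate.

```python
def F(x, y):
    # Upper-level (leader) objective -- a toy choice for illustration.
    return (x - 2.0) ** 2 + (y - 2.0) ** 2

def f(x, y):
    # Lower-level (follower) objective -- also a toy choice.
    return (y - x) ** 2

def inner_trajectory(x, z, steps=20, lr=0.2):
    """Gradient descent on f in y, started from the auxiliary init z.

    Returning the whole trajectory (not just the last iterate) is what
    makes the pessimistic truncation below possible.
    """
    ys = [z]
    y = z
    for _ in range(steps):
        y = y - lr * 2.0 * (y - x)  # analytic grad_y f for this toy f
        ys.append(y)
    return ys

def truncated_value(x, z):
    """Pessimistic trajectory truncation: evaluate F at the iterate
    along the inner trajectory that makes F worst (largest)."""
    ys = inner_trajectory(x, z)
    return max(F(x, y) for y in ys)

def finite_diff(fun, x, z, eps=1e-5):
    # Crude stand-in for the hypergradient through the unrolled loop.
    gx = (fun(x + eps, z) - fun(x - eps, z)) / (2 * eps)
    gz = (fun(x, z + eps) - fun(x, z - eps)) / (2 * eps)
    return gx, gz

# Jointly update the leader variable x and the auxiliary init z.
x, z = 0.0, 0.0
for _ in range(200):
    gx, gz = finite_diff(truncated_value, x, z)
    x -= 0.05 * gx
    z -= 0.05 * gz

print(x, z)  # both approach 2.0, where the pessimistic value vanishes
```

In this toy instance the lower-level solution is unique, so the pessimistic and optimistic formulations coincide; the point of the sketch is only the structure of the update, in which the auxiliary initialization z is optimized alongside x.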



Value-Function-based Sequential Minimization for Bi-level Optimization

Gradient-based Bi-Level Optimization (BLO) methods have been widely appl...

A Generic First-Order Algorithmic Framework for Bi-Level Programming Beyond Lower-Level Singleton

In recent years, a variety of gradient-based first-order methods have be...

A Generic Descent Aggregation Framework for Gradient-based Bi-level Optimization

In recent years, gradient-based methods for solving bi-level optimizatio...

A Value-Function-based Interior-point Method for Non-convex Bi-level Optimization

Bi-level optimization model is able to capture a wide range of complex l...

Active Slices for Sliced Stein Discrepancy

Sliced Stein discrepancy (SSD) and its kernelized variants have demonstr...

Learning to Initialize Gradient Descent Using Gradient Descent

Non-convex optimization problems are challenging to solve; the success a...

A Deterministic Convergence Framework for Exact Non-Convex Phase Retrieval

In this work, we analyze the non-convex framework of Wirtinger Flow (WF)...