Randomized Stochastic Variance-Reduced Methods for Stochastic Bilevel Optimization

by   Zhishuai Guo, et al.

In this paper, we consider non-convex stochastic bilevel optimization (SBO) problems, which have many applications in machine learning. Although numerous studies have proposed stochastic algorithms for solving these problems, they are limited in two respects: (i) their sample complexities are high and do not match the state-of-the-art result for non-convex stochastic optimization; (ii) their algorithms are tailored to problems with only one lower-level problem. When there are many lower-level problems, it can be prohibitive to process all of them at each iteration. To address these limitations, this paper proposes fast randomized stochastic algorithms for non-convex SBO problems. First, we present a stochastic method for non-convex SBO with a single lower-level problem and establish a sample complexity of O(1/ϵ^3) for finding an ϵ-stationary point under appropriate conditions, matching the lower bound for stochastic smooth non-convex optimization. Second, we present a randomized stochastic method for non-convex SBO with m>1 lower-level problems that processes only one lower-level problem at each iteration, and establish a sample complexity no worse than O(m/ϵ^3), which can be better than processing all m lower-level problems at every iteration. To the best of our knowledge, this is the first work to consider SBO with many lower-level problems and to establish a state-of-the-art sample complexity for it.
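The key idea of the second method, sampling a single lower-level problem per iteration instead of updating all m of them, can be illustrated with a minimal sketch. The toy problem, step sizes, and the moving-average gradient estimator below are all illustrative assumptions for exposition; this is not the paper's actual algorithm or its variance-reduction scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy multi-lower-level bilevel problem (illustrative only):
#   lower problem i: y_i*(x) = argmin_y 0.5 * ||y - A_i @ x||^2, so y_i*(x) = A_i @ x
#   upper problem:   F(x) = 0.5 * sum_i ||y_i*(x) - b_i||^2
m, d = 5, 3
x_star = rng.standard_normal(d)
A = [rng.standard_normal((d, d)) for _ in range(m)]
b = [A[i] @ x_star for i in range(m)]          # b_i chosen so that F(x_star) = 0

x = np.zeros(d)
y = [np.zeros(d) for _ in range(m)]
v = np.zeros(d)                                 # running hypergradient estimate
eta_y, eta_x, beta = 0.5, 0.01, 0.9             # hypothetical step sizes / momentum

for t in range(5000):
    i = int(rng.integers(m))                    # sample ONE lower-level problem
    y[i] -= eta_y * (y[i] - A[i] @ x)           # one SGD step on the sampled problem
    # Unbiased (up to lower-level inexactness) estimate of the hypergradient:
    # block i's true contribution is A_i^T (A_i x - b_i); we plug in y_i ~ A_i x
    # and scale by m to correct for sampling one of m blocks.
    g = m * A[i].T @ (y[i] - b[i])
    v = beta * v + (1 - beta) * g               # moving-average smoothing of the noise
    x -= eta_x * v                              # upper-level update

F0 = 0.5 * sum(np.linalg.norm(b[i]) ** 2 for i in range(m))               # F at x = 0
F = 0.5 * sum(np.linalg.norm(A[i] @ x - b[i]) ** 2 for i in range(m))     # F at final x
print(F < 0.1 * F0)
```

Per iteration this touches only one of the m lower-level problems, which is the source of the O(m/ϵ^3) complexity discussed in the abstract; the importance-sampling-style scaling by m keeps the block estimate consistent with the full hypergradient.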



