Blockwise Stochastic Variance-Reduced Methods with Parallel Speedup for Multi-Block Bilevel Optimization

by Quanqi Hu, et al.

In this paper, we consider non-convex multi-block bilevel optimization (MBBO) problems, which involve m ≫ 1 lower-level problems and have important applications in machine learning. Compared with standard bilevel optimization (BO), designing a stochastic gradient estimator and controlling its variance is more intricate, because of the hierarchical sampling of blocks and data and the unique challenge of estimating the hyper-gradient. We aim for our algorithms to achieve three properties: (a) matching the state-of-the-art complexity of standard BO problems with a single block; (b) achieving parallel speedup by sampling I blocks per iteration and B samples for each sampled block; (c) avoiding the computation of the inverse of a high-dimensional Hessian matrix estimator. Achieving all three is non-trivial, since existing works achieve only one or two of these properties. To address the challenges involved in achieving (a), (b), and (c), we propose two stochastic algorithms that use advanced blockwise variance-reduction techniques to track the Hessian matrices (for low-dimensional problems) or the Hessian-vector products (for high-dimensional problems). We prove an iteration complexity of $O\left(\frac{m\epsilon^{-3}\mathbb{I}(I<m)}{I\sqrt{I}} + \frac{m\epsilon^{-3}}{I\sqrt{B}}\right)$ for finding an $\epsilon$-stationary point under appropriate conditions, where $\mathbb{I}(I<m)$ equals 1 if I < m and 0 otherwise. We also conduct experiments to verify the effectiveness of the proposed algorithms compared with existing MBBO algorithms.
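The abstract names blockwise variance reduction but does not spell out the estimator. As a rough illustration only, the sketch below uses a generic STORM-style recursive estimator as a stand-in (not necessarily the authors' estimator): at each iteration, I of the m blocks are sampled, each sampled block draws a mini-batch of B samples and updates its tracked quantity, and unsampled blocks keep their previous estimates. The linear oracle `A[i] @ x`, the helpers `make_oracle` and `blockwise_storm_step`, and all parameter values are hypothetical; in the paper's setting the tracked quantity would be a Hessian matrix or a Hessian-vector product.

```python
import numpy as np

def make_oracle(A, B, sigma):
    """Hypothetical stochastic oracle for block i: A[i] @ x plus the mean of B noise draws."""
    def oracle(i, x, seed):
        # The same seed reproduces the same mini-batch of B samples, so a block
        # can be queried at x_t and x_{t-1} with identical samples.
        draws = np.random.default_rng(seed).normal(0.0, sigma, size=(B, x.shape[0]))
        return A[i] @ x + draws.mean(axis=0)
    return oracle

def blockwise_storm_step(u, x_new, x_old, blocks, oracle, beta, rng):
    """Update only the sampled blocks with a STORM-style recursion (assumed stand-in)."""
    u = u.copy()
    for i in blocks:
        seed = int(rng.integers(2**31))
        h_new = oracle(i, x_new, seed)
        h_old = oracle(i, x_old, seed)
        # Fresh estimate plus a momentum-weighted correction of the stale estimate
        u[i] = h_new + (1.0 - beta) * (u[i] - h_old)
    return u  # unsampled blocks keep their previous estimates

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, d, I, B = 8, 3, 2, 4  # m blocks, dimension d, I sampled blocks, B samples per block
    A = rng.normal(size=(m, d, d))
    oracle = make_oracle(A, B, sigma=0.1)
    x_old = rng.normal(size=d)
    u = np.stack([oracle(i, x_old, seed=i) for i in range(m)])
    for t in range(5):
        x_new = x_old - 0.1 * rng.normal(size=d)  # stand-in for an upper-level update
        blocks = rng.choice(m, size=I, replace=False)
        u = blockwise_storm_step(u, x_new, x_old, blocks, oracle, beta=0.5, rng=rng)
        x_old = x_new
    print(u.shape)
```

Evaluating both points with the same seed is what lets the sampling noise cancel in the correction term; with `sigma=0.0`, a sampled block's estimate equals `A[i] @ x_new` exactly, while an unsampled block's estimate stays where it was.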


Randomized Stochastic Variance-Reduced Methods for Stochastic Bilevel Optimization

In this paper, we consider non-convex stochastic bilevel optimization (S...

Stochastic Recursive Variance-Reduced Cubic Regularization Methods

Stochastic Variance-Reduced Cubic regularization (SVRC) algorithms have ...

On Noisy Negative Curvature Descent: Competing with Gradient Descent for Faster Non-convex Optimization

The Hessian-vector product has been utilized to find a second-order stat...

Distributed stochastic gradient tracking algorithm with variance reduction for non-convex optimization

This paper proposes a distributed stochastic algorithm with variance red...

Multi-block-Single-probe Variance Reduced Estimator for Coupled Compositional Optimization

Variance reduction techniques such as SPIDER/SARAH/STORM have been exten...

Accelerating Hessian-free optimization for deep neural networks by implicit preconditioning and sampling

Hessian-free training has become a popular parallel second-order optim...

Adaptive Stochastic Optimisation of Nonconvex Composite Objectives

In this paper, we propose and analyse a family of generalised stochastic...
