Bilevel stochastic methods for optimization and machine learning: Bilevel stochastic descent and DARTS

by Tommaso Giovannelli, et al.

Bilevel stochastic optimization formulations have become instrumental in a number of machine learning contexts such as neural architecture search, continual learning, adversarial learning, and hyperparameter tuning. Practical bilevel stochastic optimization problems become challenging when the number of variables is large or when constraints are present. The goal of this paper is twofold. First, we aim to promote the use of bilevel optimization in large-scale learning, and we introduce a practical bilevel stochastic gradient method (BSG-1) that requires neither lower-level second-order derivatives nor linear-system solves (and dismisses any matrix-vector products). Because BSG-1 stays close to first-order principles, it achieves better performance than methods that do not, such as DARTS. Second, we develop bilevel stochastic gradient descent for bilevel problems with lower-level constraints, and we introduce a convergence theory that covers both the unconstrained and constrained cases and abstracts as much as possible from the specifics of the bilevel gradient calculation.
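To illustrate the two-loop structure such methods share, here is a minimal sketch on a hypothetical toy quadratic problem (the problem, step sizes, and function names below are illustrative assumptions, not the paper's BSG-1 algorithm, whose gradient approximation differs): a few inner gradient steps approximate the lower-level solution, and the outer step updates the upper-level variable using first-order information only, with no lower-level second derivatives.

```python
# Minimal sketch of a two-loop bilevel gradient method on a toy problem
# (illustrative only; not the paper's BSG-1 algorithm):
#   lower level:  y*(x) = argmin_y f(x, y),  with f(x, y) = (y - x)^2
#   upper level:  min_x  F(x, y*(x)),        with F(x, y) = (x - 1)^2 + (y - 1)^2
# The solution is x = y = 1. In the spirit of first-order bilevel schemes,
# the updates use only gradients of f and F, never the Hessian of f.

def grad_f_y(x, y):
    """d f / d y, the lower-level gradient."""
    return 2.0 * (y - x)

def grad_F_x(x, y):
    """d F / d x, a first-order approximation of the upper-level gradient."""
    return 2.0 * (x - 1.0)

def bilevel_descent(x=0.0, y=0.0, alpha=0.1, beta=0.1,
                    outer_iters=200, inner_iters=5):
    for _ in range(outer_iters):
        # Inner loop: approximate y*(x) with a few gradient steps on f.
        for _ in range(inner_iters):
            y -= beta * grad_f_y(x, y)
        # Outer step: update x using first-order information only.
        x -= alpha * grad_F_x(x, y)
    return x, y
```

In a stochastic setting the exact gradients above would be replaced by mini-batch estimates, and the inexact inner solve is precisely what the paper's convergence theory must account for.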

