A Penalty Based Method for Communication-Efficient Decentralized Bilevel Programming

by   Parvin Nazari, et al.

Bilevel programming has recently received attention in the literature, due to a wide range of applications, including reinforcement learning and hyper-parameter optimization. However, it is widely assumed that the underlying bilevel optimization problem is solved either by a single machine or in the case of multiple machines connected in a star-shaped network, i.e., federated learning setting. The latter approach suffers from a high communication cost on the central node (e.g., parameter server) and exhibits privacy vulnerabilities. Hence, it is of interest to develop methods that solve bilevel optimization problems in a communication-efficient decentralized manner. To that end, this paper introduces a penalty function based decentralized algorithm with theoretical guarantees for this class of optimization problems. Specifically, a distributed alternating gradient-type algorithm for solving consensus bilevel programming over a decentralized network is developed. A key feature of the proposed algorithm is to estimate the hyper-gradient of the penalty function via decentralized computation of matrix-vector products and few vector communications, which is then integrated within our alternating algorithm to give the finite-time convergence analysis under different convexity assumptions. Owing to the generality of this complexity analysis, our result yields convergence rates for a wide variety of consensus problems including minimax and compositional optimization. Empirical results on both synthetic and real datasets demonstrate that the proposed method works well in practice.


A Decentralized Adaptive Momentum Method for Solving a Class of Min-Max Optimization Problems

Min-max saddle point games have recently been intensely studied, due to ...

On Penalty-based Bilevel Gradient Descent Method

Bilevel optimization enjoys a wide range of applications in hyper-parame...

Adaptive Consensus: A network pruning approach for decentralized optimization

We consider network-based decentralized optimization problems, where eac...

DADAM: A Consensus-based Distributed Adaptive Gradient Method for Online Optimization

Adaptive gradient-based optimization methods such as ADAGRAD, RMSPROP, a...

Decentralized Stochastic Gradient Descent Ascent for Finite-Sum Minimax Problems

Minimax optimization problems have attracted significant attention in re...

DESTRESS: Computation-Optimal and Communication-Efficient Decentralized Nonconvex Finite-Sum Optimization

Emerging applications in multi-agent environments such as internet-of-th...

Communication-Efficient Zeroth-Order Distributed Online Optimization: Algorithm, Theory, and Applications

This paper focuses on a multi-agent zeroth-order online optimization pro...

Please sign up or login with your details

Forgot password? Click here to reset