Hybrid Decentralized Optimization: First- and Zeroth-Order Optimizers Can Be Jointly Leveraged For Faster Convergence

10/14/2022
by Shayan Talaei, et al.

Distributed optimization has become one of the standard ways of speeding up machine learning training, and most research in the area focuses on distributed first-order, gradient-based methods. Yet there are settings where some computationally bounded nodes cannot implement first-order, gradient-based optimization but could still contribute to a joint optimization task. In this paper, we initiate the study of hybrid decentralized optimization: settings where nodes with zeroth-order and first-order optimization capabilities co-exist in a distributed system and attempt to jointly solve an optimization task over some data distribution. We show that, under reasonable parameter settings, such a system can not only withstand noisier zeroth-order agents but can even benefit from integrating them into the optimization process rather than discarding their information. At the core of our approach is a new analysis of distributed optimization with noisy and possibly biased gradient estimators, which may be of independent interest. Experimental results on standard optimization tasks confirm our analysis, showing that hybrid first-/zeroth-order optimization can be practical.
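The abstract does not spell out the mechanics, but the core ingredients are standard: first-order nodes compute (stochastic) gradients, while zeroth-order nodes can only evaluate the loss and must approximate gradients from function values, yielding noisy and possibly biased estimators. Below is a minimal sketch, not the paper's algorithm, of this hybrid setup on a synthetic least-squares problem, using a two-point zeroth-order estimator alongside exact gradients; all specifics (the `loss` function, the 5-vs-3 node split, the smoothing radius `mu`, the step size) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(w, X, y):
    # Least-squares loss; stands in for a generic objective f(w).
    r = X @ w - y
    return 0.5 * np.mean(r ** 2)

def first_order_grad(w, X, y):
    # Exact gradient, available to first-order nodes.
    return X.T @ (X @ w - y) / len(y)

def zeroth_order_grad(w, X, y, mu=1e-4):
    # Two-point finite-difference estimator along a random Gaussian
    # direction: noisy, and biased for mu > 0, matching the abstract's
    # model of zeroth-order agents.
    u = rng.standard_normal(w.shape)
    return (loss(w + mu * u, X, y) - loss(w, X, y)) / mu * u

# Synthetic data: n i.i.d. shards of one distribution, one per node.
d, n, n_fo = 20, 8, 5          # dimension, total nodes, first-order nodes
w_star = rng.standard_normal(d)
shards = []
for _ in range(n):
    X = rng.standard_normal((200, d))
    shards.append((X, X @ w_star + 0.01 * rng.standard_normal(200)))

w, lr = np.zeros(d), 0.1
for step in range(500):
    # Every node contributes its own (exact or estimated) gradient;
    # the update averages all of them instead of ignoring ZO nodes.
    grads = [first_order_grad(w, *shards[i]) if i < n_fo
             else zeroth_order_grad(w, *shards[i]) for i in range(n)]
    w -= lr * np.mean(grads, axis=0)

print("distance to optimum:", np.linalg.norm(w - w_star))
```

A natural design question, which the paper's analysis speaks to, is how to weight the two agent types: the uniform average above is the simplest choice. Note the trade-off built into the estimator: its bias vanishes as `mu` shrinks, while its variance grows with the dimension `d`, which is why naively mixing in zeroth-order agents is not obviously helpful without an analysis of the kind the abstract describes.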


