DADAM: A Consensus-based Distributed Adaptive Gradient Method for Online Optimization

01/25/2019
by   Parvin Nazari, et al.
0

Adaptive gradient-based optimization methods such as ADAGRAD, RMSPROP, and ADAM are widely used in solving large-scale machine learning problems including deep learning. A number of schemes have been proposed in the literature aiming at parallelizing them, based on communications of peripheral nodes with a central node, but incur high communications cost. To address this issue, we develop a novel consensus-based distributed adaptive moment estimation method (DADAM) for online optimization over a decentralized network that enables data parallelization, as well as decentralized computation. The method is particularly useful, since it can accommodate settings where access to local data is allowed. Further, as established theoretically in this work, it can outperform centralized adaptive algorithms, for certain classes of loss functions used in applications. We analyze the convergence properties of the proposed algorithm and provide a dynamic regret bound on the convergence rate of adaptive moment estimation methods in both stochastic and deterministic settings. Empirical results demonstrate that DADAM works also well in practice and compares favorably to competing online optimization methods.

READ FULL TEXT
research
07/23/2019

Decentralized Stochastic First-Order Methods for Large-scale Machine Learning

Decentralized consensus-based optimization is a general computational fr...
research
10/14/2022

Hybrid Decentralized Optimization: First- and Zeroth-Order Optimizers Can Be Jointly Leveraged For Faster Convergence

Distributed optimization has become one of the standard ways of speeding...
research
11/04/2021

Finite-Time Consensus Learning for Decentralized Optimization with Nonlinear Gossiping

Distributed learning has become an integral tool for scaling up machine ...
research
06/03/2020

AdaVol: An Adaptive Recursive Volatility Prediction Method

Quasi-Maximum Likelihood (QML) procedures are theoretically appealing an...
research
11/08/2022

A Penalty Based Method for Communication-Efficient Decentralized Bilevel Programming

Bilevel programming has recently received attention in the literature, d...
research
04/04/2018

GoSGD: Distributed Optimization for Deep Learning with Gossip Exchange

We address the issue of speeding up the training of convolutional neural...
research
06/01/2019

Adaptive Online Learning for Gradient-Based Optimizers

As application demands for online convex optimization accelerate, the ne...

Please sign up or login with your details

Forgot password? Click here to reset