Escaping From Saddle Points Using Asynchronous Coordinate Gradient Descent

11/17/2022
by Marco Bornstein, et al.

Large-scale non-convex optimization problems are expensive to solve due to their computational and memory costs. To reduce these costs, first-order (computationally efficient) and asynchronous-parallel (memory efficient) algorithms are needed to minimize non-convex functions in machine learning. However, asynchronous first-order methods applied in non-convex settings face two difficulties: (i) parallelization delays, which disrupt the monotonicity of first-order methods and thereby hinder convergence, and (ii) sub-optimal saddle points, where the gradient is zero. To address these two difficulties, we propose an asynchronous coordinate gradient descent algorithm that provably converges to local minima under a bounded delay. Our algorithm overcomes the parallelization-delay issue by using a carefully constructed Hamiltonian function. We prove that our designed kinetic-energy term, incorporated within the Hamiltonian, allows our algorithm to decrease monotonically at each iteration. Next, our algorithm steers iterates clear of saddle points by utilizing a perturbation sub-routine. Like other state-of-the-art (SOTA) algorithms, we achieve a poly-logarithmic convergence rate with respect to dimension. Unlike other SOTA algorithms, which are synchronous, our work is the first to study how parallelization delays affect the convergence rate of asynchronous first-order algorithms. We prove that our algorithm outperforms its synchronous counterparts under large parallelization delays, with convergence depending sublinearly on the delays. To our knowledge, this is the first local-optima convergence result for a first-order asynchronous algorithm in non-convex settings.
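The abstract describes the algorithm only at a high level. As a rough illustration of the general template it fits into (coordinate updates computed from stale gradients under a bounded delay, plus a small random perturbation near stationary points), here is a minimal serial simulation in Python. This is a sketch under stated assumptions, not the authors' method: the Hamiltonian and kinetic-energy machinery that yields the paper's monotone-decrease guarantee is omitted, the perturbation rule follows the generic perturbed-gradient-descent template, and every name and constant below is a placeholder.

```python
import numpy as np

def delayed_perturbed_coordinate_descent(
    grad,                 # grad(x) -> full gradient of the objective (assumed callable)
    x0,                   # initial iterate
    step=1e-2,            # coordinate step size (placeholder constant)
    max_delay=5,          # bound on the parallelization delay
    perturb_radius=1e-2,  # radius of the perturbation ball
    grad_tol=1e-3,        # "near a stationary point" threshold
    iters=10_000,
    seed=0,
):
    """Illustrative serial simulation of asynchronous coordinate gradient
    descent with bounded delay and a saddle-escaping perturbation step.
    NOT the paper's algorithm; the Hamiltonian-based analysis is omitted."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    d = x.size
    history = [x.copy()]  # past iterates, used to emulate gradient staleness

    for _ in range(iters):
        # Emulate asynchrony: the worker reads a stale copy of x whose
        # age is at most `max_delay` iterations.
        delay = rng.integers(0, min(max_delay, len(history) - 1) + 1)
        x_stale = history[-1 - delay]

        g = grad(x_stale)

        if np.linalg.norm(g) < grad_tol:
            # Near a stationary (possibly saddle) point: add a small
            # uniform perturbation to help escape strict saddles.
            x += rng.uniform(-perturb_radius, perturb_radius, size=d)
        else:
            # Update one randomly chosen coordinate using the
            # (possibly stale) partial derivative.
            i = rng.integers(d)
            x[i] -= step * g[i]

        history.append(x.copy())
        if len(history) > max_delay + 1:
            history.pop(0)

    return x
```

Running this on a toy saddle, e.g. grad = lambda z: np.array([2.0 * z[0], -2.0 * z[1]]) for f(x, y) = x^2 - y^2, illustrates the role of the perturbation step: without it, an iterate started near the origin can stall at the saddle point, where the gradient vanishes.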
