Make Workers Work Harder: Decoupled Asynchronous Proximal Stochastic Gradient Descent

05/21/2016
by Yitan Li, et al.

Asynchronous parallel optimization algorithms for solving large-scale machine learning problems have recently drawn significant attention from both academia and industry. This paper proposes a novel algorithm, decoupled asynchronous proximal stochastic gradient descent (DAP-SGD), to minimize an objective function composed of the average of multiple empirical losses and a regularization term. Unlike traditional asynchronous proximal stochastic gradient descent (TAP-SGD), in which the master carries much of the computation load, the proposed algorithm off-loads the majority of computation tasks from the master to the workers and leaves the master to perform simple addition operations. This strategy yields an easy-to-parallelize algorithm whose performance is justified by theoretical convergence analyses. Specifically, DAP-SGD achieves an O(log T / T) rate when the step-size is diminishing and an ergodic O(1/√T) rate when the step-size is constant, where T is the total number of iterations.
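To picture the division of labor described in the abstract, here is a minimal, hypothetical sketch in Python/NumPy (not the paper's algorithm verbatim): an L1 regularizer with its soft-thresholding proximal operator is assumed for concreteness, and the names soft_threshold, worker_step, and master_apply are illustrative. The worker reads a (possibly stale) snapshot of the iterate, computes the stochastic gradient and the proximal update locally, and returns only an additive delta; the master merely adds that delta to the shared parameters.

import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1 (example regularizer assumed here).
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def worker_step(x_read, stochastic_grad, eta, lam):
    # Decoupled (DAP-SGD-style) worker, sketched from the abstract: the
    # stochastic gradient and the proximal map are both evaluated on the
    # worker, using a possibly stale snapshot x_read.
    g = stochastic_grad(x_read)                        # gradient of one sampled loss
    x_prox = soft_threshold(x_read - eta * g, eta * lam)
    return x_prox - x_read                             # delta sent to the master

def master_apply(x_shared, delta):
    # In the traditional (TAP-SGD-style) scheme the master would evaluate the
    # proximal operator here; in the decoupled scheme it only adds the delta.
    x_shared += delta
    return x_shared

# Toy usage: least-squares losses on random data, one asynchronous round.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((100, 10)), rng.standard_normal(100)
x = np.zeros(10)

def sgrad(x):
    i = rng.integers(len(b))
    return (A[i] @ x - b[i]) * A[i]

delta = worker_step(x.copy(), sgrad, eta=0.01, lam=0.1)
x = master_apply(x, delta)

The point of the sketch is the location of the proximal evaluation: moving it onto the worker leaves the master with a simple, cheap addition, which is what makes the scheme easy to parallelize.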


Related research

08/24/2015: Fast Asynchronous Parallel Stochastic Gradient Descent
Stochastic gradient descent (SGD) and its variants have become more and ...

10/19/2018: A Model Parallel Proximal Stochastic Gradient Algorithm for Partially Asynchronous Systems
Large models are prevalent in modern machine learning scenarios, includi...

05/17/2023: Stochastic Ratios Tracking Algorithm for Large Scale Machine Learning Problems
Many machine learning applications and tasks rely on the stochastic grad...

12/03/2015: Kalman-based Stochastic Gradient Method with Stop Condition and Insensitivity to Conditioning
Modern proximal and stochastic gradient descent (SGD) methods are believ...

01/31/2022: Lightweight Projective Derivative Codes for Compressed Asynchronous Gradient Descent
Coded distributed computation has become common practice for performing ...

01/18/2021: Screening for Sparse Online Learning
Sparsity promoting regularizers are widely used to impose low-complexity...

10/06/2018: Anytime Stochastic Gradient Descent: A Time to Hear from all the Workers
In this paper, we focus on approaches to parallelizing stochastic gradie...
