Distributed Stochastic Optimization via Adaptive Stochastic Gradient Descent

02/16/2018
by Ashok Cutkosky, et al.

Stochastic convex optimization algorithms are the most popular way to train machine learning models on large-scale data. Scaling up the training process of these models is crucial in many applications, but the most popular algorithm, Stochastic Gradient Descent (SGD), is a serial algorithm that is surprisingly hard to parallelize. In this paper, we propose an efficient distributed stochastic optimization method based on adaptive step sizes and variance reduction techniques. We achieve a linear speedup in the number of machines, a small memory footprint, and only a small number of synchronization rounds -- logarithmic in dataset size -- in which the computation nodes communicate with each other. Critically, our approach is a general reduction that parallelizes any serial SGD algorithm, allowing us to leverage the significant progress that has been made in designing adaptive SGD algorithms. We conclude by implementing our algorithm in the Spark distributed framework, exhibiting dramatic performance gains on large-scale logistic regression problems.
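The abstract does not spell out the reduction itself, but the general recipe it alludes to (variance reduction around periodically refreshed anchor points, with a pluggable serial SGD rule, and cross-machine communication only at the anchor-gradient step) can be sketched as follows. This is a minimal single-process illustration under stated assumptions, not the paper's algorithm; the names `grad_fn`, `serial_step`, `num_workers`, `rounds`, and `inner_steps` are hypothetical and chosen only for the sketch.

```python
import numpy as np

def parallel_variance_reduced_sgd(grad_fn, data, w0, serial_step,
                                  num_workers=4, rounds=10, inner_steps=100,
                                  batch_size=32, seed=0):
    """Sketch: wrap a serial SGD update rule with SVRG-style variance
    reduction so the expensive anchor gradients can be computed in parallel.

    grad_fn(w, batch): stochastic gradient of the loss at w on `batch`.
    serial_step(w, g, t): one update of any serial (possibly adaptive) SGD rule.
    """
    rng = np.random.default_rng(seed)
    shards = np.array_split(data, num_workers)   # one data shard per worker
    w_anchor = np.asarray(w0, dtype=float).copy()
    t = 0
    for _ in range(rounds):                      # few synchronization rounds
        # Map step: each "worker" evaluates its shard's gradient at the anchor;
        # averaging these would be the only cross-machine communication per round.
        anchor_grad = np.mean([grad_fn(w_anchor, s) for s in shards], axis=0)
        w = w_anchor.copy()
        for _ in range(inner_steps):             # local, serial inner loop
            batch = data[rng.integers(len(data), size=batch_size)]
            # Variance-reduced gradient estimate around the anchor point.
            g = grad_fn(w, batch) - grad_fn(w_anchor, batch) + anchor_grad
            w = serial_step(w, g, t)
            t += 1
        w_anchor = w                             # re-anchor for the next round
    return w_anchor
```

Here `serial_step` could be plain SGD (`w - eta * g`) or an adaptive rule such as AdaGrad; in a cluster setting, the anchor-gradient computation is the part a framework like Spark would distribute across executors.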

research · 10/01/2018
Optimal Adaptive and Accelerated Stochastic Gradient Descent
Stochastic gradient descent (SGD) methods are the most powerful optimiza...

research · 05/17/2023
Stochastic Ratios Tracking Algorithm for Large Scale Machine Learning Problems
Many machine learning applications and tasks rely on the stochastic grad...

research · 10/09/2017
SGD for robot motion? The effectiveness of stochastic optimization on a new benchmark for biped locomotion tasks
Trajectory optimization and posture generation are hard problems in robo...

research · 12/09/2015
Efficient Distributed SGD with Variance Reduction
Stochastic Gradient Descent (SGD) has become one of the most popular opt...

research · 05/31/2023
Toward Understanding Why Adam Converges Faster Than SGD for Transformers
While stochastic gradient descent (SGD) is still the most popular optimi...

research · 04/16/2016
DS-MLR: Exploiting Double Separability for Scaling up Distributed Multinomial Logistic Regression
Scaling multinomial logistic regression to datasets with very large numb...

research · 06/09/2021
Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms
Understanding generalization in deep learning has been one of the major ...
