99

01/27/2019
by   Konstantin Mishchenko, et al.
0

It is well known that many optimization methods, including SGD, SAGA, and Accelerated SGD for over-parameterized models, do not scale linearly in the parallel setting. In this paper, we present a new version of block coordinate descent that solves this issue for a number of methods. The core idea is to make the sampling of coordinate blocks on each parallel unit independent of the others. Surprisingly, we prove that the optimal number of blocks to be updated by each of n units in every iteration is equal to m/n, where m is the total number of blocks. As an illustration, this means that when n=100 parallel units are used, 99% of work is a waste of time. We demonstrate that with m/n blocks used by each unit the iteration complexity often remains the same. Among other applications which we mention, this fact can be exploited in the setting of distributed optimization to break the communication bottleneck. Our claims are justified by numerical experiments which demonstrate almost a perfect match with our theory on a number of datasets.

READ FULL TEXT
research
11/16/2020

Avoiding Communication in Logistic Regression

Stochastic gradient descent (SGD) is one of the most widely used optimiz...
research
12/09/2015

Efficient Distributed SGD with Variance Reduction

Stochastic Gradient Descent (SGD) has become one of the most popular opt...
research
12/20/2013

Accelerated, Parallel and Proximal Coordinate Descent

We propose a new stochastic coordinate descent method for minimizing the...
research
12/04/2012

Parallel Coordinate Descent Methods for Big Data Optimization

In this work we show that randomized (block) coordinate descent methods ...
research
12/20/2021

Distributed and Stochastic Optimization Methods with Gradient Compression and Local Steps

In this thesis, we propose new theoretical frameworks for the analysis o...
research
09/10/2019

Better Communication Complexity for Local SGD

We revisit the local Stochastic Gradient Descent (local SGD) method and ...
research
11/29/2018

Feature selection with optimal coordinate ascent (OCA)

In machine learning, Feature Selection (FS) is a major part of efficient...

Please sign up or login with your details

Forgot password? Click here to reset