Avoiding Communication in Proximal Methods for Convex Optimization Problems

10/24/2017
by   Saeed Soori, et al.
0

The fast iterative soft thresholding algorithm (FISTA) is used to solve convex regularized optimization problems in machine learning. Distributed implementations of the algorithm have become popular since they enable the analysis of large datasets. However, existing formulations of FISTA communicate data at every iteration which reduces its performance on modern distributed architectures. The communication costs of FISTA, including bandwidth and latency costs, is closely tied to the mathematical formulation of the algorithm. This work reformulates FISTA to communicate data at every k iterations and reduce data communication when operating on large data sets. We formulate the algorithm for two different optimization methods on the Lasso problem and show that the latency cost is reduced by a factor of k while bandwidth and floating-point operation costs remain the same. The convergence rates and stability properties of the reformulated algorithms are similar to the standard formulations. The performance of communication-avoiding FISTA and Proximal Newton methods is evaluated on 1 to 1024 nodes for multiple benchmarks and demonstrate average speedups of 3-10x with scaling properties that outperform the classical algorithms.

READ FULL TEXT
research
12/17/2017

Avoiding Synchronization in First-Order Methods for Sparse Convex Optimization

Parallel computing has played an important role in speeding up convex op...
research
07/18/2018

Distributed Second-order Convex Optimization

Convex optimization problems arise frequently in diverse machine learnin...
research
10/23/2017

Communication-avoiding Cholesky-QR2 for rectangular matrices

The need for scalable algorithms to solve least squares and eigenvalue p...
research
06/07/2022

Distributed Newton-Type Methods with Communication Compression and Bernoulli Aggregation

Despite their high computation and communication costs, Newton-type meth...
research
07/20/2018

signProx: One-Bit Proximal Algorithm for Nonconvex Stochastic Optimization

Stochastic gradient descent (SGD) is one of the most widely used optimiz...
research
11/16/2020

Avoiding Communication in Logistic Regression

Stochastic gradient descent (SGD) is one of the most widely used optimiz...
research
05/27/2019

Parallel and Communication Avoiding Least Angle Regression

We are interested in parallelizing the Least Angle Regression (LARS) alg...

Please sign up or login with your details

Forgot password? Click here to reset