Smoothness Matrices Beat Smoothness Constants: Better Communication Compression Techniques for Distributed Optimization

02/14/2021
by Mher Safaryan et al.

Large-scale distributed optimization has become the default tool for the training of supervised machine learning models with a large number of parameters and a large amount of training data. Recent advancements in the field provide several mechanisms for speeding up the training, including compressed communication, variance reduction and acceleration. However, none of these methods is capable of exploiting the inherently rich data-dependent smoothness structure of the local losses beyond standard smoothness constants. In this paper, we argue that when training supervised models, smoothness matrices – information-rich generalizations of the ubiquitous smoothness constants – can and should be exploited for further dramatic gains, both in theory and practice. In order to further alleviate the communication burden inherent in distributed optimization, we propose a novel communication sparsification strategy that can take full advantage of the smoothness matrices associated with local losses. To showcase the power of this tool, we describe how our sparsification technique can be adapted to three distributed optimization algorithms – DCGD, DIANA and ADIANA – yielding significant savings in terms of communication complexity. The new methods always outperform the baselines, often dramatically so.
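To make the central object concrete: in this line of work, "matrix smoothness" usually means a quadratic upper bound governed by a positive semidefinite matrix rather than by a single constant. The display below is the standard formulation and an assumption about the paper's setup, not a quote from it:

```latex
% Matrix smoothness (assumed standard definition):
% f is \mathbf{L}-smooth for a positive semidefinite matrix \mathbf{L} \succeq 0 if
\[
  f(x + h) \;\le\; f(x) + \langle \nabla f(x), h \rangle + \tfrac{1}{2}\, h^{\top} \mathbf{L}\, h
  \qquad \text{for all } x, h \in \mathbb{R}^{d},
\]
% and the classical scalar case is recovered by taking \mathbf{L} = L \mathbf{I}.
```

Under the same caveat, the sketch below illustrates how a smoothness-aware sparsifier might plug into a DCGD-style loop: each worker keeps roughly k coordinates of its local gradient, sampled with probabilities weighted by the diagonal of its (assumed known) smoothness matrix, and rescales the kept entries so the compressor stays unbiased. The sampling rule, the toy quadratic losses, and all names and parameters (`smoothness_weighted_sparsify`, `k`, `lr`, and so on) are illustrative choices, not the paper's exact operator.

```python
# Illustrative DCGD-style loop with a smoothness-weighted sparsifier.
# NOTE: a sketch under the assumptions stated above, not the paper's method.
import numpy as np

rng = np.random.default_rng(0)

def smoothness_weighted_sparsify(g, diag_L, k):
    """Keep ~k coordinates of g, sampled w.p. proportional to diag_L, rescaled to stay unbiased."""
    d = g.shape[0]
    p = np.minimum(1.0, k * diag_L / diag_L.sum())  # per-coordinate inclusion probabilities
    mask = rng.random(d) < p
    out = np.zeros_like(g)
    out[mask] = g[mask] / p[mask]                   # rescaling makes E[out] = g
    return out

# Toy problem: n workers, each holding a local quadratic f_i(x) = 0.5 x^T A_i x - b_i^T x,
# whose smoothness matrix is exactly A_i (diagonal here, purely for illustration).
n, d, k, lr, steps = 4, 20, 5, 0.05, 200
A = [np.diag(rng.uniform(0.1, 5.0, size=d)) for _ in range(n)]
b = [rng.normal(size=d) for _ in range(n)]

x = np.zeros(d)
for _ in range(steps):
    # Each worker compresses its local gradient; the server averages the messages and steps.
    msgs = [smoothness_weighted_sparsify(A[i] @ x - b[i], np.diag(A[i]), k) for i in range(n)]
    x -= lr * np.mean(msgs, axis=0)

full_grad = np.mean([A[i] @ x - b[i] for i in range(n)], axis=0)
print("norm of the full gradient after training:", np.linalg.norm(full_grad))
```

With `k = 5` out of `d = 20` coordinates, each worker transmits about a quarter of its gradient entries per round, and the unbiased rescaling is what allows such a compressor to be analyzed within the DCGD/DIANA/ADIANA framework the abstract mentions.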
