ATOMO: Communication-efficient Learning via Atomic Sparsification

06/11/2018
by Hongyi Wang, et al.

Distributed model training suffers from communication overheads due to frequent gradient updates transmitted between compute nodes. To mitigate these overheads, several studies propose the use of sparsified stochastic gradients. We argue that these methods are facets of a general sparsification method that can operate on any atomic decomposition. Notable examples include element-wise, singular value, and Fourier decompositions. We present ATOMO, a general framework for atomic sparsification of stochastic gradients. Given a gradient, an atomic decomposition, and a sparsity budget, ATOMO produces a random, unbiased sparsification of the atoms that minimizes variance. We show that methods such as QSGD and TernGrad are special cases of ATOMO, and that sparsifying gradients in their singular value decomposition (SVD), rather than in the coordinate-wise one, can lead to significantly faster distributed training.
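To make the idea of unbiased atomic sparsification concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation. It assumes that each atom i with coefficient c_i is kept independently with probability p_i and rescaled by 1/p_i (which keeps the estimate unbiased), and that the p_i are chosen to minimize the sum of c_i^2 / p_i subject to the p_i summing to at most the budget, solved here by a simple capping loop. The function names `sparsification_probs` and `atomic_sparsify` are hypothetical, chosen for illustration.

```python
import numpy as np

def sparsification_probs(coeffs, budget):
    """Keep-probabilities p_i in [0, 1] with sum(p_i) <= budget.

    Assumption: p_i proportional to |c_i|, capped at 1, with the leftover
    budget redistributed over the uncapped atoms.
    """
    mags = np.abs(np.asarray(coeffs, dtype=float))
    probs = np.zeros_like(mags)
    active = mags > 0            # zero atoms never need to be sent
    remaining = float(budget)
    while active.any() and remaining > 0:
        scaled = remaining * mags / mags[active].sum()
        capped = active & (scaled >= 1.0)
        if not capped.any():
            probs[active] = scaled[active]
            break
        probs[capped] = 1.0      # large atoms are always kept
        remaining -= capped.sum()
        active &= ~capped
    return probs

def atomic_sparsify(coeffs, budget, rng):
    """Randomly zero out atoms; survivors are rescaled by 1 / p_i."""
    coeffs = np.asarray(coeffs, dtype=float)
    probs = sparsification_probs(coeffs, budget)
    keep = rng.random(coeffs.shape) < probs
    out = np.zeros_like(coeffs)
    out[keep] = coeffs[keep] / probs[keep]
    return out

# Singular value atoms: sparsify the spectrum of a toy gradient matrix.
rng = np.random.default_rng(0)
grad = rng.standard_normal((6, 4))
U, s, Vt = np.linalg.svd(grad, full_matrices=False)
s_sparse = atomic_sparsify(s, budget=2, rng=rng)
grad_svd_sparse = (U * s_sparse) @ Vt

# Element-wise atoms: the coefficients are the matrix entries themselves.
grad_entry_sparse = atomic_sparsify(grad.ravel(), budget=8, rng=rng).reshape(grad.shape)
```

Only the choice of atoms differs between the two calls. In the singular value case a worker would transmit the surviving singular values together with their singular vectors, while in the element-wise case it would transmit the surviving entries and their indices.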
