
Critical Parameters for Scalable Distributed Learning with Large Batches and Asynchronous Updates
It has been experimentally observed that the efficiency of distributed t...

Consensus Control for Decentralized Deep Learning
Decentralized training of deep learning models enables on-device learnin...

Quasi-Global Momentum: Accelerating Decentralized Deep Learning on Heterogeneous Data
Decentralized training of deep learning models is a key element for enab...

A Linearly Convergent Algorithm for Decentralized Optimization: Sending Less Bits for Free!
Decentralized optimization methods enable on-device training of machine ...

On Communication Compression for Distributed Optimization on Heterogeneous Data
Lossy gradient compression, with either unbiased or biased compressors, ...

Mime: Mimicking Centralized Stochastic Algorithms in Federated Learning
Federated learning is a challenging optimization problem due to the hete...

Analysis of SGD with Biased Gradient Estimators
We analyze the complexity of biased stochastic gradient methods (SGD), w...

Dynamic Model Pruning with Feedback
Deep neural networks often have millions of parameters. This can hinder ...

Ensemble Distillation for Robust Model Fusion in Federated Learning
Federated Learning (FL) is a machine learning setting where many devices...

Extrapolation for Large-batch Training in Deep Learning
Deep learning networks are typically trained by Stochastic Gradient Desc...

A Unified Theory of Decentralized SGD with Changing Topology and Local Updates
Decentralized stochastic optimization methods have gained a lot of atten...

Is Local SGD Better than Minibatch SGD?
We study local SGD (also known as parallel SGD and federated averaging),...

Advances and Open Problems in Federated Learning
Federated learning (FL) is a machine learning setting where many clients...

SCAFFOLD: Stochastic Controlled Averaging for On-Device Federated Learning
Federated learning is a key scenario in modern large-scale machine learn...

The Error-Feedback Framework: Better Rates for SGD with Delayed Gradients and Compressed Communication
We analyze (stochastic) gradient descent (SGD) with delayed updates on s...

Decentralized Deep Learning with Arbitrary Communication Compression
Decentralized training of deep learning models is a key element for enab...

Unified Optimal Analysis of the (Stochastic) Gradient Method
In this note we give a simple proof for the convergence of stochastic gr...

Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication
We consider decentralized stochastic optimization with the objective fun...

Error Feedback Fixes SignSGD and other Gradient Compression Schemes
Sign-based algorithms (e.g. signSGD) have been proposed as a biased grad...

Efficient Greedy Coordinate Descent for Composite Problems
Coordinate descent with random coordinate selection is the current state...

Sparsified SGD with Memory
Huge scale machine learning problems are nowadays tackled by distributed...

Don't Use Large Mini-Batches, Use Local SGD
Minibatch stochastic gradient methods are the current state of the art ...

Global linear convergence of Newton's method without strong-convexity or Lipschitz gradients
We show that Newton's method converges globally at a linear rate for obj...

Local SGD Converges Fast and Communicates Little
Minibatch stochastic gradient descent (SGD) is the state of the art in ...

SVRG meets SAGA: k-SVRG - A Tale of Limited Memory
In recent years, many variance reduced algorithms for empirical risk min...

Revisiting First-Order Convex Optimization Over Linear Spaces
Two popular examples of first-order optimization methods over linear spa...
Sebastian U. Stich