
Random Reshuffling with Variance Reduction: New Analysis and Better Rates
Virtually all state-of-the-art methods for training supervised machine l...

ZeroSARAH: Efficient Nonconvex Finite-Sum Optimization with Zero Full Gradient Computation
We propose ZeroSARAH – a novel variant of the variance-reduced method SA...

Hyperparameter Transfer Learning with Adaptive Complexity
Bayesian optimization (BO) is a sample efficient approach to automatical...

AI-SARAH: Adaptive and Implicit Stochastic Recursive Gradient Methods
We present an adaptive stochastic variance reduced method with an implic...

ADOM: Accelerated Decentralized Optimization Method for Time-Varying Networks
We propose ADOM – an accelerated method for smooth and strongly convex d...

IntSGD: Floatless Compression of Stochastic Gradients
We propose a family of lossy integer compressions for Stochastic Gradien...

MARINA: Faster Non-Convex Distributed Learning with Compression
We develop and analyze MARINA: a new communication efficient method for ...

Smoothness Matrices Beat Smoothness Constants: Better Communication Compression Techniques for Distributed Optimization
Large scale distributed optimization has become the default tool for the...

Distributed Second Order Methods with Fast Rates and Compressed Communication
We develop several new communication-efficient second-order methods for ...

Proximal and Federated Random Reshuffling
Random Reshuffling (RR), also known as Stochastic Gradient Descent (SGD)...

A Linearly Convergent Algorithm for Decentralized Optimization: Sending Less Bits for Free!
Decentralized optimization methods enable on-device training of machine ...

Local SGD: Unified Theory and New Efficient Methods
We present a unified framework for analyzing local SGD methods in the co...

Optimal Client Sampling for Federated Learning
It is well understood that client-master communication can be a primary ...

Linearly Converging Error Compensated SGD
In this paper, we propose a unified analysis of variants of distributed ...

Optimal Gradient Compression for Distributed and Federated Learning
Communicating information, like gradient vectors, between computing node...

Lower Bounds and Optimal Algorithms for Personalized Federated Learning
In this work, we consider the optimization formulation of personalized f...

Distributed Proximal Splitting Algorithms with Rates and Acceleration
We analyze several generic proximal splitting algorithms well suited for...

Variance-Reduced Methods for Machine Learning
Stochastic optimization lies at the heart of machine learning, and its c...

PAGE: A Simple and Optimal Probabilistic Gradient Estimator for Nonconvex Optimization
In this paper, we propose a novel stochastic gradient estimator—ProbAbil...

Optimal and Practical Algorithms for Smooth and Strongly Convex Decentralized Optimization
We consider the task of decentralized minimization of the sum of smooth ...

Unified Analysis of Stochastic Gradient Methods for Composite Convex and Smooth Optimization
We present a unified theorem for the convergence analysis of stochastic ...

A Better Alternative to Error Feedback for Communication-Efficient Distributed Learning
Modern large-scale machine learning applications require stochastic opti...

Primal Dual Interpretation of the Proximal Stochastic Gradient Langevin Algorithm
We consider the task of sampling with respect to a log-concave probabili...

A Unified Analysis of Stochastic Gradient Methods for Nonconvex Federated Optimization
In this paper, we study the performance of a large family of SGD variant...

Random Reshuffling: Simple Analysis with Vast Improvements
Random Reshuffling (RR) is an algorithm for minimizing finite-sum functi...

Adaptive Learning of the Optimal Mini-Batch Size of SGD
Recent advances in the theoretical understanding of SGD (Qian et al., 201...

On the Convergence Analysis of Asynchronous SGD for Solving Consistent Linear Systems
In the realm of big data and machine learning, data-parallel, distribute...

Dualize, Split, Randomize: Fast Nonsmooth Optimization Algorithms
We introduce a new primal-dual algorithm for minimizing the sum of three...

From Local SGD to Local Fixed Point Methods for Federated Learning
Most algorithms for solving optimization problems or finding saddle poin...

On Biased Compression for Distributed Learning
In the last few years, various communication compression techniques have...

Acceleration for Compressed Gradient Descent in Distributed and Federated Optimization
Due to the high communication cost in distributed and federated learning...

Fast Linear Convergence of Randomized BFGS
Since the late 1950s, when quasi-Newton methods first appeared, they hav...

Stochastic Subspace Cubic Newton Method
In this paper, we propose a new randomized second-order optimization alg...

Uncertainty Principle for Communication Compression in Distributed and Federated Learning and the Search for an Optimal Compressor
In order to mitigate the high communication cost in distributed and fede...

Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization
Adaptivity is an important yet under-studied property in modern optimiza...

Variance Reduced Coordinate Descent with Acceleration: New Method With a Surprising Application to Finite-Sum Problems
We propose an accelerated version of stochastic variance reduced coordin...

Federated Learning of a Mixture of Global and Local Models
We propose a new optimization formulation for training federated learnin...

Better Theory for SGD in the Nonconvex World
Large-scale nonconvex optimization problems are ubiquitous in modern mac...

Distributed Fixed Point Methods with Compressed Iterates
We propose basic and natural assumptions under which iterative optimizat...

Stochastic Newton and Cubic Newton Methods with Simple Local Linear-Quadratic Rates
We present two new remarkably simple stochastic second-order methods for...

Better Communication Complexity for Local SGD
We revisit the local Stochastic Gradient Descent (local SGD) method and ...

Gradient Descent with Compressed Iterates
We propose and analyze a new type of stochastic first order method: grad...

First Analysis of Local GD on Heterogeneous Data
We provide the first convergence analysis of local gradient descent for ...

Stochastic Convolutional Sparse Coding
State-of-the-art methods for Convolutional Sparse Coding usually employ ...

Stochastic Proximal Langevin Algorithm: Potential Splitting and Nonasymptotic Rates
We propose a new algorithm – the Stochastic Proximal Langevin Algorithm (SPL...

Direct Nonlinear Acceleration
Optimization acceleration techniques such as momentum play a key role in...

Revisiting Stochastic Extragradient
We consider a new extension of the extragradient method that is motivate...

One Method to Rule Them All: Variance Reduction for Data, Parameters and Many New Methods
We propose a remarkably general variance-reduced method suitable for sol...

A Unified Theory of SGD: Variance Reduction, Sampling, Quantization and Coordinate Descent
In this paper we introduce a unified analysis of a large family of varia...

Natural Compression for Distributed Deep Learning
Due to their hunger for big data, modern deep learning models are traine...