
- Random Reshuffling with Variance Reduction: New Analysis and Better Rates
  Virtually all state-of-the-art methods for training supervised machine learning...
- ZeroSARAH: Efficient Nonconvex Finite-Sum Optimization with Zero Full Gradient Computation
  We propose ZeroSARAH – a novel variant of the variance-reduced method SARAH...
- Hyperparameter Transfer Learning with Adaptive Complexity
  Bayesian optimization (BO) is a sample efficient approach to automatically...
- AI-SARAH: Adaptive and Implicit Stochastic Recursive Gradient Methods
  We present an adaptive stochastic variance reduced method with an implicit...
- ADOM: Accelerated Decentralized Optimization Method for Time-Varying Networks
  We propose ADOM - an accelerated method for smooth and strongly convex decentralized...
- IntSGD: Floatless Compression of Stochastic Gradients
  We propose a family of lossy integer compressions for Stochastic Gradient...
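Since the IntSGD entry above centers on integer compression of stochastic gradients, here is a minimal sketch of one generic way to do unbiased stochastic rounding to integers after scaling. The scaling factor `alpha` and the rounding rule are illustrative assumptions, not the paper's actual scheme.

```python
import numpy as np

def int_round_unbiased(v, alpha, rng):
    """Stochastically round alpha * v to integers so that, in expectation,
    decoding recovers v (illustrative sketch only, not IntSGD's exact rule)."""
    x = alpha * v
    low = np.floor(x)
    up = rng.random(x.shape) < (x - low)   # round up w.p. the fractional part
    return (low + up).astype(np.int64)

def decode(ints, alpha):
    """Undo the scaling on the receiving side."""
    return ints.astype(np.float64) / alpha

rng = np.random.default_rng(0)
g = rng.normal(size=5)                      # a stochastic gradient
q = int_round_unbiased(g, alpha=1024.0, rng=rng)
print(q)                                    # small integers instead of floats
print(decode(q, alpha=1024.0))              # close to g, and unbiased in expectation
```

The point of such schemes is that integers are cheaper to communicate and sum than floats, while unbiasedness keeps the usual SGD analysis applicable.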
- MARINA: Faster Non-Convex Distributed Learning with Compression
  We develop and analyze MARINA: a new communication efficient method for ...
- Smoothness Matrices Beat Smoothness Constants: Better Communication Compression Techniques for Distributed Optimization
  Large scale distributed optimization has become the default tool for the...
- Distributed Second Order Methods with Fast Rates and Compressed Communication
  We develop several new communication-efficient second-order methods for ...
- Proximal and Federated Random Reshuffling
  Random Reshuffling (RR), also known as Stochastic Gradient Descent (SGD)...
- A Linearly Convergent Algorithm for Decentralized Optimization: Sending Less Bits for Free!
  Decentralized optimization methods enable on-device training of machine learning...
- Local SGD: Unified Theory and New Efficient Methods
  We present a unified framework for analyzing local SGD methods in the co...
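To make the local SGD entry above concrete, here is a minimal sketch of the generic local SGD template: each worker takes several local gradient steps and the models are averaged once per communication round. The toy least-squares data, the synchronization period H, and the use of full local gradients in place of stochastic ones are simplifying assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_workers, d, H, lr, rounds = 4, 10, 5, 0.1, 20

# Each worker m holds its own least-squares objective f_m(x) = 0.5*||A_m x - b_m||^2 / n_m.
A = rng.normal(size=(n_workers, 30, d))
b = rng.normal(size=(n_workers, 30))

def local_grad(m, x):
    return A[m].T @ (A[m] @ x - b[m]) / A[m].shape[0]

x = np.zeros(d)                          # shared model
for _ in range(rounds):                  # communication rounds
    local = np.tile(x, (n_workers, 1))   # broadcast the current model to all workers
    for m in range(n_workers):
        for _ in range(H):               # H local steps between communications
            local[m] -= lr * local_grad(m, local[m])
    x = local.mean(axis=0)               # averaging = one round of communication
print(np.linalg.norm(sum(local_grad(m, x) for m in range(n_workers)) / n_workers))
```

The appeal is communication efficiency: only one vector exchange per H local gradient steps.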
- Optimal Client Sampling for Federated Learning
  It is well understood that client-master communication can be a primary ...
- Linearly Converging Error Compensated SGD
  In this paper, we propose a unified analysis of variants of distributed ...
- Optimal Gradient Compression for Distributed and Federated Learning
  Communicating information, like gradient vectors, between computing nodes...
- Lower Bounds and Optimal Algorithms for Personalized Federated Learning
  In this work, we consider the optimization formulation of personalized federated...
- Distributed Proximal Splitting Algorithms with Rates and Acceleration
  We analyze several generic proximal splitting algorithms well suited for...
- Variance-Reduced Methods for Machine Learning
  Stochastic optimization lies at the heart of machine learning, and its c...
- PAGE: A Simple and Optimal Probabilistic Gradient Estimator for Nonconvex Optimization
  In this paper, we propose a novel stochastic gradient estimator—ProbAbilistic...
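A hedged sketch of the probabilistic gradient estimator idea behind the PAGE entry above: with a small probability take a fresh full-batch gradient, otherwise reuse the previous estimator corrected by a minibatch gradient difference. The toy problem, step size, minibatch size, and switching probability below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def grad(i, x):                      # gradient of f_i(x) = 0.5*(a_i^T x - b_i)^2
    return A[i] * (A[i] @ x - b[i])

def full_grad(x):
    return A.T @ (A @ x - b) / n

x, lr, p, batch = np.zeros(d), 0.05, 0.1, 8
g = full_grad(x)                     # initialize the estimator with a full gradient
for _ in range(500):
    x_new = x - lr * g
    if rng.random() < p:             # with probability p: fresh full-batch gradient
        g = full_grad(x_new)
    else:                            # otherwise: cheap minibatch correction of the old estimator
        idx = rng.choice(n, size=batch, replace=False)
        g = g + sum(grad(i, x_new) - grad(i, x) for i in idx) / batch
    x = x_new

print("grad norm at start:", np.linalg.norm(full_grad(np.zeros(d))))
print("grad norm at end:  ", np.linalg.norm(full_grad(x)))   # should have decreased substantially
```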
- Optimal and Practical Algorithms for Smooth and Strongly Convex Decentralized Optimization
  We consider the task of decentralized minimization of the sum of smooth ...
- Unified Analysis of Stochastic Gradient Methods for Composite Convex and Smooth Optimization
  We present a unified theorem for the convergence analysis of stochastic ...
- A Better Alternative to Error Feedback for Communication-Efficient Distributed Learning
  Modern large-scale machine learning applications require stochastic optimization...
- Primal Dual Interpretation of the Proximal Stochastic Gradient Langevin Algorithm
  We consider the task of sampling with respect to a log concave probability...
- A Unified Analysis of Stochastic Gradient Methods for Nonconvex Federated Optimization
  In this paper, we study the performance of a large family of SGD variants...
- Random Reshuffling: Simple Analysis with Vast Improvements
  Random Reshuffling (RR) is an algorithm for minimizing finite-sum functions...
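The Random Reshuffling entry above refers to the classic without-replacement variant of SGD: sample a fresh permutation of the data once per epoch and take one gradient step per data point in that order. A minimal sketch on a toy least-squares problem (the data and step size are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 5
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def grad(i, x):                      # gradient of f_i(x) = 0.5*(a_i^T x - b_i)^2
    return A[i] * (A[i] @ x - b[i])

x, lr = np.zeros(d), 0.01
for epoch in range(50):
    perm = rng.permutation(n)        # reshuffle once per epoch
    for i in perm:                   # one full pass through the data, without replacement
        x -= lr * grad(i, x)

# With a constant step size, RR settles in a neighborhood of the minimizer
# whose size shrinks with the step size.
print(np.linalg.norm(A.T @ (A @ x - b) / n))
```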
- Adaptive Learning of the Optimal Mini-Batch Size of SGD
  Recent advances in the theoretical understanding of SGD (Qian et al., 201...
- On the Convergence Analysis of Asynchronous SGD for Solving Consistent Linear Systems
  In the realm of big data and machine learning, data-parallel, distributed...
- Dualize, Split, Randomize: Fast Nonsmooth Optimization Algorithms
  We introduce a new primal-dual algorithm for minimizing the sum of three...
- From Local SGD to Local Fixed Point Methods for Federated Learning
  Most algorithms for solving optimization problems or finding saddle points...
- On Biased Compression for Distributed Learning
  In the last few years, various communication compression techniques have...
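The biased-compression entry above concerns compressors whose output is not an unbiased estimate of their input; greedy Top-k sparsification is a standard example of such an operator. A minimal sketch, not tied to the specific compressors studied in the paper:

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude coordinates of v and zero out the rest.
    This compressor is biased: E[top_k(v)] != v in general."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

v = np.array([0.1, -3.0, 0.5, 2.0, -0.2])
print(top_k(v, k=2))   # only the two largest-magnitude entries survive
```

Biased operators like this are attractive because they can be far more aggressive than unbiased ones, but they require a different style of analysis (and, in practice, are often paired with error compensation).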
- Acceleration for Compressed Gradient Descent in Distributed and Federated Optimization
  Due to the high communication cost in distributed and federated learning...
- Fast Linear Convergence of Randomized BFGS
  Since the late 1950s when quasi-Newton methods first appeared, they have...
- Stochastic Subspace Cubic Newton Method
  In this paper, we propose a new randomized second-order optimization algorithm...
- Uncertainty Principle for Communication Compression in Distributed and Federated Learning and the Search for an Optimal Compressor
  In order to mitigate the high communication cost in distributed and federated...
- Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization
  Adaptivity is an important yet under-studied property in modern optimization...
- Variance Reduced Coordinate Descent with Acceleration: New Method With a Surprising Application to Finite-Sum Problems
  We propose an accelerated version of stochastic variance reduced coordinate...
- Federated Learning of a Mixture of Global and Local Models
  We propose a new optimization formulation for training federated learning...
- Better Theory for SGD in the Nonconvex World
  Large-scale nonconvex optimization problems are ubiquitous in modern machine...
- Distributed Fixed Point Methods with Compressed Iterates
  We propose basic and natural assumptions under which iterative optimization...
- Stochastic Newton and Cubic Newton Methods with Simple Local Linear-Quadratic Rates
  We present two new remarkably simple stochastic second-order methods for...
- Better Communication Complexity for Local SGD
  We revisit the local Stochastic Gradient Descent (local SGD) method and ...
- Gradient Descent with Compressed Iterates
  We propose and analyze a new type of stochastic first order method: gradient...
- First Analysis of Local GD on Heterogeneous Data
  We provide the first convergence analysis of local gradient descent for ...
- Stochastic Convolutional Sparse Coding
  State-of-the-art methods for Convolutional Sparse Coding usually employ ...
- Stochastic Proximal Langevin Algorithm: Potential Splitting and Nonasymptotic Rates
  We propose a new algorithm---Stochastic Proximal Langevin Algorithm (SPLA)...
- Direct Nonlinear Acceleration
  Optimization acceleration techniques such as momentum play a key role in...
- Revisiting Stochastic Extragradient
  We consider a new extension of the extragradient method that is motivated...
- One Method to Rule Them All: Variance Reduction for Data, Parameters and Many New Methods
  We propose a remarkably general variance-reduced method suitable for solving...
- A Unified Theory of SGD: Variance Reduction, Sampling, Quantization and Coordinate Descent
  In this paper we introduce a unified analysis of a large family of variants...
- Natural Compression for Distributed Deep Learning
  Due to their hunger for big data, modern deep learning models are trained...
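The Natural Compression entry above is about cheap lossy compression of gradients for distributed training. As a hedged illustration of one such scheme, the sketch below performs unbiased randomized rounding of each coordinate to a signed power of two; the handling of signs, zeros, and mantissa bits here is a simplification for illustration, not necessarily the paper's exact definition.

```python
import numpy as np

def round_to_power_of_two(v, rng):
    """Unbiased randomized rounding of each entry to a signed power of two
    (illustrative sketch; zero entries are passed through unchanged)."""
    out = np.zeros_like(v)
    nz = v != 0
    mag = np.abs(v[nz])
    low = 2.0 ** np.floor(np.log2(mag))          # nearest power of two from below
    up = rng.random(mag.shape) < (mag - low) / low   # round up to 2*low w.p. (mag-low)/low
    out[nz] = np.sign(v[nz]) * np.where(up, 2.0 * low, low)
    return out

rng = np.random.default_rng(0)
g = np.array([0.3, -1.7, 5.0, 0.0])
avg = np.mean([round_to_power_of_two(g, rng) for _ in range(20000)], axis=0)
print(avg)   # close to [0.3, -1.7, 5.0, 0.0] on average, illustrating unbiasedness
```

Rounding to powers of two means each coordinate can be transmitted with only a sign and an exponent, which is the source of the bandwidth savings such schemes target.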