
IntSGD: Floatless Compression of Stochastic Gradients
We propose a family of lossy integer compressions for Stochastic Gradien...
Proximal and Federated Random Reshuffling
Random Reshuffling (RR), also known as Stochastic Gradient Descent (SGD)...
Random Reshuffling: Simple Analysis with Vast Improvements
Random Reshuffling (RR) is an algorithm for minimizing finite-sum functi...
Dualize, Split, Randomize: Fast Nonsmooth Optimization Algorithms
We introduce a new primal-dual algorithm for minimizing the sum of three...
Stochastic Newton and Cubic Newton Methods with Simple Local Linear-Quadratic Rates
We present two new remarkably simple stochastic second-order methods for...
Adaptive gradient descent without descent
We present a strikingly simple proof that two rules are sufficient to au...
Sinkhorn Algorithm as a Special Case of Stochastic Mirror Descent
We present a new perspective on the celebrated Sinkhorn algorithm by sho...
Better Communication Complexity for Local SGD
We revisit the local Stochastic Gradient Descent (local SGD) method and ...
First Analysis of Local GD on Heterogeneous Data
We provide the first convergence analysis of local gradient descent for ...
A Self-supervised Approach to Hierarchical Forecasting with Applications to Groupwise Synthetic Controls
When forecasting time series with a hierarchical structure, the existing...
Revisiting Stochastic Extragradient
We consider a new extension of the extragradient method that is motivate...
It is well known that many optimization methods, including SGD, SAGA, an...
Distributed Learning with Compressed Gradient Differences
Training very large machine learning models requires a distributed compu...
SEGA: Variance Reduction via Gradient Sketching
We propose a randomized first-order optimization method, SEGA (SkEtched ...
A Distributed Flexible Delay-tolerant Proximal Gradient Algorithm
We develop and analyze an asynchronous algorithm for distributed convex ...
