
IntSGD: Floatless Compression of Stochastic Gradients
We propose a family of lossy integer compressions for Stochastic Gradien...

Proximal and Federated Random Reshuffling
Random Reshuffling (RR), also known as Stochastic Gradient Descent (SGD)...

Random Reshuffling: Simple Analysis with Vast Improvements
Random Reshuffling (RR) is an algorithm for minimizing finite-sum functi...

Dualize, Split, Randomize: Fast Nonsmooth Optimization Algorithms
We introduce a new primal-dual algorithm for minimizing the sum of three...

Stochastic Newton and Cubic Newton Methods with Simple Local Linear-Quadratic Rates
We present two new remarkably simple stochastic second-order methods for...

Adaptive gradient descent without descent
We present a strikingly simple proof that two rules are sufficient to au...

Sinkhorn Algorithm as a Special Case of Stochastic Mirror Descent
We present a new perspective on the celebrated Sinkhorn algorithm by sho...

Better Communication Complexity for Local SGD
We revisit the local Stochastic Gradient Descent (local SGD) method and ...

First Analysis of Local GD on Heterogeneous Data
We provide the first convergence analysis of local gradient descent for ...

A Self-supervised Approach to Hierarchical Forecasting with Applications to Groupwise Synthetic Controls
When forecasting time series with a hierarchical structure, the existing...

Revisiting Stochastic Extragradient
We consider a new extension of the extragradient method that is motivate...

99
It is well known that many optimization methods, including SGD, SAGA, an...

Distributed Learning with Compressed Gradient Differences
Training very large machine learning models requires a distributed compu...

SEGA: Variance Reduction via Gradient Sketching
We propose a randomized first-order optimization method, SEGA (SkEtched ...

A Distributed Flexible Delay-tolerant Proximal Gradient Algorithm
We develop and analyze an asynchronous algorithm for distributed convex ...
Konstantin Mishchenko
verified profile