Konstantin Mishchenko

research

∙ 08/04/2023

Adaptive Proximal Gradient Method for Convex Optimization

In this paper, we explore two fundamental first-order algorithms in conv...

0 Yura Malitsky, et al. ∙

research

∙ 05/29/2023

Partially Personalized Federated Learning: Breaking the Curse of Data Heterogeneity

We present a partially personalized formulation of Federated Learning (F...

0 Konstantin Mishchenko, et al. ∙

research

∙ 05/25/2023

DoWG Unleashed: An Efficient Universal Parameter-Free Gradient Descent Method

This paper proposes a new easy-to-implement parameter-free gradient-base...

0 Ahmed Khaled, et al. ∙

research

∙ 02/07/2023

Two Losses Are Better Than One: Faster Optimization Using a Cheaper Proxy

We present an algorithm for minimizing an objective with hard-to-compute...

0 Blake Woodworth, et al. ∙

research

∙ 01/17/2023

Convergence of First-Order Algorithms for Meta-Learning with Moreau Envelopes

In this work, we consider the problem of minimizing the sum of Moreau en...

0 Konstantin Mishchenko, et al. ∙

research

∙ 08/11/2022

Super-Universal Regularized Newton Method

We analyze the performance of a variant of Newton method with quadratic ...

0 Nikita Doikov, et al. ∙

research

∙ 08/10/2022

Adaptive Learning Rates for Faster Stochastic Gradient Methods

In this work, we propose new adaptive step size strategies that improve ...

7 Samuel Horvath, et al. ∙

research

∙ 06/15/2022

Asynchronous SGD Beats Minibatch SGD Under Arbitrary Delays

The existing analysis of asynchronous stochastic gradient descent (SGD) ...

0 Konstantin Mishchenko, et al. ∙

research

∙ 02/18/2022

ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally!

We introduce ProxSkip – a surprisingly simple and provably efficient met...

5 Konstantin Mishchenko, et al. ∙

research

∙ 01/26/2022

Server-Side Stepsizes and Sampling Without Replacement Provably Help in Federated Optimization

We present a theoretical study of server-side optimization in federated ...

5 Grigory Malinovsky, et al. ∙

research

∙ 12/03/2021

Regularized Newton Method with Global O(1/k^2) Convergence

We present a Newton-type method that converges fast from any initializat...

0 Konstantin Mishchenko, et al. ∙

research

∙ 02/16/2021

IntSGD: Floatless Compression of Stochastic Gradients

We propose a family of lossy integer compressions for Stochastic Gradien...

17 Konstantin Mishchenko, et al. ∙

research

∙ 02/12/2021

Proximal and Federated Random Reshuffling

Random Reshuffling (RR), also known as Stochastic Gradient Descent (SGD)...

7 Konstantin Mishchenko, et al. ∙

research

∙ 06/10/2020

Random Reshuffling: Simple Analysis with Vast Improvements

Random Reshuffling (RR) is an algorithm for minimizing finite-sum functi...

17 Konstantin Mishchenko, et al. ∙

research

∙ 04/03/2020

Dualize, Split, Randomize: Fast Nonsmooth Optimization Algorithms

We introduce a new primal-dual algorithm for minimizing the sum of three...

8 Adil Salim, et al. ∙

research

∙ 12/03/2019

Stochastic Newton and Cubic Newton Methods with Simple Local Linear-Quadratic Rates

We present two new remarkably simple stochastic second-order methods for...

13 Dmitry Kovalev, et al. ∙

research

∙ 10/21/2019

Adaptive gradient descent without descent

We present a strikingly simple proof that two rules are sufficient to au...

0 Yura Malitsky, et al. ∙

research

∙ 09/16/2019

Sinkhorn Algorithm as a Special Case of Stochastic Mirror Descent

We present a new perspective on the celebrated Sinkhorn algorithm by sho...

0 Konstantin Mishchenko, et al. ∙

research

∙ 09/10/2019

Better Communication Complexity for Local SGD

We revisit the local Stochastic Gradient Descent (local SGD) method and ...

0 Ahmed Khaled, et al. ∙

research

∙ 09/10/2019

First Analysis of Local GD on Heterogeneous Data

We provide the first convergence analysis of local gradient descent for ...

32 Ahmed Khaled, et al. ∙

research

∙ 06/25/2019

A Self-supervised Approach to Hierarchical Forecasting with Applications to Groupwise Synthetic Controls

When forecasting time series with a hierarchical structure, the existing...

0 Konstantin Mishchenko, et al. ∙

research

∙ 05/27/2019

Revisiting Stochastic Extragradient

We consider a new extension of the extragradient method that is motivate...

0 Konstantin Mishchenko, et al. ∙

research

∙ 01/27/2019

99

It is well known that many optimization methods, including SGD, SAGA, an...

0 Konstantin Mishchenko, et al. ∙

research

∙ 01/26/2019

Distributed Learning with Compressed Gradient Differences

Training very large machine learning models requires a distributed compu...

0 Konstantin Mishchenko, et al. ∙

research

∙ 09/09/2018

SEGA: Variance Reduction via Gradient Sketching

We propose a randomized first order optimization method--SEGA (SkEtched ...

0 Filip Hanzely, et al. ∙

research

∙ 06/25/2018

A Distributed Flexible Delay-tolerant Proximal Gradient Algorithm

We develop and analyze an asynchronous algorithm for distributed convex ...

0 Konstantin Mishchenko, et al. ∙
