Peter Richtarik
Associate Professor
Modern advancements in large-scale machine learning would be impossible...
We propose a Randomized Progressive Training algorithm (RPT) – a stochas...
Federated Learning is a collaborative training framework that leverages...
Motivated by the increasing popularity and importance of large-scale tra...
Efficient distributed training is a principal driver of recent advances ...
Stochastic Gradient Descent (SGD) is arguably the most important single...
Due to the high communication overhead when training machine learning mo...
Federated sampling algorithms have recently gained great popularity in t...
In federated learning, a large number of users are involved in a global...
In this work, we consider the problem of minimizing the sum of Moreau envelopes...
The celebrated FedAvg algorithm of McMahan et al. (2017) is based on thr...
We propose Adaptive Compressed Gradient Descent (AdaCGD) - a novel optimization...
In this work, we study distributed optimization algorithms that reduce t...
In the modern paradigm of federated learning, a large number of users ar...
Stein Variational Gradient Descent (SVGD) is a popular sampling algorith...
The starting point of this paper is the discovery of a novel and simple...
In this paper, we propose a new zero order optimization method called mi...
In this work, we propose new adaptive step size strategies that improve...
Inspired by a recent breakthrough of Mishchenko et al (2022), who for th...
Communication is one of the key bottlenecks in the distributed training ...
In this note, we establish a descent lemma for the population limit Mirr...
Despite their high computation and communication costs, Newton-type meth...
Federated learning has recently gained significant attention and popular...
Federated learning uses a set of techniques to efficiently distribute th...
Byzantine-robustness has been gaining a lot of attention due to the grow...
Stein Variational Gradient Descent (SVGD) is an important alternative to...
We present a new method that includes three key components of distribute...
Random Reshuffling (RR), which is a variant of Stochastic Gradient Desce...
The practice of applying several local updates before aggregation across...
We introduce ProxSkip – a surprisingly simple and provably efficient method...
Federated Learning (FL) has emerged as a promising technique for edge de...
We develop and analyze DASHA: a new family of methods for nonconvex distributed...
We propose and study a new class of gradient communication mechanisms fo...
Communication efficiency has been widely recognized as the bottleneck fo...
We present a theoretical study of server-side optimization in federated learning...
Due to the communication bottleneck in distributed and federated learnin...
Federated Learning (FL) is an increasingly popular machine learning para...
Recent advances in distributed optimization have shown that Newton-type...
We study the MARINA method of Gorbunov et al (2021) – the current state-of-the-art...
First proposed by Seide (2014) as a heuristic, error feedback (EF) is a ...
We present a novel adaptive optimization algorithm for large-scale machi...
Federated Averaging (FedAvg, also known as Local-SGD) (McMahan et al., 2...
Due to the high communication cost in distributed and federated learning...
Error feedback (EF), also known as error compensation, is an immensely popular...
We consider the task of minimizing the sum of smooth and strongly convex...
Distributed machine learning has become an indispensable tool for traini...
We study the complexity of Stein Variational Gradient Descent (SVGD), wh...
We propose a generic variance-reduced algorithm, which we call MUltiple...
Inspired by recent work of Islamov et al (2021), we propose a family of...
Virtually all state-of-the-art methods for training supervised machine learning...