Sebastian U. Stich

08/11/2023

Adaptive SGD with Polyak stepsize and Line-search: Robust Convergence and Variance Reduction

The recently proposed stochastic Polyak stepsize (SPS) and stochastic li...

Xiaowen Jiang, et al.

07/12/2023

Locally Adaptive Federated Learning via Stochastic Polyak Stepsizes

State-of-the-art federated learning algorithms such as FedAvg require ca...

Sohom Mukherjee, et al.

06/23/2023

Synthetic data shuffling accelerates the convergence of federated learning under data heterogeneity

In federated learning, data heterogeneity is a critical challenge. A str...

Bo Li, et al.

06/08/2023

Communication-Efficient Gradient Descent-Ascent Methods for Distributed Variational Inequalities: Unified Analysis and Local Updates

Distributed and federated learning algorithms and techniques associated ...

Siqi Zhang, et al.

05/30/2023

Shuffle SGD is Always Better than SGD: Improved Analysis of SGD with Arbitrary Data Orders

Stochastic Gradient Descent (SGD) algorithms are widely used in optimizi...

Anastasia Koloskova, et al.

05/02/2023

Revisiting Gradient Clipping: Stochastic bias and tight convergence guarantees

Gradient clipping is a popular modification to standard (stochastic) gra...

Anastasia Koloskova, et al.

01/03/2023

Decentralized Gradient Tracking with Local Steps

Gradient tracking (GT) is an algorithm designed for solving decentralize...

Yue Liu, et al.

12/05/2022

Partial Variance Reduction improves Non-Convex Federated learning on heterogeneous data

Data heterogeneity across clients is a key challenge in federated learni...

Bo Li, et al.

06/16/2022

Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning

We study the asynchronous stochastic gradient descent algorithm for dist...

Anastasia Koloskova, et al.

04/13/2022

Data-heterogeneity-aware Mixing for Decentralized Learning

Decentralized learning provides an effective framework to train machine ...

Yatin Dandi, et al.

02/18/2022

Tackling benign nonconvexity with smoothing and stochastic gradients

Non-convex optimization problems are ubiquitous in machine learning, esp...

Harsh Vardhan, et al.

02/08/2022

An Improved Analysis of Gradient Tracking for Decentralized Machine Learning

We consider decentralized machine learning over a network where the trai...

Anastasia Koloskova, et al.

12/09/2021

The Peril of Popular Deep Learning Uncertainty Estimation Methods

Uncertainty estimation (UE) techniques – such as the Gaussian process (G...

Yehao Liu, et al.

11/10/2021

Linear Speedup in Personalized Collaborative Learning

Personalization in federated learning can improve the accuracy of a mode...

El Mahdi Chayti, et al.

10/11/2021

ProgFed: Effective, Communication, and Computation Efficient Federated Learning by Progressive Training

Federated learning is a powerful distributed learning scheme that allows...

Hui-Po Wang, et al.

10/08/2021

RelaySum for Decentralized Deep Learning on Heterogeneous Data

In decentralized machine learning, workers compute model updates on thei...

Thijs Vogels, et al.

09/06/2021

On Second-order Optimization Methods for Federated Learning

We consider federated learning (FL), where the training data is distribu...

Sebastian Bischoff, et al.

08/18/2021

Semantic Perturbations with Normalizing Flows for Improved Generalization

Data augmentation is a widely adopted technique for avoiding overfitting...

Oğuz Kaan Yüksel, et al.

06/16/2021

Simultaneous Training of Partially Masked Neural Networks

For deploying deep learning models to lower end devices, it is necessary...

Amirkeivan Mohtashami, et al.

06/15/2021

Decentralized Local Stochastic Extra-Gradient for Variational Inequalities

We consider decentralized stochastic variational inequalities where the ...

Aleksandr Beznosikov, et al.

03/03/2021

Critical Parameters for Scalable Distributed Learning with Large Batches and Asynchronous Updates

It has been experimentally observed that the efficiency of distributed t...

Sebastian U. Stich, et al.

02/09/2021

Consensus Control for Decentralized Deep Learning

Decentralized training of deep learning models enables on-device learnin...

Lingjing Kong, et al.

02/09/2021

Quasi-Global Momentum: Accelerating Decentralized Deep Learning on Heterogeneous Data

Decentralized training of deep learning models is a key element for enab...

Tao Lin, et al.

11/03/2020

A Linearly Convergent Algorithm for Decentralized Optimization: Sending Less Bits for Free!

Decentralized optimization methods enable on-device training of machine ...

Dmitry Kovalev, et al.

09/04/2020

On Communication Compression for Distributed Optimization on Heterogeneous Data

Lossy gradient compression, with either unbiased or biased compressors, ...

Sebastian U. Stich, et al.

08/08/2020

Mime: Mimicking Centralized Stochastic Algorithms in Federated Learning

Federated learning is a challenging optimization problem due to the hete...

Sai Praneeth Karimireddy, et al.

07/31/2020

Analysis of SGD with Biased Gradient Estimators

We analyze the complexity of biased stochastic gradient methods (SGD), w...

Ahmad Ajalloeian, et al.

06/12/2020

Dynamic Model Pruning with Feedback

Deep neural networks often have millions of parameters. This can hinder ...

Tao Lin, et al.

06/12/2020

Ensemble Distillation for Robust Model Fusion in Federated Learning

Federated Learning (FL) is a machine learning setting where many devices...

Tao Lin, et al.

06/10/2020

Extrapolation for Large-batch Training in Deep Learning

Deep learning networks are typically trained by Stochastic Gradient Desc...

Tao Lin, et al.

03/23/2020

A Unified Theory of Decentralized SGD with Changing Topology and Local Updates

Decentralized stochastic optimization methods have gained a lot of atten...

Anastasia Koloskova, et al.

02/18/2020

Is Local SGD Better than Minibatch SGD?

We study local SGD (also known as parallel SGD and federated averaging),...

Blake Woodworth, et al.

12/10/2019

Advances and Open Problems in Federated Learning

Federated learning (FL) is a machine learning setting where many clients...

Peter Kairouz, et al.

10/14/2019

SCAFFOLD: Stochastic Controlled Averaging for On-Device Federated Learning

Federated learning is a key scenario in modern large-scale machine learn...

Sai Praneeth Karimireddy, et al.

09/11/2019

The Error-Feedback Framework: Better Rates for SGD with Delayed Gradients and Compressed Communication

We analyze (stochastic) gradient descent (SGD) with delayed updates on s...

Sebastian U. Stich, et al.

07/22/2019

Decentralized Deep Learning with Arbitrary Communication Compression

Decentralized training of deep learning models is a key element for enab...

Anastasia Koloskova, et al.

07/09/2019

Unified Optimal Analysis of the (Stochastic) Gradient Method

In this note we give a simple proof for the convergence of stochastic gr...

Sebastian U. Stich, et al.

02/01/2019

Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication

We consider decentralized stochastic optimization with the objective fun...

Anastasia Koloskova, et al.

01/28/2019

Error Feedback Fixes SignSGD and other Gradient Compression Schemes

Sign-based algorithms (e.g. signSGD) have been proposed as a biased grad...

Sai Praneeth Karimireddy, et al.

10/16/2018

Efficient Greedy Coordinate Descent for Composite Problems

Coordinate descent with random coordinate selection is the current state...

Sai Praneeth Karimireddy, et al.

09/20/2018

Sparsified SGD with Memory

Huge scale machine learning problems are nowadays tackled by distributed...

Sebastian U. Stich, et al.

08/22/2018

Don't Use Large Mini-Batches, Use Local SGD

Mini-batch stochastic gradient methods are the current state of the art ...

Tao Lin, et al.

06/01/2018

Global linear convergence of Newton's method without strong-convexity or Lipschitz gradients

We show that Newton's method converges globally at a linear rate for obj...

Sai Praneeth Karimireddy, et al.

05/24/2018

Local SGD Converges Fast and Communicates Little

Mini-batch stochastic gradient descent (SGD) is the state of the art in ...

Sebastian U. Stich, et al.

05/02/2018

SVRG meets SAGA: k-SVRG --- A Tale of Limited Memory

In recent years, many variance reduced algorithms for empirical risk min...

Anant Raj, et al.

03/26/2018

Revisiting First-Order Convex Optimization Over Linear Spaces

Two popular examples of first-order optimization methods over linear spa...

Francesco Locatello, et al.
