
Quasi-Global Momentum: Accelerating Decentralized Deep Learning on Heterogeneous Data
Decentralized training of deep learning models is a key element for enab...
Learning from History for Byzantine Robust Optimization
Byzantine robustness has received significant attention recently given i...
Mime: Mimicking Centralized Stochastic Algorithms in Federated Learning
Federated learning is a challenging optimization problem due to the hete...
PowerGossip: Practical Low-Rank Communication Compression in Decentralized Deep Learning
Lossy gradient compression has become a practical tool to overcome the c...
Byzantine-Robust Learning on Heterogeneous Datasets via Resampling
In Byzantine robust distributed optimization, a central server wants to ...
Secure Byzantine-Robust Machine Learning
Increasingly machine learning systems are being deployed to edge servers...
Why ADAM Beats SGD for Attention Models
While stochastic gradient descent (SGD) is still the de facto algorithm ...
SCAFFOLD: Stochastic Controlled Averaging for On-Device Federated Learning
Federated learning is a key scenario in modern large-scale machine learn...
The Error-Feedback Framework: Better Rates for SGD with Delayed Gradients and Compressed Communication
We analyze (stochastic) gradient descent (SGD) with delayed updates on s...
Amplifying Rényi Differential Privacy via Shuffling
Differential privacy is a useful tool to build machine learning models w...
PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization
We study gradient compression methods to alleviate the communication bot...
Accelerating Gradient Boosting Machine
Gradient Boosting Machine (GBM) is an extremely powerful supervised lear...
Error Feedback Fixes SignSGD and other Gradient Compression Schemes
Sign-based algorithms (e.g. signSGD) have been proposed as a biased grad...
Efficient Greedy Coordinate Descent for Composite Problems
Coordinate descent with random coordinate selection is the current state...
Global linear convergence of Newton's method without strong-convexity or Lipschitz gradients
We show that Newton's method converges globally at a linear rate for obj...
Sai Praneeth Karimireddy
