Byzantine-Resilient Stochastic Gradient Descent for Distributed Learning: A Lipschitz-Inspired Coordinate-wise Median Approach

09/10/2019
by Haibo Yang, et al.

In this work, we consider the resilience of stochastic gradient descent (SGD)-based distributed learning against potentially Byzantine attackers, who can send arbitrary information to the parameter server to disrupt the training process. Toward this end, we propose a new Lipschitz-inspired coordinate-wise median approach (LICM-SGD) to mitigate Byzantine attacks. We show that LICM-SGD can tolerate up to half of the workers being Byzantine attackers while still converging almost surely to a stationary region in non-convex settings. Moreover, LICM-SGD requires no knowledge of the number of attackers or of the Lipschitz constant, which makes it attractive for practical implementations. LICM-SGD also enjoys the optimal O(md) computational time-complexity, in the sense that its time-complexity matches that of standard SGD under no attacks. We conduct extensive experiments showing that LICM-SGD consistently outperforms existing methods in training multi-class logistic regression and convolutional neural networks on the MNIST and CIFAR-10 datasets, while also achieving a much faster running time thanks to its low computational complexity.
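Below is a minimal sketch of the coordinate-wise median aggregation that LICM-SGD builds on, assuming m worker gradients of dimension d stacked in a NumPy array. The Lipschitz-inspired candidate screening described in the paper is omitted; the function name, learning rate, and toy data are illustrative only.

import numpy as np

def coordinatewise_median(worker_grads):
    """Aggregate m worker gradients (shape (m, d)) coordinate by coordinate.

    The median of each coordinate is unaffected by up to floor((m-1)/2)
    arbitrarily corrupted (Byzantine) rows. Note that np.median sorts
    internally, costing O(m log m) per coordinate; the O(md) bound in the
    abstract presumes a linear-time median-selection algorithm.
    """
    return np.median(worker_grads, axis=0)

# Toy illustration (hypothetical numbers): 7 honest workers whose
# stochastic gradients cluster around 1.0, plus 3 Byzantine workers
# sending arbitrary values. With fewer than half of the workers
# Byzantine, the coordinate-wise median stays near the honest gradients.
rng = np.random.default_rng(0)
d = 4
honest = rng.normal(loc=1.0, scale=0.1, size=(7, d))
byzantine = np.full((3, d), -1e6)
grads = np.vstack([honest, byzantine])

agg = coordinatewise_median(grads)
print(agg)  # each coordinate is close to 1.0 despite the attack

# One SGD step at the parameter server using the robust aggregate:
theta = np.zeros(d)
lr = 0.1
theta -= lr * agg

Plain averaging would be pulled to -1e6 by even a single attacker; the per-coordinate median is what buys the resilience, at the same asymptotic cost as standard SGD when a linear-time selection routine is used.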

Related research

12/29/2019
Federated Variance-Reduced Stochastic Gradient Descent with Robustness to Byzantine Attacks
This paper deals with distributed finite-sum optimization for learning o...

05/23/2018
Phocas: dimensional Byzantine-resilient stochastic gradient descent
We propose a novel robust aggregation rule for distributed synchronous S...

02/27/2018
Generalized Byzantine-tolerant SGD
We propose three new robust aggregation rules for distributed synchronou...

05/25/2018
Zeno: Byzantine-suspicious stochastic gradient descent
We propose Zeno, a new robust aggregation rule, for distributed synchron...

06/16/2021
Robust Training in High Dimensions via Block Coordinate Geometric Median Descent
Geometric median (Gm) is a classical method in statistics for achieving ...

12/28/2020
Byzantine-Resilient Non-Convex Stochastic Gradient Descent
We study adversary-resilient stochastic distributed optimization, in whi...

02/27/2019
Distributed Byzantine Tolerant Stochastic Gradient Descent in the Era of Big Data
The recent advances in sensor technologies and smart devices enable the ...
