The Hidden Vulnerability of Distributed Learning in Byzantium

02/22/2018
by El Mahdi El Mhamdi, et al.

While machine learning is going through an era of celebrated success, concerns have been raised about the vulnerability of its backbone: stochastic gradient descent (SGD). Recent approaches have been proposed to ensure the robustness of distributed SGD against adversarial (Byzantine) workers sending poisoned gradients during the training phase. Some of these approaches have been proven Byzantine-resilient: they ensure the convergence of SGD despite the presence of a minority of adversarial workers. We show in this paper that convergence is not enough. In high dimension d ≫ 1, an adversary can build on the loss function's non-convexity to make SGD converge to ineffective models. More precisely, we bring to light that existing Byzantine-resilient schemes leave a margin of poisoning of Ω(f(d)), where f(d) increases at least like √d. Based on this leeway, we build a simple attack, and experimentally show its strong, and in some cases total, effectiveness on CIFAR-10 and MNIST. We introduce Bulyan, and prove it significantly reduces the attacker's leeway to a narrow O(1/√d) bound. We empirically show that Bulyan does not suffer the fragility of existing aggregation rules and, at a reasonable cost in terms of required batch size, achieves convergence as if only non-Byzantine gradients had been used to update the model.
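
To make the aggregation idea concrete, below is a minimal NumPy sketch of a Bulyan-style aggregator built on Krum, following the two-stage recipe the abstract refers to: repeatedly apply a Byzantine-resilient selection rule to pick a trusted subset of gradients, then take a coordinate-wise trimmed mean of that subset. The function names, the toy example at the end, and the exact tie-breaking details are illustrative assumptions, not the authors' reference implementation.

# Hypothetical sketch of a Bulyan-style aggregation rule, assuming Krum as
# the underlying Byzantine-resilient selection rule. Illustrative only.

import numpy as np


def krum_index(grads: np.ndarray, f: int) -> int:
    """Return the index of the gradient with the smallest Krum score.

    The score of a gradient is the sum of squared distances to its
    n - f - 2 nearest neighbours.
    """
    n = len(grads)
    dists = np.linalg.norm(grads[:, None, :] - grads[None, :, :], axis=-1) ** 2
    scores = []
    for i in range(n):
        d = np.delete(dists[i], i)           # distances to the other workers
        d.sort()
        scores.append(d[: n - f - 2].sum())  # closest n - f - 2 neighbours
    return int(np.argmin(scores))


def bulyan(grads: np.ndarray, f: int) -> np.ndarray:
    """Aggregate n gradients (rows of `grads`), at most f of them Byzantine.

    Requires n >= 4f + 3 so that both stages retain a correct majority.
    """
    n, d = grads.shape
    assert n >= 4 * f + 3, "Bulyan-style aggregation needs n >= 4f + 3"

    # Stage 1: select theta = n - 2f gradients by iterated Krum.
    remaining = list(range(n))
    selected = []
    theta = n - 2 * f
    for _ in range(theta):
        idx = krum_index(grads[remaining], f)
        selected.append(remaining.pop(idx))
    S = grads[selected]

    # Stage 2: coordinate-wise trimmed mean -- for each coordinate, keep the
    # beta = theta - 2f values closest to the median and average them.
    beta = theta - 2 * f
    med = np.median(S, axis=0)
    order = np.argsort(np.abs(S - med), axis=0)[:beta]
    return np.take_along_axis(S, order, axis=0).mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, f, d = 11, 2, 10
    honest = rng.normal(loc=1.0, scale=0.1, size=(n - f, d))
    byzantine = rng.normal(loc=-50.0, scale=1.0, size=(f, d))   # poisoned gradients
    print(bulyan(np.vstack([honest, byzantine]), f))             # stays close to 1.0

The coordinate-wise trimmed mean in stage 2 is what shrinks the per-coordinate leeway: each output coordinate is bracketed by values proposed by correct workers, which is, roughly, the intuition behind the O(1/√d) bound stated above.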
