
Communication-Efficient and Byzantine-Robust Distributed Learning
We develop a communication-efficient distributed learning algorithm that is robust against Byzantine worker machines. We propose and analyze a distributed gradient-descent algorithm that performs simple thresholding based on gradient norms to mitigate Byzantine failures. We show that the (statistical) error rate of our algorithm matches that of [YCKB18], which uses more complicated schemes (such as coordinate-wise median or trimmed mean), and is thus optimal. Furthermore, for communication efficiency, we consider a generic class of δ-approximate compressors from [KRSJ19] that encompasses sign-based compressors and top-k sparsification. Our algorithm uses compressed gradients for aggregation and gradient norms for Byzantine removal. We establish the statistical error rate of the algorithm for arbitrary (convex or non-convex) smooth loss functions. We show that, in the regime where the compression factor δ is constant and the dimension of the parameter space is fixed, the rate of convergence is not affected by the compression operation, and hence we effectively get compression for free. Moreover, we extend the compressed gradient-descent algorithm with error feedback proposed in [KRSJ19] to the distributed setting. We have experimentally validated our results, showing good convergence for both convex (least-squares regression) and non-convex (neural network training) problems.
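The three ingredients of the abstract can be sketched in a few lines: a top-k sparsifier (one instance of a δ-approximate compressor, with δ = k/d in the worst case), norm-based thresholding that discards the largest-norm gradients before averaging, and compressed gradient descent with error feedback that carries the compression residual into the next step. This is a minimal illustrative sketch, not the paper's implementation; all function names, the fixed drop fraction, and the quadratic test objective are assumptions made for the example.

```python
import numpy as np

def topk_compress(g, k):
    """Top-k sparsification: keep only the k largest-magnitude entries.
    One example of a delta-approximate compressor (illustrative only)."""
    out = np.zeros_like(g)
    idx = np.argpartition(np.abs(g), -k)[-k:]  # indices of the k largest |g_i|
    out[idx] = g[idx]
    return out

def norm_threshold_aggregate(gradients, byz_fraction):
    """Norm-based thresholding sketch: drop the gradients with the
    largest norms (assumed to contain the Byzantine ones), average the rest."""
    m = len(gradients)
    n_drop = int(np.ceil(byz_fraction * m))
    norms = np.array([np.linalg.norm(g) for g in gradients])
    keep = np.argsort(norms)[: m - n_drop]  # discard the n_drop largest-norm gradients
    return np.mean([gradients[i] for i in keep], axis=0)

def compressed_gd_with_error_feedback(grad_fn, x0, lr, steps, k):
    """Sketch of compressed gradient descent with error feedback in the
    style of [KRSJ19]: the part of the update discarded by compression
    is remembered and added back into the next step."""
    x, e = x0.copy(), np.zeros_like(x0)
    for _ in range(steps):
        p = lr * grad_fn(x) + e        # incorporate the stored residual
        delta = topk_compress(p, k)    # only the compressed update is "transmitted"
        e = p - delta                  # remember what compression discarded
        x -= delta
    return x
```

As a quick sanity check, three honest workers sending gradients near (1, 1) plus one Byzantine worker sending a huge gradient yield an aggregate close to the honest mean once the largest-norm gradient is dropped, and the error-feedback loop drives a simple quadratic objective to its minimum even when only one coordinate is transmitted per step.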