Efficient Detection and Filtering Systems for Distributed Training

Many modern machine learning tasks require large-scale distributed clusters as a critical component of the training pipeline. However, abnormal Byzantine behavior of the worker nodes can derail the training and compromise the quality of inference. Such behavior can be attributed to unintentional system malfunctions or orchestrated attacks; as a result, some nodes may return arbitrary results to the parameter server (PS) that coordinates the training. Recent work considers a wide range of attack models and has explored robust aggregation and/or computational redundancy to correct the distorted gradients. In this work, we consider attack models ranging from strong ones, in which q omniscient adversaries have full knowledge of the defense protocol and can change from iteration to iteration, to weak ones, in which q randomly chosen adversaries with limited collusion abilities change only every few iterations. Our algorithms rely on redundant task assignments coupled with detection of adversarial behavior. For strong attacks, we demonstrate a reduction in the fraction of distorted gradients ranging from 16% to 99% compared to the prior state-of-the-art. Our top-1 classification accuracy results on the CIFAR-10 data set demonstrate a 25% improvement (averaged over the strong and weak scenarios) under the most sophisticated attacks compared to state-of-the-art methods.
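
The abstract describes the approach only at a high level. The following minimal Python sketch illustrates the general idea of redundant task assignment combined with filtering: each gradient task is replicated across r workers, and the parameter server keeps a task's gradient only if a majority of its copies agree. This is an illustrative sketch under simple assumptions, not the paper's implementation; all function names (assign_redundant, filter_by_majority) and parameters are hypothetical.

    # Sketch: redundancy plus majority-vote filtering at the parameter server.
    # Assumes honest workers return bit-identical copies of a task's gradient,
    # so a copy endorsed by a majority of its r assigned workers is accepted.
    import numpy as np

    def assign_redundant(num_tasks, num_workers, r, seed=0):
        """Assign each of num_tasks gradient tasks to r distinct random workers."""
        rng = np.random.default_rng(seed)
        return [rng.choice(num_workers, size=r, replace=False)
                for _ in range(num_tasks)]

    def filter_by_majority(copies, tol=1e-6):
        """Return a gradient copy endorsed by a strict majority, else None."""
        r = len(copies)
        for g in copies:
            votes = sum(np.allclose(g, h, atol=tol) for h in copies)
            if votes > r // 2:
                return g
        return None  # no majority: task flagged as distorted and dropped

    # Toy usage: 4 tasks, 6 workers, redundancy r = 3, one Byzantine worker (id 2).
    num_tasks, num_workers, r = 4, 6, 3
    true_grads = [np.full(5, float(t)) for t in range(num_tasks)]
    assignment = assign_redundant(num_tasks, num_workers, r)

    accepted = []
    for t, workers in enumerate(assignment):
        copies = [true_grads[t] + (np.random.randn(5) * 10 if w == 2 else 0.0)
                  for w in workers]  # worker 2 returns an arbitrary gradient
        g = filter_by_majority(copies)
        if g is not None:
            accepted.append(g)

    print(f"accepted {len(accepted)}/{num_tasks} gradients")

With redundancy r = 3, a single Byzantine worker can distort a task's gradient only if it holds a majority of that task's copies, which the random assignment makes unlikely; the paper's detection mechanisms additionally identify and exclude such workers over time.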
