Distributed Robust Learning

09/21/2014
by Sharky.TV, et al.

We propose a framework for distributed robust statistical learning on big contaminated data. The Distributed Robust Learning (DRL) framework can reduce the computational time of traditional robust learning methods by several orders of magnitude. We analyze the robustness property of DRL, showing that DRL not only preserves the robustness of the base robust learning method, but also tolerates contamination of a constant fraction of results from computing nodes (node failures). More precisely, even in the presence of the most adversarial outlier distribution over computing nodes, DRL still achieves a breakdown point of at least λ^*/2, where λ^* is the breakdown point of the corresponding centralized algorithm. This is in stark contrast with the naive division-and-averaging implementation, which may reduce the breakdown point by a factor of k when k computing nodes are used. We then specialize the DRL framework for two concrete cases: distributed robust principal component analysis and distributed robust regression. We demonstrate the efficiency and the robustness advantages of DRL through comprehensive simulations and through predicting image tags on a large-scale image set.
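The abstract contrasts naive division-and-averaging, whose breakdown point degrades by a factor of k, with robust aggregation of the k node results. The sketch below illustrates that contrast on a toy mean-estimation task: each node computes a local estimate, an adversary corrupts a minority of node outputs, and a coordinate-wise median aggregate survives while the plain average breaks down. The data, the local estimator, and the median aggregation rule are illustrative assumptions for this sketch, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean = np.array([1.0, -2.0, 0.5])

# Split the data across k computing nodes; each node runs a local estimator.
k = 10
node_estimates = np.array([
    rng.normal(true_mean, 1.0, size=(1000, 3)).mean(axis=0)
    for _ in range(k)
])

# Adversary corrupts the results from a constant fraction of nodes
# (node failures), returning arbitrarily bad estimates.
node_estimates[:3] = 1e6

# Naive division-and-averaging: a single corrupted node already ruins it.
naive = node_estimates.mean(axis=0)

# Robust aggregation (here: coordinate-wise median over node results)
# tolerates corruption of up to just under half of the nodes.
robust = np.median(node_estimates, axis=0)

print(np.linalg.norm(naive - true_mean))   # huge: averaging breaks down
print(np.linalg.norm(robust - true_mean))  # small: robust aggregate survives
```

With 3 of 10 node outputs corrupted, the median of each coordinate still lands among the 7 uncorrupted estimates, which is the mechanism behind tolerating contamination on a constant fraction of nodes.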


Related research

01/01/2017 · Outlier Robust Online Learning
We consider the problem of learning from noisy data in practical setting...

07/29/2019 · DETOX: A Redundancy-based Framework for Faster and More Robust Gradient Aggregation
To improve the resilience of distributed training to worst-case, or Byza...

08/05/2021 · DRL-based Slice Placement Under Non-Stationary Conditions
We consider online learning for optimal network slice placement under th...

03/27/2018 · DRACO: Robust Distributed Training via Redundant Gradients
Distributed model training is vulnerable to worst-case system failures a...

05/31/2022 · Communication-efficient distributed eigenspace estimation with arbitrary node failures
We develop an eigenspace estimation algorithm for distributed environmen...

02/21/2019 · Statistics and Samples in Distributional Reinforcement Learning
We present a unifying framework for designing and analysing distribution...

06/12/2023 · FADI: Fast Distributed Principal Component Analysis With High Accuracy for Large-Scale Federated Data
Principal component analysis (PCA) is one of the most popular methods fo...
