Online and Distributed Robust Regressions under Adversarial Data Corruption

10/02/2017
by Xuchao Zhang, et al.

In today's era of big data, robust least-squares regression becomes more challenging when adversarial corruption is combined with the explosive growth of datasets. Traditional robust methods can handle noise but face several challenges when applied to huge datasets, including 1) the computational infeasibility of handling an entire dataset at once, 2) the existence of heterogeneously distributed corruption, and 3) the difficulty of estimating corruption when the data cannot be entirely loaded. This paper proposes online and distributed robust regression approaches, both of which can concurrently address all of the above challenges. Specifically, the distributed algorithm optimizes the regression coefficients of each data block via heuristic hard thresholding and combines all the estimates in a distributed robust consolidation. Furthermore, an online version of the distributed algorithm is proposed to incrementally update the existing estimates with new incoming data. We also prove that our algorithms benefit from strong robustness guarantees in terms of regression coefficient recovery with a constant upper bound on the error of state-of-the-art batch methods. Extensive experiments on synthetic and real datasets demonstrate that our approaches are superior to existing methods in effectiveness, with competitive efficiency.
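The block-then-combine idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the function names are hypothetical, the number of corrupted rows per block is assumed to be known (the paper estimates it heuristically), and a coordinate-wise median stands in for the paper's robust consolidation step.

```python
import numpy as np


def hard_threshold_regression(X, y, n_corrupt, n_iter=20):
    """Robust fit for one data block (illustrative, hypothetical helper).

    Alternates between ordinary least squares on the rows currently
    presumed clean and re-selecting the rows with the smallest absolute
    residuals. n_corrupt is ASSUMED known here.
    """
    n_samples = X.shape[0]
    n_keep = n_samples - n_corrupt
    keep = np.arange(n_samples)  # start by trusting every row
    for _ in range(n_iter):
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        residuals = np.abs(y - X @ beta)
        # Hard thresholding: keep only the n_keep best-fitting rows.
        new_keep = np.sort(np.argsort(residuals)[:n_keep])
        if np.array_equal(new_keep, keep):
            break  # the presumed-clean set has stabilized
        keep = new_keep
    # Final fit on the last selected set of rows.
    beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    return beta


def consolidate(estimates):
    """Combine per-block estimates with a coordinate-wise median.

    ASSUMPTION: a simple stand-in for the paper's robust consolidation.
    """
    return np.median(np.stack(estimates), axis=0)


def distributed_robust_regression(blocks, n_corrupt_per_block):
    """Fit each (X, y) block independently, then consolidate."""
    estimates = [
        hard_threshold_regression(X, y, n_corrupt_per_block)
        for X, y in blocks
    ]
    return consolidate(estimates)
```

Because each block is fitted independently, the per-block fits could run on separate workers, and a new block's estimate could be folded into the stored ones as it arrives, which is the spirit of the online variant. The median is used here only because a few badly corrupted blocks cannot drag it arbitrarily far.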


research
02/05/2019

Robust Regression via Online Feature Selection under Adversarial Data Corruption

The presence of data corruption in user-generated streaming data, such a...
research
12/24/2022

A Bayesian Robust Regression Method for Corrupted Data Reconstruction

Because of the widespread existence of noise and data corruption, recove...
research
07/06/2018

Distributed Self-Paced Learning in Alternating Direction Method of Multipliers

Self-paced learning (SPL) mimics the cognitive process of humans, who ge...
research
10/10/2020

Making Online Sketching Hashing Even Faster

Data-dependent hashing methods have demonstrated good performance in var...
research
11/23/2017

Online and Batch Supervised Background Estimation via L1 Regression

We propose a surprisingly simple model for supervised video background e...
research
04/29/2023

Subdata selection for big data regression: an improved approach

In the big data era researchers face a series of problems. Even standard...
research
02/18/2021

Extract the information from the big data with randomly distributed noise

In this manuscript, a purely data driven statistical regularization meth...
