GLM Regression with Oblivious Corruptions

09/20/2023
by Ilias Diakonikolas, et al.

We demonstrate the first algorithms for the problem of regression for generalized linear models (GLMs) in the presence of additive oblivious noise. We assume we have sample access to examples (x, y) where y is a noisy measurement of g(w^* · x). In particular, the noisy labels are of the form y = g(w^* · x) + ξ + ϵ, where ξ is the oblivious noise drawn independently of x and satisfies Pr[ξ = 0] ≥ o(1), and ϵ ∼ 𝒩(0, σ^2). Our goal is to accurately recover a parameter vector w such that the function g(w · x) has arbitrarily small error when compared to the true values g(w^* · x), rather than the noisy measurements y. We present an algorithm that tackles this problem in its most general distribution-independent setting, where the solution may not even be identifiable. Our algorithm returns an accurate estimate of the solution if it is identifiable, and otherwise returns a small list of candidates, one of which is close to the true solution. Furthermore, we provide a necessary and sufficient condition for identifiability, which holds in broad settings. Specifically, the problem is identifiable when the quantile at which ξ + ϵ = 0 is known, or when the family of hypotheses does not contain candidates that are nearly equal to a translated g(w^* · x) + A for some real number A, while also having large error when compared to g(w^* · x). This is the first algorithmic result for GLM regression with oblivious noise which can handle more than half the samples being arbitrarily corrupted. Prior work focused largely on the setting of linear regression, and gave algorithms under restrictive assumptions.
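To make the noise model concrete, here is a minimal sketch in Python of how data is generated in this setting. It shows only the corruption model from the abstract, not the paper's algorithm; the link function g = tanh, the clean fraction alpha, the heavy-tailed distribution of the nonzero oblivious noise, and all dimensions are illustrative assumptions.

```python
import numpy as np

# Sketch of the noise model y = g(w* . x) + xi + eps from the abstract.
# xi is oblivious noise: drawn independently of x, zero with probability
# at least alpha, otherwise arbitrary. eps ~ N(0, sigma^2).
# All concrete choices below are assumptions for demonstration only.

rng = np.random.default_rng(0)

d, n = 10, 5000          # dimension and sample size (assumed)
alpha = 0.05             # Pr[xi = 0] >= alpha: only a small clean fraction
sigma = 0.1              # standard deviation of the Gaussian noise eps

g = np.tanh              # an example GLM link function (assumption)
w_star = rng.normal(size=d)
w_star /= np.linalg.norm(w_star)

X = rng.normal(size=(n, d))

# Oblivious noise: independent of X; zero with probability alpha,
# otherwise arbitrary (here heavy-tailed Cauchy, just as an example).
xi = np.where(rng.random(n) < alpha, 0.0, rng.standard_cauchy(n) * 10)
eps = rng.normal(scale=sigma, size=n)

y = g(X @ w_star) + xi + eps

# The goal is to recover w so that g(X @ w) is close to g(X @ w_star),
# even though far more than half the samples can be corrupted.
print(f"fraction of corrupted samples: {np.mean(xi != 0):.2f}")
```

Note that the corrupted fraction here is roughly 1 - alpha, i.e., about 95% of the labels carry arbitrary noise, which is the regime the abstract highlights as beyond the reach of prior work.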


