Learning a Single Neuron with Adversarial Label Noise via Gradient Descent

06/17/2022 ∙ by Ilias Diakonikolas, et al.
We study the fundamental problem of learning a single neuron, i.e., a function of the form 𝐱 ↦ σ(𝐰·𝐱) for monotone activations σ:ℝ→ℝ, with respect to the L_2^2-loss in the presence of adversarial label noise. Specifically, we are given labeled examples from a distribution D on (𝐱, y) ∈ ℝ^d × ℝ such that there exists 𝐰* ∈ ℝ^d achieving F(𝐰*) = ε, where F(𝐰) = 𝐄_{(𝐱,y)∼D}[(σ(𝐰·𝐱) − y)^2]. The goal of the learner is to output a hypothesis vector 𝐰 such that F(𝐰) ≤ C·ε with high probability, where C > 1 is a universal constant. As our main contribution, we give efficient constant-factor approximate learners for a broad class of distributions (including log-concave distributions) and activation functions. Concretely, for the class of isotropic log-concave distributions, we obtain the following important corollaries: For the logistic activation, we obtain the first polynomial-time constant-factor approximation, even under the Gaussian distribution. Our algorithm has sample complexity Õ(d/ε), which is tight within polylogarithmic factors. For the ReLU activation, we give an efficient algorithm with sample complexity Õ(d·polylog(1/ε)). Prior to our work, the best known constant-factor approximate learner had sample complexity Ω̃(d/ε). In both of these settings, our algorithms are simple, performing gradient descent on the (regularized) L_2^2-loss. The correctness of our algorithms relies on novel structural results that we establish, showing that (essentially all) stationary points of the underlying non-convex loss are approximately optimal.
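
To make the algorithmic idea concrete, the following is a minimal Python sketch of gradient descent on the empirical L_2^2-loss for a single neuron. It is an illustration only, not the paper's exact algorithm: the synthetic isotropic Gaussian data, the logistic activation, the crude 5% label corruption, and the hyperparameters (learning rate, iteration count) are all assumptions made for the demo, and the sketch omits the regularization and the distributional analysis on which the paper's guarantees rely.

```python
import numpy as np

# Sketch: gradient descent on the empirical L_2^2 loss for a single neuron
# x -> sigma(w . x), with a logistic activation and mildly corrupted labels.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

rng = np.random.default_rng(0)
d, n = 10, 5000

# Synthetic data: isotropic Gaussian inputs, labels generated by a ground-truth
# neuron w*, then a small fraction of labels replaced arbitrarily (a crude
# stand-in for adversarial label noise).
w_star = rng.normal(size=d)
w_star /= np.linalg.norm(w_star)
X = rng.normal(size=(n, d))
y = sigmoid(X @ w_star)
corrupt = rng.random(n) < 0.05
y[corrupt] = rng.random(corrupt.sum())

# Gradient descent on F_hat(w) = mean over samples of (sigma(w . x) - y)^2.
w = np.zeros(d)
lr = 0.5
for _ in range(2000):
    z = X @ w
    residual = sigmoid(z) - y
    grad = 2.0 * (X * (residual * sigmoid_grad(z))[:, None]).mean(axis=0)
    w -= lr * grad

print("empirical loss of GD iterate:", np.mean((sigmoid(X @ w) - y) ** 2))
print("empirical loss of w*:        ", np.mean((sigmoid(X @ w_star) - y) ** 2))
```

In this toy setup, gradient descent drives the empirical loss close to that of w* despite the corrupted labels; qualitatively, this is the behavior that the paper's structural results (stationary points of the non-convex loss being approximately optimal) explain for the population loss.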
