Testing distributional assumptions of learning algorithms

04/14/2022
by Ronitt Rubinfeld, et al.

Many important high-dimensional function classes admit fast agnostic learning algorithms when strong assumptions can be made on the distribution of examples, such as Gaussianity or uniformity over the domain. But how can one be sufficiently confident that the data indeed satisfies the distributional assumption, so that one can trust the output quality of the agnostic learning algorithm? We propose a model in which to systematically study the design of tester-learner pairs (𝒜, 𝒯), such that if the distribution on examples in the data passes the tester 𝒯, then one can safely trust the output of the agnostic learner 𝒜 on the data.

To demonstrate the power of the model, we apply it to the classical problem of agnostically learning halfspaces under the standard Gaussian distribution and present a tester-learner pair with a combined run-time of n^Õ(1/ϵ^4). This qualitatively matches the best known ordinary agnostic learning algorithms for this task. Notably, finite-sample Gaussian distribution testers do not exist for the L_1 and EMD distance measures, so the tester must rely on a weaker guarantee. A key step in the analysis is a novel characterization of the concentration and anti-concentration properties of a distribution whose low-degree moments approximately match those of a Gaussian. We also use tools from polynomial approximation theory.

In contrast, we show strong lower bounds on the combined run-times of tester-learner pairs for the problems of agnostically learning convex sets under the Gaussian distribution and monotone Boolean functions under the uniform distribution over {0,1}^n. Through these lower bounds we exhibit natural problems where there is a dramatic gap between the standard agnostic learning run-time and the run-time of the best tester-learner pair.
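To make the tester side of the model concrete, here is a toy sketch (not the paper's actual algorithm) of a one-dimensional moment-matching check in the spirit of the analysis above: accept only if the sample's low-degree empirical moments are close to the corresponding standard-Gaussian moments. The function names, the degree bound, and the tolerance are illustrative choices, not quantities from the paper.

```python
import numpy as np

def gaussian_moment(k: int) -> float:
    """k-th moment of the standard Gaussian: 0 for odd k, (k-1)!! for even k."""
    if k % 2 == 1:
        return 0.0
    result = 1.0
    for i in range(k - 1, 0, -2):
        result *= i
    return result

def moments_match(samples: np.ndarray, max_degree: int = 4, tol: float = 0.1) -> bool:
    """Accept iff every empirical moment up to max_degree is within tol
    of the corresponding standard-Gaussian moment."""
    return all(
        abs(np.mean(samples ** k) - gaussian_moment(k)) <= tol
        for k in range(1, max_degree + 1)
    )

rng = np.random.default_rng(0)
print(moments_match(rng.standard_normal(100_000)))   # Gaussian sample: accepted
print(moments_match(rng.exponential(size=100_000)))  # Exponential sample: rejected
```

A tester that passes only guarantees approximate moment matching, not closeness in L_1 or EMD; the point of the paper's characterization is that this weaker guarantee already suffices for the learner's output to be trustworthy.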


