Reproducibility in Learning

01/20/2022
by Russell Impagliazzo, et al.

We introduce the notion of a reproducible algorithm in the context of learning. A reproducible learning algorithm is resilient to variations in its samples – with high probability, it returns the exact same output when run on two samples from the same underlying distribution. We begin by unpacking the definition, clarifying how randomness is instrumental in balancing accuracy and reproducibility. We initiate a theory of reproducible algorithms, showing how reproducibility implies desirable properties such as data reuse and efficient testability. Despite the exceedingly strong demand of reproducibility, there are efficient reproducible algorithms for several fundamental problems in statistics and learning. First, we show that any statistical query algorithm can be made reproducible with a modest increase in sample complexity, and we use this to construct reproducible algorithms for finding approximate heavy-hitters and medians. Using these ideas, we give the first reproducible algorithm for learning halfspaces via a reproducible weak learner and a reproducible boosting algorithm. Finally, we initiate the study of lower bounds and inherent tradeoffs for reproducible algorithms, giving nearly tight sample complexity upper and lower bounds for reproducible versus nonreproducible SQ algorithms.
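
The role of randomness in balancing accuracy and reproducibility can be illustrated with a small sketch. It is not taken from the paper's text above: it assumes a single statistical query with values in [0, 1], and it rounds the empirical mean to a grid whose offset is drawn from randomness shared between runs. The function name `reproducible_mean`, the `seed` parameter, and the cell width of `2 * tau` are hypothetical choices made for illustration only.

```python
import math
import random


def reproducible_mean(sample, tau, seed):
    """Round the empirical mean of `sample` to a randomly shifted grid.

    Assumption for this sketch: `seed` stands in for the algorithm's
    internal randomness, which is shared across the two runs.
    """
    rng = random.Random(seed)
    width = 2 * tau                    # grid cell width (illustrative choice)
    offset = rng.uniform(0, width)     # identical in both runs for the same seed
    empirical = sum(sample) / len(sample)
    # Find the cell [offset + k*width, offset + (k+1)*width) containing the mean
    k = math.floor((empirical - offset) / width)
    # Return the cell midpoint, which is within tau of the empirical mean
    return offset + (k + 0.5) * width


def agreement_rate(sample_a, sample_b, tau, trials=1000):
    """Fraction of shared seeds on which both samples give the same output."""
    same = sum(
        reproducible_mean(sample_a, tau, s) == reproducible_mean(sample_b, tau, s)
        for s in range(trials)
    )
    return same / trials


if __name__ == "__main__":
    a = [1] * 50 + [0] * 50   # empirical mean 0.50
    b = [1] * 51 + [0] * 49   # empirical mean 0.51
    print(agreement_rate(a, b, tau=0.1))
```

In this sketch the two runs disagree only when a grid boundary falls between the two empirical means, so a wider cell raises the chance of identical outputs and lowers accuracy. Without the shared offset, a fixed grid would always have some true mean sitting on a boundary, where the two runs would disagree about half the time.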

research
06/28/2023

Information-Computation Tradeoffs for Learning Margin Halfspaces with Random Classification Noise

We study the problem of PAC learning γ-margin halfspaces with Random Cla...
research
05/31/2023

Replicability in Reinforcement Learning

We initiate the mathematical study of replicability as an algorithmic pr...
research
04/10/2019

Settling the Sample Complexity of Single-parameter Revenue Maximization

This paper settles the sample complexity of single-parameter revenue max...
research
10/10/2022

Reaching Goals is Hard: Settling the Sample Complexity of the Stochastic Shortest Path

We study the sample complexity of learning an ϵ-optimal policy in the St...
research
06/03/2022

Optimal Weak to Strong Learning

The classic algorithm AdaBoost allows to convert a weak learner, that is...
research
01/27/2023

AdaBoost is not an Optimal Weak to Strong Learner

AdaBoost is a classic boosting algorithm for combining multiple inaccura...
research
02/11/2020

Efficiently Learning and Sampling Interventional Distributions from Observations

We study the problem of efficiently estimating the effect of an interven...
