Optimal PAC Bounds Without Uniform Convergence

by Ishaq Aden-Ali, et al.

In statistical learning theory, determining the sample complexity of realizable binary classification for VC classes was a long-standing open problem. The results of Simon and Hanneke established sharp upper bounds in this setting. However, their argument relies on the uniform convergence principle, which limits its applicability to more general learning settings such as multiclass classification. In this paper, we address this issue by providing optimal high-probability risk bounds through a framework that surpasses the limitations of uniform convergence arguments. Our framework converts the leave-one-out error of permutation-invariant predictors into high-probability risk bounds. As an application, by adapting the one-inclusion graph algorithm of Haussler, Littlestone, and Warmuth, we propose an algorithm that achieves an optimal PAC bound for binary classification. Specifically, our result shows that certain aggregations of one-inclusion graph algorithms are optimal, addressing a variant of a classic question posed by Warmuth. We further instantiate our framework in three settings where uniform convergence is provably suboptimal. For multiclass classification, we prove an optimal risk bound that scales with the one-inclusion hypergraph density of the class, addressing the suboptimality of the analysis of Daniely and Shalev-Shwartz. For partial hypothesis classification, we determine the optimal sample complexity bound, resolving a question posed by Alon, Hanneke, Holzman, and Moran. For realizable bounded regression with absolute loss, we derive an optimal risk bound that relies on a modified version of the scale-sensitive dimension, refining the results of Bartlett and Long. Our rates surpass standard uniform-convergence-based results because the complexity measure in our risk bound is smaller.
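
The quantity at the center of the framework, the leave-one-out error of a permutation-invariant predictor, can be illustrated with a minimal sketch. This is not the paper's algorithm: the threshold learner, the function names, and the toy sample below are assumptions chosen for illustration. The sketch only shows what is being measured: each example is held out in turn, the learner is trained on the rest, and the mistakes on the held-out points are averaged.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[float, int]                 # (point x, binary label y)
Hypothesis = Callable[[float], int]
Learner = Callable[[Sequence[Example]], Hypothesis]


def threshold_learner(sample: Sequence[Example]) -> Hypothesis:
    """Hypothetical learner for thresholds h_t(x) = 1[x >= t].

    It returns the threshold at the smallest positively labeled point.
    The output depends only on the set of examples, not on their order,
    so the learner is permutation invariant.
    """
    positives = [x for x, y in sample if y == 1]
    t = min(positives) if positives else float("inf")
    return lambda x: int(x >= t)


def leave_one_out_error(learner: Learner, sample: Sequence[Example]) -> float:
    """Average mistake rate when each example is predicted from the others."""
    mistakes = 0
    for i, (held_x, held_y) in enumerate(sample):
        rest: List[Example] = list(sample[:i]) + list(sample[i + 1:])
        hypothesis = learner(rest)          # train without example i
        mistakes += int(hypothesis(held_x) != held_y)
    return mistakes / len(sample)


# Toy realizable sample (assumed data): labels follow a threshold in (0.4, 0.6].
sample = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
loo = leave_one_out_error(threshold_learner, sample)
print(loo)
```

Here the only mistake occurs when the smallest positive point is held out, so one of the four held-out predictions is wrong. For i.i.d. data, the expected leave-one-out error on n examples equals the expected risk of the learner trained on n - 1 examples; per the abstract, the paper's contribution is to turn this kind of quantity into high-probability risk bounds.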


