Towards Optimal Problem Dependent Generalization Error Bounds in Statistical Learning Theory

11/12/2020
by Yunbei Xu, et al.

We study problem-dependent rates, i.e., generalization errors that scale near-optimally with the variance, the effective loss, or the gradient norms evaluated at the "best hypothesis." We introduce a principled framework dubbed "uniform localized convergence," and characterize sharp problem-dependent rates for central statistical learning problems. From a methodological viewpoint, our framework resolves several fundamental limitations of existing uniform convergence and localization analysis approaches. It also provides improvements and some level of unification in the study of localized complexities, one-sided uniform inequalities, and sample-based iterative algorithms. In the so-called "slow rate" regime, we provide the first (moment-penalized) estimator that achieves the optimal variance-dependent rate for general "rich" classes; we also establish an improved loss-dependent rate for standard empirical risk minimization. In the "fast rate" regime, we establish finite-sample problem-dependent bounds that are comparable to precise asymptotics. In addition, we show that iterative algorithms such as gradient descent and first-order Expectation-Maximization can achieve optimal generalization error in several representative problems across non-convex learning, stochastic optimization, and learning with missing data.
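To make the notion of a "problem-dependent rate" concrete, the following LaTeX sketch contrasts a worst-case uniform bound with a variance-dependent (Bernstein-type) bound evaluated at the best hypothesis h*. The generic complexity term comp(H) and the exact forms below are illustrative assumptions, not the paper's theorems.

\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% Illustrative contrast (assumed forms, not the paper's exact statements).
Worst-case uniform rate over a class $\mathcal{H}$ with losses bounded by $B$,
where $\mathrm{comp}(\mathcal{H})$ is a generic complexity term:
\[
  \mathbb{E}\,\ell(h) - \widehat{\mathbb{E}}_n \ell(h)
  \;\lesssim\; B\sqrt{\frac{\mathrm{comp}(\mathcal{H}) + \log(1/\delta)}{n}}
  \qquad \text{for all } h \in \mathcal{H}.
\]
Variance-dependent (problem-dependent) flavor, with
$\sigma_*^2 = \operatorname{Var}[\ell(h^*)]$ evaluated at the best hypothesis
$h^* \in \arg\min_{h \in \mathcal{H}} \mathbb{E}\,\ell(h)$:
\[
  \mathbb{E}\,\ell(\widehat{h}) - \mathbb{E}\,\ell(h^*)
  \;\lesssim\; \sqrt{\frac{\sigma_*^2\left(\mathrm{comp}(\mathcal{H}) + \log(1/\delta)\right)}{n}}
  \;+\; \frac{B\left(\mathrm{comp}(\mathcal{H}) + \log(1/\delta)\right)}{n}.
\]
\end{document}

Similarly, the moment-penalized idea can be sketched in a few lines of Python in the spirit of sample-variance penalization over a finite hypothesis grid; the penalty constant c, the function name, and the example data are assumptions of this sketch, not the paper's estimator.

import numpy as np

def moment_penalized_select(losses, c=1.0):
    """Pick a hypothesis by penalized empirical risk.

    losses: array of shape (num_hypotheses, n), where losses[h, i] is the
    loss of hypothesis h on sample i.  Minimizes
        empirical mean + c * sqrt(sample variance / n),
    so high-variance hypotheses pay a Bernstein-style penalty.
    (Sketch only; constants and setup are assumed, not the paper's.)
    """
    losses = np.asarray(losses, dtype=float)
    n = losses.shape[1]
    means = losses.mean(axis=1)
    stds = losses.std(axis=1, ddof=1)  # sample standard deviation per hypothesis
    return int(np.argmin(means + c * stds / np.sqrt(n)))

# Example: hypothesis 1 has a slightly higher mean loss but much lower
# variance, so the penalized criterion can prefer it over hypothesis 0.
rng = np.random.default_rng(0)
losses = np.stack([rng.exponential(1.00, size=200),   # mean ~1.0, high variance
                   rng.normal(1.05, 0.1, size=200)])  # mean ~1.05, low variance
print(moment_penalized_select(losses, c=2.0))

The design intuition is that the penalty charges each hypothesis a price of order sqrt(Var/n), so the selected hypothesis inherits a variance-dependent rather than a range-dependent rate.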


Related research

- Improved Learning Rates for Stochastic Optimization: Two Theoretical Viewpoints (07/19/2021)
- Localized Complexities for Transductive Learning (11/26/2014)
- Generalization Bounds of SGLD for Non-convex Learning: Two Theoretical Viewpoints (07/19/2017)
- Quickly Finding the Best Linear Model in High Dimensions (07/03/2019)
- The Generalization Ability of Online Algorithms for Dependent Data (10/11/2011)
- Stability and Generalization for Markov Chain Stochastic Gradient Methods (09/16/2022)
- Generalization Bounds for Stochastic Gradient Descent via Localized ε-Covers (09/19/2022)
