Towards Understanding Generalization via Decomposing Excess Risk Dynamics

by Jiaye Teng, et al.

Generalization is one of the central issues in machine learning. However, traditional methods such as uniform convergence are not powerful enough to fully explain generalization, since they can yield vacuous bounds even in overparameterized linear regression regimes. An alternative is to analyze the generalization dynamics and derive algorithm-dependent bounds, e.g., via stability. Unfortunately, stability-based bounds still fall far short of explaining the remarkable generalization ability of neural networks, owing to their coarse-grained treatment of signal and noise. Inspired by the observation that neural networks converge slowly when fitting noise, we propose decomposing the excess risk dynamics and applying a stability-based bound only to the variance part (which measures how the model performs on pure noise). We provide two applications of the framework: a linear case (overparameterized linear regression with gradient descent) and a non-linear case (matrix recovery with gradient flow). Under the decomposition framework, the new bound accords better with the theoretical and empirical evidence than either the stability-based bound or the uniform convergence bound.
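The decomposition idea can be illustrated in the paper's linear case. The sketch below (my own toy construction, not the authors' code; all dimensions and the learning rate are assumptions) exploits the fact that for least squares, the gradient-descent update is linear in the labels when started from zero, so the iterate trained on noisy labels splits exactly into a "bias" trajectory trained on the clean signal plus a "variance" trajectory trained on pure noise:

```python
import numpy as np

rng = np.random.default_rng(0)

# Overparameterized setting: more features (d) than samples (n).
# These sizes and the noise level are illustrative assumptions.
n, d, sigma = 20, 100, 0.5
X = rng.normal(size=(n, d)) / np.sqrt(d)
w_star = rng.normal(size=d)
noise = sigma * rng.normal(size=n)
y = X @ w_star + noise  # observed labels = signal + noise

def gd(targets, lr=0.1, steps=500):
    """Gradient descent on least squares from zero initialization."""
    w = np.zeros(d)
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - targets) / n
    return w

# Because the update rule is linear in the targets (zero init, fixed X),
# the iterate on noisy labels decomposes exactly into a bias part
# (trained on the clean signal) plus a variance part (trained on pure
# noise) -- the variance part is where a stability bound is applied.
w_full = gd(y)
w_bias = gd(X @ w_star)
w_var = gd(noise)

print(np.allclose(w_full, w_bias + w_var))
```

The exact superposition `w_full = w_bias + w_var` is special to linear models with zero initialization; for the non-linear matrix-recovery case the paper controls the coupling between the two parts rather than relying on it vanishing.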






