SGLD-Based Information Criteria and the Over-Parameterized Regime

06/08/2023
by Haobo Chen, et al.

Double descent refers to the unexpected drop in the test loss of a learning algorithm beyond the interpolation threshold as over-parameterization increases, a phenomenon that classical information criteria fail to predict because of the limitations of the standard asymptotic approach. We update these analyses using the information risk minimization framework and provide an Akaike Information Criterion (AIC) and a Bayesian Information Criterion (BIC) for models learned by stochastic gradient Langevin dynamics (SGLD). Notably, the AIC and BIC penalty terms for SGLD correspond to specific information measures, namely the symmetrized KL information and the KL divergence. We extend this information-theoretic analysis to over-parameterized models by characterizing the SGLD-based BIC for the random feature model in the regime where the number of parameters p and the number of samples n tend to infinity with p/n fixed. Our experiments demonstrate that the refined SGLD-based BIC tracks the double-descent curve, providing meaningful guidance for model selection and revealing new insights into the behavior of SGLD learning algorithms in the over-parameterized regime.
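For context, the classical criteria referenced in the abstract penalize goodness of fit with a parameter-count term. For a model with k parameters fit to n samples by maximum likelihood, they take the standard textbook forms (stated here for reference, not taken from this paper):

```latex
\mathrm{AIC} = -2\log p(\mathcal{D} \mid \hat{\theta}_{\mathrm{MLE}}) + 2k,
\qquad
\mathrm{BIC} = -2\log p(\mathcal{D} \mid \hat{\theta}_{\mathrm{MLE}}) + k \log n
```

Per the abstract, the SGLD-based analysis replaces these parameter-count penalties with information measures: the symmetrized KL information in the AIC and the KL divergence in the BIC.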

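The SGLD learning rule underlying these criteria perturbs each (stochastic) gradient step with Gaussian noise. Below is a minimal NumPy sketch of one update, assuming a generic gradient oracle; the names eps (step size), beta (inverse temperature), and grad_fn are illustrative, not the paper's notation:

```python
import numpy as np

def sgld_step(theta, grad_fn, eps=1e-3, beta=1.0, rng=None):
    """One SGLD update: a gradient step plus injected Gaussian noise whose
    scale is set by the step size eps and the inverse temperature beta."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(size=theta.shape)
    return theta - eps * grad_fn(theta) + np.sqrt(2.0 * eps / beta) * noise

# Example: with the quadratic loss ||theta||^2 / 2, the iterates
# approximately sample the Gibbs posterior N(0, I / beta).
theta = np.zeros(3)
for _ in range(10_000):
    theta = sgld_step(theta, grad_fn=lambda t: t)
```

The noise term is what makes the iterates behave like samples from a Gibbs posterior rather than converging to a point estimate, which is why information measures over the resulting distribution appear in the penalty terms.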
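The over-parameterized analysis concerns the random feature model in the proportional regime where p/n is held fixed. The following sketch of ridgeless random feature regression exhibits the double-descent peak as p/n crosses 1; it is an illustrative setup (ReLU features, Gaussian data), not the paper's exact experiment:

```python
import numpy as np

def random_feature_fit(X_train, y_train, X_test, y_test, p, rng):
    """Minimum-norm least squares on p random ReLU features. The pinv
    solution interpolates the training data once p exceeds n, which is
    where the double-descent peak appears as p/n crosses 1."""
    d = X_train.shape[1]
    W = rng.normal(size=(d, p)) / np.sqrt(d)   # random first-layer weights
    F_train = np.maximum(X_train @ W, 0.0)     # ReLU random features
    F_test = np.maximum(X_test @ W, 0.0)
    w = np.linalg.pinv(F_train) @ y_train      # minimum-norm solution
    return np.mean((F_test @ w - y_test) ** 2) # test MSE

rng = np.random.default_rng(0)
n, d = 200, 20
w_star = rng.normal(size=d)                    # teacher weights
X, Xt = rng.normal(size=(n, d)), rng.normal(size=(1000, d))
y = X @ w_star + 0.1 * rng.normal(size=n)
yt = Xt @ w_star + 0.1 * rng.normal(size=1000)
for p in (50, 100, 200, 400, 800):             # sweep p/n across 1
    mse = random_feature_fit(X, y, Xt, yt, p, rng)
    print(f"p/n = {p / n:.2f}, test MSE = {mse:.3f}")
```

Sweeping p from below n to well above it typically shows the test error rising toward the interpolation threshold p = n and falling again beyond it; this is the curve the refined SGLD-based BIC is designed to track.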

