Phenomenology of Double Descent in Finite-Width Neural Networks

03/14/2022
by Sidak Pal Singh, et al.

'Double descent' describes how a model's generalization behaviour depends on the regime it belongs to: under- or over-parameterized. The current theoretical understanding of this phenomenon is based primarily on linear and kernel regression models, with informal parallels to neural networks drawn via the Neural Tangent Kernel. Such analyses therefore do not adequately capture the mechanisms behind double descent in finite-width neural networks, and they disregard crucial components, such as the choice of the loss function. We address these shortcomings by leveraging influence functions to derive suitable expressions for the population loss and its lower bound, while imposing minimal assumptions on the form of the parametric model. Our derived bounds bear an intimate connection to the spectrum of the Hessian at the optimum and, importantly, exhibit a double descent behaviour at the interpolation threshold. Building on our analysis, we further investigate how the loss function affects double descent, and thus uncover interesting properties of neural networks and their Hessian spectra near the interpolation threshold.
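The sketch below is a hypothetical illustration of the phenomenon the abstract describes, not the paper's method: it traces the classic double descent curve in a random-features regression model, where the test error of the minimum-norm least-squares fit peaks as the number of features p crosses the interpolation threshold p = n. All names, sizes, and parameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: noisy linear teacher in d dimensions (illustrative assumption).
d, n_train, n_test = 20, 100, 1000
w_star = rng.standard_normal(d) / np.sqrt(d)
X_train = rng.standard_normal((n_train, d))
X_test = rng.standard_normal((n_test, d))
y_train = X_train @ w_star + 0.1 * rng.standard_normal(n_train)
y_test = X_test @ w_star

def random_feature_risk(p, seed=0):
    """Test MSE of the min-norm least-squares fit on p random ReLU features."""
    rg = np.random.default_rng(seed)
    W = rg.standard_normal((d, p)) / np.sqrt(d)   # fixed random first layer
    phi_tr = np.maximum(X_train @ W, 0.0)         # ReLU features, train
    phi_te = np.maximum(X_test @ W, 0.0)          # ReLU features, test
    beta = np.linalg.pinv(phi_tr) @ y_train       # min-norm interpolant when p >= n
    return np.mean((phi_te @ beta - y_test) ** 2)

# Sweep the model size through the interpolation threshold p = n_train:
# the test error typically rises toward p = 100 and descends again beyond it.
for p in [10, 50, 90, 100, 110, 200, 500, 2000]:
    risk = np.mean([random_feature_risk(p, s) for s in range(5)])
    print(f"p = {p:5d}   test MSE = {risk:.4f}")
```

Near p = n_train the smallest nonzero singular values of the feature matrix shrink, which inflates the minimum-norm solution and the test error; this loosely mirrors the abstract's point that the behaviour of the bound is tied to the spectrum of the (here, linearized) curvature at the optimum.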


Related research

On the linearity of large non-linear models: when and why the tangent kernel is constant (10/02/2020). The goal of this work is to shed light on the remarkable phenomenon of t...

Double Trouble in Double Descent: Bias and Variance(s) in the Lazy Regime (03/02/2020). Deep neural networks can achieve remarkable generalization performances ...

Multi-scale Feature Learning Dynamics: Insights for Double Descent (12/06/2021). A key challenge in building theoretical foundations for deep learning is...

Do Deeper Convolutional Networks Perform Better? (10/19/2020). Over-parameterization is a recent topic of much interest in the machine ...

Asymptotics of Ridge Regression in Convolutional Models (03/08/2021). Understanding generalization and estimation error of estimators for simp...

Contrasting random and learned features in deep Bayesian linear regression (03/01/2022). Understanding how feature learning affects generalization is among the f...

Rethinking Parameter Counting in Deep Models: Effective Dimensionality Revisited (03/04/2020). Neural networks appear to have mysterious generalization properties when...
