Geometric Regularization from Overparameterization explains Double Descent and other findings

02/18/2022
by Nicholas J. Teague, et al.

The volume of the distribution of possible weight configurations associated with a given loss value may be the source of the implicit regularization that arises from overparameterization, owing to the contraction of volume with increasing dimension that geometric figures such as hyperspheres exhibit. This paper introduces geometric regularization and explores its potential applicability to several unexplained phenomena, including double descent, the differences between wide and deep networks, the benefits of He initialization and retained proximity during training, gradient confusion, fitness landscape properties, double descent in other learning paradigms, and other findings for overparameterized learning. Experiments are conducted by aggregating histograms of loss values corresponding to randomly sampled initializations in small setups; these find directional correlations between zero- or central-mode dominance and deviations in width, depth, and initialization distributions. Double descent is likely due to a regularization phase change that occurs when a training path reaches low enough loss that the contraction of loss manifold volume from a reduced range of potential weight sets is amplified by an overparameterized geometry.
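
As a reference point for the volume-contraction claim, here is a minimal sketch (an illustration, not code from the paper) that computes the volume of an n-ball of radius r, V_n(r) = pi^(n/2) r^n / Gamma(n/2 + 1): for unit radius the volume peaks near n = 5 and then contracts toward zero as dimension grows.

    import math

    def hypersphere_volume(n: int, r: float = 1.0) -> float:
        # Volume of an n-ball: pi^(n/2) * r^n / Gamma(n/2 + 1).
        return math.pi ** (n / 2) * r ** n / math.gamma(n / 2 + 1)

    # For r = 1 the volume peaks near n = 5 and then decays toward zero,
    # the contraction with increasing dimension that the abstract appeals to.
    for n in (1, 2, 5, 10, 20, 50, 100):
        print(f"n = {n:3d}   V_n(1) = {hypersphere_volume(n):.3e}")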
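
The experimental procedure can likewise be sketched in a few lines, under stated assumptions: sample many random initializations of a small network, record each one's loss on a fixed task, and aggregate a histogram; repeating this while varying width, depth, or the initialization distribution is what exposes the directional shifts in zero- versus central-mode dominance. The toy regression target, network shape, and helper name below are hypothetical stand-ins, not the paper's exact configuration.

    import numpy as np

    rng = np.random.default_rng(0)

    def init_loss(width: int, n_points: int = 32) -> float:
        # Mean squared error of one randomly initialized one-hidden-layer
        # ReLU network on a fixed toy regression target (an assumption,
        # standing in for the paper's small setups).
        x = np.linspace(-1.0, 1.0, n_points).reshape(-1, 1)
        y = np.sin(np.pi * x)
        # He-style initialization: zero-mean normal with variance 2 / fan_in.
        w1 = rng.normal(0.0, np.sqrt(2.0 / 1), size=(1, width))
        w2 = rng.normal(0.0, np.sqrt(2.0 / width), size=(width, 1))
        pred = np.maximum(x @ w1, 0.0) @ w2
        return float(np.mean((pred - y) ** 2))

    # Aggregate a histogram of loss values over many random initializations;
    # comparing such histograms across widths, depths, and initialization
    # scales is the kind of aggregation the abstract describes.
    losses = [init_loss(width=64) for _ in range(1000)]
    counts, edges = np.histogram(losses, bins=30)
    print(counts)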

Related research

02/26/2023 · Can we avoid Double Descent in Deep Neural Networks?
Finding the optimal size of deep learning models is very topical and of b...

03/02/2023 · Dodging the Sparse Double Descent
This paper presents an approach to addressing the issue of over-parametr...

02/20/2020 · Do We Need Zero Training Loss After Achieving Zero Training Error?
Overparameterized deep networks have the capacity to memorize training d...

05/25/2023 · Double Descent of Discrepancy: A Task-, Data-, and Model-Agnostic Phenomenon
In this paper, we studied two identically-trained neural networks (i.e. ...

09/21/2022 · Deep Double Descent via Smooth Interpolation
Overparameterized deep networks are known to be able to perfectly fit th...

12/11/2020 · Avoiding The Double Descent Phenomenon of Random Feature Models Using Hybrid Regularization
We demonstrate the ability of hybrid regularization methods to automatic...

08/31/2023 · The Quest of Finding the Antidote to Sparse Double Descent
In energy-efficient schemes, finding the optimal size of deep learning m...
