Relating Regularization and Generalization through the Intrinsic Dimension of Activations

11/23/2022
by   Bradley C. A. Brown, et al.

Given a pair of models with similar training set performance, it is natural to assume that the model with simpler internal representations will exhibit better generalization. In this work, we provide empirical evidence for this intuition through an analysis of the intrinsic dimension (ID) of model activations, which can be thought of as the minimal number of factors of variation in the model's representation of the data. First, we show that common regularization techniques uniformly decrease the last-layer ID (LLID) of validation set activations for image classification models, and we show how this strongly affects generalization performance. We also investigate how excessive regularization decreases a model's ability to extract features from data in earlier layers, leading to a negative effect on validation accuracy even while LLID continues to decrease and training accuracy remains near-perfect. Finally, we examine the LLID over the course of training of models that exhibit grokking. We observe that well after training accuracy saturates, when models "grok" and validation accuracy suddenly improves from random to perfect, there is a co-occurring sudden drop in LLID, thus providing more insight into the dynamics of sudden generalization.
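The abstract defines intrinsic dimension as the minimal number of factors of variation in a representation. The paper does not specify its estimator here, but a common choice for measuring the ID of a set of activations is the TwoNN estimator (Facco et al.), which uses only the ratio of each point's two nearest-neighbour distances. A minimal sketch, assuming activations are given as a NumPy array of shape (n_samples, n_features):

```python
import numpy as np

def two_nn_id(X):
    """Estimate intrinsic dimension with the TwoNN estimator.

    X: (n_samples, n_features) matrix of activations.
    Returns the maximum-likelihood ID under the Pareto model
    for the ratio mu = r2 / r1 of the two nearest-neighbour distances.
    """
    n = X.shape[0]
    # Squared pairwise distances; exclude self-distances on the diagonal.
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)
    # For each point, the two smallest distances (partial sort up to index 1).
    part = np.partition(d2, 1, axis=1)
    r1 = np.sqrt(part[:, 0])
    r2 = np.sqrt(part[:, 1])
    mu = r2 / r1
    # MLE: d_hat = n / sum(log mu_i).
    return n / np.sum(np.log(mu))
```

For example, points drawn from a 2-D Gaussian and linearly embedded in a 10-D ambient space yield an estimate close to 2, even though the raw feature dimension is 10. In the paper's setting, X would hold last-layer activations on the validation set, and the estimate would be tracked across regularization strengths or training steps.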


Related research

05/29/2019
Intrinsic dimension of data representations in deep neural networks
Deep neural networks progressively transform their inputs across multipl...

01/28/2022
With Greater Distance Comes Worse Performance: On the Perspective of Layer Utilization and Model Generalization
Generalization of deep neural networks remains one of the main open prob...

02/16/2023
Exploring the Representation Manifolds of Stable Diffusion Through the Lens of Intrinsic Dimension
Prompting has become an important mechanism by which users can more effe...

08/16/2023
It Ain't That Bad: Understanding the Mysterious Performance Drop in OOD Generalization for Generative Transformer Models
Generative Transformer-based models have achieved remarkable proficiency...

10/25/2022
Pruning's Effect on Generalization Through the Lens of Training and Regularization
Practitioners frequently observe that pruning improves model generalizat...

02/05/2018
Learning Compact Neural Networks with Regularization
We study the impact of regularization for learning neural networks. Our ...

05/30/2023
Stable Anisotropic Regularization
Given the success of Large Language Models (LLMs), there has been consid...
