The Low-Rank Simplicity Bias in Deep Networks

03/18/2021
by Minyoung Huh, et al.

Modern deep neural networks are highly over-parameterized relative to the data on which they are trained, yet they often generalize remarkably well. A flurry of recent work has asked: why do deep networks not overfit their training data? We investigate the hypothesis that deeper networks are implicitly biased toward lower-rank solutions, and that these are the solutions that generalize well. We prove, in the asymptotic case, that the percent volume of low effective-rank solutions increases monotonically as linear neural networks are made deeper, and we show empirically that this claim also holds for finite-width models. We further find empirically that a similar result holds for non-linear networks: deeper non-linear networks learn a feature space whose kernel has lower rank. Finally, we demonstrate how linear over-parameterization of deep non-linear models can be used to induce this low-rank bias, improving generalization without changing the effective model capacity. We evaluate on various model architectures and show that linearly over-parameterized models outperform existing baselines on image classification tasks, including ImageNet.
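As a concrete illustration of the two quantities the abstract refers to, the sketch below (a minimal PyTorch example, not the authors' code) computes the entropy-based effective rank of a weight matrix and shows one way to linearly over-parameterize a layer: factoring a single linear map into a product of linear layers with no nonlinearity in between, so the end-to-end function class is unchanged while the linear depth increases. The class name ExpandedLinear, the hidden width, and the factor depth are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of (1) entropy-based effective rank and (2) linear
# over-parameterization of a single layer. Illustrative only; names and
# hyperparameters are assumptions, not the authors' implementation.
import torch
import torch.nn as nn


def effective_rank(W: torch.Tensor) -> torch.Tensor:
    """Effective rank exp(H(p)), where p are the singular values of W
    normalized to sum to 1."""
    s = torch.linalg.svdvals(W)
    p = s / s.sum()
    # Clamp to avoid log(0) for numerically zero singular values.
    h = -(p * torch.log(p.clamp_min(1e-12))).sum()
    return torch.exp(h)


class ExpandedLinear(nn.Module):
    """A single linear map y = Wx expressed as a product of `depth` linear
    factors with no nonlinearity in between. The composed map is still a
    linear map of the same shape, so effective capacity is unchanged, but
    the deeper factorization is the kind of linear over-parameterization
    the abstract describes."""

    def __init__(self, in_features: int, out_features: int,
                 depth: int = 3, width: int = 256):
        super().__init__()
        dims = [in_features] + [width] * (depth - 1) + [out_features]
        self.factors = nn.Sequential(
            *[nn.Linear(dims[i], dims[i + 1], bias=False) for i in range(depth)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.factors(x)

    def collapsed_weight(self) -> torch.Tensor:
        """Multiply the factors back into one (out_features, in_features)
        matrix, e.g. to measure its effective rank or to deploy without
        the extra parameters."""
        W = None
        for layer in self.factors:
            W = layer.weight if W is None else layer.weight @ W
        return W


if __name__ == "__main__":
    layer = ExpandedLinear(128, 128, depth=4)
    print("effective rank:", effective_rank(layer.collapsed_weight()).item())
```

Because the factors contain no nonlinearity, the expanded layer can always be collapsed back into a single matrix after training, so the extra parameters cost nothing at inference time.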

Related research:

- Robust Training under Label Noise by Over-parameterization (02/28/2022)
- Algorithms for Efficiently Learning Low-Rank Neural Networks (02/02/2022)
- A Too-Good-to-be-True Prior to Reduce Shortcut Reliance (02/12/2021)
- Generalization Guarantees for Neural Networks via Harnessing the Low-rank Structure of the Jacobian (06/12/2019)
- An Adaptive Tangent Feature Perspective of Neural Networks (08/29/2023)
- Linear Stability Hypothesis and Rank Stratification for Nonlinear Models (11/21/2022)
- Capacity allocation through neural network layers (02/22/2019)
