The Low-Rank Simplicity Bias in Deep Networks

by   Minyoung Huh, et al.

Modern deep neural networks are highly over-parameterized compared to the data on which they are trained, yet they often generalize remarkably well. A flurry of recent work has asked: why do deep networks not overfit to their training data? We investigate the hypothesis that deeper nets are implicitly biased to find lower rank solutions and that these are the solutions that generalize well. We prove for the asymptotic case that the percent volume of low effective-rank solutions increases monotonically as linear neural networks are made deeper. We then show empirically that our claim holds true on finite width models. We further empirically find that a similar result holds for non-linear networks: deeper non-linear networks learn a feature space whose kernel has a lower rank. We further demonstrate how linear over-parameterization of deep non-linear models can be used to induce low-rank bias, improving generalization performance without changing the effective model capacity. We evaluate on various model architectures and demonstrate that linearly over-parameterized models outperform existing baselines on image classification tasks, including ImageNet.


page 6

page 7


Robust Training under Label Noise by Over-parameterization

Recently, over-parameterized deep networks, with increasingly more netwo...

Algorithms for Efficiently Learning Low-Rank Neural Networks

We study algorithms for learning low-rank neural networks – networks whe...

A Too-Good-to-be-True Prior to Reduce Shortcut Reliance

Despite their impressive performance in object recognition and other tas...

Generalization Guarantees for Neural Networks via Harnessing the Low-rank Structure of the Jacobian

Modern neural network architectures often generalize well despite contai...

Parallel Deep Neural Networks Have Zero Duality Gap

Training deep neural networks is a well-known highly non-convex problem....

Deep learning generalizes because the parameter-function map is biased towards simple functions

Deep neural networks generalize remarkably well without explicit regular...

Counterfactual Learning-to-Rank for Additive Metrics and Deep Models

Implicit feedback (e.g., clicks, dwell times) is an attractive source of...

Code Repositories