Implicit Regularization Towards Rank Minimization in ReLU Networks

01/30/2022
by   Nadav Timor, et al.

We study the conjectured relationship between the implicit regularization in neural networks, trained with gradient-based methods, and rank minimization of their weight matrices. Previously, it was proved that for linear networks (of depth 2 and vector-valued outputs), gradient flow (GF) w.r.t. the square loss acts as a rank minimization heuristic. However, understanding to what extent this generalizes to nonlinear networks is an open problem. In this paper, we focus on nonlinear ReLU networks, providing several new positive and negative results. On the negative side, we prove (and demonstrate empirically) that, unlike the linear case, GF on ReLU networks may no longer tend to minimize ranks, in a rather strong sense (even approximately, for "most" datasets of size 2). On the positive side, we reveal that ReLU networks of sufficient depth are provably biased towards low-rank solutions in several reasonable settings.
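The previously proved linear case the abstract refers to can be illustrated numerically. Below is a minimal sketch (all hyperparameters and variable names are illustrative, not taken from the paper): an overparameterized depth-2 linear network `W2 @ W1` is trained by plain gradient descent from small initialization on an underdetermined regression problem whose targets come from a rank-1 map. Although infinitely many weight products interpolate the data, the learned product ends up numerically rank-1, consistent with the rank-minimization heuristic described above.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 10, 3  # map R^d -> R^d, but only n < d training points: many interpolants exist

# Rank-1 target map A = u v^T, normalized so ||A|| = 1
u = rng.standard_normal(d); u /= np.linalg.norm(u)
v = rng.standard_normal(d); v /= np.linalg.norm(v)
A = np.outer(u, v)

X = rng.standard_normal((d, n))
Y = A @ X  # targets generated by the rank-1 map

# Depth-2 linear network f(x) = W2 @ W1 @ x, small initialization
# (small init is what drives the low-rank bias in the linear setting)
scale, lr, steps = 1e-2, 5e-2, 5000
W1 = scale * rng.standard_normal((d, d))
W2 = scale * rng.standard_normal((d, d))

for _ in range(steps):
    R = W2 @ W1 @ X - Y               # residuals on the training set
    W2 -= lr * R @ (W1 @ X).T / n     # gradient of the (scaled) square loss w.r.t. W2
    W1 -= lr * W2.T @ R @ X.T / n     # gradient of the (scaled) square loss w.r.t. W1

mse = np.mean((W2 @ W1 @ X - Y) ** 2)
s = np.linalg.svd(W2 @ W1, compute_uv=False)
print(mse, s[1] / s[0])  # loss is near zero; second singular value is far below the first
```

The point of the underdetermined setup (`n < d`) is that zero training loss does not force the product to equal `A`; gradient descent nevertheless selects an essentially rank-1 interpolant. The paper's negative results show that this picture can break down for nonlinear ReLU networks.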


