The Asymmetric Maximum Margin Bias of Quasi-Homogeneous Neural Networks

10/07/2022
by Daniel Kunin, et al.

In this work, we explore the maximum-margin bias of quasi-homogeneous neural networks trained with gradient flow on an exponential loss and past a point of separability. We introduce the class of quasi-homogeneous models, which is expressive enough to describe nearly all neural networks with homogeneous activations, even those with biases, residual connections, and normalization layers, while structured enough to enable geometric analysis of its gradient dynamics. Using this analysis, we generalize the existing results of maximum-margin bias for homogeneous networks to this richer class of models. We find that gradient flow implicitly favors a subset of the parameters, unlike in the case of a homogeneous model where all parameters are treated equally. We demonstrate through simple examples how this strong favoritism toward minimizing an asymmetric norm can degrade the robustness of quasi-homogeneous models. On the other hand, we conjecture that this norm-minimization discards, when possible, unnecessary higher-order parameters, reducing the model to a sparser parameterization. Lastly, by applying our theorem to sufficiently expressive neural networks with normalization layers, we reveal a universal mechanism behind the empirical phenomenon of Neural Collapse.
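
To make the setting concrete, the sketch below (illustrative only, not code from the paper) reproduces the classical linear case of this implicit bias: a linear predictor trained with plain gradient descent on the exponential loss over separable data, where the weight norm grows without bound while the normalized margin climbs toward the maximum L2 margin. The two-Gaussian-blob data, learning rate, and step counts are arbitrary choices made for the demonstration.

import numpy as np

# Toy linearly separable data: two Gaussian blobs labeled +1 / -1.
rng = np.random.default_rng(0)
n = 100
X_pos = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(n, 2))
X_neg = rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(n, 2))
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(n), -np.ones(n)])

# Linear model f(x) = w . x trained with the exponential loss
# L(w) = mean_i exp(-y_i * w . x_i) by plain gradient descent.
w = np.zeros(2)
lr = 0.1
for step in range(1, 200001):
    margins = y * (X @ w)
    losses = np.exp(-margins)
    grad = -(y[:, None] * X * losses[:, None]).mean(axis=0)
    w -= lr * grad
    if step % 50000 == 0:
        # Normalized margin min_i y_i w . x_i / ||w||_2 under the updated weights.
        m = y * (X @ w)
        norm_margin = m.min() / np.linalg.norm(w)
        print(f"step {step:6d}  ||w||_2 = {np.linalg.norm(w):7.2f}  "
              f"normalized margin = {norm_margin:.4f}")

Running this, ||w||_2 keeps growing while the normalized margin increases toward the maximum L2 margin of the data. In the quasi-homogeneous setting studied in the paper, the analogous limit instead minimizes an asymmetric norm that favors only a subset of the parameters, rather than treating all parameters equally as in the homogeneous case.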

Related research

10/06/2021
On Margin Maximization in Linear and ReLU Networks
The implicit bias of neural networks has been extensively studied in rec...

02/11/2020
Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss
Neural networks trained to minimize the logistic (a.k.a. cross-entropy) ...

05/17/2019
Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models
With an eye toward understanding complexity control in deep learning, we...

12/11/2020
The Implicit Bias for Adaptive Optimization Algorithms on Homogeneous Neural Networks
Despite their overwhelming capacity to overfit, deep neural networks tra...

10/24/2020
Inductive Bias of Gradient Descent for Exponentially Weight Normalized Smooth Homogeneous Neural Nets
We analyze the inductive bias of gradient descent for weight normalized ...

02/09/2022
Gradient Methods Provably Converge to Non-Robust Networks
Despite a great deal of research, it is still unclear why neural network...

02/21/2018
WQO dichotomy for 3-graphs
We investigate data-enriched models, like Petri nets with data, where ex...
