Deep Neural Collapse Is Provably Optimal for the Deep Unconstrained Features Model

05/22/2023
by Peter Súkeník, et al.

Neural collapse (NC) refers to the surprising structure of the last layer of deep neural networks in the terminal phase of gradient descent training. Recently, increasing experimental evidence has pointed to the propagation of NC to earlier layers of neural networks. However, while NC in the last layer is well studied theoretically, much less is known about its multi-layered counterpart, deep neural collapse (DNC). In particular, existing work focuses either on linear layers or only on the last two layers, at the price of an extra assumption. Our paper fills this gap by generalizing the established analytical framework for NC, the unconstrained features model, to multiple non-linear layers. Our key technical contribution is to show that, in a deep unconstrained features model, the unique global optimum for binary classification exhibits all the properties typical of DNC. This explains the existing experimental evidence of DNC. We also empirically show that (i) optimizing deep unconstrained features models via gradient descent yields solutions that agree well with our theory, and (ii) trained networks recover unconstrained features suitable for the occurrence of DNC, thus supporting the validity of this modeling principle.
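To make point (i) concrete, below is a minimal sketch of a deep unconstrained features model optimized by gradient descent on a binary task. The depth, widths, regularized MSE objective, and the diagnostics reported at the end are illustrative assumptions, not the authors' exact experimental setup.

```python
# Minimal sketch of a deep unconstrained features model (DUFM) optimized by
# gradient descent on a binary task, for illustration only. Depth, widths,
# the regularized MSE objective, and the diagnostics are assumed choices,
# not the authors' exact setup.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

K, n, d, L = 2, 50, 32, 4        # classes, samples per class, width, layers
N = K * n
lam = 5e-3                        # weight/feature decay (assumed)

# First-layer features are free optimization variables ("unconstrained features").
H1 = torch.randn(d, N, requires_grad=True)
Ws = [(0.1 * torch.randn(d, d)).requires_grad_(True) for _ in range(L - 1)]
WL = (0.1 * torch.randn(K, d)).requires_grad_(True)   # final linear classifier
Y = torch.zeros(K, N)
Y[0, :n] = 1.0                    # first n columns belong to class 0
Y[1, n:] = 1.0                    # remaining n columns belong to class 1

opt = torch.optim.Adam([H1, WL] + Ws, lr=1e-2)

def forward(H1, Ws, WL):
    """Propagate the free features through L-1 ReLU layers and a linear head."""
    feats, H = [H1], H1
    for W in Ws:
        H = F.relu(W @ H)
        feats.append(H)
    return WL @ H, feats          # logits (K, N), per-layer features

for step in range(5000):
    opt.zero_grad()
    logits, _ = forward(H1, Ws, WL)
    reg = lam * (H1.pow(2).sum() + WL.pow(2).sum() + sum(W.pow(2).sum() for W in Ws))
    loss = 0.5 * (logits - Y).pow(2).mean() + reg
    loss.backward()
    opt.step()

# Rough DNC diagnostics: within-class variability (should shrink toward 0 under
# DNC) and the alignment of the two class means at every layer.
with torch.no_grad():
    _, feats = forward(H1, Ws, WL)
    for l, H in enumerate(feats, start=1):
        mu0, mu1 = H[:, :n].mean(dim=1), H[:, n:].mean(dim=1)           # class means
        centered = H - torch.cat([mu0.unsqueeze(1).expand(d, n),
                                  mu1.unsqueeze(1).expand(d, n)], dim=1)
        nc1 = centered.pow(2).sum() / H.pow(2).sum()                     # within-class variation
        cos = F.cosine_similarity(mu0, mu1, dim=0)                       # class-mean alignment
        print(f"layer {l}: within-class variation {nc1:.3e}, class-mean cosine {cos:+.3f}")
```

At convergence, the within-class variation at every layer should be close to zero; how the class means align across layers is exactly the structure the paper characterizes theoretically.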


