Freeze and Chaos for DNNs: an NTK view of Batch Normalization, Checkerboard and Boundary Effects

07/11/2019
by Arthur Jacot et al.

In this paper, we analyze a number of architectural features of Deep Neural Networks (DNNs), using the so-called Neural Tangent Kernel (NTK). The NTK describes the training trajectory and generalization of DNNs in the infinite-width limit. In this limit, we show that for (fully-connected) DNNs, as the depth grows, two regimes appear: "freeze" (also known as "order"), where the (scaled) NTK converges to a constant (slowing convergence), and "chaos", where it converges to a Kronecker delta (limiting generalization). We show that when using the scaled ReLU as a nonlinearity, we naturally end up in the "freeze" regime. We show that Batch Normalization (BN) avoids the freeze regime by reducing the importance of the constant mode in the NTK; a similar effect is obtained by normalizing the nonlinearity, which moves the network to the chaotic regime. We uncover the same "freeze" and "chaos" modes in Deep Deconvolutional Networks (DC-NNs), where the "freeze" regime is additionally characterized by checkerboard patterns in image space on top of the constant modes in input space. Finally, we introduce a new NTK-based parametrization that eliminates border artifacts, and we propose a layer-dependent learning rate to improve the convergence of DC-NNs. We illustrate our findings by training DCGANs with our setup: when trained in the "freeze" regime, the generator collapses to a checkerboard mode. We also demonstrate numerically that this collapse can be avoided, and good-quality samples obtained, by tuning the nonlinearity to reach the "chaos" regime (without using batch normalization).
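The central object of the abstract is the NTK, Theta(x, x') = <grad_theta f(x), grad_theta f(x')>. Below is a minimal, self-contained sketch (not code from the paper) that computes the empirical NTK of a finite-width fully-connected network with the scaled ReLU, written with JAX. The 1/sqrt(fan-in) scaling follows the standard NTK parametrization; the depth, width, and all names are illustrative assumptions.

    import jax
    import jax.numpy as jnp

    def init_params(key, widths):
        # Standard-normal weights; the 1/sqrt(fan-in) NTK scaling is
        # applied in the forward pass rather than at initialization.
        params = []
        for d_in, d_out in zip(widths[:-1], widths[1:]):
            key, sub = jax.random.split(key)
            params.append(jax.random.normal(sub, (d_in, d_out)))
        return params

    def forward(params, x):
        # Fully-connected net in the NTK parametrization with the scaled
        # ReLU sqrt(2) * max(0, .), which preserves activation variance.
        h = x
        for W in params[:-1]:
            h = jnp.sqrt(2.0) * jax.nn.relu(h @ W / jnp.sqrt(W.shape[0]))
        return (h @ params[-1] / jnp.sqrt(params[-1].shape[0]))[0]

    def empirical_ntk(params, x1, x2):
        # Theta(x1, x2) = <grad_theta f(x1), grad_theta f(x2)>, summed
        # over all weight matrices.
        g1 = jax.grad(forward)(params, x1)
        g2 = jax.grad(forward)(params, x2)
        return sum(jnp.vdot(a, b) for a, b in zip(g1, g2))

    key = jax.random.PRNGKey(0)
    widths = [16] + [512] * 8 + [1]   # depth and width are illustrative
    params = init_params(key, widths)
    x1 = jax.random.normal(jax.random.PRNGKey(1), (16,))
    x2 = jax.random.normal(jax.random.PRNGKey(2), (16,))
    x1, x2 = x1 / jnp.linalg.norm(x1), x2 / jnp.linalg.norm(x2)
    print(empirical_ntk(params, x1, x2) / empirical_ntk(params, x1, x1))

If the abstract's characterization holds, the printed ratio Theta(x1, x2)/Theta(x1, x1) should drift towards 1 as depth grows under the scaled ReLU (the "freeze" regime, where the kernel becomes nearly constant), whereas in the "chaos" regime it would instead decay towards 0 for distinct inputs.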
