Towards Understanding the Condensation of Two-layer Neural Networks at Initial Training

05/25/2021
by Zhi-Qin John Xu, et al.

It is important to study what implicit regularization is imposed on the loss function during training that leads over-parameterized neural networks (NNs) to good performance on real datasets. Empirically, existing works have shown that the weights of NNs condense on isolated orientations when the initialization is small. This condensation implies that the NN learns features from the training data and is effectively a much smaller network. In this work, we show that the singularity of the activation function at the origin is a key factor in understanding condensation at the initial training stage. Our experiments suggest that the maximal number of condensed orientations is twice the singularity order. Our theoretical analysis confirms the experiments in two cases: one for activation functions with a first-order singularity, and the other for one-dimensional input. This work takes a step towards understanding how small initialization implicitly leads NNs to condensation at initial training, which is crucial for understanding the training and learning of deep NNs.
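For intuition, the sketch below illustrates the condensation phenomenon in its simplest form: a two-layer network f(x) = sum_j a_j tanh(w_j . x) trained from a small initialization, after which the normalized input weights w_j / ||w_j|| cluster around a few isolated orientations. This is a minimal illustration, not the authors' code; the width, learning rate, toy dataset, and step count are arbitrary choices. For tanh, whose singularity order at the origin is one (tanh'(0) != 0), the bound stated in the abstract predicts at most two condensed orientations.

```python
# Minimal sketch (not the authors' code) of condensation under small
# initialization for a two-layer tanh network. All hyperparameters and
# the toy dataset are illustrative assumptions.
import torch

torch.manual_seed(0)

d, m, n = 2, 100, 40                       # input dim, width, sample count
X = torch.randn(n, d)                      # toy inputs
y = torch.sin(X.sum(dim=1, keepdim=True))  # arbitrary smooth target

# Small initialization: scale well below the usual 1/sqrt(d) regime.
W = (0.01 * torch.randn(m, d)).requires_grad_()
a = (0.01 * torch.randn(m, 1)).requires_grad_()

opt = torch.optim.SGD([W, a], lr=0.05)
for step in range(20000):
    opt.zero_grad()
    loss = ((torch.tanh(X @ W.t()) @ a - y) ** 2).mean()
    loss.backward()
    opt.step()

# Condensation check: the angles of the normalized input-weight vectors
# should cluster around a small number of isolated values (at most two
# here, per the "twice the singularity order" claim for tanh).
with torch.no_grad():
    dirs = W / W.norm(dim=1, keepdim=True)
    angles = torch.atan2(dirs[:, 1], dirs[:, 0])
    print(sorted(round(t.item(), 3) for t in angles))
```

Printing the sorted angles (rather than plotting) keeps the sketch dependency-free; with small initialization the printed values should collapse into one or two tight clusters, whereas a standard-scale initialization leaves them spread roughly uniformly.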

