Early Neuron Alignment in Two-layer ReLU Networks with Small Initialization

07/24/2023
by Hancheng Min et al.

This paper studies the problem of training a two-layer ReLU network for binary classification using gradient flow with small initialization. We consider a training dataset with well-separated input vectors: any pair of inputs with the same label is positively correlated, and any pair with different labels is negatively correlated. Our analysis shows that, during the early phase of training, each neuron in the first layer aligns with either the positive data or the negative data, depending on the sign of its corresponding second-layer weight. A careful analysis of the neurons' directional dynamics allows us to provide an 𝒪(log n / √μ) upper bound on the time it takes for all neurons to achieve good alignment with the input data, where n is the number of data points and μ measures how well the data are separated. After the early alignment phase, the loss converges to zero at a rate of 𝒪(1/t), and the weight matrix of the first layer is approximately low-rank. Numerical experiments on the MNIST dataset illustrate our theoretical findings.
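The following minimal Python sketch (not the authors' code) illustrates the setting described above: gradient descent, used here as a discretization of gradient flow, trains a two-layer ReLU network from a small balanced initialization on well-separated synthetic data, then reports how first-layer neurons align with the positive or negative data mean depending on the sign of their second-layer weight, and the approximate rank of the first-layer weight matrix. All hyperparameters (dimension, cluster noise, initialization scale, step size, step count) are illustrative assumptions.

```python
# Sketch of the setup: two-layer ReLU network, logistic loss, small balanced
# initialization, well-separated data clustered around +/- a unit vector.
import numpy as np

rng = np.random.default_rng(0)
d, n, m = 20, 40, 50               # input dim, data points, hidden neurons
init_scale, lr, steps = 1e-4, 0.5, 5000

# Well-separated data: same-label pairs positively correlated,
# different-label pairs negatively correlated.
mu_vec = rng.standard_normal(d); mu_vec /= np.linalg.norm(mu_vec)
y = np.repeat([1.0, -1.0], n // 2)
X = y[:, None] * mu_vec + 0.1 * rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)

# f(x) = v^T relu(W x), with balanced small initialization |v_j| = ||w_j||-ish.
W = init_scale * rng.standard_normal((m, d))
v = init_scale * rng.choice([-1.0, 1.0], m)

for t in range(steps):
    H = np.maximum(X @ W.T, 0.0)               # hidden activations, (n, m)
    f = H @ v                                  # network outputs, (n,)
    # d/df of the mean logistic loss log(1 + exp(-y f)), clipped for stability.
    g = -y / (1.0 + np.exp(np.clip(y * f, -50, 50)))
    dv = H.T @ g / n
    dW = ((g[:, None] * (H > 0)) * v).T @ X / n
    v -= lr * dv
    W -= lr * dW

# Early alignment: neurons with v_j > 0 should point toward the positive data,
# neurons with v_j < 0 toward the negative data.
x_pos = X[y > 0].mean(0); x_neg = X[y < 0].mean(0)
dirs = W / (np.linalg.norm(W, axis=1, keepdims=True) + 1e-12)
cos_pos = dirs @ (x_pos / np.linalg.norm(x_pos))
cos_neg = dirs @ (x_neg / np.linalg.norm(x_neg))
print("mean alignment of v_j>0 neurons with positive mean:", cos_pos[v > 0].mean())
print("mean alignment of v_j<0 neurons with negative mean:", cos_neg[v < 0].mean())
# Approximate low-rankness of the first-layer weights after training.
sv = np.linalg.svd(W, compute_uv=False)
print("effective rank of W:", int((sv > 1e-3 * sv[0]).sum()))
```

With these (assumed) settings, both mean alignments come out close to 1 and the effective rank of W is small, consistent with the alignment and low-rank claims of the abstract; none of this replaces the paper's quantitative bounds.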


Related research

- 06/10/2023 · Learning a Neuron by a Shallow ReLU Network: Dynamics and Implicit Bias for Correlated Inputs
- 09/27/2022 · Magnitude and Angle Dynamics in Training Single ReLU Neurons
- 02/15/2022 · Random Feature Amplification: Feature Learning and Generalization in Neural Networks
- 05/18/2022 · On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias
- 06/14/2020 · Global Convergence of Sobolev Training for Overparametrized Neural Networks
- 10/09/2020 · Neural Random Projection: From the Initial Task To the Input Similarity Problem
- 06/16/2023 · Training shallow ReLU networks on noisy data using hinge loss: when do we overfit and is it benign?
