1 Introduction
Neural networks have been successfully applied in many fields. These include image classification in computer vision (Krizhevsky et al., 2012), speech recognition (Hinton et al., 2012), natural language translation (Wu et al., 2016), and superhuman performance in the game of Go (Silver et al., 2016). Modern neural networks are often severely overparameterized or overspecified. Overparameterization means that the number of parameters is much larger than the number of training data. Overspecification means that the number of neurons in a network is much larger than needed. It has been reported that the wider a neural network is, the easier it is to train (Livni et al., 2014; Safran and Shamir, 2016; Nguyen and Hein, 2017). In general, neural networks are trained from random initialization by first- or second-order gradient-based optimization methods. Almost all gradient-based optimization methods stem from backpropagation
(Rumelhart et al., 1985) and the stochastic gradient descent (SGD) method
(Robbins and Monro, 1951). Many variants of vanilla SGD have been proposed, for example, AdaGrad (Duchi et al., 2011), RMSProp (Hinton, 2014), Adam (Kingma and Ba, 2015), AMSGrad (Reddi et al., 2019), and L-BFGS (Byrd et al., 1995), to name just a few. Different optimization methods have different convergence properties, and it is still far from clear how different optimization methods affect the performance of trained neural networks. Nonetheless, how the optimization process is started plays a crucial role in the success of training. A properly chosen weight initialization can drastically improve the training performance and enable the training of deep neural networks; see, for example, (LeCun et al., 1998; Glorot and Bengio, 2010; Saxe et al., 2014; He et al., 2015; Mishkin and Matas, 2016), and see (Lu et al., 2019) for more recent work. Among them, for rectified linear unit (ReLU) neural networks, the 'He initialization'
(He et al., 2015) is one of the most commonly used initialization methods, due to its success in a visual recognition challenge. There are several theoretical works showing that, under various assumptions, overparameterized neural networks can perfectly interpolate the training data. For the shallow neural network setting, see
(Oymak and Soltanolkotabi, 2019; Soltanolkotabi et al., 2019; Du et al., 2018b; Li and Liang, 2018). For the deep neural network setting, see (Du et al., 2018a; Zou et al., 2018; Allen-Zhu et al., 2018). Hence, overparameterization can be viewed as a sufficient condition for minimizing the training loss. Despite this theoretical progress, there still exists a huge gap between existing theories and empirical observations in terms of the level of overparameterization. To illustrate this gap, let us consider approximating a simple target function with a shallow ReLU network using 10 training data points, drawn uniformly at random. To interpolate all 10 data points, the best existing theoretical condition (Oymak and Soltanolkotabi, 2019) would require a width of 100 in this case. Figure 1 shows the convergence of the root mean square errors (RMSE) on the training data with respect to the number of epochs for five independent simulations. On the left, the results for width 10 are shown; we observe that all five training losses converge to zero as the number of epochs increases. Bridging this gap in the degree of overparameterization remains an ongoing challenge.
On the other hand, we know that the target function can be exactly represented by only two ReLU neurons. Thus, we show the results of width 2 on the right of Figure 1. In contrast to the theoretical guarantee, we observe that only one out of five simulations achieves zero training error. It turns out that with probability greater than 0.43, the network of width 2 fails to be trained successfully (Theorem 4.1); see also (Lu et al., 2019).
In this paper, we study the trainability of ReLU networks, a necessary condition for successful training. Suppose a learning task requires a ReLU network to have at least a certain number of active neurons. Given the learning task, the training is said to be successful if the trained network has at least that many active neurons and produces a small training loss. Also, a network is said to be trainable if its number of active neurons is greater than or equal to the required number. We first show that, in order to achieve a successful training, an initialized network must already have at least the required number of active neurons. This implies that a network being trainable is a necessary condition for successful training. If an initialized ReLU network is not trainable, the training will not be successful, regardless of which gradient-based optimization method is selected. Due to the random initialization of weights and biases, however, it is unlikely that all neurons are active at the beginning of the training. We thus study the probability distribution of the number of active neurons at the initialization. Then, we introduce the notion of trainability of ReLU networks: we refer to the probability of a network being trainable as its trainability. The trainability serves as an upper bound on the probability of successful training, and it can be calculated from the probability distribution of the number of active neurons. Furthermore, by showing that overparameterization can be understood within the frame of overspecification, we prove that overparameterization is both a necessary and a sufficient condition for minimizing the training loss, i.e., interpolating all training data.
In practice, it is important to maintain a high trainability for successful training. To secure a high trainability, overspecification is inevitable. From this perspective, the zero-bias initialization should be preferred over the random-bias initialization. However, the zero-bias initialization locates all neurons at the origin. This can make the training slower and often leads to a spurious local minimum, especially for a learning task which requires neurons to be evenly distributed; see, for example, Figure 3 in Section 5. On the other hand, if the random-bias initialization is used, all neurons are randomly distributed over the entire domain and some neurons will never be activated. Thus, in order to maintain a high trainability, the network needs to be severely overparameterized. If not, the trained network loses accuracy in the parts of the domain where neurons are dead. In order to overcome these difficulties, we propose a new data-dependent initialization method for the overparameterized setting, where the width is greater than or equal to the number of training data. By adopting the trainability perspective, the proposed method is designed to avoid both the clustered neuron problem and the dying ReLU neuron problem at the same time. We remark that the idea of data-dependent initialization is not new; see (Ioffe and Szegedy, 2015; Krähenbühl et al., 2015; Salimans and Kingma, 2016). However, our method is specialized to the overparameterized setting.
The rest of this paper is organized as follows. Upon presenting the mathematical setup in Section 2, we discuss the probability distribution of the number of active neurons in Section 3. In Section 4, we present the trainability of ReLU networks. A new data-dependent initialization is introduced in Section 5. Numerical examples are provided in Section 6, before the conclusion in Section 7.
2 Mathematical Setup
Let $\mathcal{N}: \mathbb{R}^{d} \to \mathbb{R}^{n_L}$ be a feedforward neural network with $L$ layers and $n_\ell$ neurons in the $\ell$-th layer ($n_0 = d$, $1 \le \ell \le L$). For $1 \le \ell \le L$, the weight matrix and the bias vector in the $\ell$-th layer are denoted by $W^\ell \in \mathbb{R}^{n_\ell \times n_{\ell-1}}$ and $b^\ell \in \mathbb{R}^{n_\ell}$, respectively; $n_\ell$ is called the width of the $\ell$-th layer. We also denote the input by $x$ and the output at the $\ell$-th layer by $z^\ell(x)$. Given an activation function $\phi$ which is applied elementwise, the feedforward neural network is defined by

$$z^\ell(x) = W^\ell \phi\big(z^{\ell-1}(x)\big) + b^\ell, \qquad 2 \le \ell \le L, \tag{1}$$

and $z^1(x) = W^1 x + b^1$. Note that $\mathcal{N}$ is called an $(L-1)$-hidden-layer neural network or an $L$-layer neural network. Also, $\phi(z^\ell_j(x))$, $1 \le j \le n_\ell$, is called a neuron or a unit in the $\ell$-th hidden layer. We use $\vec{n} = (n_0, n_1, \dots, n_L)$ to describe a network architecture.
Let $\theta$ be the collection of all weight matrices and bias vectors, i.e., $\theta = \{(W^\ell, b^\ell)\}_{\ell=1}^{L}$. To emphasize the dependency on $\theta$, we often denote the neural network by $\mathcal{N}(x;\theta)$. In this paper, the rectified linear unit (ReLU) is employed as the activation function, i.e., $\phi(x) = \max\{x, 0\}$, applied elementwise.
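The definitions above can be made concrete with a short sketch. The following minimal NumPy implementation (our own illustration, not the paper's code) evaluates a feedforward ReLU network and, as a sanity check, builds the classic two-neuron representation of the absolute value, $|x| = \phi(x) + \phi(-x)$.

```python
import numpy as np

def relu(z):
    """Elementwise ReLU: phi(z) = max(z, 0)."""
    return np.maximum(z, 0.0)

def forward(x, weights, biases):
    """Evaluate a feedforward ReLU network: ReLU is applied between
    layers but not after the last (output) layer, as in (1)."""
    z = x
    for W, b in zip(weights[:-1], biases[:-1]):
        z = relu(W @ z + b)
    return weights[-1] @ z + biases[-1]

# Sanity check: a one-hidden-layer network with two neurons computing
# |x| = relu(x) + relu(-x).
W1 = np.array([[1.0], [-1.0]])   # 2 hidden neurons, input dimension 1
b1 = np.zeros(2)
W2 = np.array([[1.0, 1.0]])      # output layer sums the two neurons
b2 = np.zeros(1)

y = forward(np.array([-3.0]), [W1, W2], [b1, b2])   # y[0] == 3.0
```

In the notation above, this network has architecture $(1, 2, 1)$ and every hidden neuron is active, since neither neuron is constant on any interval containing the origin.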
In many machine learning applications, the goal is to train a neural network using a set of training data $\{(x_i, y_i)\}_{i=1}^{m}$. Each datum is a pair of an input $x_i \in \mathcal{X}$ and an output $y_i \in \mathcal{Y}$, where $\mathcal{X}$ is the input space and $\mathcal{Y}$ is the output space. In order to measure the discrepancy between a prediction and an output, we introduce a loss metric $\ell(\cdot,\cdot)$ to define a loss function $\mathcal{L}(\theta)$:

$$\mathcal{L}(\theta) = \frac{1}{m}\sum_{i=1}^{m} \ell\big(\mathcal{N}(x_i;\theta),\, y_i\big). \tag{2}$$

For example, the squared, logistic, hinge, or cross-entropy loss is commonly employed. We then seek to find $\theta^*$ which minimizes the loss function $\mathcal{L}(\theta)$. In general, a gradient-based optimization method is employed for the training. In its most basic form, given an initial value $\theta^{(0)}$, the parameters are updated according to

$$\theta^{(t+1)} = \theta^{(t)} - \eta\, \nabla_\theta \mathcal{L}\big(\theta^{(t)}\big),$$

where $\eta$ is the learning rate.
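As an illustration of this update rule, the sketch below (our own toy example, not one of the paper's experiments) trains a shallow ReLU network on a small 1-D regression problem with plain full-batch gradient descent, using manually derived gradients of the squared loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shallow ReLU model f(x) = sum_j a_j * relu(w_j * x + b_j), width 8.
n = 8
w, b, a = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)

X = np.linspace(-1.0, 1.0, 20)   # toy 1-D training inputs
Y = np.abs(X)                    # toy targets

def loss(w, b, a):
    pred = np.maximum(np.outer(X, w) + b, 0.0) @ a
    return np.mean((pred - Y) ** 2)

lr = 0.05
loss_start = loss(w, b, a)
for _ in range(500):
    pre = np.outer(X, w) + b                     # (20, n) pre-activations
    act = np.maximum(pre, 0.0)
    err = act @ a - Y                            # residuals
    grad_a = 2.0 * act.T @ err / len(X)
    back = 2.0 * (err[:, None] * a) * (pre > 0)  # chain rule through ReLU
    grad_w = (back * X[:, None]).mean(axis=0)
    grad_b = back.mean(axis=0)
    a -= lr * grad_a
    w -= lr * grad_w
    b -= lr * grad_b

loss_end = loss(w, b, a)         # should be smaller than loss_start
```

Note that the gradient of the ReLU is taken to be zero at negative pre-activations, which is why neurons whose pre-activations are negative on every training input receive no gradient signal at all.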
2.1 Weights and Biases Initialization and Data Normalization
Gradientbased optimization is a popular choice for training a neural network and requires weights and biases to be initialized in the first place. How to initialize the network plays a crucial role in the success of the training. Typically, the weight matrices are randomly initialized from probability distributions.
In this paper, we consider the following weight and bias initialization schemes. One is the normal initialization: all weights and/or biases in the $\ell$-th layer are independently initialized from zero-mean normal distributions,

$$W^\ell_{j,:} \sim N\big(0,\, \sigma_\ell^2 I_{n_{\ell-1}}\big), \qquad b^\ell_j \sim N\big(0,\, \sigma_\ell^2\big), \tag{3}$$

where $I_{n_{\ell-1}}$ is the identity matrix of size $n_{\ell-1}$. When $\sigma_\ell^2 = 2/n_{\ell-1}$, it is known as the 'He initialization' (He et al., 2015), one of the most popular initialization methods for ReLU networks. The other is the uniform distribution on the unit hypersphere: each row of either $W^\ell$ (without bias) or the augmented matrix $[W^\ell\ b^\ell]$ (with bias) is independently initialized from the uniform distribution on the corresponding unit hypersphere,

$$\big[W^\ell_{j,:}\ \ b^\ell_j\big] \sim \mathrm{Unif}\big(\mathbb{S}^{n_{\ell-1}}\big). \tag{4}$$
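The two schemes can be sampled as follows. This is a hedged sketch: the variance $2/\text{fan-in}$ for the He-style normal scheme and the Gaussian-normalization trick for the hypersphere scheme are standard choices, though the paper's exact parameterization may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def normal_init(fan_in, fan_out, sigma2, bias=True):
    """Normal initialization (3): each entry of W (and of b, when used)
    is an independent zero-mean Gaussian.  sigma2 = 2 / fan_in gives a
    He-style scheme."""
    W = rng.normal(0.0, np.sqrt(sigma2), size=(fan_out, fan_in))
    b = rng.normal(0.0, np.sqrt(sigma2), size=fan_out) if bias else np.zeros(fan_out)
    return W, b

def sphere_init(fan_in, fan_out, bias=True):
    """Unit-hypersphere initialization (4): each row of [W, b] (or of W
    alone, without bias) is uniform on the unit sphere, obtained by
    normalizing a standard Gaussian vector."""
    dim = fan_in + 1 if bias else fan_in
    V = rng.normal(size=(fan_out, dim))
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    return (V[:, :-1], V[:, -1]) if bias else (V, np.zeros(fan_out))

W, b = sphere_init(3, 5)
rows = np.hstack([W, b[:, None]])   # each row has unit Euclidean norm
```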
Throughout this paper, we assume that the training input domain is the closed ball $B^d(r) = \{x \in \mathbb{R}^d : \|x\| \le r\}$ for some $r > 0$. In practice, the training data are often normalized to have mean zero and variance 1. Given a training data set, the normalization makes $\|x_i\| \le 1$ for all $i$, which corresponds to setting $r = 1$. The output of the 1st layer can be written as $z^1(x) = \tilde{W}^1 \tilde{x}$, where $\tilde{W}^1 = [W^1\ b^1]$ and $\tilde{x} = [x^\top, 1]^\top$. Many theoretical works (Allen-Zhu et al., 2018; Du et al., 2018a; Li and Liang, 2018; Soltanolkotabi et al., 2019; Zou et al., 2018) assume the inputs have unit norm and set the last entry of $\tilde{x}$ to be a positive constant.

2.2 Dying ReLU and Born Dead Probability
The dying ReLU refers to the problem in which ReLU neurons become inactive and output only a constant for any input. We say that a ReLU neuron in the $\ell$-th hidden layer is dead on the domain if it is a constant function there, i.e., there exists a constant $c$ such that the neuron's output equals $c$ for every input in the domain. Also, a ReLU neuron is said to be born dead (BD) if it is dead at the initialization. In contrast, a ReLU neuron is said to be active if it is not a constant function on the domain. The notion of born death was introduced in (Lu et al., 2019), where a ReLU network is said to be BD if there exists a layer in which all neurons are BD. We refer to the probability that a ReLU neuron is BD as the born dead probability (BDP) of a ReLU neuron.
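The BDP of a first-layer neuron can be estimated numerically. The sketch below assumes the training input domain is the closed unit ball, so that a neuron $\phi(\langle w, x\rangle + b)$ is constant on the domain exactly when $\|w\| + b \le 0$; under a He-style initialization with random bias, this yields a BDP of about $1/4$ in one dimension. This setup is our own illustration and need not match the paper's exact constants.

```python
import numpy as np

rng = np.random.default_rng(0)

def bdp_first_layer(d, n_trials=200_000):
    """Monte Carlo estimate of the BDP of a single first-layer neuron,
    assuming the input domain is the closed unit ball in R^d and a
    He-style init with random bias: w, b ~ N(0, 2/d).  The neuron
    relu(<w, x> + b) is constant on the ball iff
    max_x (<w, x> + b) = ||w|| + b <= 0."""
    w = rng.normal(0.0, np.sqrt(2.0 / d), size=(n_trials, d))
    b = rng.normal(0.0, np.sqrt(2.0 / d), size=n_trials)
    return np.mean(np.linalg.norm(w, axis=1) + b <= 0.0)

p1 = bdp_first_layer(1)   # close to 1/4 by a rotational-symmetry argument
```

For $d = 1$ the event $\{|w| + b \le 0\}$ is a quarter of the rotation-invariant Gaussian plane, which is why the estimate concentrates near $1/4$ in this assumed setup.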
In the 1st hidden layer, once a ReLU neuron is dead, it cannot be revived during the training; the dead neurons in a shallow ReLU network cannot be revived through gradient-based training. The proof can be found in Appendix A. In the $\ell$-th hidden layer with $\ell \ge 2$, the constant output of a dead neuron is either a strictly positive number or zero. In the former case, it readily follows from Lemma 10 of Lu et al. (2019) that, with probability 1, there exists a hidden layer in which all neurons are dead. For the reader's convenience, we present a variant of Lemma 10 of Lu et al. (2019) below. Given a deep ReLU network, suppose the weight matrices and/or the bias vectors are initialized from probability distributions satisfying a mild nondegeneracy condition with respect to any fixed nonzero vector. If either (i) there exists a hidden layer in which all neurons are dead, or (ii) there exists a dead neuron in the $\ell$-th hidden layer whose value is positive, then with probability 1, the dead neurons cannot be revived through gradient-based training.
3 Probability Distribution of the Number of Active ReLU Neurons
Understanding how many neurons will be active at the initialization is not only directly related to the trainability of a ReLU network, but also suggests how much overspecification or overparameterization would be needed. In this section, we investigate the probability distribution of the number of active ReLU neurons for an initialized ReLU network.
Given a ReLU network having architecture $\vec{n}$, let $N_\ell$ denote the number of active neurons at the $\ell$-th hidden layer; the randomness of $N_\ell$ comes from the random initialization of the weights and biases. The probability distribution of the number of active neurons at the $\ell$-th hidden layer is given by (5), where the layer-to-layer transitions of the number of active neurons are described by stochastic matrices. For each $\ell$ and $k$, let $A^\ell_k$ be the event that exactly $k$ neurons are active at the $\ell$-th layer, and let $p^\ell_k$ be the BDP of a single ReLU neuron at the $\ell$-th hidden layer given $A^{\ell-1}_k$. Then the stochastic matrix is expressed as in (6), where the expectation is taken with respect to the conditional randomness of the previous layers. Thus, $p^\ell_k$ is a fundamental quantity for the complete understanding of the distribution of $N_\ell$.
As a first step towards understanding $N_\ell$, we calculate the exact probability distribution of the number of active neurons in the 1st hidden layer. Given a ReLU network having architecture $\vec{n}$, suppose the training input domain is $B^d(r)$. If either the 'normal' (3) or the 'unit hypersphere' (4) initialization without bias is used in the 1st hidden layer, then with probability 1 no neuron in the 1st hidden layer is born dead, i.e., $N_1 = n_1$. If either the 'normal' (3) or the 'unit hypersphere' (4) initialization with bias is used in the 1st hidden layer, $N_1$ follows a binomial distribution with parameters $n_1$ and $1 - p_d$, where the BDP $p_d$ of a single neuron is given by (7) in terms of the Gamma function. The proof can be found in Appendix B.
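The binomial behavior of $N_1$ under random-bias initialization can be checked by simulation. The sketch below (unit-ball domain and He-style initialization assumed, as in our earlier illustration) samples the number of active first-layer neurons of a width-10 network in one dimension; under these assumptions the empirical mean should be close to $n_1(1 - p_d) = 10 \times 0.75$.

```python
import numpy as np

rng = np.random.default_rng(1)

def active_counts(d, n, trials=100_000):
    """Sample N_1, the number of active first-layer neurons of a width-n
    network, assuming the closed unit ball domain and He-style init with
    random bias: a neuron is active iff ||w|| + b > 0."""
    w = rng.normal(0.0, np.sqrt(2.0 / d), size=(trials, n, d))
    b = rng.normal(0.0, np.sqrt(2.0 / d), size=(trials, n))
    return np.sum(np.linalg.norm(w, axis=2) + b > 0.0, axis=1)

counts = active_counts(1, 10)
mean_active = counts.mean()   # near 10 * (1 - 1/4) = 7.5 in this setup
var_active = counts.var()     # near 10 * 0.25 * 0.75 = 1.875 (binomial)
```

Since the neurons are initialized independently, both the mean and the variance of the simulated counts match the binomial values, consistent with the theorem above.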
We now calculate the BDP for a ReLU neuron in the 2nd hidden layer. Since the bias in each layer can be initialized in different ways, we consider several combinations. Given a ReLU network having architecture $\vec{n}$, suppose the training input domain is $B^d(r)$, and consider two cases: (i) the 'unit hypersphere' (4) initialization with bias is used in the 1st hidden layer, or (ii) the 'unit hypersphere' (4) initialization without bias is used in the 1st hidden layer. In each case, the BDP of a neuron in the 2nd hidden layer can be expressed in terms of the quantity defined in the theorem above. The proof can be found in Appendix D.
Further characterization of the BDPs for deeper layers is deferred to a future study. The two theorems above indicate that the bias initialization can drastically change the active-neuron distributions. Since the distribution of each layer depends on the layers below it, the behavior of the first hidden layers affects the distributions of all higher layers. In Figure 2, we consider a deep ReLU network and plot the empirical distributions of the number of active neurons from independent simulations. On the left and in the middle, the 'unit hypersphere' (4) initialization without and with bias, respectively, is employed in all layers. On the right, the 'unit hypersphere' initialization without bias is employed in the 1st hidden layer, and the 'normal' (3) initialization with bias is employed in all other layers. The theoretically derived distributions are also plotted as references, and all empirical results match our theoretical derivations well. When the 1st hidden layer is initialized with bias, at least one neuron in the 1st hidden layer is dead with probability 0.8. On the other hand, if the 1st hidden layer is initialized without bias, with probability 1, no neuron is dead. It is clear that the distributions obtained by the three initialization schemes behave differently.
4 Trainability and Overspecification of ReLU Networks
We are now in a position to introduce the trainability of ReLU neural networks.
4.1 Trainability of Shallow ReLU Networks
For pedagogical reasons, we first confine ourselves to shallow ReLU networks. Let $\mathcal{F}_n$ be the class of shallow ReLU neural networks of width $n$ on a compact domain $\Omega$;

$$\mathcal{F}_n = \Big\{\, f : f(x) = \sum_{j=1}^{n} a_j\, \phi\big(\langle w_j, x\rangle + b_j\big) \,\Big\}, \tag{8}$$

where $a_j \in \mathbb{R}$, $w_j \in \mathbb{R}^{d}$, and $b_j \in \mathbb{R}$. We note that $\mathcal{F}_n \subset \mathcal{F}_{n+1}$ for all $n$. However, a function in $\mathcal{F}_n$ can admit different representations in larger function classes on the compact domain $\Omega$. We say $\mathcal{F}_s$ is the minimal function class for $f$ if there exists $g \in \mathcal{F}_s$ such that $g = f$ on $\Omega$ and $s$ is the smallest integer for which this holds. We remark that two networks realizing the same function on $\Omega$ need not be the same function on all of $\mathbb{R}^d$. We extend the notion of the minimal function class and define the approximated minimal function class as follows.
Given a continuous function $f$ and $\epsilon \ge 0$, a function class $\mathcal{F}_s$ is said to be the approximated minimal function class for $f$ if $s$ is the smallest number such that there exists $g \in \mathcal{F}_s$ with $\|f - g\| < \epsilon$ on $\Omega$. If $\epsilon = 0$, we say $\mathcal{F}_s$ is the minimal function class for $f$.
Note that the existence of such $s$ is guaranteed by the universal approximation theorem for shallow neural networks (Hornik, 1991; Cybenko, 1989). Any ReLU network of width greater than $s$ is then said to be overspecified for approximating $f$.
A network is said to be overparameterized if the number of parameters is larger than the number of training data. In this paper, we consider the overparameterized setting in which the width is greater than or equal to the number of training data $m$. Overparameterization can then be understood within the frame of overspecification by the following theorem. For any nondegenerate training data of size $m$, there exists a shallow ReLU network of width $m$ which interpolates all the training data. Furthermore, there exists nondegenerate training data of size $m$ such that no shallow ReLU network of width less than $m$ can interpolate all the training data. In this sense, $m$ is the minimal width. The proof can be found in Appendix E.
Theorem 4.1 shows that any network of width greater than or equal to $m$ is overspecified for interpolating $m$ training data. Thus, we can regard overparameterization as a kind of overspecification.
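For one-dimensional data, a width-$m$ interpolating network can be realized by an explicit triangular construction. The sketch below is one such construction (not necessarily the one used in Appendix E): it places one always-active neuron to the left of the data and one kink at each remaining datum, then solves the resulting lower-triangular linear system for the output weights.

```python
import numpy as np

def interpolate_relu(xs, ys):
    """Width-m shallow ReLU network interpolating m distinct 1-D points.
    Neuron 0 has its kink to the left of all data (so it is affine, and
    active, on the data range); neuron i >= 1 has its kink at the i-th
    smallest datum.  Evaluating at the sorted data yields a lower-
    triangular system with positive diagonal, solved directly."""
    order = np.argsort(xs)
    xs = np.asarray(xs, float)[order]
    ys = np.asarray(ys, float)[order]
    kinks = np.concatenate([[xs[0] - 1.0], xs[:-1]])
    H = np.maximum(xs[:, None] - kinks[None, :], 0.0)  # lower triangular
    c = np.linalg.solve(H, ys)
    def f(x):
        return np.maximum(np.asarray(x, float)[..., None] - kinks, 0.0) @ c
    return f

xs = np.array([-0.5, 0.0, 0.3, 0.9])
ys = np.array([1.0, -1.0, 0.5, 2.0])
f = interpolate_relu(xs, ys)   # f(xs) reproduces ys exactly
```

Between consecutive data points the network is linear, so it reduces to piecewise-linear interpolation on the data range; each neuron beyond the first contributes exactly one kink, which is the intuition for why width $m$ suffices.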
We are now in a position to define the trainability of a neural network. For a learning task which requires at least $k$ active neurons, a shallow ReLU network of width $n \ge k$ is said to be trainable if the number of active neurons is greater than or equal to $k$. We refer to the probability that a network is trainable at the initialization as its trainability.
Note that approximating a function whose minimal function class has width $k$ and interpolating nondegenerate training data of size $k$ can both be regarded as learning tasks which require at least $k$ active neurons.
It was shown in Theorem 2.2 that once a ReLU neuron of the 1st hidden layer is dead, it will never be revived during the training. Thus, given a learning task which requires at least $k$ active neurons, for the training to be successful, an initialized network should have at least $k$ active neurons in the first place. If the number of active neurons is less than $k$, there is no hope of training the network successfully. Therefore, a network being trainable is a necessary condition for successful training. We remark that this condition is independent of the choice of the loss metric in (2).
Given a learning task which requires a shallow ReLU network having at least $k$ active neurons, suppose the training input domain is $B^d(r)$ and a shallow ReLU network of width $n \ge k$ is employed. The probability of the network being trainable is $P(N_1 \ge k)$, where $N_1$ is the number of active neurons in the 1st hidden layer at the initialization. If the 'He initialization' without bias is used in the 1st hidden layer, then by Theorem 3 we have $N_1 = n$ with probability 1, so the network is trainable with probability 1. If the 'He initialization' with bias is used in the 1st hidden layer, then by Theorem 3 $N_1$ follows a binomial distribution, and the trainability is the tail probability $P(N_1 \ge k)$. Thus, the proof is completed.
Theorem 4.1 explains why overspecification is a necessary condition for training a shallow ReLU network when the bias is randomly initialized. It also suggests a degree of overspecification whenever one has a specific width in mind for a learning task. That is, if it is known (either theoretically or empirically) that a shallow network of width $k$ can achieve a good performance, one should employ a network whose width is $k$ divided by the probability that a single neuron is active at the initialization, in order to guarantee (on average) that $k$ neurons are active at the initialization. The example of Figure 1 in Section 1 can also be understood in this manner: by Theorem 4.1, with probability at least 0.43, the network of width 2 fails to be trained successfully for a learning task which requires at least 2 active neurons.
In the shallow ReLU network setting, suppose either the 'normal' (3) or the 'unit hypersphere' (4) initialization with bias is employed in the first hidden layer, and suppose the training input domain is $B^d(r)$. For any nondegenerate training data of size $m$, which requires a network to have at least $m$ active neurons for the interpolation, suppose the width $n$ and the input dimension $d$ satisfy the condition in (9). Then, overparameterization is both a necessary and a sufficient condition for interpolating all the training data, with the probability stated in (9), over the random initialization by the (stochastic) gradient-descent method. The proof can be found in Appendix F.
We remark that Theorem 4.1 assumes that the biases are randomly initialized. If the biases are initialized to zero, overparameterization or overspecification is not needed from this perspective. Also, we note that, to the best of our knowledge, all existing theoretical results assume that the biases are randomly initialized; see, e.g., Du et al. (2018b); Oymak and Soltanolkotabi (2019); Li and Liang (2018).
4.2 Trainability of Deep ReLU Networks
We now extend the notion of trainability to deep ReLU networks. Unlike dead ReLU neurons in the 1st hidden layer, a dead neuron in the $\ell$-th hidden layer with $\ell \ge 2$ can be revived during the training if two conditions are satisfied. One is that every layer contains at least one active neuron; this condition follows directly from Theorem 2.2. The other is that the dead neuron in the $\ell$-th hidden layer is only tentatively dead. We remark that these two conditions are necessary conditions for the revival of a dead neuron. We give a precise meaning to tentative death as follows.
Let us consider a neuron in the $\ell$-th hidden layer, $\phi\big(\langle w, \phi(z^{\ell-1}(x))\rangle + b\big)$, and suppose the neuron is dead. If the neuron remains dead for any changes in the parameters of the previous layers, but not in its own weight $w$ and bias $b$, we say the neuron is dead permanently. For example, if $w \le 0$ componentwise and $b \le 0$, then since $\phi(z^{\ell-1}(x)) \ge 0$, the neuron will never be activated, regardless of the previous layers. Hence, in this case, there is no hope that the neuron can be revived during the training. Otherwise, we say the neuron is dead tentatively.
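The permanent/tentative distinction reduces to a simple sign check. In the sketch below (our own illustration), a dead neuron in layer $\ell \ge 2$ receives only nonnegative activations from the previous layer, so if all of its incoming weights and its bias are nonpositive, no change to the earlier layers can ever activate it.

```python
import numpy as np

def classify_dead(w, b, prev_acts):
    """Classify a neuron relu(<w, a> + b) in layer l >= 2, where the rows
    of prev_acts are the (nonnegative) previous-layer activations on the
    training inputs.  If w <= 0 componentwise and b <= 0, then <w, a> + b
    <= 0 for every nonnegative a, so no change to earlier layers can
    activate the neuron: its death is permanent."""
    if np.any(prev_acts @ w + b > 0.0):
        return "active"
    if np.all(w <= 0.0) and b <= 0.0:
        return "permanent"
    return "tentative"

acts = np.array([[0.5, 0.2],
                 [0.1, 0.0]])     # activations on two training inputs
status = classify_dead(np.array([-1.0, -2.0]), -0.5, acts)   # "permanent"
```

A tentatively dead neuron (e.g., one with a positive incoming weight but a very negative bias) could still be activated if the previous layers move enough probability mass onto that weight's input.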
Given the event that exactly $k$ neurons are active in the $(\ell-1)$-th hidden layer, let $p^{\ell}_{\mathrm{p}}$ and $p^{\ell}_{\mathrm{t}}$ be the conditional probabilities that a neuron in the $\ell$-th hidden layer is born dead permanently and born dead tentatively, respectively. Let $T_\ell$ and $D_\ell$ be the numbers of tentatively dead and permanently dead neurons at the $\ell$-th hidden layer. The joint distribution of $(T_\ell, D_\ell)$ can then be written with a multinomial coefficient in terms of $p^{\ell}_{\mathrm{p}}$ and $p^{\ell}_{\mathrm{t}}$, where the expectation is taken with respect to the randomness of the previous layers.
With the notion of permanent death of a neuron, the above derivation gives the following trainability theorem for deep ReLU networks. Given a learning task which requires an $L$-hidden-layer ReLU network having at least $k_\ell$ active neurons at the $\ell$-th layer, suppose we employ an $L$-hidden-layer ReLU network having $n_\ell$ neurons at the $\ell$-th layer such that $n_\ell \ge k_\ell$ for all $\ell$. Then the network is trainable at the initialization with probability given by (10). Note that there is no tentatively dead neuron in the first hidden layer. In the one-dimensional case where the biases are initialized to zero, an upper bound on the trainability can be derived. Suppose that all weights are independently initialized from continuous probability distributions symmetric around 0, all biases are zero, and $d = 1$. Then the trainability of an $L$-hidden-layer ReLU network is bounded above by an explicit product over the layers; the exact expression and the proof can be found in Appendix G.
In principle, a single active neuron in the highest layer could potentially revive tentatively dead neurons through backpropagation. In practice, however, it is better to have at least $k_\ell$ active neurons in the $\ell$-th hidden layer at the initialization, for both faster training and robustness. If we employ an $L$-hidden-layer ReLU network having $n_\ell$ neurons in the $\ell$-th hidden layer, an initialized ReLU network has at least $k_\ell$ active neurons in every hidden layer with probability given by (11). Therefore, it is suggested to use a ReLU network with a sufficiently large width at each layer to secure a high probability in (11).
Remark: A trainable network is not guaranteed to be trained successfully. However, if a network is not trainable, we have no hope for the successful training. Thus, a network being trainable is a necessary condition for the successful training. Also, the probability that a network is trainable, i.e., the trainability, serves as an upper bound of the probability of the successful training.
5 Data-dependent Bias Initialization
In the previous section, we discussed the trainability of ReLU networks. In terms of trainability, Theorem 4.1 indicates that the zero-bias initialization is preferable to the random-bias initialization. In practice, however, the zero-bias initialization often finds a spurious local minimum or gets stuck on a flat plateau. To illustrate this difficulty, we consider the problem of approximating a sum of two sine functions. For this task, we use a shallow ReLU network of width 500 with the 'He initialization' without bias. In order to reduce extra randomness in the experiment, 100 equidistant points on the domain are used as the training data set. Adam (Kingma and Ba, 2015), one of the most popular gradient-based optimization methods, is employed with its default parameters, and we use the full batch. The trained network is plotted in Figure 3. It is clear that the trained network is stuck at a local minimum. Similar behavior is repeatedly observed in all of our multiple independent simulations.
This phenomenon can be understood as follows. Since the biases are zero, all initialized neurons are clustered at the origin. Consequently, it takes a long time for the gradient updates to distribute neurons over the training domain to achieve a small training loss. In the worst case, along the way of distributing neurons, the optimizer finds a spurious local minimum. We refer to this problem as the clustered neuron problem. Indeed, this is observed in Figure 3: the trained network approximates the target function well on a small region containing the origin, but it loses accuracy on the parts of the domain far from the origin.
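The clustering is easy to see from the kink locations. In one dimension, a neuron $\phi(wx + b)$ changes slope at $x = -b/w$; with zero biases every kink sits at the origin, while random biases spread the kinks over the line. The snippet below is a minimal illustration of this point.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 500
w = rng.normal(0.0, np.sqrt(2.0), size=n)   # He-style weights, d = 1

# A 1-D neuron relu(w * x + b) changes slope at the kink x = -b / w.
kinks_zero_bias = -np.zeros(n) / w          # b = 0: every kink at x = 0
b = rng.normal(0.0, np.sqrt(2.0), size=n)
kinks_random_bias = -b / w                  # kinks spread over the line
```

With zero bias, the network is piecewise linear with a single breakpoint at the origin at initialization, so all the expressive power must be created by moving kinks during training.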
On the other hand, if we randomly initialize the biases, as shown in Theorem 4.1, overspecification is inevitable to guarantee, with high probability, a certain number of active neurons. In this setting, only 375 of the 500 neurons will be active on average at the initialization. In Figure 3, we also show the result trained with the 'He initialization' with bias. Due to the nonzero biases, neurons are randomly distributed. Accordingly, the trained network approximates the target function quite well. However, due to dead neurons, it loses accuracy in some parts of the domain.
In order to overcome these difficulties, we propose a new data-dependent initialization method for the overparameterized setting, where the width is greater than or equal to the number of training data. By adopting the trainability perspective, the method is designed to alleviate both the clustered neuron problem and the dying ReLU neuron problem at the same time.
5.1 Data-dependent Bias Initialization
Let $m$ be the number of training data and $n$ be the width of a shallow ReLU network, and suppose the network is overparameterized so that $n \ge m$. We then propose to initialize each bias so that the corresponding neuron is centered at one of the training inputs, up to an iid mean-zero perturbation. We note that this mimics the explicit construction for the data interpolation in Theorem 4.1. By doing so, each neuron is initialized to be located near its assigned training input.
The precise values of the parameters are determined as follows. Consider the expectation of the normalized squared norm of the network output, where the expectation is taken over the weights and biases of a shallow ReLU network, and consider its average over the training inputs. We then choose our parameters so that this quantity under the proposed data-dependent initialization matches the one under a standard initialization method. For example, when the 'normal' (3) initialization without bias is used, the quantity has a closed-form expression in terms of the Frobenius norms of the weight matrices; when the 'He initialization' without bias is used, it reduces to a constant.
Suppose $n \ge m$ and let $X$ be the set of training input data. If the weights are drawn as above and the biases are initialized by the proposed method, then the expected normalized squared norm of the network output over the training inputs admits a closed-form expression in terms of the initialization parameters. The proof can be found in Appendix H.
For example, if the parameters are set as in (12), the expected normalized squared norm under the data-dependent initialization is equal to the one under the 'He initialization' without bias.
The proposed initialization ensures that the neurons are evenly distributed over the training data points. Also, it makes it likely that at least one neuron is activated at each training datum. By doing so, it effectively avoids both the clustered neuron problem and the dying ReLU neuron problem.
In Figure 4, we demonstrate the performance of the proposed method in approximating the sum of two sine functions. On the left, the trained neural network is plotted; on the right, the root mean square errors (RMSE) of the training loss are shown with respect to the number of epochs for three different initialization methods. We remark that since the training set is deterministic and the full batch is used, the only randomness in the training process comes from the weight and bias initialization. It can be seen that the proposed method not only results in the fastest convergence but also achieves the smallest approximation error. The number of dead neurons in the trained network is 127 (He with bias), 3 (He without bias), and 17 (data-dependent).
6 Numerical Examples
We present numerical examples to demonstrate our theoretical findings and the effectiveness of our new data-dependent initialization method.
6.1 Trainability of Shallow ReLU Networks
We present two examples to demonstrate the trainability of a ReLU neural network and to justify our theoretical results. Here, all weights and biases are initialized according to the 'He initialization' (3) with bias. We consider the two univariate test target functions given in (13). Each target belongs to a minimal function class of known width; that is, theoretically, each target should be exactly recovered by a shallow ReLU network of its minimal width. For the training, we use a training set of 600 data points and a test set of 1,000 data points, both generated uniformly from the training domain. The standard stochastic gradient descent method with minibatches of size 128 and a constant learning rate is employed, and we fix the maximum number of epochs. Also, the standard squared loss is used.
In Figure 5, we show the results for approximating the first target function. On the left, we plot the empirical probability of successful training with respect to the width. The empirical probabilities are obtained from 1,000 independent simulations, and a single simulation is regarded as a success if the test error falls below a fixed threshold. We also plot the trainability from Theorem 4.1. As expected, it provides an upper bound for the probability of successful training. It is clear that the more the network is overspecified, the higher the trainability. Also, it can be seen that as the width grows, the empirical training success rate increases. This suggests that a successful training can be achieved with high probability by securing a very high trainability. However, since trainability is only a necessary condition, even an initialized network that is trainable can still end up at one of the spurious minima shown in the middle and on the right of Figure 5.
Similar behavior is observed when approximating . In Figure 6, we show the approximation results for . On the left, the empirical probability of successful training and the trainability (Theorem 4.1) are plotted with respect to the width. Again, the trainability provides an upper bound on the probability of successful training, and the empirical training success rate increases as the width grows. In the middle and on the right, we plot two of the local minima in which a trainable network could end up. We remark that the choice of gradient-based optimization method, a well-tuned learning rate, and/or other tunable optimization parameters could affect the empirical training success probability; however, the maximum probability one can hope for is bounded by the trainability. In all of our simulations, we did not tune any optimization parameters.
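The trainability bound compared against in these figures can be illustrated by a small Monte Carlo check. This is a hedged sketch for the one-dimensional case on the domain [-1, 1], assuming weights and biases are i.i.d. zero-mean Gaussians of equal variance; the general statement is the paper's Theorem 4.1. Under these assumptions, a neuron relu(w*x + b) is born dead iff b <= -|w|, which has probability 1/4, so the number of active neurons is Binomial(width, 3/4).

```python
import numpy as np
from math import comb

def trainability_mc(width, k, trials=20000, seed=0):
    """Monte Carlo estimate of P(at least k active neurons at initialization)
    for a 1-D shallow ReLU network on [-1, 1] with i.i.d. Gaussian (w, b).
    relu(w*x + b) is active somewhere on [-1, 1] iff |w| + b > 0."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, 1.0, (trials, width))
    b = rng.normal(0.0, 1.0, (trials, width))
    active = (b > -np.abs(w)).sum(axis=1)
    return (active >= k).mean()

def trainability_binomial(width, k, p_active=0.75):
    """Closed form under the same assumptions: each neuron is independently
    active with probability 3/4, so the active count is Binomial(width, 3/4)."""
    return sum(comb(width, j) * p_active**j * (1 - p_active)**(width - j)
               for j in range(k, width + 1))
```

For example, a width-10 network asked to supply at least 8 active neurons is trainable only about half the time under this initialization, matching the qualitative message of Figures 5 and 6 that heavy overspecification is needed for trainability close to 1.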
6.2 Data-dependent Bias Initialization
Next, we compare the training performance of three initialization methods. The first is the ‘He initialization’ (He et al., 2015) without bias, which corresponds to . The second is the ‘He initialization’ with bias (3), which corresponds to . Here is the width of the first hidden layer. The last is our data-dependent initialization described in the previous section, with the parameters from (12). All results are generated under the same conditions except for the weight and bias initialization.
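The two ‘He initialization’ variants can be sketched as follows. The weight scale N(0, 2/fan_in) is the scheme of He et al. (2015); drawing the bias from the same Gaussian is one common convention for the ‘with bias’ variant, and the paper's exact bias variance is the one specified in its Eq. (3).

```python
import numpy as np

def he_init(fan_in, width, with_bias, rng=None):
    """'He initialization': weights ~ N(0, 2/fan_in).
    'Without bias' sets b = 0; 'with bias' draws b from the same Gaussian
    (an assumed convention; see Eq. (3) of the paper for the exact variance)."""
    rng = rng or np.random.default_rng()
    std = np.sqrt(2.0 / fan_in)
    W = rng.normal(0.0, std, (width, fan_in))
    b = rng.normal(0.0, std, width) if with_bias else np.zeros(width)
    return W, b
```

The zero-bias variant places every neuron's breakpoint at the origin, while the random-bias variant scatters breakpoints but can produce born-dead neurons, which is the trade-off the data-dependent method is designed to resolve.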
We consider the following test functions on :
(14) 
In all tests, we employ a shallow ReLU network of width 100, trained on 25 points drawn uniformly at random from . The standard loss and the gradient-descent method with momentum are employed. The learning rate is set to a constant of and the momentum term is set to 0.9.
Figure 7 shows the mean of the RMSE on the training data from 10 independent simulations with respect to the number of epochs for the three initialization methods. The shaded area covers plus or minus one standard deviation from the mean. On the left and right, the results for approximating and are shown, respectively. We see that the data-dependent initialization not only yields the fastest loss convergence but also achieves the smallest training loss. Also, the average number of dead neurons in the trained network is 11 (He with bias), 0 (He without bias), and 0 (data-dependent) for , and 12 (He with bias), 0 (He without bias), and 0 (data-dependent) for .
7 Conclusion
In this paper, we establish the trainability of ReLU neural networks, a necessary condition for successful training. Given a learning task that requires a ReLU network with at least active neurons, a network is said to be trainable if it has at least active neurons. We refer to the probability of a network being trainable as its trainability. In order to calculate the trainability, we first study the probability distribution of the number of active neurons in the th layer. We completely characterize , and by focusing on the one-dimensional case, we derive for four different combinations of initialization schemes. With , we derive the trainability of ReLU networks, which serves as an upper bound on the probability of successful training. Also, by showing that overparameterization can be understood as a kind of overspecification, we prove that overparameterization is both a necessary and a sufficient condition for interpolating all training data, i.e., minimizing the loss.
Although the zero-bias initialization seems preferable to the random-bias initialization from the trainability perspective, the zero-bias initialization places all neurons at the origin at initialization. This often deteriorates the training performance, especially for a task that requires neurons located in regions away from the origin. On the other hand, the random-bias initialization suffers from the dying ReLU neuron problem and typically requires a severely overparameterized or overspecified network. To alleviate these difficulties, we propose a new data-dependent initialization method in the overparameterized setting. The proposed method is designed to avoid both the dying ReLU neuron problem and the clustered ReLU neuron problem at the same time. Numerical examples are provided to demonstrate the performance of our method. We found that the data-dependent initialization method outperforms the ‘He initialization’ both with and without bias in all of our tests.
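The general idea of avoiding both failure modes at once can be sketched as follows for a shallow 1-D network. This is a loose illustration only: the paper's actual scheme with the parameters of (12) is not reproduced here, and the anchor-point construction below is an assumed, simplified stand-in that merely demonstrates the principle of tying breakpoints to the data.

```python
import numpy as np

def data_dependent_bias_sketch(x_train, width, rng=None):
    """Illustrative (not the paper's) data-dependent bias initialization:
    draw weights He-style, then choose each bias so that the neuron's
    breakpoint -b_j / w_j coincides with a randomly chosen training point.
    Breakpoints then spread over the data range instead of clustering at
    the origin, and (up to boundary cases) neurons activate on part of
    the data rather than being born dead."""
    rng = rng or np.random.default_rng()
    w = rng.normal(0.0, np.sqrt(2.0), width)     # fan_in = 1
    anchors = rng.choice(x_train, size=width)    # one training point per neuron
    b = -w * anchors                             # breakpoint placed at the anchor
    return w, b
```

By construction, every breakpoint lies inside the training-data range, which is precisely what the zero-bias scheme (all breakpoints at the origin) and the random-bias scheme (breakpoints possibly outside the data, with the neuron dead on it) fail to guarantee.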
This work is supported by the DOE PhILMs project (No. DE-SC0019453), the AFOSR grant FA9550-17-1-0013, and the DARPA AIRA grant HR00111990025.
Appendix A Proof of Theorem 2.2
Suppose a ReLU neural network of width is initialized to be