Trainability and Data-dependent Initialization of Over-parameterized ReLU Neural Networks
A neural network is said to be over-specified if its representational power is more than needed, and is said to be over-parameterized if the number of parameters is larger than the number of training data. In both cases, the number of neurons is larger than what is necessary. In many applications, over-specified or over-parameterized neural networks are successfully employed and have been shown to train effectively. In this paper, we study the trainability of ReLU networks, a necessary condition for successful training. We show that over-parameterization is both a necessary and a sufficient condition for minimizing the training loss. Specifically, we study the probability distribution of the number of active neurons at initialization. We say a network is trainable if the number of active neurons is sufficiently large for the learning task. With this notion, we derive an upper bound on the probability of successful training. Furthermore, we propose a data-dependent initialization method for the over-parameterized setting. Numerical examples are provided to demonstrate the effectiveness of the method and our theoretical findings.
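To make the notion of "active neurons at initialization" concrete, the following is a minimal sketch (not the paper's exact procedure): it counts, for a one-hidden-layer ReLU network with randomly initialized weights, the neurons whose pre-activation is positive on at least one training sample, and estimates the distribution of that count over repeated initializations. The function name, the He-style Gaussian initialization, and the synthetic data are illustrative assumptions.

```python
import numpy as np

def count_active_neurons(X, n_hidden, rng):
    """Count hidden ReLU neurons that fire on at least one sample of X.

    A neuron is 'active' here if max_i ReLU(w^T x_i + b) > 0 over the
    training set; dead neurons receive zero gradient and cannot be trained.
    """
    d = X.shape[1]
    W = rng.normal(0.0, np.sqrt(2.0 / d), size=(d, n_hidden))  # He-style weights (assumed)
    b = np.zeros(n_hidden)                                     # zero biases (assumed)
    pre_act = X @ W + b                                        # shape: (n_samples, n_hidden)
    return int(np.sum((pre_act > 0).any(axis=0)))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))  # 100 synthetic training samples in R^10

# Empirical distribution of the active-neuron count over many random initializations.
counts = [count_active_neurons(X, n_hidden=50, rng=rng) for _ in range(1000)]
print(f"mean active neurons: {np.mean(counts):.1f} +/- {np.std(counts):.1f} out of 50")
```

Under the trainability notion above, an initialization would be deemed successful only when this count exceeds the number of active neurons required by the learning task; a data-dependent initialization aims to make that event more likely.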