Learning Compact Neural Networks with Regularization
We study the impact of regularization on learning neural networks. Our goals are to speed up training, improve generalization performance, and train compact models that are cost-efficient. Our results apply to weight-sharing (e.g., convolutional), sparsity (i.e., pruning), and low-rank constraints, among others. We first introduce the covering dimension of the constraint set and provide a Rademacher complexity bound that gives insight into generalization properties. We then propose and analyze regularized gradient descent algorithms for learning shallow networks. We show that the problem becomes well-conditioned and local linear convergence occurs once the amount of data exceeds the covering dimension (e.g., the number of nonzero weights). Finally, we provide insight into layerwise training of deep models by studying a random activation model. Our results show how regularization can help overcome overparametrization.
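As an illustrative sketch only, and not the paper's exact algorithm, the snippet below shows one common form of regularized gradient descent for a shallow network under a sparsity constraint: after each gradient step, the hidden-layer weights are projected back onto the set of s-sparse matrices (hard thresholding, a pruning-style constraint). The function names (project_sparse, train_shallow_sparse), the fixed output weights v, and all hyperparameters are assumptions made for this example.

```python
import numpy as np

# Illustrative sketch (not the paper's exact algorithm): projected gradient
# descent for a one-hidden-layer network y ~ v^T relu(W x), where after each
# gradient step W is projected onto a sparsity constraint by keeping its
# s largest-magnitude entries (pruning-style regularization).

def relu(z):
    return np.maximum(z, 0.0)

def project_sparse(W, s):
    """Keep the s largest-magnitude entries of W, zero out the rest."""
    flat = np.abs(W).ravel()
    if s >= flat.size:
        return W
    threshold = np.partition(flat, -s)[-s]
    return W * (np.abs(W) >= threshold)

def train_shallow_sparse(X, y, v, hidden_dim, s, lr=1e-2, iters=500, seed=0):
    """X: (n, d) inputs, y: (n,) targets, v: fixed (hidden_dim,) output weights."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = project_sparse(rng.standard_normal((hidden_dim, d)) / np.sqrt(d), s)
    for _ in range(iters):
        H = relu(X @ W.T)                  # (n, hidden_dim) hidden activations
        residual = H @ v - y               # (n,) prediction error
        # Gradient of 0.5/n * ||relu(X W^T) v - y||^2 w.r.t. W (ReLU subgradient)
        G = ((residual[:, None] * v[None, :]) * (H > 0)).T @ X / n
        W = project_sparse(W - lr * G, s)  # gradient step, then sparse projection
    return W
```

Under this kind of constraint, the number of nonzero weights s (rather than the total parameter count) plays the role of the covering dimension discussed in the abstract, which is why sample-size requirements and conditioning can improve once the data size exceeds it.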