Learning Compact Neural Networks with Regularization

02/05/2018
by Samet Oymak, et al.

We study the impact of regularization for learning neural networks. Our goal is to speed up training, improve generalization performance, and train compact models that are cost efficient. Our results apply to weight-sharing (e.g. convolutional), sparsity (i.e. pruning), and low-rank constraints, among others. We first introduce the covering dimension of the constraint set and provide a Rademacher complexity bound that offers insight into generalization properties. We then propose and analyze regularized gradient descent algorithms for learning shallow networks. We show that the problem becomes well conditioned and local linear convergence occurs once the amount of data exceeds the covering dimension (e.g. the number of nonzero weights). Finally, we provide insights on layerwise training of deep models by studying a random activation model. Our results show how regularization can help overcome overparametrization.
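As a rough illustration of the regularized (projected) gradient descent idea sketched in the abstract, the following is a minimal sketch, assuming a one-hidden-layer ReLU network whose hidden-layer weights obey a sparsity (pruning) constraint, with the regularization step implemented as a hard-thresholding projection onto the constraint set. The network shape, step size, and toy teacher setup are illustrative assumptions, not the paper's exact algorithm or experiments.

import numpy as np

# Sketch: projected gradient descent for y = v^T relu(W x), with the
# hidden-layer weight matrix W constrained to have at most s nonzeros.

def relu(z):
    return np.maximum(z, 0.0)

def hard_threshold(W, s):
    # Projection onto the sparsity constraint: keep the s largest-magnitude entries.
    flat = np.abs(W).ravel()
    if s >= flat.size:
        return W
    cutoff = np.partition(flat, -s)[-s]
    return W * (np.abs(W) >= cutoff)

def train_sparse_shallow_net(X, y, hidden, s, lr=1e-2, iters=500, seed=0):
    # X: (n, d) data, y: (n,) targets, hidden: # hidden units, s: sparsity budget.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = hard_threshold(rng.normal(scale=1.0 / np.sqrt(d), size=(hidden, d)), s)
    v = rng.normal(scale=1.0 / np.sqrt(hidden), size=hidden)  # output weights, held fixed here

    for _ in range(iters):
        H = relu(X @ W.T)          # (n, hidden) hidden activations
        resid = H @ v - y          # (n,) residuals
        # Gradient of (1/2n) * ||H v - y||^2 with respect to W
        grad_W = ((resid[:, None] * (H > 0) * v[None, :]).T @ X) / n
        W = hard_threshold(W - lr * grad_W, s)  # gradient step followed by projection

    return W, v

# Toy usage: fit a planted sparse teacher network (purely illustrative data).
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, d, hidden, s = 500, 50, 10, 100
    X = rng.normal(size=(n, d))
    W_true = hard_threshold(rng.normal(size=(hidden, d)), s)
    v_true = rng.normal(size=hidden)
    y = relu(X @ W_true.T) @ v_true
    W_hat, v = train_sparse_shallow_net(X, y, hidden, s)
    print("nonzeros in learned W:", np.count_nonzero(W_hat))

The same template applies to the other constraint sets mentioned above by swapping the projection: low-rank constraints would use a truncated SVD, and weight-sharing would average tied weights.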
