The Role of Regularization in Shaping Weight and Node Pruning Dependency and Dynamics
The pressing need to reduce the capacity of deep neural networks has stimulated the development of network dilution methods and their analysis. While the ability of L_1 and L_0 regularization to encourage sparsity is often mentioned, L_2 regularization is seldom discussed in this context. We present a novel framework for weight pruning by sampling from a probability function that favors the zeroing of smaller weights. In addition, we examine the contribution of L_1 and L_2 regularization to the dynamics of node pruning while optimizing for weight pruning. We then demonstrate the effectiveness of the proposed stochastic framework, used together with a weight decay regularizer, on popular classification models, removing 50% of the nodes in an MLP for MNIST classification and 60% for CIFAR10 classification, and on medical image models, removing 60% of the channels in a U-Net for instance segmentation and 50% of the channels in a model for COVID-19 detection. For these node-pruned networks, we also present competitive weight pruning results that are only slightly less accurate than the original, dense networks.
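To make the stochastic weight-pruning idea concrete, the sketch below samples a binary mask whose survival probability grows with weight magnitude, so that smaller weights are more likely to be zeroed. This is a minimal illustration only: the softmax-based probability function, the `temperature` parameter, and the PyTorch layer are assumptions for the example, not the exact sampling distribution described in the paper.

```python
import torch

def stochastic_magnitude_prune(weights: torch.Tensor,
                               prune_frac: float,
                               temperature: float = 1.0) -> torch.Tensor:
    """Sample a binary pruning mask that preferentially zeroes small-magnitude weights.

    Illustrative sketch: survival scores come from a softmax over weight
    magnitudes, and roughly `prune_frac` of the weights are zeroed.
    """
    flat = weights.abs().flatten()
    # Softmax over magnitudes: larger weights receive larger survival scores.
    scores = torch.softmax(flat / temperature, dim=0)
    n_keep = int(round((1.0 - prune_frac) * flat.numel()))
    # Sample, without replacement, the indices of weights that survive.
    keep_idx = torch.multinomial(scores, num_samples=n_keep, replacement=False)
    mask = torch.zeros_like(flat)
    mask[keep_idx] = 1.0
    return mask.view_as(weights)

# Usage: prune 50% of a hypothetical MLP layer's weights, biased toward the smaller ones.
layer = torch.nn.Linear(784, 256)
mask = stochastic_magnitude_prune(layer.weight.data, prune_frac=0.5)
layer.weight.data.mul_(mask)
```

In this sketch the weight-decay (L_2) or L_1 regularizer would simply be part of the training loss; its role, as studied in the paper, is to shape how the surviving weights concentrate on fewer nodes, which is what makes subsequent node pruning effective.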