Robust Learning of Parsimonious Deep Neural Networks
We propose a simultaneous learning and pruning algorithm capable of identifying and eliminating irrelevant structures in a neural network during the early stages of training. Thus, the computational cost of subsequent training iterations, as well as that of inference, is considerably reduced. Our method, based on variational inference principles, learns the posterior distribution of Bernoulli random variables that multiply the units/filters, similarly to adaptive dropout. We derive a novel hyper-prior distribution over the prior parameters that is crucial for their optimal selection, such that the Bernoulli parameters practically converge to either 0 or 1, yielding a deterministic final network. Our algorithm is robust in the sense that it achieves consistent pruning levels and prediction accuracy regardless of weight initialization or the size of the starting network. We provide an analysis of its convergence properties, establishing theoretical and practical pruning conditions. We evaluate the proposed algorithm on the MNIST dataset and the commonly used fully connected and convolutional LeNet architectures. The simulations show that our method achieves pruning levels on par with state-of-the-art methods for structured pruning, while maintaining better test accuracy and, more importantly, doing so in a manner robust with respect to network initialization and initial size.
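As a rough illustration of the gating idea described in the abstract, the sketch below shows a layer whose output units are multiplied by Bernoulli random variables with learnable keep probabilities. This is not the authors' exact algorithm: the hyper-prior over the prior parameters and the full variational objective are developed in the paper, and are replaced here by a generic sparsity penalty; the class name `BernoulliGatedLinear` and its methods are hypothetical.

```python
# Minimal sketch (PyTorch) of Bernoulli-gated units trained jointly with
# the weights. NOT the paper's algorithm: the hyper-prior and the exact
# variational (ELBO) objective are omitted and stood in for by a simple
# penalty on the keep probabilities.
import math

import torch
import torch.nn as nn


class BernoulliGatedLinear(nn.Module):
    """Linear layer whose output units are multiplied by Bernoulli gates.

    During training the gates are sampled with a straight-through
    estimator so gradients reach the gate probabilities; once a unit's
    keep probability collapses to ~0, it can be removed from the network,
    which is what reduces the cost of the remaining training iterations.
    """

    def __init__(self, in_features: int, out_features: int, init_theta: float = 0.9):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # Unconstrained logits of the per-unit keep probabilities theta.
        init_logit = math.log(init_theta / (1.0 - init_theta))
        self.gate_logits = nn.Parameter(torch.full((out_features,), init_logit))

    def theta(self) -> torch.Tensor:
        # Posterior-style keep probability of each output unit.
        return torch.sigmoid(self.gate_logits)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        theta = self.theta()
        if self.training:
            # Straight-through Bernoulli sample: hard 0/1 values in the
            # forward pass, gradient of theta in the backward pass.
            hard = torch.bernoulli(theta)
            gates = hard + theta - theta.detach()
        else:
            # Deterministic network at test time: theta has (ideally)
            # converged near 0 or 1, so thresholding is almost exact.
            gates = (theta > 0.5).float()
        return self.linear(x) * gates

    def sparsity_penalty(self) -> torch.Tensor:
        # Stand-in for the KL/prior term of a variational objective:
        # pushes keep probabilities down unless the data supports them.
        return self.theta().sum()
```

In training, one would add something like `lam * layer.sparsity_penalty()` to the task loss and periodically drop units whose keep probability has fallen to ~0; the paper's contribution is the hyper-prior that drives the Bernoulli parameters to converge to 0 or 1 robustly, without such ad hoc penalties or thresholds.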