Hierarchical Adaptive Lasso: Learning Sparse Neural Networks with Shrinkage via Single Stage Training

by Skyler Seto, et al.

Deep neural networks achieve state-of-the-art performance in a variety of tasks, but this performance is closely tied to model size. Sparsity is one approach to limiting model size. Modern techniques for inducing sparsity in neural networks are (1) network pruning, an iterative procedure in which a model is initialized with a previous run's weights, trained, and hard-thresholded; (2) single-stage training with a sparsity-inducing penalty (usually based on the Lasso); and (3) training a binary mask jointly with the weights of the network. In this work, we study different sparsity-inducing penalties from the perspective of Bayesian hierarchical models, with the goal of designing penalties that perform well without retraining subnetworks in isolation. With this motivation, we present a novel penalty called Hierarchical Adaptive Lasso (HALO), which learns to adaptively sparsify the weights of a given network via trainable parameters, without learning a mask. When used to train over-parametrized networks, our penalty yields small subnetworks with high accuracy (winning tickets) even when the subnetworks are not trained in isolation. Empirically, on the CIFAR-100 dataset, we find that HALO learns highly sparse networks (only 5% of the parameters) with gains of approximately 2% and 4% in performance over state-of-the-art magnitude pruning methods at the same level of sparsity.
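To make the idea of an adaptive, trainable per-weight penalty concrete, here is a minimal NumPy sketch of an adaptive-Lasso-style regularizer with learnable per-weight scales. This is an illustration of the general technique the abstract describes, not the exact HALO penalty from the paper: the function name, the log-space parametrization of the scales, and the quadratic regularizer on the scale parameters are all assumptions made for this sketch.

```python
import numpy as np

def adaptive_lasso_penalty(weights, log_scales, reg=1e-2):
    """Adaptive L1 penalty with trainable per-weight shrinkage strengths.

    weights    : array of network weights.
    log_scales : trainable parameters, one per weight; the effective
                 shrinkage strength is exp(log_scales), which keeps it
                 positive. A larger scale shrinks that weight harder.
    reg        : hypothetical regularizer on the scale parameters,
                 discouraging them from drifting without bound.
    """
    scales = np.exp(log_scales)  # positive per-weight shrinkage strengths
    l1_term = np.sum(scales * np.abs(weights))
    scale_term = reg * np.sum(log_scales ** 2)
    return float(l1_term + scale_term)

# With all log_scales at zero and reg=0, this reduces to the plain Lasso
# penalty sum(|w_i|); training the scales lets the model shrink
# unimportant weights aggressively while sparing important ones.
w = np.array([1.0, -2.0, 0.5])
plain = adaptive_lasso_penalty(w, np.zeros_like(w), reg=0.0)      # -> 3.5
adaptive = adaptive_lasso_penalty(w, np.log([2.0, 0.1, 2.0]))     # shrinks w[0], w[2] harder
```

In a real training loop, both the weights and the scale parameters would be updated by gradient descent on the task loss plus this penalty, which is what distinguishes this one-stage approach from iterative prune-and-retrain pipelines.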

