Log In Sign Up

DiffPrune: Neural Network Pruning with Deterministic Approximate Binary Gates and L_0 Regularization

by   Yaniv Shulman, et al.

Modern neural network architectures typically have many millions of parameters and can be pruned significantly without substantial loss in effectiveness which demonstrates they are over-parameterized. The contribution of this work is two-fold. The first is a method for approximating a multivariate Bernoulli random variable by means of a deterministic and differentiable transformation of any real-valued multivariate random variable. The second is a method for model selection by element-wise multiplication of parameters with approximate binary gates that may be computed deterministically or stochastically and take on exact zero values. Sparsity is encouraged by the inclusion of a surrogate regularization to the L_0 loss. Since the method is differentiable it enables straightforward and efficient learning of model architectures by an empirical risk minimization procedure with stochastic gradient descent and theoretically enables conditional computation during training. The method also supports any arbitrary group sparsity over parameters or activations and therefore offers a framework for unstructured or flexible structured model pruning. To conclude experiments are performed to demonstrate the effectiveness of the proposed approach.


page 1

page 2

page 3

page 4


Learning Sparse Neural Networks through L_0 Regularization

We propose a practical method for L_0 norm regularization for neural net...

Exact Backpropagation in Binary Weighted Networks with Group Weight Transformations

Quantization based model compression serves as high performing and fast ...

Embedding Differentiable Sparsity into Deep Neural Network

In this paper, we propose embedding sparsity into the structure of deep ...

Deep Differentiable Logic Gate Networks

Recently, research has increasingly focused on developing efficient neur...

Flexible Network Binarization with Layer-wise Priority

How to effectively approximate real-valued parameters with binary codes ...

DeepHoyer: Learning Sparser Neural Network with Differentiable Scale-Invariant Sparsity Measures

In seeking for sparse and efficient neural network models, many previous...

L_0-ARM: Network Sparsification via Stochastic Binary Optimization

We consider network sparsification as an L_0-norm regularized binary opt...