
Dynamic Model Pruning with Feedback

by Tao Lin, et al.

Deep neural networks often have millions of parameters. This can hinder their deployment to low-end devices, not only due to high memory requirements but also because of increased latency at inference. We propose a novel model compression method that generates a sparse trained model without additional overhead: by (i) allowing dynamic allocation of the sparsity pattern and (ii) incorporating a feedback signal to reactivate prematurely pruned weights, we obtain a performant sparse model in a single training pass (retraining is not needed, but can further improve performance). We evaluate our method on CIFAR-10 and ImageNet, and show that the obtained sparse models can reach the state-of-the-art performance of dense models. Moreover, their performance surpasses that of models generated by all previously proposed pruning schemes.
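The two ingredients named in the abstract are that the sparsity pattern is re-allocated dynamically during training, and that the gradient step is applied to the dense weights so prematurely pruned weights keep receiving feedback and can be reactivated. Below is a minimal sketch of one such training step, assuming magnitude-based masking and plain SGD; `magnitude_mask`, `sparse_step_with_feedback`, and `grad_fn` are illustrative names, not the paper's code.

```python
import torch

def magnitude_mask(w: torch.Tensor, sparsity: float) -> torch.Tensor:
    # Keep the (1 - sparsity) fraction of entries with largest magnitude.
    k = max(1, int(w.numel() * (1.0 - sparsity)))
    threshold = w.abs().flatten().topk(k).values.min()
    return (w.abs() >= threshold).to(w.dtype)

def sparse_step_with_feedback(w, grad_fn, lr=0.1, sparsity=0.9):
    # (i) The sparsity pattern is re-allocated at every step from the dense weights.
    mask = magnitude_mask(w, sparsity)
    w_sparse = w * mask            # the pruned model used in the forward/backward pass
    grad = grad_fn(w_sparse)       # gradient of the loss evaluated at the sparse weights
    # (ii) Feedback: the update is applied to the *dense* weights, so weights that
    # were pruned too early keep accumulating gradient and can re-enter the mask later.
    return w - lr * grad, mask

# Toy usage: quadratic loss 0.5 * ||w_sparse - target||^2, whose gradient is (w_sparse - target).
target = torch.randn(100)
w = torch.randn(100)
for _ in range(200):
    w, mask = sparse_step_with_feedback(w, lambda ws: ws - target, lr=0.2, sparsity=0.9)
print(f"kept {int(mask.sum())} of {w.numel()} weights")
```

The key design choice this sketch illustrates is that, unlike one-shot pruning, the mask never permanently removes a weight: only the forward pass sees the sparse model, while the dense copy is what gets updated.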

