Log In Sign Up

Network Pruning That Matters: A Case Study on Retraining Variants

by   Duong H. Le, et al.

Network pruning is an effective method to reduce the computational expense of over-parameterized neural networks for deployment on low-resource systems. Recent state-of-the-art techniques for retraining pruned networks such as weight rewinding and learning rate rewinding have been shown to outperform the traditional fine-tuning technique in recovering the lost accuracy (Renda et al., 2020), but so far it is unclear what accounts for such performance. In this work, we conduct extensive experiments to verify and analyze the uncanny effectiveness of learning rate rewinding. We find that the reason behind the success of learning rate rewinding is the usage of a large learning rate. Similar phenomenon can be observed in other learning rate schedules that involve large learning rates, e.g., the 1-cycle learning rate schedule (Smith et al., 2019). By leveraging the right learning rate schedule in retraining, we demonstrate a counter-intuitive phenomenon in that randomly pruned networks could even achieve better performance than methodically pruned networks (fine-tuned with the conventional approach). Our results emphasize the cruciality of the learning rate schedule in pruned network retraining - a detail often overlooked by practitioners during the implementation of network pruning. One-sentence Summary: We study the effective of different retraining mechanisms while doing pruning


page 1

page 2

page 3

page 4


Comparing Rewinding and Fine-tuning in Neural Network Pruning

Many neural network pruning algorithms proceed in three steps: train the...

How to decay your learning rate

Complex learning rate schedules have become an integral part of deep lea...

Cascade Weight Shedding in Deep Neural Networks: Benefits and Pitfalls for Network Pruning

We report, for the first time, on the cascade weight shedding phenomenon...

Reproducibility Study: Comparing Rewinding and Fine-tuning in Neural Network Pruning

Scope of reproducibility: We are reproducing Comparing Rewinding and Fin...

Lottery Ticket Implies Accuracy Degradation, Is It a Desirable Phenomenon?

In deep model compression, the recent finding "Lottery Ticket Hypothesis...

S-Cyc: A Learning Rate Schedule for Iterative Pruning of ReLU-based Networks

We explore a new perspective on adapting the learning rate (LR) schedule...

Scheduling the Learning Rate via Hypergradients: New Insights and a New Algorithm

We study the problem of fitting task-specific learning rate schedules fr...