Optimizing Learning Rate Schedules for Iterative Pruning of Deep Neural Networks

12/09/2022
by Shiyu Liu, et al.

The importance of learning rate (LR) schedules for network pruning has been observed in a few recent works. As an example, Frankle and Carbin (2019) highlighted that winning tickets (i.e., accuracy-preserving subnetworks) cannot be found without applying an LR warmup schedule, and Renda, Frankle, and Carbin (2020) demonstrated that rewinding the LR to its initial state at the end of each pruning cycle improves performance. In this paper, we go one step further by first providing a theoretical justification for the surprising effect of LR schedules. Next, we propose an LR schedule for network pruning called SILO, which stands for S-shaped Improved Learning rate Optimization. The advantages of SILO over existing state-of-the-art (SOTA) LR schedules are two-fold: (i) SILO has a strong theoretical motivation and dynamically adjusts the LR during pruning to improve generalization. Specifically, SILO increases the LR upper bound (max_lr) in an S-shape. This leads to an improvement of 2% to 4% across various architectures (e.g., Transformers, ResNet) on popular datasets such as ImageNet and CIFAR-10/100. (ii) In addition to the strong theoretical motivation, SILO is empirically optimal in the sense of matching an Oracle, which exhaustively searches for the optimal value of max_lr via grid search. We find that SILO is able to precisely adjust the value of max_lr to be within the Oracle-optimized interval, resulting in performance competitive with the Oracle at significantly lower complexity.
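The abstract describes SILO as raising the LR upper bound (max_lr) in an S-shape over the course of iterative pruning. As a rough illustration of that idea only, the minimal sketch below maps the pruning round onto a logistic curve: the function name silo_max_lr, the parameters lr_low, lr_high, and steepness, and all default values are hypothetical and are not taken from the paper.

```python
import math

def silo_max_lr(pruning_round, total_rounds, lr_low=0.01, lr_high=0.1, steepness=8.0):
    """Hypothetical S-shaped schedule for the LR upper bound (max_lr).

    Maps the current pruning round onto a logistic curve so that max_lr
    stays near lr_low in early rounds, rises quickly in the middle rounds,
    and saturates near lr_high for heavily pruned networks.
    """
    # Normalize the round index to [0, 1] and center the logistic curve at 0.5.
    t = pruning_round / max(total_rounds - 1, 1)
    s = 1.0 / (1.0 + math.exp(-steepness * (t - 0.5)))
    return lr_low + (lr_high - lr_low) * s

# Example: print the max_lr that would be used in each of 10 pruning rounds.
for r in range(10):
    print(f"round {r}: max_lr = {silo_max_lr(r, 10):.4f}")
```

In this reading, the returned max_lr would feed into whatever per-cycle retraining schedule (e.g., warmup followed by decay) is applied after each pruning step, while an exhaustive grid search over max_lr plays the role of the Oracle baseline described above.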

Related research

10/17/2021 · S-Cyc: A Learning Rate Schedule for Iterative Pruning of ReLU-based Networks
We explore a new perspective on adapting the learning rate (LR) schedule...

05/07/2021 · Network Pruning That Matters: A Case Study on Retraining Variants
Network pruning is an effective method to reduce the computational expen...

06/20/2020 · Paying more attention to snapshots of Iterative Pruning: Improving Model Compression via Ensemble Distillation
Network pruning is one of the most dominant methods for reducing the hea...

07/25/2022 · Trainability Preserving Neural Structured Pruning
Several recent works empirically find finetuning learning rate is critic...

05/12/2021 · Dynamical Isometry: The Missing Ingredient for Neural Network Pruning
Several recent works [40, 24] observed an interesting phenomenon in neur...

10/29/2020 · Greedy Optimization Provably Wins the Lottery: Logarithmic Number of Winning Tickets is Enough
Despite the great success of deep learning, recent works show that large...

12/09/2022 · AP: Selective Activation for De-sparsifying Pruned Neural Networks
The rectified linear unit (ReLU) is a highly successful activation funct...
