S-Cyc: A Learning Rate Schedule for Iterative Pruning of ReLU-based Networks

10/17/2021
by Shiyu Liu, et al.

We explore a new perspective on adapting the learning rate (LR) schedule to improve the performance of ReLU-based networks as they are iteratively pruned. Our work and contributions consist of four parts: (i) We find that, as a ReLU-based network is iteratively pruned, the distribution of its weight gradients tends to become narrower. This leads to the finding that, as the network becomes sparser, a larger LR should be used to train the pruned network. (ii) Motivated by this finding, we propose a novel LR schedule, called S-Cyclical (S-Cyc), which adapts the conventional cyclical LR schedule by gradually increasing the LR upper bound (max_lr) in an S-shape as the network is iteratively pruned. We highlight that S-Cyc is a method-agnostic LR schedule that applies to many iterative pruning methods. (iii) We evaluate the performance of the proposed S-Cyc and compare it to four LR schedule benchmarks. Our experimental results on three state-of-the-art networks (VGG-19, ResNet-20, ResNet-50) and two popular datasets (CIFAR-10, ImageNet-200) demonstrate that S-Cyc consistently outperforms the best-performing benchmark, with an improvement of 2.1% and no increase in complexity. (iv) We evaluate S-Cyc against an oracle and show that S-Cyc achieves comparable performance to the oracle, which carefully tunes max_lr via grid search.
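To make the idea concrete, the following is a minimal sketch of how such a schedule could be put together: a standard triangular cyclical LR within each retraining run, with the upper bound max_lr grown along a sigmoid ("S") curve across pruning rounds. The function names, the sigmoid parameterization, and all numeric values (lr_low, lr_high, steepness, cycle length) are illustrative assumptions for this sketch, not the paper's exact specification.

    import math

    def s_shaped_max_lr(prune_round, total_rounds, lr_low=0.1, lr_high=0.5, steepness=10.0):
        # Illustrative S-shaped growth of the cyclical LR upper bound (max_lr):
        # as pruning progresses, max_lr rises from lr_low toward lr_high along
        # a sigmoid curve. The exact parameterization is a placeholder.
        t = prune_round / max(total_rounds - 1, 1)           # pruning progress in [0, 1]
        s = 1.0 / (1.0 + math.exp(-steepness * (t - 0.5)))   # sigmoid in (0, 1)
        return lr_low + (lr_high - lr_low) * s

    def cyclical_lr(step, steps_per_cycle, base_lr, max_lr):
        # Basic triangular cyclical LR: rises from base_lr to max_lr and back
        # within each cycle of steps_per_cycle training steps.
        cycle_pos = (step % steps_per_cycle) / steps_per_cycle
        scale = 1.0 - abs(2.0 * cycle_pos - 1.0)              # triangle wave in [0, 1]
        return base_lr + (max_lr - base_lr) * scale

    # Example: recompute max_lr at each pruning round, then retrain the pruned
    # network with a cyclical schedule bounded by that max_lr.
    for prune_round in range(5):
        max_lr = s_shaped_max_lr(prune_round, total_rounds=5)
        for step in range(0, 1000, 250):
            lr = cyclical_lr(step, steps_per_cycle=1000, base_lr=0.01, max_lr=max_lr)
            print(f"round {prune_round}, step {step}: lr = {lr:.3f}")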
