Picking Winning Tickets Before Training by Preserving Gradient Flow

02/18/2020
by Chaoqi Wang, et al.

Overparameterization has been shown to benefit both the optimization and generalization of neural networks, but large networks are resource-hungry at both training and test time. Network pruning can reduce test-time resource requirements, but it is typically applied to trained networks and therefore cannot avoid the expensive training process. We aim to prune networks at initialization, thereby saving resources at training time as well. Specifically, we argue that efficient training requires preserving the gradient flow through the network. This leads to a simple but effective pruning criterion we term Gradient Signal Preservation (GraSP). We empirically investigate the effectiveness of the proposed method with extensive experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet and ImageNet, using VGGNet and ResNet architectures. Our method can prune 80% of the weights of a network on ImageNet at initialization, with only a 1.6% drop in top-1 accuracy; moreover, our method achieves significantly better performance than the baseline at extreme sparsity levels.
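To make the criterion concrete, here is a minimal sketch of GraSP-style scoring on a toy quadratic loss. This is an illustration, not the paper's implementation: the loss, variable names, and the 80% pruning ratio are assumptions for the example, and for a real network the Hessian-gradient product would be computed with a Hessian-vector product (double backprop) rather than an explicit Hessian.

```python
import numpy as np

# Toy quadratic loss L(theta) = 0.5 * theta^T A theta - b^T theta,
# so the gradient is g = A @ theta - b and the Hessian is H = A.
# (Hypothetical setup for illustration only.)
rng = np.random.default_rng(0)
n = 100                          # number of "weights" in the toy model
A = rng.standard_normal((n, n))
A = A @ A.T / n                  # symmetric PSD Hessian
b = rng.standard_normal(n)
theta = rng.standard_normal(n)   # weights at initialization

g = A @ theta - b                # gradient of the toy loss
Hg = A @ g                       # Hessian-gradient product

# GraSP score: S = -theta * (H g). To first order, S_q approximates the
# change in gradient norm when weight q is removed. A large score means
# removing that weight hurts gradient flow least, so we prune the
# top-scoring fraction and keep the rest.
scores = -theta * Hg
prune_ratio = 0.8                # prune 80% of weights, as in the paper's ImageNet setup
threshold = np.quantile(scores, 1.0 - prune_ratio)
mask = scores < threshold        # True = weight is kept

pruned_theta = theta * mask
print(f"kept {int(mask.sum())} / {n} weights")
```

The key point the sketch shows is that GraSP needs only a single gradient and one Hessian-vector product at initialization, so the pruning mask can be computed before any training takes place.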


