Good Subnetworks Provably Exist: Pruning via Greedy Forward Selection

03/03/2020
by Mao Ye, et al.

Recent empirical work shows that large deep neural networks are often highly redundant, and that one can find much smaller subnetworks without a significant drop in accuracy. However, most existing network pruning methods are empirical and heuristic, leaving open whether good subnetworks provably exist, how to find them efficiently, and whether network pruning can be provably better than direct training with gradient descent. We answer these questions positively by proposing a simple greedy selection approach for finding good subnetworks: it starts from an empty network and greedily adds important neurons from the large network. This differs from existing methods based on backward elimination, which remove redundant neurons from the large network. Theoretically, applying our greedy selection strategy to sufficiently large pre-trained networks is guaranteed to find small subnetworks with lower loss than networks trained directly with gradient descent. Empirically, we improve on the prior state of the art in network pruning for learning compact neural architectures on ImageNet, including ResNet, MobileNetV2/V3, and ProxylessNet. Our theory and empirical results on MobileNet suggest that the pruned subnetworks should be fine-tuned to leverage the information in the large model, rather than re-trained from a new random initialization as suggested in <cit.>.
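To make the idea concrete, the toy sketch below (our own illustration, not the authors' code) shows the flavor of greedy forward selection on a one-hidden-layer network: starting from an empty subnetwork, each step adds the hidden neuron of the pre-trained network that most reduces the current loss. All names, sizes, and the squared-loss objective here are illustrative assumptions.

```python
# Hypothetical toy example of greedy forward selection for pruning a
# one-hidden-layer ReLU network (not the paper's implementation).
import numpy as np

rng = np.random.default_rng(0)

# A toy "pre-trained" network with N hidden neurons.
N, d, n = 64, 10, 200                      # hidden width, input dim, #samples
W = rng.normal(size=(N, d))                # input-to-hidden weights
v = rng.normal(size=N) / np.sqrt(N)        # hidden-to-output weights
X = rng.normal(size=(n, d))                # training inputs
H = np.maximum(X @ W.T, 0.0)               # hidden activations, shape (n, N)
y = H @ v                                  # targets produced by the full network

def loss(selected):
    """Squared loss of the subnetwork that keeps only the neurons in `selected`."""
    if not selected:
        return float(np.mean(y ** 2))
    pred = H[:, selected] @ v[selected]
    return float(np.mean((pred - y) ** 2))

# Greedy forward selection: repeatedly add the neuron that lowers the loss most.
selected, budget = [], 8
for _ in range(budget):
    candidates = [i for i in range(N) if i not in selected]
    best = min(candidates, key=lambda i: loss(selected + [i]))
    selected.append(best)
    print(f"subnetwork size = {len(selected):2d}, loss = {loss(selected):.4f}")
```

In this sketch the selection criterion is the training loss of the candidate subnetwork; in practice one would fine-tune the selected subnetwork afterwards, as the abstract recommends.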


