Directional Pruning of Deep Neural Networks

06/16/2020
by Shih-Kang Chao, et al.

In light of the fact that stochastic gradient descent (SGD) often finds a flat minimum valley in the training loss, we propose a novel directional pruning method that searches for a sparse minimizer in that flat region. The proposed pruning method is automatic in the sense that neither retraining nor expert knowledge is required. To overcome the formidable computational cost of estimating the flat directions, we propose a carefully tuned ℓ_1 proximal gradient algorithm that provably achieves the directional pruning with a small learning rate after sufficient training. The empirical results show that our algorithm performs competitively in the highly sparse regime (92% sparsity) among many existing automatic pruning methods on ResNet50 with ImageNet, while using only slightly more wall time and memory than SGD. Using VGG16 and the wide ResNet 28x10 on CIFAR-10 and CIFAR-100, we demonstrate that our algorithm reaches the same minimum valley as SGD, and that the minima found by our algorithm and by SGD do not deviate in directions that affect the training loss.
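The abstract refers to a carefully tuned ℓ_1 proximal gradient algorithm. As a rough illustration of the general mechanism only, and not of the paper's specific algorithm or its penalty schedule, a minimal PyTorch-style sketch of one ℓ_1 proximal gradient update (an SGD step followed by soft-thresholding, which sets small weights exactly to zero) might look as follows; the function name and the hyperparameters lr and lam are illustrative assumptions.

```python
import torch

def l1_proximal_sgd_step(params, lr, lam):
    """One generic ℓ_1 proximal gradient update (illustrative sketch).

    lr is the learning rate and lam the ℓ_1 penalty strength; both are
    hypothetical values, not the tuning used in the paper.
    """
    with torch.no_grad():
        for p in params:
            if p.grad is None:
                continue
            # Gradient step on the training loss.
            p.add_(p.grad, alpha=-lr)
            # Proximal (soft-thresholding) step for the ℓ_1 penalty:
            # prox(w) = sign(w) * max(|w| - lr*lam, 0), which zeroes small weights.
            p.copy_(torch.sign(p) * torch.clamp(p.abs() - lr * lam, min=0.0))

# Usage sketch: call after the backward pass in each training iteration,
# e.g. loss.backward(); l1_proximal_sgd_step(model.parameters(), lr=0.01, lam=1e-4)
```

The soft-thresholding step is what produces exact sparsity without a separate retraining phase; the paper's contribution lies in steering this shrinkage along the flat directions of the loss, which the sketch above does not capture.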


Related research

- 07/12/2021  Structured Directional Pruning via Perturbation Orthogonal Projection
  Structured pruning is an effective compression technique to reduce the c...
- 06/04/2021  Stochastic gradient descent with noise of machine learning type. Part II: Continuous time analysis
  The representation of functions by artificial neural networks depends on...
- 09/05/2020  S-SGD: Symmetrical Stochastic Gradient Descent with Weight Noise Injection for Reaching Flat Minima
  The stochastic gradient descent (SGD) method is most widely used for dee...
- 09/14/2020  Deforming the Loss Surface to Affect the Behaviour of the Optimizer
  In deep learning, it is usually assumed that the optimization process is...
- 07/03/2017  Parle: parallelizing stochastic gradient descent
  We propose a new algorithm called Parle for parallel training of deep ne...
- 07/24/2020  Deforming the Loss Surface
  In deep learning, it is usually assumed that the shape of the loss surfa...
- 07/18/2022  Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit
  There is mounting empirical evidence of emergent phenomena in the capabi...
