Optimizing Gradient-driven Criteria in Network Sparsity: Gradient is All You Need

01/30/2022
by Yuxin Zhang, et al.

Network sparsity has gained popularity largely for its ability to reduce network complexity. Extensive studies have explored gradient-driven sparsity. Typically, these methods are built on the premise of weight independence, which, however, contradicts the fact that weights mutually influence one another; as a result, their performance leaves room for improvement. In this paper, we propose to further optimize gradient-driven sparsity (OptG) by resolving this independence paradox. Our motivation comes from recent advances in supermask training, which show that sparse subnetworks can be located in a randomly initialized network simply by updating mask values, without modifying any weight. We prove that supermask training amounts to accumulating the weight gradients and can partly resolve the independence paradox. Consequently, OptG integrates supermask training into gradient-driven sparsity, and a specialized mask optimizer is designed to resolve the independence paradox. Experiments show that OptG surpasses many existing state-of-the-art competitors. Our code is available at <https://github.com/zyxxmu/OptG>.
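The abstract hinges on supermask training: the randomly initialized weights are kept frozen and only per-weight mask scores are optimized. Below is a minimal sketch of that idea in PyTorch, assuming top-k binarization of the scores with a straight-through estimator; the names (SupermaskLinear, TopKMask, sparsity) are illustrative and are not taken from the OptG repository.

```python
# Minimal sketch of supermask training: weights stay frozen while a per-weight
# score is learned; the binary mask keeps the top-k scores, and gradients flow
# to the scores through a straight-through estimator.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMask(torch.autograd.Function):
    @staticmethod
    def forward(ctx, scores, sparsity):
        # Keep the (1 - sparsity) fraction of positions with the largest scores.
        k = int((1.0 - sparsity) * scores.numel())
        mask = torch.zeros_like(scores)
        idx = torch.topk(scores.flatten(), k).indices
        mask.view(-1)[idx] = 1.0
        return mask

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: pass the gradient to the scores unchanged.
        return grad_output, None


class SupermaskLinear(nn.Linear):
    def __init__(self, in_features, out_features, sparsity=0.9):
        super().__init__(in_features, out_features, bias=False)
        self.sparsity = sparsity
        self.scores = nn.Parameter(torch.randn_like(self.weight) * 0.01)
        self.weight.requires_grad = False  # weights are never updated

    def forward(self, x):
        mask = TopKMask.apply(self.scores, self.sparsity)
        return F.linear(x, self.weight * mask)


# Usage: optimize only the scores, i.e. the "update mask values without
# modifying any weight" setting described in the abstract.
layer = SupermaskLinear(128, 64, sparsity=0.9)
optimizer = torch.optim.SGD([layer.scores], lr=0.1)
```

Because each score receives, at every step, the gradient of the loss with respect to the corresponding masked weight, optimizing the scores effectively accumulates weight gradients over training, which is consistent with the abstract's claim that supermask training accumulates weight gradients.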


Related research

12/02/2022: Are Straight-Through gradients and Soft-Thresholding all you need for Sparse Training?
Turning the weights to zero when training a neural network helps in redu...

06/30/2023: Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer
Deep neural networks often suffer from poor generalization due to comple...

02/13/2023: Bi-directional Masks for Efficient N:M Sparse Training
We focus on addressing the dense backward propagation issue for training...

06/14/2022: Learning Best Combination for Efficient N:M Sparsity
By forcing at most N out of M consecutive weights to be non-zero, the re...

10/25/2022: Gradient-based Weight Density Balancing for Robust Dynamic Sparse Training
Training a sparse neural network from scratch requires optimizing connec...

10/29/2021: Hyperparameter Tuning is All You Need for LISTA
Learned Iterative Shrinkage-Thresholding Algorithm (LISTA) introduces th...

06/13/2023: Lookaround Optimizer: k steps around, 1 step average
Weight Average (WA) is an active research topic due to its simplicity in...
