Are Straight-Through gradients and Soft-Thresholding all you need for Sparse Training?

12/02/2022
by Antoine Vanderschueren, et al.

Setting weights to zero while training a neural network reduces computational complexity at inference. To progressively increase the sparsity ratio in the network without causing sharp weight discontinuities during training, our work combines soft-thresholding with straight-through gradient estimation to update the raw, i.e., non-thresholded, version of zeroed weights. Our method, named ST-3 for straight-through/soft-thresholding/sparse-training, achieves state-of-the-art results, in terms of both accuracy/sparsity and accuracy/FLOPS trade-offs, when progressively increasing the sparsity ratio in a single training cycle. In particular, despite its simplicity, ST-3 compares favorably to the most recent methods, which adopt differentiable formulations or bio-inspired neuroregeneration principles. This suggests that the key ingredients for effective sparsification lie primarily in giving the weights the freedom to evolve smoothly across the zero state while the sparsity ratio increases progressively. Source code and weights are available at https://github.com/vanderschuea/stthree
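The core mechanism, soft-thresholding the weights in the forward pass while a straight-through gradient keeps updating the raw, dense weights (including the zeroed ones), can be illustrated with a minimal sketch. This is not the authors' implementation: the PyTorch framing, the fixed threshold value, and the toy loss below are assumptions for illustration; in practice the threshold would follow a schedule so that sparsity grows progressively over a single training cycle.

```python
import torch

def soft_threshold(w_raw: torch.Tensor, threshold: torch.Tensor) -> torch.Tensor:
    # Soft-thresholding: shrink surviving weights toward zero by `threshold`
    # and zero out the rest, so weights can cross zero smoothly as sparsity grows.
    w_sparse = torch.sign(w_raw) * torch.clamp(w_raw.abs() - threshold, min=0.0)
    # Straight-through estimator: the forward pass uses w_sparse, but the
    # gradient flows to w_raw as if the thresholding were the identity.
    return w_raw + (w_sparse - w_raw).detach()

# Hypothetical usage inside a training step; `threshold` would normally be
# ramped up over training to increase the sparsity ratio progressively.
w_raw = torch.randn(256, 256, requires_grad=True)
threshold = torch.tensor(0.05)           # assumed schedule value at this step
w = soft_threshold(w_raw, threshold)     # sparse weights actually used by the layer
loss = (w @ torch.randn(256, 1)).pow(2).mean()
loss.backward()                          # gradients reach w_raw, even for zeroed weights
```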


Related research

01/30/2022 - Optimizing Gradient-driven Criteria in Network Sparsity: Gradient is All You Need
Network sparsity receives popularity mostly due to its capability to red...

05/03/2023 - Dynamic Sparse Training with Structured Sparsity
DST methods achieve state-of-the-art results in sparse neural network tr...

04/14/2023 - AUTOSPARSE: Towards Automated Sparse Training of Deep Neural Networks
Sparse training is emerging as a promising avenue for reducing the compu...

07/19/2023 - Deep unrolling Shrinkage Network for Dynamic MR imaging
Deep unrolling networks that utilize sparsity priors have achieved great...

06/09/2023 - Spatial Re-parameterization for N:M Sparsity
This paper presents a Spatial Re-parameterization (SpRe) method for the ...

06/14/2022 - Learning Best Combination for Efficient N:M Sparsity
By forcing at most N out of M consecutive weights to be non-zero, the re...

10/29/2021 - Hyperparameter Tuning is All You Need for LISTA
Learned Iterative Shrinkage-Thresholding Algorithm (LISTA) introduces th...
