Learning Best Combination for Efficient N:M Sparsity

06/14/2022
by Yuxin Zhang, et al.

By forcing at most N out of M consecutive weights to be non-zero, the recent N:M network sparsity has received increasing attention for its two attractive advantages: 1) promising performance at a high sparsity level, and 2) significant speedups on NVIDIA A100 GPUs. However, recent N:M methods require either an expensive pre-training phase or a heavy dense-gradient computation. In this paper, we show that N:M learning can be naturally characterized as a combinatorial problem that searches for the best candidate within a finite collection. Motivated by this characteristic, we solve N:M sparsity in an efficient divide-and-conquer manner. First, we divide each weight vector into its C_M^N combination subsets of a fixed size N. Then, we conquer the combinatorial problem by assigning each combination a learnable score that is jointly optimized with its associated weights. We prove that the introduced scoring mechanism can well model the relative importance between combination subsets, and by gradually removing low-scored subsets, N:M fine-grained sparsity can be efficiently optimized during the normal training phase. Comprehensive experiments demonstrate that our learning best combination (LBC) performs consistently better than off-the-shelf N:M sparsity methods across various networks. Our code is released at <https://github.com/zyxxmu/LBC>.
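The sketch below illustrates the combinatorial view described in the abstract, assuming a PyTorch setting: every group of M consecutive weights is paired with one learnable score per C(M, N) index combination, and the scores are trained jointly with the weights. The class name `NMCombinationLinear`, the hard top-1 selection, and the straight-through gradient are illustrative assumptions; the paper's actual score update and gradual removal schedule are not reproduced here.

```python
# Minimal sketch of learning N:M masks via per-combination scores.
# Hypothetical names and a simplified selection rule; not the authors' code.
import itertools
import torch
import torch.nn as nn


class NMCombinationLinear(nn.Module):
    """Linear layer whose weights are masked to N:M sparsity by picking,
    for every group of M consecutive weights, one of the C(M, N) index
    combinations according to a learnable score."""

    def __init__(self, in_features, out_features, N=2, M=4):
        super().__init__()
        assert in_features % M == 0
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.N, self.M = N, M

        # Enumerate all C(M, N) candidate combinations once; each row is a
        # binary mask over a group of M consecutive weights.
        combos = list(itertools.combinations(range(M), N))
        masks = torch.zeros(len(combos), M)
        for i, idx in enumerate(combos):
            masks[i, list(idx)] = 1.0
        self.register_buffer("candidate_masks", masks)  # (C, M)

        # One learnable score per combination per weight group,
        # jointly optimized with the weights.
        num_groups = out_features * in_features // M
        self.scores = nn.Parameter(torch.zeros(num_groups, len(combos)))

    def forward(self, x):
        out_f, in_f = self.weight.shape
        w_groups = self.weight.view(-1, self.M)          # (G, M)

        # Hard top-1 selection per group (for simplicity); LBC instead
        # removes low-scored subsets gradually during training.
        best = self.scores.argmax(dim=1)                 # (G,)
        hard_mask = self.candidate_masks[best]           # (G, M)

        # Straight-through trick: forward uses the hard mask, while
        # gradients reach the scores through the soft mixture.
        soft_mask = torch.softmax(self.scores, dim=1) @ self.candidate_masks
        mask = hard_mask + soft_mask - soft_mask.detach()

        w_sparse = (w_groups * mask).view(out_f, in_f)
        return nn.functional.linear(x, w_sparse)


# Usage: a 2:4 sparse layer trained end-to-end like any other module.
layer = NMCombinationLinear(16, 8, N=2, M=4)
out = layer(torch.randn(3, 16))
out.sum().backward()  # gradients flow to both weights and scores
```

For 2:4 sparsity there are only C(4, 2) = 6 candidate masks per group, which is what makes the divide-and-conquer formulation cheap compared with dense-gradient or pre-training based approaches.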
