Dynamic Sparsity Is Channel-Level Sparsity Learner

05/30/2023
by Lu Yin, et al.

Sparse training has received surging interest in machine learning due to its potential to reduce costs across the entire training process as well as at inference. Dynamic sparse training (DST), a leading sparse training approach, can train deep neural networks at high sparsity from scratch to match the performance of their dense counterparts. However, most if not all prior DST work demonstrates its effectiveness on unstructured sparsity with highly irregular sparse patterns, which receive limited support on common hardware. This limitation hinders the use of DST in practice. In this paper, we propose Channel-aware dynamic sparse (Chase), which for the first time seamlessly translates the promise of unstructured dynamic sparsity into GPU-friendly channel-level sparsity (not fine-grained N:M or group sparsity) during a single end-to-end training process, without any ad-hoc operations. The resulting small sparse networks can be directly accelerated by commodity hardware, without requiring any specialized sparsity-aware hardware accelerators. This appealing outcome is partially motivated by a hidden phenomenon of dynamic sparsity: off-the-shelf unstructured DST implicitly involves biased parameter reallocation across channels, with a large fraction of channels (up to 60%) being sparser than the others. By progressively identifying and removing these channels during training, our approach translates unstructured sparsity into channel-wise sparsity. Our experimental results demonstrate that Chase achieves a 1.7× inference throughput speedup on common GPU devices without compromising accuracy with ResNet-50 on ImageNet. Our code is available at https://github.com/luuyin/chase.
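To make the channel-level phenomenon described above concrete, the following is a minimal sketch, assuming PyTorch; the function names and the 5% density cutoff are illustrative assumptions, not the authors' released Chase implementation. It measures the per-output-channel density of an unstructured sparse mask and flags the sparsest channels, which could then be progressively removed during training to obtain channel-level sparsity.

```python
# Minimal sketch (assumption: PyTorch; not the official Chase code).
# Given an unstructured binary mask for a conv layer, measure how dense
# each output channel is and flag the sparsest channels for removal.
import torch


def channel_density(mask: torch.Tensor) -> torch.Tensor:
    """Fraction of nonzero weights per output channel.

    mask: binary tensor of shape (out_channels, in_channels, kH, kW).
    """
    return mask.flatten(start_dim=1).float().mean(dim=1)


def sparse_channels(mask: torch.Tensor, density_cutoff: float = 0.05) -> torch.Tensor:
    """Indices of output channels whose density falls below the cutoff.

    The 5% cutoff is an illustrative value, not one taken from the paper.
    """
    density = channel_density(mask)
    return torch.nonzero(density < density_cutoff, as_tuple=False).flatten()


if __name__ == "__main__":
    # Toy example: 64 output channels, half kept very sparse (~2% dense),
    # half moderately dense (~15%), mimicking biased parameter reallocation.
    per_channel_density = torch.cat([torch.full((32,), 0.02), torch.full((32,), 0.15)])
    mask = (torch.rand(64, 32, 3, 3) < per_channel_density.view(64, 1, 1, 1)).float()
    print("channels flagged as sparse:", sparse_channels(mask, density_cutoff=0.05).tolist())
```

In this toy setting the flagged indices concentrate in the first 32 channels, illustrating how an unstructured mask with biased per-channel density can be converted into channel-wise sparsity by dropping the sparsest channels.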
