DARB: A Density-Aware Regular-Block Pruning for Deep Neural Networks

11/19/2019
by   Ao Ren, et al.

The rapidly growing parameter volume of deep neural networks (DNNs) hinders the deployment of artificial-intelligence applications on resource-constrained devices, such as mobile and wearable devices. Neural network pruning, one of the mainstream model compression techniques, is under extensive study as a way to reduce the number of parameters and computations. In contrast to irregular pruning, which incurs high index storage and decoding overhead, structured pruning techniques have been proposed as promising solutions. However, prior studies on structured pruning tackle the problem mainly from the perspective of facilitating hardware implementation, without analyzing the characteristics of sparse neural networks. This neglect leads to an inefficient trade-off between regularity and pruning ratio, so the potential of structured pruning is not fully exploited. In this work, we examine the structural characteristics of irregularly pruned weight matrices, such as the diverse redundancy of different rows, the varying sensitivity of different rows to pruning, and the positional characteristics of retained weights. Guided by these insights, we first propose the novel block-max weight masking (BMWM) method, which effectively retains the salient weights while imposing high regularity on the weight matrix. As a further optimization, we propose density-adaptive regular-block (DARB) pruning, which outperforms prior structured pruning work in both pruning ratio and decoding efficiency. Our experimental results show that DARB achieves pruning ratios of 13× to 25×, a 2.8× to 4.3× improvement over state-of-the-art counterparts across multiple neural network models and tasks. Moreover, DARB achieves 14.3× higher decoding efficiency than block pruning while attaining a higher pruning ratio.
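
The abstract describes BMWM only at a high level. Below is a minimal NumPy sketch of one plausible reading of block-max weight masking: partition each row of a weight matrix into equal-size blocks and retain only the largest-magnitude weight in each block, yielding a regular one-nonzero-per-block pattern. The function name block_max_mask, the fixed block_size parameter, and the NumPy formulation are illustrative assumptions, not the authors' implementation; DARB's density-adaptive variant would additionally choose the block size per row based on that row's density.

```python
import numpy as np

def block_max_mask(weights, block_size):
    """Hypothetical BMWM sketch: keep only the largest-magnitude
    weight inside each fixed-size block of every row."""
    rows, cols = weights.shape
    assert cols % block_size == 0, "row length must be divisible by block_size"
    n_blocks = cols // block_size
    # View each row as a sequence of equal-size blocks.
    blocks = np.abs(weights).reshape(rows, n_blocks, block_size)
    # Position of the most salient weight within each block.
    winners = blocks.argmax(axis=2)  # shape: (rows, n_blocks)
    mask = np.zeros((rows, n_blocks, block_size), dtype=weights.dtype)
    mask[np.arange(rows)[:, None], np.arange(n_blocks)[None, :], winners] = 1
    # One retained weight per block: the pruning ratio equals block_size.
    return weights * mask.reshape(rows, cols)

# Example: block_size=4 keeps 1 of every 4 weights (a 4x pruning ratio).
W = np.random.randn(2, 8).astype(np.float32)
print(block_max_mask(W, block_size=4))
```

Because each block contributes exactly one nonzero, the retained weights can be stored densely with only a short per-block index (log2 of the block size, in bits), which illustrates the regularity and decoding advantage the abstract contrasts with irregular pruning.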

