Network Pruning for Low-Rank Binary Indexing

05/14/2019
by   Dongsoo Lee, et al.

Pruning is an efficient model compression technique that removes redundancy in the connectivity of deep neural networks (DNNs). Computations on the sparse matrices obtained by pruning parameters, however, exhibit vastly different degrees of parallelism depending on the index representation scheme. As a result, fine-grained pruning has not gained much attention: its irregular index form leads to a large memory footprint and low parallelism for convolutions and matrix multiplications. In this paper, we propose a new network pruning technique that generates a low-rank binary index matrix, so that the index data is stored in compressed form and decompressed by a simple binary matrix multiplication. The proposed compression method finds a particular fine-grained pruning mask that can be decomposed into two binary matrices. We also propose a tile-based factorization technique that not only lowers memory requirements but also enhances the compression ratio. Various DNN models can be pruned with far fewer indexes than previous sparse matrix formats require, while maintaining the same pruning rate.
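
As a rough illustration of the idea described in the abstract, the sketch below rebuilds a pruning mask from two small binary factors and compares the number of index bits stored. It is not the authors' implementation: it assumes that "binary matrix multiplication" means a Boolean product (an OR of ANDs), and the names `boolean_matmul`, `index_bits`, `left`, `right`, and the 4x6 example values are all hypothetical. How such factors are found during pruning is not covered here.

```python
def boolean_matmul(a, b):
    """Boolean product: out[i][j] = OR over r of (a[i][r] AND b[r][j]).

    Assumption: this is the "binary matrix multiplication" used to
    decompress the index; the abstract does not spell out the operator.
    """
    inner = len(b)
    cols = len(b[0])
    return [
        [int(any(row[r] and b[r][j] for r in range(inner))) for j in range(cols)]
        for row in a
    ]


def index_bits(m, n, k):
    """Bits for a dense m x n binary mask vs. two factors (m x k and k x n)."""
    return m * n, k * (m + n)


# Hypothetical rank-2 factors for a 4 x 6 mask (illustrative values only).
left = [[1, 0],
        [0, 1],
        [1, 1],
        [0, 0]]
right = [[1, 0, 1, 0, 0, 1],
         [0, 1, 1, 0, 0, 0]]

# Decompress the index: 1 keeps a weight, 0 prunes it.
mask = boolean_matmul(left, right)

# Apply the mask to a hypothetical weight matrix of the same shape.
weights = [[0.5, -1.2, 0.3, 0.8, -0.1, 0.9],
           [1.1, 0.4, -0.7, 0.2, 0.6, -0.3],
           [-0.9, 0.7, 0.1, -0.5, 0.3, 0.2],
           [0.4, -0.6, 0.8, 0.1, -0.2, 0.5]]
pruned = [[w * keep for w, keep in zip(w_row, m_row)]
          for w_row, m_row in zip(weights, mask)]

dense_bits, factored_bits = index_bits(4, 6, 2)
print(mask)
print(dense_bits, factored_bits)
```

In this toy case the factors need 20 bits instead of 24 for the dense mask; the gap widens as the matrix grows relative to the rank, since the factored cost scales with k(m + n) rather than mn.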

research
05/05/2021

Sequential Encryption of Sparse Neural Networks Toward Optimum Representation of Irregular Sparsity

Even though fine-grained pruning techniques achieve a high compression r...
research
06/25/2023

Low-Rank Prune-And-Factorize for Language Model Compression

The components underpinning PLMs – large weight matrices – were shown to...
research
02/11/2020

PCNN: Pattern-based Fine-Grained Regular Pruning towards Optimizing CNN Accelerators

Weight pruning is a powerful technique to realize model compression. We ...
research
06/20/2023

LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation

Transformer models have achieved remarkable results in various natural l...
research
07/27/2020

ALF: Autoencoder-based Low-rank Filter-sharing for Efficient Convolutional Neural Networks

Closing the gap between the hardware requirements of state-of-the-art co...
research
05/26/2019

Prediction of Compression Index of Fine-Grained Soils Using a Gene Expression Programming Model

In construction projects, estimation of the settlement of fine-grained s...
research
02/20/2018

DeepThin: A Self-Compressing Library for Deep Neural Networks

As the industry deploys increasingly large and complex neural networks t...
