MPDCompress - Matrix Permutation Decomposition Algorithm for Deep Neural Network Compression

05/30/2018
by Lazar Supic, et al.

Deep neural networks (DNNs) have become the state-of-the-art technique for machine learning tasks in various applications. However, due to their size and computational complexity, large DNNs are not readily deployable on edge devices in real time. To manage complexity and accelerate computation, network compression techniques based on pruning and quantization have been proposed and shown to be effective in reducing network size. However, such network compression can result in irregular matrix structures that are mismatched with modern hardware-accelerated platforms, such as graphics processing units (GPUs) designed to perform DNN matrix multiplications in a structured (block-based) way. We propose MPDCompress, a DNN compression algorithm based on matrix permutation decomposition via random mask generation. In-training application of the masks molds the synaptic weight connection matrix into a sub-graph separation format. Aided by the random permutations, a hardware-desirable block matrix is generated, allowing for a more efficient implementation and compression of the network. To show versatility, we empirically verify MPDCompress on several network models, compression rates, and image datasets. On the LeNet 300-100 model (MNIST dataset), Deep MNIST, and CIFAR10, we achieve 10X network compression with less than 1% accuracy loss compared to non-compressed accuracy performance. On AlexNet for the full ImageNet ILSVRC-2012 dataset, we achieve 8X network compression with less than 1% drop in top-5 and top-1 accuracy, respectively. Finally, we observe that the algorithm can offer inference speedups across various hardware platforms, with 4X faster operation achieved on several mobile GPUs.
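The core mechanism described above can be read as follows: the binary mask is a row/column permutation of a block-diagonal matrix, so applying it during training confines the surviving weights to disjoint sub-graphs, and undoing the permutations at inference recovers a dense block-diagonal matrix that maps well onto block-based GEMM kernels. Below is a minimal NumPy sketch of this reading; the function name make_permuted_block_mask and the even block partition are illustrative assumptions on our part, not the authors' implementation.

```python
import numpy as np

def make_permuted_block_mask(rows, cols, num_blocks, rng=None):
    """Build a binary mask that is a row/column permutation of a
    block-diagonal matrix (hypothetical helper, not from the paper).
    Training with the mask enforces a sub-graph separation; inverting
    the permutations at inference yields a block-diagonal matrix."""
    rng = np.random.default_rng() if rng is None else rng

    # Block-diagonal "seed" mask: num_blocks dense blocks on the diagonal.
    block = np.zeros((rows, cols), dtype=bool)
    r_edges = np.linspace(0, rows, num_blocks + 1, dtype=int)
    c_edges = np.linspace(0, cols, num_blocks + 1, dtype=int)
    for k in range(num_blocks):
        block[r_edges[k]:r_edges[k + 1], c_edges[k]:c_edges[k + 1]] = True

    # Random row and column permutations scatter the blocks so that no
    # particular grouping of neurons is fixed a priori.
    row_perm = rng.permutation(rows)
    col_perm = rng.permutation(cols)
    mask = block[np.ix_(row_perm, col_perm)]
    return mask, row_perm, col_perm

# Example: mask a 12x8 weight matrix into 4 sub-graphs; only about
# 1/num_blocks of the weights survive, giving ~4X compression here.
rng = np.random.default_rng(0)
W = rng.standard_normal((12, 8))
mask, row_perm, col_perm = make_permuted_block_mask(12, 8, num_blocks=4, rng=rng)

W_masked = W * mask                       # applied at every training step
inv_r = np.argsort(row_perm)              # inverse permutations
inv_c = np.argsort(col_perm)
W_block = W_masked[np.ix_(inv_r, inv_c)]  # block-diagonal at inference

print(f"density: {mask.mean():.2f}")      # ~1/num_blocks of weights kept
```

Under these assumptions, a num_blocks-way separation keeps roughly 1/num_blocks of the weights, which is consistent in spirit with the 8X to 10X compression rates reported above.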

