DeepThin: A Self-Compressing Library for Deep Neural Networks

02/20/2018
by Matthew Sotoudeh, et al.

As the industry deploys increasingly large and complex neural networks to mobile devices, more pressure is put on the memory and compute resources of those devices. Deep compression, or compression of deep neural network weight matrices, is a technique to stretch resources for such scenarios. Existing compression methods cannot effectively compress models to smaller than 1-2% of their original size. We develop a new compression technique, DeepThin, building on existing research in the area of low-rank factorization. We identify and break artificial constraints imposed by low-rank approximations by combining rank factorization with a reshaping process that adds nonlinearity to the approximation function. We deploy DeepThin as a pluggable library integrated with TensorFlow that enables users to seamlessly compress models at different granularities. We evaluate DeepThin on two state-of-the-art acoustic models, TFKaldi and DeepSpeech, comparing it to previous compression work (pruning, HashedNets, and rank factorization), empirical limit-study approaches, and hand-tuned models. For TFKaldi, our DeepThin networks show better word error rates (WER) than competing methods at practically all tested compression rates, achieving an average WER improvement of 60% over pruning and 23% over the computationally expensive HashedNets. For DeepSpeech, DeepThin-compressed networks achieve better test loss than all other compression methods, reaching a test loss 28% better than hand-tuned same-size networks and 12% better than HashedNets. DeepThin also provides inference performance benefits ranging from 2X to 14X speedups, depending on the compression ratio and platform cache sizes.
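
The key idea sketched in the abstract is that a plain rank-r factorization W ≈ AB forces every row of the reconstructed weight matrix into an r-dimensional subspace, whereas factorizing a differently shaped auxiliary matrix and then reshaping it into the target layout removes that constraint at a comparable parameter budget. The NumPy sketch below is only an illustrative reading of that reshaping idea; the array names, shapes, and the use of a plain reshape are assumptions for exposition, not DeepThin's actual TensorFlow API.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target weight matrix shape for one layer we want to compress.
m, n = 64, 96

# --- Standard low-rank factorization: W_lowrank = A @ B has rank <= r. ---
r = 4
A = rng.standard_normal((m, r)).astype(np.float32)
B = rng.standard_normal((r, n)).astype(np.float32)
W_lowrank = A @ B  # every row lies in an r-dimensional subspace

# --- Reshaped factorization (illustrative reading of the DeepThin idea). ---
# Factorize a differently shaped auxiliary matrix and reshape it to (m, n).
# The reshape cuts across rows of the auxiliary product, so the rebuilt
# matrix is no longer confined to rank r.
aux_rows, aux_cols = 96, 64  # assumed sizes; aux_rows * aux_cols == m * n
U = rng.standard_normal((aux_rows, r)).astype(np.float32)
V = rng.standard_normal((r, aux_cols)).astype(np.float32)
W_reshaped = (U @ V).reshape(m, n)

print("low-rank params:", (m + n) * r,
      "rank:", np.linalg.matrix_rank(W_lowrank))
print("reshaped params:", (aux_rows + aux_cols) * r,
      "rank:", np.linalg.matrix_rank(W_reshaped))
```

With this toy sizing both variants store the same number of parameters, but the reshaped reconstruction is not limited to rank r, which is the artificial constraint the abstract refers to.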

