Run-Time Efficient RNN Compression for Inference on Edge Devices

06/12/2019
by   Urmish Thakker, et al.

Recurrent neural networks (RNNs) can be large and compute-intensive, yet many applications that benefit from RNNs run on small devices with very limited compute and storage and still have to meet run-time constraints. As a result, there is a need for compression techniques that achieve significant compression without hurting inference run-time or task accuracy. This paper explores a new compressed RNN cell implementation called Hybrid Matrix Decomposition (HMD) that achieves this dual objective. The scheme divides the weight matrix into two parts: an unconstrained upper half and a lower half composed of rank-1 blocks. As a result, the upper sub-vector of the output holds "richer" features, while the lower sub-vector holds "constrained" features. HMD can compress RNNs by a factor of 2x to 4x while running faster than pruning and retaining more model accuracy than matrix factorization. We evaluate this technique on three benchmarks.
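The split described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes, for illustration only, that the lower half is tiled left to right by rank-1 blocks, each stored as one column vector and one row vector. The function names (hmd_dense, hmd_matvec, hmd_param_count) and that block layout are hypothetical; the abstract does not specify them.

```python
import numpy as np


def hmd_dense(upper, col_vecs, row_vecs):
    """Materialize the full weight matrix (for checking only).

    Assumed layout, not confirmed by the abstract: the lower half is a
    horizontal concatenation of rank-1 blocks outer(col_vecs[i], row_vecs[i]).
    """
    blocks = [np.outer(c, r) for c, r in zip(col_vecs, row_vecs)]
    lower = np.hstack(blocks)
    return np.vstack([upper, lower])


def hmd_matvec(upper, col_vecs, row_vecs, x):
    """Compute W @ x without ever building the lower half densely."""
    # Upper sub-vector: ordinary dense product with the unconstrained part.
    top = upper @ x

    # Split x so that each chunk lines up with one rank-1 block.
    boundaries = np.cumsum([r.size for r in row_vecs])[:-1]
    chunks = np.split(x, boundaries)

    # Lower sub-vector: each block contributes col * (row . x_chunk).
    bottom = np.zeros(col_vecs[0].shape, dtype=float)
    for c, r, xc in zip(col_vecs, row_vecs, chunks):
        bottom = bottom + c * (r @ xc)

    return np.concatenate([top, bottom])


def hmd_param_count(upper, col_vecs, row_vecs):
    """Number of stored parameters under the assumed layout."""
    return upper.size + sum(c.size + r.size for c, r in zip(col_vecs, row_vecs))
```

Under these assumptions, an 8x8 matrix with a 4x8 unconstrained upper half and two rank-1 blocks in the lower half stores 32 + 2 * (4 + 4) = 48 parameters instead of 64. The sizes are chosen only to keep the arithmetic easy to follow and are not taken from the paper.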


