Rank and run-time aware compression of NLP Applications

10/06/2020
by Urmish Thakker, et al.

Sequence-model-based NLP applications can be large, yet many applications that benefit from them run on small devices with very limited compute and storage, while still facing run-time constraints. There is therefore a need for a compression technique that achieves significant compression without hurting inference run-time or task accuracy. This paper proposes a new compression technique, Hybrid Matrix Factorization (HMF), that achieves this dual objective. HMF improves on low-rank matrix factorization (LMF) by doubling the rank of the matrix through an intelligent hybrid structure, leading to better accuracy than LMF. Further, by preserving dense matrices, it yields faster inference run-time than pruning or structured-matrix-based compression techniques. We evaluate the impact of this technique on 5 NLP benchmarks across multiple tasks (translation, intent detection, language modeling) and show that, for similar accuracy values and compression factors, HMF can achieve more than 2.32x faster inference run-time than pruning and 16.77% better accuracy than LMF.
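To make the rank argument concrete, here is a minimal NumPy sketch of the hybrid idea, under the assumption (illustrative, not the paper's exact formulation) that the hybrid structure keeps the first `c` rows of a weight matrix dense and factors the remaining rows into a low-rank product. At a comparable parameter budget, the stacked matrix can reach a higher rank than a plain low-rank factorization, because the dense rows contribute full rank on top of the factored block. All names and shapes below are hypothetical.

```python
import numpy as np

def lmf_params(m, n, r):
    """Parameter count of plain LMF: W ~ U @ V with U: (m, r), V: (r, n)."""
    return m * r + r * n

def hmf_params(m, n, c, r):
    """Parameter count of the hybrid sketch: W = [C; U @ V] with
    C: (c, n) dense, U: (m - c, r), V: (r, n)."""
    return c * n + (m - c) * r + r * n

def hmf_reconstruct(C, U, V):
    """Stack the dense block on top of the low-rank block."""
    return np.vstack([C, U @ V])

m, n = 512, 512
r_lmf = 32                    # rank budget for plain LMF

# Pick c and r so the hybrid spends a similar parameter budget
# but can attain a higher matrix rank (up to c + r vs. r_lmf).
c, r_hmf = 25, 20
print(lmf_params(m, n, r_lmf))        # LMF parameter count: 32768
print(hmf_params(m, n, c, r_hmf))     # hybrid count: 32780 (comparable)

rng = np.random.default_rng(0)
C = rng.standard_normal((c, n))
U = rng.standard_normal((m - c, r_hmf))
V = rng.standard_normal((r_hmf, n))
W = hmf_reconstruct(C, U, V)
print(np.linalg.matrix_rank(W))       # generically c + r_hmf = 45 > 32
```

Because the reconstructed matrix stays dense, inference is a single dense GEMM (plus the small factored product), which is the property the abstract credits for the run-time advantage over unstructured pruning.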


