Doping: A technique for efficient compression of LSTM models using sparse structured additive matrices

02/14/2021
by Urmish Thakker, et al.

Structured matrices, such as those derived from Kronecker products (KP), are effective at compressing neural networks, but can lead to unacceptable accuracy loss when applied to large models. In this paper, we propose the notion of doping: the addition of an extremely sparse matrix to a structured matrix. Doping provides additional degrees of freedom for a small number of parameters, allowing them to diverge independently from the fixed structure. To train LSTMs with doped structured matrices, we introduce the additional parameter matrix and slowly anneal its sparsity level. However, we find that performance degrades as the doping matrix is sparsified, due to co-matrix adaptation (CMA) between the structured and the sparse matrices. We address this over-dependence on the sparse matrix with a co-matrix dropout regularization (CMR) scheme. We provide empirical evidence that doping, CMA and CMR are generally applicable to multiple structured matrix families (Kronecker Product, LMF, Hybrid Matrix Decomposition). Additionally, doped Kronecker product matrices achieve state-of-the-art accuracy at large compression factors (10-25x) across 4 natural language processing applications with minor loss in accuracy. The doped KP compression technique outperforms previous state-of-the-art compression results, achieving a 1.3-2.4x higher compression factor at similar accuracy, while also beating strong alternatives like pruning and low-rank methods by a large margin (8% higher accuracy). We further show that doped KP can be deployed on commodity hardware using the current software stack, achieving a 2.5-5.5x inference run-time speed-up over the baseline.
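To make the construction concrete, the sketch below implements a doped Kronecker-product layer in PyTorch: the weight matrix is the sum of a Kronecker product of two small learnable factors and an additively "doped" sparse matrix whose sparsity is annealed by magnitude pruning, together with a simplified stand-in for co-matrix dropout (CMR) that occasionally zeroes the sparse term during training. The class, method, and argument names (DopedKroneckerLinear, anneal_sparsity, cmr_prob) are illustrative assumptions, not the authors' reference implementation.

```python
# Illustrative sketch (not the paper's code): a doped Kronecker-product layer.
# W = kron(A, B) + S, where S is annealed toward extreme sparsity and a
# simplified co-matrix dropout (CMR) occasionally drops S during training.
import torch
import torch.nn as nn


class DopedKroneckerLinear(nn.Module):  # hypothetical name
    def __init__(self, a_shape, b_shape, cmr_prob=0.5):
        super().__init__()
        # Structured component: small Kronecker factors A and B.
        self.A = nn.Parameter(0.1 * torch.randn(*a_shape))
        self.B = nn.Parameter(0.1 * torch.randn(*b_shape))
        out_dim = a_shape[0] * b_shape[0]
        in_dim = a_shape[1] * b_shape[1]
        # Doping component: starts dense, is annealed to high sparsity.
        self.S = nn.Parameter(torch.zeros(out_dim, in_dim))
        self.register_buffer("mask", torch.ones(out_dim, in_dim))
        self.cmr_prob = cmr_prob  # probability of dropping the sparse term

    def anneal_sparsity(self, keep_fraction):
        """Keep only the largest-magnitude entries of S (magnitude pruning)."""
        with torch.no_grad():
            k = max(1, int(keep_fraction * self.S.numel()))
            threshold = self.S.abs().flatten().kthvalue(self.S.numel() - k + 1).values
            self.mask.copy_((self.S.abs() >= threshold).float())

    def forward(self, x):
        structured = torch.kron(self.A, self.B)  # (out_dim, in_dim)
        sparse = self.S * self.mask
        if self.training and torch.rand(()) < self.cmr_prob:
            # CMR (simplified): drop the sparse term so the structured matrix
            # does not co-adapt to, and over-depend on, the doping matrix.
            sparse = torch.zeros_like(sparse)
        return x @ (structured + sparse).t()
```

In a training loop, anneal_sparsity would be called on a schedule with a decreasing keep_fraction (for example, from 1.0 down toward a few percent of entries), so that the doping matrix ends up extremely sparse while the few surviving entries remain free to diverge from the Kronecker structure.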


