Low-Rank Prune-And-Factorize for Language Model Compression

06/25/2023
by Siyu Ren, et al.

The components underpinning pre-trained language models (PLMs), namely their large weight matrices, have been shown to contain considerable redundancy. Matrix factorization, a well-established technique from matrix theory, has been used to reduce the number of parameters in PLMs. However, it fails to retain satisfactory performance at moderate to high compression rates. In this paper, we identify the full-rankness of fine-tuned PLMs as the fundamental bottleneck behind the failure of matrix factorization, and we explore the use of network pruning to extract a low-rank sparsity pattern amenable to matrix factorization. We find that such low-rank sparsity patterns exist exclusively in models produced by first-order pruning, which motivates us to unite the two approaches for more effective model compression. We further propose two techniques, sparsity-aware SVD and mixed-rank fine-tuning, which improve the initialization and the training of the compression procedure, respectively. Experiments on GLUE and question-answering tasks show that the proposed method achieves a superior compression-performance trade-off compared to existing approaches.
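To make the factorization step concrete, the sketch below shows how a weight matrix can be replaced by two low-rank factors via truncated SVD, together with one plausible reading of sparsity-aware SVD in which the first-order pruning mask is applied before factorizing. This is a minimal PyTorch illustration; the function names (`factorize_linear`, `sparsity_aware_factorize`) and the exact masked-SVD formulation are assumptions for exposition, not the paper's confirmed implementation.

```python
import torch

def factorize_linear(weight: torch.Tensor, rank: int):
    """Truncated-SVD factorization of an (out, in) weight matrix.

    W is approximated by U_r @ V_r, with U_r of shape (out, rank) and
    V_r of shape (rank, in). Parameters drop from out*in to
    rank*(out+in), a saving whenever rank < out*in / (out+in).
    """
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    sqrt_s = S[:rank].sqrt()
    U_r = U[:, :rank] * sqrt_s              # absorb singular values
    V_r = sqrt_s.unsqueeze(1) * Vh[:rank]   # symmetrically into both factors
    return U_r, V_r

def sparsity_aware_factorize(weight: torch.Tensor, mask: torch.Tensor, rank: int):
    """Hypothetical sparsity-aware variant: factorize the pruned matrix
    W * M instead of the dense W, so the rank budget is spent only on
    the weights that survive first-order pruning."""
    return factorize_linear(weight * mask, rank)

# Usage sketch on a BERT-sized feed-forward matrix (3072 x 768).
W = torch.randn(3072, 768)
M = (torch.rand_like(W) > 0.5).float()      # stand-in pruning mask
U_r, V_r = sparsity_aware_factorize(W, M, rank=64)
print(U_r.shape, V_r.shape)                 # (3072, 64) and (64, 768)
```

At rank 64, the factor pair stores 64 * (3072 + 768) = 245,760 parameters versus 3072 * 768 = 2,359,296 for the dense matrix, roughly a 10x reduction before any fine-tuning (such as the paper's mixed-rank fine-tuning, not shown here) recovers accuracy.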
