Language model compression with weighted low-rank factorization

06/30/2022
by   Yen-Chang Hsu, et al.

Factorizing a large matrix into small matrices is a popular strategy for model compression. Singular value decomposition (SVD) plays a vital role in this compression strategy, approximating a learned matrix with fewer parameters. However, SVD minimizes the squared error toward reconstructing the original matrix without gauging the importance of the parameters, potentially giving a larger reconstruction error for those that affect the task accuracy more. In other words, the optimization objective of SVD is not aligned with the trained model's task accuracy. We analyze this previously unexplored problem, make observations, and address it by introducing Fisher information to weigh the importance of parameters affecting the model prediction. This idea leads to our method: Fisher-Weighted SVD (FWSVD). Although the factorized matrices from our approach do not result in smaller reconstruction errors, we find that our resulting task accuracy is much closer to the original model's performance. We perform analysis with transformer-based language models, showing that our weighted SVD largely alleviates the mismatched optimization objectives and can maintain model performance at a higher compression rate. Our method can directly compress a task-specific model while achieving better performance than other compact model strategies that require expensive model pre-training. Moreover, the evaluation of compressing an already compact model shows our method can further reduce 9% to 30% of the parameters with an insignificant impact on task accuracy.
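
To make the contrast between plain and weighted SVD concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes a row-wise weighting, where each row of the weight matrix is scaled by the square root of its summed Fisher estimate before the truncated SVD and the scaling is undone afterwards. The function names (`truncated_svd`, `fisher_weighted_svd`, `row_weighted_error`) are hypothetical, and the `fisher` array is a random stand-in for what would in practice be an estimate such as averaged squared gradients of the task loss.

```python
import numpy as np


def truncated_svd(W, rank):
    """Return factors A (m x rank) and B (rank x n) of the best rank-`rank`
    approximation of W in squared (Frobenius) error."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]  # fold the singular values into the left factor
    B = Vt[:rank]
    return A, B


def fisher_weighted_svd(W, fisher, rank, eps=1e-8):
    """Weighted low-rank factorization of W (hypothetical helper).

    Assumption: one importance weight per row, taken as the square root of
    that row's summed Fisher estimate. `eps` keeps the weights invertible.
    """
    row_weight = np.sqrt(fisher.sum(axis=1) + eps)
    # Factorize the row-scaled matrix, then undo the scaling on the left factor.
    A_weighted, B = truncated_svd(row_weight[:, None] * W, rank)
    A = A_weighted / row_weight[:, None]
    return A, B


def row_weighted_error(W, W_hat, row_weight):
    """Squared reconstruction error with each row scaled by its weight."""
    return float(np.sum((row_weight[:, None] * (W - W_hat)) ** 2))


# Toy data: a random "learned" matrix and a random stand-in for Fisher estimates.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))
fisher = rng.random(size=(64, 32)) ** 4  # skewed so a few rows dominate
rank = 8

A_plain, B_plain = truncated_svd(W, rank)
A_fw, B_fw = fisher_weighted_svd(W, fisher, rank)

ones = np.ones(W.shape[0])
weights = np.sqrt(fisher.sum(axis=1) + 1e-8)
print("plain error:    SVD", row_weighted_error(W, A_plain @ B_plain, ones),
      "| weighted SVD", row_weighted_error(W, A_fw @ B_fw, ones))
print("weighted error: SVD", row_weighted_error(W, A_plain @ B_plain, weights),
      "| weighted SVD", row_weighted_error(W, A_fw @ B_fw, weights))
print("parameters:", W.size, "->", A_fw.size + B_fw.size)
```

Under these assumptions the weighted factorization cannot beat plain SVD on unweighted reconstruction error, but it is at least as good on the importance-weighted error, which mirrors the trade-off described in the abstract. Replacing an m x n matrix with factors of rank r stores r(m + n) values, so it only saves parameters when r < mn / (m + n).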


