TensorGPT: Efficient Compression of the Embedding Layer in LLMs based on the Tensor-Train Decomposition

07/02/2023
by Mingxue Xu, et al.

High-dimensional token embeddings underpin Large Language Models (LLMs), as they can capture subtle semantic information and significantly enhance the modelling of complex language patterns. However, this high dimensionality also introduces a considerable number of model parameters and prohibitively high storage requirements. To address this issue, this work proposes an approach based on the Tensor-Train Decomposition (TTD), whereby each token embedding is treated as a Matrix Product State (MPS) that can be computed efficiently in a distributed manner. Experimental results on GPT-2 demonstrate that, through our approach, the embedding layer can be compressed by a factor of up to 38.40, and that at a compression factor of 3.31 the compressed model even achieves better performance than the original GPT-2.
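The paper's exact tensorisation and rank-selection scheme is not reproduced here; the sketch below only illustrates the general TT-SVD idea the abstract describes, assuming GPT-2's embedding dimension of 768 factorised as (4, 4, 4, 4, 3). The function names `tt_decompose`/`tt_reconstruct` and the `max_rank` cap are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def tt_decompose(vec, dims, max_rank):
    """Factorise a 1-D embedding vector into Tensor-Train (MPS) cores via sequential SVDs."""
    assert vec.size == np.prod(dims)
    cores, r_prev = [], 1
    mat = vec.reshape(1, -1)                      # running remainder matrix
    for d in dims[:-1]:
        mat = mat.reshape(r_prev * d, -1)
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, s.size)                 # truncate to the rank cap
        cores.append(u[:, :r].reshape(r_prev, d, r))
        mat = np.diag(s[:r]) @ vt[:r]             # carry the remainder forward
        r_prev = r
    cores.append(mat.reshape(r_prev, dims[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract the TT cores back into the full embedding vector."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out.reshape(-1)

# Hypothetical example: one GPT-2-sized token embedding (768 = 4*4*4*4*3).
emb = np.random.randn(768).astype(np.float32)
cores = tt_decompose(emb, dims=(4, 4, 4, 4, 3), max_rank=4)
approx = tt_reconstruct(cores)
stored = sum(c.size for c in cores)
print(f"params per token: 768 -> {stored}, "
      f"rel. error {np.linalg.norm(emb - approx) / np.linalg.norm(emb):.3f}")
```

With this factorisation, the 768 parameters of a single embedding shrink to the sum of the core sizes (220 in this sketch), and the rank cap trades compression against reconstruction error. Because reconstruction is a chain of small tensor contractions performed independently per token, the cores lend themselves to the distributed computation the abstract mentions.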
