Efficient GPT Model Pre-training using Tensor Train Matrix Representation

06/05/2023
by Viktoriia Chekalina, et al.

Large-scale transformer models have shown remarkable performance in language modelling tasks. However, such models feature billions of parameters, leading to difficulties in their deployment and prohibitive training costs from scratch. To reduce the number of parameters in the GPT-2 architecture, we replace the matrices of the fully-connected layers with the corresponding Tensor Train Matrix (TTM) structure. Finally, we customize the forward and backward operations through the TTM-based layer for simplicity and stability of further training. The resulting model stores fewer parameters while showing perplexity comparable to the original model. On downstream tasks, including language understanding and text summarization, the model performs similarly to the original GPT-2 model. The proposed tensorized layers could be used to efficiently pre-train other Transformer models.
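A minimal sketch of such a TTM-based fully-connected layer is given below, assuming PyTorch. It is an illustration only, not the authors' implementation: the class name TTMLinear, the mode factorizations, the TT ranks, and the random initialization are assumptions, and the sketch relies on standard autograd rather than the customized forward and backward operations described above.

```python
# A minimal sketch (not the authors' code) of a fully-connected layer whose
# weight matrix is stored in Tensor Train Matrix (TTM) format, assuming
# PyTorch. The dense weight of shape (prod(in_modes), prod(out_modes)) is
# replaced by a chain of small 4D cores G_k of shape
# (r_{k-1}, in_modes[k], out_modes[k], r_k), with r_0 = r_d = 1.
import math

import torch
import torch.nn as nn


class TTMLinear(nn.Module):
    def __init__(self, in_modes, out_modes, ranks, bias=True):
        super().__init__()
        assert len(in_modes) == len(out_modes) == len(ranks) - 1
        assert ranks[0] == ranks[-1] == 1
        self.in_modes = list(in_modes)
        self.cores = nn.ParameterList(
            [nn.Parameter(0.02 * torch.randn(ranks[k], in_modes[k],
                                             out_modes[k], ranks[k + 1]))
             for k in range(len(in_modes))]
        )
        self.bias = nn.Parameter(torch.zeros(math.prod(out_modes))) if bias else None

    def forward(self, x):
        # x: (batch, prod(in_modes)); split the feature axis into input modes
        # and attach a dummy TT-rank axis of size r_0 = 1.
        batch = x.shape[0]
        t = x.reshape(batch, *self.in_modes).unsqueeze(-1)
        for core in self.cores:
            # Contract the current leading input mode m_k and the running
            # TT rank; the result gains the output mode n_k and rank r_k.
            t = torch.einsum('bm...r,rmnq->b...nq', t, core)
        # All input modes are consumed; flatten the output modes.
        y = t.reshape(batch, -1)
        return y + self.bias if self.bias is not None else y


# Hypothetical configuration: a 768 -> 3072 projection (as in a GPT-2 MLP
# block) factorized as 4*8*24 -> 8*12*32 with TT ranks (1, 16, 16, 1).
layer = TTMLinear(in_modes=(4, 8, 24), out_modes=(8, 12, 32),
                  ranks=(1, 16, 16, 1))
out = layer(torch.randn(2, 768))
print(out.shape)  # torch.Size([2, 3072])
```

Under these assumed modes and ranks, the three cores hold roughly 37K parameters in place of the roughly 2.4M entries of a dense 768×3072 matrix; the factorizations and ranks used in the paper may differ, and the paper additionally customizes the backward pass rather than relying on plain autograd.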
