Blockwise Compression of Transformer-based Models without Retraining

04/04/2023
by Gaochen Dong, et al.

Transformer-based models, represented by GPT-3, ChatGPT, and GPT-4, have recently attracted increasing research interest and business demand. However, their massive compute requirements and memory footprint remain an obstacle to deployment. To tackle this issue, we propose BCT, a framework for blockwise compression of transformers without retraining, which lowers the threshold for deployment. BCT performs fine-grained compression of the whole transformer, including the embedding layer, matrix multiplication, GELU, Softmax, layer normalization, and all intermediate results. As a case study, we compress an efficient model with BCT and evaluate it on several General Language Understanding Evaluation (GLUE) datasets. The results show that BCT can achieve a less than 0.90% drop in accuracy.
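The abstract does not spell out implementation details, so the sketch below is only a rough illustration of the general idea behind blockwise compression: symmetric linear quantization applied per block of a weight matrix, so that an outlier in one block does not waste the precision of the others. The function name, block size, and bit width here are illustrative assumptions, not taken from the paper, which also compresses activations, GELU, Softmax, and layer normalization.

```python
import numpy as np

def quantize_blockwise(w: np.ndarray, block_size: int = 64, num_bits: int = 8) -> np.ndarray:
    """Simulate blockwise symmetric linear quantization of a 2-D tensor.

    Each block gets its own scale, derived from the block's maximum
    absolute value. Returns the dequantized tensor so the accuracy
    impact can be evaluated without integer kernels. (Illustrative
    sketch, not the paper's exact method.)
    """
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for signed 8-bit
    out = np.empty_like(w, dtype=np.float32)
    rows, cols = w.shape
    for i in range(0, rows, block_size):
        for j in range(0, cols, block_size):
            block = w[i:i + block_size, j:j + block_size]
            scale = np.abs(block).max() / qmax
            if scale == 0.0:                 # all-zero block: nothing to scale
                scale = 1.0
            q = np.clip(np.round(block / scale), -qmax - 1, qmax)
            out[i:i + block_size, j:j + block_size] = q * scale
    return out

# Example: quantize a random weight matrix to 7 bits and measure the error.
w = np.random.randn(256, 256).astype(np.float32)
w_q = quantize_blockwise(w, block_size=64, num_bits=7)
print("max abs error:", np.abs(w - w_q).max())
```

Because each block carries its own scale, the quantization error is bounded by the local dynamic range of that block rather than by the full tensor, which is what makes per-block schemes attractive when retraining is not an option.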


Related research

03/16/2023 · Block-wise Bit-Compression of Transformer-based Models
With the popularity of the recent Transformer-based models represented b...

07/19/2022 · Benchmarking Transformers-based models on French Spoken Language Understanding tasks
In the last five years, the rise of the self-attentional Transformer-bas...

06/15/2021 · Direction is what you need: Improving Word Embedding Compression in Large Language Models
The adoption of Transformer-based models in natural language processing ...

03/04/2021 · Hardware Acceleration of Fully Quantized BERT for Efficient Natural Language Processing
BERT is the most recent Transformer-based model that achieves state-of-t...

05/27/2022 · X-ViT: High Performance Linear Vision Transformer without Softmax
Vision transformers have become one of the most important models for com...

02/17/2022 · Revisiting Over-smoothing in BERT from the Perspective of Graph
Recently over-smoothing phenomenon of Transformer-based models is observ...

06/15/2022 · VCT: A Video Compression Transformer
We show how transformers can be used to vastly simplify neural video com...
