Transformer on a Diet

02/14/2020
by Chenguang Wang, et al.

The Transformer has been widely used thanks to its ability to capture sequence information efficiently. However, recent developments such as BERT and GPT-2 deliver only heavy architectures, with a focus on effectiveness. In this paper, we explore three carefully designed light Transformer architectures to figure out whether a Transformer with fewer computations can produce competitive results. Experimental results on language-model benchmark datasets suggest that such a trade-off is promising: the light Transformer reduces the parameter count by up to 70% while obtaining perplexity competitive with the standard Transformer. The source code is publicly available.
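
The abstract does not spell out the three light architectures, so the sketch below is only a rough illustration of the kind of parameter/computation trade-off it refers to: a PyTorch comparison (with hypothetical dimensions) of a standard Transformer encoder layer against a slimmed variant with a narrower feed-forward block. It is not the paper's method.

```python
# Illustrative only -- not the paper's actual "light" designs. One simple way
# to shrink a Transformer layer is to narrow its feed-forward block; here we
# just compare parameter counts of a standard and a slimmed encoder layer.
import torch.nn as nn

def count_params(module: nn.Module) -> int:
    """Total number of trainable parameters in a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

d_model, n_heads = 512, 8  # hypothetical model dimensions

# Standard layer: the usual 4x feed-forward expansion (512 -> 2048 -> 512).
standard = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                      dim_feedforward=4 * d_model)

# "Light" layer: same attention, but a much narrower feed-forward block.
light = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                   dim_feedforward=d_model // 2)

print(f"standard layer: {count_params(standard):,} parameters")
print(f"light layer:    {count_params(light):,} parameters")
```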

research · 04/20/2019
Language Models with Transformers
The Transformer architecture is superior to RNN-based models in computat...

research · 04/16/2022
Towards Lightweight Transformer via Group-wise Transformation for Vision-and-Language Tasks
Despite the exciting performance, Transformer is criticized for its exce...

research · 08/01/2023
CodeBPE: Investigating Subtokenization Options for Large Language Model Pretraining on Source Code
Recent works have widely adopted large language model pretraining for so...

research · 04/22/2020
Keyphrase Prediction With Pre-trained Language Model
Recently, generative methods have been widely used in keyphrase predicti...

research · 09/20/2022
Relaxed Attention for Transformer Models
The powerful modeling capabilities of all-attention-based transformer ar...

research · 08/07/2023
Detecting Spells in Fantasy Literature with a Transformer Based Artificial Intelligence
Transformer architectures and models have made significant progress in l...

research · 02/25/2019
Star-Transformer
Although the fully-connected attention-based model Transformer has achie...