

Star-Transformer

by Qipeng Guo, et al.

Although the fully connected attention-based Transformer has achieved great success on many NLP tasks, it has a heavy structure and usually requires large amounts of training data. In this paper, we present the Star-Transformer, a lightweight alternative to the Transformer. To reduce model complexity, we replace the fully connected structure with a star-shaped topology, in which every two non-adjacent nodes are connected through a shared relay node. The Star-Transformer thus has lower complexity than the standard Transformer (reduced from quadratic to linear in the input length) while preserving the ability to capture long-range dependencies. Experiments on four tasks (22 datasets) show that the Star-Transformer achieves significant improvements over the standard Transformer on modestly sized datasets.
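The star-shaped topology is easy to picture in code. Below is a minimal, illustrative sketch of one update round, assuming single-head scaled dot-product attention and a ring radius of one neighbor on each side; the names (`attention`, `star_update`, `r`), the wrap-around neighbor indexing, and the usage at the bottom are our simplifications, not the authors' implementation, which additionally uses multi-head attention and layer normalization.

```python
import torch
import torch.nn.functional as F

def attention(q, k, v):
    # Single-head scaled dot-product attention: q (1, d), k/v (m, d) -> (1, d).
    scores = q @ k.t() / k.size(-1) ** 0.5
    return F.softmax(scores, dim=-1) @ v

def star_update(h, s, e, r=1):
    """One update round: h (n, d) satellite states, s (1, d) relay state,
    e (n, d) token embeddings, r = ring radius on each side."""
    n, _ = h.shape
    new_h = torch.empty_like(h)
    for i in range(n):
        # Satellite i sees only its ring neighbors (incl. itself), its own
        # token embedding, and the shared relay -- a constant-size context.
        # We wrap at the boundaries for simplicity.
        idx = [(i + j) % n for j in range(-r, r + 1)]
        ctx = torch.cat([h[idx], e[i:i + 1], s], dim=0)
        new_h[i] = attention(h[i:i + 1], ctx, ctx).squeeze(0)
    # The relay attends over all satellites (and itself), so any two
    # non-adjacent tokens can exchange information in at most two hops.
    ctx = torch.cat([new_h, s], dim=0)
    new_s = attention(s, ctx, ctx)
    return new_h, new_s

# Illustrative usage: a few rounds over 10 tokens of dimension 16,
# initializing satellites from the embeddings and the relay from their mean.
e = torch.randn(10, 16)
h, s = e.clone(), e.mean(0, keepdim=True)
for _ in range(3):
    h, s = star_update(h, s, e)
```

Each satellite attends to a constant-size context, and the relay attends to all n satellites, so one round costs O(n) attention computations rather than the O(n^2) of full self-attention, while any two tokens remain at most two hops apart through the relay.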




Related research:

- Analyzing the Structure of Attention in a Transformer Language Model
- The NLP Task Effectiveness of Long-Range Transformers
- Exploring Attention Map Reuse for Efficient Transformer Neural Networks
- Dynamic Graph Message Passing Networks for Visual Recognition
- Machine-Learning Love: classifying the equation of state of neutron stars with Transformers
- Transformer on a Diet