
Star-Transformer

02/25/2019
by Qipeng Guo, et al.

Although the fully-connected attention-based Transformer has achieved great success on many NLP tasks, it has a heavy structure and usually requires large amounts of training data. In this paper, we present the Star-Transformer, a lightweight alternative to the Transformer. To reduce model complexity, we replace the fully-connected structure with a star-shaped structure, in which every two non-adjacent nodes are connected through a shared relay node. As a result, the Star-Transformer has lower complexity than the standard Transformer (from quadratic to linear in the input length) while preserving the ability to handle long-range dependencies. Experiments on four tasks (22 datasets) show that the Star-Transformer achieves significant improvements over the standard Transformer on modestly sized datasets.
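The connectivity change described above can be pictured as an attention mask. The sketch below (not the authors' code) builds such a mask under the assumptions that each satellite token attends to itself, its immediate ring neighbours, and a single shared relay node, while the relay node attends to every token; the function name and the wrap-around choice for the ring neighbours are illustrative.

```python
# Minimal sketch of star-shaped connectivity: satellite tokens see only their
# ring neighbours and a shared relay node; the relay sees every token.
# The number of allowed attention edges grows linearly with sequence length,
# versus quadratically for full self-attention.

def star_attention_mask(n):
    """Return an (n+1) x (n+1) 0/1 mask; index n is the relay node."""
    relay = n
    mask = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        mask[i][i] = 1               # self-connection
        mask[i][(i - 1) % n] = 1     # left ring neighbour (wrap-around assumed)
        mask[i][(i + 1) % n] = 1     # right ring neighbour
        mask[i][relay] = 1           # shared relay node
        mask[relay][i] = 1           # relay attends to every token
    mask[relay][relay] = 1
    return mask

if __name__ == "__main__":
    n = 6
    mask = star_attention_mask(n)
    edges = sum(sum(row) for row in mask)
    print(f"star edges: {edges}, full-attention edges: {(n + 1) ** 2}")
```

Running the example for n = 6 shows the star layout keeps on the order of 4n edges, whereas full self-attention over the same tokens requires (n+1)^2.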

