Star-Transformer

02/25/2019
by Qipeng Guo, et al.

Although the Transformer, a fully connected attention-based model, has achieved great success on many NLP tasks, it has a heavy structure and usually requires large amounts of training data. In this paper, we present the Star-Transformer, a lightweight alternative to the Transformer. To reduce model complexity, we replace the fully connected structure with a star-shaped structure in which every two non-adjacent nodes are connected through a shared relay node. Thus, the Star-Transformer has lower complexity than the standard Transformer (linear rather than quadratic in the input length) while preserving the ability to handle long-range dependencies. Experiments on four tasks (22 datasets) show that the Star-Transformer achieves significant improvements over the standard Transformer on modestly sized datasets.
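
The quadratic-to-linear claim can be illustrated by counting connections. The following is a minimal sketch, not the authors' implementation: it assumes adjacent nodes are linked directly as an open chain and that a single relay node is linked to every sequence node. The function names `star_edges`, `full_edges`, and `reachable_via_relay` are hypothetical and introduced only for this illustration.

```python
from itertools import combinations


def full_edges(n):
    """Undirected links when every pair of n sequence nodes is connected."""
    return set(combinations(range(n), 2))


def star_edges(n):
    """Undirected links for n sequence nodes plus one relay node (index n).

    Assumption: neighbours form an open chain (i, i + 1); the abstract only
    says that non-adjacent nodes communicate through the shared relay.
    """
    relay = n
    edges = set()
    for i in range(n):
        edges.add((i, relay))      # radial link: node i <-> relay
        if i + 1 < n:
            edges.add((i, i + 1))  # direct link between adjacent nodes
    return edges


def reachable_via_relay(n):
    """Check that every non-adjacent pair is joined by a two-hop path via the relay."""
    edges = star_edges(n)
    relay = n
    return all(
        (i, relay) in edges and (j, relay) in edges
        for i, j in combinations(range(n), 2)
        if j - i > 1
    )


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, len(full_edges(n)), len(star_edges(n)))
```

Under these assumptions the fully connected layout has n(n - 1)/2 links, while the star-shaped layout has 2n - 1, and any two non-adjacent nodes remain two hops apart through the relay.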


