Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection

12/25/2019
by Guangxiang Zhao, et al.

The self-attention-based Transformer has achieved state-of-the-art performance on a number of natural language processing tasks. Self-attention can model long-term dependencies, but it may also attend to irrelevant information in the context. To tackle this problem, we propose a novel model called Explicit Sparse Transformer. Explicit Sparse Transformer improves the concentration of attention on the global context through an explicit selection of the most relevant segments. Extensive experimental results on a series of natural language processing and computer vision tasks, including neural machine translation, image captioning, and language modeling, all demonstrate the advantages of Explicit Sparse Transformer in model performance. We also show that our proposed sparse attention method achieves comparable or better results than the previous sparse attention method while significantly reducing training and testing time. For example, its inference speed is twice that of sparsemax in the Transformer model. Code will be available at <https://github.com/lancopku/Explicit-Sparse-Transformer>
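As a rough illustration of the "explicit selection" idea described in the abstract, the sketch below keeps only the top-k attention scores for each query and masks the rest before the softmax. It is a minimal sketch assuming a PyTorch implementation; the function name `explicit_sparse_attention`, the tensor layout, and the default `top_k` value are illustrative assumptions, not taken from the released code.

```python
import torch
import torch.nn.functional as F

def explicit_sparse_attention(q, k, v, top_k=8):
    """Scaled dot-product attention restricted to the top-k scores per query.

    q, k, v: tensors of shape (batch, heads, seq_len, head_dim).
    top_k:   how many key positions each query may attend to (illustrative default).
    """
    d = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / d ** 0.5      # (b, h, len_q, len_k)

    # Explicit selection: find the k-th largest score in each query row and
    # push everything below it to -inf so softmax assigns it zero weight.
    kth = min(top_k, scores.size(-1))
    threshold = scores.topk(kth, dim=-1).values[..., -1, None]    # (b, h, len_q, 1)
    sparse_scores = scores.masked_fill(scores < threshold, float("-inf"))

    weights = F.softmax(sparse_scores, dim=-1)                    # concentrated attention
    return torch.matmul(weights, v)
```

Because the masking is applied to the raw score matrix and the rest is an ordinary softmax, the selection adds only a top-k and a comparison per attention head, which is consistent with the speed advantage over sparsemax reported in the abstract.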


