Self-Attention with Relative Position Representations

03/06/2018
by Peter Shaw, et al.

Relying entirely on an attention mechanism, the Transformer introduced by Vaswani et al. (2017) achieves state-of-the-art results for machine translation. In contrast to recurrent and convolutional neural networks, it does not explicitly model relative or absolute position information in its structure. Instead, it requires adding representations of absolute positions to its inputs. In this work we present an alternative approach, extending the self-attention mechanism to efficiently consider representations of the relative positions, or distances between sequence elements. On the WMT 2014 English-to-German and English-to-French translation tasks, this approach yields improvements of 1.3 BLEU and 0.3 BLEU over absolute position representations, respectively. Notably, we observe that combining relative and absolute position representations yields no further improvement in translation quality. We describe an efficient implementation of our method and cast it as an instance of relation-aware self-attention mechanisms that can generalize to arbitrary graph-labeled inputs.
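To make the mechanism concrete, the sketch below shows single-head self-attention with relative position representations in NumPy, following the equations in the paper: learned embeddings indexed by the clipped distance j - i are added to the keys and values before computing attention. This is a minimal illustration, not the authors' released implementation; the function name, the `max_dist` clipping parameter, and the toy dimensions are ours.

```python
# Minimal sketch of single-head self-attention with relative position
# representations (Shaw et al., 2018). Illustrative only; names and sizes
# are assumptions, not the paper's reference code.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def relative_self_attention(x, Wq, Wk, Wv, rel_k, rel_v, max_dist):
    """x: (n, d_model); Wq/Wk/Wv: (d_model, d_z);
    rel_k, rel_v: (2*max_dist + 1, d_z) learned relative position embeddings."""
    n, _ = x.shape
    d_z = Wq.shape[1]
    q, k, v = x @ Wq, x @ Wk, x @ Wv                       # (n, d_z) each

    # Relative distance j - i, clipped to [-max_dist, max_dist], shifted to
    # non-negative indices into the embedding tables.
    idx = np.clip(np.arange(n)[None, :] - np.arange(n)[:, None],
                  -max_dist, max_dist) + max_dist           # (n, n)
    a_k, a_v = rel_k[idx], rel_v[idx]                       # (n, n, d_z)

    # e_ij = q_i . (k_j + a^K_ij) / sqrt(d_z)
    scores = (q @ k.T + np.einsum('id,ijd->ij', q, a_k)) / np.sqrt(d_z)
    alpha = softmax(scores, axis=-1)                        # (n, n)

    # z_i = sum_j alpha_ij (v_j + a^V_ij)
    return alpha @ v + np.einsum('ij,ijd->id', alpha, a_v)

# Toy usage: 5 tokens, model width 8, per-head width 4, clipping distance 2.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
rel_k, rel_v = rng.normal(size=(5, 4)), rng.normal(size=(5, 4))
print(relative_self_attention(x, Wq, Wk, Wv, rel_k, rel_v, max_dist=2).shape)  # (5, 4)
```

Clipping distances beyond `max_dist` to a single shared embedding is what keeps the number of relative position parameters independent of sequence length, and sharing the (n, n, d_z) relative terms across attention heads is what makes the method efficient in practice.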


Related Research

Self-Attention with Structural Position Representations (09/01/2019)
Although self-attention networks (SANs) have advanced the state-of-the-a...

Weighted Transformer Network for Machine Translation (11/06/2017)
State-of-the-art results on neural machine translation often use attenti...

PiSLTRc: Position-informed Sign Language Transformer with Content-aware Convolution (07/27/2021)
Since the superiority of Transformer in learning long-term dependency, t...

Multiresolution Transformer Networks: Recurrence is Not Essential for Modeling Hierarchical Structure (08/27/2019)
The architecture of Transformer is based entirely on self-attention, and...

An Improved Relative Self-Attention Mechanism for Transformer with Application to Music Generation (09/12/2018)
Music relies heavily on self-reference to build structure and meaning. W...

Position-aware Self-attention with Relative Positional Encodings for Slot Filling (07/09/2018)
This paper describes how to apply self-attention with relative positiona...

Deep Multiple Instance Learning with Distance-Aware Self-Attention (05/17/2023)
Traditional supervised learning tasks require a label for every instance...
