Assessing the Ability of Self-Attention Networks to Learn Word Order

06/03/2019
by Baosong Yang, et al.

Self-attention networks (SAN) have attracted considerable interest due to their high parallelization and strong performance on a variety of NLP tasks, e.g., machine translation. Because they lack the recurrence structure of recurrent neural networks (RNN), SAN are often assumed to be weak at learning the positional information of words for sequence modeling. However, this speculation has neither been empirically confirmed, nor have explanations been offered for their strong performance on machine translation despite the supposed lack of positional information. To this end, we propose a novel word reordering detection task to quantify how well the word order information is learned by SAN and RNN. Specifically, we randomly move one word to another position and examine whether a trained model can detect both the original and the inserted positions. Experimental results reveal that: 1) SAN trained on word reordering detection indeed have difficulty learning positional information, even with position embeddings; and 2) SAN trained on machine translation learn better positional information than their RNN counterpart, in which position embeddings play a critical role. Although the recurrence structure makes the model more universally effective at learning word order, learning objectives matter more in downstream tasks such as machine translation.
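For a concrete picture of how such perturbed training instances could be constructed, the Python sketch below builds one word reordering detection example: it moves a randomly chosen token to a different position and records the two positions a detector would have to predict. The function name and the labeling convention (original position indexed in the source sentence, inserted position indexed in the perturbed sentence) are illustrative assumptions, not the authors' released code.

```python
import random

def make_reordering_example(tokens):
    """Create one word reordering detection instance (illustrative sketch):
    pick a random token, move it to a different position, and return the
    perturbed sentence together with the two positions to be detected."""
    assert len(tokens) >= 2, "need at least two tokens to reorder"
    src = random.randrange(len(tokens))       # position the word is taken from (original sentence)
    word = tokens[src]
    rest = tokens[:src] + tokens[src + 1:]
    dst = random.randrange(len(rest) + 1)     # position the word is re-inserted at (perturbed sentence)
    while dst == src:                         # dst == src would reproduce the original order
        dst = random.randrange(len(rest) + 1)
    perturbed = rest[:dst] + [word] + rest[dst:]
    return perturbed, {"original_pos": src, "inserted_pos": dst}

if __name__ == "__main__":
    sentence = "the quick brown fox jumps over the lazy dog".split()
    perturbed, labels = make_reordering_example(sentence)
    print(" ".join(perturbed), labels)
```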


Related research

09/05/2019
Accelerating Transformer Decoding via a Hybrid of Self-attention and Recurrent Neural Network
Due to the highly parallelizable architecture, Transformer is faster to ...

05/03/2020
How Does Selective Mechanism Improve Self-Attention Networks?
Self-attention networks (SANs) with selective mechanism has produced sub...

08/27/2018
Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures
Recently, non-recurrent architectures (convolutional, self-attentional) ...

04/28/2020
Self-Attention with Cross-Lingual Position Representation
Position encoding (PE), an essential part of self-attention networks (SA...

09/01/2019
Self-Attention with Structural Position Representations
Although self-attention networks (SANs) have advanced the state-of-the-a...

03/03/2020
Meta-Embeddings Based On Self-Attention
Creating meta-embeddings for better performance in language modelling ha...

03/22/2017
Classification-based RNN machine translation using GRUs
We report the results of our classification-based machine translation mo...
