Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures

08/27/2018
by   Gongbo Tang, et al.
0

Recently, non-recurrent architectures (convolutional, self-attentional) have outperformed RNNs in neural machine translation. CNNs and self-attentional networks can connect distant words via shorter network paths than RNNs, and it has been speculated that this improves their ability to model long-range dependencies. However, this theoretical argument has not been tested empirically, nor have alternative explanations for their strong performance been explored in-depth. We hypothesize that the strong performance of CNNs and self-attentional networks could also be due to their ability to extract semantic features from the source text, and we evaluate RNNs, CNNs and self-attention networks on two tasks: subject-verb agreement (where capturing long-range dependencies is required) and word sense disambiguation (where semantic feature extraction is required). Our experimental results show that: 1) self-attentional networks and CNNs do not outperform RNNs in modeling subject-verb agreement over long distances; 2) self-attentional networks perform distinctly better than RNNs and CNNs on word sense disambiguation.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
01/21/2018

Dense Recurrent Neural Networks for Scene Labeling

Recently recurrent neural networks (RNNs) have demonstrated the ability ...
research
06/03/2019

Assessing the Ability of Self-Attention Networks to Learn Word Order

Self-attention networks (SAN) have attracted a lot of interests due to t...
research
09/04/2019

Towards Better Modeling Hierarchical Structure for Self-Attention with Ordered Neurons

Recent studies have shown that a hybrid of self-attention networks (SANs...
research
10/13/2019

T-GSA: Transformer with Gaussian-weighted self-attention for speech enhancement

Transformer neural networks (TNN) demonstrated state-of-art performance ...
research
10/24/2018

Modeling Localness for Self-Attention Networks

Self-attention networks have proven to be of profound value for its stre...
research
10/30/2018

Simplifying Neural Machine Translation with Addition-Subtraction Twin-Gated Recurrent Networks

In this paper, we propose an additionsubtraction twin-gated recurrent ne...
research
12/29/2020

Multiple Structural Priors Guided Self Attention Network for Language Understanding

Self attention networks (SANs) have been widely utilized in recent NLP s...

Please sign up or login with your details

Forgot password? Click here to reset