Boosting Neural Machine Translation with Dependency-Scaled Self-Attention Network

11/23/2021
by Ru Peng, et al.

The neural machine translation model assumes that syntactic knowledge can be learned automatically from the bilingual corpus via an attention network. In practice, however, an attention network trained under such weak supervision fails to capture the deep structure of a sentence. It is therefore natural to introduce external syntactic knowledge to guide the learning of the attention network. To this end, we propose a novel, parameter-free, dependency-scaled self-attention network, which integrates explicit syntactic dependencies into the attention network to counteract the dispersion of the attention distribution. We further propose two knowledge-sparsing techniques to prevent the model from overfitting noisy syntactic dependencies. Experiments and extensive analyses on the IWSLT14 German-to-English and WMT16 German-to-English translation tasks validate the effectiveness of our approach.
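The abstract does not give the exact formulation, but the core idea, biasing self-attention scores with a parameter-free prior derived from the dependency parse, can be sketched as follows. This is a minimal illustration only: the Gaussian penalty over dependency-tree distances, the edge-dropout style of knowledge sparsing, and all names (dependency_scaled_attention, sparsify_dependencies, sigma, drop_p) are assumptions for exposition, not taken from the paper.

    import torch
    import torch.nn.functional as F

    def sparsify_dependencies(dep_dist, drop_p=0.1, training=True):
        """Hypothetical knowledge sparsing: during training, randomly sever a
        fraction of dependency relations (push their distance to the maximum)
        so the model does not overfit noisy parses."""
        if training and drop_p > 0:
            drop = torch.rand_like(dep_dist.float()) < drop_p
            dep_dist = dep_dist.masked_fill(drop, dep_dist.max().item())
        return dep_dist

    def dependency_scaled_attention(q, k, v, dep_dist, sigma=1.0):
        """Scaled dot-product attention with a parameter-free syntactic prior.

        q, k, v: (L, d) query/key/value matrices for one sentence.
        dep_dist: (L, L) hop counts between tokens in the dependency tree;
                  syntactically close pairs receive larger attention weights.
        """
        d = q.size(-1)
        scores = q @ k.transpose(-2, -1) / d ** 0.5          # content-based scores
        prior = -dep_dist.float().pow(2) / (2 * sigma ** 2)  # tree-distance penalty
        return F.softmax(scores + prior, dim=-1) @ v

    # Toy usage: 4 tokens with symmetric dependency-tree distances.
    L, d = 4, 8
    q = k = v = torch.randn(L, d)
    dep_dist = torch.tensor([[0, 1, 2, 2],
                             [1, 0, 1, 1],
                             [2, 1, 0, 2],
                             [2, 1, 2, 0]])
    dep_dist = sparsify_dependencies(dep_dist, drop_p=0.1)
    out = dependency_scaled_attention(q, k, v, dep_dist)  # (L, d) outputs

Because the prior is computed directly from the parse and added before the softmax, it introduces no trainable parameters, which matches the "parameter-free" claim in the abstract.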


Related research:

09/06/2019 · Improving Neural Machine Translation with Parent-Scaled Self-Attention
Most neural machine translation (NMT) models operate on source and targe...

10/31/2018 · Convolutional Self-Attention Network
Self-attention network (SAN) has recently attracted increasing interest ...

05/22/2023 · Syntactic Knowledge via Graph Attention with BERT in Machine Translation
Although the Transformer model can effectively acquire context features ...

12/27/2020 · SG-Net: Syntax Guided Transformer for Language Representation
Understanding human language is one of the key themes of artificial inte...

10/24/2019 · Promoting the Knowledge of Source Syntax in Transformer NMT Is Not Needed
The utility of linguistic annotation in neural machine translation seeme...

09/05/2019 · Source Dependency-Aware Transformer with Supervised Self-Attention
Recently, Transformer has achieved the state-of-the-art performance on m...

05/22/2023 · GATology for Linguistics: What Syntactic Dependencies It Knows
Graph Attention Network (GAT) is a graph neural network which is one of ...
