Source Dependency-Aware Transformer with Supervised Self-Attention

09/05/2019
by Chengyi Wang, et al.

Recently, the Transformer has achieved state-of-the-art performance on many machine translation tasks. However, because syntactic knowledge is not explicitly considered in the encoder, incorrect context information that violates the syntactic structure may be integrated into the source hidden states, leading to erroneous translations. In this paper, we propose a novel method to incorporate source dependencies into the Transformer. Specifically, we adopt the source dependency tree and define two matrices to represent the dependency relations. Based on these matrices, two heads in the multi-head self-attention module are trained in a supervised manner, and two extra cross-entropy losses are introduced into the training objective. Under this objective, the model learns the source dependency relations directly. Without requiring pre-parsed input at inference time, our model generates better translations with dependency-aware context information. Experiments on bidirectional Chinese-to-English, English-to-Japanese, and English-to-German translation tasks show that the proposed method significantly improves over the Transformer baseline.
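To make the supervised-attention idea concrete, here is a minimal PyTorch sketch. It assumes the two dependency matrices are a one-hot "parent" matrix (each token attends to its syntactic head) and a row-normalized "child" matrix (each token attends to its dependents); these definitions, the weight lam, and all function names are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch (not the paper's reference code): supervise two
# self-attention heads with dependency-derived target distributions.
import torch


def dependency_matrices(heads):
    """Build two (n x n) target matrices from a dependency parse.

    heads[i] is the index of token i's syntactic head; we assume the
    root token points to itself. Row i of `parent` is one-hot on the
    head of token i; `child` is the transpose, row-normalized over
    each token's dependents (all-zero rows for leaves stay zero).
    """
    n = len(heads)
    parent = torch.zeros(n, n)
    parent[torch.arange(n), torch.tensor(heads)] = 1.0
    child = parent.t().clone()
    child = child / child.sum(dim=-1, keepdim=True).clamp(min=1.0)
    return parent, child


def supervised_attention_loss(attn_parent, attn_child, parent, child):
    """Cross entropy between the attention distributions of the two
    designated heads (rows already softmax-normalized) and the targets."""
    eps = 1e-9  # numerical floor so log() never sees exact zeros
    loss_p = -(parent * (attn_parent + eps).log()).sum(-1).mean()
    loss_c = -(child * (attn_child + eps).log()).sum(-1).mean()
    return loss_p + loss_c


# Toy usage: "the cat sat down", heads the->cat, cat->sat, sat (root), down->sat
parent, child = dependency_matrices([1, 2, 2, 2])
attn = torch.softmax(torch.randn(4, 4), dim=-1)  # stand-in for a real head
dep_loss = supervised_attention_loss(attn, attn, parent, child)
# total = translation_loss + lam * dep_loss   # lam is an assumed weight
```

Note that this matches the inference behavior described in the abstract: the supervision only shapes the two heads during training, so at test time they act as ordinary attention heads and no parse of the input is required.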


Related research

09/06/2019 · Improving Neural Machine Translation with Parent-Scaled Self-Attention
Most neural machine translation (NMT) models operate on source and targe...

10/24/2019 · Promoting the Knowledge of Source Syntax in Transformer NMT Is Not Needed
The utility of linguistic annotation in neural machine translation seeme...

12/23/2020 · Future-Guided Incremental Transformer for Simultaneous Translation
Simultaneous translation (ST) starts translations synchronously while re...

03/11/2022 · Integrating Dependency Tree Into Self-attention for Sentence Representation
Recent progress on parse tree encoder for sentence representation learni...

11/23/2021 · Boosting Neural Machine Translation with Dependency-Scaled Self-Attention Network
The neural machine translation model assumes that syntax knowledge can b...

12/25/2021 · Combining Improvements for Exploiting Dependency Trees in Neural Semantic Parsing
The dependency tree of a natural language sentence can capture the inter...

01/16/2021 · To Understand Representation of Layer-aware Sequence Encoders as Multi-order-graph
In this paper, we propose a unified explanation of representation for la...
