Promoting the Knowledge of Source Syntax in Transformer NMT Is Not Needed

10/24/2019
by   Thuong-Hai Pham, et al.

The utility of linguistic annotation in neural machine translation seemed to have been established in past papers. Those experiments were, however, limited to recurrent sequence-to-sequence architectures and relatively small data settings. We focus on the state-of-the-art Transformer model and use comparably larger corpora. Specifically, we try to promote the knowledge of source-side syntax using multi-task learning, either through simple data manipulation techniques or through a dedicated model component. In particular, we train one of the Transformer attention heads to produce the source-side dependency tree. Overall, our results cast some doubt on the utility of multi-task setups with linguistic information. The data manipulation techniques recommended in previous works prove ineffective in large data settings. The treatment of self-attention as dependencies seems much more promising: it helps in translation and reveals that the Transformer model can very easily grasp the syntactic structure. An important but curious result is, however, that identical gains are obtained by using trivial "linear trees" instead of true dependencies. The gain may thus come not from the added linguistic knowledge but from some simpler regularizing effect induced on the self-attention matrices.
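
As an illustration of the "dedicated model component" variant, the sketch below shows one way a single encoder self-attention head could be supervised to put its attention mass on each token's dependency head, together with the trivial "linear tree" control mentioned above. This is a minimal sketch under assumed tensor shapes, not the authors' implementation; the function names are illustrative.

```python
import torch

def dependency_attention_loss(attn_weights, head_indices, token_mask):
    """Cross-entropy between one self-attention head and gold dependency heads.

    attn_weights : (batch, seq_len, seq_len) softmax output of the designated
                   encoder attention head; row i is token i's distribution.
    head_indices : (batch, seq_len) position of each token's syntactic head
                   (the root can simply point to itself).
    token_mask   : (batch, seq_len) True for real tokens, False for padding.
    """
    log_attn = torch.log(attn_weights.clamp_min(1e-9))
    # Negative log-probability assigned to the gold head position of each token.
    nll = -log_attn.gather(-1, head_indices.unsqueeze(-1)).squeeze(-1)
    return (nll * token_mask).sum() / token_mask.sum()

def linear_tree_heads(batch_size, max_len):
    """Trivial 'linear tree' control: token i attaches to token i-1 (token 0 to itself)."""
    idx = torch.arange(max_len).expand(batch_size, max_len)
    return (idx - 1).clamp_min(0)
```

In a multi-task setup of this kind, the auxiliary loss would typically be added to the translation cross-entropy with some interpolation weight. The curious finding of the paper is that supervising the head with such linear trees yields gains on par with true dependency trees.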

Related research

09/05/2019
Source Dependency-Aware Transformer with Supervised Self-Attention
Recently, Transformer has achieved the state-of-the-art performance on m...

09/11/2021
HYDRA – Hyper Dependency Representation Attentions
Attention is all we need as long as we have enough data. Even so, it is ...

11/23/2021
Boosting Neural Machine Translation with Dependency-Scaled Self-Attention Network
The neural machine translation model assumes that syntax knowledge can b...

08/19/2019
Recurrent Graph Syntax Encoder for Neural Machine Translation
Syntax-incorporated machine translation models have been proven successf...

04/23/2018
Linguistically-Informed Self-Attention for Semantic Role Labeling
The current state-of-the-art end-to-end semantic role labeling (SRL) mod...

06/05/2019
From Balustrades to Pierre Vinken: Looking for Syntax in Transformer Self-Attentions
We inspect the multi-head self-attention in Transformer NMT encoders for...
