Self-Attention with Cross-Lingual Position Representation

04/28/2020
by   Liang Ding, et al.

Position encoding (PE), an essential part of self-attention networks (SANs), is used to preserve word order information for natural language processing tasks, generating fixed position indices for input sequences. However, in cross-lingual scenarios such as machine translation, the PEs of the source and target sentences are modeled independently. Due to word order divergences between languages, modeling cross-lingual positional relationships might help SANs tackle this problem. In this paper, we augment SANs with cross-lingual position representations to model the bilingually aware latent structure of the input sentence. Specifically, we utilize bracketing transduction grammar (BTG)-based reordering information to encourage SANs to learn bilingual diagonal alignments. Experimental results on WMT'14 English→German, WAT'17 Japanese→English, and WMT'17 Chinese→English translation tasks demonstrate that our approach significantly and consistently improves translation quality over strong baselines. Extensive analyses confirm that the performance gains come from the cross-lingual information.
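As a rough illustration of the idea, the sketch below adds a second, cross-lingually reordered position signal to a Transformer embedding layer alongside the usual monolingual one. Everything here is an assumption for illustration: the class name CrossLingualPositionEmbedding, the gated fusion of the two encodings, and the way the BTG-reordered indices are supplied are not taken from the paper; only the general notion of encoding reordered positions reflects the abstract.

```python
import torch
import torch.nn as nn


def sinusoidal_encoding(positions: torch.Tensor, d_model: int) -> torch.Tensor:
    """Standard sinusoidal encoding, evaluated at arbitrary (possibly reordered) indices."""
    # positions: (batch, seq_len) integer position indices
    inv_freq = 1.0 / (10000 ** (torch.arange(0, d_model, 2, dtype=torch.float32) / d_model))
    angles = positions.unsqueeze(-1).float() * inv_freq        # (batch, seq_len, d_model/2)
    enc = torch.zeros(*positions.shape, d_model)
    enc[..., 0::2] = torch.sin(angles)
    enc[..., 1::2] = torch.cos(angles)
    return enc


class CrossLingualPositionEmbedding(nn.Module):
    """Hypothetical sketch: fuse monolingual positions with reordered positions,
    e.g. indices produced by a BTG-based reordering of the source sentence
    toward target-language word order. The gated fusion below is an assumption,
    not necessarily the combination used in the paper."""

    def __init__(self, d_model: int):
        super().__init__()
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, token_emb, mono_pos, xling_pos):
        # token_emb: (batch, seq_len, d_model)
        d_model = token_emb.size(-1)
        pe_mono = sinusoidal_encoding(mono_pos, d_model)    # original order 0, 1, 2, ...
        pe_xling = sinusoidal_encoding(xling_pos, d_model)  # BTG-reordered indices
        g = torch.sigmoid(self.gate(torch.cat([pe_mono, pe_xling], dim=-1)))
        return token_emb + g * pe_mono + (1.0 - g) * pe_xling


# Example: a 5-token source sentence whose (hypothetical) BTG reordering swaps the last two tokens.
emb = nn.Embedding(1000, 512)
tokens = torch.tensor([[4, 17, 9, 23, 8]])
x = emb(tokens)
mono = torch.arange(5).unsqueeze(0)        # [0, 1, 2, 3, 4]
xling = torch.tensor([[0, 1, 2, 4, 3]])    # positions reordered toward target order
layer = CrossLingualPositionEmbedding(512)
out = layer(x, mono, xling)                # (1, 5, 512), fed into the encoder stack
```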


