Bilingual Mutual Information Based Adaptive Training for Neural Machine Translation

05/26/2021
by Yangyifan Xu, et al.

Recently, token-level adaptive training has achieved promising improvements in machine translation: the cross-entropy loss is adjusted by assigning different training weights to different tokens in order to alleviate the token imbalance problem. However, previous approaches rely only on static word-frequency information in the target language and ignore the source language, which is insufficient for a bilingual task like machine translation. In this paper, we propose a novel bilingual mutual information (BMI) based adaptive objective, which measures the learning difficulty of each target token from the perspective of bilingualism and assigns an adaptive weight accordingly to improve token-level adaptive training. This method assigns larger training weights to tokens with higher BMI, so that easy tokens are updated with coarse granularity while difficult tokens are updated with fine granularity. Experimental results on WMT14 English-to-German and WMT19 Chinese-to-English demonstrate the superiority of our approach over the Transformer baseline and previous token-level adaptive training approaches. Further analyses confirm that our method improves lexical diversity.
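The abstract does not spell out the exact BMI formula, so the sketch below is only a rough, hypothetical illustration of the general idea: estimate a PMI-style score log p(s, t) / (p(s) p(t)) for source-target word pairs, average it over the source sentence to get a per-target-token score, and use the (normalized) score to scale the per-token cross-entropy loss. The function names, the sentence-level co-occurrence counts (used here in place of word-alignment statistics), the clipping floor, and the mean-1 weight normalization are all assumptions of this sketch, not the paper's released implementation.

```python
import math
from collections import Counter

import torch
import torch.nn.functional as F


def estimate_bmi_table(parallel_corpus):
    """Estimate a PMI-style score for each (source word, target word) pair.

    parallel_corpus: iterable of (src_tokens, tgt_tokens) sentence pairs.
    Co-occurrence within a sentence pair is a crude stand-in for the
    alignment-based counts a real implementation would use.
    """
    src_counts, tgt_counts, pair_counts = Counter(), Counter(), Counter()
    n_pair_events = 0
    for src, tgt in parallel_corpus:
        src_counts.update(src)
        tgt_counts.update(tgt)
        for s in set(src):
            for t in set(tgt):
                pair_counts[(s, t)] += 1
                n_pair_events += 1
    n_src = sum(src_counts.values())
    n_tgt = sum(tgt_counts.values())
    return {
        (s, t): math.log(
            (c / n_pair_events)
            / ((src_counts[s] / n_src) * (tgt_counts[t] / n_tgt))
        )
        for (s, t), c in pair_counts.items()
    }


def token_weights(src_tokens, tgt_tokens, bmi, floor=0.1):
    """Per-target-token weight: BMI averaged over the source sentence,
    clipped to stay positive, then rescaled to mean 1 so the overall
    loss scale matches plain cross-entropy."""
    raw = [
        sum(bmi.get((s, t), 0.0) for s in src_tokens) / max(len(src_tokens), 1)
        for t in tgt_tokens
    ]
    w = torch.tensor(raw).clamp(min=floor)
    return w / w.mean()


def bmi_weighted_ce(logits, targets, weights):
    """Token-level adaptive objective: per-token NLL scaled by weights.

    logits: [T, V] decoder outputs; targets: [T]; weights: [T].
    """
    nll = F.cross_entropy(logits, targets, reduction="none")
    return (weights * nll).mean()
```

In a training loop, the adaptive loss would replace the standard one along the lines of `loss = bmi_weighted_ce(logits, targets, token_weights(src, tgt, bmi))`, with the BMI table precomputed once from the training corpus.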


Related research

03/06/2022 · Conditional Bilingual Mutual Information Based Adaptive Training for Neural Machine Translation
Token-level adaptive training approaches can alleviate the token imbalan...

10/09/2020 · Token-level Adaptive Training for Neural Machine Translation
There exists a token imbalance phenomenon in natural language as differe...

11/13/2022 · WR-ONE2SET: Towards Well-Calibrated Keyphrase Generation
Keyphrase generation aims to automatically generate short phrases summar...

04/24/2021 · Modeling Coverage for Non-Autoregressive Neural Machine Translation
Non-Autoregressive Neural Machine Translation (NAT) has achieved signifi...

12/12/2018 · Sentence-wise Smooth Regularization for Sequence to Sequence Learning
Maximum-likelihood estimation (MLE) is widely used in sequence to sequen...

09/03/2019 · The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives
We seek to understand how the representations of individual tokens and t...

07/22/2021 · Confidence-Aware Scheduled Sampling for Neural Machine Translation
Scheduled sampling is an effective method to alleviate the exposure bias...
