Focus on the Target's Vocabulary: Masked Label Smoothing for Machine Translation

03/06/2022
by Liang Chen, et al.

Label smoothing and vocabulary sharing are two widely used techniques in neural machine translation models. However, we argue that naively applying both techniques together can be conflicting and even lead to sub-optimal performance. When allocating smoothed probability mass, original label smoothing treats source-side words that could never appear in the target language the same as real target-side words, which can bias the translation model. To address this issue, we propose Masked Label Smoothing (MLS), a new mechanism that masks the soft-label probability of source-side words to zero. Simple yet effective, MLS better integrates label smoothing with vocabulary sharing. Extensive experiments show that MLS consistently improves over original label smoothing across datasets, covering both bilingual and multilingual translation, in terms of both translation quality and model calibration. Our code is released at https://github.com/PKUnlp-icler/MLS
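
The idea lends itself to a short sketch. Below is a minimal, hypothetical PyTorch implementation (not the authors' released code; see the repository above for that): it builds the usual smoothed target distribution but spreads the smoothing mass only over target-language tokens, assuming a precomputed boolean mask `target_lang_mask` marking which vocabulary entries occur on the target side.

```python
# A minimal sketch of Masked Label Smoothing, assuming a precomputed
# boolean vocabulary mask; variable names are illustrative, not the
# authors' API (see https://github.com/PKUnlp-icler/MLS for their code).
import torch
import torch.nn.functional as F

def masked_label_smoothing(targets, vocab_size, target_lang_mask, eps=0.1):
    """Build smoothed label distributions whose smoothing mass is
    spread only over target-language tokens.

    targets:          LongTensor (batch,) of gold token ids, assumed to
                      lie inside the target-language vocabulary.
    target_lang_mask: BoolTensor (vocab_size,), True for tokens that can
                      appear on the target side.
    """
    n_valid = target_lang_mask.sum().clamp(min=2)        # avoid div by zero
    smooth = torch.zeros(targets.size(0), vocab_size)
    # Distribute the eps mass uniformly over valid tokens (the gold
    # token's share is overwritten below); masked-out source-side
    # tokens keep exactly zero probability.
    smooth[:, target_lang_mask] = eps / (n_valid - 1)
    smooth.scatter_(1, targets.unsqueeze(1), 1.0 - eps)  # gold gets 1 - eps
    return smooth

# Usage: KL divergence between model log-probs and the masked soft labels.
logits = torch.randn(4, 1000)            # toy (batch, vocab) scores
mask = torch.zeros(1000, dtype=torch.bool)
mask[:500] = True                        # pretend ids < 500 are target-side
gold = torch.randint(0, 500, (4,))
loss = F.kl_div(logits.log_softmax(-1),
                masked_label_smoothing(gold, 1000, mask),
                reduction="batchmean")
```

The only difference from vanilla label smoothing is the boolean mask: tokens outside the target language receive exactly zero soft probability instead of an equal share of the eps mass.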

research · 05/18/2023
On the Off-Target Problem of Zero-Shot Multilingual Neural Machine Translation
While multilingual neural machine translation has achieved great success...

research · 12/24/2020
Why Neural Machine Translation Prefers Empty Outputs
We investigate why neural machine translation (NMT) systems assign high ...

research · 01/06/2019
A Comparative Study on Vocabulary Reduction for Phrase Table Smoothing
This work systematically analyzes the smoothing effect of vocabulary red...

research · 12/08/2022
DC-MBR: Distributional Cooling for Minimum Bayesian Risk Decoding
Minimum Bayesian Risk Decoding (MBR) emerges as a promising decoding alg...

research · 02/21/2023
Efficient CTC Regularization via Coarse Labels for End-to-End Speech Translation
For end-to-end speech translation, regularizing the encoder with the Con...

research · 05/08/2023
LABO: Towards Learning Optimal Label Regularization via Bi-level Optimization
Regularization techniques are crucial to improving the generalization pe...

research · 05/30/2021
Diversifying Dialog Generation via Adaptive Label Smoothing
Neural dialogue generation models trained with the one-hot target distri...
