Rejuvenating Low-Frequency Words: Making the Most of Parallel Data in Non-Autoregressive Translation

06/02/2021
by Liang Ding, et al.

Knowledge distillation (KD) is commonly used to construct synthetic data for training non-autoregressive translation (NAT) models. However, there is a discrepancy in low-frequency words between the distilled and the original data, leading to more errors in predicting low-frequency words. To alleviate this problem, we directly expose the raw data to NAT models by leveraging pretraining. By analyzing directed alignments, we find that KD makes low-frequency source words align with targets more deterministically, but fails to align sufficient low-frequency words from target to source. Accordingly, we propose reverse KD to rejuvenate more alignments for low-frequency target words. To make the most of authentic and synthetic data, we combine these complementary approaches into a new training strategy that further boosts NAT performance. We conduct experiments on five translation benchmarks over two advanced architectures. Results demonstrate that the proposed approach significantly and universally improves translation quality by reducing translation errors on low-frequency words. Encouragingly, our approach achieves 28.2 and 33.9 BLEU points on the WMT14 English-German and WMT16 Romanian-English datasets, respectively. Our code, data, and trained models are available at <https://github.com/longyuewangdcu/RLFW-NAT>.
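
The abstract distinguishes two complementary data views: standard KD pairs an authentic source with a distilled target, while the proposed reverse KD pairs a distilled source with the authentic target so that low-frequency target words keep their alignments. As a rough illustration only, the Python sketch below shows how such corpora could be built; the helper names, the stand-in teacher callables, and the overall shape are assumptions for illustration, not the released RLFW-NAT code.

```python
from typing import Callable, List, Tuple

ParallelData = List[Tuple[str, str]]  # (source sentence, target sentence) pairs


def build_distilled_corpora(
    raw: ParallelData,
    fwd_teacher: Callable[[str], str],  # source -> target AT teacher (assumed interface)
    bwd_teacher: Callable[[str], str],  # target -> source AT teacher (assumed interface)
) -> Tuple[ParallelData, ParallelData]:
    """Build standard-KD and reverse-KD corpora from raw parallel data.

    Standard KD keeps the authentic source and replaces the target with the
    forward teacher's output; reverse KD keeps the authentic target and
    replaces the source with the backward teacher's output, the direction the
    abstract credits with rejuvenating low-frequency target words.
    """
    kd_data = [(src, fwd_teacher(src)) for src, _ in raw]
    rev_kd_data = [(bwd_teacher(tgt), tgt) for _, tgt in raw]
    return kd_data, rev_kd_data


if __name__ == "__main__":
    # Toy usage with identity "teachers"; real teachers would be trained
    # autoregressive translation models decoding the raw corpus.
    raw = [("ein seltenes Wort", "a rare word")]
    identity = lambda s: s
    kd, rev_kd = build_distilled_corpora(raw, identity, identity)
    print(kd, rev_kd)
```

Per the abstract, the authentic data and these synthetic corpora are then combined in a pretraining-plus-training schedule; the exact mixing and ordering follow the paper rather than this sketch.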

Related research

12/29/2020 · Understanding and Improving Lexical Choice in Non-Autoregressive Translation
Knowledge distillation (KD) is essential for training non-autoregressive...

08/27/2018 · Back-Translation Sampling by Targeting Difficult Words in Neural Machine Translation
Neural Machine Translation has achieved state-of-the-art performance for...

10/24/2020 · Multi-Task Learning with Shared Encoder for Non-Autoregressive Machine Translation
Non-Autoregressive machine Translation (NAT) models have demonstrated si...

07/17/2023 · Improving End-to-End Speech Translation by Imitation-Based Knowledge Distillation with Synthetic Transcripts
End-to-end automatic speech translation (AST) relies on data that combin...

04/28/2022 · Neighbors Are Not Strangers: Improving Non-Autoregressive Translation under Low-Frequency Lexical Constraints
However, current autoregressive approaches suffer from high latency. In ...

05/27/2021 · How Does Distilled Data Complexity Impact the Quality and Confidence of Non-Autoregressive Machine Translation?
While non-autoregressive (NAR) models are showing great promise for mach...

06/10/2021 · Progressive Multi-Granularity Training for Non-Autoregressive Translation
Non-autoregressive translation (NAT) significantly accelerates the infer...
