Improving Non-autoregressive Translation Quality with Pretrained Language Model, Embedding Distillation and Upsampling Strategy for CTC

06/10/2023
by Shen-sian Syu, et al.

Non-autoregressive approaches aim to improve the inference speed of translation models by generating output in a single forward pass, but they typically suffer a significant drop in translation quality compared to autoregressive models. This paper introduces a series of techniques to improve the translation quality of Non-Autoregressive Translation (NAT) models while retaining a substantial speedup at inference time. We propose fine-tuning Pretrained Multilingual Language Models (PMLMs) with the CTC loss to train NAT models effectively. We further adopt a MASK-insertion scheme for up-sampling instead of token duplication, and present an embedding distillation method to further improve performance. In our experiments, our model outperforms the autoregressive baseline (Transformer base) on multiple datasets, including WMT'14 DE↔EN, WMT'16 RO↔EN, and IWSLT'14 DE↔EN. Notably, it surpasses the autoregressive baseline on IWSLT'14 DE↔EN and WMT'16 RO↔EN even without using distillation data during training. On IWSLT'14 DE→EN, our model achieves a BLEU score of 39.59, setting a new state of the art, while running 16.35 times faster than the autoregressive model.
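To make the recipe above concrete, the following is a minimal sketch (not the authors' code) of MASK-insertion up-sampling combined with a CTC objective. A small Transformer encoder stands in for the pretrained multilingual LM, and the vocabulary size, upsampling ratio, and MASK/blank IDs are hypothetical placeholders: each source token is followed by inserted [MASK] tokens rather than being duplicated, and the up-sampled sequence is scored with PyTorch's CTC loss.

    # Minimal sketch (assumptions noted inline): MASK-insertion upsampling + CTC loss.
    # A pretrained multilingual encoder (PMLM) would replace the toy encoder below.
    import torch
    import torch.nn as nn

    VOCAB, MASK_ID, BLANK_ID, UPSAMPLE = 1000, 1, 0, 3   # hypothetical constants

    def mask_insertion_upsample(src_ids: torch.Tensor, ratio: int = UPSAMPLE) -> torch.Tensor:
        """Insert (ratio - 1) [MASK] tokens after every source token,
        instead of duplicating each token `ratio` times."""
        bsz, slen = src_ids.shape
        out = src_ids.new_full((bsz, slen, ratio), MASK_ID)
        out[:, :, 0] = src_ids                      # keep the original token first
        return out.view(bsz, slen * ratio)          # (bsz, slen * ratio)

    class CTCNATModel(nn.Module):
        def __init__(self, d_model: int = 256):
            super().__init__()
            self.embed = nn.Embedding(VOCAB, d_model)
            layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)  # stand-in for a PMLM
            self.proj = nn.Linear(d_model, VOCAB)

        def forward(self, src_ids: torch.Tensor) -> torch.Tensor:
            x = mask_insertion_upsample(src_ids)
            h = self.encoder(self.embed(x))
            return self.proj(h).log_softmax(-1)     # (bsz, T, VOCAB)

    model = CTCNATModel()
    ctc = nn.CTCLoss(blank=BLANK_ID, zero_infinity=True)

    src = torch.randint(2, VOCAB, (2, 10))          # dummy source batch
    tgt = torch.randint(2, VOCAB, (2, 12))          # dummy target batch
    log_probs = model(src).transpose(0, 1)          # CTCLoss expects (T, bsz, VOCAB)
    input_lens = torch.full((2,), log_probs.size(0), dtype=torch.long)
    target_lens = torch.full((2,), tgt.size(1), dtype=torch.long)
    loss = ctc(log_probs, tgt, input_lens, target_lens)
    loss.backward()

At inference time, the argmax CTC path over the up-sampled positions is collapsed (repeats merged, blanks removed) to produce the translation in a single forward pass.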

Related research

11/07/2019 - Understanding Knowledge Distillation in Non-autoregressive Machine Translation
Non-autoregressive machine translation (NAT) systems predict a sequence ...

05/21/2022 - Non-Autoregressive Neural Machine Translation: A Call for Clarity
Non-autoregressive approaches aim to improve the inference speed of tran...

11/07/2017 - Non-Autoregressive Neural Machine Translation
Existing approaches to neural machine translation condition each output ...

01/27/2023 - Candidate Soups: Fusing Candidate Results Improves Translation Quality for Non-Autoregressive Translation
Non-autoregressive translation (NAT) model achieves a much faster infere...

11/20/2019 - Fine-Tuning by Curriculum Learning for Non-Autoregressive Neural Machine Translation
Non-autoregressive translation (NAT) models remove the dependence on pre...

08/19/2021 - MvSR-NAT: Multi-view Subset Regularization for Non-Autoregressive Machine Translation
Conditional masked language models (CMLM) have shown impressive progress...

10/24/2020 - Unsupervised Paraphrase Generation via Dynamic Blocking
We propose Dynamic Blocking, a decoding algorithm which enables large-sc...
