Progressive Multi-Granularity Training for Non-Autoregressive Translation

06/10/2021
by Liang Ding, et al.

Non-autoregressive translation (NAT) significantly accelerates inference by predicting the entire target sequence in parallel. However, recent studies show that NAT is weak at learning high-mode knowledge, such as one-to-many translations. We argue that modes can be divided into various granularities that can be learned from easy to hard. In this study, we empirically show that NAT models are prone to learn fine-grained, lower-mode knowledge, such as words and phrases, more readily than sentences. Based on this observation, we propose progressive multi-granularity training for NAT. More specifically, to make the most of the training data, we break down sentence-level examples into three granularities, i.e., words, phrases, and sentences, and as training proceeds, we progressively increase the granularity. Experiments on Romanian-English, English-German, Chinese-English, and Japanese-English demonstrate that our approach improves phrase translation accuracy and model reordering ability, resulting in better translation quality than strong NAT baselines. We also show that more deterministic fine-grained knowledge can further enhance performance.
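The abstract describes the curriculum only at a high level; the sketch below is a minimal illustration of one way such an easy-to-hard schedule could look. The names (MultiGranularityData, training_pool), the step thresholds, and the assumption that word- and phrase-level pairs are pre-extracted are illustrative choices, not the paper's actual implementation.

    # Hypothetical sketch of a progressive multi-granularity curriculum:
    # start from fine-grained (low-mode) word pairs, then add phrases,
    # then full sentences as training proceeds.
    from dataclasses import dataclass
    from typing import List, Tuple

    Pair = Tuple[str, str]  # (source segment, target segment)

    @dataclass
    class MultiGranularityData:
        words: List[Pair]      # aligned word pairs extracted from the corpus
        phrases: List[Pair]    # aligned phrase pairs extracted from the corpus
        sentences: List[Pair]  # the original sentence-level examples

    def training_pool(data: MultiGranularityData, step: int,
                      phrase_start: int = 10_000,
                      sentence_start: int = 30_000) -> List[Pair]:
        """Return the examples available at the current training step,
        progressively adding coarser granularities."""
        pool = list(data.words)       # fine-grained knowledge is always in the pool
        if step >= phrase_start:
            pool += data.phrases      # add phrases once word-level knowledge is learned
        if step >= sentence_start:
            pool += data.sentences    # finally expose full sentence-level examples
        return pool

In practice the word- and phrase-level pairs would come from an external alignment step (e.g., a word aligner or phrase extraction as in phrase-based SMT), and each batch would be sampled from the currently active pool.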


