AMOM: Adaptive Masking over Masking for Conditional Masked Language Model

03/13/2023
by Yisheng Xiao, et al.

Transformer-based autoregressive (AR) methods have achieved appealing performance on various sequence-to-sequence generation tasks, e.g., neural machine translation, summarization, and code generation, but suffer from low inference efficiency. To speed up inference, many non-autoregressive (NAR) strategies have been proposed in the past few years. Among them, the conditional masked language model (CMLM) is one of the most versatile frameworks, as it supports many different sequence generation scenarios and achieves very competitive performance on them. In this paper, we introduce a simple yet effective adaptive masking over masking strategy that enhances the refinement capability of the decoder and makes encoder optimization easier. Experiments on 3 different tasks (neural machine translation, summarization, and code generation) with 15 datasets in total confirm that our simple method achieves significant performance improvements over the strong CMLM model. Surprisingly, our model yields state-of-the-art performance on neural machine translation (34.62 BLEU on WMT16 EN→RO, 34.82 BLEU on WMT16 RO→EN, and 34.84 BLEU on IWSLT De→En) and even outperforms the AR Transformer on 7 benchmark datasets with at least a 2.2× speedup. Our code is available at GitHub.
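The abstract builds on the CMLM decoder's ability to refine its own output. For readers unfamiliar with that refinement loop, here is a minimal sketch of mask-predict decoding (Ghazvininejad et al., 2019), the standard CMLM inference procedure. It is not the AMOM training strategy itself. The `Predictor` interface, `toy_predict`, and `TARGET` are hypothetical stand-ins for a trained model. The linear re-masking schedule follows the original mask-predict description.

```python
from typing import Callable, List, Tuple

MASK = "<mask>"

# Hypothetical decoder interface (assumption): given the current target
# sequence, return a (token, probability) prediction for every position.
Predictor = Callable[[List[str]], List[Tuple[str, float]]]


def mask_predict(predict: Predictor, length: int, iterations: int) -> List[str]:
    """Sketch of CMLM mask-predict decoding (not AMOM-specific)."""
    # Iteration 0: start from a fully masked target and predict all positions in parallel.
    preds = predict([MASK] * length)
    tokens = [tok for tok, _ in preds]
    probs = [p for _, p in preds]

    for t in range(1, iterations):
        # Linear decay: re-mask n = length * (T - t) / T tokens at step t.
        n = int(length * (iterations - t) / iterations)
        if n == 0:
            break
        # Re-mask the n positions the model is least confident about.
        worst = sorted(range(length), key=lambda i: probs[i])[:n]
        for i in worst:
            tokens[i] = MASK
        preds = predict(tokens)
        # Only re-masked positions are updated; the others keep token and probability.
        for i in worst:
            tokens[i], probs[i] = preds[i]
    return tokens


# Hypothetical toy "model": it guesses some positions wrongly when it sees no
# context, and predicts correctly with higher confidence as more tokens are observed.
TARGET = ["we", "like", "fast", "decoding"]


def toy_predict(tokens: List[str]) -> List[Tuple[str, float]]:
    observed = sum(tok != MASK for tok in tokens)
    out = []
    for i, tok in enumerate(tokens):
        if tok != MASK:
            out.append((tok, 1.0))
        elif observed == 0 and i % 2 == 1:
            out.append(("???", 0.1))  # wrong, low-confidence guess
        else:
            out.append((TARGET[i], 0.5 + 0.1 * observed))
    return out


if __name__ == "__main__":
    print(mask_predict(toy_predict, len(TARGET), iterations=1))
    print(mask_predict(toy_predict, len(TARGET), iterations=2))
```

With a single iteration the decoder is purely parallel and keeps its low-confidence mistakes. A second iteration re-masks exactly those positions and repredicts them conditioned on the observed tokens. This refinement step is the behavior that the paper's masking strategy aims to strengthen.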
