Constant-Time Machine Translation with Conditional Masked Language Models

04/19/2019
by Marjan Ghazvininejad, et al.

Most machine translation systems generate text autoregressively, by sequentially predicting tokens from left to right. We, instead, use a masked language modeling objective to train a model to predict any subset of the target words, conditioned on both the input text and a partially masked target translation. This approach allows for efficient iterative decoding, where we first predict all of the target words non-autoregressively, and then repeatedly mask out and regenerate the subset of words that the model is least confident about. By applying this strategy for a constant number of iterations, our model improves the state of the art for constant-time translation models by over 3 BLEU on average. It is also able to reach 92-95% of the performance of a typical left-to-right transformer model, while decoding significantly faster.
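To make the iterative decoding loop concrete, here is a minimal Python sketch of the mask-predict procedure the abstract describes: generate all target tokens at once, then repeatedly re-mask and re-predict the lowest-confidence positions for a fixed number of iterations. The `model` callable, its return format, and the linear masking schedule are illustrative assumptions, not the authors' released API.

def mask_predict(model, src_tokens, tgt_len, T=10, mask_id=0):
    """Sketch of mask-predict decoding with a conditional masked LM.

    `model(src, tgt)` is a hypothetical CMLM that returns, for every
    target position, its argmax token and that token's probability.
    """
    # Iteration 0: predict every target token from a fully masked sequence.
    tgt = [mask_id] * tgt_len
    tokens, probs = model(src_tokens, tgt)

    for t in range(1, T):
        # Re-mask the n lowest-confidence tokens; n decays linearly with t.
        n = int(tgt_len * (T - t) / T)
        worst = sorted(range(tgt_len), key=lambda i: probs[i])[:n]
        if not worst:
            break
        for i in worst:
            tokens[i] = mask_id
        # Re-predict the masked positions, conditioned on the source and
        # on the unmasked, higher-confidence target tokens.
        new_tokens, new_probs = model(src_tokens, tokens)
        for i in worst:
            tokens[i], probs[i] = new_tokens[i], new_probs[i]
    return tokens

Because T is fixed in advance and each iteration predicts all masked positions in parallel, the number of model calls is constant in the target length, which is what makes the decoding "constant-time" relative to left-to-right generation.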

Related research

10/19/2020  Infusing Sequential Information into Conditional Masked Translation Model with Self-Review Mechanism
Non-autoregressive models generate target words in a parallel way, which...

01/15/2020  Parallel Machine Translation with Disentangled Context Transformer
State-of-the-art neural machine translation models generate a translatio...

01/23/2020  Semi-Autoregressive Training Improves Mask-Predict Decoding
The recently proposed mask-predict decoding algorithm has narrowed the p...

08/19/2021  MvSR-NAT: Multi-view Subset Regularization for Non-Autoregressive Machine Translation
Conditional masked language models (CMLM) have shown impressive progress...

09/03/2019  The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives
We seek to understand how the representations of individual tokens and t...

09/01/2018  Beyond Error Propagation in Neural Machine Translation: Characteristics of Language Also Matter
Neural machine translation usually adopts autoregressive models and suff...

02/05/2019  The Referential Reader: A Recurrent Entity Network for Anaphora Resolution
We present a new architecture for storing and accessing entity mentions ...
