Parallel Machine Translation with Disentangled Context Transformer

01/15/2020
by Jungo Kasai, et al.

State-of-the-art neural machine translation models generate a translation from left to right, and every step is conditioned on the previously generated tokens. The sequential nature of this generation process causes fundamental latency at inference time, since multiple tokens within a sentence cannot be generated in parallel. We propose an attention-masking-based model, called the Disentangled Context (DisCo) transformer, that simultaneously generates all tokens given different contexts. The DisCo transformer is trained to predict every output token given an arbitrary subset of the other reference tokens. We also develop the parallel easy-first inference algorithm, which iteratively refines every token in parallel and reduces the number of required iterations. Our extensive experiments on 7 translation directions with varying data sizes demonstrate that our model achieves competitive, if not better, performance compared to the state of the art in non-autoregressive machine translation while significantly reducing decoding time on average.
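The abstract outlines two mechanisms that lend themselves to a short illustration: the DisCo training objective, in which every position is predicted from an arbitrary subset of the other reference tokens, and parallel easy-first inference, in which every position is re-predicted in parallel while conditioning only on positions that were predicted with higher confidence in the previous iteration. The snippet below is a minimal, framework-free sketch of those two ideas in NumPy, based only on the abstract's description; predict_fn, the dummy predictor, the confidence scores, and the toy vocabulary are illustrative stand-ins and not the paper's actual Transformer, training code, or stopping criterion.

import numpy as np

rng = np.random.default_rng(0)

def sample_training_masks(seq_len):
    """DisCo-style training contexts (sketch): for each position i, sample an
    arbitrary subset of the *other* positions as its observed context.
    Returns a boolean matrix M where M[i, j] = True means position i may
    condition on reference token j."""
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        others = [j for j in range(seq_len) if j != i]
        k = int(rng.integers(0, seq_len))            # context size 0 .. seq_len-1
        chosen = rng.choice(others, size=min(k, len(others)), replace=False)
        mask[i, chosen] = True
    return mask

def parallel_easy_first_decode(predict_fn, seq_len, max_iters=10):
    """Toy parallel easy-first loop (sketch): all positions are re-predicted in
    parallel each iteration; position i conditions only on positions whose
    previous-iteration confidence was higher than its own."""
    # Iteration 0: predict every token from an empty target context.
    context = np.zeros((seq_len, seq_len), dtype=bool)
    tokens, conf = predict_fn(None, context)
    for _ in range(max_iters):
        order = np.argsort(-conf)                    # most confident ("easiest") first
        rank = np.empty(seq_len, dtype=int)
        rank[order] = np.arange(seq_len)
        # Position i sees position j iff j was ranked easier (more confident).
        context = rank[None, :] < rank[:, None]
        new_tokens, conf = predict_fn(tokens, context)
        if np.array_equal(new_tokens, tokens):       # predictions stabilized
            return new_tokens
        tokens = new_tokens
    return tokens

# Stand-in predictor (an assumption, not the paper's model).
VOCAB = 20

def dummy_predict_fn(prev_tokens, context_mask):
    """Returns a token id and a confidence per position. A real DisCo
    transformer would restrict attention with context_mask; here we just
    draw random distributions to keep the sketch self-contained."""
    seq_len = context_mask.shape[0]
    probs = rng.random((seq_len, VOCAB))
    probs /= probs.sum(axis=1, keepdims=True)
    return probs.argmax(axis=1), probs.max(axis=1)

if __name__ == "__main__":
    print("training context mask:\n", sample_training_masks(5).astype(int))
    print("decoded tokens:", parallel_easy_first_decode(dummy_predict_fn, 5))

Because every position's context is fully determined by the previous iteration's confidence ranking, all positions can be recomputed in a single parallel forward pass per iteration, which is the source of the decoding-time reduction claimed in the abstract.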


Related research

02/08/2020
LAVA NAT: A Non-Autoregressive Translation Model with Look-Around Decoding and Vocabulary Attention
Non-autoregressive translation (NAT) models generate multiple tokens in ...

07/14/2022
Forming Trees with Treeformers
Popular models such as Transformers and LSTMs use tokens as their unit of ...

04/19/2019
Constant-Time Machine Translation with Conditional Masked Language Models
Most machine translation systems generate text autoregressively, by sequ...

06/24/2019
Decomposable Neural Paraphrase Generation
Paraphrasing exists at different granularity levels, such as lexical lev...

04/11/2022
ConSLT: A Token-level Contrastive Framework for Sign Language Translation
Sign language translation (SLT) is an important technology that can brid...

02/04/2019
Insertion-based Decoding with Automatically Inferred Generation Order
Conventional neural autoregressive decoding commonly assumes a left-to-r...

05/27/2019
Levenshtein Transformer
Modern neural sequence generation models are built to either generate to...
