Putting Machine Translation in Context with the Noisy Channel Model

10/01/2019
by Lei Yu, et al.

We show that Bayes' rule provides a compelling mechanism for controlling unconditional document language models, taking as our test case the long-standing challenge of effectively leveraging document context in machine translation. In our formulation, we estimate the probability of a candidate translation as the product of the unconditional probability of the candidate output document and the “reverse translation probability” of translating the candidate output back into the input source language document, the so-called “noisy channel” decomposition. A particular advantage of our model is that it requires only parallel sentences to train, rather than parallel documents, which are not always available. Using a new beam search reranking approximation to solve the decoding problem, we find that document language models outperform language models that assume independence between sentences, and that using either a document or sentence language model outperforms comparable models that directly estimate the translation probability. We obtain the best published results on the NIST Chinese–English translation task, a standard task for evaluating document translation. Our model also outperforms the benchmark Transformer model by approximately 2.5 BLEU on the WMT19 Chinese–English translation task.
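The decomposition described above follows from Bayes' rule: for a source document x and a candidate translation y, p(y | x) is proportional to p(y) · p(x | y). The following Python sketch illustrates how a set of candidate documents could be reranked with that score. It is a minimal illustration, not the authors' implementation: the function names, the `lm_weight` parameter, and the toy scoring functions are all hypothetical, and the sentence-by-sentence factorization of the reverse translation model is an assumption made here to reflect the fact that only parallel sentences are needed for training.

```python
from typing import Callable, Sequence

Document = Sequence[str]  # a document is an ordered sequence of sentences


def noisy_channel_score(
    source: Document,
    candidate: Document,
    lm_logprob: Callable[[Document], float],
    channel_logprob: Callable[[str, str], float],
    lm_weight: float = 1.0,  # hypothetical interpolation weight
) -> float:
    """Return lm_weight * log p(y) + log p(x | y) for one candidate document."""
    if len(source) != len(candidate):
        raise ValueError("source and candidate must have the same number of sentences")
    # Unconditional document language model: scores the whole output document,
    # so it can reward consistency across sentences.
    lm = lm_logprob(candidate)
    # Reverse translation ("channel") model, assumed here to factorize over
    # aligned sentence pairs: log p(x | y) = sum_i log p(x_i | y_i).
    channel = sum(channel_logprob(x, y) for x, y in zip(source, candidate))
    return lm_weight * lm + channel


def rerank(
    source: Document,
    candidates: Sequence[Document],
    lm_logprob: Callable[[Document], float],
    channel_logprob: Callable[[str, str], float],
    lm_weight: float = 1.0,
) -> Document:
    """Pick the candidate document with the highest noisy channel score."""
    return max(
        candidates,
        key=lambda c: noisy_channel_score(
            source, c, lm_logprob, channel_logprob, lm_weight
        ),
    )


# Toy stand-ins for trained models (purely illustrative, not real models).
def toy_lm_logprob(doc: Document) -> float:
    """Penalize documents whose adjacent sentences start with different words."""
    total = 0.0
    for i, sentence in enumerate(doc):
        total += -1.0
        if i > 0 and sentence.split()[0] != doc[i - 1].split()[0]:
            total += -2.0
    return total


def toy_channel_logprob(source_sentence: str, target_sentence: str) -> float:
    """Treat every candidate sentence as an equally good reverse translation."""
    return -1.0


source_doc = ["src sentence 1", "src sentence 2"]
candidate_docs = [
    ["He arrived.", "She left."],  # inconsistent pronoun across sentences
    ["He arrived.", "He left."],   # consistent pronoun across sentences
]
best = rerank(source_doc, candidate_docs, toy_lm_logprob, toy_channel_logprob)
```

In this toy setup the channel model cannot distinguish the two candidates, so the choice is made entirely by the document language model, which is the situation where document context is expected to help. A practical system would generate the candidate set with beam search and would likely tune additional weights, which this sketch leaves out.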

research
03/11/2020

Capturing document context inside sentence-level neural machine translation models with self-training

Neural machine translation (NMT) has arguably achieved human level parit...
research
02/19/2020

Toward Making the Most of Context in Neural Machine Translation

Document-level machine translation manages to outperform sentence level ...
research
05/29/2021

Korean-English Machine Translation with Multiple Tokenization Strategy

This work was conducted to find out how tokenization methods affect the ...
research
04/30/2020

Exploiting Sentence Order in Document Alignment

In this work, we exploit the simple idea that a document and its transla...
research
04/25/2023

Escaping the sentence-level paradigm in machine translation

It is well-known that document context is vital for resolving a range of...
research
11/04/2017

Language as a matrix product state

We propose a statistical model for natural language that begins by consi...
research
11/06/2022

Noisy Channel for Automatic Text Simplification

In this paper we present a simple re-ranking method for Automatic Senten...
