Putting Machine Translation in Context with the Noisy Channel Model

by Lei Yu, et al.

We show that Bayes' rule provides a compelling mechanism for controlling unconditional document language models, using the long-standing challenge of effectively leveraging document context in machine translation as a test case. In our formulation, we estimate the probability of a candidate translation as the product of the unconditional probability of the candidate output document and the “reverse translation probability” of translating the candidate output back into the input source language document (the so-called “noisy channel” decomposition). A particular advantage of our model is that it requires only parallel sentences to train, rather than parallel documents, which are not always available. Using a new beam search reranking approximation to solve the decoding problem, we find that document language models outperform language models that assume independence between sentences, and that using either a document or sentence language model outperforms comparable models that directly estimate the translation probability. We obtain the best published results on the NIST Chinese–English translation task, a standard task for evaluating document translation. Our model also outperforms the benchmark Transformer model by approximately 2.5 BLEU on the WMT19 Chinese–English translation task.
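The noisy channel scoring described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy probability tables, and the assumption that the reverse translation probability factorizes sentence by sentence (plausible given training on parallel sentences only) are all illustrative. Candidate generation by beam search is omitted, so the candidate list is simply given.

```python
import math
from typing import Callable, List, Sequence

Doc = Sequence[str]  # a document as a sequence of sentences


def noisy_channel_score(
    source: Doc,
    candidate: Doc,
    lm_logprob: Callable[[Doc], float],
    channel_logprob: Callable[[str, str], float],
) -> float:
    """Return log p(y) + sum_i log p(x_i | y_i) for one candidate document.

    lm_logprob is a hypothetical document language model, log p(y).
    channel_logprob is a hypothetical sentence-level reverse translation
    model, log p(x_i | y_i); the per-sentence factorization is an assumption.
    """
    if len(source) != len(candidate):
        raise ValueError("source and candidate need the same number of sentences")
    channel = sum(channel_logprob(x, y) for x, y in zip(source, candidate))
    return lm_logprob(candidate) + channel


def rerank(source, candidates, lm_logprob, channel_logprob) -> List[str]:
    """Pick the candidate document with the highest noisy channel score."""
    best = max(
        candidates,
        key=lambda c: noisy_channel_score(source, c, lm_logprob, channel_logprob),
    )
    return list(best)


# Toy numbers, invented purely for illustration.
toy_lm = {("a", "b"): math.log(0.2), ("a", "c"): math.log(0.1)}
toy_channel = {
    ("x1", "a"): math.log(0.5),
    ("x2", "b"): math.log(0.3),
    ("x2", "c"): math.log(0.9),
}

source_doc = ["x1", "x2"]
candidate_docs = [["a", "b"], ["a", "c"]]

best_doc = rerank(
    source_doc,
    candidate_docs,
    lambda doc: toy_lm[tuple(doc)],
    lambda x, y: toy_channel[(x, y)],
)
```

In this toy setting the language model alone prefers the first candidate (0.2 versus 0.1), but the reverse translation term favors the second strongly enough that the product selects it (0.1 × 0.5 × 0.9 = 0.045 versus 0.2 × 0.5 × 0.3 = 0.03), which shows how the two factors trade off.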


