Exploiting Sentence Order in Document Alignment

04/30/2020
by Brian Thompson, et al.

In this work, we exploit the simple idea that a document and its translation should contain approximately the same information, in approximately the same order. We propose methods for both document pair candidate generation and candidate re-scoring which incorporate high-level order information. Our method results in a 61% relative reduction in error versus the best previously published result on the WMT16 document alignment shared task. We also apply our method to web-scraped Sinhala-English documents from ParaCrawl and find that it improves MT performance by 1.2 BLEU over the current ParaCrawl document alignment method.
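
To make the core intuition concrete, the following is a minimal sketch of an order-sensitive similarity between two documents, assuming each sentence has already been mapped to a vector in a shared multilingual space. It is not the scoring method of the paper: the function names (`cosine`, `order_aware_score`), the monotonic one-to-one matching, and the toy vectors are all illustrative assumptions. It only shows why a same-order translation should score higher than one with shuffled content.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def order_aware_score(src, tgt):
    """Hypothetical order-aware score for two documents.

    src and tgt are lists of sentence vectors. The score is the best total
    similarity of a monotonic (order-preserving) one-to-one sentence matching,
    divided by the length of the longer document.
    """
    n, m = len(src), len(tgt)
    if n == 0 or m == 0:
        return 0.0
    # best[i][j]: best total similarity using the first i source sentences
    # and the first j target sentences.
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = best[i - 1][j - 1] + cosine(src[i - 1], tgt[j - 1])
            # Either match sentence i with sentence j, or skip one of them.
            best[i][j] = max(match, best[i - 1][j], best[i][j - 1])
    return best[n][m] / max(n, m)


# Toy stand-ins for sentence embeddings (assumed, not real model output).
doc = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
same_order = [list(v) for v in doc]
reversed_order = list(reversed(same_order))

print(order_aware_score(doc, same_order))
print(order_aware_score(doc, reversed_order))
```

In this toy case the two candidate documents contain the same sentences, so an order-blind comparison could not tell them apart, while the monotonic matching rewards only the candidate that preserves sentence order.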
