Document Sub-structure in Neural Machine Translation

12/13/2019
by Radina Dobreva, et al.

Current approaches to machine translation (MT) either translate sentences in isolation, disregarding the context they appear in, or model context at the level of the full document, without any notion of the internal structure the document may have. In this work we consider the fact that documents are rarely homogeneous blocks of text, but rather consist of parts covering different topics. Some documents, e.g. biographies and encyclopedia entries, have highly predictable, regular structures in which sections are characterised by different topics. We draw inspiration from Louis and Webber (2014), who use this information to improve MT, and transfer their proposal into the framework of neural MT. We compare two different methods of including information about the topic of the section within which each sentence is found: one using side constraints and the other using a cache-based model. We create and release the data on which we run our experiments: parallel corpora for three language pairs (Chinese-English, French-English, Bulgarian-English) built from Wikipedia biographies, preserving the boundaries of sections within the articles.
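In the general side-constraint technique, each source sentence is marked with an extra token that the NMT model reads as ordinary input, so the model can learn to condition its output on that token. The Python sketch below illustrates how section-topic information could be attached this way to a corpus that keeps section boundaries. It is an illustration under stated assumptions, not the paper's implementation: the token format, the `topic_token` and `add_side_constraints` helpers, and the example sentences are all hypothetical, and placing the token at the start of the sentence is one design choice among several.

```python
from typing import List, Tuple

# Hypothetical input format: an article as a list of (section_title, sentences)
# pairs, mirroring a corpus that preserves section boundaries.
Article = List[Tuple[str, List[str]]]


def topic_token(section_title: str) -> str:
    """Map a section title to a hypothetical side-constraint token,
    e.g. 'Early life' -> '<early_life>'."""
    return "<" + "_".join(section_title.lower().split()) + ">"


def add_side_constraints(article: Article) -> List[str]:
    """Prepend every source sentence with the token of the section it belongs to."""
    tagged = []
    for title, sentences in article:
        tok = topic_token(title)
        for sent in sentences:
            tagged.append(f"{tok} {sent}")
    return tagged


# Hypothetical French biography split into two sections.
article = [
    ("Early life", ["Il est né à Lyon.", "Il a étudié le droit."]),
    ("Career", ["Il est devenu avocat en 1990."]),
]
for line in add_side_constraints(article):
    print(line)
```

In practice the topic tokens would also need to be kept intact by the subword segmenter (for example, registered as special symbols) so that each one reaches the model as a single unit.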
