A Parallel Corpus of Translationese

09/11/2015
by Ella Rabinovich, et al.

We describe a set of bilingual English-French and English-German parallel corpora in which the direction of translation is accurately and reliably annotated. The corpora are diverse, consisting of parliamentary proceedings, literary works, transcriptions of TED talks, and political commentary. They will be instrumental for research on translationese and its applications to (human and machine) translation; specifically, they can be used for the task of translationese identification, a research direction that has attracted growing interest in recent years. To validate the quality and reliability of the corpora, we replicated previous results on supervised and unsupervised identification of translationese, and further extended the experiments to additional datasets and languages.

06/06/2022

MorisienMT: A Dataset for Mauritian Creole Machine Translation

In this paper, we describe MorisienMT, a dataset for benchmarking machin...
10/08/2017

The IIT Bombay English-Hindi Parallel Corpus

We present the IIT Bombay English-Hindi Parallel Corpus. The corpus is a...
02/04/2019

An Effective Approach to Unsupervised Machine Translation

While machine translation has traditionally relied on large amounts of p...
05/21/2020

MultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel Corpora

Multi-word expressions (MWEs) are a hot topic in research in natural lan...
05/10/2022

ParaCotta: Synthetic Multilingual Paraphrase Corpora from the Most Diverse Translation Sample Pair

We release our synthetic parallel paraphrase corpus across 17 languages:...
04/27/2019

Towards Recognizing Phrase Translation Processes: Experiments on English-French

When translating phrases (words or group of words), human translators, c...
08/05/2020

Designing the Business Conversation Corpus

While the progress of machine translation of written text has come far i...