The UN Parallel Corpus Annotated for Translation Direction

05/20/2018
by Elad Tolochinsky, et al.

This work distinguishes between translated and original text in the UN protocol corpus. By modeling the problem as a classification problem, we achieve up to 95% classification accuracy. We first derive a parallel corpus for different language pairs annotated for translation direction, and then classify the data using various feature extraction methods. We compare the different methods, as well as how well translated and original texts can be distinguished in the different languages. The annotated corpus is publicly available.
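To make the classification framing concrete, here is a minimal sketch of a translated-versus-original classifier. It is not the authors' method: the function-word list, the nearest-centroid rule, the toy sentences, and the names `extract_features`, `train`, and `classify` are all assumptions chosen for illustration, since the abstract does not specify the features or classifiers used.

```python
from collections import Counter
import math

# Hypothetical function-word list; the paper's actual features are not stated in the abstract.
FUNCTION_WORDS = ["the", "of", "and", "to", "in", "that", "which", "is"]


def extract_features(text):
    """Return the relative frequency of each function word in the text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def train(labeled_texts):
    """Map each label to the centroid of its feature vectors."""
    by_label = {}
    for text, label in labeled_texts:
        by_label.setdefault(label, []).append(extract_features(text))
    return {label: centroid(vecs) for label, vecs in by_label.items()}


def classify(model, text):
    """Assign the label whose centroid is nearest in Euclidean distance."""
    feats = extract_features(text)
    return min(model, key=lambda label: math.dist(feats, model[label]))


# Toy, made-up training sentences (not taken from the UN corpus).
train_data = [
    ("the committee adopted the report", "original"),
    ("the council welcomed the statement", "original"),
    ("it is that which is of concern to members", "translated"),
    ("that which is of importance is to be noted", "translated"),
]
model = train(train_data)
print(classify(model, "the assembly approved the budget"))
```

A nearest-centroid rule is used here only because it needs no external libraries; any standard classifier could take its place, and a real experiment would need far more data than these four sentences.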
