MLQE-PE: A Multilingual Quality Estimation and Post-Editing Dataset

10/09/2020
by Marina Fomicheva, et al.

We present MLQE-PE, a new dataset for Machine Translation (MT) Quality Estimation (QE) and Automatic Post-Editing (APE). The dataset contains seven language pairs, with human labels for 9,000 translations per language pair in the following formats: sentence-level direct assessments and post-editing effort, and word-level good/bad labels. It also contains the post-edited sentences, the titles of the articles from which the sentences were extracted, and the neural MT models used to translate the text.
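
To make the annotation formats listed above concrete, the following is a minimal sketch of how one annotated translation might be held in memory. The class name `QERecord`, all field names, the `"OK"`/`"BAD"` tag strings, and the example values are assumptions made for illustration; they are not the dataset's actual file schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QERecord:
    """Hypothetical container for one annotated translation.

    Field names are illustrative assumptions, not the released schema.
    """
    source: str               # original sentence
    mt: str                   # neural MT output for the source
    post_edit: str            # human post-edited version of the MT output
    direct_assessment: float  # sentence-level direct assessment score
    pe_effort: float          # sentence-level post-editing effort score
    word_tags: List[str]      # one "OK"/"BAD" tag per MT token (assumed tag names)
    article_title: str        # title of the article the sentence came from


def bad_token_ratio(record: QERecord) -> float:
    """Return the fraction of MT tokens tagged BAD (0.0 if there are no tags)."""
    if not record.word_tags:
        return 0.0
    bad = sum(tag == "BAD" for tag in record.word_tags)
    return bad / len(record.word_tags)


# Invented example values, for illustration only.
example = QERecord(
    source="The cat sat down.",
    mt="Die Katze sass unten.",
    post_edit="Die Katze setzte sich.",
    direct_assessment=62.0,
    pe_effort=0.5,
    word_tags=["OK", "OK", "OK", "BAD"],
    article_title="Example article",
)

print(bad_token_ratio(example))
```

Keeping the sentence-level scores and the word-level tags in one record makes it easy to check that the two views of quality agree, for instance by comparing the share of BAD tokens with the post-editing effort score.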

research
04/25/2021

Automatic Post-Editing for Translating Chinese Novels to Vietnamese

Automatic post-editing (APE) is an important remedy for reducing errors ...
research
09/13/2022

Rethink about the Word-level Quality Estimation for Machine Translation from Human Judgement

Word-level Quality Estimation (QE) of Machine Translation (MT) aims to f...
research
05/02/2020

Practical Perspectives on Quality Estimation for Machine Translation

Sentence level quality estimation (QE) for machine translation (MT) atte...
research
09/17/2021

The JHU-Microsoft Submission for WMT21 Quality Estimation Shared Task

This paper presents the JHU-Microsoft joint submission for WMT 2021 qual...
research
06/14/2019

A Simple and Effective Approach to Automatic Post-Editing with Transfer Learning

Automatic post-editing (APE) seeks to automatically refine the output of...
research
06/20/2023

Efficient Machine Translation Corpus Generation

This paper proposes an efficient and semi-automated method for human-in-...
research
11/24/2021

A Self-Supervised Automatic Post-Editing Data Generation Tool

Data building for automatic post-editing (APE) requires extensive and ex...
