Verdi: Quality Estimation and Error Detection for Bilingual

05/31/2021
by   Mingjun Zhao, et al.

Translation quality estimation is critical to reducing post-editing effort in machine translation and to cleaning cross-lingual corpora. As a research problem, quality estimation (QE) aims to directly estimate the quality of a translation given a pair of source and target sentences, and to highlight the words that need correction, without reference to gold-standard translations. In this paper, we propose Verdi, a novel framework for word-level and sentence-level post-editing effort estimation for bilingual corpora. Verdi adopts two word predictors, a transformer-based neural machine translation (NMT) model and a pre-trained cross-lingual language model (XLM), so that diverse features can be extracted from a pair of sentences for subsequent quality estimation. We exploit the symmetric nature of bilingual corpora and apply model-level dual learning in the NMT predictor, which handles a primal task and a dual task simultaneously with weight sharing, leading to stronger context prediction ability than single-direction NMT models. Taking advantage of the dual learning scheme, we further design a novel feature that directly encodes the translated target information without relying on the source context. Extensive experiments on the WMT20 QE tasks demonstrate that our method beats the winner of the competition and outperforms other baseline methods by a large margin. We further use the sentence-level scores provided by Verdi to clean a parallel corpus and observe benefits in both model performance and training efficiency.
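
The corpus-cleaning use mentioned at the end of the abstract can be pictured with a minimal sketch. It is not the authors' implementation: the function names, the threshold value, and the toy scoring function are all hypothetical, and the sketch assumes that the sentence-level score behaves like a post-editing effort estimate, where lower means a better translation pair. In practice the scoring function would be a trained QE model rather than the length-ratio stand-in used here.

```python
from typing import Callable, Iterable, List, Tuple

SentencePair = Tuple[str, str]


def filter_parallel_corpus(
    pairs: Iterable[SentencePair],
    score_fn: Callable[[str, str], float],
    max_score: float = 0.3,
) -> List[SentencePair]:
    """Keep only pairs whose predicted effort score is at most max_score.

    Assumption: score_fn returns a value where lower is better,
    as with a post-editing effort estimate. The default threshold
    is arbitrary and only for illustration.
    """
    return [(src, tgt) for src, tgt in pairs if score_fn(src, tgt) <= max_score]


def toy_length_ratio_score(source: str, target: str) -> float:
    """Hypothetical stand-in for a QE model.

    Scores a pair by how mismatched the token counts are:
    0.0 for equal lengths, approaching 1.0 for very unequal lengths.
    A real QE model would look at the content of both sentences.
    """
    src_len = len(source.split())
    tgt_len = len(target.split())
    longest = max(src_len, tgt_len)
    if longest == 0:
        return 1.0  # treat empty pairs as worst quality
    return 1.0 - min(src_len, tgt_len) / longest


if __name__ == "__main__":
    corpus = [
        ("the cat sleeps", "die Katze schläft"),
        ("a very long source sentence here", "kurz"),
    ]
    cleaned = filter_parallel_corpus(corpus, toy_length_ratio_score)
    for src, tgt in cleaned:
        print(src, "|||", tgt)
```

Passing the scorer as a parameter keeps the filtering step independent of whichever QE model supplies the scores.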


