Mismatching-Aware Unsupervised Translation Quality Estimation For Low-Resource Languages

07/31/2022
by Fatemeh Azadi, et al.

Translation Quality Estimation (QE) is the task of predicting the quality of machine translation (MT) output without any reference translation. This task has gained increasing attention as an important component in practical applications of MT. In this paper, we first propose XLMRScore, a simple unsupervised QE method based on BERTScore computed using the XLM-RoBERTa (XLMR) model, and discuss the issues that arise when using this method. Next, we suggest two approaches to mitigate these issues: replacing untranslated words with the unknown token, and cross-lingual alignment of the pre-trained model so that aligned words are represented closer to each other. We evaluate the proposed method on four low-resource language pairs of the WMT21 QE shared task, as well as on a new English-Farsi test dataset introduced in this paper. Experiments show that our method achieves results comparable to the supervised baseline in two zero-shot scenarios, i.e., with less than 0.01 difference in Pearson correlation, while outperforming the unsupervised rivals in all the low-resource language pairs for above 8
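To make the idea behind XLMRScore more concrete, the following is a minimal sketch of BERTScore-style greedy matching in which the source sentence takes the place of the reference, so no human translation is needed. It assumes token embeddings have already been extracted (in practice they would come from a layer of XLM-RoBERTa; here they are toy vectors). The helper names `greedy_match_score` and `mask_untranslated` are hypothetical, and the untranslated-word check is an illustrative assumption (a token copied verbatim from the source), not necessarily how the authors detect untranslated words.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def greedy_match_score(
    src_emb: List[Vector], mt_emb: List[Vector]
) -> Tuple[float, float, float]:
    """BERTScore-style greedy matching with the SOURCE playing the reference role.

    Returns (precision, recall, F1) over token embeddings; no IDF weighting.
    """
    # Pairwise similarity matrix: rows = source tokens, columns = MT tokens.
    sim = [[cosine(s, t) for t in mt_emb] for s in src_emb]
    # Recall: each source token is matched to its most similar MT token.
    recall = sum(max(row) for row in sim) / len(src_emb)
    # Precision: each MT token is matched to its most similar source token.
    precision = sum(
        max(sim[i][j] for i in range(len(src_emb))) for j in range(len(mt_emb))
    ) / len(mt_emb)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def mask_untranslated(
    src_tokens: List[str], mt_tokens: List[str], unk: str = "<unk>"
) -> List[str]:
    """Hypothetical heuristic: replace MT tokens copied verbatim from the source
    with an unknown token, so copies cannot earn a spuriously high similarity."""
    src_set = set(src_tokens)
    return [unk if tok in src_set else tok for tok in mt_tokens]


if __name__ == "__main__":
    # Toy 2-d "embeddings" standing in for XLM-R token vectors.
    src = [[1.0, 0.0], [0.0, 1.0]]
    mt = [[1.0, 0.0], [1.0, 1.0]]
    print(greedy_match_score(src, mt))
    print(mask_untranslated(["the", "cat", "Tehran"], ["la", "Tehran"]))
```

The masking step is applied to the token sequence before embedding: an untranslated word that is simply copied from the source would otherwise match itself almost perfectly across languages and inflate the score, which is the kind of issue the abstract describes.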

Related research
06/30/2021

What Can Unsupervised Machine Translation Contribute to High-Resource Language Pairs?

Whereas existing literature on unsupervised machine translation (MT) foc...
03/15/2021

MENYO-20k: A Multi-domain English-Yorùbá Corpus for Machine Translation and Domain Adaptation

Massively multilingual machine translation (MT) has shown impressive cap...
03/31/2021

Zero-Shot Language Transfer vs Iterative Back Translation for Unsupervised Machine Translation

This work focuses on comparing different solutions for machine translati...
10/13/2022

Low-resource Neural Machine Translation with Cross-modal Alignment

How to achieve neural machine translation with limited parallel data? Ex...
08/21/2019

Improving Captioning for Low-Resource Languages by Cycle Consistency

Improving the captioning performance on low-resource languages by levera...
10/13/2020

The Tatoeba Translation Challenge – Realistic Data Sets for Low Resource and Multilingual MT

This paper describes the development of a new benchmark for machine tran...
06/09/2023

Assisting Language Learners: Automated Trans-Lingual Definition Generation via Contrastive Prompt Learning

The standard definition generation task requires to automatically produc...
