Machine Translation Evaluation using Bi-directional Entailment

11/02/2019
by Rakesh Khobragade, et al.

In this paper, we propose a new metric for Machine Translation (MT) evaluation based on bi-directional entailment. We show that a machine-generated translation can be evaluated by determining whether it is a paraphrase of a reference translation provided by a human translator. We hypothesize, and show through experiments, that paraphrasing can be detected by evaluating the entailment relationship in both the forward and backward directions. Unlike conventional metrics such as BLEU or METEOR, which rely on simple n-gram overlap, our approach uses deep learning to determine the semantic similarity between the candidate and reference translations when generating scores. For natural language inference, we use BERT's pre-trained implementation of transformer networks, fine-tuned on the MNLI corpus. We apply our evaluation metric to the WMT'14 and WMT'17 datasets to evaluate systems participating in the translation task, and find that at the system level our metric correlates better with human-annotated scores than the other traditional metrics.
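To make the idea concrete, the following is a minimal sketch of how a bi-directional entailment score might be computed and then compared with human judgments at the system level. It is not the authors' implementation. The abstract does not say how the forward and backward entailment results are combined, so averaging the two probabilities is an assumption here. The `EntailFn` interface, `bidirectional_score`, `system_score`, and `pearson` are hypothetical names, and `toy_entail` is a token-overlap placeholder that stands in for a BERT model fine-tuned on MNLI only so that the sketch runs without external dependencies.

```python
from math import sqrt
from typing import Callable, Sequence

# Hypothetical interface: returns an estimate of P(premise entails hypothesis)
# in [0, 1]. In the paper's setting this would be an MNLI-fine-tuned BERT model.
EntailFn = Callable[[str, str], float]


def bidirectional_score(candidate: str, reference: str, entail: EntailFn) -> float:
    """Score a candidate translation against a reference in both directions."""
    forward = entail(reference, candidate)   # does the reference entail the candidate?
    backward = entail(candidate, reference)  # does the candidate entail the reference?
    # Assumption: combine the two directions by averaging; the abstract
    # does not specify the combination rule.
    return (forward + backward) / 2.0


def system_score(candidates: Sequence[str], references: Sequence[str],
                 entail: EntailFn) -> float:
    """Average the segment-level scores to obtain one score per MT system."""
    if not candidates or len(candidates) != len(references):
        raise ValueError("need equally sized, non-empty candidate and reference lists")
    total = sum(bidirectional_score(c, r, entail)
                for c, r in zip(candidates, references))
    return total / len(candidates)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between metric scores and human scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally sized sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def toy_entail(premise: str, hypothesis: str) -> float:
    """Placeholder only: fraction of hypothesis tokens found in the premise.

    This is NOT an entailment model; it exists so the sketch is runnable.
    """
    premise_tokens = set(premise.lower().split())
    hypothesis_tokens = hypothesis.lower().split()
    if not hypothesis_tokens:
        return 0.0
    hits = sum(token in premise_tokens for token in hypothesis_tokens)
    return hits / len(hypothesis_tokens)
```

In practice `toy_entail` would be replaced by a function that wraps a real natural language inference model and returns its entailment probability. The per-system scores from `system_score` could then be passed to `pearson` together with the human-annotated system scores.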
