Fine-grained evaluation of Quality Estimation for Machine translation based on a linguistically-motivated Test Suite

10/16/2019
by Eleftherios Avramidis, et al.

We present an alternative method for evaluating Quality Estimation systems, based on a linguistically-motivated test suite. We create a test set covering 14 linguistic error categories and, for each category, gather a set of samples containing both correct and erroneous translations. We then measure the performance of 5 Quality Estimation systems by checking their ability to distinguish the correct from the erroneous translations. The detailed per-category results are considerably more informative about each system's abilities than a single aggregate score. The fact that different Quality Estimation systems perform differently on different phenomena confirms the usefulness of the test suite.
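The contrastive setup described above can be sketched in a few lines. The following is a minimal illustration (all function names, data, and the assumption that a QE system is a callable returning a higher score for better translations are invented for this example, not taken from the paper): for each phenomenon, a system is judged by how often it assigns the correct translation a higher score than the erroneous one.

```python
# Minimal sketch of contrastive QE evaluation (illustrative only; the
# paper's actual test suite, categories, and scoring interface differ).

def evaluate_qe_system(score, test_suite):
    """Compute per-phenomenon accuracy of a QE scoring function.

    score: callable mapping a translation string to a quality score
           (higher = better quality).
    test_suite: dict mapping phenomenon name -> list of
                (correct_translation, erroneous_translation) pairs.
    Returns: dict mapping phenomenon -> fraction of pairs where the
             correct translation received the higher score.
    """
    results = {}
    for phenomenon, pairs in test_suite.items():
        ranked_correctly = sum(
            1 for good, bad in pairs if score(good) > score(bad)
        )
        results[phenomenon] = ranked_correctly / len(pairs)
    return results

# Toy usage with a dummy scorer that simply prefers longer output:
toy_suite = {
    "negation": [("It is not red.", "It is red.")],
    "agreement": [("They are here.", "They is here.")],
}
toy_scores = evaluate_qe_system(lambda t: len(t), toy_suite)
```

Reporting one accuracy per phenomenon, rather than a single pooled number, is what makes the comparison between systems fine-grained.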


Related research:

A Test Suite for the Evaluation of Portuguese-English Machine Translation (04/01/2022)
This paper describes the development of the first test suite for the lan...

Fine-grained linguistic evaluation for state-of-the-art Machine Translation (10/13/2020)
This paper describes a test suite submission providing detailed statisti...

Fine-grained evaluation of German-English Machine Translation based on a Test Suite (10/16/2019)
We present an analysis of 16 state-of-the-art MT systems on German-Engli...

Linguistic evaluation of German-English Machine Translation using a Test Suite (10/16/2019)
We present the results of the application of a grammatical test suite fo...

Evaluating Pronominal Anaphora in Machine Translation: An Evaluation Measure and a Test Suite (08/31/2019)
The ongoing neural revolution in machine translation has made it easier ...

Filling Gender & Number Gaps in Neural Machine Translation with Black-box Context Injection (03/08/2019)
When translating from a language that does not morphologically mark info...

Understanding the Impact of UGC Specificities on Translation Quality (10/24/2021)
This work takes a critical look at the evaluation of user-generated cont...