Machine Translation Evaluation: A Survey

by Aaron Li-Feng Han, et al.

This paper surveys Machine Translation (MT) evaluation, covering both manual and automatic methods. Traditional human evaluation criteria mainly include intelligibility, fidelity, fluency, adequacy, comprehension, and informativeness. More advanced human assessments include task-oriented measures, post-editing, segment ranking, and extended criteria. We classify automatic evaluation methods into two categories: lexical similarity and the application of linguistic features. The lexical similarity methods cover edit distance, precision, recall, F-measure, and word order. The linguistic features are divided into syntactic and semantic features. Syntactic features include part-of-speech tags, phrase types, and sentence structures; semantic features include named entities, synonyms, textual entailment, paraphrase, semantic roles, and language models. Deep learning models for evaluation have been proposed only very recently. We then introduce methods for evaluating MT evaluation itself, including different correlation scores, and the recent quality estimation (QE) tasks for MT. This paper differs from existing works (GALEprogram2009, EuroMatrixProject2007) in several respects: it introduces recent developments in MT evaluation measures, classifies measures from manual to automatic, introduces the recent QE tasks for MT, and organizes the content concisely. We hope this work will help MT researchers pick the metrics best suited to their specific MT model development, and help MT evaluation researchers get a general picture of how MT evaluation research has developed. We also hope it can shed some light on evaluation tasks in NLP fields other than translation.
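To make the lexical similarity category concrete, the following is a minimal sketch of three of the measures named above (precision, recall, F-measure) plus a word-level edit distance. It is not taken from the survey: the function names `unigram_prf` and `word_edit_distance`, the whitespace tokenization, the clipped unigram matching, and the balanced F-measure are all illustrative assumptions.

```python
from collections import Counter


def unigram_prf(hypothesis, reference):
    """Unigram precision, recall and balanced F-measure (hypothetical helper).

    Assumes whitespace tokenization and clipped counts: a hypothesis word
    is matched at most as many times as it occurs in the reference.
    """
    hyp = hypothesis.split()
    ref = reference.split()
    # Multiset intersection keeps the minimum count of each shared word.
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    precision = overlap / len(hyp) if hyp else 0.0
    recall = overlap / len(ref) if ref else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


def word_edit_distance(hypothesis, reference):
    """Word-level Levenshtein distance (hypothetical helper).

    Counts the minimum number of word insertions, deletions and
    substitutions needed to turn the hypothesis into the reference.
    """
    hyp = hypothesis.split()
    ref = reference.split()
    # prev[j] is the distance between the hypothesis prefix processed
    # so far and the first j reference words.
    prev = list(range(len(ref) + 1))
    for i, hyp_word in enumerate(hyp, 1):
        curr = [i]
        for j, ref_word in enumerate(ref, 1):
            cost = 0 if hyp_word == ref_word else 1
            curr.append(min(prev[j] + 1,          # delete hyp_word
                            curr[j - 1] + 1,      # insert ref_word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    hyp = "the cat sat on mat"
    ref = "the cat sat on the mat"
    print(unigram_prf(hyp, ref))
    print(word_edit_distance(hyp, ref))
```

For the hypothesis "the cat sat on mat" against the reference "the cat sat on the mat", every hypothesis word is matched, so precision is 1, recall is 5/6, and the edit distance is 1 (one missing word). Published metrics typically add refinements such as n-gram matching, length penalties, or multiple references, which this sketch leaves out.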



An Overview on Machine Translation Evaluation

Since the 1950s, machine translation (MT) has become one of the importan...

HUME: Human UCCA-Based Evaluation of Machine Translation

Human evaluation of machine translation normally uses sentence-level mea...

LEPOR: An Augmented Machine Translation Evaluation Metric

Machine translation (MT) was developed as one of the hottest research to...

HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Professional Post-Editing Towards More Effective MT Evaluation

Traditional automatic evaluation metrics for machine translation have be...

Word2Vec vs DBnary: Augmenting METEOR using Vector Representations or Lexical Resources?

This paper presents an approach combining lexico-semantic resources and ...

Exploring Lexical, Syntactic, and Semantic Features for Chinese Textual Entailment in NTCIR RITE Evaluation Tasks

We computed linguistic information at the lexical, syntactic, and semant...

Evaluating MT Systems: A Theoretical Framework

This paper outlines a theoretical framework using which different automa...
