DeepAI AI Chat
Log In Sign Up

Reference-less Quality Estimation of Text Simplification Systems

by   Louis Martin, et al.

The evaluation of text simplification (TS) systems remains an open challenge. As the task has common points with machine translation (MT), TS is often evaluated using MT metrics such as BLEU. However, such metrics require high quality reference data, which is rarely available for TS. TS has the advantage over MT of being a monolingual task, which allows for direct comparisons to be made between the simplified text and its original version. In this paper, we compare multiple approaches to reference-less quality estimation of sentence-level text simplification systems, based on the dataset used for the QATS 2016 shared task. We distinguish three different dimensions: gram-maticality, meaning preservation and simplicity. We show that n-gram-based MT metrics such as BLEU and METEOR correlate the most with human judgment of grammaticality and meaning preservation, whereas simplicity is best evaluated by basic length-based metrics.


HilMeMe: A Human-in-the-Loop Machine Translation Evaluation Metric Looking into Multi-Word Expressions

With the fast development of Machine Translation (MT) systems, especiall...

Rethinking Automatic Evaluation in Sentence Simplification

Automatic evaluation remains an open research question in Natural Langua...

Evaluating Subtitle Segmentation for End-to-end Generation Systems

Subtitles appear on screen as short pieces of text, segmented based on f...

CLaC @ QATS: Quality Assessment for Text Simplification

This paper describes our approach to the 2016 QATS quality assessment sh...

BLEU is Not Suitable for the Evaluation of Text Simplification

BLEU is widely considered to be an informative metric for text-to-text g...

Pushing the Right Buttons: Adversarial Evaluation of Quality Estimation

Current Machine Translation (MT) systems achieve very good results on a ...