Mark-Evaluate: Assessing Language Generation using Population Estimation Methods

10/09/2020
by Gonçalo Mordido, et al.

We propose a family of metrics to assess language generation, derived from population estimation methods widely used in ecology. More specifically, we use mark-recapture and maximum-likelihood methods that have been applied over the past several decades to estimate the size of closed populations in the wild. We propose three novel metrics: ME_Petersen and ME_CAPTURE, which return a single-valued assessment, and ME_Schnabel, which returns a two-valued metric that assesses the evaluation set in terms of quality and diversity separately. In synthetic experiments, our family of methods is sensitive to drops in quality and diversity. Moreover, our methods show a higher correlation with human evaluation than existing metrics on several challenging tasks, namely unconditional language generation, machine translation, and text summarization.
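For readers unfamiliar with the ecological estimators the abstract builds on, here is a minimal sketch of the textbook Lincoln-Petersen (two-sample) and Schnabel (multi-sample) estimators of a closed population size. It only illustrates the classic formulas, N = M·C/R and N = Σ(C_t·M_t)/ΣR_t. It does not reproduce how the paper adapts them to text, and the function names `lincoln_petersen` and `schnabel` are hypothetical.

```python
from typing import Sequence


def lincoln_petersen(marked: int, captured: int, recaptured: int) -> float:
    """Textbook Lincoln-Petersen estimate N = M * C / R of a closed population.

    marked:     individuals marked and released in the first sample (M)
    captured:   individuals caught in the second sample (C)
    recaptured: marked individuals found in the second sample (R)
    """
    if recaptured == 0:
        raise ValueError("no recaptures: the estimate is undefined")
    return marked * captured / recaptured


def schnabel(captures: Sequence[int], recaptures: Sequence[int]) -> float:
    """Textbook Schnabel estimate N = sum(C_t * M_t) / sum(R_t).

    captures[t]:   individuals caught on occasion t (C_t)
    recaptures[t]: already-marked individuals among them (R_t)
    M_t is the number of marked individuals at large before occasion t.
    """
    if len(captures) != len(recaptures):
        raise ValueError("captures and recaptures must have equal length")
    marked_at_large = 0  # M_t, starts at zero before the first occasion
    numerator = 0
    for c, r in zip(captures, recaptures):
        if r > c or r > marked_at_large:
            raise ValueError("recaptures cannot exceed captures or marked individuals")
        numerator += c * marked_at_large
        # Every newly caught (unmarked) individual is marked and released.
        marked_at_large += c - r
    total_recaptures = sum(recaptures)
    if total_recaptures == 0:
        raise ValueError("no recaptures: the estimate is undefined")
    return numerator / total_recaptures


print(lincoln_petersen(50, 40, 10))       # 200.0
print(schnabel([10, 10], [0, 2]))         # 50.0
```

With only two occasions, the Schnabel estimate reduces to the Lincoln-Petersen one: `schnabel([10, 10], [0, 2])` and `lincoln_petersen(10, 10, 2)` both give 50.0.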
