Evaluating NLG Evaluation Metrics: A Measurement Theory Perspective

05/24/2023
by Ziang Xiao, et al.

We address a fundamental challenge in Natural Language Generation (NLG) model evaluation: the design and validation of evaluation metrics. Recognizing the limitations of existing metrics and issues with human judgment, we propose using measurement theory, the foundation of test design, as a framework for conceptualizing and evaluating the validity and reliability of NLG evaluation metrics. This approach offers a systematic method for defining "good" metrics, developing robust metrics, and assessing metric performance. In this paper, we introduce the core concepts of measurement theory in the context of NLG evaluation, along with key methods for evaluating the performance of NLG metrics. Through this framework, we aim to promote the design, evaluation, and interpretation of valid and reliable metrics, ultimately contributing to the advancement of robust and effective NLG models in real-world settings.
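As a rough illustration of the kinds of checks measurement theory motivates, the sketch below computes two common analyses for a hypothetical NLG metric: convergent validity (rank correlation with human judgments of the same construct) and test-retest reliability (consistency across repeated scoring runs). The simulated data, variable names, and choice of Spearman correlation are assumptions for illustration only, not the methods proposed in the paper.

```python
# Illustrative sketch (not the paper's method): two measurement-theory-style
# checks for a hypothetical NLG evaluation metric, using simulated data.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Hypothetical data: 100 generated texts scored by humans and by the metric.
human_fluency = rng.normal(3.0, 1.0, size=100)              # simulated 1-5 human ratings
metric_run_1 = human_fluency + rng.normal(0, 0.8, size=100)  # metric scores, first run
metric_run_2 = metric_run_1 + rng.normal(0, 0.3, size=100)   # same metric, repeated run

# Convergent validity: does the metric rank outputs like human judges do?
validity_rho, _ = spearmanr(metric_run_1, human_fluency)

# Test-retest reliability: does the metric score the same outputs consistently
# across repeated runs (relevant for stochastic, e.g. LLM-based, metrics)?
reliability_rho, _ = spearmanr(metric_run_1, metric_run_2)

print(f"Convergent validity (Spearman rho vs. humans): {validity_rho:.2f}")
print(f"Test-retest reliability (Spearman rho across runs): {reliability_rho:.2f}")
```

In practice, both quantities would be estimated on real metric outputs and human annotations rather than simulated scores, and reliability can also be assessed with other estimators (e.g., split-half correlation or inter-annotator agreement for human baselines).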

