Human-like Summarization Evaluation with ChatGPT

04/05/2023
by Mingqi Gao, et al.

Evaluating text summarization is a challenging problem, and existing evaluation metrics are far from satisfactory. In this study, we explored ChatGPT's ability to perform human-like summarization evaluation using four human evaluation methods on five datasets. We found that ChatGPT was able to complete annotations relatively smoothly using Likert scale scoring, pairwise comparison, Pyramid, and binary factuality evaluation. Additionally, it outperformed commonly used automatic evaluation metrics on some datasets. Furthermore, we discussed the impact of different prompts, compared its performance with that of human evaluation, and analyzed the generated explanations and invalid responses.
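To make the evaluation protocol concrete, below is a minimal sketch of how a Likert-scale scoring request to ChatGPT might look, assuming the OpenAI chat completion API; the model name, prompt wording, and the "coherence" dimension are illustrative assumptions rather than the paper's actual prompts.

```python
# Minimal sketch: Likert-scale summary scoring with a chat model.
# The prompt wording, model name, and "coherence" dimension are
# illustrative assumptions, not the paper's exact setup.
import re
import openai  # written against the 0.27.x ChatCompletion interface

def likert_score(source, summary, dimension="coherence"):
    """Ask the model to rate `summary` along `dimension` on a 1-5 scale."""
    prompt = (
        f"Rate the {dimension} of the following summary of the source document "
        f"on a scale from 1 (worst) to 5 (best). Reply with a single integer.\n\n"
        f"Source:\n{source}\n\nSummary:\n{summary}"
    )
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep the scoring as deterministic as possible
    )
    reply = response["choices"][0]["message"]["content"]
    match = re.search(r"[1-5]", reply)
    return int(match.group()) if match else None  # None flags an invalid response
```

Pairwise comparison, Pyramid, and binary factuality annotation would follow the same pattern, with different prompt templates and answer parsing.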

Related research:

03/18/2023 - Revisiting Automatic Question Summarization Evaluation in the Biomedical Domain
Automatic evaluation metrics have been facilitating the rapid developmen...

10/14/2020 - Re-evaluating Evaluation in Text Summarization
Automated evaluation metrics as a stand-in for manual evaluation are an ...

03/11/2022 - Active Evaluation: Efficient NLG Evaluation with Few Pairwise Comparisons
Recent studies have shown the advantages of evaluating NLG systems using...

05/24/2023 - Analyzing Influential Factors in Human Preference Judgments via GPT-4
Pairwise human judgments are pivotal in guiding large language models (L...

03/31/2021 - A Statistical Analysis of Summarization Evaluation Metrics using Resampling Methods
The quality of a summarization evaluation metric is quantified by calcul...

05/20/2020 - Examining the State-of-the-Art in News Timeline Summarization
Previous work on automatic news timeline summarization (TLS) leaves an u...

05/30/2019 - Assessing The Factual Accuracy of Generated Text
We propose a model-based metric to estimate the factual accuracy of gene...
