Crowdsourcing Lightweight Pyramids for Manual Summary Evaluation

04/11/2019
by Ori Shapira, et al.

Manual evaluation is considered an essential part of summary evaluation methodology. Traditionally, the Pyramid protocol, which exhaustively compares system summaries to reference summaries, has been regarded as highly reliable and as providing objective scores. Yet because the Pyramid method is costly and requires expertise, researchers have resorted to cheaper and less thorough manual evaluation methods, such as Responsiveness and pairwise comparison, which are attainable via crowdsourcing. We revisit the Pyramid approach and propose a lightweight, sampling-based version that is crowdsourceable. We analyze the performance of our method against the original expert-based Pyramid evaluations and show that it correlates with them more strongly than the common Responsiveness method does. We release our crowdsourced Summary Content Units (SCUs), along with all crowdsourcing scripts, for future evaluations.
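
To make the idea of a sampling-based Pyramid score concrete, here is a minimal Python sketch. It is not the authors' released code: the function names, the sample size, the substring-based `toy_judge`, and the choice to score a summary as the fraction of sampled Summary-Content-Units (SCUs) it covers are all assumptions made for illustration. In a real crowdsourced setup, the presence judgment would come from human annotators rather than from a function.

```python
import random
from typing import Callable, List, Sequence


def sample_scus(scus: Sequence[str], k: int, seed: int = 0) -> List[str]:
    """Draw up to k SCUs at random, without replacement (hypothetical helper)."""
    rng = random.Random(seed)
    return rng.sample(list(scus), min(k, len(scus)))


def lightweight_score(
    summary: str,
    sampled_scus: Sequence[str],
    is_present: Callable[[str, str], bool],
) -> float:
    """Fraction of the sampled SCUs judged present in the summary.

    `is_present(summary, scu)` stands in for a human judgment.
    """
    if not sampled_scus:
        return 0.0
    hits = sum(1 for scu in sampled_scus if is_present(summary, scu))
    return hits / len(sampled_scus)


def toy_judge(summary: str, scu: str) -> bool:
    """Toy stand-in for an annotator: case-insensitive substring match."""
    return scu.lower() in summary.lower()


# Invented example data, for illustration only.
reference_scus = [
    "storm hit the coast",
    "thousands lost power",
    "schools were closed",
]
system_summary = "The storm hit the coast. Thousands lost power."

sampled = sample_scus(reference_scus, k=2, seed=0)
sample_score = lightweight_score(system_summary, sampled, toy_judge)
full_score = lightweight_score(system_summary, reference_scus, toy_judge)
```

Here `full_score` uses every SCU, while `sample_score` uses only the random subset; sampling trades some precision in the score for a smaller annotation workload.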

research
09/23/2021

Finding a Balanced Degree of Automation for Summary Evaluation

Human evaluation for summarization tasks is reliable but brings in issue...
research
09/19/2021

Investigating Crowdsourcing Protocols for Evaluating the Factual Consistency of Summaries

Current pre-trained models applied to summarization are prone to factual...
research
10/07/2021

GeSERA: General-domain Summary Evaluation by Relevance Analysis

We present GeSERA, an open-source improved version of SERA for evaluatin...
research
10/01/2020

Towards Question-Answering as an Automatic Metric for Evaluating the Content Quality of a Summary

Recently, there has been growing interest in using question-answering (Q...
research
09/14/2023

Less is More for Long Document Summary Evaluation by LLMs

Large Language Models (LLMs) have shown promising performance in summary...
research
01/27/2021

How to Evaluate a Summarizer: Study Design and Statistical Analysis for Manual Linguistic Quality Evaluation

Manual evaluation is essential to judge progress on automatic text summa...
research
06/13/2022

Automated Evaluation of Standardized Dementia Screening Tests

For dementia screening and monitoring, standardized tests play a key rol...
