DocAsRef: A Pilot Empirical Study on Repurposing Reference-Based Summary Quality Metrics Reference-Freely

12/20/2022
by Forrest Sheng Bao, et al.

Summary quality assessment metrics fall into two categories: reference-based and reference-free. Reference-based metrics are theoretically more accurate but are limited by the availability and quality of human-written references, both of which are difficult to ensure. This has inspired the development, over the past few years, of reference-free metrics, which are independent of human-written references. However, existing reference-free metrics cannot be both zero-shot and accurate. In this paper, we propose a zero-shot yet accurate reference-free approach in a sneaky way: feeding the documents from which the summaries are generated into reference-based metrics as references. Experimental results show that this zero-shot approach yields the best-performing reference-free metrics on nearly all aspects on several recently released datasets, sometimes even beating reference-free metrics specifically trained for this task. We further investigate which reference-based metrics can benefit from such repurposing and whether our additional tweaks help.
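The core idea can be illustrated with a minimal sketch. Any reference-based metric takes a candidate summary and a reference; the repurposing simply passes the source document in the reference slot. The unigram-overlap F1 below is a toy stand-in chosen for this illustration and is not claimed to be one of the metrics evaluated in the paper; the names `unigram_f1` and `doc_as_ref` are hypothetical.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Toy reference-based metric (assumption): unigram-overlap F1."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each token counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def doc_as_ref(metric, document: str, summary: str) -> float:
    """Score a summary reference-freely by using the document as the reference."""
    return metric(summary, document)


document = "the cat sat on the mat while the dog slept by the door"
related_summary = "the cat sat on the mat"
unrelated_summary = "stocks fell sharply on monday"

score_related = doc_as_ref(unigram_f1, document, related_summary)
score_unrelated = doc_as_ref(unigram_f1, document, unrelated_summary)
print(score_related > score_unrelated)
```

Because a document is much longer than a summary, recall against the document is naturally low in a sketch like this, so the absolute scores matter less than how they rank competing summaries.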

research
10/05/2020

Unsupervised Reference-Free Summary Quality Evaluation via Contrastive Learning

Evaluation of a document summarization system has been a critical factor...
research
09/26/2022

News Summarization and Evaluation in the Era of GPT-3

The recent success of zero- and few-shot prompting with models like GPT-...
research
08/09/2020

A Modular Approach for Synchronized Wireless Multimodal Multisensor Data Acquisition in Highly Dynamic Social Settings

Existing data acquisition literature for human behavior research provide...
research
10/22/2022

On the Limitations of Reference-Free Evaluations of Generated Text

There is significant interest in developing evaluation metrics which acc...
research
02/28/2023

Large Language Models Are State-of-the-Art Evaluators of Translation Quality

We describe GEMBA, a GPT-based metric for assessment of translation qual...
research
03/19/2021

Play the Shannon Game With Language Models: A Human-Free Approach to Summary Evaluation

The goal of a summary is to concisely state the most important informati...
research
06/27/2022

Audio Similarity is Unreliable as a Proxy for Audio Quality

Many audio processing tasks require perceptual assessment. However, the ...
