HaRiM+: Evaluating Summary Quality with Hallucination Risk

11/22/2022
by   Seonil Son, et al.

One challenge in developing a summarization model is the difficulty of measuring factual inconsistency in the generated text. In this study, we reinterpret the decoder overconfidence-regularizing objective suggested by Miao et al. (2021) as a hallucination risk measurement to better estimate the quality of generated summaries. We propose a reference-free metric, HaRiM+, which requires only an off-the-shelf summarization model to compute the hallucination risk from token likelihoods. Deploying it requires no additional training of models or ad-hoc modules, which usually need alignment to human judgments. For summary-quality estimation, HaRiM+ achieves state-of-the-art correlation with human judgment on three summary-quality annotation sets: FRANK, QAGS, and SummEval. We hope that our work, which makes a case for using summarization models as evaluators, facilitates progress in both the automated evaluation and the generation of summaries.
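The abstract does not give the formula, so the following is only a minimal sketch of how a likelihood-based hallucination risk score might be computed. It assumes, from our reading of the overconfidence-regularizing idea and not from the text above, that each summary token has two probabilities: one from the summarization model conditioned on the source document (`p_s2s`) and one from a decoder that sees only the summary prefix (`p_lm`). The function names, the exact form of the risk term, and the weight `lam` are hypothetical placeholders; the full paper should be consulted for the authors' definition and settings.

```python
import math
from typing import Sequence


def harim(p_s2s: Sequence[float], p_lm: Sequence[float]) -> float:
    """Mean per-token hallucination risk (assumed form, see lead-in).

    p_s2s[i]: probability of summary token i given the source and the prefix.
    p_lm[i]:  probability of the same token given the prefix only.
    """
    if len(p_s2s) != len(p_lm) or len(p_s2s) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for ps, pl in zip(p_s2s, p_lm):
        margin = ps - pl  # how much the source document supports the token
        # Risk is high when the model is unsure (1 - ps is large) and the
        # source adds little over the prefix alone (margin is small).
        total += (1.0 - ps) * (1.0 - margin)
    return total / len(p_s2s)


def harim_plus(p_s2s: Sequence[float], p_lm: Sequence[float],
               lam: float = 1.0) -> float:
    """Mean token log-likelihood minus a weighted risk term.

    lam is a placeholder weight, not a value taken from the source.
    Higher scores indicate a summary judged to be of higher quality.
    """
    risk = harim(p_s2s, p_lm)  # also validates the inputs
    mean_log_lik = sum(math.log(p) for p in p_s2s) / len(p_s2s)
    return mean_log_lik - lam * risk
```

In practice the two probability sequences would come from forced decoding of the candidate summary with an off-the-shelf summarization model, once with the source document and once without it. That step is omitted here to keep the sketch free of model dependencies.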


