Questioning the Validity of Summarization Datasets and Improving Their Factual Consistency

10/31/2022
by   Yanzhu Guo, et al.

The topic of summarization evaluation has recently attracted a surge of attention due to the rapid development of abstractive summarization systems. However, the formulation of the task is rather ambiguous: neither the linguistic nor the natural language processing community has succeeded in giving a mutually agreed-upon definition. Due to this lack of a well-defined formulation, a large number of popular abstractive summarization datasets are constructed in a manner that neither guarantees validity nor meets one of the most essential criteria of summarization: factual consistency. In this paper, we address this issue by combining state-of-the-art factual consistency models to identify the problematic instances present in popular summarization datasets. We release SummFC, a filtered summarization dataset with improved factual consistency, and demonstrate that models trained on this dataset achieve improved performance in nearly all quality aspects. We argue that our dataset should become a valid benchmark for developing and evaluating summarization systems.
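The abstract does not say how the factual consistency models are combined, so the following is only a minimal sketch of one possible filtering rule, not the authors' procedure. It assumes that each model can be wrapped as a function returning a score in [0, 1] and that a document-summary pair is kept only when every scorer meets its threshold. The names `token_support`, `number_support`, and `filter_consistent` are hypothetical, and the two scorers are toy placeholders that stand in for real factual consistency models.

```python
from typing import Callable, Dict, List

# Assumption: every scorer maps (document, summary) to a score in [0, 1].
Scorer = Callable[[str, str], float]


def token_support(document: str, summary: str) -> float:
    """Toy placeholder: fraction of summary tokens that also occur in the document."""
    doc_tokens = set(document.lower().split())
    summary_tokens = summary.lower().split()
    if not summary_tokens:
        return 0.0
    supported = sum(token in doc_tokens for token in summary_tokens)
    return supported / len(summary_tokens)


def number_support(document: str, summary: str) -> float:
    """Toy placeholder: 1.0 if every number in the summary appears in the document."""
    doc_tokens = set(document.split())
    numbers = [token for token in summary.split() if token.isdigit()]
    return 1.0 if all(number in doc_tokens for number in numbers) else 0.0


def filter_consistent(
    pairs: List[Dict[str, str]],
    scorers: Dict[str, Scorer],
    thresholds: Dict[str, float],
) -> List[Dict[str, str]]:
    """Keep only the pairs that every scorer rates at or above its threshold.

    Unanimous agreement is an assumed combination rule chosen for illustration.
    """
    kept = []
    for pair in pairs:
        passes_all = all(
            scorer(pair["document"], pair["summary"]) >= thresholds[name]
            for name, scorer in scorers.items()
        )
        if passes_all:
            kept.append(pair)
    return kept
```

In practice, the placeholder scorers would be replaced by wrappers around trained factual consistency models, and the thresholds would need to be tuned on annotated data; requiring all scorers to agree trades dataset size for a stricter filter.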
