How well do you know your summarization datasets?

06/21/2021
by Priyam Tejaswin, et al.

State-of-the-art summarization systems are trained and evaluated on massive datasets scraped from the web. Despite their prevalence, we know very little about the underlying characteristics (data noise, summarization complexity, etc.) of these datasets, and how these affect system performance and the reliability of automatic metrics like ROUGE. In this study, we manually analyze 600 samples from three popular summarization datasets. Our study is driven by a six-class typology which captures different noise types (missing facts, entities) and degrees of summarization difficulty (extractive, abstractive). We follow with a thorough analysis of 27 state-of-the-art summarization models and 5 popular metrics, and report our key insights: (1) Datasets have distinct data quality and complexity distributions, which can be traced back to their collection process. (2) The performance of models and the reliability of metrics depend on sample complexity. (3) Faithful summaries often receive low scores because of the poor diversity of references. We release the code, annotated data and model outputs.
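
To make insight (3) concrete, here is a minimal sketch of how an overlap-based metric can penalize a faithful paraphrase when only one reference is available. It is not the paper's evaluation code: `rouge1_f1` is a hypothetical helper that implements a simplified ROUGE-1 F1 (clipped unigram overlap on lowercased, whitespace-split tokens, with no stemming), and the three sentences are invented for illustration.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap, no stemming (illustrative only)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per token (clipped matches).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Invented example sentences (assumptions, not taken from any dataset).
reference = "the council approved the new budget on monday"
extractive = "the council approved the new budget"
paraphrase = "city lawmakers signed off on next year's spending plan"

print(f"extractive: {rouge1_f1(extractive, reference):.3f}")  # 0.857
print(f"paraphrase: {rouge1_f1(paraphrase, reference):.3f}")  # 0.118
```

The paraphrase shares only the token "on" with the reference, so it scores far lower than the near-copy. This is the kind of gap that a single, low-diversity reference can produce.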


