MiRANews: Dataset and Benchmarks for Multi-Resource-Assisted News Summarization

09/22/2021
by Xinnuo Xu, et al.

One of the most challenging aspects of current single-document news summarization is that the summary often contains 'extrinsic hallucinations', i.e., facts that are not present in the source document and are often derived from world knowledge. This causes summarization systems to act more like open-ended language models that tend to hallucinate erroneous facts. In this paper, we mitigate this problem with the help of multiple supplementary resource documents that assist the task. We present a new dataset, MiRANews, and benchmark existing summarization models on it. In contrast to multi-document summarization, which addresses multiple events from several source documents, we still aim at generating a summary for a single document. We show via data analysis that it is not only the models that are to blame: more than 27% of the facts mentioned in the gold summaries of MiRANews are better grounded in the assisting documents than in the main source articles. An error analysis of summaries generated by pretrained models fine-tuned on MiRANews reveals that this has an even bigger effect on models: assisted summarization reduces hallucinations by 55% compared to single-document summarization models trained on the main article only. Our code and data are available at https://github.com/XinnuoXu/MiRANews.
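To make the idea of grounding concrete, the following is a minimal sketch of how one might check whether the words of a summary are supported by the main article, only by the assisting documents, or by neither. It is an illustration under simplifying assumptions and not the analysis used in the paper: the function names `content_words` and `grounding_report` are hypothetical, and crude word overlap stands in for proper fact matching.

```python
import re


def content_words(text):
    """Return lowercased alphanumeric tokens longer than three characters."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 3}


def grounding_report(summary, main_article, assisting_docs):
    """Split summary words by where they are supported (hypothetical helper).

    Word overlap is only a rough proxy for fact-level grounding.
    """
    summary_words = content_words(summary)
    main_words = content_words(main_article)
    assist_words = set()
    for doc in assisting_docs:
        assist_words |= content_words(doc)
    return {
        # Words that already appear in the main article.
        "main": summary_words & main_words,
        # Words missing from the main article but found in assisting documents.
        "assisting_only": (summary_words - main_words) & assist_words,
        # Words found nowhere: candidates for extrinsic hallucination.
        "unsupported": summary_words - main_words - assist_words,
    }


# Toy example with invented text, for illustration only.
report = grounding_report(
    summary="The mayor of Springfield resigned on Friday after the budget vote.",
    main_article="The mayor resigned after the council rejected the budget in a late vote.",
    assisting_docs=["Springfield officials confirmed the resignation on Friday."],
)
print(sorted(report["assisting_only"]))  # ['friday', 'springfield']
```

In this toy case the place and the day in the summary are supported only by the assisting document, which is the kind of situation the assisted setting is meant to cover.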

research
11/02/2020

Liputan6: A Large-scale Indonesian Dataset for Text Summarization

In this paper, we introduce a large-scale Indonesian summarization datas...
research
03/22/2021

Nutri-bullets: Summarizing Health Studies by Composing Segments

We introduce Nutri-bullets, a multi-document summarization task for heal...
research
06/02/2021

Enriching Transformers with Structured Tensor-Product Representations for Abstractive Summarization

Abstractive summarization, the task of generating a concise summary of i...
research
10/22/2022

ECTSum: A New Benchmark Dataset For Bullet Point Summarization of Long Earnings Call Transcripts

Despite tremendous progress in automatic summarization, state-of-the-art...
research
04/13/2021

MS2: Multi-Document Summarization of Medical Studies

To assess the effectiveness of any medical intervention, researchers mus...
research
09/14/2023

Investigating Gender Bias in News Summarization

Summarization is an important application of large language models (LLMs...
research
04/30/2020

TLDR: Extreme Summarization of Scientific Documents

We introduce TLDR generation for scientific papers, a new automatic summ...