CNewSum: A Large-scale Chinese News Summarization Dataset with Human-annotated Adequacy and Deducibility Level

10/21/2021
by Danqing Wang, et al.

Automatic text summarization aims to produce a brief summary that retains the crucial content of the input documents. Both extractive and abstractive methods have achieved great success on English datasets in recent years. However, text summarization in Chinese has seen minimal exploration, limited by the lack of large-scale datasets. In this paper, we present CNewSum, a large-scale Chinese news summarization dataset that consists of 304,307 documents and human-written summaries for the news feed. It has long documents with highly abstractive summaries, which can encourage document-level understanding and generation in current summarization models. An additional distinguishing feature of CNewSum is that its test set contains adequacy and deducibility annotations for the summaries. The adequacy level measures the degree to which the summary information is covered by the document, and the deducibility level indicates the reasoning ability a model needs to generate the summary. These annotations can help researchers analyze their models and target performance bottlenecks. We examine recent methods on CNewSum and release our dataset to provide a solid testbed for automatic Chinese summarization research.
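As an illustration of how such test-set annotations could support bottleneck analysis, the following minimal Python sketch groups per-example scores by an annotation label and averages each group. The record layout, the field names (`adequacy`, `deducibility`, `score`), the label values, and the numbers are all hypothetical placeholders; the abstract does not specify the released data format or the evaluation metric.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_annotation(examples, key):
    """Average a per-example score within each value of an annotation field."""
    groups = defaultdict(list)
    for ex in examples:
        groups[ex[key]].append(ex["score"])
    return {label: mean(scores) for label, scores in groups.items()}


# Toy records with made-up labels and scores (not taken from CNewSum).
examples = [
    {"adequacy": "level_a", "deducibility": "level_b", "score": 0.5},
    {"adequacy": "level_a", "deducibility": "level_a", "score": 0.25},
    {"adequacy": "level_b", "deducibility": "level_b", "score": 0.25},
    {"adequacy": "level_a", "deducibility": "level_b", "score": 1.0},
]

by_adequacy = mean_score_by_annotation(examples, "adequacy")
by_deducibility = mean_score_by_annotation(examples, "deducibility")

for name, table in (("adequacy", by_adequacy), ("deducibility", by_deducibility)):
    for label, avg in sorted(table.items()):
        print(f"{name}={label}: mean score {avg:.3f}")
```

A large gap between groups would suggest where a model struggles, for example on summaries whose content is not fully covered by the document.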


