QBSUM: a Large-Scale Query-Based Document Summarization Dataset from Real-world Applications

10/27/2020
by Mingjun Zhao, et al.

Query-based document summarization aims to extract or generate a summary of a document that directly answers, or is relevant to, a search query. It is an important technique that benefits a variety of applications, such as search engines, document-level machine reading comprehension, and chatbots. Currently, datasets designed for query-based summarization are few in number, and the existing ones are limited in both scale and quality. Moreover, to the best of our knowledge, there is no publicly available dataset for Chinese query-based document summarization. In this paper, we present QBSUM, a high-quality, large-scale dataset consisting of more than 49,000 data samples for the task of Chinese query-based document summarization. We also propose multiple unsupervised and supervised solutions to the task and demonstrate their high-speed inference and superior performance through both offline experiments and online A/B tests. The QBSUM dataset is released to facilitate future advancement of this research field.

research
10/23/2020

AQuaMuSe: Automatically Generating Datasets for Query-Based Multi-Document Summarization

Summarization is the task of compressing source document(s) into coheren...
research
04/26/2017

Diversity driven Attention Model for Query-based Abstractive Summarization

Abstractive summarization aims to generate a shorter version of the docu...
research
04/11/2023

Explicit and Implicit Semantic Ranking Framework

The core challenge in numerous real-world applications is to match an in...
research
11/14/2017

DuReader: a Chinese Machine Reading Comprehension Dataset from Real-world Applications

In this paper, we introduce DuReader, a new large-scale, open-domain Chi...
research
10/27/2020

Multi-XScience: A Large-scale Dataset for Extreme Multi-document Summarization of Scientific Articles

Multi-document summarization is a challenging task for which there exist...
research
10/07/2021

HowSumm: A Multi-Document Summarization Dataset Derived from WikiHow Articles

We present HowSumm, a novel large-scale dataset for the task of query-fo...
research
10/14/2019

Knowledge-guided Unsupervised Rhetorical Parsing for Text Summarization

Automatic text summarization (ATS) has recently achieved impressive perf...
