QBSUM: a Large-Scale Query-Based Document Summarization Dataset from Real-world Applications

10/27/2020
by   Mingjun Zhao, et al.

Query-based document summarization aims to extract or generate a summary of a document that directly answers, or is relevant to, a search query. It is an important technique that can benefit a variety of applications, such as search engines, document-level machine reading comprehension, and chatbots. Currently, datasets designed for query-based summarization are few in number, and the existing ones are limited in both scale and quality. Moreover, to the best of our knowledge, there is no publicly available dataset for Chinese query-based document summarization. In this paper, we present QBSUM, a high-quality, large-scale dataset consisting of 49,000+ data samples for the task of Chinese query-based document summarization. We also propose multiple unsupervised and supervised solutions to the task and demonstrate their high-speed inference and superior performance via both offline experiments and online A/B tests. The QBSUM dataset is released to facilitate future advancement of this research field.
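To make the task definition concrete, here is a minimal sketch of an unsupervised, extractive query-based summarizer. It is not one of the solutions proposed in the paper, which the abstract does not describe. It assumes that overlap of character bigrams between the query and each sentence is an acceptable relevance signal, which avoids the need for a Chinese word segmenter. The function names (`split_sentences`, `overlap_score`, `summarize`) and the sample text are hypothetical.

```python
import re
from collections import Counter
from typing import List


def split_sentences(document: str) -> List[str]:
    # Split after Chinese or Latin sentence-ending punctuation.
    parts = re.split(r"(?<=[。！？.!?])\s*", document)
    return [p.strip() for p in parts if p.strip()]


def char_bigrams(text: str) -> Counter:
    # Character bigrams over letters and digits only (assumption:
    # punctuation and whitespace carry no relevance signal).
    chars = [c for c in text.lower() if c.isalnum()]
    return Counter(zip(chars, chars[1:]))


def overlap_score(query: str, sentence: str) -> float:
    # Fraction of the query's bigrams that also occur in the sentence.
    q, s = char_bigrams(query), char_bigrams(sentence)
    total = sum(q.values())
    if total == 0:
        return 0.0
    return sum((q & s).values()) / total


def summarize(query: str, document: str, k: int = 1) -> List[str]:
    # Pick the k sentences most relevant to the query,
    # then return them in their original document order.
    sentences = split_sentences(document)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: overlap_score(query, sentences[i]),
        reverse=True,
    )[:k]
    return [sentences[i] for i in sorted(ranked)]


if __name__ == "__main__":
    doc = "北京是中国的首都。上海的天气今天很好。广州有很多美食。"
    print(summarize("上海天气", doc))  # ['上海的天气今天很好。']
```

A lexical-overlap baseline like this only illustrates the input and output of the task: a query and a document go in, and a query-relevant subset of the document comes out. It cannot handle paraphrase or generate new text.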

10/23/2020

AQuaMuSe: Automatically Generating Datasets for Query-Based Multi-Document Summarization

Summarization is the task of compressing source document(s) into coheren...
04/26/2017

Diversity driven Attention Model for Query-based Abstractive Summarization

Abstractive summarization aims to generate a shorter version of the docu...
11/14/2017

DuReader: a Chinese Machine Reading Comprehension Dataset from Real-world Applications

In this paper, we introduce DuReader, a new large-scale, open-domain Chi...
10/27/2020

Multi-XScience: A Large-scale Dataset for Extreme Multi-document Summarization of Scientific Articles

Multi-document summarization is a challenging task for which there exist...
05/08/2021

D2S: Document-to-Slide Generation Via Query-Based Text Summarization

Presentations are critical for communication in all areas of our lives, ...
05/19/2011

A Multiple-Choice Test Recognition System based on the Gamera Framework

This article describes JECT-OMR, a system that analyzes digital images r...
07/19/2019

What is this Article about? Extreme Summarization with Topic-aware Convolutional Neural Networks

We introduce 'extreme summarization', a new single-document summarizatio...