OASum: Large-Scale Open Domain Aspect-based Summarization

12/19/2022
by   Xianjun Yang, et al.

Aspect- or query-based summarization has recently attracted more attention because it can generate differentiated summaries based on users' interests. However, existing datasets for aspect- or query-based summarization either focus on specific domains, contain relatively few instances, or include only a few aspect types. Such limitations hinder further exploration in this direction. In this work, we take advantage of crowd-sourced knowledge on Wikipedia.org and automatically create a high-quality, large-scale, open-domain aspect-based summarization dataset named OASum, which contains more than 3.7 million instances with around 1 million different aspects on 2 million Wikipedia pages. We provide benchmark results on OASum and demonstrate its ability for diverse aspect-based summarization generation. To overcome the data scarcity problem in specific domains, we also perform zero-shot, few-shot, and fine-tuning experiments on seven downstream datasets. Specifically, the zero/few-shot and fine-tuning results show that the model pre-trained on our corpus demonstrates a strong aspect- or query-focused generation ability compared with the backbone model. Our dataset and pre-trained checkpoints are publicly available.

research
11/08/2019

Transforming Wikipedia into Augmented Data for Query-Focused Summarization

The manual construction of a query-focused summarization corpus is costl...
research
11/16/2020

WikiAsp: A Dataset for Multi-domain Aspect-based Summarization

Aspect-based summarization is the task of generating focused summaries b...
research
05/12/2022

CiteSum: Citation Text-guided Scientific Extreme Summarization and Low-resource Domain Adaptation

Scientific extreme summarization (TLDR) aims to form ultra-short summari...
research
11/29/2022

Few-shot Query-Focused Summarization with Prefix-Merging

Query-focused summarization has been considered as an important extensio...
research
05/04/2022

Efficient Few-Shot Fine-Tuning for Opinion Summarization

Abstractive summarization models are typically pre-trained on large amou...
research
12/29/2020

Generating Wikipedia Article Sections from Diverse Data Sources

Datasets for data-to-text generation typically focus either on multi-dom...
research
10/14/2020

Summarizing Text on Any Aspects: A Knowledge-Informed Weakly-Supervised Approach

Given a document and a target aspect (e.g., a topic of interest), aspect...