Open4Business (O4B): An Open Access Dataset for Summarizing Business Documents

11/15/2020
by   Amanpreet Singh, et al.

A major challenge in fine-tuning deep learning models for automatic summarization is the need for large domain-specific datasets. One of the barriers to curating such data from resources like online publications is navigating the license regulations applicable to their re-use, especially for commercial purposes. As a result, despite the availability of several business journals, there are no large-scale datasets for summarizing business documents. In this work, we introduce Open4Business (O4B), a dataset of 17,458 open access business articles and their reference summaries. The dataset introduces a new challenge for summarization in the business domain, requiring summaries that are highly abstractive and more concise than those in existing datasets. We also evaluate existing models on it and show that models trained on O4B achieve summarization performance comparable to that of models trained on a 7x larger non-open access dataset. We release the dataset along with the code, which can be used to gather similar data for multiple domains.

research
06/10/2019

BIGPATENT: A Large-Scale Dataset for Abstractive and Coherent Summarization

Most existing text summarization datasets are compiled from the news dom...
research
02/17/2020

GameWikiSum: a Novel Large Multi-Document Summarization Dataset

Today's research progress in the field of multi-document summarization i...
research
10/12/2018

Unsupervised Neural Multi-document Abstractive Summarization

Abstractive summarization has been studied using neural sequence transdu...
research
06/09/2022

CLTS+: A New Chinese Long Text Summarization Dataset with Abstractive Summaries

The abstractive methods lack of creative ability is particularly a probl...
research
11/16/2020

WikiAsp: A Dataset for Multi-domain Aspect-based Summarization

Aspect-based summarization is the task of generating focused summaries b...
research
05/26/2023

Domain Aligned Prefix Averaging for Domain Generalization in Abstractive Summarization

Domain generalization is hitherto an underexplored area applied in abstr...
research
07/24/2023

Guidance in Radiology Report Summarization: An Empirical Evaluation and Error Analysis

Automatically summarizing radiology reports into a concise impression ca...