WikiHow: A Large Scale Text Summarization Dataset

10/18/2018
by Mahnaz Koupaee, et al.
Sequence-to-sequence models have recently achieved state-of-the-art performance in summarization. However, few large-scale, high-quality datasets are available, and almost all of them consist mainly of news articles written in a specific style. Moreover, abstractive, human-style systems that describe content at a deeper level require data with higher levels of abstraction. In this paper, we present WikiHow, a dataset of more than 230,000 article and summary pairs extracted and constructed from an online knowledge base written by different human authors. The articles span a wide range of topics and therefore represent a high diversity of styles. We evaluate the performance of existing methods on WikiHow to present its challenges and set baselines for further improvement.

research
04/24/2018

Data-driven Summarization of Scientific Articles

Data-driven approaches to sequence-to-sequence modelling have been succe...
research
04/30/2018

Newsroom: A Dataset of 1.3 Million Summaries with Diverse Extractive Strategies

We present NEWSROOM, a summarization dataset of 1.3 million articles and...
research
08/13/2020

Cognitive Representation Learning of Self-Media Online Article Quality

The automatic quality assessment of self-media online articles is an urg...
research
06/18/2021

Subjective Bias in Abstractive Summarization

Due to the subjectivity of the summarization, it is a good practice to h...
research
10/07/2020

WikiLingua: A New Benchmark Dataset for Cross-Lingual Abstractive Summarization

We introduce WikiLingua, a large-scale, multilingual dataset for the eva...
research
10/04/2021

TLDR9+: A Large Scale Resource for Extreme Summarization of Social Media Posts

Recent models in developing summarization systems consist of millions of...
research
08/27/2018

Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization

We introduce extreme summarization, a new single-document summarization ...
