KPTimes: A Large-Scale Dataset for Keyphrase Generation on News Documents

11/28/2019
by Ygor Gallina, et al.

Keyphrase generation is the task of predicting a set of lexical units that conveys the main content of a source text. Existing datasets for keyphrase generation are readily available only for the scholarly domain, and they include non-expert annotations. In this paper, we present KPTimes, a large-scale dataset of news texts paired with editor-curated keyphrases. Exploring the dataset, we show how editors tag documents and how their annotations differ from those found in existing datasets. We also train and evaluate state-of-the-art neural keyphrase generation models on KPTimes to gain insights into how well they perform in the news domain. The dataset is available online at https://github.com/ygorg/KPTimes.
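
Keyphrase generation models are commonly scored by comparing their top-k predicted keyphrases against the reference (here, editor-assigned) keyphrases using F1@k. The sketch below illustrates that idea under stated assumptions. It uses exact matching after lowercasing and collapsing whitespace, whereas many papers additionally apply Porter stemming. Precision is computed over the predictions actually returned, although some papers divide by k instead. The function names and example keyphrases are hypothetical, and this is not necessarily the evaluation protocol used in the KPTimes paper.

```python
from typing import Iterable, List


def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace.

    Assumption: a simplified stand-in for the stemming-based matching
    that many keyphrase evaluation scripts use.
    """
    return " ".join(phrase.lower().split())


def f1_at_k(predicted: List[str], gold: Iterable[str], k: int) -> float:
    """Exact-match F1 between the top-k predictions and the gold keyphrases.

    Hypothetical helper for illustration; precision is taken over the
    returned predictions (at most k), not over k itself.
    """
    gold_set = {normalize(g) for g in gold}

    # Deduplicate predictions while keeping their rank order, then cut at k.
    seen = set()
    top_k: List[str] = []
    for p in predicted:
        if len(top_k) == k:
            break
        n = normalize(p)
        if n not in seen:
            seen.add(n)
            top_k.append(n)

    if not top_k or not gold_set:
        return 0.0

    matches = sum(1 for p in top_k if p in gold_set)
    if matches == 0:
        return 0.0

    precision = matches / len(top_k)
    recall = matches / len(gold_set)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical model output and editor keyphrases for one news article.
    predicted = ["Climate change", "Paris Agreement", "coal", "elections"]
    gold = ["climate change", "coal", "energy policy"]
    print(f1_at_k(predicted, gold, 5))  # 2 of 4 correct, 2 of 3 recalled
    print(f1_at_k(predicted, gold, 2))
```

With these hypothetical inputs, F1@5 is 4/7 (precision 2/4, recall 2/3) and F1@2 is 0.4 (precision 1/2, recall 1/3). The deduplication step matters for generative models, which can emit the same keyphrase more than once.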
