KPTimes: A Large-Scale Dataset for Keyphrase Generation on News Documents

11/28/2019
by Ygor Gallina, et al.

Keyphrase generation is the task of predicting a set of lexical units that conveys the main content of a source text. Existing datasets for keyphrase generation are readily available only for the scholarly domain, and they include non-expert annotations. In this paper we present KPTimes, a large-scale dataset of news texts paired with editor-curated keyphrases. Exploring the dataset, we show how editors tag documents and how their annotations differ from those found in existing datasets. We also train and evaluate state-of-the-art neural keyphrase generation models on KPTimes to gain insight into how well they perform on the news domain. The dataset is available online at https://github.com/ygorg/KPTimes

research
11/22/2022

A Large-Scale Dataset for Biomedical Keyphrase Generation

Keyphrase generation is the task consisting in generating a set of words...
research
10/04/2019

Template-free Data-to-Text Generation of Finnish Sports News

News articles such as sports game reports are often thought to closely f...
research
12/31/2022

Towards Proactively Forecasting Sentence-Specific Information Popularity within Online News Documents

Multiple studies have focused on predicting the prospective popularity o...
research
06/09/2017

Overview of the NLPCC 2017 Shared Task: Chinese News Headline Categorization

In this paper, we give an overview for the shared task at the CCF Confer...
research
06/23/2021

Open Images V5 Text Annotation and Yet Another Mask Text Spotter

A large scale human-labeled dataset plays an important role in creating ...
research
07/25/2018

A Novel ILP Framework for Summarizing Content with High Lexical Variety

Summarizing content contributed by individuals can be challenging, becau...
research
08/16/2020

OpenFraming: We brought the ML; you bring the data. Interact with your data and discover its frames

When journalists cover a news story, they can cover the story from multi...
