A Large-Scale Dataset for Biomedical Keyphrase Generation

by Mael Houbre, et al.

Keyphrase generation is the task of generating a set of words or phrases that highlight the main topics of a document. There are few datasets for keyphrase generation in the biomedical domain, and those that exist are too small to train generative models. In this paper, we introduce kp-biomed, the first large-scale biomedical keyphrase generation dataset, with more than 5M documents collected from PubMed abstracts. We train and release several generative models and conduct a series of experiments showing that using large-scale datasets significantly improves performance for both present and absent keyphrase generation. The dataset is available under the CC-BY-NC v4.0 license at datasets/taln-ls2n/kpbiomed.
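The distinction between present and absent keyphrases can be illustrated with a minimal sketch. A keyphrase is usually called present when it occurs in the source text and absent otherwise. The function names, the example document, and the example keyphrases below are hypothetical, and the matching is a simple lowercase token comparison; evaluations in the literature commonly apply stemming first, which is omitted here.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_sequence(doc_tokens, phrase_tokens):
    """Return True if phrase_tokens occurs as a contiguous run in doc_tokens."""
    n = len(phrase_tokens)
    if n == 0:
        return False
    return any(
        doc_tokens[i:i + n] == phrase_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


def split_present_absent(document, keyphrases):
    """Partition keyphrases into those found in the document and those not found.

    Assumption: no stemming is applied, so inflected variants count as absent.
    """
    doc_tokens = tokenize(document)
    present, absent = [], []
    for kp in keyphrases:
        if contains_sequence(doc_tokens, tokenize(kp)):
            present.append(kp)
        else:
            absent.append(kp)
    return present, absent


# Hypothetical example, not taken from the dataset.
document = "We introduce a large dataset of PubMed abstracts for keyphrase generation."
keyphrases = ["keyphrase generation", "PubMed", "natural language processing"]
present, absent = split_present_absent(document, keyphrases)
print("present:", present)
print("absent:", absent)
```

A model that only extracts spans from the input can never produce the absent group, which is why generative models are evaluated on both groups separately.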


