LDKP: A Dataset for Identifying Keyphrases from Long Scientific Documents

03/29/2022
by Debanjan Mahata, et al.

Identifying keyphrases (KPs) from text documents is a fundamental task in natural language processing and information retrieval. The vast majority of benchmark datasets for this task come from the scientific domain and contain only each document's title and abstract. This limits keyphrase extraction (KPE) and keyphrase generation (KPG) algorithms to identifying keyphrases from human-written summaries that are often very short (approximately 8 sentences). This presents three challenges for real-world applications: human-written summaries are unavailable for most documents, documents are almost always long, and a high percentage of KPs occur directly in the text beyond the limited context of the title and abstract. We therefore release two extensive corpora that map KPs to ~1.3M and ~100K scientific articles, together with their fully extracted text and additional metadata (publication venue, year, author, field of study, and citations), to facilitate research on this real-world problem.
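The claim that many KPs lie beyond the title and abstract can be checked per document with simple string matching. The sketch below is illustrative only: the `Document` record and its `title`/`abstract`/`body`/`keyphrases` fields are hypothetical and are not the released dataset's schema, and matching is exact after lowercasing and collapsing punctuation (no stemming, which many KPE evaluations apply).

```python
import re
from dataclasses import dataclass, field


@dataclass
class Document:
    # Hypothetical record layout for illustration; not the actual corpus schema.
    title: str
    abstract: str
    body: str  # full text excluding the title and abstract
    keyphrases: list[str] = field(default_factory=list)


def normalize(text: str) -> str:
    """Lowercase, collapse non-alphanumeric runs to single spaces, pad with spaces."""
    return " " + re.sub(r"[^a-z0-9]+", " ", text.lower()).strip() + " "


def contains(haystack: str, phrase: str) -> bool:
    # The space padding from normalize() turns substring search into a whole-token match.
    return normalize(phrase) in haystack


def keyphrase_coverage(doc: Document) -> dict[str, float]:
    """Fraction of gold KPs found in title+abstract, only in the body, or nowhere."""
    summary = normalize(doc.title + " " + doc.abstract)
    body = normalize(doc.body)
    phrases = [p for p in doc.keyphrases if normalize(p).strip()]  # skip empty KPs
    if not phrases:
        return {"in_title_abstract": 0.0, "only_in_body": 0.0, "absent": 0.0}
    in_summary = only_body = absent = 0
    for p in phrases:
        if contains(summary, p):
            in_summary += 1
        elif contains(body, p):
            only_body += 1
        else:
            absent += 1
    n = len(phrases)
    return {
        "in_title_abstract": in_summary / n,
        "only_in_body": only_body / n,
        "absent": absent / n,
    }


# Toy usage example (made-up text, not taken from the dataset).
doc = Document(
    title="Keyphrase extraction from long documents",
    abstract="We study keyphrase extraction on scientific articles.",
    body="We fine-tune a transformer encoder and evaluate with F1 at 5.",
    keyphrases=["keyphrase extraction", "transformer encoder", "graph ranking"],
)
coverage = keyphrase_coverage(doc)
print(coverage)  # each of the three fractions is 1/3
```

Checking the body first would double-count nothing here, because each KP is assigned to exactly one bucket in order of preference; averaging `only_in_body` over a corpus gives a rough estimate of how much a title-and-abstract-only model misses.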

research
06/02/2022

TSTR: Too Short to Represent, Summarize with Details! Intro-Guided Extended Summary Generation

Many scientific papers such as those in arXiv and PubMed data collection...
research
01/08/2022

Coherence-Based Distributed Document Representation Learning for Scientific Documents

Distributed document representation is one of the basic problems in natu...
research
06/01/2023

Hybrid Long Document Summarization using C2F-FAR and ChatGPT: A Practical Study

Text summarization is a downstream natural language processing (NLP) tas...
research
10/18/2022

Making Science Simple: Corpora for the Lay Summarisation of Scientific Literature

Lay summarisation aims to jointly summarise and simplify a given text, t...
research
11/03/2020

Topic-Centric Unsupervised Multi-Document Summarization of Scientific and News Articles

Recent advances in natural language processing have enabled automation o...
research
04/27/2023

ChatGPT vs State-of-the-Art Models: A Benchmarking Study in Keyphrase Generation Task

Transformer-based language models, including ChatGPT, have demonstrated ...
research
05/11/2022

Query-Based Keyphrase Extraction from Long Documents

Transformer-based architectures in natural language processing force inp...