Keyphrase Generation Beyond the Boundaries of Title and Abstract

12/13/2021
by Krishna Garg, et al.

Keyphrase generation aims to produce phrases (keyphrases) that best describe a given document. In scholarly domains, current approaches to this task are neural and have largely used only the title and abstract of each article. In this work, we explore whether integrating additional data, either from semantically similar articles or from the full text of the given article, can help a neural keyphrase generation model. We find that adding sentences from the full text, particularly in the form of a summary of the article, can significantly improve the generation of both types of keyphrases: those present in the title and abstract and those absent from them. Experimental results with three well-known models, along with Longformer Encoder-Decoder (LED), a recent transformer model suited to longer documents, validate this observation. We also present FullTextKP, a new large-scale scholarly dataset for keyphrase generation, which we use for our experiments. Unlike prior large-scale datasets, FullTextKP includes the full text of the articles alongside the title and abstract. We will release the source code to stimulate research on the proposed ideas.
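The abstract does not say how the summary is produced or how it is combined with the title and abstract, so the following is only a minimal sketch under stated assumptions, not the authors' pipeline. It illustrates two ideas from the abstract: splitting gold keyphrases into "present" and "absent" relative to the title and abstract (here by contiguous lowercase token matching, with no stemming), and augmenting the model input with a few full-text sentences. The sentence selector is a hypothetical toy extractive summarizer that ranks sentences by word overlap with the title and abstract; the function names and the `[SEP]` separator are likewise assumptions chosen for illustration.

```python
import re


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric runs (a simplifying assumption; no stemming).
    return re.findall(r"[a-z0-9]+", text.lower())


def is_present(keyphrase: str, source: str) -> bool:
    # "Present" = the keyphrase tokens occur as a contiguous run in the source tokens.
    kp, src = tokenize(keyphrase), tokenize(source)
    n = len(kp)
    return n > 0 and any(src[i:i + n] == kp for i in range(len(src) - n + 1))


def split_present_absent(keyphrases, title, abstract):
    # Partition gold keyphrases relative to title + abstract only.
    source = title + " " + abstract
    present = [k for k in keyphrases if is_present(k, source)]
    absent = [k for k in keyphrases if not is_present(k, source)]
    return present, absent


def select_summary_sentences(full_text, title, abstract, k=2):
    # Hypothetical extractive "summary": rank full-text sentences by word
    # overlap with title + abstract, breaking ties by position.
    query = set(tokenize(title + " " + abstract))
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", full_text) if s.strip()]
    ranked = sorted(
        enumerate(sentences),
        key=lambda pair: (-len(query & set(tokenize(pair[1]))), pair[0]),
    )
    chosen = sorted(ranked[:k], key=lambda pair: pair[0])  # restore document order
    return [sentence for _, sentence in chosen]


def build_input(title, abstract, full_text, k=2, sep=" [SEP] "):
    # Concatenate title, abstract and the selected sentences into one model input.
    summary = select_summary_sentences(full_text, title, abstract, k)
    return sep.join([title, abstract, " ".join(summary)])


if __name__ == "__main__":
    title = "Keyphrase generation with full text"
    abstract = "We study neural keyphrase generation."
    full_text = ("Transformers handle long documents. The weather was nice. "
                 "Neural generation benefits from summaries.")
    print(split_present_absent(["keyphrase generation", "longformer"], title, abstract))
    print(build_input(title, abstract, full_text, k=1))
```

With this toy input, "keyphrase generation" is classified as present and "longformer" as absent, and the single selected sentence is the one sharing the most words with the title and abstract. A real system would typically use a stronger summarizer and the tokenizer of the chosen model (for example LED), and would need to respect that model's maximum input length.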

research
11/22/2022

A Large-Scale Dataset for Biomedical Keyphrase Generation

Keyphrase generation is the task consisting in generating a set of words...
research
04/27/2023

Neural Keyphrase Generation: Analysis and Evaluation

Keyphrase generation aims at generating topical phrases from a given tex...
research
10/27/2020

Multi-XScience: A Large-scale Dataset for Extreme Multi-document Summarization of Scientific Articles

Multi-document summarization is a challenging task for which there exist...
research
01/17/2023

Transformer Based Implementation for Automatic Book Summarization

Document Summarization is the procedure of generating a meaningful and c...
research
05/10/2023

Vārta: A Large-Scale Headline-Generation Dataset for Indic Languages

We present Vārta, a large-scale multilingual dataset for headline genera...
research
07/25/2017

Challenges in Data-to-Document Generation

Recent neural models have shown significant progress on the problem of g...
research
12/02/2021

KPDrop: An Approach to Improving Absent Keyphrase Generation

Keyphrase generation is the task of generating phrases (keyphrases) that...
