Query-Based Keyphrase Extraction from Long Documents

05/11/2022
by Martin Docekal, et al.

Transformer-based architectures in natural language processing impose input size limits that become problematic when long documents need to be processed. This paper addresses the issue for keyphrase extraction by splitting long documents into chunks while keeping a global context in the form of a query that defines the topic for which relevant keyphrases should be extracted. The developed system employs a pre-trained BERT model and adapts it to estimate the probability that a given text span forms a keyphrase. We experimented with various context sizes on two popular datasets, Inspec and SemEval, and on a large novel dataset. The presented results show that, on long documents, a shorter context with a query outperforms a longer one without the query.
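The chunking idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `chunk_with_query`, the whitespace-style token lists, the default limit of 512 tokens, and the reservation of three positions for special tokens (as in a BERT-style sentence-pair input) are all assumptions made for illustration. The abstract also does not say what the query is, so the example query below is hypothetical.

```python
from typing import List, Tuple


def chunk_with_query(
    tokens: List[str],
    query: List[str],
    max_len: int = 512,  # assumed model input limit
    special: int = 3,    # assumed slots for special tokens such as [CLS] and two [SEP]
) -> List[Tuple[List[str], List[str]]]:
    """Pair the same query with consecutive chunks of a long document.

    Each (query, chunk) pair fits into max_len positions once the
    special tokens are counted.
    """
    budget = max_len - len(query) - special  # room left for document tokens
    if budget <= 0:
        raise ValueError("query is too long for the chosen max_len")
    return [
        (query, tokens[start:start + budget])
        for start in range(0, len(tokens), budget)
    ]


# Hypothetical usage: a tiny "document" and a two-token query.
document = "long documents are split into chunks that share one query".split()
pairs = chunk_with_query(document, ["keyphrase", "extraction"], max_len=8)
for query, chunk in pairs:
    print(query, chunk)
```

Every chunk is paired with the same query, so each model input carries the document-level topic even though it contains only part of the text. A model would then score candidate spans inside each chunk; that step is omitted here because the abstract gives no details about it.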


