Improving Performance of Automatic Keyword Extraction (AKE) Methods Using PoS-Tagging and Enhanced Semantic-Awareness

11/09/2022
by Enes Altuncu, et al.

Automatic keyword extraction (AKE) has gained importance with the increasing amount of digital textual data that modern computing systems process. It has various applications in information retrieval (IR) and natural language processing (NLP), including text summarisation, topic analysis and document indexing. This paper proposes a simple but effective post-processing-based universal approach to improve the performance of any AKE method, via an enhanced level of semantic-awareness supported by PoS-tagging. To demonstrate the performance of the proposed approach, we considered word types retrieved from a PoS-tagging step and two representative sources of semantic information: specialised terms defined in one or more context-dependent thesauri, and named entities in Wikipedia. These three steps can simply be added to the end of any AKE method as part of a post-processor, which re-evaluates all candidate keywords following some context-specific and semantic-aware criteria. For five state-of-the-art (SOTA) AKE methods, our experimental results with 17 selected datasets showed that the proposed approach improved their performance both consistently (up to 100% in terms of improved cases) and significantly (between 10.2% and 53.8%, with an average of 25.8%, in terms of F1-score and across all five methods), especially when all three enhancement steps are used. Our results have profound implications, considering how easily the proposed approach can be applied to any AKE method and extended further.
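
The post-processing idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the re-evaluation criteria, so the function name `post_process`, the noun-only filter, the multiplicative `boost`, and the toy inputs are all assumptions made for illustration. The PoS tags, thesaurus terms and named entities are passed in as plain dictionaries and sets, standing in for a real tagger, a domain thesaurus and a Wikipedia lookup.

```python
from typing import Dict, List, Set, Tuple

# (keyword, score) pair from any upstream AKE method.
# Assumption: a higher score means a better candidate.
Candidate = Tuple[str, float]

# Universal PoS tags accepted for a keyword's final word (assumption).
NOUN_TAGS = {"NOUN", "PROPN"}


def post_process(
    candidates: List[Candidate],
    pos_tags: Dict[str, str],
    thesaurus_terms: Set[str],
    named_entities: Set[str],
    boost: float = 2.0,
) -> List[Candidate]:
    """Re-evaluate AKE candidates with three hypothetical steps."""
    rescored: List[Candidate] = []
    for keyword, score in candidates:
        key = keyword.lower()
        words = key.split()
        if not words:
            continue  # skip empty candidates
        # Steps 2 and 3: is the candidate a known specialised term
        # or a known named entity?
        is_semantic = key in thesaurus_terms or key in named_entities
        # Step 1: drop candidates whose last word is not tagged as a noun,
        # unless one of the semantic sources vouches for them.
        if pos_tags.get(words[-1]) not in NOUN_TAGS and not is_semantic:
            continue
        if is_semantic:
            score *= boost  # promote semantically confirmed candidates
        rescored.append((keyword, score))
    # Return the surviving candidates, best first.
    return sorted(rescored, key=lambda c: c[1], reverse=True)


# Toy inputs (hypothetical, not taken from the paper).
candidates = [
    ("quickly", 0.9),
    ("keyword extraction", 0.6),
    ("alan turing", 0.5),
    ("running", 0.4),
]
pos_tags = {"quickly": "ADV", "extraction": "NOUN",
            "turing": "PROPN", "running": "VERB"}
result = post_process(
    candidates,
    pos_tags,
    thesaurus_terms={"keyword extraction"},
    named_entities={"alan turing"},
)
ranked_keywords = [keyword for keyword, _ in result]
```

Because the function only consumes a ranked candidate list, it can be placed after any extractor without changing the extractor itself, which is the property the abstract emphasises.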

research
06/09/2021

Phraseformer: Multimodal Key-phrase Extraction using Transformer and Graph Embedding

Background: Keyword extraction is a popular research topic in the field ...
research
11/27/2018

sCAKE: Semantic Connectivity Aware Keyword Extraction

Keyword Extraction is an important task in several text analysis endeavo...
research
01/10/2020

Machine Learning Approaches for Amharic Parts-of-speech Tagging

Part-of-speech (POS) tagging is considered as one of the basic but neces...
research
09/13/2021

Keyword Extraction for Improved Document Retrieval in Conversational Search

Recent research has shown that mixed-initiative conversational search, b...
research
06/13/2023

A Cloud-based Machine Learning Pipeline for the Efficient Extraction of Insights from Customer Reviews

The efficiency of natural language processing has improved dramatically ...
research
06/12/2023

SE#PCFG: Semantically Enhanced PCFG for Password Analysis and Cracking

Much research has been done on user-generated textual passwords. Surpris...
research
06/04/2018

An unsupervised and customizable misspelling generator for mining noisy health-related text sources

In this paper, we present a customizable datacentric system that automat...
