Back to the Basics: A Quantitative Analysis of Statistical and Graph-Based Term Weighting Schemes for Keyword Extraction

04/16/2021
by Asahi Ushio, et al.

Term weighting schemes are widely used in Natural Language Processing and Information Retrieval. In particular, term weighting is the basis for keyword extraction. However, relatively few evaluation studies shed light on the strengths and shortcomings of each weighting scheme. In most cases, researchers and practitioners default to the well-known tf-idf, despite the existence of other suitable alternatives, including graph-based models. In this paper, we perform an exhaustive, large-scale empirical comparison of statistical and graph-based term weighting methods in the context of keyword extraction. Our analysis reveals some interesting findings, such as the advantages of the lesser-known lexical specificity over tf-idf and the qualitative differences between statistical and graph-based methods. Finally, based on our findings, we discuss and offer some suggestions for practitioners. We release our code at https://github.com/asahi417/kex.
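To make the tf-idf baseline mentioned in the abstract concrete, here is a minimal sketch of tf-idf keyword ranking. It is an illustration only and does not reproduce the paper's implementation or the `kex` library's API. The function name `tfidf_keywords`, the whitespace tokenization, the toy documents, and the specific formula (length-normalized term frequency times `log(N / df)`) are all assumptions chosen for brevity; other tf-idf variants exist.

```python
import math
from collections import Counter


def tfidf_keywords(docs, doc_index, top_n=3):
    """Rank the terms of docs[doc_index] by tf-idf against the whole collection.

    Hypothetical helper for illustration; assumes lowercase whitespace tokens.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Term frequency within the target document, normalized by its length.
    tf = Counter(tokenized[doc_index])
    total = sum(tf.values())

    # One common tf-idf variant: (count / length) * log(N / df).
    scores = {
        term: (count / total) * math.log(n_docs / df[term])
        for term, count in tf.items()
    }

    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda term: (-scores[term], term))[:top_n]


# Toy collection (invented for this example).
docs = [
    "graph based ranking of words in a document",
    "statistical term weighting of words in a corpus",
    "keyword extraction with term weighting and keyword ranking",
]

top_terms = tfidf_keywords(docs, doc_index=2)
```

In this toy collection, "keyword" ranks first for the third document because it occurs twice there and in no other document. Terms shared with other documents, such as "term" and "weighting", receive lower scores.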

research
11/27/2018

sCAKE: Semantic Connectivity Aware Keyword Extraction

Keyword Extraction is an important task in several text analysis endeavo...
research
08/15/2022

Retrieval-efficiency trade-off of Unsupervised Keyword Extraction

Efficiently identifying keyphrases that represent a given document is a ...
research
12/13/2010

Inverse-Category-Frequency based supervised term weighting scheme for text categorization

Term weighting schemes often dominate the performance of many classifier...
research
09/20/2019

Dependency-based Text Graphs for Keyphrase and Summary Extraction with Applications to Interactive Content Retrieval

We build a bridge between neural network-based machine learning and grap...
research
06/06/2012

Feature Weighting for Improving Document Image Retrieval System Performance

Feature weighting is a technique used to approximate the optimal degree ...
research
04/20/2018

Benchmarking Top-K Keyword and Top-K Document Processing with T^2K^2 and T^2K^2D^2

Top-k keyword and top-k document extraction are very popular text analys...
research
02/07/2023

KENGIC: KEyword-driven and N-Gram Graph based Image Captioning

This paper presents a Keyword-driven and N-gram Graph based approach for...
