Unsupervised extraction of local and global keywords from a single text

07/26/2023
by Lida Aleksanyan, et al.

We propose an unsupervised, corpus-independent method for extracting keywords from a single text. It is based on the spatial distribution of words and on how this distribution responds to a random permutation of the words. Compared to existing methods (such as YAKE), our method has three advantages. First, it is significantly more effective at extracting keywords from long texts. Second, it allows inference of two types of keywords: local and global. Third, it uncovers basic themes in texts. Additionally, our method is language-independent and applies to short texts. The results are obtained with the help of human annotators who have prior knowledge of the texts in our database of classical literary works (the agreement between annotators ranges from moderate to substantial). Our results are further supported by annotator-independent arguments based on the average length of the extracted content words and on the average number of nouns among the extracted words. We discuss how keywords relate to higher-order textual features and reveal a connection between keywords and chapter divisions.
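
To make the core idea concrete, here is a minimal sketch of scoring words by how their spatial distribution responds to a random permutation. It is not the authors' algorithm: the abstract does not say which statistic is used, so the coefficient of variation of the gaps between consecutive occurrences is an assumption chosen for illustration, and the names `gap_variation`, `clustering_scores`, `min_count`, and `n_shuffles` are hypothetical.

```python
import random
import statistics
from collections import defaultdict


def positions_by_word(tokens):
    """Map each word to the sorted list of positions where it occurs."""
    index = defaultdict(list)
    for i, tok in enumerate(tokens):
        index[tok].append(i)
    return index


def gap_variation(positions):
    """Coefficient of variation of the gaps between consecutive occurrences.

    Assumed statistic (not taken from the paper): evenly spread words give
    a value near 0, strongly clustered words give a large value.
    """
    if len(positions) < 2:
        return 0.0
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return statistics.pstdev(gaps) / statistics.fmean(gaps)


def clustering_scores(tokens, min_count=5, n_shuffles=20, seed=0):
    """Score = statistic in the original text minus its mean over shuffles.

    A positive score means the word is more clustered in the original text
    than it would be if the words were randomly permuted.
    """
    rng = random.Random(seed)
    original = {
        word: gap_variation(pos)
        for word, pos in positions_by_word(tokens).items()
        if len(pos) >= min_count
    }
    baseline = defaultdict(float)
    shuffled = list(tokens)
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # random permutation of the whole text
        index = positions_by_word(shuffled)
        for word in original:
            baseline[word] += gap_variation(index[word]) / n_shuffles
    return {word: original[word] - baseline[word] for word in original}
```

Under this assumed statistic, a word that occurs at regular intervals throughout the text (typical of function words) loses its regularity after shuffling and receives a negative score, while a word concentrated in one passage receives a positive score. How the paper separates local from global keywords is not stated in the abstract and is not modeled here.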

