Leveraging web resources for keyword assignment to short text documents

06/19/2017
by Ayush Singhal, et al.

Assigning relevant keywords to documents is very important for efficient retrieval, clustering, and management of documents. Especially now that the web is deluged with digital documents, automating this task is of prime importance. Keyword assignment is a broad research topic that refers to tagging a document with keywords, key-phrases, or topics. For text documents, keyword assignment techniques have been developed under two sub-topics: automatic keyword extraction (AKE) and automatic key-phrase abstraction. However, the approaches developed in the literature for full-text documents cannot be used to assign keywords to documents with little text content, such as Twitter feeds, news clips, product reviews, or even short scholarly text. In this work, we point out several practical challenges encountered in tagging such low-text-content documents. As a solution to these challenges, we show that the proposed approaches, which leverage knowledge from several open-source web resources, enhance the quality of the tags (keywords) assigned to low-text-content documents. The performance of the proposed approach is tested on a real-world corpus of scholarly documents whose text content ranges from only the title of the document (5-10 words) to the summary text or abstract (100-150 words). We find that the proposed approach not only improves the accuracy of keyword assignment but also offers a computationally efficient solution that can be used in real-world applications.
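To make the short-text challenge concrete, here is a minimal Python sketch. It is not the authors' method. It assumes a toy stopword list and a hypothetical in-memory dictionary `resource` that stands in for an open-source web resource; `frequency_keywords` and `resource_keywords` are illustrative names. On a title-length input, in-document frequency gives every term a count of 1, so it cannot rank them. Counting how many title terms point to the same related tag in the external resource does separate the candidates.

```python
from collections import Counter
import re

# Toy stopword list (assumption; real systems use much larger lists).
STOPWORDS = {"a", "an", "the", "of", "for", "to", "in", "on", "and", "with", "using"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def frequency_keywords(text, k=3):
    """Rank terms by raw frequency within the document alone."""
    return Counter(tokenize(text)).most_common(k)


def resource_keywords(text, resource, k=3):
    """Score candidate tags by how many distinct document terms link to them
    in an external resource (hypothetical dict: term -> related tags)."""
    scores = Counter()
    # dict.fromkeys de-duplicates terms while keeping their original order.
    for term in dict.fromkeys(tokenize(text)):
        for tag in resource.get(term, []):
            scores[tag] += 1
    return scores.most_common(k)


# Hypothetical stand-in for knowledge pulled from a web resource.
resource = {
    "keyword": ["information retrieval", "text mining"],
    "text": ["text mining", "natural language processing"],
    "documents": ["information retrieval"],
}

title = "Keyword assignment for short text documents"
freq = frequency_keywords(title, k=5)   # every term ties at count 1
res = resource_keywords(title, resource, k=3)
print(freq)
print(res)
```

A real system would query the web resources rather than hard-code a dictionary; the toy `resource` only illustrates why external knowledge helps when the document itself offers almost no frequency signal.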

research
07/16/2018

Theme-weighted Ranking of Keywords from Text Documents using Phrase Embeddings

Keyword extraction is a fundamental task in natural language processing ...
research
06/20/2016

Comparing the hierarchy of keywords in on-line news portals

The tagging of on-line content with informative keywords is a widespread...
research
10/12/2022

Lbl2Vec: An Embedding-Based Approach for Unsupervised Document Retrieval on Predefined Topics

In this paper, we consider the task of retrieving documents with predefi...
research
11/30/2020

Diversifying Relevant Phrases

Diverse keyword suggestions for a given landing page or matching queries...
research
04/16/2020

An approach based on Combination of Features for automatic news retrieval

Nowadays, according to the increasingly increasing information, the impo...
research
05/03/2022

A Comparison of Approaches for Imbalanced Classification Problems in the Context of Retrieving Relevant Documents for an Analysis

One of the first steps in many text-based social science studies is to r...
research
04/20/2018

Benchmarking Top-K Keyword and Top-K Document Processing with T^2K^2 and T^2K^2D^2

Top-k keyword and top-k document extraction are very popular text analys...