Theme-weighted Ranking of Keywords from Text Documents using Phrase Embeddings

07/16/2018
by Debanjan Mahata, et al.

Keyword extraction is a fundamental task in natural language processing that maps a document to a concise set of representative single-word and multi-word phrases. Keywords are primarily extracted from text documents using either supervised or unsupervised approaches. In this paper, we present an unsupervised technique that combines a theme-weighted personalized PageRank algorithm with neural phrase embeddings to extract and rank keywords. We also introduce an efficient way of processing text documents and training phrase embeddings using existing techniques. We share an evaluation dataset, derived from an existing dataset, which is used to choose the underlying embedding model. Ranked keyword extraction is evaluated on two benchmark datasets, Inspec (short abstracts) and SemEval 2010 (long scientific papers), where the method is shown to outperform state-of-the-art systems.
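
The abstract does not spell out how the theme weights or graph edges are computed, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes, hypothetically, that (a) each candidate phrase already has an embedding vector, (b) a document "theme vector" is the mean embedding of a few theme phrases (for example, phrases from the title), (c) a phrase's theme weight is its non-negative cosine similarity to that theme vector, and (d) edge weights between phrases (for example, co-occurrence counts) are supplied by the caller. The theme weights then serve as the teleport (personalization) vector of a weighted PageRank. All function names, the toy vectors, and the parameters (`damping=0.85`, `max_iter`, `tol`) are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def theme_vector(embeddings, theme_phrases):
    """Hypothetical theme vector: mean embedding of a few 'theme' phrases (e.g. from the title)."""
    dim = len(embeddings[theme_phrases[0]])
    total = [0.0] * dim
    for phrase in theme_phrases:
        for i, x in enumerate(embeddings[phrase]):
            total[i] += x
    return [x / len(theme_phrases) for x in total]


def theme_weighted_pagerank(embeddings, edges, theme, damping=0.85, max_iter=200, tol=1e-10):
    """Personalized PageRank over a phrase graph whose teleport vector is theme-weighted.

    embeddings: dict phrase -> vector; edges: iterable of (phrase_a, phrase_b, weight),
    treated as undirected. Returns (phrase, score) pairs sorted by descending score.
    """
    nodes = list(embeddings)
    # Teleport weights: non-negative cosine similarity to the theme, normalized to sum
    # to 1; fall back to a uniform vector if every similarity is zero.
    raw = {n: max(cosine(embeddings[n], theme), 0.0) for n in nodes}
    total = sum(raw.values())
    if total > 0.0:
        pers = {n: raw[n] / total for n in nodes}
    else:
        pers = {n: 1.0 / len(nodes) for n in nodes}

    # Undirected weighted adjacency; repeated edges accumulate weight.
    adj = {n: {} for n in nodes}
    for a, b, w in edges:
        adj[a][b] = adj[a].get(b, 0.0) + w
        adj[b][a] = adj[b].get(a, 0.0) + w
    out_weight = {n: sum(adj[n].values()) for n in nodes}

    scores = dict(pers)
    for _ in range(max_iter):
        # Score held by isolated nodes is redistributed through the teleport vector.
        dangling = sum(scores[n] for n in nodes if out_weight[n] == 0.0)
        new = {}
        for n in nodes:
            incoming = sum(scores[m] * w / out_weight[m] for m, w in adj[n].items())
            new[n] = (1.0 - damping) * pers[n] + damping * (incoming + dangling * pers[n])
        delta = sum(abs(new[n] - scores[n]) for n in nodes)
        scores = new
        if delta < tol:
            break
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy 3-d "phrase embeddings"; real ones would come from a trained embedding model.
    toy = {
        "keyword extraction": [0.9, 0.1, 0.0],
        "phrase embeddings": [0.8, 0.3, 0.1],
        "pagerank": [0.6, 0.2, 0.4],
        "weather": [0.0, 0.1, 0.9],
    }
    toy_edges = [
        ("keyword extraction", "phrase embeddings", 2.0),
        ("phrase embeddings", "pagerank", 1.0),
        ("pagerank", "weather", 1.0),
    ]
    theme = theme_vector(toy, ["keyword extraction"])
    for phrase, score in theme_weighted_pagerank(toy, toy_edges, theme):
        print(f"{phrase:20s} {score:.4f}")
```

In this sketch the theme enters only through the teleport vector, so an off-theme phrase such as "weather" still receives some score through its graph link but ranks last in the toy graph, while well-connected on-theme phrases rise to the top. The paper's actual edge weighting and theme construction may differ.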


