Semantic Sensitive TF-IDF to Determine Word Relevance in Documents

01/06/2020
by Amir Jalilifard, et al.

Keyword extraction has received increasing attention as a research topic that can lead to advances in diverse applications such as document context categorization, text indexing, and document classification. In this paper we propose STF-IDF, a novel semantic method based on TF-IDF, for scoring word importance in the informal documents of a corpus. A set of nearly four million documents from health-care social media was collected and used to train a semantic model and obtain word embeddings. The features of the semantic space were then used to rearrange the original TF-IDF scores through an iterative solution, so as to improve the moderate performance of this algorithm on informal texts. After testing the proposed method with 200 randomly chosen documents, our method managed to decrease the TF-IDF mean error rate by a factor of 50 27.2
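For readers unfamiliar with the baseline that the abstract refers to, the following is a minimal sketch of plain TF-IDF scoring. It is not the STF-IDF method itself, whose semantic re-ranking step is not described here. The function name `tf_idf`, the unsmoothed weighting `tf * log(N / df)`, and the toy documents are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per tokenized document.

    Hypothetical helper: uses relative term frequency and an
    unsmoothed inverse document frequency, log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Toy, made-up documents in the spirit of informal health-care posts.
docs = [
    "my doctor changed my insulin dose".split(),
    "the doctor said the dose is fine".split(),
    "insulin pump alarms keep me awake".split(),
]
scores = tf_idf(docs)
```

With this weighting, a term that appears in every document scores zero, and a term repeated within one document but rare elsewhere scores highest. The scores depend only on counts, so informal spellings or synonyms of the same concept are treated as unrelated terms, which is the kind of limitation a semantic adjustment would aim to address.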

research
07/25/2017

From Image to Text Classification: A Novel Approach based on Clustering Word Embeddings

In this paper, we propose a novel approach for text classification based...
research
01/10/2020

Inductive Document Network Embedding with Topic-Word Attention

Document network embedding aims at learning representations for a struct...
research
11/24/2018

Novelty and Coverage in context-based information filtering

We present a collection of algorithms to filter a stream of documents in...
research
07/27/2018

Clustering Prominent People and Organizations in Topic-Specific Text Corpora

Named entities in text documents are the names of people, organization, ...
research
12/23/2016

"What is Relevant in a Text Document?": An Interpretable Machine Learning Approach

Text documents can be described by a number of abstract concepts such as...
research
10/16/2016

Term-Class-Max-Support (TCMS): A Simple Text Document Categorization Approach Using Term-Class Relevance Measure

In this paper, a simple text categorization method using term-class rele...
research
12/28/2017

Corpus specificity in LSA and Word2vec: the role of out-of-domain documents

Latent Semantic Analysis (LSA) and Word2vec are some of the most widely ...
