Term-Class-Max-Support (TCMS): A Simple Text Document Categorization Approach Using Term-Class Relevance Measure

10/16/2016
by D. S. Guru, et al.

In this paper, a simple text categorization method using term-class relevance measures is proposed. Initially, text documents are processed to extract the significant terms present in them. For every term extracted from a document, we compute its importance in preserving the content of a class through a novel term-weighting scheme known as the Term_Class Relevance (TCR) measure, proposed by Guru and Suhil (2015) [1]. In this way, the relevance of every term to each class in the corpus is computed and stored in a knowledge base. During testing, the terms present in the test document are extracted and the term-class relevance of each term is retrieved from the stored knowledge base. To achieve quick lookup of term weights, a B-tree indexing data structure has been adopted. Finally, the class that receives the maximum support in terms of term-class relevance is assigned as the class of the given test document. The proposed method runs with logarithmic time complexity at testing time and is simpler to implement than other text categorization techniques available in the literature. Experiments conducted on various benchmark datasets show that the performance of the proposed method is satisfactory and encouraging.
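The pipeline described above has three parts: a term-class weight table built at training time, an ordered index for fast lookup, and a max-support decision at test time. It can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The relevance function is a hypothetical stand-in, not the TCR formula of Guru and Suhil (2015). A sorted list searched with `bisect` replaces the B-tree, since both give logarithmic lookups. Tokenization is plain whitespace splitting. The names `build_relevance_table`, `KnowledgeBase`, and `classify` are illustrative.

```python
from bisect import bisect_left
from collections import Counter, defaultdict


def build_relevance_table(docs, labels):
    """Return {term: {class: weight}} from tokenized training documents.

    HYPOTHETICAL stand-in for the TCR measure (not the authors' formula):
    weight(t, c) = share(t, c) * density(t, c), where
      share(t, c)   = count of t in class c / count of t in all classes
      density(t, c) = count of t in class c / total term count of class c
    """
    term_class = defaultdict(Counter)  # term -> {class: occurrences}
    class_total = Counter()            # class -> total term occurrences
    for tokens, label in zip(docs, labels):
        for term in tokens:
            term_class[term][label] += 1
            class_total[label] += 1
    table = {}
    for term, per_class in term_class.items():
        term_total = sum(per_class.values())
        table[term] = {
            c: (n / term_total) * (n / class_total[c])
            for c, n in per_class.items()
        }
    return table


class KnowledgeBase:
    """Sorted term array searched by binary search: O(log V) per lookup.

    A simple in-memory stand-in for the B-tree index used in the paper.
    """

    def __init__(self, table):
        self.terms = sorted(table)
        self.weights = [table[t] for t in self.terms]

    def lookup(self, term):
        i = bisect_left(self.terms, term)
        if i < len(self.terms) and self.terms[i] == term:
            return self.weights[i]
        return {}  # an unseen term contributes no support to any class


def classify(kb, tokens, classes):
    """Assign the class with maximum summed term-class relevance."""
    support = dict.fromkeys(classes, 0.0)
    for term in tokens:
        for c, w in kb.lookup(term).items():
            support[c] += w
    return max(support, key=support.get)


# Toy corpus (hypothetical); whitespace tokens, no stop-word removal.
train = ["goal match striker goal", "election vote party",
         "match referee goal", "vote senate party election"]
labels = ["sports", "politics", "sports", "politics"]
classes = ["sports", "politics"]
kb = KnowledgeBase(build_relevance_table([d.split() for d in train], labels))
print(classify(kb, "goal vote goal".split(), classes))  # sports
```

With V distinct training terms, each lookup costs O(log V), so a test document with m terms needs m lookups plus the per-class accumulation. A disk-oriented B-tree matters mainly when the knowledge base is too large for memory or must accept insertions. For a static in-memory table, a sorted array or a hash map is the simpler choice.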
