A Novel Term_Class Relevance Measure for Text Categorization

08/25/2016
by D. S. Guru, et al.

In this paper, we introduce a new measure called Term_Class relevance, which quantifies how relevant a term is for classifying a document into a particular class. The measure estimates the degree to which a given term supports assigning an unlabeled document to a known class, and is computed as the product of the Class_Term weight and the Class_Term density. The Class_Term weight is the ratio of the number of documents of the class that contain the term to the total number of documents that contain the term. The Class_Term density is the relative density of occurrence of the term in the class with respect to its total occurrence in the entire population. Unlike existing term weighting schemes such as TF-IDF and its variants, the proposed measure takes into account the relative participation of the term across all documents of the class with respect to the entire population. To demonstrate the significance of the proposed measure, experiments were conducted on the 20 Newsgroups dataset. A comparative analysis further brings out the superiority of the proposed measure.
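
The definition in the abstract can be illustrated with a minimal sketch. It assumes that the Class_Term density is the term's occurrence count within the class divided by its occurrence count in the whole corpus, which is one reading of the abstract; the full paper may normalize differently. The function name `term_class_relevance`, the tokenized toy corpus, and the zero returned for unseen terms are all assumptions made for illustration, not the authors' implementation.

```python
def term_class_relevance(docs, labels, term, cls):
    """Product of Class_Term weight and Class_Term density (assumed reading).

    docs   -- list of documents, each a list of tokens
    labels -- class label of each document, parallel to docs
    """
    df_class = df_total = 0  # documents containing the term
    tf_class = tf_total = 0  # occurrences of the term
    for tokens, label in zip(docs, labels):
        count = tokens.count(term)
        if count == 0:
            continue
        df_total += 1
        tf_total += count
        if label == cls:
            df_class += 1
            tf_class += count
    if df_total == 0:
        return 0.0  # assumption: a term absent from the corpus has no relevance
    weight = df_class / df_total    # Class_Term weight
    density = tf_class / tf_total   # Class_Term density (assumed form)
    return weight * density


# Hypothetical toy corpus, for illustration only.
docs = [["ball", "goal", "ball"], ["ball", "vote"], ["vote", "law"]]
labels = ["sport", "politics", "politics"]
score = term_class_relevance(docs, labels, "ball", "sport")
```

In this toy corpus, "ball" appears in two documents, one of which is in the class "sport", so the weight is 1/2; two of its three occurrences fall in that class, so the density is 2/3 and the score is 1/3.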

research
10/16/2016

Term-Class-Max-Support (TCMS): A Simple Text Document Categorization Approach Using Term-Class Relevance Measure

In this paper, a simple text categorization method using term-class rele...
research
09/07/2015

Integrate Document Ranking Information into Confidence Measure Calculation for Spoken Term Detection

This paper proposes an algorithm to improve the calculation of confidenc...
research
02/09/2019

A new simple and effective measure for bag-of-word inter-document similarity measurement

To measure the similarity of two documents in the bag-of-words (BoW) vec...
research
07/12/2023

Testing different Log Bases For Vector Model Weighting Technique

Information retrieval systems retrieves relevant documents based on a qu...
research
03/12/2020

TF-IDFC-RF: A Novel Supervised Term Weighting Scheme

Sentiment Analysis is a branch of Affective Computing usually considered...
research
11/30/2018

Document Structure Measure for Hypernym discovery

Hypernym discovery is the problem of finding terms that have is-a relati...
research
07/17/2020

Scalable Methods for Calculating Term Co-Occurrence Frequencies

Search techniques make use of elementary information such as term freque...
