TF-IDFC-RF: A Novel Supervised Term Weighting Scheme

03/12/2020
by Flavio Carvalho, et al.

Sentiment Analysis is a branch of Affective Computing that is usually treated as a binary classification task. It can be applied in several contexts to classify the attitude expressed in text samples, for example movie reviews and sarcasm. A common way to represent text samples is the Vector Space Model, which computes numerical feature vectors made up of term weights. The most popular term weighting scheme is TF-IDF (Term Frequency - Inverse Document Frequency). It is an Unsupervised Weighting Scheme (UWS), since it does not consider class information when weighting terms. Supervised Weighting Schemes (SWS), by contrast, use class information in the term weight calculation. Several SWS have been proposed recently and have shown better results than TF-IDF. This work presents a comparative study of different term weighting schemes and proposes a novel supervised term weighting scheme named TF-IDFC-RF (Term Frequency - Inverse Document Frequency in Class - Relevance Frequency). The effectiveness of TF-IDFC-RF is validated with SVM (Support Vector Machine) and NB (Naive Bayes) classifiers on four commonly used Sentiment Analysis datasets. TF-IDFC-RF outperforms all other weighting schemes and achieves F1 results of more than 99.9
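
The difference between unsupervised and supervised weighting described in the abstract can be illustrated with a minimal sketch. It is not the TF-IDFC-RF formula, which the abstract does not state. It contrasts textbook TF-IDF with TF-RF, an existing supervised scheme that uses the relevance frequency factor log2(2 + a / max(1, c)), where a and c are the numbers of positive and negative documents containing the term. The natural-log IDF variant, the function names, and the toy corpus are assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Unsupervised weights: tf(t, d) * ln(N / df(t)). Labels are never used."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


def tf_rf(docs, labels, positive=1):
    """Supervised weights: tf(t, d) * log2(2 + a / max(1, c)).

    a = positive-class documents containing t,
    c = negative-class documents containing t.
    """
    pos_df, neg_df = Counter(), Counter()
    for doc, label in zip(docs, labels):
        (pos_df if label == positive else neg_df).update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: tf[t] * math.log2(2 + pos_df[t] / max(1, neg_df[t])) for t in tf}
        )
    return weights


# Hypothetical toy corpus: two positive (1) and two negative (0) reviews.
docs = [
    "great movie great cast".split(),
    "great plot".split(),
    "boring movie".split(),
    "boring plot boring cast".split(),
]
labels = [1, 1, 0, 0]

unsupervised = tf_idf(docs)
supervised = tf_rf(docs, labels)
```

In this toy corpus "great" and "movie" each occur in two of the four documents, so TF-IDF gives them the same per-occurrence factor. The relevance frequency factor is larger for "great", which occurs only in positive documents, than for "movie", which occurs once in each class. That is the kind of class information a supervised scheme can exploit.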


research
12/13/2010

Inverse-Category-Frequency based supervised term weighting scheme for text categorization

Term weighting schemes often dominate the performance of many classifier...
research
10/10/2016

Supervised Term Weighting Metrics for Sentiment Analysis in Short Text

Term weighting metrics assign weights to terms in order to discriminate ...
research
02/17/2019

A Comparative Study of Feature Selection Methods for Dialectal Arabic Sentiment Classification Using Support Vector Machine

Unlike other languages, the Arabic language has a morphological complexi...
research
03/28/2019

Learning to Weight for Text Classification

In information retrieval (IR) and related tasks, term weighting approach...
research
11/13/2017

Targeted Advertising Based on Browsing History

Audience interest, demography, purchase behavior and other possible clas...
research
08/25/2016

A Novel Term_Class Relevance Measure for Text Categorization

In this paper, we introduce a new measure called Term_Class relevance to...
research
04/25/2023

A Novel Dual of Shannon Information and Weighting Scheme

Shannon Information theory has achieved great success in not only commun...
