A Novel Dual of Shannon Information and Weighting Scheme

04/25/2023
by Arthur Jun Zhang, et al.

Shannon information theory has achieved great success not only in communication technology, for which it was originally developed, but also in many other science and engineering fields such as machine learning and artificial intelligence. Inspired by the well-known TF-IDF weighting scheme, we discovered that information entropy has a natural dual. We complement classical Shannon information theory by proposing a novel quantity, namely troenpy. Troenpy measures the certainty, commonness, and similarity of the underlying distribution. To demonstrate its usefulness, we propose a troenpy-based weighting scheme for documents with class labels, namely positive class frequency (PCF). On a collection of public datasets we show that the PCF-based weighting scheme outperforms the classical TF-IDF and a popular Optimal Transport based word mover's distance algorithm in a kNN setting. We further develop a new odds-ratio-type feature, namely the Expected Class Information Bias (ECIB), which can be regarded as the expected odds ratio of the two information quantities, entropy and troenpy. In our experiments we observe that adding the new ECIB features and simple binary term features to a logistic regression model further improves performance significantly. The new weighting scheme and ECIB features are effective and can be computed in linear time.
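The abstract does not spell out the PCF weight or the troenpy formula, so the Python sketch below only contrasts quantities that are either well established or loudly labeled as assumptions: the classical TF-IDF weight and Shannon entropy on one hand, and a hypothetical class-conditional term weight (class_frequency_weights) standing in for the positive-class-frequency idea on the other. The PCF stand-in is an illustrative assumption, not the authors' definition.

```python
# Minimal sketch: classical TF-IDF vs. an assumed class-conditional weight,
# plus Shannon entropy. The PCF-style formula below is a placeholder for
# illustration only; the paper's actual definition is not given in the abstract.
import math
from collections import Counter

def tfidf_weights(docs):
    """Classical TF-IDF: tf(t, d) * log(N / df(t))."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))                      # document frequency per term
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

def class_frequency_weights(docs, labels, positive_label):
    """Hypothetical PCF-style supervised weight (assumption): scale a term's
    frequency by the fraction of positive-class documents containing it."""
    pos_docs = [d for d, y in zip(docs, labels) if y == positive_label]
    pcf = Counter()
    for doc in pos_docs:
        pcf.update(set(doc))                     # positive-class document frequency
    n_pos = max(len(pos_docs), 1)
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * (pcf[t] / n_pos) for t in tf})
    return weights

def entropy(p):
    """Shannon entropy H(p) = -sum_i p_i log p_i (natural log)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# Tiny usage example
docs = [["good", "movie"], ["bad", "movie"], ["good", "plot", "good"]]
labels = [1, 0, 1]
print(tfidf_weights(docs)[0])
print(class_frequency_weights(docs, labels, positive_label=1)[0])
print(entropy([0.5, 0.5]), entropy([0.9, 0.1]))  # higher value = more uncertainty
```

Both weightings scan the corpus a constant number of times, which is at least consistent with the abstract's claim that the proposed quantities can be computed in linear time.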

Related research:

04/25/2023 - A New Information Theory of Certainty for Machine Learning
Claude Shannon coined entropy to quantify the uncertainty of a random di...

03/30/2021 - A genuinely natural information measure
The theoretical measuring of information was famously initiated by Shann...

02/09/2019 - A new simple and effective measure for bag-of-word inter-document similarity measurement
To measure the similarity of two documents in the bag-of-words (BoW) vec...

11/19/2020 - A Theory on AI Uncertainty Based on Rademacher Complexity and Shannon Entropy
In this paper, we present a theoretical discussion on AI deep learning n...

03/12/2020 - TF-IDFC-RF: A Novel Supervised Term Weighting Scheme
Sentiment Analysis is a branch of Affective Computing usually considered...

01/07/2018 - Shannon Information Entropy in Heavy-ion Collisions
The general idea of information entropy provided by C.E. Shannon "hangs ...

02/17/2020 - Large-Scale Evaluation of Shape-Aware Neighborhood Weights and Neighborhood Sizes
Point sets arise naturally in many 3D acquisition processes and have div...
