Robust Feature Selection by Mutual Information Distributions

08/07/2014
by Marco Zaffalon et al.

Mutual information is widely used in artificial intelligence, in a descriptive way, to measure the stochastic dependence of discrete random variables. To address questions such as the reliability of the empirical value, one must consider sample-to-population inferential approaches. This paper deals with the distribution of mutual information, as obtained in a Bayesian framework by a second-order Dirichlet prior distribution. The exact analytical expression for the mean and an analytical approximation of the variance are reported. Asymptotic approximations of the distribution are proposed. The results are applied to the problem of selecting features for incremental learning and classification of the naive Bayes classifier. A fast, newly defined method is shown to outperform the traditional approach based on empirical mutual information on a number of real data sets. Finally, a theoretical development is reported that allows the above methods to be extended to incomplete samples both efficiently and effectively.
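To make the abstract's key quantities concrete, here is a minimal Python sketch. It assumes a symmetric Dirichlet prior that adds a pseudocount to every cell of the feature-class contingency table; under such a prior the posterior mean of mutual information has a closed form in terms of digamma functions, and a leading-order O(1/n) term approximates the variance. The prior value, the function names, and the filter thresholds below are illustrative assumptions, not the paper's notation or settings, and the variance term is one common leading-order approximation rather than necessarily the paper's exact expression.

import numpy as np
from scipy.special import digamma
from scipy.stats import norm

def mi_posterior_moments(counts, alpha=0.5):
    """Posterior mean and leading-order variance of the mutual information
    between two discrete variables, under a symmetric Dirichlet prior that
    adds pseudocount `alpha` to every cell (0.5 is an illustrative choice,
    not the paper's prescription).

    counts : 2-D array of joint co-occurrence counts n_ij.
    """
    nij = np.asarray(counts, dtype=float) + alpha   # posterior cell counts
    n = nij.sum()                                   # posterior total count
    ni = nij.sum(axis=1, keepdims=True)             # row marginals n_{i+}
    nj = nij.sum(axis=0, keepdims=True)             # column marginals n_{+j}

    # Closed-form posterior mean of MI in terms of digamma functions.
    mean = np.sum(nij / n * (digamma(nij + 1) - digamma(ni + 1)
                             - digamma(nj + 1) + digamma(n + 1)))

    # Leading-order variance: Var[I] ~ (K - J^2) / (n + 1), where J and K
    # are the first two moments of the pointwise log-ratio under the
    # posterior-mean distribution (an O(1/n) approximation).
    log_ratio = np.log(nij * n / (ni * nj))
    J = np.sum(nij / n * log_ratio)
    K = np.sum(nij / n * log_ratio ** 2)
    var = (K - J ** 2) / (n + 1)
    return mean, var

def keep_feature(counts, eps=0.01, level=0.95):
    """Robust filter sketch: retain a feature only if, under a Gaussian
    approximation to the posterior of MI, P(I > eps) >= level. Both
    thresholds are illustrative."""
    mean, var = mi_posterior_moments(counts)
    return norm.sf(eps, loc=mean, scale=np.sqrt(max(var, 1e-12))) >= level

Calling keep_feature on each candidate feature's contingency table with the class variable mirrors the sample-to-population idea above: a feature is retained on the strength of its whole MI posterior rather than its point estimate, so a large empirical MI backed by too few samples no longer suffices for inclusion.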


