Variational Information Maximization for Feature Selection

06/09/2016
by   Shuyang Gao, et al.
0

Feature selection is one of the most fundamental problems in machine learning. An extensive body of work on information-theoretic feature selection exists which is based on maximizing mutual information between subsets of features and class labels. Practical methods are forced to rely on approximations due to the difficulty of estimating mutual information. We demonstrate that approximations made by existing methods are based on unrealistic assumptions. We formulate a more flexible and general class of assumptions based on variational distributions and use them to tractably generate lower bounds for mutual information. These bounds define a novel information-theoretic framework for feature selection, which we prove to be optimal under tree graphical models with proper choice of variational distributions. Our experiments demonstrate that the proposed method strongly outperforms existing information-theoretic feature selection approaches.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/24/2015

A Review of Feature Selection Methods Based on Mutual Information

In this work we present a review of the state of the art of information ...
research
06/23/2017

Efficient Approximate Solutions to Mutual Information Based Global Feature Selection

Mutual Information (MI) is often used for feature selection when develop...
research
11/29/2018

Simple stopping criteria for information theoretic feature selection

Information theoretic feature selection aims to select a smallest featur...
research
02/08/2021

Mutual Information of Neural Network Initialisations: Mean Field Approximations

The ability to train randomly initialised deep neural networks is known ...
research
03/28/2019

Information Theoretic Feature Transformation Learning for Brain Interfaces

Objective: A variety of pattern analysis techniques for model training i...
research
06/05/2023

Estimating Conditional Mutual Information for Dynamic Feature Selection

Dynamic feature selection, where we sequentially query features to make ...
research
05/01/2021

Stochastic Mutual Information Gradient Estimation for Dimensionality Reduction Networks

Feature ranking and selection is a widely used approach in various appli...

Please sign up or login with your details

Forgot password? Click here to reset