A Rigorous Information-Theoretic Definition of Redundancy and Relevancy in Feature Selection Based on (Partial) Information Decomposition

05/10/2021
by Patricia Wollstadt, et al.

Selecting a minimal feature set that is maximally informative about a target variable is a central task in machine learning and statistics. Information theory provides a powerful framework for formulating feature-selection algorithms, yet a rigorous, information-theoretic definition of feature relevancy that accounts for feature interactions such as redundant and synergistic contributions is still missing. We argue that this lack is inherent to classical information theory, which does not provide measures to decompose the information a set of variables provides about a target into unique, redundant, and synergistic contributions. Such a decomposition has been introduced only recently by the partial information decomposition (PID) framework. Using PID, we clarify why feature selection is a conceptually difficult problem when approached using information theory, and we provide a novel definition of feature relevancy and redundancy in PID terms. From this definition, we show that the conditional mutual information (CMI) maximizes relevancy while minimizing redundancy, and we propose an iterative, CMI-based algorithm for practical feature selection. We demonstrate the power of our CMI-based algorithm in comparison to the unconditional mutual information on benchmark examples and provide corresponding PID estimates to highlight how PID allows one to quantify the information contributions of features and their interactions in feature-selection problems.
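To make the idea of an iterative, CMI-based selection concrete, the following is a minimal sketch (not the authors' implementation): a plug-in estimator of conditional mutual information for discrete data, and a greedy forward pass that repeatedly adds the feature with the highest CMI with the target given the joint state of the features already selected. All function names (`cmi`, `greedy_cmi_selection`) are illustrative assumptions, and the plug-in frequency estimator is only suitable for small discrete alphabets.

```python
import numpy as np
from collections import Counter

def cmi(x, y, z):
    """Plug-in estimate of I(X; Y | Z) in bits for discrete 1-D arrays.

    Uses I(X;Y|Z) = sum_{x,y,z} p(x,y,z) log2[ p(x,y,z) p(z) / (p(x,z) p(y,z)) ],
    with all probabilities replaced by empirical frequencies.
    """
    n = len(x)
    xyz = Counter(zip(x, y, z))
    xz = Counter(zip(x, z))
    yz = Counter(zip(y, z))
    zc = Counter(z)
    total = 0.0
    for (xi, yi, zi), c in xyz.items():
        p_xyz = c / n
        total += p_xyz * np.log2(p_xyz * zc[zi] / (xz[(xi, zi)] * yz[(yi, zi)] / n))
    return total

def greedy_cmi_selection(X, y, k):
    """Greedy forward selection of up to k features.

    At each step, add the feature with the highest CMI with the target,
    conditioned on the joint state of the features selected so far.
    Conditioning on already-selected features penalizes redundant
    candidates (their CMI drops to zero) while preserving synergistic
    and unique contributions.
    """
    n, d = X.shape
    selected = []
    cond = np.zeros(n, dtype=int)  # joint state of selected features (constant at start)
    for _ in range(k):
        scores = [cmi(X[:, j], y, cond) if j not in selected else -np.inf
                  for j in range(d)]
        best = int(np.argmax(scores))
        if scores[best] <= 0:  # no candidate adds information; stop early
            break
        selected.append(best)
        # fold the new feature into the conditioning state via string encoding
        cond = np.array([f"{c}|{v}" for c, v in zip(cond, X[:, best])])
    return selected
```

As a usage illustration, consider a target determined by two independent bits, with the first bit duplicated as a redundant feature: the greedy CMI pass picks one copy of the first bit, then skips its duplicate (zero CMI given the selection) in favor of the second bit, whereas ranking by unconditional mutual information alone would score both copies equally high.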

