An Information-theoretic Approach to Unsupervised Feature Selection for High-Dimensional Data

10/08/2019
by Shao-Lun Huang, et al.

In this paper, we propose an information-theoretic approach to designing functional representations that extract the hidden common structure shared by a set of random variables. The main idea is to measure the common information among the random variables by Watanabe's total correlation, and then to find the hidden attributes of these random variables such that the common information is reduced the most given these attributes. We show that these attributes can be characterized by an exponential family specified by the eigen-decomposition of a certain pairwise joint distribution matrix. We then adopt the log-likelihood functions for estimating these attributes as the desired functional representations of the random variables, and show that such representations are informative for describing the common structure. Moreover, we design both a multivariate alternating conditional expectation (MACE) algorithm to compute the proposed functional representations for discrete data and a novel neural network training approach for continuous or high-dimensional data. Furthermore, we show that our approach has deep connections to existing techniques, such as Hirschfeld-Gebelein-Rényi (HGR) maximal correlation, linear principal component analysis (PCA), and consistent functional maps, which establishes connections between information theory and machine learning. Finally, the performance of our algorithms is validated by numerical simulations.
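
To make two of the quantities named in the abstract concrete, here is a minimal sketch for small discrete distributions. It is not the paper's MACE algorithm. It only computes Watanabe's total correlation from a joint probability table, and the pairwise HGR maximal correlation using the standard result that, for finite alphabets, it equals the second-largest singular value of the matrix with entries P(x, y) / sqrt(P(x) P(y)). The function names and the assumption of strictly positive marginals are illustrative choices, not taken from the paper.

```python
import numpy as np


def entropy(p):
    """Shannon entropy in nats of a probability array (zero entries are skipped)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))


def total_correlation(joint):
    """Watanabe's total correlation: sum_i H(X_i) - H(X_1, ..., X_d).

    `joint` is a d-dimensional array whose entries sum to 1; axis i indexes
    the values of X_i.
    """
    joint = np.asarray(joint, dtype=float)
    axes = range(joint.ndim)
    marginal_entropies = 0.0
    for i in axes:
        # Sum out every axis except i to get the marginal of X_i.
        other = tuple(a for a in axes if a != i)
        marginal_entropies += entropy(joint.sum(axis=other))
    return marginal_entropies - entropy(joint)


def hgr_maximal_correlation(pxy):
    """HGR maximal correlation of a 2-D joint pmf.

    Assumes all marginal probabilities are strictly positive (illustrative
    restriction, to avoid division by zero).
    """
    pxy = np.asarray(pxy, dtype=float)
    px = pxy.sum(axis=1)
    py = pxy.sum(axis=0)
    b = pxy / np.sqrt(np.outer(px, py))
    singular_values = np.linalg.svd(b, compute_uv=False)
    # The largest singular value is always 1; the next one is the HGR correlation.
    return float(singular_values[1])
```

For example, three identical fair bits have total correlation 3 log 2 - log 2 = 2 log 2 nats, while any product distribution has total correlation 0 and pairwise HGR maximal correlation 0.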


research
04/11/2023

A Third Information-Theoretic Approach to Finite de Finetti Theorems

A new finite form of de Finetti's representation theorem is established ...
research
01/09/2020

D-GCCA: Decomposition-based Generalized Canonical Correlation Analysis for Multiple High-dimensional Datasets

Modern biomedical studies often collect multiple types of high-dimension...
research
06/04/2014

Discovering Structure in High-Dimensional Data Through Correlation Explanation

We introduce a method to learn a hierarchy of successively more abstract...
research
02/20/2018

Geometry of Discrete Copulas

Multivariate discrete distributions are fundamental to modeling. Discret...
research
06/15/2016

Network Maximal Correlation

We introduce Network Maximal Correlation (NMC) as a multivariate measure...
research
04/20/2012

Modeling, dependence, classification, united statistical science, many cultures

Breiman (2001) proposed to statisticians awareness of two cultures: 1. P...
research
05/22/2020

Information-Theoretic Limits for the Matrix Tensor Product

This paper studies a high-dimensional inference problem involving the ma...
