On Interpretable Approaches to Cluster, Classify and Represent Multi-Subspace Data via Minimum Lossy Coding Length based on Rate-Distortion Theory

02/21/2023
by Kai-Liang Lu, et al.

Clustering, classification and representation are three fundamental objectives of learning from high-dimensional data with intrinsic structure. To this end, this paper introduces three interpretable approaches: segmentation (clustering) via the Minimum Lossy Coding Length criterion, classification via the Minimum Incremental Coding Length criterion, and representation via the Maximal Coding Rate Reduction criterion. All three are derived from the lossy data coding and compression framework, which rests on rate-distortion theory in information theory. The algorithms are particularly suitable for finite-sample data (which may be sparse or nearly degenerate) drawn from mixtures of Gaussian distributions or subspaces. The theoretical value and attractive features of these methods are summarized by comparison with other learning methods and evaluation criteria. This summary note aims to provide a theoretical guide for researchers and engineers interested in understanding 'white-box' machine (deep) learning methods.
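As a rough illustration of the representation criterion, the sketch below computes a coding rate and a rate reduction for labeled samples. The abstract gives no formulas, so the expressions used here are an assumption: the form commonly stated for Maximal Coding Rate Reduction, R(Z, eps) = 1/2 log det(I + d/(m eps^2) Z Z^T) for a d x m data matrix Z, with the rate reduction taken as the rate of the whole set minus the sample-weighted rates of the individual classes. The function names and the toy data are hypothetical, and the small Cholesky routine is only there to keep the example free of dependencies.

```python
import math


def gram(Z):
    """Return Z Z^T for Z given as a list of d rows, each holding m samples."""
    d = len(Z)
    return [[sum(a * b for a, b in zip(Z[i], Z[j])) for j in range(d)]
            for i in range(d)]


def logdet_spd(A):
    """Natural-log determinant of a symmetric positive-definite matrix,
    computed from the diagonal of its Cholesky factor."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(s)
                total += 2.0 * math.log(L[i][j])
            else:
                L[i][j] = s / L[j][j]
    return total


def coding_rate(Z, eps):
    """Assumed form: R(Z, eps) = 1/2 logdet(I + d / (m eps^2) Z Z^T)."""
    d, m = len(Z), len(Z[0])
    G = gram(Z)
    scale = d / (m * eps ** 2)
    A = [[scale * G[i][j] + (1.0 if i == j else 0.0) for j in range(d)]
         for i in range(d)]
    return 0.5 * logdet_spd(A)


def rate_reduction(Z, labels, eps):
    """Assumed form: R(Z) minus the sum over classes of (m_j / m) R(Z_j)."""
    m = len(Z[0])
    total = coding_rate(Z, eps)
    for c in sorted(set(labels)):
        cols = [k for k, lab in enumerate(labels) if lab == c]
        Zc = [[row[k] for k in cols] for row in Z]
        total -= (len(cols) / m) * coding_rate(Zc, eps)
    return total


# Hypothetical toy data: four samples in R^2 lying on two orthogonal axes.
Z = [[1.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0]]
matched = rate_reduction(Z, [0, 0, 1, 1], eps=0.5)  # labels follow the axes
mixed = rate_reduction(Z, [0, 1, 0, 1], eps=0.5)    # labels mix the axes
print(matched, mixed)
```

On this toy input, labels that follow the two orthogonal directions give a positive rate reduction, while labels that mix the directions give approximately zero, which matches the intuition that the criterion favors classes occupying distinct subspaces.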

research
06/15/2020

Learning Diverse and Discriminative Representations via the Principle of Maximal Coding Rate Reduction

To learn intrinsic low-dimensional structures from high-dimensional data...
research
11/30/2018

Rate-Distortion-Perception Tradeoff of Variable-Length Source Coding for General Information Sources

Blau and Michaeli recently introduced a novel concept for inverse proble...
research
02/14/2022

An Introduction to Neural Data Compression

Neural compression is the application of neural networks and other machi...
research
05/26/2023

Rate-Distortion Theory in Coding for Machines and its Application

Recent years have seen a tremendous growth in both the capability and po...
research
03/15/2021

Data Discovery Using Lossless Compression-Based Sparse Representation

Sparse representation has been widely used in data compression, signal a...
research
04/10/2022

On the Cleaning Lemma of Quantum Coding Theory

The term "Cleaning Lemma" refers to a family of similar propositions tha...
research
03/31/2022

Efficient Maximal Coding Rate Reduction by Variational Forms

The principle of Maximal Coding Rate Reduction (MCR^2) has recently been...
