Supervised Dimensionality Reduction and Visualization using Centroid-encoder

02/27/2020
by Tomojit Ghosh, et al.

Visualizing high-dimensional data is an essential task in Data Science and Machine Learning. The Centroid-Encoder (CE) method is similar to the autoencoder but incorporates label information to keep objects of a class close together in the reduced visualization space. CE exploits nonlinearity and labels to encode high variance in low dimensions while capturing the global structure of the data. We present a detailed analysis of the method using a wide variety of data sets and compare it with other supervised dimension reduction techniques, including NCA, nonlinear NCA, t-distributed NCA, t-distributed MCML, supervised UMAP, supervised PCA, Colored Maximum Variance Unfolding, supervised Isomap, Parametric Embedding, supervised Neighbor Retrieval Visualizer, and Multiple Relational Embedding. We empirically show that Centroid-Encoder outperforms most of these techniques. We also show that when the data variance is spread across multiple modalities, Centroid-Encoder extracts a significant amount of information from the data in a low-dimensional space. This key feature establishes its value as a tool for data visualization.
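
The abstract says CE resembles an autoencoder but uses labels to pull the samples of a class together. One way to read that is to keep the encoder-decoder network and replace its reconstruction target, so that each sample is mapped to the centroid of its class rather than to itself. The sketch below illustrates only that target construction and the resulting squared-error cost. The abstract does not spell out the loss, so this reading is an assumption, and the names `centroid_targets` and `centroid_encoder_loss` are hypothetical rather than taken from the authors' code.

```python
import numpy as np


def centroid_targets(X, y):
    """Return, for each row of X, the centroid (mean) of that row's class."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    targets = np.empty_like(X)
    for label in np.unique(y):
        mask = y == label
        targets[mask] = X[mask].mean(axis=0)
    return targets


def centroid_encoder_loss(reconstruct, X, y):
    """Mean squared distance between reconstruct(X) and the class centroids.

    `reconstruct` stands in for any encoder-decoder map (assumed here to be
    a callable taking and returning an array shaped like X).
    """
    X = np.asarray(X, dtype=float)
    diff = reconstruct(X) - centroid_targets(X, y)
    return float(np.mean(np.sum(diff ** 2, axis=1)))


# Toy data: two classes in 3-D, each with two samples.
X = np.array([
    [0.0, 0.0, 0.0],
    [2.0, 0.0, 0.0],
    [10.0, 10.0, 10.0],
    [12.0, 10.0, 10.0],
])
y = np.array([0, 0, 1, 1])

# An identity map reproduces each sample, so it pays the within-class spread.
identity_loss = centroid_encoder_loss(lambda Z: Z, X, y)

# A map that already outputs the class centroids has zero cost.
perfect_loss = centroid_encoder_loss(lambda Z: centroid_targets(Z, y), X, y)
```

In a full model, `reconstruct` would be a neural network with a two- or three-dimensional bottleneck trained to minimize this cost, and the bottleneck activations would serve as the visualization coordinates.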


