Opening the black-box of Neighbor Embedding with Hotelling's T2 statistic and Q-residuals

09/05/2022
by Roman Josef Rainer, et al.

In contrast to classical techniques for exploratory analysis of high-dimensional data sets, such as principal component analysis (PCA), neighbor embedding (NE) techniques tend to better preserve the local structure (topology) of high-dimensional data. However, this ability comes at the expense of interpretability: techniques such as t-Distributed Stochastic Neighbor Embedding (t-SNE) or Uniform Manifold Approximation and Projection (UMAP) give no insight into which input variables underlie the topological (cluster) structure seen in the corresponding embedding. Here we propose several "tricks" from the chemometrics field, based on PCA, Q-residuals, and Hotelling's T2 contributions, combined with novel visualization approaches, to derive local and global explanations of neighbor embeddings. We show that our approach can identify discriminatory features between groups of data points that remain unnoticed when NEs are explored with standard univariate or multivariate approaches.
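
The abstract names Hotelling's T2 and Q-residuals without defining them. As a rough illustration only, the following Python sketch computes both quantities from a PCA model using their textbook chemometrics definitions. It is not the authors' pipeline: the function name `pca_t2_q`, the use of an SVD-based PCA, and the choice of squared residuals as per-variable Q contributions are assumptions made for this example, and how the paper applies these statistics to groups of points in an embedding is not described here.

```python
import numpy as np


def pca_t2_q(X, n_components):
    """Hypothetical helper: per-sample Hotelling's T2 and Q-residuals from PCA.

    X is an (n_samples, n_features) array. Returns (t2, q, q_contrib), where
    q_contrib holds the squared residual of every variable for every sample.
    """
    # Mean-center the data; PCA is fitted on the centered matrix.
    Xc = X - X.mean(axis=0)

    # PCA via singular value decomposition of the centered data.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T            # (n_features, k)
    scores = Xc @ loadings                    # (n_samples, k)

    # Variance captured by each retained component.
    eigvals = s[:n_components] ** 2 / (X.shape[0] - 1)

    # Hotelling's T2: variance-scaled squared distance inside the PCA model.
    t2 = np.sum(scores ** 2 / eigvals, axis=1)

    # Q-residuals: squared distance from each sample to the PCA model plane.
    residuals = Xc - scores @ loadings.T
    q_contrib = residuals ** 2                # assumed per-variable contribution
    q = q_contrib.sum(axis=1)

    return t2, q, q_contrib
```

Under these assumptions, a sample with a large T2 is unusual within the variation the model captures, while a large Q-residual marks a sample the model does not describe well; the columns of `q_contrib` indicate which input variables drive that mismatch.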

research
04/12/2021

Deep Recursive Embedding for High-Dimensional Data

t-distributed stochastic neighbor embedding (t-SNE) is a well-establishe...
research
02/18/2021

Joint Characterization of Multiscale Information in High Dimensional Data

High dimensional data can contain multiple scales of variance. Analysis ...
research
10/12/2015

Towards Meaningful Maps of Polish Case Law

In this work, we analyze the utility of two dimensional document maps fo...
research
08/29/2023

Tuning the perplexity for and computing sampling-based t-SNE embeddings

Widely used pipelines for the analysis of high-dimensional data utilize ...
research
11/04/2019

Visualization of Multi-Objective Switched Reluctance Machine Optimization at Multiple Operating Conditions with t-SNE

The optimization of electric machines at multiple operating points is cr...
research
11/03/2018

Stochastic Neighbor Embedding under f-divergences

The t-distributed Stochastic Neighbor Embedding (t-SNE) is a powerful an...
research
05/28/2018

Linear tSNE optimization for the Web

The t-distributed Stochastic Neighbor Embedding (tSNE) algorithm has bec...
