Tuning the perplexity for and computing sampling-based t-SNE embeddings

08/29/2023
by Martin Skrodzki, et al.

Widely used pipelines for the analysis of high-dimensional data utilize two-dimensional visualizations. These are created, e.g., via t-distributed stochastic neighbor embedding (t-SNE). On large data sets, these visualization techniques produce suboptimal embeddings, because the usual hyperparameter settings are not suitable for large data. Simply increasing these parameters usually does not work, as the computations become too expensive for practical workflows. In this paper, we argue that a sampling-based embedding approach can circumvent these problems. We show that hyperparameters must be chosen carefully, depending on the sampling rate and the intended final embedding. Further, we show how this approach speeds up the computation and increases the quality of the embeddings.
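To make the dependence on the sampling rate concrete, the following minimal sketch draws a uniform random sample and adjusts the perplexity to it. The proportional scaling rule, the function names `sample_indices` and `scaled_perplexity`, and the `floor` parameter are illustrative assumptions and are not taken from the paper: the reasoning is only that a neighborhood of P points in the full data contains roughly rate × P points after sampling.

```python
import random


def sample_indices(n_points, sampling_rate, seed=0):
    """Draw a uniform random subset of point indices without replacement."""
    if not 0.0 < sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be in (0, 1]")
    n_sample = max(1, round(n_points * sampling_rate))
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_points), n_sample))


def scaled_perplexity(target_perplexity, sampling_rate, floor=2.0):
    """Assumed heuristic (not the paper's rule): shrink the perplexity in
    proportion to the sampling rate, but never below a small floor."""
    return max(floor, target_perplexity * sampling_rate)


# Hypothetical setting: 100,000 points, 10% sample, and a perplexity of 500
# intended for the full embedding.
idx = sample_indices(n_points=100_000, sampling_rate=0.1)
perp = scaled_perplexity(target_perplexity=500.0, sampling_rate=0.1)
```

The sampled rows and the adjusted perplexity could then be passed to any t-SNE implementation; how the remaining points are placed afterwards is left open here.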

research
09/05/2022

Opening the black-box of Neighbor Embedding with Hotelling's T2 statistic and Q-residuals

In contrast to classical techniques for exploratory analysis of high-dim...
research
06/10/2011

A Computational Framework for Nonlinear Dimensionality Reduction of Large Data Sets: The Exploratory Inspection Machine (XIM)

In this paper, we present a novel computational framework for nonlinear ...
research
02/24/2022

SQuadMDS: a lean Stochastic Quartet MDS improving global structure preservation in neighbor embedding like t-SNE and UMAP

Multidimensional scaling is a statistical process that aims to embed hig...
research
11/26/2019

FCA2VEC: Embedding Techniques for Formal Concept Analysis

Embedding large and high dimensional data into low dimensional vector sp...
research
07/01/2022

Assessing the Effects of Hyperparameters on Knowledge Graph Embedding Quality

Embedding knowledge graphs into low-dimensional spaces is a popular meth...
research
01/25/2023

Deep Generative Neural Embeddings for High Dimensional Data Visualization

We propose a visualization technique that utilizes neural network embedd...
research
12/05/2017

Optimal Fast Johnson-Lindenstrauss Embeddings for Large Data Sets

We introduce a new fast construction of a Johnson-Lindenstrauss matrix b...
