Towards a comprehensive visualization of structure in data

11/30/2021
by Joan Garriga, et al.

Dimensionality reduction methods are fundamental to exploring and visualizing large data sets. The basic requirements for unsupervised data exploration are simplicity, flexibility and scalability. However, current methods have complex parameterizations and severe computational limitations when exploring large data structures across scales. Here, we focus on the t-SNE algorithm and show that a simplified parameter setup with a single control parameter, namely the perplexity, can effectively balance local and global data structure visualization. We also designed a chunk&mix protocol to efficiently parallelize t-SNE and explore data structure across a much wider range of scales than is currently possible. Our parallel version of BH-tSNE, namely pt-SNE, converges to a good global embedding, comparable to state-of-the-art solutions, though the chunk&mix protocol adds a little noise and decreases accuracy at the local scale. Nonetheless, we show that simple post-processing can efficiently restore local-scale visualization without any loss of precision at the global scale. We expect the same approach to apply to embedding algorithms faster than BH-tSNE, such as FIt-SNE or UMAP, thus extending the state of the art and leading to more comprehensive data structure visualization and analysis.
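To make the role of the perplexity concrete, the following is a minimal sketch of the standard t-SNE calibration step, in which each point's Gaussian kernel width is tuned until the perplexity of its neighbour distribution matches the requested value. It is not taken from pt-SNE; the function names (`conditional_probs`, `perplexity_of`, `calibrate_beta`) and the tolerance settings are assumptions chosen for illustration.

```python
import math


def conditional_probs(sq_dists, beta):
    """Gaussian conditional probabilities p(j|i) for a single point i.

    sq_dists: squared distances from point i to every other point.
    beta: kernel precision, i.e. 1 / (2 * sigma**2).
    """
    d_min = min(sq_dists)  # shift for numerical stability; cancels on normalisation
    weights = [math.exp(-beta * (d - d_min)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]


def perplexity_of(probs):
    """Perplexity is the exponential of the Shannon entropy (in nats)."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return math.exp(entropy)


def calibrate_beta(sq_dists, target_perplexity, tol=1e-5, max_iter=200):
    """Bisection search for the beta whose perplexity matches the target.

    Hypothetical helper: assumes 1 < target_perplexity < len(sq_dists).
    """
    lo, hi = 0.0, None  # hi stays None until an upper bound has been found
    beta = 1.0
    for _ in range(max_iter):
        perp = perplexity_of(conditional_probs(sq_dists, beta))
        if abs(perp - target_perplexity) < tol:
            break
        if perp > target_perplexity:
            # Distribution too flat: sharpen the kernel by raising beta.
            lo = beta
            beta = beta * 2.0 if hi is None else (beta + hi) / 2.0
        else:
            # Distribution too peaked: widen the kernel by lowering beta.
            hi = beta
            beta = (lo + beta) / 2.0
    return beta


if __name__ == "__main__":
    sq_dists = [1.0, 4.0, 9.0, 16.0, 25.0]  # toy distances, not real data
    for target in (2.0, 4.0):
        beta = calibrate_beta(sq_dists, target)
        print(target, beta, perplexity_of(conditional_probs(sq_dists, beta)))
```

A larger perplexity yields a smaller `beta`, that is, a wider kernel that weighs more distant neighbours. This is the sense in which one parameter can shift the emphasis from local to global structure.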


