Towards a comprehensive visualization of structure in data

11/30/2021
by Joan Garriga, et al.
CSIC

Dimensionality reduction methods are fundamental for exploring and visualizing large data sets. Basic requirements for unsupervised data exploration are simplicity, flexibility and scalability. However, current methods have complex parameterizations and strong computational limitations when exploring large data structures across scales. Here, we focus on the t-SNE algorithm and show that a simplified parameter setup with a single control parameter, namely the perplexity, can effectively balance local and global data structure visualization. We also design a chunk&mix protocol to efficiently parallelize t-SNE and explore data structure across a much wider range of scales than currently possible. Our parallel version of BH-tSNE, namely pt-SNE, converges to a good global embedding, comparable to state-of-the-art solutions, though the chunk&mix protocol adds a little noise and decreases accuracy at the local scale. Nonetheless, we show that simple post-processing can efficiently restore local-scale visualization without any loss of precision at the global scale. We expect the same approach to apply to embedding algorithms faster than BH-tSNE, such as FIt-SNE or UMAP, thus extending the state of the art and leading to more comprehensive data structure visualization and analysis.
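
Since the perplexity is the single control parameter discussed above, it may help to see how it is defined in standard t-SNE. The following is a minimal sketch, not the pt-SNE implementation: it assumes the usual Gaussian affinities and the usual definition of perplexity as 2 raised to the Shannon entropy, and the function names (`conditional_probs`, `perplexity`, `find_beta`) are hypothetical, chosen only for illustration.

```python
import math


def conditional_probs(sq_dists, beta):
    """Gaussian affinities p_{j|i} of one point to its neighbours.

    sq_dists: squared distances from point i to every other point.
    beta: precision of the Gaussian kernel, beta = 1 / (2 * sigma**2).
    """
    # Shifting by the minimum distance avoids underflow; it cancels on normalization.
    d_min = min(sq_dists)
    weights = [math.exp(-beta * (d - d_min)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]


def perplexity(probs):
    """Perplexity = 2 ** H, where H is the Shannon entropy in bits."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0.0)
    return 2.0 ** entropy


def find_beta(sq_dists, target, tol=1e-5, max_iter=200):
    """Bisection on beta so that the perplexity of p_{j|i} matches `target`.

    Perplexity decreases as beta grows, so the search doubles beta until an
    upper bound is found and then bisects. `target` is assumed to lie
    strictly between 1 and the number of neighbours.
    """
    lo, hi = 0.0, None
    beta = 1.0
    for _ in range(max_iter):
        perp = perplexity(conditional_probs(sq_dists, beta))
        if abs(perp - target) < tol:
            break
        if perp > target:
            # Distribution too flat: a narrower kernel (larger beta) is needed.
            lo = beta
            beta = beta * 2.0 if hi is None else (lo + hi) / 2.0
        else:
            # Distribution too peaked: a wider kernel (smaller beta) is needed.
            hi = beta
            beta = (lo + hi) / 2.0
    return beta


if __name__ == "__main__":
    dists = [1.0, 4.0, 9.0, 16.0, 25.0]  # illustrative squared distances
    beta = find_beta(dists, target=3.0)
    print(beta, perplexity(conditional_probs(dists, beta)))
```

Read this way, the perplexity acts as a smooth count of effective neighbours per point: small values emphasize local structure and large values emphasize global structure, which is the balance the abstract refers to.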
