Visualizing the Finer Cluster Structure of Large-Scale and High-Dimensional Data

07/17/2020
by Yu Liang, et al.

Dimension reduction and visualization of high-dimensional data have become important research topics because of the rapid growth of large databases in data science. In this paper, we propose using a generalized sigmoid function to model the distance similarity in both the high- and low-dimensional spaces. In particular, a parameter b is introduced into the generalized sigmoid function in the low-dimensional space, so that the heaviness of the function's tail can be adjusted by changing the value of b. Using both simulated and real-world data sets, we show that the proposed method generates visualization results comparable to those of uniform manifold approximation and projection (UMAP), a recently developed manifold learning technique with fast running speed, better global structure, and scalability to massive data sets. In addition, depending on the purpose of the study and the structure of the data, b can be decreased to reveal the finer cluster structure of the data or increased to maintain the neighborhood continuity of the embedding for better visualization. Finally, we use domain knowledge to demonstrate that the finer subclusters revealed with small values of b are meaningful.
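
To make the role of a tail parameter concrete, here is a minimal sketch in Python. The abstract does not state the paper's generalized sigmoid function, so the kernel below is an assumption chosen only for illustration: a heavy-tailed family of the form (1 + d^2/b)^(-b), in which b = 1 gives the familiar Cauchy-type kernel and smaller b gives a heavier tail. The function name `heavy_tail_kernel` is hypothetical.

```python
import numpy as np

def heavy_tail_kernel(d, b):
    """Illustrative low-dimensional similarity with tail parameter b.

    ASSUMPTION: this is not the paper's generalized sigmoid function,
    which the abstract does not give. It is a stand-in kernel,
    (1 + d^2 / b) ** (-b), used only to show how one parameter can
    control the heaviness of the tail.
    """
    d = np.asarray(d, dtype=float)
    return (1.0 + d**2 / b) ** (-b)

# Compare similarities at a few embedding distances for several b values.
distances = np.array([0.0, 1.0, 5.0, 10.0])
for b in (0.5, 1.0, 5.0):
    print(f"b={b}:", heavy_tail_kernel(distances, b))
```

In this stand-in family, the similarity at a fixed large distance grows as b shrinks, so distant points still exert noticeable attraction on each other. That matches the direction described in the abstract, where small values of b are associated with finer cluster structure, but the exact behavior of the paper's function may differ.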

12/02/2020

q-SNE: Visualizing Data using q-Gaussian Distributed Stochastic Neighbor Embedding

The dimensionality reduction has been widely introduced to use the high-...
12/28/2016

Optimal bandwidth estimation for a fast manifold learning algorithm to detect circular structure in high-dimensional data

We provide a way to infer about existence of topological circularity in ...
11/30/2021

Towards a comprehensive visualization of structure in data

Dimensional data reduction methods are fundamental to explore and visual...
08/05/2015

Dimension Reduction with Non-degrading Generalization

Visualizing high dimensional data by projecting them into two or three d...
02/19/2018

Entropy-Isomap: Manifold Learning for High-dimensional Dynamic Processes

Scientific and engineering processes produce massive high-dimensional da...
02/15/2019

Heavy-tailed kernels reveal a finer cluster structure in t-SNE visualisations

T-distributed stochastic neighbour embedding (t-SNE) is a widely used da...
09/29/2019

Capacity Preserving Mapping for High-dimensional Data Visualization

We provide a rigorous mathematical treatment to the crowding issue in da...