Visualizing and Exploring Dynamic High-Dimensional Datasets with LION-tSNE

08/16/2017
by Andrey Boytsov, et al.

T-distributed stochastic neighbor embedding (tSNE) is a popular and prize-winning approach for dimensionality reduction and visualization of high-dimensional data. However, tSNE is non-parametric: once a visualization is built, tSNE is not designed to incorporate additional data into the existing representation. This severely limits the applicability of tSNE in scenarios where data are added or updated over time (such as dashboards or series of data snapshots). In this paper we propose, analyze and evaluate LION-tSNE (Local Interpolation with Outlier coNtrol), a novel approach for incorporating new data into an existing tSNE representation. LION-tSNE is based on local interpolation in the vicinity of training data, outlier detection, and a special outlier mapping algorithm. We show that the LION-tSNE method is robust both to outliers and to new samples from existing clusters. We also discuss multiple possible improvements for special cases. We compare LION-tSNE to a comprehensive list of possible benchmark approaches that includes multiple interpolation techniques, gradient descent for new data, and neural network approximation.
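
The three ingredients named in the abstract (local interpolation near training data, outlier detection, and a separate mapping for outliers) can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `embed_new_point`, the parameters `radius`, `power` and `margin`, the choice of inverse distance weighting as the interpolator, and the rule of placing outliers just beyond the corner of the embedding's bounding box are all assumptions made for illustration only.

```python
import numpy as np


def embed_new_point(x, X_train, Y_train, radius, power=2.0, margin=1.0):
    """Place one new sample into an existing low-dimensional embedding.

    x        : new sample in the original space, shape (d,)
    X_train  : training samples in the original space, shape (n, d)
    Y_train  : their existing embedding (e.g. from tSNE), shape (n, k)
    radius   : hypothetical neighbourhood radius in the original space
    Returns (y, is_outlier).
    """
    x = np.asarray(x, dtype=float)
    X_train = np.asarray(X_train, dtype=float)
    Y_train = np.asarray(Y_train, dtype=float)

    dists = np.linalg.norm(X_train - x, axis=1)
    near = dists <= radius

    # Outlier detection (assumed rule): no training sample within `radius`.
    if not near.any():
        # Placeholder outlier mapping: put the point just outside the
        # bounding box of the embedding so it cannot land inside a cluster.
        return Y_train.max(axis=0) + margin, True

    # Exact match with a training sample: reuse its embedded position.
    closest = np.argmin(dists)
    if dists[closest] == 0.0:
        return Y_train[closest].copy(), False

    # Local interpolation: inverse distance weighting over the neighbours only.
    weights = dists[near] ** (-power)
    y = (weights[:, None] * Y_train[near]).sum(axis=0) / weights.sum()
    return y, False
```

Restricting the interpolation to samples inside `radius` is what makes it local: a new sample from an existing cluster lands among its neighbours, while a sample far from all training data is flagged instead of being pulled toward an unrelated cluster.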

