Improving Stochastic Neighbour Embedding fundamentally with a well-defined data-dependent kernel

06/24/2019
by Ye Zhu, et al.

We identify a fundamental issue in the popular Stochastic Neighbour Embedding methods (SNE and t-SNE): the "learned" similarity of any two points in high-dimensional space is not defined and cannot be computed. This underlies two previously unexplored issues in the algorithm, which have undermined the quality of its final visualisation output and its ability to process large datasets: (a) the reference probability in high-dimensional space is set based on entropy, which has an undefined relation to local density; and (b) a data-independent kernel is used, which leads to the need to determine n bandwidths for a dataset of n points. This paper establishes a principle for setting the reference probability via a data-dependent kernel whose well-defined kernel characteristic is linked directly to local density. A solution based on a recent data-dependent kernel called Isolation Kernel addresses the fundamental issue as well as its two ensuing issues. As a result, it significantly improves the quality of the final visualisation output and removes one obstacle that prevents t-SNE from processing large datasets. The solution is extremely simple: replace the existing data-independent kernel with Isolation Kernel and leave the rest of the t-SNE procedure unchanged.
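To make the kernel swap concrete, here is a minimal sketch rather than the authors' implementation. The abstract does not say how Isolation Kernel is computed, so the sketch assumes a nearest-neighbour (Voronoi) partitioning variant: in each of t rounds, psi points are sampled as cell centres, and the similarity of two points is the fraction of rounds in which they fall in the same cell. The function names, the parameters `psi` and `t`, and the row-normalisation into reference probabilities are illustrative assumptions.

```python
import numpy as np


def isolation_kernel_matrix(X, psi=8, t=100, seed=0):
    """Estimate a data-dependent similarity matrix (assumed Voronoi variant).

    X    : (n, d) array of points
    psi  : number of sampled centres per partitioning (hypothetical name)
    t    : number of random partitionings (hypothetical name)
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    K = np.zeros((n, n))
    for _ in range(t):
        # Sample psi distinct points from the data as cell centres.
        centres = X[rng.choice(n, size=psi, replace=False)]
        # Squared Euclidean distance from every point to every centre.
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        # Each point belongs to the cell of its nearest centre.
        cell = d2.argmin(axis=1)
        # Count the pairs that share a cell in this partitioning.
        K += cell[:, None] == cell[None, :]
    return K / t


def reference_probabilities(K):
    """Row-normalise similarities, excluding self-similarity (assumed step)."""
    P = K.copy()
    np.fill_diagonal(P, 0.0)
    row_sums = P.sum(axis=1, keepdims=True)
    # Rows with no similar neighbour are left as all zeros.
    return np.divide(P, row_sums, out=np.zeros_like(P), where=row_sums > 0)
```

In this sketch a single `psi` is shared by the whole dataset, in contrast to the n per-point bandwidths mentioned in the abstract. Cells tend to be smaller where the sample is dense, which is one way a similarity can depend on local density.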


research
09/24/2020

Isolation Distributional Kernel: A New Tool for Point Group Anomaly Detection

We introduce Isolation Distributional Kernel as a new way to measure the...
research
07/13/2023

Kernel t-distributed stochastic neighbor embedding

This paper presents a kernelized version of the t-SNE algorithm, capable...
research
06/30/2019

Nearest-Neighbour-Induced Isolation Similarity and its Impact on Density-Based Clustering

A recent proposal of data dependent similarity called Isolation Kernel/S...
research
10/12/2020

The Impact of Isolation Kernel on Agglomerative Hierarchical Clustering Algorithms

Agglomerative hierarchical clustering (AHC) is one of the popular cluste...
research
07/28/2016

Kernel functions based on triplet comparisons

Given only information in the form of similarity triplets "Object A is m...
research
09/16/2022

Robust Inference of Manifold Density and Geometry by Doubly Stochastic Scaling

The Gaussian kernel and its traditional normalizations (e.g., row-stocha...
research
07/02/2019

Isolation Kernel: The X Factor in Efficient and Effective Large Scale Online Kernel Learning

Large scale online kernel learning aims to build an efficient and scalab...
