Efficient Algorithms for t-distributed Stochastic Neighborhood Embedding

12/25/2017
by George C. Linderman, et al.

t-distributed Stochastic Neighborhood Embedding (t-SNE) is a method for dimensionality reduction and visualization that has become widely popular in recent years. Efficient implementations of t-SNE are available, but they scale poorly to datasets with hundreds of thousands to millions of high-dimensional data points. We present Fast Fourier Transform-accelerated Interpolation-based t-SNE (FIt-SNE), which dramatically accelerates the computation of t-SNE. The most time-consuming step of t-SNE is a convolution, which we accelerate by interpolating onto an equispaced grid and then using the fast Fourier transform to perform the convolution. We also optimize the computation of input similarities in high dimensions using multi-threaded approximate nearest neighbors. We further present a modification to t-SNE called "late exaggeration," which allows for easier identification of clusters in t-SNE embeddings. Finally, for datasets that cannot be loaded into memory, we present out-of-core randomized principal component analysis (oocPCA), which computes the top principal components of a dataset without ever fully loading the matrix, allowing t-SNE of large datasets to be computed on resource-limited machines.
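
To make the interpolate-then-FFT idea concrete, the following is a minimal one-dimensional sketch and not the FIt-SNE implementation. It assumes linear (hat-function) interpolation onto a single equispaced grid, which is a simplification chosen for brevity, and it uses the Cauchy kernel 1/(1 + d^2) that corresponds to the Student-t similarity in the t-SNE embedding. The function names `kernel_sum_direct` and `kernel_sum_fft` and the parameter `n_grid` are hypothetical and introduced only for illustration.

```python
import numpy as np


def cauchy_kernel(d):
    """Student-t style kernel 1 / (1 + d^2)."""
    return 1.0 / (1.0 + d * d)


def kernel_sum_direct(y, q):
    """Reference O(N^2) evaluation of phi_i = sum_j K(y_i - y_j) * q_j."""
    diff = y[:, None] - y[None, :]
    return cauchy_kernel(diff) @ q


def kernel_sum_fft(y, q, n_grid=256):
    """Approximate the same sums via an equispaced grid and the FFT.

    Assumption: linear interpolation between neighboring grid nodes.
    """
    lo, hi = y.min(), y.max()
    if hi == lo:  # degenerate case: all points coincide
        hi = lo + 1.0
    h = (hi - lo) / (n_grid - 1)

    # Locate each point between grid nodes `left` and `left + 1`.
    t = (y - lo) / h
    left = np.minimum(t.astype(int), n_grid - 2)
    frac = t - left

    # Step 1: spread the weights q onto the grid nodes.
    w = np.zeros(n_grid)
    np.add.at(w, left, (1.0 - frac) * q)
    np.add.at(w, left + 1, frac * q)

    # Step 2: node-to-node interaction is a convolution, because the
    # kernel depends only on the node offset (a - b) * h. Zero-padding
    # to the full linear-convolution length avoids circular wrap-around.
    offsets = np.arange(-(n_grid - 1), n_grid) * h
    kern = cauchy_kernel(offsets)
    n_fft = 3 * n_grid - 2
    full = np.fft.irfft(np.fft.rfft(kern, n_fft) * np.fft.rfft(w, n_fft), n_fft)
    v = full[n_grid - 1 : 2 * n_grid - 1]

    # Step 3: interpolate the grid potentials back to the points.
    return (1.0 - frac) * v[left] + frac * v[left + 1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.normal(size=2000)
    q = np.ones_like(y)
    exact = kernel_sum_direct(y, q)
    approx = kernel_sum_fft(y, q, n_grid=512)
    print("max relative error:", np.max(np.abs(approx - exact) / np.abs(exact)))
```

In this sketch the direct sum costs O(N^2), while the grid version costs O(N) for spreading and interpolation plus O(M log M) for the FFT on M grid nodes. The accuracy is controlled by the grid spacing; a higher-order interpolation scheme would reach a given accuracy with fewer nodes.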
