Statistical embedding: Beyond principal components

06/03/2021
by Dag Tjøstheim, et al.

There has been intense recent activity in the embedding of very high-dimensional and nonlinear data structures, much of it in the data science and machine learning literature. We survey this activity in four parts. In the first part we cover nonlinear methods such as principal curves, multidimensional scaling, local linear methods, ISOMAP, graph-based methods and kernel-based methods. The second part is concerned with topological embedding methods, in particular mapping topological properties into persistence diagrams. Another type of data set that has grown tremendously is very high-dimensional network data. The task considered in part three is how to embed such data in a vector space of moderate dimension, making the data amenable to traditional techniques such as clustering and classification. The final part of the survey deals with embedding in ℝ², that is, visualization. Three methods are presented, t-SNE, UMAP and LargeVis, based on methods in parts one, two and three, respectively. The methods are illustrated and compared on two simulated data sets: one consisting of a triple of noisy Ranunculoid curves, and one consisting of networks of increasing complexity with two types of nodes.
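Two of the first-part methods, multidimensional scaling and ISOMAP, fit together naturally: ISOMAP builds a k-nearest-neighbour graph, approximates geodesic distances by shortest paths in that graph, and then applies classical MDS to those distances. The NumPy sketch below illustrates that pipeline on a single noisy ranunculoid (the five-cusped epicycloid x = 6cos t − cos 6t, y = 6sin t − sin 6t). The helper names, the sample size of 300 points, the noise level 0.05 and the choice of 10 neighbours are illustrative assumptions; this is not the authors' code, and it does not reproduce their simulated data set of three curves.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: embed a distance matrix D into R^k."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centred Gram matrix
    w, V = np.linalg.eigh(B)                  # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:k]             # indices of the k largest
    # negative eigenvalues (non-Euclidean input) are clipped to zero
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))


def isomap(X, n_neighbors=10, k=2):
    """ISOMAP: kNN graph -> graph geodesic distances -> classical MDS."""
    D = pairwise_distances(X)
    n = D.shape[0]
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    # link each point to its n_neighbors nearest points (column 0 is the point itself)
    nbrs = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = nbrs.ravel()
    G[rows, cols] = D[rows, cols]
    G = np.minimum(G, G.T)                    # symmetrise the graph
    # Floyd-Warshall: shortest paths approximate geodesic distances on the curve
    for m in range(n):
        G = np.minimum(G, G[:, m, None] + G[None, m, :])
    if not np.all(np.isfinite(G)):
        raise ValueError("kNN graph is disconnected; increase n_neighbors")
    return classical_mds(G, k)


def noisy_ranunculoid(n=300, noise=0.05, seed=0):
    """Hypothetical example data: a ranunculoid plus Gaussian noise."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
    curve = np.column_stack([6 * np.cos(t) - np.cos(6 * t),
                             6 * np.sin(t) - np.sin(6 * t)])
    return curve + noise * rng.normal(size=curve.shape)
```

For example, `isomap(noisy_ranunculoid(), n_neighbors=10, k=2)` returns a 300 × 2 array of embedding coordinates. Floyd-Warshall costs O(n³) and is used here only to keep the sketch free of dependencies beyond NumPy; for larger samples a sparse shortest-path routine would be the usual choice.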

research
04/20/2019

PersLay: A Simple and Versatile Neural Network Layer for Persistence Diagrams

Persistence diagrams, a key descriptor from Topological Data Analysis, e...
research
07/08/2018

Hierarchical Stochastic Graphlet Embedding for Graph-based Pattern Recognition

Despite being very successful within the pattern recognition and machine...
research
10/18/2019

Adaptive Partitioning for Template Functions on Persistence Diagrams

As the field of Topological Data Analysis continues to show success in t...
research
09/03/2020

TopoMap: A 0-dimensional Homology Preserving Projection of High-Dimensional Data

Multidimensional Projection is a fundamental tool for high-dimensional d...
research
02/01/2017

High Order Stochastic Graphlet Embedding for Graph-Based Pattern Recognition

Graph-based methods are known to be successful for pattern description a...
research
09/27/2019

Stratified Space Learning: Reconstructing Embedded Graphs

Many data-rich industries are interested in the efficient discovery and ...
research
11/26/2019

FCA2VEC: Embedding Techniques for Formal Concept Analysis

Embedding large and high dimensional data into low dimensional vector sp...
