Too many secants: a hierarchical approach to secant-based dimensionality reduction on large data sets

08/05/2018
by Henry Kvinge, et al.

A fundamental problem in many data analysis settings is discerning the "natural" dimension of a data set. That is, when a data set is drawn from a manifold (possibly with noise), a meaningful aspect of the data is the dimension of that manifold. Various approaches exist for estimating this dimension, such as the method of Secant-Avoidance Projection (SAP). Intuitively, the SAP algorithm seeks a projection that best preserves the lengths of all secants between points in a data set. By applying the algorithm to find the best projections onto vector spaces of various dimensions, one may infer the dimension of the manifold from which the data originate: one learns the dimension at which it is possible to construct a diffeomorphic copy of the data in a lower-dimensional Euclidean space. Using Whitney's embedding theorem, we can relate this information to the natural dimension of the data. A drawback of the SAP algorithm is that a data set with T points has O(T^2) secants, making the computation and storage of all secants infeasible for very large data sets. In this paper, we propose a novel algorithm that generalizes the SAP algorithm with an emphasis on addressing this issue. Specifically, we propose a hierarchical secant-based dimensionality-reduction method, which can be employed for data sets where explicitly calculating all secants is not feasible.
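To make the secant idea concrete, here is a minimal Python sketch. It is not the SAP algorithm or the hierarchical method from the paper. It only builds the full set of unit secants for a small data set (T(T-1)/2 of them, which is the O(T^2) cost the abstract refers to) and scores a projection by the shortest projected secant. The projection is a plain PCA basis of the secant set, used as a stand-in. The function names `unit_secants`, `min_projected_norm`, and `pca_basis`, and the circle test data, are assumptions made for illustration.

```python
import numpy as np

def unit_secants(X):
    """Return all pairwise unit secants (x_i - x_j) / ||x_i - x_j|| for rows of X.

    For T points this produces T*(T-1)/2 rows, which is why storing
    every secant becomes infeasible for very large T.
    """
    i, j = np.triu_indices(len(X), k=1)
    diffs = X[i] - X[j]
    norms = np.linalg.norm(diffs, axis=1)
    keep = norms > 0  # drop secants between duplicate points
    return diffs[keep] / norms[keep, None]

def min_projected_norm(S, P):
    """Length of the worst-preserved unit secant after projecting onto
    the orthonormal columns of P (1.0 means perfectly preserved)."""
    return np.linalg.norm(S @ P, axis=1).min()

def pca_basis(S, k):
    """Stand-in projection (an assumption, not SAP): the top-k right
    singular vectors of the secant matrix, as orthonormal columns."""
    _, _, Vt = np.linalg.svd(S, full_matrices=False)
    return Vt[:k].T

# Hypothetical test data: 200 points on a circle, isometrically placed in R^5.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 200)
circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)
Q, _ = np.linalg.qr(rng.standard_normal((5, 2)))  # 5x2, orthonormal columns
X = circle @ Q.T

S = unit_secants(X)
# Score projections of increasing dimension by their worst-preserved secant.
scores = {k: min_projected_norm(S, pca_basis(S, k)) for k in range(1, 4)}
for k, s in scores.items():
    print(f"projection dimension {k}: smallest projected secant norm = {s:.3f}")
```

In this toy case the score is small for a one-dimensional projection, because some secant is nearly collapsed, and it jumps to approximately 1 at dimension 2, the smallest dimension in which the circle can be embedded. Reading off the dimension at which the worst secant stops collapsing is the kind of inference the abstract describes. The quadratic growth of `S` in the number of points is the bottleneck that motivates a hierarchical approach.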

research
10/27/2018

Monitoring the shape of weather, soundscapes, and dynamical systems: a new statistic for dimension-driven data analysis on large data sets

Dimensionality-reduction methods are a fundamental tool in the analysis ...
research
09/06/2017

A Quasi-isometric Embedding Algorithm

The Whitney embedding theorem gives an upper bound on the smallest embed...
research
07/10/2018

A GPU-Oriented Algorithm Design for Secant-Based Dimensionality Reduction

Dimensionality-reduction techniques are a fundamental tool for extractin...
research
07/14/2021

MESS: Manifold Embedding Motivated Super Sampling

Many approaches in the field of machine learning and data analysis rely ...
research
12/05/2017

Optimal Fast Johnson-Lindenstrauss Embeddings for Large Data Sets

We introduce a new fast construction of a Johnson-Lindenstrauss matrix b...
research
11/26/2019

FCA2VEC: Embedding Techniques for Formal Concept Analysis

Embedding large and high dimensional data into low dimensional vector sp...
research
01/21/2021

Computation of quantile sets for bivariate data

Algorithms are proposed for the computation of set-valued quantiles and ...
