Intrinsic Dimensionality Estimation within Tight Localities: A Theoretical and Experimental Analysis

09/29/2022
by Laurent Amsaleg, et al.

Accurate estimation of Intrinsic Dimensionality (ID) is of crucial importance in many data mining and machine learning tasks, including dimensionality reduction, outlier detection, similarity search and subspace clustering. However, because existing ID estimation methods generally require sample sizes (that is, neighborhood sizes) on the order of hundreds of points in order to converge, they may have only limited usefulness for applications in which the data consists of many natural groups of small size. In this paper, we propose a local ID estimation strategy that remains stable even for "tight" localities consisting of as few as 20 sample points. The estimator applies maximum likelihood estimation (MLE) techniques over all available pairwise distances among the members of the sample, based on a recent extreme-value-theoretic model of intrinsic dimensionality, the Local Intrinsic Dimension (LID). Our experimental results show that the proposed estimation technique can achieve notably smaller variance, while maintaining comparable levels of bias, at much smaller sample sizes than state-of-the-art estimators.
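
For orientation, the following is a minimal sketch of the classic MLE (Hill-type) LID estimator that uses only the distances from a single query point to its neighbors. It is not the tight-locality estimator proposed in the paper, which draws on all pairwise distances within the sample; it only illustrates the kind of baseline whose reliance on large neighborhoods the abstract refers to. The function name `lid_mle` and the synthetic data in the demo are assumptions made for illustration.

```python
import math
import random


def lid_mle(distances):
    """Classic MLE (Hill-type) LID estimate from query-to-neighbor distances.

    Hypothetical helper for illustration; not the paper's proposed estimator.
    Computes -1 / mean(ln(r_i / w)), where w is the largest distance.
    """
    r = sorted(d for d in distances if d > 0)  # drop zero distances (log undefined)
    if len(r) < 2:
        raise ValueError("need at least two positive distances")
    w = r[-1]  # neighborhood radius: distance to the farthest neighbor
    mean_log = sum(math.log(d / w) for d in r) / len(r)
    if mean_log == 0:
        raise ValueError("all distances are equal; the estimate is undefined")
    return -1.0 / mean_log


if __name__ == "__main__":
    # Assumed synthetic setup: for points uniform in a unit ball of dimension
    # dim, the distance to the center has CDF r**dim, so r = U**(1/dim).
    rng = random.Random(0)
    dim = 5
    for k in (20, 200, 2000):
        sample = [rng.random() ** (1.0 / dim) for _ in range(k)]
        print(k, lid_mle(sample))
```

Running the demo with different seeds gives a rough feel for how strongly the estimate fluctuates at a neighborhood size of 20 compared with neighborhoods of hundreds or thousands of points.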


research
06/18/2019

Intrinsic dimension estimation for locally undersampled data

High-dimensional data are ubiquitous in contemporary science and finding...
research
06/23/2020

ABID: Angle Based Intrinsic Dimensionality

The intrinsic dimensionality refers to the “true” dimensionality of the ...
research
01/31/2020

Local intrinsic dimensionality estimators based on concentration of measure

Intrinsic dimensionality (ID) is one of the most fundamental characteris...
research
04/28/2021

Distributional Results for Model-Based Intrinsic Dimension Estimators

Modern datasets are characterized by a large number of features that may...
research
03/19/2018

Estimating the intrinsic dimension of datasets by a minimal neighborhood information

Analyzing large volumes of high-dimensional data is an issue of fundamen...
research
12/29/2022

Robust Bayesian Subspace Identification for Small Data Sets

Model estimates obtained from traditional subspace identification method...
research
02/23/2021

intRinsic: an R package for model-based estimation of the intrinsic dimension of a dataset

The estimation of the intrinsic dimension of a dataset is a fundamental ...
