Intrinsic dimension estimation for locally undersampled data

06/18/2019
by Vittorio Erba, et al.

High-dimensional data are ubiquitous in contemporary science, and finding methods to compress them is one of the primary goals of machine learning. Given a dataset lying in a high-dimensional space (typically hundreds to several thousand dimensions), it is often useful to project it onto a lower-dimensional manifold without loss of information. Identifying the minimal dimension of such a manifold is a challenging problem known in the literature as intrinsic dimension estimation (IDE). Traditionally, most IDE algorithms are based either on multiscale principal component analysis (PCA) or on the notion of correlation dimension (and, more generally, on k-nearest-neighbor distances). These methods are affected, in different ways, by a severe curse of dimensionality. In particular, none of the existing algorithms can provide accurate ID estimates in the extreme locally undersampled regime, i.e. in the limit where the number of samples in any local patch of the manifold is less than (or of the same order as) the ID of the dataset. Here we introduce a new ID estimator that leverages simple properties of the tangent space of a manifold to overcome these shortcomings. The method is based on the full correlation integral, going beyond the small-radius limit used to estimate the correlation dimension. Our estimator alleviates the extreme undersampling problem, which is intractable with other methods. Based on this insight, we explore a multiscale generalization of the algorithm. We show that it is capable of (i) identifying multiple dimensionalities in a dataset, and (ii) providing accurate estimates of the ID of extremely curved manifolds. In particular, we test the method on manifolds generated from global transformations of high-contrast images, which are relevant for invariant object recognition and considered a challenge for state-of-the-art ID estimators.
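For orientation, the classical small-radius baseline mentioned in the abstract can be sketched in a few lines. This is not the estimator introduced in the paper: it is the standard correlation-dimension approach, which fits the slope of log C(r) against log r, where C(r) is the fraction of point pairs closer than r. The function names, the radius range, and the synthetic test manifold (a flat 2-dimensional patch embedded in 20 dimensions) are assumptions chosen for illustration only.

```python
import numpy as np


def pairwise_distances(points):
    """Euclidean distances between all distinct pairs of rows in `points`."""
    sq_norms = (points ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * points @ points.T
    sq_dists = np.clip(sq_dists, 0.0, None)  # guard against tiny negative round-off
    upper = np.triu_indices(len(points), k=1)  # each unordered pair once
    return np.sqrt(sq_dists[upper])


def correlation_integral(points, radii):
    """C(r): fraction of distinct point pairs at distance smaller than r."""
    dists = pairwise_distances(points)
    return np.array([(dists < r).mean() for r in radii])


def correlation_dimension(points, r_lo, r_hi, n_radii=10):
    """Slope of log C(r) vs log r over [r_lo, r_hi] (small-radius baseline)."""
    radii = np.geomspace(r_lo, r_hi, n_radii)
    c = correlation_integral(points, radii)
    keep = c > 0  # log is undefined where no pair falls within r
    slope, _ = np.polyfit(np.log(radii[keep]), np.log(c[keep]), 1)
    return slope


def sample_linear_manifold(n, d, D, seed=0):
    """Hypothetical test data: a uniform d-dim unit cube isometrically embedded in R^D."""
    rng = np.random.default_rng(seed)
    latent = rng.uniform(size=(n, d))
    basis, _ = np.linalg.qr(rng.normal(size=(D, d)))  # D x d, orthonormal columns
    return latent @ basis.T


if __name__ == "__main__":
    data = sample_linear_manifold(n=800, d=2, D=20)
    print("estimated ID:", correlation_dimension(data, r_lo=0.05, r_hi=0.2))
```

The fit only works if many pairs fall within the smallest radius, which is exactly what fails in the locally undersampled regime described above: as the intrinsic dimension grows at a fixed sample size, C(r) at small r is estimated from very few pairs, or none.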


