ABID: Angle Based Intrinsic Dimensionality

06/23/2020
by   Erik Thordsen, et al.

Intrinsic dimensionality refers to the “true” dimensionality of the data, as opposed to the dimensionality of the data representation. For example, when attributes are highly correlated, the intrinsic dimensionality can be much lower than the number of variables. Local intrinsic dimensionality refers to the observation that this property can vary across different parts of the data set. It can therefore serve as a proxy for the local difficulty of the data set. Most popular methods for estimating the local intrinsic dimensionality are based on distances and on the rate at which the distances to the nearest neighbors increase, a concept known as “expansion dimension”. In this paper we introduce an orthogonal concept, which does not use any distances: we use the distribution of angles between neighbor points. We derive the theoretical distribution of angles and use it to construct an estimator for intrinsic dimensionality. Experimentally, we verify that this measure behaves similarly, but complementarily, to existing measures of intrinsic dimensionality. By introducing a new idea of intrinsic dimensionality to the research community, we hope to contribute to a better understanding of intrinsic dimensionality and to spur new research in this direction.
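
The idea of estimating dimensionality from angles can be illustrated with a small sketch. It relies on a standard fact: for two independent directions drawn uniformly from the unit sphere in d dimensions, the expected squared cosine of the angle between them is 1/d. Inverting the mean squared cosine over pairs of neighbor directions therefore gives a rough dimensionality estimate. This is a minimal illustration under that assumption and is not necessarily the exact estimator derived in the paper. The function name `angle_based_id`, the parameter `k`, and the synthetic data are hypothetical choices made for this example. Distances appear here only to select the neighbors and to normalize the direction vectors; the estimate itself is computed from angles alone.

```python
import numpy as np

def angle_based_id(points, query, k=50):
    """Rough local intrinsic dimensionality at `query` from neighbor angles.

    Illustrative sketch only: assumes neighbor directions are roughly
    uniform on a d-dimensional sphere, so that E[cos^2] = 1/d.
    """
    diffs = points - query
    dists = np.linalg.norm(diffs, axis=1)
    # Sort by distance and drop zero-distance points (e.g. the query itself).
    order = np.argsort(dists)
    order = order[dists[order] > 0][:k]
    # Unit direction vectors from the query to each neighbor.
    units = diffs[order] / dists[order, None]
    # Cosines of the angles between all pairs of neighbor directions.
    cos = units @ units.T
    # Use each distinct pair once (strict upper triangle).
    upper = np.triu_indices(len(order), k=1)
    return 1.0 / np.mean(cos[upper] ** 2)

# Synthetic check: a 2-dimensional plane embedded in 10-dimensional space.
rng = np.random.default_rng(0)
latent = rng.uniform(-1.0, 1.0, size=(2000, 2))
# Orthonormal basis (10 x 2) so the embedding preserves angles.
basis, _ = np.linalg.qr(rng.normal(size=(10, 2)))
points = latent @ basis.T
estimate = angle_based_id(points, query=np.zeros(10), k=50)
print(f"estimated intrinsic dimensionality: {estimate:.2f}")
```

On this synthetic data the estimate should land near 2, the dimensionality of the embedded plane, even though the representation has 10 coordinates.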
