Local Regularization of Noisy Point Clouds: Improved Global Geometric Estimates and Data Analysis

04/06/2019
by Nicolas Garcia Trillos, et al.

Several data analysis techniques employ similarity relationships between data points to uncover the intrinsic dimension and geometric structure of the underlying data-generating mechanism. In this paper we work under the model assumption that the data is made of random perturbations of feature vectors lying on a low-dimensional manifold. We study two questions: how to define the similarity relationship over noisy data points, and what is the resulting impact of the choice of similarity in the extraction of global geometric information from the underlying manifold. We provide concrete mathematical evidence that using a local regularization of the noisy data to define the similarity improves the approximation of the hidden Euclidean distance between unperturbed points. Furthermore, graph-based objects constructed with the locally regularized similarity function satisfy better error bounds in their recovery of global geometric ones. Our theory is supported by numerical experiments that demonstrate that the gain in geometric understanding facilitated by local regularization translates into a gain in classification accuracy in simulated and real data.
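The abstract does not spell out the regularization step. The following is a minimal sketch. It assumes that "local regularization" means replacing each noisy point by the average of all noisy points within a Euclidean ball of radius r around it. The snippet simulates points on a unit circle embedded in R^D, adds isotropic Gaussian noise, and compares how well raw and locally averaged pairwise distances track the hidden clean distances. The circle, the noise level `sigma`, the radius `r`, and the helper names `pairwise_distances` and `local_average` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def pairwise_distances(Z):
    """Euclidean distance matrix via |a|^2 + |b|^2 - 2 a.b (avoids an n*n*D array)."""
    sq = np.sum(Z**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    return np.sqrt(np.maximum(d2, 0.0))  # clip tiny negatives caused by round-off

def local_average(Y, r):
    """Hypothetical local regularization: replace each point by the mean of the
    points within distance r of it (the point itself is always included)."""
    W = (pairwise_distances(Y) < r).astype(float)  # 0/1 adjacency of the r-ball graph
    return (W @ Y) / W.sum(axis=1, keepdims=True)

# Illustrative setup: a 1-dimensional manifold (unit circle) inside R^D.
rng = np.random.default_rng(0)
n, D, sigma, r = 1000, 200, 0.03, 0.7
theta = rng.uniform(0.0, 2.0 * np.pi, n)
X = np.zeros((n, D))
X[:, 0], X[:, 1] = np.cos(theta), np.sin(theta)   # clean (hidden) points
Y = X + sigma * rng.standard_normal((n, D))       # noisy observations
Ybar = local_average(Y, r)                        # locally regularized points

# Compare each similarity against the hidden distances over all distinct pairs.
DX = pairwise_distances(X)
iu = np.triu_indices(n, k=1)
raw_err = np.mean(np.abs(pairwise_distances(Y) - DX)[iu])
reg_err = np.mean(np.abs(pairwise_distances(Ybar) - DX)[iu])
print(f"mean |distance error|: raw={raw_err:.3f}, regularized={reg_err:.3f}")
```

In this toy setup, the raw distances are inflated by noise in the D - 1 directions normal to the circle, and averaging largely cancels that noise. Averaging also pulls points slightly inward because of curvature, so `r` trades noise reduction against this bias. Note that `r` must exceed the noise floor of the raw distances, which is roughly `sigma * sqrt(2 * D)`, or the balls contain only the point itself.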
