Learning Low-Dimensional Nonlinear Structures from High-Dimensional Noisy Data: An Integral Operator Approach

02/28/2022
by Xiucai Ding, et al.

We propose a kernel-spectral embedding algorithm for learning low-dimensional nonlinear structures from high-dimensional and noisy observations, where the datasets are assumed to be sampled from an intrinsically low-dimensional manifold and corrupted by high-dimensional noise. The algorithm employs an adaptive bandwidth selection procedure which does not rely on prior knowledge of the underlying manifold. The obtained low-dimensional embeddings can be further utilized for downstream purposes such as data visualization, clustering and prediction. Our method is theoretically justified and practically interpretable. Specifically, we establish the convergence of the final embeddings to their noiseless counterparts when the dimension and size of the samples are comparably large, and characterize the effect of the signal-to-noise ratio on the rate of convergence and phase transition. We also prove convergence of the embeddings to the eigenfunctions of an integral operator defined by the kernel map of some reproducing kernel Hilbert space capturing the underlying nonlinear structures. Numerical simulations and analysis of three real datasets show the superior empirical performance of the proposed method, compared to many existing methods, on learning various manifolds in diverse applications.
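
As a rough illustration of the kind of pipeline the abstract describes, the following minimal sketch builds a Gaussian kernel matrix from noisy high-dimensional samples, picks the bandwidth from the data, and uses the leading eigenvectors as embeddings. Everything specific here is an assumption made for illustration and is not taken from the paper: the function name `kernel_spectral_embedding`, the Gaussian kernel, the percentile-of-pairwise-distances bandwidth rule, and the eigenvalue scaling of the embedding.

```python
import numpy as np


def kernel_spectral_embedding(X, n_components=2, percentile=50.0):
    """Sketch of a kernel-spectral embedding (hypothetical helper).

    X            : (n, p) array, one noisy high-dimensional sample per row.
    n_components : number of embedding coordinates to return.
    percentile   : percentile of pairwise squared distances used as the
                   bandwidth (an assumed data-driven rule).
    Returns (embedding of shape (n, n_components), bandwidth h).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise squared Euclidean distances via the Gram-matrix identity.
    sq_norms = np.sum(X ** 2, axis=1)
    D2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    np.maximum(D2, 0.0, out=D2)  # clip tiny negatives from round-off

    # Data-driven bandwidth: a percentile of the off-diagonal distances.
    # This needs no prior knowledge of the manifold (assumed rule).
    upper = np.triu_indices_from(D2, k=1)
    h = float(np.percentile(D2[upper], percentile))
    if h <= 0.0:
        raise ValueError("bandwidth is zero; samples may be identical")

    # Gaussian kernel matrix.
    K = np.exp(-D2 / h)

    # Symmetric eigendecomposition; eigh returns ascending eigenvalues,
    # so reorder to descending.
    eigvals, eigvecs = np.linalg.eigh(K)
    order = np.argsort(eigvals)[::-1]
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]

    # Embedding: leading eigenvectors scaled by eigenvalues of K / n
    # (an assumed scaling; other conventions are possible).
    embedding = eigvecs[:, :n_components] * (eigvals[:n_components] / n)
    return embedding, h
```

For example, calling `kernel_spectral_embedding(X, n_components=2)` on an `(n, p)` array returns an `(n, 2)` array of coordinates together with the selected bandwidth. The percentile and the number of retained eigenvectors are tuning choices in this sketch, and the leading eigenvector of a Gaussian kernel matrix is often close to constant, so in practice one may prefer to inspect more than the first few components.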

research
04/18/2019

A kernel-based method for coarse graining complex dynamical systems

We present a novel kernel-based machine learning algorithm for identifyi...
research
11/22/2021

How do kernel-based sensor fusion algorithms behave under high dimensional noise?

We study the behavior of two kernel based sensor fusion algorithms, nonp...
research
11/21/2020

Phase transition of graph Laplacian of high dimensional noisy random point cloud

We systematically explore the spectral distribution of kernel-based grap...
research
05/16/2021

Theoretical Foundations of t-SNE for Visualizing High-Dimensional Clustered Data

This study investigates the theoretical foundations of t-distributed sto...
research
06/24/2009

On landmark selection and sampling in high-dimensional data analysis

In recent years, the spectral analysis of appropriately defined kernel m...
research
06/01/2019

Learning low-dimensional state embeddings and metastable clusters from time series data

This paper studies how to find compact state embeddings from high-dimens...
research
03/18/2019

A Geometrical Method for Low-Dimensional Representations of Simulations

We propose a new data analysis approach for the efficient post-processin...
