Learning Low-Dimensional Nonlinear Structures from High-Dimensional Noisy Data: An Integral Operator Approach

by Xiucai Ding, et al.
University of California-Davis
Stanford University

We propose a kernel-spectral embedding algorithm for learning low-dimensional nonlinear structures from high-dimensional and noisy observations, where the datasets are assumed to be sampled from an intrinsically low-dimensional manifold and corrupted by high-dimensional noise. The algorithm employs an adaptive bandwidth selection procedure which does not rely on prior knowledge of the underlying manifold. The obtained low-dimensional embeddings can be further utilized for downstream purposes such as data visualization, clustering and prediction. Our method is theoretically justified and practically interpretable. Specifically, we establish the convergence of the final embeddings to their noiseless counterparts when the dimension and size of the samples are comparably large, and characterize the effect of the signal-to-noise ratio on the rate of convergence and phase transition. We also prove convergence of the embeddings to the eigenfunctions of an integral operator defined by the kernel map of some reproducing kernel Hilbert space capturing the underlying nonlinear structures. Numerical simulations and analysis of three real datasets show the superior empirical performance of the proposed method, compared to many existing methods, on learning various manifolds in diverse applications.
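The pipeline outlined in the abstract (kernel matrix, data-driven bandwidth, spectral embedding) can be illustrated with a minimal sketch. The abstract does not specify the kernel, the bandwidth rule, or the scaling of the eigenvectors, so everything below is an assumption made for illustration: a Gaussian kernel, a bandwidth set to a percentile of the pairwise squared distances, and an embedding built from the leading eigenvectors of the scaled kernel matrix. The function name `kernel_spectral_embedding` and its parameters `n_components` and `percentile` are hypothetical and are not taken from the paper.

```python
import numpy as np


def kernel_spectral_embedding(X, n_components=2, percentile=50.0):
    """Illustrative kernel-spectral embedding (not the authors' exact method).

    X is an (n, p) array of n noisy samples in p dimensions.
    Returns (embedding, eigenvalues, bandwidth).
    """
    n = X.shape[0]

    # Pairwise squared Euclidean distances via the Gram matrix.
    sq_norms = np.sum(X ** 2, axis=1)
    D2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    D2 = np.maximum(D2, 0.0)  # clip small negative values from round-off
    np.fill_diagonal(D2, 0.0)

    # ASSUMPTION: bandwidth = a percentile of the off-diagonal squared
    # distances, so it is chosen from the data alone, without knowledge
    # of the underlying manifold.
    upper = np.triu_indices(n, k=1)
    bandwidth = np.percentile(D2[upper], percentile)

    # ASSUMPTION: Gaussian kernel, scaled by 1/n.
    K = np.exp(-D2 / bandwidth) / n

    # Symmetric eigendecomposition; eigh returns ascending eigenvalues,
    # so reorder them to descending.
    evals, evecs = np.linalg.eigh(K)
    order = np.argsort(evals)[::-1][:n_components]
    evals, evecs = evals[order], evecs[:, order]

    # ASSUMPTION: scale unit-norm eigenvectors by sqrt(n) so each
    # coordinate has entries of order one.
    embedding = np.sqrt(n) * evecs
    return embedding, evals, bandwidth


if __name__ == "__main__":
    # Toy data: a circle embedded in 30 dimensions plus Gaussian noise.
    rng = np.random.default_rng(0)
    t = rng.uniform(0.0, 2.0 * np.pi, size=200)
    X = np.zeros((200, 30))
    X[:, 0], X[:, 1] = np.cos(t), np.sin(t)
    X += 0.05 * rng.standard_normal(X.shape)

    emb, evals, h = kernel_spectral_embedding(X, n_components=3)
    print(emb.shape, h)
```

A percentile-based bandwidth is used here only because it adapts to the scale of the data without tuning by hand; the paper's own selection procedure may differ.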



