On landmark selection and sampling in high-dimensional data analysis

by Mohamed-Ali Belabbas, et al.
Harvard University

In recent years, the spectral analysis of appropriately defined kernel matrices has emerged as a principled way to extract the low-dimensional structure often prevalent in high-dimensional data. Here we provide an introduction to spectral methods for linear and nonlinear dimension reduction, emphasizing ways to overcome the computational limitations currently faced by practitioners with massive datasets. In particular, a data subsampling or landmark selection process is often employed to construct a kernel based on partial information, followed by an approximate spectral analysis termed the Nyström extension. We provide a quantitative framework to analyse this procedure, and use it to demonstrate algorithmic performance bounds on a range of practical approaches designed to optimize the landmark selection process. We compare the practical implications of these bounds by way of real-world examples drawn from the field of computer vision, whereby low-dimensional manifold structure is shown to emerge from high-dimensional video data streams.
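
The landmark-then-extend procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian kernel, the uniform random choice of landmarks, the synthetic data on a closed curve, and the names `gaussian_kernel` and `nystrom` are all assumptions made for illustration. Only the kernel columns belonging to the landmarks are computed; the eigenvectors of the small landmark block are then extended to all points.

```python
import numpy as np


def gaussian_kernel(X, Y, sigma=1.0):
    """Gaussian kernel matrix between rows of X and rows of Y (assumed kernel)."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))


def nystrom(X, landmark_idx, sigma=1.0, tol=1e-8):
    """Approximate eigenpairs of the full kernel using landmark columns only."""
    C = gaussian_kernel(X, X[landmark_idx], sigma)  # n x m: all points vs landmarks
    W = C[landmark_idx]                             # m x m: landmarks vs landmarks
    evals, evecs = np.linalg.eigh(W)                # eigenvalues in ascending order
    keep = evals > tol * evals.max()                # drop numerically null directions
    evals = evals[keep][::-1]
    evecs = evecs[:, keep][:, ::-1]
    U_ext = C @ evecs / evals                       # extend eigenvectors to all n points
    return evals, U_ext


# Synthetic data: a closed curve (1-D manifold) embedded in 10 dimensions.
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 2.0 * np.pi, size=300)
X = np.column_stack([np.cos(t), np.sin(t)]) @ rng.normal(size=(2, 10))

# Uniform sampling without replacement: one simple landmark selection rule.
landmarks = rng.choice(len(X), size=40, replace=False)

evals, U_ext = nystrom(X, landmarks)
K_approx = (U_ext * evals) @ U_ext.T                # equals C W^+ C^T
K_full = gaussian_kernel(X, X)                      # formed here only to measure error
rel_err = np.linalg.norm(K_full - K_approx) / np.linalg.norm(K_full)
print(f"relative Frobenius error with 40 of 300 points: {rel_err:.3e}")
```

In this sketch the approximation reproduces the landmark block of the kernel up to the discarded small eigenvalues, so all of the error sits in the entries involving non-landmark points. Swapping the `rng.choice` line for a different selection rule is the natural place to experiment with how landmark choice affects that error.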

