Scalable Manifold Learning for Big Data with Apache Spark

08/31/2018
by Frank Schoeneman, et al.

Non-linear spectral dimensionality reduction methods, such as Isomap, remain important techniques for learning manifolds. However, due to its computational complexity, exact manifold learning using Isomap is currently impossible for large-scale data. In this paper, we propose a distributed memory framework implementing end-to-end exact Isomap under the Apache Spark model. We show how each critical step of the Isomap algorithm can be efficiently realized using the basic Spark model, without the need to provision data in secondary storage. We show how the entire method can be implemented using PySpark, offloading compute-intensive linear algebra routines to BLAS. Through experimental results, we demonstrate excellent scalability of our method, and we show that, using a 25-node parallel cluster, it can process datasets orders of magnitude larger than what is currently possible.
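
For orientation, the steps of exact Isomap that the abstract refers to can be sketched on a single machine as follows. This is only a minimal NumPy illustration of the textbook algorithm (k-nearest-neighbor graph, all-pairs shortest paths, double centering, eigendecomposition); it is not the distributed PySpark implementation described in the paper. The function name `isomap` and its parameters `n_neighbors` and `n_components` are hypothetical choices made for this sketch, and it assumes the neighborhood graph is connected.

```python
import numpy as np


def isomap(X, n_neighbors=5, n_components=2):
    """Single-machine sketch of exact Isomap (illustrative, not the paper's code)."""
    n = X.shape[0]

    # Step 1: dense pairwise Euclidean distances.
    sq = np.sum(X ** 2, axis=1)
    D = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0))

    # Step 2: symmetric k-nearest-neighbor graph; missing edges are +inf.
    D_no_self = D.copy()
    np.fill_diagonal(D_no_self, np.inf)  # exclude each point from its own neighbors
    nbrs = np.argsort(D_no_self, axis=1)[:, :n_neighbors]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = nbrs.ravel()
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    G[rows, cols] = D[rows, cols]
    G[cols, rows] = D[rows, cols]

    # Step 3: all-pairs shortest paths (Floyd-Warshall) give geodesic distances.
    for k in range(n):
        G = np.minimum(G, G[:, k:k + 1] + G[k:k + 1, :])
    if not np.isfinite(G).all():
        raise ValueError("neighborhood graph is disconnected; increase n_neighbors")

    # Step 4: double centering of squared geodesic distances (classical MDS).
    J = np.eye(n) - np.full((n, n), 1.0 / n)
    B = -0.5 * J @ (G ** 2) @ J

    # Step 5: the top eigenpairs of B give the low-dimensional coordinates.
    w, V = np.linalg.eigh(B)
    top = np.argsort(w)[::-1][:n_components]
    return V[:, top] * np.sqrt(np.maximum(w[top], 0.0))
```

Every step here materializes a dense n-by-n matrix, and the shortest-path step costs O(n^3) time. That cost is what makes exact Isomap hard to apply to large-scale data on one machine and what motivates distributing the computation.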

research
10/17/2017

S-Isomap++: Multi Manifold Learning from Streaming Data

Manifold learning based methods have been widely used for non-linear dim...
research
06/14/2016

Bayesian Inference on Matrix Manifolds for Linear Dimensionality Reduction

We reframe linear dimensionality reduction as a problem of Bayesian infe...
research
04/19/2018

Randomized ICA and LDA Dimensionality Reduction Methods for Hyperspectral Image Classification

Dimensionality reduction is an important step in processing the hyperspe...
research
05/30/2013

Non-linear dimensionality reduction: Riemannian metric estimation and the problem of geometric discovery

In recent years, manifold learning has become increasingly popular as a ...
research
02/21/2018

Angle constrained path to cluster multiple manifolds

In this paper, we propose a method to cluster multiple intersected manif...
research
10/19/2020

Product Manifold Learning

We consider problems of dimensionality reduction and learning data repre...
research
08/25/2018

Hyperscaling Internet Graph Analysis with D4M on the MIT SuperCloud

Detecting anomalous behavior in network traffic is a major challenge due...
