Efficient Principal Subspace Projection of Streaming Data Through Fast Similarity Matching

08/06/2018
by Andrea Giovannucci, et al.

Big data problems frequently require processing datasets in a streaming fashion, either because the data are all available at once but collectively exceed available memory, or because the data intrinsically arrive one data point at a time and must be processed online. Here, we introduce a computationally efficient version of similarity matching, a framework for online dimensionality reduction that incrementally estimates the top K-dimensional principal subspace of streamed data while keeping in memory only the last sample and the current iterate. To assess the performance of our approach, we construct and make public a test suite containing a synthetic data generator, the infrastructure to test online dimensionality reduction algorithms on real datasets, and performant implementations of our algorithm and of competing algorithms with similar aims. Among the algorithms considered, we find our approach to be competitive, performing among the best on both synthetic and real data.
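To make the streaming setting concrete, the following is a minimal sketch of online principal subspace estimation that holds only the current sample and the current iterate in memory. It is not the similarity matching algorithm introduced in the paper: it swaps in the classical Oja subspace rule with QR re-orthonormalization as a simple stand-in. The function names (`oja_step`, `subspace_error`, `run_demo`), the synthetic generator, the error metric, and all parameter values are assumptions made for illustration only.

```python
import numpy as np


def oja_step(U, x, lr):
    """One streaming update of a D x K orthonormal basis U from one sample x.

    Classical Oja subspace rule (a stand-in, not the paper's method).
    """
    y = U.T @ x                   # K-dimensional projection of the sample
    U = U + lr * np.outer(x, y)   # Hebbian step toward high-variance directions
    Q, _ = np.linalg.qr(U)        # re-orthonormalize the iterate
    return Q


def subspace_error(U, V):
    """Squared Frobenius distance between the projectors onto span(U) and
    span(V), normalized by the squared norm of the reference projector."""
    P_U = U @ U.T
    P_V = V @ V.T
    return np.linalg.norm(P_U - P_V, "fro") ** 2 / np.linalg.norm(P_V, "fro") ** 2


def run_demo(seed=0, D=20, K=3, n_samples=5000, lr=0.01):
    """Stream synthetic samples with a planted K-dimensional subspace and
    return the final subspace error. All settings here are hypothetical."""
    rng = np.random.default_rng(seed)
    # Planted orthonormal basis of the true principal subspace.
    V_true, _ = np.linalg.qr(rng.standard_normal((D, K)))
    # Standard deviations along the planted directions (variances 10 down to 6).
    scales = np.sqrt(np.linspace(10.0, 6.0, K))
    # Random orthonormal starting iterate.
    U, _ = np.linalg.qr(rng.standard_normal((D, K)))
    for _ in range(n_samples):
        # Only this one sample and the iterate U are held in memory.
        x = V_true @ (scales * rng.standard_normal(K)) + 0.1 * rng.standard_normal(D)
        U = oja_step(U, x, lr)
    return subspace_error(U, V_true)


if __name__ == "__main__":
    print("final subspace error:", run_demo())
```

The memory footprint is one D-dimensional sample plus one D x K matrix, which matches the constraint described in the abstract. The per-step QR factorization is chosen here for clarity rather than speed.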

research
06/04/2018

MOSES: A Streaming Algorithm for Linear Dimensionality Reduction

This paper introduces Memory-limited Online Subspace Estimation Scheme (...
research
03/11/2021

Distributed Principal Subspace Analysis for Partitioned Big Data: Algorithms, Analysis, and Implementation

Principal Subspace Analysis (PSA) is one of the most popular approaches ...
research
01/16/2013

Experiments with Random Projection

Recent theoretical work has identified random projection as a promising ...
research
11/30/2015

A Normative Theory of Adaptive Dimensionality Reduction in Neural Networks

To make sense of the world our brains must analyze high-dimensional data...
research
10/25/2017

DPCA: Dimensionality Reduction for Discriminative Analytics of Multiple Large-Scale Datasets

Principal component analysis (PCA) has well-documented merits for data e...
research
01/31/2016

Principal Polynomial Analysis

This paper presents a new framework for manifold learning based on a seq...
research
12/11/2016

Self-calibrating Neural Networks for Dimensionality Reduction

Recently, a novel family of biologically plausible online algorithms for...
