DROP: Dimensionality Reduction Optimization for Time Series

08/01/2017
by Sahaana Suri, et al.

Dimensionality reduction is critical in analyzing increasingly high-volume, high-dimensional time series. In this paper, we revisit a now-classic study of time series dimensionality reduction operators and find that, for a given quality constraint, Principal Component Analysis (PCA) uncovers representations that are over 2x smaller than those obtained via alternative techniques favored in the literature. However, as classically implemented via Singular Value Decomposition (SVD), PCA is prohibitively expensive for large datasets. Therefore, we present DROP, a dimensionality reduction optimizer for high-dimensional analytics pipelines that greatly reduces the cost of the PCA operation over time series datasets. We show that many time series are highly structured, so a small number of data points is sufficient to characterize the dataset, which permits aggressive sampling during dimensionality reduction. This sampling allows DROP to uncover high-quality low-dimensional bases in running time proportional to the dataset's intrinsic dimensionality, independent of the actual dataset size, without requiring the user to specify this intrinsic dimensionality a priori. DROP further enables downstream-operation-aware optimization by coupling sampling with online progress estimation, trading off the degree of dimensionality reduction against the combined runtime of DROP and downstream analytics tasks. By progressively sampling its input, computing a candidate basis for transformation, and terminating once it finds a sufficiently high-quality basis in a reasonable running time, DROP provides speedups of up to 50x over PCA via SVD and 33x in end-to-end high-dimensional analytics pipelines.
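The sample-fit-check loop described in the abstract can be illustrated with a minimal sketch. This is not DROP's implementation. The function name `progressive_pca`, its parameters, and the quality measure (the fraction of a held-out sample's energy retained by the basis) are all assumptions made for illustration, since the abstract does not specify the quality metric, and the sketch omits the online progress estimation and the downstream runtime trade-off.

```python
import numpy as np


def progressive_pca(X, target=0.95, start=16, growth=2, val_size=200, seed=0):
    """Hypothetical sketch: fit PCA on growing samples until a basis is good enough.

    Quality is ASSUMED to be the fraction of held-out energy the basis retains;
    the abstract does not define the actual quality constraint.
    Returns (basis, mean, sample_size), with basis rows as principal directions.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    order = rng.permutation(n)

    # Hold out a small validation sample so quality checks do not touch all rows.
    val_size = min(val_size, n // 2)
    val, pool = X[order[:val_size]], X[order[val_size:]]

    size = min(start, len(pool))
    while True:
        sample = pool[:size]
        mean = sample.mean(axis=0)

        # Rows of vt are candidate principal directions, strongest first.
        _, _, vt = np.linalg.svd(sample - mean, full_matrices=False)

        # Cumulative held-out energy captured by the first k directions.
        centered_val = val - mean
        energy = np.cumsum(np.sum((centered_val @ vt.T) ** 2, axis=0))
        total = np.sum(centered_val ** 2)

        hits = np.nonzero(energy >= target * total)[0]
        if hits.size > 0:
            k = int(hits[0]) + 1  # smallest k that meets the target
            return vt[:k], mean, size
        if size == len(pool):
            return vt, mean, size  # pool exhausted: return the best available basis

        size = min(len(pool), size * growth)  # otherwise draw a larger sample
```

On strongly structured data, the loop should stop after the first small sample, because a handful of rows already spans the dominant subspace. On less structured data, the sample grows geometrically until the target is met or the pool is exhausted.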


research
06/04/2023

Prescriptive PCA: Dimensionality Reduction for Two-stage Stochastic Optimization

In this paper, we consider the alignment between an upstream dimensional...
research
07/22/2017

Using PCA and Factor Analysis for Dimensionality Reduction of Bio-informatics Data

Large volume of Genomics data is produced on daily basis due to the adva...
research
05/15/2018

Nonlinear Dimensionality Reduction for Discriminative Analytics of Multiple Datasets

Principal component analysis (PCA) is widely used for feature extraction...
research
11/14/2019

A Recurrent Probabilistic Neural Network with Dimensionality Reduction Based on Time-series Discriminant Component Analysis

This paper proposes a probabilistic neural network developed on the basi...
research
09/27/2022

Linear Dimensionality Reduction

These notes are an overview of some classical linear methods in Multivar...
research
07/10/2023

Functional PCA and Deep Neural Networks-based Bayesian Inverse Uncertainty Quantification with Transient Experimental Data

Inverse UQ is the process to inversely quantify the model input uncertai...
research
10/12/2021

Label scarcity in biomedicine: Data-rich latent factor discovery enhances phenotype prediction

High-quality data accumulation is now becoming ubiquitous in the health ...
