DROP: Dimensionality Reduction Optimization for Time Series
Dimensionality reduction is critical in analyzing increasingly high-volume, high-dimensional time series. In this paper, we revisit a now-classic study of time series dimensionality reduction operators and find that, for a given quality constraint, Principal Component Analysis (PCA) uncovers representations that are over 2x smaller than those obtained via alternative techniques favored in the literature. However, as classically implemented via Singular Value Decomposition (SVD), PCA is prohibitively expensive for large datasets. Therefore, we present DROP, a dimensionality reduction optimizer for high-dimensional analytics pipelines that greatly reduces the cost of the PCA operation over time series datasets. We show that many time series are highly structured, so a small number of data points suffices to characterize the dataset, which permits aggressive sampling during dimensionality reduction. This sampling allows DROP to uncover high-quality low-dimensional bases in running time proportional to the dataset's intrinsic dimensionality, independent of the actual dataset size, without requiring the user to specify this intrinsic dimensionality a priori. DROP further enables downstream-operation-aware optimization by coupling sampling with online progress estimation, trading off the degree of dimensionality reduction against the combined runtime of DROP and downstream analytics tasks. By progressively sampling its input, computing a candidate basis for transformation, and terminating once it finds a sufficiently high-quality basis in a reasonable running time, DROP provides speedups of up to 50x over PCA via SVD and 33x in end-to-end high-dimensional analytics pipelines.
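The sample-then-check loop described above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function name `progressive_pca`, the geometric sample-growth schedule, and the held-out reconstruction-error check stand in for DROP's actual sampling and progress-estimation machinery, which the paper specifies in full.

```python
import numpy as np

def progressive_pca(X, target_error=0.05, k=None, batch=64, seed=0):
    """Hypothetical DROP-style loop: grow a row sample, fit a PCA basis
    on it via SVD, and stop once the basis reconstructs held-out rows
    to within the quality constraint `target_error`."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    order = rng.permutation(n)
    # Small held-out set used only to estimate basis quality.
    holdout, pool = X[order[:batch]], order[batch:]
    sample_idx = []
    m = batch
    while True:
        take, pool = pool[:m], pool[m:]
        sample_idx.extend(take)
        S = X[sample_idx]
        mu = S.mean(axis=0)
        # SVD runs only on the (small) sample, not the full dataset.
        _, s, Vt = np.linalg.svd(S - mu, full_matrices=False)
        if k is None:
            # Smallest k capturing (1 - target_error) of sample variance,
            # so the intrinsic dimensionality need not be given a priori.
            var = np.cumsum(s**2) / np.sum(s**2)
            kk = int(np.searchsorted(var, 1.0 - target_error)) + 1
        else:
            kk = k
        V = Vt[:kk]
        # Relative reconstruction error of held-out rows in the basis.
        H = holdout - mu
        err = np.linalg.norm(H - (H @ V.T) @ V) / np.linalg.norm(H)
        if err <= target_error or len(pool) == 0:
            return V, mu, err, len(sample_idx)
        m *= 2  # geometric growth of the sample between iterations
```

On data whose intrinsic dimensionality is far below its ambient dimensionality, the loop typically terminates after touching only a small fraction of the rows, which is the effect that makes the runtime depend on intrinsic rather than actual dataset size.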