Monitoring the shape of weather, soundscapes, and dynamical systems: a new statistic for dimension-driven data analysis on large data sets

10/27/2018
by Henry Kvinge, et al.

Dimensionality-reduction methods are a fundamental tool in the analysis of large data sets. These algorithms rest on the assumption that the "intrinsic dimension" of the data is generally much smaller than the ambient dimension in which it is collected. Alongside their usual purpose of mapping data into a lower-dimensional space with minimal information loss, dimensionality-reduction techniques implicitly or explicitly provide information about the dimension of the data set. In this paper, we propose a new statistic, which we call the κ-profile, for the analysis of large data sets. The κ-profile arises from a dimensionality-reduction optimization problem: that of finding a projection into k dimensions that optimally preserves the secants between points in the data set. From this optimal projection we extract κ, the norm of the shortest projected secant among all normalized secants. Because κ can be computed for any k, the tuple of κ values (indexed by dimension k) forms the κ-profile. Algorithms such as the Secant-Avoidance Projection algorithm and the Hierarchical Secant-Avoidance Projection algorithm provide a computationally feasible means of estimating the κ-profile for large data sets, and thus a method of understanding and monitoring their behavior. As we demonstrate in this paper, the κ-profile serves as a useful statistic in several representative settings: weather data, soundscape data, and dynamical systems data.
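
To make the definition of κ concrete, the following is a minimal sketch in Python with NumPy. It is not the Secant-Avoidance Projection algorithm named in the abstract. As a stand-in, it assumes the projection for each k is spanned by the top k right singular vectors of the matrix of normalized secants (a PCA-style choice). Since that projection is generally not optimal, the values it returns are only lower bounds on the true κ for each k. The function names `normalized_secants` and `kappa_profile_pca` are hypothetical and chosen for illustration.

```python
import numpy as np


def normalized_secants(X):
    """Return every unit-length secant (x_i - x_j) / ||x_i - x_j|| with i < j.

    X is an (m, n) array of m points in n ambient dimensions.
    """
    i, j = np.triu_indices(len(X), k=1)
    S = X[i] - X[j]
    norms = np.linalg.norm(S, axis=1)
    keep = norms > 0  # duplicate points have no well-defined secant
    return S[keep] / norms[keep, None]


def kappa_profile_pca(X, max_k=None):
    """Estimate a kappa-profile using PCA-style projections (an assumption).

    For each k, project the normalized secants onto the top k right singular
    vectors of the secant matrix and record the smallest projected norm.
    This gives a lower bound on kappa, not the optimal value.
    """
    S = normalized_secants(X)
    # Rows of Vt are orthonormal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(S, full_matrices=False)
    max_k = Vt.shape[0] if max_k is None else min(max_k, Vt.shape[0])

    profile = []
    for k in range(1, max_k + 1):
        P = Vt[:k]                # (k, n) orthonormal projection
        projected = S @ P.T       # secants expressed in k dimensions
        profile.append(np.linalg.norm(projected, axis=1).min())
    return np.array(profile)


if __name__ == "__main__":
    # Illustrative data: points on a circle embedded in 5 ambient dimensions.
    theta = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
    X = np.zeros((40, 5))
    X[:, 0], X[:, 1] = np.cos(theta), np.sin(theta)
    print(kappa_profile_pca(X))
```

In this illustrative example all secants lie in a two-dimensional plane, so the estimate is small at k = 1 and reaches 1 (up to rounding) from k = 2 onward. That jump is the kind of signal the κ-profile is meant to expose. Note that the number of secants grows quadratically with the number of points, so this brute-force sketch is only practical for small data sets.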

research
08/05/2018

Too many secants: a hierarchical approach to secant-based dimensionality reduction on large data sets

A fundamental question in many data analysis settings is the problem of ...
research
10/29/2017

Dimensionality reduction methods for molecular simulations

Molecular simulations produce very high-dimensional data-sets with milli...
research
09/24/2022

Fractal dimension, approximation and data sets

The purpose of this paper is to study the fractal phenomena in large dat...
research
07/10/2018

A GPU-Oriented Algorithm Design for Secant-Based Dimensionality Reduction

Dimensionality-reduction techniques are a fundamental tool for extractin...
research
06/10/2011

A Computational Framework for Nonlinear Dimensionality Reduction of Large Data Sets: The Exploratory Inspection Machine (XIM)

In this paper, we present a novel computational framework for nonlinear ...
research
09/07/2022

Dimensionality Reduction using Elastic Measures

With the recent surge in big data analytics for hyper-dimensional data t...
research
05/15/2018

Intrinsic dimension and its application to association rules

The curse of dimensionality in the realm of association rules is twofold...
