Joint Characterization of Multiscale Information in High Dimensional Data

02/18/2021
by Daniel Sousa, et al.

High-dimensional data can contain variance at multiple scales. Analysis tools that preferentially operate at one scale can fail to capture all the information present in this cross-scale complexity. We propose a multiscale joint characterization approach designed to exploit synergies between global and local approaches to dimensionality reduction. We illustrate this approach using Principal Components Analysis (PCA) to characterize global variance structure and t-distributed stochastic neighbor embedding (t-SNE) to characterize local variance structure. Using both synthetic images and real-world imaging spectroscopy data, we show that joint characterization can detect and isolate signals that are not evident from either PCA or t-SNE alone. Broadly, t-SNE is effective at rendering a randomly oriented low-dimensional map of local clusters, and PCA renders this map interpretable by providing global, physically meaningful structure. The approach may prove particularly useful for other geospatial data, given the robust local variance structure due to spatial autocorrelation and the physical interpretability of global variance structure due to the spectral properties of Earth surface materials. However, the fundamental premise could readily be extended to other high-dimensional datasets, including image time series and non-image data.
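
The pairing of a global and a local embedding described above can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the authors' implementation: it assumes NumPy and scikit-learn are available, the function name `joint_characterization`, the synthetic data, and all parameter values are hypothetical, and stacking PC scores next to t-SNE coordinates is just one plausible way to view the two embeddings together.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE


def joint_characterization(X, n_pcs=3, perplexity=30.0, seed=0):
    """Hypothetical helper: return (PC scores, 2-D t-SNE embedding) for the rows of X."""
    # Global variance structure: linear projection onto the leading PCs.
    pcs = PCA(n_components=n_pcs, random_state=seed).fit_transform(X)
    # Local variance structure: nonlinear neighbor embedding of the same rows.
    emb = TSNE(
        n_components=2, perplexity=perplexity, init="pca", random_state=seed
    ).fit_transform(X)
    return pcs, emb


# Synthetic stand-in for pixel spectra (assumed for illustration only):
# a large-amplitude gradient in one dimension plus three tight clusters.
rng = np.random.default_rng(0)
n, d = 300, 20
labels = rng.integers(0, 3, size=n)            # cluster membership per sample
centers = rng.normal(scale=1.0, size=(3, d))   # one center per cluster
X = centers[labels] + rng.normal(scale=0.1, size=(n, d))
X[:, 0] += rng.uniform(-5.0, 5.0, size=n)      # dominant global trend

pcs, emb = joint_characterization(X)

# One joint view: the first PC alongside the two t-SNE axes, one row per sample.
joint = np.column_stack([pcs[:, 0], emb])
```

Plotting the columns of `joint` against one another (for example, PC 1 versus each t-SNE axis) gives a single space in which local clusters can be read against a globally ordered axis. The choice of how many PCs to retain and which perplexity to use is left open here and would need to be tuned to the data.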


research
09/05/2022

Opening the black-box of Neighbor Embedding with Hotelling's T2 statistic and Q-residuals

In contrast to classical techniques for exploratory analysis of high-dim...
research
12/02/2021

Joint Characterization of the Cryospheric Spectral Feature Space

Hyperspectral feature spaces are useful for many remote sensing applicat...
research
02/27/2020

Supervised Dimensionality Reduction and Visualization using Centroid-encoder

Visualizing high-dimensional data is an essential task in Data Science a...
research
08/21/2021

Joint Characterization of Spatiotemporal Data Manifolds

Spatiotemporal (ST) image data are increasingly common and often high-di...
research
12/13/2020

k-Variance: A Clustered Notion of Variance

We introduce k-variance, a generalization of variance built on the machi...
research
07/05/2019

Visualization of Emergency Department Clinical Data for Interpretable Patient Phenotyping

Visual summarization of clinical data collected on patients contained wi...
research
08/01/2019

Structure retrieval from 4D-STEM: statistical analysis of potential pitfalls in high-dimensional data

Four-dimensional scanning transmission electron microscopy (4D-STEM) is ...
