Relationship-aware Multivariate Sampling Strategy for Scientific Simulation Data

08/31/2020
by Subhashis Hazarika, et al.

As the computational power of supercomputers increases, the size of the data produced by scientific simulations is growing rapidly. To reduce the storage footprint and facilitate scalable post-hoc analyses of such data sets, various data reduction and summarization methods have been proposed over the years. Different flavors of sampling algorithms exist to sample high-resolution scientific data while preserving the data properties required for subsequent analyses. However, most of these sampling algorithms are designed for univariate data and cater to post-hoc analyses of single variables. In this work, we propose a multivariate sampling strategy that preserves the original variable relationships and enables different multivariate analyses directly on the sampled data. Our strategy uses principal component analysis to capture the variance of multivariate data and can be built on top of any existing state-of-the-art sampling algorithm for single variables. We also propose variants of different data partitioning schemes (regular and irregular) to efficiently model the local multivariate relationships. Using two real-world multivariate data sets, we demonstrate the efficacy of the proposed strategy with respect to its data reduction capabilities and the ease of performing efficient post-hoc multivariate analyses.
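The general idea of layering principal component analysis over a single-variable sampler can be pictured with a small sketch. This is not the authors' implementation: it assumes the data is a flat array of points with one column per variable, it applies the sampler only to the first principal component, and `rare_value_sample` is a hypothetical stand-in for whichever univariate sampling algorithm one would actually use. All function names are made up for illustration.

```python
import numpy as np

def pca_scores(data):
    """Project rows of `data` (n_points x n_vars) onto its principal components."""
    centered = data - data.mean(axis=0)
    # SVD of the centered data: rows of vt are the principal directions.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    return centered @ vt.T, explained

def rare_value_sample(values, fraction, bins=32, rng=None):
    """Hypothetical univariate sampler: favor points in sparsely populated bins."""
    rng = np.random.default_rng(rng)
    edges = np.linspace(values.min(), values.max(), bins + 1)
    bin_ids = np.clip(np.digitize(values, edges[1:-1]), 0, bins - 1)
    counts = np.bincount(bin_ids, minlength=bins)
    # Each point's weight is inversely proportional to its bin population.
    weights = 1.0 / counts[bin_ids]
    weights /= weights.sum()
    n_keep = max(1, int(round(fraction * values.size)))
    return rng.choice(values.size, size=n_keep, replace=False, p=weights)

def sample_multivariate(data, fraction, rng=None):
    """Pick point indices by running the univariate sampler on the first PC."""
    scores, explained = pca_scores(data)
    idx = rare_value_sample(scores[:, 0], fraction, rng=rng)
    return np.sort(idx), explained

# Synthetic stand-in for three correlated simulation variables.
gen = np.random.default_rng(0)
base = gen.normal(size=5000)
data = np.column_stack([
    base + 0.1 * gen.normal(size=5000),
    2.0 * base + 0.3 * gen.normal(size=5000),
    -base + 0.5 * gen.normal(size=5000),
])

idx, explained = sample_multivariate(data, fraction=0.05, rng=1)
sampled = data[idx]  # every variable is kept at the selected points

# Compare variable relationships before and after sampling.
diff = np.abs(np.corrcoef(data.T) - np.corrcoef(sampled.T)).max()
print("kept points:", idx.size)
print("variance explained by first PC:", explained[0])
print("largest change in pairwise correlation:", diff)
```

Because all variables are kept at the same selected points, multivariate queries such as pairwise correlation can be run directly on `sampled`. To model local relationships, the same function could be called separately on each block of a partitioned domain, though how the blocks are chosen is outside the scope of this sketch.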


