Simulating High-Dimensional Multivariate Data using the bigsimr R Package

11/11/2021
by A. Grant Schissler, et al.

It is critical to simulate data accurately when employing Monte Carlo techniques and evaluating statistical methodology. In this era of big data, measurements are often correlated and high dimensional, as with data obtained from high-throughput biomedical experiments. Because of the computational complexity involved and a lack of user-friendly software for simulating such massive multivariate constructions, researchers resort to simulation designs that posit independence or perform arbitrary data transformations. To close this gap, we developed the Bigsimr Julia package with R and Python interfaces; this paper focuses on the R interface. These packages enable high-dimensional random vector simulation with arbitrary marginal distributions and dependence specified via a Pearson, Spearman, or Kendall correlation matrix. bigsimr contains high-performance features, including multi-core and graphics processing unit (GPU)-accelerated algorithms to estimate correlation and compute the nearest correlation matrix. Monte Carlo studies quantify the accuracy and scalability of our approach up to dimension d = 10,000. We describe example workflows and apply the approach to a high-dimensional data set: RNA-sequencing data obtained from breast cancer tumor samples.
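One common way to pair arbitrary marginal distributions with a target rank correlation is the Gaussian-copula (NORTA-style) construction: convert each Spearman or Kendall coefficient into the Pearson correlation of a latent normal vector, draw correlated normals, map them to uniforms with the standard normal CDF, and apply each marginal's inverse CDF. The Python sketch below illustrates only that general idea under those assumptions; it is not the bigsimr API, and every function name in it (spearman_to_pearson, kendall_to_pearson, cholesky, simulate, spearman) is hypothetical. It relies on the standard bivariate-normal identities rho_P = 2 sin(pi rho_S / 6) and rho_P = sin(pi tau / 2), and it simply raises an error when the converted matrix is not positive definite, which is the situation where a nearest-correlation-matrix step would be required.

```python
import math
import random
from statistics import NormalDist

# Hypothetical helpers: a sketch of the Gaussian-copula (NORTA-style) idea,
# not the bigsimr implementation or API.


def spearman_to_pearson(rho_s):
    """Latent-normal Pearson correlation that yields Spearman correlation rho_s."""
    return 2.0 * math.sin(math.pi * rho_s / 6.0)


def kendall_to_pearson(tau):
    """Latent-normal Pearson correlation that yields Kendall's tau."""
    return math.sin(math.pi * tau / 2.0)


def cholesky(a):
    """Lower-triangular Cholesky factor; fails if `a` is not positive definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                diag = a[i][i] - s
                if diag <= 0.0:
                    # A nearest-correlation-matrix step would be needed here.
                    raise ValueError("correlation matrix is not positive definite")
                low[i][i] = math.sqrt(diag)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def simulate(n, spearman_matrix, inverse_cdfs, seed=None):
    """Draw n vectors whose margins follow `inverse_cdfs` (one per column) and
    whose Spearman correlations target `spearman_matrix`."""
    rng = random.Random(seed)
    d = len(inverse_cdfs)
    # Convert the target Spearman matrix to the latent normal's Pearson matrix.
    latent = [[1.0 if i == j else spearman_to_pearson(spearman_matrix[i][j])
               for j in range(d)] for i in range(d)]
    low = cholesky(latent)
    std_normal = NormalDist()
    eps = 1e-12  # keep uniforms strictly inside (0, 1) for the inverse CDFs
    rows = []
    for _ in range(n):
        e = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Correlated standard normals: z = L e.
        z = [sum(low[i][k] * e[k] for k in range(i + 1)) for i in range(d)]
        u = [min(max(std_normal.cdf(zi), eps), 1.0 - eps) for zi in z]
        rows.append([inverse_cdfs[i](u[i]) for i in range(d)])
    return rows


def spearman(xs, ys):
    """Spearman correlation as the Pearson correlation of ranks (assumes no ties)."""
    def ranks(v):
        r = [0.0] * len(v)
        for rank, idx in enumerate(sorted(range(len(v)), key=v.__getitem__)):
            r[idx] = float(rank)
        return r

    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    # Example margins, given as inverse CDFs: exponential(rate=2) and log-normal(0, 1).
    exp_icdf = lambda u: -math.log(1.0 - u) / 2.0
    lognorm_icdf = lambda u: math.exp(NormalDist().inv_cdf(u))
    sample = simulate(5000, [[1.0, 0.5], [0.5, 1.0]], [exp_icdf, lognorm_icdf], seed=1)
    x, y = [r[0] for r in sample], [r[1] for r in sample]
    print(round(spearman(x, y), 3))  # should land near the 0.5 target
```

Because the inverse CDFs are monotone, they leave ranks unchanged, so the sample Spearman correlation tracks the target regardless of which marginals are chosen. Pearson targets do not enjoy this invariance and would need a different, marginal-specific adjustment.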


research
07/03/2019

mgcpy: A Comprehensive High Dimensional Independence Testing Python Package

With the increase in the amount of data in many fields, a method to cons...
research
09/23/2022

hdtg: An R package for high-dimensional truncated normal simulation

Simulating from the multivariate truncated normal distribution (MTN) is ...
research
12/04/2017

Tracy-Widom limit for Kendall's tau

In this paper, we study a high-dimensional random matrix model from nonp...
research
01/10/2022

Robust graphical lasso based on multivariate Winsorization

We propose the use of a robust covariance estimator based on multivariat...
research
01/04/2020

High-Dimensional Independence Testing and Maximum Marginal Correlation

A number of universally consistent dependence measures have been recentl...
research
02/24/2022

Multiple multi-sample testing under arbitrary covariance dependency

Modern high-throughput biomedical devices routinely produce data on a la...