High-Performance Statistical Computing in the Computing Environments of the 2020s

01/07/2020
by   Seyoon Ko, et al.

Technological advances in the past decade, in hardware and software alike, have made access to high-performance computing (HPC) easier than ever. We review these advances from a statistical computing perspective. Cloud computing makes access to supercomputers affordable. Deep learning software libraries make programming statistical algorithms easy and enable users to write code once and run it anywhere, from a laptop to a workstation with multiple graphics processing units (GPUs) or a supercomputer in a cloud. To help statisticians benefit from these developments, we review recent optimization algorithms that are useful for high-dimensional models and can harness the power of HPC. Code snippets are provided so that readers can grasp the ease of programming. We also provide an easy-to-use distributed matrix data structure suitable for HPC. Employing this data structure, we illustrate various statistical applications including large-scale nonnegative matrix factorization, positron emission tomography, multidimensional scaling, and ℓ_1-regularized Cox regression. Our examples easily scale up to an 8-GPU workstation and a 720-CPU-core cluster in a cloud. As a case in point, we analyze the onset of type-2 diabetes from the UK Biobank with 200,000 subjects and about 500,000 single nucleotide polymorphisms using the HPC ℓ_1-regularized Cox regression. Fitting a half-million-variate model takes less than 45 minutes and reconfirms known associations. To our knowledge, this is the first demonstration of the feasibility of joint genome-wide association analysis of survival outcomes at this scale.
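To give a flavor of the kind of algorithm the abstract refers to, here is a minimal single-machine sketch of ℓ_1-regularized Cox regression fitted by proximal gradient descent (a gradient step on the negative log partial likelihood followed by soft-thresholding). It is an illustration only, not the authors' distributed implementation: it uses plain NumPy, a dense n-by-n risk-set matrix, and a conservative fixed step size. All function names (`risk_matrix`, `cox_grad`, `fit_l1_cox`, and so on) and the synthetic data are assumptions made for this example.

```python
import numpy as np

def risk_matrix(time):
    # at_risk[i, j] = 1.0 when subject j is still at risk at subject i's time.
    # Dense O(n^2) storage: acceptable for a sketch, not for biobank-scale data.
    return (time[None, :] >= time[:, None]).astype(float)

def cox_neg_loglik(beta, X, event, at_risk):
    # Negative log partial likelihood (ties are not treated specially).
    eta = X @ beta
    shift = eta.max()  # shift for numerical stability
    log_denom = np.log(at_risk @ np.exp(eta - shift)) + shift
    return -np.sum(event * (eta - log_denom))

def cox_grad(beta, X, event, at_risk):
    # Gradient of the negative log partial likelihood with respect to beta.
    eta = X @ beta
    w = np.exp(eta - eta.max())  # the shift cancels in the ratio below
    denom = at_risk @ w
    resid = event - w * (at_risk.T @ (event / denom))
    return -(X.T @ resid)

def soft_threshold(z, tau):
    # Proximal operator of tau * ||.||_1.
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def fit_l1_cox(X, time, event, lam, n_iter=500):
    # Proximal gradient with a fixed step 1/L, where
    # L = 0.5 * (#events) * ||X||_2^2 bounds the gradient's Lipschitz constant.
    at_risk = risk_matrix(time)
    n_events = max(event.sum(), 1.0)
    step = 2.0 / (n_events * np.linalg.norm(X, 2) ** 2)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = cox_grad(beta, X, event, at_risk)
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

# Synthetic usage example (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
beta_true = np.array([1.0, -1.0] + [0.0] * 8)
time = rng.exponential(1.0 / np.exp(X @ beta_true))
event = (rng.random(200) < 0.8).astype(float)  # 1 = event observed, 0 = censored
beta_hat = fit_l1_cox(X, time, event, lam=5.0)
print(np.round(beta_hat, 2))
```

Each iteration is dominated by matrix-vector products, which is why methods of this type map naturally onto GPUs and distributed matrices; the dense risk-set matrix used here would need to be replaced by a cumulative-sum formulation at scale.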

research
10/30/2020

DistStat.jl: Towards Unified Programming for High-Performance Statistical Computing Environments in Julia

The demand for high-performance computing (HPC) is ever-increasing for e...
research
05/13/2020

Literature Review and Implementation Overview: High Performance Computing with Graphics Processing Units for Classroom and Research Use

In this report, I discuss the history and current state of GPU HPC syste...
research
06/16/2020

High-performance cloud computing for exhaustive protein-protein docking

Public cloud computing environments, such as Amazon AWS, Microsoft Azure...
research
12/11/2019

High Performance Computing for Geospatial Applications: A Retrospective View

Many types of geospatial analyses are computationally complex, involving...
research
03/28/2020

Sparse Matrix-Based HPC Tomography

Tomographic imaging has benefited from advances in X-ray sources, detect...
research
08/26/2020

8 Steps to 3.7 TFLOP/s on NVIDIA V100 GPU: Roofline Analysis and Other Tricks

Performance optimization can be a daunting task especially as the hardwa...
