DistStat.jl: Towards Unified Programming for High-Performance Statistical Computing Environments in Julia

10/30/2020
by Seyoon Ko, et al.

The demand for high-performance computing (HPC) in everyday statistical computing is ever-increasing. The downside is that specialized code must be written for each HPC environment. CPU-level parallelization has to be coded explicitly to make effective use of multiple nodes in cluster supercomputing environments, and acceleration via graphics processing units (GPUs) requires writing kernel code. The Julia software package DistStat.jl implements a data structure for distributed arrays that works transparently on both multi-node CPU clusters and multi-GPU environments. The package paves the way for developing high-performance statistical software that targets various HPC environments simultaneously. As a demonstration of the transparency and scalability of the package, we provide applications to large-scale nonnegative matrix factorization, multidimensional scaling, and the ℓ_1-regularized Cox proportional hazards model on an 8-GPU workstation and a 720-CPU-core virtual cluster in the Amazon Web Services (AWS) cloud. As a case in point, we analyze the onset of type-2 diabetes in the UK Biobank data, with 400,000 subjects and 500,000 single nucleotide polymorphisms, using the ℓ_1-regularized Cox proportional hazards model. Fitting this half-million-variate regression model took less than 50 minutes on AWS.


