Virtualizing the Stampede2 Supercomputer with Applications to HPC in the Cloud

07/12/2018
by W. Cyrus Proctor, et al.

Methods developed at the Texas Advanced Computing Center (TACC) are described and demonstrated for automating the construction of an elastic, virtual cluster emulating the Stampede2 high performance computing (HPC) system. The cluster can be built and/or scaled in a matter of minutes on the Jetstream self-service cloud system and shares many properties of the original Stampede2, including: i) common identity management, ii) access to the same file systems, iii) an equivalent software application stack and module system, and iv) a similar job scheduling interface via Slurm. We measure time-to-solution for a number of common scientific applications on our virtual cluster against equivalent runs on Stampede2 and develop a profile of the applications for which performance is similar or otherwise acceptable. For such applications, the virtual cluster provides an effective form of "cloud bursting" with the potential to significantly improve overall turnaround time, particularly when Stampede2 is experiencing long queue wait times. In addition, the virtual cluster can be used for testing and debugging without directly impacting Stampede2. We conclude with a discussion of how science gateways can leverage the TACC Jobs API web service to incorporate this cloud bursting technique transparently to the end user.
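
The turnaround argument behind cloud bursting can be illustrated with a minimal sketch. It is not the authors' method: the function name, the slowdown factor, the startup cost, and all numbers below are hypothetical, and it only assumes that turnaround is queue wait plus run time.

```python
def should_burst(queue_wait_s, hpc_runtime_s, cloud_slowdown, cloud_startup_s=0.0):
    """Return True when the estimated cloud turnaround beats the HPC turnaround.

    All parameters are illustrative assumptions, not values from the paper:
      queue_wait_s    -- estimated queue wait on the HPC system, in seconds
      hpc_runtime_s   -- expected run time on the HPC system, in seconds
      cloud_slowdown  -- run-time ratio (virtual cluster / HPC system), >= 1 if slower
      cloud_startup_s -- time to build or scale the virtual cluster, in seconds
    """
    hpc_turnaround = queue_wait_s + hpc_runtime_s
    cloud_turnaround = cloud_startup_s + hpc_runtime_s * cloud_slowdown
    return cloud_turnaround < hpc_turnaround


if __name__ == "__main__":
    # Hypothetical job: 1 hour on the HPC system, 1.5x slower in the cloud,
    # 10 minutes to bring up the virtual cluster.
    print(should_burst(6 * 3600, 3600, 1.5, 600))  # long queue wait
    print(should_burst(0, 3600, 1.5, 600))         # empty queue
```

Under these assumed numbers, bursting pays off only when the queue wait exceeds the extra run time plus the startup cost, which matches the abstract's point that the benefit is largest when queue waits are long.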


research
06/26/2020

Self-Scaling Clusters and Reproducible Containers to Enable Scientific Computing

Container technologies such as Docker have become a crucial component of...
research
07/15/2023

PSI/J: A Portable Interface for Submitting, Monitoring, and Managing Jobs

It is generally desirable for high-performance computing (HPC) applicati...
research
09/06/2022

Deploying a sharded MongoDB cluster as a queued job on a shared HPC architecture

Data stores are the foundation on which data science, in all its variati...
research
02/17/2021

Deployment of Elastic Virtual Hybrid Clusters Across Cloud Sites

Virtual clusters are widely used computing platforms that can be deploye...
research
11/01/2022

Using Unused: Non-Invasive Dynamic FaaS Infrastructure with HPC-Whisk

Modern HPC workload managers and their careful tuning contribute to the ...
research
10/23/2018

LincoSim: a web based HPC-cloud platform for automatic virtual towing tank analysis

In this work, we present a new web based HPC-cloud platform for automati...
research
07/23/2018

From Bare Metal to Virtual: Lessons Learned when a Supercomputing Institute Deploys its First Cloud

As primary provider for research computing services at the University of...
