From Bare Metal to Virtual: Lessons Learned when a Supercomputing Institute Deploys its First Cloud

07/23/2018
by Evan F. Bollig, et al.

As the primary provider of research computing services at the University of Minnesota, the Minnesota Supercomputing Institute (MSI) has long been responsible for serving the needs of a user base numbering in the thousands. In recent years, MSI, like many other HPC centers, has observed a growing need for self-service, on-demand, data-intensive research, as well as the emergence of many new controlled-access datasets for research purposes. In light of this, MSI constructed a new on-premise cloud service, named Stratus, which is architected from the ground up to easily satisfy data-use agreements and fill four gaps left by traditional HPC. The resulting OpenStack cloud, constructed from HPC-specific compute nodes and backed by Ceph storage, is designed to fully comply with controls set forth by the NIH Genomic Data Sharing Policy. Herein, we present twelve lessons learned during the ambitious sprint to take Stratus from inception into production in less than 18 months. Important, and often overlooked, components of this timeline included the development of new leadership roles, staff and user training, and user support documentation. Along the way, the lessons learned extended well beyond the technical challenges often associated with acquiring, configuring, and maintaining large-scale systems.

research
07/23/2018

Leveraging OpenStack and Ceph for a Controlled-Access Data Cloud

While traditional HPC has and continues to satisfy most workflows, a new...
research
07/30/2021

Cloud to Ground Secured Computing: User Experiences on the Transition from Cloud-Based to Locally-Sited Hardware

The application of high-performance computing (HPC) processes, tools, an...
research
08/03/2022

The Case for Non-Volatile RAM in Cloud HPCaaS

HPC as a service (HPCaaS) is a new way to expose HPC resources via cloud...
research
02/13/2017

Data-Intensive Supercomputing in the Cloud: Global Analytics for Satellite Imagery

We present our experiences using cloud computing to support data-intensi...
research
09/24/2021

Aristotle Cloud Federation: Container Runtimes Technical Report

A National Science Foundation-sponsored container runtimes investigation...
research
07/12/2018

Virtualizing the Stampede2 Supercomputer with Applications to HPC in the Cloud

Methods developed at the Texas Advanced Computing Center (TACC) are desc...
