Running a Pre-Exascale, Geographically Distributed, Multi-Cloud Scientific Simulation

02/16/2020
by Igor Sfiligoi, et al.

As we approach the Exascale era, it is important to verify that existing frameworks and tools will still work at that scale. Moreover, public Cloud computing has been emerging as a viable solution for both prototyping and urgent computing. Using the elasticity of the Cloud, we therefore put in place a pre-exascale HTCondor setup for running a scientific simulation in the Cloud, with IceCube's photon propagation simulation as the chosen application. That is, this was not purely a demonstration run; it also produced valuable and much-needed scientific results for the IceCube collaboration. To reach the desired scale, we aggregated GPU resources spanning 8 GPU models from many geographic regions across Amazon Web Services, Microsoft Azure, and the Google Cloud Platform. With this setup, we reached a peak of over 51k GPUs, corresponding to almost 380 PFLOP32s, for a total integrated compute of about 100k GPU hours. In this paper we describe the setup, the problems that were discovered and overcome, and, briefly, the actual science output of the exercise.
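The headline figures quoted above can be cross-checked with simple arithmetic. The sketch below is not taken from the paper: the function names are illustrative, and the rounded values ("over 51k", "almost 380", "about 100k") are treated as exact, so the results are rough estimates only.

```python
# Back-of-the-envelope estimates derived only from the figures quoted in the abstract.
# The rounded values are treated as exact, which is an assumption.
PEAK_GPUS = 51_000          # "over 51k GPUs" (a lower bound)
PEAK_PFLOP32S = 380         # "almost 380 PFLOP32s" (an upper bound)
TOTAL_GPU_HOURS = 100_000   # "about 100k GPU hours"


def avg_tflop32s_per_gpu(peak_pflop32s: float, gpus: int) -> float:
    """Average fp32 throughput per GPU at peak, in TFLOP32s (1 PFLOP = 1000 TFLOP)."""
    return peak_pflop32s * 1000 / gpus


def equivalent_hours_at_peak(total_gpu_hours: float, gpus: int) -> float:
    """Hours the peak-sized pool would need to run to accumulate the total GPU hours."""
    return total_gpu_hours / gpus


if __name__ == "__main__":
    per_gpu = avg_tflop32s_per_gpu(PEAK_PFLOP32S, PEAK_GPUS)
    hours = equivalent_hours_at_peak(TOTAL_GPU_HOURS, PEAK_GPUS)
    print(f"Average per-GPU throughput at peak: {per_gpu:.2f} TFLOP32s")
    print(f"Equivalent run time at peak size:   {hours:.2f} hours")
```

Under these assumptions the pool averages roughly 7.5 TFLOP32s per GPU at peak, and the integrated compute corresponds to about two hours of running at peak size.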

research
04/18/2020

Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scientific Computing

Scientific computing needs are growing dramatically with time and are ex...
research
06/09/2020

Reproducible and Portable Workflows for Scientific Computing and HPC in the Cloud

The increasing availability of cloud computing services for science has ...
research
07/08/2021

Expanding IceCube GPU computing into the Clouds

The IceCube collaboration relies on GPU compute for many of its needs, i...
research
05/19/2022

Comparing single-node and multi-node performance of an important fusion HPC code benchmark

Fusion simulations have traditionally required the use of leadership sca...
research
05/22/2019

SciTokens: Demonstrating Capability-Based Access to Remote Scientific Data using HTCondor

The management of security credentials (e.g., passwords, secret keys) fo...
research
10/21/2020

Serverless Containers – rising viable approach to Scientific Workflows

Increasing popularity of the serverless computing approach has led to th...
research
03/08/2019

Application of Google Cloud Platform in Astrophysics

The availability of new Cloud Platform offered by Google motivated us to...
