Is Disaggregation possible for HPC Cognitive Simulation?

12/09/2021
by Michael R. Wyatt II, et al.

Cognitive simulation (CogSim) is an important and emerging workflow for HPC scientific exploration and scientific machine learning (SciML). One challenging CogSim workload replaces one component of a complex physical simulation with a fast, learned surrogate model that runs "inside" the computational loop. Executing this in-the-loop inference is particularly challenging for four reasons: it requires frequent inference across multiple possible target models, it can lie on the simulation's critical path (making it latency bound), it receives requests from multiple MPI ranks, and each request typically contains only a small number of samples. In this paper, we explore using large, dedicated deep learning (DL)/AI accelerators that are disaggregated from the compute nodes for this CogSim workload. We compare the trade-offs of using these accelerators against node-local GPU accelerators on leadership-class HPC systems.
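To make the critical-path concern concrete, here is a minimal first-order latency model, written for illustration only; it is not the authors' model. The class `InferenceTarget`, its parameters (`round_trip`, `launch_overhead`, `per_sample`), and the function `time_to_solution` are hypothetical. The sketch assumes that requests from ranks sharing one accelerator are merged into a single batch, and that each timestep blocks until its inference result returns.

```python
from dataclasses import dataclass


@dataclass
class InferenceTarget:
    """Hypothetical first-order cost model for one inference endpoint.

    All times are in the same arbitrary unit; the values are illustrative,
    not measurements from the paper.
    """
    round_trip: float       # fixed transfer latency per request (network or host-device link)
    launch_overhead: float  # fixed cost to run one batch on the accelerator
    per_sample: float       # marginal compute cost per sample in the batch

    def request_latency(self, samples_per_rank: int, ranks_sharing: int) -> float:
        """Latency one rank sees when `ranks_sharing` ranks' requests form one batch (assumption)."""
        batch_size = samples_per_rank * ranks_sharing
        return self.round_trip + self.launch_overhead + self.per_sample * batch_size


def time_to_solution(steps: int, t_physics: float, target: InferenceTarget,
                     samples_per_rank: int, ranks_sharing: int) -> float:
    """Total run time when the surrogate call blocks every timestep.

    Because the inference sits on the critical path, its latency is added
    to the physics time of every step rather than overlapped with it.
    """
    per_step = t_physics + target.request_latency(samples_per_rank, ranks_sharing)
    return steps * per_step


if __name__ == "__main__":
    # Hypothetical node-local GPU: no network hop, but one rank's small batch per call.
    local = InferenceTarget(round_trip=0.0, launch_overhead=2.0, per_sample=0.5)
    # Hypothetical disaggregated accelerator: network hop, but cheap per-sample compute
    # and requests from 64 ranks batched together.
    remote = InferenceTarget(round_trip=10.0, launch_overhead=1.0, per_sample=0.01)

    print(local.request_latency(samples_per_rank=4, ranks_sharing=1))
    print(remote.request_latency(samples_per_rank=4, ranks_sharing=64))
    print(time_to_solution(100, 1.0, local, samples_per_rank=4, ranks_sharing=1))
```

With these arbitrary values, the local target costs 4.0 per request and the remote one about 13.56, of which the fixed `round_trip` term contributes 10.0. In this toy model, small per-request batches let fixed overheads dominate, and because every timestep waits on the result, that per-request latency is multiplied by the number of steps.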

