Characterizing the Performance of Executing Many-tasks on Summit

09/08/2019
by Matteo Turilli, et al.

Many scientific workloads comprise many tasks, where each task is an independent simulation or analysis of data. Executing millions of tasks on heterogeneous HPC platforms requires scalable dynamic resource management and multi-level scheduling. RADICAL-Pilot (RP), an implementation of the Pilot abstraction, addresses these challenges and serves as an effective runtime system for executing many-task workloads. In this paper, we characterize the performance of executing many tasks using RP when interfaced with JSM and PRRTE on Summit: RP is responsible for resource management and task scheduling on the acquired resources; JSM or PRRTE enacts the placement and launching of the scheduled tasks. Our experiments provide lower bounds on the performance of RP when integrated with JSM and PRRTE. Specifically, for workloads of homogeneous, single-core, 15-minute-long tasks, we find that: PRRTE scales better than JSM for > O(1000) tasks; PRRTE overheads are negligible; and PRRTE supports optimizations that lower the impact of overheads and enable resource utilization of 63% over 404 compute nodes.

research
02/26/2021

Design and Performance Characterization of RADICAL-Pilot on Leadership-class Platforms

Many extreme scale scientific applications have workloads comprised of a...
research
01/05/2018

Design and Performance Characterization of RADICAL-Pilot on Titan

Many extreme scale scientific applications have workloads comprised of a...
research
05/27/2021

RADICAL-Pilot and Parsl: Executing Heterogeneous Workflows on HPC Platforms

Executing scientific workflows with heterogeneous tasks on HPC platforms...
research
09/18/2019

Balsam: Automated Scheduling and Execution of Dynamic, Data-Intensive HPC Workflows

We introduce the Balsam service to manage high-throughput task schedulin...
research
08/23/2022

Asynchronous Execution of Heterogeneous Tasks in AI-coupled HPC Workflows

Heterogeneous scientific workflows consist of numerous types of tasks an...
research
05/08/2021

Optimising Resource Management for Embedded Machine Learning

Machine learning inference is increasingly being executed locally on mob...
research
07/31/2020

Intelligent Management of Mobile Systems through Computational Self-Awareness

Runtime resource management for many-core systems is increasingly comple...
