Design and Performance Characterization of RADICAL-Pilot on Titan

01/05/2018
by Andre Merzky, et al.

Many extreme-scale scientific applications have workloads composed of a large number of individual high-performance tasks. The Pilot abstraction decouples workload specification, resource management, and task execution via job placeholders and late binding. As such, suitable implementations of the Pilot abstraction can support the collective execution of a large number of tasks on supercomputers. We introduce RADICAL-Pilot (RP) as a portable, modular, and extensible Python-based Pilot system. We describe RP's design, architecture, and implementation. We characterize its performance and show its ability to scalably execute workloads composed of thousands of MPI tasks on Titan, a DOE leadership-class facility. Specifically, we investigate RP's weak (strong) scaling properties up to 131K (65K) cores and 4096 (16384) 32-core tasks. RADICAL-Pilot can be used stand-alone, as well as integrated with other tools as a runtime system.
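The scaling figures quoted in the abstract can be checked with a little arithmetic. The sketch below is illustrative only and is not part of RADICAL-Pilot: the function names are hypothetical, and it assumes that "131K" and "65K" stand for 131,072 and 65,536 cores and that tasks which do not fit concurrently run in successive waves.

```python
import math

CORES_PER_TASK = 32  # from the abstract: 32-core tasks


def cores_requested(n_tasks, cores_per_task=CORES_PER_TASK):
    """Cores needed to run all tasks concurrently (hypothetical helper)."""
    return n_tasks * cores_per_task


def generations(n_tasks, pilot_cores, cores_per_task=CORES_PER_TASK):
    """Number of successive waves of tasks on a pilot of a given size.

    Assumes every task needs the same number of cores and that a wave
    fills all available task slots before the next wave starts.
    """
    concurrent_slots = pilot_cores // cores_per_task
    return math.ceil(n_tasks / concurrent_slots)


# Weak scaling upper bound: 4096 tasks fill the pilot exactly once.
weak_cores = cores_requested(4096)

# Strong scaling upper bound: 16384 tasks on an assumed 65,536-core pilot.
strong_waves = generations(16384, 65536)

print(weak_cores, strong_waves)
```

Under these assumptions, the weak-scaling configuration runs all tasks in a single wave, while the strong-scaling configuration needs several waves because the task count exceeds the number of concurrent 32-core slots.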


