Three Practical Workflow Schedulers for Easy Maximum Parallelism

10/21/2021
by David M. Rogers, et al.

Runtime scheduling and workflow systems are an increasingly popular algorithmic component in HPC because they allow full system utilization with relaxed synchronization requirements. There are so many special-purpose tools for task scheduling that one might wonder why more are needed. Use cases seen on the Summit supercomputer needed better integration with MPI and greater flexibility in job launch configurations. Preparation, execution, and analysis of computational chemistry simulations at the scale of tens of thousands of processors revealed three distinct workflow patterns. A separate job scheduler was implemented for each one using extremely simple and robust designs: file-based, task-list-based, and bulk-synchronous. Comparison with existing methods shows unique benefits of this work, including simplicity of design, suitability for HPC centers, short startup time, and well-understood per-task overhead. All three new tools have been shown to scale to full utilization of Summit, and have been made publicly available with tests and documentation. This work presents a complete characterization of the minimum effective task granularity for efficient scheduler usage scenarios. These schedulers have the same bottlenecks, and hence similar task granularities, as those reported for existing tools following comparable paradigms.
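To make the link between per-task overhead and minimum effective task granularity concrete, here is a minimal sketch under a deliberately simple assumption that is not taken from the paper: each task costs a fixed scheduling overhead on top of its useful run time, so efficiency is run time divided by run time plus overhead. The function names and the numbers in the usage note are hypothetical and are not measurements from this work.

```python
def efficiency(task_seconds: float, overhead_seconds: float) -> float:
    """Fraction of wall time spent on useful work for one task.

    Assumed model (not from the paper): every task pays a fixed
    scheduling overhead in addition to its own run time.
    """
    if task_seconds <= 0 or overhead_seconds < 0:
        raise ValueError("task_seconds must be > 0 and overhead_seconds >= 0")
    return task_seconds / (task_seconds + overhead_seconds)


def min_task_seconds(overhead_seconds: float, target_efficiency: float) -> float:
    """Smallest task duration that reaches target_efficiency under the same model.

    Solving t / (t + o) >= e for t gives t >= o * e / (1 - e).
    """
    if overhead_seconds < 0:
        raise ValueError("overhead_seconds must be >= 0")
    if not 0.0 < target_efficiency < 1.0:
        raise ValueError("target_efficiency must be strictly between 0 and 1")
    return overhead_seconds * target_efficiency / (1.0 - target_efficiency)
```

For example, with a hypothetical overhead of 10 ms per task, `min_task_seconds(0.010, 0.9)` indicates that tasks would need to run for about 90 ms each to keep efficiency at 90% in this model. Real schedulers can deviate from it, for instance when a shared bottleneck makes overhead grow with the number of workers.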

research
10/01/2020

Supercomputing with MPI meets the Common Workflow Language standards: an experience report

Use of standards-based workflows is still somewhat unusual by high-perfo...
research
07/29/2019

Staged deployment of interactive multi-application HPC workflows

Running scientific workflows on a supercomputer can be a daunting task f...
research
01/12/2018

A Workload Analysis of NSF's Innovative HPC Resources Using XDMoD

Workload characterization is an integral part of performance analysis of...
research
07/26/2018

Jupyter as Common Technology Platform for Interactive HPC Services

The Minnesota Supercomputing Institute has implemented Jupyterhub and th...
research
09/18/2019

Balsam: Automated Scheduling and Execution of Dynamic, Data-Intensive HPC Workflows

We introduce the Balsam service to manage high-throughput task schedulin...
research
10/27/2020

Comparing Workflow Application Designs for High Resolution Satellite Image Analysis

Very High Resolution satellite and aerial imagery are used to monitor an...
research
09/15/2023

Speeding up charge exchange recombination spectroscopy analysis in support of NERSC/DIII-D realtime workflow

We report optimization work made in support of the development of a real...
