Toward Real-time Analysis of Experimental Science Workloads on Geographically Distributed Supercomputers

05/13/2021
by Michael Salim, et al.

Massive upgrades to science infrastructure are driving data velocities upwards while stimulating adoption of increasingly data-intensive analytics. While next-generation exascale supercomputers promise strong support for I/O-intensive workflows, HPC remains largely untapped by live experiments, because data transfers and disparate batch-queueing policies are prohibitive when faced with scarce instrument time. To bridge this divide, we introduce Balsam: a distributed orchestration platform enabling workflows at the edge to securely and efficiently trigger analytics tasks across a user-managed federation of HPC execution sites. We describe the architecture of the Balsam service, which provides a workflow management API, and distributed sites that provision resources and schedule scalable, fault-tolerant execution. We demonstrate Balsam in efficiently scaling real-time analytics from two DOE light sources simultaneously onto three supercomputers (Theta, Summit, and Cori), while maintaining low overheads for on-demand computing, and providing a Python library for seamless integration with existing ecosystems of data analysis tools.
