Merlin: Enabling Machine Learning-Ready HPC Ensembles

12/05/2019
by J. Luc Peterson, et al.

With the growing complexity of computational and experimental facilities, many scientific researchers are turning to machine learning (ML) techniques to analyze large-scale ensemble data. Given complexities such as multi-component workflows, heterogeneous machine architectures, parallel file systems, and batch scheduling, care must be taken to facilitate this analysis in a high-performance computing (HPC) environment. In this paper, we present Merlin, a workflow framework to enable large ML-friendly ensembles of scientific HPC simulations. By augmenting traditional HPC with distributed compute technologies, Merlin aims to lower the barrier for scientific subject matter experts to incorporate ML into their analysis. In addition to its design and some examples, we describe how Merlin was deployed on the Sierra supercomputer at Lawrence Livermore National Laboratory to create an unprecedented benchmark inertial confinement fusion dataset of approximately 100 million individual simulations and over 24 terabytes of multi-modal physics-based scalar, vector, and hyperspectral image data.

research
05/20/2020

Deploying Scientific AI Networks at Petaflop Scale on Secure Large Scale HPC Production Systems with Containers

There is an ever-increasing need for computational power to train comple...
research
04/13/2021

Using Machine Learning at Scale in HPC Simulations with SmartSim: An Application to Ocean Climate Modeling

We demonstrate the first climate-scale, numerical ocean simulations impr...
research
10/06/2021

Colmena: Scalable Machine-Learning-Based Steering of Ensemble Simulations for High Performance Computing

Scientific applications that involve simulation ensembles can be acceler...
research
07/13/2021

Transitioning from file-based HPC workflows to streaming data pipelines with openPMD and ADIOS2

This paper aims to create a transition path from file-based IO to stream...
research
08/20/2022

MLExchange: A web-based platform enabling exchangeable machine learning workflows

Machine learning (ML) algorithms are showing a growing trend in helping ...
research
05/26/2021

Towards Million-Server Network Simulations on Just a Laptop

The growing size of data center and HPC networks pose unprecedented requ...
research
04/19/2023

Green Carbon Footprint for Model Inference Serving via Exploiting Mixed-Quality Models and GPU Partitioning

This paper presents a solution to the challenge of mitigating carbon emi...
