Deploying Scientific AI Networks at Petaflop Scale on Secure Large Scale HPC Production Systems with Containers

05/20/2020
by David Brayford, et al.

There is an ever-increasing need for computational power to train complex artificial intelligence (AI) and machine learning (ML) models to tackle large scientific problems. High performance computing (HPC) resources are required to efficiently compute and scale complex models across tens of thousands of compute nodes. In this paper, we discuss the issues associated with deploying machine learning frameworks on large scale secure HPC systems, and we describe how we successfully deployed a standard machine learning framework on a secure large scale HPC production system to train a complex three-dimensional convolutional generative adversarial network (3DGAN) with petaflop performance. 3DGAN is an example from the high energy physics domain, designed to simulate, on various HPC systems, the energy pattern produced by showers of secondary particles inside a particle detector.

research
05/24/2019

Deploying AI Frameworks on Secure HPC Systems with Containers

The increasing interest in the usage of Artificial Intelligence techniqu...
research
07/14/2021

Higgs Boson Classification: Brain-inspired BCPNN Learning with StreamBrain

One of the most promising approaches for data analysis and exploration o...
research
11/23/2020

Integrating Deep Learning in Domain Sciences at Exascale

This paper presents some of the current challenges in designing deep lea...
research
12/05/2019

Merlin: Enabling Machine Learning-Ready HPC Ensembles

With the growing complexity of computational and experimental facilities...
research
12/09/2021

Is Disaggregation possible for HPC Cognitive Simulation?

Cognitive simulation (CogSim) is an important and emerging workflow for ...
research
06/09/2021

StreamBrain: An HPC Framework for Brain-like Neural Networks on CPUs, GPUs and FPGAs

The modern deep learning method based on backpropagation has surged in p...
research
03/02/2022

Hyperparameter optimization of data-driven AI models on HPC systems

In the European Center of Excellence in Exascale computing "Research on ...
