Deploying AI Frameworks on Secure HPC Systems with Containers

by David Brayford, et al.

The growing interest from the research community and industry in using Artificial Intelligence (AI) techniques to tackle "real world" problems requires High Performance Computing (HPC) resources to efficiently compute and scale complex algorithms across thousands of nodes. Unfortunately, typical data scientists are not familiar with the unique requirements and characteristics of HPC environments. They usually develop their applications with high-level scripting languages or frameworks such as TensorFlow, and the installation process often requires a connection to external systems to download open source software during the build. HPC environments, on the other hand, are often based on closed source applications that incorporate parallel and distributed computing APIs such as MPI and OpenMP, while users have restricted administrator privileges and face security restrictions such as not being allowed access to external systems. In this paper we discuss the issues associated with the deployment of AI frameworks in a secure HPC environment and how we successfully deploy AI frameworks on SuperMUC-NG with Charliecloud.




According to the article:

"Although Singularity has been developed to run in a non-privileged namespace, security issues have arisen on a test system at LRZ where users have escalated their privileges and the system had to be taken out of service."

If this is true, and you found a security issue, why did you not tell the Singularity developers? Looking through all security announcements and disclosures from the Singularity project, I don't see any that credit you or LRZ. Additionally, Singularity has a completely privilege-less mode using the "user namespace", and you can disable support for SquashFS/SIF images. Seems like a statement about "security breaches" and removal from the system is unjustified and overly dramatic.


page 3


Deploying Scientific AI Networks at Petaflop Scale on Secure Large Scale HPC Production Systems with Containers

There is an ever-increasing need for computational power to train comple...

Secure Platform for Processing Sensitive Data on Shared HPC Systems

High performance computing clusters operating in shared and batch mode p...

Integrating Deep Learning in Domain Sciences at Exascale

This paper presents some of the current challenges in designing deep lea...

Deploying Containerized QuantEx Quantum Simulation Software on HPC Systems

The simulation of quantum circuits using the tensor network method is ve...

A Serverless Tool for Platform Agnostic Computational Experiment Management

Neuroscience has been carried into the domain of big data and high perfo...

Bringing AI pipelines onto cloud-HPC: setting a baseline for accuracy of COVID-19 AI diagnosis

HPC is an enabling platform for AI. The introduction of AI workloads in ...

Reproducible and User-Controlled Software Environments in HPC with Guix

Support teams of high-performance computing (HPC) systems often find the...