A Subset of the CERN Virtual Machine File System: Fast Delivering of Complex Software Stacks for Supercomputing Resources

03/29/2023
by Alexandre F. Boyer, et al.

Delivering a reproducible environment along with complex and up-to-date software stacks on thousands of distributed and heterogeneous worker nodes is a critical task. The CernVM File System (CVMFS) was designed to help various communities deploy software on worldwide distributed computing infrastructures by decoupling the software from the operating system. However, installing this file system requires collaboration with the system administrators of the remote resources, as well as HTTP connectivity to fetch dependencies from external sources. Supercomputers, which offer tremendous computing power, generally have more restrictive policies than grid sites and do not easily provide the conditions required to exploit CVMFS. Various solutions have been developed to tackle this issue, but they are often specific to a single scientific community and do not address the problem as a whole. In this paper, we provide a generic utility to assist any community in installing complex software dependencies on supercomputers with no external connectivity. The approach consists of capturing the dependencies of the applications of interest, building a subset of these dependencies, testing it in a given environment, and deploying it to a remote computing resource. We evaluate this proposal on a real use case by exporting Gauss, a Monte Carlo simulation program from the LHCb experiment, to MareNostrum, one of the top supercomputers in the world. We provide the steps to encapsulate the minimum required files and deliver a light, easy-to-update subset of CVMFS: 12.4 gigabytes instead of 5.2 terabytes for the whole LHCb repository.
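To make the "build a subset" step concrete, the sketch below shows one generic way it could work: given the list of file paths an application actually opened (assumed here to come from some file-access tracer run during the "capture" step), copy only those files out of a mounted repository into a standalone directory, preserving their relative layout. This is a minimal illustration under stated assumptions, not the authors' utility; the function name `build_subset`, the plain list-of-paths trace format, and the choice to skip anything outside the repository root are all hypothetical.

```python
import shutil
from pathlib import Path


def build_subset(accessed_paths, repo_root, subset_root):
    """Hypothetical sketch: copy only the files an application touched.

    accessed_paths: iterable of absolute paths, assumed to come from a
                    file-access trace captured while running the application.
    repo_root:      mount point of the full repository (e.g. a CVMFS repo).
    subset_root:    destination directory for the reduced subset.
    Returns the total number of bytes copied into the subset.
    """
    repo_root = Path(repo_root)
    subset_root = Path(subset_root)
    total_bytes = 0
    # Deduplicate: a trace usually records the same file many times.
    for path in sorted(set(accessed_paths)):
        src = Path(path)
        try:
            rel = src.relative_to(repo_root)
        except ValueError:
            # Outside the repository (system files, /tmp, ...): not part of the subset.
            continue
        if not src.is_file():
            # Directories and missing entries carry no payload to copy.
            continue
        dst = subset_root / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        # copy2 follows symlinks by default, so linked targets are materialized.
        shutil.copy2(src, dst)
        total_bytes += dst.stat().st_size
    return total_bytes
```

A real tool would also need to preserve symlinks, handle files read indirectly (for example through environment setup scripts), and then validate the subset by rerunning the application against it, which corresponds to the "testing" step described in the abstract. For scale, the reported result of 12.4 GB out of 5.2 TB means the subset holds roughly 0.24% of the LHCb repository's data.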

research
05/08/2023

BLAFS: A Bloat Aware File System

While there has been exponential improvements in hardware performance ov...
research
02/14/2020

Deploying large fixed file datasets with SquashFS and Singularity

Shared high-performance computing (HPC) platforms, such as those provide...
research
10/05/2020

An Easy-to-Use-and-Deploy Grid Computing Framework

A few grid-computing tools are available for public use. However, such s...
research
09/19/2012

Classification Of Heterogeneous Operating System

Operating system is a bridge between system and user. An operating syste...
research
03/27/2019

The XENON1T Data Distribution and Processing Scheme

The XENON experiment is looking for non-baryonic particle dark matter in...
research
04/17/2018

Deep Learning on Operational Facility Data Related to Large-Scale Distributed Area Scientific Workflows

Distributed computing platforms provide a robust mechanism to perform la...
