Deploying large fixed file datasets with SquashFS and Singularity

02/14/2020
by Pierre Rioux, et al.

Shared high-performance computing (HPC) platforms, such as those provided by XSEDE and Compute Canada, enable researchers to carry out large-scale computational experiments at a fraction of the cost of the cloud. Most such systems require distributed filesystems (e.g. Lustre) to provide a large-capacity storage environment for many concurrent users. These filesystems suffer performance penalties as the number of files increases, because of network contention and limited metadata performance. We demonstrate how a combination of two technologies, Singularity and SquashFS, can help developers, integrators, architects, and scientists deploy large datasets (on the order of 10 million files) on these shared systems with minimal performance limitations. The proposed integration enables more efficient access and indexing than normal file-based dataset installations, while providing transparent file access to users and processes. Furthermore, the approach does not require administrative privileges on the target system. Although the examples studied here are taken from the field of neuroimaging, the technologies adopted are not specific to that field. Currently, this solution is limited to read-only datasets. We propose the adoption of this technology for the consumption and dissemination of community datasets across shared computing resources.
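
As a rough illustration of how the two tools can be combined, the sketch below builds (but does not run) the two command lines involved: one that packs a dataset directory into a single SquashFS file, and one that starts a Singularity container with that file attached as a read-only overlay. It is not taken from the paper. The helper names, the container image `tools.sif`, the dataset path `/scratch/hcp`, and the choice of `mksquashfs` options are all assumptions made for this example, and the exact flags may differ between tool versions.

```python
import shlex
from typing import List, Sequence


def mksquashfs_command(dataset_dir: str, image_path: str) -> List[str]:
    """Build the argv that packs a directory tree into one SquashFS file.

    Hypothetical helper: -keep-as-directory keeps the source directory
    itself as a top-level entry in the image, so /scratch/hcp would be
    stored as /hcp. -no-progress keeps batch logs quiet.
    """
    return ["mksquashfs", dataset_dir, image_path,
            "-keep-as-directory", "-no-progress"]


def singularity_exec_command(container: str,
                             squashfs_images: Sequence[str],
                             program: Sequence[str]) -> List[str]:
    """Build the argv that runs `program` in a container with each
    SquashFS file attached as a read-only overlay (assumed usage)."""
    argv = ["singularity", "exec"]
    for image in squashfs_images:
        # ":ro" requests a read-only overlay; the dataset cannot be modified.
        argv += ["--overlay", f"{image}:ro"]
    argv.append(container)
    argv += list(program)
    return argv


if __name__ == "__main__":
    # Print the commands instead of executing them, so this sketch can be
    # inspected on a machine without either tool installed.
    print(shlex.join(mksquashfs_command("/scratch/hcp", "hcp.squashfs")))
    print(shlex.join(singularity_exec_command(
        "tools.sif", ["hcp.squashfs"], ["ls", "/hcp"])))
```

Under these assumptions the shared filesystem stores one large file per dataset rather than millions of small ones, while processes inside the container see an ordinary directory tree.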

