Scylla: A Mesos Framework for Container Based MPI Jobs

05/20/2019
by Pankaj Saha, et al.

Open source cloud technologies provide a wide range of support for creating customized compute node clusters for scheduling tasks and managing resources. In cloud infrastructures such as Jetstream and Chameleon, which are used for scientific research, users receive complete control of the Virtual Machines (VMs) that are allocated to them. Importantly, users get root access to the VMs. This gives HPC users an opportunity to experiment with new resource management technologies such as Apache Mesos, which have proven scalability, flexibility, and fault tolerance. Containerization technology, which eases the development and deployment of HPC tools on the cloud, has matured and is gaining interest in the scientific community. In particular, several well-known scientific code bases now have publicly available Docker containers. While Mesos supports running Docker containers individually, it does not support inter-container communication or orchestration of containers for a parallel or distributed application. In this paper, we present the design, implementation, and performance analysis of a Mesos framework, Scylla, which integrates Mesos with Docker Swarm to enable orchestration of MPI jobs on a cluster of VMs acquired from the Chameleon cloud [1]. Scylla uses Docker Swarm for communication between containerized tasks (MPI processes) and Apache Mesos for resource pooling and allocation. Scylla allows a policy-driven approach to determine how the containers should be distributed across the nodes depending on the CPU, memory, and network throughput requirements of each application.
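
To make the idea of policy-driven container placement concrete, here is a minimal Python sketch. It is not Scylla's implementation: the `Offer` record, the `place` function, and the two policy names ("pack" and "spread") are hypothetical and chosen only for illustration. It assumes each MPI process runs in one container with a fixed CPU and memory requirement, and that resource offers arrive as simple per-node records.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """Simplified stand-in for a Mesos resource offer (hypothetical shape)."""
    node: str
    cpus: float
    mem_mb: int


def capacity(offer, cpus_per_task, mem_per_task):
    """Number of tasks an offer can hold given per-task CPU and memory needs."""
    return int(min(offer.cpus // cpus_per_task, offer.mem_mb // mem_per_task))


def place(offers, n_tasks, cpus_per_task, mem_per_task, policy):
    """Return {node: task_count}, or raise if the offers cannot hold the job.

    policy "pack":   fill the largest offers first, so the job uses few nodes
                     (assumed here to suit communication-heavy jobs).
    policy "spread": hand out one task at a time across nodes in turn
                     (assumed here to suit CPU- or memory-bound jobs).
    """
    slots = {o.node: capacity(o, cpus_per_task, mem_per_task) for o in offers}
    if sum(slots.values()) < n_tasks:
        raise ValueError("insufficient resources in offers")

    placement = {}
    remaining = n_tasks
    if policy == "pack":
        for node in sorted(slots, key=slots.get, reverse=True):
            take = min(slots[node], remaining)
            if take:
                placement[node] = take
            remaining -= take
            if remaining == 0:
                break
    elif policy == "spread":
        while remaining > 0:
            for node in slots:
                if remaining > 0 and placement.get(node, 0) < slots[node]:
                    placement[node] = placement.get(node, 0) + 1
                    remaining -= 1
    else:
        raise ValueError(f"unknown policy: {policy}")
    return placement


offers = [Offer("a", 4, 8192), Offer("b", 4, 8192), Offer("c", 2, 4096)]
packed = place(offers, 4, 1, 1024, "pack")
spread = place(offers, 4, 1, 1024, "spread")
```

With these three example offers and a four-process job, "pack" puts all four containers on one node, while "spread" uses all three nodes. A real framework would also need to weigh network throughput, which this sketch leaves out.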


