A Kubernetes 'Bridge' operator between cloud and external resources

07/06/2022
by Boris Lublinsky, et al.

Many scientific workflows require dedicated compute resources, such as HPC clusters with optimized software, quantum resources, and dedicated cluster systems like Ray. At the same time, many scientific workflows today are built on Kubernetes, taking advantage of its growing ecosystem of workflow and support tools. To address the growing demand to run workflows on both cloud and dedicated compute resources, we present the Bridge Operator, a software extension for container orchestration in Kubernetes that facilitates the submission and monitoring of long-running processes on external systems that have their own cluster resource manager (SLURM, LSF, quantum services, and Ray). The Bridge Operator consists of a custom Kubernetes controller that employs a Kubernetes Custom Resource Definition to manage applications. We present controller logic to manage the interface between cloud container orchestration and the external resource workload manager, a resource definition to submit HTTP/HTTPS requests to the external resource, and a controller pod that communicates with the external resource manager to submit and manage job execution. The implementation mirrors the external resource in Kubernetes pods, so the operator can use these pods as proxies to control the external system. The implementation is agnostic to the choice of resource manager but assumes the system exposes an HTTP/HTTPS API for its control and management. The Bridge Operator automates the role of a human operator running jobs on a black-box external resource as part of a complex hybrid workflow on the cloud.
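
As a rough illustration of the submit-and-monitor pattern described above, the following Python sketch shows how a controller pod might act as a proxy for an external job. It is not the authors' implementation. The `BridgeJobSpec` fields, the `/jobs` endpoint paths, the `id` and `state` response keys, and the state names are all assumptions made for this example. The HTTP layer is passed in as a function so the sketch runs without a real resource manager.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical transport signature: (method, url, body) -> decoded JSON reply.
Transport = Callable[[str, str, Optional[dict]], dict]

# Assumed terminal state names; a real resource manager defines its own.
TERMINAL_STATES = {"DONE", "FAILED", "KILLED"}


@dataclass
class BridgeJobSpec:
    """Hypothetical subset of the fields a custom resource might carry."""
    resource_url: str             # base URL of the external HTTP/HTTPS API
    script: str                   # job payload to run on the external system
    poll_interval_s: float = 0.0  # delay between status polls


def run_bridge_job(spec: BridgeJobSpec,
                   transport: Transport,
                   max_polls: int = 100,
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Submit a job over HTTP, then poll until it reaches a terminal state."""
    # Submit the job to the (assumed) /jobs endpoint and remember its id.
    submitted = transport("POST", f"{spec.resource_url}/jobs",
                          {"script": spec.script})
    job_id = submitted["id"]

    # Poll the (assumed) status endpoint, mirroring the remote state locally.
    for _ in range(max_polls):
        status = transport("GET", f"{spec.resource_url}/jobs/{job_id}", None)
        if status["state"] in TERMINAL_STATES:
            return status["state"]
        sleep(spec.poll_interval_s)
    return "UNKNOWN"  # gave up before the job finished


def fake_transport_factory(states: List[str]) -> Transport:
    """In-memory stand-in for the external resource manager's HTTP API."""
    remaining = list(states)

    def transport(method: str, url: str, body: Optional[dict]) -> dict:
        if method == "POST":
            return {"id": "job-1"}
        # Return the next scripted state; repeat the last one once exhausted.
        state = remaining.pop(0) if len(remaining) > 1 else remaining[0]
        return {"state": state}

    return transport


if __name__ == "__main__":
    spec = BridgeJobSpec("https://hpc.example/api", "echo hello")
    print(run_bridge_job(spec, fake_transport_factory(["PENDING", "RUNNING", "DONE"])))
```

In an actual operator, the transport would issue authenticated HTTP/HTTPS requests, and the returned state would be written back to the custom resource's status so that Kubernetes tooling can observe the external job.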

research
01/26/2015

JMS: A workflow management system and web-based cluster front-end for the Torque resource manager

Motivation: Complex computational pipelines are becoming a staple of mod...
research
08/22/2023

Demand-driven provisioning of Kubernetes-like resources in OSG

The OSG-operated Open Science Pool is an HTCondor-based virtual cluster ...
research
11/09/2021

Tarema: Adaptive Resource Allocation for Scalable Scientific Workflows in Heterogeneous Clusters

Scientific workflow management systems like Nextflow support large-scale...
research
01/11/2018

A Software-defined SoC Memory Bus Bridge Architecture for Disaggregated Computing

Disaggregation and rack-scale systems have the potential of drastically ...
research
04/20/2020

Improving Resources Management in Network Virtualization by Utilizing a Software-Based Network

Network virtualization is a way to simultaneously run multiple heterogen...
research
03/24/2022

Quantum Computing in the Cloud: Analyzing job and machine characteristics

As the popularity of quantum computing continues to grow, quantum machin...
research
08/28/2022

Adapting the LodView RDF Browser for Navigation over the Multilingual Linguistic Linked Open Data Cloud

The paper is dedicated to the use of LodView for navigation over the mul...