Best of Both Worlds: High Performance Interactive and Batch Launching

08/05/2020
by Chansup Byun, et al.

Rapid launch of thousands of jobs is essential for effective interactive supercomputing, big data analysis, and AI algorithm development. Achieving thousands of launches per second has required that hardware be available to receive these jobs. This paper presents a novel preemptive approach to implementing spot jobs on MIT SuperCloud systems, allowing the resources to be fully utilized for long-running batch jobs while still providing fast launch for interactive jobs. The new approach separates the job preemption and scheduling operations and can schedule a job with preemption 100 times faster than the standard scheduler-provided automatic preemption capability. The results demonstrate that the new approach can schedule interactive jobs preemptively with performance comparable to that achieved when the required computing resources are idle and available. The spot job capability can be deployed without disrupting the interactive user experience while increasing overall system utilization.

