Node-Based Job Scheduling for Large Scale Simulations of Short Running Jobs

08/25/2021
by Chansup Byun, et al.

Diverse workloads such as interactive supercomputing, big data analysis, and large-scale AI algorithm development require a high-performance scheduler. This paper presents a novel node-based scheduling approach for large-scale simulations of short-running jobs on MIT SuperCloud systems. The approach allows resources to be fully utilized by long-running batch jobs while simultaneously providing fast launch and release of large-scale short-running jobs. The node-based scheduling approach has demonstrated up to 100 times faster scheduler performance than other state-of-the-art systems.
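
The abstract does not spell out the mechanism, but the basic intuition behind node-based scheduling can be illustrated with a toy model. If the scheduler hands out whole nodes and the short jobs are then launched locally on each node's cores, the number of scheduler-level decisions drops from one per job to one per node. The sketch below is a minimal illustration under that assumption only. The function names, the uniform `cores_per_node` value, and the single-wave packing of one job per core are all hypothetical; they are not taken from the paper or from any real scheduler's API.

```python
import math


def core_based_requests(num_tasks: int) -> int:
    """Toy model: a per-job scheduler makes one decision per short job."""
    return num_tasks


def nodes_needed(num_tasks: int, cores_per_node: int) -> int:
    """Whole nodes required to run every task at once, one task per core."""
    return math.ceil(num_tasks / cores_per_node)


def node_based_plan(
    num_tasks: int, node_names: list[str], cores_per_node: int
) -> dict[str, list[int]]:
    """Hypothetical node-based packing: one scheduler decision per node.

    Consecutive task IDs fill each node's cores; the tasks on a node would
    then be started by a local launcher instead of the central scheduler.
    """
    needed = nodes_needed(num_tasks, cores_per_node)
    if needed > len(node_names):
        raise ValueError("not enough nodes to run all tasks in a single wave")
    plan: dict[str, list[int]] = {}
    for i, name in enumerate(node_names[:needed]):
        start = i * cores_per_node
        plan[name] = list(range(start, min(start + cores_per_node, num_tasks)))
    return plan


if __name__ == "__main__":
    names = [f"node{i:03d}" for i in range(300)]
    plan = node_based_plan(10_000, names, cores_per_node=40)
    # prints "250 node-level requests instead of 10000"
    print(len(plan), "node-level requests instead of", core_based_requests(10_000))
```

Under these assumptions, 10,000 single-core jobs on 40-core nodes need 250 node-level allocations rather than 10,000 per-job decisions. This reduction in central scheduler work is only one plausible contributor to the speedups the paper reports; the measured factor of up to 100 comes from the authors' experiments, not from this model.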
