MARS: Multi-Scalable Actor-Critic Reinforcement Learning Scheduler

05/04/2020
by Betis Baheri, et al.

In this paper, we introduce MARS, a new scheduling algorithm based on a cost-aware, multi-scalable reinforcement learning approach. MARS serves as an intermediate layer between the HPC resource manager and the user application workflow: it ensembles the pre-generated models from users' workflows and decides on the most suitable optimization strategy. A whole workflow application is split into several optimized subtasks. Then, based on a pre-defined resource management plan, a reward is generated after each scheduled task is executed. Finally, MARS updates its Deep Neural Network (DNN) model for future use. MARS is designed to optimize existing models through the reinforcement mechanism. It can adapt to a shortage of training samples and optimize its performance by itself, especially by combining small tasks together or by switching between pre-built scheduling strategies such as Backfilling and SJF, and then choosing the most suitable approach. We tested MARS using different real-world workflow traces. MARS can achieve between 5 better performance compared with the other approaches.
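
The core loop described above (choose a scheduling strategy, execute it, observe a reward, update the model) can be illustrated with a deliberately simplified one-step actor-critic. This is a minimal sketch under stated assumptions, not the MARS implementation: the strategy set (FCFS, SJF, LJF), the single-machine queue with all jobs submitted at once, the reward (negative average waiting time normalised by total work), and all function names are hypothetical, and a softmax preference table plus a scalar critic stand in for MARS's DNN.

```python
import math
import random

# Hypothetical pre-built strategies (stand-ins for MARS's Backfilling, SJF, etc.).
# Each maps a list of job runtimes to an execution order on a single machine.
STRATEGIES = {
    "FCFS": lambda jobs: list(jobs),                  # submission order
    "SJF": lambda jobs: sorted(jobs),                 # shortest job first
    "LJF": lambda jobs: sorted(jobs, reverse=True),   # longest job first
}
NAMES = list(STRATEGIES)


def avg_wait(order):
    """Average waiting time when jobs (all submitted at t=0) run back to back."""
    t, total = 0.0, 0.0
    for runtime in order:
        total += t      # this job waited until the machine became free
        t += runtime
    return total / len(order)


def softmax(prefs):
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    s = sum(exps)
    return [e / s for e in exps]


def train(episodes=2000, alpha_actor=0.2, alpha_critic=0.05, n_jobs=8, seed=0):
    rng = random.Random(seed)
    prefs = [0.0] * len(NAMES)  # actor: one preference per strategy
    value = 0.0                 # critic: value estimate of the (single) state
    for _ in range(episodes):
        jobs = [rng.uniform(1, 10) for _ in range(n_jobs)]  # hypothetical job batch
        probs = softmax(prefs)
        a = rng.choices(range(len(NAMES)), weights=probs)[0]
        order = STRATEGIES[NAMES[a]](jobs)
        # Assumed reward: negative average wait, normalised by total work.
        reward = -avg_wait(order) / sum(jobs)
        # One-step episode, so the TD error is reward minus the critic's estimate.
        td_error = reward - value
        value += alpha_critic * td_error
        # Softmax policy-gradient step: grad log pi(a) = onehot(a) - probs.
        for i in range(len(prefs)):
            prefs[i] += alpha_actor * td_error * ((1.0 if i == a else 0.0) - probs[i])
    return dict(zip(NAMES, softmax(prefs)))


if __name__ == "__main__":
    policy = train()
    print({k: round(v, 3) for k, v in policy.items()})
```

Because SJF minimises average waiting time on a single machine when all jobs arrive together, the learned policy should concentrate its probability on SJF; the critic's running value estimate serves as a baseline that reduces the variance of the actor's updates.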


