Machine Learning for Predictive Analytics of Compute Cluster Jobs

05/20/2018
by Dan Andresen, et al.

We address the problem of predicting, at submission time, whether a job has requested sufficient memory and CPU resources. To this end, we frame the problem as a supervised classification task: a machine learning system is trained to predict whether a job will fail specifically because of insufficient resources. Sufficiently high accuracy, precision, and recall on this task facilitate more anticipatory decision-support applications for HPC resource allocation. Our preliminary results on a new test bed show that the probability of job failure is associated with information freely available at submission time, and so may be usable by a learning system for user modeling that gives users personalized feedback.
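
The abstract frames failure prediction as binary classification judged by accuracy, precision, and recall. As a hedged illustration only, the sketch below trains a plain logistic-regression classifier (our choice; the abstract does not name the model) on two hypothetical submission-time features, normalized requested memory and the submitting user's historical failure rate, and then computes the three metrics with "fails for lack of resources" as the positive class. The feature choice, the toy data, and every helper name here are assumptions, not the authors' test bed or method.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical submission-time features for one job (assumed, not from the paper):
#   x[0] = requested memory, normalized to [0, 1] (e.g. GB / 64)
#   x[1] = submitting user's historical failure rate, in [0, 1]
# Label y = 1 means "job failed because of insufficient resources".


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Predicted probability that the job fails for lack of resources."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lr: float = 0.5,
    epochs: int = 5000,
) -> Tuple[List[float], float]:
    """Fit logistic regression with full-batch gradient descent on log loss."""
    n_features, m = len(X[0]), len(X)
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = score(w, b, xi) - yi  # d(log loss)/d(logit) for this job
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Average the gradient over all jobs and take one descent step.
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict(w: Sequence[float], b: float, X: Sequence[Sequence[float]],
            threshold: float = 0.5) -> List[int]:
    """Flag a job (1) when its predicted failure probability reaches threshold."""
    return [1 if score(w, b, xi) >= threshold else 0 for xi in X]


def classification_metrics(y_true: Sequence[int],
                           y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (accuracy, precision, recall) with 'fails' as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


if __name__ == "__main__":
    # Toy, made-up jobs for illustration only.
    X = [(0.05, 0.9), (0.10, 0.8), (0.08, 0.7),
         (0.50, 0.1), (0.75, 0.2), (0.60, 0.05)]
    y = [1, 1, 1, 0, 0, 0]
    w, b = train_logistic(X, y)
    acc, prec, rec = classification_metrics(y, predict(w, b, X))
    print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
```

Lowering `threshold` in `predict` flags more jobs as likely to fail, which typically raises recall at the cost of precision; which trade-off suits a feedback tool depends on whether missed failures or false alarms cost users more. Real evaluation would of course score held-out jobs rather than the training set used in this toy run.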

research
04/04/2023

DLRover: An Elastic Deep Training Extension with Auto Job Resource Recommendation

The cloud is still a popular platform for distributed deep learning (DL)...
research
07/17/2018

Discovering Job Preemptions in the Open Science Grid

The Open Science Grid (OSG) is a world-wide computing system which facili...
research
06/28/2022

Get Your Memory Right: The Crispy Resource Allocation Assistant for Large-Scale Data Processing

Distributed dataflow systems like Apache Spark and Apache Hadoop enable ...
research
06/25/2020

Sequence-to-sequence models for workload interference

Co-scheduling of jobs in data-centers is a challenging scenario, where j...
research
04/01/2021

Allocation of Fungible Resources via a Fast, Scalable Price Discovery Method

We consider the problem of assigning or allocating resources to a set of...
research
04/07/2023

Runtime Variation in Big Data Analytics

The dynamic nature of resource allocation and runtime conditions on Clou...
research
01/31/2018

Henge: Intent-driven Multi-Tenant Stream Processing

We present Henge, a system to support intent-based multi-tenancy in mode...