Efficient Runtime Profiling for Black-box Machine Learning Services on Sensor Streams

03/10/2022
by Soeren Becker, et al.

In highly distributed environments such as cloud, edge, and fog computing, machine learning is increasingly used to automate and optimize processes. Machine learning jobs are frequently applied in streaming conditions, where models analyze data streams such as video feeds or sensor data. Often the result for a particular data sample must be available before the next sample arrives. Thus, enough resources must be provided to ensure just-in-time processing for the specific data stream. This paper proposes a runtime modeling strategy for containerized machine learning jobs, which enables the optimization and adaptive adjustment of resources per job and component. Our black-box approach assembles multiple techniques into an efficient runtime profiling method, while making no assumptions about the underlying hardware, data streams, or applied machine learning jobs. The results show that our method is able to capture the general runtime behaviour of different machine learning jobs after only a short profiling phase.
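The just-in-time requirement described above can be illustrated with a minimal sketch. It is not the paper's profiling method: the function names, the nearest-rank quantile check, the 0.95 default, and the sample measurements are all assumptions made for illustration. Given per-sample runtimes measured under a few candidate CPU limits, it picks the smallest limit whose runtimes still fit within the arrival interval of the stream.

```python
import math


def nearest_rank_quantile(values, q):
    """Return the q-quantile of values using the nearest-rank method."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def meets_deadline(runtimes_s, arrival_interval_s, q=0.95):
    """True if the q-quantile of the measured runtimes fits in the interval."""
    return nearest_rank_quantile(runtimes_s, q) <= arrival_interval_s


def smallest_sufficient_limit(profile, arrival_interval_s, q=0.95):
    """Pick the smallest CPU limit that keeps processing just-in-time.

    profile maps a CPU limit (in cores) to a list of per-sample runtimes
    in seconds measured under that limit (hypothetical input format).
    Returns None if no profiled limit is sufficient.
    """
    for limit in sorted(profile):
        if meets_deadline(profile[limit], arrival_interval_s, q):
            return limit
    return None


# Hypothetical measurements, invented for illustration only.
example_profile = {
    0.5: [0.12, 0.15, 0.11, 0.14],
    1.0: [0.06, 0.07, 0.08, 0.065],
    2.0: [0.03, 0.04, 0.035, 0.03],
}

if __name__ == "__main__":
    # A sensor emitting one sample every 100 ms.
    print(smallest_sufficient_limit(example_profile, arrival_interval_s=0.1))
```

Checking a high quantile rather than the mean is a design choice of this sketch: a stream falls behind whenever individual samples overrun the interval, so the slow measurements matter more than the average.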

research
07/17/2018

Discovering Job Preemptions in the Open Science Grid

The Open Science Grid (OSG) is a world-wide computing system which facili...
research
05/30/2018

Predictive Performance Modeling for Distributed Computing using Black-Box Monitoring and Machine Learning

In many domains, the previous decade was characterized by increasing dat...
research
03/09/2019

Machine Learning Based Prediction and Classification of Computational Jobs in Cloud Computing Centers

With the rapid growth of the data volume and the fast increasing of the ...
research
01/28/2021

tf.data: A Machine Learning Data Processing Framework

Training machine learning models requires feeding input data for models ...
research
08/27/2021

Enel: Context-Aware Dynamic Scaling of Distributed Dataflow Jobs using Graph Propagation

Distributed dataflow systems like Spark and Flink enable the use of clus...
research
12/14/2022

Monte-Carlo Tree-Search for Leveraging Performance of Blackbox Job-Shop Scheduling Heuristics

In manufacturing, the production is often done on out-of-the-shelf manuf...
research
11/09/2020

TrimTuner: Efficient Optimization of Machine Learning Jobs in the Cloud via Sub-Sampling

This work introduces TrimTuner, the first system for optimizing machine ...
