Evaluation of Load Prediction Techniques for Distributed Stream Processing

08/10/2021
by Kordian Gontarska, et al.

Distributed Stream Processing (DSP) systems enable processing large streams of continuous data to produce results in near real time. They are an essential part of many data-intensive applications and analytics platforms. The rate at which events arrive at DSP systems can vary considerably over time, which may be due to trends and to cyclic or seasonal patterns within the data streams. A priori knowledge of incoming workloads enables proactive approaches to resource management and optimization tasks such as dynamic scaling, live migration of resources, and the tuning of configuration parameters at runtime, thus potentially leading to a better Quality of Service. In this paper, we conduct a comprehensive evaluation of different load prediction techniques for DSP jobs. We identify three use cases and formulate requirements for making load predictions specific to DSP jobs. Automatically optimized classical and Deep Learning methods are evaluated on nine different datasets from typical DSP domains, i.e., the IoT, Web 2.0, and cluster monitoring. We compare model performance with respect to overall accuracy and training duration. Our results show that the Deep Learning methods provide the most accurate load predictions for the majority of the evaluated datasets.
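To make the comparison criteria concrete, the following is a minimal sketch of how two forecasters might be scored on accuracy and run time. It is not the paper's method or data: the synthetic arrival-rate series, the two naive baselines, and all function names are assumptions chosen for illustration, and the measured time only stands in for training duration because these baselines have no real training step.

```python
import math
import time


def synthetic_event_rate(n_steps, period=24):
    """Hypothetical arrival-rate series: linear trend plus one cyclic pattern."""
    return [
        100.0 + 0.5 * t + 30.0 * math.sin(2 * math.pi * t / period)
        for t in range(n_steps)
    ]


def last_value_forecast(history, horizon):
    """Naive baseline: repeat the most recent observation."""
    return [history[-1]] * horizon


def seasonal_naive_forecast(history, horizon, period=24):
    """Seasonal baseline: repeat the value observed one period earlier."""
    return [history[len(history) - period + (h % period)] for h in range(horizon)]


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def evaluate(forecast_fn, series, train_size, horizon):
    """Return (MAE, elapsed seconds) for a single train/test split."""
    history = series[:train_size]
    actual = series[train_size:train_size + horizon]
    start = time.perf_counter()
    predicted = forecast_fn(history, horizon)
    elapsed = time.perf_counter() - start
    return mean_absolute_error(actual, predicted), elapsed


# Eight synthetic "days" of hourly rates: seven for history, one held out.
series = synthetic_event_rate(24 * 8)
results = {
    name: evaluate(fn, series, train_size=24 * 7, horizon=24)
    for name, fn in [
        ("last_value", last_value_forecast),
        ("seasonal_naive", seasonal_naive_forecast),
    ]
}
for name, (mae, seconds) in results.items():
    print(f"{name}: MAE={mae:.2f}, time={seconds:.6f}s")
```

On a series with a strong cycle, the seasonal baseline should score a lower error than the last-value baseline, which is the kind of pattern sensitivity a load predictor for DSP jobs would need to capture.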


