Trevor: Automatic configuration and scaling of stream processing pipelines

12/22/2018
by Manu Bansal, et al.

Operating a distributed data stream processing workload efficiently at scale is hard. The operator must parallelize the workload's tasks and lay them out on resources that match the requirements of the target data rate. The challenge is that neither the operator nor the programmer typically knows how the workload scales as a function of resources. Instead, the operator manually searches for a safe operating point that can handle the predicted peak load and deploys with ample headroom to absorb unpredictable spikes. Such empirical, static over-provisioning wastes both compute and human resources. We show that precise performance models can be learned automatically for distributed stream processing systems, and that these models can predict the execution performance of a job even before deployment. Further, the models can be used to optimally schedule logically specified jobs onto available physical hardware. Finally, the models and the derived execution schedules can be refined online to adapt dynamically to unpredictable changes in the runtime environment or to auto-scale with variations in job load.


research
06/01/2022

Collaborative Cluster Configuration for Distributed Data-Parallel Processing: A Research Overview

Many organizations routinely analyze large datasets using systems for di...
research
11/03/2017

Elasticutor: Rapid Elasticity for Realtime Stateful Stream Processing

Elasticity is highly desirable for stream processing systems to guarante...
research
09/06/2023

StreamBed: capacity planning for stream processing

StreamBed is a capacity planning system for stream processing. It predic...
research
05/12/2020

DMR API: Improving cluster productivity by turning applications into malleable

Adaptive workloads can change on-the-fly the configuration of their jobs...
research
08/27/2021

Enel: Context-Aware Dynamic Scaling of Distributed Dataflow Jobs using Graph Propagation

Distributed dataflow systems like Spark and Flink enable the use of clus...
research
08/10/2021

Evaluation of Load Prediction Techniques for Distributed Stream Processing

Distributed Stream Processing (DSP) systems enable processing large stre...
research
06/09/2021

DynamiQ: Planning for Dynamics in Network Streaming Analytics Systems

The emergence of programmable data-plane targets has motivated a new hyb...
