
Trevor: Automatic configuration and scaling of stream processing pipelines

by Manu Bansal, et al.

Operating a distributed data stream processing workload efficiently at scale is hard. The operator of the workload must parallelize its tasks and lay them out on resources that match the requirements of the target data rate. The challenge is that neither the operator nor the programmer is typically aware of how the workload scales as a function of resources. An operator manually searches for a safe operating point that can handle the predicted peak load and deploys with ample headroom to absorb unpredictable spikes. Such empirical, static over-provisioning wastes both compute and human resources. We show that precise performance models for distributed stream processing systems can be learned automatically, and that these models can predict the execution performance of a job even before deployment. Further, those models can be used to optimally schedule logically specified jobs onto available physical hardware. Finally, those models and the derived execution schedules can be refined online to adapt dynamically to unpredictable changes in the runtime environment or to auto-scale with variations in job load.


