Henge: Intent-driven Multi-Tenant Stream Processing

01/31/2018
by Faria Kalim, et al.

We present Henge, a system that supports intent-based multi-tenancy for modern stream processing applications. Henge treats multi-tenancy as a first-class citizen: everyone in an organization can submit stream processing jobs to a single, shared, consolidated cluster. Each tenant (job) specifies its own intents (i.e., requirements) as a Service Level Objective (SLO) that captures latency and/or throughput. In a multi-tenant cluster, the Henge scheduler adapts continually to meet jobs' SLOs despite limited cluster resources and dynamic input workloads. SLOs are soft and are based on utility functions. Henge continually tracks SLO satisfaction and, when jobs miss their SLOs, navigates the state space to perform resource allocations in real time, maximizing the total utility achieved by all jobs in the system. Henge is integrated into Apache Storm, and we present experimental results using both production topologies and real datasets.
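
To make the idea of a soft, utility-based SLO concrete, the following is a minimal sketch and not Henge's actual implementation. It assumes a hypothetical utility shape: a job earns its full utility when its observed latency meets the SLO, and a proportionally smaller amount when it does not. The names `Job`, `latency_utility`, `total_utility`, and `neediest_job` are illustrative only and do not come from the abstract.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    """Hypothetical description of one tenant job and its latency SLO."""
    name: str
    max_utility: float          # utility earned when the SLO is fully met
    latency_slo_ms: float       # target latency stated by the tenant
    observed_latency_ms: float  # latency currently measured for the job


def latency_utility(job: Job) -> float:
    """Assumed soft-SLO utility: full utility if the SLO is met,
    otherwise scaled down by how far the job overshoots its target."""
    if job.observed_latency_ms <= job.latency_slo_ms:
        return job.max_utility
    return job.max_utility * job.latency_slo_ms / job.observed_latency_ms


def total_utility(jobs: List[Job]) -> float:
    """Sum of per-job utilities: the quantity a scheduler would try to maximize."""
    return sum(latency_utility(job) for job in jobs)


def neediest_job(jobs: List[Job]) -> Optional[Job]:
    """Return the SLO-missing job with the largest utility shortfall, if any."""
    missing = [job for job in jobs if latency_utility(job) < job.max_utility]
    return max(
        missing,
        key=lambda job: job.max_utility - latency_utility(job),
        default=None,
    )


jobs = [
    Job("clicks", max_utility=10.0, latency_slo_ms=100.0, observed_latency_ms=50.0),
    Job("billing", max_utility=20.0, latency_slo_ms=100.0, observed_latency_ms=200.0),
]
print(total_utility(jobs))
print(neediest_job(jobs).name)
```

In this sketch, a scheduler loop would repeatedly pick the job returned by `neediest_job`, give it more resources, and re-measure. How Henge actually chooses and applies allocations is described in the full paper, not here.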


