A Scheduling Algorithm to Maximize Storm Throughput in Heterogeneous Cluster

01/28/2020
by Saeed Nasehi, et al.

In most popular distributed stream processing frameworks (DSPFs), programs are modeled as directed acyclic graphs. This model lets a DSPF exploit the parallelism of distributed clusters. However, choosing the proper number of vertices for each operator and finding an appropriate mapping between these vertices and processing resources have a decisive effect on overall throughput and resource utilization, and the simplicity of current DSPF schedulers causes these frameworks to perform poorly on large-scale clusters. In this paper, we present the design and implementation of a heterogeneity-aware scheduling algorithm that finds the proper number of vertices for an application graph and maps each of them to the most suitable cluster node. We scale up the application graph over a given cluster gradually, by increasing the topology input rate and creating new instances of bottlenecked vertices. Our experimental results on Storm Micro-Benchmark show that 1) the prediction model estimates CPU utilization with 92% accuracy; 2) compared to the default scheduler of Storm, our scheduler provides 7% [...] throughput enhancement; and 3) the proposed method finds a solution within 4% (worst case) of the optimal scheduler, which obtains the best scheduling scenario by exhaustive search over the problem design space.
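The gradual scale-up idea described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm or its prediction model. It is a toy greedy loop under loudly stated assumptions: each operator has a fixed, hypothetical per-tuple CPU cost, the input rate is split evenly across an operator's instances, and node utilization is linear in load. The names `utilization`, `scale_up`, `cost`, and `capacity` are made up for this example.

```python
def utilization(placement, cost, capacity, rate):
    """Per-node utilization (1.0 = saturated) under an assumed linear model."""
    load = {node: 0.0 for node in capacity}
    for op, nodes in placement.items():
        # Assumption: tuples are split evenly across an operator's instances.
        per_instance = rate * cost[op] / len(nodes)
        for node in nodes:
            load[node] += per_instance / capacity[node]
    return load


def scale_up(cost, capacity, step=10.0, max_rate=1000.0, max_instances=16):
    """Raise the input rate step by step, adding instances of the bottleneck.

    Returns the highest feasible rate found and the placement that supports it.
    """
    strongest = max(capacity, key=capacity.get)
    # Start with one instance of every operator on the most capable node.
    placement = {op: [strongest] for op in cost}
    best_rate, best_placement = 0.0, {op: list(ns) for op, ns in placement.items()}
    rate = step
    while rate <= max_rate:
        load = utilization(placement, cost, capacity, rate)
        while max(load.values()) > 1.0:
            hot = max(load, key=load.get)  # most overloaded node
            on_hot = [op for op, ns in placement.items() if hot in ns]
            # Bottleneck: operator with the largest per-instance demand there.
            op = max(on_hot, key=lambda o: cost[o] / len(placement[o]))
            if len(placement[op]) >= max_instances:
                return best_rate, best_placement
            # Try the new instance on every node; keep the lowest peak load.
            trials = []
            for node in capacity:
                trial = {o: list(ns) for o, ns in placement.items()}
                trial[op].append(node)
                peak = max(utilization(trial, cost, capacity, rate).values())
                trials.append((peak, node))
            peak, target = min(trials)
            if peak >= max(load.values()):
                # No placement relieves the bottleneck: the cluster is saturated.
                return best_rate, best_placement
            placement[op].append(target)
            load = utilization(placement, cost, capacity, rate)
        best_rate = rate
        best_placement = {op: list(ns) for op, ns in placement.items()}
        rate += step
    return best_rate, best_placement
```

For instance, calling `scale_up({"spout": 1.0, "bolt": 2.0}, {"big": 100.0, "small": 50.0})` on a two-node heterogeneous cluster grows the more expensive operator onto the second node once the first one saturates. A real scheduler would replace the linear cost model with measured or predicted utilization.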

