R-Storm: Resource-Aware Scheduling in Storm

04/10/2019
by Boyang Peng, et al.

The era of big data has led to the emergence of new systems for real-time distributed stream processing; Apache Storm, for example, is one of the most popular stream processing systems in industry today. However, Storm, like many other stream processing systems, lacks an intelligent scheduling mechanism. The default round-robin scheduling currently deployed in Storm disregards resource demands and availability, and can therefore be inefficient at times. We present R-Storm (Resource-Aware Storm), a system that implements resource-aware scheduling within Storm. R-Storm is designed to increase overall throughput by maximizing resource utilization while minimizing network latency. When scheduling tasks, R-Storm can satisfy both soft and hard resource constraints as well as minimize the network distance between components that communicate with each other. We evaluate R-Storm on a set of micro-benchmark Storm applications as well as Storm applications used in production at Yahoo! Inc. From our experimental results we conclude that R-Storm achieves 30-47% higher throughput and 69-350% better CPU utilization than default Storm for the micro-benchmarks. For the Yahoo! Storm applications, R-Storm outperforms default Storm by around 50% based on overall throughput. We also demonstrate that R-Storm performs much better than default Storm when scheduling multiple Storm applications.
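The abstract contrasts round-robin placement with scheduling that treats some resource demands as hard constraints, others as soft constraints, and keeps communicating components close on the network. The sketch below illustrates that idea under loudly stated assumptions: memory is the hard constraint, CPU is the soft constraint, network distance is a toy three-level model (same node, same rack, other rack), and tasks are assumed to arrive already ordered so that communicating tasks are adjacent. The `Node`/`Task` classes, the cost weights, and the greedy strategy are hypothetical illustrations, not R-Storm's actual algorithm or Storm's scheduler API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    rack: str
    mem_mb: float  # remaining memory; treated as a HARD constraint (assumption)
    cpu: float     # remaining CPU, e.g. percent of a core; treated as SOFT (assumption)
    tasks: list = field(default_factory=list)


@dataclass
class Task:
    name: str
    mem_mb: float
    cpu: float


def network_distance(a: Node, b: Node) -> int:
    """Hypothetical distance model: 0 = same node, 1 = same rack, 2 = across racks."""
    if a is b:
        return 0
    return 1 if a.rack == b.rack else 2


def round_robin(tasks, nodes):
    """Baseline: assign tasks to nodes in turn, ignoring demands and locality."""
    return {t.name: nodes[i % len(nodes)].name for i, t in enumerate(tasks)}


def schedule(tasks, nodes, cpu_weight=1.0, net_weight=100.0):
    """Greedy resource-aware placement (illustrative sketch only).

    Tasks are assumed to be pre-ordered so that communicating tasks are
    adjacent (e.g. a breadth-first walk of the topology). Each task goes to
    the node with the lowest cost among those with enough memory.
    """
    placement = {}
    ref = None  # node that received the previously placed task
    for t in tasks:
        # Hard constraint: never exceed a node's remaining memory.
        candidates = [n for n in nodes if n.mem_mb >= t.mem_mb]
        if not candidates:
            raise RuntimeError(f"no node can satisfy the memory demand of {t.name}")

        def cost(n: Node) -> float:
            # Soft constraint: CPU oversubscription is penalized, not forbidden.
            cpu_overrun = max(0.0, t.cpu - n.cpu)
            # Prefer nodes close to the previous task's node.
            dist = network_distance(n, ref) if ref is not None else 0
            return cpu_weight * cpu_overrun + net_weight * dist

        best = min(candidates, key=cost)  # ties go to the earliest node in the list
        best.mem_mb -= t.mem_mb
        best.cpu -= t.cpu
        best.tasks.append(t.name)
        placement[t.name] = best.name
        ref = best
    return placement
```

For example, with three 512 MB tasks and three 1024 MB nodes (`n1` and `n2` in one rack, `n3` in another), this sketch packs the first two tasks onto `n1` and places the third on the rack-local `n2`, whereas round-robin spreads them over `n1`, `n2`, and `n3`, including the remote rack. Weighting network distance heavily relative to CPU overrun is an arbitrary choice here; it simply makes locality dominate unless a node is badly oversubscribed.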

research
01/28/2020

A Scheduling Algorithm to Maximize Storm Throughput in Heterogeneous Cluster

In the most popular distributed stream processing frameworks (DSPFs), pr...
research
12/03/2018

Resource Management and Scheduling for Big Data Applications in Cloud Computing Environments

This chapter presents software architectures of the big data processing ...
research
12/19/2019

Resource- and Message Size-Aware Scheduling of Stream Processing at the Edge with application to Realtime Microscopy

Whilst computational resources at the cloud edge can be leveraged to imp...
research
06/11/2023

Scheduling of Intermittent Query Processing

Stream processing is usually done either on a tuple-by-tuple basis or in...
research
10/15/2021

Optimal Resource Scheduling and Allocation in Distributed Computing Systems

The essence of distributed computing systems is how to schedule incoming...
research
08/01/2020

POTUS: Predictive Online Tuple Scheduling for Data Stream Processing Systems

Most online service providers deploy their own data stream processing sy...
research
12/22/2018

Bioinformatics Computational Cluster Batch Task Profiling with Machine Learning for Failure Prediction

Motivation: Traditional computational cluster schedulers are based on us...
