SWARM: Adaptive Load Balancing in Distributed Streaming Systems for Big Spatial Data

02/27/2020
by Anas Daghistani, et al.

The proliferation of GPS-enabled devices has led to the development of numerous location-based services. These services need to process massive amounts of spatial data in real time. Centralized systems cannot handle spatial data at its current scale, which has led to the development of distributed spatial streaming systems. Existing systems use static spatial partitioning to distribute the workload. In contrast, real-time streamed spatial data follows non-uniform spatial distributions that change continuously over time. Distributed spatial streaming systems therefore need to react to changes in the distribution of spatial data and queries. This paper introduces SWARM, a lightweight adaptivity protocol that continuously monitors the data and query workloads across the distributed processes of the spatial data streaming system, and redistributes and rebalances the workloads as soon as performance bottlenecks are detected. SWARM is able to handle multiple query-execution and data-persistence models. A distributed streaming system can directly use SWARM to adaptively rebalance the system's workload among its machines with minimal changes to the original code of the underlying spatial application. Extensive experimental evaluation using real and synthetic datasets illustrates that, on average, SWARM achieves a 200% improvement over a static grid partitioning that is determined based on observing a limited history of the data and query workloads. Moreover, SWARM reduces execution latency by 4x on average compared with that technique.
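The monitor-then-rebalance idea described in the abstract can be illustrated with a small sketch. This is not SWARM's actual protocol, which the abstract does not specify. It is a generic greedy rebalancer under stated assumptions: space is divided into grid cells, each cell is owned by one machine, a per-cell load figure is available from monitoring, and a bottleneck is declared when the busiest machine exceeds a hypothetical `threshold` multiple of the mean load. The names `machine_loads`, `rebalance_step`, and `threshold` are all illustrative.

```python
def machine_loads(cell_load, assignment):
    """Sum the observed load of every grid cell owned by each machine."""
    loads = {machine: 0.0 for machine in assignment.values()}
    for cell, machine in assignment.items():
        loads[machine] += cell_load.get(cell, 0.0)
    return loads


def rebalance_step(cell_load, assignment, threshold=1.5):
    """Move one cell from the busiest to the idlest machine if needed.

    Assumption (not from the paper): a bottleneck exists when the busiest
    machine carries more than `threshold` times the mean machine load.
    Returns (cell, source, target) if a cell was moved, otherwise None.
    """
    loads = machine_loads(cell_load, assignment)
    hot = max(loads, key=loads.get)
    cold = min(loads, key=loads.get)
    mean = sum(loads.values()) / len(loads)
    if mean == 0 or loads[hot] <= threshold * mean:
        return None  # no bottleneck detected

    gap = loads[hot] - loads[cold]
    # Only cells lighter than the gap shrink the imbalance when moved.
    candidates = [
        cell for cell, machine in assignment.items()
        if machine == hot and 0 < cell_load.get(cell, 0.0) < gap
    ]
    if not candidates:
        return None

    # Moving a load of exactly gap / 2 would equalize the two machines,
    # so pick the candidate closest to that value.
    cell = min(candidates, key=lambda c: abs(cell_load[c] - gap / 2))
    assignment[cell] = cold
    return cell, hot, cold


# Hypothetical workload: machine "A" owns two busy cells, "B" two quiet ones.
cell_load = {(0, 0): 90, (0, 1): 30, (1, 0): 10, (1, 1): 10}
assignment = {(0, 0): "A", (0, 1): "A", (1, 0): "B", (1, 1): "B"}
move = rebalance_step(cell_load, assignment)
```

In a running system a step like this would be invoked periodically with fresh load statistics, so the partitioning follows the workload as it shifts instead of being fixed from a limited history.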

