Quantitative Impact Evaluation of an Abstraction Layer for Data Stream Processing Systems

07/18/2019
by Guenter Hesse, et al.

With the demand to process ever-growing data volumes, a variety of new data stream processing frameworks have been developed. Moving an implementation from one such system to another, e.g., for performance reasons, requires adapting existing applications to new interfaces. Apache Beam addresses these high substitution costs by providing an abstraction layer that enables executing programs on any of the supported streaming frameworks. In this paper, we present a novel benchmark architecture for comparing the performance impact of using Apache Beam on three streaming frameworks: Apache Spark Streaming, Apache Flink, and Apache Apex. We find significant performance penalties when using Apache Beam for application development in the surveyed systems. Overall, using Apache Beam for the examined streaming applications caused a high variance in query execution times and a slowdown of up to a factor of 58 compared to queries developed without the abstraction layer. All developed benchmark artifacts are publicly available to ensure reproducible results.
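The comparison described above, one program definition that an abstraction layer runs on interchangeable engines versus the same query written directly for one engine, can be illustrated with a toy Python model. This is a minimal sketch under stated assumptions, not Apache Beam's API and not the paper's benchmark: `Pipeline`, `BatchRunner`, `RecordAtATimeRunner`, `native_query`, and `slowdown` are hypothetical names introduced only for illustration. In Beam itself, the target engine is selected through pipeline options such as `--runner=FlinkRunner` or `--runner=SparkRunner`.

```python
import time
from typing import Any, Callable, Iterable, List, Tuple

# Hypothetical step representation: ("map", fn) or ("filter", predicate).
Step = Tuple[str, Callable[[Any], Any]]


class Pipeline:
    """Engine-agnostic query definition (toy stand-in for an abstraction layer)."""

    def __init__(self) -> None:
        self.steps: List[Step] = []

    def map(self, fn: Callable[[Any], Any]) -> "Pipeline":
        self.steps.append(("map", fn))
        return self

    def filter(self, predicate: Callable[[Any], bool]) -> "Pipeline":
        self.steps.append(("filter", predicate))
        return self


class BatchRunner:
    """Hypothetical engine A: applies each step to the whole materialized input."""

    def run(self, pipeline: Pipeline, source: Iterable[Any]) -> List[Any]:
        data = list(source)
        for kind, fn in pipeline.steps:
            if kind == "map":
                data = [fn(x) for x in data]
            else:  # "filter"
                data = [x for x in data if fn(x)]
        return data


class RecordAtATimeRunner:
    """Hypothetical engine B: pushes each record through the full step chain."""

    def run(self, pipeline: Pipeline, source: Iterable[Any]) -> List[Any]:
        out: List[Any] = []
        for record in source:
            keep = True
            for kind, fn in pipeline.steps:
                if kind == "map":
                    record = fn(record)
                elif not fn(record):  # a "filter" step rejected the record
                    keep = False
                    break
            if keep:
                out.append(record)
        return out


def native_query(source: Iterable[int]) -> List[int]:
    """The same query hand-written for one engine, bypassing the abstraction layer."""
    return [x * 2 for x in source if x % 3 == 0]


def slowdown(with_layer: Callable[[], Any], without_layer: Callable[[], Any]) -> float:
    """Slowdown factor: execution time with the layer divided by time without it.

    A single measurement is noisy; a real benchmark would repeat runs and
    report the distribution of execution times.
    """
    start = time.perf_counter()
    with_layer()
    mid = time.perf_counter()
    without_layer()
    end = time.perf_counter()
    return (mid - start) / max(end - mid, 1e-9)


if __name__ == "__main__":
    query = Pipeline().filter(lambda x: x % 3 == 0).map(lambda x: x * 2)
    data = range(10)
    # One definition, two engines, same result as the hand-written query.
    print(BatchRunner().run(query, data))          # [0, 6, 12, 18]
    print(RecordAtATimeRunner().run(query, data))  # [0, 6, 12, 18]
    print(native_query(data))                      # [0, 6, 12, 18]
    big = range(100_000)
    print(slowdown(lambda: BatchRunner().run(query, big), lambda: native_query(big)))
```

The toy's overhead comes only from Python-level indirection, so the printed factor says nothing about where the penalties measured in the paper originate; the sketch only shows what "same program, different engine" and "slowdown relative to a native query" mean.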
