Apache Spark Streaming and HarmonicIO: A Performance and Architecture Comparison

07/20/2018
by Ben Blamey, et al.

Studies have demonstrated that Apache Spark, Flink and related frameworks can perform stream processing at very high frequencies, but they tend to focus on small messages with a computationally light 'map' stage for each message, a common enterprise use case. We extend these benchmarks to workloads with larger messages (leading to network-bound throughput) and with computationally intensive map stages (leading to CPU-bound throughput), in order to evaluate the applicability of these frameworks to scientific computing applications. We present a performance benchmark comparison between Apache Spark Streaming (ASS), under both file and TCP streaming modes, and HarmonicIO, comparing maximum throughput over a broad domain of message sizes and CPU loads. We find that relative performance varies considerably across this domain, and that the chosen means of stream source integration has a large impact. We offer recommendations for choosing and configuring the frameworks, and present a benchmarking toolset developed for this study.

research
07/18/2019

Quantitative Impact Evaluation of an Abstraction Layer for Data Stream Processing Systems

With the demand to process ever-growing data volumes, a variety of new d...
research
01/29/2020

Smart Resource Management for Data Streaming using an Online Bin-packing Strategy

Data stream processing frameworks provide reliable and efficient mechani...
research
01/26/2018

Pilot-Streaming: A Stream Processing Framework for High-Performance Computing

An increasing number of scientific applications rely on stream processin...
research
03/20/2023

Benchmarking scalability of stream processing frameworks deployed as event-driven microservices in the cloud

Event-driven microservices are an emerging architectural style for data-...
research
01/25/2019

A quality model for evaluating and choosing a stream processing framework architecture

Today, we have to deal with many data (Big data) and we need to make dec...
research
03/11/2021

ESPBench: The Enterprise Stream Processing Benchmark

Growing data volumes and velocities in fields such as Industry 4.0 or th...
