Theodolite: Scalability Benchmarking of Distributed Stream Processing Engines in Microservice Architectures

09/01/2020
by Sören Henning, et al.

Distributed stream processing engines are designed with a focus on scalability so that they can process large data volumes continuously. We present the Theodolite method for benchmarking the scalability of distributed stream processing engines. The core of this method is the definition of use cases that microservices implementing stream processing have to fulfill. For each use case, our method identifies the workload dimensions that might affect the scalability of that use case. We propose designing one benchmark per use case and relevant workload dimension. We present a general benchmarking framework that can be applied to execute the individual benchmarks for a given use case and workload dimension. Our framework executes an implementation of the use case's dataflow architecture for different workloads of the given dimension and for various numbers of processing instances. In this way, it identifies how resource demand evolves with increasing workloads. Within the scope of this paper, we present 4 identified use cases, derived from processing Industrial Internet of Things data, and 7 corresponding workload dimensions. We provide implementations of 4 benchmarks with Kafka Streams and Apache Flink, as well as an implementation of our benchmarking framework to execute scalability benchmarks in cloud environments. We use both to evaluate the Theodolite method and to benchmark the scalability of Kafka Streams and Flink for different deployment options.
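To make the idea of relating workload to resource demand more concrete, the following is a minimal sketch and not the Theodolite implementation. It assumes that each experiment can be reduced to a yes/no answer on whether a given number of instances copes with a given workload; the abstract does not state this criterion. The names `resource_demand`, `run_experiment`, and `simulated_experiment` are hypothetical, and the capacity of 10,000 messages per second per instance is invented purely for illustration. Unlike the full grid of executions described above, the sketch stops at the first sufficient instance count for each workload.

```python
from typing import Callable, Dict, Iterable, Optional


def resource_demand(
    workloads: Iterable[int],
    instance_counts: Iterable[int],
    run_experiment: Callable[[int, int], bool],
) -> Dict[int, Optional[int]]:
    """Map each workload to the smallest instance count that handles it.

    run_experiment(load, instances) is a hypothetical callback that returns
    True if the deployment keeps up with the load (assumed criterion).
    A value of None means no tested instance count was sufficient.
    """
    counts = sorted(instance_counts)
    demand: Dict[int, Optional[int]] = {}
    for load in workloads:
        # Try instance counts in ascending order and keep the first success.
        demand[load] = next(
            (n for n in counts if run_experiment(load, n)), None
        )
    return demand


def simulated_experiment(load: int, instances: int) -> bool:
    # Stand-in for a real benchmark run: assumes each instance handles
    # 10,000 messages per second (an invented figure, not a measurement).
    return instances * 10_000 >= load


demand = resource_demand(
    workloads=[10_000, 25_000, 50_000],
    instance_counts=[1, 2, 3, 4],
    run_experiment=simulated_experiment,
)
print(demand)
```

In a real setting, `run_experiment` would deploy the use case implementation, generate the workload, and evaluate a metric of the benchmarker's choosing; the resulting mapping can then be plotted to show how demand grows with load.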

