Evaluation of Data Enrichment Methods for Distributed Stream Processing Systems

07/26/2023
by Dominik Scheinert, et al.

Stream processing has become a critical component in the architecture of modern applications. With the exponential growth of data generation from sources such as the Internet of Things, business intelligence, and telecommunications, real-time processing of unbounded data streams has become a necessity. Distributed stream processing (DSP) systems address this challenge, offering high horizontal scalability, fault-tolerant execution, and the ability to process data streams from multiple sources in a single DSP job. Often, however, data streams must be enriched with additional information before they can be processed correctly, which introduces additional dependencies and potential bottlenecks. In this paper, we present an in-depth evaluation of data enrichment methods for DSP systems and identify the different use cases for stream processing in modern systems. Using a representative DSP system and conducting the evaluation in a realistic cloud environment, we found that outsourcing enrichment data to the DSP system can improve performance for specific use cases. However, this improvement comes with increased resource consumption, which highlights the need for stream processing solutions specifically designed for the performance-intensive workloads of cloud-based applications.

research
09/01/2020

Theodolite: Scalability Benchmarking of Distributed Stream Processing Engines in Microservice Architectures

Distributed stream processing engines are designed with a focus on scala...
research
03/17/2022

Beauty and the beast: A case study on performance prototyping of data-intensive containerized cloud applications

Data-intensive container-based cloud applications have become popular wi...
research
08/27/2018

Modeling and Simulation of Spark Streaming

As more and more devices connect to Internet of Things, unbounded stream...
research
08/10/2021

Evaluation of Load Prediction Techniques for Distributed Stream Processing

Distributed Stream Processing (DSP) systems enable processing large stre...
research
02/14/2020

DSCEP: An Infrastructure for Distributed Semantic Complex Event Processing

Today most applications continuously produce information under the form ...
research
05/16/2017

Strider: A Hybrid Adaptive Distributed RDF Stream Processing Engine

Real-time processing of data streams emanating from sensors is becoming ...
research
04/12/2018

BigSR: an empirical study of real-time expressive RDF stream reasoning on modern Big Data platforms

The trade-off between language expressiveness and system scalability (E&...
