On the Semantic Overlap of Operators in Stream Processing Engines

03/01/2023
by Vincenzo Gulisano, et al.

Stream processing is extensively used in the IoT-to-Cloud spectrum to distill information from continuous streams of data. Streaming applications usually run in dedicated Stream Processing Engines (SPEs) that adopt the DataFlow model, which defines such applications as graphs of operators that, step by step, transform data into the desired results. Because operators can be deployed and executed independently, the DataFlow model supports parallelism and distribution, which makes streaming applications scalable. Today there is an abundance of SPEs, each with its own set of operators. In this context, understanding how operators' semantics overlap within and across SPEs, and thus which SPEs can support a given application, is not trivial. We tackle this problem by formally showing that common operators of SPEs can be expressed as compositions of a single, minimalistic Aggregate operator, which implies that any framework able to run compositions of such an operator can run applications defined for state-of-the-art SPEs. The Aggregate operator relies only on core concepts of the DataFlow model, such as data partitioning by key and time-based windows, and can output at most one value for each window it analyzes. Alongside our formal argument, we empirically assess how an SPE that relies only on such an operator compares with an SPE offering operator-specific implementations, and we study the performance impact of a more expressive Aggregate operator obtained by relaxing the constraint of outputting at most one value per window. The existence of such a common denominator not only implies the portability of operators within and across SPEs but also defines a concise set of requirements for other data processing frameworks to support streaming applications.
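
To make the idea of a single Aggregate operator more concrete, the following Python sketch illustrates how per-key, time-windowed aggregation with at most one output per window might be used to express Filter and Map. It is only an illustration under stated assumptions, not the formal definition used in the paper: the function names (`aggregate`, `filter_via_aggregate`, `map_via_aggregate`), the tumbling-window choice, the use of `None` to signal "no output", and the batch-style evaluation over a finite list are all hypothetical simplifications. The Filter and Map encodings additionally assume that timestamps are distinct integers.

```python
from collections import defaultdict
from typing import Any, Callable, Hashable, Iterable, List, Optional, Tuple

Event = Tuple[int, Any]  # (integer timestamp, payload); a simplifying assumption


def aggregate(
    stream: Iterable[Event],
    key_fn: Callable[[Any], Hashable],
    window_size: int,
    fn: Callable[[Hashable, List[Any]], Optional[Any]],
) -> List[Event]:
    """Group events by (tumbling window, key) and emit at most one value per group.

    `fn` returns None to signal that a window produces no output.
    This evaluates a finite list in one pass; a real SPE works incrementally.
    """
    windows = defaultdict(list)
    for ts, payload in stream:
        start = (ts // window_size) * window_size  # start of the tumbling window
        windows[(start, key_fn(payload))].append(payload)

    out: List[Event] = []
    # Sort by window start only, so keys never need to be comparable.
    for (start, key), payloads in sorted(windows.items(), key=lambda kv: kv[0][0]):
        result = fn(key, payloads)
        if result is not None:
            # Stamp the output with the last timestamp covered by the window.
            out.append((start + window_size - 1, result))
    return out


def filter_via_aggregate(stream: Iterable[Event], predicate: Callable[[Any], bool]) -> List[Event]:
    """Filter as an Aggregate over windows of size 1 (assumes distinct timestamps)."""
    return aggregate(
        stream,
        key_fn=lambda payload: 0,  # a single partition
        window_size=1,
        fn=lambda key, payloads: payloads[0] if predicate(payloads[0]) else None,
    )


def map_via_aggregate(stream: Iterable[Event], f: Callable[[Any], Any]) -> List[Event]:
    """Map as an Aggregate over windows of size 1 (assumes distinct timestamps)."""
    return aggregate(
        stream,
        key_fn=lambda payload: 0,
        window_size=1,
        fn=lambda key, payloads: f(payloads[0]),
    )


events = [(1, 5), (2, 12), (3, 7), (4, 20)]
print(filter_via_aggregate(events, lambda x: x > 10))  # [(2, 12), (4, 20)]
print(map_via_aggregate(events, lambda x: x * 2))      # [(1, 10), (2, 24), (3, 14), (4, 40)]
print(aggregate(events, lambda p: 0, 2, lambda k, ps: sum(ps)))  # [(1, 5), (3, 19), (5, 20)]
```

With a window of size 1, each window holds exactly one event, so the "at most one output per window" constraint is enough for one-in, one-out (Map) or one-in, zero-or-one-out (Filter) behavior. Operators that emit several values per input would need either a composition of such operators or the relaxed, more expressive Aggregate that the abstract mentions.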
