Colocating Real-time Storage and Processing: An Analysis of Pull-based versus Push-based Streaming

11/10/2022
by Ovidiu-Cristian Marcu, et al.

Over the past decade, real-time Big Data architectures have evolved into specialized layers for the ingestion, storage, and processing of data streams. Layered streaming architectures integrate the pull-based read and push-based write RPC mechanisms implemented by stream ingestion/storage systems. In addition, stream processing engines expose source/sink interfaces, which makes it easy to decouple the engines from these systems. However, open-source streaming engines rely on workflow sources implemented through a pull-based approach: the source continuously issues read RPCs to the stream ingestion/storage system, and these reads effectively compete with write RPCs. This paper proposes a unified streaming architecture that leverages push-based and/or pull-based source implementations to integrate ingestion/storage and processing engines. Such an architecture can reduce processing latency and increase system read and write throughput while making room for higher ingestion. We implement a novel push-based streaming source by replacing continuous pull-based RPCs with one single RPC and shared memory (storage and processing handle streaming data through pointers to shared objects). We then conduct an experimental analysis of pull-based versus push-based design alternatives for the streaming source reader, using a set of stream benchmarks and microbenchmarks, and discuss the advantages of both approaches.
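The contrast between the two source designs can be illustrated with a toy, single-process sketch. This is not the authors' implementation: `ToyStreamStore`, `run_pull_source`, and `run_push_source` are hypothetical names, plain method calls stand in for RPCs, and a `deque` that holds references to the stored records stands in for shared memory. The sketch only counts how many read requests each design sends to storage.

```python
from collections import deque


class ToyStreamStore:
    """Hypothetical in-memory stand-in for a stream ingestion/storage system."""

    def __init__(self):
        self.log = []            # appended records, in arrival order
        self.read_rpcs = 0       # number of simulated pull-based read RPCs
        self.subscribe_rpcs = 0  # number of simulated push-based subscribe RPCs
        self.subscribers = []    # buffers shared with push-based sources

    def append(self, record):
        # Write path: store the record, then hand a reference to every subscriber.
        self.log.append(record)
        for shared_buffer in self.subscribers:
            shared_buffer.append(record)  # same object, no copy

    def read(self, offset, max_records):
        # Pull path: every call models one read RPC, even if nothing is returned.
        self.read_rpcs += 1
        return self.log[offset:offset + max_records]

    def subscribe(self, offset=0):
        # Push path: one RPC registers a buffer that storage fills from then on.
        self.subscribe_rpcs += 1
        shared_buffer = deque(self.log[offset:])
        self.subscribers.append(shared_buffer)
        return shared_buffer


def run_pull_source(store, batches, batch_size):
    """Pull-based source: keep issuing read calls until storage has nothing new."""
    consumed, offset = [], 0
    for batch in batches:
        for record in batch:
            store.append(record)
        while True:
            chunk = store.read(offset, batch_size)
            if not chunk:
                break
            consumed.extend(chunk)
            offset += len(chunk)
    return consumed


def run_push_source(store, batches):
    """Push-based source: subscribe once, then drain the shared buffer."""
    shared_buffer = store.subscribe()
    consumed = []
    for batch in batches:
        for record in batch:
            store.append(record)
        while shared_buffer:
            consumed.append(shared_buffer.popleft())
    return consumed


# Three ingestion rounds of four records each (made-up data for illustration).
batches = [[{"id": k * 4 + i} for i in range(4)] for k in range(3)]

pull_store = ToyStreamStore()
pull_out = run_pull_source(pull_store, batches, batch_size=2)

push_store = ToyStreamStore()
push_out = run_push_source(push_store, batches)

print("pull: records =", len(pull_out), "read calls =", pull_store.read_rpcs)
print("push: records =", len(push_out), "read calls =", push_store.read_rpcs,
      "subscribe calls =", push_store.subscribe_rpcs)
```

Both sources consume the same records in the same order. In this toy setup the pull-based source also pays for polls that return nothing, while the push-based source issues a single subscribe call. A real system would add networking, backpressure, and concurrency, none of which is modelled here.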

research
03/18/2019

A New Frontier for Pull-Based Graph Processing

The trade-off between pull-based and push-based graph processing engines...
research
01/12/2022

Enlightening Flash Storage to Stream Writes by Objects

For a write request, today flash storage cannot distinguish the logical ...
research
05/17/2019

High Throughput Push Based Storage Manager

The storage manager, as a key component of the database system, is respo...
research
08/27/2018

Piecewise Linear Approximation in Data Streaming: Algorithmic Implementations and Experimental Analysis

Piecewise Linear Approximation (PLA) is a well-established tool to reduc...
research
05/19/2022

Cloudprofiler: TSC-based inter-node profiling and high-throughput data ingestion for cloud streaming workloads

To conduct real-time analytics computations, big data stream processing ...
research
10/27/2020

Distributed Real-Time Data Stream Analysis for CTA

Once completed, the Cherenkov Telescope Array (CTA) will be able to map ...
research
07/15/2021

The Art of the Meta Stream Protocol: Torrents of Streams

The rise of streaming libraries such as Akka Stream, Reactive Extensions...
