Scaling Ordered Stream Processing on Shared-Memory Multicores

03/30/2018
by Guna Prasaad, et al.

Many modern applications require real-time processing of large volumes of high-speed data. Such data processing needs can be modeled as a streaming computation. A streaming computation is specified as a dataflow graph that exposes multiple opportunities for parallelizing its execution, in the form of data, pipeline, and task parallelism. On the other hand, many important applications require that processing of the stream be ordered, where inputs are processed in the same order as they arrive. There is a fundamental conflict between ordered processing and parallelizing the streaming computation. This paper focuses on the problem of effectively parallelizing ordered streaming computations on a shared-memory multicore machine. We first address the key challenges in exploiting data parallelism in the ordered setting. We present a low-latency, non-blocking concurrent data structure to order outputs produced by concurrent workers on an operator. We also propose a new approach to parallelizing partitioned stateful operators that can handle load imbalance across partitions effectively and mostly avoid delays due to ordering. We illustrate the trade-offs and effectiveness of our concurrent data structures on micro-benchmarks and streaming queries from the TPCx-BB benchmark. We then present an adaptive runtime that dynamically maps the exposed parallelism in the computation to that of the machine. We propose several intuitive scheduling heuristics and compare them empirically on the TPCx-BB queries. We find that for streaming computations, heuristics that exploit as much pipeline parallelism as possible perform better than those that seek to exploit data parallelism.
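To make the ordering problem concrete, the following is a minimal sketch of how outputs from concurrent workers on one operator can be released in arrival order. It is an illustration only and not the data structure from the paper: it uses a plain lock and a heap, whereas the paper describes a low-latency, non-blocking structure. The names `ReorderBuffer`, `run_ordered`, `emit`, and `operator` are hypothetical and chosen for this example.

```python
import heapq
import threading
from concurrent.futures import ThreadPoolExecutor


class ReorderBuffer:
    """Lock-based buffer that releases results in sequence-number order.

    Assumption: a simple blocking design for illustration, not the
    non-blocking structure described in the abstract.
    """

    def __init__(self, emit):
        self._emit = emit            # downstream callback (hypothetical)
        self._lock = threading.Lock()
        self._pending = []           # min-heap of (seq, result)
        self._next_seq = 0           # next sequence number allowed to leave

    def put(self, seq, result):
        with self._lock:
            heapq.heappush(self._pending, (seq, result))
            # Drain every result that is now contiguous with the output.
            while self._pending and self._pending[0][0] == self._next_seq:
                _, ready = heapq.heappop(self._pending)
                self._emit(ready)
                self._next_seq += 1


def run_ordered(inputs, operator, workers=4):
    """Apply `operator` to inputs in parallel; return outputs in input order."""
    out = []
    buf = ReorderBuffer(out.append)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for seq, item in enumerate(inputs):
            # Bind seq and item as defaults so each task sees its own values.
            pool.submit(lambda s=seq, x=item: buf.put(s, operator(x)))
    # Leaving the `with` block waits for all submitted tasks to finish.
    return out


if __name__ == "__main__":
    print(run_ordered(range(8), lambda x: x * x))
```

In this sketch a worker that finishes early must wait for its result to become contiguous with the output before it is emitted, and every `put` contends on one lock. That contention and the resulting delay are the kind of cost the abstract's non-blocking approach is meant to reduce.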

research
06/12/2020

Streaming Computations with Region-Based State on SIMD Architectures

Streaming computations on massive data sets are an attractive candidate ...
research
04/09/2021

Stream Processing With Dependency-Guided Synchronization

Real-time data processing applications with low latency requirements hav...
research
09/03/2017

Faster Concurrent Range Queries with Contention Adapting Search Trees Using Immutable Data

The need for scalable concurrent ordered set data structures with linear...
research
07/24/2023

MorphStream: Scalable Processing of Transactions over Streams on Multicores

Transactional Stream Processing Engines (TSPEs) form the backbone of mod...
research
11/25/2021

STRETCH: Virtual Shared-Nothing Parallelism for Scalable and Elastic Stream Processing

Stream processing applications extract value from raw data through Direc...
research
05/09/2019

Exploiting Fine-Grain Ordered Parallelism in Dense Matrix Algorithms

Dense linear algebra kernels are critical for wireless applications, and...
research
07/16/2023

Real-Time Analytics by Coordinating Reuse and Work Sharing

Analytical tools often require real-time responses for highly concurrent...
