Scheduling of Intermittent Query Processing

06/11/2023
by Saranya C, et al.

Stream processing is usually done either tuple by tuple or in micro-batches. In many applications, however, the tuples arriving over a predefined duration or window only need to be processed by a certain deadline. Processing such queries with stream processing engines can be very inefficient, since each tuple or micro-batch often incurs significant overhead. Because the result is needed only at the deadline, tuples can instead be processed in larger batches, using the wider window available for computation to significantly reduce cost. In this work, we present scheduling schemes that minimize the overhead cost while meeting query deadline constraints, for both single-query and multi-query scenarios. The proposed scheduling algorithms have been implemented as a Custom Query Scheduler on top of Apache Spark. Our performance study with TPC-H data, in both single-query and multi-query modes, shows orders-of-magnitude improvement over naively using Spark Streaming.
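
The trade-off between per-batch overhead and the deadline can be illustrated with a toy cost model. The sketch below is not the paper's scheduling algorithm. It assumes tuples arrive at a uniform rate over the window, that a batch of n tuples costs a fixed overhead plus a per-tuple cost, that batches run one after another, and that the window is split into equal batches. The names `finish_time`, `fewest_batches`, and all parameters are hypothetical.

```python
from typing import Optional


def finish_time(k: int, n_tuples: int, window: float,
                overhead: float, per_tuple: float) -> float:
    """Completion time when the window is processed as k equal batches.

    Assumed cost model (illustrative only): each batch costs
    overhead + per_tuple * (n_tuples / k), and batches run sequentially.
    """
    batch_cost = overhead + per_tuple * n_tuples / k
    t = 0.0
    for i in range(1, k + 1):
        arrival = i * window / k          # batch i is complete once its last tuple arrives
        t = max(t, arrival) + batch_cost  # wait for the data, then process it
    return t


def fewest_batches(n_tuples: int, window: float, deadline: float,
                   overhead: float, per_tuple: float,
                   max_batches: int = 1000) -> Optional[int]:
    """Smallest number of equal batches that still meets the deadline.

    Fewer batches means less total overhead (k * overhead), so the first
    feasible k found by searching upward is the cheapest under this model.
    Returns None if no k up to max_batches meets the deadline.
    """
    for k in range(1, max_batches + 1):
        if finish_time(k, n_tuples, window, overhead, per_tuple) <= deadline:
            return k
    return None


if __name__ == "__main__":
    # 20 tuples over a 60 s window, 2 s overhead per batch, 0.5 s per tuple.
    print(fewest_batches(20, 60.0, 72.0, 2.0, 0.5))
    print(fewest_batches(20, 60.0, 70.0, 2.0, 0.5))
```

Under these assumptions, a looser deadline allows the whole window to be processed as a single batch, while a tighter one forces some of the work to start before the window closes, at the price of extra per-batch overhead.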
