Real-Time Analytics by Coordinating Reuse and Work Sharing

07/16/2023
by Panagiotis Sioulas, et al.

Analytical tools often require real-time responses for highly concurrent parameterized workloads. A common solution is to answer queries using materialized subexpressions, thereby reducing processing at runtime. However, because queries are still processed individually, concurrent outstanding computations accumulate and increase response times. By contrast, shared execution mitigates the effect of concurrency and improves scalability by exploiting overlapping work between queries, but it does so using heavyweight shared operators that result in high response times. Thus, on their own, both reuse and work sharing fail to provide real-time responses for large batches. Furthermore, naively combining the two approaches is ineffective and can deteriorate performance due to increased filtering costs, reduced marginal benefits, and lower reusability. In this work, we present ParCuR, a framework that harmonizes reuse with work sharing. ParCuR adapts reuse to work sharing in four aspects: i) to reduce filtering costs, it builds access methods on materialized results; ii) to resolve the conflict between benefits from work sharing and materialization, it introduces a sharing-aware materialization policy; iii) to incorporate reuse into sharing-aware optimization, it introduces a two-phase optimization strategy; and iv) to improve reusability and to avoid performance cliffs when queries are partially covered, especially during workload shifts, it combines partial reuse with data clustering based on historical batches. ParCuR outperforms a state-of-the-art work-sharing database by 6.4x and 2x in the SSB and TPC-H benchmarks, respectively.
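
To make the contrast between the three execution styles in the abstract concrete, here is a minimal Python sketch. It is not ParCuR and does not reflect its implementation. It is a toy illustration under stated assumptions: the table `ROWS`, the query shape (a sum of revenue for one region and year), and all function names are hypothetical. The keyed lookup in `run_with_reuse` is only loosely analogous to building an access method on a materialized result.

```python
from collections import defaultdict

# Hypothetical toy fact table: (region, year, revenue).
ROWS = [
    ("EU", 2021, 10),
    ("EU", 2022, 20),
    ("US", 2021, 30),
    ("US", 2022, 40),
    ("ASIA", 2022, 50),
]


def run_individually(rows, queries):
    """Query-at-a-time execution: one full scan of the table per query."""
    return [
        sum(rev for region, year, rev in rows if (region, year) == q)
        for q in queries
    ]


def run_shared(rows, queries):
    """Work sharing: a single scan serves every query in the batch."""
    # Group query positions by parameter so each row is routed once.
    by_param = defaultdict(list)
    for i, q in enumerate(queries):
        by_param[q].append(i)

    totals = [0] * len(queries)
    for region, year, rev in rows:
        for i in by_param.get((region, year), ()):
            totals[i] += rev
    return totals


def materialize(rows):
    """Reuse: precompute SUM(revenue) grouped by (region, year)."""
    view = defaultdict(int)
    for region, year, rev in rows:
        view[(region, year)] += rev
    return dict(view)


def run_with_reuse(view, queries):
    """Answer each query with a keyed lookup into the materialized result."""
    return [view.get(q, 0) for q in queries]


if __name__ == "__main__":
    batch = [("EU", 2021), ("US", 2022), ("EU", 2021), ("ASIA", 2021)]
    view = materialize(ROWS)
    print(run_individually(ROWS, batch))
    print(run_shared(ROWS, batch))
    print(run_with_reuse(view, batch))
```

All three functions return the same answers for a batch. They differ in how much work grows with the number of concurrent queries: the first scans the table once per query, the second scans it once per batch, and the third skips the scan when a materialized result covers the query.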

