Out-of-Order Sliding-Window Aggregation with Efficient Bulk Evictions and Insertions (Extended Version)

07/20/2023
by   Kanat Tangwongsan, et al.

Sliding-window aggregation is a foundational stream processing primitive that efficiently summarizes recent data. The state-of-the-art algorithms for sliding-window aggregation are highly efficient when stream data items are evicted or inserted one at a time, even when some of the insertions occur out of order. However, real-world streams are often not only out-of-order but also bursty, causing data items to be evicted or inserted in larger bulks. This paper introduces a new algorithm for sliding-window aggregation with bulk eviction and bulk insertion. For the special case of single insert and evict, our algorithm matches the theoretical complexity of the best previous out-of-order algorithms. For the case of bulk evict, our algorithm improves upon the theoretical complexity of the best previous algorithm for that case and also outperforms it in practice. For the case of bulk insert, there are no prior algorithms, and our algorithm improves upon the naive approach of emulating bulk insert with a loop over single inserts, both in theory and in practice. Overall, this paper makes high-performance algorithms for sliding-window aggregation more broadly applicable by efficiently handling the ubiquitous cases of out-of-order data and bursts.
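
To make the operations named in the abstract concrete, the following is a minimal Python sketch of the naive baseline it mentions: bulk insert emulated by a loop over single inserts, and bulk evict emulated by repeated single evictions. It is not the paper's algorithm. The class name `NaiveWindow`, its method names, and the choice of a sorted list as the backing store are all assumptions made for illustration, and the query simply re-folds the whole window.

```python
import bisect
from functools import reduce


class NaiveWindow:
    """Hypothetical baseline for illustration; not the algorithm from the paper."""

    def __init__(self, combine, identity):
        self.combine = combine    # associative binary operator (assumed)
        self.identity = identity  # identity element for combine (assumed)
        self.times = []           # timestamps, kept sorted
        self.values = []          # values aligned with self.times

    def insert(self, t, v):
        """Single insert; t may arrive out of order."""
        i = bisect.bisect_left(self.times, t)
        if i < len(self.times) and self.times[i] == t:
            # Same timestamp already present: merge the values.
            self.values[i] = self.combine(self.values[i], v)
        else:
            self.times.insert(i, t)
            self.values.insert(i, v)

    def evict(self):
        """Single evict: drop the oldest entry, if any."""
        if self.times:
            self.times.pop(0)
            self.values.pop(0)

    def bulk_insert(self, pairs):
        """Naive bulk insert: a loop over single inserts."""
        for t, v in pairs:
            self.insert(t, v)

    def bulk_evict(self, t):
        """Naive bulk evict: drop every entry with timestamp <= t."""
        while self.times and self.times[0] <= t:
            self.evict()

    def query(self):
        """Aggregate the whole window by folding from scratch."""
        return reduce(self.combine, self.values, self.identity)


w = NaiveWindow(max, float("-inf"))
w.bulk_insert([(5, 2.0), (1, 7.0), (3, 4.0)])  # a burst arriving out of order
before = w.query()
w.bulk_evict(1)                                # evict everything up to time 1
after = w.query()
```

In this sketch, each single insert and evict shifts list elements and each query touches every element in the window, so the cost of a burst grows with both the burst size and the window size. That is the kind of overhead a dedicated bulk algorithm aims to avoid.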

research
09/29/2020

In-Order Sliding-Window Aggregation in Worst-Case Constant Time

Sliding-window aggregation is a widely-used approach for extracting insi...
research
10/26/2018

Sub-O(log n) Out-of-Order Sliding-Window Aggregation

Sliding-window aggregation summarizes the most recent information in a d...
research
11/10/2017

Real-time Stream-based Monitoring

We introduce RTLola, a new stream-based specification language for the d...
research
11/21/2019

S-RASTER: Contraction Clustering for Evolving Data Streams

Contraction Clustering (RASTER) is a very fast algorithm for density-bas...
research
06/10/2019

Parallel Streaming Random Sampling

This paper investigates parallel random sampling from a potentially-unen...
research
11/07/2017

SWOOP: Top-k Similarity Joins over Set Streams

We provide efficient support for applications that aim to continuously f...
research
06/10/2020

Sliding Window Algorithms for k-Clustering Problems

The sliding window model of computation captures scenarios in which data...
