Aion: Better Late than Never in Event-Time Streams

03/07/2020
by Sérgio Esteves, et al.

Processing data streams in near real-time is an increasingly important task. For event-timestamped data, the stream processing system must promptly handle late events, that is, events that arrive after their corresponding window has already been processed. To enable this late processing, the window state must be maintained for a long period of time. However, current systems keep this state in memory, which either imposes a maximum period of tolerated lateness or causes performance to degrade, or the system to crash, when memory runs out. In this paper, we propose AION, a comprehensive solution for handling late events efficiently, implemented on top of Flink. In designing AION, we go beyond a naive solution that transfers state between memory and persistent storage on demand. In particular, we introduce a proactive caching scheme that leverages the semantics of stream processing to anticipate when data must be brought into memory. Furthermore, we propose a predictive cleanup scheme that permanently discards window state based on the likelihood of receiving more late events, which prevents storage consumption from growing without bound. Our evaluation shows that AION maintains sustainable levels of memory utilization while preserving high throughput, low latency, and low staleness.
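
To make the problem concrete, the following is a minimal Python sketch of the naive on-demand baseline that the abstract says AION goes beyond. It is not AION's implementation and does not use Flink. The class name `SpillingWindowStore`, its parameters, and the in-process dictionary standing in for persistent storage are all hypothetical, chosen only for illustration. The sketch counts events per tumbling window, keeps a bounded number of windows in memory, spills the least recently updated ones, and reloads a spilled window only when a late event for it arrives.

```python
from collections import OrderedDict


class SpillingWindowStore:
    """Toy tumbling-window counter that keeps at most `max_in_memory` windows in RAM.

    Hypothetical illustration of on-demand state transfer; not AION's design.
    """

    def __init__(self, window_size, max_in_memory):
        self.window_size = window_size
        self.max_in_memory = max_in_memory
        self.memory = OrderedDict()  # window start -> count, least recently updated first
        self.disk = {}               # stand-in for persistent storage
        self.loads = 0               # number of times state was fetched back from "disk"

    def window_start(self, event_time):
        # Tumbling windows: each event belongs to exactly one window.
        return event_time - (event_time % self.window_size)

    def add(self, event_time):
        start = self.window_start(event_time)
        if start not in self.memory:
            if start in self.disk:
                # Late event for a spilled window: load its state on demand.
                self.memory[start] = self.disk.pop(start)
                self.loads += 1
            else:
                self.memory[start] = 0
        self.memory[start] += 1
        self.memory.move_to_end(start)
        # Spill the least recently updated windows once the budget is exceeded.
        while len(self.memory) > self.max_in_memory:
            old_start, count = self.memory.popitem(last=False)
            self.disk[old_start] = count
        return start, self.memory[start]
```

With `window_size=10` and `max_in_memory=2`, feeding event times 1, 12, and 25 spills the window starting at 0; a late event at time 3 then forces a reload before the count can be updated. That reload sits on the critical path of the late event, which is the cost a proactive scheme would try to hide by fetching state before it is needed. The sketch also never deletes anything from `disk`, so storage grows without bound, the problem the abstract's predictive cleanup is meant to address.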
