Strider: A Hybrid Adaptive Distributed RDF Stream Processing Engine

05/16/2017
by Xiangnan Ren, et al.

Real-time processing of data streams emanating from sensors is becoming a common task in Internet of Things scenarios. The key implementation goal is to efficiently handle massive incoming data streams while supporting advanced data analytics services such as anomaly detection. In an ongoing industrial project, we found that a stream processing engine available 24/7 usually faces dynamically changing data and workload characteristics, and that these changes affect the engine's performance and reliability. We propose Strider, a hybrid adaptive distributed RDF Stream Processing engine that optimizes its logical query plan according to the state of the data streams. Strider has been designed to guarantee important industrial properties such as scalability, high availability, fault tolerance, high throughput, and acceptable latency. These guarantees are obtained by building the engine's architecture on state-of-the-art Apache components such as Spark and Kafka. We demonstrate the efficiency of Strider on real-world and synthetic data sets: on a single machine, it achieves up to a 60x throughput gain compared to state-of-the-art systems, and on a 9-machine cluster it reaches a throughput of 3.1 million triples/second, a major breakthrough in this system category.
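The abstract does not say how Strider chooses a new logical plan. As a rough illustration only, the Python sketch below shows one common adaptive idea: reordering the triple patterns of a query for each window according to how many triples currently match each pattern, so that the most selective pattern is evaluated first and later patterns join on an already-bound variable. The function name `order_patterns`, the per-window `counts` dictionary, and the `ex:` vocabulary are hypothetical and are not taken from Strider.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: a triple pattern is (subject, predicate, object);
# terms starting with "?" are query variables.
Pattern = Tuple[str, str, str]


def variables(p: Pattern) -> Set[str]:
    """Return the set of variables appearing in a triple pattern."""
    return {t for t in p if t.startswith("?")}


def order_patterns(patterns: List[Pattern], counts: Dict[Pattern, int]) -> List[Pattern]:
    """Greedy, statistics-driven join order for one window (illustrative sketch).

    Start with the pattern that matched the fewest triples in the current window,
    then repeatedly pick the cheapest remaining pattern that shares a variable
    with the patterns already chosen (to avoid cross products).
    """
    remaining = list(patterns)
    first = min(remaining, key=lambda p: counts.get(p, 0))
    plan = [first]
    bound = set(variables(first))
    remaining.remove(first)
    while remaining:
        connected = [p for p in remaining if variables(p) & bound]
        # Fall back to an unconnected pattern (cross product) only if nothing connects.
        candidates = connected or remaining
        nxt = min(candidates, key=lambda p: counts.get(p, 0))
        plan.append(nxt)
        bound |= variables(nxt)
        remaining.remove(nxt)
    return plan
```

With a query over `?s rdf:type ex:Sensor`, `?s ex:hasValue ?v`, and `?s ex:location ?l`, a window in which `ex:hasValue` triples are rare puts that pattern first, while a window dominated by `ex:hasValue` triples pushes it to the end. A greedy heuristic like this is cheap enough to rerun on every micro-batch, at the cost of ignoring intermediate join-result sizes.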

