RMLStreamer-SISO: an RDF stream generator from streaming heterogeneous data

10/26/2022
by Sitt Min Oo, et al.

Stream-reasoning query languages such as CQELS and C-SPARQL enable query answering over RDF streams. Unfortunately, there is currently a lack of efficient RDF stream generators to feed RDF stream reasoners. State-of-the-art RDF stream generators are limited in the velocity and volume of streaming data they can handle. To generate RDF streams efficiently and scalably, we extended the RMLStreamer to also generate RDF streams from dynamic, heterogeneous data streams. This paper introduces a scalable solution that relies on a dynamic window approach to generate RDF streams with low latency and high throughput from multiple heterogeneous data streams. Our evaluation shows that our solution outperforms the state of the art: it achieves millisecond latency (compared with the seconds that state-of-the-art solutions need), constant memory usage across all workloads, and a sustainable throughput of around 70,000 records/s (compared with 10,000 records/s for state-of-the-art solutions). This opens up access to numerous data streams for integration with the Semantic Web.
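To make "generating RDF from heterogeneous data streams" concrete, here is a rough Python sketch of the mapping step only. It is not RMLStreamer's implementation, and it leaves out the dynamic windowing entirely. Every name in it is hypothetical: the base IRI `http://example.org/`, the `Sensor`/`Station` classes, and the rule "one subject per record, one predicate per field". A real RML mapping states these explicitly in a declarative mapping document. The sketch reads JSON Lines records from one source and CSV rows from another, then serializes both as N-Triples lines.

```python
import csv
import json
from typing import Iterable, Iterator

# Hypothetical base IRI and vocabulary, for illustration only (not from the paper).
BASE = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Serialize a plain string literal with N-Triples escaping."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def triples_for(record: dict, cls: str) -> Iterator[str]:
    """Hypothetical mapping rule: one subject per record, one triple per non-id field."""
    subject = iri(f"{BASE}{cls.lower()}/{record['id']}")
    yield f"{subject} {iri(RDF_TYPE)} {iri(BASE + cls)} ."
    for key, value in record.items():
        if key != "id":
            yield f"{subject} {iri(BASE + key)} {literal(str(value))} ."


def json_records(lines: Iterable[str]) -> Iterator[dict]:
    """Parse a stream of JSON Lines into dicts."""
    for line in lines:
        yield json.loads(line)


def csv_records(lines: Iterable[str]) -> Iterator[dict]:
    """Parse a stream of CSV lines (the first line is the header) into dicts."""
    yield from csv.DictReader(lines)


def rdf_stream(json_lines: Iterable[str], csv_lines: Iterable[str]) -> Iterator[str]:
    """Emit N-Triples for both sources, one source after the other (no windowing)."""
    for record in json_records(json_lines):
        yield from triples_for(record, "Sensor")
    for record in csv_records(csv_lines):
        yield from triples_for(record, "Station")


if __name__ == "__main__":
    sensors = ['{"id": 1, "temperature": 21.5}']
    stations = ["id,city", "7,Ghent"]
    for triple in rdf_stream(sensors, stations):
        print(triple)
```

Running it prints four triples, two per input record. In an RML-based pipeline, the subject template, class, and predicates would come from the mapping document instead of being hard-coded. RMLStreamer also applies its dynamic window approach across multiple input streams, and this sketch does not attempt to model that.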

research
03/07/2020

Aion: Better Late than Never in Event-Time Streams

Processing data streams in near real-time is an increasingly important t...
research
11/16/2010

Optimizing real-time RDF data streams

The Resource Description Framework (RDF) provides a common data model fo...
research
09/24/2019

An Exploratory Study of How Specialists Deal with Testing in Data Stream Processing Applications

[Background] Nowadays, there is a massive growth of data volume and spee...
research
06/17/2020

Ranking and benchmarking framework for sampling algorithms on synthetic data streams

In the fields of big data, AI, and streaming processing, we work with la...
research
05/11/2020

Performance Modeling and Vertical Autoscaling of Stream Joins

Streaming analysis is widely used in cloud as well as edge infrastructur...
research
04/07/2020

GeoFlink: A Distributed and Scalable Framework for the Real-time Processing of Spatial Streams

Apache Flink is an open-source system for scalable processing of batch a...
research
05/16/2017

Strider: A Hybrid Adaptive Distributed RDF Stream Processing Engine

Real-time processing of data streams emanating from sensors is becoming ...