BigSR: an empirical study of real-time expressive RDF stream reasoning on modern Big Data platforms

04/12/2018
by Xiangnan Ren, et al.

The trade-off between language expressiveness and system scalability (E&S) is a well-known problem in RDF stream reasoning. Higher expressiveness supports more complex reasoning logic, but it may also hinder system scalability. Current research mainly focuses on logical frameworks suitable for stream reasoning, as well as on the implementation and evaluation of prototype systems. These systems are normally developed in a centralized setting, which suffers from inherently limited scalability, while an in-depth study of applying distributed solutions to cover E&S is still missing. In this paper, we explore the feasibility of applying modern distributed computing frameworks to meet both E&S requirements together. To do so, we first propose BigSR, a technical demonstrator that supports a positive fragment of the LARS framework. For the sake of generality and to cover a wide variety of use cases, BigSR relies on the two main execution models adopted by major distributed execution frameworks: Bulk Synchronous Processing (BSP) and Record-at-A-Time (RAT). Accordingly, we implement BigSR on top of Apache Spark Streaming (BSP model) and Apache Flink (RAT model). To draw conclusions on the impact of BSP and RAT on E&S, we analyze the ability of the two models to support distributed stream reasoning and identify several types of use cases characterized by their levels of support. This classification allows for quantifying the E&S trade-off by assessing the scalability of each type of use case with respect to its level of expressiveness. Then, we conduct a series of experiments with 15 queries from 4 different datasets. Our experiments show that BigSR over both BSP and RAT generally scales up to a high throughput of more than one million triples per second (with or without recursion), and that RAT attains sub-millisecond delay for stateless query operators.
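To make the BSP/RAT distinction concrete, here is a minimal, single-process sketch in plain Python. It is not BigSR's implementation and uses neither the Spark Streaming nor the Flink API. It only simulates when a stateless rule would emit its results under each model. The rule, the predicate names (`hasTemp`, `HighTemp`), the threshold, and the batch interval are all hypothetical and chosen purely for illustration.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, float]      # (subject, predicate, numeric value)
Derived = Tuple[str, str, str]       # (subject, predicate, class label)
Event = Tuple[float, Triple]         # (arrival time in seconds, triple)

THRESHOLD = 30.0  # hypothetical rule parameter, not taken from the paper


def derive(triple: Triple) -> Optional[Derived]:
    """Stateless rule: flag a subject whose reading exceeds THRESHOLD."""
    subject, predicate, value = triple
    if predicate == "hasTemp" and value > THRESHOLD:
        return (subject, "type", "HighTemp")
    return None


def run_rat(events: List[Event]) -> Iterator[Tuple[float, Derived]]:
    """Record-at-a-time: a result is emitted as soon as its input arrives."""
    for arrival, triple in events:
        out = derive(triple)
        if out is not None:
            yield arrival, out


def run_bsp(events: List[Event], batch_interval: float) -> List[Tuple[float, Derived]]:
    """Micro-batch (BSP-style): results become visible only at the batch boundary."""
    results = []
    for arrival, triple in events:
        out = derive(triple)
        if out is not None:
            # The emit time is the end of the batch that contains the record.
            batch_end = (int(arrival // batch_interval) + 1) * batch_interval
            results.append((batch_end, out))
    return results


events = [
    (0.1, ("s1", "hasTemp", 35.0)),
    (0.4, ("s2", "hasTemp", 20.0)),
    (1.2, ("s3", "hasTemp", 31.5)),
]
rat_results = list(run_rat(events))
bsp_results = run_bsp(events, batch_interval=1.0)
```

Both simulated models derive the same triples. They differ only in emit time: the record-at-a-time path reports each result at its arrival time, while the micro-batch path defers it to the end of the enclosing batch. This is the intuition behind the low delay reported for stateless operators under RAT.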

research
03/20/2023

Benchmarking scalability of stream processing frameworks deployed as event-driven microservices in the cloud

Event-driven microservices are an emerging architectural style for data-...
research
07/26/2023

Evaluation of Data Enrichment Methods for Distributed Stream Processing Systems

Stream processing has become a critical component in the architecture of...
research
09/01/2020

Theodolite: Scalability Benchmarking of Distributed Stream Processing Engines in Microservice Architectures

Distributed stream processing engines are designed with a focus on scala...
research
03/11/2021

ESPBench: The Enterprise Stream Processing Benchmark

Growing data volumes and velocities in fields such as Industry 4.0 or th...
research
01/25/2019

A quality model for evaluating and choosing a stream processing framework architecture

Today, we have to deal with many data (Big data) and we need to make dec...
research
08/22/2017

Strider-lsa: Massive RDF Stream Reasoning in the Cloud

Reasoning over semantically annotated data is an emerging trend in strea...
research
04/25/2019

Ephemeral Data Handling in Microservices - Technical Report

In modern application areas for software systems --- like eHealth, the I...