PAPyA: Performance Analysis of Large RDF Graphs Processing Made Easy

09/14/2022
by Mohamed Ragab, et al.

Prescriptive Performance Analysis (PPA) has been shown to be more useful than traditional descriptive and diagnostic analyses for making sense of the performance of Big Data (BD) frameworks. In practice, processing large (RDF) graphs on top of relational BD systems raises several design decisions that cannot be made automatically, e.g., the choice of schema, partitioning technique, and storage format. PPA, and ranking functions in particular, turns performance data into actionable insights, making it easier for practitioners to choose the best way to deploy BD frameworks, especially for graph processing. However, the amount of experimental work required to implement PPA is still huge. In this paper, we present PAPyA, a library for implementing PPA that (1) prepares RDF graph data for a processing pipeline over relational BD systems, (2) automatically ranks performance within a user-defined solution space of experimental dimensions, and (3) supports flexible user-defined extensions in terms of the systems to test and the ranking methods. We showcase PAPyA on a set of experiments based on the SparkSQL framework. PAPyA simplifies the performance analytics of BD systems for processing large (RDF) graphs. We provide PAPyA as a public open-source library under an MIT license, which we expect to serve as a catalyst for research on new prescriptive analytical techniques for BD applications.
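To make the idea of ranking over experimental dimensions concrete, here is a minimal sketch in plain Python. It is not PAPyA's API: the function `rank_configurations`, the configuration labels, and the runtimes are all hypothetical and invented for illustration. Each configuration is a (schema, partitioning, storage format) triple, and configurations are ordered by their mean per-query rank, which is only one of many possible ranking criteria.

```python
from collections import defaultdict

# Hypothetical runtimes in seconds, one entry per query.
# Keys are (schema, partitioning, storage_format); all values are made up.
runtimes = {
    ("single_table", "horizontal", "parquet"): [12.0, 30.0, 8.0],
    ("single_table", "subject", "orc"): [10.0, 35.0, 9.0],
    ("vertical", "horizontal", "parquet"): [15.0, 20.0, 7.0],
}


def rank_configurations(runtimes):
    """Return (configuration, mean_rank) pairs, best first (rank 1 = fastest).

    Ties within a query are broken by insertion order; this sketch does
    not assign shared ranks.
    """
    n_queries = len(next(iter(runtimes.values())))
    rank_sums = defaultdict(float)
    for q in range(n_queries):
        # Order configurations by runtime for this query.
        ordered = sorted(runtimes, key=lambda cfg: runtimes[cfg][q])
        for position, cfg in enumerate(ordered, start=1):
            rank_sums[cfg] += position
    mean_ranks = {cfg: total / n_queries for cfg, total in rank_sums.items()}
    return sorted(mean_ranks.items(), key=lambda item: item[1])


if __name__ == "__main__":
    for cfg, mean_rank in rank_configurations(runtimes):
        print(cfg, round(mean_rank, 2))
```

Ranking by position rather than raw runtime keeps one very slow query from dominating the result, at the cost of ignoring how large the gaps between configurations are.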

research
07/19/2020

High Performance Data Engineering Everywhere

The amazing advances being made in the fields of machine and deep learni...
research
12/12/2017

Real-time Text Analytics Pipeline Using Open-source Big Data Tools

Real-time text processing systems are required in many domains to quickl...
research
04/11/2021

AutoGL: A Library for Automated Graph Learning

Recent years have witnessed an upsurge of research interests and applica...
research
05/22/2019

AXS: A framework for fast astronomical data processing based on Apache Spark

We introduce AXS (Astronomy eXtensions for Spark), a scalable open-sourc...
research
06/11/2018

A Cost-based Storage Format Selector for Materialization in Big Data Frameworks

Modern big data frameworks (such as Hadoop and Spark) allow multiple use...
research
03/31/2021

Efficient Exploration of Interesting Aggregates in RDF Graphs

As large Open Data are increasingly shared as RDF graphs today, there is...
research
05/09/2017

Diving Performance Assessment by means of Video Processing

The aim of this paper is to present a procedure for video analysis appli...
