A Scalable Framework for Quality Assessment of RDF Datasets

01/29/2020
by Gezim Sejdiu, et al.

Linked Data has grown continuously in recent years. Today, more than 10,000 datasets that follow Linked Data standards are available online. These standards make data machine-readable and interoperable. Nevertheless, many applications, such as data integration, search, and interlinking, cannot take full advantage of Linked Data if it is of low quality. A few approaches exist for the quality assessment of Linked Data, but their performance degrades as data size increases, and the workload quickly grows beyond the capabilities of a single machine. In this paper, we present DistQualityAssessment, an open source implementation of quality assessment for large RDF datasets that can scale out to a cluster of machines. This is the first distributed, in-memory approach for computing different quality metrics for large RDF datasets using Apache Spark. We also provide a quality assessment pattern that can be used to generate new scalable metrics applicable to big data. The work presented here is integrated with the SANSA framework and has been applied to at least three use cases beyond the SANSA community. The results show that our approach is more generic, efficient, and scalable than previously proposed approaches.
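To make the idea of a reusable metric pattern concrete, the following is a minimal sketch in plain Python, not the DistQualityAssessment implementation. It assumes that a metric can be expressed as a per-triple rule followed by a counting step. The names `Triple`, `ratio_metric`, and `has_http_subject`, as well as the sample data, are hypothetical and chosen only for illustration.

```python
from typing import Callable, Iterable, NamedTuple


class Triple(NamedTuple):
    """A simplified RDF triple; all three terms are kept as plain strings."""
    subject: str
    predicate: str
    obj: str


def ratio_metric(triples: Iterable[Triple], rule: Callable[[Triple], bool]) -> float:
    """Return the fraction of triples that satisfy `rule`.

    Assumed pattern: apply a rule to each triple (filter step), then
    count the matches against the total (count step).
    """
    total = 0
    matched = 0
    for triple in triples:
        total += 1
        if rule(triple):
            matched += 1
    # An empty dataset gets a score of 0.0 instead of raising ZeroDivisionError.
    return matched / total if total else 0.0


def has_http_subject(triple: Triple) -> bool:
    """Hypothetical rule: the subject is an HTTP(S) IRI rather than a blank node."""
    return triple.subject.startswith(("http://", "https://"))


# Made-up sample data, for illustration only.
SAMPLE = [
    Triple("http://example.org/a", "http://example.org/p", "http://example.org/b"),
    Triple("_:b0", "http://example.org/p", '"literal"'),
    Triple("https://example.org/c", "http://example.org/q", '"42"'),
    Triple("_:b1", "http://example.org/q", "http://example.org/a"),
]

score = ratio_metric(SAMPLE, has_http_subject)
print(f"share of triples with an HTTP(S) subject: {score:.2f}")
```

Because the rule is a pure function of a single triple, the same shape would presumably map onto a distributed engine as a filter over a partitioned collection of triples followed by a count, which is what lets a new metric be added by writing only a new rule.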
