Scaling Up Knowledge Graph Creation to Large and Heterogeneous Data Sources

01/24/2022
by Enrique Iglesias, et al.

RDF knowledge graphs (KGs) are powerful data structures for representing factual statements created from heterogeneous data sources. KG creation is laborious and demands data management techniques to be executed efficiently. This paper tackles the problem of the automatic generation of declaratively specified KG creation processes; it proposes techniques for planning and transforming heterogeneous data into RDF triples following mapping assertions specified in the RDF Mapping Language (RML). Given a set of mapping assertions, the planner provides an optimized execution plan by partitioning the assertions and scheduling their execution. First, the planner determines an optimized number of partitions, considering the number of data sources, the type of mapping assertions, and the associations between different assertions. After producing a list of partitions and the assertions that belong to each partition, the planner determines their execution order. A greedy algorithm generates a bushy tree execution plan over the partitions. Bushy tree plans are translated into operating system commands that execute the partitions of mapping assertions in the order indicated by the bushy tree. The proposed optimization approach is evaluated over state-of-the-art RML-compliant engines and existing benchmarks of data sources and RML triples maps. Our experimental results suggest that the performance of the studied engines can be considerably improved, particularly in complex settings with numerous triples maps and data sources. As a result, engines that usually time out in complex cases can still produce a portion of the KG, even if they cannot execute all the assertions.
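The planning steps described in the abstract (partition the triples maps, build a bushy tree over the partitions greedily, then emit operating system commands) can be illustrated with a small Python sketch. It is not the authors' planner. The grouping rule (triples maps that share a logical source or are linked through a parent triples map land in the same partition), the cost model (number of triples maps per partition), the Huffman-style "merge the two cheapest subplans" heuristic used to build the bushy tree, and the `rml-engine` command with its flags are all hypothetical stand-ins chosen for illustration.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class TriplesMap:
    name: str
    source: str                     # logical source, e.g. a CSV file path
    parents: Tuple[str, ...] = ()   # parent triples maps referenced by join conditions


def partition(maps: List[TriplesMap]) -> List[List[TriplesMap]]:
    """Group triples maps that share a source or are joined (union-find)."""
    root = {m.name: m.name for m in maps}

    def find(x: str) -> str:
        while root[x] != x:
            root[x] = root[root[x]]     # path halving
            x = root[x]
        return x

    def union(a: str, b: str) -> None:
        root[find(a)] = find(b)

    first_by_source: Dict[str, str] = {}
    for m in maps:
        if m.source in first_by_source:
            union(m.name, first_by_source[m.source])
        else:
            first_by_source[m.source] = m.name
        for p in m.parents:
            if p in root:               # ignore references to unknown maps
                union(m.name, p)

    groups: Dict[str, List[TriplesMap]] = {}
    for m in maps:
        groups.setdefault(find(m.name), []).append(m)
    return list(groups.values())


@dataclass
class Node:
    cost: int
    partition: Optional[List[TriplesMap]] = None   # set only on leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def greedy_bushy_tree(parts: List[List[TriplesMap]]) -> Node:
    """Repeatedly merge the two cheapest subplans (cost = number of triples maps)."""
    if not parts:
        raise ValueError("no partitions to plan")
    tie = itertools.count()             # tie-breaker so heapq never compares Node objects
    heap = [(len(p), next(tie), Node(cost=len(p), partition=p)) for p in parts]
    heapq.heapify(heap)
    while len(heap) > 1:
        c1, _, n1 = heapq.heappop(heap)
        c2, _, n2 = heapq.heappop(heap)
        merged = Node(cost=c1 + c2, left=n1, right=n2)
        heapq.heappush(heap, (c1 + c2, next(tie), merged))
    return heap[0][2]


def to_commands(node: Node, out: str) -> str:
    """Translate a plan into a POSIX shell line; sibling subtrees run in parallel."""
    if node.partition is not None:
        maps = ",".join(m.name for m in node.partition)
        # 'rml-engine', '--maps' and '--out' are hypothetical placeholders for a real RML engine CLI
        return f"rml-engine --maps {maps} --out {out}"
    left_out, right_out = out + ".l", out + ".r"
    return (f"( {to_commands(node.left, left_out)} ) & "
            f"( {to_commands(node.right, right_out)} ) & wait; "
            f"cat {left_out} {right_out} > {out}")


if __name__ == "__main__":
    parts = partition([
        TriplesMap("TM1", "genes.csv"),
        TriplesMap("TM2", "genes.csv"),
        TriplesMap("TM3", "drugs.csv", parents=("TM4",)),
        TriplesMap("TM4", "targets.csv"),
        TriplesMap("TM5", "trials.csv"),
    ])
    print([[m.name for m in p] for p in parts])
    print(to_commands(greedy_bushy_tree(parts), "kg.nt"))
```

Merging the two cheapest subplans first lets independent subtrees of comparable size run side by side, which is what distinguishes a bushy plan from a left-deep one. A real planner would presumably rely on a richer cost estimate (source sizes, join selectivity, mapping assertion types) than the simple count assumed here.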

08/31/2020

FunMap: Efficient Execution of Functional Mappings for Knowledge Graph Creation

Data has exponentially grown in the last years, and knowledge graphs con...
08/17/2020

SDM-RDFizer: An RML Interpreter for the Efficient Creation of RDF Knowledge Graphs

In recent years, the amount of data has increased exponentially, and kno...
09/03/2019

MapSDI: A Scaled-up Semantic Data Integration Framework for Knowledge Graph Creation

Semantic web technologies have significantly contributed with effective ...
12/14/2021

EABlock: A Declarative Entity Alignment Block for Knowledge Graph Creation Pipelines

Despite encoding enormous amount of rich and valuable data, existing dat...
10/02/2020

FedQPL: A Language for Logical Query Plans over Heterogeneous Federations of RDF Data Sources (Extended Version)

Federations of RDF data sources provide great potential when queried for...
10/26/2022

Dragoman: Efficiently Evaluating Declarative Mapping Languages over Frameworks for Knowledge Graph Creation

In recent years, there have been valuable efforts and contributions to m...
03/12/2019

RocketRML - A NodeJS implementation of a use-case specific RML mapper

The creation of Linked Data from raw data sources is, in theory, no rock...