Optimizing Federated Queries Based on the Physical Design of a Data Lake

02/19/2020
by Philipp D. Rohde, et al.

Optimizing query execution plans is crucial for reducing query execution time, and query optimization has been studied thoroughly for relational databases over the past decades. More recently, the Resource Description Framework (RDF) has become popular for publishing data on the Web. As a consequence, federations of sources with different data models, such as RDF and relational databases, have emerged. One type of such federation is the Semantic Data Lake, in which every data source is kept in its original data model and semantically annotated with ontologies or controlled vocabularies. However, state-of-the-art query engines for federated query processing over Semantic Data Lakes often rely on optimization techniques tailored to RDF. In this paper, we present query optimization techniques guided by heuristics that take the physical design of a Data Lake into account. The heuristics are implemented on top of Ontario, a SPARQL query engine for Semantic Data Lakes. Using source-specific heuristics, the query engine is able to generate more efficient query execution plans by exploiting knowledge about indexes and normalization in relational databases. We show that heuristics which take the physical design of the Data Lake into account are able to speed up query processing.

