Indexing Data on the Web: A Comparison of Schema-level Indices for Data Search – Extended Technical Report

06/12/2020
by Till Blume, et al.

Indexing the Web of Data offers many opportunities, in particular for finding and exploring data sources. One major design decision when indexing the Web of Data is the choice of a suitable index model, i.e., how to index and summarize the data. Many index models have been developed, each for a specific task. Because each index model was designed, implemented, and evaluated independently, it remains difficult to judge whether an approach generalizes well to another task, set of queries, or dataset. In this work, we empirically evaluate six representative index models with unique feature combinations. Among them is a new index model that incorporates inferencing over RDFS and owl:sameAs. For the first time, we implement all index models in a single, stream-based framework. We evaluate variations of the index models considering sub-graphs of size 0, 1, and 2 hops on two large, real-world datasets. We evaluate the quality of the indices with respect to the compression ratio, the summarization ratio, and the F1-score, which denotes the approximation quality of the stream-based index computation. The experiments reveal large variations in compression ratio, summarization ratio, and approximation quality across index models, queries, and datasets. However, we observe meaningful correlations in the results that help to determine the right index model for a given task, type of query, and dataset.
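To make the idea of a schema-level index concrete, the following is a minimal sketch and not one of the six evaluated index models. It groups subjects by their set of rdf:type values and outgoing properties, and then relates the number of resulting schema elements to the number of summarized instances. The toy triples, the function names, and this particular definition of the summarization ratio are assumptions made for illustration; the report may define the metric differently.

```python
from collections import defaultdict

# Toy RDF-like triples (hypothetical data, not from the evaluated datasets).
TRIPLES = [
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "rdf:type", "foaf:Person"),
    ("ex:bob", "foaf:knows", "ex:alice"),
    ("ex:paper1", "rdf:type", "bibo:Article"),
    ("ex:paper1", "dc:creator", "ex:alice"),
]


def schema_key(pred_obj_pairs):
    """Summarize one subject by its type set and its outgoing property set."""
    types = frozenset(o for p, o in pred_obj_pairs if p == "rdf:type")
    props = frozenset(p for p, _ in pred_obj_pairs if p != "rdf:type")
    return (types, props)


def build_index(triples):
    """Map each schema element to the set of subjects that share it."""
    by_subject = defaultdict(list)
    for s, p, o in triples:
        by_subject[s].append((p, o))
    index = defaultdict(set)
    for s, pairs in by_subject.items():
        index[schema_key(pairs)].add(s)
    return dict(index)


def summarization_ratio(index):
    """Assumed definition: schema elements divided by summarized instances."""
    n_instances = sum(len(subjects) for subjects in index.values())
    return len(index) / n_instances


index = build_index(TRIPLES)
ratio = summarization_ratio(index)
```

In this toy graph, ex:alice and ex:bob share one schema element, so three instances are summarized by two schema elements. A lower ratio under this assumed definition means that more instances share the same schema structure.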


research
08/05/2019

FLuID: A Meta Model to Flexibly Define Schema-level Indices for the Web of Data

Schema-level indices are vital for summarizing large collections of grap...
research
02/16/2021

A Lazy Approach for Efficient Index Learning

Learned indices using neural networks have been shown to outperform trad...
research
08/31/2021

Hierarchical Bitmap Indexing for Range and Membership Queries on Multidimensional Arrays

Traditional indexing techniques commonly employed in database systems...
research
01/14/2022

A Semantic Web Technology Index

Semantic Web (SW) technology has been widely applied to many domains suc...
research
01/04/2021

A Pluggable Learned Index Method via Sampling and Gap Insertion

Database indexes facilitate data retrieval and benefit broad application...
research
03/12/2020

Post-Estimation Smoothing: A Simple Baseline for Learning with Side Information

Observational data are often accompanied by natural structural indices, ...
research
03/01/2019

Parallel Index-based Stream Join on a Multicore CPU

There is increasing interest in using multicore processors to accelerate...
