ROCKER: A Refinement Operator for Key Discovery

05/11/2017
by Tommaso Soru, et al.

The Linked Data principles provide a decentralized approach for publishing structured data in the RDF format on the Web. In relational databases, a key is often provided explicitly; in RDF data, by contrast, finding a set of properties that identifies a resource uniquely is a non-trivial task. Still, finding keys is of central importance for many applications, such as resource deduplication, link discovery, logical data compression, and data integration. In this paper, we address this research gap by specifying a refinement operator, dubbed ROCKER, which we prove to be finite, proper, and non-redundant. We combine the theoretical characteristics of this operator with two monotonicities of keys to obtain a time-efficient approach for detecting keys, i.e., sets of properties that describe resources uniquely. We then use a hash index to compute the discriminability score efficiently, which ensures that our approach can scale to very large knowledge bases. Results show that, compared with existing state-of-the-art techniques, ROCKER yields more accurate results, has a comparable runtime, and consumes less memory.
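To make the notion of a key concrete, the following is a minimal Python sketch of how a discriminability score might be computed with a hash-based index. It is not the ROCKER implementation. The scoring formula (distinct value signatures divided by the number of resources), the function names `build_index`, `discriminability`, and `is_key`, and the toy triples are all assumptions introduced for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]          # (subject, property, object)
Index = Dict[str, Dict[str, Set[str]]]  # subject -> property -> objects


def build_index(triples: Iterable[Triple]) -> Index:
    """Group object values by subject and property (hypothetical helper)."""
    index: Index = defaultdict(lambda: defaultdict(set))
    for subject, prop, obj in triples:
        index[subject][prop].add(obj)
    return index


def discriminability(index: Index, properties: Iterable[str]) -> float:
    """Assumed score: distinct value signatures / number of resources.

    A score of 1.0 means no two resources share the same values on
    `properties`, i.e. the property set behaves as a key on this data.
    """
    if not index:
        return 0.0
    props = sorted(properties)
    # The set of hashable signatures plays the role of the hash index.
    signatures = {
        tuple(frozenset(values.get(p, ())) for p in props)
        for values in index.values()
    }
    return len(signatures) / len(index)


def is_key(index: Index, properties: Iterable[str]) -> bool:
    """True if every resource has a unique signature on `properties`."""
    return discriminability(index, properties) == 1.0


# Toy data (made up for this example).
TRIPLES = [
    ("ex:a", "name", "Ann"), ("ex:a", "born", "1990"),
    ("ex:b", "name", "Ann"), ("ex:b", "born", "1985"),
    ("ex:c", "name", "Bob"), ("ex:c", "born", "1990"),
]
INDEX = build_index(TRIPLES)
```

On this toy data, neither `name` nor `born` alone distinguishes all three resources, while the pair `{name, born}` does, so only the pair would count as a key under the assumed score.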

research
10/01/2021

MATE: Multi-Attribute Table Extraction

A core operation in data discovery is to find joinable tables for a give...
research
09/24/2020

Compressed Key Sort and Fast Index Reconstruction

In this paper we propose an index key compression scheme based on the no...
research
06/07/2023

Reversible Numeric Composite Key (RNCK)

In database design, Composite Keys are used to uniquely identify records...
research
05/31/2022

Discovery of Keys for Graphs [Extended Version]

Keys for graphs uses the topology and value constraints needed to unique...
research
04/21/2021

PTHash: Revisiting FCH Minimal Perfect Hashing

Given a set S of n distinct keys, a function f that bijectively maps the...
research
02/17/2023

Triemaps that match

The trie data structure is a good choice for finite maps whose keys are ...
research
03/10/2021

On the primitivity of the AES key-schedule

The key-scheduling algorithm in the AES is the component responsible for...
