Skyblocking: Learning Blocking Schemes on the Skyline

05/31/2018
by Jingyu Shao, et al.

In this paper, we introduce, for the first time, the concept of skyblocking, which aims to learn scheme skylines. Given a set of blocking schemes and measures (e.g., pair completeness (PC) and pair quality (PQ)), each scheme can be mapped to a point in a scheme space where each measure is one dimension. The scheme skyline consists of the points that are not dominated by any other scheme point in the scheme space with respect to their measure values. The main difference from traditional skyline queries is that the measure values associated with the blocking schemes are not given but must be calculated from the given datasets. Moreover, existing approaches to calculating such values are limited: they either require a large number of labels or cannot guarantee accurate measure values. The main purpose and novelty of our approach is that it needs only a limited number of labels to efficiently learn a set of blocking schemes (i.e., the scheme skyline) such that no scheme point in the set is dominated by any other in the scheme space. Experimental results on four real-world datasets show that our approach performs well in label efficiency, blocking quality, and blocking efficiency, and that it outperforms several baseline approaches.
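
The dominance relation behind a scheme skyline can be illustrated with a minimal Python sketch. It assumes the measure values of each scheme are already known and that higher is better on every measure. The scheme names and the (PC, PQ) values are hypothetical, chosen only for illustration. The sketch does not reproduce the paper's method, which has to estimate these values from a limited number of labels.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, ...]  # one value per measure, e.g. (PC, PQ)


def dominates(a: Point, b: Point) -> bool:
    """Return True if a is at least as good as b on every measure
    and strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def scheme_skyline(schemes: Dict[str, Point]) -> List[str]:
    """Return the names of schemes that no other scheme dominates."""
    return sorted(
        name
        for name, point in schemes.items()
        if not any(
            dominates(other_point, point)
            for other_name, other_point in schemes.items()
            if other_name != name
        )
    )


# Hypothetical schemes with made-up (PC, PQ) values.
schemes = {
    "s1": (0.95, 0.10),
    "s2": (0.80, 0.40),
    "s3": (0.75, 0.35),  # dominated by s2
    "s4": (0.60, 0.70),
    "s5": (0.60, 0.50),  # dominated by s4
}

print(scheme_skyline(schemes))  # ['s1', 's2', 's4']
```

This pairwise check is quadratic in the number of schemes, which is adequate for a small illustration but is not meant as an efficient skyline algorithm.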


