Skyblocking for Entity Resolution

05/31/2018
by Jingyu Shao, et al.

In this paper, we introduce the concept of skyblocking, which aims to efficiently identify the "most preferred" blocking schemes for entity resolution with respect to a given set of selection criteria. To capture all possible preferred blocking schemes, we study the scheme skyline (i.e., the blocking schemes that lie on the skyline) in a multi-dimensional scheme space whose dimensions correspond to the selection criteria for blocking (e.g., pair completeness (PC) and pair quality (PQ)). Applying traditional skyline techniques to learn scheme skylines is non-trivial, however. Because of the unique characteristics of blocking schemes, we face several challenges, such as how to find a balanced number of match and non-match labels to effectively approximate a blocking scheme in a scheme space, and how to design efficient skyline algorithms that explore a scheme space to find scheme skylines. To overcome these challenges, we propose a scheme skyline learning approach that incorporates skyline techniques into an active learning process. We have conducted experiments over four real-world datasets. The experimental results show that our approach efficiently identifies scheme skylines in a large scheme space using only a limited number of labels. Our approach also outperforms state-of-the-art approaches for learning blocking schemes in label efficiency, blocking quality, and learning efficiency.
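
To make the notion of a scheme skyline concrete, here is a minimal sketch of a skyline computation over a two-dimensional scheme space (PC, PQ), where higher is better on both dimensions. It uses a naive pairwise dominance check and is not the learning algorithm proposed in the paper. The scheme names and scores are made-up illustrative values, and the function names `dominates` and `scheme_skyline` are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumption: each scheme is scored as (PC, PQ), and higher is better on both.
Scores = Tuple[float, float]


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def scheme_skyline(schemes: Dict[str, Scores]) -> List[str]:
    """Return the names of schemes that no other scheme dominates (naive O(n^2))."""
    return sorted(
        name
        for name, scores in schemes.items()
        if not any(
            dominates(other_scores, scores)
            for other, other_scores in schemes.items()
            if other != name
        )
    )


# Made-up candidate blocking schemes with illustrative (PC, PQ) values.
candidates = {
    "title": (0.95, 0.10),
    "title AND year": (0.80, 0.40),
    "authors": (0.70, 0.30),
    "title OR authors": (0.98, 0.05),
    "year": (0.60, 0.02),
}

skyline = scheme_skyline(candidates)
print(skyline)
```

In this toy data, "authors" is dominated by "title AND year" and "year" is dominated by "title", so neither lies on the skyline. The remaining three schemes each represent a different trade-off between PC and PQ.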

research
05/31/2018

Skyblocking: Learning Blocking Schemes on the Skyline

In this paper, for the first time, we introduce the concept of skyblocki...
research
09/20/2016

An Ensemble Blocking Scheme for Entity Resolution of Large and Sparse Datasets

Entity Resolution, also called record linkage or deduplication, refers t...
research
05/15/2019

A Survey of Blocking and Filtering Techniques for Entity Resolution

Efficiency techniques are an integral part of Entity Resolution, since i...
research
05/28/2020

Efficient and Effective ER with Progressive Blocking

Blocking is a mechanism to improve the efficiency of Entity Resolution (...
research
05/19/2020

Benchmarking Blocking Algorithms for Web Entities

An increasing number of entities are described by interlinked data rathe...
research
09/22/2020

The computational cost of blocking for sampling discretely observed diffusions

Many approaches for conducting Bayesian inference on discretely observed...
research
07/05/2022

Block-SCL: Blocking Matters for Supervised Contrastive Learning in Product Matching

Product matching is a fundamental step for the global understanding of c...
