ERBlox: Combining Matching Dependencies with Machine Learning for Entity Resolution

02/07/2016
by Zeinab Bahmani, et al.

Entity resolution (ER), an important and common data cleaning problem, is about detecting duplicate data representations of the same external entities and merging them into single representations. Relatively recently, declarative rules called "matching dependencies" (MDs) have been proposed for specifying similarity conditions under which attribute values in database records are merged. In this work we show the process and the benefits of integrating four components of ER: (a) building a classifier for duplicate/non-duplicate record pairs using machine learning (ML) techniques; (b) using MDs to support the blocking phase of ML; (c) merging records on the basis of the classifier results; and (d) using the declarative language "LogiQL", an extended form of Datalog supported by the "LogicBlox" platform, for all activities related to data processing and for the specification and enforcement of MDs.
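
The pipeline outlined in the abstract (MD-based blocking, a pairwise duplicate classifier, and merging on the classifier's output) can be sketched in a few dozen lines of Python. ERBlox itself expresses these steps in LogiQL on LogicBlox and uses a trained ML classifier, so this is not the authors' implementation. Everything concrete here is a hypothetical assumption for illustration: the token Jaccard similarity, the threshold `theta=0.5`, the fixed `WEIGHTS`/`BIAS` standing in for a trained model, the `title`/`authors` attributes, and the keep-the-longest-value merge function.

```python
import math
from itertools import combinations

# Hypothetical toy records; attribute names are assumptions, not ERBlox's schema.
records = [
    {"title": "Entity resolution with matching dependencies", "authors": "Z Bahmani L Bertossi"},
    {"title": "entity resolution with matching dependencies", "authors": "Zeinab Bahmani L Bertossi"},
    {"title": "Deep learning for images", "authors": "A Smith"},
]


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity: an assumed stand-in for the MD similarity predicate."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


class UnionFind:
    """Disjoint sets over record indices; used for block IDs and for final clusters."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x: int, y: int) -> None:
        rx, ry = self.find(x), self.find(y)
        if rx != ry:
            self.parent[max(rx, ry)] = min(rx, ry)  # smallest index stays the representative


def md_blocking(recs, attr="title", theta=0.5):
    """Enforce the (hypothetical) blocking MD  R[attr] ~ R'[attr] -> R[block] = R'[block].

    The MD condition reads only `attr`, which blocking never changes, so chasing the
    MD to a fixpoint yields the transitive closure of the similar pairs, which
    union-find computes directly.
    """
    uf = UnionFind(len(recs))
    for i, j in combinations(range(len(recs)), 2):
        if jaccard(recs[i][attr], recs[j][attr]) >= theta:
            uf.union(i, j)
    return [uf.find(i) for i in range(len(recs))]


# Hypothetical fixed weights standing in for a classifier trained offline.
WEIGHTS, BIAS = (4.0, 3.0), -4.0


def is_duplicate(r1, r2) -> bool:
    """Logistic score over (title, authors) similarity features, thresholded at 0.5."""
    feats = (jaccard(r1["title"], r2["title"]), jaccard(r1["authors"], r2["authors"]))
    z = BIAS + sum(w * f for w, f in zip(WEIGHTS, feats))
    return 1.0 / (1.0 + math.exp(-z)) >= 0.5


def resolve(recs, theta=0.5):
    blocks = md_blocking(recs, "title", theta)
    uf = UnionFind(len(recs))
    # Only pairs that share a block are shown to the classifier.
    for i, j in combinations(range(len(recs)), 2):
        if blocks[i] == blocks[j] and is_duplicate(recs[i], recs[j]):
            uf.union(i, j)
    clusters = {}
    for i, r in enumerate(recs):
        clusters.setdefault(uf.find(i), []).append(r)
    # Hypothetical merge function: keep the longest value of each attribute.
    return [{k: max((r[k] for r in group), key=len) for k in group[0]}
            for group in clusters.values()]


print(resolve(records))
```

Union-find suits the blocking step because enforcing the blocking MD only ever identifies block IDs and never splits them, so the resulting blocks do not depend on the order in which similar pairs are processed.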

research
11/21/2016

Enforcing Relational Matching Dependencies with Datalog for Entity Resolution

Entity resolution (ER) is about identifying and merging records in a dat...
research
09/20/2016

An Ensemble Blocking Scheme for Entity Resolution of Large and Sparse Datasets

Entity Resolution, also called record linkage or deduplication, refers t...
research
09/29/2017

Entity Consolidation: The Golden Record Problem

Four key processes in data integration are: data preparation (i.e., extr...
research
10/02/2017

DeepER -- Deep Entity Resolution

Entity Resolution (ER) is a fundamental problem with many applications. ...
research
06/18/2020

Record fusion: A learning approach

Record fusion is the task of aggregating multiple records that correspon...
research
03/08/2017

Performance Bounds for Graphical Record Linkage

Record linkage involves merging records in large, noisy databases to rem...
research
03/12/2023

Another Generic Setting for Entity Resolution: Basic Theory

Benjelloun et al. <cit.> considered the Entity Resolution (ER) problem a...