MinoanER: Schema-Agnostic, Non-Iterative, Massively Parallel Resolution of Web Entities

05/15/2019
by   Vasilis Efthymiou, et al.

Entity Resolution (ER) aims to identify different descriptions in various Knowledge Bases (KBs) that refer to the same entity. ER is challenged by the Variety, Volume and Veracity of entity descriptions published in the Web of Data. To address these challenges, we propose the MinoanER framework, which simultaneously offers full automation, support for highly heterogeneous entities, and massive parallelization of the ER process. MinoanER leverages a token-based similarity of entities to define a new metric that derives the similarity of neighboring entities from the most important relations, which are identified purely from statistics. A composite blocking method is employed to capture different sources of matching evidence from the content, neighbors, or names of entities. The search space of candidate pairs for comparison is compactly abstracted by a novel disjunctive blocking graph and processed by a non-iterative, massively parallel matching algorithm that consists of four generic, schema-agnostic matching rules that are quite robust with respect to their internal configuration. We demonstrate that the effectiveness of MinoanER is comparable to existing ER tools over real KBs exhibiting low Variety, but it outperforms them significantly when matching KBs with high Variety.
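
To make the idea of schema-agnostic, token-based candidate generation more concrete, the following is a minimal Python sketch of plain token blocking between two KBs. It is an illustration under stated assumptions, not MinoanER's actual similarity metric, composite blocking method, or disjunctive blocking graph: the function names (`tokens`, `token_blocks`, `candidate_pairs`), the whitespace tokenization, and the toy KBs are all hypothetical.

```python
from collections import defaultdict
from itertools import product


def tokens(description):
    """Lowercased words from all attribute values; attribute names are ignored,
    which is what makes this sketch schema-agnostic (an assumption of the sketch)."""
    return {tok for value in description.values() for tok in str(value).lower().split()}


def token_blocks(kb):
    """Map each token to the set of entity ids whose values contain it."""
    blocks = defaultdict(set)
    for entity_id, description in kb.items():
        for tok in tokens(description):
            blocks[tok].add(entity_id)
    return blocks


def candidate_pairs(kb1, kb2):
    """Return {(id1, id2): number of shared tokens} for cross-KB pairs
    that co-occur in at least one token block."""
    blocks1, blocks2 = token_blocks(kb1), token_blocks(kb2)
    shared = defaultdict(int)
    for tok in blocks1.keys() & blocks2.keys():
        for pair in product(blocks1[tok], blocks2[tok]):
            shared[pair] += 1
    return dict(shared)


# Toy KBs with different attribute names (hypothetical data).
kb1 = {"a1": {"label": "Knossos Palace", "place": "Crete"},
       "a2": {"label": "Athens"}}
kb2 = {"b1": {"name": "Palace of Knossos", "location": "Heraklion Crete"},
       "b2": {"title": "Sparta"}}

pairs = candidate_pairs(kb1, kb2)
print(pairs)  # {('a1', 'b1'): 3}
```

Only the pair that shares tokens is ever compared, which is how blocking shrinks the quadratic search space. A fuller treatment would weight tokens by rarity and add evidence from names and neighbors, as the abstract describes.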

research
05/19/2020

Benchmarking Blocking Algorithms for Web Entities

An increasing number of entities are described by interlinked data rathe...
research
05/15/2019

Schema-agnostic Progressive Entity Resolution (extended version)

Entity Resolution (ER) is the task of finding entity profiles that corre...
research
04/19/2022

Generalized Supervised Meta-blocking (technical report)

Entity Resolution constitutes a core data integration task that relies o...
research
02/21/2020

Crowdsourced Collective Entity Resolution with Relational Match Propagation

Knowledge bases (KBs) store rich yet heterogeneous entities and facts. E...
research
05/15/2019

A Survey of Blocking and Filtering Techniques for Entity Resolution

Efficiency techniques are an integral part of Entity Resolution, since i...
research
12/28/2021

Bipartite Graph Matching Algorithms for Clean-Clean Entity Resolution: An Empirical Evaluation

Entity Resolution (ER) is the task of finding records that refer to the ...
research
02/25/2022

How to reduce the search space of Entity Resolution: with Blocking or Nearest Neighbor search?

Entity Resolution suffers from quadratic time complexity. To increase it...
