Entity Matching by Pool-based Active Learning

11/01/2022
by Youfang Han, et al.

Entity matching aims to find records from different data sources that refer to the same real-world entity. Among current mainstream approaches, rule-based methods require extensive domain knowledge, while machine-learning and deep-learning methods need a large number of labeled samples to build a model, which is difficult to obtain in some applications. Learning-based methods are also prone to overfitting, so they place high demands on the quality of the training samples. In this paper, we present ALMatcher, an active learning method for entity matching tasks. The method requires manual labeling of only a small number of valuable samples and uses them to build a high-quality model. We propose a hybrid uncertainty query strategy to find these valuable samples, which minimizes the number of labeled training samples while still meeting the task requirements. The proposed method has been validated on seven data sets from different fields. The experiments show that ALMatcher achieves better results than existing approaches while using only a small number of labeled samples.
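
To make the idea of pool-based querying concrete, the following is a minimal Python sketch of one way an uncertainty-driven loop could look. It is not ALMatcher's actual strategy: the abstract does not define the hybrid uncertainty, so the blend of entropy and least confidence used here, the weight `alpha`, and all function names (`hybrid_uncertainty`, `select_queries`, `active_learning_loop`) are assumptions made purely for illustration.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a match probability p (0 at p=0 or 1, 1 at p=0.5)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def least_confidence(p):
    """1 minus the confidence of the predicted class, rescaled to [0, 1]."""
    return 2.0 * (1.0 - max(p, 1.0 - p))


def hybrid_uncertainty(p, alpha=0.5):
    # Assumption: a simple weighted blend of two standard uncertainty
    # measures. The paper's own hybrid score may be defined differently.
    return alpha * binary_entropy(p) + (1 - alpha) * least_confidence(p)


def select_queries(pool_probs, k, alpha=0.5):
    """Return positions of the k most uncertain items in pool_probs."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: hybrid_uncertainty(pool_probs[i], alpha),
        reverse=True,
    )
    return ranked[:k]


def active_learning_loop(pool, oracle, fit, seed_labels, rounds, k, alpha=0.5):
    """Generic pool-based loop (hypothetical interface).

    pool        -- list of candidate record pairs (any representation)
    oracle      -- function index -> bool, the human labeler
    fit         -- function (pool, labeled) -> model, where model(pair)
                   returns a match probability in [0, 1]
    seed_labels -- dict index -> bool with the initial labels
    """
    labeled = dict(seed_labels)
    model = fit(pool, labeled)
    for _ in range(rounds):
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        probs = [model(pool[i]) for i in unlabeled]
        # Ask the oracle only about the most uncertain candidates.
        for j in select_queries(probs, k, alpha):
            idx = unlabeled[j]
            labeled[idx] = oracle(idx)
        model = fit(pool, labeled)  # retrain on the enlarged labeled set
    return model, labeled
```

In this sketch, pairs whose predicted match probability lies closest to 0.5 are sent to the human labeler first, which is the usual intuition behind spending a small labeling budget on the most informative samples.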

research
09/09/2019

Learning to Sample: an Active Learning Framework

Meta-learning algorithms for active learning are emerging as a promising...
research
03/09/2022

Reinforced Meta Active Learning

In stream-based active learning, the learning procedure typically has ac...
research
10/15/2022

Active Learning from the Web

Labeling data is one of the most costly processes in machine learning pi...
research
06/17/2019

Low-resource Deep Entity Resolution with Transfer and Active Learning

Entity resolution (ER) is the task of identifying different representati...
research
04/14/2019

Robust and Discriminative Labeling for Multi-label Active Learning Based on Maximum Correntropy Criterion

Multi-label learning draws great interests in many real world applicatio...
research
05/06/2022

HumanAL: Calibrating Human Matching Beyond a Single Task

This work offers a novel view on the use of human input as labels, ackno...
research
10/28/2022

Reinforcement Learning-based Defect Mitigation for Quality Assurance of Additive Manufacturing

Additive Manufacturing (AM) is a powerful technology that produces compl...
