Active Deep Learning on Entity Resolution by Risk Sampling

12/23/2020
by Youcef Nafa, et al.

While deep learning has achieved state-of-the-art performance on entity resolution (ER), its effectiveness depends on large quantities of accurately labeled training data. To alleviate the labeling burden, Active Learning (AL) offers a feasible solution by focusing labeling effort on the data deemed most useful for model training. Building upon recent advances in risk analysis for ER, which can provide a more refined estimate of label misprediction risk than the simpler classifier outputs, we propose a novel AL approach for ER called risk sampling. Risk sampling leverages misprediction risk estimation for active instance selection. Based on the core-set characterization of AL, we theoretically derive an optimization model that aims to minimize core-set loss under non-uniform Lipschitz continuity. Since the resulting weighted K-medoids problem is NP-hard, we then present an efficient heuristic algorithm. Finally, we empirically verify the efficacy of the proposed approach on real data through a comparative study. Our extensive experiments show that it outperforms the existing alternatives by considerable margins. Using ER as a test case, we demonstrate that risk sampling is a promising approach potentially applicable to other challenging classification tasks.
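To make the weighted K-medoids idea concrete, here is a minimal sketch of one possible greedy heuristic. It is not the algorithm from the paper, whose details the abstract does not give. The function names `weighted_cost` and `greedy_weighted_k_medoids` are hypothetical, the per-instance `weights` stand in for misprediction risk estimates, and Euclidean distance over feature vectors is an assumption made purely for illustration.

```python
import math


def weighted_cost(points, weights, medoids):
    """Sum over all points of weight * distance to the nearest chosen medoid."""
    return sum(
        w * min(math.dist(p, points[m]) for m in medoids)
        for p, w in zip(points, weights)
    )


def greedy_weighted_k_medoids(points, weights, k):
    """Greedily pick k medoid indices (assumed heuristic, not the paper's).

    At each step, add the candidate whose inclusion gives the lowest
    weighted cost; ties go to the lowest index.
    """
    n = len(points)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of points")

    nearest = [math.inf] * n  # distance from each point to its closest medoid
    chosen = []
    for _ in range(k):
        best_idx, best_cost = None, math.inf
        for c in range(n):
            if c in chosen:
                continue
            # Cost if candidate c were added to the current medoid set.
            cost = sum(
                w * min(d, math.dist(p, points[c]))
                for p, w, d in zip(points, weights, nearest)
            )
            if cost < best_cost:
                best_idx, best_cost = c, cost
        chosen.append(best_idx)
        nearest = [
            min(d, math.dist(p, points[best_idx]))
            for p, d in zip(points, nearest)
        ]
    return chosen


# Toy example: two clusters, with assumed risk weights per instance.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
weights = [1, 2, 5, 1]
selected = greedy_weighted_k_medoids(points, weights, 2)
print(selected)  # [2, 1]
```

In this toy run the highest-weight instance is selected first, and the second pick covers the remaining cluster. In an active learning loop, the selected indices would be the instances sent for labeling. Each greedy step costs O(n^2) distance evaluations, so a practical version would cache or approximate distances.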


