Interpretable and Low-Resource Entity Matching via Decoupling Feature Learning from Decision Making

06/08/2021
by Zijun Yao, et al.

Entity Matching (EM) aims to recognize entity records that denote the same real-world object. Neural EM models learn vector representations of entity descriptions and match entities end-to-end. Though robust, these methods require substantial resources for training and lack interpretability. In this paper, we propose a novel EM framework that consists of Heterogeneous Information Fusion (HIF) and Key Attribute Tree (KAT) Induction to decouple feature representation from matching decision. Using self-supervised learning and the mask mechanism from pre-trained language modeling, HIF learns the embeddings of noisy attribute values through inter-attribute attention on unlabeled data. Using a set of comparison features and a limited amount of annotated data, KAT Induction learns an efficient decision tree that can be interpreted by generating entity matching rules whose structure is advocated by domain experts. Experiments on 6 public datasets and 3 industrial datasets show that our method is highly efficient and outperforms SOTA EM models in most cases. Our code and datasets are available at https://github.com/THU-KEG/HIF-KAT.
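To make the decoupling concrete, the following is a minimal sketch of the second stage only: turning per-attribute comparison features into a small decision tree and reading matching rules off it. It is not the authors' implementation. Plain string similarity from `difflib` stands in for the learned HIF embeddings, a generic greedy CART-style split stands in for KAT Induction, and every function name here (`comparison_features`, `induce`, `predict`, `rules`) is a hypothetical choice made for illustration.

```python
from difflib import SequenceMatcher


def comparison_features(a, b, attributes):
    """One similarity score in [0, 1] per attribute.

    Assumption: character-level similarity replaces the learned embeddings.
    Note that an attribute missing from both records compares as 1.0.
    """
    return [
        SequenceMatcher(None, a.get(k, "").lower(), b.get(k, "").lower()).ratio()
        for k in attributes
    ]


def gini(labels):
    """Gini impurity of a list of binary labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def majority(labels):
    """Majority label; ties and empty input resolve to 1 and 0 respectively."""
    return int(sum(labels) * 2 >= len(labels)) if labels else 0


def induce(X, y, depth=2):
    """Greedy, depth-limited tree induction (generic CART-style, not KAT)."""
    if depth == 0 or len(set(y)) <= 1:
        return majority(y)
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] < t]
            right = [lab for row, lab in zip(X, y) if row[j] >= t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t)
    if best is None:  # no split separates the data
        return majority(y)
    _, j, t = best
    lo = [(row, lab) for row, lab in zip(X, y) if row[j] < t]
    hi = [(row, lab) for row, lab in zip(X, y) if row[j] >= t]
    return {
        "feature": j,
        "threshold": t,
        "below": induce([r for r, _ in lo], [l for _, l in lo], depth - 1),
        "at_or_above": induce([r for r, _ in hi], [l for _, l in hi], depth - 1),
    }


def predict(tree, x):
    """Walk the tree until a leaf label (0 or 1) is reached."""
    while isinstance(tree, dict):
        branch = "at_or_above" if x[tree["feature"]] >= tree["threshold"] else "below"
        tree = tree[branch]
    return tree


def rules(tree, attributes, path=()):
    """Flatten the tree into (condition, decision) pairs a person can read."""
    if not isinstance(tree, dict):
        return [(" AND ".join(path) or "TRUE", "match" if tree else "non-match")]
    name, t = attributes[tree["feature"]], tree["threshold"]
    return rules(tree["below"], attributes, path + (f"sim({name}) < {t:.2f}",)) + rules(
        tree["at_or_above"], attributes, path + (f"sim({name}) >= {t:.2f}",)
    )


# Toy comparison-feature vectors for attributes ("title", "brand"); invented data.
ATTRS = ["title", "brand"]
X = [[0.9, 0.2], [0.8, 0.9], [0.3, 0.8], [0.1, 0.1]]
y = [1, 1, 0, 0]
tree = induce(X, y)
for condition, decision in rules(tree, ATTRS):
    print(f"IF {condition} THEN {decision}")
```

On this toy data the tree splits on title similarity alone, which illustrates why a small number of labeled pairs can suffice once the features are fixed: the classifier only has to pick thresholds over a handful of attribute comparisons, and each root-to-leaf path is itself a readable rule.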

research
07/11/2022

PromptEM: Prompt-tuning for Low-resource Generalized Entity Matching

Entity Matching (EM), which aims to identify whether two entity records ...
research
08/02/2023

MultiEM: Efficient and Effective Unsupervised Multi-Table Entity Matching

Entity Matching (EM), which aims to identify all entity pairs referring ...
research
04/01/2020

Deep Entity Matching with Pre-Trained Language Models

We present Ditto, a novel entity matching system based on pre-trained Tr...
research
09/20/2023

Heterogeneous Entity Matching with Complex Attribute Associations using BERT and Neural Networks

Across various domains, data from different sources such as Baidu Baike ...
research
06/10/2022

Machop: an End-to-End Generalized Entity Matching Framework

Real-world applications frequently seek to solve a general form of the E...
research
04/08/2021

Deep Indexed Active Learning for Matching Heterogeneous Entity Representations

Given two large lists of records, the task in entity resolution (ER) is ...
research
06/18/2020

Record fusion: A learning approach

Record fusion is the task of aggregating multiple records that correspon...