Reuse and Adaptation for Entity Resolution through Transfer Learning

Entity resolution (ER) is one of the fundamental problems in data integration, where machine learning (ML) based classifiers often provide state-of-the-art results. However, considerable human effort goes into feature engineering and training data creation. In this paper, we investigate a new problem: given a dataset D_T for ER with limited or no training data, is it possible to train a good ML classifier on D_T by reusing and adapting the training data of a dataset D_S from the same or a related domain? Our major contributions include (1) a distributed representation based approach to encode each tuple from diverse datasets into a standard feature space; (2) identification of common scenarios where the reuse of training data can be beneficial; and (3) five algorithms to handle these scenarios. We have performed comprehensive experiments on 12 datasets from 5 different domains (publications, movies, songs, restaurants, and books). Our experiments show that our algorithms provide significant benefits, such as superior performance for a fixed training data size.
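To make contribution (1) concrete, the following is a minimal Python sketch of how tuples with different schemas could be mapped into one fixed-size feature space. It is an illustration under stated assumptions, not the method of the paper: the abstract does not specify the encoder, so the hash-based `token_vector` stands in for a pretrained word embedding, and the averaging step, the absolute-difference pair features, and all function names are hypothetical choices.

```python
import hashlib
import re

DIM = 16  # embedding size; an arbitrary choice for this sketch


def token_vector(token, dim=DIM):
    """Deterministic pseudo-embedding for a token.

    Assumption: a stand-in for a pretrained word vector, used only to
    keep the sketch self-contained.
    """
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    # Map each of the first `dim` bytes (0..255) to a value in [-1, 1].
    return [(b / 127.5) - 1.0 for b in digest[:dim]]


def encode_tuple(record, dim=DIM):
    """Average the token vectors of all attribute values.

    Attribute names are ignored, so the output size does not depend
    on the schema of the dataset.
    """
    tokens = []
    for value in record.values():
        tokens.extend(re.findall(r"\w+", str(value).lower()))
    if not tokens:
        return [0.0] * dim
    vectors = [token_vector(t, dim) for t in tokens]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def pair_features(left, right, dim=DIM):
    """Element-wise absolute difference of the two tuple encodings."""
    a, b = encode_tuple(left, dim), encode_tuple(right, dim)
    return [abs(x - y) for x, y in zip(a, b)]


# A pair from a hypothetical source dataset D_S (publications).
source_pair = (
    {"title": "Deep Learning", "venue": "Nature", "year": 2015},
    {"title": "Deep learning", "venue": "nature", "year": "2015"},
)
# A pair from a hypothetical target dataset D_T (songs), with a different schema.
target_pair = (
    {"name": "Blue Train", "artist": "John Coltrane"},
    {"name": "Kind of Blue", "artist": "Miles Davis"},
)

source_features = pair_features(*source_pair)
target_features = pair_features(*target_pair)
```

Because both feature vectors have the same length regardless of schema, a classifier trained on labeled pairs from D_S could in principle be applied to, or adapted for, pairs from D_T. How well that transfer works depends on the encoder and the adaptation algorithm, which this sketch does not cover.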


