Entity Resolution and Federated Learning get a Federated Resolution

03/11/2018
by   Richard Nock, et al.

Consider two data providers, each maintaining records of different feature sets about common entities. They aim to learn a linear model over the whole set of features. This problem of federated learning over vertically partitioned data includes a crucial upstream issue: entity resolution, i.e. finding the correspondence between the rows of the datasets. It is well known that entity resolution, just like learning, is mistake-prone in the real world. Despite the importance of the problem, there has been no formal assessment of how errors in entity resolution impact learning. In this paper, we provide a thorough answer to this question, showing how optimal classifiers, empirical losses, margins and generalisation abilities are affected. While our answer spans a wide set of losses (going beyond proper, convex, or classification-calibrated losses), it brings simple practical arguments to upgrade entity resolution as a preprocessing step to learning. As an example, we modify a simple token-based entity resolution algorithm so that it avoids matching rows belonging to different classes, and perform experiments in the setting where entity resolution relies on noisy data, which is very relevant to real-world domains. Notably, our approach covers the case where one peer does not have classes, or has only a noisy record of classes. Experiments show that using the class information during entity resolution can buy significant uplift for learning at little expense from the complexity standpoint.
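To make the idea of class-aware, token-based entity resolution concrete, here is a minimal sketch. It is not the algorithm from the paper: the greedy one-to-one matching on Jaccard similarity, the function names (`tokens`, `jaccard`, `class_aware_match`), and the toy records are all assumptions chosen for illustration. A class of `None` on the second peer is treated as "unknown" and left unconstrained, which is only one possible way to handle a peer without classes.

```python
from typing import List, Optional, Set, Tuple


def tokens(record: str) -> Set[str]:
    """Split a record into a set of lowercase whitespace-delimited tokens."""
    return set(record.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two token sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def class_aware_match(
    left: List[str],
    right: List[str],
    left_classes: List[int],
    right_classes: Optional[List[Optional[int]]] = None,
) -> List[Tuple[int, int]]:
    """Greedy one-to-one matching by descending Jaccard similarity.

    Hypothetical sketch: candidate pairs whose known classes disagree are
    dropped before matching. If right_classes is None, or an entry is None,
    no class constraint is applied to that pair.
    """
    candidates = []
    for i, a in enumerate(left):
        for j, b in enumerate(right):
            known = right_classes is not None and right_classes[j] is not None
            if known and left_classes[i] != right_classes[j]:
                continue  # never match rows from different classes
            candidates.append((jaccard(tokens(a), tokens(b)), i, j))

    # Highest similarity first; ties broken by row indices for determinism.
    candidates.sort(key=lambda t: (-t[0], t[1], t[2]))

    used_left, used_right, matches = set(), set(), []
    for _, i, j in candidates:
        if i in used_left or j in used_right:
            continue
        used_left.add(i)
        used_right.add(j)
        matches.append((i, j))
    return sorted(matches)


# Toy data (invented): right[0] is a noisy copy of left[0] whose city token
# was corrupted, so plain token matching pairs it with left[1] instead.
left = ["ann lee sydney", "ann lee perth"]
right = ["ann lee perth", "ann lee"]
left_classes = [1, -1]
right_classes = [1, -1]

plain = class_aware_match(left, right, left_classes)
aware = class_aware_match(left, right, left_classes, right_classes)
print(plain)
print(aware)
```

In this toy case, the unconstrained matcher pairs rows of opposite classes because of the corrupted token, while the class-constrained variant does not. The extra cost is a single comparison per candidate pair, consistent with the abstract's remark that class information comes at little expense from the complexity standpoint.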

research
08/23/2022

FlexER: Flexible Entity Resolution for Multiple Intents

Entity resolution, a longstanding problem of data cleaning and integrati...
research
06/11/2021

Exploiting Record Similarity for Practical Vertical Federated Learning

As the privacy of machine learning has drawn increasing attention, feder...
research
03/01/2021

Federated Learning without Revealing the Decision Boundaries

We consider the recent privacy preserving methods that train the models ...
research
10/31/2011

Query-time Entity Resolution

Entity resolution is the problem of reconciling database references corr...
research
12/28/2021

Bipartite Graph Matching Algorithms for Clean-Clean Entity Resolution: An Empirical Evaluation

Entity Resolution (ER) is the task of finding records that refer to the ...
research
10/03/2022

Estimating the Performance of Entity Resolution Algorithms: Lessons Learned Through PatentsView.org

This paper introduces a novel evaluation methodology for entity resoluti...
research
08/16/2019

AutoER: Automated Entity Resolution using Generative Modelling

Entity resolution (ER) refers to the problem of identifying records in o...
