Record Linkage to Match Customer Names: A Probabilistic Approach

06/26/2018
by Bahare Fatemi, et al.

Consider the following problem: given a database of records indexed by names (e.g., names of companies, restaurants, businesses, or universities) and a new name, determine whether the new name is in the database and, if so, which record it refers to. This is an instance of the record linkage problem, and it is challenging because people do not consistently use the official name: they use abbreviations, synonyms, different term orders, alternative spellings, and short forms, and names can contain typos or spacing issues. We provide a probabilistic model, based on relational logistic regression, that computes for a given query the probability that each record in the database is the desired record, and then selects the best record(s) according to these probabilities. Building on term-matching and translational approaches to search, our model addresses many of these challenges and provides good results when existing baselines fail. Using the probabilities output by the model, we can automate the search process for the portion of queries whose desired documents receive a probability higher than a trust threshold. We evaluate our model on a large real-world dataset from a telecommunications company and compare it to several state-of-the-art baselines. The results show that our model is a promising probabilistic model for record linkage of names. We also test whether the knowledge learned by our model on one domain can be effectively transferred to a new domain. For this purpose, we test our model on an unseen test set drawn from the business names of the secondString dataset. The promising results show that our model can be effectively applied to unseen datasets. Finally, we study the sensitivity of our model to the statistics of datasets.
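The scoring-and-threshold idea described above can be illustrated with a minimal sketch. It is not the authors' relational logistic regression model: it assumes a plain logistic function over two hand-picked term-overlap features, and the function names (`features`, `probability`, `link`), the weights, and the threshold value are all hypothetical choices made for illustration.

```python
import math


def tokenize(name):
    """Lowercase a name and split it on whitespace."""
    return name.lower().split()


def features(query, record):
    """Hypothetical term-overlap features (an assumption, not from the paper)."""
    q, r = set(tokenize(query)), set(tokenize(record))
    shared = len(q & r)
    return [
        1.0,                             # bias term
        shared / len(q) if q else 0.0,   # fraction of query terms found in the record
        shared / len(r) if r else 0.0,   # fraction of record terms found in the query
    ]


def probability(query, record, weights):
    """Logistic score: sigmoid of the weighted sum of the features."""
    z = sum(w * x for w, x in zip(weights, features(query, record)))
    return 1.0 / (1.0 + math.exp(-z))


def link(query, records, weights, trust_threshold):
    """Return (best_record, probability, automated) or None if there are no records.

    `automated` is True only when the best probability reaches the trust
    threshold; otherwise the query would be left for manual review.
    """
    if not records:
        return None
    best_p, best_r = max((probability(query, r, weights), r) for r in records)
    return best_r, best_p, best_p >= trust_threshold


if __name__ == "__main__":
    # Illustrative weights and threshold; a real model would learn the weights.
    weights = [-4.0, 4.0, 4.0]
    records = ["Acme Corp", "Globex Inc"]
    for query in ["acme corp", "acme"]:
        print(query, link(query, records, weights, trust_threshold=0.9))
```

With these illustrative weights, an exact term match clears the threshold, while the abbreviated query "acme" still ranks the same record first but falls below the threshold, so it would not be automated.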

research
07/19/2019

Fast Record Linkage for Company Entities

Record Linkage is an essential part of almost all real-world systems tha...
research
11/13/2018

Personal Names Popularity Estimation and its Application to Record Linkage

This study deals with a fairly simply formulated problem -- how to estim...
research
09/03/2018

GB-KMV: An Augmented KMV Sketch for Approximate Containment Similarity Search

In this paper, we study the problem of approximate containment similarit...
research
07/12/2012

A Hierarchical Graphical Model for Record Linkage

The task of matching co-referent records is known among other names as r...
research
03/09/2020

Fast Bayesian Record Linkage With Record-Specific Disagreement Parameters

Applied researchers are often interested in linking individuals between ...
research
12/17/2019

Function Naming in Stripped Binaries Using Neural Networks

In this paper we investigate the problem of automatically naming pieces ...