Application of Advanced Record Linkage Techniques for Complex Population Reconstruction

12/13/2016
by Peter Christen, et al.

Record linkage is the process of identifying records in several databases that refer to the same entities. The process is challenging because unique entity identifiers are commonly unavailable, so linkage has to rely on partially identifying attributes, such as the names and addresses of people. Recent years have seen the development of novel techniques for linking data from diverse application areas, with a major focus on linking complex data that contain records about different types of entities. Advanced approaches have been developed that exploit both the similarities between record attributes and the relationships between entities to identify clusters of matching records. In this application paper we study the novel problem where, rather than different types of entities, we have databases in which the same entity can have different roles, and where these roles change over time. We specifically develop novel techniques for linking historical birth, death, marriage and census records with the aim of reconstructing the population covered by these records over a period of several decades. Our experimental evaluation on real Scottish data shows that, even with advanced linkage techniques that consider group, relationship, and temporal aspects, it is challenging to achieve high-quality linkage from such complex data.
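To make the idea of linking on partially identifying attributes concrete, the following is a minimal sketch of pairwise attribute comparison only. It is not the method developed in the paper, which additionally considers group, relationship, and temporal aspects. The attribute names, the 0.8 threshold, the unweighted averaging, and the three example records are all assumptions invented for illustration; the string comparison uses Python's standard `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher

# Hypothetical attribute names; the schema of the Scottish data is not given here.
ATTRIBUTES = ("first_name", "surname", "address")


def attribute_similarity(a: str, b: str) -> float:
    """Approximate, case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_similarity(rec_a: dict, rec_b: dict) -> float:
    """Unweighted mean of the per-attribute similarities (an assumed scoring rule)."""
    scores = [attribute_similarity(rec_a[k], rec_b[k]) for k in ATTRIBUTES]
    return sum(scores) / len(scores)


def is_match(rec_a: dict, rec_b: dict, threshold: float = 0.8) -> bool:
    """Classify a record pair as a match if its score reaches the assumed threshold."""
    return record_similarity(rec_a, rec_b) >= threshold


# Invented example records, not taken from any real data set.
birth = {"first_name": "Margaret", "surname": "McDonald", "address": "12 High Street"}
census = {"first_name": "Margret", "surname": "MacDonald", "address": "12 High St"}
other = {"first_name": "John", "surname": "Campbell", "address": "3 Mill Lane"}

if __name__ == "__main__":
    print("birth vs census:", round(record_similarity(birth, census), 3))
    print("birth vs other: ", round(record_similarity(birth, other), 3))
```

A pairwise score like this treats every record pair in isolation. The abstract's point is that such attribute similarity alone is often insufficient for historical data, which is why the relationships between entities and the way roles change over time are also exploited.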

