Temporal graph-based clustering for historical record linkage

07/06/2018
by Charini Nanayakkara, et al.

Research in the social sciences is increasingly based on large and complex data collections, where individual data sets from different domains are linked and integrated to allow advanced analytics. Popular types of data used in this context are historical censuses and birth, death, and marriage certificates. Individually, however, such data sets limit the types of studies that can be conducted; specifically, it is impossible to track individuals, families, or households over time. Once such data sets are linked and family trees spanning several decades are available, it becomes possible, for example, to investigate how education, health, mobility, employment, and social status influence each other and the lives of people over two or more generations. A major challenge, however, is the accurate linkage of historical data sets, which is hampered by poor data quality and, commonly, by the lack of ground truth data. Unsupervised techniques therefore need to be employed, which can be based on similarity graphs generated by comparing individual records. In this paper we present initial results from clustering birth records from Scotland, where we aim to identify all births of the same mother and group siblings into clusters. We extend an existing clustering technique for record linkage by incorporating temporal constraints that must hold between births by the same mother, and propose a novel greedy temporal clustering technique. Experimental results show improvements over non-temporal approaches; however, further work is needed to obtain links of high quality.
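
To make the idea of temporally constrained clustering concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a similarity graph given as weighted edges between birth records, and it assumes illustrative thresholds (a minimum gap of about nine months between distinct births, a two-day window for multiple births, and a 30-year maximum span) that are not taken from the paper. The function names `temporally_compatible` and `greedy_temporal_clustering` are hypothetical.

```python
from datetime import date

# Assumed thresholds for illustration only; not values from the paper.
MIN_GAP_DAYS = 270         # roughly nine months between two distinct births
TWIN_WINDOW_DAYS = 2       # births this close are treated as a multiple birth
MAX_SPAN_DAYS = 30 * 365   # assumed maximum span between a mother's births


def temporally_compatible(d1, d2):
    """Return True if two birth dates could plausibly share a mother."""
    gap = abs((d1 - d2).days)
    return gap <= TWIN_WINDOW_DAYS or MIN_GAP_DAYS <= gap <= MAX_SPAN_DAYS


def greedy_temporal_clustering(birth_dates, edges, min_sim=0.5):
    """Greedily merge clusters along the most similar edges first.

    birth_dates: dict mapping record id -> datetime.date
    edges: iterable of (similarity, id_a, id_b) tuples
    A merge is accepted only if every cross-cluster pair of births
    satisfies the temporal constraint.
    """
    cluster_of = {r: r for r in birth_dates}   # record id -> cluster id
    members = {r: [r] for r in birth_dates}    # cluster id -> record ids
    for sim, a, b in sorted(edges, key=lambda e: e[0], reverse=True):
        if sim < min_sim:
            break  # remaining edges are too weak to consider
        ca, cb = cluster_of[a], cluster_of[b]
        if ca == cb:
            continue  # already in the same cluster
        if all(temporally_compatible(birth_dates[x], birth_dates[y])
               for x in members[ca] for y in members[cb]):
            for y in members[cb]:
                cluster_of[y] = ca
            members[ca].extend(members.pop(cb))
    return sorted(sorted(m) for m in members.values())


# Hypothetical records: b3 is born under three months after b1, so it is rejected.
birth_dates = {
    "b1": date(1870, 1, 10),
    "b2": date(1871, 6, 1),
    "b3": date(1870, 4, 1),
    "b4": date(1870, 1, 11),
}
edges = [(0.9, "b1", "b2"), (0.8, "b1", "b3"),
         (0.7, "b1", "b4"), (0.3, "b2", "b3")]
clusters = greedy_temporal_clustering(birth_dates, edges)
```

In this sketch the temporal check acts as a hard veto on merges: a highly similar pair such as `b1` and `b3` is still kept apart because the birth dates are implausibly close for siblings who are not twins. Whether the paper treats the constraint as hard or soft is not stated in the abstract.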


research
12/13/2016

Application of Advanced Record Linkage Techniques for Complex Population Reconstruction

Record linkage is the process of identifying records that refer to the s...
research
02/15/2023

A Case Study on Record Matching of Individuals in Historical Archives of Indigenous Databases

Digitization of historical records has produced a significant amount of ...
research
04/19/2021

Large Scale Record Linkage in the Presence of Missing Data

Record linkage is aimed at the accurate and efficient identification of ...
research
06/20/2018

Developing a Temporal Bibliographic Data Set for Entity Resolution

Entity resolution is the process of identifying groups of records within...
research
08/23/2020

A Prior for Record Linkage Based on Allelic Partitions

In database management, record linkage aims to identify multiple records...
research
02/27/2019

Ranking in Genealogy: Search Results Fusion at Ancestry

Genealogy research is the study of family history using available resour...
research
03/21/2017

Historical collaborative geocoding

The latest developments in digital have provided large data sets that ca...
