Assessing the accuracy of record linkages with Markov chain based Monte Carlo simulation approach

01/15/2019 ∙ by Shovanur Haque, et al.

Record linkage is the process of identifying records from different data sources that refer to the same entity and linking them together. Statistical, health, government and business organisations increasingly use record linkage to combine administrative, survey, population census and other files into a richer dataset that supports more comprehensive analysis. Despite this growth, little work has been done on developing tools to assess the quality of linked files. Ensuring that the matched records in the combined file actually correspond to the same individual or entity is crucial for the validity of any analyses and inferences based on the combined data. This paper proposes a Markov chain based Monte Carlo simulation method for assessing the accuracy of a linked file and illustrates the utility of the approach using synthetic data from the ABS (Australian Bureau of Statistics) in realistic data settings. In the linking process, different blocking strategies are considered for distinguishing matches from non-matches, yielding different levels of linkage accuracy. To assess the average accuracy of linking, the correctly linked proportion is examined for each record. Test results show that the proposed method performs well in assessing the accuracy of the linkages.
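
The abstract does not give the algorithm, so the following is only a simplified illustration of the general idea of estimating per-record linkage accuracy by simulation. It is not the authors' Markov chain method: it uses plain Monte Carlo replication. In each replicate it randomly corrupts the names in one file to mimic recording errors, applies a simple blocking strategy (compare only records that share a block key), links records within each block, and counts how often each record is linked to its true match. The toy records, the bigram Jaccard similarity, the `error_rate` and `threshold` parameters, and the helper names (`corrupt`, `link`, `correct_link_proportions`) are all hypothetical choices made for this sketch.

```python
import random
from collections import defaultdict

# Hypothetical toy data (NOT the ABS synthetic data):
# (record_id, true_entity_id, block_key, name)
FILE_A = [
    ("a1", 1, "2000", "john smith"),
    ("a2", 2, "2000", "jane doe"),
    ("a3", 3, "3000", "mary jones"),
    ("a4", 4, "3000", "mark jonas"),
]
FILE_B = [
    ("b1", 1, "2000", "john smith"),
    ("b2", 2, "2000", "jane doe"),
    ("b3", 3, "3000", "mary jones"),
    ("b4", 4, "3000", "mark jonas"),
]


def bigrams(s):
    """Set of character bigrams, used for a simple string-similarity score."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def similarity(x, y):
    """Jaccard similarity of character bigrams (an illustrative choice)."""
    bx, by = bigrams(x), bigrams(y)
    if not bx and not by:
        return 1.0
    return len(bx & by) / len(bx | by)


def corrupt(name, rng, error_rate):
    """Simulate recording errors: each letter is replaced with probability error_rate."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    return "".join(
        rng.choice(letters) if c.isalpha() and rng.random() < error_rate else c
        for c in name
    )


def link(file_a, file_b, threshold):
    """Blocking + linking: within each block, link every A record to its
    most similar B record if the similarity reaches the threshold."""
    blocks = defaultdict(list)
    for rec in file_b:
        blocks[rec[2]].append(rec)
    links = {}
    for rid, _, key, name in file_a:
        candidates = blocks.get(key, [])
        if not candidates:
            continue
        best = max(candidates, key=lambda b: similarity(name, b[3]))
        if similarity(name, best[3]) >= threshold:
            links[rid] = best
    return links


def correct_link_proportions(file_a, file_b, n_reps=500, error_rate=0.2,
                             threshold=0.3, seed=0):
    """Monte Carlo estimate of how often each A record is linked to its true match."""
    rng = random.Random(seed)
    truth = {rid: ent for rid, ent, _, _ in file_a}
    hits = {rid: 0 for rid in truth}
    for _ in range(n_reps):
        noisy_b = [(rid, ent, key, corrupt(name, rng, error_rate))
                   for rid, ent, key, name in file_b]
        for rid, matched in link(file_a, noisy_b, threshold).items():
            if matched[1] == truth[rid]:  # compare true entity ids
                hits[rid] += 1
    return {rid: hits[rid] / n_reps for rid in truth}


if __name__ == "__main__":
    props = correct_link_proportions(FILE_A, FILE_B)
    for rid, p in props.items():
        print(f"{rid}: {p:.2f}")
    print("average accuracy:", sum(props.values()) / len(props))
```

Averaging the per-record proportions gives a single accuracy figure for the linked file. Records with similar names in the same block (here `a3` and `a4`) are the ones expected to be linked incorrectly more often as `error_rate` rises, which is the kind of per-record variation a simulation-based assessment is meant to expose.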


