VIEW: a framework for organization level interactive record linkage to support reproducible data science

02/16/2021
by Mohammad Karim, et al.

Objective: To design and evaluate a general framework for interactive record linkage that combines a convenient algorithm with tractable Human Intelligent Tasks (HITs; i.e., micro tasks requiring human judgment) and can support reproducible data science.

Materials and Methods: Accurate linkage of real data requires both automatic processing of well-defined tasks and human processing of tasks that require human judgment (i.e., HITs) on messy data. We present a reproducible, interactive, and iterative framework for record linkage called VIEW (Visual Interactive Entity-resolution Workbench). We implemented and evaluated VIEW by integrating two commonly used hospital databases: the American Hospital Association (AHA) Annual Survey of Hospitals and the Medicare Cost Reports for Hospitals from CMS.

Results: Using VIEW to iteratively standardize and clean the data, we linked all Texas hospitals common in both databases with 100 and manually linking 28 hospitals using HITs.

Discussion: Similarities in hospital names and addresses, together with the dynamic nature of hospital attributes, make it impossible to build a fully automated linkage system for hospitals that can be maintained over time. VIEW is software that supports a reproducible, semi-automated process that can generate and track HITs to be reviewed and linked manually for messy data elements, such as hospitals that have been merged.

Conclusion: Effective software that supports the interactive and iterative process of record linkage, together with well-designed HITs, can streamline linkage to support high-quality, replicable research using messy real data.
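To make the split between automatic linkage and HITs concrete, here is a minimal Python sketch. It is not VIEW's algorithm, which the abstract does not specify. It assumes a hypothetical rule: a record is linked automatically only when its standardized name and ZIP code match exactly one candidate, and anything else is queued for human review. The field names, the `standardize` and `link` helpers, and the toy records are all invented for illustration.

```python
import re


def standardize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(cleaned.split())


def link(left, right):
    """Split left-side records into automatic links and HITs.

    Hypothetical rule: link automatically only when the standardized
    name and ZIP code match exactly one right-side record; otherwise
    queue the left-side id for manual review.
    """
    index = {}
    for rec in right:
        key = (standardize(rec["name"]), rec["zip"])
        index.setdefault(key, []).append(rec["id"])

    links, hits = [], []
    for rec in left:
        candidates = index.get((standardize(rec["name"]), rec["zip"]), [])
        if len(candidates) == 1:
            links.append((rec["id"], candidates[0]))
        else:
            hits.append(rec["id"])  # zero or several candidates: needs a human
    return links, hits


# Toy records, invented for this example (not taken from AHA or CMS data).
left_records = [
    {"id": "A1", "name": "St. Mary's Hospital", "zip": "77001"},
    {"id": "A2", "name": "General Medical Center", "zip": "75201"},
]
right_records = [
    {"id": "C1", "name": "ST MARY S HOSPITAL", "zip": "77001"},
    {"id": "C2", "name": "General Med Ctr", "zip": "75201"},
]

links, hits = link(left_records, right_records)
```

In this toy run the first pair links automatically once punctuation and case are standardized, while the abbreviated name in the second pair falls through to the review queue. An iterative process might then add a new standardization rule (for example, expanding abbreviations) and rerun, which shrinks the queue without changing earlier links.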


