BRATsynthetic: Text De-identification using a Markov Chain Replacement Strategy for Surrogate Personal Identifying Information

10/28/2022
by John D. Osborne, et al.

Objective: Implement and assess personal health identifying information (PHI) substitution strategies and quantify their privacy-preserving benefits.

Materials and Methods: We implement and assess three "Hiding in Plain Sight" (HIPS) strategies for PHI replacement: a standard Consistent replacement strategy, a Random replacement strategy, and a novel Markov model-based strategy. We evaluate the privacy-preserving benefits of these strategies on a synthetic PHI distribution and on real clinical corpora from two institutions, using a range of false negative error rates (FNER).

Results: Using FNER ranging from 0.1 could be reduced from 27.1 FNER) utilizing the Markov chain strategy versus the Consistent strategy on a corpus containing a diverse set of notes from the University of Alabama at Birmingham (UAB). The Markov chain substitution strategy also consistently outperformed the Consistent and Random substitution strategies on a MIMIC corpus of discharge summaries and on a range of synthetic clinical PHI distributions.

Discussion: We demonstrate that a Markov chain surrogate generation strategy substantially reduces the chance of inadvertent PHI release across a range of assumed PHI FNER, and we release our implementation, `BRATsynthetic`, on GitHub.

Conclusion: The Markov chain replacement strategy allows for the release of larger de-identified corpora at the same risk level relative to corpora released using a Consistent HIPS strategy.
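To make the difference between the three replacement strategies concrete, here is a minimal Python sketch. It is not the BRATsynthetic implementation: the function names, the surrogate list, and the `p_stay` parameter are hypothetical, and the abstract does not specify the actual Markov transition model. The sketch assumes a simple chain in which each PHI mention either reuses the surrogate last emitted for the same real value (with probability `p_stay`) or draws a fresh one.

```python
import random


def consistent(mentions, surrogates, rng):
    """Consistent: each distinct real value maps to one surrogate for the whole document."""
    mapping = {}
    out = []
    for value in mentions:
        if value not in mapping:
            mapping[value] = rng.choice(surrogates)
        out.append(mapping[value])
    return out


def random_replacement(mentions, surrogates, rng):
    """Random: every mention gets an independently drawn surrogate."""
    return [rng.choice(surrogates) for _ in mentions]


def markov(mentions, surrogates, rng, p_stay=0.5):
    """Markov-style (assumed form): reuse the last surrogate for this value with
    probability p_stay, otherwise switch to a freshly drawn surrogate."""
    current = {}
    out = []
    for value in mentions:
        if value not in current or rng.random() >= p_stay:
            current[value] = rng.choice(surrogates)
        out.append(current[value])
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    mentions = ["Smith", "Smith", "Jones", "Smith"]  # detected PHI, in document order
    surrogates = ["Adams", "Baker", "Clark", "Davis"]  # hypothetical surrogate pool
    print(consistent(mentions, surrogates, rng))
    print(random_replacement(mentions, surrogates, rng))
    print(markov(mentions, surrogates, rng, p_stay=0.7))
```

In this sketch, `p_stay=1.0` reproduces the Consistent behaviour and `p_stay=0.0` reproduces the Random behaviour, so the assumed Markov variant sits between the two. The intuition behind HIPS is that a missed real identifier is harder to spot when surrounding surrogates do not all agree with each other.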


