Adversarial Learning of Privacy-Preserving Text Representations for De-Identification of Medical Records

06/12/2019
by Max Friedrich, et al.

De-identification is the task of detecting protected health information (PHI) in medical text. It is a critical step in sanitizing electronic health records (EHRs) so that they can be shared for research. Automatic de-identification classifiers can significantly speed up the sanitization process. However, obtaining a large and diverse dataset to train a classifier that works well across many types of medical text is a challenge, because privacy laws prohibit the sharing of raw medical records. We introduce a method to create privacy-preserving, shareable representations of medical text (i.e., representations that contain no PHI) that does not require expensive manual pseudonymization. These representations can be shared between organizations to create unified datasets for training de-identification models. Our representation allows training a simple LSTM-CRF de-identification model to an F1 score of 97.4%, which is comparable to a strong baseline that exposes private information in its representation. A robust, widely available de-identification classifier based on our representation could potentially enable studies for which de-identification would otherwise be too costly.
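To make the reported metric concrete, here is a minimal sketch of how an F1 score can be computed for PHI detection. It assumes token-level binary labels (PHI vs. not PHI); the abstract does not state the exact evaluation protocol, so this is an illustration, not the authors' evaluation code. The function name `phi_f1`, the example sentence, and its labels are all hypothetical.

```python
from typing import List, Tuple


def phi_f1(gold: List[bool], pred: List[bool]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for binary per-token PHI labels.

    Assumption: True marks a token as PHI, False marks it as non-PHI.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    # Count true positives, false positives, and false negatives.
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum((not g) and p for g, p in zip(gold, pred))
    fn = sum(g and (not p) for g, p in zip(gold, pred))

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical example: an invented sentence with made-up labels.
tokens = ["Mr.", "Smith", "was", "admitted", "on", "March", "3"]
gold = [False, True, False, False, False, True, True]   # name and date are PHI
pred = [False, True, False, False, False, True, False]  # the tagger misses "3"

precision, recall, f1 = phi_f1(gold, pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In de-identification, recall usually matters most, because every missed PHI token is a potential privacy leak; F1 balances that against precision, which reflects how much non-sensitive text is needlessly removed.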


