An Easy-to-use and Robust Approach for the Differentially Private De-Identification of Clinical Textual Documents

11/02/2022
by Yakini Tchouka, et al.

Unstructured textual data is at the heart of healthcare systems. For obvious privacy reasons, these documents are not accessible to researchers as long as they contain personally identifiable information. One way to share this data while respecting the legislative framework (notably GDPR or HIPAA) is to de-identify it within the medical structures, i.e., to detect a person's personal information with a Named Entity Recognition (NER) system and then replace it, so that associating the document with the person becomes very difficult. The challenge is to have reliable NER and substitution tools without compromising confidentiality and consistency in the document. Most existing research focuses on English medical documents and uses coarse substitutions that do not benefit from advances in privacy. This paper shows how an efficient and differentially private de-identification approach can be achieved by strengthening the less robust de-identification method and by adapting state-of-the-art differentially private mechanisms for substitution purposes. The result is an approach for de-identifying clinical documents in French that also generalizes to other languages and whose robustness is mathematically proven.
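The detect-then-substitute pipeline described above can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not say which differentially private mechanisms are adapted, so the example uses the textbook Laplace mechanism on a single entity type (age). The regular expression is a hypothetical stand-in for a real NER system, and the names `AGE_PATTERN`, `privatize_age`, and `deidentify`, as well as the age bounds 0 to 120, are assumptions made for illustration.

```python
import random
import re

# Hypothetical stand-in for a NER system: it only detects ages written
# as "<number> ans" (French) or "<number> years old" (English).
AGE_PATTERN = re.compile(r"\b(\d{1,3})(?= (?:ans|years old)\b)")


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_age(age, epsilon, rng, low=0, high=120):
    """Substitute an age using the Laplace mechanism (illustrative only).

    The bounds [low, high] are assumed. Any two ages in that range differ
    by at most high - low, so noise of scale (high - low) / epsilon gives
    epsilon-differential privacy for this single value. Rounding and
    clamping afterwards are post-processing and do not weaken it.
    """
    sensitivity = high - low
    clamped = min(max(age, low), high)
    noisy = clamped + laplace_noise(sensitivity / epsilon, rng)
    return int(min(max(round(noisy), low), high))


def deidentify(text, epsilon, seed=None):
    """Replace every detected age in `text` with a privatized value."""
    rng = random.Random(seed)

    def substitute(match):
        return str(privatize_age(int(match.group(1)), epsilon, rng))

    return AGE_PATTERN.sub(substitute, text)


if __name__ == "__main__":
    note = "Patient de 72 ans, admis pour douleur thoracique."
    print(deidentify(note, epsilon=1.0))
```

In this sketch each detected age consumes its own budget of epsilon, so a document with several substituted entities would need the budgets to be accounted for together. A smaller epsilon gives stronger privacy but a less faithful substitute.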
