Which anonymization technique is best for which NLP task? – It depends. A Systematic Study on Clinical Text Processing

09/01/2022
by Iyadh Ben Cheikh Larbi, et al.

Clinical text processing has gained increasing attention in recent years. Access to sensitive patient data, however, remains a major challenge, as text cannot be shared without clearing legal hurdles and removing personal information. There are many techniques to modify or remove patient-related information, each with different strengths. This paper investigates the influence of different anonymization techniques on the performance of ML models using multiple datasets corresponding to five different NLP tasks. Several learnings and recommendations are presented. This work confirms that stronger anonymization techniques in particular lead to a significant drop in performance. In addition, most of the presented techniques are not secure against a re-identification attack based on similarity search.
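To make the idea of a similarity-search re-identification attack concrete, here is a minimal sketch. It is not the attack evaluated in the paper: it assumes a simple placeholder-masking anonymization, uses bag-of-words cosine similarity as a stand-in for whatever similarity measure the authors use, and runs on invented toy notes. The names `bag_of_words`, `cosine`, and `reidentify` are hypothetical helpers introduced only for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def reidentify(anonymized, originals):
    """For each anonymized text, return the index of the most similar original."""
    original_vectors = [bag_of_words(o) for o in originals]
    matches = []
    for text in anonymized:
        vector = bag_of_words(text)
        scores = [cosine(vector, o) for o in original_vectors]
        matches.append(max(range(len(scores)), key=scores.__getitem__))
    return matches


# Invented toy notes (assumption: the attacker holds the original records).
originals = [
    "John Smith, 54, admitted with chest pain and shortness of breath.",
    "Mary Jones, 31, presents with migraine and nausea since Monday.",
    "Peter Brown, 67, follow-up after hip replacement surgery.",
]

# The same notes after masking names, ages, and dates, in shuffled order.
anonymized = [
    "[NAME], [AGE], presents with migraine and nausea since [DATE].",
    "[NAME], [AGE], follow-up after hip replacement surgery.",
    "[NAME], [AGE], admitted with chest pain and shortness of breath.",
]

matches = reidentify(anonymized, originals)
print(matches)
```

In this toy setting every masked note is linked back to its source record, because the clinical content left untouched by masking is itself distinctive. This illustrates why removing explicit identifiers alone may not protect against such an attack.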


research
04/06/2019

Publicly Available Clinical BERT Embeddings

Contextual word embedding models such as ELMo (Peters et al., 2018) and ...
research
10/03/2018

A Deep Learning Architecture for De-identification of Patient Notes: Implementation and Evaluation

De-identification is the process of removing 18 protected health informa...
research
09/30/2021

Tipping the Scales: A Corpus-Based Reconstruction of Adjective Scales in the McGill Pain Questionnaire

Modern medical diagnosis relies on precise pain assessment tools in tran...
research
09/10/2021

How May I Help You? Using Neural Text Simplification to Improve Downstream NLP Tasks

The general goal of text simplification (TS) is to reduce text complexit...
research
09/14/2023

Clinical Text Summarization: Adapting Large Language Models Can Outperform Human Experts

Sifting through vast textual data and summarizing key information impose...
research
06/08/2023

Privacy- and Utility-Preserving NLP with Anonymized Data: A case study of Pseudonymization

This work investigates the effectiveness of different pseudonymization t...
research
04/13/2018

A Determination Scheme for Quasi-Identifiers Using Uniqueness and Influence for De-Identification of Clinical Data

Objectives; The accumulation and usefulness of clinical data have increa...
