Towards Automatic Generation of Shareable Synthetic Clinical Notes Using Neural Language Models

05/16/2019
by Oren Melamud, et al.

Large-scale clinical data is invaluable to driving many computational scientific advances today. However, understandable concerns regarding patient privacy hinder the open dissemination of such data and give rise to suboptimal siloed research. De-identification methods attempt to address these concerns but have been shown to be susceptible to adversarial attacks. In this work, we focus on the vast amounts of unstructured natural language data stored in clinical notes and propose to use generative models trained on real de-identified records to automatically generate synthetic clinical notes that are more amenable to sharing. To evaluate the merit of such notes, we measure both their privacy-preservation properties and their utility in training clinical NLP models. Experiments using neural language models yield notes whose utility is close to that of the real ones in some clinical NLP tasks, yet leave ample room for future improvements.

