Generation of Synthetic Electronic Medical Record Text

12/06/2018
by   Jiaqi Guan, et al.
0

Machine learning (ML) and Natural Language Processing (NLP) have achieved remarkable success in many fields and have brought new opportunities and high expectation in the analyses of medical data. The most common type of medical data is the massive free-text electronic medical records (EMR). It is widely regarded that mining such massive data can bring up important information for improving medical practices as well as for possible new discoveries on complex diseases. However, the free EMR texts are lacking consistent standards, rich of private information, and limited in availability. Also, as they are accumulated from everyday practices, it is often hard to have a balanced number of samples for the types of diseases under study. These problems hinder the development of ML and NLP methods for EMR data analysis. To tackle these problems, we developed a model to generate synthetic text of EMRs called Medical Text Generative Adversarial Network or mtGAN. It is based on the GAN framework and is trained by the REINFORCE algorithm. It takes disease features as inputs and generates synthetic texts as EMRs for the corresponding diseases. We evaluate the model from micro-level, macro-level and application-level on a Chinese EMR text dataset. The results show that the method has a good capacity to fit real data and can generate realistic and diverse EMR samples. This provides a novel way to avoid potential leakage of patient privacy while still supply sufficient well-controlled cohort data for developing downstream ML and NLP methods. It can also be used as a data augmentation method to assist studies based on real EMR data.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/22/2023

Natural Language Processing in Electronic Health Records in Relation to Healthcare Decision-making: A Systematic Review

Background: Natural Language Processing (NLP) is widely used to extract ...
research
03/19/2017

Generating Multi-label Discrete Patient Records using Generative Adversarial Networks

Access to electronic health record (EHR) data has motivated computationa...
research
07/04/2022

GAN-based generation of realistic 3D data: A systematic review and taxonomy

Data has become the most valuable resource in today's world. With the ma...
research
04/10/2020

Negation Detection for Clinical Text Mining in Russian

Developing predictive modeling in medicine requires additional features ...
research
04/04/2023

Synthesize Extremely High-dimensional Longitudinal Electronic Health Records via Hierarchical Autoregressive Language Model

Synthetic electronic health records (EHRs) that are both realistic and p...
research
06/12/2018

Are My EHRs Private Enough? -Event-level Privacy Protection

Privacy is a major concern in sharing human subject data to researchers ...

Please sign up or login with your details

Forgot password? Click here to reset