The NLP Sandbox: an efficient model-to-data system to enable federated and unbiased evaluation of clinical NLP models

06/28/2022
by Yao Yan, et al.

Objective: The evaluation of natural language processing (NLP) models for clinical text de-identification relies on the availability of clinical notes, which is often restricted due to privacy concerns. The NLP Sandbox alleviates the lack of data and evaluation frameworks for NLP models by adopting a federated, model-to-data approach, in which models are brought to each institution's data rather than the data being shared. This enables unbiased, federated model evaluation without the need to share sensitive data from multiple institutions.

Materials and Methods: We leveraged the Synapse collaborative framework, containerization software, and OpenAPI Generator to build the NLP Sandbox (nlpsandbox.io). We evaluated two state-of-the-art de-identification annotation models, Philter and NeuroNER, using data from three institutions, and further validated model performance using data from an external validation site.

Results: We demonstrated the usefulness of the NLP Sandbox by evaluating clinical de-identification models. The external developer was able to incorporate their model into the NLP Sandbox template and provide feedback on the user experience.

Discussion: We demonstrated the feasibility of using the NLP Sandbox to conduct a multi-site evaluation of clinical text de-identification models without sharing data. Standardized model and data schemas enable smooth model transfer and implementation. To generalize the NLP Sandbox, data owners and model developers must develop suitable, standardized schemas and adapt their data or models to fit them.

Conclusions: The NLP Sandbox lowers the barrier to using clinical data for NLP model evaluation and facilitates federated, multi-site, unbiased evaluation of NLP models.
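To make the model-to-data idea concrete, the following is a minimal sketch, not the NLP Sandbox's actual implementation or API. It assumes a hypothetical annotation schema in which each PHI annotation is a `(start, end, label)` character span, and it uses strict exact-span matching as the scoring rule; the real platform's schemas and metrics may differ. All names (`Span`, `SiteResult`, `evaluate_locally`, `toy_date_model`, `SITES`) are illustrative. Each site runs the submitted model on its own notes and returns only aggregate counts, so no note text leaves the site; a coordinator then computes per-site and pooled precision, recall, and F1.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical annotation schema: a PHI span as (start, end, label), end exclusive.
Span = Tuple[int, int, str]
Model = Callable[[str], Set[Span]]


@dataclass
class SiteResult:
    """Aggregate counts a site sends back; the notes themselves never leave the site."""
    site: str
    tp: int
    fp: int
    fn: int


def evaluate_locally(site: str, model: Model, notes: List[Tuple[str, Set[Span]]]) -> SiteResult:
    """Run the model at the data site and score it with strict exact-span matching (an assumption)."""
    tp = fp = fn = 0
    for text, gold in notes:
        pred = model(text)
        tp += len(pred & gold)   # predicted spans that exactly match a gold span
        fp += len(pred - gold)   # predicted spans with no exact gold match
        fn += len(gold - pred)   # gold spans the model missed
    return SiteResult(site, tp, fp, fn)


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1 from counts, returning 0.0 where a ratio is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def pooled(results: List[SiteResult]) -> Tuple[float, float, float]:
    """Micro-average across sites by summing the counts each site reported."""
    return prf(sum(r.tp for r in results), sum(r.fp for r in results), sum(r.fn for r in results))


# Toy "model": tags MM/DD/YYYY strings as DATE. Stands in for a real de-identifier.
DATE = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")


def toy_date_model(text: str) -> Set[Span]:
    return {(m.start(), m.end(), "DATE") for m in DATE.finditer(text)}


# Toy private data held at two hypothetical sites.
SITES: Dict[str, List[Tuple[str, Set[Span]]]] = {
    "site_a": [("Seen on 01/02/2020 by Dr. Smith.", {(8, 18, "DATE"), (26, 31, "NAME")})],
    "site_b": [("Admitted 12/31/2019.", {(9, 19, "DATE")}),
               ("Ref code 11/22/3333.", set())],
}

results = [evaluate_locally(name, toy_date_model, notes) for name, notes in SITES.items()]
for r in results:
    print(r.site, prf(r.tp, r.fp, r.fn))
print("pooled", pooled(results))
```

Summing counts before computing metrics (micro-averaging) weights each site by its number of annotations; averaging per-site F1 scores instead would weight sites equally. Which choice suits a given multi-site comparison is a design decision this sketch does not settle.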


