Contextualised concept embedding for efficiently adapting natural language processing models for phenotype identification

03/10/2019
by Honghan Wu, et al.

Much effort has gone into using automated approaches, such as natural language processing (NLP), to mine or extract data from free-text medical records and so build comprehensive patient profiles for delivering better health care. Reusing NLP models in new settings, however, remains cumbersome, requiring iterative validation and/or retraining on new data to achieve convergent results. In this paper, we formally define and analyse the NLP model adaptation problem, particularly in phenotype identification tasks, and identify two common types of unnecessary or wasted effort: duplicate waste and imbalance waste. We propose a distributed representation approach that represents the language patterns familiar to an NLP model by learning phenotype embeddings from its training data. Computations on these language patterns, combining both geometric and semantic similarities, are then introduced to help avoid or reduce unnecessary effort. To evaluate the approach, we cross-validate NLP models developed for six physical morbidity studies (23 phenotypes; 17 million documents) on anonymised medical records of South London Maudsley NHS Trust, United Kingdom. Two metrics are introduced to quantify the reductions in both duplicate and imbalance waste. We conducted various experiments on reusing NLP models in four phenotype identification tasks. Our approach can choose the best model for a given new task, which can identify up to 76 model retraining, meanwhile, having very good performances (93-97 It can also provide guidance for validating and retraining the model for novel language patterns in new tasks, which can help save around 80 required in blind model-adaptation approaches.
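
The model-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each language pattern is already available as an embedding vector, summarises each model's familiar patterns by their centroid, and ranks models by cosine similarity to the new task. Only the geometric side is covered; the semantic similarity the abstract mentions is omitted. The names `rank_models`, `model_patterns` and `task_patterns`, and the toy two-dimensional vectors, are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def rank_models(model_patterns, task_patterns):
    """Rank candidate models by how close their familiar patterns are to the task.

    model_patterns: dict mapping a (hypothetical) model name to the embeddings
                    of language patterns seen in its training data.
    task_patterns:  embeddings of language patterns observed in the new task.
    Returns (name, score) pairs, highest similarity first.
    """
    task_centroid = centroid(task_patterns)
    scores = {
        name: cosine(centroid(vectors), task_centroid)
        for name, vectors in model_patterns.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy, made-up embeddings purely for illustration.
model_patterns = {
    "model_a": [[1.0, 0.0], [0.9, 0.1]],
    "model_b": [[0.0, 1.0], [0.1, 0.9]],
}
task_patterns = [[0.8, 0.2], [1.0, 0.1]]

ranking = rank_models(model_patterns, task_patterns)
best_model = ranking[0][0]
```

In this toy setup the top-ranked model is the one whose training-data patterns point in the same direction as the new task's patterns. A centroid is a deliberately coarse summary; a per-pattern comparison would be needed to flag individual novel patterns for validation or retraining.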


research
07/30/2021

Deep Natural Language Processing for LinkedIn Search Systems

Many search systems work with large amounts of natural language data, e....
research
01/30/2021

Taxonomic survey of Hindi Language NLP systems

Natural Language processing (NLP) represents the task of automatic handl...
research
10/10/2022

A Survey of Methods for Addressing Class Imbalance in Deep-Learning Based Natural Language Processing

Many natural language processing (NLP) tasks are naturally imbalanced, a...
research
05/07/2023

LatinCy: Synthetic Trained Pipelines for Latin NLP

This paper introduces LatinCy, a set of trained general purpose Latin-la...
research
07/27/2021

Remember What You have drawn: Semantic Image Manipulation with Memory

Image manipulation with natural language, which aims to manipulate image...
research
11/15/2022

Analyse der Entwicklungstreiber militärischer Schwarmdrohnen durch Natural Language Processing

Military drones are taking an increasingly prominent role in armed confl...
