Generalized and Transferable Patient Language Representation for Phenotyping with Limited Data

by Yuqi Si, et al.

The paradigm of representation learning through transfer learning has the potential to greatly enhance clinical natural language processing. In this work, we propose a multi-task pre-training and fine-tuning approach for learning generalized and transferable patient representations from medical language. The model is first pre-trained on several related high-prevalence phenotypes and then fine-tuned on downstream target tasks. Our main contribution focuses on the impact this technique can have on low-prevalence phenotypes, which are challenging due to the dearth of data. We validate the representations learned during pre-training and fine-tune the multi-task pre-trained models on low-prevalence phenotypes, including 38 circulatory diseases, 23 respiratory diseases, and 17 genitourinary diseases. We find that multi-task pre-training increases learning efficiency and achieves consistently high performance across the majority of phenotypes. Most importantly, the multi-task pre-trained model is almost always either the best-performing model or performs tolerably close to it, a property we refer to as robustness. These results lead us to conclude that this multi-task transfer learning architecture is a robust approach for developing generalized and transferable patient language representations across numerous phenotypes.
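The architecture described above can be sketched as a shared encoder with one prediction head per phenotype: the encoder is pre-trained jointly on several high-prevalence tasks, then a fresh head is attached and the model is fine-tuned on a small low-prevalence task. The sketch below is a minimal, illustrative NumPy implementation on synthetic data; the task names, network sizes, and training loop are assumptions for illustration, not the authors' actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(p, y):
    # binary cross-entropy, with a small epsilon for numerical safety
    eps = 1e-9
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

class MultiTaskModel:
    """Shared encoder + one binary classification head per phenotype task."""
    def __init__(self, n_features, n_hidden, task_names):
        self.W_enc = rng.normal(0, 0.1, (n_features, n_hidden))  # shared weights
        self.heads = {t: rng.normal(0, 0.1, n_hidden) for t in task_names}

    def predict(self, X, task):
        return sigmoid(np.tanh(X @ self.W_enc) @ self.heads[task])

    def train_step(self, X, y, task, lr=0.1):
        H = np.tanh(X @ self.W_enc)              # shared patient representation
        p = sigmoid(H @ self.heads[task])
        err = p - y                               # dLoss/dlogit for BCE
        grad_head = H.T @ err / len(y)
        grad_H = np.outer(err, self.heads[task]) * (1 - H**2)  # tanh' = 1 - H^2
        self.heads[task] -= lr * grad_head
        self.W_enc -= lr * (X.T @ grad_H / len(y))

# --- synthetic stand-in for extracted patient language features ---
n_features, n_hidden = 20, 8
X_pre = rng.normal(size=(500, n_features))
true_w = {t: rng.normal(size=n_features) for t in ["circulatory", "respiratory"]}
y_pre = {t: (X_pre @ w > 0).astype(float) for t, w in true_w.items()}

# multi-task pre-training on high-prevalence phenotypes
model = MultiTaskModel(n_features, n_hidden, list(true_w))
for _ in range(300):
    for t in true_w:
        model.train_step(X_pre, y_pre[t], t)

# fine-tuning on a small, related low-prevalence phenotype
w_rare = true_w["circulatory"] + 0.3 * rng.normal(size=n_features)
X_ft = rng.normal(size=(40, n_features))
y_ft = (X_ft @ w_rare > 0).astype(float)
model.heads["rare"] = rng.normal(0, 0.1, n_hidden)  # new head, reused encoder

loss_before = bce(model.predict(X_ft, "rare"), y_ft)
for _ in range(200):
    model.train_step(X_ft, y_ft, "rare")
loss_after = bce(model.predict(X_ft, "rare"), y_ft)
```

The design choice this illustrates is the core of the transfer setup: the encoder's parameters are shaped by many related tasks during pre-training, so the downstream task needs to fit little more than a new head from its limited data.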

