Neural Language Models with Distant Supervision to Identify Major Depressive Disorder from Clinical Notes

Major depressive disorder (MDD) is a prevalent psychiatric disorder associated with a significant healthcare burden worldwide. Phenotyping of MDD can support early diagnosis and may consequently offer significant advantages in patient management. In prior research, MDD phenotypes have been extracted from structured Electronic Health Records (EHR) or predicted from Electroencephalographic (EEG) data using traditional machine learning models. However, MDD phenotypic information is also documented in free-text EHR data, such as clinical notes. While clinical notes may provide more accurate phenotyping information, natural language processing (NLP) algorithms must be developed to extract it. Recent advances in NLP have produced state-of-the-art neural language models such as Bidirectional Encoder Representations from Transformers (BERT), a transformer-based model that can be pre-trained on a corpus of unlabeled text and then fine-tuned on specific tasks. However, such neural language models have been underutilized in clinical NLP tasks due to the lack of large training datasets. To mitigate the lack of annotated training data, researchers have used the distant supervision paradigm to train machine learning models on clinical text classification tasks. It is still unknown whether this paradigm is effective for neural language models. In this paper, we propose to leverage neural language models in a distant supervision paradigm to identify MDD phenotypes from clinical notes. The experimental results indicate that our proposed approach is effective in identifying MDD phenotypes and that Bio-Clinical BERT, a BERT model specific to clinical data, achieved the best performance in comparison with conventional machine learning models.



AKI-BERT: a Pre-trained Clinical Language Model for Early Prediction of Acute Kidney Injury

Acute kidney injury (AKI) is a common clinical syndrome characterized by...

Phenotyping of Clinical Notes with Improved Document Classification Models Using Contextualized Neural Language Models

Clinical notes contain an extensive record of a patient's health status,...

A Deep Representation Empowered Distant Supervision Paradigm for Clinical Information Extraction

Objective: To automatically create large labeled training datasets and r...

A Multi-View Joint Learning Framework for Embedding Clinical Codes and Text Using Graph Neural Networks

Learning to represent free text is a core task in many clinical machine ...

Embedding Electronic Health Records for Clinical Information Retrieval

Neural network representation learning frameworks have recently shown to...

Training Models to Extract Treatment Plans from Clinical Notes Using Contents of Sections with Headings

Objective: Using natural language processing (NLP) to find sentences tha...

Ontology-Driven Self-Supervision for Adverse Childhood Experiences Identification Using Social Media Datasets

Adverse Childhood Experiences (ACEs) are defined as a collection of high...
