Rare Disease Identification from Clinical Notes with Ontologies and Weak Supervision

05/05/2021
by Hang Dong, et al.

Identifying rare diseases from clinical notes with Natural Language Processing (NLP) is challenging because few cases are available for machine learning and data annotation requires clinical experts. We propose a method using ontologies and weak supervision. The approach has two steps: (i) Text-to-UMLS, which links text mentions to concepts in the Unified Medical Language System (UMLS) using a named entity linking tool (e.g. SemEHR) and weak supervision based on customised rules and contextual representations from Bidirectional Encoder Representations from Transformers (BERT), and (ii) UMLS-to-ORDO, which matches UMLS concepts to rare diseases in the Orphanet Rare Disease Ontology (ORDO). Using MIMIC-III discharge summaries as a case study, we show that the Text-to-UMLS process can be greatly improved with weak supervision, without any annotated data from domain experts. Our analysis shows that the overall pipeline, applied to discharge summaries, can surface rare disease cases that are mostly not captured in the manual ICD codes of the hospital admissions.
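
The two-step structure described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Mention` type, the function names, the length-based rule and its threshold, and the concept identifiers (placeholders, not real UMLS or ORDO codes) do not come from the paper. The sketch also applies the rule directly as a filter, whereas the abstract describes using customised rules as weak supervision together with BERT-based contextual representations.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Mention:
    """A candidate mention as an entity linker might return it (hypothetical type)."""
    text: str  # surface form found in the clinical note
    cui: str   # UMLS concept ID proposed by the linker


# Hypothetical UMLS-to-ORDO lookup table; the IDs are placeholders, not real codes.
UMLS_TO_ORDO = {"C0000001": "ORDO:0001"}

# Assumed threshold for the illustrative rule below.
MIN_MENTION_LENGTH = 4


def weak_label(mention: Mention) -> bool:
    """Illustrative rule: very short mentions are often ambiguous abbreviations,
    so treat them as unlikely to be true mentions of the linked concept."""
    return len(mention.text) >= MIN_MENTION_LENGTH


def text_to_umls(mentions: Iterable[Mention]) -> List[Mention]:
    """Step (i), simplified: keep only mentions that pass the rule."""
    return [m for m in mentions if weak_label(m)]


def umls_to_ordo(mention: Mention) -> Optional[str]:
    """Step (ii): map a UMLS concept to an ORDO rare disease, if one is known."""
    return UMLS_TO_ORDO.get(mention.cui)


def identify_rare_diseases(mentions: Iterable[Mention]) -> List[str]:
    """Run both steps and return the sorted, de-duplicated ORDO IDs."""
    ordo_ids = set()
    for mention in text_to_umls(mentions):
        ordo_id = umls_to_ordo(mention)
        if ordo_id is not None:
            ordo_ids.add(ordo_id)
    return sorted(ordo_ids)


if __name__ == "__main__":
    candidates = [
        Mention("HD", "C0000001"),                # filtered out by the rule
        Mention("example syndrome", "C0000001"),  # kept and mapped to ORDO
        Mention("fever", "C0000002"),             # kept, but has no ORDO match
    ]
    print(identify_rare_diseases(candidates))
```

Keeping the two steps as separate functions mirrors the abstract's split: the mention filter could be swapped for a trained classifier without touching the ontology lookup.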

research
05/11/2022

Ontology-Based and Weakly Supervised Rare Disease Phenotyping from Clinical Notes

Computational text phenotyping is the practice of identifying patients w...
research
08/24/2023

Large Language Models Vote: Prompting for Rare Disease Identification

The emergence of generative Large Language Models (LLMs) emphasizes the ...
research
01/08/2023

Semantic rule Web-based Diagnosis and Treatment of Vector-Borne Diseases using SWRL rules

Vector-borne diseases (VBDs) are a kind of infection caused through the ...
research
08/24/2022

Ontology-Driven Self-Supervision for Adverse Childhood Experiences Identification Using Social Media Datasets

Adverse Childhood Experiences (ACEs) are defined as a collection of high...
research
08/24/2022

Adverse Childhood Experiences Identification from Clinical Notes with Ontologies and NLP

Adverse Childhood Experiences (ACEs) are defined as a collection of high...
research
09/01/2021

Exploring deep learning methods for recognizing rare diseases and their clinical manifestations from texts

Although rare diseases are characterized by low prevalence, approximatel...
research
08/02/2021

The RareDis corpus: a corpus annotated with rare diseases, their signs and symptoms

The RareDis corpus contains more than 5,000 rare diseases and almost 6,0...
