PhenoTagger: A Hybrid Method for Phenotype Concept Recognition using Human Phenotype Ontology

09/17/2020
by Ling Luo, et al.
Automatic phenotype concept recognition from unstructured text remains a challenging task in biomedical text mining research. Previous work on the task typically uses dictionary-based matching methods, which can achieve high precision but suffer from lower recall. Recently, machine learning-based methods have been proposed to identify biomedical concepts; through automatic feature learning, they can recognize previously unseen concept synonyms. However, most of these methods require large corpora of manually annotated data for model training, which are difficult to obtain due to the high cost of human annotation. In this paper, we propose PhenoTagger, a hybrid method that combines dictionary-based and machine learning-based methods to recognize Human Phenotype Ontology (HPO) concepts in unstructured biomedical text. We first use all concepts and synonyms in HPO to construct a dictionary. Then, the dictionary and biomedical literature are used to automatically build a weakly supervised training dataset for machine learning. Next, a deep learning model is trained to classify each candidate phrase into its corresponding concept label. Finally, the dictionary-based and machine learning-based predictions are combined for improved performance. Our method is validated on two HPO corpora, and the results show that PhenoTagger compares favorably to state-of-the-art methods. In addition, to demonstrate the generalizability of our method and to investigate the effect of training on a different ontology, we retrained PhenoTagger on the disease ontology MEDIC for disease concept recognition. Experimental results on the NCBI disease corpus show that PhenoTagger, without requiring any manually annotated training data, achieves performance competitive with state-of-the-art supervised methods.
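The four steps in the abstract (dictionary construction, weakly supervised labeling, phrase classification, and result combination) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the toy ontology and its `TOY:` identifiers, the n-gram candidate generator, the `stub_classifier` that stands in for a trained deep learning model, the 0.9 threshold, and the greedy overlap rule in `combine`. None of it is taken from the PhenoTagger implementation.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# (token start, token end exclusive, concept id, score)
Span = Tuple[int, int, str, float]

# Toy ontology with placeholder IDs (hypothetical, not real HPO entries).
TOY_ONTOLOGY: Dict[str, List[str]] = {
    "TOY:0001": ["seizure", "epileptic seizure"],
    "TOY:0002": ["hypotonia", "low muscle tone"],
}

def build_dictionary(ontology: Dict[str, List[str]]) -> Dict[str, str]:
    """Step 1: map every lowercased name/synonym to its concept ID."""
    return {t.lower(): cid for cid, terms in ontology.items() for t in terms}

def candidates(tokens: List[str], max_len: int = 4) -> Iterator[Tuple[int, int, str]]:
    """Enumerate all token n-grams up to max_len as candidate phrases."""
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + max_len, len(tokens)) + 1):
            yield i, j, " ".join(tokens[i:j])

def weak_training_set(sentences: List[str], dictionary: Dict[str, str]) -> List[Tuple[str, str]]:
    """Step 2: label candidate phrases by dictionary lookup ('None' = negative)."""
    return [(p, dictionary.get(p, "None"))
            for s in sentences
            for _, _, p in candidates(s.lower().split())]

def dictionary_tag(tokens: List[str], dictionary: Dict[str, str]) -> List[Span]:
    """Exact-match dictionary hits, scored 1.0."""
    return [(i, j, dictionary[p], 1.0)
            for i, j, p in candidates(tokens) if p in dictionary]

def ml_tag(tokens: List[str],
           classify: Callable[[str], Tuple[Optional[str], float]],
           threshold: float = 0.9) -> List[Span]:
    """Step 3: keep candidate phrases the classifier labels confidently."""
    hits = []
    for i, j, p in candidates(tokens):
        label, prob = classify(p)
        if label is not None and prob >= threshold:
            hits.append((i, j, label, prob))
    return hits

def combine(dict_hits: List[Span], ml_hits: List[Span]) -> List[Span]:
    """Step 4 (assumed rule): prefer higher score, then longer span; drop overlaps."""
    ranked = sorted(dict_hits + ml_hits, key=lambda h: (-h[3], -(h[1] - h[0])))
    kept: List[Span] = []
    for h in ranked:
        if all(h[1] <= k[0] or h[0] >= k[1] for k in kept):
            kept.append(h)
    return sorted(kept)

def stub_classifier(phrase: str) -> Tuple[Optional[str], float]:
    """Hypothetical stand-in for a trained model: knows one unseen synonym."""
    if phrase == "reduced muscle tone":
        return "TOY:0002", 0.95
    return None, 0.0

def tag(text: str) -> List[Span]:
    tokens = text.lower().split()
    dictionary = build_dictionary(TOY_ONTOLOGY)
    return combine(dictionary_tag(tokens, dictionary), ml_tag(tokens, stub_classifier))

if __name__ == "__main__":
    print(tag("Patient has epileptic seizure and reduced muscle tone"))
```

In this sketch the dictionary finds "epileptic seizure" by exact match, while the stub classifier contributes "reduced muscle tone", a phrasing absent from the toy dictionary. That mirrors the precision/recall trade-off the abstract describes, although a real system would replace the stub with a model trained on the weakly labeled data.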
