Understanding Stability of Medical Concept Embeddings: Analysis and Prediction

04/21/2019
by Grace E. Lee, et al.

In the biomedical domain, medical concepts linked to external knowledge bases (e.g., UMLS) are frequently used to build accurate and effective representations. Many studies have developed embeddings for medical concepts from biomedical corpora and evaluated the overall quality of those embeddings. However, the quality of individual concept embeddings has not been carefully investigated. We analyze the quality of medical concept embeddings trained with word2vec in terms of embedding stability. The analysis shows that some concept embeddings are unaffected by changes to word2vec hyperparameter values and remain poorly stable regardless of the setting. Moreover, when stability is analyzed with respect to frequency, many low-frequency concepts achieve stability as high as that of high-frequency concepts. These findings suggest that other factors influence the stability of medical concept embeddings. In this paper, we propose a new factor, the distribution of context words, to predict the stability of medical concept embeddings. Estimating this distribution with normalized entropy, we show that a skewed distribution has a moderate correlation with the stability of concept embeddings. The result demonstrates that a medical concept whose context words are dominated by a small number of distinct words can obtain high stability even when its frequency is low. The clear correlation between the proposed factor and stability makes it possible to identify medical concepts with low-quality embeddings even before training.
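The abstract estimates the distribution of a concept's context words with normalized entropy. The following is a minimal Python sketch of that quantity, not the authors' implementation. It assumes a symmetric word2vec-style window of `window` tokens on each side of every occurrence of the concept, and it normalizes Shannon entropy by the log of the number of distinct context words (a common definition). The paper's exact window size and preprocessing are not stated here. The names `context_counts`, `normalized_entropy`, and the concept token `CUI_A` are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List


def context_counts(sentences: Iterable[List[str]], target: str, window: int = 5) -> Counter:
    """Count words within `window` tokens of each occurrence of `target`.

    Assumption: a symmetric window like word2vec's, without subsampling
    or dynamic window shrinking.
    """
    counts: Counter = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the target occurrence itself
                    counts[tokens[j]] += 1
    return counts


def normalized_entropy(counts: Counter) -> float:
    """Shannon entropy of the context distribution divided by log(k).

    k is the number of distinct context words, so the result lies in [0, 1]:
    near 0 means a few words dominate (skewed), 1 means perfectly uniform.
    """
    total = sum(counts.values())
    k = len(counts)
    if k <= 1:
        return 0.0  # zero or one distinct context word: maximally skewed
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(k)


if __name__ == "__main__":
    # Hypothetical toy corpus in which "CUI_A" stands for a linked concept.
    corpus = [
        ["patients", "with", "CUI_A", "received", "insulin"],
        ["CUI_A", "insulin", "therapy"],
    ]
    ctx = context_counts(corpus, "CUI_A", window=2)
    print(ctx)
    print(f"normalized entropy: {normalized_entropy(ctx):.4f}")
```

Under the abstract's finding, a lower score (context dominated by a few words) would go with higher expected stability. The score needs only corpus counts, so it can be computed before any embedding is trained.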
