Clinical Concept Embeddings Learned from Massive Sources of Medical Data

04/04/2018
by Andrew L. Beam, et al.

Word embeddings have emerged as a popular approach to unsupervised learning of word relationships in machine learning and natural language processing. In this article, we benchmark two of the most popular algorithms, GloVe and word2vec, to assess their suitability for capturing medical relationships in large sources of biomedical data. Drawing on recent theoretical insights, we provide a unified view of these algorithms and demonstrate how different sources of data can be combined to construct the largest-ever set of embeddings for 108,477 medical concepts, using an insurance claims database of 60 million members, 20 million clinical notes, and 1.7 million full-text biomedical journal articles. We evaluate our approach, called cui2vec, on a set of clinically relevant benchmarks and in many instances demonstrate state-of-the-art performance relative to previous results. Finally, we provide a downloadable set of pre-trained embeddings for other researchers to use, as well as an online tool for interactive exploration of the cui2vec embeddings.
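Since the abstract highlights that the pre-trained cui2vec embeddings are downloadable, a minimal sketch of how such embeddings might be loaded and queried is shown below. The file name "cui2vec_pretrained.csv", its layout (one row per UMLS CUI followed by the embedding dimensions), and the helper functions are assumptions for illustration, not the authors' documented interface.

```python
# Sketch: load cui2vec-style concept embeddings and find nearest neighbors.
# Assumes a CSV whose first column is the UMLS CUI and whose remaining
# columns are the embedding dimensions (hypothetical layout).
import numpy as np
import pandas as pd


def load_embeddings(path):
    """Load embeddings into a dict mapping CUI -> unit-normalized vector."""
    df = pd.read_csv(path, index_col=0)
    vectors = df.to_numpy(dtype=np.float32)
    vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
    return dict(zip(df.index, vectors))


def most_similar(embeddings, cui, k=5):
    """Return the k concepts with the highest cosine similarity to the query CUI."""
    query = embeddings[cui]
    scores = {c: float(v @ query) for c, v in embeddings.items() if c != cui}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


if __name__ == "__main__":
    emb = load_embeddings("cui2vec_pretrained.csv")  # hypothetical file name
    # C0011849 is the UMLS CUI for diabetes mellitus; if the embeddings capture
    # medical relationships, its neighbors should be clinically related concepts.
    for concept, score in most_similar(emb, "C0011849"):
        print(concept, round(score, 3))
```

Because the vectors are unit-normalized at load time, the dot product in `most_similar` is equivalent to cosine similarity, which is the usual choice for comparing word- or concept-embedding vectors.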

