Text2Node: a Cross-Domain System for Mapping Arbitrary Phrases to a Taxonomy

04/11/2019
by   Rohollah Soltani, et al.

Electronic health record (EHR) systems are used extensively throughout the healthcare domain. However, data interchangeability between EHR systems is limited because different systems use different coding standards. Existing methods for mapping between coding standards, which rely on manual mapping by human experts, dictionary mapping, symbolic NLP, or classification, do not scale to large EHR datasets. In this work, we present Text2Node, a cross-domain mapping system capable of mapping medical phrases to concepts in a large taxonomy (such as SNOMED CT). The system is designed to generalize from a limited set of training samples and to map phrases to elements of the taxonomy that are not covered by the training data. As a result, our system is scalable, robust to wording variants between coding systems, and able to output highly relevant concepts when no exact concept exists in the target taxonomy. Text2Node operates in three main stages: first, the lexicon is mapped to word embeddings; second, the taxonomy is vectorized using node embeddings; and finally, a mapping function is trained to connect the two embedding spaces. We compared multiple algorithms and architectures for each stage of training, including GloVe and FastText word embeddings, CNN and Bi-LSTM mapping functions, and node2vec for node embeddings. We confirmed the robustness and generalization properties of Text2Node by mapping ICD-9-CM Diagnosis phrases to SNOMED CT and by achieving comparable accuracy in zero-shot training. This system is a novel methodological contribution to the task of normalizing and linking phrases to a taxonomy, advancing data interchangeability in healthcare. When applied, the system can use electronic health records to generate an embedding that incorporates taxonomical medical knowledge to improve clinical predictive models.
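The three stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the word vectors and node vectors are hand-made toy values standing in for GloVe/FastText and node2vec output, the node names are illustrative labels rather than real SNOMED CT identifiers, and a plain linear map over averaged word vectors is swapped in for the CNN and Bi-LSTM mapping functions. All function and variable names are assumptions made for this example.

```python
import math

# Stage 1 (toy stand-in for GloVe/FastText): hand-made 3-d word vectors.
WORD_VECS = {
    "heart": [1.0, 0.0, 0.0],
    "cardiac": [0.9, 0.1, 0.1],
    "attack": [0.0, 1.0, 0.0],
    "infarction": [0.1, 0.8, 0.0],
    "lung": [0.0, 0.0, 1.0],
    "infection": [0.0, 0.3, 0.7],
}

# Stage 2 (toy stand-in for node2vec): hand-made 2-d node vectors.
NODE_VECS = {
    "myocardial_infarction": [1.0, 0.0],
    "pneumonia": [0.0, 1.0],
    "infective_endocarditis": [0.8, 0.3],  # never used as a training target
}

# Labeled phrase-to-node pairs; only two of the three nodes are covered.
TRAIN_PAIRS = [
    ("heart attack", "myocardial_infarction"),
    ("lung infection", "pneumonia"),
]


def embed_phrase(phrase):
    """Average the word vectors of the known words in a phrase."""
    vecs = [WORD_VECS[w] for w in phrase.lower().split() if w in WORD_VECS]
    if not vecs:
        raise ValueError(f"no known words in phrase: {phrase!r}")
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def apply_map(W, x):
    """Multiply the matrix W (list of rows) by the vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def train_mapping(pairs, epochs=500, lr=0.5):
    """Stage 3: fit a linear map from word space to node space.

    Uses per-sample gradient descent on squared error, starting from zeros.
    """
    in_dim = len(next(iter(WORD_VECS.values())))
    out_dim = len(next(iter(NODE_VECS.values())))
    W = [[0.0] * in_dim for _ in range(out_dim)]
    for _ in range(epochs):
        for phrase, node in pairs:
            x = embed_phrase(phrase)
            y = NODE_VECS[node]
            pred = apply_map(W, x)
            for i in range(out_dim):
                err = y[i] - pred[i]
                for j in range(in_dim):
                    W[i][j] += lr * err * x[j]
    return W


def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0


def map_phrase(phrase, W):
    """Project a phrase into node space and return the nearest node name."""
    pred = apply_map(W, embed_phrase(phrase))
    return max(NODE_VECS, key=lambda name: cosine(pred, NODE_VECS[name]))


if __name__ == "__main__":
    W = train_mapping(TRAIN_PAIRS)
    for query in ("cardiac infarction", "heart infection"):
        print(query, "->", map_phrase(query, W))
```

Because lookup is a nearest-neighbour search over all node vectors rather than a classifier over training labels, a phrase can land on a node that never appeared as a training target. In this toy setup that is the case for `infective_endocarditis`, which loosely mirrors the zero-shot behaviour the abstract describes.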


