
Towards User Friendly Medication Mapping Using Entity-Boosted Two-Tower Neural Network

by Shaoqing Yuan, et al.

Recent advancements in medical entity linking have been applied in the area of scientific literature and social media data. However, with the adoption of telemedicine and conversational agents such as Alexa in healthcare settings, medication name inference has become an important task. Medication name inference is the task of mapping user-friendly medication names from free-form text to a concept in a normalized medication list. This is challenging because health care professionals and the lay public use medical terminology differently. We begin with mapping descriptive medication phrases (DMP) to standard medication names (SMN). Given the prescriptions of each patient, we want to provide them with the flexibility of referring to the medication in their preferred ways. We approach this as a ranking problem which maps SMN to DMP by ordering the list of medications in the patient's prescription list obtained from pharmacies. Furthermore, we leverage the output of intermediate layers to perform medication clustering. We present the Medication Inference Model (MIM), which achieves state-of-the-art results. By incorporating attention based on medical entities, we obtain further improvement for ranking models.


1 Introduction

Figure 1: Different ways of how users interact with conversational agents for medical queries

Medication names are extremely hard to pronounce for patients without a proper medical background. Thus, when interacting with Alexa about medications, patients without this background may refer to a medication in many different ways (e.g., Bumetanide can be referred to as Bumetanide (generic name), Bumex (brand name), or high blood pressure pill (disease name)). On the other hand, patients with medical knowledge may use abbreviations or specialized ways to refer to medication names. For example, patients may use “immune meds” to refer to “mycophenolate mofetil hydrochloride” in their prescription list.

In this paper, we describe a new problem: finding the generic medication name (SMN: standardized medication name) that matches a patient’s description (DMP: descriptive medication phrase) from the list of medications the patient is consuming. According to our internal user research, in the United States, patients with chronic diseases usually take around four to five medications daily. This problem differs from medical concept normalization (Zhu et al., 2019), which maps a health-related entity mention in free-form text to a concept in a controlled vocabulary (Miftahutdinov and Tutubalina, 2019). That vocabulary is a generic concept list rather than a patient-specific prescription list, and is generally much longer.

We structure this as a ranking problem. We rank all medications a patient is consuming by their relationship with the patient’s description, and the highest-ranked medication is the inference result. We present a hard-attention-based, entity-boosted CNN architecture that achieves a 4% improvement over earlier ranking methods.

Furthermore, the mapping between SMN and DMP captures the patient’s understanding of the medications, especially from the perspective of how the medications are used. Using latent output from our model, we build a medication clustering system which groups together medications with similar effects and disease treatments. The output is designed to help physicians consider other medications as lower-cost substitutes, and to help patients distinguish medications that seem similar to them but should, in reality, be used for different conditions. Moreover, with clustering, patients gain an intuitive understanding of the relationships among the medications they are consuming. Our contributions are as follows:

  • We present a medical-entity-boosted architecture, the Medication Inference Model (MIM), which achieves an improvement over strong BERT baselines.

  • We benchmark against state-of-the-art ranking architectures, demonstrating robustness of our work.

  • We present medication clustering results which group together medications that have similar effects and treat similar diseases.

2 Task Definition

Each example is represented as a tuple $(D, S, Y)$, where $D = \{d_1, \dots, d_n\}$ is a DMP with a length $n$, $S = \{s_1, \dots, s_m\}$ is a list of SMNs with a length $m$, and $Y = \{y_1, \dots, y_m\}$ is the label representing the relationship between $D$ and $S$. $S$ and $Y$ have the same length. $y_i = 1$ if $s_i$ is the generic medication name that $D$ is referring to, and $y_i = 0$ otherwise.

$D = \{(\text{high}), (\text{blood}), (\text{pressure})\}$
$s_1 = \{(\text{morphine}), (\text{suppository})\}$
$s_2 = \{(\text{hydrochlorothiazide})\}$
$s_m = \{\dots\}$

It is possible that among $S$, more than one medication may be referred to by $D$. Thus, $\sum_{i} y_i = k$, where $k$ is the number of medications in $S$ that could be referred to by $D$. Ideally, we should make it possible that for the estimated $\hat{Y}$, $\sum_{i} \hat{y}_i = k$. In this paper, however, we assume $k = 1$.

The clustering task is defined as grouping medications across the prescriptions of different patients, i.e., we assign each medication to a group according to the DMPs associated with it.

Figure 2: Medication Inference Model structure

3 Method

Instead of comparing all medications in each sample at once, we begin with two. Each sample consists of one DMP and two candidate SMNs, and the model must determine which of the two SMNs the patient is referring to with the DMP. To simplify, we assume only one of the two SMNs can be the referent. In practice, when a patient’s prescription contains more than two medications, we run the model on all pairwise combinations of the medications and rank them accordingly.
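The pairwise procedure can be illustrated with a minimal sketch. It assumes that the final ranking is obtained by counting how many pairwise comparisons each SMN wins; the text only says the medications are ranked "accordingly", so the win-count rule, the function names, and the toy comparison function standing in for the trained model are all hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, List


def rank_medications(
    dmp: str,
    smns: List[str],
    prefer_first: Callable[[str, str, str], bool],
) -> List[str]:
    """Rank candidate SMNs by the number of pairwise comparisons each one wins."""
    wins: Dict[str, int] = {smn: 0 for smn in smns}
    for a, b in combinations(smns, 2):
        winner = a if prefer_first(dmp, a, b) else b
        wins[winner] += 1
    # sorted() is stable, so tied SMNs keep their prescription order.
    return sorted(smns, key=lambda smn: wins[smn], reverse=True)


def toy_prefer_first(dmp: str, a: str, b: str) -> bool:
    """Hypothetical stand-in for the trained pairwise model (a fixed lookup)."""
    known = {"high blood pressure": "hydrochlorothiazide"}
    return known.get(dmp) != b


prescription = ["morphine suppository", "hydrochlorothiazide", "cephalexin"]
ranking = rank_medications("high blood pressure", prescription, toy_prefer_first)
print(ranking[0])  # the top-ranked SMN is the inference result
```

With n medications this requires n(n-1)/2 model calls, which stays small for prescriptions of four to five medications.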

3.1 Entity Boosted Two-Tower Neural Network

Motivated by the facial recognition problem, where the models evaluate the similarity between images of faces (Chopra et al., 2005), we apply a two-tower neural network to our problem. We regard the DMP as the query of each medicine and the SMNs as the medication candidates. The purpose is to match the correct SMNs to the DMP provided.

Descriptions can often be verbose and can contain a large amount of noise. To improve robustness and reduce noise, we have incorporated medical-entity-based hard attention (Luong et al., 2015) using Amazon Web Services Comprehend Medical (CM) (Bhatia et al., 2019a), which is a natural language processing service that performs entity and relation extraction.

For each instance of data, we use the generic name as the SMN in our model, and generate DMPs from free-form text data that describe the usage of each medication in patient friendly terms. To reduce the noise, we feed the description to CM to extract entities. CM is able to extract relevant medical information from unstructured text and classify extracted entities into five categories and 28 types. In this work we use the entities marked with types “dx name” (diagnostic indicator), “treatment name”, “system organ site”, “swap”, “generic name”, “procedure name”, “brand name”, “test name” as DMPs.
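As a rough illustration of the entity-based hard attention step, the sketch below keeps only the extracted entities whose type is in the list above and discards everything else. It does not call Comprehend Medical; the input format (a list of dictionaries with "text" and "type" keys), the function name, and the "frequency" type in the toy input are assumptions made for this example.

```python
from typing import Dict, FrozenSet, List

# Entity types kept as DMPs, spelled as they are listed in the text above.
KEPT_TYPES: FrozenSet[str] = frozenset({
    "dx name", "treatment name", "system organ site", "swap",
    "generic name", "procedure name", "brand name", "test name",
})


def hard_attention(
    entities: List[Dict[str, str]],
    kept_types: FrozenSet[str] = KEPT_TYPES,
) -> List[str]:
    """Return entity texts of the selected types, lowercased and de-duplicated in order."""
    seen = set()
    dmps = []
    for entity in entities:
        text = entity["text"].lower()
        if entity["type"] in kept_types and text not in seen:
            seen.add(text)
            dmps.append(text)
    return dmps


# Hypothetical extraction output for one medication description.
extracted = [
    {"text": "high blood pressure", "type": "dx name"},
    {"text": "daily", "type": "frequency"},
    {"text": "Kidney", "type": "system organ site"},
    {"text": "high blood pressure", "type": "dx name"},
]
print(hard_attention(extracted))
```

The attention is "hard" in the sense that each token is either kept or dropped entirely, rather than being reweighted.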

3.1.1 Medication Inference Model

Figure 2 outlines the Medication Inference Model (MIM) network. In this model, there are two different sets of unshared embedding weights, where one is used to embed SMNs and the other to embed the DMP. We use convolutional (CNN) layers (Kim, 2014) followed by pooling on top of the embedding layer to get vector representations of the SMNs and the DMP separately. We use cosine distance to measure the separation between each of the two SMNs and the DMP.
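A toy version of the two-tower scoring path is sketched below in plain Python, assuming a window-2 convolution with ReLU followed by max pooling in each tower. The weights are random and untrained, the dimensions are shrunk for readability, and the ReLU activation and all function names are assumptions of this sketch rather than details stated in the text. It reports cosine similarity; cosine distance is one minus this value.

```python
import math
import random
from typing import Dict, List

DIM = 8      # toy embedding size
FILTERS = 4  # toy number of convolution filters
WINDOW = 2   # convolution window over consecutive tokens


def make_embeddings(vocab: List[str], rng: random.Random) -> Dict[str, List[float]]:
    return {w: [rng.uniform(-1.0, 1.0) for _ in range(DIM)] for w in vocab}


def make_filters(rng: random.Random) -> List[List[float]]:
    # Each filter spans WINDOW consecutive token vectors.
    return [[rng.uniform(-1.0, 1.0) for _ in range(WINDOW * DIM)] for _ in range(FILTERS)]


def encode(tokens: List[str], emb: Dict[str, List[float]], filters: List[List[float]]) -> List[float]:
    """Embed tokens, convolve with ReLU, then max-pool over positions."""
    vecs = [emb[t] for t in tokens]
    while len(vecs) < WINDOW:  # pad one-token phrases with a zero vector
        vecs.append([0.0] * DIM)
    pooled = []
    for f in filters:
        acts = []
        for i in range(len(vecs) - WINDOW + 1):
            window = [x for v in vecs[i:i + WINDOW] for x in v]
            acts.append(max(0.0, sum(w * x for w, x in zip(f, window))))
        pooled.append(max(acts))
    return pooled


def cosine_similarity(u: List[float], v: List[float]) -> float:
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


rng = random.Random(0)
# Unshared weights: one embedding table and filter bank per tower.
dmp_emb = make_embeddings(["high", "blood", "pressure"], rng)
smn_emb = make_embeddings(["morphine", "suppository", "hydrochlorothiazide"], rng)
dmp_filters, smn_filters = make_filters(rng), make_filters(rng)

dmp_vec = encode(["high", "blood", "pressure"], dmp_emb, dmp_filters)
scores = {
    name: cosine_similarity(dmp_vec, encode(name.split(), smn_emb, smn_filters))
    for name in ["morphine suppository", "hydrochlorothiazide"]
}
print(scores)  # values are meaningless until the towers are trained
```

Because the towers share no weights, SMN vectors can be computed once and cached, and only the DMP tower has to run at query time.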

3.1.2 BERT-based Model

It is natural to apply models that have proven successful on question-answering problems to our task. Here, the DMP is regarded as the patient’s query, with the SMNs as the answer candidates.

We concatenate the DMP with each SMN separately and feed the pairs into a BERT-based (Devlin et al., 2018) multi-choice model (Guo et al., 2019). The vector representations of the [CLS] tokens are used to represent the combinations of the DMP with each SMN, and are fed into a fully connected layer. Finally, we use a hinge loss for ranking to compare the resulting scalar values with the ground truth.
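For reference, a common form of ranking hinge loss is sketched below: the score of the correct SMN should exceed the score of every other candidate by at least a margin. The margin value of 1.0, the averaging over negatives, and the function name are assumptions of this sketch; the text does not specify the exact formulation used.

```python
from typing import List


def ranking_hinge_loss(scores: List[float], positive_index: int, margin: float = 1.0) -> float:
    """Mean of max(0, margin - s_pos + s_neg) over all negative candidates."""
    pos = scores[positive_index]
    losses = [
        max(0.0, margin - pos + s)
        for i, s in enumerate(scores)
        if i != positive_index
    ]
    return sum(losses) / len(losses)


# The positive candidate already beats the negative by more than the margin: no loss.
print(ranking_hinge_loss([2.0, 0.5], positive_index=0))
# The positive candidate scores lower than the negative: the loss is positive.
print(ranking_hinge_loss([0.2, 0.5], positive_index=0))
```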

4 Experiments

                Number of candidates
                Baseline                    Entity-based Attention
Model           2      3      4      5      2      3      4      5
NormalBert      69.7   54.7   46.6   42.7   79.7   72.7   62.4   62.5
BioBert         73.4   59.8   51.9   46.4   81.5   74.8   66.2   66.4
ClinicalBert    73.5   57.8   51.9   47.4   79.5   74.4   64.8   62.7
ARC-I           65.2   46.2   37.5   33.4   76.8   66.8   58.6   59.4
ARC-II          64.0   48.1   39.1   36.5   75.7   65.7   62.1   57.0
ConvKNRM        65.4   52.0   42.6   38.8   76.7   66.6   60.7   57.3
MatchLSTM       64.9   50.8   40.9   35.4   82.4   70.9   62.6   58.3
MatchPyramid    59.5   45.0   36.5   33.5   74.0   58.0   58.0   54.1
MIM             73.9   57.9   48.9   49.7   87.7   80.8   78.9   76.9

Table 1: Synthetic test set: For each model we report top-1 accuracy (%) without entity-based attention (Baseline) and with entity-based attention.

                Upper limit for number of candidates
Model           2       3       4       5       10
BioBert         83.10   73.50   67.00   61.8    52.70
MIM             83.50   74.50   69.50   63.60   53.80

Table 2: Real test set: For each model we report top-1 accuracy (%).

4.1 Dataset and Evaluation Metrics

Synthetic Data Set The training and test datasets are generated from 2,683 medication descriptions from the FDB PEM (patient education module) dataset. FDB stands for First DataBank, a known provider of drug and medical device databases. Each PEM file contains a patient-facing medication description including the medication generic name, uses, warnings, side effects, etc. The sections of a PEM file are CommonNames, Warning, Uses, HowToUse, SideEffects, Precautions, DrugInteractions, Overdose, Notes, MissedDose, Storage, and MedicAlert.

The SMNs are collected from the generic name section and DMPs are generated from the “USES” section of the PEM files using CM as described in Section 3.1. To evaluate the effect of the entity extraction component, we generate another DMP set by randomly drawing n-grams from the “USES” section of the PEM files as a replacement for CM.

Next, we use the SMNs and DMPs collected to generate our training and test sets. We generate each instance starting with a DMP according to the following steps.

  1. For each DMP, we generate a positive SMN set which consists of the SMNs extracted from the same PEM file that the DMP is extracted from. One DMP may have multiple positive SMNs if the DMP is a very general phrase. For example, the DMP “high blood pressure” may have multiple SMNs since many medications can be used to treat hypertension.

  2. For each DMP, we also generate a negative SMN set. The negative SMNs are all the medications covered by the PEM files excluding the positive SMNs identified above, subject to the constraint that the entities extracted from the “USES” section of the negative SMNs’ PEM files have no overlap with those of the DMP.

  3. Each instance in the training and validation data set consists of one positive SMN and $n-1$ negative SMNs randomly selected from the SMN sets described above. The label of each instance indicates which SMN is positive.

For the training and validation splits, $n$ is set to 2 in step 3 above, which means there are two SMNs in each instance. The training and validation data set contains 680K instances, which are divided into a training portion and a portion for validation and testing.

For testing purposes, we generated four synthetic test sets with $n$ in step 3 set to 2, 3, 4, and 5 separately, to simulate real situations where patients with chronic disease in the U.S. usually have four to five medications in their prescription list at a time.
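The three generation steps can be sketched as follows. This is a simplified reading, not the authors' pipeline: it assumes each medication is represented by the set of entities extracted from its "USES" section, treats a medication as positive when that set contains the DMP, and blocks any negative whose entities overlap with those of the positive medications. The data, names, and the `build_instance` helper are hypothetical.

```python
import random
from typing import Dict, List, Optional, Set


def build_instance(
    dmp: str,
    uses_entities: Dict[str, Set[str]],
    n: int,
    rng: random.Random,
) -> Optional[Dict[str, object]]:
    """Build one instance with 1 positive and n-1 negative SMNs, or None if impossible."""
    # Step 1: positive SMNs are medications whose USES entities include the DMP.
    positives = sorted(s for s, ents in uses_entities.items() if dmp in ents)
    # Step 2: negatives must share no USES entity with the positive medications.
    blocked: Set[str] = set().union(*(uses_entities[s] for s in positives))
    negatives = sorted(
        s for s, ents in uses_entities.items()
        if s not in positives and not (ents & blocked)
    )
    if not positives or len(negatives) < n - 1:
        return None
    # Step 3: sample the candidates, shuffle them, and record the positive position.
    positive = rng.choice(positives)
    candidates: List[str] = [positive] + rng.sample(negatives, n - 1)
    rng.shuffle(candidates)
    return {"dmp": dmp, "smns": candidates, "label": candidates.index(positive)}


# Toy entity sets, for illustration only.
toy_uses = {
    "amlodipine": {"high blood pressure", "chest pain"},
    "lisinopril": {"high blood pressure", "heart failure"},
    "nitroglycerin": {"chest pain"},
    "cephalexin": {"bacterial infection"},
    "guaifenesin": {"cough"},
}
instance = build_instance("high blood pressure", toy_uses, n=2, rng=random.Random(0))
print(instance)
```

In this toy data, nitroglycerin is never drawn as a negative because it shares "chest pain" with a positive medication, which mirrors the no-overlap constraint in step 2.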

Real Data Set The real data set is generated from 251 prescriptions collected from the i2b2 data set, which contains de-identified patient discharge summaries. Internal human annotators generate DMPs for each medication in the prescriptions. We observe that in a real prescription, multiple medications may serve the same purpose, and a general DMP could be used to refer to multiple medications in a prescription. In our current experiment, we assume the ground truth of each DMP is only the medication used to generate that DMP in a prescription. In this way, we obtain a lower bound on the performance of the models. For testing purposes, we limit the number of medications in each prescription to 10, 5, 4, 3, and 2 respectively. For the test set with 10 as the maximum number of medications, we go through all 251 prescriptions and select only the prescriptions that have fewer than 10 medications. We randomly truncate the prescriptions in the 10-medication test set to 5, 4, 3, and 2 medications to form the other test sets. Furthermore, in order to evaluate situations where one DMP may refer to multiple SMNs in a prescription, the annotators are currently labeling all the SMNs that a DMP could refer to in a prescription. In future experiments, a test sample will be marked as a success if the model outputs any one of the ground-truth SMNs.

We report accuracy as the main evaluation metric, i.e., the correctness of selecting the positive SMN from the candidate SMNs. When evaluating on the test data, the model goes through all pairwise combinations of the SMNs and ranks all the SMNs accordingly.
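The metric amounts to top-1 accuracy over ranked candidate lists, which can be computed as in the short sketch below. The function name and the toy rankings are made up for illustration.

```python
from typing import List


def top1_accuracy(rankings: List[List[str]], gold: List[str]) -> float:
    """Fraction of test instances whose top-ranked SMN equals the positive SMN."""
    hits = sum(1 for ranked, target in zip(rankings, gold) if ranked and ranked[0] == target)
    return hits / len(gold)


# Three toy test instances: the first and third are ranked correctly.
toy_rankings = [["a", "b"], ["b", "a"], ["c", "a"]]
toy_gold = ["a", "a", "c"]
print(top1_accuracy(toy_rankings, toy_gold))
```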

4.2 Experimental Details

For the CNN-based model, we test multiple word embedding models including 200-dimensional BioWordVec (Zhang et al., 2019; Chen et al., 2018) and 300-dimensional FastText word embeddings (Bojanowski et al., 2017) trained with 3,466 articles from the Mayo Clinic. The two-dimensional CNN layer consists of 200 filters with window size 2, stride 1, and no regularization. Batch size is set to 150 and we observed model convergence after six epochs. For the pre-trained language model, we leverage Clinical BERT (Alsentzer et al., 2019), BioBERT (Lee et al., 2019), and original BERT models (Devlin et al., 2018). We used the default settings for all BERT models as provided by Devlin et al. (2018). Batch size is set to 32, learning rate is set to and dropout rate is set to . We observed that the model converged after 10 epochs. We trained and evaluated all the models using a Tesla V100 GPU.

4.2.1 Baselines

When evaluating the performance of our model, we compare the medication name inference performance with baseline models listed below.

  • ARC-I (Hu et al., 2014): ARC-I finds the representation of each sentence with CNN layers, and then compares the representations of the two sentences with a multi-layer perceptron (MLP).

  • ARC-II (Hu et al., 2014): ARC-II improves on ARC-I by calculating the interaction features between sentences with CNN.

  • ConvKNRM (Dai et al., 2018): Conv-KNRM uses CNN to represent n-grams of various lengths and soft matches them in a unified embedding space. The n-gram soft matches are then utilized by the kernel pooling and a fully connected layer to generate the final ranking score.

  • MatchLSTM (Wang and Jiang, 2016): The matchLSTM sequentially aggregates the matching of the attention-weighted question to each token of the answer and uses the aggregated matching result to make a final prediction.

  • MatchPyramid (Pang et al., 2016): MatchPyramid generates a matching matrix which represents the similarity between mention and candidate and then applies CNN layers on top of the matrix, followed by an MLP layer to calculate the similarity score.

               (a) Diagnose               (b) Symptom                    (c) Drug type
DMP            high blood pressure,       cough, coughing                antibiotic
               strokes, heart attacks
example SMN    amlodipine                 promethazine                   chloramphenicol
nearby SMNs    perindopril, ramipril,     guaifenesin, expectorant,      polymyxin b, gentamicin,
               trandolapril, quinapril,   acetaminophen, hydrocodone,    cefotetan, spiramycin,
               enalapril, isradipine,                                    gatifloxacin, piperacillin,
               lisinopril, sacubitril,                                   cephalexin, cefoxitin,
               aliskiren, eplerenone

Table 3: Examples of the DMP/SMN match and clustering results

5 Results and Discussion

Table 1 provides the accuracy results for each model we experimented with on the synthetic test data set. “Number of candidates” denotes the number of medications in each test instance, as described in Section 4.1. We report test results for each model with and without AWS Comprehend Medical in the “Entity-based Attention” and “Baseline” columns, respectively.

Table 1 demonstrates the robustness of MIM. We observe that MIM and BERT-based models outperform current state-of-the-art models such as ARC-I and MatchPyramid across different numbers of candidates. Table 2 further compares the performance of the two best-performing models on the real test set.

MIM outperforms BERT based models with a - improvement in accuracy. We believe the major reason for this is that MIM, by encoding SMN and DMP separately, is able to encode the representation in a more robust way in comparison to BERT based models which concatenate the representations together using a special separator token. We also observe performance variation of BERT models based on their pre-training. We found that domain specific pre-training helps, giving - improvement when compared to the baseline BERT.

We observe the entity-boosted description gives robust results across all the model settings achieving significant improvement in accuracy over non-entity based models. This alleviates the problem of noise in the lengthy descriptions.

Furthermore we see that our MIM model, with a relatively simpler CNN encoder as well as separate encoders for SMN and DMP, has the distinct advantage of generating inference results with low latency. This is ideal for real-time industrial settings. According to our experiments, the average latency for the MIM model for five medications is ms, while, compared against BERT at ms.

5.1 Medication Clustering Result

We apply k-nearest neighbor (KNN) clustering based on the CNN max pooling output from the two-tower neural network. The generic names of 2,683 medications are represented by vectors of 200 dimensions. The number of clusters of KNN is determined by a Silhouette analysis (Rousseeuw, 1987), with the result given in Figure 3. The Silhouette analysis shows that clustering performs better when the number of classes is 31.
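To make the selection criterion concrete, the sketch below computes the average Silhouette value (Rousseeuw, 1987) for a given cluster assignment using Euclidean distance. Candidate numbers of clusters would be compared by this value and the highest one kept. The distance metric, the function name, and the four toy points are assumptions of this sketch; the clustering step itself is not shown.

```python
import math
from typing import Dict, List, Sequence


def mean_silhouette(points: List[Sequence[float]], labels: List[int]) -> float:
    """Average of (b - a) / max(a, b) over all points; needs at least two clusters."""
    clusters: Dict[int, List[int]] = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:  # a singleton cluster contributes a silhouette of 0
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        if max(a, b) > 0.0:
            total += (b - a) / max(a, b)
    return total / len(points)


toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = mean_silhouette(toy_points, [0, 0, 1, 1])  # groups nearby points together
bad = mean_silhouette(toy_points, [0, 1, 0, 1])   # splits nearby points apart
print(good, bad)
```

Values close to 1 indicate compact, well-separated clusters, while negative values indicate points that sit closer to another cluster than to their own.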

Figure 3: Average Silhouette value for different number of clusters

Figure 4 shows the t-SNE visualization of the result, where 2,683 medications are grouped into 31 clusters. The figure illustrates that medications that have the same effects, treat the same disease, or belong to similar drug types are mapped close to each other.

Figure 4: t-SNE visualization of the clusters

Table 3 shows three examples of the clustering result. The nearby SMNs are sampled from the cluster to which the example SMN belongs and are ranked according to their distance to the example SMN. The nature of the problem enables and requires the inference model to group SMNs along multiple dimensions. We list examples from three dimensions, since it is very natural for users to refer to their medications by diagnosis, disease symptoms, and drug type. For example, coughing is a common symptom of multiple conditions, including the common cold, pulmonary diseases such as pneumonia, and even seasonal allergies. In column (b), the model is able to cluster medications that relieve cough symptoms with different underlying causes; for example, promethazine and antihistamine are used to treat allergies, whereas zanamivir is used to treat and prevent flu.

6 Related Work

Earlier work on medical concept normalization (Zhu et al., 2019) relied on lexicon-based string matching and dictionary lookup to map a limited number of variations of text to a pre-defined medical vocabulary (Aronson, 2001; Brennan and Aronson, 2003). Leaman et al. (2013) introduced DNorm as the first pairwise learning-to-rank model that compares associations between mentions and entities of various diseases. Limsopatham and Collier (2016) and Lee et al. (2017) then further leveraged deep learning models, including convolutional neural networks (Limsopatham and Collier, 2016) and recurrent neural network models (Belousov et al., 2017), trained on large corpora of medical articles. More recently, researchers have enhanced deep-learning-based models with different model structures to incorporate context information, better process out-of-vocabulary (OOV) words, and take advantage of interaction features from different semantic levels (Luo et al., 2018; Miftahutdinov and Tutubalina, 2019; Niu et al., 2019).

With the success of deep learning, many neural-network-based models have been proposed for semantic matching and document ranking. Models such as ARC-I (Hu et al., 2014) first compute the representations of the two sentences, and then compute their relevance. Semantic/text matching techniques are well suited to medical concept normalization problems when the number of candidates is limited. As listed in Guo et al. (2019), researchers have recently focused on developing deep learning models for document retrieval, question answering, conversational response ranking, and paraphrase identification, and have introduced state-of-the-art models such as ARC-I (Hu et al., 2014), ARC-II (Hu et al., 2014), ConvKNRM (Dai et al., 2018), MatchLSTM (Wang and Jiang, 2016), MatchPyramid (Pang et al., 2016), and BERT (Devlin et al., 2018).

In recent years, natural language processing (NLP) techniques have demonstrated increasing effectiveness in clinical text mining (Bhatia et al., 2019a, 2019b). Electronic health record (EHR) narratives, e.g., discharge summaries and progress notes, contain a wealth of medically relevant information such as diagnosis information and adverse drug events. Automatic extraction of such information and representation of clinical knowledge in standardized formats (Singh and Bhatia, 2019) could be employed for a variety of purposes such as clinical event surveillance, decision support (Jin et al., 2018), pharmacovigilance, and drug efficacy studies.

This paper describes a problem that combines medical concept normalization and semantic matching, and addresses it using medical-entity-based hard attention. The nature of the problem requires the solution to extract information from short phrases with limited context.

7 Conclusion and Future Work

In this paper, we introduce a new problem common in the development of medication voice interaction products. We evaluate the accuracy of different solutions and show that our entity-boosted MIM outperforms baseline models. What makes this problem distinctive is that the context information is very limited compared with other NLP tasks, and the short length of the phrases prevents us from leveraging other advanced techniques that rely on relationships between the words in a phrase. The evaluation results also show that the problem favors a simple model structure. Since the structure of the phrases is very simple, the quality of the word embeddings matters more in this problem, and keeping the embedding weights unchanged is important when the training data is not sufficient to strengthen the relationships between words, either due to the nature of the data or to small sample sizes.

We also observe a discrepancy between the synthetic datasets and the data collected from real patients. For example, the combinations of medicines in synthetic prescriptions may not be valid from a practitioner’s or patient’s perspective. We plan to further validate our model on real patient data to increase practicality. Finally, in addition to training and evaluating on two-medication samples, we plan to experiment with more medications per training sample to more closely mimic real-world scenarios.


References

  • E. Alsentzer, J. Murphy, W. Boag, W. Weng, D. Jin, T. Naumann, and M. McDermott (2019) Publicly available clinical BERT embeddings. In Proceedings of the 2nd Clinical Natural Language Processing Workshop, Minneapolis, Minnesota, USA, pp. 72–78.
  • A. R. Aronson (2001) Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. In Proceedings of the AMIA Symposium, pp. 17.
  • M. Belousov, W. Dixon, and G. Nenadic (2017) Using an ensemble of generalised linear and deep learning models in the SMM4H 2017 medical concept normalisation task. In Proceedings of the Second Workshop on Social Media Mining for Health Applications (SMM4H).
  • P. Bhatia, B. Celikkaya, M. Khalilia, and S. Senthivel (2019a) Comprehend Medical: a named entity recognition and relationship extraction web service. In 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA), pp. 1844–1851.
  • P. Bhatia, K. Arumae, and E. B. Celikkaya (2019b) Dynamic transfer learning for named entity recognition. In International Workshop on Health Intelligence, pp. 69–81.
  • P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov (2017) Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics 5, pp. 135–146.
  • P. F. Brennan and A. R. Aronson (2003) Towards linking patients and clinical information: detecting UMLS concepts in e-mail. Journal of Biomedical Informatics 36 (4-5), pp. 334–341.
  • Q. Chen, Y. Peng, and Z. Lu (2018) BioSentVec: creating sentence embeddings for biomedical texts. CoRR abs/1810.09302.
  • S. Chopra, R. Hadsell, and Y. LeCun (2005) Learning a similarity metric discriminatively, with application to face verification. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), Vol. 1, pp. 539–546.
  • Z. Dai, C. Xiong, J. Callan, and Z. Liu (2018) Convolutional neural networks for soft-matching n-grams in ad-hoc search. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 126–134.
  • J. Devlin, M. Chang, K. Lee, and K. Toutanova (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  • J. Guo, Y. Fan, X. Ji, and X. Cheng (2019) MatchZoo: a learning, practicing, and developing system for neural text matching. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR’19, New York, NY, USA, pp. 1297–1300.
  • B. Hu, Z. Lu, H. Li, and Q. Chen (2014) Convolutional neural network architectures for matching natural language sentences. In Advances in Neural Information Processing Systems, pp. 2042–2050.
  • M. Jin, M. T. Bahadori, A. Colak, P. Bhatia, B. Celikkaya, R. Bhakta, S. Senthivel, M. Khalilia, D. Navarro, B. Zhang, et al. (2018) Improving hospital mortality prediction with medical named entities and multimodal learning. arXiv preprint arXiv:1811.12276.
  • Y. Kim (2014) Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, pp. 1746–1751.
  • P. Langley (2000) Crafting papers on machine learning. In Proceedings of the 17th International Conference on Machine Learning (ICML 2000), P. Langley (Ed.), Stanford, CA, pp. 1207–1216.
  • R. Leaman, R. Islamaj Doğan, and Z. Lu (2013) DNorm: disease name normalization with pairwise learning to rank. Bioinformatics 29 (22), pp. 2909–2917.
  • J. Lee, W. Yoon, S. Kim, D. Kim, S. Kim, C. H. So, and J. Kang (2019) BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics.
  • K. Lee, S. A. Hasan, O. Farri, A. Choudhary, and A. Agrawal (2017) Medical concept normalization for online user-generated texts. In 2017 IEEE International Conference on Healthcare Informatics (ICHI), pp. 462–469.
  • N. Limsopatham and N. Collier (2016) Normalising medical concepts in social media texts by learning semantic representation.
  • Y. Luo, G. Song, P. Li, and Z. Qi (2018) Multi-task medical concept normalization using multi-view convolutional neural network. In Thirty-Second AAAI Conference on Artificial Intelligence.
  • M. Luong, H. Pham, and C. D. Manning (2015) Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025.
  • Z. Miftahutdinov and E. Tutubalina (2019) Deep neural models for medical concept normalization in user-generated texts. arXiv preprint arXiv:1907.07972.
  • J. Niu, Y. Yang, S. Zhang, Z. Sun, and W. Zhang (2019) Multi-task character-level attentional networks for medical concept normalization. Neural Processing Letters 49 (3), pp. 1239–1256.
  • L. Pang, Y. Lan, J. Guo, J. Xu, S. Wan, and X. Cheng (2016) Text matching as image recognition. In Thirtieth AAAI Conference on Artificial Intelligence.
  • P. J. Rousseeuw (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics 20, pp. 53–65.
  • G. Singh and P. Bhatia (2019) Relation extraction using explicit context conditioning. arXiv preprint arXiv:1902.09271.
  • S. Wang and J. Jiang (2016) Machine comprehension using Match-LSTM and answer pointer. arXiv preprint arXiv:1608.07905.
  • Y. Zhang, Q. Chen, Z. Yang, H. Lin, and Z. Lu (2019) BioWordVec, improving biomedical word embeddings with subword information and MeSH. Scientific Data 6 (1), pp. 52.
  • M. Zhu, B. Celikkaya, P. Bhatia, and C. K. Reddy (2019) LATTE: latent type modeling for biomedical entity linking. arXiv preprint arXiv:1911.09787.