TF-IDF vs Word Embeddings for Morbidity Identification in Clinical Notes: An Initial Study

05/20/2021
by Danilo Dessì, et al.

Today, we are seeing an ever-increasing number of clinical notes that contain clinical results, images, and textual descriptions of patients' health status. All these data can be analyzed and used to provide novel services that help people and domain experts with their common healthcare tasks. However, technologies such as Deep Learning and tools such as Word Embeddings have only recently started to be investigated, and many challenges remain open when it comes to healthcare domain applications. To address these challenges, we propose the use of Deep Learning and Word Embeddings for identifying sixteen morbidity types within textual descriptions of clinical records. For this purpose, we have used a Deep Learning model based on Bidirectional Long Short-Term Memory (LSTM) layers, which can exploit state-of-the-art vector representations of data such as Word Embeddings. We have employed pre-trained Word Embeddings, namely GloVe and Word2Vec, as well as our own Word Embeddings trained on the target domain. Furthermore, we have compared the performance of the Deep Learning approaches against traditional tf-idf features fed to a Support Vector Machine and a Multilayer Perceptron (our baselines). The results suggest that these baselines outperform the Deep Learning approaches regardless of which Word Embeddings are used. Our preliminary results indicate that there are specific features that make the dataset biased in favour of traditional machine learning approaches.
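To make the tf-idf representation used by the baselines more concrete, the following is a minimal sketch of the textbook weighting tf × log(N / df) in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the exact tf-idf variant, tokenization, or classifier settings, and the function name `tfidf` and the toy notes are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df).

    Assumption: whitespace tokenization and the unsmoothed textbook formula;
    the paper's actual preprocessing and weighting may differ.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Toy, made-up snippets; not drawn from the paper's dataset.
notes = [
    "patient has asthma and obesity",
    "patient has hypertension",
    "patient denies asthma",
]
vectors = tfidf(notes)
```

In this sketch a term that occurs in every note (here "patient") receives weight zero, while a term confined to a single note (such as "obesity") is weighted more heavily than one shared by several notes. Sparse vectors of this kind are what a classifier such as a Support Vector Machine would take as input.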


