Evaluating Sparse Interpretable Word Embeddings for Biomedical Domain

05/11/2020
by Mohammad Amin Samadi, et al.

Word embeddings are used in a wide range of natural language processing tasks, including those in the biomedical domain. While these vector representations successfully capture semantic and syntactic word relations, as well as hidden patterns and trends in the data, they do not offer interpretability. Interpretability is a key means of justification, which is integral to biomedical applications. We present a comprehensive study of the interpretability of word embeddings in the medical domain, focusing on the role of sparse methods. We provide qualitative and quantitative measurements and metrics for the interpretability of word vector representations. For the quantitative evaluation, we introduce an extensive categorized dataset that can be used to quantify interpretability based on category theory. We also present intrinsic and extrinsic evaluations of the studied methods; for the latter, we propose datasets that can be used for effective extrinsic evaluation of word vectors in the biomedical domain. Our experiments show that sparse word vectors are far more interpretable while preserving the performance of their original vectors in downstream tasks.

research
02/01/2018

A Comparison of Word Embeddings for the Biomedical Natural Language Processing

Neural word embeddings have been widely used in biomedical Natural Langu...
research
09/23/2018

Learning and Evaluating Sparse Interpretable Sentence Embeddings

Previous research on word embeddings has shown that sparse representatio...
research
11/23/2017

SPINE: SParse Interpretable Neural Embeddings

Prediction without justification has limited utility. Much of the succes...
research
06/21/2016

Correlation-based Intrinsic Evaluation of Word Vector Representations

We introduce QVEC-CCA--an intrinsic evaluation metric for word vector re...
research
11/24/2019

Causally Denoise Word Embeddings Using Half-Sibling Regression

Distributional representations of words, also known as word vectors, hav...
research
06/07/2017

Insights into Analogy Completion from the Biomedical Domain

Analogy completion has been a popular task in recent years for evaluatin...
research
04/18/2021

Group-Sparse Matrix Factorization for Transfer Learning of Word Embeddings

Sparse regression has recently been applied to enable transfer learning ...
