MeDAL: Medical Abbreviation Disambiguation Dataset for Natural Language Understanding Pretraining

12/27/2020
by Zhi Wen, et al.

One of the biggest challenges preventing the use of many current NLP methods in clinical settings is the limited availability of public datasets. In this work, we present MeDAL, a large medical text dataset curated for abbreviation disambiguation and designed for natural language understanding pre-training in the medical domain. We pre-trained several models with common architectures on this dataset and showed empirically that such pre-training improves both performance and convergence speed when fine-tuning on downstream medical tasks.
