Benchmarking for Biomedical Natural Language Processing Tasks with a Domain Specific ALBERT

07/09/2021
by   Usman Naseem, et al.
12

The availability of biomedical text data and advances in natural language processing (NLP) have made new applications in biomedical NLP possible. Language models trained or fine tuned using domain specific corpora can outperform general models, but work to date in biomedical NLP has been limited in terms of corpora and tasks. We present BioALBERT, a domain-specific adaptation of A Lite Bidirectional Encoder Representations from Transformers (ALBERT), trained on biomedical (PubMed and PubMed Central) and clinical (MIMIC-III) corpora and fine tuned for 6 different tasks across 20 benchmark datasets. Experiments show that BioALBERT outperforms the state of the art on named entity recognition (+11.09 (+0.80 classification (+0.62 It represents a new state of the art in 17 out of 20 benchmark datasets. By making BioALBERT models and data available, our aim is to help the biomedical NLP community avoid computational costs of training and establish a new set of baselines for future efforts across a broad range of biomedical NLP tasks.

READ FULL TEXT

page 1

page 3

page 8

research
04/19/2021

ELECTRAMed: a new pre-trained language representation model for biomedical NLP

The overwhelming amount of biomedical scientific texts calls for the dev...
research
07/31/2020

Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing

Pretraining large neural language models, such as BERT, has led to impre...
research
09/19/2020

BioALBERT: A Simple and Effective Pre-trained Language Model for Biomedical Named Entity Recognition

In recent years, with the growing amount of biomedical documents, couple...
research
10/12/2022

MedJEx: A Medical Jargon Extraction Model with Wiki's Hyperlink Span and Contextualized Masked Language Model Score

This paper proposes a new natural language processing (NLP) application ...
research
05/26/2023

Improving accuracy of GPT-3/4 results on biomedical data using a retrieval-augmented language model

Large language models (LLMs) have made significant advancements in natur...
research
11/16/2021

Improving the robustness and accuracy of biomedical language models through adversarial training

Deep transformer neural network models have improved the predictive accu...
research
04/05/2023

Evaluation of ChatGPT Family of Models for Biomedical Reasoning and Classification

Recent advances in large language models (LLMs) have shown impressive ab...

Please sign up or login with your details

Forgot password? Click here to reset