A pre-training technique to localize medical BERT and enhance BioBERT

05/14/2020
by Shoya Wada, et al.

Bidirectional Encoder Representations from Transformers (BERT) models for biomedical specialties, such as BioBERT and clinicalBERT, have substantially improved performance on biomedical text-mining tasks and enabled us to extract valuable information from the biomedical literature. However, these benefits have so far been limited to English, because high-quality medical corpora, such as PubMed, are scarce in other languages. We therefore propose a method for building a high-performance BERT model from a small corpus. We apply the method to train BERT models on small medical corpora in English and in Japanese, and evaluate them on the Biomedical Language Understanding Evaluation (BLUE) benchmark and on a Japanese medical-document-classification task, respectively. After confirming their satisfactory performance, we use our method to develop a model that outperforms pre-existing models. Bidirectional Encoder Representations from Transformers for Biomedical Text Mining by Osaka University (ouBioBERT) achieves the best scores on 7 of the 10 BLUE datasets, and its total score is 1.0 points above that of BioBERT.
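The pre-training the abstract refers to rests on BERT's masked-language-model objective: a fraction of tokens is corrupted and the model learns to recover them, which is also how continued pre-training on a small domain corpus works. As a minimal sketch (not the authors' code; the `[MASK]` id and vocabulary size are assumptions matching the standard `bert-base` vocabulary), the usual 80/10/10 corruption rule looks like this:

```python
import random

MASK_ID = 103       # [MASK] in the standard bert-base vocabulary (assumption)
VOCAB_SIZE = 30522  # bert-base vocabulary size (assumption)

def mask_tokens(token_ids, mask_prob=0.15, seed=0):
    """Apply BERT's masked-LM corruption to a token-id sequence.

    About `mask_prob` of positions are selected; of those, 80% become
    [MASK], 10% are replaced by a random token, and 10% are left as-is.
    Returns (corrupted_ids, labels), where labels is -100 (ignored by
    the loss) at unselected positions and the original id elsewhere.
    """
    rng = random.Random(seed)
    corrupted = list(token_ids)
    labels = [-100] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() < mask_prob:
            labels[i] = tok          # predict the original token here
            r = rng.random()
            if r < 0.8:
                corrupted[i] = MASK_ID
            elif r < 0.9:
                corrupted[i] = rng.randrange(VOCAB_SIZE)
            # else: keep the original token (the 10% "unchanged" case)
    return corrupted, labels
```

In practice this step is handled by a data collator inside a pre-training loop; the sketch only illustrates the corruption scheme that the loss is computed against.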

