CBEAF-Adapting: Enhanced Continual Pretraining for Building Chinese Biomedical Language Model

11/21/2022
by   Yongyu Yan, et al.

Continual pretraining is a standard way of building a domain-specific pretrained language model from a general-domain language model. However, sequential task training may cause catastrophic forgetting, which degrades model performance on downstream tasks. In this paper, we propose a continual pretraining method for BERT-based models, named CBEAF-Adapting (Chinese Biomedical Enhanced Attention-FFN Adapting). Its main idea is to introduce a small number of additional attention heads and hidden units inside each self-attention layer and feed-forward network. Using the Chinese biomedical domain as a running example, we trained a domain-specific language model named CBEAF-RoBERTa. We conduct experiments by applying the models to downstream tasks. The results demonstrate that, with only about 3% of the model parameters trained, our method achieves about a 0.5% performance gain over the best-performing baseline model and also improves over the domain-specific model, PCL-MedBERT. We also examine the forgetting problem of different pretraining methods; our method alleviates it by about 13% compared with fine-tuning.
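As a rough illustration of the idea described above, the sketch below adds a few trainable attention heads to a frozen self-attention block and a few trainable hidden units to a frozen feed-forward network. It is written independently of the paper's code; all module names, layer sizes, and the exact freezing scheme are assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of attention-FFN adapting:
# keep the pretrained heads/hidden units frozen and train only the
# newly introduced ones plus the resized output projections.
import torch
import torch.nn as nn


class ExpandedSelfAttention(nn.Module):
    def __init__(self, hidden_size=768, frozen_heads=12, new_heads=1):
        super().__init__()
        self.head_dim = hidden_size // frozen_heads
        # Pretrained (frozen) Q/K/V projections for the original heads.
        self.frozen_qkv = nn.Linear(hidden_size, 3 * frozen_heads * self.head_dim)
        self.frozen_qkv.requires_grad_(False)
        # Newly introduced (trainable) Q/K/V projections for the extra heads.
        self.new_qkv = nn.Linear(hidden_size, 3 * new_heads * self.head_dim)
        # Output projection over the concatenated (old + new) heads.
        self.out = nn.Linear((frozen_heads + new_heads) * self.head_dim, hidden_size)
        self.frozen_heads, self.new_heads = frozen_heads, new_heads

    def _attend(self, qkv, n_heads, batch, seq_len):
        q, k, v = qkv.chunk(3, dim=-1)
        shape = (batch, seq_len, n_heads, self.head_dim)
        q, k, v = (t.view(shape).transpose(1, 2) for t in (q, k, v))
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
        ctx = scores.softmax(dim=-1) @ v            # (batch, heads, seq, head_dim)
        return ctx.transpose(1, 2).reshape(batch, seq_len, -1)

    def forward(self, x):
        batch, seq_len, _ = x.shape
        ctx_frozen = self._attend(self.frozen_qkv(x), self.frozen_heads, batch, seq_len)
        ctx_new = self._attend(self.new_qkv(x), self.new_heads, batch, seq_len)
        # Concatenate pretrained and new heads, then project back to hidden_size.
        return self.out(torch.cat([ctx_frozen, ctx_new], dim=-1))


class ExpandedFFN(nn.Module):
    def __init__(self, hidden_size=768, frozen_units=3072, new_units=256):
        super().__init__()
        self.frozen_up = nn.Linear(hidden_size, frozen_units)
        self.frozen_up.requires_grad_(False)        # pretrained hidden units stay fixed
        self.new_up = nn.Linear(hidden_size, new_units)  # extra trainable hidden units
        self.down = nn.Linear(frozen_units + new_units, hidden_size)
        self.act = nn.GELU()

    def forward(self, x):
        h = torch.cat([self.frozen_up(x), self.new_up(x)], dim=-1)
        return self.down(self.act(h))


x = torch.randn(2, 16, 768)                          # (batch, seq_len, hidden)
print(ExpandedSelfAttention()(x).shape, ExpandedFFN()(x).shape)
```

In a real setup the enlarged output projections would be partially initialized from the pretrained weights rather than trained from scratch; they are left random here only to keep the sketch short.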

Related research:

- Improved Pretraining for Domain-specific Contextual Embedding Models (04/05/2020)
- Lifelong Pretraining: Continually Adapting Language Models to Emerging Corpora (10/16/2021)
- MDAPT: Multilingual Domain Adaptive Pretraining in a Single Model (09/14/2021)
- DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining (05/17/2023)
- AVocaDo: Strategy for Adapting Vocabulary to Downstream Domain (10/26/2021)
- Lifelong Language Pretraining with Distribution-Specialized Experts (05/20/2023)
- Generalizing through Forgetting – Domain Generalization for Symptom Event Extraction in Clinical Notes (09/20/2022)
