An Empirical Investigation Towards Efficient Multi-Domain Language Model Pre-training

10/01/2020
by Kristjan Arumae et al.

Pre-training large language models has become a standard in the natural language processing community. Such models are pre-trained on generic data (e.g., BookCorpus and English Wikipedia) and often fine-tuned on tasks in the same domain. However, to achieve state-of-the-art performance on out-of-domain tasks such as clinical named entity recognition and relation extraction, additional in-domain pre-training is required. In practice, staged multi-domain pre-training leads to performance deterioration in the form of catastrophic forgetting (CF) when evaluated on a generic benchmark such as GLUE. In this paper we conduct an empirical investigation into known methods for mitigating CF. We find that elastic weight consolidation provides the best overall scores, yielding only a 0.33 drop on the generic benchmark while remaining competitive on bio-medical tasks. Furthermore, we explore gradient- and latent-clustering-based data selection techniques to improve coverage when using elastic weight consolidation and experience replay methods.
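Elastic weight consolidation, the best-performing mitigation above, works by adding a quadratic penalty that anchors parameters to their values from generic pre-training, weighted by an estimate of each parameter's Fisher information. The PyTorch sketch below illustrates such a penalty under stated assumptions; the names (`ewc_penalty`, `fisher`, `old_params`, `lam`) are illustrative and do not reflect the authors' implementation.

```python
import torch

def ewc_penalty(model, fisher, old_params, lam=1.0):
    """Quadratic EWC regularizer: penalizes drift of each parameter away
    from its value after generic pre-training, scaled by a per-parameter
    Fisher information estimate computed on the generic-domain data."""
    penalty = 0.0
    for name, param in model.named_parameters():
        if name in fisher:
            penalty = penalty + (fisher[name] * (param - old_params[name]) ** 2).sum()
    return (lam / 2.0) * penalty

# During in-domain (e.g. bio-medical) pre-training, the regularized
# objective would be the masked LM loss plus this anchoring term:
#   total_loss = mlm_loss + ewc_penalty(model, fisher, old_params, lam)
```

Here `fisher` and `old_params` would be dictionaries keyed by parameter name, snapshotted before the in-domain stage begins; `lam` trades off plasticity on the new domain against retention on the generic one.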


