Clinical-Longformer and Clinical-BigBird: Transformers for long clinical sequences

01/27/2022
by Yikuan Li, et al.

Transformer-based models, such as BERT, have dramatically improved performance on a variety of natural language processing tasks. The clinical knowledge-enriched model ClinicalBERT likewise achieved state-of-the-art results on clinical named entity recognition and natural language inference tasks. A core limitation of these transformers is their substantial memory consumption, which stems from the full self-attention mechanism. To overcome this, long-sequence transformer models such as Longformer and BigBird introduced sparse attention mechanisms that reduce memory usage from quadratic to linear in the sequence length. These models extend the maximum input length from 512 to 4,096 tokens, which strengthens their ability to model long-term dependencies and consequently improves results on a variety of tasks. Inspired by the success of these long-sequence transformers, we introduce two domain-enriched language models, Clinical-Longformer and Clinical-BigBird, which are pre-trained on large-scale clinical corpora. We evaluate both pre-trained models on 10 baseline tasks, including named entity recognition, question answering, and document classification. The results demonstrate that Clinical-Longformer and Clinical-BigBird consistently and significantly outperform ClinicalBERT and other short-sequence transformers on all downstream tasks. Our source code is available at https://github.com/luoyuanlab/Clinical-Longformer, and the pre-trained models are available for public download at https://huggingface.co/yikuan8/Clinical-Longformer.
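As a concrete illustration of the released checkpoints, the sketch below loads Clinical-Longformer from the HuggingFace Hub link above and encodes a long clinical note in a single forward pass, rather than splitting it into the 512-token chunks that BERT-style encoders require. This is a minimal sketch assuming the HuggingFace transformers and PyTorch libraries; only the model ID comes from the paper, and the note text is hypothetical.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Model ID taken from the paper's HuggingFace link; the BigBird
# counterpart is released alongside it.
MODEL_ID = "yikuan8/Clinical-Longformer"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()

# Hypothetical discharge-note text; real clinical notes frequently
# exceed the 512-token limit of short-sequence encoders.
note = ("Patient admitted with acute dyspnea on exertion. "
        "History of congestive heart failure and type 2 diabetes. ") * 120

# Long-sequence models accept up to 4096 tokens, so the note can be
# encoded in one pass instead of being split into 512-token chunks.
inputs = tokenizer(note, truncation=True, max_length=4096,
                   return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Shape (batch, seq_len, hidden); pool these states for downstream
# tasks such as document classification.
print(outputs.last_hidden_state.shape)
```

The memory saving behind the 4,096-token limit comes from the sparse attention pattern: with Longformer's default sliding window of roughly 512 tokens, each position attends to a fixed number of neighbors, so the attention score matrix grows as w * n (about 2.1M entries per head at n = 4096) rather than n^2 (about 16.8M).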


Related research

A Comparative Study of Pretrained Language Models for Long Clinical Text (01/27/2023)
Objective: Clinical knowledge enriched transformer models (e.g., Clinica...

A Small-Scale Switch Transformer and NLP-based Model for Clinical Narratives Classification (03/22/2023)
In recent years, Transformer-based models such as the Switch Transformer...

Not All Attention Is All You Need (04/10/2021)
Self-attention based models have achieved remarkable success in natural ...

An Exploration of Hierarchical Attention Transformers for Efficient Long Document Classification (10/11/2022)
Non-hierarchical sparse attention Transformer-based models, such as Long...

Data Mining in Clinical Trial Text: Transformers for Classification and Question Answering Tasks (01/30/2020)
This research on data extraction methods applies recent advances in natu...

Understanding Long Programming Languages with Structure-Aware Sparse Attention (05/27/2022)
Programming-based Pre-trained Language Models (PPLMs) such as CodeBERT h...

Big Bird: Transformers for Longer Sequences (07/28/2020)
Transformers-based models, such as BERT, have been one of the most succe...
