SkIn: Skimming-Intensive Long-Text Classification Using BERT for Medical Corpus

09/13/2022
by Yufeng Zhao, et al.

BERT is a widely used pre-trained model in natural language processing. However, because the time and memory cost of BERT grows quadratically with text length, it is difficult to apply BERT directly to long-text corpora. In some fields, such as health care, the collected text can be quite long. To bring BERT's pre-trained language knowledge to long text, this paper proposes the Skimming-Intensive Model (SkIn), which imitates the skimming-then-intensive reading strategy that humans use on a long paragraph. SkIn dynamically selects the critical information in the text so that the input to the BERT-Base model is significantly shortened, which effectively reduces the cost of the classification algorithm. Experiments show that SkIn achieves higher accuracy than the baselines on long-text classification datasets in the medical field, while its time and space requirements grow linearly with text length, alleviating the time and space overflow problem of basic BERT on long-text data.
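
The cost argument in the abstract can be made concrete with a rough sketch. The Python below is not the SkIn implementation, which the abstract does not describe. It is a toy illustration under stated assumptions: self-attention cost is approximated by the number of token-pair interactions, the "skimming" pass is replaced by a hypothetical keyword-overlap score (`keyword_score`), and tokens are counted by whitespace splitting rather than by BERT's WordPiece tokenizer. All function names and the keyword set are invented for illustration.

```python
from typing import Callable, List

# Hypothetical keyword set standing in for a learned "skimming" signal.
KEYWORDS = {"mri", "lesion", "headache"}


def attention_pairs_full(n_tokens: int) -> int:
    """Token-pair interactions when attention spans the whole text: n^2."""
    return n_tokens * n_tokens


def attention_pairs_chunked(n_tokens: int, chunk: int) -> int:
    """Token-pair interactions when the text is read in fixed-size chunks.

    Each full chunk costs chunk^2, so the total is roughly n * chunk,
    which is linear in n for a fixed chunk size.
    """
    full_chunks, remainder = divmod(n_tokens, chunk)
    return full_chunks * chunk * chunk + remainder * remainder


def keyword_score(sentence: str) -> int:
    """Toy relevance score: how many words of the sentence are keywords."""
    words = (w.strip(".,;:!?").lower() for w in sentence.split())
    return sum(1 for w in words if w in KEYWORDS)


def select_sentences(
    sentences: List[str],
    score: Callable[[str], int],
    budget: int,
) -> List[str]:
    """Greedily keep the highest-scoring sentences that fit a token budget.

    Tokens are approximated by whitespace-separated words (an assumption).
    The kept sentences are returned in their original order.
    """
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: score(sentences[i]),
        reverse=True,
    )
    kept: List[int] = []
    used = 0
    for i in ranked:
        cost = len(sentences[i].split())
        if used + cost <= budget:
            kept.append(i)
            used += cost
    return [sentences[i] for i in sorted(kept)]
```

Under this approximation, doubling the text length quadruples the full-attention count but only doubles the chunked count, which is the quadratic-versus-linear contrast the abstract draws. The selection step shows only the general idea of shortening the input before an expensive model sees it; the actual selection mechanism in SkIn may differ substantially.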

research
06/21/2021

Ad Text Classification with Transformer-Based Natural Language Processing Methods

In this study, a natural language processing-based (NLP-based) method is...
research
12/30/2022

Distant Reading of the German Coalition Deal: Recognizing Policy Positions with BERT-based Text Classification

Automated text analysis has become a widely used tool in political scien...
research
04/16/2022

SimpleBERT: A Pre-trained Model That Learns to Generate Simple Words

Pre-trained models are widely used in the tasks of natural language proc...
research
10/08/2022

KG-MTT-BERT: Knowledge Graph Enhanced BERT for Multi-Type Medical Text Classification

Medical text learning has recently emerged as a promising area to improv...
research
07/05/2022

Betti numbers of attention graphs is all you really need

We apply methods of topological analysis to the attention graphs, calcul...
research
04/04/2023

Multidimensional Perceptron for Efficient and Explainable Long Text Classification

Because of the inevitable cost and complexity of transformer and pre-tra...
research
06/08/2022

Abstraction not Memory: BERT and the English Article System

Article prediction is a task that has long defied accurate linguistic de...
