Sectioning of Biomedical Abstracts: A Sequence of Sequence Classification Task

01/18/2022
by Mehmet Efruz Karabulut, et al.

The rapid growth of the biomedical literature has led to many advances in the field of biomedical text mining. Within this vast body of information, article abstracts are among the most easily accessible sources. However, structured abstracts, which label their rhetorical sections with one of the categories Background, Objective, Method, Result, and Conclusion, are still relatively few. Improvements in the sequential sentence classification task can expedite the exploration of valuable information in biomedical abstracts. Deep learning-based models have shown great potential for achieving strong results on this task. However, they can often be overly complex and overfit to specific data. In this project, we study a state-of-the-art deep learning model, which we refer to here as the SSN-4 model. We investigate different components of the SSN-4 model to study the trade-off between performance and complexity. We explore how well this model generalizes to a new dataset beyond the Randomized Controlled Trials (RCT) dataset. We also address whether word embeddings can be adjusted to the task to improve performance. Furthermore, we develop a second model that addresses the confusion pairs of the first model. Results show that the SSN-4 model does not appear to generalize well beyond the RCT dataset.
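Sequential sentence classification can be pictured as a "sequence of sequences" problem. An abstract is a sequence of sentences, and each sentence is a sequence of tokens. A model must assign exactly one rhetorical label to each sentence. The Python sketch below illustrates only this data framing, plus one simple way to list the most frequent confusion pairs (gold label, predicted label) from a model's output. It is not the SSN-4 model or the authors' second model. The function names (to_token_sequences, check_alignment, top_confusion_pairs), the whitespace tokenization, the label strings, and the toy abstract and predictions are all hypothetical illustrations.

```python
from collections import Counter
from typing import List, Tuple

# Label strings mirroring the five categories named in the abstract (assumed spelling).
LABELS = ("BACKGROUND", "OBJECTIVE", "METHOD", "RESULT", "CONCLUSION")


def to_token_sequences(sentences: List[str]) -> List[List[str]]:
    """Represent an abstract as a sequence of sentences, each a sequence of tokens.

    Lowercased whitespace splitting is a simplifying assumption; a real
    pipeline would use a proper tokenizer.
    """
    return [s.lower().split() for s in sentences]


def check_alignment(abstract: List[List[str]], labels: List[str]) -> None:
    """Enforce one known label per sentence, as the task framing requires."""
    if len(abstract) != len(labels):
        raise ValueError("need exactly one label per sentence")
    unknown = [lab for lab in labels if lab not in LABELS]
    if unknown:
        raise ValueError(f"unknown labels: {unknown}")


def top_confusion_pairs(
    gold: List[str], pred: List[str], k: int = 3
) -> List[Tuple[Tuple[str, str], int]]:
    """Return the k most frequent (gold, predicted) pairs where the labels disagree."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    errors = Counter((g, p) for g, p in zip(gold, pred) if g != p)
    return errors.most_common(k)


if __name__ == "__main__":
    # Hypothetical toy abstract with hypothetical gold labels and predictions.
    sentences = [
        "Chronic disease is a growing concern.",
        "We aimed to assess a new intervention.",
        "Participants were randomized into two groups.",
        "The intervention group improved.",
        "The improvement persisted at follow-up.",
        "The intervention may be beneficial.",
    ]
    gold = ["BACKGROUND", "OBJECTIVE", "METHOD", "RESULT", "RESULT", "CONCLUSION"]
    pred = ["BACKGROUND", "BACKGROUND", "METHOD", "RESULT", "CONCLUSION", "CONCLUSION"]
    abstract = to_token_sequences(sentences)
    check_alignment(abstract, gold)
    print(top_confusion_pairs(gold, pred))
    # [(('OBJECTIVE', 'BACKGROUND'), 1), (('RESULT', 'CONCLUSION'), 1)]
```

Counting disagreeing (gold, predicted) pairs is just one straightforward way to surface which categories a model tends to mix up. With equal counts, Counter.most_common keeps pairs in the order they were first seen.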
