Segmenting Scientific Abstracts into Discourse Categories: A Deep Learning-Based Approach for Sparse Labeled Data

05/11/2020
by   Soumya Banerjee, et al.

The abstract of a scientific paper distills the contents of the paper into a short paragraph. In the biomedical literature, it is customary to structure an abstract into discourse categories such as BACKGROUND, OBJECTIVE, METHOD, RESULT, and CONCLUSION, but this segmentation is uncommon in other fields such as computer science. Explicit categories could enable more granular, that is, discourse-level, search and recommendation. The sparsity of labeled data makes it challenging to build supervised machine learning solutions for automatic discourse-level segmentation of abstracts in non-biomedical domains. In this paper, we address this problem using transfer learning. In particular, we define three discourse categories for an abstract (BACKGROUND, TECHNIQUE, and OBSERVATION) because these three categories are the most common. We train a deep neural network on structured abstracts from PubMed, then fine-tune it on a small hand-labeled corpus of computer science papers. We observe an accuracy of 75% on the test corpus. We perform an ablation study to highlight the roles of the different parts of the model. Our method appears to be a promising solution for the automatic segmentation of abstracts when labeled data is sparse.
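
To train on PubMed and fine-tune on computer science abstracts, the five PubMed labels have to be reconciled with the three-way scheme in some way. The abstract does not say how this is done, so the sketch below is only one plausible reading: the mapping `PUBMED_TO_CS`, the helper `relabel`, and the example sentences are all hypothetical and invented for illustration.

```python
from collections import Counter

# ASSUMPTION: a hypothetical collapse of PubMed's five labels into the three
# categories named in the abstract. The abstract does not state this mapping.
PUBMED_TO_CS = {
    "BACKGROUND": "BACKGROUND",
    "OBJECTIVE": "BACKGROUND",
    "METHOD": "TECHNIQUE",
    "RESULT": "OBSERVATION",
    "CONCLUSION": "OBSERVATION",
}


def relabel(sentences):
    """Map (sentence, pubmed_label) pairs to (sentence, three_way_label) pairs."""
    return [(text, PUBMED_TO_CS[label]) for text, label in sentences]


# Invented toy sentences, one per PubMed label; not taken from any real abstract.
example = [
    ("Sepsis remains a leading cause of mortality.", "BACKGROUND"),
    ("We aimed to evaluate early antibiotic therapy.", "OBJECTIVE"),
    ("We conducted a randomized trial.", "METHOD"),
    ("Mortality fell in the treatment arm.", "RESULT"),
    ("Early therapy appears beneficial.", "CONCLUSION"),
]

relabeled = relabel(example)

# Count how many sentences fall into each of the three target categories.
label_counts = Counter(label for _, label in relabeled)
```

With a shared label set of this kind, sentences from both corpora can be fed to the same classifier head, which is what lets the PubMed-trained network be fine-tuned directly on the smaller hand-labeled corpus.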
