Generation of Highlights from Research Papers Using Pointer-Generator Networks and SciBERT Embeddings

by Tohida Rehman, et al.

Many research articles are now prefaced with research highlights that summarize the paper's main findings. Highlights not only help researchers identify a paper's contributions quickly and precisely but also make the article easier to discover through search engines. We aim to automatically construct research highlights from given segments of a research paper. We use a pointer-generator network with a coverage mechanism and a contextual embedding layer at the input that encodes the input tokens as SciBERT embeddings. We test our model on a benchmark dataset, CSPubSum, and also present MixSub, a new multi-disciplinary corpus of papers for automatic research highlight generation. On both CSPubSum and MixSub, the proposed model achieves the best performance compared with related variants and with other models proposed in the literature. On the CSPubSum dataset, our model performs best when the input is only the abstract of a paper rather than other segments of the paper. It achieves ROUGE-1, ROUGE-2 and ROUGE-L F1-scores of 38.26, 14.26 and 35.51, respectively, a METEOR F1-score of 32.62, and a BERTScore F1 of 86.65, outperforming all other baselines. On the new MixSub dataset, where only the abstract is used as input, our proposed model (trained on the whole training corpus without distinguishing between subject categories) achieves ROUGE-1, ROUGE-2 and ROUGE-L F1-scores of 31.78, 9.76 and 29.3, respectively, a METEOR F1-score of 24.00, and a BERTScore F1 of 85.25, outperforming the other models.
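As a rough illustration of how a pointer-generator network with coverage chooses each output token, the sketch below implements the standard mixing rule P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (sum of attention a_i over source positions i whose token is w), together with the coverage penalty sum_i min(a_i^t, c_i^t), where the coverage vector c^t is the sum of the attention distributions from earlier decoder steps. This is a minimal, framework-free sketch under stated assumptions: the function names and toy probabilities are hypothetical, the encoder and decoder that would produce p_gen, the vocabulary distribution and the attention weights are not shown, and the SciBERT embedding layer is omitted entirely. The authors' actual implementation may differ.

```python
from typing import Dict, List


def final_distribution(p_gen: float,
                       vocab_dist: Dict[str, float],
                       attention: List[float],
                       source_tokens: List[str]) -> Dict[str, float]:
    """Hypothetical helper: mix generating from the vocabulary with copying from the source.

    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of attention on source positions holding w.
    """
    # Generation part: scale the decoder's vocabulary distribution by p_gen.
    dist = {word: p_gen * prob for word, prob in vocab_dist.items()}
    # Copy part: route (1 - p_gen) of the mass to source tokens via attention,
    # which also covers source words missing from the fixed vocabulary.
    for weight, token in zip(attention, source_tokens):
        dist[token] = dist.get(token, 0.0) + (1.0 - p_gen) * weight
    return dist


def coverage_loss(attention_history: List[List[float]]) -> float:
    """Hypothetical helper: sum over decoder steps t of sum_i min(a_i^t, c_i^t).

    c^t is the running sum of attention from steps before t (all zeros at t = 0).
    Returned unweighted; a training setup would typically scale it by a hyperparameter.
    """
    if not attention_history:
        return 0.0
    coverage = [0.0] * len(attention_history[0])
    loss = 0.0
    for attn in attention_history:
        # Penalize attending again to positions that are already covered.
        loss += sum(min(a, c) for a, c in zip(attn, coverage))
        # Update coverage with this step's attention.
        coverage = [c + a for c, a in zip(coverage, attn)]
    return loss


if __name__ == "__main__":
    dist = final_distribution(
        p_gen=0.8,
        vocab_dist={"the": 0.5, "model": 0.3, "<unk>": 0.2},
        attention=[0.6, 0.4],
        source_tokens=["SciBERT", "model"],
    )
    print(dist)
    print(coverage_loss([[0.6, 0.4], [0.5, 0.5]]))
```

Because copy probability is added directly to source tokens, a term such as "SciBERT" can appear in a generated highlight even when it is absent from the output vocabulary. The coverage term grows when the decoder re-attends to positions it has already covered, which discourages repeated phrases in the output.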
