SEAL: Scientific Keyphrase Extraction and Classification

Ayush Garg, et al.
IIT Gandhinagar

Automatic scientific keyphrase extraction is a challenging problem that facilitates several downstream scholarly tasks such as search, recommendation, and ranking. In this paper, we introduce SEAL, a scholarly tool for automatic keyphrase extraction and classification. The keyphrase extraction module comprises a two-stage neural architecture composed of Bidirectional Long Short-Term Memory cells augmented with a Conditional Random Field. The classification module comprises a Random Forest classifier. We experiment extensively to showcase the robustness of the system. We evaluate against multiple state-of-the-art baselines and show a significant improvement. The current system is hosted at



1. Introduction

With the ever-growing scientific volume, scholarly search and recommendation engines are gradually adopting artificial intelligence frameworks for better document retrieval. A well-known annotation task is to identify topical keyphrases for facilitating topical search. Further, the identified keyphrases can be classified into several semantic categories for facilitating knowledge graph construction. However, due to the large volume of research, current manual annotation schemes are financially infeasible owing to the requirement of continuous human resources and domain expertise.

In this paper, we introduce SEAL, which aims to automate keyphrase extraction and the further classification of keyphrases into three semantic categories: (i) tasks, (ii) processes, and (iii) materials. Tasks represent research problems such as extraction, processing, and parsing. Processes represent solutions to problems, including physical equipment, algorithms, methods/techniques, and tools. Materials include physical material such as chemical compounds and datasets. We showcase that SEAL outperforms several state-of-the-art tools on a recently published dataset of 500 scientific publications in the fields of Computer Science, Material Sciences, and Physics (Augenstein et al., 2017).

2. SEAL Architecture

Figure 1. (a) Flow chart of the SEAL architecture. (b) Snapshot of the SEAL demo webpage with the output for the abstract of this paper. Processes, materials, and tasks are marked in green, orange, and blue, respectively.

SEAL comprises two distinct neural modules, one for keyphrase extraction and the other for keyphrase classification. We use the standard ‘Beginning, Inside and Last tokens of multi-token chunks, Unit-length chunks and Outside’ (BILOU) labeling scheme (Ammar et al., 2017) in both modules. Figure 1(a) presents the SEAL architecture.
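To make the labeling scheme concrete, the following toy function converts keyphrase span annotations into BILOU tags (an illustrative sketch; the paper does not show SEAL's own encoding code):

```python
def bilou_tags(num_tokens, spans):
    """Assign BILOU tags to tokens given (start, end) keyphrase spans
    (end exclusive). Single-token keyphrases get 'U'; multi-token
    keyphrases get 'B' ... 'I' ... 'L'; all other tokens get 'O'."""
    tags = ['O'] * num_tokens
    for start, end in spans:
        if end - start == 1:
            tags[start] = 'U'          # unit-length chunk
        else:
            tags[start] = 'B'          # beginning of chunk
            for i in range(start + 1, end - 1):
                tags[i] = 'I'          # inside of chunk
            tags[end - 1] = 'L'        # last token of chunk
    return tags
```

For example, the sentence "automatic keyphrase extraction helps search" with keyphrase spans (1, 3) and (4, 5) is tagged `['O', 'B', 'L', 'O', 'U']`.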

2.1. Keyphrase Extraction Module

This module leverages pre-trained token-level 9216-dimensional SciBERT embeddings (Beltagy et al., 2019) to train three stacked layers of Bidirectional Long Short-Term Memory (BiLSTM) cells (96, 48, and 24 hidden units in the respective layers). SciBERT yields significantly better scores than several other embeddings, such as the Levy and Goldberg dependency-based embeddings (Levy and Goldberg, 2014) and GloVe (Pennington et al., 2014). The output, a 24-dimensional vector, is then downsized to a five-dimensional vector by a linear layer and fed to a Conditional Random Field (CRF) layer, which predicts the label of each token. The results are further refined through a post-processing step to handle single-token keyphrases.
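A minimal PyTorch sketch of this stack follows. The CRF decoding layer is omitted (a library such as `torchcrf` could supply it), and the per-direction hidden sizes are chosen so that the bidirectional outputs match the 96/48/24 dimensions quoted above; how the paper counts hidden units is an assumption here:

```python
import torch
import torch.nn as nn

class ExtractorSketch(nn.Module):
    """Illustrative sketch of the extraction stack: three stacked
    BiLSTM layers over SciBERT token embeddings, followed by a linear
    projection to five BILOU emission scores fed to a CRF (not shown)."""
    def __init__(self, emb_dim=9216, num_labels=5):
        super().__init__()
        # bidirectional outputs: 2*48=96, 2*24=48, 2*12=24
        self.lstm1 = nn.LSTM(emb_dim, 48, bidirectional=True, batch_first=True)
        self.lstm2 = nn.LSTM(96, 24, bidirectional=True, batch_first=True)
        self.lstm3 = nn.LSTM(48, 12, bidirectional=True, batch_first=True)
        self.proj = nn.Linear(24, num_labels)  # 24-dim -> five BILOU labels

    def forward(self, x):        # x: (batch, seq_len, emb_dim)
        h, _ = self.lstm1(x)
        h, _ = self.lstm2(h)
        h, _ = self.lstm3(h)
        return self.proj(h)      # per-token emission scores for the CRF
```

Running a batch of token embeddings through the model yields one score per BILOU label per token, which the CRF layer would then decode into a consistent tag sequence.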

2.2. Keyphrase Classification Module

This module uses pre-trained token-level Levy embeddings (Levy and Goldberg, 2014), which yield significantly better scores than several other embeddings. For each token, we also consider the immediate neighboring tokens as context: we pass the concatenated embeddings of the previous token, the current token, and the next token to a standard Random Forest (RF) classifier. Candidate tokens without a previous or next token are appropriately padded with the embedding corresponding to the <UNKNOWN> tag.
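The context-window featurization can be sketched as follows with scikit-learn; the embeddings, labels, and the zero-vector stand-in for the <UNKNOWN> padding embedding are toy assumptions for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

EMB_DIM = 4                  # toy dimensionality; SEAL uses Levy embeddings
UNK = np.zeros(EMB_DIM)      # stand-in for the <UNKNOWN> padding embedding

def context_features(token_embs):
    """Concatenate previous, current, and next token embeddings,
    padding the sequence edges with the UNK embedding."""
    feats = []
    for i, emb in enumerate(token_embs):
        prev = token_embs[i - 1] if i > 0 else UNK
        nxt = token_embs[i + 1] if i < len(token_embs) - 1 else UNK
        feats.append(np.concatenate([prev, emb, nxt]))
    return np.array(feats)

# toy training data: random embeddings with random semantic-class labels
rng = np.random.default_rng(0)
X = context_features(rng.normal(size=(20, EMB_DIM)))
y = rng.integers(0, 3, size=20)  # 0=task, 1=process, 2=material
clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
```

Each feature vector is three embeddings wide, so a token's predicted class depends on its immediate left and right context as described above.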

We post-process abbreviations and chemical formulae separately because of inefficiencies associated with their classification. For abbreviations, we match each abbreviation with its full form using the first occurrence and assign it the class of the full form. For chemical formulae (such as NaCl and Mg), we match the corresponding formula tokens using a regular expression.
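A heuristic pattern of the kind this step might use is sketched below; the actual expression used by SEAL is not given in the text, and a pattern like this over-matches all-uppercase acronyms, so it would be combined with the abbreviation handling above:

```python
import re

# One or more element symbols: an uppercase letter, an optional
# lowercase letter, and an optional count (matches NaCl, Mg, H2O).
FORMULA_RE = re.compile(r'^(?:[A-Z][a-z]?\d*)+$')

def looks_like_formula(token):
    """Heuristic check for a simple chemical-formula token."""
    return bool(FORMULA_RE.match(token))
```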

3. Experimental Results

We experiment on the ScienceIE dataset (Augenstein et al., 2017), containing 500 scientific abstracts curated from ScienceDirect open-access publications. Each abstract is manually labeled with keyphrase boundaries and their respective classes. For experimentation, we follow verbatim the guidelines specified in the ScienceIE competition. The dataset is partitioned into train, development, and test sets containing 350, 50, and 100 abstracts, respectively. We next showcase that SEAL outperforms the top-ranked implementations on the ScienceIE leaderboard on the standard F1-score metric.
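As a reminder of the metric, a set-based exact-match F1 over predicted and gold keyphrases can be computed as below (a simplified sketch; the official ScienceIE scorer also checks character offsets):

```python
def keyphrase_f1(predicted, gold):
    """Exact-match F1 over keyphrase sets."""
    pred, gold = set(predicted), set(gold)
    tp = len(pred & gold)                # true positives
    if not pred or not gold or tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```

For instance, predicting {"parsing", "NaCl"} against gold {"NaCl", "GloVe"} gives precision and recall of 0.5 each, hence F1 = 0.5.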

Keyphrase Extraction: As described in the previous section, we experiment with several embedding schemes. Table 1 shows that SciBERT embeddings outperform the other embedding schemes for extraction. Table 1 also compares the F1-score of SEAL against the top-rankers on the official ScienceIE leaderboard (Augenstein et al., 2017). Furthermore, unlike TIAL_UW (rank 1 on the leaderboard) and s2_end2end (Ammar et al., 2017) (rank 2 on the leaderboard), SEAL does not use external data sources.

Keyphrase Classification: Table 2 compares the F1-score of SEAL against the ScienceIE official leaderboard (Augenstein et al., 2017). Note that the classification module leverages Levy embeddings.

Model                                                    F1-score
GloVe word embeddings (Pennington et al., 2014)          0.440
Levy and Goldberg embeddings (Levy and Goldberg, 2014)   0.470
SciBERT embeddings (Beltagy et al., 2019)                0.564
SEAL                                                     0.564
TIAL_UW                                                  0.560
s2_end2end (Ammar et al., 2017)                          0.550
Table 1. Performance of SEAL with different input embeddings and against state-of-the-art extraction systems on the ScienceIE leaderboard (Augenstein et al., 2017).
Model                               F1-score
SEAL                                0.74
MayoNLP (Liu et al., 2017)          0.67
UKP/EELECTION (Eger et al., 2017)   0.66
Table 2. Performance of SEAL against top-ranked classification systems on the ScienceIE leaderboard (Augenstein et al., 2017).

4. System Description


The web application is developed using the Flask framework. The current implementation uses the PyTorch framework for the extraction and classification modules. The trained models and the demo are hosted on our research group server. Figure 1(b) presents a snapshot of the demo. On encountering a POST request, the framework first executes the extraction module, followed by the classification module, and displays the result. The code, processed dataset, and system implementation details are available at
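The request flow can be sketched as a minimal Flask handler; the route name and the two stand-in module functions are hypothetical, chosen only to illustrate the extraction-then-classification order:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

# Hypothetical stand-ins for the trained SEAL modules.
def extract_keyphrases(text):
    """Toy extractor: returns the first token as the only keyphrase."""
    tokens = text.split()
    return tokens[:1]

def classify_keyphrases(phrases):
    """Toy classifier: labels every keyphrase as a process."""
    return {p: "process" for p in phrases}

@app.route("/annotate", methods=["POST"])
def annotate():
    text = request.form.get("text", "")
    phrases = extract_keyphrases(text)      # extraction module first
    labels = classify_keyphrases(phrases)   # then classification
    return jsonify(labels)                  # displayed by the demo page
```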

5. Conclusion and Future Proposals

In this paper, we propose SEAL, a toolkit for scientific keyphrase extraction as well as classification. We showcase that SEAL performs on par with state-of-the-art extraction systems that leverage large volumes of external knowledge. In the future, we plan to experiment with domain-specific embeddings and semi-supervised bootstrapping techniques.