Using Transformer based Ensemble Learning to classify Scientific Articles

02/19/2021
by Sohom Ghosh, et al.

Reviewers often fail to appreciate a researcher's novel ideas and provide only generic feedback. Thus, assigning reviewers according to their area of expertise is necessary. Moreover, reading every paper from end to end in order to assign it to a reviewer is a tedious task. In this paper, we describe the system that our team, FideLIPI, submitted to the SDPRA-2021 shared task [14]. It comprises four independent sub-systems, each capable of classifying abstracts of scientific literature into one of the seven given classes. The first is a RoBERTa [10] based model built over these abstracts. Adding topic-model features based on Latent Dirichlet Allocation (LDA) [2] to the first model yields the second sub-system. The third is a sentence-level RoBERTa [10] model. The fourth is a Logistic Regression model built using Term Frequency-Inverse Document Frequency (TF-IDF) features. We ensemble the predictions of these four sub-systems using majority voting to obtain the final system, which achieves an F1 score of 0.93 on both the test and validation sets. This outperforms the existing state-of-the-art (SOTA) model, SciBERT [1], in terms of F1 score on the validation set. Our codebase is available at https://github.com/SDPRA-2021/shared-task/tree/main/FideLIPI
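
The majority-voting step described above can be illustrated with a minimal sketch. It is not taken from the authors' codebase: the function names, the placeholder class labels ("A", "B", "C"), and the tie-breaking rule are all assumptions made for illustration. In particular, the abstract does not say how a 2-2 tie among four sub-systems is resolved; this sketch simply prefers the label predicted by the sub-system listed first.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most frequent label among the sub-system votes.

    Assumption (not from the paper): ties are broken in favour of the
    label predicted by the earliest-listed sub-system.
    """
    counts = Counter(votes)
    top = max(counts.values())
    for label in votes:  # iterate in sub-system order to break ties
        if counts[label] == top:
            return label


def ensemble(per_model_predictions):
    """Combine aligned prediction lists, one list per sub-system."""
    return [majority_vote(votes) for votes in zip(*per_model_predictions)]


# Hypothetical predictions for three abstracts from the four sub-systems.
roberta = ["A", "B", "C"]
roberta_lda = ["A", "B", "A"]
sentence_roberta = ["B", "B", "C"]
tfidf_logreg = ["A", "C", "A"]

final = ensemble([roberta, roberta_lda, sentence_roberta, tfidf_logreg])
print(final)  # ['A', 'B', 'C']
```

The third abstract shows the tie case: two votes each for "C" and "A", resolved here in favour of the first sub-system's prediction. Other tie-breaking rules, such as deferring to the sub-system with the best validation F1 score, would be equally plausible.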


