Hierarchical Multi-Label Classification of Scientific Documents

11/05/2022
by Mobashir Sadat, et al.

Automatic topic classification has been studied extensively to assist in managing and indexing scientific documents in a digital collection. With the large number of topics available in recent years, it has become necessary to arrange them in a hierarchy. Therefore, automatic classification systems need to be able to classify documents hierarchically. In addition, each paper is often assigned to more than one relevant topic; for example, a paper can be assigned to several topics in a hierarchy tree. In this paper, we introduce a new dataset for hierarchical multi-label text classification (HMLTC) of scientific papers called SciHTC, which contains 186,160 papers and 1,233 categories from the ACM CCS tree. We establish strong baselines for HMLTC and propose a multi-task learning approach for topic classification with keyword labeling as an auxiliary task. Our best model achieves a Macro-F1 score of 34.57%, which shows that this dataset provides significant research opportunities on hierarchical scientific topic classification. We make our dataset and code available on GitHub.
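To make the two ideas in the abstract concrete (a paper carrying several labels inside a hierarchy, and results reported as Macro-F1), here is a minimal sketch under stated assumptions. The category names and the `PARENT` map are a hypothetical toy hierarchy, not the actual ACM CCS tree. `with_ancestors` and `macro_f1` are illustrative helpers, not the authors' code. Counting a label that is neither gold nor predicted as F1 = 0 is also an assumed convention; the paper's exact Macro-F1 variant may differ.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy hierarchy (child -> parent); NOT the real ACM CCS tree.
PARENT: Dict[str, str] = {
    "Machine learning": "Computing methodologies",
    "Natural language processing": "Computing methodologies",
    "Information retrieval": "Information systems",
}


def with_ancestors(labels: Set[str]) -> Set[str]:
    """Expand a label set so each category also brings in all of its ancestors."""
    expanded: Set[str] = set()
    for label in labels:
        node: Optional[str] = label
        while node is not None:
            expanded.add(node)
            node = PARENT.get(node)  # None once we pass a root category
    return expanded


def macro_f1(gold: List[Set[str]], pred: List[Set[str]], labels: List[str]) -> float:
    """Unweighted mean of per-label F1 scores computed over all documents."""
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
        denom = 2 * tp + fp + fn
        # Assumed convention: a label with no gold or predicted positives scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Two hypothetical papers, each with one or more leaf topics.
gold = [
    with_ancestors({"Machine learning"}),
    with_ancestors({"Information retrieval", "Natural language processing"}),
]
pred = [
    with_ancestors({"Natural language processing"}),
    with_ancestors({"Information retrieval"}),
]
labels = sorted(set(PARENT) | set(PARENT.values()))
score = macro_f1(gold, pred, labels)
print(round(score, 4))  # 0.5333
```

Because Macro-F1 averages per-label scores without weighting by frequency, rare categories count as much as common ones. This is one reason a hierarchy with over a thousand categories, many of them sparsely populated, can yield a relatively low Macro-F1 even for a strong model.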

research
07/30/2023

Recent Advances in Hierarchical Multi-label Text Classification: A Survey

Hierarchical multi-label text classification aims to classify the input ...
research
10/16/2019

HiGitClass: Keyword-Driven Hierarchical Classification of GitHub Repositories

GitHub has become an important platform for code sharing and scientific ...
research
11/05/2017

Multi-label Dataless Text Classification with Topic Modeling

Manually labeling documents is tedious and expensive, but it is essentia...
research
04/02/2022

Constrained Sequence-to-Tree Generation for Hierarchical Text Classification

Hierarchical Text Classification (HTC) is a challenging task where a doc...
research
05/16/2022

Decision Making for Hierarchical Multi-label Classification with Multidimensional Local Precision Rate

Hierarchical multi-label classification (HMC) has drawn increasing atten...
research
06/15/2020

Document Classification for COVID-19 Literature

The global pandemic has made it more important than ever to quickly and ...
research
10/18/2020

Topic Recommendation for Software Repositories using Multi-label Classification Algorithms

Many platforms exploit collaborative tagging to provide their users with...
