The Effect of Metadata on Scientific Literature Tagging: A Cross-Field Cross-Model Study

02/07/2023
by Yu Zhang et al.

Due to the exponential growth of scientific publications on the Web, there is a pressing need to tag each paper with fine-grained topics so that researchers can track the fields of study they are interested in rather than drowning in the whole literature. Scientific literature tagging is more than a pure multi-label text classification task, because papers on the Web are commonly accompanied by metadata such as venues, authors, and references, which may serve as additional signals for inferring relevant tags. Although some studies have used metadata in academic paper classification, they are often restricted to one or two scientific fields (e.g., computer science and biomedicine) and to a single model. In this work, we systematically study the effect of metadata on scientific literature tagging across 19 fields. We select three representative multi-label classifiers (a bag-of-words model, a sequence-based model, and a pre-trained language model) and examine how their tagging performance changes when metadata are fed to them as additional features. We observe some patterns of metadata's effects that hold across all fields (e.g., venues are beneficial to paper tagging in almost all cases), as well as some patterns unique to fields other than computer science and biomedicine, which previous studies have not explored.
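The abstract describes feeding metadata to a classifier as additional features. As a minimal sketch, assuming venues, authors, and references are simply appended to the paper text as prefixed pseudo-words (a common trick, not necessarily the encoding the study uses), a bag-of-words tagger might look like the following. The one-vs-rest perceptron is a stand-in chosen for brevity, not one of the three classifiers the study evaluates; the function names (`metadata_tokens`, `featurize`, `train_one_vs_rest`, `predict`) and the `VENUE_`/`AUTHOR_`/`REF_` prefixes are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Set


def _norm(value: str) -> str:
    """Make a metadata value safe to use as a single pseudo-word."""
    return value.strip().replace(" ", "_")


def metadata_tokens(venue: str, authors: Sequence[str], references: Sequence[str]) -> List[str]:
    """Map metadata fields to prefixed pseudo-words (a hypothetical scheme)."""
    tokens = [f"VENUE_{_norm(venue)}"] if venue else []
    tokens += [f"AUTHOR_{_norm(a)}" for a in authors]
    tokens += [f"REF_{_norm(r)}" for r in references]
    return tokens


def featurize(text: str, venue: str = "", authors: Sequence[str] = (),
              references: Sequence[str] = ()) -> Counter:
    """Bag of words over the paper text plus its metadata pseudo-words."""
    return Counter(text.lower().split() + metadata_tokens(venue, authors, references))


def _score(w: Dict[str, float], x: Counter) -> float:
    """Dot product between one tag's weight vector and a document's counts."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_one_vs_rest(docs: List[Counter], labels: List[Set[str]],
                      all_labels: Iterable[str], epochs: int = 10) -> Dict[str, Dict[str, float]]:
    """Train one binary perceptron per tag (no bias term, for brevity)."""
    weights: Dict[str, Dict[str, float]] = {lab: {} for lab in all_labels}
    for _ in range(epochs):
        for x, y in zip(docs, labels):
            for lab, w in weights.items():
                target = 1 if lab in y else -1
                if target * _score(w, x) <= 0:  # misclassified or on the boundary
                    for f, v in x.items():
                        w[f] = w.get(f, 0.0) + target * v
    return weights


def predict(weights: Dict[str, Dict[str, float]], x: Counter) -> Set[str]:
    """Return every tag whose perceptron score is strictly positive."""
    return {lab for lab, w in weights.items() if _score(w, x) > 0}


if __name__ == "__main__":
    docs = [featurize("graph neural network", venue="KDD"),
            featurize("protein folding structure", venue="Bioinformatics")]
    tags = [{"data mining"}, {"biology"}]
    model = train_one_vs_rest(docs, tags, ["data mining", "biology"])
    # Unseen words, known venue: the venue pseudo-word alone decides the tag.
    print(predict(model, featurize("a new method", venue="KDD")))  # -> {'data mining'}
    # Same words without metadata: no feature has a learned weight.
    print(predict(model, featurize("a new method")))  # -> set()
```

In the toy run, a paper whose words never appeared in training is still tagged because its venue pseudo-word carries learned weight. This is only a miniature illustration of why venue signals can help, not evidence for the study's findings.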

research
04/28/2022

D3: A Massive Dataset of Scholarly Metadata for Analyzing the State of Computer Science Research

DBLP is the largest open-access repository of scientific articles on com...
research
12/01/2022

Analyzing the State of Computer Science Research with the DBLP Discovery Dataset

The number of scientific publications continues to rise exponentially, e...
research
12/04/2018

Quantification and Analysis of Scientific Language Variation Across Research Fields

Quantifying differences in terminologies from various academic domains h...
research
02/15/2021

MATCH: Metadata-Aware Text Classification in A Large Hierarchy

Multi-label text classification refers to the problem of assigning each ...
research
04/08/2018

Automatically assembling a full census of an academic field

The composition of the scientific workforce shapes the direction of scie...
research
05/23/2017

Calidad en repositorios digitales en Argentina, estudio comparativo y cualitativo (Quality in digital repositories in Argentina: a comparative and qualitative study)

Numerous institutions and organizations need not only to preserve the ma...
research
01/04/2021

Improving reference mining in patents with BERT

References in patents to scientific literature provide relevant informat...
