Algorithmic labeling in hierarchical classifications of publications: Evaluation of bibliographic fields and term weighting approaches

04/17/2020
by Peter Sjögårde et al.

Algorithmic classifications of research publications can be used to study many different aspects of the science system, such as the organization of science into fields, the growth of fields, interdisciplinarity, and emerging topics. How to label the classes in these classifications is a problem that has not been thoroughly addressed in the literature. In this study, we evaluate different approaches to labeling the classes in algorithmically constructed classifications of research publications. We focus on two important choices: (1) the bibliographic fields from which terms are extracted and (2) the approach used to weight the relevance of terms. To evaluate these choices, we created two baselines: one based on the Medical Subject Headings in MEDLINE and another based on the Science-Metrix journal classification. We tested to what extent the different approaches yield the desired labels for the classes in the two baselines. Based on our results, we recommend extracting terms from titles and keywords to label classes at high levels of granularity (e.g., topics). At low levels of granularity (e.g., disciplines), we recommend extracting terms from journal names and author addresses. To calculate the relevance of terms, we recommend a new approach, the term frequency to specificity ratio.
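
To make the idea of weighting term relevance concrete, here is a minimal sketch of how candidate label terms for one class might be ranked. The abstract does not state the formula for the term frequency to specificity ratio, so the score below is an illustrative stand-in and not the authors' definition. The function name `rank_label_terms`, the parameter `alpha`, and the toy data are all assumptions made for this example.

```python
from collections import Counter


def rank_label_terms(class_docs, all_docs, alpha=0.5, top_n=3):
    """Rank candidate label terms for one class (illustrative only).

    class_docs: term lists for the publications in the class, e.g. terms
                extracted from titles and keywords.
    all_docs:   term lists for the whole collection, including class_docs.
    alpha:      hypothetical knob; values near 1 favor terms that are frequent
                in the class, values near 0 favor terms specific to it.
    """
    # Count, per term, the number of publications that contain it.
    in_class = Counter(t for doc in class_docs for t in set(doc))
    overall = Counter(t for doc in all_docs for t in set(doc))

    scores = {}
    for term, freq in in_class.items():
        coverage = freq / len(class_docs)    # share of class publications with the term
        specificity = freq / overall[term]   # share of the term's publications in this class
        # Assumed combination: a weighted geometric mean of the two shares.
        scores[term] = coverage ** alpha * specificity ** (1 - alpha)

    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


# Toy data: three publications in the class, four outside it.
class_docs = [["graphene", "oxide"], ["graphene", "sensor"], ["graphene", "method"]]
other_docs = [["method", "protein"], ["method", "sensor"],
              ["protein", "method"], ["cell", "method"]]

print(rank_label_terms(class_docs, class_docs + other_docs))
```

In this toy collection, "graphene" appears in every publication of the class and nowhere else, so it ranks first, while "method" is common across the whole collection and is pushed down. The same function could be fed terms from journal names or author addresses when labeling coarser classes.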

01/08/2018

Granularity of algorithmically constructed publication-level classifications of research publications: Identification of topics

The purpose of this study is to find a theoretically grounded, practical...
06/27/2018

Author-Based Analysis of Conference versus Journal Publication in Computer Science

Conference publications in computer science (CS) have attracted scholarl...
09/23/2019

The Golden Eras of Graphene Science and Technology: Bibliographic Evidences From Journal and Patent Publications

Today's scientific research is an expensive enterprise funded largely by...
01/03/2018

How the Taiwanese Do China Studies: Applications of Text Mining

With the rapid evolution of cross-strait situation, "Mainland China" as ...
04/30/2021

Content-based subject classification at article level in biomedical context

Subject classification is an important task to analyze scholarly publica...
10/15/2021

Improving overlay maps of science: combining overview and detail

Overlay maps of science are global base maps over which subsets of publi...