Using Supervised Learning to Classify Metadata of Research Data by Discipline of Research

10/16/2019
by   Tobias Weber, et al.

Automated classification of metadata of research data by their discipline(s) of research can be used in scientometric research, by repository service providers, and in the context of research data aggregation services. Openly available metadata of the DataCite index for research data were used to compile a large training and evaluation set comprising 609,524 records, which is published alongside this paper. These data make it possible to assess classification approaches, such as tree-based models and neural networks, reproducibly. In our experiments with 20 base classes (multi-label classification), multi-layer perceptron models perform best with an f1-macro score of 0.760, closely followed by Long Short-Term Memory models (f1-macro score of 0.755). Possible applications of the trained classification models include the quantitative analysis of trends towards interdisciplinarity of digital scholarly output and the characterization of growth patterns of research data, stratified by discipline of research. Both applications perform at scale with the proposed models, which are available for re-use.
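To make the reported metric concrete, the following is a minimal sketch of how an f1-macro score can be computed for multi-label predictions: the F1 score is calculated separately for each label and the per-label scores are averaged without weighting. The function name `f1_macro`, the toy data with three labels, and the convention of scoring a label with no true and no predicted positives as 0 are assumptions made for illustration; they are not taken from the paper's evaluation code.

```python
from typing import Sequence


def f1_macro(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Unweighted mean of per-label F1 scores for multi-label indicator rows."""
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        denom = 2 * tp + fp + fn
        # Assumed convention: a label with no positives at all scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_labels


# Hypothetical toy data: 4 records, 3 labels (the paper uses 20 base classes).
y_true = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 1], [1, 0, 0], [0, 0, 1]]

score = f1_macro(y_true, y_pred)
print(round(score, 4))
```

Because every label contributes equally to the average, a model that performs poorly on a rarely used discipline is penalized as much as one that performs poorly on a frequent discipline.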


