(Semi)automated disambiguation of scholarly repositories

07/05/2023
by Miriam Baglioni, et al.

The full exploitation of scholarly repositories is pivotal in modern Open Science, and scholarly repository registries play a key role in enabling researchers and research infrastructures to list and search for suitable repositories. However, since multiple registries exist, repository managers tend to register the repositories they manage in several of them to maximise their traction and visibility across different research communities, disciplines, and applications. These multiple registrations lead to information fragmentation and redundancy on the one hand and, on the other, force registry users to juggle multiple registries, profiles, and identifiers describing the same repository. Registries are aware of these problems and, whenever possible, claim equivalence between repository profiles by cross-referencing their identifiers across registries. However, as we will see, this “claim set” is far from complete, so many duplicates go undetected and may cause problems downstream. In this work, we combine such claims to create duplicate sets and extend them with the results of an automated clustering algorithm run over repository metadata descriptions. We then manually validate the results to produce an “as accurate as possible” de-duplicated dataset of scholarly repositories.
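
The step of combining equivalence claims into duplicate sets can be pictured as grouping identifiers that are linked, directly or transitively, by a claim. The sketch below is only an illustration under stated assumptions, not the authors' implementation: it assumes claims and clustering results are both available as pairs of identifiers, and it uses a union-find structure to merge them. The function name `build_duplicate_sets` and the identifiers such as `registryA:1` are hypothetical.

```python
from collections import defaultdict


def build_duplicate_sets(pairs):
    """Group identifiers into duplicate sets (hypothetical helper).

    `pairs` is an iterable of (id_a, id_b) tuples, each asserting that two
    registry profiles describe the same repository. Pairs may come from
    registry cross-references or from a clustering step; this sketch
    treats both sources the same way.
    """
    parent = {}

    def find(x):
        # Register unseen identifiers as their own root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = defaultdict(set)
    for identifier in list(parent):
        groups[find(identifier)].add(identifier)

    # A duplicate set needs at least two distinct profiles.
    return [members for members in groups.values() if len(members) > 1]


# Hypothetical input: cross-reference claims plus one pair from clustering.
claim_pairs = [
    ("registryA:1", "registryB:7"),
    ("registryB:7", "registryC:3"),
    ("registryA:2", "registryB:9"),
]
cluster_pairs = [("registryC:3", "registryD:4")]

duplicate_sets = build_duplicate_sets(claim_pairs + cluster_pairs)
for members in duplicate_sets:
    print(sorted(members))
```

Because the merge is transitive, a pair found only by clustering can extend a set that the claims had already started. This is also why a manual validation pass, as described in the abstract, matters: a single wrong pair would join two otherwise unrelated sets.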

research
07/26/2022

"Knock knock! Who's there?" A study on scholarly repositories' availability

Scholarly repositories are the cornerstone of modern open science, and t...
research
09/19/2023

AstroPortal: An ontology repository concept for astronomy, astronautics and other space topics

This paper describes a repository for ontologies of astronomy, astronaut...
research
10/25/2021

Generating GitHub Repository Descriptions: A Comparison of Manual and Automated Approaches

Given the vast number of repositories hosted on GitHub, project discover...
research
04/18/2018

ArXiv and the REF open access policy

HEFCE's Policy for open access in the post-2014 Research Excellence Fram...
research
04/12/2023

Constructing a Searchable Knowledge Repository for FAIR Climate Data

The development of a knowledge repository for climate science data is a ...
research
08/23/2023

Understanding differences of the OA uptake within the Germany university landscape (2010-2020) – Part 2: repository-provided OA

This study investigates the determinants for the uptake of institutional...
research
09/29/2021

SliceHub: Augmenting Shared 3D Model Repositories with Slicing Results for 3D Printing

In this paper, we explore how to augment shared 3D model repositories, s...