Soft-Search: Two Datasets to Study the Identification and Production of Research Software

02/27/2023
by Eva Maxfield Brown, et al.

Software is an important tool for scholarly work, but software produced for research is often difficult to identify or discover. A potential first step in linking research to software is software identification. In this paper, we present two datasets for studying the identification and production of research software. The first dataset contains almost 1000 human-labeled annotations of software production from research projects funded by the National Science Foundation (NSF). We use this dataset to train models that predict software production. We create the second dataset by applying the trained models to the abstracts and project outcomes reports of all NSF-funded projects between 2010 and 2023. The result is an inferred dataset of software production for over 150,000 NSF awards. We release the Soft-Search dataset to aid in identifying and understanding research software production: https://github.com/si2-urssi/eager

research
07/06/2020

Sosed: a tool for finding similar software projects

In this paper, we present Sosed, a tool for discovering similar software...
research
07/20/2023

The Changing Role of RSEs over the Lifetime of Parsl

This position paper describes the Parsl open source research software pr...
research
12/02/2020

Production Monitoring to Improve Test Suites

Software testing ensures that a software system behaves as intended. In ...
research
03/29/2022

Demystifying Software Release Note Issues on GitHub

Release notes (RNs) summarize main changes between two consecutive softw...
research
06/12/2018

Next generation portal for federated testbeds MySlice v2: from prototype to production

A number of projects in computer science around the world have contribut...
research
09/30/2021

Towards a modern CMake workflow

Modern CMake offers the features to manage versatile and complex project...
research
04/22/2022

S2AMP: A High-Coverage Dataset of Scholarly Mentorship Inferred from Publications

Mentorship is a critical component of academia, but is not as visible as...
