More Effective Software Repository Mining

08/08/2020
by Adam Tutko, et al.

Background: Mining and analyzing public Git software repositories is a growing research field. The tools used for studies that investigate a single project or a group of projects have been refined, but it is not clear whether the results obtained on such “convenience samples” generalize. Aims: This paper aims to elucidate the difficulties faced by researchers who would like to ascertain the generalizability of their findings, and to introduce an interface that addresses the issues with obtaining representative samples. Results: We explore how to exploit the World of Code system to make software repository sampling and analysis much more accessible. Specifically, we present a resource for Mining Software Repositories researchers that is intended to simplify the data sampling and retrieval workflow and, through that, increase the validity and completeness of data. Conclusions: This system has the potential to provide researchers with a resource that greatly eases data retrieval and addresses many of the outstanding issues with data sampling.

research
01/06/2020

The SmartSHARK Ecosystem for Software Repository Mining

Software repository mining is the foundation for many empirical software...
research
11/30/2020

Toward a Benchmark Repository for Software Maintenance Tool Evaluations with Humans

To evaluate software maintenance techniques and tools in controlled expe...
research
08/11/2020

GraphRepo: Fast Exploration in Software Repository Mining

Mining and storage of data from software repositories is typically done ...
research
10/08/2021

A Mining Software Repository Extended Cookbook: Lessons learned from a literature review

The main purpose of Mining Software Repositories (MSR) is to discover th...
research
01/11/2021

What Affects Team Behavior? Preliminary Linguistic Analysis of Communications in the Jazz Repository

There is a growing belief that understanding and addressing the human pr...
research
02/23/2021

The SmartSHARK Repository Mining Data

The SmartSHARK repository mining data is a collection of rich and detail...
research
06/07/2021

Adopting Softer Approaches in the Study of Repository Data: A Comparative Analysis

Context: Given the acknowledged need to understand the people processes ...
