Replication Can Improve Prior Results: A GitHub Study of Pull Request Acceptance

02/09/2019
by Di Chen, et al.

Crowdsourcing and data mining can reduce the effort required to partially replicate and enhance qualitative studies. For example, in a primary study, other researchers explored factors influencing the fate of GitHub pull requests using an extensive qualitative analysis of 20 pull requests. Guided by their findings, we mapped some of their qualitative insights onto quantitative questions. To determine how well their findings generalize, we collected much more data (170 additional pull requests from 142 GitHub projects). Using crowdsourcing, we augmented that data with subjective human opinions about how pull requests extended the original issue. We then combined the crowd's answers with quantitative features and used data mining to build a predictor of whether code would be merged. That predictor was far more accurate than one built from the primary study's qualitative factors (F1 = 90% vs. 68%), illustrating the value of a mixed-methods approach and of replication for improving prior results. To test the generality of this approach, future work should conduct other studies that extend qualitative studies with crowdsourcing and data mining.
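The F1 figures quoted above are the harmonic mean of precision and recall. As a minimal sketch of how such a score could be computed for a merge predictor, the Python below uses made-up toy labels; the function name `f1_score`, the label lists, and the choice of "merged" as the positive class are illustrative assumptions, not the authors' code or data.

```python
def f1_score(actual, predicted):
    """Return F1 for the positive class (True = pull request merged).

    Hypothetical helper for illustration; not taken from the study.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)        # merged, predicted merged
    fp = sum(1 for a, p in pairs if not a and p)    # not merged, predicted merged
    fn = sum(1 for a, p in pairs if a and not p)    # merged, predicted not merged
    if tp == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented purely to exercise the function.
actual = [True, True, True, False, False, True]
predicted = [True, True, False, False, True, True]
score = f1_score(actual, predicted)
print(f"F1 = {score:.2f}")
```

Comparing two predictors this way only makes sense when both are scored on the same held-out pull requests.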

