git2net - Mining Time-Stamped Co-Editing Networks from Large git Repositories

03/25/2019
by   Christoph Gote, et al.

Data from software repositories have become an important foundation for the empirical study of software engineering processes. A recurring theme in the repository mining literature is the inference of developer networks that capture, for example, collaboration, coordination, or communication from the commit history of projects. Most of the studied networks are based on the co-authorship of software artefacts defined at the level of files, modules, or packages. While this approach has led to insights into the social aspects of software development, it neglects the detailed information on code changes and code ownership contained in the commit log of software projects, such as which exact lines of code were authored by which developers. Addressing this issue, we introduce git2net, a scalable Python tool that facilitates the extraction of fine-grained co-editing networks from large git repositories. It uses text mining techniques to analyse the detailed history of textual modifications within files. This information allows us to construct directed, weighted, and time-stamped networks, where a link signifies that one developer has edited a block of source code originally written by another developer. We apply the tool in case studies of an open-source and a commercial software project. We argue that it opens up a massive new source of high-resolution data on human collaboration patterns.
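To make the network definition above concrete, the following minimal sketch builds a directed, weighted, time-stamped co-editing network from a handful of line-level edit records. It does not use or reproduce the git2net API: the `Edit` record, the `build_coediting_network` function, the developer names, and the timestamps are all hypothetical. The link direction (editor to original author) and the exclusion of self-edits are assumptions made for illustration, since the abstract does not specify either.

```python
from collections import defaultdict
from typing import NamedTuple


class Edit(NamedTuple):
    """Hypothetical record: `editor` changed a line last written by `original_author`."""
    editor: str
    original_author: str
    timestamp: int  # Unix time of the editing commit


def build_coediting_network(edits):
    """Return (time_stamped_edges, weights) for a directed co-editing network.

    Assumptions of this sketch: links point from the editor to the original
    author, and edits of one's own code are skipped.
    """
    time_stamped_edges = []
    weights = defaultdict(int)
    for edit in sorted(edits, key=lambda e: e.timestamp):
        if edit.editor == edit.original_author:
            continue  # self-edit: no link between two distinct developers
        time_stamped_edges.append(
            (edit.editor, edit.original_author, edit.timestamp)
        )
        # The link weight counts how often this ordered pair co-edited.
        weights[(edit.editor, edit.original_author)] += 1
    return time_stamped_edges, dict(weights)


# Made-up example data, not taken from any real repository.
edits = [
    Edit("bob", "alice", 1553500000),
    Edit("alice", "alice", 1553500100),
    Edit("bob", "alice", 1553500200),
    Edit("carol", "bob", 1553500300),
]
edges, weights = build_coediting_network(edits)
```

In a real pipeline the edit records would come from analysing the line-level history of each file rather than from a hand-written list. Keeping both the time-stamped edge list and the aggregated weights lets the same data serve temporal and static network analyses.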


research
11/21/2019

Analysing Time-Stamped Co-Editing Networks in Software Development Teams using git2net

Data from software repositories have become an important foundation for ...
research
04/17/2020

An Annotated Dataset of Stack Overflow Post Edits

To improve software engineering, software repositories have been mined f...
research
05/03/2022

Tooling for Time- and Space-efficient git Repository Mining

Software projects under version control grow with each commit, accumulat...
research
04/01/2022

The OCEAN mailing list data set: Network analysis spanning mailing lists and code repositories

Communication surrounding the development of an open source project larg...
research
02/23/2021

The SmartSHARK Repository Mining Data

The SmartSHARK repository mining data is a collection of rich and detail...
research
04/15/2020

Ownership at Large – Open Problems and Challenges in Ownership Management

Software-intensive organizations rely on large numbers of software asset...
research
06/01/2018

A Revision Control System for Image Editing in Collaborative Multimedia Design

Revision control is a vital component in the collaborative development o...
