Determining the Intrinsic Structure of Public Software Development History

11/16/2020
by Antoine Pietri, et al.

Background. Collaborative software development has produced a wealth of version control system (VCS) data that can now be analyzed in full. Little is known about the intrinsic structure of the entire corpus of publicly available VCS data as an interconnected graph. Understanding this structure is needed to determine the best approach to analyzing the corpus in full and to avoid methodological pitfalls when doing so. Objective. We intend to determine the most salient network topology properties of public software development history as captured by VCS. We will explore: degree distributions, determining whether they are scale-free or not; the distribution of connected component sizes; and the distribution of shortest path lengths. Method. We will use Software Heritage, which is the largest corpus of public VCS data, compress it using webgraph compression techniques, and analyze it in memory using classic graph algorithms. Analyses will be performed both on the full graph and on relevant subgraphs. Limitations. The study is exploratory in nature; as such, no hypotheses on the findings are stated at this time. The chosen graph algorithms are expected to scale to the corpus size, but this will need to be confirmed experimentally. External validity will depend on how representative Software Heritage is of the software commons.
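To make the three measurements named in the Objective concrete, here is a minimal sketch in plain Python on a toy graph. It is not the study's implementation, which is described as running over a compressed in-memory graph at the scale of the whole archive. The adjacency-dict representation, the node names (`rev1`, `revA`, ...), and the choice to treat edges as undirected when computing components are all assumptions made for illustration.

```python
from collections import Counter, deque


def degree_distribution(adj):
    """Map each out-degree to the number of nodes having that out-degree."""
    return Counter(len(neighbors) for neighbors in adj.values())


def undirected_view(adj):
    """Return a copy of the graph in which every edge can be walked both ways."""
    und = {node: set() for node in adj}
    for src, dsts in adj.items():
        for dst in dsts:
            und[src].add(dst)
            und.setdefault(dst, set()).add(src)
    return und


def bfs_distances(adj, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def component_sizes(adj):
    """Sizes of connected components, ignoring edge direction (an assumption)."""
    und = undirected_view(adj)
    seen = set()
    sizes = []
    for node in und:
        if node not in seen:
            component = bfs_distances(und, node)
            seen.update(component)
            sizes.append(len(component))
    return sorted(sizes, reverse=True)


def path_length_distribution(adj):
    """Map each shortest-path length to the number of ordered node pairs at
    that distance, following edge direction. Runs one BFS per node, which is
    only practical on small graphs."""
    counts = Counter()
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                counts[d] += 1
    return counts


# Hypothetical toy history: two unrelated chains of revisions,
# each revision pointing to its parent.
toy = {
    "rev3": ["rev2"],
    "rev2": ["rev1"],
    "rev1": [],
    "revA": ["revB"],
    "revB": [],
}

if __name__ == "__main__":
    print("out-degree distribution:", dict(degree_distribution(toy)))
    print("component sizes:", component_sizes(toy))
    print("shortest path lengths:", dict(path_length_distribution(toy)))
```

Running one breadth-first search per node is quadratic in the worst case, so on a corpus-sized graph the path-length distribution would presumably have to be estimated by sampling sources or by an approximation technique; the sketch does not attempt that.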


