Representation of Developer Expertise in Open Source Software

05/20/2020
by Tapajit Dey, et al.

With tens of millions of projects and developers, the OSS ecosystem is both vibrant and intimidating. On one hand, it hosts the source code for the most critical infrastructure and counts the most brilliant developers among its contributors; on the other hand, poor-quality or even malicious software and novice developers abound. External contributions are critical to OSS projects, but the chances that these contributions are accepted, or even considered, depend on the trust between maintainers and contributors. Such trust is built over repeated interactions, and coding platforms facilitate it by providing signals of project or developer quality through measures of activity (commits) and social relationships (followers/stars). These signals, however, do not represent the specific expertise of a developer. We therefore aim to address this gap by defining a skill space for APIs, developers, and projects that reflects what developers know (and what projects need) more precisely than aggregate activity counts can, and more generally than pointing to the individual files developers have changed in the past. Specifically, we use the World of Code infrastructure to extract the complete set of APIs in the files changed by all open source developers. We use that data to represent APIs, developers, and projects in the skill space, and we evaluate whether alignment measures in the skill space can predict whether developers use new APIs, join new projects, or get their pull requests accepted. We also check whether the developers' representation in the skill space aligns with their self-reported expertise. Our results suggest that the proposed embedding in the skill space achieves our aims and may not only serve as a signal to increase the trust (and efficiency) of open source ecosystems but also allow more detailed investigations of other phenomena related to developer proficiency and learning.
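
The idea of an alignment measure in a skill space can be illustrated with a minimal sketch. It is not the embedding used in the paper: as a stated simplification, each developer or project is represented here by raw counts of the APIs found in the files they changed, and alignment is taken to be cosine similarity. The function names `skill_vector` and `alignment`, and the example API lists, are hypothetical.

```python
import math
from collections import Counter


def skill_vector(api_uses):
    """Build a sparse skill vector: API name -> number of uses.

    Assumption: raw counts stand in for a learned embedding.
    """
    return Counter(api_uses)


def alignment(a, b):
    """Cosine similarity between two sparse API-count vectors."""
    shared = a.keys() & b.keys()
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty vector has no direction, so report no alignment.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical data: APIs seen in files changed by a developer and two projects.
developer = skill_vector(["numpy", "pandas", "numpy", "requests"])
project_a = skill_vector(["numpy", "pandas", "scipy"])
project_b = skill_vector(["express", "lodash"])

score_a = alignment(developer, project_a)
score_b = alignment(developer, project_b)
```

Under these assumptions, `score_a` is higher than `score_b` because the developer shares APIs with the first project and none with the second. A learned embedding would additionally place related but non-identical APIs close together, which raw counts cannot do.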


