The OCEAN mailing list data set: Network analysis spanning mailing lists and code repositories

04/01/2022
by Melanie Warrick, et al.
Google
The University of Vermont

Communication surrounding the development of an open source project largely occurs outside the software repository itself. Historically, large communities often used a collection of mailing lists to discuss the different aspects of their projects. Multimodal tool use, with software development and communication happening on different channels, complicates the study of open source projects as sociotechnical systems. Here, we combine and standardize mailing lists of the Python community, resulting in 954,287 messages from 1995 to the present. We share all scraping and cleaning code to facilitate reproduction of this work, as well as smaller datasets for the Golang (122,721 messages), Angular (20,041 messages) and Node.js (12,514 messages) communities. To showcase the usefulness of these data, we focus on the CPython repository and merge the technical layer (which GitHub account works on what file and with whom) with the social layer (messages from unique email addresses) by identifying 33 [...]. We explore correlations between the valence of social messaging and the structure of the collaboration network. We discuss how these data provide a laboratory to test theories from standard organizational science in large open source projects.
