The OCEAN mailing list data set: Network analysis spanning mailing lists and code repositories

by Melanie Warrick, et al.
The University of Vermont

Communication surrounding the development of an open source project largely occurs outside the software repository itself. Historically, large communities often used a collection of mailing lists to discuss the different aspects of their projects. Multimodal tool use, with software development and communication happening on different channels, complicates the study of open source projects as a sociotechnical system. Here, we combine and standardize mailing lists of the Python community, resulting in 954,287 messages from 1995 to the present. We share all scraping and cleaning code to facilitate reproduction of this work, as well as smaller datasets for the Golang (122,721 messages), Angular (20,041 messages) and Node.js (12,514 messages) communities. To showcase the usefulness of these data, we focus on the CPython repository and merge the technical layer (which GitHub account works on what file and with whom) with the social layer (messages from unique email addresses) by identifying 33% of GitHub contributors in the mailing list data. We then explore correlations between the valence of social messaging and the structure of the collaboration network. We discuss how these data provide a laboratory to test theories from standard organizational science in large open source projects.


Poster: Communication in Open-Source Projects--End of the E-mail Era?

Communication is essential in software engineering. Especially in distri...

Architecture Information Communication in Two OSS Projects: the Why, Who, When, and What

Architecture information is vital for Open Source Software (OSS) develop...

The List is the Process: Reliable Pre-Integration Tracking of Commits on Mailing Lists

A considerable corpus of research on software evolution focuses on minin...

LAGOON: An Analysis Tool for Open Source Communities

This paper presents LAGOON – an open source platform for understanding t...

Permutation Encoding for Text Steganography: A Short Tutorial

We explore a method of encoding secret messages using factoradic numberi...

git2net - Mining Time-Stamped Co-Editing Networks from Large git Repositories

Data from software repositories have become an important foundation for ...

Identifying Emergent Leadership in OSS Projects Based on Communication Styles

In open source software (OSS) communities, existing leadership indicator...