The Software Heritage Graph Dataset: Large-scale Analysis of Public Software Development History

11/16/2020
by Antoine Pietri, et al.

Software Heritage is the largest existing public archive of software source code and accompanying development history. It spans more than five billion unique source code files and one billion unique commits, coming from more than 80 million software projects. These software artifacts were retrieved from major collaborative development platforms (e.g., GitHub, GitLab) and package repositories (e.g., PyPI, Debian, NPM), and stored in a uniform representation linking together source code files, directories, commits, and full snapshots of version control system (VCS) repositories as observed by Software Heritage during periodic crawls. This dataset is unique in terms of accessibility and scale, and makes it possible to explore a number of research questions on the long tail of public software development, instead of focusing solely on "most starred" repositories, as often happens.

research
02/12/2021

The Software Heritage Filesystem (SwhFS): Integrating Source Code Archival with Development

We introduce the Software Heritage filesystem (SwhFS), a user-space file...
research
03/16/2021

LabelGit: A Dataset for Software Repositories Classification using Attributed Dependency Graphs

Software repository hosting services contain large amounts of open-sourc...
research
11/16/2020

Determining the Intrinsic Structure of Public Software Development History

Background. Collaborative software development has produced a wealth of ...
research
09/09/2019

A Systematic Review on Learning and Suggesting Source Code Changes in Version History

Software systems are in continuous evolution through source code changes...
research
08/31/2023

DevGPT: Studying Developer-ChatGPT Conversations

The emergence of large language models (LLMs) such as ChatGPT has disrup...
research
08/22/2023

The Software Heritage License Dataset (2022 Edition)

Context: When software is released publicly, it is common to include wit...
research
12/31/2018

Open Source Software Opportunities and Risks

Open Source Software (OSS) history is traced to initial efforts in 1971 ...
