Geographic Diversity in Public Code Contributions

03/29/2022
by Davide Rossi, et al.

We conduct an exploratory, large-scale, longitudinal study of 50 years of commits to publicly available version control system repositories, in order to characterize the geographic diversity of contributors to public code and its evolution over time. In total, we analyze 2.2 billion commits collected by Software Heritage from 160 million projects and authored by 43 million authors during the 1971-2021 time period. We geolocate developers to 12 world regions derived from the United Nations geoscheme, using as signals email top-level domains, author names compared with name distributions around the world, and UTC offsets mined from commit metadata. We find evidence of the early dominance of North America in open source software, later joined by Europe. After that period, the geographic diversity in public code has been constantly increasing. We also identify relevant historical shifts related to the UNIX wars, the increase of coding literacy in Central and South Asia, and broader phenomena like colonialism and the movement of people across countries (immigration/emigration).
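As a rough illustration of two of the signals named in the abstract (email top-level domains and UTC offsets), the following minimal sketch shows how they might be extracted from commit metadata. It is not the authors' pipeline: the function names, the tiny `CCTLD_TO_REGION` table, and its region labels are hypothetical placeholders, and the study's actual 12-region mapping is not reproduced here.

```python
from typing import Optional

# Hypothetical, deliberately tiny lookup table for illustration only.
# The study's real mapping to 12 world regions is not reproduced here.
CCTLD_TO_REGION = {
    "it": "Southern Europe",
    "fr": "Western Europe",
    "jp": "Eastern Asia",
    "br": "South America",
    "in": "Southern Asia",
}


def email_tld(email: str) -> Optional[str]:
    """Return the lowercase top-level domain of an email address, or None."""
    _, sep, domain = email.rpartition("@")
    if not sep or "." not in domain:
        return None
    # An empty TLD (address ending in ".") is treated as missing.
    return domain.rsplit(".", 1)[1].lower() or None


def utc_offset_minutes(git_tz: str) -> Optional[int]:
    """Parse a Git-style timezone such as '+0200' or '-0530' into minutes."""
    digits = git_tz[1:]
    if len(git_tz) != 5 or git_tz[0] not in "+-":
        return None
    if not (digits.isascii() and digits.isdigit()):
        return None
    sign = 1 if git_tz[0] == "+" else -1
    return sign * (int(digits[:2]) * 60 + int(digits[2:]))


def guess_region(email: str) -> Optional[str]:
    """Look up a region from the email's country-code TLD, if it is known."""
    tld = email_tld(email)
    return CCTLD_TO_REGION.get(tld) if tld else None
```

A generic domain such as `.com` carries no country information, so `guess_region` returns `None` for it. Presumably this is one reason to combine several signals, as the abstract describes, rather than rely on any single one.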


