GitHub Repositories with Links to Academic Papers: Open Access, Traceability, and Evolution

Traceability between published scientific breakthroughs and their implementation is essential, especially when Open Source Software implements bleeding-edge science in its code. However, establishing links between GitHub repositories and academic papers can prove difficult, and the impact of these links remains unknown. This paper investigates the role of academic paper references contained in these repositories. We conducted a large-scale study of 20,000 GitHub repositories to establish the prevalence of references to academic papers. We use a mixed-methods approach to identify Open Access (OA), traceability, and evolutionary aspects of the links. Although referencing a paper is not typical, we find that a vast majority of referenced academic papers are OA. In terms of traceability, our analysis revealed that machine learning is the most prevalent topic among these repositories, which tend to be affiliated with academic communities. More than half of the papers do not link back to any repository. A case study of referenced arXiv papers shows that most of these papers are high-impact and influential, align with academia, and are referenced by repositories written in different programming languages. From the evolutionary aspect, we find very few changes to the papers being referenced or to the links to them.




9.6 Million Links in Source Code Comments: Purpose, Evolution, and Decay

Links are an essential feature of the World Wide Web, and source code re...

paper2repo: GitHub Repository Recommendation for Academic Papers

GitHub has become a popular social application platform, where a large n...

Automatic Academic Paper Rating Based on Modularized Hierarchical Convolutional Neural Network

As more and more academic papers are being submitted to conferences and ...

How do influential and non-influential papers spread online?

Social media has become an important channel for publicizing academic re...

Assessing the quality of sources in Wikidata across languages: a hybrid approach

Wikidata is one of the most important sources of structured data on the ...

Reddit-TUDFE: practical tool to explore Reddit usability in data science and knowledge processing

This contribution argues that Reddit, as a massive, categorized, open-ac...

Memetic search for overlapping topics based on a local evaluation of link communities

In spite of recent advances in field delineation methods, bibliometricia...
