Categorizing the Content of GitHub README Files

02/20/2018
by Gede Artha Azriadi Prana, et al.

README files play an essential role in shaping a developer's first impression of a software repository and in documenting the software project that the repository hosts. Yet we lack a systematic understanding of the content of a typical README file, as well as tools that can process these files automatically. To close this gap, we conduct a qualitative study involving the manual annotation of 4,226 README file sections from 393 randomly sampled GitHub repositories, and we design and evaluate a classifier and a set of features that can categorize these sections automatically. We find that information discussing the 'What' and 'How' of a repository is very common, while many README files lack information regarding the purpose and status of a repository. Our multi-label classifier, which can predict eight different categories, achieves an F1 score of 0.746. This work enables the owners of software repositories to improve the quality of their documentation, and it has the potential to make it easier for the software development community to discover relevant information in GitHub README files.
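
To make the reported metric concrete, the following is a minimal sketch of how an F1 score can be computed for multi-label predictions, where each README section may carry several categories at once. It is not the authors' evaluation code. The abstract does not state which averaging scheme was used, so micro-averaging is an assumption here, and the function name `micro_f1`, the label names, and the toy data are all hypothetical.

```python
from typing import List, Set


def micro_f1(true_labels: List[Set[str]], predicted_labels: List[Set[str]]) -> float:
    """Micro-averaged F1 over label sets (assumed averaging, not from the paper)."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        tp += len(truth & pred)   # labels correctly predicted
        fp += len(pred - truth)   # labels predicted but not annotated
        fn += len(truth - pred)   # labels annotated but missed
    if tp == 0:
        return 0.0                # avoids division by zero when nothing matches
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations for three README sections (illustrative only).
truth = [{"What", "How"}, {"How"}, {"Why", "Status"}]
pred = [{"What"}, {"How", "Who"}, {"Why", "Status"}]
score = micro_f1(truth, pred)
print(round(score, 3))
```

Each section is represented as a set of labels, so a section can be partly right: the first toy section is credited for one matching label and penalized for the one it misses.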

research
10/18/2020

Topic Recommendation for Software Repositories using Multi-label Classification Algorithms

Many platforms exploit collaborative tagging to provide their users with...
research
08/18/2021

Generation of TypeScript Declaration Files from JavaScript Code

Developers are starting to write large and complex applications in TypeS...
research
05/27/2020

Github Data Exposure and Accessing Blocked Data using the GraphQL Security Design Flaw

This research study was conducted to illustrate how it is easily possibl...
research
11/10/2019

A Multimodal CNN-based Tool to Censure Inappropriate Video Scenes

Due to the extensive use of video-sharing platforms and services for the...
research
02/12/2021

Same File, Different Changes: The Potential of Meta-Maintenance on GitHub

Online collaboration platforms such as GitHub have provided software dev...
research
02/25/2021

What's in a GitHub Repository? – A Software Documentation Perspective

Developers use and contribute to repositories on GitHub. Documentation p...
research
10/25/2021

Generating GitHub Repository Descriptions: A Comparison of Manual and Automated Approaches

Given the vast number of repositories hosted on GitHub, project discover...