A Tool to Extract Structured Data from GitHub

12/07/2020
by Shreyansh Surana, et al.

GitHub repositories contain detailed information about a project: its contributors, commits, releases, pull requests, programming languages, and issues. However, no systematic dataset of open-source projects exists that captures this detailed repository information for knowledge acquisition and mining. In this paper, we present a tool, named GitRepository, that helps create a dataset of repositories based on a proposed schema. Out of an initial 1680 repositories, the dataset hosts 620 repositories (after applying basic filters on stars and forks) and 247 repositories (after applying all pre-defined filters). The tool extracts information from GitHub repositories and saves the data as CSV files and a database (.DB) file.
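
As a rough illustration of the filter-then-export flow the abstract describes, the following is a minimal sketch, not the GitRepository implementation. The column names in `FIELDS`, the table name `repositories`, the threshold values, the output file names, and the sample records are all hypothetical; the abstract does not state the actual schema or filter values.

```python
import csv
import sqlite3

# Hypothetical columns; the paper's actual schema is not given in the abstract.
FIELDS = ["full_name", "stars", "forks", "language"]


def filter_repos(repos, min_stars, min_forks):
    """Keep repositories that meet both thresholds (values are assumptions)."""
    return [r for r in repos
            if r["stars"] >= min_stars and r["forks"] >= min_forks]


def save_csv(repos, path):
    """Write the selected columns of each repository record to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        for r in repos:
            writer.writerow({k: r[k] for k in FIELDS})


def save_sqlite(repos, path):
    """Store the same records in a SQLite database file."""
    conn = sqlite3.connect(path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS repositories ("
                "full_name TEXT PRIMARY KEY, stars INTEGER, "
                "forks INTEGER, language TEXT)"
            )
            conn.executemany(
                "INSERT OR REPLACE INTO repositories VALUES (?, ?, ?, ?)",
                [tuple(r[k] for k in FIELDS) for r in repos],
            )
    finally:
        conn.close()


if __name__ == "__main__":
    # Made-up records standing in for data fetched from GitHub.
    sample = [
        {"full_name": "org/alpha", "stars": 500, "forks": 40, "language": "Python"},
        {"full_name": "org/beta", "stars": 12, "forks": 3, "language": "Go"},
    ]
    kept = filter_repos(sample, min_stars=100, min_forks=10)
    save_csv(kept, "repositories.csv")
    save_sqlite(kept, "repositories.db")
```

Keeping the filter separate from the two writers means the same filtered list feeds both output formats, which mirrors the CSV-plus-database output mentioned above.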


research
03/16/2023

Wasmizer: Curating WebAssembly-driven Projects on GitHub

WebAssembly has attracted great attention as a portable compilation targ...
research
03/08/2021

Sampling Projects in GitHub for MSR Studies

Almost every Mining Software Repositories (MSR) study requires, as first...
research
04/02/2023

GitHub OSS Governance File Dataset

Open-source Software (OSS) has become a valuable resource in both indust...
research
03/12/2023

SecretBench: A Dataset of Software Secrets

According to GitGuardian's monitoring of public GitHub repositories, the...
research
06/21/2022

An Empirical Study On Correlation between Readme Content and Project Popularity

Readme in GitHub repositories serves as a preliminary source of informat...
research
07/06/2020

Sosed: a tool for finding similar software projects

In this paper, we present Sosed, a tool for discovering similar software...
research
06/08/2017

Optimal parameters for bloom-filtered joins in Spark

In this paper, we present an algorithm that joins relational database ta...
