A Tool to Extract Structured Data from GitHub

12/07/2020
by Shreyansh Surana, et al.

GitHub repositories contain detailed information about a project: its contributors, the number of commits and their contributors, releases, pull requests, programming languages, and issues. However, no systematic dataset of open source projects exists that captures this detailed repository information for knowledge acquisition and mining. In this paper, we present a tool, named GitRepository, that helps create a dataset of repositories based on the proposed schema. Out of an initial 1680 repositories, the dataset hosts 620 repositories after applying basic filters on stars and forks, and 247 repositories after applying all pre-defined filters. The tool extracts information from GitHub repositories and saves the data as CSV files and a database (.DB) file.
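The filter-then-export flow described in the abstract can be illustrated with a minimal sketch. This is not the GitRepository implementation: the column names, the thresholds passed to `filter_repos`, the table name `repositories`, and the choice of SQLite for the database file are all assumptions made for illustration, since the abstract does not specify the schema or the filter values.

```python
import csv
import sqlite3

# Hypothetical columns; the paper's actual schema is not given in the abstract.
FIELDS = ["full_name", "stars", "forks", "language"]


def filter_repos(repos, min_stars, min_forks):
    """Keep repositories that meet both the star and the fork threshold."""
    return [
        r for r in repos
        if r["stars"] >= min_stars and r["forks"] >= min_forks
    ]


def write_csv(repos, fileobj):
    """Write one header row plus one row per repository to an open text file."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(repos)


def write_db(repos, conn):
    """Store the repositories in a SQLite table (assumed database format)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS repositories ("
        "full_name TEXT PRIMARY KEY, stars INTEGER, "
        "forks INTEGER, language TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO repositories "
        "VALUES (:full_name, :stars, :forks, :language)",
        repos,
    )
    conn.commit()
```

In use, the records would come from whatever extraction step precedes this one. A caller would pass the filtered list to `write_csv` with a file opened using `newline=""` and to `write_db` with a connection from `sqlite3.connect(...)`.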

03/08/2021

Sampling Projects in GitHub for MSR Studies

Almost every Mining Software Repositories (MSR) study requires, as first...
07/05/2018

An Insight into the Pull Requests of GitHub

Given the increasing number of unsuccessful pull requests in GitHub proj...
07/06/2020

Sosed: a tool for finding similar software projects

In this paper, we present Sosed, a tool for discovering similar software...
02/25/2021

What's in a GitHub Repository? – A Software Documentation Perspective

Developers use and contribute to repositories on GitHub. Documentation p...
06/08/2017

Optimal parameters for bloom-filtered joins in Spark

In this paper, we present an algorithm that joins relational database ta...
10/01/2019

Beyond Textual Issues: Understanding the Usage and Impact of GitHub Reactions

Recently, GitHub introduced a new social feature, named reactions, which...
02/23/2021

The SmartSHARK Repository Mining Data

The SmartSHARK repository mining data is a collection of rich and detail...