Summarising Big Data: Common GitHub Dataset for Software Engineering Challenges

06/08/2020
by   Abdulkadir Seker, et al.
In open-source software development environments, the textual, numerical, and relationship-based data generated are of interest to researchers. Various datasets are available for this data, which is frequently used in fields such as software engineering and natural language processing. However, because these datasets contain all the data in the environment, working with them means processing terabytes of data. For this reason, almost all studies that use GitHub data work with a subset filtered according to certain criteria. Because each study therefore uses a different dataset, comparing the accuracy of the studies is quite difficult. To solve this problem, a common dataset was created and shared with researchers, allowing work on many software engineering problems.

03/24/2020

A Systematic Mapping of Software Engineering Challenges: GHTorrent Case

Git is used as the distributed version control system for many open-sour...
01/02/2020

Dataset of Video Game Development Problems

Different from traditional software development, there is little informa...
02/23/2021

Data Engineering for Everyone

Data engineering is one of the fastest-growing fields within machine lea...
07/13/2021

Promises and Perils of Inferring Personality on GitHub

Personality plays a pivotal role in our understanding of human actions a...
03/30/2020

Repository for Reusing Artifacts of Artificial Neural Networks

Artificial Neural Networks (ANNs) replaced conventional software systems...
08/31/2018

Total Recall, Language Processing, and Software Engineering

A broad class of software engineering problems can be generalized as the...