Defectors: A Large, Diverse Python Dataset for Defect Prediction

03/08/2023
by Parvez Mahbub et al.

Defect prediction is a popular research topic in which machine learning (ML) and deep learning (DL) have found numerous applications. However, ML/DL-based defect prediction models are often limited by the quality and size of their datasets. In this paper, we present Defectors, a large dataset for just-in-time and line-level defect prediction. Defectors consists of ≈ 213K source code files (≈ 93K defective and ≈ 120K defect-free) that span 24 popular Python projects. These projects come from 18 different domains, including machine learning, automation, and internet-of-things. This scale and diversity make Defectors well suited to training ML/DL models, especially transformer models, which require large and diverse datasets. We also foresee several applications of the dataset, including defect prediction and defect explanation. Dataset link: https://doi.org/10.5281/zenodo.7708984
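Because the abstract reports approximate class sizes (≈ 93K defective, ≈ 120K defect-free), a quick calculation shows how balanced the labels are, which matters when training a classifier. The sketch below uses only those rounded counts, so every figure it produces is approximate. The `class_balance` helper is hypothetical, and the inverse-frequency weighting (the same `n_samples / (n_classes * count)` formula scikit-learn uses for `class_weight="balanced"`) is an illustrative assumption, not something the paper prescribes.

```python
# Approximate file counts reported in the abstract, in thousands.
# These are rounded figures, so all derived values are approximate too.
DEFECTIVE_K = 93
CLEAN_K = 120


def class_balance(defective: float, clean: float) -> dict[str, float]:
    """Hypothetical helper: per-class shares and inverse-frequency weights."""
    total = defective + clean
    return {
        "total": total,
        "defective_share": defective / total,
        "clean_share": clean / total,
        # Inverse-frequency weights: each class's weight * count equals total / 2,
        # so the weighted counts of both classes sum back to the total.
        "defective_weight": total / (2 * defective),
        "clean_weight": total / (2 * clean),
    }


if __name__ == "__main__":
    stats = class_balance(DEFECTIVE_K, CLEAN_K)
    for name, value in stats.items():
        print(f"{name}: {value:.3f}")
```

With these counts, roughly 44% of the files are defective, so the dataset is only moderately imbalanced; whether class weighting helps in practice would depend on the model and evaluation setup.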
