Characterizing Bugs in Python and R Data Analytics Programs

by Shibbir Ahmed, et al.

R and Python are among the most popular languages used in many critical data analytics tasks. However, we still do not fully understand the capabilities of these two languages with respect to bugs encountered in data analytics tasks. What types of bugs are common? What are the main root causes? What is the relation between bugs and root causes? How can these bugs be mitigated? We present a comprehensive study of 5,068 Stack Overflow posts, 1,800 bug fix commits from GitHub repositories, and several GitHub issues of the most used libraries to understand bugs in R and Python. Our key findings include: while both R and Python have bugs due to inexperience with data analysis, Python sees significantly more data preprocessing bugs than R. Developers experience significantly more data flow bugs in R because intermediate results are often implicit. We also found that changes and bugs in packages and libraries cause more bugs in R than in Python, while package or library misselection and conflicts cause more bugs in Python than in R. While R has a slightly higher readability barrier for data analysts, the statistical power of R leads to fewer bad performance bugs. In terms of data visualization, R packages have significantly more bugs than Python libraries. We also identified a strong correlation between comparable packages in R and Python despite their linguistic and methodological differences. Lastly, we contribute a large dataset of manually verified R and Python bugs.




