Tests4Py: A Benchmark for System Testing

07/11/2023
by Marius Smytzek, et al.

Benchmarks are among the main drivers of progress in software engineering research, especially in software testing and debugging. However, current benchmarks in this field are poorly suited to specific research tasks: they rely on weak system oracles such as crash detection, come with only a few unit tests, require further elaboration before use, or cannot verify the outcome of system tests. Our Tests4Py benchmark addresses these issues. It is derived from the popular BugsInPy benchmark and includes 30 bugs from 5 real-world Python applications. Each subject in Tests4Py comes with an oracle that verifies the functional correctness of system inputs. It also enables the generation of both system tests and unit tests, which supports qualitative studies of essential aspects of test sets as well as extensive evaluations. These opportunities make Tests4Py a next-generation benchmark for research in test generation, debugging, and automatic program repair.

research
08/28/2018

Is Unit Testing Immune to Coincidental Correctness?

Researchers have previously shown that Coincidental Correctness (CC) is ...
research
05/28/2019

Empirical Review of Java Program Repair Tools: A Large-Scale Experiment on 2,141 Bugs and 23,551 Repair Attempts

In the past decade, research on test-suite-based automatic program repai...
research
08/28/2018

Does the Testing Level affect the Prevalence of Coincidental Correctness?

Researchers have previously shown that Coincidental Correctness (CC) is ...
research
05/07/2023

No More Manual Tests? Evaluating and Improving ChatGPT for Unit Test Generation

Unit testing is essential in detecting bugs in functionally-discrete pro...
research
04/12/2022

Toward Granular Automatic Unit Test Case Generation

Unit testing verifies the presence of faults in individual software comp...
research
05/08/2023

ChatUniTest: a ChatGPT-based automated unit test generation tool

Unit testing is a crucial, yet often tedious and time-consuming task. To...
research
05/26/2023

Towards More Realistic Evaluation for Neural Test Oracle Generation

Effective unit tests can help guard and improve software quality but req...
