Towards More Realistic Evaluation for Neural Test Oracle Generation

05/26/2023
by Zhongxin Liu, et al.

Effective unit tests help guard and improve software quality but take substantial time and effort to write and maintain. A unit test consists of a test prefix and a test oracle. Synthesizing test oracles, especially functional oracles, is a well-known and challenging problem. Recent studies have proposed using neural models to generate test oracles, i.e., neural test oracle generation (NTOG), and have reported promising results. However, after a systematic inspection, we find that existing evaluation methods for NTOG use some inappropriate settings, which can give a misleading picture of how existing NTOG approaches perform. We summarize these settings as 1) generating test prefixes from bug-fixed program versions, 2) evaluating with an unrealistic metric, and 3) lacking a straightforward baseline. In this paper, we first investigate how these settings affect the evaluation and understanding of NTOG approaches. We find that 1) unrealistically generating test prefixes from bug-fixed program versions inflates the number of bugs found by the state-of-the-art NTOG approach TOGA by 61.8%, 2) the Precision of TOGA is only 0.38%, and 3) a straightforward baseline, NoException, which simply expects that no exception should be raised, can find 61% of the bugs found by TOGA with twice the Precision. Furthermore, we add a ranking step to existing evaluation methods and propose an evaluation metric named Found@K to better measure the cost-effectiveness of NTOG approaches. We propose a novel unsupervised ranking method to instantiate this ranking step, which significantly improves the cost-effectiveness of TOGA. Finally, we propose a more realistic evaluation method, TEval+, for NTOG and summarize seven rules of thumb to help move NTOG approaches toward practical use.
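To make the terms in the abstract concrete, here is a minimal sketch of a NoException-style oracle and of Precision. It is an illustration under assumptions, not the paper's implementation: the function names, the `buggy_divide` example, and the reading of Precision as the fraction of reported test failures that reveal a real bug are all hypothetical choices made here.

```python
from typing import Callable


def passes_no_exception_oracle(test_prefix: Callable[[], object]) -> bool:
    """Run a test prefix under a NoException-style oracle.

    The oracle only expects that no exception is raised, so the test
    passes if the prefix completes and fails otherwise.
    """
    try:
        test_prefix()
    except Exception:
        return False
    return True


def precision(bug_revealing_failures: int, false_alarm_failures: int) -> float:
    """Fraction of reported failures that reveal a real bug (assumed definition)."""
    total = bug_revealing_failures + false_alarm_failures
    return bug_revealing_failures / total if total else 0.0


# Hypothetical code under test: it has no guard for a zero divisor.
def buggy_divide(a: float, b: float) -> float:
    return a / b


# Two hypothetical test prefixes: each only sets up and calls the code.
def prefix_normal_input() -> float:
    return buggy_divide(6, 3)


def prefix_zero_divisor() -> float:
    return buggy_divide(1, 0)


if __name__ == "__main__":
    print(passes_no_exception_oracle(prefix_normal_input))
    print(passes_no_exception_oracle(prefix_zero_divisor))
    print(precision(1, 3))
```

The split between `test_prefix` (the setup and calls) and the oracle (the pass/fail check) mirrors the prefix/oracle structure described in the abstract; a neural approach would replace the fixed check with a generated assertion or expected-exception decision.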


