Is Unit Testing Immune to Coincidental Correctness?
Researchers have previously shown that Coincidental Correctness (CC) is prevalent; however, the benchmarks they used are now considered inadequate. They have also recognized the negative impact of CC on the effectiveness of fault localization and testing. The aim of this paper is to study CC using more realistic code, mainly from the perspective of unit testing. This focus is motivated by the tremendous growth of unit testing in recent years, driven by the wide adoption of software development processes such as Test-Driven Development. We quantified the presence of CC in unit testing using the Defects4J benchmark. This entailed manually injecting two code checkers for each of the 395 defects in Defects4J: 1) a weak checker that detects weak CC tests by monitoring whether the defect was reached; and 2) a strong checker that detects strong CC tests by monitoring whether the defect was reached and the program transitioned into an infectious state. We also conducted preliminary experiments (using Defects4J, NanoXML and JTidy) to assess the pervasiveness of CC at the unit testing level compared with the integration and system levels. Our study showed that unit testing is not immune to CC, as it exhibited 7.2x more strong CC tests than failing tests and 8.3x more weak CC tests than failing tests. However, our preliminary results suggest that unit testing might be less prone to CC than integration testing and system testing.
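To make the two checkers concrete, the following is a minimal Java sketch of how a weak and a strong checker might be injected around a single defect. It is an illustration only and not the instrumentation used in the study: the method `clampedSum`, its seeded defect, the flags `reached` and `infected`, and the helper `classify` are all hypothetical names introduced here. The sketch assumes that a passing test counts as weak CC when the defect was reached, and as strong CC when the defect was reached and the program state diverged from what the fixed code would have produced.

```java
import java.util.function.BooleanSupplier;

public class CcCheckerSketch {
    // Flags set by the injected checkers (hypothetical instrumentation).
    public static boolean reached;
    public static boolean infected;

    // Hypothetical faulty method: should return min(a + b, cap).
    public static int clampedSum(int a, int b, int cap) {
        reached = true;                // weak checker: the defect location was executed
        int sum = a - b;               // seeded defect: the fixed code computes a + b
        if (sum != a + b) {
            infected = true;           // strong checker: state differs from the fixed version
        }
        return Math.min(sum, cap);     // clamping can mask the infection
    }

    // Runs one test (true means the test passed) and labels it using the checker flags.
    // A strong CC test also satisfies the weak condition; the most specific label is returned.
    public static String classify(BooleanSupplier test) {
        reached = false;
        infected = false;
        boolean passed = test.getAsBoolean();
        if (!passed) {
            return "failing";
        }
        if (reached && infected) {
            return "strong CC";
        }
        if (reached) {
            return "weak CC";
        }
        return "passing, defect not reached";
    }

    public static void main(String[] args) {
        // Infection propagates to the output, so the test fails.
        System.out.println(classify(() -> clampedSum(3, 2, 10) == 5));
        // State is infected (15 instead of 25), but clamping to 10 hides it.
        System.out.println(classify(() -> clampedSum(20, 5, 10) == 10));
        // Defect is reached, but a - b equals a + b when b is 0, so no infection occurs.
        System.out.println(classify(() -> clampedSum(3, 0, 10) == 3));
    }
}
```

In this sketch the three tests are labeled failing, strong CC, and weak CC respectively. The second test passes even though the intermediate state is wrong, which is the situation the strong checker is meant to expose.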