Patch Quality and Diversity of Invariant-Guided Search-Based Program Repair
Most automatic program repair techniques rely on test cases to specify correct program behavior. Because test cases frequently cover the desired behavior only incompletely, however, patches often overfit to the tests and fail to generalize to broader requirements. Moreover, when perfectly correct patches cannot be guaranteed, methods for improving patch quality, such as merging several patches or having a human evaluate patch recommendations, benefit from access to a diverse set of patches, which makes patch diversity a potentially useful trait. We evaluate the correctness and diversity of patches generated by GenProg and by an invariant-based, diversity-enhancing extension described in our prior work. We find no evidence that promoting diversity changes patch correctness in either a positive or a negative direction. Using metrics for semantic diversity driven by invariants and by test case generation, we observe no semantic differences between patches for most bugs, regardless of the repair technique used.
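To make the idea of an invariant-driven diversity metric concrete, the following is a minimal sketch and not the metric used in this work. It assumes, purely for illustration, that each patch is summarized by a set of inferred invariants written as strings, and it scores a group of patches by their mean pairwise Jaccard distance. The function names, the patch names, and the example invariants are all hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b|; two empty sets count as identical."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def mean_pairwise_diversity(invariant_sets):
    """Average Jaccard distance over all unordered pairs of patches."""
    pairs = list(combinations(invariant_sets, 2))
    if not pairs:
        # Zero or one patch: there is nothing to compare.
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical invariant sets for three patches to the same bug.
patches = {
    "patch_a": {"x >= 0", "result == x + 1"},
    "patch_b": {"x >= 0", "result == x + 1"},
    "patch_c": {"x >= 0", "result > x"},
}

diversity = mean_pairwise_diversity(list(patches.values()))
print(f"mean pairwise invariant distance: {diversity:.3f}")
```

Under this assumed metric, a score of 0 means every pair of patches exhibits the same invariant set, which corresponds to the "no semantic differences" outcome reported for most bugs. A test case generation-driven variant could compare patch outputs on generated inputs instead of comparing invariant sets.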