Improving The Effectiveness of Automatically Generated Test Suites Using Metamorphic Testing

04/18/2020 ∙ by Prashanta Saha, et al. ∙ Montana State University

Automated test generation has helped to reduce the cost of software testing. However, developing effective test oracles for these automatically generated test inputs is a challenging task. Therefore, most automated test generation tools use trivial oracles that reduce the fault detection effectiveness of these automatically generated test cases. In this work, we provide results of an empirical study showing that utilizing metamorphic relations can increase the fault detection effectiveness of automatically generated test cases.







1. Introduction

Software testing is a costly yet essential activity for detecting faults. Typically in testing, an oracle is used to check whether the output produced for a given test input is correct (Weyuker, 1982). Much work has been done on automated test case generation, including the development of publicly available tools (Fraser and Arcuri, 2011). The main focus of this work has been on developing efficient methods for generating test inputs to achieve a particular target, such as code coverage or weak mutation score (Saha and Kanewala, 2018). However, relatively little attention has been paid to providing effective oracles that improve the fault detection effectiveness of these automatically generated test cases.

Figure 1. (a) EvoSuite Generated Test Case, (b) Modified Test Case with MR in MT

Metamorphic Testing (MT) is a technique proposed to alleviate the oracle problem for software under test (SUT) (Chen et al., 1998). It is based on the idea that it is often easier to predict relations between the outputs of a program than to fully specify its input-output behavior. Such a relation is called a Metamorphic Relation (MR) in MT, and is a necessary property of the SUT that specifies a relationship between multiple inputs and their outputs (Chen et al., 2018).
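The idea can be illustrated with a minimal sketch (not taken from the paper): the trigonometric identity sin(x) = sin(π − x) serves as an MR, so the outputs of a source execution and a follow-up execution can be checked against each other without knowing the exact expected value of either.

```java
// Minimal illustration of a metamorphic relation (MR), assuming the
// standard identity sin(x) = sin(pi - x). The MR acts as a partial
// oracle: we compare two outputs instead of predicting either one.
public class SineMR {
    // Returns true when the MR holds (up to floating-point tolerance)
    // for the given source input x.
    static boolean mrHolds(double x) {
        double sourceOut = Math.sin(x);              // source execution
        double followUpOut = Math.sin(Math.PI - x);  // follow-up execution
        return Math.abs(sourceOut - followUpOut) < 1e-9;
    }

    public static void main(String[] args) {
        System.out.println(mrHolds(0.7) ? "MR holds" : "MR violated");
    }
}
```

A faulty implementation of sin would typically break this relation for at least some inputs, which is what makes the MR useful as an oracle.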

Automatically generated test suites have certain advantages over manually written test cases, in particular saving human labor and time. Prior work has shown that test cases generated to satisfy coverage criteria can be more effective than randomly generated test cases (Pacheco and Ernst, 2007). However, because the test inputs are generated automatically, defining oracles for them is hard; it is an instance of the oracle problem. As a result, many automatically generated test cases contain only trivial oracles, such as the weak assert statements illustrated in Figure 1a, which reduces their fault detection effectiveness. Therefore, in this work, we investigate whether MRs can be utilized to improve the fault detection effectiveness of automatically generated test cases. For example, Figure 1a shows an EvoSuite-generated test case for the Power method, which raises a matrix to the given exponent (i.e., int n) and returns the resulting matrix. Although this test case achieves 100% code coverage, its generated assert statements are too weak to detect critical faults in the method; the presence of such trivial oracles reduces the fault detection effectiveness of the test case. Using the Multiplication MR, we modified this test case as shown in Figure 1b: we multiplied the source test case matrix by itself to construct the follow-up input, ran the Power method on both inputs, and used assert statements to check the expected relation between the two outputs (the resulting matrices are equal, or the follow-up output is greater than the source output).
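The Multiplication MR described above can be sketched as follows. The 2×2 long-matrix representation and the multiply/power helpers are illustrative stand-ins, not the actual la4j Matrix API or the paper's generated test code; the relation used is (A·A)^n = (A^n)·(A^n), since both sides equal A^(2n).

```java
import java.util.Arrays;

// Sketch of a metamorphic test for a matrix Power method, assuming a
// toy 2x2 integer matrix instead of la4j's Matrix class.
public class PowerMR {
    static long[][] multiply(long[][] a, long[][] b) {
        long[][] c = new long[2][2];
        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 2; j++)
                for (int k = 0; k < 2; k++)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }

    // The method under test: raises a to the n-th power.
    static long[][] power(long[][] a, int n) {
        long[][] r = {{1, 0}, {0, 1}};  // 2x2 identity
        for (int i = 0; i < n; i++) r = multiply(r, a);
        return r;
    }

    // Multiplication MR: powering the follow-up input A*A must equal
    // squaring the source output A^n, because (A^2)^n == (A^n)^2.
    static boolean multiplicationMrHolds(long[][] a, int n) {
        long[][] followUpOut = power(multiply(a, a), n);  // follow-up
        long[][] sourceOut = power(a, n);                 // source
        return Arrays.deepEquals(followUpOut, multiply(sourceOut, sourceOut));
    }

    public static void main(String[] args) {
        long[][] a = {{1, 2}, {3, 4}};
        System.out.println(multiplicationMrHolds(a, 3) ? "MR holds" : "MR violated");
    }
}
```

Unlike a trivial assert on a hard-coded output, this relation must hold for any source matrix EvoSuite generates, so it remains a valid oracle across automatically generated inputs.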

In this paper, we present the initial results of an empirical study conducted to evaluate the effectiveness of utilizing MRs with automatically generated test inputs. Our preliminary results show that MRs can help to increase the effectiveness of automatically generated test suites.

2. Empirical Study

In this experiment we used 4 classes (Matrix.java, LeastSquaresSolver.java, ForwardBackSubstitutionSolver.java, and SquareRootSolver.java) from the la4j (version 0.6.0) open-source Java library. la4j is a linear algebra library that provides matrix and vector implementations and algorithms, and it has been used to evaluate the performance of automated testing tools. For each of these 4 classes, we used the EvoSuite (Fraser and Arcuri, 2011) tool to generate test cases targeting line, branch, and weak mutation coverage. We identified 16 MRs for these 4 classes, created based on common matrix operations (e.g., Transpose Matrix, Identity Matrix). We manually verified the input-output relationships of the MRs with sample values. We then ran the MR-modified source test cases (follow-up test cases) together with the automatically generated source test inputs on the original programs and verified the MR properties again. If an MR did not hold for a test input, we excluded that MR for that particular input.
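The manual verification step can be sketched as follows, assuming one of the MRs is a Transpose relation of the form (Aᵀ)ᵀ = A; the plain-array matrix representation and method names here are illustrative, not la4j's API or the paper's actual MR list.

```java
import java.util.Arrays;

// Sketch of manually verifying a candidate MR against sample values
// before adopting it as an oracle: here, transposing twice must
// return the original matrix.
public class TransposeMR {
    static int[][] transpose(int[][] a) {
        int[][] t = new int[a[0].length][a.length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[0].length; j++)
                t[j][i] = a[i][j];
        return t;
    }

    // Transpose MR: (A^T)^T == A for every input matrix A.
    static boolean transposeMrHolds(int[][] a) {
        return Arrays.deepEquals(transpose(transpose(a)), a);
    }

    public static void main(String[] args) {
        int[][] sample = {{1, 2, 3}, {4, 5, 6}};  // sample value check
        System.out.println(transposeMrHolds(sample) ? "MR holds" : "MR violated");
    }
}
```

An MR that fails such a sample-value check on the original (presumed correct) program is not a necessary property of the SUT and must be discarded, which mirrors the exclusion step described above.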

We used mutation testing to measure the fault detection effectiveness of the test cases enhanced with MRs; in particular, we used the PIT tool to generate mutants. We considered a mutant "killed" when the MR's output relation was violated and "alive" when the relation held. We collected the killed/alive information and calculated the mutation score and fault detection ratio for the automated test suites and the MRs.
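The mutation score derived from these killed/alive counts reduces to a simple ratio, sketched below; the counts used are hypothetical, for illustration only.

```java
// Sketch of the mutation score computation from killed/alive counts:
// score = killed / totalMutants, where totalMutants = killed + alive.
public class MutationScore {
    static double mutationScore(int killed, int alive) {
        int total = killed + alive;
        if (total == 0) throw new IllegalArgumentException("no mutants generated");
        return (double) killed / total;
    }

    public static void main(String[] args) {
        // Hypothetical counts: 45 mutants killed, 15 still alive.
        System.out.println(mutationScore(45, 15));
    }
}
```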

3. Preliminary Results and Conclusions

Figure 2 shows the fault detection effectiveness of the EvoSuite-generated test cases (orange) and the EvoSuite test cases utilizing MRs (blue). We also show the fault detection effectiveness of developer-written test cases. As the results show, there is a significant increase in the mutation score when MRs are utilized with the automatically generated test suites. For two classes, the mutation score increases by 100% over that of the automatically generated test suite alone. These preliminary results suggest that utilizing MRs with automatically generated test cases improves fault detection effectiveness. For the developer test suites, however, the MRs killed no additional mutants except for Matrix; this needs to be investigated further.

Figure 2. 4 classes with Mutation score of automatically generated test suites and developer test suites, and increase of mutation score with Metamorphic Testing. (SRS = SquareRootSolver, LSS = LeastSquaresSolver, FBSS = ForwardBackSubstitutionSolver, E = EvoSuite, D = Developer)

Our preliminary results are promising and suggest that MT can effectively improve the fault detection capability of automatically generated test suites. However, a larger-scale study is needed to validate this claim. We also plan to evaluate the individual fault detection performance of each MR compared to the automatically generated test suites.

4. Acknowledgments

This work is supported by award number 1656877 from the National Science Foundation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect those of the National Science Foundation.


  • T. Y. Chen, S. C. Cheung, and S. M. Yiu (1998) Metamorphic testing: a new approach for generating next test cases. Technical Report HKUST-CS98-01, Hong Kong University of Science and Technology.
  • T. Y. Chen, F.-C. Kuo, H. Liu, P.-L. Poon, D. Towey, T. H. Tse, and Z. Q. Zhou (2018) Metamorphic testing: a review of challenges and opportunities. ACM Computing Surveys 51(1), pp. 4:1–4:27.
  • G. Fraser and A. Arcuri (2011) EvoSuite: automatic test suite generation for object-oriented software. In Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering (ESEC/FSE '11), pp. 416–419.
  • C. Pacheco and M. D. Ernst (2007) Randoop: feedback-directed random testing for Java. In Companion to the 22nd ACM SIGPLAN Conference on Object-Oriented Programming Systems and Applications (OOPSLA '07), pp. 815–816.
  • P. Saha and U. Kanewala (2018) Fault detection effectiveness of source test case generation strategies for metamorphic testing. In Proceedings of the 3rd International Workshop on Metamorphic Testing (MET '18), pp. 2–9.
  • E. J. Weyuker (1982) On testing non-testable programs. The Computer Journal 25(4), pp. 465–470.