 # Predicting Metamorphic Relation for Matrix Calculation Programs

Matrices often represent important information in scientific applications and are involved in performing complex calculations. But systematically testing these applications is hard due to the oracle problem. Metamorphic testing is an effective approach to test such applications because it uses metamorphic relations to determine whether test cases have passed or failed. Metamorphic relations are typically identified with the help of a domain expert and is a labor intensive task. In this work we use a graph kernel based machine learning approach to predict metamorphic relations for matrix calculation programs. Previously, this graph kernel based machine learning approach was used to successfully predict metamorphic relations for programs that perform numerical calculations. Results of this study show that this approach can be used to predict metamorphic relations for matrix calculation programs as well.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1. Introduction

Matrix calculations are common in scientific applications. Often, matrices represent data, graphs or mathematical equations in the applications. (Office, 2013). They can be used to get quick and good approximation for complicated calculation in time-sensitive engineering applications (Office, 2013). Moreover, matrix multiplication is used in graphics, digital videos and solving linear equations of particular variables in different applications (Office, 2013). But testing these applications is hard due to the difficulties associated with defining suitable test oracles (Weyuker, 1982). This is known as the oracle problem (Weyuker, 1982). Metamorphic Testing (MT) can be used to alleviate the test oracle problem (Y. Chen et al., 1998). MT conducts testing by checking whether the programs behave according to a set of metamorphic relation (MR) properties (Chen et al., 2003). A metamorphic relation specifies how the output should change according to a change made to the input (Chen et al., 2003). MT operates as follows (Y. Chen et al., 1998; Chen et al., 2003):

1. Identify a suitable set of metamorphic relations which should satisfy the program under test.

2. Create a set of initial test cases.

3. Apply the input transformations specified by the identified MRs in Step 1 and create follow-up test cases for each of the initial test case.

4. Execute the initial and follow-up test case pairs and check if the output change satisfies the change predicted by the MR. When testing a program, a run-time violation of an MR can mean that a fault or faults are present in the program under test.

In a previous work (Kanewala et al., 2016), a graph kernel-based machine learning method was introduced to predict MRs for programs with numerical inputs and outputs. In this work, we use the above method to predict MRs for functions performing matrix calculations. This method starts by creating the control flow graphs (CFGs) of each program, and the random walk kernel is used to compute the similarity between the graphs. The computed kernel values are used by a support vector machine (SVM) to automatically predict MRs for previously unseen functions. In this study, three types of metamorphic relations are identified for the matrix-based programs and are used for the predictions. We used 55 functions obtained from open source matrix calculation libraries to evaluate the effectiveness of this method. Our result shows that for matrix-based calculations, the random walk kernel can effectively predict the MRs.

## 2. Approach

This section discusses the details of the metamorphic relation approach used in this study.

### 2.1. Function Representation

The first step of this method is to convert a function into its CFG. This representation is specifically used since it allows the extraction of information about the sequence of operations performed in a control flow path that is directly related to the MRs satisfied by a given function.

A CFG is a directed graph of a function f. Here, x is a statement in f, represented by each node . The operation performed in each x are labeled label(). Supposedly if x and y are statements of f, after execution of x, y is executed. Then it can be said that e is an edge where . Control flow of the function f is represented by all the edges, and the starting point and the exiting point are represented by nodes and respectively (E. Allen, 1970).

We use the framework to create the CFGs. We post-processed the generated CFGs from Soot so that the nodes would represent atomic operations. In addition we annotated all the method call nodes in the CFG with their return types. Figure 1 represents a function for calculating scalar multiplication of a matrix and its post-processed CFG representation.

### 2.2. Random Walk Kernel

After creating the CFG representation of the functions, the next step is to use a graph kernel to compute the similarity between the CFGs. In previous work (Kanewala et al., 2016), two graph kernels were used and among them, better performance was shown by the random walk kernel. Therefore we use the random walk kernel in this study. We briefly describe the idea of the random walk kernel in this section. More information about this including the definitions can be found in (Kanewala et al., 2016).

The random walk kernel computes the similarity score between two graphs by summing up the similarity scores of all the pairs of walks in the two graphs. The similarity score of a pair of walks is computed by multiplying the similarity scores of their corresponding step pairs. The similarity score of a pair of steps is computed by multiplying the similarity scores of node and edge pairs that make up the step. The similarity score of a node pair is determined by their node labels: if the two node labels are the same, then the pair is assigned a similarity score of one, else it is assigned a similarity score of zero. Also, if the two node labels represent operations with similar mathematical properties (but not identical), then the pair is assigned a similarity score of 0.5. Edge labels decide the value assigned for the similarity score of a pair of edges. In this work we only used one type of edge showing the flow of control between the operations. Thus the similarity score for a pair of edges is always one.

### 2.3. Predictive Model Creation

The computed random walk kernel values are supplied to a support vector machine with a binary label indicating whether a given function satisfies a given MR or not. The support vector machine uses the provided information to create a model that can predict if a new function would satisfy the considered MR or not. In this study, the SVM implementation from the scikit-learn toolkit was used.

## 3. Experimental setup

This section describes the code corpus and MRs used in this study. The details of the evaluation procedure are also discussed here.

### 3.1. The Code Corpus

A total of 55 functions, all of which takes matrices as inputs and produces matrices as outputs, were used to measure the effectiveness of the method described in Section 2 for predicting MRs. These functions were collected from Apache Commons Math Library, which is an open source project. These functions execute a variety of calculations on matrices such as addition, multiplication, subtraction, and searching (e.g. getting column matrix, getting row matrix). There were several functions that performed the same functionality, but they were implemented differently. For example, Array2DRowRealMatrix class and OpenMapRealMatrix class both have multiplication functions for matrices, but they are implemented in different ways. In such cases, both the functions are used in the code corpus. All the functions used in this study can be found via the following URL: https://github.com/MSU-STLab/MRPrediction/tree/master/alldotfiles

### 3.2. Metamorphic Relations

We manually identified three categories of MRs - Additive, Permutative, and Multiplicative, that are generally applicable to matrix calculations. These three high-level categories are further divided based on whether the modification is made at the element, row, or column levels. The full categorization of the MRs is shown in Table 1. In this work we only focus on predicting the high level MR category; i.e. Permutative, Additive and Multiplicative.

### 3.3. Evaluation Procedure

We use train, validation and test method to evaluate the MR prediction effectiveness. Table 2 shows the number of positive and negative instances for each MR; positive indicates that a function satisfies the given MR and negative indicates that the function does not satisfy the given MR. For each MR, we divided the data into three subsets, where each fold contained approximately the same portion of positive and negative instances, as the original dataset. The three folds were named as Train data, Test data, and Validation data. The precomputed kernel values of the functions in Train data were used to create the prediction model. The Validation data was used to select the following parameters for the predictive model:

• Regularization parameter of the SVM.

• Path weighing factor in the random walk kernel where .

The parameter values selected using the validation set were then used to create the predictive model for predicting the MRs for the test data. We repeated the train, validation and test method ten times so that the functions in each fold is selected randomly each time to avoid any biases occur in fold divisions.

We used the Area Under the receiver operating characteristic Curve (AUC)

(Huang and Ling, 2005)

to measure the prediction effectiveness. AUC measures the probability that a randomly chosen negative example will have a lower prediction score than a randomly chosen positive example. AUC does not depend on the discrimination threshold of the classifier and has been shown to be a better measure for comparing learning algorithms

(Huang and Ling, 2005). Figure 2. Prediction AUC score for random graph kernel for validation dataset and test dataset

## 4. Results and Discussion

Table 3 lists the and C values that recorded the highest AUC values for each MR on the validation set. For the three MRs considered in this study, the value selected for the parameter doesn’t seem to have a big effect on the prediction accuracy. But for all the three MRs, the best value for is 0.9, indicating that longer paths in the CFGs are more important for predicting these MRs than the other paths.

Figure 2 shows the AUC scores for the validation data set and the test data set. On the test data, the highest AUC score (0.81) could be observed when predicting the Permutative MR. The other two MRs also reported AUC values higher than 0.7 indicating that our approach created effective predictive models for all the three MRs. Further, for all the three MRs, AUC values for the validation data set and the test data set is close. This indicates that there is a low chance of over-fitting in the predictive model.

## 5. Related Work

Several previous studies have looked into automatically generating/predicting MRs. Kanewala et. al showed that, in previously unseen programs, MRs can be predicted using a machine learning method. Features were extracted from CFGs of the functions and they were then used to create a predictive model (Kanewala and Bieman, 2013). Later, they developed the graph kernel based approach used in this study to predict MRs for numerical programs (Kanewala et al., 2016).

Liu et al. introduced a new method called Composition of Metamorphic Relation (CMR), where the generation of new metamorphic relations is done by combining existing metamorphic relations (Liu et al., 2012). A similar study has been done by Dong et. al, where Compositional MR was generated based on the speculative law of proposition logic (Dong et al., 2008).

Zhang et al. suggested a technique, where an algorithm searches for metamorphic relations in the form of linear or quadratic equations (Zhang et al., 2014). Su et al. also suggested a new method called KABU, which can be used to find more likely metamorphic relations by dynamically inferring the properties of the status of a method (Su et al., 2015).

Chen et al. proposed a tool called METRIC, where metamorphic relations were identified based on the category-choice framework (Yueh Chen et al., 2016). Later, they introduced an approach called DESSERT, where DividE-and-conquer methodology was used to identify the categorieS, choiceS, and choicE Relations for Test case generation (Chen et al., 2012).

## 6. Conclusion & future work

The metamorphic testing technique is very useful to test programs that do not have a test oracle. The effectiveness of this technique highly depends on the set of MRs used for testing. But the identification process of MRs is mostly done manually and could be a time consuming process.

This study is an extension of previous work, where the random walk kernel is used to predict MRs for functions that performs matrix calculation. Our results show that for these types of functions, random walk kernel can be effective in predicting MRs.

In the future, we plan to increase the number of functions used in this study. Further, new types of MRs, specifically for functions that perform matrix calculation, can also be considered. We also plan to extend the MR prediction scope beyond the function level.

###### Acknowledgements.
This work is supported by award number 1656877 from the National Science Foundation. Any Opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect those of the National Science Foundation.

## References

• (1)
• Chen et al. (2003) T.Y. Chen, T.H. Tse, and Z. Quan Zhou. 2003. Fault-based testing without the need of oracles. Information and Software Technology 45, 1 (2003), 1 – 9.
• Chen et al. (2012) T. Y. Chen, P. L. Poon, S. F. Tang, and T. H. Tse. 2012. DESSERT: a DividE-and-conquer methodology for identifying categorieS, choiceS, and choicE Relations for Test case generation. IEEE Transactions on Software Engineering 38, 4 (July 2012), 794–809.
• Dong et al. (2008) G Dong, Baowen Xu, L Chen, C Nie, and L Wang. 2008. Case studies on testing with compositional metamorphic relations. 24 (12 2008), 437–443.
• E. Allen (1970) Frances E. Allen. 1970. Control flow analysis. 5 (07 1970), 1–19.
• Huang and Ling (2005) Jin Huang and C. X. Ling. 2005. Using AUC and accuracy in evaluating learning algorithms. IEEE Transactions on Knowledge and Data Engineering 17, 3 (March 2005), 299–310.
• Kanewala and Bieman (2013) U. Kanewala and J. M. Bieman. 2013. Using machine learning techniques to detect metamorphic relations for programs without test oracles. In 2013 IEEE 24th International Symposium on Software Reliability Engineering (ISSRE). 1–10.
• Kanewala et al. (2016) Upulee Kanewala, James M. Bieman, and Asa Ben-Hur. 2016. Predicting Metamorphic Relations for Testing Scientific Software: A Machine Learning Approach Using Graph Kernels. Softw. Test. Verif. Reliab. 26, 3 (May 2016), 245–269.
• Liu et al. (2012) H. Liu, X. Liu, and T. Y. Chen. 2012. A New Method for Constructing Metamorphic Relations. In 2012 12th International Conference on Quality Software. 59–68.
• Office (2013) Larry Hardesty MIT News Office. 2013. Explained: Matrices. (Dec 2013).
• Su et al. (2015) Fang-Hsiang Su, Jonathan Bell, Christian Murphy, and Gail Kaiser. 2015. Dynamic Inference of Likely Metamorphic Properties to Support Differential Testing. In Proceedings of the 10th International Workshop on Automation of Software Test (AST ’15). IEEE Press, Piscataway, NJ, USA, 55–59.
• Weyuker (1982) Elaine Weyuker. 1982. On Testing Non-Testable Programs. 25 (11 1982).
• Y. Chen et al. (1998) T Y. Chen, S C. Cheung, and Sm Yiu. 1998. Metamorphic testing: a new approach for generating next test cases. (01 1998).
• Yueh Chen et al. (2016) Tsong Yueh Chen, Pak-Lok Poon, and Xiaoyuan Xie. 2016. METRIC: METamorphic Relation Identification based on the Category-choice framework. 116 (07 2016), 177–190.
• Zhang et al. (2014) Jie Zhang, Junjie Chen, Dan Hao, Yingfei Xiong, Bing Xie, Lu Zhang, and Hong Mei. 2014. Search-based inference of polynomial metamorphic relations. (09 2014), 701-712 pages.