1. Introduction
Matrix calculations are common in scientific applications, where matrices often represent data, graphs, or mathematical equations (Office, 2013). They can be used to obtain quick and good approximations for complicated calculations in time-sensitive engineering applications (Office, 2013). Moreover, matrix multiplication is used in graphics, digital video, and solving linear equations in different applications (Office, 2013). However, testing these applications is hard because of the difficulty of defining suitable test oracles (Weyuker, 1982). This is known as the oracle problem (Weyuker, 1982). Metamorphic testing (MT) can be used to alleviate the oracle problem (Y. Chen et al., 1998). MT checks whether a program behaves according to a set of properties called metamorphic relations (MRs) (Chen et al., 2003). A metamorphic relation specifies how the output should change in response to a specific change made to the input (Chen et al., 2003). MT operates as follows (Y. Chen et al., 1998; Chen et al., 2003):

(1) Identify a suitable set of metamorphic relations that the program under test should satisfy.

(2) Create a set of initial test cases.

(3) Apply the input transformations specified by the MRs identified in Step 1 to create follow-up test cases for each of the initial test cases.

(4) Execute the initial and follow-up test case pairs and check whether the change in the output matches the change predicted by the MR. A runtime violation of an MR indicates that one or more faults are present in the program under test.
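The four steps above can be sketched in Python. This is only a minimal illustration under stated assumptions, not the tooling used in this study: `scalar_multiply` is a hypothetical program under test, the MR is an additive relation (adding a positive constant to every input element must not decrease any output element, which holds only for a non-negative multiplier), and all function names are invented for the example.

```python
import random


def scalar_multiply(matrix, k):
    """Hypothetical program under test: multiply every element by k."""
    return [[k * x for x in row] for row in matrix]


def additive_transform(matrix, delta=1.0):
    """Step 3: input transformation of the additive MR (add a positive scalar)."""
    return [[x + delta for x in row] for row in matrix]


def satisfies_additive_mr(program, matrix):
    """Step 4: run the initial and follow-up test cases and compare the outputs."""
    initial = program(matrix)
    follow_up = program(additive_transform(matrix))
    # Assumed relation: no output element may decrease.
    return all(
        f >= i
        for row_i, row_f in zip(initial, follow_up)
        for i, f in zip(row_i, row_f)
    )


def run_metamorphic_test(program, n_cases=20, seed=0):
    """Step 2: create random initial test cases; return those that violate the MR."""
    rng = random.Random(seed)
    violations = []
    for _ in range(n_cases):
        matrix = [[rng.uniform(-10, 10) for _ in range(3)] for _ in range(3)]
        if not satisfies_additive_mr(program, matrix):
            violations.append(matrix)
    return violations
```

Calling `run_metamorphic_test(lambda m: scalar_multiply(m, 2.0))` returns an empty list, while a version that wrongly uses a negative multiplier is reported as violating the relation on every test case, without any test oracle for the exact output values.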
In previous work (Kanewala et al., 2016), a graph kernel-based machine learning method was introduced to predict MRs for programs with numerical inputs and outputs. In this work, we use that method to predict MRs for functions performing matrix calculations. The method starts by creating the control flow graph (CFG) of each function, and the random walk kernel is used to compute the similarity between the graphs. The computed kernel values are used by a support vector machine (SVM) to automatically predict MRs for previously unseen functions. In this study, three types of metamorphic relations are identified for matrix-based programs and are used for the predictions. We used 55 functions obtained from open source matrix calculation libraries to evaluate the effectiveness of this method. Our results show that, for matrix-based calculations, the random walk kernel can effectively predict the MRs.
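Table 1. Categorization of the MRs.
Metamorphic Relation  Change made to the input  Expected change in the output
Permutative  Permutation of all the elements  The matrix size will remain the same
Permutative  Permutation of rows  The matrix size will remain the same
Permutative  Permutation of columns  The matrix size will remain the same
Additive  Scalar addition to the matrix  Element values will increase or remain the same
Additive  Addition of two or more matrices  Element values will increase or remain the same
Additive  Addition to a subset of the elements of the matrix  Element values will increase or remain the same
Multiplicative  Scalar multiplication of the matrix  Element values will increase
Multiplicative  Multiplication of a subset of the elements of the matrix  Element values will increase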
2. Approach
This section describes the MR prediction approach used in this study.
2.1. Function Representation
The first step of this method is to convert a function into its CFG. This representation is specifically used since it allows the extraction of information about the sequence of operations performed in a control flow path that is directly related to the MRs satisfied by a given function.
A CFG $G_f = (V, E)$ of a function $f$ is a directed graph. Each node $v_x \in V$ represents a statement $x$ in $f$, and $label(v_x)$ denotes the operation performed in $x$. If $x$ and $y$ are statements of $f$ and $y$ can be executed immediately after $x$, then $e = (v_x, v_y) \in E$ is an edge. The set of all edges represents the control flow of $f$, and the nodes $v_{start} \in V$ and $v_{exit} \in V$ represent the starting and exiting points of $f$, respectively (E. Allen, 1970).
We use the Soot framework (https://www.sable.mcgill.ca/soot/) to create the CFGs. We post-processed the CFGs generated by Soot so that the nodes represent atomic operations. In addition, we annotated all the method call nodes in the CFG with their return types. Figure 1 shows a function for calculating the scalar multiplication of a matrix and its post-processed CFG representation.
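To make the representation concrete, the following Python sketch stores a CFG as labeled nodes and directed edges. It is a hand-written illustration and not output from Soot: the `CFG` class, the node identifiers, and the operation labels (`loop_cond`, `load`, `mul`, `store`) are all hypothetical, chosen to resemble a loop that multiplies each matrix element by a scalar.

```python
from dataclasses import dataclass, field


@dataclass
class CFG:
    """Directed graph whose nodes carry operation labels."""
    labels: dict = field(default_factory=dict)  # node id -> operation label
    edges: set = field(default_factory=set)     # (source id, target id) pairs

    def add_node(self, node_id, label):
        self.labels[node_id] = label

    def add_edge(self, src, dst):
        # An edge (src, dst) means dst may execute immediately after src.
        if src not in self.labels or dst not in self.labels:
            raise KeyError("both endpoints must be existing nodes")
        self.edges.add((src, dst))

    def successors(self, node_id):
        return sorted(dst for src, dst in self.edges if src == node_id)


def build_example_cfg():
    """Hypothetical CFG of a loop that scales every matrix element."""
    cfg = CFG()
    operations = ["start", "loop_cond", "load", "mul", "store", "exit"]
    for node_id, label in enumerate(operations):
        cfg.add_node(node_id, label)
    # start -> loop_cond -> load -> mul -> store -> back to loop_cond; loop_cond -> exit
    for src, dst in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 1), (1, 5)]:
        cfg.add_edge(src, dst)
    return cfg
```

In this sketch the loop condition node has two successors (the loop body and the exit node), which is how branching appears in the edge set.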
2.2. Random Walk Kernel
After creating the CFG representation of the functions, the next step is to use a graph kernel to compute the similarity between the CFGs. In previous work (Kanewala et al., 2016), two graph kernels were evaluated, and the random walk kernel showed the better performance. Therefore, we use the random walk kernel in this study. We briefly describe the idea of the random walk kernel in this section. Further details, including the formal definitions, can be found in (Kanewala et al., 2016).
The random walk kernel computes the similarity score between two graphs by summing the similarity scores of all the pairs of walks in the two graphs. The similarity score of a pair of walks is computed by multiplying the similarity scores of their corresponding step pairs. The similarity score of a pair of steps is computed by multiplying the similarity scores of the node and edge pairs that make up the step. The similarity score of a node pair is determined by the node labels: if the two labels are identical, the pair is assigned a score of one; if the labels differ but represent operations with similar mathematical properties, the pair is assigned a score of 0.5; otherwise, the pair is assigned a score of zero. The similarity score of a pair of edges is determined by the edge labels. In this work we used only one type of edge, which represents the flow of control between operations, so the similarity score of a pair of edges is always one.
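The computation described above can be sketched as follows. This is a simplified illustration rather than the exact kernel defined in (Kanewala et al., 2016): it assumes walks are truncated at `max_steps` steps, that a walk pair with n steps is weighted by `lam` raised to the power n, and that `SIMILAR_GROUPS` (a hypothetical grouping) lists the operations treated as mathematically similar. Graphs are given as a label dictionary and a list of distinct directed edges.

```python
from itertools import product

# Hypothetical groups of operations with similar mathematical properties.
SIMILAR_GROUPS = [{"add", "sub"}, {"mul", "div"}]


def node_similarity(label_a, label_b):
    """1 for identical labels, 0.5 for similar operations, 0 otherwise."""
    if label_a == label_b:
        return 1.0
    if any(label_a in group and label_b in group for group in SIMILAR_GROUPS):
        return 0.5
    return 0.0


def random_walk_kernel(labels1, edges1, labels2, edges2, lam=0.9, max_steps=5):
    """Truncated random walk kernel between two labeled directed graphs."""
    # Similarity of every pair of steps (one edge taken from each graph).
    # The edge similarity is always 1, so only the node similarities appear.
    step = {}
    for (u1, v1), (u2, v2) in product(edges1, edges2):
        score = (node_similarity(labels1[u1], labels2[u2])
                 * node_similarity(labels1[v1], labels2[v2]))
        if score > 0.0:
            step[(u1, u2), (v1, v2)] = score

    # ending[(v1, v2)]: summed similarity of all walk pairs with the current
    # number of steps that end at node v1 in graph 1 and v2 in graph 2.
    ending = {}
    for (_, dst), score in step.items():
        ending[dst] = ending.get(dst, 0.0) + score

    total, weight = 0.0, lam
    for _ in range(max_steps):
        total += weight * sum(ending.values())
        extended = {}
        for (src, dst), score in step.items():
            if src in ending:
                extended[dst] = extended.get(dst, 0.0) + ending[src] * score
        ending = extended
        weight *= lam
    return total
```

With this weighting, a value of `lam` close to one lets long walk pairs contribute almost as much as short ones, whereas a small `lam` makes the kernel depend mostly on single steps.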
2.3. Predictive Model Creation
The computed random walk kernel values are supplied to a support vector machine together with a binary label indicating whether a given function satisfies a given MR. The support vector machine uses this information to create a model that can predict whether a new function would satisfy the considered MR. In this study, the SVM implementation from the scikit-learn toolkit (http://scikit-learn.org/stable/) was used.
3. Experimental setup
This section describes the code corpus and MRs used in this study. The details of the evaluation procedure are also discussed here.
3.1. The Code Corpus
A total of 55 functions, all of which take matrices as inputs and produce matrices as outputs, were used to measure the effectiveness of the method described in Section 2 for predicting MRs. These functions were collected from the Apache Commons Math library (https://commons.apache.org/proper/commons-math), which is an open source project. They perform a variety of calculations on matrices, such as addition, multiplication, subtraction, and searching (e.g., getting a column matrix or a row matrix). Several functions provide the same functionality but are implemented differently. For example, the Array2DRowRealMatrix class and the OpenMapRealMatrix class both have multiplication functions for matrices, but they are implemented in different ways. In such cases, both functions are included in the code corpus. All the functions used in this study can be found at the following URL: https://github.com/MSUSTLab/MRPrediction/tree/master/alldotfiles
Table 2. Number of positive and negative instances for each MR.
Metamorphic Relation  Positive instances  Negative instances
Permutative  14  41
Additive  37  18
Multiplicative  21  34
3.2. Metamorphic Relations
We manually identified three categories of MRs (Additive, Permutative, and Multiplicative) that are generally applicable to matrix calculations. These three high-level categories are further divided based on whether the modification is made at the element, row, or column level. The full categorization of the MRs is shown in Table 1. In this work we focus only on predicting the high-level MR category, i.e., Permutative, Additive, or Multiplicative.
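As an illustration of the input changes behind these categories, the sketch below applies permutations at the element, row, and column levels, together with scalar addition and scalar multiplication, to a matrix stored as a list of rows. The helper names are hypothetical and are not taken from the study's code.

```python
import random


def permute_rows(matrix, rng):
    """Permutative MR at the row level."""
    rows = [list(row) for row in matrix]
    rng.shuffle(rows)
    return rows


def permute_columns(matrix, rng):
    """Permutative MR at the column level."""
    order = list(range(len(matrix[0])))
    rng.shuffle(order)
    return [[row[j] for j in order] for row in matrix]


def permute_elements(matrix, rng):
    """Permutative MR at the element level (shape is preserved)."""
    flat = [x for row in matrix for x in row]
    rng.shuffle(flat)
    n_cols = len(matrix[0])
    return [flat[i:i + n_cols] for i in range(0, len(flat), n_cols)]


def add_scalar(matrix, c):
    """Additive MR: add the scalar c to every element."""
    return [[x + c for x in row] for row in matrix]


def multiply_scalar(matrix, c):
    """Multiplicative MR: multiply every element by the scalar c."""
    return [[x * c for x in row] for row in matrix]
```

Each helper returns a new matrix, so the initial test case stays available for comparison with the follow-up test case.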
3.3. Evaluation Procedure
We use a train, validation, and test procedure to evaluate the effectiveness of MR prediction. Table 2 shows the number of positive and negative instances for each MR; positive indicates that a function satisfies the given MR, and negative indicates that it does not. For each MR, we divided the data into three folds, each containing approximately the same proportion of positive and negative instances as the original dataset. The three folds were named Train data, Test data, and Validation data. The precomputed kernel values of the functions in the Train data were used to create the prediction model. The Validation data was used to select the following parameters for the predictive model:

Regularization parameter C of the SVM.

Path weighting factor λ of the random walk kernel.
The parameter values selected using the validation set were then used to create the predictive model for predicting the MRs for the test data. We repeated the train, validation, and test procedure ten times, randomly reassigning the functions to the folds each time, to avoid any bias introduced by a particular fold division.
We used the area under the receiver operating characteristic curve (AUC) (Huang and Ling, 2005) to measure the prediction effectiveness. AUC measures the probability that a randomly chosen negative example will have a lower prediction score than a randomly chosen positive example. AUC does not depend on the discrimination threshold of the classifier and has been shown to be a better measure than accuracy for comparing learning algorithms (Huang and Ling, 2005).

Table 3. Parameter values that recorded the highest AUC on the validation set.
Metamorphic Relation  Best λ  Best C
Permutative  0.9  0.1, 1, 10, 100, 1000
Additive  0.9  0.1, 1, 10, 100, 1000
Multiplicative  0.9  0.1, 1, 10, 100, 1000
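The probabilistic definition of AUC used in Section 3.3 can be computed directly by comparing every positive example with every negative example. The sketch below is a plain pairwise implementation for illustration; counting tied scores as one half is a common convention and an assumption here, and the function name is hypothetical.

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly by the scores."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if n < p:
                correct += 1.0   # negative scored lower than positive
            elif n == p:
                correct += 0.5   # assumed convention: a tie counts as half
    return correct / (len(positives) * len(negatives))
```

Because only the ordering of the scores matters, the result does not change when a different decision threshold is applied to the same scores.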
4. Results and Discussion
Table 3 lists the λ and C values that recorded the highest AUC values for each MR on the validation set. For the three MRs considered in this study, the value selected for the parameter C does not seem to have a large effect on the prediction accuracy, since all the C values listed in Table 3 recorded the highest AUC. For all three MRs, however, the best value for λ is 0.9, indicating that longer paths in the CFGs are more important for predicting these MRs than shorter paths.
Figure 2 shows the AUC scores for the validation data set and the test data set. On the test data, the highest AUC score (0.81) was observed when predicting the Permutative MR. The other two MRs also reported AUC values higher than 0.7, indicating that our approach created effective predictive models for all three MRs. Further, for all three MRs, the AUC values for the validation data set and the test data set are close. This indicates that there is a low chance of overfitting in the predictive model.
5. Related Work
Several previous studies have looked into automatically generating or predicting MRs. Kanewala et al. showed that MRs can be predicted for previously unseen programs using a machine learning method. Features were extracted from the CFGs of the functions and were then used to create a predictive model (Kanewala and Bieman, 2013). Later, they developed the graph kernel-based approach used in this study to predict MRs for numerical programs (Kanewala et al., 2016).
Liu et al. introduced a method called Composition of Metamorphic Relations (CMR), in which new metamorphic relations are generated by combining existing metamorphic relations (Liu et al., 2012). A similar study was done by Dong et al., in which compositional MRs were generated based on the speculative law of propositional logic (Dong et al., 2008).
Zhang et al. suggested a technique in which a search algorithm looks for metamorphic relations in the form of linear or quadratic equations (Zhang et al., 2014). Su et al. suggested a method called KABU, which can be used to find likely metamorphic relations by dynamically inferring properties of the states of a method (Su et al., 2015).
Chen et al. proposed a tool called METRIC, in which metamorphic relations are identified based on the category-choice framework (Yueh Chen et al., 2016). Earlier, they had introduced an approach called DESSERT, a DividE-and-conquer methodology for identifying categorieS, choiceS, and choicE Relations for Test case generation (Chen et al., 2012).
6. Conclusion & future work
Metamorphic testing is very useful for testing programs that do not have a test oracle. The effectiveness of this technique depends heavily on the set of MRs used for testing. However, MRs are mostly identified manually, which can be a time-consuming process.
This study extends previous work by using the random walk kernel to predict MRs for functions that perform matrix calculations. Our results show that, for these types of functions, the random walk kernel can be effective in predicting MRs.
In the future, we plan to increase the number of functions used in this study. Further, new types of MRs, specifically for functions that perform matrix calculations, can also be considered. We also plan to extend the MR prediction scope beyond the function level.
Acknowledgements.
This work is supported by award number 1656877 from the National Science Foundation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect those of the National Science Foundation.
References
 Chen et al. (2003) T. Y. Chen, T. H. Tse, and Z. Quan Zhou. 2003. Fault-based testing without the need of oracles. Information and Software Technology 45, 1 (2003), 1–9. https://doi.org/10.1016/S0950-5849(02)00129-5
 Chen et al. (2012) T. Y. Chen, P. L. Poon, S. F. Tang, and T. H. Tse. 2012. DESSERT: a DividE-and-conquer methodology for identifying categorieS, choiceS, and choicE Relations for Test case generation. IEEE Transactions on Software Engineering 38, 4 (July 2012), 794–809. https://doi.org/10.1109/TSE.2011.69
 Dong et al. (2008) G. Dong, Baowen Xu, L. Chen, C. Nie, and L. Wang. 2008. Case studies on testing with compositional metamorphic relations. 24 (12 2008), 437–443.
 E. Allen (1970) Frances E. Allen. 1970. Control flow analysis. 5 (07 1970), 1–19.
 Huang and Ling (2005) Jin Huang and C. X. Ling. 2005. Using AUC and accuracy in evaluating learning algorithms. IEEE Transactions on Knowledge and Data Engineering 17, 3 (March 2005), 299–310. https://doi.org/10.1109/TKDE.2005.50
 Kanewala and Bieman (2013) U. Kanewala and J. M. Bieman. 2013. Using machine learning techniques to detect metamorphic relations for programs without test oracles. In 2013 IEEE 24th International Symposium on Software Reliability Engineering (ISSRE). 1–10. https://doi.org/10.1109/ISSRE.2013.6698899
 Kanewala et al. (2016) Upulee Kanewala, James M. Bieman, and Asa Ben-Hur. 2016. Predicting Metamorphic Relations for Testing Scientific Software: A Machine Learning Approach Using Graph Kernels. Softw. Test. Verif. Reliab. 26, 3 (May 2016), 245–269. https://doi.org/10.1002/stvr.1594
 Liu et al. (2012) H. Liu, X. Liu, and T. Y. Chen. 2012. A New Method for Constructing Metamorphic Relations. In 2012 12th International Conference on Quality Software. 59–68. https://doi.org/10.1109/QSIC.2012.10
 Office (2013) Larry Hardesty, MIT News Office. 2013. Explained: Matrices. (Dec 2013). http://news.mit.edu/2013/explained-matrices-1206
 Su et al. (2015) Fang-Hsiang Su, Jonathan Bell, Christian Murphy, and Gail Kaiser. 2015. Dynamic Inference of Likely Metamorphic Properties to Support Differential Testing. In Proceedings of the 10th International Workshop on Automation of Software Test (AST ’15). IEEE Press, Piscataway, NJ, USA, 55–59. http://dl.acm.org/citation.cfm?id=2819261.2819279
 Weyuker (1982) Elaine Weyuker. 1982. On Testing Non-Testable Programs. 25 (11 1982).
 Y. Chen et al. (1998) T. Y. Chen, S. C. Cheung, and S. M. Yiu. 1998. Metamorphic testing: a new approach for generating next test cases. (01 1998).
 Yueh Chen et al. (2016) Tsong Yueh Chen, Pak-Lok Poon, and Xiaoyuan Xie. 2016. METRIC: METamorphic Relation Identification based on the Category-choice framework. 116 (07 2016), 177–190.
 Zhang et al. (2014) Jie Zhang, Junjie Chen, Dan Hao, Yingfei Xiong, Bing Xie, Lu Zhang, and Hong Mei. 2014. Search-based inference of polynomial metamorphic relations. (09 2014), 701–712.