## 1 Introduction

In software systems development today, it is often necessary, after a period of development, to modify previously developed components. Such changes may affect other parts of the software and cause new system malfunctions. To prevent these unwanted failures, a type of software testing called regression testing is used.

Regression testing re-runs existing test cases to check that the changes have not introduced new errors or unwanted side effects. In other words, its goal is to ensure that the parts of a system that have not changed continue to work as they did before the changes.[5]

Rather than re-running every existing test case, the aim is to find a set of test cases that, taken together, still confirms the correct operation of the system. Obtaining such a set is not easy; integer linear programming methods are commonly used today to find it.[2]

We obtain this set by clustering, via singular value decomposition (SVD), a matrix of the system's functions that are changed close together in time. This information is easily obtained from the source control tools used by most software companies today. Because commercial software systems contain a large number of functions, we parallelized the clustering so that the approach can be applied to large commercial systems in the shortest possible time.

Section 2 reviews related work. Section 3 describes the proposed method: clustering the data, extracting it from a software system in parallel, and finding the set of test cases that must be re-run for a specific change vector. Section 4 presents the experimental results, and Section 5 concludes.

## 2 Related Works

This section reviews related work on regression test case selection, source code clustering, and parallel singular value decomposition.

### 2.1 Selection of regression test cases

Regression testing is the process of re-testing software that has been modified. Large systems usually have large regression test suites, and in a software system, small changes in one part often cause problems in other parts.[1]

Regression testing is used to find these types of problems. Regression test selection techniques aim to isolate the tests that may detect errors after the system has been modified. The simplest technique is the retest-all strategy, in which the entire test suite is re-run.[4]
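To make the contrast concrete, here is a minimal sketch, not taken from the cited works, comparing retest-all with a simple coverage-based selection that keeps only the tests exercising at least one changed function. The `coverage` mapping from test names to the functions they exercise is a hypothetical input, and both function names are illustrative.

```python
def retest_all(coverage):
    """Retest-all: every test in the suite is re-run."""
    return list(coverage)


def select_by_coverage(coverage, changed):
    """Keep only tests that exercise at least one changed function.

    coverage: hypothetical mapping test name -> set of function names it exercises.
    changed:  set of function names modified in this version.
    """
    return [test for test, funcs in coverage.items() if funcs & changed]


coverage = {
    "t1": {"parse"},
    "t2": {"validate", "save"},
    "t3": {"render"},
}
print(retest_all(coverage))                    # all three tests
print(select_by_coverage(coverage, {"save"}))  # only the test that touches save()
```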

### 2.2 Source code clustering

Source code clustering has become especially important in recent years because of its application in plagiarism detection systems. For example, Duracik et al. clustered the source code of a system line by line using software based on the k-means clustering algorithm, and then used an incremental variant of this clustering to search source code represented as vectors.[3] In this research, by contrast, we cluster source code changes at the level of function changes.

### 2.3 Parallel singular value decomposition

In recent years, several parallel versions of the singular value decomposition algorithm have been developed. For example, Xiao et al. parallelized it using a block Jacobi method.[6]
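Xiao et al.'s block Jacobi method is not reproduced here. As background only, the following is a minimal sequential sketch of the classical one-sided (Hestenes) Jacobi SVD on which block Jacobi variants build: it repeatedly applies plane rotations to pairs of columns until all columns are mutually orthogonal, at which point the column norms are the singular values. The function name is hypothetical; a comment marks where a parallel version could process disjoint column pairs concurrently.

```python
import numpy as np


def one_sided_jacobi_svd(A, tol=1e-12, max_sweeps=60):
    """Hestenes one-sided Jacobi SVD of an m x n matrix A (sequential sketch)."""
    W = np.array(A, dtype=float)  # working copy; columns converge to U * sigma
    n = W.shape[1]
    V = np.eye(n)                 # accumulates the right-hand rotations
    for _ in range(max_sweeps):
        rotated = False
        # A block Jacobi parallel version would handle disjoint (p, q) pairs concurrently.
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = W[:, p] @ W[:, p]
                beta = W[:, q] @ W[:, q]
                gamma = W[:, p] @ W[:, q]
                if abs(gamma) <= tol * np.sqrt(alpha * beta):
                    continue  # columns p and q are already orthogonal
                rotated = True
                zeta = (beta - alpha) / (2.0 * gamma)
                t = (1.0 if zeta >= 0 else -1.0) / (abs(zeta) + np.sqrt(1.0 + zeta * zeta))
                c = 1.0 / np.sqrt(1.0 + t * t)
                s = c * t
                wp, wq = W[:, p].copy(), W[:, q].copy()
                W[:, p], W[:, q] = c * wp - s * wq, s * wp + c * wq
                vp, vq = V[:, p].copy(), V[:, q].copy()
                V[:, p], V[:, q] = c * vp - s * vq, s * vp + c * vq
        if not rotated:
            break  # a full sweep without rotations: converged
    sigma = np.linalg.norm(W, axis=0)
    order = np.argsort(sigma)[::-1]  # descending, as in the usual SVD convention
    sigma, W, V = sigma[order], W[:, order], V[:, order]
    U = W / np.where(sigma > 0, sigma, 1.0)
    return U, sigma, V.T


A = np.random.default_rng(0).standard_normal((6, 4))
U, s, Vt = one_sided_jacobi_svd(A)
print(np.allclose(s, np.linalg.svd(A, compute_uv=False)))
```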

In this study, in addition to parallelizing the SVD algorithm, we parallelized the extraction of information from our clusters. This makes the approach well suited to the infrastructure of commercial software companies, because large companies with stronger hardware generally develop the kinds of software systems that most need this type of testing.

## 3 The Proposed Method

This research presents an algorithm that, based on changes to the functions in a software system's source code, identifies groups of functions that strongly affect each other. Singular value decomposition yields a clustering from which a more concise set of regression test cases can be produced. All parts of the algorithm have been parallelized for shared-memory systems to increase performance. The steps are summarized in Algorithm 1.

### 3.1 Preliminary data

In source control systems, all changes to the source code of a software system are recorded throughout its development.

These changes can be viewed as a history at different levels of granularity, such as files, functions, or even individual changed lines.

We examined these changes at the function level and related the system's test cases to functions. We chose this level for two reasons. First, changes observed at the file level can be inaccurate, because each file may contain a large number of source code lines. Second, tracking changes at the level of individual changed lines would dramatically increase the dimensions of our change matrices and make them very sparse. For these reasons, we decided to examine source code changes at the level of functions.
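One way to obtain function-level changes from line-level diffs is to map each changed line to the function that encloses it. The sketch below assumes Python source and a set of changed line numbers already extracted from a diff; the helper name `changed_functions` is hypothetical, and the paper does not state which language or tooling it used. It relies on the standard `ast` module (`end_lineno` requires Python 3.8 or later).

```python
import ast


def changed_functions(source, changed_lines):
    """Return the names of functions whose line span contains a changed line.

    source:        Python source text of one file (assumed input).
    changed_lines: set of 1-based line numbers reported as changed by a diff.
    """
    tree = ast.parse(source)
    hits = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # lineno..end_lineno covers the whole definition, including its body.
            if any(node.lineno <= ln <= node.end_lineno for ln in changed_lines):
                hits.add(node.name)
    return hits


source = '''def parse(x):
    return x.strip()

def save(x):
    print(x)
    return True
'''
print(changed_functions(source, {5}))  # a change on line 5 lies inside save()
```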

To prepare the required data, we construct a square matrix whose order equals the number of functions under study; each entry records whether the corresponding pair of functions has been changed together, according to the source control system. This matrix is called $F$.

This relationship is two-way: if functions $i$ and $j$ are changed together, both entries $F_{ij}$ and $F_{ji}$ must be set, so the matrix is symmetric. $F$ therefore has dimensions $M \times M$, where $M$ is the number of functions considered.
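As an illustration of how such a co-change matrix could be built from source control history, here is a minimal sketch. It assumes a hypothetical input format (one set of changed function names per commit) and counts co-changes rather than recording only their existence. With the diagonal included, this construction equals $X^T X$, where $X$ is the commit-by-function incidence matrix, so the result is symmetric and positive semidefinite by construction.

```python
import numpy as np


def build_change_matrix(commits, functions):
    """Build a symmetric M x M co-change matrix F.

    commits:   hypothetical input, one set of changed function names per commit.
    functions: the M function names under study (defines row/column order).
    F[i, j] counts the commits that changed both function i and function j;
    the diagonal counts the commits that changed each function at all.
    """
    index = {name: k for k, name in enumerate(functions)}
    F = np.zeros((len(functions), len(functions)), dtype=int)
    for changed in commits:
        ids = [index[name] for name in set(changed) if name in index]
        for a in ids:
            for b in ids:
                F[a, b] += 1  # visits both (a, b) and (b, a), so F stays symmetric
    return F


functions = ["parse", "validate", "save", "render"]
commits = [{"parse", "validate"}, {"validate", "save"}, {"parse", "validate"}, {"render"}]
F = build_change_matrix(commits, functions)
print(F)
```

If only the existence of a co-change matters, `(F > 0).astype(int)` gives a 0/1 version of the same matrix.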

A second required matrix relates each test case to the functions of the software system. Unlike the change matrix, this matrix is generally not square, and therefore not symmetric, because the number of test cases need not equal the number of functions; moreover, a test case may exercise several functions and a function may be exercised by several test cases. Its dimensions are therefore determined by the number of functions $M$ and the number of test cases.
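A minimal sketch of building this test-function matrix follows. The coverage mapping is a hypothetical input (it could, for example, come from a coverage tool), and placing tests in rows and functions in columns is an assumption, since the text does not fix the orientation.

```python
import numpy as np


def build_test_matrix(coverage, tests, functions):
    """Build an N x M 0/1 matrix T with T[t, f] = 1 if test t exercises function f.

    coverage: hypothetical mapping test name -> set of function names it exercises.
    Orientation (tests as rows, functions as columns) is an assumption.
    """
    index = {name: k for k, name in enumerate(functions)}
    T = np.zeros((len(tests), len(functions)), dtype=int)
    for row, test in enumerate(tests):
        for name in coverage.get(test, ()):
            if name in index:
                T[row, index[name]] = 1
    return T


functions = ["parse", "validate", "save", "render"]
tests = ["t1", "t2", "t3"]
coverage = {"t1": {"parse"}, "t2": {"validate", "save"}, "t3": {"parse", "render"}}
T = build_test_matrix(coverage, tests, functions)
print(T.shape)  # 3 tests x 4 functions: not square, hence not symmetric
```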

### 3.2 Parallel clustering by singular value decomposition

As mentioned, the matrix $F$ is symmetric; since it is also positive definite, its singular value decomposition coincides with its eigendecomposition, and the left and right singular vectors are the same ($U = V$).

Performing the singular value decomposition on $F$ produces three matrices, $U$, $\Sigma$ and $V$:

$$F = U \Sigma V^{T} \tag{1}$$
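The following sketch checks the decomposition in equation (1) numerically with NumPy's `np.linalg.svd` on a small, made-up symmetric positive definite matrix (strictly diagonally dominant with a positive diagonal). It also confirms that the left and right singular vectors coincide in this case.

```python
import numpy as np

# Illustrative values only, not data from the paper.
F = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

U, s, Vt = np.linalg.svd(F)  # F = U @ diag(s) @ Vt, singular values in descending order

print(np.round(s, 4))
print(np.allclose(U @ np.diag(s) @ Vt, F))  # the factors reconstruct F
print(np.allclose(U, Vt.T))                 # U = V for a symmetric positive definite F
```

For a symmetric positive definite matrix the singular values are the eigenvalues; here they are $3+\sqrt{3}$, $3$ and $3-\sqrt{3}$.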

After the singular value decomposition, the two loops of the algorithm are run in parallel to produce the desired clusters, yielding the cluster matrix R. To prevent two parallel execution units (such as threads or processes) from writing values simultaneously, we placed the writes to R in a critical section so that no invalid values are written.

(2) |

Each column of this matrix, that is, each cluster, represents functions that have frequently changed together. The strength of dependence required between functions is controlled directly by the threshold value in the algorithm: as the threshold increases, the resulting clusters contain elements with stronger dependencies.
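Algorithm 1 is not reproduced in this text, so the sketch below is an assumption about how the two parallel loops and the threshold could fit together. Its rule is hypothetical: function $i$ joins cluster $k$ when $|U_{ik}| \ge$ threshold. The outer loop over clusters runs on a thread pool, and a `threading.Lock` plays the role of the critical section around writes to the shared matrix R. In CPython, writes to distinct array entries would not strictly need the lock; it is kept to mirror the description above.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def cluster_functions(F, threshold, workers=4):
    """Hypothetical rule: function i joins cluster k if |U[i, k]| >= threshold.

    Returns an M x M 0/1 matrix R whose column k is cluster k (one per singular vector).
    """
    U, _, _ = np.linalg.svd(F)
    M = F.shape[0]
    R = np.zeros((M, M), dtype=int)
    lock = threading.Lock()

    def fill_cluster(k):
        # Inner loop over functions; the outer loop over clusters runs in parallel.
        for i in range(M):
            if abs(U[i, k]) >= threshold:
                with lock:  # critical section around writes to the shared matrix R
                    R[i, k] = 1

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fill_cluster, range(M)))  # collect results, re-raising any exception
    return R


# Two groups of functions that only ever change together: {0, 1} and {2, 3}.
F = np.array([[2, 1, 0, 0],
              [1, 2, 0, 0],
              [0, 0, 3, 1],
              [0, 0, 1, 3]], dtype=float)
R = cluster_functions(F, threshold=0.5)
print(R[:, :2])  # the two clusters tied to the largest singular values
```

Raising the threshold keeps only functions with large weights in a singular vector, which matches the text's observation that higher thresholds give clusters with stronger dependencies; since the columns of U have unit length, any threshold above 1 yields empty clusters.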

### 3.3 Generate reduced sets of test cases

Once the cluster matrix R has been obtained, a relationship must be established between the regression test cases and the resulting clusters with the help of the test-function matrix. Multiplying these two matrices and transposing the product yields a matrix that shows, cluster by cluster, the test cases that must be examined for each function change.

(3) | |||

For each version of the software, a change vector is then formed whose entries mark only the changed functions. Multiplying this vector by the transpose of the resulting matrix determines the set of test cases that should be re-examined in this version of the software.

(4) |

The resulting vector gives the set of test cases that should be re-run in response to these changes. In this example, test cases 5 and 6 have a higher priority for re-execution than the other test cases.
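Because equations (3) and (4) did not survive in this text, the following is only one plausible reading, written as a sketch with a hypothetical function name. The product of the test-function matrix T and the cluster matrix R is transposed to give a clusters-by-tests matrix. The change vector is mapped onto clusters, and each test receives a score equal to the number of its links to clusters that contain changed functions. Tests with a positive score are returned in descending order of priority.

```python
import numpy as np


def select_tests(T, R, change):
    """Score tests for a change vector (hypothetical reading of the text).

    T: N x M test-function matrix, R: M x K cluster matrix, change: length-M 0/1 vector.
    """
    C = (T @ R).T          # K x N: transpose of the product, clusters vs. tests
    touched = change @ R   # number of changed functions that fall in each cluster
    scores = touched @ C   # length-N priority score per test
    order = np.argsort(-scores, kind="stable")
    return scores, [int(t) for t in order if scores[t] > 0]


T = np.array([[1, 0, 0, 0],   # test 0 exercises function 0
              [0, 1, 1, 0],   # test 1 exercises functions 1 and 2
              [1, 0, 0, 1]])  # test 2 exercises functions 0 and 3
R = np.array([[0, 1],         # clusters: {2, 3} in column 0, {0, 1} in column 1
              [0, 1],
              [1, 0],
              [1, 0]])
change = np.array([0, 0, 1, 0])  # only function 2 changed
scores, selected = select_tests(T, R, change)
print(scores, selected)
```

In this toy example, test 2 is selected even though it never exercises function 2, because it exercises function 3, which shares a cluster with function 2; test 0 touches only the other cluster and is skipped.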

## 4 Experimental results

In this section, we examine the challenges that arise when, in order to increase accuracy, we analyze changes to a system's functions instead of changes to its files.

Examining co-changes between functions leads to much larger input matrices, because a system has far more functions than files; this creates a computational challenge for the developers of a software system. For this reason, we presented a parallel version of the algorithm instead of a sequential one. In this section, we compare the two versions and analyze the conditions under which parallel execution is more suitable. For our tests, we implemented both versions and ran them on a shared-memory multiprocessor system with four third-generation Intel processors clocked at 9.2 GHz.

We used source control systems to collect data on changes to the software system. These systems record every change made to the software product by any member of the development team, at any time.

As Table 1 shows, for matrix dimensions below 20 the parallel version has a longer execution time than the sequential version, whereas from dimension 20 upward the parallel version runs faster. Because our research studies co-changes between functions and therefore involves input matrices of very high dimension, we propose using the parallel approach.

Table 1: Execution time of the parallel and sequential implementations for matrix dimensions 16 to 26.

Dimension | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 |
---|---|---|---|---|---|---|---|---|---|---|---|
Parallel | 9.27 | 15.55 | 30.25 | 8.71 | 77.08 | 113.32 | 135.53 | 48.00 | 42.91 | 113.80 | 58.32 |
Sequential | 7.81 | 15.35 | 27.84 | 7.88 | 79.98 | 121.07 | 137.53 | 50.45 | 47.84 | 118.35 | 59.77 |
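The crossover in Table 1 can be checked directly from the published numbers. The short computation below takes the values as printed (the time unit is not stated in the text) and computes the speedup, sequential time divided by parallel time, for each dimension.

```python
dims = list(range(16, 27))
parallel = [9.27, 15.55, 30.25, 8.71, 77.08, 113.32, 135.53, 48.00, 42.91, 113.80, 58.32]
sequential = [7.81, 15.35, 27.84, 7.88, 79.98, 121.07, 137.53, 50.45, 47.84, 118.35, 59.77]

# Speedup > 1 means the parallel version was faster at that dimension.
speedup = {d: s / p for d, p, s in zip(dims, parallel, sequential)}
crossover = min(d for d, x in speedup.items() if x > 1.0)

for d in dims:
    print(f"dimension {d}: speedup {speedup[d]:.3f}")
print("first dimension where parallel wins:", crossover)
```

The largest speedup in the table is about 1.11, at dimension 24, so the gain from parallel execution over this range of dimensions is modest.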

As Figure 1 shows, the parallel and sequential versions were run on matrices of 11 different dimensions derived from changes in a software system. The irregular fluctuations in execution time are caused by the differing sparsity of each matrix, not by a difference between the parallel and sequential approaches: some versions of the software had fewer changes than others.

## 5 Conclusion

In the approach introduced by Sherriff et al.,[4] file-level changes were studied, so the problem size was small and there was no need for parallelism. In our approach, examining the modified functions within files increases accuracy but also enlarges the problem, and the parallel approach we presented overcomes this computational challenge. As a result, this research, which applies a parallel approach to source code clustering based on changes in functions, achieves much higher operational accuracy than research based on changes in source code files.

## References

- [1] P. Ammann and J. Offutt, Introduction to Software Testing, Cambridge University Press, Cambridge, 2017.
- [2] C. Chi-Lun, H. Chin-Yu, C. Chang-Yu, C. Kai-Wen and L. Chen-Hua, Analysis and assessment of weighted combinatorial criterion for test suite reduction, Quality and Reliability Engineering International, (2021), 1–31.
- [3] M. Duracik, E. Krsak and P. Hrkut, Searching source code fragments using incremental clustering, Concurrency and Computation: Practice and Experience, 32 (2020), No. 13.
- [4] M. Sherriff, M. Lake and L. Williams, Prioritization of Regression Tests using Singular Value Decomposition with Empirical Change Records, The 18th IEEE International Symposium on Software Reliability Engineering (ISSRE ’07), (2007).
- [5] A. Spillner and T. Linz, Software Testing Foundations: A Study Guide for the Certified Tester Exam, Rocky Nook, 2021.
- [6] P. Xiao, Z. Wang and S. Rajasekaran, Novel Speedup Techniques for Parallel Singular Value Decomposition, 20th International Conference on High Performance Computing and Communications, 2018.