Source Code Optimization using Equivalent Mutants

03/26/2018 ∙ by Jorge López, et al. ∙ 0

A mutant is a program obtained by syntactically modifying a program's source code; an equivalent mutant is a mutant that is functionally equivalent to the original program. Mutants are primarily used in mutation testing, where obtaining an equivalent mutant while deriving a test suite is considered highly undesirable, although equivalent mutants could be used for other purposes. We present an approach that treats equivalent mutants as valuable and uses them for source code optimization, i.e., for enhancing a program's source code while preserving its behavior. We showcase an algorithmic procedure that achieves source code optimization based on equivalent mutants and discuss suitable mutation operators. A preliminary experimental evaluation with Java and C programs indicates that the approach is applicable and that, whenever applicable, it can outperform traditional compiler optimizations.


1 Introduction

Source code optimization is a process that enhances a program's source code in order to obtain a functionally equivalent program, i.e., a program that computes the same solution for the same problem but possesses better non-functional properties. Traditionally, source code optimization techniques are implemented in compilers [1].

Program mutants are used in mutation testing [2], a software testing technique whose main idea is to modify the original source code to obtain a mutant that should later be distinguished from the original program by a test case. The modification is performed by a mutation operator, which changes the original source code. Applying a mutation operator may yield a program that is equivalent to the original one, called an equivalent mutant. Mutation testing attempts to detect and avoid equivalent mutants [3], and detecting equivalent mutants using compiler optimizations is well established [4]. However, to the best of our knowledge, the first publication discussing a novel use of equivalent mutants appeared only recently [5]; the authors show that equivalent mutants can be used for static anomaly detection, e.g., to detect whether the mutated code has better readability, better execution time, etc. However, the authors neither study nor outline a procedure in which mutation operators are used for source code optimization.

Equivalent mutants can provide an optimized source code in terms of its (program/binary) execution time and other aspects. However, to effectively use the software mutation technique for source code optimization, several questions should be addressed: which mutation operators can provide such optimizations? How should such mutation operators be applied for optimization purposes? What is the benefit of mutation-based source code optimization compared to traditional source code optimization? This paper is devoted to answering these questions; further, we perform preliminary experiments with a mutation tool, μJava [2], which showcase the applicability and effectiveness of the proposed approach (Section 3).

2 Equivalent Mutants for Source Code Optimization

Given a (computer) program $P$, we denote by $P_S$ its associated source code. $P$ is obtained from $P_S$ through a proper compilation process, i.e., a function $C$ that maps a program's source code (a string over a particular programming language alphabet $\Sigma$) into a binary (or executable) code, i.e., $C: \Sigma^* \to \{0,1\}^*$ and $P = C(P_S)$. We denote the set of all possible inputs for $P$ as $I$; correspondingly, $O$ is the set of all possible outputs of $P$. An input sequence is denoted as $\alpha \in I^*$; correspondingly, the program's output response to this sequence is an output sequence denoted as $out(P, \alpha) \in O^*$ (we assume that the program is deterministic and, therefore, this output is unique). We consider a program's running time under a given input sequence $\alpha$, on a common and predefined architecture, measured in milliseconds (ms) and denoted as $T(P, \alpha)$. Correspondingly, we denote the overall running time of a program with respect to a set of input sequences $A \subseteq I^*$ as $T(P, A) = \sum_{\alpha \in A} T(P, \alpha)$.

A program $P'$ is $A$-equivalent to $P$ (written $P' \equiv_A P$), for a set of input sequences $A$, if $out(P', \alpha) = out(P, \alpha)$ for every $\alpha \in A$. We focus on program $A$-equivalence because, in the general case, the problem of checking the equivalence of two arbitrary programs is undecidable. However, in some cases, equivalence with respect to a finite set of inputs implies complete functional equivalence when a behavior model is available [6]. Furthermore, many programs are used only within a given context and receive only a subset of the possible (defined) inputs, or are developed only for a subset of inputs. Likewise, regression tests (a finite subset of the program inputs) are becoming an industry standard, and they provide some assurance that a new version (including an optimized one) behaves as required.

A source code optimization process is a function that receives a source code $P_S$ and produces a new (optimized) source code $P'_S$. The obtained source code compiles to a program that is functionally equivalent to the original one with respect to an input set $A$, i.e., $C(P'_S) \equiv_A C(P_S)$, where $C$ denotes the compilation function. As it is not possible to derive an algorithmic approach to compute the time complexity of an arbitrary program, optimality is considered with respect to the overall running time of a program over $A$, i.e., $T(C(P'_S), A) < T(C(P_S), A)$.
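The two notions used above, equivalence with respect to a finite input set and overall running time over that set, can be sketched in Java as follows. This is a minimal illustration and not part of the proposed approach: programs are modeled as functions from a string to a string, the class and method names are hypothetical, and time is taken with System.nanoTime (nanoseconds rather than the milliseconds used in the text).

```java
import java.util.List;
import java.util.function.Function;

// Illustrative sketch only: all names below are hypothetical.
final class AEquivalenceSketch {

    // True if p and q produce the same output for every input sequence in the set.
    static boolean isAEquivalent(Function<String, String> p,
                                 Function<String, String> q,
                                 List<String> inputs) {
        for (String alpha : inputs) {
            if (!p.apply(alpha).equals(q.apply(alpha))) {
                return false;
            }
        }
        return true;
    }

    // Sum of the wall-clock running times of p over all input sequences (nanoseconds).
    static long overallRunningTimeNanos(Function<String, String> p, List<String> inputs) {
        long total = 0;
        for (String alpha : inputs) {
            long start = System.nanoTime();
            p.apply(alpha);
            total += System.nanoTime() - start;
        }
        return total;
    }

    public static void main(String[] args) {
        Function<String, String> original = s -> s;
        Function<String, String> candidate = s -> s.trim();
        // Equivalent on inputs without surrounding whitespace ...
        System.out.println(isAEquivalent(original, candidate, List.of("abc", "Hello", "")));
        // ... but not equivalent in general.
        System.out.println(isAEquivalent(original, candidate, List.of(" x ")));
        System.out.println(overallRunningTimeNanos(original, List.of("abc", "Hello", "")) >= 0);
    }
}
```

The second call in main shows why the choice of the input set matters: two programs that agree on one set of inputs may still differ on inputs outside it.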

Arcaini et al. [5] showed that mutants can be better than the original source code, including the case when the mutated source code has better time complexity than the original one. However, they did not discuss how equivalent mutants can be exploited. Therefore, the problem stated and solved in this paper is as follows: how can source code optimization be driven by source code mutation? It is important to highlight that the use of source code mutants to enhance the non-functional properties of source code is limited in the literature; for a comprehensive survey on the subject, the interested reader can refer to [7].

We assume that there exist certain mutation operators which, due to their nature, are more likely to provide source code optimization. Operators such as statement deletion can optimize the source code by performing dead code elimination, arithmetic operator replacement can optimize the source code by performing operator strength reduction, etc. [1]. Nevertheless, such compiler optimizations are likely to be more effective when performed on the target code by a compiler. Therefore, the question arises: are there any mutation operators that can produce source code optimizations which are different from the known compiler optimizations? Indeed, we collected the following set of mutation operators based on the method-level mutation operators of μJava [2]:

  • Relational Operator Replacement (ROR): replaces relational operators with others, e.g., >= with >. In certain cases, not executing the code when the condition reaches equality can enhance the performance (as shown in [5]), for example, when searching for the maximum number within an array, as shown in the following code snippet (hereafter, the line commented as original is replaced by the line commented as mutant).

    for (int i = 0; i < arr.length; i++)
       if (arr[i] >= max)   // original
       if (arr[i] > max)    // mutant
          max = arr[i];
  • Shortcut Assignment Operator Replacement (ASR): replaces shortcut assignment operators with other shortcut assignment operators, e.g., += with *=. In certain cases, advancing faster in the progression can avoid the execution of loop cycles, for example, when working over the powers of a given number, as shown in the following code snippet.

       for(int i = 1; i <= N; i+=3)   // original
       for(int i = 1; i <= N; i*=3)   // mutant
          if(i > 0 && 1162261467 % i == 0)
             //If-body
  • Arithmetic Operator Replacement (AOR): replaces arithmetic operators with others, e.g., + with *; similar to ASR, AOR can help to advance faster in progressions.
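The effect of the ROR mutant from the first item can be made visible by counting how often the assignment is executed. The following is a small sketch under our own assumptions (hypothetical class and method names, and Integer.MIN_VALUE as the initial maximum, which the snippet in the list does not specify); it is meant only to illustrate that the maximum is preserved while fewer assignments are performed when the array contains repeated values.

```java
// Illustrative sketch only: names and the initial value of max are assumptions.
final class RorMaxSketch {

    // Returns {maximum, number of assignments to max}.
    // strict == false uses the original condition (>=), strict == true the mutant (>).
    static int[] maxWithCount(int[] arr, boolean strict) {
        int max = Integer.MIN_VALUE;
        int assignments = 0;
        for (int i = 0; i < arr.length; i++) {
            boolean update = strict ? arr[i] > max : arr[i] >= max;
            if (update) {
                max = arr[i];
                assignments++;
            }
        }
        return new int[] {max, assignments};
    }

    public static void main(String[] args) {
        int[] data = {5, 5, 5, 7, 7};
        int[] original = maxWithCount(data, false);
        int[] mutant = maxWithCount(data, true);
        System.out.println("original: max=" + original[0] + ", assignments=" + original[1]);
        System.out.println("mutant:   max=" + mutant[0] + ", assignments=" + mutant[1]);
    }
}
```

For the array in main, both variants return 7 as the maximum; the original condition performs five assignments, the mutated condition two.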

We are interested in the set of mutation operators that perform optimizations different from traditional compiler optimizations and that are applicable to different programming languages. Let $M$ be the set of mutation operators of interest. This set can always be extended by adding other mutation operators, including those that perform compiler optimizations. We aim at limiting the mutation operators to be considered in order to avoid deriving mutants that do not optimize the source code. Indeed, executing all mutants against the set of inputs may take a very long time. However, we note that even if the optimization process takes more time than executing the original program once, the time investment can be worthwhile for widespread programs which may be executed on millions of devices, or for systems whose critical components are executed millions of times. Furthermore, selecting the critical parts of the code to be optimized can help to reduce the complexity of this approach.

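input : a source code P_S, a set of mutation operators M, a set of input sequences A
output : An optimized source code
foreach input sequence α in A do
      record the output of the original program on α as the expected output for α
      add the running time of the original program on α to the best overall running time
foreach mutation operator op in M do
      derive the set of mutants μ(op, P_S)
       foreach mutant m in μ(op, P_S) do
            compile m into a program P'
             if the compilation fails // the program does not compile
             then
                  goto end_loop
            set the overall running time of P' to 0
             mark P' as equivalent (true)
             foreach input sequence α in A do
                   run P' on α
                   if !(the output of P' on α equals the expected output for α) then
                        goto end_loop
                   add the running time of P' on α to the overall running time of P'
            if the overall running time of P' is less than the best overall running time then
                   keep m as the best source code found so far
                   update the best overall running time
            end_loop:
return the best source code found so far
Algorithm 1 Code optimization using equivalent mutants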

We propose Algorithm 1 for source code optimization using equivalent mutants. Hereafter, $\mu$ denotes a mutation function which takes the mutation operator and the source code to mutate as parameters, and produces a set of mutants of the corresponding type. The resulting optimizations depend on the set of inputs $A$ with which the program is stimulated.

Algorithm 1 returns a source code which compiles to a program that is $A$-equivalent to the initial one. Therefore, to assure program equivalence, one can derive the set of inputs $A$ as a complete/exhaustive test suite which guarantees that the original and optimized programs have the same behavior [6]. In fact, the more precisely this set is constructed, the higher the guarantee of equivalence between the optimized and the original programs.
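The selection step of Algorithm 1 can be sketched in Java as follows. This is a simplified illustration under several assumptions that are ours and not part of the original procedure: mutant generation and compilation are not modeled (the mutants are passed in as ready-to-run objects), running time is replaced by a deterministic step count so that the outcome is reproducible, and all class, interface, and method names are hypothetical. The example programs count the powers of two up to n; one mutant replaces i += 2 with i *= 2 (ASR) and another replaces i <= n with i < n (ROR).

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: names are hypothetical, cost replaces measured time.
final class MutantOptimizerSketch {

    // Result of one run: the output and an abstract cost standing in for running time.
    static final class Run {
        final int output;
        final long cost;
        Run(int output, long cost) { this.output = output; this.cost = cost; }
    }

    // A compiled program (original or mutant), modeled as a function of one input.
    interface Program {
        Run run(String input);
    }

    // Returns the cheapest candidate that agrees with the original on all inputs,
    // or the original itself if no mutant is both equivalent and cheaper.
    static Program optimize(Program original, List<Program> mutants, List<String> inputs) {
        List<Integer> expected = new ArrayList<>();
        long bestCost = 0;
        for (String alpha : inputs) {           // expected outputs and baseline cost
            Run r = original.run(alpha);
            expected.add(r.output);
            bestCost += r.cost;
        }
        Program best = original;
        for (Program mutant : mutants) {
            long cost = 0;
            boolean equivalent = true;
            for (int k = 0; k < inputs.size(); k++) {
                Run r = mutant.run(inputs.get(k));
                if (r.output != expected.get(k)) { // distinguished by an input: discard
                    equivalent = false;
                    break;
                }
                cost += r.cost;
            }
            if (equivalent && cost < bestCost) {
                best = mutant;
                bestCost = cost;
            }
        }
        return best;
    }

    // Counts the powers of two in [2, n]; the flags select the mutated variants.
    static Program powersOfTwo(boolean multiply, boolean strictBound) {
        return input -> {
            int n = Integer.parseInt(input);
            int count = 0;
            long steps = 0;
            for (int i = 2; strictBound ? i < n : i <= n; i = multiply ? i * 2 : i + 2) {
                steps++;
                if (Integer.bitCount(i) == 1) count++;
            }
            return new Run(count, steps);
        };
    }

    public static void main(String[] args) {
        Program original = powersOfTwo(false, false);
        Program asr = powersOfTwo(true, false);   // i += 2 replaced by i *= 2
        Program ror = powersOfTwo(false, true);   // i <= n replaced by i < n
        Program best = optimize(original, List.of(ror, asr), List.of("10", "100", "64"));
        System.out.println("ASR mutant selected: " + (best == asr));
    }
}
```

With the inputs 10, 100, and 64, the ROR mutant is discarded because it misses 64, and the ASR mutant is selected. If 64 is removed from the input set, the ROR mutant is no longer distinguished from the original, which illustrates how the result depends on the chosen set of inputs.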

3 Preliminary Experimental Results

As a simple case study, we chose the source code of an intricate Java function which, given a binary string, returns its integer value. Note that this source code does not verify that the string is indeed binary; however, we do not focus on such enhancements. The source code is shown below.

static int b2tob10 (String binary) {
    String bin = new StringBuilder(binary).reverse().toString();
    int size = bin.length();
    if(bin.length() == 0)
        return 0;
    int pos = 1, i = 2, number = 0, count, aux;
    number += Integer.parseInt(bin.substring(0,1));
    while (i <= 1 << size - 1){
        aux = i;
        count = 0;
        while(aux > 0){
            count++;
            aux = aux & (aux - 1);
        }
        if (count > 1) {
            i +=2;
            continue;
        }
        number += Integer.parseInt(bin.substring(pos, ++pos)) * i;
        i+= 2;
    }
    return number;
}

When performing experiments, Algorithm 1 has been executed with the following parameters:

  • μJava as the mutation function,

  • the source code as shown above,

  • as the mutation operator set,

  • as the set of inputs (test suite).

The obtained set of optimized source codes contains a mutant of interest, identified in μJava as ASRS_18, which is the replacement of a shortcut assignment operator (ASR), i.e., of the statement i+=2 with i*=2. The obtained mutant is $A$-equivalent to the original program as it outputs […]. At the same time, its overall execution time is […], strictly less than the original program's overall execution time of […]. As can be seen, the performance enhancement obtained by the showcased approach is significant. Furthermore, despite the fact that $A$ is not a complete test suite, one can argue that the obtained mutant is equivalent to the initial program. Indeed, the variable number (the return value of the function) gets updated only when the continue instruction is not executed. The continue instruction is executed whenever there is more than one '1' in the binary representation of the iterator i, so number is updated only when the representation of i contains exactly one '1'. Any binary string with only one '1' represents a power of two; this implies that, after an update, the next value of i for which the continue instruction is not executed is i*2.
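The equivalence argument can also be checked empirically on a small range of inputs. The sketch below reuses the function shown above and adds a flag that switches between the original statement and the mutated one. Since the text does not state which of the two occurrences of i+=2 is mutated, the sketch assumes it is the last statement of the loop body, which is the occurrence consistent with the argument above; the class name and the flag are hypothetical.

```java
// Illustrative sketch only: the class name and the 'mutated' flag are assumptions.
final class B2tob10MutantSketch {

    static int b2tob10(String binary, boolean mutated) {
        String bin = new StringBuilder(binary).reverse().toString();
        int size = bin.length();
        if (size == 0)
            return 0;
        int pos = 1, i = 2, number = 0, count, aux;
        number += Integer.parseInt(bin.substring(0, 1));
        while (i <= 1 << size - 1) {
            aux = i;
            count = 0;
            while (aux > 0) {           // count the '1' bits of i
                count++;
                aux = aux & (aux - 1);
            }
            if (count > 1) {            // i is not a power of two: skip it
                i += 2;
                continue;
            }
            number += Integer.parseInt(bin.substring(pos, ++pos)) * i;
            if (mutated)
                i *= 2;                 // mutant: jump to the next power of two
            else
                i += 2;                 // original: visit every even number
        }
        return number;
    }

    public static void main(String[] args) {
        boolean allEqual = true;
        for (int n = 0; n < 1024; n++) {
            String s = Integer.toBinaryString(n);
            if (b2tob10(s, false) != n || b2tob10(s, true) != n) {
                allEqual = false;
            }
        }
        System.out.println("original and mutant agree on 0..1023: " + allEqual);
    }
}
```

Such a check covers only the tested strings; it complements the argument given above and does not replace it.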

Since the Java compiler (javac) and the Java virtual machine (JVM) already perform static and dynamic optimizations, the previously presented optimization outperforms the optimizations performed by both javac and the JVM. However, in order to compare this approach to traditional compiler optimizations, we translated the example to standard C code. The overall running time of the program obtained by compiling the original C source code without optimizations was […]. The program obtained by compiling the original source code with the highest GNU Compiler Collection (gcc) optimization level (gcc -O3) had an overall running time of […]. The program obtained by compiling the mutant without any compiler optimizations had an overall running time of […]. As can be seen, the provided optimizations outperform the traditional ones. The main reason behind this improvement is that the optimizations obtained using equivalent mutants act on the semantics of the source code, unlike compiler optimizations.

4 Conclusion

We presented an approach for source code optimization using equivalent mutants. Preliminary experimental results show that the presented approach can outperform the traditional compiler optimizations, whenever the approach is applicable. Many directions are left open for future work and perhaps the most important of them is the study of the applicability of the approach, by performing a thorough experimental evaluation. Other interesting directions include studying other types of source code optimization together with the extended list of mutation operators and the exploration of symbolic model checking for the efficient verification of equivalent mutants.

References

  • (1) A. V. Aho, M. S. Lam, R. Sethi, J. D. Ullman, Compilers: Principles, Techniques, and Tools (2nd Edition), Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2006.
  • (2) Y.-S. Ma, J. Offutt, Y. R. Kwon, MuJava: An automated class mutation system, Softw. Test. Verif. Reliab. 15 (2) (2005) 97–133.
  • (3) B. J. M. Grün, D. Schuler, A. Zeller, The impact of equivalent mutants, in: 2009 International Conference on Software Testing, Verification, and Validation Workshops, 2009, pp. 192–199.
  • (4) M. Papadakis, Y. Jia, M. Harman, Y. L. Traon, Trivial compiler equivalence: A large scale empirical study of a simple, fast and effective equivalent mutant detection technique, in: 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, Vol. 1, 2015, pp. 936–946.
  • (5) P. Arcaini, A. Gargantini, E. Riccobene, P. Vavassori, A novel use of equivalent mutants for static anomaly detection in software artifacts, Information and Software Technology 81 (2017) 52–64.
  • (6) R. Dorofeeva, K. El-Fakih, S. Maag, A. R. Cavalli, N. Yevtushenko, FSM-based conformance testing methods: A survey annotated with experimental evaluation, Information & Software Technology 52 (12) (2010) 1286–1297.
  • (7) M. Papadakis, M. Kintis, J. Zhang, Y. Jia, Y. L. Traon, M. Harman, Mutation testing advances: An analysis and survey, Advances in Computers, Elsevier, 2018. doi:10.1016/bs.adcom.2018.03.015.