Translating C programs to MSVL programs

08/25/2018 ∙ by Meng Wang, et al. ∙ 0

The C language is one of the most popular languages for system programming, and applications written in C are widely used across industries. To improve the safety and reliability of these applications, the runtime verification tool UMC4MSVL, based on the Modeling, Simulation and Verification Language (MSVL), is employed. To do so, C programs have to be translated into MSVL programs. This paper presents an algorithm that translates a C program into an equivalent MSVL program in a one-to-one manner. The proposed algorithm has been implemented in a tool called C2M. A case study is given to show how the approach works.


1 Introduction

Software systems are now widely used in many areas of daily life, and many of them are written in the C language since it allows a complex system to be implemented in a flexible way. In order to improve the safety and reliability of these systems, many researchers focus on verifying them using model checking clarke1994model; clarke1999model. With conventional model checkers such as NuSMV cimatti2002nusmv and SPIN holzmann1997model, an abstract model has to be manually extracted from a C program and a desired property is specified in LTL pnueli1977temporal or CTL Clarke1986Automatic. It is then verified whether the abstract model satisfies the property. However, as software systems grow larger, with increasingly complex internal structures and external interactions, it becomes difficult to extract models from programs. Moreover, any error made during extraction may cause inconsistencies between the abstract model and the original program.

In recent years, verifying safety-critical systems at the code level has attracted more attention. Tools such as SLAM ball2001automatically, BLAST beyer2006the, CPAChecker beyer2011cpachecker and CBMC kroening2014cbmc support safety property verification, which is carried out by inserting assertions into C source code and then checking whether the assertions are violated. Ultimate LTLAutomizer dietsch2015fairness and T2 brockschmidt2016t2 can verify temporal properties by reducing the verification problem to fair termination checking. To do that, the C program to be verified needs to be translated into an intermediate form. All these tools suffer from the state explosion problem. Further, since no execution information is available, they are not always accurate and sometimes produce false positives (i.e., potential errors are reported where there are none) and false negatives (i.e., errors are not reported).

As a lightweight verification technique, runtime verification checks whether a given execution trace satisfies a given property by monitoring the execution of a system, and so avoids the state explosion problem. The runtime verification tool UMC4MSVL Wang2017Full turns temporal property verification into a dynamic program execution task. With this tool, the program to be verified is written in the Modeling, Simulation and Verification Language (MSVL) duan1996extended; duan2005temporal; zhang2016mechanism as a program M, and the desired property is specified as a Propositional Projection Temporal Logic (PPTL) duan2014a; duan2008decision formula. The negation of the desired property is translated into an MSVL program M’; whether M violates the property can then be checked by evaluating whether there exists an acceptable execution of the new MSVL program “M and M’”.

In order to verify C programs using UMC4MSVL, they have to be rewritten into MSVL programs. However, the syntax and semantics of C are relatively complicated. In this paper, the C language is confined to a suitable subset, and an algorithm is proposed to automatically translate C programs to MSVL programs. MSVL statements and data types cover all C statements and data types except the goto statement and the union type. Therefore, restricted C fragments without goto statements and union types can be translated to equivalent MSVL programs automatically. In fact, we can treat all variables as framed, as in C, so the translation between the two languages is one-to-one. Each statement in the C fragment and its corresponding MSVL statement have equivalent operational semantics. The time complexity of the translation is linear, O(n), where n is the number of statements and type declarations in a C program.

The contributions of this paper are three-fold:

  • We present an algorithm to translate a C program into an MSVL program.

  • We have implemented a translator and experiments have been carried out on real-world C programs.

  • A case study is given to show the application of C2M in practice.

The remainder of this paper is organized as follows. Section 2 briefly introduces the restricted C fragment called Xd-C, and Section 3 introduces MSVL. Section 4 presents an algorithm for translating C programs to their equivalent MSVL programs. The implementation of the proposed approach is presented in Section 5 and a case study is given in Section 6. Section 7 concludes the paper.

2 The Restricted C Fragment: Xd-C

The restricted C fragment called Xd-C is a subset of ANSI-C (the C89 standard). It consists of the commonly used types, expressions and statements of the C language. Xd-C is similar to Clight blazy2009mechanized but supports more of the language than Clight does.

2.1 Types

The supported types in Xd-C include arithmetic types (char, int, float and double in various sizes and signedness), pointer, void pointer, function pointer and struct types. However, the union type, static local variables and type qualifiers such as const, restrict and volatile are not allowed in Xd-C. Typedef definitions, which are syntactically storage-class specifiers, are expanded away during parsing and type checking. The abstract syntax of Xd-C types is given as follows:

Self-defined Types:


where defines a structure consisting of body ; defines a function pointer with each parameter in type and a return value in type or . defines a function pointer with no parameter. Note that (possibly with subscriptions) is a string (name) consisting of characters and digits with a character as its head.
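To make the type fragment more concrete, the following is a minimal C sketch that uses only kinds of types listed above as supported: arithmetic types, a pointer, a void pointer, a struct and a function pointer. Every identifier (point, add, apply, types_demo) is hypothetical and chosen for illustration; the sketch is ordinary C and is not claimed to match the exact abstract syntax of Xd-C.

```c
/* Illustrative only: all identifiers below are hypothetical. */

/* A struct type with two int members. */
struct point {
    int x;
    int y;
};

/* An ordinary function whose address is passed as a function pointer. */
int add(int a, int b)
{
    return a + b;
}

/* The function pointer type is written out in full, without typedef,
   and without const or volatile qualifiers. */
int apply(int (*op)(int, int), struct point *p)
{
    return op(p->x, p->y);
}

int types_demo(void)
{
    struct point pt;
    void *raw;            /* void pointer */
    struct point *pp;     /* pointer to struct */
    unsigned char small;  /* unsigned arithmetic type */
    double scale;         /* floating-point arithmetic type */

    pt.x = 3;
    pt.y = 4;
    raw = &pt;
    pp = (struct point *)raw;   /* type cast back from void pointer */
    small = 2;
    scale = 1.5;
    return (int)(apply(add, pp) * small * scale);
}
```

The function pointer here refers to a function defined in the same program, which keeps the sketch clear of the restriction on function pointers to external functions mentioned later in this section.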

2.2 Expressions

The expression in Xd-C is inductively defined as follows:

where is an arbitrary constant, a variable, the element of array (counting from 0), the element in row and column , the member of structural variable , the member of the structural variable that points to and a function call with arguments . represents the type cast of namely converting the value of to the value with type . is a unary expression including taking the address of , the pointer dereferencing, the positive of , the negative of , ~ the bitwise complement of and the logical negation of . represents a binary operator including arithmetic operators (, , , and ), bitwise operators (, , , and  ), relational operators (, , and ), equality operators ( and ) and logical operators ( and ). is a conditional expression indicating that the result is if is not equal to 0, and otherwise.

The expressions that can occur in left-value position are inductively defined as follows:

2.3 Statements

The following are the elementary statements in Xd-C:

A null statement performs no operations. An expression statement () is evaluated as a void expression for its side effects. In simple assignment statement (), the value of replaces the value stored in the location designated by . In conditional statement if(){}else{}, executes if the expression compares unequal to 0, executes otherwise. In switch statement switch(), is the controlling expression, the expression of each case label shall be an integer constant expression. There are three kinds of iteration statements in Xd-C including while loop, do loop and for loop statements. An iteration statement causes the loop body to execute repeatedly until the controlling expression equals 0. In while loop statement while(){}, the evaluation of takes place before each execution of , while in do loop statement do{}while(), the evaluation of takes place after each execution of . In for loop statement for(){}, executes once at the beginning of the first iteration, is the loop condition, executes at the end of each iteration, and is the loop body. Jump statements including continue, break, return and return are supported in Xd-C, but not the goto statement. A continue statement shall appear only in or as a loop body. A break statement terminates execution of the smallest enclosing switch or iteration statement. A return statement appears only in a function.
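The control statements described above can be seen together in a small C sketch. It combines a while loop, conditionals, a switch whose case labels are integer constants, and the break, continue and return jump statements. The function names and the sample data are hypothetical, and the code is plain C written in the spirit of the fragment rather than a verified instance of the Xd-C grammar.

```c
/* Illustrative only: names and data are hypothetical. */

/* Counts the even entries among the first n entries of v,
   skipping negative entries and stopping at the first zero. */
int count_even(int *v, int n)
{
    int i;
    int count;

    count = 0;
    i = 0;
    while (i < n) {             /* condition is evaluated before each iteration */
        if (v[i] == 0) {
            break;              /* terminates the enclosing while loop */
        }
        if (v[i] < 0) {
            i = i + 1;
            continue;           /* jumps to the next evaluation of i < n */
        }
        switch (v[i] % 2) {     /* controlling expression */
        case 0:                 /* case label is an integer constant */
            count = count + 1;
            break;              /* terminates the switch only */
        default:
            break;
        }
        i = i + 1;
    }
    return count;
}

int count_even_demo(void)
{
    int data[6] = {2, -4, 3, 8, 0, 6};
    return count_even(data, 6);
}
```

In this run the loop counts 2 and 8, skips -4, ignores 3, and stops at the zero before reaching the final 6.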

An Xd-C program is composed of a list of declarations, a list of functions and a main function. It can be defined as follows:

where defines a one dimensional array having elements with type while defines a two dimensional array having elements with type ; defines an initialization of except for ; defines a function with each parameter in type and a return value in type or ; defines a function with no parameter.

Summary: As we can see, some constructs and facilities in ANSI-C (C89) are not supported in Xd-C. In the following, we show a key negative list which Xd-C does not support.

  • goto statement;

  • union structure;

  • , , and expressions;

  • () comma statements;

  • compound assignments where ;

  • ; structure assignments;

  • continuous assignments;

  • , , , and storage-class specifiers;

  • and type qualifiers;

  • local variables in a block;

  • nested cases in a switch statement;

  • assignment expressions such as ;

  • function pointers pointing to external functions;

  • functions that accept a variable number of arguments.

In fact, the constructs and facilities in the above negative list except for goto statement can be implemented by Xd-C although the implementation might be tedious. Therefore, Xd-C is a reasonable subset of ANSI-C (C89) in practice.
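As an illustration of this claim, the sketch below rewrites four of the legible items of the negative list (continuous assignment, compound assignment, a comma statement and an assignment used as an expression) into simple assignments and an ordinary conditional. Both functions are hypothetical examples written for this purpose; the rewriting shown is the standard source-level one and is not taken from the C2M implementation.

```c
/* Illustrative only: both functions are hypothetical. */

/* Uses constructs from the negative list. */
int full_c(int a, int b)
{
    int x;
    int y;
    int t;

    x = y = a;                /* continuous assignment */
    x += b;                   /* compound assignment */
    (y *= 2, t = y + x);      /* comma statement */
    if ((t = t - 1) > 0) {    /* assignment used as an expression */
        return t;
    }
    return 0;
}

/* The same computation using only simple assignments. */
int xdc_style(int a, int b)
{
    int x;
    int y;
    int t;

    y = a;                    /* x = y = a, right to left */
    x = y;
    x = x + b;                /* x += b */
    y = y * 2;                /* left operand of the comma */
    t = y + x;                /* right operand of the comma */
    t = t - 1;                /* assignment hoisted out of the condition */
    if (t > 0) {
        return t;
    }
    return 0;
}
```

Each rewrite adds statements but no new control flow, which is consistent with the remark that such implementations are tedious rather than difficult.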

3 MSVL

MSVL is a Modeling, Simulation and Verification Language duan1996extended; duan2005temporal; zhang2016mechanism which can be used both to model and to execute a system. It is a programming language that supports not only conventional programming but also parallel programming. Some statements in MSVL cannot be expressed in C, so a suitable subset of MSVL is sufficient to support Xd-C. This section briefly introduces that subset of MSVL, which is borrowed from duan1996extended; duan2008framed; duan2004framed; Yang2007Operational.

3.1 Syntax

The arithmetic expression and boolean expression in the subset of MSVL are inductively defined as follows:

where the explanations of , and are the same as explanations of , and in Xd-C, respectively. is a state function call, and an external function call. A state function contains no temporal operators. Note that a unary operation (, ) can be treated as a state function ; a binary operation (, ) can be treated as a state function . An external call of function means that we are concerned only with the return value of the function, not with the interval over which the function is executed. stands for the value of at the next state and the value of at the previous state. We assume that all variables used are framed. The following are the elementary statements in the subset of MSVL:

where , and are arithmetic expressions and is a boolean expression. The termination statement empty means that the current state is the final state of an interval. skip specifies one unit of time over an interval. The assignment “” indicates that is assigned the value of at the current state, while “” indicates that the value of at the next state equals the current value of and the length of the interval is one unit of time. The conjunction statement “ and ” means that and are executed concurrently. “” means that is executed until its termination from this time point then is executed. y(,…,) and ext y(,…,) are internal and external function calls, respectively. The meanings of other statements are the same as Xd-C. Note that all the above statements are defined by PTL formulas.

In addition, data types in MSVL wang2017msvl are defined in the same way as in Xd-C. An MSVL program can be defined as follows:

where for structure definition , is used to connect each member of struct ; for function fragment , is the return value; and define functions without a return value.

4 Translation from C to MSVL

In this section, an algorithm for translating a program written in Xd-C to an MSVL program is presented. An Xd-C program is composed of a list of declarations and a list of functions (including the main function). Thus, algorithms to translate declarations and functions in Xd-C to MSVL ones are needed.

Algorithm 1 translates an Xd-C declaration to an MSVL declaration. For variable declaration , is translated to an MSVL variable list using . For array initialization , is used to count the number of elements in array and each element is translated to an MSVL expression by . For structure definition , is used to replace .

Input:  a declaration
Output: an MSVL fragment
  begin function
   case
     is : return and skip;
     is : return and skip;
     is : return and skip;
     is : return and skip;
     is :
             return and skip;
     is :
             return ;
   end case
  end function
Algorithm 1

Algorithm 2 translates a variable list in Xd-C to a variable list in MSVL. Variable is directly translated to MSVL variable and variable initialization is translated to .

Input:  a variable list in Xd-C
Output: a variable list in MSVL
  begin function
   case
     is : return ;
     is : return ;
     is : return ;
   end case
  end function
Algorithm 2

Algorithm 3 shows how to translate each Xd-C expression to an MSVL expression. Constant and variable can directly be translated to and in MSVL while for other expressions such as , and , the sub-expressions , , , , and are translated to their corresponding MSVL expressions. Operations , and are translated to , and , respectively. Expression is translated to MSVL expression .

Input:  an expression in Xd-C
Output: an expression in MSVL
  begin function
   case
     is : return ;
     is : return ;
     is : return ;
     is : return ;
     is :return ;
     is : return ;
     is : return ;
     is : return ;
     is : return ;
     is : return ;
     is : return ;
     is : return ;
     is : return ;
     is :
     return ;
   end case
  end function
Algorithm 3

Function fragment can be translated to an MSVL function using Algorithm 4.

Input:  a function fragment in Xd-C
Output: an MSVL fragment
  begin function
   case
     is :
            return ;
                       
     is :
            return ;
     is :
            return ;
     is :
            return ;
   end case
  end function
Algorithm 4

Algorithm 5 is presented to translate each statement in Xd-C to an MSVL statement. A null statement is translated to MSVL statement empty. Expression, switch and while loop statements are translated to MSVL statements using Algorithms 6, 7 and 8, respectively. The translation of do loop and for loop statements is performed with the help of . Simple assignment statement is translated to MSVL unit assignment statements . A conditional statement is translated to MSVL conditional statement if()then{}else{ }. For a sequential statement , its sub-statements and are translated to MSVL statements using .

Input:  an elementary statement in Xd-C
Output: an MSVL statement
  begin function
   case
     is : return empty;
     is : return ;
     is : return ;
     is if(){}else{}:
     return if()then{}else{};
     is switch():
     return ;
     is while(){}: return ;
     is do{}while(): return while;
     is for(){}: return while;
     is continue;: return ;
     is break;: return ;
     is return ;: return and ;
     is return;: return ;
     is : return ;
   end case
  end function
Algorithm 5
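Algorithm 5 maps both do loops and for loops to MSVL while statements. The exact MSVL text it produces is not legible in this copy, so the following C sketch only illustrates the standard source-level equivalences that such a reduction can rely on: a for loop becomes its initializer followed by a while loop with the step appended to the body, and a do loop becomes one copy of the body followed by a while loop. The function names are hypothetical, and the equivalences are assumed to apply to bodies without continue, since a continue in the while form of a for loop would skip the step.

```c
/* Illustrative only: names are hypothetical, bodies contain no continue. */

/* for loop ... */
int fact_for(int n)
{
    int i;
    int r;

    r = 1;
    for (i = 2; i <= n; i = i + 1) {
        r = r * i;
    }
    return r;
}

/* ... and the same loop as initializer; while (condition) { body; step } */
int fact_while(int n)
{
    int i;
    int r;

    r = 1;
    i = 2;
    while (i <= n) {
        r = r * i;
        i = i + 1;
    }
    return r;
}

/* do loop: the body runs once before the condition is first checked ... */
int digits_do(int n)
{
    int c;

    c = 0;
    do {
        c = c + 1;
        n = n / 10;
    } while (n != 0);
    return c;
}

/* ... and the same loop as body; while (condition) { body } */
int digits_while(int n)
{
    int c;

    c = 0;
    c = c + 1;
    n = n / 10;
    while (n != 0) {
        c = c + 1;
        n = n / 10;
    }
    return c;
}
```

The do loop case duplicates the body, so the rewritten program is longer than the original, which fits the growth in line counts reported for the generated programs.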

Algorithm 6 translates each expression statement to an MSVL statement. For a function call expression statement (), we just need to translate it to an MSVL expression using Algorithm 3. Since other expression statements have no side effects, they are translated to empty.

Input:  an expression statement in Xd-C
Output: an MSVL statement
  begin function
   case
     is : return ;
    default: return empty;
   end case
  end function
Algorithm 6

Algorithm 7 translates each case of a switch statement to an MSVL conditional statement. Algorithm 8 translates a while loop statement to an MSVL statement.

Input:  switch case statement and controlling expression
Output: an MSVL statement
  begin function
   case
     is default:
     return if()
           then{};
     is case :
     return if(
        )then{};
   end case
  end function
Algorithm 7
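Algorithm 7 turns each case of a switch statement into a conditional on the controlling expression. The conditions it generates are not legible in this copy, so the C sketch below only shows the general idea for a switch in which every case ends with break: each case becomes a guarded if, and a flag records that a case has already been taken. The flag name done and both functions are hypothetical, and this is not claimed to be the exact scheme used by C2M.

```c
/* Illustrative only: the flag `done` and both functions are hypothetical. */

int weight_switch(int kind)
{
    int w;

    w = 0;
    switch (kind) {
    case 1:
        w = 10;
        break;
    case 2:
        w = 20;
        break;
    default:
        w = -1;
        break;
    }
    return w;
}

/* The same behaviour written as one conditional per case. */
int weight_if(int kind)
{
    int w;
    int done;   /* set once a case has been taken */

    w = 0;
    done = 0;
    if (done == 0 && kind == 1) {   /* case 1 */
        w = 10;
        done = 1;
    }
    if (done == 0 && kind == 2) {   /* case 2 */
        w = 20;
        done = 1;
    }
    if (done == 0) {                /* default */
        w = -1;
    }
    return w;
}
```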
Input:  an Xd-C while statement :
Output: an MSVL statement
  begin function
   case
     contains no break, return or continue:
     return ;
     is and contains break at the last statement:
     return
        while
         ifthen
     is and contains return at the last statement:
     return while
         ifthen
     is and contains continue at the last statement:
     return
        while
         ifthen;
   end case
  end function
Algorithm 8
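The legible skeleton of Algorithm 8 shows a while statement wrapping an if-then for loop bodies that contain break, return or continue. One common way to obtain that shape is to replace the jump by a flag that is tested both in the loop condition and before the remaining statements of the body. The C sketch below illustrates this for break. The flag name brk, the helper functions and the sample data are hypothetical, and the sketch is an assumption about the general technique rather than a reconstruction of the MSVL code that C2M emits.

```c
/* Illustrative only: the flag `brk`, the functions and the data are hypothetical. */

/* Linear search written with break. */
int find_break(int *v, int n, int key)
{
    int i;
    int pos;

    i = 0;
    pos = -1;
    while (i < n) {
        if (v[i] == key) {
            pos = i;
            break;
        }
        i = i + 1;
    }
    return pos;
}

/* The same search with the jump replaced by a flag. */
int find_flag(int *v, int n, int key)
{
    int i;
    int pos;
    int brk;    /* 1 once the original loop would have executed break */

    i = 0;
    pos = -1;
    brk = 0;
    while (brk == 0 && i < n) {     /* flag joins the loop condition */
        if (v[i] == key) {
            pos = i;
            brk = 1;                /* stands in for break */
        }
        if (brk == 0) {             /* guards the rest of the body */
            i = i + 1;
        }
    }
    return pos;
}

int find_demo(int key, int use_flag)
{
    int data[5] = {4, 9, 7, 7, 1};

    if (use_flag != 0) {
        return find_flag(data, 5, key);
    }
    return find_break(data, 5, key);
}
```

The same pattern extends to continue (guard only the remainder of the current iteration) and to return (set a flag and store the return value), at the cost of one extra variable per kind of jump.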

5 Implementation

We have implemented the proposed approach in a tool named C2M. The architecture of the tool is shown in Fig. 1. A C program is first preprocessed. In this phase, #include directives are removed by merging all C files of a subject program into a single file according to their invoking relationships. Preprocessor directives such as #ifdef, #define and #undef are processed using MinGW mingw to generate a C program without them. Then a lexer and a parser for C programs, built with Parser Generator (PG), which integrates Lex and Yacc, perform the lexical analysis and syntax analysis, respectively. The resulting syntax tree of the C program is translated to an MSVL program using the algorithms presented above. Finally, post-processing adjusts the format of the generated MSVL program and writes it to a file with the suffix ".m". Since a generated MSVL program may invoke MSVL and C library functions, we build our own libraries of C and MSVL functions.

Figure 1: Architecture of C2M

In order to show the usability and scalability of our tool in translating real-world C programs to MSVL programs, we apply C2M to 13 programs from industry whose sizes range from 0.5k to 17k lines, as shown in Table 1. In this benchmark, the C programs RERS P14 to RERS P19 are taken from the RERS Grey-Box Challenge 2012 howar2012rers (RERS). LTLNFBA ltlnfba is a tool for translating an LTL formula to a Büchi automaton. carc Car is a license plate recognition system. The other 5 programs, bzip2, mcf, art, gzip and twolf, are from SPEC2000 SPEC. The experiments have been carried out on a 64-bit Windows 7 PC with a 4.00GHz Intel(R) Core(TM) i7 processor and 64GB memory.

Table 1 shows the experimental results on the benchmark. The column “Program” gives the names of the programs and the column “LOC” shows the sizes of the C programs in lines of code. The column “LOM” lists the sizes of the MSVL programs translated from the C programs. The column “Time” shows the time consumed by each translation task. The results in Table 1 show that our tool outputs a translation for every program, and that in total the generated MSVL programs are about 2.60 times the size of the C programs.

Program      LOC      LOM       Time(s)
RERS P14     514      2261      0.46
RERS P15     1353     5016      2.04
RERS P16     1304     5271      2.18
RERS P17     2100     7753      4.38
RERS P18     3306     12677     11.81
RERS P19     8079     28332     63.83
LTLNFBA      3296     9113      0.76
carc         2170     4027      0.59
bzip2        2320     4976      0.55
mcf          1322     2124      0.36
art          886      1514      0.28
gzip         3773     8189      0.80
twolf        17452    33114     7.11
Total        47875    124376    95.15
Table 1: Results of C2M on real-world programs
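The size ratio quoted for Table 1 can be checked with a few lines of C. The sketch below divides the reported total LOM by the reported total LOC; the function names are hypothetical, and the two constants are copied from the Total row of the table as printed.

```c
/* Illustrative only: function names are hypothetical. */

/* Ratio of generated MSVL lines to original C lines. */
double expansion_ratio(long loc, long lom)
{
    return (double)lom / (double)loc;
}

/* Uses the Total row of Table 1 as reported: LOC = 47875, LOM = 124376. */
double table1_total_ratio(void)
{
    return expansion_ratio(47875L, 124376L);
}
```

The same function applied to single rows shows that the ratio varies by program: it is roughly 4.4 for RERS P14 (2261 / 514) and roughly 1.7 for art (1514 / 886).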

6 Case Study

In this section, an application is presented to illustrate how the translator translates a C program to an MSVL program.

Application SPEC is a compression program to compress and decompress input files. We use the function as an example to illustrate our approach. Fig. 2 shows the core of , including most kinds of C statements and the generated MSVL program. Using the translation algorithms presented above, different kinds of C statements are translated to their equivalent MSVL statements.

  • Function definition statement void can be directly translated to an MSVL function definition statement function .

  • Variable declaration statement unsigned char is translated to unsigned char and skip.

  • Simple assignment statement () is directly translated to an MSVL unit assignment statement .

  • For loop statement for(){...} is translated to while(