Frama-C [11, 10] is a framework for static and dynamic analysis of C programs. It offers a common infrastructure shared by various plugins that implement specific analyses, as well as a behavioral specification language named ACSL. Developing such a platform is a difficult and time-consuming task. As most existing Frama-C plugins do not support concurrent C code, extending the current platform to handle it is an interesting and promising work direction.
Motivated by an earlier case study on deductive verification of an operating system component, we have proposed a new plugin, named conc2seq, that allows Frama-C to deal with concurrent programs. In order to leverage the existing plugins, we designed conc2seq as a code transformation tool. Under the assumption of sequential consistency, a concurrent program can be simulated by a sequential program that produces all interleavings of its threads.
To ensure that the proofs and analyses conducted using conc2seq are correct, we need to ensure that the transformation preserves the semantics of programs. The contribution of this paper is a proof of correctness of the code transformation principle used in conc2seq.
The verification of the transformation is done for simplified languages that capture the properties of interest with respect to validity, in particular memory accesses and basic data and control structures (both sequential and parallel). We formalize the source (parallel) language as well as the target (sequential) language and formally define the transformation on these languages.
In these languages, we do not consider all control structures of the C language but only simple conditionals and loops (goto and switch are not part of the considered languages). The C assignments are decomposed into three simpler constructs: local assignments that do not incur access to the global memory, reading of the global memory (one location at a time), and writing into the global memory (one location at a time). An expression can only be composed of constants, basic operations and local variables. Procedure calls are allowed but recursion is not. There is no dynamic memory allocation.
In the remainder of this report, we first present the considered source and target languages as well as their formal semantics (Section 2). Then we describe the transformation (Section 3). Section 4 is devoted to the equivalence relation between states of the source program and states of the transformed program, and its use for the proof of correctness of the proposed transformation. We discuss an ongoing effort to mechanize the formalization and proof with the interactive theorem prover Coq in Section 5. Finally, we position our contribution with respect to the literature in Section 6 and conclude in Section 7.
2 Considered Languages
2.1 Syntax and Program Definition
We consider an enumerable set of memory locations . We do not support dynamic memory allocation: the memory locations manipulated by a program are thus known before the beginning of the execution. A size is associated to each allocated location, i.e. the number of values that can be stored at this memory location. A location can be seen as an array in C whose first element is and whose address is .
The set of values that can be assigned to variables is written and is the union of memory locations (), integers () and booleans (). We assume that different values of the language take the same amount of memory.
We write for the set of local variables. In the remainder of the paper, for a set whose elements are written , is the set of finite sequences of elements of and will denote an element of , i.e. a sequence of elements of . Expressions are defined as follows:
We do not define the set of operators here: it is a usual set of arithmetic and boolean operations. It is however necessary to emphasize that these operators do not allow pointer arithmetic. The only provided operation on memory locations is comparison. Expressions cannot produce side effects. In the remainder of the paper, expressions will be denoted by and variants.
A sequential program is defined as a sequence of procedures, by convention the first one being the main procedure. A procedure is defined by its name, its parameters (that are a subset of local variables) and the sequence of instructions that form its body:
where is the set of valid procedure names. , , and names built from are all reserved names. is the set of instruction lists, i.e. program code.
The language includes the usual primitives of a small imperative language: sequences of instructions (we will write instead of ), conditionals, loops. Assignment is decomposed into three distinct cases: assignment of a local variable with the value of an expression, writing the value of an expression to the heap, and reading a value from the heap to a local variable. Expressions cannot contain reads from memory, nor procedure calls. A C assignment containing several accesses to the heap should therefore be decomposed into several reads into local variables, an assignment of an expression to a local variable, and finally, if necessary, a write to the heap from a local variable. Procedures can be called using the classical syntax where is the list of expressions passed as arguments. Arguments are passed by value.
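The decomposition of a C assignment into the three assignment forms can be sketched as follows. This is a minimal illustration, not the paper's formal syntax: the class names `LocalAssign`, `HeapRead`, `HeapWrite`, the helper `decompose_sum`, and the temporaries `t0`, `t1`, `r` are all hypothetical, and expressions are kept as plain strings.

```python
from dataclasses import dataclass

# Hypothetical encoding of the three assignment forms of the language.

@dataclass(frozen=True)
class LocalAssign:      # x := e, where e mentions only constants and locals
    var: str
    expr: str

@dataclass(frozen=True)
class HeapRead:         # x := loc[idx], one heap cell at a time
    var: str
    loc: str
    idx: int

@dataclass(frozen=True)
class HeapWrite:        # loc[idx] := x, one heap cell at a time
    loc: str
    idx: int
    var: str

def decompose_sum(dst, srcs):
    """Decompose the C-like assignment dst = src_1 + ... + src_n,
    where dst and every src_i are heap cells given as (loc, idx) pairs."""
    code, temps = [], []
    for k, (loc, idx) in enumerate(srcs):
        tmp = f"t{k}"                      # fresh local for each heap read
        code.append(HeapRead(tmp, loc, idx))
        temps.append(tmp)
    code.append(LocalAssign("r", " + ".join(temps)))   # side-effect-free expression
    code.append(HeapWrite(dst[0], dst[1], "r"))        # single final write
    return code

# a[0] = b[0] + c[2] becomes two reads, one local assignment, one write.
program = decompose_sum(("a", 0), [("b", 0), ("c", 2)])
```

Each heap access ends up in its own instruction, which is what later lets the transformation treat every instruction as one atomic step.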
A sequential program is fully defined by:
the list of its procedures (the main one taking no parameter),
a list of allocated memory locations with their associated sizes (positive numbers).
A parallel program can be executed by any strictly positive number of threads. There is no dynamic creation of threads. During the execution of a parallel program the number of threads remains constant, given by a specific parameter of each execution. Let denote this static number of threads.
is the set of thread identifiers. We identify with seen as subset of . An element of is thus a value for both languages. A parallel program can use any of the sequential program constructs. In addition, it can contain the instruction that runs a sequence of instructions atomically. In such a code section, no thread other than the one that initiated the execution of the atomic block can be executed.
A parallel program is fully defined by:
the list of its procedures,
a list of allocated memory locations in the shared memory with their associated sizes,
a mapping from thread identifiers to defined procedure names, defining the main procedure of each thread.
For a program (either sequential or parallel), denotes the allocated memory of the program. This association list is also considered as a function, therefore denotes the size allocated for memory location , if defined. denotes the sequence of procedures of the program. For a parallel program is the mapping from to , and for a sequential program is the main procedure name. For a name and a program , denotes the body of the procedure named in the program . If it is clear from the context may be omitted.
Comparison with the concurrent C of the Frama-C plugin.
For sequential programs, the simplifications with respect to the subset of C handled by our conc2seq plugin are essentially that we do not support pointer arithmetic, the expressions containing several memory reads or procedure calls should be decomposed, and we support only the “most structured” control structures. The typing is also very basic: variables and heap locations accept any type of values (integers, booleans, memory locations) and the type of expressions is checked dynamically by the semantic rules if necessary (for example the expression that is a condition of a loop or conditional should evaluate to a boolean value).
In C11, sequentially consistent concurrent atomic operations are often described by an equivalent sequential C program that is supposed to be atomically executed. In our Frama-C plugin, such operations are specified using ACSL and their calls placed into atomic sections. In the small imperative parallel language presented above, we could use the same technique: implement atomic operations as their sequential counterparts and put their calls into atomic blocks. For example, we illustrate the atomic transfer of the value of a global variable to another one in Figure 1. It is composed of two instructions that are executed in a single atomic step. The resulting simulating code will be commented later.
In our case studies, the concurrent C programs do not need to know the number of threads, and actually do not depend on the number of threads except for one specific feature: global variables that are thread local. Variables of this kind are in shared memory, but each thread has its own independent copy. This is particularly useful to have thread-dedicated copies of global variables such as errno. In our memory model, this would mean that the number of memory locations called errno depends on the number of threads, whereas the set of allocated memory locations of a program does not depend on the number of threads.
If we want to model a procedure f that uses a thread local variable tlv we can define in our parallel language a procedure that takes an additional argument and use, for each thread, a different main procedure calling with a specific allocated memory location passed to argument .
However, the set of allocated memory locations (as well as the number of different main procedures) does not depend on the number of running threads. One can then imagine an extended parallel language which could contain symbolic names for thread local variables, together with a pre-processor that, for a specific value of , would generate programs of the proposed parallel language (generating as many memory locations and main procedures as necessary). As the transformation presented in Section 3 from the proposed parallel language to the proposed sequential language also depends on , we do not consider this aspect to be a limitation of our modelling approach. These modelling choices allow us to keep both languages simple and representative.
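Such a pre-processor could look like the sketch below. It is only an illustration under assumptions: the function `instantiate`, the naming scheme `errno_0`, `main_0`, and the representation of a main procedure as a pair (name, call) are hypothetical and do not come from the paper or from the conc2seq plugin.

```python
def instantiate(n_threads, tlv_names, body_proc="f"):
    """For a fixed number of threads, generate one memory location per
    thread-local variable and per thread, plus one main procedure per
    thread that calls `body_proc` with that thread's own locations."""
    mem = []      # allocated memory: list of (location name, size)
    mains = {}    # thread identifier -> (main procedure name, call performed)
    for tid in range(n_threads):
        args = []
        for name in tlv_names:
            loc = f"{name}_{tid}"          # the copy owned by thread `tid`
            mem.append((loc, 1))
            args.append(loc)
        mains[tid] = (f"main_{tid}", (body_proc, tuple(args)))
    return mem, mains

# Two threads, one thread-local variable named errno.
mem, mains = instantiate(2, ["errno"])
```

Once the number of threads is fixed, the generated program has a fixed set of locations and main procedures, so it fits the parallel language as defined.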
For a sequential program, or a thread, the local environment is a partial function from local variables to values: . The set of local environments is written . denotes the empty environment, i.e. the function undefined everywhere.
For both the sequential and the parallel languages, a heap is a partial function from memory locations that returns a partial function from indices to values, thus essentially defining an array indexed from . is the set of heaps. For a defined memory location, the associated partial function is defined continuously for indices from to a fixed size.
A local execution context is composed of the name of the procedure being executed, a local environment and the code that remains to execute. The set of local execution contexts is . A call stack is defined as a sequence (stack) of local execution contexts: .
The states of sequential and parallel programs are respectively:
For a parallel state , we denote by the first component of the state, i.e. the mapping from thread identifiers to stacks of local execution contexts. We omit the index when it is clear from the context.
Initial contexts and states.
The initial execution stack is for a sequential program. For a parallel program, the initial context of a thread is . For a sequential program, an initial state is thus: . For a parallel program, an initial state is where .
An initial heap should satisfy the memory allocation defined by a sequential program, i.e. if then is defined for all . In addition, the values contained in such a memory location cannot be themselves memory locations (but they can be any other values). The same constraints hold for an initial heap of a parallel program.
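The two constraints on an initial heap can be checked as in the following sketch. The encoding is an assumption made for illustration: a heap is a dictionary from location names to Python lists, the allocated memory is a list of (name, size) pairs, and the marker class `Loc` stands for values that are memory locations.

```python
class Loc:
    """Marker for values that are memory locations (hypothetical encoding)."""
    def __init__(self, name):
        self.name = name

def is_initial_heap(mem, heap):
    """Check that `heap` satisfies the allocation `mem` and that no
    initial cell contains a memory location."""
    for loc, size in mem:
        cells = heap.get(loc)
        if cells is None or len(cells) != size:
            return False                   # location missing or wrongly sized
        if any(isinstance(v, Loc) for v in cells):
            return False                   # initial values must not be locations
    return True

mem = [("a", 2), ("flag", 1)]
good = {"a": [0, 41], "flag": [True]}
short = {"a": [0], "flag": [True]}                  # "a" lacks index 1
aliasing = {"a": [0, Loc("flag")], "flag": [True]}  # a cell holds a location
```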
Final states and safe execution
The final state of a sequential program is such that and the final state of a parallel program is such that with .
We define a blocking state as a non-final state reached from an initial state such that no semantic rule can make the execution progress. A safe program is a program that does not reach a blocking state from any initial state. In particular, a safe program can have non-terminating executions.
The sequential programs produce 5 basic actions: silent action, procedure call, procedure return, memory reading, memory writing. For parallel programs, the atomic block structure requires an action list as a possible action:
Execution traces are action lists for sequential programs and lists of events, i.e. pairs of thread identifier and action, for parallel programs.
The operational semantics of sequential programs is defined in Figure 2 (rules for loops and conditionals are omitted, see ). A judgement of the sequential semantics has the following form: , meaning that a new state is reached from the state and this execution step produces an action . is a program definition. We write for the reflexive and transitive closure of the relation defined by the inference system of Figure 2.
We use the following notations: is the concatenation of two sequences/lists. To add an element on top (i.e. on the left) of a sequence, we use the separator “” for sequences of instructions, and the separator “” for sequences of local contexts (stacks). is the length of the sequence . We write to denote that is an element of the sequence , and by abuse of notation, that is a component of a tuple in the list of tuples . is the function such that and for all elements different from , we have . For two sequences and of equal length, we write instead of . Thus denotes an update of variable with value in environment while denotes an update at offset of memory location with value in heap . When it is the empty environment that is updated, we omit it.
corresponds to the evaluation of expression in local environment . We omit the definition of this evaluation that is classic. For example for a variable , .
This semantics is rather usual, but condition in rule forbids recursive procedure calls. Moreover there is a special procedure call: . This is the only non-deterministic rule of the sequential language. It selects randomly a value between and (excluded), such that is a memory location which is defined at index and contains a value different from (reserved for terminated threads). The memory location is updated with this value . This procedure call will be used in the simulation to model the change of current thread. Note that this procedure is not supposed to be called in parallel programs.
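The behaviour of this special procedure call can be sketched as below. The sketch rests on assumptions: the heap is a dictionary of lists, the names `pct` and `ptid` for the program-counter array and the current-thread cell are hypothetical, and the value reserved for terminated threads is taken to be `0`.

```python
import random

TERMINATED = 0   # assumption: value reserved for terminated threads

def select(heap, ptid, pct, n_threads, rng=random):
    """Non-deterministically pick a thread whose program counter is
    defined and not TERMINATED, and store it in heap[ptid][0]."""
    candidates = [
        t for t in range(n_threads)
        if t < len(heap[pct]) and heap[pct][t] != TERMINATED
    ]
    if not candidates:
        # No premise of the rule holds: the execution is stuck.
        raise RuntimeError("no runnable thread")
    heap[ptid][0] = rng.choice(candidates)
    return heap[ptid][0]

heap = {"pct": [0, 5, 0], "ptid": [0]}
chosen = select(heap, "ptid", "pct", 3)   # only thread 1 is still active
```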
Figure 3 presents the semantics of parallel programs. A judgement of this semantics has the following form: , where we recall that is a strictly positive number of threads.
A thread is selected such that and has code to execute. If the first instruction of is not an atomic block, then the state is reduced using the semantics of the sequential language. In this case the whole shared heap is given as the heap of the sequential reduction. The action of the sequential reduction is combined to the thread identifier to form the event of the parallel reduction.
If the first instruction of is an atomic block, then we use the sequential semantics to reduce the whole block. As we reduce the whole instruction sequence without allowing for a change of thread, the execution of this sequence is indeed atomic. The nesting of atomic blocks is not allowed: our semantics would be stuck in this case.
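The two cases of the parallel semantics can be illustrated with a small executable sketch. It is a simplification under assumptions: instructions are Python closures over a per-thread environment, an atomic block is encoded as the tuple `("atomic", [ops])`, and the helpers `read`, `write`, `par_step` and `run` are hypothetical names.

```python
def read(env, var, loc, idx):
    def op(heap):                       # var := loc[idx]
        env[var] = heap[loc][idx]
        return ("read", loc, idx)
    return op

def write(env, loc, idx, var):
    def op(heap):                       # loc[idx] := var
        heap[loc][idx] = env[var]
        return ("write", loc, idx)
    return op

def par_step(threads, heap, tid):
    """One parallel step of thread `tid`; returns the event (tid, action)."""
    instr = threads[tid].pop(0)
    if isinstance(instr, tuple) and instr[0] == "atomic":
        # The whole block is reduced without any thread switch, and the
        # action of the step is the list of actions of the block.
        return (tid, [op(heap) for op in instr[1]])
    return (tid, instr(heap))           # ordinary sequential step

def run(schedule):
    heap = {"a": [1], "b": [0]}
    env0, env1 = {}, {"v": 7}
    threads = {
        0: [("atomic", [read(env0, "t", "a", 0), write(env0, "b", 0, "t")])],
        1: [write(env1, "a", 0, "v")],
    }
    trace = [par_step(threads, heap, tid) for tid in schedule]
    return heap, trace

heap01, trace01 = run([0, 1])   # transfer first, then thread 1 writes a
heap10, trace10 = run([1, 0])   # thread 1 writes a, then transfer
```

Because the atomic transfer is a single step, only these two schedules exist: thread 1 can never run between the read of `a` and the write of `b`.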
3 Program Transformation
Let us consider a parallel program . The memory of the simulating sequential program contains: , a fresh memory location of size , a fresh memory location of size , for each procedure a fresh memory location of size (with ). will be shared by the threads. The array contains for each thread identifier (therefore at index ) the simulation of the program counter of the thread identified by , while contains the identifier of the current running thread. is used to manage the return of calls to in the simulating code.
The three assignment instructions are supposed to be atomic. For loops and conditionals, the evaluation of the condition is supposed to be atomic. The transformation essentially translates each atomic instruction of each procedure of the parallel program into one procedure of the simulating sequential program. This procedure has a parameter that is supposed to be the identifier of the active thread running the instruction. In the remainder of the paper, variables written is - are fresh variables not used in the input parallel program, but that we need to implement in the simulating sequential program, such as .
We assume that the input parallel program is labeled: each instruction is labeled by two values of ( is a label that indicates termination), such that the first one, denoted , is a unique label in the program definition, and the second one, denoted , is the label of the instruction that follows the current instruction in the program text (for example the label of the next instruction of a conditional is the label of the instruction that follows the conditional, not the label of one of the branches). We write for such a labeled instruction. One important point is that the label of the last instruction of each procedure is a label distinct from all the labels in the program. is a function that returns the label of the first instruction of the body of procedure . returns the label of the last instruction of the procedure body. If the body is empty, both functions return a label distinct from all other labels in the program.
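A labeling of this shape can be sketched as follows for plain instructions and conditionals (loops are left out to avoid guessing their next label). The encoding is hypothetical: plain instructions are strings, a conditional is the tuple `("if", cond, then_block, else_block)`, and a labeled instruction is a triple (label, next label, instruction).

```python
import itertools

def label_block(block, after, fresh):
    """Label every instruction of `block`. `after` is the label of
    whatever follows the whole block in the program text."""
    labels = [next(fresh) for _ in block]        # unique label per instruction
    out = []
    for i, instr in enumerate(block):
        nxt = labels[i + 1] if i + 1 < len(block) else after
        if isinstance(instr, tuple) and instr[0] == "if":
            _, cond, then_b, else_b = instr
            # The last instruction of each branch continues after the conditional.
            instr = ("if", cond,
                     label_block(then_b, nxt, fresh),
                     label_block(else_b, nxt, fresh))
        out.append((labels[i], nxt, instr))
    return out

def label_procedure(body, end_label, start=1):
    """`end_label` must be distinct from every label generated from `start`."""
    return label_block(body, end_label, itertools.count(start))

labeled = label_procedure(["A", ("if", "c", ["B"], ["C"]), "D"], "end_f")
```

Here the conditional gets label 2 and next label 3 (the label of `D`), and both `B` and `C` also get next label 3, matching the convention described above.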
For each local variable of the program (uniquely identified by the name of the procedure in which it appears and its name ), including procedure formal parameters, we need a fresh memory location of allocated size (we omit in the remainder of the paper), so that each simulated thread has a copy of what was a local variable in the parallel program.
We detail how the transformation proceeds on an example instruction: . This instruction will be transformed into a procedure named with parameter (we assume a coercion from to , and we omit it most of the time). is simulated by the array . As reads from the heap are not allowed in expressions, in the simulated code we first need to read the value from . We write this sequence of instructions defined as . Note that after this sequence of instructions, variable is defined, therefore the original expression can be used as is. The original assignment however should be translated too as is simulated by an array . We translate it to: . Finally we update the program counter of the running thread, so the full translation of the instruction is:
The generalization to an arbitrary is just that we “load” all the variables of before using . Reading from the heap and writing to the heap are translated in a very similar way. Figure 1 provides a more complex example with the simulating code of the atomic memory transfer.
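The translation of a local assignment can be mimicked by the sketch below, which builds a simulating procedure as a Python function. Everything here is an illustrative assumption: the heap is a dictionary of lists, the array simulating local variable `x` of procedure `f` is stored under the key `("f", "x")`, the program-counter array is stored under `"pct"`, and the expression is given as a Python function over the loaded variables.

```python
def make_assign_proc(proc, x, free_vars, expr, next_label):
    """Build the simulating procedure for the instruction `x := e` of
    procedure `proc`, where `free_vars` are the local variables of e."""
    def sim(heap, tid):
        # 1. Load every local variable of e from its simulating array.
        env = {v: heap[(proc, v)][tid] for v in free_vars}
        # 2. Evaluate e as is and store the result in the array simulating x.
        heap[(proc, x)][tid] = expr(env)
        # 3. Move the program counter of the running thread to the next label.
        heap["pct"][tid] = next_label
    return sim

# Simulating `x := y + 1` in procedure f, labeled 3 with next label 4.
heap = {("f", "x"): [0, 0], ("f", "y"): [10, 20], "pct": [3, 3]}
sim_3 = make_assign_proc("f", "x", ["y"], lambda env: env["y"] + 1, 4)
sim_3(heap, 1)    # thread 1 executes the instruction
```

Only index 1 of each array is touched, so the copy of the variables belonging to thread 0 is left unchanged.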
Both conditional and loops are translated into a procedure that evaluates the condition and then updates the program counter to the appropriate label. For example, if the condition of a conditional is true then the program counter is updated to the label of the first instruction of the “then” branch of the original conditional, if this branch is non-empty, otherwise the label used is the label of the instruction that follows the original conditional.
Each procedure call is translated into one procedure that passes the values to parameters and updates the program counter to the first instruction of the body of the original procedure (label for a call to ). Also for each procedure we generate an additional procedure, named , that manages the return of calls to . This procedure should be able to update the program counter to the instruction that follows the call. To be able to do so for any call, this return procedure should use a label previously stored at memory location by the generated procedure that prepares the call:
One procedure is generated for each atomic block. Each instruction in the block is generated in a similar way as previously described but no update to the program counter is done, conditionals and loops keep their structure and their blocks are recursively translated in the atomic fashion. Procedure calls are inlined and the body of the called procedure is translated in the atomic fashion. It is necessary that procedures are not recursive for this inlining transformation to terminate.
Finally the main procedure of the simulating sequential program, named , is generated (Figure 4). It has basically two parts: in the first part (denoted by ) each program counter is updated to the identifier of the first instruction of the main procedure of the considered thread. places the value at location to to stop the execution when the main procedure ends. also initializes the local variable , that indicates if all threads are terminated, to . We suppose that there is at least one thread with a main procedure to execute. If it were not the case, we would initialize it to . The second part is the main simulating loop: if there are still threads to run, a thread identifier of an active thread is chosen (call to , instruction named ), then the value of the program counter for this thread is read and a switch (it is implemented as nested conditionals, we use it here for the ease of presentation) calls the appropriate procedure named (sequence of instructions named ). The body of this loop ends by updating the flag that indicates if there are still running threads (sequence of instructions named ).
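The shape of the main simulating loop can be sketched as follows. This is a minimal executable model, not the generated code of Figure 4: the heap keys `pct`, `ptid`, `tmp`, `counter`, the termination label `0`, and the `choose` parameter (which stands in for the non-deterministic choice) are assumptions, and the initialization part is replaced by a ready-made heap.

```python
TERMINATED = 0   # assumption: label stored in pct for terminated threads

def run_simulation(heap, procs, choose):
    """Interleaving loop: choose an active thread, dispatch on its
    program counter, then recompute the termination flag."""
    n = len(heap["pct"])
    terminated = all(pc == TERMINATED for pc in heap["pct"])
    while not terminated:
        active = [t for t in range(n) if heap["pct"][t] != TERMINATED]
        heap["ptid"][0] = choose(active)           # models the thread choice
        tid = heap["ptid"][0]
        procs[heap["pct"][tid]](heap, tid)         # "switch" on the program counter
        terminated = all(pc == TERMINATED for pc in heap["pct"])
    return heap

def sim_read(heap, tid):       # label 1: tmp := counter[0]
    heap["tmp"][tid] = heap["counter"][0]
    heap["pct"][tid] = 2

def sim_write(heap, tid):      # label 2: counter[0] := tmp + 1
    heap["counter"][0] = heap["tmp"][tid] + 1
    heap["pct"][tid] = TERMINATED

PROCS = {1: sim_read, 2: sim_write}

def fresh_heap():
    # Two threads, both about to execute the instruction labeled 1.
    return {"counter": [0], "tmp": [0, 0], "pct": [1, 1], "ptid": [0]}

# Schedule 1: always run the first active thread (no interleaving).
serial = run_simulation(fresh_heap(), PROCS, lambda active: active[0])

# Schedule 2: strict alternation 0, 1, 0, 1 (exhibits the lost update).
order = iter([0, 1, 0, 1])
alternating = run_simulation(fresh_heap(), PROCS, lambda active: next(order))
```

Different choices of thread reproduce different interleavings of the original threads: the serial schedule ends with the counter at 2, while the alternating one loses an update and ends at 1.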
To state the correctness theorem, we need two notions of equivalence: state equivalence, relating states of the input parallel program and states of the simulating sequential program, and trace equivalence that relates traces generated by the input parallel program to traces generated by the simulating sequential program.
4.1 Equivalence of states and traces
We note the sequential program state of the simulation of a safe parallel program in a state . In , we distinguish two disjoint parts that replicates and the addresses that simulate the local variables of . This second part also includes , , and the addresses . The syntax allows to partially apply for the thread to select the part that simulates . So the function is . We define state equivalence as follows:
(1) expresses the fact that the original heap should be a sub-part of the simulating heap. For each thread , (2) relates the content of every local variable of to the content of the global array in that simulates it.
Program counters must be correctly modeled: (3a) and (3b) express that each program counter must point to the next instruction to execute by thread if any (3a), if not (3b). Call stacks must be correctly modeled by (4). We refer to  for the formal definitions of next, which returns the label of the next instruction to execute in a non-empty local execution context, and wf_stack, which relates the call stacks of the parallel state with the labels at memory locations . Finally in condition (5), the equivalence is defined for simulating program states such that the next step to perform is the evaluation of the condition of the loop, since the simulation of an instruction is the execution of this evaluation followed by the body of the loop.
The equivalence of traces is defined on filtered lists of actions generated by the semantics. In the simulating program executions, we ignore -actions and memory operations in . We ignore all calls to and returns from simulating procedures except for calls to , and procedures that simulate the start of a call and the return of a call.
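The filtering applied to the traces of the simulating program can be sketched as below. The encoding is an assumption: actions are tuples such as `("tau",)`, `("read", loc, idx)`, `("call", name)`, the set `sim_locs` stands for the locations that exist only in the simulation, and `visible_procs` stands for the procedures whose calls and returns are kept.

```python
def filter_sim_trace(trace, sim_locs, visible_procs):
    """Keep only the actions that are compared with the parallel trace."""
    kept = []
    for action in trace:
        kind = action[0]
        if kind == "tau":
            continue                                  # silent actions
        if kind in ("read", "write") and action[1] in sim_locs:
            continue                                  # simulation-only memory
        if kind in ("call", "return") and action[1] not in visible_procs:
            continue                                  # plain simulating procedures
        kept.append(action)
    return kept

trace = [
    ("call", "select"), ("write", "ptid", 0), ("read", "pct", 0),
    ("call", "sim_3"), ("read", "g", 0), ("write", "f_x", 0),
    ("write", "pct", 0), ("return", "sim_3"), ("tau",),
]
filtered = filter_sim_trace(trace, {"ptid", "pct", "f_x"}, {"select"})
```

After filtering, one simulated step leaves the thread choice and the accesses to the original heap, which is the information carried by an event of the parallel trace.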
4.2 Correctness of the simulation
Theorem 1 (Correct simulation).
Let be a safe parallel program, its simulating program, (resp. ) an initial state of (resp. ).
From , we can reach, by the initialization sequence , equivalent to .
For all reachable from , there exists an equivalent reachable from with an equivalent trace (Forward simulation).
For all reachable from , there exists an equivalent reachable from with an equivalent trace (Backward simulation).
The proof of this theorem relies on two main observations. First, the parallel semantics is deterministic except for the choice of the thread, which is not an operation of the program. Correspondingly, the only non-deterministic operation of the simulation is the call to , which models the non-deterministic behavior of the parallel semantics. Second, once the parallel semantics has selected a thread, the reduction is delegated to the sequential semantics, which is deterministic. The corresponding simulating code, which resolves the program counter and executes the simulating procedure, is also deterministic. Now, if we prove the forward simulation for a transformation and the resulting code is deterministic, then we also prove the backward simulation, as pointed out by [14, Def 5.]. More detailed proofs of Theorem 1 can be found in the report.
We show that the initialization establishes the equivalence (1) by induction on traces. For the forward simulation (2), the induction is on the instructions; for the backward simulation (3), on the number of iterations of the interleaving loop.
An initial state of the simulation is:
As we suppose (by construction) that initially, and that contains correctly allocated simulation blocks for local variables and by the definition of a parallel initial state: such that the parts (1) and (2) of the equivalence are verified. The idea is then to show that the execution of correctly establishes (3a), (3b), (4) and (5). In , we first move each program counter to the first instruction of each main procedure, ensuring (3a) and (3b), and then initialize the address of each of these main procedures to ensure (4) (the base of the stack is correctly modeled). Finally, we initialize terminated to false, since each thread must, at least, return from its main, ensuring (5). We have reached a state equivalent to .
Lemma (Forward simulation on a single step).
Let be a safe parallel program and its simulating program, a parallel state that reaches with an event , equivalent to , there exists a trace equivalent to that allows to reach equivalent to .
Sketch of proof.
By the equivalence relation, we know that is of the form:
We then perform the reduction . It generates an action (the first action of ) that places at memory location , being an allowed choice for since .
At this step, we perform a case analysis depending on the executed instruction and prove that the execution reaches a state where the parts (1) to (4) of the equivalence are verified. Then, the execution of updates the variable by successively comparing the program counters to . As we maintained (3a) and (3b), we reach a state such that (5) is verified:
Moreover, actions generated during this loop are reads in and -actions (that are filtered). We reach, from equivalent to , a state equivalent to , with a trace equivalent to . ∎
Lemma (Backward simulation on a single step).
Let be the simulating program of a safe parallel program , a sequential state that reaches with a trace , such that does not contain a call action to , equivalent to , there exists an action such that is equivalent to that allows to reach equivalent to .
Sketch of proof.
Starting from , the simulation builds a trace so the condition is evaluated to (else we would not execute the loop, and the first action of the trace would not be realized). We also know that there exists