Modular deductive verification [DBLP:journals/cacm/Hoare69] allows the user to prove that a function respects its formal specification. More precisely, for a given function $f$, any individual call to $f$ can be proved to respect the contract of $f$, which is essentially an implication: if the given precondition is true before the call and the call terminates, the given postcondition is true after it. (Termination can be assumed, giving partial correctness, or proved separately in a well-known way [Floyd1967], giving full correctness; for the purpose of this paper we can assume it.) However, some kinds of properties are not easily reducible to a single function call. Indeed, it is frequently necessary to express a property that involves several functions or relates the results of several calls to the same function for different arguments. Such properties are known as relational properties [DBLP:conf/popl/Benton04].
Examples of such relational properties include monotonicity (i.e. $\forall x, y.\ x < y \Rightarrow f(x) < f(y)$), involving 2 calls, or transitivity ($\forall x, y, z.\ \mathit{cmp}(x,y) \geq 0 \wedge \mathit{cmp}(y,z) \geq 0 \Rightarrow \mathit{cmp}(x,z) \geq 0$), involving 3 calls. In secure information flow [DBLP:journals/mscs/BartheDR11], non-interference is also a relational property. Namely, given a partition of program variables between high-security variables and low-security variables, a program is said to be non-interferent if any two executions starting from states in which the low-security variables have the same initial values end in final states where the low-security variables have the same values. In other words, high-security variables cannot interfere with low-security ones.
Relational properties can also relate calls to different functions. For instance, in the verification of voting rules [BeckertBormerEA2016], relational properties are used for defining specific properties. Notably, applying the voting rule to a sequence of ballots and a permutation of the same sequence of ballots must lead to the same result, i.e. the order in which the ballots are passed to the voting function should not have any impact on the outcome.
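Figure 1: Two C programs performing a swap (left) and their composition for self-composition (right).
Left, first program: x3 = *x1; *x1 = *x2; *x2 = x3;
Left, second program: *x1 = *x1 + *x2; *x2 = *x1 - *x2; *x1 = *x1 - *x2;
Right, composed program: x3_1 = *x1_1; *x1_1 = *x2_1; *x2_1 = x3_1; *x1_2 = *x1_2 + *x2_2; *x2_2 = *x1_2 - *x2_2; *x1_2 = *x1_2 - *x2_2;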
The lack of support for relational properties in verification tools has already been encountered by industrial users (e.g. in [BishopBC13] for C programs). The usual way to deal with this limitation is to use self-composition [DBLP:journals/mscs/BartheDR11, DBLP:conf/fm/SchebenS14, blatterKGP17], product programs [DBLP:conf/fm/BartheCK11] or other optimizations of self-composition [ShemerCAV2021]. Those techniques are based on code transformations that are relatively tedious and error-prone. Moreover, they are hardly applicable in practice to real-life programs with pointers, as in C. Namely, self-composition requires that the compared executions operate on completely separated (i.e. disjoint) memory areas, which might be extremely difficult to ensure for complex programs with pointers.
Example 1 (Motivating Example)
Figure 1 shows two simple C programs that swap the values referred to by pointers x1 and x2 (of type int*). The first program uses an auxiliary variable x3 (of type int), while the second performs an in-place swap using arithmetic operations. As usual in such cases, each of these programs works correctly only under some separation hypotheses: pointers x1 and x2 should be separated (that is, point to disjoint memory locations) and must not point to x1 or x2 themselves or, for the first program, to x3.
Consider a relational property stating that both programs, executed from two states in which each of *x1 and *x2 has the same value, will end up in two states also having the same values in these locations. To prove this relational property using self-composition, one typically has to generate a new C program (see Fig. 1) composing the two programs. To avoid name conflicts, we rename their variables by adding, resp., suffixes “_1” and “_2”. The relational property is then expressed by a contract of the composed program with a precondition $P$ and a postcondition $Q$ (both defined in detail in Appendix A). Obviously, both $P$ and $Q$ must include the equalities *x1_1==*x1_2 and *x2_1==*x2_2, and $P$ must also require the aforementioned separation hypotheses necessary for each function. But for programs with pointers and aliasing, this is not sufficient: the user also has to specify additional separation hypotheses between variables coming from the different programs, that is, in our example, that each of x1_1 and x2_1 is separated from each of x1_2 and x2_2. Without such hypotheses, a deductive verification tool cannot show, for example, that a modification of *x1_1 does not impact *x1_2 in the composed program, and is thus unable to deduce the required property. For real-life programs, such separation hypotheses can be hard to specify or generate. ∎
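The role of the cross-program separation hypotheses can be illustrated with a small simulation. The following is only a sketch, not part of the paper's development: memory is modelled as a Python list indexed by addresses, and the address layout and the function names (`swap_tmp`, `swap_arith`, `run_composed`) are assumptions made for illustration.

```python
def swap_tmp(mem, x1, x2, x3):
    """First program: x3 = *x1; *x1 = *x2; *x2 = x3; (x1, x2, x3 are addresses)."""
    mem[x3] = mem[mem[x1]]
    mem[mem[x1]] = mem[mem[x2]]
    mem[mem[x2]] = mem[x3]


def swap_arith(mem, x1, x2):
    """Second program: in-place swap of *x1 and *x2 using arithmetic."""
    mem[mem[x1]] = mem[mem[x1]] + mem[mem[x2]]
    mem[mem[x2]] = mem[mem[x1]] - mem[mem[x2]]
    mem[mem[x1]] = mem[mem[x1]] - mem[mem[x2]]


def run_composed(mem):
    """Self-composition: both programs run in ONE shared memory.

    Assumed layout: x1_1 at 0, x2_1 at 1, x3_1 at 2, x1_2 at 3, x2_2 at 4,
    data cells from address 5 on.
    """
    swap_tmp(mem, 0, 1, 2)
    swap_arith(mem, 3, 4)
    # Return (*x1_1, *x2_1) and (*x1_2, *x2_2) in the final memory.
    return (mem[mem[0]], mem[mem[1]]), (mem[mem[3]], mem[mem[4]])


# Pointers of the two copies target disjoint cells: 5, 6 versus 7, 8.
separated = [5, 6, 0, 7, 8, 5, 9, 5, 9]
# x1_2 aliases x1_1 (both point to cell 5); the value equalities still hold initially.
aliased = [5, 6, 0, 5, 8, 5, 9, 5, 9]

first_sep, second_sep = run_composed(list(separated))
first_ali, second_ali = run_composed(list(aliased))
print("separated:", first_sep, second_sep)
print("aliased:  ", first_ali, second_ali)
```

In the separated layout both copies end with the same pair of values, whereas in the aliased layout the equalities of the precondition hold initially but the final values differ, which is why the contract of the composed program must state the extra separation hypotheses.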
This paper proposes an alternative approach that is not based on code transformation or relational rules. It directly uses a verification condition generator (VCGen) to produce logical formulas to be verified (typically, with an automated prover) to ensure a given relational property. It requires no extra code processing (such as sequential composition of programs or variable renaming). Moreover, no separation hypotheses are required beyond those that are needed anyway for each function to work. The locations of each program are separated by construction: each program has its own memory state. The language considered in this work was chosen as a minimal language representative of the main issues relevant for relational property verification: it is a standard While language enriched with annotations, procedures and pointers (see the two swap programs in Fig. 1 for examples). Notably, the presence of dereferencing and address-of operations makes it representative of various aliasing problems with (possibly multiple) pointer dereferences of a real-life language like C. We formalize the proposed approach and prove its soundness in the Coq proof assistant [Coq]. The full development, which contains about 3400 lines, is available at https://bit.ly/3FJhE41.
The contributions of this paper include:
a formalization and proof of soundness of recursive Hoare triple verification with a verification condition generator on a representative language with procedures and aliasing;
a novel method for verifying relational properties using a verification condition generator, without relying on code transformation (such as self-composition) or making additional separation hypotheses in case of aliasing;
a formalization and proof of soundness of the proposed method of relational property verification for the considered language.
Section 2 introduces an imperative language used in this work. Functional correctness is defined in Section 3, and relational properties in Section 4. Then, we prove the soundness of a VCGen in Section 5, and show how it can be soundly extended to verify relational properties in Section 6. Finally, we present related work in Section 7 and concluding remarks in Section 8.
2 Syntax and Semantics of the Language
2.1 Notation for Locations, States, and Procedure Contracts
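We denote by $\mathbb{N} = \{0, 1, 2, \dots\}$ the set of natural numbers, by $\mathbb{N}^* = \{1, 2, \dots\}$ the set of nonzero natural numbers, and by $\mathbb{B} = \{\mathit{True}, \mathit{False}\}$ the set of Boolean values. Let $\mathbb{X}$ be the set of program locations and $\mathbb{Y}$ the set of program (procedure) names, and let $x, x', x_1, \dots$ and $y, y', y_1, \dots$ denote metavariables ranging over those respective sets. We assume that there exists a bijective function $\mathbb{N} \to \mathbb{X}$, so that $\mathbb{X} = \{x_i \mid i \in \mathbb{N}\}$. Intuitively, we can see $i$ as the address of location $x_i$.
Let $\Sigma$ be the set of functions $\mathbb{N} \to \mathbb{N}$, called memory states, and let $\sigma, \sigma', \sigma_1, \dots$ denote metavariables ranging over the set. A state $\sigma$ maps a location to a value using its address: location $x_i$ has value $\sigma(i)$.
We define the update operation of a memory state $\mathit{set}(\sigma, i, n)$, also denoted by $\sigma[i/n]$, as the memory state mapping each address to the same value as $\sigma$, except for $i$, bound to $n$. Formally, $\mathit{set}(\sigma, i, n)$ is defined by the following rules:
$\forall \sigma, i, n, j.\ i = j \Rightarrow \sigma[i/n](j) = n$, (1)
$\forall \sigma, i, n, j.\ i \neq j \Rightarrow \sigma[i/n](j) = \sigma(j)$. (2)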
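The update operation can be pictured as a purely functional operation on states. The sketch below is illustrative only: it models a memory state as a Python function from addresses to naturals, and the name `update` is an assumption rather than an identifier from the development.

```python
def update(sigma, i, n):
    """Return the state that agrees with sigma on every address except i, bound to n."""
    return lambda j: n if j == i else sigma(j)


# A state where every address holds 0, then address 3 is bound to 2.
zero_state = lambda j: 0
sigma = update(zero_state, 3, 2)
print(sigma(3), sigma(4), zero_state(3))
```

The original state is left unchanged, which matches the reading of an update as a new memory state rather than an in-place modification.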
Let $\Psi$ be the set of functions $\mathbb{Y} \to \mathbb{C}$, called procedure environments, mapping program names to commands (defined below), and let $\psi, \psi_1, \dots$ denote metavariables ranging over $\Psi$. We write $\mathit{body}_\psi(y)$ to refer to $\psi(y)$, the commands (or body) of procedure $y$ for a given procedure environment $\psi$.
Assertions are predicates of arity one, taking as parameter a memory state and returning an equational first-order logic formula. Let metavariables $P, Q, \dots$ range over the set of assertions. For instance, using $\lambda$-notation, the assertion stating that location $x_3$ is bound to $2$ can be defined by $\lambda \sigma.\ \sigma(3) = 2$. This form will be more convenient for relational properties (than e.g. $x_3 = 2$) as it makes explicit the memory states on which a property is evaluated.
Finally, we define the set of contract environments $\Phi$, and metavariables $\phi, \phi_1, \dots$ to range over $\Phi$. More precisely, $\phi$ maps a procedure name $y$ to the associated (procedure) contract $\phi(y)$, composed of a precondition $\mathit{pre}_\phi(y)$ and a postcondition $\mathit{post}_\phi(y)$ for procedure $y$. As usual, a procedure contract will allow us to specify the behavior of a single procedure call, that is, if we start executing $y$ in a memory state satisfying $\mathit{pre}_\phi(y)$, and the evaluation terminates, the final state satisfies $\mathit{post}_\phi(y)$.
2.2 Syntax for Expressions and Commands
Let $\mathbb{E}_a$, $\mathbb{E}_b$ and $\mathbb{C}$ denote respectively the sets of arithmetic expressions, Boolean expressions and commands. We denote by $a, a_1, \dots$; $b, b_1, \dots$ and $c, c_1, \dots$ metavariables ranging, respectively, over those sets. The syntax of arithmetic and Boolean expressions is given in Fig. 2. Constants are natural numbers or Boolean values. Expressions use standard arithmetic, comparison and logic binary operators, denoted respectively $\mathit{op}_a$, $\mathit{op}_b$, $\mathit{op}_l$. Since we use natural values, the subtraction is bounded by 0, as in Coq: if $n' > n$, the result of $n - n'$ is considered to be 0. Expressions also include locations, possibly with a dereference or address operator.
Figure 2 also presents the syntax of commands in $\mathbb{C}$. Sequences, skip and conditions are standard. An assignment can be done to a location directly or after a dereference. Recall that a location $x_i$ contains as a value a natural number, say $v$, that can be seen in turn as the address of a location, namely $x_v$, so the assignment $*x_i := a$ writes the value of expression $a$ to the location $x_v$, while the address operation $\&x_i$ computes the address $i$ of $x_i$. An assertion command indicates that an assertion $P$ should be valid at the point where the command occurs. The loop command is always annotated with an invariant $P$. As usual, this invariant should hold when we reach the command and be preserved by each loop step. Command $\mathbf{call}(y)$ is a procedure call. All annotations (assertions, loop invariants and procedure contracts) are ignored during the program execution and are relevant only for program verification in Section 5. Procedures do not have explicit parameters and return values (hence we use the term procedure call rather than function call). Instead, as in assembly code [Irvine:2014:ALX:2655333], parameters and return value(s) are shared implicitly between the caller and the callee through memory locations: the caller must put/read the right values at the right locations before/after the call. Finally, to avoid ambiguity, we group sequences of commands with braces.
Figure 3 shows an example of a command and a procedure environment where procedure points to a recursive command, called in . With the semantics of Sec. 2.3, from any initial state, the command will return a state in which . Procedure returns a state where if the initial state satisfies . This can be expressed by the contract environment given (in -notation) in Fig. 3. ∎
2.3 Operational Semantics
Evaluation of arithmetic and Boolean expressions in a memory state is defined by two evaluation functions, one for each kind of expression. Selected evaluation rules for arithmetic expressions are shown in Fig. 4. The dereference and address-of operations have a semantics similar to that of the C language. The semantics of Boolean expressions is standard [DBLP:books/daglib/0070910].
Based on these evaluation functions, we can define the operational semantics of commands in a given procedure environment. Selected evaluation rules are shown in Fig. 5 (full versions of Fig. 4 and 5 are given in Appendix B). As said above, both assertions and loop invariants can be seen as program annotations that do not influence the execution of the program itself. Hence, an assertion command is equivalent to a skip. Likewise, a loop invariant has no influence on the semantics of the loop.
We write $\langle c, \sigma \rangle \xrightarrow{\psi} \sigma'$ to denote that the execution of command $c$ from state $\sigma$ in procedure environment $\psi$ ends in state $\sigma'$, as can be derived from the rules of Fig. 5. Our formalization, inspired by [SF], provides a deep embedding of the language, with an associated parser, in files Aexp.v, Bexp.v and Com.v.
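To make the informal description of the semantics concrete, here is a minimal interpreter sketch. It is not the Coq formalization: the tuple encoding of expressions and commands, the tag names, and the functions `aeval`, `beval` and `run` are all assumptions chosen for illustration, and unbound addresses are assumed to hold 0.

```python
def aeval(a, s):
    """Evaluate arithmetic expression `a` in state `s` (dict: address -> natural)."""
    tag = a[0]
    if tag == "num":
        return a[1]
    if tag == "loc":      # x_i: value stored at address i
        return s.get(a[1], 0)
    if tag == "deref":    # *x_i: value stored at the address held by x_i
        return s.get(s.get(a[1], 0), 0)
    if tag == "addr":     # &x_i: the address i itself
        return a[1]
    left, right = aeval(a[1], s), aeval(a[2], s)
    if tag == "add":
        return left + right
    if tag == "sub":      # subtraction on naturals is bounded by 0
        return max(left - right, 0)
    if tag == "mul":
        return left * right
    raise ValueError(f"unknown arithmetic expression: {tag}")


def beval(b, s):
    """Evaluate Boolean expression `b` in state `s`."""
    tag = b[0]
    if tag == "bool":
        return b[1]
    if tag == "le":
        return aeval(b[1], s) <= aeval(b[2], s)
    if tag == "eq":
        return aeval(b[1], s) == aeval(b[2], s)
    if tag == "not":
        return not beval(b[1], s)
    if tag == "and":
        return beval(b[1], s) and beval(b[2], s)
    raise ValueError(f"unknown Boolean expression: {tag}")


def run(c, s, psi):
    """Execute command `c` from state `s` in procedure environment `psi`."""
    tag = c[0]
    if tag in ("skip", "assert"):   # annotations do not affect execution
        return s
    if tag == "assign":             # x_i := a
        return {**s, c[1]: aeval(c[2], s)}
    if tag == "store":              # *x_i := a
        return {**s, s.get(c[1], 0): aeval(c[2], s)}
    if tag == "seq":
        return run(c[2], run(c[1], s, psi), psi)
    if tag == "if":
        return run(c[2] if beval(c[1], s) else c[3], s, psi)
    if tag == "while":              # ("while", b, invariant, body); invariant ignored
        while beval(c[1], s):
            s = run(c[3], s, psi)
        return s
    if tag == "call":               # run the body bound to the procedure name
        return run(psi[c[1]], s, psi)
    raise ValueError(f"unknown command: {tag}")


# Hypothetical recursive procedure "sum": adds x_1 + (x_1 - 1) + ... + 1 to x_2.
positive = ("not", ("le", ("loc", 1), ("num", 0)))
body = ("if", positive,
        ("seq", ("assign", 2, ("add", ("loc", 2), ("loc", 1))),
         ("seq", ("assign", 1, ("sub", ("loc", 1), ("num", 1))), ("call", "sum"))),
        ("skip",))
psi = {"sum": body}
final = run(("call", "sum"), {1: 4, 2: 0}, psi)
print(final)
```

Parameters and results are exchanged only through memory locations, as described above: the caller sets locations 1 and 2 before the call and reads location 2 afterwards.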
3 Functional Correctness
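We define functional correctness in a similar way to the original Hoare triple definition [DBLP:journals/cacm/Hoare69], except that we also need a procedure environment $\psi$, leading to a quadruple denoted $\psi : \{P\}\, c\, \{Q\}$. We will however still refer by the term “Hoare triple” to the corresponding program property, formally defined as follows.
Definition 1 (Hoare triple)
Let $c$ be a command, $\psi$ a procedure environment, and $P$ and $Q$ two assertions. We define a Hoare triple as follows:
$\psi : \{P\}\, c\, \{Q\} \triangleq \forall \sigma, \sigma' \in \Sigma.\ P(\sigma) \wedge (\langle c, \sigma \rangle \xrightarrow{\psi} \sigma') \Rightarrow Q(\sigma')$,
where $\langle c, \sigma \rangle \xrightarrow{\psi} \sigma'$ is the evaluation judgment of Sec. 2.3.
Informally, our definition states that, for a given $\psi$, if a state $\sigma$ satisfies $P$ and the execution of $c$ on $\sigma$ terminates in a state $\sigma'$, then $\sigma'$ satisfies $Q$.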
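Next, we introduce notation $\mathcal{CV}(\psi, \phi)$ to denote the fact that, for the given $\phi$ and $\psi$, every procedure satisfies its contract.
Definition 2 (Contract Validity)
Let $\psi$ be a procedure environment and $\phi$ a contract environment. We define contract validity as follows:
$\mathcal{CV}(\psi, \phi) \triangleq \forall y \in \mathbb{Y}.\ \psi : \{\mathit{pre}_\phi(y)\}\, \mathbf{call}(y)\, \{\mathit{post}_\phi(y)\}$.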
The notion of contract validity is at the heart of modular verification, since it allows assuming that the contracts of the callees are satisfied during the verification of a Hoare triple. More precisely, to state the validity of procedure contracts without assuming anything about their bodies in our formalization, we will consider an arbitrary choice of implementations of procedures that satisfy the contracts, as in assumption (3) in Lemma 1. This technical lemma, taken from [DBLP:series/txcs/AptBO09, Equation (4.6)], gives an alternative criterion for validity of procedure contracts: if, under the assumption that the contracts in $\phi$ hold, we can prove for each procedure that its body satisfies its contract, then the contracts are valid.
Lemma 1 (Adequacy of contracts)
Given a procedure environment $\psi$ and a contract environment $\phi$ such that
$\forall \psi' \in \Psi.\ \mathcal{CV}(\psi', \phi) \Rightarrow \forall y \in \mathbb{Y}.\ \psi' : \{\mathit{pre}_\phi(y)\}\, \mathit{body}_\psi(y)\, \{\mathit{post}_\phi(y)\}$, (3)
where $\mathcal{CV}$ denotes contract validity (Def. 2), we have $\mathcal{CV}(\psi, \phi)$.
Any given terminating execution traverses a finite number of procedure calls (over all procedures) that can be replaced by inlining the bodies a sufficient number of times. We first formalize a theory of $k$-inliners (that inline procedure bodies a finite number of times and replace deeper calls by nonterminating loops) and prove their properties. Relying on this theory, the proof of the lemma proceeds by induction on the number of procedure inlinings. ∎
From that, we can establish the main result of this section. Theorem 3.1, taken from [DBLP:series/txcs/AptBO09, Th. 4.2], states that $\psi : \{P\}\, c\, \{Q\}$ holds if assumption (3) holds and if the validity of the contracts of $\phi$ for $\psi$ implies the Hoare triple itself. This theorem is the basis for modular verification of Hoare triples, as done for instance in Hoare logic [DBLP:journals/cacm/Hoare69, DBLP:books/daglib/0070910] or verification condition generation.
Theorem 3.1 (Recursion)
Given a procedure environment $\psi$ and a contract environment $\phi$ such that assumption (3) holds and $\mathcal{CV}(\psi, \phi) \Rightarrow \psi : \{P\}\, c\, \{Q\}$, we have $\psi : \{P\}\, c\, \{Q\}$.
By Lemma 1.∎
We refer the reader to the development, more precisely the results (including recursive_hoare_triple) in file Hoare_Triple.v, for complete proofs of Lemma 1 and Theorem 3.1 for the considered language. To the best of our knowledge, this is the first mechanized proof of these classical results in Coq.
An interesting corollary can be deduced from Theorem 3.1.
[Procedure Recursion] Given a procedure environment and a contract environment such that
4 Relational Properties
Relational properties can be seen as an extension of Hoare triples. But, instead of linking one program with two properties, the pre- and postconditions, relational properties link $n$ programs to two properties, called relational assertions. We define a relational assertion as a predicate taking a sequence of memory states and returning a first-order logic formula. We use metavariables $\hat{P}, \hat{Q}, \dots$ to range over the set of relational assertions. As a simple example of a relational assertion, we might say that two states bind location $x_3$ to the same value. This would be stated as follows: $\lambda \sigma_1, \sigma_2.\ \sigma_1(3) = \sigma_2(3)$.
A relational property is a property about $n$ programs $c_1, \dots, c_n$, stating that if each program $c_i$ starts in a state $\sigma_i$ and ends in a state $\sigma'_i$ such that $\hat{P}(\sigma_1, \dots, \sigma_n)$ holds, then $\hat{Q}(\sigma'_1, \dots, \sigma'_n)$ holds, where $\hat{P}$ and $\hat{Q}$ are relational assertions over $n$ memory states.
We formally define relational correctness similarly to functional correctness (cf. Def. 1), except that we now use sequences of memory states and commands of equal size. We denote by $(u_k)^n$ a sequence of elements $u_k$ where $k$ ranges from $1$ to $n$. If $n \leq 0$, $(u_k)^n$ is the empty sequence denoted $[\,]$.
Definition 3 (Relational Correctness)
Let $\psi$ be a procedure environment, $(c_k)^n$ a sequence of $n$ commands ($n \in \mathbb{N}^*$), and $\hat{P}$ and $\hat{Q}$ two relational assertions over $n$ states. The relational correctness of $(c_k)^n$ with respect to $\hat{P}$ and $\hat{Q}$, denoted $\psi : \{\hat{P}\}\, (c_k)^n\, \{\hat{Q}\}$, is defined as follows:
$\psi : \{\hat{P}\}\, (c_k)^n\, \{\hat{Q}\} \triangleq \forall (\sigma_k)^n, (\sigma'_k)^n.\ \hat{P}((\sigma_k)^n) \wedge \big(\bigwedge_{i=1}^{n} \langle c_i, \sigma_i \rangle \xrightarrow{\psi} \sigma'_i\big) \Rightarrow \hat{Q}((\sigma'_k)^n)$.
This notation generalizes the one proposed by Benton [DBLP:conf/popl/Benton04] for relational properties linking two commands: $\psi : \{\hat{P}\}\, c_1 \sim c_2\, \{\hat{Q}\}$. As Benton’s work mostly focused on comparing equivalent programs, using symbol $\sim$ was quite natural.
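The definition of relational correctness can be exercised on small examples by exhaustive testing over a finite set of states. The sketch below is a bounded check rather than a proof, and everything in it is an assumption made for illustration: programs are modelled as terminating Python functions from states (dicts) to states, and the names `relational_holds`, `double` and `mod3` are hypothetical.

```python
from itertools import product


def relational_holds(progs, rel_pre, rel_post, states):
    """Bounded check: for every tuple of initial states drawn from `states`
    that satisfies `rel_pre`, the tuple of final states satisfies `rel_post`."""
    for initial in product(states, repeat=len(progs)):
        if not rel_pre(*initial):
            continue
        final = tuple(prog(s) for prog, s in zip(progs, initial))
        if not rel_post(*final):
            return False
    return True


def double(s):
    """x_2 := 2 * x_1, leaving x_1 unchanged."""
    return {**s, 2: 2 * s[1]}


def mod3(s):
    """x_2 := x_1 mod 3, leaving x_1 unchanged."""
    return {**s, 2: s[1] % 3}


# Monotonicity as a relational property over two runs of the same program:
# inputs are ordered before the runs, results must be ordered afterwards.
inputs_ordered = lambda s1, s2: s1[1] <= s2[1]
results_ordered = lambda s1, s2: s1[2] <= s2[2]

states = [{1: v, 2: 0} for v in range(6)]
print(relational_holds([double, double], inputs_ordered, results_ordered, states))
print(relational_holds([mod3, mod3], inputs_ordered, results_ordered, states))
```

Each run owns its state, so the check never needs to rename variables or relate the memories of different runs.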
Example 3 (Relational property)
Figure 6 formalizes the relational property for the two swap programs discussed in Ex. 1. Recall that this property (written in Fig. 6 in Benton’s notation) states that both programs, executed from two initial states having the same values in $*x_1$ and $*x_2$, will end up in two final states also having the same values in these locations. Notice that the initial state of each program needs separation hypotheses (cf. the second line of the definition of the relational precondition). Namely, $x_1$ and $x_2$ must point to different locations and must not point to $x_1$, $x_2$ or, for the first program, to $x_3$ for the property to hold. This relational property is formalized in the development in file Examples.v. ∎
5 Verification Condition Generation for Hoare Triples
A standard way [Floyd1967] for verifying that a Hoare triple holds is to use a verification condition generator (VCGen). In this section, we formalize a VCGen for Hoare triples and show that it is correct, in the sense that if all verification conditions that it generates are valid, then the Hoare triple is valid according to Def. 1.
5.1 Verification Condition Generator
We have chosen to split the VCGen into three steps, as is commonly done [Frama-C]:
function $\mathcal{T}_c$ generates the main verification condition, expressing that the postcondition holds in the final state, assuming auxiliary annotations hold;
function $\mathcal{T}_a$ generates auxiliary verification conditions stemming from assertions, loop invariants, and preconditions of called procedures;
finally, function $\mathcal{T}_f$ generates verification conditions for the auxiliary procedures that are called by the main program, to ensure that their bodies respect their contracts.
Definition 4 (Function generating the main verification condition)
Given a command $c$, a memory state $\sigma$ representing the state before the command, a contract environment $\phi$, and an assertion $f$, function $\mathcal{T}_c$ returns a formula defined by case analysis on $c$ as shown in Fig. 7.
Assertion $f$ represents the postcondition we want to verify after the command executed from state $\sigma$. For each command, except sequence and branch, a fresh memory state $\sigma'$ is introduced and related to the current memory state $\sigma$. The new memory state is given as parameter to $f$. For skip, which does nothing, both states are identical. For assignments, $\sigma'$ is simply the update of $\sigma$. An assertion introduces a hypothesis over $\sigma$ but leaves it unchanged. For a sequence, we simply compose the conditions, that is, we check that the final state of $c_1$ is such that $f$ will be verified after executing $c_2$. For a conditional, we check that if the condition evaluates to true, the then branch will ensure the postcondition, and that otherwise the else branch will ensure the postcondition. The rule for calls simply assumes that $\sigma'$ verifies the postcondition of the callee. Finally, $\mathcal{T}_c$ assumes that, after a loop, $\sigma'$ is a state where the loop condition is false and the loop invariant holds. As for an assertion, the callee’s precondition and the loop invariant are just assumed to be true; function $\mathcal{T}_a$, defined below, generates the corresponding proof obligations.
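The case analysis described above can be summarized by the following rules. This is only a sketch derived from the prose description, not a copy of Fig. 7: the notation $\mathcal{T}_c(c, \sigma, \phi, f)$, the evaluation functions written $\xi_a$ and $\xi_b$, the concrete command syntax, and the exact shape of each formula are assumptions.

```latex
\begin{align*}
\mathcal{T}_c(\mathbf{skip}, \sigma, \phi, f)
  &\triangleq \forall \sigma'.\ \sigma' = \sigma \Rightarrow f(\sigma') \\
\mathcal{T}_c(x_i := a, \sigma, \phi, f)
  &\triangleq \forall \sigma'.\ \sigma' = \sigma[i / \xi_a(a)(\sigma)] \Rightarrow f(\sigma') \\
\mathcal{T}_c({*x_i} := a, \sigma, \phi, f)
  &\triangleq \forall \sigma'.\ \sigma' = \sigma[\sigma(i) / \xi_a(a)(\sigma)] \Rightarrow f(\sigma') \\
\mathcal{T}_c(\mathbf{assert}(P), \sigma, \phi, f)
  &\triangleq \forall \sigma'.\ \sigma' = \sigma \wedge P(\sigma) \Rightarrow f(\sigma') \\
\mathcal{T}_c(c_1 ; c_2, \sigma, \phi, f)
  &\triangleq \mathcal{T}_c(c_1, \sigma, \phi, \lambda \sigma'.\ \mathcal{T}_c(c_2, \sigma', \phi, f)) \\
\mathcal{T}_c(\mathbf{if}\ b\ \mathbf{then}\ c_1\ \mathbf{else}\ c_2, \sigma, \phi, f)
  &\triangleq (\xi_b(b)(\sigma) \Rightarrow \mathcal{T}_c(c_1, \sigma, \phi, f))
     \wedge (\neg \xi_b(b)(\sigma) \Rightarrow \mathcal{T}_c(c_2, \sigma, \phi, f)) \\
\mathcal{T}_c(\mathbf{call}(y), \sigma, \phi, f)
  &\triangleq \forall \sigma'.\ \mathit{pre}_\phi(y)(\sigma) \wedge \mathit{post}_\phi(y)(\sigma') \Rightarrow f(\sigma') \\
\mathcal{T}_c(\mathbf{while}\ b\ \mathbf{inv}\ P\ \mathbf{do}\ c, \sigma, \phi, f)
  &\triangleq \forall \sigma'.\ P(\sigma) \wedge P(\sigma') \wedge \neg \xi_b(b)(\sigma') \Rightarrow f(\sigma')
\end{align*}
```

In this reading, annotations only ever appear as hypotheses of the main condition; the obligation to prove them is left to the auxiliary verification conditions.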
For , and , we have:
Definition 5 (Function generating the auxiliary verification conditions)
Given a command $c$, a memory state $\sigma$ representing the state before the command, and a contract environment $\phi$, function $\mathcal{T}_a$ returns a formula defined by case analysis on $c$ as shown in Fig. 8.
Basically, $\mathcal{T}_a$ collects all assertions, preconditions of called procedures, as well as invariant establishment and preservation, and lifts the corresponding formulas to constraints on the initial state through the use of $\mathcal{T}_c$.
Finally, we define the function $\mathcal{T}_f$ for generating the conditions for verifying that the body of each procedure defined in $\psi$ respects its contract defined in $\phi$.
Definition 6 (Function generating the procedure verification condition)
Given two environments and , returns the following formula:
The VCGen is defined in file Vcg.v of the development. Interested readers will also find a proof (in file Vcg_Opt.v) of a VCGen optimization (not detailed here), which prevents the size of the generated formulas from becoming exponential in the number of conditions in the program [DBLP:conf/popl/FlanaganS01], which is a classical problem for “naive” VCGens.
5.2 Hoare Triple Verification
We can now state the theorems establishing correctness of the VCGen. Their proofs can be found in file Correct.v of the development.
First, Lemma 2 shows that, under the assumption of the procedure contracts, a Hoare triple is valid if for all memory states satisfying the precondition, the main verification condition and the auxiliary verification conditions hold.
Assume the following two properties hold:
Then we have
By structural induction over $c$. ∎
Next, we prove in Lemma 3 that if the formula generated by $\mathcal{T}_f$ holds, then for an arbitrary choice of implementations of procedures respecting the procedure contracts, the body of each procedure respects its contract.
Assume that the formula is satisfied. Then we have
By Lemma 2. ∎
Finally, we can establish the main theorem of this section, stating that the VCGen is correct with respect to our definition of Hoare triples.
Theorem 5.1 (Soundness of VCGen)
Assume that we have and
Then we have .
6 Relational Properties Verification
In this section, we propose a verification method for relational properties (defined in Section 4) using the VCGen defined in Section 5 (or, more generally, any VCGen respecting Theorem 5.1). First, we define the notation $\mathcal{T}_{cr}$ for the recursive call of function $\mathcal{T}_c$ on a sequence of commands and memory states:
Definition 7 (Function $\mathcal{T}_{cr}$)
Given a sequence of commands $(c_k)^n$ and a sequence of memory states $(\sigma_k)^n$, a contract environment $\phi$ and a predicate $\hat{f}$ over $n$ states, function $\mathcal{T}_{cr}$ is defined by induction on $n$ as follows.
Intuitively, for $n = 2$, $\mathcal{T}_{cr}$ gives the weakest relational condition that $\sigma_1$ and $\sigma_2$ must fulfill in order for $\hat{f}$ to hold after executing $c_1$ from $\sigma_1$ and $c_2$ from $\sigma_2$:
Assume we have a command , a sequence of commands , and a sequence of memory states . From Def. 1, it follows that
is equivalent to
Example 6 (Relational verification condition)
In order to make things more concrete, we can go back to the relational property between the two implementations of swap defined in Ex. 1 and examine what would be the main verification condition generated by $\mathcal{T}_{cr}$. Let the relational precondition and postcondition be defined as in Ex. 3. In this particular case, we have $n = 2$, and $\phi$ is empty (since we do not have any function call), thus Def. 7 becomes:
We thus start by applying $\mathcal{T}_c$ over the first program, to obtain, using the rules of Def. 4 for sequence and assignment, the following intermediate formula:
We can then do the same with the second program to obtain the final formula:
Here, the memory states with odd (resp., even) indices result from $\mathcal{T}_c$ for the first (resp., second) program. ∎
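The nesting of verification conditions for the two swap programs can be mimicked by a small executable sketch. It is not the VCGen of the development: it assumes a shallow, semantic stand-in where, for the deterministic, loop-free and call-free commands handled here, the fresh state of each rule is uniquely determined, so each condition is evaluated directly. The names `tc`, `tcr`, `seq`, the tuple encoding, the address layout, and the order in which the commands are nested are all assumptions.

```python
def update(s, i, n):
    """Functional update: copy of state s with address i bound to n."""
    t = dict(s)
    t[i] = n
    return t


def tc(c, s, f):
    """Main verification condition of command c from state s for postcondition f."""
    tag = c[0]
    if tag == "skip":
        return f(s)
    if tag == "assign":   # x_i := e
        return f(update(s, c[1], c[2](s)))
    if tag == "store":    # *x_i := e
        return f(update(s, s[c[1]], c[2](s)))
    if tag == "seq":      # compose: the postcondition of c1 is the condition of c2
        return tc(c[1], s, lambda s1: tc(c[2], s1, f))
    if tag == "if":
        cond = c[1](s)
        return (not cond or tc(c[2], s, f)) and (cond or tc(c[3], s, f))
    raise ValueError(f"unsupported command: {tag}")


def tcr(cmds, states, f_hat):
    """Relational condition: nest tc over each command, each on its own state."""
    if not cmds:
        return f_hat()
    return tc(cmds[0], states[0],
              lambda s1: tcr(cmds[1:], states[1:],
                             lambda *rest: f_hat(s1, *rest)))


def seq(*cs):
    """Right-nested sequence of commands."""
    c = cs[-1]
    for prev in reversed(cs[:-1]):
        c = ("seq", prev, c)
    return c


X1, X2, X3 = 1, 2, 3                      # addresses of x_1, x_2, x_3
deref = lambda i: (lambda s: s[s[i]])     # expression *x_i

swap_tmp = seq(("assign", X3, deref(X1)),
               ("store", X1, deref(X2)),
               ("store", X2, lambda s: s[X3]))
swap_arith = seq(("store", X1, lambda s: s[s[X1]] + s[s[X2]]),
                 ("store", X2, lambda s: max(s[s[X1]] - s[s[X2]], 0)),
                 ("store", X1, lambda s: max(s[s[X1]] - s[s[X2]], 0)))


def same_targets(s1, s2):
    """Relational postcondition: *x_1 and *x_2 agree in both final states."""
    return s1[s1[X1]] == s2[s2[X1]] and s1[s1[X2]] == s2[s2[X2]]


def initial(a, b):
    """x_1 points to cell 4 (holding a), x_2 to cell 5 (holding b)."""
    return {X1: 4, X2: 5, X3: 0, 4: a, 5: b}


holds = all(tcr([swap_tmp, swap_arith], [initial(a, b), initial(a, b)], same_targets)
            for a in range(4) for b in range(4))
print(holds)
```

Both programs use the same addresses, each in its own state, so no separation hypothesis between the two programs is needed; only the per-program hypotheses (here built into `initial`) remain.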
We similarly define a notation for the auxiliary verification conditions for a sequence of commands.
Definition 8 (Function $\mathcal{T}_{ar}$)
Given a sequence of commands $(c_k)^n$ and a sequence of memory states $(\sigma_k)^n$, we define function $\mathcal{T}_{ar}$ as follows:
For it trivially follows from Def. 8 that:
Using functions $\mathcal{T}_{cr}$ and $\mathcal{T}_{ar}$, we can now give the main result of this paper: it states that the verification of relational properties using the VCGen is correct.
Theorem 6.1 (Soundness of relational VCGen)
For any sequence of commands $(c_k)^n$, contract environment $\phi$, procedure environment $\psi$, and relational assertions over $n$ states $\hat{P}$ and $\hat{Q}$, if the following three properties hold:
then we have
In other words, a relational property is valid if all procedure contracts are valid, and, assuming the relational precondition holds, both the auxiliary verification conditions and the main relational verification condition hold. We give the main steps of the proof below. The corresponding formalization is available in file Rela.v, and the proof of Theorem 6.1 is in file Correct_Rela.v.
By induction on the length of the sequence of commands $(c_k)^n$.