
# Verifying Relational Properties using Trace Logic

We present a logical framework for the verification of relational properties in imperative programs. Our work is motivated by relational properties which come from security applications and often require reasoning about formulas with quantifier-alternations. Our framework reduces verification of relational properties of imperative programs to a validity problem into trace logic, an expressive instance of first-order predicate logic. Trace logic draws its expressiveness from its syntax, which allows expressing properties over computation traces. Its axiomatization supports fine-grained reasoning about intermediate steps in program execution, notably loop iterations. We present an algorithm to encode the semantics of programs as well as their relational properties in trace logic, and then show how first-order theorem proving can be used to reason about the resulting trace logic formulas. Our work is implemented in the tool Rapid and evaluated with examples coming from the security field.


## 1 Introduction

Program verification generally focuses on proving that all executions of a program lie within a specified set of executions; that is, properties are seen as sets of traces. However, this approach is not general enough to capture various fundamental properties, such as non-interference [26] and robustness [12]. These notions are naturally modelled as relational properties, that is, as properties over pairs of traces. Relational properties are special instances of hyperproperties [15], which are formally defined as sets of sets of traces.

Verification of relational properties can be achieved in different ways. One approach is by reduction to program verification: given a program P and a hyperproperty F, construct a program P′ and a (trace) property F′ such that (i) P′ satisfies F′ and (ii) if P′ satisfies F′, then P satisfies F. The main advantage of this approach is that (i) can be verified using standard verification tools, whereas (ii) is proved generically for the method used for constructing P′, for instance self-composition [7, 20] and product programs [5, 13]. Another approach to verifying relational properties is to use relational Hoare logic [10] or specialized logics that target specific properties [1]. While both approaches have been applied successfully in several use cases, they suffer from fundamental limitations: (i) they are typically not efficient enough to scale to large programs and (ii) they are only partly automated and tailored to specific properties.
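To make the reduction to ordinary program verification concrete, here is a minimal Python sketch of self-composition (our own illustration; `prog`, `self_composed`, and the sampled input space are hypothetical and not from the paper): the relational property "equal low inputs imply equal outputs" of `prog` becomes an ordinary property of the composed program.

```python
# Self-composition, sketched in Python: to check a relational (2-trace)
# property of `prog`, build a composed program that runs `prog` on two
# inputs and check the relation as an ordinary (1-trace) property.

def prog(hi, lo):
    # toy program: the result must not depend on the high input `hi`
    return lo * 2

def self_composed(hi1, lo1, hi2, lo2):
    # product of two runs of `prog`
    return prog(hi1, lo1), prog(hi2, lo2)

def non_interference_holds(pairs):
    # ordinary property of the composed program: equal low inputs
    # imply equal outputs, over all sampled input pairs
    return all(
        out1 == out2
        for (hi1, lo1, hi2, lo2) in pairs
        if lo1 == lo2
        for (out1, out2) in [self_composed(hi1, lo1, hi2, lo2)]
    )

samples = [(h1, l, h2, l) for h1 in range(3) for h2 in range(3) for l in range(3)]
assert non_interference_holds(samples)
```

Here the check is only a dynamic test over sampled inputs; the point of the reduction is that a standard verifier can prove the same 1-trace property of `self_composed` for all inputs.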

#### Contributions.

In this paper, we develop a new approach based on reduction to first-order reasoning, with the intent of reconciling expressiveness and automation.

(1) We introduce and formally characterize trace logic L, an instance of many-sorted first-order logic with equality, which allows expressing properties over program locations, loop iterations, and computation traces (Section 4).

(2) We encode the semantics of programs as well as relational program properties in L (Section 4). Specifically, given a program P and a relational property F, we construct a first-order formula ⟦P⟧ in L such that validity of ⟦P⟧ → F entails that P satisfies F. Note that this semantic characterization stands in contrast with methods based on product programs, Hoare logics, and relational Hoare logics, where verification is syntax-directed.

(3) We show that relational properties, such as non-interference, can naturally be encoded in trace logic (Section 5).

(4) We implemented our approach in the Rapid tool, which relies on the first-order theorem prover Vampire [33]. We conducted experiments on security-relevant hyperproperties, such as non-interference and sensitivity. Our results show that Rapid is more expressive than state-of-the-art non-interference verification tools and that Vampire is better suited to the verification of security-relevant hyperproperties than state-of-the-art SMT-solvers like Z3 and CVC4.

## 2 Motivating Example

We motivate our work with the simple program of Figure 1. This program iterates over an integer-valued array a and stores in the variable hw the sum of the array elements. If a is a bitstring, then this program leaks the so-called Hamming weight of a in the variable hw. Our aim is to prove the following relational property over two arbitrary computation traces t1 and t2 of Figure 1: if the elements of the array variable a in t1 are component-wise equal to the elements of a in t2 except for two consecutive positions k and k+1, for some k, and the elements of a in t1 at positions k and k+1 are swapped versions of the elements of a in t2 (that is, the k-th element of a in t1 is the (k+1)-th element of a in t2 and vice-versa), then the program variable hw is the same at the end of t1 and t2. We formalize this property as

 ∀kI.((∀posI.((pos≄k∧pos≄k+1)→a(pos,t1)≃a(pos,t2))∧a(k,t1)≃a(k+1,t2)∧a(k,t2)≃a(k+1,t1)∧0≤k∧k+1<alength)→hw(end,t1)≃hw(end,t2)) (1)

where kI and posI respectively specify that k and pos are of sort integer I. Further, a(pos,t) denotes the value of the element at position pos of a in trace t, whereas end refers to the last program location of Figure 1 (that is, line 14).

Property (1) is challenging to verify, since it requires theory-specific reasoning over integers and it involves alternation of quantifiers, as the length of the array a is unbounded and the k-th position (corresponding to the swap) is arbitrary. To understand the difficulty in automating this kind of reasoning, let us first illustrate how humans would naturally prove property (1). First, split the iterations of the loop of Figure 1 into three intervals: (i) the interval from the first iteration of the loop to the iteration where i has value k, (ii) the interval from the iteration where i has value k to the iteration where i has value k+2, and (iii) the interval from the iteration where i has value k+2 to the last iteration of the loop. Next, for each of the intervals above, one proves that the equality of the value of hw in traces t1 and t2 is preserved; that is, if hw has the same value in t1 and t2 at the beginning of the interval, then hw also has the same value in t1 and t2 at the end of the interval. In particular, for the first and third intervals one uses inductive reasoning to conclude the preservation of the equality across the whole interval from the step-wise preservation, within the interval, of the equality of the value of hw in traces t1 and t2. Further, for the second interval, one uses commutativity of addition to prove that the equality of the value of hw in traces t1 and t2 is preserved. By combining the fact that the equality of the values of hw in traces t1 and t2 is preserved in each of the three intervals, one finally concludes that property (1) is valid.
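The program of Figure 1 and property (1) can be illustrated concretely. The following Python sketch (our own re-expression of Figure 1, not the paper's W syntax; the input array is hypothetical) sums the array into hw and checks that swapping two adjacent positions k and k+1 leaves the final value of hw unchanged:

```python
def hamming_weight(a):
    # Figure 1, re-expressed in Python: sum the array elements into hw
    hw = 0
    i = 0
    while i < len(a):
        hw = hw + a[i]
        i = i + 1
    return hw

# Property (1): swapping positions k and k+1 between the two traces
# leaves the final value of hw unchanged.
a1 = [1, 0, 1, 1, 0, 1]
for k in range(len(a1) - 1):
    a2 = list(a1)
    a2[k], a2[k + 1] = a2[k + 1], a2[k]
    assert hamming_weight(a1) == hamming_weight(a2)
```

The verification problem, of course, is to establish this for all arrays and all k, which is where the quantifier alternation comes from.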

While the above proof might be natural for humans, it is challenging for automated reasoners for the following reasons: (i) one needs to express and relate different iterations in the execution of the loop in Figure 1 and use these iterations to split the reasoning about loop intervals; (ii) one needs to automatically synthesize the loop intervals whose boundaries depend on values of program variables; and (iii) one needs to combine theory-specific reasoning with induction for proving quantified properties, possibly with alternations of quantifiers. In our work we address these challenges: we introduce trace logic, allowing us to express and automatically prove relational properties, including property (1). The key advantages of trace logic are as follows.

(i) In trace logic, program variables are encoded as unary and binary functions over program execution timepoints. This way, we can precisely express the value of each program variable at any program execution timepoint, without introducing abstractions. For Figure 1, for example, we write hw(end,t) to denote the value of hw in trace t at timepoint end.

(ii) Trace logic further allows arbitrary quantification over iterations and values of program variables. In particular, we can express and reason about iterations that depend on (possibly non-ground) expressions involving program variables. We use superposition-based first-order reasoning to automate static analysis with trace logic and derive first-order properties about loop iterations, possibly with quantifier alternations. For Figure 1, for example, we generate a property over l9, the location at which the loop condition is tested, and over the first iteration of the loop upon which the loop condition does not hold anymore.

(iii) We guide superposition reasoning in trace logic by using a set of lemmas statically inferred from the program semantics. These lemmas express inductive properties about the program behavior. To illustrate such lemmas, we first introduce the following notation. For an arbitrary program variable v, let Eqv(it) denote that v has the same value in both traces at iteration it of the loop. That is, for every program variable v of Figure 1, we introduce the following definition:

 Eqv(it):=v(l9(it),t1)≃v(l9(it),t2).

In particular, for variable hw, we introduce:

 Eqhw(it):=hw(l9(it),t1)≃hw(l9(it),t2).

We then derive the following inductive lemma for each program variable v:

 ∀itBN.((Eqv(0)∧∀itN.((it<itB∧Eqv(it))→Eqv(succ(it))))→Eqv(itB)) (2)

where it and itB denote iterations and succ(it) denotes the successor of it. Lemma (2) asserts that if v has the same value in traces t1 and t2 at the beginning of the loop (that is, at iteration 0) and if the values of v are step-wise equal in traces t1 and t2 up to an arbitrary iteration itB, then the values of v are equal in traces t1 and t2 at iteration itB (and hence the values of v are preserved in t1 and t2 for the entire interval up to itB). For Figure 1, we generate lemma (2) for hw as:

 ∀itBN.((Eqhw(0)∧∀itN.((it<itB∧Eqhw(it))→Eqhw(succ(it))))→Eqhw(itB)) (3)

Note that lemma (2), and in particular lemma (3) for hw, is crucial for proving that the values of hw in traces t1 and t2 are the same up to iteration k, as considered in the relational property (1). With this lemma at hand, we automatically prove property (1) of Figure 1, using superposition reasoning in trace logic.
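The three-interval argument behind lemma (3) can be observed concretely by recording the value of hw at the loop-condition timepoint l9(it) of every iteration in both traces. The following Python sketch (our own illustration; the input array and swap position are hypothetical) shows that Eqhw holds before the swap window, is temporarily broken inside it, and is restored afterwards by commutativity of addition:

```python
def hw_per_iteration(a):
    # value of hw at the loop-condition timepoint l9(it), for each it
    values = [0]
    hw = 0
    for x in a:
        hw += x
        values.append(hw)
    return values

k = 2
a1 = [1, 2, 3, 4, 5]
a2 = list(a1)
a2[k], a2[k + 1] = a2[k + 1], a2[k]  # trace t2: positions k and k+1 swapped

v1, v2 = hw_per_iteration(a1), hw_per_iteration(a2)
# Eqhw(it) holds on the first interval (it <= k) ...
assert all(v1[it] == v2[it] for it in range(k + 1))
# ... may fail at it = k+1, where only one swapped element has been summed ...
assert v1[k + 1] != v2[k + 1]
# ... and holds again from it = k+2 on, by commutativity of addition
assert all(v1[it] == v2[it] for it in range(k + 2, len(v1)))
```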

## 3 Preliminaries

This section fixes our terminology and programming model.

### 3.1 First-order logic

We consider standard many-sorted first-order logic with equality, where equality is denoted by ≃. We allow all standard boolean connectives and quantifiers in the language and write t1 ≄ t2 instead of ¬(t1 ≃ t2), for two arbitrary first-order terms t1 and t2. A signature is any finite set of symbols. We consider equality as part of the language; hence, ≃ is not a signature symbol. We write A ⊨ F to denote that the formula F holds in all models of A; in particular, we write ⊨ F if F is valid.

By a first-order theory, or simply a theory, we mean the set of all formulas valid on a class of first-order structures. When we discuss a theory, we call the symbols occurring in the signature of the theory interpreted, and all other symbols uninterpreted. In our work, we consider the combination (union) N ∪ I of the theory of natural numbers N and the theory of integers I. The signature of N consists of the standard symbols 0, succ, pred, and <, respectively interpreted as zero, successor, predecessor, and less. Note that N does not contain interpreted symbols for (arbitrary) addition and multiplication. We use the theory N to represent and reason about loop iterations (see Section 4). The signature of I consists of the standard integer constants and the integer operators +, - and <. We use the theory I to represent and reason about integer-valued program variables (see Section 4). Additionally, we use two uninterpreted sorts: (i) the sort Timepoint, written Tp, for denoting (unique) timepoints in the execution of the program, and (ii) the sort Trace, written Tr, for denoting computation traces of a program.

Given a logical variable x and a sort S, we write xS to denote that the sort of x is S. We use standard first-order interpretations/models modulo a theory T, for example modulo N ∪ I. We write ⊨T F to denote that F holds in all models of T (and is hence T-valid). If M is a model of T, we write M ⊨ F if F holds in the interpretation M.

### 3.2 Programming Model W

We consider programs written in a standard while-like programming language, denoted as W, with mutable and constant integer- and integer-array-variables. The language includes standard side-effect-free expressions over booleans and integers. Each program in W consists of a single top-level function main, with arbitrary nestings of if-then-else and while-statements. For simplicity, whenever we refer to loops, we mean while-loops. For each statement s, we refer to the while-statements in which s is nested as the enclosing loops of s. The semantics of W is formalized in Section 4.3, with further details in Appendix 0.A.1.

## 4 Trace Logic

We now introduce the concept of trace logic for expressing both the semantics and (relational) properties of W-programs.

### 4.1 Locations and Timepoints

We consider a program in W as a set of locations, where each location intuitively corresponds to a point in the program at which an interpreter can stop. That is, for each program statement s, we introduce a program location ls. We denote by lend the location corresponding to the end of the program.

As program locations can be revisited during program executions, for example due to the presence of loops, we model locations as follows. For each location ls corresponding to a program statement s, we introduce a function symbol ls with target sort Tp in our language, denoting the timepoint where the interpreter visits the location. For each enclosing loop of the statement s, the function symbol ls has an argument of sort N; this way, we distinguish between different iterations of the enclosing loops of s. We denote the set of all such function symbols as STp. When s is a loop, we additionally introduce a function symbol ns with target sort N and an argument of sort N for each enclosing loop of s. This way, ns denotes the iteration in which s terminates for given iterations of the enclosing loops of s. We denote the set of all such function symbols as Sn.

###### Example 1

Consider Figure 1. We abbreviate each statement s by the line number of the first line of s, so that, for instance, l9 denotes the location at which the loop condition is tested. The timepoint corresponding to the first assignment of i in the program is denoted by the location symbol of that assignment. We denote by l9(0) and l9(n9) the timepoints corresponding to evaluating the loop condition in the first and, respectively, last loop iteration. Further, applying the location symbol of the first statement of the loop body to it and to succ(0) yields the timepoints corresponding to the beginning of the loop body in the it-th and, respectively, second iteration of the loop. Note that succ(0) is a term algebra expression of N.

For simplicity, let us define terms over the most commonly used timepoints. First, define it to be a function which returns for each while-statement s a unique variable its of sort N. Second, let s be a statement, let w1, …, wk be the enclosing loops of s, and let it be an arbitrary term of sort N.

 tps := ls(itw1,…,itwk) if s is not a while-statement
 tps(it) := ls(itw1,…,itwk,it) if s is a while-statement
 lastIts := ns(itw1,…,itwk) if s is a while-statement

Third, let s be an arbitrary statement. We refer to the timepoint where the execution of s has started (parameterized by the enclosing iterations) by starts:

 starts := tps(0) if s is a while-statement
 starts := tps otherwise

Fourth, for an arbitrary statement s, let ends denote the timepoint which follows immediately after s has been evaluated completely (including the evaluation of the substatements of s):

 ends := starts′ if s′ occurs after s in a context
 ends := ends′ if s is the last statement in the if-branch of s′
 ends := ends′ if s is the last statement in the else-branch of s′
 ends := tpw(succ(itw)) if s is the last statement in the body of a while-statement w
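To summarize the definitions above, the following Python sketch (our own modelling, not part of Rapid; the statement labels 9 and 10 and the dictionary encoding are hypothetical) constructs the terms tps, lastIts, and starts as strings for a toy top-level loop and one statement of its body:

```python
# A minimal sketch of the timepoint terms of Section 4.1: each statement s
# gets a function l_s over the iteration variables of its enclosing loops;
# while-statements take an extra iteration argument and get a
# last-iteration symbol n_s.

def tp(stmt, it=None):
    # tp_s, resp. tp_s(it) for while-statements
    args = list(stmt["enclosing_its"])
    if stmt["kind"] == "while":
        args.append("0" if it is None else it)
    return f"l{stmt['label']}({','.join(args)})" if args else f"l{stmt['label']}"

def last_it(stmt):
    # lastIt_s = n_s(it_w1, ..., it_wk), only for while-statements
    args = ",".join(stmt["enclosing_its"])
    return f"n{stmt['label']}({args})" if args else f"n{stmt['label']}"

def start(stmt):
    # start_s: tp_s(0) for a while-statement, tp_s otherwise
    return tp(stmt, "0") if stmt["kind"] == "while" else tp(stmt)

# a loop assumed to sit at top level (no enclosing loops), labelled 9
loop = {"kind": "while", "label": 9, "enclosing_its": []}
# a statement in its body, parameterized by the loop's iteration variable
body_stmt = {"kind": "assign", "label": 10, "enclosing_its": ["it9"]}

print(start(loop))              # l9(0)
print(tp(loop, last_it(loop)))  # l9(n9)
print(tp(body_stmt))            # l10(it9)
```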

### 4.2 Program Variables and Expressions

In our setting, we reason about program behavior by expressing properties over program variables v. To do so, we capture the values of program variables v at timepoints (from Tp) in arbitrary program execution traces (from Tr). Hence, we model program variables v as functions v : Tp × Tr → I, where v(tp,tr) gives the value of v at timepoint tp in trace tr. If the program variable v is an array, we add an additional argument of sort I, which corresponds to the position at which the array is accessed. We denote by SV the set of function symbols so introduced for denoting program variables. We finally model arithmetic constants and program expressions using integer functions.

Note that our setting can be simplified (i) for non-mutable variables – in this case we omit the timepoint argument in the function representation of the variable; and (ii) for non-relational properties about programs – in this case, we only focus on one computation trace and hence the trace argument in the functions from SV can be omitted.

###### Example 2

Consider again Figure 1. By i(tps,tr), where s is the first assignment of i, we refer to the value of program variable i in trace tr at the moment before i is first assigned. We use alength(tr) to refer to the value of the constant variable alength in trace tr. As a is unchanged in the program, we write a(i(tp,tr),tr) for the value of array a in trace tr at position i(tp,tr), where i(tp,tr) is the value of i in trace tr at timepoint tp. In case a were changed during the loop, we would have written a(tp,i(tp,tr),tr) instead. We denote by (i+1)(tp,tr) the value of the expression i+1 in trace tr at timepoint tp.

Consider now an arbitrary program expression e. We write ⟦e⟧(tp,tr) to denote the value of e at timepoint tp in trace tr. With these notations at hand, we introduce two definitions expressing properties about values of expressions at arbitrary timepoints and traces. Consider v ∈ SV, that is, a function denoting a program variable v, and let tp1, tp2 denote two timepoints. We define:

 Eq(v,tp1,tp2) := ∀posI. v(tp1,pos,tr)≃v(tp2,pos,tr) if v is an array
 Eq(v,tp1,tp2) := v(tp1,tr)≃v(tp2,tr) otherwise (4)

That is, Eq(v,tp1,tp2) in (4) states that the program variable v has the same values at tp1 and tp2. We also define:

 EqAll(tp1,tp2):=⋀v∈SVEq(v,tp1,tp2), (5)

asserting that all program variables have the same values at the timepoints tp1 and tp2.
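Definitions (4) and (5) can be sketched as a small formula generator. The following Python snippet (our own illustration; the string syntax is informal and the variable set SV is hypothetical) builds Eq and EqAll over a set of program variables tagged with their kinds:

```python
# A sketch of definitions (4) and (5): build the Eq and EqAll formulas as
# strings, given the set S_V of program variables with their kinds.

def eq(v, kind, tp1, tp2):
    # definition (4): point-wise equality for arrays, plain equality otherwise
    if kind == "array":
        return f"forall pos. {v}({tp1},pos,tr) = {v}({tp2},pos,tr)"
    return f"{v}({tp1},tr) = {v}({tp2},tr)"

def eq_all(variables, tp1, tp2):
    # definition (5): conjunction of Eq over all program variables
    return " & ".join(eq(v, kind, tp1, tp2) for v, kind in variables)

sv = [("hw", "int"), ("i", "int"), ("a", "array")]
print(eq_all(sv, "l9(it)", "l9(succ(it))"))
```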

### 4.3 Semantics of W

We now describe the semantics of W expressed in our trace logic L. To do so, we state trace axioms of L capturing the behavior of possible program computation traces and then define ⟦P⟧.

In what follows, we consider an arbitrary but fixed program P in W, and give all definitions relative to P. Note that our semantics defines arbitrary executions, which are modeled by a free variable tr of sort Trace.

#### Main-function

Let s1, …, sk be statements and let P be a program with top-level function func main {s1; …; sk}. The semantics ⟦P⟧ of P is defined as the conjunction of the semantics of the statements si of the top-level function and is the same for each trace. That is:

 ⟦P⟧ := ⋀i=1..k ⟦si⟧. (6)

The semantics of W is then defined by structural induction, asserting trace axioms for each program statement s, as follows.

#### Skip

Let s be a statement skip. The evaluation of s has no effect on the values of the program variables. Hence:

 ⟦s⟧:=⋀v∈SVEq(v,ends,tps) (7)

#### Integer assignments

Let s be an assignment v = e, where v is an integer program variable and e is an expression. We reason as follows. The assignment s is evaluated in one step. After the evaluation of s, the variable v has the same value as e before the evaluation, and all other variables remain unchanged. Hence:

 ⟦s⟧ := v(ends,tr)≃⟦e⟧(tps,tr) ∧ ⋀v′∈SV∖{v} Eq(v′,ends,tps) (8)

#### Array assignments

Let s be an assignment a[e1] = e2, where a is an array variable and e1, e2 are expressions. We consider that the assignment is evaluated in one step. After the evaluation of s, the array a has the same value as before the evaluation, except at the position corresponding to the value of e1 before the evaluation, where the array now has the value of e2 before the evaluation. All other program variables remain unchanged and we have:

 ⟦s⟧ := ∀posI.(pos≄⟦e1⟧(tps,tr) → a(ends,pos,tr)≃a(tps,pos,tr)) (9a)
 ∧ a(ends,⟦e1⟧(tps,tr),tr)≃⟦e2⟧(tps,tr) (9b)
 ∧ ⋀v∈SV∖{a} Eq(v,ends,tps) (9c)

#### If-then-else

Let s be the statement if(Cond){s1; …; sk} else {s1′; …; sk′′}. The semantics of s is defined by the following two properties: (i) entering the if-branch or the else-branch does not change the values of the variables, and (ii) the evaluation in the branches proceeds according to the semantics of the statements in each of the branches. Thus:

 ⟦s⟧ := (⟦Cond⟧(tps) → EqAll(starts1,tps)) (10a)
 ∧ (¬⟦Cond⟧(tps) → EqAll(starts1′,tps)) (10b)
 ∧ (⟦Cond⟧(tps) → ⟦s1⟧∧⋯∧⟦sk⟧) (10c)
 ∧ (¬⟦Cond⟧(tps) → ⟦s1′⟧∧⋯∧⟦sk′′⟧) (10d)

#### While-Loops

Let s be the while-statement while(Cond){s1; …; sk}. We refer to Cond as the loop condition. We use the following four properties to define the semantics of s: (i) the iteration lastIts is the first iteration in which the loop condition does not hold, (ii) entering the loop body does not change the values of the variables, (iii) the evaluation in the body proceeds according to the semantics of the statements in the body, and (iv) the values of the variables at the end of evaluating s are the same as the variable values at the loop-condition location in iteration lastIts. We then have:

 ⟦s⟧ := ∀itsN.(its<lastIts → ⟦Cond⟧(tps(its))) (11a)
 ∧ ¬⟦Cond⟧(tps(lastIts)) (11b)
 ∧ ∀itsN.(its<lastIts → EqAll(starts1,tps(its))) (11c)
 ∧ ∀itsN.(its<lastIts → (⟦s1⟧∧⋯∧⟦sk⟧)) (11d)
 ∧ EqAll(ends,tps(lastIts)) (11e)

### 4.4 Trace Logic L

We now have all ingredients to define our trace logic , allowing us to reason about both relational and non-relational properties of programs.

Let STr be a set of nullary function symbols of sort Trace. Intuitively, these symbols denote traces and allow us to express relational properties. The signature of L contains the symbols of the theories N and I together with the symbols introduced in Sections 4.1-4.2, that is, symbols denoting timepoints, last iterations of loops, program variables, and traces. Formally,

 Sig(L):=(SN∪SI)∪(STp∪Sn∪SV∪STr).

Recall that the semantics of W is defined by the trace axioms (7)-(11). By extending standard small-step operational semantics with timepoints and traces, we obtain the small-step semantics of W (see our Appendix for details). For proving the soundness of this semantics, we rely on the so-called execution-interpretation of a program execution E: such an interpretation is a model M in which, for every (array) variable v, the term v(tp,tr) resp. v(tp,pos,tr) is interpreted as the value of v at the execution step in E corresponding to timepoint tp. We then refer to the soundness of the semantics of W as W-soundness, defined as:

###### Definition 1 (W-Soundness)

Let P be a program and let F be a trace logic property. We say that F is W-sound if, for any execution-interpretation M, we have M ⊨ F.

By using structural induction over program statements, we derive the W-soundness of the semantics of W (see our Appendix for details). That is:

###### Theorem 4.1 (W-Soundness of Semantics of W)

For a given terminating program P, the trace axioms (7)-(11) are W-sound.

As a consequence, the semantics ⟦P⟧ of any terminating program P expressed in L, as defined in (6), is W-sound.

### 4.5 Program Correctness in Trace Logic L

Let P be a program and F be a first-order property of P, with F expressed in L. We use L to express and prove that P "satisfies" F, that is, that P is partially correct w.r.t. F, as follows:

1. We express ⟦P⟧ in L, as discussed in Section 4.3;

2. We prove the partial correctness of P with respect to F; that is, we prove

 ⟦P⟧ ⊨N∪I F.

In what follows, we first discuss (relational) properties expressed in L (Section 5) and then focus on proving partial correctness using L (Section 6).

## 5 Hyperproperties in Trace Logic

We exemplify the expressiveness of trace logic by encoding non-interference [38], a fundamental security property. We also showcase the generic lemmas, similar to property (2), introduced by our work to automate the verification of hyperproperties. The examples considered in this section are deemed insecure by existing syntax-driven non-interference verification techniques, such as [38, 27].

Non-interference [26] is a security property that prevents information flow from confidential data to public channels. It is a so-called 2-safety property expressing that, given two runs of a program containing high- and low-confidentiality variables, denoted by H and L respectively, if the input of all L variables is the same in both runs, then the computation results in the same values of the L variables in both traces, regardless of the initial value of any H variable. Intuitively, this means that no private input leaks to any public sink. In what follows, we let lo denote an L variable and hi an H variable.

We formalize non-interference in trace logic as follows. Let l0 denote the first timepoint of the execution and let EqTr(v,tp) denote that v has the same value(s) in both traces at timepoint tp, that is:

 EqTr(v,tp) := ∀posI. v(tp,pos,t1)≃v(tp,pos,t2) if v is a mutable array
 EqTr(v,tp) := ∀posI. v(pos,t1)≃v(pos,t2) if v is a constant array
 EqTr(v,tp) := v(tp,t1)≃v(tp,t2) if v is a mutable variable
 EqTr(v,tp) := v(t1)≃v(t2) if v is a constant variable

We then express non-interference as:

 (⋀v∈LEqTr(v,l0))→(⋀v∈LEqTr(v,lend)). (12)
###### Example 3

Consider the program illustrated in Figure 1(a), which branches on an H guard. In the two branches, however, the L variable lo is updated in the same way, thereby not leaking anything about the H guard. The non-interference property for this program is a special instance of property (12), as follows:

 EqTr(lo,l0)→EqTr(lo,lend). (13)

By adjusting superposition reasoning to trace logic (see Section 6), we can automatically verify the property above. Traditional information-flow type systems [38] would however fail to prove this program secure, as they prevent any branching on H guards. More permissive static analysis techniques based on program dependency graphs, such as Joana [27], would also classify this program as insecure.
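As a dynamic counterpart to property (13), the following Python sketch (our own illustration; the concrete program is a guess at the shape of the example, not the paper's figure) runs the program on pairs of traces that agree on lo but differ on hi, and checks that the final value of lo agrees:

```python
# Property (12)/(13), tested dynamically: a program that branches on a
# high guard but updates the low variable identically in both branches,
# so nothing about `hi` is leaked.

def program(hi, lo):
    if hi > 0:
        lo = lo + 1
    else:
        lo = lo + 1
    return lo

# EqTr(lo, l0) -> EqTr(lo, l_end), checked over sampled trace pairs
for lo in range(5):
    for hi1 in (-1, 0, 1):
        for hi2 in (-1, 0, 1):
            assert program(hi1, lo) == program(hi2, lo)
```

A type system that forbids branching on H guards rejects this program outright, even though every sampled (and in fact every possible) pair of traces satisfies the property; the trace-logic encoding proves it for all traces.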