Proof-Carrying Parameters in Certified Symbolic Execution: The Case Study of Antiunification

10/22/2021
by Andrei Arusoaie, et al.
Alexandru Ioan Cuza University

Unification and antiunification are essential algorithms used by symbolic execution engines and verification tools. Complex frameworks for defining programming languages, such as K, aim to generate various tools (e.g., interpreters, symbolic execution engines, deductive verifiers, etc.) using only the formal definition of a language. K is the best-effort implementation of matching logic, a logical framework for defining languages. When used at an industrial scale, a tool like the K framework is constantly updated, and at the same time it is required to be trustworthy. Ensuring the correctness of such a framework by direct verification is practically impossible. A solution is to generate proof objects as correctness certificates that can be checked by an external trusted checker. In K, symbolic execution makes intensive use of unification and antiunification algorithms to handle conjunctions and disjunctions of term patterns. Conjunctions and disjunctions of formulae have to be automatically normalised, and the generation of proof objects needs to take such normalisations into account. The executions of these algorithms can be seen as parameters of the symbolic execution steps, and they have to provide proof objects that are then used to generate the proof object for the program execution step. We show in this paper that Plotkin's antiunification can be used to normalise disjunctions and to generate the corresponding proof objects. We provide a prototype implementation of our proof-object generation technique and a checker for certifying the generated objects.



1 Introduction

Matching logic [35, 11] (hereafter shorthanded as ML) is a logical framework which is used for specifying programming language semantics [21, 23, 29] and for reasoning about programs [19, 18, 39, 25, 6]. In ML, the operational semantics of a programming language is used both for the execution and for the verification of programs.

The best-effort implementation of ML is the K framework. K aims to be an ideal framework in which language designers only have to specify formally the syntax and the semantics of their language. The framework should then automatically generate, from the formal language specification and without additional effort, a set of tools: a parser, an interpreter, a compiler, a model checker, a deductive verifier, etc. In the last few years, K has been actively used on an industrial scale for the verification of programs that run on the blockchain (smart contracts), e.g., [30, 12]. Due to its complexity, a big challenge that K is currently facing is the trustworthiness of the framework. Right now K has about half a million lines of code, and it is hard to establish its correctness.

Recent research [9] tackles this trustworthiness issue by proposing an approach based on proof object generation. The key idea is to generate proof objects for the tasks performed by K and its autogenerated tools, and then use an external trustworthy proof checker to check the generated objects. This eliminates the need to trust the huge K implementation. So far, the authors of [9] have focused on formalising program executions as mathematical proofs and generating their corresponding proof objects. This required the development of a proof generator which uses an improved proof system of ML [16], and a proof checker implemented in Metamath [27]. The K definition of a language corresponds to an ML theory which consists of a set of symbols (representing the formal syntax of the language) and a set of axioms (specifying its formal semantics). In [9], program executions are specified using formulas of the form

$$\varphi \Rightarrow \varphi',$$

where $\varphi$ is the formula that specifies the initial state of the execution, $\varphi'$ is the formula that specifies the final state, and "$\Rightarrow$" states the rewriting/reachability relation between states. The correctness of an execution, $\vdash \varphi \Rightarrow \varphi'$, is witnessed by a formal proof, which uses the ML proof system.

For a given execution, the K tool computes the proof parameters needed by the proof generator to generate the corresponding proof object. The set of parameters consists of the complete execution trace and the rewriting/matching information given by the rules applied together with the corresponding matching substitutions.

Generating proof objects for concrete program executions is an important step towards a trustworthy K framework. A more challenging task is to generate proof objects for symbolic executions. Symbolic execution is a key component in program verification, and it has been used in K as well (e.g., [3, 24, 19]). The difficulty with symbolic execution is that the parameters of an execution step must carry more proof information than in the concrete case. First, instead of matching, proof parameters must include unification information. Second, path conditions need to be carried along the execution.

In ML, there is a natural way to deal with symbolic execution. ML patterns have a normal form $t \land \phi$, where $t$ is a term (pattern) and $\phi$ is a constraint (predicate). In particular, $t$ can be the program state configuration and $\phi$ the path condition. Patterns are evaluated to the set of values that match $t$ and satisfy $\phi$. To compute the symbolic successors of a pattern, say $t \land \phi$, with respect to a rule, say $l \Rightarrow r$, we need to unify the patterns $t$ and $l$. Because unification can be expressed as a conjunction in ML [4, 35], we can say that only the states matched by $t \land l \land \phi$ transit to states matched by the right-hand side. Expressing unification as a conjunction is an elegant feature of ML. However, in practice, unification algorithms are still needed to compute the symbolic successors of a pattern because they provide a unifying substitution $\sigma$. The symbolic successors are obtained by applying the unifying substitution to the right-hand side of the rule (e.g., $r\sigma$) and adding the substitution (as an ML formula) to the path condition. Unification algorithms are also used to normalise conjunctions of the form $t_1 \land t_2$, so that they consist of only one term and a constraint, $t \land \phi$. Therefore, the unification algorithms are parameters of the symbolic execution steps, and they must be used to generate the corresponding proof objects. In [4] we presented a solution for normalising conjunctions of term patterns using the syntactic unification algorithm [26]. The unification algorithm is instrumented to also generate a proof object for the equivalence between $t_1 \land t_2$ and its normal form.
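To make the role of the unifying substitution concrete, here is a minimal sketch of syntactic unification in the Martelli-Montanari style; the term encoding and function names are our own illustration, not the K or Maude implementation:

```python
# A minimal sketch of syntactic unification (Martelli-Montanari style).
# Terms are variables (strings) or (symbol, args) tuples; names are
# illustrative, not taken from the K/Maude code base.

def substitute(term, subst):
    """Apply a substitution (dict: variable -> term) to a term."""
    if isinstance(term, str):                       # variable
        return subst.get(term, term)
    sym, args = term
    return (sym, [substitute(a, subst) for a in args])

def occurs(var, term):
    if isinstance(term, str):
        return var == term
    return any(occurs(var, a) for a in term[1])

def unify(t1, t2):
    """Return the most general unifier of t1 and t2, or None."""
    subst, worklist = {}, [(t1, t2)]
    while worklist:
        a, b = worklist.pop()
        a, b = substitute(a, subst), substitute(b, subst)
        if a == b:
            continue                                # delete trivial pair
        if isinstance(a, str):                      # variable elimination
            if occurs(a, b):
                return None                         # occurs check fails
            subst = {v: substitute(t, {a: b}) for v, t in subst.items()}
            subst[a] = b
        elif isinstance(b, str):
            worklist.append((b, a))                 # orient
        else:
            (f, fa), (g, ga) = a, b
            if f != g or len(fa) != len(ga):
                return None                         # symbol clash
            worklist.extend(zip(fa, ga))            # decompose
    return subst

# unify(cons(x, nil), cons(zero, y)) == {x -> zero, y -> nil}; the
# conjunction of the two terms then normalises to cons(zero, nil)
# together with the constraint x = zero /\ y = nil.
```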

It is often the case that more than one rule can be applied during symbolic execution. For instance, if a second rule $l' \Rightarrow r'$ can also be applied to $t \land \phi$, then the set of target states must match $t_1 \land \phi'$ or $t_2 \land \phi'$, where $t_1$ and $t_2$ are the instances of the right-hand sides of the two rules. This set of states is matched by the disjunction $(t_1 \land \phi') \lor (t_2 \land \phi')$. In the case when both successors share the same constraint $\phi'$, the disjunction reduces to $(t_1 \lor t_2) \land \phi'$, whose term part $t_1 \lor t_2$ is not a normal form, but it can be normalised using antiunification.

Contributions

This paper continues the work in [9] and [4] towards a trustworthy semantics-based framework for programming languages. We use Plotkin's antiunification algorithm (1) to normalise disjunctions of term patterns and (2) to generate proof objects for the equivalence between disjunctions and their corresponding normal form. Each step performed by the antiunification algorithm on an input disjunction produces a formula which is equivalent to the disjunction. Each equivalence has a corresponding proof object. The generated proof objects (one for each equivalence) are assembled into a final proof object. We further provide (3) a prototype implementation of a proof generator and (4) a proof checker that we use for checking the generated objects.

Related work.

We use an applicative version of ML that is described in [13, 11]. This version of ML is shown to capture many logical frameworks, such as many-sorted first-order logic (MSFOL), constructor and term algebras with the general induction and coinduction principles, MSFOL with least fixpoints, order-sorted algebras, etc. A proof system for ML that is more amenable to automation was proposed in [16]. The authors improve their previous work [14] by adding to the proof system a context-driven fixpoint rule which can deal with goals that need wrapping and unwrapping of contexts. This was inspired by the work on the automation of inductive proofs (e.g., [22]) for Separation Logic [34].

K is a complex tool, and ensuring the correctness of the generated tools is hard to achieve in practice. ML supplies an appropriate underlying logical foundation for K [15]. However, if K could output proofs that can be checked by an external proof checker (based on the existing proof system of applicative ML), then there would be no need to verify K itself. Recent work shows that it is possible to generate proof objects for various tasks that the K-generated tools perform (e.g., program execution, simplifications, unification). For instance, in [9], the authors tackle program executions: these are formalised as mathematical proofs, and then their complete proof objects are generated. The proof objects can then be checked using a proof checker implemented in Metamath [27].

Unification and antiunification algorithms are used by the K tools, e.g., to handle the symbolic execution of programs written in languages formally defined in K [3, 24, 19]. The unification problem consists of finding the most general instance of two terms (see, e.g., [7]), which supplies the set of all common instances. This set of all common instances can be represented in ML as the conjunction of the two terms [35]. In [4], we have shown that the syntactic unification algorithm [26] can be used to find a "normal" form for the conjunction of two terms, and that it can be instrumented to generate a proof object for the corresponding equivalence. These transformations improve the efficiency of the prover implementation in K.

The antiunification problem is dual to unification: it consists of finding the most specific template (pattern) of two terms, and it was considered independently by Plotkin [32] and Reynolds [33]. Antiunification is used, e.g., in [31] to generalise proofs, in [20] to type-check pointcuts of an aspect, and in [40] to compute the least general type of two types. In this paper we use the antiunification algorithm for normalising disjunctions of term patterns and for generating proof objects for the corresponding equivalences. The proof objects are generated using an approach based on "macro"-level inference rules inspired from [28]. This is why we could not use the Metamath proof checker presented in [9, 16], and we developed our own proof checker. The classical untyped generalisation algorithm has been extended to an order-sorted typed setting with sorts, subsorts, and subtype polymorphism [1, 8, 2]. We claim that the approach proposed in this paper can be smoothly extended to the order-sorted case.

Organisation

Section 2 briefly introduces ML and its proof system. Section 3 presents an ML specification that captures the term algebra up to isomorphism. Only the case of non-mutually recursive sorts is considered. Section 4 includes the first main contribution, the representation of antiunification in ML. A soundness theorem in ML terms for Plotkin's antiunification algorithm is proved. Section 5 describes the second main contribution: the algorithm generating proof objects for the equivalences given by the antiunification algorithm and its implementation in Maude, together with a Maude proof checker that certifies the generated proofs. We conclude in Section 6.

2 Matching Logic

Matching logic (ML) [35] started as a logic over a particular case of constrained terms [36, 38, 18, 37, 39, 6, 25], but it is now developed as a full logical framework. We recall from [11, 13] the definitions and notions that we use in this paper.

Definition 1

A signature is a triple $(EV, SV, \Sigma)$, where $EV$ is a set of element variables, $SV$ is a set of set variables, and $\Sigma$ is a set of constant symbols (or constants). The set of $\Sigma$-patterns is generated by the grammar below, where $x \in EV$, $X \in SV$, and $\sigma \in \Sigma$:

$$\varphi ::= x \mid X \mid \sigma \mid \varphi_1\,\varphi_2 \mid \bot \mid \varphi_1 \to \varphi_2 \mid \exists x.\,\varphi \mid \mu X.\,\varphi \mid \lceil \varphi \rceil$$

The pattern $\lceil \varphi \rceil$ is called definedness. For convenience, we introduce it directly in the syntax of patterns, but it can be axiomatised (as in [35]). A pattern $\varphi$ is positive in $X$ if all free occurrences of $X$ in $\varphi$ are under an even number of negations. The pattern $\varphi_1\,\varphi_2$ is an application and, by convention, application is left associative. We specify an ML signature only by $\Sigma$ when $EV$ and $SV$ are clear from the context.

Remark 1

The patterns below are derived constructs:

$$\neg\varphi \equiv \varphi \to \bot \qquad \varphi_1 \lor \varphi_2 \equiv \neg\varphi_1 \to \varphi_2 \qquad \varphi_1 \land \varphi_2 \equiv \neg(\neg\varphi_1 \lor \neg\varphi_2)$$
$$\varphi_1 \leftrightarrow \varphi_2 \equiv (\varphi_1 \to \varphi_2) \land (\varphi_2 \to \varphi_1) \qquad \top \equiv \neg\bot \qquad \forall x.\,\varphi \equiv \neg\exists x.\,\neg\varphi$$
$$\lfloor \varphi \rfloor \equiv \neg\lceil \neg\varphi \rceil \qquad \varphi_1 = \varphi_2 \equiv \lfloor \varphi_1 \leftrightarrow \varphi_2 \rfloor \qquad x \in \varphi \equiv \lceil x \land \varphi \rceil$$

The pattern $\lfloor \varphi \rfloor$ is called totality and it is defined using the definedness pattern. The totality pattern is used to define other mathematical instruments like equality and membership. The priorities of the pattern constructs are given by this ordered list: application, $\neg$, $\land$, $\lor$, $\to$, $\leftrightarrow$, $\exists$, $\forall$, $\mu$, where application has the highest priority and the binders have the lowest priority. By convention, the scope of the binders extends as far as possible to the right, and parentheses can be used to restrict the scope of the binders.

Example 1

Let $\Sigma = \{\mathit{zero}, \mathit{succ}, \mathit{nil}, \mathit{cons}\}$ be an ML signature. Then $\mathit{zero}$, $\mathit{succ}$, $x$, $\mathit{succ}\;x$, $\mathit{succ}\;\mathit{zero}$, $\mathit{cons}\;\mathit{zero}\;\mathit{nil}$, $\exists x.\,\mathit{cons}\;x\;\mathit{nil}$, $\mathit{zero} \lor \mathit{succ}\;\mathit{zero}$, $\mu X.\,\mathit{zero} \lor \mathit{succ}\;X$ are examples of patterns.

We write $\varphi[\psi/x]$ and $\varphi[\psi/X]$ to denote the pattern obtained by substituting all free occurrences of $x$ and $X$, respectively, in $\varphi$ with $\psi$. In order to avoid variable capture, we consider that $\alpha$-renaming happens implicitly.

Patterns evaluate to the sets of elements that match them.

Definition 2

Given a signature $(EV, SV, \Sigma)$, a $\Sigma$-model is a triple $(M, \_\bullet\_, \{\sigma_M\}_{\sigma \in \Sigma})$ containing:

  • a nonempty carrier set $M$;

  • a binary function $\_\bullet\_ : M \times M \to \mathcal{P}(M)$ called application;

  • an interpretation $\sigma_M \subseteq M$ for every constant $\sigma \in \Sigma$, as a subset of $M$.

The application is extended to sets, $\_\bullet\_ : \mathcal{P}(M) \times \mathcal{P}(M) \to \mathcal{P}(M)$, as follows: $A \bullet B = \bigcup_{a \in A,\, b \in B} a \bullet b$, for all $A, B \subseteq M$.

Definition 3

Given a model $M$, an $M$-valuation is a function $\rho$ with $\rho(x) \in M$ for all $x \in EV$, and $\rho(X) \subseteq M$ for all $X \in SV$. The extension $\bar{\rho}$ of $\rho$ is defined as:
$\bar{\rho}(x) = \{\rho(x)\}$, for all $x \in EV$; $\bar{\rho}(X) = \rho(X)$, for all $X \in SV$; $\bar{\rho}(\sigma) = \sigma_M$, for all $\sigma \in \Sigma$; $\bar{\rho}(\varphi_1\,\varphi_2) = \bar{\rho}(\varphi_1) \bullet \bar{\rho}(\varphi_2)$; $\bar{\rho}(\bot) = \emptyset$; $\bar{\rho}(\varphi_1 \to \varphi_2) = (M \setminus \bar{\rho}(\varphi_1)) \cup \bar{\rho}(\varphi_2)$; $\bar{\rho}(\exists x.\,\varphi) = \bigcup_{a \in M} \overline{\rho[a/x]}(\varphi)$; $\bar{\rho}(\lceil \varphi \rceil) = M$ if $\bar{\rho}(\varphi) \neq \emptyset$ and $\emptyset$ otherwise; $\bar{\rho}(\mu X.\,\varphi) = \mathrm{lfp}(\mathcal{F})$, with $\mathcal{F}(A) = \overline{\rho[A/X]}(\varphi)$ for $A \subseteq M$, where $\mathrm{lfp}(\mathcal{F})$ is the unique least fixpoint given by the Knaster-Tarski fixpoint theorem [41].

Note that the definedness pattern has a two-valued semantics: it evaluates either to $M$ or to $\emptyset$. The patterns that evaluate to $M$ or $\emptyset$ are called predicates. Totality, equality, and membership are included in this category.

Remark 2

The $M$-evaluation is extended to the derived constructs as expected. For instance, we have $\bar{\rho}(\varphi_1 \lor \varphi_2) = \bar{\rho}(\varphi_1) \cup \bar{\rho}(\varphi_2)$ and $\bar{\rho}(\neg\varphi) = M \setminus \bar{\rho}(\varphi)$.

Example 2

Let $\Sigma$ be the signature in Example 1. A possible $\Sigma$-model is $M = \mathbb{N} \cup L$, where $\mathit{zero}_M = \{0\}$ and $\mathit{nil}_M = \{\epsilon\}$; $\mathit{succ}_M \bullet n = \{n + 1\}$ if $n \in \mathbb{N}$ and $\emptyset$ otherwise; and $\mathit{cons}_M \bullet n \bullet l = \{n : l\}$ if $n \in \mathbb{N}$ and $l \in L$, and $\emptyset$ otherwise. Here $\mathbb{N}$ is the set of natural numbers, and $L$ is the set of lists of naturals, written as $\epsilon$ or $n : l$ with $n \in \mathbb{N}$ and $l \in L$.

Since for any $M$-valuation $\rho$ we have $\bar{\rho}(\mathit{zero}) = \{0\}$, the pattern $\mathit{zero}$ matches the set $\{0\}$. Following the same idea, it is easy to see that $\mathit{succ}\;\mathit{zero}$ matches the singleton set $\{1\}$, and $\mathit{cons}\;\mathit{zero}\;\mathit{nil}$ matches $\{0 : \epsilon\}$. Moreover, the pattern $\exists x.\,\mathit{cons}\;x\;\mathit{nil}$ matches the set $\{n : \epsilon \mid n \in \mathbb{N}\}$.
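The set semantics of Definition 3 is directly executable on finite carriers. Below is a minimal Python sketch (our own encoding, loosely echoing Example 2; all names are illustrative) of the valuation extension, with the least fixpoint computed by iteration from the bottom:

```python
# A minimal sketch of the ML pattern semantics (Definition 3) over a
# finite model; encoding and names are illustrative, not from the paper.
from itertools import product

def evaluate(pat, M, app, interp, rho):
    """pat: ('evar',x) | ('svar',X) | ('sym',s) | ('app',p,q) | ('bot',)
            | ('imp',p,q) | ('exists',x,p) | ('mu',X,p)"""
    tag = pat[0]
    if tag == 'evar':
        return frozenset({rho[pat[1]]})                 # singleton set
    if tag == 'svar':
        return frozenset(rho[pat[1]])
    if tag == 'sym':
        return frozenset(interp[pat[1]])
    if tag == 'app':                                    # pointwise application
        A = evaluate(pat[1], M, app, interp, rho)
        B = evaluate(pat[2], M, app, interp, rho)
        return frozenset().union(*(app(a, b) for a, b in product(A, B)))
    if tag == 'bot':
        return frozenset()
    if tag == 'imp':                                    # (M \ A) union B
        return (M - evaluate(pat[1], M, app, interp, rho)) \
               | evaluate(pat[2], M, app, interp, rho)
    if tag == 'exists':                                 # union over all a in M
        return frozenset().union(*(
            evaluate(pat[2], M, app, interp, {**rho, pat[1]: a}) for a in M))
    if tag == 'mu':                                     # least fixpoint:
        A = frozenset()                                 # iterate from bottom
        while True:                                     # (pattern assumed
            nxt = evaluate(pat[2], M, app, interp,      # positive in X)
                           {**rho, pat[1]: A})
            if nxt == A:
                return A
            A = nxt

# Tiny hypothetical model: naturals 0..2 plus the 'succ' element; the
# application interprets succ functionally, everything else is empty.
M = frozenset({'succ', 0, 1, 2})
interp = {'zero': {0}, 'succ': {'succ'}}
app = lambda a, b: {b + 1} if a == 'succ' and isinstance(b, int) and b < 2 \
      else set()
# exists x . succ x  matches {1, 2}:
assert evaluate(('exists', 'x', ('app', ('sym', 'succ'), ('evar', 'x'))),
                M, app, interp, {}) == frozenset({1, 2})
```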

Definition 4

We say that $\varphi$ is valid in $M$, and write $M \models \varphi$, iff $\bar{\rho}(\varphi) = M$ for all valuations $\rho$. If $F$ is a set of patterns, then: $M \models F$ iff $M \models \varphi$ for all $\varphi \in F$, and $F \models \varphi$ iff $M \models F$ implies $M \models \varphi$ (for every model $M$). An ML specification is a pair $(\Sigma, F)$ with $\Sigma$ a signature and $F$ a set of $\Sigma$-patterns.

2.0.1 The Matching Logic Proof System

The proof system we use is shown below. The first part is the Hilbert-style proof system given in [11]. It contains four categories of rules: propositional tautologies, frame reasoning over application contexts, standard fixpoint reasoning, and two technical rules needed for the completeness result. In ML, an application context $C$ is a pattern with a distinguished placeholder variable $\square$ such that the path from the root of $C$ to $\square$ contains only applications. $C[\varphi]$ is a shorthand for $C[\varphi/\square]$, and $\mathit{FV}(\varphi)$ denotes the set of free variables of $\varphi$.

Hilbert-style proof system (rule names; the full rule schemas are given in [11]):

  • Propositional: any propositional tautology over patterns;
  • Modus Ponens: from $\varphi_1$ and $\varphi_1 \to \varphi_2$, derive $\varphi_2$;
  • $\exists$-Quantifier and $\exists$-Generalisation;
  • Propagation rules for $\bot$, $\lor$, and $\exists$ through application contexts;
  • Framing;
  • Set Variable Substitution;
  • Pre-Fixpoint and Knaster-Tarski (fixpoint reasoning);
  • Existence and Singleton (technical rules).
Plotkin's rule replaces a constraint $z \mapsto \langle f(u_1, \dots, u_n), f(v_1, \dots, v_n) \rangle$ with $z := f(z_1, \dots, z_n)$ together with the new constraints $z_i \mapsto \langle u_i, v_i \rangle$, where $z_1, \dots, z_n$ are fresh variables. If we want to compute the lgg of two terms $t_1$ and $t_2$, we build the initial antiunification problem $\{z \mapsto \langle t_1, t_2 \rangle\}$ with $z$ fresh, and we apply Plotkin's rule repeatedly. When this rule cannot be applied anymore, we say that the obtained antiunification problem is in solved form. The term obtained by applying the accumulated definitions $z := f(\dots)$ is the lgg of $t_1$ and $t_2$, while the remaining constraints define the two substitutions $\sigma_1$ and $\sigma_2$ such that the lgg instantiated by $\sigma_1$ is $t_1$ and instantiated by $\sigma_2$ is $t_2$. Note that the pairs $\langle \cdot, \cdot \rangle$ are not commutative. It has been proved in [32] that the above antiunification algorithm terminates and computes the lgg. In fact, for any input, the algorithm computes all the generalisations of $t_1$ and $t_2$, and it stops when it finds the lgg.
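The following is a minimal Python sketch of this algorithm (our own encoding, not the authors' Maude prototype; the term representation and names are illustrative). Matching Remark 3 below, repeated pairs are not shared, so every generated variable occurs at most once in the computed generalisation:

```python
# A minimal sketch of Plotkin's antiunification. Terms are variables
# (strings) or (symbol, args) tuples; names are illustrative only.
import itertools

_fresh = (f"z{i}" for i in itertools.count())       # fresh-variable supply

def is_var(t):
    return isinstance(t, str)

def substitute(term, subst):
    if is_var(term):
        return subst.get(term, term)
    sym, args = term
    return (sym, [substitute(a, subst) for a in args])

def antiunify(t1, t2):
    """Return (lgg, sigma1, sigma2) with substitute(lgg, sigma1) == t1
    and substitute(lgg, sigma2) == t2."""
    z0 = next(_fresh)
    store = {z0: (t1, t2)}                 # initial problem z0 -> <t1, t2>
    lgg = z0
    changed = True
    while changed:                         # apply Plotkin's rule repeatedly
        changed = False
        for z, (a, b) in list(store.items()):
            # the rule applies when both sides share top symbol and arity
            if (not is_var(a) and not is_var(b)
                    and a[0] == b[0] and len(a[1]) == len(b[1])):
                zs = [next(_fresh) for _ in a[1]]
                for zi, ai, bi in zip(zs, a[1], b[1]):
                    store[zi] = (ai, bi)   # new constraints zi -> <ai, bi>
                del store[z]
                lgg = substitute(lgg, {z: (a[0], zs)})
                changed = True
    # solved form: the leftover pairs define the two substitutions
    sigma1 = {z: a for z, (a, _) in store.items()}
    sigma2 = {z: b for z, (_, b) in store.items()}
    return lgg, sigma1, sigma2
```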
Lemma 1

Let $t_1$ and $t_2$ be two term patterns and $z$ a variable such that $z \notin \mathit{var}(t_1) \cup \mathit{var}(t_2)$. If $\{z \mapsto \langle t_1, t_2 \rangle\} \Rightarrow A_1 \Rightarrow \dots \Rightarrow A_n$ and $A_n$ is in solved form, then the term defined by $A_i$ is a generalisation of $t_1$ and $t_2$, for all $1 \le i \le n$.

Remark 3

If $\{z \mapsto \langle t_1, t_2 \rangle\}$ is the input antiunification problem and $z$ is fresh w.r.t. the variables of $t_1$ and $t_2$ (i.e., $z \notin \mathit{var}(t_1) \cup \mathit{var}(t_2)$), then Plotkin's algorithm generates only variables that are fresh w.r.t. the previously generated variables and $\mathit{var}(t_1) \cup \mathit{var}(t_2)$. Each generated variable occurs at most once in the computed generalisation and at most once in every substitution computed by the algorithm.

Example 3

Let us consider two term patterns $t_1$ and $t_2$ over the signature in Example 1. Using Plotkin's algorithm on the input $\{z \mapsto \langle t_1, t_2 \rangle\}$ (note that $z$ is fresh w.r.t. $\mathit{var}(t_1) \cup \mathit{var}(t_2)$), we obtain a sequence of antiunification problems whose last element is in solved form.

The lgg of $t_1$ and $t_2$ is the term $t$ defined by the solved form, while the substitutions $\sigma_1$ and $\sigma_2$ satisfy $t\sigma_1 = t_1$ and $t\sigma_2 = t_2$. The generated variables occur at most once in the computed lgg, and at most once in $\sigma_1$ and $\sigma_2$.
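As a concrete illustration of such a run, the sketch above can be executed on the hypothetical inputs $\mathit{cons}(\mathit{zero}, \mathit{nil})$ and $\mathit{cons}(\mathit{succ}(\mathit{zero}), \mathit{nil})$ (our own choice of terms, not necessarily those of the original example):

```python
# Hypothetical usage of the antiunify sketch (terms chosen for
# illustration; variable numbering assumes a fresh counter).
t1 = ("cons", [("zero", []), ("nil", [])])
t2 = ("cons", [("succ", [("zero", [])]), ("nil", [])])
lgg, s1, s2 = antiunify(t1, t2)
# lgg == ("cons", ["z1", ("nil", [])])      i.e. cons(z1, nil)
# s1  == {"z1": ("zero", [])}               i.e. {z1 -> zero}
# s2  == {"z1": ("succ", [("zero", [])])}   i.e. {z1 -> succ(zero)}
```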

We write $A{\downarrow}$ to denote the fact that the antiunification problem $A$ is in solved form.

4.1 Antiunification representation in ML

In ML, the lgg of $t_1$ and $t_2$ is given by their disjunction $t_1 \lor t_2$, that is, the pattern that matches the elements matched by $t_1$ or by $t_2$ (cf. Remark 2). Disjunctions over term patterns are difficult to handle in practice, and thus an equivalent normal form that has only one maximal structural component is convenient. We show that this form can be obtained using Plotkin's antiunification algorithm. Instead of $t_1 \lor t_2$, we use their lgg, say $t$, to capture the largest common structure of $t_1$ and $t_2$. The computed substitutions $\sigma_1$ and $\sigma_2$, with $t\sigma_1 = t_1$ and $t\sigma_2 = t_2$, are used to build a constraint $\phi$ such that $t \land \phi$ is equivalent to $t_1 \lor t_2$.

Definition 6

Let $\sigma = \{x_1 \mapsto u_1, \dots, x_n \mapsto u_n\}$ be a substitution. We denote by $\varphi^{\sigma}$ the predicate $x_1 = u_1 \land \dots \land x_n = u_n$.
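A small sketch of this definition, continuing the term encoding used above (the helper names are our own):

```python
# The predicate associated to a substitution (cf. Definition 6); helper
# names are illustrative, continuing the earlier term encoding.
def show(t):
    """Render a term (variable string or (symbol, args) tuple) as text."""
    if isinstance(t, str):
        return t
    sym, args = t
    return sym if not args else f"{sym}({', '.join(show(a) for a in args)})"

def subst_predicate(sigma):
    """{x1 -> u1, ..., xn -> un} becomes "x1 = u1 /\\ ... /\\ xn = un"."""
    eqs = [f"{x} = {show(u)}" for x, u in sorted(sigma.items())]
    return " /\\ ".join(eqs) if eqs else "true"

# subst_predicate({"z1": ("zero", [])}) == "z1 = zero"
```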

Lemma 2

For all term patterns $t$, $t_1$, and $t_2$, and for all substitutions $\sigma_1$ and $\sigma_2$ such that $t\sigma_1 = t_1$, $t\sigma_2 = t_2$, and $\mathit{dom}(\sigma_i) \cap \mathit{var}(t_i) = \emptyset$ for all $i \in \{1, 2\}$, we have that $t_1 \lor t_2$ is equivalent to $(\exists \bar{z}.\, t \land \varphi^{\sigma_1}) \lor (\exists \bar{z}.\, t \land \varphi^{\sigma_2})$, where $\bar{z} = \mathit{dom}(\sigma_1) \cup \mathit{dom}(\sigma_2)$.

Example 4

Recall $t_1$ and $t_2$ from Example 3, and let $t$ be a generalisation of them with substitutions $\sigma_1$ and $\sigma_2$. By Lemma 2, $t_1 \lor t_2$ is equivalent to $(\exists \bar{z}.\, t \land \varphi^{\sigma_1}) \lor (\exists \bar{z}.\, t \land \varphi^{\sigma_2})$. Let $t'$ be another generalisation of $t_1$ and $t_2$, with substitutions $\sigma_1'$ and $\sigma_2'$. Then $t_1 \lor t_2$ is also equivalent to $(\exists \bar{z}'.\, t' \land \varphi^{\sigma_1'}) \lor (\exists \bar{z}'.\, t' \land \varphi^{\sigma_2'})$.

Remark 4

Note that $\varphi^{\sigma}$ in Example 4 is equivalent to any reordering of its conjuncts, because the order of the equalities is not important in this context. So, we prefer more compact notations like $\bigwedge_{i} x_i = u_i$. Moreover, we use $\bar{z}$ instead of $z_1, \dots, z_n$ and $\exists \bar{z}$ instead of $\exists z_1 \dots \exists z_n$. By $\bar{z} \notin \mathit{var}(t)$ we mean that $z_i \notin \mathit{var}(t)$ for all $i$. So, the conclusion of Lemma 2 is $t_1 \lor t_2 \leftrightarrow (\exists \bar{z}.\, t \land \varphi^{\sigma_1}) \lor (\exists \bar{z}.\, t \land \varphi^{\sigma_2})$.

Using Lemma 2, the disjunction $t_1 \lor t_2$ is equivalent to $(\exists \bar{z}.\, t \land \varphi^{\sigma_1}) \lor (\exists \bar{z}.\, t \land \varphi^{\sigma_2})$, where $t$ is the lgg of $t_1$ and $t_2$. The pattern has one structural component $t$, but it appears twice! However, using a macro rule (i.e., $\lor$-Collapse from Figure 5) we obtain the equivalence with $\exists \bar{z}.\, t \land (\varphi^{\sigma_1} \lor \varphi^{\sigma_2})$. Moreover, since both equivalences hold, we obtain (by transitivity) the equivalence $t_1 \lor t_2 \leftrightarrow \exists \bar{z}.\, t \land (\varphi^{\sigma_1} \lor \varphi^{\sigma_2})$.
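A worked instance of this chain, on the hypothetical terms used earlier (our own illustration, written in LaTeX):

```latex
% Hypothetical instance: t1 = cons zero nil, t2 = cons (succ zero) nil,
% lgg t = cons z nil, sigma1 = {z -> zero}, sigma2 = {z -> succ zero}.
\begin{align*}
t_1 \lor t_2
  &\leftrightarrow (\exists z.\, \mathit{cons}\ z\ \mathit{nil} \land z = \mathit{zero})
     \lor (\exists z.\, \mathit{cons}\ z\ \mathit{nil} \land z = \mathit{succ}\ \mathit{zero})
     && \text{(Lemma 2)} \\
  &\leftrightarrow \exists z.\, \mathit{cons}\ z\ \mathit{nil} \land
     (z = \mathit{zero} \lor z = \mathit{succ}\ \mathit{zero})
     && \text{($\lor$-Collapse)}
\end{align*}
```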

Example 5

Recall the term patterns and from Example 3. Let be a generalisation of and with substitutions and . Using Lemma 2 and -Collapse, .

Let be another generalisation of and with and . Then .

At each step, Plotkin's algorithm computes generalisations of $t_1$ and $t_2$, until it reaches the lgg.

Antiunification problems are encoded as ML patterns as shown below:

Definition 7

For each antiunification problem $A = (t, \{z_1 \mapsto \langle u_1, v_1 \rangle, \dots, z_n \mapsto \langle u_n, v_n \rangle\})$ we define a corresponding pattern $\exists \bar{z}.\, t \land (\varphi^{\sigma_1} \lor \varphi^{\sigma_2})$, where $\bar{z} = z_1, \dots, z_n$, $\sigma_1 = \{z_i \mapsto u_i \mid 1 \le i \le n\}$, and $\sigma_2 = \{z_i \mapsto v_i \mid 1 \le i \le n\}$.

Example 6

We show here the corresponding encodings for the antiunification problems that occur in the execution of Plotkin's algorithm in Example 3:

  1. the initial problem $\{z \mapsto \langle t_1, t_2 \rangle\}$ is encoded as $\exists z.\, z \land (z = t_1 \lor z = t_2)$;

  2. each intermediate problem, with current generalisation $t$ and constraints $\{z_i \mapsto \langle u_i, v_i \rangle\}_i$, is encoded as $\exists \bar{z}.\, t \land (\varphi^{\sigma_1} \lor \varphi^{\sigma_2})$;

  3. the solved form, whose generalisation is the lgg $t$ with substitutions $\sigma_1$ and $\sigma_2$, is encoded as $\exists \bar{z}.\, t \land (\varphi^{\sigma_1} \lor \varphi^{\sigma_2})$.

A key observation here is that these encodings are all equivalent.
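Continuing the sketches above, the encoding of Definition 7 can be rendered as follows (names and rendering are our own):

```python
# A sketch of the encoding in Definition 7, reusing antiunify, show and
# subst_predicate from the earlier sketches.
def encode(lgg, sigma1, sigma2):
    zs = sorted(set(sigma1) | set(sigma2))
    prefix = f"exists {', '.join(zs)} . " if zs else ""
    return (f"{prefix}{show(lgg)} /\\ "
            f"(({subst_predicate(sigma1)}) \\/ ({subst_predicate(sigma2)}))")

# On the hypothetical terms used earlier:
#   encode(lgg, s1, s2)
#   == "exists z1 . cons(z1, nil) /\ ((z1 = zero) \/ (z1 = succ(zero)))"
```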

Theorem 4.1 shows that our approach is sound:

Theorem 4.1

(Soundness) Let $t_1$ and $t_2$ be two term patterns and $z$ a variable such that $z \notin \mathit{var}(t_1) \cup \mathit{var}(t_2)$. If $\{z \mapsto \langle t_1, t_2 \rangle\} \Rightarrow^{*} A$ and $A{\downarrow}$, then $t_1 \lor t_2$ is equivalent to the pattern corresponding to $A$ (cf. Definition 7).

5 Generating Proof Objects for Antiunification

In this section we describe how the antiunification algorithm can be instrumented in order to generate proof objects. The main idea, which can be used for a larger class of term-algebra-based algorithms, is as follows (a sketch of the assembly step is shown after the list):

  • each execution step of the algorithm generates a proof obligation, namely the equivalence between the encodings of the current and of the next antiunification problem;

  • for each generic step we design a proof schema that, when instantiated on the parameters of the execution step, generates a proof object for its proof obligation;

  • we obtain a proof object for the final equivalence by combining the proof objects corresponding to the execution steps using transitivity.
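A minimal sketch of this assembly (hypothetical data types and names; the actual generator is implemented in Maude):

```python
# A sketch of the assembly step: each algorithm step yields a proof object
# for one equivalence phi_i <-> phi_{i+1}; the final proof object chains
# them by transitivity of <->. Types and names are illustrative.
from dataclasses import dataclass

@dataclass
class Proof:
    rule: str              # name of the (macro) rule or schema instantiated
    conclusion: str        # the equivalence "phi <-> phi'" it certifies
    premises: tuple = ()   # sub-proofs used by the rule

def step_proof(phi, phi_next):
    """Instantiate the proof schema of one execution step (stub)."""
    return Proof(rule="Step-Schema", conclusion=f"{phi} <-> {phi_next}")

def assemble(phis):
    """Chain the step proofs into phi_0 <-> phi_n (needs len(phis) >= 2)."""
    proof = step_proof(phis[0], phis[1])
    for phi_prev, phi_next in zip(phis[1:], phis[2:]):
        step = step_proof(phi_prev, phi_next)
        proof = Proof(rule="Trans",
                      conclusion=f"{phis[0]} <-> {phi_next}",
                      premises=(proof, step))
    return proof
```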

5.1 The Proof System Used for Proof Generation

The structure of the proof certificates generated by our approach supplies proof evidence of the algorithm execution. This is accomplished with the help of the additional macro rules shown in Figure 5. Proving these rules using the ML proof system presented in Section 2 is out of the scope of this paper. The soundness of these rules is proved in Appendix 7 and Appendix 8.

Additional proof rules

$\exists$-Ctx: $\dfrac{\varphi_2 \leftrightarrow \varphi_2'}{(\exists \bar{x}.\, \varphi_1 \land \varphi_2) \leftrightarrow (\exists \bar{x}.\, \varphi_1 \land \varphi_2')}$

$\exists$-Scope