Explicit Auditing

08/01/2018 ∙ by Wilmer Ricciotti, et al.

The Calculus of Audited Units (CAU) is a typed lambda calculus resulting from a computational interpretation of Artemov's Justification Logic under the Curry-Howard isomorphism; it extends the simply typed lambda calculus by providing audited types, inhabited by expressions carrying a trail of their past computation history. Unlike most other auditing techniques, CAU allows the inspection of trails at runtime as a first-class operation, with applications in security, debugging, and transparency of scientific computation. An efficient implementation of CAU is challenging: not only do the sizes of trails grow rapidly, but they also need to be normalized after every beta reduction. In this paper, we study how to reduce terms more efficiently in an untyped variant of CAU by means of explicit substitutions and explicit auditing operations, finally deriving a call-by-value abstract machine.

1 Introduction

Transparency is an increasing concern in computer systems. For complex systems, whose desired behavior may be difficult to formally specify, auditing is an important complement to traditional techniques for verification and static analysis for security [2, 6, 12, 27, 19, 16], program slicing [22, 26], and provenance [21, 24]. However, formal foundations of auditing as a programming language primitive are not yet well established: most approaches view auditing as an extra-linguistic operation rather than a first-class construct. Recently, Bavera and Bonelli [14] introduced a calculus in which recording and analyzing audit trails are first-class operations. They proposed a λ-calculus, called the calculus of audited units, or CAU, based on a Curry-Howard correspondence with Justification Logic [9, 10, 8, 7]. In recent work, we developed a simplified form of CAU and proved strong normalization [25].

The type system of CAU is based on modal logic, following Pfenning and Davies [23]. It provides a type [s]A of audited units, where s is “evidence”, or the expression that was evaluated to produce the result of type A. Expressions of this type contain a value of type A along with a “trail” explaining how that value was obtained by evaluating s. Trails are essentially (skeletons of) proofs of reduction of terms, which can be inspected by structural recursion using a special language construct.

To date, most work on foundations of auditing has focused on design, semantics, and correctness properties, and relatively little attention has been paid to efficient execution, while most work on auditing systems has neglected these foundational aspects. Some work on tracing and slicing has investigated the use of “lazy” tracing [22]; however, to the best of our knowledge there is no prior work on how to efficiently evaluate a language such as CAU in which auditing is a built-in operation. This is the problem studied in this paper.

A naïve approach to implementing the semantics of CAU as given by Bavera and Bonelli immediately runs into the following problem. A CAU reduction first performs a principal contraction (e.g. beta reduction), which typically introduces a local trail annotation describing the reduction, and this annotation can block further beta reductions. The local trail annotations are then moved up to the nearest enclosing audited unit constructor using one or more permutation reductions. For example:

$$!_q\,F[(\lambda x.M)\,N] \;\to\; !_q\,F[\mathbf{ba} \triangleright M\{N/x\}] \;\to^{*}\; !_{\mathbf{t}(q,\,F[\mathbf{ba}])}\,F[M\{N/x\}]$$

Here F is a bang-free evaluation context, and F[ba] is a subtrail that indicates where in F the β-step was performed. As the size of the term being executed grows, and with it the distance between an audited unit constructor and the redexes, this evaluation strategy slows down quadratically in the worst case. Eagerly materializing the trails likewise imposes additional storage cost.
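To make the per-step cost concrete, the following Python sketch (ours, not the paper's) computes the trail that the eager strategy attaches to the enclosing bang after one β-step. It wraps ba in one congruence per layer of F and then composes the result with q. The tagged-tuple encoding of trails and the representation of F as a list of hypothetical layer names are assumptions of the sketch.

```python
# Sketch only: trails are tagged tuples such as ("r",), ("ba",), ("t", q1, q2),
# ("lam", q) and ("app", q1, q2). A bang-free context F is a list of layer
# names from the bang down to the hole (hypothetical encoding):
#   "lam"   -> hole in a lambda body
#   "app_l" -> hole in the function position of an application
#   "app_r" -> hole in the argument position of an application

R = ("r",)  # reflexivity: no computation


def context_trail(path, step):
    """Build F[step]: one congruence per context layer, reflexivity elsewhere."""
    q = step
    for layer in reversed(path):  # wrap the innermost layer first
        if layer == "lam":
            q = ("lam", q)
        elif layer == "app_l":
            q = ("app", q, R)
        elif layer == "app_r":
            q = ("app", R, q)
        else:
            raise ValueError(f"unknown context layer: {layer!r}")
    return q


def eager_step_trail(q, path, step=("ba",)):
    """Trail of !_q F[...] once the local ba annotation is permuted up: t(q, F[ba])."""
    return ("t", q, context_trail(path, step))


# A beta step inside the body of a lambda that is itself applied: F = (λ.[ ]) N
print(eager_step_trail(R, ["app_l", "lam"]))
# ('t', ('r',), ('app', ('lam', ('ba',)), ('r',)))
```

Building F[ba] takes time proportional to the depth of F. The permutation steps that move the annotation up pay this price again at every contraction.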

While some computational overhead seems inevitable to accommodate auditing, both of these costs can in principle be mitigated. Trail permutations are computationally expensive and can often be delayed without any impact on the final outcome. Eagerly pushing trails to the closest enclosing bang serves no real purpose. It would be more efficient to keep each trail where it was created and perform normalization only if and when the trail must be inspected. Even then, no actual push-out of trails is required, because the term structure can be reused to compute the trail structure on the fly.

This situation has a well-studied analogue: in the λ-calculus, it is not necessarily efficient to eagerly perform all substitutions as soon as a β-reduction happens. Instead, calculi of explicit substitutions such as Abadi et al.’s λσ [1] have been developed, in which substitutions are explicitly tracked and rewritten. Explicit substitution calculi have been studied extensively as a bridge between the declarative rewriting rules of λ-calculi and efficient implementations. Inspired by this work, we hypothesize that calculi with auditing can be implemented more efficiently by delaying the operations of trail extraction and erasure. To do so, we use explicit symbolic representations for these operations instead of performing them eagerly.

Particular care must be taken to ensure that the trails we produce still correctly describe the order in which operations were actually performed (e.g. respecting call-by-name or call-by-value reduction). When we perform a principal contraction, pre-existing trail annotations must be recorded as history that happened before the contraction, and not after. In the original eager reduction style this is trivial, because we never contract terms containing trails. We will show, however, that thanks to explicit trail operations, correctness can be achieved even when trails are normalized lazily. Accordingly, we introduce explicit terms for delayed trail erasure ⌊M⌋ and delayed trail extraction ⌈M⌉. We can use these features to decrease the cost of normalization. For instance, the β-reduction above can be replaced by a rule that delays the treatment of both substitution and trails:

$$(\lambda.M)\,N \;\to\; \mathbf{t}(\mathbf{app}(\mathbf{lam}(\lceil M \rceil), \lceil N \rceil), \mathbf{ba}) \triangleright \lfloor M \rfloor[\lfloor N \rfloor]$$

Here, we use de Bruijn notation [15] (as in λσ, and anticipating Sections 3 and 4), and write M[N] for the explicit substitution of N for the outermost bound variable of M. The trail constructor t stands for transitive composition of trails, while app and lam are congruence rules on trails. The trail t(app(lam(⌈M⌉), ⌈N⌉), ba) therefore says that the redex’s trail is constructed by extracting the latent trail information from M and N, combining it appropriately, and then performing a ba step. The usual contractum itself is obtained by substituting the erased argument ⌊N⌋ into the erased function body ⌊M⌋. This may look more verbose than the earlier beta reduction. However, all the additional work done to create the trail would have been done anyway in the eager system, while the lazy trail-extraction and trail-erasure operations give us many more ways to do the remaining work efficiently. For example, if the trail is never subsequently used, we can simply discard it without doing any more work.
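A minimal sketch of the lazy rule follows. It assumes a tagged-tuple encoding of our own, in which "extract", "erase" and "sub" are hypothetical tags for ⌈M⌉, ⌊M⌋ and M[N]. Contracting the redex inspects neither M nor N, and the trail is only computed when `force` is called. The `extract` helper covers just variables, abstractions, applications and annotations. Its ordering of an annotation before the inner history is our assumption.

```python
# Hypothetical encoding (our assumption, not the paper's syntax):
#   terms:  ("var", n) | ("lam", M) | ("app", M, N) | ("ann", q, M)  # q ▷ M
#           ("erase", M) | ("sub", M, N)                          # ⌊M⌋, M[N]
#   trails: ("r",) | ("ba",) | ("t", q1, q2) | ("lam", q) | ("app", q1, q2)
#           ("extract", M)                                        # delayed ⌈M⌉


def lazy_beta(redex):
    """(λ.M) N  →  t(app(lam(⌈M⌉), ⌈N⌉), ba) ▷ ⌊M⌋[⌊N⌋], in constant time."""
    tag, fun, arg = redex
    if tag != "app" or fun[0] != "lam":
        raise ValueError("not a beta redex")
    body = fun[1]
    trail = ("t", ("app", ("lam", ("extract", body)), ("extract", arg)), ("ba",))
    return ("ann", trail, ("sub", ("erase", body), ("erase", arg)))


def extract(m):
    """Eagerly extract the latent trail of a term built from var/lam/app/ann."""
    tag = m[0]
    if tag == "var":
        return ("r",)
    if tag == "lam":
        return ("lam", extract(m[1]))
    if tag == "app":
        return ("app", extract(m[1]), extract(m[2]))
    if tag == "ann":  # assumed order: q first, then the history inside m
        return ("t", m[1], extract(m[2]))
    raise ValueError(f"no extraction rule for {tag!r} in this sketch")


def force(q):
    """Evaluate every delayed ("extract", M) node inside a trail, on demand."""
    if q[0] == "extract":
        return extract(q[1])
    return (q[0],) + tuple(force(sub) for sub in q[1:])


body = ("ann", ("ba",), ("var", 1))  # lambda body carrying an older trail
arg = ("lam", ("var", 1))
contractum = lazy_beta(("app", ("lam", body), arg))
print(force(contractum[1]))
# ('t', ('app', ('lam', ('t', ('ba',), ('r',))), ('lam', ('r',))), ('ba',))
```

If the contractum's trail is never forced, the `extract` nodes are simply dropped. This is the saving described above.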

Contributions

We study an extension of Abadi et al.’s calculus λσ [1] with explicit auditing operations. We consider a simplified, untyped variant of the Calculus of Audited Units (Section 2); this simplifies our presentation because type information is not needed during execution. We revisit λσ in Section 3, extend it to include auditing and trail inspection features, and discuss problems with this initial, naïve approach. We address these problems by developing a new calculus with explicit versions of the “trail extraction” and “trail erasure” operations (Section 4). We show that it correctly refines the untyped Calculus of Audited Units (subject to an obvious translation). In Section 5, we build on this calculus to define an abstract machine for audited computation and prove its correctness. Details of the proofs are included in the appendix.

2 The Untyped Calculus of Audited Units

The language presented here is an untyped version of Bavera and Bonelli’s calculus [14] and of Ricciotti and Cheney’s simplified variant [25]. It is obtained by erasing all typing information and a few other related technicalities. This allows us to address all the interesting issues related to the reduction of CAU terms with a much less pedantic syntax. To help explain the details of the calculus, we adapt some examples from our previous paper [25]; other examples are described by Bavera and Bonelli [14].

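Unlike the typed variant of the calculus, we only need one sort of variables, denoted by the letters x, y, z. The syntax of the untyped calculus is as follows:

$$\begin{array}{llcl}
\text{Terms} & M, N & ::= & x \mid \lambda x.M \mid M\,N \mid \mathbf{let}_!(x := M, N) \mid\ !_q M \mid q \triangleright M \mid \iota(\vartheta) \\
\text{Trails} & q, q' & ::= & \mathbf{r} \mid \mathbf{t}(q, q') \mid \mathbf{ba} \mid \mathbf{bb} \mid \mathbf{ti} \mid \mathbf{lam}(q) \mid \mathbf{app}(q, q') \mid \mathbf{let}(q, q') \mid \mathbf{trpl}(\zeta)
\end{array}$$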

The untyped calculus extends the pure λ-calculus with audited units !_q M (colloquially, “bang M”), whose purpose is to decorate the term M with a log q of its computation history, called a trail in our terminology. When M evolves as a result of computation, q is updated by adding information about the reduction rules that have been applied. The form !_q M is in general not intended for use in source programs. Instead, we write !M for !_r M, where r represents the empty execution history (the reflexivity trail).

Audited units can then be employed in larger terms by means of the “let-bang” operator, which unpacks an audited unit and thus allows us to access its contents. The variable declared by a let_! is bound in its second argument. In essence, let_!(x := !_q M, N) reduces to N{M/x}, where free occurrences of x have been replaced by M. The trail q is not discarded, but is used to produce a new trail explaining this reduction.

The expression form q ▷ M is an auxiliary, intermediate annotation of M with partial history information q. It is produced during execution and will eventually be stored in the closest surrounding bang.

Example 1

In the untyped calculus we can express history-carrying terms explicitly. For instance, suppose we use n̄ to denote the Church encoding of a natural number n, and plus and fact for λ-terms computing addition and factorial on this representation. We can then write audited units like

where q is a trail representing the history of the first unit, i.e., for instance, a witness for the computation that produced its contents. Likewise, q′ might describe how the contents of the second unit were computed (for instance, by an application of fact). Supposing we wish to add these two numbers together while retaining their history, we use the let_! construct to look inside them:

where the final trail is produced by composing q and q′. If this reduction happens inside an external bang, the resulting trail will eventually be captured by it.

Trails, representing sequences of reduction steps, encode the (possibly partial) computation history of a given subterm. The main building blocks of trails are ba (representing standard beta reduction), bb (contraction of a let-bang redex) and ti (denoting the execution of a trail inspection). For every class of terms we have a corresponding congruence trail: lam, app, let, and trpl, the last of which is associated with trail inspections. The sole exception is bangs, which need no congruence rule because they capture all the computation happening inside them. The syntax of trails is completed by reflexivity r (representing a null computation history, i.e. a term that has not reduced yet) and transitivity t (i.e. sequential composition of execution steps). As discussed in our earlier paper [25], we omit Bavera and Bonelli’s symmetry trail form.
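One possible transcription of this grammar into Python uses frozen dataclasses. The class names are our own choices, not part of the calculus. So are the grouping of ba, bb and ti into a single `Step` class, the grouping of lam, app and let into `Cong`, and the representation of trail replacements as nine-element tuples.

```python
from dataclasses import dataclass

# Fixed order of the nine trail constructors, used for trail replacements.
TRAIL_CONSTRUCTORS = ("r", "t", "ba", "bb", "ti", "lam", "app", "let", "trpl")


# ---------- trails ----------
@dataclass(frozen=True)
class Refl:  # r: null computation history
    pass


@dataclass(frozen=True)
class Trans:  # t(q1, q2): q1 followed by q2
    first: object
    second: object


@dataclass(frozen=True)
class Step:  # "ba", "bb" or "ti": a single principal contraction
    kind: str


@dataclass(frozen=True)
class Cong:  # lam(q), app(q1, q2) or let(q1, q2)
    kind: str
    args: tuple


@dataclass(frozen=True)
class TrailRepl:  # trpl(ζ): congruence for trail inspection
    branches: tuple  # nine trails, in TRAIL_CONSTRUCTORS order


# ---------- terms (named variables, as in Section 2) ----------
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    var: str
    body: object


@dataclass(frozen=True)
class App:
    fun: object
    arg: object


@dataclass(frozen=True)
class Bang:  # !_q M
    trail: object
    body: object


@dataclass(frozen=True)
class Annot:  # q ▷ M, later moved to the nearest enclosing bang
    trail: object
    body: object


@dataclass(frozen=True)
class LetBang:  # let_!(x := M, N), with x bound in N
    var: str
    bound: object
    body: object


@dataclass(frozen=True)
class Inspect:  # ι(ϑ): one branch term per trail constructor
    branches: tuple


def bang(m):
    """Source-level !M, i.e. !_r M with an empty history."""
    return Bang(Refl(), m)
```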

Example 2

We build a pair of natural numbers using Church’s encoding:

The trail for the first computation step is obtained by transitivity (trail constructor t) from the original trivial trail (r, i.e. reflexivity) composed with app(ba, r), which describes the reduction of the applied lambda. The subtrail ba is wrapped in the congruence app because the reduction takes place inside the left-hand subterm of an application. The other argument of app is reflexivity, because no reduction takes place in the right-hand subterm.

The second beta-reduction happens at the top level and is thus not wrapped in a congruence. It is combined with the previous trail by means of transitivity.
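The trail of Example 2 can be reproduced mechanically. The Python sketch below assumes 0-based de Bruijn indices (Section 3 uses 1-based ones), a leftmost-outermost strategy, and the eager composition t(q, F[ba]) after every step. The components chosen for the pair (an identity and a two-argument projection) are arbitrary.

```python
# Sketch: pure λ-terms with 0-based de Bruijn indices, as tagged tuples
#   ("var", i) | ("lam", body) | ("app", fun, arg)
R = ("r",)


def shift(t, d, cutoff=0):
    """Add d to every free index >= cutoff."""
    if t[0] == "var":
        return ("var", t[1] + d) if t[1] >= cutoff else t
    if t[0] == "lam":
        return ("lam", shift(t[1], d, cutoff + 1))
    return ("app", shift(t[1], d, cutoff), shift(t[2], d, cutoff))


def subst(t, j, s):
    """Capture-avoiding t[j := s]."""
    if t[0] == "var":
        return s if t[1] == j else t
    if t[0] == "lam":
        return ("lam", subst(t[1], j + 1, shift(s, 1)))
    return ("app", subst(t[1], j, s), subst(t[2], j, s))


def step(t):
    """One leftmost-outermost β-step: (new term, F[ba]), or None if t is normal."""
    if t[0] == "app":
        f, a = t[1], t[2]
        if f[0] == "lam":
            return shift(subst(f[1], 0, shift(a, 1)), -1), ("ba",)
        r = step(f)
        if r is not None:
            return ("app", r[0], a), ("app", r[1], R)
        r = step(a)
        if r is not None:
            return ("app", f, r[0]), ("app", R, r[1])
    elif t[0] == "lam":
        r = step(t[1])
        if r is not None:
            return ("lam", r[0]), ("lam", r[1])
    return None


def run_bang(t, q=R):
    """Normalize the body of !_q t, composing trails eagerly: q := t(q, F[ba])."""
    while (r := step(t)) is not None:
        t, local = r
        q = ("t", q, local)
    return t, q


PAIR = ("lam", ("lam", ("lam", ("app", ("app", ("var", 0), ("var", 2)), ("var", 1)))))
M = ("lam", ("var", 0))
N = ("lam", ("lam", ("var", 0)))
body, trail = run_bang(("app", ("app", PAIR, M), N))
print(trail)  # ('t', ('t', ('r',), ('app', ('ba',), ('r',))), ('ba',))
```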

The last term form, ι(ϑ), called trail inspection, performs primitive recursion on the computation history of the current audited unit. The metavariables ϑ and ζ associated with trail inspections are trail replacements, i.e. maps that associate each possible trail constructor with a term or with a trail, respectively:

When the trail constructors are irrelevant for a certain ϑ or ζ, we omit them and simply list the nine components in order. These constructs represent (or describe) the nine cases of a structural recursion operator over trails, which we write as qϑ.

Definition 1

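The operation qϑ, which produces a term by structural recursion on q applying the inspection branches ϑ, is defined as follows:

$$\begin{aligned}
\mathbf{r}\vartheta &= \vartheta(\mathbf{r}) & \mathbf{lam}(q)\vartheta &= \vartheta(\mathbf{lam})\,(q\vartheta)\\
\mathbf{t}(q,q')\vartheta &= \vartheta(\mathbf{t})\,(q\vartheta)\,(q'\vartheta) & \mathbf{app}(q,q')\vartheta &= \vartheta(\mathbf{app})\,(q\vartheta)\,(q'\vartheta)\\
\mathbf{ba}\vartheta &= \vartheta(\mathbf{ba}) & \mathbf{let}(q,q')\vartheta &= \vartheta(\mathbf{let})\,(q\vartheta)\,(q'\vartheta)\\
\mathbf{bb}\vartheta &= \vartheta(\mathbf{bb}) & \mathbf{trpl}(\zeta)\vartheta &= \vartheta(\mathbf{trpl})\,(\zeta\vartheta)\\
\mathbf{ti}\vartheta &= \vartheta(\mathbf{ti})
\end{aligned}$$

where the sequence ζϑ is obtained from ζ (the trail replacement carried by trpl) by pointwise recursion.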
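Definition 1 is an ordinary fold over trails. The Python sketch below uses our own encoding: tagged tuples, with a trpl node carrying a dictionary from constructor names to subtrails. The branches are Python values and functions rather than terms of the calculus. Nullary constructors map to values, and the others map to functions of the recursive results. The `COUNT` replacement anticipates Example 3, with `sum` standing in for nine-argument addition.

```python
# Sketch: trails are tagged tuples; ("trpl", zeta) holds a dict mapping each
# of the nine constructor names to a subtrail.
ORDER = ("r", "t", "ba", "bb", "ti", "lam", "app", "let", "trpl")


def fold(q, theta):
    """Structural recursion q·theta over a trail (Definition 1)."""
    tag, args = q[0], q[1:]
    if tag in ("r", "ba", "bb", "ti"):  # nullary constructors
        return theta[tag]
    if tag == "trpl":  # pointwise recursion on the nine subtrails
        zeta = args[0]
        return theta["trpl"](*(fold(zeta[c], theta) for c in ORDER))
    return theta[tag](*(fold(a, theta) for a in args))  # t, lam, app, let


# Pretty-print a trail in the notation used in the text.
SHOW = {
    "r": "r", "ba": "ba", "bb": "bb", "ti": "ti",
    "t": lambda a, b: f"t({a}, {b})",
    "lam": lambda a: f"lam({a})",
    "app": lambda a, b: f"app({a}, {b})",
    "let": lambda a, b: f"let({a}, {b})",
    "trpl": lambda *xs: "trpl(" + ", ".join(xs) + ")",
}

# Count contraction steps, in the spirit of Example 3.
COUNT = {
    "r": 0, "ba": 1, "bb": 1, "ti": 1,
    "t": lambda a, b: a + b,
    "lam": lambda a: a,
    "app": lambda a, b: a + b,
    "let": lambda a, b: a + b,
    "trpl": lambda *xs: sum(xs),  # nine-argument addition
}

pair_trail = ("t", ("t", ("r",), ("app", ("ba",), ("r",))), ("ba",))
print(fold(pair_trail, SHOW))   # t(t(r, app(ba, r)), ba)
print(fold(pair_trail, COUNT))  # 2
```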

Example 3

Trail inspection can be used to count all of the contraction steps in the history of an audited unit, by means of the following trail replacement:

where the branch for trpl is a variant of addition taking nine arguments, as required by the arity of trpl. For example, we can count the contractions in an audited unit as:

2.1 Reduction

Reduction in the untyped calculus includes rules to contract three kinds of redexes. These are the usual beta redexes (applied lambda abstractions), “beta-bang” redexes (which unpack the bang term appearing as the definiens of a let_!), and trail inspections. These rules, which we call principal contractions, are defined as follows:

$$\begin{aligned}
(\lambda x.M)\,N &\to \mathbf{ba} \triangleright M\{N/x\}\\
\mathbf{let}_!(x := !_q M, N) &\to \mathbf{bb} \triangleright N\{q \triangleright M/x\}\\
!_q\,F[\iota(\vartheta)] &\to\ !_q\,F[\mathbf{ti} \triangleright q\vartheta]
\end{aligned}$$

Substitution is defined in the traditional way, avoiding variable capture. The first contraction is familiar, except that the reduct has been annotated with the trail ba. The second one deals with unpacking a bang: from !_q M we obtain q ▷ M, which is then substituted for x in the target term N, and the resulting term is annotated with the trail bb. The third contraction defines the result of a trail inspection ι(ϑ). Trail inspection is contracted by capturing the current history q, as stored in the nearest enclosing bang, and performing structural recursion on it according to the branches defined by ϑ. The concept of “nearest enclosing bang” is made formal by contexts F in which the hole cannot appear inside a bang (bang-free contexts, for short):

The definition of the principal contractions is completed, as usual, by a contextual closure rule stating that they can be applied in any context C:

The principal contractions introduce local trail subterms q ▷ M, which can block other reductions. Furthermore, the rule for trail inspection assumes that the trail q annotating the enclosing bang really is a complete log of the history of the audited unit. At the same time, the rule violates this invariant, because the trail ti created by the contraction is not merged with the original history q.

For these reasons, we only want to perform principal contractions on terms not containing local trails: after each principal contraction, we apply the following rewrite rules, called permutation reductions, to ensure that the local trail is moved to the nearest enclosing bang:

Moreover, the following rules are added to the permutation relation to ensure confluence:

As usual, permutation reduction is completed by a contextual closure rule, and it enjoys the following property:

Lemma 1 ([14])

Permutation reduction is terminating and confluent.
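The combined effect of exhaustive permutation reduction can be approximated by a single bottom-up pass. In this Python sketch (our own, using named variables and tagged tuples, with no trail inspection), each subterm returns the trail that floats out of it, and bangs absorb whatever reaches them. When both sides of an application or let_! carry trails, the sketch emits one congruence app(q1, q2) or let(q1, q2). We assume this is the shape the confluence rules are meant to produce, since the rules themselves are not reproduced here.

```python
# Sketch: named terms as tagged tuples
#   ("var", x) | ("lam", x, M) | ("app", M, N) | ("let", x, M, N)  # let_!(x := M, N)
#   ("ann", q, M)  # q ▷ M      ("bang", q, M)  # !_q M
# Trail inspection is left out to keep the sketch short.
R = ("r",)


def compose(q, inner):
    """t(q, inner), or just q when nothing floats up from inside."""
    return q if inner is None else ("t", q, inner)


def float_trails(m):
    """Return (q, m2): m2 has no annotations outside bangs, and q is the trail
    that floats out of m (None if there is none)."""
    tag = m[0]
    if tag == "var":
        return None, m
    if tag == "lam":
        q, body = float_trails(m[2])
        return (None if q is None else ("lam", q)), ("lam", m[1], body)
    if tag == "app":
        q1, f = float_trails(m[1])
        q2, a = float_trails(m[2])
        q = None if q1 is None and q2 is None else ("app", q1 or R, q2 or R)
        return q, ("app", f, a)
    if tag == "let":
        q1, bound = float_trails(m[2])
        q2, body = float_trails(m[3])
        q = None if q1 is None and q2 is None else ("let", q1 or R, q2 or R)
        return q, ("let", m[1], bound, body)
    if tag == "ann":  # q ▷ (q' ▷ M) becomes t(q, q') ▷ M
        inner, body = float_trails(m[2])
        return compose(m[1], inner), body
    if tag == "bang":  # !_q (q' ▷ M) becomes !_t(q, q') M
        inner, body = float_trails(m[2])
        return None, ("bang", compose(m[1], inner), body)
    raise ValueError(f"unsupported term: {tag!r}")


term = ("bang", R, ("app", ("ann", ("ba",), ("var", "x")), ("var", "y")))
print(float_trails(term))
```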

When a binary relation on terms is terminating and confluent, every term has a unique normal form with respect to it. Principal contractions must be performed on terms in normal form with respect to permutation reduction. It is therefore convenient to merge contraction and permutation normalization into a single operation:

Example 4

We take again the term from Example 1 and reduce the outer let_! as follows:

This beta-bang reduction substitutes the contents of the first audited unit, annotated with its trail, for the bound variable. A trail bb is produced immediately inside the bang, in the same position as the redex. We then normalize the resulting term with respect to permutation reduction, so that the two trails are combined and used to annotate the enclosing bang.

3 Naïve explicit substitutions

We seek to adapt existing abstract machines for the efficient normalization of λ-terms to the untyped CAU. Generally speaking, most abstract machines act on nameless terms, using de Bruijn indices [15]. This avoids the need for renaming to prevent variable capture when substituting one term into another.

Moreover, since a substitution requires scanning the whole term and is thus not a constant-time operation, it is usually not executed eagerly. The abstract machine actually manipulates closures, i.e. pairs of a term M and an environment e declaring lazy substitutions for each of the free variables in M. This allows e to be applied incrementally, while scanning the term in search of a redex. In the λσ-calculus of Abadi et al. [1], lazy substitutions and closures are manipulated explicitly, providing an elegant bridge between the classical λ-calculus and its concrete implementation in abstract machines. Their calculus expresses beta reduction as the rule

$$(\lambda.M)\,N \;\to\; M[N \cdot \mathit{id}]$$

where λ.M is a nameless abstraction à la de Bruijn, and N · id is a (suspended) explicit substitution. It maps the variable corresponding to the first dangling index in M to N, and every other index n+1 to n, i.e. to the same variable seen from outside the removed binder. Terms of the form M[s], representing closures, are syntactically part of λσ. Meta-level substitutions M{N/x}, by contrast, are meta-operations that compute a term. In this section we formulate a first attempt at adding explicit substitutions to the untyped CAU. We will not prove any formal result for the moment, as our purpose is to elicit the difficulties of such a task. An immediate adaptation of λσ-like explicit substitutions yields the following syntax:

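$$\begin{array}{llcl}
\text{Terms} & M, N & ::= & 1 \mid \lambda.M \mid M\,N \mid \mathbf{let}_!(M, N) \mid\ !_q M \mid q \triangleright M \mid \iota(\vartheta) \mid M[s] \\
\text{Substitutions} & s, s' & ::= & \mathit{id} \mid\ \uparrow\ \mid s \circ s' \mid M \cdot s
\end{array}$$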

where 1 is the first de Bruijn index, the nameless λ binds the first free index of its argument, and similarly the nameless let_! binds the first free index of its second argument. Substitutions come in four forms. The identity (or empty) substitution is id. The lift ↑ reinterprets every free index n as its successor n+1. The composition s ∘ s′ is equivalent to the sequencing of s and s′. Finally, M · s replaces the first free index with M, and every other index n+1 with the image of n under s. Trails are unchanged.

We write M[N] as syntactic sugar for M[N · id]. Then, the beta and beta-bang contractions can be expressed as follows:

$$\begin{aligned}
(\lambda.M)\,N &\to \mathbf{ba} \triangleright M[N]\\
\mathbf{let}_!(!_q M, N) &\to \mathbf{bb} \triangleright N[q \triangleright M]
\end{aligned}$$

(Trail inspection, which does not use substitutions, is unchanged.) The idea is that explicit substitutions make reduction more efficient because their evaluation need not be performed all at once, but can be delayed, partially or completely. Delayed explicit substitutions applied to the same term can be merged, so that the term does not need to be scanned twice. The evaluation of explicit substitutions can be defined by the following σ-rules:

These rules are a relatively minor adaptation of those of λσ. As in that language, σ-normal forms contain no explicit substitutions, except for the index 1, which may be lifted multiple times, as in 1[↑ ∘ ⋯ ∘ ↑].

If we take 1 lifted n times to represent the de Bruijn index n+1, as in λσ, σ-normal terms coincide with a nameless representation of the untyped CAU.
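The λσ idea of delaying substitutions can be sketched in Python as follows. This is not the list of σ-rules, but a direct evaluator that computes the same σ-normal forms on pure terms. For brevity it writes index n directly as ("var", n) (1-based) instead of 1 lifted n−1 times. The encoding and function names are assumptions of the sketch. Note that `beta` builds a closure without scanning the body, and that nested closures are merged by composition before the body is traversed.

```python
# Sketch: pure nameless terms with explicit closures, 1-based de Bruijn indices
#   terms:         ("var", n) | ("lam", M) | ("app", M, N) | ("clo", M, s)  # M[s]
#   substitutions: ("id",) | ("shift",) | ("cons", M, s) | ("comp", s, s2)
#                  id        ↑             M · s            s ∘ s2


def beta(m):
    """(λ.M) N → M[N · id]: builds a closure without scanning M."""
    if m[0] != "app" or m[1][0] != "lam":
        raise ValueError("not a beta redex")
    return ("clo", m[1][1], ("cons", m[2], ("id",)))


def lookup(s, n):
    """The pure term that index n denotes under substitution s."""
    tag = s[0]
    if tag == "id":
        return ("var", n)
    if tag == "shift":
        return ("var", n + 1)
    if tag == "cons":  # 1 ↦ M, n+1 ↦ n[s]
        return normalize(s[1]) if n == 1 else lookup(s[2], n - 1)
    if tag == "comp":  # n[s ∘ s2] = n[s][s2]
        return apply(lookup(s[1], n), s[2])
    raise ValueError(f"unknown substitution {tag!r}")


def apply(m, s):
    """Push s through a pure term m (no closures remain in the result)."""
    if m[0] == "var":
        return lookup(s, m[1])
    if m[0] == "lam":  # under a binder use 1 · (s ∘ ↑)
        return ("lam", apply(m[1], ("cons", ("var", 1), ("comp", s, ("shift",)))))
    return ("app", apply(m[1], s), apply(m[2], s))


def normalize(m):
    """σ-normal form: eliminate every closure."""
    tag = m[0]
    if tag == "var":
        return m
    if tag == "lam":
        return ("lam", normalize(m[1]))
    if tag == "app":
        return ("app", normalize(m[1]), normalize(m[2]))
    if tag == "clo":
        inner = m[1]
        if inner[0] == "clo":  # M[s][s2] → M[s ∘ s2]: scan M only once
            return normalize(("clo", inner[1], ("comp", inner[2], m[2])))
        return apply(normalize(inner), m[2])
    raise ValueError(f"unknown term {tag!r}")


K = ("lam", ("lam", ("var", 2)))  # λx.λy.x
I = ("lam", ("var", 1))           # λx.x
print(normalize(beta(("app", K, I))))  # ('lam', ('lam', ('var', 1)))
```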

The σ-rules are deferrable, in that we can perform β-reductions even if a term is not in σ-normal form. We would like to treat the permutation rules in the same way, perhaps performing permutation normalization only before trail inspection. However, changing the order of permutation steps destroys confluence even when β-redexes are contracted in the same order.

Figure 1: Non-joinable reduction in the untyped CAU with naïve explicit substitutions

Consider for example the reductions in Figure 1. Performing a permutation step before the beta reduction yields the expected result. If instead we delay the permutation step, the trail q decorating the argument is duplicated by beta reduction. Furthermore, the order of q and the ba step gets mixed up: even though q records computation that happened (once) before the β-step, the final trail asserts that it happened (twice) after it.¹ As expected, the two trails (and consequently the terms they decorate) are not joinable.

¹ Although the branch with the delayed permutation step describes an unfaithful account of history, it is still a coherent one; we explain this in more detail in the conclusions.

The example shows that β-reduction on terms whose trails have not been normalized is anachronistic. If we separated the trails stored in a term from the underlying trail-less term, we might be able to define a catachronistic, or time-honoring, version of β-reduction. For instance, if we write ⌊M⌋ for the trail-erasure and ⌈M⌉ for the trail-extraction of a term M, catachronistic beta reduction could be written as follows:

$$(\lambda.M)\,N \;\to\; \mathbf{t}(\mathbf{app}(\mathbf{lam}(\lceil M \rceil), \lceil N \rceil), \mathbf{ba}) \triangleright \lfloor M \rfloor[\lfloor N \rfloor]$$
Without any pretense of formality, we can give a partial definition of trail-erasure ⌊·⌋ and trail-extraction ⌈·⌉, which we collectively refer to as trail projections, as follows:

This definition is only partial: it does not say what to do when the term contains explicit substitutions. When computing, say, ⌈M[s]⌉, the best course of action we can think of is to obtain the σ-normal form of M[s], which is a pure term with no explicit substitutions, and then proceed with its trail extraction.
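For terms without explicit substitutions, trail projections can be sketched as two eager recursive functions. The clauses are our reading of the intended behaviour rather than the paper's definition. In particular, we assume that a bang keeps all trails inside it (erasure stops there and extraction yields r), and that an annotation's own trail precedes the history recorded below it. The last function builds the catachronistic trail, in which the argument's history appears once, before ba.

```python
# Sketch: nameless terms without explicit substitutions, as tagged tuples
#   ("var", n) | ("lam", M) | ("app", M, N) | ("let", M, N)   # let_!(M, N)
#   ("ann", q, M)   # q ▷ M        ("bang", q, M)   # !_q M
R = ("r",)


def erase(m):
    """⌊m⌋: drop the local annotations that are not protected by a bang."""
    tag = m[0]
    if tag == "var":
        return m
    if tag == "lam":
        return ("lam", erase(m[1]))
    if tag in ("app", "let"):
        return (tag, erase(m[1]), erase(m[2]))
    if tag == "ann":
        return erase(m[2])
    if tag == "bang":  # assumed: the bang owns every trail inside it
        return m
    raise ValueError(f"no erasure rule for {tag!r} in this sketch")


def extract(m):
    """⌈m⌉: collect the latent history of m into one trail."""
    tag = m[0]
    if tag == "var":
        return R
    if tag == "lam":
        return ("lam", extract(m[1]))
    if tag in ("app", "let"):
        return (tag, extract(m[1]), extract(m[2]))
    if tag == "ann":  # assumed order: q first, then the history inside m
        return ("t", m[1], extract(m[2]))
    if tag == "bang":
        return R
    raise ValueError(f"no extraction rule for {tag!r} in this sketch")


def catachronistic_trail(redex):
    """Trail for (λ.M) N: the history of M and N first, then ba."""
    _, fun, arg = redex
    return ("t", ("app", ("lam", extract(fun[1])), extract(arg)), ("ba",))


M = ("ann", ("bb",), ("var", 1))
N = ("ann", ("ba",), ("lam", ("var", 1)))
print(catachronistic_trail(("app", ("lam", M), N)))
# ('t', ('app', ('lam', ('t', ('bb',), ('r',))), ('t', ('ba',), ('lam', ('r',)))), ('ba',))
```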

But the whole approach is clumsy: trail-erasure and trail-extraction are multi-step operations that need to scan their entire argument, even when it does not contain any explicit substitution. We would achieve greater efficiency if they could be broken up into sub-steps, much like we did with substitution.

Surely, to obtain this result we need a language in which terms and trails can mention trail-erasure and trail-extraction explicitly. This is the language that we will introduce in the next section.

4 A calculus with explicit auditing operations

We define the untyped Calculus of Audited Units with explicit substitutions as the following extension of the syntax of the untyped CAU presented in Section 2:

This calculus builds on the observations about explicit substitutions made in the previous section. In addition to closures M[s], it provides syntactic trail erasures, denoted by ⌊M⌋. Dually, the syntax of trails is extended with the explicit trail-extraction of a term, written ⌈M⌉. In the naïve presentation, we gave a satisfactory set of σ-rules defining the semantics of explicit substitutions, which we keep as part of the new calculus. To express the semantics of explicit projections, we provide rules in Figure 2 stating that ⌊·⌋ and ⌈·⌉ commute with most term constructors (but not with bangs) and are blocked by explicit substitutions. These rules are completed by congruence rules asserting that they can be used in any subterm or subtrail of a given term or trail.

Figure 2: Reduction rules for explicit trail projections
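Breaking projections into sub-steps means that each rule only inspects the outermost constructor and leaves delayed projections on the children. The following Python sketch of single steps uses our own tags ("extract", "erase", "clo"). The clauses for projections of erasures are assumptions and are not taken from Figure 2.

```python
# Sketch of one-step rewriting for explicit trail projections.
#   terms:  ("var", n) | ("lam", M) | ("app", M, N) | ("let", M, N)
#           ("ann", q, M) | ("bang", q, M) | ("erase", M) | ("clo", M, s)
#   trails: ..., ("extract", M) for the delayed extraction ⌈M⌉
R = ("r",)


def extract_step(m):
    """One step of ⌈m⌉; returns None when the step is blocked."""
    tag = m[0]
    if tag == "var":
        return R
    if tag == "lam":
        return ("lam", ("extract", m[1]))
    if tag in ("app", "let"):
        return (tag, ("extract", m[1]), ("extract", m[2]))
    if tag == "ann":
        return ("t", m[1], ("extract", m[2]))
    if tag in ("bang", "erase"):  # assumed: no trail escapes a bang or an erasure
        return R
    return None  # ("clo", M, s): wait until the σ-rules push s inwards


def erase_step(m):
    """One step of ⌊m⌋; returns None when the step is blocked."""
    tag = m[0]
    if tag == "var":
        return m
    if tag == "lam":
        return ("lam", ("erase", m[1]))
    if tag in ("app", "let"):
        return (tag, ("erase", m[1]), ("erase", m[2]))
    if tag == "ann":
        return ("erase", m[2])
    if tag in ("bang", "erase"):  # assumed: bangs keep their trails; erasure is idempotent
        return m
    return None  # blocked by an explicit substitution
```

Each call looks at a single constructor and allocates delayed nodes for the children. This makes it possible to interleave projections with σ-rules and to stop as soon as the needed information is available.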

The permutation rules from Section 2 are added to the new calculus with the obvious adaptations. We prove that σ-reduction (including the rules of Figure 2) and permutation reduction together yield a terminating and confluent rewriting system.

Theorem 4.1

The union of σ-reduction and permutation reduction is terminating and confluent.

Proof

Tools such as AProVE [17] can prove termination automatically. Local confluence can easily be proved by considering all possible pairs of rules. Full confluence then follows from these two results by Newman’s lemma.

4.1 Beta reduction

We replace the definition of β-reduction with the following lazy rules, which use trail-extraction and trail-erasure to ensure that the correct trails are eventually produced:

where the context in these rules is restricted so that the reduction cannot take place within a bang, a substitution, or a trail erasure:

As usual, lazy β-reduction is extended to inner subterms by means of congruence rules. However, we need to be careful: we cannot reduce within a trail erasure, because if we did, the newly created trail would be erroneously erased:

This is why we express the congruence rule by means of contexts whose holes cannot appear within erasures (the definition also employs substitution contexts to allow reduction within substitutions). Formally, evaluation contexts are defined as follows:

Definition 2 (evaluation context)

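We denote by ≡ the equivalence generated by σ-reduction and permutation reduction, i.e. the reflexive, symmetric, and transitive closure of their union. As we will prove, equivalent terms can be interpreted as the same term. For this reason, we define reduction in the calculus with explicit auditing as the union of lazy β-reduction and ≡: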

4.2 Properties of the rewriting system

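The main results we prove concern the relationship between the untyped CAU and the calculus with explicit auditing. Firstly, every reduction of the untyped CAU must still be a legal reduction in the new calculus. In addition, it should be possible to interpret every reduction of the new calculus as a reduction of the untyped CAU over suitably normalized terms.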

Theorem 4.2

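If M reduces to N in the untyped CAU, then M also reduces to N in the calculus with explicit auditing (up to the translation into nameless terms).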

Theorem 4.3

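If M reduces to N in the calculus with explicit auditing, then the normal form of M with respect to σ-reduction and permutation reduction reduces to that of N in the untyped CAU.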

The calculus with explicit auditing, just like the untyped CAU, is not confluent: different reduction strategies produce different trails, and trail inspection can compute on them, yielding different terms as well. Nevertheless, the previous results allow us to use Hardin’s interpretation technique [18] to prove a relativized confluence theorem:

Theorem 4.4

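Suppose M reduces to both N₁ and N₂ in the calculus with explicit auditing. If, furthermore, the normal forms of N₁ and N₂ (with respect to σ-reduction and permutation reduction) are joinable in the untyped CAU, then N₁ and N₂ are joinable in the calculus with explicit auditing.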

Proof

See Figure 3.

Figure 3: Relativized confluence for the calculus with explicit auditing.

The proof of Theorem 4.2 is not very different from the corresponding proof for the λσ-calculus, but Theorem 4.3 is more interesting. The main challenge is to prove that whenever M reduces to N by a lazy β-step, the normal form of M reduces to the normal form of N in the untyped CAU. However, these normal forms are too normalized to provide a sufficiently strong induction hypothesis when proceeding by induction on the reduction. In particular, we would want them to keep the form of an annotated term q ▷ M even when q is reflexivity. We call terms in this quasi-normal form focused, and prove the theorem by reasoning on them. The appendix contains the details of the proof.

5 A call-by-value abstract machine

In this section, we derive an abstract machine implementing a weak call-by-value strategy. More precisely, the machine considers subterms shaped like M[e], where M is a pure term with no explicit operators and e is an environment, i.e. an explicit substitution containing only values. In the tradition of lazy abstract machines, values are closures, typically pairing a lambda and an environment binding its free variables. In our case, the most natural notion of closure also involves trail erasures and bangs:

Closures
Values
Environments

According to this definition, the most general case of closure is a telescope of bangs, each equipped with a complete history, terminated at the innermost level by a lambda abstraction applied to an environment and enclosed in an erasure.

The environment e contains values with dangling trails, which may be captured by bangs contained in the lambda body. However, the erasure ensures that none of these trails can reach the external bangs. Thus, besides giving meaning to the free variables contained in lambdas, closures serve the additional purpose of ensuring that the history recorded by each bang is complete.
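The telescope shape of closures can be transcribed as a small recursive data type. In this Python sketch (class and function names are ours), a closure is either an erased lambda closure or a bang wrapped around another closure. The dangling trails that values stored in an environment may carry are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LamClosure:
    """⌊(λ.M)[e]⌋: an erased lambda closure (innermost layer)."""
    body: object  # pure nameless term
    env: tuple    # environment of values


@dataclass(frozen=True)
class BangClosure:
    """!_q C: one layer of the telescope, carrying the complete history q."""
    trail: object
    inner: object  # another BangClosure or a LamClosure


def histories(c):
    """Trails of the telescope, outermost bang first."""
    out = []
    while isinstance(c, BangClosure):
        out.append(c.trail)
        c = c.inner
    return out


def core(c):
    """The erased lambda closure at the bottom of the telescope."""
    while isinstance(c, BangClosure):
        c = c.inner
    return c


v = BangClosure(("ba",), BangClosure(("r",), LamClosure(("var", 0), ())))
print(histories(v))  # [('ba',), ('r',)]
```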

The machine we describe is a variant of the SECD machine. To simplify the description, the code and environment are not separate elements of the machine state. Instead, they are combined, together with a trail, as the top item of the stack. Another major difference is that a code can be not only a normal term without explicit operations, but also a fragment of abstract syntax tree. The stack is a list of tuples, each containing a trail, a code, and an environment. It represents the subterm currently being evaluated (the top of the stack) and the unevaluated context, i.e. subterms whose evaluation has been deferred (the remainder of the stack). As a pleasant side-effect of allowing AST fragments onto the stack, we never need to set aside the current stack into the dump. The dump is just a list of values representing the evaluated context, i.e. the subterms whose evaluation has already been completed.

Codes
Tuples
Stack
Dumps
Configurations

The AST fragments allowed in codes include application nodes, bang nodes, incomplete let bindings, and inspection nodes. A tuple whose code happens to be a term can easily be interpreted as a term. However, tuples whose code is an AST fragment only make sense within a certain machine state. The machine state is described by a configuration consisting of a stack and a dump.
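The machine described above is not reproduced here. Instead, the general idea of evaluating with an explicit stack while recording trails can be illustrated with a much simpler CEK-style machine for the pure λ fragment under weak call-by-value. This sketch is ours. It has no bangs, erasures, let_! or inspection, and keeps a single global trail instead of per-tuple trails. It also rebuilds F[ba] from the stack at every β-step, which is exactly the eager cost that the machine described above is designed to avoid.

```python
# Sketch: weak call-by-value evaluation of a closed pure λ-term inside !_r M.
# Terms use 0-based de Bruijn indices: ("var", i) | ("lam", body) | ("app", f, a).
# Values are closures (lam_term, env); env is a tuple, index 0 = innermost binding.
R = ("r",)


def context_trail(stack, step):
    """F[step] for the context recorded on the stack (top of stack = innermost)."""
    q = step
    for frame in reversed(stack):
        q = ("app", q, R) if frame[0] == "arg" else ("app", R, q)
    return q


def run(term):
    """Return (value, trail) for the audited unit !_r term."""
    control, env, stack, trail = term, (), [], R
    while True:
        if control[0] == "app":  # evaluate the function first, remember the argument
            stack.append(("arg", control[2], env))
            control = control[1]
            continue
        # variables and lambdas are already values
        value = env[control[1]] if control[0] == "var" else (control, env)
        if not stack:
            return value, trail
        frame = stack.pop()
        if frame[0] == "arg":  # function done: now evaluate its argument
            stack.append(("fun", value))
            control, env = frame[1], frame[2]
        else:  # both sides are values: contract the β-redex
            lam, lam_env = frame[1]
            trail = ("t", trail, context_trail(stack, ("ba",)))
            control, env = lam[1], (value,) + lam_env


I = ("lam", ("var", 0))
value, trail = run(("app", I, ("app", I, I)))
print(trail)  # ('t', ('t', ('r',), ('app', ('r',), ('ba',))), ('ba',))
```

The inner redex I I is contracted in argument position, so its step is recorded as app(r, ba) before the outer ba.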