On the Taylor Expansion of Probabilistic λ-Terms (Long Version)

04/21/2019 ∙ by Ugo Dal Lago, et al.

We generalise Ehrhard and Regnier's Taylor expansion from pure to probabilistic λ-terms through notions of probabilistic resource terms and explicit Taylor expansion. We prove that the Taylor expansion is adequate when seen as a way to give semantics to probabilistic λ-terms, and that there is a precise correspondence with probabilistic Böhm trees, as introduced by the second author.


1 Introduction

Linear logic is a proof-theoretical framework which, since its inception [10], has been built around an analogy between, on the one hand, linearity in the sense of linear algebra and, on the other hand, the absence of copying and erasing in cut elimination and higher-order rewriting. This analogy has been pushed forward by Ehrhard and Regnier, who introduced a series of logical and computational frameworks accounting, along the same analogy, for concepts like that of a differential, or the closely related one of an approximation. We are implicitly referring to the differential λ-calculus [6], to differential linear logic [8], and to the Taylor expansion of ordinary λ-terms [9]. The latter has given rise to an extremely interesting research line, with many deep contributions in the last ten years. Not only has the Taylor expansion of pure λ-terms been shown to be endowed with a well-behaved notion of reduction, but the Böhm tree and Taylor expansion operators are now known to commute [7]. This easily implies that the equational theory (on pure λ-terms) induced by the Taylor expansion coincides with the one induced by Böhm trees.

The Taylor expansion operator is essentially quantitative, in that its codomain is not merely the set of resource λ-terms [3, 6], a term syntax for promotion-free differential proofs, but the set of linear combinations of those terms, with positive real number coefficients. When enlarging the domain of the operator to account for a more quantitative language, one is naturally led to consider algebraic λ-calculi, to which it has so far proved hard to give a clean computational meaning [18].

But what about probabilistic λ-calculi [11], which have received quite some attention recently (see, e.g., [5, 2, 16]) due to their applicability to randomised computation and Bayesian programming? Can the Taylor expansion naturally be generalised to those calculi? This is an interesting question, to which we give the first definite positive answer in this paper. In particular, we show that the Taylor expansion of probabilistic λ-terms is a conservative extension of the well-known one on ordinary λ-terms. Specifically, the target can be taken, as usual, as a linear combination of ordinary resource λ-terms, i.e., the same kind of structure which Ehrhard and Regnier considered in their work on the Taylor expansion of pure λ-terms. We moreover show that the Taylor expansion, as extended to probabilistic λ-terms, continues to enjoy the nice properties it has in the deterministic realm. In particular, it is adequate as a way to give semantics to probabilistic λ-terms, and the equational theory on probabilistic λ-terms induced by Taylor expansion coincides with the one induced by a probabilistic variation on Böhm trees [1]. The latter, notably, has been proved to capture observational equivalence, once quotiented modulo η-equivalence [1].

Are we the first ones to embark on the challenge of generalising the Taylor expansion to probabilistic λ-calculi, and in general to effectful calculi? Actually, some steps in this direction have recently been taken. First of all, we need to mention the line of work originating from Tsukada and Ong’s paper on rigid resource terms [14]. This has been claimed from the very beginning to be a way to model effects in the resource λ-calculus, and it has also been applied to, among others, probabilistic effects, giving rise to quantitative denotational models [15]. The obtained models are based on species, and are proved to be adequate. The construction being generic, there is no aim at providing a precise comparison between the discriminating power of the obtained theory and, say, observational equivalence: the choice of the underlying effect can in principle have a huge impact on it.

One should also mention Vaux’s work on the algebraic λ-calculus [18], where one can build arbitrary linear combinations of terms. He showed a correspondence between Taylor expansion and Böhm trees, but only for terms whose Böhm tree approximants at finite depths are computable in a finite number of steps. This includes all ordinary λ-terms but not all probabilistic ones. More recently, Olimpieri and Vaux have studied a Taylor expansion for a non-deterministic λ-calculus [19] corresponding to our notion of explicit Taylor expansion (Section 3).

In the rest of this section, probabilistic Taylor expansion will be informally introduced by way of an example, so as to make the main concepts comprehensible to the non-specialist. In Sections 2 and 3, we introduce a new form of resource term, and a notion of explicit Taylor expansion from probabilistic λ-terms. These constructions have an interest in themselves (again, see [19]) but in this paper they are just an intermediate step towards proving our main results. Definitionally, the crux of the paper is Section 4, in which the Taylor expansion of a probabilistic λ-term is made to produce ordinary resource terms. The relationship between the introduced theory and the one induced by Probabilistic Böhm trees [13] is investigated in Section 5 and Section 7.

The Probabilistic Taylor Expansion, Informally

In this section, we introduce the main ingredients of the probabilistic Taylor expansion by way of an extremely simple, although instructive, example. Let us consider the probabilistic λ-term , where is an operator for binary, fair, probabilistic choice, , and is a purely diverging term. As such, is a term of a minimal, untyped, probabilistic λ-calculus. Evaluation of , if performed leftmost-outermost, is as in Figure 1. In particular, the probability of convergence for is .

Figure 1: ’s Reduction Tree.

Please observe that two copies of the argument are produced, and that the “rightmost” one is evaluated only when the “leftmost” one converges, i.e. when the probabilistic choice produces as a result.

The main idea behind building the Taylor expansion of any λ-term is to describe the dynamics of by way of linear approximations of . In the realm of the λ-calculus, a linear approximation has traditionally been taken as a resource λ-term, which can be seen as a pure λ-term in which applications have the form , where is a term and is a multiset of terms, and in which the result of firing the redex is the linear combination of all the terms obtained by allocating the resources in to the occurrences of in . For instance, one such element in the Taylor expansion of is , where the occurrence of in head position is provided with only one copy of its argument. If applied to the multiset , this term would reduce into . Similarly, an element in the Taylor expansion of would be , which reduces into . Another element of the same Taylor expansion is , but this one reduces into : there is no way to use its resources linearly, i.e., using them without copying and erasing. The actual Taylor expansion of a term is built by translating any application into an infinite sum . For instance, the Taylor expansion of is . Note that any summand properly reduces only when , in which case it reduces to . In turn reduces properly only when , and the result is . All the other terms reduce to . In the end the Taylor expansion of normalises to .
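For reference, the usual Taylor expansion of pure λ-terms, in the style of Ehrhard and Regnier [9], can be written as follows. This is a recalled sketch for the deterministic case only: the notation (the star for the expansion, angle brackets for resource application, square brackets for multisets) is an assumption and may differ from the symbols used in this paper, and constructors are understood to be extended to sums by linearity.

```latex
\begin{align*}
x^{*} &= x \\
(\lambda x.M)^{*} &= \lambda x.\,M^{*} \\
(M\,N)^{*} &= \sum_{n \geq 0} \frac{1}{n!}\,
  \langle M^{*} \rangle\,[\underbrace{N^{*},\dots,N^{*}}_{n \text{ copies}}]
\end{align*}
```

The factor 1/n! in the application case is what makes the coefficient of each simple term the inverse of a number of permutations, which is the property discussed later under the name of regularity.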

Extending the Taylor expansion to probabilistic terms seems straightforward, a natural candidate for the Taylor expansion of being just . When computing the Taylor expansion of we will find expressions such as , i.e. . For non-trivial reasons, the Taylor expansion of any diverging term normalises to , so just like in our previous example, the only element in which does not reduce to is . The difference is that this time it appears with a coefficient , so normalises to . Please notice how this is precisely the “normal form” of the original term . This is a general phenomenon, whose deep consequences will be investigated in the rest of this paper, and in particular in Section 5.

Notations

We write for the set of natural numbers and for the set of nonnegative real numbers. Given a set , we write for the set of families of positive real numbers indexed by elements in . We write such families as linear combinations: an element is a sum , with . The support of a family is . We write for those families such that is finite. Given we often write for unless we want to emphasise the difference between the two expressions. We also define finite multisets over as functions such that for finitely many . We use the notation to describe the multiset such that is the number of indices such that .
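The two notions used most often below, finite multisets and families with their support, can be pictured with ordinary dictionaries. The following is only an illustrative sketch: the helper names `multiset` and `support` are hypothetical, and coefficients are plain Python floats rather than arbitrary nonnegative reals.

```python
from collections import Counter

def multiset(*elements):
    """Finite multiset: maps each element to its number of occurrences."""
    return Counter(elements)

def support(family):
    """Support of a family of coefficients: elements with nonzero coefficient."""
    return {a for a, coeff in family.items() if coeff != 0}

# The multiset [a, a, b]: "a" has multiplicity 2, "b" has multiplicity 1.
bag = multiset("a", "a", "b")

# A family written as a linear combination 0.5*s + 0.25*t (+ 0*u).
family = {"s": 0.5, "t": 0.25, "u": 0.0}
```

Elements absent from a `Counter` have multiplicity zero, which matches the reading of a finite multiset as a function that is nonzero on finitely many elements.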

2 Probabilistic Resource λ-Calculus

In this section, we describe the theory of resource terms with explicit choices, for the purpose of extending many of the properties of resource terms to the probabilistic case. All this has an interest in itself, but here this is mainly useful as a way to render certain proofs about the Taylor Expansion easier (see Section 3 for more details). For this reason we try to give the reader a clear understanding of this calculus and of why these definitions and properties are useful, without focusing on the actual proofs. These are straightforward generalisations of those for deterministic resource terms [9] and can be found in an extended version of this paper [4]. The same results have recently been given for a non-deterministic calculus [19] by Olimpieri and Vaux.

2.1 The Basics

Definition 2.1.

The sets of probabilistic simple resource terms and of probabilistic simple resource poly-terms over a set of variables are defined by mutual induction as follows:

where ranges over . We call finite probabilistic resource terms the finite linear combinations of resource terms in , and finite probabilistic resource poly-terms the finite linear combinations of resource poly-terms in . We extend the constructors of simple (poly-)terms to (poly-)terms by linearity, e.g., if then is defined as the poly-term such that and if is not an abstraction.

Some consecutive abstractions will be indicated as , or even as . Similarly, to describe many successive applications , we use a single pair of brackets and we write . We write for , which is ranged over by metavariables like . Note that intuitively should stand for either or , not their union. For instance we will prove some properties for finite linear combinations in , but the only relevant linear combinations are the actual (poly-)terms in or . Yet this distinction is technically irrelevant, and all our results hold if we define as a union.

The reason why linear combinations over such elements are dubbed terms will be clear once we describe the operational semantics of the resource calculus. The main point of the resource λ-calculus is to allow functions to use their argument arbitrarily many times and yet remain entirely linear, which is achieved by taking multisets as arguments: if a function uses its argument times then it needs to receive resources as argument and use each of them linearly. This idea has two consequences. First, an application can fail if a function is not given exactly as many arguments as it needs, as it would need either to duplicate or to discard some of them. Second, the result of a valid application is often not unique: a function can choose how to allocate the different resources to the different calls to its argument, and different choices may lead to different results. Both these features are treated using linear combinations: a failed application results in (i.e. the trivial linear combination) and a successful one yields the sum of all its possible outcomes.

Definition 2.2.

We define the substitution of for in by:

where are the free occurrences of in and is the set of permutations over . Alternatively, we could define by induction on , as follows

where is the disjoint union of sets.

Example 2.1.

A basic example is : there are two occurrences of in , so there are two ways to substitute for them. Remark that we also have : the two occurrences of are not as clearly distinguished as in the first example but they still count as different occurrences. Similarly and : there are two distinct occurrences of , so there are two ways to allocate them. As another example, please consider : the substitution fails if the number of resources does not match the number of free occurrences of the substituted variable.
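The permutation-based reading of substitution can be sketched in code. The sketch below covers only the deterministic fragment (variables, abstractions, applications to multisets) and omits the choice constructors of Definition 2.1. The tuple encoding and the names `var`, `lam`, `app`, `count_free`, `fill` and `linear_subst` are hypothetical, and bound names are assumed to be distinct from the free variables of the substituted resources, so no renaming is performed.

```python
from collections import Counter
from itertools import permutations

# Hypothetical encoding of deterministic simple resource terms as tuples.
def var(x):
    return ("var", x)

def lam(x, body):
    return ("lam", x, body)

def app(fun, bag):
    # The argument is a multiset; sorting gives a canonical representative.
    return ("app", fun, tuple(sorted(bag, key=repr)))

def count_free(t, x):
    """Number of free occurrences of variable x in t."""
    if t[0] == "var":
        return 1 if t[1] == x else 0
    if t[0] == "lam":
        return 0 if t[1] == x else count_free(t[2], x)
    return count_free(t[1], x) + sum(count_free(u, x) for u in t[2])

def fill(t, x, supply):
    """Replace each free occurrence of x in t by the next term from supply."""
    if t[0] == "var":
        return next(supply) if t[1] == x else t
    if t[0] == "lam":
        return t if t[1] == x else lam(t[1], fill(t[2], x, supply))
    fun = fill(t[1], x, supply)
    return app(fun, [fill(u, x, supply) for u in t[2]])

def linear_subst(t, x, bag):
    """Sum, over all permutations of bag, of t with its occurrences of x filled.

    Returns a Counter mapping simple terms to integer coefficients; the empty
    Counter stands for the zero linear combination (failed substitution).
    """
    if count_free(t, x) != len(bag):
        return Counter()
    result = Counter()
    for perm in permutations(bag):
        result[fill(t, x, iter(perm))] += 1
    return result

x, y, z, w = var("x"), var("y"), var("z"), var("w")
two_terms = linear_subst(app(x, [x]), "x", [y, z])     # head and argument differ
one_term = linear_subst(app(w, [x, x]), "x", [y, z])   # both occurrences in one bag
failure = linear_subst(app(x, [x]), "x", [y])          # wrong number of resources
```

In this sketch `two_terms` contains two distinct simple terms with coefficient 1 each, `one_term` contains a single simple term with coefficient 2 because both allocations yield the same multiset, and `failure` is the empty combination.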

The operational semantics of the deterministic resource λ-calculus [9] is usually given as a single rule of β-reduction. In the probabilistic setting, we also need rules to make choices commute with head contexts.

Definition 2.3.

The reductions and are defined from to by:

extended under arbitrary contexts. We simply write for . Reduction can be extended to finite terms in the following way: if , and then .

As the resource λ-calculus does not allow any duplication, and β-reduction erases some constructors, it naturally decreases the size of the involved simple terms. Consequently, β-reduction is strongly normalising. This result can be extended to the whole reduction , which is also confluent.

More specifically we define the size of a simple (poly-)term in a natural way. To any we associate two sizes: and . We order with a reverse lexicographical order: iff there exists such that and for all .

Proposition 2.1.

The reduction is confluent and strongly normalising on . Given we write for its unique normal form for , and given we write for .

Proof.

Proving weak confluence is straightforward. Strong normalisation is proven in two steps. First using an appropriate weight on terms describing how deep choices are we can prove that is strongly normalising. Second one can observe that preserves size, and that if and then , hence if then . The confluence is given by Newman’s Lemma. ∎

2.2 Complete Left Reduction

This reduction is not convenient to study (poly-)terms with particular properties such as uniformity or regularity, which we will define later. For instance given a simple poly-term we can reduce independently the different occurrences of , so not every reduct of is of the form with . Similarly given a term we can reduce independently the elements of its support, possibly losing some common properties shared by these elements. For that reason (as well as the issue of infinite terms discussed in the rest of this section) we are mostly interested in normalisation rather than reduction. To study this normalisation we still need some small-step operational semantics, but it will be more convenient to consider the complete left reduction defined as follows.

Definition 2.4.

We define the complete left reduct of a simple (poly-)term by induction:

We extend this definition to terms: .

Proposition 2.2.

For all , .

Proposition 2.3.

For all there is such that .

Proof.

The reduction being strongly normalising, we reason by induction on the bound on the length of the reductions of . We have either and is already in normal form, or reduces into in at least one step and we conclude by induction hypothesis. ∎

2.3 Infinite Terms

So far we only worked with finite terms but to fully express the operational behaviour of a λ-term in the resource λ-calculus, which is the purpose of the Taylor expansion, we need infinite ones. We can extend the constructors of the calculus to by linearity and generalise the reduction relation , but Proposition 2.1 fails. Indeed let and . For , let . Then, for all the term normalises in steps and does not normalise in a finite number of reduction steps. A simple solution to this problem is to define the “normal form” of an infinite term by normalising each of its components: we can set . But then another problem arises. In our previous example, we have for all , thus we would have , which is not an element of as the coefficient of is infinite. Still we can use this pointwise normalisation if we consider terms with a particular property, called uniformity.

Definition 2.5.

The coherence relation on is defined by:

For we write when for all , . A simple (poly-)term is called uniform if , and a term is called uniform if .

Remark 2.1.

In the rule for we require and to ensure that whenever , the simple (poly-)terms and are necessarily uniform. This is not crucial, as we will only consider uniform (poly-)terms, whose support contains only uniform simple (poly-)terms by definition, but this simplifies inductive reasoning.

What makes coherence and uniformity interesting is that if two coherent terms and have disjoint supports, then all of their reducts, and in particular their normal forms, have disjoint supports. Then any element in the support of comes either from or from , but it cannot come from both.

Lemma 2.4.

If and then . Besides if then and .

Proof.

By induction on :

  • If then for and to be both nonempty we need to have and for some , and in this case and . The hypothesis implies , and if then .

  • If with then either one of the substitutions is or we have .

  • If , or , with in each case , then the result is immediate by induction hypothesis.

  • If then we use the induction hypothesis on and (given by Proposition LABEL:prop:coh_ref) to prove that for we have , and similarly for , and the result follows. Notice that we will never have .

  • If then , and similarly for . Observe that for and we have and so we can apply the induction hypothesis to and and to and to get the result.

  • Finally if we use a similar reasoning: for any and we have , and , hence by induction hypothesis for any and we have , and . This gives the first part of the result. Now if then necessarily , and we can find sets and such that , and (up to permutation of the indices in and ). By induction hypothesis we get and , hence and .

Proposition 2.5.

Given , if then . If moreover then .

Proof.

It is sufficient to prove the result for simple terms as the generalisation to finite terms is straightforward. We reason by induction on and the proof of .

  • If or the result is immediate by induction hypothesis.

  • If then and are uniform and by induction hypothesis so are and , hence .

  • The case of head normal forms is immediate by induction hypothesis.

  • If then we apply Lemma 2.4.

  • The cases of head choices are immediate.

  • The case of poly-terms is immediate by induction hypothesis.

Corollary 2.6.

Given , if then . If moreover then .

Proof.

Using Proposition 2.3, by induction on . ∎

This immediately implies that pointwise reduction of infinite uniform terms is well defined, as both complete left reducts and normal forms of distinct but coherent simple (poly-)terms have disjoint supports.

Corollary 2.7.

If is uniform then and are in . We write and respectively for these sums.

Proof.

For all we have by hypothesis so the previous proposition gives . Therefore given any there is at most one such that . The same goes for normalisation. ∎

Remark 2.2.

Although both complete left reduction and normal forms are well defined for infinite terms, Proposition 2.3 doesn’t hold: consider , and , then is uniform and but for all , . Besides is not even the limit of the as approaches . However normal forms are indeed limits of complete left reducts restricted to normal simple terms.

Proposition 2.8.

Given a uniform (poly-)term and given in normal form, we have for all large enough.

Proof.

If then by Corollary 2.6 there is a unique such that , and by Proposition 2.3 for all large enough we have . ∎

2.4 Regular Terms

The deterministic Taylor expansion associates to any λ-term a uniform term, and explicit choices are adopted precisely for the sake of preserving this property in the probabilistic case. Taylor expansions have another important property: they are entirely defined by their support. If a simple term is in the support of the Taylor expansion of a λ-term , then its coefficient is the inverse of its multinomial coefficient, which does not depend on . Moreover this property is preserved by normalisation. Using explicit choices enforces this result in the probabilistic case, as well.

Definition 2.6.

For any we define the multinomial coefficient by:

where is the multiplicity of in .

Definition 2.7.

A uniform term is called regular if for all , .

Multinomial coefficients correspond to the number of permutations of multisets which preserve the description of simple (poly-)terms. For instance, given variables , the coefficient is exactly the number of permutations such that . For a more precise interpretation of multinomial coefficients see [9] or [14]. Due to their relation with permutations in multisets, these coefficients appear naturally when we perform substitutions.
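The counting behind multinomial coefficients can be sketched for the deterministic fragment. The clauses below follow the usual definition for resource terms in the style of [9]: a variable counts 1, an abstraction inherits the coefficient of its body, and a multiset contributes n! times the n-th power of the coefficient of each element of multiplicity n. The clauses for the choice constructors of this paper are omitted, and the tuple encoding and the function name `multinomial` are hypothetical.

```python
from collections import Counter
from math import factorial

def multinomial(t):
    """Multinomial coefficient of a deterministic simple resource term.

    Terms are tuples: ("var", name), ("lam", name, body),
    or ("app", function, bag) where bag is a tuple of terms.
    """
    if t[0] == "var":
        return 1
    if t[0] == "lam":
        return multinomial(t[2])
    coefficient = multinomial(t[1])
    # Group equal elements of the bag and count their multiplicities.
    for element, n in Counter(t[2]).items():
        coefficient *= factorial(n) * multinomial(element) ** n
    return coefficient

x, y, z = ("var", "x"), ("var", "y"), ("var", "z")
flat = ("app", x, (y, y, z))          # x applied to the multiset [y, y, z]
inner = ("app", y, (z, z))            # y applied to the multiset [z, z]
nested = ("app", x, (inner, inner))   # x applied to two copies of inner
```

Here `multinomial(flat)` is 2, since only the two copies of y can be swapped, and `multinomial(nested)` is 8: the two copies of `inner` can be swapped, and independently the two copies of z inside each of them.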

Theorem 2.9.

For any uniform, for , and , we have: .

There exist two methods to prove similar theorems in the literature, and both can be used to prove Theorem 2.9. The first one is the original proof by Ehrhard and Regnier for the pure deterministic case [9]; its generalisation is straightforward and only requires extending the notion of uniformity (to take into account that is uniform). The second one is by Asada, Tsukada and Ong for a simply typed calculus with choices [14], and it has been extended to the untyped case by Olimpieri and Vaux in an unpublished paper [19]. We present here a direct generalisation of the proof in [9].

Definition 2.8.

A multilinear-free (poly)-term is a (poly)-term such that all of its variables are free and each one occurs exactly once. A multilinear-free substitution is a partial function from to multilinear-free terms such that for all in . We say that is adapted if and no element of is bound in . Then is the multilinear-free (poly)-term obtained by applying on the variables of . Similarly for any multilinear-free (poly)-term and we write for the term obtained by applying to the variables of without renaming captured variables. A pair is said to represent if .

Definition 2.9.

We define the following sets of bijections over variables: