# A note on confluence in typed probabilistic lambda calculi

On the topic of probabilistic rewriting, there are several works studying both termination and confluence of different systems. While working with a lambda calculus modelling quantum computation, we found a system with probabilistic rewriting rules and strongly normalizing terms. We examine the effect of small modifications in probabilistic rewriting, affine variables, and strategies on the overall confluence in this strongly normalizing probabilistic calculus.


## 1 Introduction

When dealing with probabilistic lambda calculi, we can find two different sources of divergence.

• A single redex may reduce in two different ways via a probabilistic reduction.

• A term with multiple redexes and no strategy could be reduced in different ways.

For example, we can consider a lambda calculus extended with a coin \fullmoon reducing to 1 or to 0 with probability 1/2 each. Then, taking just the coin, we are in the first case of divergence. Taking instead a β-redex whose argument is the coin, we are in the second case, since we can either β-reduce or reduce the coin.

There is no point in trying to achieve confluence in the first case: the coin is non-confluent by design. However, we can analyse the branching paths and verify that the probability of reducing to a particular term stays the same, regardless of the reduction sequence. This is what we call probabilistic confluence.

To study these kinds of cases, Bournez and Kirchner developed the notion of PARS (Probabilistic Abstract Reduction Systems) [BK02], later refined in [BG06]. Using the techniques described in [DCM17, Faggian], we can define the rewriting rules over distributions.

If we denote by [(p_1,t_1);…;(p_n,t_n)] the probability distribution where each t_i has probability p_i, the possible reductions from the previous example are depicted in Figure 1. The resulting distributions are not only different, but also divergent: certain terms are reached with a different probability in the left branch than among the possible results of the right branch.
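For concreteness, this distribution notation can be modelled directly. The following Python sketch (with hypothetical placeholder terms `"r0"`, `"r1"`, `"s0"`) merges duplicate entries of a `[(p1,t1);…;(pn,tn)]` list, making equality and divergence of distributions checkable:

```python
from fractions import Fraction

def as_distribution(pairs):
    """Turn a [(p1,t1);...;(pn,tn)] list into a mapping from terms to
    their total probability, merging duplicate terms."""
    dist = {}
    for p, t in pairs:
        dist[t] = dist.get(t, Fraction(0)) + p
    return dist

half, quarter = Fraction(1, 2), Fraction(1, 4)
# Two lists denoting the same distribution...
d1 = as_distribution([(half, "r0"), (half, "r1")])
d2 = as_distribution([(quarter, "r0"), (quarter, "r0"),
                      (quarter, "r1"), (quarter, "r1")])
# ...and a divergent one, reaching "r0" with a different probability.
d3 = as_distribution([(quarter, "r0"), (quarter, "s0"), (half, "r1")])
```

Here `d1 == d2` holds while `d1 != d3`, mirroring the distinction between syntactically different lists that denote the same distribution and genuinely divergent results.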

In this short paper we consider a simply typed lambda calculus extended with a coin, and show different possibilities for achieving some sort of confluence, without giving preference to any of them.

In Section 2 we introduce the calculus to be studied, without any restrictions either in the rewriting rules or in the typing rules. As we argued above, this naive definition is not confluent (cf. Figure 1), unless a strategy is defined (in which case it becomes trivially probabilistically confluent, as will be discussed in Section 3.1). In Section 3.2 we show that we can achieve confluence by internalising the probabilistic reductions in the terms. In Section 3.3 we show that we can achieve probabilistic confluence by taking an affine-linear type system. Then, in Section 4, we show that we can relax the type system in the branches of an if-then-else, obtaining a probabilistic confluence result modulo a computational equivalence.

## 2 The λ∘ calculus

In this section we present λ∘ (read “lambda coin”), which is the simply typed lambda calculus extended with booleans (1 and 0), an if-then-else construction, and a coin \fullmoon. Terms are inductively defined by

 t:=x∣λx.t∣tt∣1∣0∣if t then t else t∣\fullmoon

The rewrite system is given in Table 1. The rules mean that a term t reduces with probability p in one step to a term r, where the probabilities of the rules reducing a given redex sum to 1. In particular, every non-contextual rule has probability 1, since there is only one rule per redex, except for the coin, which reduces with probability 1/2 to 1 and with probability 1/2 to 0. If p = 1 we may simply write t → r.

The reduction rules are intentionally as permissive as possible (even the branches of an if-then-else can be reduced) in order to analyse the calculus's (lack of) confluence. The type system is given in Table 2.
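To make this permissiveness concrete, here is a minimal Python sketch (the constructor names are ours, not from the paper) that represents λ∘ terms and collects every position where a rule applies. Note how a single term can expose a β-redex and a coin redex simultaneously:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var: name: str
@dataclass(frozen=True)
class Lam: var: str; body: object
@dataclass(frozen=True)
class App: fun: object; arg: object
@dataclass(frozen=True)
class Bool: val: int                     # the booleans 1 and 0
@dataclass(frozen=True)
class If: cond: object; then: object; els: object
@dataclass(frozen=True)
class Coin: pass                         # the coin \fullmoon

def subterms(t):
    if isinstance(t, Lam): return [t.body]
    if isinstance(t, App): return [t.fun, t.arg]
    if isinstance(t, If):  return [t.cond, t.then, t.els]
    return []

def redexes(t):
    """Collect every redex, under any context (even an if branch)."""
    here = []
    if isinstance(t, Coin):
        here.append(t)                              # probabilistic rule
    if isinstance(t, App) and isinstance(t.fun, Lam):
        here.append(t)                              # beta rule
    if isinstance(t, If) and isinstance(t.cond, Bool):
        here.append(t)                              # if rule
    for s in subterms(t):
        here += redexes(s)
    return here

# (λx.x) applied to the coin: both a beta redex and a coin redex.
term = App(Lam("x", Var("x")), Coin())
```

With no strategy fixed, `redexes(term)` returns two candidates, which is precisely the second source of divergence discussed in the introduction.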

Strong normalization for this calculus follows trivially from the proof for the simply typed λ-calculus with booleans: the only reduction added is the probabilistic coin toss, and it takes at most one step per occurrence of the operator. Hence, using the techniques of rewriting over probability distributions from [DCM17], we only need to show local confluence in order to obtain global confluence. This is an adaptation of Newman’s lemma for probabilistic calculi (see [DCM17] for a longer discussion of probabilistic confluence).
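The lifting of one-step reduction to distributions that these techniques rely on can be sketched as a weighted bind. In this Python sketch (our own naming; terms are plain strings), reducing a distribution reduces each of its terms and weights the results by the probability of their source:

```python
from fractions import Fraction

def lift(dist, step):
    """Lift a one-step probabilistic reduction `step` (term -> distribution
    of terms) to whole distributions of terms."""
    out = {}
    for t, p in dist.items():
        for u, q in step(t).items():
            out[u] = out.get(u, Fraction(0)) + p * q
    return out

def step(t):
    """Toy one-step relation: the coin tosses, everything else is stuck."""
    if t == "coin":
        return {"1": Fraction(1, 2), "0": Fraction(1, 2)}
    return {t: Fraction(1)}

d0 = {"coin": Fraction(1)}
d1 = lift(d0, step)     # the coin resolves to its two outcomes
d2 = lift(d1, step)     # normal forms are stuck: the distribution is stable
```

A distribution of normal forms is a fixed point of `lift`, which is the sense in which reduction sequences over distributions terminate in this strongly normalizing setting.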

Clearly, λ∘ is not confluent, as already seen in the introduction (see Figure 1). It is easy to see that these distributions represent different results.

The divergence stems from three characteristics of the calculus: (1) Lack of a reduction strategy. (2) Probabilistic reductions. (3) Duplication of variables.

Removing just one of these elements renders the system confluent; however, each modification comes with its own trade-off. We will examine each case, one by one, in the following section.

## 3 Removing the divergence sources

### 3.1 Defining a strategy

The definition of a strategy is the easiest modification. Choosing a reduction strategy makes all critical pairs disappear, since there is only one possible reduction rule to be applied to each term distribution. For example, in Figure 1 a call-by-name strategy would take the right path, while call-by-value would take the left one.

Ultimately, the choice lies in how to interpret the duplication of variables: reducing via call-by-name means that the probabilistic event is duplicated, whereas a call-by-value strategy duplicates the outcome of said event (see [Dallago] for a discussion on this choice).
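The difference between the two interpretations can be seen by computing the final distributions for a duplicating context. In this Python sketch (the pairing construct is a hypothetical stand-in for any duplicating term), call-by-value duplicates the outcome of the toss, while call-by-name duplicates the event itself:

```python
from fractions import Fraction

COIN = {1: Fraction(1, 2), 0: Fraction(1, 2)}   # a fair coin

def call_by_value(dup):
    """Reduce the coin to a value first, then duplicate that outcome."""
    out = {}
    for b, p in COIN.items():
        t = dup(b, b)
        out[t] = out.get(t, Fraction(0)) + p
    return out

def call_by_name(dup):
    """Substitute the coin unevaluated: each copy is tossed independently."""
    out = {}
    for b1, p1 in COIN.items():
        for b2, p2 in COIN.items():
            t = dup(b1, b2)
            out[t] = out.get(t, Fraction(0)) + p1 * p2
    return out

pair = lambda a, b: (a, b)   # hypothetical duplicating context
cbv = call_by_value(pair)    # two perfectly correlated outcomes
cbn = call_by_name(pair)     # four independent outcomes
```

The call-by-value distribution has two equiprobable correlated pairs, while call-by-name yields four independent pairs with probability 1/4 each; the two strategies are individually confluent but disagree with each other.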

### 3.2 Internalising the probabilities

Following [lambdarho], we can modify the reduction on λ∘ to internalise the entire distribution of a term. In this particular case, every reduction has probability 1, and the coin toss deterministically reduces to its probability distribution. We write [(1/2,1);(1/2,0)] for the probability distribution assigning probability 1/2 to 1 and probability 1/2 to 0, etc. Then, we can consider the rewrite rule \fullmoon → [(1/2,1);(1/2,0)]. This idea is common in non-probabilistic settings as well, e.g., [AlvesDunduaFloridoKutsiaIGPL18].

Following this approach brings confluence to the calculus, since every repetition of a probabilistic event rewrites to the same result, namely its distribution. For example, the two branches of Figure 1 converge to the same distribution.

Although this is a valid solution, it forces us to consider every possible state of a program at the same time along with its probability of occurrence, making the management of the system more complex. Here we are dealing with a simple coin, but more involved calculi might have several different reductions, each with its own distribution. If not designed correctly, a language that holds every possible state in the probability distribution can easily become too cumbersome to be effective.
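A tiny Python sketch of the internalisation (modelling the distribution value as a hashable constant; the names are ours): since the coin now has a deterministic contractum, duplicating before or after reducing it yields the same result, which is how the critical pair of Figure 1 closes:

```python
from fractions import Fraction

# The internalised coin rewrites deterministically to its distribution,
# modelled here as a single hashable value for [(1/2,1);(1/2,0)].
COIN_DIST = frozenset({(1, Fraction(1, 2)), (0, Fraction(1, 2))})

def reduce_once(term):
    """The only probabilistic redex now has a deterministic contractum."""
    return COIN_DIST if term == "coin" else term

# Critical pair: duplicate the coin and then reduce both copies, or
# reduce the coin first and duplicate the resulting value.
dup_then_reduce = (reduce_once("coin"), reduce_once("coin"))
value = reduce_once("coin")
reduce_then_dup = (value, value)
```

Both orders produce the same pair of distribution values, so no divergence is possible; the cost, as noted above, is that every program now manipulates whole distributions explicitly.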

### 3.3 Affine variables

The last reasonable solution is to restrict duplication. One way to do this is by controlling the appearance of variables at the type system level, with an affine type system, see Table 3.

This type system solves the counterexample from Figure 1, since the considered term has no type in this system. In particular, we can prove the following property:

###### Lemma 3.1.

If then .

Notice that this property is not true in the unrestricted λ∘. For example, while , we have .

The drawback of this approach is clear: there is a loss of expressivity. Of course, in some cases this restriction is a desirable quality. For example, in quantum computing it may serve to avoid cloning qubits, an operation forbidden by quantum mechanics.
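As an illustration of the restriction (a purely syntactic Python sketch with our own tuple encoding of terms, rather than the typing rules of Table 3), a term is rejected as soon as a λ-bound variable occurs more than once in its body:

```python
def affine_ok(term):
    """Check that every lambda-bound variable is used at most once,
    a syntactic sketch of the affine restriction."""
    def count(t, x):
        kind = t[0]
        if kind == "var":
            return 1 if t[1] == x else 0
        if kind == "lam":                     # ("lam", y, body)
            return 0 if t[1] == x else count(t[2], x)
        if kind == "app":                     # ("app", f, a)
            return count(t[1], x) + count(t[2], x)
        if kind == "if":                      # ("if", c, then, else)
            return count(t[1], x) + count(t[2], x) + count(t[3], x)
        return 0                              # constants, coin
    kind = term[0]
    if kind == "lam":
        return count(term[2], term[1]) <= 1 and affine_ok(term[2])
    if kind == "app":
        return affine_ok(term[1]) and affine_ok(term[2])
    if kind == "if":
        return all(affine_ok(s) for s in term[1:])
    return True

# (λx. if x then x else x) duplicates x and is rejected; (λx. x) is fine.
dup_term = ("lam", "x", ("if", ("var", "x"), ("var", "x"), ("var", "x")))
id_term = ("lam", "x", ("var", "x"))
```

Note that the check counts occurrences in both if-branches, which is exactly the point Section 4 relaxes: only one branch survives reduction, so that particular form of duplication is harmless.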

## 4 Computational confluence with a sub-affine type system

The solution considered in Section 3.3 seems quite extreme. In particular, using the same variable in different branches of an if-then-else construction does not actually duplicate it, since only one of those branches will remain. However, changing the if-then-else rule from Table 3 to one that allows the same variable to appear in both branches breaks confluence anyway. We call this calculus “sub-affine”. Consider the following example. Let t = (λx.λy.if y then x else not x) \fullmoon; then

reducing the coin first gives

 [(1/2, (λx.λy.if y then x else not x) 0); (1/2, (λx.λy.if y then x else not x) 1)]

which β-reduces to

 [(1/2, λy.if y then 0 else not 0); (1/2, λy.if y then 1 else not 1)]

while β-reducing first gives

 [(1, λy.if y then \fullmoon else not \fullmoon)]

and then reducing the two (now duplicated) coins yields

 [(1/4, λy.if y then 0 else not 0); (1/4, λy.if y then 0 else not 1); (1/4, λy.if y then 1 else not 0); (1/4, λy.if y then 1 else not 1)]

Note that the terms in both distributions are in normal form. The two paths are syntactically divergent; however, the resulting programs share the same behaviour under the same inputs: if we were to apply the resulting abstractions to 1 or to 0, both paths would yield the distribution [(1/2, 0); (1/2, 1)]. Therefore, these distributions are semantically confluent: they are not the same terms, but they represent the same function.

We can formalise this notion for the sub-affine calculus as follows. Let

 C:=◊∣Cv

be an elimination context, where ◊ is called the “placeholder” and v is a normal closed term. We write C[t] for the term obtained by replacing the placeholder of C with t. Notice that C[t] is a term. We say that C is an elimination context of the type A if, for all terms t of type A, the term C[t] has the basic boolean type. That is, C applies arguments until it reaches the basic type.

The computational equivalence is defined as follows.

###### Definition 4.1.

Let [(p_i,t_i)]_i and [(q_j,s_j)]_j be two distributions of terms, all closed of type A. Then, we say that these distributions are computationally equivalent (notation [(p_i,t_i)]_i ∼ [(q_j,s_j)]_j) if for all elimination contexts C of A we have C[t_i] reducing to [(p_ik, r_ik)]_k and C[s_j] reducing to [(q_jl, u_jl)]_l such that the r_ik and u_jl are in normal form, and [(p_i p_ik, r_ik)]_ik = [(q_j q_jl, u_jl)]_jl, where = denotes the equality on distributions.

The previous definition means that two distributions are computationally equivalent if by applying the resulting terms to all the possible inputs, they produce the same probability distribution of results. Notice that the definition is not assuming confluence.
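Over the booleans, this kind of equivalence can be tested mechanically. The following Python sketch (with our own encoding: distributions map Python functions to probabilities, and `1 - a` plays the role of `not a`) checks the two normal-form distributions of the running example against each other:

```python
from fractions import Fraction

def output_dist(dist, arg):
    """Apply every function in the distribution to `arg`, collecting
    the distribution of results."""
    out = {}
    for f, p in dist.items():
        r = f(arg)
        out[r] = out.get(r, Fraction(0)) + p
    return out

def comp_equiv(d1, d2, inputs=(1, 0)):
    """Computationally equivalent: same output distribution on every
    closed input of the argument type (here just the two booleans)."""
    return all(output_dist(d1, b) == output_dist(d2, b) for b in inputs)

half, quarter = Fraction(1, 2), Fraction(1, 4)
# Left path of the example: one toss, made before duplication.
mk1 = lambda a: (lambda y: a if y else 1 - a)     # λy.if y then a else not a
left = {mk1(0): half, mk1(1): half}
# Right path: two independent tosses, one per branch.
mk2 = lambda a, b: (lambda y: a if y else 1 - b)  # λy.if y then a else not b
right = {mk2(a, b): quarter for a in (0, 1) for b in (0, 1)}
```

Both `left` and `right` send either input to the output distribution [(1/2, 0); (1/2, 1)], so `comp_equiv(left, right)` holds even though the two distributions contain different terms.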

Then, we can prove the confluence modulo computational equivalence of the sub-affine calculus (Theorem 4.4). We need the following two lemmas, which follow by straightforward induction.

###### Lemma 4.2.

Let , then . ∎

###### Lemma 4.3.

Let t reduce to the distribution [(p_i, t_i)]_i; then t[r/x] reduces to [(p_i, t_i[r/x])]_i. ∎

###### Theorem 4.4 (Computational confluence).

Let t be a typed term in the sub-affine calculus. If t reduces to the distribution [(p_i, t_i)]_i and also to the distribution [(q_j, s_j)]_j, then [(p_i, t_i)]_i ∼ [(q_j, s_j)]_j.

###### Proof.

We only consider the six critical pairs: four derived from the if-then-else construction and two from the classical lambda calculus. Non-critical pairs are trivially probabilistically confluent.

The first pair starts from if 1 then r else s, where r reduces to the distribution [(p_i, r_i)]_i. Reducing r first gives [(p_i, if 1 then r_i else s)]_i and then [(p_i, r_i)]_i; applying the if-rule first gives [(1, r)], which also reduces to [(p_i, r_i)]_i. The second pair starts from if 0 then r else s, again with r reducing to [(p_i, r_i)]_i. Reducing r first gives [(p_i, if 0 then r_i else s)]_i and then [(p_i, s)]_i; applying the if-rule first gives [(1, s)], and [(p_i, s)]_i ∼ [(1, s)].

The symmetrical cases, reducing s instead of r, close in a similar way. The fifth critical pair closes by Lemma 4.3.

Starting from (λx.t)r, where t reduces to the distribution [(p_i, t_i)]_i, reducing t first gives [(p_i, (λx.t_i)r)]_i and then [(p_i, t_i[r/x])]_i; β-reducing first gives [(1, t[r/x])], which reduces to [(p_i, t_i[r/x])]_i by Lemma 4.3.

The last critical pair requires a more thorough analysis.

Starting from (λx.t)r, where now r reduces to the distribution [(p_i, r_i)]_i, reducing r first gives [(p_i, (λx.t)r_i)]_i and then [(p_i, t[r_i/x])]_i, while β-reducing first gives [(1, t[r/x])].

We must prove that [(p_i, t[r_i/x])]_i ∼ [(1, t[r/x])]. It is enough to take only one reduction path on each side, according to the definition of ∼. If x does not appear in t, then t[r_i/x] = t[r/x] = t and both sides coincide. If x appears once in t, both sides reduce the single substituted copy in the same position and the distributions coincide. If x appears more than once in t, it appears at most n+1 times (where n is the number of if-then-else constructions), so we can proceed by induction on n.

• If n = 1, then there is exactly one if-then-else, say if u then t₁ else t₂, and x appears both in t₁ and in t₂. Therefore, there are three cases:

• u is 1 or 0: then, for k = 1 or k = 2, the if-then-else reduces to t_k. Notice that x appears only once in t_k, so the previous case applies.

• u is the coin \fullmoon, so the if-then-else reduces to the distribution [(1/2, t₁); (1/2, t₂)], where x appears once in each term. Notice that since the same reduction is available on the other side, we arrive at the same distribution in the right branch of this critical pair.

• u is an open term, hence not reducing to a constant 1 or 0. In such a case, since the whole term under consideration is closed, the if-then-else construction in t is under a lambda-abstraction; therefore, we must β-reduce first, either with an argument in t, or with an external argument given by the context C. We can repeat the same process until we reach one of the previously treated cases (that is, at some point the if-then-else becomes a redex).

• If n > 1, we proceed as in the previous case, reducing the if-then-else first, which decreases n, and so the induction hypothesis applies. ∎

## 5 Conclusion

In this paper we have analysed different possibilities to transform a simple probabilistic calculus into a (probabilistically / computationally) confluent calculus. The main contribution of this note is Theorem 4.4, which proves that we can relax the affinity restriction on “non-interfering” paths. This technique has been considered in the first author's master's thesis [Romero20] to prove the computational confluence of the quantum lambda calculus [lambdarho].