Probabilistic Rewriting: Relations between Normalization, Termination, and Unique Normal Forms

04/16/2018 ∙ by Claudia Faggian, et al.

We investigate how techniques from Rewrite Theory can help us to study calculi whose evaluation is both probabilistic and non-deterministic (think untyped probabilistic lambda-calculus, in which non-determinism arises from choosing between different redexes). We are interested in relations between weak and strong normalization, and whenever the result is unique. We provide ARS-like local conditions, which also extend to a method to compare strategies. As an application, we study the untyped lambda-calculus equipped with a probabilistic choice. We show that weak call-by-value reduction has the same striking properties it has for the standard lambda-calculus: the normal forms are unique, and weak normalization implies strong normalization.

1 Introduction

Rewriting Theory [38] is a foundational theory of computing. Its impact extends to both the theoretical side of computer science and the development of programming languages. A clear example of both aspects is the paradigmatic term rewriting system, the λ-calculus, which is also the foundation of functional programming. Abstract Rewriting Systems (ARS) are the general theory which captures the common substratum of rewriting theory, independently of the particular structure of the objects. It studies properties of term transformations, such as normalization, termination, unique normal form, and the relations among them. Such results are a powerful set of tools which can be used when we study the computational and operational properties of any calculus or programming language. Furthermore, the theory provides tools to study and compare strategies, which become extremely important when a system has some reductions which lead to a normal form and others which do not. Here we need to know: is there a strategy which is guaranteed to lead to a normal form, if any exists (normalizing strategies)? Which strategies diverge if at all possible (perpetual strategies)?

Probabilistic Computation models uncertainty. Probabilistic models such as automata [33], Turing machines [36], and the λ-calculus [35] have existed for a long time. The pervasive role that probabilistic computation is assuming in areas as diverse as robotics, machine learning, and natural language processing has stimulated research on probabilistic programming languages, including functional languages [26, 34, 31], whose development is increasingly active. A typical programming language supports at least discrete distributions by providing a probabilistic construct which models sampling from a distribution. This is also the most concrete way to endow the λ-calculus with probabilistic choice [13, 10, 16]. Within the vast research on models of probabilistic systems, we wish to mention that probabilistic rewriting is the explicit base of PMaude [1], a language for specifying probabilistic concurrent systems.

Probabilistic Rewriting. Somewhat surprisingly, while a large and mature body of work supports the study of rewriting systems – even infinitary ones [12, 23] – work on the abstract theory of probabilistic rewriting systems is still sparse. The notion of Probabilistic Abstract Reduction Systems (PARS) was introduced by Bournez and Kirchner in [5], and then extended in [4] to account for non-determinism. Recent work [7, 15, 24, 3] shows an increased research interest. The key element in probabilistic rewriting is that even when the probability that a term leads to a normal form is $1$ (almost sure termination), that degree of certitude is typically not reached in any finite number of steps, but appears as a limit. Think of a rewrite rule (as in Fig. 1) which rewrites $c$ to either the value T or $c$, with equal probability $1/2$. We write this $c \to \{c^{1/2}, T^{1/2}\}$. After $n$ steps, $c$ reduces to T with probability $\frac{1}{2} + \frac{1}{2^2} + \dots + \frac{1}{2^n}$. Only at the limit does this computation terminate with probability $1$.
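The limit behaviour described above can be checked numerically. The following is a minimal sketch, assuming the rule rewrites c to T or back to c with probability 1/2 each; the function name `prob_terminated` is ours and purely illustrative.

```python
from fractions import Fraction

def prob_terminated(n: int) -> Fraction:
    """Probability that c has reached the value T within n steps,
    assuming each step sends c to T or to c with probability 1/2."""
    still_running = Fraction(1)   # probability mass still sitting on c
    done = Fraction(0)            # probability mass already on T
    for _ in range(n):
        done += still_running / 2
        still_running /= 2
    return done

for n in (1, 2, 3, 10):
    print(n, prob_terminated(n))
```

For every finite n the value stays strictly below 1; only the supremum over all n equals 1.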

The most well-developed literature on PARS is concerned with methods to prove almost sure termination, see e.g. [4, 18, 3] (this interest matches the fact that there is a growing body of methods to establish AST [2, 19, 21, 29]). However, considering rewrite rules subject to probabilities opens numerous other questions on PARS, which motivate our investigation.

We study a rewrite relation on distributions, which describes the evolution of a probabilistic system, for example a probabilistic program $P$. The result of the computation is a distribution $\beta$ over all the possible values of $P$. The intuition (see [26]) is that the program $P$ is executed, and random choices are made by sampling. This process eventually defines a distribution $\beta$ over the various outputs that the program can produce. We write this $P \Rightarrow^{\infty} \beta$.

What happens if the evaluation of a term is also non-deterministic? Remember that non-determinism arises naturally in the λ-calculus, because a term may have several redexes. This aspect has practical relevance to programming. Together with the fact that the result of a terminating computation is unique, it is key to the inherent parallelism of functional programs (see e.g. [28]). When assuming non-deterministic evaluation, several questions on PARS arise naturally. For example: (1.) When is the result unique? (Naively, if $P \Rightarrow^{\infty} \alpha$ and $P \Rightarrow^{\infty} \beta$, is $\alpha = \beta$?) (2.) Do all rewrite sequences from the same term have the same probability to reach a result? (3.) If not, does there exist a strategy to find a result with greatest probability?

Such questions are relevant not only to the theory, but also to the practice of computing. We believe that to study them, we can advantageously adapt techniques from Rewrite Theory. However, we cannot assume that standard properties of ARS hold for PARS. The game-changer is that termination appears as a limit. In Sec. 4.4 we show that a well-known ARS property, Newman’s Lemma, does not hold for PARS. This is not surprising; indeed, Newman’s Lemma is known not to hold in general for infinitary rewriting [22, 25]. Still, our counter-example points out that moving from ARS to PARS is non-trivial. There are two main issues: we need to find the right formulation and the right proof technique. It seems especially important to have a collection of proof methods which apply well to PARS.

Content and contributions.

Probability is concerned with asymptotic behaviour: what happens not after a finite number $n$ of steps, but when $n$ tends to infinity. In this paper we focus on the asymptotic behaviour of rewrite sequences with respect to normal forms. We study computational properties such as (1.), (2.), (3.) above. We do so from the point of view of ARS, aiming for properties which hold independently of the specific nature of the rewritten objects; the purpose is to have tools which apply to any probabilistic rewriting system.

After introducing and motivating our formalism (Sec. 2 and 3), in Sec. 4, we extend to the probabilistic setting the notions of Normalization (WN), Termination (SN) and Unique Normal Form (UN). In the rest of the paper, we provide methods and criteria to establish these properties, and we uncover relations between them. In particular, we study normalizing strategies. To do so, we extend to the probabilistic setting a proposal by Van Oostrom [39], which is based on Newman’s property of Random Descent [30, 39, 40] (see Sec. 1.1). The Random Descent method turns out to provide proof techniques which are well suited to PARS. Specific contributions are the following.

– We propose an analogue of UN for PARS. This is not obvious; the question was already studied in [15] for PARS which are AST, but their solution does not extend to general PARS.

– We investigate the classical ARS method to prove UN via confluence. It turns out that the notion of confluence does not need to be as strong as the classical case would suggest, which broadens its scope of application. Subtle aspects appear when dealing with limits, and the proofs demand specific techniques.

– We develop a probabilistic extension of the ARS notions of Random Descent (-RD, Sec. 5) and of being better (-better, Sec. 7) as tools to analyze and compare strategies, in analogy to their counterpart in [39]. Both properties are here parametric with respect to a chosen event of interest. -RD entails that all rewrite sequences from a term lead to the same result, in the same expected number of steps (the average number of steps, weighted w.r.t. probability). -better offers a method to compare strategies (“one strategy is always better than another”) w.r.t. the probability of reaching a result and the expected time to reach a result. It provides a sufficient criterion to establish that a strategy is normalizing (resp. perpetual), i.e. the strategy is guaranteed to lead to a result with maximal (resp. minimal) probability. A significant technical feature (inherited from [39]) is that both notions of -RD and -better come with a characterization via a local condition (in ARS, a typical example of a local vs global condition is local confluence vs confluence).

We apply these methods to study a probabilistic λ-calculus, which we discuss below together with the notion of Random Descent. A deeper example of application to probabilistic λ-calculus is in [17]; we discuss it in Sec. 8 “Further work and applications”.

Remark (On the term Random Descent).

Please note that in [30], the term Random refers to non-determinism (in the choice of the redex), not to randomized choice.

Related work.

We discuss related work in the context of PARS [4, 5]. We are not aware of any work which investigates normalizing strategies (or normalization in general, rather than termination). Instead, confluence in probabilistic rewriting has already drawn interesting work. A notion of confluence for a probabilistic rewrite system defined over a λ-calculus is studied in [14, 9]; in both cases, the probabilistic behavior corresponds to measurement in a quantum system. The work more closely related to our goals is [15]. It studies confluence of non-deterministic PARS in the case of finitary termination (being finitary is the reason why a Newman’s Lemma holds), and in the case of AST. As we observe in Sec. 4.3, their notion of unique limit distribution (if $\alpha, \beta$ are limits, then $\alpha = \beta$), while simple, is not an analogue of UN for general PARS; we extend the analysis beyond AST, to the general case, which arises naturally when considering probabilistic λ-calculus. On confluence, we also mention [24], whose results however do not cover non-deterministic PARS; the probability of the limit distribution is concentrated in a single element, in the spirit of Las Vegas Algorithms. [24] revisits results from [5], while we are in the non-deterministic framework of [4].

The way we define the evolution of PARS, via a one-step relation on distributions, follows the approach in [7], which also contains an embryo of the current work (a form of diamond property); the other results and developments are novel. A technical difference with [7] is that for the formalism to be general, a refinement is necessary (see Sec. 2.5); the issue was first pointed out in [15]. Our refinement is a variation of the one introduced (for the same reasons) in [3]; we however do not strictly adopt it, because we prefer to use a standard definition of distribution. [3] demonstrates the equivalence with the approach in [4].

1.1 Key notions

Random Descent.

Newman’s Random Descent (RD) [30] is an ARS property which guarantees that normalization suffices to establish both termination and uniqueness of normal forms. Precisely, if an ARS has random descent, paths to a normal form do not need to be unique, but they have unique length. In its essence: if a normal form exists, all rewrite sequences lead to it, and all have the same length (or, in Newman’s original terminology, the end-form is reached by random descent: whenever $a \to^{k} b$ with $b$ in normal form, all maximal reductions from $a$ have length $k$ and end in $b$). While only few systems directly verify it, RD is a powerful ARS tool; a typical use in the literature is to prove that a strategy has RD, to conclude that it is normalizing. A well-known property which implies RD is a form of diamond: $\leftarrow \cdot \rightarrow \ \subseteq\ (\rightarrow \cdot \leftarrow) \cup {=}$.

In [39] Van Oostrom defines a characterization of RD by means of a local property and proposes RD as a uniform method to (locally) compare strategies for normalization and minimality (resp. perpetuality and maximality). [40] extends the method and abstracts the notion of length into a notion of measure. In Sec. 5 and 7 we develop similar methods in a probabilistic setting. The analogue of length is the expected number of steps (Sec. 5.1).

Probabilistic Weak λ-calculus.

A notable example of a system which satisfies RD is the pure untyped λ-calculus endowed with call-by-value (CbV) weak evaluation. Weak [20, 6] means that reduction does not evaluate function bodies (i.e. the scope of λ-abstractions). We recall that weak CbV is the basis of the ML/CAML family of functional languages (and of most probabilistic functional languages). Because of RD, weak CbV λ-calculus has striking properties (see e.g. [8] for an account). First, if a term $M$ has a normal form $N$, any rewrite sequence will find it; second, the number $n$ of steps such that $M \to^{n} N$ is always the same.

In Sec. 6, we study a probabilistic extension of weak CbV. We show that it has analogous properties to its classical counterpart: all rewrite sequences converge to the same result, in the same expected number of steps.

Local vs global conditions.

To work locally means to reduce a test problem which is global, i.e., quantified over all rewrite sequences from a term, to local properties (quantified only over one-step reductions from the term), thus reducing the space of search when testing.

A paradigmatic example of a global property is confluence (CR: $b \leftarrow^{*} a \to^{*} c \ \Rightarrow\ \exists d$ s.t. $b \to^{*} d \leftarrow^{*} c$). Its global nature makes it difficult to establish. A standard way to factorize the problem is: (1.) prove termination and (2.) prove local confluence (WCR: $b \leftarrow a \to c \ \Rightarrow\ \exists d$ s.t. $b \to^{*} d \leftarrow^{*} c$). This is exactly Newman’s lemma: Termination + WCR $\Rightarrow$ CR. The beauty of Newman’s lemma is that a global property (CR) is guaranteed by a local property (WCR). Locality is also the strength and beauty of the RD method. While Newman’s lemma fails in a probabilistic setting (see Sec. 4.4), RD methods can be adapted (Sec. 5 and 7).
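To make the local/global distinction concrete, here is a small sketch for finite ARS only, encoded as a dictionary from each element to its set of one-step reducts. The encoding and the two example systems are our own illustrative assumptions; the first is the classical textbook system which is locally confluent but neither terminating nor confluent.

```python
from typing import Dict, Set

ARS = Dict[str, Set[str]]  # element -> set of its one-step reducts

def reachable(ars: ARS, a: str) -> Set[str]:
    """All b with a ->* b."""
    seen, stack = {a}, [a]
    while stack:
        for y in ars[stack.pop()]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen

def joinable(ars: ARS, b: str, c: str) -> bool:
    """True if b and c have a common reduct."""
    return bool(reachable(ars, b) & reachable(ars, c))

def is_wcr(ars: ARS) -> bool:
    """Local confluence: only one-step peaks are inspected."""
    return all(joinable(ars, b, c) for a in ars for b in ars[a] for c in ars[a])

def is_cr(ars: ARS) -> bool:
    """Confluence: peaks of arbitrary length are inspected."""
    return all(joinable(ars, b, c)
               for a in ars for b in reachable(ars, a) for c in reachable(ars, a))

def is_terminating(ars: ARS) -> bool:
    """A finite ARS terminates iff no element is reachable from one of its own reducts."""
    return not any(a in reachable(ars, b) for a in ars for b in ars[a])

# a <- b <-> c -> d : locally confluent, but b reaches two distinct normal forms.
looping = {"a": set(), "b": {"a", "c"}, "c": {"b", "d"}, "d": set()}
# A terminating diamond, where the local check is enough.
diamond = {"a": {"b", "c"}, "b": {"d"}, "c": {"d"}, "d": set()}

print(is_wcr(looping), is_terminating(looping), is_cr(looping))
print(is_wcr(diamond), is_terminating(diamond), is_cr(diamond))
```

The local test inspects only one-step peaks, while the global test ranges over all pairs of reducts; the looping system shows why the termination hypothesis cannot be dropped.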

1.2 Probabilistic λ-calculus and (Non-)Unique Result

Rewrite theory provides numerous tools to study uniqueness of normal forms, as well as techniques to study and compare strategies. This is not the case in the probabilistic setting. Perhaps a reason is that when extending the λ-calculus with a choice operator, confluence is lost, as was observed early [11]; we illustrate it in Examples 1 and 2, which are adapted from [11, 10]. The way to deal with this issue in probabilistic λ-calculi (e.g. [13, 10, 16]) has been to fix a deterministic reduction strategy, typically “leftmost-outermost”. To fix a strategy is not satisfactory, neither for the theory nor for the practice of computing. To understand why this matters, recall for example that confluence of the λ-calculus is what makes functional programs inherently parallel: every sub-expression can be evaluated in parallel; still, we can reason on a program using a deterministic sequential model, because the result of the computation is independent of the evaluation order (we refer to [28], and to Harper’s text “Parallelism is not Concurrency”, for a discussion of deterministic parallelism and how it differs from concurrency). Let us see what happens in the probabilistic case.

Example 1 (Confluence failure).

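Let us consider the untyped λ-calculus extended with a binary operator $\oplus$ which models probabilistic choice. Here $\oplus$ is just flipping a fair coin: $M \oplus N$ reduces to either $M$ or $N$ with equal probability $1/2$; we write this as $M \oplus N \to \{M^{1/2}, N^{1/2}\}$.

Consider the term $PQ$, where $P = (\lambda x.x)(\lambda x.\, x \ \mathrm{XOR}\ x)$ and $Q = (T \oplus F)$; here XOR is the standard construct for the exclusive OR, and T and F are terms which code the booleans.

  • If we evaluate $P$ and $Q$ independently, from $P$ we obtain $\lambda x.\, x \ \mathrm{XOR}\ x$, while from $Q$ we have either T or F, with equal probability $1/2$. By composing the partial results, we obtain $\{(T \ \mathrm{XOR}\ T)^{1/2}, (F \ \mathrm{XOR}\ F)^{1/2}\}$, and therefore $\{F^{1}\}$.

  • If we evaluate $PQ$ sequentially, in a standard left-most outer-most fashion, $PQ$ reduces to $(\lambda x.\, x \ \mathrm{XOR}\ x)Q$, which reduces to $(T \oplus F) \ \mathrm{XOR}\ (T \oplus F)$ and eventually to $\{T^{1/2}, F^{1/2}\}$.

Example 2.

The situation becomes even more complex if we also examine the possibility of diverging; try the same experiment as above on the term $PR$, with $R = (T \oplus F) \oplus \Delta\Delta$ (where $\Delta = \lambda x.xx$). Proceeding as before, we now obtain either $\{F^{1/2}\}$ or $\{T^{1/8}, F^{1/8}\}$.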
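The two evaluation orders of Examples 1 and 2 can be replayed on distributions directly. The sketch below does not implement the λ-calculus; it assumes that the function applied is "x XOR x", that the argument is a fair choice between T and F (Example 1), and that in Example 2 the argument additionally diverges with probability 1/2. All names are illustrative.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)

def xor(a: str, b: str) -> str:
    return "T" if a != b else "F"

def add(dist: dict, key: str, p: Fraction) -> None:
    dist[key] = dist.get(key, Fraction(0)) + p

def sample_then_copy(arg: dict) -> dict:
    """Evaluate the argument first; the sampled value is then duplicated."""
    out = {}
    for v, p in arg.items():
        add(out, xor(v, v), p)
    return out

def copy_then_sample(arg: dict) -> dict:
    """Duplicate the unevaluated argument; each copy is sampled independently."""
    out = {}
    for (v, p), (w, q) in product(arg.items(), repeat=2):
        add(out, xor(v, w), p * q)
    return out

coin = {"T": HALF, "F": HALF}                          # converges with probability 1
partial = {"T": Fraction(1, 4), "F": Fraction(1, 4)}   # diverges with probability 1/2

for arg in (coin, partial):
    print(sample_then_copy(arg), copy_then_sample(arg))
```

Missing mass in a result is the probability of divergence, which is why subdistributions are needed once non-termination is possible.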

We do not need to lose the features of the λ-calculus in the probabilistic setting. In fact, while some care is needed, determinism of the evaluation can be relaxed without giving up uniqueness of the result: the calculus we introduce in Sec. 6 is an example (we relax determinism to RD); we fully develop this direction in further work [17]. To be able to do so, we need abstract tools and proof techniques to analyze probabilistic rewriting. The same need for theoretical tools holds, more generally, whenever we desire to have a probabilistic language which allows for deterministic parallel reduction.

In this paper we focus on uniqueness of the result, rather than confluence, which is an important and sufficient, but not necessary property.

2 Probabilistic Abstract Rewriting System

We assume the reader is familiar with the basic notions of rewrite theory (such as Ch. 1 of [38]) and of discrete probability theory. We review the basic language of both. We then recall the definition of PARS from [5, 4], and explain through examples how a system described by a PARS evolves. This will motivate the formalism which we introduce in Sec. 3.

2.1 Basics on ARS.

An abstract rewrite system (ARS) is a pair $(A, \to)$ consisting of a set $A$ and a binary relation $\to$ on $A$; $\to^{*}$ denotes the transitive reflexive closure of $\to$. An element $a \in A$ is in normal form if there is no $b \in A$ with $a \to b$; $NF_A$ denotes the set of the normal forms of $A$. If $a \to^{*} b$ and $b \in NF_A$, we say $a$ has a normal form $b$.

Unique Normal Form.

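$(A, \to)$ has the property of unique normal form (with respect to reduction) (UN) if $\forall c \in A,\ \forall a, b \in NF_A$: ($c \to^{*} a$ and $c \to^{*} b$) $\Rightarrow a = b$. $(A, \to)$ has the normal form property (NFP) if $\forall b, c \in A,\ \forall a \in NF_A$: ($b \to^{*} c$ and $b \to^{*} a$) $\Rightarrow c \to^{*} a$. NFP implies UN.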

Normalization and Termination.

The fact that an ARS has unique normal forms implies neither that all terms have a normal form, nor that if a term has a normal form, each rewrite sequence converges to it. A term $a$ is terminating (aka strongly normalizing, SN) if it has no infinite sequence $a \to a_1 \to a_2 \to \cdots$; it is normalizing (aka weakly normalizing, WN) if it has a normal form. (Please observe that the terminology is community-dependent. In logic: Strong Normalization, Weak Normalization, Church-Rosser, hence the standard abbreviations SN, WN, CR. In computer science: Termination, Normalization, Confluence.) These are all important properties to establish about an ARS, as it is important to have a rewrite strategy which finds a normal form, if it exists.
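For a finite ARS these notions are directly computable. The following sketch assumes the ARS is given as a dictionary from elements to their sets of one-step reducts (our encoding, not the paper's), and checks WN, SN and uniqueness of normal forms for a single start element.

```python
from typing import Dict, Set

ARS = Dict[str, Set[str]]  # element -> set of its one-step reducts

def reachable(ars: ARS, a: str) -> Set[str]:
    """All b with a ->* b."""
    seen, stack = {a}, [a]
    while stack:
        for y in ars[stack.pop()]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen

def normal_forms_of(ars: ARS, a: str) -> Set[str]:
    """Normal forms reachable from a."""
    return {b for b in reachable(ars, a) if not ars[b]}

def is_wn(ars: ARS, a: str) -> bool:
    """Normalizing: some rewrite sequence from a ends in a normal form."""
    return bool(normal_forms_of(ars, a))

def is_sn(ars: ARS, a: str) -> bool:
    """Terminating: on a finite ARS, no cycle is reachable from a."""
    on_path, finished = set(), set()

    def visit(x: str) -> bool:
        if x in on_path:          # we came back to an element of the current path
            return False
        if x in finished:
            return True
        on_path.add(x)
        ok = all(visit(y) for y in ars[x])
        on_path.discard(x)
        if ok:
            finished.add(x)
        return ok

    return visit(a)

ars = {"a": {"a", "b"}, "b": set(), "c": {"d", "e"}, "d": set(), "e": set()}
print(is_wn(ars, "a"), is_sn(ars, "a"), normal_forms_of(ars, "a"))
print(is_wn(ars, "c"), is_sn(ars, "c"), normal_forms_of(ars, "c"))
```

Here `a` is normalizing with a unique normal form but not terminating, while `c` is terminating with two distinct normal forms, so the three properties are independent.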

2.2 Basics on Probabilities.

The intuition is that random phenomena are observed by means of experiments (running a probabilistic program is such an experiment); each experiment results in an outcome. The collection of all possible outcomes is represented by a set, called the sample space $\Omega$. When the sample space $\Omega$ is countable, the theory is simple. A discrete probability space is given by a pair $(\Omega, \mu)$, where $\Omega$ is a countable set, and $\mu$ is a discrete probability distribution on $\Omega$, i.e. a function $\mu : \Omega \to [0,1]$ such that $\sum_{\omega \in \Omega} \mu(\omega) = 1$. A probability measure is assigned to any subset $A \subseteq \Omega$ as $\mu(A) = \sum_{\omega \in A} \mu(\omega)$. In the language of probabilists, a subset of $\Omega$ is called an event.

Example 3 (Die).

Consider tossing a die once. The space of possible outcomes is the set $\Omega = \{1, 2, 3, 4, 5, 6\}$. The probability $\mu$ of each outcome is $1/6$. The event “result is odd” is the subset $A = \{1, 3, 5\}$, whose probability is $\mu(A) = 1/2$.

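Each function $F : \Omega \to \Delta$, where $\Delta$ is another countable set, induces a probability distribution $\mu^{F}$ on $\Delta$ by composition: $\mu^{F}(d) := \mu(F^{-1}(d))$, i.e. $\mu\{\omega \in \Omega \mid F(\omega) = d\}$. Thus $(\Delta, \mu^{F})$ is also a probability space. In the language of probability theory, $F$ is called a discrete random variable on $(\Omega, \mu)$. The expected value (also called the expectation or mean) of a random variable $F$ is the weighted (in proportion to probability) average of the possible values of $F$. Assume $F$ discrete and $g : \Delta \to \mathbb{R}$ a non-negative function, then $E(g(F)) = \sum_{d \in \Delta} g(d)\, \mu^{F}(d)$.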
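The notions of event, induced distribution and expectation can be illustrated on the die of Example 3. This is a minimal sketch with finite dictionaries standing in for discrete distributions; the helper names are ours.

```python
from fractions import Fraction

# Fair die: the sample space is {1, ..., 6}, each outcome has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

def prob(mu: dict, event: set) -> Fraction:
    """Probability of an event, i.e. of a subset of the sample space."""
    return sum((mu[w] for w in event), Fraction(0))

def induced(mu: dict, f) -> dict:
    """Distribution induced on the codomain by a random variable f."""
    out = {}
    for w, p in mu.items():
        out[f(w)] = out.get(f(w), Fraction(0)) + p
    return out

def expectation(mu: dict, g=lambda x: x) -> Fraction:
    """Expected value of g under mu (probability-weighted average)."""
    return sum((g(w) * p for w, p in mu.items()), Fraction(0))

print(prob(die, {1, 3, 5}))            # the event "result is odd"
print(induced(die, lambda w: w % 2))   # parity as a random variable
print(expectation(die))                # mean face value
```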

2.3 (Sub)distributions: operations and notation.

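We need the notion of subdistribution to account for unsuccessful computations and partial results. Given a countable set $\Omega$, a function $\mu : \Omega \to [0,1]$ is a probability subdistribution if $\|\mu\| := \sum_{\omega \in \Omega} \mu(\omega) \le 1$. We write $\mathrm{DST}(\Omega)$ for the set of subdistributions on $\Omega$. With a slight abuse of language, we often use the term distribution also for subdistribution. The support of $\mu$ is the set $\mathrm{Supp}(\mu) = \{a \in \Omega \mid \mu(a) > 0\}$. $\mathrm{DST}^{F}(\Omega)$ denotes the set of $\mu \in \mathrm{DST}(\Omega)$ with finite support.

$\mathrm{DST}(\Omega)$ is equipped with the order relation of functions: $\mu \le \rho$ if $\mu(a) \le \rho(a)$ for each $a \in \Omega$. Multiplication by a scalar ($p \cdot \mu$) and sum ($\sigma + \rho$) are defined as usual, $(p \cdot \mu)(a) = p \cdot \mu(a)$, $(\sigma + \rho)(a) = \sigma(a) + \rho(a)$, provided $p \in [0,1]$, and $\|\sigma\| + \|\rho\| \le 1$.

We adopt the following convention: if $\Omega \subseteq \Omega'$, and $\mu \in \mathrm{DST}(\Omega)$, we also write $\mu \in \mathrm{DST}(\Omega')$, with the implicit assumption that the extension behaves as $\mu$ on $\Omega$, and is $0$ otherwise. In particular, we identify a subdistribution and its support.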

Notation 4 (Representation).

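We represent a (sub)distribution by explicitly indicating the support, and (as superscript) the probability assigned to each element by $\mu$. We write $\mu = \{a_0^{p_0}, \dots, a_n^{p_n}\}$ if $\mu(a_0) = p_0, \dots, \mu(a_n) = p_n$ and $\mu(a_j) = 0$ otherwise.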
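The operations on subdistributions can be sketched with finite dictionaries, where elements outside the dictionary implicitly have probability 0 (matching the convention above). The helper names are illustrative assumptions, not notation from the paper.

```python
from fractions import Fraction

def norm(mu: dict) -> Fraction:
    """Total mass of a subdistribution; at most 1."""
    return sum(mu.values(), Fraction(0))

def support(mu: dict) -> set:
    return {a for a, p in mu.items() if p > 0}

def scale(p: Fraction, mu: dict) -> dict:
    """Multiplication by a scalar p in [0, 1]."""
    if not 0 <= p <= 1:
        raise ValueError("scalar must lie in [0, 1]")
    return {a: p * q for a, q in mu.items()}

def add(sigma: dict, rho: dict) -> dict:
    """Sum of two subdistributions, defined only if the total mass stays at most 1."""
    if norm(sigma) + norm(rho) > 1:
        raise ValueError("the sum would exceed total mass 1")
    out = dict(sigma)
    for a, q in rho.items():
        out[a] = out.get(a, Fraction(0)) + q
    return out

def leq(mu: dict, rho: dict) -> bool:
    """Pointwise order on subdistributions."""
    return all(p <= rho.get(a, Fraction(0)) for a, p in mu.items())

mu = {"T": Fraction(1, 2)}                         # {T^1/2}
rho = {"T": Fraction(1, 4), "F": Fraction(1, 4)}   # {T^1/4, F^1/4}
print(norm(mu), support(rho), add(mu, rho), leq(mu, add(mu, rho)))
```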

2.4 Probabilistic Abstract Rewrite Systems (PARS).

Figure 1: Almost Sure Termination (tree: c branches to c and T with probability 1/2 each; the inner c branches again to c and T with probability 1/4 each, and so on)
Figure 2: Deterministic PARS (tree: 2 branches to 1 and 3 with probability 1/2 each; then to 0, 2 and 2, 4 with probability 1/4 each)
Figure 3: Non-deterministic PARS (tree: as in Figure 2, but one occurrence of 2 continues to 1 and 3 with probability 1/8 each, while the other moves to stop with probability 1/4)

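A probabilistic abstract rewrite system (PARS) is a pair $(A, \to)$ of a countable set $A$ and a relation $\to\ \subseteq A \times \mathrm{DST}^{F}(A)$ such that for each $(a, \beta) \in\ \to$, $\|\beta\| = 1$. We write $a \to \beta$ for $(a, \beta) \in\ \to$ and we call it a rewrite step, or a reduction. An element $a \in A$ is in normal form if there is no $\beta$ with $a \to \beta$. We denote by $NF_A$ the set of the normal forms of $A$ (or simply NF when $A$ is clear). A PARS is deterministic if, for all $a$, there is at most one $\beta$ with $a \to \beta$.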

Remark.

The intuition behind $a \to \beta$ is that the rewrite step $a \to b$ ($b \in A$) has probability $\beta(b)$. The total probability given by the sum of all steps $a \to b$ is $1$.

Probabilistic vs Non-deterministic.

It is important to be clear about the distinction between probabilistic choice (which globally happens with certitude) and non-deterministic choice (which leads to different distributions of outcomes). Let us discuss some examples.

Example 5 (A deterministic PARS).

Fig. 2 shows a simple random walk over $\mathbb{N}$, which describes a gambler starting with $2$ points and playing a game where every time he either gains $1$ point with probability $1/2$ or loses $1$ point with probability $1/2$. This system is encoded by the following PARS on $\mathbb{N}$: $n+1 \to \{n^{1/2}, (n+2)^{1/2}\}$. Such a PARS is deterministic, because for every element, at most one choice applies. Note that $0$ is a normal form.

Example 6 (A non-deterministic PARS).

Assume now (Fig. 3) that the gambler of Example 5 is also given the possibility to stop at any time. The two choices are here encoded as follows: $n+1 \to \{n^{1/2}, (n+2)^{1/2}\}$ and $n+1 \to \{\mathrm{stop}^{1}\}$.
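The two gambler systems can be written down as data. In this sketch a PARS is a function returning, for each element, the list of distributions it may rewrite to; the encoding of the rules (lose or gain one point with probability 1/2 each, optionally stop) is our reading of the examples and figures, and the names are illustrative.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def walk_rules(n):
    """Random walk: a positive n moves to n-1 or n+1; 0 is a normal form."""
    return [] if n == 0 else [{n - 1: HALF, n + 1: HALF}]

def walk_or_stop_rules(n):
    """As above, but the gambler may also choose to stop."""
    if n == "stop" or n == 0:
        return []
    return [{n - 1: HALF, n + 1: HALF}, {"stop": Fraction(1)}]

def is_normal(rules, a) -> bool:
    return not rules(a)

def is_deterministic_on(rules, elements) -> bool:
    """Deterministic: at most one rewrite step is available from each element."""
    return all(len(rules(a)) <= 1 for a in elements)

print(is_deterministic_on(walk_rules, range(10)))
print(is_deterministic_on(walk_or_stop_rules, range(10)))
```

Each right-hand side has total mass 1: the probabilistic choice inside a rule always happens, whereas the choice between rules is left to the evaluation strategy.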

2.5 Evolution of a system described by a PARS.

We now need to explain how a system which is described by a PARS evolves. An option is to follow the stochastic evolution of a single run, one sampling at a time, as we have done in Fig. 1, 2, and 3. This is the approach in [4], where non-determinism is resolved by the use of policies. Here we follow a different (though equivalent) way (see the Related Work Section). We describe the possible states of the system, at a certain time $t$, globally, as a distribution on the space of all terms. The evolution of the system is then a sequence of distributions. Since all the probabilistic choices are taken together, the only source of choice in the evolution is non-determinism. This global approach allows us to deal with non-determinism by using techniques which have been developed in Rewrite Theory. Before introducing the formal definitions, we informally examine some examples, and point out why some care is needed.

Figure 4: Ex. 8 (non-deterministic PARS)
Figure 5: Ex. 9 (non-deterministic PARS)

Example 7 (Fig. 1 continued).

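The PARS described by the rule $c \to \{c^{1/2}, T^{1/2}\}$ (in Fig. 1) evolves as follows: $\{c^{1}\} \rightrightarrows \{c^{1/2}, T^{1/2}\} \rightrightarrows \{c^{1/4}, T^{1/4}, T^{1/2}\} \rightrightarrows \cdots$.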

Example 8 (Fig. 4).

Fig. 4 illustrates the possible evolutions of a non-deterministic system which has two rules. The arrows are annotated with the chosen rule.

Example 9 (Fig. 5).

Fig. 5 illustrates the possible evolutions of a system with two rules.

If we look at Fig. 3, we observe that after two steps, there are two distinct occurrences of the element 2, which live in two different runs of the program: the run 2.1.2, and the run 2.3.2. There are two possible transitions from each 2. The next transition only depends on the fact of having 2, not on the run in which 2 occurs: its history is only a way to distinguish the occurrence. For this reason, given a PARS $(A, \to)$, we keep track of different occurrences of an element $a \in A$, but not necessarily of the history. The next section formalizes these ideas.

Markov Decision Processes.

To understand our distinction between occurrences of $a \in A$ in different paths, it is helpful to think how a system is described in the framework of Markov Decision Processes (MDP) [32]. Indeed, in the same way as ARS correspond to transition systems, PARS correspond to probabilistic transitions. Let us regard a PARS step $r : a \to \beta$ as a probabilistic transition ($r$ is here a name for the rule). Let us assume $a_0 \in A$ is an initial state. In the setting of MDP, a typical element (called sample path) of the sample space $\Omega$ is a sequence $\omega = (a_0, r_0, a_1, r_1, \dots)$ where $r_0 : a_0 \to \beta_0$ is a rule, $a_1 \in \mathrm{Supp}(\beta_0)$ an element, $r_1 : a_1 \to \beta_1$, and so on. The index $t = 0, 1, 2, \dots$ is interpreted as time. On $\Omega$ various random variables are defined; for example, $X_t(\omega) = a_t$, which represents the state at time $t$. The sequence $\langle X_t \rangle$ is called a stochastic process.

3 A Formalism for Probabilistic Rewriting

We introduce a formalism to describe the evolution of a system described by a PARS. From now on, we assume $A$ to be a countable set on which a PARS $(A, \to)$ is defined.

The sample space.

Let be a list over , and the collection of all such lists. More formally, we fix a countable index set , and let be the graph of a function from to (). We denote by the collection of all such . is the collection of finitely supported distributions on (i.e. , with ). For concreteness, here we assume . Hence, if is finite, is simply a list over .

Notation 10.

If , we write its support as a list. We write for and for

Remark 11 (Index Set).

The role of indexing is only to distinguish different occurrences; the specific order is irrelevant. We use $\mathbb{N}$ as index set for simplicity. Another natural instance of the index set is $A^{*}$, i.e. the set of finite sequences on $A$. This way, occurrences are labelled by their path, which allows a direct connection with the sample space of Markov Decision Processes [32] which we mention in Sec. 2.5 (see Appendix).

Given the PARS , we work with two families of probability spaces: , where (used e.g. to describe a rewrite step) and ), where and .
Letters Convention: we reserve the letters for distributions in , and the letters for distributions in .
Embedding and Flattening: we move between and subsets of via the maps and (Fig. 7 and 7), where to define an injection , we fix an enumeration , and identify with its graph. Given a distribution , the function flat induces the distribution (Fig. 7); conversely, given , the function induces the distributions (Fig. 7). Recall that in Sec. 2.2 we already reviewed how functions induce distributions; indeed, with that language, and are random variables.

Example 12.

Assume , and an enumeration of . Then which we also write .

Figure 6: Flattening
Figure 7: Embedding

Disjoint sum . The disjoint sum of lists is simply their concatenation. The disjoint sum of sets in and of the corresponding distributions is easily defined.

The rewriting relation $\rightrightarrows$.

Let $(A, \to)$ be a PARS. We now define a binary relation $\rightrightarrows$, which is obtained by lifting the relation $\to$. Several natural choices are possible. Here, we choose a lifting which forces all non-terminal elements to be reduced. This plays an important role for the development of the paper, as it corresponds to the key notion of one step reduction in classical ARS (see discussion in Sec. 8).

Definition 13 (Lifting).

Given a relation $\to$, its lifting to a relation $\rightrightarrows$ is defined by the following rules, where for readability we use Notation 10.

In rule , is the result of embedding in (see Fig. 7 and Example 12). To apply rule , we choose a reduction step from for each . The disjoint sum of all () is weighted with the probability of each .
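The lifting can be sketched operationally. The code below is an illustration under our own encoding, not the paper's formal definition: a state is a list of (element, probability) pairs so that distinct occurrences stay distinct, every occurrence which is not in normal form is reduced once, and the results are concatenated with rescaled weights. The rule used as an example (c goes to c or T with probability 1/2 each) and all names are assumptions.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def rules(a):
    """Hypothetical PARS: c -> {c^1/2, T^1/2}; every other element is normal."""
    return [{"c": HALF, "T": HALF}] if a == "c" else []

def lift_step(m, rules, choose=lambda a, options: options[0]):
    """One lifted step on a list of (element, probability) pairs.
    `choose` resolves non-determinism when several rules apply."""
    out = []
    for a, p in m:
        options = rules(a)
        if not options:
            out.append((a, p))                    # normal forms are kept as they are
        else:
            beta = choose(a, options)
            out.extend((b, p * q) for b, q in beta.items())
    return out

def flat(m):
    """Flatten a list of occurrences into a (sub)distribution on elements."""
    mu = {}
    for a, p in m:
        mu[a] = mu.get(a, Fraction(0)) + p
    return mu

m0 = [("c", Fraction(1))]
m1 = lift_step(m0, rules)
m2 = lift_step(m1, rules)
print(m1, m2, flat(m2))
```

Keeping occurrences separate in the list and merging them only in `flat` mirrors the distinction between an element and its occurrences in different runs.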

Example 14.

Let us derive the reduction in Fig. 3.

Rewrite sequences.

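We write $\mu_0 \rightrightarrows^{*} \mu_n$ to indicate that there is a finite sequence $\mu_0, \dots, \mu_n$ such that $\mu_i \rightrightarrows \mu_{i+1}$ for all $0 \le i \le n-1$ (and $\mu_0 \rightrightarrows^{n} \mu_n$ to specify its length $n$). We write $\langle \mu_n \rangle_{n \in \mathbb{N}}$ to indicate an infinite rewrite sequence.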

Figure conventions:

we depict any rewrite relation simply as $\to$; as is standard, we use $\twoheadrightarrow$ for $\to^{*}$; solid arrows are universally quantified, dashed arrows are existentially quantified.

3.1 Normal Forms. Equivalences and Order.

The intuition is that a rewrite sequence describes a computation; a distribution $\mu_i$ such that $\mu \rightrightarrows^{i} \mu_i$ represents a state (precisely, the state at time $i$) in the evolution of the system with initial state $\mu$. Let $\mu$ represent a state of the system. The probability that the system is in normal form is described by $\mu(NF_A)$ (recall Example 3); the probability that the system is in a specific normal form $t$ is described by $\mu(t)$. It is convenient to denote by $\mu^{NF}$ the restriction of $\mu$ to $NF_A$. Observe that $\|\mu^{NF}\| = \mu(NF_A)$. The probability of reaching a normal form can only increase in a rewrite sequence (because of (L1) in Def. 13). Therefore the following key lemma holds.

Lemma 15.

If $\mu \rightrightarrows \rho$ then $\mu^{NF} \le \rho^{NF}$.

Equivalences and Order.

In this paper we do not need, and do not define, any equality on lists. If we wanted, the natural one would be equality up to reordering, making lists into multisets; however, here we are rather interested in observing specific events. We only consider equivalence and order relations w.r.t. the associated (flat) distributions. The order is the pointwise order (Sec. 2.3).

Definition 16 (Equivalence and Order).

Let .

  1. Flat Equivalence: , if . Similarly, if .

  2. Equivalence in Normal Form: , if . Similarly, , if

  3. Equivalence in the NF -norm: , if , and , if

Observe that (2.) and (3.) compare and abstracting from any term which is not in normal form; these two will be the relations which matter to us.

Example 17.

Assume T is a normal form and are not. (1.) Let . holds for because . (2.) Let , . both hold, does not.

The above example illustrates also the following.

Fact 18.

. Similarly for the order relations.

4 Asymptotic Behaviour and Normal Forms

We examine the asymptotic behaviour of rewrite sequences with respect to normal forms. If a rewrite sequence describes a computation, the result of the computation is a distribution on the possible outputs of the probabilistic program. We are interested in the result at the limit, which is formalized by the (standard) notion of limit distribution (Def. 20). What is less standard here, and demands care, is that each term has a set of limits. In this section we investigate the notions of normalization, termination and unique normal form for PARS.

4.1 Limit Distributions

Before introducing limit distributions, we revisit some facts on sequences of bounded functions.

Monotone Convergence.

Let be a non-decreasing sequence of (sub)distributions over a countable set (the order on subdistributions is defined pointwise, Sec. 2.3). For each the sequence of real numbers is nondecreasing and bounded, therefore the sequence has a limit, which is the supremum: . Observe that if then , where we recall that .

Fact 19.

Given as above, the following properties hold. Define

  1. is a subdistribution over .

Proof.

(1.) follows from the fact that is a nondecreasing sequence of functions, hence (by Monotone Convergence, see Thm. 43 in Appendix) we have :

(2.) is immediate, because the sequence is nondecreasing and bounded.
(3.) follows from (1.) and (2.). Since , then is a subdistribution. ∎

Limit distributions.

Let be a rewrite sequence. If , then is nondecreasing (by Lemma 15); so we can apply Fact 19, with now being .

Definition 20 (Limits).

Let be a rewrite sequence from . We say

  1. converges with probability .

  2. converges to (written ), where for

We call a limit distribution of . We write if has a converging sequence, and define .
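As an illustration of Lemma 15 and Def. 20, the following Python sketch computes the normal-form distributions along a rewrite sequence. The one-rule system used here (a term `a` that rewrites to itself or to the normal form `T`, each with probability 1/2) is a hypothetical example, not one of the paper's figures. The function `step` assumes that a lifted step rewrites every term that is not in normal form and leaves normal forms in place.

```python
from fractions import Fraction

# Hypothetical one-rule PARS (an assumption of this sketch):
#   a -> {a: 1/2, T: 1/2},   T is a normal form.
RULES = {"a": {"a": Fraction(1, 2), "T": Fraction(1, 2)}}


def step(m):
    """One lifted step: rewrite every non-normal term, keep normal forms."""
    out = []
    for p, t in m:
        if t in RULES:
            out.extend((p * q, s) for s, q in RULES[t].items())
        else:
            out.append((p, t))
    return out


def nf_distribution(m):
    """Restriction of the flattened distribution to normal forms."""
    mu = {}
    for p, t in m:
        if t not in RULES:
            mu[t] = mu.get(t, Fraction(0)) + p
    return mu


def nf_prefix(n, start="a"):
    """Normal-form distributions of the first n+1 elements of the sequence."""
    m = [(Fraction(1), start)]
    seq = []
    for _ in range(n + 1):
        seq.append(nf_distribution(m))
        m = step(m)
    return seq


seq = nf_prefix(10)
probs = [d.get("T", Fraction(0)) for d in seq]
print(probs)
```

In this toy system the probability of `T` after n steps is 1 - 2^(-n), so the values are nondecreasing with supremum 1: the sequence converges with probability 1, and its limit distribution assigns probability 1 to `T`.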

4.2 Normalization and Termination

Non-determinism implies that several rewrite sequences are possible from the same . In the setting of ARS, the notion of reaching a result from a term comes in two flavours (see Sec. 2.1): (1.) there exists a rewrite sequence from which leads to a normal form (normalization, WN); (2.) each rewrite sequence from leads to a normal form (termination, SN). Below, we make a similar distinction: instead of simply reaching a normal form or not, a sequence does so with a probability .

Definition 21 (Normalization and Termination).

Let , . We write if there exists a sequence from which converges with probability .

  • is p- ( normalizes with probability ) if is the greatest probability to which a sequence from can converge.

  • is p-( terminates with probability ) if each sequence from converges with probability . is Almost Sure Terminating (AST) if it terminates with probability .

A PARS is p-, p-, AST, if each satisfies that property.
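The difference between the existential and the universal notion in Def. 21 can be sketched in Python. The system below is a hypothetical example (not the one in the paper's figures): the term `a` has two rules, one flipping a fair coin between `a` and the normal form `T`, the other looping on `a`. A "strategy" is modelled, as a simplifying assumption, by a function from the step index to the index of the rule applied to every occurrence of `a` at that step.

```python
from fractions import Fraction

# Hypothetical PARS with a non-deterministic choice at term "a":
#   rule 0: a -> {a: 1/2, T: 1/2}
#   rule 1: a -> {a: 1}
RULES = {
    "a": [
        {"a": Fraction(1, 2), "T": Fraction(1, 2)},
        {"a": Fraction(1)},
    ]
}


def nf_probability(strategy, n, start="a"):
    """Probability of being in normal form after n lifted steps.

    strategy(k) returns the index of the rule used at step k.
    """
    m = [(Fraction(1), start)]
    for k in range(n):
        nxt = []
        for p, t in m:
            if t in RULES:
                rule = RULES[t][strategy(k)]
                nxt.extend((p * q, s) for s, q in rule.items())
            else:
                nxt.append((p, t))  # normal forms are kept
        m = nxt
    return sum((p for p, t in m if t not in RULES), Fraction(0))


always_flip = lambda k: 0
always_loop = lambda k: 1
flip_once = lambda k: 0 if k == 0 else 1

for name, s in [("flip", always_flip), ("loop", always_loop), ("once", flip_once)]:
    print(name, nf_probability(s, 10))
```

In this toy system the first strategy approaches probability 1, the second stays at 0, and the third stops at 1/2. Some sequence converges with probability 1, so 1 is the greatest reachable value, but not every sequence does; under the reading of Def. 21 used here, the system normalizes with probability 1 without terminating with probability 1.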

Example 22.

The system in Fig. 5 is -, but not -. The top rewrite sequence (in blue) converges to . The bottom rewrite sequence (in red) converges to . In between, we have all dyadic possibilities. In contrast, the system in Fig. 5 is AST.

Remark (Not only AST).

Many natural examples are not limited to termination and AST, such as those in Fig. 5, in Example 2 and in Example 36. For this reason, we go beyond AST, and moreover make a distinction between weak and strong normalization.

4.3 On Unique Normal Forms

How do different rewrite sequences from the same initial compare w.r.t. the result they compute? Assume and ; it is natural to wonder how and relate. Normalization and termination are quantitative yes/no properties: we are only interested in the measure , for a limit distribution. For example, if and , then converges with probability , but we make no distinction between the two (very different) results. Similarly, consider again Fig. 5. The system is AST, however the limit distributions are not unique: they span the continuum , for . These observations motivate attention to finer-grained properties.

In Sec. 2.1 we reviewed the ARS notion of unique normal form (UN). Let us now examine an analogue of UN in a probabilistic setting. An intuitive candidate is the following:

ULD: if , then

which was first proposed in [15], where it is shown that, in the case of AST, confluence implies ULD. However, ULD is not a good analogue in general, because a PARS need not be AST (or ); it may well be that and , with , as in Ex. 2 and in Fig. 5; similar examples are natural in an untyped probabilistic -calculus (recall that the -calculus is not SN!). In the general case, ULD is not implied by confluence: the system in Fig. 5 is indeed confluent. We then would like to say that it satisfies UN.

We propose as probabilistic analogue of UN the following property

: has a unique maximal element.

Remark.

In the case of  (and AST), all limits are maximal, hence  becomes ULD.

4.3.1 Confluence and .

We justify that  is an appropriate generalization of the UN property, by showing that it satisfies an analogue of standard ARS results: “Confluence implies UN” (see Thm. 25) and “the Normal Form Property implies UN” (Lemma 24). While the statements are similar to the classical ones, the content is not. To understand why is different, and non-trivial, observe that is in general uncountable, hence there is no reason even to believe that has maximal elements, for the same reason as has no max, even if it has a sup.

Remark 23 (Which notion of Confluence?).

To guarantee , a weaker form of confluence than one would expect suffices. Assume ; with the standard notion of confluence in mind, we may require that such that , or that such that , and . Both are fine, but a weaker notion of equivalence suffices: NF-Confluence (defined below), which only regards normal forms. Obviously, the two stronger notions of confluence which we just discussed imply it.

A PARS satisfies the following properties if they hold for each :

  • NF-Confluence (Confluence in Normal Form): with , such that , , and .

  •  (Normal Form Property): if is maximal in , and then .

  •  (Limit Distributions Property): if and , there exists such that and .

The following result (which is standard for ARS) is easy, and independent of confluence.

Lemma 24.

For each PARS such that has maximal elements, .

Proof.

Let be maximal. If , there is a sequence from such that .  implies that , , and therefore . We conclude that ; hence if is maximal, . ∎

Proving that NF-Confluence implies  is more delicate. We need to prove that confluence implies existence and uniqueness of maximal elements of .

Theorem 25.

For each PARS, NF-Confluence implies .

Proof.

We give the proof in Appendix, Section A.2.1. ∎

Note that the proofs in this section refine those for the analogous ARS properties in a way similar to the generalization to infinitary rewriting, by approximation; the quantitative character of probability adds specific elements which are reminiscent of calculus.

4.4 Newman’s Lemma Failure, and Proof Technique for PARS

In Lemma 24 and Thm. 25, the statements have the same flavour as similar ones for ARS, but the notions are not the same. The notion of limit (and therefore that of , , and ) does not belong to ARS. For this reason, the rewrite system which we are studying is not simply an ARS, and one should not assume that standard ARS properties hold. An illustration of this is Newman’s Lemma. Given a PARS, let us assume AST and observe that in this case, confluence at the limit can be identified with . A wrong attempt: AST + , where : if and , then , with , . This does not hold. A counterexample is the PARS in Fig. 5, which does satisfy . (More in the Appendix.)

What is at play here is that the notion of termination is not the same in ARS and in PARS. A fundamental fact of ARS (on which all proofs of Newman’s Lemma rely) is: termination implies that the rewriting relation is well-founded. All terminating ARS allow well-founded induction as a proof technique; this is not the case for probabilistic termination. To transfer properties from ARS to PARS there are two issues: we need to find the right formulation and the right proof technique.

Our counter-example still leaves open the question “Are there local properties which guarantee ?” In the rest of the paper, we develop proof techniques to study , ,  and their relations. We will always aim at local conditions.

5 Random Descent (RD)

In this section we introduce Random Descent (-RD), a tool which is able to guarantee some remarkable properties: , -termination as soon as there exists a sequence which converges to , and also the fact that all rewrite sequences from a term have the same expected number of steps. -RD generalizes to PARS the notion of Random Descent: after any steps, non-determinism is irrelevant up to a chosen equivalence . Indeed -RD is defined parametrically over an equivalence relation on . For concreteness, assume to be either or (see Def. 16). Then -RD implies that all rewrite sequences from :

  • have the same probability of reaching a normal form after steps (for each );

  • converge to the same limit;

  • have the same expected number of steps.

The main technical result is a local characterization of the property (Thm. 29), similar to [39].

Figure 8: Random Descent
Figure 9: Diamond
Figure 10: Proof of Thm. 29

Definition 26 ( Random Descent).

Let be an equivalence relation on . The PARS satisfies the following properties (in Fig. 10) if they hold for each .

  • -RD: for each pair of sequences , from , ) holds, .

  • local -RD (-LRD): if , then for each there exist with , , and ).

Example 27.

In Fig. 5 -RD holds for , but not for .
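The way Random Descent depends on the chosen equivalence can be explored with a bounded Python check. The sketch assumes the reading of Def. 26 in which the k-th elements of any two rewrite sequences from the same start must be equivalent, for every k, and it assumes that a lifted step rewrites every non-normal term while keeping normal forms. The system (terms `a`, `b`, `c` and normal forms `T`, `F`) is hypothetical, and checking up to a fixed depth is only evidence, not a proof.

```python
from fractions import Fraction
from itertools import product

# Hypothetical PARS (an assumption of this sketch); T and F are normal forms.
RULES = {
    "a": [
        {"b": Fraction(1, 2), "T": Fraction(1, 2)},
        {"c": Fraction(1, 2), "T": Fraction(1, 2)},
    ],
    "b": [{"T": Fraction(1)}],
    "c": [{"F": Fraction(1)}],
}


def successors(m):
    """All multidistributions reachable from m in one lifted step."""
    options = []
    for p, t in m:
        if t in RULES:
            # one option per rule applicable to this occurrence of t
            options.append(
                [tuple((p * q, s) for s, q in rule.items()) for rule in RULES[t]]
            )
        else:
            options.append([((p, t),)])  # normal forms are kept
    return [tuple(x for part in choice for x in part) for choice in product(*options)]


def reachable(start, k):
    """Multidistributions reachable from [start] in exactly k steps."""
    level = [((Fraction(1), start),)]
    for _ in range(k):
        level = [n for m in level for n in successors(m)]
    return level


def nf_distribution(m):
    mu = {}
    for p, t in m:
        if t not in RULES:
            mu[t] = mu.get(t, Fraction(0)) + p
    return mu


def nf_norm(m):
    return sum(nf_distribution(m).values(), Fraction(0))


def same_at_each_step(observe, start, depth):
    """True if, for each k <= depth, all k-step results agree under `observe`."""
    for k in range(depth + 1):
        obs = [observe(m) for m in reachable(start, k)]
        if any(o != obs[0] for o in obs):
            return False
    return True


print(same_at_each_step(nf_norm, "a", 3))
print(same_at_each_step(nf_distribution, "a", 3))
```

In this toy system every sequence from `a` has the same probability of being in normal form at each step, yet two sequences can end in different normal-form distributions (`T` with probability 1 versus `T` and `F` with probability 1/2 each). The bounded check therefore succeeds for the NF-norm and fails for equivalence in normal form.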

When , it is easy to check that -RD guarantees the following.

Proposition 28.

  1. -RD implies Uniformity: p- p-.

  2. -RD implies Uniformity and .

Proof.

Uniformity is immediate;  follows from Thm. 25. ∎

While expressive, -RD is of little practical use, as it is a property which is universally quantified over the sequences from . The property -LRD is instead local. Somewhat surprisingly, the local property characterizes -RD.

Theorem 29 (Characterization).

The following properties are equivalent: (1.) -LRD;  (2.) if and , then );  (3.) -RD.

Proof.

. See Fig. 10. We prove that (2) holds by induction on . If , the claim is trivial. If , let be the first step from to and the first step from to . By -LRD, there exists such that and such that , with ). Since , we can apply the inductive hypothesis, and conclude that ). By using the induction hypothesis on , we have that