# On Probabilistic Term Rewriting

We study the termination problem for probabilistic term rewrite systems. We prove that the interpretation method is sound and complete for a strengthening of positive almost sure termination, when abstract reduction systems and term rewrite systems are considered. Two instances of the interpretation method, polynomial and matrix interpretations, are analyzed and shown to capture interesting and nontrivial examples when automated. We capture probabilistic computation in a novel way by means of multidistribution reduction sequences, thus accounting both for the nondeterminism in the choice of the redex and for the probabilism intrinsic to firing each rule.


## 1 Introduction

Interactions between computer science and probability theory are pervasive and extremely useful to the first discipline. Probability theory offers models that enable abstraction, but it also suggests new models of computation, as in randomized computation or cryptography [18]. All this has stimulated the study of probabilistic computational models and programming languages: probabilistic variations on well-known models like automata [12, 25], Turing machines [27, 16], and the λ-calculus [26, 21] have been known since the early days of theoretical computer science.

The simplest way probabilistic choice can be made available in programming is to endow the language of programs with an operator modeling sampling from (one or many) distributions. Fair, binary, probabilistic choice is for example perfectly sufficient to get universality if the underlying programming language is itself universal (e.g., see [9]).

Term rewriting [28] is a well-studied model of computation when no probabilistic behavior is involved. It provides a faithful model of pure functional programming which is, up to a certain extent, also adequate for modeling higher-order parameter passing [11]. What is peculiar in term rewriting is that, in principle, rule selection turns reduction into a potentially nondeterministic process. The following question is then a natural one: is there a way to generalize term rewriting to a fully-fledged probabilistic model of computation? Actually, not much is known about probabilistic term rewriting: the definitions we find in the literature are one by Agha et al. [1] and one by Bournez and Garnier [4]. We base our work on the latter, where probabilistic rewriting is captured as a Markov decision process; rule selection remains a nondeterministic process, but each rule can have one of many possible outcomes, each with its own probability to happen. Rewriting thus becomes a process in which both nondeterministic and probabilistic aspects are present and intermingled. When firing a rule, the reduction process implicitly samples from a distribution, much in the same way as when performing binary probabilistic choice in one of the models mentioned above.

In this paper, we first define a new, simple framework for discrete probabilistic reduction systems, which properly generalizes standard abstract reduction systems [28]. In particular, what plays the role of a reduction sequence, usually a (possibly infinite) sequence of states, is a sequence of (multi)distributions over the set of states. A multidistribution is not merely a distribution, and this is crucial to appropriately account for both the probabilistic behaviour of each rule and the nondeterminism in rule selection. Such a correspondence does not exist in Bournez and Garnier's framework, where nondeterminism has to be resolved by a strategy in order to define reduction sequences. The two frameworks nevertheless turn out to be equiexpressive, at least as long as every rule has finitely many possible outcomes. We then prove that probabilistic ranking functions [4] are sound and complete for proving strong almost sure termination, a strengthening of positive almost sure termination [4]. (The completeness of probabilistic ranking functions has been refuted in [14], but the counterexample there is invalid, since a part of the reduction steps are not counted. We thank Luis María Ferrer Fioriti for this analysis.) We moreover show that ranking functions provide bounds on expected runtimes.

This paper's main contribution, then, is the definition of a simple framework for probabilistic term rewrite systems as an instance of this abstract framework. Our main aim is to study whether any of the well-known techniques for termination of term rewrite systems can be generalized to the probabilistic setting, and whether they can be automated. We give positive answers to these two questions by describing how polynomial and matrix interpretations can indeed be turned into instances of probabilistic ranking functions, thus generalizing them to the more general context of probabilistic term rewriting. We moreover implement these new techniques in the termination tool NaTT [29].

## 2 Related Work

Termination is a crucial property of programs, and has been widely studied in term rewriting. Tools checking and certifying termination of term rewrite systems are nowadays capable of implementing tens of different techniques, and can prove termination of a wide class of term rewrite systems, although the underlying verification problem is well-known to be undecidable [28].

Termination remains an interesting and desirable property in a probabilistic setting, e.g., in probabilistic programming [19], where inference algorithms often rely on the underlying program terminating. But what does termination mean when systems become probabilistic? If one wants to stick to a qualitative definition, almost-sure termination is a well-known answer: a probabilistic computation is said to almost surely terminate iff non-termination occurs with null probability. One could even require positive almost-sure termination, which asks the expected time to termination to be finite. Recursion-theoretically, checking (positive) almost-sure termination is harder than checking termination of non-probabilistic programs, where termination is at least recursively enumerable, although undecidable: in a universal probabilistic imperative programming language, almost-sure termination is $\Pi^0_2$-complete, while positive almost-sure termination is $\Sigma^0_2$-complete [22].

Many sound verification methodologies for probabilistic termination have recently been introduced (see, e.g., [4, 5, 17, 14, 8]). In particular, the use of ranking martingales has turned out to be quite successful when the analyzed program is imperative, and thus does not have an intricate recursive structure. When the latter holds, techniques akin to sized types have been shown to be applicable [10]. Finally, as already mentioned, the current work can be seen as stemming from the work by Bournez et al. [6, 4, 5]. The added value compared to their work is, first of all, the notion of multidistribution as a way to give an instantaneous description of the state of the underlying system, which exhibits both nondeterministic and probabilistic features. Moreover, an interpretation method inspired by ranking functions is made more general here, accommodating not only interpretations over the real numbers, but also interpretations over vectors, in the sense of matrix interpretations. Finally, we provide an automation of polynomial and matrix interpretation inference here, whereas nothing about implementation was presented in Bournez's work.

## 3 Preliminaries

In this section, we give some mathematical preliminaries which will be essential for the rest of the development. With $\mathbb{R}$ we denote the set of real numbers, with $\mathbb{R}_{\geq 0}$ the set of non-negative real numbers, and with $\mathbb{N}$ the set $\{0, 1, 2, \dots\}$.

#### Probability Distributions.

A (probability) distribution on a countable set $A$ is a function $d : A \to [0, 1]$ such that $\sum_{a \in A} d(a) = 1$. The support of a distribution $d$ is the set $\mathrm{Supp}(d) = \{ a \in A \mid d(a) > 0 \}$. We write $\mathcal{D}(A)$ for the set of probability distributions over $A$. We write $\{p_1 : a_1; \dots; p_n : a_n\}$ for the distribution $d$ with $d(a_i) = p_i$ when $\mathrm{Supp}(d)$ is a finite set $\{a_1, \dots, a_n\}$ (with pairwise distinct $a_i$s).

#### Stopping times.

A stopping time with respect to a stochastic process $X_0, X_1, \dots$ is a random variable $S$, taking values in $\mathbb{N} \cup \{\infty\}$, with the property that for each $n$, the occurrence or non-occurrence of the event $S = n$ depends only on the values of $X_0, \dots, X_n$. An instance of a stopping time is the first hitting time with respect to a set $B$, defined as $S = \min \{ n \mid X_n \in B \}$, where $\min \emptyset = \infty$. Every stopping time $S$ satisfies

$$\mathbb{E}(S) \;=\; \sum_{n=1}^{\infty} n \cdot \mathbb{P}(S = n) \;=\; \sum_{n=1}^{\infty} \mathbb{P}(S \geq n). \qquad (1)$$
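Identity (1) can be checked numerically. A minimal sketch, assuming (purely for illustration, not from the text) a geometric stopping time with parameter $q = 0.3$:

```python
# Numeric check of identity (1): E(S) = sum_n n*P(S = n) = sum_n P(S >= n),
# for a geometric stopping time S on {1, 2, ...} (an illustrative choice).
q = 0.3        # success probability; P(S = n) = q * (1 - q)^(n - 1)
N = 10_000     # truncation point; the tail beyond N is negligible here

lhs = sum(n * q * (1 - q) ** (n - 1) for n in range(1, N + 1))
rhs = sum((1 - q) ** (n - 1) for n in range(1, N + 1))  # terms P(S >= n)

print(lhs, rhs)  # both approximately 1/q = 3.3333...
```

Both truncated sums agree with each other and with the closed-form expectation $1/q$ of the geometric distribution.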

## 4 Probabilistic Abstract Reduction Systems

An abstract reduction system (ARS) on a set $A$ is a binary relation ${\to} \subseteq A \times A$. Having $a \to b$ means that $a$ reduces to $b$ in one step, or $b$ is a one-step reduct of $a$. Bournez and Garnier [4] extended the ARS formalism to probabilistic computations, which we will present here using slightly different notations.

We write $\mathbb{R}_{\geq 0}$ for the set of non-negative reals. A (probability) distribution on a countable set $A$ is a function $d : A \to \mathbb{R}_{\geq 0}$ such that $\sum_{a \in A} d(a) = 1$. We say a distribution is finite if its support is finite, and write $\{p_1 : a_1; \dots; p_n : a_n\}$ for $d$ if $d(a_i) = p_i$ (with pairwise distinct $a_i$s). We write $\mathcal{FD}(A)$ for the set of finite distributions on $A$.

###### Definition 1 (Pars, [4])

A probabilistic reduction over a set $A$ is a pair of $a \in A$ and $d \in \mathcal{FD}(A)$, written $a \to d$. A probabilistic ARS (PARS) $\mathcal{P}$ over $A$ is a (typically infinite) set of probabilistic reductions. An object $a \in A$ is called terminal (or a normal form) in $\mathcal{P}$ if there is no $d$ with $a \to d \in \mathcal{P}$. With $\mathrm{Term}(\mathcal{P})$ we denote the set of terminals in $\mathcal{P}$.

The intended meaning of $a \to \{p_1 : b_1; \dots; p_n : b_n\}$ is that "there is a reduction step from $a$ to each $b_i$ with probability $p_i$".

###### Example 1 (Random walk)

A random walk over $\mathbb{N}$ with bias probability $p$ is modeled by the PARS consisting of the probabilistic reductions

$$n + 1 \;\to\; \{p : n;\ 1 - p : n + 2\} \qquad \text{for all } n \in \mathbb{N}.$$

A PARS describes both nondeterministic and probabilistic choice; we say a PARS $\mathcal{P}$ is nondeterministic if $a \to d_1, a \to d_2 \in \mathcal{P}$ with $d_1 \neq d_2$. In this case, the distribution of one-step reducts of $a$ is nondeterministically chosen from $d_1$ and $d_2$. Bournez and Garnier [4] describe reduction sequences via stochastic sequences, which demand nondeterminism to be resolved by fixing a strategy (also called a policy). In contrast, we capture nondeterminism by defining a reduction relation on distributions, and emulate an ARS step $a \to b$ by $\{1 : a\} \to \{1 : b\}$. For the probabilistic case, taking Example 1 with $p = \frac{1}{2}$ we would like to have

$$\{1 : 1\} \;\to\; \{\tfrac{1}{2} : 0;\ \tfrac{1}{2} : 2\},$$

meaning that the distribution of one-step reducts of $1$ is $\{\frac{1}{2} : 0;\ \frac{1}{2} : 2\}$. Continuing the reduction, what should the distribution of two-step reducts of $1$ be? Actually, it cannot be a distribution (on $\mathbb{N}$): with probability $\frac{1}{2}$ we have no two-step reduct of $1$, as $0$ is terminal. One solution, taken by [4], is to introduce a special symbol $\bot$ representing the case where no reduct exists. We take another solution: we consider generalized distributions where probabilities may sum up to less than one, allowing

$$\{1 : 1\} \;\to\; \{\tfrac{1}{2} : 0;\ \tfrac{1}{2} : 2\} \;\to\; \{\tfrac{1}{4} : 1;\ \tfrac{1}{4} : 3\}.$$

Further continuing the reduction, one would expect $\{\frac{1}{8} : 0;\ \frac{1}{4} : 2;\ \frac{1}{8} : 4\}$ as the next step, but note that a half of the probability $\frac{1}{4}$ of $2$ is the probability of the reduction sequence $1, 2, 1, 2$, and the other half is of $1, 2, 3, 2$.

###### Example 2

Consider the PARS consisting of the following rules:

$$a \to \{\tfrac{1}{2} : b_1;\ \tfrac{1}{2} : b_2\} \qquad b_1 \to \{1 : c\} \qquad b_2 \to \{1 : c\} \qquad c \to \{1 : d_1\} \qquad c \to \{1 : d_2\}.$$

Reducing $a$ twice always yields $c$, so the distribution of the two-step reducts of $a$ is $\{1 : c\}$. More precisely, there are two paths to reach $c$: $a, b_1, c$ and $a, b_2, c$, each with probability $\frac{1}{2}$. Each of them can be nondeterministically continued to $d_1$ or $d_2$, so the distribution of three-step reducts of $a$ is the nondeterministic choice among $\{1 : d_1\}$, $\{1 : d_2\}$, and $\{\frac{1}{2} : d_1;\ \frac{1}{2} : d_2\}$. On the other hand, if we defined the reduction relation on distributions so that $\{1 : c\}$ reduced to $\{\frac{1}{2} : d_1;\ \frac{1}{2} : d_2\}$, then we would not be able to emulate ARSs by reducing $\{1 : c\}$ only to $\{1 : d_1\}$ and $\{1 : d_2\}$.

These analyses lead us to the following generalization of distributions.

###### Definition 2 (Multidistributions)

A multidistribution $\mu$ on $A$ is a finite multiset of pairs of $p \in (0, 1]$ and $a \in A$, written $p : a$, such that

$$|\mu| \;:=\; \sum_{p : a \in \mu} p \;\leq\; 1.$$

We denote the set of multidistributions on $A$ by $\mathcal{M}(A)$.

Abusing notation, we identify the finite distribution $\{p_1 : a_1; \dots; p_n : a_n\}$ with the multidistribution $\{p_1 : a_1, \dots, p_n : a_n\}$, as no confusion can arise. For a function $f : A \to B$, we often generalize the domain and range to multidistributions as follows:

$$f(\{p_1 : a_1, \dots, p_n : a_n\}) \;:=\; \{p_1 : f(a_1), \dots, p_n : f(a_n)\}.$$

The scalar multiplication of a multidistribution is $p \cdot \{q_1 : a_1, \dots, q_n : a_n\} := \{p \cdot q_1 : a_1, \dots, p \cdot q_n : a_n\}$, which is also a multidistribution if $p \leq 1$. More generally, multidistributions are closed under convex multiset unions $\biguplus_{i \in I} p_i \cdot \mu_i$, defined as the multiset union of the $p_i \cdot \mu_i$, with $p_i > 0$ and $\sum_{i \in I} p_i \leq 1$.

Now we introduce the reduction relation over multidistributions.

###### Definition 3 (Probabilistic Reduction)

Given a PARS $\mathcal{P}$, we define the probabilistic reduction relation ${\Rightarrow_{\mathcal{P}}} \subseteq \mathcal{M}(A) \times \mathcal{M}(A)$ by the following rules:

$$\{1 : a\} \Rightarrow_{\mathcal{P}} \emptyset \ \text{ if } a \in \mathrm{Term}(\mathcal{P}); \qquad \{1 : a\} \Rightarrow_{\mathcal{P}} d \ \text{ if } a \to d \in \mathcal{P}; \qquad \biguplus_{i=1}^{n} p_i \cdot \mu_i \Rightarrow_{\mathcal{P}} \biguplus_{i=1}^{n} p_i \cdot \nu_i \ \text{ if } \mu_i \Rightarrow_{\mathcal{P}} \nu_i \text{ for all } i.$$

In the last rule, we assume $p_i > 0$ and $\sum_{i=1}^{n} p_i \leq 1$. We denote by $\Rightarrow_{\mathcal{P}}^{\infty}(\mu)$ the set of all possible reduction sequences from $\mu$, i.e., $(\mu_n)_{n \in \mathbb{N}} \in \Rightarrow_{\mathcal{P}}^{\infty}(\mu)$ iff $\mu_0 = \mu$ and $\mu_n \Rightarrow_{\mathcal{P}} \mu_{n+1}$ for every $n$.

Thus $\mu \Rightarrow_{\mathcal{P}} \nu$ if $\nu$ is obtained from $\mu$ by replacing every nonterminal $a$ in $\mu$ with all possible reducts with respect to some $a \to d \in \mathcal{P}$, suitably weighted by probabilities, and by removing terminals. The latter implies that $|\mu|$ is not preserved during reduction: it decreases by the probabilities of terminals.

To continue Example 1, we have the following reduction sequence:

$$\{1 : 1\} \;\Rightarrow\; \{\tfrac{1}{2} : 0, \tfrac{1}{2} : 2\} \;\Rightarrow\; \emptyset \uplus \{\tfrac{1}{4} : 1, \tfrac{1}{4} : 3\} \;\Rightarrow\; \{\tfrac{1}{8} : 0, \tfrac{1}{8} : 2\} \uplus \{\tfrac{1}{8} : 2, \tfrac{1}{8} : 4\} \;\Rightarrow\; \cdots$$

The use of multidistributions resolves the issues indicated in Example 2 when dealing with nondeterministic systems. We have, besides others, the reduction

$$\{1 : a\} \;\Rightarrow\; \{\tfrac{1}{2} : b_1, \tfrac{1}{2} : b_2\} \;\Rightarrow\; \{\tfrac{1}{2} : c, \tfrac{1}{2} : c\} \;\Rightarrow\; \{\tfrac{1}{2} : d_1, \tfrac{1}{2} : d_2\}.$$

The final step is possible because $\{\frac{1}{2} : c, \frac{1}{2} : c\}$ is not collapsed to $\{1 : c\}$.

When every probabilistic reduction in $\mathcal{P}$ is of the form $a \to \{1 : b\}$ for some $b$, then $\Rightarrow_{\mathcal{P}}$ simulates the non-probabilistic ARS via the relation $\{1 : a\} \Rightarrow_{\mathcal{P}} \{1 : b\}$. Only a little care is needed, as normal forms are followed by $\emptyset$.
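The reduction relation of Definition 3 admits a direct operational reading: one step replaces every nonterminal occurrence by the reducts of a chosen rule, scaled by its probability, and drops terminals. A minimal sketch (the data representation, the `choose` parameter resolving nondeterminism, and all names are our own), replaying the sequence of Example 1 with $p = \frac{1}{2}$:

```python
from fractions import Fraction as F

# A multidistribution is a list of (probability, object) pairs (a multiset).
# rules(a) returns the distributions available for a, each a list of
# (probability, object) pairs; the empty list marks a terminal object.

def step(mu, rules, choose=lambda a, ds: ds[0]):
    """One step of Definition 3: terminal occurrences vanish, and each
    nonterminal occurrence p:a is replaced by the reducts of one chosen
    rule for a, scaled by p (a convex multiset union)."""
    nu = []
    for p, a in mu:
        ds = rules(a)
        if not ds:
            continue                  # terminal: contributes nothing
        for q, b in choose(a, ds):
            nu.append((p * q, b))
    return nu

# Example 1 with p = 1/2: n+1 -> {1/2 : n; 1/2 : n+2}, and 0 is terminal.
walk = lambda n: [] if n == 0 else [[(F(1, 2), n - 1), (F(1, 2), n + 1)]]

mu = [(F(1), 1)]
for _ in range(4):
    print(mu, " mass:", sum((p for p, _ in mu), F(0)))
    mu = step(mu, walk)
```

Since occurrences stay separate in the list, the total mass `|mu|` decreases exactly by the probability of the terminals dropped at each step, as the text describes.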

###### Proposition 1

Let $\to$ be an ARS on $A$ and define the PARS $\mathcal{P}_{\to}$ by $a \to \{1 : b\} \in \mathcal{P}_{\to}$ iff $a \to b$. Then $\{1 : a\} \Rightarrow_{\mathcal{P}_{\to}} \nu$ iff either $\nu = \{1 : b\}$ and $a \to b$ for some $b$, or $\nu = \emptyset$ and $a$ is a normal form in $\to$.

###### Proof

For multidistributions of the form $\{1 : a\}$, only the first two rules of Definition 3 are effective. Then the claim directly follows.

### 4.1 Notions of Probabilistic Termination

A binary relation $\to$ is called terminating if it does not give rise to an infinite sequence $a_0 \to a_1 \to a_2 \to \cdots$. In a probabilistic setting, infinite sequences are problematic only if they occur with non-null probability.

###### Definition 4 (AST)

A PARS $\mathcal{P}$ is almost surely terminating (AST) if for any reduction sequence $(\mu_n)_{n \in \mathbb{N}} \in \Rightarrow_{\mathcal{P}}^{\infty}(\mu)$, it holds that $\lim_{n \to \infty} |\mu_n| = 0$.

Intuitively, $|\mu_n|$ is the probability of having $n$-step reducts, so its tendency towards zero indicates that infinite reductions occur with zero probability.

###### Example 3 (Example 1 Revisited)

The system of Example 1 is AST for $p \geq \frac{1}{2}$, whereas it is not for $p < \frac{1}{2}$. Note that although the system with $p = \frac{1}{2}$ is AST, the expected number of reductions needed to reach a terminal is infinite.

Let $\mathcal{P}$ be a PARS and $(\mu_n)_{n \in \mathbb{N}}$ a reduction sequence. Following terminology from rewriting, we define the expected derivation length of $(\mu_n)_{n \in \mathbb{N}}$ by

$$\mathrm{edl}((\mu_n)_{n \in \mathbb{N}}) \;:=\; \sum_{n=1}^{\infty} |\mu_n|.$$

Intuitively, this definition is equivalent to taking the mean length of terminating paths in the reduction. The notion of positive almost sure termination (PAST), introduced by Bournez and Garnier [4], constitutes a refinement of AST demanding that the expected derivation length is finite for every initial state and for every strategy, i.e., for every reduction sequence starting from $\{1 : a\}$, its expected derivation length is finite. Without fixing a strategy, however, this condition does not ensure a bound on the derivation length.

###### Example 4

Consider the (non-probabilistic) ARS on $\mathbb{N} \cup \{\omega\}$ with reductions $\omega \to n$ and $n + 1 \to n$ for every $n \in \mathbb{N}$. It is easy to see that every reduction sequence is of finite length, and thus this ARS is PAST. There is, however, no global bound on the length of reduction sequences starting from $\omega$.
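A small sketch makes Example 4 concrete, assuming (as one natural reading of the example) a top object $\omega$ with $\omega \to n$ for every $n$, together with the rules $n + 1 \to n$: every maximal reduction from $\omega$ is finite, yet the lengths are unbounded.

```python
# Example 4, under the assumed reading: omega -> n for every natural n,
# plus n+1 -> n. A maximal reduction omega -> n -> n-1 -> ... -> 0 is
# always finite, but its length depends on the first choice of n.

def reduction_length(first_choice):
    steps = 1              # the step omega -> first_choice
    n = first_choice
    while n > 0:           # then n+1 -> n down to the normal form 0
        n -= 1
        steps += 1
    return steps

lengths = [reduction_length(n) for n in (0, 10, 1000)]
print(lengths)  # [1, 11, 1001]: every length is finite, no global bound
```

The first nondeterministic choice fixes the length $n + 1$, so the supremum over all reductions from $\omega$ is infinite even though each single reduction terminates.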

Hence we introduce a stronger notion, which actually plays a more essential role than PAST. It is based on a natural extension of derivation height from complexity analysis of term rewriting.

###### Definition 5 (Strong AST)

A PARS $\mathcal{P}$ is strongly almost surely terminating (SAST) if the expected derivation height $\mathrm{edh}(a)$ of every $a \in A$ is finite, where $\mathrm{edh}(a)$ is defined as $\sup \{ \mathrm{edl}((\mu_n)_{n \in \mathbb{N}}) \mid (\mu_n)_{n \in \mathbb{N}} \in \Rightarrow_{\mathcal{P}}^{\infty}(\{1 : a\}) \}$.

In Example 4, we make essential use of the fact that $\omega$ admits infinitely many one-step reducts. Thus the ARS is not finitely branching, and the example does not contradict the claims in [4]. Nevertheless, PAST and SAST do not coincide even on finitely branching PARSs. The following example was found by an anonymous reviewer.

###### Example 5

Consider the PARS over $\mathbb{N} \cup \{ a_n \mid n \in \mathbb{N} \}$, consisting of

$$a_n \to \{\tfrac{1}{2} : a_{n+1};\ \tfrac{1}{2} : 0\} \qquad a_n \to \{1 : 2^n \cdot n\} \qquad n + 1 \to \{1 : n\}.$$

Then the PARS is finitely branching and PAST, because every reduction sequence from $\{1 : a_0\}$ is of one of the following forms:

• the sequence that keeps applying the first rule, whose expected derivation length is finite, or

• a sequence that applies the second rule at some $a_n$, reaching $2^n \cdot n$ and then counting down to $0$,

and the expected derivation length is finite for each of them. However, $\mathrm{edh}(a_0)$ is not bounded, since the sequence applying the second rule at $a_n$ has expected derivation length at least $2^{-n} \cdot 2^n \cdot n = n$.

### 4.2 Probabilistic Ranking Functions

Bournez and Garnier [4] generalized ranking functions, a popular and classical method for proving termination of non-probabilistic systems, to PARSs. We give here a simpler but equivalent definition of probabilistic ranking functions, taking advantage of the notion of multidistribution.

For a (multi)distribution $\mu$ over real numbers, the expected value of $\mu$ is denoted by $\mathbb{E}(\mu) := \sum_{p : x \in \mu} p \cdot x$. A function $f : A \to \mathbb{R}$ is naturally generalized to multidistributions, so for $\mu \in \mathcal{M}(A)$, $\mathbb{E}(f(\mu)) = \sum_{p : a \in \mu} p \cdot f(a)$. For $\epsilon \geq 0$ we define the order $\gtrsim_{\epsilon}$ on $\mathbb{R}$ by $x \gtrsim_{\epsilon} y$ iff $x \geq y + \epsilon$.

###### Definition 6

Given a PARS $\mathcal{P}$ on $A$, we say that a function $f : A \to \mathbb{R}_{\geq 0}$ is a (probabilistic) ranking function (sometimes referred to as a Lyapunov ranking function) if there exists $\epsilon > 0$ such that $a \to d \in \mathcal{P}$ implies $f(a) \geq \mathbb{E}(f(d)) + \epsilon$.

The above definition slightly differs from the formulation in [4]: the latter demands the drift $f(a) - \mathbb{E}(f(d))$ to be at least some $\epsilon > 0$, which is equivalent to our condition; and it allows an arbitrary lower bound on $f$ instead of $0$, which can easily be normalized by adding a constant to the ranking function.
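For the biased walk of Example 1 with $p > \frac{1}{2}$, the identity function $f(n) = n$ is a ranking function in the sense of Definition 6. A minimal sketch checking the drift condition with exact rationals; the concrete choice $p = 2/3$ and all names are our own:

```python
from fractions import Fraction as F

p = F(2, 3)   # bias of the walk; any p > 1/2 works (this choice is ours)

def expected_f(d, f=lambda n: n):
    """E(f(d)) for a finite distribution d given as (prob, object) pairs."""
    return sum(q * f(b) for q, b in d)

# For the rule n+1 -> {p : n; 1-p : n+2}, the drift of f(n) = n is
# f(n+1) - E(f(d)) = (n+1) - (p*(n) + (1-p)*(n+2)) = 2p - 1, independent of n.
eps = 2 * p - 1               # = 1/3 here; the epsilon of Definition 6
for m in range(1, 100):       # check f(m) >= E(f(d)) + eps on many rules
    d = [(p, m - 1), (1 - p, m + 1)]
    assert m >= expected_f(d) + eps

print(F(5) / eps)  # 15: the bound f(5)/eps on expected derivation length
```

Since the drift is exactly $2p - 1$, any $p > \frac{1}{2}$ yields a positive $\epsilon$, and the text's interpretation method then bounds the expected derivation length from an object $a$ by $f(a)/\epsilon$.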

We prove that a ranking function ensures SAST and gives a bound on the expected derivation length. Essentially the same result can be found in [8], but we use only elementary mathematics, not requiring notions from probability theory. We moreover show that this method is complete for proving SAST.

###### Lemma 1

Let $f$ be a ranking function for a PARS $\mathcal{P}$. Then there exists $\epsilon > 0$ such that $\mathbb{E}(f(\mu)) \geq \mathbb{E}(f(\nu)) + \epsilon \cdot |\nu|$ whenever $\mu \Rightarrow_{\mathcal{P}} \nu$.

###### Proof

• Suppose $\mu = \{1 : a\}$ and $\nu = \emptyset$ with $a$ terminal. Then $\mathbb{E}(f(\nu)) = 0$ and $|\nu| = 0$, and $\mathbb{E}(f(\mu)) = f(a) \geq 0$ since $f(a) \in \mathbb{R}_{\geq 0}$.

• Suppose $\mu = \{1 : a\}$ and $\nu = d$ with $a \to d \in \mathcal{P}$. From the assumption $f(a) \geq \mathbb{E}(f(d)) + \epsilon$, and as $|d| = 1$ we conclude $\mathbb{E}(f(\mu)) \geq \mathbb{E}(f(\nu)) + \epsilon \cdot |\nu|$.

• Suppose $\mu = \biguplus_{i=1}^{n} p_i \cdot \mu_i$, $\nu = \biguplus_{i=1}^{n} p_i \cdot \nu_i$, and $\mu_i \Rightarrow_{\mathcal{P}} \nu_i$ for all $i$. The induction hypothesis gives $\mathbb{E}(f(\mu_i)) \geq \mathbb{E}(f(\nu_i)) + \epsilon \cdot |\nu_i|$. Thus,

$$\mathbb{E}(f(\mu)) = \sum_{i=1}^{n} p_i \cdot \mathbb{E}(f(\mu_i)) \geq \sum_{i=1}^{n} p_i \cdot \bigl(\mathbb{E}(f(\nu_i)) + \epsilon \cdot |\nu_i|\bigr) = \sum_{i=1}^{n} p_i \cdot \mathbb{E}(f(\nu_i)) + \epsilon \cdot \sum_{i=1}^{n} p_i \cdot |\nu_i| = \mathbb{E}(f(\nu)) + \epsilon \cdot |\nu|.$$
###### Lemma 2

Let $f$ be a ranking function for a PARS $\mathcal{P}$. Then there is $\epsilon > 0$ such that $\mathrm{edl}((\mu_n)_{n \in \mathbb{N}}) \leq \mathbb{E}(f(\mu_0)) / \epsilon$ for every reduction sequence $(\mu_n)_{n \in \mathbb{N}}$.

###### Proof

We first show $\mathbb{E}(f(\mu_m)) \geq \epsilon \cdot \sum_{i=m+1}^{n} |\mu_i|$ for every $m \leq n$, by induction on $n - m$. Let $\epsilon$ be given by Lemma 1. The base case $m = n$ is trivial, so let us consider the inductive step. By Lemma 1 and the induction hypothesis we get

$$\mathbb{E}(f(\mu_m)) \geq \mathbb{E}(f(\mu_{m+1})) + \epsilon \cdot |\mu_{m+1}| \geq \epsilon \cdot \sum_{i=m+2}^{n} |\mu_i| + \epsilon \cdot |\mu_{m+1}| = \epsilon \cdot \sum_{i=m+1}^{n} |\mu_i|.$$

By fixing $m = 0$, we conclude that the sequence $\bigl( \sum_{i=1}^{n} |\mu_i| \bigr)_{n \in \mathbb{N}}$ is bounded by $\mathbb{E}(f(\mu_0)) / \epsilon$, and so is its limit $\mathrm{edl}((\mu_n)_{n \in \mathbb{N}})$. ∎

###### Theorem 4.1

Ranking functions are sound and complete for proving SAST.

###### Proof

For soundness, let $f$ be a ranking function for a PARS $\mathcal{P}$. For every derivation $(\mu_n)_{n \in \mathbb{N}}$ starting from $\{1 : a\}$, we have $\mathrm{edl}((\mu_n)_{n \in \mathbb{N}}) \leq \mathbb{E}(f(\{1 : a\})) / \epsilon = f(a) / \epsilon$ by Lemma 2. Hence $\mathrm{edh}(a) \leq f(a) / \epsilon < \infty$, concluding that $\mathcal{P}$ is SAST.

For completeness, suppose that $\mathcal{P}$ is SAST, and let $f(a) := \mathrm{edh}(a)$. Then for every $a \to d \in \mathcal{P}$ we have $\{1 : a\} \Rightarrow_{\mathcal{P}} d$, and

$$f(a) = \mathrm{edh}(a) \geq |d| + \mathbb{E}(\mathrm{edh}(d)) = \mathbb{E}(f(d)) + 1,$$

concluding $f(a) \geq \mathbb{E}(f(d)) + 1$. Thus, taking $\epsilon = 1$, $f$ is a ranking function according to Definition 6. ∎

### 4.3 Relation to Formulation by Bournez and Garnier

As done by Bournez and Garnier [4], the dynamics of probabilistic systems are commonly defined as stochastic sequences, i.e., infinite sequences of random variables whose $n$-th variable represents the $n$-th reduct. A disadvantage of this approach is that nondeterministic choices have to be resolved a priori by means of strategies, making the rewriting-based understanding of nondeterminism inapplicable. We now relate this formulation to ours, and see that the corresponding notions of AST and PAST coincide.

We shortly recap the central definitions from [4]. We assume basic familiarity with stochastic processes, see e.g. [7]. Here we fix a PARS $\mathcal{P}$ on $A$. A history (of length $n$) is a finite sequence $a_0, \dots, a_n$ of objects from $A$, and such a sequence is called terminal if $a_n$ is. A strategy is a function $\phi$ from nonterminal histories to distributions such that $a_n \to \phi(a_0, \dots, a_n) \in \mathcal{P}$. A history $a_0, \dots, a_n$ is called realizable under $\phi$ iff for every $i < n$, it holds that $\phi(a_0, \dots, a_i)(a_{i+1}) > 0$.

###### Definition 7 (Stochastic Reduction, [4])

Let $\mathcal{P}$ be a PARS on $A$ and $\bot$ a special symbol. A sequence $(X_n)_{n \in \mathbb{N}}$ of random variables over $A \cup \{\bot\}$ is a (stochastic) reduction in $\mathcal{P}$ (under strategy $\phi$) if

$$\mathbb{P}(X_{n+1} = \bot \mid X_n = \bot) = 1; \qquad \mathbb{P}(X_{n+1} = \bot \mid X_n = a) = 1 \ \text{if } a \text{ is terminal}; \qquad \mathbb{P}(X_{n+1} = \bot \mid X_n = a) = 0 \ \text{if } a \text{ is nonterminal};$$
$$\mathbb{P}(X_{n+1} = b \mid X_n = a_n, \dots, X_0 = a_0) = d(b) \quad \text{if } \phi(a_0, \dots, a_n) = d,$$

where $a_0, \dots, a_n$ is a realizable nonterminal history under $\phi$.

Notice that as an immediate consequence of the law of total probability we obtain

$$\mathbb{P}(X_n = a_n) = \sum_{a_0, \dots, a_{n-1} \in A} \mathbb{P}(X_0 = a_0, \dots, X_n = a_n). \qquad (2)$$

Thus, $(X_n)_{n \in \mathbb{N}}$ is set up so that trajectories correspond to reductions $a_0, a_1, a_2, \dots$, and $\bot$ signals termination. In correspondence, the derivation length is given by the first hitting time to $\bot$:

###### Definition 8 (AST/PAST of [4])

For $(X_n)_{n \in \mathbb{N}}$ define the random variable $T := \min \{ n \mid X_n = \bot \}$, where $\min \emptyset = \infty$ by convention. A PARS $\mathcal{P}$ is stochastically AST (resp. PAST) if for every stochastic reduction $(X_n)_{n \in \mathbb{N}}$ in $\mathcal{P}$, $\mathbb{P}(T < \infty) = 1$ (resp. $\mathbb{E}(T) < \infty$).

We will now see that stochastic AST/PAST coincides with our notions. To this end, we first clarify the correspondence between stochastic reductions and reductions over multidistributions. The quintessence of this correspondence is that any stochastic derivation $(X_n)_{n \in \mathbb{N}}$ translates to a reduction $\mu_0 \Rightarrow_{\mathcal{P}} \mu_1 \Rightarrow_{\mathcal{P}} \cdots$ so that the probabilities of realizable histories in $(X_n)_{n \in \mathbb{N}}$ are recorded in $(\mu_n)_{n \in \mathbb{N}}$. Equation (2) then gives the correspondence between $(X_n)_{n \in \mathbb{N}}$ and $(\mu_n)_{n \in \mathbb{N}}$. With Lemmas 4 and 6, we make this correspondence precise. Guided by (2), we associate the multidistribution $\mu = \{p_1 : a_1, \dots, p_n : a_n\}$ over $A$ with the distribution $d$ over $A \cup \{\bot\}$ such that $d(a) = \sum_{p_i : a_i \in \mu,\, a_i = a} p_i$ for all $a \in A$, and $d(\bot) = 1 - |\mu|$.

#### Stochastic Reduction to Multidistribution Reduction.

Here we fix a stochastic reduction $(X_n)_{n \in \mathbb{N}}$ in a PARS $\mathcal{P}$. We show that $(X_n)_{n \in \mathbb{N}}$ corresponds to a multidistribution reduction, assuming $X_0$ is finitely supported: $\mathbb{P}(X_0 = a) > 0$ for only finitely many $a \in A$.

For each $n$, we define the multidistribution $\nu_n$ over realizable histories of length $n$ by

$$\nu_n \;:=\; \{\, p : (a_0, \dots, a_n) \mid p = \mathbb{P}(X_0 = a_0, \dots, X_n = a_n) > 0 \,\}.$$

Note that $\nu_0$ is well defined since $X_0$ is finitely supported. Then we can inductively show that $\nu_n$ is well defined, using the fact that $\mathrm{Supp}(\phi(a_0, \dots, a_n))$ is finite for every realizable history.

The following lemma then clarifies how the multidistributions evolve.

###### Lemma 3

For each $n$, we have

$$\nu_{n+1} = \biguplus_{\substack{p : (a_0, \dots, a_n) \in \nu_n \\ a_n \notin \mathrm{Term}(\mathcal{P})}} \{\, p \cdot \phi(a_0, \dots, a_n)(a_{n+1}) : (a_0, \dots, a_n, a_{n+1}) \mid \phi(a_0, \dots, a_n)(a_{n+1}) > 0 \,\}.$$
###### Proof

Fix a realizable history $a_0, \dots, a_n$ with $a_n$ nonterminal, and let $d = \phi(a_0, \dots, a_n)$. Notice that $a_0, \dots, a_n, a_{n+1}$ is realizable if $d(a_{n+1}) > 0$. Recall $\mathbb{P}(X_{n+1} = a_{n+1} \mid X_n = a_n, \dots, X_0 = a_0) = d(a_{n+1})$. By the definition of conditional probability we thus have

$$\mathbb{P}(X_0 = a_0, \dots, X_{n+1} = a_{n+1}) = d(a_{n+1}) \cdot \mathbb{P}(X_0 = a_0, \dots, X_n = a_n),$$

as desired. ∎

###### Lemma 4

There exist multidistributions $(\mu_n)_{n \in \mathbb{N}}$ such that $\mu_n \Rightarrow_{\mathcal{P}} \mu_{n+1}$ for all $n$, and $\mu_n = \mathrm{last}(\nu_n)$.

###### Proof

From $\nu$ define the multidistribution

$$\mathrm{last}(\nu) \;:=\; \{\, p : a_n \mid p : (a_0, \dots, a_n) \in \nu \,\}.$$

We show that $\mu_n := \mathrm{last}(\nu_n)$ satisfies the desired properties. It is easy to see that $\mathrm{last}(\nu_n)$ is a multidistribution. We show $\mathrm{last}(\nu_n) \Rightarrow_{\mathcal{P}} \mathrm{last}(\nu_{n+1})$. Consider an arbitrary $p : (a_0, \dots, a_n) \in \nu_n$. If $a_n$ is terminal, we have

$$\{1 : a_n\} \Rightarrow_{\mathcal{P}} \emptyset,$$

and otherwise we have $a_n \to \phi(a_0, \dots, a_n) \in \mathcal{P}$, so

$$\{1 : a_n\} \Rightarrow_{\mathcal{P}} \phi(a_0, \dots, a_n).$$

Combining them we get

$$\mathrm{last}(\nu_n) = \biguplus_{p : (a_0, \dots, a_n) \in \nu_n} p \cdot \{1 : a_n\} \;\Rightarrow_{\mathcal{P}}\; \biguplus_{\substack{p : (a_0, \dots, a_n) \in \nu_n \\ a_n \notin \mathrm{Term}(\mathcal{P})}} p \cdot \phi(a_0, \dots, a_n) = \mathrm{last}(\nu_{n+1}).$$

The last equation follows from Lemma 3. ∎

#### Multidistribution Reduction to Stochastic Reduction.

For the inverse translation, let us fix a PARS $\mathcal{P}$ and a reduction sequence $(\mu_n)_{n \in \mathbb{N}}$ such that $\mu_n \Rightarrow_{\mathcal{P}} \mu_{n+1}$ and $\mu_0$ is a distribution. We now map $(\mu_n)_{n \in \mathbb{N}}$ to a stochastic sequence by fixing a strategy according to the reduction. As a first step, let us construct a sequence of multidistributions $(\tilde{\nu}_n)_{n \in \mathbb{N}}$ over realizable histories from $(\mu_n)_{n \in \mathbb{N}}$ in such a way that the multidistribution $\tilde{\nu}_n$ assigns the probability $p$ to the history $a_0, \dots, a_n$ precisely when the occurrence $p : a_n$ was developed through the sequence of steps $a_0, \dots, a_n$ in $(\mu_n)_{n \in \mathbb{N}}$.

###### Definition 9

For each $n$ such that there is a step $\mu_n \Rightarrow_{\mathcal{P}} \mu_{n+1}$ in the sequence, we define the multidistribution $\tilde{\nu}_n$ over realizable histories of length $n$, with $\mathrm{last}(\tilde{\nu}_n) = \mu_n$, inductively as follows. Here, $\mathrm{last}$ is defined as in Lemma 4. First, we set $\tilde{\nu}_0 := \{\, p : (a) \mid p : a \in \mu_0 \,\}$. In the inductive case, observe that for each nonterminal history $a_0, \dots, a_n$ occurring in $\tilde{\nu}_n$ there exists a transition $a_n \to d_{a_0, \dots, a_n} \in \mathcal{P}$ so that

$$\mu_n = \biguplus_{p : (a_0, \dots, a_n) \in \tilde{\nu}_n} \{p : a_n\} \;\Rightarrow_{\mathcal{P}}\; \biguplus_{\substack{p : (a_0, \dots, a_n) \in \tilde{\nu}_n \\ a_n \notin \mathrm{Term}(\mathcal{P}) \\ a_{n+1} \in \mathrm{Supp}(d_{a_0, \dots, a_n})}} \{\, p \cdot d_{a_0, \dots, a_n}(a_{n+1}) : a_{n+1} \,\} = \mu_{n+1}. \qquad (3)$$

We set

$$\tilde{\nu}_{n+1} \;:=\; \biguplus_{\substack{p : (a_0, \dots, a_n) \in \tilde{\nu}_n \\ a_n \notin \mathrm{Term}(\mathcal{P}) \\ a_{n+1} \in \mathrm{Supp}(d_{a_0, \dots, a_n})}} \{\, p \cdot d_{a_0, \dots, a_n}(a_{n+1}) : (a_0, \dots, a_{n+1}) \,\}.$$

Note that $\mathrm{last}(\tilde{\nu}_{n+1}) = \mu_{n+1}$ when $\tilde{\nu}_{n+1}$ is defined.

Crucially, even if two objects occurring in $\mu_n$ are equal, they are separated by their histories in $\tilde{\nu}_n$. In other words:

###### Lemma 5

For all $n$ with $\tilde{\nu}_n$ defined, we have that $\tilde{\nu}_n$ is a set.

###### Proof

The proof is by induction on $n$. The base case is trivial, as $\mathrm{last}(\tilde{\nu}_0) = \mu_0$ and $\mu_0$ is a distribution. The inductive step follows directly from the induction hypothesis and the fact that the supports $\mathrm{Supp}(d_{a_0, \dots, a_n})$, for the distributions mentioned in (3), are sets. ∎

This then justifies the following definition of the strategy $\phi$, which we use in the simulation of $(\mu_n)_{n \in \mathbb{N}}$ below.

###### Definition 10

We define the strategy $\phi$ so that $\phi(a_0, \dots, a_n) = d_{a_0, \dots, a_n}$ whenever $a_0, \dots, a_n$ is a nonterminal history occurring in $\tilde{\nu}_n$, i.e., the distribution used in (3) to reduce the occurrence of $a_n$ in $\mu_n$; otherwise $\phi$ is arbitrary.

###### Lemma 6 (Reduction to Stochastic Sequences)

Let $(X_n)_{n \in \mathbb{N}}$ be the stochastic derivation under strategy $\phi$ with $\mathbb{P}(X_0 = a) = \mu_0(a)$. Then $\nu_n = \tilde{\nu}_n$ for all $n$.

###### Proof

We show that $p : (a_0, \dots, a_n) \in \nu_n$ iff $p : (a_0, \dots, a_n) \in \tilde{\nu}_n$. Using that $\mathrm{last}(\tilde{\nu}_n) = \mu_n$, the lemma follows then from (2). The proof is by induction on $n$.

The base case is trivial, as $\tilde{\nu}_0$ corresponds to the starting distribution $\mu_0$ of $(X_n)_{n \in \mathbb{N}}$. Concerning the inductive step, it suffices to realize that $p \cdot d_{a_0, \dots, a_n}(a_{n+1}) : (a_0, \dots, a_{n+1}) \in \tilde{\nu}_{n+1}$ if and only if $p : (a_0, \dots, a_n) \in \tilde{\nu}_n$, $a_n$ is nonterminal, and $d_{a_0, \dots, a_n}(a_{n+1}) > 0$. As $\phi(a_0, \dots, a_n) = d_{a_0, \dots, a_n}$ by definition, the lemma follows then from the induction hypothesis. ∎

#### Relating AST and PAST to their stochastic versions.

We have established a one-to-one correspondence between infinite multidistribution reductions $(\mu_n)_{n \in \mathbb{N}}$ and stochastic reductions $(X_n)_{n \in \mathbb{N}}$. It is then not difficult to establish a correspondence between the expected derivation length of $(\mu_n)_{n \in \mathbb{N}}$ and the time of termination $T$ of $(X_n)_{n \in \mathbb{N}}$, relying on the following auxiliary lemma.

###### Lemma 7

Let $(X_n)_{n \in \mathbb{N}}$ be a stochastic derivation in $\mathcal{P}$ with finitely supported $X_0$, and $(\mu_n)_{n \in \mathbb{N}}$ a sequence of multidistributions satisfying $\mu_n = \mathrm{last}(\nu_n)$. The following two properties hold.

1. $\mathbb{P}(T > n) = |\mu_n|$ for every $n$.

2. $\mathbb{E}(T) = \sum_{n=0}^{\infty} |\mu_n|$.

###### Proof

Concerning the first property, we have

$$\mathbb{P}(T > n) = \mathbb{P}(X_n \in A) = \sum_{a \in A} \mathbb{P}(X_n = a)$$