# Aiming Low Is Harder - Inductive Proof Rules for Lower Bounds on Weakest Preexpectations in Probabilistic Program Verification

We present a new inductive proof rule for reasoning about lower bounds on weakest preexpectations, i.e., expected values of random variables after execution of a probabilistic loop. Our rule is simple in the sense that the semantics of the loop needs to be applied to a candidate lower bound only a finite number of times in order to verify that the candidate is indeed a lower bound. We do not require finding the limit of a sequence as many previous rules did. Furthermore, and also in contrast to existing rules, we do not require the random variables to be bounded.


## I Introduction

We study probabilistic programs featuring discrete probabilistic choices such as {C1} [p] {C2}, i.e., the program C1 is executed with probability p and C2 is executed with probability 1 − p. Apart from probabilistic choice, we support usual constructs such as assignments, conditional branching, sequential composition, and unbounded loops. Describing randomized algorithms has been the classical application of probabilistic programs. Recently, however, applications in biology, quantum computing, cyber security, and in particular machine learning and artificial intelligence have led to a rapidly growing interest in probabilistic programming [1, 2, 22].

Although probabilistic programs are typically relatively small in practice, reasoning about their correctness is intricate and in general strictly harder than reasoning about nonprobabilistic programs [30, 32]. The basic notion of program termination exemplifies this: Whereas nonprobabilistic programs either terminate or not, probabilistic programs may terminate with a probability between 0 and 1. Furthermore, whereas nonprobabilistic programs either terminate in finitely many steps or diverge, the expected runtime of a probabilistic program may be infinite, even if its termination probability is 1.
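To make these quantitative notions concrete, here is a small Monte Carlo sketch; the loop, the seed, and the sample size are illustrative choices, not taken from this paper. A loop that flips a fair coin each iteration terminates with probability 1, and its expected number of iterations is 2.

```python
import random

def run_loop(rng):
    """One execution of: while (a != 0) { {a := 0} [1/2] {b := b + 1} }."""
    a, b, iterations = 1, 0, 0
    while a != 0:
        if rng.random() < 0.5:
            a = 0       # left branch, taken with probability 1/2
        else:
            b += 1      # right branch, taken with probability 1/2
        iterations += 1
    return b, iterations

rng = random.Random(42)
samples = [run_loop(rng) for _ in range(100_000)]
mean_b = sum(b for b, _ in samples) / len(samples)
mean_iters = sum(n for _, n in samples) / len(samples)
# The loop terminates with probability 1; the iteration count is geometric
# with mean 2, and the final value of b has mean 1.
```

The estimates hover around 2 iterations and a final value of 1 for b, matching the analytic expected values of this geometric loop.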

Establishing correctness of probabilistic programs requires formal reasoning. Predicate transformer calculi à la Dijkstra [17, 18] provide an important tool to enable qualitative formal reasoning for nonprobabilistic programs. To develop analogous calculi for probabilistic programs, one has to take into account that the final state of a probabilistic program need not be unique. Thus, instead of mapping inputs to outputs, probabilistic programs map initial states to a probability distribution over final states. More precisely, they yield subdistributions where the “missing” probability mass represents the probability of nontermination.

The probabilistic and quantitative analog to predicate transformers for nonprobabilistic programs are expectation transformers for probabilistic programs (cf. [25, 38, 39, 42, 46]). Random variables mapping program states to nonnegative real values (e.g., x + y, where x and y are program variables) are commonly called expectations (see [3, 10, 11, 13, 14, 15, 16, 19, 23, 35, 42, 44, 45]). Given an expectation f, an initial state σ, and a probabilistic program C, a key issue is to determine the expected value of f after termination of C on input σ. A mapping from initial states to corresponding expected values is commonly called a weakest preexpectation (in analogy to Dijkstra's weakest preconditions, as it is evaluated in initial states), whereas in this context, f is called the postexpectation (in analogy to Dijkstra's postconditions, as it is evaluated in final states). If f is the indicator function of an event A, then the weakest preexpectation of C with respect to f represents the probability that A has occurred after termination of C. As another example, the weakest preexpectation of C with respect to the constant postexpectation 1 represents the probability that C terminates. Note that while the latter two postexpectations are bounded, as they map to the interval [0, 1], the postexpectation x + y from above is potentially unbounded, i.e., it maps to R≥0 ∪ {∞}.

Weakest–preexpectation–style reasoning was first studied in Kozen’s seminal work on probabilistic propositional dynamic logic (PPDL) [38, 39]. Its box– and diamond–modalities provide probabilistic versions of Dijkstra’s weakest (liberal) preconditions. Amongst others, Jones [28], Hehner [25], and McIver & Morgan [42] have furthered this line of research, e.g., by considering nondeterminism and proof rules for loops. Recently, Kaminski et al. [33, 34] presented expectation transformer reasoning about the expected runtimes while Batz et al. [8] consider a quantitative separation logic together with a weakest preexpectation calculus for verifying probabilistic programs with dynamic memory.

Most of the above–mentioned works share a common technique: an induction rule for proving upper bounds on weakest preexpectations of loops, which are characterized as the least fixed point of an appropriate expectation transformer . This induction proof principle (called “Park induction”) reads

 Φ(I) ⊑ Iimplies{{lfp}} Φ ⊑ I ,

i.e., we check for a suitable partial order to prove that is indeed an upper bound on the sought–after least fixed point. We call such a candidate for a bound on a weakest preexpectation an invariant.222These invariants are quantitative, since they evaluate to a number instead of a truth value. For an analogy to invariants in Hoare logic, see [29]. Note that reasoning about upper bounds by induction is conceptually relatively easy.

Apart from upper bounds, there is a genuine interest in reasoning about lower bounds as well. In general, lower bounds help to assess the quality and tightness of upper bounds. Moreover, giving total correctness guarantees amounts to lower–bounding the correctness probability, e.g., for proving membership in complexity classes like RP and PP. Furthermore, lower bounds on expected resource consumption reveal the existence of certain attacks: if a lower bound on an expected runtime depends on a secret program variable, this may compromise the secret, thus allowing for timing side–channel attacks. In addition, a “large” lower bound indicates the possibility of denial–of–service attacks.

A simple proof principle analogous to induction, namely

 I ⊑ Φ(I)   implies   I ⊑ lfp Φ ,  ↯

is unsound in general. Sound proof rules for lower bounds, on the other hand, often suffer from the fact that postexpectations need to be bounded [42], or that one has to find the limit of an appropriate sequence of expectations, as well as the sequence itself [3, 31, 33, 34, 48, 49]. In the latter case, the resulting proof rules are conceptually much more involved than the induction principle for upper bounds.

In this paper, we study relatively simple additional constraints that can be added to the premise of the (unsound) implication above, such that this implication becomes true. We thus obtain a simple inductive proof rule for proving lower bounds on weakest preexpectations in a compositional manner.

Our rule will rely on the notions of uniform integrability and conditional difference boundedness as well as the Optional Stopping Theorem. Previous works have also used these notions. Barthe et al. [7] focus on synthesizing exact martingale expressions. Hermanns & Fioriti [20] develop a type system for uniform integrability in order to prove (positive) almost–sure termination of probabilistic programs and give upper bounds on the expected runtime. Chatterjee & Fu [12] give lower bounds on expected runtimes. Kobayashi et al. [37] provide a semi–decision procedure for lower bounding termination probabilities of probabilistic higher–order recursive programs. Ngo et al. [47] perform automated template–driven resource analysis, but infer upper bounds only. The latter four works only analyze the termination behavior of a probabilistic program, whereas we focus on general expected values. More importantly and in contrast to many previous works, we not only make use of uniform integrability and/or conditional difference boundedness of some auxiliary stochastic process in order to prove soundness of our proof rule. Instead, we

1. establish a notion of uniform integrability and conditional difference boundedness purely in terms of expectations and expectation transformers,

2. construct a canonical stochastic process corresponding to a given loop, a postexpectation, and an invariant, and show its relationship to the sought–after preexpectation,

3. show that uniform integrability of an invariant in the expectation transformer sense corresponds to uniform integrability of the corresponding canonical stochastic process in the classical probability theoretic sense, and

4. present a purely expectation transformer counterpart to the classical Optional Stopping Theorem.

The latter then yields an inductive proof rule for lower bounds on arbitrary and even unbounded weakest preexpectations.

#### Organization of the paper

In Section II, we give a primer on weakest preexpectation reasoning. In Section III, we revisit techniques for reasoning about loops and give a more elaborate problem statement. Moreover, we develop the notion of uniform integrability in terms of expectation transformers. In Section IV, we provide preliminaries on probability theory and instantiate these notions in our setting. In particular, we present our construction of the canonical stochastic process mentioned above. In Section V, we develop our main result: an Optional Stopping Theorem for weakest preexpectation reasoning, yielding a simple inductive proof rule for lower bounds on weakest preexpectations of loops. In Section VI, we revisit upper bounds and, using Fatou’s Lemma, obtain an alternative explanation of why reasoning about upper bounds is easier. The appendix contains case studies illustrating the effectiveness of our proof rule and the proofs of our results.

## II Weakest Preexpectation Reasoning

Weakest preexpectations for probabilistic programs are a generalization of Dijkstra’s weakest preconditions for nonprobabilistic programs. Dijkstra employs predicate transformers, which push a postcondition ψ (a predicate) backward through a nonprobabilistic program C and yield the weakest precondition wp⟦C⟧(ψ) (another predicate) describing the largest set of states such that whenever C is started in a state satisfying wp⟦C⟧(ψ), C terminates in a state satisfying ψ. (We consider total correctness, i.e., from any state satisfying the weakest precondition, C definitely terminates.) The weakest preexpectation calculus, on the other hand, employs expectation transformers which act on real–valued functions called expectations, mapping program states to non–negative reals. These transformers push a postexpectation f backward through a probabilistic program C and yield a preexpectation wp⟦C⟧(f), such that wp⟦C⟧(f) represents the expected value of f after executing C. The term expectation is due to McIver & Morgan [42] and may appear somewhat misleading at first glance. For now, just note that we clearly distinguish between expectations and expected values: an expectation is not an expected value per se. Instead, an expectation can rather be thought of as a random variable.

###### Definition 1 (Expectations [29, 42]).

Let Σ denote the set of program states over the finite set of program variables.

The set of expectations, denoted by F, is defined as

 F = { f | f : Σ → R≥0 ∪ {∞} } .

We say that f is finite if f(s) < ∞ for all s ∈ Σ. A partial order ⪯ on F is obtained by point–wise lifting the usual order ≤ on R≥0 ∪ {∞}, i.e.,

 f1 ⪯ f2   iff   ∀ s ∈ Σ:  f1(s) ≤ f2(s) .

(F, ⪯) is a complete lattice where suprema and infima are constructed point–wise.

We note that our notion of expectations is more general than the one of McIver & Morgan: their work builds almost exclusively on bounded expectations, i.e., non–negative real–valued functions which are bounded from above by some constant, whereas we allow unbounded expectations. As a result, (F, ⪯) forms a complete lattice, whereas McIver & Morgan’s space of bounded expectations does not.

### II-A Weakest Preexpectations

Given program C and postexpectation f, mapping (final) states to non–negative reals, we are interested in the expected value of f, evaluated in the final states reached after termination of C. But since the behavior of C depends on its input, we are actually interested in a function that maps each initial state σ to the respective expected value of f evaluated in the final states reached after termination of C on input σ. On examining the type of this function, we observe that it is again an expectation, and we call it the weakest preexpectation of C with respect to the postexpectation f, denoted wp⟦C⟧(f). Put as an equation, if μ_σ^C is the probability (sub)measure over final states reached after termination of C on initial state σ (so μ_σ^C(τ) is the probability that τ is the final state reached after termination of C on input σ; the “missing” probability mass is the probability of nontermination of C on σ), then we have

 wp⟦C⟧(f)(σ) = ∫ f dμ_σ^C = Σ_{τ ∈ Σ} f(τ) · μ_σ^C(τ) ,

where the integral reduces to the sum because Σ is countable.

As for the term expectation, note that both f and wp⟦C⟧(f) are expectations of type Σ → R≥0 ∪ {∞}. But while wp⟦C⟧(f)(σ) in fact represents an expected value (namely that of f), f itself does not. In analogy to Dijkstra’s pre– and postconditions, since f is evaluated in the final states after termination of C, it is called the postexpectation, and since wp⟦C⟧(f) is evaluated in the initial states before executing C, it is called the preexpectation.

### II-B The Weakest Preexpectation Calculus

We now show how, given a program and a postexpectation, weakest preexpectations can be determined in a systematic and compositional manner by recapitulating the weakest preexpectation calculus à la McIver & Morgan. The calculus builds upon Kozen’s probabilistic propositional dynamic logic [38, 39] and Dijkstra’s weakest precondition calculus [18].

The weakest preexpectation calculus employs expectation transformers which move backward through the program in a continuation–passing style. As a diagram, this is depicted in Fig. 1. If we are given the sequential composition C1 ; C2 of two programs C1 and C2 and are interested in the expected value of some postexpectation f after executing C1 ; C2, then we can first determine the weakest preexpectation of C2 with respect to f, i.e., wp⟦C2⟧(f). Thereafter, we can use the intermediate result wp⟦C2⟧(f) as postexpectation to determine the weakest preexpectation of C1 with respect to wp⟦C2⟧(f). Overall, this gives the weakest preexpectation of C1 ; C2 with respect to postexpectation f.

The above explanation for sequential composition illustrates the compositional nature of the weakest preexpectation calculus. Just like for sequential composition, the weakest preexpectation transformers for all other language constructs can also be defined by induction on the program structure:
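As an illustration of this backward, continuation–passing style, the sketch below represents expectations as Python functions from states to non–negative reals and programs by their expectation transformers. The mini–language and the example program are illustrative assumptions, not this paper's definitions.

```python
# Expectations are maps from program states (dicts) to non-negative reals.
# A program is represented by its expectation transformer: a function that
# takes a postexpectation f and returns the preexpectation wp(C, f).

def assign(var, expr):
    """wp(x := E, f)(s) = f(s[x := E(s)])"""
    return lambda f: lambda s: f({**s, var: expr(s)})

def choice(p, c1, c2):
    """wp({C1} [p] {C2}, f) = p * wp(C1, f) + (1 - p) * wp(C2, f)"""
    return lambda f: lambda s: p * c1(f)(s) + (1 - p) * c2(f)(s)

def seq(c1, c2):
    """wp(C1 ; C2, f) = wp(C1, wp(C2, f)) -- backward, continuation-passing"""
    return lambda f: c1(c2(f))

# Illustrative program: {x := x + 5} [1/2] {x := 10}; postexpectation f = x.
prog = choice(0.5, assign("x", lambda s: s["x"] + 5), assign("x", lambda s: 10))
pre = prog(lambda s: s["x"])
# On initial state x = 0: 1/2 * (0 + 5) + 1/2 * 10 = 7.5
```

Sequencing another statement in front of `prog` simply nests the transformers, exactly as described above: the intermediate preexpectation becomes the next postexpectation.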

###### Definition 2 (The wp–Transformer [42]).

Let pGCL be the set of programs in the probabilistic guarded command language [42]. The weakest preexpectation transformer

 wp : pGCL → (F → F)

is defined according to the rules given in Table I, where [φ] denotes the Iverson bracket of φ, i.e., [φ] evaluates to 1 if φ holds and to 0 otherwise. Moreover, for any variable x and any expression E, let f[x/E] be the expectation with f[x/E](s) = f(s[x/E]) for any s ∈ Σ, where s[x/E](x) = s(E) and s[x/E](y) = s(y) for all variables y ≠ x. The application of states to expressions is defined in the straightforward way.

We call the function

 Φ_f(X) = [¬φ] · f + [φ] · wp⟦C⟧(X) ,

used for defining wp of while loops, the characteristic function of the loop while(φ){C} with respect to f. Its least fixed point is understood in terms of the partial order ⪯. We omit φ, C, or f from Φ whenever they are clear from the context.

###### Example 3 (Applying the wp Calculus).

Consider a loop–free probabilistic program C and suppose we want to know the expected value of a program variable x, i.e., the weakest preexpectation of C with respect to the postexpectation x. Using the annotation style of Fig. 1 (a), we can annotate the program as shown in Fig. 1 (b) using the rules from Table I. At the top, we can then read off the weakest preexpectation of C with respect to x; evaluated in an initial state σ, it gives the expected value of x after termination of C on σ.

The wp–transformer satisfies a few elementary properties, which are sometimes called healthiness conditions [26, 36, 42] or homomorphism properties [4]:

###### Theorem 4 (Healthiness Conditions [29, 42]).

Let C ∈ pGCL, f, g ∈ F, let (f_n)_{n∈N} be an ascending chain in F, and let r ∈ R≥0. Then:

1. Continuity: wp⟦C⟧ is continuous, i.e., wp⟦C⟧(sup_n f_n) = sup_n wp⟦C⟧(f_n).

2. Strictness: wp⟦C⟧ is strict, i.e., wp⟦C⟧(0) = 0. Here, “0” denotes the constant expectation that maps every s ∈ Σ to 0.

3. Monotonicity: f ⪯ g implies wp⟦C⟧(f) ⪯ wp⟦C⟧(g).

4. Linearity: wp⟦C⟧(r · f + g) = r · wp⟦C⟧(f) + wp⟦C⟧(g).

## III Bounds on Weakest Preexpectations

As we saw in Example 3, given a loop–free program C and postexpectation f, it is generally straightforward to determine wp⟦C⟧(f): we simply apply the rules in Table I, which guide us along the syntax of C. For loops, on the other hand, the situation is more difficult. Often, it is not clear how to determine the corresponding least fixed point, not to mention that weakest preexpectations are generally not computable [32]. Therefore, we often have to content ourselves with some approximation of the least fixed point.

For us, a sound approximation is either a lower or an upper bound on the least fixed point. There are in principle two problems: (1) finding a candidate bound and (2) verifying that the candidate is indeed an upper or lower bound. In this paper, we study the latter problem.

### III-A Upper Bounds

The Park induction principle provides us with a very convenient proof rule for verifying upper bounds. In general, this principle reads as follows:

###### Theorem 5 (Park Induction [50]).

Let (D, ⊑) be a complete lattice and let Φ : D → D be continuous. (It would even suffice for Φ to be monotonic, but we consider continuous functions throughout this paper.) Then Φ has a least fixed point lfp Φ in D and for any I ∈ D,

 Φ(I) ⊑ I   implies   lfp Φ ⊑ I .

In the realm of weakest preexpectation reasoning, by Theorem 4 (3) this immediately gives us the following induction principle:

###### Corollary 6 (Park Induction for wp [29, 39]).

Let Φ_f be the characteristic function of the while loop while(φ){C} with respect to postexpectation f and let I ∈ F. Then

 Φ_f(I) ⪯ I   implies   wp⟦while(φ){C}⟧(f) = lfp Φ_f ⪯ I .

We call an I that satisfies Φ_f(I) ⪯ I a superinvariant. The striking power of Park induction is its simplicity: once an appropriate candidate I is found (even though this is usually not an easy task), all we have to do is push I through the characteristic function once and check whether we went down in our underlying partial order. If this is the case, we have verified that I is indeed an upper bound on the least fixed point and thus on the sought–after weakest preexpectation.

###### Example 7 (Induction for Upper Bounds).

Consider the program C, given by

 while (a ≠ 0) { {a := 0} [1/2] {b := b + 1} } ,

where we assume that all program variables range over the naturals, and suppose we want to reason about an upper bound on the expected value of b after execution of C. To this end, we propose the superinvariant I = [a = 0] · b + [a ≠ 0] · (b + 1) and check its superinvariance by applying the characteristic function

 Φ_b(X) = [a = 0] · b + [a ≠ 0] · ½ · (X[a/0] + X[b/b+1])

to I, which — as one can easily check — gives us Φ_b(I) = I, i.e., I is a fixed point of Φ_b and hence trivially a superinvariant. Thus, by Theorem 5, lfp Φ_b ⪯ I, and therefore I (evaluated in the initial state) is an upper bound on the expected value of b (evaluated in the final states) after executing C.
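The fixed–point check of Example 7 can be mechanized on concrete states. The sketch below assumes the invariant I = [a = 0] · b + [a ≠ 0] · (b + 1) and the characteristic function Φ_b(X) = [a = 0] · b + [a ≠ 0] · ½ · (X[a/0] + X[b/b+1]) spelled out above; on every tested state, Φ_b(I) agrees with I.

```python
def I(s):
    """Candidate invariant [a = 0] * b + [a != 0] * (b + 1)."""
    return s["b"] if s["a"] == 0 else s["b"] + 1

def phi(X):
    """Characteristic function of the loop of Example 7 w.r.t. postexpectation b:
    Phi_b(X) = [a = 0] * b + [a != 0] * 1/2 * (X[a/0] + X[b/b+1])."""
    def transformed(s):
        if s["a"] == 0:
            return s["b"]
        return 0.5 * (X({**s, "a": 0}) + X({**s, "b": s["b"] + 1}))
    return transformed

# Phi_b(I) agrees with I on a grid of states, witnessing the fixed-point
# property claimed in Example 7 on these states.
for a in (0, 1, 7):
    for b in range(20):
        assert phi(I)({"a": a, "b": b}) == I({"a": a, "b": b})
```

Such a finite check is of course no proof over the infinite state space, but it is a cheap sanity test before attempting the symbolic calculation.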

To prepare a comparison with the lower bound case considered later, we now explain why Park induction is sound. Consider the so–called Tarski–Kantorovich Principle:

###### Theorem 8 (Tarski–Kantorovich Principle [27]).

Let (D, ⊑) be a complete lattice, let Φ : D → D be continuous, and let I ∈ D, such that Φ(I) ⊑ I. Then the sequence (Φ^n(I))_{n∈N} is a descending chain that converges to an element

 Φ^ω(I) = lim_{n→ω} Φ^n(I) ∈ D ,

which is a fixed point of Φ. In particular, Φ^ω(I) is the greatest fixed point of Φ smaller or equal to I.

Dually, now let I ⊑ Φ(I). Then the sequence (Φ^n(I))_{n∈N} is an ascending chain that converges to Φ^ω(I), which is again a fixed point of Φ. Moreover, Φ^ω(I) is the least fixed point of Φ greater or equal to I.

The well–known Kleene Fixed Point Theorem [40], which states that lfp Φ = Φ^ω(⊥), where ⊥ is the least element of D, is a special case of the Tarski–Kantorovich Principle.

In our setting, applying the Tarski–Kantorovich principle to a superinvariant I, i.e., Φ(I) ⊑ I, shows that iterating Φ on I yields some fixed point smaller than or equal to I, and this fixed point is necessarily greater than or equal to the least fixed point of Φ.
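The ascending Kleene chain Φⁿ(0) can be observed numerically for the loop of Example 7. The tabulation below over a bounded range of b–values is an illustrative finite approximation; since the a = 0 component of the chain equals b from the first application on, it is inlined.

```python
# Ascending Kleene chain for the loop of Example 7 with postexpectation b:
#   Phi_b(X) = [a = 0] * b + [a != 0] * 1/2 * (X[a/0] + X[b/b+1]),
# started from the least element X_0 = 0. We tabulate the a != 0 case over
# b = 0 .. N-1; the a = 0 case is simply b and is inlined below.
N, ITERATIONS = 120, 50
X = [0.0] * N  # X restricted to states with a != 0
for _ in range(ITERATIONS):
    # One application of Phi_b: the left branch sets a := 0 (value b),
    # the right branch increments b (value X[b + 1]).
    X = [0.5 * (b + (X[b + 1] if b + 1 < N else 0.0)) for b in range(N)]

# The chain ascends towards lfp Phi_b, i.e., towards b + 1 for a != 0;
# for this recurrence the error at position b after n steps is (b + n + 1) / 2**n.
```

After 50 iterations the tabulated values are numerically indistinguishable from b + 1, in line with the Tarski–Kantorovich principle applied from the least element.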

### III-B Lower Bounds

For verifying lower bounds, we do not have a rule as simple as Park induction available. In particular, for a given complete lattice (D, ⊑) and monotonic function Φ : D → D, the rule

 I ⊑ Φ(I)   implies   I ⊑ lfp Φ ,  ↯

is unsound in general. We call an I satisfying I ⊑ Φ(I) a subinvariant and the above rule simple lower induction.

Generally, we will call an I that is a sub– or a superinvariant an invariant. I being an invariant thus mainly expresses its inductive nature, namely that I is comparable with Φ(I) with respect to the partial order ⊑.

An explanation why simple lower induction is unsound is as follows: by Theorem 8, we know from I ⊑ Φ(I) that Φ^ω(I) is the least fixed point of Φ greater than or equal to I. Since Φ^ω(I) is a fixed point, it is necessarily smaller than or equal to the greatest fixed point of Φ, but we do not know how I relates to the least fixed point of Φ. We only know that if I indeed lies below the least fixed point and we have I ⊑ Φ(I), then iterating Φ on I also converges to the least fixed point, i.e.,

 I ⊑ lfp Φ   implies   Φ^ω(I) = lfp Φ .

If, however, I ⊑ Φ(I) and I is strictly greater than lfp Φ, then iterating Φ on I will yield a fixed point strictly greater than lfp Φ, contradicting soundness of simple lower induction.

While we just illustrated by means of the Tarski–Kantorovich principle why the simple lower induction rule is not sound in general, we should note that the rule is not per se absurd: So called metering functions [21] basically employ simple lower induction to verify lower bounds on runtimes of nonprobabilistic programs [29]. For weakest preexpectations, however, simple lower induction is indeed unsound:

###### Counterexample 9 (Simple Induction for Lower Bounds).

Consider the following loop

 while (a ≠ 0) { {a := 0} [1/2] {b := b + 1} ; k := k + 1 } ,

where we assume that all program variables range over the naturals. As in Example 7, the weakest preexpectation of the above loop with respect to the postexpectation b is [a = 0] · b + [a ≠ 0] · (b + 1). So in particular, it is independent of k. The corresponding characteristic function is

 Φ_b(X) = [a = 0] · b + [a ≠ 0] · ½ · (X[a/0] + X[b/b+1])[k/k+1] .

Let us consider I = b + [a ≠ 0] · 2^k, which does depend on k. Indeed, one can check that I ⪯ Φ_b(I), i.e., I is a subinvariant. If the simple lower induction rule were sound, we would immediately conclude that I is a lower bound on lfp Φ_b, but this is obviously false, since I exceeds [a = 0] · b + [a ≠ 0] · (b + 1) in every state with a ≠ 0 and k ≥ 1.
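The failure can also be observed empirically. As a concrete subinvariant take I = b + [a ≠ 0] · 2^k, an illustrative instance (one can verify I ⪯ Φ_b(I) by a short calculation; it is not claimed to be the paper's exact expression). A Monte Carlo estimate of the actual expected value of b stays near b + [a ≠ 0], strictly below I once k ≥ 1.

```python
import random

def final_b(a, b, k, rng):
    """One run of: while (a != 0) { {a := 0} [1/2] {b := b + 1}; k := k + 1 }."""
    while a != 0:
        if rng.random() < 0.5:
            a = 0
        else:
            b += 1
        k += 1
    return b

rng = random.Random(7)
# Start in a = 1, b = 0, k = 2.
est = sum(final_b(1, 0, 2, rng) for _ in range(100_000)) / 100_000

# The actual weakest preexpectation of b is b + [a != 0] = 1 here, independent
# of k, whereas the subinvariant I = b + [a != 0] * 2**k evaluates to 4:
I_val = 0 + 2 ** 2  # a subinvariant, yet clearly not a lower bound
```

The estimate clusters around 1, while the subinvariant claims at least 4: being inductive from below is simply not enough.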

### III-C Problem Statement

The purpose of this paper is to present a sound lower induction rule of the following form: let Φ_f be the characteristic function of the while loop while(φ){C} with respect to the postexpectation f and let I ∈ F. Then

 I ⪯ Φ_f(I)  ∧  (some side conditions)   implies   I ⪯ lfp Φ_f .

We still want our lower induction rule to be simple in the sense that checking the side conditions should be conceptually as simple as checking I ⪯ Φ_f(I). Intuitively, we want to apply the semantics of the loop body only finitely often, not infinitely often, to avoid reasoning about limits of sequences or the like. We provide such side conditions in our main contribution, Theorem 35, which transfers the Optional Stopping Theorem of probability theory to weakest preexpectation reasoning.

### III-D Uniform Integrability

We now present a sufficient and necessary criterion for under–approximating the least fixed points that we seek. Let Φ_f again be the characteristic function of the loop while(φ){C} with respect to the postexpectation f, i.e.,

 Φ_f(X) = [¬φ] · f + [φ] · wp⟦C⟧(X) .

Theorem 4 implies that Φ_f is continuous and monotonic.

Let us now consider a subinvariant I, i.e., I ⪯ Φ_f(I). If we iterate Φ_f on I ad infinitum, then the Tarski–Kantorovich principle (Theorem 8) guarantees that we converge to some fixed point Φ_f^ω(I) greater than or equal to I. From monotonicity of Φ_f and the Tarski–Kantorovich principle, one can easily show that this fixed point coincides with the least fixed point of Φ_f if and only if I itself was already less than or equal to the least fixed point of Φ_f, i.e., we obtain the following theorem:

###### Theorem 10 (Uniform Integrability and Lower Bound).

For any subinvariant I, we have

 Φ_f^ω(I) = lfp Φ_f   iff   I ⪯ lfp Φ_f .

More generally, for any expectation X (not necessarily a sub– or superinvariant), if iterating Φ_f on X converges to the least fixed point of Φ_f, then we call X uniformly integrable for f:

###### Definition 11 (Uniform Integrability of Expectations).

Given a loop while(φ){C} with characteristic function Φ_f, an expectation X is called uniformly integrable (u.i.) for f if lim_{n→ω} Φ_f^n(X) exists and

 lim_{n→ω} Φ_f^n(X) = lfp Φ_f .

Uniform integrability [24] — a notion that originally comes from probability theory — will be essential for the Optional Stopping Theorem in Section V. So far, however, we have studied the function Φ_f solely from an expectation transformer point of view. Moreover, we have also defined a purely expectation–theoretical notion of uniform integrability. In particular, we did not use any probability theory.

In Section IV, we will study Φ_f from a stochastic process point of view. Stochastic processes are not inductive per se, whereas our expectation transformer approach makes heavy use of induction. We will see, however, how we can rediscover the inductiveness also in the realm of stochastic processes. We will also see how our notion of uniform integrability corresponds to uniform integrability in its original sense. First, however, we give some preliminaries on probability theory.

## IV From Expectations to Stochastic Processes

In this section, we connect concepts from weakest preexpectations with notions from probability theory, like probability measures (Section IV-A), stochastic processes (Section IV-B) and the original definition of uniform integrability (Section IV-C). To that end, we introduce general definitions of these notions and instantiate them in our setting. Proofs can be found in Appendix C. For further background on probability theory, we refer to Appendix B and [9, 24].

Let us fix for this section an arbitrary probabilistic loop while(φ){C}. The loop body C may itself contain loops, but we require C to be universally almost–surely terminating (AST), i.e., C terminates on any input with probability 1. The set Σ of all program states can be uniquely split into Σ = Σ_φ ⊎ Σ_¬φ, with s ∈ Σ_φ iff s ⊨ φ. The set Σ_¬φ thus consists of the terminal states, from which the loop is not executed further.

### IV-A Canonical Probability Space

We begin by constructing a canonical probability measure and space corresponding to the execution of our loop. As every pGCL program is, from an operational point of view, a countable Markov chain, our construction is similar to the standard construction for Markov chains (cf. [51]).

In general, a measurable space is a pair (Ω, F) consisting of a sample space Ω and a σ–field F over Ω, i.e., a collection of subsets of Ω that is closed under complement and countable union and satisfies Ω ∈ F. In our setting, a loop induces the following canonical measurable space:

###### Definition 12 (Loop Space).

The loop while(φ){C} induces a unique measurable space (Ω_loop, F_loop) as follows. The sample space is given as

 Ω_loop ≔ Σ^ω = { ϑ : N → Σ } ,

i.e., all infinite sequences of program states (so–called runs). For ϑ ∈ Ω_loop, we denote by ϑ[n] the n–th state in the sequence (starting to count at 0). The σ–field F_loop is the smallest σ–field that contains all cylinder sets Cyl(π) ≔ { ϑ ∈ Ω_loop | π is a prefix of ϑ }, for all finite prefixes π ∈ Σ⁺, i.e.,

 F_loop = ⟨ { Cyl(π) | π ∈ Σ⁺ } ⟩_σ .

Intuitively, a run is an infinite sequence of states

 ϑ = s0 s1 s2 s3 ⋯ ,

where s0 represents the initial state on which the loop is started and s_n is a state that could be reached after n iterations of the loop. Obviously, some sequences in Ω_loop may not actually be admissible by our loop. We next develop a canonical probability measure corresponding to the execution of the loop, which will assign measure 0 to inadmissible runs.

We start by considering a single loop iteration. The loop body C induces a family of distributions (since the loop body is AST, these are distributions and not just subdistributions)

 μ_C : Σ → Σ → [0, 1] ,

such that μ_C(s)(s′) is the probability that, after executing one iteration of the loop body on s, the program is in state s′.

In general, a probability measure over a measurable space (Ω, F) is a mapping P : F → [0, 1], such that P(∅) = 0, P(Ω) = 1, and P(⋃_{i∈N} A_i) = Σ_{i∈N} P(A_i) for any pairwise disjoint sets A_i ∈ F. The triple (Ω, F, P) is called a probability space.

In our setting, a loop induces not only one but a family of probability measures on the loop space (Ω_loop, F_loop). This family is again parameterized by the initial state in which the loop is started. Using the distributions μ_C above, we can first define the probability of a finite non–empty prefix of a run, i.e., for s ∈ Σ and π ∈ Σ⁺, p_s(π) is the probability that π is the sequence of states reached after the first |π| − 1 loop iterations, when starting the loop in state s. Hence, the family

 p : Σ → Σ⁺ → [0, 1]

of distributions on Σ⁺ is defined by

 p_s(s0) = δ_{s, s0}   and   p_s(π s′ s″) = p_s(π s′) · { μ_C(s′)(s″) if s′ ∈ Σ_φ;  δ_{s′, s″} if s′ ∈ Σ_¬φ } ,

for π ∈ Σ*, where δ is the Kronecker delta, i.e., δ_{x,y} evaluates to 1 if x = y and to 0 otherwise (terminal states thus repeat forever). Using the family p, we now obtain a canonical probability measure on the loop space.

###### Lemma 13 (Loop Measure [5]).

There exists a unique family of probability measures (P_s)_{s∈Σ} on (Ω_loop, F_loop) with

 P_s(Cyl(π)) = p_s(π) .
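For a concrete loop, the one–step distributions μ_C and the prefix probabilities p_s can be sketched as follows. We use the loop of Counterexample 9 with the counter k dropped for brevity; terminal states are treated as absorbing, consistent with inadmissible runs receiving probability 0, and exact arithmetic via fractions keeps the probabilities crisp.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def mu(state):
    """One-step distribution mu_C of the loop body on states (a, b):
    with probability 1/2 set a := 0, with probability 1/2 set b := b + 1.
    Only queried on non-terminal states (a != 0)."""
    a, b = state
    return {(0, b): HALF, (a, b + 1): HALF}

def terminal(state):
    return state[0] == 0  # the loop guard is a != 0

def prefix_prob(start, prefix):
    """p_s(pi): probability that pi is the sequence of states reached during
    the first len(pi) - 1 loop iterations when starting in s. Terminal states
    are absorbing; inadmissible prefixes get probability 0."""
    if prefix[0] != start:
        return Fraction(0)  # Kronecker delta on the initial state
    p = Fraction(1)
    for cur, nxt in zip(prefix, prefix[1:]):
        if terminal(cur):
            p *= 1 if nxt == cur else 0
        else:
            p *= mu(cur).get(nxt, Fraction(0))
    return p
```

For instance, the prefix (1, 0) (1, 1) (0, 1) requires two coin flips and gets probability 1/4, while a prefix that leaves a terminal state, such as (1, 0) (0, 0) (1, 0), gets probability 0.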

We now turn to random variables and their expected values. A mapping X : Ω → R≥0 ∪ {∞} on a probability space (Ω, F, P) is called (F–)measurable, or a random variable, if for any open set U its preimage lies in F, i.e., X^{−1}(U) ∈ F. For X mapping to R≥0 ∪ {∞}, this is equivalent to checking X^{−1}((c, ∞]) ∈ F for every c ∈ R≥0. The expected value of a random variable X is defined as E(X) = ∫ X dP; details on integrals for arbitrary measures can be found in Appendix B. If X takes only countably many values, we have

 E(X) = Σ_v v · P(X = v) ,

where v ranges over the countably many values of X.

We saw that while(φ){C} gives rise to a unique canonical measurable space (Ω_loop, F_loop) and to a family of probability measures (P_s)_{s∈Σ} parameterized by the initial state s on which our loop is started. We now define a corresponding parameterized expected value operator (E_s)_{s∈Σ}.

###### Definition 14 (Expected Value for Loops).

Let s ∈ Σ and let X : Ω_loop → R≥0 ∪ {∞} be a random variable. The expected value of X with respect to the loop measure P_s, parameterized by state s, is defined by E_s(X) ≔ ∫ X dP_s.

Next, we define a random variable that corresponds to the number of iterations that our loop makes until it terminates.

###### Definition 15 (Looping Time).

The mapping

 T_¬φ : Ω_loop → N ∪ {∞} ,  ϑ ↦ inf { n ∈ N | ϑ[n] ∈ Σ_¬φ } ,

is a random variable, called the looping time of while(φ){C}. Here, inf ∅ ≔ ∞.

The canonical σ–field F_loop that we defined for our loop contains sets of runs, which are infinite sequences of states. But after n iterations of the loop we only know the first n + 1 states of a run. Gaining knowledge in this successive fashion can be captured by a so–called filtration of the σ–field F_loop. In general, a filtration is a sequence (F_n)_{n∈N} of subsets of F, such that F_n ⊆ F_{n+1} ⊆ F and F_n is itself a σ–field for any n ∈ N, i.e., F is approximated from below.

###### Definition 16 (Loop Filtration).

The sequence (F_loop^n)_{n∈N} with

 F_loop^n = ⟨ { Cyl(π) | π ∈ Σ⁺, |π| = n + 1 } ⟩_σ

is a filtration of F_loop.

Next, we recall the notion of stopping times from probability theory. For a probability space (Ω, F, P) with filtration (F_n)_{n∈N}, a random variable T : Ω → N ∪ {∞} is called a stopping time with respect to (F_n)_{n∈N} if for every n ∈ N we have { ϑ ∈ Ω | T(ϑ) ≤ n } ∈ F_n.

Let us reconsider the looping time T^¬φ and the loop filtration (F^loop_n)_{n∈N}. In order to decide for a run ϑ whether its looping time is n, we only need to consider the states ϑ[0], …, ϑ[n]. Hence, {ϑ ∈ Ω^loop ∣ T^¬φ(ϑ) = n} ∈ F^loop_n for any n ∈ N, and thus T^¬φ is a stopping time with respect to (F^loop_n)_{n∈N}.
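A small sanity check of this stopping-time property (our own illustration, with the same toy guard "x > 0" as before): whether the looping time equals n is determined by the first n+1 states of a run, so two runs agreeing on those states agree on the event {T = n}.

```python
# Sanity check of the stopping-time property (our own illustration): whether
# the looping time equals n is determined by the first n+1 states of a run,
# so two runs agreeing on those states agree on the event {T = n}.
def looping_time_equals(run_prefix, n, guard):
    """Decide "looping time = n" from the first n+1 states only."""
    assert len(run_prefix) >= n + 1
    first_violation = next(
        (k for k, s in enumerate(run_prefix[: n + 1]) if not guard(s)), None
    )
    return first_violation == n

guard = lambda x: x > 0
run_a = [3, 2, 1, 0] + [0, 0]   # same first four states ...
run_b = [3, 2, 1, 0] + [5, 7]   # ... but a different tail
print(looping_time_equals(run_a, 3, guard))  # True
print(looping_time_equals(run_b, 3, guard))  # True: the tail is irrelevant
```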

Note that T^¬φ does not reflect the actual runtime of the loop, as it does not take the runtime of the loop body into account. Instead, T^¬φ only counts the number of loop iterations of the “outer loop”. This enriches the class of probabilistic programs our technique will be able to analyze, as we will not need to require that the whole program has finite expected runtime, but only that the outer loop is expected to be executed finitely often.

### Iv-B Canonical Stochastic Process

From now on, we additionally fix two expectations f and I. Intuitively, f will later play the role of the postexpectation and I the role of an invariant (i.e., I is a sub– or superinvariant).

We now present a canonical stochastic process, i.e., a sequence of random variables, that captures approximating the weakest preexpectation of our loop with respect to the postexpectation f, using the invariant I.

###### Definition 17 (Induced Stochastic Process).

The stochastic process induced by f and I, denoted (X^{f,I}_n)_{n∈N}, is given by

 X^{f,I}_n : Ω^loop → R̄≥0,  ϑ ↦ { f(ϑ[T^¬φ(ϑ)]),  if T^¬φ(ϑ) ≤ n
                                  { I(ϑ[n+1]),      if T^¬φ(ϑ) > n .

Now, in what sense does the stochastic process (X^{f,I}_n)_{n∈N} capture approximating the weakest preexpectation of our loop with respect to f by invariant I? X^{f,I}_n takes as argument a run ϑ of the loop, i.e., a sequence of states reached after each iteration of the loop body, and assigns to ϑ a value as follows: If the loop has reached a terminal state within n iterations, X^{f,I}_n returns the value of the postexpectation f evaluated in that terminal state. If no such terminal state is reached within n steps, X^{f,I}_n simply approximates the remainder of the run, i.e.,

 ϑ[0] ⋯ ϑ[n]  ϑ[n+1] ϑ[n+2] ϑ[n+3] ⋯ ,
              └───── approximated ─────┘

by returning the value of the invariant I evaluated in ϑ[n+1]. We see that X^{f,I}_n needs at most the first n+2 states of a run to determine its value. Thus, X^{f,I}_n is in general not F^loop_n–measurable but F^loop_{n+1}–measurable, as there exist runs that agree on the first n+1 states but yield different images under X^{f,I}_n. Hence, we shift the loop filtration by one.

###### Definition 18 (Shifted Loop Filtration).

The filtration (G^loop_n)_{n∈N} of F^loop is defined by

 G^loop_n := F^loop_{n+1} = ⟨ { Cyl(π) ∣ π ∈ Σ⁺, |π| = n+2 } ⟩_σ .

Note that F^loop_n ⊆ G^loop_n, so T^¬φ is a stopping time w.r.t. (G^loop_n)_{n∈N} as well.

###### Lemma 19 (Adaptedness of Induced Stochastic Process).

The process (X^{f,I}_n)_{n∈N} is adapted to (G^loop_n)_{n∈N}, i.e., X^{f,I}_n is G^loop_n–measurable for every n ∈ N.
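The case split of Definition 17 can be sketched concretely. The guard, the postexpectation f, the invariant I, and the run below are our own hypothetical choices, not from the paper:

```python
import math

# Sketch of X_n^{f,I} from Definition 17 for a toy loop with guard "x > 0"
# (f, I, the guard, and the run are our own hypothetical choices): return f
# at the terminal state if the loop terminated within n iterations, and the
# invariant I at the (n+2)-nd state otherwise.
def X(n, run_prefix, f, I, guard):
    assert len(run_prefix) >= n + 2    # X_n needs the first n+2 states
    T = next((k for k, s in enumerate(run_prefix) if not guard(s)), math.inf)
    if T <= n:
        return f(run_prefix[T])        # terminated: postexpectation f
    return I(run_prefix[n + 1])        # still running: approximate by I

guard = lambda x: x > 0
f = lambda x: x + 10                   # toy postexpectation
I = lambda x: 2 * x                    # toy invariant
run = [3, 2, 1, 0, 0, 0]               # looping time 3
print(X(1, run, f, I, guard))          # T = 3 > 1, so I(run[2]) = 2
print(X(4, run, f, I, guard))          # T = 3 <= 4, so f(run[3]) = 10
```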

The loop space, the loop measure, and the induced stochastic process (X^{f,I}_n)_{n∈N} are not defined by induction on the number of steps performed in the program. The loop space, for instance, contains all infinite sequences of states, whether they are admissible or not. Only the loop measure then filters out the inadmissible runs and gives them probability 0.

Reasoning by invariants and characteristic functions, on the other hand, is inductive. We will thus relate iterating the characteristic function Φ_f on I to the stochastic process (X^{f,I}_n)_{n∈N}. For this, let Φ_f again be the characteristic function of our loop with respect to the postexpectation f, i.e.,

 Φ_f(X) = [¬φ] ⋅ f + [φ] ⋅ wp⟦C⟧(X) ,

where C denotes the loop body.
We now develop a first connection between the stochastic process (X^{f,I}_n)_{n∈N} and the characteristic function Φ_f, which involves the notion of conditional expected values with respect to a σ–field, for which we provide some preliminaries here. In general, for A ⊆ Ω, the indicator function 1_A maps ω to 1 if ω ∈ A and to 0 otherwise. 1_A is F–measurable iff A ∈ F. If X is a random variable on (Ω, F, P) and G ⊆ F is itself a σ–field with respect to Ω, then the conditional expected value E(X ∣ G) is a G–measurable mapping such that for every A ∈ G the equality E(1_A ⋅ E(X ∣ G)) = E(1_A ⋅ X) holds, i.e., restricted to the set A, the conditional expected value and X have the same expected value. Hence, E(X ∣ G) is a random variable that is like X, but for elements that are indistinguishable in the subfield G, it “distributes the value of X equally”.
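The defining property of E(X ∣ G) is easiest to see on a finite probability space. The following sketch is our own toy illustration: when G is generated by a partition into atoms, E(X ∣ G) is constant on each atom and equals the P-weighted average of X over that atom.

```python
from fractions import Fraction

# Conditional expected value on a finite probability space (our own toy
# illustration): when G is generated by a partition into atoms, E(X | G) is
# constant on each atom, and on an atom it equals the P-weighted average of
# X over that atom.
def cond_exp(P, X, atoms):
    """Return E(X | G) as a dict from outcomes to values."""
    result = {}
    for A in atoms:
        mass = sum(P[w] for w in A)
        avg = sum(P[w] * X[w] for w in A) / mass
        for w in A:
            result[w] = avg            # constant on the atom: G-measurable
    return result

outcomes = ["aa", "ab", "ba", "bb"]
P = {w: Fraction(1, 4) for w in outcomes}   # uniform measure
X = {"aa": 0, "ab": 2, "ba": 1, "bb": 3}
atoms = [{"aa", "ab"}, {"ba", "bb"}]        # G sees only the first letter
E = cond_exp(P, X, atoms)
print(E["aa"], E["ba"])  # 1 2
```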

###### Theorem 20 (Relating X^{f,I} and Φ_f).

For any s ∈ Σ and any n ∈ N, we have

 ˢE( X^{f,I}_{n+1} ∣ G^loop_n ) = X^{f,Φ_f(I)}_n .

Note that both sides are mappings of type Ω^loop → R̄≥0.

Intuitively, Theorem 20 expresses the following: Consider some cylinder Cyl(π) generating G^loop_n, i.e., π is a sequence of states of length n+2. Then, independent of the initial state s of the loop, the average value that X^{f,I}_{n+1} takes on this cylinder with respect to the measure ˢP coincides with the average value of X^{f,Φ_f(I)}_n on that cylinder.

Using Theorem 20, one can now explain in which way iterating Φ_f on I represents an expected value, thus revealing the inductive structure inside the induced stochastic process:

###### Corollary 21 (Relating Expected Values of X^{f,I} and Iterations of Φ_f).

For any s ∈ Σ and any n ∈ N, we have

 ˢE( X^{f,I}_n ) = Φ^{n+1}_f(I)(s) .

Intuitively, Φ^{n+1}_f(I)(s) represents allowing for at most n+1 evaluations of the loop guard. For any state s, the number Φ^{n+1}_f(I)(s) is composed of

• f’s average value on the final states of those runs starting in s that terminate within n+1 guard evaluations, and

• I’s average value on the (n+2)–nd states of those runs starting in s that do not terminate within n+1 guard evaluations.
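Corollary 21 can be checked exactly on a small example. The loop, postexpectation, and invariant below are our own toy choices, not the paper's: for `while (c = 1) { c := 0 [1/2] c := 1 }` we enumerate all run prefixes of length n+2 (terminated states are absorbing) and compare the expected value of the process against n+1 iterations of Φ_f applied to I.

```python
from fractions import Fraction
from itertools import product

# Exact check of Corollary 21 on a toy loop (our own example):
#   while (c = 1) { c := 0 [1/2] c := 1 } ,
# where terminated states (c = 0) are absorbing under the loop measure.
half = Fraction(1, 2)

def step_prob(c, c_next):
    if c == 0:                          # guard violated: stay put
        return Fraction(c_next == 0)
    return half                         # fair coin between c' = 0 and c' = 1

def Phi(f, X):
    """Characteristic function: f on terminal states, one step otherwise."""
    return lambda c: f(c) if c == 0 else half * X(0) + half * X(1)

f = lambda c: Fraction(5)                               # toy postexpectation
I = lambda c: Fraction(7) if c == 1 else Fraction(5)    # toy invariant

def E_X_n(s, n):
    """Expected value of X_n^{f,I} from state s, by enumerating prefixes."""
    total = Fraction(0)
    for tail in product([0, 1], repeat=n + 1):
        run = (s,) + tail               # the first n+2 states of a run
        p = Fraction(1)
        for a, b in zip(run, run[1:]):
            p *= step_prob(a, b)
        if p == 0:
            continue                    # inadmissible prefix
        T = next((k for k, c in enumerate(run) if c == 0), None)
        value = f(run[T]) if T is not None and T <= n else I(run[n + 1])
        total += p * value
    return total

n, s = 3, 1
lhs = E_X_n(s, n)
rhs = I
for _ in range(n + 1):                  # apply Phi_f exactly n+1 times
    rhs = Phi(f, rhs)
print(lhs == rhs(s))  # True
```

Both sides evaluate to 41/8 here: with probability 7/8 the loop terminates within 4 guard evaluations (contributing f = 5), and with probability 1/8 it does not, contributing the average of I over the (n+2)-nd state.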

We now want to take n to the limit to consider all possible numbers of iterations of the loop body. We will see that this corresponds to evaluating the stochastic process at the time when our loop terminates, i.e., at the looping time T^¬φ:

###### Definition 22 (Canonical Stopped Process).

The mapping

 X^{f,I}_{T^¬φ} : Ω^loop → R̄≥0,  ϑ ↦ {