Probabilistic Programs. Probabilistic programs are classical imperative programs extended with random value generators that produce random values according to some desired probability distribution [34, 36, 16]. They provide an appropriate model for a wide variety of applications, such as analysis of stochastic network protocols [2, 22], robot planning, etc. General probabilistic programs induce infinite-state Markov processes with complex behaviours, so formal analysis is needed in critical situations. The formal analysis of probabilistic programs is an active research topic across different disciplines, such as probability theory and statistics [21, 31, 29], formal methods [2, 22, 18], and programming languages [6, 13, 32, 12, 9].
Termination Problems. In this paper, we focus on proving termination properties of probabilistic programs. Termination is the most basic and fundamental notion of liveness for programs. For non-probabilistic programs, the proof of termination coincides with the construction of ranking functions, and many different approaches exist for such construction [4, 11, 30, 33]. For probabilistic programs the most natural and basic extensions of the termination problem are almost-sure termination and finite termination. First, the almost-sure termination problem asks whether the program terminates with probability 1. Second, the finite termination problem asks whether the expected termination time is finite. Finite termination implies almost-sure termination, while the converse is not true in general. Here we focus on the almost-sure termination problem.
Previous Results. Below we describe the most relevant previous results on termination of probabilistic programs.
to obtain a sound (but not complete) approach for almost-sure termination over infinite-state probabilistic programs with infinite-domain random variables, but without non-determinism. For countable state space probabilistic programs without non-determinism, the Lyapunov ranking functions provide a sound and complete method to prove finite termination [3, 15].
infinite probabilistic choices with non-determinism. In the presence of non-determinism, the Lyapunov-ranking-function method as well as the ranking-supermartingale method are sound but not complete. Different approaches based on martingales and proof rules have been studied for finite termination [13, 20]. The synthesis of linear and polynomial ranking supermartingales has been established [9, 8]. Approaches for high-probability termination and non-termination have also been considered. Recently, supermartingales and lexicographic ranking supermartingales have been considered for proving almost-sure termination of probabilistic programs [26, 1].
Note that the problem of deciding termination of probabilistic programs is undecidable, and its precise undecidability characterization has been investigated. Finite termination of recursive probabilistic programs has also been studied through proof rules.
Our Contributions. Now we formally describe our contributions. We consider probabilistic programs where all program variables are integer-valued. Our main contributions are threefold.
Almost-Sure Termination: Supermartingale-Based Approach. We show new results that supermartingales (i.e., not necessarily ranking supermartingales) with lower bounds on conditional absolute difference present a sound approach for proving almost-sure termination of probabilistic programs. Moreover, no previous supermartingale-based approach presents explicit (optimal) bounds on tail probabilities of non-termination within a given number of steps.
Almost-Sure Termination: CLT-Based Approach. We present a new approach based on the Central Limit Theorem (CLT) that is sound for establishing almost-sure termination. The extra power of the CLT allows one to prove almost-sure termination of probabilistic programs where no global lower bound exists for values of program variables, while previous approaches based on (ranking) supermartingales [13, 9, 8, 26, 1] require such a lower bound. For example, when we consider the program and take the sampling variable to observe the probability distribution such that for all integers , then the value of could not be bounded from below during program execution; previous approaches fail on this example, while our CLT-based approach succeeds.
Algorithmic Methods. We discuss algorithmic methods for the two approaches we present, showing that we not only present general approaches for almost-sure termination, but possible automated analysis techniques as well.
Recent Related Work. In a recent work, supermartingales are also considered for proving almost-sure termination. The differences between our results and that paper are as follows. First, while the paper relaxes our conditions to obtain a more general result on almost-sure termination, our supermartingale-based approach can derive optimal tail bounds along with proving almost-sure termination. Second, our CLT-based approach can handle programs without a lower bound on values of program variables, while the result in the paper requires a lower bound. We also note that our supermartingale-based results are independent of the paper (see arXiv versions  and [7, Theorem 5 and Theorem 6]). A more elaborate description of related work is given in Section 7.
Below we first introduce some basic notations and concepts in probability theory (see e.g. the standard textbook  for details), then present the syntax and semantics of our probabilistic programs.
2.1 Basic Notations and Concepts
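Throughout the paper, we use $\mathbb{N}$, $\mathbb{N}_0$, $\mathbb{Z}$, and $\mathbb{R}$ to denote the sets of all positive integers, non-negative integers, integers, and real numbers, respectively.
Probability Space. A probability space is a triple $(\Omega,\mathcal{F},\mathbb{P})$, where $\Omega$ is a non-empty set (so-called sample space), $\mathcal{F}$ is a $\sigma$-algebra over $\Omega$ (i.e., a collection of subsets of $\Omega$ that contains the empty set $\emptyset$ and is closed under complementation and countable union) and $\mathbb{P}$ is a probability measure on $\mathcal{F}$, i.e., a function $\mathbb{P}:\mathcal{F}\rightarrow[0,1]$ such that (i) $\mathbb{P}(\Omega)=1$ and (ii) for all set-sequences $A_1,A_2,\dots\in\mathcal{F}$ that are pairwise-disjoint (i.e., $A_i\cap A_j=\emptyset$ whenever $i\neq j$) it holds that $\sum_{i=1}^{\infty}\mathbb{P}(A_i)=\mathbb{P}\left(\bigcup_{i=1}^{\infty}A_i\right)$. Elements of $\mathcal{F}$ are usually called events. We say an event $A\in\mathcal{F}$ holds almost-surely (a.s.) if $\mathbb{P}(A)=1$.
Random Variables. [35, Chapter 1] A random variable $X$ from a probability space $(\Omega,\mathcal{F},\mathbb{P})$ is an $\mathcal{F}$-measurable function $X:\Omega\rightarrow\mathbb{R}\cup\{-\infty,+\infty\}$, i.e., a function satisfying the condition that for all $d\in\mathbb{R}\cup\{-\infty,+\infty\}$, the set $\{\omega\in\Omega\mid X(\omega)<d\}$ belongs to $\mathcal{F}$; $X$ is bounded if there exists a real number $M>0$ such that for all $\omega\in\Omega$, we have $X(\omega)\in\mathbb{R}$ and $|X(\omega)|\leq M$. By convention, we abbreviate $+\infty$ as $\infty$.
Expectation. The expected value of a random variable $X$ from a probability space $(\Omega,\mathcal{F},\mathbb{P})$, denoted by $\mathbb{E}(X)$, is defined as the Lebesgue integral of $X$ w.r.t. $\mathbb{P}$, i.e., $\int X\,\mathrm{d}\mathbb{P}$; the precise definition of the Lebesgue integral is somewhat technical and is omitted here (cf. [35, Chapter 5] for a formal definition). In the case that the range of $X$, $\mathrm{ran}\,X=\{d_0,d_1,\dots\}$, is countable with distinct $d_k$'s, we have $\mathbb{E}(X)=\sum_{k}d_k\cdot\mathbb{P}(X=d_k)$.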
Characteristic Random Variables. Given random variables from a probability space and a predicate over , we denote by the random variable such that if holds, and otherwise.
By definition, . Note that if does not involve any random variable, then can be deemed as a constant whose value depends only on whether holds or not.
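Filtrations and Stopping Times. A filtration of a probability space $(\Omega,\mathcal{F},\mathbb{P})$ is an infinite sequence $\{\mathcal{F}_n\}_{n\in\mathbb{N}_0}$ of $\sigma$-algebras over $\Omega$ such that $\mathcal{F}_n\subseteq\mathcal{F}_{n+1}\subseteq\mathcal{F}$ for all $n\in\mathbb{N}_0$. A stopping time (from $(\Omega,\mathcal{F},\mathbb{P})$) w.r.t. $\{\mathcal{F}_n\}_{n\in\mathbb{N}_0}$ is a random variable $R:\Omega\rightarrow\mathbb{N}_0\cup\{\infty\}$ such that for every $n\in\mathbb{N}_0$, the event $R\leq n$ belongs to $\mathcal{F}_n$.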
Conditional Expectation. Let be any random variable from a probability space such that . Then given any -algebra , there exists a random variable (from ), conventionally denoted by , such that
is -measurable, and
for all , we have .
The random variable is called the conditional expectation of given . The random variable is a.s. unique in the sense that if is another random variable satisfying (E1)–(E3), then .
Discrete-Time Stochastic Processes. A discrete-time stochastic process is a sequence of random variables where ’s are all from some probability space (say, ); and is adapted to a filtration of sub--algebras of if for all , is -measurable.
Difference-Boundedness. A discrete-time stochastic process $\{X_n\}_{n\in\mathbb{N}_0}$ is difference-bounded if there is $c\in(0,\infty)$ such that for all $n\in\mathbb{N}_0$, $|X_{n+1}-X_n|\leq c$ a.s.
Stopping Time . Given a discrete-time stochastic process adapted to a filtration , we define the random variable by where . By definition, is a stopping time w.r.t .
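Martingales. A discrete-time stochastic process $\{X_n\}_{n\in\mathbb{N}_0}$ adapted to a filtration $\{\mathcal{F}_n\}_{n\in\mathbb{N}_0}$ is a martingale (resp. supermartingale) if for every $n\in\mathbb{N}_0$, $\mathbb{E}(|X_n|)<\infty$ and it holds a.s. that $\mathbb{E}(X_{n+1}\mid\mathcal{F}_n)=X_n$ (resp. $\mathbb{E}(X_{n+1}\mid\mathcal{F}_n)\leq X_n$). We refer to [35, Chapter 10] for more details.
Discrete Probability Distributions over Countable Support. A discrete probability distribution over a countable set $U$ is a function $q:U\rightarrow[0,1]$ such that $\sum_{z\in U}q(z)=1$. The support of $q$ is defined as $\mathrm{supp}(q):=\{z\in U\mid q(z)>0\}$.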
2.2 The Syntax and Semantics for Probabilistic Programs
In the sequel, we fix two countable sets, the set of program variables and the set of sampling variables. W.l.o.g, these two sets are disjoint. Informally, program variables are the variables that are directly related to the control-flow and the data-flow of a program, while sampling variables reflect randomized inputs to programs. In this paper, we consider integer-valued variables, i.e., every program variable holds an integer upon instantiation, while every sampling variable is bound to a discrete probability distribution over integers. Possible extensions to real-valued variables are discussed in Section 5.
The Syntax. The syntax of probabilistic programs is illustrated by the grammar in Figure 1. Below we explain the grammar.
Variables. Expressions (resp. ) range over program (resp. sampling) variables.
Arithmetic Expressions. Expressions (resp. ) range over arithmetic expressions over both program and sampling variables (resp. program variables). Since this is a theoretical paper, we do not fix the detailed syntax for and .
Boolean Expressions. Expressions range over propositional arithmetic predicates over program variables.
Programs. A program from could be either an assignment statement indicated by ‘’, or ‘skip’ which is the statement that does nothing, or a conditional branch indicated by the keyword ‘if’, or a while-loop indicated by the keyword ‘while’, or a sequential composition of statements connected by semicolon.
The syntax of our programming language is quite general and covers major features of probabilistic programming. For example, compared with a popular probabilistic-programming language from , the only difference between our syntax and theirs is that they have extra observe statements.∎
Single (Probabilistic) While Loops. In order to develop approaches for proving almost-sure termination of probabilistic programs, we first analyze the almost-sure termination of programs with a single while loop. Then, we demonstrate that the almost-sure termination of general probabilistic programs without nested loops can be obtained by the almost-sure termination of all components which are single while loops and loop-free statements (see Section 5). Formally, a single while loop is a program of the following form:
where is the loop guard from and is a loop-free program with possibly assignment statements, conditional branches, sequential composition but without while loops. Given a single while loop, we assign the program counter to the entry point of the while loop and the program counter to the terminating point of the loop. Below we give an example of a single while loop.
Consider the following single while loop:
where is a program variable and is a sampling variable that observes certain fixed distributions (e.g., a two-point distribution such that ). Informally, the program performs a random increment/decrement on until its value is no greater than zero.
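The informal behaviour of this loop can be explored with a short simulation. The following is a minimal sketch, not part of the formal development: it assumes the loop has the shape `while x >= 1: x = x + r` (the displayed program is not reproduced here, and the guard is inferred from "until its value is no greater than zero"), and the names `run_loop`, `sample_r`, and `max_iters` are hypothetical.

```python
import random


def run_loop(x0, sample_r, max_iters=10_000):
    """Simulate `while x >= 1: x = x + r` from the initial value x0.

    `sample_r` is a zero-argument callable returning one sample of r.
    Returns the number of loop iterations executed, or None if the loop
    is still running after `max_iters` iterations (an arbitrary cut-off,
    since a simulation cannot observe non-termination).
    """
    x = x0
    iters = 0
    while x >= 1:              # assumed loop guard
        if iters == max_iters:
            return None        # gave up: no conclusion about termination
        x += sample_r()        # random increment/decrement of x
        iters += 1
    return iters


if __name__ == "__main__":
    rng = random.Random(0)     # fixed seed only for reproducibility
    fair_step = lambda: rng.choice((-1, 1))   # assumed two-point distribution
    runs = [run_loop(10, fair_step) for _ in range(20)]
    print(runs)
```

A return value of `None` only says that the cut-off was reached; it is not evidence of non-termination.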
The Semantics. Since our approaches for proving almost-sure termination work basically for single while loops (in Section 5 we extend to probabilistic programs without nested loops), we present the simplified semantics for single while loops.
We first introduce the notion of valuations which specify current values for program and sampling variables. Below we fix a single while loop in the form (1) and let (resp. ) be the set of program (resp. sampling) variables appearing in . The size of is denoted by , respectively. We impose arbitrary linear orders on both of so that and . We also require that for each sampling variable , a discrete probability distribution is given. Intuitively, at each loop iteration of , the value of is independently sampled w.r.t the distribution.
Valuations. A program valuation
is a (column) vector. Intuitively, a valuation specifies that for each , the value assigned is the -th coordinate of . Likewise, a sampling valuation is a (column) vector . A sampling function is a function assigning to every sampling variable a discrete probability distribution over . The discrete probability distribution over is defined by: .
For each program valuation , we say that satisfies the loop guard , denoted by , if the formula holds when every appearance of a program variable is replaced by its corresponding value in . Moreover, the loop body in encodes a function which transforms the program valuation before the execution of and the independently-sampled values in into the program valuation after the execution of .
Semantics of single while loops.
Now we present the semantics of single while loops. Informally, the semantics is defined by a Markov chain, where the state space is a set of pairs of location and sampled values and the probability transition function will be clarified later. We call states in configurations. A path under the Markov chain is an infinite sequence of configurations. The intuition is that in a path, each (resp. ) is the current program valuation (resp. the current program counter to be executed) right before the -th execution step of . Then given an initial configuration , the probability space for is constructed as the standard one for its Markov chain over paths (for details see [2, Chapter 10]). We shall denote by the probability measure (over the -algebra of subsets of paths) in the probability space for (from some fixed initial program valuation ).
Consider any initial program valuation . The execution of the single while loop from results in a path as follows. Initially, and . Then at each step , the following two operations are performed. First, a sampling valuation is obtained through samplings for all sampling variables, where the value for each sampling variable observes a predefined discrete probability distribution for the variable. Second, we clarify three cases below:
if and , then the program enters the loop and we have , , and thus we simplify the executions of as a single computation step;
if and , then the program enters the terminating program counter and we have , ;
if then the program stays at the program counter and we have , .
Based on the informal description, we now formally define the probability transition function P:
, for any such that ;
for any such that ;
for any ;
for all other cases.
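The three cases of the informal description can be sketched as a one-step successor function. This is an illustrative sketch under stated assumptions: the labels `IN` and `OUT` stand in for the two program counters, the sampling distribution is given as a finite dictionary (the paper allows countable support), and the function name `transition` is hypothetical.

```python
from fractions import Fraction

IN, OUT = "in", "out"   # hypothetical labels for the entry and terminating counters


def transition(config, guard, body, sampling_dist):
    """Return {successor configuration: probability} for one execution step.

    config        : pair (program counter, program valuation)
    guard         : predicate on program valuations (the loop guard)
    body          : function (valuation, sampled value) -> next valuation
    sampling_dist : dict mapping sampled values to probabilities
                    (finite support is an assumption of this sketch)
    """
    pc, v = config
    if pc == OUT:
        # Terminated: the program stays at the terminating counter.
        return {(OUT, v): Fraction(1)}
    if not guard(v):
        # Guard violated: move to the terminating counter, valuation unchanged.
        return {(OUT, v): Fraction(1)}
    # Guard satisfied: the whole loop body is executed as a single step.
    successors = {}
    for u, prob in sampling_dist.items():
        nxt = (IN, body(v, u))
        successors[nxt] = successors.get(nxt, Fraction(0)) + prob
    return successors
```

Using exact fractions keeps the successor probabilities free of rounding, so they can be compared directly against hand calculations.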
We note that the semantics for general probabilistic programs can be defined in the same principle as for single while loops with the help of transition structures or control-flow graphs (see [9, 8]).
Almost-Sure Termination. In the following, we define the notion of almost-sure termination over single while loops. Consider a single while loop . The termination-time random variable is defined such that for any path , the value of at the path is , where . Then is said to be almost-surely terminating (from some prescribed initial program valuation ) if . Besides, we also consider bounds on tail probabilities of non-termination within loop-iterations. Tail bounds are important quantitative aspects that characterize how fast the program terminates.
3 Supermartingale Based Approach
In this section, we present our supermartingale-based approach for proving almost-sure termination of single while loops. We first establish new mathematical results on supermartingales, then we show how to apply these results to obtain a sound approach for proving almost-sure termination.
The following proposition is our first new mathematical result.
Proposition 1 (Difference-bounded Supermartingales)
Consider any difference-bounded supermartingale adapted to a filtration satisfying the following conditions:
is a constant random variable;
for all , it holds for all that (i) and (ii) implies ;
Lower Bound on Conditional Absolute Difference (LBCAD). there exists such that for all , it holds a.s. that implies .
Then and the function .
Informally, the LBCAD condition requires that the stochastic process should have a minimal amount of vibration at each step. The amount is the least amount by which the stochastic process should change its value in the next step (e.g., is not allowed). Then it is intuitively true that if the stochastic process does not increase in expectation (i.e., it is a supermartingale) and satisfies the LBCAD condition, then at some point the stochastic process will drop below zero. The formal proof ideas are as follows.
Key Proof Ideas. The main idea is a thorough analysis of the martingale
for some sufficiently small and its limit through Optional Stopping Theorem (cf. Theorem 0.B.1 in the appendix). We first prove that is indeed a martingale. The difference-boundedness ensures that the martingale is well-defined. Then by letting , we prove that through Optional Stopping Theorem and the LBCAD condition . Third, we prove from basic definitions and the LBCAD condition that
By setting for sufficiently large , one has that
It follows that .∎
Optimality of Proposition 1. We now present two examples to illustrate two aspects of optimality of Proposition 1. First, in Example 2 we show through an application to the classical symmetric random walk that the tail bound of Proposition 1 is optimal. Then in Example 3 we establish that the non-negativity condition required in the second item of Proposition 1 is critical (i.e., the result does not hold without the condition).
Consider the family of independent random variables defined as follows: and each () satisfies that and . Let the stochastic process be inductively defined by: . is difference-bounded since is bounded. For all we have . Choose the filtration such that every is the smallest -algebra that makes measurable. Then models the classical symmetric random walk and implies a.s. Thus, ensures the LBCAD condition. From Proposition 1, we obtain that and . It follows from [5, Theorem 4.1] that . Hence, the tail bound in Proposition 1 is optimal.∎
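The square-root tail behaviour of the symmetric random walk can also be observed numerically. The sketch below is an illustration rather than a proof: it assumes the walk starts at the (hypothetical) initial value 1 and stops on first reaching 0, and it computes the probability of not having stopped after k steps by exact forward propagation of the distribution. The function name `survival_probabilities` is an assumption of this sketch.

```python
import math


def survival_probabilities(x0, max_k):
    """Return [P(T > k) for k = 0..max_k] for a symmetric +/-1 random walk
    started at x0 >= 1, where T is the first step at which the walk is <= 0."""
    alive = {x0: 1.0}          # mass of paths that stayed positive so far
    result = []
    for _ in range(max_k + 1):
        result.append(sum(alive.values()))
        nxt = {}
        for x, p in alive.items():
            for y in (x - 1, x + 1):
                if y > 0:       # paths reaching 0 have stopped and are dropped
                    nxt[y] = nxt.get(y, 0.0) + 0.5 * p
        alive = nxt
    return result


if __name__ == "__main__":
    surv = survival_probabilities(1, 1000)
    for k in (10, 100, 1000):
        # If the tail decays like c / sqrt(k), this product stays roughly constant.
        print(k, surv[k] * math.sqrt(k))
```

When the tail is of order one over the square root of k, the printed products settle near a constant instead of tending to zero, which is the sense in which a bound of that order cannot be improved for this walk.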
In Proposition 1, the condition that is necessary; in other words, it is necessary to have rather than when . This can be observed as follows. Consider the discrete-time stochastic processes and given as follows:
the random variables are independent, is the random variable with constant value and each () satisfies that and ;
Let be the filtration which is the smallest -algebra that makes measurable for every . Then one can show that (adapted to ) satisfies integrability and the LBCAD condition, but . Detailed justifications are available in Appendix 0.B.∎
In the following, we illustrate how one can apply Proposition 1 to prove almost-sure termination of single while loops. Below we fix a single while loop in the form (1). We first introduce the notion of supermartingale maps, which are a special class of functions over configurations that are subject to supermartingale-like constraints.
Definition 1 (Supermartingale Maps)
A (difference-bounded) supermartingale map (for ) is a function satisfying that there exist real numbers such that for all configurations , the following conditions hold:
if then ;
if and , then (i) and (ii) for all ;
if and then
(for difference-boundedness) for all and such that , and for all and such that and .
Thus, is a supermartingale map if conditions (D1)–(D3) hold. Furthermore, is difference-bounded if, in addition, (D4) holds.
Intuitively, the conditions (D1),(D2) together ensure non-negativity for the function . Moreover, the difference between “” in (D1) and “” in (D2) ensures that is positive iff the program still executes in the loop. The condition (D3.1) ensures the supermartingale condition for that the next expected value does not increase, while the condition (D3.2) says that the expected value of the absolute change between the current and the next step is at least , relating to the same amount in the LBCAD condition. Finally, the condition (D4) corresponds to the difference-boundedness in supermartingales in the sense that it requires the change of value both after the loop iteration and right before the termination of the loop should be bounded by the upper bound .
Now we state the main theorem of this section which says that the existence of a difference-bounded supermartingale map implies almost-sure termination.
Theorem 3.1 (Soundness)
If there exists a difference-bounded supermartingale map for , then for any initial valuation we have and .
Key Proof Ideas. Let be any difference-bounded supermartingale map for the single while loop program , be any initial valuation and be the parameters in Definition 1. We define the stochastic process adapted to by where (resp. ) refers to the random variable (resp. the vector of random variables) for the program counter (resp. program valuation) at the th step. Then terminates iff stops. We prove that satisfies the conditions in Proposition 1, so that is almost-surely terminating with the same tail bound.
Theorem 3.1 suggests that to prove almost-sure termination, one only needs to find a difference-bounded supermartingale map.
Informally, Theorem 3.1 can be used to prove almost-sure termination of while loops where there exists a distance function (as a supermartingale map) that measures the distance of the loop to termination, for which the distance does not increase in expectation and is changed by a minimal amount in each loop iteration. The key idea to apply Theorem 3.1 is to construct such a distance function.
Below we illustrate an example.
Consider the single while loop in Example 1 where the distribution for is given as , so that this program can be viewed as an unbiased random walk. The program has infinite expected termination time, so previous approaches based on ranking supermartingales cannot be applied. Below we prove the almost-sure termination of the program. We define the difference-bounded supermartingale map by: and for every . Let . Then for every , we have that
the condition (D1) is valid by the definition of ;
if and , then and for all . Then the condition (D2) is valid;
if and , then and . Thus, we have that the condition (D3) is valid.
The condition (D4) is clear as the difference is less than .
It follows that is a difference-bounded supermartingale map. Then by Theorem 3.1 it holds that the program terminates almost-surely under any initial value with tail probabilities bounded by the reciprocal of the square root of the thresholds. By similar arguments, we can show that the results still hold when we consider that the distribution of
in general has bounded range, non-positive mean value and non-zero variance by letting for some sufficiently large constant .∎
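The two expectation conditions, non-increase in expectation and the lower bound on the expected absolute change, can be checked mechanically on a finite range of valuations. The sketch below is a sanity check under assumptions, not a proof and not the full Definition 1: the candidate function `h(x) = x + 1`, the constant `delta = 1`, and the checked range are illustrative choices, and the non-negativity and difference-boundedness conditions are not examined.

```python
from fractions import Fraction


def check_expectation_conditions(h, dist, values, delta):
    """Check, for every valuation x in `values` (assumed to satisfy the guard),
    that E[h(x + r)] <= h(x) and E[|h(x + r) - h(x)|] >= delta,
    where r is distributed according to the finite dictionary `dist`."""
    for x in values:
        expected_next = sum(p * h(x + r) for r, p in dist.items())
        expected_abs_change = sum(p * abs(h(x + r) - h(x)) for r, p in dist.items())
        if expected_next > h(x) or expected_abs_change < delta:
            return False
    return True


if __name__ == "__main__":
    fair = {1: Fraction(1, 2), -1: Fraction(1, 2)}   # unbiased +/-1 step
    candidate = lambda x: x + 1                       # hypothetical candidate map
    print(check_expectation_conditions(candidate, fair, range(1, 200), 1))
```

A passing check over a finite range only suggests that a candidate is worth proving correct symbolically; for a linear candidate and a fixed step distribution the symbolic argument is usually immediate.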
Now we extend Proposition 1 to general supermartingales. The extension lifts the difference-boundedness condition but yields a weaker tail bound.
Proposition 2 (General Supermartingales)
Consider any supermartingale adapted to a filtration satisfying the following conditions:
is a constant random variable;
for all , it holds for all that (i) and (ii) implies ;
(LBCAD). there exists such that for all , it holds a.s. that implies .
Then and the function .
Key Proof Ideas. The key idea is to extend the proof of Proposition 1 with the stopping times ’s () defined by . For any , we first define a new stochastic process by for all . Then we define the discrete-time stochastic process by
for some appropriate positive real number . We prove that is still a martingale. Then from Optional Stopping Theorem, by letting , we also have . Thus, we can also obtain similarly that
For and , we obtain . Hence, . By Optional Stopping Theorem, we have . Furthermore, we have by Markov’s Inequality that . Thus, for sufficiently large with , we can deduce that .∎
Similar to Theorem 3.1, we can establish a soundness result for general supermartingales. The result simply says that the existence of a (not necessarily difference-bounded) supermartingale map implies almost-sure termination and a weaker tail bound .
The following example illustrates the application of Proposition 2 on a single while loop with unbounded difference.
Consider the following single while loop program
where the distribution for is given as . The supermartingale map is defined as the one in Example 4. In this program, is not difference-bounded as is not bounded. Thus, satisfies the conditions except (D4) in Definition 1. We now construct a stochastic process which meets the requirements of Proposition 2. It follows that the program terminates almost-surely under any initial value with tail probabilities bounded by . In general, if observes a distribution with bounded range , non-positive mean and non-zero variance, then we can still prove the same result as follows. We choose a sufficiently large constant so that the function with is still a supermartingale map since the non-negativity of for all . ∎
4 Central Limit Theorem Based Approach
We have seen in the previous section a supermartingale-based approach for proving almost-sure termination. However, by Example 3, an inherent restriction is that the supermartingale should be non-negative. In this section, we propose a new approach through the Central Limit Theorem that can drop this requirement but additionally requires an independence condition.
We first state the well-known Central Limit Theorem [35, Chapter 18].
Theorem 4.1 (Lindeberg-Lévy’s Central Limit Theorem)
Suppose is a sequence of independent and identically distributed random variables with and is finite. Then as approaches infinity, the random variables converge in distribution to a normal . In the case , we have for every real number
where $\Phi(z)$ is the standard normal cumulative distribution function evaluated at $z$.
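The convergence stated by the Central Limit Theorem can be illustrated for the simplest case of independent fair ±1 steps (mean 0, variance 1). The following sketch compares the exact probability that the normalized sum is at most `z` with the standard normal cumulative distribution function; the function names are hypothetical and the choice of step distribution is an assumption made only for this illustration.

```python
import math


def std_normal_cdf(z):
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def exact_prob_normalized_sum_le(n, z):
    """P(S_n / sqrt(n) <= z), where S_n is the sum of n independent fair +/-1 steps.

    S_n = 2*B - n with B ~ Binomial(n, 1/2), so the event is
    B <= (n + z*sqrt(n)) / 2.
    """
    b_max = math.floor((n + z * math.sqrt(n)) / 2)
    if b_max < 0:
        return 0.0
    b_max = min(b_max, n)
    return sum(math.comb(n, b) for b in range(b_max + 1)) / 2 ** n


if __name__ == "__main__":
    for n in (25, 100, 400, 1600):
        print(n, exact_prob_normalized_sum_le(n, -1.0), std_normal_cdf(-1.0))
```

The gap between the two columns shrinks as `n` grows, although the lattice structure of the sum makes the convergence slightly irregular.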
The following lemma is key to our approach, proved by Central Limit Theorem.
Let be a sequence of independent and identically distributed random variables with expected value and finite variance for every . For every , let be a discrete-time stochastic process, where and for . Then there exists a constant such that for any , we have .
According to the Central Limit Theorem (Theorem 4.1),
holds for every real number . Note that
Choose . Then we have when . Now we fix a proper , and get from the limit form equation such that for all we have
Since implies , we obtain that for every .
Incremental Single While Loops. Due to the independence condition required by Central Limit Theorem, we need to consider special classes of single while loops. We say that a single while loop in the form (1) is incremental if is a sequential composition of assignment statements of the form where is a program variable, ’s are sampling variables and ’s are constant coefficients for sampling variables. We then consider incremental single while loops. For incremental single while loops, the function for the loop body is incremental, i.e., for some constant matrix .
By Example 3, previous approaches cannot handle incremental single while loops with unbounded range of sampling variables (so that a supermartingale with a lower bound on its values may not exist). On the other hand, any additional syntax such as conditional branches or assignment statements like will result in an increment over certain program variables that is dependent on the previous executions of the program, breaking the independence condition.
To prove almost-sure termination of incremental single while loops through Central Limit Theorem, we introduce the notion of linear progress functions. Below we fix an incremental single while loop in the form (1).
Definition 2 (Linear Progress Functions)
A linear progress function for is a function satisfying the following conditions:
there exists and such that for all program valuations ;
for all program valuations , if then ;
and , where
(resp. ) is the mean (resp. variance) of the distribution , for .
Intuitively, the condition (L1) says that the function should be linear; the condition (L2) specifies that if the value of is non-positive, then the program terminates; the condition (L3) enforces that the mean of should be non-positive, while its variance should be non-zero. The main theorem of this section is then as follows.
Theorem 4.2 (Soundness)
For any incremental single while loop program , if there exists a linear progress function for , then for any initial valuation we have .
Let be a linear progress function for . We define the stochastic process by , where is the vector of random variables that represents the program valuation at the th execution step of . Define . We have for . Thus, is a sequence of independent and identically distributed random variables. We have and by the independence of ’s and the condition (L3) in Definition 2. Now we can apply Lemma 1 and obtain that there exists a constant such that for any initial program valuation , we have . By the recurrence property of Markov chains, we have that is almost-surely stopping. Notice that from (L2), implies and (in the next step) termination of the single while loop. Hence, we have that is almost-surely terminating under any initial program valuation .∎
Theorem 4.2 can be applied to prove almost-sure termination of while loops whose increments are independent, but the value change in one iteration is not bounded. Thus, Theorem 4.2 can handle programs which Theorem 3.1 and Proposition 2 as well as previous supermartingale-based methods cannot.
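For a concrete incremental loop, the mean and variance requirement on a linear progress function reduces to simple arithmetic. The sketch below works under explicit assumptions that go beyond the text: the loop body acts as v ↦ v + A·u for a constant matrix `A`, the candidate function is v ↦ a·v + b, the sampling variables are mutually independent, and the condition is read as "the per-iteration increment of the function has non-positive mean and non-zero variance". All names are hypothetical.

```python
def increment_mean_and_variance(a, A, means, variances):
    """Mean and variance of a^T A u, the per-iteration change of v -> a.v + b,
    for independent sampling variables with the given means and variances."""
    n_prog, n_samp = len(a), len(means)
    # Combined coefficient of each sampling variable in the increment.
    coeffs = [sum(a[i] * A[i][j] for i in range(n_prog)) for j in range(n_samp)]
    mean = sum(c * m for c, m in zip(coeffs, means))
    variance = sum(c * c * s for c, s in zip(coeffs, variances))
    return mean, variance


def has_nonpositive_mean_and_nonzero_variance(a, A, means, variances):
    """True iff the increment has mean <= 0 and variance > 0."""
    mean, variance = increment_mean_and_variance(a, A, means, variances)
    return mean <= 0 and variance > 0


if __name__ == "__main__":
    # One program variable x, one sampling variable r, body x := x + r.
    print(has_nonpositive_mean_and_nonzero_variance([1], [[1]], [0], [1]))
```

The constant offset `b` does not appear because it cancels in the increment.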
In the following, we present several examples, showing that Theorem 4.2 can handle sampling variables with unbounded range which previous approaches cannot handle.
Consider the program in Example 1 where we let
be a two-sided geometric distribution sampling variable such that and for some . First note that by the approach in , we can prove that this program has infinite expected termination time, and thus previous ranking-supermartingale-based approaches cannot be applied. Also note that the value that may take has no lower bound. This means that we can hardly obtain almost-sure termination by finding a proper supermartingale map that satisfies both the non-negativity condition and the non-increasing condition. Now we apply Theorem 4.2. Choose . It follows directly that both (L1) and (L2) hold. Since by symmetry and where is the standard geometric distribution with parameter , we have that (L3) holds. Thus, is a legal linear progress function and this program is almost-surely terminating by Theorem 4.2.∎
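The mean and variance used in this example can be checked numerically. The sketch below assumes one common parameterization of the two-sided geometric distribution, namely probability p(1−p)^(k−1)/2 for each of +k and −k with k ≥ 1; the exact formula in the example is not reproduced here, so this parameterization, the function names, and the truncation `cutoff` are assumptions of the sketch.

```python
def two_sided_geometric_pmf(k, p):
    """Assumed pmf: P(r = k) = P(r = -k) = p * (1 - p)**(k - 1) / 2 for k >= 1."""
    if k == 0:
        return 0.0
    return 0.5 * p * (1.0 - p) ** (abs(k) - 1)


def truncated_moments(p, cutoff=200):
    """Total mass, mean and variance of the pmf truncated to [-cutoff, cutoff]."""
    ks = [k for k in range(-cutoff, cutoff + 1) if k != 0]
    total = sum(two_sided_geometric_pmf(k, p) for k in ks)
    mean = sum(k * two_sided_geometric_pmf(k, p) for k in ks)
    second_moment = sum(k * k * two_sided_geometric_pmf(k, p) for k in ks)
    return total, mean, second_moment - mean ** 2


if __name__ == "__main__":
    print(truncated_moments(0.5))
```

Under this parameterization the mean vanishes by symmetry and the variance equals the second moment of a geometric variable on {1, 2, ...}, which is (2 − p)/p², so the variance is strictly positive even though the range of the sampling variable is unbounded in both directions.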
Consider the following program with a more complex loop guard.
This program terminates when the point on the plane leaves the area above the parabola by a two-dimensional random walk. We suppose that are both positive and