1 Introduction
Probabilistic Programs. Probabilistic programs are classical imperative programs extended with random value generators that produce random values according to some desired probability distribution [34, 36, 16]. They provide an appropriate model for a wide variety of applications, such as the analysis of stochastic network protocols [2, 22], robot planning [17], etc. General probabilistic programs induce infinite-state Markov processes with complex behaviours, so that formal analysis is needed in critical situations. The formal analysis of probabilistic programs is an active research topic across several disciplines, such as probability theory and statistics [21, 31, 29], formal methods [2, 22, 18], and programming languages [6, 13, 32, 12, 9].
Termination Problems. In this paper, we focus on proving termination properties of probabilistic programs. Termination is the most basic and fundamental liveness property of programs. For non-probabilistic programs, a proof of termination coincides with the construction of a ranking function [14], and many different approaches exist for such constructions [4, 11, 30, 33]. For probabilistic programs, the most natural and basic extensions of the termination problem are almost-sure termination and finite termination. First, the almost-sure termination problem asks whether the program terminates with probability 1. Second, the finite termination problem asks whether the expected termination time is finite. Finite termination implies almost-sure termination, while the converse does not hold in general. Here we focus on the almost-sure termination problem.
Previous Results. Below we describe the most relevant previous results on termination of probabilistic programs.

Infinite probabilistic choices without nondeterminism. The approach in [23, 24] was extended in [6] to ranking supermartingales, yielding a sound (but not complete) approach for almost-sure termination of infinite-state probabilistic programs with infinite-domain random variables, but without nondeterminism. For probabilistic programs with countable state space and without nondeterminism, Lyapunov ranking functions provide a sound and complete method for proving finite termination [3, 15].

Infinite probabilistic choices with nondeterminism. In the presence of nondeterminism, the Lyapunov-ranking-function method as well as the ranking-supermartingale method are sound but not complete [13]. Different approaches based on martingales and proof rules have been studied for finite termination [13, 20]. The synthesis of linear and polynomial ranking supermartingales has been established [9, 8]. Approaches for high-probability termination and non-termination have also been considered [10]. Recently, supermartingales and lexicographic ranking supermartingales have been considered for proving almost-sure termination of probabilistic programs [26, 1].
Note that the problem of deciding termination of probabilistic programs is undecidable [19], and its precise undecidability characterization has been investigated. Finite termination of recursive probabilistic programs has also been studied through proof rules [28].
Our Contributions. Now we formally describe our contributions. We consider probabilistic programs where all program variables are integer-valued. Our main contributions are threefold.

Almost-Sure Termination: Supermartingale-Based Approach. We show that supermartingales (i.e., not necessarily ranking supermartingales) with lower bounds on conditional absolute difference present a sound approach for proving almost-sure termination of probabilistic programs. Moreover, our approach presents explicit (optimal) bounds on tail probabilities of non-termination within a given number of steps, which no previous supermartingale-based approach provides.

Almost-Sure Termination: CLT-Based Approach. We present a new approach based on the Central Limit Theorem (CLT) that is sound for establishing almost-sure termination. The extra power of the CLT allows one to handle probabilistic programs in which no global lower bound exists on the values of program variables, whereas previous approaches based on (ranking) supermartingales [13, 9, 8, 26, 1] require such a bound. For example, if the sampling variable observes a probability distribution whose support is unbounded from below (assigning positive probability to all integers), then the values of the program variables cannot be bounded from below during program execution; previous approaches fail on such examples, while our CLT-based approach succeeds.

Algorithmic Methods. We discuss algorithmic methods for the two approaches we present, showing that we not only present general approaches for almost-sure termination, but also possible automated analysis techniques.
Recent Related Work. In the recent work [26], supermartingales are also considered for proving almost-sure termination. The differences between our results and that work are as follows. First, while [26] relaxes our conditions to obtain a more general result on almost-sure termination, our supermartingale-based approach can derive optimal tail bounds along with proving almost-sure termination. Second, our CLT-based approach can handle programs without a lower bound on the values of program variables, while the result in [26] requires such a lower bound. We also note that our supermartingale-based results are independent of [26] (see the arXiv versions [25] and [7, Theorem 5 and Theorem 6]). A more elaborate description of related work is given in Section 7.
2 Preliminaries
Below we first introduce some basic notations and concepts in probability theory (see e.g. the standard textbook [35] for details), then present the syntax and semantics of our probabilistic programs.
2.1 Basic Notations and Concepts
Throughout the paper, we use $\mathbb{N}$, $\mathbb{N}_0$, $\mathbb{Z}$, and $\mathbb{R}$ to denote the sets of all positive integers, nonnegative integers, integers, and real numbers, respectively.
Probability Space. A probability space is a triple $(\Omega, \mathcal{F}, \mathbb{P})$, where $\Omega$ is a nonempty set (the so-called sample space), $\mathcal{F}$ is a $\sigma$-algebra over $\Omega$ (i.e., a collection of subsets of $\Omega$ that contains the empty set $\emptyset$ and is closed under complementation and countable union), and $\mathbb{P}$ is a probability measure on $\mathcal{F}$, i.e., a function $\mathbb{P}\colon \mathcal{F} \to [0,1]$ such that (i) $\mathbb{P}(\Omega) = 1$ and (ii) for all set sequences $A_1, A_2, \ldots \in \mathcal{F}$ that are pairwise disjoint (i.e., $A_i \cap A_j = \emptyset$ whenever $i \neq j$) it holds that $\mathbb{P}\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \mathbb{P}(A_i)$. Elements $A$ of $\mathcal{F}$ are usually called events. An event $A$ holds almost surely (a.s.) if $\mathbb{P}(A) = 1$.
Random Variables. [35, Chapter 1] A random variable $X$ from a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ is an $\mathcal{F}$-measurable function $X\colon \Omega \to \mathbb{R} \cup \{+\infty\}$, i.e., a function satisfying the condition that for all $d \in \mathbb{R} \cup \{+\infty\}$, the set $\{\omega \in \Omega \mid X(\omega) < d\}$ belongs to $\mathcal{F}$; $X$ is bounded if there exists a real number $M > 0$ such that for all $\omega \in \Omega$, we have $X(\omega) \ge -M$ and $X(\omega) \le M$. By convention, we abbreviate $+\infty$ as $\infty$.
Expectation. The expected value of a random variable $X$ from a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, denoted by $\mathbb{E}(X)$, is defined as the Lebesgue integral of $X$ w.r.t. $\mathbb{P}$, i.e., $\mathbb{E}(X) := \int X \, \mathrm{d}\mathbb{P}$; the precise definition of the Lebesgue integral is somewhat technical and is omitted here (cf. [35, Chapter 5] for a formal definition). In the case that the range of $X$ is a countable set $\{d_0, d_1, \ldots\}$ with distinct $d_k$'s, we have $\mathbb{E}(X) = \sum_{k} d_k \cdot \mathbb{P}(X = d_k)$.
Characteristic Random Variables. Given random variables $X_0, \ldots, X_n$ from a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and a predicate $\Phi$ over $\mathbb{R} \cup \{+\infty\}$, we denote by $\mathbf{1}_{\Phi(X_0,\ldots,X_n)}$ the random variable such that $\mathbf{1}_{\Phi(X_0,\ldots,X_n)}(\omega) = 1$ if $\Phi\left(X_0(\omega), \ldots, X_n(\omega)\right)$ holds, and $\mathbf{1}_{\Phi(X_0,\ldots,X_n)}(\omega) = 0$ otherwise.
By definition, $\mathbb{E}\left(\mathbf{1}_{\Phi(X_0,\ldots,X_n)}\right) = \mathbb{P}\left(\Phi(X_0,\ldots,X_n)\right)$. Note that if $\Phi$ does not involve any random variable, then $\mathbf{1}_{\Phi}$ can be deemed a constant whose value depends only on whether $\Phi$ holds or not.
Filtrations and Stopping Times. A filtration of a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ is an infinite sequence $\{\mathcal{F}_n\}_{n \ge 0}$ of $\sigma$-algebras over $\Omega$ such that $\mathcal{F}_n \subseteq \mathcal{F}_{n+1} \subseteq \mathcal{F}$ for all $n \ge 0$. A stopping time (from $(\Omega, \mathcal{F}, \mathbb{P})$) w.r.t. $\{\mathcal{F}_n\}_{n \ge 0}$ is a random variable $R\colon \Omega \to \mathbb{N}_0 \cup \{\infty\}$ such that for every $n \ge 0$, the event $\{R \le n\}$ belongs to $\mathcal{F}_n$.
Conditional Expectation. Let $X$ be any random variable from a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ such that $\mathbb{E}(|X|) < \infty$. Then given any $\sigma$-algebra $\mathcal{G} \subseteq \mathcal{F}$, there exists a random variable (from $(\Omega, \mathcal{F}, \mathbb{P})$), conventionally denoted by $\mathbb{E}(X \mid \mathcal{G})$, such that

(E1) $\mathbb{E}(X \mid \mathcal{G})$ is $\mathcal{G}$-measurable, and

(E2) $\mathbb{E}\left(\left|\mathbb{E}(X \mid \mathcal{G})\right|\right) < \infty$, and

(E3) for all $A \in \mathcal{G}$, we have $\int_A \mathbb{E}(X \mid \mathcal{G}) \, \mathrm{d}\mathbb{P} = \int_A X \, \mathrm{d}\mathbb{P}$.
The random variable $\mathbb{E}(X \mid \mathcal{G})$ is called the conditional expectation of $X$ given $\mathcal{G}$. It is a.s. unique in the sense that if $Y$ is another random variable satisfying (E1)–(E3), then $\mathbb{P}\left(Y = \mathbb{E}(X \mid \mathcal{G})\right) = 1$.
Discrete-Time Stochastic Processes. A discrete-time stochastic process is a sequence $\Gamma = \{X_n\}_{n \ge 0}$ of random variables where the $X_n$'s are all from some probability space (say, $(\Omega, \mathcal{F}, \mathbb{P})$); $\Gamma$ is adapted to a filtration $\{\mathcal{F}_n\}_{n \ge 0}$ of sub-$\sigma$-algebras of $\mathcal{F}$ if for all $n \ge 0$, $X_n$ is $\mathcal{F}_n$-measurable.
Difference-Boundedness. A discrete-time stochastic process $\Gamma = \{X_n\}_{n \ge 0}$ is difference-bounded if there is $c > 0$ such that $|X_{n+1} - X_n| \le c$ for all $n \ge 0$ a.s.
Stopping Time $R_\Gamma$. Given a discrete-time stochastic process $\Gamma = \{X_n\}_{n \ge 0}$ adapted to a filtration $\{\mathcal{F}_n\}_{n \ge 0}$, we define the random variable $R_\Gamma$ by $R_\Gamma(\omega) := \min\{n \mid X_n(\omega) \le 0\}$, where $\min \emptyset := \infty$. By definition, $R_\Gamma$ is a stopping time w.r.t. $\{\mathcal{F}_n\}_{n \ge 0}$.
Martingales. A discrete-time stochastic process $\Gamma = \{X_n\}_{n \ge 0}$ adapted to a filtration $\{\mathcal{F}_n\}_{n \ge 0}$ is a martingale (resp. supermartingale) if for every $n \ge 0$, $\mathbb{E}(|X_n|) < \infty$ and it holds a.s. that $\mathbb{E}(X_{n+1} \mid \mathcal{F}_n) = X_n$ (resp. $\mathbb{E}(X_{n+1} \mid \mathcal{F}_n) \le X_n$). We refer to [35, Chapter 10] for more details.
Discrete Probability Distributions over Countable Support. A discrete probability distribution over a countable set $U$ is a function $q\colon U \to [0,1]$ such that $\sum_{z \in U} q(z) = 1$. The support of $q$, denoted $\mathrm{supp}(q)$, is defined as $\mathrm{supp}(q) := \{z \in U \mid q(z) > 0\}$.
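These last two notions are straightforward to compute; the following minimal Python sketch (our own illustration, not from the paper, restricted to a finite support and using our own function names) encodes the support and the expectation formula $\mathbb{E}(X) = \sum_k d_k \cdot \mathbb{P}(X = d_k)$ for the two-point distribution that reappears later in Example 1.

```python
# Sketch: a discrete distribution q over a finite set U as a dict value -> probability.
def support(q):
    """supp(q) = {z in U | q(z) > 0}."""
    return {z for z, p in q.items() if p > 0}

def expectation(q):
    """E(X) = sum over z of z * q(z), for X distributed according to q."""
    assert abs(sum(q.values()) - 1.0) < 1e-9, "q must sum to 1"
    return sum(z * p for z, p in q.items())

# Two-point distribution: P(r = 1) = P(r = -1) = 1/2.
q = {1: 0.5, -1: 0.5}
```

With this `q`, `support(q)` is `{1, -1}` and `expectation(q)` is `0.0`, matching the countable-range formula for expectation given above.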
2.2 The Syntax and Semantics for Probabilistic Programs
In the sequel, we fix two countable sets: the set of program variables and the set of sampling variables. W.l.o.g., these two sets are disjoint. Informally, program variables are the variables that are directly related to the control-flow and the data-flow of a program, while sampling variables reflect randomized inputs to programs. In this paper, we consider integer-valued variables, i.e., every program variable holds an integer upon instantiation, while every sampling variable is bound to a discrete probability distribution over integers. Possible extensions to real-valued variables are discussed in Section 5.
The Syntax. The syntax of probabilistic programs is illustrated by the grammar in Figure 1. Below we explain the grammar.

Variables. Variable expressions range over program and sampling variables, respectively.

Arithmetic Expressions. Arithmetic expressions range over expressions over both program and sampling variables (resp. over program variables only), respectively. As this is a theoretical paper, we do not fix a detailed syntax for these expressions.

Boolean Expressions. Boolean expressions range over propositional arithmetic predicates over program variables.

Programs. A program could be either an assignment statement indicated by ':=', or 'skip', which is the statement that does nothing, or a conditional branch indicated by the keyword 'if', or a while-loop indicated by the keyword 'while', or a sequential composition of statements connected by semicolons.
Remark 1
The syntax of our programming language is quite general and covers the main features of probabilistic programming. For example, compared with a popular probabilistic-programming language from [16], the only difference between our syntax and theirs is that they have extra observe statements.∎
Single (Probabilistic) While Loops. In order to develop approaches for proving almost-sure termination of probabilistic programs, we first analyze almost-sure termination of programs with a single while loop. Then, we demonstrate that almost-sure termination of a general probabilistic program without nested loops can be obtained from almost-sure termination of all its components, which are single while loops and loop-free statements (see Section 5). Formally, a single while loop is a program $Q$ of the following form:
while $G$ do $P$ od    (1)
where $G$ is the loop guard and $P$ is a loop-free program with possibly assignment statements, conditional branches, and sequential composition, but without while loops. Given a single while loop, we assign the program counter $1$ to the entry point of the while loop and the program counter $2$ to the terminating point of the loop. Below we give an example of a single while loop.
Example 1
Consider the following single while loop:
while $x > 0$ do $x := x + r$ od
where $x$ is a program variable and $r$ is a sampling variable that observes some fixed distribution (e.g., a two-point distribution such that $\mathbb{P}(r = 1) = \mathbb{P}(r = -1) = \frac{1}{2}$). Informally, the program performs a random increment/decrement on $x$ until its value is no greater than zero.
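The loop of Example 1 can be simulated directly. The following Python sketch (our own illustration; the function name and the step budget are not part of the paper) makes the semantics concrete; since the program is only almost-surely terminating, any simulation must keep a finite step budget.

```python
import random

def run_example1(x0, sample_r, max_steps=10**6):
    """Simulate `while x > 0 do x := x + r od` from initial value x0.
    Returns the number of loop iterations until termination, or None if
    the step budget is exhausted (possible, since termination is only
    almost-sure, not guaranteed within any fixed bound)."""
    x, steps = x0, 0
    while x > 0:
        if steps >= max_steps:
            return None
        x += sample_r()  # one independent sample of r per iteration
        steps += 1
    return steps

random.seed(0)
two_point = lambda: random.choice([1, -1])  # P(r=1) = P(r=-1) = 1/2
t = run_example1(1, two_point)
```

From `x0 = 1`, each run terminates after at least one iteration whenever the budget suffices; the termination time varies wildly between runs, reflecting the heavy-tailed behaviour discussed later.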
The Semantics. Since our approaches for proving almost-sure termination work chiefly on single while loops (in Section 5 we extend them to probabilistic programs without nested loops), we present a simplified semantics for single while loops.
We first introduce the notion of valuations, which specify current values for program and sampling variables. Below we fix a single while loop $Q$ in the form (1) and let $X$ (resp. $R$) be the set of program (resp. sampling) variables appearing in $Q$, with sizes $|X| = m$ and $|R| = k$, respectively. We impose arbitrary linear orders on both $X$ and $R$ so that $X = \{x_1, \ldots, x_m\}$ and $R = \{r_1, \ldots, r_k\}$. We also require that for each sampling variable $r_j$, a discrete probability distribution over $\mathbb{Z}$ is given. Intuitively, at each loop iteration of $Q$, the value of $r_j$ is independently sampled w.r.t. this distribution.
Valuations. A program valuation is a (column) vector $\mathbf{v} = (v_1, \ldots, v_m) \in \mathbb{Z}^m$. Intuitively, a program valuation specifies that for each $x_i$, the value assigned to $x_i$ is the $i$-th coordinate $v_i$ of $\mathbf{v}$. Likewise, a sampling valuation is a (column) vector $\mathbf{u} = (u_1, \ldots, u_k) \in \mathbb{Z}^k$. A sampling function $\Upsilon$ is a function assigning to every sampling variable $r_j$ a discrete probability distribution $\Upsilon(r_j)$ over $\mathbb{Z}$. The induced discrete probability distribution $\bar{\Upsilon}$ over $\mathbb{Z}^k$ is defined by $\bar{\Upsilon}(\mathbf{u}) := \prod_{j=1}^{k} \Upsilon(r_j)(u_j)$. For each program valuation $\mathbf{v}$, we say that $\mathbf{v}$ satisfies the loop guard $G$, denoted by $\mathbf{v} \models G$, if the formula $G$ holds when every appearance of a program variable $x_i$ is replaced by its corresponding value $v_i$ in $\mathbf{v}$. Moreover, the loop body $P$ in $Q$ encodes a function $F\colon \mathbb{Z}^m \times \mathbb{Z}^k \to \mathbb{Z}^m$ which transforms the program valuation $\mathbf{v}$ before the execution of $P$ and the independently-sampled values $\mathbf{u}$ into the program valuation $F(\mathbf{v}, \mathbf{u})$ after the execution of $P$.
Semantics of single while loops.
Now we present the semantics of single while loops. Informally, the semantics is defined by a Markov chain whose state space is the set of pairs of a program counter and a program valuation; the probability transition function $\mathbf{P}$ will be clarified later. We call such states configurations. A path under the Markov chain is an infinite sequence $\{(\ell_n, \mathbf{v}_n)\}_{n \ge 0}$ of configurations. The intuition is that in a path, each $\mathbf{v}_n$ (resp. $\ell_n$) is the current program valuation (resp. the current program counter to be executed) right before the $n$-th execution step of $Q$. Then, given an initial configuration $(1, \mathbf{v}_0)$, the probability space for $Q$ is constructed as the standard one for its Markov chain over paths (for details see [2, Chapter 10]). We shall denote by $\mathbb{P}_{\mathbf{v}_0}$ the probability measure (over the $\sigma$-algebra of subsets of paths) in the probability space for $Q$ from some fixed initial program valuation $\mathbf{v}_0$.
Consider any initial program valuation $\mathbf{v}_0$. The execution of the single while loop $Q$ from $\mathbf{v}_0$ results in a path $\{(\ell_n, \mathbf{v}_n)\}_{n \ge 0}$ as follows. Initially, $\ell_0 = 1$. Then at each step $n$, the following two operations are performed. First, a sampling valuation $\mathbf{u}_n$ is obtained through samplings for all sampling variables, where the value for each sampling variable observes the predefined discrete probability distribution for that variable. Second, we clarify three cases below:

if $\ell_n = 1$ and $\mathbf{v}_n \models G$, then the program enters the loop body and we have $\ell_{n+1} = 1$ and $\mathbf{v}_{n+1} = F(\mathbf{v}_n, \mathbf{u}_n)$; thus we simplify the execution of $P$ as a single computation step;

if $\ell_n = 1$ and $\mathbf{v}_n \not\models G$, then the program enters the terminating program counter and we have $\ell_{n+1} = 2$, $\mathbf{v}_{n+1} = \mathbf{v}_n$;

if $\ell_n = 2$, then the program stays at the program counter $2$ and we have $\ell_{n+1} = 2$, $\mathbf{v}_{n+1} = \mathbf{v}_n$.
Based on the informal description, we now formally define the probability transition function $\mathbf{P}$:

$\mathbf{P}\left((1, \mathbf{v}), (1, \mathbf{v}')\right) := \sum_{\mathbf{u} \,:\, F(\mathbf{v}, \mathbf{u}) = \mathbf{v}'} \bar{\Upsilon}(\mathbf{u})$, for any $\mathbf{v}, \mathbf{v}'$ such that $\mathbf{v} \models G$;

$\mathbf{P}\left((1, \mathbf{v}), (2, \mathbf{v})\right) := 1$ for any $\mathbf{v}$ such that $\mathbf{v} \not\models G$;

$\mathbf{P}\left((2, \mathbf{v}), (2, \mathbf{v})\right) := 1$ for any $\mathbf{v}$;

$\mathbf{P}\left(\mathfrak{c}, \mathfrak{c}'\right) := 0$ for all other cases.
We note that the semantics for general probabilistic programs can be defined on the same principle as for single while loops, with the help of transition structures or control-flow graphs (see [9, 8]).
Almost-Sure Termination. In the following, we define the notion of almost-sure termination for single while loops. Consider a single while loop $Q$. The termination-time random variable $T$ is defined such that for any path $\{(\ell_n, \mathbf{v}_n)\}_{n \ge 0}$, the value of $T$ at the path is $\min\{n \mid \ell_n = 2\}$, where $\min \emptyset := \infty$. Then $Q$ is said to be almost-surely terminating (from some prescribed initial program valuation $\mathbf{v}_0$) if $\mathbb{P}_{\mathbf{v}_0}(T < \infty) = 1$. Besides, we also consider bounds on the tail probabilities $\mathbb{P}_{\mathbf{v}_0}(T \ge n)$ of non-termination within $n$ loop iterations. Tail bounds are important quantitative aspects that characterize how fast a program terminates.
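Tail probabilities of non-termination can be estimated empirically. Below is a Monte Carlo sketch (our own illustration, not from the paper, using the loop of Example 1 with the two-point distribution) that estimates $\mathbb{P}(T \ge n)$ as the fraction of runs whose loop guard still holds after $n$ iterations.

```python
import random

def estimate_tail(n, trials=2000, x0=1, seed=1):
    """Monte Carlo estimate of P(T >= n) for `while x > 0 do x := x + r od`
    with P(r=1) = P(r=-1) = 1/2: the fraction of runs still inside the
    loop after n iterations."""
    rng = random.Random(seed)
    running = 0
    for _ in range(trials):
        x, steps = x0, 0
        while x > 0 and steps < n:
            x += rng.choice([1, -1])
            steps += 1
        if x > 0:  # guard still holds: this run has T >= n
            running += 1
    return running / trials

p100 = estimate_tail(100)      # expected to be of order 1/sqrt(100)
p10000 = estimate_tail(10000)  # expected to be of order 1/sqrt(10000)
```

The estimates shrink as $n$ grows, consistent with the $\mathcal{O}(1/\sqrt{n})$ tail bound established in Section 3.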
3 Supermartingale Based Approach
In this section, we present our supermartingale-based approach for proving almost-sure termination of single while loops. We first establish new mathematical results on supermartingales; then we show how to apply these results to obtain a sound approach for proving almost-sure termination.
The following proposition is our first new mathematical result.
Proposition 1 (Difference-bounded Supermartingales)
Consider any difference-bounded supermartingale $\Gamma = \{X_n\}_{n \ge 0}$ adapted to a filtration $\{\mathcal{F}_n\}_{n \ge 0}$ satisfying the following conditions:

$X_0$ is a constant random variable;

for all $n \ge 0$, it holds for all $\omega \in \Omega$ that (i) $X_n(\omega) \ge 0$ and (ii) $X_n(\omega) = 0$ implies $X_{n+1}(\omega) = 0$;

Lower Bound on Conditional Absolute Difference (LBCAD): there exists $\delta > 0$ such that for all $n \ge 0$, it holds a.s. that $X_n > 0$ implies $\mathbb{E}\left(|X_{n+1} - X_n| \mid \mathcal{F}_n\right) \ge \delta$.
Then $\mathbb{P}(R_\Gamma < \infty) = 1$ and the function $n \mapsto \mathbb{P}(R_\Gamma \ge n)$ is in $\mathcal{O}(1/\sqrt{n})$.
Informally, the LBCAD condition requires that the stochastic process should exhibit a minimal amount of fluctuation at each step: $\delta$ is the least expected change of value that the stochastic process must undergo in the next step (e.g., $X_{n+1} = X_n$ a.s. is not allowed). It is then intuitively plausible that if the stochastic process does not increase in expectation (i.e., it is a supermartingale) and satisfies the LBCAD condition, then at some point the stochastic process will drop to zero. The formal proof ideas are as follows.
Key Proof Ideas. The main idea is a thorough analysis of an exponential martingale derived from $\Gamma$ with a sufficiently small parameter, and of its limit through the Optional Stopping Theorem (cf. Theorem 0.B.1 in the appendix). We first prove that the constructed process is indeed a martingale; the difference-boundedness of $\Gamma$ ensures that the martingale is well-defined. Then, by letting $n \to \infty$, we prove through the Optional Stopping Theorem and the LBCAD condition that $\mathbb{P}(R_\Gamma < \infty) = 1$. Third, we prove from basic definitions and the LBCAD condition an explicit bound on the tail probabilities: by choosing the parameter appropriately for sufficiently large $n$, one obtains $\mathbb{P}(R_\Gamma \ge n) \in \mathcal{O}(1/\sqrt{n})$.∎
Optimality of Proposition 1. We now present two examples that illustrate two aspects of the optimality of Proposition 1. First, in Example 2 we show, via an application to the classical symmetric random walk, that the tail bound of Proposition 1 is optimal. Then, in Example 3 we establish that the non-negativity condition required in the second item of Proposition 1 is critical (i.e., the result does not hold without this condition).
Example 2
Consider the family $\{Y_n\}_{n \ge 0}$ of independent random variables defined as follows: $Y_0 := 1$ and each $Y_n$ ($n \ge 1$) satisfies $\mathbb{P}(Y_n = 1) = \mathbb{P}(Y_n = -1) = \frac{1}{2}$. Let the stochastic process $\Gamma = \{X_n\}_{n \ge 0}$ be inductively defined by $X_0 := Y_0$ and $X_{n+1} := X_n + \mathbf{1}_{X_n > 0} \cdot Y_{n+1}$. $\Gamma$ is difference-bounded since every $Y_n$ is bounded. For all $n$ we have $\mathbb{E}(X_{n+1} \mid \mathcal{F}_n) = X_n$. Choose the filtration $\{\mathcal{F}_n\}_{n \ge 0}$ such that every $\mathcal{F}_n$ is the smallest $\sigma$-algebra that makes $Y_0, \ldots, Y_n$ measurable. Then $\Gamma$ models the classical symmetric random walk, and $X_n > 0$ implies $\mathbb{E}\left(|X_{n+1} - X_n| \mid \mathcal{F}_n\right) = 1$ a.s. Thus, $\delta = 1$ ensures the LBCAD condition. From Proposition 1, we obtain that $\mathbb{P}(R_\Gamma < \infty) = 1$ and $\mathbb{P}(R_\Gamma \ge n) \in \mathcal{O}(1/\sqrt{n})$. It follows from [5, Theorem 4.1] that $\mathbb{P}(R_\Gamma \ge n) \in \Omega(1/\sqrt{n})$. Hence, the tail bound in Proposition 1 is optimal.∎
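The $\Theta(1/\sqrt{n})$ behaviour for the symmetric random walk can be checked deterministically. The sketch below (our own illustration, not from the paper) propagates the exact distribution of the absorbed walk started at $1$ and confirms that $\sqrt{n} \cdot \mathbb{P}(R_\Gamma \ge n)$ is roughly constant in $n$.

```python
# Exact first-passage tails of the symmetric random walk started at 1,
# with absorption at 0, via dynamic programming over the state distribution.
def tail(n, x0=1):
    """Exact probability that the walk has not yet hit 0 after n steps."""
    dist = {x0: 1.0}
    for _ in range(n):
        nxt = {}
        for x, p in dist.items():
            if x == 0:          # absorbed: the process stays at 0
                nxt[0] = nxt.get(0, 0.0) + p
            else:               # symmetric +/-1 step with probability 1/2 each
                for y in (x - 1, x + 1):
                    nxt[y] = nxt.get(y, 0.0) + 0.5 * p
        dist = nxt
    return 1.0 - dist.get(0, 0.0)

# If the tail is Theta(1/sqrt(n)), these rescaled values should be close.
r100 = tail(100) * (100 ** 0.5)
r400 = tail(400) * (400 ** 0.5)
```

The two rescaled values agree closely, matching the optimality claim of Example 2.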
Example 3
In Proposition 1, the condition that $X_n \ge 0$ for all $n$ is necessary; in other words, it is necessary to require non-negativity everywhere rather than only up to the stopping time $R_\Gamma$. This can be observed as follows. Consider discrete-time stochastic processes $\{Y_n\}_{n \ge 0}$ and $\Gamma = \{X_n\}_{n \ge 0}$ given as follows:

the random variables $Y_n$ are independent, $Y_0$ is a constant random variable, and each $Y_n$ ($n \ge 1$) takes one of two values with suitably chosen probabilities;

$X_{n+1}$ is defined from $X_n$ and $Y_{n+1}$ for $n \ge 0$.
Let $\{\mathcal{F}_n\}_{n \ge 0}$ be the filtration where each $\mathcal{F}_n$ is the smallest $\sigma$-algebra that makes $Y_0, \ldots, Y_n$ measurable. Then one can show that $\Gamma$ (adapted to $\{\mathcal{F}_n\}_{n \ge 0}$) satisfies integrability, the supermartingale property, and the LBCAD condition, but $\mathbb{P}(R_\Gamma < \infty) < 1$. Detailed justifications are available in Appendix 0.B.∎
In the following, we illustrate how one can apply Proposition 1 to prove almost-sure termination of single while loops. Below we fix a single while loop $Q$ in the form (1). We first introduce the notion of supermartingale maps, a special class of functions over configurations that are subject to supermartingale-like constraints.
Definition 1 (Supermartingale Maps)
A (difference-bounded) supermartingale map (for $Q$) is a function $h\colon \{1, 2\} \times \mathbb{Z}^m \to \mathbb{R}$ satisfying that there exist real numbers $\delta > 0$ and $c > 0$ such that for all configurations $(\ell, \mathbf{v})$, the following conditions hold:

(D1) if $\ell = 1$ and $\mathbf{v} \models G$, then $h(\ell, \mathbf{v}) > 0$;

(D2) if $\ell = 1$ and $\mathbf{v} \not\models G$, then (i) $h(1, \mathbf{v}) \ge 0$ and (ii) $h(2, \mathbf{v}) \ge 0$;

(D3) if $\ell = 1$ and $\mathbf{v} \models G$, then

(D3.1) $\sum_{\mathbf{u}} \bar{\Upsilon}(\mathbf{u}) \cdot h\left(1, F(\mathbf{v}, \mathbf{u})\right) \le h(1, \mathbf{v})$, and

(D3.2) $\sum_{\mathbf{u}} \bar{\Upsilon}(\mathbf{u}) \cdot \left|h\left(1, F(\mathbf{v}, \mathbf{u})\right) - h(1, \mathbf{v})\right| \ge \delta$;

(D4) (for difference-boundedness) $\left|h\left(1, F(\mathbf{v}, \mathbf{u})\right) - h(1, \mathbf{v})\right| \le c$ for all $\mathbf{v}$ and $\mathbf{u}$ such that $\mathbf{v} \models G$, and $\left|h(2, \mathbf{v}) - h(1, \mathbf{v})\right| \le c$ for all $\mathbf{v}$ such that $\mathbf{v} \not\models G$.
Thus, $h$ is a supermartingale map if the conditions (D1)–(D3) hold; $h$ is in addition difference-bounded if (D4) holds as well.
Intuitively, the conditions (D1) and (D2) together ensure non-negativity of the function $h$. Moreover, the difference between the strict inequality in (D1) and the non-strict one in (D2) ensures that $h$ is positive iff the program still executes inside the loop. The condition (D3.1) ensures the supermartingale condition for $h$, namely that the next expected value does not increase, while the condition (D3.2) says that the expected absolute change of $h$ between the current and the next step is at least $\delta$, matching the same amount in the LBCAD condition. Finally, the condition (D4) corresponds to difference-boundedness of supermartingales, in the sense that it requires that the change of value, both after a loop iteration and right before the termination of the loop, be bounded by the upper bound $c$.
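For a concrete candidate map, the conditions (D3.1) and (D3.2) amount to finite sums over the sampling distribution that can be checked mechanically. The Python sketch below is our own illustration: the candidate map $h(1, x) = \max\{x, 0\}$, $h(2, x) = 0$ anticipates Example 4 and is our reconstruction, not a definition from the paper; the distribution is the two-point one from Example 1.

```python
# Candidate supermartingale map (our reconstruction for Example 4):
def h(loc, x):
    return max(x, 0) if loc == 1 else 0

dist = {1: 0.5, -1: 0.5}  # P(r = 1) = P(r = -1) = 1/2

def d31_holds(x):
    """(D3.1): expected next value of h does not increase inside the loop."""
    exp_next = sum(p * h(1, x + r) for r, p in dist.items())
    return exp_next <= h(1, x)

def d32_holds(x, delta=1.0):
    """(D3.2): expected absolute change of h is at least delta inside the loop."""
    exp_abs = sum(p * abs(h(1, x + r) - h(1, x)) for r, p in dist.items())
    return exp_abs >= delta

# Check both conditions for all in-loop values x > 0 in a finite window.
ok = all(d31_holds(x) and d32_holds(x) for x in range(1, 200))
```

Such a finite-window check is of course not a proof (the conditions quantify over all configurations), but it is a useful sanity check when hunting for supermartingale maps.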
Now we state the main theorem of this section, which says that the existence of a difference-bounded supermartingale map implies almost-sure termination.
Theorem 3.1 (Soundness)
If there exists a difference-bounded supermartingale map $h$ for $Q$, then for any initial valuation $\mathbf{v}_0$ we have $\mathbb{P}_{\mathbf{v}_0}(T < \infty) = 1$ and $\mathbb{P}_{\mathbf{v}_0}(T \ge n) \in \mathcal{O}(1/\sqrt{n})$.
Key Proof Ideas. Let $h$ be any difference-bounded supermartingale map for the single while loop $Q$, let $\mathbf{v}_0$ be any initial valuation, and let $\delta, c$ be the parameters in Definition 1. We define the stochastic process $\Gamma = \{X_n\}_{n \ge 0}$ adapted to $\{\mathcal{F}_n\}_{n \ge 0}$ by $X_n := h(\ell_n, \mathbf{v}_n)$, where $\ell_n$ (resp. $\mathbf{v}_n$) refers to the random variable (resp. the vector of random variables) for the program counter (resp. program valuation) at the $n$-th step. Then $Q$ terminates iff $\Gamma$ stops. We prove that $\Gamma$ satisfies the conditions in Proposition 1, so that $Q$ is almost-surely terminating with the same tail bound.∎
Theorem 3.1 suggests that to prove almost-sure termination, one only needs to find a difference-bounded supermartingale map.
Remark 2
Informally, Theorem 3.1 can be used to prove almost-sure termination of while loops for which there exists a distance function (serving as a supermartingale map) that measures the distance of the loop to termination, such that the distance does not increase in expectation and changes by at least a minimal expected amount in each loop iteration. The key idea in applying Theorem 3.1 is to construct such a distance function.
Below we illustrate an example.
Example 4
Consider the single while loop in Example 1, where the distribution for $r$ is given as $\mathbb{P}(r = 1) = \mathbb{P}(r = -1) = \frac{1}{2}$; the program can be viewed as a non-biased random walk. The program has infinite expected termination time, so previous approaches based on ranking supermartingales cannot be applied. Below we prove almost-sure termination of the program. We define the difference-bounded supermartingale map $h$ by $h(1, x) := \max\{x, 0\}$ and $h(2, x) := 0$ for every $x \in \mathbb{Z}$. Let $\delta := 1$ and $c := 1$. Then for every $x$, we have that:

the condition (D1) is valid by the definition of $h$;

if $\ell = 1$ and $x \le 0$, then $h(1, x) = 0$ and $h(2, x) = 0$; thus the condition (D2) is valid;

if $\ell = 1$ and $x > 0$, then $\frac{1}{2} \cdot h(1, x+1) + \frac{1}{2} \cdot h(1, x-1) = x = h(1, x)$ and $\frac{1}{2} \cdot \left|h(1, x+1) - h(1, x)\right| + \frac{1}{2} \cdot \left|h(1, x-1) - h(1, x)\right| = 1 \ge \delta$; thus the condition (D3) is valid;

the condition (D4) is clear, as the absolute difference is never greater than $c = 1$.
It follows that $h$ is a difference-bounded supermartingale map. Then, by Theorem 3.1, the program terminates almost-surely under any initial value, with tail probabilities bounded by the reciprocal of the square root of the threshold $n$. By similar arguments, we can show that the results still hold whenever the distribution of $r$ has bounded range, non-positive mean, and non-zero variance, by letting $h(1, x) := \max\{x + M, 0\}$ for some sufficiently large constant $M$.∎
Now we extend Proposition 1 to general supermartingales. The extension lifts the difference-boundedness condition, but derives a weaker tail bound.
Proposition 2 (General Supermartingales)
Consider any supermartingale $\Gamma = \{X_n\}_{n \ge 0}$ adapted to a filtration $\{\mathcal{F}_n\}_{n \ge 0}$ satisfying the following conditions:

$X_0$ is a constant random variable;

for all $n \ge 0$, it holds for all $\omega \in \Omega$ that (i) $X_n(\omega) \ge 0$ and (ii) $X_n(\omega) = 0$ implies $X_{n+1}(\omega) = 0$;

(LBCAD) there exists $\delta > 0$ such that for all $n \ge 0$, it holds a.s. that $X_n > 0$ implies $\mathbb{E}\left(|X_{n+1} - X_n| \mid \mathcal{F}_n\right) \ge \delta$.
Then $\mathbb{P}(R_\Gamma < \infty) = 1$, and the function $n \mapsto \mathbb{P}(R_\Gamma \ge n)$ admits an explicit asymptotic bound that is weaker than the $\mathcal{O}(1/\sqrt{n})$ bound of Proposition 1.
Key Proof Ideas. The key idea is to extend the proof of Proposition 1 with a family of auxiliary stopping times ($M > 0$) that truncate the process at threshold $M$. For any $M$, we first define a new truncated stochastic process from $\Gamma$, and then an associated exponential process with an appropriate positive real parameter. We prove that the resulting process is still a martingale. Then, from the Optional Stopping Theorem, by letting $n \to \infty$, we again obtain $\mathbb{P}(R_\Gamma < \infty) = 1$. For the tail bound, we combine the Optional Stopping Theorem with Markov's Inequality applied to the truncated processes; choosing the threshold $M$ and the parameter appropriately for sufficiently large $n$ yields the claimed bound on $\mathbb{P}(R_\Gamma \ge n)$.∎
Remark 3
Similar to Theorem 3.1, we can establish a soundness result for general supermartingales: the existence of a (not necessarily difference-bounded) supermartingale map implies almost-sure termination together with the weaker tail bound of Proposition 2.
The following example illustrates the application of Proposition 2 on a single while loop with unbounded difference.
Example 5
Consider the following single while loop program
while $x > 0$ do $x := x + r$ od
(the loop of Example 1), where the sampling variable $r$ now observes a distribution with unbounded support. The supermartingale map $h$ is defined as in Example 4. In this program, the induced stochastic process is not difference-bounded since $r$ is not bounded; thus $h$ satisfies all the conditions of Definition 1 except (D4). We instead construct a stochastic process which meets the requirements of Proposition 2. It follows that the program terminates almost-surely under any initial value, with the tail bound of Proposition 2. In general, if $r$ observes a distribution whose range is bounded from below (but possibly unbounded from above), with non-positive mean and non-zero variance, then we can still prove the same result: we choose a sufficiently large constant $M$ so that the function $h$ with $h(1, x) := \max\{x + M, 0\}$ is still a supermartingale map, using the non-negativity of $h$ for all configurations. ∎
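To make the unbounded-difference situation concrete, the following sketch (our own illustration; the specific distribution $r = G - 1$, with $G$ geometric of mean $1$, is our choice and not the paper's) simulates a loop whose increments have mean $0$, are bounded from below by $-1$, but are unbounded from above.

```python
import random

def sample_r(rng):
    """r = G - 1 where G ~ Geometric(1/2) on {0,1,2,...} with E[G] = 1,
    so E[r] = 0, r >= -1, and r is unbounded from above."""
    g = 0
    while rng.random() < 0.5:
        g += 1
    return g - 1

def run(x0, rng, max_steps=10**6):
    """Run `while x > 0 do x := x + r od`; return the iteration count on
    termination, or None if the step budget is exhausted."""
    x, steps = x0, 0
    while x > 0 and steps < max_steps:
        x += sample_r(rng)
        steps += 1
    return steps if x <= 0 else None

rng = random.Random(42)
results = [run(5, rng) for _ in range(20)]
terminated = sum(r is not None for r in results)
```

Despite individual steps being arbitrarily large upward, most runs terminate well within the budget, in line with the almost-sure termination guaranteed by Proposition 2 for this bounded-below, mean-zero case.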
4 Central Limit Theorem Based Approach
We have seen in the previous section a supermartingale-based approach for proving almost-sure termination. However, by Example 3, an inherent restriction is that the supermartingale must be non-negative. In this section, we propose a new approach through the Central Limit Theorem that drops this requirement, but additionally requires an independence condition.
We first state the well-known Central Limit Theorem [35, Chapter 18].
Theorem 4.1 (Lindeberg–Lévy Central Limit Theorem)
Suppose $\{Y_n\}_{n \ge 1}$ is a sequence of independent and identically distributed random variables such that $\mu := \mathbb{E}(Y_1)$ and $\sigma^2 := \mathrm{Var}(Y_1)$ are finite. Then, as $n$ approaches infinity, the random variables $\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^{n} Y_i - \mu\right)$ converge in distribution to a normal $\mathcal{N}(0, \sigma^2)$. In the case $\sigma > 0$, we have for every real number $d$ that
$$\lim_{n \to \infty} \mathbb{P}\left(\frac{\sum_{i=1}^{n} Y_i - n\mu}{\sigma \sqrt{n}} \le d\right) = \Phi(d),$$
where $\Phi(d)$ is the standard normal cumulative distribution function evaluated at $d$.
The following lemma, proved via the Central Limit Theorem, is key to our approach.
Lemma 1
Let $\{Y_n\}_{n \ge 1}$ be a sequence of independent and identically distributed random variables with expected value $\mu := \mathbb{E}(Y_n) \le 0$ and finite variance $\sigma^2 := \mathrm{Var}(Y_n) > 0$ for every $n$. For every $x \in \mathbb{R}$, let $\{X_n\}_{n \ge 0}$ be the discrete-time stochastic process where $X_0 := x$ and $X_n := X_{n-1} + Y_n$ for $n \ge 1$. Then there exists a constant $c > 0$ such that, for any $x$, we have $\mathbb{P}\left(\exists n.\ X_n \le 0\right) \ge c$.
Proof
According to the Central Limit Theorem (Theorem 4.1),
$$\lim_{n \to \infty} \mathbb{P}\left(\frac{\sum_{i=1}^{n} Y_i - n\mu}{\sigma \sqrt{n}} \le d\right) = \Phi(d)$$
holds for every real number $d$. Note that $X_n = x + \sum_{i=1}^{n} Y_i$ and $n\mu \le 0$. Choose $d < 0$, so that $\Phi(d) > 0$. Then we have $\sigma\sqrt{n}\, d + n\mu + x \le 0$ when $n$ is sufficiently large. Now we fix a proper $c$ with $0 < c < \Phi(d)$, and get from the limit equation an $N$ such that for all $n \ge N$ we have
$$\mathbb{P}\left(\frac{\sum_{i=1}^{n} Y_i - n\mu}{\sigma \sqrt{n}} \le d\right) \ge c.$$
Since $\frac{\sum_{i=1}^{n} Y_i - n\mu}{\sigma \sqrt{n}} \le d$ implies $X_n \le \sigma\sqrt{n}\, d + n\mu + x \le 0$, we obtain that $\mathbb{P}\left(\exists n.\ X_n \le 0\right) \ge c$ for every $x$.
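The CLT step in the proof can be illustrated numerically. The sketch below (our own illustration, not from the paper) estimates $\mathbb{P}(X_n \le 0)$ for symmetric $\pm 1$ increments started at $x$, which the CLT predicts to approach $1 - \Phi(x/\sqrt{n})$, a positive constant once $n$ is large relative to $x^2$.

```python
import random

def hit_prob(x, n, trials=1000, seed=7):
    """Estimate P(X_n <= 0) for X_n = x + Y_1 + ... + Y_n with
    i.i.d. Y_i uniform on {+1, -1} (mean 0, variance 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = x
        for _ in range(n):
            s += rng.choice([1, -1])
        if s <= 0:
            hits += 1
    return hits / trials

# With x = 10 and n = 2500, the CLT predicts about 1 - Phi(10/50) ~ 0.42.
est = hit_prob(x=10, n=2500)
```

The estimate is bounded away from $0$ even though the starting point $x$ is well above $0$, which is exactly the uniform positive-probability bound that Lemma 1 extracts from the CLT.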
Incremental Single While Loops. Due to the independence condition required by the Central Limit Theorem, we need to consider a special class of single while loops. We say that a single while loop in the form (1) is incremental if $P$ is a sequential composition of assignment statements of the form $x := x + c_1 r_1 + \cdots + c_k r_k$, where $x$ is a program variable, the $r_j$'s are sampling variables, and the $c_j$'s are constant coefficients for the sampling variables. We then consider incremental single while loops. For incremental single while loops, the function $F$ for the loop body is incremental, i.e., $F(\mathbf{v}, \mathbf{u}) = \mathbf{v} + A\mathbf{u}$ for some constant matrix $A$.
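The incremental form $F(\mathbf{v}, \mathbf{u}) = \mathbf{v} + A\mathbf{u}$ is easy to encode. The following sketch (our own illustration, with a hypothetical two-variable loop body; plain Python lists, no external libraries) builds $F$ from a coefficient matrix.

```python
# Build the incremental loop-body function F(v, u) = v + A u, where A is a
# constant m-by-k matrix given as a list of rows.
def make_F(A):
    def F(v, u):
        return [vi + sum(aij * uj for aij, uj in zip(row, u))
                for vi, row in zip(v, A)]
    return F

# Hypothetical body:  x := x + r1;  y := y + r1 - 2*r2
F = make_F([[1, 0],
            [1, -2]])
out = F([5, 7], [3, 1])  # program valuation (5, 7), sampled values (3, 1)
```

Here `out` is `[8, 8]`: the increments depend only on the sampled values, never on the current valuation, which is precisely what keeps the per-iteration increments i.i.d.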
Remark 4
By Example 3, previous approaches cannot handle incremental single while loops with an unbounded range for the sampling variables (so that a supermartingale with a lower bound on its values may not exist). On the other hand, any additional syntax, such as conditional branches or assignment statements whose right-hand side involves program variables, would result in an increment to certain program variables that depends on the previous execution of the program, breaking the independence condition.
To prove almost-sure termination of incremental single while loops through the Central Limit Theorem, we introduce the notion of linear progress functions. Below we fix an incremental single while loop $Q$ in the form (1).
Definition 2 (Linear Progress Functions)
A linear progress function for $Q$ is a function $w\colon \mathbb{Z}^m \to \mathbb{R}$ satisfying the following conditions:

(L1) there exist $\mathbf{a} \in \mathbb{R}^m$ and $b \in \mathbb{R}$ such that $w(\mathbf{v}) = \mathbf{a}^{\mathsf{T}} \mathbf{v} + b$ for all program valuations $\mathbf{v}$;

(L2) for all program valuations $\mathbf{v}$, if $w(\mathbf{v}) \le 0$ then $\mathbf{v} \not\models G$;

(L3) $\mu_w \le 0$ and $\sigma_w^2 \neq 0$, where

$\mu_w := \sum_{j=1}^{k} c_j \mu_j$ and $\sigma_w^2 := \sum_{j=1}^{k} c_j^2 \sigma_j^2$, with $(c_1, \ldots, c_k) := \mathbf{a}^{\mathsf{T}} A$,

$\mu_j$ (resp. $\sigma_j^2$) is the mean (resp. variance) of the distribution $\Upsilon(r_j)$, for $1 \le j \le k$.
Intuitively, the condition (L1) says that the function $w$ should be linear (more precisely, affine); the condition (L2) specifies that if the value of $w$ is non-positive, then the program terminates; the condition (L3) enforces that the mean of the one-step increment of $w$ should be non-positive, while its variance should be non-zero. The main theorem of this section is then as follows.
Theorem 4.2 (Soundness)
For any incremental single while loop program $Q$, if there exists a linear progress function $w$ for $Q$, then for any initial valuation $\mathbf{v}_0$ we have $\mathbb{P}_{\mathbf{v}_0}(T < \infty) = 1$.
Proof
Let $w$ be a linear progress function for $Q$. We define the stochastic process $\{X_n\}_{n \ge 0}$ by $X_n := w(\mathbf{v}_n)$, where $\mathbf{v}_n$ is the vector of random variables that represents the program valuation at the $n$-th execution step of $Q$. Define $Y_n := X_n - X_{n-1}$. We have $Y_n = \mathbf{a}^{\mathsf{T}} A \mathbf{u}_{n-1}$ for $n \ge 1$. Thus, $\{Y_n\}_{n \ge 1}$ is a sequence of independent and identically distributed random variables. We have $\mathbb{E}(Y_n) = \mu_w \le 0$ and $\mathrm{Var}(Y_n) = \sigma_w^2 > 0$ by the independence of the sampled values and the condition (L3) in Definition 2. Now we can apply Lemma 1 and obtain that there exists a constant $c > 0$ such that for any initial program valuation $\mathbf{v}_0$, we have $\mathbb{P}_{\mathbf{v}_0}\left(\exists n.\ X_n \le 0\right) \ge c$. By the recurrence property of Markov chains, $\{X_n\}_{n \ge 0}$ almost-surely drops to a non-positive value. Notice that from (L2), $X_n \le 0$ implies $\mathbf{v}_n \not\models G$ and (in the next step) termination of the single while loop. Hence, $Q$ is almost-surely terminating under any initial program valuation.∎
Theorem 4.2 can be applied to prove almost-sure termination of while loops whose increments are independent but whose value change in one iteration is not bounded. Thus, Theorem 4.2 can handle programs which Theorem 3.1 and Proposition 2, as well as previous supermartingale-based methods, cannot.
In the following, we present several examples showing that Theorem 4.2 can handle sampling variables with unbounded range, which previous approaches cannot.
Example 6
Consider the program in Example 1, where we let $r$ be a sampling variable with a two-sided geometric distribution such that $\mathbb{P}(r = j) = \mathbb{P}(r = -j) = \frac{1}{2}\, p\, (1-p)^{j-1}$ for all $j \ge 1$, for some $p \in (0, 1)$. First note that, by the approach in [1], we can prove that this program has infinite expected termination time, and thus previous ranking-supermartingale-based approaches cannot be applied. Also note that the values that $r$ may take have no lower bound. This means that we can hardly obtain almost-sure termination by finding a proper supermartingale map that satisfies both the non-negativity condition and the non-increasing condition. Now we apply Theorem 4.2. Choose $w(x) := x$. It follows directly that both (L1) and (L2) hold. Since $\mathbb{E}(r) = 0$ by symmetry and $\mathrm{Var}(r) > 0$ (the variance of the two-sided geometric distribution with parameter $p$ is non-zero), (L3) holds. Thus, $w$ is a legal linear progress function, and this program is almost-surely terminating by Theorem 4.2.∎
Example 7
Consider the following program with a more complex loop guard.
This program terminates when the point on the plane leaves the area above the parabola via a two-dimensional random walk. We suppose that are both positive and