1. Introduction
In this work, we consider expected cost analysis of nondeterministic probabilistic programs, and present a sound and efficient approach for a large class of such programs. We start with a description of probabilistic programs and the cost analysis problem, and then present our contributions.
Probabilistic programs.
Extending classical imperative programs with randomization, i.e. generation of random values according to a predefined probability distribution, leads to the class of probabilistic programs
(gordon2014probabilistic). Probabilistic programs have been shown to be powerful models for a wide variety of applications, such as analysis of stochastic network protocols (netkat; netkat2; netkat3), machine learning applications
(roy2008stochastic; gordon2013model; scibior2015practical; claret2013bayesian), and robot planning (thrun2000probabilistic; thrun2002probabilistic), to name a few. There are also many probabilistic programming languages (such as Church (goodman2008church), Anglican (anglican) and WebPPL (dippl)), and automated analysis of such programs is an active research area in formal methods and programming languages (see (SriramCAV; pmaf; pldi18; AgrawalC018; ChatterjeeFNH16; ChatterjeeFG16; EsparzaGK12; KaminskiKMO16; kaminski2018hardness)).
Nondeterministic programs.
Besides probability, another important modeling concept in programming languages is nondeterminism. A classic example is abstraction: for efficient static analysis of large programs, it is often infeasible to keep track of all variables. Abstraction ignores some variables and replaces them with worst-case behavior, which is modeled by nondeterminism (cousot1977abstract).
Termination and cost analysis.
The most basic liveness question for probabilistic programs is termination. The basic qualitative questions for termination of probabilistic programs, such as whether the program terminates with probability 1 or whether the expected termination time is bounded, have been widely studied (kaminski2018hardness; ChatterjeeFG16; KaminskiKMO16; ChatterjeeFNH16). However, in program analysis, the more general quantitative task of obtaining precise bounds on resource usage is a challenging problem that is of significant interest for the following reasons: (a) in applications such as hard real-time systems, guarantees of worst-case behavior are required; and (b) the bounds are useful in early detection of egregious performance problems in large code bases. Works such as (SPEED1; SPEED2; Hoffman1; Hoffman2) provide excellent motivation for the study of automated methods to obtain worst-case bounds on the resource usage of non-probabilistic programs. The same motivation applies to the class of probabilistic programs as well. Thus, the problem we consider is as follows: given a probabilistic program with costs associated to each execution step, compute bounds on its expected accumulated cost until termination.
Previous approaches.
While there is a large body of work for qualitative termination analysis problems (see Section LABEL:sec:rel for details), the cost analysis problem has only been considered recently. The most relevant previous work for cost analysis is that of Ngo, Carbonneaux and Hoffmann (pldi18), which considers the stepwise costs to be nonnegative and bounded. While several interesting classes of programs satisfy the above restrictions, there are many natural and important classes of examples that cannot be modeled in this framework. For example, in the analysis of cryptocurrency protocols, such as mining, there are both energy costs (positive costs) and solution rewards (negative costs). Similarly, in the analysis of queuing networks, the cost is proportional to the length of the queues, which might be unbounded. For concrete motivating examples see Section 3.
Our contribution.
In this work, we present a novel approach for synthesis of polynomial bounds on the expected accumulated cost of nondeterministic probabilistic programs.

Our sound framework can handle the following cases: (a) general positive and negative costs, with bounded updates to the variables at every step of the execution; and (b) nonnegative costs with general updates (i.e. unbounded costs and unbounded updates to the variables). In the first case, our approach obtains both upper and lower bounds, whereas in the second case we only obtain upper bounds. In contrast, previous approaches only provide upper bounds for bounded nonnegative costs. A key technical novelty of our approach is an extension of the classical Optional Stopping Theorem (OST) for martingales.

We present a sound algorithmic approach for the synthesis of polynomial bounds. Our algorithm runs in polynomial time. Note that no previous approach provides polynomial runtime guarantee for synthesis of such bounds for nondeterministic probabilistic programs. Our synthesis approach is based on application of results from semialgebraic geometry.

Finally, we present experimental results on a variety of programs, motivated by applications such as cryptocurrency protocols, stochastic linear recurrences, and queuing networks, and show that our approach can efficiently obtain tight polynomial resource-usage bounds.
We start with preliminaries (Section 2) and then present a set of motivating examples (Section 3). Next, we provide an overview of the main technical ideas of our approach in Section 4. The following sections each present the technical details of one of the steps of our approach.
2. Preliminaries
In this section, we define some necessary notions from probability theory and probabilistic programs. We also formally define the expected accumulated cost of a program.
2.1. Martingales
We start by reviewing some notions from probability theory. We consider a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, where $\Omega$ is the sample space, $\mathcal{F}$ is the set of events and $\mathbb{P}$ is the probability measure.
Random variables.
A random variable is an $\mathcal{F}$-measurable function $X: \Omega \to \mathbb{R} \cup \{+\infty\}$, i.e. a function satisfying the condition that for all $d \in \mathbb{R} \cup \{+\infty\}$, the set of all points in the sample space with an $X$ value of less than $d$ belongs to $\mathcal{F}$.
Expectation.
The expected value of a random variable $X$, denoted by $\mathbb{E}(X)$, is the Lebesgue integral of $X$ wrt $\mathbb{P}$. See (williams1991probability) for the formal definition of Lebesgue integration. If the range of $X$ is a countable set $A$, then $\mathbb{E}(X) = \sum_{a \in A} a \cdot \mathbb{P}(X = a)$.
Filtrations and stopping times.
A filtration of the probability space $(\Omega, \mathcal{F}, \mathbb{P})$ is an infinite sequence $\{\mathcal{F}_n\}_{n \ge 0}$ of $\sigma$-algebras such that for every $n$, the triple $(\Omega, \mathcal{F}_n, \mathbb{P})$ is a probability space and $\mathcal{F}_n \subseteq \mathcal{F}_{n+1} \subseteq \mathcal{F}$. A stopping time wrt $\{\mathcal{F}_n\}_{n \ge 0}$ is a random variable $T: \Omega \to \mathbb{N} \cup \{\infty\}$ such that for every $n$, the event $T \le n$ is in $\mathcal{F}_n$. Intuitively, $T$ is interpreted as the time at which the stochastic process shows a desired behavior.
Discrete-time stochastic processes.
A discrete-time stochastic process is a sequence $\Gamma = \{X_n\}_{n \ge 0}$ of random variables in $(\Omega, \mathcal{F}, \mathbb{P})$. The process $\Gamma$ is adapted to a filtration $\{\mathcal{F}_n\}_{n \ge 0}$ if for all $n$, $X_n$ is a random variable in $(\Omega, \mathcal{F}_n, \mathbb{P})$.
Martingales.
A discrete-time stochastic process $\Gamma = \{X_n\}_{n \ge 0}$ adapted to a filtration $\{\mathcal{F}_n\}_{n \ge 0}$ is a martingale (resp. supermartingale, submartingale) if for all $n$, $\mathbb{E}(|X_n|) < \infty$ and it holds almost surely (i.e., with probability $1$) that $\mathbb{E}(X_{n+1} \mid \mathcal{F}_n) = X_n$ (resp. $\mathbb{E}(X_{n+1} \mid \mathcal{F}_n) \le X_n$, $\mathbb{E}(X_{n+1} \mid \mathcal{F}_n) \ge X_n$). See (williams1991probability) for details.
Intuitively, a martingale is a discrete-time stochastic process in which, at any time $n$, the expected value of the next step, given all previous values, is equal to the current value $X_n$. In a supermartingale, this expected value is less than or equal to the current value, and a submartingale is defined conversely. Applying martingales to termination analysis is a well-studied technique (SriramCAV; ChatterjeeFG16; ChatterjeeNZ2017).
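As a concrete illustration, the following minimal Python sketch (our own toy example, not from the paper) estimates the conditional expectation of the next value of a biased random walk and checks the supermartingale inequality $\mathbb{E}(X_{n+1} \mid \mathcal{F}_n) \le X_n$:

```python
import random

def walk_step(x):
    # Biased random walk: +1 with probability 0.4, -1 with probability 0.6.
    # Hence E[X_{n+1} | X_n = x] = x - 0.2 <= x: a supermartingale.
    return x + (1 if random.random() < 0.4 else -1)

def empirical_next_mean(x, trials=100_000):
    # Monte Carlo estimate of E[X_{n+1} | X_n = x].
    return sum(walk_step(x) for _ in range(trials)) / trials

random.seed(0)
m = empirical_next_mean(10.0)
print(m)  # close to 10 - 0.2 = 9.8, which is <= 10
```

The same sampling idea, with the bias removed, gives an empirical check of the martingale equality instead.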
2.2. Nondeterministic Probabilistic Programs
We now fix the syntax and semantics of the nondeterministic probabilistic programs we consider in this work.
Syntax.
Our nondeterministic probabilistic programs are imperative programs with the usual conditional and loop structures (i.e. if and while), as well as the following new structures: (a) probabilistic branching statements of the form “if prob($p$)” that lead to the then part with probability $p$ and to the else part with probability $1-p$, (b) nondeterministic branching statements of the form “if $\star$ …” that nondeterministically lead to either the then part or the else part, and (c) statements of the form tick($q$) whose execution triggers a cost of $q$. Moreover, the variables in our programs can either be program variables, which act in the usual way, or sampling variables, whose values are randomly sampled from predefined probability distributions each time they are accessed in the program.
Formally, nondeterministic probabilistic programs are generated by the grammar in Figure 1. In this grammar, pvar (resp. rvar) expressions range over program (resp. sampling) variables. For brevity, we omit the else part of a conditional statement if it contains only a single skip. See Appendix LABEL:app:syntax for more details about the syntax.
An example program is given in Figure 2.1(left). Note that the complete specification of the program should also include distributions from which the sampling variables are sampled.
Labels.
We refer to the status of the program counter as a label, and assign labels $\ell_{\mathit{in}}$ and $\ell_{\mathit{out}}$ to the start and end of the program, respectively. Our label types are as follows:

An assignment label corresponds to an assignment statement of the form $v := e$ or skip. After its execution, the value of the expression $e$ on its right-hand side is stored in the variable $v$ on its left-hand side and control flows to the next statement. A skip assignment does not change the value of any variable.

A branching label corresponds to a conditional statement, i.e. either an “if $\phi$ …” or a “while $\phi$ …”, where $\phi$ is a condition on program variables, and the next statement to be executed depends on whether $\phi$ is satisfied or not.

A probabilistic label corresponds to an “if prob($p$) …” with $p \in [0, 1]$, and leads to the then branch with probability $p$ and to the else branch with probability $1-p$.

A nondeterministic label corresponds to a nondeterministic branching statement indicated by “if $\star$ …”, and is nondeterministically followed by either the then branch or the else branch.

A tick label corresponds to a statement tick($q$) that triggers a cost of $q$ and leads to the next label. Note that $q$ is an arithmetic expression, serving as the stepwise cost function, and can depend on the values of program variables.
Valuations.
Given a set $V$ of variables, a valuation over $V$ is a function $\nu$ that assigns a value to each variable. We denote the set of all valuations on $V$ by $\mathit{Val}_V$.
Control flow graphs (CFGs) (allen1970control).
We define control flow graphs of our programs in the usual way, i.e. a CFG contains one vertex for each label, and an edge connects a label $\ell$ to another label $\ell'$ if $\ell'$ can possibly be executed right after $\ell$ by the rules above. Formally, a CFG is a tuple
(1) $\qquad \mathcal{C} = (V_p, V_r, L, \mapsto)$
where:

$V_p$ and $V_r$ are finite sets of program variables and sampling (randomized) variables, respectively;

$L$ is a finite set of labels partitioned into (i) the set $L_a$ of assignment labels, (ii) the set $L_b$ of branching labels, (iii) the set $L_p$ of probabilistic labels, (iv) the set $L_n$ of nondeterministic labels, (v) the set $L_t$ of tick labels, and (vi) a special terminal label $\ell_{\mathit{out}}$ corresponding to the end of the program. Note that the start label $\ell_{\mathit{in}}$ corresponds to the first statement of the program and is therefore covered in cases (i)–(v).

$\mapsto$ is a transition relation whose every member is a triple of the form $(\ell, \alpha, \ell')$, where $\ell$ is the source and $\ell'$ is the target of the transition, and $\alpha$ is the rule that must be obeyed when the execution goes from $\ell$ to $\ell'$. The rule $\alpha$ is either an update function if $\ell \in L_a$, which maps the values of program and sampling variables before the assignment to the values of program variables after the assignment, or a condition over $V_p$ if $\ell \in L_b$, or a real number $p \in [0, 1]$ if $\ell \in L_p$, or $\star$ if $\ell \in L_n$, or a cost function if $\ell \in L_t$. In the last case, the cost function is specified by the arithmetic expression $q$ in tick($q$) and maps the values of program variables to the cost of the tick operation.
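To make the construction concrete, here is a small illustrative Python sketch (the encoding and names are ours, not the paper's formal notation) of a CFG for the program `while x >= 1: { x := x - 1; tick(x) }`, together with a walk that accumulates the tick costs along a run:

```python
from dataclasses import dataclass
from typing import Callable, Union

# A transition carries a rule: a guard (branching label), an update function
# (assignment label), or a cost function (tick label).  Probabilistic and
# nondeterministic labels are omitted in this tiny deterministic example.
@dataclass
class Transition:
    source: str
    rule: Union[Callable, float, str]
    target: str

# CFG for:  while x >= 1: { x := x - 1; tick(x) }
cfg = [
    Transition("l_while", lambda v: v["x"] >= 1, "l_assign"),   # guard true
    Transition("l_while", lambda v: v["x"] < 1, "l_out"),       # guard false
    Transition("l_assign", lambda v: {"x": v["x"] - 1}, "l_tick"),
    Transition("l_tick", lambda v: v["x"], "l_while"),          # cost = x
]

def run(cfg, valuation):
    # Walk the CFG on a concrete valuation, accumulating the tick costs.
    label, cost = "l_while", 0.0
    while label != "l_out":
        for t in cfg:
            if t.source != label:
                continue
            if label == "l_while":      # branching label: evaluate guard
                if t.rule(valuation):
                    label = t.target
                    break
            elif label == "l_assign":   # assignment label: apply update
                valuation = t.rule(valuation)
                label = t.target
                break
            else:                       # tick label: accumulate cost
                cost += t.rule(valuation)
                label = t.target
                break
    return cost

print(run(cfg, {"x": 4}))  # ticks 3 + 2 + 1 + 0 = 6.0
```

Each step of the walk matches one transition of the CFG, which is exactly the view used later when the program is turned into a stochastic process.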
Example 2.1.
Figure 2.1 provides an example program and its CFG. We assume fixed probability distributions for the two sampling variables. In this program, the value of the first program variable is incremented by a sampling variable, each of whose two possible values is taken with the corresponding probability. Then, the second program variable is assigned a random value sampled from the other sampling variable. The tick command then incurs a cost given by an arithmetic expression over the program variables, which serves as the cost function.
Runs and schedulers.
A(n infinite) run of a program is an infinite sequence of labels and valuations to program variables that respects the rules of the CFG. A scheduler is a policy that chooses the next step, based on the history of the program, when the program reaches a nondeterministic choice. For more formal semantics see Appendix LABEL:app:semantic.
Termination time (HolgerPOPL).
The termination time is a random variable $T$ defined on program runs as the first time the run reaches the terminal label $\ell_{\mathit{out}}$; if the run never reaches $\ell_{\mathit{out}}$, we define $T = \infty$. Note that $T$ is a stopping time on program runs. Intuitively, the termination time of a run is the number of steps it takes for the run to reach the termination label, or $\infty$ if it never terminates.
Types of termination (kaminski2018hardness; HolgerPOPL; ChatterjeeFG16).
A program is said to almost-surely terminate if it terminates with probability 1 under any scheduler. Similarly, a program is finitely terminating if it has finite expected termination time over all schedulers. Finally, a program has the concentration property, or concentratedly terminates, if there exist positive constants $c_1$ and $c_2$ such that for sufficiently large $n$, we have $\mathbb{P}(T > n) \le c_1 \cdot e^{-c_2 \cdot n}$ for all schedulers, i.e. if the probability that the program takes $n$ steps or more decreases exponentially as $n$ grows.
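As an illustration of the concentration property, the following Python sketch (a toy example of ours) estimates the tail probability $\mathbb{P}(T \ge n)$ for a loop that exits with probability $1/2$ at each iteration; the tail decays exponentially, as the definition requires:

```python
import random

# For a loop that exits with probability p = 1/2 at each step,
# P(T >= n) = (1/2)^n: an exponential decay, so the program
# terminates concentratedly.
def termination_time(p=0.5):
    n = 0
    while random.random() >= p:  # stay in the loop with probability 1 - p
        n += 1
    return n

random.seed(1)
runs = [termination_time() for _ in range(100_000)]
tail = sum(1 for t in runs if t >= 5) / len(runs)
print(tail)  # close to 0.5 ** 5 = 0.03125
```

Programs whose loops exit only after a random walk hits a boundary can still satisfy the property, but the constants $c_1, c_2$ are then less obvious.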
Termination analysis of probabilistic programs is a widely-studied topic. For automated approaches, see (ChatterjeeFG16; AgrawalC018; SriramCAV; mciver2017new).
2.3. Expected Accumulated Cost
The main notion we use in cost analysis of nondeterministic probabilistic programs is the expected accumulated cost until program termination. This concept naturally models the total cost of execution of a program in the average case. We now formalize this notion.
Cost of a run.
We define the random variable $C_n$ as the cost at the $n$-th step in a run, which is equal to the value of the cost function if the $n$-th executed statement is a tick statement, and is zero otherwise. Formally, given a run in which the $n$-th executed statement is tick($q$) and the current valuation is $\nu_n$, we define $C_n = q(\nu_n)$, and $C_n = 0$ if the $n$-th executed statement is not a tick.
Moreover, we define the random variable $C$ as the total cost of all steps, i.e. $C = \sum_{n=0}^{\infty} C_n$. Note that when the program terminates, the run remains in the terminal state and does not trigger any further costs. Hence, $C$ represents the total accumulated cost until termination. Given a scheduler $\sigma$ and an initial valuation $\nu$ to program variables, we define $\mathbb{E}_{\sigma,\nu}(C)$ as the expected value of the random variable $C$ over all runs that start with $\nu$ and use $\sigma$ for making choices at nondeterministic points.
Definition 2.2 (Expected Accumulated Cost).
Given an initial valuation $\nu$ to program variables, the maximum expected accumulated cost, $\overline{\mathbb{E}}(\nu)$, is defined as $\overline{\mathbb{E}}(\nu) = \sup_{\sigma} \mathbb{E}_{\sigma,\nu}(C)$, where $\sigma$ ranges over all possible schedulers.
Intuitively, $\overline{\mathbb{E}}(\nu)$ is the maximum expected total cost of the program until termination, i.e. assuming a scheduler that resolves nondeterminism so as to maximize the total accumulated cost.
In this work, we focus on automated approaches for finding polynomial bounds on the maximum expected accumulated cost.
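The following Python sketch (a toy program of ours, not from the paper) illustrates the definition: for a loop with one nondeterministic branch per iteration, it estimates the expected accumulated cost under each of the two memoryless schedulers and takes the maximum:

```python
import random

# Toy program with one nondeterministic choice per iteration:
#   while x > 0: { if * then tick(1)
#                  else { if prob(0.5) then tick(3) else tick(0) };
#                  x := x - 1 }
# A memoryless scheduler here is just a fixed choice of branch.
def expected_cost(scheduler, x0=10, trials=50_000):
    total = 0.0
    for _ in range(trials):
        x, cost = x0, 0.0
        while x > 0:
            if scheduler == "then":
                cost += 1.0                              # deterministic tick
            else:
                cost += 3.0 if random.random() < 0.5 else 0.0  # prob. tick
            x -= 1
        total += cost
    return total / trials

random.seed(2)
costs = {s: expected_cost(s) for s in ("then", "else")}
sup_cost = max(costs.values())
print(costs)     # roughly {'then': 10.0, 'else': 15.0}
print(sup_cost)  # the maximum expected accumulated cost, roughly 15
```

For this program general schedulers cannot beat the better memoryless one, since the two branches' expected costs do not depend on the history; in general, the supremum ranges over all history-dependent schedulers.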
3. Motivating Examples
In this section, we present several motivating examples for the expected cost analysis of nondeterministic probabilistic programs. Previous general approaches for probabilistic programs, such as (pldi18), require the following restrictions: (a) stepwise costs are nonnegative; and (b) stepwise costs are bounded. We present natural examples which do not satisfy the above restrictions. Our examples are as follows:

In Section 3.1, we present an example of Bitcoin mining, where the costs are both positive and negative, but bounded. Then in Section 3.2, we present an example of Bitcoin pool mining, where the costs are both positive and negative, as well as unbounded, but the updates to the variables at each program execution step are bounded.

In Section 3.3, we present an example of queuing networks which also has unbounded costs but bounded updates to the variables.

In Section 3.4, we present an example of stochastic linear recurrences, where the costs are nonnegative but unbounded, and the updates to the variable values are also unbounded.
3.1. Bitcoin Mining
Popular decentralized cryptocurrencies, such as Bitcoin and Ethereum, rely on proof-of-work Blockchain protocols to ensure a consensus about ownership of funds and validity of transactions (nakamoto2008bitcoin; vogelstellerethereum). In these protocols, a subset of the nodes of the cryptocurrency network, called miners, repeatedly try to solve a computational puzzle. In Bitcoin, the puzzle is to invert a hash function, i.e. to find a nonce value such that the SHA256 hash of the state of the Blockchain and the nonce becomes less than a predefined threshold (nakamoto2008bitcoin). The first miner to find such a nonce is rewarded with a fixed number of bitcoins. If several miners find correct nonces at almost the same time, which happens with very low probability, only one of them is rewarded and the solutions found by the other miners are discarded (baliga2017understanding).
Given the one-way property of hash functions, the only strategy for a miner is to constantly try randomly-generated nonces until one of them leads to the desired hash value. Therefore, a miner’s chance of getting the next reward is proportional to her computational power. Bitcoin mining uses considerable electricity and is therefore very costly (de2018bitcoin).
Bitcoin mining can be modeled by the nondeterministic probabilistic program given in Figure 3.1. In this program, a miner starts with an initial balance of $x_0$ and mines as long as he has some money left for the electricity costs. At each step, he generates and checks a series of random nonces, which incurs an electricity cost of $c$. With probability $p$, one of the generated nonces solves the puzzle. When this happens, with probability $q$ the current miner is the only one who has solved the puzzle and receives a reward of $r$ units. However, with probability $1-q$, other miners have also solved the same puzzle at roughly the same time. In this case, whether the miner receives his reward or not is decided by nondeterminism. The values of the parameters $c$, $r$, $p$ and $q$ can be found experimentally in the real world. Basically, $c$ is the cost of electricity for the miner, which depends on location, $r$ is the reward for solving the puzzle, which depends on the Bitcoin exchange rate, and $p$ and $q$ depend on the total computational power of the Bitcoin network, which can be estimated at any time (hashrate). In the sequel, we assume fixed values for these parameters.
Remark 1.
Note that in the example of Figure 3.1, the costs are both positive (the electricity cost $c$) and negative (the reward $-r$), but bounded by the constants $c$ and $r$. Also, all updates to the program variable are bounded at every step.
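The miner program can be simulated directly. The following Python sketch uses illustrative parameter values (not taken from the paper) and the adversarial scheduler that always withholds the reward on a tie, which maximizes the expected accumulated cost:

```python
import random

# Monte Carlo sketch of the miner program.  All parameter values below are
# illustrative: electricity cost c per round, reward r on a solo solution,
# puzzle solved with probability p, solo solver with probability q.
def mine(x0=100.0, c=1.0, r=10.0, p=0.05, q=0.9):
    x, cost = x0, 0.0
    while x >= c:                         # mine while money is left
        x -= c
        cost += c                         # tick(c): electricity, positive cost
        if random.random() < p:           # a nonce solves the puzzle
            if random.random() < q:       # sole solver: guaranteed reward
                x += r
                cost -= r                 # tick(-r): reward, negative cost
            # else: tie; the adversarial scheduler discards the reward,
            # which maximizes the accumulated cost
    return cost

random.seed(3)
mean_cost = sum(mine() for _ in range(2_000)) / 2_000
print(mean_cost)  # close to x0 = 100, since cost = x0 - final balance
```

With these parameters the expected balance drift per round is $-c + p\,q\,r = -0.55 < 0$, so the loop terminates almost surely and the simulation is well-defined.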
3.2. Bitcoin Pool Mining
As mentioned earlier, a miner’s chance of solving the puzzle in Bitcoin is proportional to her computational power. Given that the overall computational power of the Bitcoin network is enormous, there is a great deal of variance in miners’ revenues, e.g. a miner might not find a solution for several months or even years, and then suddenly find one and earn a huge reward. To decrease the variance in their revenues, miners often collaborate in
mining pools (rosenfeld2011analysis).
A mining pool is created by a manager who guarantees a steady income for all participating miners. This income is proportional to the miner’s computational power. Any miner can join the pool and assign his computational power to solving puzzles for the pool, instead of for himself, i.e. when a puzzle is solved by a miner participating in a pool, the reward is paid to the pool manager (ChatterjeeErgodic). Pools charge participation fees, so in the long term, the expected income of a participating miner is less than what he would be expected to earn by mining on his own.
A pool can be modeled by the probabilistic program in Figure 3.2. The manager starts the pool with a number of identical miners (this assumption does not affect the generality of our modeling: if the miners have different computational powers, a more powerful miner can be modeled as a union of several less powerful miners). At each time step, the manager has to pay each miner a fixed amount. Miners perform the mining as in Figure 3.1. Note that their mining revenue now belongs to the pool manager. Finally, at each time step, a small stochastic change happens in the number of miners, i.e. a miner might choose to leave the pool or a new miner might join it. The probability of such changes can also be estimated experimentally. In our example, the number of miners increases by one with probability $p_+$, decreases by one with probability $p_-$, and does not change with probability $1 - p_+ - p_-$.
Remark 2.
In the example of Figure 3.2, the costs are again both positive and negative. Moreover, since the number of miners can grow arbitrarily, the stepwise costs are unbounded. However, the updates to the program variables at each step of the execution are bounded.
3.3. Queuing Networks
A well-studied structure for modeling parallel systems is the Fork and Join (FJ) queuing network (alomari2014efficient). An FJ network consists of several processors, each with its own dedicated queue (Figure 5). When a job arrives, the network probabilistically divides (forks) it into one or more parts and assigns each part to one of the processors by adding it to the respective queue. Each processor processes the jobs in its queue on a first-in-first-out basis. When all parts of a job have been processed, the results are joined and the job is completed. The processing time of a job is the amount of time from its arrival until its completion.
FJ networks have been used to model and analyze the efficiency of a wide variety of parallel systems (alomari2014efficient), such as web service applications (menasce2004response), complex network intrusion detection systems (alomari2012autonomic), MapReduce frameworks (dean2008mapreduce), programs running on multicore processors (hill2008amdahl), and health care applications such as diagnosing patients based on test results from several laboratories (almomen2012design).
An FJ network can be modeled as a probabilistic program. For example, the program in Figure 3.3 models a network with two processors that accepts jobs for a fixed number of time units. At each unit of time, one unit of work is processed from each queue, and there is a fixed probability that a new job arrives. The network then probabilistically decides to assign the job to the first processor, to the second processor, or to divide it among them, each with a fixed probability. We assume that all jobs are identical; the two processors need different fixed numbers of time units to process a whole job, and when a job is divided among them, each processor needs a correspondingly smaller number of time units to finish its part. The variables $q_1$ and $q_2$ model the lengths of the queues of the two processors, and the program cost models the total processing time of the jobs.
Note that the processing time is computed from the point of view of the jobs and does not model the actual time spent on each job by the processors. Instead, it is defined as the amount of time from the moment the job enters the network until the moment it is completed. Hence, the processing time can be computed as soon as the job is assigned to the processors, and is equal to the length of the longest queue.
Remark 3.
In the example of Figure 3.3, note that the costs, i.e. the processing times of the jobs, depend on the lengths of the queues and are therefore unbounded. However, all updates to the program variables are bounded, i.e. a queue size is increased by at most a constant at each step of the program. The maximal update appears in the assignments that add a newly arrived job to the queues.
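A direct simulation of such a two-processor network is straightforward. In the following Python sketch, all numeric parameters (arrival probability, job sizes, split sizes) are illustrative choices of ours rather than the paper's:

```python
import random

# Sketch of a two-processor fork-join network: each time unit, one unit of
# work is processed from each queue; a new job arrives with probability p
# and goes entirely to queue 1, entirely to queue 2, or is split between
# them.  The cost of a job is the length of the longer queue at assignment
# time, i.e. its processing time.
def fj_network(n=1000, p=0.3):
    q1 = q2 = 0
    cost = 0.0
    for _ in range(n):
        q1, q2 = max(q1 - 1, 0), max(q2 - 1, 0)  # process one unit per queue
        if random.random() < p:                   # a new job arrives
            u = random.random()
            if u < 1 / 3:
                q1 += 3          # whole job to processor 1 (3 units of work)
            elif u < 2 / 3:
                q2 += 2          # whole job to processor 2 (2 units of work)
            else:
                q1 += 2          # split: larger part to processor 1
                q2 += 1          # smaller part to processor 2
            cost += max(q1, q2)  # processing time = longest queue
    return cost

random.seed(4)
avg = sum(fj_network() for _ in range(200)) / 200
print(avg)  # Monte Carlo estimate of the expected accumulated cost
```

Note that both queues are stable under these parameters (expected arriving work per step is below the unit service rate), so the queue lengths, and hence the stepwise costs, stay small on average while remaining unbounded in the worst case.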
3.4. Stochastic Linear Recurrences
Linear recurrences are systems that consist of a finite set $\mathbf{x}$ of variables, together with a finite set of linear update rules. At each step of the system’s execution, one of the rules is chosen and applied to the variables. Formally, if there are $n$ variables, then we consider $\mathbf{x}$ to be a vector of length $n$; each rule is a linear transformation, and applying a rule corresponds to the assignment $\mathbf{x} := A_i\,\mathbf{x}$ for the corresponding matrix $A_i$. This process continues as long as a given condition is satisfied. Linear recurrences are well-studied and appear in many contexts, e.g. to model linear dynamical systems, in theoretical biology, and in statistical physics (see (joel1; joel2; joelSURVEY)). A classical example is the so-called species fight in ecology.
A natural extension of linear recurrences is to consider stochastic linear recurrences, where at each step the rule to be applied is chosen probabilistically. Moreover, the cost of the process at each step is a linear combination of the variables. Hence, a general stochastic linear recurrence is a program of the form shown in Figure 3.4.
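The general shape of such a program can be sketched as a short simulation. In the following Python sketch, the update matrices, probabilities, cost vector and termination threshold are illustrative placeholders:

```python
import random

# Stochastic linear recurrence sketch: at each step a linear update A_i is
# chosen with probability p_i and applied to the variable vector x; the step
# cost is a linear combination c . x of the variables (nonnegative here).
def step(x, rules):
    u, acc = random.random(), 0.0
    for prob, a in rules:
        acc += prob
        if u < acc:
            # matrix-vector product A_i x
            return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]
    return x

def run(x0, rules, cost_vec, threshold=1.0, max_steps=10_000):
    x, cost = list(x0), 0.0
    for _ in range(max_steps):
        if max(x) < threshold:  # termination condition on the variables
            break
        cost += sum(c * xi for c, xi in zip(cost_vec, x))
        x = step(x, rules)
    return cost

# Two contracting updates, each chosen with probability 1/2.
rules = [(0.5, [[0.5, 0.0], [0.0, 0.9]]),
         (0.5, [[0.9, 0.0], [0.0, 0.5]])]
random.seed(5)
avg = sum(run([8.0, 8.0], rules, [1.0, 1.0]) for _ in range(1_000)) / 1_000
print(avg)  # Monte Carlo estimate of the expected accumulated cost
```

Since both updates contract the variables in expectation, the process terminates quickly; with expansive matrices, as in the species-fight example below, both the costs and the updates become unbounded.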
We present a concrete instantiation of such a program in the context of species fight. Consider a fight between two types of species, $A$ and $B$, where there is a finite number of each type in the initial population. The types compete and might also prey upon each other. The fitness of the types depends on the environment, which evolves stochastically. For example, the environment may represent the temperature, and one type might have an advantage over the other in a warm/cold environment. The cost we model is the amount of resources consumed by the population. Hence, it is a linear combination of the populations of the two types (i.e. each individual consumes some resources at each time step).
Figure 8 provides an explicit example, in which, with some probability, the environment becomes hospitable to $A$, which leads to an increase in its population; assuming that $A$ preys on $B$, this leads to a decrease in the population of $B$. On the other hand, the environment might become hostile to $A$, which leads to an increase in $B$’s population. Moreover, each individual of either type $A$ or $B$ consumes 1 unit of resource per time unit. We also assume that a population below a certain threshold is unsustainable and leads to extinction.
Remark 4.
Note that in Figure 8, there are unbounded costs and unbounded updates to the variables. However, the costs are always nonnegative.
4. Main Ideas and Novelty
In this work, our main contribution is an automated approach for obtaining polynomial bounds on the expected accumulated cost of nondeterministic probabilistic programs. In this section, we present an outline of our main ideas, and a discussion on their novelty in comparison with previous approaches. The key contributions are organized as follows: (a) mathematical foundations; (b) soundness of the approach; and (c) computational results.
4.1. Mathematical Foundations
Martingalebased approach.
The previous approach of (pldi18) can only handle nonnegative bounded costs. Their main technique is to consider potential functions and probabilistic extensions of weakest precondition, which relies on monotonicity. This is the key reason why the costs must be nonnegative. Instead, our approach is based on martingales, and can hence handle both positive and negative costs.
Extension of OST.
A standard mathematical result for the analysis of martingales is the Optional Stopping Theorem (OST). The OST provides a set of conditions on a (super)martingale that are sufficient to ensure bounds on its expected value at a stopping time. One of the requirements of the OST is the so-called bounded difference condition, i.e. that there should exist a constant $c$ such that the stepwise difference $|X_{n+1} - X_n|$ is always at most $c$. In program cost analysis, this condition translates to the requirement that the stepwise cost function at each program point must be bounded by a constant. Unfortunately, it is well-known that the bounded difference condition in the OST is an essential prerequisite, and thus application of the classical OST can only handle programs with bounded costs.
We present an extension of the OST that provides new conditions for handling differences that are not bounded by a constant, but instead by a polynomial in the step number $n$. Hence, our extended OST can be applied to programs such as the motivating examples in Sections 3.1, 3.2 and 3.3. The details of the OST extension are presented in Section 5.
4.2. Soundness of the Approach
For a sound approach to compute polynomial bounds on expected accumulated cost, we present the following results (details in Section 6):

We define the notions of polynomial upper cost supermartingale (PUCS) and polynomial lower cost submartingale (PLCS) for upper and lower bounds of the expected accumulated cost over probabilistic programs, respectively (see Section 6.1).

For the case where the costs can be both positive and negative (bounded or unbounded), but the variable updates are bounded, we use our extended OST to establish that PUCS’s and PLCS’s provide a sound approach to obtain upper and lower bounds on the expected accumulated cost (see Section 6.2).

For costs that are nonnegative (even with unbounded updates), we show that PUCS’s provide a sound approach to obtain upper bounds on the expected accumulated cost (see Section 6.3). The key mathematical result we use here is the Monotone Convergence theorem. We do not need OST in this case.
4.3. Computational Results
By our definition of PUCS/PLCS, a candidate polynomial is a PUCS/PLCS for a given program if it satisfies a number of polynomial inequalities, which can be obtained from the CFG of the program. Hence, we reduce the problem of synthesizing a PUCS/PLCS to solving a system of polynomial inequalities. Such systems can be solved using quantifier elimination, which is computationally expensive. Instead, we present an alternative sound method using a Positivstellensatz, i.e. a theorem in real semialgebraic geometry that characterizes positive polynomials over a semialgebraic set. In particular, we use Handelman’s Theorem to show that, given a nondeterministic probabilistic program, a PUCS/PLCS can be synthesized by solving a linear programming instance of polynomial size (wrt the size of the input program and invariant). Hence, our sound approach for obtaining polynomial bounds on the expected accumulated cost of a program runs in polynomial time. The details are presented in Section 7.
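To illustrate why Handelman's Theorem yields a linear program, consider a toy instance (our own, not from the paper): certifying that $f(x) = x^2 - x + 1$ is positive on $[0,1] = \{x : x \ge 0,\ 1 - x \ge 0\}$. A Handelman certificate is a nonnegative combination of products of the constraints; once the combination is fixed, checking it is coefficient matching, which is linear in the unknown multipliers:

```python
# Handelman's theorem (sketch): a polynomial positive on a polytope
# {x : g_1(x) >= 0, ..., g_m(x) >= 0} can be written as a nonnegative linear
# combination of products of the g_i's.  Fixing a degree bound makes the
# unknown coefficients appear linearly, so finding them is an LP.
# Toy certificate for f(x) = x^2 - x + 1 on [0,1] with g1 = x, g2 = 1 - x:
#   f = 1 * g1^2 + 1 * g2.

def poly_mul(p, q):
    # Multiply polynomials given as coefficient lists (index = degree).
    r = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            r[i + j] += pi * qj
    return r

def poly_add(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0.0) + (q[i] if i < len(q) else 0.0)
            for i in range(n)]

g1 = [0.0, 1.0]        # x
g2 = [1.0, -1.0]       # 1 - x
f = [1.0, -1.0, 1.0]   # 1 - x + x^2

certificate = poly_add(poly_mul(g1, g1), g2)   # 1*g1^2 + 1*g2
print(certificate)  # [1.0, -1.0, 1.0] == f, so the certificate checks out
```

In the full approach, the multipliers (here both equal to 1) are unknowns constrained to be nonnegative, and matching the coefficients of the candidate bound against the products of the invariant's constraints gives exactly the linear programming instance mentioned above.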
4.4. Novelty
The main novelties of our approach are as follows:

In contrast to previous approaches (such as (pldi18)) that can only handle bounded nonnegative costs (due to monotonicity requirements), our approach can handle both positive and negative costs, as well as unbounded costs. In particular, unlike previous approaches, our approach can handle the motivating examples of Section 3. Moreover, our approach presents a novel extension of classical results on martingales.

While the previous approach of (pldi18) can only provide sound upper bounds for bounded nonnegative costs, our approach, for positive and negative costs with the restriction of bounded updates to the variables, provides both upper and lower bounds on the expected accumulated cost. Thus, for the examples of Sections 3.1, 3.2 and 3.3, we obtain both upper and lower bounds.

We present an efficient computational approach for obtaining bounds on the expected accumulated cost. Our algorithm has a provable polynomial runtime guarantee. The previous approach of (pldi18) presents compositional inference rules and does not provide any polynomial runtime guarantee for the computation.
4.5. Limitations
We now discuss some limitations of our approach.

As in previous approaches, such as (pldi18; ijcai18), we need to assume that the input program terminates.

For programs with both positive and negative costs, we handle either bounded updates to variables or bounded costs. The most general case, with both unbounded costs and unbounded updates, remains open.

For unbounded updates to variables, we consider nonnegative costs and present only upper bounds, not lower bounds. However, note that our approach is the first to present any lower bounds for cost analysis of probabilistic programs (with bounded updates to variables); no previous approach obtains lower bounds in any case.

While the previous approach of (pldi18) presents compositional inference proof rules, our approach does not provide such compositional rules. However, the efficiency of our approach comes from the fact that our algorithm is provably polynomial-time and relies on efficient linear-programming solvers.
5. The Extension of the OST
The Optional Stopping Theorem (OST) states that, given a martingale (resp. supermartingale), if its stepwise difference is bounded, then its expected value at a stopping time is equal to (resp. no greater than) its initial value.
Theorem 5.1 (Optional Stopping Theorem (OST) (williams1991probability; doob1971martingale)).
Consider any stopping time $T$ w.r.t. a filtration $\{\mathcal{F}_n\}_{n \in \mathbb{N}}$ and any martingale (resp. supermartingale) $\{X_n\}_{n \in \mathbb{N}}$ adapted to $\{\mathcal{F}_n\}_{n \in \mathbb{N}}$, and let $Y := X_T$. Then the following condition is sufficient to ensure that $\mathbb{E}[|Y|] < \infty$ and $\mathbb{E}[Y] = \mathbb{E}[X_0]$ (resp. $\mathbb{E}[Y] \le \mathbb{E}[X_0]$):

$\mathbb{E}[T] < \infty$, and there exists an $M > 0$ such that for all $n \in \mathbb{N}$, $|X_{n+1} - X_n| \le M$ almost surely.
It is well-known that the stepwise bounded-difference condition (i.e. the existence of a constant bound $M$ on $|X_{n+1} - X_n|$) is an essential prerequisite (williams1991probability). Below we present our extension of the OST to unbounded differences.
Theorem 5.2 (The Extended OST).
Consider any stopping time $T$ w.r.t. a filtration $\{\mathcal{F}_n\}_{n \in \mathbb{N}}$ and any martingale (resp. supermartingale) $\{X_n\}_{n \in \mathbb{N}}$ adapted to $\{\mathcal{F}_n\}_{n \in \mathbb{N}}$, and let $Y := X_T$. Then the following condition is sufficient to ensure that $\mathbb{E}[|Y|] < \infty$ and $\mathbb{E}[Y] = \mathbb{E}[X_0]$ (resp. $\mathbb{E}[Y] \le \mathbb{E}[X_0]$):

There exist real numbers $M, c_1, c_2, c_3 > 0$ such that (i) for sufficiently large $n \in \mathbb{N}$, it holds that $\mathbb{P}(T > n) \le c_1 \cdot e^{-c_2 \cdot n}$, and (ii) for all $n \in \mathbb{N}$, $|X_{n+1} - X_n| \le c_3 \cdot n^{M}$ almost surely.
Intuition and proof idea.
We extend the OST so that the stepwise difference need not be bounded by a constant, but may instead grow as a polynomial in the step counter $n$. In exchange, we require that the stopping time $T$ satisfies a concentration condition specifying an exponential decrease of $\mathbb{P}(T > n)$ in $n$. We present a rigorous proof that uses the Monotone and Dominated Convergence Theorems, together with the concentration bounds and polynomial differences, to establish the above result. For technical details, see Appendix LABEL:app:OST.
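To see what the classical OST guarantees on a bounded-difference martingale, consider a symmetric gambler's-ruin walk on $\{0, \dots, N\}$ stopped upon hitting $0$ or $N$ (a toy instance of our own, not an example from this paper). Here $\mathbb{E}[T] < \infty$ and the stepwise difference is $1$, so the OST applies and predicts $\mathbb{E}[X_T] = X_0$. The sketch below computes the hitting probabilities exactly, via Gauss-Seidel iteration on the harmonicity equations, and confirms the prediction:

```python
def hitting_probabilities(N, sweeps=20000):
    """p[k] = probability that a symmetric walk started at k reaches N before 0."""
    p = [0.0] * (N + 1)
    p[N] = 1.0
    for _ in range(sweeps):
        for k in range(1, N):
            # harmonicity: p_k = 0.5 * p_{k-1} + 0.5 * p_{k+1}
            p[k] = 0.5 * (p[k - 1] + p[k + 1])
    return p

N, x0 = 10, 3
p = hitting_probabilities(N)
# the stopped value X_T is N with probability p[x0], and 0 otherwise
expected_stopped_value = N * p[x0]
# OST prediction: E[X_T] equals the initial value x0 (closed form: p[k] = k/N)
```

Stopping the same walk at the hitting time of $0$ alone would break the prerequisite $\mathbb{E}[T] < \infty$, and there the OST conclusion indeed fails; this is the sense in which the side conditions are essential.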
6. Polynomial Cost Martingales
In this section, we introduce the notion of polynomial cost martingales, which serve as the main tool for reducing the cost analysis problem over nondeterministic probabilistic programs to the analysis of a stochastic process.
6.1. Definitions
Below, we fix a probabilistic program and its CFG of form (1). In order to apply our extended OST for cost analysis of the program, it should first be translated into a discrete-time stochastic process. This is achieved using the concept of cost martingales. To define cost martingales, we first need the notions of invariants and pre-expectation.
Definition 6.1 (Invariants and linear invariants).
Given a program, its set $L$ of labels, and an initial valuation $\nu_0$ to program variables $V$, an invariant is a function $I$ that assigns a set $I(\ell)$ of valuations over $V$ to every label $\ell \in L$, such that for all configurations $(\ell, \nu)$ that are reachable from the initial configuration by a run of the program, it holds that $\nu \in I(\ell)$. The invariant $I$ is called linear if every $I(\ell)$ is a finite union of polyhedra.
Intuition.
An invariant is an overapproximation of the reachable valuations at each label of the program. The invariant is linear if it can be represented by linear inequalities.
Example 6.2 ().
Definition 6.3 (Pre-expectation).
Consider any function $h$ that maps each configuration $(\ell, \nu)$ to a real number. We define its pre-expectation as the function $\mathrm{pre}_h$ over configurations given by:

$\mathrm{pre}_h(\ell, \nu) = 0$ if $\ell$ is the terminal label;

$\mathrm{pre}_h(\ell, \nu) = \mathbb{E}_{r}\bigl[h(\ell', F(\nu, r))\bigr]$ if $\ell$ is an assignment label with the update function $F$, and the next label is $\ell'$. Note that in the expectation $\mathbb{E}_{r}[\cdot]$, the values of $\ell$ and $\nu$ are treated as constants and $r$ observes the probability distributions specified for the sampling variables;

$\mathrm{pre}_h(\ell, \nu) = \mathbf{1}_{\nu \models \phi} \cdot h(\ell_1, \nu) + \mathbf{1}_{\nu \not\models \phi} \cdot h(\ell_2, \nu)$ if $\ell$ is a branching label with guard $\phi$, where $\ell_1, \ell_2$ are the labels for the true-branch and the false-branch, respectively. The indicator $\mathbf{1}_{\nu \models \phi}$ is equal to $1$ when $\nu$ satisfies $\phi$ and $0$ otherwise. Conversely, $\mathbf{1}_{\nu \not\models \phi}$ is $1$ when $\nu$ does not satisfy $\phi$ and $0$ when it does;

$\mathrm{pre}_h(\ell, \nu) = p \cdot h(\ell_1, \nu) + (1 - p) \cdot h(\ell_2, \nu)$ if $\ell$ is a probabilistic label that moves to $\ell_1$ with probability $p$ and to $\ell_2$ with probability $1 - p$;

$\mathrm{pre}_h(\ell, \nu) = c(\nu) + h(\ell', \nu)$ if $\ell$ is a tick label with the cost function $c$ and the successor label $\ell'$;

$\mathrm{pre}_h(\ell, \nu) = \max\{h(\ell_1, \nu), h(\ell_2, \nu)\}$ if $\ell$ is a nondeterministic label with successors $\ell_1, \ell_2$.
Intuition.
The pre-expectation $\mathrm{pre}_h(\ell, \nu)$ is the cost of the current step plus the expected value of $h$ in the next step of the program execution, i.e. the step after the configuration $(\ell, \nu)$. In this expectation, $\ell$ and $\nu$ are treated as constants. For example, the pre-expectation at a probabilistic branching label is the probability-weighted sum over the values of $h$ at all possible successor labels.
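The case analysis of Definition 6.3 can be sketched as a single dispatcher over label kinds. The dictionary-based program encoding below is our own hypothetical illustration, not the paper's formalism; for the nondeterministic case we take the maximum over successors, the pessimistic convention suited to upper bounds:

```python
def pre_expectation(h, label, nu):
    """Cost of the current step plus the expected value of h one step after (label, nu)."""
    kind = label["kind"]
    if kind == "terminal":
        return 0.0
    if kind == "assign":
        # nu is treated as a constant; r ranges over the sampling distribution
        return sum(pr * h(label["next"], label["update"](nu, r))
                   for r, pr in label["dist"])
    if kind == "branch":
        # the guard's indicator selects exactly one of the two branches
        return h(label["true"] if label["guard"](nu) else label["false"], nu)
    if kind == "prob":
        # probability-weighted sum over all successor labels
        return sum(pr * h(succ, nu) for succ, pr in label["succ"])
    if kind == "tick":
        return label["cost"](nu) + h(label["next"], nu)
    if kind == "nondet":
        # worst case over the nondeterministic successors (upper-bound view)
        return max(h(succ, nu) for succ in label["succ"])
    raise ValueError("unknown label kind: " + kind)

# a tiny example: probabilistic choice between a unit-cost tick and immediate exit
terminal = {"kind": "terminal"}
tick = {"kind": "tick", "cost": lambda nu: 1.0, "next": terminal}
prob = {"kind": "prob", "succ": [(tick, 0.5), (terminal, 0.5)]}
nondet = {"kind": "nondet", "succ": [tick, terminal]}

h = lambda label, nu: {"terminal": 0.0, "tick": 4.0}.get(label["kind"], 0.0)
# pre at tick: 1.0 + h(terminal) = 1.0; at prob: 0.5*4.0 + 0.5*0.0 = 2.0; at nondet: 4.0
```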
Example 6.4 ().
In Figure 6.4 (top) we consider the same program as in Example 2.1. Recall that the probability distributions used for sampling variables and are and , respectively. The table in Figure 6.4 (bottom) provides an example function $h$ and the corresponding pre-expectation $\mathrm{pre}_h$. The gray part shows the steps in computing the function and the black part is the final result. (The reason for choosing this particular $h$ will be clarified by Example 6.6.)
We now define the central notion of cost martingales. For algorithmic purposes, we only consider polynomial cost martingales in this work. We start with the notion of a PUCS, which is meant to serve as an upper bound for the expected accumulated cost of a program.
Definition 6.5 (Polynomial Upper Cost Supermartingales).
A polynomial upper cost supermartingale (PUCS) of degree $k$ w.r.t. a given linear invariant $I$ is a function $h$ that maps each configuration $(\ell, \nu)$ to a real number and satisfies the following conditions:

(C1) for each label $\ell$, $h(\ell, \cdot)$ is a polynomial of degree at most $k$ over the program variables;

(C2) for all valuations $\nu$, we have $h(\ell_{\mathrm{out}}, \nu) = 0$, where $\ell_{\mathrm{out}}$ denotes the terminal label;

(C3) for all non-terminal labels $\ell$ and reachable valuations $\nu \in I(\ell)$, we have $\mathrm{pre}_h(\ell, \nu) \le h(\ell, \nu)$.
Intuition.
Informally, (C1) specifies that the PUCS should be polynomial at each label, (C2) says that the value of the PUCS at the terminal label should always be zero, and (C3) specifies that at all reachable configurations , the preexpectation is no more than the value of the PUCS itself.
Note that if $h$ is polynomial in the program variables, then $\mathrm{pre}_h(\ell, \cdot)$ is also polynomial if $\ell$ is an assignment, probabilistic branching or tick label. For example, in the case of assignment labels, $\mathbb{E}_{r}[h(\ell', F(\nu, r))]$ is polynomial in $\nu$ if both $h$ and the update function $F$ are polynomial.
Example 6.6 ().
We now define the counterpart of PUCS for lower bound.
Definition 6.7 (Polynomial Lower Cost Submartingales).
A polynomial lower cost submartingale (PLCS) w.r.t. a linear invariant $I$ is a function $h$ that satisfies (C1) and (C2) above, and the additional condition (C3') below (instead of (C3)):

(C3') for all non-terminal labels $\ell$ and reachable valuations $\nu \in I(\ell)$, we have $\mathrm{pre}_h(\ell, \nu) \ge h(\ell, \nu)$.
Intuitively, a PUCS requires the pre-expectation $\mathrm{pre}_h$ to be no more than $h$ itself, while a PLCS requires the converse, i.e. that $\mathrm{pre}_h$ should be no less than $h$.
Example 6.8 ().
In the following sections, we prove that PUCS’s and PLCS’s are sound methods for obtaining upper and lower bounds on the expected accumulated cost of a program.
6.2. General Unbounded Costs and Bounded Updates
In this section, we consider nondeterministic probabilistic programs with general unbounded costs, i.e. both positive and negative costs, and bounded updates to the program variables. Using our extension of the OST (Theorem 5.2), we show that PUCS’s and PLCS’s are sound for deriving upper and lower bounds for the expected accumulated cost.
Recall that the extended OST has two prerequisites. One is that, for sufficiently large $n$, the stopping time $T$ should have an exponentially decreasing tail, i.e. $\mathbb{P}(T > n) \le c_1 \cdot e^{-c_2 \cdot n}$. The other is that the stepwise difference should be bounded by a polynomial in the number of steps. We first describe how these conditions affect the type of programs that can be considered, and then provide our formal soundness theorems.
The first prerequisite is equivalent to the assumption that the program has the concentration property. To ensure it, we apply the existing approach of difference-bounded ranking-supermartingale maps (ChatterjeeFNH16; ChatterjeeFG16). We ensure the second prerequisite by assuming the bounded update condition, i.e. that every assignment to each program variable changes the value of the variable by a bounded amount. We first formalize the concept of bounded update and then argue why it is sufficient to ensure the second prerequisite.
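The concentration prerequisite can be observed numerically on a toy loop of our own (not one of the paper's benchmarks): for `while x >= 1: { x := x - 1 with probability 3/4; x := x + 1 otherwise }`, the termination time $T$ has an exponentially decreasing tail. The sketch computes $\mathbb{P}(T > n)$ exactly by dynamic programming over the distribution of $x$:

```python
def survival_probability(p, x0, n):
    """Exact P(T > n) for the walk x -> x-1 (prob. p) / x -> x+1 (prob. 1-p),
    started at x0 and stopped as soon as x reaches 0."""
    dist = {x0: 1.0}                       # distribution over unabsorbed states
    for _ in range(n):
        nxt = {}
        for x, mass in dist.items():
            if x - 1 >= 1:                 # still running after a decrement
                nxt[x - 1] = nxt.get(x - 1, 0.0) + mass * p
            # x - 1 == 0 means the run terminates; that mass is dropped
            nxt[x + 1] = nxt.get(x + 1, 0.0) + mass * (1.0 - p)
        dist = nxt
    return sum(dist.values())

# tails at n = 10, 20, 40: each doubling of n shrinks the tail geometrically
tail = [survival_probability(0.75, 1, n) for n in (10, 20, 40)]
```

For p = 3/4 the tail decays roughly like $(\sqrt{3}/2)^{n}$ up to polynomial factors, so a bound of the form $c_1 \cdot e^{-c_2 \cdot n}$ holds; for the symmetric case p = 1/2 the decay is only polynomial and the concentration property fails.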
Definition 6.9 (Bounded Update).
A program with invariant $I$ has the bounded update property over its program variables if there exists a constant $c > 0$ such that for every assignment label $\ell$ with update function $F$, every $\nu \in I(\ell)$, every sampled value $r$, and every program variable $x$, we have $|F(\nu, r)(x) - \nu(x)| \le c$.
The reason for assuming bounded update.
A consequence of the bounded update condition is that at the $n$-th execution step of any run of the program, the absolute value of any program variable $x$ is bounded by $|\nu_0(x)| + c \cdot n$, where $c$ is the constant bound in the definition above and $\nu_0(x)$ is the initial value of the variable $x$. Hence, for large enough $n$, the absolute value of any variable is bounded by $(|\nu_0(x)| + c) \cdot n$. Therefore, given a PUCS $h$ of degree $k$, one can verify that the stepwise difference of $h$ is bounded by a polynomial in the number of steps: $h$ is a degree-$k$ polynomial over variables that are bounded by $O(n)$, so $|h(\ell_{n+1}, \nu_{n+1}) - h(\ell_n, \nu_n)|$ is bounded by $\zeta \cdot n^{k}$ for some constant $\zeta > 0$. Thus, the bounded update condition is sufficient to fulfill the second prerequisite of our extended OST.
Based on the discussion above, we have the following soundness theorems:
Theorem 6.10 (Soundness of PUCS).
Consider a nondeterministic probabilistic program $P$, with a linear invariant $I$ and a PUCS $h$. If $P$ satisfies the concentration property and the bounded update property, then for all initial valuations $\nu_0$, the expected accumulated cost of $P$ is at most $h(\ell_0, \nu_0)$, where $\ell_0$ is the initial label.
Proof Sketch.
We define the stochastic process $\{y_n\}_{n \in \mathbb{N}}$ as $y_n = (\ell_n, \nu_n)$, where $\ell_n$ is the random variable representing the label at the $n$-th step of a program run, and $\nu_n$ is a vector of random variables whose components represent the values of the program variables at the $n$-th step. Furthermore, we construct the stochastic process $\{X_n\}_{n \in \mathbb{N}}$ such that $X_n = h(\ell_n, \nu_n) + \sum_{i=0}^{n-1} C_i$. Recall that $C_i$ is the cost of the $i$-th step of the run, so that $X_0 = h(\ell_0, \nu_0)$ at the initial configuration. We consider the termination time $T$ of $P$ and prove that $\{X_n\}$ satisfies the prerequisites of our extended OST (Theorem 5.2). This proof depends on the assumption that $P$ has the concentration and bounded update properties. Then, by applying Theorem 5.2, we have $\mathbb{E}[X_T] \le \mathbb{E}[X_0] = h(\ell_0, \nu_0)$. Since $h$ vanishes at the terminal label by (C2), $X_T$ is exactly the accumulated cost, and we obtain the desired result. For a more detailed proof, see Appendix LABEL:app:PUPFs. ∎
Example 6.11 ().
Given that the function $h$ of Example 6.4 is a PUCS, we can conclude that, for all initial values of the program variables, the expected accumulated cost is at most the value of $h$ at the initial configuration.
We showed that PUCS’s are sound upper bounds for the expected accumulated cost of a program. The following theorem provides a similar result for PLCS’s and lower bounds.
Theorem 6.12 (Soundness of PLCS).
Consider a nondeterministic probabilistic program $P$, with a linear invariant $I$ and a PLCS $h$. If $P$ satisfies the concentration property and the bounded update property, then for all initial valuations $\nu_0$, the expected accumulated cost of $P$ is at least $h(\ell_0, \nu_0)$, where $\ell_0$ is the initial label.
The proof is similar to that of Theorem 6.10 and is relegated to Appendix LABEL:app:PLPFs.
Example 6.13 ().
Given that the function of Example 6.4 is a PLCS, we can conclude that, for all initial values of the program variables, the expected accumulated cost is at least its value at the initial configuration.
Remark 5 ().
Remark 6 ().
Note that the motivating examples in Sections 3.1, 3.2 and 3.3, i.e. Bitcoin mining, Bitcoin pool mining and FJ queuing networks, have potentially unbounded costs that can be both positive and negative. Moreover, they satisfy the bounded update property. Therefore, using PUCS’s and PLCS’s leads to sound bounds on the expected accumulated costs of these programs.
6.3. Unbounded Nonnegative Costs and General Updates
In this section, we consider programs with unbounded nonnegative costs, and show that a PUCS is a sound upper bound for their expected accumulated cost. This result holds for programs with arbitrary unbounded updates to the variables.
Our main tool is the well-known Monotone Convergence Theorem (MCT) (williams1991probability), which states that if $X$ is a random variable and $\{X_n\}_{n \in \mathbb{N}}$ is a nondecreasing discrete-time stochastic process of nonnegative random variables such that $\lim_{n \to \infty} X_n = X$ almost surely, then $\mathbb{E}[X] = \lim_{n \to \infty} \mathbb{E}[X_n]$.
As in the previous case, the first step is to translate the program into a stochastic process. However, in contrast with the previous case, here we only consider nonnegative PUCS's, because all costs are assumed to be nonnegative. We present the following soundness result:
Theorem 6.14 (Soundness of nonnegative PUCS).
Consider a nondeterministic probabilistic program $P$, with a linear invariant $I$ and a nonnegative PUCS $h$. If all the stepwise costs in $P$ are always nonnegative, then for all initial valuations $\nu_0$, the expected accumulated cost of $P$ is at most $h(\ell_0, \nu_0)$, where $\ell_0$ is the initial label.
Proof Sketch.
We define the stochastic process $\{X_n\}_{n \in \mathbb{N}}$ as in the proof of Theorem 6.10, i.e. $X_n = h(\ell_n, \nu_n) + \sum_{i=0}^{n-1} C_i$. By definition, for all $n$, we have $\mathbb{E}[X_{n+1} \mid \mathcal{F}_n] \le X_n$, hence by induction we get $\mathbb{E}[X_n] \le \mathbb{E}[X_0] = h(\ell_0, \nu_0)$. Given that $h$ is nonnegative, $\sum_{i=0}^{n-1} C_i \le X_n$, so $\mathbb{E}\bigl[\sum_{i=0}^{n-1} C_i\bigr] \le h(\ell_0, \nu_0)$ for every $n$. The accumulated costs are nondecreasing in $n$ (all costs are nonnegative), so by applying the MCT we obtain $\mathbb{E}\bigl[\sum_{i=0}^{T-1} C_i\bigr] \le h(\ell_0, \nu_0)$, which is the desired result. For a more detailed proof, see Appendix LABEL:app:PUPFs2. ∎
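The MCT step can be made concrete on a hypothetical one-variable program (our own illustration) that pays cost 1 per step and terminates at each step with probability q, so the total cost T is geometric with mean 1/q. The truncated accumulated costs $\mathbb{E}[\min(T, n)]$ form exactly the kind of nondecreasing sequence to which the MCT is applied:

```python
def truncated_expected_cost(q, n):
    # E[min(T, n)] = sum_{k=0}^{n-1} P(T > k) = sum_{k=0}^{n-1} (1 - q)^k
    return sum((1.0 - q) ** k for k in range(n))

q = 0.25
costs = [truncated_expected_cost(q, n) for n in range(1, 200)]
# the sequence is nondecreasing and converges to E[T] = 1/q = 4
```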
Remark 7 ().
Note that the motivating example in Section 3.4, i.e. the species fight stochastic linear recurrence, has unbounded nonnegative costs. Therefore, nonnegative PUCS’s lead to sound upper bounds on the expected accumulated cost of this program.
7. Algorithmic Approach
In the previous section, we showed that in order to derive bounds on the expected accumulated cost of a program, it suffices to synthesize a PUCS/PLCS. In this section, we provide automated algorithms that, given a program $P$, an initial valuation $\nu_0$, a linear invariant $I$ and a constant $k$, synthesize a PUCS/PLCS of degree $k$. For brevity, we only describe our algorithm for PUCS synthesis; a PLCS can be synthesized in the same manner. Our algorithms run in polynomial time and reduce the problem of PUCS/PLCS synthesis to a linear programming instance by applying Handelman's theorem.
In order to present Handelman's theorem, we need a few basic definitions. Let $V$ be a finite set of variables and $\Gamma$ a finite set of linear functions (degree-$1$ polynomials) over $V$. We define $\Pi(\Gamma)$ as the set of all valuations to the variables in $V$ that satisfy $g \ge 0$ for all $g \in \Gamma$. We also define the monoid set of $\Gamma$ as
$$\mathrm{Monoid}(\Gamma) := \Bigl\{ \prod_{i=1}^{t} g_i \;\Big|\; t \in \mathbb{N} \text{ and } g_1, \dots, g_t \in \Gamma \Bigr\},$$
where the empty product ($t = 0$) is the constant polynomial $1$.
By definition, it is obvious that if $u \in \mathrm{Monoid}(\Gamma)$, then for every $\nu \in \Pi(\Gamma)$, we have $u(\nu) \ge 0$. Handelman's theorem characterizes every polynomial that is positive over $\Pi(\Gamma)$.
Theorem 7.1 (Handelman’s Theorem (handelman1988representing)).
Let $g$ be a polynomial such that $g(\nu) > 0$ for all $\nu \in \Pi(\Gamma)$. If $\Pi(\Gamma)$ is compact, then
$$g = \sum_{i=1}^{d} a_i \cdot u_i$$
for some $d \in \mathbb{N}$, $a_1, \dots, a_d \ge 0$ and $u_1, \dots, u_d \in \mathrm{Monoid}(\Gamma)$.
Intuitively, Handelman's theorem asserts that every polynomial that is positive over $\Pi(\Gamma)$ must be a nonnegative linear combination of polynomials in $\mathrm{Monoid}(\Gamma)$. This means that in order to synthesize a polynomial that is positive over $\Pi(\Gamma)$, we can limit our attention to polynomials of the form $\sum_i a_i \cdot u_i$. When using Handelman's theorem in our algorithm, we fix a constant $D$ and only consider those elements of $\mathrm{Monoid}(\Gamma)$ that are obtained by multiplying $D$ elements of $\Gamma$ or fewer.
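As a hand-checked instance of the theorem (our own toy example): take $\Gamma = \{x,\ 1 - x\}$, so that $\Pi(\Gamma) = [0, 1]$, and let $g(x) = x^2 - x + 1$, which is strictly positive on $[0, 1]$. One decomposition with nonnegative coefficients over the monoid is $g = 1 \cdot x + 1 \cdot (1 - x)^2$. The sketch verifies the identity by polynomial arithmetic on coefficient lists:

```python
def poly_mul(p, q):
    """Multiply polynomials given as dense coefficient lists (index = degree)."""
    r = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def poly_add(p, q):
    coeff = lambda c, i: c[i] if i < len(c) else 0.0
    return [coeff(p, i) + coeff(q, i) for i in range(max(len(p), len(q)))]

x = [0.0, 1.0]              # the linear function x
one_minus_x = [1.0, -1.0]   # the linear function 1 - x
g = [1.0, -1.0, 1.0]        # x^2 - x + 1

# candidate Handelman decomposition: 1*x + 1*(1-x)^2
decomposition = poly_add(x, poly_mul(one_minus_x, one_minus_x))
```

Since `decomposition` equals `g` coefficient-by-coefficient, $g$ is certified positive on $[0, 1]$ without any case analysis; the synthesis algorithm below searches for such nonnegative coefficients automatically.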
We now have all the required tools to describe our algorithm for synthesizing a PUCS.
PUCS Synthesis Algorithm.
The algorithm has four steps:

Creating a Template for $h$. Let $V$ be the set of program variables. According to (C1), we aim to synthesize a PUCS $h$ such that for each label $\ell$ of the program, $h(\ell, \cdot)$ is a polynomial of degree at most $k$ over $V$. Let $M_k$ be the set of all monomials of degree at most $k$ over the variables in $V$. Then, $h(\ell, \cdot)$ has to be of the form $\sum_{m \in M_k} a_{\ell, m} \cdot m$ for some unknown real values $a_{\ell, m}$. We call this expression a template for $h$. Note that by condition (C2), the template for the terminal label is simply $0$. The algorithm computes these templates at every label $\ell$, treating the $a_{\ell, m}$'s as unknown variables.

Computing Pre-expectation. The algorithm symbolically computes a template for $\mathrm{pre}_h$ using Definition 6.3 and the template obtained for $h$ in step (1). This template will also contain the $a_{\ell, m}$'s as unknown variables.

Pattern Extraction. The algorithm then processes condition (C3) by symbolically computing the polynomials $f_\ell = h(\ell, \cdot) - \mathrm{pre}_h(\ell, \cdot)$ for every non-terminal label $\ell$. Then, as in Handelman's theorem, it rewrites each $f_\ell$ in the form $f_\ell = \sum_i b_i \cdot u_i$ with $b_i \ge 0$ and $u_i \in \mathrm{Monoid}(\Gamma_\ell)$, using the linear invariant $I(\ell)$ as the set $\Gamma_\ell$ of linear functions. The nonnegativity of $h$ is handled in a similar way. This effectively translates (C3) and the nonnegativity into a system of linear equalities over the $a_{\ell, m}$'s and the new nonnegative unknown variables $b_i$ resulting from the equation $f_\ell = \sum_i b_i \cdot u_i$.

Solution via Linear Programming. The algorithm calls an LP-solver to find a solution of the resulting system that minimizes the template's value $h(\ell_0, \nu_0)$ at the initial configuration.
If the algorithm is successful, i.e. if the obtained system of linear equalities is feasible, then the solution to the LP contains values for the unknowns and hence we get the coefficients of the PUCS $h$. Note that we are optimizing for $h(\ell_0, \nu_0)$, so the obtained PUCS is the one that produces the best polynomial upper bound for the expected accumulated cost of $P$ with initial valuation $\nu_0$. We use the same algorithm for PLCS synthesis, except that we replace (C3) with (C3').
Theorem 7.2 ().
The algorithm above has polynomial runtime and synthesizes sound upper and lower bounds for the expected accumulated cost of the given program $P$.
Proof.
Step (1) ensures that (C1) and (C2) are satisfied, while step (3) forces the polynomials $h(\ell, \cdot) - \mathrm{pre}_h(\ell, \cdot)$ and $h$ to be nonnegative over the invariant, ensuring nonnegativity and (C3). So the synthesized $h$ is a PUCS. Steps (1)-(3) are polynomial-time symbolic computations. Step (4) solves an LP of polynomial size. Hence, the runtime is polynomial w.r.t. the length of the program. The reasoning for PLCS synthesis is similar. ∎
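To make the pipeline concrete on a single toy instance (a hypothetical program of our own, with the resulting one-variable LP solved by inspection rather than an LP-solver): for `while x >= 1: { x := x - 1 with probability p; x := x + 1 otherwise }; tick(1)` with p = 3/4, a linear template h(x) = a*x (condition (C2) fixes the terminal value to 0) turns condition (C3) into the single constraint a >= cost/(2p - 1), and minimizing the objective h(x0) picks the smallest feasible a:

```python
def synthesize_linear_pucs_bound(p, cost, x0):
    """Toy PUCS synthesis for the biased-walk loop above, template h(x) = a*x.

    (C3) at the loop head, for x >= 1:
        cost + a*(p*(x - 1) + (1 - p)*(x + 1)) <= a*x
    which simplifies to  cost - a*(2p - 1) <= 0,  i.e.  a >= cost/(2p - 1).
    """
    drift = 2.0 * p - 1.0          # expected decrease of x per iteration
    if drift <= 0.0:
        raise ValueError("template infeasible: loop needs negative drift")
    a = cost / drift               # minimal feasible coefficient
    return a * x0                  # optimized objective h(x0)

bound = synthesize_linear_pucs_bound(p=0.75, cost=1.0, x0=10.0)
# with p = 3/4 and unit costs, the synthesized bound is 2*x0
```

For p = 3/4 and unit costs the expected number of loop iterations from x0 is exactly x0/(2p - 1) = 2*x0, so here the synthesized PUCS is tight; a full implementation would instead emit the Handelman constraints of step (3) for an LP solver.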
4. Main Ideas and Novelty
In this work, our main contribution is an automated approach for obtaining polynomial bounds on the expected accumulated cost of nondeterministic probabilistic programs. In this section, we present an outline of our main ideas, and a discussion on their novelty in comparison with previous approaches. The key contributions are organized as follows: (a) mathematical foundations; (b) soundness of the approach; and (c) computational results.
4.1. Mathematical Foundations
Martingale-based approach.
The previous approach of (pldi18) can only handle nonnegative bounded costs. Its main technique is to consider potential functions and probabilistic extensions of the weakest precondition, which rely on monotonicity. This is the key reason why the costs must be nonnegative. Instead, our approach is based on martingales, and can hence handle both positive and negative costs.
Extension of OST.
A standard mathematical result for the analysis of martingales is the Optional Stopping Theorem (OST). The OST provides a set of conditions on a (super)martingale that are sufficient to ensure bounds on its expected value at a stopping time. One of the requirements of the OST is the so-called bounded-difference condition, i.e. there should exist a constant $M$ such that the stepwise difference $|X_{n+1} - X_n|$ is always at most $M$. In program cost analysis, this condition translates to the requirement that the stepwise cost at each program point must be bounded by a constant. Unfortunately, it is well-known that the bounded-difference condition of the OST is an essential prerequisite, and thus the classical OST can only handle programs with bounded costs.
We present an extension of the OST that provides certain new conditions for handling differences that are not bounded by a constant, but instead by a polynomial in the step number $n$. Hence, our extended OST can be applied to programs such as the motivating examples in Sections 3.1, 3.2 and 3.3. The details of the OST extension are presented in Section 5.