1 Introduction
Continuity properties of systems. A continuity property for a system requires that the change in the output be bounded by a monotone function of the change in the input. The analysis of continuity properties is of great interest in program analysis and reactive-system analysis, for example: (a) robustness of numerical computations; (b) analysis of the sensitivity of numerical queries [22] in databases; (c) analysis of the stability of learning algorithms [11]; and (d) robustness analysis of programs [19].
Probabilistic systems. Continuity analysis is similarly relevant for probabilistic systems, where the notion of continuity is extended to expected continuity, which averages over the probabilistic behaviours of the system. For example, statistical notions of differential privacy [21]; robustness analysis of Markov chains, Markov decision processes, and stochastic games [2, 24, 12, 20, 14]; and stability analysis of randomized learning algorithms [11, 27] all fall under the umbrella of continuity analysis of probabilistic systems.
Program sensitivity. The notion of particular interest among continuity properties is program sensitivity (aka Lipschitz continuity), which specifies that the change in the output is proportional to the change in the input. Formally, there is a constant (the Lipschitz constant) such that if the input changes by some amount, then the change in the output is at most that amount scaled by the constant. In this work we consider the expected sensitivity of probabilistic programs given as probabilistic while loops.
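As a small standalone illustration (our own toy map and constant, not from the paper), the following Python snippet checks a Lipschitz sensitivity bound empirically:

```python
import random

# Minimal sketch: a map with Lipschitz constant L = 0.5, so the sensitivity
# bound |f(x) - f(y)| <= L * |x - y| holds for all inputs x, y.
L = 0.5

def f(x):
    return L * x + 3.0

# Empirically check the bound on random input pairs.
for _ in range(1000):
    x, y = random.uniform(-10.0, 10.0), random.uniform(-10.0, 10.0)
    assert abs(f(x) - f(y)) <= L * abs(x - y) + 1e-9
print("Lipschitz bound holds on all sampled pairs")
```

The rest of the paper asks for the same kind of bound, but on the *expected* output of a probabilistic loop rather than on a deterministic map.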
Previous results. The expected-sensitivity analysis of probabilistic programs was considered in [5], where an elegant method is presented based on a relational program-logic framework. The heart of the analysis technique is coupling-based methods, and the approach is shown to work very nicely on several examples from learning to statistical physics. However, the approach works on examples of probabilistic while loops where the number of loop iterations is fixed and bounded (i.e., the number of iterations is fixed to a given number). Many examples of probabilistic while loops do not have a fixed number of iterations; rather, the number of iterations is randomized and the expected number of iterations is finite. In this work we consider sensitivity analysis of probabilistic programs where the number of iterations is not fixed; rather, it is stochastic and depends on probabilistic sampling variables.
Our contributions. Our main contributions are as follows:

We present a sound approach for the sensitivity analysis of probabilistic while loops where the number of iterations is not fixed; rather, the expected number of iterations is finite.

In contrast to the previous coupling-based approach, our approach is based on ranking supermartingales (RSMs) and a continuity property of the loop body. We first present the results for nonexpansive loops, which correspond to the case where the Lipschitz constant is 1, and then present the result for general loops.

Since RSM-based approaches can be automated through constraint solving, those results in conjunction with our sound approach yield an automated approach for the sensitivity analysis of probabilistic programs. We demonstrate the effectiveness of our approach on several examples, including a case study of the regularized stochastic gradient descent algorithm and experimental results on examples such as mini-roulette.
Technical contribution. In terms of technical contribution, there are key differences between our approach and the previous approach [5]. The advantage of [5] is that it presents a compositional approach based on probabilistic coupling; however, this leads to complex proof rules with side conditions. Moreover, the approach is interactive in the sense that a coupling is provided manually and then the proof is automated. In contrast, our approach is based on RSMs and a continuity property; it leads to an automated approach and can handle a stochastic number of iterations. However, it does not have the compositional properties of [5].
2 Probabilistic Programs
We present the syntax and semantics of our probabilistic programming language as follows. Throughout the paper, we denote by ℕ, ℤ, and ℝ the sets of all natural numbers (including zero), integers, and real numbers, respectively.
Syntax. Our probabilistic programming language is imperative and composed of statements. We present a succinct description below (see Appendix 0.A for details).

Variables. Expressions (resp. ) range over program (resp. sampling) variables.

Constants. Expressions range over decimals.

Arithmetic Expressions. Expressions (resp. ) range over arithmetic expressions over both program and sampling variables (resp. program variables). For example, if are program variables and is a sampling variable, then is an instance of and is an instance of . In this paper, we do not fix the syntax for and .

Boolean Expressions. Expressions range over propositional arithmetic predicates over program variables.

Statements. Assignment statements are indicated by ‘:=’; ‘skip’ is the statement that does nothing; standard conditional branches are indicated by the keyword ‘if’ accompanied by a propositional arithmetic predicate serving as the condition for the branch. While loops are indicated by the keyword ‘while’ with a propositional arithmetic predicate as the loop guard. Probabilistic choices are modeled as probabilistic branches with the keyword ‘if prob’ that lead to the then-branch with some probability and to the else-branch with the complementary probability.
In this work, we consider probabilistic programs without nondeterminism.
Semantics. We first recall several standard notions from probability spaces as follows (see e.g. standard textbooks [37, 9]).
Probability Spaces. A probability space is a triple consisting of a nonempty set (the so-called sample space), a sigma-algebra over it (i.e., a collection of subsets of the sample space that contains the empty set and is closed under complementation and countable union), and a probability measure on the sigma-algebra, i.e., a function into [0, 1] such that (i) the whole sample space has measure 1 and (ii) the measure of the union of any pairwise-disjoint sequence of sets equals the sum of their measures. Elements of the sigma-algebra are usually called events. An event is said to hold almost surely (a.s.) if it has probability 1.
Random Variables. A random variable (r.v.) on a probability space is a measurable function from the sample space to the reals, i.e., a function such that, for every real threshold, the set of sample points whose image lies below the threshold is an event.
Expectation. The expected value of a random variable on a probability space is defined as the Lebesgue integral of the random variable w.r.t. the probability measure; the precise definition of the Lebesgue integral is somewhat technical and is omitted here (cf. [37, Chapter 5] for a formal definition). In the case that the range of the random variable is countable, the expectation is the sum of each value weighted by its probability.
To present the semantics, we also need the notion of valuations.
Valuations. Let a finite set of variables be given, with an implicit linear order over its elements. A valuation on the set is a real vector whose ith coordinate is the value of the ith variable in the implicit linear order. For the sake of convenience, we write the value of a variable in a valuation by indexing the valuation with the variable.
Program and Sampling Valuations. A program valuation is a valuation on the set of program variables. Similarly, a sampling valuation is a valuation on the set of sampling variables. Given a program valuation and a propositional arithmetic predicate, the satisfaction relation is defined in the standard way: the valuation satisfies the predicate iff the predicate holds when the program variables are substituted by their corresponding values in the valuation.
The Semantics. We now give a brief description of the semantics of probabilistic programs. We follow the standard operational semantics through Markov chains (see e.g. [17, 13, 25]). Given a probabilistic program without nondeterminism, its semantics is given as a general state-space Markov chain (GSSMC) [33, Chapter 3] with a possibly uncountable state space. The state space consists of all pairs of program counters and program valuations, where the program counter refers to the next statement to be executed and the program valuation specifies the current values of the program variables; the kernel function that specifies the stochastic transitions between states is given by the statements of the program. For an initial state consisting of the initial program counter and an initial program valuation, each probabilistic program induces a unique probability space through its corresponding GSSMC: the sample space consists of all infinite sequences of states of the GSSMC (as runs), the sigma-algebra is generated by all cylinder sets of runs with a common finite prefix, and the probability measure is uniquely determined by the kernel function and the initial state. We write probabilities and expectations w.r.t. this measure for a probabilistic program with a given initial state. The detailed semantics can be found in [17, 13, 25].
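To make the GSSMC view concrete, here is a hedged Python sketch (a toy program of our own, not the paper's semantics machinery) that samples run prefixes of the induced Markov chain on (program counter, valuation) states:

```python
import random

# Toy program:  0: while x <= 2:  1: x := x + Bernoulli(0.5)   2: (end)
# Each state pairs a program counter with the current valuation of x.
def step(state):
    pc, x = state
    if pc == 0:                       # evaluate the loop guard
        return (1, x) if x <= 2 else (2, x)
    if pc == 1:                       # sample and perform the assignment
        return (0, x + (1 if random.random() < 0.5 else 0))
    return (2, x)                     # the terminal state is absorbing

def sample_run(x0, steps=50):
    """Sample a finite prefix of a run of the induced Markov chain."""
    state, run = (0, x0), []
    for _ in range(steps):
        run.append(state)
        state = step(state)
    return run

print(sample_run(0)[-1])   # with high probability a terminal state (2, x)
```

Events over runs (e.g. "the program terminates") are measured through cylinder sets of such prefixes, exactly as described above.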
3 Expected Sensitivity
Compared with the coupling-based definitions of expected sensitivity in [5], we consider average sensitivity, which directly compares the distance between the expected values from two nearby initial program valuations. Average sensitivity can be used to model algorithmic stability in many machine-learning algorithms (see e.g. [11]). In this paper, we focus on average sensitivity and will simply refer to it as expected sensitivity.
The following definition presents our notion of expected sensitivity. In order for the notion to be well-defined, we only consider probabilistic programs that terminate with probability one (i.e., almost-sure termination [17, 13, 28, 32]) for all initial program valuations.
Definition 1 (Expected Sensitivity)
Consider a probabilistic program that terminates with probability one for all initial program valuations. We say that the program is expected affine-sensitive in a program variable x over a set U of initial program valuations if there exist nonnegative real constants L, d such that for any initial program valuations b, b′ ∈ U with sufficiently small ‖b − b′‖∞, we have that

|E[X] − E[X′]| ≤ L · ‖b − b′‖∞ + d    (1)

where X, X′ are random variables representing the values of x after the execution of the program starting from the initial valuations b, b′ respectively, and the max-norm is defined by ‖b − b′‖∞ := max over all program variables y of |b(y) − b′(y)|. Furthermore, if we can choose d = 0 in (1), then we say that the program is expected linear-sensitive in the program variable x.
Thus, a program is expected affine-sensitive in a program variable x if the difference of the expected values of x after termination of the program is bounded by an affine function of the difference of the initial valuations. The program is expected linear-sensitive if the difference can be bounded by a linear function.
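Definition 1 can be probed numerically. The sketch below (our own toy loop and constants, not from the paper) estimates the expected outputs from two nearby initial valuations by Monte Carlo simulation and compares their gap:

```python
import random

# Toy almost-surely terminating loop:  while x <= 10: x += Uniform(0, 2).
def run_loop(x):
    while x <= 10.0:
        x += random.uniform(0.0, 2.0)
    return x

def expected_output(x0, trials=20000):
    """Monte Carlo estimate of E[X] for initial valuation x0."""
    return sum(run_loop(x0) for _ in range(trials)) / trials

# Perturb the initial valuation slightly and compare the expected outputs:
# expected affine-sensitivity asks |E[X] - E[X']| <= L * |b - b'| + d.
gap = abs(expected_output(5.0) - expected_output(5.01))
print(gap)   # small, consistent with an affine bound in |b - b'| = 0.01
```

Such sampling only suggests a bound; the rest of the paper develops sound proof rules instead.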
Remark 1 (Comparison with [5])
Our definition is looser than the coupling-based definition in [5], as it is shown in [5] that coupling-based expected sensitivity implies our notion of expected sensitivity. However, the converse does not hold. This is because when one chooses the same program and the same initial program valuations, the expected difference in our sense is zero, while the expected difference in [5] may still be greater than zero: the coupling-based definition focuses on the difference over the product of two independent runs, which may be nonzero as a probabilistic program typically has multiple probabilistic executions. Moreover, there is no coupling-based expected linear-sensitivity defined in [5], whereas we study expected linear-sensitivity directly from its definition.
4 Motivating Examples
In the following, we show several motivating examples for the expected-sensitivity analysis of probabilistic programs. We consider in particular probabilistic programs with a randomized number of execution steps. As existing results [5, 29] only consider probabilistic for-loops with a fixed number of loop iterations, none of the examples in this section can be handled by previous approaches.
Example 1 (Mini-roulette)
A particular gambler's-ruin game is called Mini-roulette, a popular casino game based on a 13-slot wheel. A player starts the game with a number of chips. She needs one chip to make a bet and she bets as long as she has chips. If she loses a bet, the chip is not returned, but a winning bet does not consume the chip and results in a specific amount of (monetary) reward, and possibly even more chips. The following types of bets can be placed at each round. (1) Even-money bets: in these bets, specific slots are chosen; the ball is rolled and the player wins the bet if it lands in one of the chosen slots. Winning gives a reward of two units and one extra chip. (2) 2-to-1 bets: these bets correspond to fewer chosen slots, and winning them gives a larger reward and more extra chips. (3) 3-to-1, 5-to-1 and 11-to-1 bets: these are defined similarly, with correspondingly smaller winning probabilities. Suppose at each round the player chooses each type of bet with equal probability. The probabilistic program for this example is shown in Figure 1 (left), where one program variable represents the number of chips and another records the accumulated reward. We also consider a continuous variant of this Mini-roulette example in Figure 1 (right), where we replace the increments to the reward variable by uniformly-distributed sampling variables; one may choose various uniform distributions that ensure the termination of the program. In both examples, we consider the expected sensitivity in the program variable that records the accumulated reward. Note that the number of loop iterations in the programs in Figure 1 is randomized and not fixed, as the loop guard depends on the number of chips and the change to it is random in each loop iteration.
Example 2 (Stochastic Gradient Descent)
The most widely used method in machine-learning practice is stochastic gradient descent (SGD). The general form of an SGD algorithm is depicted in Figure LABEL:fig:example4. In the figure, an SGD algorithm is modelled as a probabilistic while loop, where a sampling variable is sampled uniformly from the indices of the data, a vector of program variables represents the parameters to be learned, a program variable represents the index, and a constant represents the step size. The gradient operator is applied to per-datum loss functions, one for each input data point. By convention, the total loss function is given as the expectation of the sum of all per-datum loss functions. At each loop iteration, a data point is chosen uniformly from all data and the parameters are adjusted by the step size times the gradient of the corresponding loss function. The loop guard can either be practical, so that a fixed number of iterations is performed (as is analyzed in existing approaches [27, 5, 29] for expected sensitivity), or the local criterion that the magnitude of the gradient of the total loss function is small enough, or the global criterion that the value of the total loss function is small enough.
We consider a regularized version of the SGD algorithm. The idea is to add a penalty term to the loss function to control the complexity of the learned parameters, so as to make the learned parameters more reasonable and avoid overfitting [10]. A scheme for a regularized SGD is depicted in Figure LABEL:fig:sgdreg. In the figure, the main difference is that in each loop iteration the parameters are adjusted further by the gradient of the penalty term. This adjustment results from adding the penalty term to the total loss function, so that its gradient contributes to the learning of the parameters in every loop iteration.
To be more concrete, we consider applying the regularized SGD algorithm to the linear-regression problem: given a fixed number of input coordinates on the plane (as data), the task is to find the parameters such that the corresponding line equation best fits the input coordinates. The optimality is measured by a loss function composed of a penalty term and a per-coordinate loss for each input coordinate. With this choice of loss, the regularized SGD algorithm in Figure LABEL:fig:sgdreg can be applied directly to solve the linear-regression problem. While previous results [27, 5, 29] consider SGD with a fixed number of loop iterations, we choose the loop guard in Figure LABEL:fig:sgdreg to be the global condition that the magnitude of the gradient of the total loss function is above a given threshold. Then we consider the expected sensitivity of the regularized SGD for the linear-regression problem w.r.t. the initial parameters. Note that the regularized SGD algorithm in our setting does not have a fixed number of loop iterations, and thus cannot be analyzed by the previous approaches [27, 5, 29].
5 Proving Expected Sensitivity for Nonexpansive Loops
We consider simple probabilistic while loops and investigate sound approaches for proving expected sensitivity over such programs. A simple probabilistic while loop is of the form
while Φ do P od    (2)
where Φ is the loop guard, a propositional arithmetic predicate over program variables, and the loop body P is a statement without while loops.
Update Functions. Given a simple probabilistic while loop in the form (2) with disjoint sets of program and sampling variables, we can abstract away the detailed execution of the loop body by an update function as follows. First, consider the set of all program counters that refer to a probabilistic branch in the loop body. Then consider the set of all functions from this set into the two branch outcomes; informally, such a function specifies, for each probabilistic branch, which branch (i.e., either the then- or the else-branch) is chosen in an actual loop iteration. Finally, the update function simply gives the program valuation after the loop iteration, given (i) a function that specifies the probabilistic choices at the probabilistic branches, (ii) a program valuation that specifies the values of the program variables before the loop iteration, and (iii) a sampling valuation that gives the sampled values for the sampling variables in the loop iteration.
Runs. We also simplify the notion of runs for programs in the form (2). A run of a probabilistic loop in the form (2) is an infinite sequence of program valuations such that the ith valuation is the program valuation before the ith execution of the loop body. Note that if the current valuation satisfies the loop guard, then the next valuation is obtained by applying the update function with some resolution of the probabilistic branches and some sampled values for the sampling variables; otherwise, the valuation stays unchanged.
Notation. To ease the use of notation, we always use dedicated letters (with super-/subscripts) for program valuations, sampling valuations and resolutions of the probabilistic branches.
We consider simple probabilistic while loops whose loop body satisfies a continuity property below. We note that the continuity of the loop body is a natural requirement in ensuring the sensitivity of the whole loop.
Definition 2 (Continuity of the Loop Body)
We say that the loop body of a loop in the form (2) is continuous if there is a real constant L such that, for any two program valuations that satisfy the loop guard and any fixed choice of probabilistic branches and sampled values, the max-norm distance between the program valuations after the loop iteration is at most L times the distance between the valuations before the iteration.
If we can choose L = 1, then we say that the loop is nonexpansive.
Below we illustrate a running example.
Example 3 (Running Example)
Consider the simple probabilistic while loop in Figure 4 on Page 4. In the program, there is one program variable and one sampling variable. Informally, in every loop iteration, the value of the program variable is increased by a value sampled w.r.t. the probability distribution of the sampling variable, until the value of the program variable exceeds a given bound. There is no probabilistic branch, so the set of branch resolutions is a singleton that only contains the empty function. The update function for the loop body simply adds the sampled value to the program variable. By definition, the loop is nonexpansive.
5.1 Proving Expected Affine-Sensitivity
We demonstrate our sound approach for proving expected affine-sensitivity of nonexpansive loops. By definition, if a simple probabilistic while loop is nonexpansive and has a good termination property, then it is intuitive that a fixed number of executions of its loop body yields expected affine-sensitivity. We show that this intuition is correct if we use ranking-supermartingale maps (RSM-maps) to ensure the termination property of probabilistic programs. Below we fix a simple probabilistic while loop in the form (2) with its update function, loop guard and loop body. As we consider a simple class of while loops, we also consider a simplified version of RSM-maps that uses the update function to abstract away the detailed execution within the loop body.
Definition 3 (RSM-maps [17, 13, 25])
A ranking-supermartingale map (RSM-map) is a real-valued function over program valuations such that there exist real constants, including a positive decrease bound, satisfying the following conditions:

(A1) the map is nonnegative at every program valuation satisfying the loop guard;

(A2) at every program valuation satisfying the loop guard, the expected value of the map after one loop iteration decreases by at least the positive decrease bound;

(A3) at every program valuation violating the loop guard, the value of the map is bounded,

where the expectation is taken with the program valuation treated as a constant vector, observing the (joint) probability distributions of the sampling variables (resp. the probabilities of the probabilistic branches).
The existence of an RSM-map proves finite expected termination time of a probabilistic program [17, 25] (see Theorem 0.C.1 in Appendix 0.C). In this sense, an RSM-map controls the randomized number of loop iterations, so that the sensitivity of the whole loop follows nearly from the continuity of the loop body. However, there is another phenomenon to take into account: executions from different initial program valuations may have different numbers of loop iterations. In this situation, we need to ensure that when the execution from one initial program valuation terminates and the execution from the other does not, the final values of the other execution are not far from the values of the terminated execution. To ensure this property, we require (i) a bounded-update condition, stating that the value change of a program variable in one loop iteration is bounded, and (ii) an RSM-map that is continuous.
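The RSM-map conditions can be checked numerically on a toy loop. The sketch below uses our own candidate map (an assumption for illustration, not from the paper) and verifies nonnegativity inside the guard and the expected decrease per iteration:

```python
import random

# Toy loop:  while x <= 10: x += Uniform(0, 2), with candidate map
# eta(x) = 12 - x.  Inside the guard, eta is nonnegative and decreases
# in expectation by about 1 per iteration (the sampled increment's mean).
def eta(x):
    return 12.0 - x

def expected_post(x, trials=50000):
    """Monte Carlo estimate of E[eta(x')] after one loop iteration."""
    return sum(eta(x + random.uniform(0.0, 2.0)) for _ in range(trials)) / trials

for x in [0.0, 3.0, 7.0, 10.0]:        # valuations satisfying the guard
    assert eta(x) >= 0.0               # nonnegativity inside the guard
    drop = eta(x) - expected_post(x)
    assert drop > 0.9                  # expected decrease close to 1
print("candidate map behaves like an RSM-map on the tested valuations")
```

In practice such maps are synthesized by constraint solving rather than checked by sampling; the sketch only illustrates what the conditions demand.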
Definition 4 (Bounded Update)
We say that the loop has bounded update in a program variable if there exists a real constant such that, in any single loop iteration, the change in the value of the variable is at most that constant in absolute value, for all program valuations satisfying the loop guard, all sampling valuations and all resolutions of the probabilistic branches.
Definition 5 (Continuity of RSM-maps)
An RSM-map is continuous if there exists a real constant such that, for any two program valuations, the difference between their RSM-map values is at most that constant times the max-norm distance between the valuations.
We now present the main result for proving expected affine-sensitivity of nonexpansive simple probabilistic while loops.
Theorem 5.1
A nonexpansive probabilistic loop in the form (2) is expected affine-sensitive in a program variable over its loop guard if (i) the loop has bounded update in the program variable and (ii) there exists a continuous RSM-map for the loop.
Proof (Sketch)
Let there be a bound on the update of the program variable from Definition 4, and a continuous RSM-map with the constants from Definition 3 and the continuity constant from Definition 5. Consider any two initial program valuations satisfying the loop guard that are sufficiently close, and consider independent executions of the loop from the two valuations. We denote by a random variable the number of loop iterations of the execution starting from each initial program valuation, and by another random variable the value of the program variable after the execution from each initial valuation. We illustrate the main idea by clarifying the relationships between the program valuations in any pair of runs that start from the two initial valuations respectively and use the same sampled values in each loop iteration. Consider that both executions do not terminate before a given step. We have the following cases:

Both program valuations violate the loop guard. This case describes that the loop terminates exactly after the current iteration for both initial valuations. From the condition (B1), we directly obtain a bound on the difference between the final values.

Exactly one of the two program valuations violates the loop guard. W.l.o.g., we assume that the first valuation violates it and the second does not. From the property of RSM-maps (Theorem 0.C.1 in Appendix 0.C) together with the conditions (B1), (B3) and (A3), we can bound the expected difference contributed by this case.

Neither valuation violates the loop guard. In this case, the loop continues from both valuations, and in the next iteration the same analysis can be carried out for the next pair of program valuations.
From the termination property ensured by RSM-maps (Theorem 0.C.1 in Appendix 0.C), the probability that the third case happens infinitely often equals zero. Thus, the sensitivity analysis eventually reduces to the first two cases, and the first two cases derive the expected affine-sensitivity. From the first two cases, the difference contributed to the total sensitivity when one of the runs terminates at a given step is bounded; then by an integral expansion and a summation over all steps, we can derive the desired result. The detailed proof can be found in Appendix 0.C. ∎
Example 4 (Running Example)
Consider our running example in Figure 4 on Page 4. Here we choose that the sampling variable observes a Bernoulli distribution. We can construct an RSM-map and verify that it is also continuous. Hence from Theorem 5.1, we conclude that the loop is expected affine-sensitive in the program variable over its loop guard.
Example 5 (Mini-roulette)
We now show that the Mini-roulette example in Figure 1 (left) is expected affine-sensitive in the reward variable over its loop guard. To show this, we construct a candidate RSM-map and clarify the following points.

For any values of the program variables before a loop iteration and any resolution of the probabilistic branches, the change after the loop iteration is determined by the chosen probabilistic branch. The same applies to the reward variable. Thus the loop is nonexpansive.

All increments to the reward variable are bounded, hence the loop has bounded update in it.

The loop guard implies that the map is nonnegative, thus (A1) is satisfied. Inside the loop guard, the expected value of the map decreases after each iteration, ensuring (A2). Outside the loop guard, the map is bounded, ensuring (A3). Thus the map is an RSM-map.

Given any two valuations of the program variables, the difference between the map values is bounded by a constant times the distance between the valuations. Thus the map is continuous.
Thus by Theorem 5.1, we obtain that the program is expected affine-sensitive in the reward variable over its loop guard.
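A hedged simulation of a Mini-roulette-style loop is given below; the bet payoffs and probabilities are illustrative stand-ins chosen so that each bet decreases the expected number of chips, not the paper's exact figures:

```python
import random

# Gambler's-ruin-style loop: the player bets one chip per round; a losing
# bet consumes the chip, a winning bet keeps it, pays a reward, and may
# grant extra chips.  The loop guard is n >= 1.
BETS = [  # (win probability, reward, extra chips) -- illustrative values
    (6 / 13, 2, 1),   # "even-money"-style bet
    (4 / 13, 3, 2),   # "2-to-1"-style bet
    (3 / 13, 4, 3),   # "3-to-1"-style bet
]

def play(n):
    r = 0  # accumulated reward
    while n >= 1:
        p, reward, chips = random.choice(BETS)   # pick a bet type uniformly
        if random.random() < p:                  # winning bet
            r += reward
            n += chips
        else:                                    # losing bet consumes a chip
            n -= 1
    return r

print(play(3))
```

With these stand-in values each bet changes the expected chip count by -1/13, so the loop terminates almost surely while its iteration count stays random, which is the situation Theorem 5.1 handles.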
5.2 Proving Expected Linear-Sensitivity
To prove expected linear-sensitivity, one possible way is to extend the approach for expected affine-sensitivity. However, simply extending the approach is not correct, as shown by the following example.
Example 6
Consider again our running example in Figure 4, where the sampling variable observes a discrete distribution over two values. From Example 4, we have that the program is expected affine-sensitive in the program variable. However, we show that the program is not expected linear-sensitive in it. Consider two initial inputs that differ by a small positive amount: since we only add values from a fixed discrete set to the variable, the possible output values under the two inputs lie on two shifted lattices, and the expected outputs differ by an amount bounded away from zero. It follows that we cannot find a constant L such that the expected difference is at most L times the input difference as the input difference tends to zero.
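The failure of linear-sensitivity can be reproduced in simulation. The sketch below uses increments {1, 3} as our own stand-in for the Bernoulli increments of the example; it shows that the expected-output gap stays bounded away from zero as the input gap shrinks:

```python
import random

# Discrete loop: outputs from start 0 and start d live on shifted lattices,
# so the expected-output gap does not shrink proportionally to d.
def run_loop(x, n=10):
    while x <= n:
        x += random.choice([1, 3])      # assumed increments (stand-in values)
    return x

def expected_output(x0, trials=40000):
    return sum(run_loop(x0) for _ in range(trials)) / trials

for d in [0.5, 0.05, 0.005]:
    gap = abs(expected_output(0.0) - expected_output(d))
    print(d, gap)   # the gap stays bounded away from 0 as d -> 0
```

The gap persists because runs whose partial sum lands exactly on the bound take one extra iteration from the integer start but not from the perturbed start.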
The reason for the situation in Example 6 has been illustrated previously: under different initial program valuations, the loop may terminate after different numbers of loop iterations, even if the same values for the sampling variables are sampled. To handle this phenomenon, we introduce a condition that bounds the probability that it happens.
Definition 6 (Continuity-Upon-Termination)
We say that the loop body of a probabilistic while loop in the form (2) is continuous upon termination if there exists a constant such that the following probability is bounded by that constant times the distance between any two program valuations entering the loop body: the probability, over the sampled values and for a fixed resolution of the probabilistic branches, that after one loop iteration one valuation can still enter the loop while the other violates the loop guard and the loop stops.
Informally, the continuity-upon-termination property requires that when the program valuations entering the loop body are close, the probability that after the current loop iteration one of them stays in the loop while the other jumps out of the loop is small in proportion to their distance. This condition is satisfied in many situations where we have continuously-distributed sampling variables. For example, consider our running example (Example 3) where the sampling variable now observes the uniform distribution over an interval. Then for any two close initial values of the program variable, the probability that one run exits the loop while the other stays equals the chance that the sampled value falls in an interval whose length is the distance between the initial values; this is proportional to that distance, as the probability density function is bounded over the interval. Thus, the continuity-upon-termination property is satisfied.
To show the applicability of this property, we prove a result showing that a large class of simple affine probabilistic while loops has this property. Below we say that a propositional arithmetic predicate is affine if it can be equivalently rewritten into a DNF normal form of affine inequalities with constant matrices and constant vectors.
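The probability estimate in the running-example discussion above can be checked numerically; the sketch below (our own parameters) couples two runs on the same sampled value and measures how often exactly one of them exits:

```python
import random

# Toy loop  while x <= 10: x += Uniform(0, 1):  for nearby starts, the
# probability that exactly one of two coupled runs exits in the current
# iteration is linear in the distance between the starts.
def split_prob(x1, x2, trials=200000):
    hits = 0
    for _ in range(trials):
        r = random.uniform(0.0, 1.0)   # the same sampled value for both runs
        hits += (x1 + r > 10.0) != (x2 + r > 10.0)
    return hits / trials

for d in [0.4, 0.2, 0.1]:
    print(d, split_prob(9.5, 9.5 + d))   # each estimate is close to d itself
```

Here the split happens exactly when the sample falls in an interval of length d, matching the density-function argument above.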
Lemma 1
Given a probabilistic while loop in the form (2) with its loop body, loop guard and update function, if it holds that

the update function is an affine function in the input program and sampling valuations, and the loop guard is affine, equivalent to a DNF form of affine inequalities, and

all the sampling variables are continuously-distributed and have bounded probability density functions, and

the sampling variables are not dummy in the atomic propositions of the loop guard, i.e., in each affine inequality of the DNF form, the coefficients over the sampling variables are not all zero in any row,

then the loop has the continuity-upon-termination property.
The continuity-upon-termination property is a key property to ensure expected linear-sensitivity. With this property, we can bound, linearly in the difference between initial program valuations, the probability that the executions from different initial program valuations have different numbers of loop iterations. The main result for expected linear-sensitivity is as follows.
Theorem 5.2
A nonexpansive probabilistic loop in the form (2) is expected linear-sensitive in a program variable over its loop guard if (i) the loop has bounded update in the program variable, (ii) there exists a continuous RSM-map for the loop and (iii) its loop body has the continuity-upon-termination property.
Proof
The proof resembles the one for expected affine-sensitivity (Theorem 5.1). Consider runs that start from two close program valuations and use the same sampled values in each loop iteration, and consider that both executions do not terminate before a given step.
We have exactly the three cases demonstrated in the expected affine-sensitivity analysis, and again the sensitivity analysis eventually reduces to the first two cases (see the proof of Theorem 5.1). As we strengthen the conditions of Theorem 5.1 with the continuity-upon-termination property, we obtain a strengthened analysis for the second case (exactly one of the two valuations jumps out of the loop guard), as follows. W.l.o.g., we assume that the first valuation exits the loop and the second does not. As in the proof of Theorem 5.1, we obtain a bound on the difference for some constant. By the continuity-upon-termination property, the second case happens with probability at most a constant times the distance between the valuations, with the constant from Definition 6. Thus, the difference contributed to the total sensitivity when one of the runs terminates at a given step is linear in the distance, where one summand comes from the second case and the other from the first case. By summing over all steps, we obtain the desired expected linear-sensitivity. ∎
Example 7
We now show that the variant Mini-roulette example in Figure 1 (right) is expected linear-sensitive in the reward variable over its loop guard. To show this, we construct a candidate RSM-map and clarify the following points.

For any values of the program variables before a loop iteration and any resolution of the probabilistic branches, the change after the loop iteration is decided by the executed branch and its distribution. The same applies to the reward variable. Thus the loop body is nonexpansive.

All increments to the reward variable are bounded, hence the loop has bounded update in it.

The loop guard implies that the map is nonnegative, thus (A1) is satisfied. Inside the loop guard, the expected value of the map decreases after each iteration, ensuring (A2). Outside the loop guard, the map is bounded, ensuring (A3). Thus the map is an RSM-map.

Given any two valuations of the program variables, the difference between the map values is bounded by a constant times the distance between the valuations. Thus the map is continuous.

By Lemma 1, we conclude that the loop has the continuity-upon-termination property.
Then by Theorem 5.2, we conclude that this probabilistic program is expected linear-sensitive in the reward variable.
6 Proving Expected Sensitivity for General Loops
In the following, we show how our sound approach for proving expected sensitivity of non-expansive loops can be extended to general loops with continuous loop bodies. We first illustrate the main difficulty in lifting from the non-expansive to the general case. Then we strengthen general RSM-maps to difference-bounded RSM-maps and show how they address the difficulty. Next we illustrate our approach for proving expected affine-sensitivity. Finally, we present a case study on regularized SGD algorithms to show that our approaches can indeed solve problems arising from real applications.
A major barrier in handling general loops is that the difference between two program valuations may tend to infinity as the number of loop iterations increases. For example, consider a simple probabilistic while loop where in every loop iteration the value of a program variable is tripled and the loop terminates with probability immediately after the current loop iteration. Then, given two different initial values for , we have that
where are given in (1). Thus the expected-sensitivity property does not hold for this example, as the growth rate of exceeds the rate of program termination. To cope with this, we consider difference-bounded RSM-maps, as follows.
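The divergence can be checked numerically under an illustrative instantiation; the per-iteration termination probability p below is an assumption (the paper leaves it symbolic):

```python
# Hedged numerical check of the tripling counterexample. Assume the
# loop terminates after each iteration with fixed probability p, so the
# number of iterations N is geometric: P(N = n) = p * (1-p)^(n-1).
# The coupled difference after termination is delta0 * 3^N, and
#   E[delta0 * 3^N] = delta0 * sum_{n>=1} p (1-p)^(n-1) 3^n,
# which converges iff 3*(1-p) < 1, i.e. p > 2/3.

def expected_final_difference(delta0, p, n_terms):
    """Partial sum (first n_terms terms) of E[delta0 * 3^N]."""
    return sum(p * (1 - p) ** (n - 1) * 3 ** n * delta0
               for n in range(1, n_terms + 1))

# For p = 0.5 the partial sums blow up (3*(1-p) = 1.5 > 1), so the loop
# is not expected sensitive; for p = 0.8 they converge to 2.4/0.4 = 6.0.
```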
Definition 7 (Difference-bounded RSM-maps [17])
An RSM-map for a loop in the form (2) with the update function and the loop guard is difference-bounded if it holds that
for some nonnegative real constant .
The condition (A4) specifies that the difference between the values of the RSM-map before and after the execution of every loop iteration is bounded. This condition ensures an exponentially decreasing probability of non-termination [17] (see Theorem 0.D.1 in Appendix 0.D).
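A hedged sketch of why (A4) yields exponentially decreasing tails; the symbols η, ε, c and the Azuma–Hoeffding-style argument follow the standard treatment, and the exact constants in [17] may differ:

```latex
% If \eta decreases in expectation by \epsilon > 0 in each iteration
% and |\eta(v_{n+1}) - \eta(v_n)| \le c (condition (A4)), then
% X_n := \eta(v_n) + n\epsilon is a supermartingale with bounded
% differences, and Azuma--Hoeffding gives, for the termination time T,
\Pr[T > n] \;\le\; \exp\!\left(-\frac{\bigl(n\epsilon - \eta(v_0)\bigr)^2}
                                     {2\,n\,(c+\epsilon)^2}\right)
\qquad\text{for } n > \eta(v_0)/\epsilon,
% which decays exponentially in n.
```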
Based on Theorem 0.D.1, we demonstrate our sound approach for proving expected affine-sensitivity of general loops. The main idea is to use the exponential decrease from difference-bounded RSM-maps to counteract the unbounded increase in the difference between program values. Below, for a program valuation and a propositional arithmetic predicate , we denote by the neighbourhood . Then the main result is as follows.
Theorem 6.1
Consider a loop in the form (2) and a program variable such that the following conditions hold: (i) the loop body is continuous with the constant specified in Definition 2, and has bounded update in the program variable ; (ii) there exists a continuous difference-bounded RSM-map for with parameters from Definition 3 and Definition 7 such that . Then for any program valuation such that and , there exists such that the loop is expected affine-sensitive in over .
The proof resembles that of Theorem 5.1 and compares with the exponentially decreasing factor . See Appendix 0.D for the detailed proof.
6.1 Case Study on Regularized SGD
We now demonstrate how one can verify, through Theorem 6.1, that the regularized SGD from Example 2 is expected affine-sensitive in the parameters of around a sufficiently small neighbourhood of an initial vector of parameters, when the step size is small enough and the threshold is above the optimal minimal value. For simplicity, we directly treat the program variable as a sampling variable that observes the discrete uniform probability distribution over . We also omit , as there is only one element in . Due to the page limit, we give only a brief description below; the details are in Appendix 0.E.
The First Step. We show that there exists a radius such that for all , it always holds that , where (resp. ) is the random value of the program variable (resp. ) right before the th execution of the loop body. Thus, the values of the program variables remain bounded no matter which randomized execution the program takes.
The Second Step. We construct an RSM-map for the regularized SGD algorithm. Define the RSM-map to be . From the Taylor expansion, we have that
where and is the Hessian matrix of . Then by averaging over all ’s from , we obtain that
Recall that the vector is always bounded by . Moreover, as is continuous and nonzero over the bounded region , has a nonzero minimum over this region. Thus, when is sufficiently small, we have that , where for some constants derivable from the boundedness within . From the boundedness, we can similarly derive constants such that (i) for all , (ii) the loop has bounded update in both with a bound , (iii) the loop is continuous with a constant , and (iv) the RSM-map is continuous with some constant . We can also choose and for some constant from the loop guard and the bounded update of the loop body. Thus, by applying Theorem 6.1, the regularized SGD for linear regression is expected affine-sensitive.
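The coupled behaviour behind Theorem 6.1 can be illustrated with a minimal simulation sketch. Everything below is an illustrative assumption, not the paper's setting: the 1-D dataset, the constants LAMBDA, STEP, and THRESHOLD are invented, and the shared sample indices realize the coupling of the two runs:

```python
import random

# Minimal coupled-run sketch of regularized SGD with a loss-threshold
# loop guard (`while loss(w) > THRESHOLD`), for y ~ w * x in 1-D.

DATA = [(0.0, 0.1), (1.0, 1.9), (2.0, 4.2), (3.0, 5.8)]  # invented data
LAMBDA, STEP, THRESHOLD = 0.01, 0.05, 0.9  # invented constants

def loss(w):
    """Regularized mean-squared error of the model y ~ w * x."""
    mse = sum((w * x - y) ** 2 for x, y in DATA) / len(DATA)
    return mse + LAMBDA * w * w

def grad(w, i):
    """Stochastic gradient on the i-th sample, plus the regularizer."""
    x, y = DATA[i]
    return 2 * (w * x - y) * x + 2 * LAMBDA * w

def coupled_sgd(w1, w2, max_iters=100_000):
    """Run two SGD loops with SHARED sample indices until both stop."""
    for _ in range(max_iters):
        if loss(w1) <= THRESHOLD and loss(w2) <= THRESHOLD:
            break
        i = random.randrange(len(DATA))  # shared sampled index
        if loss(w1) > THRESHOLD:
            w1 -= STEP * grad(w1, i)
        if loss(w2) > THRESHOLD:
            w2 -= STEP * grad(w2, i)
    return w1, w2

random.seed(1)
diffs = [abs(a - b) for a, b in (coupled_sgd(0.0, 0.05) for _ in range(200))]
avg = sum(diffs) / len(diffs)
# avg stays bounded (well below 1 here), in line with expected
# affine-sensitivity around the initial valuation.
```

While both runs are above the threshold, each shared update is a contraction for this dataset, so the coupled difference shrinks; the residual difference comes only from the runs terminating at different steps.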
The Third Step. Below we show that the derived expected affine-sensitivity is nontrivial for this example. From the detailed proof of Theorem 6.1 in Appendix 0.D, we obtain that
where are the random variables for the value of after the execution of the SGD algorithm, and are the initial input parameters around a fixed initial program valuation . Furthermore, through a detailed calculation we have that when the step size tends to zero, the coefficient tends to zero, while the coefficient remains bounded (though it depends on ). Similar arguments hold for the program variable . This shows that the regularized SGD algorithm for linear regression is approximately expected linear-sensitive when the step size tends to zero. See Appendix 0.E for details.
7 Experimental Results
We have implemented our approach and obtained experimental results on a variety of programs. We follow previous approaches for synthesizing linear/polynomial RSM-maps [17, 13, 15] and use Lemma 1 to ensure the continuity-upon-termination property.
The Algorithm. First, we set up a template for an RSM-map with unknown coefficients. Second, our algorithm transforms the conditions on RSM-maps and other side conditions (e.g., continuity, bounded update) into a set of linear inequalities through either Farkas’ Lemma or Handelman’s Theorem (see [17, 13, 15] for details). Finally, our algorithm solves for the unknown coefficients in the template through linear programming and outputs the desired , together with other constants that witness the expected sensitivity of the input program.

Results. We consider examples and their variants from the literature [15, 16, 17, 35]. All experimental examples have randomized execution time. We implemented our approach in Matlab R2018b; the results were obtained on a Windows machine with an Intel Core i5 2.9GHz processor and 8GB of RAM. Table 1 shows the results for examples from [16, 35]: the first column specifies the example and the program variable of concern, the second the running time, and the last five the RSM-map, related constants, and the type (i.e., expected affine-sensitive or expected linear-sensitive). A more detailed table with other constants and examples is available in Appendix 0.F.
Example ()                        | Time (s) | RSM-map constants | Type
MiniRoulette [16] ()              | 3.44     | 0.13, 0, 0.01     | Affine
Variant of MiniRoulette [16] ()   | 3.54     | 0.0491, 0, 0.01   | Linear
rdwalk [35] ()                    | 2.92     | 0.05, 0, 0.01     | Affine
Variant of rdwalk [35] ()         | 2.98     | 0.075, 0, 0.01    | Linear
prdwalk [35] ()                   | 3.00     | 0.0286, 0, 0.01   | Linear
Variant of prdwalk [35] ()        | 2.92     | 0.0143, 0, 0.01   | Affine
prspeed [35] ()                   | 2.99     | 0, 0.01           | Affine
Variant of prspeed [35] ()        | 3.09     | 0.0267, 0, 0.01   | Linear
race [35] ()                      | 4.11     | 0.06, 0, 0.01     | Linear
Variant of race [35] ()           | 3.35     | 0.0429, 0, 0.01   | Affine
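The template-plus-LP synthesis step described in "The Algorithm" above can be sketched in miniature. The random-walk loop, the linear template, the constant EPS, and the grid search (standing in for a real LP solver) below are all illustrative assumptions, not the paper's implementation:

```python
# Toy sketch of template-based RSM-map synthesis. Assumptions: the loop
# is `while x >= 1: x := x + u` with u uniform over {-2, 1}; the
# template is eta(x) = a*x + b; a naive grid search stands in for the
# linear-programming step. Farkas' Lemma reduces the universally
# quantified conditions over the guard {x >= 1} to finitely many linear
# constraints; for a linear template it suffices to check the guard
# boundary x = 1.

EPS = 0.1  # required expected decrease of eta per loop iteration

def is_rsm(a, b):
    """Check the (simplified) RSM-map conditions for eta(x) = a*x + b."""
    # Non-negativity on the guard: with a >= 0 it suffices at x = 1.
    if a < 0 or a + b < 0:
        return False
    # Expected decrease: E[eta(x + u)] = a*(x - 0.5) + b, so we need
    # a*(x - 0.5) + b <= a*x + b - EPS, i.e. -0.5*a <= -EPS.
    return not (-0.5 * a > -EPS)

def synthesize():
    """Grid search over template coefficients (LP-solver stand-in)."""
    for a in [i / 10 for i in range(0, 31)]:
        for b in [j / 10 for j in range(-20, 21)]:
            if is_rsm(a, b):
                return a, b
    return None
```

For this toy loop, any a >= 2·EPS with a + b >= 0 is feasible, and the grid search returns the first such pair it encounters.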
8 Related Work
In program verification, Lipschitz continuity has been studied extensively: an SMT-based method for proving programs robust for a core imperative language is presented in [19]; a linear type system for proving sensitivity has been developed in [36]; and approaches for differential privacy in higher-order languages have also been considered [3, 26, 38].
For probabilistic programs, computing expectation properties has been studied over the decades, e.g., in the influential works on PPDL [31] and PGCL [34]. Various approaches have been developed to reason about the expected termination time of probabilistic programs [30, 25, 17], as well as to reason about whether a probabilistic program terminates with probability 1 [32, 28, 1, 18]. However, these works focus on non-relational properties, such as upper bounds on expected termination time, whereas expected sensitivity is intrinsically relational. To the best of our knowledge, while RSMs have been used for non-relational properties, we are the first to apply them to relational properties.
There is also a great body of literature on relational analysis of probabilistic programs, such as relational program logics [6] and differential privacy of algorithms [8]. However, this line of work does not consider relational expectation properties. There have also been several works on relational expectation properties, e.g., in the area of masking implementations in cryptography, with quantitative masking [23] and the bounded moment model [4]. A general framework for program sensitivity was considered in [7], and later improved in [5]. Several classical examples, such as stochastic gradient descent with a fixed number of iterations or Glauber dynamics, can be analyzed in the framework of [5]. Another method for the sensitivity analysis of probabilistic programs has been proposed in [29], where a linear-regression example derived from the algorithm in [5] is analysed. Our work extends [5] by allowing a randomized expected number of iterations, and by using RSMs instead of coupling, which leads to an automated approach. A detailed comparison with the techniques of [29] is provided in Section 1.

9 Conclusion
In this work, we considered sensitivity analysis of probabilistic programs and presented an automated, sound approach for the analysis of programs whose expected number of iterations is finite, rather than requiring a fixed number of iterations. Our method is not compositional, and an interesting direction for future work is to incorporate compositional analysis methods into the approach proposed here.
References
 [1] Agrawal, S., Chatterjee, K., Novotný, P.: Lexicographic ranking supermartingales: an efficient approach to termination of probabilistic programs. PACMPL 2(POPL), 34:1–34:32 (2018). https://doi.org/10.1145/3158122
 [2] Aldous, D.J.: Random walks on finite groups and rapidly mixing Markov chains. Séminaire de probabilités de Strasbourg 17, 243–297 (1983)
 [3] de Amorim, A.A., Gaboardi, M., Hsu, J., Katsumata, S., Cherigui, I.: A semantic account of metric preservation. In: Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, POPL 2017, Paris, France, January 18–20, 2017. pp. 545–556 (2017), http://dl.acm.org/citation.cfm?id=3009890
 [4] Barthe, G., Dupressoir, F., Faust, S., Grégoire, B., Standaert, F., Strub, P.: Parallel implementations of masking schemes and the bounded moment leakage model. IACR Cryptology ePrint Archive 2016, 912 (2016), http://eprint.iacr.org/2016/912
 [5] Barthe, G., Espitau, T., Grégoire, B., Hsu, J., Strub, P.: Proving expected sensitivity of probabilistic programs. PACMPL 2(POPL), 57:1–57:29 (2018). https://doi.org/10.1145/3158145
 [6] Barthe, G., Grégoire, B., Béguelin, S.Z.: Formal certification of code-based cryptographic proofs. In: Proceedings of the 36th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2009, Savannah, GA, USA, January 21–23, 2009. pp. 90–101 (2009). https://doi.org/10.1145/1480881.1480894
 [7] Barthe, G., Grégoire, B., Hsu, J., Strub, P.: Coupling proofs are probabilistic product programs. In: Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, POPL 2017, Paris, France, January 18–20, 2017. pp. 161–174 (2017), http://dl.acm.org/citation.cfm?id=3009896
 [8] Barthe, G., Köpf, B., Olmedo, F., Béguelin, S.Z.: Probabilistic relational reasoning for differential privacy. In: Proceedings of the 39th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2012, Philadelphia, Pennsylvania, USA, January 22–28, 2012. pp. 97–110 (2012). https://doi.org/10.1145/2103656.2103670
 [9] Billingsley, P.: Probability and Measure. John Wiley & Sons (1995)
 [10] Bottou, L.: Stochastic gradient descent tricks. In: Montavon, G., Orr, G.B., Müller, K. (eds.) Neural Networks: Tricks of the Trade, Second Edition. Lecture Notes in Computer Science, vol. 7700, pp. 421–436. Springer (2012). https://doi.org/10.1007/978-3-642-35289-8_25
 [11] Bousquet, O., Elisseeff, A.: Stability and generalization. Journal of Machine Learning Research 2, 499–526 (2002), http://www.jmlr.org/papers/v2/bousquet02a.html
 [12] van Breugel, F., Worrell, J.: Approximating and computing behavioural distances in probabilistic transition systems. Theor. Comput. Sci. 360(1–3), 373–385 (2006). https://doi.org/10.1016/j.tcs.2006.05.021
 [13] Chakarov, A., Sankaranarayanan, S.: Probabilistic program analysis with martingales. In: CAV 2013. pp. 511–526 (2013)
 [14] Chatterjee, K.: Robustness of structurally equivalent concurrent parity games. In: Foundations of Software Science and Computational Structures, 15th International Conference, FOSSACS 2012, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2012, Tallinn, Estonia, March 24 – April 1, 2012, Proceedings. pp. 270–285 (2012). https://doi.org/10.1007/978-3-642-28729-9_18
 [15] Chatterjee, K., Fu, H., Goharshady, A.K.: Termination analysis of probabilistic programs through Positivstellensatz's. In: Chaudhuri, S., Farzan, A. (eds.) Computer Aided Verification, 28th International Conference, CAV 2016, Toronto, ON, Canada, July 17–23, 2016, Proceedings, Part I. Lecture Notes in Computer Science, vol. 9779, pp. 3–22. Springer (2016). https://doi.org/10.1007/978-3-319-41528-4_1
 [16] Chatterjee, K., Fu, H., Goharshady, A.K., Okati, N.: Computational approaches for stochastic shortest path on succinct MDPs. In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, July 13–19, 2018, Stockholm, Sweden. pp. 4700–4707 (2018). https://doi.org/10.24963/ijcai.2018/653
 [17] Chatterjee, K., Fu, H., Novotný, P., Hasheminezhad, R.: Algorithmic analysis of qualitative and quantitative termination problems for affine probabilistic programs. ACM Trans. Program. Lang. Syst. 40(2), 7:1–7:45 (2018). https://doi.org/10.1145/3174800
 [18] Chatterjee, K., Novotný, P., Žikelić, Đ.: Stochastic invariants for probabilistic termination. In: POPL 2017. pp. 145–160 (2017)
 [19] Chaudhuri, S., Gulwani, S., Lublinerman, R.: Continuity analysis of programs. In: Proceedings of the 37th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2010, Madrid, Spain, January 17–23, 2010. pp. 57–70 (2010). https://doi.org/10.1145/1706299.1706308
 [20] Desharnais, J., Gupta, V., Jagadeesan, R., Panangaden, P.: Metrics for labelled Markov processes. Theor. Comput. Sci. 318(3), 323–354 (2004). https://doi.org/10.1016/j.tcs.2003.09.013
 [21] Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Proceedings of the Third Conference on Theory of Cryptography, TCC'06. pp. 265–284. Springer-Verlag, Berlin, Heidelberg (2006)
 [22] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci. 9(3–4), 211–407 (2014)
 [23] Eldib, H., Wang, C., Taha, M.M.I., Schaumont, P.: Quantitative masking strength: Quantifying the power side-channel resistance of software code. IEEE Trans. on CAD of Integrated Circuits and Systems 34(10), 1558–1568 (2015). https://doi.org/10.1109/TCAD.2015.2424951
 [24] Fu, H.: Computing game metrics on Markov decision processes. In: Czumaj, A., Mehlhorn, K., Pitts, A.M., Wattenhofer, R. (eds.) Automata, Languages, and Programming, 39th International Colloquium, ICALP 2012, Warwick, UK, July 9–13, 2012, Proceedings, Part II. Lecture Notes in Computer Science, vol. 7392, pp. 227–238. Springer (2012). https://doi.org/10.1007/978-3-642-31585-5_23
 [25] Fu, H., Chatterjee, K.: Termination of nondeterministic probabilistic programs. In: Enea, C., Piskac, R. (eds.) Verification, Model Checking, and Abstract Interpretation, 20th International Conference, VMCAI 2019, Cascais, Portugal, January 13–15, 2019, Proceedings. Lecture Notes in Computer Science, vol. 11388, pp. 468–490. Springer (2019). https://doi.org/10.1007/978-3-030-11245-5_22
 [26] Gaboardi, M., Haeberlen, A., Hsu, J., Narayan, A., Pierce, B.C.: Linear dependent types for differential privacy. In: The 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '13, Rome, Italy, January 23–25, 2013. pp. 357–370 (2013). https://doi.org/10.1145/2429069.2429113
 [27] Hardt, M., Recht, B., Singer, Y.: Train faster, generalize better: Stability of stochastic gradient descent. In: Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19–24, 2016. pp. 1225–1234 (2016), http://jmlr.org/proceedings/papers/v48/hardt16.html
 [28] Huang, M., Fu, H., Chatterjee, K.: New approaches for almost-sure termination of probabilistic programs. In: Ryu, S. (ed.) Programming Languages and Systems, 16th Asian Symposium, APLAS 2018, Wellington, New Zealand, December 2–6, 2018, Proceedings. Lecture Notes in Computer Science, vol. 11275, pp. 181–201. Springer (2018). https://doi.org/10.1007/978-3-030-02768-1_11
 [29] Huang, Z., Wang, Z., Misailovic, S.: PSense: Automatic sensitivity analysis for probabilistic programs. In: Automated Technology for Verification and Analysis, 16th International Symposium, ATVA 2018, Los Angeles, CA, USA, October 7–10, 2018, Proceedings. pp. 387–403 (2018). https://doi.org/10.1007/978-3-030-01090-4_23
 [30] Kaminski, B.L., Katoen, J., Matheja, C., Olmedo, F.: Weakest precondition reasoning for expected runtimes of probabilistic programs. In: Programming Languages and Systems, 25th European Symposium on Programming, ESOP 2016, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2016, Eindhoven, The Netherlands, April 2–8, 2016, Proceedings. pp. 364–389 (2016). https://doi.org/10.1007/978-3-662-49498-1_15
 [31] Kozen, D.: A probabilistic PDL. J. Comput. Syst. Sci. 30(2), 162–178 (1985). https://doi.org/10.1016/0022-0000(85)90012-1
 [32] McIver, A., Morgan, C., Kaminski, B.L., Katoen, J.P.: A new proof rule for almost-sure termination. PACMPL 2(POPL), 33 (2017)
 [33] Meyn, S., Tweedie, R.: Markov Chains and Stochastic Stability. Springer-Verlag, London (1993), available at: probability.ca/MT
 [34] Morgan, C., McIver, A., Seidel, K.: Probabilistic predicate transformers. ACM Trans. Program. Lang. Syst. 18(3), 325–353 (1996). https://doi.org/10.1145/229542.229547
 [35] Ngo, V.C., Carbonneaux, Q., Hoffmann, J.: Bounded expectations: resource analysis for probabilistic programs. In: Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2018, Philadelphia, PA, USA, June 18–22, 2018. pp. 496–512 (2018). https://doi.org/10.1145/3192366.3192394
 [36] Reed, J., Pierce, B.C.: Distance makes the types grow stronger: a calculus for differential privacy. In: Proceedings of the 15th ACM SIGPLAN International Conference on Functional Programming, ICFP 2010, Baltimore, Maryland, USA, September 27–29, 2010. pp. 157–168 (2010). https://doi.org/10.1145/1863543.1863568
 [37] Williams, D.: Probability with Martingales. Cambridge University Press (1991)
 [38] Winograd-Cort, D., Haeberlen, A., Roth, A., Pierce, B.C.: A framework for adaptive differential privacy. PACMPL 1(ICFP), 10:1–10:29 (2017). https://doi.org/10.1145/3110254
Appendix 0.A The Detailed Syntax
The detailed syntax is in Figure 5.
Appendix 0.B Proof for the Integral Expansion
Theorem 0.B.1 (Integral Expansion)
Let be a simple probabilistic while loop in the form (2) and be a program variable. For any initial program valuation such that , we have
where is the random variable for the value of starting from the initial program valuation , and is the probability that the probabilistic branches follow the choices in .
Proof
The result follows from the following derivations: