 # Proving Expected Sensitivity of Probabilistic Programs with Randomized Execution Time

The notion of program sensitivity (aka Lipschitz continuity) specifies that changes in the program input result in proportional changes to the program output. For probabilistic programs, the notion is naturally extended to expected sensitivity. A previous approach developed a relational program logic framework for expected sensitivity of probabilistic while loops, where the number of iterations is fixed and bounded. In this work, we present a sound approach for sensitivity analysis of probabilistic while loops where the number of iterations is not fixed, but randomized, and only the expected number of iterations is finite. We demonstrate the effectiveness of our approach on several classical examples, e.g., mini-roulette and the regularized stochastic gradient descent algorithm.


## 1 Introduction

Continuity properties of systems. A continuity property for systems requires that the change in the output is bounded by a monotone function of the change in the input. Analysis of continuity properties is of great interest in program analysis and reactive-system analysis, for example: (a) robustness of numerical computations; (b) analysis of sensitivity of numerical queries in databases; (c) analysis of stability of learning algorithms; and (d) robustness analysis of programs.

Probabilistic systems. Continuity analysis is similarly relevant for probabilistic systems, where the notion of continuity is extended to expected continuity to average over the probabilistic behaviours of the system. For example, statistical notions of differential privacy; robustness analysis of Markov chains, Markov decision processes, and stochastic games [2, 24, 12, 20, 14]; and stability analysis of randomized learning algorithms [11, 27] all fall under the umbrella of continuity analysis of probabilistic systems.

Program sensitivity. The notion of particular interest among continuity properties is program sensitivity (aka Lipschitz continuity), which specifies that the change in the output is proportional to the change in the input. Formally, there is a constant $L$ (the Lipschitz constant) such that if the input changes by an amount $\delta$, then the change in the output is at most $L \cdot \delta$. In this work we consider the expected sensitivity of probabilistic programs given as probabilistic while loops.

Previous results. The expected sensitivity analysis of probabilistic programs was considered in previous work, where an elegant method is presented based on a relational program logic framework. The heart of the analysis technique is coupling-based methods, and the approach is shown to work very nicely on several examples from learning to statistical physics. However, the approach works on probabilistic while loops where the number of iterations of the loop is fixed and bounded (i.e., the number of iterations is fixed to a given number). Many probabilistic while loops do not have a fixed number of iterations; rather, the number of iterations is randomized and only the expected number of iterations is finite. In this work we consider sensitivity analysis of probabilistic programs where the number of iterations is not fixed, but stochastic and dependent on probabilistic sampling variables.

Our contributions. Our main contributions are as follows:

1. We present a sound approach for sensitivity analysis of probabilistic while loops where the number of iterations is not fixed, but the expected number of iterations is finite.

2. In contrast to the previous coupling-based approach, our approach is based on ranking supermartingales (RSMs) and a continuity property of the loop body. We first present the results for non-expansive loops, which correspond to the case where the Lipschitz constant is 1, and then present the result for general loops.

3. Since RSM-based approaches can be automated through constraint solving, those results in conjunction with our sound approach yield an automated approach for sensitivity analysis of probabilistic programs. We demonstrate the effectiveness of our approach on several examples, including a case study of the regularized stochastic gradient descent algorithm and experimental results on examples such as mini-roulette.

Technical contribution. In terms of technical contribution, there are key differences between our approach and the previous one. The advantage of the previous approach is that it is compositional, based on probabilistic coupling; however, it leads to complex proof rules with side conditions. Moreover, the approach is interactive in the sense that a coupling is provided manually and then the proof is automated. In contrast, our approach is based on RSMs and a continuity property; it leads to an automated approach and can handle stochastic numbers of iterations. However, it does not have the compositional properties of the previous approach.

## 2 Probabilistic Programs

We present the syntax and semantics of our probabilistic programming language as follows. Throughout the paper, we denote by $\mathbb{N}$, $\mathbb{Z}$, and $\mathbb{R}$ the sets of all natural numbers (including zero), integers, and real numbers, respectively.

Syntax. Our probabilistic programming language is imperative and composed of statements. We present a succinct description below (see Appendix 0.A for details).

• Variables. Variable expressions range over program variables and sampling variables, respectively.

• Constants. Constant expressions range over decimals.

• Arithmetic Expressions. Arithmetic expressions range over expressions built over both program and sampling variables, or over program variables only. For example, the sum of a program variable and a sampling variable is an instance of the former, while the sum of two program variables is an instance of the latter. In this paper, we do not fix the syntax for arithmetic expressions.

• Boolean Expressions. Boolean expressions range over propositional arithmetic predicates over program variables.

• Statements. Assignment statements are indicated by ':='; 'skip' is the statement that does nothing; standard conditional branches are indicated by the keyword 'if' accompanied by a propositional arithmetic predicate serving as the condition for the branch. While-loops are indicated by the keyword 'while' with a propositional arithmetic predicate as the loop guard. Probabilistic choices are modeled as probabilistic branches with the keyword 'if prob(p)' that lead to the then-branch with probability $p$ and to the else-branch with probability $1 - p$.

In this work, we consider probabilistic programs without non-determinism.
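The semantics of the probabilistic-branch construct can be sketched in a few lines of Python. This is a hypothetical helper, not from the paper; the function names and the toy loop body are our own illustration:

```python
import random

def prob_branch(p, then_fn, else_fn):
    # 'if prob(p)': take the then-branch with probability p,
    # and the else-branch with probability 1 - p.
    if random.random() < p:
        return then_fn()
    return else_fn()

def loop_body(x):
    # A toy loop body (our own, not one of the paper's examples):
    # add 1 with probability 0.75, otherwise add 2.
    return prob_branch(0.75, lambda: x + 1, lambda: x + 2)
```

Running `loop_body` many times and counting outcomes recovers the branch probabilities empirically.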

Semantics. We first recall several standard notions from probability spaces as follows (see e.g. standard textbooks [37, 9]).

Probability Spaces. A probability space is a triple $(\Omega, \mathcal{F}, \mathbb{P})$, where $\Omega$ is a nonempty set (the so-called sample space), $\mathcal{F}$ is a $\sigma$-algebra over $\Omega$ (i.e., a collection of subsets of $\Omega$ that contains the empty set and is closed under complementation and countable union), and $\mathbb{P}$ is a probability measure on $\mathcal{F}$, i.e., a function $\mathbb{P}\colon \mathcal{F} \to [0,1]$ such that (i) $\mathbb{P}(\Omega) = 1$ and (ii) for all set-sequences $A_1, A_2, \dots \in \mathcal{F}$ that are pairwise-disjoint (i.e., $A_i \cap A_j = \emptyset$ whenever $i \neq j$) it holds that $\mathbb{P}\left(\bigcup_{i} A_i\right) = \sum_{i} \mathbb{P}(A_i)$. Elements $A \in \mathcal{F}$ are usually called events. An event $A$ is said to hold almost surely (a.s.) if $\mathbb{P}(A) = 1$.

A random variable (r.v.) $X$ on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ is an $\mathcal{F}$-measurable function $X\colon \Omega \to \mathbb{R}$, i.e., a function satisfying the condition that for all $d \in \mathbb{R}$, the set $\{\omega \in \Omega \mid X(\omega) < d\}$ belongs to $\mathcal{F}$. By convention, we abbreviate $\mathbb{P}(\{\omega \in \Omega \mid X(\omega) < d\})$ as $\mathbb{P}(X < d)$.

Expectation. The expected value of a random variable $X$ on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, denoted by $\mathbb{E}(X)$, is defined as the Lebesgue integral of $X$ w.r.t. $\mathbb{P}$, i.e., $\mathbb{E}(X) = \int X \, \mathrm{d}\mathbb{P}$; the precise definition of the Lebesgue integral is somewhat technical and is omitted here (cf. [37, Chapter 5] for a formal definition). In the case that the range of $X$ is a countable set $\{d_0, d_1, \dots\}$ with distinct $d_k$'s, we have that $\mathbb{E}(X) = \sum_{k} d_k \cdot \mathbb{P}(X = d_k)$.

To present the semantics, we also need the notion of valuations.

Valuations. Let $V$ be a finite set of variables with an implicit linear order over its elements. A valuation on $V$ is a vector $\mathbf{v}$ in $\mathbb{R}^{|V|}$ such that for each $1 \le i \le |V|$, the $i$-th coordinate of $\mathbf{v}$, denoted by $\mathbf{v}[i]$, is the value for the $i$-th variable in the implicit linear order on $V$. For the sake of convenience, we write $\mathbf{v}[y]$ for the value of a variable $y$ in a valuation $\mathbf{v}$.

Program and Sampling Valuations. A program valuation is a valuation on the set of program variables. Similarly, a sampling valuation is a valuation on the set of sampling variables. Given a program valuation $\mathbf{b}$ and a propositional arithmetic predicate $\Phi$, the satisfaction relation $\models$ is defined in the standard way, so that we have $\mathbf{b} \models \Phi$ iff $\Phi$ holds when the program variables in $\Phi$ are substituted by their corresponding values in $\mathbf{b}$.

The Semantics. Now we give a brief description of the semantics of probabilistic programs. We follow the standard operational semantics through Markov chains (see e.g. [17, 13, 25]). Given a probabilistic program without non-determinism, its semantics is given as a general state-space Markov chain (GSSMC) [33, Chapter 3] with a possibly uncountable state space, where the state space consists of all pairs of program counters and program valuations, for which the program counter refers to the next statement to be executed and the program valuation specifies the current values of the program variables, and the kernel function that specifies the stochastic transitions between states is given by the statements in the program. For an initial state consisting of the initial program counter and an initial program valuation $\mathbf{b}$, each probabilistic program induces a unique probability space through its corresponding GSSMC, where the sample space consists of all infinite sequences of states in the GSSMC (as runs), the sigma-algebra is generated by all cylinder sets of runs with a common finite prefix, and the probability measure is uniquely determined by the kernel function and the initial state. We denote by $\mathbb{P}_{\mathbf{b}}$ the probability measure for a probabilistic program with initial program valuation $\mathbf{b}$, and by $\mathbb{E}_{\mathbf{b}}$ the expectation under the probability measure $\mathbb{P}_{\mathbf{b}}$. The detailed semantics can be found in [17, 13, 25].

## 3 Expected Sensitivity

Compared with the coupling-based definitions for expected sensitivity in previous work, we consider average sensitivity, which directly compares the distance between the expected values from two close-by initial program valuations. Average sensitivity can be used to model algorithmic stability in many machine-learning algorithms. In this paper, we focus on average sensitivity and will simply refer to it as expected sensitivity.

The following definition formalizes our notion of expected sensitivity. In order for the notion to be well-defined, we only consider probabilistic programs that terminate with probability one (i.e., almost-sure termination [17, 13, 28, 32]) for all initial program valuations.

###### Definition 1 (Expected Sensitivity)

Consider a probabilistic program that terminates with probability one for all initial program valuations. We say that the program is expected affine-sensitive in a program variable $z$ over a set $U$ of initial program valuations if there exist non-negative real constants $A, B$ such that for any initial program valuations $\mathbf{b}, \mathbf{b}' \in U$ with sufficiently small $\|\mathbf{b} - \mathbf{b}'\|_\infty$, we have that

$$|\mathbb{E}_{\mathbf{b}}(Z_{\mathbf{b}}) - \mathbb{E}_{\mathbf{b}'}(Z_{\mathbf{b}'})| \le A \cdot \|\mathbf{b} - \mathbf{b}'\|_\infty + B \tag{1}$$

where $Z_{\mathbf{b}}, Z_{\mathbf{b}'}$ are random variables representing the values of $z$ after the execution of the program starting from the initial valuations $\mathbf{b}, \mathbf{b}'$ respectively, and the max-norm is defined by $\|\mathbf{v}\|_\infty = \max_{y} |\mathbf{v}[y]|$. Furthermore, if we can choose $B = 0$ in (1), then we say that the program is expected linear-sensitive in the program variable $z$.

Thus, a program is expected affine-sensitive in a program variable $z$ if the difference of the expected values of $z$ after the termination of the program is bounded by an affine function in the difference of the initial valuations. The program is expected linear-sensitive if the difference can be bounded by a linear function.
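Definition 1 can also be checked empirically by Monte Carlo estimation. The sketch below uses a toy loop of our own (not one of the paper's benchmarks) and estimates the left-hand side of (1) for two close-by initial valuations:

```python
import random

def run(x):
    # Toy loop with a randomized number of iterations (our assumption,
    # in the spirit of the paper's running example): keep adding a
    # uniform sample from [0, 1] until x exceeds 10.
    while x <= 10.0:
        x += random.random()
    return x

def expected_value(x0, trials=20_000):
    # Monte Carlo estimate of the expected output for initial valuation x0.
    return sum(run(x0) for _ in range(trials)) / trials

random.seed(1)
# Estimate the difference of expected outputs for two close-by inputs.
diff = abs(expected_value(0.0) - expected_value(0.1))
```

For this particular loop the estimated difference is small, which is consistent with (but of course does not prove) expected sensitivity.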

###### Remark 1 (Comparison with Coupling-Based Sensitivity)

Our definition is looser than the coupling-based definition of previous work, as it is shown there that coupling-based expected sensitivity implies our notion of expected sensitivity. However, the converse does not hold. This is because when one chooses the same program and the same initial program valuations, the expected difference in our sense is zero, while the coupling-based expected difference could still be greater than zero: the coupling-based definition considers the difference over the product of two independent runs, which may be non-zero as a probabilistic program typically has multiple probabilistic executions. Moreover, no coupling-based notion of expected linear-sensitivity has been defined, whereas we study expected linear-sensitivity directly from its definition.

## 4 Motivating Examples

In the following, we show several motivating examples for expected-sensitivity analysis of probabilistic programs. We consider in particular probabilistic programs with a randomized number of execution steps. As existing results [5, 29] only consider probabilistic for-loops with a fixed number of loop iterations, none of the examples in this section can be handled by previous approaches.

###### Example 1 (Mini-roulette)

A particular gambler's-ruin game is called Mini-roulette, a popular casino game based on a 13-slot wheel. A player starts the game with a number of chips. She needs one chip to make a bet and she bets as long as she has chips. If she loses a bet, the chip is not returned, but a winning bet does not consume the chip and results in a specific amount of (monetary) reward, and possibly even more chips. The following types of bets can be placed at each round. (1) Even-money bets: in these bets, 6 specific slots are chosen; the ball is rolled and the player wins the bet if it lands in one of the chosen slots, so the player has a winning probability of 6/13. Winning gives a reward of two units and one extra chip. (2) 2-to-1 bets: these bets correspond to 4 chosen slots, and winning them gives a larger reward and extra chips. (3) 3-to-1, 5-to-1 and 11-to-1 bets: these are defined similarly and have winning probabilities of 3/13, 2/13 and 1/13, respectively. Suppose that at each round the player chooses each type of bet with equal probability. The probabilistic program for this example is shown in Figure 1 (left), where one program variable represents the number of chips and another records the accumulated rewards. We also consider a continuous variant of this Mini-roulette example in Figure 1 (right), where we replace the increments to the chip-count variable by uniformly-distributed sampling variables, choosing uniform distributions that ensure the termination of the program. In both examples, we consider the expected sensitivity in the program variable that records the accumulated reward. Note that the number of loop iterations in all the programs in Figure 1 is randomized and not fixed, as the loop guard depends on the number of chips and the increment of the chip count is random in each loop iteration.
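For concreteness, the Mini-roulette loop can be simulated as follows. The winning probabilities follow the 13-slot wheel; the exact reward and extra-chip figures are illustrative assumptions of ours, as the concrete numbers are elided in the text:

```python
import random

# Bet types as (winning probability, reward, extra chips on a win).
# Probabilities follow the 13-slot wheel; rewards/extra chips are
# our illustrative assumptions.
BETS = [(6/13, 2, 1), (4/13, 2, 2), (3/13, 3, 3), (2/13, 5, 5), (1/13, 11, 11)]

def mini_roulette(chips):
    reward = 0.0
    while chips >= 1:                      # she bets as long as she has chips
        p, r, extra = random.choice(BETS)  # each bet type equally likely
        if random.random() < p:            # winning bet: chip is kept
            reward += r
            chips += extra
        else:                              # losing bet consumes the chip
            chips -= 1
    return reward
```

With these numbers, the expected chip change per round is negative for every bet type, so the game terminates almost surely even though the number of rounds is randomized.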

###### Example 2 (Stochastic Gradient Descent)

The most widely used method in machine-learning practice is stochastic gradient descent (SGD). The general form of an SGD algorithm is depicted in Figure LABEL:fig:example4. In the figure, an SGD algorithm is modelled as a probabilistic while loop, where one sampling variable is sampled uniformly from the data indices $\{1, \dots, n\}$, $\mathbf{w}$ is a vector of program variables that represents the parameters to be learned, a program variable represents the index of the chosen datum, and a constant $\gamma$ represents the step size. The symbol $\nabla$ represents the gradient, while each $\ell_j$ ($1 \le j \le n$) is the loss function for the $j$-th input datum. By convention, the total loss function is given as the expectation over the uniformly sampled loss functions, i.e., $\ell(\mathbf{w}) = \frac{1}{n} \sum_{j=1}^{n} \ell_j(\mathbf{w})$. At each loop iteration, a datum $j$ is chosen uniformly from all data and the parameters in $\mathbf{w}$ are adjusted by the step size multiplied by the gradient of the $j$-th loss function $\ell_j$. The loop guard can either be practical, so that a fixed number of iterations is performed (as is analyzed in existing approaches [27, 5, 29] for expected sensitivity), or the local criterion that the magnitude of the gradient of the total loss function is small enough, or the global criterion that the value of the total loss function is small enough.

We consider a regularized version of the SGD algorithm. The idea is to add a penalty term to the loss function to control the complexity of the learned parameters in $\mathbf{w}$, so as to make the learned parameters more reasonable and avoid overfitting. A scheme for a regularized SGD is depicted in Figure LABEL:fig:sgdreg. In the figure, the main difference is that in each loop iteration, the parameters in $\mathbf{w}$ are additionally adjusted by the gradient of the penalty term. This adjustment results from adding the penalty term to the total loss function $\ell$, so that its gradient contributes to the learning of $\mathbf{w}$ in every loop iteration.

To be more concrete, we consider applying the $\ell_2$-regularized SGD algorithm to the linear-regression problem: given a fixed number of input coordinates on the plane (as data), the task is to find the parameters of a line equation that best fits the input coordinates. The optimality is measured by a loss function consisting of the penalty term plus the loss for each input coordinate. With a suitable choice of the per-datum loss functions, the regularized SGD algorithm in Figure LABEL:fig:sgdreg can be applied directly to solve the linear-regression problem. While previous results [27, 5, 29] consider SGD with a fixed number of loop iterations, we choose the loop guard in Figure LABEL:fig:sgdreg to be the global condition that the magnitude of the gradient of the total loss is above a threshold below which it is considered small enough. Then we consider the expected sensitivity of the regularized SGD for the linear-regression problem w.r.t. the initial parameters. Note that the regularized SGD algorithm in our setting does not have a fixed number of loop iterations, and thus cannot be analyzed by the previous approaches [27, 5, 29].
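A runnable sketch of such a regularized SGD loop with the global stopping criterion is below. The data, the step size `gamma`, the regularization weight `lam`, and the tolerance `tol` are all illustrative assumptions, not values from the paper:

```python
import random

def regularized_sgd(a, b, data, gamma=0.005, lam=0.1, tol=0.2):
    # L2-regularized SGD for fitting the line y = a*x + b (a sketch;
    # all numeric parameters here are our own illustrative choices).
    n = len(data)

    def grad_total(a, b):
        # Gradient of (lam/2)*(a^2 + b^2) + (1/n) * sum_j (a*x_j + b - y_j)^2.
        ga = lam * a + sum(2 * (a * x + b - y) * x for x, y in data) / n
        gb = lam * b + sum(2 * (a * x + b - y) for x, y in data) / n
        return ga, gb

    # Loop guard: the global criterion -- iterate while the magnitude of
    # the gradient of the total loss is still above the threshold. The
    # number of iterations is therefore randomized, not fixed.
    while max(abs(g) for g in grad_total(a, b)) >= tol:
        x, y = random.choice(data)              # sample one datum uniformly
        ga = 2 * (a * x + b - y) * x + lam * a  # stochastic gradient ...
        gb = 2 * (a * x + b - y) + lam * b      # ... plus the penalty gradient
        a, b = a - gamma * ga, b - gamma * gb
    return a, b
```

The strongly convex penalty keeps the iterates in a neighbourhood of the optimum, so with a small step size the guard is eventually violated and the loop terminates, though after a randomized number of iterations.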

## 5 Proving Expected Sensitivity for Non-expansive Loops

We consider simple probabilistic while loops and investigate sound approaches for proving expected sensitivity over such programs. A simple probabilistic while loop is of the form

 while Φ do P od (2)

where $\Phi$ is the loop guard, a propositional arithmetic predicate over program variables, and the loop body $P$ is a statement without while-loops.

Update Functions. Given a simple probabilistic while loop in the form (2) with disjoint sets of program and sampling variables, we can abstract away the detailed execution of the loop body by an update function $F$ as follows. First, we consider the set of all program counters that refer to a probabilistic branch (i.e., an 'if prob' statement) in the loop body. Then we define $\Theta$ to be the set of all functions from this set into $\{\mathit{then}, \mathit{else}\}$; informally, a function in $\Theta$ specifies for each probabilistic branch which branch (i.e., either the then- or the else-branch) is chosen in an actual loop iteration. Finally, the update function $F$ simply gives the program valuation after one loop iteration, given (i) a function $\theta \in \Theta$ that specifies the probabilistic choices at probabilistic branches, (ii) a program valuation $\mathbf{b}$ that specifies the values of the program variables before the loop iteration, and (iii) a sampling valuation $\mathbf{r}$ that gives all the sampled values for the sampling variables in the loop iteration.

Runs. We also simplify the notion of runs over programs in the form (2). A run for a probabilistic loop in the form (2) is an infinite sequence $\{\mathbf{b}_n\}$ of program valuations such that $\mathbf{b}_n$ is the program valuation before the $n$-th execution of the loop body. Note that if $\mathbf{b}_n \models \Phi$, then $\mathbf{b}_{n+1} = F(\theta_n, \mathbf{b}_n, \mathbf{r}_n)$, where $\theta_n$ (resp. $\mathbf{r}_n$) specifies all the probabilistic branches (resp. the sampled values for the sampling variables); otherwise, $\mathbf{b}_{n+1} = \mathbf{b}_n$.

Notations. To ease the use of notation, we always use $\mathbf{b}$ for a program valuation, $\mathbf{r}$ for a sampling valuation and $\theta$ for an element of $\Theta$, possibly with super-/sub-scripts.

We consider simple probabilistic while loops whose loop body satisfies a continuity property below. We note that the continuity of the loop body is a natural requirement in ensuring the sensitivity of the whole loop.

###### Definition 2 (Continuity of the Loop Body L)

We say that the loop body of a loop in the form (2) is continuous if there is a real constant $L > 0$ such that

• for all $\theta \in \Theta$, all sampling valuations $\mathbf{r}$ and all program valuations $\mathbf{b}, \mathbf{b}'$ with $\mathbf{b}, \mathbf{b}' \models \Phi$, we have $\|F(\theta, \mathbf{b}, \mathbf{r}) - F(\theta, \mathbf{b}', \mathbf{r})\|_\infty \le L \cdot \|\mathbf{b} - \mathbf{b}'\|_\infty$.

If we can choose $L = 1$, then we say that the loop is non-expansive.

Below we illustrate a running example.

###### Example 3 (Running Example)

Consider the simple probabilistic while loop in Figure 4 on Page 4. In the program, $x$ is a program variable and $r$ is a sampling variable. Informally, in every loop iteration, the value of $x$ is increased by a value sampled w.r.t. the probability distribution of $r$, until the value of $x$ exceeds a given threshold. There is no probabilistic branch, so $\Theta$ is a singleton set that only contains the empty function $\theta_\emptyset$. The update function for the loop body is then given by $F(\theta_\emptyset, \mathbf{b}, \mathbf{r}) = \mathbf{b}[x] + \mathbf{r}[r]$ for every program valuation $\mathbf{b}$ and sampling valuation $\mathbf{r}$. By definition, the loop is non-expansive.
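The update-function abstraction for this running example can be written out directly. In this sketch, the guard threshold and the sampling distribution are placeholders supplied by the caller:

```python
import random

def F(theta, b, r):
    # Update function of the running example: the loop body is the single
    # assignment x := x + r, so one iteration adds the sampled value.
    # (theta is unused here: there is no probabilistic branch.)
    return b + r

def run_loop(x, sample, guard):
    """Iterate 'while guard(x) do x := x + r od'; return (final x, #iterations)."""
    n = 0
    while guard(x):
        x = F(None, x, sample())
        n += 1
    return x, n
```

Non-expansiveness is visible in the code: with the same `theta` and the same sampled value, two valuations stay exactly the same distance apart after one iteration.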

### 5.1 Proving Expected Affine Sensitivity

We demonstrate our sound approach for proving expected affine-sensitivity over non-expansive loops. By definition, if a simple probabilistic while loop is non-expansive and has a good termination property, then it is intuitive that a fixed number of executions of its loop body has expected affine-sensitivity. We show that this intuition is true if we use ranking-supermartingale maps (RSM-maps) to ensure the termination property of probabilistic programs. Below we fix a simple probabilistic while loop in the form (2) with update function $F$, loop guard $\Phi$ and loop body $P$. As we consider a simple class of while loops, we also consider a simplified version of RSM-maps that uses the update function to abstract away the detailed execution within the loop body.

###### Definition 3 (RSM-maps η [17, 13, 25])

A ranking-supermartingale map (RSM-map) is a function $\eta$ from program valuations to reals such that there exist real numbers $\epsilon > 0$ and $K \le K'$ satisfying the following conditions:

• (A1) for all program valuations $\mathbf{b}$ with $\mathbf{b} \models \Phi$, we have $\eta(\mathbf{b}) \ge K'$;

• (A2) for all program valuations $\mathbf{b}$ with $\mathbf{b} \models \Phi$, we have $\mathbb{E}_{\theta, \mathbf{r}}\left(\eta(F(\theta, \mathbf{b}, \mathbf{r}))\right) \le \eta(\mathbf{b}) - \epsilon$;

• (A3) for all program valuations $\mathbf{b}$ with $\mathbf{b} \not\models \Phi$, we have $K \le \eta(\mathbf{b}) \le K'$,

where $\mathbb{E}_{\theta, \mathbf{r}}(\cdot)$ is the expected value such that $\mathbf{b}$ is treated as a constant vector and $\mathbf{r}$ (resp. $\theta$) observes the (joint) probability distributions of the sampling variables (resp. the probabilities of the probabilistic branches).

The existence of an RSM-map proves finite expected termination time of a probabilistic program [17, 25] (see Theorem 0.C.1 in Appendix 0.C). In this sense, an RSM-map controls the randomized number of loop iterations, so that the sensitivity of the whole loop follows almost directly from the continuity of the loop body. However, there is another phenomenon to take into account: executions from different initial program valuations may have different numbers of loop iterations. In this situation, we need to ensure that when the execution from one initial program valuation terminates and the execution from the other does not, the final values of the other execution are not far from the values of the terminated execution. To ensure this property, we require (i) a bounded-update condition, namely that the change of a program variable's value in one loop iteration is bounded, and (ii) an RSM-map that is continuous.
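As a sanity check, the expected-decrease condition of a candidate RSM-map can be tested by simulation before attempting a proof. The sketch below uses a toy loop and a candidate map of our own choosing:

```python
import random

# Candidate RSM-map for the toy loop 'while x <= 10 do x := x + r od'
# with r uniform on [0, 1] (the loop and all constants are our assumptions).
def eta(x):
    return 12.0 - x

def expected_eta_after(x, trials=100_000):
    # Monte Carlo estimate of the expected eta-value after one loop iteration.
    return sum(eta(x + random.random()) for _ in range(trials)) / trials

random.seed(5)
x0 = 3.0      # a program valuation satisfying the guard x <= 10
eps = 0.4     # candidate epsilon; the true expected decrease here is 0.5
decrease_ok = expected_eta_after(x0) <= eta(x0) - eps
```

Such a check can refute a bad candidate quickly; an actual proof must of course establish the decrease for every valuation satisfying the guard, e.g. by constraint solving.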

###### Definition 4 (Bounded Update d)

We say that the loop has bounded update in a program variable $z$ if there exists a real constant $d \ge 0$ such that

• for all $\theta \in \Theta$, all sampling valuations $\mathbf{r}$ and all program valuations $\mathbf{b}$ with $\mathbf{b} \models \Phi$, we have $|F(\theta, \mathbf{b}, \mathbf{r})[z] - \mathbf{b}[z]| \le d$.

###### Definition 5 (Continuity of RSM-maps M)

An RSM-map $\eta$ is continuous if there exists a real constant $M \ge 0$ such that

• for all program valuations $\mathbf{b}, \mathbf{b}'$, we have $|\eta(\mathbf{b}) - \eta(\mathbf{b}')| \le M \cdot \|\mathbf{b} - \mathbf{b}'\|_\infty$.

Now we demonstrate the main result for proving expected affine-sensitivity of non-expansive simple probabilistic while loops.

###### Theorem 5.1

A non-expansive probabilistic loop in the form (2) is expected affine-sensitive in a program variable $z$ over its loop guard if (i) the loop has bounded update in $z$ and (ii) there exists a continuous RSM-map for the loop.

###### Proof (Sketch)

Let $d$ be a bound for $z$ from Definition 4, and let $\eta$ be a continuous RSM-map with $\epsilon, K, K'$ from Definition 3 and the constant $M$ from Definition 5. Consider any initial program valuations $\mathbf{b}, \mathbf{b}'$ such that $\mathbf{b}, \mathbf{b}' \models \Phi$, and denote $\delta := \|\mathbf{b} - \mathbf{b}'\|_\infty$. As we consider independent executions of the loop, we use $T_{\mathbf{b}}$ to denote the random variable for the number of loop iterations of the execution starting from an initial program valuation $\mathbf{b}$. We also use $Z_{\mathbf{b}}$ to denote the random variable for the value of $z$ after the execution of the loop from $\mathbf{b}$. We illustrate the main idea by clarifying the relationship between the program valuations in any runs $\{\mathbf{b}_n\}, \{\mathbf{b}'_n\}$ that start from $\mathbf{b}, \mathbf{b}'$ respectively and use the same sampled values in each loop iteration. Consider that the event $T_{\mathbf{b}} \ge n \wedge T_{\mathbf{b}'} \ge n$ holds (i.e., neither execution terminates before the $n$-th step). We have the following cases:

• Both $\mathbf{b}_n$ and $\mathbf{b}'_n$ violate the loop guard $\Phi$, i.e., $\mathbf{b}_n, \mathbf{b}'_n \not\models \Phi$. This case describes that the loop terminates exactly after the $n$-th iteration for both initial valuations. From the non-expansiveness of the loop body, we obtain directly that $|\mathbf{b}_n[z] - \mathbf{b}'_n[z]| \le \|\mathbf{b}_n - \mathbf{b}'_n\|_\infty \le \delta$.

• Exactly one of $\mathbf{b}_n, \mathbf{b}'_n$ violates the loop guard $\Phi$. W.l.o.g., we assume that $\mathbf{b}_n \models \Phi$ and $\mathbf{b}'_n \not\models \Phi$. From the property of RSM-maps (Theorem 0.C.1 in Appendix 0.C), the expected number of remaining loop iterations from $\mathbf{b}_n$ is at most $(\eta(\mathbf{b}_n) - K)/\epsilon$. From the non-expansiveness of the loop body, we have $\|\mathbf{b}_n - \mathbf{b}'_n\|_\infty \le \delta$. Then, by the continuity of the RSM-map (Definition 5), we have $\eta(\mathbf{b}_n) \le \eta(\mathbf{b}'_n) + M \cdot \delta$, and from the condition (A3) we have $\eta(\mathbf{b}'_n) \le K'$. By the bounded-update condition, we then obtain $|\mathbb{E}_{\mathbf{b}_n}(Z_{\mathbf{b}_n}) - \mathbf{b}_n[z]| \le d \cdot (M \cdot \delta + K' - K)/\epsilon$. Hence we have

$$|\mathbb{E}_{\mathbf{b}_n}(Z_{\mathbf{b}_n}) - \mathbb{E}_{\mathbf{b}'_n}(Z_{\mathbf{b}'_n})| = |\mathbb{E}_{\mathbf{b}_n}(Z_{\mathbf{b}_n}) - \mathbf{b}'_n[z]| \le |\mathbb{E}_{\mathbf{b}_n}(Z_{\mathbf{b}_n}) - \mathbf{b}_n[z]| + |\mathbf{b}_n[z] - \mathbf{b}'_n[z]| \le \frac{d \cdot (M \cdot \delta + K' - K)}{\epsilon} + \delta = \left(\frac{d \cdot M}{\epsilon} + 1\right) \cdot \delta + \frac{d \cdot (K' - K)}{\epsilon}.$$
• Neither $\mathbf{b}_n$ nor $\mathbf{b}'_n$ violates the loop guard $\Phi$. In this case, the loop continues from both $\mathbf{b}_n$ and $\mathbf{b}'_n$. Then, in the next iteration, the same analysis can be carried out for the next program valuations $\mathbf{b}_{n+1}, \mathbf{b}'_{n+1}$.

From the termination property ensured by RSM-maps (Theorem 0.C.1 in Appendix 0.C), the probability that the third case happens infinitely often equals zero. Thus, the sensitivity analysis eventually reduces to the first two cases, and the first two cases yield the expected affine-sensitivity. From the first two cases, the difference contributed to the total sensitivity when one of the runs terminates at a step $n$ is at most

$$\mathbb{P}(T_{\mathbf{b}} = n \vee T_{\mathbf{b}'} = n) \cdot \left[\left(\frac{d \cdot M}{\epsilon} + 1\right) \cdot \delta + \frac{d \cdot (K' - K)}{\epsilon}\right].$$

Then, by an integral expansion and a summation over all $n$, we can derive the desired result. The detailed proof is given in Appendix 0.C. ∎

###### Example 4 (Running Example)

Consider our running example in Figure 4 on Page 4. Here we choose the sampling variable $r$ to observe a two-valued (Bernoulli-type) distribution. We can construct an RSM-map $\eta$ together with suitable constants $\epsilon, K, K'$, and the RSM-map is also continuous with a suitable constant $M$. Hence from Theorem 5.1, we conclude that the loop is expected affine-sensitive in the program variable $x$ over its loop guard.

###### Example 5 (Mini-roulette)

We now show that the Mini-roulette example in Figure 1 (left) is expected affine-sensitive in the program variable recording the accumulated reward over its loop guard. To show this, we construct an RSM-map $\eta$ that is affine in the chip-count variable. We also clarify the following points.

1. For any values of the chip-count variable before a loop iteration and any $\theta$ that resolves the probabilistic branches, the change of the chip count in the loop iteration is determined solely by the probabilistic branch taken, and hence is the same under both valuations. The same applies to the reward variable. Thus the loop is non-expansive.

2. All increments to the reward variable are bounded; hence the loop has bounded update in it.

3. The loop guard (that the player still has a chip) implies a lower bound on $\eta$, so (A1) is satisfied. Inside the loop guard, the expected value of $\eta$ decreases by a positive amount in each iteration, ensuring (A2). Outside the loop guard, $\eta$ is bounded from both sides, ensuring (A3). Thus $\eta$ is an RSM-map.

4. Given any two program valuations, the difference of the $\eta$-values is bounded linearly in the max-norm of their difference. Thus $\eta$ is continuous.

Thus, by Theorem 5.1, we obtain that the program is expected affine-sensitive in the reward variable over its loop guard.

### 5.2 Proving Expected Linear-Sensitivity

To prove expected linear-sensitivity, one possible way is to extend the approach for expected affine-sensitivity. However, simply extending the approach is not correct, as shown by the following example.

###### Example 6

Consider again our running example in Figure 4, where the sampling variable $r$ observes a distribution over two positive integer values. From Example 4, we have that the program is expected affine-sensitive in the program variable $x$. However, we show that the program is not expected linear-sensitive in $x$. Consider two initial inputs that differ by a small amount $\delta > 0$: since we only add integer sampled values to $x$, the possible output values under one input form an integer grid shifted by $\delta$ relative to the other, and the runs can terminate with different final values. As a consequence, the difference of the expected outputs stays bounded away from zero as $\delta \to 0$, so we could not find a constant $A$ such that the difference of the expected outputs is at most $A \cdot \delta$ when $\delta$ is sufficiently small.
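The failure of linear sensitivity can be observed numerically. The following sketch fixes a concrete instantiation of our own (increments 1 or 2 with probability 1/2 each, guard x <= 10), which is an assumption in the spirit of Example 6 rather than the paper's exact parameters:

```python
import random

def run(x):
    # Concrete instantiation (our assumption): r takes the value 1 or 2
    # with probability 1/2 each, and the loop guard is x <= 10.
    while x <= 10:
        x += random.choice([1, 2])
    return x

def expected_output(x0, trials=40_000):
    return sum(run(x0) for _ in range(trials)) / trials

random.seed(6)
# Starting from 0, every run stays on the integer grid; starting from a
# tiny delta > 0 shifts the grid and changes when the guard first fails,
# so the expected outputs stay roughly one unit apart however small delta is.
gap = abs(expected_output(0.0) - expected_output(1e-6))
```

The estimated `gap` does not shrink with `delta`, which is exactly why no linear bound $A \cdot \delta$ can hold.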

The reason why we have the situation in Example 6 has been illustrated previously: under different initial program valuations, the loop may terminate in different numbers of loop iterations, even if the same values for the sampling variables are sampled. In order to handle this phenomenon, we introduce a condition that bounds the probability that this phenomenon happens.

###### Definition 6 (Continuity-Upon-Termination)

We say that the loop body of a probabilistic while loop in the form (2) is continuous upon termination if there exists a constant $L' \ge 0$ such that

$$p_{\theta, \mathbf{b}, \mathbf{b}'} \le L' \cdot \|\mathbf{b} - \mathbf{b}'\|_\infty \quad \text{for all } \theta \in \Theta \text{ and program valuations } \mathbf{b}, \mathbf{b}' \models \Phi,$$

where $p_{\theta, \mathbf{b}, \mathbf{b}'}$ is the probability, regarding the sampled values, that given the program valuations before the loop iteration being $\mathbf{b}, \mathbf{b}'$ and the probabilistic branches resolved by $\theta$, after one loop iteration $F(\theta, \mathbf{b}, \mathbf{r})$ can still enter the loop, while $F(\theta, \mathbf{b}', \mathbf{r})$ violates the loop guard and the loop stops.

Informally, the continuity-upon-termination property requires that when the initial program valuations to the loop body are close, the probability that after the current loop iteration one of them stays in the loop while the other jumps out of the loop is small. This condition is satisfied in many situations where we have continuously-distributed sampling variables. For example, consider our running example (Example 3) where $r$ now observes the uniform distribution over an interval $[0, c]$. Then, for any initial values $x_0, x_0'$ of the program variable $x$, the probability that one successor valuation satisfies the loop guard while the other does not equals the chance that the sampled value of $r$ falls in an interval of length $|x_0 - x_0'|$, which is no greater than $|x_0 - x_0'| / c$ as the probability density function of $r$ is $1/c$ over the interval $[0, c]$. Thus, the continuity-upon-termination property is satisfied.
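The computation in this uniform-distribution example can be checked by simulation. The interval bound `c`, the guard threshold `K`, and the valuations below are illustrative assumptions:

```python
import random

def split_probability(x1, x2, c=4.0, K=10.0, trials=200_000):
    # Estimate the probability that, with the same sampled r ~ U[0, c],
    # one successor valuation still satisfies the guard x <= K while the
    # other violates it (c, K and the valuations are our illustrative choices).
    count = 0
    for _ in range(trials):
        r = random.uniform(0.0, c)
        if (x1 + r <= K) != (x2 + r <= K):
            count += 1
    return count / trials

random.seed(7)
delta = 0.2
p = split_probability(7.0, 7.0 + delta)
# The density of r is 1/c, so the bound predicts p <= delta / c = 0.05.
```

Here the two successors disagree exactly when `r` falls in an interval of length `delta`, so the estimate matches the analytic bound `delta / c`.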

To show the applicability of this property, we prove a result showing that a large class of simple affine probabilistic while loops has this property. Below we say that a propositional arithmetic predicate $\Phi$ is affine if $\Phi$ can be equivalently rewritten into a disjunctive normal form $\bigvee_{i} (A_i \, \mathbf{x} \le \mathbf{c}_i)$ with constant matrices $A_i$ and constant vectors $\mathbf{c}_i$, so that for all program valuations $\mathbf{b}$, we have $\mathbf{b} \models \Phi$ iff the statement $\bigvee_{i} (A_i \, \mathbf{b} \le \mathbf{c}_i)$ holds.

###### Lemma 1

Given a probabilistic while loop in the form (2) with loop body $P$, loop guard $\Phi$ and update function $F$, if it holds that

1. $F$ is an affine function in the input program and sampling valuations, and $\Phi$ is affine with an equivalent DNF form $\bigvee_{i} (A_i \, \mathbf{x} \le \mathbf{c}_i)$, and

2. all the sampling variables are continuously distributed and have bounded probability density functions, and

3. the sampling variables are not dummy in the atomic propositions of the loop guard $\Phi$, i.e., for all $i$ and all program valuations, the coefficients over the sampling variables in the corresponding affine constraints after one loop iteration are not all zero in any row,

then the loop has the continuity-upon-termination property.

The proof of Lemma 1 is elementary and is given in Appendix 0.C.

Continuity-upon-termination is a key property for ensuring expected linear-sensitivity. With this property, we can bound the probability that the executions from different initial program valuations take different numbers of loop iterations linearly in the difference between the initial program valuations. The main result for expected linear-sensitivity is as follows.

###### Theorem 5.2

A non-expansive probabilistic loop in the form (2) is expected linear-sensitive in a program variable $z$ over its loop guard if (i) the loop has bounded update in $z$, (ii) there exists a continuous RSM-map for the loop and (iii) its loop body has the continuity-upon-termination property.

###### Proof

The proof resembles the one for expected affine-sensitivity (Theorem 5.1). Consider runs $\{\mathbf{b}_n\}, \{\mathbf{b}'_n\}$ that start from program valuations $\mathbf{b}, \mathbf{b}'$ respectively and use the same sampled values in each loop iteration. Consider that the event $T_{\mathbf{b}} \ge n \wedge T_{\mathbf{b}'} \ge n$ holds (i.e., neither execution terminates before the $n$-th step).

We have exactly the three cases demonstrated in expected affine-sensitivity analysis, and again the sensitivity analysis eventually reduces to the first two cases (see the proof for Theorem 5.1). As we enhance the conditions in Theorem 5.1 with the continuity-upon-termination property, we have a strengthened analysis for the second case (exactly one of jumps out of the loop guard) as follows. W.l.o.g, we assume that and in the second case. As in the proof of Theorem 5.1, we obtain that

$$\left|\mathbb{E}_{b_n}(Z_{b_n})-\mathbb{E}_{b'_n}(Z_{b'_n})\right| \le \Big(d\cdot\frac{M}{\epsilon}+1\Big)\cdot\delta + d\cdot\frac{K'-K}{\epsilon} \le C$$

for some constant . Now we have the continuity-upon-termination property, so that the second case happens with probability at most , where is from Definition 6. Thus, the difference contributed to the total sensitivity when one of the runs terminates at step is at most

$$C\cdot L'\cdot \mathbb{P}\big(T_b=n-1 \vee T_{b'}=n-1\big)\cdot\|b_{n-1}-b'_{n-1}\|_\infty + \mathbb{P}\big(T_b=n \vee T_{b'}=n\big)\cdot\|b_n-b'_n\|_\infty$$

where the first summand comes from the second case and the second from the first case. Summing over all , we obtain the desired expected linear-sensitivity. ∎
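The coupling used in this proof — two runs driven by the same sampled values in every iteration — also suggests a direct way to estimate expected sensitivity empirically. Below is a minimal Monte Carlo sketch; the `step`, `guard`, and valuation encodings are hypothetical stand-ins for illustration, not the paper's formal model.

```python
import random

def coupled_runs(step, guard, b1, b2, max_iter=10_000):
    """Execute two runs of the same loop from valuations b1, b2,
    reusing the SAME sampled value in each iteration (a coupling)."""
    n = 0
    while (guard(b1) or guard(b2)) and n < max_iter:
        r = random.random()          # shared sample for both runs
        if guard(b1):
            b1 = step(b1, r)
        if guard(b2):
            b2 = step(b2, r)
        n += 1
    return b1, b2

def estimate_sensitivity(step, guard, var, b1, b2, trials=2_000):
    """Monte Carlo estimate of |E_b(Z_b) - E_b'(Z_b')| for variable var."""
    total = 0.0
    for _ in range(trials):
        f1, f2 = coupled_runs(step, guard, dict(b1), dict(b2))
        total += f1[var] - f2[var]
    return abs(total) / trials
```

For instance, for the toy non-expansive loop `while x > 0: x := x - r` with `r ~ Uniform[0, 1]`, the estimate stays within a small multiple of the initial difference between the two values of `x`, as the theorem predicts.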

###### Example 7

We now show that the variant of the Mini-roulette example in Figure 1 (right) is expected linear-sensitive in the program variable over its loop guard. To show this, we construct the function with . We also clarify the following points.

1. For any values of the program variables before a loop iteration and any sampled values that resolve the probabilistic branches, we have that after the loop iteration, where the value of is decided by the executed branch and its distribution (i.e., for branch , ). The same applies to the program variable . Thus the loop body is non-expansive.

2. All increments to are bounded; hence the loop has bounded update in .

3. The loop guard implies , thus (A1) is satisfied. When , and , we have , ensuring (A2). When , , we have , ensuring (A3). Thus is an RSM-map.

4. Given any values and to the program variables , we have . Thus is continuous.

5. By Lemma 1, we conclude that the loop has the continuity-upon-termination property.

Then by Theorem 5.2, we can conclude that this probabilistic program is expected linear-sensitive in the program variable .

## 6 Proving Expected Sensitivity for General Loops

In the following, we show how our sound approach for proving expected sensitivity of non-expansive loops can be extended to general loops with continuous loop bodies. We first illustrate the main difficulty in lifting from the non-expansive to the general case. Then we enhance general RSM-maps to difference-bounded RSM-maps and show how they address this difficulty. Next, we illustrate our approach for proving expected affine-sensitivity. Finally, we present a case study on regularized SGD algorithms to show that our approaches can indeed solve problems arising from real applications.

A major barrier in handling general loops is that the difference between two program valuations may tend to infinity as the number of loop iterations increases. For example, consider a simple probabilistic while loop where in every loop iteration the value of a program variable is tripled and the loop terminates with probability 1/2 immediately after the current loop iteration. Then given two different initial values for this variable, we have that

$$\mathbb{E}_{z'}(Z')-\mathbb{E}_{z''}(Z'')=\sum_{n=1}^{\infty}\mathbb{P}(T=n)\cdot 3^n\cdot|z'-z''|=\sum_{n=1}^{\infty}\Big(\frac{3}{2}\Big)^n\cdot|z'-z''|=\infty,$$

where are given in (1). Thus the expected-sensitivity property does not hold for this example, as the difference between valuations grows faster than the probability of termination decays. To cope with this, we consider difference-bounded RSM-maps, as follows.
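The divergence can be made concrete. Assuming termination probability 1/2 per iteration (consistent with the ratio 3/2 in the series above), the contribution of the first N iterations to the expected difference is a partial geometric sum with ratio 3/2, which grows without bound:

```python
def partial_expected_diff(z1, z2, N):
    """Sum over n = 1..N of P(T = n) * 3**n * |z1 - z2|,
    with P(T = n) = (1/2)**n for the tripling loop."""
    return sum((0.5 ** n) * (3 ** n) * abs(z1 - z2) for n in range(1, N + 1))
```

Each additional iteration multiplies the next term by 3/2, so the partial sums diverge as N grows.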

###### Definition 7 (Difference-bounded RSM-maps)

An RSM-map for a loop in the form (2) with the update function and the loop guard is difference-bounded if it holds that

for some non-negative real constant .

The condition (A4) specifies that the difference between the values of the RSM-map before and after the execution of every loop iteration is bounded. This condition ensures an exponential decrease in the probability of non-termination (see Theorem 0.D.1 in Appendix 0.D).

Based on Theorem 0.D.1, we demonstrate our sound approach for proving expected affine-sensitivity of general loops. The main idea is to use the exponential decrease obtained from difference-bounded RSM-maps to counteract the unbounded increase in the difference between program valuations. Below, for a program valuation and a propositional arithmetic predicate , we denote by the neighbourhood . Then the main result is as follows.

###### Theorem 6.1

Consider a loop in the form (2) and a program variable such that the following conditions hold: (i) the loop body is continuous with the constant specified in Definition 2, and has bounded update in the program variable ; (ii) there exists a continuous difference-bounded RSM-map for with parameters from Definition 3 and Definition 7 such that . Then for any program valuation such that and , there exists such that the loop is expected affine-sensitive in over .

The proof resembles the one for Theorem 5.1 and compares the growth of the difference with the exponentially decreasing factor . See Appendix 0.D for the detailed proof.
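The interplay in Theorem 6.1 between per-iteration growth and exponential decay can be checked numerically. In the sketch below (an illustration with made-up constants, not the paper's proof), the difference grows by a factor `c` per iteration while the difference-bounded RSM-map yields a tail bound of the form `A * rho**n`; the total expected difference is a geometric series that converges exactly when `c * rho < 1`:

```python
def total_expected_diff(c, rho, A=1.0, delta=1.0, N=10_000):
    """Truncated sum of A * (rho * c)**n * delta over n = 1..N:
    the full series is finite iff c * rho < 1."""
    return sum(A * ((rho * c) ** n) * delta for n in range(1, N + 1))
```

With `c = 3` and `rho = 1/4` (so `c * rho = 3/4`) the series converges to 3; with `rho = 1/2`, as in the divergent tripling example of this section, the partial sums blow up.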

### 6.1 Case Study on Regularized SGD

We now demonstrate how one can verify through Theorem 6.1 that the regularized SGD from Example 2 is expected affine-sensitive in the parameters around a sufficiently small neighbourhood of an initial vector of parameters, when the step size is small enough and the threshold is above the optimal minimal value. For simplicity, we directly treat the program variable as a sampling variable that observes the discrete uniform probability distribution over . We also omit , as there is only one element in . Due to the page limit, we give only a brief description below; the details are in Appendix 0.E.

The First Step. We show that there exists a radius such that for all , it always holds that , where (resp. ) represents the random value for the program variable (resp. ) right before the -th execution of the loop body. Thus, the values for the program variables will always be bounded no matter which randomized execution the program takes.

The Second Step. We construct an RSM-map for the regularized SGD algorithm. Define the RSM-map to be . From the Taylor expansion, we have that

$$G(w+\Delta w)=G(w)+(\nabla G(w))^{\mathrm{T}}\Delta w+\frac{1}{2}(\Delta w)^{\mathrm{T}}H\,\Delta w$$

where and is the Hessian matrix of . Then by averaging over all ’s from , we obtain that

$$\mathbb{E}_i\big(\eta(F(i,w))\big)=G(w)-\gamma(\nabla G(w))^{\mathrm{T}}\nabla G(w)+\frac{\gamma^2}{2n}\sum_{i=1}^{n}\big(\nabla G_i(w)+\alpha\cdot w\big)^{\mathrm{T}}H\big(\nabla G_i(w)+\alpha\cdot w\big).$$

Recall that the vector is always bounded by . Moreover, as is continuous and non-zero over the bounded region , has a non-zero minimum over this region. Thus when is sufficiently small, we have that , where for some constants derivable from the boundedness within . From the boundedness, we can similarly derive constants such that (i) for all , (ii) the loop has bounded update in both with a bound , (iii) the loop is continuous with a constant , and (iv) the RSM-map is continuous with some constant . We can also choose and for some constant from the loop guard and the bounded update of the loop body. Thus, by applying Theorem 6.1, the regularized SGD for linear regression is expected affine-sensitive.
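To make the case study concrete, the coupled-run argument can be replayed on a toy one-dimensional instance of regularized SGD for linear regression. The sketch below is only illustrative: the data set, step size `gamma = 0.01`, regularization weight `alpha = 0.1`, and loss threshold `0.6` are hypothetical values, not the constants derived in Appendix 0.E.

```python
import random

def reg_sgd(w, data, gamma=0.01, alpha=0.1, thresh=0.6,
            rng=None, max_iter=100_000):
    """Regularized SGD on F_i(w) = (w*x_i - y_i)**2 / 2 + alpha*w**2 / 2,
    looping while the regularized empirical loss exceeds `thresh`
    (so the number of iterations is randomized, not fixed)."""
    rng = rng or random.Random()

    def loss(w):
        return (sum((w * x - y) ** 2 for x, y in data) / (2 * len(data))
                + alpha * w * w / 2)

    n = 0
    while loss(w) > thresh and n < max_iter:
        x, y = rng.choice(data)               # i sampled uniformly
        grad = (w * x - y) * x + alpha * w    # gradient of F_i at w
        w -= gamma * grad
        n += 1
    return w
```

Running two copies from nearby initial parameters with identically seeded samplers (the coupling), the final parameters stay close — an empirical shadow of the expected affine-sensitivity established above.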

The Third Step. Below we show that the derived expected affine-sensitivity is non-trivial for this example. From the detailed proof of Theorem 6.1 in Appendix 0.D, we obtain that

$$\left|\mathbb{E}_w(W_1)-\mathbb{E}_{w'}(W_1)\right|\le \overline{A}\cdot\|w-w'\|_\infty+\overline{B}$$

where are the random variables for the value of after the execution of the SGD algorithm and are the initial input parameters around a fixed initial program valuation . Furthermore, through a detailed calculation, we have that as the step size tends to zero, the coefficient tends to zero, while the coefficient remains bounded (though it depends on ). Similar arguments hold for the program variable . This shows that the regularized SGD algorithm for linear regression is approximately expected linear-sensitive as the step size tends to zero. See Appendix 0.E for details.

## 7 Experimental Results

We have implemented our approach and obtained experimental results on a variety of programs. We follow previous approaches on synthesizing linear/polynomial RSM-maps [17, 13, 15] and use Lemma 1 to ensure the continuity-upon-termination property.

The Algorithm. Firstly, we set up a template for an RSM-map with unknown coefficients. Secondly, our algorithm transforms the conditions of RSM-maps and other side conditions (e.g., continuity, bounded update) into a set of linear inequalities through either Farkas’ Lemma or Handelman’s Theorem (see [17, 13, 15] for details). Finally, our algorithm solves for the unknown coefficients in the template through linear programming, and outputs the desired RSM-map together with other constants that witness the expected sensitivity of the input program.
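To illustrate the pipeline, here is a deliberately tiny stand-in for the synthesis step. For the toy loop `while x >= 1: x := x + r` with `r ~ Uniform[-2, 1]` (so `E[r] = -1/2`) and a linear template `eta(x) = a*x + c`, the RSM conditions reduce to linear constraints on `(a, c)`; a grid search plays the role that Farkas' Lemma plus linear programming play in the actual implementation (the loop, template, and constraint shapes here are simplified assumptions, not the tool itself):

```python
def is_rsm(a, c, eps=0.5):
    """Check the linear-template RSM conditions for the toy loop
    while x >= 1: x := x + r, with r ~ Uniform[-2, 1] (E[r] = -1/2)."""
    # non-negativity on the guard x >= 1: need a >= 0 and eta(1) >= 0
    nonneg = a >= 0 and a * 1 + c >= 0
    # expected decrease by eps on the guard:
    #   E[eta(x + r)] = a*(x - 0.5) + c <= a*x + c - eps   <=>   0.5*a >= eps
    decrease = 0.5 * a >= eps
    return nonneg and decrease

def synthesize(eps=0.5):
    """Search a small grid of template coefficients (a, c);
    a stand-in for the Farkas/LP step of the actual algorithm."""
    for a in (i / 10 for i in range(0, 51)):
        for c in (i / 10 for i in range(-50, 51)):
            if is_rsm(a, c, eps):
                return a, c
    return None
```

The first solution found is `eta(x) = x - 1`, which is non-negative on the guard and decreases by 1/2 in expectation on each iteration inside the guard.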

Results. We consider examples and their variants from the literature [15, 16, 17, 35]. All experimental examples have randomized execution time. We implemented our approach in Matlab R2018b. The results were obtained on a Windows machine with an Intel Core i5 2.9GHz processor and 8GB of RAM. The experimental results for examples from [16, 35] are shown in Table 1, where the first column specifies the example and the program variable of concern, the second gives the running time for the example, and the last five specify the RSM-maps, related constants, and the type (i.e., expected affine-sensitive or expected linear-sensitive). A more detailed table with additional constants and examples is available in Appendix 0.F.

## 8 Related Work

In program verification, Lipschitz continuity has been studied extensively: an SMT-based method for proving programs robust for a core imperative language is presented in ; a linear type system for proving sensitivity has been developed in ; approaches for differential privacy in higher-order languages have also been considered [3, 26, 38].

For probabilistic programs, the computation of expectation properties has been studied over the decades, with influential works such as PPDL  and PGCL . Various approaches have been developed to reason about the expected termination time of probabilistic programs [30, 25, 17], as well as to reason about whether a probabilistic program terminates with probability 1 [32, 28, 1, 18]. However, these works focus on non-relational properties, such as upper bounds on expected termination time, whereas expected sensitivity is intrinsically relational. To the best of our knowledge, while RSMs have been used for non-relational properties, we are the first to apply them to relational properties.

There is also a great body of literature on relational analysis of probabilistic programs, such as relational program logics  and differential privacy of algorithms . However, this line of work does not consider relational expectation properties. There have also been several works on relational expectation properties, e.g., in the area of masking implementations in cryptography, quantitative masking  and the bounded moment model . The general framework for program sensitivity was considered in , and later improved in . Several classical examples such as stochastic gradient descent with a fixed number of iterations or Glauber dynamics can be analyzed in the framework of . Another method for the sensitivity analysis of probabilistic programs has been proposed in , where a linear-regression example derived from the algorithm in  is analysed. Our work extends  by allowing a randomized number of iterations with finite expectation, and by using RSMs instead of coupling, which leads to an automated approach. A detailed comparison with the techniques of  is provided in Section 1.

## 9 Conclusion

In this work we considered the sensitivity analysis of probabilistic programs, and presented an automated sound approach for the analysis of programs whose expected number of iterations is finite, rather than the number of iterations being fixed. Our method is not compositional, and an interesting direction of future work is to investigate how compositional analysis methods can be incorporated into the approach proposed in this work.

## References

•  Agrawal, S., Chatterjee, K., Novotný, P.: Lexicographic ranking supermartingales: an efficient approach to termination of probabilistic programs. PACMPL 2(POPL), 34:1–34:32 (2018). https://doi.org/10.1145/3158122
•  Aldous, D.J.: Random walks on finite groups and rapidly mixing Markov chains. Séminaire de probabilités de Strasbourg 17, 243–297 (1983)
•  de Amorim, A.A., Gaboardi, M., Hsu, J., Katsumata, S., Cherigui, I.: A semantic account of metric preservation. In: Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, POPL 2017, Paris, France, January 18-20, 2017. pp. 545–556 (2017), http://dl.acm.org/citation.cfm?id=3009890
•  Barthe, G., Dupressoir, F., Faust, S., Grégoire, B., Standaert, F., Strub, P.: Parallel implementations of masking schemes and the bounded moment leakage model. IACR Cryptology ePrint Archive 2016,  912 (2016), http://eprint.iacr.org/2016/912
•  Barthe, G., Espitau, T., Grégoire, B., Hsu, J., Strub, P.: Proving expected sensitivity of probabilistic programs. PACMPL 2(POPL), 57:1–57:29 (2018). https://doi.org/10.1145/3158145, http://doi.acm.org/10.1145/3158145
•  Barthe, G., Grégoire, B., Béguelin, S.Z.: Formal certification of code-based cryptographic proofs. In: Proceedings of the 36th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2009, Savannah, GA, USA, January 21-23, 2009. pp. 90–101 (2009). https://doi.org/10.1145/1480881.1480894, https://doi.org/10.1145/1480881.1480894
•  Barthe, G., Grégoire, B., Hsu, J., Strub, P.: Coupling proofs are probabilistic product programs. In: Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, POPL 2017, Paris, France, January 18-20, 2017. pp. 161–174 (2017), http://dl.acm.org/citation.cfm?id=3009896
•  Barthe, G., Köpf, B., Olmedo, F., Béguelin, S.Z.: Probabilistic relational reasoning for differential privacy. In: Proceedings of the 39th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2012, Philadelphia, Pennsylvania, USA, January 22-28, 2012. pp. 97–110 (2012). https://doi.org/10.1145/2103656.2103670, https://doi.org/10.1145/2103656.2103670
•  Billingsley, P.: Probability and Measure. John Wiley & Sons (1995)
•  Bottou, L.: Stochastic gradient descent tricks. In: Montavon, G., Orr, G.B., Müller, K. (eds.) Neural Networks: Tricks of the Trade - Second Edition, Lecture Notes in Computer Science, vol. 7700, pp. 421–436. Springer (2012). https://doi.org/10.1007/978-3-642-35289-8_25
•  Bousquet, O., Elisseeff, A.: Stability and generalization. Journal of Machine Learning Research 2, 499–526 (2002), http://www.jmlr.org/papers/v2/bousquet02a.html
•  van Breugel, F., Worrell, J.: Approximating and computing behavioural distances in probabilistic transition systems. Theor. Comput. Sci. 360(1-3), 373–385 (2006). https://doi.org/10.1016/j.tcs.2006.05.021, https://doi.org/10.1016/j.tcs.2006.05.021
•  Chakarov, A., Sankaranarayanan, S.: Probabilistic program analysis with martingales. In: CAV 2013. pp. 511–526 (2013)
•  Chatterjee, K.: Robustness of structurally equivalent concurrent parity games. In: Foundations of Software Science and Computational Structures - 15th International Conference, FOSSACS 2012, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2012, Tallinn, Estonia, March 24 - April 1, 2012. Proceedings. pp. 270–285 (2012). https://doi.org/10.1007/978-3-642-28729-9_18, https://doi.org/10.1007/978-3-642-28729-9_18
•  Chatterjee, K., Fu, H., Goharshady, A.K.: Termination analysis of probabilistic programs through positivstellensatz’s. In: Chaudhuri, S., Farzan, A. (eds.) Computer Aided Verification - 28th International Conference, CAV 2016, Toronto, ON, Canada, July 17-23, 2016, Proceedings, Part I. Lecture Notes in Computer Science, vol. 9779, pp. 3–22. Springer (2016). https://doi.org/10.1007/978-3-319-41528-4_1, https://doi.org/10.1007/978-3-319-41528-4_1
•  Chatterjee, K., Fu, H., Goharshady, A.K., Okati, N.: Computational approaches for stochastic shortest path on succinct mdps. In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Sweden. pp. 4700–4707 (2018). https://doi.org/10.24963/ijcai.2018/653
•  Chatterjee, K., Fu, H., Novotný, P., Hasheminezhad, R.: Algorithmic analysis of qualitative and quantitative termination problems for affine probabilistic programs. ACM Trans. Program. Lang. Syst. 40(2), 7:1–7:45 (2018). https://doi.org/10.1145/3174800, https://doi.org/10.1145/3174800
•  Chatterjee, K., Novotný, P., Žikelić, Đ.: Stochastic invariants for probabilistic termination. In: POPL 2017. pp. 145–160 (2017)
•  Chaudhuri, S., Gulwani, S., Lublinerman, R.: Continuity analysis of programs. In: Proceedings of the 37th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2010, Madrid, Spain, January 17-23, 2010. pp. 57–70 (2010). https://doi.org/10.1145/1706299.1706308, https://doi.org/10.1145/1706299.1706308
•  Desharnais, J., Gupta, V., Jagadeesan, R., Panangaden, P.: Metrics for labelled markov processes. Theor. Comput. Sci. 318(3), 323–354 (2004). https://doi.org/10.1016/j.tcs.2003.09.013, https://doi.org/10.1016/j.tcs.2003.09.013
•  Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Proceedings of the Third Conference on Theory of Cryptography. pp. 265–284. TCC’06, Springer-Verlag, Berlin, Heidelberg (2006)
•  Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci. 9(3-4), 211–407 (2014)
•  Eldib, H., Wang, C., Taha, M.M.I., Schaumont, P.: Quantitative masking strength: Quantifying the power side-channel resistance of software code. IEEE Trans. on CAD of Integrated Circuits and Systems 34(10), 1558–1568 (2015). https://doi.org/10.1109/TCAD.2015.2424951, https://doi.org/10.1109/TCAD.2015.2424951
•  Fu, H.: Computing game metrics on markov decision processes. In: Czumaj, A., Mehlhorn, K., Pitts, A.M., Wattenhofer, R. (eds.) Automata, Languages, and Programming - 39th International Colloquium, ICALP 2012, Warwick, UK, July 9-13, 2012, Proceedings, Part II. Lecture Notes in Computer Science, vol. 7392, pp. 227–238. Springer (2012). https://doi.org/10.1007/978-3-642-31585-5_23, https://doi.org/10.1007/978-3-642-31585-5_23
•  Fu, H., Chatterjee, K.: Termination of nondeterministic probabilistic programs. In: Enea, C., Piskac, R. (eds.) Verification, Model Checking, and Abstract Interpretation - 20th International Conference, VMCAI 2019, Cascais, Portugal, January 13-15, 2019, Proceedings. Lecture Notes in Computer Science, vol. 11388, pp. 468–490. Springer (2019). https://doi.org/10.1007/978-3-030-11245-5_22, https://doi.org/10.1007/978-3-030-11245-5_22
•  Gaboardi, M., Haeberlen, A., Hsu, J., Narayan, A., Pierce, B.C.: Linear dependent types for differential privacy. In: The 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’13, Rome, Italy - January 23 - 25, 2013. pp. 357–370 (2013). https://doi.org/10.1145/2429069.2429113, https://doi.org/10.1145/2429069.2429113
•  Hardt, M., Recht, B., Singer, Y.: Train faster, generalize better: Stability of stochastic gradient descent. In: Proceedings of the 33nd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016. pp. 1225–1234 (2016), http://jmlr.org/proceedings/papers/v48/hardt16.html
•  Huang, M., Fu, H., Chatterjee, K.: New approaches for almost-sure termination of probabilistic programs. In: Ryu, S. (ed.) Programming Languages and Systems - 16th Asian Symposium, APLAS 2018, Wellington, New Zealand, December 2-6, 2018, Proceedings. Lecture Notes in Computer Science, vol. 11275, pp. 181–201. Springer (2018). https://doi.org/10.1007/978-3-030-02768-1_11, https://doi.org/10.1007/978-3-030-02768-1_11
•  Huang, Z., Wang, Z., Misailovic, S.: Psense: Automatic sensitivity analysis for probabilistic programs. In: Automated Technology for Verification and Analysis - 16th International Symposium, ATVA 2018, Los Angeles, CA, USA, October 7-10, 2018, Proceedings. pp. 387–403 (2018). https://doi.org/10.1007/978-3-030-01090-4_23, https://doi.org/10.1007/978-3-030-01090-4_23
•  Kaminski, B.L., Katoen, J., Matheja, C., Olmedo, F.: Weakest precondition reasoning for expected run-times of probabilistic programs. In: Programming Languages and Systems - 25th European Symposium on Programming, ESOP 2016, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2016, Eindhoven, The Netherlands, April 2-8, 2016, Proceedings. pp. 364–389 (2016). https://doi.org/10.1007/978-3-662-49498-1_15, https://doi.org/10.1007/978-3-662-49498-1_15
•  Kozen, D.: A probabilistic PDL. J. Comput. Syst. Sci. 30(2), 162–178 (1985). https://doi.org/10.1016/0022-0000(85)90012-1, https://doi.org/10.1016/0022-0000(85)90012-1
•  McIver, A., Morgan, C., Kaminski, B.L., Katoen, J.P.: A new proof rule for almost-sure termination. Proceedings of the ACM on Programming Languages 2(POPL),  33 (2017)
•  Meyn, S., Tweedie, R.: Markov Chains and Stochastic Stability. Springer-Verlag, London (1993), available at: probability.ca/MT
•  Morgan, C., McIver, A., Seidel, K.: Probabilistic predicate transformers. ACM Trans. Program. Lang. Syst. 18(3), 325–353 (1996). https://doi.org/10.1145/229542.229547, https://doi.org/10.1145/229542.229547
•  Ngo, V.C., Carbonneaux, Q., Hoffmann, J.: Bounded expectations: resource analysis for probabilistic programs. In: Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2018, Philadelphia, PA, USA, June 18-22, 2018. pp. 496–512 (2018). https://doi.org/10.1145/3192366.3192394, https://doi.org/10.1145/3192366.3192394
•  Reed, J., Pierce, B.C.: Distance makes the types grow stronger: a calculus for differential privacy. In: Proceeding of the 15th ACM SIGPLAN international conference on Functional programming, ICFP 2010, Baltimore, Maryland, USA, September 27-29, 2010. pp. 157–168 (2010). https://doi.org/10.1145/1863543.1863568, https://doi.org/10.1145/1863543.1863568
•  Williams, D.: Probability with Martingales. Cambridge University Press (1991)
•  Winograd-Cort, D., Haeberlen, A., Roth, A., Pierce, B.C.: A framework for adaptive differential privacy. PACMPL 1(ICFP), 10:1–10:29 (2017). https://doi.org/10.1145/3110254, https://doi.org/10.1145/3110254

## Appendix 0.A The Detailed Syntax

The detailed syntax is in Figure 5.

## Appendix 0.B Proof for the Integral Expansion

###### Theorem 0.B.1 (Integral Expansion)

Let be a simple probabilistic while loop in the form (2) and be a program variable. For any initial program valuation such that , we have

$$\mathbb{E}_b(Z_b)=\int \sum_{\ell\in L} p_\ell \cdot \mathbb{E}_{F(\ell,b,r)}\big(Z_{F(\ell,b,r)}\big)\,\mathrm{d}r$$

where is the random variable for the value of starting from the initial program valuation , and is the probability that the probabilistic branches follow the choices in .

###### Proof

The result follows from the following derivations:

 Eb(Zb)= (by the definition of expectation) = ∫(ℓ,r)∘ω′∈ΩZb((ℓ,r)∘ω′)Pb(d(ℓ,r)∘