Quantifying Dynamic Leakage: Complexity Analysis and Model Counting-based Calculation

03/09/2019
by   Bao Trung Chu, et al.

A program is non-interferent if it leaks no secret information to an observable output. However, non-interference is too strict in many practical cases, and quantitative information flow (QIF) has been proposed and studied in depth. Originally, QIF is defined as the average amount of secret information leaked over all executions of a program. However, a vulnerable program that has executions leaking the whole secret but has a small average leakage could be considered secure. This counter-intuitive outcome raises the need for a new definition of the information leakage of a particular run, i.e., dynamic leakage. As discussed in [5], entropy-based definitions do not work well for quantifying information leakage dynamically; the belief-based definition, on the other hand, is appropriate for deterministic programs but not for probabilistic ones. In this paper, we propose new simple notions of dynamic leakage based on entropy which are compatible with existing QIF definitions for deterministic programs, and yet reasonable for probabilistic programs in the sense of [5]. We also investigate the complexity of computing the proposed dynamic leakage for three classes of Boolean programs, and we implemented a tool for QIF calculation using model counting tools for Boolean formulae. Experimental results on popular benchmarks of QIF research show the flexibility of our framework. Finally, we discuss the improvement of performance and scalability of the proposed method as well as an extension to more general cases.


1 Introduction

Researchers have realized that, to verify whether a program is safe, it is important to know where confidential information can reach during the execution of the program. The non-interference property, namely, that any change of confidential input does not affect public output, was coined in 1982 by Goguen and Meseguer [14] as a criterion for safety. This property, however, is too strict in many practical cases, such as password verification, voting protocols and averaging scores. A more elaborate notion called quantitative information flow (QIF) [23] has been attracting much attention in the community. QIF is defined as the amount of information leakage from secret input to observable output. The program can be considered to be safe (resp. vulnerable) if this quantity is negligible (resp. large). QIF analysis is not easier than verifying the non-interference property, because if we can calculate the QIF of a program, we can decide whether it satisfies non-interference or not. QIF calculation is normally approached in an information-theoretic fashion by considering a program as a communication channel with the input as source and the output as destination. The quantification is based on entropy notions including Shannon entropy, min-entropy and guessing entropy [23]. QIF (or the information leakage) is defined as the reduction in uncertainty about the secret input caused by observing the public output, i.e., the mutual information between source and destination of the channel. Another quantification, proposed by Clarkson et al. [11], is the difference between the 'distances' (Kullback-Leibler divergence) from the probability distribution on secret input that an attacker believes in to the real distribution, before and after observing the output values.


While QIF is about the average amount of leaked information over all observable outputs, dynamic leakage is about the amount of information leaked by observing a particular output. Hence, QIF is aimed at verifying the safety of a program statically, at compile time, and dynamic leakage is aimed at verifying the safety of a specific run of a program. Which of them should be used as a metric to evaluate a system therefore depends on the scenario in which the software is being considered.

Example 1

if source < 16 then output ← 8 + source
else output ← 8

In Example 1 above, assume source to be a positive integer; then there are 16 possible values of output, from 8 to 23. While an observable value between 9 and 23 reveals everything about the secret variable, i.e., there is only one possible value of source that produces such an output, a value of 8 gives almost nothing, i.e., there are many possible values of source which produce 8 as output. Taking the average of leakages over all possible execution paths results in a relatively small value, which misleads us into regarding the vulnerability of this program as small. Therefore, it is crucial to differentiate risky execution paths from safe ones by calculating dynamic leakage, i.e., the amount of information that can be learned from observing the output produced by a specific execution path. But, as discussed in [5], none of the existing QIF models (either entropy-based or belief-tracking-based) always seems reasonable for quantifying dynamic leakage. For example, entropy-based measures sometimes give negative leakage. Usually, we consider that the larger the value of the measure is, the more information is leaked, and in particular, no information is leaked when the value is 0. Under this interpretation, it is not clear how a negative value should be read as a leakage metric. Actually, [5] claims that non-negativity is a requirement for a measure of dynamic QIF. Also, MONO, one of the axioms for QIF in [2], turns out to be identical to this non-negativity requirement. The belief-based measure always gives non-negative leakage for deterministic programs, but it may become negative for probabilistic programs. In addition, the measure using the belief model depends on secret values. This would imply that (1) even if the same output value is observed, the QIF may differ depending on which value is assumed to be the secret, which is unnatural, and (2) a side channel may exist when further processing is added by system managers after getting the quantification result.
Hence, as suggested in [5], it is better to introduce a new notion for quantifying dynamic leakage caused by observing a specific output value.
The contributions of this paper are three-fold.

  • We present our criteria for an appropriate definition of dynamic leakage and propose two notions that satisfy those criteria. We propose two notions because there is a trade-off between the easiness of calculation and the preciseness (see Section 2).

  • Complexity of computing the proposed dynamic leakages is analyzed for three classes of Boolean programs.

  • By applying model counting of logical formulae, a prototype was implemented and feasibility of computing those leakages is discussed based on experimental results.

Following [5], we arrange three criteria that a 'good' definition of dynamic leakage should satisfy, namely, the measure should be (R1) non-negative, (R2) independent of a secret value to prevent a side channel and (R3) compatible with existing notions to keep the consistency within QIF as a whole (both dynamic leakage and normal QIF). Based on those criteria, we come up with two notions of dynamic leakage, QIF1 and QIF2, both of which satisfy (R1), (R2) and (R3). QIF1, motivated by the entropy-based approach, takes the difference between the initial and remaining self-information of the secret before and after observing the output as dynamic leakage. On the other hand, QIF2 is based on the self-information of the joint probability of secret and output. Because both of them are useful in different scenarios, we studied these two models in parallel in the theoretical part of the paper. We call the problems of computing QIF1 and QIF2 for Boolean programs CompQIF1 and CompQIF2, respectively. For example, we show that even for deterministic loop-free programs with uniformly distributed input, both CompQIF1 and CompQIF2 are #P-hard.

Next, we assume that secret inputs of a program are uniformly distributed and consider the following method of computing QIF1 and QIF2 (only for deterministic programs in the case of QIF2, for the technical reason mentioned in Section 4): (1) translate a program into a Boolean formula that represents the relationship among values of variables during a program execution, (2) add constraints that assign observed output values to the corresponding variables in the formula, (3) count models of the augmented Boolean formula projected on secret variables, and (4) calculate the necessary probability and dynamic leakage using the counting result. Based on this method, we conducted experiments using our prototype tool with benchmarks taken from the QIF literature, in which programs are deterministic, to examine the feasibility of automatic calculation. We also discuss, in subsection 5.3, difficulties and possibilities in dealing with more general cases, such as probabilistic programs. In step (3), we can flexibly use any off-the-shelf model counter. To investigate the scalability of this method, we used four state-of-the-art counters: SharpCDCL [15] and GPMC [24, 32] for SAT-based counting, an improved version of aZ3 [22] for SMT-based counting, and DSharp-p [20, 30] for SAT-based counting in d-DNNF fashion. Finally, we discuss the feasibility of automatic calculation of the leakage in the general case.
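The four steps above can be sketched on a toy instance. The following is a minimal illustration rather than the prototype described in the paper: the program `o := s0 AND s1`, the helper names, and the brute-force enumeration (standing in for an off-the-shelf projected model counter) are all assumptions made for the example, and secrets are assumed to be uniformly distributed.

```python
from itertools import product
from math import log2

def count_projected_models(formula, secret_vars, other_vars):
    """Count assignments to secret_vars that extend to at least one model.

    `formula` is a Python predicate over a dict of Booleans; this brute-force
    loop is only a stand-in for a real projected model counter.
    """
    count = 0
    for s_bits in product([False, True], repeat=len(secret_vars)):
        env = dict(zip(secret_vars, s_bits))
        # Existentially quantify the non-secret variables.
        if any(formula({**env, **dict(zip(other_vars, rest))})
               for rest in product([False, True], repeat=len(other_vars))):
            count += 1
    return count

# Step (1): hypothetical program  o := s0 AND s1  as a formula over s0, s1, o.
def program_formula(env):
    return env["o"] == (env["s0"] and env["s1"])

# Step (2): add the constraint that fixes the observed output value.
def augmented(observed):
    return lambda env: program_formula(env) and env["o"] == observed

def dynamic_leakage(observed, secret_vars=("s0", "s1"), other_vars=("o",)):
    # Step (3): the projected model count is the size of the pre-image.
    pre = count_projected_models(augmented(observed), secret_vars, other_vars)
    # Step (4): with uniform secrets, the leakage is log2(|S| / |pre-image|).
    return log2(2 ** len(secret_vars) / pre)

print(dynamic_leakage(True))   # only one secret produces True
print(dynamic_leakage(False))  # three secrets produce False
```

The sketch raises a division error for an output that the program can never produce, which matches the fact that the leakage is only defined for observable outputs.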

Related work The earliest work on the computational complexity of QIF is that of Yasuoka and Terauchi. They proved that even the problem of comparing the QIF of two programs, which is obviously not more difficult than calculating QIF, is not a k-safety property for any k [27]. Consequently, self-composition, a successful technique for verifying the non-interference property, is not applicable to the comparison problem. Their subsequent work [28] proves a similar result for bounding QIF, as well as a complexity-theoretic hardness result for precisely quantifying QIF in all entropy-based definitions for loop-free Boolean programs. Chadha and Ummels [9] show that the QIF bounding problem for recursive programs is not harder than checking reachability for those programs. Despite this evidence of the hardness of calculating QIF, precise QIF analysis has attracted much attention from researchers over the last decade. In [15], Klebanov et al. reduce QIF calculation to the SAT problem projected on a specific set of variables, as a very first attempt at automating QIF calculation. On the other hand, Phan et al. reduce QIF calculation to an SMT problem in order to utilize existing SMT (satisfiability modulo theories) solvers. Recently, Val et al. [25] reported a method that can scale to programs of 10,000 lines of code but is still based on a SAT solver and symbolic execution. However, there is still a gap between such improvements and practical use, and researchers also work on approximating QIF. Köpf and Rybalchenko [16] propose approximate QIF computation by sandwiching the precise QIF between lower and upper bounds obtained using randomization and abstraction, respectively, with a provable confidence. LeakWatch of Chothia et al. [10] also gives an approximation with provable confidence by executing a program multiple times. Its descendant, called HyLeak [7], combines the randomization strategy of its ancestor with precise analysis. Also using randomization, but in a Markov Chain Monte Carlo (MCMC) manner, Biondi et al. [6] utilize ApproxMC2, an existing model counter created by some of the co-authors. ApproxMC2 provides an approximation of the number of models of a Boolean formula in CNF with adjustable precision and confidence. ApproxMC2 uses a hashing technique to divide the solution space into smaller buckets with an almost equal number of elements, then counts the models for only one bucket and multiplies the result by the number of buckets. As for dynamic leakage, McCamant et al. [17] consider QIF as network flow through programs and propose a dynamic analysis method that can work with executable files. Though this model can scale to very large programs, its precision is relatively low. Alvim et al. [2] give some axioms that a reasonable definition of QIF should satisfy and discuss whether some definitions of QIF satisfy the axioms. Note that these axioms are for static QIF measures, which differ from dynamic leakage. However, given the similarity between static and dynamic notions, we investigated how our new dynamic notions fit in the lens of the axioms (refer to Section 2).

Dynamic information flow analysis (or taint analysis) is a somewhat confusing term: it does not mean an analysis of dynamic leakage, but a runtime analysis of information flow. Dynamic analysis can abort a program as soon as an unsafe information flow is detected. Also, hybrid analysis has been proposed to improve dynamic analysis, which may abort a program too early or unnecessarily. In hybrid analysis, the unexecuted branches of a program are statically analyzed in parallel with the executed branch. Among such approaches, Bielova et al. [4] define the knowledge of a program variable as the information on the secret that can be inferred from that variable (technically, it is the same as the pre-image of an observed value of the variable, defined in Section 2). In other words, hybrid analysis updates the 'dynamic leakage' under the assumption that the program may terminate at any moment. Our method is close to [4] in the sense that the knowledge is computed. The difference is that we conduct the analysis after the program has terminated and an observed output is given. We think this is not a disadvantage compared with hybrid analysis because, in general, the amount of dynamic leakage of a program is not determined until the program terminates.

Structure of the remaining parts: Section 2 introduces the new notions of dynamic leakage, QIF1 and QIF2, and some of their properties. The computational complexity of CompQIF1 and CompQIF2 is discussed in Section 3. Section 4 gives details on calculating dynamic leakage based on model counting. Experimental results and discussion are provided in Section 5 and the paper is concluded in Section 6.

2 New Notions for Dynamic Leakage

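The standard notion of static quantitative information flow (QIF) is defined as the mutual information between the random variables $S$ for secret input and $O$ for observable output:

$$\mathrm{QIF} = H(S) - H(S \mid O) \tag{1}$$

where $H(S)$ is the entropy of $S$ and $H(S \mid O)$ is the expected value of $H(S \mid o)$, which is the conditional entropy of $S$ when observing an output $o$. Shannon entropy and min-entropy are often used as the definition of entropy, and in either case, $H(S) - H(S \mid O) \ge 0$ always holds by the definition.

In [5], the author discusses the appropriateness of the existing measures for dynamic QIF and points out their drawbacks, especially that each of these measures may become negative. Hereafter, let $\mathcal{S}$ and $\mathcal{O}$ denote the finite sets of input values and output values, respectively. Since $H(S \mid O) = \sum_{o \in \mathcal{O}} p(o) H(S \mid o)$, [5] assumes the following measure, obtained by replacing $H(S \mid O)$ with $H(S \mid o)$ in (1), for dynamic QIF:

$$\mathrm{QIF}^{\mathrm{dyn}}(o) = H(S) - H(S \mid o) \tag{2}$$

However, $\mathrm{QIF}^{\mathrm{dyn}}(o)$ may become negative even if a program is deterministic (see Example 2). Another definition of dynamic QIF is proposed in [11] as

$$\mathrm{QIF}^{\mathrm{belief}}(\dot{s}, o) = D_{KL}(p_{\dot{s}} \,\|\, p_S) - D_{KL}(p_{\dot{s}} \,\|\, p_{S \mid o}) \tag{3}$$

where $D_{KL}$ is the KL-divergence defined as $D_{KL}(p \,\|\, q) = \sum_{s \in \mathcal{S}} p(s) \log \frac{p(s)}{q(s)}$, and $p_{\dot{s}}(s) = 1$ if $s = \dot{s}$ and $0$ otherwise. Intuitively, $\mathrm{QIF}^{\mathrm{belief}}(\dot{s}, o)$ represents how much closer the belief of an attacker gets to the secret $\dot{s}$ by observing $o$. For deterministic programs, $\mathrm{QIF}^{\mathrm{belief}}(\dot{s}, o) = -\log p(o) \ge 0$ [5]. However, $\mathrm{QIF}^{\mathrm{belief}}$ may still become negative if a program is probabilistic (see Example 3).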
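The sign behaviour of the belief-based measure can be made explicit with a short derivation. This is a worked sketch and not part of the source; it assumes that the measure is the difference of two KL-divergences taken from the point-mass distribution $p_{\dot{s}}$ on the actual secret $\dot{s}$, uses the convention $0 \log 0 = 0$, and assumes $p(\dot{s}, o) > 0$.

```latex
\begin{align*}
D_{KL}(p_{\dot{s}} \,\|\, p_S)
  &= \sum_{s} p_{\dot{s}}(s) \log \frac{p_{\dot{s}}(s)}{p_S(s)}
   = -\log p(\dot{s}), \\
D_{KL}(p_{\dot{s}} \,\|\, p_{S \mid o})
  &= -\log p(\dot{s} \mid o), \\
D_{KL}(p_{\dot{s}} \,\|\, p_S) - D_{KL}(p_{\dot{s}} \,\|\, p_{S \mid o})
  &= \log \frac{p(\dot{s} \mid o)}{p(\dot{s})}
   = \log \frac{p(o \mid \dot{s})}{p(o)}.
\end{align*}
```

For a deterministic program, $p(o \mid \dot{s}) = 1$ whenever $\dot{s}$ can produce $o$, so the value reduces to $-\log p(o) \ge 0$. For a probabilistic program, $p(o \mid \dot{s})$ can be smaller than $p(o)$, in which case the value is negative, and it visibly depends on $\dot{s}$.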

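Let $P$ be a program with secret input variable $S$ and observable output variable $O$. For notational convenience, we identify the names of program variables with the corresponding random variables. Throughout the paper, we assume that a program always terminates. The syntax and semantics of programs assumed in this paper will be given in the next section. For $s \in \mathcal{S}$ and $o \in \mathcal{O}$, let $p_{SO}(s, o)$, $p_{O \mid S}(o \mid s)$, $p_{S \mid O}(s \mid o)$, $p_S(s)$, $p_O(o)$ denote the joint probability of $s$ and $o$, the conditional probability of $o$ given $s$ (the likelihood), the conditional probability of $s$ given $o$ (the posterior probability), the marginal probability of $s$ (the prior probability) and the marginal probability of $o$, respectively. We often omit the subscripts, as in $p(s, o)$ and $p(o \mid s)$, if they are clear from the context. By definition,

$$p(s, o) = p(s \mid o)\, p(o) = p(o \mid s)\, p(s) \tag{4}$$
$$p(o) = \sum_{s \in \mathcal{S}} p(s, o) \tag{5}$$
$$p(s) = \sum_{o \in \mathcal{O}} p(s, o) \tag{6}$$

We assume that (the source code of) $P$ and the prior probability $p(s)$ ($s \in \mathcal{S}$) are known to an attacker. For $o \in \mathcal{O}$, let $\mathrm{pre}_P(o) = \{ s \in \mathcal{S} \mid p(s \mid o) > 0 \}$, which is called the pre-image of $o$ (by the program $P$).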

Considering the discussions in the literature, we aim to define new notions for dynamic QIF that satisfy the following requirements:

  1. Dynamic QIF should always be non-negative because an attacker obtains some information (although sometimes very small or even zero) when he observes an output of the program.

  2. It is desirable that dynamic QIF is independent of a secret input $s$. Otherwise, the controller of the system may change the behavior for protection based on the estimated amount of the leakage that depends on $s$, which may be a side channel for an attacker.

  3. The new notion should be compatible with the existing notions when we restrict ourselves to special cases such as deterministic programs, uniformly distributed inputs, and taking the expected value.

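The first proposed notion is the self-information of the secret inputs consistent with an observed output $o \in \mathcal{O}$. Equivalently, the attacker can narrow down the possible secret inputs after observing $o$ to the pre-image of $o$ by the program. We take the probability of $s \in \mathcal{S}$ after the observation to be the probability of $s$ divided by the sum of the probabilities of the inputs in the pre-image of $o$ (see the upper part of Fig. 1), so the difference between the self-information of $s$ before and after the observation is

$$\mathrm{QIF}_1(o) = -\log \sum_{s' \in \mathrm{pre}_P(o)} p(s') \tag{7}$$

where $\mathrm{pre}_P(o)$ denotes the pre-image of $o$. The second notion is the self-information of the joint events $s' \in \mathcal{S}$ and an observed output $o$ (see the lower part of Fig. 1). This is equal to the self-information of $o$.

$$\mathrm{QIF}_2(o) = -\log \sum_{s' \in \mathcal{S}} p(s', o) \tag{8}$$
$$\phantom{\mathrm{QIF}_2(o)} = -\log p(o) = -\log p(s, o) + \log p(s \mid o) \tag{9}$$

Both notions are defined by considering how much the set of possible secret input values is reduced by observing an output. We propose two notions because there is a trade-off between the ease of calculation and the appropriateness. As illustrated in Example 3, QIF2 can represent the dynamic leakage more appropriately than QIF1 in some cases. On the other hand, the calculation of QIF1 is easier than that of QIF2, as discussed in Section 4. Both notions are independent of the secret input $s$ (Requirement (R2)). Moreover, since $p(s', o) = p(o \mid s')\, p(s') \le p(s')$ and $p(s', o) = 0$ outside the pre-image of $o$,

$$0 \le \mathrm{QIF}_1(o) \le \mathrm{QIF}_2(o) \tag{10}$$

If we assume Shannon entropy,

$$\mathrm{QIF} = -\sum_{s \in \mathcal{S}} p(s) \log p(s) + \sum_{o \in \mathcal{O}} p(o) \sum_{s \in \mathcal{S}} p(s \mid o) \log p(s \mid o) \tag{11}$$
$$\phantom{\mathrm{QIF}} = -\sum_{s \in \mathcal{S}} p(s) \log p(s) + \sum_{s \in \mathcal{S},\, o \in \mathcal{O}} p(s, o) \log p(s \mid o) \tag{12}$$

If a program is deterministic, for each $s \in \mathcal{S}$, there is exactly one $o_s \in \mathcal{O}$ such that $p(s, o_s) = p(s)$ and $p(s, o) = 0$ for $o \ne o_s$, and therefore

$$\mathrm{QIF} = \sum_{s \in \mathcal{S},\, o \in \mathcal{O}} p(s, o) \bigl( -\log p(s, o) + \log p(s \mid o) \bigr) \tag{13}$$

Comparing (9) and (13), we see that QIF is the expected value of QIF2, which suggests the compatibility of QIF2 with QIF (Requirement (R3)) when a program is deterministic. Also, if a program is deterministic, $\mathrm{QIF}_2(o) = -\log p(o)$, which coincides with the belief-based measure of (3) (Requirement (R3)). By (10), Requirement (R1) is satisfied. Also in (10), $\mathrm{QIF}_1(o) = \mathrm{QIF}_2(o)$ holds for every $o \in \mathcal{O}$ if and only if the program is deterministic.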
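The two notions can be compared numerically on a small probabilistic channel. The sketch below assumes that QIF1 is minus the logarithm of the prior mass of the pre-image and that QIF2 is minus the logarithm of p(o), as described above. The prior and likelihood tables are hypothetical values chosen for illustration, and logarithms are taken to base 2.

```python
from fractions import Fraction
from math import log2

def qif1(prior, likelihood, o):
    """-log2 of the total prior mass of the pre-image of o."""
    mass = sum(p for s, p in prior.items() if likelihood[s].get(o, 0) > 0)
    return -log2(mass)

def qif2(prior, likelihood, o):
    """-log2 p(o), where p(o) is the sum over s of p(o|s) * p(s)."""
    p_o = sum(p * likelihood[s].get(o, 0) for s, p in prior.items())
    return -log2(p_o)

# Hypothetical prior p(s) and likelihood p(o|s); s2 is the only secret
# on which the program behaves probabilistically.
prior = {"s1": Fraction(1, 2), "s2": Fraction(1, 4), "s3": Fraction(1, 4)}
likelihood = {
    "s1": {"a": Fraction(1)},
    "s2": {"a": Fraction(1, 2), "b": Fraction(1, 2)},
    "s3": {"b": Fraction(1)},
}

for o in ("a", "b"):
    print(o, qif1(prior, likelihood, o), qif2(prior, likelihood, o))
```

In this example both outputs give a QIF1 value that is strictly smaller than the QIF2 value, since the secret `s2` contributes its whole prior mass to both pre-images but only half of it to each p(o).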

Theorem 2.1

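If a program $P$ is deterministic, then for every $o \in \mathcal{O}$ and every $s$ in the pre-image of $o$, the belief-based measure of (3) for $s$ and $o$, $\mathrm{QIF}_1(o)$ and $\mathrm{QIF}_2(o)$ all coincide and are equal to $-\log p(o)$.

If input values are uniformly distributed, $\mathrm{QIF}_1(o) = \log \frac{|\mathcal{S}|}{|\mathrm{pre}_P(o)|}$ for every $o \in \mathcal{O}$, where $\mathrm{pre}_P(o)$ is the pre-image of $o$. ∎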

Figure 1: QIF1 (the upper) and QIF2 (the lower)

Let us get back to Example 1 in the previous section to see how the new notions convey the intuitive meaning of dynamic leakage. We assume that both the secret input and the output are 8-bit numbers whose values are in $[0, 255]$, and that the secret input is uniformly distributed over this range. Then, because the program in this example is deterministic, QIF1 coincides with QIF2, as mentioned above. An output of 8 is produced by 241 of the 256 inputs, so its leakage is $-\log \frac{241}{256} \approx 0.087$ bits, while the leakage is $-\log \frac{1}{256} = 8$ bits for every output between 9 and 23. This result addresses well the problem that QIF fails to differentiate vulnerable outputs from safe ones.
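The calculation for Example 1 can be reproduced by direct enumeration. The program text of Example 1 is read here as "output 8 plus the secret if the secret is below 16, otherwise output 8", which is an assumption consistent with the description of its outputs. The function names are illustrative, the secret is assumed to be a uniformly distributed 8-bit value, and logarithms are taken to base 2.

```python
from math import log2

def example1(source):
    # Assumed reading of Example 1: outputs 8..23, with 8 as the catch-all.
    return 8 + source if source < 16 else 8

def qif1_uniform(program, inputs, observed):
    """QIF1 under a uniform prior: log2(|inputs| / |pre-image|)."""
    pre_image = [s for s in inputs if program(s) == observed]
    return log2(len(inputs) / len(pre_image))

inputs = range(256)  # 8-bit secret, assumed uniformly distributed

print(round(qif1_uniform(example1, inputs, 8), 3))  # many inputs map to 8
print(qif1_uniform(example1, inputs, 9))            # a single input maps to 9
```

Under these assumptions, observing 8 leaks less than a tenth of a bit, whereas observing any value from 9 to 23 leaks all 8 bits of the secret.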

Example 2

Consider the following program taken from Example 1 of [5]:

if then else

Assume that the probabilities of inputs are , and . Then, we have the following output and posterior probabilities:



If we use Shannon entropy, , and . Thus, , which is negative as pointed out in [5]. Also, and . reflects the fact that the difference of the posterior and the prior of each input when observing is larger (, ) than observing (, ).

Since the program is deterministic, .

Example 3

The next program is quoted from Example 2 of [5] where means that the program chooses with probability and with probability .

if then
else

Assume that the probabilities of inputs are and . and the posterior probabilities are calculated by (4) as:


Let us use Shannon entropy for . As , . As already discussed in [5], though an attacker may think that is more probable by observing . For each , takes different values (one of them is negative) depending on whether or is the secret input. and . because the set of possible input values does not shrink whichever or is observed. Similarly to Example 2, reflects the fact that the probability of each input when observing varies more largely (, ) than when observing (, ). In this example, the number of input values is just two, but in general, is larger and we can expect is much smaller than and QIF1 serves a better measure for dynamic QIF.

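A program $P$ is non-interferent if for every $o \in \mathcal{O}$ such that $p(o) > 0$ and for every $s \in \mathcal{S}$, $p(o \mid s) = p(o)$. Assume a program $P$ is non-interferent. By (4), $p(s \mid o) = p(s)$ for every $o \in \mathcal{O}$ ($p(o) > 0$) and $s \in \mathcal{S}$, and then QIF = 0 by (12). If $P$ is deterministic in addition, $p(o) = p(o \mid s) = 1$ for $o \in \mathcal{O}$ ($p(o) > 0$) and $s \in \mathcal{S}$. That is, if a program is deterministic and non-interferent, it has exactly one possible output value.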

Relationship to the hybrid monitor Let us see how our notions relate to the knowledge tracking hybrid monitor proposed by Bielova et al. [4].

Example 4

Consider the following program taken from Program 5 of [4]:

if then +
else - ;
output

where is a secret input, and are public inputs and is a public output.

In [4], the knowledge about secret input carried by public output is if where is an initial environment (an assignment of values to , and ) and is the evaluation of in . If , and , then . In [4], to verify whether this value of reveals any information about in this setting of public inputs (i.e., , ), they take ifif. Because if for every , [4] concluded that in that setting leaks no information.
On the other hand, with that settings of and , given as the observed output, can be either or . For the program is deterministic, , which is consistent with that of [4] though the approach looks different. Actually, the function encodes all information revealed from a value of about secret input. By applying for a specific value of , we get the pre-image of . In other words, is exactly what we are getting toward quantifying our notions of dynamic leakage. The monitor proposed in [4] tracks the knowledge about secret input carried by all variables along an execution of a program according to the inlined operational semantics. It seems, however, impractical to store all the knowledge during an execution, and furthermore, it would take time to compute the inverse of the knowledge when an observed output is fixed.

The three requirements (R1), (R2) and (R3) we presented summarize the intuitions about dynamic leakage following the spirit of [5]. However, those requirements lack a firm theoretical foundation, whereas Alvim et al. [2] provide a set of axioms for QIF. Although QIF and dynamic leakage differ, we investigated how well our notions fit in the lens of those axioms to confirm their suitability as a metric. For lack of space, we skip the detailed explanation of the fairly straightforward results.
(1) For prior vulnerability, both QIF1 and QIF2 satisfy CONTINUITY and CONVEXITY (and also the loosened version, Q-CONVEXITY).
(2) For posterior vulnerability, exactly speaking, we cannot construct the channel matrix , because dynamic leakage is about one specific output value but the matrix is for all possibilities. Hence, conceptually, those axioms are not applicable in this context. However, by capturing the intuitive interpretation of the axioms, we made a small modification (i.e., to use in stead of as the type of posterior vulnerability) to investigate the new notions under the spirit of axioms. So, by definitions above and the meaning of posterior vulnerability in terms of [2], we have and are respectively formulae for in the context of QIF1 and QIF2. Given this modification, we found that QIF1 satisfies all the three axioms: NI (Non-Interference), MONO (Monotonicity) and DPI (Data Processing Inequality) whilst QIF2 satisfies only the first two axioms but the last one, DPI. In fact, QIF2 still aligns well to DPI in cases for deterministic programs, and only misses for probabilistic ones. Please recall that in deterministic cases, by Theorem 2.1. Hence, for deterministic programs, QIF2 satisfies DPI because QIF1 does. For it is quite trivial and the space is limited, we will omit the proof of those satisfaction. In stead, we will give a counterexample to show that QIF2 does not satisfy DPI when programs are probabilistic. Let and in which is a post-process of . Also assume the following probabilities: and , in which annotate events that the corresponding variables have those values. Given these settings, we have , and . In other words, , which is against to DPI.
It turns out that our proposed notions satisfy all the axioms except DPI, either directly or after an adaptive transformation (i.e., of the type of posterior vulnerability). For DPI, we came to the conclusion that it is not suitable as a criterion for verifying whether a dynamic leakage notion is reasonable. This is because dynamic leakage is about a specific execution path, for which the inequality of DPI no longer makes sense, rather than about the average over all possible execution paths. Therefore, it is not counter-intuitive that QIF2 does not satisfy DPI for probabilistic programs, while QIF2 for deterministic programs and QIF1 do satisfy DPI.

3 Complexity Results

3.1 Program model

Let be the set of truth values, be the set of natural numbers and . Also let denote the set of rational numbers. We assume probabilistic Boolean programs where every variable stores a truth value and the syntactical constructs are assignment to a variable, conditional, probabilistic choice, while loop, procedure call and sequential composition:

where stands for a (Boolean) variable, is a constant rational number such that . In the above BNFs, objects derived from the syntactical categories and are called expressions and commands, respectively.

A procedure has the following syntax:

where are sequences of input, output and local variables, respectively (which are disjoint from one another). Let . We will use the same notation and for an expression and a command . A program is a tuple of procedures where is the main procedure. is also written as to emphasize the input and output variables and of

A command assigns the value of Boolean expression to variable . A command means that the program chooses with probability and with probability . Note that this is the only probabilistic command. A command is a recursive procedure call to with actual input parameters and return variables . The semantics of the other constructs are defined in the usual way.

The size of is the sum of the number of commands and the maximum number of variables in a procedure of .

If a program does not have a recursive procedure call and , it is called a (non-recursive) while program. If a while program does not have a while loop, it is called a loop-free program (or straight-line program). If a program does not have a probabilistic choice, it is deterministic.

3.2 Assumption and overview

We define the problems CompQIF1 and CompQIF2 as follows.

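Inputs: a probabilistic Boolean program $P$,
 an observed output value $o$, and
 a natural number (in unary) specifying the error bound.
Problem: Compute $\mathrm{QIF}_1(o)$ (resp. $\mathrm{QIF}_2(o)$) for $P$ and $o$.

(General assumption)

  1. The answer to the problem CompQIF1 (resp. CompQIF2) should be given as a rational number (two integer values representing the numerator and denominator) representing the probability $\sum_{s' \in \mathrm{pre}_P(o)} p(s')$ (resp. $p(o)$), i.e., the argument of the logarithm in the definition of $\mathrm{QIF}_1(o)$ (resp. $\mathrm{QIF}_2(o)$).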

  2. If a program is deterministic or non-recursive, the answer should be exact. Otherwise, the answer should be within bits of precision, i.e., .

If we assume (A1), we only need to perform additions and multiplications the number of times determined by an analysis of a given program, avoiding the computational difficulty of calculating the exact logarithm. The reason for assuming (A2) is that the exact reachability probability of a recursive program is not always a rational number even if all the transition probabilities are rational (Theorem 3.2 of [13]).

When we discuss lower bounds, we consider the corresponding decision problem, obtained by adding a candidate answer to the original problem as a part of the input. The results on the complexity of CompQIF1 and CompQIF2 are summarized in Table 1. As mentioned above, if a program is deterministic, $\mathrm{QIF}_1(o) = \mathrm{QIF}_2(o)$, so a single column covers both problems in the deterministic case.

programs    deterministic         probabilistic, CompQIF1    probabilistic, CompQIF2
loop-free   PSPACE                PSPACE (Theorem 3.1)       PSPACE (Theorem 3.1)
            #P-hard (Prop. 1)     #P-hard                    #P-hard
while       PSPACE-comp           PSPACE-comp                EXPTIME (Theorem 3.3)
            (Proposition 2)       (Theorem 3.2)              PSPACE-hard
recursive   EXPTIME-comp          EXPSPACE (Theorem 3.4)     EXPSPACE (Theorem 3.4)
            (Proposition 3)       EXPTIME-hard               EXPTIME-hard

Table 1: Complexity results

Recursive Markov chain (abbreviated as RMC) is defined in [13] by assigning a probability to each transition in recursive state machine (abbreviated as RSM) [1]. Probabilistic recursive program in this paper is similar to RMC except that there is no program variable in RMC. If we translate a recursive program into an RMC, the number of states of the RMC may become exponential to the number of Boolean variables in the recursive program. In the same sense, deterministic recursive program corresponds to RSM, or equivalently, pushdown systems (PDS) as mentioned and used in [9]. Also, probabilistic while program corresponds to Markov chain. We will review the definition of RMC in Section 3.6.1.

3.3 Deterministic case

We first show lower bounds for deterministic loop-free, while and recursive programs. For deterministic recursive programs, we give EXPTIME upper bound as a corollary of Theorem 3.4.

Proposition 1

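CompQIF1 is #P-hard for deterministic loop-free programs even if the input values are uniformly distributed.

(Proof) We show that #SAT can be reduced to CompQIF1 where the input values are uniformly distributed. In this setting, it is necessary and sufficient for CompQIF1 to compute the number of inputs $s$ such that $p(o \mid s) = 1$, because $\mathrm{QIF}_1(o) = \log \frac{|\mathcal{S}|}{|\mathrm{pre}_P(o)|}$, where $\mathrm{pre}_P(o)$ is the pre-image of $o$. For a given propositional logic formula $\varphi$ with Boolean variables $x_1, \ldots, x_n$, we just construct a program $P$ with input variables $x_1, \ldots, x_n$ and an output variable $o$ such that the value of $\varphi$ for $x_1, \ldots, x_n$ is stored to $o$. Then, the result of CompQIF1 with $P$ and $o = \top$ determines the number of models of $\varphi$: the returned probability multiplied by $2^n$ is exactly that number. ∎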
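The reduction in this proof can be illustrated as follows. This is a sketch under assumptions: `comp_qif1_probability` plays the role of a hypothetical CompQIF1 oracle that returns the probability of the pre-image as an exact rational (here it is simply implemented by brute force), and the formula `phi` is an arbitrary example.

```python
from fractions import Fraction
from itertools import product

def comp_qif1_probability(program, n, observed):
    """Hypothetical CompQIF1 oracle for uniform inputs: |pre-image| / 2^n.

    Implemented by brute force here; the point is only how its answer is used.
    """
    hits = sum(1 for x in product([False, True], repeat=n)
               if program(x) == observed)
    return Fraction(hits, 2 ** n)

def count_models(phi, n):
    # Program built from phi: store the value of phi(x) in the output variable.
    program = lambda x: bool(phi(x))
    prob = comp_qif1_probability(program, n, True)
    # Recover the model count from the probability of the pre-image of True.
    return int(prob * 2 ** n)

phi = lambda x: (x[0] or x[1]) and not x[2]  # hypothetical formula on 3 variables
print(count_models(phi, 3))
```

Because the oracle's answer is an exact rational with denominator dividing 2^n, multiplying by 2^n always yields an integer, so no precision is lost when the model count is recovered.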

Proposition 2

CompQIF1 is PSPACE-hard for deterministic while programs.

(Proof) The proposition can be shown in the same way as the proof of PSPACE-hardness of the non-interference problem for deterministic while programs by a reduction from quantified Boolean formula (QBF) validity problem given in [9] as follows. For a given QBF , we construct a deterministic while program having one output variable such that is non-interferent if and only if is valid as in the proof of Proposition 19 of [9]. The deterministic program is non-interferent if and only if the output of the program is always , i.e., . Thus, we can decide if is valid by checking whether or not for the deterministic program, the output value , and the probability 1. ∎

Proposition 3

CompQIF1 is EXPTIME-complete for deterministic recursive programs.

(Proof) The EXPTIME upper bound can be shown by translating a given program to a pushdown system (PDS). Assume we are given a deterministic recursive program and an output value . We apply to the translation to a recursive Markov chain (RMC) given in the proof of Theorem 3.4. The size of is exponential in the size of . Because is deterministic, is also deterministic; is just a recursive state machine (RSM) or, equivalently, a PDS. It is well known [8] that the pre-image of a configuration of a PDS can be computed in polynomial time by the so-called P-automaton construction. Hence, by specifying configurations outputting as , we can compute in exponential time.

The lower bound can be shown in the same way as the EXPTIME-hardness proof of the non-interference problem for deterministic recursive programs, by a reduction from the membership problem for polynomial space-bounded alternating Turing machines (ATM) given in the proof of Theorem 7 of [9]. From a given polynomial space-bounded ATM and an input word to , we construct a deterministic recursive program having one output variable such that is non-interferent if and only if accepts as in [9]. As in the proof of Proposition 2, we can reduce to CompQIF1 instead of reducing to the non-interference problem. ∎

3.4 Loop-free programs

We show upper bounds for loop-free programs. For CompQIF2, the basic idea is similar to the one in [9], but we have to compute the conditional probability . For CompQIF1, an upper bound can be obtained from a similar result on model counting if the input values are uniformly distributed.

Theorem 3.1

CompQIF1 and CompQIF2 are solvable in PSPACE for probabilistic loop-free programs. CompQIF1 is solvable in if the input values are uniformly distributed.

(Proof) We first show that CompQIF2 is solvable in PSPACE for probabilistic loop-free programs. If a program is loop-free, we can compute for every in the same way as in [9], multiply it by and sum up in PSPACE. Note that in [9], it is assumed that a program is deterministic and input values are uniformly distributed, and hence it suffices to count the input values such that , which can be done in . In contrast, we have to compute the sum of the probabilities of for all . We can easily see that CompQIF1 is solvable in PSPACE for probabilistic loop-free programs in almost the same way as CompQIF2. Instead of summing up for all , we just have to sum up for all such that (if and only if ).

Next, we show that CompQIF1 is solvable in if the input values are uniformly distributed. As stated in the proof of Proposition 1, in this case, CompQIF1 can be solved by computing the number of inputs such that . Deciding for a given probabilistic loop-free program can be reduced to the satisfiability problem of a propositional logic formula. Note that for any probabilistic choice like with , we just have to treat it as a non-deterministic choice like because all we need to know is whether . We construct from a formula with Boolean variables corresponding to the input and output variables of and intermediate variables. Here, we abuse the symbols and , which are used for the variables of , also as the Boolean variables corresponding to them, respectively. The formula is constructed such that is satisfiable if and only if > 0 for and . Thus, the number of inputs such that is the number of truth assignments for such that is satisfiable, i.e., the number of projected models on . This counting can be done in because projected model counting is in [3]. ∎
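Projected model counting can be sketched as follows. This is an illustrative brute-force version and not the tool or algorithm of the paper: `projected_model_count`, `model_count` and the sample formula are hypothetical. The projection variables play the role of the inputs, and the auxiliary variable stands for a probabilistic choice that is treated as a non-deterministic one, as described above.

```python
from itertools import product

BOOLS = [False, True]


def projected_model_count(formula, n_proj, n_aux):
    """Count assignments to the projection variables that extend to a model."""
    count = 0
    for proj in product(BOOLS, repeat=n_proj):
        # One satisfying extension over the auxiliary variables is enough.
        if any(formula(proj, aux) for aux in product(BOOLS, repeat=n_aux)):
            count += 1
    return count


def model_count(formula, n_proj, n_aux):
    """Count all models, over projection and auxiliary variables together."""
    return sum(
        1
        for proj in product(BOOLS, repeat=n_proj)
        for aux in product(BOOLS, repeat=n_aux)
        if formula(proj, aux)
    )


def output_is_true(proj, aux):
    """Hypothetical encoding of 'the output is true' for o := (s1 and r) or s2."""
    s1, s2 = proj
    (r,) = aux
    return (s1 and r) or s2


n_inputs = projected_model_count(output_is_true, n_proj=2, n_aux=1)
n_all = model_count(output_is_true, n_proj=2, n_aux=1)
```

In this example the ordinary model count is larger than the projected one, because inputs with more than one satisfying extension are counted once per extension. Only the projected count gives the number of inputs that can produce the output.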

3.5 While programs

We show upper bounds for while programs. For CompQIF1, we reduce the problem to the reachability problem of a graph representing the state reachability relation. An upper bound for CompQIF2 will be obtained as a corollary of Theorem 3.4.

Theorem 3.2

CompQIF1 is PSPACE-complete for probabilistic while programs.

(Proof) Because PSPACE-hardness follows from Proposition 2, it suffices to show that CompQIF1 is solvable in PSPACE for probabilistic while programs. CompQIF1 for probabilistic while programs is reduced to the reachability problem of a graph that represents the reachability among states of . We construct a directed graph from a given program as follows. Each node of uniquely corresponds to a location of and an assignment for all variables in . An edge from to represents that if the program is running at with then, with probability greater than 0, it can transit to with by executing the command at . Deciding the reachability from one node to another can be done in nondeterministic space logarithmic in the size of the graph. The size of the graph is exponential in the size of due to the exponentially many assignments for variables, so reachability can be decided in nondeterministic polynomial space, which equals PSPACE by Savitch's theorem. We see that if and only if there are two nodes and such that is the initial location, is an end location, , , and is reachable from in . Thus, can be decided in PSPACE, and also can be computed in PSPACE. ∎
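The graph-based argument can be illustrated on a toy program. The sketch below is an assumption-laden example and not the paper's procedure: the three-location program, the names `successors`, `reachable` and `inputs_possibly_yielding`, and the explicit breadth-first search are all hypothetical. Nodes are pairs of a location and an assignment to the variables (s, o), and an edge exists whenever the transition has positive probability.

```python
from collections import deque

# Hypothetical while program over a secret input s and an output o:
#   INIT: while (s and not o) do BODY
#   BODY: o := True or o := False, each with positive probability
#   END:  terminate
INIT, BODY, END = 0, 1, 2


def successors(node):
    """Return the nodes reachable in one step with positive probability."""
    loc, (s, o) = node
    if loc == INIT:
        return [(BODY, (s, o))] if (s and not o) else [(END, (s, o))]
    if loc == BODY:
        # Probabilistic choice: both outcomes have positive probability.
        return [(INIT, (s, True)), (INIT, (s, False))]
    return []  # END has no outgoing edge


def reachable(start):
    """Breadth-first search over the state graph."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in successors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def inputs_possibly_yielding(output):
    """List the inputs s from which END is reachable with o == output."""
    result = []
    for s in (False, True):
        nodes = reachable((INIT, (s, False)))  # o is initially False
        if any(loc == END and o == output for loc, (_, o) in nodes):
            result.append(s)
    return result
```

The search above stores every visited node, so it needs space proportional to the size of the graph. The PSPACE bound in the proof relies instead on deciding reachability without materializing the graph.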

Theorem 3.3

CompQIF2 is solvable in EXPTIME for probabilistic while programs.

(We postpone the proof until we show the result on recursive programs.) ∎

3.6 Recursive programs

As noted at the end of Section 3.1, we use recursive Markov chains (RMCs) to give upper bounds on the complexity of CompQIF1 and CompQIF2 for recursive programs. An RMC has both probability and recursion, and the complexity of the reachability probability problem for RMCs was already investigated in [13].

3.6.1 Recursive Markov chains

A recursive Markov chain (RMC) [13] is a tuple where each () is a component graph (or simply, component) consisting of:

  • a finite set of nodes,

  • a set of entry nodes, and a set of exit nodes,

  • a set of boxes, and a mapping from boxes to (the indices of) components. To each box , a set of call sites and a set of return sites are associated.

  • is a finite set of transitions of the form where

    • the source is either a non-exit node or a return site.

    • the destination is either a non-entry node or a call site.

    • is a rational number between 0 and 1 representing the transition probability from to . We require for each source , . We write instead of for readability. Also we abbreviate as .

Intuitively, a box with denotes an invocation of component from component . There may be more than one entry node and exit node in a component. A call site specifies the entry node from which the execution starts when called from the box . A return site has a similar role to specify the exit node.

Let , which is called the set of locations of . We also let , , where , and .

The probability of a transition is a rational number represented by a pair of non-negative integers, the numerator and denominator. The size of is the sum of the numbers of bits of these two integers, which is called the bit complexity of .

The semantics of an RMC is given by the global (infinite state) Markov chain induced from where is the set of global states and is the smallest set of transitions satisfying the following conditions:

  1. For every , where is the empty string.

  2. If and , then and .

  3. If with , then and .

  4. If with , then and .

Intuitively, is the global state where is a current location and is a pushdown stack, which is a sequence of box names where the right end is the stack top. (2) defines a transition within a component. (3) defines a procedure call from a call site ; the box name is pushed onto the current stack and the location is changed to . (4) defines a return from a procedure; the box name at the stack top is popped and the location becomes the return site . For a location and an exit node in the same component , let denote the probability of reaching starting from . (Though we usually want to know for an entry node , the reachability probability is defined in a slightly more general way.) Also, let . The reachability probability problem for RMCs is to compute within bits of precision for a given RMC , a location and an exit node in the same component of , and a natural number in unary.
The following property is shown in [13].

Proposition 4

The reachability probability problem for RMCs can be solved in PSPACE. Actually, can be computed for every pair of and simultaneously in PSPACE by calculating the least fixpoint of the nonlinear polynomial equations induced from a given RMC. ∎
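To make the least-fixpoint characterization concrete, consider a hypothetical one-component RMC, not taken from the paper: from its entry node it moves to the exit with probability 1 - p, and with probability p it makes two recursive calls in sequence before reaching the exit. Its termination probability x is then the least non-negative solution of x = p·x² + (1 - p). The sketch below approximates that solution by iterating the equation from 0. It is only a numerical illustration under these assumptions and is not the PSPACE procedure of [13]; the name `least_fixpoint` is hypothetical.

```python
def least_fixpoint(step, iterations=200):
    """Approximate the least fixpoint of a monotone map by iterating from 0."""
    x = 0.0
    for _ in range(iterations):
        x = step(x)
    return x


# Assumed probability of taking the recursive branch.
p = 2.0 / 3.0

# Termination probability of the hypothetical RMC: x = p * x^2 + (1 - p).
termination = least_fixpoint(lambda x: p * x * x + (1.0 - p))
```

For p = 2/3 the equation has the two solutions 1/2 and 1. The iteration from 0 approaches the smaller one, which shows why the least fixpoint, and not an arbitrary solution, gives the reachability probability.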

3.6.2 Results

Theorem 3.4

CompQIF1 and CompQIF2 are solvable in EXPSPACE for probabilistic recursive programs.

(Proof) We will prove the theorem by translating a given program into a recursive Markov chain (RMC) whose size is exponential in the size of . By Proposition 4, we obtain an EXPSPACE upper bound. Because an RMC has no program variables, we expand the Boolean variables in into all (reachable) truth-value assignments to them. A while command is translated into two transitions: one for the exit and the other for the while-body. A procedure call is translated into a box and transitions connecting to and from the box. For the other commands, the translation is straightforward.

Let be a given program. For , let be the set of truth value assignments to . We will use the same notation and for an expression and a command . For an expression and an assignment , we write to denote the truth value obtained by evaluating under the assignment . For an assignment and a truth value , let denote the assignment identical to except . We use the same notation for sequences of variables and truth values as .

We construct the RMC from where each component graph () is constructed from as follows.

  • .

  • .

  • , , and are constructed as follows.

  1. , , the function undefined everywhere, where is the restriction of to . Note that .

  2. Repeat the following construction until all the elements in are marked:
    Choose an unmarked from , mark it and do one of the following according to the syntax of .

    1. . Add to and add to .

    2. . Add to and add to if . Add to and add to if .

    3. . Add and to . Add and to .