 # Statistical Epistemic Logic

We introduce a modal logic for describing statistical knowledge, which we call statistical epistemic logic. We propose a Kripke model dealing with probability distributions and stochastic assignments, and show a stochastic semantics for the logic. To our knowledge, this is the first semantics for modal logic that can express the statistical knowledge dependent on non-deterministic inputs and the statistical significance of observed results. By using statistical epistemic logic, we express a notion of statistical secrecy with a confidence level. We also show that this logic is useful to formalize statistical hypothesis testing and differential privacy in a simple and abstract manner.


## 1 Introduction

Knowledge representation and reasoning have been studied in two research areas: logic and statistics. Broadly speaking, logic describes our knowledge using formal languages and reasons about it using symbolic techniques, while statistics interprets collected data having random variation and infers properties of their underlying probability models. As research advances demonstrate, logical and statistical approaches are respectively successful in many applications, including artificial intelligence, software engineering, and information security.

The techniques of these two approaches are basically orthogonal and could be integrated to get the best of both worlds. For example, in a large system with artificial intelligence (e.g., an autonomous car), both rule-based knowledge and statistical machine learning models may be used, and the way of combining them would be crucial to the performance and security of the whole system. However, even in theoretical research on knowledge models, there still remains much to be done to integrate techniques from the two approaches. For a very basic example, epistemic logic, a formal logic for representing and reasoning about knowledge, has not yet been able to model "statistical knowledge" with sampling and statistical significance, although many epistemic models [14, 20, 21] have been proposed so far.

One of the important challenges in integrating logical and statistical knowledge is to design a logical model for statistical knowledge, which can be updated by a limited number of samplings of probabilistic events and by non-deterministic inputs from an external environment. Here we note that non-deterministic inputs are essential to model the security of a system, because we usually do not have prior knowledge of the probability distribution of adversarial inputs and need to reason about the worst scenarios caused by attacks. Nevertheless, to the best of our knowledge, no previous work on epistemic logic has proposed an abstract model for statistical knowledge that involves non-deterministic inputs and the statistical significance of observed results.

In the present paper, we propose an epistemic logic for describing statistical knowledge. To define its semantics, we introduce a variant of a Kripke model in which each possible world is defined as a probability distribution of states and each variable is probabilistically assigned a value. In this model, the stochastic behaviour of a system is modeled as a distribution of states at each world, and each non-deterministic input to the system corresponds to a distinct possible world. As for applications of this model, we define an accessibility relation as a statistical distance between distributions of observations, and show that our logic is useful to formalize statistical hypothesis testing and differential privacy of statistical data.

#### Our contributions.

The main contributions of this work are as follows:

• We introduce a modal logic, called statistical epistemic logic (StatEL), to describe statistical knowledge.

• We propose a Kripke model incorporating probability distributions and stochastic assignments by regarding each possible world as a distribution of states and by defining an accessibility relation using a metric/divergence between distributions.

• We introduce a stochastic semantics for StatEL based on the above models. As far as we know, this is the first semantics for modal logic that can express the statistical knowledge dependent on non-deterministic inputs and the statistical significance of observed results.

• We present basic properties of the probability quantification and epistemic modality in StatEL. In particular, we show that the transitivity and Euclidean axioms rely on the agent’s capability of observation in our model.

• By using StatEL we introduce a notion of statistical secrecy with a significance level α. We also show that StatEL is useful to formalize statistical hypothesis testing and differential privacy in a simple and abstract manner.

#### Paper organization.

The rest of this paper is organized as follows. Section 2 introduces background and notations used in this paper. Section 3 presents an example of coin flipping to explain the motivation for a logic of statistical knowledge. Section 4 shows the syntax and semantics of the statistical epistemic logic StatEL. Section 5 presents basic properties of the logic. As for applications, Sections 6 and 7 respectively model statistical hypothesis testing and statistical data privacy using StatEL. Section 8 presents related work and Section 9 concludes.

## 2 Preliminaries

In this section we recall the definitions of divergence and metrics, which are used in later sections to quantitatively model an agent’s capability of distinguishing possible worlds.

### 2.1 Notations

Let ℝ≥0 be the set of non-negative real numbers, and [0, 1] be the unit interval. We denote by 𝔻O the set of all probability distributions over a set O. For a finite set O and a distribution μ ∈ 𝔻O, the probability of sampling a value v from μ is denoted by μ[v]. For a subset R ⊆ O, let μ[R] = Σv∈R μ[v]. The support of a distribution μ over a finite set O is supp(μ) = {v ∈ O ∣ μ[v] > 0}. For a set I, a randomized algorithm A : I → 𝔻O, and a set R ⊆ O, we denote by Pr[A(i) ∈ R] the probability that, given input i ∈ I, A outputs one of the elements of R.

### 2.2 Metric and Divergence

A metric over a non-empty set O is a function d : O × O → ℝ≥0 such that for all v, v′, v″ ∈ O, (i) d(v, v′) ≥ 0; (ii) d(v, v′) = 0 iff v = v′; (iii) d(v, v′) = d(v′, v); (iv) d(v, v″) ≤ d(v, v′) + d(v′, v″). Recall that (iii) and (iv) are respectively referred to as symmetry and subadditivity.

A divergence over a non-empty set O is a function D(·∥·) on pairs of distributions over O such that for all μ, μ′ ∈ 𝔻O, (i) D(μ∥μ′) ≥ 0 and (ii) D(μ∥μ′) = 0 iff μ = μ′. Note that a divergence may not be symmetric or subadditive.

To describe statistical hypothesis testing in Section 6, we recall the definition of the divergence due to Pearson as follows:

###### Definition 1 (Pearson’s χ2 divergence)

Given two distributions μ and μ′ over a finite set O, the χ²-divergence of μ from μ′ is defined by:

 Dχ2(μ∥μ′) = Σv∈supp(μ′) (μ[v] − μ′[v])² / μ′[v].

The χ² statistic is the χ²-divergence multiplied by the sample size n, i.e., n · Dχ2(μ∥μ′).
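Definition 1 can be computed directly. The following Python sketch (the function names and dictionary representation are ours, not from the paper) computes Dχ2 over finite distributions, summing over the support of the second argument:

```python
def chi2_divergence(mu, mu_prime):
    """Pearson's chi-squared divergence D_chi2(mu || mu_prime).

    Distributions are dicts mapping values to probabilities; following
    Definition 1, the sum ranges over the support of mu_prime.
    """
    return sum((mu.get(v, 0.0) - p) ** 2 / p
               for v, p in mu_prime.items() if p > 0)

def chi2_statistic(n, mu, mu_prime):
    """The chi-squared statistic: the divergence multiplied by the sample size n."""
    return n * chi2_divergence(mu, mu_prime)

fair = {"heads": 0.5, "tails": 0.5}
biased = {"heads": 0.4, "tails": 0.6}
d = chi2_divergence(biased, fair)  # (0.1)^2/0.5 + (0.1)^2/0.5, i.e. about 0.04
```

For the two coin distributions above, the divergence is about 0.04, so the χ² statistic after n flips is about 0.04 · n.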

To introduce a notion of statistical data privacy in Section 7, we recall the definition of the max-divergence as follows.

###### Definition 2 (Max divergence)

For two distributions μ and μ′ over a finite set O, the max divergence of μ from μ′ is defined by:

 D∞(μ∥μ′) = maxR⊆supp(μ) ln( μ[R] / μ′[R] ).

Note that neither Dχ2 nor D∞ is symmetric.
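A brute-force sketch of Definition 2 in Python (our own encoding; distributions are dicts mapping values to probabilities):

```python
import math
from itertools import combinations

def max_divergence(mu, mu_prime):
    """D_inf(mu || mu_prime): the max over non-empty subsets R of supp(mu)
    of ln(mu[R] / mu_prime[R]), by brute-force enumeration (Definition 2).

    Since a ratio of sums never exceeds the largest pointwise ratio, the
    maximum is in fact attained on a singleton, but we enumerate all
    subsets to mirror the definition literally.
    """
    support = [v for v, p in mu.items() if p > 0]
    best = float("-inf")
    for r in range(1, len(support) + 1):
        for subset in combinations(support, r):
            p = sum(mu[v] for v in subset)
            q = sum(mu_prime.get(v, 0.0) for v in subset)
            best = max(best, math.inf if q == 0 else math.log(p / q))
    return best

mu = {"a": 0.5, "b": 0.5}
nu = {"a": 0.25, "b": 0.75}
```

Here max_divergence(mu, nu) is ln(0.5/0.25) = ln 2, while max_divergence(nu, mu) is ln(0.75/0.5) = ln 1.5, illustrating the asymmetry noted above.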

## 3 Motivating Example

In this section we present a motivating example to explain why we need to introduce a new model for epistemic logic to describe statistical knowledge.

###### Example 1 (Coin flipping)

Let us consider a simple running example of flipping a coin in two possible worlds w1 and w2 respectively. We assume that in the world w1 the coin is fair (the probability of getting heads is 0.5), whereas in w2 the probability of getting heads is 0.4. Here we do not have any prior belief on the probabilities p1 and p2 of being located in the worlds w1 and w2. This does not mean p1 = p2 = 0.5, but means we have no idea on the values of p1 and p2 at all, i.e., either w1 or w2 is chosen non-deterministically.

When we flip the coin just once and observe its outcome (heads or tails), we do not know whether the coin is fair or biased; that is, we cannot tell whether we are located in the world w1 or w2.

As shown in Fig. 1, however, when we increase the number of coin flips, we can more clearly see the difference between the numbers of heads obtained in w1 and in w2. If the fraction of observed heads goes to 0.5 (resp. 0.4), then we learn we are located in the world w1 (resp. w2) with a stronger confidence, namely, we have a stronger belief that the coin is fair (resp. biased). This implies that a larger number of observations of the outcome enables us to distinguish the two possible worlds more clearly, hence to obtain a stronger belief.
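The effect of the sample size can be made concrete with the χ² statistic of Section 2.2. The following illustration is ours, not from the paper; it uses the standard critical value 3.841 of the χ² distribution with one degree of freedom at significance level 0.05, and assumes the observed fraction of heads matches the biased coin exactly:

```python
def chi2_divergence(mu, mu_prime):
    # Pearson's chi-squared divergence (Definition 1 in Section 2.2).
    return sum((mu.get(v, 0.0) - p) ** 2 / p
               for v, p in mu_prime.items() if p > 0)

fair = {"heads": 0.5, "tails": 0.5}     # world w1
biased = {"heads": 0.4, "tails": 0.6}   # world w2
CRITICAL = 3.841  # chi-squared critical value, df = 1, alpha = 0.05

# The chi-squared statistic n * D_chi2 grows linearly in the number of
# flips n, so enough flips eventually separate the two worlds.
verdicts = {n: n * chi2_divergence(biased, fair) > CRITICAL
            for n in (10, 50, 100, 500)}
```

With these numbers, 10 or 50 flips cannot separate the worlds, while 100 or 500 flips can, mirroring the discussion above.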

To model such statistical beliefs, we regard each possible world as a probability distribution over two states, one for heads and one for tails, as shown in Fig. 2 (e.g., the state for heads has probability 0.5 in w1 and 0.4 in w2). Then for a divergence D between two distributions, we define an accessibility relation Rε between worlds such that for any worlds w and w′, (w, w′) ∈ Rε iff D(w∥w′) ≤ ε. Then (w, w′) ∈ Rε for a smaller threshold ε represents that a larger number of samples is required to distinguish w from w′.

This relation is used to formalize statistical knowledge in a model of epistemic logic in Section 4. Intuitively, given a threshold ε determining a confidence level, we say that we know a proposition φ in a world w if φ is satisfied in all possible worlds that are indistinguishable from w in terms of Rε. In Section 6 we will revisit the coin flipping example to see how we formalize it using our logic.

To our knowledge, no previous work on epistemic logic has modeled statistical knowledge that depends on the agent's capability of observing events. In fact, in most of the Kripke models used in previous work, a possible world represents a single state instead of a probability distribution of states, hence the relation between possible worlds does not involve the probability of distinguishing them. Therefore, no prior work on epistemic logic has proposed an abstract model for statistical knowledge that involves the sample size of observing random variables and the statistical significance of the observed results.

## 4 Statistical Epistemic Logic (StatEL)

In this section we introduce the syntax and semantics of the statistical epistemic logic (StatEL).

### 4.1 Syntax

We first present the syntax of the statistical epistemic logic as follows. To express both deterministic and probabilistic properties, we introduce two levels of formulas: static formulas and epistemic formulas. Intuitively, a static formula represents a proposition that is interpreted at a single state without involving probability, while an epistemic formula represents a proposition that can be satisfied at a probability distribution of states with some probability.

Formally, let Mes be a set of symbols called measurement variables, and Γ be a set of atomic formulas of the form γ(x1, x2, …, xk) for a predicate symbol γ and x1, …, xk ∈ Mes (k ≥ 1). Let I ⊆ [0, 1] be a finite union of intervals, and A be a finite set of indices (typically associated with the names of agents and/or statistical tests). Then the static and epistemic formulas are defined by:

• Static formulas:  ψ ::= γ(x1, x2, …, xk) ∣ ¬ψ ∣ ψ ∧ ψ

• Epistemic formulas:  φ ::= PIψ ∣ ¬φ ∣ φ ∧ φ ∣ ψ ⊃ φ ∣ Kaφ

where a ∈ A. Let F be the set of all epistemic formulas. Note that we have no quantifiers over measurement variables. (See Section 4.5.)

The probability quantification PIψ represents that a static formula ψ is satisfied with a probability belonging to a set I. For instance, P(c, 1]ψ represents that ψ holds with a probability greater than c. The non-classical implication ⊃ is used to represent conditional probabilities. For example, by ψ ⊃ PIψ′ we represent that the conditional probability of ψ′ given ψ is included in a set I. The epistemic knowledge Kaφ expresses that an agent a knows a proposition φ. The formal meaning of these operators will be shown in the definition of the semantics.

As syntactic sugar, we use disjunction ψ ∨ ψ′, classical implication ψ → ψ′, and the epistemic possibility operator Pa, defined by: ψ ∨ ψ′ ::= ¬(¬ψ ∧ ¬ψ′), ψ → ψ′ ::= ¬ψ ∨ ψ′, and Paφ ::= ¬Ka¬φ. When I is a singleton {i}, we abbreviate PIψ as Piψ.

### 4.2 Modeling of Systems

In this work we deal with a simple stochastic system with measurement variables. Let D be the finite set of all data that can be assigned to the measurement variables in Mes. We assume that all possible worlds share the same domain D. We define a stochastic system as a pair consisting of:

• a stochastic program that deals with input and output data through measurement variables in Mes, behaves deterministically or probabilistically (by using some randomly generated data), and terminates with probability 1;

• a stochastic assignment σ representing that each measurement variable x ∈ Mes has an observed value v with probability σ(x)[v].

Here we present only a general model and do not specify the data type of those measurement variables, which can be (sequences of) bit strings, floating point numbers, texts, or other types of data. Thanks to the assumptions on program termination and on the finite range of data, the program can reach only finitely many states. For the sake of simplicity, our model does not take timing into account. Extension to time and temporal modality is left for future work.

### 4.3 Distributional Kripke Model

To define a semantics for StatEL, we recall the notion of a Kripke model:

###### Definition 3 (Kripke model)

Given a set Γ of atomic formulas, a Kripke model is defined as a triple M = (W, R, V) consisting of a non-empty set W, a binary relation R on W, and a function V that maps each atomic formula in Γ to a subset of W. The set W is called a universe, its elements are called possible worlds, R is called an accessibility relation, and V is called a valuation.

Now we introduce a Kripke model called a "distributional" Kripke model, where each possible world is a probability distribution over a set of states and each world is associated with a stochastic assignment to measurement variables.

###### Definition 4 (Distributional Kripke model)

Let A be a finite set of indices (typically associated with the names of agents and/or statistical tests), S be a finite set of states (it is left for future work to investigate the case of infinitely many states), and D be a finite set of data. A distributional Kripke model is a tuple M = (W, (Ra)a∈A, (Vs)s∈S) consisting of:

• a non-empty set W ⊆ 𝔻S of probability distributions of states over S (since W is not a multiset, each world in W is a different distribution of states; however, this is still expressive enough when we take S to be sufficiently large);

• for each a ∈ A, an accessibility relation Ra ⊆ W × W;

• for each s ∈ S, a valuation Vs that maps each k-ary predicate γ to a set Vs(γ) ⊆ Dᵏ.

We assume that each state s ∈ S is associated with a function σs : Mes → D that maps each measurement variable x to its value σs(x) observed at the state s. We also assume that each world w is associated with the stochastic assignment σw defined below.

Note that this model assumes a constant domain D; i.e., all measurement variables range over the same set D in every world. Since each world w is a probability distribution of states, we denote by w[s] the probability that a state s is sampled from w. Then the probability that a variable x has a value v in a world w is given by:

 σw(x)[v] = Σs∈supp(w), σs(x)=v w[s].

This means that when a state s is drawn from the distribution w, the value σs(x) is sampled from the distribution σw(x).
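The equation for σw(x) is a marginalization over states. A small sketch in Python (the representation and state names are our own encoding of the coin example):

```python
def sigma_w(world, sigma_s, x):
    """sigma_w(x): the distribution of the value of variable x at world w.

    world maps states s to probabilities w[s]; sigma_s maps each state to
    its deterministic assignment, a dict from variables to values.
    Implements sigma_w(x)[v] = sum of w[s] over s in supp(w) with sigma_s(x) = v.
    """
    dist = {}
    for s, p in world.items():
        if p > 0:
            v = sigma_s[s][x]
            dist[v] = dist.get(v, 0.0) + p
    return dist

# World w1 of the coin example: two states, observing heads resp. tails.
w1 = {"s1": 0.5, "s2": 0.5}
assignments = {"s1": {"x": "heads"}, "s2": {"x": "tails"}}
```

For w1 above, sigma_w(w1, assignments, "x") yields the distribution assigning 0.5 to "heads" and 0.5 to "tails".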

### 4.4 Divergence-based Accessibility Relation

Next we introduce a family of accessibility relations used in typical statistical inferences. Since many notions of statistical distance are not metrics but divergences, we introduce an accessibility relation based on a divergence as follows.

Suppose that an agent a observes some data through a single measurement variable x. Then the distribution of the observed data at a world w is represented by σw(x). Assume that the agent distinguishes distributions in terms of a divergence D. Then given a threshold ε ≥ 0, we define a divergence-based accessibility relation Ra,ε by:

 Ra,ε ≝ {(w, w′) ∈ W × W ∣ D(σw(x)∥σw′(x)) ≤ ε}.

For a smaller value of ε, the capability of distinguishing worlds is stronger.

If D is a metric instead, we call Ra,ε a metric-based accessibility relation. We often omit the subscript a when we do not compare different agents' knowledge.

Intuitively, (w, w′) ∈ Ra,ε represents that the distribution of the data observed in w is indistinguishable from that in w′ in terms of D and the threshold ε. By the definition of a divergence/metric, (w, w′) ∈ Ra,0 implies σw(x) = σw′(x). Therefore, the relation Ra,0 expresses that the agent has an unlimited capability of observing the distributions σw(x) and σw′(x). In Sections 6 and 7 we will show examples of divergence-based accessibility relations.
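As a sketch of how Ra,ε induces a relation over a finite universe, the following Python code uses the total variation distance as a stand-in divergence (our choice for illustration; it is in fact a symmetric metric):

```python
def total_variation(mu, nu):
    """Total variation distance, used here as an illustrative (symmetric) divergence."""
    keys = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(v, 0.0) - nu.get(v, 0.0)) for v in keys)

def accessibility(obs, eps, divergence=total_variation):
    """R_eps = {(w, w') | D(sigma_w(x) || sigma_w'(x)) <= eps}, where obs
    maps each world to the distribution sigma_w(x) of the observed variable."""
    return {(w, w2) for w in obs for w2 in obs
            if divergence(obs[w], obs[w2]) <= eps}

obs = {"w1": {"heads": 0.5, "tails": 0.5},
       "w2": {"heads": 0.4, "tails": 0.6}}
```

With ε = 0.2 the two worlds are mutually accessible (their distance is 0.1), while with ε = 0.05 only the reflexive pairs remain, matching the remark that a smaller ε gives a stronger capability of distinguishing worlds.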

### 4.5 Stochastic Semantics

In this section we define the stochastic semantics for StatEL formulas over a distributional Kripke model M = (W, (Ra)a∈A, (Vs)s∈S) with W ⊆ 𝔻S.

The interpretation of static formulas at a state s ∈ S is given by:

 s ⊨ γ(x1, x2, …, xk)  iff  (σs(x1), σs(x2), …, σs(xk)) ∈ Vs(γ)
 s ⊨ ¬ψ  iff  s ⊭ ψ
 s ⊨ ψ ∧ ψ′  iff  s ⊨ ψ and s ⊨ ψ′.

Note that the satisfaction of the static formulas does not involve probability.

To interpret the non-classical implication ψ ⊃ φ, we define the restriction w|ψ of a world w to a static formula ψ as follows. If there exists a state s such that s ⊨ ψ and w[s] > 0, then w|ψ can be defined as the distribution over the finite set of states such that:

 w|ψ[s] = w[s] / (Σs′: s′⊨ψ w[s′])  if s ⊨ ψ,  and  w|ψ[s] = 0  otherwise.

Note that w|ψ is undefined if w does not have a state that satisfies ψ and has a non-zero probability in w.

Now we define the interpretation of epistemic formulas at a world w in M by:

 M, w ⊨ PIψ  iff  Pr[s $← w : s ⊨ ψ] ∈ I
 M, w ⊨ ¬φ  iff  M, w ⊭ φ
 M, w ⊨ φ ∧ φ′  iff  M, w ⊨ φ and M, w ⊨ φ′
 M, w ⊨ ψ ⊃ φ  iff  w|ψ is defined and M, w|ψ ⊨ φ
 M, w ⊨ Kaφ  iff  for every w′ s.t. (w, w′) ∈ Ra,  M, w′ ⊨ φ,

where s $← w represents that a state s is sampled from the distribution w.

Finally, the interpretation of an epistemic formula φ in M is given by:

 M ⊨ φ  iff  for every world w in M,  M, w ⊨ φ.
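The clauses above can be prototyped as a small model checker. In this sketch (our encoding, not from the paper), a static formula is a Python predicate on states, a world is a dict from states to probabilities, and I is a closed interval [lo, hi]:

```python
def prob(world, psi):
    """Pr[s $<- w : s |= psi], for a predicate psi on states."""
    return sum(p for s, p in world.items() if psi(s))

def restrict(world, psi):
    """The restriction w|psi, or None when it is undefined."""
    z = prob(world, psi)
    if z == 0:
        return None
    return {s: (p / z if psi(s) else 0.0) for s, p in world.items()}

def holds_P(world, lo, hi, psi):
    """M, w |= P_[lo,hi] psi."""
    return lo <= prob(world, psi) <= hi

def holds_impl(world, psi, phi):
    """M, w |= psi ⊃ phi, with phi a predicate on worlds."""
    w_psi = restrict(world, psi)
    return w_psi is not None and phi(w_psi)

def holds_K(worlds, R, w, phi):
    """M, w |= K_a phi over an explicit accessibility relation R."""
    return all(phi(worlds[w2]) for (w1, w2) in R if w1 == w)

worlds = {"w1": {"s_h": 0.5, "s_t": 0.5}, "w2": {"s_h": 0.4, "s_t": 0.6}}
R = {(u, v) for u in worlds for v in worlds}   # all worlds indistinguishable
heads = lambda s: s == "s_h"
fair = lambda w: holds_P(w, 0.5, 0.5, heads)   # P_0.5 psi as a world predicate
```

Here holds_P(worlds["w1"], 0.5, 0.5, heads) holds, but holds_K(worlds, R, "w1", fair) fails because the accessible world w2 assigns probability 0.4 to heads; shrinking R to the identity relation makes the knowledge hold.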

We remark that in each world w, measurement variables are interpreted using the stochastic assignment σw, as shown in Section 4.3. This allows one to assign different values to distinct occurrences of a variable in a formula; e.g., in ψ ⊃ KaPIψ′, a measurement variable x occurring in ψ can be interpreted using σw in a world w, while x in ψ′ can be interpreted using σw′ in another world w′ s.t. (w, w′) ∈ Ra.

Note that our semantics for probability quantification is different from that in previous work. Halpern shows two approaches to defining semantics: giving probabilities (1) on the domain and (2) on possible worlds. However, our semantics is different from both: it defines probabilities on the states belonging to a possible world, while each world is not assigned a probability. Hence, unlike Halpern's approaches, our model can deal with both the probabilistic behaviours of systems and non-deterministic inputs from an external environment.

We also remark that StatEL can be used to formalize conditional probabilities. If the conditional probability of satisfying a static formula ψ′ given another static formula ψ is included in a set I at a world w, then we have M, w|ψ ⊨ PIψ′, hence we obtain M, w ⊨ ψ ⊃ PIψ′.

## 5 Basic Properties of StatEL

In this section we present basic properties of StatEL. In particular, we show that the transitivity and Euclidean axioms rely on the agent's capability of observation.

### 5.1 Properties of Probability Quantification

We can define a dual operator of PI as follows. Given a finite union I ⊆ [0, 1] of disjoint intervals, let Iᶜ = [0, 1] ∖ I and 1 − I = {1 − p ∣ p ∈ I}. Negation with PI then has the following properties.

###### Proposition 1 (Negation with probability quantification)

For any world w in a model M and any static formula ψ, we have:

1. M, w ⊨ ¬PIψ  iff  M, w ⊨ PIᶜψ;

2. M, w ⊨ PI¬ψ  iff  M, w ⊨ P1−Iψ.

By Proposition 1, ¬PIψ is logically equivalent to PIᶜψ. For instance, ¬P[0, 0.05]ψ is equivalent to P(0.05, 1]ψ, and P[0, 0.05]¬ψ is equivalent to P[0.95, 1]ψ.

### 5.2 Properties of Epistemic Modality

Next we show some properties of epistemic modality. As with the standard modal logic, StatEL satisfies the necessitation rule and distribution axiom.

###### Proposition 2 (Minimal properties)

For any distributional Kripke model M, any a ∈ A, and any φ, φ′ ∈ F, we have:

• (N) necessitation:  M ⊨ φ implies M ⊨ Kaφ

• (K) distribution:  M ⊨ Ka(φ → φ′) → (Kaφ → Kaφ′).

The satisfaction of other properties depends on the definition of the accessibility relation. Since many notions of statistical distance are not metrics but divergences, we first present some basic properties when M has a divergence-based accessibility relation Ra,ε.

###### Proposition 3 (Properties with divergence-based accessibility)

Let ε, ε′ ≥ 0 with ε ≤ ε′. For any distributional Kripke model M with a divergence-based accessibility relation Ra,ε and any φ ∈ F, we have:

• (T) reflexivity:  M ⊨ Ka,εφ → φ

• comparison of observability:  M ⊨ Ka,ε′φ → Ka,εφ.

If the divergence D is symmetric (e.g., the Jensen–Shannon divergence), then:

• (B) symmetry:  M ⊨ φ → Ka,εPa,εφ.

Here the comparison-of-observability axiom represents that an agent having a stronger capability of distinguishing worlds may have more beliefs.

Finally, we show some properties when is based on a metric (e.g. the -Wasserstein metric , including the Earth mover’s distance).

###### Proposition 4 (Properties with metric-based accessibility)

Let ε, ε′ ≥ 0. For any distributional Kripke model M with a metric-based accessibility relation Ra,ε and any φ ∈ F, we have (T) reflexivity, (B) symmetry, and:

• (4q) quantitative transitivity:  M ⊨ Ka,ε+ε′φ → Ka,εKa,ε′φ

• (5q) relaxed Euclidean:  M ⊨ Pa,εφ → Ka,ε′Pa,ε+ε′φ.

If the agent has an unlimited capability of observation (i.e., ε = ε′ = 0), then:

• (4) transitivity:  M ⊨ Ka,0φ → Ka,0Ka,0φ

• (5) Euclidean:  M ⊨ Pa,0φ → Ka,0Pa,0φ.

By this proposition, for ε = 0, StatEL has the axioms of S5; hence the epistemic operator Ka,0 represents knowledge rather than belief.

However, if the agent has a limited observability (i.e., ε > 0), then neither the transitivity nor the Euclidean axiom may hold. This means that, even when the agent knows whether φ holds or not with some confidence, the agent may not be perfectly confident that it knows it.

## 6 Modeling Statistical Hypothesis Testing Using StatEL

In this section we formalize statistical hypothesis testing by using StatEL formulas, and introduce a notion of statistical secrecy with a confidence level.

### 6.1 Statistical Hypothesis Testing

Statistical hypothesis testing is a method of statistical inference to check whether given datasets provide sufficient evidence to support some hypothesis. Typically, given two datasets, a null hypothesis H0 is defined to claim that there is no statistical relationship between the two datasets (e.g., no difference between the result of a medical treatment and the placebo effect), while an alternative hypothesis H1 represents that there is some relationship between them (e.g., the result of a medical treatment is better than the placebo effect).

Before performing a hypothesis test, we specify a significance level α, i.e., the probability that the test might reject the null hypothesis H0 given that H0 is true. Typically, α is 0.05 or 0.01. The value 1 − α is called a confidence level.

### 6.2 Formalization of Statistical Hypothesis Testing

Now we define a distributional Kripke model M with a universe W that includes at least two worlds wreal and wideal corresponding to the two datasets we compare:

• the real world wreal, where we have a dataset sampled from actual experiments (e.g., from a medical treatment whose effectiveness we want to know);

• the ideal world wideal, where we have a dataset that is synthesized from the null hypothesis setting (e.g., the dataset obtained from the placebo effect).

Note that W may include other worlds corresponding to different possible datasets.

Let n be the size of the dataset, and x be a measurement variable denoting a single data value chosen from the dataset we have. We assume that each world has a state corresponding to each single data value in the dataset. Then σwreal(x) is the empirical distribution (histogram) calculated from the dataset observed in the actual experiments in wreal, while σwideal(x) is the distribution calculated from the synthetic dataset in wideal. Then the number of data having a value v in the dataset in a world w is given by n · σw(x)[v].

Assume that M has an accessibility relation Rεα,n that is specific to the sample size n, the statistical hypothesis test, and the critical value εα,n for the significance level α we use. Intuitively, (w, w′) ∈ Rεα,n represents that the hypothesis test cannot distinguish the actual dataset from the synthetic one. For instance, when we use Pearson's χ²-test as the hypothesis test, Rεα,n is defined by:

 Rεα,n ≝ {(w, w′) ∈ W × W ∣ Dχ2(σw(x)∥σw′(x)) ≤ εα,n},

where Dχ2 is Pearson's χ² divergence (Definition 1).

Observe that for a fixed significance level α, when the sample size n increases, the critical value εα,n decreases, hence the relation Rεα,n is smaller, i.e., the capability of distinguishing possible worlds is stronger.
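Concretely, if the χ² statistic n·Dχ2 is compared against the standard critical value of the χ² distribution (3.841 for one degree of freedom at α = 0.05, a textbook table value), the threshold on the divergence itself is εα,n = 3.841/n. A sketch of the resulting relation (our own simplification to one degree of freedom):

```python
def chi2_divergence(mu, mu_prime):
    # Pearson's chi-squared divergence (Definition 1).
    return sum((mu.get(v, 0.0) - p) ** 2 / p
               for v, p in mu_prime.items() if p > 0)

CHI2_CRIT = {0.05: 3.841, 0.01: 6.635}  # table values for df = 1

def related(mu, mu_prime, alpha, n):
    """(w, w') in R_{eps_{alpha,n}} iff D_chi2(mu || mu') <= chi2_crit / n."""
    return chi2_divergence(mu, mu_prime) <= CHI2_CRIT[alpha] / n

actual = {"heads": 0.4, "tails": 0.6}      # empirical histogram in w_real
synthetic = {"heads": 0.5, "tails": 0.5}   # null-hypothesis world w_ideal
```

Here related(actual, synthetic, 0.05, 50) holds (too few samples to reject the null hypothesis), while related(actual, synthetic, 0.05, 500) fails: with 500 samples the test rejects, i.e., the two worlds become distinguishable.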

Let φsyn be a formula representing that the dataset is synthesized from the null hypothesis setting (e.g., representing the placebo effect). Then M, wideal ⊨ φsyn. Since each world in W corresponds to a different dataset, it holds for any w ≠ wideal that M, w ⊨ ¬φsyn. For instance, M, wreal ⊨ ¬φsyn, since the actual dataset is used in wreal even when it looks indistinguishable from the synthetic dataset by the hypothesis test.

When the null hypothesis is rejected with a confidence level 1 − α, then (wreal, wideal) ∉ Rεα,n. Since M, w ⊨ ¬φsyn holds for any w ≠ wideal, this rejection of the null hypothesis implies:

 M, wreal ⊨ Kεα,n ¬φsyn,

which is logically equivalent to M, wreal ⊨ ¬Pεα,n φsyn. This means that with the confidence level 1 − α, we know we are not located in the world wideal, hence do not have a synthetic dataset.

On the other hand, when the null hypothesis is not rejected with a confidence level 1 − α, then (wreal, wideal) ∈ Rεα,n. Thus we obtain:

 M, wreal ⊨ Pεα,n φsyn. (1)

This means that we cannot recognize whether we are located in the world wreal or wideal, i.e., we are not sure which dataset we have. To see this in detail, let φ3rd be a formula representing that we have a third dataset (different from those in wreal and wideal). Suppose that another null hypothesis of satisfying φ3rd is not rejected with a confidence level 1 − α. Then we have M, wreal ⊨ Pεα,n φ3rd. Since each world in W corresponds to a different dataset, we obtain M, wreal ⊨ Pεα,n ¬φsyn, which implies M, wreal ⊨ ¬Kεα,n φsyn. This represents that, when the null hypothesis is not rejected, we are not sure whether the null hypothesis is true or false.

### 6.3 Formalization of Statistical Secrecy

Now let us formalize the coin flipping in Example 1 in Section 3 by using StatEL as follows. Recall that the probability of getting heads is 0.5 in w1 and 0.4 in w2. Let ψ be a static formula representing that the coin comes up heads. Then M, w1 ⊨ P0.5ψ and M, w2 ⊨ P0.4ψ. Assume that either P0.5ψ or P0.4ψ holds, i.e., M ⊨ P0.5ψ ∨ P0.4ψ.

When we have a sufficient number n of coin flips, we can distinguish w1 from w2 (i.e., P0.5ψ from P0.4ψ) by a hypothesis test. Hence we learn the probability of getting heads with some confidence level 1 − α, i.e., M, w1 ⊨ Kεα,n P0.5ψ and M, w2 ⊨ Kεα,n P0.4ψ. Therefore we obtain:

 M ⊨ (P0.5ψ → Kεα,n P0.5ψ) ∧ (P0.4ψ → Kεα,n P0.4ψ).

Note that for a larger sample size n′ ≥ n, we have εα,n′ ≤ εα,n, hence it follows from the comparison-of-observability axiom in Proposition 3 that:

 M ⊨ (P0.5ψ → Kεα,n′ P0.5ψ) ∧ (P0.4ψ → Kεα,n′ P0.4ψ).

This means that if our knowledge derived from a smaller sample is statistically significant, then we derive the same conclusion from a larger sample.

On the other hand, when we have only a very small number n″ of coin flips, we cannot distinguish w1 from w2. Then we are not sure which world we are located in with a confidence level 1 − α, i.e., M, w1 ⊨ Pεα,n″ P0.4ψ and M, w2 ⊨ Pεα,n″ P0.5ψ. Hence:

 M ⊨ (P0.5ψ ∨ P0.4ψ) → (Pεα,n″ P0.5ψ ∧ Pεα,n″ P0.4ψ).

This expresses a secrecy of the probability of getting heads. We generalize this to introduce the following definition of secrecy.

###### Definition 5 ((α,n)-statistical secrecy)

Let Φ be a finite set of formulas, α be a significance level, and n be a sample size. We say that Φ is (α, n)-statistically secret if we have:

 M ⊨ ⋁φ∈Φ φ → ⋀φ∈Φ Pεα,n φ.

In the above coin flipping example, {P0.5ψ, P0.4ψ} is (α, n″)-statistically secret for some significance level α and sample size n″. Syntactically, (α, n)-statistical secrecy resembles the notion of total anonymity, whereas in our definition, the epistemic operator deals with the statistical significance and φ is not limited to a formula representing an agent's action.

## 7 Modeling Statistical Data Privacy Using StatEL

In this section we formalize a notion of statistical data privacy by using StatEL.

### 7.1 Differential Privacy

Differential privacy [11, 12] is a popular measure of data privacy guaranteeing that, by observing a statistic about a database, we cannot learn whether an individual user's record is included in the database or not.

As a toy example, let us assume that the body weight of individuals is sensitive information, and that we publish the average weight of all users recorded in a database d. Then we denote by d′ the database obtained by adding to d a single record of a new user u's weight. If we also disclose the average weight of all users in d′, then one learns u's weight from the difference between these two averages.

To mitigate such privacy leaks, many studies have proposed obfuscation mechanisms, i.e., randomized algorithms that add random noise to the statistics calculated from databases. In the above example, an obfuscation mechanism receives a database and outputs a statistic of the average weight to which some random noise is added. Then one cannot learn much information on u's weight from the perturbed statistic of the average weight.

The privacy achieved by such obfuscation is often formalized as differential privacy. Intuitively, an ε-differential-privacy mechanism makes every two "adjacent" (i.e., close) databases d and d′ indistinguishable with a degree of ε.

###### Definition 6 (Differential privacy)

Let e be the base of the natural logarithm, ε ≥ 0, 𝒟 be the set of all databases, and Ψ ⊆ 𝒟 × 𝒟 be an adjacency relation between two databases. A randomized algorithm A provides ε-differential privacy w.r.t. Ψ if for any (d, d′) ∈ Ψ and any R ⊆ range(A),

 Pr[A(d) ∈ R] ≤ e^ε · Pr[A(d′) ∈ R],

where the probability is taken over the randomness in A.

For a smaller , the protection of differential privacy is stronger. It is known that differential privacy can be defined using the max-divergence (Definition 2) as follows .

###### Proposition 5

An obfuscation mechanism A provides ε-differential privacy w.r.t. Ψ iff for any (d, d′) ∈ Ψ, D∞(A(d)∥A(d′)) ≤ ε and D∞(A(d′)∥A(d)) ≤ ε.
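Proposition 5 can be checked numerically on a concrete mechanism. The sketch below uses randomized response (a standard ε-differentially-private mechanism for a single bit, not taken from this paper) and computes the two directed max divergences exactly from the output distributions:

```python
import math

def randomized_response(bit, eps):
    """Output distribution of randomized response: report the true bit
    with probability e^eps / (1 + e^eps), flip it otherwise."""
    p_true = math.exp(eps) / (1 + math.exp(eps))
    return {bit: p_true, 1 - bit: 1 - p_true}

def max_divergence(mu, nu):
    # For full-support distributions, the max over subsets R of
    # ln(mu[R]/nu[R]) is attained on a single outcome (Definition 2).
    return max(math.log(p / nu[v]) for v, p in mu.items() if p > 0)

eps = 1.0
mu, nu = randomized_response(0, eps), randomized_response(1, eps)
# Proposition 5: both directed max divergences are bounded by eps
# (for randomized response they equal eps exactly).
```

Since the two output distributions mirror each other, both directed max divergences equal ε, witnessing that the bound of Proposition 5 is tight for this mechanism.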

### 7.2 Formalization of Differential Privacy

Next we define a distributional Kripke model M where there is a possible world wd corresponding to each database d ∈ 𝒟. We assume that each world is a probability distribution of states, in each of which an obfuscation mechanism A uses a different value of the random seed for providing a probabilistically perturbed output. Let x (resp. y) be a measurement variable denoting the input (resp. output) of the obfuscation mechanism A. In each world wd, σwd(x) is the database d that A receives as input, and σwd(y) is the distribution of the statistics that A outputs. Then the set of all databases is given by 𝒟 = {σw(x) ∣ w ∈ W}.

Now we define the accessibility relation Rε in M by using the max divergence D∞ as follows (since the relation Rε is symmetric, the symmetry axiom (B) also holds):

 Rε ≝ {(w, w′) ∈ W × W ∣ D∞(σw(y)∥σw′(y)) ≤ ε and D∞(σw′(y)∥σw(y)) ≤ ε}.

Intuitively, (w, w′) ∈ Rε represents that, when we observe an output of the obfuscation mechanism A, we do not know which of the two worlds w and w′ we are located in. Hence we do not see which of the two databases σw(x) and σw′(x) was the input to A.

For each d ∈ 𝒟, let φd be a formula representing that we have a database d. Then the ε-differential privacy of A w.r.t. an adjacency relation Ψ is expressed as:

 M ⊨ ⋀d∈𝒟 (φd → ⋀d′∈Ψ(d) Pε φd′),

where Ψ(d) = {d′ ∈ 𝒟 ∣ (d, d′) ∈ Ψ}.

Note that the privacy of user attributes defined as distribution privacy can also be expressed using StatEL, since it is defined as differential privacy w.r.t. a relation between the probability distributions that represent user attributes. We will elaborate on this in future work.

## 8 Related Work

In this section, we overview related work, including the integration of logical and statistical techniques, epistemic logic, and logical formalization of privacy.

#### Integration of logical and statistical techniques.

There have been various studies on integrating logical and statistical techniques in software engineering. Notable examples are probabilistic programming, which has sampling from distributions and conditioning by observations, and statistical model checking [36, 40, 31], which checks the satisfaction of logical formulas by simulations and statistical hypothesis tests. In privacy research, a few papers present hybrid methods combining symbolic and statistical analyses to quantify privacy leaks. In future work, our logic may be used to define specifications of these techniques and characterize their properties.

#### Non-determinism and probability in Kripke models.

Although many epistemic models have been proposed [14, 20, 21], they often assume that each possible world is a single deterministic state. To formalize the behaviours of stochastic systems in their models, they assume that every world is assigned a probability, which means that the non-determinism needs to be resolved in advance.

However, not only probability but also non-deterministic inputs are essential to reason about security and many applications in statistics. In the context of security, we usually do not have prior knowledge of the probability distribution of adversarial inputs. Also, in statistical hypothesis testing (Section 6.1), we do not assume the prior probabilities of the null/alternative hypotheses. The notion of differential privacy (Definition 6) is also independent of the prior distribution on the databases. Therefore, unlike ours, the Kripke models in previous work cannot be used for the purpose of formalizing such statistical knowledge.

#### Kripke model for some aspects of statistics.

The random worlds model  is an epistemic model that tries to formalize some aspects of statistics. In that model, each possible world is assumed to have an identical probability at the initial time, which causes problems, as discussed in Chapter 10 of . Unlike our distributional model, theirs employs neither distributions of states nor statistical significance; it assumes only finite intervals of errors and analyzes only the ideal situation corresponding to an infinite sample size. Therefore, the random worlds model cannot formalize statistical knowledge in our sense.

In philosophical logic, [32, 2] formalize the idea that when a random value has several possible probability distributions, those distributions should be represented at different possible worlds. Unlike our work, however, they neither model statistical significance nor explore accessibility relations.

Independently of our work, French et al.  propose a probability model for a dynamic epistemic logic where each world is associated with a (subjective) probability distribution over the universe and may have a different probability for a propositional variable to be true. This is different from our distributional Kripke model in that their model does not associate each world with a probability distribution of observable variables, hence deals with neither non-deterministic inputs, divergence-based accessibility relations, nor statistical significance.

#### Epistemic logic for privacy properties.

Epistemic logic has been used to formalize and reason about privacy properties, including anonymity [37, 21, 35, 17, 25, 13, 4, 6], role-interchangeability , receipt-freeness of electronic voting protocols [23, 4], and its extension called coercion-resistance . Unlike our formalization in Section 7, however, these do not regard possible worlds as probability distributions and cannot formalize privacy properties with a statistical significance.

#### Logical approaches to differential privacy.

There have been studies that formalize differential privacy using logics such as Hoare logic  and HyperPCTL . Compared to StatEL, these formalizations must describe inequalities of probabilities explicitly, with little abstraction, so their formulas are more complicated. In addition, none of them formalizes settings with finite sample sizes or statistical significance.
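For comparison, the explicit inequality such logics encode is Dwork's standard definition (cited above): for a mechanism $\mathcal{A}$, all adjacent databases $d, d'$, and all sets $S$ of outputs,

```latex
\Pr[\mathcal{A}(d) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[\mathcal{A}(d') \in S].
```

StatEL packages the quantification over outputs and the $e^{\varepsilon}$ factor inside a modality, which is why its formulas in Section 7 stay comparatively short.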

## 9 Conclusion

We introduced statistical epistemic logic (StatEL) to describe statistical knowledge, and gave its stochastic semantics based on the distributional Kripke model. Using StatEL, we introduced a notion of statistical secrecy with a significance level and a sample size, and showed that StatEL is useful for formalizing hypothesis testing and differential privacy in a simple way. As shown in , StatEL can also express certain properties of statistical machine learning.

In ongoing work, we are extending StatEL to deal with the security of cryptography based on computational complexity theory. As future work, we will extend this logic with temporal modalities and give its axiomatization. Our future work also includes an extension of StatEL to formalize quantitative notions of anonymity  and asymptotic anonymity . We are also interested in clarifying the relationships between our distributional Kripke model and mainstream probabilistic epistemic logics that assign probabilities to worlds. Furthermore, we plan to develop statistical epistemic logic for process calculi, analogously to [6, 22, 10, 8], and to investigate the relationships between statistical epistemic logic and bisimulation metrics, analogously to .

## Acknowledgments

I would like to thank the reviewers for their helpful and insightful comments. I am also grateful to Ken Mano, Gergei Bana, and Ryuta Arisaka for their useful comments on preliminary manuscripts.

## References

•  Ábrahám, E., Bonakdarpour, B.: Hyperpctl: A temporal logic for probabilistic hyperproperties. In: Proc. QEST. pp. 20–35 (2018)
•  Bana, G.: Models of objective chance: An analysis through examples. In: Making it Formally Explicit. pp. 43–60. Springer International Publishing (2017). https://doi.org/10.1007/978-3-319-55486-0_3
•  Barthe, G., Gaboardi, M., Arias, E.J.G., Hsu, J., Kunz, C., Strub, P.: Proving differential privacy in hoare logic. In: Proc. CSF. pp. 411–424 (2014)
•  Baskar, A., Ramanujam, R., Suresh, S.P.: Knowledge-based modelling of voting protocols. In: Proc. TARK. pp. 62–71 (2007)
•  Biondi, F., Kawamoto, Y., Legay, A., Traonouez, L.: Hybrid statistical estimation of mutual information and its application to information flow. Formal Asp. Comput. 31(2), 165–206 (2019). https://doi.org/10.1007/s00165-018-0469-z
•  Chadha, R., Delaune, S., Kremer, S.: Epistemic logic for the applied pi calculus. In: Proc. FMOODS/FORTE. pp. 182–197 (2009). https://doi.org/10.1007/978-3-642-02138-1_12
•  Chatzikokolakis, K., Gebler, D., Palamidessi, C., Xu, L.: Generalized bisimulation metrics. In: Proc. CONCUR. pp. 32–46 (2014). https://doi.org/10.1007/978-3-662-44584-6_4
•  Chatzikokolakis, K., Knight, S., Palamidessi, C., Panangaden, P.: Epistemic strategies and games on concurrent processes. ACM Trans. Comput. Logic 13(4), 28:1–28:35 (2012). https://doi.org/10.1145/2362355.2362356
•  Chatzikokolakis, K., Palamidessi, C., Panangaden, P.: Anonymity protocols as noisy channels. Inf. Comput. 206(2–4), 378–401 (2008). https://doi.org/10.1016/j.ic.2007.07.003
•  Dechesne, F., Mousavi, M., Orzan, S.: Operational and epistemic approaches to protocol analysis: Bridging the gap. In: Proc. LPAR. pp. 226–241 (2007)
•  Dwork, C.: Differential privacy. In: Proc. ICALP. pp. 1–12 (2006)
•  Dwork, C., Roth, A., et al.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3–4), 211–407 (2014)
•  van Eijck, J., Orzan, S.: Epistemic verification of anonymity. Electr. Notes Theor. Comput. Sci. 168, 159–174 (2007). https://doi.org/10.1016/j.entcs.2006.08.026
•  Fagin, R., Halpern, J., Moses, Y., Vardi, M.: Reasoning about Knowledge. The MIT Press (1995)
•  French, T., Gozzard, A., Reynolds, M.: Dynamic aleatoric reasoning in games of bluffing and chance. In: Proc. AAMAS. pp. 1964–1966 (2019)
•  Pearson, K.: On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 50(302), 157–175 (1900)
•  Garcia, F.D., Hasuo, I., Pieters, W., van Rossum, P.: Provable anonymity. In: Proc. FMSE. pp. 63–72 (2005). https://doi.org/10.1145/1103576.1103585
•  Gordon, A.D., Henzinger, T.A., Nori, A.V., Rajamani, S.K.: Probabilistic programming. In: Proc. FOSE. pp. 167–181 (2014). https://doi.org/10.1145/2593882.2593900
•  Halpern, J.Y.: An analysis of first-order logics of probability. Artif. Intell. 46(3), 311–350 (1990). https://doi.org/10.1016/0004-3702(90)90019-V
•  Halpern, J.Y.: Reasoning about uncertainty. The MIT press (2003)
•  Halpern, J.Y., O’Neill, K.R.: Anonymity and information hiding in multiagent systems. In: Proc. CSFW. pp. 75–88 (2003)
•  Hughes, D., Shmatikov, V.: Information hiding, anonymity and privacy: a modular approach. J. of Comp. Security 12(1), 3–36 (2004)
•  Jonker, H.L., Pieters, W.: Receipt-freeness as a special case of anonymity in epistemic logic. In: Proc. Workshop On Trustworthy Elections (WOTE’06) (June 2006)
•  Kawamoto, Y.: Towards logical specification of statistical machine learning. In: Proc. SEFM (2019), to appear
•  Kawamoto, Y., Mano, K., Sakurada, H., Hagiya, M.: Partial knowledge of functions and verification of anonymity (in Japanese). Transactions of the Japan Society for Industrial and Applied Mathematics 17(4), 559–576 (2007). https://doi.org/10.11540/jsiamt.17.4_559
•  Kawamoto, Y., Murakami, T.: On the anonymization of differentially private location obfuscation. In: Proc. ISITA. pp. 159–163 (2018)
•  Kawamoto, Y., Murakami, T.: Local obfuscation mechanisms for hiding probability distributions. In: Proc. ESORICS (2019), to appear
•  Kooi, B.P.: Probabilistic dynamic epistemic logic. Journal of Logic, Language and Information 12(4), 381–408 (2003). https://doi.org/10.1023/A:1025050800836
•  Kripke, S.A.: Semantical analysis of modal logic I. Normal modal propositional calculi. Mathematical Logic Quarterly 9(5–6), 67–96 (1963)
•  Küsters, R., Truderung, T.: An epistemic approach to coercion-resistance for electronic voting protocols. In: Proc. S&P. pp. 251–266 (2009). https://doi.org/10.1109/SP.2009.13
•  Legay, A., Delahaye, B., Bensalem, S.: Statistical model checking: An overview. In: Proc. RV. pp. 122–135 (2010). https://doi.org/10.1007/978-3-642-16612-9_11
•  Lewis, D.: A subjectivist’s guide to objective chance. In: Studies in Inductive Logic and Probability, Volume II, pp. 263–293. Berkeley: University of California Press (1980)
•  Lin, J.: Divergence measures based on the shannon entropy. IEEE Transactions on Information Theory 37(1), 145–151 (1991). https://doi.org/10.1109/18.61115
•  Mano, K., Kawabe, Y., Sakurada, H., Tsukada, Y.: Role interchange for anonymity and privacy of voting. J. Log. Comput. 20(6), 1251–1288 (2010). https://doi.org/10.1093/logcom/exq013
•  van der Meyden, R., Su, K.: Symbolic model checking the knowledge of the dining cryptographers. In: Proc. CSFW. p. 280 (2004). https://doi.org/10.1109/CSFW.2004.19
•  Sen, K., Viswanathan, M., Agha, G.: Statistical model checking of black-box probabilistic systems. In: Proc. CAV. pp. 202–215 (2004). https://doi.org/10.1007/978-3-540-27813-9_16
•  Syverson, P.F., Stubblebine, S.G.: Group principals and the formalization of anonymity. In: World Congress on Formal Methods (1). pp. 814–833 (1999). https://doi.org/10.1007/3-540-48119-2_45
•  Vaserstein, L.: Markovian processes on countable space product describing large systems of automata. Probl. Peredachi Inf. 5(3), 64–72 (1969)
•  von Wright, G.H.: An Essay in Modal Logic. Amsterdam: North-Holland Pub. Co. (1951)
•  Younes, H.L.: Verification and planning for stochastic processes with asynchronous events. Ph.D. thesis, Carnegie Mellon University (2005)

## Appendix 0.A Properties of Probability Quantification

In this section we present the proofs for properties of probability quantification.

See Proposition 1

###### Proof

We show the first claim as follows. By the definition of semantics, is logically equivalent to , which is equivalent to , namely, .

Next we show the second claim as follows. By the definition of semantics, is logically equivalent to , i.e., . This is equivalent to . ∎

## Appendix 0.B Properties of the Epistemic Operators

In this section we present properties of our epistemic operators and their proofs.

See Proposition 2

###### Proof

We first show (N) necessitation rule as follows. Assume that . Then for any world in , we have . Hence . Therefore the necessitation rule holds.

Next we show (K) distribution axiom as follows. Let be a possible world in . Assume that , and that . Let be any world such that . Then we have and , hence . Thus we have . Therefore we obtain . ∎
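For reference, the two properties just proved take their standard modal-logic form (we write $\mathsf{K}_a$ for the epistemic operator of an agent $a$; the subscript is our notational assumption):

```latex
\text{(N)}\quad \frac{\;\models \varphi\;}{\;\models \mathsf{K}_a\,\varphi\;}
\qquad\qquad
\text{(K)}\quad \mathsf{K}_a(\varphi \rightarrow \psi)
  \rightarrow (\mathsf{K}_a\,\varphi \rightarrow \mathsf{K}_a\,\psi)
```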

See Proposition 3

###### Proof

Let be a possible world in .

We first show (T) reflexivity as follows. Assume that . By , we have . Therefore, we obtain .

Next we show () comparison of observability as follows. Assume that . Let be any world such that . Then . By and the definition of , we have , hence . Then . Therefore we obtain .

Finally, we show (B) symmetry when is symmetric. Assume that . Let be any world such that . Since is symmetric, we have . By , we obtain . Hence . Therefore we obtain . ∎

See Proposition 4

###### Proof

Since a metric satisfies the definition of a divergence (in Section 2), a metric-based accessibility relation is also a divergence-based accessibility relation. Therefore we obtain (T) reflexivity and (B) symmetry from Proposition 3.

Next we show (4q) quantitative transitivity as follows. Let be a possible world in . Assume that . Let be any world such that , and be any world such that . By definition, we have and . By the subadditivity of the divergence , we have , hence . Then it follows from that . By the definition of , we obtain . Therefore we have .
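The subadditivity step in the proof of (4q) is the triangle inequality of the metric. With hypothetical names (worlds $s, t, u$ carrying distributions $\mu_s, \mu_t, \mu_u$, a metric $D$, and thresholds $\varepsilon_1, \varepsilon_2$), the chain reads:

```latex
D(\mu_s, \mu_u) \;\le\; D(\mu_s, \mu_t) + D(\mu_t, \mu_u)
  \;\le\; \varepsilon_1 + \varepsilon_2,
```

so $s$ accesses $u$ within the combined threshold $\varepsilon_1 + \varepsilon_2$.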

We next show (5q) relaxed Euclidean as follows. Let be a possible world in . Assume that . Then there exists a world such that and . Let be any world such that