have found vulnerabilities in the robustness of deep neural networks (DNNs) to malicious inputs, which can lead to disasters in security-critical systems such as self-driving cars. To find these vulnerabilities in advance, formal verification and testing methods for the robustness of DNNs have been studied in recent years [22, 25, 33, 37]. However, relatively little attention has been paid to the formal specification of machine learning.
To describe the formal specification of security properties, logical approaches have proved useful for classifying desired properties and for developing theories to compare them. For example, security policies in temporal systems have been formalized as trace properties or hyperproperties, which characterize the relationships among various security policies. As another example, epistemic logic has been widely used as a formal policy language (e.g., for the authentication and the anonymity [35, 20] of security protocols, and for the privacy of social networks). As far as we know, however, no prior work has employed logical formulas to rigorously describe various statistical properties of machine learning, although some papers list (often informally) various desirable properties of machine learning.
In this paper, we present a first logical formalization of statistical properties of machine learning. To describe the statistical properties in a simple and abstract way, we employ statistical epistemic logic (StatEL), which was recently proposed to describe statistical knowledge and has been applied to formalize statistical hypothesis testing and statistical privacy of databases.
A key idea in our modeling of statistical machine learning is that we formalize logical properties at the syntax level by using logical formulas, and statistical distances at the semantics level by using accessibility relations of a Kripke model. In this model, we formalize statistical classifiers and some of their desirable properties: classification performance, robustness, and fairness. More specifically, classification performance and robustness are described as the differences between the classifier’s recognition and the correct label (e.g., given by a human), whereas fairness is formalized as the conditional indistinguishability between two groups or individuals by using a notion of counterfactual knowledge.
The main contributions of this work are as follows:
We show a logical approach to formalizing statistical properties of machine learning in a simple and abstract way. In particular, we model logical properties at the syntax level, and statistical distances at the semantics level.
We introduce a formal model for statistical classification. More specifically, we show how probabilistic behaviours of classifiers and non-deterministic adversarial inputs are formalized in a distributional Kripke model.
We formalize the classification performance, robustness, and fairness of classifiers by using statistical epistemic logic (StatEL). As far as we know, this is the first work that uses logical formulas to formalize various statistical properties of machine learning, and that provides epistemic (resp. counterfactually epistemic) views on robustness (resp. fairness) of classifiers.
We show some relationships among properties of classifiers, e.g., different strengths of robustness. We also present some relationships between classification performance and robustness, which suggest robustness-related properties that have not been formalized in the literature as far as we know.
To formalize fairness properties, we define a notion of certain counterfactual knowledge and show techniques to formalize conditional indistinguishability by using counterfactual epistemic operators in StatEL. This enables us to express various fairness properties in a similar style of logical formulas.
Cautions and limitations.
In this paper, we focus on formalizing properties of classification problems and do not deal with the properties of learning algorithms (e.g., fairness through awareness), quality of training data (e.g., sample bias), quality of testing (e.g., coverage criteria), explainability, temporal properties, system-level specification, or process agility in system development. It should be noted that all properties formalized in this paper are already known in the machine learning literature, and the novelty of this work lies in the logical formulation of those statistical properties.
We also remark that this work does not provide methods for checking, guaranteeing, or improving the performance, robustness, or fairness of machine learning. As for the satisfiability of logical formulas, we leave the development of testing and (statistical) model checking algorithms as future work, since the research area of testing and formal/statistical verification of machine learning is relatively new and needs further techniques to improve scalability. Moreover, in some applications such as image recognition, some formulas (e.g., representing whether an input image shows a panda or not) cannot be implemented mathematically, and require additional techniques based on experiments. Nevertheless, we demonstrate that describing various properties using logical formulas is useful to explore desirable properties and to discuss their relationships within a single framework.
Finally, we emphasize that our work is the first attempt to use logical formulas to express statistical properties of machine learning, and would be a starting point to develop theories of specification of machine learning in future research.
The rest of this paper is organized as follows. Section 2 presents background on statistical epistemic logic (StatEL) and notations used in this paper. Section 3 defines counterfactual epistemic operators and shows techniques to model conditional indistinguishability using StatEL. Section 4 introduces a formal model for describing the behaviours of statistical classifiers and non-deterministic adversarial inputs. Sections 5, 6, and 7 respectively formalize the classification performance, robustness, and fairness of classifiers by using StatEL. Section 8 presents related work and Section 9 concludes.
In this section we introduce some notation and recall the syntax and semantics of statistical epistemic logic (StatEL) introduced in previous work.
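Let $\mathbb{R}^{\ge 0}$ be the set of non-negative real numbers, and $[0, 1]$ be the set of non-negative real numbers not greater than $1$. We denote by $\mathbb{D}\mathcal{O}$ the set of all probability distributions over a set $\mathcal{O}$. Given a finite set $\mathcal{O}$ and a probability distribution $\mu \in \mathbb{D}\mathcal{O}$, the probability of sampling a value $v$ from $\mu$ is denoted by $\mu[v]$. For a subset $R \subseteq \mathcal{O}$ we define $\mu[R]$ by: $\mu[R] = \sum_{v \in R} \mu[v]$. For a distribution $\mu$ over a finite set $\mathcal{O}$, its support is defined by $\mathrm{supp}(\mu) = \{ v \in \mathcal{O} \mid \mu[v] > 0 \}$.
The total variation distance of two distributions $\mu, \mu' \in \mathbb{D}\mathcal{O}$ is defined by: $D_{\mathrm{tv}}(\mu \parallel \mu') = \sup_{R \subseteq \mathcal{O}} |\mu[R] - \mu'[R]|$.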
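As a rough illustration of these notions, the following Python sketch computes the support and the total variation distance of two distributions over a small finite set. The dictionary representation and the function names (`support`, `tv_by_subsets`, `tv_half_l1`) are assumptions made for this example only. The sketch evaluates the subset-based definition by brute force and compares it with the half-L1 form, which coincides with it on finite sets.

```python
from itertools import chain, combinations

def support(mu):
    """Values that are sampled from mu with non-zero probability."""
    return {v for v, p in mu.items() if p > 0}

def tv_by_subsets(mu, nu):
    """Largest gap |mu[R] - nu[R]| over all subsets R (brute force, small sets only)."""
    outcomes = sorted(set(mu) | set(nu))
    subsets = chain.from_iterable(
        combinations(outcomes, k) for k in range(len(outcomes) + 1)
    )
    return max(
        abs(sum(mu.get(v, 0.0) for v in r) - sum(nu.get(v, 0.0) for v in r))
        for r in subsets
    )

def tv_half_l1(mu, nu):
    """Equivalent closed form on finite sets: half of the L1 distance."""
    outcomes = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(v, 0.0) - nu.get(v, 0.0)) for v in outcomes)

# Hypothetical distributions over three labels.
mu = {"panda": 0.5, "gibbon": 0.3, "cat": 0.2}
nu = {"panda": 0.2, "gibbon": 0.3, "cat": 0.5}
print(tv_by_subsets(mu, nu), tv_half_l1(mu, nu))
```

The brute-force version is exponential in the number of outcomes, so the half-L1 form is the practical choice for anything beyond toy examples.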
2.2 Syntax of StatEL
We recall the syntax of the statistical epistemic logic (StatEL) , which has two levels of formulas: static and epistemic formulas. Intuitively, a static formula describes a proposition satisfied at a deterministic state, while an epistemic formula describes a proposition satisfied at a probability distribution of states. In this paper, the former is used only to define the latter.
Formally, let be a set of symbols called measurement variables, and be a set of atomic formulas of the form for a predicate symbol , , and . Let be a finite union of disjoint intervals, and be a finite set of indices (e.g., associated with statistical divergences). Then the formulas are defined by:
where . We denote by the set of all epistemic formulas. Note that we have no quantifiers over measurement variables. (See Section 2.4 for more details.)
The probability quantification represents that a static formula is satisfied with a probability belonging to a set . For instance, represents that holds with a probability greater than . By we represent that the conditional probability of given is included in a set . The epistemic knowledge expresses that we know with a confidence specified by .
As syntactic sugar, we use disjunction , classical implication , and epistemic possibility , defined as usual by: , , and . When is a singleton , we abbreviate as .
2.3 Distributional Kripke Model
Next we recall the notion of a distributional Kripke model , where each possible world is a probability distribution over a set of states and each world is associated with a stochastic assignment to measurement variables.
Definition 1 (Distributional Kripke model)
Let be a finite set of indices (typically associated with statistical tests and their thresholds), be a finite set of states, and be a finite set of data. A distributional Kripke model is a tuple consisting of:
a non-empty set of probability distributions over a finite set of states;
for each , an accessibility relation ;
for each , a valuation that maps each -ary predicate to a set .
We assume that each is associated with a function that maps each measurement variable to its value observed at a state . We also assume that each state in a world is associated with the assignment defined by .
The set is called a universe, and its elements are called possible worlds. All measurement variables range over the same set in every world.
Since each world is a distribution of states, we denote by the probability that a state is sampled from . Then the probability that a measurement variable has a value is given by . This implies that, when a state is drawn from , an input is sampled from the distribution .
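To make the relationship between a world and its stochastic assignment concrete, here is a minimal Python sketch. It assumes a world is stored as a dictionary from state names to probabilities and that each state carries a deterministic assignment of measurement variables; the names `states`, `world`, and `stochastic_assignment`, as well as the variable names `x` and `y`, are hypothetical.

```python
from collections import defaultdict

# Hypothetical states: each one fixes the value of every measurement variable.
states = {
    "s1": {"x": "img_a", "y": "panda"},
    "s2": {"x": "img_a", "y": "gibbon"},
    "s3": {"x": "img_b", "y": "panda"},
}

# A possible world is a probability distribution over states.
world = {"s1": 0.5, "s2": 0.2, "s3": 0.3}

def stochastic_assignment(world, states, variable):
    """Distribution of a measurement variable induced by sampling a state from the world."""
    dist = defaultdict(float)
    for state, p in world.items():
        dist[states[state][variable]] += p
    return dict(dist)

print(stochastic_assignment(world, states, "x"))
print(stochastic_assignment(world, states, "y"))
```

Drawing a state from `world` and reading off `x` is the same as sampling directly from the induced distribution of `x`, which is the observation made in the paragraph above.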
2.4 Stochastic Semantics of StatEL
Now we recall the stochastic semantics  for the StatEL formulas over a distributional Kripke model with .
The interpretation of static formulas at a state is given by:
The restriction of a world to a static formula is defined by if , and otherwise. Note that is undefined if there is no state that satisfies and has a non-zero probability in .
Then the interpretation of epistemic formulas in a world is defined by:
where represents that a state is sampled from the distribution .
Then represents that the conditional probability of satisfying a static formula given another is included in a set at a world .
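The probability quantification and its conditional form can be sketched in Python as follows. This is only an illustration under stated assumptions: static formulas are modelled as Python predicates over a state's assignment, intervals are closed pairs `(lo, hi)`, and a conditional probability whose condition has zero probability is reported as `None` (the restricted world is undefined in that case; how a formula should be evaluated then is not decided here). All names are hypothetical.

```python
# Hypothetical states with an input x, a predicted label y_hat, and an actual label.
states = {
    "s1": {"x": "img_a", "y_hat": "panda", "actual": "panda"},
    "s2": {"x": "img_b", "y_hat": "gibbon", "actual": "panda"},
    "s3": {"x": "img_c", "y_hat": "gibbon", "actual": "gibbon"},
}
world = {"s1": 0.5, "s2": 0.2, "s3": 0.3}

def prob(world, states, phi):
    """Probability that a state sampled from the world satisfies the static formula phi."""
    return sum(p for s, p in world.items() if phi(states[s]))

def cond_prob(world, states, psi, phi):
    """Probability of phi in the world restricted to psi; None if the restriction is undefined."""
    denom = prob(world, states, psi)
    if denom == 0:
        return None
    return prob(world, states, lambda a: psi(a) and phi(a)) / denom

def holds_in(value, interval):
    """Check that a probability lies in a closed interval (lo, hi)."""
    lo, hi = interval
    return value is not None and lo <= value <= hi

predicted_panda = lambda a: a["y_hat"] == "panda"
actual_panda = lambda a: a["actual"] == "panda"

print(holds_in(prob(world, states, predicted_panda), (0.4, 0.6)))
print(cond_prob(world, states, actual_panda, predicted_panda))
```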
In each world , measurement variables can be interpreted using . This allows us to assign different values to different occurrences of a variable in a formula; e.g., in , occurring in is interpreted by in a world , while in is interpreted by in another s.t. .
Finally, the interpretation of an epistemic formula in is given by:
3 Techniques for Conditional Indistinguishability
In this section we introduce some modal operators to define a notion of “counterfactual knowledge” using StatEL, and show how to employ them to formalize conditional indistinguishability properties. The techniques presented here are used to formalize some fairness properties of machine learning in Section 7.
3.1 Counterfactual Epistemic Operators
Let us consider an accessibility relation based on a statistical divergence and a threshold defined by:
where is the measurement variable observable in each world in . Intuitively, represents that the probability distribution of the data observed in a world is indistinguishable from that in another world in terms of .
Now we define the complement relation of by , namely,
Then represents that the distribution observed in can be distinguished from that in . Then the corresponding epistemic operator , which we call a counterfactual epistemic operator, is interpreted as:
Intuitively, (1) represents that if we were located in a possible world that looked distinguishable from the real world , then would always hold. This expresses counterfactual knowledge (see footnote 1) in the sense that, if we had an observation different from the real world, then we would know . This is logically equivalent to (2), representing that all possible worlds that do not satisfy look indistinguishable from the real world in terms of .
Footnote 1: Our definition of counterfactual knowledge is limited to the condition of having an observation different from the actual one. More general notions of counterfactual knowledge can be found in previous work.
We remark that the dual operator is interpreted as:
This means a counterfactual possibility in the sense that it might be the case where we had an observation different from the real world and thought possible.
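The operators of this subsection can be illustrated over a small finite universe. The sketch below makes several simplifying assumptions that are not part of the formal model: each world is represented only by the distribution of the observed data, the divergence is the total variation distance, and an epistemic formula is a Python predicate on worlds. The function names are hypothetical.

```python
def tv(mu, nu):
    """Total variation distance on a finite set (half of the L1 distance)."""
    keys = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(k, 0.0) - nu.get(k, 0.0)) for k in keys)

def accessible(w, w2, eps):
    """Indistinguishability: the observed distributions are within eps of each other."""
    return tv(w, w2) <= eps

def knows(universe, w, phi, eps):
    """Ordinary knowledge: phi holds in every world indistinguishable from w."""
    return all(phi(v) for v in universe if accessible(w, v, eps))

def cf_knows(universe, w, phi, eps):
    """Counterfactual knowledge: phi holds in every world distinguishable from w."""
    return all(phi(v) for v in universe if not accessible(w, v, eps))

def cf_possible(universe, w, phi, eps):
    """Dual operator: some world distinguishable from w satisfies phi."""
    return any(phi(v) for v in universe if not accessible(w, v, eps))

# Hypothetical universe of three worlds, each given by its observed distribution.
w0 = {"a": 0.5, "b": 0.5}
w1 = {"a": 0.55, "b": 0.45}
w2 = {"a": 0.9, "b": 0.1}
universe = [w0, w1, w2]
mostly_a = lambda v: v["a"] >= 0.8

print(knows(universe, w0, mostly_a, 0.1))
print(cf_knows(universe, w0, mostly_a, 0.1))
print(cf_possible(universe, w0, mostly_a, 0.1))
```

With a threshold of 0.1, only `w2` is distinguishable from `w0`, so the counterfactual operators quantify over `w2` alone, whereas ordinary knowledge quantifies over `w0` and `w1`.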
3.2 Conditional Indistinguishability via Counterfactual Knowledge
Specifically, we use the following proposition, stating that given that two static formulas and are respectively satisfied in worlds and with probability , then the indistinguishability between and can be expressed as . Note that this formula means that there is no possible world where we have an observation different from the real world (satisfying ) but we think possible; i.e., the formula means that if is satisfied then we have an observation indistinguishable from that in the real world .
Proposition 1 (Conditional indistinguishability)
Let be a distributional Kripke model with the universe . Let and be static formulas, and .
iff for any , and imply .
If is symmetric, then iff .
See Appendix 0.A for the proof.
4 Formal Model for Statistical Classification
In this section we introduce a formal model for statistical classification by using distributional Kripke models (Definition 1). In particular, we formalize a probabilistic behaviour of a classifier and a non-deterministic input from an adversary in a distributional Kripke model.
4.1 Statistical Classification Problems
Multiclass classification is the problem of classifying a given input into one of multiple classes.
Let be a finite set of class labels, and be the finite set of input data (called feature vectors) that we want to classify. Then a classifier is a function that receives an input datum and predicts which class (among ) the input belongs to. Here we do not model how classifiers are constructed from a set of training data, but deal with a situation where some classifier has already been obtained and its properties should be evaluated.
Let be a scoring function that gives a score of predicting the class of an input datum (feature vector) as a label . Then for each input , we denote by to represent that a label maximizes . For example, when the input is an image of an animal and is the animal’s name, then may represent that an oracle (or a “human”) classifies the image as .
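A classifier induced by a scoring function can be sketched in a few lines of Python. The score table, the label set, and the function names below are hypothetical and serve only to illustrate the choice of a score-maximizing label; ties are broken in favour of the label listed first, which is an arbitrary choice made for this example.

```python
LABELS = ["panda", "gibbon", "cat"]

# Hypothetical scores for two inputs; a real scoring function would come from a trained model.
SCORES = {
    ("img_a", "panda"): 0.7, ("img_a", "gibbon"): 0.2, ("img_a", "cat"): 0.1,
    ("img_b", "panda"): 0.3, ("img_b", "gibbon"): 0.6, ("img_b", "cat"): 0.1,
}

def score(x, label):
    """Score of predicting `label` for the input `x`."""
    return SCORES[(x, label)]

def classify(x, labels=LABELS, scoring=score):
    """Return a label that maximizes the score of x (first maximizer on ties)."""
    return max(labels, key=lambda label: scoring(x, label))

print(classify("img_a"), classify("img_b"))
```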
4.2 Modeling the Behaviours of Classifiers
Classifiers are formalized on a distributional Kripke model with and a real world . Recall that each world is a probability distribution over the set of states and has a stochastic assignment that is consistent with the deterministic assignments for all (as explained in Section 2.3).
We present an overview of our formalization in Fig. 1. We denote by an input to the classifier , and by a label output by . We assume that the input variable (resp. the output variable ) ranges over the set of input data (resp. the set of labels); i.e., the deterministic assignment at each state has the range and satisfies and .
A key idea in our modeling is that we formalize logical properties in the syntax level by using logical formulas, and statistical distances in the semantics level by using accessibility relations . In this way, we can formalize various statistical properties of classifiers in a simple and abstract way.
To formalize a classifier , we introduce a static formula to represent that classifies a given input as a class . We also introduce a static formula to represent that is the actual class of an input . As an abbreviation, we write (resp. ) to denote (resp. ). Formally, these static formulas are interpreted at each state as follows:
4.3 Modeling the Non-deterministic Inputs from Adversaries
As explained in Section 2.3, when a state is drawn from a distribution , an input value is sampled from the distribution , and assigned to the measurement variable . Since denotes the input to the classifier , the input distribution over can be regarded as the test dataset. This means that each world corresponds to a test dataset . For instance, in the real world represents the actual test dataset. The set of all possible test datasets (i.e., possible distributions of inputs to ) is represented by . Note that can be an infinite set.
For example, let us consider testing the classifier with the actual test dataset . When assigns a label to an input with probability , i.e., , then this can be expressed by:
We can also formalize a non-deterministic input from an adversary in this model as follows. Although each state in a possible world is assigned the probability , each possible world itself is not assigned a probability. Thus, each input distribution itself is also not assigned a probability, hence our model assumes no probability distribution over . In other words, we assume that a world and thus an adversary’s input distribution are non-deterministically chosen. This is useful to model an adversary’s malicious inputs in the definitions of security properties, because we usually have no prior knowledge of the distribution of malicious inputs from adversaries, and need to reason about the worst cases caused by the attack. In Section 6, this formalization of non-deterministic inputs is used to express the robustness of classifiers.
Finally, it should be noted that we cannot enumerate all possible adversarial inputs, hence cannot construct by collecting their corresponding worlds. Since can be an infinite set and is unspecified, we do not aim at checking whether or not a formula is satisfied in all possible worlds of . Nevertheless, as shown in later sections, describing various properties using StatEL is useful to explore desirable properties and to discuss relationships among them.
5 Formalizing the Classification Performance
In this section we show a formalization of classification performance using StatEL (See Fig. 2 for basic ideas). In classification problems, the terms positive/negative represent the result of the classifier’s prediction, and the terms true/false represent whether the classifier predicts correctly or not. Then the following terminologies are commonly used:
true positive means both the prediction and actual class are positive;
true negative means both the prediction and actual class are negative;
false positive means the prediction is positive but the actual class is negative;
false negative means the prediction is negative but the actual class is positive.
These terminologies can be formalized using StatEL as shown in Table 1. For example, when an input shows true positive at a state , this can be expressed as, , and .
Then the precision (positive predictive value) is defined as the conditional probability that the prediction is correct given that the prediction is positive; i.e., . Since the test dataset distribution in the real world is expressed as (as explained in Section 4.3), the precision being within an interval is given by:
which can be written as:
By using StatEL, this can be formalized as:
Note that the precision depends on the test data sampled from the distribution , hence on the real world in which we are located. Hence the measurement variable in is interpreted using the stochastic assignment in the world .
Symmetrically, the recall (true positive rate) is defined as the conditional probability that the prediction is correct given that the actual class is positive; i.e., . Then the recall being within is formalized as:
In Table 1 we show the formalization of other notions of classification performance using StatEL.
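The performance notions of this section are conditional probabilities over the test data distribution, which the following Python sketch computes for a toy world. The representation of a world as a list of `(probability, predicted, actual)` triples and all names are assumptions made for this illustration.

```python
# Hypothetical world: each entry is (probability of the state, predicted label, actual label).
world = [
    (0.4, "panda", "panda"),    # true positive for the class "panda"
    (0.1, "panda", "gibbon"),   # false positive
    (0.2, "gibbon", "panda"),   # false negative
    (0.3, "gibbon", "gibbon"),  # true negative
]

def prob(world, event):
    """Probability that a sampled state satisfies the event."""
    return sum(p for p, pred, actual in world if event(pred, actual))

def cond_prob(world, given, event):
    """Probability of the event given the condition (assumes the condition has non-zero probability)."""
    return prob(world, lambda h, a: given(h, a) and event(h, a)) / prob(world, given)

def precision(world, label):
    """Probability that the actual class is `label` given that the prediction is `label`."""
    return cond_prob(world, lambda h, a: h == label, lambda h, a: a == label)

def recall(world, label):
    """Probability that the prediction is `label` given that the actual class is `label`."""
    return cond_prob(world, lambda h, a: a == label, lambda h, a: h == label)

def accuracy(world):
    """Probability that the prediction agrees with the actual class."""
    return prob(world, lambda h, a: h == a)

def within(value, interval):
    """Check that a performance value lies in a closed interval (lo, hi)."""
    lo, hi = interval
    return lo <= value <= hi

print(precision(world, "panda"), recall(world, "panda"), accuracy(world))
print(within(precision(world, "panda"), (0.75, 1.0)))
```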
6 Formalizing the Robustness of Classifiers
Many studies have found attacks on the robustness of statistical machine learning. An input datum that violates the robustness of classifiers is called an adversarial example. It is designed to make a classifier fail to predict the actual class, but is still recognized by human eyes as belonging to that class. For example, in computer vision, Goodfellow et al. create an image by adding undetectable noise to a panda’s photo so that humans can still recognize the perturbed image as a panda, but a classifier misclassifies it as a gibbon.
In this section we formalize robustness notions for classifiers by using epistemic operators in StatEL (See Fig. 2 for an overview of the formalization). In addition, we present some relationships between classification performance and robustness, which suggest robustness-related properties that have not been formalized in the literature as far as we know.
6.1 Total Correctness of Classifiers
We first note that the total correctness of classifiers could be formalized as a classification performance (e.g., precision, recall, or accuracy) in the presence of all possible inputs from adversaries. For example, the total correctness could be formalized as , which represents that is satisfied in all possible worlds of .
In practice, however, it is not possible or tractable to check whether the classification performance is achieved for all possible datasets and for all possible inputs, e.g., when is an infinite set. Hence we need weaker notions of correctness that can be verified in some way. In the following sections, we deal with robustness notions that are weaker than total correctness.
6.2 Probabilistic Robustness against Targeted Attacks
When a robustness attack aims at misclassifying an input as a specific target label, then it is called a targeted attack. For instance, in the above-mentioned attack by , a gibbon is the target into which a panda’s photo is misclassified.
To formalize the robustness, let be an accessibility relation that relates two worlds having closer inputs, i.e.,
where is some divergence or distance. Intuitively, implies that the two distributions and of inputs to the classifier are close in terms of (e.g., two slightly different images that both look like pandas to the human eye). Then an epistemic formula represents that the classifier is confident that is true as far as it classifies the test data that are perturbed by a level of noise (see footnote 2).
Footnote 2: This usage of modality relies on the fact that the value of the measurement variable can be different in different possible worlds.
Now we discuss how we formalize robustness using the epistemic operator as follows. A first definition of robustness against targeted attacks might be:
which represents that a panda’s photo will not be recognized as a gibbon at all after the photo is perturbed by noise. However, this does not express probability or cover the case where the human cannot recognize the perturbed image as a panda, for example, when the image is perturbed by a transformation such as rescaling and rotation . Instead, for some , we formalize a notion of probabilistic robustness against targeted attacks by:
Since $L_p$-norms are often regarded as reasonable approximations of human perceptual distances, they are used as distance constraints on the perturbation in many studies of targeted attacks (e.g., [36, 18, 6]). To represent the robustness against these attacks in our model, we should take a metric defined by where and range over the datasets and respectively.
6.3 Probabilistic Robustness against Non-Targeted Attacks
Next we formalize non-targeted attacks [31, 30], in which adversaries try to misclassify inputs as some arbitrary incorrect labels (i.e., not as a specific label like a gibbon). Compared to targeted attacks, attacks of this kind are easier to mount but harder to defend against.
A notion of probabilistic robustness against non-targeted attacks can be formalized for some by:
Then we derive that implies , namely, robustness against non-targeted attacks is not weaker than robustness against targeted attacks.
Next we note that by (6), robustness can be regarded as recall in the presence of perturbed noise. This implies that for each property in Table 1, we could consider as a property related to robustness, although as far as we know these have not been formalized in the literature on robustness of machine learning. For example, represents that in the presence of perturbed noise, the prediction is correct with a probability given that it is positive. As another example, represents that in the presence of perturbed noise, the prediction is correct (whether it is positive or negative) with a probability .
Finally, note that by the reflexivity of , implies , i.e., robustness implies recall without perturbation noise.
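The robustness notions of this section can be explored on a small, finite collection of candidate worlds. The Python sketch below is an illustration only: the paper does not aim at checking formulas over the whole universe, which may be infinite and unspecified. The sketch assumes that worlds are lists of `(probability, input, predicted, actual)` tuples, that closeness of input distributions is measured by total variation, and that the actual class has non-zero probability in every candidate world. All names are hypothetical.

```python
def input_dist(world):
    """Distribution of the classifier's input in a world."""
    dist = {}
    for p, x, _, _ in world:
        dist[x] = dist.get(x, 0.0) + p
    return dist

def tv(mu, nu):
    """Total variation distance on a finite set."""
    keys = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(k, 0.0) - nu.get(k, 0.0)) for k in keys)

def prob_pred_given_actual(world, actual_label, predicted_label):
    """Probability of predicting `predicted_label` given that the actual class is `actual_label`."""
    mass = sum(p for p, _, _, a in world if a == actual_label)
    hit = sum(p for p, _, h, a in world if a == actual_label and h == predicted_label)
    return hit / mass

def nearby(universe, real, eps):
    """Candidate worlds whose input distribution is within eps of the real one."""
    return [w for w in universe if tv(input_dist(real), input_dist(w)) <= eps]

def robust_targeted(universe, real, eps, label, target, delta):
    """In every nearby world, `label` inputs are misclassified as `target` with probability <= delta."""
    return all(prob_pred_given_actual(w, label, target) <= delta
               for w in nearby(universe, real, eps))

def robust_nontargeted(universe, real, eps, label, delta):
    """In every nearby world, `label` inputs are classified correctly with probability >= 1 - delta."""
    return all(prob_pred_given_actual(w, label, label) >= 1 - delta
               for w in nearby(universe, real, eps))

real = [(0.9, "x1", "panda", "panda"), (0.1, "x2", "gibbon", "panda")]
perturbed = [(0.85, "x1", "panda", "panda"), (0.15, "x2", "gibbon", "panda")]
far = [(0.2, "x1", "panda", "panda"), (0.8, "x2", "gibbon", "panda")]
universe = [real, perturbed, far]

print(robust_nontargeted(universe, real, 0.1, "panda", 0.2))
print(robust_targeted(universe, real, 0.1, "panda", "gibbon", 0.2))
print(robust_nontargeted(universe, real, 1.0, "panda", 0.2))
```

Because the real world is always within distance zero of itself, a positive answer from `robust_nontargeted` also bounds the recall on the unperturbed data, mirroring the reflexivity argument above.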
7 Formalizing the Fairness of Classifiers
Various notions of fairness in machine learning have been studied. In this section, we formalize a few notions of fairness of classifiers by using StatEL. Here we focus on the fairness that should be maintained in the impact, i.e., the results of classification, rather than the treatment (see footnote 3).
Footnote 3: For instance, fairness through awareness requires that protected attributes (e.g., race, religion, or gender) are not explicitly used in the prediction process. However, StatEL may not be suited to formalizing such a property in treatment.
To formalize fairness notions, we use a distributional Kripke model where includes a possible world having a dataset from which an input to the classifier is drawn. Recall that (resp. ) is a measurement variable denoting the input (resp. output) of the classifier . In each world , is the distribution of ’s input over , i.e., the test data distribution, and is the distribution of ’s output over . For each group of inputs, we introduce a static formula representing that an input belongs to . We also introduce a formula representing that is the dataset that the input to is drawn from. Formally, these formulas are interpreted as follows:
For each state , iff .
For each world , iff .
Now we formalize three popular notions of fairness of classifiers by using counterfactual epistemic operators (introduced in Section 3) as follows.
7.1 Group Fairness (Statistical Parity)
The group fairness formulated as statistical parity  is the property that the output distributions of the classifier are identical for different groups. Formally, for each and a group , let be the distribution of the output (over ) of the classifier when the input is sampled from a dataset and belongs to . Then the statistical parity is formalized using the total variation by .
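Statistical parity for two groups within a single dataset can be checked numerically as in the following Python sketch. It assumes a world is a list of `(probability, in_group, output)` triples, compares a group with its complement, and uses a threshold `eps` on the total variation distance of the two output distributions. The names and the example data are hypothetical.

```python
def output_dist(world, in_group):
    """Output distribution of the classifier, conditioned on group membership."""
    mass = sum(p for p, g, _ in world if g == in_group)
    dist = {}
    for p, g, y in world:
        if g == in_group:
            dist[y] = dist.get(y, 0.0) + p / mass
    return dist

def tv(mu, nu):
    """Total variation distance on a finite set."""
    keys = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(k, 0.0) - nu.get(k, 0.0)) for k in keys)

def statistical_parity(world, eps):
    """True if the output distributions of the group and its complement are within eps."""
    return tv(output_dist(world, True), output_dist(world, False)) <= eps

# Hypothetical loan decisions for members (True) and non-members (False) of a group.
world = [
    (0.30, True, "approve"), (0.20, True, "deny"),
    (0.25, False, "approve"), (0.25, False, "deny"),
]

print(output_dist(world, True), output_dist(world, False))
print(statistical_parity(world, 0.15), statistical_parity(world, 0.05))
```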
To express this using StatEL, we define an accessibility relation in by:
Intuitively, represents that the two probability distributions and of the outputs by the classifier respectively in and in are close in terms of . Note that and respectively represent and .
Then the statistical parity w.r.t. groups means that in terms of , we cannot distinguish a world having a dataset and satisfying from another world satisfying . By Proposition 1, this is expressed as:
7.2 Individual Fairness (as Lipschitz Property)
The individual fairness formulated as a Lipschitz property  is the property that the classifier outputs similar labels given similar inputs. Formally, let and be the distributions of the outputs (over ) of the classifier when the inputs are and , respectively. Then the individual fairness is formalized using some divergence , some metric , and a threshold by .
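A Lipschitz-style check of individual fairness can be sketched as follows. The exact inequality is not reproduced here from the formal definition; the sketch assumes the common form in which the divergence between the output distributions of two inputs is bounded by the threshold times the distance between the inputs. Total variation as the divergence, the absolute difference as the metric, and all names and data are assumptions made for this example.

```python
def tv(mu, nu):
    """Total variation distance on a finite set."""
    keys = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(k, 0.0) - nu.get(k, 0.0)) for k in keys)

def lipschitz_fair(outputs, metric, eps, divergence=tv):
    """Assumed form: divergence(out(x), out(x2)) <= eps * metric(x, x2) for all pairs of inputs."""
    inputs = list(outputs)
    return all(
        divergence(outputs[x], outputs[x2]) <= eps * metric(x, x2)
        for x in inputs for x2 in inputs
    )

# Hypothetical one-dimensional inputs (e.g., a normalized credit score) and output distributions.
outputs = {
    0.0: {"approve": 0.50, "deny": 0.50},
    0.1: {"approve": 0.55, "deny": 0.45},
    1.0: {"approve": 0.90, "deny": 0.10},
}
abs_diff = lambda x, x2: abs(x - x2)

print(lipschitz_fair(outputs, abs_diff, eps=1.0))
print(lipschitz_fair(outputs, abs_diff, eps=0.3))
```

With `eps=0.3`, the pair of inputs 0.0 and 0.1 violates the bound, since their output distributions differ by about 0.05 while the allowed gap is 0.03.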
To express this using StatEL, we define an accessibility relation in for the metric and the divergence as follows:
Intuitively, represents that, when inputs are closer in terms of the metric , the classifier outputs closer labels in terms of the divergence .
Then the individual fairness w.r.t. and means that in terms of , we cannot distinguish two worlds where is satisfied, i.e., the classifier outputs given an input . By Proposition 1, this is expressed as:
This represents that, when we observe the distribution of the classifier’s output , the closer the inputs and are, the harder it is to distinguish the two worlds and .
7.3 Equal Opportunity
Equal opportunity [21, 40] is the property that the recall (true positive rate) is the same for all the groups. Formally, given an advantage class (e.g., not defaulting on a loan) and a group of inputs with a protected attribute (e.g., race), a classifier is said to satisfy equal opportunity of w.r.t. if we have:
If we allow the logic to use the universal quantification over the probability value , then the notion of equal opportunity could be formalized as:
However, instead of allowing for this universal quantification, we can use the modal operators (defined by (7)) with , and represent equal opportunity as the fact that we cannot distinguish a world having a dataset and satisfying from another world satisfying as follows:
Then equal opportunity can be regarded as a special case of statistical parity.
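Equal opportunity can be illustrated by comparing the true positive rate of the advantage class between a group and its complement. The Python sketch below assumes a world is a list of `(probability, in_group, predicted, actual)` tuples and uses a small numerical tolerance in place of exact equality, which is an assumption needed only because of floating-point arithmetic. All names and data are hypothetical.

```python
def true_positive_rate(world, in_group, advantage):
    """Recall of the advantage class among inputs with the given group membership."""
    mass = sum(p for p, g, _, a in world if g == in_group and a == advantage)
    hit = sum(p for p, g, h, a in world
              if g == in_group and a == advantage and h == advantage)
    return hit / mass

def equal_opportunity(world, advantage, tol=1e-9):
    """True if the group and its complement have the same recall for the advantage class."""
    gap = abs(true_positive_rate(world, True, advantage)
              - true_positive_rate(world, False, advantage))
    return gap <= tol

# Hypothetical loan data: the advantage class is "repay".
world = [
    (0.20, True, "repay", "repay"), (0.05, True, "default", "repay"),
    (0.40, False, "repay", "repay"), (0.10, False, "default", "repay"),
    (0.25, False, "default", "default"),
]

print(true_positive_rate(world, True, "repay"), true_positive_rate(world, False, "repay"))
print(equal_opportunity(world, "repay"))
```

Restricting attention to inputs whose actual class is the advantage class and then comparing the output behaviour across groups is what makes this property look like statistical parity on a restricted dataset.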
8 Related Work
In this section, we provide a brief overview of related work on the specification of statistical machine learning and on epistemic logic for describing specification.
Desirable properties of statistical machine learning.
There have been a large number of papers on attacks and defences for deep neural networks [36, 8]. Compared to them, however, not much work has been done to explore the formal specification of various properties of machine learning. Seshia et al. present a list of desirable properties of DNNs (deep neural networks), although most of the properties are presented informally without mathematical formulas. As for robustness, Dreossi et al. propose a unifying formalization of adversarial input generation in a rigorous and organized manner, although they formalize and classify attacks (as optimization problems) rather than define the robustness notions themselves. Concerning the fairness notions, Gajane surveys the formalization of fairness notions for machine learning and presents some justification based on social science literature.
Epistemic logic for describing specification.
The BAN logic , proposed by Burrows, Abadi and Needham, is a notable example of epistemic logic used to model and verify the authentication in cryptographic protocols. To improve the formalization of protocols’ behaviours, some epistemic approaches integrate process calculi [23, 10, 7].
Concerning the formalization of fairness notions, previous work in formal methods has modeled different kinds of fairness involving timing by using temporal logic rather than epistemic logic. As far as we know, no previous work has formalized fairness notions of machine learning using counterfactual epistemic operators.
Formalization of statistical properties.
In studies of philosophical logic, Lewis presents the idea that when a random value has various possible probability distributions, those distributions should be represented on distinct possible worlds. Bana puts Lewis’s idea in a mathematically rigorous setting. Recently, a modal logic called statistical epistemic logic was proposed and used to formalize statistical hypothesis testing and the notion of differential privacy. Independently of that work, French et al. propose a probability model for a dynamic epistemic logic in which each world is associated with a subjective probability distribution over the universe, without dealing with non-deterministic inputs or statistical divergence.
9 Conclusion
We have shown a logical approach to formalizing statistical classifiers and their desirable properties in a simple and abstract way. Specifically, we have introduced a formal model for probabilistic behaviours of classifiers and non-deterministic adversarial inputs using a distributional Kripke model. Then we have formalized the classification performance, robustness, and fairness of classifiers by using StatEL. Moreover, we have clarified some relationships among properties of classifiers, and between classification performance and robustness. To formalize fairness notions, we have introduced a notion of counterfactual knowledge and shown some techniques to express conditional indistinguishability. As far as we know, this is the first work that uses logical formulas to express statistical properties of machine learning, and that provides epistemic (resp. counterfactually epistemic) views on robustness (resp. fairness) of classifiers.
In future work, we plan to include temporal operators in the specification language and to formally reason about system-level properties of learning-based systems. We are also interested in developing a general framework for the formal specification of machine learning associated with testing methods and possibly extended with Bayesian networks. Our future work also includes an extension of StatEL to formalize machine learning tasks other than classification problems. Another possible direction of future work would be to clarify the relationships between our counterfactual epistemic operators and more general notions of counterfactual knowledge in previous work.
Appendix 0.A Proof for Proposition 1
We first prove the claim (i) as follows. We show the direction from left to right. Assume that . Let such that and . Then . By , we obtain , which is logically equivalent to