1 Introduction
Randomisation provides the most efficient algorithmic solutions, at least concretely, in many different contexts. A typical example is primality testing, where the Miller-Rabin test [Miller1976, Rabin1980] remains the preferred choice even though deterministic polynomial-time algorithms have been available for many years [AKS2002]. Probability theory can be exploited even more fundamentally in programming, by way of so-called probabilistic (or, more specifically, Bayesian) programming, as popularized by languages like, among others,
ANGLICAN [WMM2014] or CHURCH [GMRBT08]. This has stimulated research about probabilistic programming languages and their semantics [jones1990, DH2002, EPT2018], together with type systems [DLG2017, BDL2018], equivalence methodologies [DLSA2014, CDL2014], and verification techniques [SABGGH2019].
Giving a satisfactory denotational semantics to higher-order functional languages is already problematic in the presence of probabilistic choice [jones1990, JT1998], and becomes even more challenging when continuous distributions and scoring are present. Recently, quasi-Borel spaces [hksy2017] have been proposed as a way to give semantics to calculi with all these features, and only very recently [VKS2019] has this framework been shown to be adaptable to a fully-fledged calculus for probabilistic programming, in which continuous distributions and soft conditioning are present. Probabilistic coherent spaces [DE2011] are fully abstract [EPT2018] for calculi with discrete probabilistic choice, and can, with some effort, be adapted to calculi with sampling from continuous distributions [EPT2018POPL], although not to calculi with scoring.
A research path which has been studied only marginally so far consists in giving semantics to Bayesian higher-order programming languages through interactive forms of semantics, e.g. game semantics [HO2000, AJM2000] or the geometry of interaction [girard1989]. One of the very first models for higher-order calculi with discrete probabilistic choice was in fact a game model, proved fully abstract for a probabilistic calculus with global ground references [DH2002]. More than ten years later, a parallel form of Geometry of Interaction (GoI) and some game models were introduced for calculi with probabilistic choice [DLFVY2017, CCPW2018, CP2018], but in all these cases only discrete probabilistic choice can be handled, with the exception of a recent work on concurrent games and continuous distributions [PW2018].
In this paper, we report on some results about GoI models of higher-order Bayesian languages. The distinguishing features of the introduced GoI model can be summarised as follows:

Simplicity. The category on which the model is defined is that of measurable spaces and partial measurable functions, so the model is completely standard from a measure-theoretic perspective.

Expressivity. As is well known, the GoI construction [jsv, ahs2002] makes it possible to give semantics to calculi featuring higher-order functions and recursion. Indeed, our GoI model can be proved adequate for , a fully-fledged calculus for probabilistic programming.

Flexibility. The model we present is quite flexible, in the sense of being able to reflect the operational behaviour of programs as captured by both the distribution-based and the sampling-based semantics.

Intuitiveness. GoI visualises the structure of programs in terms of graphs, from which dependencies between subprograms can be analyzed. Adequacy of our model provides a diagrammatic reasoning principle for observational equivalence in .
This paper’s contributions, besides the model’s definition, are two adequacy results which precisely relate our GoI model to the operational semantics, formulated (following [bdlgs2016]) in both the distribution-based and the sampling-based style. As a corollary of our adequacy results, we show that the distribution induced by the sampling-based operational semantics coincides with the distribution-based operational semantics.
1.1 Turning Measurable Spaces into a GoI Model
Before entering into the details of our model, it is worthwhile to give some hints about how the proposed model is obtained, and why it differs from similar GoI models from the literature.
The thread of work from which the proposed model stems is that of so-called memoryful geometry of interaction [hmh2014, mhh2016]. The underlying idea of this paper is precisely the same: program execution is modelled as an interaction between the program and its environment, and memoisation takes place inside the program as a result of the interaction.
In the previous work on memoryful GoI by the second author with Hasuo and Muroya, the goal was to model a calculus with algebraic effects. Starting from a monad together with some algebraic effects, they gave an adequate GoI model for such a calculus, which is applicable to a wide range of algebraic effects. In principle, then, their recipe could be applicable to , since the sampling-based operational semantics enables us to see scoring and sampling as algebraic effects acting on global states. However, this would not work for , since the category of measurable spaces (we need to work on because we want to prove adequacy for the distribution-based semantics) is not cartesian closed, and we thus cannot define a state monad by way of the exponential .
In this paper, we sidestep this issue by a series of translations, to be described in Section 4 below. Instead of looking for a state monad on , we embed into the category of objects and Mealy machines (Section 5) and use a state monad on this category. This is doable because is a compact closed category given by the construction [ahs2002]. The use of such compact closed categories (or, more generally, of traced monoidal categories) is the way GoI models capture higher-order functions.
1.2 Outline
The rest of the paper is organised as follows. After giving some necessary measure-theoretic preliminaries in Section 2 below, we introduce in Section 3 the language , together with the two kinds of operational semantics we were referring to above. In Section 4, we introduce our GoI model informally, while in Section 5 a more rigorous treatment of the involved concepts is given, together with the adequacy results. We discuss in Section 10 an alternative way of giving a GoI semantics to based on s-finite kernels, and we conclude in the final section.
2 Measure-Theoretic Preliminaries
We recall some basic notions in measure theory that will be needed in the following. We also fix some useful notations. For more about measure theory, see standard textbooks such as [billingsley1986].
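A σ-algebra on a set $S$ is a family $\Sigma$ of subsets of $S$ such that $S \in \Sigma$; if $A \in \Sigma$, then the complement $S \setminus A$ is in $\Sigma$; and for any family $\{A_n \in \Sigma\}_{n \in \mathbb{N}}$, the intersection $\bigcap_{n} A_n$ is in $\Sigma$. A measurable space $X$ is a set $|X|$ equipped with a σ-algebra $\Sigma_X$ on $|X|$. We often confuse a measurable space $X$ with its underlying set $|X|$. For example, we simply write $x \in X$ instead of $x \in |X|$. For measurable spaces $X$ and $Y$, we say that a partial function $f \colon X \to Y$ (in this paper, we use $\to$ for both partial functions and total functions) is measurable when for all $B \in \Sigma_Y$, the inverse image
$$f^{-1}(B) = \{x \in X \mid f(x) \text{ is defined and } f(x) \in B\}$$
is in $\Sigma_X$. A measurable function from $X$ to $Y$ is a totally defined partial measurable function. A (partial) measurable function $f \colon X \to Y$ is invertible when there is a measurable function $g \colon Y \to X$ such that $g \circ f$ and $f \circ g$ are identities. In this case, we say that $f$ is an isomorphism from $X$ to $Y$ and that $X$ is isomorphic to $Y$.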
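On a finite set, the conditions above can be checked mechanically. The following Python sketch uses hypothetical function names (`is_sigma_algebra`, `preimage`, `is_measurable`) introduced only for illustration. It encodes a partial function as a dictionary whose missing keys are the points where the function is undefined. The sketch shows in particular that a partial function can only be measurable if its domain of definition $f^{-1}(Y)$ is measurable.

```python
from itertools import combinations


def is_sigma_algebra(S, sigma):
    """Check the sigma-algebra axioms for a family of subsets of a finite set S.

    On a finite set, countable intersections reduce to finite ones, so
    closure under pairwise intersection suffices.
    """
    S = frozenset(S)
    sigma = {frozenset(A) for A in sigma}
    if S not in sigma:
        return False
    if any(S - A not in sigma for A in sigma):
        return False
    return all(A & B in sigma for A, B in combinations(sigma, 2))


def preimage(f, X, B):
    """Inverse image of B under a partial function f given as a dict.

    Points where f is undefined (missing keys) never belong to the preimage.
    """
    return frozenset(x for x in X if x in f and f[x] in B)


def is_measurable(f, X, sigma_X, sigma_Y):
    """f is measurable iff the preimage of every measurable B is measurable."""
    sigma_X = {frozenset(A) for A in sigma_X}
    return all(preimage(f, X, B) in sigma_X for B in sigma_Y)
```

For instance, with $\Sigma_X = \{\emptyset, \{1,2\}, \{3,4\}, X\}$ on $X = \{1,2,3,4\}$, the partial function defined only on $\{1,2\}$ is measurable. Extending it by $3 \mapsto b$ breaks measurability, because the preimage of $\{b\}$ is $\{3\} \notin \Sigma_X$.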
We denote a singleton set by $1$, and we regard it as a measurable space by endowing it with the trivial σ-algebra. We also regard the empty set $\emptyset$ as a measurable space in the obvious way. In this paper, $\mathbb{N}$ denotes the measurable space of all nonnegative integers equipped with the σ-algebra consisting of all subsets of $\mathbb{N}$, and $\mathbb{R}$ denotes the measurable space of all real numbers equipped with the σ-algebra $\Sigma_{\mathbb{R}}$ of Borel sets, that is, the least σ-algebra that contains all open subsets of $\mathbb{R}$. By the definition of $\Sigma_{\mathbb{R}}$, a function $f \colon \mathbb{R} \to \mathbb{R}$ is measurable whenever $f^{-1}(U) \in \Sigma_{\mathbb{R}}$ for all open subsets $U \subseteq \mathbb{R}$. Therefore, all continuous functions on $\mathbb{R}$ are measurable.
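When $A$ is a subset of the underlying set of a measurable space $X$, we can equip $A$ with the σ-algebra $\Sigma_A = \{A \cap B \mid B \in \Sigma_X\}$. This way, we regard the unit interval and the set of all nonnegative real numbers as measurable spaces, and indicate them as follows:
$$\mathbb{R}_{[0,1]} = [0, 1], \qquad \mathbb{R}_{\geq 0} = [0, \infty).$$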
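For measurable spaces $X$ and $Y$, we define the product measurable space $X \times Y$ and the coproduct measurable space $X + Y$ by
$$X \times Y = (|X| \times |Y|, \Sigma_{X \times Y}), \qquad X + Y = (|X| + |Y|, \Sigma_{X + Y}),$$
where the underlying σ-algebras are:
$$\Sigma_{X \times Y} = \sigma\bigl(\{A \times B \mid A \in \Sigma_X,\ B \in \Sigma_Y\}\bigr), \qquad \Sigma_{X + Y} = \{A + B \mid A \in \Sigma_X,\ B \in \Sigma_Y\},$$
and $\sigma(\mathcal{F})$ denotes the least σ-algebra containing the family $\mathcal{F}$. We assume that $\times$ has higher precedence than $+$, i.e., we write $X \times Y + Z$ for $(X \times Y) + Z$. In this paper, we always regard finite products $\mathbb{R}^n$ as the product measurable space $\mathbb{R} \times \cdots \times \mathbb{R}$. It is well known that the σ-algebra $\Sigma_{\mathbb{R}^n}$ is the set of all Borel sets, i.e., the least σ-algebra that contains all open subsets of $\mathbb{R}^n$. Partial measurable functions are closed under composition, products and coproducts.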
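Let $X$ be a measurable space. A measure $\mu$ on $X$ is a function from $\Sigma_X$ to $[0, \infty]$, the set of all nonnegative real numbers extended with $\infty$, such that

$\mu(\emptyset) = 0$; and

for any mutually disjoint family $\{A_n \in \Sigma_X\}_{n \in \mathbb{N}}$, we have $\mu\bigl(\bigcup_{n} A_n\bigr) = \sum_{n} \mu(A_n)$.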
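We say that a measure $\mu$ on $X$ is finite when $\mu(X) < \infty$, and that it is σ-finite if $X = \bigcup_{n \in \mathbb{N}} A_n$ for some family $\{A_n \in \Sigma_X\}_{n \in \mathbb{N}}$ satisfying $\mu(A_n) < \infty$ for all $n$.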
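For a measurable space $X$, we write $0$ for the measure on $X$ given by $0(A) = 0$ for all $A \in \Sigma_X$. If $\mu$ is a measure on a measurable space $X$, then for any nonnegative real number $r$, the function $r \cdot \mu \colon A \mapsto r\,\mu(A)$ is also a measure on $X$. The Borel measure on $\mathbb{R}$ is the unique measure $\lambda$ that satisfies
$$\lambda([a, b]) = b - a \quad \text{for all } a \leq b.$$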
We define the Borel measure on by . For a measurable function and a measurable subset , we denote the integral of with respect to the Borel measure restricted to by
For a measurable space $X$ and for an element $x \in X$, the Dirac measure $\delta_x$ on $X$ is given by
$$\delta_x(A) = [x \in A].$$
The square bracket notation on the right-hand side is called Iverson’s bracket. In general, for a proposition $P$, we have $[P] = 1$ when $P$ is true and $[P] = 0$ when $P$ is false.
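For finite examples, measures can be represented directly as set functions. The sketch below restricts itself to finite sets, so that every subset is measurable, and uses hypothetical helper names. It implements Iverson's bracket, Dirac measures, the zero measure, scaling by a nonnegative real and the pointwise sum of two measures.

```python
def iverson(p):
    """Iverson's bracket [p]: 1 if the proposition p holds, 0 otherwise."""
    return 1 if p else 0


def dirac(x):
    """Dirac measure at x: delta_x(A) = [x in A]."""
    return lambda A: iverson(x in A)


def zero_measure(A):
    """The measure assigning 0 to every measurable set."""
    return 0


def scale(r, mu):
    """The measure r * mu, defined for every nonnegative real r."""
    if r < 0:
        raise ValueError("scaling factor must be nonnegative")
    return lambda A: r * mu(A)


def add(mu, nu):
    """Pointwise sum of two measures, which is again a measure."""
    return lambda A: mu(A) + nu(A)
```

For example, `add(scale(3, dirac(2)), dirac(5))` assigns $3$ to $\{2\}$, $1$ to $\{5\}$ and, by additivity, $4$ to $\{2, 5\}$.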
Proposition 2.1.
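For all σ-finite measures $\mu$ on a measurable space $X$ and $\nu$ on a measurable space $Y$, there is a unique measure $\mu \otimes \nu$ on $X \times Y$ such that $(\mu \otimes \nu)(A \times B) = \mu(A)\,\nu(B)$ for all $A \in \Sigma_X$ and $B \in \Sigma_Y$.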
The measure $\mu \otimes \nu$ is called the product measure of $\mu$ and $\nu$. For example, the Borel measure on $\mathbb{R}^n$ is the $n$-fold product measure of the Borel measure on $\mathbb{R}$.
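On finite spaces where every singleton is measurable, Proposition 2.1 becomes concrete. In that setting, the product measure is determined by its values on points $(x, y)$, namely $\mu(\{x\})\,\nu(\{y\})$. The sketch below uses hypothetical names and covers finite spaces only; the general σ-finite case needs the usual extension argument. It builds the product from point masses and checks the defining equation on rectangles.

```python
from fractions import Fraction


def measure_from_weights(w):
    """Measure on a finite space given by point masses w[x] >= 0."""
    return lambda A: sum((w[x] for x in A if x in w), Fraction(0))


def product_weights(wx, wy):
    """Point masses of the product measure: mu({x}) * nu({y})."""
    return {(x, y): a * b for x, a in wx.items() for y, b in wy.items()}


def rectangle(A, B):
    """The measurable rectangle A x B as a set of pairs."""
    return {(x, y) for x in A for y in B}


wx = {0: Fraction(1, 2), 1: Fraction(1, 2)}
wy = {"a": Fraction(2), "b": Fraction(1, 3)}
mu, nu = measure_from_weights(wx), measure_from_weights(wy)
prod = measure_from_weights(product_weights(wx, wy))
```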
Finally, let us recall the notion of a kernel, which is a well-known concept in the theory of stochastic processes. For measurable spaces $X$ and $Y$, a kernel from $X$ to $Y$ is a function $k \colon X \times \Sigma_Y \to [0, \infty]$ such that for any $x \in X$, the function $k(x, -) \colon \Sigma_Y \to [0, \infty]$ is a measure on $Y$, and for any $B \in \Sigma_Y$, the function $k(-, B) \colon X \to [0, \infty]$ is measurable. Notions of finite and σ-finite kernels can be naturally given, following the eponymous constraints on measures. Those kernels which can be expressed as the sum of countably many finite kernels are said to be s-finite [staton2017]. We use kernels to give semantics for our probabilistic programming language, to be defined in the next section.
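Kernels between finite spaces are just nonnegative matrices, which makes them easy to experiment with. The sketch below uses a hypothetical encoding in which a kernel is a Python function sending a point $x$ to a dictionary of point masses describing the measure $k(x, -)$. In it, `compose` is the discrete counterpart of the integral $\int_Y l(y, C)\, k(x, \mathrm{d}y)$ that sequences two kernels, and `kernel_sum` adds kernels pointwise. Only finite sums are shown, whereas s-finite kernels allow countable ones.

```python
from fractions import Fraction


def kernel_apply(k, x, B):
    """Evaluate k(x, B) by summing the point masses of k(x, -) inside B."""
    return sum((w for y, w in k(x).items() if y in B), Fraction(0))


def compose(k, l):
    """Composite kernel: (l . k)(x, {z}) = sum_y k(x, {y}) * l(y, {z})."""
    def kl(x):
        out = {}
        for y, w in k(x).items():
            for z, v in l(y).items():
                out[z] = out.get(z, 0) + w * v
        return out
    return kl


def kernel_sum(*ks):
    """Pointwise sum of kernels: (k1 + ... + kn)(x, B) = sum_i ki(x, B)."""
    def s(x):
        out = {}
        for k in ks:
            for y, w in k(x).items():
                out[y] = out.get(y, 0) + w
        return out
    return s


# A fair coin (ignoring its input) followed by the successor function.
coin = lambda x: {0: Fraction(1, 2), 1: Fraction(1, 2)}
succ = lambda y: {y + 1: Fraction(1)}
```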
3 Syntax and Operational Semantics
3.1 Syntax and Type System
Our language for higher-order Bayesian programming can be seen as Plotkin’s PCF endowed with real numbers, measurable functions, sampling from the uniform distribution on and soft conditioning. We first define types , values and terms as follows:
Here, varies over a countably infinite set of variable symbols, and varies over the set of all real numbers. Each function identifier is associated with a measurable function from to . For terms and , we write for the capture-avoiding substitution of in by .
Terms in are restricted to be in A-normal form, which simplifies some of the arguments about our semantics. This restriction is harmless for the language’s expressive power, thanks to the presence of bindings. For example, term application can be defined to be .
The term constructor and the constant enable probabilistic programming in . Evaluation of has the effect of multiplying the weight of the current probabilistic branch by , thereby enabling a form of soft conditioning. The constant generates a real number randomly drawn from the uniform distribution on . A single sampling mechanism suffices because we can model sampling from other standard distributions by composing with measurable functions [wcgc2018].
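The last remark can be illustrated by the standard inverse-CDF construction, which turns a uniform draw into a draw from another distribution by applying a measurable function. In the Python sketch below, `sample` is a hypothetical stand-in for the language's sampling constant. It is implemented with `random.Random.random`, which draws from $[0, 1)$. The functions `exponential` and `bernoulli` are illustrative measurable functions, not primitives of the language.

```python
import math
import random


def sample(rng):
    """Stand-in for the sampling constant: a uniform draw from [0, 1)."""
    return rng.random()


def exponential(u, rate):
    """Inverse CDF of Exp(rate): maps Uniform[0, 1) to Exp(rate)."""
    return -math.log1p(-u) / rate


def bernoulli(u, p):
    """Returns 1.0 with probability p when u is uniform on [0, 1)."""
    return 1.0 if u < p else 0.0


rng = random.Random(0)
draws = [exponential(sample(rng), rate=2.0) for _ in range(20000)]
print(sum(draws) / len(draws))  # close to the mean 1 / rate = 0.5
```

Since $1 - e^{-\mathit{rate}\cdot x}$ is the CDF of the exponential distribution, $u \mapsto -\ln(1 - u)/\mathit{rate}$ is its inverse. This inverse is continuous on $[0, 1)$ and hence measurable.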
Terms can be typed in a natural way. A context is a finite sequence consisting of pairs of a variable and a type such that every variable appears in at most once. A type judgement is a triple consisting of a context , a term and a type . We say that a type judgement is derivable when we can derive from the typing rules in Figure 1. Here, the type of is , and the type of is because returns a real number, and the purpose of scoring is its side effect.
In the sequel, we only consider derivable type judgements and typable closed terms, that is, closed terms such that is derivable for some type .
3.2 DistributionBased Operational Semantics
We define the distribution-based operational semantics following [bdlgs2016], where, however, a σ-algebra on the set of terms is necessary in order to define evaluation results of terms as distributions (i.e. measures) over values. In this paper, we only consider evaluation of terms of type and avoid introducing σ-algebras on sets of closed terms, thus greatly simplifying the overall development.
Distribution-based operational semantics is a function that sends a closed term to a measure on . Because of the presence of , the measure may not be a probability measure, i.e., may be larger than , but the idea of distribution-based operational semantics is precisely that of associating each closed term of type with a measure over .
As is common in call-by-value programming languages, evaluation is defined by way of evaluation contexts:
The distribution-based operational semantics of is a family of binary relations between closed terms of type and measures on , inductively defined by the evaluation rules in Figure 2, where the evaluation rule for is inspired by the one in [staton2017]. The binary relation in the precondition of the third rule in Figure 2 is called deterministic reduction and is defined as follows as a relation on closed terms:
The last evaluation rule in Figure 2 makes sense because in the precondition is a kernel from to :
Lemma 3.1.
For any and for any term
there is a finite kernel from to such that for any and for any measure on ,
where .
Proof.
Let be a context of the form . In this proof, for a finite sequence , and for a term , we denote
by . We prove the statement by induction on . (Base case) Let be a kernel from to given by
Then for any ,
(Induction step) We define a redex by
We note that in the above BNF can be variables. By induction on the size of type derivation, we can show that every term is either a value or of the form for some evaluation context and some redex . Given a term where , we prove the induction step by case analysis.

If is a value, then is either a variable or a constant . When is a variable , we have
When is a constant , we have
Both given by
are kernels from to .

If is of the form , then by induction hypothesis, there is a kernel from to such that for any ,
We define a kernel from to by
This is a kernel because if is a nonnegative measurable function, then
is measurable. See [billingsley1986, Theorem 18.3]. Then, for any ,

If is of the form for some , then by induction hypothesis, there is a kernel from to such that for any ,
We define a kernel from to by
Then, for any ,

If is of the form for some , then by induction hypothesis, there is a kernel from to such that for any ,
We define a kernel from to by
Then, for any ,

If is of the form , then by induction hypothesis, there is a kernel from to such that for all ,
Hence,

If is of the form , then by induction hypothesis, there is a kernel from to such that for all ,
Hence,

If is of the form , then is equal to either a variable or a constant . For simplicity, we suppose that and and . By induction hypothesis, there is a kernel from to such that for all ,
We define a kernel from to by
Then, for any ,

If is of the form , then by induction hypothesis, there is a kernel from to such that for all ,
Hence,

If is of the form for some , then by induction hypothesis, there are kernels and from to such that for any ,
We define a kernel from to by
Then, for any ,

If is of the form , then by induction hypothesis, there is a kernel from to such that for any ,
Hence,

If is of the form for some real number , then by induction hypothesis, there is a kernel from to such that
Hence,
∎
Lemma 3.1 implies that the relations can be seen as functions from the set of closed terms of type to the set of measures on .
The step-indexed distribution-based operational semantics approximates the evaluation of closed terms by restricting the number of reduction steps. Thus, the limit of the step-indexed distribution-based operational semantics represents the “true” result of evaluating the underlying term.
Definition 3.1.
For a closed term and a measure on , we write when there is a family of measures on such that and for all ,
The binary relation is a function from the set of closed terms of type to the set of measures on . This follows from Lemma 3.1 and from the fact that the family of measures on such that forms an ascending chain with respect to the pointwise order. Moreover, it can be proved that for any , given by is an s-finite kernel.
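To see why the step-indexed approximations form an ascending chain, consider a hypothetical program that repeatedly draws a sample and returns $0$ as soon as the draw is below $1/2$, looping otherwise. The Python sketch below uses exact fractions to compute the mass that the $n$-th approximation assigns to the result $0$. As a simplifying assumption, it counts loop iterations instead of individual reduction steps. Branches still running after $n$ iterations contribute nothing.

```python
from fractions import Fraction


def approx_mass(n):
    """Mass on {0} after at most n loop iterations of the hypothetical program."""
    mass, still_running = Fraction(0), Fraction(1)
    for _ in range(n):
        mass += still_running * Fraction(1, 2)  # branch that stops and returns 0
        still_running *= Fraction(1, 2)         # branch that loops once more
    return mass


chain = [approx_mass(n) for n in range(6)]
print([str(m) for m in chain])  # ['0', '1/2', '3/4', '7/8', '15/16', '31/32']
```

The $n$-th value is $1 - 2^{-n}$. The chain therefore increases monotonically to $1$, so the limit is the Dirac measure at $0$ even though no finite approximation reaches it.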
3.3 SamplingBased Operational Semantics
can be endowed with another form of operational semantics, closer in spirit to inference algorithms, called sampling-based operational semantics. The way we formulate it is deeply inspired by the one in [bdlgs2016].
The idea behind sampling-based operational semantics is to give the evaluation result of each probabilistic branch separately. We specify each probabilistic branch by two parameters: one is a sequence of random draws, which will be consumed by ; the other is a likelihood measure called weight, which will be modified by .
Definition 3.2.
A configuration is a triple consisting of a closed term , a real number called the configuration’s weight, and a finite sequence of real numbers in , called its trace.
Below, we write for the empty sequence. For a real number and a finite sequence consisting of real numbers, we write for the finite sequence obtained by putting at the head of . In Figure 3, we give the evaluation rules of the sampling-based operational semantics, where is the deterministic reduction relation introduced in the previous section. We denote the reflexive transitive closure of by . Intuitively, means that by evaluating , we get the real number with weight , consuming all the random draws in .
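The following Python sketch mimics the intended reading of the reflexive transitive closure on a toy fragment. It is a big-step evaluator over a hypothetical tuple encoding of terms, not the paper's syntax or its small-step rules. The evaluator threads the configuration's weight and trace: sampling consumes the head of the trace, and scoring multiplies the weight. For illustration only, we assume that scoring returns its argument. A run succeeds only when the whole trace has been consumed.

```python
class Stuck(Exception):
    """Raised when no evaluation exists, e.g. the trace runs out."""


def evaluate(term, env, weight, trace):
    """Big-step evaluation; returns (value, weight, remaining trace).

    Hypothetical term encoding:
      ("const", r) | ("var", x) | ("sample",) | ("score", t)
      | ("let", x, t, body) | ("prim", f, [t1, ..., tn])
    """
    tag = term[0]
    if tag == "const":
        return term[1], weight, trace
    if tag == "var":
        return env[term[1]], weight, trace
    if tag == "sample":
        if not trace:
            raise Stuck("trace exhausted")
        return trace[0], weight, trace[1:]  # consume the head of the trace
    if tag == "score":
        r, weight, trace = evaluate(term[1], env, weight, trace)
        return r, weight * r, trace  # assumption: score returns its argument
    if tag == "let":
        _, x, t, body = term
        v, weight, trace = evaluate(t, env, weight, trace)
        return evaluate(body, {**env, x: v}, weight, trace)
    if tag == "prim":
        args = []
        for t in term[2]:
            v, weight, trace = evaluate(t, env, weight, trace)
            args.append(v)
        return term[1](*args), weight, trace
    raise ValueError(f"unknown term {term!r}")


def run(term, trace):
    """Return (value, weight) if term evaluates consuming exactly the trace, else None."""
    try:
        v, w, rest = evaluate(term, {}, 1.0, tuple(trace))
    except Stuck:
        return None
    return (v, w) if not rest else None


# let u = sample in let _ = score(u) in 2 * u
prog = ("let", "u", ("sample",),
        ("let", "_", ("score", ("var", "u")),
         ("prim", lambda a: 2 * a, [("var", "u")])))
print(run(prog, [0.25]))  # (0.5, 0.25)
```

Both a trace that is too short and a trace that is too long yield no result. This matches the requirement that all random draws in the trace be consumed.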
4 Towards Mealy Machine Semantics
In this section, we give some intuitions about our GoI model, which we also call Mealy machine semantics. Giving Mealy machine semantics for requires translating into the linear λ-calculus. This is because GoI is a semantics for linear logic, and is thus tailored to calculi in which terms are treated as resources. Schematically, Mealy machine semantics for translates terms in into Mealy machines in the following way.
In Section 4.1, we explain the first three steps. The last step deserves to be explained in more detail, which we do in Section 4.2. For the sake of simplicity, we ignore the translation of conditional branching and the fixed point operator.
4.1 From to Proof Structures
4.1.1 Moggi’s Translation
In the first step, we translate into an extension of Moggi’s metalanguage by Moggi’s translation [moggi1991]. Here, in order to translate scoring and sampling in , we equip Moggi’s metalanguage with base types and and the following terms:
where is the monad of Moggi’s metalanguage. Any type of is translated into the type defined as follows:
Terms and in are translated into and in Moggi’s metalanguage respectively. See [moggi1991] for more detail about Moggi’s translation.
4.1.2 Girard Translation
We next translate the extended Moggi’s metalanguage into an extension of the linear λ-calculus, by way of the so-called Girard translation [girard1987]. Types are given by
where , and are base types, and terms are generated by the standard term constructors of the linear λ-calculus, plus the following rules:
(as customary in linear logic, is an abbreviation of ). These typing rules are derived from the following translation of types of the extended Moggi’s metalanguage into types of the extended linear calculus:
The definition of is motivated by the following categorical observation: let be the syntactic category of the extended linear λ-calculus, which is a symmetric monoidal closed category endowed with a comonad satisfying certain coherence conditions (see e.g. [hs2003]), and let be the coKleisli category of the comonad . Then, by composing the adjunction between and with a state monad on , we obtain a monad on :
which sends an object to . This use of the state monad is motivated by the sampling-based operational semantics: we can regard as a call-by-value calculus with global states consisting of pairs of a nonnegative real number and a finite sequence of real numbers, and we can regard and as effectful operations interacting with those states.
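At the level of plain functions, and ignoring the linear structure, the state monad described above can be sketched in Python with states being pairs of a weight and a trace. The names `unit`, `bind`, `sample` and `score` are hypothetical. As in the sampling-based semantics, we assume that `sample` consumes the head of the trace, and for illustration we let `score` return its argument. The sketch ignores partiality: `sample` simply raises an `IndexError` on an empty trace.

```python
def unit(v):
    """Return v without touching the state (weight, trace)."""
    return lambda state: (v, state)


def bind(m, k):
    """Run m, pass its result to k, and thread the state through both."""
    def run(state):
        v, state1 = m(state)
        return k(v)(state1)
    return run


def sample(state):
    """Effectful operation: pop the next random draw from the trace."""
    weight, trace = state
    return trace[0], (weight, trace[1:])  # IndexError if the trace is empty


def score(r):
    """Effectful operation: multiply the current weight by r."""
    return lambda state: (r, (state[0] * r, state[1]))


# let u = sample in let _ = score(u) in 2 * u, written monadically
prog = bind(sample, lambda u: bind(score(u), lambda _: unit(2 * u)))
print(prog((1.0, (0.25,))))  # (0.5, (0.25, ()))
```

Each computation is a function from the current state to a pair of a result and a new state, which is exactly the shape of the state monad.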
4.1.3 The Third Step
We translate terms in the extended linear λ-calculus into (an extension of proof structures) [lafont1995], which are graphical presentations of type derivation trees of linear terms. We can also understand proof structures as string diagrams for compact closed categories [selinger2011]. Operators of the pure linear λ-calculus can be translated as usual [lafont1995]. For example, type derivation trees
are translated into proof structures
respectively where nodes labelled with and are proof structures associated to type derivations of and . Terms of the form , and , require new kinds of nodes:
.
This is not a direct adaptation of the typing rules for and in the linear λ-calculus, but the correspondence can be recovered by way of multiplicatives: