# Semantics of higher-order probabilistic programs with conditioning

We present a denotational semantics for higher-order probabilistic programs in terms of linear operators between Banach spaces. Our semantics is rooted in the classical theory of Banach spaces and their tensor products, but bears similarities with the well-known Scott semantics of higher-order programs through the use of ordered Banach spaces, which allow definitions in terms of fixed points. Being based on a monoidal rather than cartesian closed structure, our semantics effectively treats randomness as a resource.

## I Introduction

Probabilistic programming has enjoyed a recent resurgence of interest driven by new applications in machine learning and statistical analysis of large datasets. The emergence of probabilistic programming languages such as Church and Anglican, which allow statisticians to construct and sample distributions and perform Bayesian inference, has created a need for sound semantic foundations and tools for specification and reasoning. Several recent works have approached this task from various perspectives [1, 2, 3, 4].

One of the earliest works on the semantics of probabilistic programs was [5], in which operational and denotational semantics were given for an idealized first-order imperative language with random number generation. Programs and data were interpreted over ordered Banach spaces. Programs were modelled as positive and continuous linear operators on an ordered Banach space of measures. In [6], an equivalent predicate-transformer semantics was introduced based on ordered Banach spaces of measurable functions and shown to be dual to the measure-transformer semantics of [5].

In this paper we revisit this approach. We identify a symmetric monoidal closed category of ordered Banach spaces and regular maps that can serve as a foundation for higher-order probabilistic programming with sampling, conditioning, and Bayesian inference. Bayesian inference can be viewed as reversing the computation of a probabilistic program to infer information about a prior distribution from observations. We model Bayesian inference as computing the adjoint of a linear operator and show how it corresponds to computing a disintegration.

The extension to higher types is achieved through a tensor product construction in the category that gives symmetric monoidal closure. Although not cartesian, the construction does admit an adjunction with homsets enriched with an ordered Banach space structure acting as internalized exponentials. To accommodate conditioning and Bayesian inference, we introduce ‘Bayesian types’, in which values are decorated with a prior distribution. Based on this foundation, we give a type system and denotational semantics for an idealized higher-order probabilistic language with sampling, conditioning, and Bayesian inference.

We believe our approach should appeal to computer scientists, as it is true to traditional Scott-style denotational semantics (see § IV-D). It should also appeal to mathematicians, statisticians and machine learning theorists, as it uses very familiar mathematical objects from those fields. For example, a traditional perspective is that a Markov process is just a positive linear operator of norm 1 between certain Banach lattices [7, Ch. 19]. These are precisely the morphisms of our semantics. Similarly, classical ergodic theory, which is key to proving the correctness of important algorithms like Gibbs sampling, is an important part of the theory of these operators [8]. Our semantics therefore connects seamlessly with a wealth of results from functional analysis, ergodic theory, statistics, etc. We believe that this will greatly simplify the task of validating stochastic machine learning algorithms.

We should also mention that our semantics fits well with the view of entropy (randomness) as a computational resource, like time or space. True random number generators can only produce randomness at a limited rate; physically, randomness is a resource [9]. Our type system, being resource-sensitive, has some nice cryptographic properties: it is forbidden by construction to use a sample more than once; that is, each operation consuming a random sample requires a fresh sample (a component in a tensor product).

Related works: Two very powerful semantics for higher-order probabilistic programming have been recently developed in the literature. In [1, 2], a semantics is given in terms of so-called quasi-Borel spaces. These form a Cartesian closed category and admit a notion of probability distribution and of a Giry-like monad of probability distributions. In [4] the authors develop a semantics in terms of measurable cones. These form a cpo-enriched Cartesian closed category which provides a semantics to a probabilistic extension of PCF that includes conditioning. The key differences with the present semantics are the following. First, these proposed mathematical universes come directly from the world of theoretical computer science, whilst as mentioned above, our semantics is rooted in the traditional mathematics of the objects being constructed by the programs. Second, quasi-Borel spaces and measurable cones form Cartesian closed categories, whereas we work in a monoidal closed category, with obvious implications in terms of resources (e.g. we cannot copy a value). Finally, our semantics of conditioning has been reduced to a mathematically very simple, but also very general construction (taking the adjoint of a linear operator, see § IV-B8), whilst in [1] un-normalized posteriors and normalization constants are computed pointwise, and [4] effectively hard-codes the rejection-sampling algorithm into the semantics.

The reader will find the proofs of most results in the Appendix, together with some background material on measure theory and tensor products of Banach spaces.

## II Background

We start by describing the mathematical landscape of our semantics. We assume that the reader is familiar with the basic definitions of measure theory and of what a (real) Banach space is (see [7, Ch. 4, 6, 8-11] for a gentle introduction in the spirit of this paper).

### II-A Banach spaces, Disintegration and Bayesian inversion

#### II-A1 Some important Banach spaces

Two classes of Banach spaces will appear repeatedly in this paper.

First, for any measurable space $(X,\mathcal{F})$ we introduce the space $\mathcal{M}(X,\mathcal{F})$, or simply $\mathcal{M}X$, as the set of signed measures of bounded variation over $X$. $\mathcal{M}X$ is a Banach space: the linear structure is inherited pointwise from $\mathbb{R}$, and the norm is given by the total variation; see [7, Th. 10.56] for a proof that the space is complete.

Second, for a measured space $(X,\mathcal{F},\mu)$ and $1\le p<\infty$, the Lebesgue space $L_p(X,\mu)$ is the set of equivalence classes of $\mu$-almost everywhere equal $p$-integrable real-valued functions, that is to say functions $f:X\to\mathbb{R}$ such that

$$\int |f|^p\,d\mu<\infty.$$

The linear structure is inherited pointwise from $\mathbb{R}$ and the norm is given by $\|f\|_p=\left(\int|f|^p\,d\mu\right)^{1/p}$. When $p=\infty$, the space $L_\infty(X,\mu)$ is defined as the set of equivalence classes of $\mu$-almost everywhere equal bounded real-valued functions with the norm given by the essential supremum:

$$\|f\|_\infty=\inf\{C\ge0\mid|f(x)|\le C\ \ \mu\text{-a.e.}\}$$

A proof that Lebesgue spaces are complete can be found in [7, Th. 13.5].

#### II-A2 Disintegration

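Measurable spaces and maps form the category $\mathbf{Meas}$. We define the functor $\mathcal{M}:\mathbf{Meas}\to\mathbf{Meas}$ by setting $\mathcal{M}X$ to be the set of signed measures of bounded variation on $X$ equipped with the smallest $\sigma$-algebra making all evaluation maps $\mathrm{ev}_B:\mathcal{M}X\to\mathbb{R}$, $\mu\mapsto\mu(B)$, measurable and by setting $\mathcal{M}f(\mu)=f_*(\mu)$, the pushforward measure of $\mu$, for any measurable $f:X\to Y$. (This is just a generalisation of the Giry monad on $\mathbf{Meas}$ [10]. Note that the space $\mathcal{M}X$ of § II-A1 and the object $\mathcal{M}X$ defined here share the same underlying set, but the former is a Banach space and the latter a measurable space.) We define a measure kernel to be a measurable map $f:X\to\mathcal{M}Y$ such that $\|f(x)\|\le C$ for all $x\in X$ for some fixed $C\ge0$. A probability kernel is a measure kernel such that $f(x)\ge0$ and $\|f(x)\|=1$ for all $x\in X$. A measure $\mu\in\mathcal{M}X$ can also be pushed forward through a measure kernel $f:X\to\mathcal{M}Y$ to give a measure in $\mathcal{M}Y$ via the definition

$$f_*(\mu)(B)=\int_X f(x)(B)\,d\mu\qquad(1)$$

which converges since $f$ is uniformly bounded in norm.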
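On a finite space the integral in (1) reduces to a finite sum, which may help fix intuitions. The following is a minimal sketch under that assumption; the names `pushforward`, `kernel`, `mu` and `nu` are illustrative and do not come from the paper.

```python
from fractions import Fraction as F

def pushforward(kernel, mu):
    """Discrete analogue of (1): f_*(mu)({y}) = sum_x f(x)({y}) * mu({x}).

    kernel[x] lists the point masses of the measure f(x) on Y,
    and mu lists the point masses of a measure on X.
    """
    n_y = len(kernel[0])
    return [sum(kernel[x][y] * mu[x] for x in range(len(mu)))
            for y in range(n_y)]

# A probability kernel from X = {0, 1} to Y = {0, 1, 2}: every row sums to 1.
kernel = [[F(1, 2), F(1, 2), F(0)],
          [F(0),    F(1, 4), F(3, 4)]]
mu = [F(1, 3), F(2, 3)]   # a probability measure on X

nu = pushforward(kernel, mu)
print(nu)                 # point masses of f_*(mu) on Y
```

Since the kernel is a probability kernel and `mu` is a probability measure, the pushforward `nu` is again a probability measure.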

With these definitions in place we can introduce the important notion of disintegration which underlies the semantics of Bayesian conditioning (see § IV-B8). We provide a slightly simplified version of the definition which will be enough for our purpose (see [11, Def. 1] for a very general definition). Intuitively, given a measurable map $f:X\to Y$ and a finite measure $\mu$ on $X$, we say that $\mu$ has a disintegration w.r.t. $f$ if the fibres $f^{-1}(y)$ of $f$ can be equipped with measures $f^\dagger_\mu(y)$ which average out to $\mu$ over the pushforward measure $f_*(\mu)$. Formally, the disintegration of $\mu$ w.r.t. $f$ is a measure kernel $f^\dagger_\mu:Y\to\mathcal{M}X$ such that

In fact [11, Th. 3] shows that can be chosen to be a probability kernel. As can be seen from the first condition, a disintegration – if it exists at all – is only defined up to a null set for the pushforward measure. For sufficiently well-behaved spaces, for example standard Borel spaces [12, 17.35] or more generally metric spaces with Radon measures [11, Th. 1], disintegrations can be shown to always exist.

#### II-A3 Bayesian inversion

The notion of disintegration is key to the understanding of Bayesian conditioning. The traditional setup is as follows: we are given a probability kernel $f:X\to\mathcal{M}Y$ where $X$ is regarded as a parameter space and $f$ is regarded as a parametrized statistical model on $Y$, a space of observable values. We also start with a probability distribution $\mu$ on $X$ (the prior) which is regarded as the current state of belief of where the ‘true’ parameters of the model lie. The problem is, given an observation $y\in Y$, to update the state of belief to a new distribution (the posterior) reflecting the observation. We must therefore find a kernel going in the opposite direction $Y\to\mathcal{M}X$. As shown in [13, 14] this reverse kernel can be built using a disintegration as follows. First we define a joint distribution $\gamma\in\mathcal{M}(X\times Y)$ by

$$\gamma(A\times B)=\int_A f(x)(B)\,d\mu.$$

The Bayesian inverse $f^\dagger_\mu$, if it exists, is given by the probability kernel

$$f^\dagger_\mu=(\pi_X)_*\circ(\pi_Y)^\dagger_\gamma$$

where $(\pi_Y)^\dagger_\gamma$ is the disintegration of the measure $\gamma$ along the projection $\pi_Y:X\times Y\to Y$ (it can be assumed to be a probability kernel). This construction clearly generalizes to all measure kernels.
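For finite spaces the construction above is the familiar Bayes rule. The sketch below assumes finite $X$ and $Y$ and represents kernels as lists of rows; the function name `bayes_inverse` and the data are hypothetical and serve only as an illustration.

```python
from fractions import Fraction as F

def bayes_inverse(kernel, mu):
    """Discrete Bayesian inverse of `kernel` with respect to the prior `mu`.

    joint[x][y] = f(x)({y}) * mu({x}) plays the role of gamma. Conditioning
    gamma on y (disintegrating along the projection to Y) and projecting to X
    gives the posterior. Observations of marginal mass 0 are returned as None,
    since the inverse is only determined up to a null set.
    """
    n_x, n_y = len(mu), len(kernel[0])
    joint = [[kernel[x][y] * mu[x] for y in range(n_y)] for x in range(n_x)]
    nu = [sum(joint[x][y] for x in range(n_x)) for y in range(n_y)]
    return [None if nu[y] == 0 else [joint[x][y] / nu[y] for x in range(n_x)]
            for y in range(n_y)]

kernel = [[F(1, 2), F(1, 2), F(0)],     # model f(0) on Y = {0, 1, 2}
          [F(0),    F(1, 4), F(3, 4)]]  # model f(1)
mu = [F(1, 3), F(2, 3)]                 # prior on X = {0, 1}

posterior = bayes_inverse(kernel, mu)   # posterior[y] is a distribution on X
```

Averaging the posteriors over the marginal distribution of observations recovers the prior, which is the discrete counterpart of the fact that the inverse kernel pushes $f_*(\mu)$ back to $\mu$.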

### II-B Ordered Banach spaces

#### II-B1 Regular Ordered Banach spaces

An ordered vector space $V$ is a vector space together with a partial order $\le$ which is compatible with the linear structure in the sense that for all $u,v,w\in V$ and $\lambda\ge0$

$$u\le v\Rightarrow u+w\le v+w\quad\text{and}\quad u\le v\Rightarrow\lambda u\le\lambda v$$

A vector $v$ in an ordered vector space $V$ is called positive if $v\ge0$ and the collection of all positive vectors is called the positive cone of $V$ and denoted $V^+$. The positive cone is said to be generating if $V=V^+-V^+$, that is to say if every vector can be expressed as the difference of two positive vectors.

An ordered normed vector space is an ordered vector space in which the positive cone is closed for the topology generated by the norm. A subset of the positive cone of particular importance will be the positive unit ball. An ordered Banach space is an ordered normed vector space which is complete. We can now describe the central class of objects of this work: an ordered normed space $V$ is said to be regular if it satisfies [15, Ch. 9]:

R1. if $-v\le u\le v$ then $\|u\|\le\|v\|$;

R2. if $\|u\|<1$ then there exists $v$ with $\|v\|<1$ and $-v\le u\le v$.

In particular, a regular ordered Banach space is an ordered Banach space which is regular. A few comments are in order. First note that if $-v\le u\le v$ then $0\le2v$, and thus $v$ is positive, so R2 says that the norm of any vector can be approximated arbitrarily well by the norm of positive vectors. Note also that R2 implies that the positive cone is generating: for any $u\in V$, fix $\epsilon>0$, then by R2 there exists $v$ with $-v\le u\le v$ whose norm is $\epsilon$-close to that of $u$. Since $u=\frac{1}{2}(v+u)-\frac{1}{2}(v-u)$, and since it follows from $-v\le u\le v$ that both $v+u$ and $v-u$ are positive, $u$ can indeed be expressed as the difference of two positive vectors. Regularity can be understood as the fact that the space is fully characterised by its positive unit ball [16].

#### II-B2 Regular operators and RoBan

As mentioned above, regular ordered Banach spaces are determined in a very strong sense by their positive cone which is generating and determines the norm (axiom R2). It is therefore natural to consider linear operators $f:V\to W$ between regular ordered Banach spaces which send positive vectors to positive vectors, i.e. such that $f(V^+)\subseteq W^+$. Such operators are called positive operators and constitute a field of mathematical research in their own right [17, 18]. The collection of positive operators between two regular ordered Banach spaces clearly does not form a vector space, since it is not closed under scalar multiplication by negative reals. We therefore consider the span of this collection, that is to say the operators which can be expressed as the difference between two positive operators, i.e. $f=g-h$ with $g,h$ positive. Such operators are called regular operators, and we define the category $\mathbf{RoBan}$ as the category whose objects are regular ordered Banach spaces and whose morphisms are regular operators. Regular operators have the following important properties.

###### Proposition 1.

Regular operators on regular ordered Banach spaces are (norm) bounded.

###### Theorem 2 ([16]).

If $V,W$ are regular ordered Banach spaces and the set of regular operators from $V$ to $W$ is equipped with the obvious linear structure, pointwise order and the regular norm

$$\|f\|_r=\inf\{\|g\|:-g\le f\le g\}$$

where $\|\cdot\|$ is the usual operator norm, then this set is a regular ordered Banach space.
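To make the regular norm concrete, consider the finite-dimensional case of matrices acting on column vectors of point masses with the total-variation ($\ell_1$) norm and the entrywise order. In that setting (an assumption of this sketch, not a statement from the paper) every matrix $f$ is regular, its entrywise absolute value $|f|$ is the least $g$ with $-g\le f\le g$, and the operator norm is the largest absolute column sum. All function names are illustrative.

```python
def positive_part(f):
    """Entrywise positive part of a matrix: a positive operator."""
    return [[max(a, 0) for a in row] for row in f]

def negative_part(f):
    """Entrywise negative part of a matrix: also a positive operator."""
    return [[max(-a, 0) for a in row] for row in f]

def abs_matrix(f):
    """Entrywise absolute value |f| = f+ + f-, which satisfies -|f| <= f <= |f|."""
    return [[abs(a) for a in row] for row in f]

def l1_operator_norm(f):
    """Operator norm on (R^n, l1) -> (R^m, l1): the largest absolute column sum."""
    return max(sum(abs(f[i][j]) for i in range(len(f)))
               for j in range(len(f[0])))

f = [[1, -2],
     [-3, 4]]
f_plus, f_minus = positive_part(f), negative_part(f)   # f = f_plus - f_minus
regular_norm = l1_operator_norm(abs_matrix(f))         # infimum attained at g = |f|
```

In this special case the regular norm coincides with the operator norm, because the $\ell_1$ operator norm only depends on the absolute values of the entries; this need not hold for other norms.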

This result justifies the following notation: we will denote the regular ordered Banach space of regular operators between the regular ordered Banach spaces $V$ and $W$ by $[V,W]$.

#### II-B3 Banach lattices

We now describe a particularly important class of regular ordered Banach spaces: the class of Banach lattices. Although this class of objects lacks the categorical closure properties that we seek (see § II-D), most of the objects we will be dealing with are Banach lattices.

An ordered vector space $V$ is a Riesz space if its partial order is a lattice. This allows the definition of the positive and negative part of a vector $v\in V$ as $v^+=v\vee0$ and $v^-=(-v)\vee0$, and its modulus as $|v|=v\vee(-v)$. Note that $v=v^+-v^-$, with $v^+,v^-$ positive, and the positive cone of a Riesz space is thus generating. A Riesz space $V$ is order complete or Dedekind-complete (resp. $\sigma$-order complete or $\sigma$-Dedekind complete) if every non-empty (resp. non-empty countable) subset of $V$ which is order bounded has a supremum. (Order-completeness was called conditional completeness in [5].) A normed Riesz space is a Riesz space equipped with a lattice norm, i.e. a norm satisfying axiom R1 above. A normed Riesz space is called a Banach lattice if it is (norm-) complete. As stated, Banach lattices form a special class of regular ordered Banach spaces:

###### Proposition 3.

Banach lattices are regular.

###### Example 4.

Given a measurable space $(X,\mathcal{F})$, the space $\mathcal{M}X$ can be shown [7, Th 10.56] to be a Banach lattice. The Banach space structure was described above and the lattice structure is given by

$$(\mu\vee\nu)(A)=\sup\{\mu(B)+\nu(A\setminus B)\mid B\text{ measurable},\ B\subseteq A\}$$

and dually for meets. The Hahn-Jordan decomposition theorem defines the positive and negative part of a measure in the Banach lattice $\mathcal{M}X$.
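On a finite set the supremum in the formula for $\mu\vee\nu$ can be computed by brute force, and it agrees with taking the larger of the two point masses at each point. The sketch below assumes signed measures given by dictionaries of point masses; all names are illustrative.

```python
from itertools import combinations

def subsets(points):
    """All subsets of a finite collection of points."""
    pts = list(points)
    return [set(c) for r in range(len(pts) + 1) for c in combinations(pts, r)]

def measure(masses, A):
    """Value on the set A of the signed measure with the given point masses."""
    return sum(masses[x] for x in A)

def join(mu, nu, A):
    """(mu v nu)(A) = sup { mu(B) + nu(A minus B) : B a subset of A }."""
    return max(measure(mu, B) + measure(nu, A - B) for B in subsets(A))

mu = {0: 2, 1: -1, 2: 3}
nu = {0: 1, 1: 4, 2: -2}
A = {0, 1, 2}
zero = {x: 0 for x in A}
neg_mu = {x: -m for x, m in mu.items()}

positive = join(mu, zero, A)        # mu^+(A), the positive part of mu
negative = join(neg_mu, zero, A)    # mu^-(A), the negative part of mu
total_variation = positive + negative
```

Here `positive` and `negative` give the Hahn-Jordan decomposition of `mu` evaluated on the whole space, and their sum is the total variation norm of `mu`.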

###### Example 5.

Given a measured space $(X,\mathcal{F},\mu)$ and $1\le p\le\infty$, the Lebesgue space $L_p(X,\mu)$ is a Banach lattice with the pointwise order. In particular, for any $f\in L_1(X,\mu)$, the positive and negative parts $f^+$ and $f^-$ of a function used in the definition of the Lebesgue integral define the positive-negative decomposition of $f$ in the Banach lattice $L_1(X,\mu)$. We will say that $p,q$ are Hölder conjugate if any of the following conditions holds: (i) $1<p,q<\infty$ and $\frac{1}{p}+\frac{1}{q}=1$, or (ii) $p=1$ and $q=\infty$, or (iii) $p=\infty$ and $q=1$.

The examples of Banach lattices described above are instances of an even better behaved class of objects called abstract Lebesgue spaces or AL spaces. They are defined by the following property of the norm: a Banach lattice $V$ is an AL space if for all $u,v\in V^+$

$$\|u+v\|=\|u\|+\|v\|\qquad(\mathrm{AL})$$

Not surprisingly, the Lebesgue spaces $L_1(X,\mu)$ are examples of AL spaces, as are the Banach lattices $\mathcal{M}X$.

###### Theorem 6 ([18], Sec. 4.1).

AL spaces are order-complete.

#### II-B4 Bands

The order structure of Riesz spaces gives rise to classes of subspaces which are far richer than the traditional linear subspaces. An ideal of a Riesz space $V$ is a linear subspace $U\subseteq V$ with the property that if $|u|\le|v|$ and $v\in U$ then $u\in U$. An ideal $U$ is called a band when for every subset $D\subseteq U$ if $\bigvee D$ exists in $V$, then it also belongs to $U$. Every band in a Banach lattice is itself a Banach lattice. Of particular importance in what follows will be the principal band generated by an element $v\in V$, which we denote $V_v$ and can be described explicitly by

$$V_v=\{w\in V\mid(|w|\wedge n|v|)\uparrow|w|\}$$

###### Example 7.

Given a measure $\mu\in\mathcal{M}X$, the band $(\mathcal{M}X)_\mu$ generated by $\mu$ is the set of signed measures of bounded variation which are absolutely continuous w.r.t. $\mu$ [7, Th. 10.61]. The ordered version of the Radon-Nikodym theorem states that $(\mathcal{M}X)_\mu\cong L_1(X,\mu)$ as Banach lattices [7, Th. 13.19].

#### II-B5 Köthe duals

There are two modes of ‘convergence’ in an ordered Banach space: order convergence and norm convergence. The latter is well-known, the former less so. Let $D$ be a directed set, and let $\{v_\alpha\}_{\alpha\in D}$ be a net in an ordered Banach space $V$. We say that $\{v_\alpha\}$ converges in order to $v$ if there exists a decreasing net $\{u_\alpha\}_{\alpha\in D}$ with $\bigwedge_\alpha u_\alpha=0$, notation $u_\alpha\downarrow0$, such that

$$-u_\alpha\le v_\alpha-v\le u_\alpha\quad\text{for all }\alpha\in D$$

If the directed set is $\mathbb{N}$ we get the notion of order-convergent sequence. Order and norm convergence of sequences are independent concepts, i.e. neither implies the other (see [17, Ex. 15.2] for two counter-examples). However if a sequence converges both in order and in norm then the limits are the same (see [17, Th. 15.4]). Moreover, for monotone sequences norm convergence implies order convergence [17, Th. 15.3].

It is well known that bounded operators are continuous, i.e. preserve norm-converging sequences. The corresponding order-convergence concept is defined as follows: an operator $f:V\to W$ between Riesz spaces is said to be $\sigma$-order continuous if whenever a sequence $(v_n)$ converges in order to $v$, the sequence $(f(v_n))$ converges in order to $f(v)$. (Equivalently if $v_n\uparrow v$, i.e. $(v_n)$ is an increasing sequence with supremum $v$, implies $f(v_n)\uparrow f(v)$. Note the similarity with Scott-continuity, the only difference being the condition that sequences must be order-bounded.) We can thus consider two types of dual spaces on an ordered Banach space $V$: on the one hand we can consider the norm-dual:

$$V^*=\{f:V\to\mathbb{R}:f\text{ is norm-continuous}\}$$

and the $\sigma$-order-dual:

$$V^\sigma=\{f:V\to\mathbb{R}:f\text{ is }\sigma\text{-order continuous and regular}\}$$

The latter is also known as the Köthe dual of $V$ [19, 17].

###### Theorem 8.

The Köthe dual of a regular ordered Banach space is an order-complete Banach lattice.

###### Example 9.

It is shown in e.g. [17, 20] that

$$L_p(X,\mu)^\sigma=L_q(X,\mu)\qquad(2)$$

for any Hölder conjugate pair $1\le p,q\le\infty$. In particular the spaces $L_1(X,\mu)$ and $L_\infty(X,\mu)$ are Köthe dual of each other. Note that they are not ordinary duals.
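The duality (2) is realised by the integration pairing $(f,g)\mapsto\int fg\,d\mu$, which Hölder's inequality bounds by $\|f\|_p\|g\|_q$. The following sketch checks this bound numerically on a three-point measured space; in finite dimensions the Köthe dual and the norm dual coincide, so the example only illustrates the pairing and not the distinction between the two duals. All names and numbers are illustrative.

```python
INF = float("inf")

def norm_p(f, mu, p):
    """L_p norm of f on a finite measured space with point masses mu."""
    if p == INF:
        # Essential supremum: points of mass zero are ignored.
        return max(abs(fx) for fx, m in zip(f, mu) if m > 0)
    return sum(abs(fx) ** p * m for fx, m in zip(f, mu)) ** (1.0 / p)

def pairing(f, g, mu):
    """The functional on L_p induced by g: the integral of f * g against mu."""
    return sum(fx * gx * m for fx, gx, m in zip(f, g, mu))

def holder_holds(f, g, mu, p, q, tol=1e-12):
    """Check |<f, g>| <= ||f||_p * ||g||_q for a Hölder conjugate pair (p, q)."""
    return abs(pairing(f, g, mu)) <= norm_p(f, mu, p) * norm_p(g, mu, q) + tol

mu = [0.5, 0.25, 0.25]
f = [1.0, -2.0, 3.0]
g = [2.0, -0.5, 1.0]
checks = [holder_holds(f, g, mu, p, q)
          for p, q in [(1, INF), (2, 2), (3, 1.5), (INF, 1)]]
```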

#### II-B6 Categorical connections

We conclude this section by a summary of some results from [14] which provide a categorical connection between most of the topics covered so far.

The category is the category whose objects are pairs where is standard Borel spaces [12] (in fact any class of measurable spaces for which disintegrations exist will do) and . A morphism between and is a measure kernel such that (where is defined in (1)), in which case the morphism is denoted as well. As was shown in [14], any two morphisms which disagree only on a null set can be identified, and the morphisms of thus become equivalence classes of almost everywhere equal measure kernels (see [14] for the technical details of this construction).

Now, we define some functors. First, as was shown in [14], the Bayesian inversion operation described in § II-A3 defines a functor which leaves objects unchanged and sends a morphism to its Bayesian inverse (we drop the subscript of because it is made explicit from the typing). Note that [14]. We also define the functor which sends a regular ordered Banach space to its Köthe dual, and a regular operator to its adjoint defined in the usual way. Note that just as taking the Köthe dual gives an order-complete space, the adjoint of a regular operator is an order-continuous regular operator [17, Ch. 26].

Connecting the categories, we define for each the functor which sends a -object to the Lebesgue space and a -arrow to the operator , . We also define the functor which sends an object to the band and a morphism to the operator , .

The functors and of type are related via natural transformations which play a major role in measure theory [14]:

• acts at by sending a measure to its Radon-Nikodym derivative .

• acts at by sending an -map to its Measure Representation .

• acts at by sending a measure to its Functional Representation .

• acts at by sending an -functional to its Riesz Representation .

The natural transformations and are inverse of each other, as are and , proving natural isomorphisms between the three functors. We summarize these relationships in the following diagram:

 (3)

### II-C Tensor products of ordered Banach spaces

We start by describing the tensor product of vector spaces from the perspective of computer science. We will then discuss how the tensor product can be normed and ordered.

#### II-C1 Introduction to the tensor product

As was already highlighted in [5] in the case of probabilistic programming, and subsequently in the development of semantics for quantum programming languages (e.g. [21]), it may be desirable to interpret programs as linear operators in a category of vector spaces. Indeed, this is precisely what this paper advocates for probabilistic programming languages. However, a difficulty quickly emerges if one wants to include higher-order features. Consider a map in two arguments $f:U\times V\to W$. The most basic facility provided by higher-order reasoning is the ability to curry such a map and define the two curried maps

$$\hat{f}:U\to W^V\quad\text{and}\quad\tilde{f}:V\to W^U$$

by fixing one argument or the other. Since we want both curried maps $\hat{f}$ and $\tilde{f}$ to be linear, it is easy to see that $f$ must be linear in each argument separately, in particular

$$f(\lambda u,v)=\lambda f(u,v)\quad\text{and}\quad f(u,\lambda v)=\lambda f(u,v)$$

Such a map is referred to as a bilinear map: it is linear in each argument separately. However, being bilinear is incompatible with being linear: by definition of the product linear structure, if $f$ were also linear we would have

$$\lambda f(u,v)=f(\lambda(u,v))=f(\lambda u,\lambda v)=\lambda f(u,\lambda v)=\lambda^2f(u,v)$$

which is clearly a contradiction if $\lambda\notin\{0,1\}$ and $f(u,v)\neq0$. Thus $f$ is not a valid morphism if we want our semantic universe to consist of linear maps between vector spaces. Fortunately, for any pair of vector spaces $U,V$ there exists a special object, the tensor product $U\otimes V$, which linearizes bilinear maps, i.e. such that any bilinear map $U\times V\to W$ corresponds to a unique linear map $U\otimes V\to W$ (and vice-versa). This can be phrased in terms of a universal property: there exists a universal bilinear map $\otimes:U\times V\to U\otimes V$, such that for any bilinear map $f:U\times V\to W$ there exists a unique linear map $\bar{f}:U\otimes V\to W$ making the corresponding triangle commute:

$$\bar{f}\circ\otimes=f\qquad(4)$$

The tensor product can be built explicitly as follows: it is the free vector space over $U\times V$ quotiented by the following identities:

$$(u+u',v)=(u,v)+(u',v),\quad(u,v+v')=(u,v)+(u,v'),\quad(\lambda u,v)=(u,\lambda v)=\lambda(u,v)\qquad(5)$$

It is not too hard to see that the last identity is precisely what is needed to fix the contradiction described above and reconcile currying with linearity. By definition, an element $x\in U\otimes V$ will be a (finite) linear combination of equivalence classes of the generators, denoted $u\otimes v$, under the identities (5), formally $x=\sum_{i=1}^n u_i\otimes v_i$.
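In coordinates, the elementary tensor $u\otimes v$ of two finite-dimensional vectors can be represented by their outer product, and a bilinear map given by a matrix of coefficients becomes a linear map on such matrices. The sketch below assumes this coordinate representation; the names `bilinear`, `tensor` and `linearized` and the matrix `A` are illustrative.

```python
def bilinear(A, u, v):
    """f(u, v) = sum_ij A[i][j] * u[i] * v[j]: linear in u and in v separately."""
    return sum(A[i][j] * u[i] * v[j]
               for i in range(len(u)) for j in range(len(v)))

def tensor(u, v):
    """Coordinates of the elementary tensor u (x) v, i.e. the outer product."""
    return [[ui * vj for vj in v] for ui in u]

def linearized(A, x):
    """The linear map on the tensor product induced by f, in coordinates."""
    return sum(A[i][j] * x[i][j]
               for i in range(len(x)) for j in range(len(x[0])))

def scale(lam, w):
    """Scalar multiplication of a vector."""
    return [lam * a for a in w]

A = [[1, 2], [3, 4]]
u, v = [1, -1], [2, 5]
lam = 3

# f is not linear on the product space: scaling the pair scales f by lam squared.
scaled_pair = bilinear(A, scale(lam, u), scale(lam, v))

# The last identity of (5): (lam u) (x) v and u (x) (lam v) are the same tensor.
same_tensor = tensor(scale(lam, u), v) == tensor(u, scale(lam, v))
```

The induced map `linearized` agrees with `bilinear` on elementary tensors, which is the commuting condition of the universal property in coordinates.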

#### II-C2 Tensor product of Banach spaces

Suppose now that both $U$ and $V$ are Banach spaces, in particular that they carry a norm. How do we define a norm on $U\otimes V$, and how do we ensure that the space is complete for this norm? For Hilbert spaces there is a straightforward construction, but since we shall be dealing with Banach spaces which are not Hilbert spaces, we will require the much more subtle theory of tensor products of Banach spaces originally developed by Grothendieck [22]. We refer to [23] for a good introduction.

The initial difficulty with the construction of a norm is that by definition of the tensor product, each vector has many representations. For example, the vector $u\otimes v$, i.e. the equivalence class of the pair $(u,v)$ under the equations (5), can also be expressed as $(\lambda u)\otimes(\lambda^{-1}v)$ for any $\lambda\neq0$, and these representations are built from vectors with very different norms. Assuming that we want the norm of the tensor to be defined from the norms of its components, which representation do we choose? There is no unique solution to this question, but Grothendieck proposed one extremal solution by defining for any $x\in U\otimes V$

$$\|x\|_\pi=\inf\left\{\sum_{i=1}^n\|u_i\|\|v_i\|:x=\sum_{i=1}^nu_i\otimes v_i\right\}\qquad(6)$$

This definition of $\|\cdot\|_\pi$ defines a norm [23, Prop 2.1] which is called the projective norm. However, the space $U\otimes V$ equipped with the projective norm is in general not complete. One therefore defines the projective tensor product of two Banach spaces as the completion of $U\otimes V$ under the projective norm, i.e. the space of equivalence classes of Cauchy sequences in $U\otimes V$ converging to the same point. This space will be denoted $U\hat{\otimes}_\pi V$ and one can describe the projective norm of elements in this space as follows:

$$\|x\|_\pi=\inf\left\{\sum_{i=1}^\infty\|u_i\|\|v_i\|:x=\sum_{i=1}^\infty u_i\otimes v_i,\ \sum_{i=1}^\infty\|u_i\|\|v_i\|<\infty\right\}$$

In § II-C1 we saw how the tensor product can be used as a way to linearize bilinear maps. This property extends naturally to the normed case, and it can be shown that the projective tensor product linearizes bounded bilinear maps [23, Th 2.9] in the sense that there exists a universal bounded bilinear map $U\times V\to U\hat{\otimes}_\pi V$ satisfying the universal property of (4) w.r.t. bounded bilinear maps.
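The infimum in (6) ranges over all ways of writing a tensor as a sum of elementary tensors, and each representation yields an upper bound on the projective norm. The sketch below compares two representations of the same tensor in $\mathbb{R}^2\otimes\mathbb{R}^2$, using the Euclidean norm on both factors (an assumption made for this illustration only); the names `cost` and `as_matrix` are hypothetical.

```python
import math

def euclid(w):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(a * a for a in w))

def as_matrix(rep, m, n):
    """Coordinates of sum_i u_i (x) v_i as an m-by-n matrix of outer products."""
    return [[sum(u[i] * v[j] for u, v in rep) for j in range(n)]
            for i in range(m)]

def cost(rep):
    """The quantity sum_i ||u_i|| * ||v_i|| minimised in (6)."""
    return sum(euclid(u) * euclid(v) for u, v in rep)

# Two representations of the same tensor e1 (x) e1 + e1 (x) e2.
rep_a = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [0.0, 1.0])]
rep_b = [([1.0, 0.0], [1.0, 1.0])]

bound_a = cost(rep_a)   # upper bound on the projective norm from rep_a
bound_b = cost(rep_b)   # a strictly better upper bound from rep_b
```

Both lists describe the same element of the tensor product, yet the second gives a smaller bound, which is why the definition takes an infimum over representations.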

The projective tensor product of two Banach spaces is in general fairly inscrutable. However, one can explicitly describe projective tensor products involving important objects for our semantics. When one component is an $L_1$-space we have:

###### Theorem 10 (Radon-Nikodym and [23] p. 43).

For finite measures on measurable spaces respectively, .

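The operator taking the product of measures is bilinear. Therefore there exists a unique map $\mathcal{M}X\hat{\otimes}_\pi\mathcal{M}Y\to\mathcal{M}(X\times Y)$ mapping any tensor $\mu\otimes\nu$ to the product measure. In this sense, $\mathcal{M}X\hat{\otimes}_\pi\mathcal{M}Y$ is the subspace of $\mathcal{M}(X\times Y)$ which is generated by taking linear combinations of product measures, and then closing under the projective norm.

###### Theorem 11.

The projective tensor product $\mathcal{M}X\hat{\otimes}_\pi\mathcal{M}Y$ is isometrically embedded in $\mathcal{M}(X\times Y)$.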

#### II-C3 Tensor product of ordered Banach spaces

We conclude this brief description of tensor products by examining the case of interest to us, namely regular ordered Banach spaces. In the most important examples the construction is isomorphic as Banach spaces to the unordered case, and we will therefore not dwell too long on the theory of tensor product of ordered Banach spaces developed in [24, 25, 26]. The main idea of the definition is to reflect the central role of positive vectors in the theory of regular ordered Banach spaces, and in particular the fact that the positive cone is generating and determines the norm (axiom R1, R2 above). The same should hold for any ordered tensor product.

Given two regular ordered Banach spaces $U,V$, their tensor product $U\otimes V$ is equipped with the positive projective norm defined as

$$\|x\|_{|\pi|}=\inf\left\{\sum_{i=1}^n\|u_i\|\|v_i\|:u_i\in U^+,\ v_i\in V^+,\ -\sum_{i=1}^nu_i\otimes v_i\le x\le\sum_{i=1}^nu_i\otimes v_i\right\}$$

Note the similarity with (6), and the role played by positive vectors in this definition. As in the unordered case $U\otimes V$ is not complete for the positive projective norm, and we must therefore take its completion which we call the positive projective tensor of $U$ and $V$ and denote by $U\hat{\otimes}_{|\pi|}V$.

Since -spaces and -spaces are examples of AL-spaces, the following result shows that we can in practice often ignore the subtleties of the positive projective tensor product and rely on the descriptions of the ordinary projective tensor products.

###### Theorem 12 ([25], Th. 2B).

If is an -space and is any regular ordered space, then and are isomorphic as Banach spaces.

### II-D The closed monoidal structure of RoBan

#### II-D1 Tensor product of regular operators

In § II-C1 and § II-C2 we saw how the tensor and projective tensor products can be used to linearize bilinear and bounded bilinear maps respectively. The positive projective tensor product fulfils the same role for positive (and thus bounded by Prop. 1) bilinear maps: there exists a universal positive bilinear map $U\times V\to U\hat{\otimes}_{|\pi|}V$ satisfying the universal property of (4) w.r.t. positive bilinear maps [26, 2.7]. This universal property of tensor products provides a definition of $\hat{\otimes}_{|\pi|}$ as a bifunctor on $\mathbf{RoBan}$. Let $f:U\to U'$ and $g:V\to V'$ be positive operators, then the map

$$U\times V\to U'\hat{\otimes}_{|\pi|}V',\quad(u,v)\mapsto f(u)\otimes g(v)$$

is positive and bilinear, and thus there exists a unique positive operator $U\hat{\otimes}_{|\pi|}V\to U'\hat{\otimes}_{|\pi|}V'$ which is denoted $f\hat{\otimes}_{|\pi|}g$. This provides the definition of the bifunctor on morphisms.
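For finite spaces, positive operators between spaces of measures are matrices with non-negative entries, and the tensor product of two such operators is their Kronecker product. The sketch below assumes this finite setting and checks the defining property on a product measure; the names `apply`, `product` and `kron` and the data are illustrative.

```python
from fractions import Fraction as F

def apply(op, mu):
    """Push the measure mu (a list of point masses) through the matrix op,
    where op[x][y] is the mass that the kernel at x assigns to y."""
    return [sum(op[x][y] * mu[x] for x in range(len(mu)))
            for y in range(len(op[0]))]

def product(mu, nu):
    """Product measure on X x Y, flattened in row-major order."""
    return [m * n for m in mu for n in nu]

def kron(f, g):
    """Matrix of the tensor of f and g on the flattened product spaces."""
    return [[f[x][y] * g[x2][y2]
             for y in range(len(f[0])) for y2 in range(len(g[0]))]
            for x in range(len(f)) for x2 in range(len(g))]

f = [[F(1, 2), F(1, 2)], [F(0), F(1)]]
g = [[F(1, 3), F(2, 3)], [F(1), F(0)]]
mu = [F(1, 4), F(3, 4)]
nu = [F(1, 2), F(1, 2)]

lhs = apply(kron(f, g), product(mu, nu))        # (f (x) g)(mu (x) nu)
rhs = product(apply(f, mu), apply(g, nu))       # f(mu) (x) g(nu)
```

The two sides agree, which is the finite-dimensional form of the statement that the induced operator sends $u\otimes v$ to $f(u)\otimes g(v)$.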

#### II-D2 The closed monoidal structure

As we saw in Th. 2, the category has internal homs, and these interact correctly with positive projective tensor products.

###### Theorem 13 ([16]).

For every regular ordered Banach space $V$, the tensoring and homming operations $-\hat{\otimes}_{|\pi|}V$ and $[V,-]$ define functors $\mathbf{RoBan}\to\mathbf{RoBan}$ such that

$$-\hat{\otimes}_{|\pi|}V\dashv[V,-]$$

The positive projective tensor defines a symmetric monoidal structure on $\mathbf{RoBan}$ with $\mathbb{R}$ as its unit (since $\mathbb{R}\otimes V\cong V$ at the level of the underlying vector spaces) and the obvious isomorphisms inherited from the isomorphism between the tensor product of the underlying vector spaces. The category $\mathbf{RoBan}$ is thus symmetric monoidal closed.

## III A higher-order language with conditioning

### III-A A type system

We start by defining a type system for our language. Our aims are to (a) have enough types to write some realistic programs, for example including multivariate normal or chi-squared distributions, (b) have higher-order types, (c) provide special types for Bayesian learning: Bayesian types.

Our type grammar is given as follows:

$$T::=m\mid\mathtt{int}^n\mid\mathtt{real}^n\mid\mathtt{PosDef}(n)\mid(T,\mu)\mid T\otimes T\mid T\to T\mid\mathsf{M}T\qquad(7)$$

where and . We will refer to , , as ground types. As their name suggest they are to be regarded as the types of (possibly random) elements of finite sets, vectors of integers, vectors of reals and positive semi-definite matrices, that is to say covariance matrices. We will write and as and . This is by no means an exhaustive set of ground types, but sufficiently rich to consider some realistic probabilistic programs. The type will be referred to as the unit type and denoted and the type will be referred to as the boolean type and denoted .

The type constructors are the following. First, given a term $\mu$ of type $\mathsf{M}T$, we can build the pointed type $(T,\mu)$. We will call these types Bayesian types because $\mu$ will be interpreted as a prior. Bayesian types will support conditioning and thus Bayesian learning. As we shall see, our Bayesian types also fulfil a role in the semantics of variable assignment. As is the tacit practice in Anglican, we will consider that assigning a (possibly random) value to a variable is equivalent to assigning a prior to the type of this variable. For example the program which assigns a constant real value to a variable can be understood as placing a (deterministic) prior on the reals, namely the Dirac measure at that value. Similarly, the program which assigns to a variable a value randomly sampled from the normal distribution with mean 0 and standard deviation 1 can be understood as setting this normal distribution as the prior on the reals. In a slogan:

Bayesian types = Assigned types

Note however that this slogan is only valid for assignments without free variables: a prior cannot be parametric in some variables, since it represents definite information. This caveat will be reflected in the type system.

We then have two binary type constructors, tensor types and function types, which together will support higher-order reasoning. Finally, we have a unary type constructor $\mathsf{M}$ used to define higher-order probabilities.

We isolate the following two sub-grammars of types whose semantic properties will be essential to the typing of certain operations. First we define order-complete types as the types generated by the grammar

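$$\begin{aligned} S &::= G \mid (G,\mu) \mid (G,\mu) \otimes (G,\mu) \qquad G \text{ in ground types} && (8)\\ T &::= S \mid S \to S \mid \mathsf{M}S && (9) \end{aligned}$$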

Second, we define measure types as the types generated by all the constructors of grammar (7) except the function-type constructor.
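As an informal illustration, the grammars (7), (8) and (9) can be sketched as a small Python datatype together with predicates recognising measure types and order-complete types. The class names (`Ground`, `Bayes`, `Tensor`, `Fun`, `M`) and the representation of the prior as a plain string are assumptions made for this sketch only; they are not part of the language defined here.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Ground:
    name: str  # hypothetical label, e.g. "real", "int^2", "PosDef(3)"


@dataclass(frozen=True)
class Bayes:
    base: "Type"
    prior: str  # placeholder for the closed term mu (assumption: a string)


@dataclass(frozen=True)
class Tensor:
    left: "Type"
    right: "Type"


@dataclass(frozen=True)
class Fun:
    dom: "Type"
    cod: "Type"


@dataclass(frozen=True)
class M:
    arg: "Type"


Type = Union[Ground, Bayes, Tensor, Fun, M]


def is_measure_type(t) -> bool:
    """True if t uses every constructor of (7) except the function type."""
    if isinstance(t, Ground):
        return True
    if isinstance(t, Bayes):
        return is_measure_type(t.base)
    if isinstance(t, Tensor):
        return is_measure_type(t.left) and is_measure_type(t.right)
    if isinstance(t, M):
        return is_measure_type(t.arg)
    return False  # function types are excluded


def _is_bayes_ground(t) -> bool:
    return isinstance(t, Bayes) and isinstance(t.base, Ground)


def is_s(t) -> bool:
    """Grammar (8): S ::= G | (G, mu) | (G, mu) (x) (G, mu)."""
    if isinstance(t, Ground) or _is_bayes_ground(t):
        return True
    return (isinstance(t, Tensor)
            and _is_bayes_ground(t.left)
            and _is_bayes_ground(t.right))


def is_order_complete(t) -> bool:
    """Grammar (9): T ::= S | S -> S | M S."""
    if is_s(t):
        return True
    if isinstance(t, Fun):
        return is_s(t.dom) and is_s(t.cod)
    if isinstance(t, M):
        return is_s(t.arg)
    return False
```

Under this reading, `Fun(Ground("real"), Ground("real"))` is order-complete but not a measure type, whereas a tensor of two plain ground types is a measure type that falls outside grammar (8).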

###### Remark 14.

We could easily add product types to our type system since the category in which we interpret types () is complete, but we feel that this would distract from the central role played by the tensor product. It would also introduce the possibility of copying, which from the perspective of randomness as a resource is problematic. This is why we have ‘hard-coded’ the products which we do need (tuples of integers and reals) as ground types.

#### Subtyping relation

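We will need to formalise the fact that a Bayesian type $(T,\mu)$ is a subtype of the type $T$. For this we introduce a subtyping relation denoted $<:$ freely generated by the rules

$$\frac{}{(T,\mu) <: T} \qquad \frac{S <: S' \quad T <: T'}{S \otimes T <: S' \otimes T'} \qquad \frac{S' <: S \quad T <: T'}{S \to T <: S' \to T'}$$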

As will become clear when we define the semantics of types, we can also use the subtyping relation to add information about the absolute continuity of one built-in measure w.r.t. another in the type system. For example, since a beta distribution is absolutely continuous w.r.t. a normal distribution, if is a program constructing a beta distribution and is a program constructing a normal distribution, we could add the rule .
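The three generating rules can be turned into a simple recursive check. The sketch below encodes types as nested tuples (a hypothetical encoding chosen for brevity) and additionally assumes that the relation is reflexive and transitive, which the text does not state explicitly. It is meant as a sound illustration of the rules, not as a complete decision procedure.

```python
# Hypothetical tuple encoding of types:
#   ("ground", name), ("bayes", T, mu), ("tensor", S, T), ("fun", S, T)

def subtype(a, b) -> bool:
    """Check a <: b using the three rules, plus assumed reflexivity/transitivity."""
    if a == b:  # assumption: the relation is reflexive
        return True
    if a[0] == "tensor" and b[0] == "tensor":
        # tensor rule: covariant in both components
        if subtype(a[1], b[1]) and subtype(a[2], b[2]):
            return True
    if a[0] == "fun" and b[0] == "fun":
        # function rule: contravariant in the domain, covariant in the codomain
        if subtype(b[1], a[1]) and subtype(a[2], b[2]):
            return True
    if a[0] == "bayes":
        # (T, mu) <: T, followed by assumed transitivity
        return subtype(a[1], b)
    return False


real = ("ground", "real")
bayes_real = ("bayes", real, "mu")
```

For instance, `subtype(("fun", real, bayes_real), ("fun", bayes_real, real))` holds, while the converse does not, reflecting the contravariance of the function rule.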

#### Contexts

Contexts are maps – the free algebra of all types generated by (7) – which send cofinitely many integers to the unit type . We will write for the set and use the traditional notation to denote the context mapping to and all to . To each context we can associate the finite tensor type where . When a context appears to the right of a turnstile in the typing rules which follow, we will implicitly perform this conversion from context to type.

Our contexts are a dynamic version of the static context of [5] which consists of a constant map on to a single type. They are in some respects similar to the heap models of separation logic, and for notational clarity we will require similar operations on contexts as on heaps: a notion of compatibility, of union and of difference. Given two contexts we will say that they are compatible if for all , and we will then write . For any two compatible contexts we define the union context as the union of their graphs, which is a function by the compatibility assumption. We define the difference context as the map sending if and to otherwise. In particular if the supports are disjoint.
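The heap-like operations on contexts can be sketched with finite dictionaries that store only the entries outside the unit type. Several symbols in the definitions above are not recoverable from the text, so the sketch rests on two assumptions: compatibility means that the two contexts agree on the intersection of their supports, and the difference keeps the entries of the first context whose index is not in the support of the second. The names `UNIT`, `support`, `compatible`, `union` and `difference` are hypothetical.

```python
UNIT = "unit"  # hypothetical name for the unit type


def support(ctx: dict) -> set:
    """Indices that the context sends to something other than the unit type."""
    return {i for i, t in ctx.items() if t != UNIT}


def compatible(g: dict, d: dict) -> bool:
    """Assumed reading: the contexts agree wherever both supports overlap."""
    return all(g[i] == d[i] for i in support(g) & support(d))


def union(g: dict, d: dict) -> dict:
    """Union of the graphs; a function only if the contexts are compatible."""
    if not compatible(g, d):
        raise ValueError("incompatible contexts")
    merged = {i: g[i] for i in support(g)}
    merged.update({i: d[i] for i in support(d)})
    return merged


def difference(g: dict, d: dict) -> dict:
    """Assumed reading: keep g's entries outside the support of d."""
    return {i: g[i] for i in support(g) - support(d)}
```

With this encoding, the difference of two contexts with disjoint supports returns the first context unchanged, which matches the remark closing the paragraph above.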

### III-B Syntax

We define an ML-like language allowing imperative features like variable assignments, conditionals and while loops within a functional language.

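#### III-B1 Expressions

$$\begin{array}{rll}
e ::= & n \in \mathbb{N}^k \mid r \in \mathbb{R}^k \mid m \in \mathtt{PosDef}(n) \mid & \text{Constants}\\
& \mathtt{op}(e,\ldots,e) \mid & \text{Built-in operations}\\
& x_i \mid \quad i \in \mathbb{N}, & \text{Variables}\\
& x_i := e \mid & \text{Assignment}\\
& e \, ; e \mid & \text{Sequential composition}\\
& \mathtt{let}\ x_i = e\ \mathtt{in}\ e \mid & \text{Sequencing}\\
& \mathtt{fn}\ x_i \, . \, e \mid & \lambda\text{-abstraction}\\
& e(e) \mid & \text{Function application}\\
& \mathtt{if}\ e\ \mathtt{then}\ e\ \mathtt{else}\ e \mid & \text{Conditional}\\
& \mathtt{while}\ e\ \mathtt{do}\ e \mid & \text{Iterations}\\
& \mathtt{sample}(e) \mid & \text{Sampling}\\
& \mathtt{sampler}(e) \mid & \text{Packages a program as a sampler}\\
& \mathtt{observe}(e) & \text{Conditioning}
\end{array}$$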

Every built-in operation must come equipped with a typing instruction, which we will write as an -tuple where the first components are ground types specifying the input types and the last component is a ground type or the type of measures over a ground type specifying the output type. For example the boolean connective or would come with typing , the sine function sin with typing and the function constructing a normal distribution would come with typing , where the first input is the mean, the second is the standard deviation (a 1-dimensional covariance matrix, i.e. a positive real) and the output is a measure over the reals.

#### III-B2 Well-typed expressions

The typing rules for our language are gathered in Fig. 1. We will discuss these rules in detail when we define the denotational semantics of our language in § IV, but we can already make some observations.

It is important to realize that memory-manipulating rules in effect have a sequent on the right of the turnstile, formally represented by an integer-indexed tensor product type (see § III-A), whilst the other rules just have a type. Syntactically and semantically however, we make no distinction between these two cases. A useful way to think about our system is as follows: a purely functional computation will consume a context and output a value of type . A program with imperative features modifying an internal store to a new store should be thought of as consuming a context and outputting the totality of its internal state .

The only way to explicitly create a Bayesian type is through a variable assignment without free variables: a prior must contain definite information, not information which is parametric in variables. Only ‘measure types’ can form Bayesian types.

The sequential composition rule looks daunting, but it is simply a version of the cut rule with a bit of bookkeeping to make sure contexts do not conflict with one another.

Note finally that our observe statement applies to a term of type ; intuitively, we observe a possibly random element of type . This is slightly different from the syntax of observe in Anglican, where a distribution is observed. Semantically, the difference disappears since a possibly random element is modelled by a distribution.

#### III-B3 A simple example

It is not hard (but notationally cumbersome) to type-check the following simple Gaussian inference program against the inference rules of Fig. 1.

let x=sample(normal(0,1)) in
observe(sample(normal(x,1)))

In the empty context the program above evaluates to a function of type

$$(\mathtt{real}, \mathtt{sample}(\mathtt{normal}(\mathtt{sample}(\mathtt{normal}(0,1)),1))) \to (\mathtt{real}, \mathtt{sample}(\mathtt{normal}(0,1))) \qquad (10)$$

which, as we will see in § IV, is what we want semantically.
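For this particular program the expected answer is available from the textbook conjugate-Gaussian computation: with prior $x \sim N(0,1)$ and observation $y \mid x \sim N(x,1)$, the posterior of $x$ given $y$ is normal with mean $y/2$ and variance $1/2$. The sketch below, which is independent of the operator semantics developed in this paper, checks the closed form against a numerical normalisation on a grid. The function names and grid parameters are hypothetical choices.

```python
import math


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of the normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def posterior_closed_form(y: float):
    """Posterior (mean, variance) of x given y, for x ~ N(0,1) and y | x ~ N(x,1)."""
    return y / 2.0, 0.5


def posterior_on_grid(y: float, lo: float = -10.0, hi: float = 10.0, n: int = 20001):
    """Normalise prior(x) * likelihood(y | x) on a uniform grid; return (mean, variance)."""
    step = (hi - lo) / (n - 1)
    xs = [lo + k * step for k in range(n)]
    weights = [normal_pdf(x, 0.0, 1.0) * normal_pdf(y, x, 1.0) for x in xs]
    total = sum(weights)
    mean = sum(x * w for x, w in zip(xs, weights)) / total
    var = sum((x - mean) ** 2 * w for x, w in zip(xs, weights)) / total
    return mean, var
```

Comparing `posterior_on_grid(2.0)` with `posterior_closed_form(2.0)` gives agreement up to discretisation error. The posterior has a density with respect to the prior, which is consistent with the codomain of (10) being the Bayesian type built on the prior.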

## IV Denotational semantics

As the reader will have guessed, we will now provide a denotational semantics for the language described in § III in the category of regular ordered Banach spaces.

### IV-A Semantics of types

For ground types we define

• where is equipped with the discrete σ-algebra. Note that , and thus , the unit of the positive projective tensor.

• , where is equipped with the discrete σ-algebra

• , where is equipped with its usual Borel σ-algebra

• , where is the space of positive semi-definite matrices equipped with the Borel σ-algebra inherited from

As expected, the tensor and function type constructors are interpreted by the monoidal closed structure of , i.e.

The higher-order probability type constructor is interpreted as follows. For any regular ordered Banach space we consider the underlying set together with the Borel σ-algebra induced by the norm. We then apply the functor to this measurable space. This construction is functorial and we overload to denote the resulting regular ordered Banach space by . Using this convenient notation we define

$$\llbracket \mathsf{M}T \rrbracket := \mathsf{M}\llbracket T \rrbracket$$

For Bayesian types note that the type system in Fig. 1 can only produce a Bayesian type if is a measure type and has no free variables, i.e. if is derivable. We will therefore only need to provide a semantics to Bayesian types of this shape. Our semantics of Bayesian types is in some respects similar to that of pointed types used in homotopy type theory [27]. Indeed, at the type-theoretic level they are defined identically as a type together with a term inhabiting this type. However, the ordered vector space structure allows us to provide a semantics which is much richer than a space with a distinguished point. Given a measure type and a sequent of the type , we will see in § IV-B that is interpreted as an operator , which is uniquely determined by . For notational clarity we will often simply write for the measure . We define the denotation of the Bayesian type as the principal band in (see § II-B4) generated by the measure (i.e.). Formally:

$$\llbracket (T,\mu) \rrbracket = \llbracket T \rrbracket_{\mu} \qquad (11)$$

For this semantics to be well-defined it is necessary that be a Riesz space, since bands are defined using the lattice structure. This is indeed the case:

###### Theorem 15.

The semantics of a measure type is a Banach lattice.
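On a finite set, where signed measures are simply vectors indexed by points, the principal band generated by a measure $\mu$ consists of the measures that vanish wherever $\mu$ vanishes, i.e. the measures absolutely continuous w.r.t. $\mu$. The following sketch illustrates (11) in this finite setting only. Measures are represented as dictionaries from points to masses, and the function names are hypothetical.

```python
def in_principal_band(nu: dict, mu: dict) -> bool:
    """True if nu vanishes at every point where mu vanishes (finite case only)."""
    return all(mu.get(x, 0.0) != 0.0 for x, mass in nu.items() if mass != 0.0)


def band_included(mu1: dict, mu2: dict) -> bool:
    """In the finite case, band(mu1) is contained in band(mu2) iff mu1 lies in band(mu2)."""
    return in_principal_band(mu1, mu2)


# A prior supported on {0, 1} inside the three-point set {0, 1, 2}.
prior = {0: 0.5, 1: 0.5, 2: 0.0}
```

Here `{0: 1.0}` and the signed measure `{0: -2.0, 1: 3.0}` both belong to the band generated by `prior`, while `{2: 1.0}` does not. Bands are linear subspaces, so they contain signed measures and not only probability measures.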

The function type constructor is the only operation in the type system which forces us to leave the category of Banach lattices and enter the much larger category . As shown in [18, Ex. 1.17], the space of regular operators between two Riesz spaces need not even be a lattice. The non-closure of Banach lattices under taking internal homs is one of the technical reasons for our use of ‘measure types’. (Footnote 4: Note that we could in principle extend (11) to all types by considering subsets generated by a single element which exist in all regular ordered Banach spaces, for example the closure of ideals.)

We introduced order-complete types in § III-A because of a ‘dual’ non-closure property: order-complete spaces are not closed under the positive projective tensor operation. As shown in [25, 4C] the product is not order-complete, even though is. Order-completeness will be important in the semantics of while loops.

###### Theorem 16.

The semantics of an order-complete type is an order-complete space.

#### Subtypes and contexts

The subtyping relation will simply be interpreted as subspace inclusion. For example the relation is interpreted as the inclusion of the principal band . A context will be interpreted as the positive projective tensor

and we put . A typing rule will be interpreted as a regular (in fact positive, see Th. 23) operator .

### IV-B Semantics of well-formed expressions

Let us now turn to the semantics of terms.

#### IV-B1 Constants

A constant whose ground type is interpreted as the space will be interpreted as the operator

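$$\llbracket c \rrbracket : \llbracket \emptyset \rrbracket = \mathbb{R} \longrightarrow \llbracket G \rrbracket = \mathsf{M}G, \qquad \lambda \mapsto \lambda\delta_c$$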
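For a finite ground type the operator above can be written out concretely, since a signed measure on a finite set is just a table of masses. The sketch below is an illustration under that finiteness assumption; `constant_operator` is a hypothetical name.

```python
def constant_operator(c, points):
    """Return the linear map lambda |-> lambda * delta_c on a finite set of points."""
    points = list(points)
    if c not in points:
        raise ValueError("the constant must be a point of the space")

    def op(lam: float) -> dict:
        # lam * delta_c: all the mass lam sits on the point c
        return {x: (lam if x == c else 0.0) for x in points}

    return op


# The constant 2 in a four-element finite type {0, 1, 2, 3}.
two = constant_operator(2, range(4))
```

Applying `two(1.0)` yields the Dirac measure at 2, and `two(3.0)` yields the same measure scaled by 3, which illustrates linearity in the scalar argument.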

#### IV-B2 Built-in operations

Recall that every built-in operation comes with typing information where each is of ground type and is either of ground type or of type . Each such operation is interpreted via a function , with or , as the unique regular operator which linearizes according to the universal property (4) of :

For example the boolean operator or of type would be interpreted, via the function implementing the boolean join, as the linearisation of (which is bilinear). Similarly, the operation of type building a normal distribution would be interpreted, via the obvious function , as the linearisation of . Note that if the inputs are deterministic, i.e. a tensor for a mean and a standard deviation (as would usually be the case), then outputs a Dirac delta over the distribution . Note how we interpret the deterministic construction of a distribution over differently from sampling an element of according to this distribution: the former is a distribution over distributions, the latter just a distribution.

#### IV-B3 Variables and assignments
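In the finite case the linearisation of a deterministic built-in operation amounts to pushing a product of measures forward along the underlying function. The sketch below illustrates this for the boolean or, assuming finite input spaces and a function returning points (not measures). The helper name `linearise` and the dictionary representation of measures are hypothetical.

```python
from itertools import product


def linearise(f, *spaces):
    """Multilinear extension of f to finite signed measures (dicts point -> mass)."""
    def op(*measures) -> dict:
        out = {}
        for pts in product(*spaces):
            # mass of the product measure at the tuple of points
            mass = 1.0
            for measure, p in zip(measures, pts):
                mass *= measure.get(p, 0.0)
            y = f(*pts)
            out[y] = out.get(y, 0.0) + mass  # push the mass forward along f
        return out
    return op


booleans = [False, True]
or_op = linearise(lambda a, b: a or b, booleans, booleans)
fair = {False: 0.5, True: 0.5}
```

Evaluating `or_op(fair, fair)` assigns mass 0.25 to `False` and 0.75 to `True`, and feeding Dirac measures recovers the ordinary truth table of the connective.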

A variable on its own acts like a variable declaration and introduces a context (see Fig. 1). Its semantics is simply given by the identity operator on the type of the variable, formally if then . In order to define the semantics of variable assignment we need the following result. (Footnote 5: As a consequence of this theorem, the semantics of all our types are Archimedean ordered vector spaces.)

###### Theorem 17.

The denotation of any type admits a strictly positive functional .

The strictly positive functional can be thought of as a generalisation to all types of the functional on measures which consists in evaluating the mass of the whole space, i.e. of . With this notion in place we can provide a semantics to assignments. Given a sequent , let be the strictly positive functional on constructed in Th. 17 and let us write as . We now define the multilinear map

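$$\llbracket \Gamma_1 \rrbracket \times \llbracket T \rrbracket \times \llbracket \Gamma_2 \rrbracket \longrightarrow \llbracket T \rrbracket$$

This defines the unique linearizing operator (in fact a nuclear operator [28])

$$\llbracket x_i := e \rrbracket : \llbracket \Gamma[i \mapsto T] \rrbracket \longrightarrow \llbracket T \rrbracket$$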

In the case where the context is empty, the premise of the typing rule for assignments is interpreted as an operator and we can therefore strengthen the definition of