
# Measurable Cones and Stable, Measurable Functions

We define a notion of stable and measurable map between cones endowed with measurability tests and show that it forms a cpo-enriched cartesian closed category. This category gives a denotational model of an extension of PCF supporting the main primitives of probabilistic functional programming, like continuous and discrete probabilistic distributions, sampling, conditioning and full recursion. We prove the soundness and adequacy of this model with respect to a call-by-name operational semantics and give some examples of its denotations.

## 1. Introduction

In the 1980s, researchers started to apply formal methods to the analysis and design of probabilistic programming languages. In particular, Kozen81 defined a denotational semantics for a first-order while-language endowed with a random real number generator. In that setting, programs can be seen as stochastic kernels between measurable spaces: the possible configurations of the memory are described by measurable spaces, with the measurable sets expressing the observables, while kernels define the probabilistic transformation of the memory induced by program execution. For example, a while-program using $n$ variables taking values in the set $\mathbb{R}$ of real numbers is a stochastic kernel $K$ over the Lebesgue $\sigma$-algebra on $\mathbb{R}^n$ (see the compendium of measures and kernels, Section 2), i.e. $K$ is a function taking a sequence $\vec r\in\mathbb{R}^n$ and a measurable set $U\subseteq\mathbb{R}^n$ and giving a real number $K(\vec r,U)\in[0,1]$, which is the probability of having the memory (i.e. the values of the $n$ variables) within $U$ after having executed the program with the memory initialized as $\vec r$.

Kozen’s approach cannot be trivially extended to higher-order types, because there is no clear notion of measurable subset for a functional space: for instance, we do not know which measurable space can describe values of type, say, $\mathcal{R}\to\mathcal{R}$, where $\mathcal{R}$ is the type of real numbers (see (aumann1961) for details).

PANANGADEN1999 reframed the work by Kozen in a categorical setting, using the category $\mathbf{Kern}$ of stochastic kernels. This category has been presented as the Kleisli category of the so-called Giry monad (Giry1982) over the category $\mathbf{Meas}$ of measurable spaces and measurable functions. One can precisely state the issue for higher-order types in this framework: both $\mathbf{Meas}$ and $\mathbf{Kern}$ are cartesian categories but not closed.

The quest for a formal syntactic semantics of higher-order probabilistic programming had more success. We mention in particular Park:2008, proposing a probabilistic functional language $\lambda_\bigcirc$ based on sampling functions. This language has a type of sub-probabilistic distributions over the set of real numbers ((Park:2008) uses a different notation for it; one should consider sub-probabilistic distributions because program evaluation may diverge), i.e. measures over the Lebesgue $\sigma$-algebra on $\mathbb{R}$ with total mass at most $1$. Using the usual functional primitives (in particular recursion) together with the uniform distribution over the unit interval and a sampling construct, the authors encode various methods for generating distributions (like the inverse transform method and rejection sampling) and computing properties about them (such as approximations for expectation values, variances, etc.). The striking feature of $\lambda_\bigcirc$ is its rich expressiveness, as witnessed by the number of examples and applications detailed in (Park:2008), showing the relevance of the functional paradigm for probabilistic programming.

Until now, $\lambda_\bigcirc$ lacked a denotational model, (Park:2008) sketching only an operational semantics. In particular, the correctness proof of the encodings follows a syntactic reasoning which is not compositional. Our paper fills this gap, giving a denotational model to a variant of $\lambda_\bigcirc$. As a byproduct, we can check program correctness directly by applying the standard laws of calculus to program denotations (Examples 7.3 and 7.4), even for recursive programs (Example 7.9). This method is justified by the Adequacy Theorem 7.12, stating the correspondence between the operational and the denotational semantics.

If we restrict the language to countable data types (like booleans and natural numbers, excluding the real numbers), then the situation is much simpler. Indeed, any distribution over a countable set is discrete, i.e. it can be described as a linear combination of its possible outcomes and there is no need of a notion of measurable space. In previous papers (EhrPagTas11; EhrPagTas14; EhrhardTasson16), we have shown that the category of probabilistic coherence spaces and entire functions gives fully abstract denotational models of functional languages extended with a random natural number generator. The main goal of this work is to generalize these models in order to account for continuous data types also.

The major difficulty for such a generalization is that a probabilistic coherence space is defined with respect to a kind of canonical basis (called web) that, at the level of ground types, corresponds to the possible samples of a distribution. For continuous data types, these webs should be replaced by measurable spaces, and then one is stuck on the already mentioned impossibility of associating a measurable space with a functional type, since neither $\mathbf{Meas}$ nor $\mathbf{Kern}$ is cartesian closed.

Our solution is to replace probabilistic coherence spaces with cones (ando1962fundamental), already used by Selinger2004, allowing for an axiomatic presentation not referring to a web. A cone is similar to a normed vector space, but with non-negative real scalars (Definition 4.1). Any probabilistic coherence space can be seen as a cone (Example 4.4), as well as the set $\mathrm{Meas}(X)$ of all bounded measures over a measurable space $X$ (Example 4.6). In particular, the cone $\mathrm{Meas}(\mathbb{R})$ associated with the Lebesgue $\sigma$-algebra on $\mathbb{R}$ will be our interpretation of the ground type $\mathcal{R}$.

What about functional types, e.g. $\mathcal{R}\to\mathcal{R}$? Selinger2004 studied the notion of Scott continuous maps between cones, i.e. monotone non-decreasing bounded maps which commute with the lub of non-decreasing sequences. (Actually, Selinger considers lubs of directed sets, but non-decreasing chains are enough for our purposes. Moreover, because we need the monotone convergence theorem to guarantee the measurability of these lubs in function spaces, completeness with respect to arbitrary directed sets would be too strong a requirement (see Section 6.2): a crucial feature of measurable sets is that they are closed under countable, and not arbitrary, unions.) The set of these functions also forms a cone with the algebraic operations defined pointwise. However, this cone construction does not yield a cartesian closed category: the currying of a Scott continuous map can fail to be monotone non-decreasing, hence Scott continuous (see the discussion in Section 4.1.1). The first relevant contribution of our paper is then to introduce a notion of stable map, meaning Scott continuous and “absolutely monotonic” (Definition 4.14), which solves the problem about currying and gives a cartesian closed category.

We borrow the term of “stable function” from Berry’s analysis of sequential computation (stability). In fact, our definition is deeply related with a notion of “probabilistic” sequentiality, as we briefly mention in Section 4.1.1 showing that it rejects the “parallel or” (but not the “Gustave function”).

The notion of stability is however not enough to interpret all primitives of probabilistic functional programming. One should be able to integrate at least first-order functions in order to sample programs denoting probabilistic distributions (e.g. see the denotation of the let construct in Figure 5). The problem is that there are stable functions which are not measurable, so not Lebesgue integrable (Section 5). We therefore equip the cones with a notion of measurability tests (Definition 5.1), inducing a notion of measurable paths (Definition 5.2) in a cone. When the cone is associated with a standard measurable space $X$, i.e. it is of the form $\mathrm{Meas}(X)$, the measurability tests are the measurable sets of $X$. However, at higher-order types the definition is less immediate. The crucial point is that the measurable paths in $\mathrm{Meas}(X)$ are Lebesgue integrable, as expected (Section 6.3). We then call measurable a stable map preserving measurable paths and we prove that it gives a cartesian closed category, denoted $\mathbf{Cstab_m}$ (Figure 4 and Theorem 6.7).

To illustrate the expressiveness of $\mathbf{Cstab_m}$ we consider a variant of Scott and Plotkin’s PCF (plotPCF) with numerals for real numbers, a constant sample denoting the uniform distribution over $[0,1]$ and a let construct over the ground type. This language is as expressive as the language $\lambda_\bigcirc$ of Park:2008 (namely, the let construct corresponds to the sampling of $\lambda_\bigcirc$). The only notable difference lies in the call-by-name operational semantics (Figure 3) that we adopt, while (Park:2008) follows a call-by-value strategy. (Let us underline that our let construct does not allow one to encode the call-by-value strategy at higher-order types, since it is restricted to the ground type $\mathcal{R}$. See Section 3 for more details.) Our choice is motivated by the fact that the call-by-name model is simpler to present than the call-by-value one. We plan to detail the latter in a forthcoming paper.

We also decided not to consider the so-called soft-constraints, which are implemented in e.g. (BorgstromLGS16; StatonYWHK16; Staton17) with a construct called score. This can be added to our language by using a kind of exception monad in order to account for the possible failure of normalization, as detailed in (Staton17) (see Remark 2). Also in this case we prefer to omit this feature for focussing on the true novelties of our approach — the notions of stability and measurability.

Let us underline that although the definition of $\mathbf{Cstab_m}$ and the proof of its cartesian closedness are not trivial, the denotation of the programs (Figure 5) is completely standard, extending the usual interpretation of PCF programs as Scott continuous functions (plotPCF). We prove the soundness (Proposition 7.8) and the adequacy (Theorem 7.12) of $\mathbf{Cstab_m}$. A major byproduct of this result is to make it possible to reason about higher-order programs as functions between cones, which is quite convenient when working with programs acting on measures.

To conclude, let us comment on Figure 1, sketching the relations between the category $\mathbf{Cstab_m}$ achieved here and the category $\mathbf{Pcoh}_!$ of probabilistic coherence spaces and entire functions, which has been the starting point of our approach. The two categories give models of the functional primitives (PCF-like languages), but $\mathbf{Pcoh}_!$ is restricted to discrete data types, while $\mathbf{Cstab_m}$ extends the model to continuous types. We conjecture this extension to be conservative, hence the arrow between the two in the figure is hooked but only dashed. We are even convinced that $\mathbf{Cstab_m}$ is the result of a Kleisli construction from a more fundamental model of (intuitionistic) linear logic, based on positive cones and measurable, Scott continuous and linear functions. We plan to study this linear model in an extended version of this paper, as a category extending the category $\mathbf{Kern}$ of measurable spaces and stochastic kernels. This would close the loop and further confirm the analogy with $\mathbf{Pcoh}_!$, which is the Kleisli category associated with the exponential comonad of the model based on the category $\mathbf{Pcoh}$ of Scott continuous and linear functions between probabilistic coherence spaces, the latter containing the category of Markov chains as a full sub-category.

##### Contents.

This paper requires a basic knowledge of measure theory: we briefly recall in Section 2 the main notions and notations used. Section 3 presents the programming language PPCF, the probabilistic variant of PCF we use for showing the expressiveness of our model. Figure 2 gives the grammar of terms and the typing rules, while Equation (5) and Figure 3 define the kernel describing the stochastic operational semantics. Our first main contribution is presented in Section 4: after having recalled Selinger’s definition of cone (Definition 4.1) we study our notion of absolutely monotonic map (Definition 4.14), or equivalently pre-stable map (Definition 4.17 and Theorem 4.18), and we prove that it composes (Theorem 4.26). Stable maps are absolutely monotonic and Scott-continuous (Definition 4.27). Section 5 introduces our second main contribution, which is the notion of measurability test (Definition 5.1) and measurable map (Definition 5.5), giving the category $\mathbf{Cstab_m}$ (Definition 5.5). Section 6 presents the cartesian closed structure of $\mathbf{Cstab_m}$, summarized in Figure 4. Finally, Section 7 details the model of PPCF given by $\mathbf{Cstab_m}$ (Figure 5) and states soundness (Proposition 7.8) and adequacy (Theorem 7.12). Section 8 discusses the previous literature. Because of space limits, many proofs are omitted and postponed to the technical Appendix A.

##### Notations

We use $\# I$ for the cardinality of a set $I$. The set of non-negative real numbers is $\mathbb{R}^+$ and its elements are denoted $\alpha,\beta,\gamma,\dots$ General real numbers are denoted $r,s,t,\dots$ The set of non-zero natural numbers is $\mathbb{N}^+$. The Greek letter $\lambda$ will denote the Lebesgue measure over $\mathbb{R}$, $\lambda_{[0,1]}$ being its restriction to the unit interval. Given a measurable space $X$ and an $x\in X$, we use $\delta_x$ for the Dirac measure over $X$: $\delta_x(U)$ is equal to $1$ if $x\in U$ and to $0$ otherwise. We also use $\chi_U$ to denote the characteristic function of $U$, which is defined as follows: $\chi_U(x)$ is equal to $1$ if $x\in U$ and to $0$ otherwise. We use for the set of measurable functions . We use to denote the map .

## 2. Compendium of measures and kernels

A $\sigma$-algebra on a set $X$ is a family $\Sigma_X$ of subsets of $X$ that is nonempty and closed under complements and countable unions, so that $\emptyset, X\in\Sigma_X$. A measurable space is a pair $(X,\Sigma_X)$ of a set $X$ equipped with a $\sigma$-algebra $\Sigma_X$. A measurable set of $(X,\Sigma_X)$ is an element of $\Sigma_X$. From now on, we will denote a measurable space $(X,\Sigma_X)$ simply by its underlying set $X$, whenever the $\sigma$-algebra is clear or irrelevant. We consider $\mathbb{R}$ and $\mathbb{R}^+$ as measurable spaces equipped with the Lebesgue $\sigma$-algebra, generated by the open intervals. A bounded measure on a measurable space $X$ is a map $\mu:\Sigma_X\to\mathbb{R}^+$ satisfying $\mu(\biguplus_{i\in I}S_i)=\sum_{i\in I}\mu(S_i)$ for any countable family $\{S_i\}_{i\in I}$ of disjoint sets in $\Sigma_X$. We call $\mu$ a probability (resp. subprobability) measure whenever $\mu(X)=1$ (resp. $\mu(X)\le 1$). When $\mu$ is a measure on $\mathbb{R}^n$, we often call it a distribution.

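A measurable function $f:(X,\Sigma_X)\to(Y,\Sigma_Y)$ is a function $f:X\to Y$ such that $f^{-1}(U)\in\Sigma_X$ for every $U\in\Sigma_Y$. The pushforward measure $f_*\mu$ from a measure $\mu$ on $X$ along a measurable map $f$ is defined as $(f_*\mu)(U)=\mu(f^{-1}(U))$, for every $U\in\Sigma_Y$.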
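As a worked instance of the pushforward (a sketch: the map $f$ below is chosen here for illustration and does not come from the text), write $f_*\mu$ for the pushforward of $\mu$ along $f$, take $\mu$ to be the Lebesgue measure restricted to $(0,1]$ and $f(x)=-\log x$. The preimage of an interval $[0,t]$ is again an interval, so its measure can be read off directly:

```latex
\begin{align*}
f^{-1}([0,t]) &= \{x\in(0,1] \mid 0\le -\log x\le t\} = [e^{-t},1],\\
(f_*\mu)([0,t]) &= \mu([e^{-t},1]) = 1-e^{-t}\qquad (t\ge 0).
\end{align*}
```

The right-hand side is the cumulative distribution function of the exponential distribution at rate $1$, which is the fact underlying inversion sampling.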

These notions have been introduced in order to define the Lebesgue integral $\int_X f(x)\,\mu(dx)$ of a generic measurable function $f:X\to\mathbb{R}$ with respect to a measure $\mu$ over $X$. This paper uses only basic facts about the Lebesgue integral, which we do not detail here.

Measures are special cases of kernels. A bounded kernel $K$ from $X$ to $Y$ is a function $K:X\times\Sigma_Y\to\mathbb{R}^+$ such that: (i) for every $x\in X$, $K(x,\_)$ is a bounded measure over $Y$; (ii) for every $U\in\Sigma_Y$, $K(\_,U)$ is a measurable map from $X$ to $\mathbb{R}^+$. A stochastic kernel $K$ is a kernel such that $K(x,\_)$ is a sub-probability measure for every $x\in X$. Notice that a bounded measure (resp. sub-probability measure) $\mu$ over $X$ can be seen as a particular bounded kernel (resp. stochastic kernel) from the singleton measurable space to $X$.

##### Categorical approach.

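We use two categories having measurable spaces as objects, denoted respectively $\mathbf{Meas}$ and $\mathbf{Kern}$.

The category $\mathbf{Meas}$ has measurable functions as morphisms. This category is cartesian (but not cartesian closed): the cartesian product of $(X,\Sigma_X)$ and $(Y,\Sigma_Y)$ is $(X\times Y,\Sigma_X\otimes\Sigma_Y)$, where $X\times Y$ is the set-theoretic product and $\Sigma_X\otimes\Sigma_Y$ is the $\sigma$-algebra generated by the rectangles $U\times V$, where $U\in\Sigma_X$ and $V\in\Sigma_Y$. It is easy to check that the usual projections are measurable maps, and that the set-theoretic pairing $\langle f,g\rangle$ of two functions $f:Z\to X$, $g:Z\to Y$ is a measurable map from $Z$ to $X\times Y$ whenever $f$, $g$ are measurable.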

The category $\mathbf{Kern}$ has stochastic kernels as morphisms (one can also define the category of bounded kernels, but it is not used in this paper). Given a stochastic kernel $H$ from $X$ to $Y$ and $K$ from $Y$ to $Z$, the kernel composition $K\circ H$ is a stochastic kernel from $X$ to $Z$ defined as, for every $x\in X$ and $U\in\Sigma_Z$:

$$(K\circ H)(x,U)=\int_Y K(y,U)\,H(x,dy).\tag{1}$$

Notice that the above integral is well-defined because $H(x,\_)$ is a sub-probability measure by condition (i) on kernels and $K(\_,U)$ is a measurable function by condition (ii). A simple application of Fubini’s theorem gives the associativity of the kernel composition. The identity kernel is the function mapping $(x,U)$ to $1$ if $x\in U$ and to $0$ otherwise.
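On finite (discrete) spaces the integral in Equation (1) becomes a finite sum, which makes the composition easy to experiment with. The following is a minimal sketch under that assumption; the nested-dictionary encoding and the names `compose` and `mass` are illustrative choices, not notation from the paper.

```python
from fractions import Fraction

def compose(K, H):
    """Compose finite kernels: (K o H)(x, {z}) = sum over y of K(y, {z}) * H(x, {y}).

    A kernel from X to Y is encoded as a dict: kernel[x][y] = weight of the point y.
    """
    result = {}
    for x, row in H.items():
        out = {}
        for y, h_xy in row.items():
            for z, k_yz in K[y].items():
                out[z] = out.get(z, Fraction(0)) + k_yz * h_xy
        result[x] = out
    return result

def mass(kernel, x, U):
    """Value kernel(x, U) for a finite set U of points."""
    return sum((p for z, p in kernel[x].items() if z in U), Fraction(0))

# H is sub-stochastic at "a": it loses mass 1/4 (think of divergence).
H = {"a": {0: Fraction(1, 2), 1: Fraction(1, 4)}, "b": {0: Fraction(1)}}
K = {0: {"u": Fraction(1)}, 1: {"u": Fraction(1, 2), "v": Fraction(1, 2)}}
KH = compose(K, H)
print(KH["a"])  # weights 5/8 on "u" and 1/8 on "v"
```

The total mass of `KH["a"]` stays at 3/4, illustrating that composing sub-probability kernels never creates mass.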

Unlike in $\mathbf{Meas}$, we consider in $\mathbf{Kern}$ a tensor product $\otimes$ which is a symmetric monoidal product but not the cartesian product ($\mathbf{Kern}$ does have cartesian products, but we will not use them). The action of $\otimes$ over the objects is defined as the cartesian product in $\mathbf{Meas}$, so that we keep the notation $X\times Y$ for it. The tensor of a kernel $K$ from $X$ to $Y$ and a kernel $K'$ from $X'$ to $Y'$ is the kernel $K\otimes K'$ given as follows, for $(x,x')\in X\times X'$ and $U\in\Sigma_Y$, $U'\in\Sigma_{Y'}$:

$$(K\otimes K')((x,x'),U\times U')=K(x,U)\,K'(x',U').\tag{2}$$

Notice that $\mathbf{Kern}$ is not closed with respect to $\otimes$. Recall that a measure can be seen as a kernel from the singleton measurable space, so that Equation (2) also defines a tensor product between measures over, respectively, $Y$ and $Y'$.
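For finite spaces, the tensor of Equation (2) is simply the pointwise product of weights. A small sketch under that assumption follows; the dictionary encoding and the names `tensor` and `rect_mass` are hypothetical helpers introduced for illustration.

```python
from fractions import Fraction

def tensor(K, K2):
    """Tensor of finite kernels: weight of (y, y2) from (x, x2) is K[x][y] * K2[x2][y2]."""
    result = {}
    for x, row in K.items():
        for x2, row2 in K2.items():
            result[(x, x2)] = {
                (y, y2): p * p2 for y, p in row.items() for y2, p2 in row2.items()
            }
    return result

def rect_mass(T, pair, U, U2):
    """Mass that T assigns from `pair` to the rectangle U x U2."""
    return sum(
        (p for (y, y2), p in T[pair].items() if y in U and y2 in U2), Fraction(0)
    )

K = {"x": {0: Fraction(1, 2), 1: Fraction(1, 2)}}
K2 = {"x2": {"h": Fraction(1, 3), "t": Fraction(1, 3)}}
T = tensor(K, K2)
# On a rectangle the mass factorizes as K(x, U) * K2(x2, U2), as in Equation (2).
print(rect_mass(T, ("x", "x2"), {0, 1}, {"h"}))
```

The two components are independent by construction, which is why this product differs from a cartesian pairing of kernels sharing one input.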

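The category $\mathbf{Kern}$ also has countable coproducts. Given a countable family $(X_i)_{i\in I}$ of measurable spaces, the coproduct $\coprod_{i\in I}X_i$ has as underlying set the disjoint union $\bigcup_{i\in I}\{i\}\times X_i$ of the $X_i$’s, and as $\sigma$-algebra the one generated by the disjoint unions $\bigcup_{i\in I}\{i\}\times U_i$ of measurable sets $U_i\in\Sigma_{X_i}$. The injections $\iota_j$ from $X_j$ to $\coprod_{i\in I}X_i$ are defined as $\iota_j(x,U)=\delta_{(j,x)}(U)$. Given a family of kernels $K_i$ from $X_i$ to $Y$, the copairing $[K_i]_{i\in I}$ from $\coprod_{i\in I}X_i$ to $Y$ is defined by $[K_i]_{i\in I}((j,x),U)=K_j(x,U)$.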

Actually, the categories $\mathbf{Meas}$ and $\mathbf{Kern}$ are related in much the same way as the categories Set (of sets and functions) and Rel (of sets and relations). In fact, $\mathbf{Kern}$ corresponds to the Kleisli category of the so-called Giry monad over $\mathbf{Meas}$ (Giry1982), exactly as the category Rel of relations is the Kleisli category of the powerset monad over Set (see (PANANGADEN1999)). Since this paper does not use this construction, we do not detail it.

## 3. The probabilistic language PPCF

### 3.1. Types and Terms

We give in Figure 2 the grammar of our probabilistic extension of PCF, PPCF for short, together with the typing rules. The types are generated by $A,B ::= \mathcal{R}\mid A\to B$, where the constant $\mathcal{R}$ is the ground type for the set of real numbers. We denote by $\Lambda^{\Gamma\vdash A}$ the set of terms typeable within the sequent $\Gamma\vdash A$. We write simply $\Lambda$ if the typing sequent is not important or clear from the context.

The first line of Figure 2 contains the usual constructs of the simply typed $\lambda$-calculus extended with the fix-point combinator $\mathtt{Y}$ at any type $A$. The second line describes the primitives dealing with the ground type $\mathcal{R}$. Our goal is to show the expressiveness of the category $\mathbf{Cstab_m}$ introduced in the next sections, therefore PPCF is an ideal language and does not deal with the issues of a realistic implementation of computations over real numbers. We refer the interested reader to e.g. (Vuillemin:1988; ESCARDO1996). We will suppose that the meta-variable $f$ ranges over a fixed countable set of basic measurable functions over real numbers. Examples of these functions include addition $+$, comparison $>$, and equality $=$; they are often written in infix notation. When clear from the context, we sometimes write $r$ for the numeral $\underline{r}$. To be concise, we consider only the ground type $\mathcal{R}$; the boolean operators (like $>$ or $=$) then evaluate to $\underline{1}$ or $\underline{0}$, representing respectively true and false.

The third line of Figure 2 gives the “probabilistic core” of PPCF. The constant $\mathtt{sample}$ stands for the uniform distribution over $[0,1]$, i.e. the Lebesgue measure restricted to the unit interval. The fact that PPCF has only this distribution as a primitive is not limiting: many other probabilistic measures (like the binomial, geometric, gaussian or exponential distributions) can be defined from $\mathtt{sample}$ and the other constructs of the language, see e.g. (Park:2008) and Example 3.3. The let construction allows a call-by-value discipline over the ground type $\mathcal{R}$: the execution of $\mathtt{let}(x,M,N)$ will sample a value (i.e. a real number $r$) from the probabilistic distribution $M$ and will pass it to $N$ by replacing every free occurrence of $x$ in $N$ with $\underline{r}$. This primitive (which corresponds to the sample construction in (Park:2008)) is essential for the expressiveness of PPCF and will be discussed both operationally and semantically in the next sections.

PPCF has a limited number of constructs, but it is known that many probabilistic primitives can be introduced as syntactic sugar from the ones in PPCF, as shown in the following examples. We will prove the correctness of these encodings using the denotational semantics (Section 7), which corresponds to the operational behavior of programs by the adequacy property (Theorem 7.12).

###### Example 3.1 (Extended branching).

Let be a measurable set of real numbers whose characteristic function is in , let and for . Then the term , branching between and according to the outcome of being in , is a syntactic sugar for .777The swap between and is due to fact that is the test to zero.

###### Example 3.2 (Extended let).

Similarly, the constructor can be extended to any output type . Given and , we denote by the term which is in . However we do not know in general how to extend the type of the bound variable to higher types in this model. The issue is clear at the denotational level, where the construction is expressed with an integral (see Figure 5). With each ground type, we associate a positive cone which is generated by a measurable space . At higher types, the associated cones do not have to be generated by measurable spaces.

Notice that, because of this restriction on the type of the bound variable , our constructor does not allow to embed into our language the full call-by-value PCF.

###### Example 3.3 (Distributions).

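The Bernoulli distribution takes the value $1$ with some probability $p$ and the value $0$ with probability $1-p$. It can be expressed as the term $\mathtt{bernoulli}$ of type $\mathcal{R}\to\mathcal{R}$, taking the parameter $p$ as argument and testing whether $\mathtt{sample}$ draws a value within the interval $[0,p]$, i.e. $\mathtt{bernoulli}=\lambda p.\,\mathtt{let}(x,\mathtt{sample},x\le p)$.

The exponential distribution at rate $1$ is specified by its density $e^{-x}$ (for $x\ge 0$). It can be implemented as the term $\mathtt{exp}$ of type $\mathcal{R}$ by the inversion sampling method: $\mathtt{exp}=\mathtt{let}(x,\mathtt{sample},-\log(x))$.

The standard normal distribution (gaussian with mean $0$ and variance $1$) is defined by its density $\frac{1}{\sqrt{2\pi}}e^{-x^2/2}$. We use the Box-Muller method to encode the normal distribution: $\mathtt{normal}=\mathtt{let}(x,\mathtt{sample},\mathtt{let}(y,\mathtt{sample},\sqrt{-2\log(x)}\,\cos(2\pi y)))$.

We can encode the Gaussian distribution as a function of the expected value $x$ and of the standard deviation $\sigma$ by $\mathtt{gauss}=\lambda x.\lambda\sigma.\,\mathtt{let}(y,\mathtt{normal},(\sigma\,y)+x)$.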
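The sampling methods named in this example (a threshold test for Bernoulli, inversion for the exponential, Box-Muller for the normal) can be tried out in an ordinary language. The sketch below is an illustration in Python, not the paper's encoding: the function names are chosen here, the exponential is taken at rate 1, and the uniform draw is shifted to $(0,1]$ so that the logarithm is always defined.

```python
import math
import random

def sample(rng):
    # Uniform draw on (0, 1]; shifted from random()'s [0, 1) so that log(x) is defined.
    return 1.0 - rng.random()

def bernoulli(p, rng):
    # Returns 1 when the uniform draw falls in (0, p], else 0.
    return 1 if sample(rng) <= p else 0

def exponential(rng):
    # Inversion sampling for rate 1: -log of a uniform draw.
    return -math.log(sample(rng))

def normal(rng):
    # Box-Muller: two independent uniform draws give one standard normal draw.
    x = sample(rng)
    y = sample(rng)
    return math.sqrt(-2.0 * math.log(x)) * math.cos(2.0 * math.pi * y)

def gauss(mean, sigma, rng):
    # Shift and scale a standard normal draw.
    return sigma * normal(rng) + mean

def empirical_mean(draw, n):
    return sum(draw() for _ in range(n)) / n

rng = random.Random(0)
print(empirical_mean(lambda: exponential(rng), 20000))  # close to 1
```

Each draw inside `normal` calls `sample` separately, so the two uniform values are independent, which the Box-Muller method requires.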

###### Example 3.4 (Conditioning).

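Let $U$ be a measurable set of real numbers whose characteristic function is among the basic functions of the language. We define a term $\mathtt{observe}(U)$ of type $\mathcal{R}\to\mathcal{R}$, taking a term $M$ and returning the renormalization of the distribution of $M$ on the only samples that satisfy $U$: $\mathtt{observe}(U)=\lambda m.\,\mathtt{Y}(\lambda y.\,\mathtt{let}(x,m,\mathtt{if}(x\in U,x,y)))$. This corresponds to the usual way of implementing conditioning by rejection sampling: the evaluation of $\mathtt{observe}(U)\,M$ will sample a real $r$ from $M$; if $r\in U$ holds then the program returns $\underline{r}$, otherwise it iterates the procedure. Notice that $\mathtt{observe}(U)$ makes a crucial use of sampling. The program $\lambda m.\,\mathtt{Y}(\lambda y.\,\mathtt{if}(m\in U,m,y))$ has a different behavior, because the two occurrences of $m$ correspond in this case to two independent random variables (see Example 3.10 below).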
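The difference between sampling once and reusing the value, versus drawing twice, can be observed directly. The following Python sketch is an illustration under assumptions made here: `observe` and `observe_resampling` are hypothetical names, the distribution is passed as a zero-argument function `draw`, and the loop terminates with probability 1 only when the set has positive measure under that distribution.

```python
import random

def observe(in_U, draw):
    # Rejection sampling: draw once per iteration and reuse that same value
    # both for the membership test and for the result.
    while True:
        r = draw()
        if in_U(r):
            return r

def observe_resampling(in_U, draw):
    # Variant with two independent draws: one is tested, another one is returned.
    while True:
        if in_U(draw()):
            return draw()

rng = random.Random(0)
in_lower_half = lambda r: r <= 0.5
conditioned = [observe(in_lower_half, rng.random) for _ in range(1000)]
resampled = [observe_resampling(in_lower_half, rng.random) for _ in range(1000)]
print(max(conditioned) <= 0.5, max(resampled) <= 0.5)
```

Only the first version returns values that all lie in the conditioning set; the second returns an unconditioned draw whenever the test succeeds.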

###### Example 3.5 (Monte Carlo Simulation).

An example using the possibility of performing independent copies of a random variable is the encoding of the $n$-th estimate of an expectation query. The expected value of a measurable function $f$ with respect to a distribution $\mu$ is defined as $\int_{\mathbb{R}}f(x)\,\mu(dx)$. The Monte Carlo method relies on the law of large numbers: if $x_1,\dots,x_n$ are independent and identically distributed random variables of equal probability distribution $\mu$, then the $n$-th estimate $\frac{f(x_1)+\dots+f(x_n)}{n}$ converges almost surely to $\int_{\mathbb{R}}f(x)\,\mu(dx)$. For any integer $n$, we can then encode the $n$-th estimate combinator by $\lambda f.\lambda x.\,\frac{f(x)+\dots+f(x)}{n}$ of type $(\mathcal{R}\to\mathcal{R})\to\mathcal{R}\to\mathcal{R}$. Notice that it is crucial here that the variable $x$ has $n$ occurrences representing $n$ independent random variables, this being in contrast with Example 3.4 (see also Example 3.10).
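A minimal Python sketch of the $n$-th estimate follows, with the distribution passed as a zero-argument function so that every use triggers a fresh, independent draw (mimicking call-by-name). The names `nth_estimate` and `shared_sample_estimate` are illustrative and do not come from the paper.

```python
import random

def nth_estimate(n, f, draw):
    # Average of f over n independent draws: each call to draw() is a fresh sample.
    return sum(f(draw()) for _ in range(n)) / n

def shared_sample_estimate(n, f, draw):
    # Contrast: sample once and reuse the value n times; this is just f(r).
    r = draw()
    return sum(f(r) for _ in range(n)) / n

rng = random.Random(0)
square = lambda x: x * x
estimate = nth_estimate(20000, square, rng.random)
print(estimate)  # close to 1/3, the integral of x^2 over [0, 1]
```

With a shared sample the average collapses to a single evaluation of `f`, so no convergence to the expected value can occur.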


### 3.2. Operational Semantics

The operational semantics of PPCF is a Markov process defined starting from the rewriting rules of Figure 3, extending the standard call-by-name reduction of PCF (plotPCF). The probabilistic primitive $\mathtt{sample}$ draws a possible value from $[0,1]$, like in (Park:2008). The fact that we are sampling from the uniform distribution and not from other distributions with equal support appears in the definition of the stochastic kernel $\mathrm{Red}$ (Equation (5)). In order to define this kernel, we equip $\Lambda^{\Gamma\vdash A}$ with a structure of measurable space (Equation (4)). This defines a $\sigma$-algebra of sets of terms equivalent to the one given in e.g. (BorgstromLGS16; StatonYWHK16) for slightly different languages. Similarly to (StatonYWHK16), our definition is explicitly given by a countable coproduct of copies of the Lebesgue $\sigma$-algebra over $\mathbb{R}^n$ (for $n\in\mathbb{N}$, see Equation (3)), while in (BorgstromLGS16) the definition is based on a notion of distance between $\lambda$-terms. The two definitions give the same measurable space, but the one adopted here allows us to take advantage of the categorical structure recalled in Section 2.

###### Remark 1 ().

The operational semantics associates with a program $M$ a probabilistic distribution of values describing the possible outcomes of the evaluation of $M$. There are actually two different “styles” for giving this distribution: one based on samplings and another one, adopted here, based on stochastic kernels. BorgstromLGS16 proved that the two semantics are equivalent, giving the same distribution.

The “sampling semantics” associates with $M$ a function mapping a trace of random samples to a weight, expressing the likelihood of getting that trace of samples from $M$. The final distribution is then calculated by integrating this function over the space of the possible traces, equipped with a suitable measure. This approach is usually adopted when one wants to underline an implementation of the probabilistic primitives of the language via a sampling algorithm, e.g. (Park:2008).

The “kernel-based semantics” instead describes program evaluation as a discrete-time Markov process over a measurable space of states given by the set of programs ($\Lambda^{\Gamma\vdash A}$ in our case). The transition of the process is given by a stochastic kernel (here $\mathrm{Red}$, defined in Equation (5)) and then the probabilistic distribution of values associated with a term is given by the supremum of the family of all finite iterations of the kernel ($\mathrm{Red}^\infty$, Equation (6)). This latter approach is more suitable when comparing the operational semantics with a denotational model (in order to prove soundness and adequacy, for example) and it is therefore the one adopted in this paper.

A redex is a term in one of the forms on the left-hand side of the reduction relation $\to$ defined in Figure 3(a). A normal form is a term which cannot be reduced further under $\to$. Notice that the closed normal forms of ground type $\mathcal{R}$ are the real numerals. The definition of the evaluation context $E[\,]$ (Figure 3(b)) is the usual one defining the deterministic lazy call-by-name strategy: we do not reduce under an abstraction and there is always at most one redex to reduce, as stated by the following standard lemma.

###### Lemma 3.6 ().

For any term $M$, either $M$ is a normal form or there exists a unique redex $R$ and an evaluation context $E[\,]$ such that $M=E[R]$.

It is standard to check that the property of subject reduction holds (if $M\in\Lambda^{\Gamma\vdash A}$ and $M\to N$, then $N\in\Lambda^{\Gamma\vdash A}$).

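From now on, let us fix an enumeration $x_1,x_2,\dots$ without repetitions of variables of type $\mathcal{R}$. Notice that any term $M$ with $n$ different occurrences of real numerals can be decomposed univocally into a term $S$ without real numerals and a substitution $\sigma=\{\underline{r_1}/x_1,\dots,\underline{r_n}/x_n\}$, such that: (i) $M=S\sigma$; (ii) each $x_i$ occurs exactly once in $S$; (iii) $x_i$ occurs before $x_{i+1}$ reading the term from left to right. Because of this latter condition, we can omit the name of the substituted variables, writing simply $S\vec r$ with $\vec r=(r_1,\dots,r_n)$. We denote by $\Lambda_n^{\Gamma\vdash A}$ the set of terms in $\Lambda^{\Gamma,x_1:\mathcal{R},\dots,x_n:\mathcal{R}\vdash A}$ with no occurrence of numerals and respecting conditions (ii) and (iii) above. We let $S$ vary over such real-numeral-free terms.

Given $S\in\Lambda_n^{\Gamma\vdash A}$ we then define the set $\Lambda_S^{\Gamma\vdash A}=\{S\vec r \text{ s.t. } \vec r\in\mathbb{R}^n\}$. The bijection $s:\Lambda_S^{\Gamma\vdash A}\to\mathbb{R}^n$ given by $s(S\vec r)=\vec r$ endows $\Lambda_S^{\Gamma\vdash A}$ with a $\sigma$-algebra isomorphic to $\Sigma_{\mathbb{R}^n}$: $U\in\Sigma_{\Lambda_S^{\Gamma\vdash A}}$ iff $s(U)\in\Sigma_{\mathbb{R}^n}$. The fact that $\Lambda_n^{\Gamma\vdash A}$ is countable and that countable coproducts of measurable spaces exist (see Section 2) allows us to define the measurable space of terms of type $A$ as the coproduct:

$$(\Lambda^{\Gamma\vdash A},\Sigma_{\Lambda^{\Gamma\vdash A}})=\coprod_{n\in\mathbb{N},\,S\in\Lambda_n^{\Gamma\vdash A}}(\Lambda_S^{\Gamma\vdash A},\Sigma_{\Lambda_S^{\Gamma\vdash A}})\tag{3}$$

Spelling out the definition, a subset $U\subseteq\Lambda^{\Gamma\vdash A}$ is measurable if and only if:

$$\forall n,\ \forall S\in\Lambda_n^{\Gamma\vdash A},\quad\{\vec r \text{ s.t. } S\vec r\in U\}\in\Sigma_{\mathbb{R}^n}\tag{4}$$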

Given a set , we denote by the set of numerals associated with the real numbers in . Of course is measurable iff is measurable. The following lemma allows us to define and .

###### Lemma 3.7 ().

Given the function mapping to is measurable.

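Given a term $M\in\Lambda^{\Gamma\vdash A}$ and a measurable set $U\subseteq\Lambda^{\Gamma\vdash A}$ we define $\mathrm{Red}(M,U)$ depending on the form of $M$, according to Lemma 3.6:

$$\mathrm{Red}(M,U)=\begin{cases}\delta_{E[N]}(U) & \text{if } M=E[R],\ R\to N \text{ and } R\neq\mathtt{sample},\\ \lambda\{r\in[0,1] \text{ s.t. } E[\underline{r}]\in U\} & \text{if } M=E[\mathtt{sample}],\\ \delta_M(U) & \text{if } M \text{ is a normal form.}\end{cases}\tag{5}$$

The last case sets the normal forms as accumulation points of $\mathrm{Red}$, so that $\mathrm{Red}(M,U)$ gives the probability that we observe $U$ after at most one reduction step applied to $M$. The definition in the case of $E[\mathtt{sample}]$ specifies that $\mathtt{sample}$ is drawing from the uniform distribution over $[0,1]$. Notice that, if $U$ is measurable, then the set $\{r\in[0,1] \text{ s.t. } E[\underline{r}]\in U\}$ is measurable by Lemma 3.7. The definition of $\mathrm{Red}$ extends to a continuous setting the operational semantics Markov chain of (danosehrhard; EhrPagTas14).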

###### Proposition 3.8 ().

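For any sequent $\Gamma\vdash A$, the map $\mathrm{Red}$ is a stochastic kernel from $\Lambda^{\Gamma\vdash A}$ to $\Lambda^{\Gamma\vdash A}$.

###### Proof (Sketch).

The fact that $\mathrm{Red}(M,\_)$ is a measure is an immediate consequence of the definition of $\mathrm{Red}$ and the fact that any evaluation context $E[\,]$ defines a measurable map (Lemma 3.7).

Given a measurable set $U\subseteq\Lambda^{\Gamma\vdash A}$, we must prove that $\mathrm{Red}(\_,U)$ is a measurable function from $\Lambda^{\Gamma\vdash A}$ to $[0,1]$. Since $\Lambda^{\Gamma\vdash A}$ can be written as the coproduct in Equation (3), it is sufficient to prove that for any $n$ and $S\in\Lambda_n^{\Gamma\vdash A}$, the restriction of $\mathrm{Red}(\_,U)$ to $\Lambda_S^{\Gamma\vdash A}$ is a measurable function. One reasons by case analysis on the shape of $S$, using Lemma 3.6 and the definition of a redex. ∎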

We can then iterate $\mathrm{Red}$ using the composition of stochastic kernels (Equation (1)):

$$\mathrm{Red}^{n+1}(M,U)=(\mathrm{Red}\circ\mathrm{Red}^{n})(M,U)=\int_{\Lambda}\mathrm{Red}(t,U)\,\mathrm{Red}^{n}(M,dt),$$

this giving the probability that we observe $U$ after at most $n+1$ reduction steps from $M$. Because the normal forms are accumulation points, one can prove by induction on $n$ that:

###### Lemma 3.9 ().

Let $M\in\Lambda^{\Gamma\vdash A}$ and let $U$ be a measurable set of normal forms in $\Lambda^{\Gamma\vdash A}$. The sequence $(\mathrm{Red}^{n}(M,U))_{n}$ is monotone non-decreasing.

We can then define, for $M\in\Lambda^{\Gamma\vdash A}$ and $U$ a measurable set of normal forms, the limit

$$\mathrm{Red}^{\infty}(M,U)=\sup_{n}\bigl(\mathrm{Red}^{n}(M,U)\bigr).\tag{6}$$

In particular, if $M$ is a closed term of ground type $\mathcal{R}$, the only normal forms that $M$ can reach are numerals; in this case $\mathrm{Red}^{\infty}(M,\_)$ corresponds to the probabilistic distribution over $\mathbb{R}$ which is computed by $M$ according to the operational semantics of PPCF (Remark 1).
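The iteration of the kernel and the monotone sequence of Lemma 3.9 can be seen on a toy example. The sketch below replaces the space of terms by a hypothetical two-state chain (an assumption made here for illustration): a state `"loop"` that reaches the normal form `"done"` with probability 1/2 at each step and otherwise stays put, with normal forms looping on themselves as in the last case of Equation (5).

```python
from fractions import Fraction

HALF = Fraction(1, 2)
# Toy one-step kernel: RED[t][u] is the probability of moving from t to u.
# "done" plays the role of a normal form, so it is an accumulation point.
RED = {
    "loop": {"loop": HALF, "done": HALF},
    "done": {"done": Fraction(1)},
}

def step(dist):
    # One more kernel composition: new[u] = sum over t of RED[t][u] * dist[t].
    out = {}
    for t, w in dist.items():
        for u, p in RED[t].items():
            out[u] = out.get(u, Fraction(0)) + w * p
    return out

def red_n(start, target, n):
    # Probability of observing `target` after at most n steps from `start`.
    dist = {start: Fraction(1)}
    for _ in range(n):
        dist = step(dist)
    return dist.get(target, Fraction(0))

values = [red_n("loop", "done", n) for n in range(6)]
print(values)  # 0, 1/2, 3/4, 7/8, 15/16, 31/32
```

The values are non-decreasing and their supremum is 1, which is the toy counterpart of the limit in Equation (6).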

###### Example 3.10 ().

In order to make clear the difference between a call-by-value and a call-by-name reduction in a probabilistic setting, let us consider the following two terms:

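$$M=(\lambda x.(x=x))\,\mathtt{sample},\qquad N=\mathtt{let}(x,\mathtt{sample},x=x).$$

Both are closed terms of type $\mathcal{R}$, “applying” the uniform distribution to the diagonal function $x\mapsto(x=x)$. However, $M$ implements a call-by-name application, whose reduction duplicates the probabilistic primitive before sampling the distribution, while the evaluation of $N$ first samples a real number $r$ and then duplicates it:

$$M\to(\mathtt{sample}=\mathtt{sample})\to^{*}(\underline{r}=\underline{s})\ \text{for any } r \text{ and } s,\qquad N\to^{*}\mathtt{let}(x,\underline{r},x=x)\to(\underline{r}=\underline{r})\ \text{for any } r.$$

The distribution associated with $M$ by $\mathrm{Red}^{\infty}$ is the Dirac $\delta_{\underline{0}}$, because $\mathrm{Red}^{\infty}(M,\{\underline{1}\})=(\lambda\otimes\lambda)\{(r,r) \text{ s.t. } r\in[0,1]\}=0$, the last equality holding because the diagonal set has Lebesgue measure zero. This expresses that $M$ evaluates to $\underline{0}$ (i.e. “false”) with probability $1$, although there are an uncountable number of reduction paths reaching $\underline{1}$. On the contrary, the distribution associated with $N$ is $\delta_{\underline{1}}$: $\mathrm{Red}^{\infty}(N,\{\underline{1}\})=\lambda([0,1])=1$, expressing that $N$ always evaluates to $\underline{1}$ (i.e. “true”).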
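The two evaluation orders can be mimicked in Python by passing the sampler either as a thunk that is forced at each use, or as an already drawn value. This is only an analogy under assumptions made here: the function names are hypothetical, results are Python booleans rather than numerals, and floating-point draws can coincide with a tiny positive probability, unlike the measure-zero diagonal of the exact model.

```python
import random

def term_M(sample):
    # Call-by-name flavour: x is a thunk, so each occurrence of x draws a fresh sample.
    return (lambda x: x() == x())(sample)

def term_N(sample):
    # let flavour: draw one real number first, then duplicate that value.
    r = sample()
    return (lambda x: x == x)(r)

rng_m = random.Random(0)
rng_n = random.Random(0)
result_M = term_M(rng_m.random)
result_N = term_N(rng_n.random)
print(result_M, result_N)
```

With the fixed seed the two draws inside `term_M` differ, so it yields `False`, whereas `term_N` compares a value with itself and always yields `True`.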

###### Remark 2 (Score).

Some probabilistic programming languages have a primitive score (e.g. (BorgstromLGS16; StatonYWHK16)) or factor (e.g. (GoodmanT2014)), which makes it possible to define a probability distribution from a density function. A map is the probability density function of a distribution with respect to another measure, say the Lebesgue measure , whenever , for every measurable . Intuitively, gives a “score” expressing the likelihood of sampling the value from .

In our setting, the primitive would be a term like , with defining . The reduction of outputs any numeral (a possible sample from the distribution ), while the value is used to multiply , like:

$$\mathrm{Red}(\mathtt{score}_{x}(M),U)=\int_{\mathbb{R}}\chi_{U}(r)\,f(r)\,\lambda(dr). \tag{7}$$

This primitive makes it possible to implement a distribution more efficiently than by rejection sampling, the latter being based on a loop (Example 3.3). However, score suffers a major drawback: there is no static way of characterizing whether a term is implementing a probability density function or rather a generic measurable map. The integral in (7) can have a value greater than one, or even be infinite or undefined, for general $f$; in particular, $\mathrm{Red}$ would fail to be a stochastic kernel for all terms.
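The drawback can be seen numerically on a toy case. The following sketch approximates the right-hand side of (7) with a midpoint rule, taking the measurable set to be an interval; the helper `score_mass` and the two maps `density` and `not_a_density` are hypothetical and serve only as an illustration.

```python
def score_mass(f, lo, hi, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi],
    i.e. the right-hand side of (7) when U is the interval [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

def density(r):
    """A genuine probability density: uniform on [0, 1]."""
    return 1.0 if 0.0 <= r <= 1.0 else 0.0

def not_a_density(r):
    """A measurable map that is not a density: its total mass is 2."""
    return 2.0 if 0.0 <= r <= 1.0 else 0.0

mass_ok = score_mass(density, 0.0, 1.0)
mass_bad = score_mass(not_a_density, 0.0, 1.0)
print(mass_ok, mass_bad)
```

Nothing in the shape of the two maps distinguishes them statically, yet only the first yields a value bounded by one.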

This problem can be overcome by modifying the output type of a program, see e.g. (StatonYWHK16). We decided however to avoid these issues, convinced that is already expressive enough to test the category , which is the true object of study of this article.

## 4. Cones

We now study the central semantical concept of this paper: cones and stable functions between cones. Before entering into technicalities, let us provide some intuitions and motivations. A complete cone is an -semimodule together with a norm satisfying some natural axioms (Definition 4.1) and such that the unit ball defined by the norm is complete with respect to the cone order (Definition 4.2). A type of will be associated with a cone and a closed program of type will be denoted as an element in the unit ball . The order completeness of is crucial for defining the interpretation of the recursive programs (Section 7.1), as usual.

There are various notions of cone in the literature; we follow Selinger2004, who uses cones similar to those already presented in e.g. (ando1962fundamental). Let us stress two crucial features of this notion. (1) The cone order is defined by the algebraic structure and not given independently of it; this is in accordance with what happens in the category of probabilistic coherence spaces (danosehrhard). (2) The completeness of is defined with respect to the cone order, in the spirit of domain theory, rather than with respect to the norm, as is usual in the theory of Banach spaces.

A program taking inputs of type and giving outputs of type will be denoted as a map from to . The goal of Section 4.1 is to find the right properties enjoyed by such functions in order to get a cartesian closed category, namely that the set of these functions generates a complete cone compatible with the cartesian structure (which will be the denotation of the type ). It turns out that the usual notion of Scott continuity (Definition 4.10) is too weak a condition to ensure cartesian closedness (Section 4.1.1). A precise analysis of this point led us to the conclusion that these functions must also be absolutely monotonic (Definition 4.14). This latter condition is usually expressed by saying that all derivatives are everywhere non-negative; however, we define it here as the non-negativity of iterated differences. Such non-differential definitions of absolute monotonicity have been considered several times in classical analysis; see for instance (McMillan54).
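To make the gap between monotonicity and absolute monotonicity concrete, here is a minimal one-variable sketch. It assumes the classical reading of iterated differences, namely the operator $\Delta_h f(x) = f(x+h) - f(x)$ applied repeatedly with non-negative steps; the helper name `iterated_difference` is hypothetical, and the cone-level notion is the one of Definition 4.14, which this sketch does not reproduce.

```python
import math

def iterated_difference(f, x, steps):
    """Apply the difference operator f(x + h) - f(x) once for each h in steps."""
    if not steps:
        return f(x)
    h, rest = steps[0], steps[1:]
    return iterated_difference(f, x + h, rest) - iterated_difference(f, x, rest)

def square(t):
    return t * t

# Differences of the squaring map, computed exactly on integers.
first = iterated_difference(square, 1, [2])
second = iterated_difference(square, 1, [2, 3])
third = iterated_difference(square, 0, [1, 1, 1])

# The square root is non-decreasing, but its second-order difference is negative.
sqrt_first = iterated_difference(math.sqrt, 1.0, [1.0])
sqrt_second = iterated_difference(math.sqrt, 1.0, [1.0, 1.0])
print(first, second, third, sqrt_first, sqrt_second)
```

The squaring map passes every test shown, whereas the square root, although non-decreasing, already fails at order two.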

We call stable functions the Scott continuous and absolutely monotonic functions (Definition 4.27), allowing for a cpo-enriched cartesian closed structure over the category of cones. The model of needs however a further notion, that of measurability, which will be discussed in Section 5.

###### Definition 4.1 ().

A cone is an -semimodule given together with an -valued function such that the following conditions hold for all and

.

For the set is called the ball of of radius . The unit ball is . A subset of is bounded if for some .

Observe that by the second condition (homogeneity of the norm) and that if then by the last condition (monotonicity of the norm).

###### Definition 4.2 ().

Let , one writes if there is a such that . This is then unique, and we set . The relation is easily seen to be an order relation on and will be called the cone order relation of .

A cone is complete if any non-decreasing sequence of elements of has a least upper bound .

The usual laws of calculus using subtraction hold (under the restriction that all usages of subtraction must be well-defined). For instance, if satisfy then we have . Indeed, it suffices to observe that .

There are many examples of cones.

###### Example 4.3 ().

The prototypical example is with the usual algebraic operations and the norm given by . The cone is defined by taking as carrier set the set of all bounded elements of , defining the algebraic laws pointwise, and equipping it with the norm . The cone instead is given by taking as carrier set the set of all elements of such that , defining the algebraic laws pointwise, and equipping it with the norm .

###### Example 4.4 ().

Let be a probabilistic coherence space (see (danosehrhard)). Recall that this means that is a countable set (called the web) and satisfies (where, given , the set is ); there are actually two additional conditions, which are not essential here. Then we define a cone by setting , defining algebraic operations in the usual componentwise way and setting .

The cones in Example 4.3 are instances of this one.

###### Example 4.5 ().

The set of all such that for all but a finite number of indices , is a cone when setting .

###### Example 4.6 ().

Let be a measurable space. The set of all -valued measures on is a cone , algebraic operations being defined in the usual “pointwise” way (e.g. ) and norm given by . Note that we thus consider only “bounded” measures, for which the measure of the total space is finite; this is not the case for the Lebesgue measure on the whole . This is the main motivating example for the present paper. Observe that such a cone is not of the shape in general.

In all these examples, the cone order can be described in a pointwise way. For instance, when is a probabilistic coherence space, one has iff . Similarly when is a measurable space, one has iff . This is due to the fact that when this condition holds, the function is easily seen to be an -valued measure.
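For a finite underlying set, the cone of Example 4.6 and its order can be sketched directly. The class `FiniteMeasure` below is a hypothetical helper, not part of the paper; it assumes that measures are given by point masses and that the norm is the total mass, and it uses exact rationals to avoid rounding issues.

```python
from fractions import Fraction

class FiniteMeasure:
    """A finite measure on a finite set, stored as point masses (hypothetical helper)."""

    def __init__(self, masses):
        assert all(m >= 0 for m in masses.values())
        self.masses = dict(masses)

    def __call__(self, subset):
        """Measure of a subset of the underlying set."""
        return sum(self.masses.get(p, 0) for p in subset)

    def __add__(self, other):
        points = set(self.masses) | set(other.masses)
        return FiniteMeasure(
            {p: self.masses.get(p, 0) + other.masses.get(p, 0) for p in points})

    def scale(self, alpha):
        assert alpha >= 0
        return FiniteMeasure({p: alpha * m for p, m in self.masses.items()})

    def norm(self):
        """Total mass, assumed here to be the norm of the cone."""
        return sum(self.masses.values())

    def leq(self, other):
        """Cone order; on a finite set it suffices to compare point masses."""
        return all(m <= other.masses.get(p, 0) for p, m in self.masses.items())

    def minus(self, other):
        """self - other, defined only when other <= self in the cone order."""
        assert other.leq(self)
        return FiniteMeasure(
            {p: m - other.masses.get(p, 0) for p, m in self.masses.items()})

    def __eq__(self, other):
        points = set(self.masses) | set(other.masses)
        return all(self.masses.get(p, 0) == other.masses.get(p, 0) for p in points)

mu = FiniteMeasure({"a": Fraction(1, 4), "b": Fraction(1, 4)})
nu = FiniteMeasure({"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)})
diff = nu.minus(mu)  # the witness of mu <= nu required by Definition 4.2
print(diff.masses, diff.norm())
```

Here the pointwise comparison and the existence of a difference measure agree, and the norm is monotone and homogeneous on the values tried, in line with the observations following Definition 4.1.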

All the examples above, except Example 4.5, are complete cones.

###### Lemma 4.7 ().

is complete iff any bounded non-decreasing sequence has a least upper bound which satisfies .

###### Definition 4.8 ().

Let and be cones. A bounded map from to is a function such that for some ; the greatest lower bound of these ’s is called the norm of and is denoted as .

###### Lemma 4.9 ().

Let be a bounded map from to , then and .

###### Definition 4.10 ().

A function is linear if it commutes with sums and scalar multiplication. A Scott-continuous function from a complete cone to a complete cone is a bounded map from to (recall that then , in accordance with Definition 4.8) which is non-decreasing and commutes with the lubs of non-decreasing sequences.

###### Lemma 4.11 ().

Let be a complete cone. Addition is Scott-continuous and scalar multiplication is Scott-continuous .

Proofs are easy, see (Selinger2004). The cartesian product of cones is defined in the obvious way (see Figure 3(a)).

###### Definition 4.12 ().

Let be a cone and let . We define a new cone (the local cone of at ) as follows. We set and

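$$\|x\|_{P_u}=\inf\{1/\varepsilon \mid \varepsilon>0 \text{ and } \varepsilon x+u\in\mathcal{B}P\}=\bigl(\sup\{\varepsilon \mid \varepsilon>0 \text{ and } \varepsilon x+u\in\mathcal{B}P\}\bigr)^{-1}.$$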
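The local norm can be approximated numerically straight from this formula. The sketch below assumes, purely for illustration, that the cone consists of finite tuples of non-negative reals with the sum norm, that the point at which the local cone is taken lies strictly inside the unit ball, and that the argument is non-zero; the names `in_unit_ball` and `local_norm` are hypothetical.

```python
def l1_norm(v):
    return sum(v)

def in_unit_ball(v):
    """Membership in the unit ball of the assumed cone (non-negative tuples, sum norm)."""
    return all(c >= 0 for c in v) and l1_norm(v) <= 1.0

def local_norm(x, u, iterations=60):
    """Approximate 1 / sup{eps > 0 : eps * x + u is in the unit ball} by bisection."""
    assert in_unit_ball(u) and l1_norm(u) < 1.0
    assert all(c >= 0 for c in x) and any(c > 0 for c in x)

    def feasible(eps):
        return in_unit_ball([eps * xi + ui for xi, ui in zip(x, u)])

    lo, hi = 0.0, 1.0
    while feasible(hi):          # enlarge the bracket until hi is infeasible
        lo, hi = hi, 2.0 * hi
    for _ in range(iterations):  # shrink the bracket around the supremum
        mid = (lo + hi) / 2.0
        if feasible(mid):
            lo = mid
        else:
            hi = mid
    return 1.0 / lo

at_origin = local_norm([1.0, 1.0], [0.0, 0.0])
shifted = local_norm([1.0, 1.0], [0.25, 0.25])
print(at_origin, shifted)
```

For this particular cone the result agrees with the closed form obtained by dividing the norm of the argument by one minus the norm of the base point, so moving the base point towards the boundary of the unit ball inflates the local norm.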

Given a sequence of elements of a cone s.t.