Probabilistic Stable Functions on Discrete Cones are Power Series (long version)

05/01/2018
by   Raphaëlle Crubillé, et al.

We study the category Cstabm of measurable cones and measurable stable functions, which is a denotational model of a higher-order language with continuous probabilities and full recursion. We look at Cstabm as a model for discrete probabilities by showing the existence of a cartesian closed, full and faithful functor that embeds probabilistic coherence spaces (a fully abstract denotational model of a higher-order language with full recursion and discrete probabilities) into Cstabm. The proof is based on a generalization of Bernstein's theorem from real analysis, which allows us to see stable functions between discrete cones as generalized power series.


1 Introduction

Probabilistic reasoning allows us to describe the behavior of systems with inherent uncertainty, or about which we have incomplete knowledge. To handle statistical models, one can employ probabilistic programming languages: they give us tools to build, evaluate and transform such models. While for some applications it is enough to consider discrete probabilities, we sometimes want to model systems where the underlying space of events has inherently continuous aspects, for instance hybrid control systems [alur1996hybrid], as used e.g. in flight management. In the machine learning community [gordon2014probabilistic, goodman2013principles], statistical models are also used to express our beliefs about the world, which we may then update using Bayesian inference, that is, the ability to condition the values of variables on observations.

As a consequence, several continuous probabilistic languages have been introduced and studied, such as Church [church] and Anglican [anglican], as well as formal operational semantics for them [borgstrom2016lambda]. Giving a fully abstract denotational semantics to a higher-order probabilistic language with full recursion, however, has proved to be harder than in the non-probabilistic case. For discrete probabilities, there have been two such fully abstract models: in [danos2002probabilistic], Danos and Harmer introduced a fully abstract denotational semantics of a probabilistic extension of Idealized Algol, based on game semantics; and in [pcsaamohpc], Ehrhard, Pagani and Tasson showed that the category of probabilistic coherence spaces gives a fully abstract model for a discrete probabilistic variant of Plotkin's PCF.

While there is currently no known fully abstract denotational semantics for a higher-order language with full recursion and continuous probabilities, several denotational models have been introduced. The pioneering work of Kozen [kozen1979semantics] gave a denotational semantics to a first-order while-language endowed with a random real number generator. In [staton2016semantics], Staton et al. give a denotational semantics to a higher-order language: they first develop a distributive category based on measurable spaces as a model of the first-order fragment of their language, and then extend it into a cartesian closed category using a standard construction based on the functor category.

Recently, Ehrhard, Pagani and Tasson introduced in [pse] the category Cstabm, as a denotational model of an extension of PCF with continuous probabilities. It is presented as a refinement, with measurability constraints, of the category of abstract cones and so-called stable functions between cones, which generalize the absolutely monotonic functions of real analysis.

Here, we look at the category Cstabm from the point of view of discrete probabilities. It was noted in [pse] that there is a natural way to see any probabilistic coherence space as a cone. In this work, we show that this connection leads to a full and faithful functor from the Kleisli category of probabilistic coherence spaces into the category of cones and stable functions. This is done by showing that every stable function between probabilistic coherence spaces can be seen as a power series, using the extension to an abstract setting of Bernstein's theorem for absolutely monotonic functions shown by McMillan [mcmillan]. We then show that the functor we have built is cartesian closed, i.e. that it respects the cartesian closed structure of the Kleisli category. In the last part, we turn this functor into a functor with target Cstabm, and we show that it too is cartesian closed.

To sum up, the contribution of this paper is to show that there is a cartesian closed full embedding from the Kleisli category of probabilistic coherence spaces into Cstabm. Since the former is known to be a fully abstract denotational model of discrete probabilistic PCF, a corollary of this result is that Cstabm too is a fully abstract model of that language.

2 Discrete and Continuous Probabilistic Extensions of PCF: an Overview

A simple way to add probabilities to a (higher-order) programming language is to add a fair probabilistic choice operator to the syntax. Such an approach has been applied to various extensions of the λ-calculus [DLZ]. To fix ideas, we give here the syntax of a (minimal) probabilistic variant of Plotkin's PCF [plotkin1977lcf]. It is a typed language, whose types are generated from a base type of natural numbers and the function type constructor. The programs are generated as follows:

The language has a fair probabilistic choice operator, a recursion operator, and constants ranging over natural numbers. The ifz construct tests whether its first argument (of the base type of natural numbers) is 0; it reduces to its second argument if this is the case, and to its third otherwise. We endow this language with a natural operational semantics [EPT15], which we choose to be call-by-name. However, for expressiveness we need to be able to simulate a call-by-value discipline on terms of ground type: this is enabled by the let-construct.

We can see that the kind of probabilistic behavior captured by this language is discrete, in the sense that it manipulates distributions on countable sets. In [pcsaamohpc], Ehrhard and Danos introduced a model of Linear Logic designed to lead to denotational models for discrete higher-order probabilistic computation: the category of probabilistic coherence spaces (PCSs). It was indeed shown in [EPT15] that the Kleisli category of the category of PCSs is a fully abstract model of this language, while its Eilenberg-Moore category is a fully abstract model of a probabilistic variant of Levy's call-by-push-value calculus.

We now illustrate on examples the ideas behind the denotational semantics of this language in the Kleisli category of PCSs. The basic idea is that the denotation of a program is a vector of non-negative reals indexed by the countable set of possible outcomes. For instance, the denotation of a program of natural-number type is a vector whose n-th component gives the probability of obtaining the outcome n. Morphisms in this category, on the other hand, can be seen as analytic functions (i.e. power series) between real vector spaces. Let us look at the denotation of the simple program below.

where is the usual encoding of a never terminating term using the recursion operator. The denotation of consists of the following function :

We can see that this function indeed gives the probability of obtaining each outcome when the program is passed a term with the given vector as denotation. Observe that here the denotation is a polynomial in its argument; however, since we have recursion in our language, there are programs that make an unbounded number of calls to their arguments: their denotations are then no longer polynomials, but they are still analytic functions. The analytic nature of morphisms plays a key role in the proof of full abstraction.
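To make the contrast between polynomial and power-series denotations concrete, here is a minimal Python sketch. Both programs are hypothetical illustrations and are not the program discussed above: `two_calls` stands for a term that evaluates its argument twice and returns 0 only if both evaluations yield 0, and `retry_until_zero` stands for a recursive term that re-evaluates its argument until it yields 0. A vector of outcome probabilities is represented as a dictionary from outcomes to exact fractions.

```python
from fractions import Fraction

def two_calls(x):
    """Probability of returning 0 when the argument must evaluate to 0 twice.

    This is the polynomial x_0 ** 2 in the argument vector x.
    """
    p0 = x.get(0, Fraction(0))
    return p0 * p0

def retry_until_zero(x, terms):
    """Partial sum of the series sum_k (non-zero mass) ** k * x_0.

    Each extra term accounts for one more call to the argument, so the
    full denotation is a power series rather than a polynomial.
    """
    p0 = x.get(0, Fraction(0))
    nonzero = sum((p for n, p in x.items() if n != 0), Fraction(0))
    return sum((nonzero ** k * p0 for k in range(terms)), Fraction(0))

# Hypothetical argument: a fair choice between the outcomes 0 and 1.
fair_bit = {0: Fraction(1, 2), 1: Fraction(1, 2)}
print(two_calls(fair_bit))
print([retry_until_zero(fair_bit, n) for n in (1, 2, 3)])
```

The partial sums of the second function increase with the number of terms, which is the behavior one expects from a series with non-negative coefficients.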

Observe that this way of building a model is entirely dependent on the fact that we consider discrete probabilities over a countable set of values. In recent years, however, there has been much focus on continuous probabilities in higher-order languages. The aim is to be able to handle classical mathematical distributions on the reals, such as normal (Gaussian) distributions, which are widely used to build generic physical or statistical models, as well as transformations over these distributions.

We illustrate the basic idea here by presenting, following [pse], a language that can be seen as the continuous counterpart of the discrete language above. It is a typed language whose base type is the type of real numbers, and whose terms are generated as follows:

where the constants range over all real numbers, and the function symbols range over a fixed countable set of measurable functions on the reals. The constant sample stands for the uniform distribution over the unit interval. Observe that admitting measurable functions as primitives in the language allows us to encode every distribution that can be obtained in a measurable way from the uniform distribution, for instance normal (Gaussian) distributions. This language is actually expressive enough to simulate other probabilistic features, such as Bayesian conditioning, as highlighted in [pse]. Moreover, we can argue that it is also more general than the discrete language: first, it allows us to encode integers (since they are included in the reals) and basic arithmetic operations over them. Secondly, since the order relation on the reals is measurable, we can construct terms like this one:

which encodes a fair choice between two terms.
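The idea of encoding a fair choice from a uniform sample and a comparison can be sketched in Python as follows. This is only an illustration under assumptions: the function name `fair_choice`, the threshold 1/2 and the use of thunks to mimic call-by-name are choices made here, not the syntax of the language of [pse].

```python
import random

def fair_choice(m, n, rng):
    """Evaluate thunk m if a uniform sample is at most 1/2, thunk n otherwise."""
    return m() if rng.random() <= 0.5 else n()

# Seeded generator so that the run is reproducible.
rng = random.Random(0)
draws = [fair_choice(lambda: "M", lambda: "N", rng) for _ in range(10_000)]
freq_m = draws.count("M") / len(draws)
print(freq_m)  # empirical frequency of the first branch, close to 1/2
```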

We see, however, that the category of PCSs cannot be a model for this continuous language: indeed, it does not even seem possible to write a probabilistic coherence space for the real type. In [pse], Ehrhard, Pagani and Tasson introduced the cartesian closed category Cstabm of measurable cones and measurable stable functions, and showed that it provides an adequate and sound denotational model for it. The denotation of the base type is taken as the set of finite measures over the reals, and the denotation of higher-order types is then built naturally using the cartesian closed structure. From there, it is natural to ask ourselves: how good is Cstabm as a model of probabilistic higher-order languages?

The present paper is devoted to giving a partial answer to this question, in the case where we restrict ourselves to a discrete fragment of the continuous language. To make more precise what we mean, let us consider a continuous language with an explicit discrete fragment, which has both the natural numbers and the reals as base types: we consider the language with all syntactic constructs of both the discrete and the continuous language, as well as an operator real with the typing rule:

designed to enable the continuous constructs to act on the discrete fragment, by giving a way to see any distribution on the natural numbers as a distribution on the reals. We can indeed extend in a natural way the denotational semantics given in [pse] to this language: in the same way that the denotational semantics of the type of reals is taken as the set of all finite measures on the reals, we take the denotational semantics of the type of natural numbers as the set of all finite measures over the natural numbers. We take as denotational semantics of the operator real the function:

We will see later that this function is indeed a morphism in Cstabm. What we would like to know is: what is the structure of the sub-category of Cstabm given by the discrete types of this language, i.e. the types generated inductively from the type of natural numbers?

The starting point of our work is the connection highlighted in [pse] between PCSs and complete cones: every PCS can be seen as a complete cone, in such a way that the denotational semantics of the type of natural numbers becomes the set of finite measures over the natural numbers. We formalize this connection by a functor. However, to be able to use this functor to obtain information about the discrete types sub-category of Cstabm, we need to know whether this connection is preserved at higher-order types: does the function space construct in Cstabm make some wild functions appear that are not representable in the category of PCSs, e.g. functions that are not analytic? The main technical part of this paper consists in showing that this is not the case, meaning that the functor is full and faithful, and cartesian closed. It tells us that the discrete types sub-category of Cstabm has actually the same structure as the corresponding subcategory of the Kleisli category of PCSs. Since the latter is a fully abstract model of the discrete language, it tells us that the discrete fragment of the combined language is fully abstract in Cstabm.

3 Cones and Stable Functions

The category of measurable cones and measurable stable functions (Cstabm) was introduced by Ehrhard, Pagani and Tasson in [pse] with the aim of giving a model for the continuous language of Section 2.

They actually introduced it as a refinement of the category of complete cones and stable functions. Stable functions on cones are a generalization of the well-known absolutely monotonic functions of real analysis: those functions which are infinitely differentiable, and such that moreover all their derivatives are non-negative. The relevance of such functions comes from a result due to Bernstein: every absolutely monotonic function coincides with a power series. Moreover, it is possible to characterize absolutely monotonic functions without explicitly asking for them to be differentiable: they are exactly those functions such that all the so-called higher-order differences, which are quantities defined only by sums and differences of values of the function, are non-negative (see [widder], chapter 4). The definition of pre-stable functions in [pse] generalizes this characterization.
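The difference-based characterization can be explored numerically. The sketch below uses the classical equal-step forward difference of order n as one concrete instance of a higher-order difference (an assumption about the intended form, chosen because it is the textbook one), and compares the exponential, which is absolutely monotonic, with the square root, which is not.

```python
from math import comb, exp, sqrt

def forward_difference(f, x, h, n):
    """n-th forward difference of f at x with step h.

    Computed as sum_k (-1)^(n-k) * C(n, k) * f(x + k*h), i.e. only from
    sums and differences of values of f.
    """
    return sum((-1) ** (n - k) * comb(n, k) * f(x + k * h) for k in range(n + 1))

# exp has all derivatives non-negative: every difference is non-negative.
exp_diffs = [forward_difference(exp, 0.1, 0.1, n) for n in range(6)]
# sqrt is increasing but concave: its second difference is negative.
sqrt_second = forward_difference(sqrt, 0.1, 0.1, 2)
print(exp_diffs)
print(sqrt_second)
```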

In this section, we first recall basic facts about cones and stable functions, all extracted from  [pse]. Then we will prove a generalization of Bernstein’s theorem for pre-stable functions over a particular class of cones, which is the main technical contribution of this paper. We will do that following the work of McMillan on a generalization of Bernstein’s theorem for functions ranging over abstract domains endowed with partition systems, see [mcmillan].

3.1 Cones

The use of a notion of cones in denotational semantics to deal with probabilistic behavior goes back to Kozen in [kozen1979semantics]. We take here the same definition of cone as in [pse].

Definition 1

A cone is an ℝ₊-semimodule given together with an ℝ₊-valued function, called the norm of the cone, and verifying:

The most immediate example of a cone is the non-negative real half-line, when we take the identity as norm. Another example is the positive quadrant of the 2-dimensional plane, endowed with the Euclidean norm. In a way, the notion of cone generalizes the idea of a space where all elements are non-negative. This analogy gives us a generic way to define a pre-order, using the addition of the cone structure.

Definition 2

Let C be a cone. Then we define a partial order ≤ on C by: x ≤ y if there exists z in C with x + z = y.

We define the unit ball of a cone as the set of its elements of norm smaller than or equal to 1. Moreover, we will also be interested in the open unit ball, defined as the set of elements of norm strictly smaller than 1.
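As a small illustration, the order and the two unit balls can be written down for the positive quadrant of the plane with the Euclidean norm, the second example mentioned above. The function names are chosen here for illustration only.

```python
from math import hypot

def leq(x, y):
    """x <= y in the quadrant: y - x must itself lie in the quadrant."""
    return all(b - a >= 0 for a, b in zip(x, y))

def norm(x):
    """Euclidean norm of a point of the plane."""
    return hypot(x[0], x[1])

def in_unit_ball(x):
    return norm(x) <= 1

def in_open_unit_ball(x):
    return norm(x) < 1

print(leq((0.2, 0.3), (0.5, 0.3)))   # the difference (0.3, 0.0) is in the quadrant
print(leq((0.2, 0.3), (0.1, 0.9)))   # the difference has a negative coordinate
print(in_unit_ball((1.0, 0.0)), in_open_unit_ball((1.0, 0.0)))
```

Note that two points of the quadrant need not be comparable: the order is partial.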

In [pse], the authors restrict themselves to cones verifying a completeness criterion: it allows them to define the denotation of the recursion operator of their language, by enforcing the existence of fixpoints.

Definition 3

A cone is said to be:

  • sequentially complete if any non-decreasing sequence of elements of has a least upper bound .

  • directed complete if for any directed subset of , has a least upper bound .

  • a lattice cone if any two elements of have a least upper bound .

Observe that a directed-complete cone is always sequentially complete.

Lemma 1

Let be a lattice cone. Then it holds that:

  • Any two elements of the cone have a greatest lower bound.

  • Decomposition Property: if z ≤ x + y, then there exist z₁ and z₂ such that z = z₁ + z₂, z₁ ≤ x, and z₂ ≤ y.

  • Recall that, if x ≤ y, we denote by y − x the element z such that x + z = y.

    • We consider , and we show that is indeed the greatest lower bound of and .

    • We take , and . First, we see that , and so . Moreover, .
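Both properties can be checked on a concrete lattice cone. The sketch below works in the positive orthant with the componentwise order, where least upper bounds are componentwise maxima. The formulas used (the greatest lower bound obtained as x + y minus the least upper bound, and the decomposition z₁ = z ∧ x, z₂ = z − z₁) are one standard choice, stated here as an assumption rather than as the exact construction of the proof.

```python
def leq(x, y):
    return all(a <= b for a, b in zip(x, y))

def lub(x, y):
    """Least upper bound in the orthant: componentwise maximum."""
    return tuple(max(a, b) for a, b in zip(x, y))

def glb(x, y):
    """Greatest lower bound, computed as x + y - lub(x, y)."""
    join = lub(x, y)
    return tuple(a + b - j for a, b, j in zip(x, y, join))

def decompose(z, x, y):
    """Split z <= x + y as z = z1 + z2 with z1 <= x and z2 <= y."""
    z1 = glb(z, x)
    z2 = tuple(c - a for c, a in zip(z, z1))
    return z1, z2

x, y, z = (3, 1), (1, 4), (2, 5)
z1, z2 = decompose(z, x, y)
print(glb(x, y), z1, z2)
```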

We illustrate Definition 3 by giving the complete cone used in [pse] as the denotational semantics of the base type.

Example 1

We take the set of finite measures over ℝ, with the norm of a measure μ given by its total mass μ(ℝ). It is a directed-complete cone. For every real number r, the denotational semantics of the constant term r in [pse] is δ_r, the Dirac measure at r, defined by taking δ_r(U) = 1 if r ∈ U, and δ_r(U) = 0 otherwise.

In a similar way, for any measurable space X, we define the directed-complete cone of finite measures over X.
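A minimal sketch of this cone, restricted to measures that are finite non-negative combinations of Dirac measures: a measure is represented as a function from sets to numbers, and a set as a membership predicate. This representation and the names `dirac`, `combine` and `norm` are assumptions made for the illustration.

```python
def dirac(r):
    """Dirac measure at r: mass 1 on sets containing r, 0 otherwise."""
    return lambda U: 1.0 if U(r) else 0.0

def combine(weighted):
    """Non-negative combination sum_i a_i * mu_i of finitely many measures."""
    return lambda U: sum(a * mu(U) for a, mu in weighted)

def norm(mu):
    """Norm of a measure: its total mass, i.e. the measure of the whole space."""
    return mu(lambda t: True)

unit_interval = lambda t: 0.0 <= t <= 1.0
mu = combine([(0.25, dirac(0.5)), (0.5, dirac(2.0))])
print(mu(unit_interval), norm(mu))
```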

In [pse], the authors only ask the cones they consider to be sequentially complete. This is because they want to add measurability requirements to their cones and, as a rule, sequential completeness interacts better with measurability than directed completeness, since measurable sets are closed under countable unions but not under arbitrary unions. We illustrate this point in the example below.

Example 2

Let X be a measurable space, and μ a finite measure on X. We consider the cone of non-negative measurable functions on X, with the norm of a function given by its integral with respect to μ. Lebesgue's Monotone Convergence Theorem shows that this cone is sequentially complete, but it is not directed complete.

In this work however, we are only interested in cones arising from probabilistic coherence spaces, in a way we will develop in Section 4. Since those cones have an underlying discrete structure, we will be able to show that they are actually directed complete. We will need this information, since we will apply McMillan's results [mcmillan], obtained in the more general framework of abstract domains with partitions, in which he asks for directed completeness. This is because directed completeness also ensures the existence of infima, as stated in the lemma below.

Lemma 2

If a cone is:

  • sequentially complete, then every non-increasing sequence has a greatest lower bound .

  • directed complete, then for every directed for the reverse order, has a greatest lower bound .

  • We do the proof when the cone is directed complete, but it is exactly the same in the sequentially complete case. Let D be a set directed for the reverse order. If all elements of D are zero, then its greatest lower bound is 0. Otherwise, let x be an element of D. We consider the subset of the elements x − y, for y in D with y ≤ x. It is easy to see that it is a directed subset, which means that, since the cone is directed complete, it has a supremum s. So we can take x − s, and we show that it is the greatest lower bound of D.

It is shown in [pse] that the addition and multiplication by a scalar are Scott-continuous in complete cones, in a sequential sense. In directed complete cones, it holds also in a directed sense.

Lemma 3

The addition and the scalar multiplication are Scott-continuous:

  • for any directed subsets and of , and of :

    and
  • for any reverse directed subsets , of , and of :

    and

3.2 Pre-Stable Functions between Cones

As said before, the notion of pre-stable function is a generalization of the notion of absolutely monotonic real functions. More precisely, the idea is to define so-called higher-order differences, and to specify that they must be all non-negative.

First, we want to be able to talk about those tuples (u₁, …, uₙ) such that x + u₁ + … + uₙ stays in the unit ball, for a fixed n and a fixed x in the unit ball. To that end, we introduce a cone whose unit ball consists exactly of such elements. It is an adaptation of the definition given in [pse] for the case where n = 1, and we show in the same way that it is indeed a cone.

Definition 4 (Local Cone)

Let be a cone, , and . We call -local cone at , and we denote the cone endowed with the following norm:

We can show that whenever is a directed-complete cone, is also directed-complete.

For n ∈ ℕ, we use P₊(n) (respectively P₋(n)) for the set of all subsets I of {1, …, n} such that n − #I is even (respectively odd).

We are now ready to introduce higher-order differences. Since we have only explicit addition, not subtraction, we define separately the positive part and the negative part of those differences: For , , , and , we define:

Definition 5

We say that is pre-stable if, for every , for every , , it holds that:

If is pre-stable, we will set . Observe that the quantity is actually symmetric in , i.e. stable under permutations of the coordinates of .
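The positive and negative parts of a higher-order difference can be computed by enumerating subsets of the directions. The sketch below assumes the following reading, recalled from [pse] and to be checked against it: the positive (respectively negative) part sums the values of the function at x plus the directions indexed by I, over all subsets I such that n minus the size of I is even (respectively odd), and pre-stability asks that the negative part never exceeds the positive part. The two test functions are hypothetical examples on the positive quadrant.

```python
from itertools import combinations
from math import sqrt

def shifted(x, us):
    """Return x plus the sum of the directions in us, componentwise."""
    point = list(x)
    for u in us:
        point = [p + c for p, c in zip(point, u)]
    return tuple(point)

def difference_parts(f, x, us):
    """Positive and negative parts of the difference of f at x along us."""
    n = len(us)
    plus, minus = 0, 0
    for size in range(n + 1):
        # combinations works on positions, so repeated directions are fine.
        for chosen in combinations(us, size):
            value = f(shifted(x, chosen))
            if (n - size) % 2 == 0:
                plus += value
            else:
                minus += value
    return plus, minus

# A polynomial with non-negative coefficients passes the check here.
poly = lambda p: p[0] * p[1] + p[0] ** 2
# The square root of the first coordinate fails it at order 2.
root = lambda p: sqrt(p[0])

print(difference_parts(poly, (1, 2), [(1, 0), (0, 1)]))
print(difference_parts(root, (0, 0), [(1, 0), (1, 0)]))
```

With a single direction, the condition reduces to monotonicity of the function; with no direction, it only asks that the value be non-negative.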

Definition 6

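A function is called a stable function from a cone C to a cone D if it is pre-stable, sequentially Scott-continuous, and moreover bounded: there exists a constant λ such that the image of the unit ball of C is contained in the ball of radius λ of D.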

Definition 7

This defines the category whose objects are sequentially complete cones, and whose morphisms from C to D are the stable functions from C to D that map the unit ball of C into the unit ball of D.

It was shown in [pse] that it is possible to endow this category with a cartesian closed structure: the product cone is built on the cartesian product of the underlying sets, and the function cone is the set of all stable functions. It was also shown in [pse] that these cones are indeed sequentially complete, and that least upper bounds in the function cone are computed pointwise. We will also use the cone of pre-stable functions, which is also sequentially complete.

3.3 A generalization of Bernstein’s theorem for pre-stable functions

We are now going to show an analogue of Bernstein's Theorem for pre-stable functions on directed-complete cones. The idea is to first define an analogue of derivatives for pre-stable functions, and to show that pre-stable functions can be written as the infinite sum generated by an analogue of Taylor expansion. This result is actually an application of McMillan's work [mcmillan] in the setting of abstract domains. Here, we give the main steps of the construction directly on cones, and highlight some properties of the Taylor series which are true for cones, but not in the general framework McMillan considered.

3.3.1 Derivatives of a pre-stable function

We are now going, following McMillan [mcmillan], to construct derivatives for pre-stable functions on directed-complete cones. This construction is based on a notion of partition: a partition of an element x is a finite multiset [u₁, …, uₙ] of elements of the cone such that u₁ + … + uₙ = x. We denote by + the usual union of multisets, and we consider below the set of all partitions of x.

Definition 8 (Refinement Preorder)

If π and ρ are partitions, we say that π ≤ ρ if π = [u₁, …, uₙ] and ρ = ρ₁ + … + ρₙ, with each ρᵢ a partition of uᵢ.

Observe that when π and ρ are partitions of the same element x, π ≤ ρ means that ρ is a more finely grained decomposition of x. We extend the refinement order componentwise to n-tuples of partitions.

Lemma 4

Let C be a lattice cone. Then for every x in C, the set of partitions of x, ordered by refinement, is a directed set.

  • We are going to use the following notion: we say that two non-zero elements x and y are orthogonal, and we write x ⊥ y, if x ∧ y = 0. Let π = [u₁, …, uₙ] be a partition of x. We first show that there cannot exist a non-zero w ≤ x which is orthogonal to all the elements of π. Indeed, suppose that it is the case. Then by hypothesis, w ≤ u₁ + … + uₙ. We can now use the decomposition property from Lemma 1. It means that w = w₁ + … + wₙ, with wᵢ ≤ uᵢ. But since w ⊥ uᵢ for all i, and wᵢ is below both w and uᵢ, it holds that wᵢ = 0 for all i, and so w = 0, and we have a contradiction.

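    Now, we are going to present a procedure to construct, from two partitions π₁ and π₂ of x, a partition ρ of x with π₁ ≤ ρ and π₂ ≤ ρ. We can suppose that all elements of π₁ and π₂ are non-zero. We start from the two given partitions and an empty partition under construction. Throughout the procedure, we guarantee: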

    • , , and ;

    • all the elements of and are non-zero;

    • , and (for the refinement order).

    Then at each step of the procedure, if is non empty, we do the following: let , and . Then we know that there is a , such that and are not orthogonal. We modify the variables as follows:

    At every step of the procedure presented above, the quantity:

    decreases. Indeed:

    • or we remove either of , or of , and then the statement above holds.

    • or we replace by , and by . Then we see that . Moreover, the pairs that were orthogonal before are still orthogonal: indeed for every with it holds that , and the same for .

    As a consequence, the procedure terminates. It means that we reach a state where is empty, and all the invariants presented above hold. Then we see that , and .

    We are going to illustrate the procedure above on a very basic example. We consider the cone consisting of the positive quadrant of , endowed with the order defined as: if , and . We take two partitions of a vector : , and , where are taken as pictured in Figure 1(a). We are going to apply our procedure in order to obtain a refinement of both and . At the beginning, we have , , , .

    • The first step is represented in Figure 1(a). Observe that the procedure is actually non-deterministic: we may choose any with , , and and not orthogonal. Here, we choose to start from . We take (and we represent it by a red vector in Figure 1(a)): it is going to be the first element of our new partition . Accordingly, we take . We now update the partitions and into partitions of : becomes , and becomes where and are also represented in red in Figure 1(a).

    • The second step is represented in Figure 1(b). Observe that now and are orthogonal, so we have to choose another pair. Here, we choose . As before, we add to the glb of and : we obtain . Observe that now (as can be seen in Figure 1(b)) , and so . So when we update the partitions and , we take: , and where is represented in purple in Figure 1(b).

    • By doing two more steps of the procedure, we see that the final partition is . We can see by looking at Figure 1(b) that it is indeed a refinement of both and .

    Figure 1: Illustration of the proof of Lemma 4. (a) First step of the procedure. (b) Second step of the procedure.
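The refinement procedure of the proof can be sketched in Python for the positive orthant with the componentwise order, where the greatest lower bound is the componentwise minimum. The function names and the deterministic choice of the first non-orthogonal pair are assumptions made here; the procedure described above leaves that choice open. The inputs are assumed to be two partitions of the same vector.

```python
def meet(u, v):
    """Greatest lower bound in the orthant: componentwise minimum."""
    return tuple(min(a, b) for a, b in zip(u, v))

def minus(u, w):
    return tuple(a - b for a, b in zip(u, w))

def nonzero(u):
    return any(c != 0 for c in u)

def common_refinement(pi1, pi2):
    """Build a partition refining both pi1 and pi2 (partitions of one vector)."""
    left = [u for u in pi1 if nonzero(u)]
    right = [v for v in pi2 if nonzero(v)]
    rho = []
    while left:
        u = left[0]
        # Some element of the other partition is not orthogonal to u.
        j = next(k for k, v in enumerate(right) if nonzero(meet(u, v)))
        w = meet(u, right[j])
        rho.append(w)  # the glb becomes an element of the new partition
        rest_u, rest_v = minus(u, w), minus(right[j], w)
        # Replace u and right[j] by what is left of them, dropping zeros.
        left = ([rest_u] if nonzero(rest_u) else []) + left[1:]
        right = right[:j] + ([rest_v] if nonzero(rest_v) else []) + right[j + 1:]
    return rho

rho = common_refinement([(1, 2), (2, 0)], [(3, 0), (0, 2)])
total = tuple(sum(c) for c in zip(*rho))
print(rho, total)
```

In this run, the element (1, 2) of the first partition is recovered as (1, 0) + (0, 2), and the element (3, 0) of the second as (1, 0) + (2, 0), so the result refines both inputs.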

Observe that, as a consequence, the refinement preorder also turns the set of n-tuples of partitions into a directed set.

Definition 9 (from [mcmillan])

Let be a lattice cone, a cone, and let be a pre-stable function. Then for every , and , we define as:

It holds (see [mcmillan] for more details) that is a non-increasing function whenever is pre-stable (it is shown in Lemma 3.2 of [mcmillan] by looking at the definition of higher-order differences). Since is a directed set, has a greatest lower bound whenever is a directed-complete lattice cone.

Definition 10 (from [mcmillan])

Let be a lattice cone, a directed-complete lattice cone, and a pre-stable function. Let be . Then the derivative of in at rank towards the direction is the function defined as

We are now going to illustrate Definition 10 on a basic case where we take , in order to highlight the link with differentiation in real analysis.

Example 3

We take and as the positive real half-line, and . Let be such that . Then:

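We know already, since the function is pre-stable, hence absolutely monotonic as a function on the reals, that it is convex, and moreover differentiable (see [widder]). From there, by considering a particular family of partitions, we can show that the derivative of Definition 10 coincides with the usual derivative applied to the direction.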

  • First, let be any partition of . Since is differentiable and convex, it holds that:

    As a consequence, we see that for any partition of , it holds that , and it implies that . To show the reverse inequality, it is enough to consider the particular family of partitions of : we see that
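The two inequalities of this argument can be observed numerically. The sketch below assumes that, at rank 1, the quantity attached to a partition of the direction u is the sum of the increments f(x + v) − f(x) over the elements v of the partition, and it uses the exponential as a hypothetical absolutely monotonic function. Uniform partitions into k equal pieces give sums that decrease towards f'(x)·u as k grows, while never going below it.

```python
from math import exp

def partition_sum(f, x, partition):
    """Sum of the first-order increments of f at x over a partition."""
    return sum(f(x + v) - f(x) for v in partition)

def uniform_partition(u, k):
    """Partition of the non-negative real u into k equal pieces."""
    return [u / k] * k

x, u = 0.2, 0.5
# Each partition in this list refines the previous one (k doubles).
sums = [partition_sum(exp, x, uniform_partition(u, 2 ** j)) for j in range(11)]
limit = exp(x) * u  # usual derivative of exp at x, applied to the direction u
print(sums[0], sums[-1], limit)
```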

Lemma 5

Let be a lattice cone, a directed complete cone, a pre-stable function from to . Let be . Then is a symmetric function such that moreover:

  • .

  • Both and are pre-stable functions from to .

  • The proof is given in Lemma 3.31 in [mcmillan]. It comes almost directly from Definition 10.

We have seen in Example 3 that our so-called derivatives of pre-stable functions play the same role as the differential of a differentiable function, which are actually linear operators . While the abstract domains considered in [mcmillan] do not have to be semi-modules, so have no notion of linearity, we are able to show in the complete cone case that the are linear in the sense of Lemma 6 below.

Lemma 6

Let , be two directed complete lattice cones, .

  • Let be a pre-stable function. Then is -linear, in the sense that, for each of its arguments, it commutes with the sum and multiplication by a scalar.

  • For any , the function is linear and directed Scott-continuous.

  • We are going to use the following auxiliary lemma:

    Lemma 7 (from [mcmillan])

    Let and be two directed cones, and linear and non-decreasing, such that moreover for all subset of directed for the reverse order, . Then is directed Scott-continuous.

    • Let be a directed subset of . We define . Since is directed, is directed for the reverse order, and as a consequence: . But we see that . Therefore, since is linear, . As a consequence (and again by linearity of ): .

    We are now going to show Lemma 6.

    • We first show that is -linear. The additivity is given by Lemma 3.72 of [mcmillan]. The commutation with scalar multiplication is not proved in this form in [mcmillan], because a more general notion of system of partitions is used there. We first show that the result holds when is a rational number. To do that, we use the fact that is always a partition of . Then, let and such that both and are in . Let be two sequences of rational numbers such that tends to from below, and tends to from above. We see that:

      We take such that for every , : since is non-decreasing, we see that:

      Applying now the linearity for rational numbers, we see that for every :

      As a consequence:

      and by Scott-continuity of , it tells us that . We can now conclude: recall that Therefore: