
# A Probabilistic and Non-Deterministic Call-by-Push-Value Language

There is no known way of giving a domain-theoretic semantics to higher-order probabilistic languages in such a way that the involved domains are continuous or quasi-continuous, a property that is required to do any serious mathematics. We argue that the problem naturally disappears for languages with two kinds of types, where one kind is interpreted in a Cartesian-closed category of continuous dcpos, and the other is interpreted in a category that is closed under the probabilistic powerdomain functor. Such a setting is provided by Paul B. Levy's call-by-push-value paradigm. Following this insight, we define a call-by-push-value language, with probabilistic choice sitting inside the value types, and where conversion from a value type to a computation type involves demonic non-determinism. We give both a domain-theoretic semantics and an operational semantics for the resulting language, and we show that they are sound and adequate. With the addition of statistical termination testers and parallel if, we show that the language is even fully abstract, and that those two primitives are required for that.


## 1 Introduction

A central problem of domain theory is the following: is there any full Cartesian-closed subcategory of the category of continuous dcpos that is closed under the probabilistic powerdomain functor $\mathbf{V}$ [14]? A positive answer would allow for a simple semantics of probabilistic higher-order languages, where types are interpreted as certain continuous dcpos.

However, we have a conundrum here. The category of continuous dcpos itself is closed under $\mathbf{V}$ [13], but is not Cartesian-closed [2, Exercise 3.3.12(11)]. Among the Cartesian-closed categories of continuous domains, none is known to be closed under $\mathbf{V}$, and most, such as the category of bc-domains or the category of continuous complete lattices, definitely are not [14].

Instead of solving this problem, one may wonder whether there are other kinds of domain-theoretic semantics that would be free of the issue. For example, can we imagine having two classes of types? One would be interpreted in a category of continuous dcpos that is closed under $\mathbf{V}$ (the category of all continuous dcpos, for example, although we will prefer the category of pointed coherent continuous dcpos; see below). The other would be interpreted in a Cartesian-closed category of continuous dcpos, and we will use the category of continuous complete lattices. Such a division into two classes of types is already present in Paul B. Levy's call-by-push-value [16] (a.k.a. CBPV), and although the division there is justified as one between value types and computation types, the formal structure will be entirely similar.

#### Outline

We briefly review some related work in Section 2, and give a few basic working definitions in Section 3. We define our probabilistic call-by-push-value languages in Section 4, explaining the design decisions we had to make in the process, notably the additional need for demonic non-determinism. We give domain-theoretic and operational semantics there, too. We establish soundness in Section 5 and adequacy in Section 6, to the effect that for every ground term $M$ of a specific type, the probability that $M$ must terminate, as defined from the operational semantics, coincides with a similar notion of probability defined from the denotational semantics. In Section 7, we review a few useful consequences of adequacy, among which the coincidence between the applicative preorder and the contextual preorder (both will be defined there), a fact sometimes called Milner's Context Lemma in the context of PCF (see [19, Theorem 8.1]). We show in Section 8 that, among the languages we have defined, CBPV(D,P) is not (inequationally) fully abstract, that adding a parallel if operator pifz does not make it fully abstract, but that adding both pifz and a statistical termination tester operator ◯ (as in [11]) results in an (inequationally) fully abstract language. The latter is proved in Section 9. We conclude and list a few remaining open questions in Section 10.

#### Acknowledgments

I wish to thank Zhenchao Lyu and Xiaodong Jia, who participated in many discussions on the theme of this paper, and Ohad Kammar, who kindly pointed me to [21].

## 2 Related Work

Call-by-push-value (CBPV) is the creation of Paul B. Levy [16] (see also the book [17]), and is a typed higher-order pure functional language. It was originally meant as a subsuming paradigm, embodying both call-by-value and call-by-name disciplines.

The first probabilistic extension of CBPV was proposed recently by Ehrhard and Tasson [4], and its denotational semantics rests on probabilistic coherence spaces. Their typing discipline is inspired by linear logic, and they also include a treatment of general recursive types, which we will not. In contrast, our extension of CBPV will have first-class types $\mathbf{V}\tau$ of subprobability distributions, and will also include a type former for demonic non-determinism (a.k.a. must-non-determinism).

Statistical probabilistic programming has attracted considerable attention recently, and quasi-Borel spaces and predomains have been used to give adequate semantics to typed and untyped probabilistic programming languages, see [21]. The latter describes another way of circumventing the problem we stated in the introduction. One important point that Vákár, Kammar and Staton achieve is the commutativity of the probabilistic choice monad at all types, even higher-order ones. In standard domain theory, the monad $\mathbf{V}$ is known to be commutative only on full subcategories of the category of continuous dcpos. That alone would be enough motivation to attempt to solve the problem stated in the introduction, of finding a Cartesian-closed category of continuous dcpos closed under $\mathbf{V}$ [14]. We also implement a commutative monad in a higher-order setting; our way of circumventing the problem is merely different.

There is a large body of literature concerned with the question of full abstraction for PCF-like languages. The first paper on the subject is due to G. Plotkin [18], who defined the language PCF, asked all the important questions (soundness, adequacy, full abstraction, definability), and answered all of them, except for the question of finding a fully abstract denotational model of PCF without parallel if, a question that was solved later, through game semantics notably [12, 1]. Th. Streicher’s book [19] is an excellent reference on the subject.

Probabilistic coherence spaces provide a fully abstract semantics for a version of PCF with probabilistic choice, as shown by Ehrhard, Tasson, and Pagani [5]. The already cited paper of Ehrhard and Tasson [4] gives an analogous result for their probabilistic version of CBPV. Our work is concerned with languages with domain-theoretic semantics instead, and our former work [11] gives soundness, adequacy and full abstraction results for PCF plus angelic non-determinism, and for PCF plus probabilistic choice and angelic non-determinism plus so-called statistical termination testers. We will see that CBPV naturally calls for a form of demonic, rather than angelic, non-determinism.

## 3 Preliminaries

We refer to [8, 2, 10] for material on domain theory and topology. A dcpo is pointed if and only if it has a least element $\bot$. Dcpos are always equipped with their Scott topology. $\overline{\mathbb{R}}_+ = \mathbb{R}_+ \cup \{+\infty\}$ and $[0,1]$ are dcpos, with the usual ordering. The way-below relation is written $\ll$: $x \ll y$ if and only if for every directed family $(z_i)_{i\in I}$ such that $y \le \sup_{i\in I} z_i$, there is an $i \in I$ such that $x \le z_i$. A dcpo is continuous if and only if every element is the supremum of a directed family of elements way-below it. In that case, the sets $\twoheaduparrow x = \{y \mid x \ll y\}$ form a base of open sets of the Scott topology. In general, a basis of a dcpo $X$ is a set $B$ of elements of $X$ such that, for every $x \in X$, $\{b \in B \mid b \ll x\}$ is directed and has $x$ as supremum. A dcpo is continuous if and only if it has a basis. Then the sets $\twoheaduparrow b$, $b \in B$, also form a base of the Scott topology.
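As a concrete instance of these definitions (a standard example, added here for illustration only), the way-below relation on the unit interval and on the flat domain of integers is:

```latex
\begin{aligned}
&\text{in } [0,1]: && x \ll y \iff x = 0 \ \text{or}\ x < y,\\
&\text{in } \mathbb{Z}_\bot: && \bot \ll n \ \text{for every } n, \qquad n \ll m \iff n = m \quad (n, m \in \mathbb{Z}).
\end{aligned}
```

Both dcpos are continuous: the rationals form a basis of $[0,1]$, and $\mathbb{Z}_\bot$ is its own basis. In $[0,1]$, $\twoheaduparrow 0 = [0,1]$ and $\twoheaduparrow x = (x,1]$ for $x > 0$, so the Scott-open sets are $\emptyset$, $[0,1]$, and the intervals $(r,1]$. The reading "$a \ll p$ iff $a = 0$ or $a < p$" is the one used for probabilities in the operational semantics.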

We write $\le$ for the specialization ordering of a topological space. For a dcpo $X$, that is the original ordering on $X$. A subset $A$ of a topological space $X$ is saturated if and only if it is upwards-closed in $\le$, if and only if it is the intersection of its open neighborhoods. A topological space $X$ is locally compact if and only if for every $x \in X$, for every open neighborhood $U$ of $x$, there is a compact saturated set $Q$ such that $x \in \mathrm{int}(Q) \subseteq Q \subseteq U$. ($\mathrm{int}(Q)$ denotes the interior of $Q$.) In that case, for every compact saturated subset $Q$ and every open neighborhood $U$ of $Q$, there is a compact saturated set $Q'$ such that $Q \subseteq \mathrm{int}(Q') \subseteq Q' \subseteq U$. A topological space is coherent if and only if the intersection of any two compact saturated subsets is compact. It is well-filtered if and only if for every filtered family $(Q_i)_{i\in I}$ of compact saturated sets (filtered meaning directed for reverse inclusion), every open neighborhood $U$ of $\bigcap_{i\in I} Q_i$ already contains some $Q_i$. In a well-filtered space, the intersection of such a filtered family is compact saturated. A stably compact space is a $T_0$, well-filtered, locally compact, coherent and compact space $X$. Then the complements of compact saturated sets form another topology on $X$, the cocompact topology, and $X$ with the cocompact topology is the de Groot dual $X^d$ of $X$. For every stably compact space, $X^{dd} = X$. A pointed, coherent, continuous dcpo is always stably compact.

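Given two dcpos $X$ and $Y$, $[X \to Y]$ denotes the dcpo of all Scott-continuous maps from $X$ to $Y$, ordered pointwise. Directed suprema are also pointwise, namely $(\sup_{i\in I} f_i)(x) = \sup_{i\in I} f_i(x)$ for every directed family $(f_i)_{i\in I}$ in $[X \to Y]$.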

## 4 The Languages CBPV(D,P) and CBPV(D,P)+pifz+◯

The first language we introduce is called CBPV(D,P): it is a call-by-push-value language with **D**emonic non-determinism and **P**robabilistic choice. We will explain below why we do not consider just probabilistic choice, but also demonic non-determinism.

### 4.1 Types and their Semantics

We consider the following grammar of types:

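$$\sigma,\tau,\ldots ::= U\underline{\tau} \mid \mathtt{unit} \mid \mathtt{int} \mid \sigma\times\tau \mid \mathbf{V}\tau \qquad\qquad \underline{\sigma},\underline{\tau},\ldots ::= F\tau \mid \sigma\to\underline{\tau}.$$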

The types $\sigma$, $\tau$, …, are the value types, and the types $\underline{\sigma}$, $\underline{\tau}$, …, are the computation types, following Levy [16]. Our types differ from Levy's: we do not have countable sums in value types or countable products in computation types, we write $\mathtt{unit}$ instead of $1$, and we have a primitive type $\mathtt{int}$ of integers; the main difference is the $\mathbf{V}\tau$ construction, denoting the type of subprobability valuations on the space of elements of type $\tau$.

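We have already said in the introduction that computation types will be interpreted in the category of continuous complete lattices. Value types $\tau$ will give rise to pointed, coherent, continuous dcpos $\llbracket\tau\rrbracket$:

• for every computation type $\underline{\tau}$, we will define $\llbracket U\underline{\tau}\rrbracket$ as $\llbracket\underline{\tau}\rrbracket$: being a continuous complete lattice, it is in particular pointed, coherent, and a continuous dcpo;

• $\llbracket\mathtt{unit}\rrbracket$ will be Sierpiński space $\mathbb{S} = \{\bot, \top\}$ with $\bot < \top$;

• $\llbracket\mathtt{int}\rrbracket$ will be $\mathbb{Z}_\bot = \mathbb{Z} \cup \{\bot\}$, with the ordering that makes $\bot$ least and all integers pairwise incomparable;

• $\llbracket\mathbf{V}\tau\rrbracket$ will be $\mathbf{V}\llbracket\tau\rrbracket$, where $\mathbf{V}X$ denotes the dcpo of all subprobability valuations on the space $X$.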

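A subprobability valuation $\nu$ on $X$ is a map from the lattice $\mathcal{O}X$ of open subsets of $X$ to $[0,1]$ which is strict ($\nu(\emptyset) = 0$), Scott-continuous, and modular ($\nu(U \cup V) + \nu(U \cap V) = \nu(U) + \nu(V)$). When $X$ is a continuous dcpo, so is $\mathbf{V}X$ [13, Corollary 5.4]. It is pointed, since the zero valuation is least in $\mathbf{V}X$. If $X$ is also coherent, then $\mathbf{V}X$ is stably compact, see below. Hence $\mathbf{V}\llbracket\tau\rrbracket$ is indeed a pointed, coherent continuous dcpo.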
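To make the three axioms concrete, here is a small Python sketch, added for illustration and not part of the paper's development, that checks them for a finite sum of point masses on a finite poset. On a finite poset the Scott-open sets are just the upward-closed subsets, and Scott-continuity is automatic because the lattice of opens is finite. The three-point poset and the masses are hypothetical choices.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical three-point poset: the flat domain Z_bot truncated to {bot, 0, 1}.
POINTS = ["bot", 0, 1]

def leq(x, y):
    """bot is least; the integers are pairwise incomparable."""
    return x == y or x == "bot"

def open_sets():
    """On a finite poset the Scott-open sets are exactly the upward-closed subsets."""
    opens = []
    for r in range(len(POINTS) + 1):
        for combo in combinations(POINTS, r):
            s = frozenset(combo)
            if all(y in s for x in s for y in POINTS if leq(x, y)):
                opens.append(s)
    return opens

def simple_valuation(masses):
    """nu(U) = total mass of the points of U, for a finite sum of point masses."""
    assert all(m >= 0 for m in masses.values()) and sum(masses.values()) <= 1
    return lambda U: sum((m for x, m in masses.items() if x in U), Fraction(0))

def is_valuation(nu):
    """Check strictness, modularity and monotonicity on every pair of opens.
    Scott-continuity is automatic here: the lattice of opens is finite."""
    opens = open_sets()
    strict = nu(frozenset()) == 0
    modular = all(nu(U | V) + nu(U & V) == nu(U) + nu(V) for U in opens for V in opens)
    monotone = all(nu(U) <= nu(V) for U in opens for V in opens if U <= V)
    return strict and modular and monotone

nu = simple_valuation({0: Fraction(1, 2), 1: Fraction(1, 3)})  # total mass 5/6
```

Here `is_valuation(nu)` holds, while a map such as `lambda U: 1 if U else 0` fails modularity on the opens $\{0\}$ and $\{1\}$.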

The fact that $\mathbf{V}X$ is stably compact for every coherent continuous dcpo $X$ is folklore. We argue as follows. The lift $X_\bot$ of $X$, obtained by adding a fresh bottom element $\bot$ to $X$, is stably compact. Then the space $\mathbf{V}_1 X_\bot$ of all probability valuations $\nu$ on $X_\bot$, i.e., such that $\nu(X_\bot) = 1$, is stably compact in the weak upwards topology [3, Theorem 39]. The latter has a subbase of open sets of the form $\{\nu \mid \nu(U) > r\}$, for every open subset $U$ of $X_\bot$ and $r \in \mathbb{R}$. The restriction map $\nu \mapsto \nu_{|\mathcal{O}X}$ is a homeomorphism from $\mathbf{V}_1 X_\bot$ onto $\mathbf{V}X$, both with their weak upwards topology, with inverse $\nu \mapsto \nu + (1 - \nu(X))\,\delta_\bot$. Hence $\mathbf{V}X$ is stably compact in its weak upwards topology. Since $X$ is continuous, the weak upwards topology on $\mathbf{V}X$ coincides with its Scott topology, as shown by [15, Satz 8.6], see also [20, Satz 4.10].
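The correspondence between subprobability valuations on $X$ and probability valuations on the lift $X_\bot$ can be checked on a finite example. The following Python sketch is an illustration under stated assumptions: it uses finite sums of point masses only, takes $X$ to be Sierpiński space $\{0 < 1\}$, and names the fresh bottom `"B"` (a hypothetical label).

```python
from fractions import Fraction
from itertools import combinations

FRESH = "B"      # hypothetical name for the fresh bottom element added by the lift
X = [0, 1]       # Sierpinski space, 0 < 1 (a coherent continuous dcpo)
X_LIFT = [FRESH] + X

def leq_lift(a, b):
    """Order of the lift: FRESH is least; on X, the usual order 0 <= 1."""
    return a == FRESH or (b != FRESH and a <= b)

def upper_sets(points, leq):
    """Scott-open subsets of a finite poset = its upward-closed subsets."""
    result = []
    for r in range(len(points) + 1):
        for combo in combinations(points, r):
            s = frozenset(combo)
            if all(y in s for x in s for y in points if leq(x, y)):
                result.append(s)
    return result

def measure(masses):
    return lambda U: sum((m for x, m in masses.items() if x in U), Fraction(0))

def to_probability(masses):
    """nu |-> nu + (1 - nu(X)) delta_FRESH, a probability valuation on the lift."""
    return {**masses, FRESH: 1 - sum(masses.values(), Fraction(0))}

def restrict(lifted_masses):
    """Restriction to the opens of X: drop the mass sitting on FRESH."""
    return {x: m for x, m in lifted_masses.items() if x != FRESH}

nu = {1: Fraction(1, 3)}          # a subprobability valuation on X, of total mass 1/3
p = to_probability(nu)
```

The only open subset of $X_\bot$ containing the fresh bottom is $X_\bot$ itself, which is why restricting to opens of $X$ simply forgets the extra mass.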

It might seem curious that probabilistic non-determinism arises, as $\mathbf{V}$, among the value types. I have no philosophical justification for this, but it is, in a sense, forced upon us by the mathematics.

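Similarly, computation types $\underline{\tau}$ will give rise to continuous complete lattices $\llbracket\underline{\tau}\rrbracket$ (notably, $\llbracket\sigma\to\underline{\tau}\rrbracket$ will be the continuous complete lattice $[\llbracket\sigma\rrbracket \to \llbracket\underline{\tau}\rrbracket]$ of all Scott-continuous maps from $\llbracket\sigma\rrbracket$ to $\llbracket\underline{\tau}\rrbracket$), but we have to decide on an interpretation of types of the form $F\tau$.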

If we had decided to interpret computation types as bc-domains instead of continuous complete lattices, then a natural choice would be to define $\llbracket F\tau\rrbracket$ as Ershov's bc-hull of $\llbracket\tau\rrbracket$ [7]. (Bc-domains are, roughly speaking, continuous complete lattices that may lack a top element.) As Ershov notices, “the construction of a bc-hull in the general case is highly nonconstructive (using a Zorn's lemma)” (ibid., page 13). Fortunately, the bc-hull of a space $X$ is a natural subspace of the Smyth powerdomain $\mathcal{Q}X$ of $X$, at least when $X$ is a coherent algebraic dcpo (ibid., Corollary B), and $\mathcal{Q}X$ is easier to work with. Explicitly, $\mathcal{Q}X$ is the poset of all non-empty compact saturated subsets of $X$, ordered by reverse inclusion, and is used to interpret demonic non-determinism in denotational semantics. When $X$ is well-filtered and locally compact, $\mathcal{Q}X$ is also a continuous dcpo, and it is a bc-domain provided $X$ is also compact and coherent. We shall see below that $\mathcal{Q}^\top X$, the poset of all (possibly empty) compact saturated subsets of $X$ (alternatively, $\mathcal{Q}X$ plus an additional top element $\emptyset$), is a continuous complete lattice whenever $X$ is a stably compact space, and that would make a good candidate for $\llbracket F\tau\rrbracket$.

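For technical reasons related to adequacy, we will need a certain map below to be strict, i.e., to map $\bot$ to $\bot$. (Technically, this is needed so that the denotational semantics of a construction to be introduced below is strict in that of one of its subterms, which validates the fact that the construction loops forever if that subterm does.) This will be obtained by defining $\llbracket F\tau\rrbracket$ as $(\mathcal{Q}^\top\llbracket\tau\rrbracket)_\bot$ instead, where $(\mathcal{Q}^\top\llbracket\tau\rrbracket)_\bot$ is the lift of $\mathcal{Q}^\top\llbracket\tau\rrbracket$, obtained by adding a fresh element $\bot$ below all others.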

We recapitulate:

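• $\llbracket F\tau\rrbracket = (\mathcal{Q}^\top\llbracket\tau\rrbracket)_\bot$;

• $\llbracket\sigma\to\underline{\tau}\rrbracket = [\llbracket\sigma\rrbracket \to \llbracket\underline{\tau}\rrbracket]$.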

Let us check that $(\mathcal{Q}^\top X)_\bot$ has the required property of being a continuous complete lattice when $X$ is stably compact, and let us prove some additional properties that we will need later. We start with the similar properties of $\mathcal{Q}^\top X$. We let $\eta$ map every $x \in X$ to $\uparrow x$.

###### Proposition 4.1

Let $X$ be a stably compact space. Then:

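1. $\mathcal{Q}^\top X$ is a continuous complete lattice, and $Q$ is way-below $Q'$ if and only if $Q' \subseteq \mathrm{int}(Q)$ (the interior of $Q$);

2. For every continuous complete lattice $L$, for every continuous map $f\colon X \to L$, there is a Scott-continuous map $f^\dagger\colon \mathcal{Q}^\top X \to L$ such that $f^\dagger \circ \eta = f$, and it is defined by $f^\dagger(Q) = \inf_{x\in Q} f(x)$.

3. $f^\dagger(\top) = \top$, $f^\dagger(Q \wedge Q') = f^\dagger(Q) \wedge f^\dagger(Q')$.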

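Proof. 1. This is well-known, but here is a brief argument. The elements of $\mathcal{Q}^\top X$ are exactly the closed subsets of the de Groot dual $X^d$ of $X$, and the closed sets of any topological space always form a complete lattice. Note that the supremum of an arbitrary family $(Q_i)_{i\in I}$ in $\mathcal{Q}^\top X$ is $\bigcap_{i\in I} Q_i$.

Given any compact saturated subset $Q$ of $X$, the family $\mathcal{N}_Q$ of compact saturated neighborhoods of $Q$ is filtered, and has $Q$ as intersection. Indeed, since $Q$ is saturated, it is the intersection of its open neighborhoods; for every open neighborhood $U$ of $Q$, local compactness implies that there is a compact saturated set $Q'$ such that $Q \subseteq \mathrm{int}(Q') \subseteq Q' \subseteq U$; applying this to $U = X$ shows that $\mathcal{N}_Q$ is non-empty, and given $Q_1, Q_2 \in \mathcal{N}_Q$, applying it to $U = \mathrm{int}(Q_1) \cap \mathrm{int}(Q_2)$ shows that $\mathcal{N}_Q$ contains an element included in both $Q_1$ and $Q_2$.

It follows that, if $Q \ll Q'$, then $Q$ contains an element of $\mathcal{N}_{Q'}$, hence in particular an open neighborhood of $Q'$. Conversely, if $Q' \subseteq U \subseteq Q$ where $U$ is open, then for every directed family $(Q_i)_{i\in I}$ in $\mathcal{Q}^\top X$ such that $\bigcap_{i\in I} Q_i \subseteq Q'$, $U$ contains some $Q_i$ by well-filteredness, hence $Q_i \subseteq Q$. Therefore $Q \ll Q'$.

Finally, since every $Q$ in $\mathcal{Q}^\top X$ is the filtered intersection of the elements of $\mathcal{N}_Q$, it is the supremum of the directed family $\mathcal{N}_Q$, and we have just argued that every element of $\mathcal{N}_Q$ is way-below $Q$, showing that $\mathcal{Q}^\top X$ is continuous.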

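2. We define $f^\dagger$ as $f^\dagger(Q) = \inf_{x\in Q} f(x)$. This satisfies $f^\dagger \circ \eta = f$, and is monotonic. Note that this is defined even when $Q$ is empty, in which case $f^\dagger(Q)$ is the top element of $L$. In order to show that $f^\dagger$ is Scott-continuous, let $(Q_i)_{i\in I}$ be a directed family in $\mathcal{Q}^\top X$, and $Q = \bigcap_{i\in I} Q_i$. We wish to show that $f^\dagger(Q) \le \sup_{i\in I} f^\dagger(Q_i)$; the converse inequality is by monotonicity. To this end, we let $\ell$ be an element of $L$ way-below $f^\dagger(Q)$. Since $\ell \ll f(x)$ for every $x \in Q$, every element of $Q$ is in the open set $f^{-1}(\twoheaduparrow\ell)$, where $\twoheaduparrow\ell = \{m \in L \mid \ell \ll m\}$. Then $\bigcap_{i\in I} Q_i$ is included in $f^{-1}(\twoheaduparrow\ell)$, so by well-filteredness some $Q_i$ is also included in $f^{-1}(\twoheaduparrow\ell)$. Then $\ell \le f(x)$ for every $x \in Q_i$, so $\ell \le f^\dagger(Q_i)$. Since that holds for every $\ell \ll f^\dagger(Q)$, the desired inequality follows.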

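3. Easy check: $f^\dagger(\emptyset) = \inf\emptyset = \top$, and since $Q \wedge Q' = Q \cup Q'$ in $\mathcal{Q}^\top X$, $f^\dagger(Q \cup Q') = \inf_{x\in Q\cup Q'} f(x) = f^\dagger(Q) \wedge f^\dagger(Q')$. Note that item 2 does not state that $f^\dagger$ is unique; we have just chosen the largest one. A similar construction is well-known for $\mathcal{Q}X$. Proposition 4.1 establishes the essential properties needed to show that $\mathcal{Q}^\top$ defines a monad on the category of stably compact spaces; this is well-known, and we will not need it.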
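On a finite poset every subset is compact, so $\mathcal{Q}^\top X$ is just the set of upward-closed subsets under reverse inclusion, and $f^\dagger$ computes a worst case, which is the demonic reading. The following Python sketch, an illustration only, checks items 2 and 3 of Proposition 4.1 with $L = [0,1]$; the poset and the values of `f` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite poset: a < b, a < c, with b and c incomparable.
POINTS = ["a", "b", "c"]
ORDER = {("a", "a"), ("b", "b"), ("c", "c"), ("a", "b"), ("a", "c")}

def leq(x, y):
    return (x, y) in ORDER

def up(x):
    """eta(x) = upward closure of x."""
    return frozenset(y for y in POINTS if leq(x, y))

def saturated_sets():
    """In a finite poset every subset is compact, so compact saturated = upward-closed."""
    sets = []
    for r in range(len(POINTS) + 1):
        for combo in combinations(POINTS, r):
            s = frozenset(combo)
            if all(up(x) <= s for x in s):
                sets.append(s)
    return sets

TOP = Fraction(1)  # top element of the complete lattice L = [0, 1]

def extend(f):
    """f^dagger(Q) = inf_{x in Q} f(x); the empty set (top of Q^T X) goes to top."""
    return lambda Q: min((f(x) for x in Q), default=TOP)

# A monotone (hence Scott-continuous) map f: X -> [0, 1].
f_table = {"a": Fraction(1, 4), "b": Fraction(1, 2), "c": Fraction(3, 4)}
f = f_table.__getitem__
f_dag = extend(f)
```

For instance, `f_dag(frozenset({"b", "c"}))` is $1/2$: an adversary choosing between `b` and `c` picks the worse outcome.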

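We turn to $(\mathcal{Q}^\top X)_\bot$. We again write $\eta$ for the function that maps $x$ to $\uparrow x$, this time from $X$ to $(\mathcal{Q}^\top X)_\bot$. Below, we again write $f^\dagger$ for the extension of $f$ to $(\mathcal{Q}^\top X)_\bot$. This should not cause any confusion with the map $f^\dagger$ of Proposition 4.1, since the two maps coincide on $\mathcal{Q}^\top X$. Note that $f^\dagger$ is now strict.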

###### Proposition 4.2

Let $X$ be a stably compact space. Then:

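1. $(\mathcal{Q}^\top X)_\bot$ is a continuous complete lattice, and $Q$ is way-below $Q'$ if and only if $Q = \bot$, or $Q, Q' \in \mathcal{Q}^\top X$ and $Q' \subseteq \mathrm{int}(Q)$;

2. For every continuous complete lattice $L$, for every continuous map $f\colon X \to L$, there is a strict Scott-continuous map $f^\dagger\colon (\mathcal{Q}^\top X)_\bot \to L$ such that $f^\dagger \circ \eta = f$. This is defined by $f^\dagger(\bot) = \bot$, and for every $Q \in \mathcal{Q}^\top X$, $f^\dagger(Q) = \inf_{x\in Q} f(x)$.

3. $f^\dagger(\top) = \top$, $f^\dagger(Q \wedge Q') = f^\dagger(Q) \wedge f^\dagger(Q')$.

4. For every stably compact space $Y$, for every Scott-continuous map $g\colon X \to (\mathcal{Q}^\top Y)_\bot$, and for every Scott-continuous map $f$ from $Y$ to a continuous complete lattice $L$, $f^\dagger \circ g^\dagger = (f^\dagger \circ g)^\dagger$.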

Proof. 1. The lift of a continuous complete lattice is a continuous complete lattice, and $\bot$ is always way-below every element; the remaining cases of the characterization of $\ll$ follow from Proposition 4.1, item 1.

2. Easy.

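3. We check the second equality. That follows from Proposition 4.1, item 3 if $Q, Q' \neq \bot$. If $Q = \bot$, then $f^\dagger(Q \wedge Q') = f^\dagger(\bot) = \bot$ and $f^\dagger(Q) \wedge f^\dagger(Q') = \bot \wedge f^\dagger(Q') = \bot$. Similarly if $Q' = \bot$.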

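4. Fix $Q \in (\mathcal{Q}^\top X)_\bot$. If $Q = \bot$, then $f^\dagger(g^\dagger(Q)) = \bot = (f^\dagger \circ g)^\dagger(Q)$ by strictness. Henceforth, we assume that $Q \neq \bot$.

If $g(x) = \bot$ for some $x \in Q$, then $g^\dagger(Q) = \bot$, so $f^\dagger(g^\dagger(Q)) = \bot$, and $(f^\dagger \circ g)^\dagger(Q) = \inf_{x\in Q} f^\dagger(g(x)) = \bot$, since $f^\dagger$ is strict. Henceforth, we assume that $g(x) \neq \bot$ for every $x \in Q$.

We claim that $K = \bigcup_{x\in Q} g(x)$ is compact. Let $(U_i)_{i\in I}$ be a directed family of open subsets of $Y$ whose union contains $K$. For every $x \in Q$, $g(x)$ is compact and included in $\bigcup_{i\in I} U_i$, so $g(x) \subseteq U_i$ for some $i \in I$. Hence $Q \subseteq \bigcup_{i\in I} g^{-1}(\Box U_i)$, where $\Box U_i = \{Q'' \in \mathcal{Q}^\top Y \mid Q'' \subseteq U_i\}$ is Scott-open by well-filteredness. Since $Q$ is compact and the family $(g^{-1}(\Box U_i))_{i\in I}$ is directed, $Q \subseteq g^{-1}(\Box U_i)$ for some $i \in I$, whence $K \subseteq U_i$.

$K$ is also saturated in $Y$, hence an element of $\mathcal{Q}^\top Y$, and therefore also of $(\mathcal{Q}^\top Y)_\bot$. It follows that $K$ is the infimum of the elements $g(x)$, $x \in Q$, hence is equal to $g^\dagger(Q)$. Therefore $f^\dagger(g^\dagger(Q)) = \inf_{y\in K} f(y) = \inf_{x\in Q} \inf_{y\in g(x)} f(y) = \inf_{x\in Q} f^\dagger(g(x)) = (f^\dagger \circ g)^\dagger(Q)$. Item 4 above is part of the properties needed to check that $(\mathcal{Q}^\top)_\bot$ defines a monad on the category of stably compact spaces. We will not expand on that.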
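The case analysis in the proof of item 4 (some $g(x)$ equal to $\bot$, or none) can be exercised on finite discrete spaces, where every subset is compact saturated. The Python sketch below is illustrative only: `None` stands for the fresh $\bot$, $L = [0,1]$, and the spaces, `F_VALUES`, `G1` and `G2` are hypothetical data.

```python
from fractions import Fraction
from itertools import combinations

BOT = None  # the fresh bottom element added by the lift

def all_subsets(points):
    """With the discrete order, every subset is compact saturated."""
    return [frozenset(c) for r in range(len(points) + 1) for c in combinations(points, r)]

def lifted(points):
    """Elements of (Q^T X)_bot for a finite discrete X."""
    return [BOT] + all_subsets(points)

def extend_to_unit_interval(f):
    """Strict extension into L = [0, 1]: BOT -> 0, Q -> inf of f over Q (empty set -> 1)."""
    def f_dag(Q):
        if Q is BOT:
            return Fraction(0)
        return min((f(y) for y in Q), default=Fraction(1))
    return f_dag

def extend_to_smyth(g):
    """g^dagger for g: X -> (Q^T Y)_bot. The infimum of the g(x) is BOT if some g(x) is
    BOT, and their union otherwise (reverse inclusion turns infima into unions)."""
    def g_dag(Q):
        if Q is BOT:
            return BOT
        images = [g(x) for x in Q]
        if any(img is BOT for img in images):
            return BOT
        return frozenset().union(*images)
    return g_dag

X, Y = ["p", "q"], ["u", "v", "w"]   # hypothetical finite discrete spaces
F_VALUES = {"u": Fraction(1, 5), "v": Fraction(3, 5), "w": Fraction(4, 5)}
f_dag = extend_to_unit_interval(F_VALUES.__getitem__)

def kleisli_law_holds(g_table):
    """Check item 4, f^dagger o g^dagger = (f^dagger o g)^dagger, on every element."""
    g_dag = extend_to_smyth(g_table.__getitem__)
    fg_dag = extend_to_unit_interval(lambda x: f_dag(g_table[x]))
    return all(f_dag(g_dag(Q)) == fg_dag(Q) for Q in lifted(X))

G1 = {"p": frozenset({"u", "v"}), "q": BOT}           # some g(x) is bottom
G2 = {"p": frozenset({"u"}), "q": frozenset({"v", "w"})}  # no g(x) is bottom
```

Both `kleisli_law_holds(G1)` and `kleisli_law_holds(G2)` hold; strictness of `f_dag` is what makes the `G1` case work.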

### 4.2 Syntax

We define the syntax of our language CBPV(D,P) inductively as follows, using the notation $M : \sigma$ (or $M : \underline{\tau}$) to say “$M$ is a term of type $\sigma$” (resp., $\underline{\tau}$).

• There are countably infinitely many variables $x_\sigma$, $y_\sigma$, … of each value type $\sigma$, and they are all terms of type $\sigma$. There is no variable of any computation type, as in Levy's original CBPV. We will write $x$, $y$, …, when the type is unimportant or clear from context.

• For every term , .

• For all terms and , . (Levy writes as .)

• For every , .

• For every , for every , . In order to comply with the spirit of CBPV, we extend the notation to the case where has an arbitrary computation type , by induction on , by defining where as , where is fresh.

• For every , .

• For every , .

• .

• For every , .

• For all , and .

• For all and (where is any value or computation type), .

• For all and , .

• For every , and .

• For all and , .

• For every , .

• For every and every , .

• For all , .

• For all , .

• . We extend this to for every computation type , by defining as .

• For every , .

The variable is bound in , in , and in , and its scope is in all three cases. We omit the standard definition of α-renaming and of capture-avoiding substitution. Although can only be of a value type, we can still define recursive objects of computation types: defining so that and have the same semantics can be done by writing , where is substitution of for the fresh variable in . This kind of trick is typical of CBPV.

Using recursion at value types would allow one to define interesting values. For example, we can define the uniform distribution on

by the term , which operates via a form of rejection sampling.
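The term itself is not reproduced here. As a hedged illustration of how rejection sampling interacts with a least-fixed-point semantics, assume (hypothetically) that the target is the uniform distribution on $\{0,1,2\}$ built from fair coin flips: draw two bits, return $0$, $1$ or $2$ on $00$, $01$, $10$, and start over on $11$. The Python sketch below computes the Kleene iterates of the corresponding functional on finite valuations, starting from the zero valuation, together with an operational sampler.

```python
import random
from fractions import Fraction

QUARTER = Fraction(1, 4)

def step(nu):
    """One unfolding of a hypothetical rejection sampler for the uniform distribution on
    {0, 1, 2}: draw two fair bits, return 0, 1 or 2 on 00, 01, 10, and start over
    (continue as nu) on 11. Valuations are dicts mapping points to masses."""
    out = {k: QUARTER for k in (0, 1, 2)}
    for k, m in nu.items():
        out[k] = out.get(k, Fraction(0)) + QUARTER * m
    return out

def kleene_iterate(n):
    """Phi^n applied to the zero valuation: the n-th approximant of the least fixed point."""
    nu = {}
    for _ in range(n):
        nu = step(nu)
    return nu

def sample(rng):
    """Operational counterpart: the loop terminates with probability 1."""
    while True:
        b = 2 * rng.getrandbits(1) + rng.getrandbits(1)
        if b < 3:
            return b
```

The $n$-th iterate gives each of $0$, $1$, $2$ the mass $(1 - 4^{-n})/3$, so its total mass $1 - 4^{-n}$ increases to $1$ and the supremum is the uniform distribution, even though no finite iterate is a probability valuation.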

We will also consider an extension of CBPV(D,P) called CBPV(D,P)+pifz+◯, obtained by admitting the following additional clauses:

• For every , for every , (statistical termination tester).

• For every , for all , (parallel if). We extend the notation to the case where and have an arbitrary computation type by letting denote when , have type , where is a fresh variable.

The language CBPV(D,P)+pifz is obtained by admitting only the second one as an extra clause, while CBPV(D,P)+◯ only admits the first one.

### 4.3 Denotational Semantics

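The denotational semantics is given by a family of Scott-continuous maps $\llbracket M\rrbracket$, one for each $M : \sigma$ (resp., $M : \underline{\tau}$), from the dcpo $\mathrm{Env}$ of environments to $\llbracket\sigma\rrbracket$ (resp., $\llbracket\underline{\tau}\rrbracket$): see Figure 1, where the bottom two clauses are specific to CBPV(D,P)+◯, resp. to CBPV(D,P)+pifz, and the two of them together are specific to CBPV(D,P)+pifz+◯. The elements of $\mathrm{Env}$ are called environments and are seen as maps from variables to values. We use the notation $\lambda x \in X . e$ to denote the function that maps each $x \in X$ to $e$. For every $\rho \in \mathrm{Env}$, every variable $x_\sigma$, and every $V \in \llbracket\sigma\rrbracket$, we write $\rho[x_\sigma \mapsto V]$ for the environment that maps $x_\sigma$ to $V$ and every variable $y \neq x_\sigma$ to $\rho(y)$. The operator $\mathrm{lfp}$ maps every Scott-continuous map $f$ from a pointed dcpo to itself, to its least fixed point $\sup_{n\in\mathbb{N}} f^n(\bot)$. The Dirac mass $\delta_x$ at $x$ is the probability valuation such that $\delta_x(U) = 1$ if $x \in U$, $0$ otherwise. For every continuous map $f\colon X \to \mathbf{V}Y$, $f^\dagger$ is the continuous map from $\mathbf{V}X$ to $\mathbf{V}Y$ defined by $f^\dagger(\nu)(V) = \int_{x\in X} f(x)(V)\,d\nu$ for every open subset $V$ of $Y$. For future reference, we note that $f^\dagger(\delta_x) = f(x)$, and that, for every continuous map $h\colon Y \to \overline{\mathbb{R}}_+$,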

$$\int_{y\in Y} h(y)\,df^\dagger(\nu) = \int_{x\in X} \left(\int_{y\in Y} h(y)\,df(x)\right) d\nu. \tag{1}$$

Implicit here is the fact that the map $x \mapsto \int_{y\in Y} h(y)\,df(x)$ is itself continuous. Also, integration is linear in both the integrated function and the continuous valuation $\nu$, and Scott-continuous in each. These facts can be found in Jones' PhD thesis [13].
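For simple valuations (finite sums $\sum_i a_i \delta_{x_i}$), integration reduces to $\sum_i a_i h(x_i)$, and equation (1) together with $f^\dagger(\delta_x) = f(x)$ can be checked by direct computation. The Python sketch below does so on finite discrete spaces; it is an illustration restricted to simple valuations, and the spaces and numbers are hypothetical.

```python
from fractions import Fraction

def dirac(x):
    """delta_x as a dict of point masses."""
    return {x: Fraction(1)}

def integrate(h, nu):
    """Integral of h against a simple valuation sum_i a_i delta_{x_i}: sum_i a_i h(x_i)."""
    return sum((m * h(x) for x, m in nu.items()), Fraction(0))

def extend(f):
    """f^dagger(nu) = sum_x nu({x}) f(x); on each open V this is the integral of f(x)(V) d nu."""
    def f_dag(nu):
        out = {}
        for x, m in nu.items():
            for y, p in f(x).items():
                out[y] = out.get(y, Fraction(0)) + m * p
        return out
    return f_dag

# Hypothetical data on finite discrete spaces X = {"s", "t"} and Y = {0, 1}.
F_TABLE = {"s": {0: Fraction(1, 2), 1: Fraction(1, 2)}, "t": {1: Fraction(1, 4)}}
f = F_TABLE.__getitem__
nu = {"s": Fraction(1, 2), "t": Fraction(1, 3)}
h = {0: Fraction(1), 1: Fraction(1, 3)}.__getitem__

lhs = integrate(h, extend(f)(nu))                  # left-hand side of (1)
rhs = integrate(lambda x: integrate(h, f(x)), nu)  # right-hand side of (1)
```

Here $f^\dagger(\nu) = \frac14\delta_0 + \frac13\delta_1$, and both sides of (1) evaluate to $13/36$.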

The fact that the semantics is well-defined and continuous in the environment $\rho$ is standard. Note the use of binary infimum ($\wedge$) in the semantics of two of the constructs of Figure 1, for which we use the following lemma.

###### Lemma 4.3

Let $L$ be a continuous complete lattice.

1. The infimum map $\wedge\colon L \times L \to L$ is Scott-continuous.

2. For any two continuous maps $f, g\colon X \to L$, where $X$ is any topological space, the infimum is computed pointwise: $(f \wedge g)(x) = f(x) \wedge g(x)$.

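Proof. Item 1 is well-known. Explicitly, one must show that for every directed family $(y_i)_{i\in I}$ with supremum $y$ in $L$, for every $x \in L$, $x \wedge y \le \sup_{i\in I} (x \wedge y_i)$: for every $z \ll x \wedge y$, $z$ is below $x$ and below some $y_i$, hence below $x \wedge y_i$ for some $i \in I$.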

As for item 2, the composition of $\wedge$ with $\langle f, g\rangle$ is continuous by item 1, is below $f$ and $g$, and is clearly above any lower bound of $f$ and $g$.

### 4.4 Operational Semantics

We choose an operational semantics in the style of [11]. It operates on configurations, which are pairs $C \cdot M$ of an evaluation context $C$ and a term $M$. The deterministic part of the calculus will be defined by rewrite rules $C \cdot M \to C' \cdot M'$ between configurations. For the probabilistic and non-deterministic part of the calculus, we will rely on judgments $C \cdot M \downarrow a$, which state, roughly, that the probability that the computation terminates, starting from $C \cdot M$, is larger than $a$.

The elementary contexts, together with their types (where , are value or computation types) are defined by:

• , for every and every computation type ;

• for every ;

• , for every computation type ;

• ;

• for all , where is any value or computation type;

• for every ;

• and , for all value types and ;

• , for every .

The initial contexts are , and . For every elementary or initial context and every , we write for the result of replacing the unique occurrence of the hole in (after removing the outer square brackets) by . E.g, .

A context (of type ) is a finite list () where is an initial context, , …, are elementary contexts, and , , and . We then write for .

Note that the contexts are defined in exactly the same way for CBPV(D,P) and for CBPV(D,P)+pifz, CBPV(D,P)+◯, and CBPV(D,P)+pifz+◯.

The configurations of the operational semantics are pairs $C \cdot M$ of a context $C$ and a term $M$ whose type is the type of the hole of $C$; these types may be arbitrary value or computation types. The rules of the operational semantics are given in Figure 2. The last row is specific to the extended languages CBPV(D,P)+pifz, CBPV(D,P)+◯, and CBPV(D,P)+pifz+◯. The first rewrite rule, the redex discovery rule, applies provided $E$ is an elementary context. The notation $M[N/x]$ denotes capture-avoiding substitution of $N$ for $x$ in $M$.
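Evaluation with a context, where redex discovery pushes a frame and a value or abstraction pops one, is the idea behind Levy's CK-machine. The Python sketch below illustrates it on a tiny deterministic fragment only. It is not Figure 2: the constructors (`Return`, `To`, `Lam`, `App`, `Force`, `Thunk`) are hypothetical stand-ins modelled on Levy's CBPV, the stack plays the role of the context $C$, and substitution is naive because only closed values are ever substituted.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical syntax for a tiny deterministic CBPV fragment (Levy-style), not the paper's.
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Thunk:
    body: "Comp"

@dataclass(frozen=True)
class Return:
    value: "Value"

@dataclass(frozen=True)
class To:            # first to var. body
    first: "Comp"
    var: str
    body: "Comp"

@dataclass(frozen=True)
class Lam:
    var: str
    body: "Comp"

@dataclass(frozen=True)
class App:           # fun applied to arg: push arg, then evaluate fun
    fun: "Comp"
    arg: "Value"

@dataclass(frozen=True)
class Force:
    thunk: "Value"

Value = Union[int, Var, Thunk]
Comp = Union[Return, To, Lam, App, Force]

def subst(t, x, v):
    """Replace free occurrences of variable x by the closed value v."""
    if isinstance(t, Var):
        return v if t.name == x else t
    if isinstance(t, int):
        return t
    if isinstance(t, Thunk):
        return Thunk(subst(t.body, x, v))
    if isinstance(t, Return):
        return Return(subst(t.value, x, v))
    if isinstance(t, To):
        body = t.body if t.var == x else subst(t.body, x, v)
        return To(subst(t.first, x, v), t.var, body)
    if isinstance(t, Lam):
        return t if t.var == x else Lam(t.var, subst(t.body, x, v))
    if isinstance(t, App):
        return App(subst(t.fun, x, v), subst(t.arg, x, v))
    if isinstance(t, Force):
        return Force(subst(t.thunk, x, v))
    raise TypeError(t)

def run(term, max_steps=10_000):
    """Evaluate a closed computation; the stack plays the role of the context C."""
    stack = []  # frames: ("to", x, N) or ("arg", V)
    for _ in range(max_steps):
        if isinstance(term, To):            # redex discovery: push the frame [_] to x. N
            stack.append(("to", term.var, term.body))
            term = term.first
        elif isinstance(term, App):         # redex discovery: push the argument frame
            stack.append(("arg", term.arg))
            term = term.fun
        elif isinstance(term, Force):       # force (thunk M) -> M
            term = term.thunk.body
        elif isinstance(term, Return) and stack and stack[-1][0] == "to":
            _, x, body = stack.pop()        # (return V) to x. N -> N[V/x]
            term = subst(body, x, term.value)
        elif isinstance(term, Lam) and stack and stack[-1][0] == "arg":
            _, v = stack.pop()              # (lambda x. M) V -> M[V/x]
            term = subst(term.body, term.var, v)
        else:
            return term, stack              # final (or stuck) configuration
    raise RuntimeError("step budget exhausted")
```

For example, `run(App(Lam("x", Return(Var("x"))), 42))` ends in the configuration `(Return(42), [])`, i.e., with the empty context.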

The judgments $C \cdot M \downarrow a$ are defined for all terms $M$ of some type $\sigma$ (where $\sigma$ is a value or computation type), contexts $C$, and probabilities $a$, and mean that $a$ is way-below the probability of termination of $C \cdot M$ (i.e., either $a = 0$ or $a$ is strictly less than the probability that $C \cdot M$ terminates). Since the language features demonic non-deterministic choice, we really mean the probability of must-termination, namely that, in whichever way the non-determinism is resolved (evaluating left, or right), the final probability is larger than $a$.
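The must-termination reading can be illustrated on finite choice trees that abstract away contexts and terms (an assumption made for this sketch only): probabilistic choice averages the termination probabilities of its branches, demonic choice takes the worse one, and the judgment holds exactly when $a$ is way-below the result in $[0,1]$. The tree constructors below are hypothetical.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Union

@dataclass(frozen=True)
class Done:          # terminates
    pass

@dataclass(frozen=True)
class Loop:          # diverges
    pass

@dataclass(frozen=True)
class Coin:          # probabilistic choice: left with probability p, right with 1 - p
    p: Fraction
    left: "Tree"
    right: "Tree"

@dataclass(frozen=True)
class Demonic:       # demonic choice: an adversary resolves it
    left: "Tree"
    right: "Tree"

Tree = Union[Done, Loop, Coin, Demonic]

def must_terminate(t):
    """Worst-case termination probability of a finite choice tree."""
    if isinstance(t, Done):
        return Fraction(1)
    if isinstance(t, Loop):
        return Fraction(0)
    if isinstance(t, Coin):
        return t.p * must_terminate(t.left) + (1 - t.p) * must_terminate(t.right)
    if isinstance(t, Demonic):
        return min(must_terminate(t.left), must_terminate(t.right))
    raise TypeError(t)

def judgment_holds(t, a):
    """Reading of C·M ↓ a: a is way-below the must-termination probability in [0, 1]."""
    return a == 0 or a < must_terminate(t)

EXAMPLE = Coin(Fraction(1, 2), Done(), Demonic(Done(), Loop()))
```

On `EXAMPLE` the adversary always picks `Loop`, so the must-termination probability is $1/2$: the judgment holds for $a = 1/3$ but not for $a = 1/2$. Taking minima locally is valid here because the trees are finite and their subtrees are independent.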

## 5 Soundness

We write for , where sups are taken in .

We let the rank of a type be for a value type that is not of the form , for types of the form , and for computation types. This will play a key role in our soundness proof, for the following reason: for every elementary or initial context , the rank of is less than or equal to the rank of . Hence if is of type , and is of type , then every has rank between those of and .

We will also need to define the semantics of contexts so that for every and for every environment . is the composition of , , , …, , where:

• maps to ,

• ,

• is the identity map,

• maps to and otherwise adds one,

• maps to and otherwise subtracts one,

• maps to , every non-zero number to and to ,

• maps to , and to ,

• is first projection,

• is second projection,

• ,

• , and

• maps to .
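The semantics of a context is a composition of maps, most of them strict. As an illustration only, the Python sketch below models $\mathbb{Z}_\bot$ with `None` as $\bot$ and implements two frames, presumably the successor and predecessor frames described as mapping $\bot$ to $\bot$ and otherwise adding, resp. subtracting, one. The list representation (outermost frame first) is an assumption mirroring the definition of contexts as finite lists.

```python
from functools import reduce

BOT = None  # bottom element of Z_bot

def strict(op):
    """Extend an integer function to Z_bot by sending bottom to bottom."""
    return lambda n: BOT if n is BOT else op(n)

succ_sem = strict(lambda n: n + 1)  # maps bottom to bottom and otherwise adds one
pred_sem = strict(lambda n: n - 1)  # maps bottom to bottom and otherwise subtracts one

def context_semantics(frames):
    """frames = [E0, E1, ..., En], outermost first, as in the list defining a context.
    The hole sits inside En, so En's map is applied first and E0's map last."""
    return lambda value: reduce(lambda acc, frame: frame(acc), reversed(frames), value)

ctx = context_semantics([succ_sem, succ_sem, pred_sem])
```

Then `ctx(3)` is `4`, and `ctx(BOT)` is `BOT`: a composition of strict maps is strict.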

###### Proposition 5.1 (Soundness)

Let , , where is a value or computation type, and let . In each of CBPV(D,P), CBPV(D,P)+pifz, CBPV(D,P)+◯, and CBPV(D,P)+pifz+◯:

1. For every , if is derivable, then either and , or and for every , .

2. If then , otherwise for every , .

Proof. Item 2 is an easy consequence of item 1, which we prove by induction on the derivation.

In the case of the first rule (), , and . For every , we have , so , and certainly for every .

The case of the second rule is obvious.

The case of the leftmost rule of the next row follows from the observation that if , then . We use the standard substitution lemma in the case of β-reduction (: the value of the left-hand side is , and the value of the right-hand side is ). In the case of , we also use the fact that (Proposition 4.2, item 2). In the case of , we use the equality and the substitution lemma.

By our observation on ranks, if , where and for each , then all the types are computation types (rank ). In that case, can only be of one of the two forms , . (Further inspection would reveal that the first case is impossible, but we will not need that yet.) We now observe that in each case, maps top to top: in the case of , this is by Proposition 4.2, item 3. It follows that also maps top to top, whence . As a consequence, , and the claim that for every , is vacuously true: the rule that derives for every is sound.

Similarly, and still assuming , for each , preserves binary infima. When