 # A Verified Compiler for Probability Density Functions

Bhat et al. developed an inductive compiler that computes density functions for probability spaces described by programs in a simple probabilistic functional language. In this work, we implement such a compiler for a modified version of this language within the theorem prover Isabelle and give a formal proof of its soundness w.r.t. the semantics of the source and target language. Together with Isabelle's code generation for inductive predicates, this yields a fully verified, executable density compiler. The proof is done in two steps, using a standard refinement approach: first, an abstract compiler working with abstract functions modelled directly in the theorem prover's logic is defined and proven sound. Then, this compiler is refined to a concrete version that returns a target-language expression.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

### 1.1 Motivation

Random distributions of practical significance can often be expressed as probabilistic functional programs. When studying a random distribution, it is often desirable to determine its probability density function (PDF). This can be used to e. g. determine the expectation or sample the distribution with a sampling method such as Markov-chain Monte Carlo (MCMC).

In 2013, Bhat et al. presented a compiler that computes the probability distribution function of a program in the probabilistic functional language

Fun [bhat13]. They evaluated the compiler on a number of practical problems and concluded that it reduces the amount of time and effort required to model them in an MCMC system significantly compared to hand-written models.

Bhat et al. also stated that their eventual goal is the formal verification of such a compiler in a theorem prover [bhat12]. This has the advantage of providing guaranteed correctness, i. e. the result of the compilation is provably a PDF for the source expression, according to the formal semantics. This greatly increases the confidence one has in the compiler.

In this Master’s thesis, we implemented such a compiler for a similar probabilistic functional language in the interactive theorem prover Isabelle/HOL and formally proved its correctness.

### 1.2 Related work

This work is based on the work of Bhat et al. [bhat12, bhat13], which presents a density compiler for probability spaces described by expressions in the language Fun. This is a small functional language with basic arithmetic, Boolean logic, product and sum types, conditionals, and a number of built-in discrete and continuous distributions. It does, however, not support lists or recursion. The correctness proof is purely pen and paper, but the authors stated that formalisation in a proof assistant such as Coq is the ultimate goal [bhat12].

Park et al. [park05] developed a probabilistic extension to Objective CAML called . Their work focuses on sample generation, i. e. using an infinite stream of random numbers to compute samples of the distribution described by a probabilistic program. While Bhat et al. generate density functions of functional programs, Park et al. generate sampling functions. These are functions that map

to the sample space. The sampling function effectively uses a finite number of uniformly-distributed random numbers from the interval

and returns a sample of the desired random variable. This approach allows them to handle much more general distributions, even recursively-defined ones and distributions that do not have a density function, but it does not allow precise reasoning about these distributions (such as determining the exact expectation). No attempt at formal verification is made.

pGCL, another probabilistic language, has been formalised with a proof assistant – first in HOL by Hurd et al. [hurd], then in Isabelle/HOL by David Cock [cock, cock_isa]. pGCL contains a rich set of language features, such as recursion and probabilistic and non-deterministic choice. David Cock formally proved a large number of results about the semantics of the language and developed mechanisms for refinement and program verification; however, the focus of this work was verification of probabilistic programs, not compiling them to density functions. Bhat et al. mention that reconciling recursion with probability density functions is difficult [bhat12], so a feature-rich language such as pGCL is probably not suited for developing a density compiler.

Audebaud and Paulin-Mohring [audebaud] also implemented a probabilistic functional language with recursion in a theorem prover (namely Coq). Their focus was also on verification of probabilistic programs. While Cock uses a shallow embedding in which any type and operation of the surrounding logic of the theorem prover (i. e. HOL) can be used, Audebaud and Paulin-Mohring use a deep embedding with a restricted type system, expression structure, predefined primitive functions, etc. Like Bhat et al. and we, they also explicitly reference the Giry monad as a basis for their work.

### 1.3 Utilised tools

As stated before, we work with the interactive theorem prover Isabelle/HOL. Isabelle is a generic proof assistant that supports a number of logical frameworks (object logics) such as Higher-Order Logic (HOL) or Zermelo-Fraenkel set theory (ZF). The most widely used instance is Isabelle/HOL, which is what we use.

We heavily rely on Johannes Hölzl’s Isabelle formalisation of measure theory [hoelzl]

, which is already part of the Isabelle/HOL library. We also use Sudeep Kanav’s formalisation of the Gaussian distribution and a number of libraries from the proof of the Central Limit Theorem by Avigad et al.

[avigad], namely the notion of interval and set integrals and the Fundamental Theorem of Calculus for these integrals. All of these have been or will be moved to the Isabelle/HOL library by their respective maintainers as well.

### 1.4 Outline

In Sect. 2.1, we will explain the notation we will use and then give a brief overview of the mathematical basics we require – in particular the Giry Monad – in Sect. 2.2.

Section 3 contains the definition and the semantics of the source and target language. Section LABEL:sec_abstract defines the abstract compiler and gives a high-level outline of the soundness proof. Section LABEL:sec_concrete then explains the refinement of the abstract compiler to the concrete compiler and the final correctness result and evaluates the compiler on a simple example.

Section LABEL:sec_conclusion gives an overview of how much work went into the different parts of the project and what difficulties we encountered. It also lists possible improvements and other future work before summarising the results we obtained. Finally, the appendix contains a table of all notation and auxiliary functions used in this work for reference.

## 2 Preliminaries

### 2.1 Notation

In the following, we will explain the conventions and the notation we will use.

#### 2.1.1 Typographical notes

We will use the following typographical conventions in mathematical formulæ:

• Constants, functions, datatype constructors, and types will be typeset in regular roman font: , , , etc.000In formulæ embedded in regular text, we will often set types and constants in italics to distinguish them from the surrounding text.

• Free and bound variables (including type variables) are in italics: , , , etc.

• Isabelle keywords are in bold print: , , , etc.

• -algebras are typeset in calligraphic font: , , , etc.

• Categories are typeset in Fraktur: , , , etc.

• File names of Isabelle theories are set in a monospaced font: PDF_Compiler.thy.

#### 2.1.2 Deviations from standard mathematical notation

In order to maintain coherence with the notation used in Isabelle, we will deviate from standard mathematical notation in the following ways:

• As is convention in functional programming (and therefore in Isabelle), function application is often written as instead of . It is the operation that binds strongest and it associates to the left, i. e. is and is .

• The integral over integrand with variable and the measure is written as

• The special notion of a non-negative integral111This is a very central concept in the measure theory library in Isabelle. We will mostly use it with non-negative functions anyway, so the distinction is purely formal. is used; this integral ‘ignores’ the negative part of the integrand. Effectively:

 ∫+x. f x ∂μ^=∫max(0,f(x))dμ(x)
• Lambda abstractions, such as , are used to specify functions. The corresponding standard mathematical notation would be .

Table 1 lists a number of additional mathematical notation.

#### 2.1.3 Semantics notation

We will always use to denote a type environment, i. e. a function from variable names to types, and to denote a state, i. e. a function from variable names to values. Note that variable names are always natural numbers as we use de Bruijn indices. The letter (Upsilon) will be used to denote density contexts, which are used in the compiler.

We also employ the notation resp. to denote the insertion of a new variable with the type (resp. value ) into a typing environment (resp. state). This is used when entering the scope of a bound variable; the newly inserted variable then has index 0 and all other variables are incremented by 1. The precise definition is as follows:222 Note the analogy to the notation for prepending an element to a list. This is because contexts with de Bruijn indices are generally represented by lists, where the -th element is the entry for the variable , and inserting a value for a newly bound variable is then simply the prepending operation.

 (y∙f)(x)={yif x=0f(x−1)otherwise

We use the same notation for inserting a new variable into a set of variables, shifting all other variables, i. e.:

 x∙V={x}∪{y+1 | y∈V}

In general, the notation and variations thereof will always mean ‘The expression has type in the type environment , whereas and variations thereof mean ‘The expression compiles to under the context .

### 2.2 Mathematical basics

The category theory part of this section is based mainly on a presentation by Ernst-Erich Doberkat [doberkat]. For a more detailed introduction, see his textbook [doberkat_book] or the original paper by Michèle Giry [giry].

#### 2.2.1 Basic measure spaces.

There are two basic measure spaces we will use:

##### Counting space.

The counting space on a countable set has

• the carrier set

• the measurable sets , i. e. all subsets of

• the measure , i. e. the measure of a subset of is simply the number of elements in that set (which can be )

##### Borel space.

The Borel space is a measure space on the real numbers which has

• the carrier set

• the Borel -algebra as measurable sets, i. e. the smallest -algebra containing all open subsets of

• the Borel measure as a measure, i. e. the uniquely defined measure that maps real intervals to their lengths

#### 2.2.2 Sub-probability spaces.

A sub-probability space is a measurable space with a measure such that every set has a measure , or, equivalently,  .

For technical reasons, we also assume  . This is required later in order to define the operation in the Giry monad in a convenient way within Isabelle. This non-emptiness condition will always be trivially satisfied by all the measure spaces used in this work.

#### 2.2.3 The category Meas.

Note that:

• For any measurable space , the identity function is --measurable.

• For any measurable spaces , , , an --measurable function , and a --measurable function , the function is --measurable.

Therefore, measurable spaces form a category where:

• the objects of the category are measurable spaces

• the morphisms of the category are measurable functions

• the identity morphism is the identity function

• morphism composition is function composition

#### 2.2.4 Kernel space.

The kernel space of a measurable space is the natural measurable space over the measures over with a certain property. For our purposes, this property will be that they are sub-probability measures (as defined in Sect. 2.2.2).

Additionally, a natural property the kernel space should satisfy is that measuring a fixed set while varying the measure within should be a Borel-measurable function; formally:

 For all X∈A, f:S(A,A)→R, \uplambdaμ. μ(X)\,\ is\ \,S(A,A)-Borel-measurable
We can now simply define as the smallest measurable space with the carrier set
that fulfils this property, i. e. we let with
where is the Borel -algebra on .

Additionally, for a measurable function , we define , where denotes the push-forward measure (or image measure)333 .. Then maps objects of to objects of and morphisms of to morphisms of . We can thus see that is an endofunctor in the category , as and  .

The Giry monad naturally captures the notion of choosing a value according to a (sub-)probability distribution, using it as a parameter for another distribution, and observing the result.

Consequently, (or ) yields a Dirac measure, i. e. a probability measure in which all the ‘probability’ lies in a single element, and (or ) integrates over all the input values to compute one single output measure. Formally, for measurable spaces and , a measure on , a value , and an --measurable function :

 return(A,A) x:=\uplambdaX. {1if x∈X0otherwiseμ≫=f:=\uplambdaX. ∫x. f(x)(X)∂μ

Unfortunately, restrictions due to Isabelle’s type system require us to determine the -algebra of the resulting measurable space for since this information cannot be provided by the type of . This can be done with an additional parameter, but it is more convenient to define bind in such a way that it chooses an arbitrary value and takes the -algebra of (or the count space on if is empty)444Note that for any -measurable function , the -algebra thus obtained is independent of which value is chosen, by definition of the kernel space..

This choice is somewhat non-standard, but the difference is of no practical significance as we will not use on empty measure spaces.

To simplify the proofs for bind, we instead define the join operation (also known as in category theory) first and use it to then define bind. The join operation ‘flattens’ objects, i. e. it maps an element of to one of  . Such an operation can be naturally defined as:

 join μ=\uplambdaX. ∫μ′. μ′(X)∂μ

Note that in Isabelle, join has an additional explicit parameter for the measurable space of the result to avoid the problem we had with bind. This makes expressions containing join more complicated; this is, however, justified by the easier proofs and will be unproblematic later since we will never use join directly, only bind.

Now, bind can be defined using join in the following way, modulo handling of empty measure spaces555, i. e. lifting a function on values to a function on measure spaces, is done by the distr function in Isabelle. This function was already defined in the library, which simplifies our proofs about bind, seeing as some of the proof work has already been done in the proofs about distr.:

 μ≫=f=join (S(f)(μ))=join (f(μ))

#### 2.2.6 The ‘do’ syntax.

For better readability, we employ a Haskell-style do notation’ for operations in the Giry monad. The syntax of this notation is defined recursively, where stands for a monadic expression and stands for arbitrary ‘raw’ text:

 do {M}^=Mdo {x←M; ⟨pattern⟩}^=M≫=(\uplambdax. do {⟨pattern⟩})

## 3 Language Syntax and Semantics

The source language used in the formalisation was modelled after the language Fun described by Bhat et al. [bhat13]; similarly, the target language is almost identical to the target language used by Bhat et al. However, we have made the following changes in our languages:

• Variables are represented by de Bruijn indices to avoid handling freshness, capture-avoiding substitution, and related problems.

• No sum types are supported. Consequently, the command is replaced with an command. Furthermore, booleans are a primitive type rather than represented as  .

• The type is called and it represents a real number with absolute precision as opposed to an IEEE floating point number.

• Beta and Gamma distributions are not included.

In the following sections, we give the precise syntax, typing rules, and semantics of both our source language and our target language.

### 3.1 Types, values, and operators

The source language and the target language share the same type system and the same operators. Figure 1 shows the types and values that exist in our languages.666Note that , , and stand for the respective Isabelle types, whereas , , and stand for the source-/target-language types. Additionally, standard arithmetical and logical operators exist.

All operators are total, meaning that for every input value of their parameter type, they return a single value of their result type. This requires some non-standard definitions for non-total operations such as division, the logarithm, and the square root – we define them to be 0 outside their domain. Non-totality could also be handled by implementing operators in the Giry monad by letting them return either a Dirac distribution with a single result or, when evaluated for a parameter on which they are not defined, the null measure. This, however, would probably complicate many proofs significantly.

To increase readability, we will use the following abbreviations: and stand for and , respectively. , , etc. will be omitted in expressions when their presence is implicitly clear from the context. stands for and for  .

### 3.2 Auxiliary definitions

A number of auxiliary definitions are used in the definition of the semantics; for a full list of auxiliary functions see table 2. The following two notions require a detailed explanation:

#### 3.2.1 Measure embeddings.

A measure embedding is the measure space obtained by ‘tagging’ values in a measure space with some injective function (in fact, will always be a datatype constructor). For instance, a set of values of type can naturally be measured by stripping away the constructor and using a measure on real numbers (e. g. the Lebesgue-Borel measure) on the resulting set of reals. Formally:

 embed_measure (A,A,μ) f = (f(A), {f(X) | X∈A}, \uplambdaX. μ(f−1(X)∩A))

#### 3.2.2 Stock measures.

The stock measure for a type is the ‘natural’ measure on values of that type. This is defined as follows:

• For the countable types , , : the count measure over the corresponding type universes

• For type : the embedding of the Lebesgue-Borel measure on with

• For : the embedding of the product measure

 stock_measure t1 ⨂ stock_measure t2

with

Note that in order to save space and increase readability, we will often write instead of in integrals.

#### 3.2.3 The state measure.

Using the stock measure, we can also construct a measure on states in the context of a typing environment . A state on the variables is a function that maps a variable in to a value. A state is well-formed w. r. t. to and if it maps every variable to a value of type and every variable to .

We now fix and a finite and consider the set of well-formed states w. r. t. and . Another representation of these states are tuples in which the -th component is the value of the -th variable in . The natural measure that can be given to such tuples is then the finite product measure of the stock measures of the types of the variables:

 state_measure Γ V:=⨂x∈Vstock_measure (Γ x)

### 3.3 Source language

Figures 2 and 3 show the syntax resp. the typing rules of the source language. Figure 4 defines the source language semantics as a primitively recursive function. Similarly to the abbreviations mentioned in Sect. 3.1, we will omit when its presence is implicitly obvious from the context; e. g. if in some context, is an expression and is a constant real number, we will write as  .

#### 3.3.1 Built-in distributions.

Our language contains the built-in distributions Bernoulli, UniformInt, UniformReal, Gaussian, and Poisson. The uniform distributions are parametrised with a pair of numbers and return a uniform distribution over the interval . The Gaussian distribution is parametrised with a pair of real numbers

, i. e. mean and standard deviation.

For invalid parameters, i. e. or , the built-in distributions return the null measure.

### 3.4 Deterministic expressions

We call an expression deterministic (written as ) if it contains no occurrence of or . Such expressions are of particular interest: if all their free variables have a fixed value, they return precisely one value, so we can define a function 777In Isabelle, the expression randomfree is used instead of deterministic, hence the ‘rf’ suffix. This is in order to emphasise the syntactical nature of the property. Additionally, it is worth noting that a syntactically deterministic expression is not truly deterministic if the variables it contains are randomised over, which is the case sometimes. that, when given a state and a deterministic expression , returns this single value.

The definition is obvious and leads to the following equality (assuming that is deterministic and well-typed and is a valid state):

 expr_sem σ e = return (expr_sem_rf σ e)

This property will later enable us to also convert deterministic source-language expressions into ‘equivalent’ target-language expressions.

### 3.5 Target language

The target language is again modelled very closely after the one used by Bhat et al. [bhat13]. The type system and the operators are the same as in the source language, but the key difference is that while expressions in the source language return a measure space on their result type, the expressions in the target language always return a single value.

Since our source language lacks sum types, so does our target language. Additionally, our target language differs from that used by Bhat et al. in the following respects:

• Our language has no function types; since functions only occur as integrands and as final results (as the compilation result is a density function), we can simply define integration to introduce the integration variable as a bound variable and let the final result contain a single free variable with de Bruijn index , i. e. there is an implicit abstraction around the compilation result.

• Evaluation of expressions in our target language can never fail. In the language by Bhat et al., failure is used to handle undefined integrals; we, on the other hand, use the convention of Isabelle’s measure theory library, which returns 0 for integrals of non-integrable functions. This has the advantage of keeping the semantics simple, which makes proofs considerably easier.

• Our target language does not have Let bindings, since, in contrast to the source language, they would be semantically superfluous here. However, they are still useful in practice since they yield shorter expressions and can avoid multiple evaluation of the same term; they could be added with little effort.

Figures 5, 3.5, and LABEL:cexpr_sem show the syntax, typing rules, and semantics of the target language.