 # Formalizing computability theory via partial recursive functions

We present a formalization of the foundations of computability theory in the Lean theorem prover. We use primitive recursive functions and partial recursive functions as the main objects of study, including the construction of a universal partial recursive function and a proof of the undecidability of the halting problem. Type class inference provides a transparent way to supply Gödel numberings where needed and encapsulate the encoding details.


## 1. Introduction

Computability theory is the study of the limitations of computers, first brought into focus in the 1930s by Alan Turing through his discoveries on the existence of universal Turing machines and the unsolvability of the halting problem (Turing, 1937). In the following years Alonzo Church described (Church, 1936) the λ-calculus as a model of computation, and Kleene proposed the μ-recursive functions; that these all give the same collection of "computable functions" gave credence to the thesis (Kleene, 1943) that this is the "right" notion of computation, and all others are equivalent in power. Today, this work lies at the basis of programming language semantics and the mathematical analysis of computers.

Complexity theory is in some sense a refinement of computability theory, asking not "what can be computed" but "what can be computed in a reasonable time". Methods of asymptotic analysis of algorithms are now commonplace in computer science, and problems such as P = NP, with its million dollar bounty, have spurred a great deal of research on the classification of the difficulty of decidable problems. But this theory is still almost completely unformalized, and in this paper we aim to make some initial steps toward a flexible and usable foundation for this research. We will not cover any complexity theory in this paper, but our experiments in computability theory are very promising, and much of the infrastructure described here will be directly applicable or easily adapted.

Like many areas of mathematics, both computability theory and complexity theory remain somewhat "formally ambiguous" about their foundations, in the sense that most theorems and proofs can be stated with respect to a number of different concretizations of the ideas in play. For example, in the equation P = NP, what is P? It is the set of polynomial time computable problems or languages, but whether a "language" is defined as a subset of ℕ or a set of binary strings seems not to matter too much, and an individual author may choose the representation that is most convenient for the present purpose.

This formal ambiguity is somewhat frustrating for a formalizer, who would prefer some universal conventions, but it also provides some freedom to pick the representation that fits best with the formal system. This is seen even more prominently in computability theory, where we have three or four competing formulations of “computable”, which are all equivalent but each present their own view on the concept.

As a pragmatic matter, Turing machines have become the de facto standard formulation of computable functions, but they are also notorious for requiring a lot of tedious encoding in order to get the theory off the ground, to the extent that the term "Turing tarpit" is now used for languages in which "everything is possible but nothing of interest is easy" (Perlis, 1982). Asperti and Ricciotti (Asperti and Ricciotti, 2012) have formalized the construction of a universal Turing machine in Matita, but the encoding details make the process long and arduous. Norrish (Norrish, 2011) uses the lambda calculus in HOL4, which is cleaner but still requires some complications with respect to the handling of partiality and type dependence.

Instead, we build our theory on Kleene's theory of μ-recursive functions. In this theory, we have a collection of functions ℕ → ℕ, in which we can do elementary operations on ℕ plus the ability to do recursive constructions on the natural number arguments. This produces the primitive recursive functions, and adding an unbounded minimization operator gives these functions the same expressive power as Turing computable functions. We hope to show that the "main result" here, the existence of a universal machine, is easiest to achieve over the partial recursive functions, and moreover the usage of typeclasses for Gödel numbering provides a rich and flexible language for discussing computability over arbitrary types.

This theory has been developed in the Lean theorem prover, a relatively young proof system based on dependent type theory with inductive types, written primarily by Leonardo de Moura at Microsoft Research (de Moura et al., 2015). The full development is available in the mathlib standard library (Carneiro et al., 2018), and a snapshot of the library as of this publication is available at (Carneiro, 2018). In section 2 we describe our extensible approach to Gödel numbering, in section 3 we look at primitive recursive functions, extended to partial recursive functions in section 4. Section 5 deals with the universal partial recursive function and its properties, including its application to unsolvability of the halting problem.

## 2. Encodable sets

As mentioned in the introduction, we would like to support some level of formal ambiguity when encoding problems, such as defining languages as subsets of ℕ vs. sets of binary strings, or even strings over Σ where Σ is some finite or countable alphabet. Similarly, we would like to talk about primitive recursive functions on types other than ℕ, or the partial recursive function eval that evaluates a partial function specified by a code (see section 5).

Unfortunately it is not enough just to know that these types are countable, because while the exact bijection to ℕ doesn't matter too much, it is important that we not use one bijection in a proof and a different bijection in the next proof, because these differ by an automorphism of ℕ which may not be computable. (For example, if we encode the halting Turing machines as even numbers and the non-halting ones as odd numbers, then the halting problem becomes trivial.) In complexity theory it becomes even more important that these bijections are "simple" and do not smuggle in any additional computational power.

To support these uses, we make use of Lean's typeclass resolution mechanism, which is a way of inferring structure on types in a syntax-directed way. The major advantage of this approach is that it allows us to fix a uniform encoding that we can then apply to all types constructed from a few basic building blocks, which avoids the multiple encoding problem, and still lets us use the types we would like to (or even construct new types like code, whose explicit structure reflects the inductive construction of partial recursive functions rather than the encoding details).

At the core of this is the function mkpair : ℕ × ℕ → ℕ and its inverse unpair, forming a bijection ℕ × ℕ ≃ ℕ (see figure 1). There is very little we need about these functions except their definability, and that mkpair and the two components of unpair are primitive recursive.
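As a concrete sketch, the pairing function can be defined by cases on which component is larger, consistent with the `unpair` definition shown in section 3.1 (this is a sketch of the idea; the formal definition may differ in minor details):

```lean
-- Pair (a, b) into a single natural number. When a < b the pair lands in the
-- "column" below the diagonal of the b×b square; otherwise in the "row" beside
-- the a×a square. unpair recovers (a, b) using s = sqrt n.
def mkpair (a b : ℕ) : ℕ :=
if a < b then b * b + a else a * a + a + b
```

For example, mkpair 2 3 = 3·3 + 2 = 11, and sqrt 11 = 3 with 11 − 9 = 2 < 3, so unpair 11 = (2, 3).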

We say that a type α is encodable if we have a function encode : α → ℕ, and a partial inverse decode : ℕ → option α which correctly decodes any value in the image of encode. Here option α is the type consisting of the elements some a for a : α, and an extra element none representing failure or undefinedness. If the decode function happens to be total (that is, never returns none), then α is called denumerable. Importantly, these notions are "data" in the sense that they impose additional structure on the type – there are nonequivalent ways for a type to be encodable, and we will want these properties to be inferred in a consistent way.

Classically, an encodable instance on α is just an injection to ℕ, and a denumerable instance is just a bijection to ℕ. But these notions have additional constructive import, and they lie in the executable fragment of Lean, meaning that one can actually run these encoding functions on concrete values of the types.
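The typeclass packaging of these notions can be sketched as follows (close to the formal definition, modulo naming details):

```lean
-- α is encodable if encode has a partial inverse decode that succeeds
-- on everything in the image of encode.
class encodable (α : Type*) :=
(encode : α → ℕ)
(decode : ℕ → option α)
(encodek : ∀ a, decode (encode a) = some a)

-- α is denumerable if additionally decode is total, giving a bijection with ℕ.
class denumerable (α : Type*) extends encodable α :=
(decode_inv : ∀ n, ∃ a, decode n = some a ∧ encode a = n)
```

Typeclass resolution then derives instances for composite types (α × β, option α, α ⊕ β, …) from instances for their components, which is what keeps the encodings uniform across the library.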

## 3. Primitive recursive functions

The traditional definition of primitive recursive functions looks something like this:

###### Definition 3.1.

The primitive recursive functions are the least subset of $\bigcup_n (\mathbb{N}^n \to \mathbb{N})$ satisfying the following conditions:

• The constant function $\mathrm{zero}(n) = 0$ is prim. rec.

• The successor function $\mathrm{succ}(n) = n + 1$ is prim. rec.

• The projection function $\pi^n_i(x_1, \dots, x_n) = x_i$ is prim. rec. for each $i \le n$.

• If $f : \mathbb{N}^m \to \mathbb{N}$ and $g_i : \mathbb{N}^n \to \mathbb{N}$ for $i \le m$ are prim. rec., then so is the $m$-way composition $\vec{x} \mapsto f(g_1(\vec{x}), \dots, g_m(\vec{x}))$.

• If $f : \mathbb{N}^n \to \mathbb{N}$ and $g : \mathbb{N}^{n+2} \to \mathbb{N}$ are prim. rec., then the function $h$ defined by

$$h(\vec{z}, 0) = f(\vec{z}) \qquad h(\vec{z}, n+1) = g(\vec{z}, n, h(\vec{z}, n))$$

is also prim. rec.

Lean is quite good at expressing these kinds of constructions as inductively defined predicates. See figure 3 for the definition that appears in Lean. But there is an important difference in this formulation: rather than dealing with $n$-ary functions, we utilize the pairing function on ℕ to write everything as a function ℕ → ℕ with only one argument. This drastically simplifies the composition rule to just the usual function composition, and in the primitive recursion rule we need only one auxiliary parameter rather than $n$. The projection functions are then replaced with the left and right components of unpair, and in order to express composition with higher arity functions, we need the pair constructor to explicitly form the map $n \mapsto \mathrm{mkpair}\ (f\ n)\ (g\ n)$. (See section 3.1 if you think this definition is a cheat.)
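The single-argument formulation described above (figure 3) is roughly the following inductive predicate (a sketch, up to naming and namespace details):

```lean
-- Primitive recursiveness for unary functions ℕ → ℕ, using mkpair/unpair
-- in place of n-ary tuples.
inductive nat.primrec : (ℕ → ℕ) → Prop
| zero : nat.primrec (λ n, 0)
| succ : nat.primrec nat.succ
| left : nat.primrec (λ n, n.unpair.1)
| right : nat.primrec (λ n, n.unpair.2)
| pair {f g} : nat.primrec f → nat.primrec g →
    nat.primrec (λ n, nat.mkpair (f n) (g n))
| comp {f g} : nat.primrec f → nat.primrec g →
    nat.primrec (λ n, f (g n))
| prec {f g} : nat.primrec f → nat.primrec g →
    nat.primrec (nat.unpaired (λ z n,
      n.rec (f z) (λ y IH, g (nat.mkpair z (nat.mkpair y IH)))))
```

Note how comp is literally function composition, and prec carries a single parameter z paired with the recursion argument, as discussed above.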

Now that we have a definition of primitive recursive on ℕ, we would like to extend it to other types using the encodable mechanism discussed in section 2. There is a problem though: given an arbitrary encodable instance, we can combine the decode function with the encode function induced by this instance to form a new function ℕ → option ℕ, which may or may not be primitive recursive. If it is not, then it "brings new power" to the primitive recursive functions, and so it isn't a pure translation of primrec to other types. To resolve this we define `primcodable α` to mean exactly that α has an encodable instance for which this composition is primitive recursive. All of the encodable constructions we have discussed (indeed, all those defined in Lean) are primcodable, so this is not a severe restriction.

Now we can say that a function between arbitrary primcodable types is primitive recursive if, when we pass through the encode and decode functions, we get a primitive recursive function on ℕ:

```lean
def primrec {α β} [primcodable α]
  [primcodable β] (f : α → β) : Prop :=
nat.primrec (λ n, encode ((decode α n).map f))
```

Notation note: The dot notation `(decode α n).map f` expands to `option.map f (decode α n)`, which lifts f to a function on option types before applying it to `decode α n`. The result has type `option β`, which has an encode function because β does.

Now we are in a position to recover the textbook definition of primitive recursive, because ℕ is primcodable, so we have the language to say that a function ℕ → ℕ is primitive recursive, and indeed this is equivalent to definition 3.1.

But we can now say much more: The identity function is primitive recursive, because it is just encoded as the identity on ℕ. The constant function is primitive recursive, because it encodes to some constant function (composed with a function that filters out values not in the domain of decode). The composition of prim. rec. functions on arbitrary types is prim. rec. The pairing λ a, (f a, g a) of primitive recursive functions f : α → β and g : α → γ is primitive recursive.

Indeed all the usual basic operations on inductive types like sum, prod, and option are primitive recursive. We define convenient syntax for prim. rec. binary functions f : α → β → σ (a common case), expressed by uncurrying to α × β → σ, and for primitive recursive predicates, which are decidable predicates that are primitive recursive when coerced to functions into bool (which is primcodable).
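The binary-function convention can be sketched as a definition by uncurrying (a sketch; the formal name and notation may differ):

```lean
-- A binary function is primitive recursive iff its uncurried form on the
-- product type is, reusing the unary primrec predicate.
def primrec₂ {α β σ} [primcodable α] [primcodable β] [primcodable σ]
  (f : α → β → σ) : Prop :=
primrec (λ p : α × β, f p.1 p.2)
```

Since α × β is primcodable whenever α and β are, this reduces the binary case entirely to the unary one.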

The big caveat comes in theorems like the following:

If α and σ are primcodable types and f : α → σ and g : α → ℕ → σ → σ are prim. rec., then the function h : α → ℕ → σ defined by

$$h\ a\ 0 = f\ a \qquad h\ a\ (n+1) = g\ a\ n\ (h\ a\ n)$$

is also prim. rec.

This is of course just the generalization of the primitive recursion clause to arbitrary types, but it requires that the target type σ be primcodable, which means in particular that it is countable, so we can't define an object of function type by recursion. (The universal partial recursive function will give us a way to get around this later.) But this is in some sense "working as intended", since this is exactly why the Ackermann function

$$A(0, n) = n + 1 \qquad A(m+1, 0) = A(m, 1) \qquad A(m+1, n+1) = A(m, A(m+1, n))$$

is not primitive recursive. In addition to allowing such higher types in recursion, Lean's recursor for the natural numbers is dependent, but there is no reasonable way to incorporate dependencies in primcodable types, so we just use non-dependent types when necessary.

One other primcodable type we have not yet discussed is list α, the type of finite lists of values of type α. The encode and decode functions are defined recursively, using the pairing function to combine the encoded head with the encoded tail. Even without using this instance, we can prove that any function f : α → β is prim. rec. when α is finite, by getting the elements of α as a list, and writing f as the composition of an index lookup of a in this list with the function sending an index to the corresponding value of f in β.

But once we allow the list itself to be an input, we get some more interesting possibilities. In particular, the function nth : list α → ℕ → option α, which gets an element from a list by index (or returns none if the index is out of bounds), is primitive recursive, and this fact expresses an equivalent of Gödel's sequence number theorem (Gödel, 1931) (for a different encoding than Gödel's original encoding). From this we can prove the following "strong recursion" theorem:

```lean
theorem nat_strong_rec
  (f : α → ℕ → σ)
  {g : α → list σ → option σ}
  (hg : primrec₂ g)
  (H : ∀ a n, g a ((list.range n).map (f a)) =
    some (f a n)) : primrec₂ f
```

Ignoring the parameter a, the main hypothesis says essentially that g [f 0, …, f (n−1)] = some (f n): the first n values of f have been written in a list, and the length of the list tells which value of f we are constructing. The reason g has optional return value is to allow for it to fail when the input is not valid.
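As a small illustration (our example, not from the paper), the Fibonacci function fits this pattern: each value is a function of the list of all previous values. A sketch of the corresponding g (here called `fib_g`, a hypothetical name), ignoring the extra parameter a:

```lean
-- Given the list [fib 0, ..., fib (n-1)], compute fib n by inspecting the
-- list's length and last two entries. nat_strong_rec then shows the resulting
-- function is primitive recursive.
def fib_g (l : list ℕ) : option ℕ :=
match l.length with
| 0     := some 0                     -- fib 0 = 0
| 1     := some 1                     -- fib 1 = 1
| (n+2) := do a ← l.nth n,            -- fib n
              b ← l.nth (n+1),        -- fib (n+1)
              some (a + b)            -- fib (n+2) = fib n + fib (n+1)
end
```

The `option` return type fails (returns none) only when the list is too short to contain the needed entries, which never happens on the valid inputs supplied by the recursion.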

Once we have lists, the dependent type vector α n (lists of length n) is just a subtype of list α, so it has an easy primcodable instance, and most of the vector functions follow from their list counterparts. Similarly for functions fin n → α, which are isomorphic to vector α n.

### 3.1. The textbook definition

Now that we have a proper theory, we can return to the question of how to show equivalence to definition 3.1. We do this by defining `nat.primrec' : ∀ n, (vector ℕ n → ℕ) → Prop` with only 5 clauses matching definition 3.1. It is easy to show at this point that nat.primrec' implies nat.primrec, since all of the functions appearing in definition 3.1 are known to be primitive recursive. For the converse, most of the clauses are easy, but our earlier cheat was to axiomatize that mkpair and unpair are primitive recursive, even though the definition involves addition, multiplication and case analysis in mkpair and even square root in the inverse function:

```lean
def unpair (n : ℕ) : ℕ × ℕ :=
let s := sqrt n in
if n - s*s < s then (n - s*s, s)
else (s, n - s*s - s)
```

(Here `sqrt : ℕ → ℕ` is actually the function $n \mapsto \lfloor\sqrt{n}\rfloor$.) So we must show that all these operations are primitive recursive by the textbook definition. The square root case is not as difficult as it may sound; since it grows by at most 1 at each step we can define it by primitive recursion as

$$\lfloor\sqrt{0}\rfloor = 0 \qquad \lfloor\sqrt{n+1}\rfloor = \text{if } (\lfloor\sqrt{n}\rfloor + 1)^2 \le n+1 \text{ then } \lfloor\sqrt{n}\rfloor + 1 \text{ else } \lfloor\sqrt{n}\rfloor$$
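This step-by-step recursion can be sketched directly in Lean (an illustrative definition, here named `sqrt_rec`, not the formal one):

```lean
-- ⌊√n⌋ by primitive recursion: the answer at n+1 is either the answer at n
-- or one more, depending on whether the next perfect square has been reached.
def sqrt_rec : ℕ → ℕ
| 0     := 0
| (n+1) := let s := sqrt_rec n in
           if (s+1)*(s+1) ≤ n+1 then s+1 else s
```

Since each recursive step uses only the previous value, a comparison, and bounded arithmetic, this stays inside the textbook primitive recursive functions.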

This alternate basis for primrec is useful for reductions, for example, to show that some other basis for computation like Turing machines can simulate every primitive recursive function.

## 4. Partial recursive functions

The partial recursive functions are an extension of the primitive recursive functions obtained by adding an operator $\mu n.\ p(n)$, where p is a predicate, which denotes the least value of n such that p(n) is true. Intuitively, this value is found by starting at 0 and testing ever larger values until a satisfying instance is found. This function is not always defined, in the sense that even when all the inputs are well typed it may not return a value – it can result in an "infinite loop".

So before we tackle the partial recursive functions we must understand partiality itself, and in particular how to represent unbounded computation, computably, in a proof assistant that can only represent terminating computations (Lean is based on dependent type theory, which is strongly normalizing, so all expression evaluation terminates).

We have already discussed the option type for representing a possible failure state, but nontermination is a slightly different kind of "failure", in that you can't tell that you have failed while executing the program, and this difference makes itself known in the type system.

To address this distinction, we introduce the type:

```lean
def part (α : Type*) : Type* := Σ' p : Prop, p → α
```

A value of type part α is a nondecidable optional value, in the sense that there is not necessarily a decision procedure for determining whether it contains a value, but if it does then you can extract the value using the function component. This type has a monad structure, as follows:

$$\mathrm{pure} : \alpha \to \mathrm{part}\ \alpha \qquad \mathrm{pure}\ a = \langle \mathrm{true},\ \lambda\_.\ a \rangle$$
$$\mathrm{bind} : \mathrm{part}\ \alpha \to (\alpha \to \mathrm{part}\ \beta) \to \mathrm{part}\ \beta \qquad \mathrm{bind}\ \langle p, f \rangle\ g = \langle \exists h : p,\ (g\,(f\,h))_1,\ \lambda h.\ (g\,(f\,h_1))_2\ h_2 \rangle$$

Also, there is an element none representing an undefined value. We can map option α → part α by sending none to none and some a to pure a, and assuming the law of excluded middle we can also define an inverse map and show option α ≃ part α, but this breaks the computational interpretation of part.
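The embedding of option into part can be sketched as follows (a sketch close to the formal definition; `of_option` is the assumed name):

```lean
-- none maps to the never-defined value ⟨false, _⟩; some a maps to pure a.
-- `false.elim` discharges the impossible proof obligation in the none case.
def of_option {α : Type*} : option α → part α
| none     := ⟨false, false.elim⟩
| (some a) := ⟨true, λ _, a⟩
```

The inverse direction would require deciding the proposition in the first component, which is exactly what the law of excluded middle provides and what executable code cannot.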

The definition of bind, also written in Haskell style as the infix operator >>=, is a bit complicated to write but is "exactly what you would expect" in terms of its behavior. Given a partial value x : part α and a function f : α → part β, the resulting partial value x >>= f is defined when x is defined with some value a and f a is also defined, in which case it evaluates to the value of f a.

It is convenient to abstract from the definition to a relational version, where a ∈ p means p = pure a – that is, a ∈ p says that p is defined and equal to a. With this definition the bind operator can be much more easily expressed by the theorem

$$b \in p \mathbin{>\!>=} f \iff \exists a \in p,\ b \in f\,a$$

which is shared with many other collection-based monad structures. Because they come up often, we will use the notation α →. β for the type of all partial functions from α to β.

One important function that is (constructively) definable on this type is fix, which has the following property:

$$b \in \mathrm{fix}\ f\ a \iff \mathrm{inl}\ b \in f\,a \lor \exists a',\ \mathrm{inr}\ a' \in f\,a \land b \in \mathrm{fix}\ f\ a'$$

Given an input a, it evaluates f a to get either inl b or inr a′. In the first case it returns b, and in the second case it starts over with the value a′. The function is defined when this process eventually terminates with a value; if we assume this, then we can construct the value that fix returns. So even though Lean's type theory does not permit unbounded recursion, by working in this partiality monad we get computable unbounded recursion.
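As an illustration (our example, not from the paper), truncated division by repeated subtraction has exactly the shape fix expects, assuming a fix with the interface just described (here called `pfun.fix : (α →. β ⊕ α) → α →. β`):

```lean
-- State is (remainder, quotient so far). Each step either finishes (inl)
-- when the remainder drops below b, or subtracts b and loops (inr).
-- When b = 0 the loop never produces inl, so the result is undefined —
-- exactly the partiality fix is designed to express.
def div_fix (b a : ℕ) : part ℕ :=
pfun.fix (λ p : ℕ × ℕ,
    pure (if p.1 < b then sum.inl p.2
          else sum.inr (p.1 - b, p.2 + 1)))
  (a, 0)
```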

The minimization operator rfind p, which finds the smallest value satisfying the (partial) boolean predicate p, can be defined in terms of fix as follows:

$$\mathrm{rfind}\ p = \mathrm{fix}\ (\lambda n.\ \text{if } p\,n \text{ then } \mathrm{inl}\ n \text{ else } \mathrm{inr}\ (n+1))\ 0$$

The definition nat.partrec is given in figure 4. The first 7 cases are almost the same as those of primrec, except that we must now worry about partiality in all the operations that build functions. So for example `λ n, mkpair <$> f n <*> g n` is the pairing function, except that if the computation of either `f n` or `g n` fails to return a value, then the whole expression is not defined. (In other words, this operation is "strict" in both arguments.) Similarly, the composition is now expressed as `λ n, g n >>= f`, which says that `g n` should be evaluated first, and if it is defined and equals m, then `f m` is the resulting value. The interesting case is the last one, which incorporates the rfind function on ℕ. Ignoring partiality, it says that $\mu n.\ f(a, n) = 0$ is partial recursive if f is. This is of course the source of the partiality – all the other constructors produce total functions from total functions, but this can be partial if the function is never zero.

Although this defines a class of partial functions, some of the functions happen to be total anyway, and we call a total partial-recursive function computable. It is an easy fact that every primitive recursive function is computable. As before, we can compose with encode and decode to extend these definitions to any primcodable type. Although we could define an analogue of primcodable using computable functions instead of primitive recursive functions, we want to stick to simple encodings (usually not just primitive recursive but polynomial time), and we already have encodings for all the important types, so primcodable is enough.

One aspect of this definition which is not obviously a problem until one works out all the details is the strictness of the prec constructor. In conventional notation, it says that if f and g are partial recursive functions, then so is the function h defined by

$$h(a, 0) = f(a) \qquad h(a, n+1) = g(a, n, h(a, n)).$$

Importantly, h(a, n+1) is only defined if h(a, n) is defined and g is defined at this value. It does not matter if g does not make use of the h(a, n) argument at all, for example if it is the first projection.

This comes up in the definition of the lazy conditional ifz[f, g], defined when f, g : α →. β, by:

$$\mathrm{ifz}[f, g](a, n) = \begin{cases} f(a) & \text{if } n = 0 \\ g(a) & \text{if } n \ne 0 \end{cases}$$

where in particular ifz[f, g](a, 0) = f(a) regardless of whether g(a) is defined. This is the basis of "if statements" that resemble execution paths in a computer – we need a way to choose which subcomputation to perform, without needing to evaluate both. The usual way of implementing ifz is to use primitive recursion on the argument n, using f in the zero case and g in the successor case. But because of the strictness constraint, this will result in ifz[f, g](a, 0) being undefined whenever g(a) is (where the undefined value represents an infinite loop). In fact, we won't have the tools to solve this problem until section 5.3.

## 5. Universality

### 5.1. Codes for functions

Because partrec is an inductive predicate, there is a natural data type that corresponds to proofs that a function is partial recursive:

```lean
inductive code : Type
| zero : code
| succ : code
| left : code
| right : code
| pair : code → code → code
| comp : code → code → code
| prec : code → code → code
| rfind' : code → code
```

We can define the semantics of a code via an "evaluation" function that takes a code and an input value in ℕ and produces a partial value.

```lean
def eval : code → ℕ →. ℕ
| zero := pure 0
| succ := succ
| left := λ n, n.unpair.1
| right := λ n, n.unpair.2
| (pair cf cg) := λ n, mkpair <$> eval cf n <*> eval cg n
| (comp cf cg) := λ n, eval cg n >>= eval cf
| (prec cf cg) := unpaired (λ a n,
    nat.rec_on n (eval cf a) (λ y IH, IH >>= λ i,
      eval cg (mkpair a (mkpair y i))))
| (rfind' cf) := unpaired (λ a m,
    (rfind (λ n, (λ m, m = 0) <$>
      eval cf (mkpair a (n + m)))).map (+ m))
```

Then it is a simple consequence of the definition that f is partial recursive iff there exists a code c such that eval c = f.
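As a sanity check (our example), the code comp succ succ denotes the function $n \mapsto n + 2$, since by the comp and succ cases of eval:

$$\mathrm{eval}\ (\mathrm{comp}\ \mathrm{succ}\ \mathrm{succ})\ n = \mathrm{eval}\ \mathrm{succ}\ n \mathbin{>\!>=} \mathrm{eval}\ \mathrm{succ} = \mathrm{pure}\ (n+1) \mathbin{>\!>=} (\mathrm{pure} \circ \mathrm{succ}) = \mathrm{pure}\ (n+2).$$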

Note: The constructor rfind′ is a slightly modified version of rfind which is easier to use in evaluation:

$$\mathrm{rfind}'\ f\ (a, m) = (\mu n.\ f(a, n + m) = 0) + m,$$

and each can be expressed in terms of the other:

$$\mathrm{rfind}\ f\ a = \mathrm{rfind}'\ f\ (a, 0) \qquad \mathrm{rfind}'\ f\ (a, m) = \mathrm{rfind}\ (\lambda x.\ f(x_1, x_2 + m))\ a + m$$

So we can pretend that partrec was defined with a case for rfind′ instead of rfind, since it yields the same class of functions.

Now the key fact is that code is denumerable. Concretely, we can encode it using a combination of the tricks we used to encode sums, products and option types, that is,

$$\begin{aligned} \mathrm{encode}(\mathrm{zero}) &= 0 & \mathrm{encode}(\mathrm{pair}\ c_1\ c_2) &= 4 \cdot (\mathrm{encode}\ c_1, \mathrm{encode}\ c_2) + 4 \\ \mathrm{encode}(\mathrm{succ}) &= 1 & \mathrm{encode}(\mathrm{comp}\ c_1\ c_2) &= 4 \cdot (\mathrm{encode}\ c_1, \mathrm{encode}\ c_2) + 5 \\ \mathrm{encode}(\mathrm{left}) &= 2 & \mathrm{encode}(\mathrm{prec}\ c_1\ c_2) &= 4 \cdot (\mathrm{encode}\ c_1, \mathrm{encode}\ c_2) + 6 \\ \mathrm{encode}(\mathrm{right}) &= 3 & \mathrm{encode}(\mathrm{rfind}'\ c) &= 4 \cdot (\mathrm{encode}\ c) + 7 \end{aligned}$$

where $(\cdot, \cdot)$ is the pairing function from figure 1. (We could have used a more permissive encoding, but this has the advantage that it is a bijection to ℕ, which makes the proof that this is a denumerable type trivial.)

Having shown that the code type is denumerable, we can now start to show that functions on codes are primitive recursive. In particular, all the constructors are primitive recursive, the recursion principle preserves primitive recursiveness and computability (not partial recursiveness, because of the as-yet unresolved problem with ifz), and we can prove that these simple functions on codes are primitive recursive:

$$\mathrm{const} : \mathbb{N} \to \mathrm{code} \qquad \mathrm{eval}\ (\mathrm{const}\ a)\ n = a$$
$$\mathrm{curry} : \mathrm{code} \to \mathbb{N} \to \mathrm{code} \qquad \mathrm{eval}\ (\mathrm{curry}\ c\ m)\ n = \mathrm{eval}\ c\ (m, n)$$

In particular, the rather understated fact that curry is primitive recursive is a form of the s-m-n theorem of recursion theory.
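Both functions can be built directly from the code constructors; the following sketch should be close to the formal definitions, modulo naming (`id'` is our name for the identity code):

```lean
-- const a is a repeated application of succ to zero, so eval (const a) n = a.
def const : ℕ → code
| 0     := code.zero
| (n+1) := code.comp code.succ (const n)

-- eval id' n = mkpair n.unpair.1 n.unpair.2 = n, so id' codes the identity.
def id' : code := code.pair code.left code.right

-- curry c m pairs the fixed argument m with the input, then runs c,
-- so eval (curry c m) n = eval c (mkpair m n).
def curry (c : code) (m : ℕ) : code :=
code.comp c (code.pair (const m) id')
```

Since const, pair, and comp are all primitive recursive on codes, so is curry, which is the content of the s-m-n remark above.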

### 5.2. Resource-bounded evaluation

We have one more component before the universality theorem. We define a "resource-bounded" version of eval, namely $\mathrm{eval}^k\ c\ n$ where $k : \mathbb{N}$. (In the formal text it is called evaln.) This function is total – we have a definite failure condition this time, unlike eval itself, which can diverge. There are multiple ways to define this function; the important part is that if $\mathrm{eval}^k\ c\ n = \mathrm{some}\ m$ then $\mathrm{eval}\ c\ n = m$ for all $k$, and if $\mathrm{eval}\ c\ n = m$ then $\mathrm{eval}^k\ c\ n = \mathrm{some}\ m$ for some $k$. Furthermore, it is convenient to ensure that evaln is monotonic in $k$, and that the domain of $\mathrm{eval}^k$ is contained in $[0, k)$, that is, if $\mathrm{eval}^k\ c\ n = \mathrm{some}\ m$ then $n < k$.

The Lean definition of evaln is given in figure 5. The details of the definition are not so important, but it is interesting to note that our "fuel" k for the computation need only decrease when we evaluate a subcomputation which does not decrease the size of the program that is being computed, namely in the prec and rfind′ cases. (You may wonder why we cannot use the fact that n is decreasing in the prec case to prove termination; this is because the function is not defined by recursion on n, but by recursion on k at all codes and inputs simultaneously.)

Because $\mathrm{eval}^k$ has a finite domain, outside which it is none, we can encode the whole function $\mathrm{eval}^k$ as a single natural number. Thus we can pack the function $k \mapsto \mathrm{eval}^k$ into the type ℕ → ℕ, and define it by strong recursion (using the theorem nat_strong_rec mentioned in section 3), since in every case of the recursion, either k decreases and the code remains fixed, or the code decreases and k remains fixed.

Thus evaln is primitive recursive (jointly in all arguments), and since $m \in \mathrm{eval}\ c\ n$ iff $\mathrm{eval}^k\ c\ n = \mathrm{some}\ m$ for some $k$, we can recover eval from evaln by an unbounded search for a sufficient $k$, which shows that eval is partial recursive. This is (more or less) Kleene's normal form theorem – eval is a universal partial recursive function.

### 5.3. Applications

Easy consequences of universality are the fixed point theorems:

###### Theorem 5.1 (fixed_point).

If f : code → code is computable, then there exists some code c such that eval (f c) = eval c.

###### Proof.

Consider the function $g(x, y) := \mathrm{eval}\ (\mathrm{eval}\ x\ x)\ y$ (using decode to use natural numbers as codes in eval). This function is clearly partial recursive, so let $\hat{g}$ be a code for it. Now let $F(x) := f\,(\mathrm{curry}\ \hat{g}\ x)$; then F is computable, so let $\hat{F}$ be a code for it. Then for $c := \mathrm{curry}\ \hat{g}\ \hat{F}$ we have:

$$\begin{aligned} \mathrm{eval}\ (f\ c)\ n &= \mathrm{eval}\ (f\ (\mathrm{curry}\ \hat{g}\ \hat{F}))\ n = \mathrm{eval}\ (F\ \hat{F})\ n \\ &= \mathrm{eval}\ (\mathrm{eval}\ \hat{F}\ \hat{F})\ n = g\ \hat{F}\ n = \mathrm{eval}\ \hat{g}\ (\hat{F}, n) \\ &= \mathrm{eval}\ (\mathrm{curry}\ \hat{g}\ \hat{F})\ n = \mathrm{eval}\ c\ n. \end{aligned} \qquad \blacksquare$$

###### Theorem 5.2 (fixed_point2).

If f : code → ℕ →. ℕ is partial recursive, then there exists some code c such that eval c = f c.

###### Proof.

Let $\hat{f}$ be a code for the uncurried f, so that $\mathrm{eval}\ \hat{f}\ (c, n) = f\ c\ n$, and apply theorem 5.1 to the computable function $c \mapsto \mathrm{curry}\ \hat{f}\ c$ to obtain a c such that $\mathrm{eval}\ c = \mathrm{eval}\ (\mathrm{curry}\ \hat{f}\ c)$. Then

$$\mathrm{eval}\ c\ n = \mathrm{eval}\ (\mathrm{curry}\ \hat{f}\ c)\ n = \mathrm{eval}\ \hat{f}\ (c, n) = f\ c\ n. \qquad \blacksquare$$

We can also finally solve the ifz problem. If f and g are partial recursive functions with codes $\hat{f}$ and $\hat{g}$, then the function

$$c(n) = \begin{cases} \hat{f} & \text{if } n = 0 \\ \hat{g} & \text{if } n \ne 0 \end{cases}$$

is primitive recursive (since both branches are just numbers now instead of computations that may not halt), and $\mathrm{ifz}[f, g](a, n) = \mathrm{eval}\ (c(n))\ a$. More generally, this implies that we can evaluate conditionals where the condition is a computable function and the branches are partial functions. We conclude with Rice's theorem on the noncomputability of all nontrivial properties of computable functions:

###### Theorem 5.3 (rice).

Let C be a set of partial functions ℕ →. ℕ such that the set of codes {c | eval c ∈ C} is computable. Then for any partial recursive f and g, f ∈ C implies g ∈ C (so classically C contains either all partial recursive functions or none of them).

###### Proof.

Apply theorem 5.2 to the function $F\ c\ n = \text{if } \mathrm{eval}\ c \in C \text{ then } g\ n \text{ else } f\ n$ to obtain a c such that eval c = F c. (Note that eval c ∈ C is decidable because it is computable.) Then if eval c ∈ C, we have eval c n = g n for all n, so eval c = g, hence g ∈ C. And if eval c ∉ C then similarly eval c = f, which contradicts f ∈ C, eval c ∉ C. ∎

The undecidability of the halting problem is a trivial corollary:

###### Theorem 5.4 (halting_problem).

The set K = {c | eval c 0 is defined} is not computable.

###### Proof.

Suppose it is; we can write it as {c | eval c ∈ C} where C = {h | h 0 is defined}, so applying Rice's theorem with f = pure 0 and g the everywhere-undefined function, we have a contradiction from f ∈ C and g ∉ C. ∎

## 6. Future Work

### 6.1. Equivalences

The most obvious next step is to show the equivalence of other formulations of computable functions: Turing machines, the λ-calculus, Minsky register machines, C… the space of options is very wide here and it is easy to get carried away. Furthermore, if one holds to the thesis that partial recursive functions are the quickest lifeline out of the Turing tarpit, then one must acknowledge that this is to jump right back in, where the hardest part of the translation is fiddling with the intricacies of the target language. We are still looking for ways to do this in a more abstract way that avoids the pain.

### 6.2. Complexity theory

As mentioned in the introduction, this project was explicitly for the purpose of setting up the foundations of complexity theory. One of the often stated reasons for choosing Turing machines over other models of computation like primitive recursion is that they have a better time model. We would argue that this is not true for fine-grained notions of complexity (because there is often a linear multiplicative overhead for running across the tape compared to memory models). Moreover, in the other direction we find that, at least in the case of polynomial time complexity, there are methods such as bounded recursion on notation (Hofmann, 2000) that generalize primitive recursion methods to the definition of polynomial time computable functions, which can be used to define P, NP, and NP-hardness at least; we are hopeful that these methods can extend to other classes, possibly by hybridizing with other models of computation as well.

###### Acknowledgements.
This material is based upon work supported by AFOSR grant FA9550-18-1-0120 and a grant from the Sloan Foundation. I would like to thank my advisor Jeremy Avigad for his support and encouragement, and for his reviews of early drafts of this work.

## References

• Asperti and Ricciotti (2012) Andrea Asperti and Wilmer Ricciotti. 2012. Formalizing turing machines. In International Workshop on Logic, Language, Information, and Computation. Springer, 1–25.
• Carneiro (2018) Mario Carneiro. 2018. Formalizing computability theory, mathlib formalization.
• Carneiro et al. (2018) Mario Carneiro, Johannes Hölzl, et al. 2017–2018. The mathlib standard library. https://github.com/leanprover/mathlib.
• Church (1936) Alonzo Church. 1936. An unsolvable problem of elementary number theory. American journal of mathematics 58, 2 (1936), 345–363.
• de Moura et al. (2015) Leonardo de Moura, Soonho Kong, Jeremy Avigad, Floris Van Doorn, and Jakob von Raumer. 2015. The Lean theorem prover (system description). In International Conference on Automated Deduction. Springer, 378–388.
• Gödel (1931) Kurt Gödel. 1931. Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I. Monatshefte für mathematik und physik 38, 1 (1931), 173–198.
• Hofmann (2000) Martin Hofmann. 2000. Programming languages capturing complexity classes. ACM SIGACT News 31, 1 (Jan 2000), 31–42.
• Kleene (1943) Stephen Cole Kleene. 1943. Recursive predicates and quantifiers. Trans. Amer. Math. Soc. 53, 1 (1943), 41–73.
• Norrish (2011) Michael Norrish. 2011. Mechanised computability theory. In International Conference on Interactive Theorem Proving. Springer, 297–311.
• Perlis (1982) Alan J Perlis. 1982. Special feature: Epigrams on programming. ACM Sigplan Notices 17, 9 (1982), 7–13.
• Turing (1937) Alan M Turing. 1937. On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London mathematical society 2, 1 (1937), 230–265.