Algorithmic Theories of Everything

by   Juergen Schmidhuber, et al.

The probability distribution P from which the history of our universe is sampled represents a theory of everything or TOE. We assume P is formally describable. Since most (uncountably many) distributions are not, this imposes a strong inductive bias. We show that P(x) is small for any universe x lacking a short description, and study the spectrum of TOEs spanned by two Ps, one reflecting the most compact constructive descriptions, the other the fastest way of computing everything. The former derives from generalizations of traditional computability, Solomonoff's algorithmic probability, Kolmogorov complexity, and objects more random than Chaitin's Omega, the latter from Levin's universal search and a natural resource-oriented postulate: the cumulative prior probability of all x incomputable within time t by this optimal algorithm should be 1/t. Between both Ps we find a universal cumulatively enumerable measure that dominates traditional enumerable measures; any such CEM must assign low probability to any universe lacking a short enumerating program. We derive P-specific consequences for evolving observers, inductive reasoning, quantum physics, philosophy, and the expected duration of our universe.




1 Introduction to Describable Universes

An object $X$ is formally describable if a finite amount of information completely describes $X$ and only $X$. More to the point, $X$ should be representable by a possibly infinite bitstring $x$ such that there is a finite, possibly never halting program $p$ that computes $x$ and nothing but $x$ in a way that modifies each output bit at most finitely many times; that is, each finite beginning of $x$ eventually converges and ceases to change. Definitions 2.1-2.5 will make this precise, and Sections 2-3 will clarify that this constructive notion of formal describability is less restrictive than the traditional notion of computability [92], mainly because we do not insist on the existence of a halting program that computes an upper bound of the convergence time of $p$'s $n$-th output bit. Formal describability thus pushes constructivism [17, 6] to the extreme, barely avoiding the nonconstructivism embodied by even less restrictive concepts of describability (compare computability in the limit [39, 65, 34] and $\Delta^0_n$-describability [67][56, p. 46-47]). The results in Sections 2-5 will exploit the additional degrees of freedom gained over traditional computability, while Section 6 will focus on another extreme, namely, the fastest way of computing all computable objects.

Among the formally describable things are the contents of all books ever written, all proofs of all theorems, the infinite decimal expansion of $\sqrt{17}$, and the enumerable “number of wisdom” $\Omega$ [28, 80, 21, 85]. Most real numbers, however, are not individually describable, because there are only countably many finite descriptions, yet uncountably many reals, as observed by Cantor in 1873 [23]. It is easy though to write a never halting program that computes all finite prefixes of all real numbers. In this sense certain sets seem describable while most of their elements are not.

What about our universe, or more precisely, its entire past and future history? Is it individually describable by a finite sequence of bits, just like a movie stored on a compact disc, or a never ending evolution of a virtual reality determined by a finite algorithm? If so, then it is very special in a certain sense, just like the comparatively few describable reals are special.

Example 1.1 (Pseudorandom universe)

Let $x$ be an infinite sequence of finite bitstrings $x^1, x^2, \ldots$ representing the history of some discrete universe, where $x^k$ represents the state of the universe at discrete time step $k$, and $x^1$ the “Big Bang” (compare [72]). Suppose there is a finite algorithm $A$ that computes $x^{k+1}$ ($k \ge 1$) from $x^k$ and additional information $\mathit{noise}^k$ (this may require numerous computational steps of $A$, that is, “local” time of the universe may run comparatively slowly). Assume that $\mathit{noise}^k$ is not truly random but calculated by invoking a finite pseudorandom generator subroutine [3]. Then $x$ is describable because it has a finite constructive description.
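The construction in Example 1.1 can be illustrated with a toy program. The following is only a minimal sketch under stated assumptions: the update rule (rotate, then XOR one bit with noise) and the linear congruential generator are arbitrary stand-ins chosen for illustration, and the names `lcg_bits`, `universe` and `history` are hypothetical. The point is that the whole infinite history is fixed by a finite program plus a finite seed.

```python
from typing import Iterator, List


def lcg_bits(seed: int) -> Iterator[int]:
    """Hypothetical finite pseudorandom generator (a linear congruential generator)."""
    state = seed
    while True:
        state = (1103515245 * state + 12345) % 2**31
        yield (state >> 16) & 1  # emit one pseudorandom bit per call


def universe(initial_state: str, seed: int) -> Iterator[str]:
    """Yield successive states of a toy discrete universe, starting with the initial state.

    Assumption: the update rule below is an arbitrary stand-in for the
    finite algorithm that maps one state plus noise to the next state.
    """
    noise = lcg_bits(seed)
    state = initial_state
    while True:
        yield state
        rotated = state[1:] + state[0]          # deterministic "physical law"
        last = int(rotated[-1]) ^ next(noise)   # pseudorandom perturbation
        state = rotated[:-1] + str(last)


def history(initial_state: str, seed: int, steps: int) -> List[str]:
    """Return the first `steps` states of the toy universe."""
    gen = universe(initial_state, seed)
    return [next(gen) for _ in range(steps)]
```

Calling `history("0110", 42, 5)` twice returns the same list, which is the sense in which apparently noisy events here carry no information beyond the finite description.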

Contrary to a widespread misunderstanding, quantum physics, quantum computation (e.g., [9, 31, 64]) and Heisenberg’s uncertainty principle do not rule out that our own universe’s history is of the type exemplified above. It might be computable by a discrete process approximated by Schrödinger’s continuous wave function, where $\mathit{noise}^k$ determines the “collapses” of the wave function. Since we prefer simple, formally describable explanations over complex, nondescribable ones, we assume the history of our universe has a finite description indeed.

This assumption has dramatic consequences. For instance, because we know that our future lies among the few (countably many) describable futures, we can ignore uncountably many nondescribable ones. Can we also make more specific predictions? Does it make sense to say some describable futures are necessarily more likely than others? To answer such questions we will examine possible probability distributions on possible futures, assuming that not only the histories themselves but also their probabilities are formally describable. Since most (uncountably many) real-valued probabilities are not, this assumption — against which there is no physical evidence — actually represents a major inductive bias, which turns out to be strong enough to explain certain hitherto unexplained aspects of our world.

Example 1.2 (In which universe am I?)

Let $h(y)$ represent a property of any possibly infinite bitstring $y$, say, $h(y) = 1$ if $y$ represents the history of a universe inhabited by a particular observer (say, yourself) and $h(y) = 0$ otherwise. According to the weak anthropic principle [24, 4], the conditional probability of finding yourself in a universe compatible with your existence equals 1. But there may be many $y$’s satisfying $h(y) = 1$. What is the probability that $y = x$, where $x$ is a particular universe satisfying $h(x) = 1$? According to Bayes,

$$P(x = y \mid h(y) = 1) = \frac{P(h(y) = 1 \mid x = y)\, P(x = y)}{\sum_{z : h(z) = 1} P(z)} \propto P(x), \tag{1}$$

where $P(A \mid B)$ denotes the probability of $A$, given knowledge of $B$, and the denominator is just a normalizing constant. So the probability of finding yourself in universe $x$ is essentially determined by $P(x)$, the prior probability of $x$.

Each prior $P$ stands for a particular “theory of everything” or TOE. Once we know something about $P$ we can start making informed predictions. Parts of this paper deal with the question: what are plausible properties of $P$? One very plausible assumption is that $P$ is approximable for all finite prefixes $\bar{x}$ of $x$ in the following sense. There exists a possibly never halting computer which outputs a sequence of numbers $T(t, \bar{x})$ at discrete times $t = 1, 2, \ldots$ in response to input $\bar{x}$ such that for each real $\epsilon > 0$ there exists a finite time $t_0$ such that for all $t \ge t_0$:

$$|P(\bar{x}) - T(t, \bar{x})| < \epsilon. \tag{2}$$

Approximability in this sense is essentially equivalent to formal describability (Lemma 2.1 will make this more precise). We will show (Section 5) that the mild assumption above adds enormous predictive power to the weak anthropic principle: it makes universes describable by short algorithms immensely more likely than others. Any particular universe evolution is highly unlikely if it is determined not only by simple physical laws but also by additional truly random or noisy events. To a certain extent, this will justify “Occam’s razor” (e.g., [11]) which expresses the ancient preference of simple solutions over complex ones, and which is widely accepted not only in physics and other inductive sciences, but even in the fine arts [74].

All of this will require an extension of earlier work on Solomonoff’s algorithmic probability, universal priors, Kolmogorov complexity (or algorithmic information), and their refinements [50, 82, 26, 100, 52, 54, 35, 27, 36, 77, 83, 28, 5, 29, 93, 56]. We will prove several theorems concerning approximable and enumerable objects and probabilities (Sections 2-5; see outline below). These theorems shed light on the structure of all formally describable objects and extend traditional computability theory; hence they should also be of interest without motivation through describable universes.

The calculation of the subjects of these theorems, however, may occasionally require excessive time, itself often not even computable in the classic sense. This will eventually motivate a shift of focus to the temporal complexity of “computing everything” (Section 6). If you were to sit down and write a program that computes all possible universes, which would be the best way of doing so? Somewhat surprisingly, a modification of Levin Search [53] can simultaneously compute all computable universes in an interleaving fashion that outputs each individual universe as quickly as its fastest algorithm running just by itself, save for a constant factor independent of the universe’s size. This suggests a more restricted TOE that singles out those infinite universes computable with countable time and space resources, and a natural resource-based prior measure $S$ on them. Given this “speed prior” $S$, we will show that the most likely continuation of a given observed history is computable by a fast and short algorithm (Section 6.6).

The $S$-based TOE will provoke quite specific prophecies concerning our own universe (Section 7.5). For instance, the probability that it will last $2^n$ times longer than it has lasted so far is at most $2^{-n}$. Furthermore, all apparently random events, such as beta decay or collapses of Schrödinger’s wave function of the universe, actually must exhibit yet unknown, possibly nonlocal, regular patterns reflecting subroutines (e.g., pseudorandom generators) of our universe’s algorithm that are not only short but also fast.

1.1 Outline of Main Results

Some of the novel results herein may be of interest to theoretical computer scientists and mathematicians (Sections 2-6), some to researchers in the fields of machine learning and inductive inference (the science of making predictions based on observations, e.g., Sections 6-7), some to physicists (e.g., Sections 6-8), some to philosophers (e.g., Sections 7-8). Sections 7-8 might help those usually uninterested in technical details to decide whether they would also like to delve into the more formal Sections 2-6. In what follows, we summarize the main contributions and provide pointers to the most important theorems.

Section 2 introduces universal Turing Machines (TMs) more general than those considered in previous related work: unlike traditional TMs, General TMs or GTMs may edit their previous outputs (compare inductive TMs [18]), and Enumerable Output Machines (EOMs) may do this provided the output does not decrease lexicographically. We will define: a formally describable object $x$ has a finite, never halting GTM program that computes $x$ such that each output bit is revised at most finitely many times; that is, each finite prefix of $x$ eventually stabilizes (Defs. 2.1-2.5); describable functions can be implemented by such programs (Def. 2.10); weakly decidable problems have solutions computable by never halting programs whose output is wrong for at most finitely many steps (Def. 2.11). Theorem 2.1 generalizes the halting problem by demonstrating that it is not weakly decidable whether a finite string is a description of a describable object (compare a related result for analytic TMs by Hotz, Vierke and Schieffer [45]).

Section 3 generalizes the traditional concept of Kolmogorov complexity or algorithmic information [50, 82, 26] $K(x)$ of finite $x$ (the length of the shortest halting program computing $x$) to the case of objects describable by nonhalting programs on EOMs and GTMs (Defs. 3.2-3.4). It is shown that the generalization for EOMs is describable, but the one for GTMs is not (Theorem 3.1). Certain objects are much more compactly encodable on EOMs than on traditional monotone TMs, and Theorem 3.3 shows that there are also objects with short GTM descriptions yet incompressible on EOMs and therefore “more random” than Chaitin’s $\Omega$ [28], the halting probability of a TM with random input, which is incompressible only on monotone TMs. This yields a natural TM type-specific complexity hierarchy expressed by Inequality (14).

Section 4 discusses probability distributions on describable objects as well as the nondescribable convergence probability of a GTM (Def. 4.14). It also introduces describable (semi)measures as well as cumulatively enumerable measures (CEMs, Def. 4.5), where the cumulative probability of all strings lexicographically greater than a given string is EOM-computable or enumerable. Theorem 4.1 shows that there is a universal CEM that dominates all other CEMs, in the sense that it assigns higher probability to any finite $y$, save for a constant factor independent of $y$. This probability is shown to be equivalent to the probability that an EOM whose input bits are chosen randomly produces an output starting with $y$ (Corollary 4.3 and Lemma 4.2). The nonenumerable universal CEM also dominates enumerable priors studied in previous work by Solomonoff, Levin and others [82, 100, 54, 35, 27, 36, 77, 83, 28, 56]. Theorem 4.2 shows that there is no universal approximable measure (proof by M. Hutter).

Section 5 establishes relationships between generalized Kolmogorov complexity and generalized algorithmic probability, extending previous work on enumerable semimeasures by Levin, Gács, and others [100, 54, 35, 27, 36, 56]. For instance, Theorem 5.3 shows that the universal CEM assigns a probability to each enumerable object proportional to $\frac{1}{2}$ raised to the power of the length of its minimal EOM-based description, times a small corrective factor. Similarly, objects with approximable probabilities yet without very short descriptions on GTMs are necessarily very unlikely a priori (Theorems 5.4 and 5.5). Additional suspected links between generalized Kolmogorov complexity and probability are expressed in form of Conjectures 5.1-5.3.

Section 6 addresses issues of temporal complexity ignored in the previous sections on describable universe histories (whose computation may require excessive time without recursive bounds). In Subsection 6.2, Levin’s universal search algorithm [53, 55] (which takes into account program runtime in an optimal fashion) is modified to obtain the fastest way of computing all “S-describable” universes computable within countable time (Def. 6.1, Section 6.3); uncountably many other universes are ignored because they do not even exist from a constructive point of view. Postulate 6.1 then introduces a natural resource-oriented bias reflecting constraints of whoever calculated our universe (possibly as a by-product of a search for something else): we assign to universes prior probabilities inversely proportional to the time and space resources consumed by the most efficient way of computing them. Given the resulting “speed prior $S$” (Def. 6.5) and past observations $x$, Theorem 6.1 and Corollary 6.1 demonstrate that the best way of predicting a future $y$ is to minimize the Levin complexity of $(x, y)$.

Section 7 puts into perspective the algorithmic priors (recursive and enumerable) introduced in previous work on inductive inference by Solomonoff and others [82, 83, 56, 47], as well as the novel priors discussed in the present paper (cumulatively enumerable, approximable, resource-optimal). Collectively they yield an entire spectrum of algorithmic TOEs. We evaluate the plausibility of each prior being the one from which our own universe is sampled, discuss its connection to “Occam’s razor” as well as certain physical and philosophical consequences, argue that the resource-optimal speed prior $S$ may be the most plausible one (Section 7.4), analyze the inference problem from the point of view of an observer [13, 14, 91, 99, 87, 68] evolving in a universe sampled from $S$, make appropriate predictions for our own universe (Section 7.5), and discuss their falsifiability.

2 Preliminaries

2.1 Notation

Much but not all of the notation used here is similar or identical to the one used in the standard textbook on Kolmogorov complexity by Li and Vitányi [56].

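Since sentences over any finite alphabet are encodable as bitstrings, without loss of generality we focus on the binary alphabet $B = \{0, 1\}$. $\lambda$ denotes the empty string, $B^*$ the set of finite sequences over $B$, $B^\infty$ the set of infinite sequences over $B$, $B^\sharp = B^* \cup B^\infty$. $x, y, z, z^1, z^2$ stand for strings in $B^\sharp$. If $x \in B^*$ then $xy$ is the concatenation of $x$ and $y$ (e.g., if $x = 10000$ and $y = 1111$ then $xy = 100001111$). Let us order $B^\sharp$ lexicographically: if $x$ precedes $y$ alphabetically (like in the example above) then we write $x \prec y$ or $y \succ x$; if $x$ may also equal $y$ then we write $x \preceq y$ or $y \succeq x$ (e.g., $\lambda \prec 001 \prec 010 \prec 1 \prec 1111\ldots$). The context will make clear where we also identify $x \in B^*$ with a unique nonnegative integer $1x$ (e.g., string 0100 is represented by integer 10100 in the dyadic system or $20 = 1 \cdot 2^4 + 0 \cdot 2^3 + 1 \cdot 2^2 + 0 \cdot 2^1 + 0 \cdot 2^0$ in the decimal system). Indices $i, j, m, m_0, m_1, n, n_0, t, t_0$ range over the positive integers, constants $c, c_0, c_1$ over the positive reals, $f, g$ denote functions mapping integers to integers, $\log$ the logarithm with basis 2, $\lg(r) = \max_k \{\text{integer } k : 2^k \le r\}$ for real $r > 0$. For $x \in B^\sharp \setminus \{\lambda\}$, $0.x$ stands for the real number with dyadic expansion $x$ (note that $0.x0111\ldots = 0.x1 = 0.x10 = 0.x100\ldots$ for $x \in B^*$, although $x0111\ldots \ne x1 \ne x10 \ne x100\ldots$). For $x \in B^*$, $l(x)$ denotes the number of bits in $x$, where $l(x) = \infty$ for $x \in B^\infty$; $l(\lambda) = 0$. $x_n$ is the prefix of $x$ consisting of the first $n$ bits, if $l(x) \ge n$, and $x$ otherwise ($x_0 := \lambda$). For those $x \in B^*$ that contain at least one 0-bit, $x'$ denotes the lexicographically smallest $y \succ x$ satisfying $l(y) \le l(x)$ ($x'$ is undefined for $x$ of the form $111\ldots1$). We write $f(n) = O(g(n))$ if there exist $c, n_0$ such that $f(n) \le c\,g(n)$ for all $n > n_0$.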
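Some of these conventions are easy to check mechanically. The following minimal sketch represents finite bitstrings as Python `str` objects over the characters `0` and `1`; the helper names (`string_to_int`, `int_to_string`, `dyadic_value`, `prefix`) are hypothetical and only serve the illustration. It covers the identification of a string with the integer whose binary form is a leading 1 followed by the string, the real number with a given finite dyadic expansion, and prefixes.

```python
def string_to_int(x: str) -> int:
    """Identify bitstring x with the integer whose binary form is '1' + x."""
    return int("1" + x, 2)


def int_to_string(n: int) -> str:
    """Inverse of string_to_int for n >= 1: strip the '0b' marker and the leading 1."""
    return bin(n)[3:]


def dyadic_value(x: str) -> float:
    """Real number with finite dyadic expansion x, i.e. the sum of x[i] * 2^-(i+1)."""
    return sum(int(bit) / 2 ** (i + 1) for i, bit in enumerate(x))


def prefix(x: str, n: int) -> str:
    """First n bits of x, or x itself if x has fewer than n bits."""
    return x[:n]
```

For example, `string_to_int("0100")` gives 20, matching the example in the text. Python's built-in string comparison is lexicographic, so for strings over `0` and `1` it agrees with the alphabetical order used here: `"" < "001" < "010" < "1" < "1111"` evaluates to `True`.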

2.2 Turing Machines: Monotone TMs (MTMs), General TMs (GTMs), Enumerable Output Machines (EOMs)

The standard model of theoretical computer science is the Turing Machine (TM). It allows for emulating any known computer. For technical reasons we will consider several types of TMs.

Monotone TMs (MTMs). Most current theory of description size and inductive inference is based on MTMs (compare [56, p. 276 ff]) with several tapes, each tape being a finite chain of adjacent squares with a scanning head initially pointing to the leftmost square. There is one output tape and at least two work tapes (sufficient to compute everything traditionally regarded as computable). The MTM has a finite number of internal states, one of them being the initial state. MTM behavior is specified by a lookup table mapping current state and contents of the squares above work tape scanning heads to a new state and an instruction to be executed next. There are instructions for shifting work tape scanning heads one square left or right (appending new squares when necessary), and for writing 0 or 1 on squares above work tape scanning heads. The only input-related instruction requests an input bit determined by an external process and copies it onto the square above the first work tape scanning head. There may or may not be a halt instruction to terminate a computation. Sequences of requested input bits are called self-delimiting programs because they convey all information about their own length, possibly causing the MTM to halt [54, 35, 27], or at least to cease requesting new input bits (the typical case in this paper). MTMs are called monotone because they have a one-way write-only output tape: they cannot edit their previous output, because the only output instructions are: append a new square at the right end of the output tape and fill it with 0/1.

General TMs (GTMs). GTMs are like MTMs but have additional output instructions to edit their previous output. Our motivation for introducing GTMs is that certain bitstrings are compactly describable on nonhalting GTMs but not on MTMs, as will be seen later. This has consequences for definitions of individual describability and probability distributions on describable things. The additional instructions are: (a) shift output scanning head right/left (but not out of bounds); (b) delete square at the right end of the output tape (if it is not the initial square or above the scanning head); (c) write 1 or 0 on square above output scanning head. Compare Burgin’s inductive TMs and super-recursive algorithms [18, 19].

Enumerable Output Machines (EOMs). Like GTMs, EOMs can edit their previous output, but not such that it decreases lexicographically. The expressive power of EOMs lies in between those of MTMs and GTMs, with interesting computability-related properties whose analogues do not hold for GTMs. EOMs are like MTMs, except that the only permitted output instruction sequences are: (a) shift output tape scanning head left/right unless this leads out of bounds; (b) replace bitstring starting above the output scanning head by the string to the right of the scanning head of the second work tape, readjusting output tape size accordingly, but only if this lexicographically increases the contents of the output tape. The necessary test can be hardwired into the finite TM transition table.
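The output restriction that separates EOMs from GTMs can be sketched as a guarded tape. This is only an illustration of instruction (b) under simplifying assumptions: the tape is a Python string, the replacement string is passed in directly instead of being read from a second work tape, and the class name `EnumerableOutput` is hypothetical.

```python
class EnumerableOutput:
    """Toy output tape that accepts only lexicographically increasing edits."""

    def __init__(self) -> None:
        self.tape = ""  # current output, a string over '0' and '1'

    def replace_from(self, position: int, new_suffix: str) -> bool:
        """Replace the tape contents from `position` onward by `new_suffix`.

        The edit is applied only if it lexicographically increases the tape;
        otherwise the tape is left unchanged and False is returned.
        """
        if not 0 <= position <= len(self.tape):
            raise ValueError("scanning head out of bounds")
        candidate = self.tape[:position] + new_suffix
        if candidate > self.tape:  # the hardwired monotonicity test
            self.tape = candidate
            return True
        return False
```

A GTM-style tape would simply drop the comparison. Keeping it is what makes the output sequence nondecreasing, so any value the tape takes is a lower bound on its limit.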

2.3 Infinite Computations, Convergence, Formal Describability

Most traditional computability theory focuses on properties of halting programs. Given an MTM or EOM or GTM $T$ with halt instruction and $p, x \in B^*$, we write

$$T(p) = x$$

for “$p$ computes $x$ on $T$ and halts”. Much of this paper, however, deals with programs that never halt, and with TMs that do not need halt instructions.

Definition 2.1 (Convergence)

Let $p \in B^\sharp$ denote the input string or program read by TM $T$. Let $T_t(p)$ denote $T$’s finite output string after $t$ instructions. We say that $p$ and $p$’s output stabilize and converge towards $x \in B^\sharp$ iff for each $n$ satisfying $0 \le n \le l(x)$ there exists a positive integer $t_n$ such that for all $t \ge t_n$: $T_t(p)_n = x_n$ and $l(T_t(p)) \le l(x)$. Then we write

$$T(p) \rightsquigarrow x.$$

Although each beginning or prefix of $x$ eventually becomes stable during the possibly infinite computation, there need not be a halting program that computes an upper bound of stabilization time, given any $p$ and prefix size. Compare the concept of computability in the limit [39, 65, 34] and [41, 63].

Definition 2.2 (TM-Specific Individual Describability)

Given a TM $T$, an $x \in B^\sharp$ is $T$-describable or $T$-computable iff there is a finite $p \in B^*$ such that $T(p) \rightsquigarrow x$.

Objects with infinite shortest descriptions on $T$ are not $T$-describable.

Definition 2.3 (Universal TMs)

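Let $C$ denote a set of TMs. $C$ has a universal element if there is a TM $U^C \in C$ such that for each $T \in C$ there exists a constant string $p_T \in B^*$ (the compiler) such that for all possible programs $p$, if $T(p) \rightsquigarrow x$ then $U^C(p_T p) \rightsquigarrow x$.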

Definition 2.4 (M, E, G)

Let $M$ denote the set of MTMs, $E$ denote the set of EOMs, $G$ denote the set of GTMs.

$M$, $E$, $G$ all have universal elements, according to the fundamental compiler theorem (for instance, a fixed compiler can translate arbitrary LISP programs into equivalent FORTRAN programs).

Definition 2.5 (Individual Describability)

Let $C \subseteq G$ denote a set of TMs with universal element $U^C$. Some $x \in B^\sharp$ is C-describable or C-computable if it is $U^C$-describable. E-describable strings are called enumerable. G-describable strings are called formally describable or simply describable.

Example 2.1 (Pseudorandom universe based on halting problem)

Let $x$ be a universe history in the style of Example 1.1. Suppose its pseudorandom generator’s $n$-th output bit $PRG(n)$ is 1 if the $n$-th program of an ordered list of all possible programs halts, and 0 otherwise. Since $PRG(n)$ is describable, $x$ is too. But there is no halting algorithm computing $PRG(n)$ for all $n$, otherwise the halting problem would be solvable, which it is not [92]. Hence in general there is no computer that outputs $x$ and only $x$ without ever editing some previously computed history.

Definition 2.6 (Always converging TMs)

TM $T$ always converges if for all of its possible programs $p \in B^\sharp$ there is an $x \in B^\sharp$ such that $T(p) \rightsquigarrow x$.

For example, MTMs and EOMs converge always. GTMs do not.

Definition 2.7 (Approximability)

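Let $0.x$ denote a real number, $x \in B^\sharp \setminus \{\lambda\}$. $0.x$ is approximable by TM $T$ if there is a $p \in B^*$ such that for each real $\epsilon > 0$ there exists a $t_0$ such that

$$|0.x - 0.T_t(p)| < \epsilon$$

for all times $t \ge t_0$. $0.x$ is approximable if there is at least one GTM $T$ as above (compare (2)).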

Lemma 2.1

If $0.x$ is approximable, then $x$ is describable, and vice versa.

2.4 Formally Describable Functions

Much of the traditional theory of computable functions focuses on halting programs that map subsets of $B^*$ to subsets of $B^*$. The output of a program that does not halt is usually regarded as undefined, which is occasionally expressed by notation such as $T(p) = \infty$. In this paper, however, we will not lump together all the possible outputs of nonhalting programs onto a single symbol “undefined.” Instead we will consider mappings from subsets of $B^*$ to subsets of $B^\sharp$, sometimes from $B^\sharp$ to $B^\sharp$.

Definition 2.8 (Encoding $p(x)$)

Encode $x \in B^*$ as a self-delimiting input $p(x)$ for an appropriate TM, using

$$l(p(x)) = l(x) + 2 \log l(x) + O(1)$$

bits as follows: write $l(x)$ in binary notation, insert a “0” after every “0” and a “1” after every “1,” append “01” to indicate the end of the description of the size of the following string, then append $x$.

For instance, $x = 01101$ gets encoded as $p(x) = 1100110101101$.
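The encoding rule of Definition 2.8 is short enough to state as code. The following is a minimal sketch with hypothetical function names `encode` and `decode`; the decoder is an addition for illustration, showing that a reader can find the end of the encoded string without any external end marker, which is what makes the code self-delimiting.

```python
from typing import Tuple


def encode(x: str) -> str:
    """Self-delimiting code: doubled bits of len(x) in binary, then '01', then x."""
    length_bits = bin(len(x))[2:]                     # length of x in binary
    doubled = "".join(bit + bit for bit in length_bits)
    return doubled + "01" + x


def decode(stream: str) -> Tuple[str, str]:
    """Read one encoded string from the front of `stream`.

    Returns the decoded string and the unread remainder of the stream.
    """
    i = 0
    length_bits = []
    while stream[i] == stream[i + 1]:                 # doubled bits: 00 or 11
        length_bits.append(stream[i])
        i += 2
    if stream[i:i + 2] != "01":
        raise ValueError("missing '01' terminator")
    i += 2
    n = int("".join(length_bits), 2)
    return stream[i:i + n], stream[i + n:]
```

For the five-bit string `01101`, the length 5 is `101` in binary, doubled to `110011`, followed by `01` and the string itself, so `encode("01101")` returns `1100110101101`. The overhead is two bits per bit of the length plus the two terminator bits, which gives the logarithmic term in the length bound.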

Definition 2.9 (Recursive Functions)

A function $h: D_1 \subset B^* \rightarrow D_2 \subset B^*$ is recursive if there is a TM $T$ using the encoding 2.8 such that for all $x \in D_1$: $T(p(x)) = h(x)$.

Definition 2.10 (Describable Functions)

Let $T$ denote a TM using the encoding of Def. 2.8. A function $h: D_1 \subset B^* \rightarrow D_2 \subset B^\sharp$ is $T$-describable if for all $x \in D_1$: $T(p(x)) \rightsquigarrow h(x)$. Let $C$ denote a set of TMs using encoding 2.8, with universal element $U^C$. $h$ is C-describable or C-computable if it is $U^C$-computable. If the $T$ above is universal among the GTMs with such input encoding (see Def. 2.3) then $h$ is describable.

Compare functions in the arithmetic hierarchy [67] and the concept of $\Delta^0_n$-describability, e.g., [56, p. 46-47].

2.5 Weak Decidability and Convergence Problem

Traditionally, decidability of some problem class implies there is a halting algorithm that prints out the answer, given a problem from the class. We now relax the notion of decidability by allowing for infinite computations on EOMs or GTMs whose answers converge after finite yet possibly unpredictable time. Essentially, an answer needs to be correct for almost all the time, and may be incorrect for at most finitely many initial time steps (compare computability in the limit [41, 39, 65, 34] and super-recursive algorithms [18, 19]).

Definition 2.11 (Weak decidability)

Consider a characteristic function $h: D_1 \subset B^* \rightarrow B$: $h(x) = 1$ if $x$ satisfies a certain property, and $h(x) = 0$ otherwise. The problem of deciding whether or not some $x \in D_1$ satisfies that property is weakly decidable if $h(x)$ is describable (compare Def. 2.10).

Example 2.2

Is a given string $p \in B^*$ a halting program for a given MTM? The problem is not decidable in the traditional sense (no halting algorithm solves the general halting problem [92]), but weakly decidable and even E-decidable, by a trivial algorithm: print “0” on first output square; simulate the MTM on work tapes and apply it to $p$; once it halts after having read no more than $l(p)$ bits, print “1” on first output square.
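The trivial algorithm of Example 2.2 can be mimicked in a few lines. This is a sketch under loud simplifications: the simulated machine is replaced by a Python generator in which each `yield` counts as one computational step and returning counts as halting, input reading is not modelled, and the simulation is truncated after `max_steps` steps so that the function itself terminates. All names are hypothetical.

```python
from typing import Callable, Iterator, List


def weak_halting_decider(program: Callable[[], Iterator[None]],
                         max_steps: int) -> List[str]:
    """Return the successive contents of the first output square.

    The square starts as '0' and is overwritten by '1' once the simulated
    program has halted; it never changes again after that.
    """
    outputs = ["0"]                      # initial guess: does not halt
    run = program()
    halted = False
    for _ in range(max_steps):
        if not halted:
            try:
                next(run)                # simulate one more step
            except StopIteration:
                halted = True            # program halted: revise the answer
        outputs.append("1" if halted else "0")
    return outputs


def halts_after_three() -> Iterator[None]:
    """Toy program that performs three steps and then halts."""
    for _ in range(3):
        yield


def loops_forever() -> Iterator[None]:
    """Toy program that never halts."""
    while True:
        yield
```

With `max_steps=6`, the first toy program produces `["0", "0", "0", "0", "1", "1", "1"]` and the second produces seven copies of `"0"`. In both cases the answer is wrong for at most finitely many steps, yet an observer of the output alone cannot tell whether a current `"0"` is final, which is the difference between weak and traditional decidability.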

Example 2.3

It is weakly decidable whether a finite bitstring $p$ is a program for a given TM. Algorithm: print “0”; feed $p$ bitwise into the internally simulated TM whenever it requests a new input bit; once the TM has requested $l(p)$ bits, print “1”; if it requests an additional bit, print “0”. After finite time the output will stabilize forever.

Theorem 2.1 (Convergence Problem)

Given a GTM, it is not weakly decidable whether a finite bitstring is a converging program, or whether some of the output bits will fluctuate forever.

Proof. A proof conceptually quite similar to the one below was given by Hotz, Vierke and Schieffer [45] in the context of analytic TMs [25] derived from R-Machines [10] (the alphabet of analytic TMs is real-valued instead of binary). Version 1.0 of this paper [75] was written without awareness of this work. Nevertheless, the proof in Version 1.0 is repeated here because it does serve illustrative purposes.

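In a straightforward manner we adapt Turing’s proof of the undecidability of the MTM halting problem [92], a reformulation of Gödel’s celebrated result [38], using the diagonalization trick whose roots date back to Cantor’s proof that one cannot count the real numbers [23]. Let us write $T(x) \downarrow$ if there is a $z \in B^\sharp$ such that $T(x) \rightsquigarrow z$. Let us write $T(x) \updownarrow$ if $T$’s output fluctuates forever in response to $x$ (e.g., by flipping from 1 to zero and back forever). Let $A_1, A_2, \ldots$ be an effective enumeration of all GTMs. Uniquely encode all pairs of finite strings $(x, y)$ in $B^* \times B^*$ as finite strings $code(x, y) \in B^*$. Suppose there were a GTM $U$ such that (*): for all $x, y \in B^*$: $U(code(x, y)) \rightsquigarrow 1$ if $A_x(y) \downarrow$, and $U(code(x, y)) \rightsquigarrow 0$ otherwise. Then one could construct a GTM $T$ with $T(x) \rightsquigarrow 1$ if $U(code(x, x)) \rightsquigarrow 0$, and $T(x) \updownarrow$ otherwise. Let $y$ be the index of $T = A_y$, then $A_y(y) \downarrow$ if $U(code(y, y)) \rightsquigarrow 0$, otherwise $A_y(y) \updownarrow$. By (*), however, $U(code(y, y)) \rightsquigarrow 1$ if $A_y(y) \downarrow$, and $U(code(y, y)) \rightsquigarrow 0$ if $A_y(y) \updownarrow$. Contradiction.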

3 Complexity of Constructive Descriptions

Throughout this paper we focus on TMs with self-delimiting programs [52, 54, 35, 27]. Traditionally, the Kolmogorov complexity [50, 82, 26] or algorithmic complexity or algorithmic information of $x \in B^*$ is the length of the shortest halting program computing $x$:

Definition 3.1 (Kolmogorov Complexity $K$)

Fix a universal MTM or EOM or GTM $U$ with halt instruction, and define

$$K(x) = \min_p \{ l(p) : U(p) = x \}.$$

Let us now extend this to nonhalting GTMs.

3.1 Generalized Kolmogorov Complexity for EOMs and GTMs

Definition 3.2 (Generalized $K_T$)

Given any TM $T$, define

$$K_T(x) = \min_p \{ l(p) : T(p) \rightsquigarrow x \}.$$

Compare Schnorr’s “process complexity” for MTMs [77, 94].

Definition 3.3 ($K^M$, $K^E$, $K^G$ based on Invariance Theorem)

Consider Def. 2.4. Let $C$ denote a set of TMs with universal TM $U^C$ ($T \in C$). We drop the index $T$, writing

$$K^C(x) = K_{U^C}(x) \le K_T(x) + O(1).$$

This is justified by an appropriate Invariance Theorem [50, 82, 26]: there is a positive constant $c$ such that $K_{U^C}(x) \le K_T(x) + c$ for all $x$, since the size of the compiler that translates arbitrary programs for $T$ into equivalent programs for $U^C$ does not depend on $x$.

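Definition 3.4 ($Km_T$, $Km^M$, $Km^E$, $Km^G$)

Given TM $T$ and $x \in B^*$, define

$$Km_T(x) = \min_p \{ l(p) : T(p) \rightsquigarrow xy,\ y \in B^\sharp \}.$$

Consider Def. 2.4. If $C$ denotes a set of TMs with universal TM $U^C$, then define

$$Km^C(x) = Km_{U^C}(x).$$

$Km^C$ is a generalization of Schnorr’s [77] and Levin’s [52] complexity measure $Km^M$ for MTMs.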

Describability issues. $K(x)$ is not computable by a halting program [50, 82, 26], but obviously $G$-computable or describable; the $z$ with $0.z = \frac{1}{K(x)}$ is even enumerable. Even $K^E(x)$ is describable, using the following algorithm:

Run all EOM programs in “dovetail style” such that the $n$-th step of the $i$-th program is executed in the $(n+i)$-th phase ($i = 1, 2, \ldots$); whenever a program outputs $x$, place it (or its prefix read so far) in a tentative list $L$ of $x$-computing programs or program prefixes; whenever an element of $L$ produces output $\succ x$, delete it from $L$; whenever an element of $L$ requests an additional input bit, update $L$ accordingly. After every change of $L$ replace the current estimate of $K^E(x)$ by the length of the shortest element of $L$. This estimate will eventually stabilize forever.
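The dovetailing schedule used by this algorithm can be sketched on its own. The generator below is a minimal illustration, with the hypothetical name `dovetail_schedule`, that enumerates pairs (program index, step index) so that each step of each program is reached after finitely many phases. The phase numbering, program index plus step index, is an assumption of this sketch; the simulation of EOM programs and the bookkeeping of the tentative list are not modelled.

```python
from itertools import count, islice
from typing import Iterator, Tuple


def dovetail_schedule() -> Iterator[Tuple[int, int]]:
    """Yield (program_index, step_index) pairs in dovetail order.

    Step n of program i is scheduled in phase n + i, so phase 2 runs
    step 1 of program 1, phase 3 runs step 2 of program 1 and step 1
    of program 2, and so on.
    """
    for phase in count(2):
        for i in range(1, phase):
            yield (i, phase - i)


# First three phases of the schedule.
first_six = list(islice(dovetail_schedule(), 6))
```

Here `first_six` is `[(1, 1), (1, 2), (2, 1), (1, 3), (2, 2), (3, 1)]`. No program has to finish before another one starts, which is what allows nonhalting programs to be run side by side.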

Theorem 3.1

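$K^G$ is not describable.

Proof. Identify finite bitstrings with the integers they represent. If $K^G$ were describable then also

$$h(x) = \max_y \{ K^G(y) : 1 \le y \le g(x) \},$$

where $g$ is any fixed recursive function, and also

$$f(x) = \min_y \{ y : K^G(y) = h(x) \}. \tag{9}$$

Since the number of descriptions $p$ with $l(p) < n - O(1)$ cannot exceed $2^{n - O(1)}$, but the number of strings $x$ with $l(x) = n$ equals $2^n$, most $x$ cannot be compressed by more than $O(1)$ bits; that is, $K^G(x) \ge \log x - O(1)$ for most $x$. From (9) we therefore obtain $K^G(f(x)) > \log g(x) - O(1)$ for large enough $x$, because $f(x)$ picks out one of the incompressible $y \le g(x)$. However, obviously we also would have $K^G(f(x)) \le l(x) + 2 \log l(x) + O(1)$, using the encoding of Def. 2.8. Contradiction for quickly growing $g$ with low complexity, such as $g(x) = 2^{2^x}$.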

3.2 Expressiveness of EOMs and GTMs

On their internal work tapes MTMs can compute whatever GTMs can compute. But they commit themselves forever once they print out some bit. They are ill-suited to the case where the output may require subsequent revision after time intervals unpredictable in advance — compare Example 2.1. Alternative MTMs that print out sequences of result updates (separated by, say, commas) would compute other things besides the result, and hence not satisfy the “don’t compute anything else” aspect of individual describability. Recall from the introduction that in a certain sense there are uncountably many collectively describable strings, but only countably many individually describable ones.

Since GTMs may occasionally rewrite parts of their output, they are computationally more expressive than MTMs in the sense that they permit much more compact descriptions of certain objects. For instance, $K(x) - K^G(x)$ is unbounded, as will be seen next. This will later have consequences for predictions, given certain observations.

Theorem 3.2

$K(x) - K^G(x)$ is unbounded.

Proof. Define

$$h'(x) = \max_y \{ K(y) : 1 \leq y \leq g(x) \}; \quad f'(x) = \min_y \{ y : K(y) = h'(x) \},$$

where $g$ is recursive. Then $K^G(f'(x)) = O(l(x) + K(g))$ (where $K(g)$ is the size of the minimal halting description of function $g$), but $K(f'(x)) > \log g(x) - O(1)$ for sufficiently large $x$; compare the proof of Theorem 3.1. Therefore $K(f'(x)) - K^G(f'(x)) \geq O(\log g(x))$ for infinitely many $x$ and quickly growing $g$ with low complexity.

3.2.1 EOMs More Expressive Than MTMs

Similarly, some $x$ are compactly describable on EOMs but not on MTMs. To see this, consider Chaitin’s $\Omega$, the halting probability of an MTM whose input bits are obtained by tossing an unbiased coin whenever it requests a new bit [28]. $\Omega$ is enumerable (dovetail over all programs $p$ and sum up the contributions $2^{-l(p)}$ of the halting $p$), but there is no recursive upper bound on the number of instructions required to compute $\Omega_n$, given $n$. This implies $K(\Omega_n) = n + O(1)$ [28] and also $K^M(\Omega_n) = n + O(1)$. It is easy to see, however, that on nonhalting EOMs there are much more compact descriptions:

$$K^E(\Omega_n) \leq O(K(n)) \leq O(\log n);$$

that is, there is no upper bound of

$$K^M(\Omega_n) - K^E(\Omega_n).$$
3.2.2 GTMs More Expressive Than EOMs: Objects Less Regular Than $\Omega$

We will now show that there are describable strings that have a short GTM description yet are “even more random” than Chaitin’s Omegas, in the sense that even on EOMs they do not have any compact descriptions.

Theorem 3.3

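For all $n$ there are $z \in B^*$ with

$$K^E(z) > n - O(1), \quad \text{yet} \quad K^G(z) \leq O(\log n).$$

That is, $K^E(z) - K^G(z)$ is unbounded.

Proof. For $x \in B^* \setminus \{\lambda\}$ and universal EOM $T$ define

$$\Xi(x) = \sum_{y \in B^\sharp:\, 0.y > 0.x} \ \sum_{p:\, T(p) \leadsto y} 2^{-l(p)}.$$

First note that the dyadic expansion of $\Xi(x)$ is EOM-computable or enumerable. The algorithm works as follows:

Algorithm A: Initialize the real-valued variable $V$ by 0, run all possible programs of EOM $T$ dovetail style such that the $n$-th step of the $i$-th program is executed in the $(n+i)$-th phase; whenever the output of a program prefix $q$ starts with some $y$ satisfying $0.y > 0.x$ for the first time, set $V := V + 2^{-l(q)}$; henceforth ignore continuations of $q$.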
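Algorithm A can be illustrated on a toy machine. The sketch below is hedged in several ways: the table `TOY_OUTPUTS` is a hypothetical stand-in for a universal EOM, all programs are advanced in lockstep rather than truly dovetailed, and the update rule assumed is that a program prefix $q$ contributes $2^{-l(q)}$ the first time its output, read as a binary fraction, exceeds that of $x$.

```python
from fractions import Fraction

def dyadic(bits):
    """Exact value of the binary fraction 0.bits (empty string gives 0)."""
    return sum(Fraction(int(b), 2 ** (k + 1)) for k, b in enumerate(bits))

# Hypothetical toy machine: program prefix q -> its output after steps 1, 2, 3.
# Outputs are only ever extended here, so 0.output never decreases.
TOY_OUTPUTS = {
    "0": ["", "1", "11"],
    "10": ["0", "01", "011"],
    "110": ["", "", "1"],
    "111": ["0", "00", "000"],
}

def approximate_from_below(x, steps):
    """Return the successive values of V for the toy machine above."""
    V = Fraction(0)
    counted = set()  # prefixes q whose contribution is already in V
    trace = []
    for t in range(steps):
        for q, outputs in TOY_OUTPUTS.items():
            if q in counted:
                continue  # ignore continuations of q
            y = outputs[min(t, len(outputs) - 1)]
            if dyadic(y) > dyadic(x):
                V += Fraction(1, 2 ** len(q))
                counted.add(q)
        trace.append(V)
    return trace

trace = approximate_from_below("01", 4)
print(trace)
```

The values of `V` never decrease, so the toy run approximates its limit from below, which is the enumerability property the proof relies on.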

$V$ approximates $\Xi(x)$ from below in enumerable fashion; infinite $p$ are not worrisome as $T$ must only read a finite prefix of $p$ to observe $0.y > 0.x$ if the latter holds indeed. We will now show that knowledge of $\Xi(x)_n$, the first $n$ bits of $\Xi(x)$, allows for constructing a bitstring $z$ with $K^E(z) \geq n - O(1)$ when $x$ has low complexity.

Suppose we know $\Xi(x)_n$. Once algorithm A above yields $V > \Xi(x)_n$ we know that no programs $p$ with $l(p) < n$ will contribute any more to $V$. Choose the shortest $z$ satisfying $0.z = (0.y_{min} + 0.x)/2$, where $y_{min}$ is the lexicographically smallest $y$ previously computed by algorithm A such that $0.y > 0.x$. Then $z$ cannot be among the strings T-describable with fewer than $n$ bits. Using the Invariance Theorem (compare Def. 3.3) we obtain $K^E(z) \geq n - O(1)$.

While prefixes of $\Omega$ are greatly compressible on EOMs, $z$ is not. On the other hand, $z$ is compactly G-describable: $K^G(z) \leq K(x) + K(n) + O(1)$. For instance, choosing a low-complexity $x$, we have $K^G(z) \leq O(K(n)) \leq O(\log n)$.

The discussion above reveals a natural complexity hierarchy. Ignoring additive constants, we have

$$K^G(x) \leq K^E(x) \leq K^M(x),$$

where for each “$\leq$” relation above there are $x$ which allow for replacing “$\leq$” by “$<$.”

4 Measures and Probability Distributions

Suppose $x$ represents the history of our universe up until now. What is its most likely continuation $y \in B^\sharp$? Bayes’ theorem yields

$$P(xy \mid x) = \frac{P(x \mid xy) P(xy)}{\sum_{z \in B^\sharp} P(xz)} = \frac{P(xy)}{N(x)} \propto P(xy),$$

where $P(z^2 \mid z^1)$ is the probability of $z^2$, given knowledge of $z^1$, and

$$N(x) = \sum_{z \in B^\sharp} P(xz)$$

is a normalizing factor. The most likely continuation $y$ is determined by $P(xy)$, the prior probability of $xy$; compare the similar Equation (1). Now what are the formally describable ways of assigning prior probabilities to universes? In what follows we will first consider describable semimeasures on $B^*$, then probability distributions on $B^\sharp$.
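A small worked example may help here. The sketch assumes a hypothetical prior `PRIOR` over four complete three-bit histories (the values are made up for illustration) and uses the fact that the likelihood of the observed prefix, given a history that starts with it, is 1.

```python
from fractions import Fraction

# Hypothetical prior over four complete histories (sums to 1).
PRIOR = {
    "000": Fraction(1, 2),
    "001": Fraction(1, 4),
    "010": Fraction(1, 8),
    "011": Fraction(1, 8),
}

def normalizer(x):
    """Total prior mass of all histories that start with x."""
    return sum(p for history, p in PRIOR.items() if history.startswith(x))

def posterior(x, y):
    """Probability of continuation y given prefix x: prior of xy over the normalizer."""
    return PRIOR.get(x + y, Fraction(0)) / normalizer(x)

print(posterior("00", "0"), posterior("00", "1"))
```

After observing the prefix `00`, the continuation `0` is twice as likely as `1`, simply because the prior of `000` is twice that of `001`; the normalizer does not affect the ranking.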

4.1 Dominant and Universal (Semi)Measures

The next three definitions concerning semimeasures on $B^*$ are almost but not quite identical to those of discrete semimeasures [56, p. 245 ff] and continuous semimeasures [56, p. 272 ff] based on the work of Levin and Zvonkin [100].

Definition 4.1 (Semimeasures)

A (binary) semimeasure $\mu$ is a function $B^* \rightarrow [0, 1]$ that satisfies:

$$\mu(\lambda) = 1; \quad \mu(x) \geq 0; \quad \mu(x) = \mu(x0) + \mu(x1) + \bar{\mu}(x),$$

where $\bar{\mu}$ is a function $B^* \rightarrow [0, 1]$ satisfying $0 \leq \bar{\mu}(x) \leq \mu(x)$.

A notational difference to the approach of Levin [100] (who writes $\mu(x) \geq \mu(x0) + \mu(x1)$) is the explicit introduction of $\bar{\mu}$. Compare the introduction of an undefined element $u$ by Li and Vitányi [56, p. 281]. Note that $\sum_{x \in B^*} \bar{\mu}(x) \leq 1$. Later we will discuss the interesting case $\bar{\mu}(x) = P(x)$, the a priori probability of $x$.
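The semimeasure conditions can be checked mechanically on a toy example. The sketch below assumes the reading that the mass of a string splits into the masses of its two one-bit extensions plus a nonnegative slack term; the particular choice of three to the power of minus the string length is an assumption made only for illustration.

```python
from fractions import Fraction
from itertools import product

def mu(x):
    """Toy semimeasure (an assumption for illustration): 3^(-l(x))."""
    return Fraction(1, 3 ** len(x))

def mu_bar(x):
    """Slack term: the mass of x that is not passed on to x0 and x1."""
    return mu(x) - mu(x + "0") - mu(x + "1")

def strings_up_to(depth):
    """All bitstrings of length 0..depth, shortest first."""
    for length in range(depth + 1):
        for bits in product("01", repeat=length):
            yield "".join(bits)

# The empty string carries unit mass, and the slack stays within [0, mu(x)].
assert mu("") == 1
assert all(0 <= mu_bar(x) <= mu(x) for x in strings_up_to(5))

slack_total = sum(mu_bar(x) for x in strings_up_to(5))
print(slack_total)
```

The accumulated slack stays below 1 at every depth, in line with the bound on the total slack noted in the text.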

Definition 4.2 (Dominant Semimeasures)

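A semimeasure $\mu_0$ dominates another semimeasure $\mu$ if there is a constant $c_\mu > 0$ such that for all $x \in B^*$

$$\mu_0(x) \geq c_\mu \, \mu(x).$$

Definition 4.3 (Universal Semimeasures)

Let $\mathcal{M}$ be a set of semimeasures on $B^*$. A semimeasure $\mu_0 \in \mathcal{M}$ is universal if it dominates all $\mu \in \mathcal{M}$.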

In what follows, we will introduce describable semimeasures dominating those considered in previous work ([100], [56, p. 245 ff, p.272 ff]).

4.2 Universal Cumulatively Enumerable Measure (CEM)

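Definition 4.4 (Cumulative measure $C\mu$)

For semimeasure $\mu$ on $B^*$ define the cumulative measure $C\mu$:

$$C\mu(x) := \sum_{y \succeq x:\, l(y) = l(x)} \mu(y) + \sum_{y \succ x:\, l(y) < l(x)} \bar{\mu}(y).$$

Note that we could replace “$l(x)$” by “$l(x) + c$” in the definition above. Recall that $x'$ denotes the smallest $y \succ x$ with $l(y) \leq l(x)$ ($x'$ may be undefined). We have

$$\mu(x) = C\mu(x) \ \text{ if } x = 11 \ldots 1; \quad \text{else } \mu(x) = C\mu(x) - C\mu(x'). \tag{20}$$

Definition 4.5 (CEMs)

Semimeasure $\mu$ is a CEM if $C\mu(x)$ is enumerable for all $x \in B^*$.

Then $\mu(x)$ is the difference of two finite enumerable values, according to (20).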

Theorem 4.1

There is a universal CEM.

Proof. We first show that one can enumerate the CEMs, then construct a universal CEM from the enumeration. Note the differences from Levin’s related proofs that there is a universal discrete semimeasure and a universal enumerable semimeasure [100, 52], and from Li and Vitányi’s presentation of the latter [56, p. 273 ff], attributed to J. Tyszkiewicz.

Without loss of generality, consider only EOMs without halt instruction and with fixed input encoding of $B^*$ according to Def. 2.8. Such EOMs are enumerable, and correspond to an effective enumeration of all enumerable functions from $B^*$ to $B^\sharp$. Let $EOM_i$ denote the $i$-th EOM in the list, and let $EOM_i(x, n)$ denote its output after $n$ instructions when applied to $x \in B^*$. The following procedure filters out those $EOM_i$ that already represent CEMs, and transforms the others into representations of CEMs, such that we obtain a way of generating all and only CEMs.

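FOR all $i$ DO in dovetail fashion:

START: let $V\mu_i(x)$ and $V\bar{\mu}_i(x)$ and $VC\mu_i(x)$ denote variable functions on $B^*$. Set $V\mu_i(\lambda) := V\bar{\mu}_i(\lambda) := VC\mu_i(\lambda) := 1$, and $V\mu_i(x) := V\bar{\mu}_i(x) := VC\mu_i(x) := 0$ for all other $x \in B^*$. Define $VC\mu_i(x') := 0$ for undefined $x'$. Let $z$ denote a string variable.

FOR $n = 1, 2, \ldots$ DO:

(1) Lexicographically order and rename all $x$ with $l(x) \leq n$: $x^1 := \lambda \prec x^2 := 0 \prec x^3 \prec \ldots \prec x^{2^{n+1}-1} := 11 \ldots 1$ ($n$ ones).
(2) FOR $k = 2^{n+1} - 1$ down to 1 DO:

(2.1) Systematically search for the smallest $m \geq n$ such that $z := EOM_i(x^k, m) \neq \lambda$ AND $0.z \geq VC\mu_i(x^{k+1})$ if $k < 2^{n+1} - 1$; set $VC\mu_i(x^k) := 0.z$.

(3) For all $x \succ \lambda$ satisfying $l(x) \leq n$, set $V\mu_i(x) := VC\mu_i(x) - VC\mu_i(x')$. For all $x$ with $l(x) < n$, set $V\bar{\mu}_i(x) := V\mu_i(x) - V\mu_i(x1) - V\mu_i(x0)$. For all $x$ with $l(x) = n$, set $V\bar{\mu}_i(x) := V\mu_i(x)$.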

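If $EOM_i$ indeed represents a CEM $\mu_i$ then each search process in (2.1) will terminate, and the $VC\mu_i(x)$ will enumerate the $C\mu_i(x)$ from below, and the $V\mu_i(x)$ and $V\bar{\mu}_i(x)$ will approximate the true $\mu_i(x)$ and $\bar{\mu}_i(x)$, respectively, not necessarily from below though. Otherwise there will be a nonterminating search at some point, leaving $V\mu_i$ from the previous loop as a trivial CEM. Hence we can enumerate all CEMs, and only those. Now define (compare [52]):

$$\mu_0(x) = \sum_{n > 0} \alpha_n \mu_n(x), \quad \bar{\mu}_0(x) = \sum_{n > 0} \alpha_n \bar{\mu}_n(x), \quad \text{where } \alpha_n > 0, \ \sum_{n > 0} \alpha_n = 1,$$

and $\alpha_n$ is an enumerable constant, e.g., $\alpha_n = \frac{6}{\pi^2 n^2}$ or $\alpha_n = \frac{1}{n(n+1)}$ (note a slight difference to Levin’s classic approach which just requests $\sum_n \alpha_n \leq 1$). Then $\mu_0$ dominates every $\mu_n$ by Def. 4.2, and is a semimeasure according to Def. 4.1:

$$\mu_0(\lambda) = 1; \quad \mu_0(x) \geq 0; \quad \mu_0(x) = \sum_{n > 0} \alpha_n \left[ \mu_n(x0) + \mu_n(x1) + \bar{\mu}_n(x) \right] = \mu_0(x0) + \mu_0(x1) + \bar{\mu}_0(x).$$

$\mu_0$ also is a CEM by Def. 4.5, because

$$C\mu_0(x) = \sum_{n > 0} \alpha_n C\mu_n(x)$$

is enumerable, since $\alpha_n$ and $C\mu_n(x)$ are (dovetail over all $n$). That is, $\mu_0(x)$ is approximable as the difference of two enumerable finite values, according to Equation (20).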
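The role of the mixture weights can be illustrated numerically. The sketch below is only a plausibility check under stated assumptions: the two weight sequences (a telescoping one and one based on the sum of inverse squares) are standard choices that sum to one and are assumed here for illustration, and the two component semimeasures are hypothetical toys rather than members of an actual CEM enumeration.

```python
from fractions import Fraction
from math import pi

def telescoping_weight(n):
    """Weight 1/(n(n+1)); the first N weights sum to exactly N/(N+1)."""
    return Fraction(1, n * (n + 1))

def inverse_square_weight(n):
    """Weight 6/(pi*n)^2; the weights sum to 1 in the limit."""
    return 6 / (pi * n) ** 2

N = 1000
exact_partial = sum(telescoping_weight(n) for n in range(1, N + 1))
float_partial = sum(inverse_square_weight(n) for n in range(1, N + 1))
print(exact_partial, float_partial)

# Two hypothetical component semimeasures and their weighted mixture.
components = [lambda x: Fraction(1, 2 ** len(x)), lambda x: Fraction(1, 3 ** len(x))]
weights = [telescoping_weight(1), telescoping_weight(2)]

def mixture(x):
    """Weighted sum of the component values at x."""
    return sum(a * m(x) for a, m in zip(weights, components))

# Each component is dominated with its own weight as the constant.
dominated = all(
    mixture(x) >= a * m(x)
    for x in ("", "0", "01", "0110")
    for a, m in zip(weights, components)
)
print(dominated)
```

Domination holds term by term because every summand of the mixture is nonnegative, so the weight of a component serves directly as its domination constant.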

4.3 Approximable and Cumulatively Enumerable Distributions

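To deal with infinite $x$, we will now extend the treatment of semimeasures on $B^*$ in the previous subsection by discussing probability distributions on $B^\sharp$.

Definition 4.6 (Probabilities)

A probability distribution $P$ on $x \in B^\sharp$ satisfies

$$0 \leq P(x) \leq 1; \quad \sum_x P(x) = 1.$$

Definition 4.7 (Semidistributions)

A semidistribution $P$ on $x \in B^\sharp$ satisfies

$$0 \leq P(x) \leq 1; \quad \sum_x P(x) \leq 1.$$

Definition 4.8 (Dominant Distributions)

A distribution $P_0$ dominates another distribution $P$ if there is a constant $c_P > 0$ such that for all $x \in B^\sharp$:

$$P_0(x) \geq c_P \, P(x).$$

Definition 4.9 (Universal Distributions)

Let $\mathcal{P}$ be a set of probability distributions on $x \in B^\sharp$. A distribution $P_0 \in \mathcal{P}$ is universal if for all $P \in \mathcal{P}$: $P_0$ dominates $P$.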

Theorem 4.2

There is no universal approximable semidistribution.

Proof. The following proof is due to M. Hutter (personal communications by email following a discussion of enumerable and approximable universes on 2 August 2000 in Munich). It is an extension of a modified proof [56, p. 249 ff] that there is no universal recursive semimeasure. (Footnote 1: As pointed out by M. Hutter (14 Nov. 2000, personal communication) and even earlier by A. Fujiwara (1998, according to P. M. B. Vitányi, personal communication, 21 Nov. 2000), the proof on the bottom of p. 249 of [56] should be slightly modified. For instance, the sum could be taken over $x_{i-1} < x \leq x_i$. The sequence of inequalities $\sum_{x_{i-1} < x \leq x_i} P(x) > x_i P(x_i)$ is then satisfiable by a suitable $x_i$ sequence, since $\liminf_{x \rightarrow \infty} \{ x P(x) \} = 0$. The basic idea of the proof is correct, of course, and very useful.)

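It suffices to focus on $x \in B^*$. Identify strings with integers, and assume $P(x)$ is a universal approximable semidistribution. We construct an approximable semidistribution $Q(x)$ that is not dominated by $P(x)$, thus contradicting the assumption. Let $P_0(x), P_1(x), \ldots$ be a sequence of recursive functions converging to $P(x)$. We recursively define a sequence $Q_0(x), Q_1(x), \ldots$ converging to $Q(x)$. The basic idea is: each contribution to $Q$ is the sum of $n$ consecutive $P$ probabilities ($n$ increasing). Define $Q_0(x) := 0$; $I_n := \{ y : n^2 \leq y < (n+1)^2 \}$. Let $n$ be such that $x \in I_n$. Define $j_t^n$ ($k_t^n$) as the element with smallest $P_t$ (largest $Q_{t-1}$) probability in this interval, i.e., $j_t^n := \arg\min_{x \in I_n} P_t(x)$ ($k_t^n := \arg\max_{x \in I_n} Q_{t-1}(x)$). If $n \cdot P_t(k_t^n)$ is less than twice and $n \cdot P_t(j_t^n)$ is more than half of $Q_{t-1}(k_t^n)$, set $Q_t(x) = Q_{t-1}(x)$. Otherwise set $Q_t(x) = n \cdot P_t(j_t^n)$ for $x = j_t^n$ and $Q_t(x) = 0$ for $x \neq j_t^n$. $Q_t(x)$ is obviously total recursive and non-negative. Since $2n \leq |I_n|$, we have

$$\sum_{x \in I_n} Q_t(x) \leq 2n \cdot P_t(j_t^n) = 2n \cdot \min_{x \in I_n} P_t(x) \leq \sum_{x \in I_n} P_t(x).$$

Summing over $n$ we observe that if $P_t$ is a semidistribution, so is $Q_t$. From some $t_0$ on, $P_t(x)$ changes by less than a factor of 2 since $P_t(x)$ converges to $P(x) > 0$. Hence $Q_t(x)$ remains unchanged for $t \geq t_0$ and converges to $Q(x) := Q_\infty(x) = Q_{t_0}(x)$. But $Q(k) \geq \frac{1}{2} n \cdot P(k)$ for $k := k_{t_0}^n$, violating our universality assumption $P(x) \geq c \cdot Q(x)$.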

Definition 4.10 (Cumulatively Enumerable Distributions – CEDs)

A distribution $P$ on $x \in B^\sharp$ is a CED if $CP(x)$ is enumerable for all $x \in B^*$, where

$$CP(x) := \sum_{y \in B^\sharp:\, y \succeq x} P(y).$$

4.4 TM-Induced Distributions and Convergence Probability

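Suppose TM $T$’s input bits are obtained by tossing an unbiased coin whenever a new one is requested. Levin’s universal discrete enumerable semimeasure [52, 27, 35] or semidistribution $m$ is limited to $B^*$ and halting programs:

Definition 4.11 (m)

$$m(x) = \sum_{p:\, T(p) = x,\ T \text{ halts on } p} 2^{-l(p)}.$$

Note that $\sum_x m(x) < 1$ if $T$ universal. Let us now generalize this to $B^\sharp$ and nonhalting programs:

Definition 4.12 ($P_T$, $KP_T$)

Suppose $T$’s input bits are obtained by tossing an unbiased coin whenever a new one is requested.

$$P_T(x) = \sum_{p:\, T(p) \leadsto x} 2^{-l(p)}, \quad KP_T(x) = -\lg P_T(x) \ \text{ for } P_T(x) > 0,$$

where $x, p \in B^\sharp$.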
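The coin-tossing definition can be made concrete on a toy machine. In the sketch below the table `TOY_MACHINE` is a hypothetical stand-in for a TM: it maps a few self-delimiting programs to the output they converge to, with `None` marking a program whose output never converges. The assumed reading of the definition is that the probability of an output is the sum, over all programs converging to it, of two to the power of minus the program length, and that the associated complexity is the negative binary logarithm of that probability.

```python
from fractions import Fraction
from math import log2

# Hypothetical toy machine: self-delimiting program -> output it converges to
# (None marks a program whose output never converges).
TOY_MACHINE = {"0": "01", "10": "01", "110": "1", "111": None}

def induced_probability(x):
    """Probability that fair coin flips produce a program converging to x."""
    return sum(
        (Fraction(1, 2 ** len(p)) for p, out in TOY_MACHINE.items() if out == x),
        Fraction(0),
    )

def induced_complexity(x):
    """Negative binary logarithm of the induced probability (needs it positive)."""
    prob = induced_probability(x)
    if prob == 0:
        raise ValueError("complexity is only defined for positive probability")
    return -log2(float(prob))

total_mass = sum(induced_probability(x) for x in {"01", "1"})
print(induced_probability("01"), induced_complexity("1"), total_mass)
```

Two programs of different lengths contribute to the same output, so its probability exceeds what its shortest program alone would give; the total mass stays below 1 because one program never converges.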

Program Continua. According to Def. 4.12, most infinite $x$ have zero probability, but not those with finite programs, such as the dyadic expansion of $0.5\sqrt{2}$. However, a nonvanishing part of the entire unit of probability mass is contributed by continua of mostly incompressible strings, such as those with cumulative probability $2^{-l(q)}$ computed by the following class of uncountably many infinite programs with a common finite prefix $q$: “repeat forever: read and print next input bit.” The corresponding traditional measure-oriented notation for

$$\sum_{x:\, T(qx) \leadsto x} 2^{-l(qx)} = 2^{-l(q)}$$

would be

$$\int_{0.q}^{0.q + 2^{-l(q)}} dx = 2^{-l(q)}.$$

For notational simplicity, however, we will continue using the $\sum$ sign to indicate summation over uncountable objects, rather than using a measure-oriented notation for probability densities. The reader should not feel uncomfortable with this: the theorems in the remainder of the paper will focus on those $x \in B^\sharp$ with $P(x) > 0$; density-like nonzero sums over uncountably many bitstrings, each with individual measure zero, will not play any critical role in the proofs.
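A finite truncation shows why such a continuum carries nonvanishing mass. The sketch below assumes that a program of a given length has probability two to the power of minus that length under fair coin flips; it sums this over all extensions of a fixed prefix by a given number of extra bits. The prefix value and the function name are hypothetical.

```python
from fractions import Fraction
from itertools import product

def mass_of_extensions(q, k):
    """Total probability of all programs that extend prefix q by exactly k bits."""
    return sum(Fraction(1, 2 ** (len(q) + k)) for _ in product("01", repeat=k))

q = "1011"
masses = [mass_of_extensions(q, k) for k in range(6)]
print(masses)
```

The total is the same at every truncation depth, namely the probability of the common prefix alone, even though each individual extension gets exponentially less; in the limit every infinite program has measure zero while the family keeps its full mass.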

Definition 4.13 (Universal TM-Induced Distributions $P^C$; $KP^C$)

If $C$ denotes a set of TMs with universal element $U^C$, then we write

$$P^C(x) = P_{U^C}(x); \quad KP^C(x) = -\lg P^C(x) \ \text{ for } P^C(x) > 0.$$

We have $P^C(x) > 0$ for $D^C \subset B^\sharp$, the subset of C-describable $x \in B^\sharp$. The attribute universal is justified, because of the dominance $P_T(x) = O(P^C(x))$, due to the Invariance Theorem (compare Def. 3.3).

Since all programs of EOMs and MTMs converge, $P^E$ and $P^M$ are proper probability distributions on $B^\sharp$. For instance, $\sum_x P^E(x) = 1$. $P^G$, however, is just a semidistribution. To obtain a proper probability distribution $PN_T$, one might think of normalizing by the convergence probability $\Upsilon$:

Definition 4.14 (Convergence Probability)

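Given GTM $T$, define

$$PN_T(x) = \frac{\sum_{p:\, T(p) \leadsto x} 2^{-l(p)}}{\Upsilon^T}, \quad \text{where} \quad \Upsilon^T = \sum_{p:\, \exists x:\, T(p) \leadsto x} 2^{-l(p)}.$$
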
Describability issues. Uniquely encode each TM $T$ as a finite bitstring, and identify $M$, $E$, $G$ with the corresponding sets of bitstrings. While the function