 # Approximations of Kolmogorov Complexity

In this paper we show that approximating the Kolmogorov complexity of a set of numbers is equivalent to having common information with the halting sequence. The more precise the approximations are, and the greater the number of approximations, the more information is shared with the halting sequence. An encoding of $2^N$ unique numbers and their Kolmogorov complexities contains at least $N$ bits of mutual information with the halting sequence, up to lower-order terms. We also provide a generalization of the "Sets Have Simple Members" theorem to conditional complexity.


## 1 Introduction

The Kolmogorov complexity of a string $x$, $K(x)$, is the size of the smallest program that outputs $x$ with respect to a universal prefix-free program. It is a well known fact that Kolmogorov complexity is uncomputable (see [Kol65] and [Sol64]). In fact, any computable function $f$ that is not greater than $K$ is bounded by a constant. This is because, if $f$ were unbounded, then for each $n$ you could find an $x$ such that $f(x) > n$. Thus $x$ can be identified with $n$ and $f$, so $K(x) < K(n) + K(f) + O(1)$. However, since $n < f(x) \le K(x)$, we have $n < K(n) + K(f) + O(1)$, thus causing a contradiction for large enough $n$.
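The search step in this argument can be sketched as follows. This is a minimal illustration, not part of the paper: `toy_bound` is a hypothetical stand-in for an unbounded computable function (it is not an actual lower bound on Kolmogorov complexity), and `first_exceeding` is an illustrative name. The point is only that the search is a fixed short procedure whose output is determined by $n$ and $f$.

```python
from itertools import count, product


def strings_in_order():
    """Yield all binary strings in length-lexicographic order."""
    for n in count(0):
        for bits in product("01", repeat=n):
            yield "".join(bits)


def first_exceeding(f, n):
    """Return the first string x (length-lexicographic order) with f(x) > n.

    Terminates only if f is unbounded, which is the case the argument rules out
    for computable lower bounds on Kolmogorov complexity.
    """
    for x in strings_in_order():
        if f(x) > n:
            return x


def toy_bound(x):
    """Hypothetical unbounded computable function, used only to run the search."""
    return len(x) // 2


print(first_exceeding(toy_bound, 3))  # first string of length 8: "00000000"
```

Because `first_exceeding` needs only `f` and `n` as input, the string it returns has a description of length about $K(n) + K(f)$, which is what drives the contradiction.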

The authors in [BFNV05], using expanding graphs, introduced an algorithm that, when given a non-random string, outputs a small list of strings of the same length containing a string with higher complexity. In [Zim16], an algorithm was presented that, when given a non-random string, outputs a large list of strings of the same length where 99% of the outputted strings have higher complexity. Given a universal machine $U$, a $c$-short program for $x$ is a string $p$ such that $U(p)=x$ and the length of $p$ is bounded by $c$ plus the length of a shortest program for $x$. The authors in [BMVZ13] showed that there exists a computable function that maps every $x$ to a list of size $\|x\|^2$ containing an $O(1)$-short program for $x$.

In this paper, we show that approximate knowledge about the Kolmogorov complexity of a finite number of strings is equivalent to sharing a certain amount of information with the halting sequence. The more strings in the collection and the better their approximation to $K$, the more information this collection has with the halting sequence. The mutual information between an encoding of $2^N$ unique numbers alongside their Kolmogorov complexity and the halting sequence is at least $N$, up to lower-order terms. Due to information non-growth laws, there is no (randomized) algorithmic means to produce information with the halting sequence.

We also provide a generalization of the “Sets Have Simple Members” theorem, first seen in [EL11], to conditional Kolmogorov complexity and conditional algorithmic probability. The theorem states that the minimum conditional complexity, over pairs specified by a binary relation, is less than the negative log of the combined conditional algorithmic probability of all pairs in the enumeration.

## 2 Conventions

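We use $\mathbb{N}$, $\mathbb{Q}$, $\mathbb{R}$, $\Sigma$, $\Sigma^*$, $\Sigma^\infty$ to denote natural numbers, rational numbers, real numbers, bits $\{0,1\}$, finite strings, and infinite sequences. We use $X_{>0}$ and $X_{\ge 0}$ to denote the positive and non-negative elements of set $X$. The $i$th bit of a string $x$ is $x[i]$. For string $x\in\Sigma^*$, $x0^- = x1^- = x$. The length of a string $x$ is $\|x\|$. The size of a set $D$ is $|D|$. For $x\in\Sigma^*$ and $y\in\Sigma^*\cup\Sigma^\infty$, we use $x\sqsubseteq y$ iff $x=y$ or there is some string $z$ where $xz=y$. We say $x\sqsubset y$ iff $x\sqsubseteq y$ and $x\neq y$. The self-delimiting code of a string $x$ is $\langle x\rangle = 1^{\|x\|}0x$. The encoding of (a possibly ordered) set $\{x_1,\dots,x_m\}$ is $\langle m\rangle\langle x_1\rangle\dots\langle x_m\rangle$.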
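As an illustration of the self-delimiting code and the prefix relation, here is a minimal sketch. It assumes the common form $\langle x\rangle = 1^{\|x\|}0x$ for the code, which is an assumption on our part; the function names are hypothetical.

```python
def self_delimit(x: str) -> str:
    """Encode x as 1^{|x|} 0 x (assumed form of the self-delimiting code)."""
    return "1" * len(x) + "0" + x


def decode_one(stream: str):
    """Decode one self-delimited string from the front of stream.

    Returns (decoded string, remaining stream).
    """
    n = stream.index("0")  # the number of leading 1s is the payload length
    start = n + 1
    return stream[start:start + n], stream[start + n:]


def is_prefix(x: str, y: str) -> bool:
    """The prefix relation on finite strings: x equals y or y extends x."""
    return y.startswith(x)


# Two codes concatenated can be split again without any separator.
encoded = self_delimit("101") + self_delimit("0")
first, rest = decode_one(encoded)
second, rest = decode_one(rest)
print(first, second, repr(rest))
```

The decoder never needs to look past the end of a code word, which is what makes concatenated encodings of tuples and sets unambiguous.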

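A (discrete) measure is a function $Q:\Sigma^*\to\mathbb{R}_{\ge 0}$. Measure $Q$ is a semi-measure iff $\sum_x Q(x)\le 1$. Measure $Q$ is a probability measure iff $\sum_x Q(x)=1$. The support of a measure $Q$ is $\mathrm{Supp}(Q)=\{x : Q(x)>0\}$. Probability measure $Q$ is elementary if $|\mathrm{Supp}(Q)|<\infty$ and $\mathrm{Range}(Q)\subset\mathbb{Q}_{\ge 0}$. Elementary probability measures $Q$ with $\{x_1,\dots,x_m\}=\mathrm{Supp}(Q)$ are encoded by finite strings of the form $\langle Q\rangle = \langle\{x_1,Q(x_1),\dots,x_m,Q(x_m)\}\rangle$. For semi-measure $Q$, we say function $t:\Sigma^*\to\mathbb{R}_{\ge 0}$ is a $Q$-test iff $\sum_x Q(x)t(x)\le 1$.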
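The definitions in this paragraph can be checked mechanically for finitely supported measures. The sketch below assumes the standard readings (total mass at most 1 for a semi-measure, exactly 1 for a probability measure, and $\sum_x Q(x)t(x)\le 1$ for a test); the dictionary representation and function names are illustrative choices.

```python
from fractions import Fraction


def support(Q):
    """Strings that receive positive measure."""
    return {x for x, p in Q.items() if p > 0}


def is_semi_measure(Q):
    return sum(Q.values()) <= 1


def is_probability_measure(Q):
    return sum(Q.values()) == 1


def is_test(Q, t):
    """A test t has Q-expectation at most 1."""
    return sum(p * t(x) for x, p in Q.items()) <= 1


# An elementary probability measure: finite support, rational values.
Q = {"0": Fraction(1, 2), "10": Fraction(1, 4), "11": Fraction(1, 4)}

print(is_probability_measure(Q))               # True
print(is_test(Q, lambda x: 2 if x == "0" else 0))  # True: expectation is exactly 1
print(is_test(Q, lambda x: 4 if x == "0" else 0))  # False: expectation is 2
```

Using `Fraction` keeps the sums exact, which matches the requirement that elementary measures take rational values.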

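For nonnegative real function $f$, we use $\le^+ f$, $\ge^+ f$, $=^+ f$ to denote $\le f+O(1)$, $\ge f-O(1)$, and $= f\pm O(1)$. We also use $<^{\log} f$ and $>^{\log} f$ to denote $< f+O(\log(f+1))$ and $> f-O(\log(f+1))$.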

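We use algorithms $T_y(x)$ on input programs $x\in\Sigma^*$ and auxiliary inputs $y\in\Sigma^*\cup\Sigma^\infty$. $T$ is a prefix-free algorithm if for all $x,s\in\Sigma^*$ with $s\neq\emptyset$, and all $y$, $T_y(x)$ does not halt or $T_y(xs)$ does not halt. There exists a universal prefix algorithm $U$ where for all prefix algorithms $T$ there exists a $t\in\Sigma^*$, where for all $x$ and $y$, $U_y(tx)=T_y(x)$. As is standard, we define Kolmogorov complexity with respect to $U$, with for $x\in\Sigma^*$, $y\in\Sigma^*\cup\Sigma^\infty$, $K(x|y)=\min\{\|p\| : U_y(p)=x\}$. The universal probability $m$ is defined as $m(x|y)=\sum\{2^{-\|p\|} : U_y(p)=x\}$. By the coding theorem, $K(x|y) =^+ -\log m(x|y)$. By the chain rule, $K(x,y) =^+ K(x)+K(y|x,K(x))$.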
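To make the definitions of complexity and universal probability concrete, here is a small sketch on a finite toy machine. It assumes the standard definitions (complexity as the length of a shortest program, universal probability as the sum of $2^{-\|p\|}$ over programs producing the output). `TOY_MACHINE` is a hypothetical lookup table, not a universal machine, so the numbers only illustrate how the two quantities relate.

```python
import math

# Hypothetical toy machine: a finite, prefix-free table from programs to outputs.
TOY_MACHINE = {"0": "a", "10": "b", "110": "a", "111": "c"}


def is_prefix_free(programs):
    """No program is a proper prefix of another (adjacent check on sorted order)."""
    ordered = sorted(programs)
    return not any(q.startswith(p) for p, q in zip(ordered, ordered[1:]))


def complexity(machine, x):
    """Length of a shortest program producing x, or infinity if there is none."""
    lengths = [len(p) for p, out in machine.items() if out == x]
    return min(lengths) if lengths else math.inf


def weight(machine, x):
    """Sum of 2^{-|p|} over all programs p producing x."""
    return sum(2.0 ** -len(p) for p, out in machine.items() if out == x)


print(is_prefix_free(TOY_MACHINE))          # True
print(complexity(TOY_MACHINE, "a"))         # 1
print(weight(TOY_MACHINE, "a"))             # 0.625
print(-math.log2(weight(TOY_MACHINE, "a"))) # at most the complexity
```

On any machine, $-\log m(x) \le K(x)$ holds trivially because the shortest program contributes $2^{-K(x)}$ to the sum; the content of the coding theorem is the reverse inequality up to an additive constant, which a toy table cannot exhibit.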

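Let $f:W\to\mathbb{N}$ be a function with a finite domain $W$, $W\subset\mathbb{N}$. Then $\langle f\rangle$, where $W=\{x_1,\dots,x_m\}$, is $\langle\{x_1,f(x_1),\dots,x_m,f(x_m)\}\rangle$. $K(f)=K(\langle f\rangle)$. The complexity of general partial computable functions is defined as the length of the shortest $U$-program that computes it. The halting sequence $\mathcal{H}\in\Sigma^\infty$ is the unique infinite sequence where $\mathcal{H}[i]$ is equal to 1 iff $U(i)$ halts. The information that $x$ has about $\mathcal{H}$ is $I(x:\mathcal{H})=K(x)-K(x|\mathcal{H})$.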

## 3 Left-Total Machines

This paper uses notions of total strings and left-total machines. A string $x\in\Sigma^*$ is total if all sufficiently long extensions of $x$ will cause the universal Turing machine $U$ to halt. More formally, $x$ is total if and only if there exists a finite prefix-free set of strings $Z\subset\Sigma^*$ such that $\sum_{z\in Z}2^{-\|z\|}=1$, and for all $z\in Z$, $U(xz)$ halts. Along with totality, we introduce the notion of leftness. We say $x$ is to the left of $y$, $x\lhd y$, iff there exists a string $z$ such that $z0\sqsubseteq x$ and $z1\sqsubseteq y$. We say the universal Turing machine $U$ is left-total if for all strings $x,y$ with $x\lhd y$, if $U(y)$ halts then $x$ is total. An example of the domain of a left-total machine can be seen in Figure 1. This example also illustrates the reason for using “left” in the definition.

Figure 1: The above diagram represents the domain of a left-total machine with the 0 bits branching to the left and the 1 bits branching to the right, with $y=110$. For $i\in\{1,\dots,5\}$, $x_i\lhd x_{i+1}$ and $x_i\lhd y$. Assuming $T(y)$ halts, each $x_i$ is total. This also implies each $x_i^-$ is total as well.
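The two combinatorial conditions in this section can be sketched in code. This is an illustration under the reading that leftness means "branches to 0 where the other branches to 1 at the first point of disagreement", and that totality is witnessed by a finite prefix-free set whose Kraft sum is exactly 1. Whether the machine actually halts on each extension cannot be checked this way, so only the shape of the witness set is tested. Function names are hypothetical.

```python
from fractions import Fraction


def is_left_of(x: str, y: str) -> bool:
    """True iff some z has z0 a prefix of x and z1 a prefix of y."""
    for a, b in zip(x, y):
        if a != b:
            return a == "0" and b == "1"
    return False  # one string is a prefix of the other, so neither is to the left


def is_prefix_free(strings) -> bool:
    ordered = sorted(strings)
    return not any(q.startswith(p) for p, q in zip(ordered, ordered[1:]))


def covers_all_extensions(Z) -> bool:
    """Z is a finite prefix-free set with sum of 2^{-|z|} equal to 1.

    Such a set meets every infinite continuation, so halting on x+z for every
    z in Z would mean halting on all sufficiently long extensions of x.
    """
    Z = set(Z)
    return is_prefix_free(Z) and sum(Fraction(1, 2 ** len(z)) for z in Z) == 1


y = "110"
print([x for x in ["0", "10", "111", "11"] if is_left_of(x, y)])  # ['0', '10']
print(covers_all_extensions({"0", "10", "11"}))  # True
print(covers_all_extensions({"0", "10"}))        # False: extensions starting 11 are missed
```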

Without loss of generality, we can assume that the universal Turing machine $U$ is left-total. We refer the readers to [Eps19b] for the explicit construction of a left-total universal Turing machine. The border sequence $\mathcal{B}\in\Sigma^\infty$ is the unique sequence where if $x\sqsubset\mathcal{B}$ then $x$ has both total and non-total extensions. The sequence is called “border” because if $x\lhd\mathcal{B}$ then $x$ is total, and if $\mathcal{B}\lhd x$, then $U$ will never halt when given $x$ as the starting input.

For total string $b$, we define the function $\mathrm{bbtime}(b)$ as the longest running time of a program that is to the left of $b$ or extends $b$. If $b$ and $b^-$ are total, then $\mathrm{bbtime}(b)\le\mathrm{bbtime}(b^-)$. For total string $b$, and $x,y\in\Sigma^*$, let $m_b(x|y)$ be the algorithmic weight of $x$ from programs conditioned on $y$ in time $\mathrm{bbtime}(b)$. More formally,

$$m_b(x|y)=\sum\{2^{-\|p\|} : U_y(p)=x \text{ in time } \mathrm{bbtime}(b)\}.$$

The term $m_b(x|y)$ is 0 if $b$ is not total. If $b$ and $b^-$ are total, then $m_b(x|y)\le m_{b^-}(x|y)$.
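The time-bounded weight can be illustrated on a toy table of programs with recorded running times. This is only a sketch: `TOY_RUNS` is hypothetical data, and the integer `time_bound` stands in for $\mathrm{bbtime}(b)$, which is not computable in general.

```python
from fractions import Fraction

# Hypothetical toy data: program -> (output, number of steps before halting).
TOY_RUNS = {
    "0": ("a", 3),
    "10": ("b", 50),
    "110": ("a", 7),
    "111": ("c", 2),
}


def bounded_weight(runs, x, time_bound):
    """Sum of 2^{-|p|} over programs p that output x within time_bound steps."""
    total = Fraction(0)
    for p, (out, steps) in runs.items():
        if out == x and steps <= time_bound:
            total += Fraction(1, 2 ** len(p))
    return total


print(bounded_weight(TOY_RUNS, "a", 5))   # 1/2: only program "0" finishes in time
print(bounded_weight(TOY_RUNS, "a", 10))  # 5/8: program "110" now contributes too
```

A larger time bound can only admit more programs, so the weight is monotone in the bound; this is the property the proofs rely on when comparing the weight for a string with the weight for its shorter prefix.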

## 4 Stochasticity

This paper uses the notion of stochasticity, which is a part of algorithmic statistics. For a comprehensive survey of algorithmic statistics, see [VS17]. A string is stochastic if it is typical of a simple probability measure. Typicality is measured by the deficiency of randomness $d$. The deficiency of randomness of a string $x$, with respect to a probability measure $Q$, conditioned on auxiliary string $v$, is

$$d(x|Q,v)=\lfloor-\log Q(x)\rfloor-K(x|v).$$

The deficiency of randomness measures the difference between the length of the $Q$-Shannon-Fano code for $x$ and the shortest description of $x$ (given $v$). If $x$ is typical, then its $d$ measure will be small. We say a string $x$ is $(j,k)$-stochastic conditional to $y\in\Sigma^*$, for $j,k\in\mathbb{N}$, if there exists a program $v$ of length $j$ where $U_y(v)=\langle Q\rangle$, and $Q$ is an elementary probability measure, and $\max\{d(x|Q,\langle v,y\rangle),1\}\le k$. The stochasticity measure of a string $x$, conditional on auxiliary information $y$, is

$$\Lambda(x|y)=\min\{j+3\log k : x \text{ is } (j,k)\text{-stochastic conditional to } y\}.$$
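The arithmetic behind the deficiency of randomness and the stochasticity measure can be sketched as below. Since Kolmogorov complexity is uncomputable, the conditional complexity is passed in by the caller as an assumed value, and the list of $(j,k)$ pairs is hypothetical input rather than something the code derives. The function names are illustrative.

```python
import math


def deficiency(Q, x, k_given_v):
    """floor(-log2 Q(x)) minus the (caller-supplied) conditional complexity of x."""
    return math.floor(-math.log2(Q[x])) - k_given_v


def stochasticity_measure(pairs):
    """Minimum of j + 3*log2(k) over the (j, k) pairs for which x is (j, k)-stochastic."""
    return min(j + 3 * math.log2(k) for j, k in pairs)


# Uniform measure over the eight 3-bit strings: the Shannon-Fano code length is 3.
Q = {format(i, "03b"): 1 / 8 for i in range(8)}

# A string with an assumed conditional complexity of 1 bit sits 2 bits below
# its code length, so it is somewhat atypical of Q.
print(deficiency(Q, "000", k_given_v=1))  # 2

# Hypothetical trade-off between model size j and deficiency bound k.
print(stochasticity_measure([(10, 1), (4, 2), (2, 16)]))  # 7.0
```

The $3\log k$ term means deficiency is charged only logarithmically, so a short model with moderate deficiency can beat a long model with none.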

The following lemma is from [EL11]. It states that strings that have high stochasticity measures are exotic, in that they have high mutual information with the halting sequence. Another version of the lemma can be found in [Eps19b].

###### Lemma 1

For $x,y\in\Sigma^*$, $\Lambda(x|y) <^{\log} I(x:\mathcal{H}|y)$.

The following lemma is from [Eps19a]. A variant of the same idea can be found in Proposition 5 of [VS17]. It states that no total computable function can increase the stochasticity of a string by more than a constant (dependent on the complexity of the function).

###### Lemma 2

Given total recursive function $f$, $\Lambda(f(x)|y) \le^+ \Lambda(x|y)+K(f)$.

The following lemma is from [Eps19b]. If a string $b$ is total and $b^-$ is not total, then $b^-\sqsubset\mathcal{B}$. This is because the border sequence is defined as the unique sequence whose prefixes have total and non-total extensions. Since $b$ is total and $b^-$ is not total, $b^-$ has total and non-total extensions. The following lemma states that if a prefix of border is simple relative to a string $x$ and its own length, then it will be a part of the common information of $x$ and $\mathcal{H}$.

###### Lemma 3

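If $b\in\Sigma^*$ is total and $b^-$ is not, and $x\in\Sigma^*$,
then $K(b)+I(x:\mathcal{H}|b) <^{\log} I(x:\mathcal{H})+K(b|\langle\|b\|,x\rangle)$.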

The following theorem is from [Eps19b]. It states that, given two (not necessarily probabilistic) measures satisfying certain summation requirements, if the combined score of the elements of a set under the first measure is large, then there exists an element of the set that can be identified by a short code with respect to the second measure.

###### Theorem 1

Relativized to computable an with , if for some finite set , , then there exists with .

## 5 Uncomputability of K

Corollary 1 shows that an encoding of any $2^n$ unique pairs $(x,K(x))$ has more than $n$ bits of mutual information with the halting sequence $\mathcal{H}$, up to logarithmic precision. So all such large sets are exotic.

###### Theorem 2

For any finite set of natural numbers and , where , we have .

### Proof.

Let . Note that by the coding theorem . Let be the shortest total string with .

$$K(b|\langle\|b\|,L\rangle) \le^+ K(j), \qquad (1)$$

as there is a program that when given , and , enumerates all total strings of length and returns the first where , which we call satisfying property . This is equal to , otherwise there is a , that satisfies property . This implies that is total and satisfies property . This implies a contradiction for being the smallest total string satisfying property . The same arguments can be used if . This also implies is not total. A graphical depiction of this argument can be seen in Figure 2.

Figure 2: A graphical argument for why the total string $b$ in the proof of Theorem 2 is unique. Each path represents a string, with 0s branching to the left and 1s branching to the right. If another string $b'$ exists with the desired $m_{b'}$ property, and it is to the left of $b$, then its prefix $b'^-$ will also be total and have the desired $m_{b'^-}$ property, causing a contradiction.

So for all , . Let , and . . Theorem 1, relativized to , gives where . So

$$\begin{aligned}
s &< -\log m_b(a)-K(a|b)+\Lambda(D|b)+O(K(s))\\
&< -\log m_b(a)-K(a)+K(b)+\Lambda(D|\langle b,s\rangle)+O(K(s))
\end{aligned}$$

Let be a total computable function that when given an encoding of a function for finite , outputs . Thus . Due to Lemma 2, conditioned on , . Due to Lemma 1,

$$s<2j+K(b)+I(L:\mathcal{H}|b)+O(\log I(L:\mathcal{H}|b)+K(s)).$$

Let . Due to Lemma 3 and Equation 1, . This implies

$$s\le 2j+h_{\emptyset}+O(K(s)+K(j)+\log h_{\emptyset}).$$

If , then the theorem is trivially solved. So, assuming , we have . So . Therefore . So .

###### Corollary 1

Any set $D$ of $2^n$ unique pairs $(x,K(x))$ has $I(D:\mathcal{H}) >^{\log} n$.

## 6 Exotic Binary Relations

For -programs that enumerate a (potentially infinite) binary relation and total string , we use to denote the finite binary relation enumerated by in steps. We use to denote the entire binary relation enumerated by .

###### Theorem 3

For -program that enumerates a binary relation, with
and ,

### Proof.

Let be the shortest total string where . We have the inequality because there is a program that when given , , and , can enumerate all total strings of length and all pairs , and return the first total string where , which we call satisfying property . This string is unique, otherwise there exists a string , which satisfies property . If , then is total and satisfies property , contradicting the definition of being the shortest total string satisfying property . Similar reasoning can be used for when . Therefore , and is unique. Figure 2 illustrates this point.

Let , be the program and elementary probability measure that minimize the stochasticity of conditional on , , where , and

$$\|v'\|+3\log\max\{d(p|Q',\langle v',b,i\rangle),1\}=\Lambda(p|\langle b,i\rangle).$$

Let be an elementary probability measure equal to conditioned on the largest set of programs that enumerate binary relations where , which we call satisfying property . Thus , where . Let , , with , where is helper code of size . Thus which implies . Let . So

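$$\begin{aligned}
\|v\| &\le^+ \|v'\|\\
\|v\|+3\log d &\le^+ \|v'\|+3\log d\\
&=^+ \|v'\|+3\log(\max\{-\log Q(q)-K(q|v,b,i),1\})\\
&\le^+ \|v'\|+3\log(\max\{-\log Q'(q)-K(q|v,b,i),1\})\\
&\le^+ \|v'\|+3\log(\max\{-\log Q'(q)-K(q|v',b,i),1\})\\
&\le^+ \Lambda(q|\langle b,i\rangle). \qquad (2)
\end{aligned}$$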

Let , which is finite. Let

be a set of random vectors, indexed by

, each of size . The number is a constant solely dependent on to be determined later. Each element of the vector is chosen with probability , and is chosen with probability . Let , be a nonnegative function over strings, parameterized by sets of strings , each of size , each indexed by a string . For an enumerative program , , if there exists where . Otherwise . So, using the fact for ,

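$$\begin{aligned}
\mathbf{E}_{\delta_y}[Q(t_{\delta_y})] &= \sum_q Q(q)\prod_{y\in S}\Big(1-\sum_{x:(x,y)\in q[b]}m_b(x|y)\Big)^{(c+d)2^{i+1}}e^{c+d-1}\\
&\le \sum_q Q(q)\prod_{y\in S}e^{-\sum_{x:(x,y)\in q[b]}m_b(x|y)(c+d)2^{i+1}}e^{c+d-1}\\
&\le \sum_q Q(q)\,e^{-\left(\sum_{(x,y)\in q[b]}m_b(x|y)\right)(c+d)2^{i+1}}e^{c+d-1}\\
&\le \sum_q Q(q)\,e^{-2^{-i-1}(c+d)2^{i+1}}e^{c+d-1}\\
&= e^{-1}<1.
\end{aligned}$$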
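The steps of this chain can be unpacked as follows. This is a sketch under two assumptions inferred from the shape of the derivation rather than stated in this form: that the fact invoked is the elementary bound $1-a\le e^{-a}$ for $a\in[0,1]$, and that every $q$ in the support of $Q$ satisfies $\sum_{(x,y)\in q[b]}m_b(x|y)\ge 2^{-i-1}$. The shorthand $k=(c+d)2^{i+1}$ and $a_y=\sum_{x:(x,y)\in q[b]}m_b(x|y)$ is introduced here only for readability.

$$\prod_{y\in S}(1-a_y)^{k}\;\le\;\prod_{y\in S}e^{-k a_y}\;=\;e^{-k\sum_{y\in S}a_y}\;\le\;e^{-k\,2^{-i-1}}\;=\;e^{-(c+d)}.$$

Each summand is therefore at most $Q(q)\,e^{-(c+d)}e^{c+d-1}=Q(q)\,e^{-1}$, and since $Q$ is a probability measure, summing over $q$ gives $e^{-1}$.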

Thus there exists a collection of sets , indexed by , where . This collection can be found using brute search given , , and , with .

There exists , and where . Otherwise , and for proper choice of , solely dependent on , we have

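$$\begin{aligned}
d &> -\log Q(p)-K(p|v,b,i)-O(1)\\
&> -\log Q(p)-\big(-\log t_{G_x}(p)Q(p)+K(t_{G_x}(\cdot)Q(\cdot)|v,b,i)\big)-O(1)\\
&> -\log Q(p)-\big(-\log t_{G_x}(p)Q(p)+K(G_x,Q|v,b,i)\big)-O(1)\\
&> (\log e)(c+d)-K(d,c)-O(1)\\
&> d,
\end{aligned}$$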

causing a contradiction. We roll into the additive constants for the rest of the theorem. So , and there exists an , where . So

$$\begin{aligned}
K(x|y,b,i) &\le^+ \log|G_y|+K(G_y|v,d,b,i)+K(v,d|b,i)\\
&\le^+ i+3\log d+\|v\| \qquad (3)\\
&\le^+ i+\Lambda(p|i,b) \qquad (4)
\end{aligned}$$

Equation 3 is due to the fact that is a program (conditioned on ). So its conditional complexity is not more than its length. Equation 4 is due to Equation 2. Equation 5 is due to Lemma 1. Equation 6 is due to Lemma 3. Equation 7 is due to the inequality .

###### Corollary 2

For finite binary relation , with , .

###### Corollary 3

For partial computable function with ,
.