 # Information Kernels

Given a set X of finite strings, one can ask whether some member of X is simple conditional to all other members of X, where conditional simplicity means low conditional Kolmogorov complexity. We answer this question in the affirmative for sets that have low mutual information with the halting sequence. We give two results. One depends on the maximum conditional complexity between two elements of X; the other depends on the maximum expected value of the conditional complexity of a member of X relative to each member of X.

## 1 Introduction

In [Rom03], criteria were established for the amount of algorithmic information that can be extracted from a triplet of strings. That paper introduced the notion of bunches. A bunch is a finite set of strings such that

1. ,

2. for all ,

3. for all .

The term used above represents the conditional Kolmogorov complexity. In [Rom03], Theorem 5, it was shown that common information could be extracted from bunches.

Theorem 5. [Rom03] For bunch , there exists a string such that and for any .

In this paper, we revisit bunches and show that every non-exotic bunch, that is, every bunch whose encoding has low mutual information with the halting sequence, has an element that is simple conditional to all other members. We also prove a similar result for a structure we call batches, which are defined in terms of an expectation instead of a maximum. We use a slightly different definition of bunches (and batches) than [Rom03], one that makes no assumptions about the Kolmogorov complexity of their elements. We define a bunch to be a finite set of strings, where , , and for all , . We define a batch to be a finite set of strings, where , , and for all , . We prove the following two theorems.

Theorem. For batch , .

Theorem. For bunch , .

The mutual information that a string (or any elementary object) has with the halting sequence, , is . Both theorems show that non-exotic sets, i.e. sets with low mutual information with , contain a string that is simple conditional to all the other strings. By information non-growth laws, no (randomized) algorithm can create exotic sets. Therefore, there is no way to produce sets that lack an element which is simple relative to all other elements of the set.

An example of an exotic bunch is , the set of all random strings of size , where iff and . It is not hard to see that for all , . So is a bunch. In addition, because contains all random strings of size , . Thus has no such conditionally simple element, which implies that it is exotic, because, by the bunch theorem above, . This bound is easy to verify directly from the definition of , since and ; the latter holds because, given the halting sequence and , a simple program can produce all random strings of size .

Another example of a bunch is the set , where is a string of arbitrary length, and . This bunch is usually not exotic. It must be that for , since all strings in differ by a substring of size . Furthermore, . Therefore is a bunch. Since and can be recovered from an encoding of the set , and of course can be created from and , we have that . So by the above bunch theorem, . Most has negligible information with the halting sequence, relative to its length. Furthermore, it can be seen independently that , because for , there is a program that, given any member of and a program for , can output .
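Kolmogorov complexity is uncomputable, so nothing below computes the quantities in the theorems. As a rough illustration of the two notions only (worst case over the other members for bunches, average for batches), the sketch swaps in a zlib compression proxy for K(x|y). This is a common heuristic, not the paper's method, and it is only a loose upper-bound-style stand-in; the function names `cond_len`, `bunch_kernel` and `batch_kernel` are hypothetical. The demo set mirrors the second example: one long shared string followed by short varying suffixes.

```python
import os
import zlib


def cond_len(x: bytes, y: bytes) -> int:
    """Hypothetical proxy for K(x|y): extra zlib bytes needed for x once y is known."""
    return max(0, len(zlib.compress(y + x, 9)) - len(zlib.compress(y, 9)))


def worst_case_cost(x: bytes, X: list[bytes]) -> int:
    """Max over the other members y of the proxy cost of x given y (the bunch notion)."""
    return max(cond_len(x, y) for y in X if y != x)


def average_cost(x: bytes, X: list[bytes]) -> float:
    """Mean over the other members y of the proxy cost of x given y (the batch notion)."""
    others = [y for y in X if y != x]
    return sum(cond_len(x, y) for y in others) / len(others)


def bunch_kernel(X: list[bytes]) -> bytes:
    """Member minimising the worst-case proxy cost."""
    return min(X, key=lambda x: worst_case_cost(x, X))


def batch_kernel(X: list[bytes]) -> bytes:
    """Member minimising the average proxy cost."""
    return min(X, key=lambda x: average_cost(x, X))


if __name__ == "__main__":
    prefix = os.urandom(2000)                     # plays the role of the long shared string
    X = [prefix + bytes([s]) for s in range(16)]  # 16 members differing only in a short suffix
    k = bunch_kernel(X)
    print(worst_case_cost(k, X), average_cost(batch_kernel(X), X))
```

Because every member shares the long random prefix, the proxy cost of any member given any other stays small, while an unrelated random string of the same length costs roughly its full length.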

## 2 Related Work

The study of Kolmogorov complexity originated from the work of [Kol65]. The canonical self-delimiting form of Kolmogorov complexity was introduced in [ZL70] and treated later in [Cha75]. The universal probability was introduced in [Sol64]. More information about the history of the concepts used in this paper can be found in the textbook [LV08].

Information conservation laws were introduced and studied in [Lev74, Lev84]. Information asymmetry and the complexity of complexity were studied in [Gác75]. A history of the origin of the mutual information of a string with the halting sequence can be found in [VV04b].

The notion of the deficiency of randomness with respect to a measure originates in the work of [She83] and was also studied in [KU87, V’Y87, She99]. At a Tallinn conference in 1973, Kolmogorov formulated the notion of a two-part code and introduced the structure function (see [VV04b] for more details). Related aspects involving stochastic objects were studied in [She83, She99, V’Y87, V’Y99].

The combination of complexity with distortion balls can be seen in [FLV06]. The work of Kolmogorov and the modelling of individual strings using a two-part code was expanded upon in [VV04b, GTV01]. These works introduced the notion of using the prefix of a “border” sequence to define a universal algorithmic sufficient statistic of strings. The generalization and synthesis of this work and the development of algorithmic rate distortion theory can be seen in the works of [VV04a, VV10]. More information on algorithmic statistics can be found in [VS17, VS15].

This paper uses theorems and lemmas found in [Eps13] and [EL11]. An accessible game-theoretic proof of [EL11] can be found in [She12]. Bunches were first introduced by [Rom03], who used them to prove properties of the common information of strings.

## 3 Conventions

We use , , , , , and to represent natural numbers, rational numbers, reals, bits, finite strings, and infinite strings. Let and be the sets of non-negative and of positive elements of . The length of a string is denoted by . The removal of the last bit of a string is denoted by , for . For the empty string , is undefined. We use to denote , the set of finite and infinite strings. For , , we say iff or and for some . The th bit of a string is denoted by . The first bits of a string are denoted by . The indicator function of a mathematical statement is denoted by , where if is true then , otherwise . The size of a finite set is denoted by , and we also write = . For a finite set and function , . As is typical in algorithmic information theory, the theorems in this paper are relative to a fixed universal machine, and therefore their statements hold only up to additive and logarithmic precision.

For positive real functions the terms , , represent , , and , respectively. In addition , , and denote , and , respectively. For nonnegative real function , the terms , , represent the terms , , and , respectively. A discrete measure is a nonnegative function over natural numbers. The support of a measure is the set of all elements that have positive measure, with . The measure is elementary if its support is finite and its range is a subset of . Elementary measures have an explicit finite encoding, in the natural way. The mean of a function by a measure is denoted by . We say is a semimeasure iff . Furthermore, we say that is a probability measure iff . For a set , . For semimeasure , we say that is a test, if .

is the output of algorithm (or if it does not halt) on input and auxiliary input . is prefix-free if for all with , and , either or . The complexity of with respect to is .
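For a fixed auxiliary input, the prefix-free condition is a property of the set of halting inputs, so on a finite list it can be checked directly; such a set also satisfies the Kraft inequality, which is what lets program lengths define a (semi)measure. The sketch below is illustrative and its function names are hypothetical.

```python
from fractions import Fraction


def is_prefix_free(strings: list[str]) -> bool:
    """True iff no string in the collection is a proper prefix of another one."""
    s = sorted(set(strings))
    # In lexicographic order, all extensions of a string form a contiguous run
    # directly after it, so comparing neighbours is enough.
    return all(not b.startswith(a) for a, b in zip(s, s[1:]))


def kraft_sum(strings: list[str]) -> Fraction:
    """Sum of 2^{-len(p)}; at most 1 for every prefix-free set of binary strings."""
    return sum((Fraction(1, 2 ** len(p)) for p in set(strings)), Fraction(0))
```

For instance, `{"0", "10", "11"}` is prefix-free with Kraft sum exactly 1, whereas `{"0", "01"}` is not prefix-free.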

There exists an optimal for prefix-free algorithms, meaning that for all prefix-free algorithms , there exists , where for all and . For example, one can take a universal prefix-free algorithm , where for each prefix-free algorithm , there exists , with for all and . The complexity is defined to be , the Kolmogorov complexity of relative to . When we say that a universal Turing machine is relativized to an object, we mean that an encoding of the object is provided to the universal Turing machine on an auxiliary tape.

The complexity of a (partial) computable function is , where is the set of indices of functions equal to in an enumeration of partial computable functions of the form . A function is lower semicomputable if the set is enumerable. The complexity of a lower semicomputable function is , where is the set of indices of functions that enumerate in an enumeration of all enumerations that output a subset of .

The chain rule for Kolmogorov complexity is . The universal probability of a set is . For strings , we have . The coding theorem states .

The halting sequence is the infinite string where for all . We recall that the amount of mutual information that has with is denoted by .

## 4 Left-Total Machines

The notion of total strings and the “left-total” universal algorithm are needed in the remaining sections of the paper. We say that is total with respect to a machine if the machine halts on all sufficiently long extensions of . More formally, is total with respect to for some iff there exists a finite prefix-free set of strings where and for all . We say that a (finite or infinite) string is to the “left” of , and use the notation , if there exists a such that and . A machine is left-total if for all auxiliary strings and for all with , one has that implies that is total with respect to . An example can be seen in Figure 1.

Figure 1: The diagram represents the domain of a left-total machine $T$, with the 0 bits branching to the left and the 1 bits branching to the right. For $i\in\{1,\ldots,5\}$, $x_i\lhd x_{i+1}$ and $x_i\lhd y$. Assuming $T(y)$ halts, each $x_i$ is total. This also implies that each $x_i^-$ is total.
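The two relations above can be made concrete when a machine's domain is a finite, explicitly listed, prefix-free set. The sketch assumes exactly that (a real machine's domain is only enumerable), and the names `is_left_of` and `is_total` are hypothetical. For such a domain, the definition of totality reduces to checking that the halting extensions of $x$ carry full Kraft weight, since their suffixes then form the required prefix-free set with sum 1.

```python
from fractions import Fraction


def is_left_of(x: str, y: str) -> bool:
    """x ⊲ y: some z has z+'0' a prefix of x and z+'1' a prefix of y."""
    for a, b in zip(x, y):
        if a != b:
            # z is the common prefix; x must take the 0 branch and y the 1 branch.
            return a == "0"
    return False  # one string is a prefix of the other, so no such z exists


def is_total(x: str, domain: set[str]) -> bool:
    """Totality of x for a machine with a finite, prefix-free domain (an assumption).

    x is total iff the suffixes z with x+z in the domain have Kraft sum exactly 1.
    """
    weight = sum(
        (Fraction(1, 2 ** (len(p) - len(x))) for p in domain if p.startswith(x)),
        Fraction(0),
    )
    return weight == 1
```

With the toy domain `{"00", "01", "10"}`, the string `"0"` is total (both of its one-bit extensions halt), while `"1"` and the empty string are not.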

For the remainder of this paper, we convert the universal self-delimiting machine into a universal left-total machine as follows. The algorithm enumerates all strings in order of their convergence time of and successively assigns them consecutive intervals of width . Then outputs on input if the open interval corresponding to , and not that of , is strictly contained in . The open interval in [0,1] corresponding to is where is the value of in binary. For example, the value of both strings 011 and 0011 is 3, and the value of 0100 is 4. The same definition applies to the machines and , over all . We now set to equal .

Figure 2: The diagram represents the domain of the universal left-total algorithm $U$, with the 0 bits branching to the left and the 1 bits branching to the right. The strings $0v0$ and $0v1$ are halting inputs to $U$ with $U(0v0)\neq\bot$ and $U(0v1)\neq\bot$, so $0v$ is a total string. The infinite border sequence $B\in\Sigma^\infty$ is the unique infinite sequence such that all its finite prefixes have both total and non-total extensions. All finite strings branching to the right of $B$ cause $U$ to diverge.
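The interval bookkeeping behind the left-total conversion can be simulated on a finite list of halting programs. This is a minimal sketch under stated assumptions: the (program, output) pairs are given in order of convergence time, each assigned interval is treated as half-open, and “strictly contained” is read as ordinary set containment of the open interval of $q$. All function names are hypothetical.

```python
from fractions import Fraction
from typing import Optional


def interval(q: str) -> tuple[Fraction, Fraction]:
    """Endpoints of the open interval (v 2^{-n}, (v+1) 2^{-n}), v = binary value of q, n = len(q)."""
    n = len(q)
    v = int(q, 2) if q else 0
    return Fraction(v, 2 ** n), Fraction(v + 1, 2 ** n)


def assign_intervals(halting: list[tuple[str, str]]) -> list[tuple[Fraction, Fraction, str]]:
    """Give each (program, output) pair, in convergence order, the next
    consecutive interval of width 2^{-len(program)}."""
    start, table = Fraction(0), []
    for program, output in halting:
        end = start + Fraction(1, 2 ** len(program))
        table.append((start, end, output))
        start = end
    return table


def left_total_output(q: str, table) -> Optional[str]:
    """Output of the converted machine on q, or None if it diverges: q's interval
    must lie inside an assigned interval while the interval of q[:-1] does not."""
    lo, hi = interval(q)
    for start, end, output in table:
        if start <= lo and hi <= end:
            if q:
                plo, phi = interval(q[:-1])
                if start <= plo and phi <= end:
                    return None  # the parent already lies here, so q is not minimal
            return output
    return None
```

For example, if program `"1"` converges before `"00"`, they receive $[0,1/2)$ and $[1/2,3/4)$, so the converted machine outputs the first result on `"0"`, the second on `"10"`, and diverges on `"11"` and `"00"`. The values 3, 3 and 4 of `011`, `0011` and `0100` from the text give the intervals $(3/8,1/2)$, $(3/16,1/4)$ and $(1/4,5/16)$.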

Without loss of generality, the complexity terms of this paper are defined with respect to the universal left-total machine . The infinite border sequence represents the unique infinite sequence such that all its finite prefixes have both total and non-total extensions. The term “border” is used because for any string , implies that is total with respect to and implies that will never halt when given as an initial input. Figure 2 shows the domain of with respect to .

For a total string , let be the slowest running time of a program that extends or is to the left of . With respect to the universal Turing machine defined above, would be the running time of the rightmost extension of that halts. For a total string and , let be the algorithmic weight of from programs conditioned on in time . More formally, . The term is 0 if is not total. Let , and if is 0.

## 5 Stochasticity

In algorithmic statistics, a string is stochastic if it is typical of a simple probability measure. Properties of stochastic (and non-stochastic) strings can be found in the survey [VS17]. The deficiency of randomness of with respect to elementary probability measure and is . The function is a -test (up to an additive constant). It is also universal, in that for any lower semicomputable -test , and , for all , , as shown in [Gác13].
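Why a deficiency of this kind is a test can be checked numerically once the uncomputable complexity term is replaced by a computable prefix code. The sketch assumes the deficiency has the usual shape $-\log P(x)$ minus a complexity term, and substitutes for that term the Elias gamma code length of $x$'s rank in the sorted support of $P$ (our assumption, not the paper's definition). The test property then follows from the Kraft inequality: $\sum_x P(x)2^{-\log P(x)-L(x)}=\sum_x 2^{-L(x)}\le 1$. The names are hypothetical.

```python
import math
from fractions import Fraction


def gamma_length(k: int) -> int:
    """Length in bits of the Elias gamma codeword for the positive integer k."""
    return 2 * (k.bit_length() - 1) + 1


def surrogate_deficiency(P: dict[str, Fraction]) -> dict[str, float]:
    """d(x) = -log2 P(x) - L(x), where L(x) is the gamma-code length of x's rank
    in the sorted support of P; L stands in for the complexity term (assumption)."""
    support = sorted(x for x, p in P.items() if p > 0)
    return {x: -math.log2(P[x]) - gamma_length(i + 1) for i, x in enumerate(support)}


def expected_exponential(P: dict[str, Fraction], d: dict[str, float]) -> float:
    """E_{x~P}[2^{d(x)}]; a P-test must keep this at most 1."""
    return sum(float(P[x]) * 2.0 ** d[x] for x in d)
```

For the uniform measure on the eight 3-bit strings the expectation is 0.8828125, the Kraft sum of the first eight gamma codewords, so the bound holds with room to spare.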

For some , we say that is -stochastic if there exists , with , being an elementary probability measure, and . The stochasticity of is measured by . The conditional form of stochasticity is represented by , for . (Formally, this is represented as .)

Stochasticity follows non-growth laws: a total computable function cannot increase the stochasticity of a string by more than an additive term that depends on the complexity of the function. Lemma 1 illustrates this point. Another variant of the same idea can be found in Proposition 5 in [VS17].

###### Lemma 1

Given total computable function , .

### Proof.

Let realize , with , . Let be the image distribution of with respect to . Thus . The function is a -test (relative to and up to an additive constant), because

$$\sum_{a} 2^{d(f(a)\mid f(Q),v)}\,Q(a)=\sum_{b} 2^{d(b\mid f(Q),v)}\,f(Q)(b)$$

Also is lower semi-computable given , with . So due to the universality of , Let compute , where is helper code of size and is a shortest program that computes , with . So . Since , we have that . So

$$\begin{aligned}
\Lambda(f(x)) &\le \|v'\| + 3\max\{d(f(x)\mid f(Q),v'),1\}\\
&< \|v\| + 3\max\{d(x\mid Q,v),1\} + O(K(f))\\
&\le \Lambda(x) + O(K(f)).
\end{aligned}$$
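The change of variables in the proof, which rewrites the expectation over $Q$ as an expectation over the image distribution $f(Q)$, can be checked exactly on a toy example. The sketch assumes $f(Q)(b)$ is the total $Q$-mass of the preimage of $b$; the helper names and the toy measure are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Callable, Hashable


def image_measure(Q: dict, f: Callable) -> dict:
    """Pushforward f(Q): f(Q)(b) is the total Q-mass of the preimage of b."""
    out = defaultdict(Fraction)
    for a, q in Q.items():
        out[f(a)] += q
    return dict(out)


def exp_mean(Q: dict, t: Callable[[Hashable], int]) -> Fraction:
    """E_{a~Q}[2^{t(a)}], computed exactly for integer-valued t."""
    return sum((q * Fraction(2) ** t(a) for a, q in Q.items()), Fraction(0))
```

With $Q$ uniform on the four 2-bit strings, $f$ the first bit, and an image-side test taking values 0 and -1, both sides of the identity evaluate to $3/4$. A function that is a test for $f(Q)$ therefore composes with $f$ into a test for $Q$.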

The following lemma is taken from [EL11]. It states that the stochasticity measure of a string lower bounds its information with the halting sequence. Another version of the lemma can be found in [Eps13]. Even though the stochasticity measure is larger in this paper than in [Eps13], changing from to , the arguments in the proof still hold.

###### Lemma 2

For , .

The following lemma is also from [Eps13]. It shows that if a prefix of the border sequence is simple relative to a string , then it will be the common information between and the halting sequence . Note that if a string is total and is not, then , due to the fact that has total and non-total extensions.

###### Lemma 3

If is total and is not, and ,
then .

The following theorem is from [EL11]. It states that sets that are not exotic, i.e. sets with low mutual information with the halting sequence, have simple members that contain a large portion of the algorithmic weight of the sets. It is compatible with this paper’s stochasticity definition because the term used here is larger than the stochasticity measure used in [EL11].

###### Theorem 1

For finite set , .

## 6 Batches

We recall that a batch is a finite set of strings, where , , and for all , . The following theorem says that for non-exotic batches, there is an element of that is simple, on average, conditional to all other members of .

For batch , .

### Proof.

We can assume that , otherwise the theorem is trivially proven. Let be the shortest total string where , dubbed property . Thus , as can be constructed from and . In addition there exists a program that can enumerate all total programs of length and select the first one with property . The first one selected will be , otherwise there exists a , , with property . This implies there exists a total , , , and so and and thus property holds for , contradicting the minimal length of . This also implies is not total.

Let be the support of , which is finite. Let be the infinite set of all functions . Since is finite, each can be encoded in an explicit finite string. Let be a probability measure where . So for all , it must be that and .

For any finite set , , let be the set of functions , where there exists with . Using the fact that for , we have that

$$\kappa(G\setminus G^1_H)\le\prod_{a\in H}\left(1-2^{-\#H+2}\right)\le\left(1-2^{-\#H+2}\right)^{2^{\#H-1}}\le e^{-2^{-\#H+2}2^{\#H-1}}=e^{-2}<0.25.$$
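The last two steps of the chain above use $1-x\le e^{-x}$ with $x=2^{-\#H+2}$, and the exponent $2^{-\#H+2}\cdot 2^{\#H-1}$ equals exactly 2. The sketch below checks those two steps numerically. It assumes $\#H\ge 3$, so that $x\le 1/2$ is a genuine probability; the name `chain_terms` is hypothetical, and the sketch does not check the first step, which depends on how $\#H$ relates to $|H|$.

```python
import math


def chain_terms(k: int) -> tuple[float, float]:
    """For #H = k >= 3, return (1 - 2^{-k+2})^{2^{k-1}} and exp(-2^{-k+2} * 2^{k-1})."""
    x = 2.0 ** (-k + 2)                 # the factor 2^{-#H+2} from the product
    power = (1.0 - x) ** (2 ** (k - 1))
    bound = math.exp(-x * 2 ** (k - 1))  # 1 - x <= e^{-x}; the exponent is exactly -2
    return power, bound


if __name__ == "__main__":
    for k in range(3, 21):
        power, bound = chain_terms(k)
        print(k, power, bound)
```

As $\#H$ grows, the power approaches $e^{-2}\approx 0.135$ from below, comfortably under 0.25.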

So . We use measures , indexed by and . The measure is defined as , where . Noting the definition of measures, for a set , we have that . We define a second set of functions . So

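$$\begin{aligned}
\mathbf{E}_{g\sim\kappa}\mathbf{E}_{x'\in H}[P'_g(S\mid x')] &= |H|^{-1}\sum_{x'\in H,\,y\in S}\mathbf{E}_{g\sim\kappa}[P'_g(y\mid x')]\\
&= |H|^{-1}\sum_{x'\in H}\sum_{y\in S}\Bigg(\sum_{c=1}^{K_b(y\mid x')-2}2^{c-K_b(y\mid x')}\big(K_b(y\mid x')-c\big)^{-2}\kappa(\{g:g\in G,\,g(y)=c\})\\
&\qquad+\kappa(\{g:g\in G,\,g(y)\ge K_b(y\mid x')-1\})\Bigg)\\
&\le |H|^{-1}\sum_{x'\in H}\sum_{y\in S}\Bigg(\mathbf{m}_b(y\mid x')\sum_{c=1}^{K_b(y\mid x')-2}\big(K_b(y\mid x')-c\big)^{-2}+2^{-K_b(y\mid x')+2}\Bigg)\\
&\le |H|^{-1}\sum_{x'\in H}5\,\mathbf{m}_b(S\mid x') < 5.
\end{aligned}$$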
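The constant 5 in the bound above can be checked exactly for a single pair $(x',y)$. The sketch assumes, as a hypothesis consistent with the tail term $2^{-K_b(y|x')+2}$, that $\kappa(\{g : g(y)=c\})=2^{-c}$ for $c\ge 1$. Under that assumption the bracketed term equals $2^{-K}\big(\sum_{j=2}^{K-1}j^{-2}+4\big)$ with $K=K_b(y|x')$, which stays below $5\cdot 2^{-K}$ because $\sum_{j\ge 2}j^{-2}=\pi^2/6-1<1$. This gives the stated $5\,\mathbf{m}_b$ bound whenever $2^{-K_b(y|x')}\le\mathbf{m}_b(y|x')$. The name `per_pair_term` is hypothetical.

```python
from fractions import Fraction


def per_pair_term(K: int) -> Fraction:
    """Bracketed term of the bound for one pair (x', y), with K = K_b(y|x').

    Assumes kappa({g : g(y) = c}) = 2^{-c} for c >= 1 (a hypothesis), so the
    tail kappa({g : g(y) >= K - 1}) equals 2^{-K+2}.
    """
    head = sum(
        (Fraction(2) ** (c - K) * Fraction(1, (K - c) ** 2) * Fraction(1, 2 ** c)
         for c in range(1, K - 1)),  # c = 1, ..., K - 2
        Fraction(0),
    )
    tail = Fraction(2) ** (2 - K)
    return head + tail
```

Exact rational arithmetic avoids any doubt about rounding when comparing against $5\cdot 2^{-K}$.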

So by the Markov inequality, . So for all finite , , . We use the following probability measure , indexed by and , defined as . Thus for all , . So for any , there exists where and also

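$$\begin{aligned}
\mathbf{E}_{x'\in H}[-\log P_g(x_g\mid x')] &= \mathbf{E}_{x'\in H}[-\log P'_g(x_g\mid x') + \log P'_g(S\mid x')]\\
&= \mathbf{E}_{x'\in H}[-\log P'_g(x_g\mid x')] + \mathbf{E}_{x'\in H}[\log P'_g(S\mid x')]\\
&\le \mathbf{E}_{x'\in H}[-\log P'_g(x_g\mid x')] + \log\mathbf{E}_{x'\in H}[P'_g(S\mid x')]\\
&\stackrel{+}{<} \mathbf{E}_{x'\in H}[-\log P'_g(x_g\mid x')]\\
&\stackrel{+}{=} \mathbf{E}_{x'\in H}\Big[[\delta_g(x_g,x')\ge 2]\big(-\log 2^{-\delta_g(x_g,x')}\delta_g(x_g,x')^{-2}\big) + [\delta_g(x_g,x')<2]\Big]\\
&\stackrel{+}{<} \mathbf{E}_{x'\in H}\big[\max\{\delta_g(x_g,x') + 2\log\delta_g(x_g,x'),\,O(1)\}\big]\\
&\stackrel{+}{<} \max\{\mathbf{E}_{x'\in H}[\delta_g(x_g,x')] + 2\log\mathbf{E}_{x'\in H}[\delta_g(x_g,x')],\,O(1)\}
\end{aligned}\tag{1}$$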

Let be a computable enumeration of all finite subsets of . Let be a function that when given a set , , outputs an encoding of the first finite subset in the list such that and . On all other inputs which are not an encoded finite set with , outputs the empty string. The function is total computable relative to , with , because given and , it is computable to determine whether a given function is in .
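One standard computable enumeration of all finite subsets of a countable set takes the $k$-th set to be the positions of the 1 bits of $k$. The sketch assumes the elements of $G$ have already been indexed by natural numbers through their finite encodings. The search mirrors the shape of the function in the proof, returning the first listed set that satisfies a decidable predicate. The `limit` cutoff exists only to keep the sketch finite and is not part of the proof's construction; all names are hypothetical.

```python
from typing import Callable, Optional


def finite_subset(k: int) -> frozenset[int]:
    """k-th finite subset of the naturals: the positions of the 1 bits of k."""
    return frozenset(i for i in range(k.bit_length()) if (k >> i) & 1)


def first_subset(pred: Callable[[frozenset[int]], bool],
                 limit: int = 1 << 20) -> Optional[frozenset[int]]:
    """First set in the enumeration satisfying pred, searched up to a cutoff."""
    for k in range(limit):
        D = finite_subset(k)
        if pred(D):
            return D
    return None
```

For instance, the first two-element set whose indices sum to at least 5 is $\{2,3\}$, reached at $k=12$.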

Let . Invoking Theorem 1, conditional to , gives , where . Since , we have that . Lemma 1, relativized to , using total computable function , gives . Lemma 2 gives

 K(g|b)

Since , there exists where, due to Equation 1,

 Ex′∈X[−logPg(xg|x′)]

So we have that

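$$\begin{aligned}
\mathbf{E}_{x'\in X}[K(x_g\mid b,x')] &\stackrel{+}{<} \mathbf{E}_{x'\in X}[K(x_g\mid b,g,x') + K(g\mid b)]\\
&\stackrel{+}{=} \mathbf{E}_{x'\in X}[K(x_g\mid b,g,x')] + K(g\mid b)
\end{aligned}$$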

Equation 4 is due to Equation 2. Equation 5 is due to Equation 3. Equation 6 is due to the invocation of Lemma 3. Equation 7 is due to the fact that .

## 7 Bunches

We recall that a bunch is a finite set of strings, where , , and for all , . The following theorem says that for non-exotic bunches, there is an element of that is simple conditional to all other members of .

For bunch ,