The normalized algorithmic information distance can not be approximated

02/16/2020 ∙ by Bruno Bauwens, et al. ∙ 0

It is known that the normalized algorithmic information distance N is not computable and not semicomputable. We show that for all ϵ < 1/2, there exist no semicomputable functions that differ from N by at most ϵ. Moreover, for any computable function f such that |lim_t f(x,y,t) - N(x,y)| <ϵ and for all n, there exist strings x,y of length n such that ∑_t |f(x,y,t+1) - f(x,y,t)| >Ω(log n). This is optimal up to constant factors. We also show that the maximal number of oscillations of a limit approximation of N is Ω(n/log n). This strengthens the ω(1) lower bound from [K. Ambos-Spies, W. Merkle, and S.A. Terwijn, 2019, Normalized information distance and the oscillation hierarchy], see arXiv:1708.03583 .


1 Introduction

The information distance defines a metric on bit strings that in some sense takes all “algorithmic regularities” into account. This distance was defined in [4] as $E(x,y) = \max\{K(x|y), K(y|x)\}$, where $K(\cdot|\cdot)$ denotes conditional prefix Kolmogorov complexity relative to some fixed optimal prefix-free Turing machine; we refer to appendix A for the definition and basic properties, and to the books [6, 8] for more background. After minor modifications, this distance satisfies the axioms of a metric, as explained in subsection A.3. We refer to [2] for an overview of many equivalent characterizations.

The distance $E$ is not computable. However, conditional Kolmogorov complexity is upper semicomputable, which means that there exists a computable function $(x,y,t) \mapsto K_t(x|y)$ for which $\lim_t K_t(x|y) = K(x|y)$, and that is non-increasing in its last argument $t$. Hence, $E$ is also upper semicomputable.

The distance is useful to compare strings of similar complexity. However, for strings of different complexity, a normalized variant is often preferable.

Definition 1.1.

The normalized algorithmic information distance of strings $x$ and $y$ is

$$N(x,y) = \frac{\max\{K(x|y), K(y|x)\}}{\max\{K(x), K(y)\}}.$$

(Footnote 2: The numerator is nonzero, even if $x = y$. $K(x|y) > 0$ holds for every choice of the optimal prefix-free Turing machine, because such machines never halt on input the empty string. Indeed, if it halted, then it would be the only halting program by the prefix property, and hence, the machine can not be optimal.)

This normalized distance has inspired many applications in machine learning, where complexities are heuristically estimated using popular practical compression algorithms such as gzip, bzip2 and PPMZ, see [6, section 8.4]. Within small additive terms, the function $N$ has values in the real interval $[0,1]$ and satisfies the axioms of a metric:

  • ,

  • ,

  • ,

  • .

See [6, Theorem 8.4.1]. (Footnote 3: In [6, Exercise 8.4.3] it is claimed that for the prefix variant of the normalized information distance, one can improve the precision of the last item to . However, we do not know a proof of this. If this were true, then with minor modifications of $N$ similar to those in appendix A.3, all axioms of a metric can be satisfied precisely.)
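The compression-based heuristic mentioned above can be sketched as follows. This is a minimal illustration, not the distance $N$ itself: it assumes the commonly used normalized compression distance formula, in which the compressed length of the concatenation stands in for joint complexity, and it uses Python's `zlib` and `bz2` modules as stand-ins for the compressors named in the text. The function names `compressed_len` and `ncd` are hypothetical.

```python
import bz2
import zlib


def compressed_len(data: bytes, compressor=zlib.compress) -> int:
    """Length in bytes of the compressed data; a crude stand-in for complexity."""
    return len(compressor(data))


def ncd(x: bytes, y: bytes, compressor=zlib.compress) -> float:
    """Normalized compression distance: a heuristic proxy, not N itself.

    ASSUMPTION: the usual formula (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C is the compressed length.
    """
    cx = compressed_len(x, compressor)
    cy = compressed_len(y, compressor)
    cxy = compressed_len(x + y, compressor)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    a = b"the quick brown fox jumps over the lazy dog " * 20
    b = b"the quick brown fox jumps over the lazy cat " * 20
    c = bytes(range(256)) * 4
    print("zlib:", ncd(a, b), ncd(a, c))
    print("bz2: ", ncd(a, b, bz2.compress), ncd(a, c, bz2.compress))
```

Such estimates only bound complexity from above, so they say nothing about how close the computed value is to $N$; the results of this paper concern exactly that gap.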

In this paper, we study the computability of $N$. Note that if Kolmogorov complexity were computable, then also $N$ would be computable. But this is not the case, and in [9] it is proven that $N$ is not upper semicomputable and not lower semicomputable (i.e., $-N$ is not upper semicomputable). Below in Lemmas 1.2 and 1.4 we present simple proofs. In fact, in [9] it is proven that (i) there exists no lower semicomputable function that differs from $N$ by at most some constant , and (ii) there exists no upper semicomputable function that differs at most from $N$ on $n$-bit strings. Theorem 1.1 below implies that (ii) is also true for all .

By definition, $N$ is the ratio of two upper semicomputable functions, and hence it is limit computable, which means that there exists a computable function $f$ such that $\lim_t f(x,y,t) = N(x,y)$. A function $f$ that satisfies this property is called a limit approximation of $N$.

We define a trivial limit approximation of where is obtained by replacing all appearances of and in Definition 1.1 by upper approximations and , where is a computable function satisfying and ; and similar for . We assume that and are bounded by for all of length .
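To see why a ratio of two upper approximations need not be monotone, the following toy sketch may help. The two integer sequences are made-up numbers, not outputs of any real approximation of Kolmogorov complexity, and the helper names `ratio_sequence` and `total_update` are hypothetical. The total update is taken to be the sum of absolute differences between consecutive stages, as in the abstract.

```python
from fractions import Fraction


def ratio_sequence(numerators, denominators):
    """Pointwise ratio of two stage-by-stage upper approximations."""
    return [Fraction(a, b) for a, b in zip(numerators, denominators)]


def total_update(values):
    """Sum of |v[t+1] - v[t]| over consecutive stages."""
    return sum(abs(b - a) for a, b in zip(values, values[1:]))


# ASSUMPTION: hypothetical non-increasing upper approximations (made-up numbers).
num = [8, 8, 4, 4, 2]    # stands in for the approximated numerator
den = [10, 8, 8, 5, 5]   # stands in for the approximated denominator

f = ratio_sequence(num, den)
print([str(v) for v in f])   # the ratio goes up and down although num and den only go down
print(total_update(f))
```

Both input sequences only decrease, yet the ratio rises whenever the denominator drops and falls whenever the numerator drops.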

Lemma 1.1.

For all and strings of length at most :

Definition.

An -approximation of a function is a limit approximation of a function with .

For a suitable choice of , we have , and the function defined by is a -approximation.444 For general optimal , and for , we can obtain an -approximation that is constant in by choosing for some finite set of pairs , and by choosing otherwise. We show that for and every -approximation, the sum in the above lemma is at least logarithmic.

Theorem 1.1.

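Let $f$ be an $\epsilon$-approximation of $N$ with $\epsilon < 1/2$. For large $n$, there exist strings $x, y$ of length $n$ such that

$$\sum_t |f(x,y,t+1) - f(x,y,t)| > \Omega(\log n).$$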

This result implies that for each $\epsilon < 1/2$, there exists no upper semicomputable function that differs from $N$ by at most $\epsilon$.


We now state the main result of [1].

Definition.

Let . A sequence of real numbers has at most oscillations if the sequence can be written as a concatenation of sequences ( finite and 1 infinite) such that each sequence is either monotonically non-increasing or non-decreasing. The sequence has oscillations if

The main result of [1] states that no -approximation of has at most a constant number of oscillations. More precisely, for each , there exists a pair such that does not have at most oscillations.
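The notion of oscillations can be made concrete for finite sequences. The sketch below computes the smallest number of consecutive monotone pieces into which a finite list can be cut, using a greedy scan. It assumes the convention that oscillations are counted by the number of monotone pieces; the exact count in the definition above was lost in extraction, so that convention is an assumption, and the function name `count_monotone_pieces` is hypothetical.

```python
def count_monotone_pieces(values):
    """Smallest number of consecutive monotone pieces that cover `values`.

    Each piece is non-increasing or non-decreasing. A greedy scan is enough:
    extend the current piece as long as it stays monotone.
    """
    if not values:
        return 0
    pieces = 1
    direction = 0  # 0: undetermined, +1: non-decreasing, -1: non-increasing
    for prev, cur in zip(values, values[1:]):
        step = (cur > prev) - (cur < prev)
        if step == 0:
            continue  # equal neighbours fit any direction
        if direction == 0:
            direction = step
        elif step != direction:
            pieces += 1
            direction = 0  # the new piece starts at `cur`
    return pieces


print(count_monotone_pieces([0.8, 1.0, 0.5, 0.8, 0.4]))
```

For an infinite sequence $t \mapsto f(x,y,t)$ no finite prefix can certify an upper bound on this count, which is why lower bounds are obtained through the game argument rather than by inspection.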

Let . We say that has at least oscillations, if for all there exists a pair of strings of length at most , such that does not have at most oscillations. (The proof of) Theorem 1.1 implies that if , then any -approximation has at least oscillations.

The trivial -approximation has at most  oscillations, because each upper-approximation of Kolmogorov complexity in its definition is bounded by on -bit strings, and hence, there can be at most this many updates. Can it be significantly less than , for example at most for large ?

The answer is positive. For all constants , there exist optimal machines in the definition of complexity for which the number of updates of is at most . For example, one may select an optimal whose halting programs all have length modulo . If is defined relative to such a machine, then the total number of updates is . Hence, for every constant there exists a version of and a -approximation that has at most oscillations for large input sizes . Our second main result provides an almost linear lower bound on the number of oscillations.

Theorem 1.2.

Every 0-approximation of $N$ has at least $\Omega(n/\log n)$ oscillations.

In an extended version of this article, we plan to improve the lower bound to an bound. This requires a more involved variant of our proof.

Theorems 1.1 and 1.2 both imply that $N$, and hence Kolmogorov complexity, is not computable. In fact, they imply something stronger: can not be bounded by a constant. (Footnote 5: Indeed, if this were bounded by , there would exist an upper approximation of such that for each pair , the function has only finitely many values. (We modify any upper approximation of complexity by only outputting values on input , for which . There are at most such .) Hence, there would exist an approximation of such that for all , the function has only finitely many values. Such functions would have only finitely many oscillations, contradicting Theorem 1.2, and a finite total update, contradicting Theorem 1.1.) It has been shown that , see [5, 3], and our proofs are related. Like the proof in [3], we also use the game technique. This means that we present a game, present a winning strategy, and show that this implies the result. The game technique often leads to tight results with more intuitive proofs. (Moreover, the technique makes it easy to involve students in research, because after the game is formulated, typically no specific background is needed to find a winning strategy.) For more examples of the game technique in computability theory and algorithmic information theory, we refer to [7].

$N$ is neither upper nor lower semicomputable

For the sake of completeness, we present short proofs of the results in [9], obtained from Theorem 3.4 and Proposition 3.6 from [1] (presented in a form that is easily accessible to people with little background in the field). (Footnote 6: NOTE TO THE REVIEWER: papers about the information distance are sometimes cited by people from more applied research areas. In an optimistic scenario, there might exist such readers that want to read some initial segment of the paper, and might not remember the proof of the uncomputability of $K$. Hence, I think it is good to keep the proof of Lemma 1.3.) A function $g$ is lower semicomputable if $-g$ is upper semicomputable.

Lemma 1.2.

$N$ is not lower semicomputable.

Proof.

Note that for large , there exist -bit and such that

Indeed, for any , there exists an -bit such that . The denominator of is at most , and the inequality follows for large .

Assume $N$ were lower semicomputable. On input , one could search for such a pair , and we denote the first such pair that appears by . We have and . Hence . For large this approaches , contradicting the equation above. ∎

Remark. With the same argument, it follows that for any , there exists no lower semicomputable function that differs from by at most . Indeed, instead of we could as well use , and search for for which the estimate is at least .

To prove that is not upper semicomputable, we use the following well-known lemma.

Lemma 1.3.

The complexity function $K$ has no unbounded lower semicomputable lower bound.

Proof.

This is proven by the same argument as for the uncomputability of $K$, see appendix A.1: suppose such a bound exists. Then on input , one can search for a string with and hence . But since there exists an algorithm to compute given , we have . This is a contradiction for large . Hence, no such bound exists. ∎

Lemma 1.4.

$N$ is not upper semicomputable.

Proof.

By optimality of the prefix-free machine in the definition of , we have that for all  and . Thus , and hence,

If $N$ were upper semicomputable, we would obtain an unbounded lower semicomputable lower bound of $K$, which contradicts Lemma 1.3. ∎

2 Trivial approximations have at most logarithmic total update

Lemma 1.1 follows from the following lemma for and the upper bound on the upper approximations of Kolmogorov complexity.

Lemma 2.1.

Assume , and . Then,

Proof.

We first assume . We prove a continuous variant. Let be non-decreasing real functions with and . The sum in the lemma can be seen as a special case of

The left integral in the sum is maximized by setting equal to its minimal possible value, which is . The right one is maximized for the maximal value of , which is . Thus,

For , the minimal value of is and the maximal value of is . The result follows after a calculation. ∎

3 Oscillations of 0-approximations, the game

For technical reasons, we first consider the plain length conditional variant of the normalized information distance . For notational convenience, we restrict the definition to pairs of strings of equal length.

Definition.

For all and strings and of length , let

If , let .

Remarks.
- For , the denominator is at least , since at most one string can have complexity zero relative to .
- The choice of the value of if is arbitrary, and does not affect Proposition 3.1 below.
- In the numerator, the length is already included in the condition, since it equals the length of the strings.
- There exists a trivial approximation of with at most oscillations. Indeed, consider an approximation obtained by defining with brute force searches among programs of length at most .
- Again, for every constant , we can construct an optimal machine and a -approximation of  for which the number of oscillations is at most . We now present a matching lower bound.

Proposition 3.1.

Every -approximation of has at least oscillations.

In this section, we show that the proposition is equivalent to the existence of a winning strategy for a player in a combinatorial (full information) game. In the last section of the paper, we present such a winning strategy.

Description of game . The game has integer parameters: , and . It is played on two 2-dimensional grids and . Grid has size . Its rows are indexed by integers , and its columns are indexed by -bit strings. Let be the column indexed by the string . See figure 1 for an example with . Grid has size . The rows are indexed by integers , and its columns are indexed by unordered pairs , where and are -bit strings, (that may be equal).777 Formally, we associate sets with 2 elements to an unordered pair , and singleton sets to the pair . We sometimes denote unordered pairs of -bit strings as , and write . Note that . Let . The slice of  is the 2-dimensional grid of size containing all columns with . Additionally, Bob must generate a function mapping unordered pairs of -bit strings and natural numbers to real numbers.

Figure 1: Example of board with . Columns are indexed by bit strings (000, 001, 010, 011, ...). Alice has placed 2 tokens in row 2 (white), and Bob has placed 1 token in row 0 and 1 in row 2 (black). The row restrictions for both players are satisfied, since and .

Two players, Alice and Bob, alternate turns. The rounds are numbered as At each round, Alice plays first. At her turn, she places tokens on cells of the grids. She must place at least 1 token. Afterwards, Bob places zero or more tokens on the grids, and he declares all values for all unordered pairs , where is the number of the current round. This terminates round , and the players start with round .

For each player, for each , and for all grids , the following row restriction should be satisfied: The total number of tokens that the player has placed during the whole game in the -th row of , is at most . If a player does not satisfy this restriction, the game terminates and the other player wins. See figure 1. Bob’s moves should satisfy 2 additional requirements. If after his turn these requirements are not satisfied, the game terminates and Alice wins.

  • (c) Let be the value of column given by the minimal row-index of a cell in containing a token. If contains no tokens, then . Similar for the value of column . For all and :

  • (k) For all and : has at most oscillations.

Note that for decreasing and , it becomes easier for Alice to win.

Discussion. If Alice places a token in a row with small index, Bob has a dilemma: either he can change the function , or he can place tokens on the other board to restore the ratios in (c). In the first case, he might increase the number of oscillations in (k), while in the second case, he exhausts his limited capacity to place tokens on rows of small indices (by the row restriction, at most tokens can be placed below row in each grid ).

Remark. The game has at most rounds, because in each round, Alice must place at least 1 token, and by the row restriction, Alice can place at most tokens on all grids. Hence, the game above is finite and has full information. This implies that either Alice or Bob has a winning strategy.
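The bookkeeping of the board can be sketched in code. This is a hypothetical model, not the authors' formalization: the class name `Grid`, the player labels, and in particular the capacity of `2 ** row` tokens per player in row `row` are assumptions (the exact bound in the row restriction was lost in extraction). The value of an empty column is returned as `None` because the source's convention for that case is also not recoverable.

```python
class Grid:
    """Board whose cells are addressed by (row index, column label)."""

    def __init__(self, num_rows, columns):
        self.num_rows = num_rows
        self.columns = list(columns)
        self.tokens = {}  # (row, column) -> set of players with a token there
        self.placed = {}  # (player, row) -> number of tokens placed so far

    def place(self, player, row, column):
        """Place a token; return False if the row restriction would be violated."""
        if not 0 <= row < self.num_rows or column not in self.columns:
            raise ValueError("cell outside the grid")
        used = self.placed.get((player, row), 0)
        if used + 1 > 2 ** row:  # ASSUMPTION: capacity 2**row per player and row
            return False
        self.placed[(player, row)] = used + 1
        self.tokens.setdefault((row, column), set()).add(player)
        return True

    def value(self, column):
        """Minimal row index holding a token in this column, or None if empty."""
        rows = [r for (r, c) in self.tokens if c == column]
        return min(rows) if rows else None


# A position in the spirit of Figure 1 (the exact cells are made up).
board = Grid(4, ["000", "001", "010", "011"])
board.place("Alice", 2, "001")
board.place("Alice", 2, "010")
board.place("Bob", 0, "000")
board.place("Bob", 2, "011")
print([board.value(c) for c in board.columns])
```

Under the assumed capacities, a player can place only finitely many tokens in total, which is the reason the game terminates.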

Lemma 3.1.

Let be such that Alice has a winning strategy in the game . Then for every 0-approximation of there exists a constant such that for large , the 0-approximation has more than oscillations on -bit inputs.

Proof.

The idea of the proof is to use any limit approximation to construct a strategy for Bob. By assumption there exists some winning strategy for Alice, and we let it play against this strategy for Bob. Then we show that Bob satisfies the row restrictions and requirement (c). Since Alice’s strategy is winning, we conclude that requirement (k) must be violated. Our construction implies that has fewer oscillations than , thus also has more than oscillations.

It suffices to prove the lemma for the largest function for which Alice wins the game . This function is computable, since the game is finite, and for each value we can determine whether Alice has a winning strategy by brute force searching all strategies.
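The brute-force argument relies only on the game being finite with full information. The sketch below shows the same idea on a deliberately tiny toy game, not on the game of this paper: exhaustive backward induction decides which player wins. The function name `first_player_wins` and the toy rules are assumptions chosen for illustration.

```python
from functools import lru_cache


def first_player_wins(start, moves=(1, 2)):
    """Decide by exhaustive search whether the player to move wins a toy game.

    Toy rules (NOT the game of the paper): a counter starts at `start`,
    players alternately subtract one of `moves`, and a player who cannot
    move loses.
    """

    @lru_cache(maxsize=None)
    def wins(counter):
        # The mover wins iff some legal move leads to a position
        # that is losing for the opponent.
        return any(not wins(counter - m) for m in moves if m <= counter)

    return wins(start)


print([n for n in range(10) if not first_player_wins(n)])
```

Because every play ends after finitely many moves, the recursion terminates and labels each position as won or lost for the mover; the same reasoning shows that the set of parameters for which Alice wins is decidable, although the search space in the actual game is far too large to explore in practice.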
- Let represent an upper approximation of .
- Let and similar for .
- Let be a -approximation of . Without loss of generality, we assume .


For all and , we present a run of the game . The mapping from and  to a (transcript of) this run is computable. First, we fix a winning strategy of Alice in the game in a computable way. For example, we may brute force search all strategies and select the first winning strategy that appears. Let . Consider the game in which Alice plays this strategy, and Bob replies as follows.

Bob’s strategy. At round , Bob searches for a value with such that for all and :    and   , . If such an is found, he sets and for all and . For all he places a token in column at row . For all unordered pairs , he places a token in column  at row . End of Bob’s strategy.

We first show that if Bob does reply, he satisfies the row restriction. For this holds because there are at most programs of length , and hence, at most strings with for some . For , this holds because implies , and there are less than such .

Assuming that Bob plays in round , requirement (c) holds. Indeed, after Bob’s move and for , condition (i) implies:

Together with (ii) and , this implies requirement (c).

We show that for large , there always exists an such that (i) and (ii) are satisfied, and hence, Bob plays in each round. Since is a -approximation, requirement (ii) is true for large , and this does not depend on . We show that (i) is also satisfied. To prove the left inequality, we first construct a Turing machine . The idea is that the machine plays the game above, and each time Alice places a token in a cell of column  with row index , it selects an unassigned -bit string, and assigns to it the output . Thus on input a string and integers , it plays the game, waits until the -th token is placed in the row with index equal to the length of , and it outputs the column’s index, (which is an -bit string). The row restriction implies that enough programs are available for all tokens. Hence, , whenever Alice places a token in at height . By optimality of the Turing machine in , we have for all , and hence,

For large , this is less than . By a similar reasoning, we have , because each time Alice places a token in row of column , we assign 2 programs of length : one that outputs on input , and one that outputs on input . Thus, for large , also requirement (i) is satisfied, and Bob indeed plays at any given round, assuming he played in all previous rounds.

Recall that Alice plays a winning strategy, and that Bob satisfies the row restriction and requirement (c). Hence, requirement (k) must be violated, i.e., for some pair , the sequence has more than oscillations. Since is increasing in , this sequence is a subsequence of , and the latter must also have more than oscillations. This implies the lemma. ∎

To prove Theorem 1.2 we need a version of the previous lemma for the prefix distance.

Lemma 3.2.

Under the assumption of Lemma 3.1, every 0-approximation of has more than oscillations on -bit inputs for large .

Proof.

As a warm-up, we observe that

Indeed, we can convert a program on a plain machine that has access to , to a program on some prefix-free machine without access to , by prepending prefix-free codes of the integers and . Each such code requires bits, and hence the inequality follows.
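One standard way to obtain prefix-free codes of integers with logarithmic length is the Elias gamma code. The source does not say which code is used, so the choice below is an assumption made for illustration; the function names are hypothetical. A positive integer $k$ is written as its binary expansion preceded by as many zeros as the expansion has digits minus one, giving $2\lfloor \log_2 k \rfloor + 1$ bits.

```python
def elias_gamma_encode(k: int) -> str:
    """Prefix-free code for an integer k >= 1, as a string of '0'/'1'."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    binary = bin(k)[2:]
    return "0" * (len(binary) - 1) + binary


def elias_gamma_decode(bits: str):
    """Decode one codeword from the front of `bits`; return (k, remaining bits)."""
    zeros = 0
    while bits[zeros] == "0":
        zeros += 1
    end = 2 * zeros + 1
    return int(bits[zeros:end], 2), bits[end:]


# Prepending two codes to a payload lets a decoder recover both integers
# and the payload without any separator.
message = elias_gamma_encode(5) + elias_gamma_encode(12) + "1101"
first, rest = elias_gamma_decode(message)
second, payload = elias_gamma_decode(rest)
print(first, second, payload)
```

Since no codeword is a prefix of another, the decoder knows where each integer ends, which is what allows a plain program with extra inputs to be turned into a self-delimiting one at a logarithmic cost.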

We modify the proof above by replacing all appearances of by , of by , and similarly for the approximations . We also set and assume that is a -approximation of . In Bob’s strategy, no further changes are needed.

The row restriction for Bob is still satisfied, because the maximal number of halting programs of length  on a prefix-free machine is still at most . Requirement (c) follows in the same way from items (i) and (ii) in Bob’s strategy. It remains to prove that for large and , these conditions (i) and (ii) are satisfied. Item (ii) follows directly, since is a -approximation of .

For item (i), we need to construct a prefix-free machine . This is done in a similar way as above, by associating tokens in row to programs of length , but we also need to prepend 3 prefix-free codes: for the row index, for , and for . This implies

Recall that . Hence, this is at most for large . The lemma follows from the violation of requirement (k) in the same way as before. ∎

4 Total update of $\epsilon$-approximations, the game

We adapt the game for the proof of Theorem 1.1.

Description of game , where and are real numbers. The game is the same as the game of the previous section, except that requirements (c) and (k) are replaced by:

  • For all and with :

    ()
  • For all and with :

    (a)

Remarks.
- We call the sum in (a), the total update of . Similar for the total update of an -approximation.
- The threshold is chosen for convenience. Our proof also works with any computable threshold function that is at least super-logarithmic and at most for some .

Lemma 4.1.

Let . Suppose that for large , Alice has a winning strategy in the game . Fix , and an -approximation of either  or . Then, for large , there exist -bit inputs for which the total update of exceeds .

Proof.

We first consider an -approximation of , and at the end of the proof we explain the modifications for . The proof has the same high-level structure as the proof of Lemma 3.1: from we obtain a strategy for Bob that is played against Alice’s winning strategy. Then, from the violation of (a) we conclude that the total update of exceeds .

Let be large such that Alice has a winning strategy in the game . We consider a run of the game where Alice plays a computably generated winning strategy and Bob’s replies are as follows.

Bob’s strategy. He searches for an such that for all and with :    and   , , If such an is found, let . Bob chooses for all and . For all he places a token in column at row . For all unordered pairs , he places a token in column  at row . End of Bob’s strategy.

For similar reasons as above, we have that for some and for large , requirements (i) and (ii) are satisfied. This implies that for some , Bob always reacts.

We now verify that for large , requirement () holds. Recall that we need to check the inequality when the denominator is at least . After Bob’s move we have again that

(*)

Since for some constant , we may also assume that , because truncating can only decrease the number of oscillations. This and item (ii) imply that if is large enough such that

(**)

inequality () is indeed satisfied.

Because Bob loses, requirement (a) must be violated. Since the total update of is at least the total update of as long as the -threshold is not reached, this implies that every -approximation has total update more than . The statement for  is proven.

The modifications for are similar to those in the previous section. Instead of choosing to be a constant, we again choose it to be , and for the same reasons as above, this makes (*) true if we replace conditional plain complexity by (conditional) prefix complexity. This increase from constant to logarithmic increases the minimal value of in (**) only by a factor . Otherwise, nothing changes in the above argument. The lemma is proven. ∎

5 Total update of $\epsilon$-approximations, winning strategy

Lemma 5.1.

Let . For large , Alice has a winning strategy in the game  for

By Lemma 4.1, this implies Theorem 1.1.

Proof idea.

Alice’s winning strategy maintains a product set containing pairs of strings. Initially, and are disjoint subsets of of size . We force Bob to decrease by at least a constant for a significant fraction of pairs . Afterwards, we discard parts of and of such that for all pairs of the remaining set , this increase and decrease indeed happened. After a ’reset’-operation, we repeat the procedure. We show that we can repeat this logarithmically many times before the sets and have size less than . And this implies the result.

The idea to enforce a decrease is as follows. First we consider a set , where is any collection of pairwise disjoint subsets of of some small size. The first time, we choose the size to be roughly for some small constant , and the number of sets equals roughly . The sets also have size and are chosen such that is larger than for all . This is possible, since the size of are very small, see figure 2. This implies that is very close to in .

Figure 2: The set from Alice’s strategy. For 4 indices , Bob reacted by decreasing for some . For 2 indices this did not happen, and they might be selected by Alice to start the next iteration.

Alice now decreases for all to . is chosen small enough such that if Bob wants to satisfy requirement (), he is forced to either decrease to less than or to decrease by at least . Since he can do the first only for a small fraction of strings , (for less than strings ), there will be a part that only contains for which the second option was chosen. Afterwards, Alice decreases for all , and the procedure can be repeated as if the game was played for .

In each iteration, the parameter decreases by a constant factor , and hence the strategy can be repeated logarithmically many times. Hence, Alice can enforce a total update proportional to , which completes the proof overview. ∎

We present the details. The following technical lemma presents the set , from which a part will be chosen for the recursion.

Lemma 5.2.

Let
- and be powers of with .
- and be subsets in of size .
- be pairwise disjoint subsets of of size .
There exist sets such that for all and we have .

Proof.

For each , less than strings satisfy . Fix some set . We need to select . How many in satisfy

  for some    ?

There are less than such . Let be a subset of containing of these other strings. ∎

Proof of Lemma 5.1.

Let . Let . Alice will create pairs in which oscillates between  and . The distance between these values is , thus to satisfy requirement (), the sum in (a) in such an oscillation increases by at least . (In fact, each cycle contributes at least , but we do not optimize the constant factors.)

Initially, let , and let and be disjoint parts of of size . The strategy is recursive in . At the start of each recursive call, we have for all , and Alice will not have played on the grid in a row with index smaller than . In the beginning of the game, these conditions are trivially satisfied.


Alice’s recursive strategy inside disjoint sets and of size . If , the strategy terminates. If , then for all , Alice places a token in at height . (This guarantees , but is not needed in the first recursive call, when , since by definition.) Then it is Bob’s turn. If he does not satisfy requirements () and (a), Alice wins and the game terminates. Assume the game continues. Let . Let be a sequence that satisfies the conditions of Lemma 5.2 with and . (These sets have size .) For all and all pairs , Alice places a token in at height . Then it is Bob’s turn. If he does not satisfy requirements () and (a), the game and the strategy terminate. Otherwise, Alice selects some index for which for all . (Such exists, because there are fewer than strings with , and the sets are pairwise disjoint.) She runs the strategy recursively for , and . End of the strategy.

We need to prove that for some , Alice wins the game with parameter for large . Assume . We show that the total update in the selected set increases by at least . After Alice’s first move, for all , we have

If Bob’s reply does not satisfy the requirements, we are finished. Otherwise, Alice performs her second move. For all and , we have

where the right inequality follows from our choice of  and for large . As explained above, if Bob satisfies the requirements, then the total update of increases by at least .

We now determine a value of such that Alice wins the game  for large . Except for the last, each recursive call increases the total update by at least , and the number of such calls is

Note that the base of the logarithm is , because for , the assumption implies . Hence Alice wins the game for and such that . The lemma follows for any arbitrarily close to

6 Oscillations of 0-approximations, winning strategy

Lemma 6.1.

There exists a constant such that for all  and , Alice has a winning strategy in the game  .

Together with Lemma 3.1 this implies Theorem 1.2.

In the winning strategy from the previous section, we obtain a logarithmic number of oscillations. To increase this number, we must decrease by a smaller amount: by a constant for the lower bound of and logarithmic for the of . Again we will consider a set , but now will be very small, and it will no longer be possible to achieve the requirements for and for all pairs in for some . However, we can achieve that the average number of oscillations grows proportional with the number of recursive calls, and this is enough for our purposes.

For a finite set , and a function on , let denote the expected value of when is uniformly distributed over . We present 2 technical and trivial lemmas that are useful for later reference.

Lemma 6.2.

Let
- and be non-negative powers of  with ,
- be any partition of into sets of size ,
- .
There exist subsets of of size such that for we have

Proof.

This follows by the probabilistic method using a uniformly random selection of the sets . ∎

For an integer , let denote the function that maps a string to if and to otherwise. Similar for .

Lemma 6.3.

Let and . If the players satisfy the row restriction, then

(EX)
(EZ)
Proof of Lemma 6.1.

The idea is to create pairs in which oscillates between and . Again we initialize , and let and be disjoint sets of of size . The strategy is recursive in .


Alice’s recursive strategy inside disjoint sets and of size , started at round . Let . If , then the strategy immediately terminates. Assume . If , Alice places a token in at height for all . Then she waits for Bob’s reply. If he does not satisfy his requirements, the strategy terminates. Otherwise, the game proceeds to round