 # Semi-Countable Sets and their Application to Search Problems

We present the concept of the information efficiency of functions as a technique to understand the interaction between information and computation. Based on these results we identify a new class of objects that we call Semi-Countable Sets. As the name suggests, these sets form a separate class of objects between countable and uncountable sets. In principle these objects are countable, but the information in the descriptions of the elements of the class grows faster than the information in the natural numbers that index them. Any characterization of the class in terms of natural numbers is fundamentally incomplete. Semi-countable sets define injections into the set of natural numbers that can be computed in exponential time, but not in polynomial time. A characteristic semi-countable object is ϕ_Σ, the set of all additions for all finite sets of natural numbers. The class ϕ_Σ codes the Subset Sum problem. This gives a natural and transparent analysis of the separation between the classes P and NP.


## 1 Introduction

This paper develops some ideas that were presented in an elementary form in , where we argued that the most urgent problem of modern philosophy of information was our lack of understanding of the interaction between information and computation. For a deeper understanding of the philosophical background we refer to this publication.

### 1.1 Informal presentation of the main argument

Interaction between information and computation is a phenomenon that we are all familiar with from a cognitive point of view, but that until now has eluded mathematical conceptualization. Suppose we want to add a set of numbers . We could compute , but most of us would immediately see that it is easier to compute . Such a trick is not available for the set . This example shows that, for some sets, the order of our computations influences the hardness of the problem. The observation that is, from this perspective, less surprising than the fact that . From an information-theoretical point of view this implies that the statement contains less information than the statement . Since addition is commutative and associative, we have no mathematical tools to explain this phenomenon in classical arithmetic. (For a detailed analysis of this example, see the last part of the Appendix in paragraph 8.)

What we need is a theory that helps us to distinguish the information aspects of different computational histories. Below we develop such a theory. The central concept is the notion of the information efficiency of a function as the balance between the information in the input and the information in the output. For addition this gives δ(x + y) = log(x + y) − log x − log y. It is clear that this operation is not associative . The number of possible computational histories for the addition of a set of numbers is super-exponential in its cardinality. This implies that the computational history of the way the output is computed is relevant for the amount of information it contains conditional on the input. In other words: even if we have the answer, we don’t know what we know until we know how it is computed.

For some types of problems this means that, knowing the answer does not help us much to reconstruct the problem. A typical example is the so-called Subset Sum Problem: given a set of natural numbers , is there a subset that adds up to ? Now consider the following statement:

###### Statement 1

is the -th subset of that adds up to .

Here is the name of a set and “the -th subset of that adds up to ” a unique description. Note that we can compute the unique description effectively when we have the name and vice versa: i.e. there exists a computable bijection between the set of names and the set of unique descriptions. We have an algorithm to solve the search problem corresponding with the statement and the associated decision problem effectively:

1. Search problem: What is the -th subset of that adds up to ?

2. Decision problem: Does the -th subset of that adds up to exist?

A central question is:

###### Question 1

Are there uniquely identifying descriptions of objects that contain more or less information than the names of the objects they denote?

The prima facie answer to this question is no. If we can compute the name from the description and vice versa, independent of the amount of time this takes, the descriptive complexities should stay close to each other in the limit.

On the other hand, observe that computable bijections are by definition information efficient. When the information efficiency of a function is not well defined, the bijection is also not well defined. We call this the Principle of Characteristic Information Efficiency: if two computations have a different information efficiency, then different functions are involved in their computation. As we have seen, this is the case for addition. In this particular case there is no single finite mathematical function that describes the information efficiency of the bijection between sets and the sums of their subsets. There are infinitely many in the limit. The unique description “the first subset of that adds up to ” is ad hoc. It has no clear relation with the name of the denoted subset . Such ad hoc unique descriptions are abundant in everyday life, especially in expressions like “the first that … ” or “the -th that …”. Consider the descriptions:

1. The first four-leaf clover I’ll find this afternoon.

2. The -th man with a moustache I’ll see in the city.

It is clear that the description “The first four-leaf clover I’ll find this afternoon” does not describe an intrinsic property of a certain plant. Actually, which plant I’ll find (if I find one) depends completely on my search method. The descriptive complexity of the plant from my perspective at the moment I utter the phrase is a combination of the amount of information in the description itself and the description of the search method I’m going to use. Suppose I throw a die to select my path when roaming around the city to find men with moustaches. Then the description of the search process that identifies “The -th man with a moustache I’ll see in the city” can easily contain more information than the descriptive complexity of his name.

According to this analysis the answer to Question 1 is positive: if a description of the search process, plus a partial description of the object, determines the object we will find, then this description may be more complex than the name of the object itself. At the same time the unique description itself can contain much less information. Mutatis mutandis, there is no way that we can search systematically for “the first subset of that adds up to ” using the information given in . A fortiori there is no search process that works in time polynomial in the complexity of the search problem. The search process creates the object I will find.

Of course, given the number we can always find a subset that adds up to by simply enumerating all possible subsets ordered by cardinality and computing their sums. In this case we are not using the specific information given in in the organization of the search process. Such a search by enumeration simply generates the missing information about given by a process of counting.
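The search by enumeration described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's construction. The function name `ith_subset_with_sum` is hypothetical. "The i-th subset" is assumed to count from 1, and subsets of equal cardinality are assumed to be ordered lexicographically. The only information the search uses about the target is the final sum test. Everything else is generated by counting through up to 2^n subsets.

```python
from itertools import combinations
from typing import Optional, Sequence, Tuple


def ith_subset_with_sum(numbers: Sequence[int], target: int, i: int) -> Optional[Tuple[int, ...]]:
    """Return the i-th subset (counting from 1) of `numbers` that adds up to `target`.

    Subsets are enumerated by increasing cardinality; within one cardinality we
    use the lexicographic order that itertools.combinations produces on the
    sorted input. This tie-breaking order is an assumption for illustration.
    Returns None when fewer than i such subsets exist, i.e. when the answer to
    the decision problem is "no".
    """
    items = sorted(numbers)
    found = 0
    for size in range(len(items) + 1):            # cardinalities 0, 1, ..., n
        for subset in combinations(items, size):  # every subset of this size
            if sum(subset) == target:             # the cheap checking function
                found += 1
                if found == i:
                    return subset
    return None  # exhausted all 2**n subsets without finding the i-th solution


print(ith_subset_with_sum([1, 2, 3, 4], 5, 1))  # (1, 4)
```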

A question that emerges is whether there are sets for which there is no faster way to find a solution than pure enumeration of the possible solutions. We prove this for a (prima facie) relatively simple countable object: the set of natural numbers and its finite subsets. We write this set as . We investigate two mappings to a two-dimensional infinite discrete space (think of a chessboard that extends to infinity on two sides):

• We show (via an elaborate counting argument) that can be mapped efficiently onto . In every cell there is exactly one finite set of natural numbers and vice versa.

• We observe that all possible descriptions of the form “the -th subset of that adds up to ” can be mapped trivially onto the plane : here is a column and is a row. In every cell there is exactly one unique description of a set. Not all unique descriptions will denote, e.g. “the -th set that adds up to ” does not exist. The infinite space is compressed infinitely over the -axis.

Comparable mappings form the core of Cantor’s argument that proves the existence of transfinite sets. Our argument follows a related strategy. The Cantor packing function maps the set of natural numbers onto the discrete plane . Using this construction we can investigate all possible mappings between sets and their descriptions in terms of elastic translations over the two-dimensional space. We show that all possible mappings between names of sets and their descriptions are unboundedly information expanding. The description of most typical sets as “the -th subset of that adds up to ” using the numbers and contains more information than the index of itself. Another way of formulating this insight is that the information in the natural numbers does not grow fast enough to characterize the set of descriptions. The set of descriptions is semi-countable: their complexity “outruns” the information in the set of natural numbers unboundedly in the limit. The set is not rich enough to describe semi-countable sets. Consequently, search by enumeration is the fastest way to construct them algorithmically.

### 1.2 Overview of the paper

We start with a conceptual overview of various types of computational processes: primitive recursive, μ-recursive and non-deterministic. We show that μ-recursive processes have a special status insofar as they allow for unbounded counting. We show that these processes generate information at logarithmic speed in special circumstances.

We analyze this insight in the context of Kolmogorov complexity and Levin complexity and observe that these measures are not accurate enough for our purpose. We propose the concept of Information Efficiency of functions as an alternative complexity measurement theory. We give a detailed analysis of the recursive functions. We observe that information efficiency is not associative for addition.

We study the Cantor pairing function as an information preserving bijection between and . We study the information efficiency of elastic translations over Cantor bijections over the space . We show that there is a spectrum of these translations.

We show that the set , of all finite sets of natural numbers, can be mapped onto efficiently. This allows us to investigate the general conditions for elastic translations based on addition and multiplication of sets of numbers. The object that describes all possible additions of finite sets of natural numbers is ϕ_Σ. The corresponding object for multiplication is . We show that the resulting set of unique descriptions of sets “the -th subset of that adds up to ” is

1. associated with an infinite number of different computations.

2. not fully characterized by the set of natural numbers: the information in the descriptions grows faster than any counting process.

The set ϕ_Σ is semi-countable. We can search such sets in exponential time but not in polynomial time. The corresponding object for multiplication is fundamentally less complex, because the information efficiency of multiplication is associative.

This argument can easily be generalized to the Subset Sum problem, which proves the separation between P and NP.

## 2 Conceptual Analysis

In this paragraph we give a conceptual analysis of the issues concerning the interaction between information and computation. For a broader discussion of the underlying philosophical problems the reader is referred to . We will use the prefix-free Kolmogorov complexity as our measure of the descriptive complexity of a string, and  as basic reference: K(x) is the length of the shortest program that computes x on a reference Universal Turing machine.

### 2.1 Types of Computational Processes

There are at least three fundamentally different types of computing (see Figure 1):

• Elementary deterministic computing as embodied in the primitive recursive functions. This kind of computing does not generate information: the amount of information in the Output is limited by the sum of the descriptive complexity of the Input and the Program.

• Deterministic computing enriched with search (bounded or unbounded) as embodied by the class of Turing equivalent systems, specifically the μ-recursive functions. This type of computing generates information at logarithmic speed: the amount of information in the Output is not limited by the sum of the descriptive complexities of the Input and the Program.

• Non-deterministic computing generates information at linear speed.

Suppose there is a class of search problems with a polynomial-time checking function that cannot be solved by a deterministic program but can be solved using bounded search. Such a search routine would take exponential time, since information generation has logarithmic speed. A non-deterministic computer could generate (guess) the required information at linear speed and then perform the test in polynomial time. The existence of such a class of search problems would indicate a separation between P and NP for the associated decision problems: these problems cannot be solved deterministically, but they can be solved using bounded search in exponential time and non-deterministically in polynomial time.

This analysis shows that the formulation of search problems in terms of polynomial time bounds and Turing Machines might be quite misleading. More important than the polynomial time bound is the fact that the search functions cannot be computed at all (in general) by deterministic functions, while the distinction between deterministic search and primitive recursion is hard to make in the context of Turing machines. We analyse this issue in the following paragraph.

### 2.2 The μ-operator for unbounded search

There is a subtle difference between systematic search and deterministic construction that is blurred in our current definitions of what computing is. If one considers the three fundamental equivalent theories of computation, Turing machines, λ-calculus and recursion theory, only the latter defines a clear distinction between construction and search, in terms of the difference between primitive recursive functions and μ-recursive functions. The set of primitive recursive functions consists of: the constant function, the successor function, the projection function, composition and primitive recursion. With these we can define everyday mathematical functions like addition, subtraction, multiplication, division, exponentiation etc. In order to get full Turing equivalence one must add the μ-operator. In the world of Turing machines this device coincides with infinite loops associated with undefined variables. It is defined as follows in :

For every 2-place function f(x, y) one can define a new function, g(x) = μy[f(x, y) = 0], where g(x) returns the smallest number y such that f(x, y) = 0. Defined in this way, g is a partial function. One way to think about μ is in terms of an operator that tries to compute in succession all the values f(x, 0), f(x, 1), f(x, 2), … until for some m, f(x, m) returns 0, in which case such an m is returned. In this interpretation, if m is the first value for which f(x, m) = 0 and thus g(x) = m, the expression μy[f(x, y) = 0] is associated with a routine that performs exactly m successive test computations of the form f(x, y) before finding m. Since the μ-operator is unbounded, m can have any value.
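A sketch of the μ-operator as a higher-order function may make the counting interpretation concrete. It assumes that Python callables stand in for primitive recursive functions. The names `mu` and `bound` and the example test function are illustrative only. The optional bound lets the example terminate on inputs where the unbounded search would run forever.

```python
from typing import Callable, Optional


def mu(f: Callable[[int, int], int], bound: Optional[int] = None) -> Callable[[int], Optional[int]]:
    """Return g with g(x) = the smallest y such that f(x, y) == 0.

    With bound=None the search is unbounded, as for the mu-operator: g(x) never
    returns when no such y exists, so g is partial. Passing an integer bound
    turns it into bounded search that gives up (returns None) after `bound`
    failed tests; the bound is an illustrative addition, not part of the scheme.
    """
    def g(x: int) -> Optional[int]:
        y = 0
        while bound is None or y < bound:
            if f(x, y) == 0:  # test computation f(x, y)
                return y      # the y earlier tests f(x, 0), ..., f(x, y - 1) all failed
            y += 1            # the search is driven by counting
        return None
    return g


# Hypothetical test function: f(x, y) = 0 exactly when y * y == x.
exact_root = mu(lambda x, y: 0 if y * y == x else 1)
bounded_root = mu(lambda x, y: 0 if y * y == x else 1, bound=100)
print(exact_root(49))    # 7
print(bounded_root(50))  # None, since 50 is not a perfect square
```

Calling `exact_root(50)` would loop forever. This is the partiality of μ-recursion in its most literal form.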

Note that the name does not refer to a function but to a function-scheme. The in the expression is not an argument of a function but the index of a function name . We can interpret the μ-operator as a meta-operator that has access to an infinite number of primitive recursive functions. In this interpretation there is no such thing as a general search routine. Each search function is specific: searching for your glasses is different from searching for your wallet, even when you look for them in the same places.

The difference between primitive recursion and μ-recursion formally defines the difference between construction and search. Systematic search involves an enumeration of all the elements in the search space together with a checking function that helps us to decide that we have found what we are looking for. We will have to look into this conception of enumeration in more depth.

### 2.3 Determinism versus non-determinism

In this paragraph we discuss the hybrid nature of unary counting processes, which, in a manner of speaking, are positioned between fully deterministic and non-deterministic processes. By definition, deterministic processes do not generate new information, because the outcome of the process is determined. For a full discussion of this issue see . We start with a detailed analysis of the seven elementary counting processes (A-G) shown in figure 2. The tensor operation ⊗ signifies concatenation.

• Automaton A is deterministic and does not halt. It starts with an empty string and writes an infinite sequence of ones.

• Automaton B is non-deterministic. It generates the set of all finite strings of ones, i.e. the set of all finite unary numbers. We will call this a Counting Automaton or CA.

• Automaton C, also known as the Coin Flipping Automaton or CFA, is non-deterministic. It generates the set of all finite binary strings consisting of zeros and ones, i.e. the set of all finite binary numbers.

• Automaton D is deterministic. It is equivalent to automaton B with the addition of an extra test that checks the Kolmogorov complexity of the string x generated so far. As soon as x has a complexity greater than a constant c, the process stops and produces output x. For the sake of argument we ignore the fact that Kolmogorov complexity is not computable, and assume that some oracle decides the matter for us.

• Automaton E is non-deterministic. It is equivalent to automaton C with the addition of an extra test that checks the Kolmogorov complexity of the string x generated so far. As soon as x has a complexity greater than a constant c, the process stops and produces output x.

• Automaton F is deterministic. It is equivalent to automaton , but now the test routine involves a computable function running on an input index. In fact it is an implementation of the central routine of the μ-recursive search process that we discussed in the previous paragraph. It defines μ-recursion based on a Counting Automaton.

• Automaton G is non-deterministic. It is equivalent to automaton . Here too the test routine involves a computable function running on an input index. It defines μ-recursion based on a Coin Flipping Automaton.

The difference between automata A and B illustrates the fact that counting is essentially a non-deterministic operation. Automaton A does not effectively generate an object, whereas B generates all finite unary strings. Consequently the amount of information that B generates is unbounded. The information is generated by a sequence of free binary decisions to continue counting, followed by one decision to stop the process.

The decisions to stop and start the process can be seen as meta-decisions that as such are not an intrinsic part of the process. This is illustrated by the fact that as soon as we add a stop criterion in automata D and E, the unary counting process becomes deterministic while the binary string generation keeps its non-deterministic nature.

The fact that systems B and C both generate information is illustrated by systems D and E. Both stop at the moment when a certain amount of information of size c has been generated. Since unary strings code information very inefficiently, and thus have a low Kolmogorov complexity, process D needs to perform on the order of 2^c write operations before it stops, whereas process E can in principle reach this goal in c steps. In this case the computation of D takes exponential time, whereas E can work in linear time. The observations we can make on the basis of this analysis are:
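The gap between unary counting and coin flipping can be illustrated numerically. Kolmogorov complexity is not computable, so this sketch uses a crude stand-in, and that substitution is an assumption: the information in a unary count n is taken to be the length of n in binary. Under that assumption a counter needs 2^(c−1) write operations before its value carries c bits. A coin-flipping automaton needs only c steps. The function names are hypothetical.

```python
def unary_steps_for_bits(c: int) -> int:
    """Write operations a unary counter needs before its count carries c bits.

    Crude computable stand-in (an assumption) for the Kolmogorov test: the
    information in the unary string 1^n is taken to be n.bit_length(), i.e.
    about log2(n) bits. The loop mimics counting: one write per step.
    """
    n = 0
    while n.bit_length() < c:
        n += 1  # write one more '1'
    return n


def coin_flip_steps_for_bits(c: int) -> int:
    """A coin-flipping automaton writes one free binary digit per step."""
    return c


for c in range(1, 11):
    # c steps for coin flipping versus 2**(c - 1) steps for unary counting
    print(c, coin_flip_steps_for_bits(c), unary_steps_for_bits(c))
```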

###### Observation 1
• Only non-deterministic processes generate information. Deterministic processes by definition do not generate information .

• Both unary counting and coin flipping are non-deterministic processes.

• Counting generates information at logarithmic speed. The time needed to generate a certain amount of information by means of unary counting is exponential in the amount of time a coin flipping automaton needs to generate the same amount of information.

Unary counting with a stop criterion is hybrid in the sense that it has characteristics of both deterministic and non-deterministic processes. This explains the special status of this kind of process in recursion theory. Unary counting with a stop criterion is a form of computing that is essentially stronger than standard deterministic computing. There are many conceptual problems around this notion of computing, one of which is the fact that the descriptive complexity of the computational process at any time during the computation may be much higher than the complexity of the actual output. See  for a discussion. The crucial limiting factor is the descriptive complexity of the halting test: for processes D and E and for processes F and G. A central observation in this context is the so-called non-monotonicity of set theoretical operations (see , par. 6.2).

Since μ-recursion is stronger than primitive recursion, there will be classes of search problems that can be solved by μ-recursive functions and not in general by primitive recursive functions. The search process in μ-recursion is driven by counting. Consequently a non-deterministic version of μ-recursion using a Coin Flipping Automaton could solve a search problem, i.e. compute the value for the test function , in time linear in the length of the representation of the number . The search process in the classical deterministic μ-recursion, using a Counting Automaton, would take time exponential in the length of the representation of , i.e. the value of . Moreover, if the time complexity of the computation is polynomial in the length of the input, then the fact that a solution can be generated non-deterministically in linear time would be overshadowed by the time complexity of the checking function.

## 3 Information Efficiency of Recursive Functions

Let x ∈ N, where N denotes the natural numbers, and identify N and {0,1}* according to the correspondence

 (0,ε),(1,0),(2,1),(3,00),(4,01),…
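The correspondence can be computed directly: write n + 1 in binary and drop its leading 1. A short Python sketch follows; the function names are hypothetical.

```python
def nat_to_string(n: int) -> str:
    """n-th binary string in length-lexicographic order: 0 -> '', 1 -> '0', 2 -> '1', 3 -> '00'."""
    return bin(n + 1)[3:]  # bin() gives '0b1...'; drop '0b' and the leading 1


def string_to_nat(x: str) -> int:
    """Inverse map: put the leading 1 back, read the result as binary, subtract 1."""
    return int("1" + x, 2) - 1


print([nat_to_string(n) for n in range(5)])  # ['', '0', '1', '00', '01']
```

The string for n has length ⌊log2(n + 1)⌋. This links the length of a string to the logarithm of the number it encodes.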

Here ε denotes the empty word. The length l(x) of x is the number of bits in the binary string x. In the following we use the logarithm with base 2 as our standard. The standard reference  for the definitions concerning Kolmogorov complexity is followed. K(x) is the prefix-free Kolmogorov complexity of a binary string x. It is defined as:

###### Definition 1
$$K(x \mid y) = \min_i \{\, l(\bar{i}) : U(\bar{i}y) = x \,\}$$

i.e. the shortest self-delimiting index of a Turing machine that produces on input y, where and . Here is the length of a self-delimiting code of an index and is a universal Turing machine that runs program after interpreting . The length of is limited for practical purposes by , where . The actual Kolmogorov complexity of a string is defined as the one-part code that is the result of, what one could call, the:

###### Definition 2 (Forcing operation)

According to the classical view, the descriptive complexity of the Output of a deterministic computational process is bounded by the sum of the complexity of the Input and the Program (see Figure 1):

$$K(\mathrm{Output}) \le K(\mathrm{Input}) + K(\mathrm{Program}) + O(1) \tag{1}$$

Based on the discussion above we formulate the conjecture:

###### Conjecture 1

Deterministic computational processes generate information at logarithmic speed.

Acceptance of this conjecture would imply a shift from Kolmogorov complexity to Levin complexity, which takes the influence of the computing time into account:

###### Definition 3 (Levin complexity: Time = Information)

The Levin complexity of a string x is the sum of the length and the logarithm of the computation time of the smallest program that produces x when it runs on a universal Turing machine U, denoted Kt(x):

$$Kt(x) = \min_p \{\, l(p) + \log(\mathrm{time}(p)) : U(p) = x \,\}$$

Inequality 1 then becomes:

$$Kt(\mathrm{Output}) \le K(\mathrm{Input}) + K(\mathrm{Program}) + \log(\mathrm{time}(\mathrm{Program})) + O(1) \tag{2}$$

The problem with such a proposal is that our classical proof techniques and information measures are not sensitive enough to observe the difference between the two measures in practical situations. Information production at logarithmic speed is extremely slow, and in everyday life we will never sense the way it influences our measurements. The situation is not unlike the one in the theory of relativity. Our measurement of time is affected by our relative speed, but in everyday life the speeds at which we travel, relative to the accuracy of our measurement techniques, are such that we do not observe these fluctuations. The same holds for the difference between Kolmogorov and Levin complexity. First of all, both measures are uncomputable, so we can never present a convincing example illustrating the difference. Secondly, if we estimate the number of computational steps the universe has made since the Big Bang as , then a deterministic system would have produced about bits of information in this time span. Even if we could compute the values of K and Kt, the asymptotic nature of the measure does not allow us to reach the accuracy necessary to observe the difference on the time scale of our universe as a whole. That asymptotic nature is reflected in the O(1) term, which is related to the descriptive complexity of the reference Universal machines we are considering. So the classical theory of Kolmogorov complexity is of little use to us. We need to develop more advanced information measurement techniques.

One possibility is to shift our attention from Turing machines to recursive functions. The big advantage of recursive functions is that we have a reliable definition of primitive recursive functions. This relieves us of the burden of selecting something like a reference Universal machine, and thereby eliminates the asymptotic nature of the resulting measurement theory. Secondly, we can model the flow of information through computational processes more effectively. We define:

###### Definition 4 (Information in Natural numbers)
$$\forall x \in \mathbb{N}: \quad I(x) = \log x$$

The rationale behind the choice of the log function as an information measure is discussed extensively in . The big advantage of defining an information measure in terms of recursive functions is that we can get rid of the asymptotic factor, since we do not have to relativize over the class of universal machines. We get a theory about compressible numbers, much in line with Kolmogorov complexity, if we introduce the notion of the information efficiency of a function. The Information Efficiency of a function is the difference between the amount of information in the output of the function and the amount of information in its input. We use the shorthand for . We consider functions on the natural numbers. If we measure the amount of information in a number n as:

$$I(n) = \log n$$

then we can measure the information effect of applying a function f to n as:

$$I(f(n)) = \log f(n)$$

This allows us to estimate the information efficiency as:

$$\delta(f(n)) = I(f(n)) - I(n)$$

More formally:

###### Definition 5 (Information Efficiency of a Function)

Let f: N^k → N be a function of k variables. We have:

• the input information I(x̄) = log x_1 + … + log x_k and

• the output information I(f(x̄)) = log f(x̄).

• The information efficiency of the expression f(x̄) is

$$\delta(f(\bar{x})) = I(f(\bar{x})) - I(\bar{x})$$

• A function is information conserving if δ(f(x̄)) = 0, i.e. its output contains exactly the amount of information in its input parameters,

• it is information discarding if δ(f(x̄)) < 0,

• it has constant information if .

• it is information expanding if δ(f(x̄)) > 0.

The big advantage of this definition over Kolmogorov complexity is that we can compute the flow of information through functions exactly. In the Appendix I in paragraph 8 we give extensive examples of the information efficiency of elementary recursive functions.
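Definition 5 can be evaluated directly for small examples. This sketch assumes base-2 logarithms, which is consistent with equation 7, where the limit log 2 equals 1. It also assumes that the input information of several arguments is the sum of their logarithms, as in equation 6. The helper names are hypothetical.

```python
from math import log2, prod
from typing import Callable


def information_efficiency(f: Callable[..., int], *args: int) -> float:
    """delta(f(x_1, ..., x_k)) = log2 f(x_1, ..., x_k) - sum_i log2 x_i.

    Inputs and output must be positive integers, since log2(0) is undefined.
    """
    output_information = log2(f(*args))
    input_information = sum(log2(x) for x in args)
    return output_information - input_information


def add(*xs: int) -> int:
    return sum(xs)


def mul(*xs: int) -> int:
    return prod(xs)


print(information_efficiency(add, 1, 1))                 # 1.0 (information expanding)
print(information_efficiency(add, 8, 8))                 # -2.0 (information discarding)
print(abs(information_efficiency(mul, 6, 7)) < 1e-12)    # True (conserving, up to rounding)
```

Multiplication always gives δ = 0 because log(xy) = log x + log y. The efficiency of addition, by contrast, depends on the arguments.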

###### Definition 6 (Principle of Characteristic Information Efficiency)

The concept of information efficiency is characteristic for a function. Consequently if the information efficiency varies over sets of computable numbers, then different functions must be involved in their computation.

The concept of Information Efficiency gives us a tool to decide between Kolmogorov complexity and Levin complexity as the right measure of information. We make the following observation:

###### Lemma 1

If we can define a computable bijection for which the information efficiency is unbounded in the limit, then computational processes generate information beyond the information stored in the program itself, i.e. equation 1 is invalid. Consequently equation 2 is the right bound.

Proof: First observe that most natural numbers are typical, i.e. random. Their Kolmogorov complexity is “close” to the logarithm of the value. Since computes a bijection on , both input and output will contain a “sufficient” amount of random, i.e. incompressible, elements to the effect that the equations 1 and 2 describe equalities for dense sets at any scale. Now consider equation 1, which can be rewritten as:

$$K(\mathrm{Output}) - K(\mathrm{Input}) = \delta(\mathrm{Program}(\mathrm{Input})) \le K(\mathrm{Program}) + O(1) = c \tag{3}$$

i.e. the information efficiency is bounded, which contradicts the assumption. Now rewrite equation 2 as:

$$Kt(\mathrm{Output}) - K(\mathrm{Input}) = \delta(\mathrm{Program}(\mathrm{Input})) \le K(\mathrm{Program}) + \log(\mathrm{time}(\mathrm{Program})) + O(1) \tag{4}$$

with the extra term log(time(Program)). This is the right bound since, following Observation 1, logarithmic speed is the maximum speed at which deterministic computational processes generate information.

## 4 Computable functions that generate and discard information

In this paragraph we show that there are indeed finite halting programs that generate and discard an unbounded amount of information in the limit. Central is the notion of an elastic transformation of the set .

### 4.1 The Cantor packing function

Observe that there is a two-way polynomial time computable bijection in the form of the so-called Cantor packing function:

$$\pi(x,y) := \tfrac{1}{2}(x+y)(x+y+1) + y \tag{5}$$

An example of the computations involved in the bijection is given in the Appendix in paragraph 9. The Fueter-Pólya theorem states that the Cantor pairing function and its symmetric counterpart are the only possible quadratic pairing functions. A segment of this function is shown in figure 3. The information efficiency of this function is:

$$\delta(\pi(x,y)) = \log\left(\tfrac{1}{2}(x+y+1)(x+y) + y\right) - \log x - \log y \tag{6}$$

Figure 4: The information efficiency of the Cantor packing function.
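Both directions of the bijection in equation 5 can be computed exactly with integer arithmetic. In this sketch the inverse first uses an integer square root to locate the counter diagonal x + y = w that contains z. It then reads off y as the offset of z along that diagonal. The function names are our own. This is one standard way to invert the Cantor pairing function, not necessarily the computation given in the Appendix.

```python
from math import isqrt
from typing import Tuple


def cantor_pair(x: int, y: int) -> int:
    """pi(x, y) = (x + y)(x + y + 1)/2 + y, as in equation (5)."""
    s = x + y
    return s * (s + 1) // 2 + y


def cantor_unpair(z: int) -> Tuple[int, int]:
    """Inverse of cantor_pair, using exact integer square roots."""
    w = (isqrt(8 * z + 1) - 1) // 2  # index of the counter diagonal x + y = w holding z
    t = w * (w + 1) // 2             # first value on that diagonal, pi(w, 0)
    y = z - t
    return w - y, y


print([cantor_unpair(z) for z in range(6)])  # [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]
```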
###### Observation 2

The Cantor function defines what one could call a discontinuous folding operation over the counter diagonals. On the line we find the images . For points and on different counter diagonals and we have that . Equation 6 can be seen as the description of an information topology. The Cantor function runs over the counter diagonals, and the image shows that the information efficiencies of points that are in the same neighborhood are also close.

###### Observation 3

The information efficiency of the Cantor packing function has infinite precision (see Figure 4).

This is what one would expect from a function that defines a polynomial-time computable bijection. We analyse some limits that define the information efficiency of the function. On the line y = x we get:

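$$\lim_{x\to\infty} \delta(\pi(x,x)) = \lim_{x\to\infty} \log\left(\tfrac{1}{2}(2x+1)(2x) + x\right) - 2\log x = \lim_{x\to\infty} \log\frac{2x^2+2x}{x^2} = 1 \tag{7}$$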

For the majority of the points in the space the function has an information efficiency close to one bit. On every line y = hx through the origin, the information efficiency is constant in the limit:

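$$\lim_{x\to\infty} \delta(\pi(x,hx)) = \lim_{x\to\infty} \log\left(\tfrac{1}{2}(x+hx+1)(x+hx) + hx\right) - \log x - \log(hx) = \log\left(\tfrac{1}{2}(h+1)^2\right) - \log h \tag{8}$$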

Yet on every line y = c (and by symmetry x = c) the information efficiency is unbounded:

$$\lim_{x\to\infty} \delta(\pi(x,c)) = \lim_{x\to\infty} \log\left(\tfrac{1}{2}(x+c+1)(x+c) + c\right) - \log x - \log c = \infty \tag{9}$$
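The limits in equations 7, 8 and 9 can be checked numerically. This sketch assumes base-2 logarithms and evaluates equation 6 along three lines: y = x, the ray y = 3x, and the horizontal line y = 5. The values 3 and 5 and all function names are arbitrary illustrative choices.

```python
from math import log2


def cantor_pair(x: int, y: int) -> int:
    s = x + y
    return s * (s + 1) // 2 + y


def delta_pi(x: int, y: int) -> float:
    """Equation (6) with base-2 logarithms; requires x, y >= 1."""
    return log2(cantor_pair(x, y)) - log2(x) - log2(y)


def ray_limit(h: float) -> float:
    """Right-hand side of equation (8): log2((h + 1)^2 / 2) - log2(h)."""
    return log2((h + 1) ** 2 / 2) - log2(h)


for x in (10**3, 10**6, 10**9):
    print(
        x,
        round(delta_pi(x, x), 6),      # line y = x: tends to 1 (equation 7)
        round(delta_pi(x, 3 * x), 6),  # ray y = 3x: tends to ray_limit(3) (equation 8)
        round(delta_pi(x, 5), 3),      # line y = 5: grows without bound (equation 9)
    )
print(round(ray_limit(3), 6))
```

Along y = 5 the efficiency grows by roughly log2(1000) ≈ 10 bits for every factor 1000 in x. This is the unbounded behavior described by equation 9.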

Together, equations 5, 6, 7, 8 and 9 characterize the basic behavior of the information efficiency of the Cantor function.

Figure 5: Row expansion by factor 2 and elastic translation by a factor 2 for the segment in Figure 3. For a computation of the exact information efficiency see figure 6.

Figure 6: An exact computation of the point-by-point information efficiency after an elastic transformation by a factor 2 for a segment (1-4 by 1-19) of the table in figure 5. The existence of two separate interleaving information efficiency functions is clearly visible, in line with definition 6.

Figure 7: A fragment of the bijection generated by an elastic transformation by a factor 2 for a segment (0-4 by 0-9) of the table in figure 5. The bijection π(ϵ₂(π⁻¹)): N → N generates a cloud of compressible and expandable points. The compressible points are above the line x = y.

### 4.2 Linear elastic transformations of the Cantor space

Equation 9 is responsible for the remarkable behavior of the Cantor function under what one could call elastic transformations. For elastic transformations we can compress the Cantor space along the -axis by any constant without actually losing information. Visually one can inspect this counterintuitive phenomenon in figure 4 by observing the concave shape of the information efficiency function: at the edges (, ) it has, in the limit, an unbounded amount of compressible information. The source of this compressibility in the set is the set of numbers that is logarithmically close to sets and . In terms of Kolmogorov complexity these sets of points define regular dips of depth in the integer complexity function that in the limit provide an infinite source of highly compressible numbers. In fact, if we were to draw figure 4 at any scale over all functions, we would see a surface with all kinds of regular and irregular elevations related to the integer complexity function.

Observe figure 5. The upper part shows a discrete translation over the -axis by a factor 2. This is an information expanding operation: we add the factor 2, i.e. one bit of information, to each coordinate. Since we expand information, the density of the resulting set in also changes by a factor 2. In the lower part we have distributed the values in the columns over the columns . We call this an elastic translation by a factor 2. The space is transformed into a space. The exact form of the translation is: $\epsilon_2(x,y) = (2x + (y \bmod 2), \lfloor y/2 \rfloor)$.

Figure 8: The information efficiency of the reference function of the Cantor packing function on the same area as in figure 4 after an elastic shift by a factor 100, 0

The effect of this elastic translation on the information efficiency on a local scale can be seen in figure 6. After some erratic behavior close to the origin the effect of the translation evens out. There are traces of a phase transition: close to the origin the size of the $x$, $y$ coordinates is comparable to the size of the shift, which influences the information efficiency considerably. From the wave pattern in the image it is clear that a linear elastic transformation by a factor $c$ essentially behaves like a set of $c$ functions (in this case two), each with a markedly different information efficiency.

Even more interesting is the behavior, shown in figure 7, of the bijection:

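$$n \mapsto \pi(\epsilon_2(\pi^{-1}(n))) : \mathbb{N}\to\mathbb{N} \tag{10}$$

Although the functions $\pi$, $\epsilon_2$ and $\pi^{-1}$ are bijections and can be computed point wise in polynomial time, all correlations between the sets of numbers seem to have been lost. The reverse part of the bijection, shown in formula 11, seems hard to compute without computing large parts of 10 first.

$$n \mapsto \pi(\epsilon_2^{-1}(\pi^{-1}(n))) \tag{11}$$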
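The composition in formula 10 can be sketched point-wise as follows. The sketch assumes the same pairing convention as equation 8 and the factor-2 translation $\epsilon_2(x,y) = (2x + (y \bmod 2), \lfloor y/2\rfloor)$, i.e. definition 7 with $r(x) = 2$; all function names are hypothetical. Printing the images of $0, 1, 2, \dots$ shows how consecutive inputs scatter, and the inverse is included only to check bijectivity on a finite window.

```python
from math import isqrt


def cantor_pair(x: int, y: int) -> int:
    """pi(x, y) = (x + y + 1)(x + y)/2 + y."""
    s = x + y
    return s * (s + 1) // 2 + y


def cantor_unpair(n: int) -> tuple[int, int]:
    """Inverse of cantor_pair: recover (x, y) from n."""
    w = (isqrt(8 * n + 1) - 1) // 2   # index of the counter diagonal x + y = w
    y = n - w * (w + 1) // 2
    return w - y, y


def elastic2(x: int, y: int) -> tuple[int, int]:
    """Linear elastic translation by a factor 2: (2x + (y mod 2), floor(y / 2))."""
    return 2 * x + y % 2, y // 2


def elastic2_inverse(u: int, v: int) -> tuple[int, int]:
    """Undo elastic2: the parity of u restores the low bit of y."""
    return u // 2, 2 * v + u % 2


def forward(n: int) -> int:
    """Formula 10: n -> pi(epsilon_2(pi^-1(n)))."""
    return cantor_pair(*elastic2(*cantor_unpair(n)))


def backward(m: int) -> int:
    """Point-wise inverse of forward, composed from the three inverses."""
    return cantor_pair(*elastic2_inverse(*cantor_unpair(m)))


if __name__ == "__main__":
    print([forward(n) for n in range(15)])
```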
###### Observation 4

Linear elastic transformations introduce a second type of horizontal discontinuous folding operation over the columns. These operations locally distort the smooth topology of the Cantor function into clouds of isolated points.

On a larger scale, visible in figure 8, we get a smooth surface. The distortion of the symmetry compared to figure 4 is clearly visible. In accordance with equation 9, nowhere in the set is the information efficiency negative. In fact the information efficiency is lifted over almost the whole surface. On the line $x = y$ the value in the limit is:

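$$\lim_{x\to\infty}\delta\big(\pi(\epsilon_2(x,x))\big) = \lim_{x\to\infty}\left(\log\left(\tfrac{1}{2}\left(2x+\tfrac{x}{2}+1\right)\left(2x+\tfrac{x}{2}\right)+\tfrac{x}{2}\right)-\log 2x-\log\tfrac{x}{2}\right) = 2\log 5 - 3 \tag{12}$$

Figure 9: The first information efficiency function of the Cantor packing function on the same area as in Figure 4 after an elastic shift by a factor 100, 0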
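A quick numerical check of equation 12, under the same assumptions as the earlier sketches: base-2 logarithms, $\pi(x,y) = \tfrac12(x+y+1)(x+y)+y$, and $\epsilon_2(x,y) = (2x + (y \bmod 2), \lfloor y/2\rfloor)$. As in equation 12, the efficiency subtracts $\log 2x + \log\tfrac{x}{2} = 2\log x$, the information of the input pair $(x,x)$. The function name is illustrative.

```python
from math import log2


def cantor_pair(x: int, y: int) -> int:
    """pi(x, y) = (x + y + 1)(x + y)/2 + y."""
    s = x + y
    return s * (s + 1) // 2 + y


def elastic2(x: int, y: int) -> tuple[int, int]:
    """Linear elastic translation by a factor 2."""
    return 2 * x + y % 2, y // 2


def diagonal_efficiency_after_elastic2(x: int) -> float:
    """log pi(epsilon_2(x, x)) - 2 log x, the quantity whose limit equation 12 gives."""
    return log2(cantor_pair(*elastic2(x, x))) - 2 * log2(x)


if __name__ == "__main__":
    target = 2 * log2(5) - 3
    for x in (10, 10**3, 10**6, 10**6 + 1):
        print(x, diagonal_efficiency_after_elastic2(x), target)
```

Even and odd $x$ (the two interleaved functions) converge to the same limit.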

A more extreme form of such a distortion can be seen in figure 9, which shows the effect on the information efficiency after an elastic translation by factor . The lift in information efficiency over the whole surface is clearly visible. We only show the first of different information efficiency functions here. Computed on a point by point basis we would see periodic saw-tooth fluctuations over the -axis with a period of . This discussion shows that elastic transformations of the Cantor space act as a kind of perpetuum mobile of information creation. For every elastic transformation by a constant the information efficiency in the limit is still positive:

###### Lemma 2

No compression by a constant factor along the -axis (or -axis, by symmetry) will generate a negative information efficiency in the limit.

Proof: This is an immediate consequence of equation 9: the information efficiency is unbounded in the limit on every line $y = c$ or $x = c$.

### 4.3 A general model of elastic transformations

In the following we will study what we call general elastic transformations of the space $\mathbb{N}^2$:

###### Definition 7

The function $\epsilon_r$ defines an elastic translation by a function $r$ of the form:

$$\epsilon_r(x,y) = \left(r(x)\,x + (y \bmod r(x)),\ \left\lfloor \frac{y}{r(x)} \right\rfloor\right) \tag{13}$$

Such a transformation is super-elastic when:

$$\lim_{x\to\infty} r(x) = \infty$$

It is polynomial when it preserves information about :

$$\lim_{x\to\infty} \frac{r(x)}{x^k} = c$$

It is linear when:

$$r(x) = c$$

The reference function of the translation is:

$$\epsilon'_r(x,y) = \left(r(x)\,x,\ \frac{y}{r(x)}\right)$$

We will assume that the function $r$ can be computed in time polynomial in the length of the input.

Observe that the reference function $\epsilon'_r$ is information neutral on the arguments:

$$\log x + \log y - \left(\log r(x)x + \log\frac{y}{r(x)}\right) = 0$$
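Definition 7 and the reference function can be written down directly. The sketch below assumes that $r$ maps positive integers to positive integers (so that $y \bmod r(x)$ is defined, which excludes $x = 0$ for $r(x) = x$) and evaluates the reference function over the reals. The names `elastic`, `reference`, `r_linear` and `r_polynomial` are illustrative.

```python
from math import log2
from typing import Callable


def elastic(x: int, y: int, r: Callable[[int], int]) -> tuple[int, int]:
    """Elastic translation by r (definition 7): (r(x) x + (y mod r(x)), floor(y / r(x)))."""
    rx = r(x)
    return rx * x + y % rx, y // rx


def reference(x: int, y: int, r: Callable[[int], int]) -> tuple[float, float]:
    """Reference function: (r(x) x, y / r(x)), real-valued and without the mod term."""
    rx = r(x)
    return rx * x, y / rx


def neutrality_gap(x: int, y: int, r: Callable[[int], int]) -> float:
    """log x + log y - (log r(x) x + log y / r(x)); zero up to floating point rounding."""
    u, v = reference(x, y, r)
    return log2(x) + log2(y) - (log2(u) + log2(v))


def r_linear(x: int) -> int:
    return 2          # linear case: r(x) = c with c = 2


def r_polynomial(x: int) -> int:
    return x          # polynomial case of equation 17: r(x) = x (needs x >= 1)


if __name__ == "__main__":
    print(elastic(5, 7, r_linear), elastic(5, 7, r_polynomial))
    print(neutrality_gap(123, 4567, r_linear), neutrality_gap(123, 4567, r_polynomial))
```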

From an algorithmic point of view, an elastic translation consists of two additional operations:

1. An information discarding operation on .

2. An information generating operation on .

Figure 10: An elastic translation from neighborhood α to neighborhood β.

A schematic overview of a linear elastic shift is given in figure 10. Here the letters are natural numbers. An arbitrary point in neighborhood α with coordinates close to the diagonal is translated to point in neighborhood β. The formula for the translation is given by definition 7: . We have and .

The information efficiency of an elastic transformation is:

$$\delta(\pi(\epsilon_r(x,y))) = \delta\left(\pi\left(r(x)\,x + (y \bmod r(x)),\ \left\lfloor\frac{y}{r(x)}\right\rfloor\right)\right)$$

The information efficiency of a linear elastic transformation by a factor $c$ is:

$$\delta\left(\pi\left(cx + (y \bmod c),\ \left\lfloor\frac{y}{c}\right\rfloor\right)\right)$$
###### Observation 5

Elastic transformations by a constant of the Cantor space replace the highly efficient Cantor packing function with different interleaving functions, each with a different information efficiency. Equation 13 must be seen as a meta-function or meta-program that spawns off different new programs.

This is illustrated by the following lemma:

###### Lemma 3

Linear elastic translations generate information.

Proof: This is an immediate effect of the use of the mod function. A linear elastic translation by a constant $c$ of the form:

$$\epsilon_c(x,y) = \left(cx + (y \bmod c),\ \left\lfloor\frac{y}{c}\right\rfloor\right)$$

has $c$ different information efficiency functions. Note that the function $y \bmod c$ produces all numbers $0 \le d < c$, including the incompressible ones that have no mutual information with : . For each value $d$ we get a function with a different information efficiency:

$$\delta(\pi(\epsilon_{c,d}(x,y))) = \log \pi\left(cx+d,\ \left\lfloor\frac{y}{c}\right\rfloor\right) - \log x - \log y$$

On a point by point basis the number is part of the information computed by . In other words the computation adds information to the input for specific pairs that is not available in the formula for . For linear transformations the effect is constant in the limit, so it is below the accuracy of Kolmogorov complexity.
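The observation that the mod term splits $\epsilon_c$ into $c$ interleaved functions $\epsilon_{c,d}$ can be illustrated as follows. This sketch assumes $c \ge 1$, the pairing convention of equation 8 and base-2 logarithms; it checks that $\epsilon_c$ is a bijection on a finite window via its point-wise inverse, and groups the point-by-point efficiency by the residue $d = y \bmod c$. The grouping, the window size and all names are illustrative choices, not taken from the text.

```python
from math import log2


def cantor_pair(x: int, y: int) -> int:
    """pi(x, y) = (x + y + 1)(x + y)/2 + y."""
    s = x + y
    return s * (s + 1) // 2 + y


def elastic_linear(x: int, y: int, c: int) -> tuple[int, int]:
    """epsilon_c(x, y) = (c x + (y mod c), floor(y / c))."""
    return c * x + y % c, y // c


def elastic_linear_inverse(u: int, v: int, c: int) -> tuple[int, int]:
    """Recover (x, y): u // c gives x and u mod c gives the residue d = y mod c."""
    return u // c, c * v + u % c


def efficiency_by_residue(c: int, n: int) -> dict[int, list[float]]:
    """Group log pi(epsilon_c(x, y)) - log x - log y over 1 <= x, y <= n by d = y mod c."""
    branches: dict[int, list[float]] = {d: [] for d in range(c)}
    for x in range(1, n + 1):
        for y in range(1, n + 1):
            value = log2(cantor_pair(*elastic_linear(x, y, c))) - log2(x) - log2(y)
            branches[y % c].append(value)
    return branches


if __name__ == "__main__":
    for d, values in efficiency_by_residue(3, 30).items():
        print(f"d={d}: mean efficiency {sum(values) / len(values):.4f} over {len(values)} points")
```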

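For typical cells on a line $y = hx$ the reference function $\epsilon'_c$ gives in the limit a constant shift, which can be computed as:

$$\lim_{x\to\infty}\frac{\pi(\epsilon'_c(x,y))}{\pi(x,y)} = \lim_{x\to\infty}\frac{\tfrac12\left(cx+\tfrac{hx}{c}+1\right)\left(cx+\tfrac{hx}{c}\right)+\tfrac{hx}{c}}{\tfrac12(x+hx+1)(x+hx)+hx} = \frac{(c^2+h)^2}{c^2(1+h)^2} \tag{14}$$

Note that this value depends only on $c$ and $h$, for all $x$ and $y$ on the line. It is at least $1$ exactly when $(c-1)(c-h) \ge 0$, since $(c^2+h)^2 \ge c^2(1+h)^2$ reduces to $c^2 + h \ge c(1+h)$. The general lift of the line $x = y$ for an elastic shift by a constant $c$ is: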

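$$\lim_{x\to\infty}\left(\log\left(\tfrac12\left(cx+\tfrac{x}{c}+1\right)\left(cx+\tfrac{x}{c}\right)+\tfrac{x}{c}\right)-\log cx-\log\tfrac{x}{c}\right) = \log\left(2+\frac{1}{c^2}+c^2\right) - 1 \tag{15}$$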

We get a better understanding of the extreme behavior of the reference function when we rewrite equation 14 as:

$$\frac{(c^2+h)^2}{c^2(1+h)^2} = \frac{c^2}{1+2h+h^2} + \frac{2h}{1+2h+h^2} + \frac{h^2}{c^2+2c^2h+c^2h^2}$$

and take the following limit:

$$\lim_{h\to\infty}\frac{(c^2+h)^2}{c^2(1+h)^2} = c^{-2}$$

If is constant then it has small effects on large in the limit. The reference function allows us to study the dynamics of well-behaved “guide points” independent of the local distortions generated by the information compression and expansion operations. Note that elastic transformations start to generate unbounded amounts of information in each direction in the limit on the basis of equation 14:

$$\lim_{c\to\infty}\frac{(c^2+h)^2}{c^2(1+h)^2} = \infty \tag{16}$$
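Equations 14 to 16 can be checked numerically. The sketch below assumes that the reference function $\epsilon'_c(x,y) = (cx, y/c)$ is evaluated over the reals, with the Cantor function extended to real arguments by the same polynomial. It compares the ratio at a large $x$ with the closed form of equation 14, shows the $h \to \infty$ and $c \to \infty$ behaviour, and evaluates the diagonal lift as $\log(2 + 1/c^2 + c^2) - 1$, which is what the left-hand side of equation 15 tends to. Function names are illustrative. Note that the pair $c = 2$, $h = 5$ gives a ratio below 1.

```python
from math import log2


def real_pair(a: float, b: float) -> float:
    """Cantor function extended to reals: (a + b + 1)(a + b)/2 + b."""
    s = a + b
    return s * (s + 1) / 2 + b


def shift_ratio(c: float, h: float, x: float) -> float:
    """pi(epsilon'_c(x, hx)) / pi(x, hx) for the reference function (cx, y / c)."""
    return real_pair(c * x, h * x / c) / real_pair(x, h * x)


def closed_form(c: float, h: float) -> float:
    """Right-hand side of equation 14."""
    return (c * c + h) ** 2 / (c * c * (1 + h) ** 2)


def diagonal_lift(c: float) -> float:
    """Limit of the diagonal lift: log(2 + 1/c^2 + c^2) - 1 in bits."""
    return log2(2 + 1 / c**2 + c**2) - 1


if __name__ == "__main__":
    x = 1e8
    for c, h in [(2, 1), (2, 5), (10, 3)]:
        print(c, h, shift_ratio(c, h, x), closed_form(c, h))
    print("h -> infinity:", closed_form(3, 1e9), "vs c^-2 =", 1 / 9)
    print("c -> infinity:", [round(closed_form(c, 2), 1) for c in (10, 100, 1000)])
    # At c = 2 the diagonal lift coincides with the value 2 log 5 - 3 of equation 12.
    print(diagonal_lift(2), 2 * log2(5) - 3)
```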

If $c$ grows unboundedly then the information efficiency of the corresponding reference functions goes to infinity for every value of $h$. Consequently, when $c$ goes to infinity, the reference functions predict infinite information efficiency in all directions, i.e. we get infinite expansion of information in all regions without the existence of regions with information compression. This clearly contradicts central results of Kolmogorov complexity if we assume that elastic translations are defined in terms of a single program. The situation is clarified by the proof of lemma 3: if $c$ goes to infinity we create an unbounded number of new functions that generate an unbounded amount of information.

### 4.4 Polynomial transformations

The picture that emerges from the previous paragraphs is the following: we can define bijections on the set of natural numbers that generate information for almost all numbers. The mechanism involves the manipulation of clouds of points of the set : sets with density close to the origin are projected into sparse sets of points further removed from the origin. This process can continue indefinitely.

In this context we analyse polynomial translations. The simplest example is the elastic translation by the factor $x$:

$$\epsilon_x(x,y) = \left(x^2 + (y \bmod x),\ \left\lfloor\frac{y}{x}\right\rfloor\right) \tag{17}$$

Figure 11: Row expansion by a superlinear factor $r(x)=x$ and a fragment of the corresponding elastic translation for the segment in Figure 3.

Figure 12: Above: Superlinear expansion by a function $r(x)=x$ up to 32. Below: Linear elastic expansion by a factor 2 up to the number 32.

A tiny fragment of the effects is shown in figure 11. There does not seem to be a fundamental difference compared to the previous examples, and there seems to be no difficulty in constructing such a translation along the lines suggested in the figure. However, upon closer inspection things are different, as can be observed in figure 12. The first table shows the computation of the function up to the number 32. Observe that the columns and are empty, while column has a value. This effect does not appear in the second table. Linear elastic translations induce a change in the direction of the iso-information line , but the image of the translation keeps a coherent topology at any stage of the computation.
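The gaps visible in figure 12 can be reproduced from equation 17 alone. This sketch assumes that the first coordinate of $\epsilon_x$ is what the figure calls the column, and starts at $x = 1$ because $y \bmod 0$ is undefined; it lists which columns below 32 are reached. Column $x^2 + d$ is reached only for $d < x$, so the columns $x^2 + x, \dots, (x+1)^2 - 1$ stay empty. Function names are illustrative.

```python
def polynomial_elastic(x: int, y: int) -> tuple[int, int]:
    """Equation 17: epsilon_x(x, y) = (x^2 + (y mod x), floor(y / x)), defined for x >= 1."""
    return x * x + y % x, y // x


def occupied_columns(limit: int, rows: int) -> set[int]:
    """First coordinates below `limit` reached from inputs 1 <= x, 0 <= y < rows."""
    hits = set()
    x = 1
    while x * x < limit:               # larger x can only land at or beyond x^2 >= limit
        for y in range(rows):
            u, _ = polynomial_elastic(x, y)
            if u < limit:
                hits.add(u)
        x += 1
    return hits


if __name__ == "__main__":
    limit = 32
    hit = occupied_columns(limit, rows=limit)
    print("occupied:", sorted(hit))
    print("empty:   ", sorted(set(range(1, limit)) - hit))
```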

Polynomial translations, on the other hand, are discontinuous. They tear the space apart into separate regions. An appropriate metaphor would be the following: expansion away from the origin over the -axis creates a vacuum that must be filled by a contraction over the -axis. Actually the creation of such a vacuum is an information discarding operation. The analysis above shows that the vacuum created by the shift described by equation 17 for cells on the line is bigger than the whole surface of the triangle . The effect is that the image of the translation becomes discontinuous.

###### Theorem 4.1

Polynomial elastic transformations of :

1. Discard and expand information on the line unboundedly.

2. Project a dense part of and on .

3. Generate an unbounded amount of information for typical points in in the limit.

Proof: The formula for a polynomial translation is:

$$\epsilon_r(x,y) = \left(c\,x^{k+1} + (y \bmod c\,x^k),\ \left\lfloor\frac{y}{c\,x^k}\right\rfloor\right)$$

Take $c = 1$ and $k = 1$. Consider a typical point in figure 10. We may assume that the numbers and are typical (i.e. incompressible) and thus is incompressible too. Remember that the Cantor function runs over the counter diagonal, which makes the line an iso-information line.

1. Discard and expand information on the line unboundedly:

• Discard information: Horizontally point will be shifted to location and to . The Cantor index for point is .

• Expand information: The cells from to will be “padded” in the strip between and . But this operation “steals” a number of cells from the domain above the line . Now take a typical point on location such that . This point will land somewhere between and , which gives:

$$\begin{aligned}\pi(f, f+m) &= \tfrac12\big(f+(f+m)+1\big)\big(f+(f+m)\big)+(f+m)\\ &= 2f^2 + 2fm + \tfrac12 m^2 + 2f + \tfrac32 m\\ &\gg f^2+2f+1\end{aligned}$$

Note that the effects depend on $f$ and $m$, so they are unbounded in the limit. Alternatively, observe that polynomial transformations are super-elastic and apply equation 16.

2. Project a dense part of and on : For every point all the cells to up to the line will end up on line . Since all points in this dense region are projected on the line, most points are incompressible.

3. Generate an unbounded amount of information for typical points in in the limit. Take a typical point such that :

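$$\pi(\epsilon_r(x,y)) > \pi(\epsilon_r(x,1)) = \tfrac12(x^2+2)(x^2+1)+1 > \tfrac12 x^4 \tag{18}$$

$$\delta(\pi(\epsilon_r(x,y))) > \log\tfrac12 x^4 - \log x - \log y \approx 4\log x - 2\log x = 2\log x$$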

which gives .

This argument can easily be generalized to other values of $c$ and $k$.
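The expansion of $\pi(f, f+m)$ used in item 1 of the proof can be verified with exact rational arithmetic on a finite grid. This is a sanity check, not a proof; the grid size and the function names are arbitrary choices.

```python
from fractions import Fraction


def pair(x: int, y: int) -> Fraction:
    """Cantor function pi(x, y) = (x + y + 1)(x + y)/2 + y, kept exact."""
    return Fraction((x + y + 1) * (x + y), 2) + y


def expanded(f: int, m: int) -> Fraction:
    """Closed form 2 f^2 + 2 f m + m^2/2 + 2 f + 3 m/2 for pi(f, f + m)."""
    return 2 * f * f + 2 * f * m + Fraction(m * m, 2) + 2 * f + Fraction(3 * m, 2)


if __name__ == "__main__":
    # Compare pi(f, f + m), its closed form, and the bound (f + 1)^2 = f^2 + 2f + 1.
    for f, m in [(3, 2), (10, 7), (100, 1)]:
        print(f, m, pair(f, f + m), expanded(f, m), (f + 1) ** 2)
```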

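Polynomial shifts generate information above the asymptotic sensitivity level of Kolmogorov complexity. Note that is still a computable bijection:

$$\begin{aligned}&\forall (x,y)\in\mathbb{N}^2\ \exists (u,v)\in\mathbb{N}^2\ \big(\epsilon_r(u,v)=(x,y)\big)\\ &u^2 \le x < (u+1)^2\\ &a = (u+1)^2 - u^2\\ &y = \left\lfloor \frac{v}{a} \right\rfloor\end{aligned}$$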

This analysis holds for all values including the values that are typical, i.e. incompressible.
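The conditions above can be read as the inverse of a map that spreads column $u$ over the $a = (u+1)^2 - u^2$ columns $u^2, \dots, (u+1)^2 - 1$. The sketch below implements that reading. It is an assumption: it divides by $a = 2u + 1$ rather than by $r(u) = u$ as in equation 17, and the names `spread` and `spread_inverse` are hypothetical. Under this reading every column is filled and the round trip is exact.

```python
from math import isqrt


def spread(u: int, v: int) -> tuple[int, int]:
    """Hypothetical variant: spread column u over the columns u^2 .. (u+1)^2 - 1."""
    a = (u + 1) ** 2 - u * u          # a = 2u + 1 target columns
    return u * u + v % a, v // a


def spread_inverse(x: int, y: int) -> tuple[int, int]:
    """Find (u, v) with u^2 <= x < (u+1)^2 and y = floor(v / a)."""
    u = isqrt(x)                       # largest u with u^2 <= x
    a = (u + 1) ** 2 - u * u
    v = a * y + (x - u * u)            # x - u^2 is the residue v mod a
    return u, v


if __name__ == "__main__":
    print([spread_inverse(x, 1) for x in range(10)])
```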

###### Observation 6

An immediate consequence is that the translation must be interpreted as a function scheme that produces a countable set of new functions, one for each column . Actually can be seen as an index of the function that is used to compute .

The difference between linear and polynomial elastic transformations marks the phase transition between continuous and non-continuous shifts. When we use the enumeration of the Cantor packing function to compute such shifts the time needed to compute the location of certain points on the line close to each other may vary exponentially in the representation of the numbers involved. On the other hand such translations can be easily computed as bijections on a point by point basis in polynomial time, since the coordinate of the image codes the index of the function that was used to compute it.

### 4.5 The information efficiency of arithmetical functions on sets of numbers

In this paragraph we study elastic translations, based on elementary arithmetical functions, that grow faster than any polynomial function. These translations are so “aggressive” that they destroy the topology of the space completely and project chaotic clouds of points on the -axis. A crucial tool for the construction of superpolynomial transformations is the bijective mapping of finite sets of numbers into the space :

###### Definition 8

is the powerset or set of subsets of . the set of finite subsets of . A characteristic function of an infinite subset of is , a monotone ascending function such that . Here is the index of in .

is uncountable, whereas can be counted. Consequently is also uncountable. Proofs of the countability of rely on the axiom of choice to distribute set into partitions with the same cardinality. A useful concept in this context is the notion of combinatorial number systems:

###### Definition 9

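The function $\sigma_k$ defines for each element

$$s = (s_k, \ldots, s_2, s_1) \in \mathbb{N}^k$$

with the strict ordering $s_k > \cdots > s_2 > s_1 \ge 0$ its index in a $k$-dimensional combinatorial number system as:

$$\sigma_k(s) = \binom{s_k}{k} + \cdots + \binom{s_2}{2} + \binom{s_1}{1} \tag{19}$$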

The function $\sigma_k$ defines for each set its index in the lexicographic ordering of all sets of numbers with the same cardinality $k$. The correspondence does not depend on the size of the set that the $k$-combinations are taken from, so it can be interpreted as a map from to the $k$-combinations taken from . For singleton sets we have: , . For sets with cardinality $2$ we have:

$$\begin{aligned}\sigma_2(\{1,0\}) &= \tbinom{1}{2}+\tbinom{0}{1} = 0\\ \sigma_2(\{2,0\}) &= \tbinom{2}{2}+\tbinom{0}{1} = 1\\ \sigma_2(\{2,1\}) &= \tbinom{2}{2}+\tbinom{1}{1} = 2\\ \sigma_2(\{3,0\}) &= \tbinom{3}{2}+\tbinom{0}{1} = 3\\ &\;\;\vdots\end{aligned}$$
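Equation 19 and its inverse can be sketched with Python's `math.comb`, which returns 0 when the lower index exceeds the upper one, so terms such as $\binom{1}{2}$ vanish as in the example above. The greedy unranking step (pick the largest $s_k$ with $\binom{s_k}{k} \le n$, subtract, repeat with $k-1$) is the standard way to invert a combinatorial number system; the function names are illustrative.

```python
from math import comb


def rank(s) -> int:
    """sigma_k(s) from equation 19 for a finite set s of natural numbers."""
    elems = sorted(s)                  # s_1 < s_2 < ... < s_k
    return sum(comb(e, i) for i, e in enumerate(elems, start=1))


def unrank(n: int, k: int) -> list[int]:
    """Inverse of rank: the k-element set with index n, listed in decreasing order."""
    result = []
    for i in range(k, 0, -1):
        c = i - 1                      # comb(i - 1, i) == 0, always a valid start
        while comb(c + 1, i) <= n:     # linear search for the largest admissible c
            c += 1
        result.append(c)
        n -= comb(c, i)
    return result


if __name__ == "__main__":
    print([unrank(n, 2) for n in range(6)])
```

A binary search could replace the inner linear search for large indices; the linear version keeps the sketch short.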

We can use the notion of combinatorial number systems to prove the following result:

###### Theorem 4.2

There is a bijection that can be computed efficiently.

Proof: We prove the theorem for the set . Let be the subset of all elements with cardinality . For each by definition 9 the set is described by a combinatorial number system of degree . The function defines for each element with the strict ordering its index in a -dimensional combinatorial number system. By definition 9 the correspondence is a polynomial time computable bijection. Now define as:

$$\phi^{+}_{||}(s) = \pi\big(|s|-1,\ \sigma_{|s|}(s)-1\big) \tag{20}$$

Note that both and are computable bijections. When we have the set we can compute its cardinality in linear time and compute from in polynomial time. When we have we can compute and compute from in polynomial time.

An elaborate example of the computation both ways is given in the appendix in paragraph 9. An example of the mapping can be seen in figure 13 under the header Cardinality Grid.

Note that in this proof the combinatorial number systems are defined on , while is defined on . The results can easily be normalized by linear time computable translations: . For reasons of clarity, in the rest of the paper, we will ignore such corrections for and use the function:

$$\phi_{||}(s) = \pi\big(|s|,\ \sigma_{|s|}(s)\big) \tag{21}$$
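A sketch of equation 21 and its inverse, combining the Cantor function with the combinatorial number system. It assumes the pairing convention of equation 8. Because the empty set is the only set of cardinality 0, codes $\pi(0, i)$ with $i > 0$ have no preimage (one of the corrections the text chooses to ignore), and the decoder returns `None` there. All names are illustrative.

```python
from itertools import combinations
from math import comb, isqrt


def cantor_pair(x: int, y: int) -> int:
    s = x + y
    return s * (s + 1) // 2 + y


def cantor_unpair(n: int) -> tuple[int, int]:
    w = (isqrt(8 * n + 1) - 1) // 2
    y = n - w * (w + 1) // 2
    return w - y, y


def rank(s) -> int:
    """sigma_k (equation 19) for a finite set of natural numbers."""
    return sum(comb(e, i) for i, e in enumerate(sorted(s), start=1))


def unrank(n: int, k: int) -> frozenset[int]:
    """Inverse of rank for sets of cardinality k (greedy decoding)."""
    elems = []
    for i in range(k, 0, -1):
        c = i - 1
        while comb(c + 1, i) <= n:
            c += 1
        elems.append(c)
        n -= comb(c, i)
    return frozenset(elems)


def phi_card(s) -> int:
    """Equation 21: phi_||(s) = pi(|s|, sigma_|s|(s))."""
    return cantor_pair(len(s), rank(s))


def phi_card_inverse(n: int):
    """Decode n back to a finite set, or None where equation 21 has no preimage."""
    k, i = cantor_unpair(n)
    if k == 0 and i != 0:              # column 0 holds only the empty set
        return None
    return unrank(i, k)


if __name__ == "__main__":
    subsets = [frozenset(c) for k in range(4) for c in combinations(range(3), k)]
    for s in subsets:
        print(sorted(s), "->", phi_card(s))
```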

The construction of the proof of theorem 4.2 separates the set into infinitely many countably infinite classes, ordered in two dimensions: in the columns we find elements with the same cardinality, in the rows we have the elements with the same index.

Figure 13: This figure illustrates the effects of two specific injections (based on sum and product) of the form $\phi_\zeta(s) = \pi(\zeta(s), \theta_{\zeta(s)}(s))$ (equation 22). First table: basic Cantor grid. Second table: mapping to finite sets of numbers. Third table: effect of the sum translation. Fourth table: effect of the product translation.

We give some examples. Consider the four tables in figure 13 with the following explanation:

1. The first table is a fragment of the simple Cantor function.

2. The second table illustrates the lexicographic ordering of finite sets of numbers in according to $\phi_{||}$ (equation 21). Note that we find the set on the line in column .

3. The third table computes the shift for the sum function .

4. The fourth table computes the shift for multiplication .

### 4.6 A general theory of planar elastic translations for arithmetical functions

This suggests a general construction for the study of the information efficiency of arithmetical functions:

###### Definition 10

, an injection sorted on , is a mapping of the form:

$$\phi_\zeta(s) = \pi\big(\zeta(s),\ \theta_{\zeta(s)}(s)\big) \tag{22}$$

where $\pi$ is the Cantor function and:

• $\zeta$ is a general arithmetical function operating on finite sets of numbers. It can be interpreted as a type assignment function that assigns the elements of to a type (column, sort) represented as a natural number.

• $\theta_{\zeta(s)}$ is an index function for each type , that assigns an index to the set in column . The equation should be read as: is the -th set for which .

By theorem 4.2 we have that is efficiently countable. We can use as a calibration device to evaluate . If is a sorted injection the following mappings exist and such that , are identities in . Given this interconnectedness we can always use to construct :

###### Theorem 4.3

If the function exists and can be computed in polynomial time then sorted injections of the form exist and can be computed in time exponential in the representation of .

Proof: We have to compute . The functions and can be computed in polynomial time. The function can be computed using with algorithm 1. This algorithm runs in time exponential in the representation of which is the index of the set : .
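The counting step in this proof can be sketched by brute force. Algorithm 1 is not reproduced in this section, so the code below is only one possible reading: it assumes that $\theta_{\zeta(s)}(s)$ counts the sets $t$ with $\zeta(t) = \zeta(s)$ whose code $\phi_{||}(t)$ is smaller than $\phi_{||}(s)$, and it uses $\zeta = $ sum, as in the third table of figure 13. The loop visits every code below $\phi_{||}(s)$, which is why its cost grows exponentially in the length of that code. All names are hypothetical.

```python
from itertools import combinations
from math import comb, isqrt


def cantor_pair(x, y):
    s = x + y
    return s * (s + 1) // 2 + y


def cantor_unpair(n):
    w = (isqrt(8 * n + 1) - 1) // 2
    y = n - w * (w + 1) // 2
    return w - y, y


def unrank(n, k):
    """k-element set with combinatorial index n (greedy decoding of equation 19)."""
    elems = []
    for i in range(k, 0, -1):
        c = i - 1
        while comb(c + 1, i) <= n:
            c += 1
        elems.append(c)
        n -= comb(c, i)
    return frozenset(elems)


def phi_card(s):
    """Equation 21: pi(|s|, sigma_|s|(s))."""
    rank = sum(comb(e, i) for i, e in enumerate(sorted(s), start=1))
    return cantor_pair(len(s), rank)


def phi_card_inverse(n):
    """Decode a code of equation 21; None where there is no preimage."""
    k, i = cantor_unpair(n)
    return None if k == 0 and i != 0 else unrank(i, k)


def theta_by_counting(s, zeta):
    """Number of sets t with zeta(t) == zeta(s) and phi_card(t) < phi_card(s).

    Walks through every code below phi_card(s), so the running time grows with the
    value of that code, i.e. exponentially in its binary length.
    """
    target = zeta(s)
    count = 0
    for n in range(phi_card(s)):
        t = phi_card_inverse(n)
        if t is not None and zeta(t) == target:
            count += 1
    return count


def phi_zeta(s, zeta):
    """Equation 22: pi(zeta(s), theta_zeta(s)(s))."""
    return cantor_pair(zeta(s), theta_by_counting(s, zeta))


if __name__ == "__main__":
    for k in range(4):
        for c in combinations(range(4), k):
            s = frozenset(c)
            print(sorted(s), "sum", sum(s), "->", phi_zeta(s, sum))
```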

Note that algorithm is a counting algorithm as discussed in paragraph 2.3 and that is a unique description of as referred to in our central research question 1: Are there uniquely identifying descriptions of objects that contain more information than names of the objects they denote? Observe that this procedure is a meta-algorithm. It abstracts completely from the semantics of the function .

There is a spectrum of planar translations as is illustrated by table 1. The interpretation of this table is as follows:

1. The first column gives the definition of the elastic function classes with a planar representation with increasing power:

   1. The Cantor function with density in .

   2. Linear translations by a factor .

   3. Polynomial translations by .

   4. Sum translations, based on the sum of the set of natural numbers associated with the cell.

   5. Product translations, based on the product of the set of natural numbers associated with the cell.

   6. The trivial translation, defined by the function that projects all the cells of the Cantor space on the line .

2. The second column gives the functions to compute the information efficiency of the translation over the -axis.

3. The third column gives the number of resulting different efficiency functions in the limit: constant for linear transformations, growing unboundedly for polynomial translations, and a different efficiency function for each cell for the Sum and Product translations. For the trivial translation defines itself as its own efficiency function.

4. The fourth column gives the number of different efficiency functions per cell. Only for the Sum and Product translations does this become larger than .

5. The fifth column gives the resulting density function for over the counter diagonal. Up to polynomial translations the density is . For Sum and Product translations the bijection becomes an injection, because there are only finitely many sets that add up to, or multiply up to, a given number . For the sum translation the density over the counter diagonal is roughly in the limit, for the product translation the same density becomes logarithmic. For the trivial translation it becomes .