Addition Machines, Automatic Functions and Open Problems of Floyd and Knuth

11/17/2021
by Sanjay Jain, et al.
National University of Singapore

Floyd and Knuth investigated in 1990 register machines which can add, subtract and compare integers as primitive operations. They asked whether their bound on the number of registers needed for multiplying and dividing fast (that is, in time linear in the size of the input) can be improved and whether one can output the powers of two summing up to a positive integer in subquadratic time. Both questions are answered positively. Furthermore, it is shown that every function computed with only one register is automatic and that automatic functions with one input can be computed with four registers in linear time; automatic functions with a larger number of inputs can be computed with five registers in linear time. There is a nonautomatic function with one input which can be computed with two registers in linear time.

1 Introduction

Hartmanis and Simon [22] showed that register machines which can add and multiply and perform bitwise Boolean operations in unit time, and which can hold arbitrarily big numbers, can solve NP-hard problems (and beyond) with polynomially many primitive steps. In contrast to this, register machines which have only addition, subtraction and comparison instructions on the integers can carry out arbitrary polynomial time operations in polynomial time but not more, so they are a realistic model of computation with primitive steps that are more comprehensive than those of Turing machines or counter machines. Because additions are counted at unit cost, some operations become faster; for example, multiplication of two n-bit numbers can be done in time linear in n instead of the fastest known bound of O(n log n), which counts multitape Turing machine steps and was recently obtained by Harvey and van der Hoeven [12]. By definition, addition and subtraction take constantly many steps instead of linearly many. On the other hand, Stockmeyer [28] showed that even if a machine can add, subtract and multiply in unit time, certain operations like computing the modulus are not faster than usual, and halving as well as testing odd versus even cannot be sped up either. In the same way as Turing machines need, for many regular sets, to inspect all digits to determine membership, a register machine which can add is not faster than a Turing machine, as Stockmeyer's examples for certain regular sets (like the set of odd numbers) show.

Floyd and Knuth [9] systematically studied register machines which can add, subtract and compare and called them "addition machines"; in the following text, addition machines and register machines are used synonymously. Knuth recalled this work in Floyd's obituary [20] as one of the joint works he enjoyed a lot in the later stage of their collaboration. They found that addition machines form a natural model and provided various algorithms for arithmetic on them, in particular as they looked for alternatives to the usual Turing machine models with their tiny primitive steps. Anisimov [1, 2] took up the idea of Floyd and Knuth of using ideas borrowed from Fibonacci numbers for implementing arithmetic on large numbers with addition and investigated it thoroughly; later, he and Novokshonov implemented the algorithms of Floyd and Knuth [3, 4, 23].

Schönhage [26] has proven that allowing subtraction and comparison increases the power of machines which can only add and test equality if and only if allowing division increases the power of such machines. Simon [27] extended these studies; in particular, he took register machines which can add, subtract and compare as the base case and then examined in detail – beyond what he had done jointly with Hartmanis [22] – what impact additional operations have. This line of work was also extended by other authors such as Trahan and Bhanukumar [29].

The model of Floyd and Knuth [9] is indeed well-motivated, given that many of the additional operations investigated above increase the computational power of polynomial time computation substantially. Floyd and Knuth were, however, less interested in comparing their model with variations than in establishing the power of linear time operations and a fine-grained time complexity for natural operations on their model. Their model is motivated by the idea that a central processing unit of a computer has only a few accumulators or registers which perform basic arithmetic and other operations, and that few of these registers are sufficient; when abstracting the model and allowing arbitrarily large values in the registers, the additive operations turn out to be an adequate choice, as they, as mentioned before, preserve the class of polynomial time operations.

The precise model of Floyd and Knuth [9] allows the following operations: adding or subtracting two registers and assigning the value to a (possibly identical) third register; conditional branching depending on the outcome of a comparison of two registers; unconditional branching including loops; reading of input into registers and writing of output from registers. However, Floyd and Knuth [9] did not allow operations with constants; as some programs need to handle constants, they allowed that one additional input is read in and stored in a dedicated register in order to have access to constants. Floyd and Knuth also considered operations on other subgroups of the real numbers than the integers, but the present work does not go into the details of these data structures different from the integers. However, the authors of the present work think that operations and comparisons with constants are very natural and should be included in the instruction set.

Floyd and Knuth [9] were interested in precisely two types of questions: (a) How fast are their algorithms in terms of the order of steps needed as a function of the input size? Here the precise way of measuring the size depended on their own algorithms and the question was mainly whether these can be outperformed. (b) Furthermore, for those operations where the time complexity is optimal, what is the number of registers needed by the addition machine to carry them out? Both the time complexity and the number of registers (as a type of space measure) are aspects fundamental to computer science. Space has two aspects, (1) the size of the numbers in the registers and (2) the number of registers itself. As (2) is constant, it does not influence the asymptotic space usage; however, it makes a big difference what this constant is, and when it is too small, many operations can only be carried out in suboptimal time. Furthermore, explicit restrictions on (1) might contravene the spirit of the work, which allows the numbers to be unrestricted in size and measures the influence of the size only indirectly by its effect on the runtime of the algorithm, so that (2) is the only real space parameter available. Compare this with SAT: the expressibility of SAT-formulas depends on the number of literals allowed per clause; 1 literal per clause restricts expressiveness very much and does not allow coding anything interesting; 2 literals per clause allow coding so much that counting solutions becomes hard while checking solvability is still easy; 3 literals per clause make the problem hard to solve and NP-complete. Similarly, the number of registers available allows for more and more complex linear time algorithms to be carried out, and it is of scientific interest to find, for natural operations like multiplying, dividing and so on, where this threshold is. As the basics were already known from the works of Hartmanis and Simon [22, 27] as well as others, the research now looked more at the details. Floyd and Knuth were able to determine optimally the computation time and the number of needed registers for the greatest common divisor of two numbers (linear time, 3 registers) and obtained for other operations like multiplying and dividing the optimal time bound while they were unsure of the number of registers needed (Question (2)). Questions (3), (4) and (5) are then questions where they obtained a good algorithm but could not prove its optimality with respect to computation time; for these topics the number of registers was secondary to them, though it is important. Here is the precise list of the open questions of Floyd and Knuth [9]:

  1. Can the upper bound in Theorem 1 be replaced by ?

  2. Can an addition machine with only registers compute in operations? Can it compute the quotient in operations?

  3. Can an addition machine compute mod in operations, given ?

  4. Can an addition machine sort an arbitrary sequence of positive integers in steps?

  5. Can the powers of 2 in the binary representation of a number be computed and output by an addition machine in steps? For example, if , the program should output the numbers in some order. (That is, the output does not need to be in the default top-down order.)

  6. Is there an efficient algorithm to determine whether a given matrix of integers is representable as a product of matrices of the form ? ( is the diagonal matrix of the identity mapping and has a single and everything else at the coordinate .)

For details of [9, Theorem 1] and further explanations of the notions in (1) and (6), please consult their paper. The present work provides positive answers to (2) and (5): Floyd and Knuth [9] showed that one can compute the greatest common divisor with three registers in linear time and they also showed that, in the absence of constants as operands, this number is optimal. For the operations listed in (2), their algorithms need six registers. The theorems in Section 3 below show that one can solve the operations in (2) with four registers — or, if one, like Floyd and Knuth [9], does not allow operations with constants, the algorithm needs five registers and still matches the bound of the open question. The operations considered in Problem (2) are to compute the square of a number and the integer division in the time needed by the algorithms of Floyd and Knuth which used six registers, but to bring the register number down to at most five. The runtime for the squaring has to be linear in the number of bits of the input, while for the integer division, the algorithm has to run in time linear in the number of bits of the output; note that the latter can be smaller than the number of digits of each of the inputs.

In Question (5), Floyd and Knuth [9] looked at the binary representation of natural numbers and wanted to know the powers of two involved. In other words, given an unknown finite set of exponents and reading the corresponding sum of powers of two as input, can an addition machine compute and list out all these powers, in any order, in subquadratic time? The answer is affirmative and the basic algorithm is first to translate the number given as binary input, by reading the bits at the top position and adding up the corresponding powers, so that one obtains the binary number in reverse order, and then to read out the bits at the top while doubling up a variable from 1, 2, 4, ..., outputting the current power whenever the corresponding bit is 1. This algorithm runs in linear time and, since in some cases linearly many numbers are output, the runtime is optimal.

For (3) and (4), the obstacle to answering these questions is what a multivariate little Oh expression precisely means. There are various competing definitions and the following are the two most popular notions, where the first is taken from the Wikipedia page on the big Oh Calculus:

  1. The definition on Wikipedia, based on the algorithms textbook of Cormen, Leiserson, Rivest and Stein [8]: Multivariate big Oh-expressions can be resolved by considering the inputs as sets of tuples without any deeper structure. Then the cut-off point (whether the variables range over natural numbers starting with or starting with ) makes a difference: the Wikipedia page provides an example that depends on this start of the natural numbers. Here is a further example: in the case that the variables start with , while this fails in the case that the variables start with (choose and let run up). In particular, the Wikipedia definition says contains all functions for which there is a constant such that, for all but finitely many pairs , . In other words, there are two constants such that whenever then . Multivariate little Oh-expressions are not directly discussed on the Wikipedia page, but analogously to the above, one can define that iff for every rational there is a such that for all with , .

  2. Another popular definition of multivariate big Oh and Little Oh Calculus does not use a maximum like Wikipedia but a minimum; for example Howell [13, 14] justifies this definition in his technical report and uses it in his textbook. For the little Oh-notation, this means that iff for every rational there is a constant such that whenever then .

One can define variants (a) and (b) analogously if more than two variables are involved or, as in (4), the number of variables is a variable itself.
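
As an illustration of how much this choice matters (the example is ours, not taken from the cited sources), compare the function f(m,n) = m with the bound m·n. Under definition (a), f is not in o(m·n): on the tuples (m,1) at least one coordinate exceeds every threshold, yet f(m,1) = m·1, so the ratio stays 1 and no factor below 1 ever suffices. Under definition (b), f is in o(m·n): whenever both coordinates are at least n0, one has f(m,n)/(m·n) = 1/n ≤ 1/n0, which falls below any prescribed rational ε once n0 is chosen larger than 1/ε.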

The findings are now the following: When one takes the Wikipedia definition (a) of the little Oh Calculus, one can answer both (3) and (4) in the negative, that is, these algorithmic improvements do not exist. However, consider a special case of (4) where a register machine first reads a number n, then n further numbers and then an index, and then has to output the number with that index. In this special case the register machine only has to archive, but not sort, the numbers. Now for Definition (a) of the little Oh Calculus, the answer is that this cannot be done in the stated number of steps. However, for version (b) of the definition, the answer is that it is indeed possible (where now the minimum of n and all the read values has to be above the threshold). Thus the answer to questions (3) and (4) might indeed be quite sensitive to which underlying definition of the multivariate Oh Calculus is chosen and therefore these questions are only partially answered until answers for all common variants of the little Oh Calculus are found.

The work on this paper revealed a close connection between automatic functions and the number of registers of a register machine; here an automatic function is a generalisation of the notion of regular sets to functions, for more details on this topic see Section 5 below. For machines having only one register, all functions computed are, perhaps partial, automatic functions, independently of the computation time used; furthermore, they form a quite restricted subset of the set of all automatic functions. On the other hand, automatic functions can all be implemented with only a few registers. Automatic functions with several inputs (this number of inputs has to be constant) can be computed with five registers. Furthermore, register machines with two registers can compute nonautomatic functions in linear time, even in the setting of Floyd and Knuth where there are no operations with constants.

The results in this paper were jointly developed by the authors when Xiaodong Jia wrote his UROP thesis about this topic [21].

2 On the Methods and Notions in this Paper

[Allowed Commands] In the following let be registers (which might refer to the same register in this definition only) and be a constant from and let be a line number. Furthermore, ranges over the comparison operators . The following types of commands are allowed:

  1. ; ; ; ; ; ;

  2. Read ; Write ;

  3. If then begin end else begin end;

  4. If then begin end else begin end;

  5. Goto .

The else-part of if-then-else statements might also be omitted; similarly, bracketing by "begin" and "end" might be omitted for single statements. Below, additional constructs will be allowed, as long as they can be translated into the above constructs in a way that the number of operations increases only by a constant per use of the construct and the total number of registers used by the program is not changed. These additional constructs therefore only increase readability without making an essential change.

[Usage of Constants] Floyd and Knuth [9] allowed the usage of integer constants only indirectly, by reading the constant once from the input and storing it in some register where it is available until no longer needed. The main constant needed is 1 (the constant 0 is simply the difference of a register with itself). However, this one additional register is also enough, as every integer constant to be added or subtracted can be replaced by adding or subtracting the constant 1 a constant number of times. Similarly, for comparing a register with a constant, one subtracts the constant from the register, does the comparison and adds the constant back.
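
The following minimal sketch (in Python, with a hypothetical register x and the stored constant in one; the loops unroll into constantly many instructions, since c is a program constant) illustrates how operations with a constant are reduced to operations with the single stored constant 1.

  # Sketch: simulating "x := x + c" and the test "x < c" when only the
  # constant 1 is available in a dedicated register `one` (assuming c >= 0).

  def add_constant(x, one, c):
      for _ in range(c):        # c additions of `one` replace one addition of c
          x = x + one
      return x

  def less_than_constant(x, one, c):
      for _ in range(c):        # subtract c from x by repeated subtraction of 1
          x = x - one
      result = x < 0            # compare with zero
      for _ in range(c):        # restore the original value of x
          x = x + one
      return result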

Though operations with constants are an obstacle for proving lower bounds, the authors of this paper think that allowing operations with constants is natural. For example, early CPUs like the MC6800 from Motorola could add, subtract and compare one register (called the accumulator) either with another register or with constants; furthermore, the CPU could not multiply – one of the authors used a computer with this CPU at his secondary school.

If one wants to translate results on numbers of registers sufficient for a computation of a function from the model of this paper to the model of Floyd and Knuth, one has in general to add one to the number of registers used.

[Variables and Registers] When writing programs, one might, in addition to the registers, also consider variables which hold values from a constant range, say single bits. These variables do not count towards the bound on the number of registers, as they can be implemented by doubling up the program (in the case of a one-bit variable) and then jumping back and forth between the two copies of the program, each copy being adjusted to the variable having the value 0 in the first copy and the value 1 in the second copy; here is an example.

  1. Read ; read ;

  2. If then else ;

  3. Let ; let ;

  4. Let ;

  5. Write .

An optimised way of implementing this without using is the following:

  1. Read ; read ;

  2. If then goto 5;

  3. Let ; let ;

  4. Goto 7;

  5. Let ; let ;

  6. Let ;

  7. Write .

In the worst case, the full program has to be transformed into consecutive copies, one for each value the variable can take. One loads a value into such a variable by jumping into the corresponding copy; when it is read out, the variable is, in each copy of the program, replaced by the corresponding integer constant. Floyd and Knuth [9] used a similar method in the program where they permuted the order of the registers without giving the code for it. Letters at the beginning of the English alphabet are used for such constant-range variables, while letters at the end of the English alphabet are used for registers.
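
As a minimal sketch of this compilation (in Python, with hypothetical registers x, y and a one-bit variable a which is removed by duplicating the code), the two branches below play the role of the two program copies; inside each copy the value of a is a fixed constant and no register is spent on it.

  def run(x, y):
      # with the variable:  if x < y then a := 1 else a := 0; ...;
      #                     if a = 1 then x := x - y; write x
      # without the variable, the program is duplicated:
      if x < y:          # jump into the copy of the program for a = 1
          x = x - y      # here a is known to be 1, so its test disappears
      else:              # copy of the program for a = 0
          pass           # here a is known to be 0 and nothing happens
      return x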

3 Solving Open Problems (2) and (5) of Floyd and Knuth

In Open Problem (2), Floyd and Knuth [9] were interested in the optimal number of registers needed for basic operations on a register machine that can add, subtract and compare; they considered general subgroups of the reals, but concentrated on the integers, which are also the model of this work. A side constraint is that this number of registers should take the time consumption of the operation into account and so, for multiplication, remainder and division, the time should remain linear and not larger. Floyd and Knuth [9] gave an unbeatable algorithm for calculating the remainder using the Fibonacci numbers to go up and down in exponentially growing steps from the smaller number to the bigger number and back. Furthermore, they showed that these tasks cannot be solved with two registers. While the Fibonacci method is unbeatable for many tasks, some of the following problems could not be solved by them with this method. The present work addresses the corresponding problems left open in their work. The next theorem solves the first part of Open Problem (2) of Floyd and Knuth [9].

Multiplication can be done using four registers in time linear in the size of the smaller number (in absolute value). In particular the squaring of a number can be done with four registers in linear time, that is, in time proportional to .

Proof.

Multiplication can be done with four registers in time linear in the size of the smaller number. The algorithm is as follows:

  1. Begin Read ; read ; let ;

  2. If then begin let ; let end;

  3. If then begin let ; let end;

  4. If then begin let ; let ; let end;

  5. Let ; let ; let ; let ;

  6. If then goto 7;
    let ; goto 6;

  7. Let ; if then goto 8;
    let ;
    if then begin let ; let end;
    goto ;

  8. If then begin let end;
    Write ; End.

The following arguments verify the runtime properties of the algorithm. The program has loops only in lines 6 and 7. In the first loop, a register is doubled up until it reaches the bound and this takes a number of rounds linear in the number of bits of the input; in the second loop the number gets doubled up until it reaches the bound and each time the high order bit is removed when it is not 0. Again this takes linearly many rounds. Note that this loop terminates, as it is entered with the register being strictly positive and the bound being a power of 2 greater than it. Furthermore, note that the lowest bit of the register marks the end of the number; it shifts to the front until it reaches the position of the 1 in the bound, which is then the exit condition for the loop. Note that this exit condition will eventually be taken, as all higher order bits are deleted until only one nonzero bit remains.

Lines 2, 3 and 8 handle the signs of the operands, where the product of the signs is stored in a constant-range variable in order to handle negative numbers. See the introduction for the validity of the claim that one can have variables which take only constantly many values without having to bump up the number of registers. Line 4 handles an optimisation which was not required by Floyd and Knuth and which just orders the two inputs; this is not relevant for squaring.

Now some more explanations for the correctness of the overall algorithm; these arguments assume that the numbers satisfy the conditions of the main case, so that the commands covering special cases can be ignored in the algorithm. Under these assumptions, the modifications in lines 2, 3, 4 and 8 do not apply. Now assume that x and y are the binary numbers 1101 and 100001. Then, after line 5, the initial values of x, y, v, w are as follows:

  x     11011
  y    100001
  v         1
  w         0

Note that a one-bit is appended to x as an end-marker of the number; all trailing zeroes are then the result of multiplications with 2 and the algorithm can so detect the last digit of the number without an extra tracking variable which indicates the word end. The loop in line 6 doubles up v until v > x and this gives the following result:

  x     11011
  y    100001
  v    100000
  w         0

In line 7, the registers will be doubled up in each round with a special exit out of the loop if x = v, as that means that the coding bit has reached the position of the 1 in v and that the multiplication is complete. The following situation arises: if the original input is 1101 then the appended coding bit makes the number 11011 and the invariant of the loop in line 7 is that, after processing i iterations of the loop, the value of x is the number formed by the not yet processed digits followed by the coding bit, times 2 to the power i (so i zeroes after the last 1), while w has the value of the already processed prefix times y (in the above example y was chosen such that one can see what this product is without much computation).

If all digits have been processed then, after doubling up x, the registers x and v have the same value and the result of the multiplication is identical with the current content of w, which will be output.

Otherwise, the leading bit of x is, after the doubling up of x, at the same position as the single bit of v, and x is compared with v. If x ≥ v then the leading bit is 1; v is subtracted from x, w is doubled up and then y is added to w. If x < v then the leading bit is 0 and only w is doubled up; however, also then the bit is implicitly taken into account, as adding 0 times y changes nothing. In both cases, x holds after the doubling up the remaining digits followed by the coding bit and the trailing zeroes, and w holds at the end of the body of the loop the value of the prefix read so far times y.

This is now illustrated by giving the values of the registers during the iterations of the loop using the example from above; note that only x and w change while y and v always remain the same, thus the latter two are listed only once.

  x     11011 // x before the loop;
  w         0 // w before the loop;
  y    100001 // y throughout the loop;
  v    100000 // v throughout the loop;
  x     10110 // leading 1 read out and v subtracted from x;
  w    100001 // y added to w;
  x     01100 // leading 1 read out and v subtracted from x;
  w   1100011 // w doubled up and y added to w;
  x     11000 // leading 0 read out and no subtraction;
  w  11000110 // w doubled up and no addition;
  x     10000 // leading 1 read out and v subtracted from x;
  w 110101101 // w doubled up and y added to w;
  x    100000 // after the last doubling up of x, v=x and loop end;
  w 110101101 // value of w from last loop body returned as result.

This example illustrates how the multiplication works. Whether one multiplies the intermediate result in each round with the base (ten at school and two here) or whether one uses shifted versions of the second factor to be multiplied with the digits, these are two very similar approaches to the same type of algorithm. ∎
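
The following Python sketch summarises the technique for nonnegative inputs; the variable names mirror the registers x, y, v, w of the proof, while the sign handling and the ordering of the operands from lines 2–4 and 8 are omitted.

  def multiply(x, y):
      x = x + x + 1            # append the coding bit 1 at the end of x
      v = 1
      while v <= x:            # loop of line 6: smallest power of two above x
          v = v + v
      w = 0
      while True:              # loop of line 7: read the bits of x from the top
          x = x + x            # shift x by one position
          if x == v:           # only the coding bit is left: multiplication done
              return w
          w = w + w            # shift the partial product
          if x >= v:           # the leading bit of x is 1:
              x = x - v        #   remove it and
              w = w + y        #   add y to the partial product

  # multiply(13, 33) returns 429, matching the worked example above.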

The next theorem solves the second part of Open Problem (2) of Floyd and Knuth [9].

The integer division can be carried out with four registers in time proportional to the smallest natural number k such that 2 to the power k times the absolute value of the divisor is greater than or equal to the absolute value of the dividend.

Proof.

The algorithm needs one additional register. As in the algorithm before, the key idea is to read out binary numbers at the top by comparing them with a power of 2; here one compares the number to be divided with a power of 2 times the divisor. This value is stored in a register, and another register is doubled up alongside in order to know for how many rounds one has to run the algorithm.

  1. Begin read ; read ; let ;

  2. If then begin ; end;
    if then goto 6;
    if then begin ; end else begin end;
    let ;

  3. If then goto 4; ; goto 3;

  4. if then begin ; end;
    if then goto 5;
    let ; let ; let ; goto 4;

  5. If then begin let ; if then let end;

  6. Write end.

The next paragraphs give the verification for the main case. The other cases are left to the reader; due to the main-case assumptions, the instructions in Line 2 (except for the initialisation) and Line 5 can be ignored, as they handle the exceptions for the case that the assumption is not satisfied. The output in the case of division by 0 is irrelevant.

While the register holding the scaled divisor is not yet greater than or equal to the dividend, it is doubled up in Line 3. Note that it therefore holds the divisor times 2 to the power k for the minimal k such that this value is at least the dividend after processing Line 3; k can be 0 and is identical with the same-name parameter in the runtime bound of the theorem.

For Line 4, let the respective values before entering the line be the absolute values of what has been read into the registers at the beginning of the program.

The invariant of Line 4 is that after rounds of this line, has the value and has the value and . In round , the algorithm first checks whether and, if so, subtracts from and increments by ; in other words, gets subtracted from and added to so that the difference before and after this update is the same; furthermore, after the update the property holds. If now then the algorithm quits the loop, else it doubles up and goes into the next round of the loop. The algorithm indeed quits after rounds, as is initially , doubled up exactly once in each round and . When the algorithm quits the loop, then and, dividing by , one sees that so that the quotient register is indeed as the integer division requires. Thus it holds the intended rounded-down value of the quotient.

The loops in lines 3 and 4 run times, as in line 3 one measures how often one has to double up (which was ) to reach or overshoot the value of and the loop in line 4 brings up by doubling up this register until is reached. The handling of the sign is done in lines 2 and 5 and the values of are at least in the loops in 3 and 4. ∎
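
A Python sketch of the quotient computation (for a nonnegative dividend and positive divisor, leaving out the sign handling of lines 2 and 5) may clarify the two loops; the sketch uses a few more names than the four registers of the theorem, which the actual program avoids by reusing registers more economically.

  def divide(x, y):
      w, v = y, 1
      while w <= x:        # line 3: double w = y*2^k until it exceeds x
          w = w + w
          v = v + v        # v = 2^k records how many rounds are needed
      q, z = 0, 1
      while z < v:         # line 4: exactly k rounds
          x = x + x        # invariant: x = 2^i * x0 - q*w and 0 <= x < w
          q = q + q
          if x >= w:
              x = x - w    # read out one bit of the quotient
              q = q + 1
          z = z + z
      return q             # q equals the original x divided by y, rounded down

  # divide(429, 33) returns 13; the number of rounds is proportional to k.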

In Open Problem (5), Floyd and Knuth [9] ask whether there is a register machine which can in subquadratic time compute the powers of 2 summing up to a given number, in arbitrary order (but outputting each power only once). For example, for , the algorithm should output in arbitrary order. The next algorithm runs in linear time and needs only four registers; thus it satisfies the subquadratic runtime bound requested by Floyd and Knuth [9].

On input of a number with n binary digits, a register machine with four registers can output the powers of two summing up to that number in time linear in n.

Proof.

The idea is to first reverse the bits in the representation of and then to read out the powers from the top, now using that the -th bit stands for .

  1. Begin read ; if then begin let end;
    let ; let ; let let ; let ;

  2. If then goto 3; let ; goto 2;

  3. If then goto 4;
    if then begin let ; let end;
    let ; let ; goto 3;

  4. Let ; let ;

  5. If then begin write ; let end;
    let ; let ; if then goto 5;
    End.

First one makes sure that the input is not negative. Then one appends a termination marker, an additional 1 at the end, in order to put the correct number of zeroes when inverting the number. This is all done in line 1.

The first loop in line 2 determines a power of 2 which is a proper upper bound on the stored number. The bit of this bound is two digits ahead of the largest power of 2 in the sum. So x holds the original input appended with a coding bit 1, and the register y holds this power of 2.

The next loop in line 3 converts the number to times in and in ; for this one starts with and and the loop invariant of this loop is that after rounds (with ) of the loop, has the form times and has the value and has the value .

In each round of the loop, first it is checked whether and if so, the loop is left. Then it is checked whether and if so, is subtracted from and is added to . At the end, and are doubled up. In the first round, only the doubling up happens, as . After rounds with , has the value times , has the value and has the value . For example, if the input is (thirteen) then the values of after rounds are as follows:

  y 100000 // remains like this throughout the loop
  x 110110 // coding bit appended and doubled up in round 1
  u      0 // initialised as 0 and not modified in round 1
  z     10 // doubled up in round 1
  x 101100 // bit 1 read out and doubled up in round 2
  u     10 // old z added to u in round 2
  z    100 // doubled up in round 2
  x 011000 // bit 1 read out and doubled up in round 3
  u    110 // old z added to u in round 3
  z   1000 // doubled up in round 3
  x 110000 // bit 0 read out and doubled up in round 4
  u   0110 // nothing added in round 4, leading zero for readability
  z  10000 // doubled up in round 4
  x 100000 // bit 1 read out and doubled up in round 5
  u  10110 // old z added to u in round 5
  z 100000 // doubled up in round 5
  x 100000 // not modified as x=y in round 6, loop terminated
  z      1 // z is set to 1 in line 4 after termination of loop
  x 101100 // x is doubled up in line 4

In line 5 the powers of two are output whenever the leading bit of x is 1 and the leading bit is removed from x; after that the registers are doubled up, and when x = 0 the loop terminates. So the invariant is that after rounds of the loop in line 5, is times , has been output iff was and is . Here are the values in the loop with the above data:

  y 100000 the register y remains unchanged
  x 101100 before round 1
  z      1 before round 1
  x 011000 bit 1 is read out, y subtracted from x and x doubled up in round 1
         1 is output in round 1
  z     10 z is doubled up after outputting z in round 1
  x 110000 bit 0 is read out and x is doubled up in round 2
  z    100 z is doubled up without being output in round 2
  x 100000 bit 1 is read out, y subtracted from x and x doubled up in round 3
       100 is output in round 3
  z   1000 z is doubled up after outputting z in round 3
  x 000000 bit 1 is read out, y subtracted from x and x doubled up in round 4
      1000 is output in round 4
  z  10000 z is doubled up after outputting z in round 4
  x 000000 loop terminates after round 4 as x = 0

The above arguments verify the correctness and the example illustrates the algorithm. Each of the three loops runs approximately as many rounds as there are binary bits in the input, thus the runtime is linear. ∎
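
The three loops can be summarised by the following Python sketch for positive inputs; the names x, y, u, z correspond to the registers in the tables above.

  def output_powers(n):
      x = n + n + 1            # append the coding bit 1 at the end of n (line 1)
      y = 1
      while y <= x:            # line 2: smallest power of two above x
          y = y + y
      u, z = 0, 1
      while x != y:            # line 3: reverse the bits of n into u
          if x >= y:           # leading bit 1: record the current power z
              x = x - y
              u = u + z
          x = x + x
          z = z + z
      x, z = u + u, 1          # line 4
      while x != 0:            # line 5: read the reversed bits out at the top
          if x >= y:
              print(z)         # this power of two occurs in the sum
              x = x - y
          x = x + x
          z = z + z

  # output_powers(13) prints 1, 4 and 8 in this order.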

4 Open Problems (3) and (4) and Multivariate Oh-Notations

There are several ways to define the big and little Oh notations in several variables. Wikipedia (version (a)) gives, with reference to Cormen, Leiserson, Rivest and Stein [8, page 53], the following definition: a function is in the big Oh class of a bound if there are constants such that the bound holds for all tuples where at least one coordinate is above the threshold. The analogous definition for little Oh is that for every constant there is a threshold such that the bound holds for all tuples with at least one of the coordinates above the threshold. If one required not only one coordinate but all coordinates to be above the threshold, the next result would not be applicable. Version (b) of the multivariate little Oh Calculus requires that not only one coordinate but all coordinates are above the threshold and, in the case that the number of coordinates varies as well, that there are at least a minimum number of coordinates in the tuples considered.

Floyd and Knuth [9] asked whether one can compute modulo in time where is the number of digits of and is the number of digits of . The following example answers Question (3) only for the Wikipedia definition (variant (a)) of the little Oh Calculus.

Example .

In this example, when denoting values modulo the modulus in order to estimate their size, these are numbered as and not as , as part of this example requires studying the first nonzero bit of such numbers. The modulus is always even.

One chooses arbitrarily and to be so large that the constant of the little is below . Furthermore, is and it is understood that is chosen such that and that is an integer. Now one estimates that at every operation (addition or subtraction) of the register machine, the largest register increases its value, modulo , by at most a factor . Note that, modulo , the largest input is and that the output is which is smaller than . Thus one would need that, modulo , the first nonzero digit of the largest registers goes from to which requires at least additions. This amount of additions is larger than , as and , so the algorithm cannot make enough additions and subtractions for producing a result which, modulo , equals .

Consider the task that a register program reads in a positive number n, followed by n positive numbers, followed by one index, in this order, and has then to output the number with that index. In the following, let and let . This task cannot be done in the stated number of steps in the case that one applies Definition (a) of the little Oh Calculus and it can be done in that number of steps in the case that one applies Definition (b) of the little Oh Calculus.

Proof.

For the result concerning Definition (a) of the little Oh Calculus, given a register program with registers, let and fix it at this constant and let be so large that all tuples of -digit numbers in the input have to be processed in time – which is possible as is now fixed in the little Oh Calculus and is chosen so large that the runtime is smaller than times the rational number . At the same time, as -bit numbers are read, the machine must save them in its registers and be able to recall each of them and also know the position of each number. Thus there are after reading the numbers many different -tuples of -bit numbers (with leading bit at the top position) and it is not possible to store them in a one-one way in the registers if all registers have numbers strictly below , as those jointly use only bits and can take only many values. If two -tuples are mapped to the same memory and differ on item , then the algorithm will for one of the -tuples make a mistake when the next number read is . Thus one of the numbers must have at least bits. However, when reading only -bit numbers and the smaller value of , there must be at least additions or subtractions in order to create a number which properly has bits (with the highest order bit being 1). As and for all sufficiently large , the computing task is not in the required class when the little Oh Calculus is taken according to Definition (a).

For Definition (b), the idea is to prove that the task is in . By the definition of (b), as provided that both go to infinity and not only their maximum (as (a) requires).

The algorithm is to use operations to create a queue which is fed at the bottom and read out at the top. For the ease of readability of the program, all numbers in the input are required to be positive (so at least ) and this is not tested explicitly (though it would be trivial to do so).

  1. Read ; read ; let ; let ; let ; let ;

  2. let ; let ; let ; ;
    if then goto 2;

  3. Let ; let ; let ; let ;
    if then goto 4 else begin Read ; goto 2 end;

  4. Read ; let ;

  5. Let ; ; let ;
    If then begin let ; if then let end end;
    if then begin let ; let end;
    if and then goto 5;

  6. Write .

This program produces a data structure where the number u determines the top position of the data structure, x has the bits of the numbers one after the other and y has the end positions of each binary number in the structure. So when entering the if-then-else statement at the end of line 3, the data structure for the first three processed binary numbers looks like this:

  u  1 000 000 00000
  x    110 101 11011
  y      1 001 00001
  z                1

Furthermore, contains the remaining numbers to be built into the data structure and contains the most recent number added into the data structure.

So when building up the data structure, the role of is to space out the numbers so that when adding to , the bits will not overlap with those of the previous number and therefore are doubled up until . Furthermore, when is added to , is added to in order to mark the last bit and in each round. The inner loop of doubling up is in line 2 and the outer loop also includes line 3 to do the additions of the current number to and of the current end-bit marker to .

In the loop of Line 5, the bits of the data structure are read out in parallel by always doubling up, so that the position of the leading bit (it might be 0) is moved to the position of the only 1 of u in binary representation; then one makes two if-statements, one dealing with x having a 1 in this leading position and one dealing with y having a 1 in this leading position. This leading digit of x is copied into the last position of the candidate output and that of y causes, when being 1, the counter to be decreased. If the counter is still positive, that is, if there is still a number to be found, and if it is decremented to 0, this means that the number currently assembled is the number to be passed into the output. If the index was too big (and there is no number archived for that index) then the structure will eventually be used up and the loop will be aborted and some meaningless output given. Here is the above example after two bits are read out, in the case that the index is larger than one (which causes the bits to be copied into z):

  u  1 0 000 00000 00
  x    0 101 11011 00
  y    1 001 00001 00
  z                11

The 1s in z are the first two bits of the binary number which was coded as the first number into x. The last bit, a 0, is still in x and written here as a leading bit for better readability. The spaces in the numbers are for readability and shifted to the front in line with the doubling up of the numbers; only u remained unmodified and there the spaces are just adjusted to those of x and y in order to have them at the same positions.

This algorithm verifies that the task of the problem is in the claimed class: the numbers which are doubled up in every round of the first loop will at the end have lengths proportional to the total number of input bits – more precisely, a constant number of bits for each bit of the inputs. Here note that the count of the numbers is the first input read and the second input into the same register is the index of the number to be read out (from the front). The second loop also runs at most the same number of rounds, as after this number of rounds the structure is used up. The registers holding the structure get doubled up in each round and their top bits, which are at the position of the single one-bit of u, are removed by subtracting if they are not zero. For that reason, the runtime of the algorithm is linear in the total number of input bits, which is upper bounded by the required product; for notion (b) of the little Oh Calculus, the claimed bound follows. Note that, for a given constant, one chooses the threshold so large that it is above the multiplicative constant of the runtime expression and therefore the runtime is less than or equal to the required bound. ∎
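
A Python sketch of the archiving technique may be helpful (the names u, x, y mirror the data structure shown above; the sketch separates building and reading out, takes a Python list instead of reading the inputs one by one, and uses a loop over that list for brevity, so it illustrates the idea rather than being the register program itself).

  def recall(numbers, i):
      u, x, y = 1, 0, 0
      for a in numbers:            # build the structure (lines 1-3); a >= 1
          m = 1
          while m <= a:            # shift the structure once per bit of a
              m = m + m
              u = u + u
              x = x + x
              y = y + y
          x = x + a                # append the bits of a at the bottom
          y = y + 1                # mark the last bit of a
      value, count = 0, 0
      while y != 0:                # read out at the top (as in line 5)
          x = x + x
          y = y + y
          bit = 0
          if x >= u:               # leading bit of the packed numbers
              x = x - u
              bit = 1
          value = value + value + bit
          if y >= u:               # marker reached: one stored number complete
              y = y - u
              count = count + 1
              if count == i:
                  return value     # the i-th number read in (1-based)
              value = 0
      return 0                     # index larger than the number of inputs

  # recall([6, 5, 27], 3) returns 27.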

The above implies that Open Problem (4) is answered "no" when one bases the little Oh Calculus on version (a), which is the one on Wikipedia: the task in Open Problem (4) is more comprehensive than the one in the preceding theorem, which is a significantly easier task and already does not run within the given time bound when version (a) of the little Oh Calculus is chosen. Furthermore, the result shows that there is a real chance that the answer to problems (3) and (4) might actually depend on the version of the little Oh Calculus chosen, so it could go either in this or in that direction. But this "actual chance" is not yet converted into a proof; it is only indicated as a possibility. On the one hand, disproving the existence of an algorithm for version (b) is much harder than for version (a) and, on the other hand, algorithms confirming that the answer would be "yes" (as in the case of Problem (2)) are not in sight.

The proof of Theorem 4 also showed that, for constant , a register machine with registers needs steps proportional to the number of bits of the input numbers to recall one out of read inputs which have all the same length; in contrast to this, the machine with registers can recall any of inputs by storing them in the first registers and using the -th to read the index of the recalled number. Similarly for sorting constantly many numbers, the time depends on the length of the numbers only in the case that the number of registers is below the number of inputs. This shows that when considering the asymptotically fastest machines only, the number of registers forms a proper hierarchy.

5 Regular Languages and Automatic Functions

[Automatic Functions and Regular Sets] There is a close relation between what can be computed by register machines with few registers on the one hand and regular sets and automatic functions on the other hand. Here a function from words to words is called automatic iff there is a dfa (deterministic finite automaton) which can recognise whether the input and a suggested output match; the dfa reads both words at the same speed, symbol by symbol. Furthermore, it is assumed that the input and output are padded with leading zeroes so that both numbers have the same length, that is, they are aligned at the back so that the corresponding digits match (like in the school-book algorithm for adding decimal numbers). The more frequent way is to align at the front, but then one has to write all numbers backwards in order to avoid problems; in the present work, it is preferred to write numbers the usual way and to align at the back.

For a number , let be the -ary sequence of its digits. A set of numbers is regular iff there is a dfa (deterministic finite automaton) recognising for some and a function is automatic iff there are such that the mapping is automatic as a function from words to words. Here, it is always assumed that and . The positive result that automatic functions can be computed by register machines with four registers can even handle the case that ; however, the choice of requires four registers if at least one of them is not a power of two, otherwise three are sufficient. For the result that every function computed by a machine with one register is automatic, is required, but any works.

Note that there is a Turing machine model for computing automatic functions on one-tape Turing machines with one fixed starting position: here a function is automatic iff a deterministic Turing machine can compute the output from the input in linear time such that the new output starts at the same position as the old input. This result also holds when nondeterministic machines are used in place of deterministic ones [7].

The first use of the concept of automatic functions and structures dates back to Büchi’s work on the decidability of Presburger arithmetic [5, 6]. The notions were formally introduced by Hodgson [15, 16] and, independently, by Khoussainov and Nerode [19]. Grädel [11] provides the most recent of several surveys in the field [18, 24]. The natural numbers with addition, subtraction, comparison and multiplication by constants form an automatic structure in which each of these operations is realised by an automatic function, see, for example, [17].

[Compact Writing of Repetitive Commands] Multiple identical or similar operations like

  • ; ; ;

  • ; ;

  • ; ; ;

will be abbreviated as

  • ;

  • ;

  • ;

with the understanding that and above are constant and that this is only done if the result of the first operation goes into the next operation as above in the case that several operations are in a block. Multiplication with constants is only possible if this constant is a power of two, as otherwise the bound on the number of registers on the right side of the assignment is compromised; Floyd and Knuth had there always two of them and therefore would need an additional register with ; . However, ; can be realised as ; ; and so while is not permitted in the programs below, is permitted with being different registers. In summary, assignments of the form

  • ;

are allowed provided that are integer constants and, furthermore, is either or a power of or times a power of and are registers different from (there might be more or less than of these registers).

[Variables and Constant-ranged operations] The following conventions simplify the writing of programs below.

  • The instruction where is a constant or a variable (which has constant range); here one needs one additional register and just let and executes times . If is a power of two or for an instruction of type no additional register is needed.

  • If then one can, by a sequence of instructions, load the value of into the variable and replace by the remainder of . This is done by initially having and then copies of the operation:

    if then begin let ; let end;

    This command is written as .

  • Let be the transition function of the deterministic finite automaton (dfa) recognising the regular language. In the program below, we can make different copies of the program for the constantly many possible values of the bounded variables and . When there is a need to update the values of in instruction 3, the program just jumps to the corresponding copy/instruction. Thus, we do not count as needing registers.

These methods will be used in several of the programs.
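
Two of these conventions can be made concrete by the following Python sketch (with hypothetical names; c and d are program constants, so both loops unroll into constantly many instructions).

  def times_constant(x, c):
      u = x                    # one auxiliary register
      for _ in range(c - 1):   # c-1 additions realise x := c*x
          x = x + u
      return x

  def read_top_digit(x, y, d):
      # assuming 0 <= x < d*y: load the digit a = x div y into a
      # constant-range variable and replace x by x mod y
      a = 0
      for _ in range(d):
          if x >= y:
              x = x - y
              a = a + 1
      return a, x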

A register machine with three registers can check in linear time whether the -ary representation of a number is a member of a given regular language, where is constant. In the case that is a power of two, only two registers are needed.

Proof.

Without loss of generality one assumes that the input satisfies . The three registers are where holds initially the input and when entering line 2 of the algorithm below, the value (as a -ary number) which is a coding digit appended to separate out when the trailing zeroes of the input end and the new zeroes start which the algorithm appends in subsequent steps. The register is initialised as and after line 2 holds the value , using the convention that either or . The register is only used to multiply numbers with the constant and is needed if is not a power of . Furthermore one holds two variables with constant range, these are for the current symbol and ranges over the possible states of a dfa recognising the language with constant being the start state and being the set of accepting states; membership in can be looked up in a table using the value as an input. The dfa is considered to process the number from the first digit to the last digit and without loss of generality one can assume that and that the dfa never returns to after seeing some other digit — this is to deal with leading zeroes or the zero itself so that the start is accepting iff is in the given regular language. The basic algorithm is the following:

  1. Begin read ; let ; let ; let ;

  2. If then goto 3
    else begin let using ; goto 2 end;

  3. Let using ;
    if then begin let ;
       let ; goto 3 end;

  4. If then write else write End.

The first line reads an input in the -ary number system, initialises as and appends a digit to the input, obtaining . Afterwards, in line 2, is multiplied with until it has one digit more than ; note that if the input is and if the input is positive. An -digit number is at most and therefore after line 2.

The loop invariants of line 3 after iterations of the loop, at the start of line 3, are that is times and the digit is at the position of the power and is the state where in the case that . Here, for every word , is the state in which the dfa is provided that it was first in state and then read the digits of the word .

The loop starts with multiplying by , which makes the next digit go to the position of . If then the trailing goes into the position of and the loop terminates with , which is the correct state of the dfa after reading the -ary representation of the full number. If the original input is then , the loop is skipped and is the start state as required.

If the loop does not terminate then the loop body determines the value of the variable as the smallest number such that, for the current value of , the comparison holds. At the same time, is subtracted that many times from . See Remark 5 for more details. Thus after this operation, has the value times and the variable holds the digit read out. After that the state is updated: using the precondition that it was correct before the update, it is updated according to the transition function, and so the loop invariant is again true after one more round of the loop body.

This completes the verification of what is done in Line 3. Line 4 is just the output and the look-up whether is an accepting state is trivial.

The program runs in time linear in the number of -ary digits where the constant factor depends on ; the loop in line increases by factor until and the loop body is gone through times (as had been multiplied with in line 1). The loop in line moves -ary digit by digit out at the top until , the latter happens as the last -ary digit is and is a power of ; the loop body is executed times, that is, the simulation of the dfa reads -ary digits and then checks whether the obtained state is accepting.

Note that the register was only needed to multiply with without overwriting one of the registers and . If for some , then one does not need this extra register. In that case, multiplying a register with can be replaced by commands which double up that register. ∎
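
For illustration, here is a Python sketch of the simulation; delta, q0 and accepting describe the dfa, b is the constant base, and the multiplications by b stand for the constantly many additions (or doublings, if b is a power of two) discussed above.

  def in_regular_language(n, b, delta, q0, accepting):
      x = b * n + 1              # append the coding digit 1 behind the input
      y = 1
      while y <= x:              # line 2: smallest power of b strictly above x
          y = b * y
      q = q0
      while True:                # line 3: feed the digits of n into the dfa
          x = b * x              # move the next digit to the position of y
          if x == y:             # only the coding digit is left: input is done
              return q in accepting
          digit = 0
          while x >= y:          # read out the top digit (at most b-1 rounds)
              x = x - y
              digit = digit + 1
          q = delta(q, digit)    # one step of the dfa

  # divisibility by three of a binary input:
  # in_regular_language(13, 2, lambda q, d: (2*q + d) % 3, 0, {0}) gives False.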

One-tape Turing machine computations running for a certain number of steps can be simulated in a proportional number of steps with three registers in the case that input and output are binary (or use a base which is a power of two, like octal or hexadecimal numbers) and with four registers in the case that input and output are -ary and -ary for arbitrary but fixed bases.

Proof.

The basic idea is a two-stack simulation of the one-tape Turing machine, which here is assumed to have arbitrarily much space on both sides, initially empty. Each side is represented as a stack in one register, plus a third register to indicate the top position. Whenever the border is reached (that is, the bottom of the corresponding stack), a tape symbol for empty space is created. The Turing machine simulation starts at the lower order position of the tape and three symbols are held under the Turing machine head. The middle one is always under the head so that no bit for marking the head position is needed. The Turing machine has a move parameter with the choices "left", "stay", "right" and "halt"; one register holds the left side of the tape and another the right side. When the simulation starts, the recoded input plus a bottom symbol is on the right side and the head stands on an empty symbol; the situation is reversed when the machine halts: the output is on the left side and its top symbol is the last symbol of the output. The input and output have to be recoded accordingly. Recoding of the input and simulation of the Turing steps increase the position of the pointer and the size of the numbers representing the tape halves by a constant factor. Fixed constants denote the tape symbols for 0, 1 and empty space; a constant-range variable holds the corresponding state of the Turing machine. In order to recognise when the last bit of an input has been read or when a half-tape is empty, a bottom symbol is placed in both cases, and if the value of the input or half-tape equals the value of the pointer then one knows that the corresponding input or half-tape is empty. The program is as follows.

  1. Begin read ; let ; let ;

  2. If then goto 3;
    let ; goto 2;

  3. Let ; let ; let ; let ; let ;

  4. If then goto 5; let ;
    if then begin ; end else begin end;
    let ; let ; goto 4;

  5. Let ; let ;

  6. Let ; if then goto 7;
    if then begin ; ; ;
     if then begin let ; let end
     else let ;
     let ; let end;
    if then begin let ; let ; let ;
     if then begin let ; let end
     else let ;
     let ; let end;
    goto 6;

  7. Let ;

  8. If then let ;
    if then let ;
    if or then goto 9;
    ; if then goto 9;
    ; goto 8;

  9. Write ; End.

The input/output conventions are chosen such that the program can be written in the simplest way. Other input/output conventions (head on the other side of the input or output) would require scrolling over the input or output, which is here left to the Turing machine program. Additional linear time is needed for reading the input and writing the output. Line 2 chooses the right value of the pointer; line 4 codes the input into the Turing tape half; line 6 simulates the Turing machine and line 8 codes the output from the Turing tape half into the output. ∎
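
The following Python sketch illustrates only the two-stack data layout (each tape half as a number in a base B with the topmost cell in the lowest digit); it uses divmod for popping, which the register program of the theorem replaces by the pointer-and-comparison technique, and the recoding of input and output is simplified.

  def simulate_tm(delta, q0, halt, blank, B, input_symbols):
      # symbols are coded as 1..B-1; the digit 0 is reserved for empty space
      left, right = 0, 0
      for s in reversed(input_symbols):   # input to the right of the head
          right = right * B + s
      head, q = blank, q0
      while q != halt:
          q, written, move = delta(q, head)   # one Turing machine step
          if move == 'L':
              right = right * B + written     # written cell joins the right stack
              left, head = divmod(left, B)    # new head symbol popped from the left
          elif move == 'R':
              left = left * B + written
              right, head = divmod(right, B)
          else:                               # 'S': stay on the same cell
              head = written
          if head == 0:
              head = blank                    # ran off the used tape: blank cell
      out = []
      while left > 0:                         # collect the output on the left half
          left, s = divmod(left, B)
          out.append(s)
      return out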

Case, Jain, Seah and Stephan [7] showed that one can compute every automatic function in linear time on a one-tape Turing machine where the input and the later produced output start at the same position. This criterion is even an "if and only if". Additional scrolling of the result or the original input, as needed for Theorem 5, also takes only linearly many steps. Using this result, one gets the following corollary.

The output of an automatic function with a number interpreted as an -ary sequence of digits on the input to an output interpreted as an -ary sequence of digits can be computed by a register machine in linear time using four registers; if and are both powers of two then only three registers are needed.

Automatic functions with more than one input can be implemented by a register machine with registers.

Proof.

The idea is to form the convolution of constantly many numbers and feed this combined input into the automatic function. If two inputs are given in two different bases, one forms a number in the product base whose digits are, roughly speaking, the pairs of the corresponding digits at the same position.

  1. Read the first input and translate it with an automatic function from its original base into the product base using the same digits; let a register hold this number. This translation needs, by Theorem 5, four registers, this register being one of them. This is like translating the binary number 1101 (thirteen) into the decimal number 1101 (one thousand one hundred and one).

  2. Read the second input and process it using Theorem 5 with the four registers besides the one holding the first number (which is not modified); let a second register hold the result. This number was originally in its own base and is now in the product base but has the same digits. For example, the quinary number 2323 (three hundred and thirty-eight) becomes the decimal number 2323 (two thousand three hundred and twenty-three).

  3. Add the one number a constant number of times to the other, so that each digit of the sum encodes the pair of the two digits at the corresponding position. The resulting number is the convolution of both numbers.

  4. Map this number in the product base with an automatic function to a number over the output alphabet of the given automatic function. Note that every automatic function with constantly many inputs can be viewed as an automatic function from the corresponding convolution as a single input to the output.

If there are three inputs over three different alphabets, the automatic functions in the steps before evaluating the main function translate the digits of each input into digits of the product base of all three alphabets. Steps 2 and 3 are repeated for reading and processing the third input; one then adds the corresponding number the appropriate constant number of times to the convolution built so far. The so obtained convolution is then mapped to the output according to the given automatic function to be implemented, as done in the last step of the above algorithm. Analogously one handles an even larger number of inputs. Note that for automatic functions, the number of inputs is constant, so there are no loops which create convolutions with an unforeseeably large base.

As the algorithm consists mainly of executing a constant number of automatic functions, it has linear time complexity (where the parameter is the longest number of digits of an input) and the constant factor in the linear term depends not only on the number of inputs but also on the bases involved in the simulation of the automatic functions. Due to the storage of the so far constructed part of the convolution, the simulation needs one register more than the worst case for the simulation of the corresponding automatic function with one input. If all bases involved are powers of two, the whole algorithm needs only four registers; see Theorem 5 for more details. ∎
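
A Python sketch of the convolution may help; the digit pairing a·s + b chosen below is one possible convention, and the register program builds the same kind of number via the base translations of steps 1–4 rather than via division.

  def to_digits(x, base):
      digits = []                  # most significant digit first
      while x > 0:
          digits.insert(0, x % base)
          x = x // base
      return digits

  def convolution(x, r, y, s):
      dx, dy = to_digits(x, r), to_digits(y, s)
      n = max(len(dx), len(dy))
      dx = [0] * (n - len(dx)) + dx      # pad with leading zeroes so that the
      dy = [0] * (n - len(dy)) + dy      # digits are aligned at the back
      result = 0
      for a, b in zip(dx, dy):
          result = result * (r * s) + (a * s + b)   # one combined digit per position
      return result

  # convolution(13, 2, 338, 5) returns 7828: the decimal digits 7, 8, 2, 8
  # encode the digit pairs (1,2), (1,3), (0,2), (1,3) of 1101 and 2323.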

If a register program has only one register, one read-statement at the beginning and one write-statement at the end then it computes an automatic function which is of the following special form:

There are integer constants such that are powers of and satisfying the following:

  • Either for all
    or for all
    or for all ;

  • Either for all
    or for all
    or for all .

In the second and third line of each item, the computation time is constant and in the first line it is either constant or exponential or nonterminating in the number of binary digits to represent ; furthermore, means that either is undefined on both inputs (the nonterminating case) or are both defined and equal.

Proof.

Note that if there are two different periods for and then one can take and . Thus the theorem can be formulated with just one and one .

Given a program which uses only one register, one transforms this program into a normal form with the following steps:

One introduces a variable which takes over the sign of and ensures, after this is done, that is at least throughout the program (it is easy to see how to adjust it); furthermore, one replaces statements of the form "let " by . A statement of the form "let ;" where is a constant is replaced by "let "; the reason is that the actual value of is and, if is , then one has to adjust all the comparisons with constants and the additions of constants accordingly.

Furthermore, after adjusting the constants, if a statement is “let ” then one replaces this by the statements

If then begin let ; goto line ;
If then begin l