On Universal Codes for Integers: Wallace Tree, Elias Omega and Variations

by Lloyd Allison, et al.
Monash University

A universal code for the (positive) integers can be used to store or compress a sequence of integers. Every universal code implies a probability distribution on integers. This implied distribution may be a reasonable choice when the true distribution of a source of integers is unknown. Wallace Tree Code (WTC) is a universal code for integers based on binary trees. We give the encoding and decoding routines for WTC and analyse the properties of the code in comparison to two well-known codes, the Fibonacci and Elias omega codes. Some improvements on the Elias omega code are also described and examined.




I Introduction

Universal codes for the positive integers, $N \ge 1$, are of interest for at least three reasons. The first is in everyday data compression where such a code can be used to store or to transmit a sequence of integers when their true probability distribution is not known. The second is in inductive inference, where a countable set of hypotheses in a statistical or machine learning task is mapped to the set of positive integers: if the true distribution of the set of hypotheses in an inference problem is unknown but they can be plausibly ordered in non-increasing probability, then the $N^{\text{th}}$ hypothesis can be assigned the probability $2^{-|w(N)|}$, where $w(N)$ is the code-word of integer $N$, whose length is represented as $|w(N)|$. Finally, it must also be admitted that there is simply a fascination in trying to devise an efficient code for truly enormous integers.

Elias [1] defined a code having the universal property as one where the code-word length is monotonically increasing and “assigning messages in order of decreasing probability to codewords in order of increasing length gives an average code-word length, for any message set with positive entropy, less than a constant times the optimal average codeword length for that source.” If the source has distribution $P$ and entropy

$$H(P) = -\sum_{N \ge 1} P(N) \log_2 P(N)$$

then, for any universal code for positive integers,

$$\sum_{N \ge 1} P(N)\,|w(N)| \le K \cdot \max(1, H(P)),$$

where $|w(N)|$ is the length of the code-word of $N$ and $K$ is a constant independent of $P$. The latter sum is at least finite, although the distribution implied by a universal code must itself have infinite entropy. Naturally we hope that $K$ is not large. Elias also defined an asymptotically optimal code as one where the ratio

$$\frac{\sum_{N \ge 1} P(N)\,|w(N)|}{\max(1, H(P))} \le R(H(P)) \le K,$$

where $R$ is a function of $H(P)$ with $\lim_{H \to \infty} R(H) = 1$.

Wallace proposed a universal code for integers [2] (WTC) inspired by binary trees. He suggested that its implied probability distribution is a reasonable choice to use when the true distribution of a source of integers is unknown. In the following sections, the WTC, Fibonacci and Elias omega codes are summarised. Encoding/decoding routines and asymptotic analysis are given for WTC. The properties of the three codes, particularly the lengths of code-words, are compared. Ways of improving the Elias omega code for large integers are also discussed.

Note, if we have non-negative integers, a code for $N \ge 1$ can be shifted and used for $N \ge 0$ by adding one before encoding and subtracting one after decoding. If we have all of the integers to deal with, they can be ordered as $0, 1, -1, 2, -2, \ldots$, say, and $N$ can be encoded according to its position in the list. Encoding and decoding routines for the codes described can be found, and interactively experimented with, at www.allisons.org/ll/MML/Discrete/Universal/.
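The two reductions just described can be sketched as follows. This is a minimal illustration, not the authors' routine: it assumes the ordering 0, 1, -1, 2, -2, ... (one possible choice of ordering), and the function names are hypothetical.

```javascript
// Map any integer (BigInt) to its position, counting from one, in the
// assumed ordering 0, 1, -1, 2, -2, ... and back again.
function toPositive(z) {
  if (z > 0n) return 2n * z;   // 1 -> 2, 2 -> 4, ...
  return 1n - 2n * z;          // 0 -> 1, -1 -> 3, -2 -> 5, ...
}

function fromPositive(n) {
  if (n % 2n === 0n) return n / 2n;
  return -(n - 1n) / 2n;
}

// Shift an encoder/decoder pair for N >= 1 so that it serves N >= 0:
// add one before encoding, subtract one after decoding.
const shiftEnc = (enc) => (n) => enc(n + 1n);
const shiftDec = (dec) => (s) => dec(s) - 1n;
```

Any of the codes for positive integers discussed below can be passed to shiftEnc and shiftDec, or composed with toPositive and fromPositive.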

II Three universal codes

The Fibonacci and Elias omega codes for positive integers are described below for later comparison to the Wallace tree code (WTC) for positive integers. The main focus is on the Elias omega code and WTC; the Fibonacci code is included as a fixed point of comparison to the other codes.

II-A Fibonacci code for integers

The Fibonacci code [3] is based on the Fibonacci numbers $F_k$, which are $F_0 = 0, F_1 = 1, F_2 = 1, F_3 = 2, F_4 = 3, F_5 = 5, \ldots$. The code ignores $F_0$ and $F_1$ and uses $F_2$ and upwards. To encode an integer $N \ge 1$ do as follows:

  • Find the largest $F_k$ that is less than or equal to $N$ and remember $k$.

  • Repeat for $N - F_k$, finding $F_{k'}$, and so on until zero remains.

We have $N = F_k + F_{k'} + \ldots$. The code-word for $N$ consists of $k$ bits. If $F_i$ appears in the sum for $N$ set the $(i-1)^{\text{th}}$ bit of the code-word to ‘1’ otherwise set it to ‘0’. Set the last bit to ‘1’. It is easy to see that the encoding is well defined and that each $F_i$ appears at most once, with no two consecutive Fibonacci numbers in the sum. Note that “11” cannot appear in the code-word except at the very end and that a code-word of length $L$ contains at least two and up to $\lfloor L/2 \rfloor + 1$ ‘1’s.
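The greedy procedure above can be sketched in JavaScript with native BigInt values. This is an illustrative sketch rather than the authors' routine; the names fibEnc and fibDec are hypothetical, and code-words are represented as strings of '0' and '1' with bit i-1 standing for the Fibonacci number with index i.

```javascript
// Fibonacci code-word for a positive BigInt N.
function fibEnc(N) {
  if (N < 1n) throw new RangeError("N must be >= 1");
  // fibs[0] = F2 = 1, fibs[1] = F3 = 2, ... up to the largest Fk <= N
  const fibs = [1n, 2n];
  while (fibs[fibs.length - 1] <= N) {
    fibs.push(fibs[fibs.length - 1] + fibs[fibs.length - 2]);
  }
  while (fibs[fibs.length - 1] > N) fibs.pop();
  const bits = new Array(fibs.length).fill("0");
  let rest = N;
  for (let i = fibs.length - 1; i >= 0; i--) {  // greedy, largest first
    if (fibs[i] <= rest) { bits[i] = "1"; rest -= fibs[i]; }
  }
  return bits.join("") + "1";                   // terminating '1'
}

// Inverse of fibEnc; assumes cw is one complete, valid code-word.
function fibDec(cw) {
  let a = 1n, b = 2n, N = 0n;                   // a = F2, b = F3
  for (let i = 0; i < cw.length - 1; i++) {     // skip the terminating '1'
    if (cw[i] === "1") N += a;
    [a, b] = [b, a + b];
  }
  return N;
}

console.log(fibEnc(4n), fibDec("1011"));
```

The greedy choice of the largest Fibonacci number at each step is what rules out two adjacent '1's before the terminator.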

II-B Elias omega ($\omega$) code for integers

We first introduce the following variation on the Elias omega code that gives identical code-word lengths to the original definition [1] and differs in only minor details. The code-word for an integer $N \ge 1$ consists of one or more sections: zero or more length sections followed by one value section. The value section is simply the usual binary representation of $N$ in $\lfloor \log_2 N \rfloor + 1$ bits; note that the value section starts with a ‘1’.

The code-word for $N = 1$ is simply “1”; it is the only code-word that starts with a ‘1’. The code-word for $N \ge 2$ has at least one length section before the final value section. In general the value section by itself is not sufficient because a decoder does not know how long it is when $N \ge 2$. The solution is to first encode the length of the value section minus one (the length must be $\ge 2$ when $N \ge 2$), recursively, until the length of a length of ... of a length gives one.

The leading bit of each section would, on the face of it, be a ‘1’ so that position can instead be used as a flag to indicate either a length section (‘0’) or the final value section (‘1’). The decoder notes the flag. In the case of a ‘0’ it then switches it to a ‘1’ before computing the length of the next section. If present ($N \ge 2$) the first length section is just “0” which stands for one. If $N \ge 4$ the second length section is either “00” which stands for two or “01” which stands for three, and so on.
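The variation described above can be sketched as follows. This is a minimal illustration under the stated conventions (flag '0' for a length section, '1' for the value section, each length section holding the next section's length minus one); omegaEnc and omegaDec are hypothetical names and code-words are strings of '0' and '1'.

```javascript
// Elias omega code-word (the variation described in the text) for BigInt N >= 1.
function omegaEnc(N) {
  if (N < 1n) throw new RangeError("N must be >= 1");
  let section = N.toString(2);     // value section, starts with '1'
  let cw = section;
  while (section.length > 1) {
    // Length section: binary of (length of the following section - 1),
    // with its leading '1' replaced by the flag '0'.
    section = BigInt(section.length - 1).toString(2);
    cw = "0" + section.slice(1) + cw;
  }
  return cw;
}

// Decode one code-word starting at bits[pos]; returns the value and the
// position just past the code-word, so a stream can be decoded in a loop.
function omegaDec(bits, pos = 0) {
  let len = 1;                     // the first section always has one bit
  for (;;) {
    const section = bits.slice(pos, pos + len);
    if (section.length < len) throw new Error("truncated code-word");
    pos += len;
    if (section[0] === "1") return { value: BigInt("0b" + section), next: pos };
    // Length section: restore the leading '1'; the value plus one is the
    // length of the next section.
    len = Number(BigInt("0b1" + section.slice(1))) + 1;
  }
}

console.log(omegaEnc(16n), omegaDec("00000010000").value);
```

Returning the next position from the decoder is a design choice that makes the prefix property explicit: no separator between code-words is needed.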

Note that the omega code is very similar to, and can be thought of as an optimized and shifted version of, the Levenstein code [4] which is defined for $N \ge 0$. Also note that Rissanen [5] defined the $\log^*$ code as an approximation to the omega code; its code-word length is $\log_2 c + \log_2 N + \log_2 \log_2 N + \ldots$, all positive terms, where $c$ is a normalising constant, and $c \approx 2.865$.

function omega_r_enc(t_enc)
 { function enc(N)  // bigInt N >= 1
    { var todo=N, nSect, nTet, CW="";
      for( nTet = 1; ; nTet ++ )
       { for( nSect = 1; ; nSect ++ )
          { var section = todo.binary();
            var len = section.length;
            if( len == 1 ) break;
            // else trim the leading bit of the section
            section = section.substring(1);
            CW = section + CW;
            todo = N.fromInt(len-1);
          }//for nSect
         if( nSect == 1 ) break;
         todo = N.fromInt(nSect-1);
       }//for nTet
      CW = t_enc(N.fromInt(nTet)) + CW;
      return CW;
    }//enc
   return enc;
 }//omega_r_enc

// e.g. ...
function omega_star_enc(N)
 { return omega_r_enc(omega_enc)(N); }
Fig. 1: Encoder for omegar(t)(N) in JavaScript-styled pseudocode.

II-B1 The omega variations

The Elias omega code in effect uses a unary code (“0…” length section, “1…” final value section) to indicate the number () of sections in a code-word. Elias chose the name omega for the code because he considered it to be “penultimate”, that is “not quite ultimate” (p.200)[1]. (That being so, the name psi, say, would have left some room to name codes that are closer to the ultimate.) He noted that the unary code could be replaced with his gamma, delta or omega codes.

In fact the leading bits of the sections can be moved to the front of the code-word and the unary code can be replaced by any other code (universal or not) for positive integers – say Fibonacci or WTC.

Define omegap(s) to be the Elias omega code modified and parameterised to use an integer code ‘s’ for the number of sections. The code-word for $N = 1$ is “1”. For $N \ge 2$ the code-word is the omega code-word for $N$ with the leading bit of every section trimmed away, and the result preceded by the ‘s’ code-word for the number of sections. omegap(unary) is equivalent to the usual omega code. The code omegap(Fibonacci) would make code-words of one, two and three sections longer and those of six or more sections shorter than under omega. Let omega2 = omegap(omega) which uses the omega code for the number of sections. This code would make code-words of two, four and five sections longer and those of seven or more sections shorter than the usual Elias omega code.
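The comparisons in the preceding paragraph depend only on how many bits each candidate code spends on the number of sections k. The sketch below tabulates that cost for the unary, Fibonacci and omega codes; it is an illustration with hypothetical function names and it ignores any special treatment of individual small integers.

```javascript
// Bits spent on stating the number of sections k >= 1 under three codes.
const unaryLen = (k) => k;           // one flag bit per section, as in omega

function fibLen(k) {                 // index of the largest Fibonacci number <= k
  let a = 1, b = 2, idx = 2;         // a = F2, b = F3
  while (b <= k) { [a, b] = [b, a + b]; idx++; }
  return idx;
}

function omegaLen(k) {               // sum of the lengths of all sections
  let len = k.toString(2).length, total = len;
  while (len > 1) { len = (len - 1).toString(2).length; total += len; }
  return total;
}

for (let k = 1; k <= 9; k++) {
  console.log(k, unaryLen(k), fibLen(k), omegaLen(k));
}
```

Rows where fibLen(k) or omegaLen(k) exceeds k correspond to code-words that get longer than under the usual omega code, and rows where they fall below k to code-words that get shorter.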

Even omega2 is not ultimate. It is possible to define a code, omegar(t), that uses itself, recursively, to state the number of sections (minus one) in an omega code-word. Note that omegar() needs some other integer code to encode the number of tetrations, i.e. the number of times that omegar is applied. Let omega* = omegar(omega). For example, is encoded using omega* as follows (spaces added in code-words below for readability):

Encoded using omega(3)

where removes the leading bit of each length and value section of recursively applied omega code-words.

Comparing against the omega code, omega* would make code-words of integers with nine or more sections shorter than under omega and the smallest corresponding integer is in Knuth’s up-arrow notation. An encoder for omegar is given in  Fig. 1.

Even omega* is not ultimate because one can reconsider the encoding of the number of tetrations but the integers for which there would be any further improvement would be “very large” indeed.

II-C Wallace tree code (WTC) for integers

The Wallace tree code for integers [2][6] is based on full binary trees, and hence depends on the Catalan numbers $C_n$, with the first few numbers being $1, 1, 2, 5, 14, 42$ and so on.

Any full binary tree consists of fork (i.e. internal) nodes and leaf nodes. Each fork node has exactly two sub-trees and each leaf node has zero sub-trees.

The number of full binary trees containing $n$ fork nodes is the $n^{\text{th}}$ Catalan number, defined as:

$$C_n = \frac{1}{n+1}\binom{2n}{n}.$$

This yields the recurrence $C_{n+1} = \frac{2(2n+1)}{n+2}\, C_n$. Furthermore, $C_0 = 1$ and $C_n \sim \frac{4^n}{\sqrt{\pi}\, n^{3/2}}$ [7]. Finally, it is useful to define the cumulative Catalan numbers $cC_n = \sum_{i=0}^{n} C_i$, which are $1, 2, 4, 9, 23, 65$ and so on.
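As a small sketch, the Catalan numbers and their running totals can be generated exactly with BigInt arithmetic. The multiplicative recurrence used here is the standard one for Catalan numbers, and the assumption that the cumulative sum starts at the Catalan number with index zero follows the comment "min f st cC(f)>N" in the encoder of Fig. 5; the function names are hypothetical.

```javascript
// [C_0, ..., C_n] via C_{i+1} = C_i * 2(2i+1) / (i+2); the division is exact.
function catalans(n) {
  const C = [1n];
  for (let i = 0; i < n; i++) {
    C.push(C[i] * BigInt(2 * (2 * i + 1)) / BigInt(i + 2));
  }
  return C;
}

// [cC_0, ..., cC_n] where cC_k = C_0 + ... + C_k.
function cumulativeCatalans(n) {
  let sum = 0n;
  return catalans(n).map((c) => (sum += c));
}

console.log(catalans(6).join(" "));
console.log(cumulativeCatalans(6).join(" "));
```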

Fig. 2: Tree having code-word 1101000

As a preliminary matter, note that a full binary tree can be encoded [8] during a prefix traversal of the tree, outputting a ‘1’ for each fork and a ‘0’ for each leaf (e.g. see  Fig. 2). The end of the tree’s code-word is indicated upon reaching one more ‘0’ than ‘1’s, and this event does not happen earlier within the code-word so this is a prefix code. It is easy to recover the tree from the code-word.
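The prefix traversal can be sketched directly. In this illustration a leaf is represented by null and a fork by a two-element array [left, right]; that representation and the names treeEnc and treeDec are assumptions made for the example only.

```javascript
// Prefix (pre-order) traversal: '1' for a fork, '0' for a leaf.
function treeEnc(t) {
  return t === null ? "0" : "1" + treeEnc(t[0]) + treeEnc(t[1]);
}

// Rebuild a tree starting at bits[pos]; returns the tree and the position
// just past its code-word, which is where one more '0' than '1's was seen.
function treeDec(bits, pos = 0) {
  if (pos >= bits.length) throw new Error("truncated code-word");
  if (bits[pos] === "0") return { tree: null, next: pos + 1 };
  const left = treeDec(bits, pos + 1);
  const right = treeDec(bits, left.next);
  return { tree: [left.tree, right.tree], next: right.next };
}

const example = [[null, [null, null]], null];   // three forks, four leaves
console.log(treeEnc(example));
```

Because treeDec reports where the code-word ended, trees can be read one after another from a single bit string without separators.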

Fig. 3: Three Dyck paths.

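Reading a ‘1’ as left and a ‘0’ as down, a tree’s code-word can also be interpreted as encoding a Dyck path [9] in a square lattice from some, initially unknown, point $(f, f)$ on the diagonal to $(0, 0)$, that is, a zig-zag path that does not go below the diagonal until it terminates with the final ‘0’. Fig. 3 shows three of the five paths from $(3, 3)$ to $(0, 0)$ with their code-words including the lexicographically first ‘1010100’ and last ‘1110000’. The number of paths (Fig. 4) from row $r$, column $c$, to $(0, 0)$ is given by

$$paths(r, c) = paths(r-1, c) + paths(r, c-1), \quad 0 < c \le r,$$

with $paths(r, 0) = 1$ and $paths(r, c) = 0$ for $c > r$.

It can be shown that $paths(r, c) = \frac{r-c+1}{r+1}\binom{r+c}{c}$ and, in particular, $paths(f, f) = C_f$.

row
 6     1    6   20   48   90  132  132    0
 5     1    5   14   28   42   42    0
 4     1    4    9   14   14    0
 3     1    3    5    5    0
 2     1    2    2    0
 1     1    1    0
 0     1    0
       0    1    2    3    4    5    6    column
Fig. 4: Number of paths from (row, column) to (0, 0).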
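The table of Fig. 4 can be reproduced with a few lines of exact arithmetic. The closed form used below is the standard ballot-number formula, assumed here because it matches every entry of Fig. 4; the function name follows the paths(r, c) calls in Fig. 5 but the implementation is only a sketch.

```javascript
// Number of lattice paths from (row r, column c) to (0, 0) that never go
// below the diagonal: (r - c + 1) / (r + 1) * binomial(r + c, c).
function paths(r, c) {
  if (r < 0 || c < 0 || c > r) return 0n;
  let binom = 1n;                       // binomial(r + c, c), built up exactly
  for (let i = 1; i <= c; i++) {
    binom = binom * BigInt(r + i) / BigInt(i);
  }
  return binom * BigInt(r - c + 1) / BigInt(r + 1);
}

// Print the rows of Fig. 4, top row first.
for (let r = 6; r >= 0; r--) {
  const row = [];
  for (let c = 0; c <= r + 1; c++) row.push(paths(r, c));
  console.log(r + ": " + row.join(" "));
}
```

Returning zero for c > r matters to the encoder: on the diagonal a move down is impossible, so the decrement is zero and a '1' is always emitted.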

The code for integers is most easily explained for integers $N \ge 0$; call this version of the code WTC0. The full binary trees are sorted on their code-word lengths and, for a given length, lexicographically. For a given length $2f+1$, the first code-word is of the form $(10)^f 0$ and the last $1^f 0^{f+1}$. Integer $N$ is given the code-word of the $N^{\text{th}}$ full binary tree (in the lexicographic order of full binary trees, counting from zero).

Encoding and decoding routines are given in Fig. 5. The code-word for $N = 0$ is “0”. For $f \ge 1$, the integers in the range $[cC_{f-1}, cC_f - 1]$ all have code-words of length $2f+1$. $f$ can be found by searching for the largest cumulative Catalan number, $cC_{f-1}$, that is less than or equal to $N$. The code-word for $N$ is the $K^{\text{th}}$ (counting from zero) of the lexicographically ordered code-words of length $2f+1$ where $K = N - cC_{f-1}$. The code-word can be found using $paths(r, c)$: start with an empty code-word at position $(r, c)$ where $r = c = f$. If $K \ge paths(r-1, c)$ then append a ‘1’ (move left, $c \leftarrow c - 1$) to the code-word, since we need a code-word at least that much further up the rankings ($K \leftarrow K - paths(r-1, c)$). Otherwise, append a ‘0’ (move down, $r \leftarrow r - 1$). Repeat until $r = 0$, then append the final ‘0’.

function WTC0enc(N)  // bigInt N >= 0
 { if( N.isZero() ) return "0";
   var f=cCsearch(N); //min f st cC(f)>N
   var K=N.sub(cCatalan(f-1));
   var r=f, c=f, ans="";
   while( r > 0 )
    { var Decr = paths(r-1, c);
      if( K.GE( Decr ) )
           { ans = ans + "1"; c -- ;
             K = K.sub( Decr ); }
      else { ans = ans + "0"; r -- ; }
    }//while
   return ans + "0";
 }//WTC0enc

function WTC0dec(str)
 { // assumes str is a valid code-word
   if( str == "0" ) return Zero;
   var i, f = Math.floor(str.length/2);
   var r = f, c = f, Ans = cCatalan(f-1);
   for( i = 0; i < str.length; i ++ )
    { if( str.charAt(i) == "0" ) r -- ;
      else /* "1" */
       { Ans = Ans.plus( paths(r-1, c) );
         c -- ;
       }
    }//for
   return Ans;
 }//WTC0dec
Fig. 5: Encoder and decoder for WTC0 for all integers $N \ge 0$.

The decoding routine (also in Fig. 5) follows similar logic to the encoding routine. A valid code-word, $w$, contains $f$ ‘1’s and $f+1$ ‘0’s and no proper prefix contains more ‘0’s than ‘1’s. As $w$ is processed from left to right, every ‘1’ (move left) means that $N$ is known to be at least $paths(r-1, c)$ further up in rank amongst the code-words of this length. Repeat until the end of the code-word.

As noted before, it is easy to “shift” the WTC0 code (which encodes integers $N \ge 0$) to instead encode integers $N \ge 1$. Call the shifted code WTC1 if we need to distinguish between it and WTC0.
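For experimentation, the routines of Fig. 5 can be written out in self-contained form with native BigInt values in place of the bigInt library they rely on. This is a sketch only: the helper implementations (the closed form for paths and the linear search for f) are assumptions, chosen for brevity rather than speed.

```javascript
// Paths from (r, c) to (0, 0) staying on or above the diagonal (0 if c > r).
function paths(r, c) {
  if (r < 0 || c < 0 || c > r) return 0n;
  let binom = 1n;
  for (let i = 1; i <= c; i++) binom = binom * BigInt(r + i) / BigInt(i);
  return binom * BigInt(r - c + 1) / BigInt(r + 1);
}

// Cumulative Catalan number cC(f) = C_0 + ... + C_f.
function cCatalan(f) {
  let C = 1n, sum = 0n;
  for (let i = 0; i <= f; i++) {
    sum += C;
    C = C * BigInt(2 * (2 * i + 1)) / BigInt(i + 2);
  }
  return sum;
}

function wtc0Enc(N) {                       // BigInt N >= 0
  if (N === 0n) return "0";
  let f = 1;
  while (cCatalan(f) <= N) f++;             // min f such that cC(f) > N
  let K = N - cCatalan(f - 1);              // rank among code-words of length 2f+1
  let r = f, c = f, cw = "";
  while (r > 0) {
    const decr = paths(r - 1, c);
    if (K >= decr) { cw += "1"; c--; K -= decr; }  // move left
    else { cw += "0"; r--; }                       // move down
  }
  return cw + "0";
}

function wtc0Dec(cw) {                      // assumes cw is a valid code-word
  if (cw === "0") return 0n;
  const f = Math.floor(cw.length / 2);
  let r = f, c = f, N = cCatalan(f - 1);
  for (const bit of cw) {
    if (bit === "0") r--;
    else { N += paths(r - 1, c); c--; }
  }
  return N;
}

// Shifted code for N >= 1.
const wtc1Enc = (N) => wtc0Enc(N - 1n);
const wtc1Dec = (cw) => wtc0Dec(cw) + 1n;

console.log(wtc1Enc(100n), wtc1Dec("1011101001000"));
```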

II-D Examples

Table I gives examples of integers coded under the three codes. Note that the lengths of Fibonacci code-words increase in steps of one from time to time as $N$ grows. The lengths of WTC1 code-words increase in steps of two. Lengths in the Elias omega code increase in steps of various sizes, for example increasing by four on going from $N = 15$ to $N = 16$ when the value section grows by one and a whole new length section is added; there is no upper limit to the step size.

N Fib Elias WTC1
1 11 1 0
2 011 010 100
3 0011 011 10100
4 1011 000100 11000
5 00011 000101 1010100
6 10011 000110 1011000
7 01011 000111 1100100
8 000011 0011000 1101000
9 100011 0011001 1110000
10 010011 0011010 101010100
11 001011 0011011 101011000
12 101011 0011100 101100100
13 0000011 0011101 101101000
14 1000011 0011110 101110000
15 0100011 0011111 110010100
16 0010011 00000010000 110011000
17 1010011 00000010001 110100100
18 0001011 00000010010 110101000
19 1001011 00000010011 110110000
20 0101011 00000010100 111000100
21 00000011 00000010101 111001000
22 10000011 00000010110 111010000
23 01000011 00000010111 111100000
24 00100011 00000011000 10101010100
100 00101000011 0000101100100 1011101001000
Below code-word lengths are shown, instead of code-words

where googol, = code-word length (in bits)

TABLE I: Examples of code-words

III Implied probability distributions

An efficient code for the positive integers implies a probability distribution on them in which $P(N) = 2^{-|w(N)|}$, where $|w(N)|$ is the length of the code-word of $N$. The Fibonacci, Elias omega and Wallace tree (WTC1) codes all imply proper probability distributions on the positive integers: consider an infinite string of bits generated at random, independent and identically distributed (i.i.d.), with $P(0) = P(1) = 1/2$. For each code, the infinite string has some prefix, of length $L$ and probability $2^{-L}$, which is a valid code-word in that code. In principle, by removing the prefix and repeating forever, all possible code-words will be sampled in proportion to their probabilities under the corresponding distribution.

III-A Fibonacci:

  • There must be a first occurrence of “11” in the infinite string. It marks the end of a prefix of length $L \ge 2$ which is a valid code-word in the Fibonacci code. The probability of the prefix is $2^{-L}$. There are $L - 2$ positions before the final “11”. Each of these positions can hold ‘0’ or ‘1’ but no two adjacent positions, other than the last, can both hold ‘1’.

  • There are $F_{L-1}$ code-words of length $L$ (a code-word of length $L$ can be formed by prepending one of length $L-1$ with a ‘0’, or prepending one of length $L-2$ with a “10”).

  • The total probability of those integers having code-words of length $L$ is $F_{L-1} / 2^{L}$.

  • The total probability of all positive integers, $\sum_{L \ge 2} F_{L-1} / 2^{L}$, must be one.

III-B Elias omega:

  • An Elias code-word is made up of zero or more length sections followed by one value section. The infinite string of bits starts either ‘1’ or ‘0’. If it starts ‘1’, that itself is a prefix which is an Elias code-word for the value one, of probability $1/2$. If it starts ‘0’ that is decoded as $1$ (the lead bit having been changed) which indicates that a section of length $2$ follows. If the next section starts ‘1’ it is a value. If it starts ‘0’ it is a length, “00” giving $2$ (so a section of length $3$ follows) or “01” giving $3$ (so a section of length $4$ follows). And so on.

  • Eventually a section starting ‘1’, of length $k$, will appear marking the end of a code-word of length $L_k$, the total length of that value section and the length sections before it.

  • There are $2^{k-1}$ code-words of length $L_k$.

III-C WTC:

  • Because a one-dimensional random walk (‘0’ for left, ‘1’ for right, say) returns to the origin with probability one, the infinite string has some prefix of length $2f+1$ that is a valid WTC1 code-word. The probability of the prefix is $2^{-(2f+1)}$.

  • There are $C_f$ code-words of length $2f+1$.

  • The total probability of those integers having code-words of length $2f+1$ is $C_f / 2^{2f+1}$.

  • The total probability of all positive integers, $\sum_{f \ge 0} C_f / 2^{2f+1}$, must be one.
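The counting arguments for the Fibonacci and Wallace tree codes can be checked numerically. The sketch below accumulates the probability of all code-words up to a given length, assuming the counts used above (Fibonacci numbers for the Fibonacci code and Catalan numbers for WTC1); it works with the per-length probabilities directly so that no huge integers are needed, and the function names are hypothetical.

```javascript
// Fibonacci code: t(L) = F(L-1) / 2^L obeys t(L) = t(L-1)/2 + t(L-2)/4.
function fibCumulative(maxLen) {
  let prev = 0, term = 0.25, total = 0;       // term = t(2) = 1/4
  for (let L = 2; L <= maxLen; L++) {
    total += term;
    [prev, term] = [term, term / 2 + prev / 4];
  }
  return total;
}

// WTC1: t(f) = C_f / 2^(2f+1) obeys t(f+1) = t(f) * (2f+1) / (2(f+2)).
function wtcCumulative(maxLen) {
  let term = 0.5, total = 0;                  // term = t(0) = 1/2
  for (let f = 0; 2 * f + 1 <= maxLen; f++) {
    total += term;
    term *= (2 * f + 1) / (2 * (f + 2));
  }
  return total;
}

for (const len of [1, 2, 3, 4, 10, 100, 1000]) {
  console.log(len, fibCumulative(len).toFixed(4), wtcCumulative(len).toFixed(4));
}
```

The Fibonacci total approaches one geometrically, whereas the WTC1 total approaches one only as slowly as the inverse square root of the length, which is the price paid for its shorter code-words on large integers.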

Table II gives cumulative probabilities for integers having code-words up to a given length. Note that an Elias omega code-word of $10^6$ bits needs at least six sections, the maximum lengths of sections being $1, 2, 4, 16, 65536, 2^{65536}$. The probability of having five or fewer sections is $1 - 2^{-5} = 0.96875$, slightly less than 0.9692.

Length (bits)   Fibonacci   Elias    WTC1
1               0           0.5      0.5
2               0.25        0.5      0.5
3               0.375       0.75     0.625
4               0.5         0.75     0.625
10              0.859       0.875    0.754
100             0.999…      0.947    0.920
1000            0.999…      0.957    0.975
10000           0.999…      0.963    0.992
100000          0.999…      0.9688   0.997
1000000         0.999…      0.9692   0.9992

TABLE II: Cumulative probabilities up to a given code-word length (in bits).

IV Comparative code-word lengths

For integers less than the “lead”, in the sense of having the shortest code-words, changes hands between the three codes but is often held by the Fibonacci code (table III) until where it falls out of contention. Beyond that, and up to the decimal digit integer corresponding to , WTC has the shortest code-words, rarely equalled by the Elias omega code (i.e., for values between and , both codes using bits). To contextualize these numbers, this is past the size of the human genome (

base-pairs), the estimated number of baryons in the universe (

) and one googol (). Therefore beyond some further point the Elias omega code must take the lead either permanently or at least most of the time – see section VI.

N Fibonacci Elias WTC1
1 2 1* 1*
2 3* 3* 3*
3 4 3* 5
4 4* 6 5
13 7* 7* 9
16 7* 11 9
610 15* 17 15*
627 15* 17 17
1597 17* 18 17*
2057 17* 19 19
4181 19* 20 19*
6765 20 20 19*
6919 20* 20* 21
8192 20* 21 21
10946 21* 21* 21*
16384 21* 22 21*
17711 22 22 21*
23715 22* 22* 23
28657 23 22* 23
32768 23* 23* 23*
46368 24 23* 23*
65536 24 28 23*
82501 25* 28 25*

TABLE III: Code-word lengths (in bits) for varying $N$ (chosen to highlight early points of change) across Fibonacci, Elias and WTC1. Shortest code-word lengths for a given $N$ are asterisked.
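Rows such as those in Table III can be regenerated without building any code-words, since each code's length depends only on simple counts. The following is a sketch with hypothetical function names, using BigInt so that it also works for very large integers.

```javascript
// Fibonacci code: length is the index k of the largest Fibonacci number <= N.
function fibLength(N) {
  let a = 1n, b = 2n, k = 2;                // a = F2, b = F3
  while (b <= N) { [a, b] = [b, a + b]; k++; }
  return k;
}

// Elias omega: value section plus each length section, down to one bit.
function omegaLength(N) {
  let len = N.toString(2).length, total = len;
  while (len > 1) { len = (len - 1).toString(2).length; total += len; }
  return total;
}

// WTC1: 2f + 1, with f minimal such that C_0 + ... + C_f >= N.
function wtc1Length(N) {
  let C = 1n, sum = 1n, f = 0;
  while (sum < N) {
    C = C * BigInt(2 * (2 * f + 1)) / BigInt(f + 2);
    sum += C;
    f++;
  }
  return 2 * f + 1;
}

for (const N of [1n, 16n, 610n, 6765n, 65536n, 82501n]) {
  console.log(String(N), fibLength(N), omegaLength(N), wtc1Length(N));
}
```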

For each of the three codes, integers come in “blocks” that contain integers having code-words of the same length under that code. The sizes of the blocks differ between the codes. For WTC, the blocks are having code-lengths bits respectively. For and up to at least , and .

  1. The smallest such that is . The code-length is bits.

  2. The smallest such that is . The code-lengths are and bits, respectively.

  3. The smallest such that is . The code-lengths are and bits, respectively. (There are larger where .)

V Robustness

If a code is used for storing or transmitting a sequence of integers, as opposed to calculating entropy, the effect of errors may be of interest. The addition of extra error-correcting mechanisms is not covered here. Consider a sequence of integers, $N_1, N_2, \ldots$, encoded in each of the codes and imagine the consequences of a bit being flipped in error.

In the Fibonacci code, switching a ‘1’ to a ‘0’ in the code-word of $N_i$ causes the value of $N_i$ to be misread unless it is one of the last two ‘1’s. In the latter case the end of $N_i$ is not detected correctly and it absorbs some or all of the code-word of $N_{i+1}$ depending on which bit is flipped and on whether or not $N_{i+1}$ starts with a ‘1’. If a ‘0’ is flipped to a ‘1’ and this is next to a genuine ‘1’, $N_i$ is taken to end prematurely and an extra integer is apparently inserted. A single bit error affects one or two integers and may cause an error in indexing (Fig. 6). For example:

100011 011 ... = 9 2 ... but
000011 011 ... = 8 2 ...
100001 011 ... = 48 ...
100010 011 ... = 43 ...
101011 011 ... = 12 2 ...
100111 011 ... = 6 4 ... and
100011 1011 ... = 9 4 ... but
100111 1011 ... = 6 1 2 ...

(Spaces for readability only; each erroneous line differs from the correct line it follows in a single flipped bit.)

Fig. 6: Example errors and Fibonacci
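The effect of a flipped bit on a Fibonacci-coded stream can be reproduced with a small stream decoder. This is an illustrative sketch (the function name is hypothetical): it scans for the terminating “11” of each code-word and silently drops an unfinished code-word at the end of the input.

```javascript
// Decode a concatenation of Fibonacci code-words into an array of BigInts.
function fibDecodeStream(bits) {
  const out = [];
  let a = 1n, b = 2n, N = 0n, prev = "0";     // a = F2, b = F3
  for (const bit of bits) {
    if (bit === "1" && prev === "1") {        // "11": end of a code-word
      out.push(N);
      a = 1n; b = 2n; N = 0n; prev = "0";
      continue;
    }
    if (bit === "1") N += a;
    [a, b] = [b, a + b];
    prev = bit;
  }
  return out;
}

console.log(fibDecodeStream("100011011").join(" "));    // correct stream
console.log(fibDecodeStream("100111011").join(" "));    // fourth bit flipped
console.log(fibDecodeStream("1001111011").join(" "));   // same flip, next code-word starts with '1'
```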

In the Elias omega code, switching a bit of $N_i$’s value section causes the value to be misread unless it is the section’s leading ‘1’ that becomes ‘0’. In the latter case the section is taken to be a length and parts of one or more following code-words are mistaken as parts of $N_i$. If a bit in a length section of $N_i$ is flipped, that length is misread and too little or too much is taken for the next section unless it is the lead ‘0’ that becomes a ‘1’. In that case the length section is taken to be the value section, $N_i$ is taken to end prematurely, and the rest of $N_i$’s code-word causes further mistakes. A single bit error may affect one or many integers.

In the Wallace tree code, changing a ‘1’ to a ‘0’ in $N_i$ causes a premature end to the code-word (not necessarily at the change) and the remainder is taken as two extra integers. Changing a ‘0’ to a ‘1’ causes $N_i$ to absorb $N_{i+1}$ and $N_{i+2}$. A single bit error affects one or three integers and causes an error in indexing (Fig. 7). For example:

  • 10100 11000 100 ... = 3 4 2 ... but

  • 10000 11000 100 ... = 2 1 1 4 2 ...

  • 10110 11000 100 ... = 90 ...

Fig. 7: Example errors and WTC1
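The way a single flipped bit shifts code-word boundaries in a WTC1 stream can be seen by splitting the stream wherever the count of ‘0’s first exceeds the count of ‘1’s. The sketch below does only the splitting (the function name is hypothetical); each piece could then be passed to a decoder such as that of Fig. 5.

```javascript
// Split a concatenation of WTC1 code-words at each point where one more
// '0' than '1's has been seen; an unfinished trailing piece is dropped.
function wtcSplit(bits) {
  const out = [];
  let start = 0, excess = 0;           // excess = (#'1') - (#'0') so far
  for (let i = 0; i < bits.length; i++) {
    excess += bits[i] === "1" ? 1 : -1;
    if (excess < 0) {
      out.push(bits.slice(start, i + 1));
      start = i + 1;
      excess = 0;
    }
  }
  return out;
}

console.log(wtcSplit("1010011000100").join(" "));   // correct stream
console.log(wtcSplit("1000011000100").join(" "));   // a '1' flipped to '0'
console.log(wtcSplit("1011011000100").join(" "));   // a '0' flipped to '1'
```

A ‘1’ flipped to ‘0’ turns one code-word into three, and a ‘0’ flipped to ‘1’ merges three into one, matching the change in the number of integers described above.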

VI Asymptotic Analysis of Wallace tree code

It is of interest to determine the asymptotic behaviour of WTC for increasing integer . Let denote the length of the binary code-word assigned to integer by WTC1. Recall from Section II-C that denotes the Catalan number, and denotes the sum of the first Catalan numbers. Then, the length, in bits, of the code-word assigned by WTC1 to integer is



denotes the smallest integer such that exceeds .

VI-A Bounds on

The following lemma provides appropriate upper and lower bounds for .

Lemma 1. Let denote the length of the code-word assigned to integer by WTC defined by (2). Let



Then, for all , we have

These bounds allow us to determine the asymptotic behaviour of the length of WTC1 code-words.




The proofs of Lemma 1 and Theorem 1 are deferred to Appendix A. Complementing the comparisons in section IV, an interesting consequence of Theorem 1 is that there exists some integer such that , , although the precise is unknown and likely inconceivably large.

Theorem 1 can be used to demonstrate both the universality and asymptotic optimality of WTC, building on Elias [1]. To achieve this we first note that the Elias delta code [1], which has asymptotic code-length $\log_2 N + 2 \log_2 \log_2 N + O(1)$, is both universal and asymptotically optimal. From Theorem 1 it is clear that code-words of WTC are asymptotically shorter than those of the Elias delta code. This establishes the universality and asymptotic optimality of WTC.

Fig. 8: Comparison of exact Wallace-tree code-lengths against upper and lower bounds derived in Lemma 1.

VI-B Asymptotic code-length formulas for WTC

It is useful to have a simple expression for the code-word lengths of integers under WTC. The requirement for such lengths arises in inductive inference by minimum encoding. It is common to use Rissanen’s code-word length formula to provide an approximate length for the statement of integer parameters. WTC provides an alternative coding scheme in such settings. Theorem 1 suggests the approximate code-word length formula for WTC as:


where is a constant. Possible choices for are:

  • , based on the upper-bound on in Theorem 1, which ensures is non-decreasing;

  • , based on the lower-bound on ; or

  • , which is the average of the two error bounds.

The accuracy of both the bounds given by Lemma 1 and the asymptotic expression (7) is demonstrated in Fig. 8. The figure shows a close correspondence between the asymptotic expression (4) and the exact code-length (2), particularly as increases.

VII Conclusions

The Wallace tree code (WTC1) for positive integers has shorter code-words than the Elias omega (and Fibonacci) codes for most integers up to at least . Code-word length increases in steps of two from time to time as $N$ increases. When using the code to store or transmit a sequence of integers, the effect of a bit error is localised. A formula for the approximate code-word length was derived in section VI-B.

We note that there is a second recursive version of the code, . It has the same code-word lengths as WTC but is based on a non-lexicographical ordering of code-words: For code-words of length , consider all partitions of into and such that . Order code-words of the form ‘1’++++, where sub-code-words and , on and within that on , recursively.

As discussed in section II-B1, the standard Elias omega code in effect uses a unary code (“0…” length section, “1…” final value section) to indicate the number () of sections in a code-word. This unary code can be replaced by another code for positive integers, even recursively, giving the omega* code which is more efficient for huge values.

Importantly, ordering the various codes discussed on increasing asymptotic efficiency gives: Fibonacci, Elias delta, Wallace WTC, Elias omega, omega2 (sec.II-B1), and omega* (all but Fibonacci are asymptotically optimal in the sense of Elias [1]).


The authors would like to thank the late Chris Wallace ().

Appendix A Proof of Lemma 1

The basic approach that we use is to lower and upper-bound the function with two new functions and , respectively. To do this, we find lower and upper-bounds, say and that are continuous in , and solve both and for . We first derive the upper-bound . Our starting point is the following lower-bound on established by Topley [10]:


Setting the right-hand-side of (8) to and taking logarithms of both sides yields


We wish to solve the above equation for , but a closed form solution does not exist due to the troublesome logarithmic term. Instead, we can use the bounds

, where the last step is the result of a first order Taylor series expansion of around the point , the convexity of ensuring that the inequality holds. Using this in (9) yields the following lower-bound for :

Solving for yields

The above lower-bound holds for any value of , but by a judicious choice of it can be tightened. Ignoring terms of order in (9) and solving for yields an initial guess at of . Using this in yields (3), which satisfies for all . Substituting for in (2) yields the upper-bound.

We now derive . We start with the following upper-bound on


Setting the right-hand-side of (10) equal to and taking logarithms of both sides yields


As before, solving (11) directly for is impossible due to the logarithmic term. Instead we note that we can use the bounds

for and , along with the fact that to derive the following upper-bound for :

We now note due to the strictly increasing nature of that if then the solution of will satisfy . We therefore choose , which we previously established is an upper-bound to for , and solve for , yielding

which satisfies for all . Substituting for in (2) completes the proof.

Appendix B Proof of Theorem 1


and let . If we rewrite as

then a straightforward application of L’Hopital’s rule shows that

which itself implies (5) if we note that (by application of Lemma 1). Similarly, by tedious algebra we can show that

which implies (6) as , completing the proof.