 # Completely uniformly distributed sequences based on de Bruijn sequences

We study a construction published by Donald Knuth in 1965 yielding a completely uniformly distributed sequence of real numbers. Knuth's work is based on de Bruijn sequences of increasing orders and alphabet sizes, which grow exponentially in each of the successive segments composing the generated sequence. In this work we present a similar albeit simpler construction using linearly increasing alphabet sizes, and give an elementary proof showing that the sequence it yields is also completely uniformly distributed. In addition, we present an alternative proof of the same result based on Weyl's criterion.


## 1. Introduction

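Let $(\bar{x}_n)_{n\ge 1}$ be an infinite sequence of $k$-dimensional vectors of real numbers. For a given set $I \subseteq \mathbb{R}^k$, we denote by $A(\bar{x}_1,\ldots,\bar{x}_N; I)$ the number of indexes $n$, $1 \le n \le N$, such that $\bar{x}_n$ lies in $I$. A sequence $(\bar{x}_n)_{n\ge 1}$ of $k$-dimensional vectors in the unit cube is uniformly distributed if for every set $I = [u_1,v_1)\times\cdots\times[u_k,v_k)$ included in $[0,1)^k$,

$$\lim_{N\to\infty} \frac{1}{N} A(\bar{x}_1,\ldots,\bar{x}_N; I) = |I|,$$

where $|I|$ is the measure of $I$.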

For the theory of uniform distribution see the monograph of Kuipers and Niederreiter. Starting with a sequence $(x_n)_{n\ge 1}$ of real numbers in the unit interval, we define for every integer $k \ge 1$ the $k$-dimensional sequence $(\bar{x}^{(k)}_n)_{n\ge 1}$ where $\bar{x}^{(k)}_n = (x_n, \ldots, x_{n+k-1})$ for $n \ge 1$. The sequence $(x_n)_{n\ge 1}$ is said to be completely uniformly distributed if for every integer $k \ge 1$, the sequence $(\bar{x}^{(k)}_n)_{n\ge 1}$ is uniformly distributed in the $k$-dimensional unit cube.

If a sequence is completely uniformly distributed then it also satisfies many other statistical properties common to all random sequences; see the work of Franklin. For example, for every fixed $k$ the probability of $k$ consecutive terms having any specific relative order is $1/k!$.

The first reference to the concept of complete uniform distribution is the work of Korobov in 1948 [7, 8] where he proves that the sequence where

$$f_\alpha(z) = \sum_{k\ge 1} e^{-k^{\alpha}} z^{k}, \quad \text{with } \alpha \in (1,3)$$

is completely uniformly distributed. Since then, there have been many constructions of completely uniformly distributed sequences; see the books [9, 13] and the references cited there. In  Levin gives a lower bound of decay of discrepancy of completely uniformly distributed sequences and gives constructions, using nets obtained with Sobol matrices and using Halton sequences, which achieve this low discrepancy bound.

Independently of the school of Korobov, in 1965 Knuth gives a construction of a completely uniformly distributed sequence of real numbers based on de Bruijn sequences of increasing orders and alphabet sizes. In each of the successive segments composing the sequence, the alphabet sizes of the underlying de Bruijn sequences increase exponentially. Knuth gives an elementary proof that the sequence is completely uniformly distributed by reasoning about the binary representation of the rational numbers in the sequence. These binary representations are immediate because the alphabet size is always a power of $2$.

In the present work we give a similar albeit simpler construction than the one given by Knuth in  while preserving the property of complete uniform distribution. Our construction is based on de Bruijn sequences of linearly increasing orders and linearly increasing alphabet sizes. The argument used by Knuth to prove that his sequence is completely uniformly distributed does not hold in our case. We give two proofs: one is elementary, and the other is based on Weyl’s criterion. The construction and the proofs we present here can be generalized in a straightforward manner to other growths of alphabet sizes. A first version of this work appears in .

It may be possible to construct completely uniformly distributed sequences with low discrepancy by using variants of de Bruijn sequences such as those considered by Becher and Carton in  (nested perfect necklaces). In that work, they show that the binary expansion of the real number defined by Levin in  using Pascal matrices is indeed the concatenation of such variants of de Bruijn sequences.

We finish this section with a short presentation of de Bruijn sequences. In Section 2 we present Knuth’s sequence, and we devote Section 3 to our construction.

### 1.1. De Bruijn Sequences

A presentation of de Bruijn sequences with a historical background can be read from . A $b$-ary de Bruijn sequence of order $n$ is a sequence of length $b^n$ which, when viewed as a cycle, contains every possible $b$-ary sequence of length $n$ exactly once as a contiguous subsequence. For example, listed next are two distinct binary de Bruijn sequences of order $3$:

 0,0,0,1,0,1,1,1; 0,0,0,1,1,1,0,1.

Every binary sequence of length $3$ appears exactly once as a contiguous subsequence in each example. This includes those instances such as $1,0,0$ which wrap around the right-hand end of the sequences.
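The cyclic reading of the definition can be checked mechanically. The following is a minimal sketch, not part of the original text; the helper name `is_de_bruijn` is hypothetical. It collects every cyclic window of the given length and compares the result with the full list of words over the alphabet.

```python
from itertools import product

def is_de_bruijn(seq, b, n):
    """True if seq, read as a cycle, contains every b-ary word of length n exactly once."""
    if len(seq) != b ** n:
        return False
    size = len(seq)
    # Window starting at i, with indices wrapping around the right-hand end.
    windows = [tuple(seq[(i + j) % size] for j in range(n)) for i in range(size)]
    return sorted(windows) == sorted(product(range(b), repeat=n))

first = [0, 0, 0, 1, 0, 1, 1, 1]
second = [0, 0, 0, 1, 1, 1, 0, 1]
print(is_de_bruijn(first, 2, 3), is_de_bruijn(second, 2, 3))  # True True
```

Both example sequences pass, and the windows that wrap around the end are exactly the ones obtained through the modular index.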

Each $b$-ary de Bruijn sequence of order $n$ is the label of a Hamiltonian cycle in the directed graph $G(b,n) = (V,E)$ where $V$ is the set of all $b$-ary sequences of length $n$ and $E = \{(u,v) \in V \times V : u_2 \cdots u_n = v_1 \cdots v_{n-1}\}$. Due to good properties of de Bruijn graphs, each $b$-ary de Bruijn sequence of order $n$ can also be identified with the label of an Eulerian cycle in $G(b,n-1)$. By the BEST theorem there are exactly $(b!)^{b^{n-1}}/b^{n}$ different $b$-ary de Bruijn sequences of order $n$.

A $b$-ary Ford sequence of order $n$, denoted by $F(b,n)$, is the lexicographically least $b$-ary de Bruijn sequence of order $n$. In the example above, the first sequence is the binary Ford sequence of order $3$, or $F(2,3)$. Fredricksen and Maiorana introduce an algorithm for generating the $b$-ary Ford sequence of order $n$, which has constant amortized running time.
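As an illustration, Ford sequences can be generated with the well-known recursive formulation that concatenates, in lexicographic order, the Lyndon words whose length divides the order. This is a sketch in the spirit of the Fredricksen and Maiorana approach rather than a transcription of their algorithm; the function name `ford_sequence` is an assumption of this example.

```python
def ford_sequence(b, n):
    """Return the lexicographically least b-ary de Bruijn sequence of order n."""
    a = [0] * (n + 1)  # a[1..n] holds the current word; a[0] is a sentinel
    out = []

    def extend(t, p):
        if t > n:
            # The word a[1..n] has period p; emit it only when p divides n.
            if n % p == 0:
                out.extend(a[1:p + 1])
            return
        a[t] = a[t - p]            # keep the current period
        extend(t + 1, p)
        for symbol in range(a[t - p] + 1, b):
            a[t] = symbol          # a larger symbol starts a new period t
            extend(t + 1, t)

    extend(1, 1)
    return out

print(ford_sequence(2, 3))  # [0, 0, 0, 1, 0, 1, 1, 1]
```

The recursion depth equals the order, so this sketch is only meant for the small orders used in the examples of this paper.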

## 2. Knuth’s Sequence

We follow Knuth’s presentation in . Knuth’s sequence can be defined from any given family of de Bruijn sequences. Here, we use Ford sequences because they are conveniently defined and because they can be generated efficiently.

###### Definition.

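Given $n \ge 1$, let $F(2^n,n) = f_1, f_2, \ldots, f_{2^{n^2}}$ denote the $2^n$-ary Ford sequence of order $n$. An $A$-sequence of order $n$, denoted by $A(n)$, is the finite sequence of rational numbers obtained by dividing every term in $F(2^n,n)$ by $2^n$:

$$A(n) = \frac{f_1}{2^n}, \frac{f_2}{2^n}, \ldots, \frac{f_{2^{n^2}}}{2^n} = \left(\frac{f_i}{2^n}\right)_{i=1,\ldots,2^{n^2}}.$$

A $B$-sequence of order $n$, denoted by $B(n)$, is the sequence obtained from the concatenation of $n2^{2n}$ copies of $A(n)$:

$$B(n) = \langle \underbrace{A(n); A(n); \ldots; A(n)}_{n2^{2n}\text{ times}} \rangle.$$

By construction, the size of $A(n)$ is

$$|A(n)| = |F(2^n,n)| = 2^{n^2}$$

and the size of $B(n)$ is

$$|B(n)| = n2^{2n}|A(n)| = n2^{2n}2^{n^2}.$$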

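Notice that, for any given $n$, all terms in $A(n)$ and in $B(n)$ are numbers in the set $\{0, \tfrac{1}{2^n}, \ldots, \tfrac{2^n-1}{2^n}\}$. For example, when $n = 2$:

$$\begin{aligned}
F(4,2) &= 0,0,1,0,2,0,3,1,1,2,1,3,2,2,3,3 \\
A(2) &= \tfrac04,\tfrac04,\tfrac14,\tfrac04,\tfrac24,\tfrac04,\tfrac34,\tfrac14,\tfrac14,\tfrac24,\tfrac14,\tfrac34,\tfrac24,\tfrac24,\tfrac34,\tfrac34 \\
B(2) &= \langle \underbrace{A(2);\ldots;A(2)}_{32\text{ times}} \rangle = \underbrace{\tfrac04,\tfrac04,\ldots,\tfrac34,\tfrac34}_{A(2)},\ldots,\underbrace{\tfrac04,\tfrac04,\ldots,\tfrac34,\tfrac34}_{A(2)}
\end{aligned}$$

and $|A(2)| = 16$, $|B(2)| = 512$.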
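The sizes in this example can be reproduced with a short sketch. It assumes the standard recursive generator for Ford sequences shown inside the block; the names `ford_sequence`, `a_sequence` and `b_sequence` are hypothetical helpers, not notation from Knuth's paper.

```python
from fractions import Fraction

def ford_sequence(b, n):
    """Lexicographically least b-ary de Bruijn sequence of order n."""
    a, out = [0] * (n + 1), []

    def extend(t, p):
        if t > n:
            if n % p == 0:
                out.extend(a[1:p + 1])
            return
        a[t] = a[t - p]
        extend(t + 1, p)
        for symbol in range(a[t - p] + 1, b):
            a[t] = symbol
            extend(t + 1, t)

    extend(1, 1)
    return out

def a_sequence(n):
    """A(n): every term of F(2^n, n) divided by 2^n."""
    base = 2 ** n
    return [Fraction(f, base) for f in ford_sequence(base, n)]

def b_sequence(n):
    """B(n): n * 2^(2n) consecutive copies of A(n)."""
    return a_sequence(n) * (n * 2 ** (2 * n))

print(len(a_sequence(2)), len(b_sequence(2)))  # 16 512
```

Exact fractions are used so that the terms stay in the set of multiples of $1/2^n$ without rounding.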

###### Definition.

Knuth’s sequence, denoted by $K$, is the infinite sequence of rational numbers resulting from the concatenation of the sequences $B(n)$ for $n = 1, 2, 3, \ldots$:

$$K = \langle B(1); B(2); B(3); \ldots \rangle.$$

###### Theorem (Knuth 1965, page 268).

The sequence $K$ is completely uniformly distributed.

Knuth provides an elementary proof of this theorem. Two choices in Knuth’s construction play an important role in his proof. One is that the number of repetitions of each $A$-sequence within a $B$-sequence is $n2^{2n}$. It follows from Knuth’s proof that to ensure complete uniform distribution it is sufficient that the number of repetitions of each $A$-sequence within a $B$-sequence grows faster than . For instance, suffices as well.

The other choice in Knuth’s construction is that the alphabet sizes of the Ford sequences grow exponentially as $2^n$. This allows us to reason about the rational numbers comprised in the sequence in terms of their binary representations. Knuth considers the distribution of the most-significant bits of each of the terms in a -ary Ford sequence, where . See [6, Lemma 2].

## 3. Main result

We now present the main contribution of this work, which is a variant of Knuth’s construction based on Ford sequences with linearly increasing alphabet sizes.

###### Definition.

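Given $n \ge 1$ and a function $t : \mathbb{N} \to \mathbb{N}$, let $F(n,n) = f_1, f_2, \ldots, f_{n^n}$ denote the $n$-ary Ford sequence of order $n$. A $C$-sequence of order $n$, denoted by $C(n)$, is the finite sequence of rational numbers obtained by dividing each of the terms in $F(n,n)$ by $n$:

$$C(n) = \frac{f_1}{n}, \frac{f_2}{n}, \ldots, \frac{f_{n^n}}{n} = \left(\frac{f_i}{n}\right)_{i=1,\ldots,n^n},$$

and a $D$-sequence of order $n$, denoted by $D(n,t)$, is the sequence obtained from the concatenation of $t(n)$ consecutive copies of $C(n)$:

$$D(n,t) = \langle \underbrace{C(n); C(n); \ldots; C(n)}_{t(n)\text{ times}} \rangle.$$

The size of $C(n)$ is

$$|C(n)| = |F(n,n)| = n^n,$$

and the size of $D(n,t)$ is

$$|D(n,t)| = t(n)|C(n)| = t(n)n^n.$$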

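For any given $n$, all terms in $C(n)$ and in $D(n,t)$ are numbers in the set $\{0, \tfrac{1}{n}, \ldots, \tfrac{n-1}{n}\}$. The key difference between the way $C$-sequences are constructed when compared to $A$-sequences from the previous section is that, as the order of the sequence grows, the alphabet size for the underlying Ford sequence grows linearly ($n$) rather than exponentially ($2^n$). For example, when $n = 3$ and $t = \mathrm{id}$:

$$\begin{aligned}
F(3,3) &= 0,0,0,1,0,0,2,0,1,1,0,1,2,0,2,1,0,2,2,1,1,1,2,1,2,2,2 \\
C(3) &= \tfrac03,\tfrac03,\tfrac03,\tfrac13,\tfrac03,\tfrac03,\tfrac23,\tfrac03,\tfrac13,\tfrac13,\tfrac03,\tfrac13,\tfrac23,\tfrac03,\tfrac23,\tfrac13,\tfrac03,\tfrac23,\tfrac23,\tfrac13,\tfrac13,\tfrac13,\tfrac23,\tfrac13,\tfrac23,\tfrac23,\tfrac23 \\
D(3,\mathrm{id}) &= \langle C(3); C(3); C(3) \rangle = \underbrace{\tfrac03,\tfrac03,\ldots,\tfrac23,\tfrac23}_{C(3)},\underbrace{\tfrac03,\tfrac03,\ldots,\tfrac23,\tfrac23}_{C(3)},\underbrace{\tfrac03,\tfrac03,\ldots,\tfrac23,\tfrac23}_{C(3)}
\end{aligned}$$

and $|C(3)| = 27$, $|D(3,\mathrm{id})| = 81$.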

###### Definition.

Given $t : \mathbb{N} \to \mathbb{N}$, the sequence $L(t)$ is the infinite sequence of real numbers obtained from the concatenation of the sequences $D(n,t)$ for $n = 1, 2, 3, \ldots$:

$$L(t) = \langle D(1,t); D(2,t); D(3,t); \ldots \rangle.$$

###### Theorem 1.

If $t$ is a non-decreasing function and , then the sequence $L(t)$ is completely uniformly distributed.

###### Example.

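If $t = \mathrm{sq}$, where $\mathrm{sq}(n) = n^2$, then:

$$L(\mathrm{sq}) = \langle C(1); \underbrace{C(2);\ldots;C(2)}_{4\text{ copies}}; \underbrace{C(3);\ldots;C(3)}_{9\text{ copies}}; \ldots \rangle,$$

and $L(\mathrm{sq})$ is completely uniformly distributed.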
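The construction lends itself to lazy generation. The sketch below is illustrative only: it takes $t(n) = n^2$ as in the example, and the names `ford_sequence` and `l_sequence` are assumptions of this sketch. It yields the terms of the sequence one $C$-sequence at a time.

```python
from fractions import Fraction
from itertools import count, islice

def ford_sequence(b, n):
    """Lexicographically least b-ary de Bruijn sequence of order n."""
    a, out = [0] * (n + 1), []

    def extend(t, p):
        if t > n:
            if n % p == 0:
                out.extend(a[1:p + 1])
            return
        a[t] = a[t - p]
        extend(t + 1, p)
        for symbol in range(a[t - p] + 1, b):
            a[t] = symbol
            extend(t + 1, t)

    extend(1, 1)
    return out

def l_sequence(t):
    """Lazily yield the concatenation of D(1,t), D(2,t), D(3,t), ..."""
    for n in count(1):
        c = [Fraction(f, n) for f in ford_sequence(n, n)]  # C(n)
        for _ in range(t(n)):                              # t(n) copies form D(n,t)
            yield from c

prefix = list(islice(l_sequence(lambda n: n * n), 20))
print([str(x) for x in prefix[:9]])
# ['0', '0', '0', '1/2', '1/2', '0', '0', '1/2', '1/2']
```

The first term is the single element of the order-1 block, followed by four copies of $0, 0, \tfrac12, \tfrac12$; the 18th term is where the order-3 block begins.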

### 3.1. Proof of Theorem 1

Within the current section, let $t$ be an arbitrary but fixed function. For convenience we write $D(n)$ and $L$ in place of $D(n,t)$ and $L(t)$. Consider a prefix of $L$ of length $N$, denoted by $L_{1:N}$. Let $r$, $q$ and $p$ be the integers determined by $N$ such that:

$$L_{1:N} = \langle D(1); \ldots; D(r-1); \underbrace{C(r); \ldots; C(r)}_{q\text{ times}}; C(r)_{1:p} \rangle$$

where $0 \le q < t(r)$ and $0 \le p < r^r$. Here, $r$ is the order of the rightmost, possibly incomplete $D$-sequence present in $L_{1:N}$. The number $q$ is the number of complete $C$-sequences of order $r$ appearing before the rightmost, possibly incomplete $C$-sequence, while $p$ is the number of terms in it. Thus,

$$N = \sum_{s=1}^{r-1} |D(s)| + q|C(r)| + p = \sum_{s=1}^{r-1} t(s)s^s + q r^r + p. \tag{1}$$
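To make the decomposition of $N$ concrete, here is a minimal sketch that recovers $(r, q, p)$ from $N$. It assumes the conventions $0 \le q < t(r)$ and $0 \le p < r^r$, and that $t(n) \ge 1$ for every $n$; the function name `decompose` is hypothetical.

```python
def decompose(N, t):
    """Return (r, q, p) with N = sum_{s<r} t(s)*s^s + q*r^r + p.

    Assumes t(n) >= 1 for all n, 0 <= q < t(r) and 0 <= p < r^r.
    """
    r, remaining = 1, N
    # Peel off complete D-sequences D(1), D(2), ... while they fit.
    while remaining >= t(r) * r ** r:
        remaining -= t(r) * r ** r
        r += 1
    # What is left lies inside D(r): q full copies of C(r) plus p extra terms.
    q, p = divmod(remaining, r ** r)
    return r, q, p

print(decompose(100, lambda n: n * n))  # (3, 3, 2)
```

With $t(n) = n^2$ the first two blocks have sizes $1$ and $16$, so $N = 100$ leaves $83 = 3 \cdot 27 + 2$ terms inside the order-3 block.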

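Let $k$ be a positive integer and let $I$ be a set such that $I = [u_1,v_1)\times\cdots\times[u_k,v_k) \subseteq [0,1)^k$. Let $N$ range freely over the natural numbers, and let the quantity $\nu_N$ denote the number of windows of $L$ of size $k$ starting at indices $1,\ldots,N$ that belong to the set $I$:

$$\nu_N = A(\bar{w}_1,\ldots,\bar{w}_N; I),$$

where $\bar{w}_i = (L_i,\ldots,L_{i+k-1})$ for $1 \le i \le N$.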

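Consider sufficiently large values of $N$ such that $r > k$. This is always possible since $r$ is an unbounded, non-decreasing function of $N$. We can decompose $L_{1:N}$ into four consecutive sections; namely, sequences $S^{(1)}$, $S^{(2)}$, $S^{(3)}$ and $S^{(4)}$:

$$\begin{aligned}
L_{1:N} &= \langle S^{(1)}; S^{(2)}; S^{(3)}; S^{(4)} \rangle, \quad \text{where} \\
S^{(1)} &= \langle D(1); D(2); \ldots; D(k-1) \rangle \\
S^{(2)} &= \langle D(k); D(k+1); \ldots; D(r-1) \rangle \\
S^{(3)} &= \langle \underbrace{C(r); \ldots; C(r)}_{q\text{ times}} \rangle \\
S^{(4)} &= C(r)_{1:p}.
\end{aligned}$$

Notice that $S^{(3)}$ and $S^{(4)}$ can potentially be empty, such as when $q = 0$ or $p = 0$, respectively.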

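We denote the cumulative sums of the sizes of the sequences defined above as $n_0 = 0$, and $n_j = \sum_{i=1}^{j} |S^{(i)}|$ for $1 \le j \le 4$. Now, we can similarly decompose $\nu_N$ into five parts. If we let

$$\bar{w}_i = (L_i,\ldots,L_{i+k-1})$$

for $1 \le i \le N$, then:

$$\begin{aligned}
\nu_N &= \nu^{(1)}_N + \nu^{(2)}_N + \nu^{(3)}_N + \nu^{(4)}_N + \varepsilon_b, \quad \text{where} \\
\nu^{(j)}_N &= A(\bar{w}_{1+n_{j-1}},\ldots,\bar{w}_{n_j-k+1}; I)
\end{aligned}$$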

for some .

For each $j$, the quantity $\nu^{(j)}_N$ accounts for windows contained entirely within the sequence $S^{(j)}$, and $\varepsilon_b$ accounts for all windows crossing any of the three borders between the four sections. This is enough to account for all possible windows, since any given window is either entirely contained in some section, or it starts at a given section and ends at a subsequent one, thereby crossing a border.

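We can rewrite the value of each $\nu^{(j)}_N$ in terms of windows over each sequence $S^{(j)}$. If we let

$$\bar{s}^{(j)}_i = (S^{(j)}_i,\ldots,S^{(j)}_{i+k-1})$$

for $1 \le j \le 4$ and $1 \le i \le |S^{(j)}|-k+1$, then:

$$\nu^{(j)}_N = A(\bar{s}^{(j)}_1,\ldots,\bar{s}^{(j)}_{|S^{(j)}|-k+1}; I). \tag{2}$$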

Before obtaining more precise expressions for these quantities, we state the following three technical propositions.

###### Proposition 2.

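If $n$ is a positive integer and $u, v \in [0,1]$ are such that $u < v$, then the number of integers from the set $\{0,\ldots,n-1\}$ contained in $[nu, nv)$ is equal to $n(v-u)+\varepsilon$ for some $\varepsilon \in [-1,1]$.

###### Proof.

Since $nu \ge 0$, there are exactly $nu + \varepsilon_u$ non-negative integers in the set $[0,nu)$ for some $\varepsilon_u \in [0,1)$. Similarly for $nv$, there are exactly $nv + \varepsilon_v$ non-negative integers in the set $[0,nv)$ for some $\varepsilon_v \in [0,1)$. The difference between these two quantities is equal to the number of non-negative integers contained in the set $[nu,nv)$, which is $n(v-u) + (\varepsilon_v - \varepsilon_u)$. Observing that $\varepsilon_v - \varepsilon_u \in (-1,1)$, and that all non-negative integers between $nu$ and $nv$ belong to the set $\{0,\ldots,n-1\}$, the proof is complete. ∎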
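A quick numerical check can make the counting argument tangible. The sketch below reads the proposition as a statement about the integers $m$ in $\{0,\ldots,n-1\}$ with $nu \le m < nv$, and measures how far their count is from $n(v-u)$; that reading and the helper name `integers_in_scaled_interval` are assumptions of this example.

```python
from fractions import Fraction

def integers_in_scaled_interval(n, u, v):
    """Count the integers m in {0, ..., n-1} with n*u <= m < n*v."""
    return sum(1 for m in range(n) if n * u <= m < n * v)

n, u, v = 7, Fraction(1, 5), Fraction(3, 4)
found = integers_in_scaled_interval(n, u, v)
eps = found - n * (v - u)   # deviation from the "ideal" count n(v - u)
print(found, eps)           # 4 3/20
```

Here $nu = 1.4$ and $nv = 5.25$, so the integers counted are $2, 3, 4, 5$, and the deviation $3/20$ is smaller than $1$ in absolute value.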

###### Proposition 3.

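If $k$ is a positive integer and $(a_d)_{d=1}^{k}$ and $(b_d)_{d=1}^{k}$ are two sequences of real numbers of length $k$, then the product of their element-by-element sums can be expressed as follows:

$$\prod_{d=1}^{k}(a_d+b_d) = \prod_{d=1}^{k} a_d + \sum_{j=1}^{2^k-1}\left[\prod_{d=1}^{k}\begin{cases} a_d & \text{if } \lfloor j/2^{d-1} \rfloor \text{ is even} \\ b_d & \text{otherwise} \end{cases}\right].$$

###### Proof.

By induction on $k$. Throughout, we write $c_d(j)$ for the factor selected by the case distinction above, that is, $c_d(j) = a_d$ if $\lfloor j/2^{d-1} \rfloor$ is even and $c_d(j) = b_d$ otherwise. First, notice that the property holds for $k = 1$:

$$\prod_{d=1}^{1}(a_d+b_d) = a_1 + b_1 = \prod_{d=1}^{1} a_d + \sum_{j=1}^{1} c_1(j),$$

since $\lfloor 1/2^{0} \rfloor = 1$ is odd and therefore $c_1(1) = b_1$.

Next, we see that the inductive step holds for any $k \ge 1$. First,

$$\begin{aligned}
\prod_{d=1}^{k+1}(a_d+b_d) &= (a_{k+1}+b_{k+1})\left[\prod_{d=1}^{k} a_d + \sum_{j=1}^{2^k-1}\prod_{d=1}^{k} c_d(j)\right] \\
&= a_{k+1}\prod_{d=1}^{k} a_d + a_{k+1}\sum_{j=1}^{2^k-1}\prod_{d=1}^{k} c_d(j) + b_{k+1}\prod_{d=1}^{k} a_d + b_{k+1}\sum_{j=1}^{2^k-1}\prod_{d=1}^{k} c_d(j).
\end{aligned}$$

Since for every $1 \le j \le 2^k-1$ the value $\lfloor j/2^{k} \rfloor = 0$ and is therefore even, we can add the factor $a_{k+1}$ to the product in the second term simply by raising the upper limit to $k+1$. Similarly, in the fourth term we can add the factor $b_{k+1}$ to the product by raising the upper limit to $k+1$ and changing the limits in the sum to $2^k+1 \le j \le 2^{k+1}-1$. This is true because adding $2^k$ to $j$ does not change the parity of $\lfloor j/2^{d-1} \rfloor$ for any $1 \le d \le k$, but when $d = k+1$ the value $\lfloor (j+2^k)/2^{k} \rfloor = 1$ and is therefore odd. The third term can be rewritten as a similar product for a value of $j = 2^k$, and substituting into the equation above:

$$\begin{aligned}
\prod_{d=1}^{k+1}(a_d+b_d) &= \prod_{d=1}^{k+1} a_d + \sum_{j=1}^{2^k-1}\prod_{d=1}^{k+1} c_d(j) + \prod_{d=1}^{k+1} c_d(2^k) + \sum_{j=2^k+1}^{2^{k+1}-1}\prod_{d=1}^{k+1} c_d(j) \\
&= \prod_{d=1}^{k+1} a_d + \sum_{j=1}^{2^{k+1}-1}\prod_{d=1}^{k+1} c_d(j),
\end{aligned}$$

which completes the proof. ∎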
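The expansion can be sanity-checked numerically. The sketch below assumes the form in which the identity is used in equation (4): the product of the sums equals the product of the first sequence plus, for each $j$ from $1$ to $2^k-1$, a product that picks the second sequence's entry at position $d$ exactly when $\lfloor j/2^{d-1} \rfloor$ is odd. The function name `expanded_product` is hypothetical.

```python
from math import prod

def expanded_product(a, b):
    """Evaluate the right-hand side of the expansion for sequences a and b."""
    k = len(a)
    total = prod(a)                      # the term taking a_d at every position
    for j in range(1, 2 ** k):
        # Position d (0-based here) takes b[d] when bit d of j is set,
        # i.e. when floor(j / 2^d) is odd, and a[d] otherwise.
        total += prod(b[d] if (j >> d) % 2 else a[d] for d in range(k))
    return total

a, b = [1, 2, 3], [4, 5, 6]
print(prod(x + y for x, y in zip(a, b)), expanded_product(a, b))  # 315 315
```

In other words, $j$ ranges over the non-empty subsets of positions, encoded as bitmasks, which is why every mixed term of the expanded product appears exactly once.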

###### Proposition 4.

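Given $n \ge 1$, the following holds:

$$\sum_{i=1}^{n} i^{i-1} \le 2n^{n-1}.$$

###### Proof.

By induction on $n$. The property holds for $n = 1$ and $n = 2$:

$$\sum_{i=1}^{1} i^{i-1} = 1 \le 2, \qquad \sum_{i=1}^{2} i^{i-1} = 3 \le 4$$

and the inductive step holds for $n \ge 2$:

$$\sum_{i=1}^{n+1} i^{i-1} = \underbrace{\sum_{i=1}^{n} i^{i-1}}_{\le 2n^{n-1}\text{ by I. H.}} + (n+1)^n \le n \, n^{n-1} + (n+1)^n \le 2(n+1)^n.$$

Therefore, the property holds for all $n \ge 1$. ∎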
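As a small sanity check of the bound, the following sketch evaluates both sides with exact integer arithmetic for the first few values of $n$; the function name `proposition_4_holds` is an assumption of this example.

```python
def proposition_4_holds(n):
    """Check that 1^0 + 2^1 + ... + n^(n-1) <= 2 * n^(n-1) for one value of n."""
    return sum(i ** (i - 1) for i in range(1, n + 1)) <= 2 * n ** (n - 1)

print(all(proposition_4_holds(n) for n in range(1, 60)))  # True
```

The last summand dominates quickly, which is why the factor $2$ leaves ample room once $n$ grows.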

We now obtain an expression for the number of windows of a $C$-sequence which are contained in the set $I$. This is useful for evaluating $\nu^{(2)}_N$ and $\nu^{(3)}_N$.

###### Lemma 5.

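Given a positive integer $k$ and a set $I = [u_1,v_1)\times\cdots\times[u_k,v_k)$ where $I \subseteq [0,1)^k$, let $n$ be such that $n \ge k$ and consider the sequence $C(n)$ as a cyclic sequence. If we let $\bar{c}_i = (C(n)_i,\ldots,C(n)_{i+k-1})$ for $1 \le i \le n^n$, then the number of windows of size $k$ from the sequence $C(n)$ that lie in $I$ is:

$$A(\bar{c}_1,\ldots,\bar{c}_{n^n}; I) = n^n|I| + n^{n-1}(2^k-1)\varepsilon$$

for some $\varepsilon \in [-1,1]$.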

###### Proof.

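First, notice that any given window $\bar{c}_i$ is contained in the set $I$ if and only if the following is true:

$$\bar{c}_i \in I \iff u_1 \le C(n)_i < v_1 \;\wedge\; \cdots \;\wedge\; u_k \le C(n)_{i+k-1} < v_k$$

where $1 \le i \le n^n$ and indices are taken modulo $n^n$.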

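Since all terms in $C(n)$ are numbers in the set $\{0, \tfrac{1}{n}, \ldots, \tfrac{n-1}{n}\}$, we multiply both sides of each inequality by $n$, allowing us to reason about integers belonging to a Ford sequence instead of rational numbers. We obtain the following:

$$\bar{c}_i \in I \iff nu_1 \le F(n,n)_i < nv_1 \;\wedge\; \cdots \;\wedge\; nu_k \le F(n,n)_{i+k-1} < nv_k.$$

According to Proposition 2, for each inequality above with $1 \le d \le k$ there are exactly $n(v_d-u_d)+\varepsilon_d$ possible solutions in the set $\{0,\ldots,n-1\}$ for some value $\varepsilon_d \in [-1,1]$. This yields a total of $\prod_{d=1}^{k}[n(v_d-u_d)+\varepsilon_d]$ possible solutions to the system of inequalities. Each solution, when seen as an $n$-ary sequence of length $k$, appears exactly $n^{n-k}$ times as a contiguous subsequence in $F(n,n)$. This is true because there are $n^{n-k}$ ways of extending an $n$-ary sequence of length $k$ to one of length $n$ and, by construction, each of these appears exactly once in $F(n,n)$ when viewed as a cycle. Since $\bar{c}_i$ ranges exactly once over each possible window of $C(n)$, then:

$$A(\bar{c}_1,\ldots,\bar{c}_{n^n}; I) = n^{n-k}\prod_{d=1}^{k}\left[n(v_d-u_d)+\varepsilon_d\right] = n^n\prod_{d=1}^{k}\left[(v_d-u_d)+\varepsilon_d/n\right]. \tag{3}$$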

Using Proposition 3, we can expand this into the following:

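$$n^n\prod_{d=1}^{k}\left[(v_d-u_d)+\varepsilon_d/n\right] = n^n\prod_{d=1}^{k}(v_d-u_d) + n^n\sum_{j=1}^{2^k-1}\left[\prod_{d=1}^{k}\begin{cases}(v_d-u_d) & \text{if } \lfloor j/2^{d-1} \rfloor \text{ is even} \\ \varepsilon_d/n & \text{otherwise}\end{cases}\right]. \tag{4}$$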

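If we define $\varepsilon'_j$ for $1 \le j \le 2^k-1$ as:

$$\varepsilon'_j/n = \prod_{d=1}^{k}\begin{cases}(v_d-u_d) & \text{if } \lfloor j/2^{d-1} \rfloor \text{ is even} \\ \varepsilon_d/n & \text{otherwise}\end{cases}$$

then, for each $j$, the value $\varepsilon'_j \in [-1,1]$. This is true because the product on the right-hand side is composed of terms $(v_d-u_d)$ and $\varepsilon_d/n$, all of absolute value at most $1$, and, since $j \ge 1$, there is always at least one term of the second kind. Given that $\prod_{d=1}^{k}(v_d-u_d) = |I|$, we can further simplify equation (4) to get:

$$n^n\prod_{d=1}^{k}\left[(v_d-u_d)+\varepsilon_d/n\right] = n^n|I| + n^n\sum_{j=1}^{2^k-1}\varepsilon'_j/n. \tag{5}$$

Finally, since $\varepsilon'_j \in [-1,1]$ for every $j$, there exists some $\varepsilon \in [-1,1]$ such that:

$$\sum_{j=1}^{2^k-1}\varepsilon'_j = (2^k-1)\varepsilon$$

and hence, by (3) and (5):

$$A(\bar{c}_1,\ldots,\bar{c}_{n^n}; I) = n^n|I| + n^{n-1}(2^k-1)\varepsilon.$$

∎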
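The estimate can be checked numerically on a small case. The sketch below counts the cyclic windows of the order-4 sequence $F(4,4)/4$ that fall in a box and compares the count with $n^n|I|$. The box, the helper names `ford_sequence` and `cyclic_window_count`, and the recursive Ford generator are assumptions of this illustration.

```python
from fractions import Fraction

def ford_sequence(b, n):
    """Lexicographically least b-ary de Bruijn sequence of order n."""
    a, out = [0] * (n + 1), []

    def extend(t, p):
        if t > n:
            if n % p == 0:
                out.extend(a[1:p + 1])
            return
        a[t] = a[t - p]
        extend(t + 1, p)
        for symbol in range(a[t - p] + 1, b):
            a[t] = symbol
            extend(t + 1, t)

    extend(1, 1)
    return out

def cyclic_window_count(n, box):
    """Count cyclic windows of F(n,n)/n lying in the box [u_1,v_1) x ... x [u_k,v_k)."""
    seq = [Fraction(f, n) for f in ford_sequence(n, n)]
    size = len(seq)
    return sum(
        all(u <= seq[(i + d) % size] < v for d, (u, v) in enumerate(box))
        for i in range(size)
    )

n = 4
box = [(Fraction(1, 10), Fraction(6, 10)), (Fraction(3, 10), Fraction(1))]
k = len(box)
found = cyclic_window_count(n, box)
volume = Fraction(1)
for u, v in box:
    volume *= v - u
error = found - n ** n * volume
bound = n ** (n - 1) * (2 ** k - 1)
print(found, float(error), bound)  # 64 -25.6 192
```

Two admissible values per coordinate give four words of length $2$, each occurring $4^{4-2} = 16$ times in the cycle, hence $64$ windows; the deviation from $n^n|I| = 89.6$ stays well within $n^{n-1}(2^k-1) = 192$.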

We can now give the proof of our main result.

###### Proof of Theorem 1.

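Let $k$ be a positive integer and let $I$ be a set such that $I = [u_1,v_1)\times\cdots\times[u_k,v_k) \subseteq [0,1)^k$. Let $N$ range freely over the natural numbers. We now obtain an expression for $\nu_N$ and compute the limit of $\nu_N/N$ when $N \to \infty$.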

Recall the following definitions:

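$$\begin{aligned}
\nu^{(2)}_N &= A(\bar{s}^{(2)}_1,\ldots,\bar{s}^{(2)}_{|S^{(2)}|-k+1}; I) \\
\nu^{(3)}_N &= A(\bar{s}^{(3)}_1,\ldots,\bar{s}^{(3)}_{|S^{(3)}|-k+1}; I) \\
S^{(2)} &= \langle D(k); D(k+1); \ldots; D(r-1) \rangle \\
S^{(3)} &= \langle \underbrace{C(r); \ldots; C(r)}_{q\text{ times}} \rangle
\end{aligned}$$

where

$$\bar{s}^{(j)}_i = (S^{(j)}_i,\ldots,S^{(j)}_{i+k-1})$$

for $j \in \{2,3\}$ and $1 \le i \le |S^{(j)}|-k+1$.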

The sequences $S^{(2)}$ and $S^{(3)}$ are entirely composed of complete $C$-sequences of increasing orders which are larger than or equal to $k$. Moreover, with the exception of the last, rightmost instance in each of $S^{(2)}$ and $S^{(3)}$, every single $C$-sequence is immediately succeeded by another $C$-sequence of the same or the following order, including those which are part of a $D$-sequence. Additionally, any window starting at the right-hand end of a $C$-sequence necessarily finishes within the first $k-1$ elements of the following $C$-sequence, all of which are guaranteed to be $0$.

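Therefore, the number of windows of size $k$ contained in $I$ ranging over $S^{(2)}$ and $S^{(3)}$ is equal to the sum over each composing $C$-sequence viewed as a cycle, with an error of at most $k-1$ due to the fact that we are counting only windows entirely contained within each sequence. If we let

$$\bar{c}^{(s)}_i = (C(s)_i,\ldots,C(s)_{i+k-1})$$

for $k \le s \le r$ and $1 \le i \le s^s$, then:

$$\begin{aligned}
\nu^{(2)}_N &= \sum_{s=k}^{r-1}\underbrace{\left[t(s)\,A(\bar{c}^{(s)}_1,\ldots,\bar{c}^{(s)}_{s^s}; I)\right]}_{C\text{-sequences contained in } D(s)} + \varepsilon_{\nu^{(2)}_N} \\
\nu^{(3)}_N &= q\,A(\bar{c}^{(r)}_1,\ldots,\bar{c}^{(r)}_{r^r}; I) + \varepsilon_{\nu^{(3)}_N}
\end{aligned}$$

for some values $\varepsilon_{\nu^{(2)}_N}$ and $\varepsilon_{\nu^{(3)}_N}$ of absolute value at most $k-1$. From Lemma 5: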

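$$\begin{aligned}
\nu^{(2)}_N &= \sum_{s=k}^{r-1}\left[t(s)\left(s^s|I| + s^{s-1}(2^k-1)\varepsilon_s\right)\right] + \varepsilon_{\nu^{(2)}_N} \\
\nu^{(3)}_N &= q\left(r^r|I| + r^{r-1}(2^k-1)\varepsilon_r\right) + \varepsilon_{\nu^{(3)}_N}
\end{aligned}$$

for some values of $\varepsilon_s \in [-1,1]$, $k \le s \le r$. Substituting back into $\nu_N$ from equation (2):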

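$$\begin{aligned}
\nu_N ={} & \nu^{(1)}_N \\
& + \sum_{s=k}^{r-1}\left[t(s)\left(s^s|I| + s^{s-1}(2^k-1)\varepsilon_s\right)\right] + \varepsilon_{\nu^{(2)}_N} \\
& + q\left(r^r|I| + r^{r-1}(2^k-1)\varepsilon_r\right) + \varepsilon_{\nu^{(3)}_N} \\
& + \nu^{(4)}_N + \varepsilon_b
\end{aligned}$$

and factoring out terms multiplied by $|I|$, we get: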

 νN =|I|[r−1∑s=kt(s)ss+qrr] +ν(1)N +r−1∑s=k