# Insertion in constructed normal numbers

Defined by Borel, a real number is normal to an integer base b, greater than or equal to 2, if in its base-b expansion every block of digits occurs with the same limiting frequency as every other block of the same length. We consider the problem of insertion in constructed base-b normal expansions to obtain normality to base (b+1).


## 1 Problem description and statement of results

Defined by Émile Borel, a real number is normal to an integer base $b$, greater than or equal to $2$, if in its base-$b$ expansion every block of digits occurs with the same limiting frequency as every other block of the same length. Equivalently, a real number $x$ is normal to base $b$ if the fractional parts of $x, bx, b^2x, \ldots$ are uniformly distributed modulo one in the unit interval.

There are many ways to modify normal numbers while preserving normality to a given base. A major result is Wall’s theorem  showing that the subsequences of a base-$b$ expansion along arithmetic progressions preserve normality, crowned by Kamae and Weiss’  complete characterization of the subsequences that preserve normality. Other normality-preserving operations are addition of some numbers [19, 1, 17], multiplication by a rational , transformations by some finite automata , and more.

Another form of modification transfers normality from base $(b+1)$ to normality to base $b$: Vandehey [18, Theorem 1.2] proved that the subsequence of a base-$(b+1)$ normal expansion formed by all the digits different from $b$ is normal to base $b$. This is, indeed, the removal from a normal base-$(b+1)$ expansion of all the instances of the digit $b$.

Here we consider the dual, the problem of transferring normality from base $b$ to base $(b+1)$.

###### Problem.

How to insert digits along a normal base-$b$ expansion so that the resulting expansion is normal to base $(b+1)$?

There are two versions of the insertion problem:

• when insertion liberally uses all the digits in base $(b+1)$,

• when insertion is limited just to the new digit.

In the present work we tackle the two versions of the insertion problem on a class of constructed normal numbers. For each version of the problem we give an effective construction that controls the distance between each occurrence of the new digit and the next. An effective construction is a prescription on how to perform the insertion while reading the input sequence from left to right.

Since we look at normality to just one base at a time, instead of fractional expansions of real numbers we deal with sequences of symbols in a given alphabet and we talk about normality to that alphabet. We state the results as transferring normality from an alphabet $A$ to the alphabet $\widehat{A} = A \cup \{\sigma\}$, with $\sigma$ not in $A$.

For the liberal insertion problem the input is the concatenation of perfect necklaces over alphabet  of linearly increasing order and the resulting sequence is also a concatenation of perfect necklaces of linearly increasing order but over alphabet . Perfect necklaces were introduced in . They are a variant of the classical de Bruijn sequences. The concatenation of perfect necklaces of linearly increasing order is a normal sequence (this is proved in Proposition 4). We prove the following.

###### Theorem 1.

Let alphabets and with not in . Let be a concatenation of perfect necklaces over alphabet  of linearly increasing order. Then, there is an effective construction of normal to alphabet such that is the concatenation of perfect necklaces over  of linearly increasing order, and  is a subsequence of . And for every integer  greater than , in between the occurrences of the symbol in just before and just after position there are at most symbols.

The one-symbol insertion problem already has an adroit solution on arbitrary normal sequences, given by Zylber in [20]. However, this solution is not effective in general. It becomes effective when there is an effective upper bound for the difference between the expected number of occurrences (under the uniform probability distribution) and the actual number of occurrences of any given word in any position of the input sequence.

###### Theorem 2 (Zylber [20, Theorem 1]).

Let alphabets and with not in . Let be normal to alphabet . Then, there exists normal to alphabet such that , where is the retract that removes all the instances of the symbol .

We rework Zylber’s solution on normal sequences that are concatenations of nested perfect necklaces of exponentially increasing order. One of these is the celebrated sequence defined by M. Levin [14, Theorem 2] by means of Pascal triangle matrix modulo [5, Theorem 1]. If we denote with the cardinality of alphabet , these sequences are base- expansions of normal numbers  for which the discrepancy of is the smallest known [14, 5]. Here we prove the following corollary of Theorem 2.

###### Corollary 1.

Let alphabets and with not in . Let be the concatenation of nested perfect necklaces over alphabet of order , for . Then, there is an effective construction of normal to alphabet such that , where is the retract that removes all the instances of the symbol . And, for every sufficiently large, in between the occurrences of the symbol  just before and just after position there are at most symbols.

The construction given in Corollary 1 can be adapted to any other input sequence equipped with an effective bound for the difference between the expected and the actual number of occurrences of any given word in any position. In the case of the concatenation of nested perfect necklaces this bound is easy to obtain and it is the smallest known.

This document is organized as follows:
Section 2 presents the basics of perfect necklaces and nested perfect necklaces.
Section 3 solves the liberal insertion problem on the concatenation of perfect necklaces.
Section 4 solves the one-symbol insertion problem on the concatenation of nested perfect necklaces.

It remains to study how to compare the discrepancy of   and the discrepancy of where the base- expansion of results from insertion in the base- expansion of a normal number . It may be possible to obtain metric results similar to those obtained by Fukuyama and Hiroshima  for subsequences of .

## 2 Perfect necklaces and nested perfect necklaces

### 2.1 Perfect necklaces

This section is based on . A word is a finite sequence of symbols in a given alphabet. For a finite alphabet $A$, we write $|A|$ for its cardinality, $A^n$ for the set of all words of length $n$, $A^*$ for the set of all words and $A^\omega$ for the set of all infinite sequences. The positions in words and in sequences are numbered starting at $1$. We write $v[i]$ for the symbol of $v$ at position $i$ and we write $v[i,j]$ for the symbols of $v$ from position $i$ to position $j$. The length of a word $v$ is $|v|$.

Let be the rotation operator, , position  between  and the length of . We let  denote the application of the rotation  times. A circular word or necklace is the equivalence class of a word under rotations. To denote a necklace we write where  is any of the words in the equivalence class. For instance, contains a single word because for every , and contains three words , and .

###### Definition.

A necklace is $(n,k)$-perfect if each word of length $n$ occurs $k$ many times at positions different modulo $k$, for any convention of the starting point.

Thus, each $(n,k)$-perfect necklace has length $k|A|^n$. Perfect necklaces are a variant of de Bruijn sequences. Recall that a de Bruijn sequence of order $n$ over alphabet $A$ is a necklace of length $|A|^n$ in which each word of length $n$ occurs exactly once. Then, $(n,1)$-perfect necklaces coincide with the de Bruijn sequences of order $n$.

For alphabet $\{0,1\}$ there are just two $(2,2)$-perfect necklaces,

 [00 01 10 11] and [00 10 01 11]
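The definition can be checked mechanically. The following is a minimal sketch, not part of the original text: the function name `is_perfect` and the representation of a necklace as a Python string read circularly are assumptions made for illustration. Positions are numbered from 0 here, which is harmless because the definition holds for any convention of the starting point.

```python
from collections import defaultdict

def is_perfect(word, n, k, alphabet):
    """Return True if the circular word is (n,k)-perfect over `alphabet`.

    Hypothetical helper: `word` is any representative of the necklace.
    """
    size = len(alphabet) ** n
    if len(word) != k * size or not set(word) <= set(alphabet):
        return False
    counts = defaultdict(int)
    residues = defaultdict(set)
    doubled = word + word  # lets us read blocks that wrap around the end
    for pos in range(len(word)):
        block = doubled[pos:pos + n]
        counts[block] += 1
        residues[block].add(pos % k)
    # every block of length n must occur k times, once per residue class modulo k
    return len(counts) == size and all(
        counts[b] == k and len(residues[b]) == k for b in counts
    )

# The two (2,2)-perfect necklaces over {0,1} listed in the text.
print(is_perfect("00011011", 2, 2, "01"))
print(is_perfect("00100111", 2, 2, "01"))
```

With `k = 1` the same check recognises de Bruijn sequences, for instance `is_perfect("0011", 2, 1, "01")`.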

This is a -perfect necklace

 .

The following are not -perfect

  and .
###### Definition.

For an alphabet $A$ and a positive integer $n$, the $n$-ordered necklace is the concatenation of all words of length $n$ in lexicographic order.

The following are the $n$-ordered necklaces over alphabet $\{0,1\}$ for $n = 1, 2, 3$,

 [0 1],   [00 01 10 11],   [000 001 010 011 100 101 110 111]

Every $n$-ordered necklace is $(n,n)$-perfect. Inexplicably, this was not observed by Barbier [3, 4] nor by Champernowne .
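As an illustration, the sketch below builds the ordered necklaces over $\{0,1\}$ and checks that each block of length $n$ appears exactly once in every residue class modulo $n$. The helper names `ordered_necklace` and `occurrences_by_residue` are hypothetical, and positions are numbered from 0.

```python
from collections import Counter
from itertools import product

def ordered_necklace(alphabet, n):
    """Concatenate all words of length n in lexicographic order."""
    return "".join("".join(w) for w in product(sorted(alphabet), repeat=n))

def occurrences_by_residue(word, n, k):
    """Count circular occurrences of each (block, position mod k) pair."""
    doubled = word + word  # read blocks that wrap around the end
    return Counter((doubled[i:i + n], i % k) for i in range(len(word)))

for n in (1, 2, 3):
    v = ordered_necklace("01", n)
    hits = occurrences_by_residue(v, n, n)
    # (n,n)-perfect: all n * 2**n (block, residue) pairs occur, each exactly once
    assert len(hits) == n * 2 ** n and set(hits.values()) == {1}
```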

###### Remark ([2, Theorem 5]).

Identify words of length over alphabet with the integers to . Let coprime with . The concatenation of words corresponding to the arithmetic sequence yields a -perfect necklace. By taking we obtain that -ordered necklaces are -perfect.

###### Proposition 1.

In the -ordered necklace over alphabet , for each symbol , between one occurrence of and the next there are at most symbols.

###### Proof.

The -ordered necklace is the concatenation of all words of length in lexicographical order. Consider many consecutive words, . Observe that the last symbol in  is necessarily the same as the last symbol in . Let  be that symbol. In between these two occurrences of there are . For some choices of these are the only two occurrences of in these words. All the other cases yield a smaller number of symbols. ∎

### 2.2 Perfect necklaces as Eulerian cycles in astute graphs

The $(n,k)$-perfect necklaces are characterized as Eulerian cycles in the so-called astute graphs.

###### Definition.

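The astute graph $G_A(n,k)$ is a pair $(V,E)$ where

$$V = A^n \times \{0, \ldots, k-1\}$$

and

$$E = \{((au, m), (ub, (m+1) \bmod k)) : a, b \in A,\ u \in A^{n-1},\ m \in \{0, \ldots, k-1\}\}.$$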

Thus, $G_A(n,k)$ has $k|A|^n$ vertices and $k|A|^{n+1}$ edges. It is Eulerian because it is strongly regular (all vertices have in-degree and out-degree equal to $|A|$) and strongly connected (every vertex is reachable from every other vertex). Notice that $G_A(n,1)$ is the de Bruijn graph of words of length $n$ over alphabet $A$.
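The counts above can be confirmed on small instances. The sketch below is illustrative only and rests on an assumption about the definition: vertices are pairs $(w, m)$ with $w$ a word of length $n$ and $m$ a residue modulo $k$, and there is an edge from $(au, m)$ to $(ub, (m+1) \bmod k)$ for all symbols $a, b$. The function name `astute_graph` is hypothetical.

```python
from collections import Counter
from itertools import product

def astute_graph(alphabet, n, k):
    """Build the (assumed) astute graph as lists of vertices and edges."""
    vertices = [("".join(w), m)
                for w in product(alphabet, repeat=n)
                for m in range(k)]
    # from (au, m) follow symbol b to (ub, (m + 1) mod k)
    edges = [((w, m), (w[1:] + b, (m + 1) % k))
             for (w, m) in vertices
             for b in alphabet]
    return vertices, edges

vertices, edges = astute_graph("01", 2, 3)
out_degree = Counter(tail for tail, _ in edges)
in_degree = Counter(head for _, head in edges)
print(len(vertices), len(edges))
print(set(out_degree.values()), set(in_degree.values()))
```

Equal in-degree and out-degree at every vertex, together with strong connectivity, is the standard criterion for a directed graph to have an Eulerian cycle.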

###### Proposition 2 ([2, Corollary 14]).

Each -perfect necklace over alphabet can be constructed as an Eulerian cycle in .

In some cases several Eulerian cycles in yield the same -perfect necklace, this happens when there is a period inside a cycle.

###### Remark ([2, Theorem 20]).

The number of -perfect necklaces over a -symbol alphabet is

$$\frac{1}{k} \sum_{d_{b,k} \,\mid\, j \,\mid\, k} e(j)\, \varphi(k/j),$$

where

• , such that is the set of primes that divide both  and , and is the exponent of in the factorization of ,

• is the number of Eulerian cycles in where ,

• is Euler’s totient function, counts the positive integers less than or equal to that are relatively prime to .

### 2.3 From perfect necklaces to normal sequences

For the number of occurrences of a word in a word at any position we write

$$|v|_u = \big|\{\, i : v[i, i+|u|-1] = u \,\}\big|.$$

For example . If is a -perfect necklace over alphabet then for every word of length at most ,

 k|A||v|−|u|−|u|+1≤|v|u≤k|A||v|−|u|.

Next, we show that the concatenation of perfect necklaces of linearly increasing order is normal. To prove it we use Piatetski-Shapiro’s theorem [16, 15, 7].

###### Proposition 3 (Piatetski-Shapiro theorem).

The sequence $x$ is normal to alphabet $A$ if and only if there is a positive constant $C$ such that for all words $u$,

$$\limsup_{n \to \infty} \frac{|x[1,n]|_u}{n} < C\, |A|^{-|u|}.$$

###### Proposition 4.

The concatenation of $(n,k_n)$-perfect necklaces over alphabet $A$, for $n = 1, 2, \ldots$ and $k_n$ a linear function of $n$, is normal to alphabet $A$.

###### Proof.

Consider the ratio between the lengths of successive $(n,k_n)$-perfect necklaces. Since $(n,k_n)$-perfect necklaces have length $k_n|A|^n$ and $k_n$ is a linear function of $n$,

$$\lim_{n \to \infty} \frac{k_{n+1}\, |A|^{n+1}}{k_n\, |A|^n} = |A|.$$

Let . And, for every , let be such that .
Then, for every of length ,

 limsupn→∞|v[1,n]|un<|v[1,M(mn+1)]|uM(mn)≤|A||A|−ℓ.

Piatetski-Shapiro theorem (Proposition 3) holds with  and is normal to alphabet . ∎

### 2.4 Nested perfect necklaces

Nested perfect necklaces were introduced in .

###### Definition.

A $(n,k)$-perfect necklace over alphabet $A$ is nested if $n = 1$ or if it is the concatenation of $|A|$ nested $(n-1,k)$-perfect necklaces.

For example, the following is a nested $(2,2)$-perfect necklace over alphabet $\{0,1\}$,

$$[\, \underbrace{0011}_{(1,2)\text{-perfect}} \ \underbrace{0110}_{(1,2)\text{-perfect}} \,]$$

Each of these are -perfect necklaces.

       

The concatenation in each row yields a -perfect necklace.

The concatenation of the first two rows yields a nested -perfect necklace.

The concatenation of the last two rows yields a nested -perfect necklace.

The concatenation of all rows yields a nested -perfect necklace.
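The $n$-ordered necklaces are perfect but not nested, for example for alphabet $\{0,1,\sigma\}$ and $n = 2$,

$$[\, \underbrace{00\ 01\ 0\sigma}_{\text{not } (1,2)\text{-perfect}} \ \underbrace{10\ 11\ 1\sigma}_{\text{not } (1,2)\text{-perfect}} \ \underbrace{\sigma 0\ \sigma 1\ \sigma\sigma}_{\text{not } (1,2)\text{-perfect}} \,]$$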
###### Remark ([5, Theorem 2]).

For each there are binary nested -perfect necklaces.

### 2.5 From nested perfect necklaces to normal sequences

By Wall’s thesis , normal numbers are exactly those real numbers $x$ for which $(b^n x \bmod 1)_{n \geq 0}$ is uniformly distributed, which means that the discrepancy of the first $N$ terms

$$D_N\big((b^n x \bmod 1)_{n \geq 0}\big) = \sup_{\gamma \in [0,1)} \left| \frac{1}{N} \big|\{\, n \leq N : (b^n x \bmod 1) < \gamma \,\}\big| - \gamma \right|$$

goes to $0$ as $N$ goes to infinity. For sequences of the form $(b^n x \bmod 1)_{n \geq 0}$ the smallest known discrepancy of the first $N$ terms is $O((\log N)^2/N)$, see [14, 7]. Expansions made of nested perfect necklaces of exponentially increasing order yield real numbers with this property.

###### Proposition 5 ([5, Theorem 1]).

Let a prime number. The base- expansion of the number defined by M.Levin using Pascal triangle matrix modulo  is the concatenation of nested -perfect necklaces for . And for every number  whose base- expansion is the concatenation of nested -perfect necklaces for , is .

### 2.6 Aligned occurrences and discrete discrepancy

Given two words $v$ and $u$, we write $\|v\|_u$ for the number of occurrences of $u$ at the positions of $v$ congruent to $1$ modulo the length of $u$, which we call aligned occurrences,

$$\|v\|_u = \big|\{\, i : v[i, i+|u|-1] = u \ \text{and}\ i \equiv 1 \bmod |u| \,\}\big|.$$

For example, and .
The relation between $|v|_u$ and $\|v\|_u$ is as follows,

$$|v|_u = \sum_{i=0}^{|u|-1} \big\| v[1+i, |v|] \big\|_u.$$

So, for any single symbol $a$,

$$|v|_a = \|v\|_a.$$
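The relation between the two counts can be verified numerically. The sketch below is an illustration under stated assumptions: the function names `occurrences` and `aligned_occurrences` are hypothetical, words are Python strings, and the 1-based positions congruent to $1$ modulo $|u|$ become the 0-based multiples of $|u|$.

```python
def occurrences(v, u):
    """|v|_u: number of positions of v at which u occurs (no wrap-around)."""
    return sum(v[i:i + len(u)] == u for i in range(len(v) - len(u) + 1))

def aligned_occurrences(v, u):
    """||v||_u: occurrences of u starting at 0-based multiples of |u|."""
    step = len(u)
    return sum(v[i:i + step] == u for i in range(0, len(v) - step + 1, step))

v, u = "0010100", "01"
# |v|_u equals the sum of aligned counts over the suffixes v[1+i, |v|]
total = sum(aligned_occurrences(v[i:], u) for i in range(len(u)))
assert occurrences(v, u) == total
# for a single symbol both counts coincide
assert occurrences(v, "0") == aligned_occurrences(v, "0")
```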

The next proposition is immediate from the definition of nested $(n,k)$-perfect necklaces. The bound given in Point  is analogous to the bound given in [14, Lemma 5].

###### Proposition 6.

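Let $[v]$ be a $(n,k)$-perfect necklace over alphabet $A$. Then for each word $u$ of length less than or equal to $n$,

1. $\lfloor (k/|u|)\,|A|^{n-|u|} \rfloor - 1 \leq \|v\|_u \leq \lceil (k/|u|)\,|A|^{n-|u|} \rceil$,

2. for every $i = 0, \ldots, |u|-1$,

   $$\lfloor (k/|u|)\,|A|^{n-|u|} \rfloor - 2 \leq \big\| v[1+i, |v|] \big\|_u \leq \lceil (k/|u|)\,|A|^{n-|u|} \rceil,$$

3. if $[v]$ is nested perfect and $v = st$ where $[s]$ and $[t]$ are $(n-1,k)$-perfect, then for every $i = 0, \ldots, |u|-1$,

   $$\lfloor (k/|u|)\,|A|^{n-1-|u|} \rfloor - 2 \leq \big\| s[|s|-i+1, |s|]\ t[1, |t|-i] \big\|_u \leq \lceil (k/|u|)\,|A|^{n-1-|u|} \rceil + 1,$$

   $$\lfloor (k/|u|)\,|A|^{n-1-|u|} \rfloor - 2 \leq \big\| s[1+i, |s|]\ t[1, i] \big\|_u \leq \lceil (k/|u|)\,|A|^{n-1-|u|} \rceil + 1.$$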

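We define the discrete discrepancy of a word $v$ for length $\ell$, $\Delta_{A,\ell}(v)$, by counting aligned occurrences.

###### Definition (Discrete discrepancy at aligned positions).

$$\Delta_{A,\ell}(v) = \max_{u \in A^\ell} \left| \frac{\|v\|_u}{\lfloor |v|/\ell \rfloor} - \frac{1}{|A|^\ell} \right|.$$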
###### Remark.

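If $[v]$ is a $(n,k)$-perfect necklace over alphabet $A$ then for every length $\ell$ such that $\ell \leq n$,

$$\Delta_{A,\ell}(v) \leq \frac{2}{\lfloor |v|/\ell \rfloor}.$$

###### Proof.

The length of $v$ is $k|A|^n$. By Point 1 of Proposition 6, for every word $u$ of length $\ell$ less than or equal to $n$,

$$\begin{aligned}
\Delta_{A,\ell}(v) &= \max_{u \in A^\ell} \left| \frac{\|v\|_u}{\lfloor |v|/\ell \rfloor} - \frac{1}{|A|^\ell} \right| \\
&\leq \max\left( \frac{\lceil (k/\ell)\,|A|^{n-\ell} \rceil}{\lfloor k|A|^n/\ell \rfloor} - \frac{1}{|A|^\ell},\ \ \frac{1}{|A|^\ell} - \frac{\lfloor (k/\ell)\,|A|^{n-\ell} \rfloor - 1}{\lfloor k|A|^n/\ell \rfloor} \right) \\
&\leq \frac{2}{\lfloor |v|/\ell \rfloor}.
\end{aligned}$$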
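To make the definition concrete, the following sketch computes the discrete discrepancy with exact rational arithmetic and checks the bound of the Remark on small perfect necklaces. The function name `discrete_discrepancy` is hypothetical, words are Python strings, and aligned blocks are read at 0-based multiples of $\ell$.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def discrete_discrepancy(v, alphabet, ell):
    """Max deviation of aligned block frequencies from the uniform value."""
    blocks = len(v) // ell
    # aligned occurrences: consecutive, non-overlapping blocks of length ell
    aligned = Counter(v[i:i + ell] for i in range(0, blocks * ell, ell))
    expected = Fraction(1, len(alphabet) ** ell)
    return max(abs(Fraction(aligned["".join(u)], blocks) - expected)
               for u in product(alphabet, repeat=ell))

# (word, n) pairs: two (2,2)-perfect necklaces and the de Bruijn sequence 0011
for v, n in (("00011011", 2), ("00100111", 2), ("0011", 2)):
    for ell in range(1, n + 1):
        bound = Fraction(2, len(v) // ell)
        assert discrete_discrepancy(v, "01", ell) <= bound
```

For the de Bruijn sequence `"0011"` and $\ell = 2$ the aligned blocks are `00` and `11`, so the discrepancy is $1/4$, well inside the bound $2/\lfloor 4/2 \rfloor = 1$.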

###### Lemma 1.

Let be concatenation of nested -perfect necklaces over alphabet for . Then, for every there is such that for every ,

$$\Delta_{A,\ell}(x[N_\ell, N]) = O(\ell (\log N)/N).$$

If then .

###### Proof.

We follow the idea in the proof of [14, Theorem 2]. We write for the cardinality of . Fix  and let  and  be such that and

$$N = \left( \sum_{d=0}^{m-1} 2^d\, b^{2^d} \right) + M.$$

So, is the sum of the lengths of nested -perfect necklaces for , plus .

Let and for , . Then, and for every word ,

$$\|x[1,N]\|_u = \sum_{d=0}^{m-1} \|x[L_{d-1}+1, L_d]\|_u + \|x[L_{m-1}+1, N]\|_u.$$

Since is an incomplete nested perfect necklace, its discrete discrepancy determines the discrete discrepancy of . Let and be integers such that , each and

$$M = 2^m \left( \sum_{i=0}^{2^m-1} n_i\, b^i \right) + M_0.$$

So, is the sum of the lengths of  nested -perfect necklaces for plus

=

For any word of length ,

$$\|x[1,N]\|_u = \sum_{i=0}^{m-1} \|x[L_{i-1}+1, L_i]\|_u + \|x[L_{m-1}+1, N]\|_u + \varepsilon, \quad \text{where } \varepsilon = 0 \text{ or } \varepsilon = 1.$$

In the concatenation of nested -perfect necklaces, it is certain to find when . In the concatenation of nested -perfect necklaces, it is certain to find  when . Point 1 of Proposition 6 ensures that the difference between the actual and the expected number of aligned occurrences of in a perfect necklace of order at least is at most . Then,

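$$\begin{aligned}
\|x[1,N]\|_u &\geq \frac{b^{-\ell}}{\ell} \left( \sum_{i=\ell}^{m-1} 2^i\, b^{2^i} - O(1) \right) + \left( (M - M_0)\, \frac{b^{-\ell}}{\ell} - O(2^m) \right) \geq \frac{b^{-\ell} N}{\ell} - O(2^m), \\
\|x[1,N]\|_u &\leq \left( \frac{L_{\lceil \log \ell \rceil}}{\ell} + \frac{b^{-\ell}}{\ell} \sum_{i=\ell}^{m-1} 2^i\, b^{2^i} + O(1) \right) + \left( \frac{M\, b^{-\ell}}{\ell} + O(2^m) \right) \leq \frac{b^{-\ell} N}{\ell} + O(2^m).
\end{aligned}$$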

Since we conclude . Finally notice that if then is , hence we can take . ∎

## 3 Liberal insertion

### 3.1 Tools to prove Theorem 1

Consider alphabets and for not in . Since the length and lexicographic order on words over alphabet respects the length and lexicographic order on words over , by inserting suitable symbols in suitable positions in each -ordered necklace over we obtain each -ordered necklace over . For example, for and ,

Much more is true: for any -perfect necklace over alphabet there is an -perfect necklace over such that the first is a subsequence of the second. This is immediate from the graph theoretical characterization of -perfect necklaces as Eulerian cycles on astute graphs: is a subgraph of , and any cycle in an Eulerian graph can be embedded into a full Eulerian cycle. For instance, such an extension can be constructed with Hierholzer’s algorithm for joining cycles together to create an Eulerian cycle of a graph. However, this method does not guarantee that along the resulting -perfect necklace there will be a small gap between one occurrence of the symbol  and the next.

The next lemma gives a method to insert symbols in a perfect necklace while ensuring a small gap condition. It extends to perfect necklaces the work in [6, Theorem 1] for de Bruijn sequences.

###### Lemma 2 (Main lemma).

Assume alphabets and for not in . For every -perfect necklace  over alphabet there is a -perfect necklace  over alphabet such that is a subsequence of . Moreover, for each such there is satisfying that in between one occurrence of the symbol  and the next there are at most  other symbols.

Notice that the -ordered necklace over alphabet fails the small gap condition required in Lemma 2. For instance, for , and , there are occurrences of with more than symbols in between:

However this other insertion satisfies the small gap condition:

.

Our main tool to prove the Main Lemma 2 is the factorization of the set of edges in the astute graph in convenient sets of pairwise disjoint cycles. We say that two cycles are disjoint if they have no common edges.

###### Proposition 7.

The set of edges in can be partitioned into a disjoint set of cycles identified by the necklaces of length .

###### Proof.

Each edge in is identified with a different word of length . The set of all rotations of a word of length identifies consecutive edges that form a simple cycle in . And each necklace of length corresponds exactly to disjoint simple cycles in , one associated to each congruence class. The partition of the set of words of length in the equivalence classes given by their rotations determines a partition of the set of edges in into disjoint simple cycles. ∎

The fact that is a subgraph of motivates the following definition.

###### Definition (Augmenting graph).

The augmenting graph is the directed graph where is the set of length- words over and is the set of pairs such that , for some word of length and symbols in , and either or  have at least one occurrence of symbol .

The definition above ensures that each of the vertices in that is also a vertex in has exactly one incoming edge and exactly one outgoing edge. This outgoing edge is associated with the new symbol $\sigma$. To prove the Main Lemma 2 we plan to construct an Eulerian cycle in by joining the given Eulerian cycle in with disjoint cycles of the augmenting graph that we call petals. These petals must exhaust the augmenting graph . And we must insert the petals in a way that ensures the small gap condition for the symbol $\sigma$.

###### Definition (Necklaces for pairs (v,m)).

We now consider necklaces consisting of a word and a number that represents a congruence class. Recall that $\theta$ is the rotation operation on words that shifts one position to the right. For alphabet , words of length and congruence classes, if and is between and ,

$$[v,m] = \{(v,m),\ (\theta(v), (m+1) \bmod k),\ (\theta^2(v), (m+2) \bmod k),\ \ldots,\ (\theta^n(v), (m+n) \bmod k)\}.$$
###### Definition (Graph of necklaces).

We define as the graph where

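$$\begin{aligned}
V &= \{[v,m] : v \in \widehat{A}^n,\ m = 0, \ldots, k-1\}, \\
E &= \{(x,y) : \text{there is } (au,m) \in x \text{ and there is } (uc,(m+1) \bmod k) \in y, \text{ for } a,c \in \widehat{A}\}.
\end{aligned}$$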

We also define the graph as the subgraph of whose vertices contain at least one occurrence of the symbol .

We define a petal for each vertex in . A petal is a union of disjoint cycles in which are identified by the necklaces of length  that have at least one occurrence of symbol . For this identification we consider the graph . The petal for vertex in starts at the necklace  in .

###### Definition (Petal for vertex in GA(n,k)).

A petal for vertex in is a cycle in induced by a subgraph that contains the necklace .

To exhaust we partition it in petals. For this we define a Petals tree. Recall that a tree is a directed acyclic graph with exactly one path from the root to each vertex. The Petals tree is a root that branches out in a subgraph of  including all its vertices. The root has  many branches; each branch is a Petal for a vertex in that starts with the necklace . The Petals tree has height , and the vertices at distance  to the root have exactly occurrences of the new symbol , for .

###### Definition (Petals tree).

A Petals tree consists of all the vertices in and a root , which is a necklace that corresponds to an Eulerian cycle in . Each vertex , where has exactly one occurrence of the symbol (there are many of them), is a child of . And for , each vertex , where has exactly  occurrences of the symbol , is a child of some vertex  if has exactly occurrences of the symbol  and there is an edge between and in .

There are many Petals trees, and any one is good for our purpose. A Petals tree can be obtained by any algorithm that finds a spanning tree of a graph, such as Kruskal’s greedy algorithm for the minimal spanning tree, or by constructing one based on the classical breadth-first search on .
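As a rough illustration of the breadth-first option, the sketch below returns a spanning tree as a map from each reached vertex to its parent. It is a generic routine, not the paper's construction: the function name `bfs_spanning_tree`, the adjacency-dictionary input and the toy vertex labels are assumptions, and the routine does not itself check that the number of occurrences of the new symbol grows by one at each level.

```python
from collections import deque

def bfs_spanning_tree(adjacency, root):
    """Return {vertex: parent} for a breadth-first spanning tree from `root`."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency.get(current, ()):
            if neighbour not in parent:  # first visit fixes the tree edge
                parent[neighbour] = current
                queue.append(neighbour)
    return parent

# Toy graph with hypothetical labels; "root" plays the role of the tree root.
toy = {
    "root": ["p1", "p2"],
    "p1": ["q1", "p2"],
    "p2": ["q1", "q2"],
    "q1": ["q2"],
    "q2": [],
}
print(bfs_spanning_tree(toy, "root"))
```

Because breadth-first search reaches every vertex along a shortest path from the root, the depth of a vertex in the returned tree equals its distance to the root in the input graph.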

We now focus on how to insert the petals in the given Eulerian cycle in , which is a pointed cycle, this is a cycle with a specified starting edge. We need to define the sections of an Eulerian cycle in .

###### Definition (Section of a cycle).

For a pointed Eulerian cycle in given by the sequence of edges and an integer such that , the -th section of the cycle is the sequence of the vertices that are heads of .

The astute graph has  vertices and  edges. An Eulerian cycle in has  sections with  vertices in each section. Since there are as many vertices as sections, we would like to choose one vertex from each section to place a petal. The problem is that each vertex occurs times in the Eulerian cycle but not necessarily in different sections. We pose it as a matching problem.

###### Definition (Distribution graph).

Given a pointed Eulerian cycle in the Distribution graph is a -regular bipartite graph, one part consists of the vertices in , the other part consists of the sections of the Eulerian cycle. There is an edge from a vertex in to a section if belongs to the section .

A matching in a Distribution graph is a set of edges such that no two edges share a common vertex. A vertex is matched if it is an endpoint of one of the edges in the matching. A perfect matching is a matching that matches all vertices in the graph.

###### Proposition 8.

For every Distribution graph there is a perfect matching.

###### Proof.

Let be a finite bipartite graph consisting of two disjoint sets of vertices and with edges that connect a vertex in to a vertex in . For a subset of , let be the set of all vertices in adjacent to some element in . Hall’s marriage theorem  states that there is a matching that entirely covers if and only if for every subset in , . Consider a Distribution graph and call to the set of vertices and to the set of sections. For any such that , the sum of the out-degree of these vertices is . Given that the in-degree for any vertex in is , we have that . Then, there is a matching that entirely covers . Furthermore, since the number of vertices is equal to the number of sections, , the matching is perfect. ∎

To obtain a perfect matching in a Distribution graph we can use any method to compute the maximum flow in a network. We define the flow network by adding two vertices to the Distribution graph, the source and the sink. Add an edge from the source to each vertex in and add an edge from each vertex in to the sink. Assign capacity  to each of the edges of the flow network. The maximum flow of the network is . The edges of the Distribution graph that carry this flow form a perfect matching.
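A small sketch of how such a matching might be computed follows. Instead of a general maximum-flow routine it uses the augmenting-path method for bipartite matching, which plays the same role on a network whose edges all admit a single unit of flow; this substitution, the function name `perfect_matching` and the toy input are assumptions made for illustration.

```python
def perfect_matching(adjacency):
    """adjacency: {left_vertex: iterable of right_vertices}.

    Returns {left: right} covering every left vertex, or raises ValueError.
    """
    match_of_right = {}

    def try_assign(left, seen):
        # look for a free right vertex, re-routing earlier choices if needed
        for right in adjacency[left]:
            if right in seen:
                continue
            seen.add(right)
            if right not in match_of_right or try_assign(match_of_right[right], seen):
                match_of_right[right] = left
                return True
        return False

    for left in adjacency:
        if not try_assign(left, set()):
            raise ValueError("no matching covers every left vertex")
    return {left: right for right, left in match_of_right.items()}

# Toy 2-regular bipartite graph: vertices a, b, c against sections 0, 1, 2.
toy = {"a": [0, 1], "b": [1, 2], "c": [2, 0]}
matching = perfect_matching(toy)
assert sorted(matching.values()) == [0, 1, 2]
```

On a regular bipartite graph Proposition 8 guarantees that the `ValueError` branch is never taken.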

We have the needed tools for the awaiting proof.

###### Proof of the Main Lemma  2.

To simplify the notation, assume that is a -perfect necklace over alphabet . We construct a -perfect necklace over alphabet . Based on Proposition 2, we need to construct an Eulerian cycle in . Consider a pointed Eulerian cycle in for  and divide it in sections. From Proposition 8 we know that we can choose one vertex in each section according to a perfect matching. Fix a Petals tree. The construction considers all the sections, one after the other, starting at section . The construction starts at the vertex that is the head of the first edge of section . Let be the current vertex.

Case is a vertex in : If  is the chosen vertex in the current section and the petal for , which starts with , has not been inserted yet then insert it now: traverse the edge that adds the symbol  and continue traversing the petal for . If the petal for has already been inserted or is not a chosen vertex then continue with the traversal of edges in the current section.

Case is not a vertex in : If the necklace is a child of the current vertex in the Petals tree and it has not been traversed yet, then traverse it. Otherwise continue with the traversal of the petal that was already part of.

Finally, we prove that the construction of satisfies the minimal gap condition. Since we assumed  is a -perfect necklace and we must prove that in between any occurrence of  and the next in there are at most symbols.

Consider the petals for the vertices in . Before the insertion of petals in sections, each section has no occurrence of the symbol . Since each section has  edges, if we place one petal in each section then two consecutive petals are at most edges away. A petal for a vertex  in starts with the edge that adds the symbol right after . In case the petal is just the single vertex in , then it is a cycle of exactly  edges. So, in between the occurrence of in the first petal and the occurrence of in the second there are at most  other symbols. In case the petal traverses more than one vertex in then, before completing the traversal of the edges in , the path branches out to another vertex in . This is possible only by adding the symbol . In the traversal to other vertices in it also happens that before traversing  consecutive edges there is necessarily one edge that adds the symbol . We conclude that in between any occurrence of and the next there are at most  other symbols. ∎

###### Example.

Consider and . Let the -ordered necklace for