 # Upper bound for the number of closed and privileged words

A non-empty word $w$ is a border of the word $u$ if $|w|<|u|$ and $w$ is both a prefix and a suffix of $u$. A word $u$ with the border $w$ is closed if $u$ has exactly two occurrences of $w$. A word $u$ is privileged if $|u|\le 1$ or if $u$ contains a privileged border $w$ that appears exactly twice in $u$. Peltomäki (2016) presented the following open problem: "Give a nontrivial upper bound for $B(n)$", where $B(n)$ denotes the number of privileged words of length $n$. Let $D(n)$ denote the number of closed words of length $n$. Let $q>1$ be the size of the alphabet. We show that there is a positive real constant $c$ such that $D(n)\le c\ln n\,\frac{q^n}{\sqrt{n}}$, where $n>1$. Privileged words are a subset of closed words, hence we also obtain an upper bound for the number of privileged words.


## 1 Introduction

A non-empty word $w$ is a border of the word $u$ if $|w|<|u|$ and $w$ is both a prefix and a suffix of $u$. A border $w$ of the word $u$ is the maximal border of $u$ if for every border $w'$ of $u$ we have that $|w'|\le|w|$. A word $u$ with the border $w$ is closed if $u$ has exactly two occurrences of $w$. It follows that $w$ occurs only as a prefix and as a suffix of $u$. A word $u$ is privileged if $|u|\le 1$ or if $u$ contains a privileged border $w$ that appears exactly twice in $u$. Obviously privileged words are a subset of closed words.
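The definitions above translate directly into a brute-force check. The following is a minimal Python sketch, not part of the original article; the function names (`count_occurrences`, `borders`, `is_closed`, `is_privileged`, `count_closed`, `count_privileged`) and the default binary alphabet `"ab"` are illustrative assumptions. Occurrences are counted with overlaps, as the definition of a closed word requires.

```python
from itertools import product


def count_occurrences(u, w):
    """Count all occurrences of w in u, overlapping ones included."""
    return sum(1 for i in range(len(u) - len(w) + 1) if u.startswith(w, i))


def borders(u):
    """Non-empty proper prefixes of u that are also suffixes of u."""
    return [u[:k] for k in range(1, len(u)) if u.endswith(u[:k])]


def is_closed(u):
    """u is closed if some border of u occurs exactly twice in u."""
    return any(count_occurrences(u, w) == 2 for w in borders(u))


def is_privileged(u):
    """u is privileged if |u| <= 1 or some privileged border occurs exactly twice."""
    if len(u) <= 1:
        return True
    return any(
        count_occurrences(u, w) == 2 and is_privileged(w) for w in borders(u)
    )


def count_closed(n, alphabet="ab"):
    """Number of closed words of length n (exhaustive search, small n only)."""
    return sum(is_closed("".join(t)) for t in product(alphabet, repeat=n))


def count_privileged(n, alphabet="ab"):
    """Number of privileged words of length n (exhaustive search, small n only)."""
    return sum(is_privileged("".join(t)) for t in product(alphabet, repeat=n))
```

For example, `is_closed("abab")` holds because the border `ab` occurs exactly twice, while `is_privileged("abab")` fails because `ab` is not itself privileged. Over a binary alphabet the sketch gives 2, 4, 6, 12 closed words for lengths 2 to 5.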

The properties of closed and privileged words have been studied in recent years [KeLeSa2013], [Pelto2013], [ScSh16]. One of the questions that has been investigated is the enumeration of privileged words. In [Nicholson2018_priviligedwords], it was proved that there are constants $c$ and $n_0$ such that for all $n\ge n_0$, there are at least $\frac{cq^n}{n(\log_q n)^2}$ privileged words of length $n$. This improves the lower bound for the number of privileged words from [FoJaPeSha2016]. Since every privileged word is a closed word, the result from [Nicholson2018_priviligedwords] also forms a lower bound for the number of closed words.

Concerning an upper bound for the number of privileged words, we have found only the following open problem [Pelto2016]: “Give a nontrivial upper bound for $B(n)$”, where $B(n)$ denotes the number of privileged words of length $n$. Also in [Pelto2016], the author presents an idea of how to improve the lower bound from [Nicholson2018_priviligedwords]. On the other hand, [Pelto2016] contains no explicit suggestion of how to approach the problem of determining an upper bound.

In the current article we construct an upper bound for the number of closed words of length $n$. Since the privileged words are a subset of closed words, we also present a response to the open problem from [Pelto2016].

We outline our proof. Let $A$ be an alphabet with $q$ letters and let $A^n$ denote the set of all words of length $n$ over $A$, so that $|A^n|=q^n$. Let $A_w(n)$ denote the number of words of length $n$ that do not contain the factor $w$. Let $\mu(n,m)$ be the maximal value of $A_w(n)$ for all $w$ of length $m$; formally

$$\mu(n,m)=\max\{A_w(n)\mid w\in A^m\}.$$

Let $D(n)$ denote the number of closed words of length $n$ and let $D(n,m)$ denote the number of closed words of length $n$ having a maximal border of length $m$.

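Obviously every closed word of length $n$ has exactly one maximal border, and its length $m$ satisfies $1\le m\le n-1$. We show that if $2m\le n$ then $D(n,m)\le q^m\mu(n-2m,m)$ and if $2m>n$ then $D(n,m)\le q^{\lceil n/2\rceil}$; see Lemma 2.5. It follows that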

$$D(n)=\sum_{m=1}^{n-1}D(n,m)\le\sum_{m=1}^{\lfloor n/2\rfloor}q^m\mu(n-2m,m)+\sum_{m=\lfloor n/2\rfloor+1}^{n-1}q^{\lceil n/2\rceil}. \tag{1}$$

Let denote the set of positive integers. Let . Let denote the set of all functions such that if and only if and for all . We apply the function , because for some small .

The key observation in our article is that the number of words of length $n$ that do not contain some “short” factor of length $\pi(n)$ has the same growth rate as the number of words of length $n-\beta\ln n$. Formally, for each such function $\pi$ there is a positive real constant $c$ such that $\mu(n,\pi(n))\le cq^{n-\beta\ln n}$; see Theorem 2.3. This observation allows us to show that there are real positive constants $c_1,c_2$ such that

$$\sum_{m=1}^{\lfloor n/2\rfloor}q^m\mu(n-2m,m)\le c_1\ln n\sum_{m=\lfloor c_2\ln n\rfloor}^{\lfloor n/2\rfloor}q^m\mu(n-2m,m). \tag{2}$$

In consequence we may count only closed words having a maximal border longer than $c_2\ln n$ in order to find an upper bound for $D(n)$. Applying that $\mu(n-2m,m)\le q^{n-2m}$ for $m\ge\lfloor c_2\ln n\rfloor$, we derive from (1) and (2) our result for the number of closed words.

## 2 Upper bound for the number of closed words

We present an upper bound for the number of words of length $n$ that avoid some factor of length $m$; that is, an upper bound for $\mu(n,m)$.

###### Lemma 2.1.

If $n$ and $m$ are positive integers then

$$\mu(n,m)\le q^n\left(1-\frac{1}{q^m}\right)^{\lfloor n/m\rfloor}.$$
###### Proof.

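Given $w\in A^m$, let $U_{n,w}$ be the set of words $u=u_1u_2\cdots u_kv$, where $k=\lfloor n/m\rfloor$, $u_i\in A^m$, $u_i\ne w$ for all $1\le i\le k$, and $|v|=n\bmod m$. It follows that $|u|=n$ and thus $U_{n,w}\subseteq A^n$. Obviously

$$|U_{n,w}|=(q^m-1)^{\lfloor n/m\rfloor}q^{n\bmod m}=q^n\left(1-\frac{1}{q^m}\right)^{\lfloor n/m\rfloor}.$$

Note that $n=m\lfloor n/m\rfloor+(n\bmod m)$. It is clear that the set of words of length $n$ not containing the factor $w$ is a subset of $U_{n,w}$. The lemma follows. ∎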
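Lemma 2.1 can be sanity-checked on small cases by exhaustive enumeration. The sketch below is an illustration only, not the author's code; the helper names `avoiding_count`, `mu` and `block_bound` and the binary alphabet are assumptions made for the example. The bound is evaluated in the exact integer form $(q^m-1)^{\lfloor n/m\rfloor}q^{n\bmod m}$ that appears in the proof.

```python
from itertools import product


def avoiding_count(n, w, alphabet="ab"):
    """A_w(n): number of words of length n over the alphabet without factor w."""
    return sum(w not in "".join(t) for t in product(alphabet, repeat=n))


def mu(n, m, alphabet="ab"):
    """mu(n, m): the maximum of A_w(n) over all words w of length m."""
    return max(
        avoiding_count(n, "".join(t), alphabet)
        for t in product(alphabet, repeat=m)
    )


def block_bound(n, m, q=2):
    """|U_{n,w}| = (q^m - 1)^floor(n/m) * q^(n mod m), the bound of Lemma 2.1."""
    return (q**m - 1) ** (n // m) * q ** (n % m)


if __name__ == "__main__":
    for n in range(1, 9):
        for m in range(1, n + 1):
            print(n, m, mu(n, m), block_bound(n, m))
```

For instance, with $q=2$, $n=4$ and $m=2$ the word $w=aa$ is avoided by 8 words, which is the maximum over all $w$ of length 2, while the bound evaluates to $(2^2-1)^2=9$.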

For the proof of Theorem 2.3 we need the following limit.

###### Proposition 2.2.

We have that

$$\lim_{n\to\infty}n\left(1-\frac{\ln n}{n}\right)^n=e.$$
###### Proof.

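Let

$$y=\lim_{n\to\infty}n\left(1-\frac{\ln n}{n}\right)^n. \tag{3}$$

From (3) we have that

$$\ln y=\lim_{n\to\infty}\ln\left[n\left(1-\frac{\ln n}{n}\right)^n\right]=\lim_{n\to\infty}\left[\ln n+n\ln\left(1-\frac{\ln n}{n}\right)\right]. \tag{4}$$

Let us consider the second term on the right side of (4):

$$\lim_{n\to\infty}n\ln\left(1-\frac{\ln n}{n}\right)=\lim_{n\to\infty}\frac{\left(\ln\left(1-\frac{\ln n}{n}\right)\right)'}{\left(\frac{1}{n}\right)'}=\lim_{n\to\infty}\frac{(-1)\frac{1-\ln n}{n^2}\left(1-\frac{\ln n}{n}\right)^{-1}}{-\frac{1}{n^2}}=\lim_{n\to\infty}\frac{n(1-\ln n)}{n-\ln n}. \tag{5}$$

Since $\lim_{n\to\infty}\frac{n}{n-\ln n}=1$, it follows from (4) and (5) that

$$\ln y=\lim_{n\to\infty}\left[\ln n+\frac{n(1-\ln n)}{n-\ln n}\right]=\lim_{n\to\infty}\left[\ln n+1-\ln n\right]=1.$$

It follows that $y=e$. This completes the proof. ∎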
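Evaluating the expression from Proposition 2.2 numerically is a cheap sanity check. The sketch below is an illustration only; the function name `prop_2_2_term` is an assumption. It works with the logarithm of the expression and uses `math.log1p` to limit rounding error for large $n$.

```python
import math


def prop_2_2_term(n):
    """Evaluate n * (1 - ln(n)/n)**n through its logarithm.

    log(term) = ln n + n * ln(1 - ln(n)/n); log1p keeps precision when
    ln(n)/n is tiny.
    """
    x = math.log(n) / n
    return math.exp(math.log(n) + n * math.log1p(-x))


if __name__ == "__main__":
    for n in (2, 10, 100, 10**4, 10**6, 10**8):
        print(n, prop_2_2_term(n))
```

Because $\ln(1-x)\le -x$ for $0\le x<1$, the exponent $\ln n+n\ln(1-\frac{\ln n}{n})$ is never positive, so the printed values never exceed 1; in double precision they approach 1 from below as $n$ grows. Readers comparing this with the stated limit may want to re-derive step (5) by hand. The proof of Theorem 2.3 uses only that the expression is bounded above by a constant, which this check supports.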

Let $\mathbb{R}^+$ denote the set of positive real numbers.

Let . The following theorem states that the number of words of length avoiding some given "short" factor (of length shorter than ) has the same growth rate as the number of all words of length .

###### Theorem 2.3.

If then there is a constant such that for all we have that

$$\frac{\mu(n,\pi(n))}{q^{n-\beta\ln n}}\le c.$$
###### Proof.

From Lemma 2.1 we have that

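$$\frac{\mu(n,\pi(n))}{q^{n-\beta\ln n}}\le\frac{q^n\left(1-\frac{1}{q^{\pi(n)}}\right)^{\lfloor n/\pi(n)\rfloor}}{q^{n-\beta\ln n}}=n\left(1-\frac{1}{q^{\pi(n)}}\right)^{\lfloor n/\pi(n)\rfloor}. \tag{6}$$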

Realize that .

Obviously there is such that for all ; recall that as tends to infinity. Consequently for all we have that

$$n\left(1-\frac{1}{q^{\pi(n)}}\right)^n\le n\left(1-\frac{\ln n}{n}\right)^n. \tag{7}$$

Proposition 2.2 and (7) imply that

$$\lim_{n\to\infty}n\left(1-\frac{1}{q^{\pi(n)}}\right)^n\le e. \tag{8}$$

Clearly for each function such that and ; recall that . Then the theorem follows from (6) and (8). This completes the proof. ∎

Let $h(n)=\left\lfloor\frac{1}{\ln q}\ln n\right\rfloor$. We present Theorem 2.3 in a slightly different manner that will be more useful in what follows.

###### Corollary 2.4.

If , and then there is a constant such that for all we have that

$$\frac{\mu(n-2\bar\pi(n),\bar\pi(n))}{q^{n-h(n)}}\le c.$$
###### Proof.

It is easy to verify that , since the number of words of length avoiding some factor of length is bigger or equal to the number of words of length avoiding some factor of length .

Obviously . In consequence we have that .

The corollary follows from Theorem 2.3. This completes the proof. ∎

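We show an upper bound for $D(n,m)$ for the cases where $2m>n$ and $2m\le n$.

###### Lemma 2.5.

Suppose $1\le m<n$.

• If $2m>n$ then $D(n,m)\le q^{\lceil n/2\rceil}$.

• If $2m\le n$ then $D(n,m)\le q^m\mu(n-2m,m)$.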

###### Proof.

If $2m>n$ and $w\in A^m$ then there is obviously at most one word $u$ with $|u|=n$ having a prefix $w$ and a suffix $w$; the prefix $w$ and the suffix $w$ would overlap with each other. If such $u$ exists then the first half of $u$ uniquely determines the second half of $u$. It follows that $D(n,m)\le q^{\lceil n/2\rceil}$.

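Let $F(u)$ denote the set of all factors of $u$. If $2m\le n$ then let

$$Z(n,m)=\{wuw\mid u\in A^{n-2m}\text{ and }w\in A^m\text{ and }w\notin F(u)\}.$$

If $2m\le n$ then every closed word of length $n$ with a maximal border of length $m$ belongs to $Z(n,m)$, and hence $D(n,m)\le|Z(n,m)|$. It is easy to see that

$$|Z(n,m)|\le|A^m|\,\mu(n-2m,m).$$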

This completes the proof. ∎
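The two cases of Lemma 2.5 can be compared with exact counts for short words. The following Python sketch is illustrative and not taken from the article; `closed_by_border_length`, `mu` and `lemma_2_5_bound` are hypothetical helper names, and the binary alphabet is an assumption. A word is classified by the length of its longest border, which is its maximal border in the sense of the introduction.

```python
from itertools import product


def occurrences(u, w):
    """Count all occurrences of w in u, overlapping ones included."""
    return sum(1 for i in range(len(u) - len(w) + 1) if u.startswith(w, i))


def closed_by_border_length(n, alphabet="ab"):
    """Map m -> D(n, m): closed words of length n with maximal border length m."""
    counts = {}
    for letters in product(alphabet, repeat=n):
        u = "".join(letters)
        border_lengths = [k for k in range(1, n) if u.endswith(u[:k])]
        if any(occurrences(u, u[:k]) == 2 for k in border_lengths):
            m = max(border_lengths)
            counts[m] = counts.get(m, 0) + 1
    return counts


def mu(n, m, alphabet="ab"):
    """mu(n, m): maximum over w of length m of the number of length-n words avoiding w."""
    texts = ["".join(t) for t in product(alphabet, repeat=n)]
    factors = ["".join(t) for t in product(alphabet, repeat=m)]
    return max(sum(w not in u for u in texts) for w in factors)


def lemma_2_5_bound(n, m, alphabet="ab"):
    """Upper bound on D(n, m) stated in Lemma 2.5."""
    q = len(alphabet)
    if 2 * m > n:
        return q ** ((n + 1) // 2)  # q^ceil(n/2), overlapping borders
    return q**m * mu(n - 2 * m, m, alphabet)  # words w u w with w not a factor of u
```

For binary words of length 5 the sketch gives $D(5,1)=2$, $D(5,2)=6$, $D(5,3)=2$ and $D(5,4)=2$, against the bounds 2, 8, 8 and 8.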

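Let $\kappa>1$ be a real constant and $\bar h(n)=\max\left\{1,\left\lfloor\frac{1}{\kappa\ln q}(\ln n-\ln\ln n)\right\rfloor\right\}$. Again we use the function $\max$ to guarantee that $\bar h(n)\ge 1$ for small $n$.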

###### Remark 2.6.

The function $\bar h(n)$ defines the maximal length of a “short” border of a closed word. In the proof of Theorem 2.9 the closed words of length $n$ with a maximal border of length $m$ will be enumerated differently for $m<\bar h(n)$ and for $m\ge\bar h(n)$.

The next auxiliary lemma shows an upper bound for $q^{-h(n)+\bar h(n)}$ that we will use in the proof of Proposition 2.8.

###### Lemma 2.7.

There is a constant $c_1$ such that for all $n>1$ we have that

$$q^{-h(n)+\bar h(n)}\le c_1q^{\frac{1}{\ln q}\left(\frac{1}{\kappa}-1\right)\ln n}.$$
###### Proof.

Let

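$$y=\lim_{n\to\infty}\left(-h(n)+\bar h(n)-\frac{1}{\ln q}\left(\frac{1}{\kappa}-1\right)\ln n\right).$$

We have that

$$\begin{aligned}y&=\lim_{n\to\infty}\left(-\left\lfloor\frac{1}{\ln q}\ln n\right\rfloor+\left\lfloor\frac{1}{\kappa\ln q}(\ln n-\ln\ln n)\right\rfloor-\frac{1}{\ln q}\left(\frac{1}{\kappa}-1\right)\ln n\right)\\&=\lim_{n\to\infty}\left(\frac{\ln n}{\ln q}\left(-1+\frac{1}{\kappa}\right)-\frac{1}{\ln q}\left(\frac{1}{\kappa}-1\right)\ln n\right)=0.\end{aligned} \tag{9}$$

This implies that

$$\lim_{n\to\infty}\frac{q^{-h(n)+\bar h(n)}}{q^{\frac{1}{\ln q}\left(\frac{1}{\kappa}-1\right)\ln n}}=1.$$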

The lemma follows. ∎

The next proposition shows an upper bound for the number of closed words of length $n$ having a maximal border of length at most $\lceil n/2\rceil$.

###### Proposition 2.8.

There is a constant $c$ such that

$$\sum_{m=1}^{\lceil n/2\rceil}q^m\mu(n-2m,m)\le c\ln n\,\frac{q^n}{\sqrt{n}},\quad\text{where } n>1.$$
###### Proof.

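Since $\mu(n-2m,m)\le q^{n-2m}$ we have that

$$\sum_{m=1}^{\lceil n/2\rceil}q^m\mu(n-2m,m)\le\sum_{m=1}^{\bar h(n)-1}q^m\mu(n-2m,m)+\sum_{m=\bar h(n)}^{\lceil n/2\rceil}q^mq^{n-2m}. \tag{10}$$

Corollary 2.4 implies that $\mu(n-2m,m)\le cq^{n-h(n)}$ for some constant $c$. It follows that

$$\sum_{m=1}^{\bar h(n)-1}q^m\mu(n-2m,m)\le\sum_{m=1}^{\bar h(n)}q^mcq^{n-h(n)}\le\bar h(n)q^{\bar h(n)}cq^{n-h(n)}. \tag{11}$$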

Lemma 2.7 and (11) imply that

$$\sum_{m=1}^{\bar h(n)-1}q^m\mu(n-2m,m)\le c_1\bar h(n)cq^{n-\frac{\ln n}{\ln q}\left(1-\frac{1}{\kappa}\right)}, \tag{12}$$

where $c_1$ is some real positive constant.

It is easy to verify that

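$$q^{-\bar h(n)}\le q^{-\frac{1}{\kappa\ln q}(\ln n-\ln\ln n)+1}=q(\ln n)^{\frac{1}{\kappa}}q^{-\frac{1}{\kappa\ln q}\ln n}. \tag{13}$$

Thus using (13)

$$\sum_{m=\bar h(n)}^{\lceil n/2\rceil}q^mq^{n-2m}\le q^n\sum_{m=\bar h(n)}^{\lceil n/2\rceil}q^{-m}\le\frac{q^{n-\bar h(n)}}{1-q^{-1}}\le\frac{q(\ln n)^{\frac{1}{\kappa}}q^{n-\frac{1}{\kappa\ln q}\ln n}}{1-q^{-1}}. \tag{14}$$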

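Obviously $\bar h(n)\le\frac{\ln n}{\kappa\ln q}$. Hence taking $\kappa=2$, we get from (10), (12), and (14) that

$$\begin{aligned}\sum_{m=1}^{\lceil n/2\rceil}q^m\mu(n-2m,m)&\le c_1\bar h(n)cq^{n-\frac{1}{2\ln q}\ln n}+\frac{q(\ln n)^{\frac{1}{2}}q^{n-\frac{1}{2\ln q}\ln n}}{1-q^{-1}}\\&\le q^{n-\frac{1}{2\ln q}\ln n}\left(\frac{c_1c\ln n}{2\ln q}+\frac{q(\ln n)^{\frac{1}{2}}}{1-q^{-1}}\right)\\&\le q^{n-\frac{1}{2\ln q}\ln n}\left(c_2\ln n+c_3(\ln n)^{\frac{1}{2}}\right),\end{aligned} \tag{15}$$

for some constants $c_2,c_3$. Since $q^{-\frac{1}{2\ln q}\ln n}=\frac{1}{\sqrt{n}}$ the proposition follows from (15). ∎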
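The final step of this proof rewrites the power of $q$ as a power of $n$. As a worked check, not part of the original text, the identity follows from writing $q=e^{\ln q}$:

```latex
q^{-\frac{1}{2\ln q}\ln n}
  = \left(e^{\ln q}\right)^{-\frac{\ln n}{2\ln q}}
  = e^{-\frac{1}{2}\ln n}
  = n^{-\frac{1}{2}}
  = \frac{1}{\sqrt{n}} .
```

Consequently the right side of (15) equals $\frac{q^n}{\sqrt{n}}\left(c_2\ln n+c_3(\ln n)^{\frac{1}{2}}\right)$. For $n\ge 3$ we have $\ln n\ge 1$, hence $(\ln n)^{\frac{1}{2}}\le\ln n$ and the expression is at most $(c_2+c_3)\ln n\,\frac{q^n}{\sqrt{n}}$; the single remaining case $n=2$ can be absorbed into the constant.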

We show an upper bound for $D(n)$.

###### Theorem 2.9.

There is a constant $c$ such that

$$D(n)\le c\ln n\,\frac{q^n}{\sqrt{n}},\quad\text{where } n>1.$$
###### Proof.

We have that

$$D(n)=\sum_{m=1}^{n-1}D(n,m)=\sum_{m=1}^{\lceil n/2\rceil}D(n,m)+\sum_{m=\lceil n/2\rceil+1}^{n-1}D(n,m). \tag{16}$$

From Lemma 2.5 and (16) we get that

$$D(n)\le\sum_{m=1}^{\lceil n/2\rceil}q^m\mu(n-2m,m)+\sum_{m=\lceil n/2\rceil+1}^{n-1}q^{\lceil n/2\rceil}. \tag{17}$$

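Observe that

$$\sum_{m=\lceil n/2\rceil+1}^{n-1}q^{\lceil n/2\rceil}\le\frac{n}{2}q^{\lceil n/2\rceil}$$

and

$$\lim_{n\to\infty}\frac{nq^{\frac{n}{2}}}{\ln n\,\frac{q^n}{\sqrt{n}}}=0.$$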

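Then it follows from (17) and Proposition 2.8 that there are constants $c_2,c_3$ such that

$$c_2\sum_{m=1}^{\lceil n/2\rceil}q^m\mu(n-2m,m)\ge\sum_{m=\lceil n/2\rceil+1}^{n-1}q^{\lceil n/2\rceil}\quad\text{and}\quad D(n)\le c_3\sum_{m=1}^{\lceil n/2\rceil}q^m\mu(n-2m,m). \tag{18}$$

The theorem follows from (18) and Proposition 2.8. ∎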
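To get a feeling for the bound of Theorem 2.9, one can tabulate the ratio $D(n)\big/\left(\ln n\,\frac{q^n}{\sqrt{n}}\right)$ for short words. The sketch below is an illustration under stated assumptions: a binary alphabet, exhaustive enumeration, and hypothetical function names `count_closed` and `theorem_2_9_ratio`. Small values of $n$ say nothing about the asymptotic behaviour; the table only shows the quantity that the constant $c$ has to dominate.

```python
import math
from itertools import product


def count_closed(n, alphabet="ab"):
    """D(n): number of closed words of length n, by exhaustive search."""
    total = 0
    for letters in product(alphabet, repeat=n):
        u = "".join(letters)
        for k in range(1, n):
            w = u[:k]
            if not u.endswith(w):
                continue  # w is not a border of u
            hits = sum(1 for i in range(n - k + 1) if u.startswith(w, i))
            if hits == 2:  # some border occurs exactly twice, so u is closed
                total += 1
                break
    return total


def theorem_2_9_ratio(n, alphabet="ab"):
    """D(n) divided by ln(n) * q^n / sqrt(n), for n > 1."""
    q = len(alphabet)
    return count_closed(n, alphabet) / (math.log(n) * q**n / math.sqrt(n))


if __name__ == "__main__":
    for n in range(2, 13):
        print(n, count_closed(n), round(theorem_2_9_ratio(n), 4))
```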

###### Remark 2.10.

Note that some of the constants that we used in our results, and in particular in Theorem 2.9, depend on $q$.

## Acknowledgments

The author acknowledges support by the Czech Science Foundation grant GAČR 13-03538S and by the Grant Agency of the Czech Technical University in Prague, grant No. SGS14/205/OHK4/3T/14.