The Number of Threshold Words on n Letters Grows Exponentially for Every n≥ 27

by James D. Currie, et al.
The University of Winnipeg

For every n≥ 27, we show that the number of n/(n-1)^+-free words (i.e., threshold words) of length k on n letters grows exponentially in k. This settles all but finitely many cases of a conjecture of Ochem.



1 Introduction

Throughout, we use standard definitions and notations from combinatorics on words (see [13]). A square is a word of the form xx, where x is a nonempty word. A cube is a word of the form xxx, where x is a nonempty word. An overlap is a word of the form axaxa, where a is a letter and x is a (possibly empty) word. The study of words goes back to Thue, who demonstrated the existence of an infinite overlap-free word over a binary alphabet, and an infinite square-free word over a ternary alphabet (see [1]).

A language is a set of finite words over some alphabet A. The combinatorial complexity of a language L is the sequence C_L: N → N, where C_L(k) is defined as the number of words in L of length k. We say that a language L grows exactly as the sequence C_L(k) grows, be it exponentially, polynomially, etc. Since the work of Brandenburg [2], the study of the growth of languages has been a central theme in combinatorics on words. Given a language L, a key question is whether it grows exponentially (fast), or subexponentially (slow). Brandenburg [2] demonstrated that both the language of cube-free words over a binary alphabet, and the language of square-free words over a ternary alphabet, grow exponentially. On the other hand, Restivo and Salemi [19] demonstrated that the language of overlap-free binary words grows only polynomially.
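The contrast between fast and slow growth can be observed on short lengths by brute force. The following Python sketch is not from the paper: the helper names `has_forbidden_suffix` and `complexity` are hypothetical, and the approach is plain level-by-level enumeration. It relies on both languages being closed under taking factors, so every word of length k+1 in the language is a one-letter extension of a word of length k in the language, and only suffixes of the extended word need to be checked.

```python
def has_forbidden_suffix(word, forbidden_length):
    """Return True if some suffix of `word` has period p and length
    forbidden_length(p), for some p >= 1.

    forbidden_length(p) = 3 * p      forbids cubes xxx with |x| = p;
    forbidden_length(p) = 2 * p + 1  forbids overlaps axaxa with |ax| = p.
    """
    n = len(word)
    p = 1
    while forbidden_length(p) <= n:
        suffix = word[n - forbidden_length(p):]
        if all(suffix[i] == suffix[i + p] for i in range(len(suffix) - p)):
            return True
        p += 1
    return False


def complexity(alphabet, forbidden_length, max_len):
    """Return [C(0), C(1), ..., C(max_len)], where C(k) counts the words of
    length k over `alphabet` that contain no forbidden repetition."""
    level = [""]          # all words of the current length in the language
    counts = [1]
    for _ in range(max_len):
        level = [w + a for w in level for a in alphabet
                 if not has_forbidden_suffix(w + a, forbidden_length)]
        counts.append(len(level))
    return counts


cube_free_binary = complexity("01", lambda p: 3 * p, 12)
overlap_free_binary = complexity("01", lambda p: 2 * p + 1, 12)
print("cube-free binary:   ", cube_free_binary)
print("overlap-free binary:", overlap_free_binary)
```

A finite table of this kind only illustrates the two behaviours; it proves neither the exponential bound of Brandenburg [2] nor the polynomial bound of Restivo and Salemi [19].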

Squares, cubes, and overlaps are all examples of repetitions in words, and can be considered in the same general framework. Let w = w_1 w_2 ⋯ w_k be a finite word, where the w_i's are letters. A positive integer p is a period of w if w_{i+p} = w_i for all 1 ≤ i ≤ k−p. In this case, we say that |w|/p is an exponent of w, and the largest such number is called the exponent of w. For a real number α > 1, a finite or infinite word w is called α-free (α^+-free) if w contains no finite factors of exponent greater than or equal to α (strictly greater than α, respectively).
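As a small illustration of these definitions, the following Python sketch computes the exponent of a finite word as its length divided by its smallest period, and tests freeness by brute force over all factors. The function names are hypothetical, and exact rational arithmetic is used so that comparisons with thresholds such as 7/4 are not affected by rounding.

```python
from fractions import Fraction


def exponent(word):
    """Exponent of a nonempty finite word: length / smallest period."""
    k = len(word)
    if k == 0:
        raise ValueError("the exponent is defined for nonempty words only")
    for p in range(1, k + 1):
        # p is a period if word[i] == word[i + p] wherever both are defined;
        # p = k always qualifies, so the loop always returns.
        if all(word[i] == word[i + p] for i in range(k - p)):
            return Fraction(k, p)


def max_factor_exponent(word):
    """Largest exponent over all nonempty factors of `word`."""
    k = len(word)
    return max(exponent(word[i:j])
               for i in range(k) for j in range(i + 1, k + 1))


def is_free(word, alpha, plus=False):
    """alpha-free: no factor of exponent >= alpha.
    alpha^+-free (plus=True): no factor of exponent > alpha."""
    worst = max_factor_exponent(word)
    return worst <= alpha if plus else worst < alpha


print(exponent("abcab"))                             # 5/3
print(is_free("abcab", Fraction(7, 4), plus=True))   # True
print(is_free("abab", 2))                            # False: abab is a square
print(is_free("abab", 2, plus=True))                 # True: no factor exceeds 2
```

The brute-force factor scan is cubic or worse in the word length, which is fine for checking examples by hand but not for large searches.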

Throughout, for every positive integer n, let A_n denote the n-letter alphabet {1, 2, …, n}. For every n ≥ 2, the repetition threshold for n letters, denoted RT(n), is defined by

RT(n) = inf{r > 1 : there is an infinite r^+-free word over A_n}.

Essentially, the repetition threshold describes the border between avoidable and unavoidable repetitions in words over an alphabet of n letters. The repetition threshold was first defined by Dejean [7]. Her 1972 conjecture on the values of RT(n) has now been confirmed through the work of many authors [7, 17, 15, 14, 3, 5, 4, 6, 18]:

RT(n) = 7/4 if n = 3; RT(n) = 7/5 if n = 4; RT(n) = n/(n−1) if n = 2 or n ≥ 5.

The last cases of Dejean’s conjecture were confirmed in 2011 by the first and third authors [6], and independently by Rao [18]. However, probably the most important contribution was made by Carpi [3], who confirmed the conjecture in all but finitely many cases.

In this short note, we are concerned with the growth rate of the language of threshold words over A_n. For every n ≥ 2, let T_n denote the language of all RT(n)^+-free words over A_n. We call T_n the threshold language of order n, and we call its members threshold words of order n. Threshold words are also called Dejean words by some authors. For every n ≥ 2, the threshold language T_n is the minimally repetitive infinite language over A_n.
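For small orders, threshold words can be counted directly. The sketch below is an illustration only and is not the method of this note: it hard-codes the values of the repetition threshold given by Dejean's conjecture (now a theorem), and it enumerates words level by level, checking only the suffixes of each one-letter extension. The function names are hypothetical.

```python
from fractions import Fraction


def repetition_threshold(n):
    """Repetition threshold on n letters, as given by Dejean's conjecture."""
    if n < 2:
        raise ValueError("the repetition threshold is defined for n >= 2")
    if n == 3:
        return Fraction(7, 4)
    if n == 4:
        return Fraction(7, 5)
    return Fraction(n, n - 1)


def max_suffix_exponent(word):
    """Largest value length/p over suffixes of `word` having period p."""
    n = len(word)
    best = Fraction(1)
    for p in range(1, n):
        length = p  # the suffix of length p trivially has period p
        # extend the suffix to the left while period p is preserved
        while length < n and word[n - 1 - length] == word[n - 1 - length + p]:
            length += 1
        best = max(best, Fraction(length, p))
    return best


def count_threshold_words(n, max_len):
    """Return [C(1), ..., C(max_len)], the numbers of threshold words of
    order n of each length, over the alphabet {1, ..., n}."""
    threshold = repetition_threshold(n)
    level = [()]
    counts = []
    for _ in range(max_len):
        level = [w + (a,) for w in level for a in range(1, n + 1)
                 if max_suffix_exponent(w + (a,)) <= threshold]
        counts.append(len(level))
    return counts


for n in (2, 3, 4, 5):
    print(n, count_threshold_words(n, 8))
```

Enumeration of this kind becomes impractical quickly as the length grows, which is one reason structural arguments are needed to establish exponential growth.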

The threshold language T_2 is exactly the language of overlap-free words over A_2, which is known to grow only polynomially [19]. (Currently the best known bounds on its growth are due to Jungers et al. [9]. The threshold between polynomial and exponential growth for repetition-free binary words is known to be 7/3 [10]. That is, the language of 7/3-free words over A_2 grows polynomially, while the language of 7/3^+-free words over A_2 grows exponentially.) However, Ochem made the following conjecture about the growth of threshold languages of all other orders.

Conjecture 1 (Ochem [16]).

For every n ≥ 3, the language T_n of threshold words of order n grows exponentially.

Conjecture 1 has been confirmed for n ∈ {3, 4} by Ochem [16], for n ∈ {5, 6, …, 10} by Kolpakov and Rao [12], and for all odd n less than or equal to 101 by Tunev and Shur [23]. In this note, we confirm Conjecture 1 for every n ≥ 27.

Theorem 2.

For every n ≥ 27, the language T_n of threshold words of order n grows exponentially.

The layout of the remainder of the note is as follows. In Section 2, we summarize the work of Carpi [3] in confirming all but finitely many cases of Dejean’s conjecture. In Section 3, we establish Theorem 2 with constructions that rely heavily on the work of Carpi. We conclude with a discussion of problems related to the rate of growth of threshold languages.

2 Carpi’s reduction to -kernel repetitions

In this section, let be a fixed integer. Pansiot [17] was first to observe that if a word over the alphabet is -free, then it can be encoded by a word over the binary alphabet . For consistency, we use the notation of Carpi [3] to describe this encoding. Let denote the symmetric group on , and define the morphism by

Now define the map by


for all . To be precise, Pansiot proved that if a word is -free, then can be obtained from a word of the form , where , by renaming the letters.
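The idea behind the binary encoding can be sketched as follows. In a word over n letters in which any n−1 consecutive letters are distinct, each new letter is either the letter that occurred n−1 positions earlier or the unique letter missing from the last n−1 positions, so one bit per letter suffices. The Python sketch below rests on stated assumptions: the bit convention (0 for "repeat the letter n−1 positions back", 1 for "take the missing letter"), the starting block 1 2 ⋯ n−1, and the function names are all chosen for illustration, and the resulting map may differ from Carpi's by a renaming of letters or by other conventions.

```python
def pansiot_decode(bits, n):
    """Build a word over {1, ..., n} from a binary sequence (assumes n >= 3).

    The word starts with 1, 2, ..., n-1; each bit then appends one letter.
    Any n-1 consecutive letters of the result are distinct.
    """
    word = list(range(1, n))
    for b in bits:
        window = word[-(n - 1):]
        if b == 0:
            word.append(window[0])                 # letter n-1 positions back
        else:
            missing = set(range(1, n + 1)) - set(window)
            word.append(missing.pop())             # the unique absent letter
    return word


def pansiot_encode(word, n):
    """Inverse direction: recover the bits from a word over {1, ..., n} in
    which any n-1 consecutive letters are distinct."""
    bits = []
    for i in range(len(word) - (n - 1)):
        window = word[i:i + n - 1]
        nxt = word[i + n - 1]
        if len(set(window)) != n - 1 or nxt in window[1:]:
            raise ValueError("some n-1 consecutive letters are not distinct")
        bits.append(0 if nxt == window[0] else 1)
    return bits


word = pansiot_decode([1, 0, 1, 1], 4)
print(word)                      # [1, 2, 3, 4, 2, 1, 3]
print(pansiot_encode(word, 4))   # [1, 0, 1, 1]
```

Decoding always produces a word whose windows of n−1 letters are distinct, but it does not by itself guarantee freeness from longer repetitions; that is what the conditions on the binary word in the rest of this section address.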

Let , and let . Pansiot showed that if has a factor of exponent greater than , then either the word itself contains a short repetition, or the binary word contains a kernel repetition (see [17] for details). Carpi reformulated this statement so that both types of forbidden factors appear in the binary word . Let , and let . Then is called a -stabilizing word (of order ) if fixes the points . Let denote the set of -stabilizing words of order . The word is called a kernel repetition (of order ) if it has period and a factor of length such that and . Carpi’s reformulation of Pansiot’s result is the following.

Proposition 3 (Carpi [3, Proposition 3.2]).

Let . If a factor of has exponent larger than , then has a factor satisfying one of the following conditions:

  (i) and for some ; or

  (ii) is a kernel repetition of order .

Now assume that , and define and . Carpi [3] defines an -uniform morphism with the following extraordinary property.

Proposition 4 (Carpi [3, Proposition 7.3]).

Suppose that , and let . Then for every , the word contains no -stabilizing word of length smaller than .

We note that Proposition 4 was proven by Carpi [3] in the case that in a computation-free manner. The improvement to stated here was achieved later by the first and third authors [4], using lemmas of Carpi [3] along with a significant computer check.

Proposition 4 says that for every word , no factor of satisfies condition (i) of Proposition 3. Thus, we need only worry about factors satisfying condition (ii) of Proposition 3, i.e., kernel repetitions. To this end, define the morphism by for all . A word is called a -kernel repetition if it has a period and a factor of length such that and . Carpi established the following result.

Proposition 5 (Carpi [3, Proposition 8.2]).

Let . If a factor of is a kernel repetition, then a factor of is a -kernel repetition.

In other words, if contains no -kernel repetitions, then no factor of satisfies condition (ii) of Proposition 3. Altogether, we have the following theorem, which we state formally for ease of reference.

Theorem 6.

Suppose that . If contains no -kernel repetitions, then is -free.

Finally, we note that the morphism is defined in such a way that the kernel of has a very simple structure.

Lemma 7 (Carpi [3, Lemma 9.1]).

If , then if and only if divides for every letter .
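Lemma 7 reduces kernel membership to a divisibility check on letter counts, which is easy to run mechanically. The following Python sketch makes its assumptions explicit: the function name `letter_counts_divisible` and the parameter `divisor` are hypothetical, `divisor` stands in for the modulus of the lemma, and the value 4 in the usage lines is for illustration only.

```python
from collections import Counter


def letter_counts_divisible(word, alphabet, divisor):
    """Return True if `divisor` divides the number of occurrences in `word`
    of every letter of `alphabet` (letters that do not occur count as 0)."""
    counts = Counter(word)
    return all(counts[a] % divisor == 0 for a in alphabet)


# Illustrative values only: alphabet {1, 2, 3} and divisor 4.
print(letter_counts_divisible([1, 1, 2, 2, 1, 1, 2, 2], [1, 2, 3], 4))  # True
print(letter_counts_divisible([1, 1, 2, 2, 1, 1, 2], [1, 2, 3], 4))     # False
```

A check of this form depends only on the letter counts of the word, not on the order of its letters, which is what makes the kernel simple to work with in the constructions of the next section.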

3 Constructing exponentially many threshold words

In this section, let be a fixed integer, and let and , as in the previous section. Since , we have . In order to prove that the threshold language grows exponentially, we construct an exponentially growing language of words that contain no -kernel repetitions. If (or equivalently, if ), then we define by modifying Carpi’s construction of an infinite word over that contains no -kernel repetitions. If (or equivalently, if ), then we define a -uniform substitution , and let be the set of all factors of words obtained by iterating on the letter 1.

Case I:

We first recall the definition of , the infinite word over defined by Carpi [3] that contains no -kernel repetitions. First of all, define , where

Now define , where for all , we have

Note that if (mod ), then . Let be the set of all finite words obtained from a prefix of by exchanging any subset of these 2’s for 1’s. To be precise, if , then if and only if all of the following hold:

  • if (mod );

  • if (mod ); and

  • if is odd.

Note in particular that if is in , then if and only if (mod ).

We claim that no word contains a -kernel repetition. The proof is essentially analogous to Carpi’s proof that contains no -kernel repetitions. We begin with a lemma about the lengths of factors in that lie in .

Lemma 8 (Adapted from Carpi [3, Lemma 9.3]).

Let , and let be a factor of . If , then divides .

Proof.


The statement is trivially true if , so assume . Set , where is the maximal power of dividing . Suppose, towards a contradiction, that . Since , by Lemma 7, we see that divides , meaning .

Write . Then we have for some . By definition, for any , we have if and only if divides . (Since , we have , and hence implies (mod ).) Thus, we have that the sum is exactly the number of integers in the set that are divisible by , which is exactly . Since , by Lemma 7, we conclude that divides , contradicting the maximality of . ∎

Now, using Lemma 8 in place of [3, Lemma 9.3], a proof strictly analogous to that of [3, Proposition 9.4] gives the following. The only tool in the proof that we have not covered here is [3, Lemma 9.2], which is a short technical lemma about the repetitions in the word , and which can be used without any modification.

Proposition 9.

Suppose that . Then no word contains a -kernel repetition.

Case II:

Define a substitution by

We extend to by , which allows us to iteratively apply to an initial word in . Let , i.e., we have that is the set of factors of all words obtained by iteratively applying to the initial word 1. If a word has period and the length prefix of is in , then we say that is a kernel period of .

Proposition 10.

Suppose that . Then no word in contains a -kernel repetition.

Proof.


Suppose otherwise that the word is a -kernel repetition. Write , where has kernel period . Without loss of generality, we may assume that no extension of that lies in has period , i.e., that is a maximal repetition in . From the definition of -kernel repetition, we must have

or equivalently,

Since , we certainly have


If , then we have , and hence . We eliminate this possibility by exhaustive search, so we may assume that .

We can write for some suffix of a word in , some prefix of a word in , and some word , where . By inspection, we see that if is any factor of of length , and both and are prefixes of some word in , then (mod ). Since both and are prefixes of , and since , we conclude that is a multiple of .

Recall that we have , where for some word . Since and , we have , and hence has kernel period . Now write , where . Evidently, we have . Note that has period . Further, since the frequency matrix of is invertible modulo , we have , and hence is a kernel period of . Since was a maximal repetition in , we see that is also maximal.

We may now repeat the process described above. Eventually, for some , we reach a word that can be written , where is a kernel period of , and . For all , one proves by induction that and . Thus, from (1), we obtain

for all . Dividing through by , and then simplifying, we obtain


for all .

Since , we obtain from (2). By Lemma 7, the kernel period of is a multiple of , so in fact we have , and in turn . By exhaustive search of all words in of length at most , we find that , where is a set containing exactly 200 words. Indeed, the set contains

  • 160 words with kernel period 76 and length 77,

  • 36 words with kernel period 92 and length 93, and

  • 4 words with kernel period 112 and length 114.

For every , let

Evidently, we have . For every word , let denote the kernel period of , and let denote the maximum length of a repetition with kernel period across all words in . By exhaustive check, for every , we find . However, the word must be in , and by (2), we have

This is a contradiction. We conclude that the set contains no -kernel repetitions. ∎

We now proceed with the proof of our main result.

Proof of Theorem 2.

First suppose that . By Proposition 9, no word contains a -kernel repetition. From the definition of , one easily proves that

By Theorem 6, for every word , the word is in the threshold language of order . Moreover, the maps and are injective, and , since is -uniform and preserves length. It follows that

Since , and hence , are fixed, the quantity is a constant, and we conclude that the language grows exponentially.

Suppose now that . By Proposition 10, no word contains a -kernel repetition. Since for all , we have

By the same argument as above, we see that

and we conclude that the language grows exponentially. ∎

4 Conclusion

Conjecture 1 has now been established for all n ∉ {12, 14, …, 26}. We remark that different techniques than those presented here will be needed to establish Conjecture 1 in all but one of these remaining cases. (It appears that the techniques presented here could potentially be used for , but we do not pursue this isolated case.) For example, let . Then we have . By computer search, for every letter , the word contains a -stabilizing word of length , which is less than . By another computer search, the longest word on avoiding -kernel repetitions has length . So there are only finitely many words in that avoid both -kernel repetitions and the forbidden stabilizing words. Similar arguments lead to the same conclusion for all .

For a language L, the value α(L) = limsup_{k→∞} (C_L(k))^{1/k} is called the growth rate of L. If L is factorial (i.e., closed under taking factors), then by an application of Fekete’s Lemma, we can safely replace limsup by lim in this definition. If α(L) > 1, then the language L grows exponentially, and in this case, α(L) is a good description of how quickly the language grows.
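The role of Fekete's Lemma can be made concrete numerically. For an infinite factorial language, the number of words of length k+l is at most the product of the numbers of words of lengths k and l, so the k-th root of the number of words of length k converges to its infimum, and every such root is an upper bound on the growth rate. The Python sketch below illustrates this on square-free ternary words, chosen only because they are cheap to enumerate; the function names are hypothetical and this is not one of the bounding methods cited in this section.

```python
def has_square_suffix(word):
    """Return True if some suffix of `word` is a square xx with x nonempty."""
    n = len(word)
    return any(word[n - 2 * p:n - p] == word[n - p:]
               for p in range(1, n // 2 + 1))


def square_free_counts(alphabet, max_len):
    """Return [C(0), ..., C(max_len)] for square-free words over `alphabet`."""
    level = [""]
    counts = [1]
    for _ in range(max_len):
        level = [w + a for w in level for a in alphabet
                 if not has_square_suffix(w + a)]
        counts.append(len(level))
    return counts


def growth_estimates(counts):
    """Return the values C(k)^(1/k) for k = 1, ..., len(counts) - 1."""
    return [c ** (1.0 / k) for k, c in enumerate(counts) if k >= 1]


counts = square_free_counts("abc", 20)
estimates = growth_estimates(counts)
for k, est in enumerate(estimates, start=1):
    print(k, counts[k], round(est, 4))
```

The printed roots approach the growth rate from above but converge slowly, which is consistent with the remark below that sharp bounds, and lower bounds in particular, require more refined methods.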

For all , we have established that . However, this lower bound tends to as tends to infinity, and this seems far from best possible. Indeed, Shur and Gorbunova proposed the following conjecture concerning the asymptotic behaviour of .

Conjecture 11 (Shur and Gorbunova [22]).

The sequence of the growth rates of threshold languages converges to a limit as n tends to infinity.

A wide variety of evidence supports this conjecture – we refer the reader to [22, 8, 21, 20] for details. For a fixed n, there are efficient methods for determining upper bounds on the growth rate of T_n which appear to be rather sharp, even for relatively large values of n (see [22], for example). Establishing a sharp lower bound on the growth rate of T_n appears to be a more difficult problem. We note that a good lower bound on is given by Kolpakov [11] using a method that requires some significant computation. For all 5 ≤ n ≤ 10, Kolpakov and Rao [12] give lower bounds for the growth rate of T_n using a similar method. They were then able to estimate its value with precision 0.005 using upper bounds obtained by the method of Shur and Gorbunova [22].

Thus, in addition to resolving the finitely many remaining cases of Conjecture 1, improving our lower bound for the growth rate of T_n when n ≥ 27 remains a significant open problem.


  • [1] J. Berstel, Axel Thue’s papers on repetitions in words: A translation, Publications du LaCIM (Université du Québec à Montréal), vol. 20, 1995.
  • [2] F. J. Brandenburg, Uniformly growing k-th power-free homomorphisms, Theoret. Comput. Sci. 23 (1983), 69–82.
  • [3] A. Carpi, On Dejean’s conjecture over large alphabets, Theoret. Comput. Sci. 385 (2007), 137–151.
  • [4] J. D. Currie and N. Rampersad, Dejean’s conjecture holds for n ≥ 27, RAIRO - Theor. Inform. Appl. 43 (2009), 775–778.
  • [5] J. D. Currie and N. Rampersad, Dejean’s conjecture holds for n ≥ 30, Theoret. Comput. Sci. 410 (2009), 2885–2888.
  • [6] J. D. Currie and N. Rampersad, A proof of Dejean’s conjecture, Math. Comp. 80 (2011), 1063–1070.
  • [7] F. Dejean, Sur un théorème de Thue, J. Combin. Theory Ser. A 13 (1972), 90–99.
  • [8] I. A. Gorbunova and A. M. Shur, On Pansiot words avoiding 3-repetitions, in Proc. WORDS 2011, Electron. Proc. Theor. Comput. Sci., vol. 63, 2012, pp. 138–146.
  • [9] R. M. Jungers, V. Y. Protasov, and V. D. Blondel, Overlap-free words and spectra of matrices, Theoret. Comput. Sci. 410 (2009), 3670–3684.
  • [10] J. Karhumäki and J. Shallit, Polynomial versus exponential growth in repetition-free binary words, J. Combin. Theory Ser. A 105 (2004), 335–347.
  • [11] R. Kolpakov, Efficient lower bounds on the number of repetition-free words, J. Integer Seq. 10 (2007), 1–16.
  • [12] R. Kolpakov and M. Rao, On the number of Dejean words over alphabets of 5, 6, 7, 8, 9 and 10 letters, Theoret. Comput. Sci. 412 (2011), 6507–6516.
  • [13] M. Lothaire, Algebraic combinatorics on words, Cambridge University Press, 2002.
  • [14] M. Mohammad-Noori and J. D. Currie, Dejean’s conjecture and Sturmian words, European J. Combin. 28 (2007), 876–890.
  • [15] J. Moulin-Ollagnier, Proof of Dejean’s conjecture for alphabets with 5, 6, 7, 8, 9, 10, and 11 letters, Theoret. Comput. Sci. 95 (1992), 187–205.
  • [16] P. Ochem, A generator of morphisms for infinite words, RAIRO - Theor. Inform. Appl. 40 (2006), 427–441.
  • [17] J. J. Pansiot, A propos d’une conjecture de F. Dejean sur les répétitions dans les mots, Discrete Appl. Math. 7 (1984), 297–311.
  • [18] M. Rao, Last cases of Dejean’s conjecture, Theoret. Comput. Sci. 412 (2011), 3010–3018.
  • [19] A. Restivo and S. Salemi, Overlap free words on two symbols, in M. Nivat and D. Perrin, eds., Automata on Infinite Words, Lecture Notes in Comput. Sci., Vol. 192, Springer-Verlag, 1985, pp. 198–206.
  • [20] A. M. Shur, Growth properties of power-free languages, Comput. Sci. Rev. 6 (2012), 187–208.
  • [21] A. M. Shur, Growth of power-free languages over large alphabets, Theory Comput. Syst. 54 (2014), 224–243.
  • [22] A. M. Shur and I. A. Gorbunova, On the growth rates of complexity of threshold languages, RAIRO - Theor. Inform. Appl. 44 (2010), 175–192.
  • [23] I. N. Tunev and A. M. Shur, On two stronger versions of Dejean’s conjecture, in Proc. 37th Internat. Conf. on Mathematical Foundations of Computer Science: MFCS 2012, Lecture Notes in Comput. Sci., vol. 7464, Springer, 2012, pp. 800–812.