Avoiding squares over words with lists of size three amongst four symbols

04/20/2021 ∙ by Matthieu Rosenfeld, et al.

In 2007, Grytczuk conjectured that for any sequence (ℓ_i)_{i≥1} of alphabets of size 3 there exists a square-free infinite word w such that for all i, the i-th letter of w belongs to ℓ_i. The result of Thue of 1906 implies that there is an infinite square-free word if all the ℓ_i are identical. On the other hand, Grytczuk, Przybyło and Zhu showed in 2011 that it also holds if the ℓ_i are of size 4 instead of 3. In this article, we first show that if the lists are of size 4, the number of square-free words of length n is at least 2.45^n (the previous similar bound was 2^n). We then show our main result: we can construct such a square-free word if the lists are subsets of size 3 of the same alphabet of size 4. Our proof also implies that there are at least 1.25^n square-free words of length n for any such list assignment. This proof relies on the existence of a set of coefficients verified with a computer. We suspect that the full conjecture could be resolved by this method with a much more powerful computer (but we might need to wait a few decades for such a computer to be available).

1 Introduction

A square is a word of the form uu where u is a non-empty word. We say that a word is square-free (or avoids squares) if none of its factors is a square. For instance, hotshots is a square while minimize is square-free. In 1906, Thue showed that there are arbitrarily long ternary words avoiding squares [17, 18]. This result is often regarded as the starting point of combinatorics on words, and the generalizations of this particular question have received a lot of attention.
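The definition above can be checked mechanically. The following is a minimal sketch in Python (the function name `is_square_free` is ours and does not come from the article); it tests every factor of the form uu by brute force, which is enough for short words.

```python
def is_square_free(word):
    """Return True if no factor of `word` has the form uu with u non-empty."""
    n = len(word)
    for start in range(n):
        # p is the length of the candidate period u starting at `start`
        for p in range(1, (n - start) // 2 + 1):
            if word[start:start + p] == word[start + p:start + 2 * p]:
                return False
    return True


if __name__ == "__main__":
    print(is_square_free("hotshots"))  # "hots" + "hots" is a square
    print(is_square_free("minimize"))
```

The function works on strings as well as on lists or tuples of integers, since it only relies on slicing and equality.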

Nonrepetitive colorings of graphs were introduced by Alon et al. [1]. A coloring of the vertices (or of the edges) of a graph is said to be nonrepetitive if there is no path of the graph whose color sequence is a square. The nonrepetitive chromatic number (resp. nonrepetitive chromatic index) of a graph is the minimal number of colors in a nonrepetitive coloring of the vertices (resp. the edges) of the graph. Alon et al. showed that the nonrepetitive chromatic index is at most O(Δ^2), where Δ is the maximum degree of the graph [1]. Different authors successively improved the upper bounds on the nonrepetitive chromatic number and the nonrepetitive chromatic index, and the best known bound for the nonrepetitive chromatic number is also in O(Δ^2) [2, 4, 10, 14]. Nonrepetitive colorings have since been studied in many other contexts (see for instance [20] for a recent survey on this topic).

Most results regarding nonrepetitive colorings of graphs of bounded maximal degree are based on the Lovász Local Lemma, entropy compression, or related methods, and they naturally hold in the stronger setting of list coloring. In this setting, each vertex is assigned a list of colors and the colorings of the graph must assign to each vertex a color from its list. The nonrepetitive list chromatic number is the smallest integer k such that the graph is nonrepetitively colorable as soon as all the lists contain at least k colors. This notion was studied in relation to many notions of colorings and, contrary to the first intuition, it is often not the case that the worst possible list assignment is the one that gives the same list to every vertex. For instance, every planar graph has nonrepetitive chromatic number at most 768, but can have an arbitrarily large nonrepetitive list chromatic number [3]. However, the best known bound on the nonrepetitive chromatic number in terms of the maximal degree also holds for the nonrepetitive list chromatic number [20]. It is unknown whether the optimal bounds in terms of the maximal degree for the nonrepetitive chromatic number and for the nonrepetitive list chromatic number are indeed identical or not. The simplest graph for which the question is non-trivial is the path. The result of Thue implies that the nonrepetitive chromatic number of any path over at least four vertices is 3 [17]. There are various simple proofs that the nonrepetitive list chromatic number of any path is at most 4 (see [8] for a proof based on the Lefthanded Local Lemma, [9] for a proof based on entropy compression and [14] for a proof based on a simple counting argument). It was first conjectured by Grytczuk [6] that the nonrepetitive list chromatic number of any path is in fact at most 3. This conjecture has been mentioned many times, but not much progress has been made in the direction of proving or disproving it (see for instance [5, 6, 7, 9, 13, 14, 16, 20, 21] for some of the mentions of this problem).

The question can be reformulated in terms of combinatorics on words.

Question 1.

Let A be an infinite alphabet. Is it true that for any sequence (ℓ_i)_{i≥1} of subsets of A of size 3 there exists an infinite square-free word w such that for all i, the i-th letter of w belongs to ℓ_i?

As already mentioned, the answer is positive if 3 is replaced by 4. It was even shown in [14] that there are at least 2^n such words of length n for any list assignment. We first show in this article that there are at least 2.45^n such words. We then show our main result: if the alphabet A is of size 4 then there is such a word. Our approach is similar to the idea of [14], that is, we show some strong bounds on the number of such words using an inductive argument. Our approach also relies on ideas from an approach of Shur to bound the number of words in power-free languages [15] (this improves an older technique introduced by Kolpakov [11]). The idea is that instead of directly counting the words we associate a weight to each word and we count the total weight of the set of valid words. In the approach of Kolpakov, one has to deal with three different kinds of squares (short squares, long squares, and mid-length squares), and we borrow a trick from Shur that allows dealing with only two kinds of squares (i.e., the mid-length squares do not need to be dealt with separately). In fact, the main difference between our proof and the approach of Shur is that eigenvectors and eigenvalues are replaced by a more complicated notion (which does not seem to be well studied or even named in the literature). For our main result, we use a computer to verify the existence of a set of weights with the right property. This approach could probably be applied if

is of size larger than (and even for infinite ), but the computational power required is larger than what modern computers offer. In particular, we suspect that this approach would work if was of size and that would imply the conjecture.

The article is organized as follows. We first provide some notations and definitions in Section 2. In Section 3, we show that for any list assignment with lists of size 4, there are at least 2.45^n square-free words of length n. We use this proof to introduce our technique. In Section 4, we use this technique to show our main result. In Section 5, we detail how we use the computer to verify the existence of the set of weights needed for our result. We conclude this article, in Section 6, by explaining why we suspect that the same approach could solve Question 1.

2 Definitions and notations

We use the standard definitions and notations of combinatorics on words introduced in Chapter 1 of [12].

For the sake of notation, all our alphabets are sets of integers. For any fixed alphabet A, the set of extensions of a word v is the set of words {va : a ∈ A}. That is, a word u is an extension of another word v if u can be obtained by the concatenation of a letter at the end of v. For any words and any such that , we write .

A square is a word of the form uu with u a non-empty word. The period of the square uu is u and, by abuse of notation, we also call the length |u| the period of uu. A word is square-free (or avoids squares) if none of its factors is a square.

A list assignment is a sequence (ℓ_i)_{i≥1} of subsets of the integers. A 4-list assignment is a list assignment such that each list is of size 4, i.e., for all i, |ℓ_i| = 4. We say that a word w respects a list assignment (ℓ_i)_{i≥1} if for all i, the i-th letter of w belongs to ℓ_i.
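To make the notion of a word respecting a list assignment concrete, here is a small brute-force counter, written as a sketch under our own naming (`has_square_suffix` and `count_square_free` are not from the article). It extends words letter by letter, taking the i-th letter from the i-th list, and discards an extension as soon as it ends with a square; since every prefix is already square-free, checking suffixes is sufficient.

```python
def has_square_suffix(word):
    """Return True if `word` (a list of letters) ends with a square uu."""
    n = len(word)
    for p in range(1, n // 2 + 1):
        if word[n - 2 * p:n - p] == word[n - p:]:
            return True
    return False


def count_square_free(lists):
    """Count square-free words w of length len(lists) with w[i] in lists[i]."""
    def extend(prefix, i):
        if i == len(lists):
            return 1
        total = 0
        for letter in sorted(lists[i]):
            prefix.append(letter)
            # The prefix was square-free, so a new square must be a suffix.
            if not has_square_suffix(prefix):
                total += extend(prefix, i + 1)
            prefix.pop()
        return total

    return extend([], 0)


if __name__ == "__main__":
    identical = [{0, 1, 2}] * 5
    mixed = [{0, 1, 2}, {1, 2, 3}, {0, 2, 3}, {0, 1, 3}]
    print(count_square_free(identical), count_square_free(mixed))
```

The running time is exponential in the length, so this is only meant for experimenting with short list assignments, not as a substitute for the counting argument of the article.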

3 The number of square-free words over 4 letters

We say that a word of length at least 3 is perfect if its suffix of length 3 contains 3 distinct letters (it is 012 up to a permutation of the alphabet). A word is nice if its suffix of length 3 is 010 up to a permutation of the alphabet. A square-free word of length at least 3 is either nice or perfect. The idea is that since perfect words are intuitively easier to extend than nice words, it is better to count them separately. However, since it is complicated to count them separately, we will count them together by weighting them.

For any set of words , is the quantity obtained by summing the number of perfect words in together with times the number of nice words in . For instance, if then .

Lemma 2.

Let be a -list assignment and let be the set of square-free words of length that respect . Let be a real such that . Then, for any ,

Proof.

We proceed by induction on . Suppose that for every , . Then, for all ,

(1)

We need to show

A word of length is good if

  • it respects ,

  • its prefix of length is in ,

  • and it contains no square of period or .

A word is wrong, if it is good, but not square-free (i.e., if one of its suffixes is a square of period longer than ). Let be the set of good words and be the set of wrong words. Then and

(2)

We will now lower bound and then upper bound to reach our result.

Let be a perfect word from and let be the factor of size at the end of . Let us count the contributions to of the extensions of . Since is perfect the only reason for a square of period or to appear when adding a letter to is if this letter is , so we need to forbid at most one letter of . Moreover, if belongs to then there are at least 2 ways to extend into a perfect word of and one way to extend to a nice word of . In this case, the contribution of the extensions of to is at least . If does not belong to then there are at least ways to extend to a perfect word of and the contribution of the extensions of to is at least . Thus the contribution to of the extensions of any perfect word from is at least . So the contribution of the extensions of any perfect word to is at least times as large as its contribution to .

For any value of the list , any nice word from can be extended in at least 2 perfect words of . The contribution to of the extensions of any nice word from is at least . Since the contribution of a nice word to is , the contribution of the extensions of any nice word to is at least times as large as its contribution to .

Since the contribution of the extensions of any word to is at least times as large as its contribution to , we deduce

(3)

Let us now bound . For all , let be the set of words from that end with a square of period . Clearly, and

(4)

By definition of and , . Let and . Since ends with a square of period , the last letters of are uniquely determined by its prefix of size . Since is a proper prefix of , . And moreover, the last 3 letters of are identical to the last 3 letters of . Thus the contribution of to is the same as the contribution of to . We deduce

Using this last equation with (1) and (4) yields

We can use this bound and (3) in equation (2) to finally upper-bound and we obtain

where the second inequality is a consequence of the hypothesis of the lemma. ∎

Since satisfies the condition of Lemma 2, we deduce the following theorem.

Theorem 3.

Fix a -list assignment and let be the set of square-free words of length that respect this list assignment. Then for all ,

Proof.

It is easy to verify that is not empty and thus . One easily verifies that satisfies the conditions of Lemma 2. Thus for all ,

Since the weight of any word is at most , for any ,

(5)

For the sake of contradiction suppose that for some . Then, there exists an integer such that . Any factor of a square-free word is square-free, so the sequence is submultiplicative, that is, for all , . In particular, which contradicts equation (5). We deduce that for all , , as desired. ∎

Let us briefly explain why this value of plays a particular role in this proof. If the lists are all identical then any perfect word can be extended in two ways into another perfect word and in one way into a nice word, while any nice word can only be extended in two ways into a perfect word. The matrix of the corresponding automaton is

The dominant eigenvalue of this matrix is and one possible eigenvector is which also explains the choice of the weights. We do not need our weights to form an eigenvector, but we require a property similar to the statement of Lemma 4. It happens to be the case that the vector that has this property is also the eigenvector of this matrix, which corresponds to the intuition that the worst choice of list assignment is the one where all the lists are identical.
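For readers who want to redo the computation, the matrix can be rebuilt from the verbal description alone: a perfect word has two perfect extensions and one nice extension, and a nice word has two perfect extensions. The derivation below is our own reconstruction under that reading (rows indexed by the current type, columns by the type of the extension, in the order perfect, nice); the orientation and the normalization of the eigenvector are assumptions, not taken from the article.

```latex
M = \begin{pmatrix} 2 & 1 \\ 2 & 0 \end{pmatrix},
\qquad
\det(M - \lambda I) = (2-\lambda)(-\lambda) - 2 = \lambda^2 - 2\lambda - 2 ,
\qquad
\lambda = 1 \pm \sqrt{3}.

% Eigenvector (x, y) for the dominant eigenvalue 1 + \sqrt{3}:
%   2x + y = \lambda x  and  2x = \lambda y,
% hence y = (\lambda - 2)\,x = (\sqrt{3} - 1)\,x.
\begin{pmatrix} x \\ y \end{pmatrix}
  = \begin{pmatrix} 1 \\ \sqrt{3} - 1 \end{pmatrix},
\qquad
1 + \sqrt{3} \approx 2.732 .
```

Under this reading, a perfect word would carry weight 1 and a nice word weight about 0.732, and every word's extensions would weigh at least about 2.732 times the word itself.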

4 Proof of the main result

We fix . An -list assignment is a sequence of subsets of size of . We also fix .

A word is normalized if it is the smallest of all the words obtained by a permutation of the alphabet. Let be the set of normalized prefixes of minimal squares of period at most . For any , we let be the longest word from that is a suffix of up to a permutation of the alphabet. For any set of words , and any , we let be the set of words from whose longest prefix that belongs to up to a permutation of the alphabet is , that is .
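Normalization is easy to compute if "smallest" is read as lexicographically smallest over an alphabet of integers starting at 0, which is our assumption here. Under that reading, the normalized form is obtained by renaming letters in order of first appearance. The helper below is a sketch with a name of our choosing (`normalize`), not code from the article.

```python
def normalize(word):
    """Rename letters by order of first occurrence: 0, 1, 2, ...

    This yields the lexicographically smallest word among all images of
    `word` under a permutation of an integer alphabet starting at 0.
    """
    relabel = {}
    normalized = []
    for letter in word:
        if letter not in relabel:
            relabel[letter] = len(relabel)
        normalized.append(relabel[letter])
    return normalized


if __name__ == "__main__":
    print(normalize([2, 0, 2, 1]))
    print(normalize([1, 2, 1]) == normalize([3, 0, 3]))
```

Two words are equal up to a permutation of the alphabet exactly when their normalized forms coincide, which is what makes normalized words usable as identifiers.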

We denote the set of words that contain no square of period at most by . We are now ready to state the following lemma.

Lemma 4.

There exist coefficients such that and for all ,

(6)

where .

The proof of this lemma relies on a computer verification that we delay to Section 5. For the rest of this section let us fix coefficients and that respect the conditions of Lemma 4. For each set of words, we let

Whenever we mention the weight of a word , we mean .

The main idea of the proof of Theorem 5 is essentially the same as in Lemma 2, but we have a few more technicalities to handle along the way. We are going to count inductively the total weight of the square-free words of size that respect a fixed -list assignment. Intuitively, the set plays the same role as and plays the same role as . We are now ready to state our main theorem.

Theorem 5.

Let be a -list assignment and for all , let be the set of square-free words of length that respect . Let be a real number such that

Then for all ,

Proof.

We proceed by induction on . Let be an integer such that the theorem holds for any integer smaller than and let us show that .

By induction hypothesis, for all ,

(7)

A word of length is good if

  • it respects ,

  • its prefix of length is in ,

  • and it contains no square of period at most .

A word is wrong, if it is good, but not square-free (i.e., if one of its suffixes is a square of period longer than ). We let be the set of good words and be the set of wrong words. Then and

(8)

Let us first lower-bound .

By definition, is the longest suffix of that is a prefix of a square of period of length at most (up to permutation of the alphabet). This implies that for any square-free word and for any word , if and only if . For the same reason, for any square-free word and letter , . We then deduce that the contribution of the extensions of any word to is

By Lemma 4, we deduce that the contribution of the extensions of any word to is at least . We sum the contributions over

(9)

Let us now bound . For all , let be the set of words from that end with a square of period . Then and

(10)

Let us now upper-bound the separately depending on .

Case :

By definition of and , any word from avoids squares of period at most . Hence, implies and .

Case :

Let and . For the sake of contradiction, suppose that . Let be the period of . Then the factor of length that ends at position is identical to the last letters of the word, that is

These two factors are well defined since . Moreover , and the factor of length that ends at position is identical to the last letters of the word, that is

The resulting equality

implies that there is a square of period in . Since is square-free, we have and . So the square is of period at most which is a contradiction. Hence if is non-empty, then .

The suffix of length of any word from is a square, so the last letters of any word of are uniquely determined by the remaining prefix, and this prefix belongs to (since ). Hence,

where the second inequality comes from (7). This bound with (10) yields

We use this bound and (9) in equation (8) to finally upper-bound ,

where the second inequality is a consequence of the theorem hypothesis. ∎

Since satisfies the conditions of Theorem 5, we can deduce our main result.

Theorem 6.

Fix a -list assignment and let be the set of square-free words of length that respect this list assignment. Then for all ,

Proof.

By definition, . One easily verifies that satisfies the conditions of Theorem 5. Thus for all ,

Hence,

(11)

For the sake of contradiction suppose that for some . By Lemma 4, so there exists an integer such that . Any factor of a square-free word is square-free, so the sequence is submultiplicative, that is, for all , . In particular, which contradicts equation (11). We deduce that for all , , as desired. ∎

5 Proof of Lemma 4

A classic procedure to compute the Perron-Frobenius eigenvector (and eigenvalue) of a matrix is simply to iterate the matrix over some “random” starting vector. Our coefficients play a role similar to that of an eigenvector and we use the same idea to compute the desired coefficients. In the case of the Perron-Frobenius eigenvector, it is known that under general assumptions this procedure converges toward the desired vector. In our case, we suspect that the procedure is also convergent under rather general assumptions, but we did not try to prove anything in this direction. However, after enough iterations, we find a vector that respects the conditions of our lemma. Before discussing the details of this procedure we restate the lemma.
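The classic procedure mentioned above is usually called power iteration. The sketch below is a plain floating-point version with names of our choosing (`power_iteration`, `EXAMPLE_MATRIX`); the example matrix is our reading of the two-state description given in Section 3 (perfect to two perfect and one nice, nice to two perfect) and is an assumption rather than a quotation.

```python
def power_iteration(matrix, iterations=100):
    """Approximate the dominant eigenvalue and eigenvector of a
    non-negative square matrix by repeated multiplication."""
    size = len(matrix)
    vector = [1.0] * size
    eigenvalue = 0.0
    for _ in range(iterations):
        image = [sum(matrix[i][j] * vector[j] for j in range(size))
                 for i in range(size)]
        # For a positive vector, the ratio of the sums estimates the eigenvalue.
        eigenvalue = sum(image) / sum(vector)
        total = sum(image)
        vector = [x / total for x in image]  # renormalize to sum 1
    return eigenvalue, vector


# Assumed example: rows are (perfect, nice), columns count extensions by type.
EXAMPLE_MATRIX = [[2, 1], [2, 0]]

if __name__ == "__main__":
    value, vec = power_iteration(EXAMPLE_MATRIX)
    print(value, vec[1] / vec[0])
```

Floating point is used here only for illustration; as the article stresses later in this section, the actual verification is carried out with exact arithmetic.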

See 4

It is enough to provide coefficients with the right property. Since , instead of providing the coefficients, it is more efficient to provide a computer program that computes the set and the coefficients with the desired properties (the C++ implementation can be found in the ancillary file on the arXiv; running this program took approximately 2 hours of computation and occupied 76.4 GB of RAM). This also has the advantage that the same program directly verifies that the coefficients do indeed have the desired property.

The first step is to compute the set of prefixes of minimal squares of period at most . This set is stored inside a trie (also called prefix tree). It is efficient (in particular in terms of memory consumption) since the set is prefix-closed (i.e. contains any prefix of any word from ). Each word of is given a unique integer as an identifier. In a second step, we compute the directed multi-graph over the vertices and such that there is an arc from to , if is the word corresponding to where is the word corresponding to and is any letter. More precisely, if is the word associated with the integer and is the word associated with then the number of arcs from to in our graph is given by . There are not many arcs with multiplicity larger than 1, but because of the normalization, this may happen (for instance, there are three arcs from the vertex of the word to the vertex of the word , since is the normalization of and and similarly there are four arcs from the empty word to the word ). This graph is useful to efficiently compute for any , the quantity . It is simply the sum of the weights of its out-neighbors minus the weight of the one of largest weight (a vertex v is said to be an out-neighbor of a vertex u if there is an arc from u to v).

We consider the procedure that takes coefficients as input and produces the coefficients such that for each

For every , we call the quantity the growth associated to . If we let be the minimum of the growth over every then and the set of coefficients respects the condition of equation (6). Our goal is then simply to find coefficients that give the largest value of .

To find our coefficients, we simply start by setting all the to the same value ( in our implementation), and then we iterate our procedure. Between two iterations we renormalize the coefficients by dividing every coefficient by the same constant to keep the average value at some fixed value ( in our implementation). After 50 iterations, we find coefficients and such that the conditions of the lemma are verified.
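The iteration can be tried on a toy instance. The sketch below rests on several assumptions of ours: the update rule is taken to be "sum of the weights of the out-neighbors minus the largest one" (the quantity described for the multi-graph), the average is renormalized to 1, and the two-vertex graph `ARCS` is a hypothetical stand-in (words over 4 letters classified by their last three letters, with squares of period 1 and 2 forbidden), not the graph built by the author's program.

```python
from fractions import Fraction

# Hypothetical toy multigraph: "P" = suffix with three distinct letters,
# "N" = suffix of the form aba. Repeated entries are parallel arcs.
ARCS = {"P": ["P", "P", "N"], "N": ["P", "P"]}


def step(arcs, weights):
    """Assumed update: sum of out-neighbor weights minus the largest one
    (the adversary removes the most valuable extension)."""
    new_weights = {}
    for vertex, outs in arcs.items():
        out_weights = [weights[u] for u in outs]
        new_weights[vertex] = sum(out_weights) - max(out_weights)
    return new_weights


def min_growth(arcs, weights):
    """Smallest ratio between the updated weight and the current weight."""
    updated = step(arcs, weights)
    return min(updated[v] / weights[v] for v in arcs)


def iterate(arcs, iterations):
    """Start from equal weights and iterate, keeping the average at 1."""
    weights = {v: Fraction(1) for v in arcs}
    for _ in range(iterations):
        weights = step(arcs, weights)
        average = sum(weights.values()) / len(weights)
        weights = {v: w / average for v, w in weights.items()}
    return weights


if __name__ == "__main__":
    final = iterate(ARCS, 20)
    print(float(min_growth(ARCS, final)))
```

On this toy graph the weights follow a Fibonacci-like recurrence, so the minimum growth approaches the golden ratio; the value obtained for the real graph in the article is of course different. Using `Fraction` keeps every intermediate value exact.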

We suspect that there are good reasons for which this procedure seems to converge toward the optimal. However, it is enough that we verified that after 50 iterations this deterministic procedure produces coefficients with the desired property.

Let us finally mention that every computation is carried out using integers (or pairs of integers for rational numbers) so that there are no issues of precision. This is important, since with we can take and in Theorem 6, but with there is no that satisfies the condition of Theorem 5.
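The following sketch illustrates why exact arithmetic matters when an inequality is tight. It uses Python's standard `fractions.Fraction`, which stores a rational number as a pair of integers; the helper name `meets_threshold` and the numbers are ours and purely illustrative.

```python
from fractions import Fraction


def meets_threshold(terms, threshold):
    """Return True if the sum of `terms` is at least `threshold`."""
    return sum(terms) >= threshold


if __name__ == "__main__":
    # Ten tenths add up to exactly 1 with rationals...
    print(meets_threshold([Fraction(1, 10)] * 10, 1))
    # ...but binary floating point accumulates a rounding error.
    print(meets_threshold([0.1] * 10, 1.0))
```

With rationals the check succeeds, while the floating-point sum falls just short of 1, so a verification based on floats could wrongly accept or reject a borderline set of coefficients.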

6 Conclusion

We showed something slightly stronger than Theorem 6. Indeed, our proof of Theorem 5 also holds if an adversary chooses each list only right before we choose the letter from the corresponding list (instead of the lists being fixed from the start). Our proof of Theorem 3 also holds in this stronger setting. We suspect that the answer to Question 1 is positive in this stronger setting.

We showed that for any choice of lists of size 3 amongst 4 symbols the number of square-free words of length n is at least 1.25^n. By pushing the computation slightly further, we can replace with . We suspect that the number of square-free words is minimal when all the lists are identical (the growth rate of the number of square-free words of length is known to be approximately 1.3017 [15]). This might even be true in the stronger context where an adversary chooses the next list right after we choose the next letter. Note that it is also really simple to use the same technique to improve the bounds of Theorem 3 by computing coefficients with the aid of a computer (although looking at suffixes of length or instead of by hand is doable and would already improve the bounds).

Finally, let us conclude by mentioning that we believe that this approach could be used to answer Question 1 positively. The size of the set depends on the size of . However, since we only consider normalized words for the set is the same for any such that . Let us provide a crude upper bound on the size of this set.

The number of square-free words of size that use (by that we mean that the letters appear in the word) letters is less (we forbid squares of period ). So the number of normalized words of size that use letters is less than . The number of normalized minimal squares of period that use letters is less than and the number of proper prefixes of length at least of such words is then at most . By summing over and , the number of prefixes of minimal squares of period at most over letters is at most

Evaluating this expression with and tells us that, with , the set has size at most . This is less than times larger than the set that required 76.4 GB of RAM to be computed. Our bound being crude, we suspect that is in fact smaller than that (a more careful computation taking into account squares of period divides this bound by approximately ). We might be able to solve this question with this approach in a few decades. It might also be possible to exploit some other symmetries of the problem to reduce the number of words considered.

Let us finally mention that the basic idea of the counting technique was recently used in a more general context [14, 19]. In particular, Wanless and Wood provided a general result based on a similar idea and applied it to graph colorings, hypergraph colorings and SAT formulas [19]. It is not clear how the more advanced argument used here (i.e., counting the total weights of the solutions instead of simply counting the number of solutions) can be used in a context more general than combinatorics on words, or even whether it can be used in a framework similar to the one developed by Wanless and Wood.

References

  • [1] N. Alon, J. Grytczuk, M. Hałuszczak, and O. Riordan. Nonrepetitive colorings of graphs. Random Structures & Algorithms, 21:336–346, 2002.
  • [2] V. Dujmović, G. Joret, J. Kozik, and D. R. Wood. Nonrepetitive colouring via entropy compression. Combinatorica, 36(6):661–686, Dec 2016.
  • [3] V. Dujmović, L. Esperet, G. Joret, B. Walczak, and D.R. Wood. Planar graphs have bounded nonrepetitive chromatic number. Advances in Combinatorics, 2020:5.
  • [4] D. Gonçalves, M. Montassier, and A. Pinlou. Entropy compression method applied to graph colorings. arXiv e-prints, arXiv:1406.4380, 2014.
  • [5] A. Gągol, G. Joret, J. Kozik, and P. Micek. Pathwidth and nonrepetitive list coloring. Electronic Journal of Combinatorics, 23(4), 2016.
  • [6] J. Grytczuk, Nonrepetitive colorings of graphs — A survey, Int J Math Math Sci (2007), ArtID 74639, 10.
  • [7] S. Czerwiński and J. Grytczuk. Nonrepetitive colorings of graphs. Electronic Notes in Discrete Math., 28:453–459, 2007.
  • [8] J. Grytczuk, J. Przybyło and X. Zhu. Nonrepetitive list colourings of paths. Random Struct. Alg., 38: 162-173, 2011.
  • [9] J. Grytczuk, J. Kozik and P. Micek. New approach to nonrepetitive sequences. Random Struct. Alg., 42: 214-225, 2013.
  • [10] J. Harant and S. Jendrol. Nonrepetitive vertex colorings of graphs. Discrete Mathematics, 312(2):374–380, 2012.
  • [11] R. M. Kolpakov. On the number of repetition-free words. Journal of Applied and Industrial Mathematics, 1(4):453–462, 2007.
  • [12] M. Lothaire. Algebraic combinatorics on words, volume 90 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, Cambridge, 2002.
  • [13] N. Mhaskar and M. Soltys. Non-repetitive strings over alphabet lists. In WALCOM: algorithms and computation, vol. 8973 of Lecture Notes in Comput. Sci., pp. 270–281. Springer, 2015.
  • [14] M. Rosenfeld. Another approach to non-repetitive colorings of graphs of bounded degree. Electronic Journal of Combinatorics, 27(3), 2020.
  • [15] A. M. Shur. Two-sided bounds for the growth rates of power-free languages. In Developments in Language Theory, vol. 5583 of Lecture Notes in Comput. Sci., 2009.
  • [16] E. Škrabuľáková. The Thue choice number versus the Thue chromatic number of graphs. arXiv e-prints, arXiv:1508.02559, August 2015.
  • [17] A. Thue. Über unendliche Zeichenreihen. Norske Vid. Selsk. Skr. I. Mat. Nat. Kl. Christiania, 7:1–22, 1906.
  • [18] A. Thue. Über die gegenseitige Lage gleicher Teile gewisser Zeichenreihen. Norske Vid. Selsk. Skr. I. Mat. Nat. Kl. Christiania, 10:1–67, 1912.
  • [19] I. M. Wanless and D. R. Wood. A general framework for hypergraph colouring. arXiv e-prints, arXiv:2008.00775, 2020.
  • [20] D. R. Wood. Nonrepetitive Graph Colouring. arXiv e-prints, arXiv:2009.02001, 2020.
  • [21] H. Zhao and X. Zhu. (2+ε)-nonrepetitive list colouring of paths. Graphs Combin., 32(4):1635–1640, 2016.